Dataset for Enhancing Prostate Cancer Detection: Multi-class Semantic Segmentation and Grading Score with the DARUN Model

Citation Author(s): Kasikrit Damkliang (Prince of Songkla University)
Submitted by: Kasikrit Damkliang
Last updated: Mon, 01/08/2024 - 22:41
DOI: 10.21227/jy12-2c41

Abstract 

Prostate cancer is a major global health challenge, underscoring the need for better diagnostic methods. This study addresses early detection and advances multi-class semantic segmentation, focusing specifically on discriminating Gleason patterns 3 and 4 in prostate adenocarcinoma tissue. We apply data science techniques to support current clinical cancer diagnosis and, ultimately, to help improve patient treatment. We introduce a publicly available dataset of 100 unique digitized whole-slide images (WSIs) of prostate needle core biopsy specimens stained with hematoxylin and eosin (H&E). High-resolution image patches of 256 × 256 pixels were extracted from the pyramidal WSIs at 20X magnification for in-depth analysis. The dataset was organized into training and validation sets for five-fold cross-validation. Pixel expansion and computed class weights were applied to handle class imbalance. Our proposed DARUN architecture combines dilated attention with a residual convolutional U-Net to improve the contextual understanding of feature maps. Extensive hyperparameter tuning was performed to optimize training efficiency. Model performance was evaluated using ensemble methods and a paired t-test. The DARUN models achieved an average Dice coefficient of 0.66 and an accuracy of 0.82 on unseen testing data. Additionally, we performed slide-level adenocarcinoma segmentation and grade scoring, with pathologist-verified segmentation predictions. An ablation study confirmed the model's generalization and robustness, achieving high Jaccard and Dice coefficients on a separate testing set. Although based on a limited dataset, this study suggests that the proposed methodology and DARUN models are a promising automated tool for early prostate cancer detection within existing clinical practice.
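The Dice coefficient, Jaccard coefficient and accuracy reported above are standard overlap and agreement measures for segmentation masks. The abstract does not state how they are averaged (per class, per patch, or over all pixels), so the following is only a minimal NumPy sketch of one common convention: per-class Dice and Jaccard on integer label masks plus overall pixel accuracy. It assumes masks have already been decoded to integer class indices in [0, num_classes); the function name `segmentation_scores` is hypothetical and is not part of the DARUN code.

```python
import numpy as np


def segmentation_scores(y_true, y_pred, num_classes, eps=1e-7):
    """Hypothetical helper: per-class Dice and Jaccard plus pixel accuracy.

    Assumes y_true and y_pred are integer label maps of identical shape with
    values in [0, num_classes). A class absent from both maps scores ~1.0
    because of the eps smoothing term.
    """
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    if y_true.shape != y_pred.shape:
        raise ValueError("y_true and y_pred must have the same shape")

    dice = np.zeros(num_classes)
    jaccard = np.zeros(num_classes)
    for c in range(num_classes):
        t = y_true == c  # ground-truth pixels of class c
        p = y_pred == c  # predicted pixels of class c
        inter = np.logical_and(t, p).sum()
        t_sum, p_sum = t.sum(), p.sum()
        # Dice = 2|T ∩ P| / (|T| + |P|)
        dice[c] = (2 * inter + eps) / (t_sum + p_sum + eps)
        # Jaccard = |T ∩ P| / |T ∪ P|, where |T ∪ P| = |T| + |P| - |T ∩ P|
        jaccard[c] = (inter + eps) / (t_sum + p_sum - inter + eps)

    # Fraction of pixels whose predicted label matches the ground truth.
    accuracy = float((y_true == y_pred).mean())
    return dice, jaccard, accuracy


# Toy 2 x 3 example with three classes.
y_true = np.array([[0, 0, 1], [1, 2, 2]])
y_pred = np.array([[0, 1, 1], [1, 2, 0]])
dice, jaccard, acc = segmentation_scores(y_true, y_pred, num_classes=3)
# dice ≈ [0.5, 0.8, 0.667], jaccard ≈ [0.333, 0.667, 0.5], acc ≈ 0.667
```

A single "average Dice" such as the 0.66 above could be the macro mean of the per-class values (`dice.mean()`) or a pixel-pooled score; the two generally differ, so the averaging rule should be matched to the paper before comparing numbers.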
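The abstract mentions computed class weights for handling class imbalance but does not give the formula. One common choice, shown here only as an assumed example, is inverse-frequency ("balanced") weighting, w_c = N / (C · n_c), where N is the total number of labelled pixels, C the number of classes and n_c the pixel count of class c. The helper name `balanced_class_weights` is hypothetical, and the sketch does not attempt the pixel-expansion step, which the abstract does not describe.

```python
import numpy as np


def balanced_class_weights(masks, num_classes):
    """Hypothetical helper: inverse-frequency weights w_c = N / (C * n_c).

    `masks` is an iterable of integer label maps with values in
    [0, num_classes). Classes with no pixels get weight 0.0 instead of
    causing a division by zero.
    """
    counts = np.zeros(num_classes, dtype=np.int64)
    for m in masks:
        m = np.asarray(m).ravel()
        if m.size and (m.min() < 0 or m.max() >= num_classes):
            raise ValueError("mask contains labels outside [0, num_classes)")
        # Per-class pixel counts for this mask, padded to num_classes bins.
        counts += np.bincount(m, minlength=num_classes)

    total = counts.sum()
    weights = np.zeros(num_classes, dtype=np.float64)
    present = counts > 0
    weights[present] = total / (num_classes * counts[present])
    return weights


# Toy example with three classes: pixel counts are 5, 1 and 2 (N = 8).
masks = [np.array([[0, 0, 0, 1]]), np.array([[0, 2, 2, 0]])]
weights = balanced_class_weights(masks, num_classes=3)
# weights ≈ [0.533, 2.667, 1.333]
```

Rare classes receive proportionally larger weights; such weights are typically passed to a weighted cross-entropy loss, but whether DARUN used this exact formula is not stated on this page.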

Instructions: 

The dataset contains 100 slides. Each slide directory contains image patches (image20X directory) of 256 × 256 pixels, stored as 8-bit RGB PNG files, and their corresponding ground-truth masks (mask20X directory). In addition, plot20X contains sanity-check plots of the images and masks. The directory structure of each slide is as follows:

<slide_id>
├── image20X
├── mask20X
└── plot20X
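For readers loading the data, the following minimal sketch (Python standard library only) walks the per-slide layout above and builds a slide-level five-fold split. It assumes that each mask in mask20X has the same file name as its image patch in image20X, and that every subdirectory of the dataset root is one slide; neither is stated on this page, so check both against the downloaded files. The function names `list_patch_pairs` and `slide_level_folds` are hypothetical. Splitting by slide rather than by patch keeps patches from the same WSI out of both the training and validation sets of a fold; the page does not say whether the published folds were defined this way.

```python
import random
from pathlib import Path


def list_patch_pairs(slide_dir):
    """Hypothetical helper: pair each PNG in image20X/ with mask20X/.

    ASSUMPTION: an image patch and its mask share the same file name.
    Images without a matching mask are skipped.
    """
    slide_dir = Path(slide_dir)
    image_dir = slide_dir / "image20X"
    mask_dir = slide_dir / "mask20X"
    pairs = []
    for img in sorted(image_dir.glob("*.png")):
        mask = mask_dir / img.name
        if mask.exists():
            pairs.append((img, mask))
    return pairs


def slide_level_folds(root, k=5, seed=0):
    """Hypothetical helper: split slide IDs (not patches) into k folds.

    ASSUMPTION: every subdirectory of `root` is one slide. Returns a list of
    (train_slide_ids, val_slide_ids) tuples, one per fold.
    """
    slides = sorted(p.name for p in Path(root).iterdir() if p.is_dir())
    random.Random(seed).shuffle(slides)  # reproducible shuffle
    folds = [slides[i::k] for i in range(k)]  # round-robin assignment
    splits = []
    for i in range(k):
        val = sorted(folds[i])
        train = sorted(s for j, f in enumerate(folds) if j != i for s in f)
        splits.append((train, val))
    return splits
```

For example, `slide_level_folds("/path/to/dataset")` with the 100 slides would yield five folds of 20 validation slides each, and `list_patch_pairs` can then be applied to every slide ID in a fold to collect its image and mask paths.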