The performance of Deep Learning (DL) segmentation algorithms is routinely determined using quantitative metrics like the Dice score and Hausdorff distance. However, these metrics show low concordance with human perception of segmentation quality. The successful collaboration of health care professionals with DL segmentation algorithms will require a detailed understanding of experts' assessment of segmentation quality. Here, we present the results of a study on expert quality perception of brain tumor segmentations generated by a DL algorithm on brain MR images. Eight expert medical professionals were asked to grade the quality of segmentations on a scale from 1 (worst) to 4 (best). To this end, we collected four ratings for a dataset of 60 cases. We observed low inter-rater agreement among all raters (Krippendorff's alpha: 0.34), potentially a result of different internal cutoffs for the quality ratings. Several factors, including the volume of the segmentation and model uncertainty, were associated with high disagreement between raters. Furthermore, the correlations between the ratings and commonly used quantitative segmentation quality metrics ranged from none to moderate. We conclude that, similar to the inter-rater variability observed for manual brain tumor segmentation, segmentation quality ratings are prone to variability due to the ambiguity of tumor boundaries and individual perceptual differences. Clearer guidelines for quality evaluation could help to mitigate these differences. Importantly, existing technical metrics do not capture clinical perception of segmentation quality. A better understanding of expert quality perception is expected to support the design of more human-centered DL algorithms for integration into the clinical workflow.
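As an illustration of the type of agreement analysis described above, the sketch below computes Krippendorff's alpha for ordinal ratings and a rank correlation between mean ratings and Dice scores. The data, the third-party `krippendorff` package, and all variable names are assumptions for illustration, not the study's actual analysis code.

```python
# Minimal sketch (not the study's analysis code): inter-rater agreement on
# ordinal quality ratings and rank correlation with a quantitative metric.
import numpy as np
import krippendorff                      # pip install krippendorff
from scipy.stats import spearmanr

# Hypothetical data: 4 ratings per case for 60 cases, on the 1 (worst) to 4 (best) scale.
rng = np.random.default_rng(0)
ratings = rng.integers(1, 5, size=(4, 60)).astype(float)   # rows = raters, columns = cases
dice_scores = rng.uniform(0.5, 1.0, size=60)                # one Dice score per case

# Krippendorff's alpha for ordinal data (missing ratings could be np.nan).
alpha = krippendorff.alpha(reliability_data=ratings, level_of_measurement="ordinal")

# Correlation between the mean expert rating and the quantitative metric.
rho, p_value = spearmanr(ratings.mean(axis=0), dice_scores)

print(f"Krippendorff's alpha: {alpha:.2f}, Spearman rho vs. Dice: {rho:.2f} (p={p_value:.3f})")
```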
The communication of reliable uncertainty estimates is crucial in the effort towards increasing trust in Deep Learning applications for medical image analysis. Importantly, reliable uncertainty estimates should remain stable under naturally occurring domain shifts. In this study, we evaluate the relationship between epistemic uncertainty and segmentation quality under domain shift within two clinical contexts: optic disc segmentation in retinal photographs and brain tumor segmentation from multi-modal brain MRI. Specifically, we assess the behavior of two epistemic uncertainty metrics derived from (i) a single UNet's sigmoid predictions, (ii) deep ensembles, and (iii) Monte Carlo dropout UNets, each trained with both soft Dice and weighted cross-entropy loss. Domain shifts were modeled by excluding a group with a known characteristic (glaucoma for optic disc segmentation and low-grade glioma for brain tumor segmentation) from model development and using the excluded data as additional, domain-shifted test data. While the performance of all models dropped slightly on the domain-shifted test data compared to the in-domain test set, there was no change in the Pearson correlation coefficient between the uncertainty metrics and the Dice scores of the segmentations. However, we did observe differences between the segmentation tasks in the performance of two quality assessment applications based on epistemic uncertainty. We introduce a new metric, the empirical strength distribution, to better describe the strength of the relationship between segmentation performance and epistemic uncertainty at the dataset level. We found that failures of the studied quality assessment applications were largely caused by shifts in the empirical strength distributions between the training, in-domain, and domain-shifted test datasets. In conclusion, quality assessment tools based on the strong relationship between epistemic uncertainty and segmentation quality can be stable under small domain shifts. Developers should thoroughly evaluate these strength relationships for all available data and, if possible, under domain shift to ensure the validity of uncertainty estimates on unseen data.
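The sketch below shows one common way image-level epistemic uncertainty can be derived from the per-voxel probabilities of a deep ensemble or repeated MC-dropout forward passes and related to Dice scores, assuming a binary segmentation task. Function names, shapes, and the evaluation loop are illustrative assumptions; the paper's empirical strength distribution is not reproduced here.

```python
# Illustrative sketch: image-level epistemic uncertainty from ensemble /
# MC-dropout foreground probability maps, and its correlation with Dice.
import numpy as np
from scipy.stats import pearsonr

def mean_foreground_prob(prob_maps):
    """Average per-voxel foreground probabilities over ensemble members / MC samples."""
    return np.mean(prob_maps, axis=0)

def image_level_entropy(prob_maps, eps=1e-7):
    """Mean voxel-wise binary entropy of the averaged prediction (uncertainty proxy)."""
    p = np.clip(mean_foreground_prob(prob_maps), eps, 1.0 - eps)
    voxel_entropy = -(p * np.log(p) + (1.0 - p) * np.log(1.0 - p))
    return float(voxel_entropy.mean())

def dice_score(pred_mask, gt_mask, eps=1e-7):
    intersection = np.logical_and(pred_mask, gt_mask).sum()
    return (2.0 * intersection + eps) / (pred_mask.sum() + gt_mask.sum() + eps)

# Hypothetical evaluation loop over a test set:
# uncertainties, dices = [], []
# for image, gt in test_set:
#     prob_maps = np.stack([m.predict(image) for m in models])   # (n_members, H, W, D)
#     uncertainties.append(image_level_entropy(prob_maps))
#     dices.append(dice_score(mean_foreground_prob(prob_maps) > 0.5, gt))
# r, _ = pearsonr(uncertainties, dices)   # correlation of the kind reported above
```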
Including uncertainty information in the assessment of a segmentation of pathologic structures on medical images offers the potential to increase trust in deep learning algorithms for medical image analysis. Here, we examine options for extracting uncertainty information from deep learning segmentation models and the influence of the choice of cost function on these uncertainty measures. To this end, we train conventional UNets without dropout, deep UNet ensembles, and Monte Carlo (MC) dropout UNets to segment lung nodules on low-dose CT using either soft Dice or weighted categorical cross-entropy (wcc) as the loss function. We extract voxel-wise uncertainty information from UNet models based on softmax maximum probability, and from deep ensembles and MC dropout UNets using mean voxel-wise entropy. Upon visual assessment, areas of high uncertainty are localized in the periphery of segmentations and are in good agreement with incorrectly labelled voxels. Furthermore, we evaluate how well uncertainty measures correlate with segmentation quality (Dice score). Mean uncertainty over the segmented region (U_labelled) derived from conventional UNet models does not show a strong quantitative relationship with the Dice score (Spearman correlation coefficients of -0.45 and -0.64 for the soft Dice and wcc models, respectively). By comparison, image-level uncertainty measures derived from both soft Dice and wcc MC dropout UNet and deep UNet ensemble models correlate well with the Dice score. In conclusion, using uncertainty information offers a way to assess segmentation quality fully automatically, without access to ground truth. Models trained using weighted categorical cross-entropy offer more meaningful uncertainty information at the voxel level.
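The sketch below illustrates the single-UNet case under stated assumptions: `probs` holds a softmax output of shape (n_classes, ...), voxel-wise uncertainty is taken as one minus the maximum class probability, and U_labelled is the mean uncertainty over the predicted segmentation. All names are hypothetical and class 0 is assumed to be background.

```python
# Minimal sketch: voxel-wise uncertainty and U_labelled from a single UNet's
# softmax output for one CT volume; not the authors' implementation.
import numpy as np
from scipy.stats import spearmanr

def voxelwise_uncertainty_from_softmax(probs):
    """1 - maximum class probability per voxel (high where the model is unsure)."""
    return 1.0 - probs.max(axis=0)

def u_labelled(probs):
    """Mean uncertainty over the voxels the model labels as nodule (the segmented region)."""
    uncertainty = voxelwise_uncertainty_from_softmax(probs)
    segmentation = probs.argmax(axis=0) > 0          # class 0 assumed to be background
    return float(uncertainty[segmentation].mean()) if segmentation.any() else 0.0

# Hypothetical relationship check against ground-truth Dice scores:
# rho, _ = spearmanr([u_labelled(p) for p in all_probs], dice_scores)
```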
Several digital reference objects (DROs) for DCE-MRI have been created to test the accuracy of pharmacokinetic modeling software under a variety of different noise conditions. However, there are few DROs that mimic the anatomical distribution of voxels found in real data, and similarly few DROs that are based on both malignant and normal tissue. We propose a series of DROs for modeling Ktrans and Ve derived from a publicly available RIDER DCE-MRI dataset of 19 patients with gliomas. For each patient's DCE-MRI data, we generate Ktrans and Ve parameter maps using an algorithm validated on the QIBA Tofts model phantoms. These parameter maps are denoised and then used to generate noiseless time-intensity curves for each of the original voxels. This is accomplished by reversing the Tofts model to generate concentration-time curves from Ktrans and Ve inputs, and subsequently converting those curves into intensity values by normalizing to each patient's average pre-bolus image intensity. The result is a noiseless DRO in the shape of the original patient data with known ground-truth Ktrans and Ve values. We make this dataset publicly available for download for all 19 patients of the original RIDER dataset.
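For context, the standard Tofts model underlying this forward step is C_t(t) = Ktrans * integral_0^t Cp(tau) exp(-(Ktrans/Ve)(t - tau)) dtau. The sketch below evaluates it numerically under simplifying assumptions (a toy arterial input function and a linear concentration-to-intensity scaling relative to the pre-bolus baseline); it is not the published DRO generation code.

```python
# Sketch of the standard Tofts model used to generate a noiseless
# concentration-time curve from given Ktrans and Ve values.
import numpy as np

def tofts_concentration(t, cp, ktrans, ve):
    """C_t(t) = Ktrans * integral_0^t Cp(tau) * exp(-(Ktrans/Ve)(t - tau)) dtau."""
    kep = ktrans / ve
    dt = t[1] - t[0]                       # assumes a uniform time grid
    ct = np.zeros_like(t)
    for i, ti in enumerate(t):
        kernel = np.exp(-kep * (ti - t[: i + 1]))
        ct[i] = ktrans * np.trapz(cp[: i + 1] * kernel, dx=dt)
    return ct

# Hypothetical voxel: Ktrans = 0.25 /min, Ve = 0.4, simple bolus-shaped AIF.
t = np.linspace(0, 5, 60)                                  # minutes
cp = 5.0 * (t / 0.5) * np.exp(1 - t / 0.5)                 # toy arterial input function
ct = tofts_concentration(t, cp, ktrans=0.25, ve=0.4)

# Convert concentration to image intensity by scaling relative to the
# pre-bolus baseline (linear relationship assumed only for this sketch).
baseline_intensity, scale = 300.0, 100.0
signal = baseline_intensity * (1.0 + ct / scale)
```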
Retinopathy of prematurity (ROP) is a disease that affects premature infants, in which abnormal growth of the retinal blood vessels can lead to blindness unless treated appropriately. Infants considered at risk of severe ROP are monitored for symptoms of plus disease, characterized by arterial tortuosity and venous dilation at the posterior pole, with a standard photographic definition. Disagreement among ROP experts in diagnosing plus disease has driven the development of computer-based methods that classify images based on hand-crafted features extracted from the vasculature. However, most of these approaches are semi-automated, making them time-consuming and subject to variability. In contrast, deep learning is a fully automated approach that has shown great promise in a wide variety of domains, including medical genetics, informatics, and imaging. Convolutional neural networks (CNNs) are deep networks that learn rich representations of disease features that are highly robust to variations in acquisition and image quality. In this study, we used a U-Net architecture to perform vessel segmentation, followed by a GoogLeNet to perform disease classification. The classifier was trained on 3,000 retinal images and validated on an independent test set of patients with different observed progressions and treatments. We show that our fully automated algorithm can be used to monitor the progression of plus disease over multiple patient visits, with results that are consistent with the experts' consensus diagnosis. Future work will aim to further validate the method on larger cohorts of patients to assess its applicability within the clinic as a treatment-monitoring tool.
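A hedged sketch of the two-stage inference pipeline described above follows, assuming pretrained `vessel_unet` and `plus_classifier` PyTorch modules are loaded elsewhere; the preprocessing and the choice to feed the classifier the vessel map alone (rather than combined with the raw image) are assumptions of this sketch, not details confirmed by the abstract.

```python
# Illustrative two-stage pipeline: U-Net vessel segmentation followed by
# GoogLeNet-style plus-disease classification (not the authors' released code).
import torch

@torch.no_grad()
def classify_plus_disease(retinal_image, vessel_unet, plus_classifier):
    """retinal_image: float tensor of shape (1, 3, H, W), values in [0, 1]."""
    vessel_prob = torch.sigmoid(vessel_unet(retinal_image))      # (1, 1, H, W) vessel map
    # Feed the vessel map (repeated to 3 channels) to the classifier, so the
    # decision is driven by vascular morphology rather than raw pixel appearance.
    logits = plus_classifier(vessel_prob.repeat(1, 3, 1, 1))      # (1, n_classes)
    return torch.softmax(logits, dim=1)                           # e.g. [normal, pre-plus, plus]
```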
In the last five years, advances in processing power and computational efficiency in graphics processing units have catalyzed dozens of deep neural network segmentation algorithms for a variety of target tissues and malignancies. However, few of these algorithms build in any biological context of the tissues they segment, instead relying on the neural network's optimizer to develop such associations de novo. We present a novel method for applying deep neural networks to the problem of glioma tissue segmentation that takes into account the structured nature of gliomas: edematous tissue surrounding mutually exclusive regions of enhancing and non-enhancing tumor. We trained separate deep neural networks with a 3D U-Net architecture in a tree structure to create segmentations for edema, non-enhancing tumor, and enhancing tumor regions. Specifically, training was configured such that the whole tumor region, including edema, was predicted first, and its output segmentation was fed as input into separate models to predict enhancing and non-enhancing tumor. We trained our model on publicly available pre- and post-contrast T1 images, T2 images, and FLAIR images, and validated the trained model on patient data from an ongoing clinical trial.
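One possible reading of the tree-structured cascade is sketched below; concatenating the whole-tumor mask with the MR channels is one plausible way the output segmentation could be fed as input to the downstream models, and the code is illustrative rather than the authors' implementation.

```python
# Sketch of tree-structured inference: whole tumor first, then enhancing and
# non-enhancing tumor conditioned on the whole-tumor mask (illustrative only).
import torch

@torch.no_grad()
def cascaded_glioma_segmentation(mri, whole_tumor_net, enhancing_net, nonenhancing_net):
    """mri: tensor of shape (1, 4, D, H, W) holding pre/post-contrast T1, T2, FLAIR."""
    whole_tumor = (torch.sigmoid(whole_tumor_net(mri)) > 0.5).float()      # (1, 1, D, H, W)
    conditioned = torch.cat([mri, whole_tumor], dim=1)                     # (1, 5, D, H, W)
    enhancing = torch.sigmoid(enhancing_net(conditioned)) > 0.5
    nonenhancing = torch.sigmoid(nonenhancing_net(conditioned)) > 0.5
    # Enforce mutual exclusivity of enhancing and non-enhancing tumor inside the
    # whole-tumor region; remaining whole-tumor voxels are treated as edema.
    nonenhancing = nonenhancing & ~enhancing
    edema = whole_tumor.bool() & ~enhancing & ~nonenhancing
    return edema, nonenhancing, enhancing
```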