Purpose: Deep learning is a promising technique for spleen segmentation. Our study aims to validate the reproducibility of deep learning-based spleen volume estimation by performing spleen segmentation on clinically acquired computed tomography (CT) scans from patients with myeloproliferative neoplasms.
Approach: As approved by the institutional review board, we obtained 138 de-identified abdominal CT scans. The sum of voxel volumes within an expert annotator's segmentations establishes the ground truth (estimation 1). We used our deep convolutional neural network (estimation 2) alongside traditional linear estimations (estimations 3 and 4) to estimate spleen volumes independently. The Dice coefficient, Hausdorff distance, R² coefficient, Pearson R coefficient, absolute difference in volume, and relative difference in volume were calculated for estimations 2 to 4 against the ground truth to compare and assess the methods' performance. We performed scan–rescan relabeling on a subset of 40 studies to evaluate method reproducibility.
Results: Calculated against the ground truth, the R² coefficients for our method (estimation 2) and the linear methods (estimations 3 and 4) are 0.998, 0.954, and 0.973, respectively. The Pearson R coefficients for these estimations against the ground truth are 0.999, 0.963, and 0.978, respectively (paired t-tests produced p < 0.05 between estimations 2 and 3, and between estimations 2 and 4).
Conclusion: The deep convolutional neural network algorithm shows excellent potential for producing more precise spleen volume estimates. Our computer-aided segmentation exhibits reasonable improvements in splenic volume estimation accuracy.
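The volume and agreement measures listed in the Approach can be sketched as follows. This is a minimal illustration, not the study's code: it assumes binary NumPy masks, voxel spacing in millimetres, and volumes reported in millilitres, and the function names (`mask_volume_ml`, `dice`, `volume_agreement`) are hypothetical. R² is computed here as the coefficient of determination of the estimates taken as predictions of the ground truth, which is only one common definition; Hausdorff distance and the paired t-test are omitted.

```python
import numpy as np


def mask_volume_ml(mask: np.ndarray, spacing_mm: tuple) -> float:
    """Spleen volume as voxel count times voxel volume, in millilitres."""
    voxel_mm3 = float(np.prod(spacing_mm))  # e.g. (0.8, 0.8, 2.0) mm -> 1.28 mm^3
    return np.count_nonzero(mask) * voxel_mm3 / 1000.0  # 1 mL = 1000 mm^3


def dice(a: np.ndarray, b: np.ndarray) -> float:
    """Dice overlap between two binary masks of the same shape."""
    a, b = a.astype(bool), b.astype(bool)
    denom = a.sum() + b.sum()
    if denom == 0:
        return 1.0  # both masks empty: treat as perfect agreement
    return float(2.0 * np.logical_and(a, b).sum() / denom)


def volume_agreement(estimated, truth) -> dict:
    """Absolute/relative volume differences, Pearson r and R^2 against truth."""
    est = np.asarray(estimated, dtype=float)
    ref = np.asarray(truth, dtype=float)
    abs_diff = np.abs(est - ref)
    rel_diff = abs_diff / ref
    r = float(np.corrcoef(est, ref)[0, 1])
    # Assumed definition: R^2 of the estimates against the identity line.
    r2 = float(1.0 - np.sum((ref - est) ** 2) / np.sum((ref - ref.mean()) ** 2))
    return {"abs_diff": abs_diff, "rel_diff": rel_diff, "pearson_r": r, "r2": r2}
```

Because Pearson r ignores systematic offsets while the identity-line R² penalizes them, reporting both (as the abstract does) separates correlation from absolute agreement.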
Purpose: Deep learning methods have become essential tools for quantitative interpretation of medical imaging data, but training these approaches is highly sensitive to biases and class imbalance in the available data. There is an opportunity to increase the available training data by combining across different data sources (e.g., distinct public projects); however, data collected under different scopes tend to have differences in class balance, label availability, and subject demographics. Recent work has shown that importance sampling can be used to guide training selection. To date, these approaches have not considered imbalanced data sources with distinct labeling protocols.
Approach: We propose a sampling policy, known as adaptive stochastic policy (ASP), inspired by reinforcement learning to adapt training based on subject, data source, and dynamic use criteria. We apply ASP in the context of multiorgan abdominal computed tomography segmentation. Training was performed with cross-validation on 840 subjects from 10 data sources. External validation was performed with 20 subjects from 1 data source.
Results: Four alternative strategies were evaluated, with the state-of-the-art upper confidence bound (UCB) method as the baseline. ASP achieved an average Dice of 0.8261 compared with 0.8135 for UCB (p < 0.01, paired t-test) across fivefold cross-validation. On withheld testing datasets, the proposed ASP achieved a mean Dice of 0.8265 versus 0.8077 for UCB (p < 0.01, paired t-test).
Conclusions: ASP provides a flexible reweighting technique for training deep learning models. We conclude that the proposed method adapts sample importance, which improves performance on a challenging multisite, multiorgan, and multisize segmentation task.
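For context on the UCB baseline, the sketch below shows the classic UCB1 rule applied to choosing which data source to draw the next training batch from. It is not ASP, whose policy details are not given here. The class name `UCB1SourceSampler` is hypothetical, and the reward is assumed to be a scalar in [0, 1] fed back after each step (for example, a per-batch loss reduction); how the original baseline defines reward is not stated in the abstract.

```python
import math


class UCB1SourceSampler:
    """Classic UCB1 bandit over data sources (illustrative baseline, not ASP)."""

    def __init__(self, n_sources: int, c: float = math.sqrt(2)):
        self.counts = [0] * n_sources    # times each source was sampled
        self.values = [0.0] * n_sources  # running mean reward per source
        self.c = c                       # exploration weight; sqrt(2) gives UCB1
        self.t = 0                       # total number of updates so far

    def select(self) -> int:
        # Sample every source once before relying on the confidence bound.
        for i, n in enumerate(self.counts):
            if n == 0:
                return i
        ucb = [
            v + self.c * math.sqrt(math.log(self.t) / n)
            for v, n in zip(self.values, self.counts)
        ]
        return max(range(len(ucb)), key=ucb.__getitem__)

    def update(self, source: int, reward: float) -> None:
        self.t += 1
        self.counts[source] += 1
        # Incremental mean update avoids storing the full reward history.
        self.values[source] += (reward - self.values[source]) / self.counts[source]
```

In a training loop, one would call `select()` to pick a source, train on a batch from it, then call `update()` with the observed reward. The exploration bonus keeps under-sampled sources in play, which is the behavior an adaptive policy such as ASP refines with subject-level and labeling-protocol criteria.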
Dynamic contrast enhanced computed tomography (CT) is an imaging technique that provides critical information on the relationship of vascular structure and dynamics in the context of underlying anatomy. A key challenge for image processing with contrast enhanced CT is that phase discrepancies are latent in different tissues due to contrast protocols, vascular dynamics, and metabolic variance. Previous studies have proposed deep learning frameworks for classifying contrast enhancement with networks inspired by computer vision. Here, we revisit the challenge in the context of whole-abdomen contrast enhanced CT. To capture and compensate for the complex contrast changes, we propose a novel discriminator in the form of a multi-domain disentangled representation learning network. The goal of this network is to learn an intermediate representation that separates contrast enhancement from anatomy and enables classification of images with varying contrast time. Briefly, our unpaired contrast disentangling GAN (CD-GAN) discriminator follows the ResNet architecture to classify a CT scan into one of the enhancement phases. To evaluate the approach, we trained the enhancement phase classifier on 21060 slices from two clinical cohorts of 230 subjects. The scans were manually labeled with three independent enhancement phases (non-contrast, portal venous, and delayed). Testing was performed on 9100 slices from 30 independent subjects who had been imaged with CT scans in all contrast phases. Performance was quantified in terms of the multi-class normalized confusion matrix. The proposed network achieved an accuracy of 0.91, a significant improvement over the baseline UNet (0.54), ResNet50 (0.55), and StarGAN (0.62) (p < 0.0001, paired t-test for ResNet versus CD-GAN). The proposed discriminator from the disentangled network presents a promising technique that may allow deeper modeling of dynamic imaging against patient-specific anatomies.
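A multi-class normalized confusion matrix for the three enhancement phases can be computed as below. This sketch assumes integer phase codes 0, 1, 2 in the order listed in the abstract and row normalization (each row sums to 1 over the predictions for one true phase, so the diagonal is per-phase recall); the abstract does not state which normalization was used, and the function names are hypothetical.

```python
import numpy as np

PHASES = ("non-contrast", "portal venous", "delayed")  # assumed code order 0, 1, 2


def normalized_confusion_matrix(y_true, y_pred, n_classes=len(PHASES)):
    """Row-normalized confusion matrix: rows are true phases, columns predictions."""
    cm = np.zeros((n_classes, n_classes), dtype=float)
    # Unbuffered accumulation so repeated (true, pred) pairs are all counted.
    np.add.at(cm, (np.asarray(y_true), np.asarray(y_pred)), 1.0)
    row_sums = cm.sum(axis=1, keepdims=True)
    # Rows with no samples stay all-zero instead of dividing by zero.
    return np.divide(cm, row_sums, out=np.zeros_like(cm), where=row_sums > 0)


def accuracy(y_true, y_pred) -> float:
    """Fraction of slices whose predicted phase matches the label."""
    return float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))
```

Row normalization keeps each phase's error pattern comparable even when the test slices are unevenly distributed across phases.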
Segmentation of abdominal computed tomography (CT) provides spatial context, morphological properties, and a framework for tissue-specific radiomics to guide quantitative radiological assessment. A 2015 MICCAI challenge spurred substantial innovation in multi-organ abdominal CT segmentation with both traditional and deep learning methods. Recent innovations in deep methods have driven performance toward levels at which clinical translation is appealing. However, continued cross-validation on open datasets presents the risk of indirect knowledge contamination and could result in circular reasoning. Moreover, "real world" segmentation can be challenging due to the wide variability of abdominal physiology among patients. Herein, we perform two data retrievals to capture clinically acquired, deidentified abdominal CT cohorts for evaluating a recently published variation on 3D U-Net (the baseline algorithm). First, we retrieved 2004 deidentified studies on 476 patients with diagnosis codes involving spleen abnormalities (cohort A). Second, we retrieved 4313 deidentified studies on 1754 patients without diagnosis codes involving spleen abnormalities (cohort B). We performed a prospective evaluation of the existing algorithm on both cohorts, yielding failure rates of 13% and 8%, respectively. Then, we identified 51 subjects in cohort A with segmentation failures and manually corrected the liver and gallbladder labels. We re-trained the model with the added manual labels, which reduced the failure rates to 9% and 6% for cohorts A and B, respectively. In summary, the performance of the baseline on the prospective cohorts was similar to that on previously published datasets. Moreover, adding data from the first cohort substantively improved performance when evaluated on the second, withheld validation cohort.
Abdominal multi-organ segmentation of computed tomography (CT) images has been the subject of extensive research interest. It presents a substantial challenge in medical image processing, as the shape and distribution of abdominal organs can vary greatly across the population and within an individual over time. While continuous integration of novel datasets into the training set offers the potential for better segmentation performance, collecting data at scale is not only costly but also impractical in some contexts. Moreover, it remains unclear what marginal value additional data have to offer. Herein, we propose a single-pass active learning method through human quality assurance (QA). We built on a pre-trained 3D U-Net model for abdominal multi-organ segmentation and augmented the dataset with either outliers (i.e., exemplars for which the baseline algorithm failed) or inliers (i.e., exemplars for which the baseline algorithm worked). The new models were trained using the augmented datasets with 5-fold cross-validation (for outlier data) and withheld outlier samples (for inlier data). Manual labeling of outliers increased Dice scores on outliers by 0.130, compared with an increase of 0.067 with inliers (p < 0.001, two-tailed paired t-test). By adding 5 to 37 inliers or outliers to training, we find that the marginal value of adding outliers is higher than that of adding inliers. In summary, improvement in single-organ performance was obtained without diminishing multi-organ performance or significantly increasing training time. Hence, identifying and correcting baseline failure cases is an effective and efficient way of selecting training data to improve algorithm performance.
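The QA-driven selection step and the paired comparison can be sketched as follows. This is a hypothetical illustration under stated assumptions: each case carries a binary human QA verdict on the baseline output, the first `k` cases from the chosen pool are sent for manual labeling, and per-case Dice before and after retraining are compared with a paired t statistic (the p-value would come from a t distribution with n − 1 degrees of freedom, not computed here). `QACase`, `augment_training_set`, and `paired_t_statistic` are illustrative names, not from the study.

```python
from dataclasses import dataclass
from math import sqrt
from statistics import mean, stdev


@dataclass
class QACase:
    case_id: str
    baseline_ok: bool  # human QA verdict on the baseline segmentation


def augment_training_set(train_ids, qa_cases, k, use_outliers=True):
    """Return training IDs plus k QA-selected cases to label manually."""
    if use_outliers:
        pool = [c for c in qa_cases if not c.baseline_ok]  # baseline failed
    else:
        pool = [c for c in qa_cases if c.baseline_ok]      # baseline worked
    added = [c.case_id for c in pool[:k]]
    return list(train_ids) + added, added


def paired_t_statistic(before, after):
    """t = mean(d) / (sd(d) / sqrt(n)) for per-case differences d = after - before."""
    d = [a - b for a, b in zip(after, before)]
    return mean(d) / (stdev(d) / sqrt(len(d)))
```

Repeating the selection for increasing `k` (the abstract uses 5 to 37 cases) and comparing the Dice gain per added case for outliers versus inliers is one way to estimate the marginal value of each pool.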
Human-in-the-loop quality assurance (QA) is typically performed after medical image segmentation to ensure that systems are performing as intended and to identify and exclude outliers. By performing QA on large-scale, previously unlabeled testing data, categorical QA scores (e.g., "successful" versus "unsuccessful") can be generated. Unfortunately, human-in-the-loop QA scores, despite the resources spent to obtain them, are not typically reused in medical image machine learning, especially to train a deep neural network for image segmentation. Herein, we perform a pilot study to investigate whether QA labels can be used as supplementary supervision to augment the training process in a semi-supervised fashion. We propose a semi-supervised multi-organ segmentation deep neural network consisting of a traditional segmentation model as the generator and a QA-informed discriminator. An existing 3-D abdominal segmentation network is employed as the generator, while a pre-trained ResNet-18 network is used as the discriminator. A large-scale dataset of 2027 volumes is used to train the generator, whose 2-D montage images and segmentation masks with QA scores are used to train the discriminator. To generate the QA scores, the 2-D montage images were reviewed manually and coded 0 (success), 1 (errors consistent with published performance), or 2 (gross failure). The ResNet-18 network was then trained with 1623 montage images equally distributed across all three code labels and achieved an accuracy of 94% for classification predictions on 404 montage images withheld as the test cohort. To assess the benefit of QA supervision, the discriminator was used as a loss function in a multi-organ segmentation pipeline. Including the QA loss function boosted performance on the unlabeled test dataset from 714 patients to 951 patients relative to the baseline model. Additionally, the number of failures decreased from 606 (29.90%) to 402 (19.83%).
The contributions of the proposed method are threefold. We show that (1) QA scores can be used as a loss function to perform semi-supervised learning on unlabeled data, (2) a well-trained discriminator can be learned from QA scores rather than traditional "true/false" labels, and (3) multi-organ segmentation on unlabeled datasets can be fine-tuned to be more robust and more accurate than the original baseline method. The use of QA-inspired loss functions represents a promising area of future research and may permit tighter integration of supervised and semi-supervised learning.
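One way to turn a three-class QA discriminator into a loss is to penalize the negative log-probability that it assigns to code 0 (success). The NumPy sketch below shows only that arithmetic under this assumption; the abstract does not give the exact formulation, and in practice the loss would be computed in a differentiable framework so gradients flow back to the segmentation generator. `qa_loss` and `failure_rate_percent` are hypothetical names; the latter reproduces the failure percentages reported above from the counts and the 2027-volume total.

```python
import numpy as np

QA_SUCCESS, QA_MINOR_ERRORS, QA_GROSS_FAILURE = 0, 1, 2  # QA codes 0, 1, 2


def softmax(logits: np.ndarray) -> np.ndarray:
    z = logits - logits.max(axis=-1, keepdims=True)  # subtract max for stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def qa_loss(disc_logits) -> float:
    """Mean -log P(QA code 0) assigned by the discriminator to each montage."""
    p_success = softmax(np.asarray(disc_logits, dtype=float))[..., QA_SUCCESS]
    return float(-np.mean(np.log(p_success + 1e-12)))  # epsilon guards log(0)


def failure_rate_percent(n_failures: int, n_total: int) -> float:
    """Failure count as a percentage of all volumes, rounded to 2 decimals."""
    return round(100.0 * n_failures / n_total, 2)
```

For example, `failure_rate_percent(606, 2027)` gives 29.9 and `failure_rate_percent(402, 2027)` gives 19.83, matching the reported 29.90% and 19.83%.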
Splenomegaly segmentation on abdominal computed tomography (CT) scans is essential for identifying spleen biomarkers and has applications for quantitative assessment in patients with liver and spleen disease. Automated segmentation with deep convolutional neural networks has shown promising performance for splenomegaly segmentation. However, manual labeling of abdominal structures is resource intensive, so labeled abdominal imaging data are a rare resource despite their essential role in algorithm training. Hence, the number of annotated labels (e.g., spleen only) is typically limited within a single study. However, with the development of data sharing techniques, an increasing number of labeled cohorts are publicly available from different sources. A key new challenge is to co-learn from multi-source data, even when each study labels a different number of abdominal organs. Thus, it is appealing to design a co-learning strategy to train a deep network from heterogeneously labeled scans. In this paper, we propose a new deep convolutional neural network (DCNN) based method that integrates heterogeneous multi-source labeled cohorts for splenomegaly segmentation. To enable the proposed approach, a novel loss function based on the Dice similarity coefficient is introduced to adaptively learn multi-organ information from different sources. Three cohorts were employed in our experiments: the first training cohort (98 CT scans) has only splenomegaly labels, while the second training cohort (100 CT scans) has 15 distinct anatomical labels with normal spleens. A separate, independent cohort of 19 splenomegaly CT scans with labeled spleens was used as the testing cohort.
The proposed method achieved the highest median Dice similarity coefficient (0.94), which is superior (p < 0.01 against each other method) to the baselines of multi-atlas segmentation (0.86), SS-Net segmentation with only spleen labels (0.90), and U-Net segmentation with multi-organ training (0.91). Our approach for adapting the loss function and training structure is not specific to the abdominal context and may be beneficial in other situations where datasets with varied label sets are available.
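A common way to train on cohorts with different label sets is to compute the soft Dice loss only over the organ channels that a scan's source actually annotates, so a spleen-only scan does not penalize predictions for the other organs. The sketch below shows that masking idea. It is an assumption-labeled illustration, not the published loss: the channel layout (one probability map and one binary target per organ), the `labeled` flags, and the name `masked_dice_loss` are hypothetical.

```python
import numpy as np


def masked_dice_loss(probs, target, labeled, eps=1e-6):
    """1 - mean soft Dice over the classes annotated in this scan's source.

    probs:   (C, ...) predicted per-class probabilities
    target:  (C, ...) binary ground-truth mask per class
    labeled: (C,) bool, True for classes the source cohort labels
    """
    probs = np.asarray(probs, dtype=float)
    target = np.asarray(target, dtype=float)
    labeled = np.asarray(labeled, dtype=bool)
    if not labeled.any():
        return 0.0  # nothing annotated: this scan contributes no Dice loss
    axes = tuple(range(1, probs.ndim))  # sum over spatial dimensions only
    inter = (probs * target).sum(axis=axes)
    denom = probs.sum(axis=axes) + target.sum(axis=axes)
    dice = (2.0 * inter + eps) / (denom + eps)  # per-class soft Dice
    return float(1.0 - dice[labeled].mean())
```

With this masking, a spleen-only scan from the first cohort contributes a spleen Dice term, while a 15-label scan from the second cohort contributes terms for every annotated organ, so both cohorts can share one network.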