Deep convolutional neural networks (CNNs) have in recent years achieved record-breaking performance on many image classification tasks and are therefore well suited for computer-aided detection (CAD). The need for uncertainty quantification in CAD motivates a probabilistic framework for deep learning. The best-known probabilistic neural network model is the Bayesian neural network (BNN), but BNNs are notoriously difficult to sample from for large, complex network architectures, so their use has been restricted to small problems. It is known that a BNN converges to a Gaussian process (GP) as its width increases toward infinity, and these infinitely wide BNNs have attracted considerable research interest. Recently, this classic result was extended to deep architectures in what is termed the neural network Gaussian process (NNGP) model. In this work, we implement an NNGP model and apply it to the ChestXRay14 dataset at the full resolution of 1024x1024 pixels. Even without any convolutional structure in the network architecture and without any data augmentation, our five-layer-deep NNGP model outperforms other non-convolutional models, narrowing the performance gap between non-convolutional and convolutional models. The NNGP model is fully Bayesian and therefore offers uncertainty information through its predictive variance, which can be used to formulate a predictive confidence measure. We show that the performance of the NNGP model improves significantly after low-confidence predictions are rejected, suggesting that convolution is most beneficial only for these low-confidence examples. Finally, our results indicate that an extremely large fully connected neural network with appropriate regularization could perform as well as the NNGP, were it not for the computational bottleneck posed by its large number of parameters.
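The NNGP replaces the finite network with its infinite-width kernel, computed layer by layer. As an illustration only (the hyperparameters `sigma_w2`, `sigma_b2`, and `depth` below are assumptions, not the paper's values), a minimal sketch of the ReLU arc-cosine kernel recursion for a deep fully connected network:

```python
import math

def nngp_kernel(x, y, depth=5, sigma_w2=1.0, sigma_b2=0.1):
    """NNGP kernel of a depth-layer fully connected ReLU network,
    computed by the standard arc-cosine recursion (illustrative sketch)."""
    d = len(x)
    # base-case kernel of the input layer
    kxx = sigma_b2 + sigma_w2 * sum(a * a for a in x) / d
    kyy = sigma_b2 + sigma_w2 * sum(a * a for a in y) / d
    kxy = sigma_b2 + sigma_w2 * sum(a * b for a, b in zip(x, y)) / d
    for _ in range(depth):
        c = max(-1.0, min(1.0, kxy / math.sqrt(kxx * kyy)))
        theta = math.acos(c)
        # ReLU (order-1 arc-cosine) cross term
        kxy = sigma_b2 + sigma_w2 / (2 * math.pi) * math.sqrt(kxx * kyy) * (
            math.sin(theta) + (math.pi - theta) * c)
        # diagonal terms: E[relu(f)^2] = K/2 for zero-mean Gaussian f
        kxx = sigma_b2 + sigma_w2 * kxx / 2
        kyy = sigma_b2 + sigma_w2 * kyy / 2
    return kxy
```

GP regression or classification then proceeds with this kernel matrix in place of a trained network, which is what makes the predictive variance available in closed form.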
Integration of heterogeneous data from different modalities, such as genomics and radiomics, is a growing area of research expected to yield better prediction of clinical outcomes than single-modality approaches. To date, radiogenomics studies have focused primarily on investigating correlations between genomic and radiomic features, or on selecting salient features to determine clinical tumor phenotype. In this study, we designed deep neural networks (DNNs) that combine radiomic and genomic features to predict the pathological stage and molecular receptor status of invasive breast cancer patients. Utilizing imaging data from The Cancer Imaging Archive (TCIA) and gene expression data from The Cancer Genome Atlas (TCGA), we evaluated the predictive power of convolutional neural networks (CNNs). Overall, the results suggest superior performance of CNNs leveraging radiogenomic data over CNNs trained on single-modality data sources.
Prior research has shown that physicians’ medical decisions can be influenced by sequential context, particularly when successive stimuli exhibit similar characteristics during the analysis of medical images. Psychophysicists call this systematic error a sequential context effect: judgments are influenced by features of, and decisions about, the preceding case in the sequence of examined cases, rather than being based solely on the peculiarities of the present case. We determine whether radiologists experience some form of context bias, using screening mammography as the use case. To this end, we explore correlations between radiologists’ perceptual behavior and diagnostic decisions on previous cases and their decisions on the current case. We hypothesize that a radiologist’s visual search pattern and diagnostic decisions on previous cases are predictive of the radiologist’s current diagnostic decisions. To test our hypothesis, we tasked 10 radiologists of varied experience with conducting blind reviews of 100 four-view screening mammograms. Eye-tracking data and diagnostic decisions were collected from each radiologist under conditions mimicking clinical practice. Perceptual behavior was quantified using the fractal dimension of the gaze scanpath, computed using the Minkowski–Bouligand box-counting method. To test the effect of previous behavior and decisions, we conducted a multifactor fixed-effects ANOVA. Further, to examine the predictive value of previous perceptual behavior and decisions, we trained and evaluated a predictive model of radiologists’ current diagnostic decisions. The ANOVA tests showed that previous visual behavior, characterized by fractal analysis, previous diagnostic decisions, and image characteristics of previous cases are significant predictors of current diagnostic decisions.
Additionally, predictive modeling of diagnostic decisions showed an overall reduction in prediction error when the model was trained on additional information about previous perceptual behavior and diagnostic decisions.
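The Minkowski–Bouligand estimate used to quantify scanpath complexity counts, at each scale s, how many grid boxes the scanpath visits, then fits the slope of log N(s) against log(1/s). A minimal sketch, assuming fixation coordinates normalized to the unit square (box sizes are illustrative):

```python
import math

def box_counting_dimension(points, sizes=(1/2, 1/4, 1/8, 1/16, 1/32)):
    """Minkowski-Bouligand (box-counting) dimension of a gaze scanpath.
    points: (x, y) fixations normalized to [0, 1) x [0, 1)."""
    logs, logn = [], []
    for s in sizes:
        # set of grid boxes of side s visited by the scanpath
        boxes = {(int(x / s), int(y / s)) for x, y in points}
        logs.append(math.log(1 / s))
        logn.append(math.log(len(boxes)))
    # least-squares slope of log N(s) versus log(1/s)
    n = len(sizes)
    mx, my = sum(logs) / n, sum(logn) / n
    return sum((a - mx) * (b - my) for a, b in zip(logs, logn)) / \
        sum((a - mx) ** 2 for a in logs)
```

A scanpath confined to a smooth curve yields a dimension near 1, while one that densely covers the image plane approaches 2, which is what makes the measure a compact summary of search complexity.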
Our objective is to improve understanding of visuo-cognitive behavior in screening mammography under clinically equivalent experimental conditions. To this end, we examined pupillometric data, acquired using a head-mounted eye-tracking device, from 10 image readers (three breast-imaging radiologists and seven radiology residents), and their corresponding diagnostic decisions for 100 screening mammograms. The corpus of mammograms comprised cases of varied pathology and breast parenchymal density. We investigated the relationships among the pupillometric fluctuations experienced by an image reader during mammographic screening (indicative of changes in mental workload), the pathological characteristics of a mammographic case, and the image reader’s diagnostic decision and overall task performance. To answer these questions, we extracted features from the pupillometric data and additionally applied time-series shapelet analysis to extract discriminative patterns in changes in pupil dilation. Our results show that pupillometric measures are adequate predictors of mammographic case pathology and of image readers’ diagnostic decisions and performance, with an average accuracy of 80%.
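Shapelet analysis scores a pupil-dilation time series by the minimum distance between a short candidate pattern and every window of the series; discriminative shapelets are those whose distances separate the classes well. A minimal sketch of the distance primitive (illustrative only; the actual feature pipeline is richer):

```python
import math

def shapelet_distance(series, shapelet):
    """Minimum Euclidean distance between a candidate shapelet and every
    equal-length sliding window of the series."""
    m = len(shapelet)
    best = math.inf
    for i in range(len(series) - m + 1):
        d = math.sqrt(sum((series[i + j] - shapelet[j]) ** 2 for j in range(m)))
        best = min(best, d)
    return best
```

A series containing the shapelet exactly scores 0; classification then thresholds these distances or feeds them to a downstream classifier.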
Several researchers have investigated radiologists’ visual scanning patterns with respect to features such as total time examining a case, time to initially hit true lesions, number of hits, etc. The purpose of this study was to examine the complexity of radiologists’ visual scanning patterns when viewing 4-view mammographic cases, as they typically do in clinical practice. Gaze data were collected from 10 readers (3 breast-imaging experts and 7 radiology residents) while reviewing 100 screening mammograms (24 normal, 26 benign, 50 malignant). The radiologists’ scanpaths across the 4 mammographic views were mapped to a single 2-D image plane. Then, fractal analysis was applied to the composite 4-view scanpaths. For each case, the complexity of each radiologist’s scanpath was measured using the fractal dimension estimated with the box-counting method. The association between the fractal dimension of the radiologists’ visual scanpaths, case pathology, case density, and radiologist experience was evaluated using fixed-effects ANOVA. The ANOVA showed that the complexity of the radiologists’ visual search pattern in screening mammography depends on case-specific attributes (breast parenchymal density and case pathology) as well as on reader attributes, namely experience level. Visual scanning patterns are significantly different for benign and malignant cases than for normal cases. There is also substantial inter-observer variability which cannot be explained by experience level alone.
Previously, we have shown the potential of using an individual’s visual search pattern as a possible biometric. That study focused on viewing images displaying dot-patterns with different spatial relationships to determine which pattern can be more effective in establishing the identity of an individual. In this follow-up study we investigated the temporal stability of this biometric. We performed an experiment with 16 individuals asked to search for a predetermined feature of a random-dot pattern as we tracked their eye movements. Each participant completed four testing sessions consisting of two dot patterns repeated twice. One dot pattern displayed concentric circles shifted to the left or right side of the screen overlaid with visual noise, and participants were asked which side the circles were centered on. The second dot-pattern displayed a number of circles (between 0 and 4) scattered on the screen overlaid with visual noise, and participants were asked how many circles they could identify. Each session contained 5 untracked tutorial questions and 50 tracked test questions (200 total tracked questions per participant). To create each participant’s "fingerprint", we constructed a Hidden Markov Model (HMM) from the gaze data representing the underlying visual search and cognitive process. The accuracy of the derived HMM models was evaluated using cross-validation for various time-dependent train-test conditions. Subject identification accuracy ranged from 17.6% to 41.8% for all conditions, which is significantly higher than random guessing (1/16 = 6.25%). The results suggest that visual search pattern is a promising, temporally stable personalized fingerprint of perceptual organization.
Two people may analyze a visual scene in two completely different ways. Our study sought to determine whether human gaze may be used to establish the identity of an individual. To accomplish this objective, we investigated the gaze patterns of twelve individuals viewing still images with different spatial relationships. Specifically, we created 5 visual “dot-pattern” tests to be shown on a standard computer monitor. These tests challenged the viewer’s capacity to distinguish proximity, alignment, and perceptual organization. Each test included 50 images of varying difficulty (250 images in total). Eye-tracking data were collected from each individual while taking the tests. The eye-tracking data were converted into gaze velocities and analyzed with Hidden Markov Models to develop personalized gaze profiles. Using leave-one-out cross-validation, we observed that these personalized profiles could differentiate among the 12 users with classification accuracy ranging between 53% and 76%, depending on the test. This was statistically significantly better than random guessing (i.e., 8.3%, or 1 out of 12). Classification accuracy was higher for the tests in which the users’ average gaze velocity per case was lower. The study findings support the feasibility of using gaze as a biometric or personalized biomarker. These findings could have implications for Radiology training and the development of personalized e-learning environments.
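In both gaze-biometric studies above, identification reduces to scoring a new gaze sequence under each participant's HMM and picking the best-scoring model. The scoring step is the forward algorithm; the sketch below uses discrete observations for brevity (the studies model continuous gaze velocities, so this is a simplification with hypothetical parameters):

```python
import math

def hmm_log_likelihood(obs, pi, A, B):
    """Scaled forward algorithm: log P(obs) under a discrete-observation HMM.
    pi: initial state probabilities; A: state transition matrix;
    B: per-state emission probabilities over observation symbols."""
    n = len(pi)
    ll = 0.0
    alpha = [pi[s] * B[s][obs[0]] for s in range(n)]
    for t in range(1, len(obs) + 1):
        c = sum(alpha)
        ll += math.log(c)          # accumulate scaling factors
        alpha = [a / c for a in alpha]
        if t < len(obs):
            # propagate one step and absorb the next emission
            alpha = [sum(alpha[s] * A[s][j] for s in range(n)) * B[j][obs[t]]
                     for j in range(n)]
    return ll
```

Identification then amounts to `argmax` over participants of this log-likelihood for the candidate sequence.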
Search involves detecting the locations of potential lesions. Classification involves determining if a detected region is a
true lesion. The most commonly used measure of observer performance, namely the area A under the ROC curve, is
affected by both search and classification performances. The aim was to demonstrate a method for separating these
contributions and apply it to several clinical datasets. Search performance S was defined as the square root of 2 times the
perpendicular distance of the end-point of the search-model predicted ROC from the chance diagonal. Classification
performance C was defined as the separation of the unit-variance binormal distributions for signal and noise sites.
Eleven datasets were fitted by the search model, and S, C, and trapezoidal A were computed for each
modality and reader combination. Kendall tau correlations were calculated between the resulting S, C, and A pairs.
The Kendall correlation between S and C was smaller than zero for all datasets, and the average correlation was
significantly smaller than 0 (average = -0.401, P = 8.3 x 10^-6). The Kendall correlation between A and S was larger
than zero for 9 of 11 datasets, and the average correlation was significantly larger than 0 (average = 0.295,
P = 2.9 x 10^-3). On the other hand, the average Kendall correlation between A and C was not significantly different
from zero (average = 0.102, P = 0.25).
The results suggest that radiologists may learn to compensate for poor search performance with better classification
performance. This study also indicates that efforts at improving net performance, which currently focus almost
exclusively on improving classification performance, may be more successful if aimed at improving search performance.
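Under the definition above, S reduces algebraically to the height of the ROC end-point above the chance diagonal (the perpendicular distance of (FPF, TPF) from y = x is |TPF - FPF|/sqrt(2)), and the dataset-level analysis needs only a rank correlation. A small sketch (function names are hypothetical):

```python
import math

def search_performance(fpf_end, tpf_end):
    """S = sqrt(2) x the perpendicular distance of the ROC end-point
    (FPF, TPF) from the chance diagonal; this reduces to TPF - FPF."""
    return math.sqrt(2) * (tpf_end - fpf_end) / math.sqrt(2)

def kendall_tau(xs, ys):
    """Kendall tau-a rank correlation over all sample pairs."""
    n = len(xs)
    num = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (xs[i] - xs[j]) * (ys[i] - ys[j])
            num += (prod > 0) - (prod < 0)  # concordant minus discordant
    return 2 * num / (n * (n - 1))
```

C itself is simply the separation parameter (a d'-like quantity) of the fitted unit-variance binormal model, so no extra computation is needed once the search model is fitted.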
Jackknife alternative free-response receiver operating characteristic (JAFROC) is a method for measuring human
observer performance in localization tasks. JAFROC is being increasingly used to evaluate imaging modalities because
it has been shown to have greater statistical power than conventional receiver operating characteristic (ROC) analysis,
which neglects location information. JAFROC neglects the non-lesion localization marks ("false positives") on abnormal
images. JAFROC1 is an alternative method that includes these marks. Both methods are lesion-centric in the sense that
they assign equal importance to all lesions; an image with many lesions tends to dominate the performance metric,
and clinically less significant lesions are treated identically to more significant ones. In this paper, weighted JAFROC
and JAFROC1 analyses are described that treat each abnormal image (not each lesion) as a unit of measurement and
account for different lesion clinical significances (weights). Lesion-centric and weighted methods were tested using a
simulator that includes multiple-reader multiple-case multiple-modality location level correlations. For comparison,
ROC analysis was also tested where the rating of the highest rated mark on an image was assumed to be its "ROC"
rating. The testing involved random numbers of lesions per image, random weights, case-mixes (ratio of normal to
abnormal images) and different correlation structures. We found that for either JAFROC or JAFROC1, both lesion-centric
and weighted analyses had correct null hypothesis (NH) behavior and comparable statistical power. For either lesion-centric
or weighted analyses, JAFROC1 yielded the highest power, followed by JAFROC, with ROC yielding the least power,
confirming a recent study using a less flexible single-reader dual-modality simulator. Provided the number of normal
cases is not too small, JAFROC1 is the preferred method for analyzing human observer free-response data. For either
JAFROC or JAFROC1, weighted analysis is preferable.
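The weighted figure of merit described above compares each lesion's rating against the highest false-positive rating on each normal image, crediting 1 for a win and 0.5 for a tie, with lesion weights summing to 1 per abnormal image. A hedged sketch (the data layout is hypothetical, not the JAFROC software's):

```python
def weighted_jafroc_fom(normal_fp_max, abnormal_lesions):
    """Weighted JAFROC figure of merit (illustrative sketch).
    normal_fp_max: highest false-positive rating on each normal image;
                   use float('-inf') for unmarked normal images.
    abnormal_lesions: per abnormal image, a list of (rating, weight) pairs
                      with weights summing to 1; unmarked lesions get
                      rating float('-inf')."""
    def psi(fp, lesion):
        # Wilcoxon kernel: 1 if the lesion outscores the FP, 0.5 on a tie
        return 1.0 if lesion > fp else 0.5 if lesion == fp else 0.0
    total = 0.0
    for fp in normal_fp_max:
        for lesions in abnormal_lesions:
            total += sum(w * psi(fp, r) for r, w in lesions)
    return total / (len(normal_fp_max) * len(abnormal_lesions))
```

Because only normal-image false positives enter the comparison, this corresponds to the JAFROC (not JAFROC1) convention of neglecting non-lesion marks on abnormal images.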
The directional wavelet used in image processing has orientation selectivity and can provide a sparse representation of
edges in natural images. Multiwavelets offer the possibility of better performance in image processing applications as
compared to the scalar wavelet. Applying directionality to multiwavelets may thus gain both advantages. This paper
proposes a scheme, named multiridgelets, which is an extension of ridgelets. We consider the application of the
balanced multiwavelet transform to the Radon transform of an image, specifically for use in image texture analysis.
The regular polar-angle method is employed to realize the discrete transform. Three statistical features (standard
deviation, median, and entropy) are computed from the multiridgelet coefficients. A comparative study was made
with the results obtained using 2D wavelets, scalar ridgelets, and curvelets. Classification of mura defects on LCD
screens was tested to quantify the performance of the proposed texture analysis methods. 240 normal images and 240
simulated defective images were used to train a support vector machine classifier, and another 40 normal and 40
defective images were used for testing. The results indicate that multiridgelets were comparable to or better than
curvelets and gave significantly better performance than 2D wavelets and scalar ridgelets.
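After the multiridgelet transform, each subband is summarized by the three statistics named above. The transform itself (Radon followed by a balanced multiwavelet) is not reproduced here; this is a minimal sketch of the feature step applied to any list of coefficients, with a simple histogram entropy whose bin count is an assumption:

```python
import math
import statistics

def texture_features(coeffs, bins=16):
    """Standard deviation, median, and histogram entropy (bits) of a list of
    transform coefficients; bin count is an illustrative choice."""
    sd = statistics.pstdev(coeffs)
    med = statistics.median(coeffs)
    lo, hi = min(coeffs), max(coeffs)
    width = (hi - lo) / bins or 1.0  # guard against a constant subband
    counts = [0] * bins
    for c in coeffs:
        counts[min(int((c - lo) / width), bins - 1)] += 1
    n = len(coeffs)
    ent = -sum(k / n * math.log2(k / n) for k in counts if k)
    return sd, med, ent
```

The three numbers per subband are then concatenated across subbands into the feature vector fed to the SVM.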
We examined the statistical powers of three methods for analyzing FROC mark-rating data, namely ROC, JAFROC and
IDCA. Two classes of observers were simulated: a designer-level CAD algorithm and a human observer. A search-model
based simulator was used with the average numbers of false positives per image ranging from 0.21 for the human
observer to 10 for CAD. Model parameters were chosen to yield 80% and 85% areas under the predicted ROC curves
for both classes of observers and inter-image and inter-modality correlations of 0.1, 0.5 and 0.9 were investigated. The
area under the FROC curve up to abscissa α (ranging from 0.18 to 6.7) was used as the IDCA figure-of-merit; the other
methods used their well-known figures of merit. For IDCA, power increased with α, so α should be chosen as large as
possible, consistent with the need for the two FROC curves to overlap in the x-direction. For CAD, the IDCA method
yielded the highest statistical power. Surprisingly, JAFROC yielded the highest statistical power for human observers,
even greater than IDCA which, unlike JAFROC, uses all the marks. The largest difference occurred for conservative
reporting styles and high data correlation: e.g., 0.3453 for JAFROC vs. 0.2672 for IDCA. One reason is that unlike
IDCA, the JAFROC figure of merit is sensitive to unmarked normal images and unmarked lesions. In all cases the ROC
method yielded the least statistical power and entailed a substantial statistical power penalty (e.g., 24% for ROC vs. 41%
for JAFROC). For human observers, JAFROC should be used; for designer-level CAD data, IDCA should be used. Use
of the ROC method for localization studies is discouraged.
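The IDCA figure of merit used here is the trapezoidal area under the FROC curve (lesion-localization fraction versus non-lesion localizations per image) up to the abscissa α. A minimal sketch, assuming operating points sorted by NLF:

```python
def froc_partial_auc(nlf, llf, alpha):
    """Trapezoidal area under the FROC curve (LLF vs. NLF) up to
    abscissa alpha; illustrative sketch of the IDCA figure of merit."""
    area = 0.0
    for i in range(1, len(nlf)):
        x0, x1 = nlf[i - 1], nlf[i]
        y0, y1 = llf[i - 1], llf[i]
        if x0 >= alpha:
            break
        if x1 > alpha:
            # linearly interpolate the curve at the cutoff
            y1 = y0 + (y1 - y0) * (alpha - x0) / (x1 - x0)
            x1 = alpha
        area += 0.5 * (y0 + y1) * (x1 - x0)
    return area
```

Choosing α larger captures more of the curve, which is why power increased with α in the comparison above, subject to both modalities' curves actually extending that far.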
We describe a query-by-content search engine that enables a radiologist to search a large database of diagnostically proven ('benign' or 'malignant') mammographic regions of interest (ROIs). The database search is facilitated by a relational map, a 2D display of all the ROIs in the database. Labeled points on the map represent ROIs in the database. The map is constructed from the output of a neural network that has been trained to cluster the ROIs in the database using a measure of perceptual similarity.
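The abstract does not name the network used to build the relational map; a self-organizing map (SOM) is one classical choice for projecting feature vectors onto a labeled 2D grid, so the sketch below uses one as an assumption, not the paper's method. Feature vectors, grid size, and schedules are all hypothetical:

```python
import math
import random

def train_som(data, grid=(8, 8), epochs=20, lr0=0.5, sigma0=2.0, seed=0):
    """Minimal self-organizing map over ROI feature vectors (illustrative).
    Returns the trained grid of weight vectors."""
    rng = random.Random(seed)
    rows, cols = grid
    dim = len(data[0])
    w = [[[rng.random() for _ in range(dim)] for _ in range(cols)]
         for _ in range(rows)]
    for e in range(epochs):
        lr = lr0 * (1 - e / epochs)              # decaying learning rate
        sigma = sigma0 * (1 - e / epochs) + 0.5  # shrinking neighborhood
        for x in data:
            # best-matching unit for this sample
            bi, bj = min(((i, j) for i in range(rows) for j in range(cols)),
                         key=lambda ij: sum((a - b) ** 2 for a, b in
                                            zip(x, w[ij[0]][ij[1]])))
            for i in range(rows):
                for j in range(cols):
                    h = math.exp(-((i - bi) ** 2 + (j - bj) ** 2)
                                 / (2 * sigma * sigma))
                    w[i][j] = [a + lr * h * (b - a)
                               for a, b in zip(w[i][j], x)]
    return w

def map_position(x, w):
    """2D map coordinate (grid cell of the best-matching unit) for an ROI."""
    return min(((i, j) for i in range(len(w)) for j in range(len(w[0]))),
               key=lambda ij: sum((a - b) ** 2 for a, b in
                                  zip(x, w[ij[0]][ij[1]])))
```

Plotting each ROI at its `map_position`, colored by its 'benign'/'malignant' label, yields exactly the kind of labeled 2D relational map the abstract describes.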