Open Access
8 May 2021
Using deep neural networks and interpretability methods to identify gene expression patterns that predict radiomic features and histology in non-small cell lung cancer
Nova F. Smedley, Denise R. Aberle, William Hsu

Purpose: Integrative analysis combining diagnostic imaging and genomic information can uncover biological insights into lesions that are visible on radiologic images. We investigate techniques for interrogating a deep neural network trained to predict quantitative image (radiomic) features and histology from gene expression in non-small cell lung cancer (NSCLC).

Approach: Using 262 training and 89 testing cases from two public datasets, deep feedforward neural networks were trained to predict the values of 101 computed tomography (CT) radiomic features and histology. A model interrogation method called gene masking was used to derive the learned associations between subsets of genes and a radiomic feature or histology class [adenocarcinoma (ADC), squamous cell, and other].

Results: Overall, neural networks outperformed other classifiers. In testing, neural networks classified histology with areas under the receiver operating characteristic curve (AUCs) of 0.86 (ADC), 0.91 (squamous cell), and 0.71 (other). Classification performance of radiomic features ranged from 0.42 to 0.89 AUC. Gene masking analysis revealed new and previously reported associations. For example, hypoxia genes predicted histology (>0.90 AUC). Previously published gene signatures for classifying histology were also predictive in our model (>0.80 AUC). Gene sets related to the immune or cardiac systems and cell development processes were predictive (>0.70 AUC) of several different radiomic features. AKT signaling, tumor necrosis factor, and Rho gene sets were each predictive of tumor textures.

Conclusions: This work demonstrates neural networks’ ability to map gene expressions to radiomic features and histology types in NSCLC and to interpret the models to identify predictive genes associated with each feature or type.



Precision medicine has driven the high-throughput profiling of both molecular and medical imaging data to identify detailed tumor subtypes that better predict survival and treatment outcomes. Radiogenomic studies attempt to integrate two complementary data types to explain tumor imaging patterns using molecular information and vice versa. For example, radiogenomic studies have shown that image features [e.g., appearance of a tumor on computed tomography (CT) or magnetic resonance imaging] predict molecular patterns (e.g., gene expression, gene mutation, or molecular subtypes).1–6 Radiogenomic studies support the derivation of tumors’ biological states from noninvasive imaging and the correlation of molecular information and imaging phenotypes to better understand cancer heterogeneity. However, radiogenomic studies are often limited by the high dimensionality of the data, the simplifying model assumptions (e.g., linearity), and the lack of validation datasets.7,8

Deep learning techniques have been widely used on molecular and imaging datasets given their ability to handle high-dimensional inputs without feature engineering and to represent nonlinear and hierarchical relationships between model inputs and outputs. Several studies have used deep learning models such as convolutional neural networks, generative adversarial networks, and autoencoders to uncover radiogenomic associations.5,6,9,10 However, while these works report accurate predictions of imaging phenotypes from genomic data, they do not attempt to provide a biological interpretation of what the model has learned. While high classification accuracy is important, the ability to interrogate the model is critical to validating the learned radiogenomic associations.

In our previous work, we addressed the model understanding challenge by presenting methods such as gene masking to interpret trained neural networks.11 We showed that the models were capable of learning radiogenomic associations that were consistent with prior work while also generating new associations for further consideration. A limitation of this prior work11 is that the analysis was performed using a single dataset from glioblastoma patients. We did not demonstrate the generalizability of our approach in other domains. Therefore, the purpose of this study is to investigate the ability of neural networks to model gene expression in a different cancer domain with multiple different histologies or stages, a variety of computationally derived image features (e.g., shape and texture), and an external validation or testing dataset.

Here, we present deep feedforward neural networks to model transcriptomes using two similarly derived radiogenomic datasets recently published in non-small cell lung cancer (NSCLC).12 As one of the few publicly released radiogenomic datasets available, the paper reported radiogenomic associations and provided a basis for comparison. First, we evaluate the ability of neural networks to independently predict two clinical traits (histology and stage) and 101 radiomic features using a transcriptome consisting of 21,766 gene expressions in a training dataset of 262 patients. Next, we demonstrate the generalizability of our neural network models in an independent validation dataset of 89 patients. Finally, we systematically probe the trained neural networks to define specific patterns of gene expression related to a clinical trait or radiomic feature. We compare the models’ learned relationships with previously reported associations.


Materials and Methods



Clinical, imaging, and transcriptomic data were from 351 cases used in a prior study.12 The data consisted of two groups of patients, all of whom were diagnosed with primary tumors, had contrast-enhanced diagnostic CT, and underwent surgical resection. In one group, data were collected from 262 patients treated at the H. Lee Moffitt Cancer Center, Tampa, Florida, from 2006 to 2009 (Dataset1). The remaining 89 patients were treated at the Maastricht University Medical Centre, Maastricht, Netherlands (Dataset2). In this study, Dataset1 and Dataset2 were treated as training and testing datasets, respectively. Patient characteristics are compared between the two datasets in Table 1. The clinical stage referred to pathologic TNM staging and was represented using four categories: I, II, III, or other. Pathologic histology was captured using three categories: adenocarcinoma (ADC), squamous carcinoma (SCC), or “other.” CT scans were interpolated to have a voxel size of 1.0 × 1.0 × 1.0 mm³, and radiomic features were generated in prior work by the original study authors13 from regions identified using a semiautomated ensemble segmentation algorithm.14 Radiomic features were extracted from three-dimensional tumor volumes in contrast-enhanced presurgical CT scans to determine histogram statistics; morphology; textures, such as gray-level co-occurrence matrix (GLCM), gray-level run-length matrix (called RLGL), and gray-level size-zone matrix (GLSZM); Laplacian of Gaussian (LoG) transformations; and wavelet decompositions. Transcriptomes consisting of 21,766 gene expression levels were measured for all patients using the same Affymetrix microarray chip platform.

Table 1

Patient characteristics, estimated or replicated from the source.12

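| Characteristic | Category | Training (n=262) | Testing (n=89) |
| Gender | Male | 100 (38%) | 59 (66%) |
| | Female | 124 (47%) | 28 (32%) |
| | N/A | 38 (15%) | 2 (2%) |
| Histology | ADC | 129 (49%) | 42 (48%) |
| | Squamous | 61 (23%) | 33 (38%) |
| | Other | 34 (13%) | 12 (14%) |
| | N/A | 38 (15%) | 0 (0%) |
| Stage | I | 123 (47%) | 39 (44%) |
| | II | 35 (13%) | 26 (29%) |
| | III | 46 (18%) | 12 (14%) |
| | Other | 20 (8%) | 10 (11%) |
| | N/A | 38 (14%) | 2 (2%) |
| Smoking status | Current | 66 (25%) | N/A |
| | Former | 141 (54%) | N/A |
| | None | 17 (7%) | N/A |
| | N/A | 38 (14%) | N/A |
| Tumor site | Primary | 224 (86%) | 87 (98%) |
| | N/A | 38 (14%) | 2 (2%) |
| Status | Alive | 134 (51%) | 41 (46%) |
| | Deceased | 90 (35%) | 46 (52%) |
| | N/A | 38 (14%) | 2 (2%) |
| Follow-up | Median months | 32 | 31 |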

In this study, each radiomic feature was a continuous variable that was transformed into binary classes using k-means clustering in the training dataset. This approach was used to separate patients into two groups based on a single radiomic feature. If a radiomic feature was highly imbalanced (e.g., >90% of the patients belong to one group), that feature was removed from the analysis. For example, the radiomic feature surface area to volume ratio was grouped into two clusters: cluster A had a mean ratio of 0.26, and cluster B had a mean ratio of 0.52. Each cluster represented one class. Clustering was performed via KMeans from sklearn. After filtering, 101 radiomic features out of the 636 features generated in the original study remained.12 These included seven types of radiomic features: 10 shape, 22 GLCM, 7 GLSZM, 10 RLGL, 12 stats, 9 LoG, and 31 wavelet features. The full names and class frequencies of the 101 radiomic features are provided in a GitHub repository (see Code Availability). LoG and wavelet features were selected from LoG_sigma_0_5_mm_2D and the HHH wavelet decomposition. Radiomic features with definitions that were not explicitly provided by the source12,13 were not considered during model interpretation. The same cluster boundaries discovered during training were used to binarize the testing data.
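The binarization step can be illustrated with a small sketch. The study performed clustering with KMeans from sklearn; to stay dependency-free, the code below substitutes a hand-written two-cluster Lloyd iteration on a single feature, so it is a stand-in under that assumption rather than the authors' implementation. The function names and the ratio values are hypothetical and chosen only for illustration.

```python
def two_means_1d(values, n_iter=100):
    """Two-cluster Lloyd iteration on one feature; returns (low_center, high_center)."""
    lo, hi = min(values), max(values)
    for _ in range(n_iter):
        boundary = (lo + hi) / 2.0
        left = [v for v in values if v <= boundary]
        right = [v for v in values if v > boundary]
        if not left or not right:  # degenerate case: all values identical
            break
        new_lo, new_hi = sum(left) / len(left), sum(right) / len(right)
        if (new_lo, new_hi) == (lo, hi):  # centers stopped moving
            break
        lo, hi = new_lo, new_hi
    return lo, hi


def binarize(values, centers):
    """Assign each value to the nearer center (0 = lower cluster, 1 = higher cluster)."""
    lo, hi = centers
    return [0 if abs(v - lo) <= abs(v - hi) else 1 for v in values]


def is_imbalanced(labels, max_fraction=0.90):
    """True when one class holds more than max_fraction of the cases."""
    positive_fraction = sum(labels) / len(labels)
    return max(positive_fraction, 1.0 - positive_fraction) > max_fraction


# Invented surface-area-to-volume ratios, not study data.
train = [0.22, 0.25, 0.27, 0.30, 0.48, 0.50, 0.55, 0.56]
test = [0.24, 0.51, 0.33]

centers = two_means_1d(train)            # learned on the training data only
train_labels = binarize(train, centers)
test_labels = binarize(test, centers)    # training boundary reused for testing
keep_feature = not is_imbalanced(train_labels)
```

Reusing the training centers for the testing data mirrors the statement that the cluster boundaries discovered during training were applied to the testing set.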


Radiogenomic Modeling

Dense feed-forward neural networks were used to map transcriptomes (model inputs) to radiomic features (model outputs). Gene expression was standardized for each gene by subtracting the mean and dividing by the standard deviation. The radiogenomic models were constructed with three or four hidden layers, where the first hidden layer had 2000, 4000, or 6000 nodes and the number of hidden nodes was halved with each subsequent hidden layer. Neural networks were trained using cross-entropy loss, nonlinear activation functions, dropout, batch normalization, and early stopping by monitoring loss.
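As a quick check of the halving rule, the following sketch lists the hidden-layer widths implied by each first-layer size and depth. The helper name `hidden_layer_sizes` is hypothetical; only the first-layer sizes, the depths, and the halving rule come from the text.

```python
def hidden_layer_sizes(first_layer, n_hidden):
    """Widths of the hidden layers when each layer has half the nodes of the previous one."""
    return [first_layer // (2 ** depth) for depth in range(n_hidden)]


# Enumerate every architecture described in the text.
architectures = {
    (first, n_hidden): hidden_layer_sizes(first, n_hidden)
    for first in (2000, 4000, 6000)
    for n_hidden in (3, 4)
}

smallest_layer = min(min(sizes) for sizes in architectures.values())
largest_layer = max(max(sizes) for sizes in architectures.values())
```

Under this reading, the narrowest layer has 250 nodes (2000 halved three times) and the widest has 6000, which agrees with the hidden-node range reported in Table 2.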

Performance was reported using area under the receiver operating characteristic curve (AUC). Hyperparameters were selected using a grid search based on the model that achieved the highest mean AUC in 10-fold cross-validation (CV) during training. During CV, the mean and standard deviation used for gene standardization were calculated from the training folds of each split; in testing, they were calculated from the entire training dataset. Training performance was averaged across CV folds. Accuracy was calculated based on a decision threshold of 0.5 class probability.
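The fold-wise standardization can be sketched as follows. This is a minimal illustration, assuming a population standard deviation and a simple round-robin fold assignment (neither is specified in the text); the function names and the toy expression matrix are hypothetical.

```python
import statistics


def fit_standardizer(rows):
    """Per-gene mean and standard deviation computed from the given rows only."""
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    # Fall back to 1.0 so a constant gene does not cause division by zero.
    stds = [statistics.pstdev(col) or 1.0 for col in columns]
    return means, stds


def standardize(rows, means, stds):
    """Subtract the mean and divide by the standard deviation, gene by gene."""
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


def k_fold_indices(n_samples, k):
    """Yield (train_idx, val_idx); sample i is held out in fold i % k."""
    for fold in range(k):
        val_idx = [i for i in range(n_samples) if i % k == fold]
        train_idx = [i for i in range(n_samples) if i % k != fold]
        yield train_idx, val_idx


# Toy matrix: six patients (rows) by two genes (columns).
expression = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0],
              [4.0, 40.0], [5.0, 50.0], [6.0, 60.0]]

for train_idx, val_idx in k_fold_indices(len(expression), k=3):
    train_rows = [expression[i] for i in train_idx]
    val_rows = [expression[i] for i in val_idx]
    means, stds = fit_standardizer(train_rows)   # statistics from training folds only
    train_z = standardize(train_rows, means, stds)
    val_z = standardize(val_rows, means, stds)   # held-out fold reuses those statistics
```

Fitting the statistics on the training folds alone keeps information from the held-out fold out of the preprocessing, which is the point of computing them separately for each split.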

Figure 1(a) shows the overall procedure used to train the radiogenomic neural networks. To reduce the hyperparameter search for each radiomic feature, a grid search was performed using a neural network that predicted all features corresponding to one of the seven radiomic feature groups (previously defined as shape, GLCM, GLSZM, RLGL, stats, LoG, and wavelet). For example, one network would be trained to predict all 22 GLCM features as a multilabel classification task using the patient’s transcriptome as the input. Once a neural network was trained for each radiomic group, the hyperparameters used for the best performing network were then used to train a neural network that predicted a single radiomic feature within that group. Other classifiers, including logistic regression, support vector machines, random forest, and gradient boosted trees, were also implemented as a comparison. Each comparison model was trained to predict a single radiomic feature and evaluated against the neural network. The best performing model for each radiomic feature was then retrained on the entire training dataset to obtain the final model. Final models were evaluated on the testing dataset. Radiomic features with at least 0.70 test AUC were kept for further analysis.

Fig. 1

An overview of this study’s approaches to (a) training and (b) interpreting radiogenomic neural networks.


More details on the architecture of radiogenomic neural networks are shown in Fig. 2. All models and their hyperparameters are listed in Table 2.

Fig. 2

The architecture and hyperparameter tuning of a radiogenomic neural network.


Table 2

NSCLC radiogenomic models and hyperparameters. Grid search was used for selecting hyperparameters.

| Model type | # Models | Hyperparameter | Values |
| Logistic regression | 100 | Penalty type | Elastic net |
| | | L1 ratio | [0–1] |
| Support vector machines | 400 | Kernel | Linear, poly, rbf, sigmoid |
| | | C penalty | log(6)log(6) |
| Random forest | 120 | Trees | [50:100:3050] |
| | | Split criterion | Gini, entropy |
| | | Max. features | G, log2(G) |
| | | Max. depth | None |
| Gradient boosted trees | 150 | Trees | [50:100:1000) |
| | | Max. depth | [1–3] |
| | | Learning rate | [0.01:0.1:0.50] |
| Neural network | 48 | Hidden nodes (a) | [6000–250] |
| | | Hidden layers | 3, 4 |
| | | Optimizer | Nadam, Adadelta |
| | | Activation | Sigmoid, relu |
| | | Dropout | 0.4, 0.6 |
| | | Loss | Binary cross-entropy |
| | | Epochs (patience) | 500 (200) |

G: number of genes in the transcriptome.

(a) First hidden layers were either 2000, 4000, or 6000 nodes, where the number of hidden nodes for a layer was halved with each subsequent layer.
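One way to read the neural network rows of Table 2 is as a full Cartesian grid over first-layer size, depth, optimizer, activation, and dropout. That reading is an inference from the table, not something the text states, but it does reproduce the reported count of 48 models. The dictionary keys below are hypothetical names.

```python
from itertools import product

# Search space taken from the neural network rows of Table 2 and its footnote.
grid = {
    "first_hidden_nodes": [2000, 4000, 6000],
    "hidden_layers": [3, 4],
    "optimizer": ["nadam", "adadelta"],
    "activation": ["sigmoid", "relu"],
    "dropout": [0.4, 0.6],
}

# One dictionary per hyperparameter combination.
configs = [dict(zip(grid, values)) for values in product(*grid.values())]
n_configs = len(configs)  # 3 * 2 * 2 * 2 * 2 combinations
```

Loss (binary cross-entropy) and the epoch and patience settings are single-valued in the table, so they do not multiply the number of configurations.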


Modeling Tumor Histology and Stage

In addition to predicting radiomic features, we trained networks to predict histology and stage from the transcriptome. These networks had the same architecture as those used for radiogenomic modeling. Histology and stage were each modeled as multiclass classification tasks (Table 1). The neural networks used categorical cross-entropy loss and softmax activation in the prediction layer. Training scores were microaveraged across all classes and folds in CV. Test scores and model interpretation methods were based on one class versus all other classes unless otherwise noted. The methods for hyperparameter optimization and model selection were the same as previously described.
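The prediction layer and the one-versus-rest evaluation can be sketched in a few lines. This is an illustration only: the logits and labels are invented, and the helper names are hypothetical rather than taken from the study's code.

```python
import math


def softmax(logits):
    """Convert raw scores to class probabilities (shifted by the max for stability)."""
    shift = max(logits)
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def categorical_cross_entropy(probs, true_index):
    """Negative log-probability assigned to the true class."""
    return -math.log(probs[true_index])


def one_vs_rest(labels, positive):
    """Binary labels for scoring one class against all others."""
    return [1 if label == positive else 0 for label in labels]


CLASSES = ["ADC", "SCC", "other"]
logits = [2.0, 0.5, -1.0]  # invented output of the final layer for one patient
probs = softmax(logits)
loss = categorical_cross_entropy(probs, CLASSES.index("ADC"))

# Labels for evaluating the SCC class against the remaining classes.
scc_vs_rest = one_vs_rest(["ADC", "SCC", "other", "ADC"], positive="SCC")
```

Pairing the probability of one class with its one-versus-rest labels gives the per-class scores that a binary metric such as AUC needs.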


Using Interpretability Methods to Identify Gene Expression Patterns

Figure 1(b) shows the gene masking steps used to extract predictive gene expression patterns from trained neural networks. Gene masking was previously defined for radiogenomic neural networks11 and focused on a component of the model’s input using one predefined gene set at a time, a similar process to sensitivity analysis.15 For example, masking of genes related to hypoxia involved taking each patient’s gene expression profile, keeping only the hypoxia genes’ expressions, and setting all other gene expressions to zero (i.e., the input is masked). The masked input was then pushed through the model, and the model’s prediction of a radiomic feature was recorded. This process was repeated for the entire cohort, and classification performance was calculated via AUC and average precision (AP). Gene masking measured the model’s ability to predict radiomic features based on gene expression of a particular gene set, where the higher the performance is, the stronger the association in the cohort is. Masking resulted in “radiogenomic associations” learned by the model. The strength and generalizability of the learned radiogenomic associations were measured for each gene set by their performance (i.e., AUC) in the testing cohort. Predefined gene sets were used for gene masking. These included the “Hallmark” and “Gene Ontology” (GO) biological processes from Molecular Signature Database (MSigDB) v7.0.16 For simplicity, gene sets with a maximum of 500 genes were studied.
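The masking procedure described above can be sketched as follows. The trained network is replaced by a fixed linear scoring function, and the gene names, weights, profiles, and gene set are all invented for illustration; only the steps (keep the gene set, zero the rest, predict, score with AUC) follow the text.

```python
def mask_expression(profile, gene_names, gene_set):
    """Keep expression for genes in gene_set and set every other gene to zero."""
    return [x if gene in gene_set else 0.0 for x, gene in zip(profile, gene_names)]


def roc_auc(labels, scores):
    """AUC as the fraction of positive-negative pairs ranked correctly (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def gene_masking_auc(model, profiles, labels, gene_names, gene_set):
    """Push masked inputs through the model and score the predictions."""
    scores = [model(mask_expression(p, gene_names, gene_set)) for p in profiles]
    return roc_auc(labels, scores)


# Hypothetical stand-in for a trained network: a fixed linear score.
GENES = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
WEIGHTS = [1.0, 0.5, -1.0, -0.5]


def toy_model(profile):
    return sum(w * x for w, x in zip(WEIGHTS, profile))


# Invented standardized expression profiles and binary labels.
profiles = [
    [1.2, 0.8, -0.9, -1.1],
    [0.7, 1.1, -0.2, -0.6],
    [-0.9, -0.4, 1.3, 0.8],
    [-1.0, -1.2, 0.5, 1.0],
]
labels = [1, 1, 0, 0]
example_gene_set = {"GENE_A", "GENE_B"}  # illustrative, not an MSigDB gene set

auc = gene_masking_auc(toy_model, profiles, labels, GENES, example_gene_set)
```

Because expression is standardized per gene, setting a masked gene to zero corresponds to replacing it with its mean value from the standardization step. Average precision, which the study also reports, could be computed from the same masked scores.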

Relationships between gene expression and histology were also studied using gene masking. In prior work, a 42-gene signature was used to distinguish ADC from SCC.17 In another similar effort, a 75-probe set signature was found for ADC, SCC, and large cell carcinoma (LCC).18 The gene sets reported in these works were also examined in this work as a point of comparison.




Predicting Histology or Stage from the Transcriptome


Deep neural networks outperform comparison approaches

In predicting histology, deep neural networks outperformed other models by more than 0.10 AUC in 10-fold CV of the training dataset [Fig. 3(a)]. Performance of the histology model remained consistent in the testing dataset, achieving test scores of 0.86 AUC, 0.91 AUC, and 0.71 AUC in predicting ADC, SCC, or other histology, respectively. The neural network achieved an overall microaveraged test AUC of 0.77. In predicting stage, all models fared poorly, with the neural network still achieving the best performance. Given the poor performance in predicting stage, only the neural network used to predict histology was analyzed using gene masking.

Fig. 3

The ability of models to predict NSCLC histology and stage in (a) training and (b) testing. In training, models were evaluated using 10-fold CV, and models were compared using the mean AUC scores in CV. The top performing model was then retrained on the full training dataset and evaluated on the testing dataset. The testing performance scores are shown for each histology type and stage. (c)–(e) Gene masking of the histology neural network using gene sets from (c) published gene signatures for histology,17,18 (d) Hallmark (top five out of 50),19 and (e) Gene Ontology biological processes (GO.bp, top ten out of 7350).16 In (c)–(e), each column is a type of histology and each row is a gene set used to mask the trained model to inspect how well the model predicted a certain histology type. The color in a cell shows the model’s performance using a gene set to predict a histology type, where red denotes higher AUCs and purple denotes lower AUCs in the testing dataset. (b) The ability of the histology model to predict each histology type and the AUC score is based on using all genes in the gene expression profile. (c)–(e) The ability of a specific gene set in predicting a histology type. For more details on the gene sets used, see Sec. 2. Notation: lcc = gene signature for large cell carcinoma, scc = gene signature for squamous cell carcinoma, adc = gene signature for adenocarcinomas, adc versus scc = gene signature for differentiating ADC from SCC.



Gene masking identifies gene sets that predict histology

Gene masking of the histology neural network showed agreement with previously published gene signatures for predicting NSCLC, ADC, and SCC.17,18 As shown in Fig. 3(c), the aforementioned gene signatures were also predictive in our histology neural network in the testing dataset. In particular, the gene signature from Ref. 17 resulted in 0.93 test AUC in both ADC and SCC.

Hallmark gene sets were also predictive of histology [Fig. 3(d)]. Gene expression related to hypoxia, coagulation, and KRAS signaling predicted both ADC and SCC (>0.90 test AUC). Similar to the overall performance observed on the testing data, summarized in Fig. 3(b), the histology neural network was driven by accurate predictions in ADC and SCC, where the test AUC was about 0.20 higher than what was obtained when predicting the other histology class. Subsequently, this behavior was reflected in gene masking, where the most predictive gene sets for estimating other histology in testing were inflammatory response (0.73 AUC) and angiogenesis (0.72 AUC). Notably, angiogenesis was more predictive of other histologies (0.73 AUC) than ADC (0.66 AUC) or SCC (0.63 AUC) classes in testing.

The most predictive GO biological processes [Fig. 3(e)] were also associated with angiogenesis, epithelial mesenchymal transition, and hypoxia from the Hallmark gene sets. Of the gene sets considered in gene masking, negative regulation of DNA-binding transcription factor activity from GO had the best overall testing performance with the highest microaveraged AUC of 0.79; the individual test AUCs were 0.93 in ADC, 0.91 in SCC, and 0.75 in other. The aforementioned gene set consisted of 170 genes, where 156 (91.2%) were in the gene expression profile. A summary of gene expression patterns is given in Table 3. Notably, KRAS is a major gene studied in NSCLC, and the KRAS Hallmark was found to be predictive of ADC and SCC.

Table 3

Summary of predictive gene expression patterns in NSCLC histology.

| Histology | Theme (a) | Gene set | Source (b) | Test AUC | Test AP |
| ADC | Synthesis | Phosphatidylcholine biosynthetic process | G | 0.94 | 0.93 |
| | | Phosphatidic acid biosynthetic process | G | 0.94 | 0.86 |
| | Transcription | Neg. regulation of DNA-binding transcription factor activity | G | 0.93 | 0.92 |
| | Vasculature | Neg. regulation of sprouting angiogenesis | G | 0.93 | 0.92 |
| | Cell development | Retina development in camera-type eye | G | 0.93 | 0.87 |
| | Cell death | Hypoxia | H | 0.91 | 0.88 |
| | KRAS | KRAS signaling down | H | 0.91 | 0.84 |
| | UV | UV response down | H | 0.92 | 0.90 |
| | E2F | E2F targets | H | 0.92 | 0.87 |
| Squamous cell carcinoma | Cell development | Metencephalon development | G | 0.95 | 0.90 |
| | Differentiation | Epidermal cell differentiation | G | 0.94 | 0.94 |
| | | Neuron fate commitment | G | 0.92 | 0.87 |
| | Catabolism | Neg. regulation of cellular catabolic process | G | 0.94 | 0.88 |
| | Cell death | Cornification | G | 0.94 | 0.93 |
| | KRAS | KRAS signaling down | H | 0.93 | 0.93 |
| | Hormone | Estrogen response late | H | 0.92 | 0.91 |
| | Cholesterol | Cholesterol homeostasis | H | 0.92 | 0.83 |
| Other | Transport | Posttranslational protein targeting to membrane, translocation | G | 0.86 | 0.35 |
| | | Regulation of sodium ion transmembrane transporter activity | G | 0.84 | 0.54 |
| | AMPA receptor | Regulation of AMPA receptor activity | G | 0.85 | 0.44 |
| | Cell cycle | Mitotic nuclear envelope reassembly | G | 0.84 | 0.50 |
| | Ubiquitination | Neg. regulation of protein K63-linked ubiquitination | G | 0.84 | 0.38 |
| | Immune system | Inflammatory response | H | 0.73 | 0.37 |

(a) Gene sets were categorized by comparing the top five gene sets in GO and Hallmark with at least 0.70 test AUC.

(b) From MSigDB v7.0, where H = Hallmark, G = Gene Ontology, neg. = negative.


Predicting Radiomic Features from the Transcriptome


Overall performance

Neural networks were overall better at classifying radiomic features than all other models within the training dataset [Fig. 4(a)]. The only exceptions were gradient boosted trees, which performed better on four radiomic features (differences below 0.026 AUC), and random forest, which performed better on one radiomic feature (0.012 AUC difference). In testing, neural networks had 0.42 to 0.89 AUC, 0.45 to 0.94 accuracy, and 0.09 to 0.98 AP across all radiomic feature classifications. A subset of 13 radiomic features had at least 0.70 test AUC and was subsequently selected for interpretation. Figure 4(b) shows the neural network’s generalizability to classify the aforementioned 13 radiomic features in the testing dataset.

Fig. 4

Radiogenomic modeling performance (a) between neural networks and other models in the training dataset. Neural network performance (b) in the training and testing datasets for the 13 radiomic features selected for further analysis. Train scores represent the averaged scores of the validation folds during 10-fold CV in the training dataset. The test scores are the model’s performance in the testing dataset after models were retrained on the full training dataset.



Gene masking identifies gene sets that predict radiomic features

Figure 5 shows the top GO gene sets associated with predicting each radiomic feature. Overall, the results of gene masking suggest that the prediction of each radiomic feature was associated with a unique gene expression profile driven by different biological processes. None of the radiomic features had similar scores across all gene sets. Some gene sets were better for predicting one radiomic feature but not another. For example, the top two gene sets for predicting an imaging texture, RLGL_longRunHighGrayLevEmpha, were regulation of syncytium formation by plasma membrane fusion and pyrimidine nucleotide salvage, which had test AUCs above 0.75, but for the other 12 radiomic features, these two gene sets scored below 0.70 AUC and 0.65 AUC, respectively. Conversely, there were gene sets that could predict multiple radiomic features at once. For example, response to tumor necrosis factor (TNF) was predictive of two other imaging textures, GLCM_entrop2 (0.78 AUC, 0.81 AP) and RLGL_runPercentage (0.76 AUC, 0.76 AP), in testing. Hallmark gene sets were also applied to the radiomic models in gene masking analysis but were not as predictive as the GO gene sets.
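Ranking gene sets per radiomic feature, as done to select the top sets by test AUC, can be sketched as below. The feature names, gene-set names, and AUC values are placeholders rather than study results, and `rank_gene_sets` is a hypothetical helper.

```python
def rank_gene_sets(auc_by_gene_set):
    """Gene-set names sorted by test AUC, best first (ties keep their input order)."""
    return sorted(auc_by_gene_set, key=auc_by_gene_set.get, reverse=True)


# Placeholder gene-masking results: feature -> {gene set -> test AUC}.
masking_auc = {
    "feature_A": {"set_1": 0.78, "set_2": 0.61, "set_3": 0.74, "set_4": 0.55},
    "feature_B": {"set_1": 0.58, "set_2": 0.80, "set_3": 0.66, "set_4": 0.71},
}

# Top three gene sets for each radiomic feature.
top_three = {feature: rank_gene_sets(aucs)[:3] for feature, aucs in masking_auc.items()}

# Position of one gene set within each feature's ranking (1 = most predictive).
rank_of_set_1 = {
    feature: rank_gene_sets(aucs).index("set_1") + 1
    for feature, aucs in masking_auc.items()
}
```

In this toy example the same gene set ranks first for one feature and last for the other, which is the kind of feature-specific pattern the paragraph above describes.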

Fig. 5

Gene masking of the radiogenomics models with biological processes from GO. The top three gene sets ranked by test AUC for each radiomic feature are shown.


Radiogenomic associations were summarized for radiomic features related to histogram statistics of the tumor mask, the transformation of the mask (LoG features), and textures of tumors in Table 4. For example, the three gene sets that were most predictive of LoG_stats_std were related to post-translational protein modification, epidermis development, and DNA repair. Processes involving the immune system and cardiac system were the top predictors for several radiomic features. Many gene sets were related to cell development, including muscle, liver, epidermis, fat cell, and renal gene sets. AKT signaling, a targeted pathway in NSCLC therapy,20 was moderately predictive of RLGL_longRunHighGrayLevEmpha with 0.76 AUC but had an AP of 0.54 in testing. TNF was associated with RLGL_runPercentage and GLCM_entrop2. Rho signaling, associated with tumor suppressor activity and another targeted pathway in NSCLC,21,22 was associated with GLCM_diffEntro (0.75 AUC, 0.78 AP) and GLCM_invDiffmomnor (0.79 AUC, 0.55 AP). While GLCM_invDiffmomnor and GLCM_invDiffnorm were correlated and clustered together, the two radiomic features had differing gene sets.

Table 4

Summary of predictive gene expression patterns in NSCLC radiomic features.

| Radiomic feature | Theme (a) | Gene set (from GO) | Test AUC | Test AP |
| Stats_skewness | Cytoskeleton | Reg. of actin filament-based process | 0.82 | 0.82 |
| | Adhesion | Neg. reg. of cell adhesion | 0.81 | 0.81 |
| | | Neg. reg. of cell-cell adhesion | 0.80 | 0.77 |
| | Immune system | Reg. of hemopoiesis | 0.81 | 0.78 |
| | | Reg. of leukocyte differentiation | 0.80 | 0.77 |
| Stats_rms | Transport | Reg. of release of sequestered calcium ion into cytosol | 0.95 | 0.65 |
| | | Sequestering of calcium ion | 0.93 | 0.30 |
| | Development | Muscle organ development | 0.93 | 0.46 |
| | | Striated muscle cell differentiation | 0.93 | 0.32 |
| | | Actin filament-based movement | 0.92 | 0.26 |
| LoG_stats_std | Post-translational | Post-translational protein modification | 1.00 | 1.00 |
| | Development | Epidermis development; epidermal cell differentiation | 1.00 | 1.00 |
| | DNA repair | DNA double-strand break processing involved in repair via single-strand annealing | 0.99 | 0.79 |
| | Cell cycle | Neg. reg. of mitotic cell cycle | 0.99 | 0.79 |
| LoG_stats_uniformity | Cell development | Liver regeneration | 0.78 | 0.64 |
| | | Epithelial tube morphogenesis | 0.77 | 0.49 |
| | Transport | Protein transmembrane transport | 0.77 | 0.64 |
| | | Intracellular protein transmembrane transport | 0.76 | 0.61 |
| | Catabolism | Organic acid catabolic process | 0.75 | 0.54 |
| LoG_stats_entropy | Localization | Establishment of organelle localization | 0.82 | 0.47 |
| | | Pos. reg. of protein localization to membrane | 0.79 | 0.60 |
| | Heart rate | Reg. of heart rate by cardiac conduction | 0.77 | 0.40 |
| | Cell mobility | Reg. of actin filament-based process | 0.77 | 0.46 |
| | External stimulus | Cellular response to mechanical stimulus | 0.77 | 0.44 |
| LoG_stats_kurtosis | Connective tissue | Elastin metabolic process | 0.73 | 0.84 |
| | | Collagen metabolic process | 0.72 | 0.85 |
| | Synthesis | Pos. reg. of receptor biosynthetic process | 0.73 | 0.81 |
| | | Pos. reg. of hormone biosynthetic process | 0.73 | 0.84 |
| | Immune system | Response-regulating cell surface receptor signaling pathway | 0.72 | 0.78 |
| GLCM_diffEntro | Muscle | Muscle fiber development | 0.76 | 0.82 |
| | | Muscle cell differentiation | 0.75 | 0.79 |
| | | Cardiac ventricle formation | 0.76 | 0.77 |
| | Bacteria | Response to molecule of bacterial origin | 0.76 | 0.73 |
| | Rho | Rho protein signal transduction | 0.75 | 0.78 |
| GLCM_invDiffnorm | Cell development | Fat cell differentiation | 0.83 | 0.63 |
| | | Neg. reg. of cell development | 0.80 | 0.60 |
| | Cell respiration | Reg. of aerobic respiration | 0.81 | 0.64 |
| | Immune system | Neg. reg. of lymphocyte activation | 0.80 | 0.63 |
| | Nervous system | Neuromuscular process controlling balance | 0.79 | 0.70 |
| GLCM_invDiffmomnor | Immune system | Reg. of osteoclast differentiation | 0.81 | 0.58 |
| | | Osteoclast differentiation | 0.80 | 0.58 |
| | Homeostasis | Multicellular organismal homeostasis | 0.80 | 0.61 |
| | | Tissue homeostasis | 0.79 | 0.57 |
| | Rho | Reg. of Rho protein signal transduction | 0.79 | 0.55 |
| GLCM_entrop2 | TNF | Response to TNF | 0.78 | 0.81 |
| | Muscle | Muscle cell development | 0.77 | 0.80 |
| | | Ventricular septum morphogenesis | 0.77 | 0.77 |
| | | Striated muscle cell differentiation | 0.76 | 0.78 |
| | Drug response | Response to xenobiotic stimulus | 0.76 | 0.84 |
| RLGL_shortRunEmphasis | Localization | Pos. regulation of establishment of protein localization | 0.79 | 0.77 |
| | Catabolism | Lysine catabolic process | 0.79 | 0.80 |
| | Cell death | Pos. reg. of autophagy of mitochondrion | 0.77 | 0.78 |
| | Hormone | Pos. reg. of insulin secretion | 0.77 | 0.77 |
| | Cell mobility | Neuron projection guidance | 0.75 | 0.71 |
| RLGL_longRunHighGrayLevEmpha | Syncytium | Syncytium formation | 0.78 | 0.54 |
| | | Reg. of syncytium formation by plasma membrane fusion | 0.77 | 0.47 |
| | Synthesis | Pyrimidine nucleotide salvage | 0.76 | 0.50 |
| | AKT | Protein kinase B signaling | 0.76 | 0.54 |
| | Renal system | Renal sodium excretion | 0.76 | 0.44 |
| RLGL_runPercentage | Muscle | Smooth muscle tissue development | 0.76 | 0.75 |
| | TNF | Response to TNF | 0.76 | 0.76 |
| | | TNF-mediated signaling pathway | 0.75 | 0.76 |
| | Immune system | Myeloid leukocyte differentiation | 0.75 | 0.76 |
| | Renal system | Metanephros development | 0.74 | 0.80 |

(a) Shown are the top five GO gene sets ranked by AUC and with at least 0.50 AP; reg. = regulation; pos. = positive; neg. = negative.

To compare the results of our neural network with prior reported associations, we performed an analysis using radiogenomic modules. Radiogenomic modules, defined as a set of correlated radiomic features and gene expressions, were previously defined as part of the original study.12 Not all radiomic features were reported in the original study’s radiogenomic modules. In this paper, the radiomic models were masked with the same Reactome gene sets as Grossmann et al.12 using MSigDB v4.0. Table 5 summarizes the overlapping radiogenomic associations found in this study compared with the aforementioned work. The highest agreement was between LoG_stats_entropy and module 13, where three of the pathways in the module were also among the top ten most predictive gene sets in our radiogenomic model. Other comparisons did not have overlapping associations. For example, the authors reported a radiogenomic association between GLCM_diffEntro and the five pathways in module 2, while we found the most predictive pathway in module 2 was ranked 197 out of the 664 Reactome pathways used in gene masking. Thus, our model suggested that many other pathways were more predictive of the radiomic feature than the five pathways in module 2.

Table 5

A comparison of the learned radiogenomic associations extracted from our neural networks and the modules previously identified in the same dataset.12 Each module consisted of a set of Reactome pathways and a set of image features. Shown are the modules that included the radiomic features used in this study. If any module’s set of pathways was ranked among the top 100 in gene masking, the top three pathways were listed.

Radiomic traitReactome pathwayThis studyGrossmann et al.12
Test AUCRankaModule# Pathways
GLCM_diffEntroCross presentation of soluble exogenous antigens endosomes0.5819725
Phase II conjugation0.6991235
Regulation of mitotic cell cycle0.6621
ABCA transporters in lipid homeostasis0.6528
LoG_stats_stdRegulation of ornithine decarboxylase0.941525
Cross presentation of soluble exogenous antigens endosomes0.8289
Antigen processing cross presentation0.80106
Cholesterol biosynthesis0.7515067
Signaling by TGF beta receptor complex0.8381817
Elongation arrest and recovery0.8292
mRNA splicing0.75142
LoG_stats_entropyElongation arrest and recovery0.73178
Mitochondrial protein import0.6374
RNA pol III chain elongation0.58180
Elongation arrest and recovery0.7311326
RNA pol II pre transcription events0.697
Formation of RNA pol II elongation complex0.699
Stats_skewnessAntigen processing cross presentation0.6213925

aAll Reactome pathways were ranked by test AUC.



We demonstrated the ability of deep neural networks to learn associations between radiomic features or clinical traits and gene expression using two NSCLC cohorts. A relatively large training dataset of 262 patients was available. An independent test dataset of 89 patients allowed us to validate the generalizability of our neural network models. We showed that neural networks outperformed other machine learning models. While the overall test AUC was mixed across all 101 radiomic features (test AUC of 0.42–0.89), the thirteen radiomic features selected for gene masking and histology had an average test AUC above 0.70. We interpreted the models using gene masking and identified specific sets of gene expressions that were indicative of a trait or feature. Together, these results suggest that potential biological associations exist to explain the differences among histology classes and CT imaging characteristics of NSCLC patients.

A number of radiomic and radiogenomic studies have been performed with NSCLC patients.10,12,13,23–26 A recent study used the same dataset of 89 patients to train models to predict immune cell gene signatures of NSCLC tumors using CT radiomic features.27 Our model attempted to learn associations between high dimensional gene expression profiles and radiomic features. While deep neural networks were used to map CT image patches to tumor gene expression in Li et al.,10 the study did not report specific radiogenomic associations. Most related to our study was Grossmann et al.,12 which provided the source datasets. They used Gene Set Enrichment Analysis and the Iterative Signature Algorithm, a correlation and biclustering method, to define radiogenomic modules by grouping radiomic features with Reactome gene sets (i.e., pathways). The top ten most predictive Reactome pathways for a radiomic feature in our models overlapped with the radiogenomic modules defined by Grossmann et al.,12 but overall agreement was low. Differences in radiogenomic associations between our work and Grossmann et al.12 may relate to differences in methodology. Our study assessed the entire gene expression profile, whereas their study assessed correlations between radiomic features and subsets of genes. Moreover, the number of associations to be found was not predefined in our study, while Grossmann et al.12 specifically assessed 20 radiogenomic modules. Other studies12,24 have shown that their selected radiogenomic associations can differentiate between patients with longer versus shorter survival. We leave survival analysis using the radiogenomic associations found in our neural network models as future work.
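
The comparison between a gene masking ranking and a previously defined module can be sketched as follows. This is an illustrative Python sketch under stated assumptions: `ranking` is a list of pathway names ordered best first, a module is a plain set of pathway names, and a module whose best pathway falls outside the cutoff is summarized by its single best-ranked pathway (an assumption about how such a comparison might be tabulated). The function names and toy pathway labels are hypothetical.

```python
from typing import Iterable, List, Sequence, Set, Tuple


def top_k_overlap(ranking: Sequence[str], module_pathways: Iterable[str],
                  k: int = 10) -> Set[str]:
    """Pathways that are both in the top k of the ranking and in the module."""
    return set(ranking[:k]) & set(module_pathways)


def module_rows(ranking: Sequence[str], module_pathways: Iterable[str],
                cutoff: int = 100, n_listed: int = 3) -> List[Tuple[int, str]]:
    """Return (rank, pathway) pairs to report for one module.

    If the module's best pathway ranks within `cutoff`, report its `n_listed`
    best-ranked pathways; otherwise report only the single best one (assumed rule).
    """
    rank_of = {name: i + 1 for i, name in enumerate(ranking)}  # rank 1 = best
    ranked = sorted((rank_of[p], p) for p in module_pathways if p in rank_of)
    if not ranked:
        return []
    n = n_listed if ranked[0][0] <= cutoff else 1
    return ranked[:n]


# Toy example with hypothetical pathway names and small cutoffs.
ranking = ["p1", "p2", "p3", "p4", "p5", "p6"]
module_a = {"p2", "p5", "p6", "not_ranked"}
module_b = {"p4", "p6"}

rows_a = module_rows(ranking, module_a, cutoff=3, n_listed=2)
rows_b = module_rows(ranking, module_b, cutoff=3, n_listed=2)
overlap_a = top_k_overlap(ranking, module_a, k=2)
print(rows_a, rows_b, overlap_a)
```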

We further explored the ability of neural networks to map transcriptomes to other relevant patient image features by training models to predict stage and histology. While neural networks were better than other classifiers at predicting stage and histology in the training dataset (based on scores from 10-fold CV), stage was poorly estimated in the testing dataset. However, the histology neural network achieved a test AUC of 0.77 when averaged across histology types.

Several prior works for predicting NSCLC histology using radiomics have been based on differentiating ADC from SCC. For example, using five radiomic features, a study that trained a Naive Bayes model to distinguish ADC from SCC achieved a 0.72 AUC in a test set of 152 patients.28 In another similar study, a radiomic signature was able to distinguish ADC from SCC using a logistic regression model and 129 patients; the authors reported a 0.893 AUC in a test subset of 48 patients.29 More recent work reported logistic models that achieved 0.694, 0.780, 0.800, and 0.923 AUC for clinical, standard CT features, radiomics, or all three, respectively, to predict ADC versus SCC.30 The AUC scores were observed in a cohort of 93 patients but are likely overly optimistic given that there was no validation or test set to evaluate their models. In contrast, our neural network model used gene expression profiles to predict histology. In a test set of 89 patients, our model achieved a 0.86 test AUC when estimating ADC versus all other (SCC and other) histology types and a 0.91 test AUC when estimating SCC versus all other types (ADC and other). One notable study analyzed the association between CT-derived radiomic features and digital pathology-derived pathomic features to differentiate between two NSCLC subtypes.31
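
The per-class AUCs quoted above treat each histology type as a separate "one class versus all others" problem. The following minimal Python sketch shows one way such one-vs-rest AUCs and their unweighted average could be computed from class probabilities. The probabilities and labels are made-up toy values, the function names are hypothetical, and the unweighted averaging is an assumption rather than the procedure confirmed by the study.

```python
from typing import Dict, List, Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def one_vs_rest_auc(probs: Sequence[Sequence[float]], labels: Sequence[str],
                    classes: Sequence[str]) -> Dict[str, float]:
    """AUC of each class against all remaining classes pooled together."""
    result = {}
    for k, cls in enumerate(classes):
        scores = [p[k] for p in probs]                  # predicted probability of cls
        binary = [1 if y == cls else 0 for y in labels]  # cls versus everything else
        result[cls] = auc(scores, binary)
    return result


# Toy predictions for six hypothetical cases (columns follow `classes`).
classes = ["ADC", "SCC", "other"]
probs: List[List[float]] = [
    [0.7, 0.2, 0.1],
    [0.6, 0.3, 0.1],
    [0.2, 0.7, 0.1],
    [0.3, 0.5, 0.2],
    [0.4, 0.3, 0.3],
    [0.5, 0.4, 0.1],
]
labels = ["ADC", "ADC", "SCC", "SCC", "other", "other"]

aucs = one_vs_rest_auc(probs, labels, classes)
macro_auc = sum(aucs.values()) / len(aucs)
print(aucs, macro_auc)
```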

Additionally, our models performed differently when predicting histology compared with stage. The difference is likely because staging is based on factors such as tumor size, location, and spread to lymph nodes or metastasis to other sites, which is information that may not be readily seen in the gene expression of the collected tumor tissue. On the other hand, histology classification is based on the molecular and physical characterization of the collected sample, which is likely more related to the transcriptomic profile of the tumor. Subsequently, we extracted the learned associations between gene expression and histology types with gene masking and compared them with two previous studies that reported gene expression signatures to predict NSCLC histology.17,18 The studies’ gene signatures were predictive in our models as well, indicating that our neural networks found associations similar to those reported in prior work. In our model, other gene sets, such as the hypoxia (200 genes) and angiogenesis (36 genes) Hallmark gene sets and the neuronal (156 genes) GO gene set, were also associated with histology and may have the potential to help automate or standardize the assignment of histology in the future.

This study has several notable limitations. The retrospective datasets used in this analysis were from two different sources with varying imaging protocols, which can add variance to radiomic feature values. A majority of patients had early-stage cancer, and thus late-stage patients were left out of the radiogenomic analysis. Sample size is an inherent concern of radiogenomic analysis because datasets with paired imaging and genomic data are difficult to obtain and therefore small. Interpreting the biological significance of radiomic features is challenging, and researchers are currently attempting to understand their correlation with tumor biology and other clinical traits. Radiomic features were binarized using clustering, and radiomic features with a minority class below 10% were removed. The tumor tissue samples used for transcriptome profiling are limited in that only one sample was acquired per patient, which may not fully capture tumor heterogeneity. These factors make it challenging to validate the relationships in radiogenomic models. The reported findings should be interpreted as possible associations and require further clinical or animal studies to validate.
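
The binarization and filtering step mentioned above can be illustrated with a short Python sketch. The text states only that clustering was used, so the two-cluster, one-dimensional k-means below is an assumption chosen for illustration, as are the function names and toy values. The 10% minority-class threshold is taken from the text.

```python
from typing import List, Sequence


def two_means_1d(values: Sequence[float], n_iter: int = 100) -> List[int]:
    """Binarize values with 1-D k-means (k=2); label 1 marks the higher-valued cluster."""
    lo, hi = min(values), max(values)  # initialize the two centers at the extremes
    if lo == hi:
        return [0] * len(values)       # constant feature: a single class
    labels: List[int] = []
    for _ in range(n_iter):
        labels = [1 if abs(v - hi) < abs(v - lo) else 0 for v in values]
        low = [v for v, lab in zip(values, labels) if lab == 0]
        high = [v for v, lab in zip(values, labels) if lab == 1]
        new_lo, new_hi = sum(low) / len(low), sum(high) / len(high)
        if (new_lo, new_hi) == (lo, hi):
            break                      # centers stopped moving
        lo, hi = new_lo, new_hi
    return labels


def minority_fraction(labels: Sequence[int]) -> float:
    """Fraction of cases in the smaller of the two classes."""
    ones = sum(labels)
    return min(ones, len(labels) - ones) / len(labels)


def keep_feature(values: Sequence[float], min_minority: float = 0.10) -> bool:
    """Keep a feature only if its minority class holds at least `min_minority` of cases."""
    return minority_fraction(two_means_1d(values)) >= min_minority


# Toy feature values (hypothetical): one balanced enough, one too skewed.
balanced = [1.0, 1.2, 0.9, 1.1, 5.0, 5.2, 1.0, 0.8, 1.3, 4.9]
skewed = [1.0] * 19 + [9.0]

balanced_labels = two_means_1d(balanced)
print(balanced_labels, keep_feature(balanced), keep_feature(skewed))
```

Initializing the two centers at the minimum and maximum keeps both clusters non-empty for any non-constant feature, which avoids a division by zero in the center update.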

In future work, these issues may be addressed by the standardization of imaging protocols to allow for consistent comparison of image features and by maps such as genomic atlases to better characterize whole tumors. A larger sample size could result in a more complete representation of the general population and allow for modeling of radiomic features as continuous outputs. The ranked gene sets are currently interpreted based on GO descriptions. A more quantitative approach to compare the ranked gene sets between radiomic features, such as semantic similarity of GO terms,32 and a sensitivity analysis of resultant radiogenomic associations are needed. In addition to the transcriptome, there are likely other contributing factors to tumor image features, such as other molecular data (e.g., gene mutations and methylation) and patient covariates (e.g., smoking status). These factors could either be incorporated into the modeling process or used to stratify analyses. Additionally, assessing how knowledge of such radiogenomic associations at the time of tumor biopsy relates to survival or treatment projection would be highly beneficial.

Conclusion



In this study, we present deep neural networks for mapping gene expressions to radiomic features or clinical traits in patients with non-small cell lung cancer. Our models are evaluated using public datasets. Neural networks are capable of modeling high-dimensional gene expression to predict tumor image features and lung cancer histology. We further interpret the models through gene masking and report the learned relationships between gene expression and a radiomic feature or histology type. We find that the network is capable of replicating previously reported associations while identifying new associations. The reported associations could be further studied to improve the automated classification of histology, predict specific gene expression profiles of patients presenting with an observable imaging phenotype, and develop a knowledge base of associations between imaging phenotypes and gene expression profiles that would be useful in informing individualized treatment planning.

Disclosures


The authors declare no potential conflicts of interest.

Acknowledgments


This work is supported in part by the National Institutes of Health [F31CA221061 and T32EB016640 to N.F.S., U01CA196408 and R01CA210360 to D.R.A., W.H.], the National Science Foundation [#1722516 to W.H.], and the Department of Radiological Sciences through the Integrated Diagnostics (IDx) Shared Resource. Computational credit for use of Amazon Web Services was awarded through a partnership with the UCLA Department of Computational Medicine.

Code Availability

Source code for the analysis described in this paper will be made available at

References



1. E. Segal et al., “Decoding global gene expression programs in liver cancer by noninvasive imaging,” Nat. Biotechnol. 25, 675–680 (2007).
2. M. Diehn et al., “Identification of noninvasive imaging surrogates for brain tumor gene-expression modules,” Proc. Natl. Acad. Sci. U. S. A. 105, 5213–5218 (2008).
3. P. O. Zinn et al., “Radiogenomic mapping of edema/cellular invasion MRI-phenotypes in glioblastoma multiforme,” PLoS One 6, e25451 (2011).
4. R. R. Colen et al., “Imaging genomic mapping of an invasive MRI phenotype predicts patient outcome and metabolic dysfunction: a TCGA glioma phenotype research group project,” BMC Med. Genom. 7, 30 (2014).
5. K. Chang et al., “Residual convolutional neural network for the determination of IDH status in low- and high-grade gliomas from MR imaging,” Clin. Cancer Res. 24, 1073–1081 (2018).
6. R. Ha et al., “Predicting breast cancer molecular subtype with MRI dataset utilizing convolutional neural network algorithm,” J. Digital Imaging 32, 276–282 (2019).
7. H. X. Bai et al., “Imaging genomics in cancer research: limitations and promises,” Br. J. Radiol. 89, 20151030 (2016).
8. M. Avanzo et al., “Machine and deep learning methods for radiomics,” Med. Phys. 47(5), e185–e202 (2020).
9. P. Korfiatis et al., “Residual deep convolutional neural network predicts MGMT methylation status,” J. Digital Imaging 30, 622–628 (2017).
10. S. Li et al., “A novel radiogenomics framework for genomic and image feature correlation using deep learning,” in IEEE Int. Conf. Bioinf. Biomed., pp. 1–8 (2018).
11. N. F. Smedley, S. El-Saden and W. Hsu, “Discovering and interpreting transcriptomic drivers of imaging traits using neural networks,” Bioinformatics 36(11), 3537–3548 (2020).
12. P. Grossmann et al., “Defining the biological basis of radiomic phenotypes in lung cancer,” Elife 6, e23421 (2017).
13. H. J. Aerts et al., “Decoding tumour phenotype by noninvasive imaging using a quantitative radiomics approach,” Nat. Commun. 5, 4006 (2014).
14. E. R. Velazquez et al., “A semiautomatic CT-based ensemble segmentation of lung tumors: comparison with oncologists’ delineations and with the surgical specimen,” Radiother. Oncol. 105(2), 167–173 (2012).
15. M. D. Zeiler and R. Fergus, “Visualizing and understanding convolutional networks,” in Eur. Conf. Comput. Vision, pp. 818–833 (2014).
16. A. Liberzon et al., “Molecular signatures database (MSigDB) 3.0,” Bioinformatics 27, 1739–1740 (2011).
17. L. Girard et al., “An expression signature as an aid to the histologic classification of non-small cell lung cancer,” Clin. Cancer Res. 22(19), 4880–4889 (2016).
18. J. Hou et al., “Gene expression-based classification of non-small cell lung carcinomas and survival prediction,” PLoS One 5, e10312 (2010).
19. A. Liberzon et al., “The molecular signatures database hallmark gene set collection,” Cell Syst. 1, 417–425 (2015).
20. S. Heavey, K. J. O’Byrne and K. Gately, “Strategies for co-targeting the PI3K/AKT/mTOR pathway in NSCLC,” Cancer Treat. Rev. 40, 445–456 (2014).
21. B. Zhang, “Rho GDP dissociation inhibitors as potential targets for anticancer treatment,” Drug Resist. Updates 9, 134–141 (2006).
22. M. O’Hayre, M. S. Degese and J. S. Gutkind, “Novel insights into G protein and G protein-coupled receptor signaling in cancer,” Curr. Opin. Cell Biol. 27, 126–135 (2014).
23. S. Yamamoto et al., “ALK molecular phenotype in non-small cell lung cancer: CT radiogenomic characterization,” Radiology 272, 568–576 (2014).
24. H. J. Aerts et al., “Defining a radiomic response phenotype: a pilot study using targeted therapy in NSCLC,” Sci. Rep. 6, 33860 (2016).
25. S. Rizzo et al., “CT radiogenomic characterization of EGFR, K-RAS, and ALK mutations in non-small cell lung cancer,” Eur. Radiol. 26, 32–42 (2016).
26. O. Gevaert et al., “Predictive radiogenomics modeling of EGFR mutation status in lung cancer,” Sci. Rep. 7, 41674 (2017).
27. H. J. Yoon et al., “Deciphering the tumor microenvironment through radiomics in non-small cell lung cancer: correlation with immune profiles,” PLoS One 15, e0231227 (2020).
28. W. Wu et al., “Exploratory study to identify radiomics classifiers for lung cancer histology,” Front. Oncol. 6, 71 (2016).
29. X. Zhu et al., “Radiomic signature as a diagnostic factor for histologic subtype classification of non-small cell lung cancer,” Eur. Radiol. 28, 2772–2778 (2018).
30. S. R. Digumarthy et al., “Can CT radiomic analysis in NSCLC predict histology and EGFR mutation status?,” Medicine (Baltimore) 98, e13963 (2019).
31. C. Alvarez-Jimenez et al., “Identifying cross-scale associations between radiomic and pathomic signatures of non-small cell lung cancer subtypes: preliminary results,” Cancers 12(12) (2020).
32. G. Yu et al., “GOSemSim: an R package for measuring semantic similarity among GO terms and gene products,” Bioinformatics 26, 976–978 (2010).


Nova F. Smedley was a doctoral student in the Medical & Imaging Informatics group within the Department of Bioengineering at UCLA. Under the supervision of Dr. Hsu, her research focused on using machine learning approaches for improving the radiogenomic and prognostic analyses of cancer patients. She was a recipient of a National Institutes of Health fellowship, the Ruth L. Kirschstein Predoctoral Individual National Research Service Award.

Denise R. Aberle is professor and vice-chair of radiological sciences in the David Geffen School of Medicine and professor of bioengineering in the Samueli School of Engineering. She is board-certified in internal medicine and diagnostic radiology. She was the principal investigator of the American College of Radiology Imaging Network component of the National Lung Screening Trial ACRIN-NLST. Her research centers on lung cancer screening, early diagnosis, and prevention and screening implementation.

William Hsu is associate professor of radiological sciences and bioinformatics at the David Geffen School of Medicine at UCLA and associate professor of bioengineering in the Samueli School of Engineering. His research interests include multimodal data integration, machine learning, and imaging informatics. He directs the Integrated Diagnostics Shared Resource, which collects spatially registered radiology–pathology images along with clinical and molecular data to improve cancer screening and diagnosis.

© 2021 Society of Photo-Optical Instrumentation Engineers (SPIE)
Nova F. Smedley, Denise R. Aberle, and William Hsu "Using deep neural networks and interpretability methods to identify gene expression patterns that predict radiomic features and histology in non-small cell lung cancer," Journal of Medical Imaging 8(3), 031906 (8 May 2021).
Received: 21 August 2020; Accepted: 13 April 2021; Published: 8 May 2021