External validation of a mammographic texture marker for breast cancer risk in a case–control study

Abstract. Purpose: The pattern of dense tissue on a mammogram appears to provide information beyond overall density for risk assessment, but there has been little consistency in the texture measures identified. The purpose of this study is thus to validate, in a new setting, a mammographic texture feature developed in a previous study. Approach: A case–control study (316 invasive cases and 1339 controls) of women in Virginia, USA was used to validate a mammographic texture feature (MMTEXT) derived in an independent previous study. Analysis of predictive ability was adjusted for age, demographic factors, questionnaire risk factors (combined through the Tyrer-Cuzick model), and optionally BI-RADS breast density. Odds ratios per interquartile range (IQ-OR) in controls were estimated. Subgroup analysis assessed heterogeneity by mode of cancer detection (94 not detected by mammography). Results: MMTEXT was not a significant risk factor at the 0.05 level after adjusting for classical risk factors (IQ-OR=1.16, 95%CI 0.92 to 1.46), nor after further adjustment for BI-RADS density (IQ-OR=0.92, 95%CI 0.76 to 1.10). There was weak evidence that MMTEXT was more predictive for cancers that were not detected by mammography (unadjusted for density: IQ-OR=1.46, 95%CI 0.99 to 2.15 versus 1.03, 95%CI 0.79 to 1.35, Phet 0.10; adjusted for density: IQ-OR=1.11, 95%CI 0.70 to 1.77 versus 0.76, 95%CI 0.55 to 1.05, Phet 0.21). Conclusions: MMTEXT is unlikely to be a useful imaging marker for invasive breast cancer risk assessment in women attending mammography screening. Future studies may benefit from a larger sample size to confirm this, as well as from developing and validating other measures of risk. This negative finding demonstrates the importance of external validation.


Introduction
high risk who would be potential candidates for risk-reducing surgery or preventive therapy; 4 delineation of populations at moderately enhanced risk who might benefit from enhanced screening; 5 and more recently, identification of populations at sufficiently low risk as not to require screening or risk management. 6 Breast cancer has a relatively well established hormonal aetiology, in addition to a growing body of knowledge on genetic risk factors. 7,8 Although existing risk models have shown a degree of accuracy in prediction [the area under the receiver operating characteristic curve (AUC) ranges around 0.56 to 0.77 for different predictors], it is clear that there is room for improvement. 9,10 One area that offers hope for improved risk assessment is utilization of digital mammographic image features. Mammographic density, which is broadly defined as the amount of radio-opaque tissue, is well known as an important independent risk factor for breast cancer. [11][12][13] Some previous research has tried to improve mammographic density risk assessment by looking at other image features of a mammogram, and computational advances in machine learning are starting to spur more work. [14][15][16][17] A limitation with much of the literature looking at textural or other features from mammograms has been reproducibility. 18 This study was performed to validate a previously developed texture marker as a breast cancer risk feature by testing its use in an independent case-control dataset. 19 The texture marker measures the dispersion of breast density within the mammogram. It was previously found to be associated with breast cancer risk in a case-control study of women in Manchester, UK [odds ratio per standard deviation (SD-OR) = 1.36], where a subgroup analysis suggested it was most predictive for interval cancers that were detected between routine screening rounds (SD-OR = 2.09). 19 We refer to the marker as MMTEXT for the rest of this paper.
Our primary objective was to assess whether MMTEXT is a risk factor for breast cancer in Virginia, USA, after allowing for other classical risk factors from a questionnaire, and for BI-RADS breast density. The prespecified hypothesis was that MMTEXT is a risk factor after adjusting for classical risk factors.

Study Design
All women, 18 to 89 years of age, diagnosed with breast cancer for the first time at the University of Virginia (UVa) between 2003 and 2013 who had a digital contralateral mammogram at the time of diagnosis were eligible as cases. Case status (invasive breast cancer) was confirmed through chart review. The average time from mammogram to diagnosis of breast cancer was 6 months. All women without a breast cancer diagnosis but identified as having a digital mammogram at UVa during 2003 to 2008 (the more recent being at most 5 years prior to completing the questionnaire, and also one at least 5 years before the questionnaire) were eligible as controls. To ensure a similar age distribution, controls were selected based on frequency matching of current age. Risk factor information at the time of questionnaire was retrospectively collected for cases and controls between May 2012 and December 2013, using a self-reported electronic questionnaire that was administered in breast imaging, breast surgery clinic, or medical oncology clinic as previously described. 20 Women who were eligible as cases but had not been seen at UVa for more than two years before the initiation of patient recruitment were sent a letter for survey completion either by mail or by Internet through an electronic token. Women were excluded if they had breast augmentation, prior contralateral mastectomy, or bilateral breast cancer at the time of initial diagnosis, as these may affect breast density measurement.
UVa is a public institution that provides reduced fee health care based on need, such that women with greater burden of disease and low resources are frequently referred for care. Thus some differences between cases and controls were expected because controls would mostly include women attending regular screening provided by a health plan, but cases might not. As a result, we included several demographic factors as adjustments in the analysis. These were the concentric geographical area surrounding UVa, health insurance, whether the woman had been assessed for financial assistance, ethnicity, education, and body mass index (BMI); age in 5-year groups was also adjusted for, following the study design. Classic hormonal and reproductive risk factors from the questionnaire were combined for adjustment using 10-year risk from the Tyrer-Cuzick model (version 7.02; note this version does not incorporate breast density). 3 Only women aged 40 to 79 years at mammogram were included in order to reflect risk assessment for women attending screening.
Full-field digital mammogram ("for processing") DICOM files from Senographe 2000D, Senographe DS, and Senographe Essential (GE Healthcare, Chicago, Illinois) and Lorad Selenia and Selenia Dimensions (Hologic, Marlborough, Massachusetts) machines were retrieved. Approximately 80% were from GE machines, and data from Hologic machines were excluded from the validation of MMTEXT. The reason is that Hologic machines were not used to train MMTEXT, and MMTEXT was higher on Hologic machines (readers are referred to Sec. 5 for regression analysis results showing the impact of different machines). The native resolution was 100 μm for the Senographe systems. MMTEXT only uses craniocaudal (CC) views. BI-RADS density category was obtained from clinical records, which were based on the fourth edition lexicon due to the time frame of the study population.
This case-control study was approved by the institutional review boards at the University of Virginia and Sunnybrook Research Institute. The study was compliant with the Health Insurance Portability and Accountability Act. Patients participating on site gave written consent. Patients participating remotely through electronic media were granted waiver of consent.

Risk Marker
MMTEXT was calculated as previously described. 19 Briefly, image resolution was first downsized by three factors (16, 32, and 64, using images at the same resolution as previously 19 ), leading to three new images in which each pixel had a much larger physical area than the original (respectively, $16^2 = 256$, 1024, and 4096 times greater), and with an "average" intensity in that area. To ensure the resulting images are comparable with potential images with different resolutions, all images were downsized to the same target resolutions as previously. 19 Pixel intensities within the breast were standardized by histogram equalization into 10 bins so that the darkest 10% of all pixels were in bin 1 and the whitest 10% of pixels were in bin 10. A co-occurrence matrix was obtained to give the proportion $p_k(i,j)$ of pixel bin $i = 1, \ldots, 10$ next to pixel bin $j = 1, \ldots, 10$ (in all eight directions) for downsize factor $k = 1, 2, 3$ (so that $\sum_i \sum_j p_k(i,j) = 1$). MMTEXT was calculated as a weighted summation of the so-called "sum average" $t_k = \sum_{i=1}^{10} \sum_{j=1}^{10} (i + j)\, p_k(i,j)$ for downsize factors $k = 1, 2, 3$. On a standardized scale, where the mean and standard deviation of $t_k$ are, respectively, zero and unity for each downsize factor $k = 1, 2, 3$, the weights were 30%, 25%, and 45%, respectively, as earlier. 19 Code to extract MMTEXT from digital mammograms was written by CW using MATLAB software. 21 Only JM had access to the mammograms for this study and was blinded to case-control status. JM provided the mammographic texture risk score to ARB for analysis, and a list of mammograms to exclude on the basis of automated quality-control software for the images (e.g., to remove mammograms with a nonstandard view, spot compression).
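The steps described above can be illustrated with a minimal Python sketch. It is not the MATLAB code referred to in the text: the function names are hypothetical, the whole image is treated as breast (no breast mask is applied), ties in the rank-based binning are broken by pixel position, and the reference means and standard deviations used to standardize each sum average (`ref_means`, `ref_sds`) are placeholders that would have to come from the development data.

```python
# Illustrative sketch only; names and simplifications are assumptions.
NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
              (0, 1), (1, -1), (1, 0), (1, 1)]  # all eight directions
WEIGHTS = (0.30, 0.25, 0.45)  # weights for k = 1, 2, 3 as stated in the text


def downsize(image, factor):
    """Block-average a 2D list of intensities by an integer factor."""
    rows, cols = len(image) // factor, len(image[0]) // factor
    return [[sum(image[r * factor + dr][c * factor + dc]
                 for dr in range(factor) for dc in range(factor)) / factor ** 2
             for c in range(cols)] for r in range(rows)]


def equalize(image, n_bins=10):
    """Rank-based binning: bin 1 holds the darkest pixels, bin n_bins the whitest."""
    flat = sorted((v, r, c) for r, row in enumerate(image)
                  for c, v in enumerate(row))
    binned = [[0] * len(image[0]) for _ in image]
    for rank, (_, r, c) in enumerate(flat):
        binned[r][c] = rank * n_bins // len(flat) + 1
    return binned


def cooccurrence(binned, n_bins=10):
    """Proportion p(i, j) of bin-i pixels adjacent to bin-j pixels."""
    counts = [[0] * n_bins for _ in range(n_bins)]
    total = 0
    rows, cols = len(binned), len(binned[0])
    for r in range(rows):
        for c in range(cols):
            for dr, dc in NEIGHBOURS:
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    counts[binned[r][c] - 1][binned[rr][cc] - 1] += 1
                    total += 1
    if total == 0:
        raise ValueError("image too small to have neighbouring pixels")
    return [[n / total for n in row] for row in counts]


def sum_average(p):
    """t = sum over i, j of (i + j) * p(i, j), with 1-based bin labels."""
    n = len(p)
    return sum((i + j + 2) * p[i][j] for i in range(n) for j in range(n))


def mmtext(image, ref_means, ref_sds, factors=(16, 32, 64)):
    """Weighted sum of standardized sum averages over the three factors."""
    score = 0.0
    for w, f, m, s in zip(WEIGHTS, factors, ref_means, ref_sds):
        t_k = sum_average(cooccurrence(equalize(downsize(image, f))))
        score += w * (t_k - m) / s
    return score
```

Because each sum average grows when high-numbered (white) bins sit next to other high-numbered bins, the sketch reproduces the qualitative behaviour described in the text: widely dispersed dense tissue gives larger values.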

Statistical Methods
The mean value of MMTEXT from left and right CC views was used for controls, but only the contralateral breast was used for cases to limit bias from a dense area due to cancer. MMTEXT was standardized to unit standard deviation and zero mean in controls. It was assessed as a risk factor after adjustment for differences between cases and controls due to demographic factors, age at mammogram, BMI, estimated risk from the Tyrer-Cuzick model (version 7.02), 3,22 and with or without adjustment for BI-RADS breast density.
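As a rough illustration of the two conventions in this paragraph, the following Python sketch averages the two CC views for controls, keeps only the contralateral view for cases, and standardizes scores to the control distribution. The function names and the use of the sample standard deviation (n − 1 denominator) are assumptions made for the example, not details given in the text.

```python
from statistics import mean, stdev


def per_woman_score(left_cc, right_cc, cancer_side=None):
    """Controls (cancer_side=None): mean of left and right CC views.
    Cases: the view from the breast contralateral to the cancer."""
    if cancer_side is None:
        return (left_cc + right_cc) / 2
    return right_cc if cancer_side == "left" else left_cc


def standardize_to_controls(scores, is_control):
    """Shift and scale all scores so that controls have mean 0 and SD 1."""
    control_scores = [s for s, ctl in zip(scores, is_control) if ctl]
    mu, sd = mean(control_scores), stdev(control_scores)  # sample SD assumed
    return [(s - mu) / sd for s in scores]
```

Standardizing on controls only means that a case's score is read as a number of control standard deviations from the control mean.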
Ten-year Tyrer-Cuzick risk was calculated using age at the mammogram, and age at menopause data input was updated accordingly. Other factors in the model were entered following the questionnaire. The only variables that were not included in the Tyrer-Cuzick risk assessment were prior benign breast disease and hormone replacement therapy use, because they were not available.
Spearman correlation was calculated in controls between MMTEXT and standard prognostic variables: age, BMI, 10-year Tyrer-Cuzick risk, and breast density. A generalized additive model was used to show trend lines for age and BMI using tensor splines in controls; 23 association between MMTEXT and BI-RADS density was inspected using boxplots.
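For readers unfamiliar with the statistic, Spearman correlation is the Pearson correlation of the ranks. The sketch below is a plain Python illustration with tied values given their average rank; it is not the R code used in the study, and the function names are hypothetical.

```python
def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[pos]]):
            end += 1
        shared = (pos + end) / 2 + 1  # mean of ranks pos+1 .. end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

Working on ranks makes the measure suitable for an ordinal variable such as BI-RADS category and for monotone but nonlinear associations such as those with age and BMI.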
ORs and likelihood-ratio $\chi^2$ statistics for MMTEXT were obtained from a logistic regression model that was adjusted for demographic factors, the logarithm of the 10-year Tyrer-Cuzick risk, and optionally BI-RADS density. To compare MMTEXT risk with categorical BI-RADS density, frequency matching of controls was applied, and likelihood-ratio $\chi^2$ trend tests were used. Adjusted receiver operating characteristic curves based on the empirical distribution of errors from linear regression models for density and MMTEXT in controls were used to compute adjusted areas under the curve (aAUC), 24 with nonparametric empirical bootstrap confidence intervals.
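To make the IQ-OR scale concrete, the sketch below shows one common way to convert a per-unit log odds ratio from a logistic regression into an odds ratio per interquartile range in controls. It assumes that the IQ-OR is obtained as exp(β × IQR) with a Wald-type interval; the text does not state how the intervals were computed, so this and the function names are assumptions. The quartile rule is that of Python's `statistics.quantiles` (default "exclusive" method), which can differ slightly from the R default.

```python
import math
from statistics import quantiles


def interquartile_range(control_values):
    """Q3 minus Q1 of the marker among controls."""
    q1, _, q3 = quantiles(control_values, n=4)
    return q3 - q1


def iq_odds_ratio(beta, se, iqr, z=1.96):
    """Odds ratio per IQR with an approximate 95% Wald interval.

    beta and se are the log odds ratio per unit of the marker and its
    standard error, as would be reported by a fitted logistic regression.
    """
    point = math.exp(beta * iqr)
    lower = math.exp((beta - z * se) * iqr)
    upper = math.exp((beta + z * se) * iqr)
    return point, lower, upper
```

For example, `iq_odds_ratio(beta, se, interquartile_range(control_scores))` would express the association as the odds ratio between a woman at the 75th and a woman at the 25th percentile of the control distribution.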
Heterogeneity of MMTEXT by age, BMI, Tyrer-Cuzick risk, and density was assessed using adjusted logistic regression interaction $\chi^2$ tests. Heterogeneity of MMTEXT by mode of detection (mammography/unknown versus none) was tested using a logistic regression for cases only with specific mode of detection as the outcome, adjusted for demographic factors, Tyrer-Cuzick risk, and optionally BI-RADS density. This subgroup analysis was predefined and tested because of results from the development study. 19 The model fit and assumed linear effect of MMTEXT in the logistic regression was assessed using a generalized additive model with tensor splines for the predictor.
All analyses were undertaken using statistical software R 3.4.1, and with the boot and mgcv packages. [25][26][27]

Results
The flow of patients is shown in a flow diagram (Fig. 1). Demographic differences were, as expected, apparent between cases and controls (Table 1): cases were more likely to live further away from UVa than controls and be assessed for financial assistance; controls were more likely than cases to have a higher level of education, private health insurance, and be white. Cases were at a higher risk from classical risk factors (Tyrer-Cuzick risk) after adjustment for age and [...] Table 3). MMTEXT was negatively correlated with age at mammogram (Spearman correlation ρ = −0.19) and BMI (ρ = −0.37), with an overall nonlinear association shown by Fig. 2. MMTEXT had a small correlation with 10-year risk from the Tyrer-Cuzick model (Spearman ρ = 0.07, P = 0.007). MMTEXT was strongly positively associated with BI-RADS breast density (Fig. 3, Spearman ρ = 0.67).
BI-RADS density was associated with close to a threefold difference in risk between the very dense and fatty categories [OR 2.97 (95%CI 1.58 to 5.57), LR-$\chi^2_1$ 15.2 (trend), aAUC 0.54 (95%CI 0.50 to 0.57)] after adjustment for demographic factors and classical risk factors (Table 2). A much weaker risk difference was observed for MMTEXT [OR 1.27

Discussion
This independent study aimed to validate MMTEXT as a risk factor for breast cancer and to assess whether it provides additional information to classical risk factors for risk assessment. Unlike in the earlier study, 19 the estimated IQ-OR = 1.16 (0.92 to 1.46) is, however, not statistically significant at the 0.05 level. After further adjustment for density the overall predictive ability of MMTEXT was reduced, indicating that much of the information for risk assessment is related to breast density. There was weak evidence that MMTEXT was more predictive for cancers that were not detected by mammography (IQ-OR 1.46, 95%CI 0.99 to 2.15); however, after adjustment for breast density, the predictive ability decreased (IQ-OR = 1.11, 95%CI 0.70 to 1.77). This is the first study to assess external validity of MMTEXT and we were able to adjust for classical risk factors using a validated risk model, which is the most important test of a new biomarker. 28 Image quality was high because full-field digital mammography was used (not scanned film as in much previous work 18 ) and comparable with the development study.
This study nevertheless differs from the earlier study 19 in a number of aspects, which might partly explain the different findings. The assessment used a different population than the development sample. A notable difference is that all the cancer cases in this study are invasive, whereas almost a quarter of the cases used for variable selection and model training in the development study were ductal carcinoma in situ (DCIS). Invasive cancers differ from DCIS mammographically: invasive cancers most often manifest as noncalcified masses and so are more subtle or occult than DCIS. 29 It is possible that the different composition of cancer types affected the results. The demographic factors between the two studies also differ: for example, the percentage of women aged over 70 years was around 7% in the development study but around 16% in this validation study; the percentage of non-white women was also much higher for cases in this study (17% versus 8%).
Although this study failed to find evidence of predictive ability of MMTEXT, BI-RADS was a strong predictor for breast cancer (also in the wider cohort, see Ref. 30). It is arguable that (currently) there is no better measure of mammographic density than visual assessment from an expert, 31 and in our previous analysis of mammographic density for risk assessment we found that BI-RADS density conferred slightly more predictive information than a volumetric method on the same data. 30 Another cohort study has confirmed the association of BI-RADS density with risk in this case-control study, when used in combination with classic risk factors during a follow-up of 19 years. 20 As discussed above, our subgroup analysis indicates that there may be potential merit in MMTEXT for cancers that were not detected by mammography. If this can be supported by a future study with a larger sample, there are potential implications for future clinical value. If true, then MMTEXT might have a role as a marker for risk of interval cancer due to masking from mammography. One area where this would be useful is in determining eligibility for supplemental screening modalities, such as ultrasound or magnetic resonance imaging. Inspection of the mathematical formula shows that MMTEXT is associated with breast density because it is maximized when white areas of the image are surrounded by other white areas, so that images with breast density widely dispersed on the mammogram will have greater MMTEXT values than those with smaller dense areas. A limitation of visual assessment of breast density is the time and expert resources required. In situations where these are important issues, the fully automatic, objective, and freely available MMTEXT might be considered. 21 MMTEXT currently requires raw mammographic images as it was developed on such images, which may limit its application.
Although the same method 19 can also be trained on processed images, making it suitable for processed images, a potential barrier is that manufacturers' proprietary processing algorithms may result in images less comparable between different machines. It would nevertheless be interesting for a future study to apply the algorithm described previously 19 and test it on processed images.
Other limitations of the study include the following. First, controls differed from cases due to geography and other socioeconomic and demographic factors, and we needed to adjust for these differences as far as possible in the analysis. Second, self-reported BMI at the time of questionnaire was used, and we had no validation of the self-reported anthropometric measures. This is similar to the development study but is expected to have a minimal impact on the overall findings, because it has been seen elsewhere that self-reported measures are likely to be sufficiently accurate. 32 Third, there is a possible survivorship bias because some women died with breast cancer before the questionnaire was available. However, this is unlikely to lead to an overstatement of the main findings and is more likely to weaken them: on average the deceased cases will have been diagnosed at a more advanced stage than those alive, and since density is associated with later diagnosis (masking), this bias might be expected to attenuate the predictive ability of MMTEXT. Fourth, it was not possible to include cases who did not respond to the request to complete a questionnaire (n = 47 aged 40 to 79 years). However, if they had been available then the number of cases would only have increased by 10%, and it seems unlikely that nonresponse is associated with mammographic density or MMTEXT other than through the factors adjusted for in the analysis such as age and demographics; this issue is also expected to have minimal impact on the main findings. Finally, both the UVa and earlier Manchester studies were of predominantly white women and used case-control designs.
In conclusion, data from this study do not support risk assessment for invasive breast cancers using MMTEXT, a fully automatic digital mammographic texture risk factor based on raw ("for processing") DICOM files. This negative finding demonstrates the importance of external validation.
Future studies may focus on developing and validating other measures of risk. MMTEXT may, however, have potential for cancers that are not detected by mammography. Further studies are required to verify this, including longer-term effects in cohort studies.

Appendix
The regression analysis results after adjustment for different machines are shown in Tables 5 and 6 below. The tables include data from women with GE or Hologic machines; data from Hologic machines were excluded in the primary analysis (see Fig. 1).