Optical identification of subjects at high risk for developing breast cancer

Abstract. Time-domain multiwavelength (635 to 1060 nm) optical mammography was performed on 147 subjects with recent x-ray mammograms available, and average breast tissue composition (water, lipid, collagen, oxy- and deoxyhemoglobin) and scattering parameters (amplitude a and slope b) were estimated. Correlation was observed between optically derived parameters and mammographic density [Breast Imaging and Reporting Data System (BI-RADS) categories], which is a strong risk factor for breast cancer. A logistic regression model was obtained to best identify high-risk (BI-RADS 4) subjects, based on collagen content and scattering parameters. The model presents a total misclassification error of 12.3%, sensitivity of 69%, specificity of 94%, and simple kappa of 0.84, which compares favorably even with intraradiologist assignments of BI-RADS categories.

Breast cancer is a leading cause of death in women and a major health burden worldwide: one in eight women in the United States will be diagnosed with breast cancer in their lifetime. 1 Early diagnosis (tumor size <1 cm, no lymph node involvement) is a key element for complete response in the treatment of breast cancer, with a five-year survival in the range of 93 to 99%. 1 Breast density is a recognized strong and independent risk factor for breast cancer: high breast density involves a four to six times higher risk as compared to low density. 2,3 Several U.S. states have already recognized the importance of knowing whether a subject has high breast density, enacting laws that require mammography providers to add such a notification to the summary of the mammography report. Including breast density in risk prediction models has improved their prediction accuracy. The U.S. Preventive Services Task Force has also suggested the possibility of chemoprevention for women at high risk. 4 Thus, improved risk models could be used to better address not only closer screening of high-risk women but also prevention of breast cancer.
At present, breast density is assessed based on the radiological appearance of breast tissue (mammographic density). Thus, it is known only from the first mammogram, typically at the age of 40 to 50, depending on the country. 5 A tool for its noninvasive estimation would allow the early identification of high-risk women, enabling the design of personalized screening and diagnostic paths. Due to the high incidence of breast cancer and the effectiveness of interventions performed at an early stage, any significant improvement in the diagnostic procedure (especially an earlier diagnosis) would have a strong impact on both the number of spared lives and the quality of life.
[7][8][9] Also, extensive clinical trials showed that raw data on optical attenuation interpreted using principal component analysis strongly correlate with quantitative mammographic features. 10 We have further exploited the potential of diffuse optical spectroscopy operating in the time domain to assess both tissue composition in terms of key constituents (water, lipids, collagen, and hemoglobin) and scattering parameters that are related to the overall structure of tissue at the microscopic level and specifically to breast density. 11,12 Besides the noninvasive assessment of breast density, 13,14 these pieces of information can contribute to a better understanding of the role of mammographic density in breast cancer risk and may even provide a more specific link with breast cancer risk than x-ray measures.
In this work, we propose the use of time-resolved transmittance spectroscopy to noninvasively identify subjects with high breast density, who are at high risk for developing breast cancer.
Our portable clinical instrument for time-resolved optical mammography operates in transmittance geometry on the mildly compressed breast. Time-resolved transmittance data are collected at seven red and near-infrared wavelengths (i.e., 635, 680, 785, 905, 930, 975, and 1060 nm), using picosecond pulsed diode lasers as light sources, and two photomultiplier tubes and personal computer boards for time-correlated single photon counting to detect the time distributions of the transmitted pulses. Injection and collection fibers are scanned in tandem over the compressed breast and data are stored every millimeter. Images are routinely acquired from both breasts in cranio-caudal and oblique (45 deg) views. Time-resolved spectral data are interpreted with the solution of the diffusion equation for an infinite homogeneous slab, using a spectrally constrained global fitting procedure to estimate tissue composition in terms of oxy- and deoxyhemoglobin, water, lipid, and collagen content, as well as scattering parameters (amplitude a and power b). 15 Moreover, for the detection of breast lesions, scattering maps are routinely applied, together with late gated intensity images that are sensitive to spatial changes in the absorption properties. Details on the instrument setup and performance, and on the procedures for data acquisition and analysis, are reported in Ref. 16.
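The scattering amplitude a and power b are commonly tied to the reduced scattering coefficient through an empirical power law. The following is a minimal sketch, assuming the form mu_s'(lambda) = a (lambda / lambda_0)^(-b) with a reference wavelength lambda_0 = 600 nm; both the functional form and lambda_0 are assumptions made here for illustration (the actual fitting procedure is described in Ref. 16), and the numerical values of a and b are hypothetical.

```python
# Wavelengths used by the instrument (nm), as listed in the text.
WAVELENGTHS_NM = [635, 680, 785, 905, 930, 975, 1060]


def reduced_scattering(wavelength_nm, a, b, ref_nm=600.0):
    """Empirical power law: mu_s' = a * (lambda / lambda_0) ** (-b).

    ref_nm (lambda_0) is an assumed reference wavelength, not a value
    taken from the text.
    """
    return a * (wavelength_nm / ref_nm) ** (-b)


# Hypothetical amplitude and power, chosen only to illustrate the shape.
a_demo, b_demo = 12.0, 0.8
spectrum = [reduced_scattering(w, a_demo, b_demo) for w in WAVELENGTHS_NM]

for w, mus in zip(WAVELENGTHS_NM, spectrum):
    print(f"{w} nm: mu_s' = {mus:.2f}")
```

Under this assumed form, a sets the scattering level at the reference wavelength, while a larger b gives a steeper decrease of scattering with wavelength.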
The instrument is presently applied in a clinical study approved by the institutional review board of the European Institute of Oncology. The study has a twofold aim: the optical characterization of malignant and benign breast lesions and the noninvasive assessment of breast density. The present work focuses on the latter aim. Thus, for each subject, all data from the four images (cranio-caudal and oblique views of both breasts) were averaged to provide the average optical properties and breast tissue composition of that subject. Data were collected from 179 patients, recruited between June 2009 and June 2012. Written informed consent was obtained from all of them. For 32 subjects, recent x-ray mammograms were not available; thus they were excluded from further analysis. General patient information for the remaining 147 subjects is as follows: age 52.
The dependence of tissue composition and scattering parameters on mammographic density, classified through BI-RADS categories, was investigated. The results essentially confirm what we have observed previously on a more limited number of subjects. 13 Based on the Wilcoxon test, there is no statistically significant difference between BI-RADS categories 1 and 2 for any parameter except water, while the difference is highly significant for all parameters except oxygenation level (SO2) in the case of BI-RADS 2 versus 3, and for all parameters except SO2 and total hemoglobin content (tHb) in the case of BI-RADS 3 versus 4. Specifically, increasing breast density corresponds to progressively increasing average amounts of water and collagen, while the lipid content decreases gradually. An increase with BI-RADS category is also observed for both scattering amplitude a and slope b, in agreement with the differences in microscopic structure expected for fatty and fibroglandular tissue. The blood parameters (i.e., tHb and SO2) are less sensitive, with only tHb showing a slight increase with mammographic density.
We have also investigated the cross-dependence between optically derived tissue parameters. The results obtained on the linear correlation are summarized in Table 1. The strongest (negative) correlation is observed between lipid and water content, but negative correlation is also evident between lipid and collagen content. Both observations are in agreement with what was expected based on breast tissue composition: moving from adipose to fibroglandular breasts, the amount of adipose tissue with high lipid content decreases and is replaced by connective and epithelial tissue, richer in water and collagen. Marked correlation also exists between the scattering amplitude a and the concentrations of all major tissue constituents. Specifically, the correlation is positive for water and collagen, while it is negative for lipid, consistent with the hypothesis that fibroglandular tissue, rich in water and collagen, is mainly responsible for breast tissue scattering.
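A Pearson correlation estimate with a 95% confidence interval, of the kind summarized in Table 1, can be sketched as follows. The interval here is built with the Fisher z-transform, which is an assumption about the method (the text does not state how the intervals were computed), and the lipid and water values are hypothetical numbers used only to show a negative correlation.

```python
import math


def pearson_r(x, y):
    """Sample Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


def fisher_ci(r, n, z_crit=1.959964):
    """Approximate 95% CI for r via the Fisher z-transform (needs n > 3, |r| < 1)."""
    z = math.atanh(r)
    se = 1.0 / math.sqrt(n - 3)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


# Hypothetical per-subject averages (arbitrary units), NOT data from the study.
lipid = [700, 650, 600, 520, 450, 400]
water = [150, 210, 240, 330, 400, 430]

r = pearson_r(lipid, water)
low, high = fisher_ci(r, len(lipid))
print(f"r = {r:.3f}, 95% CI = ({low:.3f}, {high:.3f})")
```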
To develop a procedure for the identification of high-risk women, the mammographic density was dichotomized, comparing subjects in BI-RADS categories 1 to 3 to subjects in category 4, the latter being at significantly higher risk than all the others. 2 The p values of the Wilcoxon test showed that tHb, lipid, water, collagen, a, and b are significantly different in the two populations considered (at least p < 0.001), while SO2 is not. The best logistic regression model for the risk probability, chosen via a stepwise variable selection minimizing the Akaike information criterion, was

\[ \ln\frac{p_i}{1-p_i} = \beta_0 + \beta_1\,\mathrm{collagen}_i + \beta_2\,a_i + \beta_3\,b_i \qquad (1) \]

where p_i is the probability of belonging to BI-RADS category 4 (high risk) for the i'th subject. Table 2 shows the output of the fitted logistic regression model (point estimates of the coefficients, related standard errors, z-statistics, and p values of testing their significance in the model). The Brier score, i.e., the mean square difference between outcome and estimated probability, is equal to 0.095. Based on Eq. (1) and Table 2, the probability of belonging to the high-risk category depends on collagen concentration and on both scattering parameters. In particular, the strongest dependence occurs for the scattering slope b. Performing in vivo measurements, we have recently observed that a high scattering slope corresponds to high collagen content and possibly depends also on its structure. 17 Collagen content also shows a significant correlation with the scattering amplitude a, as highlighted in Table 1. Thus, both directly and indirectly, collagen seems to be the most crucial feature for the identification of subjects with high breast density.
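The way a logistic regression model on collagen, a, and b yields a high-risk probability, and the way the Brier score summarizes its accuracy, can be sketched as follows. The coefficient values, the subject values, and the names `high_risk_probability` and `brier_score` are hypothetical; they are not the estimates of Table 2.

```python
import math


def high_risk_probability(collagen, a, b, coef):
    """Logistic model: p = 1 / (1 + exp(-eta)), with eta linear in the predictors."""
    eta = (coef["intercept"]
           + coef["collagen"] * collagen
           + coef["a"] * a
           + coef["b"] * b)
    return 1.0 / (1.0 + math.exp(-eta))


def brier_score(outcomes, probabilities):
    """Mean squared difference between 0/1 outcomes and estimated probabilities."""
    return sum((o - p) ** 2 for o, p in zip(outcomes, probabilities)) / len(outcomes)


# Hypothetical coefficients, for illustration only (not those of Table 2).
coef_demo = {"intercept": -12.0, "collagen": 0.03, "a": 0.25, "b": 6.0}

# Hypothetical subjects: (collagen, a, b, outcome), outcome 1 = BI-RADS 4.
subjects = [(60.0, 10.0, 0.5, 0), (90.0, 13.0, 0.7, 0), (150.0, 16.0, 1.0, 1)]

probs = [high_risk_probability(c, a, b, coef_demo) for c, a, b, _ in subjects]
outcomes = [y for _, _, _, y in subjects]
print("probabilities:", [round(p, 3) for p in probs])
print("Brier score:", round(brier_score(outcomes, probs), 3))
```

A Brier score of 0 corresponds to perfect probabilistic predictions, while a model that always outputs 0.5 scores 0.25.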
The receiver operating characteristic curve for our model is reported in Fig. 1. We classify a subject as high risk if the estimated probability [Eq. (1)] is greater than 0.5. The corresponding misclassification matrix is reported in Table 3, where "true" refers to risk classification based on mammographic assessment (BI-RADS categories) and "classified" refers to risk as predicted by the logistic regression fitted on optical data. The data reported in Table 3 correspond to a total misclassification error of 12.3%, sensitivity of 69%, and specificity of 94%. A simple kappa of 0.84 is achieved, to be compared with the reproducibility of BI-RADS assignment among radiologists and even for the same radiologist. Specifically, an intrarater agreement of 77% is reported in the literature, leading to a simple kappa of 0.58. 18 Thus, the classification achieved by optical means appears to be very promising.
In summary, a logistic regression model was fitted to optically derived tissue parameters with the aim of identifying women at high risk for developing breast cancer because of their high breast density. Encouraging preliminary results were obtained, and collagen proved to be the key parameter for the classification, either directly (collagen content) or indirectly (through scattering amplitude and slope). The relevance of collagen is in agreement with what was expected based on breast anatomy and physiology, and opens up the possibility of a more direct estimate of breast density than presently achieved using x-ray mammography, which is mostly sensitive to water content and not directly to collagen.
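The summary figures quoted for a misclassification matrix follow from its four cell counts. The sketch below shows the arithmetic, assuming that "simple kappa" denotes the unweighted Cohen's kappa (an assumption; the text does not define it). The counts are hypothetical and are not those of Table 3, so the printed values are not expected to match the results reported above.

```python
def classification_metrics(tp, fn, fp, tn):
    """Summary statistics of a 2x2 matrix (true class vs. classified class)."""
    n = tp + fn + fp + tn
    observed = (tp + tn) / n  # observed agreement
    # Agreement expected by chance from the row and column totals.
    expected = ((tp + fn) * (tp + fp) + (fp + tn) * (fn + tn)) / n ** 2
    return {
        "misclassification": (fp + fn) / n,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "kappa": (observed - expected) / (1.0 - expected),
    }


# Hypothetical counts: 30 truly high-risk and 70 truly low-risk subjects.
metrics = classification_metrics(tp=20, fn=10, fp=5, tn=65)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Raising or lowering the 0.5 probability threshold trades sensitivity against specificity, which is the trade-off traced by a receiver operating characteristic curve.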

Fig. 1 Receiver operating characteristic curve for the prediction of high risk using Eq. (1).

Table 1
Correlation estimates (95% confidence intervals of Pearson association) of the optically derived parameters.

Table 2
Output of the fitted regression logistic model.