Detection of precancerous lesions in the oral cavity using oblique polarized reflectance spectroscopy: a clinical feasibility study

Abstract. We developed a multifiber optical probe for oblique polarized reflectance spectroscopy (OPRS) in vivo and evaluated its performance in detection of dysplasia in the oral cavity. The probe design allows the implementation of a number of methods to enable depth resolved spectroscopic measurements including polarization gating, source–detector separation, and differential spectroscopy; this combination was evaluated in carrying out binary classification tasks between four major diagnostic categories: normal, benign, mild dysplasia (MD), and severe dysplasia (SD). Multifiber OPRS showed excellent performance in the discrimination of normal from benign, MD, SD, and MD plus SD yielding sensitivity/specificity values of 100%/93%, 96%/95%, 100%/98%, and 100%/100%, respectively. The classification of benign versus dysplastic lesions was more challenging with sensitivity and specificity values of 80%/93%, 71%/93%, and 74%/80% in discriminating benign from SD, MD, and SD plus MD categories, respectively; this challenge is most likely associated with a strong and highly variable scattering from a keratin layer that was found in these sites. Classification based on multiple fibers was significantly better than that based on any single detection pair for tasks dealing with benign versus dysplastic sites. This result indicates that the multifiber probe can perform better in the detection of dysplasia in keratinized tissues.


Introduction
The early detection of oral cavity cancer can greatly reduce morbidity rates, as the 5-year survival rate is 83% for localized disease compared with 36% for disease that has spread to distant sites. 1,2 Even in highly developed nations where dental exams are prevalent, most cases of oral cancer are not detected until large, symptomatic lesions exist and the disease has advanced beyond the organ site, when treatment options are limited and less effective. 1 Suspicious lesions are typically biopsied following visual inspection and physical palpation. Unfortunately, current screening techniques are limited by various confounding factors that can mask oral cancer progression. For example, benign inflammatory conditions appear very similar to premalignant and malignant lesions, making it difficult for even highly trained physicians to differentiate them. Additionally, one of the biggest risk factors in oral cancer is the synergistic effect of alcohol and tobacco, to which the entire lining of the oral cavity is exposed, causing some lesions to span large regions of the cavity. 3 Since biopsies can be taken only from a few suspicious regions, there is a high probability of sampling errors in oral cancer detection and diagnosis. Furthermore, after treatment, oral cancer patients require monitoring for potential development of secondary tumors, which involves routine biopsies of the oral cavity taken over the course of many years, a painful, invasive procedure that is also prone to sampling errors. 4 Therefore, there is an evident need for a noninvasive method that would provide real-time feedback to facilitate directed biopsies and to aid in the detection and monitoring of premalignant lesions in the oral cavity.
Optical techniques have emerged as promising tools in addressing challenges associated with the detection of oral cancer. [5][6][7][8] Fluorescence imaging and spectroscopy have been used to detect changes in tissue fluorescence associated with endogenous fluorophores, such as reduced nicotinamide adenine dinucleotide (NADH) and flavin adenine dinucleotide (FAD), in the epithelium and collagen/elastin in the stroma. [9][10][11][12][13] It was shown that the fluorescence of the epithelial cells increases and the fluorescence of the stroma decreases with neoplasia. 14 The net result of these changes in situ is an overall decrease in fluorescence intensity with blue light excitation. 15 This property has led to the development of VELscope (LED Dental Inc., White Rock, British Columbia), a commercial device that is used as an adjuvant device in the visualization of oral cavity precancerous and cancerous lesions. Although very useful, this device has a relatively low specificity due to false positive loss of fluorescence in the areas of inflammation. 16,17 Therefore, there is an ongoing effort to improve specificity in the examination of the oral cavity. Optical coherence tomography (OCT) has been extensively studied in imaging of oral tissue for the evaluation of oral carcinogenesis. OCT can provide high-resolution images of the entire thickness of the oral epithelium but requires interpretation by a trained histopathologist in assessment of live images. 6,18 Raman spectroscopy has been investigated as a diagnostic tool to discriminate cancerous from normal oral tissue by characterizing chemical and molecular tissue composition. 19,20 Although Raman spectroscopy can be very informative, Raman signals are very weak, making implementation of this technology quite challenging. Elastic scattering spectroscopy (ESS) has also shown potential in detecting quantitative morphological and structural information of the oral tissue. [21][22][23] However, significant challenges are associated with ESS interpretation, including a low signal-to-noise ratio.
There are multiple factors that complicate sensitive detection of malignancies in the oral cavity, including high levels of keratinization and variable epithelial thickness due to benign inflammatory conditions. Furthermore, neoplastic changes of epithelia and the underlying stroma in the oral cavity are associated with concurrent alterations in optical signals from different depths. 14,15,24,25 These complications require the development of depth-sensitive approaches that would allow simultaneous evaluation of optical signals from various depths in tissue. However, acquiring depth-dependent alterations in optical signatures associated with an oral malignancy and, especially, separating signals from the epithelium and stroma is a challenging task. Indeed, the avascular epithelium is thin and optically transparent, whereas the stroma underneath contains a dense network of collagen and elastin fibers, various cells, and blood vessels, causing strong scattering and hemoglobin absorption that dominate optical signals. Several methods have been implemented to improve signal collection from the superficial epithelial layer using depth-sensitive spectroscopy; these approaches can provide quantitative morphological and architectural information associated with oral precancer from targeted depths in the epithelium and stroma and, thus, can become useful clinical tools in the detection and monitoring of oral cancer. To this end, a number of probe designs with optical fibers normal to the tissue surface were evaluated with depth selectivity achieved by applying a variable aperture, 26,27 variable source-detector separations, [27][28][29] and differential path length spectroscopy. 30 However, these designs were limited in their ability to isolate the relatively weak optical signatures of the epithelium from the dominating stromal signal.
An improvement in the detection of photons originating in the superficial epithelial layer was demonstrated using optical probes with a spherical lens 10,25,31 or by positioning fibers in an oblique orientation. [32][33][34][35][36] In addition to changing probe geometry, our group and others developed polarization gating to separate epithelial signals from the diffuse background of the underlying stroma. [37][38][39][40][41][42][43] This approach is based on a combination of linear polarized illumination and collection of scattered light with polarization parallel and perpendicular relative to the illumination. Detected photons that travel a short distance within a sample undergo a single or a small number of scattering events, thus, maintaining their original polarization state. Conversely, photons that propagate deeper undergo many scattering events and, as a result, have random polarization. Thus, detection of photons that preserve their original polarization state provides a method to isolate the epithelial scattering. Encouraging clinical results have demonstrated the potential of polarization gating spectroscopy in the detection of colonic carcinogenesis and pancreatic adenocarcinoma. 2,44 To further improve depth selectivity in reflectance spectroscopy, our group combined an oblique collection fiber geometry with polarization gating in a method termed oblique polarized reflectance spectroscopy (OPRS). 24,32 Evaluation of this method in a pilot clinical trial in the oral cavity showed 90% sensitivity and 86% specificity in the separation of normal tissue from high-grade dysplasia and carcinoma. After these promising pilot clinical studies, we hypothesized that the performance of the probe can be significantly improved if a new probe design accounts for variations in thicknesses of the epithelium and the keratin layers that are very common in patients with malignancies in the oral cavity. 
To test this hypothesis, we designed a compact OPRS probe for simultaneous collection of polarized reflectance spectroscopic signals from multiple depths using multiple angle-polished beveled detector fibers (BF). 45 The probe's simple design uses a small number of components and no moving parts, which simplifies manufacturing and keeps production costs low. Encouraging results obtained with multilayer tissue-mimicking phantoms and in vivo measurements of normal oral mucosa demonstrated the feasibility of using the multifiber OPRS probe to provide depth-resolved measurements within tissue. 45 Here, we evaluated the ability of this multifiber OPRS technology to detect dysplastic changes in patients with malignancies in the oral cavity.

Experimental System
A schematic of the OPRS system is shown in Fig. 1(a). An illumination fiber (100-μm core diameter, NA = 0.12) delivered light to tissue from a 20-W tungsten halogen broadband light source (Ocean Optics, HL2000HP-FHSA). Then, light scattered from the tissue was collected by detection fibers (100-μm core diameter, NA = 0.12) and was delivered to an imaging spectrometer (PI Acton SpectraPRO SP-2356, Pixis 2KB) equipped with a 150 g/mm grating optimized for the visible wavelength region (500-nm blaze). To enable a modular design wherein multiple probes could be easily interchanged, proximal ends of illumination and detection fibers were connected using low insertion loss (ca. 0.15 dB) FC/APC adapters to coupling fibers leading to either the light source (the illumination fiber) or the imaging spectrometer (detection fibers). At the spectrometer, detection fibers were assembled in a vertical array using a slit ferrule for alignment with the spectrometer's entrance slit. Light from all collection fibers was simultaneously dispersed by the spectrometer's grating onto an imaging CCD. An image was produced with the vertical dimension corresponding to different detection fibers and the horizontal axis displaying scattering spectra collected by the fibers, thus simultaneously capturing spectra from all detection fibers in a single image. The entire system was housed on a wheeled cart for mobility (Fig. 1). A schematic of the distal end of the probe for OPRS is shown in Fig. 1(c). All fibers had a silica core with n = 1.458 and a diameter of 100 μm; fluorine-doped cladding with n = 1.455 and a diameter of 110 μm; and an NA of 0.12 (CeramOptec Industries, Inc., WF 100/110 P12). Obliquely oriented collection fibers are referenced according to their distance from the illumination fiber: BF1 is separated from the illumination fiber by one flat spacer fiber; BF2 is separated from the illumination fiber by two fibers, etc.
Two flat tip fibers were positioned on each side of the illumination fiber as spacers to accommodate the gap between two pieces of the polarizing film that was used for polarized light illumination/detection as described below. The detection fibers (BFs) were polished at 40 deg with respect to the sample surface using a custom-made polishing puck. We have reported previously that this angle provides the optimum combination of depth selectivity and collection efficiency. 45 In this multifiber design, collection cones of detection fibers and the illumination beam overlap at progressively increasing depth in tissue as the distance between the illumination and detection fibers increases, thus allowing control over the sampling depth of tissue optical properties. The diameter of the fiber array of the constructed probe was ca. 1.6 mm. The outer probe diameter was made 7.6 mm for handling convenience in applications for detection of precancers in the oral cavity. An artistic rendering and an image of the distal end of the OPRS probe are shown in Figs. 1(d) and 1(e), respectively, to illustrate the beveled probe design following angle polishing. Once the fibers were polished, two pieces of 150-μm thick polarizing film (n = 1.458) with 0.0002% extinction transmittance of cross-polarized light were glued to the distal end of the fibers using optically transparent and biocompatible epoxy (Epo-Tek 301-2). The two polarizing film pieces were positioned with their polarization axes perpendicular to each other as shown in Fig. 1(f), which allowed one half of the detection fibers to collect light copolarized with the illumination light polarization and the other half to collect cross-polarized light. Here, the scattered light collected with polarization parallel to the illumination is termed parallel and the cross-polarized light is defined as perpendicular.
Journal of Biomedical Optics 065002-2 June 2017 • Vol. 22 (6)
The fiber assembly with polarizing films was secured inside steel tubing with the outer diameter of 7.6 mm using biocompatible epoxy glue (Epo-Tek 301-2). Then, the steel tubing was carefully bent at an angle of 45 deg for easy positioning inside the oral cavity. Finally, a 200-μm thick quartz window (n = 1.54) was glued to the polarizing film for protection of the probe during sterilization and to achieve an optimum overlap between illumination and collection cones at the tissue surface.

Collection of Clinical Data
In vivo spectra were collected with informed consent from 28 patients who were 18 years old or over and who were referred to the Department of Integrative Oncology at the British Columbia Cancer Agency (BCCA) with lesions in the oral cavity suspicious for dysplasia or carcinoma. The spectroscopic measurements followed standard oral cavity examinations by a physician. The spectra were collected from all sites suspicious for dysplasia and, whenever possible, from contralateral normal sites. The abnormal sites were biopsied and were histologically confirmed as benign, mild dysplasia (MD), or severe dysplasia (SD). Benign sites appeared abnormal during clinical examination but were histologically diagnosed as normal. Several (2 to 3) normal site measurements were taken for each patient. Depending on the size of an abnormal lesion, one to several measurements per lesion were obtained. All measurements were taken with room lights turned off to minimize background signal. Calibration spectra were acquired before each patient evaluation using a diffuse reflectance substrate standard (SRS-99, Labsphere, Inc.), and a background signal was measured using a minimally reflecting black substrate (SRS-02, Labsphere, Inc.). Measurements from three patients were removed due to errors during the data collection process associated with user mishandling of the probe or system malfunction.

Preprocessing of Oblique Polarized Reflectance Spectroscopy Spectra
Spectra were preprocessed in the following steps. First, the background signal from the minimally reflecting black substrate ($I_{\mathrm{dark}}$) was subtracted from the collected raw parallel and perpendicular spectra ($I_{\mathrm{meas}}$) to remove residual environmental light, which was already minimized by carrying out measurements in a dark room. Next, the spectral responses of the source, fibers, and detector were accounted for by normalizing the background-corrected spectra by the spectrum from the diffuse reflectance substrate standard ($I_{\mathrm{white}}$). Because the diffuse reflectance standard is not a perfect depolarizer, 46 the perpendicular signal of the reflectance standard was multiplied by the ratio of its parallel to perpendicular components, called the depolarization coefficient ($D$), to account for this effect. In addition, the collection area ($A$) at the tissue interface for each detection fiber was determined in Zemax (Zemax, LLC, Kirkland, Washington) and was used to correct for the trend that fibers farther from the illumination fiber collect scattered photons from larger areas. To correct for variations in the collection efficiency of detection fibers, the spectra were divided by the power throughput of each fiber ($P$). The relative power throughputs of detection fibers ($P_{\mathrm{par}}$ and $P_{\mathrm{per}}$) were determined by connecting each collection fiber's proximal end to the illumination light source and measuring the power at the fiber's distal end with a power meter (371R Optical Power Meter, Graseby Optronics). Finally, differences in the collection time ($t$) used during data acquisition were accounted for. Together, these corrections form the normalization scheme used to produce comparable OPRS data, yielding $I_{\parallel}(\lambda)$ and $I_{\perp}(\lambda)$, the normalized parallel and perpendicular spectra, respectively.
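The preprocessing steps can be illustrated with a short sketch. It is only one plausible way of combining the corrections named in the text (dark subtraction, white-standard normalization, depolarization coefficient D, collection area A, power throughput P, and collection time t); the function name `normalize_spectrum`, its arguments, and the multiplicative form are assumptions made for illustration, not the authors' published equation.

```python
def normalize_spectrum(i_meas, i_dark, i_white, i_white_dark,
                       area, power, t_meas, t_white, depol=1.0):
    """Return a normalized spectrum (one value per wavelength).

    Assumed form, per wavelength:
        [(I_meas - I_dark) / t_meas]
        / [depol * (I_white - I_white_dark) / t_white]
        / (area * power)
    Use depol=1.0 for the parallel channel and depol=D for the
    perpendicular channel, where D is the depolarization coefficient.
    """
    normalized = []
    for m, d, w, wd in zip(i_meas, i_dark, i_white, i_white_dark):
        sample = (m - d) / t_meas              # dark-corrected counts per unit time
        standard = depol * (w - wd) / t_white  # corrected white-standard signal
        normalized.append(sample / (standard * area * power))
    return normalized


# Hypothetical two-wavelength example with made-up counts.
parallel = normalize_spectrum([110, 210], [10, 10], [60, 110], [10, 10],
                              area=2.0, power=0.5, t_meas=1.0, t_white=1.0)
```

Keeping the depolarization coefficient as an argument lets the same routine serve both polarization channels.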

Quantifying Penetration Depth of Collection Fibers
To quantify the depth in a turbid medium from which collection fibers collect scattering signal, an experiment was adapted from a method developed in Refs. 38 and 47. Briefly, a glass container with a 2-cm thick layer of optically transparent cured polydimethylsiloxane (PDMS) at the bottom to eliminate any back-reflections was filled with 20% Intralipid (Sigma-Aldrich, I141) diluted in water to mimic stromal scattering ($\mu_s' = 2\ \mathrm{mm}^{-1}$) 15,48 at 600 nm. This wavelength was used as it was the central wavelength in our range of interest (450 to 750 nm). The concentration of Intralipid in water required to simulate the target scattering properties was determined by measuring transmission of phantoms with varying Intralipid concentrations. Then, using the relationship $\mu_s = -\ln(T)/L$, where $T$ is the transmission and $L$ is the path length, a plot of Intralipid scattering properties as a function of Intralipid concentration was obtained (see Appendix). During measurements, the OPRS probe was initially placed in contact with the transparent PDMS layer. Then, the probe was moved away from the PDMS at 50-μm increments until a distance of 2500 μm was reached. OPRS measurements were carried out at each increment. Spectra from each collection fiber were processed as described in Sec. 2.3 and integrated in the wavelength range 450 to 750 nm. The integrated intensities were plotted as a function of Intralipid thickness.
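The calibration of Intralipid concentration against scattering can be sketched as follows. The relationship between the scattering coefficient and transmission is taken from the text; the conversion to the reduced scattering coefficient uses the standard relation with the anisotropy factor g. The least-squares line through the origin and all function names are assumptions for illustration, and the numbers in the example are synthetic rather than measured values.

```python
import math


def scattering_coefficient(transmission, path_length_mm):
    """mu_s = -ln(T) / L, in mm^-1 when L is in mm."""
    return -math.log(transmission) / path_length_mm


def reduced_scattering(mu_s, g):
    """Standard relation mu_s' = mu_s * (1 - g)."""
    return mu_s * (1.0 - g)


def fit_slope_through_origin(concentrations, values):
    """Least-squares slope of values = slope * concentration (assumed model)."""
    num = sum(c * v for c, v in zip(concentrations, values))
    den = sum(c * c for c in concentrations)
    return num / den


def concentration_for_target(target_value, slope):
    """Concentration that yields the target value under the fitted line."""
    return target_value / slope


# Synthetic example: transmissions for three hypothetical concentrations (%).
conc = [0.05, 0.10, 0.15]
trans = [math.exp(-0.5), math.exp(-1.0), math.exp(-1.5)]
mu_s = [scattering_coefficient(t, 1.0) for t in trans]
slope = fit_slope_through_origin(conc, mu_s)
```

A linear dependence on concentration is only expected at low concentrations where single scattering dominates the transmission measurement.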

Standardization of Clinical Oblique Polarized Reflectance Spectroscopy Measurements
Standardization with respect to the normal tissue was carried out to account for the varying anatomy across patients. The standardization approach was adapted from a method by Rajaram et al. 49 wherein it was applied to account for intersubject variations in the skin. A standardization factor was used to account for divergence of scattering from normal sites obtained from a given patient from the overall average intensity of all collected normal spectra. The standardization factor for each patient ($S_i$) was calculated by dividing the integrated intensity of the average spectrum of all collected normal spectra ($N_{\mathrm{mean}}$) by the integrated intensity of a normal spectrum from each patient ($N_i$): $S_i = N_{\mathrm{mean}} / N_i$. The $S_i$ factor for each patient was used to normalize the intensity of all measured normal and abnormal spectra as follows: $M_i'(\lambda) = S_i \times M_i(\lambda)$, where $M_i'(\lambda)$ and $M_i(\lambda)$ represent the standardized and originally measured spectra, respectively. Note that $M_i(\lambda)$ represents any measured $I_i(\lambda)_{\mathrm{par}}$ and $I_i(\lambda)_{\mathrm{per}}$ spectra. Figure 2 shows OPRS spectra before and after standardization. The standardization scheme retained differences in the spectral shape of the measurements from the normal and abnormal lesions while removing large variations in the overall magnitude caused by interpatient variations and by various anatomical locations in the oral cavity.
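The standardization step can be sketched in a few lines. The sketch assumes spectra sampled on a uniform wavelength grid, so that a plain sum is proportional to the integrated intensity, and it assumes one representative normal spectrum per patient (for example, the mean of that patient's normal-site measurements); the function names are illustrative.

```python
def integrated_intensity(spectrum):
    """Sum over wavelengths; proportional to the integral on a uniform grid."""
    return sum(spectrum)


def standardization_factor(all_normal_spectra, patient_normal_spectrum):
    """S_i = N_mean / N_i, using integrated intensities."""
    n = len(all_normal_spectra)
    mean_spectrum = [sum(column) / n for column in zip(*all_normal_spectra)]
    return (integrated_intensity(mean_spectrum)
            / integrated_intensity(patient_normal_spectrum))


def standardize(spectrum, s_i):
    """M_i'(lambda) = S_i * M_i(lambda), applied to any spectrum of patient i."""
    return [s_i * value for value in spectrum]


# Hypothetical example: two normal spectra across the cohort.
normals = [[1.0, 1.0], [3.0, 3.0]]
s_first = standardization_factor(normals, normals[0])
lesion_standardized = standardize([1.0, 2.0], s_first)
```

Because every spectrum from a patient is multiplied by the same scalar, spectral shape is unchanged and only the overall magnitude is rescaled.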

Spectral features
The following eight binary classification tasks were evaluated: (1) normal versus MD, (2) normal versus SD, (3) normal versus MD and SD combined, (4) MD versus SD, (5) benign versus MD, (6) benign versus SD, (7) benign versus MD and SD combined, and (8) benign versus normal. The list of all OPRS features considered in data analysis is shown in Table 1. It includes the spectral mean, which is the average intensity taken across the entire wavelength range (450 to 750 nm), along with the intensity at the most discriminatory wavelengths that were extracted for the following spectra: parallel (∥), collected by fibers BF1, BF2, and BF3; perpendicular (⊥), collected by fibers BF1per, BF2per, and BF3per; diffuse (∥ + ⊥), sums of signals collected by symmetrically positioned fibers collecting parallel and perpendicular spectra; polarization gated (∥ − ⊥), differences between parallel signals and corresponding perpendicular signals; parallel/perpendicular (∥/⊥), ratios of parallel and perpendicular signals collected by symmetrically positioned fibers; parallel and perpendicular differentials, differences of signals collected by two adjacent fibers; ratios of differential signals; and ratios of polarization gated signals. All these features were extracted from preprocessed and standardized spectra. In addition, the spectra were normalized by the area under the curve (AUC) to emphasize spectral shape differences between various diagnostic categories. For each binary classification task, the most discriminatory wavelength was selected as the wavelength with the highest Welch's t-statistic value. The Welch's t-statistic at a given wavelength measures the absolute difference in mean spectra across patients between the two diagnostic classes relative to the amount of interpatient spectral variation that is observed within the two classes at that wavelength.
Another feature that was extracted for all detection fibers was the ratio of the intensities at 576 to 610 nm; this ratio reflects the magnitude of hemoglobin absorption in the scattering spectra. 45
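The selection of the most discriminatory wavelength can be sketched as below: Welch's t-statistic is computed at each wavelength from the two groups of spectra, and the wavelength with the largest absolute value is kept. The function names and the toy spectra are hypothetical; each class needs at least two spectra for the sample variance to be defined.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Absolute Welch's t-statistic for two samples with unequal variances."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    diff = abs(mean(a) - mean(b))
    if se == 0.0:
        return math.inf if diff > 0 else 0.0
    return diff / se


def most_discriminatory_wavelength(wavelengths, class_a, class_b):
    """Return (wavelength, t) with the highest |t| across all wavelengths.

    class_a and class_b are lists of spectra sampled at `wavelengths`.
    """
    best_wl, best_t = None, -1.0
    for k, wl in enumerate(wavelengths):
        t = welch_t([s[k] for s in class_a], [s[k] for s in class_b])
        if t > best_t:
            best_wl, best_t = wl, t
    return best_wl, best_t


# Toy spectra at three wavelengths (nm); values are invented for illustration.
wavelengths = [450, 576, 610]
group_a = [[1.0, 2.0, 3.0], [1.2, 2.1, 3.5], [0.8, 1.9, 2.5]]
group_b = [[1.1, 4.0, 3.1], [0.9, 4.2, 2.9], [1.0, 3.8, 3.3]]
best_wl, best_t = most_discriminatory_wavelength(wavelengths, group_a, group_b)
```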

Selecting the Most Discriminatory Spectral Features
A total of 120 spectral features were extracted from measured OPRS spectra. These features include 27 spectral means, 27 intensities at the most discriminatory wavelengths, and 6 intensity ratios at 576/610 nm for a total of 60 features for each of the unnormalized and normalized spectra, with a final count of 120 features; this translates to ($2^{120} - 1$) different possible feature combinations in a two-class classification problem. An exhaustive investigation of all feature combinations would require vast amounts of processing time, and the available data sample size is likely to be insufficient for investigating high-dimensional feature spaces. Therefore, it was necessary to reduce the number of features prior to training classifiers for differentiating between different diagnostic classes. Feature selection was used to identify the most diagnostically relevant features and to reduce redundancies by eliminating features that are closely related to each other. We preferred feature selection over feature extraction (such as principal component analysis) to retain the physical significance of features used for diagnostic classification. Maximum relevance minimum redundancy (mRMR) was employed as the approach for feature selection as mRMR produces a subset of features with the highest relevance (i.e., highest discrimination between the two diagnostic classes) and minimal redundancy (i.e., minimum correlation between features). mRMR has low computational complexity and produces features with smaller classification errors as compared to those obtained from other feature selection strategies. 50 Given that sample size is limited for some of the diagnostic classification tasks, leave-one-out cross validation (LOOCV) strategy was used to reduce overtraining.
The appropriate set of features from mRMR was calculated as the minimum feature set beyond which the performance on training data does not significantly improve by inclusion of additional features. Upon selecting the most diagnostically relevant features, their performance in discriminating between different diagnostic classes was evaluated by determining their classification accuracy. Random forest classifiers were used in this study owing to their advantages when dealing with small sample sizes and high-dimensional feature space. 50 The area under the nonparametric receiver operating characteristic (ROC) curves (AUC) was calculated to quantify the performance of selected features when combined by the random forest classifier. The random forest classifications along with the determination of the discriminatory wavelengths and feature selection were performed in the R software (R Development Core Team, Vienna, Austria) using the caret and mRMRe packages.
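The idea behind mRMR can be illustrated with a greedy sketch. It is a simplification and not the mRMRe implementation used in the study: relevance and redundancy are both approximated here by the absolute Pearson correlation, and candidates are scored by relevance minus mean redundancy with the features already chosen. The feature names and values are invented.

```python
import math


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def mrmr_select(features, labels, k):
    """Greedy mRMR-style selection of k feature names.

    features: dict mapping feature name -> list of values (one per sample)
    labels:   list of 0/1 class labels
    """
    relevance = {name: abs(pearson(v, labels)) for name, v in features.items()}
    selected = []
    remaining = sorted(features)  # sorted for deterministic tie-breaking
    while remaining and len(selected) < k:
        def score(name):
            if not selected:
                return relevance[name]
            redundancy = sum(abs(pearson(features[name], features[s]))
                             for s in selected) / len(selected)
            return relevance[name] - redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


labels = [0, 0, 0, 1, 1, 1]
features = {
    "a": [1, 2, 1, 5, 6, 5],            # strongly related to the labels
    "a_copy": [2, 4, 2, 10, 12, 10],    # exact multiple of "a" (redundant)
    "b": [1, 2, 3, 2, 3, 4],            # weaker but less redundant
}
chosen = mrmr_select(features, labels, k=2)
```

In this toy case the redundant copy is passed over in favor of the weaker but more independent feature, which is the behavior that motivates mRMR over ranking by relevance alone.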

Testing for Overtraining
To reduce the risk of overtraining associated with the limited number of data in the clinical trial, we employed LOOCV, which leaves out a single subject to generate the random forest models. To further test for overtraining, a permutation test was employed wherein the pathology definition (normal, benign, etc.) was randomly rearranged while preserving the number of cases in each diagnostic category. 51 The mean and standard error of the AUCs were obtained from shuffling the pathology definitions 100 times. These values were then compared with the AUC values obtained with correctly assigned diagnostic results to the data.
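The overtraining check can be sketched as follows. To keep the example self-contained, a simple nearest-class-mean score on a single feature stands in for the random forest, which is a deliberate substitution and not the classifier used in the study. The cross-validation leaves out all measurements of one subject at a time, the AUC is computed nonparametrically, and labels are shuffled 100 times to obtain the mean and standard error of the chance-level AUC. All names and data are hypothetical.

```python
import random
from statistics import mean, stdev


def auc(scores, labels):
    """Nonparametric AUC: probability a positive outscores a negative."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def leave_one_subject_out_scores(x, labels, patients):
    """Score each sample with a model trained without that sample's patient.

    Assumes both classes remain present in every training fold.
    """
    scores = []
    for i in range(len(x)):
        train = [j for j in range(len(x)) if patients[j] != patients[i]]
        pos_mean = mean(x[j] for j in train if labels[j] == 1)
        neg_mean = mean(x[j] for j in train if labels[j] == 0)
        # Higher score means closer to the positive-class mean.
        scores.append(abs(x[i] - neg_mean) - abs(x[i] - pos_mean))
    return scores


def permutation_test(x, labels, patients, n_perm=100, seed=0):
    """Return (observed AUC, mean permuted AUC, standard error of the mean)."""
    rng = random.Random(seed)
    observed = auc(leave_one_subject_out_scores(x, labels, patients), labels)
    null = []
    for _ in range(n_perm):
        shuffled = list(labels)
        rng.shuffle(shuffled)  # preserves the number of cases per class
        null.append(auc(leave_one_subject_out_scores(x, shuffled, patients),
                        shuffled))
    return observed, mean(null), stdev(null) / len(null) ** 0.5


x = [1.0, 1.2, 0.9, 1.1, 1.3, 0.8, 3.0, 3.2, 2.9, 3.1, 3.3, 2.8]
labels = [0] * 6 + [1] * 6
patients = [0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5]
observed_auc, null_mean, null_se = permutation_test(x, labels, patients)
```

An observed AUC well above the permuted mean suggests that the classifier is learning from the pathology labels rather than from chance structure in a small data set.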

Depth Penetration of Oblique Polarized Reflectance Spectroscopy Collection Fibers
Penetration depths of detection fibers of the OPRS probe were evaluated using an Intralipid phantom simulating stromal scattering. The relationship $\mu_s = -\ln(T)/L$, together with the anisotropy factor ($g = 0.752$) 52 at 600 nm, was used to relate the reduced scattering coefficient, $\mu_s'$, to the transmission of Intralipid phantoms at concentrations of 0.05%, 0.1%, 0.15%, 0.2%, and 0.25% and to plot the dependence of the scattering coefficient on the Intralipid concentration (see Appendix). The fit was found to be in good agreement with previously published results. 52,53 Using the derived linear fit, the Intralipid concentration of 2.6% in water was found to yield the target $\mu_s'$ of 2 mm$^{-1}$. Integrated intensities of scattering signals collected by detection fibers were plotted as a function of phantom thickness as shown in Fig. 3. The signals followed a common trend with an initial increase followed by saturation. As expected, the intensity profiles of detection fibers positioned farther away from the illumination fiber were shifted toward greater depths [Fig. 3(b)]. Also, the perpendicular signals appeared at greater depths as compared to the parallel ones as the result of a gradual depolarization of linearly polarized excitation with depth; this trend became less pronounced with increased separation between illumination and detection fibers. The detected signals achieved 90% of the saturated signal at 750, 900, and 1100 μm for parallel detection fibers BF1, BF2, and BF3, respectively, and at 1200, 1250, and 1400 μm for perpendicular fibers BF1per, BF2per, and BF3per, respectively [Fig. 3(c)]. The polarization gating (BF1 − BF1per, BF2 − BF2per, and BF3 − BF3per) reduced the interrogation depth in the scattering medium as compared to individual collection fibers [Fig. 3(c)].
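The depth at which a fiber reaches 90% of its saturated signal can be estimated from an intensity-versus-thickness curve as sketched below. Estimating the plateau as the mean of the last few points and interpolating linearly between the 50-μm steps are assumptions made for this illustration; the curve in the example is synthetic.

```python
def depth_at_fraction(depths_um, intensities, fraction=0.9, n_plateau=5):
    """Depth at which the signal first reaches `fraction` of its plateau.

    The plateau is estimated as the mean of the last `n_plateau` points
    (an assumption). Returns None if the target is never reached.
    """
    plateau = sum(intensities[-n_plateau:]) / n_plateau
    target = fraction * plateau
    for k, value in enumerate(intensities):
        if value >= target:
            if k == 0:
                return depths_um[0]
            x0, x1 = depths_um[k - 1], depths_um[k]
            y0, y1 = intensities[k - 1], value
            # Linear interpolation between the two bracketing points.
            return x0 + (target - y0) * (x1 - x0) / (y1 - y0)
    return None


# Synthetic curve: linear rise that saturates at 1000 um, sampled every 50 um.
depths = list(range(0, 2501, 50))
signal = [min(d / 1000.0, 1.0) for d in depths]
d90 = depth_at_fraction(depths, signal)
```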

Sample Distribution of Clinical Measurements
Results of 93 in situ measurements from 25 patients were analyzed. The distribution of anatomical sites measured is shown in    Fig. 4, illustrate morphological changes associated with the progression of dysplasia in the oral cavity. It is important to note a high degree of similarity between anatomy of benign and MD categories. MD is marked by early dysplastic changes limited to the lower third of the epithelium in the basal and parabasal layers that are not present in the benign category. However, both the benign and the MD images show a well-defined superficial keratinization layer that complicates diagnosis.
To evaluate the prevalence of keratinization, mean thicknesses of the epithelium and keratin layers for benign, MD, and SD diagnostic categories were measured in scanned H&E stained slides of biopsied samples using Panoramic Viewer (3DHISTECH) (Fig. 5). Mean epithelial thicknesses (epithelium + keratin) for benign, MD, and SD were ca. 426, 504, and  701 μm, respectively, indicating thickening of the epithelial layer with dysplasia progression. Lesions in the oral cavity often experience hyperkeratosis due to chronic irritation, which is reflected in an increased degree of keratinization. 54

Polarized Reflectance Spectra
Averaged parallel (∥), perpendicular (⊥), diffuse (∥ + ⊥), and polarization gated (∥ − ⊥) reflectance spectra for each diagnostic category are shown in Fig. 6(b). Qualitative analyses of the spectra uncovered substantial differences between diagnostic categories. These differences were wavelength dependent and were quite substantial in certain narrow wavelength regions, which prompted the analysis of the most discriminatory wavelengths and wavelength regions described below. Overall, average benign spectra tended to have the highest intensity as compared to other diagnostic categories, likely due to a high degree of keratinization associated with benign oral cavity lesions. The mean abnormal spectra (MD and SD) exhibited larger intensities than the normal spectra, which can be attributed to higher scattering due to keratinization of these sites and to morphological changes associated with dysplasia, such as increased nuclear size, hyperchromasia, and pleomorphism. 55 The spectra were also normalized by the AUC to emphasize shape differences between diagnostic categories (Fig. 7). It is interesting to note that the normalization revealed small but distinct spectral differences between all diagnostic categories in the parallel spectra; however, in the perpendicular spectra, MD and SD categories are virtually indistinguishable, whereas normal and benign spectra are clearly separated.

Diagnostically Significant Spectral Features
Figure 8 shows a diagram that summarizes the most discriminatory wavelengths that provide the maximum separation between the two classes of each binary diagnostic classification. Each tickmark on the x-axis represents a spectral feature of interest listed in Table 1, for a total of 27 features for each binary classification; the corresponding discriminatory wavelengths are identified by dots along the y-axis. The analysis was carried out for unnormalized [Fig. 8(a)] and AUC normalized spectra [Fig. 8(b)]. The full list of discriminatory wavelengths can be found in Appendix, Tables 7 and 8.
To determine spectral regions with the most diagnostic relevance, the entire wavelength range (450 to 750 nm) was divided into 20 nm spectral bands and the frequency of appearance of discriminatory wavelengths in each band was evaluated (Table 3). For both unnormalized and area normalized spectra, wavelengths associated with hemoglobin absorbance between 510 and 610 nm appeared frequently in discrimination between normal and benign versus dysplasia (MD and SD) indicating that blood absorption plays an important role in discrimination of histologically normal and abnormal tissue. The wavelength band of 450 to 469 nm was also prominent in all diagnostic categories for unnormalized features and especially in the separation of benign from MD and SD. This trend can be attributed to a strong superficial scattering of light in the blue spectral range from the surface keratin layer. Indeed, average spectra for the benign category shown in Fig. 6 generally exhibited a steeper, more negative slope in the wavelength range 450 to 500 nm as compared to other diagnostic categories. However, the wavelength band of 450 to 469 nm did not appear frequently for the area-normalized spectra indicating that this wavelength region was strongly associated with signal amplitude rather than spectral shape. Prevalent discriminatory wavelengths for the discrimination of MD from SD were not associated with hemoglobin absorption indicating that this classification was more sensitive to differences in scattering. Indeed, the wavelength region above 600 nm is highly significant in binary tasks associated with keratinized tissues, such as benign, MD, and SD. It could be associated with better penetration of red-NIR light through the keratinized layer in these sites that results in the collection of more diagnostically relevant information.
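The band-frequency analysis amounts to a histogram of discriminatory wavelengths over 20-nm bands. The sketch below labels bands by their inclusive integer limits (for example, 450 to 469 nm) and assigns the upper edge of the range, 750 nm, to the last band; that edge handling and the function name are assumptions, and the input wavelengths are invented.

```python
from collections import Counter


def band_counts(wavelengths_nm, start=450, stop=750, width=20):
    """Count wavelengths per band; keys are (low, high) inclusive limits in nm."""
    n_bands = (stop - start) // width
    counts = Counter()
    for wl in wavelengths_nm:
        if start <= wl <= stop:
            # Clamp so that wl == stop falls into the last band (assumption).
            idx = min(int((wl - start) // width), n_bands - 1)
            low = start + idx * width
            counts[(low, low + width - 1)] += 1
    return counts


# Hypothetical discriminatory wavelengths (nm) from several features.
counts = band_counts([450, 455, 469, 470, 576, 750])
```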
Fig. 7 Polarized reflectance spectra normalized to the AUC; all collected spectra for each diagnostic category were first normalized by the AUC and then averaged.
Journal of Biomedical Optics 065002-9 June 2017 • Vol. 22 (6)
After determining the most discriminatory wavelengths, we carried out feature selection with mRMR that included all features listed in Table 1 for all detection fibers to identify the best combination of spectral features for each diagnostic classification task (Table 4). Features selected by the mRMR algorithm included scattering signals from all detection fibers as well as features associated with polarization gated and diffuse scattering spectra; this implies that scattering signals collected from various depths were found to be diagnostically relevant. The discrimination of normal tissue from dysplasia (SD, MD, and MD and SD combined) and from the benign category produced excellent results with AUC values close to 1 (Table 4 and Fig. 9). However, the clinically more challenging discrimination of benign from dysplastic sites performed worse, as reflected by significantly lower AUCs. The poorer performance in discriminating the benign and dysplastic categories could be attributed to keratinization confounding the diagnosis. Indeed, an increase in scattering is expected for MD and SD cases due to alterations in epithelial morphology, such as increased nuclear size, hyperchromasia, and pleomorphism, along with keratin formation. However, benign sites are also associated with an increased scattering signal due to a high degree of keratinization (Fig. 4). Strong scattering by the keratin layer can create a significant scattering background, thus interfering with the detection of scattering signatures of dysplasia.
After conducting the global analysis with all detection fibers, we addressed the question of whether the multifiber design of this OPRS probe improves diagnostic performance as compared to any given single fiber pair of the probe. Similar to the multifiber analysis, the most diagnostically relevant spectral features and the corresponding AUCs for each parallel/perpendicular pair of detection fibers were determined using feature selection with mRMR (Table 5).
Table 4 Features found to be important in discriminating between diagnostic categories using mRMR and the corresponding AUC. Features identified with a wavelength (nm) refer to intensity at the most discriminatory wavelength, and those labelled with norm are associated with normalized spectra. No wavelengths are listed for spectral features associated with the spectral mean (the average intensity taken across the entire wavelength range).
As in the global analysis that included all detection fibers, the separation of normal from MD and SD resulted in excellent AUCs for all fiber pairs with no significant difference between different pairs or their combination. However, discrimination of the benign from the MD and from the SD categories significantly worsened as compared to the combination of all fibers. These results were further supported by the statistical overtraining analysis described below.
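For readers unfamiliar with mRMR (minimum redundancy, maximum relevance), the greedy selection it performs can be sketched as follows. This is a minimal illustration of the mutual-information difference form of mRMR, not the implementation used in this study; the equal-width binning, the number of bins, the feature names, and the example data are all assumptions.

```python
import math
from collections import Counter


def mutual_information(x, y):
    """Mutual information (in nats) between two equally long discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum(
        (count / n) * math.log(count * n / (px[a] * py[b]))
        for (a, b), count in pxy.items()
    )


def discretize(values, n_bins=3):
    """Equal-width binning of a continuous feature (bin count is an assumption)."""
    low, high = min(values), max(values)
    if high == low:
        return [0] * len(values)
    return [min(int((v - low) / (high - low) * n_bins), n_bins - 1) for v in values]


def mrmr(features, labels, n_select):
    """Greedy mRMR: pick the feature maximizing relevance minus mean redundancy.

    features: dict mapping a feature name to a list of continuous values.
    labels:   list of class labels, one per sample.
    """
    binned = {name: discretize(values) for name, values in features.items()}
    relevance = {name: mutual_information(b, labels) for name, b in binned.items()}
    selected = []
    while len(selected) < min(n_select, len(binned)):
        best_name, best_score = None, -math.inf
        for name in binned:
            if name in selected:
                continue
            if selected:
                redundancy = sum(
                    mutual_information(binned[name], binned[s]) for s in selected
                ) / len(selected)
            else:
                redundancy = 0.0
            score = relevance[name] - redundancy
            if score > best_score:
                best_name, best_score = name, score
        selected.append(best_name)
    return selected


# Hypothetical data: f1_copy duplicates f1, while f2 carries complementary information.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
features = {
    "f1": [0.1, 0.2, 0.1, 0.9, 0.2, 0.9, 1.0, 0.9],
    "f1_copy": [0.2, 0.4, 0.2, 1.8, 0.4, 1.8, 2.0, 1.8],
    "f2": [0.1, 0.2, 0.9, 0.1, 1.0, 0.2, 0.9, 0.9],
}
selected = mrmr(features, labels, n_select=2)
print(selected)
```

In this toy example the duplicated feature is penalized for redundancy, so the selection favors a pair of features that carry different information, which mirrors why features from different fibers and depths can be selected together.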

Check for Overtraining
LOOCV together with a permutation test, wherein the data were randomly assigned a diagnostic class, was used to check for overtraining. The results from the permutation test are shown in Fig. 10. The mean ± standard deviation of the AUCs for the data with randomly shuffled diagnoses are shown as black lines and are compared with the real AUCs from Tables 4 and 5 obtained for the correctly labeled dataset. The real AUCs in classifications of the normal from the benign and dysplastic cases were well above the error bars of the random permutation tests, confirming the statistical significance of the data analyses. However, there was no significant difference between data analyses that included all fibers and those based on any individual parallel/perpendicular detection fiber pair. The benign versus MD and SD classification tasks were also statistically significant in the case of the global analysis that included all detection fibers, but this was not the case for any of the individual detection fiber pairs. The real AUCs of the MD versus SD classification task overlap with the error bars of the shuffled AUCs, which indicates overtraining. Therefore, the classification task failed to discriminate MD and SD sites. We also performed analyses using a dataset that did not include features extracted from area-normalized spectra (Appendix, Fig. 12). It is interesting to note that no statistical significance was achieved in the discrimination of the benign from the MD and SD categories using only unnormalized data.
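The logic of the overtraining check can be sketched as follows. This is a minimal illustration that assumes a nearest-class-mean rule as a stand-in classifier (the classifier used in the study is not specified in this section) and uses hypothetical, well-separated data; all function names are illustrative. Only the 100 label shuffles follow the description of Fig. 10.

```python
import random
import statistics


def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (ties count as 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def loocv_scores(X, y):
    """Leave-one-out scores from a nearest-class-mean rule (stand-in classifier).

    Each class needs at least two samples so that the training fold is never empty.
    """
    scores = []
    for i in range(len(X)):
        train = [(x, t) for j, (x, t) in enumerate(zip(X, y)) if j != i]
        means = {}
        for c in (0, 1):
            rows = [x for x, t in train if t == c]
            means[c] = [sum(col) / len(rows) for col in zip(*rows)]
        d0 = sum((a - b) ** 2 for a, b in zip(X[i], means[0]))
        d1 = sum((a - b) ** 2 for a, b in zip(X[i], means[1]))
        scores.append(d0 - d1)  # larger score means closer to class 1
    return scores


def permutation_test(X, y, n_shuffles=100, seed=0):
    """Compare the real LOOCV AUC with AUCs obtained after shuffling the labels."""
    rng = random.Random(seed)
    real_auc = auc(loocv_scores(X, y), y)
    shuffled_aucs = []
    for _ in range(n_shuffles):
        y_perm = y[:]
        rng.shuffle(y_perm)  # shuffling keeps the class distribution unchanged
        shuffled_aucs.append(auc(loocv_scores(X, y_perm), y_perm))
    return real_auc, statistics.mean(shuffled_aucs), statistics.stdev(shuffled_aucs)


# Hypothetical two-feature data with two well-separated classes.
X = [[0.1 * i, 0.05 * i] for i in range(10)] + [[3 + 0.1 * i, 2 + 0.05 * i] for i in range(10)]
y = [0] * 10 + [1] * 10
real_auc, shuffled_mean, shuffled_std = permutation_test(X, y)
print(real_auc, shuffled_mean, shuffled_std)
```

A real AUC that lies well above the mean plus one standard deviation of the shuffled AUCs corresponds to the cases described above as statistically significant; overlap with that band corresponds to the cases flagged as overtrained.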

Discussion and Conclusions
The challenge of distinguishing malignant lesions from benign, inflammatory conditions in the oral cavity using optical methods is well known. For example, two commercially available systems based on tissue reflectance, Vizilite Plus (DenMat, Lompoc, California) and Microlux DL (AdDent, Danbury, Connecticut), utilize an acetic acid wash to enhance scattering from abnormal lesions due to the effect of acetowhitening. 56,57 The sensitivity of Vizilite Plus was reported to be 100%; however, its specificity is poor, ranging from 0% to 14%. 58 Microlux DL fared better with a sensitivity of 78% and a specificity of 71%, but its positive predictive value was just 0.37 in a clinical trial of patients with oral white lesions. 59 A number of research groups have evaluated spectroscopic techniques in the diagnosis of malignancies in the oral cavity with varying degrees of success. Amelink et al. 60 developed optical fiber probes for a quantitative assessment of oral tissue with differential path-length spectroscopy (DPS); they found that DPS can discriminate nondysplastic and dysplastic leukoplakias, which are characterized by a high level of keratinization, with 91% sensitivity and 80% specificity. Müller et al. 22 reported a sensitivity of 64% and a specificity of 90% in discriminating dysplastic from cancerous cases in a clinical study that combined fluorescence, diffuse reflectance, and light scattering spectroscopy. In a study of a ball lens coupled optical probe that collected autofluorescence and diffuse reflectance of oral tissue from different depths, Schwarz et al. 10 observed a sensitivity of 82% and a specificity of 87% when discriminating normal combined with MD versus moderate dysplasia plus cancer cases in 119 nonkeratinized sites; however, the sensitivity and specificity dropped to 79% and 80%, respectively, when a dataset of 114 keratinized sites was analyzed.
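A low positive predictive value can coexist with moderate sensitivity and specificity because the positive predictive value also depends on disease prevalence in the tested population. The following minimal sketch applies the standard Bayes relation to the Microlux DL sensitivity and specificity quoted above; the prevalence values are hypothetical, since the prevalence in that trial is not stated here.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Fraction of positive test results that are true positives (Bayes relation)."""
    true_positive_rate = sensitivity * prevalence
    false_positive_rate = (1.0 - specificity) * (1.0 - prevalence)
    return true_positive_rate / (true_positive_rate + false_positive_rate)


# Sensitivity and specificity as quoted in the text; prevalence values are assumptions.
for prevalence in (0.05, 0.18, 0.50):
    ppv = positive_predictive_value(0.78, 0.71, prevalence)
    print(f"prevalence {prevalence:.2f}: PPV {ppv:.2f}")
```

Under these assumptions, a prevalence near 18% would yield a positive predictive value close to the quoted 0.37, which illustrates how a moderate false positive rate dominates when most examined lesions are benign.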
A number of clinical studies have reported the use of physical models in the detection of neoplastic changes in human patients. For example, Perelman's group used an algorithm for extraction of nuclear sizes from scattering spectra collected using endoscopic polarized scanning spectroscopy (EPSS) in patients with Barrett's esophagus to identify early malignant changes. 61 Backman et al. 62 developed an elegant method called low-coherence enhanced backscattering spectroscopy to quantify morphological changes in epithelial tissue associated with carcinogenesis; this technology showed sensitivity of 88% and 71% in advanced adenomas and nonadvanced adenomas, respectively. 63 Furthermore, it was demonstrated that quantitation of changes in blood supply using physical models of polarization gated spectroscopy can differentiate pancreatic adenocarcinoma from normal tissue with 92% sensitivity and 86% specificity. 2 However, despite being very promising, these technologies are better suited for unkeratinized thin epithelial tissue. Alternatively, physical models developed for diffuse reflectance spectroscopy are designed for the assessment of deeper tumors such as breast lesions. 64 These models were used to evaluate changes in tissue concentration of oxy- and deoxyhemoglobin, water, and lipid; 64 this technology showed promise in monitoring the response to breast cancer neoadjuvant therapy. 65 However, physical models that can accurately account for the influence of strong scattering from keratinized tissues are still lacking.
In our group's previous clinical trial using the OPRS fiber probe with a single pair of parallel/perpendicular detection fibers, we obtained statistically significant discrimination between the benign category and SD lesions with 100% sensitivity and 85% specificity and between the benign and MD sites with 92% sensitivity and 69% specificity. 24 We also observed a sensitivity of 90% and a specificity of 86% in the separation of normal from SD and a sensitivity of 75% and a specificity of 73% in the separation of normal from MD. 24 The multifiber OPRS probe presented here was inspired by our experiments in well-defined multilayer scattering phantoms mimicking dysplasia of the epithelial tissue, which showed that a multifiber probe design can improve depth-resolved spectroscopic measurements in scattering media; 45 however, these phantom studies did not take into account strong surface scattering by a keratin layer. The sensitivity and specificity values of OPRS for all diagnostic classification tasks are listed in Table 6. ROC thresholds were chosen such that both high sensitivity and high specificity were produced. The discrimination of normal from benign, MD, SD, and MD/SD combined generated sensitivity and specificity values above 95%. These results are a substantial improvement over our first clinical trial. It is interesting to note that there was no significant difference in the performance of all fibers and any particular fiber pair in these classification tasks. As expected, discrimination of benign from dysplasia resulted in lower sensitivity and specificity values. However, here the analyses including multiple detection fibers performed significantly better than individual fiber pairs in discriminating benign from MD and benign from SD, where sensitivity and specificity of 71%/93% and 80%/93%, respectively, were achieved.
Furthermore, in these two classification tasks, the overtraining test was passed only when an analysis was performed using data from all detection fibers, as shown in Fig. 10. Therefore, the multifiber OPRS probe design improved discrimination of benign from dysplastic lesions in the oral cavity as compared to a single fiber pair. It is also important to note that normalizing reflectance spectra to the AUC was critical in achieving statistically significant classification of benign versus MD and SD categories, as can be seen from comparing permutation tests for normalized (Fig. 10) and unnormalized (Appendix, Fig. 12) data.
Fig. 10 Permutation test to check for overtraining. Collected spectra were randomly assigned diagnoses while keeping the overall distribution of diagnostic categories the same; samples were randomly shuffled 100 times. The mean and standard deviation of the AUC obtained from the shuffling are presented and compared to the real AUCs shown as stars (all fibers combined), dots (BF1 fiber pair), squares (BF2), and circles (BF3).
The decrease in performance in the discrimination between the benign, MD, and SD diagnostic categories is most likely associated with overwhelming scattering from the epithelial keratin layer that is present in all of these lesions (Fig. 5). Furthermore, variations in the thickness of the keratin layer result in a high variability of scattering spectra in the benign category even after the standardization procedure, as can be seen in Fig. 2. These thickness variations are unpredictable and do not have diagnostic value, which significantly complicates optical detection of dysplasia in keratinized lesions. The same observation was previously made in studies of dysplastic and nondysplastic oral leukoplakias by Amelink et al. 60 Here, we used intrinsic data collected by the multifiber OPRS probe to show that multiple detection fibers improved the results of binary classification tasks that include benign lesions.
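The choice of an ROC threshold that balances sensitivity and specificity, mentioned above, can be illustrated with a short sketch. The criterion used in this study is not specified, so the example assumes one common option, maximizing Youden's J (sensitivity + specificity - 1); the function names, scores, and labels are hypothetical.

```python
def sensitivity_specificity(scores, labels, threshold):
    """Sensitivity and specificity when score >= threshold is called positive."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def best_threshold(scores, labels):
    """Threshold maximizing Youden's J (an assumed criterion, not the study's)."""
    best = None
    for threshold in sorted(set(scores)):
        sens, spec = sensitivity_specificity(scores, labels, threshold)
        j = sens + spec - 1.0
        if best is None or j > best[0]:
            best = (j, threshold, sens, spec)
    return best[1], best[2], best[3]


# Hypothetical classifier scores: label 1 is the positive (e.g., dysplastic) class.
scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.35, 0.7, 0.8, 0.9]
labels = [0, 0, 0, 0, 0, 1, 1, 1, 1]
threshold, sens, spec = best_threshold(scores, labels)
print(threshold, sens, spec)
```

Other criteria, such as requiring a minimum sensitivity and then maximizing specificity, would select a different operating point on the same ROC curve.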
It is interesting to note that there were noticeable differences between the performance of the multifiber OPRS probe and that of the OPRS probe with a single detection fiber pair evaluated in our previous pilot clinical trial; 24 these include significantly better discrimination between normal and all other diagnostic categories in the current study and a better performance of the previously reported probe in discriminating benign and SD. The observed differences in performance are likely associated with changes in probe design as well as with variations between the patient groups that were evaluated in these two studies.
Moreover, these results likely reflect challenges in achieving an adequate depth resolution of the spectroscopic measurements. Indeed, the axial extent of the overlap volumes between the probe's illumination and collection cones ranged from 750 to 1100 μm and from 1200 to 1400 μm for parallel and perpendicular detection fibers, respectively [Fig. 3(c)], which is significantly larger than the measured epithelial thickness, which varied from ca. 426 to 701 μm in progression from benign to SD (Fig. 5). Consequently, there was a strong contribution of stromal scattering in the collected spectra that was confounded by strong keratin scattering in the case of benign and dysplastic sites.
It is important to note that our standardization and normalization scheme does not account for the diverse anatomy of the oral cavity. McGee et al. 66 evaluated the impact of anatomy on classification accuracy in a study of 710 spectra from nine different anatomic sites in the oral cavity and found that significant dissimilarities in spectral features exist between different anatomic sites. In a later study wherein McGee et al. used anatomic site-specific algorithms, a sensitivity and specificity of only 53% and 70%, respectively, were achieved in the discrimination of benign from dysplasia when data were grouped together regardless of anatomic site. 44 However, the sensitivity and specificity improved to 92% and 67%, respectively, when only sites from the lateral surface of the tongue were considered. A similar trend was observed with a sensitivity of 94% and a specificity of 60% when samples from the floor of the mouth and the ventral surface of the tongue, anatomic sites found to have similar spectral properties, were analyzed. In future OPRS studies, samples could be stratified by anatomic site, which could improve diagnostic accuracy. However, due to the limited sample size in this study, stratification was not possible.
There are also some limitations that need to be taken into account with respect to our study. Although every effort was made to carry out consistent spectral measurements, the pressure applied to the probe was not controlled in a quantitative way. Although some studies indicated that probe pressure does not significantly alter spectroscopic measurements, 67 other reports showed that firm pressure can result in measurable spectral changes. 68 Therefore, the lack of precise control could have added an experimental variable to our data. In addition, it is conceivable that more than one wavelength per spectral feature could be meaningful in the discrimination of different classes. However, an in-depth investigation would entail considering all 301 wavelengths (from 450 to 750 nm) as possible features and conducting feature selection to narrow them down to a small set of wavelengths that best discriminates any two diagnostic classes. While this would be a very interesting question to investigate, the sample size in this study was not sufficient for such an investigation because of a high probability of overtraining.
In summary, the multifiber OPRS probe design allows simultaneous implementation of a number of approaches to enhance the depth sensitivity of scattering measurements in turbid media, including polarization gating, varying source-detector separation, and differential spectroscopy. Our study has shown that this combination can discriminate benign and dysplastic lesions in the context of highly keratinized epithelium of the oral cavity. However, strong scattering from the keratin layer, which is characterized by a high degree of interpatient variation, remains a significant challenge. A number of modifications can be envisioned for multifiber OPRS in order to further improve its performance in keratinized tissues, including the optimization of the probing depth of detection fibers and the implementation of physical models of light interactions with various tissue morphologies. 34,37,69-71 However, the results of this clinical study and our previous phantom experiments 45 indicate that multifiber OPRS could be better suited for epithelial tissues with a low amount of keratinization such as cervix, colon, or bladder. There are also interesting developments in depth-resolved spectroscopic measurements in tissue, including spectroscopic OCT 72,73 and depth-resolved reflectance spectroscopy using elliptically polarized light. 74-76 These emerging methods could offer new solutions to optical detection of dysplasia in keratinized tissues.

Appendix
This section includes the graph from which the concentration of Intralipid required to produce the target scattering coefficient of 2 mm⁻¹ at 600 nm was obtained (Fig. 11); Tables 7 and 8, which list the best discriminatory wavelengths for all unnormalized and normalized spectral features, respectively, analyzed in this study; and Fig. 12 with the results of the permutation test for unnormalized spectral features.

Disclosures
The authors have no financial interests associated with the results of this study.