Cervical cancer is one of the leading causes of cancer deaths in women and is known to be caused by human papillomavirus (HPV) infection.1 Only a small percentage of HPV-infected women develop cervical intraepithelial neoplasia (CIN). CIN lesions are divided into three grades of increasing severity. While the majority of low-grade CIN lesions (CIN 1) regress spontaneously, high-grade CIN (HGCIN: CIN 2 and CIN 3) tends to progress to invasive cancer.2–5
Cervical cancer progresses relatively slowly, and screening programs based on the analysis of Papanicolaou cytological smears (Pap test) have led to a remarkable decrease in the incidence and mortality of cervical cancer. In a systematic review, Nanda et al. found that the sensitivity of the Pap test ranges from 30 to 87% and its specificity from 86 to 100%.6
Abnormal Pap smears are usually followed by colposcopy, a visual examination of the cervix under magnification that includes the application of acetic acid and Lugol’s iodine solution. Iodine stains glycogen-containing normal squamous epithelium, while acetic acid causes acetowhitening of abnormal areas. Colposcopy is the basis for correct identification of an atypical transformation zone, for grading an underlying lesion, for targeted biopsy in the case of HGCIN, and for excisional therapy.7
Different grading indexes have been created to discriminate between normal findings, minor changes, major changes, and cancer.8–12 Because the procedure is subjective, the diagnostic accuracy of colposcopy depends on the experience of the examining physician, resulting in inter- and intraobserver variability.13–16 According to a meta-analysis, colposcopy has a high mean sensitivity of 85% but a low mean specificity of 69% for distinguishing normal tissue and CIN 1 from HGCIN and cancer.17 Benign changes (metaplasia, inflammation) and dysplasia may have similar colposcopic appearances; biopsies are, therefore, required to confirm the diagnosis.
Avoiding unnecessary biopsies is a major motivation for the development of optical sensing technologies, which are noninvasive and, owing to automated data analysis, objective. In addition, their real-time capability could enable a one-stop “see and treat” approach, which is considered a major potential benefit for cervical cancer management. Particularly in some regions of developing countries, diagnostic strategies requiring fewer visits would be more cost-effective.18
A number of ex vivo and in vivo studies of native tissue have shown the potential of Raman spectroscopy for detecting precancers in a diagnostic setting.19–27 The reported sensitivities and specificities for discriminating CIN from non-CIN tissue range from 73 to 100% and from 79 to 98%, respectively. This wide range of results is believed to stem from differences in study methods, such as sampling volume, study design, spectral preprocessing, classification and model validation methods, and the types and number of tissue classes to be separated, including the disease threshold (CIN 1 or CIN 2).
Previous in vivo studies of precancers were performed with fiber probes, and abnormal and normal measurement sites were selected by colposcopy.20,21,24–27 In this study, spectra were acquired by point-wise scanning over the surface of cone biopsies, with an average of 200 spectra per sample. To correlate Raman spectroscopy with histopathology in a spatially resolved manner, the histopathological results of cross-sectional tissue slices were assigned to measurement locations on the native sample surface. This procedure poses a diagnostic challenge for the pathologist because of the heterogeneity and deformation of the tissue under investigation. Using this experimental approach, our aim was to investigate whether the spatial extent of HGCIN can be predicted with Raman spectroscopy among the variety of non-HGCIN lesions that might be present on the cervix. These might include tissue regions that have been identified by colposcopy as normal, major changes, minor changes, or unspecific as well as miscellaneous findings. While inflammation and metaplasia have been studied previously,19–23,28 Raman classification of other tissue changes [e.g., erosion, atypical immature metaplasia (AIM)] has not yet been reported.
The results of ex vivo Raman spectroscopic raster scanning might be useful for future in vivo Raman diagnostic imaging. A fast, noninvasive determination of the spatial extent of larger HGCIN (CIN 2 and CIN 3) lesions prior to excision would help to ensure that the lesion margins are completely removed while conserving as much normal tissue as possible, thereby avoiding overtreatment of the patient and its possible negative impact on future childbearing.29 Furthermore, an in vivo imaging method with sufficient spatial resolution could ensure that small lesions are not missed due to undersampling. Undersampling may occur with point-wise fiber probe measurements because the most prominent areas of colposcopic change do not always coincide with the areas of greatest histologic abnormality, which is why some colposcopists even recommend random biopsies.30,31 Therefore, in vivo Raman analysis covering a large area of the cervix seems desirable. Such in vivo setups have already been realized for fluorescence and reflectance spectroscopy,32–34 but not yet for Raman spectroscopy.
In this study, the accuracy of spectra classification was calculated as a function of the types of non-HGCIN tissue pathology included in training the algorithm. Spectra classifications overlaid on the histological map were used to gain insight into the performance of the method for different pathologies. Several data reduction and classification methods were compared, and confidence intervals for sensitivity and specificity were estimated by bootstrapping.
Material and Methods
Patients with suspected precancer or cancer based on previous cytology, colposcopy, and biopsy findings underwent a loop electrical excision procedure of the cervix according to clinical routine. Nonpregnant patients aged 18 to 45 (presumed premenopausal) were included in the study after signing informed consent forms permitting ex vivo spectroscopic measurements on the cervix samples. Of the 34 recruited patients, 9 were excluded from the analysis due to various experimental or technical errors. Further samples were excluded from the data set used for training and evaluating the classification method because of varying conditions during the optimization phase of the study () and because of concurrent adenocarcinoma in situ (). Measurements from 13 to 16 patients were included in the quantitative data analysis; the number of contributing patients depended on the class definition, as described in Table 1. The study protocol was approved by the ethics committee of the Charité University Clinic Berlin.
Table 1. Class definitions A to E used to obtain the results presented in Figs. 5 and 6. The tissue types included in class 1 or class 2 of the binary spectra classification are listed in the second and third columns; for clarity, the study-specific tissue categories #1 to 8 described in Sec. 2.4 are given in parentheses. Note that, depending on the class definition, different numbers of patients (fourth column) and spectra (fifth and sixth columns) contributed to the analyzed data set.
| Definition | Class 1 | Class 2 | # patients | # class 1 spectra | # class 2 spectra |
|---|---|---|---|---|---|
| A | (1) normal squamous epithelium (iodine positive) | (4) HGCIN | 13 | 214 | 303 |
| B | (1)+(2) normal tissue as in A plus nonabnormal tissue (squamous and columnar epithelium, metaplasia, cervicitis) | (4) HGCIN | 15 | 841 | 303 |
| C | (2) nonabnormal tissue | (4) HGCIN | 15 | 627 | 303 |
| D | (1)+(2) as in B | (4)+(5) HGCIN, HGCIN-borders | 16 | 841 | 1514 |
| E | All – (4) – (5) – (8) = (1)+(2)+(3)+(6)+(7): all spectra assigned to histopathology (normal, nonabnormal, abnormal, CIN 1, erosion) except HGCIN and HGCIN-borders | (4) HGCIN | 16 | 1216 | 303 |
A Raman tissue scanner was constructed to scan large tissue samples (). The setup was based on an iHR320 spectrograph (HORIBA Jobin Yvon GmbH, Bensheim, Germany) with a ruled grating (600 ) and a back-illuminated deep-depletion CCD detector (Synapse) cooled to . A distributed feedback diode laser (Eagleyard Photonics, Berlin, Germany) with a center wavelength of 784 nm and a laser power of 70 mW was used. Because of the irregular shape of the samples, an inverted setup was chosen, in which the sample surface could be pressed gently onto a measurement window to keep it at a constant position in the focus of the laser and detection optics. The measurement window was either 1.5-mm-thick calcium fluoride (Korth Kristalle GmbH, Altenholz, Germany) or 1-mm-thick silica (Siegert Consulting e. K., Aachen, Germany). The focus of the laser and detection optics was set to the interface between the measurement window and the sample surface. The working distance was 9 mm.
The laser light was coupled from a single-mode fiber into a 600-μm multimode fiber (LEONI Fiber Optics, Berlin, Germany); its output was collimated with an achromatic lens (), filtered by a clean-up filter (Semrock, Rochester, New York), reflected by a dichroic mirror (Semrock), and focused onto the sample surface by an aspheric lens [, numerical aperture , Asphericon]. The resulting illumination spot had a diameter of 160 μm, as determined by knife-edge measurements.
For detection of the backscattered light, the aspheric lens was used together with two achromatic lenses ( and ) to image the Raman scattered light onto a circular detection fiber bundle consisting of 31 tightly packed fibers (LEONI Fiber Optics) with a core diameter of 100 μm and an NA of 0.22. A detection spot size of 200 μm was achieved, as determined by knife-edge measurements. A long-pass filter (RazorEdge, Semrock) in the collimated beam blocked the laser line. The round detection fiber bundle was converted into a linear stack of 100-μm fibers, which was imaged onto the spectrograph slit. To match the fiber NA (0.22) to the spectrograph NA (0.12), the image of the detection fiber stack was magnified, and a spectrograph slit width of 200 μm was used to minimize coupling losses into the spectrometer. The resulting spectral resolution in this optical configuration was measured as (full width at half maximum) using argon emission lines of a mercury arc lamp.
A motorized xy-translation stage with integrated measuring system (Märzhäuser Wetzlar GmbH & Co. KG, Wetzlar, Germany) was used for moving the tissue and measurement window relative to the fixed optical components. Prior to Raman mapping, the stage was used to drive the sample into the focus of a color camera (The Imaging Source Europe GmbH, Bremen, Germany) in order to obtain an image of the sample surface for documentation.
Software written in LabVIEW 8.6 (National Instruments, Austin, Texas) automatically controlled the laser, the spectrometer system, and the xy-translation stage. The total time for one tissue scan depends on the selected parameters, such as integration time, density of measurement points, and region of interest, which may be adapted to the sample type and size. The setup is mobile and light-tight, which allows measurements to be performed in illuminated clinical environments. Monte Carlo simulations35 with optical parameters of cervical tissue36,37 were used to calculate the depth-dependent sensitivity of our measurement, using the diameters of the excitation and detection spots on the tissue surface as well as the NA of the optics as input parameters. Figure 1(b) shows that our optical layout detected 65% (80%) of the total signal (integrated sensitivity) of HGCIN from a depth (300 μm), the thickness of cervical epithelium. It was therefore assumed that a sufficient fraction of the detected signal originates from regions close to the basal membrane, where cervical precancers originate. The Raman spectroscopy setup is illustrated in Fig. 1(a).
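The integrated sensitivity quoted above is the fraction of the total detected signal that originates above a given depth. Given a depth-resolved sensitivity profile (for example, one tallied from a Monte Carlo simulation), it follows from cumulative integration. The minimal sketch below assumes the profile is sampled on an increasing depth grid; the exponential profile and its 200 μm length scale are purely illustrative assumptions and are not the simulated profile of Fig. 1(b).

```python
import numpy as np


def integrated_sensitivity(depth_um, sensitivity, max_depth_um):
    """Fraction of the total detected signal originating at depths <= max_depth_um.

    depth_um:    increasing depth grid in micrometers
    sensitivity: detected signal per unit depth at each grid point (any units)
    """
    depth_um = np.asarray(depth_um, dtype=float)
    sensitivity = np.asarray(sensitivity, dtype=float)
    # Trapezoidal signal contribution of each slab between neighbouring grid points
    slab = 0.5 * (sensitivity[1:] + sensitivity[:-1]) * np.diff(depth_um)
    cumulative = np.concatenate(([0.0], np.cumsum(slab)))
    # Interpolate the cumulative signal at the requested depth and normalize
    return float(np.interp(max_depth_um, depth_um, cumulative) / cumulative[-1])


# Illustrative profile only: exponential decay with a hypothetical 200 um length scale
z = np.linspace(0.0, 3000.0, 30001)
profile = np.exp(-z / 200.0)
fraction_300um = integrated_sensitivity(z, profile, 300.0)
print(round(fraction_300um, 3))
```

For an exponential profile the result can be checked against the closed form 1 − exp(−d/L); a real simulated profile would simply replace `profile`.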
The orientation and position of the epithelial surface of each sample were indicated by the surgeons using thread marks and by pinning the sample onto a cork board. Each cervical sample was placed onto the measurement window within a self-adhesive silicone rubber frame, which allowed the sample to be immersed in physiological saline (0.9% NaCl) to preserve tissue hydration during measurements. Background spectra were measured with saline solution only (without sample). Each sample was loaded with balance weights to ensure even contact of the cervical epithelium with the measurement window.
The sample surface was xy-scanned and spectra were recorded at discrete measurement points with spectrum per within the chosen region of interest (area up to ). The number of spectra recorded for each sample depended on tissue dimensions.
The choice of exposure times and accumulations depended on the autofluorescence level and the total measurement time available. The total measurement time was limited to 1 h, the maximum time we were allowed to keep the samples in saline before fixation. The origin of the fluorescence could not be determined within the study; however, intraoperatively applied disinfectant and iodine could be excluded as its source. Because the fluorescence varied considerably between samples, the exposure time was varied from 0.1 to 1 s and the number of accumulations from 10 to 30. The acquisition parameters were adapted individually to a subjective optimum that allowed the whole sample to be scanned; for this purpose, a coarsely spaced test scan was performed to determine the average fluorescence level of each sample. In principle, the number of accumulations should have been chosen to maintain a constant integration time. However, the limited speed of the CCD shutter forced us to limit the number of accumulations to 30; otherwise, the total measurement time would have been unacceptably long, corresponding to total exposure times of 2 to 10 s.
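The trade-off between exposure settings and the 1 h limit is simple arithmetic: each measurement point costs the exposure time multiplied by the number of accumulations, plus any readout or shutter overhead. The sketch below makes this explicit; the per-readout `shutter_overhead_s` parameter is a hypothetical placeholder, since the actual shutter delay is not given in the text.

```python
def scan_time_s(n_points, exposure_s, accumulations, shutter_overhead_s=0.0):
    """Total acquisition time of a raster scan in seconds.

    Each point is read out `accumulations` times with the given exposure time.
    shutter_overhead_s is a hypothetical per-readout delay (e.g. a slow CCD shutter).
    """
    per_point = accumulations * (exposure_s + shutter_overhead_s)
    return n_points * per_point


def max_points(budget_s, exposure_s, accumulations, shutter_overhead_s=0.0):
    """Largest number of measurement points that fits into the time budget."""
    per_point = scan_time_s(1, exposure_s, accumulations, shutter_overhead_s)
    return int(budget_s // per_point)


# 1 h budget at the two extremes of the parameter ranges quoted in the text
print(max_points(3600, 0.1, 10))  # 0.1 s x 10 accumulations = 1 s per point -> 3600
print(max_points(3600, 1.0, 30))  # 1 s x 30 accumulations = 30 s per point -> 120
```

Conversely, the average of 200 spectra per sample mentioned earlier leaves at most 3600 s / 200 = 18 s per spectrum within the 1 h limit, before any overhead.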
In addition, the instrument response was measured daily using a fiber-coupled calibrated halogen lamp (Ocean Optics, Dunedin, Florida) by placing the cosine corrector at the end of the fiber into a customized holder on the measurement window.
Immediately after Raman measurements, the samples were fixed in 10% phosphate buffered formaldehyde solution for histopathological analysis.
Histopathological Mapping and Its Correlation to Raman Measurement Sites
The measurement locations were documented on the sample image, which had been taken prior to the measurement. The accuracy of the measurement position within this image was estimated to be using self-made test objects. The pathologist was provided with this image and the fixed sample, which often showed a slight shape deformation as compared to the native sample. According to routine procedures, each specimen was dissected into appropriate segments and then embedded in paraffin wax. Histological sections were then prepared in series, stained with hematoxylin-eosin, and analyzed by microscopy. The histopathological findings from the tissue cross-sections of the cone biopsy were documented by the pathologist on the image of the native cervix surface, which included the Raman measurement points to yield histopathological maps. The spatial uncertainty of this procedure was estimated to be 1 to 3 mm, depending on the circumstances. Examples of such histopathological documentations can be seen in Fig. 7, where the class membership prediction is overlaid with the histopathological documentations. Each spectrum was then labeled by the spectroscopist with one of the (study-specific) tissue categories 1 to 8 (described below) using a semiautomated program.
Please note that this histopathological mapping process is more challenging than assigning the histopathological results of punch biopsies to measurements at the biopsy sites. As it was not possible to engage more than one pathologist in our study, the pathologist involved analyzed each sample twice to improve the precision of the documented results. The histopathology results of all patients included in the study were discussed at intraclinical conferences. The pathologist supporting this study (coauthor W.K.) specializes in cytology and gynecological morphology and has worked in this field for over 30 years.
For the spectra classification, the following study-specific tissue categories were assigned to individual spectra. Some categories (2, 4, 5, and 8) comprise multiple tissue types to simplify the assignment of histopathology to measurement sites.
1. Normal squamous epithelium (with visible dark iodine stain from the intraoperative application prior to the measurement, without histopathological signs of abnormality within central areas of homogeneous/continuous pathology)
2. Nonabnormal tissue (including squamous epithelium without continuous dark iodine stain, columnar epithelium, metaplasia, and cervicitis, without histopathological signs of abnormality in a continuous area)
3. CIN 1 (spectra from within central areas of homogeneous tissue pathology)
4. HGCIN (CIN 2, CIN 3, or p16-positive AIM, including microinvasions; spectra from within central areas of homogeneous tissue pathology)
5. CIN-borders (specified by the CIN grade; assigned if CIN was not spatially homogeneous/continuous on a scale larger than the spatial uncertainty of the method. Spectra within 1 to 2 mm of the borders of larger CIN lesions or within 3 mm of a small CIN lesion were also annotated as CIN-border for the same reason. Such spectra, therefore, originate either from neighboring pathologies or from a mixed or intermediate pathology)
6. Erosion (without signs of CIN)
7. Abnormal epithelium (areas with abnormalities except CIN)
8. Unassigned spectra (spectra that could not be assigned to histopathology)
The accuracy of the lateral position and extent of pathologies in the histopathological map was estimated to be 1 to 3 mm (depending on the circumstances). Therefore, only spectra from within central areas of homogeneous tissue pathology were assigned to the pathologies. Note that this procedure significantly reduced the number of CIN spectra available for the quantitative evaluation of the method, because most HGCIN lesions were not homogeneously high grade but were focal, present only inside crypts, or mixed with CIN 1, metaplasia, or other non-high-grade pathologies. The label “CIN-border” was therefore assigned to spectra from areas in which CIN was not spatially homogeneous/continuous: because our procedure for correlating histopathology with measurement sites had an estimated precision of only 1 to 3 mm, small focal lesions (smaller than the spatial precision of the method) could not be assigned to measurement positions.
Spectra within 1 to 2 mm of the borders of larger CIN lesions or within 3 mm of a small CIN lesion were also annotated as CIN-border for the same reason. These CIN-border spectra, therefore, originated either from neighboring pathologies or from a mixed or intermediate pathology.
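The annotation rule described above can be summarized as a small decision function. The sketch below is a hypothetical simplification, not the semiautomated program used in the study: it assumes each measurement site is characterized by its signed distance to the nearest lesion border (negative inside the lesion) and by the lesion size, uses the upper end of the 1 to 2 mm margin, applies that margin on both sides of the border, and treats a lesion as "small" when it is smaller than an assumed 3 mm mapping precision.

```python
def annotate_cin_site(signed_distance_mm, lesion_size_mm, grade,
                      border_margin_mm=2.0, small_lesion_radius_mm=3.0,
                      mapping_precision_mm=3.0):
    """Hypothetical simplified CIN / CIN-border annotation rule.

    signed_distance_mm: distance from the site to the nearest lesion border
                        (negative inside the lesion, positive outside)
    lesion_size_mm:     lateral extent of that lesion in the histopathological map
    grade:              CIN grade label of the lesion, e.g. "HGCIN" or "CIN 1"
    Returns the lesion label, "<grade>-border", or None when the site is far
    enough away to be labeled from the surrounding histopathology instead.
    """
    if lesion_size_mm < mapping_precision_mm:
        # Small focal lesion: its exact position cannot be resolved, so every
        # site within the radius is treated as a border (mixed) site.
        if signed_distance_mm <= small_lesion_radius_mm:
            return f"{grade}-border"
        return None
    if signed_distance_mm < -border_margin_mm:
        return grade  # central area of a larger, homogeneous lesion
    if signed_distance_mm <= border_margin_mm:
        return f"{grade}-border"  # uncertainty band around the lesion border
    return None


print(annotate_cin_site(-5.0, 10.0, "HGCIN"))  # HGCIN
print(annotate_cin_site(1.0, 10.0, "HGCIN"))   # HGCIN-border
print(annotate_cin_site(2.5, 1.0, "HGCIN"))    # HGCIN-border
```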
AIM was tested for enhanced expression of the tumor suppressor gene p16. AIM lesions that were p16 positive were classified as HGCIN-equivalent lesions according to Ref. 38, as is common practice in the cooperating hospital. For the classification, p16-positive AIM was, therefore, included in the HGCIN category (if in the vicinity of HGCIN) or in the HGCIN-border category (if mixed with HGCIN and other categories within the uncertainty limits of the spatial correlation between histopathology and measurement position). Since p16-positive AIM was always interspersed with or in close vicinity to CIN or erosion in the samples of our study, a separate evaluation of the classification of p16-positive AIM lesions was not possible.
The CIN 1 to 3 classification was used instead of the 2001 Bethesda system terminology [low-grade squamous intraepithelial lesion (LGSIL) and high-grade squamous intraepithelial lesion (HGSIL)]39 because LGSIL also includes HPV-associated changes other than CIN 1. In most samples, however, our spatially resolved documentation of the histopathological analysis was limited to CIN, whereas HPV-associated changes, such as abnormal epithelium, were indicated without their spatial extent. If such abnormalities were present, the tissue was not assigned to the nonabnormal category.
Discrimination of HGCIN from LGCIN (CIN 1) and other low-grade lesions (HPV-induced abnormalities) is important because of the different treatment strategies. Unfortunately, only a few spectra could be measured from continuous CIN 1 areas, and only one sample showed erosion without colocalized focal CIN (although erosion was present in one third of all samples), so that separate training and classification of CIN 1 and erosion spectra could not be performed.
Spectra remained unassigned when the documented measurement site was within 1 mm of a sharp border of an iodine stain next to an abnormal area, on visible blood residues, or in small areas without histopathological specifications.
Preprocessing of Spectra
All spectra were checked for CCD signal saturation, and saturated spectra were removed from the data analysis. Cosmic-ray spikes were detected by their discontinuity and replaced by the average of neighboring values.
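A minimal sketch of these two checks is shown below. It assumes a 16-bit detector (saturation at 65535 counts) and single-pixel spikes; "discontinuity" is interpreted here as a pixel exceeding the mean of its two neighbours by more than a multiple of the median absolute deviation of that difference. The threshold of 5 is an illustrative choice, not a value from the study.

```python
import numpy as np


def is_saturated(raw_counts, saturation_level=65535):
    """True if any CCD pixel reached the (assumed 16-bit) saturation level."""
    return bool(np.any(np.asarray(raw_counts) >= saturation_level))


def remove_cosmic_spikes(spectrum, threshold=5.0):
    """Replace single-pixel positive spikes by the mean of their two neighbours.

    A pixel is flagged when it exceeds the mean of its neighbours by more than
    `threshold` times the median absolute deviation (MAD) of that difference,
    i.e. a local discontinuity a smooth Raman/fluorescence signal does not produce.
    """
    y = np.asarray(spectrum, dtype=float).copy()
    neighbour_mean = 0.5 * (y[:-2] + y[2:])
    jump = y[1:-1] - neighbour_mean
    mad = np.median(np.abs(jump - np.median(jump)))
    if mad == 0:
        mad = 1e-12  # avoid a zero threshold for perfectly linear spectra
    for i in np.flatnonzero(jump > threshold * mad) + 1:
        y[i] = 0.5 * (y[i - 1] + y[i + 1])
    return y


# Demo: smooth synthetic spectrum with one cosmic spike at pixel 50
x = np.linspace(0.0, 1.0, 100)
clean = 100.0 + 10.0 * np.sin(2 * np.pi * x)
spiky = clean.copy()
spiky[50] += 1000.0
cleaned = remove_cosmic_spikes(spiky)
```

Multi-pixel cosmic events would need a wider neighbourhood; the single-pixel version keeps the example short.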
Before classification, all components of the spectrum that did not result from Raman scattering inside the sample, but from the measurement window or other optical components, were removed as far as possible.
Because the measured spectrum contained a large fluorescence background, the sensitivity spectrum of the instrument led to a distinctive oscillatory pattern, mainly caused by the transmission spectra of the long-pass filters in the instrument. Therefore, an instrument response correction was performed by dividing all raw spectra by the sensitivity spectrum measured with the halogen calibration lamp.
The background, or baseline, consists of tissue fluorescence as well as Raman and fluorescence signals of the instrument. For the fluorescence correction, a background function containing a third-order polynomial was defined. For the instrument background correction, the background function also contained a spectrum measured without a sample, which had to be scaled because elastic backscattering of excitation light into the instrument varied from sample to sample and, therefore, led to varying amplitudes of the instrument’s Raman and fluorescence signals. For each spectrum, the coefficients of the background function were determined by a linear least-squares fit, and the residual of the fit was taken as the tissue Raman spectrum.
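The two corrections can be sketched with NumPy: a division by the sensitivity spectrum, followed by a linear least-squares fit of a third-order polynomial in wavenumber plus a scale factor times the no-sample spectrum, keeping the residual. This is a minimal sketch assuming both the tissue spectrum and the no-sample spectrum have already been divided by the same sensitivity spectrum; the axis rescaling is an added numerical-conditioning step, not something stated in the text.

```python
import numpy as np


def correct_instrument_response(raw, sensitivity):
    """Divide by the measured instrument sensitivity (halogen calibration lamp)."""
    return np.asarray(raw, dtype=float) / np.asarray(sensitivity, dtype=float)


def remove_background(spectrum, wavenumber, no_sample_spectrum, poly_order=3):
    """Least-squares fit of a polynomial plus a scaled no-sample spectrum.

    Returns (residual, coefficients); the residual is taken as the tissue Raman
    spectrum, and the last coefficient is the scale of the no-sample spectrum.
    """
    wn = np.asarray(wavenumber, dtype=float)
    # Rescale the axis to [-1, 1] so the polynomial columns are well conditioned
    x = 2.0 * (wn - wn.min()) / (wn.max() - wn.min()) - 1.0
    columns = [x**k for k in range(poly_order + 1)]
    columns.append(np.asarray(no_sample_spectrum, dtype=float))
    design = np.column_stack(columns)
    coeffs, *_ = np.linalg.lstsq(design, spectrum, rcond=None)
    residual = spectrum - design @ coeffs
    return residual, coeffs


# Demo: a spectrum consisting only of background should leave a ~zero residual
wn = np.linspace(400.0, 1800.0, 700)
window = np.exp(-((wn - 800.0) / 20.0) ** 2) + 0.5 * np.exp(-((wn - 1450.0) / 25.0) ** 2)
fluorescence = 500.0 + 0.2 * wn - 1e-4 * wn**2 + 2e-8 * wn**3
residual, coeffs = remove_background(fluorescence + 3.0 * window, wn, window)
```

Because the fit is linear in all coefficients, a single `lstsq` call per spectrum is sufficient; no iterative baseline search is involved.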
It is well known that the Raman signal intensity depends on the optical properties of a sample, which may vary.40 For this reason, the Raman spectrum was normalized using the standard normal variate algorithm.41 Finally, the spectrum was smoothed by a Savitzky-Golay filter (second order, ).
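Standard normal variate (SNV) normalization subtracts the mean of each spectrum and divides by its standard deviation, which removes multiplicative intensity differences. The sketch below applies SNV followed by SciPy’s `savgol_filter` with a second-order polynomial. The window length of 11 points is a placeholder assumption, since the window used in the study is not preserved in this text.

```python
import numpy as np
from scipy.signal import savgol_filter


def snv(spectrum):
    """Standard normal variate: subtract the mean, divide by the standard deviation."""
    y = np.asarray(spectrum, dtype=float)
    return (y - y.mean()) / y.std()


def normalize_and_smooth(spectrum, window_length=11):
    """SNV followed by second-order Savitzky-Golay smoothing.

    window_length is a placeholder; it must be odd and larger than the order (2).
    """
    return savgol_filter(snv(spectrum), window_length=window_length, polyorder=2)


# Demo: a second-order filter leaves a quadratic curve unchanged
x = np.linspace(-1.0, 1.0, 101)
quadratic = 3.0 * x**2 + x + 5.0
normalized = snv(quadratic)
smoothed = normalize_and_smooth(quadratic)
```

Applying SNV before smoothing, as in the text, does not matter for the filter itself because both operations are linear; the order only affects floating-point details.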
The aim of the Raman spectroscopic measurement was to discriminate between two classes of tissue according to the histopathological evaluation (Table 1). Therefore, the five binary classification procedures listed in Table 2 and described in the following were investigated.
Table 2. List of procedures used for spectra classification.
| No. | Classification procedure |
|---|---|
| 1 | PCA plus LR |
| 2 | PCA plus KNN |
| 3 | WT plus LR |
| 4 | WT plus KNN |
| 5 | PLS-DA |
After the preprocessing described in Sec. 2.5, the spectra were transformed using either principal component analysis (PCA) or a wavelet transformation (WT) based on a translation-invariant form of the Haar wavelet set.42
The large number of spectral features was reduced in two steps. In the first step, the 20 principal components with the largest variance were retained after PCA, whereas after WT, the features with the largest average difference between the two classes were selected.
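One common translation-invariant form of the Haar transform is the undecimated ("à trous") variant, in which each level replaces the signal by pairwise means and half-differences of samples 2^j apart without downsampling. The sketch below implements that variant with periodic boundaries, together with the class-mean-difference preselection; it is an illustrative assumption, not necessarily the exact transform of Ref. 42, and the number of levels and the `k` selected features are placeholders.

```python
import numpy as np


def stationary_haar(spectrum, levels=4):
    """Undecimated (translation-invariant) Haar transform, periodic boundaries.

    Returns the final approximation and the list of detail bands; the input is
    recovered exactly as approximation + sum(details).
    """
    approx = np.asarray(spectrum, dtype=float)
    details = []
    for j in range(levels):
        partner = np.roll(approx, -(2**j))  # sample 2**j positions further along
        details.append(0.5 * (approx - partner))
        approx = 0.5 * (approx + partner)
    return approx, details


def wavelet_features(spectrum, levels=4):
    """Concatenate all coefficient bands into one feature vector."""
    approx, details = stationary_haar(spectrum, levels)
    return np.concatenate([approx] + details)


def top_mean_difference_features(X, y, k):
    """Indices of the k features with the largest absolute class-mean difference."""
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    diff = np.abs(X[y == 1].mean(axis=0) - X[y == 0].mean(axis=0))
    return np.argsort(diff)[::-1][:k]


rng = np.random.default_rng(0)
signal = rng.normal(size=64)
approx, details = stationary_haar(signal, 3)
shifted_approx, shifted_details = stationary_haar(np.roll(signal, 5), 3)
```

Translation invariance means that shifting the spectrum shifts every coefficient band by the same amount, so features do not depend on where a band happens to fall relative to a dyadic grid.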
In the second step, the remaining 20 to 40 features were further reduced by feature selection in combination with the classification algorithm, using a wrapper method with forward selection or backward elimination.43 The wrapper compares feature subsets by evaluating the accuracy of the classification algorithm for each subset. Combining the two transformations with logistic regression (LR)44 or k-nearest neighbor analysis (KNN)45 yields four combinations of transformation and classification algorithms.
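As an illustration of procedure 1 (PCA plus LR), the sketch below chains PCA, a wrapper feature selector, and logistic regression in a scikit-learn `Pipeline`. scikit-learn’s `SequentialFeatureSelector` is used here as a stand-in for the wrapper of Ref. 43; the number of selected components and the inner `cv=3` are placeholder assumptions. Because the selector is a pipeline step, it is refit whenever the pipeline is refit, which is what repeating feature selection for every left-out patient requires.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline


def make_pca_lr(n_components=20, n_selected=5, direction="forward"):
    """PCA, then wrapper feature selection, then logistic regression.

    The wrapper adds (or removes) principal components one at a time and keeps the
    subset with the best cross-validated accuracy of the same classifier.
    n_selected is a placeholder value.
    """
    return Pipeline([
        ("pca", PCA(n_components=n_components)),
        ("select", SequentialFeatureSelector(
            LogisticRegression(max_iter=1000),
            n_features_to_select=n_selected,
            direction=direction,  # "forward" selection or "backward" elimination
            scoring="accuracy",
            cv=3,
        )),
        ("clf", LogisticRegression(max_iter=1000)),
    ])


# Synthetic demo: 120 fake "spectra" with 60 channels, class 2 shifted in 5 channels
rng = np.random.default_rng(0)
X = rng.normal(size=(120, 60))
y = np.repeat([0, 1], 60)
X[y == 1, :5] += 2.0
model = make_pca_lr(n_components=10, n_selected=3).fit(X, y)
n_kept = int(model.named_steps["select"].get_support().sum())
train_accuracy = model.score(X, y)
```

Replacing both `LogisticRegression` instances with `sklearn.neighbors.KNeighborsClassifier` gives the PCA plus KNN variant; the WT variants would replace the PCA step with a wavelet feature transform.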
The fifth procedure evaluated was partial least-squares discriminant analysis (PLS-DA),46 which was applied to the preprocessed but untransformed spectra.
To evaluate the performance of the five classification procedures, sensitivity (percentage of correctly classified spectra of class 2) and specificity (percentage of correctly classified spectra of class 1) were calculated using leave-one-patient-out cross-validation, in which all data of each patient were used in turn for validation while the remaining data were used for calibration. To prevent overfitting, it was found essential to perform all steps of the feature selection separately for each validation step (i.e., for each patient left out). For PLS-DA, feature selection corresponds to choosing the optimal number of components.
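Leave-one-patient-out validation maps directly onto scikit-learn’s `LeaveOneGroupOut` splitter with the patient ID as the group. The sketch below obtains one out-of-patient prediction per spectrum with `cross_val_predict` and computes sensitivity and specificity as defined above. The estimator is refit from scratch in every fold, so feature selection placed inside a `Pipeline` is repeated for each left-out patient. The demo data and patient IDs are synthetic.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneGroupOut, cross_val_predict


def lopo_sensitivity_specificity(estimator, X, y, patient_ids):
    """Leave-one-patient-out predictions with sensitivity and specificity.

    y uses 1 for class 2 (HGCIN) and 0 for class 1 (non-HGCIN). The estimator,
    including any feature selection inside a Pipeline, is refit for every
    left-out patient, so no data of that patient reach the training step.
    """
    y = np.asarray(y)
    pred = cross_val_predict(estimator, X, y, groups=patient_ids, cv=LeaveOneGroupOut())
    sensitivity = float(np.mean(pred[y == 1] == 1))  # correctly classified class 2
    specificity = float(np.mean(pred[y == 0] == 0))  # correctly classified class 1
    return pred, sensitivity, specificity


# Synthetic demo: 6 hypothetical patients with 10 spectra per class each
rng = np.random.default_rng(0)
patients = np.repeat(np.arange(6), 20)
labels = np.tile(np.repeat([0, 1], 10), 6)
features = rng.normal(size=(120, 5)) + 3.0 * labels[:, None]
pred, sens, spec = lopo_sensitivity_specificity(
    LogisticRegression(), features, labels, patients)
```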
The significance of sensitivity and specificity was tested with a randomization test, in which the class label of each spectrum is assigned at random. Training and validation are repeated for a large number of randomizations. The resulting set of values yields distributions of sensitivity and specificity under the null hypothesis that there is no systematic dependence between the spectral data and the histopathological classification. If the sensitivity and specificity obtained with the histopathological classification are unlikely to be drawn from these distributions, one concludes that a significant correlation exists between the histopathological classification and the spectral data. For balanced accuracy values of at least 65 to 70%, the randomization test showed that these values are based on a significant correlation () between the spectra and the histopathological result, i.e., that they do not occur by chance.
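The randomization test can be sketched generically: run the complete training and validation procedure once with the true labels and many times with randomly reassigned labels, then report how often the random labels score at least as well. In the sketch below, labels are randomized by permutation (which keeps the class sizes fixed; this is an assumption, as the text only states that labels were drawn at random), and an add-one correction keeps the p-value above zero. The `evaluate` callable and the toy threshold classifier in the demo are hypothetical stand-ins for the full leave-one-patient-out procedure.

```python
import numpy as np


def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity (class label 1) and specificity (class label 0)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    sensitivity = np.mean(y_pred[y_true == 1] == 1)
    specificity = np.mean(y_pred[y_true == 0] == 0)
    return 0.5 * (sensitivity + specificity)


def randomization_p_value(evaluate, y, n_randomizations=999, seed=0):
    """p-value of the observed score under randomly permuted class labels.

    evaluate(labels) must run the complete training/validation procedure for the
    given labels and return a score such as the balanced accuracy.
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y)
    observed = evaluate(y)
    null_scores = np.array([evaluate(rng.permutation(y)) for _ in range(n_randomizations)])
    p_value = (1 + np.sum(null_scores >= observed)) / (1 + n_randomizations)
    return observed, null_scores, p_value


# Toy demo: one informative feature and a midpoint-threshold classifier
x = np.r_[np.zeros(20), np.ones(20)] + 0.1 * np.random.default_rng(3).normal(size=40)
y = np.r_[np.zeros(20, dtype=int), np.ones(20, dtype=int)]


def evaluate(labels):
    threshold = 0.5 * (x[labels == 0].mean() + x[labels == 1].mean())
    return balanced_accuracy(labels, (x > threshold).astype(int))


observed, null_scores, p_value = randomization_p_value(evaluate, y, n_randomizations=199)
```

The smallest attainable p-value is 1/(1 + number of randomizations), so the number of randomizations bounds how small a significance level can be demonstrated.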
Confidence intervals for sensitivity and specificity were estimated with the bootstrap method47 by resampling the data patient-wise.
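Resampling patient-wise means drawing whole patients with replacement, so that all spectra of a patient stay together and correlated spectra from one cone biopsy are not treated as independent observations. The sketch below computes percentile bootstrap intervals for sensitivity and specificity from per-spectrum labels and (cross-validated) predictions; the percentile method, 2000 replicates, and the 95% level are illustrative assumptions, and the demo data are synthetic.

```python
import numpy as np


def patient_bootstrap_ci(y_true, y_pred, patient_ids, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CIs for sensitivity and specificity, resampling patients.

    Labels: 1 = class 2 (HGCIN), 0 = class 1 (non-HGCIN).
    Returns (sensitivity_ci, specificity_ci), each as [lower, upper].
    """
    rng = np.random.default_rng(seed)
    y_true, y_pred, patient_ids = map(np.asarray, (y_true, y_pred, patient_ids))
    patients = np.unique(patient_ids)
    rows_by_patient = {p: np.flatnonzero(patient_ids == p) for p in patients}
    sens, spec = [], []
    for _ in range(n_boot):
        drawn = rng.choice(patients, size=patients.size, replace=True)
        idx = np.concatenate([rows_by_patient[p] for p in drawn])
        t_sub, p_sub = y_true[idx], y_pred[idx]
        # A replicate without spectra of one class yields no estimate for that class
        sens.append(np.mean(p_sub[t_sub == 1] == 1) if np.any(t_sub == 1) else np.nan)
        spec.append(np.mean(p_sub[t_sub == 0] == 0) if np.any(t_sub == 0) else np.nan)
    q = [100 * alpha / 2, 100 * (1 - alpha / 2)]
    return np.nanpercentile(sens, q), np.nanpercentile(spec, q)


# Synthetic demo: 8 hypothetical patients with 5 spectra per class each
patients = np.repeat(np.arange(8), 10)
truth = np.tile(np.repeat([0, 1], 5), 8)
flawed = truth.copy()
flawed[(patients == 0) & (truth == 1)] = 0  # patient 0: all class-2 spectra missed
sens_ci_perfect, spec_ci_perfect = patient_bootstrap_ci(truth, truth, patients)
sens_ci_flawed, spec_ci_flawed = patient_bootstrap_ci(truth, flawed, patients)
```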
Evaluation of the Classification Performance Depending on Included Tissue Types
As this study aimed to detect HGCIN among neighboring non-HGCIN tissue areas within one larger suspicious tissue area, the spectra of each cone biopsy sample (one sample per patient) were classified into class 1 (non-HGCIN tissue) or class 2 (HGCIN tissue) depending on the histopathology at the respective measurement site. Note that a cone biopsy sample is typically larger (few ) than a punch biopsy sample (few ) and, thus, may contain multiple tissue types and pathologies.
Since a variety of non-HGCIN tissue types exists (e.g., metaplasia, inflammation, columnar epithelium, low-grade abnormalities), the classification performance might depend on the tissue types included in the non-HGCIN class, which might be one reason for the variation in results among previous studies.
To evaluate the classification performance as a function of the tissue types included in the analysis, tissue categories 1 to 7 were combined into different definitions of class 1 and class 2, called class definitions A to E, as described in Table 1. With this approach, we expected to include a larger variety of non-HGCIN tissues per patient (intrapatient variability) than fiber probe measurements, which acquire a few spectra at the most suspicious and the most normal-looking sites under colposcopic examination. Two classes were used because binary classifiers were applied. Each class in Table 1 contains spectra from all patients who had measurement sites fulfilling the criteria of the respective class definition.
In class definition A, class 1 included only normal squamous epithelium, identified by a dark iodine stain (plus the absence of abnormal areas in the histopathological maps). This class definition was expected to yield the best classification result because anatomical differences exist in addition to the neoplastic changes.48
Class definition B includes different types of nonabnormal tissue, namely columnar epithelium, inflammation, and metaplasia in addition to squamous epithelium. In class definition C, the squamous epithelium used in A was left out in order to investigate a possible increase in classification performance due to the inclusion of clinically normal sites, as shown previously for fluorescence and reflectance spectroscopy.48 In class definition D, the large number of CIN-border spectra was added to class 2; non-HGCIN tissue might therefore have been included in the HGCIN class, because the CIN-border spectra could not be assigned to a distinct tissue pathology due to the limited spatial precision of the histopathological mapping. In class definition E, the different non-HGCIN spectra, including those of CIN 1, erosion, and low-grade abnormalities, were included in class 1, which thus comprised all measured (unsaturated) spectra except those of HGCIN, HGCIN-borders, and spectra that remained unassigned to histopathology.
Results and Discussion
Influence of Preprocessing on Extracted Raman Spectra
The influence of the corrections for instrument background, tissue fluorescence, and instrument response is shown in Fig. 2. Without the instrument response correction, the oscillation of the spectra caused by the filter transmission is strong. Without the instrument background correction (for comparability, a polynomial fluorescence correction was still subtracted), large silica peaks obscure the tissue Raman signal. With both corrections, the preprocessed spectra look similar to cervical spectra in the literature (see discussion in Sec. 3.2), with few exceptions. The dip at is likely an artifact specific to our data, which may be due to the scaled subtraction of instrument background spectra measured with saline in the sample holder. Since the amount of saline in the sampling volume was much larger without a sample than with a sample, this may result in a negative contribution of water in the preprocessed spectra. Another possible explanation is an insufficient instrument response correction, because the dip is also present both in the spectra without background correction and in those without instrument response correction [see Fig. 2(a)]. The similarity of the preprocessed spectra despite the characteristic difference between the background spectra of the silica and CaF2 measurement windows [see Fig. 2(b)] suggests that our method removed most of the specific spectral features of the measurement window.
Raman Spectral Variation Within and Between Different Tissue Types
In Fig. 3(a), mean preprocessed Raman spectra of different groups of tissue are shown. The mean spectra of class 1 in class definitions A and B (consisting of either iodine positive squamous epithelium only or other nonabnormal tissue types) are grouped separately from the mean spectra of class 2 in class definitions A and D (consisting of either HGCIN or HGCIN plus HGCIN-border tissue) in the range of 450 to , 900 to , 1200 to , and 1530 to .
Normal squamous epithelium showed the largest differences from all other spectra, possibly due to anatomical differences. Squamous epithelium with dark iodine staining was mostly found in peripheral areas of CIN-containing samples, whereas the CIN lesions were predominantly found in central areas of the cone biopsies within the transformation zone. The similarity between the HGCIN spectra and the HGCIN spectra combined with three times as many HGCIN-border spectra suggests either that these border measurement sites were partially HGCIN (which is likely given the chosen procedure of spectra assignment) or that the adjacent tissue showed some biochemical features of CIN, as previously suggested in Ref. 24.
Spectra of erosion were similar to the HGCIN plus HGCIN-border spectra, except for the large peak at . This is likely because erosion often coexisted with focal CIN, so these spectra were also assigned to the HGCIN-border spectra. Spectra of erosion without colocalized CIN were found in only one sample and are, therefore, not shown here. For all spectra shown in Fig. 3(a), the standard deviation of a single spectrum was high compared to the differences between the mean spectra; see Fig. 3(b), where the standard deviation is shown for the two most different spectra.
For better visualization of the spectral differences, difference mean spectra are shown in Fig. 4. The spectra contributing to the mean spectra were chosen according to the class definitions in Table 1. As expected from the anatomical differences between squamous epithelium and the transformation zone, where neoplasia typically originates, the largest spectral differences were found between normal squamous epithelium and HGCIN. The trend of the spectral differences was similar in all four cases. A decrease in the Raman signal of HGCIN tissue and borders compared with normal and benign tissue was observed at 450 to 500, 850 to 870, 900 to 960, 1090 to 1170, 1330 to 1410, and 1590 to 1. An increase in the Raman signal of HGCIN tissue and borders compared with normal and benign tissue was observed at 500 to 670, 1200 to 1320, 1410 to 1450, and 1530 to . The large difference at has previously been assigned to glycogen,49 which is characteristic of normal squamous epithelium. The large difference around was difficult to assign because of the potential influence of a background removal artifact. The peaks at and have been reported previously and were assigned to glycogen49,50 as well as to collagen.50,51
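A difference mean spectrum, and the grouping of its points into ranges of increase and decrease like those listed above, can be sketched as follows. The label coding (1 for non-HGCIN, 2 for HGCIN) and the sign convention (class 2 minus class 1) are assumptions chosen so that a positive value corresponds to a Raman signal increase in HGCIN; the helper names are hypothetical.

```python
import numpy as np


def difference_mean_spectrum(spectra, labels):
    """Mean spectrum of class 2 (HGCIN) minus mean spectrum of class 1.

    spectra: (n_spectra, n_wavenumbers) array of preprocessed spectra.
    labels: class label per spectrum, 1 = non-HGCIN, 2 = HGCIN (assumed coding).
    Positive values mean a higher Raman signal in class 2.
    """
    spectra = np.asarray(spectra, dtype=float)
    labels = np.asarray(labels)
    return spectra[labels == 2].mean(axis=0) - spectra[labels == 1].mean(axis=0)


def sign_regions(wavenumbers, diff):
    """Group consecutive points with the same sign into (start, end, trend) tuples."""
    signs = np.sign(diff)
    regions, start = [], 0
    for i in range(1, len(signs) + 1):
        if i == len(signs) or signs[i] != signs[start]:
            trend = {1.0: "increase", -1.0: "decrease"}.get(float(signs[start]), "none")
            regions.append((wavenumbers[start], wavenumbers[i - 1], trend))
            start = i
    return regions
```

Because the single-spectrum standard deviation is large compared with these differences, such ranges describe mean trends only and say nothing by themselves about the separability of individual spectra.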
A broad difference around has not been found in the literature. The difference around has been reported previously21,23,25,27,28,50,51 (assigned to proteins, DNA, and lipids), but with the opposite trend; in Ref. 49, however, an increase in this spectral region was also found in cancer (assigned to collagen).
In our data, a decrease around was observed for precancer, which may be assigned to glycogen (at , Ref. 49). Others, however, detected an increase in this region in precancer or cancer around 1330 to (Refs. 20, 23, 25, and 27) and at 1350 to (Refs. 23 and 25), and an increase at 1350 to was also reported in Refs. 20 and 51.
The sharp decrease at has also been reported;45 however, opposite trends have also been observed.27 The spectrally broad decrease of precancer/cancer around has been reported in many previous studies,21,23,25,27,51 but opposite results have also been reported.20 We believe that the partially inconsistent spectral trends in the literature may be due to different sampling volumes or different spectral preprocessing.
Diagnostic Accuracy for the Separation of HGCIN from non-HGCIN Depending on the Tissue Types Included in the Classification Procedure
In order to evaluate the classification performance depending on the included tissue types, separate training and evaluation was performed for each of the different class definitions described in Table 1. In addition, five classification procedures (see Table 2) were compared for each of the class definitions.
Figure 5 shows that the balanced accuracy for each class definition A to E varies among the classification procedures 1 to 5. The variation between classification methods is smallest for class definition A. Despite the large number of spectra, the statistical uncertainty of the individual balanced accuracy values was large. Patient-wise bootstrapping resulted in a standard deviation (1 sigma) of 9% for a value obtained with class definition A and of 12% for a value obtained with class definition B (individual error bars are omitted from Fig. 5 for clarity). This large variability in classification performance might be due to the large interpatient variability among the small number of patients and the unbalanced distribution of HGCIN and non-HGCIN spectra within each sample. Consequently, we concluded that the differences among the classification procedures were not significant in our study.
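The idea of patient-wise bootstrapping can be sketched as follows, assuming the standard two-class balanced accuracy (mean of sensitivity and specificity) and the same 1/2 label coding as above. Whole patients are drawn with replacement so that the correlation between spectra of one patient is kept. For simplicity the sketch resamples a fixed set of predictions; whether the classifiers were retrained for each resample is not stated in this section, and all names and defaults (`n_boot`, `seed`) are hypothetical.

```python
import numpy as np


def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity (class 2, HGCIN) and specificity (class 1, non-HGCIN)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    sensitivity = np.mean(y_pred[y_true == 2] == 2)
    specificity = np.mean(y_pred[y_true == 1] == 1)
    return 0.5 * (sensitivity + specificity)


def patientwise_bootstrap_sd(y_true, y_pred, patient_ids, n_boot=1000, seed=0):
    """SD of the balanced accuracy when whole patients are resampled with replacement."""
    rng = np.random.default_rng(seed)
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    patient_ids = np.asarray(patient_ids)
    patients = np.unique(patient_ids)
    scores = []
    for _ in range(n_boot):
        # Draw patients, not spectra: all spectra of a drawn patient enter together.
        drawn = rng.choice(patients, size=patients.size, replace=True)
        idx = np.concatenate([np.flatnonzero(patient_ids == pid) for pid in drawn])
        t_b, p_b = y_true[idx], y_pred[idx]
        # Balanced accuracy is undefined if a resample lacks one of the classes.
        if np.any(t_b == 1) and np.any(t_b == 2):
            scores.append(balanced_accuracy(t_b, p_b))
    return float(np.std(scores, ddof=1))
```

With only a few patients, a single patient whose spectra are mostly misclassified dominates the spread of the resampled scores, which is consistent with the large standard deviations reported above.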
To compare the performance of class definitions A to E, the balanced accuracies were averaged over the five classification methods, giving 87% (A), 70% (B), 67% (C), 61% (D), and 69% (E). Judging the significance of these average balanced accuracies from the statistical uncertainty of the single values obtained by bootstrapping is not trivial, since the results of the different classification methods cannot be assumed to be statistically uncorrelated. We assumed, however, that a realistic estimate of the uncertainty lies between the uncertainty of a single value and the standard error of the mean value (1 to 4%).
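The bracketing argument above can be made concrete with a small sketch. The per-method values in the usage below are hypothetical (only the averages are given in the text); the sketch computes the mean and standard error over methods and returns the interval between that standard error and the single-value bootstrap standard deviation, which applies if the methods were fully correlated and averaging did not reduce the uncertainty.

```python
import numpy as np


def mean_and_sem(values):
    """Mean and standard error of the mean across classification methods."""
    values = np.asarray(values, dtype=float)
    return float(values.mean()), float(values.std(ddof=1) / np.sqrt(values.size))


def uncertainty_range(values, sd_single):
    """Plausible lower and upper uncertainty of the method-averaged balanced accuracy.

    Lower end: standard error of the mean computed from the spread over methods.
    Upper end: single-value bootstrap SD (fully correlated methods).
    """
    _, sem = mean_and_sem(values)
    return min(sem, sd_single), max(sem, sd_single)
```

For example, five hypothetical per-method values `[0.85, 0.87, 0.89, 0.86, 0.88]` average to 0.87 with a standard error of about 0.007, so together with a single-value bootstrap SD of 0.09 the uncertainty of the average would be bracketed by roughly 0.7% and 9%.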
The highest balanced accuracy was obtained when HGCIN was discriminated against normal squamous epithelium. A major result was that the balanced accuracy decreased when the heterogeneity of class 1 (non-HGCIN tissue) increased (class definitions B and E). The result for class definition E is relevant because this case is the most representative of a real imaging scenario, in which HGCIN should be detected among all other non-HGCIN tissue sites. A slight decrease was observed from class definition B to C, when clinically normal (iodine-positive) tissue was excluded from the analysis (class definition C). The lowest accuracy was obtained when the HGCIN-border spectra were included in the HGCIN class (class definition D), which is not surprising because the fraction of HGCIN-border spectra that were actually non-HGCIN was expected to decrease the diagnostic performance.
The very good Raman-based differentiation of HGCIN from normal squamous epithelium alone (class definition A) does not represent a realistic estimate of the diagnostic performance of Raman spectroscopy in a clinical setting. This is because only two special tissue types are included in the analysis, which (in the absence of tissues with a similar colposcopic appearance) could also be distinguished by the application of acetic acid. Furthermore, it has been shown previously for fluorescence and reflectance spectroscopy of the cervix that the common practice of including clinically normal squamous sites in the analysis leads to artificially improved performance in distinguishing high-grade lesions from clinically suspicious non-high-grade lesions, because underlying differences in tissue anatomy can have a confounding effect on spectroscopic parameters.48 Therefore, the decrease in performance when squamous epithelium (without dark iodine stain), columnar epithelium, metaplasia, and cervicitis were included in the non-HGCIN class (class definition B), and the further decrease when the normal iodine-positive tissue from class definition A was left out of class 1 (class definition C), suggest that our Raman analysis might have been influenced by confounding differences in anatomy as well. Including metaplastic tissue in the analysis seems essential, since metaplasia is a benign transformation of columnar to squamous epithelium in the transformation zone, where most dysplastic lesions originate. Also, benign changes (such as metaplasia or inflammation) can have a colposcopic appearance similar to that of precancer, which is why a Raman-based separation of these changes from precancer would be an improvement over colposcopy. Therefore, class definition B was chosen to produce maps of the predicted class memberships, which were merged with the histopathological mapping for a spatially resolved evaluation of the prediction accuracy; see Fig. 7.
The interpretation of the maps was complicated by the large statistical variance of the diagnostic performance (see Fig. 6). Different methods gave different mapping results, and no method proved to be the best in all cases. In many cases, it was impossible to judge agreement with histopathology objectively because of the limited precision of the spatial correlation, especially in regions with focal CIN. Furthermore, repeated measurements with different window materials gave conflicting results, even though data measured with both window types were included in the model training. Despite these limitations, there were promising results, such as in Fig. 7(a), where the HGCIN region was predicted well but misclassifications occurred in a region with koilocytosis. Figure 7(b) shows misclassifications in a region with follicular cervicitis. In the only sample in which erosion was present without islands of focal CIN, all methods classified the erosion predominantly as HGCIN (data not shown). In principle, with a significantly better statistical basis, interpreting the predicted class memberships of individual measurement locations would be justified. This would allow evaluation of the method's performance in determining the spatial extent of HGCIN, or of its performance for less frequent pathologies.
The diagnostic performances achieved in our study partially overlap the large range of results from previously published studies,20–27 which is not surprising given the large differences in sensitivity and specificity that we obtained by varying the methods and the groups of tissue pathologies included in class 1 or 2 (class definitions A to E) and by resampling with bootstrapping. In general, the results of individual studies are difficult to compare because of differences in the experimental and data analysis methods, in the tissue pathologies included, and in the disease thresholds used. When comparing the average performance of all classification procedures using class definition B shown in Fig. 6, we obtained a lower sensitivity (69%) and specificity (71%) than all previous studies known to us. However, the results of other studies with balanced accuracies of are still within the range of the standard deviation of the different classification procedures or of the error of an individual method estimated by bootstrapping.
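Assuming the standard two-class definition of balanced accuracy (BA) as the mean of sensitivity (Se) and specificity (Sp), the average sensitivity and specificity for class definition B correspond to a balanced accuracy of 70%, matching the method-averaged value reported for class definition B; because the definition is linear, averaging Se and Sp first or averaging the BA values first gives the same result:

```latex
\mathrm{BA} = \frac{\mathrm{Se} + \mathrm{Sp}}{2} = \frac{0.69 + 0.71}{2} = 0.70
```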
There may be various reasons for the lower performance of our approach compared with previous studies. For example, the excellent results in Ref. 23 for discriminating HGCIN from low-grade CIN, metaplasia, and normal tissue were obtained using a multiclass classifier, which was assumed to be beneficial for discriminating HGCIN from multiple other tissue pathologies. Since the KNN method we used makes no assumptions about the distribution of the features, it should not suffer from the increased variance in the data caused by assigning several tissue types to one of two classes. Furthermore, we can only speculate that the preselection of measurement sites by colposcopy, as in the previous fiber probe studies, favors the measurement of tissue sites with distinct appearances, in contrast to our raster scan method, with which we expected to collect spectra with a larger variance. Another possible influence on the classification performance is the interindividual variance, which in this study was found to be larger than the intraindividual variance and was sampled with fewer patients than in other studies. In addition, we assume that our method of correlating the histopathological mapping with the measurement sites was more susceptible to registration and assignment errors than a study measuring at biopsy sites.
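The argument that a KNN classifier tolerates heterogeneous classes can be illustrated with a generic k-nearest-neighbor sketch. The Euclidean distance, the value of `k`, and the toy two-dimensional features are assumptions for illustration; the settings of the classification procedures in Table 2 are not reproduced here.

```python
import numpy as np


def knn_predict(train_X, train_y, test_X, k=5):
    """Predict labels by majority vote among the k nearest training spectra.

    Uses Euclidean distance. No parametric model of the class distributions is
    fitted, so a class may consist of several dissimilar tissue types.
    Ties are broken in favor of the smallest label (np.unique sorts labels).
    """
    train_X = np.asarray(train_X, dtype=float)
    train_y = np.asarray(train_y)
    test_X = np.atleast_2d(np.asarray(test_X, dtype=float))
    predictions = []
    for x in test_X:
        distances = np.linalg.norm(train_X - x, axis=1)
        nearest_labels = train_y[np.argsort(distances)[:k]]
        labels, counts = np.unique(nearest_labels, return_counts=True)
        predictions.append(labels[np.argmax(counts)])
    return np.array(predictions)
```

In a toy case where class 1 consists of two separate clusters around (0, 0) and (10, 10) and class 2 sits between them around (5, 5), both classes can have the same mean, (16/3, 16/3) for the points used in the check below, so a classifier relying only on class means could not separate them, whereas the neighbor vote still assigns each test point to its local cluster.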
Conclusion and Outlook
Based on our results, several improvements for future studies are highly desirable. Most importantly, because of the strong interpatient variability and the resulting large statistical uncertainty of the diagnostic performance, a future study would benefit from a larger number of patients. However, the detailed histopathology required by our study design did not allow us (for capacity reasons) to include as many patients as would have been possible in a typical fiber-probe-based study, in which the reference is often colposcopy for normal spectra or histopathology of a few abnormal sites per patient. Based on our confidence interval, we estimated that reducing the variability in spectral classification such that a 95% confidence interval of 10% () is reached would require patients. A larger study might be able to show whether the spatial extent of high-grade CIN can be robustly determined among all the low-grade and benign tissue pathologies that might be present on the cervix.
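A sample-size estimate of the kind described above can be sketched under the common assumption that the bootstrap standard deviation of the balanced accuracy scales as one over the square root of the number of patients. The concrete patient numbers and interval are not reproduced here; the inputs in the usage below are hypothetical, and treating the target as the half-width of a normal-approximation 95% interval is also an assumption.

```python
import math


def required_patients(n_current, sd_current, target_half_width, z=1.96):
    """Rough number of patients needed so that z * SD shrinks to target_half_width.

    Assumes the SD of the balanced accuracy scales as 1/sqrt(number of patients).
    """
    current_half_width = z * sd_current
    n = n_current * (current_half_width / target_half_width) ** 2
    # Round before ceil so float noise such as 40.000000001 does not add a patient.
    return math.ceil(round(n, 9))
```

For instance, with a hypothetical 10 patients, a single-value SD of 9%, and a target half-width of 5%, the estimate is 125 patients; halving the interval width always requires about four times as many patients under this scaling.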
Some technical improvements would also be beneficial. For example, a tool for better spatial correlation of the histopathological mapping with the measurement positions could increase the number of spectra that can be evaluated quantitatively.
As the dominant source of noise was the very high background, which strongly influenced the baseline and background correction, the benefits of instrumental changes such as a longer excitation wavelength or the use of shifted-excitation or modulated Raman spectroscopy52,53 should be investigated.
Since fluorescence and reflectance spectroscopy can also achieve reasonable sensitivities and specificities,54 a multimodal approach might help to further improve the diagnostic accuracy. Designs of multimodal probes and setups have been reported, and the performance of combining Raman with UV-visible and near-infrared (NIR)-excited fluorescence spectroscopy has been investigated for other applications.55–57 For our data, we did not see an improvement in diagnostic accuracy when the strength and slope of the NIR-excited autofluorescence background as well as the Rayleigh peak of the laser line were used as additional inputs for the classification algorithm.
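One possible way to turn the background strength, background slope, and Rayleigh peak into additional classifier inputs is sketched below. Modeling the background with a straight-line fit to the uncorrected spectrum, and using the spectrum mean as its strength, are assumptions for illustration; the helper names are hypothetical and this is not necessarily how the features were derived in this study.

```python
import numpy as np


def background_features(wavenumbers, raw_spectrum, rayleigh_intensity):
    """Hypothetical extra features: background strength, background slope, Rayleigh peak.

    A straight line fitted to the raw (uncorrected) spectrum serves as a crude
    model of the NIR-excited autofluorescence background.
    """
    slope, _intercept = np.polyfit(wavenumbers, raw_spectrum, 1)
    strength = float(np.mean(raw_spectrum))
    return np.array([strength, slope, float(rayleigh_intensity)])


def augment_features(raman_features, wavenumbers, raw_spectrum, rayleigh_intensity):
    """Append the background descriptors to the Raman features of one spectrum."""
    extra = background_features(wavenumbers, raw_spectrum, rayleigh_intensity)
    return np.concatenate([np.asarray(raman_features, dtype=float), extra])
```

Before such augmented vectors are passed to a distance-based classifier such as KNN, the extra features would need to be scaled to a range comparable with the Raman features; otherwise the large background intensities would dominate the distance.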
Cantor et al.54 modeled the cost-effectiveness of fluorescence spectroscopy in a see-and-treat scenario. They found that if spectroscopy could achieve a sensitivity of at least 84% and a specificity of at least 76%, there would be large savings in health care dollars from “biopsies avoided” and from being able to “see-and-treat” more accurately with a loop electrosurgical excision procedure. This is within the range of both our and previous results, which is why further work toward translation into routine clinical use is desirable. However, substantial improvements in colposcopy have been reported in a recent study.7 HGCIN was detected with a combined sensitivity of 78% and specificity of 93% using the inner border sign, the ridge sign, and the newly defined rag sign, without considering the conventional colposcopic criteria. These pathognomonic signs, introduced in the latest terminology of the International Federation for Cervical Pathology and Colposcopy (2011), are objective (either present or absent), easy to see, easy to learn, leave little room for subjectivity, and are easier to use than a grading index.7 Consequently, a higher diagnostic performance than that proposed by Cantor et al. might be required for a new technology such as Raman spectroscopy to enter routine clinical diagnostics in the future.
Previous in vivo studies have shown the diagnostic performance of colposcopy-guided, fiber-probe-based Raman spectroscopy.20–27 Our study aimed at developing a wide-area analysis of the cervix. One advantage of such an approach over the use of fiber probes is that, provided the spatial resolution is sufficient, small lesions would not be missed due to undersampling. Since colposcopic information is not used to select measurement sites, such a method could, in principle, be used without colposcopy. Therefore, further studies aimed at developing in vivo Raman imaging would seem desirable.
This project was cofinanced by the Senate of Berlin and by the European Union (European funds for regional development, FKZ 10147189). We thank Günter Cichon and Achim Schneider (Charité University Hospital, Department of Gynecology, Campus Benjamin Franklin) and all staff members for their active support of the study. We would like to thank our colleagues Stefan Ey, Ihar Shchatsinin, Stefan Baar, and Lutz Krebs for their contributions to software implementation and the instrumental setup. We also thank our colleague Daniela Schädel for useful advice and Caroline Reid and Lesley Hirst for proofreading the manuscript. Furthermore, we would like to thank the reviewers for their helpful comments.
Carina Reble graduated from the Physics Department of the Ruprecht-Karls University in Heidelberg in 2007. Since 2008, she has been a research scientist at LMTB in Berlin. She is a PhD student at the Technical University Berlin. Her doctoral work is concerned with Raman spectroscopic methods for the analysis of skin and cervix.
Ingo Gersonde received his PhD degree in physics from Freie Universität Berlin, Germany. At the LMTB Berlin he is involved in Raman spectroscopy and remission spectroscopy. His work also includes chemometric and classification methods for analysis of spectra and sensor data.
Cathrin Dressler received her diploma in biology from Freie Universität Berlin in 1989 and joined the department of laser medicine at the Benjamin-Franklin University hospital as an assistant scientist. After receiving her PhD in 1996, she joined the company Laser- und Medizin-Technologie GmbH, Berlin, as a research scientist. Working in various cooperative research projects, she focused on developing new near-field and far-field microscopy technologies as well as nanotechnology and Raman spectroscopy.
Jürgen Helfmann received his PhD degree in physics from Freie Universität Berlin, Germany. At the LMTB Berlin, he heads the divisions of medical technology and spectroscopy for diagnosis and sensor development. Having worked for 27 years alternately for the Institute of Medicine/Technical Physics & Laser Medicine at the Charité and the Laser Medizin Zentrum, now the LMTB, he has gained extensive experience in various fields of biomedical optics.
Wolfgang Kühn is a pathologist and a physician for gynecology and obstetrics. He is specialized in colposcopy and cervical diseases and was professor and head of the department of cytology and gynecological morphology at the Charité University Clinic Berlin from 1991 to 2011. Since his retirement, he has worked in a private medical office and laboratory. He is the vice chairman of the German Society for Colposcopy and Cervical Pathology and has published numerous papers and books.
Hans Joachim Eichler is head of the Laser-Group at the Institute of Optics and Atomic Physics at the TU Berlin. He was previously an advisor to the German Ministry of Education and Research (BMBF) and a member of the technical staff at Bell Labs in Holmdel, USA, working on nonlinear optics of silicon and semiconductor lasers. From 2001 to 2013, he acted as CTO and CEO of Laser- und Medizin-Technologie GmbH, Berlin. He has published numerous journal papers on optics and books on lasers and photonics.