Cervical cancer is the second most common cancer in women worldwide, and 80% of cases occur in the developing world, where the fewest resources exist for management.1 Most cases of cervical cancer can be prevented through screening programs aimed at detecting precancerous lesions. Screening for cervical neoplasia using the Papanicolaou (Pap) smear, followed by colposcopy, biopsy, and treatment of neoplastic lesions, has dramatically reduced the incidence and mortality of cervical cancer in every country in which organized programs have been established.2 This is because the precursors, denoted as cervical intraepithelial neoplasia (CIN) or squamous intraepithelial lesions (SIL), can take 3 to 20 yr to develop into cancer. However, due to the lack of resources and infrastructure, 238,000 women die each year of cervical cancer; more than 80% of these deaths occur in developing countries.1, 3 We are interested in applying optical technologies to replace expensive infrastructure for cervical cancer screening in the developing world.
Direct visual inspection (DVI), visual inspection with acetic acid (VIA), and visual inspection with Lugol’s iodine (VILI) are being explored as alternatives to Pap smear and colposcopic examination in many developing countries.4, 5, 6, 7 Recent reviews of the performance of these methods found that they have sufficient sensitivity and specificity, when performed by trained professionals, to serve as viable alternatives to Pap screening in low-resource settings.8, 9, 10 Table 1 shows the results of a review by Sankaranarayanan,11 which favorably compares these methods to human papillomavirus (HPV) testing and conventional cytology.
Table 1. Results of pooled analyses from Sankaranarayanan.11
| Screening Methodology | Number of Studies Reviewed | Number of Patients | Percent of Patients with HG and Greater | Number of Patients within HG Range | Sensitivity Pooled | Sensitivity Range | Specificity Pooled | Specificity Range |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| HPV HC II | 4 | 18,085 | 7 | 6–9 | 0.67 | 0.46–0.81 | 0.94 | 0.92–0.95 |
Because DVI relies on visual interpretation, it is crucial to define objective criteria for the positive identification of a lesion and to train personnel to correctly implement a program based on these criteria. Denny noted that restricting the definition of a positive VIA test to a well-defined acetowhite lesion significantly improved specificity, while reducing sensitivity.10 In a series of 1921 women screened in Peru, Jeronimo found that the DVI positivity rate dropped from 13.5% in the first months to 4% during subsequent months of a 2-yr study; the drop in positivity rate was hypothesized to be due to a learning curve for the evaluator.12 Bomfim-Hyppolito investigated the use of cervicography as an adjunct to DVI. A simple Sony digital camera was used to photograph the cervix before and after the application of acetic acid.13 Photographs were later interpreted by an expert colposcopist. The addition of cervicography improved both sensitivity and specificity. However, this approach prevents the implementation of see-and-treat strategies in low-resource settings because of the need for expert review.
Recently, optical techniques have been investigated as quantitative and objective alternatives for detection. Several studies have demonstrated that optical spectroscopy has the potential to improve the screening and diagnosis of neoplasia. Ferris studied multimodal hyperspectral imaging for the noninvasive diagnosis of cervical neoplasia.14 They reported a sensitivity of 97% and a specificity of 70%. Huh 15 measured the performance of optical detection of high-grade SIL (HGSIL) using fluorescence and reflectance spectroscopy, finding a sensitivity of 90% and a specificity of 70%.
As another promising application of optical techniques for cervical cancer screening, a number of studies investigated whether digital image processing techniques could be used to automate the interpretation of colposcopic images.16, 17, 18, 19, 20 Craine and Craine17 introduced a digital colposcopy system for archiving images and visually assessing features in the images. Shafi 18 and Cristoforoni 19 used a digital imaging system for colposcopy, which enables image capture and simple processing. To assess the various colposcopic features, the acquired images were manually analyzed by an expert. By examining the relationship between colposcopic features and histology outcomes, Shafi’s study provided information about which features are most useful to the expert observer. Image interpretation in these early studies mainly relied on experts’ qualitative assessment of colposcopic images and provided limited quantitative analysis. Li developed a computer-aided diagnostic system using colposcopic features such as acetowhitening changes, lesion margin, and blood vessel structures.20 They prototyped image processing algorithms for detection of those features and showed promising preliminary results. However, the diagnostic performance of the system has not been reported.
Recently, advances in consumer electronics have led to inexpensive, high-dynamic-range charge-coupled device (CCD) cameras with excellent low light sensitivity. At the same time, advances in vision chip technology enable high-quality image processing in real time. These advances may enable the acquisition of diagnostically useful digital images of the cervix in a relatively inexpensive way, with or without magnification. Moreover, automated analysis algorithms based on modern image processing techniques have the potential to replace clinical expertise, which may reduce the cost of screening. The purpose of this study is to explore whether digital colposcopy, combined with recent advances in camera technology and automated image processing, could provide an inexpensive alternative to Pap screening and conventional colposcopy.
Several challenges for diagnostic digital colposcopic image analysis remain. First, previous studies have investigated only a few features and have not taken advantage of the mature field of image analysis and computer-automated techniques. Second, previous studies have compared image features of selected normal and abnormal areas of the cervix, but have not applied the approach to an entire image to identify whether lesions are present. Finally, previous studies have used biopsies from selected areas as the gold standard. A gold standard is necessary for the entire field of view to address the issue of lesion localization.
In this pilot study, we explored the potential of an approach to automated image analysis of white-light colposcopy images that addresses the three challenges just discussed. First, our approach employed automated image analysis techniques including image registration, pattern recognition, clustering, and classification. Second, our algorithm can identify high-grade precancerous tissue areas from an entire image. Third, a gold standard for the entire cervical image was constructed using a whole cervix specimen acquired from a loop electrosurgical excision procedure (LEEP), which was intensively sectioned.
Materials and Methods
A multispectral digital colposcope (MDC) was developed to acquire reflectance images of the entire cervix with white-light illumination. The MDC consists of a commercially available, tilt-stand colposcope (Model 1DL, Leisegang, Germany) with a video-rate color CCD camera and frame grabber for image acquisition. Details of this instrument can be found in papers by Benavides 21 and Park.22 The colposcope produces stereoscopic vision at magnification. Reflectance images were captured using an inexpensive, commercially available, video-rate, color CCD camera (CV S3200 Rev. B, JAI, Japan).
The study protocol was reviewed and approved by the institutional review boards at the University of Texas M.D. Anderson Cancer Center, the University of Texas Health Science Center in Houston, the Lyndon B. Johnson County Hospital and the Harris County Health District, the British Columbia Cancer Agency, Rice University, and the University of Texas at Austin. Eligible patients were at least 18 yr old, not pregnant, and were referred to the colposcopy clinic with an abnormal Pap smear. Written informed consent was obtained from each participant, and all patients underwent a history, complete physical exam, Pap smear, cultures for gonorrhea and chlamydia, pancolposcopy (including cervix, vagina, vulva, and perianal area), and colposcopically directed biopsies. Only patients with HG lesions who were scheduled at a previous visit were eligible for the study. Following colposcopic examination, but prior to the treatment with an LEEP, white-light reflectance images were acquired with the MDC from each patient at baseline. A second set of reflectance images was acquired following the application of acetic acid (6%) on the cervix for . Acetic acid enhances the differences in appearance between normal and dysplastic tissue. The patients then underwent an LEEP procedure. After the LEEP, for histopathology map construction, the margin boundary of LEEP was carefully marked on one of the acquired MDC images. The specimen was inked and cut by the study pathologist. Each ectocervix was cut into 12 pieces. At each trial site, only the expert study pathologists reviewed all the hematoxylin and eosin (H&E) stained slides jointly, noting areas of CIN, squamous epithelium, columnar epithelium, and transformation zone. 
Standard diagnostic criteria were used as the gold standard by the study pathologists to classify histopathologic samples as HG SIL, low-grade SIL, or negative for dysplasia.23 The histopathologic slides were scanned, preserving real tissue size, and then the images were uniformly enlarged by 15% to account for shrinkage by formalin,24 under the assumption of uniform shrinkage. The histopathology images were then reconstructed into a 3-D “pathology map” delineating areas of intraepithelial neoplasia using 3-D image Visualization (Able Software, Lexington, Massachusetts). When the map was constructed, the ratio between the actual size of each pixel of MDC-measured images and that of the scanned images was calculated to establish the appropriate mapping between the MDC image and the histopathology map. The position of the LEEP specimen boundary and the transformation zone were also incorporated to correlate between MDC images and the histopathology images. After construction, pathology maps were reviewed by the study clinicians and pathologists and compared to the white light reflectance MDC images.
Image preprocessing was performed to remove impulse noise spikes, specular reflection, and regions obscured by the presence of blood on the tissue surface. To properly register images obtained before and after acetic acid application, an automatic registration algorithm was developed. In the registration algorithm, the location of the cervical os was used as a reference point.
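The registration step can be illustrated with a minimal sketch. It assumes that a pure translation is enough to align the two images and that the os location has already been found in each image; the function name `register_by_os` and the translation-only model are illustrative assumptions, not the algorithm used in the study.

```python
def register_by_os(post_image, os_pre, os_post, fill=None):
    """Shift post_image so that its os location lands on os_pre.

    post_image is a 2-D list of pixel values; os_pre and os_post are
    (row, col) positions of the cervical os in the pre- and
    post-acetic-acid images. Translation only (an assumption).
    """
    d_row = os_pre[0] - os_post[0]
    d_col = os_pre[1] - os_post[1]
    rows, cols = len(post_image), len(post_image[0])
    # Pixels with no source after the shift keep the fill value.
    shifted = [[fill] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            new_r, new_c = r + d_row, c + d_col
            if 0 <= new_r < rows and 0 <= new_c < cols:
                shifted[new_r][new_c] = post_image[r][c]
    return shifted
```

Pixels shifted in from outside the field of view carry the `fill` value and would need to be excluded from later analysis.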
Our approach to automated digital image analysis for screening for cervical neoplasia can be divided into two major steps. First, tissue regions with similar optical properties are clustered together. Second, classification algorithms are used to determine whether these regions contain neoplastic tissue. These steps are now described in detail.
Classification of Localized Image Areas
To identify optical parameters that are most useful to determine whether localized image areas contain neoplastic tissue, the original red-green-blue (RGB) color intensity feature space was extended and features for image analysis were selected from this extended feature space. The extended feature space is composed of five categories of features: (1) raw RGB intensity values, (2) ratios of individual RGB intensity values, (3) differences in individual RGB intensity values, (4) differences in RGB intensity ratios, and (5) changes in gray-scale intensity values following application of acetic acid. For each of the pre- and post-acetic-acid images, the ratios of all possible pairs of features were calculated. Acetic-acid-induced differences in RGB intensity values, ratios, and gray-scale values were also calculated. Five features were selected as potentially useful for classification: intensity values for red, green, and blue channels; the ratio of intensity in the green channel to that in the red channel; and the changes in gray-scale intensity values.
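As a rough illustration, the five selected features could be computed per pixel as follows. This sketch assumes that the RGB values are taken from the post-acetic-acid image and that gray-scale intensity is the plain mean of the three channels; neither choice is stated in the text, and the names `gray` and `pixel_features` are hypothetical.

```python
def gray(rgb):
    """Gray-scale intensity as the channel mean (an assumption)."""
    r, g, b = rgb
    return (r + g + b) / 3.0


def pixel_features(pre_rgb, post_rgb, eps=1e-9):
    """Five features for one registered pixel pair.

    pre_rgb and post_rgb are (R, G, B) tuples taken before and after
    acetic acid. eps guards against division by zero in the ratio.
    """
    r, g, b = post_rgb  # assumption: channel values from the post image
    return {
        "red": float(r),
        "green": float(g),
        "blue": float(b),
        "green_red_ratio": g / (r + eps),
        "gray_change": gray(post_rgb) - gray(pre_rgb),
    }
```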
Given these five features, we next developed classifiers to estimate the tissue type of a localized region. We designed an ensemble classifier, wherein a number of different classifiers are applied and then the final classification is based on the most highly predictive of the individual classifiers. The four classifiers in the ensemble are a linear classifier with Euclidean distance, a linear classifier with Mahalanobis distance, a k-nearest neighbor (KNN) classifier with eight neighbors, and a support vector machine with a linear kernel.
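The selection logic of such an ensemble can be sketched in a few lines. For brevity, only two member types are shown (a nearest-mean classifier with Euclidean distance and a KNN classifier); the Mahalanobis and support vector machine members are omitted. Using validation accuracy as the measure of "most highly predictive" is an assumption, and all class and function names are illustrative.

```python
import math


def euclid(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class NearestMean:
    """Assigns each sample to the class with the closest mean."""

    def fit(self, X, y):
        self.means = {}
        for label in sorted(set(y)):
            rows = [x for x, t in zip(X, y) if t == label]
            self.means[label] = [sum(col) / len(rows) for col in zip(*rows)]
        return self

    def predict(self, X):
        return [min(self.means, key=lambda lab: euclid(x, self.means[lab]))
                for x in X]


class KNN:
    """Majority vote among the k nearest training samples."""

    def __init__(self, k=8):
        self.k = k

    def fit(self, X, y):
        self.X, self.y = list(X), list(y)
        return self

    def predict(self, X):
        out = []
        for x in X:
            nearest = sorted(zip(self.X, self.y),
                             key=lambda pair: euclid(x, pair[0]))[: self.k]
            votes = [label for _, label in nearest]
            # Ties go to the smallest label.
            out.append(max(sorted(set(votes)), key=votes.count))
        return out


def accuracy(model, X, y):
    return sum(p == t for p, t in zip(model.predict(X), y)) / len(y)


def pick_most_predictive(models, X_train, y_train, X_val, y_val):
    """Fit every member and keep the one with the best validation accuracy."""
    fitted = [m.fit(X_train, y_train) for m in models]
    return max(fitted, key=lambda m: accuracy(m, X_val, y_val))
```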
The classifiers were trained using leave-one-patient-out cross-validation. In this method, the algorithm is trained on all data excluding one patient and then applied to the held-out patient; then the process is repeated successively for all patients in the data set.25
This is very useful in small studies, but may overtrain the algorithm. For each patient, image regions were identified corresponding to histopathologically proven areas of normal epithelium or low-grade squamous intraepithelial lesions (LGSILs) and HGSILs. Because an HGSIL is treated and an LGSIL is followed, HGSILs were considered to be abnormal, while areas of normal tissue and LGSIL were considered to be normal. Training data were obtained using a window-based approach wherein -pixel windows were selected from areas of normal tissue or minor atypia, LGSIL, and HGSIL within an image. For each window, the five features were calculated for each pixel. Then the mean, the standard deviation, the 95th percentile, and entropy of each feature were calculated for the -pixel window.
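The four window statistics can be sketched as below for a single feature. The sketch assumes a population standard deviation, a linearly interpolated percentile, and a Shannon entropy computed over a 16-bin histogram of the window values; these conventions and the function names are assumptions, since the text does not specify them.

```python
import math
from collections import Counter


def percentile(values, q):
    """Linearly interpolated percentile, with q in [0, 100]."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def entropy(values, bins=16):
    """Shannon entropy (bits) of a fixed-bin histogram of the values."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return 0.0
    width = (hi - lo) / bins
    counts = Counter(min(int((v - lo) / width), bins - 1) for v in values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def window_stats(values):
    """Summary statistics of one feature over all pixels in a window."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return {
        "mean": mean,
        "std": std,
        "p95": percentile(values, 95),
        "entropy": entropy(values),
    }
```

Applying `window_stats` to each of the five features yields 20 numbers per window.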
The -pixel windows forming the training set were sampled in three different ways to assess the effect of training data on the resulting classifier performance. In the first approach (training data set 1), abnormal areas were selected from regions that were both pathologically HGSIL and in which the acetic acid intensity change was visually obvious. In the second approach (training data set 2), abnormal areas were selected from pathologically HGSIL regions in which the acetic acid intensity change was visually less apparent. In the third approach (training data set 3), abnormal data were selected based only on the pathology map and without reference to the colposcopic images. The classifier performance was tested using cross-validation and evaluated using receiver operating characteristic (ROC) curve analysis. The classifier designed with training set data was then used to classify entire images as described below.
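The leave-one-patient-out cross-validation used to test the classifiers can be sketched as a generator of train and test indices. The key point is that all windows from one patient are held out together, so no patient contributes to both training and testing in the same fold. The function name and the index-based interface are illustrative assumptions.

```python
def leave_one_patient_out(patient_ids):
    """Yield (held_out_patient, train_indices, test_indices) per fold.

    patient_ids[i] is the patient that training window i came from.
    """
    for held_out in sorted(set(patient_ids)):
        train = [i for i, p in enumerate(patient_ids) if p != held_out]
        test = [i for i, p in enumerate(patient_ids) if p == held_out]
        yield held_out, train, test
```

Each fold trains on `train`, predicts on `test`, and the pooled predictions across folds feed the ROC analysis.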
The previous section described how features useful for classification were identified, and how classifiers were developed for local image regions. However, for truly automatic image interpretation, procedures must be developed to automatically segment the image into locally similar regions.
Local regions for analysis were segmented using a variant of the k-means clustering algorithm. After segmenting the image into local regions using the clustering algorithm, image classification was performed using the algorithm developed in the previous section. For each cluster, windows of pixels were randomly selected from the cluster and, based on the classification results, a probability of abnormality was calculated and assigned to the cluster. The abnormal regions were then defined as those clusters for which the probability of abnormality exceeds a preset threshold value; the threshold was chosen in such a way that the selected probability yields a Pareto optimal point from the ROC curve analysis. The ROC curve analysis is described in Sec. 3.
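A minimal sketch of this segment-then-score procedure follows. It uses plain Lloyd's k-means rather than the variant used in the study, and it assumes that a cluster's probability of abnormality is the fraction of its sampled windows classified as abnormal; both are assumptions, as are all function names.

```python
import random


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means on a list of feature vectors."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        labels = [
            min(range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centers[j])))
            for p in points
        ]
        # Update step: each center moves to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


def cluster_probability(window_predictions):
    """Fraction of sampled windows classified abnormal (1) in a cluster."""
    return sum(window_predictions) / len(window_predictions)


def abnormal_clusters(probabilities, threshold):
    """Cluster ids whose probability of abnormality exceeds the threshold."""
    return [cid for cid, p in sorted(probabilities.items()) if p > threshold]
```

For example, a cluster in which three of four sampled windows are classified abnormal receives a probability of 0.75 and would be flagged at a threshold of 0.68.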
To assess the classification results of each image, a grid-based approach for calculating sensitivity and specificity was used. In this approach, an image is divided into grids and the classification results of the pixels in each grid of an image are compared with the corresponding pathology map. If the number of misclassified pixels is less than 20% of the number of pixels in the grid, then the grid point is considered to be correctly classified. Using this approach, the sensitivity and the specificity of an image are defined as follows:
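One way to implement the grid-based assessment is sketched below, using the standard definitions sensitivity = TP / (TP + FN) and specificity = TN / (TN + FP) counted over grid cells. The sketch assumes that a cell counts as abnormal when at least half of its pathology-map pixels are abnormal, and that an incorrectly classified abnormal cell is a false negative while an incorrectly classified normal cell is a false positive. These rules, the `cell` size parameter, and the function name are assumptions.

```python
def grid_sensitivity_specificity(pred, truth, cell, max_error=0.20):
    """Grid-based sensitivity and specificity for one image.

    pred and truth are 2-D lists of 0 (normal) / 1 (abnormal) pixels;
    cell is the grid cell edge length in pixels. A cell is correctly
    classified when fewer than max_error of its pixels are wrong.
    """
    rows, cols = len(truth), len(truth[0])
    tp = fn = tn = fp = 0
    for r0 in range(0, rows, cell):
        for c0 in range(0, cols, cell):
            coords = [(r, c)
                      for r in range(r0, min(r0 + cell, rows))
                      for c in range(c0, min(c0 + cell, cols))]
            n = len(coords)
            wrong = sum(pred[r][c] != truth[r][c] for r, c in coords)
            # Assumption: cell is abnormal if at least half its pixels are.
            abnormal = 2 * sum(truth[r][c] for r, c in coords) >= n
            correct = wrong < max_error * n
            if abnormal and correct:
                tp += 1
            elif abnormal:
                fn += 1
            elif correct:
                tn += 1
            else:
                fp += 1
    sens = tp / (tp + fn) if tp + fn else None
    spec = tn / (tn + fp) if tn + fp else None
    return sens, spec
```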
To assess the overall sensitivity and specificity of the automated image analysis procedure, the threshold probability resulting in the most desirable sensitivity and specificity values was determined in the Pareto sense.26 This threshold probability was applied to all patients’ image classification results. If an image contains an area of HGSIL whose radius is greater than , then the patient is classified as a HG patient. Using this approach, the sensitivity and specificity are defined as follows:
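Threshold selection in the Pareto sense can be sketched as follows: an operating point is Pareto optimal if no other point is at least as good in both sensitivity and specificity and strictly better in one. How the single "most desirable" point was chosen among the Pareto-optimal points is not stated; the sketch assumes the largest sum of sensitivity and specificity, and the function names are hypothetical.

```python
def pareto_front(points):
    """Keep non-dominated (threshold, sensitivity, specificity) points."""
    front = []
    for t, se, sp in points:
        dominated = any(
            se2 >= se and sp2 >= sp and (se2 > se or sp2 > sp)
            for _, se2, sp2 in points
        )
        if not dominated:
            front.append((t, se, sp))
    return front


def pick_threshold(points):
    """Assumption: choose the Pareto point maximizing sensitivity + specificity."""
    return max(pareto_front(points), key=lambda p: p[1] + p[2])
```

With candidate points `(0.2, 0.95, 0.40)`, `(0.5, 0.85, 0.70)`, `(0.6, 0.80, 0.65)`, and `(0.8, 0.60, 0.90)`, the third point is dominated by the second and drops out of the front.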
Pre- and post-acetic-acid white-light images were acquired from 29 patients. Histopathology examination revealed 21 patients (72%) with CIN 2 or 3 lesions in the ectocervix, 2 patients (7%) with CIN 1 lesions, and 6 patients (21%) with no evidence of CIN. The eight patients without CIN 2 or 3 each had a previous biopsy showing HGSIL.
Figure 1 illustrates the training data acquisition. As illustrated in the figure, abnormal data for the training set were sampled from pixels in HGSIL regions, and normal data for the training set were sampled from histopathologically normal or LGSIL regions.
Figure 2 illustrates the change in the ratio of the intensity in the green channel to that of the red channel induced by acetic acid for the training set data identified as histologically normal and abnormal. The application of acetic acid does not substantially change the green-to-red intensity ratio measured from normal tissue, as shown in both the histogram and the box plot. However, the green-to-red intensity ratio increases for abnormal samples following application of acetic acid, as shown in both the histogram and the box plot. Following application of acetic acid, this parameter shows considerable separation between the normal and abnormal samples, and this intensity ratio was one of the features selected for use in the classification algorithm.
As shown in the box-and-whisker plots in Fig. 2, for both normal and abnormal samples, the data dispersion is large and does not change much before and after acetic acid application. However, for normal samples, the median values before and after acetic acid application are similar, while abnormal samples show a considerable increase in the median value after application of acetic acid. Based on this observation, the 95th percentile values are used as a diagnostic feature for image classification.
The performance of the proposed multiclassifier for localized image areas with three different training data sets was analyzed and the resulting ROC curves are shown in Fig. 3. As shown in Fig. 3, training data set 1 yielded the best results; these data were sampled from those regions where both the pathology map indicated HG disease and the colposcopic images showed significant changes after addition of acetic acid. The performance of training data set 3, where image regions were chosen based only on the pathology map, is better than that of training data set 2, where abnormal areas were selected from pathologically abnormal regions in which the acetic acid intensity change was visually less apparent. For each ROC curve, the point with the most desirable sensitivity and specificity values was selected among Pareto-optimum points.26 The selected point for training data set 1 corresponds to a sensitivity of 91% and a specificity of 80%. The optimum point for training data set 2 provided a sensitivity of 82% and a specificity of 69%, and that of training data set 3 corresponds to a sensitivity of 87% and a specificity of 83%.
Figure 4 shows before and after acetic acid images for a patient with CIN 1, 2, and 3, together with the pathology map and the image classification result. The high-probability regions in the disease probability map correlate well with regions of CIN 2 and 3 in the histopathology map. Figure 5 shows similar data for a patient with a completely normal ectocervix. The highest probability of abnormality in the probability map is 17%, and for any threshold value greater than 17%, all regions are diagnosed as normal. Figure 6 shows the performance of image classification for 29 patients using the ROC curve. The classification result of each patient was assessed using a grid-based approach. Using the average ROC curve, the threshold probability (0.68) with the most desirable average sensitivity (82%) and specificity (73%) values was determined in the Pareto sense.26
Figure 7 shows the results of the automated image diagnosis after applying the threshold probability (0.68) for 29 patients with a normal or LGSIL or HGSIL histopathology map. Patients 1 to 8 had a normal or LGSIL histopathology map (eight patients). For patients 1 to 7, the diagnosis based on automated image analysis matches the pathology report. However, for patient 8, there is a discrepancy between the automated image analysis result and the pathology report; the automated image analysis identifies a considerable abnormal area, but the pathology map indicates that there is no abnormal tissue present.
Patients 9 to 29 shown in Fig. 7 were demonstrated to have CIN 2 or 3 by LEEP (21 patients). For 15 patients where the classification algorithm yielded a true positive result, the area of HGSIL identified from the classification algorithm correlates well with that from the histopathology map (patients 9 to 23). Had the algorithm been used to determine whether treatment was indicated, it would have resulted in a correct decision. For six patients, the classification algorithm yielded an incorrect result (patients 24 to 27) or inconclusive result (patients 28 and 29). In four of these patients, the automated image analysis procedure did not identify areas of HGSIL later identified in the LEEP. In two patients, areas of HGSIL were identified in the image analysis procedure, but they did not correspond to the locations where HGSIL was later documented in the histopathology map. The resulting sensitivity and specificity values are 79% and 88%, respectively, for 27 patients, not including the two inconclusive results.
This pilot study shows promising agreement between our white-light image analysis results and histopathology with a sensitivity of 79% and a specificity of 88%. This is comparable to the performance of conventional colposcopy, where the sensitivity and specificity have been reported27, 28, 29 to range from 87 to 96% and from 34 to 85%, respectively. The reported sensitivity and specificity values were obtained without considering the inconclusive results of two patients. If they are considered to be false negatives, the sensitivity is diminished to 71%. If they are considered to be true positives, the sensitivity increases to 81%.
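These figures follow from the patient counts given in the results: 15 true positives, 4 missed HGSIL cases, and 2 inconclusive cases among the 21 patients with CIN 2 or 3, plus 7 true negatives and 1 false positive among the 8 patients with a normal or LGSIL pathology map. The short check below reproduces the arithmetic; the helper `percent` and the variable names are illustrative.

```python
def percent(numerator, denominator):
    """Percentage rounded half up to the nearest integer."""
    return int(100 * numerator / denominator + 0.5)


true_pos, false_neg, inconclusive = 15, 4, 2
true_neg, false_pos = 7, 1

# Inconclusive patients excluded (27 patients in total).
sens_excluding = percent(true_pos, true_pos + false_neg)
spec = percent(true_neg, true_neg + false_pos)

# Inconclusive patients counted as false negatives or as true positives.
all_hg = true_pos + false_neg + inconclusive
sens_if_false_neg = percent(true_pos, all_hg)
sens_if_true_pos = percent(true_pos + inconclusive, all_hg)

print(sens_excluding, spec, sens_if_false_neg, sens_if_true_pos)
```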
Several other image processing approaches have been explored to aid in colposcopic image interpretation. Intensity change due to acetic acid application has been utilized in many diagnostic image analysis studies. Pogue 30 analyzed acetowhitening of HGSIL lesions of six patients with biopsy confirmed CIN 2/3. They showed the temporal and spatial changes of green to red intensity ratio as an important acetowhitening feature to separate cervical lesions from normal regions using time-course reflectance images. However, they did not report diagnostic performance using this feature. Balas31 assessed the alteration in the light-scattering properties of the cervix using the maximum contrast between acetic acid responsive and nonresponsive areas for 16 patients. The time course of the intensity of backscattered light was adopted as an important diagnostic feature. The study showed the diagnostic potential of this technique; however, the quantitative diagnostic performance of the method was not reported. In our study, we developed a gold-standard based approach to rigorously assess sensitivity and specificity in the image classification context. This enabled us to quantitatively assess the performance of our system.
A potential weakness of this study is that it was performed on patients with HG lesions; the predictive value may be lower in a screening population. In addition, our approach is limited to the detection of neoplastic lesions on the ectocervix only. To find lesions located within the endocervix, additional information is required. Moreover, our proposed approach may fail to detect precancerous lesions that do not show acetowhitening. There may be lesions that do not have keratin and thus are not acetowhite. We suspect that glandular lesions, both in situ and invasive, may not be detected using this algorithm. Early and advanced invasive squamous cancers are often erythematous and not acetowhite. We are continuing to explore the addition of fluorescent light to enable detection of those lesions that are not keratinized and thus not acetowhite.
A major advantage of digital colposcopy with automated image analysis is that it requires only limited resources, so clinical application of this technique could provide an inexpensive alternative to Pap screening and conventional colposcopic assessment. Furthermore, this technique could also provide a map showing where to perform biopsies for providers with less experience. In the future, with further research to help connect point spectroscopic based detection and multispectral imaging, this technique may provide an inexpensive, in vivo screening solution in an automated manner. For practical applications of this technique, its cost effectiveness will need to be further analyzed and compared with those of other screening alternatives in low-resource settings.
Digital colposcopy, if the equipment can be produced inexpensively, may provide an alternative to DVI, VIA, and VILI for screening in low-resource settings. While DVI is less expensive in terms of instrumentation and methods, it is an entirely provider-dependent screening test. Its success depends on extensive formal training and experience. In the developing world, where health care resources are limited and there are few trained health care providers, screening tools with minimal training requirements are essential.32 Digital colposcopy with automated image analysis has the potential to reduce the cost and time associated with training, leading to faster development and deployment of effective screening programs for cervical cancer throughout the developing world.
Support from the National Cancer Institute, Program Project Grant PO1 CA 82710-04 is gratefully acknowledged. We would also like to acknowledge the contributions of the research staff (Sylvia Au, Olya Shuhatovich, and Trey Kell) and nurse colposcopists (Judy Sandella, Alma Sbach, and Karen Rabel).