Cervical cancer is one of the most common causes of death in women. It accounts for 273,000 deaths per year and represents 9% of all deaths from cancer in women worldwide.1 In 1948, Dr. George Papanicolaou described a test that was capable of detecting the early stages of cervical cancer.2 This ultimately led the American Cancer Society to suggest the implementation of cervical cancer screening for precancerous changes.3 Clinical studies and data have shown that the Pap-smear test, which is based solely on morphologic examination of exfoliated cells from the cervix, has substantially reduced the death rate due to cervical cancer in countries where it is widely used.4, 5, 6 The screening program implemented in the United States screens over Pap-smear samples each year.7 Due to the high volume of routine screening examinations and the large number of stained cells depicted on each slide, the process of visually searching for abnormal cells by subtle differences in cell morphology (i.e., the enlargement, irregularity, and hyperchromasia of nuclei) is tedious and time-consuming. In addition, because of the subjective or random selection of a limited number of analyzable cells, the manual detection method reduces diagnostic accuracy (i.e., higher false-positive and false-negative detection rates) and introduces potential bias and inter-reader variability into clinical practice.
Because it is difficult to read Pap-smear images and to visually detect abnormal or carcinoma cervical cells based on cell morphology, a new technology called interphase fluorescent in situ hybridization (FISH) has been investigated and used as an adjunct method to the conventional cytology.8 In pathology laboratories, pathologists apply FISH-labeled deoxyribonucleic acid (DNA) probes to detect interphase cells with numerical and/or structural abnormalities that indicate malignancy. Previous studies identified significant differences in ploidy patterns, i.e., the amount of DNA between negative and cervical cancer cells.9, 10, 11 FISH technology has a number of advantages that improve the efficiency and accuracy of screening for cervical cancer by targeting specific chromosome changes based on different DNA probes, because counting the number of FISH spots inside each identified interphase cell is much more reliable (less ambiguous) than interpreting subtle morphological features of other cytology specimens.12 In addition, since the culturing of metaphase cells is not required and the number of analyzable cells can be substantially increased, interphase FISH analysis is a more efficient detection and diagnostic approach and has a higher statistical classification accuracy than conventional karyotyping.13 After determining that the trisomy in particular chromosome types (e.g., chromosomes 3, 7, and X) had a significant impact on cervical cancer development and prognosis,8, 9, 14, 15 a number of research groups reported that interphase FISH testing using these three chromosomes achieved a higher sensitivity and specificity in detecting early cervical diseases, and it could also predict the progression of uterine cervical dysplasia to invasive cancer with a higher accuracy.16, 17, 18
Although FISH imaging technology is more reliable and efficient than conventional cytology methods, manual FISH analysis requires genetic laboratory technologists to subjectively select a limited number of analyzable cells (i.e., 50 to 100 cells) and manually count the number of multispectrum FISH spots within each cell. This is a labor-intensive and time-consuming task with large inter-reader variability. As a result, the accuracy of FISH imaging technology is limited, especially when detecting subtle or early cancer with low concentrations of positive or malignant cells (i.e., screening for cervical cancer at the early stage). Developing automated schemes that improve the efficiency of FISH spot detection has therefore attracted much interest from researchers. Early studies applied computer schemes to detect and count FISH spots after manual selection of analyzable cell nuclei or regions of interest. These studies indicated a strong correlation of detection results between the manual methods and the semiautomated computer schemes.19, 20 Because the semiautomated methods require user intervention to exclude poorly segmented, overlapped, clustered, or nonepithelial cells and are therefore impractical in clinical practice, some fully automated schemes have recently been developed and tested. These schemes usually include two processing and detection steps.21 The first step detects and segments analyzable interphase cell nuclei and removes all unanalyzable “cells,” such as nuclear debris, large clusters of overlapped or touching nuclei, and nonspecific background stains.
The second step aims to correctly detect FISH spots by distinguishing spots from artifacts, background noise, and splitting and partially overlapping spots.22 Previous studies have suggested that an automated interphase FISH detection method could be used as a rapid approach or a potential future screening tool for cervical cancer by identifying aneuploidy in premalignant stages of cervical cancer.15, 23
The development of fully automated schemes for FISH image analysis and signal detection faces a number of technical challenges.24, 25 First, there are large variations in shape, size, and intensity between different cell nuclei, and these variations make it difficult to correctly detect and segment analyzable interphase cell nuclei. Second, FISH image technology detects normal and abnormal or carcinoma cells based solely on the counted number of FISH spots inside the cell. However, the initially detected and counted FISH spots are not all independent. Netten reported that, due to the centromeric probes and other image noise, the splitting, overlapping, and missing FISH spots inside identified interphase cells substantially influenced the accuracy rate of FISH spots detection.26 To solve these problems, Kozubek,22 Gué,27 and Kajtar 24 reported and tested several new automatic FISH signal analysis systems. However, there are no commonly accepted rules or standards for deciding how to merge or separate the detected FISH spots. For example, Lukasova28 and Kozubek 22 reported using as the cutoff threshold to merge nearby FISH spots, while Kajtar 24 suggested using as the cutoff threshold. Instead of using the distance as a merging or splitting criterion, Gué 27 defined the following two rules: (1) close spots may be merged into one big spot with a volume roughly twice the average volume of one of the spots; and (2) if the distance between the centers of the two spots is longer than the diameter of one bigger spot, these two FISH spots represent two independent chromosomes and should be counted as two spots.
Despite the progress and encouraging results reported in a number of previous studies, the performance of FISH signal detection schemes is still limited due to the difficulty in detecting analyzable interphase cells and correctly recognizing splitting FISH spots. Kajtar reported that approximately 11% of analyzable interphase cells were not detected by the automated scheme.24 However, missing a fraction of analyzable interphase cells may not affect the final diagnostic results.13 Kajtar reported that automated schemes achieved both a higher false-positive rate and a higher false-negative rate than manual detection with and , respectively.24 Truong reported that since FISH spots demonstrate large variations in shape, size, and intensity, the automated scheme correctly detected and counted FISH spots in approximately 69% of identified interphase cells.25 This clearly indicates that further development and evaluation studies are required to improve the performance of automated schemes for FISH spot detection.
The goal of this study was to develop and test a new computer scheme that more reliably and robustly detects analyzable interphase cells and analyzes the related FISH spots. Specifically, this new automated scheme includes a set of image processing algorithms and knowledge-based multifeature classification rules to: (1) search for and detect analyzable interphase cells while deleting others, including clustered cells, stain debris, and auto-fluorescent artifacts; (2) detect independent FISH spots in each identified interphase cell by merging spots that are split because of the centromeric probes; and (3) classify interphase cells as normal or abnormal. The descriptions of our automated scheme and preliminary testing results are presented here.
Materials and Methods
We selected six Pap-smear specimens (slides) acquired from six patients who underwent annual routine cervical cancer screening examinations at the University of Oklahoma Health Sciences Center. In our genetics laboratory, a centromeric CEP 3 (D3Z1) spectrum orange probe and a centromeric CEP X (DXZ1) spectrum green probe (Vysis, Abbott Molecular Inc., Downers Grove, IL) were applied to process the Pap-smear specimens using a standard FISH procedure24 that marked chromosomes 3 and X located inside the interphase cell nuclei. Figure 1 shows a few examples of captured FISH images in which blue objects represent the potential interphase cells, red FISH spots represent chromosome 3, and green spots indicate chromosome X. These examples include both analyzable interphase cells and unanalyzable clusters. The unanalyzable clusters can typically be divided into three types: (1) one isolated huge region with many fluorescence artifacts [Fig. 1a]; (2) one group of overlapped and twisted interphase cells [Fig. 1b]; and (3) clusters of many small regions without clearly separated boundaries [Fig. 1c]. Figure 1d displays four normal cells, two of which are not very compact and have nonuniform intensities. An experienced cytogeneticist visually examined the 150 FISH images acquired from these specimens on a computer monitor and detected 248 analyzable interphase cell nuclei and 77 unanalyzable clusters of cells. Among the 248 analyzable cells, the cytogeneticist detected 105 normal cells and 143 abnormal cells. In each normal cell, two independent red FISH spots and two independent green FISH spots were counted, while in each abnormal cell, the number of FISH spots in at least one of the two spectra (red or green) was not equal to two. Figure 2 displays the distribution of red and green FISH spots among these 143 abnormal cells.
An automated scheme was developed and tested to automatically detect FISH spots using selected Pap-smear specimens. The scheme included the following four steps: (1) detect analyzable interphase cell nuclei, (2) identify independent FISH spots, (3) count the number of FISH spots to analyze the number of particular chromosomes within a cell, and (4) determine normal and abnormal cells by analyzing the number of FISH spots.
Detection and segmentation of analyzable interphase cells
The first step of the scheme was designed to detect potentially analyzable interphase cells by deleting clusters, fluorescence artifacts, and stain debris. A threshold was used to create a binary image in the blue channel. To determine the threshold, the histogram of the blue component was computed. Then the scheme searched for the largest peak value of the histogram as a threshold. All pixels with digital values larger than the threshold were assigned “1” and others were assigned “0.” Then a component-labeling algorithm29 and a raster scanning method were applied to identify the initial regions of interest (ROI). A morphological opening filter followed to separate any adjacent “touching” or connected areas and delete small isolated areas.
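The thresholding and labeling steps described above can be sketched in a few lines. This is a minimal illustration in plain Python, not the authors' implementation: the function names are hypothetical, 4-connectivity is assumed for the component labeling, ties between equally tall histogram peaks are broken toward the lower gray level, and the morphological opening step is omitted.

```python
from collections import Counter, deque

def peak_threshold(channel):
    """Gray level of the tallest histogram peak (the most frequent value)."""
    hist = Counter(v for row in channel for v in row)
    # Assumption: on a tie, prefer the lower gray level.
    return min(hist, key=lambda v: (-hist[v], v))

def binarize(channel, threshold):
    """Pixels strictly above the threshold become 1, all others 0."""
    return [[1 if v > threshold else 0 for v in row] for row in channel]

def label_components(binary):
    """Raster-scan the image and flood-fill each unlabeled foreground pixel.

    Returns (label image, number of regions). 4-connectivity is an assumption.
    """
    h, w = len(binary), len(binary[0])
    labels = [[0] * w for _ in range(h)]
    count = 0
    for y in range(h):
        for x in range(w):
            if binary[y][x] and not labels[y][x]:
                count += 1
                labels[y][x] = count
                queue = deque([(y, x)])
                while queue:
                    cy, cx = queue.popleft()
                    for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                        if 0 <= ny < h and 0 <= nx < w and binary[ny][nx] and not labels[ny][nx]:
                            labels[ny][nx] = count
                            queue.append((ny, nx))
    return labels, count
```

Because background pixels dominate a FISH image, the tallest histogram peak normally sits at the background level, so everything brighter than that level is kept as a candidate region.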
After identifying and segmenting ROIs, the scheme classified the labeled ROIs into analyzable and unanalyzable cells. For this purpose, the scheme first computed the following three image features for each ROI: (1) size (S) was computed by counting the number of pixels inside a labeled region; (2) circularity was defined as the number of pixels located inside the intersection between a labeled region and an equivalent circle placed at the center of the labeled region, divided by the number of pixels located inside the labeled region alone30; and (3) compactness was computed from the perimeter and size of a labeled region.
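As a rough sketch of how these three features could be computed for one labeled region, the snippet below follows the verbal definitions of size and circularity. Two points are assumptions rather than the authors' definitions: the perimeter is taken as the number of region pixels with at least one 4-neighbor outside the region, and compactness is taken as perimeter squared divided by size, a common convention that the text here does not confirm. The function name is hypothetical.

```python
import math

def region_features(pixels):
    """Return (size, circularity, compactness) for a set of (row, col) pixels."""
    size = len(pixels)
    cy = sum(y for y, _ in pixels) / size
    cx = sum(x for _, x in pixels) / size
    # Equivalent circle: same area as the region, centered at its centroid.
    radius_sq = size / math.pi
    inside = sum(1 for y, x in pixels if (y - cy) ** 2 + (x - cx) ** 2 <= radius_sq)
    circularity = inside / size
    # Assumption: perimeter = pixels with a 4-neighbor outside the region.
    perimeter = sum(
        1
        for y, x in pixels
        if any(n not in pixels for n in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)))
    )
    # Assumption: compactness = perimeter^2 / size.
    compactness = perimeter ** 2 / size
    return size, circularity, compactness
```

A filled disk gives a circularity of 1, while an elongated or ragged region gives a value well below 1, which is the behavior the deletion rules rely on.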
To define the classification rules based on these three features, we applied a proven robust training method to define and optimize this rule-based classifier.31 Specifically, we plotted the scatter diagram of each feature and computed the corresponding mean and standard deviation of the analyzable cells and unanalyzable “cells” identified by the cytogeneticist. A threshold for each feature was decided by the boundary line passing through , where and are the mean and standard deviation of the group of analyzable cells. Thus, our scheme used three rules to delete unanalyzable “cells:” (1) when the size of a labeled region was larger than the threshold ; (2) when the circularity of a labeled region was smaller than the threshold ; and (3) when the compactness of a labeled region was larger than the threshold . If any of the above three conditions was satisfied, the detected and labeled object (region) was classified as an unanalyzable cell and deleted. The remaining labeled regions were classified as analyzable cells so the scheme would be applied to detect and count FISH spots inside the cell in the next step.
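The three deletion rules can be illustrated with a small rule-based classifier. This is a sketch under stated assumptions: the exact boundary (how many standard deviations from the mean) is not recoverable from the text, so the multiplier `k` below is a hypothetical parameter, and the function and key names are illustrative only.

```python
from statistics import mean, stdev

def fit_thresholds(analyzable_features, k=2.0):
    """Derive one threshold per feature from analyzable training cells.

    analyzable_features: list of (size, circularity, compactness) tuples.
    k: assumed number of standard deviations from the mean (hypothetical).
    """
    sizes, circs, comps = zip(*analyzable_features)
    return {
        "size_max": mean(sizes) + k * stdev(sizes),
        "circ_min": mean(circs) - k * stdev(circs),
        "comp_max": mean(comps) + k * stdev(comps),
    }

def is_analyzable(feature, thresholds):
    """A region is deleted as unanalyzable if any one of the three rules fires."""
    size, circ, comp = feature
    too_large = size > thresholds["size_max"]
    not_round = circ < thresholds["circ_min"]
    not_compact = comp > thresholds["comp_max"]
    return not (too_large or not_round or not_compact)
```

Because the rules are combined with a logical OR, a region must pass all three feature checks to be kept for FISH spot counting.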
Detection of FISH spots
To detect FISH spots, the scheme used a threshold method to define two binary images. In RGB space, each pixel value has three components (R, G, and B). To detect red FISH spots, the scheme generated a red component-based binary image. If a pixel value in the original image satisfied the condition that the R component was larger than both the G and B components, the scheme set this pixel value as 1; otherwise, it was set as 0 in this R component image. The same method was applied to create a binary image for the green component, which was used to detect green FISH spots. In RGB space, if a pixel value satisfied the condition that the G component was larger than both the R and B components, the scheme set this pixel value as 1; otherwise, the pixel value was set as 0. Similarly, the component labeling and raster scanning methods were implemented to detect and count potential FISH spots.
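The dominant-channel rule described above translates directly into code. The following sketch assumes the image is a nested list of (R, G, B) tuples; the function name is hypothetical.

```python
def dominant_channel_masks(rgb_image):
    """Build binary masks for red-dominant and green-dominant pixels.

    A pixel is 1 in the red mask when R > G and R > B, and 1 in the green
    mask when G > R and G > B; every other pixel is 0.
    """
    red = [[1 if r > g and r > b else 0 for (r, g, b) in row] for row in rgb_image]
    green = [[1 if g > r and g > b else 0 for (r, g, b) in row] for row in rgb_image]
    return red, green
```

Note that the comparisons are strict, so gray pixels (R = G = B) and blue-dominant nuclear pixels fall into neither mask.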
Based on the positions of identified interphase cells, red FISH spots, and green FISH spots, the scheme identified and classified whether a red or green spot belonged to one of the three groups, namely, a FISH spot that was (1) fully located inside an analyzable interphase cell, (2) outside the cell, or (3) partially inside and partially outside the cell (crossing the cell boundary). As long as more than 50% of a spot’s area was located inside the cell boundary, the FISH spot was classified as belonging to the cell and would be further processed; otherwise, it was deleted like other FISH spots located in the image background area. Through this step, the scheme deleted all initially detected red and green FISH spots located outside the analyzable cells.
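The 50% overlap rule can be sketched as follows, assuming each spot is given as a set of (row, col) pixels and the analyzable cells are stored in a label image where 0 means background. The function name and data layout are illustrative assumptions.

```python
def assign_spot_to_cell(spot_pixels, cell_labels):
    """Return the label of the cell holding more than half of the spot, else None."""
    counts = {}
    for y, x in spot_pixels:
        label = cell_labels[y][x]
        if label:  # 0 is background
            counts[label] = counts.get(label, 0) + 1
    for label, n in counts.items():
        if n > len(spot_pixels) / 2:
            return label
    return None  # spot lies in the background or mostly outside any cell
```

A spot that straddles a cell boundary is kept only when strictly more than half of its area lies inside the cell; a spot split exactly in half is discarded.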
The purpose of detecting and counting FISH spots is to determine the number of targeted chromosomes within a cell. However, FISH spot counts can be very noisy because of splitting or overlapping spots. For example, a splitting FISH spot means that a single targeted chromosome may be represented by multiple FISH spots. The biggest challenge in automatically detecting and counting FISH spots is to identify splitting FISH spots and thereby avoid repeatedly counting spots that represent the same targeted chromosome. In this scheme, we built a knowledge-based expert classifier to identify splitting FISH spots. For this purpose, after identifying FISH spots inside the analyzable interphase cells, the scheme first computed the following features: (1) the total numbers of labeled red spots and green spots; (2) the size of each labeled spot; (3) the circularity of each labeled FISH spot; (4) the average intensity of each labeled spot; (5) the gravity center of each FISH spot; (6) the effective radius of each red or green spot, computed as the radius of a circle that has the same size as the labeled spot; and (7) the distances between spots of the same color. These features are illustrated in Fig. 3a.
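The geometric spot features (size, gravity center, effective radius, and distances between spots of the same color) can be sketched as below. The function names and the dictionary layout are hypothetical; the effective radius follows the stated definition, i.e., the radius of a circle whose area equals the spot size.

```python
import math
from itertools import combinations

def spot_geometry(pixels):
    """Size, gravity center, and effective radius of one labeled spot."""
    size = len(pixels)
    cy = sum(y for y, _ in pixels) / size
    cx = sum(x for _, x in pixels) / size
    # Circle with the same area as the spot: pi * r^2 = size.
    return {"size": size, "center": (cy, cx), "radius": math.sqrt(size / math.pi)}

def pairwise_distances(spots):
    """Center-to-center distance for every pair of spots of one color."""
    return {
        (i, j): math.dist(spots[i]["center"], spots[j]["center"])
        for i, j in combinations(range(len(spots)), 2)
    }
```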
To use these features to build a knowledge-based classifier, we discussed them with an experienced cytogeneticist and observed how he visually detected and counted independent FISH spots. We found that the effective FISH spots should have either bright and compact oval shapes or stringy and diffuse oval shapes. Because of the centromeric probe, the splitting spots should be counted as one spot instead of two. Figure 3a summarizes related features of spots in interphase cells. Figure 3b shows the criteria or rules that we summarized to identify effective FISH spots. Figure 3c shows that a normal interphase cell contains two red and two green FISH spots, and one of the green spots is oval and compact while the other green spot and two red spots are all stringy and diffuse. Figure 3d displays another example of a normal cell in which one red spot splits and the actual number of independent red FISH spots would be counted as two instead of three. Figure 3e shows an example of an abnormal cell comprised of three red spots and three green spots. In this abnormal cell, two red spots are split; therefore, the total number of red spots is three instead of five.
Based on our observations and the discussed examples, we designed and implemented a knowledge-based multifeature classifier in our computer scheme to identify and count the independent number of FISH spots. Figure 4 is a flow diagram of the classifier used to recognize the splitting, stringy, and diffuse FISH spots depicted inside the identified interphase cells. These classification rules are listed as follows:
1. If the objects are red or green spots and their radii are smaller than the threshold , the spots are considered image noise. These spots are deleted by assigning the value of corresponding pixels to zero.
2. A bubbling method is utilized to calculate the distances between two red or two green spots. If two spots belong to the same interphase cell, the distance between them is calculated; otherwise, the scheme continues to identify whether the red or green spots are located in the same cell. For -identified FISH spots in one cell, the sum of the computed distances between one spot and any of the other spots is . The scheme analyzes each distance between two FISH spots.
3. If the distance between two FISH spots ( and ) satisfies , the scheme identifies them as two independent FISH spots.
4. If the distance between two spots ( and ) satisfies , these two spots are selected as candidates to represent a split FISH signal (spot). The scheme compares their sizes and . If both of them are larger than the size threshold (e.g., , which is calculated by the average size of spots in all analyzable cells in our available dataset), the scheme identifies these two spots as two independent spots; otherwise, the scheme compares their average intensities and . If their intensity difference is larger than (e.g., , which is decided by our experiments), the scheme continues to identify these two spots as independent FISH spots. If all of the above conditions fail, the scheme determines that these two FISH spots are split spots and counts them as one FISH spot.
5. If the distance between two spots ( and ) satisfies the equation , these two FISH spots are also selected as candidates for the splitting FISH spots. The scheme then compares their sizes and . If the ratio between the two sizes is larger than , it identifies them as split spots and assigns these two spots to one independent FISH spot. After identifying the splitting FISH spots, the scheme assigns the two FISH spots the same label number in the image buffer. This process is repeated until all identified and labeled FISH spots have been analyzed.
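The overall shape of these merging rules can be sketched as follows. This is only an illustration of the decision order in rule 4 (distance, then size, then intensity), not the authors' classifier: the actual distance conditions and threshold values are not recoverable from the text, so the adaptive cutoff used here (a scaled sum of the two effective radii, via the hypothetical `distance_factor`) and the `size_threshold` and `intensity_threshold` parameters are all assumptions. The noise-removal rule and the size-ratio test of rule 5 are omitted.

```python
import math

def count_independent_spots(spots, size_threshold, intensity_threshold, distance_factor=1.0):
    """Count independent spots of one color inside one cell.

    spots: list of dicts with "center", "radius", "size", "intensity".
    Spots judged to be fragments of one signal share a label (union-find).
    """
    parent = list(range(len(spots)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(spots)):
        for j in range(i + 1, len(spots)):
            a, b = spots[i], spots[j]
            distance = math.dist(a["center"], b["center"])
            # Assumed adaptive cutoff: depends on the spots' own radii.
            if distance >= distance_factor * (a["radius"] + b["radius"]):
                continue  # far apart: two independent spots
            if a["size"] > size_threshold and b["size"] > size_threshold:
                continue  # both large: two independent spots
            if abs(a["intensity"] - b["intensity"]) > intensity_threshold:
                continue  # very different brightness: independent
            parent[find(i)] = find(j)  # treat as one split signal

    return len({find(i) for i in range(len(spots))})
```

Giving merged spots a shared label mirrors the relabeling step in the text, and the number of distinct labels that remain is the independent spot count for that color.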
One unique characteristic of our classification scheme is that unlike previous studies in which the cutoff threshold to merge the split FISH spots was empirically selected and fixed (i.e., 24), the detected FISH spot merging threshold in our scheme was automatically and adaptively determined based on the computed features of FISH spots. The potential advantage of this approach was tested in this study.
Classification between normal and abnormal cells
Each normal interphase cell extracted from Pap-smear examination specimens should include only two copies of chromosome 3 and two copies of chromosome X,8 which are represented in this study by the red and green spots, respectively. Thus, our scheme classifies each interphase cell as normal or abnormal according to the number of detected and counted red and green FISH spots inside the cell. A normal interphase cell contains two counted red spots for chromosome 3 and two green spots for chromosome X. If the number of either red or green spots in a cell is not equal to two, the cell is classified as abnormal.
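The final classification rule is simple enough to state in code. The sketch below uses hypothetical function names and assumes the per-cell spot counts have already been produced by the counting step.

```python
def classify_cell(red_count, green_count, expected=2):
    """Normal only when both colors show exactly the expected number of spots."""
    if red_count == expected and green_count == expected:
        return "normal"
    return "abnormal"

def summarize(cells):
    """Tally normal and abnormal cells from a list of (red, green) counts."""
    tally = {"normal": 0, "abnormal": 0}
    for red_count, green_count in cells:
        tally[classify_cell(red_count, green_count)] += 1
    return tally
```

Under this rule both gains (for example, three red spots) and losses (for example, one green spot) mark a cell as abnormal, so a missed low-intensity spot turns a normal cell into a false abnormal.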
We applied this new computerized scheme to 150 selected FISH images in our dataset. The performance of the scheme was visually evaluated and quantitatively compared with the detection and classification results reported by the cytogeneticist. The comparison results were tabulated, and the Kappa coefficients for agreements between automated and manual (visual) analysis results were computed for detecting analyzable cell nuclei and classification between normal and abnormal cells.
We first plotted and visually examined the results of the automated scheme on cell detection, including the detection error or difference between the analyzable and unanalyzable (or cluster) cells. Figure 5 displays the distribution differences of three features: size, compactness, and circularity between analyzable interphase cells and unanalyzable clusters. These comparisons show that most of the unanalyzable “cells” or clusters were huge and nonuniform regions with irregular shapes, while the analyzable cells were typically small, compact, round regions. The circularities of all analyzable cells were larger than 0.6 and their compactnesses were smaller than 400 pixels. The size of all cells ranged from 2,000 to 22,500 pixels.
We then plotted and visually examined the automated FISH spot detection and counting results. Figure 6 shows the distribution of the distances between splitting FISH spots in normal cells. It demonstrates that the largest distance between splitting FISH spots was around . Figure 7 displays examples when this scheme was applied to two FISH images acquired from Pap-smear specimens in our dataset. In the first example [Fig. 7a], our scheme detected and segmented the mixed interphase cells into a single analyzable cell [Fig. 7b] and an unanalyzable cell (cluster); while in the second example [Fig. 7d], the scheme detected three analyzable cells and one unanalyzable cell [Fig. 7e]. The FISH signal spots detected and counted by the automated scheme within each cell are also tabulated in Fig. 7. Figure 8 shows FISH spot counting results among a set of analyzable cells. Figures 8a, 8b, 8c, 8d, 8e, 8f, 8g, 8h show eight normal cells, and Figs. 8i, 8j, 8k, 8l, 8m, 8n, 8o, 8p display eight abnormal cells. Both the cytogeneticist and our automated scheme obtained the same classification results for these 16 cells. Figures 8q, 8s, 8t display examples of the interphase cells in which there was disagreement in the number of counted FISH spots between the cytogeneticist and our scheme. Specifically, the number of red FISH spots in Figs. 8q and 8s were counted by the cytogeneticist as two, three, and two, respectively. Because these red spots were very close together, the scheme identified them as splitting spots from one FISH spot and combined them. As a result, the number of red spots in Figs. 8q and 8s was counted as one by our scheme. Figure 8t displays another example of a normal cell. Because the intensity of one green FISH spot was quite low, it was missed by the scheme. Thus, the number of green spots was counted as one by the scheme instead of two as visually counted.
The results of manual and automated segmentation and classification between analyzable and unanalyzable interphase cell nuclei are summarized in Table 1 . The results indicate that 98.7% (76 out of 77) and 96.4% (239 out of 248) of detected and segmented cell regions in the dataset were assigned to unanalyzable cell clusters and analyzable interphase cells by both the cytogeneticist and our automated scheme using a set of knowledge-based classification rules. The corresponding Kappa coefficient for agreement was 0.917. The classification performance to distinguish between normal and abnormal interphase cells by counting the number of red and green FISH spots is also summarized in Table 2 . The agreement of FISH spot counting and classification results between the cytogeneticist and our scheme was 90.5% (95 out of 105) for normal cells and 95.8% (137 out of 143) for abnormal cells. The corresponding Kappa coefficient was 0.867. These agreement results (Kappa coefficients) indicate the high agreement level between a cytogeneticist and our computer scheme in both detecting analyzable interphase cell nuclei and counting the independent FISH spots.
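The reported Kappa coefficients can be checked from the counts quoted above. The sketch below assumes each comparison is a 2 × 2 table in which every disagreement falls into the opposite class (for example, the 9 analyzable cells not matched by the scheme were labeled unanalyzable); this layout is inferred from the quoted counts, not taken from the tables themselves.

```python
def cohen_kappa(matrix):
    """Cohen's kappa for a square agreement matrix (rows: manual, columns: automated)."""
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    observed = sum(matrix[i][i] for i in range(n)) / total
    expected = sum(
        sum(matrix[i]) * sum(row[i] for row in matrix) for i in range(n)
    ) / total ** 2
    return (observed - expected) / (1 - expected)

# Rows: unanalyzable, analyzable (manual); columns: same classes (automated).
cell_detection = [[76, 1], [9, 239]]
# Rows: normal, abnormal (manual); columns: same classes (automated).
cell_classification = [[95, 10], [6, 137]]

kappa_detection = cohen_kappa(cell_detection)
kappa_classification = cohen_kappa(cell_classification)
```

Under this assumption the two values come out at approximately 0.918 and 0.867, consistent with the reported coefficients of 0.917 and 0.867.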
Table 1. Comparison results of analyzable interphase cells between a cytogeneticist and the computerized scheme.
| Data classified by a cytogeneticist | Automated scheme | Agreement rate |
Table 2. Comparison results of normal and abnormal cells between a cytogeneticist and the computerized scheme.
| Data classified by a cytogeneticist | Automated scheme | Agreement rate |
In this study, an automated scheme was developed and tested to detect analyzable interphase cells and count two sets of FISH spots that targeted chromosomes 3 and X using Pap-smear testing specimens. To improve the scheme performance, we implemented several unique approaches. First, instead of using a set of thresholds determined by the separations on the scatter-plot diagrams,24 we computed a set of image features and developed a knowledge-based classifier to identify analyzable interphase cells by deleting unanalyzable cells and debris. Second, identifying splitting FISH spots is the most difficult and important task in developing automated schemes, because it determines not only the true number of FISH spots but ultimately the diagnostic results of the FISH image examinations. Unlike previous schemes that were implemented with a fixed cutoff threshold to merge splitting FISH spots,24, 28 we first observed and discussed with a cytogeneticist how clinicians visually identify splitting FISH spots before we implemented the classification rules in our scheme. We then developed a knowledge-based classifier that merges split FISH signal spots using a set of floating or adaptive thresholds that depend on the actual shape, size, and intensity of the detected FISH spots. Our scheme was applied to an image dataset involving 150 FISH images randomly acquired from six Pap-smear specimens.
The capability and performance level of our scheme were assessed against the manual results of an experienced cytogeneticist for both identifying analyzable interphase cells and detecting FISH spots. Similar to many other previously reported studies, we used the manual classification results provided by an experienced cytogeneticist as the “ground-truth” (reference), and our scheme achieved a high performance level by correctly detecting 96.4% of analyzable interphase cells as well as 90.5% of normal cells and 95.8% of abnormal cells. These performance results are encouraging compared with previous studies. For example, a recent study reported a 0.8 true-positive rate (TPR) and a 0.4 false-positive rate (FPR) in detecting analyzable cell nuclei, as well as sensitivities of about 92% and 80% for detecting red and green spots, respectively, at an FPR of about 25%.21 Although different image datasets were used in different studies, so the performance of these schemes cannot be directly compared, we believe that the performance of our scheme is comparable to or higher than that achieved by the available automated schemes.
Despite the encouraging results, we also recognize that this preliminary study has a number of limitations. First, our scheme generates disagreement in a number of cells with the cytogeneticist in counting FISH spots, as shown in Figs. 8q, 8r, 8s. The major reasons for the disagreement or detection error are: (1) the error of our adaptive threshold to merge FISH spots that are not visually considered splitting by the cytogeneticist, and (2) the relatively low sensitivity of the scheme in detecting low-intensity or low contrast FISH spots. Second, while human eyes can segment a fraction of analyzable cells from some clustered or overlapped cells to count FISH spots, our scheme fails to segment these analyzable cells from the “cluster.” Third, due to the limitation in the size of our current image dataset, the robustness of our scheme has not been fully tested and evaluated. Finally, since these 150 FISH images were randomly acquired from six Pap-smear specimen slides, it is not possible to use this automated scheme to generate a diagnostic result index of the testing case at the current stage. To overcome these limitations, we are currently developing at our laboratory a fluorescence microscopic digital image scanning system, which aims to acquire all interphase cells (analyzable and unanalyzable) depicted on one FISH image slide. Thus, we will be able to systematically acquire more FISH images from a testing Pap-smear specimen and apply our scheme to generate a likelihood score of a testing case being positive for cervical cancer.
In conclusion, automatic interphase FISH image analysis technology provides a promising biomedical optical imaging tool to screen and detect cervical cancer and other diseases. Despite the significant progress made and reported by a number of research groups, many issues and difficulties remain in developing automated schemes for FISH image analysis. In this study, we proposed and implemented several new image feature analysis and classification approaches to improve the accuracy of identifying analyzable interphase cells and detecting FISH spots. The preliminary testing results are encouraging. Before the scheme can be applied in a busy clinical practice to help clinicians quickly scan and diagnose FISH images acquired from Pap-smear samples, its performance and robustness need to be further tested using a much larger and more diverse image dataset.
The authors would like to acknowledge the support of the Charles and Jean Smith Chair endowment fund.