In the widely used Nottingham histologic scoring system for breast cancer grading, the pathologist examines H&E-stained tissue slides and assigns a score from 1 to 3 to each of three factors in the tumor regions: tubule formation, nuclear pleomorphism, and mitotic activity. The three scores are summed to give a final score, ranging from 3 to 9, that is used to grade the cancer. The tubule score (TS), which reflects tubule formation, is a value from 1 to 3 assigned by manually estimating the percentage of glandular regions in the tumor that form tubules. In this paper, given an H&E tissue image representing a tumor region, we propose an automated algorithm that detects glandular regions and determines whether tubules are present in them. The algorithm first detects all nuclei and lumen candidates in the input image, then identifies tumor nuclei among the detected nuclei and true lumina among the lumen candidates using a random forest classifier. Finally, it forms the glandular regions by grouping closely located tumor nuclei and lumina with a graph-cut-based method. Glandular regions that contain true lumina are considered to be those that form tubules (tubule regions). To evaluate the proposed method, we calculate the tubule percentage (TP), i.e., the ratio of the tubule area to the total glandular area, for 353 H&E images covering the three TSs, and plot the distribution of these TP values. The plot shows a clear separation among the three scores, suggesting that the proposed algorithm is useful for distinguishing images of different TSs.
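The tubule percentage and the summed Nottingham score described above can be illustrated with a minimal Python sketch. It assumes the glandular regions have already been segmented and labeled. The `GlandularRegion` type, its fields, and both function names are hypothetical, introduced only for illustration; they are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class GlandularRegion:
    """Hypothetical container for one detected glandular region."""
    area: float           # region area, e.g. in pixels (assumed unit)
    has_true_lumen: bool  # True if the region contains at least one true lumen


def tubule_percentage(regions: Iterable[GlandularRegion]) -> float:
    """TP = tubule area / total glandular area, expressed in percent."""
    regions = list(regions)
    total_area = sum(r.area for r in regions)
    if total_area == 0:
        return 0.0  # assumption: no glandular area means TP is reported as 0
    tubule_area = sum(r.area for r in regions if r.has_true_lumen)
    return 100.0 * tubule_area / total_area


def nottingham_total(tubule: int, pleomorphism: int, mitosis: int) -> int:
    """Sum the three factor scores (each 1-3) into the final score (3-9)."""
    scores = (tubule, pleomorphism, mitosis)
    if any(s not in (1, 2, 3) for s in scores):
        raise ValueError("each factor score must be 1, 2, or 3")
    return sum(scores)


if __name__ == "__main__":
    example = [GlandularRegion(300.0, True), GlandularRegion(100.0, False)]
    print(tubule_percentage(example))  # 300 of 400 area units form tubules
    print(nottingham_total(1, 2, 3))
```

The sketch deliberately stops at TP and does not map TP to a TS, since the abstract reports only the distribution of TP values per score and gives no thresholds.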
Automatic whole slide (WS) tissue image segmentation is an important problem in digital pathology. A conventional classification-based method (referred to as the CCb method) for this problem trains a classifier on a pre-built training database (pre-built DB) obtained from a set of training WS images and uses it to classify all image pixels or image patches (test samples) in the test WS image into different tissue types. This method suffers from a major challenge in WS image analysis: strong inter-slide tissue variability (ISTV), i.e., the variability of tissue appearance from slide to slide. Because of the ISTV, the test samples often differ substantially from the training data, which leads to misclassification. To address the ISTV, we propose a novel method, called slide-adapted classification (SAC), that extends the CCb method. We assume that the test WS image contains, besides regions with high variation from the pre-built DB, regions with lower variation from this DB. Hence, the SAC method performs a two-stage classification. First, it classifies all test samples in a WS image (as done in the CCb method) and computes their classification confidence scores. Next, the samples classified with high confidence scores (samples that are reliably classified because of their low variation from the pre-built DB) are combined with the pre-built DB to generate an adaptive training DB, which is used to reclassify the low-confidence samples. The method is motivated by the large size of the test WS image (which yields a large number of high-confidence samples) and by the fact that the variability between the low- and high-confidence samples, which belong to the same WS image, is lower than the ISTV. Using the proposed SAC method to segment a large dataset of 24 WS images, we improve the accuracy over the CCb method.
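The two-stage procedure described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify the classifier or the confidence measure used in SAC, so the sketch substitutes a nearest-centroid classifier and a distance-ratio confidence. The names `centroids`, `classify`, `sac_classify`, and the `threshold` parameter are all hypothetical.

```python
import math


def centroids(samples, labels):
    """Mean feature vector per class (stand-in for training a classifier)."""
    sums, counts = {}, {}
    for x, y in zip(samples, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}


def classify(x, cents):
    """Return (label, confidence); needs at least two classes.

    Assumed confidence: 1 - d_nearest / d_second_nearest, in [0, 1].
    """
    dists = sorted((math.dist(x, c), y) for y, c in cents.items())
    (d1, label), (d2, _) = dists[0], dists[1]
    conf = 0.0 if d2 == 0 else 1.0 - d1 / d2
    return label, conf


def sac_classify(db_x, db_y, test_x, threshold=0.5):
    """Two-stage slide-adapted classification (illustrative sketch)."""
    # Stage 1: classify every test sample with the pre-built DB (CCb step).
    first = [classify(x, centroids(db_x, db_y)) for x in test_x]
    high = {i for i, (_, conf) in enumerate(first) if conf >= threshold}

    # Adaptive DB: pre-built DB plus high-confidence test samples,
    # labeled with their stage-1 predictions.
    adapt_x = list(db_x) + [test_x[i] for i in sorted(high)]
    adapt_y = list(db_y) + [first[i][0] for i in sorted(high)]
    adapted = centroids(adapt_x, adapt_y)

    # Stage 2: keep high-confidence labels, reclassify the rest.
    return [first[i][0] if i in high else classify(x, adapted)[0]
            for i, x in enumerate(test_x)]


if __name__ == "__main__":
    db_x = [[0.0], [1.0], [10.0], [11.0]]
    db_y = ["a", "a", "b", "b"]
    # Toy "slide" whose appearance is shifted relative to the pre-built DB.
    test_x = [[3.0], [4.0], [5.8], [13.0], [14.0]]
    print(sac_classify(db_x, db_y, test_x))
```

In this toy example the sample at 5.8 is assigned to class "b" in the first stage with low confidence; once the high-confidence samples from the same slide are added to the training data, the class centroids shift toward the slide's own appearance and the sample is reassigned to class "a".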