We propose a method to accurately obtain the ratio of tumor cells over an entire histological slide. We use deep fully convolutional neural network models trained to detect and classify cells on images of H&E-stained tissue sections. Pathologists' labels consisting of exhaustive nuclei locations and tumor regions were used to train the models in a supervised fashion. We show that combining two models, each working at a different magnification, allows the system to capture both cell-level details and surrounding context, enabling successful detection and classification of cells as either tumor cells or normal cells. Indeed, by conditioning the classification of a single cell on multi-scale context information, our models mimic the process used by pathologists, who assess cell neoplasticity and tumor extent at different microscope magnifications. The ratio of tumor cells can then be readily obtained by counting the number of cells in each class. To analyze an entire slide, we split it into multiple tiles that can be processed in parallel; the per-tile results are then aggregated into an overall tumor cell ratio. We perform experiments on a dataset of 100 slides with lung tumor specimens from both resections and tissue micro-arrays (TMA). We train fully convolutional models using heavy data augmentation and batch normalization. On an unseen test set, we obtain a mean absolute error of less than 6% in predicting the tumor cell ratio, which is significantly better than the human average of 20% and is key in properly selecting tissue samples for recent genetic panel tests geared at prescribing targeted cancer drugs. We perform ablation studies to show the importance of training two models at different magnifications and to justify the choice of some parameters, such as the size of the receptive field.
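The counting and aggregation steps described in the abstract above can be illustrated with a minimal sketch. It assumes that each tile yields a count of tumor and normal cells and that the slide-level ratio is obtained by pooling counts over all tiles; the abstract does not specify the aggregation rule, and the names `TileCounts`, `tumor_cell_ratio`, and `mean_absolute_error` are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Sequence


@dataclass(frozen=True)
class TileCounts:
    """Per-tile cell counts (hypothetical container, not from the paper)."""
    tumor: int
    normal: int


def tumor_cell_ratio(tiles: Iterable[TileCounts]) -> float:
    """Pool counts over all tiles, then divide tumor cells by all cells.

    Assumption: pooling counts (rather than averaging per-tile ratios)
    so that tiles with few cells do not dominate the slide-level value.
    """
    tumor = 0
    total = 0
    for tile in tiles:
        tumor += tile.tumor
        total += tile.tumor + tile.normal
    if total == 0:
        raise ValueError("no cells detected on the slide")
    return tumor / total


def mean_absolute_error(predicted: Sequence[float],
                        reference: Sequence[float]) -> float:
    """Mean absolute difference between predicted and reference ratios."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - r) for p, r in zip(predicted, reference)) / len(predicted)
```

For example, two tiles with counts (30 tumor, 70 normal) and (10 tumor, 90 normal) pool to 40 tumor cells out of 200, a ratio of 0.2. Because tiles are independent, the per-tile counts could be produced in parallel and summed afterwards.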
Diagnosis of hepatocellular carcinoma (HCC) on the basis of digital images is a challenging problem because, unlike in gastrointestinal carcinoma, strong structural and morphological features are limited in, and sometimes absent from, HCC images. In this study, we describe the classification of HCC images using statistical distributions of features obtained from image analysis of cell nuclei and hepatic trabeculae. Images of 130 hematoxylin-eosin (HE) stained histologic slides were captured at 20X by a slide scanner (Nanozoomer, Hamamatsu Photonics, Japan), and 1112 region-of-interest (ROI) images were extracted for classification (551 negatives and 561 positives, including 113 well-differentiated positives). For a single nucleus, the following features were computed: area, perimeter, circularity, ellipticity, long and short axes of the elliptic fit, contour complexity, and gray-level co-occurrence matrix (GLCM) texture features (angular second moment, contrast, homogeneity and entropy). In addition, distributions of nuclear density and hepatic trabecula thickness within an ROI were extracted. To represent an ROI, statistical distributions (mean, standard deviation and percentiles) of these features were used. In total, 78 features were extracted for each ROI, and a support vector machine (SVM) was trained to classify negative and positive ROIs. Experimental results using 5-fold cross-validation show 90% sensitivity at 87.8% specificity. The use of statistical distributions over a relatively large area makes the HCC classifier robust to occasional failures in the extraction of nuclear or hepatic trabecula features, thus providing stability to the system.
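As a rough illustration of how per-nucleus measurements might be turned into ROI-level statistics, the sketch below computes circularity and summarizes a feature over an ROI. It rests on assumptions: circularity is taken as the common definition 4πA/P², the percentiles (10, 50, 90) are arbitrary choices since the abstract does not list them, and the function names are hypothetical.

```python
import math
import statistics
from typing import Dict, Sequence


def circularity(area: float, perimeter: float) -> float:
    """Common shape descriptor 4*pi*A / P**2 (1.0 for a perfect circle).

    Assumption: the abstract does not state which definition was used.
    """
    if perimeter <= 0:
        raise ValueError("perimeter must be positive")
    return 4.0 * math.pi * area / (perimeter ** 2)


def percentile(values: Sequence[float], q: float) -> float:
    """Percentile q (0-100) with linear interpolation between ranks."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("values must be non-empty")
    pos = (len(ordered) - 1) * q / 100.0
    low, high = math.floor(pos), math.ceil(pos)
    frac = pos - low
    return ordered[low] * (1.0 - frac) + ordered[high] * frac


def summarize(values: Sequence[float],
              percentiles: Sequence[int] = (10, 50, 90)) -> Dict[str, float]:
    """Summarize one per-nucleus feature over an ROI.

    The percentile choices are illustrative, not taken from the paper.
    """
    summary = {
        "mean": statistics.fmean(values),
        "std": statistics.pstdev(values),
    }
    for q in percentiles:
        summary[f"p{q}"] = percentile(values, q)
    return summary
```

Concatenating such summaries for every nuclear and trabecular feature would give one fixed-length vector per ROI, which is the kind of input an SVM expects. Because each entry is a statistic over many nuclei, a few failed segmentations shift the values only slightly.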
We present a system that detects cancer on slides of gastric tissue sections stained with hematoxylin and eosin (H&E). At its heart is a classifier trained using the semi-supervised multi-instance learning (MIL) framework, where each tissue is represented by a set of regions of interest (ROIs) and a single label. Such labels are readily obtained because pathologists diagnose each tissue independently as part of the normal clinical workflow. From a large dataset of over 26K gastric tissue sections from over 12K patients, obtained from a clinical load spanning several months, we train a MIL classifier on a patient-level partition of the dataset (2/3 of the patients) and obtain a very high performance of 96% (AUC) when tested on the remaining 1/3 of previously unseen patients (over 8K tissues). We show that this level of performance matches that of the more costly supervised approach, in which individual ROIs need to be labeled manually. The large amount of data used to train this system gives us confidence in its robustness and that it can be safely used in a clinical setting. We demonstrate how it can improve the clinical workflow when used for pre-screening or quality control. For pre-screening, the system can diagnose 47% of the tissues with a very low likelihood (< 1%) of missing cancers, thus nearly halving the clinicians' caseload. For quality control, compared to random rechecking of 33% of the cases, the system achieves a three-fold increase in the likelihood of catching cancers missed by pathologists. The system is currently in regular use at independent pathology labs in Japan, where it is used to double-check clinicians' diagnoses. At the end of 2012 it will have analyzed over 80,000 slides of gastric and colorectal samples (200,000 tissues).
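The bag-of-ROIs representation and the pre-screening use case described above can be sketched as follows. This is only an illustration under stated assumptions: the tissue-level score is taken as the maximum ROI score (a common MIL convention, not confirmed by the abstract), the threshold is a free parameter, and `tissue_score` and `prescreen` are hypothetical names.

```python
from typing import Dict, List, Sequence, Tuple


def tissue_score(roi_scores: Sequence[float]) -> float:
    """Score a tissue (bag) from its ROI (instance) scores.

    Assumption: max aggregation, i.e. a tissue is as suspicious as its
    most suspicious ROI. The abstract does not specify the rule used.
    """
    if not roi_scores:
        raise ValueError("a tissue needs at least one ROI")
    return max(roi_scores)


def prescreen(tissues: Dict[str, Sequence[float]],
              threshold: float) -> Tuple[List[str], List[str]]:
    """Split tissues into those cleared automatically and those needing review.

    A tissue is cleared only if its score is strictly below `threshold`
    (a hypothetical operating point chosen on validation data).
    """
    cleared: List[str] = []
    review: List[str] = []
    for tissue_id, roi_scores in tissues.items():
        if tissue_score(roi_scores) < threshold:
            cleared.append(tissue_id)
        else:
            review.append(tissue_id)
    return cleared, review
```

In a setting like the one reported, the threshold would be tuned so that the cleared group contains as many tissues as possible while the fraction of missed cancers stays below the accepted limit.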
Digital pathology is advancing thanks to the improvement and growing adoption of whole slide imaging (WSI) scanners. WSI scanners are widely expected to be used as the next-generation microscope for diagnosis; however, their usage is currently mostly limited to education and archiving. Indeed, there are still many hindrances to using WSI scanners for diagnosis (as opposed to research), two of the main ones being the perceived high cost and small productivity gain of switching from the microscope to a WSI system, and the lack of WSI standardization. We believe that a key factor for advancing digital pathology is the creation of computer-assisted diagnosis (CAD) systems. Such systems require high-resolution digitization of slides and provide a clear added value to the often costly conversion to WSI. We (NEC Corporation) are creating a CAD system named e-Pathologist®. This system is currently used at independent pathology labs for quality control (QC/QA), double-checking pathologists' diagnoses and preventing missed cancers. At the end of 2012, about 80,000 slides (200,000 tissues) of gastric and colorectal samples will have been analyzed by e-Pathologist®. Through the development of e-Pathologist®, it has become clear that a computer program should be inspired by the pathologist's diagnosis process, yet it should not be a mere copy or simulation of it. Indeed, pathologists often approach the diagnosis of slides in a "holistic" manner, examining them at various magnifications, panning and zooming in a seemingly haphazard way that they often have a hard time describing precisely. Hence no clear recipe for encoding a diagnosis expert system has emerged from numerous interviews with pathologists.
Instead, we focused on extracting a small set of histopathological features that were consistently indicated as important by the pathologists, and then let the computer figure out how to interpret quantitatively the presence or absence of these features over the entire slide. Using the pathologists' overall diagnosis (a disease class), we train the computer system with advanced machine learning techniques to predict the disease from the extracted features. By considering the diagnoses of several expert pathologists during the training phase, we ensure that the machine learns a "gold standard" that will be applied consistently and objectively to all subsequent diagnoses, making them more predictable and reliable. Considering the future of digital pathology, it is essential for a CAD system to produce effective and accurate clinical data. To this end, many hurdles remain, including standardization as well as more research into seeking clinical evidence from "computer-friendly" objective measurements of histological images. Currently the most commonly used staining method is H&E (hematoxylin and eosin), but it is extremely difficult to standardize the H&E staining process. Current pathology criteria, categories, definitions, and thresholds are all based on pathologists' subjective observations. Digital pathology is an emerging field, and researchers should bear responsibility not only for developing new algorithms, but also for understanding the meaning of the measured quantitative data.
Conference Committee Involvement (11)
Digital and Computational Pathology
20 February 2022 | San Diego, California, United States
Digital and Computational Pathology
15 February 2021 | Online Only, California, United States
19 February 2020 | Houston, Texas, United States
20 February 2019 | San Diego, California, United States
11 February 2018 | Houston, Texas, United States
12 February 2017 | Orlando, Florida, United States
Digital Pathology Posters
12 February 2017 | Orlando, Florida, United States
2 March 2016 | San Diego, California, United States
25 February 2015 | Orlando, Florida, United States
16 February 2014 | San Diego, California, United States
10 February 2013 | Lake Buena Vista (Orlando Area), Florida, United States