Convolutional neural networks (CNNs) are widely used for cell/nuclei classification and segmentation in histopathology images. Despite their pervasiveness, CNNs are typically fine-tuned on specific, large, labeled datasets, even though such datasets are hard to collect and annotate. This is not a scalable approach. In this work, we aim to gain deeper insight into the nature of the problem. We used a cervical cancer dataset with cells labeled into four classes by an expert pathologist. By pre-training on this dataset, we propose a one-shot learning model for cervical cell classification in histopathology tissue images. We extract regional maximum activation of convolutions (R-MAC) global descriptors and train a one-shot learning memory module, with the goal of using it for various cancer types and eliminating the need for expensive, difficult-to-collect, large, labeled whole slide image (WSI) datasets. Our model achieved 94.6% accuracy in detecting the four cell classes on the test dataset. Further, we present our analysis of the dataset and features to better understand and visualize the problem in general.
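To make the R-MAC step concrete, the following is a minimal sketch of how such a global descriptor could be computed from a single convolutional feature map. It is a simplified illustration, not the pipeline used in this work: the function names, the `levels` parameter, and the uniform region grid are assumptions, and the PCA-whitening step of the original R-MAC formulation is omitted.

```python
import numpy as np


def l2_normalize(vector, eps=1e-12):
    """Scale a vector to unit L2 norm (eps guards against division by zero)."""
    return vector / (np.linalg.norm(vector) + eps)


def rmac_descriptor(feature_map, levels=3):
    """Simplified R-MAC descriptor for a feature map of shape (C, H, W).

    Assumption: at each level l we place an l x l grid of square regions of
    side 2 * min(H, W) / (l + 1); the original method distributes regions
    along the longer side differently and applies PCA whitening.
    """
    channels, height, width = feature_map.shape
    descriptor = np.zeros(channels, dtype=np.float64)
    for level in range(1, levels + 1):
        side = max(1, (2 * min(height, width)) // (level + 1))
        tops = np.linspace(0, height - side, level).astype(int)
        lefts = np.linspace(0, width - side, level).astype(int)
        for top in tops:
            for left in lefts:
                region = feature_map[:, top:top + side, left:left + side]
                # Max-pool each channel over the region, then normalize.
                descriptor += l2_normalize(region.max(axis=(1, 2)))
    # Sum-aggregate the regional vectors and normalize once more.
    return l2_normalize(descriptor)


# Hypothetical stand-in for the last convolutional activations of a CNN.
rng = np.random.default_rng(0)
activations = rng.random((8, 7, 7))
descriptor = rmac_descriptor(activations)
```

The resulting vector has one entry per channel and unit length, so descriptors of different cells can be compared directly with a dot product.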
Whole slide images (WSIs) can greatly improve the workflow of pathologists through the development of software for automatic detection and analysis of cellular and morphological features. However, the gigabyte size of a WSI poses a serious challenge for scalable storage and fast retrieval, which are essential for next-generation image analytics. In this paper, we propose a system for scalable storage of WSIs and fast retrieval of image tiles using Apache Spark, a space-filling curve, and popular data storage formats. We investigate two schemes for storing the tiles of WSIs. In the first scheme, all the WSIs are stored in a single table (partitioned by certain table attributes for fast retrieval). In the second scheme, each WSI is stored in a separate table. The records in each table are sorted using the index values assigned by the space-filling curve. We also study two data storage formats for storing WSIs: Parquet and ORC (Optimized Row Columnar). Through performance evaluation on a 16-node cluster in CloudLab, we observed that ORC enables faster retrieval of tiles than Parquet and requires 6 times less storage space. We also observed that the two schemes for storing WSIs achieved comparable performance. On average, our system took 2 seconds to retrieve a single tile and less than 6 seconds to retrieve 8 tiles on up to 80 WSIs. We also report the tile retrieval performance of our system on Microsoft Azure to gain insight into how the underlying computing platform can affect the performance of our system.
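As an illustration of sorting tile records by a space-filling curve index, the sketch below assigns each tile a Z-order (Morton) index from its grid coordinates. The choice of a Z-order curve is an assumption made for this example, since the specific curve is not named above; `morton_index` and the tile layout are likewise hypothetical.

```python
def morton_index(col, row, bits=16):
    """Interleave the bits of (col, row) into a single Z-order index.

    Bit i of col lands at position 2*i and bit i of row at position 2*i + 1,
    so tiles that are close in 2-D tend to receive nearby index values.
    """
    index = 0
    for bit in range(bits):
        index |= ((col >> bit) & 1) << (2 * bit)
        index |= ((row >> bit) & 1) << (2 * bit + 1)
    return index


# Hypothetical 4 x 4 tile grid of one WSI, listed in row-major order.
tiles = [(col, row) for row in range(4) for col in range(4)]

# Sort the tile records by their curve index before writing them to a table.
sorted_tiles = sorted(tiles, key=lambda tile: morton_index(*tile))
```

With records stored in this order, the tiles of a small rectangular region tend to fall into a few contiguous runs of rows, which is the property that makes range-style retrieval from a sorted columnar table efficient.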