6 March 2018 Scalable storage of whole slide images and fast retrieval of tiles using Apache Spark
Abstract
Whole slide images (WSIs) can greatly improve the workflow of pathologists through the development of software for automatic detection and analysis of cellular and morphological features. However, the gigabyte size of a WSI poses a serious challenge for scalable storage and fast retrieval, both of which are essential for next-generation image analytics. In this paper, we propose a system for scalable storage of WSIs and fast retrieval of image tiles using Apache Spark, a space-filling curve, and popular data storage formats. We investigate two schemes for storing the tiles of WSIs. In the first scheme, all the WSIs are stored in a single table (partitioned by certain table attributes for fast retrieval). In the second scheme, each WSI is stored in a separate table. The records in each table are sorted using the index values assigned by the space-filling curve. We also study two data storage formats for storing WSIs: Parquet and ORC (Optimized Row Columnar). Through performance evaluation on a 16-node cluster in CloudLab, we observed that ORC enables faster retrieval of tiles than Parquet and requires 6 times less storage space. We also observed that the two schemes for storing WSIs achieved comparable performance. On average, our system took 2 seconds to retrieve a single tile and less than 6 seconds to retrieve 8 tiles on up to 80 WSIs. We also report the tile retrieval performance of our system on Microsoft Azure to gain insight into how the underlying computing platform can affect the performance of our system.
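To illustrate the idea of sorting tile records by a space-filling curve index, the following is a minimal sketch in plain Python. The abstract does not say which curve the authors use, so the Z-order (Morton) curve here is an assumption chosen for simplicity, and the names `morton_index` and `sort_tiles` as well as the `(col, row)` tile representation are hypothetical. It is not the authors' implementation and does not involve Spark, Parquet, or ORC.

```python
from typing import List, Tuple


def morton_index(col: int, row: int, bits: int = 16) -> int:
    """Interleave the low `bits` bits of col and row into a Z-order index.

    Assumption: a Z-order (Morton) curve stands in for the unspecified
    space-filling curve mentioned in the abstract.
    """
    index = 0
    for i in range(bits):
        index |= ((col >> i) & 1) << (2 * i)      # column bits go to even positions
        index |= ((row >> i) & 1) << (2 * i + 1)  # row bits go to odd positions
    return index


def sort_tiles(tiles: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Order (col, row) tile coordinates by their curve index."""
    return sorted(tiles, key=lambda t: morton_index(t[0], t[1]))


# A hypothetical 4 x 4 grid of tiles, listed row by row.
tiles = [(c, r) for r in range(4) for c in range(4)]
ordered = sort_tiles(tiles)
```

Sorting by such an index tends to place spatially adjacent tiles near each other in storage, which is the usual motivation for combining a space-filling curve with sorted table records.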
© (2018) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Daniel E. Lopez Barron, Dig Vijay Kumar Yarlagadda, Praveen Rao, Ossama Tawfik, Deepthi Rao, "Scalable storage of whole slide images and fast retrieval of tiles using Apache Spark", Proc. SPIE 10581, Medical Imaging 2018: Digital Pathology, 1058113 (6 March 2018); doi: 10.1117/12.2290380; https://doi.org/10.1117/12.2290380
PROCEEDINGS
6 PAGES

