6 March 2018 SlideSeg: a Python module for the creation of annotated image repositories from whole slide images
Machine learning methods are widely used in medicine to aid cancer detection and diagnosis. In digital pathology, prediction heat maps produced by convolutional neural networks (CNNs) have already outperformed a trained pathologist working without time constraints. Training deep learning networks requires large datasets of accurately labeled ground truth data; however, whole slide images are often on the scale of 10+ gigapixels when digitized at 40× magnification, contain multiple magnification levels, and have unstandardized formats. Because of these characteristics, traditional techniques for producing training and validation data cannot be used, which limits the availability of annotated datasets. This research presents a Python module and method to rapidly produce accurately annotated image patches from whole slide images. The module is built on OpenCV, an open source computer vision library; OpenSlide, an open source library for reading virtual slide images; and NumPy, a library for scientific computing with Python. These Python scripts successfully produce 'ground truth' image patches and will help transfer advances from research laboratories into clinical application by addressing many of the challenges associated with developing annotated datasets for machine learning in histopathology.
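To make the idea of producing annotated patches concrete, the following is a minimal sketch and not SlideSeg's actual code or API. It assumes a slide region and its annotation mask are already available as NumPy arrays; in practice the region would typically come from OpenSlide's `read_region` and the mask from rasterized annotation polygons (for example with OpenCV's `fillPoly`). The function name `iter_annotated_patches` and the parameters `patch_size` and `min_coverage` are hypothetical choices made for illustration.

```python
import numpy as np


def iter_annotated_patches(image, mask, patch_size=256, min_coverage=0.05):
    """Yield (x, y, image_patch, mask_patch) for tiles that contain annotation.

    image: H x W x 3 uint8 array (a region already read from the slide).
    mask:  H x W array, nonzero where the region is annotated.
    Hypothetical helper for illustration; not part of SlideSeg.
    """
    height, width = mask.shape
    # Walk the region on a non-overlapping grid of patch_size x patch_size tiles.
    for y in range(0, height - patch_size + 1, patch_size):
        for x in range(0, width - patch_size + 1, patch_size):
            mask_patch = mask[y:y + patch_size, x:x + patch_size]
            # Fraction of the tile covered by annotated pixels.
            coverage = np.count_nonzero(mask_patch) / mask_patch.size
            if coverage >= min_coverage:
                image_patch = image[y:y + patch_size, x:x + patch_size]
                yield x, y, image_patch, mask_patch


# Synthetic stand-in for a slide region: 512 x 512 pixels, one annotated square.
image = np.zeros((512, 512, 3), dtype=np.uint8)
mask = np.zeros((512, 512), dtype=np.uint8)
mask[300:400, 300:400] = 255

patches = list(iter_annotated_patches(image, mask))
print(len(patches), [(x, y) for x, y, _, _ in patches])
```

Keeping only tiles whose mask coverage exceeds a threshold is one simple way to discard unannotated background; a real pipeline would also need to choose a magnification level and handle edge tiles, which this sketch ignores.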
© (2018) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Brendan Crabb, Niels Olson, "SlideSeg: a Python module for the creation of annotated image repositories from whole slide images", Proc. SPIE 10581, Medical Imaging 2018: Digital Pathology, 105811C (6 March 2018); doi: 10.1117/12.2300262; https://doi.org/10.1117/12.2300262
