29 March 2013
Storage and breast region segmentation for a non-distributed approach to clinical scale content-based image retrieval in mammography
The goal of our work is to lay the foundation for implementing a personal computer mammography content-based image retrieval (MCBIR) system that can search a small to midsized clinical practice's picture archive and communications system (PACS). For a system to be relevant to clinicians, it must be able to operate over a large dataset, because the number of mammograms within a PACS can grow by as many as 8,000 images per month and because the amount of training data available can affect MCBIR retrieval performance. We therefore elected to use the largest publicly available mammography dataset, the Digital Database for Screening Mammography (DDSM). We propose a non-distributed approach to MCBIR and confirm its feasibility by applying it to modernizing the DDSM. Our modernization work includes: encoding the dataset's images in the DICOM supported PNG lossless compression format; using a combination of an embedded database and compressed files to store textual data; and performing image segmentation to extract the breast regions in the DDSM's 10,411 usable mammograms. Our segmentation algorithm uses a combination of thresholding and seeded region growing. The resulting image masks are stored in compressed files. We implemented ImageJ plug-ins to support our work. Generally, MCBIR work employs distributed approaches such as client/server computing or web services. Our work demonstrates that approaches using a single personal computer are now feasible due to increases in computing power. Our work on the DDSM has implications for the system requirements of clinical MCBIR systems. We found that the new dataset requires less than 256 GB of storage. We were able to perform rapid automated breast region segmentation with acceptable results in 98.15% of the dataset's 10,411 images. Mean processing time for segmentation was 22.1 seconds per image while processing three images concurrently.
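The abstract names thresholding and seeded region growing but does not give the details, so the following is only a minimal illustrative sketch of how the two steps can be combined, not the authors' ImageJ implementation. The function names (`pick_seed`, `grow_region`), the seed rule (the above-threshold pixel nearest the foreground centroid), the 4-connected neighbourhood, and the `threshold` and `tolerance` parameters are all assumptions made for this example.

```python
from collections import deque

import numpy as np


def pick_seed(image, threshold):
    """Hypothetical seed rule: the above-threshold pixel nearest the foreground centroid."""
    img = np.asarray(image, dtype=float)
    rows, cols = np.nonzero(img > threshold)
    if rows.size == 0:
        return None  # nothing passes the threshold
    dist2 = (rows - rows.mean()) ** 2 + (cols - cols.mean()) ** 2
    i = int(np.argmin(dist2))
    return int(rows[i]), int(cols[i])


def grow_region(image, seed, threshold, tolerance):
    """Grow a 4-connected region from `seed` and return it as a boolean mask.

    A neighbour joins the region when it is above `threshold` and its
    intensity differs from the adjacent region pixel by at most `tolerance`.
    Both criteria are assumptions chosen for illustration.
    """
    img = np.asarray(image, dtype=float)
    mask = np.zeros(img.shape, dtype=bool)
    if seed is None or img[seed] <= threshold:
        return mask
    mask[seed] = True
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if not (0 <= nr < img.shape[0] and 0 <= nc < img.shape[1]):
                continue  # outside the image
            if mask[nr, nc]:
                continue  # already in the region
            if img[nr, nc] > threshold and abs(img[nr, nc] - img[r, c]) <= tolerance:
                mask[nr, nc] = True
                queue.append((nr, nc))
    return mask


# Toy image: a bright "breast" block plus an isolated bright label artifact.
demo = np.zeros((6, 6))
demo[1:5, 0:3] = 100.0   # connected tissue region
demo[0, 5] = 100.0       # disconnected artifact that thresholding alone would keep
demo_seed = pick_seed(demo, threshold=50.0)
demo_mask = grow_region(demo, demo_seed, threshold=50.0, tolerance=20.0)
```

In this toy case the threshold alone keeps both the tissue block and the artifact, while growing from a seed inside the tissue keeps only the connected block, which is the usual motivation for pairing the two steps.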
Due to the DDSM's inaccessibility, researchers often either use a small subset of the available mammograms or abandon the DDSM altogether and use a much smaller, but more usable, dataset. Our work makes the entire DDSM accessible. We use standard open-source/public domain technologies, including ImageJ and the H2 embedded SQL database. We also believe that the approach used for the DDSM will be similar to the approaches for MCBIR storage and processing in future clinical PACS.
© (2013) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Fumbeya Marungo and Paul Taylor, "Storage and breast region segmentation for a non-distributed approach to clinical scale content-based image retrieval in mammography", Proc. SPIE 8674, Medical Imaging 2013: Advanced PACS-based Imaging Informatics and Therapeutic Applications, 86740H (29 March 2013).
