A simple and effective figure caption detection system for old-style documents
24 January 2011
Proceedings Volume 7874, Document Recognition and Retrieval XVIII; 78740T (2011)
Event: IS&T/SPIE Electronic Imaging, 2011, San Francisco Airport, California, United States
Identifying figure captions has wide applications in producing high-quality e-books, such as Kindle or iPad books. In this paper, we present a rule-based system to detect horizontal figure captions in old-style documents. Our algorithm consists of three steps: (i) segment images into regions of different types, such as text and figures; (ii) search for the best caption-region candidate based on heuristic rules such as region alignment and distance; and (iii) expand the caption regions identified in step (ii) with their neighboring text regions in order to correct over-segmentation errors. We test our algorithm on 81 images collected from old-style books, each containing at least one figure area. We show that the approach correctly detects figure captions in images with different layouts, and we measure its performance in terms of both precision and recall.
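The abstract does not give the actual rules, so the following is only a minimal sketch of how steps (ii) and (iii) might look once step (i) has produced labeled bounding boxes. Everything in it is an assumption made for illustration: the `Region` type, the helper names, the scoring formula (vertical gap plus half the horizontal center offset), and the pixel thresholds are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Region:
    """Hypothetical output of step (i): a labeled axis-aligned box."""
    kind: str   # "text" or "figure"
    x0: float   # left
    y0: float   # top (y grows downward)
    x1: float   # right
    y1: float   # bottom


def h_overlap(a: Region, b: Region) -> float:
    """Width shared by the two boxes along the x axis (0 if none)."""
    return max(0.0, min(a.x1, b.x1) - max(a.x0, b.x0))


def h_gap(a: Region, b: Region) -> float:
    """Horizontal distance between the boxes (0 if they overlap in x)."""
    return max(0.0, max(a.x0, b.x0) - min(a.x1, b.x1))


def v_gap(a: Region, b: Region) -> float:
    """Vertical distance between the boxes (0 if they overlap in y)."""
    return max(0.0, max(a.y0, b.y0) - min(a.y1, b.y1))


def best_caption(figure: Region, regions: List[Region],
                 max_gap: float = 40.0) -> Optional[Region]:
    """Step (ii), assumed rule: pick the text region that is close to the
    figure vertically and well centered on it horizontally."""
    fig_cx = (figure.x0 + figure.x1) / 2
    best, best_score = None, float("inf")
    for r in regions:
        if r.kind != "text" or h_overlap(figure, r) <= 0:
            continue
        gap = v_gap(figure, r)
        if gap > max_gap:
            continue
        offset = abs((r.x0 + r.x1) / 2 - fig_cx)
        score = gap + 0.5 * offset  # assumed weighting, not from the paper
        if score < best_score:
            best, best_score = r, score
    return best


def expand_caption(caption: Region, regions: List[Region],
                   max_gap: float = 10.0) -> Region:
    """Step (iii), assumed rule: absorb text regions that nearly touch the
    caption, repeating until nothing more can be merged."""
    merged = caption
    pending = [r for r in regions if r.kind == "text" and r != caption]
    changed = True
    while changed:
        changed = False
        for r in list(pending):
            if v_gap(merged, r) <= max_gap and h_gap(merged, r) <= max_gap:
                merged = Region("text",
                                min(merged.x0, r.x0), min(merged.y0, r.y0),
                                max(merged.x1, r.x1), max(merged.y1, r.y1))
                pending.remove(r)
                changed = True
    return merged


# Toy page: one figure, a caption split into two pieces, and body text.
figure = Region("figure", 100, 100, 400, 300)
regions = [
    figure,
    Region("text", 50, 20, 450, 40),     # page header
    Region("text", 150, 310, 250, 325),  # caption, left piece
    Region("text", 255, 310, 350, 325),  # caption, right piece
    Region("text", 50, 400, 450, 600),   # body paragraph
]
candidate = best_caption(figure, regions)
caption = expand_caption(candidate, regions)
```

In this toy layout the two caption fragments are merged into one box, while the header and body paragraph are left out because they lie beyond the assumed distance thresholds. A real system would need thresholds scaled to the page resolution and would likely add further rules to keep body text from being absorbed.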
© (2011) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Zongyi Liu and Hanning Zhou, "A simple and effective figure caption detection system for old-style documents," Proc. SPIE 7874, Document Recognition and Retrieval XVIII, 78740T (24 January 2011).

