A mixed approach to book splitting
28 January 2008
Liangcai Gao, Zhi Tang
Proceedings Volume 6815, Document Recognition and Retrieval XV; 68150B (2008) https://doi.org/10.1117/12.765813
Event: Electronic Imaging, 2008, San Jose, California, United States
Abstract
In this paper, we present a hybrid approach to splitting a book document into individual chapters. We use multiple sources of information to obtain a reliable assessment of the chapter title pages. These sources are produced by four methods: blank space detection, font analysis, header and footer association, and table of contents (TOC) analysis. Finally, a combination component scores potential chapter title pages and selects the best candidates. This approach takes full advantage of various kinds of information, such as page headers and footers, layout, and keywords. It works well even without TOC information, which is crucial for most previous related work. Experiments show that this approach is robust and reliable.
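The combination step described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the four cues are weighted or how candidates are selected, so everything below is an assumption made for illustration: the cue names, the weights, the threshold, and the choice to renormalize over whichever cues are available (so that a book with no TOC is not penalized). It is not the authors' implementation.

```python
from dataclasses import dataclass, field

# Hypothetical weights for the four cues named in the abstract.
# The paper's actual weighting scheme is not given there.
WEIGHTS = {
    "blank_space": 0.2,
    "font": 0.3,
    "header_footer": 0.3,
    "toc": 0.2,
}


@dataclass
class PageCues:
    """Per-page cue scores in [0, 1]; a missing key means the cue is unavailable."""
    page: int
    scores: dict = field(default_factory=dict)


def combined_score(cues, weights=WEIGHTS):
    """Weighted average over the cues that are present for this page."""
    available = {k: v for k, v in cues.scores.items() if k in weights}
    total_weight = sum(weights[k] for k in available)
    if total_weight == 0:
        return 0.0
    return sum(weights[k] * v for k, v in available.items()) / total_weight


def select_title_pages(pages, threshold=0.6):
    """Return page numbers whose combined score reaches the (assumed) threshold."""
    return [p.page for p in pages if combined_score(p) >= threshold]


if __name__ == "__main__":
    pages = [
        # No TOC cue: scored from the remaining three cues only.
        PageCues(1, {"blank_space": 0.9, "font": 0.8, "header_footer": 0.7}),
        PageCues(2, {"blank_space": 0.1, "font": 0.2, "header_footer": 0.0}),
        PageCues(15, {"blank_space": 0.8, "font": 0.9, "header_footer": 0.9, "toc": 1.0}),
    ]
    print(select_title_pages(pages))
```

Averaging over only the available cues is one simple way to keep the score comparable when TOC analysis yields nothing; other schemes, such as voting or a learned classifier, would fit the abstract's description equally well.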
© (2008) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Liangcai Gao and Zhi Tang "A mixed approach to book splitting", Proc. SPIE 6815, Document Recognition and Retrieval XV, 68150B (28 January 2008); https://doi.org/10.1117/12.765813
KEYWORDS
Error analysis
Statistical methods
Visualization
Analytical research
Statistical analysis
Computer science
Computing systems
