15 December 2003
Automatic content extraction of filled-form images based on clustering component block projection vectors
Abstract
Automatic understanding of document images is a hard problem. Here we consider a sub-problem: automatically extracting content from filled-form images. Rather than relying on pre-selected templates or sophisticated structural or semantic analysis, we propose a novel approach based on clustering component block projection vectors. By combining spectral clustering and minimal spanning tree clustering, we generate highly accurate clusters, from which adaptive templates are constructed to extract the filled-in content. Our experiments show that this approach is effective on a set of 1040 US IRS tax form images belonging to 208 types.
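The abstract does not give implementation details, so the following is only a minimal sketch of one of the ingredients it names, under stated assumptions. It assumes that a "projection vector" is the concatenation of the row sums and column sums of a binary image block, and that minimal spanning tree clustering means building the tree over Euclidean distances and cutting edges longer than a threshold. The function names and the `cut` parameter are hypothetical, and the spectral clustering step and the way the paper combines the two methods are not reproduced here.

```python
import math


def projection_vector(block):
    """Row sums followed by column sums of a binary block (assumed definition)."""
    rows = [sum(r) for r in block]
    cols = [sum(c) for c in zip(*block)]
    return rows + cols


def mst_edges(points):
    """Prim's algorithm on the complete Euclidean graph; returns (dist, i, j) edges."""
    n = len(points)
    if n == 0:
        return []
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known connection of each point to the tree
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # Pick the point outside the tree that is closest to it.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((best[u], parent[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v] = d
                    parent[v] = u
    return edges


def mst_clusters(points, cut):
    """Keep tree edges no longer than `cut` and label the connected components."""
    label = list(range(len(points)))

    def find(i):
        while label[i] != i:
            label[i] = label[label[i]]  # path halving
            i = label[i]
        return i

    for d, i, j in mst_edges(points):
        if d <= cut:
            label[find(i)] = find(j)
    return [find(i) for i in range(len(points))]


# Two identical blocks and one with the ink in a different row.
blocks = [
    [[1, 1], [0, 0]],
    [[1, 1], [0, 0]],
    [[0, 0], [1, 1]],
]
vectors = [projection_vector(b) for b in blocks]
labels = mst_clusters(vectors, cut=1.0)
```

In this toy input the first two blocks receive the same label and the third a different one. Cutting long tree edges is equivalent to single-linkage clustering, which is one common reading of the term and may differ from the authors' procedure.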
© (2003) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Hanchuan Peng, Xiaofeng He, Fuhui Long, "Automatic content extraction of filled-form images based on clustering component block projection vectors", Proc. SPIE 5296, Document Recognition and Retrieval XI, (15 December 2003); doi: 10.1117/12.527345; https://doi.org/10.1117/12.527345
PROCEEDINGS
9 PAGES

