21 December 2000 Table analysis for multiline cell identification
A table in a document is a rectilinear arrangement of cells, each containing a sequence of words. A single cell may consist of several lines of text. Cells may be delimited by horizontal or vertical lines, but often they are not. A table analysis system is described that reconstructs table formatting information from table images whether or not the cells are explicitly delimited. The inputs to the system are word bounding boxes and any horizontal and vertical lines that delimit cells. Using a sequence of carefully crafted rules, the system finds multi-line cells and their interrelationships even when no explicit delimiters are visible. This robust system is a component of a commercial document recognition system.
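The abstract does not state the rules themselves, so the following is only a minimal sketch of the general idea, not the author's method. It assumes two simple, hypothetical rules: words on the same text row that sit within `max_word_gap` of each other form a line segment, and vertically adjacent segments that overlap horizontally and lie within `max_line_gap` of each other form one multi-line cell, unless a horizontal ruling line runs between them. The names `Box`, `find_cells`, and both thresholds are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Box:
    """Axis-aligned box in image coordinates (y grows downward)."""
    left: float
    top: float
    right: float
    bottom: float


def union(a: Box, b: Box) -> Box:
    """Smallest box that contains both a and b."""
    return Box(min(a.left, b.left), min(a.top, b.top),
               max(a.right, b.right), max(a.bottom, b.bottom))


def overlap_x(a: Box, b: Box) -> bool:
    return a.left < b.right and b.left < a.right


def overlap_y(a: Box, b: Box) -> bool:
    return a.top < b.bottom and b.top < a.bottom


def words_to_segments(words: Sequence[Box], max_word_gap: float) -> List[Box]:
    """Assumed rule 1: join words on the same row that are close horizontally."""
    segments: List[Box] = []
    for word in sorted(words, key=lambda b: b.left):
        for i, seg in enumerate(segments):
            if overlap_y(seg, word) and word.left - seg.right <= max_word_gap:
                segments[i] = union(seg, word)
                break
        else:
            segments.append(word)
    return segments


def separated_by_rule(upper: Box, lower: Box, rules: Sequence[Box]) -> bool:
    """True if a horizontal ruling line lies between the two boxes."""
    return any(upper.bottom <= r.top and r.bottom <= lower.top
               and overlap_x(r, lower) for r in rules)


def segments_to_cells(segments: Sequence[Box], max_line_gap: float,
                      rules: Sequence[Box] = ()) -> List[Box]:
    """Assumed rule 2: stack close, horizontally overlapping lines into a cell."""
    cells: List[Box] = []
    for seg in sorted(segments, key=lambda b: b.top):
        for i, cell in enumerate(cells):
            if (overlap_x(cell, seg)
                    and seg.top - cell.bottom <= max_line_gap
                    and not separated_by_rule(cell, seg, rules)):
                cells[i] = union(cell, seg)
                break
        else:
            cells.append(seg)
    return cells


def find_cells(words: Sequence[Box], max_word_gap: float, max_line_gap: float,
               rules: Sequence[Box] = ()) -> List[Box]:
    """Word boxes (plus optional horizontal rules) in, cell boxes out."""
    return segments_to_cells(words_to_segments(words, max_word_gap),
                             max_line_gap, rules)
```

Calling `find_cells(words, max_word_gap=10, max_line_gap=5)` returns one bounding box per detected cell. A production system would need many more rules than these two, for example to handle column alignment, spanning headers, and vertical ruling lines.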
© (2000) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
John C. Handley, "Table analysis for multiline cell identification", Proc. SPIE 4307, Document Recognition and Retrieval VIII, (21 December 2000); doi: 10.1117/12.410853; https://doi.org/10.1117/12.410853
