3 April 1997 Performance evaluation of document layout analysis algorithms on the UW data set
Proceedings Volume 3027, Document Recognition IV; (1997); doi: 10.1117/12.270067
Event: Electronic Imaging '97, 1997, San Jose, CA, United States
Abstract
This paper discusses a performance evaluation protocol for layout analysis. The University of Washington English Document Image Database-III (UW-III) contains 1600 English document images that come with manually edited ground truth of entity bounding boxes. These bounding boxes enclose text and non-text zones, text-lines, and words. We describe a performance metric for comparing the detected entities with the ground truth in terms of their bounding boxes. The Document Attribute Format Specification is used as the standard data representation. The protocol is intended to serve as a model for using the UW-III database to evaluate document analysis algorithms. A set of layout analysis algorithms that detect different entities has been tested on the data set using the performance metric. The evaluation results are presented in this paper.
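The abstract does not spell out the metric itself, so the following is only an illustrative sketch of how detected bounding boxes might be compared with ground-truth boxes. Everything in it is an assumption made for illustration and is not taken from the paper: the `Box` type, the `evaluate` function, the mutual area-overlap rule, the greedy one-to-one matching, and the 0.9 threshold.

```python
from typing import Dict, List, NamedTuple


class Box(NamedTuple):
    """Axis-aligned bounding box (hypothetical representation)."""
    x0: float
    y0: float
    x1: float
    y1: float


def area(b: Box) -> float:
    # Degenerate or inverted boxes count as zero area.
    return max(0.0, b.x1 - b.x0) * max(0.0, b.y1 - b.y0)


def overlap_area(a: Box, b: Box) -> float:
    # Area of the intersection rectangle, or 0 if the boxes are disjoint.
    w = min(a.x1, b.x1) - max(a.x0, b.x0)
    h = min(a.y1, b.y1) - max(a.y0, b.y0)
    return max(0.0, w) * max(0.0, h)


def evaluate(ground_truth: List[Box], detected: List[Box],
             threshold: float = 0.9) -> Dict[str, int]:
    """Greedy one-to-one matching (an assumed rule, not the paper's).

    A ground-truth box and a detected box match when the intersection
    covers at least `threshold` of each box's area.
    """
    used = set()  # indices of detections already matched
    correct = 0
    for g in ground_truth:
        for j, d in enumerate(detected):
            if j in used or area(g) == 0 or area(d) == 0:
                continue
            inter = overlap_area(g, d)
            if inter / area(g) >= threshold and inter / area(d) >= threshold:
                used.add(j)
                correct += 1
                break
    return {
        "correct": correct,
        "missed": len(ground_truth) - correct,
        "false_alarms": len(detected) - correct,
    }


# Tiny example: one exact match, one missed zone, one spurious detection.
gt = [Box(0, 0, 10, 10), Box(20, 0, 30, 10)]
det = [Box(0, 0, 10, 10), Box(50, 50, 60, 60)]
result = evaluate(gt, det)
```

A fuller protocol would presumably also have to account for splits and merges (one ground-truth entity covered by several detections, or the reverse), which this simple one-to-one sketch does not attempt.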
© (1997) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Jisheng Liang, Ihsin T. Phillips, Robert M. Haralick, "Performance evaluation of document layout analysis algorithms on the UW data set", Proc. SPIE 3027, Document Recognition IV, (3 April 1997); doi: 10.1117/12.270067; https://doi.org/10.1117/12.270067
PROCEEDINGS
12 PAGES


KEYWORDS
Image segmentation
Detection and tracking algorithms
Data modeling
Databases
Error analysis
Algorithm development
Image processing algorithms and systems
