3 April 1997 Performance evaluation of document layout analysis algorithms on the UW data set
Abstract
A performance evaluation protocol for layout analysis is discussed in this paper. The University of Washington English Document Image Database-III (UW-III) contains 1600 English document images that come with manually edited ground truth of entity bounding boxes. These bounding boxes enclose text and non-text zones, text lines, and words. We describe a performance metric for comparing detected entities with the ground truth in terms of their bounding boxes. The Document Attribute Format Specification is used as the standard data representation. The protocol is intended to serve as a model for using the UW-III database to evaluate document analysis algorithms. A set of layout analysis algorithms that detect different entities has been tested on the data set using the performance metric. The evaluation results are presented in this paper.
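The abstract does not spell out the metric itself, so the following is only a minimal sketch of one common way to compare detected and ground-truth bounding boxes: count one-to-one matches based on mutual area overlap. The box layout `(x_min, y_min, x_max, y_max)`, the function names, and the `threshold` parameter are all assumptions made for illustration and are not taken from the paper.

```python
from typing import List, Tuple

# Hypothetical box layout: (x_min, y_min, x_max, y_max) in pixel coordinates.
Box = Tuple[float, float, float, float]


def area(b: Box) -> float:
    """Area of a box; degenerate boxes have area 0."""
    return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])


def intersection_area(a: Box, b: Box) -> float:
    """Area of the overlap between two axis-aligned boxes."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(0.0, w) * max(0.0, h)


def overlap_fractions(gt: Box, det: Box) -> Tuple[float, float]:
    """Fraction of the ground-truth box and of the detected box that is shared."""
    inter = intersection_area(gt, det)
    gt_area, det_area = area(gt), area(det)
    return (
        inter / gt_area if gt_area > 0 else 0.0,
        inter / det_area if det_area > 0 else 0.0,
    )


def count_one_to_one_matches(
    ground_truth: List[Box], detected: List[Box], threshold: float = 0.9
) -> int:
    """Count ground-truth boxes that overlap significantly with exactly one
    detected box, where that detected box in turn overlaps significantly
    with no other ground-truth box. The threshold is an assumed value."""
    significant = [
        [all(f >= threshold for f in overlap_fractions(g, d)) for d in detected]
        for g in ground_truth
    ]
    matches = 0
    for row in significant:
        partners = [j for j, is_sig in enumerate(row) if is_sig]
        if len(partners) != 1:
            continue
        j = partners[0]
        # The detected box must not be claimed by any other ground-truth box.
        if sum(1 for other in significant if other[j]) == 1:
            matches += 1
    return matches


ground_truth = [(0, 0, 10, 10), (20, 0, 30, 10)]
detected = [(0, 0, 10, 10), (20, 0, 25, 10)]
strict_matches = count_one_to_one_matches(ground_truth, detected, threshold=0.9)
loose_matches = count_one_to_one_matches(ground_truth, detected, threshold=0.5)
```

In this toy input the second detected box covers only half of its ground-truth zone, so it counts as a match under the looser threshold but not under the stricter one. A fuller protocol would presumably also report misses, false alarms, splits, and merges, which this sketch leaves out.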
© (1997) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Jisheng Liang, Ihsin T. Phillips, Robert M. Haralick, "Performance evaluation of document layout analysis algorithms on the UW data set", Proc. SPIE 3027, Document Recognition IV, (3 April 1997); doi: 10.1117/12.270067; https://doi.org/10.1117/12.270067
PROCEEDINGS
12 PAGES

