22 December 1999 Automated zone correction in bitmapped document images
Abstract
The optical character recognition (OCR) system selected by the National Library of Medicine (NLM) as part of its system for automating the production of MEDLINE® records frequently segments scanned page images into zones that are inappropriate for NLM's application. Software has been created in-house to correct the zones using the character coordinate and character attribute information provided as part of the OCR output data. The software correctly delineates over 97% of the zones of interest tested to date.
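The abstract does not describe the correction algorithm itself, so the following is only an illustrative sketch of how character coordinates and character attributes could be used to re-delineate a zone. It is not the authors' method. The `Char` record, its `font_size` attribute, and the rule "start a new zone whenever the font size changes" are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import List, Tuple

# (left, top, right, bottom) in pixel coordinates; a hypothetical convention.
Box = Tuple[int, int, int, int]


@dataclass
class Char:
    """One OCR character: hypothetical record, not the actual OCR output format."""
    text: str
    left: int
    top: int
    right: int
    bottom: int
    font_size: int  # assumed character attribute


def bounding_box(chars: List[Char]) -> Box:
    """Smallest rectangle enclosing every character in a non-empty list."""
    return (
        min(c.left for c in chars),
        min(c.top for c in chars),
        max(c.right for c in chars),
        max(c.bottom for c in chars),
    )


def split_zone_by_font_size(chars: List[Char]) -> List[Box]:
    """Split one OCR zone into sub-zones wherever the font size changes
    between consecutive characters (an assumed rule, for illustration only)."""
    zones: List[Box] = []
    current: List[Char] = []
    for c in chars:
        if current and c.font_size != current[-1].font_size:
            zones.append(bounding_box(current))
            current = []
        current.append(c)
    if current:
        zones.append(bounding_box(current))
    return zones


# Example: a large-font title line followed by a small-font line that the
# OCR engine placed in the same zone.
merged_zone = [
    Char("T", 10, 10, 30, 40, 18),
    Char("i", 32, 10, 40, 40, 18),
    Char("a", 10, 60, 20, 72, 10),
    Char("b", 22, 60, 32, 72, 10),
]
corrected = split_zone_by_font_size(merged_zone)
```

In this example `corrected` holds two boxes, one per font size. A real system would likely combine several attributes and geometric cues rather than rely on font size alone.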
© (1999) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Susan E. Hauser, Daniel X. Le, George R. Thoma, "Automated zone correction in bitmapped document images", Proc. SPIE 3967, Document Recognition and Retrieval VII, (22 December 1999); doi: 10.1117/12.373499; https://doi.org/10.1117/12.373499
PROCEEDINGS
11 PAGES


