18 December 2001 Automated data entry system: performance issues
Abstract
This paper discusses the performance of a system for extracting bibliographic fields from scanned pages of biomedical journals to populate MEDLINE, the flagship database of the National Library of Medicine (NLM), which is heavily used worldwide. The system consists of automated processes that extract the article title, author names, affiliations and abstract, and manual workstations for entering the other fields required for a complete bibliographic record in MEDLINE, such as pagination, grant support information and databank accession numbers. Labor and time data are given for (1) a wholly manual keyboarding process to create the records, (2) an OCR-based system that requires all fields except the abstract to be manually input, and (3) a more automated system that relies on document image analysis and understanding techniques to extract several fields. It is shown that this last, most automated approach requires less than 25% of the labor effort of the first, manual process.
© (2001) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
George R. Thoma, Glenn Ford, "Automated data entry system: performance issues", Proc. SPIE 4670, Document Recognition and Retrieval IX, (18 December 2001); doi: 10.1117/12.450734; https://doi.org/10.1117/12.450734
PROCEEDINGS
10 PAGES


