Proc. SPIE 7534, Document Recognition and Retrieval XVII
This paper presents the implementation and evaluation of a pattern-based program that extracts date-of-birth information from OCR text. Although the program finds date-of-birth information with high precision and recall, this type of information extraction task appears to be negatively affected by OCR errors.
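As a rough illustration of what a pattern-based extractor can look like, the Python sketch below pairs a trigger phrase with a date expression. The trigger phrases, date formats, and the `extract_dobs` name are assumptions for illustration, not the patterns used in the paper. The second call shows how a single OCR substitution (the letter `l` for the digit `1`) is enough to make a strict pattern miss an otherwise correct date.

```python
import re

# Hypothetical trigger phrases and date formats (not the paper's actual patterns).
TRIGGER = r"\b(?:date\s+of\s+birth|d\.?o\.?b\.?|born(?:\s+on)?)"
MONTH = r"(?:jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)[a-z]*\.?"
DATE = (
    r"(?:\d{1,2}[/-]\d{1,2}[/-]\d{2,4}"       # e.g. 05/01/1962 or 5-1-62
    r"|" + MONTH + r"\s+\d{1,2},?\s+\d{4}"     # e.g. January 5, 1962
    r"|\d{1,2}\s+" + MONTH + r"\s+\d{4})"      # e.g. 5 January 1962
)
# Trigger, optional ':' or '-' separator, then the date in capture group 1.
DOB_RE = re.compile(TRIGGER + r"\s*[:\-]?\s*(" + DATE + r")", re.IGNORECASE)


def extract_dobs(text: str) -> list[str]:
    """Return every date string that directly follows a date-of-birth trigger."""
    return [m.group(1) for m in DOB_RE.finditer(text)]


clean = "Patient John Doe, DOB: 05/01/1962, admitted. He was born on January 5, 1962."
print(extract_dobs(clean))

# OCR has read the digit '1' as the letter 'l', so the strict \d{4} year fails.
print(extract_dobs("Born: Jan 5, l962"))
```

Strict patterns like these tend to favour precision; tolerating common OCR confusions (for example `l`/`1` or `O`/`0`) would be one way to trade some of that precision for recall on noisy text.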
We report on an attempt to build an automatic redaction system by applying information extraction techniques to identify private dates of birth. We conclude that automatic redaction is a promising concept, although information extraction is significantly affected by the presence of OCR errors.
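A redaction step can then overwrite whatever the extractor finds. The following minimal sketch, assuming a deliberately narrow hypothetical pattern (`DOB_RE`) and a hypothetical `redact` helper, masks each captured date with characters of the same length. The last example shows a date that leaks through unredacted because OCR has turned the digit `0` into the letter `O`.

```python
import re

# Hypothetical, deliberately narrow pattern: a trigger word, then a numeric date.
DOB_RE = re.compile(
    r"\b(?:DOB|date\s+of\s+birth|born)\s*:?\s*(\d{1,2}/\d{1,2}/\d{2,4})",
    re.IGNORECASE,
)


def redact(text: str, pattern: re.Pattern = DOB_RE, mask: str = "X") -> str:
    """Overwrite each captured date (group 1) with `mask` characters.

    `mask` should be a single character so the output keeps the input's length;
    unchanged offsets mean positions computed on the OCR text (for example,
    word boxes on the page image) still line up with the redacted text.
    """
    chars = list(text)
    for m in pattern.finditer(text):
        start, end = m.span(1)
        chars[start:end] = mask * (end - start)
    return "".join(chars)


print(redact("Name: Jane Roe  DOB: 03/14/1971  Ward 4"))
# OCR read '0' as 'O', so the pattern does not fire and the date is not redacted.
print(redact("DOB: O3/14/1971"))
```

The second call is the failure mode that matters most for redaction: a missed date is not merely a lower recall figure but private information left visible in the released document.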
This paper presents the implementation and evaluation of a Hidden Markov Model (HMM) to extract addresses from OCR text. Although the HMM discovers addresses with high precision and recall, this type of information extraction task appears to be negatively affected by OCR errors.
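For readers unfamiliar with HMM-based extraction, the sketch below tags each token as inside (`ADDR`) or outside (`O`) an address and decodes the most likely tag sequence with the Viterbi algorithm. It assumes a two-state model, coarse token classes, and hand-set probabilities, all of which are hypothetical; the paper's states, features, and trained parameters are not described in this abstract.

```python
import math
import re

STATES = ("O", "ADDR")  # O = outside an address, ADDR = inside an address

# Hypothetical hand-set parameters; a real system would estimate them from labelled text.
START = {"O": 0.8, "ADDR": 0.2}
TRANS = {"O": {"O": 0.85, "ADDR": 0.15}, "ADDR": {"O": 0.25, "ADDR": 0.75}}
EMIT = {
    "O":    {"NUM": 0.05, "STREET": 0.01, "CAP": 0.20, "ZIP": 0.01, "WORD": 0.73},
    "ADDR": {"NUM": 0.25, "STREET": 0.20, "CAP": 0.35, "ZIP": 0.15, "WORD": 0.05},
}
STREET_WORDS = {"st", "street", "ave", "avenue", "rd", "road", "blvd", "lane", "ln"}


def token_class(tok: str) -> str:
    """Map a raw token to one of the coarse observation symbols used by EMIT."""
    t = tok.strip(".,").lower()
    if re.fullmatch(r"\d{5}", t):
        return "ZIP"
    if t.isdigit():
        return "NUM"
    if t in STREET_WORDS:
        return "STREET"
    if tok[:1].isupper():
        return "CAP"
    return "WORD"


def viterbi(tokens: list[str]) -> list[str]:
    """Return the most likely state sequence, computed in log space."""
    if not tokens:
        return []
    obs = [token_class(t) for t in tokens]
    score = [{s: math.log(START[s]) + math.log(EMIT[s][obs[0]]) for s in STATES}]
    back = [{}]  # back[i][s] = best predecessor of state s at position i
    for o in obs[1:]:
        prev, cur, ptr = score[-1], {}, {}
        for s in STATES:
            best = max(STATES, key=lambda p: prev[p] + math.log(TRANS[p][s]))
            cur[s] = prev[best] + math.log(TRANS[best][s]) + math.log(EMIT[s][o])
            ptr[s] = best
        score.append(cur)
        back.append(ptr)
    # Follow back-pointers from the best final state.
    state = max(STATES, key=lambda s: score[-1][s])
    path = [state]
    for ptr in reversed(back[1:]):
        state = ptr[state]
        path.append(state)
    return path[::-1]


tokens = "please send it to 42 Main St Springfield 12345 by friday".split()
tags = viterbi(tokens)
address = [t for t, s in zip(tokens, tags) if s == "ADDR"]
print(address)
```

With these toy parameters the decoder labels `42 Main St Springfield 12345` as the address. Because the model only sees token classes, an OCR error such as `5t` for `St` turns a `STREET` observation into `WORD`, which has a much lower emission probability in the `ADDR` state; this is one concrete way OCR noise can degrade an HMM extractor.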