Evaluating text categorization in the presence of OCR errors
21 December 2000
Abstract
In this paper we describe experiments that investigate the effects of OCR errors on text categorization. In particular, we show that in our environment, OCR errors have no effect on categorization when we use a classifier based on the naive Bayes model. We also observe that dimensionality reduction techniques eliminate a large number of OCR errors and improve categorization results.
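The abstract does not say which classifier implementation or which dimensionality reduction technique the authors used. The sketch below is a minimal illustration under stated assumptions, not the paper's method. It uses a multinomial naive Bayes classifier with add-one smoothing and document-frequency thresholding, which is one common form of dimensionality reduction. OCR garbles such as "reactcr" tend to occur in only a few documents, so a minimum document-frequency cutoff often removes them from the vocabulary before training. The toy corpus, labels, and the `min_df` value are all hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    # Lowercase whitespace tokenization (a deliberately simple assumption).
    return text.lower().split()


def df_select(docs, min_df):
    # Document-frequency thresholding: keep terms that appear in >= min_df docs.
    # Rare OCR misspellings usually fall below the cutoff and are dropped.
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    return {term for term, count in df.items() if count >= min_df}


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels, vocab):
        self.vocab = vocab
        self.classes = sorted(set(labels))
        self.log_prior = {}
        self.log_likelihood = {}
        for c in self.classes:
            class_docs = [d for d, y in zip(docs, labels) if y == c]
            self.log_prior[c] = math.log(len(class_docs) / len(docs))
            # Count only in-vocabulary tokens for this class.
            counts = Counter(
                t for d in class_docs for t in tokenize(d) if t in vocab
            )
            total = sum(counts.values())
            self.log_likelihood[c] = {
                t: math.log((counts[t] + 1) / (total + len(vocab))) for t in vocab
            }
        return self

    def predict(self, doc):
        # Out-of-vocabulary tokens, including pruned OCR errors, are ignored.
        scores = {
            c: self.log_prior[c]
            + sum(self.log_likelihood[c][t] for t in tokenize(doc) if t in self.vocab)
            for c in self.classes
        }
        return max(scores, key=scores.get)


# Hypothetical toy corpus; "reactcr" and "budgct" simulate OCR errors.
train = [
    "reactor coolant pump failure",
    "reactor coolant leak inspection",
    "reactcr coolant valve",
    "budget appropriation hearing",
    "budget hearing schedule",
    "budgct appropriation report",
]
labels = ["technical", "technical", "technical", "fiscal", "fiscal", "fiscal"]

vocab = df_select(train, min_df=2)
model = NaiveBayes().fit(train, labels, vocab)

print(sorted(vocab))
print(model.predict("coolant reactcr pump"))
print(model.predict("budgct appropriation hearing"))
```

In this toy setting the two garbled tokens never enter the vocabulary, and the noisy test documents are still assigned to the expected categories on the strength of their correctly recognized words. This loosely mirrors the abstract's observation, but it is only an illustration and not a reproduction of the experiments.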
© (2000) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Kazem Taghva, Thomas A. Nartker, Julie Borsack, Steven Lumos, Allen Condit, Ron Young, "Evaluating text categorization in the presence of OCR errors", Proc. SPIE 4307, Document Recognition and Retrieval VIII, (21 December 2000); doi: 10.1117/12.410861; https://doi.org/10.1117/12.410861
Proceedings paper, 7 pages


RELATED CONTENT

OCR correction based on document level knowledge
Proceedings of SPIE (January 13 2003)
Date of birth extraction using precise shallow parsing
Proceedings of SPIE (January 18 2010)
Effectiveness of thesauri-aided retrieval
Proceedings of SPIE (January 07 1999)
Do Thesauri enhance rule-based categorization for OCR text?
Proceedings of SPIE (January 13 2003)
Title extraction and generation from OCR'd documents
Proceedings of SPIE (January 29 2007)
