21 December 2000 Evaluating text categorization in the presence of OCR errors
Abstract
In this paper we describe experiments that investigate the effects of OCR errors on text categorization. In particular, we show that in our environment, OCR errors have no effect on categorization when we use a classifier based on the naive Bayes model. We also observe that dimensionality reduction techniques eliminate a large number of OCR errors and improve categorization results.
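The interaction between dimensionality reduction and OCR noise can be illustrated with a minimal sketch. The abstract does not say which reduction technique or which naive Bayes variant the authors used, so everything here is an assumption made for illustration: document-frequency thresholding as the reduction step, a multinomial naive Bayes with add-one smoothing, the hypothetical helper names `select_terms`, `train_nb`, and `classify`, and a toy corpus in which `c0re` and `rep0rt` stand in for OCR misrecognitions.

```python
import math
from collections import Counter, defaultdict


def select_terms(docs, min_df=2):
    """Keep only terms that occur in at least min_df documents.

    Assumed reduction step: OCR errors tend to be rare, one-off strings,
    so a document-frequency threshold drops many of them.
    """
    df = Counter()
    for tokens, _label in docs:
        df.update(set(tokens))
    return {term for term, n in df.items() if n >= min_df}


def train_nb(docs, vocab):
    """Train a multinomial naive Bayes model with add-one smoothing."""
    class_docs = Counter()
    term_counts = defaultdict(Counter)
    for tokens, label in docs:
        class_docs[label] += 1
        term_counts[label].update(t for t in tokens if t in vocab)
    total_docs = sum(class_docs.values())
    model = {}
    for label, n_docs in class_docs.items():
        denom = sum(term_counts[label].values()) + len(vocab)
        log_like = {
            t: math.log((term_counts[label][t] + 1) / denom) for t in vocab
        }
        model[label] = (math.log(n_docs / total_docs), log_like)
    return model


def classify(model, tokens):
    """Return the label with the highest log posterior; unknown terms are ignored."""
    def score(label):
        log_prior, log_like = model[label]
        return log_prior + sum(log_like[t] for t in tokens if t in log_like)
    return max(model, key=score)


# Toy training data (invented for this sketch, not taken from the paper).
train = [
    ("the reactor core temperature rose".split(), "nuclear"),
    ("reactor c0re coolant leak".split(), "nuclear"),
    ("core reactor shutdown".split(), "nuclear"),
    ("the budget report was filed".split(), "finance"),
    ("annual budget rep0rt".split(), "finance"),
    ("budget report audit".split(), "finance"),
]

vocab = select_terms(train, min_df=2)
model = train_nb(train, vocab)
prediction = classify(model, "reactor c0re leak".split())
```

In this toy setting the misrecognized strings each appear in a single document, so the threshold removes them from the vocabulary and the classifier decides on the correctly recognized terms alone. Real OCR output is noisier, and the threshold would need tuning against held-out data.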
© (2000) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Kazem Taghva, Thomas A. Nartker, Julie Borsack, Steven Lumos, Allen Condit, Ron Young, "Evaluating text categorization in the presence of OCR errors", Proc. SPIE 4307, Document Recognition and Retrieval VIII, (21 December 2000); doi: 10.1117/12.410861; https://doi.org/10.1117/12.410861
PROCEEDINGS
7 PAGES

