17 April 2019 Digitizing physical documents using optical character recognition
Abhinav Kaushal Keshari, Rajat Sharma, Madhav J. Nigam
Proceedings Volume 11071, Tenth International Conference on Signal Processing Systems; 110710H (2019) https://doi.org/10.1117/12.2516743
Event: Tenth International Conference on Signal Processing Systems, 2018, Singapore, Singapore
Abstract
The need to convert printed text into an editable, machine-readable form has grown rapidly in recent years, and Optical Character Recognition (OCR) addresses this need. The challenge is to develop a character recognition mechanism that converts scanned images into an electronic form in which the text can be reused and every line and word of the document can be accessed. This paper analyzes the architecture and method used for text recognition by the Tesseract OCR engine and extends them into an application that can transform large collections of printed documents, such as magazines, books, and newspapers, into an editable electronic format. The paper thus provides an application system that can make the digitization of physical documents faster and more accurate.
© (2019) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Abhinav Kaushal Keshari, Rajat Sharma, and Madhav J. Nigam "Digitizing physical documents using optical character recognition", Proc. SPIE 11071, Tenth International Conference on Signal Processing Systems, 110710H (17 April 2019); https://doi.org/10.1117/12.2516743
KEYWORDS
Optical character recognition
Image processing
Image segmentation
Gaussian filters
Image enhancement