23 September 1999 Geometrical approach to skew detection for documents containing the Latin/Cyrillic characters
Abstract
Document skew is a distortion of text-line orientation that arises when paper documents are digitized. Its visual effect is a slope of the text lines, which are normally horizontal for scripts such as Latin or Cyrillic, with respect to the X-axis. Many available document recognition systems, however, require properly aligned text lines for accurate text segmentation and recognition. This means that the skew, if present, should be estimated and compensated for before further processing. The Hough transform is one of the popular techniques for skew detection. To lower its computational cost, it is usually applied to a small number of representative points of each character or its bounding box. A problem with this method, however, is that different characters have different heights. As a result, the representative points of characters belonging to the same line often do not fit a straight line well, which often leads to errors in Hough-based skew detection. In this paper, we propose a new algorithm to overcome this problem. It uses only the bounding boxes of the connected components of characters and a number of simple tests to estimate the skew angle.
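For orientation, the Hough-style baseline that the abstract criticizes can be sketched roughly as follows. This is not the algorithm proposed in the paper, whose tests are not described here. It is a minimal illustration that assumes boxes are given as `(x_min, y_min, x_max, y_max)` tuples in image coordinates (y grows downward) and that the bottom-centre of each box serves as the representative point. The names `estimate_skew_hough` and `synthetic_boxes`, and all parameter defaults, are hypothetical.

```python
import math
from collections import Counter


def estimate_skew_hough(boxes, max_angle_deg=15.0, angle_step_deg=0.1, rho_step=2.0):
    """Estimate text-line skew (degrees) from character bounding boxes.

    Hypothetical baseline: one representative point per box votes in a
    (angle, rho) accumulator; the angle whose votes are most concentrated wins.
    A positive angle means lines descend to the right in image coordinates.
    """
    if not boxes:
        raise ValueError("at least one bounding box is required")

    # Assumption: the bottom-centre of each box approximates the baseline.
    points = [((x0 + x1) / 2.0, y1) for (x0, y0, x1, y1) in boxes]

    best_angle, best_score = 0.0, -1
    n_steps = int(round(2 * max_angle_deg / angle_step_deg))
    for i in range(n_steps + 1):
        angle = -max_angle_deg + i * angle_step_deg
        theta = math.radians(angle)
        c, s = math.cos(theta), math.sin(theta)
        # Points on one line of slope angle theta share rho = y*cos - x*sin.
        votes = Counter(math.floor((y * c - x * s) / rho_step) for x, y in points)
        # Sum of squared bin counts rewards votes piling into few bins.
        score = sum(v * v for v in votes.values())
        if score > best_score:
            best_angle, best_score = angle, score
    return best_angle


def synthetic_boxes(skew_deg, offsets=(10, 60, 110), width=10, height=12):
    """Build equal-height boxes whose bottoms lie on lines skewed by skew_deg."""
    slope = math.tan(math.radians(skew_deg))
    boxes = []
    for offset in offsets:
        for x in range(0, 501, 20):
            y = x * slope + offset
            boxes.append((x - width / 2, y - height, x + width / 2, y))
    return boxes


if __name__ == "__main__":
    print(estimate_skew_hough(synthetic_boxes(3.0)))
```

With equal-height synthetic boxes the votes concentrate near the true angle. On real text, characters with descenders or varying heights move their bottom-centre points off the baseline, which is the source of error the abstract describes.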
© (1999) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Oleg G. Okun, "Geometrical approach to skew detection for documents containing the Latin/Cyrillic characters", Proc. SPIE 3811, Vision Geometry VIII, (23 September 1999); doi: 10.1117/12.364111; https://doi.org/10.1117/12.364111
PROCEEDINGS
9 PAGES

