Segmenting handwritten text lines into words using distance algorithms
1 August 1992
Abstract
This paper explores different distance algorithms that can group the connected components of a handwritten text line into words. A binarized handwritten text image normally consists of many connected components, each of which is a character fragment, an isolated character, or a group of characters. When the writing style is unconstrained, recognition of individual components is unreliable, so the components must be grouped into words before recognition algorithms (which may require dictionaries) can be used. Algorithms that compute the distance between connected components can indicate how the components should be clustered into words. We show that fast, straightforward distance algorithms (such as using the horizontal distance between the components' bounding boxes) have mediocre performance. Euclidean distance algorithms perform well but are computationally slow. This paper describes original methods of computing distances. These include methods that combine a set of horizontal distances between components (computed for each pixel row) with the Euclidean and bounding-box methods to achieve high performance at reasonable speed. We examine six distance algorithms, each tested on unconstrained handwritten address images.
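The abstract names the distance measures but gives no formulas, so the Python sketch below is only an illustration under stated assumptions, not the authors' method. It assumes each connected component is a set of (row, col) foreground pixels, and all names (`bbox_gap`, `euclidean_min`, `row_gaps`, `combined_distance`, `group_words`, `rect`) and the `threshold` parameter are hypothetical. The combination rule is also an assumption: use the smallest per-row horizontal gap when two components share pixel rows, fall back to the slow pairwise Euclidean distance otherwise, and test the cheap bounding-box gap first, since it is a lower bound on both.

```python
import math
from typing import Callable, List, Optional, Set, Tuple

Pixel = Tuple[int, int]   # (row, col) of a foreground pixel
Component = Set[Pixel]    # one connected component (assumed representation)


def col_range(c: Component) -> Tuple[int, int]:
    cols = [col for _, col in c]
    return min(cols), max(cols)


def bbox_gap(a: Component, b: Component) -> int:
    """Horizontal distance between bounding boxes; 0 if their column ranges overlap."""
    a0, a1 = col_range(a)
    b0, b1 = col_range(b)
    return max(0, max(a0, b0) - min(a1, b1))


def euclidean_min(a: Component, b: Component) -> float:
    """Smallest pixel-to-pixel Euclidean distance; O(|a| * |b|), hence slow."""
    return min(math.dist(p, q) for p in a for q in b)


def row_gaps(left: Component, right: Component) -> List[int]:
    """Horizontal gap in every pixel row that both components occupy (hypothetical)."""
    lmax: dict = {}
    rmin: dict = {}
    for r, c in left:
        lmax[r] = max(c, lmax.get(r, c))
    for r, c in right:
        rmin[r] = min(c, rmin.get(r, c))
    shared = sorted(lmax.keys() & rmin.keys())
    return [max(0, rmin[r] - lmax[r]) for r in shared]


def combined_distance(left: Component, right: Component) -> float:
    """Assumed rule: min per-row gap if rows are shared, else Euclidean distance."""
    gaps = row_gaps(left, right)
    return float(min(gaps)) if gaps else euclidean_min(left, right)


def group_words(
    components: List[Component],
    threshold: float,
    dist: Optional[Callable[[Component, Component], float]] = None,
) -> List[List[Component]]:
    """Merge left-to-right neighbours whose distance is within threshold (simplified)."""
    dist = dist or combined_distance
    comps = sorted(components, key=lambda c: col_range(c)[0])
    words: List[List[Component]] = []
    for comp in comps:
        if words:
            prev = words[-1][-1]
            # Cheap bounding-box lower bound first; compute the costly distance only if needed.
            if bbox_gap(prev, comp) <= threshold and dist(prev, comp) <= threshold:
                words[-1].append(comp)
                continue
        words.append([comp])
    return words


def rect(r0: int, r1: int, c0: int, c1: int) -> Component:
    """Filled rectangle of pixels, used here as a toy component."""
    return {(r, c) for r in range(r0, r1 + 1) for c in range(c0, c1 + 1)}


A, B, C = rect(0, 2, 0, 2), rect(0, 2, 4, 5), rect(0, 2, 15, 17)
words = group_words([C, A, B], threshold=3)
print([len(w) for w in words])  # [2, 1]
```

Comparing each component only with its left neighbour keeps the sketch short; real handwriting with overlapping or descending strokes would likely need all nearby pairs to be considered.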
© (1992) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Giovanni Seni, Edward Cohen, "Segmenting handwritten text lines into words using distance algorithms", Proc. SPIE 1661, Machine Vision Applications in Character Recognition and Industrial Inspection, (1 August 1992); doi: 10.1117/12.130274; https://doi.org/10.1117/12.130274
Proceedings, 12 pages

Keywords
Detection and tracking algorithms
Distance measurement
Image quality
Associative arrays
Binary data
CCD cameras
Image processing algorithms and systems