Degraded text recognition using word collocation
23 March 1994
A relaxation-based algorithm is proposed that improves the performance of a text recognition technique by propagating the influence of word collocation statistics. Word collocation refers to the likelihood that two words co-occur within a fixed distance of one another. For example, in a story about water transportation, it is highly likely that the word 'river' will occur within ten words on either side of the word 'boat'. For each word in a running text, the proposed algorithm receives a group of visually similar decisions (called a neighborhood) computed by a word recognition algorithm. The positions of decisions within the neighborhoods are modified based on how often they co-occur with decisions in the neighborhoods of other nearby words. This process is iterated a number of times, effectively propagating the influence of the collocation statistics across an input text. This improves on a strictly local analysis by allowing strong collocations to reinforce weak (but related) collocations elsewhere. An experimental analysis is discussed in which the algorithm is applied to text recognition results that are less than 60% correct. In all cases, the correct rate is improved to 90% or better.
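The abstract does not give the authors' exact update rule, so the following is only a minimal sketch of the general idea. It assumes a simple multiplicative update in which each candidate's score is scaled by (1 + support), where support is the collocation-weighted probability mass of candidates in nearby neighborhoods. The function names `collocation_counts`, `relax` and `top_choices`, the ten-word window, the five iterations and the toy corpus are all hypothetical illustrations, not the published method.

```python
from collections import Counter
from typing import Dict, List

def collocation_counts(tokens: List[str], window: int = 10) -> Counter:
    """Count unordered word pairs that co-occur within `window` words."""
    counts: Counter = Counter()
    for i, w in enumerate(tokens):
        for j in range(i + 1, min(i + window + 1, len(tokens))):
            counts[tuple(sorted((w, tokens[j])))] += 1
    return counts

def normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores so that they sum to 1 (left unchanged if the sum is 0)."""
    total = sum(scores.values())
    if total <= 0:
        return dict(scores)
    return {w: s / total for w, s in scores.items()}

def relax(neighborhoods: List[Dict[str, float]], colloc: Counter,
          window: int = 10, iterations: int = 5) -> List[Dict[str, float]]:
    """Hypothetical relaxation: every pass re-scores each candidate by how
    strongly it collocates with the current candidates of nearby words."""
    probs = [normalize(n) for n in neighborhoods]
    for _ in range(iterations):
        new_probs = []
        for i, cands in enumerate(probs):
            lo, hi = max(0, i - window), min(len(probs), i + window + 1)
            updated = {}
            for w, p in cands.items():
                # Support: collocation counts weighted by neighbor probabilities.
                support = 0.0
                for j in range(lo, hi):
                    if j == i:
                        continue
                    for v, q in probs[j].items():
                        support += q * colloc.get(tuple(sorted((w, v))), 0)
                updated[w] = p * (1.0 + support)
            new_probs.append(normalize(updated))
        probs = new_probs  # synchronous update: all positions use the old pass
    return probs

def top_choices(probs: List[Dict[str, float]]) -> List[str]:
    """Return the highest-ranked decision in each neighborhood."""
    return [max(p, key=p.get) for p in probs]

if __name__ == "__main__":
    corpus = "the boat sailed up the river and the boat reached the river bank".split()
    colloc = collocation_counts(corpus)
    # The recognizer initially prefers 'beat' and is undecided on 'river'/'rover'.
    hoods = [{"beat": 0.6, "boat": 0.4}, {"river": 0.5, "rover": 0.5}]
    print(top_choices(relax(hoods, colloc)))
```

In this toy run the 'boat'/'river' collocation lifts both words to the top of their neighborhoods, even though the recognizer initially ranked 'beat' first. Because the update is applied repeatedly, support gained in one neighborhood feeds back into its neighbors on the next pass. This mirrors, in a simplified form, the propagation effect the abstract describes.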
© (1994) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Tao Hong, Jonathan J. Hull, "Degraded text recognition using word collocation", Proc. SPIE 2181, Document Recognition, (23 March 1994); doi: 10.1117/12.171121; https://doi.org/10.1117/12.171121