The most important requirements in the development of an OCR system are font independence and the ability to read free-layout text. A feature extraction algorithm based on contour tracing generates size-invariant geometrical and topological features, which make recognition as font independent as possible. In our OCR system (Recognita), these features are arranged in a tree structure, which enables fast classification. The character and line finding algorithm is designed to meet the second requirement: it handles proportional spacing, ligatures, and kerning, and automatically separates graphics from text.
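To make the idea of size-invariant features arranged in a tree more concrete, the following is a minimal sketch and not the Recognita implementation. It assumes binary glyph bitmaps, uses a flood fill instead of contour tracing to count enclosed holes (a topological feature), and uses a coarse tall/wide bounding-box class as the geometrical feature. All names (`count_holes`, `bounding_shape`, `FEATURE_TREE`, and the toy glyphs) are hypothetical and chosen only for illustration.

```python
from collections import deque

def parse(rows):
    """Turn strings of '#'/'.' into a grid of booleans (True = ink)."""
    return [[ch == "#" for ch in row] for row in rows]

def scale(grid, k):
    """Enlarge a glyph by an integer factor k using pixel replication."""
    return [[px for px in row for _ in range(k)] for row in grid for _ in range(k)]

def count_holes(grid):
    """Count background regions fully enclosed by ink (topological feature)."""
    h, w = len(grid), len(grid[0])
    # Pad with a background border so the outer background forms one region.
    blank = [False] * (w + 2)
    padded = [blank] + [[False] + row + [False] for row in grid] + [blank]
    seen = [[False] * (w + 2) for _ in range(h + 2)]
    regions = 0
    for y in range(h + 2):
        for x in range(w + 2):
            if padded[y][x] or seen[y][x]:
                continue
            regions += 1
            seen[y][x] = True
            queue = deque([(y, x)])
            while queue:  # 4-connected flood fill over background pixels
                cy, cx = queue.popleft()
                for ny, nx in ((cy + 1, cx), (cy - 1, cx), (cy, cx + 1), (cy, cx - 1)):
                    if (0 <= ny < h + 2 and 0 <= nx < w + 2
                            and not padded[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
    return regions - 1  # do not count the outer background

def bounding_shape(grid):
    """Coarse geometrical feature: is the ink bounding box tall or wide?"""
    ys = [y for y, row in enumerate(grid) if any(row)]
    xs = [x for row in grid for x, px in enumerate(row) if px]
    height = max(ys) - min(ys) + 1
    width = max(xs) - min(xs) + 1
    return "tall" if height >= width else "wide"

def features(grid):
    """Feature vector in the order the tree tests them."""
    return (count_holes(grid), bounding_shape(grid))

# Hypothetical feature tree: the first level branches on hole count,
# the second on bounding-box shape; leaves are character labels.
FEATURE_TREE = {
    0: {"tall": "L", "wide": "-"},
    1: {"tall": "O"},
    2: {"tall": "8"},
}

def classify(grid, tree=FEATURE_TREE):
    """Walk the tree one feature at a time; return None if no branch matches."""
    node = tree
    for value in features(grid):
        node = node.get(value)
        if node is None:
            return None
    return node

GLYPH_O = parse([".##.", "#..#", "#..#", "#..#", ".##."])
GLYPH_8 = parse([".##.", "#..#", ".##.", "#..#", ".##."])
GLYPH_L = parse(["#...", "#...", "#...", "#...", "####"])
GLYPH_DASH = parse(["####"])

if __name__ == "__main__":
    for glyph in (GLYPH_O, GLYPH_8, GLYPH_L, GLYPH_DASH):
        # The same glyph at three times the size yields the same features.
        print(features(glyph), features(scale(glyph, 3)), classify(scale(glyph, 3)))
```

Because both features depend only on the shape of the glyph and not on its pixel dimensions, an enlarged glyph follows the same path through the tree. Each level of the tree tests a single feature, so classification needs only as many lookups as the tree is deep, which is one plausible reading of why a tree arrangement classifies quickly.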
"Developing A General Purpose Optical Character Recognition System", Proc. SPIE 1074, Imaging Workstations (24 July 1989); https://doi.org/10.1117/12.952619