A method to provide estimates of font attributes in an OCR system, using detectors of individual attributes that are error-prone. For an OCR system to preserve the appearance of a scanned document, it needs accurate detection of font attributes. However, OCR environments have noise and other sources of errors, tending to make font attribute detection unreliable. Certain assumptions about font use can greatly enhance accuracy. Attributes such as boldness and italics are more likely to change between neighboring words, while attributes such as serifness are less likely to change within the same paragraph. Furthermore, the document as a whole, tends to have a limited number of sets of font attributes. These assumptions allow a better use of context than the raw data, or what would be achieved by simpler methods that would oversmooth the data.
"Producing good font attribute determination using error-prone information", Proc. SPIE 3027, Document Recognition IV, (3 April 1997); doi: 10.1117/12.270079; https://doi.org/10.1117/12.270079