We propose a highly efficient content-lossless compression scheme for Chinese document images. The scheme combines morphologic analysis with pattern matching to cluster character patterns. To obtain error maps with the fewest errors, morphologic analysis is used to decompose and recompose the Chinese character patterns. The pattern-matching criteria are adapted to the characteristics of Chinese characters. Because small components can sometimes be placed in the blank spaces of large components, the pattern library images can be kept small. Arithmetic coding is applied in the final compression stage. Our method achieves much better compression performance than most alternative methods while ensuring content-lossless reconstruction.
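To make the idea of an error map concrete, the following is a minimal sketch and not the scheme proposed here. It assumes each pattern is a small binary bitmap, takes the error map to be the pixel-wise XOR between a pattern and a library representative, and clusters greedily against a fixed mismatch threshold. The names `error_map`, `error_count`, `cluster`, and `max_errors` are hypothetical, and the plain mismatch count stands in for the matching criteria adapted to Chinese characters; the morphologic decomposition and the arithmetic coder are omitted.

```python
from typing import List, Tuple

Pattern = List[List[int]]  # binary bitmap: 1 = ink, 0 = background


def shape(p: Pattern) -> Tuple[int, int]:
    """Return (rows, cols) of a rectangular bitmap."""
    return len(p), len(p[0]) if p else 0


def error_map(a: Pattern, b: Pattern) -> Pattern:
    """Pixel-wise XOR of two same-size bitmaps; 1 marks a mismatch."""
    if shape(a) != shape(b):
        raise ValueError("patterns must have the same size")
    return [[pa ^ pb for pa, pb in zip(ra, rb)] for ra, rb in zip(a, b)]


def error_count(a: Pattern, b: Pattern) -> int:
    """Number of mismatching pixels between two same-size bitmaps."""
    return sum(sum(row) for row in error_map(a, b))


def cluster(patterns: List[Pattern], max_errors: int):
    """Greedy clustering (an assumption of this sketch): each pattern joins
    the first library entry within max_errors mismatches, otherwise it
    becomes a new library entry."""
    library: List[Pattern] = []
    assignment: List[int] = []
    for p in patterns:
        for idx, rep in enumerate(library):
            if shape(rep) == shape(p) and error_count(rep, p) <= max_errors:
                assignment.append(idx)
                break
        else:
            library.append(p)
            assignment.append(len(library) - 1)
    return library, assignment


if __name__ == "__main__":
    a = [[1, 0], [0, 1]]
    b = [[1, 1], [0, 1]]
    c = [[0, 1], [1, 0]]
    library, assignment = cluster([a, b, c], max_errors=1)
    print(len(library), assignment)
```

In this sketch, XOR-ing a representative with the stored error map returns the original bitmap, so fewer set pixels in the error map means less residual data left for the final coder.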
"Compression of Chinese document images based on morphologic analysis and pattern matching," Optical Engineering 45(10), 107001 (1 October 2006). https://doi.org/10.1117/1.2361171