21 May 2012 Binary document image compression using a three-symbol grouped code dictionary
Abstract
A novel method of lossy compression for images of text documents is proposed. The method is based on classifying the objects, characters, and pictures that appear in the images. We used the Tanimoto distance to group the objects into classes and so build an object dictionary; we then encoded the instances of each class with a three-symbol code called the three orthogonal symbol chain code (3OT). We applied an entropy coder, which groups the 3OT symbols, to the resulting chain; finally, we compressed the chain obtained with the Paq8l archiver, which is based on a context-mixing algorithm divided into a predictor and an arithmetic coder. The method performs well in terms of storage, achieving compression levels on average 2.17 times better than those of the lossy mode of the international standard Joint Bi-level Image Experts Group 2 (JBIG2).
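The dictionary-building step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the common definition of the Tanimoto distance for binary data, 1 - |A ∩ B| / |A ∪ B|, assumes that objects have already been segmented and scaled to equal-sized bitmaps, and uses a greedy grouping rule with a hypothetical `threshold` parameter. The function names `tanimoto_distance` and `build_dictionary` are illustrative only, and the 3OT coding and Paq8l stages are not shown.

```python
def tanimoto_distance(a, b):
    """Tanimoto distance between two equal-sized binary bitmaps.

    a, b: lists of rows, each row a list of 0/1 pixel values.
    Assumed definition: 1 - |A AND B| / |A OR B|.
    """
    inter = 0
    union = 0
    for row_a, row_b in zip(a, b):
        for pa, pb in zip(row_a, row_b):
            inter += pa & pb
            union += pa | pb
    if union == 0:
        # Two empty bitmaps are treated as identical.
        return 0.0
    return 1.0 - inter / union


def build_dictionary(objects, threshold=0.3):
    """Greedily group bitmaps into classes (hypothetical grouping rule).

    Each object joins the first class whose representative lies within
    `threshold`; otherwise it starts a new class.
    Returns (representatives, labels).
    """
    representatives = []
    labels = []
    for obj in objects:
        for idx, rep in enumerate(representatives):
            if tanimoto_distance(obj, rep) <= threshold:
                labels.append(idx)
                break
        else:
            representatives.append(obj)
            labels.append(len(representatives) - 1)
    return representatives, labels


if __name__ == "__main__":
    a = [[1, 1], [1, 0]]
    b = [[1, 1], [1, 0]]   # identical to a
    c = [[0, 0], [0, 1]]   # disjoint from a
    d = [[1, 1], [1, 1]]   # differs from a by one pixel
    reps, labels = build_dictionary([a, b, c, d], threshold=0.3)
    print(len(reps), labels)
```

Under this sketch, only one bitmap per class would need to be stored, with each instance replaced by a reference to its class. That substitution is what makes such a scheme lossy.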
© 2012 SPIE and IS&T
Hermilo Sanchez-Cruz and Mario A. Rodríguez-Díaz, "Binary document image compression using a three-symbol grouped code dictionary," Journal of Electronic Imaging 21(2), 023013 (21 May 2012). https://doi.org/10.1117/1.JEI.21.2.023013
JOURNAL ARTICLE
13 PAGES

