The JBIG2 standard is widely used for binary document image compression primarily because it achieves much
higher compression ratios than conventional facsimile encoding standards, such as T.4, T.6, and T.82 (JBIG1).
A typical JBIG2 encoder works by first separating the document into connected components, or symbols. Next,
it creates a dictionary by encoding a subset of the symbols from the image. Finally, it encodes all the remaining
symbols using the dictionary entries as references.
In this paper, we propose a novel method for measuring the distance between symbols based on a conditional entropy
estimation (CEE) distance measure. The CEE distance measure is used to both index entries of the
dictionary and construct the dictionary. The advantage of the CEE distance measure, as compared to conventional
measures of symbol similarity, is that the CEE provides a much more accurate estimate of the number of
bits required to encode a symbol. In experiments on a variety of documents, we demonstrate that incorporating
the CEE distance measure reduces the overall bitrate of the JBIG2-encoded bitstream by approximately 14%
compared to the best conventional dissimilarity measures.
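
As a rough illustration of why an entropy-based distance can differ from a conventional pixel-mismatch count, the following Python sketch compares the two on a pair of small binary bitmaps. It is not the CEE estimator proposed in this paper: the function names are hypothetical, the bitmaps are assumed to be equal-sized lists of 0/1 rows, and the conditional entropy is estimated naively from co-located pixel pairs of the two bitmaps themselves.

```python
import math
from collections import Counter


def xor_distance(symbol, reference):
    """Conventional dissimilarity: count of pixels that differ."""
    return sum(
        s != r
        for s_row, r_row in zip(symbol, reference)
        for s, r in zip(s_row, r_row)
    )


def conditional_entropy_bits(symbol, reference):
    """Naive estimate of the bits needed to code `symbol` given `reference`.

    Assumption for illustration only: each symbol pixel is predicted from
    the co-located reference pixel, with probabilities taken from the
    empirical counts of this one bitmap pair.
    """
    pairs = [
        (r, s)
        for s_row, r_row in zip(symbol, reference)
        for s, r in zip(s_row, r_row)
    ]
    joint = Counter(pairs)                   # counts of (reference, symbol) pixel pairs
    marginal = Counter(r for r, _ in pairs)  # counts of reference pixel values
    # Sum of -log2 p(s | r) over all pixels, grouped by distinct pair.
    return sum(
        -count * math.log2(count / marginal[r])
        for (r, _), count in joint.items()
    )


# A vertical stroke and a reference that differs in one pixel.
symbol = [[0, 1, 0], [0, 1, 0], [0, 1, 0]]
reference = [[0, 1, 0], [0, 1, 0], [0, 1, 1]]

print(xor_distance(symbol, reference))
print(conditional_entropy_bits(symbol, reference))
```

The mismatch count only records how many pixels differ, whereas the entropy estimate is expressed in bits and depends on how predictable the symbol is from the reference, which is the quantity an encoder ultimately pays for.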