The mixed content compression (MCC) algorithm developed in this research provides a hardware-efficient solution
for compression of scanned compound document images. MCC allows for an easy implementation in imaging
pipeline hardware by using only an 8-row buffer of pixels. MCC uses the JPEG encoder to effectively compress
the background and picture content of a document image. The remaining text and line graphics in the image,
which require high spatial resolution but can tolerate low color resolution, are compressed using a JBIG1 encoder
and color quantization. To separate the text and graphics from the image, MCC uses a simple mean square
error (MSE) block classification algorithm, which keeps the implementation hardware-efficient. Results show that,
for our comprehensive training suite, MCC achieved an average compression ratio of 60:1, whereas JPEG
achieved only 35:1. The gain is largest for mono text documents (82:1 versus 44:1 on average), which are
among the documents most commonly copied and scanned with all-in-one devices. In addition,
MCC has an edge-sharpening side effect that is very desirable for the target application.
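
The abstract does not spell out the MSE block classification rule, so the following is only a minimal sketch of one plausible form of it. It assumes that a block counts as text or line graphics when two representative levels approximate it with low mean square error and sufficient contrast. The function name `classify_block`, the 8x8 grayscale block, and both thresholds are hypothetical and are not taken from MCC.

```python
import numpy as np

def classify_block(block, mse_threshold=200.0, min_contrast=64.0):
    """Label one grayscale block as 'text' or 'picture'.

    Assumed rule (not from the paper): approximate the block by two
    levels, the mean of the pixels below the block mean and the mean
    of the remaining pixels, then measure the MSE of that approximation.
    Both thresholds are illustrative values.
    """
    pixels = np.asarray(block, dtype=np.float64).ravel()
    mean = pixels.mean()
    is_dark = pixels < mean
    if not is_dark.any():
        # Flat block: nothing to separate, leave it to the JPEG layer.
        return "picture"
    dark_level = pixels[is_dark].mean()
    light_level = pixels[~is_dark].mean()
    # Two-level approximation of the block and its mean square error.
    approx = np.where(is_dark, dark_level, light_level)
    mse = np.mean((pixels - approx) ** 2)
    contrast = light_level - dark_level
    if mse <= mse_threshold and contrast >= min_contrast:
        return "text"
    return "picture"

# A sharp black-on-white stroke versus a smooth gradient.
stroke = np.full((8, 8), 255)
stroke[:, 3:5] = 0
gradient = (np.arange(64).reshape(8, 8) * 4)
print(classify_block(stroke), classify_block(gradient))
```

Because the rule looks at one block at a time, a classifier of this kind needs only the rows that make up the current block, which is consistent with the 8-row buffer mentioned above.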

The JBIG2 binary image encoder dramatically improves compression ratios over previous encoders. The effectiveness
of JBIG2 is largely due to its use of pattern matching techniques and symbol dictionaries for the
representation of text. While dictionary design is critical to achieving high compression ratios, little research
has been done on optimizing dictionaries across stripes and pages.
In this paper, we propose a novel dynamic dictionary design that substantially improves JBIG2 compression
ratios, particularly for multi-page documents. This dynamic dictionary updating scheme uses caching algorithms
to manage the symbol dictionary memory more efficiently. Results show that the new dynamic symbol caching
technique outperforms the best previous dictionary construction schemes by 13% to 46% for lossy
compression when encoding multi-page documents. In addition, we propose a fast, low-complexity pattern
matching algorithm that is robust to substitution errors and achieves high compression ratios.
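
The abstract says only that caching algorithms manage the symbol dictionary, so the sketch below illustrates the general idea with a least-recently-used (LRU) policy, which is an assumption here and not necessarily the policy used in the paper. The class `SymbolCache`, its methods, and the capacity measured in symbols are all hypothetical.

```python
from collections import OrderedDict

class SymbolCache:
    """Fixed-capacity symbol dictionary with LRU eviction (assumed policy)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._symbols = OrderedDict()  # symbol id -> bitmap, oldest first

    def use(self, symbol_id, bitmap):
        """Reference a symbol; return True if it was already cached."""
        if symbol_id in self._symbols:
            # Hit: mark the symbol as most recently used.
            self._symbols.move_to_end(symbol_id)
            return True
        if len(self._symbols) >= self.capacity:
            # Miss with a full dictionary: drop the least recently used symbol.
            self._symbols.popitem(last=False)
        self._symbols[symbol_id] = bitmap
        return False

    def ids(self):
        """Cached symbol ids, least recently used first."""
        return list(self._symbols)

# Symbols seen across successive stripes or pages of one document.
cache = SymbolCache(capacity=2)
hits = [cache.use(s, bitmap=None) for s in ["a", "b", "a", "c"]]
print(hits, cache.ids())
```

In this sketch, symbols that keep recurring across pages stay in the dictionary and can be referenced rather than re-sent, while rarely used symbols are evicted once the memory budget is reached.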