This research addresses the document vs. non-document image classification problem. The ability to select images
containing text from an OCR processing stream that also includes images of scenes, people, faces, etc., will
eliminate unnecessary computation and free up valuable computing resources for other tasks. This is particularly
true for high-volume OCR systems. Fisher vectors represent images as gradients of the log-likelihood of a global
generative Gaussian Mixture Model (GMM) of low-level image descriptors, and exhibit state-of-the-art performance for object categorization.
Gaussian supervectors represent images by soft clustering low-level image descriptors according to
posterior GMM component probabilities, optionally using maximum a posteriori (MAP) adaptation, and have demonstrated state-of-the-art
performance for scene categorization. We compare results obtained by applying linear SVMs to Fisher vector
and Gaussian supervector representations to categorize images as having only text, no text, or a mixture of
text and non-text. We also report the performance of GMM-based soft versions of the vector of locally aggregated
descriptors (VLAD) and bag of visual words (BOV) representations.
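
To make the four representations named above concrete, the following is a minimal NumPy sketch, not the implementation used in this work. It assumes a diagonal-covariance GMM whose parameters (`weights`, `means`, `variances`) have already been fit to low-level descriptors; all function names and the `relevance` parameter are hypothetical, the Fisher vector is restricted to gradients with respect to the component means, and the supervector is simply the stacked MAP-adapted means without any further normalization.

```python
import numpy as np

def posteriors(X, weights, means, variances):
    """Posterior probability of each GMM component for each descriptor.
    X: (N, D) descriptors; weights: (K,); means, variances: (K, D), diagonal covariances."""
    diff = X[:, None, :] - means[None, :, :]                      # (N, K, D)
    log_gauss = -0.5 * (np.sum(diff ** 2 / variances, axis=2)
                        + np.sum(np.log(2.0 * np.pi * variances), axis=1))
    log_post = np.log(weights) + log_gauss                        # (N, K)
    log_post -= log_post.max(axis=1, keepdims=True)               # numerical stability
    post = np.exp(log_post)
    return post / post.sum(axis=1, keepdims=True)

def soft_bov(X, weights, means, variances):
    """Soft bag of visual words: average posterior per component, shape (K,)."""
    return posteriors(X, weights, means, variances).mean(axis=0)

def soft_vlad(X, weights, means, variances):
    """Soft VLAD: posterior-weighted sum of residuals x - mu_k, flattened to (K*D,)."""
    gamma = posteriors(X, weights, means, variances)
    resid = gamma.T @ X - gamma.sum(axis=0)[:, None] * means      # (K, D)
    return resid.ravel()

def fisher_vector_means(X, weights, means, variances):
    """Fisher vector, mean gradients only (assumption: weight and variance
    gradients are omitted), with the usual 1 / (N * sqrt(w_k)) scaling."""
    gamma = posteriors(X, weights, means, variances)
    resid = gamma.T @ X - gamma.sum(axis=0)[:, None] * means      # (K, D)
    grad = resid / np.sqrt(variances)
    grad /= X.shape[0] * np.sqrt(weights)[:, None]
    return grad.ravel()

def gaussian_supervector(X, weights, means, variances, relevance=16.0):
    """Stacked MAP-adapted means. `relevance` is a hypothetical name for the
    adaptation factor r in alpha_k = n_k / (n_k + r)."""
    gamma = posteriors(X, weights, means, variances)
    n_k = gamma.sum(axis=0)                                       # soft counts, (K,)
    mean_k = (gamma.T @ X) / np.maximum(n_k, 1e-12)[:, None]      # per-component data mean
    alpha = n_k / (n_k + relevance)
    adapted = alpha[:, None] * mean_k + (1.0 - alpha)[:, None] * means
    return adapted.ravel()
```

Each function maps the variable-size descriptor set of one image to a fixed-length vector, which is what allows a linear SVM to be trained on top. In this sketch the soft VLAD and the mean-gradient Fisher vector differ only in the per-component variance and weight scaling.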