26 September 2013
A comparison of Fisher vectors and Gaussian supervectors for document versus non-document image classification
Abstract
This research addresses the document versus non-document image classification problem. The ability to select images containing text from an OCR processing stream that also includes images of scenes, people, faces, etc., eliminates unnecessary computation and frees up valuable computing resources for other tasks. This is particularly valuable for high-volume OCR systems. Fisher vectors represent images as gradients of a global generative Gaussian mixture model (GMM) of low-level image descriptors, and exhibit state-of-the-art performance for object categorization. Gaussian supervectors represent images by soft clustering low-level image descriptors according to posterior GMM mixture probabilities, optionally using MAP adaptation, and have demonstrated state-of-the-art performance for scene categorization. We compare results obtained by applying linear SVMs to Fisher vector and Gaussian supervector representations to categorize images as having only text, no text, or a mixture of text and non-text. We also report the performance of GMM-based soft versions of vectors of locally aggregated descriptors (VLAD) and bag of visual words (BOV).
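To make the four representations named above concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes a diagonal-covariance GMM whose parameters (`w`, `mu`, `var`) are already given, shows only the Fisher vector gradient with respect to the component means, and computes the Gaussian supervector as plain posterior-weighted means without MAP adaptation. All function names and the toy data are hypothetical.

```python
import numpy as np

def posteriors(X, w, mu, var):
    """Posterior gamma[t, k] of GMM component k for descriptor x_t.

    X: (T, D) descriptors; w: (K,) weights; mu, var: (K, D) means and
    diagonal variances. All parameters are assumed to be given.
    """
    diff = X[:, None, :] - mu[None, :, :]                # (T, K, D)
    log_pdf = -0.5 * (np.sum(diff**2 / var, axis=2)
                      + np.sum(np.log(2 * np.pi * var), axis=1))
    log_joint = np.log(w) + log_pdf                       # (T, K)
    log_joint -= log_joint.max(axis=1, keepdims=True)     # numerical stability
    gamma = np.exp(log_joint)
    return gamma / gamma.sum(axis=1, keepdims=True)

def soft_bov(gamma):
    """Soft bag of visual words: average posterior per component, shape (K,)."""
    return gamma.mean(axis=0)

def soft_vlad(X, gamma, mu):
    """Soft VLAD: posterior-weighted residuals sum_t gamma_tk (x_t - mu_k)."""
    resid = gamma.T @ X - gamma.sum(axis=0)[:, None] * mu  # (K, D)
    return resid.ravel()

def gaussian_supervector(X, gamma, eps=1e-10):
    """Stacked posterior-weighted descriptor means (no MAP adaptation)."""
    n_k = gamma.sum(axis=0)[:, None]                       # soft counts
    return ((gamma.T @ X) / (n_k + eps)).ravel()

def fisher_vector_means(X, gamma, w, mu, var):
    """Fisher vector block for the means:
    (1 / (T sqrt(w_k))) * sum_t gamma_tk (x_t - mu_k) / sigma_k."""
    T = X.shape[0]
    resid = gamma.T @ X - gamma.sum(axis=0)[:, None] * mu
    return (resid / (np.sqrt(var) * T * np.sqrt(w)[:, None])).ravel()

# Toy data: random values standing in for low-level image descriptors.
rng = np.random.default_rng(0)
K, D, T = 3, 4, 50
w = np.full(K, 1.0 / K)
mu = rng.normal(size=(K, D))
var = np.ones((K, D))
X = rng.normal(size=(T, D))

gamma = posteriors(X, w, mu, var)
bov = soft_bov(gamma)
vlad = soft_vlad(X, gamma, mu)
sv = gaussian_supervector(X, gamma)
fv = fisher_vector_means(X, gamma, w, mu, var)
```

In this sketch the mean-gradient Fisher vector is the soft VLAD vector rescaled per component by the mixture weight and standard deviation, which is one way to see how closely the encodings are related. Each resulting vector could then be passed to a linear SVM, as the abstract describes.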
© (2013) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
David C. Smith, Keri A. Kornelson, "A comparison of Fisher vectors and Gaussian supervectors for document versus non-document image classification", Proc. SPIE 8856, Applications of Digital Image Processing XXXVI, 88560N (26 September 2013); doi: 10.1117/12.2023329; https://doi.org/10.1117/12.2023329
PROCEEDINGS
12 PAGES

