A comparison of Fisher vectors and Gaussian supervectors for document versus non-document image classification
26 September 2013
David C. Smith, Keri A. Kornelson
Abstract
This research addresses the document vs. non-document image classification problem. The ability to select images containing text from an OCR processing stream that also includes images of scenes, people, faces, etc., eliminates unnecessary computation and frees valuable computing resources for other tasks. The benefit is greatest for high-volume OCR systems. Fisher vectors represent images as gradients of a global generative Gaussian mixture model (GMM) of low-level image descriptors, and exhibit state-of-the-art performance for object categorization. Gaussian supervectors represent images by soft clustering low-level image descriptors according to posterior GMM mixture probabilities, optionally using MAP adaptation, and have demonstrated state-of-the-art performance for scene categorization. We compare results obtained by applying linear SVMs to Fisher vector and Gaussian supervector representations to categorize images as having only text, no text, or a mixture of text and non-text. We also report the performance of GMM-based soft versions of vectors of locally aggregated descriptors (VLAD) and bag of visual words (BOV).
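The four GMM-based encodings named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a diagonal-covariance GMM that has already been fitted, keeps only the mean-gradient part of the Fisher vector in its commonly used normalized form, leaves the supervector unnormalized, and uses hypothetical names throughout (`posteriors`, `encode`, `relevance`). Only the Python standard library is used so the example stays self-contained.

```python
import math


def posteriors(x, weights, means, variances):
    """Posterior probability of each diagonal-covariance GMM component given x."""
    log_p = []
    for w, mu, var in zip(weights, means, variances):
        ll = math.log(w)
        for xi, m, v in zip(x, mu, var):
            ll -= 0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        log_p.append(ll)
    top = max(log_p)  # subtract the maximum for numerical stability
    unnorm = [math.exp(lp - top) for lp in log_p]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def encode(descriptors, weights, means, variances, relevance=0.0):
    """Return soft BOV, soft VLAD, Fisher (mean part) and supervector encodings.

    `relevance` is an assumed MAP relevance factor; 0 gives the plain
    posterior-weighted mean of the descriptors for each component.
    """
    n, k, d = len(descriptors), len(weights), len(means[0])
    gammas = [posteriors(x, weights, means, variances) for x in descriptors]
    soft_bov, soft_vlad, fisher, supervector = [], [], [], []
    for j in range(k):
        # Soft count of descriptors assigned to component j.
        n_j = sum(g[j] for g in gammas)
        soft_bov.append(n_j / n)
        for i in range(d):
            mu = means[j][i]
            # Posterior-weighted residual of the descriptors from the mean.
            resid = sum(g[j] * (x[i] - mu) for g, x in zip(gammas, descriptors))
            soft_vlad.append(resid)
            # Gradient of the log-likelihood w.r.t. the mean, normalized.
            fisher.append(resid / (n * math.sqrt(weights[j] * variances[j][i])))
            # MAP-adapted mean: blend of the soft cluster mean and the prior mean.
            denom = n_j + relevance
            supervector.append(mu + resid / denom if denom > 0 else mu)
    return {
        "soft_bov": soft_bov,
        "soft_vlad": soft_vlad,
        "fisher": fisher,
        "supervector": supervector,
    }
```

Each image would be encoded this way from its own low-level descriptors, and the resulting fixed-length vectors passed to a linear classifier. Soft BOV has one entry per mixture component, while the other three encodings have one entry per component and descriptor dimension.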
© (2013) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
David C. Smith and Keri A. Kornelson "A comparison of Fisher vectors and Gaussian supervectors for document versus non-document image classification", Proc. SPIE 8856, Applications of Digital Image Processing XXXVI, 88560N (26 September 2013); https://doi.org/10.1117/12.2023329
KEYWORDS
Principal component analysis, Image classification, Databases, Dimension reduction, Expectation maximization algorithms, Optical character recognition, Matrices
