We present an end-to-end feature description pipeline which uses a novel interest point detector and Rotation-
Invariant Fast Feature (RIFF) descriptors. The proposed RIFF algorithm is 15× faster than SURF<sup>1</sup> while
producing large-scale retrieval results that are comparable to SIFT.<sup>2</sup> Such high-speed features benefit a range of
applications from Mobile Augmented Reality (MAR) to web-scale image retrieval and analysis.
In mobile visual search applications, an image-based query is typically sent from a mobile client to a server. Because of bit-rate limitations, the query should be as small as possible. When performing image-based retrieval with local features, there are two types of information: the descriptors of the image features and the locations of the features within the image. Location information can be used to check the geometric consistency of the feature set and thus improve retrieval performance. Location histogram coding is an effective way to compress the location information. We present a location histogram coder that reduces the bit rate by 2.8× compared to a fixed-rate scheme and by 12.5× compared to a floating-point representation of the locations. A drawback of prior coders is the large context table, which can be difficult to store and requires a large amount of training data. We propose a new sum-based context for coding the location histogram map. We show that it can reduce the context table size by up to 200× while performing as well as or better than previously proposed location histogram coders.
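As a rough illustration of the histogram-coding idea, the sketch below quantizes feature locations into a coarse grid and derives a sum-based context for one cell. The block size, neighborhood radius, and binarized occupancy sum are illustrative assumptions, not the parameters of the actual coder.

```python
def location_histogram(points, width, height, block=4):
    """Quantize (x, y) feature locations into a coarse grid.

    The resulting histogram map records how many features fall in each
    block; the map (plus per-block counts) is what gets entropy coded.
    Block size of 4 pixels is an illustrative choice.
    """
    nx, ny = width // block, height // block
    counts = [[0] * nx for _ in range(ny)]
    for x, y in points:
        counts[min(int(y) // block, ny - 1)][min(int(x) // block, nx - 1)] += 1
    return counts


def sum_context(counts, i, j, radius=2):
    """Sum-based context for cell (i, j): the number of occupied cells
    in a causal (already-coded) neighborhood. A single small integer
    selects the probability model, instead of a full pattern-indexed
    context table.
    """
    s = 0
    for di in range(-radius, 1):
        for dj in range(-radius, radius + 1):
            if di == 0 and dj >= 0:
                break  # only cells coded before (i, j) in raster order
            ii, jj = i + di, j + dj
            if 0 <= ii < len(counts) and 0 <= jj < len(counts[0]):
                s += 1 if counts[ii][jj] > 0 else 0
    return s
```

The appeal of a sum-based context is that the context table grows linearly with the neighborhood size, whereas a pattern-indexed context grows exponentially with the number of neighboring cells.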
State-of-the-art image retrieval pipelines are based on "bag-of-words" matching. We note that the original order in which
features are extracted from the image is discarded in the "bag-of-words" matching pipeline. As a result, a set of features
extracted from a query image can be transmitted in any order. A set of <i>m</i> unique features has <i>m</i>! orderings, and if the order
of transmission can be discarded, one can reduce the query size by an additional log<sub>2</sub>(<i>m</i>!) bits. In this work, we compare
two schemes for discarding ordering: one based on Digital Search Trees, and another based on location histograms. We
apply the two schemes to a set of low bitrate Compressed Histogram of Gradient (CHoG) features, and compare their
performance. Both schemes achieve approximately a log<sub>2</sub>(<i>m</i>!) reduction in query size for a set of <i>m</i> features.
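The log<sub>2</sub>(<i>m</i>!) savings can be computed directly; the helper below is a minimal sketch that uses the log-gamma function to avoid evaluating <i>m</i>! exactly.

```python
import math


def order_savings_bits(m):
    """Bits saved by discarding the transmission order of m unique
    features: log2(m!) = lgamma(m + 1) / ln(2)."""
    return math.lgamma(m + 1) / math.log(2)
```

For a query with a few hundred features, the savings amount to several bits per feature, which is significant at the low bitrates targeted by CHoG.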
Orientation-invariant feature descriptors are widely used for image matching. We propose a new method of
computing and comparing Histogram of Gradients (HoG) descriptors which allows for re-orientation through
permutation. We do so by moving the orientation processing to the distance comparison, rather than the
descriptor computation. This improves upon prior work by increasing spatial distinctiveness. Our method
allows for very fast descriptor computation, which is advantageous since many mobile applications of
HoG descriptors must compute descriptors quickly on hand-held devices.
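A minimal sketch of the idea, reduced to a single orientation histogram: re-orientation becomes a cyclic permutation of bins, and the orientation-invariant distance is the minimum over all relative shifts. The L2 distance and the single-cell simplification are assumptions for illustration; full descriptors also involve spatial binning.

```python
def rotate_descriptor(hist, shift):
    """Re-orient an orientation histogram by cyclically permuting its
    bins, instead of re-rotating the image patch."""
    n = len(hist)
    return [hist[(i - shift) % n] for i in range(n)]


def rotation_invariant_distance(h1, h2):
    """Compare two histograms under all relative orientations.

    Taking the minimum over cyclic shifts moves the orientation
    handling from descriptor computation into the distance comparison.
    """
    n = len(h1)
    best = float("inf")
    for s in range(n):
        h2s = rotate_descriptor(h2, s)
        d = sum((a - b) ** 2 for a, b in zip(h1, h2s))
        best = min(best, d)
    return best
```

Because no dominant orientation is estimated at extraction time, descriptor computation stays cheap; the extra cost appears only in the (permutation-based) comparison.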
We review the construction of a Compressed Histogram of Gradients (CHoG) image feature descriptor and study
the quantization problem that arises in its design. We explain our choice of algorithms for solving it, addressing
both complexity and performance aspects. We also study the design of algorithms for decoding and matching of
compressed descriptors, and offer several techniques for speeding up these operations.
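One way to quantize a histogram-valued descriptor is to map each gradient histogram to the nearest "type" (an integer histogram with a fixed denominator <i>n</i>), i.e., a point on a simplex lattice. The sketch below is a simplified illustration of such quantization; the floor-then-redistribute rounding rule is an illustrative choice, not necessarily the one used in CHoG.

```python
import math


def quantize_to_type(p, n):
    """Map a probability vector p to an integer histogram k with
    sum(k) == n (a simplex lattice point, or 'type').

    Floors each scaled entry, then hands the remaining mass to the
    entries with the largest fractional parts.
    """
    scaled = [pi * n for pi in p]
    k = [math.floor(s) for s in scaled]
    rem = n - sum(k)
    order = sorted(range(len(p)), key=lambda i: scaled[i] - k[i], reverse=True)
    for i in order[:rem]:
        k[i] += 1
    return k
```

Because the set of types with denominator <i>n</i> is finite and enumerable, the quantized index can then be entropy coded, which is what makes this family of quantizers attractive for compressed descriptors.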
We investigate transform coding to efficiently store and transmit SIFT and SURF image descriptors. We show
that image and feature matching algorithms are robust to significantly compressed features. We achieve near-perfect
image matching and retrieval for both SIFT and SURF using ~2 bits per dimension, which provides 16×
compression relative to a conventional floating-point representation. We establish a
strong correlation between MSE and matching error for feature points and images. Feature compression enables
many applications that may not otherwise be possible, especially on mobile devices.
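The quantization step can be illustrated with a simple scalar quantizer at 2 bits per dimension. This sketch omits the transform (decorrelation) stage of transform coding and assumes descriptor values normalized to [0, 1].

```python
def quantize_descriptor(desc, bits=2, lo=0.0, hi=1.0):
    """Uniformly quantize each dimension to `bits` bits.

    At 2 bits/dimension this is 16x smaller than a 32-bit float
    per dimension.
    """
    levels = (1 << bits) - 1
    return [round((max(lo, min(hi, v)) - lo) / (hi - lo) * levels)
            for v in desc]


def dequantize(q, bits=2, lo=0.0, hi=1.0):
    """Reconstruct approximate descriptor values from quantized indices."""
    levels = (1 << bits) - 1
    return [lo + qi / levels * (hi - lo) for qi in q]
```

In an actual transform coder, a decorrelating transform (e.g., a KLT) would be applied before this per-dimension quantization so that the bit budget is spent where the variance is.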
Content-based image retrieval using a Scalable Vocabulary Tree (SVT) built from local scale-invariant features
is an effective method of fast search through a database. An SVT built from fronto-parallel database images,
however, is ineffective at classifying query images that suffer from perspective distortion. In this paper, we
propose an efficient server-side extension of the single-view SVT to a set of multiview SVTs that may be simultaneously
employed for image classification. Our solution results in significantly better retrieval performance when
perspective distortion is present. We also present an analysis of how perspective distortion increases the distance between
matching query and database feature descriptors.
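Server-side use of multiple per-view trees might be organized as below; the `score` method and the max-based score fusion are hypothetical assumptions for illustration, not the paper's exact classification rule.

```python
def multiview_retrieve(query_features, trees, top_k=5):
    """Query several per-view vocabulary trees and merge their scores.

    Assumes each tree exposes a hypothetical
    score(features) -> {image_id: similarity} method. Scores for the
    same database image across views are fused by taking the maximum,
    so the best-matching view dominates.
    """
    merged = {}
    for tree in trees:
        for image_id, s in tree.score(query_features).items():
            merged[image_id] = max(merged.get(image_id, 0.0), s)
    return sorted(merged, key=merged.get, reverse=True)[:top_k]
```

Since the trees are independent, the per-view queries can run in parallel on the server, so the added robustness to perspective need not increase query latency proportionally.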