We present an end-to-end feature description pipeline which uses a novel interest point detector and Rotation-
Invariant Fast Feature (RIFF) descriptors. The proposed RIFF algorithm is 15× faster than SURF [1] while
producing large-scale retrieval results that are comparable to those of SIFT [2]. Such high-speed features benefit a range of
applications from Mobile Augmented Reality (MAR) to web-scale image retrieval and analysis.
In mobile visual search applications, an image-based query is typically sent from a mobile client to a server. Because of bit-rate limitations, the query should be as small as possible. When performing image-based retrieval with local features, there are two types of information: the descriptors of the image features and the locations of the features within the image. Location information can be used to check the geometric consistency of a set of features and thus improve retrieval performance. Location histogram coding is an effective way to compress the location information. We present a location histogram coder that reduces the bitrate by 2.8× when compared to a fixed-rate scheme and by 12.5× when compared to a floating-point representation of the locations. A drawback of such coders is the large context table, which can be difficult to store and requires a large amount of training data. We propose a new sum-based context for coding the location histogram map. We show that it reduces the context table size by up to 200× while performing as well as or better than previously proposed location histogram coders.
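As a minimal illustration of the two ingredients, the Python sketch below builds a location histogram map and computes the sum-based context for one block. The block size, causal window radius, and raster coding order are illustrative assumptions; in a complete coder, the context value would select the probability model of a context-adaptive arithmetic coder (not shown).

    import numpy as np

    def location_histogram_map(points, image_size, block=8):
        """Quantize (x, y) feature locations to a coarse grid of blocks.

        The binary map marks which blocks are occupied; a short side
        stream (not shown) would carry the count of each occupied block.
        """
        h, w = image_size
        grid = np.zeros((h // block, w // block), dtype=np.uint8)
        for x, y in points:
            grid[min(int(y) // block, grid.shape[0] - 1),
                 min(int(x) // block, grid.shape[1] - 1)] = 1
        return grid

    def sum_based_context(grid, i, j, radius=2):
        """Context for coding grid[i, j]: the SUM of occupied blocks among
        the causal (already coded, in raster order) neighbors. A
        pattern-based context over the same n neighbors needs a
        2**n-entry probability table; the sum-based one needs only n + 1
        entries, which is the source of the large reduction in context size."""
        total = 0
        for di in range(-radius, 1):
            for dj in range(-radius, radius + 1):
                if di == 0 and dj >= 0:
                    break                    # only already-coded positions
                ii, jj = i + di, j + dj
                if 0 <= ii < grid.shape[0] and 0 <= jj < grid.shape[1]:
                    total += int(grid[ii, jj])
        return total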
State-of-the-art image retrieval pipelines are based on "bag-of-words" matching. We note that the original order in which
features are extracted from the image is discarded in the "bag-of-words" matching pipeline. As a result, a set of features
extracted from a query image can be transmitted in any order. A set of m unique features has m! orderings, so if the order
of transmission can be discarded, the query size can be reduced by an additional log₂(m!) bits. In this work, we compare
two schemes for discarding ordering: one based on Digital Search Trees and another based on location histograms. We
apply both schemes to a set of low-bitrate Compressed Histogram of Gradients (CHoG) features and compare their
performance. Both schemes achieve approximately a log₂(m!) reduction in query size for a set of m features.
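To make the size of this saving concrete, the short Python sketch below evaluates log₂(m!) directly; the query size of m = 500 features is an arbitrary example, not a figure from the experiments.

    import math

    def ordering_savings_bits(m: int) -> float:
        """Bits saved by discarding the transmission order of m unique
        features: log2(m!) = sum of log2(k) for k = 2..m."""
        return sum(math.log2(k) for k in range(2, m + 1))

    # Example: a query of m = 500 features (an arbitrary illustration).
    m = 500
    bits = ordering_savings_bits(m)
    print(f"log2({m}!) ~ {bits:.0f} bits, or {bits / m:.2f} bits/feature")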
We review the construction of a Compressed Histogram of Gradients (CHoG) image feature descriptor and study
the quantization problem that arises in its design. We explain our choice of algorithms for solving it, addressing
both complexity and performance aspects. We also study the design of algorithms for decoding and matching
compressed descriptors, and offer several techniques for speeding up these operations.
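One concrete instance of such a design, sketched below in Python, quantizes each cell's gradient histogram to the nearest point of a lattice of "types" (distributions with a fixed common denominator) and compares cells with a symmetric Kullback-Leibler divergence. The bin and denominator counts are illustrative assumptions, not the descriptor's actual configuration; because quantized cells take only a small number of distinct values, all pairwise cell distances can be precomputed into a lookup table, which is one way decoding and matching can be accelerated.

    import numpy as np
    from itertools import combinations

    def type_lattice(k, n):
        """All distributions over k bins with denominator n ('types'),
        enumerated by stars-and-bars: C(n + k - 1, k - 1) points."""
        pts = []
        for bars in combinations(range(n + k - 1), k - 1):
            prev, counts = -1, []
            for b in bars:
                counts.append(b - prev - 1)
                prev = b
            counts.append(n + k - 2 - prev)
            pts.append(counts)
        return np.asarray(pts, dtype=np.float64) / n

    def quantize(hist, lattice):
        """Index of the lattice point nearest a normalized histogram."""
        return int(np.argmin(np.square(lattice - hist).sum(axis=1)))

    def sym_kl(p, q, eps=1e-9):
        """Symmetric KL divergence between two cell histograms."""
        p, q = p + eps, q + eps
        return float(np.sum((p - q) * (np.log(p) - np.log(q))))

    # Matching speed-up: quantized cells take few distinct values, so all
    # pairwise cell-to-cell distances fit in a small precomputed table.
    lattice = type_lattice(k=3, n=4)                   # 15 types
    dist_table = np.array([[sym_kl(p, q) for q in lattice] for p in lattice])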
Orientation-invariant feature descriptors are widely used for image matching. We propose a new method of
computing and comparing Histogram of Gradients (HoG) descriptors which allows for re-orientation through
permutation. We do so by moving the orientation processing to the distance comparison, rather than the
descriptor computation. This improves upon prior work by increasing spatial distinctiveness. Our method
allows for very fast descriptor computation, which is advantageous since many mobile applications of HoG
descriptors must compute descriptors quickly on hand-held devices.
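The Python sketch below illustrates match-time re-orientation. For brevity it assumes a single annular ring of spatial cells whose count equals the number of orientation bins, so one rotation step cyclically shifts the cells around the ring and the gradient bins within each cell together; the actual cell layout and distance measure are design choices.

    import numpy as np

    def rotation_aware_distance(d1, d2, cells, bins):
        """Compare two HoG descriptors under unknown relative rotation by
        permuting histogram bins at match time, instead of re-orienting
        the patch when the descriptor is computed."""
        a = d1.reshape(cells, bins)
        b = d2.reshape(cells, bins)
        best = np.inf
        for r in range(bins):   # try every relative rotation
            # shift cells around the ring and bins within each cell
            rb = np.roll(np.roll(b, r, axis=0), r, axis=1)
            best = min(best, float(np.square(a - rb).sum()))
        return best

Because no dominant orientation is estimated at extraction time, the per-descriptor cost stays low; the cost of the permutation search is paid only during distance comparison.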
Maintaining an accurate and up-to-date inventory of one's assets is a labor-intensive, tedious, and costly operation.
To ease this difficult but important task, we design and implement a mobile asset tracking system for
automatically generating an inventory by snapping photos of the assets with a smartphone. Since smartphones
are becoming ubiquitous, construction and deployment of our inventory management solution is simple and cost-effective.
Automatic asset recognition is achieved by first segmenting individual assets out of the query photo
and then performing bag-of-visual-features (BoVF) image matching on the segmented regions. The smartphone's
sensor readings, such as digital compass and accelerometer measurements, can be used to determine the location
of each asset, and this location information is stored in the inventory for each recognized asset.
As a special case study, we demonstrate a mobile book tracking system, where users snap photos of books
stacked on bookshelves to generate a location-aware book inventory. We show that segmenting the book spines
is essential for accurate feature-based image matching against a database of book spines. Segmentation
also provides the exact orientation of each book spine, so more discriminative upright local features can be
employed for improved recognition. This system's mobile client has been implemented for smartphones running
the Symbian or Android operating systems. The client enables a user to snap a picture of a bookshelf and to
subsequently view the recognized spines in the smartphone's viewfinder. Two different pose estimates, one from
BoVF geometric matching and the other from segmentation boundaries, are both utilized to accurately draw the
boundary of each spine in the viewfinder for easy visualization. The BoVF representation also allows matching
each photo of a bookshelf rack against a photo of the entire bookshelf, and the resulting feature matches are
used in conjunction with the smartphone's orientation sensors to determine the exact location of each book.
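As an illustration of how the segmentation-derived orientation enables upright features, the OpenCV-based Python sketch below rotates a segmented spine region to the vertical before feature extraction. Representing the segmentation boundary as a rotated rectangle of the kind cv2.minAreaRect returns is an assumption of this sketch, not necessarily the system's actual segmentation output.

    import cv2

    def upright_spine_patch(image, rect):
        """Rotate a segmented book spine upright before extracting
        features, so orientation-free ('upright') local features can be
        used. `rect` is a rotated rectangle ((cx, cy), (w, h), angle),
        e.g. from cv2.minAreaRect on the spine's segmentation boundary."""
        (cx, cy), (w, h), angle = rect
        M = cv2.getRotationMatrix2D((cx, cy), angle, 1.0)  # undo the tilt
        rotated = cv2.warpAffine(image, M,
                                 (image.shape[1], image.shape[0]))
        # crop the now axis-aligned spine region around its center
        return cv2.getRectSubPix(rotated, (int(w), int(h)), (cx, cy))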
Content-based image retrieval using a Scalable Vocabulary Tree (SVT) built from local scale-invariant features
is an effective method of fast search through a database. An SVT built from fronto-parallel database images,
however, is ineffective at classifying query images that suffer from perspective distortion. In this paper, we
propose an efficient server-side extension of the single-view SVT to a set of multiview SVTs that may be simultaneously
employed for image classification. Our solution results in significantly better retrieval performance when
perspective distortion is present. We also analyze how perspective distortion increases the distance between
matching query and database feature descriptors.
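In spirit, the server-side extension can be realized by scoring the query against every view's tree and fusing the per-image scores, as in the Python sketch below. The SVT object with a score() method returning one TF-IDF similarity per database image is a hypothetical interface, and max-fusion is one plausible rule rather than necessarily the exact one used here.

    import numpy as np

    def query_multiview_svts(trees, query_descriptors):
        """Score a query against a bank of SVTs, one per synthetic
        viewpoint, and fuse by keeping each database image's best score
        across views, so at least one tree sees the query's perspective."""
        scores = np.stack([t.score(query_descriptors) for t in trees])
        return scores.max(axis=0)

    # ranked = np.argsort(-query_multiview_svts(trees, q))  # best first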
We investigate transform coding to efficiently store and transmit SIFT and SURF image descriptors. We show
that image and feature matching algorithms are robust to significantly compressed features. We achieve near-perfect
image matching and retrieval for both SIFT and SURF using ~2 bits/dimension. When applied to SIFT
and SURF, this provides 16× compression relative to a conventional floating-point representation. We establish a
strong correlation between MSE and matching error for feature points and images. Feature compression enables
many applications that may not otherwise be possible, especially on mobile devices.
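A minimal Python sketch of such a transform coder follows, under common transform-coding assumptions (a PCA/KLT transform to decorrelate descriptor dimensions, followed by a per-dimension uniform scalar quantizer); the actual transform and quantizer in the reported experiments may differ.

    import numpy as np

    def fit_coder(train, bits=2):
        """Fit a simple transform coder on training descriptors: a PCA
        (KLT) transform decorrelates the dimensions, then each
        coefficient gets a uniform quantizer with 2**bits levels."""
        mean = train.mean(axis=0)
        _, _, Vt = np.linalg.svd(train - mean, full_matrices=False)
        coeffs = (train - mean) @ Vt.T
        lo, hi = coeffs.min(axis=0), coeffs.max(axis=0)
        return mean, Vt, lo, np.maximum(hi - lo, 1e-12), 2 ** bits

    def encode(desc, coder):
        """Quantize one descriptor to 2 bits/dimension (before packing)."""
        mean, Vt, lo, scale, levels = coder
        c = (desc - mean) @ Vt.T
        return np.clip(np.round((c - lo) / scale * (levels - 1)),
                       0, levels - 1).astype(np.uint8)

    def decode(q, coder):
        """Reconstruct an approximate descriptor for matching."""
        mean, Vt, lo, scale, levels = coder
        return (q / (levels - 1) * scale + lo) @ Vt + mean

    # Usage: coder = fit_coder(training_descriptors)
    #        recon = decode(encode(descriptor, coder), coder)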