A system for vehicle recognition in video based on SIFT features, mixture models, and support vector machines

Abhikesh Nag; David J. Miller; Andrew P. Brown; Kevin J. Sullivan

doi:10.1117/12.723746

Published: 30 April 2007

Proceedings Volume 6560, Intelligent Computing: Theory and Applications V; 65600G (2007) https://doi.org/10.1117/12.723746
Event: Defense and Security Symposium, 2007, Orlando, Florida, United States

Abstract

We present a system for scale- and affine-invariant recognition of vehicles in video sequences. We model each object using local descriptors (SIFT keypoints) extracted from image frames. These features are reported in the literature to be highly distinctive and invariant to rotation, scale, and affine transformations. However, because the set of SIFT keypoints extracted from an object varies from instance to instance, the keypoints form a dynamic feature space. This poses a challenge for classification techniques, which generally require the same set of features for every instance of an object to be classified. To resolve this difficulty, we associate the extracted keypoints with the components (representative keypoints) of a mixture model for each target class. While the extracted keypoints are variable, the mixture components are fixed. The mixture models the keypoint features as well as the location and scale at which each keypoint was detected in the frame. Keypoint-to-component association is achieved via a switching optimization procedure that locally maximizes the joint likelihood of the keypoints and their locations and scales, where the likelihood of the locations and scales is based on an affine transformation. To each mixture component of a class, we link a first-layer support vector machine (SVM) classifier, which votes for or against the hypothesis that the keypoint associated with the component belongs to the model's target class. A second-layer SVM pools the votes from the ensemble of first-layer SVM classifiers and gives the final class decision. We show promising experimental results on video sequences from the VIVID database.
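
The data flow described in the abstract (a variable set of keypoints mapped onto a fixed set of mixture components, one vote per component, then a pooled decision) can be illustrated with a small sketch. This is not the authors' implementation, and it makes several loudly simplifying assumptions: descriptors are 2-D toy vectors rather than 128-D SIFT descriptors; each component is an isotropic Gaussian over the descriptor only, so the location and scale terms and the affine transformation are omitted; association is a single hard assignment rather than the switching optimization; and fixed linear decision functions stand in for trained first- and second-layer SVMs. All function names, weights, and data values are hypothetical.

```python
import math


def log_gauss(x, mean, var):
    """Log density of an isotropic Gaussian with variance `var` (assumed form)."""
    d = len(x)
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, mean))
    return -0.5 * (d * math.log(2 * math.pi * var) + sq_dist / var)


def associate(keypoints, components, var=1.0):
    """Hard-assign each keypoint to its most likely component.

    Returns {component index: best-scoring keypoint}. Components that attract
    no keypoint are absent from the result. This is a simplification of the
    switching optimization described in the abstract.
    """
    best = {}
    for kp in keypoints:
        scores = [log_gauss(kp, mean, var) for mean in components]
        j = max(range(len(components)), key=scores.__getitem__)
        if j not in best or scores[j] > best[j][0]:
            best[j] = (scores[j], kp)
    return {j: kp for j, (_, kp) in best.items()}


def first_layer_votes(assigned, classifiers, n_components):
    """One vote per component: +1 or -1 from its linear stand-in classifier,
    0 when no keypoint was associated with that component."""
    votes = []
    for j in range(n_components):
        if j not in assigned:
            votes.append(0)
            continue
        w, b = classifiers[j]
        score = sum(wi * xi for wi, xi in zip(w, assigned[j])) + b
        votes.append(1 if score >= 0 else -1)
    return votes


def second_layer_decision(votes, weights, bias):
    """Pool the fixed-length vote vector with a linear stand-in for the second-layer SVM."""
    return sum(w * v for w, v in zip(weights, votes)) + bias >= 0


# Hypothetical toy data: three fixed components, four extracted keypoints.
components = [(0.0, 0.0), (5.0, 5.0), (10.0, 0.0)]
keypoints = [(0.2, -0.1), (4.8, 5.3), (5.5, 4.9), (0.1, 0.0)]

# Hypothetical (weights, bias) pairs standing in for trained first-layer SVMs.
classifiers = [((1.0, 0.0), -0.05), ((0.0, 1.0), -5.0), ((1.0, 0.0), -9.0)]

assigned = associate(keypoints, components)
votes = first_layer_votes(assigned, classifiers, len(components))
is_target = second_layer_decision(votes, weights=(1.0, 1.0, 1.0), bias=-1.0)
print(assigned, votes, is_target)
```

The point of the sketch is that the vote vector always has one entry per mixture component, regardless of how many keypoints a frame yields, which is what lets a conventional classifier operate on top of a variable keypoint set.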

Citation

Abhikesh Nag, David J. Miller, Andrew P. Brown, and Kevin J. Sullivan "A system for vehicle recognition in video based on SIFT features, mixture models, and support vector machines", Proc. SPIE 6560, Intelligent Computing: Theory and Applications V, 65600G (30 April 2007); https://doi.org/10.1117/12.723746
