Discovering objects and their location in videos using spatial-temporal context words
Hao Sun, Cheng Wang, Boliang Wang, Naser El-Sheimy
Abstract
We present a novel unsupervised learning algorithm for discovering objects and their location in videos from moving cameras. The videos can switch between different shots, and contain cluttered background, occlusion, camera motion, and multiple independently moving objects. We exploit both appearance consistency and spatial configuration consistency of local patches across frames for object recognition and localization. The contributions of this paper are twofold. First, we propose a combined approach for simultaneous spatial context and temporal context generation. Local video patches are extracted and described using the generated spatial-temporal context words. Second, a dynamic topic model, based on the representation of a bag of spatial-temporal context words, is introduced to learn object category models in video sequences. The proposed model can categorize and localize multiple objects in a single video. Objects leaving or entering the scene at multiple times can also be handled efficiently in the dynamic framework. Experimental results on the CamVid data set and the VISAT™ data set demonstrate the effectiveness and robustness of the proposed method.
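To make the pipeline in the abstract concrete, the following is a minimal, illustrative sketch, not the authors' implementation: local patch descriptors are quantized into appearance words, each word is augmented with a histogram of the words in its spatial-temporal neighborhood to form a combined spatial-temporal context word, and per-frame bags of these words are fed to a topic model. The use of k-means for both codebooks and scikit-learn's plain LDA as a stand-in for the paper's dynamic topic model are assumptions, as are all variable names, radii, and vocabulary sizes.

```python
# Illustrative sketch only (NOT the paper's implementation).
# Assumptions: k-means codebooks, plain LDA instead of a dynamic
# topic model, synthetic descriptors in place of real video patches.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import LatentDirichletAllocation

rng = np.random.default_rng(0)

# Synthetic stand-in for local patch data: frame index, (x, y)
# location, and a 64-d appearance descriptor per patch.
n_frames, patches_per_frame, dim = 20, 50, 64
frames = np.repeat(np.arange(n_frames), patches_per_frame)
coords = rng.uniform(0, 1, size=(n_frames * patches_per_frame, 2))
descr = rng.normal(size=(n_frames * patches_per_frame, dim))

# Step 1: appearance codebook via vector quantization.
n_words = 30
words = KMeans(n_clusters=n_words, n_init=5, random_state=0).fit_predict(descr)

# Step 2: for each patch, histogram the appearance words of its
# neighbors within a spatial radius and a temporal window. This
# captures the spatial configuration consistency the abstract exploits.
def context_histogram(i, radius=0.2, t_window=1):
    near_t = np.abs(frames - frames[i]) <= t_window
    near_xy = np.linalg.norm(coords - coords[i], axis=1) <= radius
    neighbors = near_t & near_xy
    neighbors[i] = False  # exclude the patch itself
    return np.bincount(words[neighbors], minlength=n_words)

context = np.stack([context_histogram(i) for i in range(len(words))])

# Quantize the context histograms into a second codebook, then pair
# (appearance word, context word) into one spatial-temporal word.
n_ctx = 10
ctx_words = KMeans(n_clusters=n_ctx, n_init=5, random_state=0).fit_predict(context)
st_words = words * n_ctx + ctx_words  # combined vocabulary index

# Step 3: bag of spatial-temporal context words per frame, then a
# topic model whose topics play the role of object categories.
vocab_size = n_words * n_ctx
bow = np.zeros((n_frames, vocab_size), dtype=int)
for f, w in zip(frames, st_words):
    bow[f, w] += 1

lda = LatentDirichletAllocation(n_components=3, random_state=0).fit(bow)
print("Per-frame topic mixtures:\n", lda.transform(bow).round(2))
```

A static LDA, as used here for brevity, assumes topic proportions are drawn independently per frame; the paper's dynamic topic model instead links frames over time, which is what lets it handle objects leaving and re-entering the scene.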
©(2010) Society of Photo-Optical Instrumentation Engineers (SPIE)
Hao Sun, Cheng Wang, Boliang Wang, and Naser El-Sheimy "Discovering objects and their location in videos using spatial-temporal context words," Optical Engineering 49(9), 097003 (1 September 2010). https://doi.org/10.1117/1.3488041
Published: 1 September 2010
KEYWORDS
Video, Video surveillance, Cameras, Optical engineering, Sensors, Sun, Digital video discs