Discovering objects and their location in videos using spatial-temporal context words
We present a novel unsupervised learning algorithm for discovering objects and their location in videos captured by moving cameras. The videos may switch between shots and contain cluttered backgrounds, occlusion, camera motion, and multiple independently moving objects. We exploit both the appearance consistency and the spatial configuration consistency of local patches across frames for object recognition and localization. The contributions of this paper are twofold. First, we propose a combined approach that generates spatial context and temporal context simultaneously; local video patches are extracted and described using the resulting spatial-temporal context words. Second, we introduce a dynamic topic model, based on a bag of spatial-temporal context words representation, to learn object category models in video sequences. The proposed model can categorize and localize multiple objects in a single video, and the dynamic framework efficiently handles objects that leave or enter the scene multiple times. Experimental results on the CamVid data set and the VISAT™ data set demonstrate the effectiveness and robustness of the proposed method.
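As a rough illustration of the bag-of-words representation mentioned in the abstract, the following minimal sketch quantizes patch descriptors against a codebook and counts word occurrences for one frame. It is not the authors' method: the abstract does not specify how spatial-temporal context words are built, so plain nearest-codeword quantization stands in for that step, and the names `nearest_word` and `bag_of_words`, the codebook, and the descriptors are all hypothetical toy values.

```python
from collections import Counter


def nearest_word(descriptor, codebook):
    """Return the index of the codebook entry closest to the descriptor.

    Uses squared Euclidean distance. This is an assumed stand-in for the
    paper's spatial-temporal context word assignment, which the abstract
    does not describe in detail.
    """
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(range(len(codebook)), key=lambda i: sq_dist(descriptor, codebook[i]))


def bag_of_words(descriptors, codebook):
    """Build a histogram of codeword counts for one frame's patch descriptors."""
    counts = Counter(nearest_word(d, codebook) for d in descriptors)
    return [counts.get(i, 0) for i in range(len(codebook))]


# Hypothetical toy data: a 3-word codebook and 4 patch descriptors from one frame.
codebook = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
frame_descriptors = [(0.1, -0.2), (0.9, 1.2), (4.8, 5.1), (1.1, 0.8)]

histogram = bag_of_words(frame_descriptors, codebook)
print(histogram)
```

A topic model would then take one such histogram per frame (or per region) as input. The dynamic aspect described in the abstract, in which topics evolve across frames, is not covered by this sketch.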
©(2010) Society of Photo-Optical Instrumentation Engineers (SPIE)
Hao Sun, Cheng Wang, Boliang Wang, and Naser El-Sheimy "Discovering objects and their location in videos using spatial-temporal context words," Optical Engineering 49(9), 097003 (1 September 2010).
Published: 1 September 2010
