One of the important tasks in video surveillance is to detect and track targets moving independently in a scene.
Most real-time research to date has focused on stationary-camera scenarios with limited background movement,
such as videos taken at traffic lights or from buildings where no background objects are close to the camera.
A more robust method is needed when moving background objects such as trees or flags are close to the camera,
or when the camera itself is moving. In this paper we first introduce a variant of the
multimodal mean (MM) background model that we call the spatial multimodal mean (SMM) background model
that is better suited to these scenarios while improving on the speed of the mixture of Gaussians (MoG) background
model. It approximates the multimodal MoG background model, with the generalization that each pixel has a random
spatial distribution. The SMM background model is well suited to real-time nonstationary scenes because it models
each pixel with a spatial distribution, and its simplifications make it computationally feasible to apply image
transformations. We then describe how this model can be integrated into a real-time moving target indication (MTI)
system that does not require the estimation of depth.
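
As a rough illustration of the idea, the following minimal sketch keeps a few running-mean modes per pixel and, when classifying a pixel, also searches the modes of its neighbours. The neighbourhood search is only an assumed stand-in for the per-pixel spatial distribution described above; the class name, the parameters (`max_modes`, `match_thresh`, `min_count`, `radius`) and the update rule are hypothetical and are not taken from the paper.

```python
class SpatialMultimodalMean:
    """Toy multimodal-mean background model with a neighbourhood search.

    Each pixel keeps up to `max_modes` cells of [running_sum, count].
    Searching neighbouring pixels (radius > 0) is an assumption of this
    sketch, not the authors' formulation.
    """

    def __init__(self, height, width, max_modes=3, match_thresh=10.0,
                 min_count=3, radius=1):
        self.height, self.width = height, width
        self.max_modes = max_modes
        self.match_thresh = match_thresh
        self.min_count = min_count
        self.radius = radius
        self.modes = [[[] for _ in range(width)] for _ in range(height)]

    def _is_background(self, y, x, value):
        # Background if any well-supported mode within the neighbourhood
        # has a mean close to the observed value.
        r = self.radius
        for ny in range(max(0, y - r), min(self.height, y + r + 1)):
            for nx in range(max(0, x - r), min(self.width, x + r + 1)):
                for total, count in self.modes[ny][nx]:
                    if (count >= self.min_count
                            and abs(value - total / count) <= self.match_thresh):
                        return True
        return False

    def _update(self, y, x, value):
        # Update the first matching mode of this pixel, or start a new one,
        # evicting the least-observed mode when the pixel is full.
        cells = self.modes[y][x]
        for cell in cells:
            if abs(value - cell[0] / cell[1]) <= self.match_thresh:
                cell[0] += value
                cell[1] += 1
                return
        if len(cells) >= self.max_modes:
            cells.remove(min(cells, key=lambda c: c[1]))
        cells.append([float(value), 1])

    def apply(self, frame):
        """Classify a grayscale frame (list of rows), then update the model.

        Returns a mask with 1 for foreground and 0 for background.
        """
        mask = [[0 if self._is_background(y, x, frame[y][x]) else 1
                 for x in range(self.width)]
                for y in range(self.height)]
        for y in range(self.height):
            for x in range(self.width):
                self._update(y, x, frame[y][x])
        return mask
```

With `radius=1`, an intensity edge that drifts by one pixel between frames is still matched by a neighbouring pixel's mode, whereas `radius=0` reduces the sketch to a purely per-pixel multimodal mean that flags the drifted edge as foreground.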
A typical structure from motion (SFM) technique constructs 3-D structures from the observed motions of
salient features tracked over time. Although sparse feature-based SFM gives robotic platforms an additional
tool for augmenting navigation performance, the technique often fails to produce dense 3-D structures because of
the sparseness introduced during the feature selection and matching processes. For midrange sensing and tactical
planning, it is important to have a dense map that provides not only the 3-D coordinates of features but also
clustered terrain information around the features for better thematic representation of the scene. To overcome the
shortfalls of sparse feature-based SFM, we propose an approach that uses Voronoi decomposition with an
equidistance-based triangulation applied to each of the segmented and classified regions. The set of the circumcenters
of the circum-hyperspheres used in the triangulation is formed with the feature points extracted from the SFM
processing. We also apply flat surface detection to find a traversable surface on which a robotic vehicle can maneuver.
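
The link between circumcenters, equidistance and triangulation can be illustrated in 2-D with the standard Delaunay/Voronoi duality: a triangle belongs to the Delaunay triangulation when its circumcircle contains no other point, and its circumcenter, which is equidistant from the three vertices, is a Voronoi vertex. The sketch below is a brute-force illustration of that textbook construction under these assumptions, not the approach proposed here; the function names and the sample points are hypothetical.

```python
from itertools import combinations


def circumcircle(a, b, c):
    """Return ((ux, uy), squared_radius) of the circle through a, b, c.

    Returns None when the three points are (numerically) collinear.
    """
    (ax, ay), (bx, by), (cx, cy) = a, b, c
    d = 2.0 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    if abs(d) < 1e-12:
        return None
    a2, b2, c2 = ax * ax + ay * ay, bx * bx + by * by, cx * cx + cy * cy
    ux = (a2 * (by - cy) + b2 * (cy - ay) + c2 * (ay - by)) / d
    uy = (a2 * (cx - bx) + b2 * (ax - cx) + c2 * (bx - ax)) / d
    return (ux, uy), (ux - ax) ** 2 + (uy - ay) ** 2


def delaunay_brute_force(points, eps=1e-9):
    """Return [((i, j, k), circumcenter), ...] for all Delaunay triangles.

    Tests every triple against every other point, so it is only meant
    for small illustrative point sets.
    """
    triangles = []
    for i, j, k in combinations(range(len(points)), 3):
        circle = circumcircle(points[i], points[j], points[k])
        if circle is None:
            continue
        (ux, uy), r2 = circle
        # Empty-circumcircle test: no other point strictly inside.
        empty = all((px - ux) ** 2 + (py - uy) ** 2 >= r2 - eps
                    for n, (px, py) in enumerate(points)
                    if n not in (i, j, k))
        if empty:
            triangles.append(((i, j, k), (ux, uy)))
    return triangles


# Hypothetical feature points: the corners of a square plus its centre.
features = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0), (1.0, 1.0)]
triangles = delaunay_brute_force(features)
```

For real feature sets an incremental or divide-and-conquer Delaunay algorithm would be the practical choice; the brute-force test is used here only because it makes the equidistance property explicit.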
Maintaining real-time situational awareness of military combat vehicles (manned/unmanned) with onboard vision
sensors for either autonomous mobility or reconnaissance missions such as moving target indication (MTI) and
automatic target recognition (ATR) while the vehicle is on the move has been technically and operationally challenging.
In this paper, we investigate and present a practical implementation of a robust real-time structure from motion
technique that allows moving robotic vehicles to reconstruct 3D models from observed 2D features with
dynamically adjusted motion parameters. We also demonstrate applications that locate and track moving targets within
the structured environment built by the SFM and recognize targets such as vehicles and humans through a
hierarchical shape model.
Many autonomous vehicle navigation systems have adopted area-based stereo image processing techniques that use correlation measures to construct disparity maps as a basic obstacle detection and avoidance mechanism. Although intra-scale area-based techniques perform well in pyramid processing frameworks, significant gains in performance and reliability may be achievable using wavelet-based inter-scale correlation measures. This paper presents a novel framework, which can be deployed on unmanned ground vehicles, to recover 3D depth information (disparity maps) from binocular stereo images. We propose a wavelet-based coarse-to-fine incremental scheme that builds refined disparity maps from coarse ones, and we demonstrate that usable disparity maps can be generated from sparse (compressed) wavelet coefficients. Our approach is motivated by a biological mechanism of the human visual system, in which multiresolution is a known feature of perceptual visual processing. Among traditional multiresolution approaches, wavelet analysis provides a mathematically coherent and precise definition of the concept of multiresolution. The variation of resolution enables the transform to identify image signatures of objects in scale space. We use these signatures, embedded in the wavelet transform domain, to construct more detailed disparity maps at finer levels. Inter-scale correlation measures within the framework are used to identify the signature at the next finer level, since wavelet coefficients contain well-characterized evolutionary information.
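
The coarse-to-fine principle can be sketched on a single scanline. The example below is a simplified illustration under stated assumptions, not the framework presented here: it uses only the Haar low-pass (pairwise mean) branch to build the pyramid, estimates one global shift rather than a per-pixel disparity map, and replaces the inter-scale correlation measure with a plain mean absolute difference. All function names and parameters are hypothetical.

```python
def haar_approx(signal):
    """One level of the Haar low-pass branch (pairwise mean); halves the length."""
    return [(signal[2 * i] + signal[2 * i + 1]) / 2.0
            for i in range(len(signal) // 2)]


def shift_cost(left, right, d):
    """Mean absolute difference between right[i] and left[i + d] over the overlap."""
    n = len(left) - d
    if n <= 0:
        return float("inf")
    return sum(abs(left[i + d] - right[i]) for i in range(n)) / n


def coarse_to_fine_shift(left, right, levels, max_coarse_shift):
    """Estimate the shift d such that right[i] is close to left[i + d].

    The shift is searched exhaustively at the coarsest level, then doubled
    and refined within +/- 1 sample at each finer level.
    """
    pyramid = [(left, right)]
    for _ in range(levels):
        left_k, right_k = pyramid[-1]
        pyramid.append((haar_approx(left_k), haar_approx(right_k)))

    left_k, right_k = pyramid[-1]
    d = min(range(max_coarse_shift + 1),
            key=lambda s: shift_cost(left_k, right_k, s))

    for left_k, right_k in reversed(pyramid[:-1]):
        candidates = [s for s in (2 * d - 1, 2 * d, 2 * d + 1) if s >= 0]
        d = min(candidates, key=lambda s: shift_cost(left_k, right_k, s))
    return d
```

Each refinement step examines only three candidates, so the search cost grows with the number of levels rather than with the full disparity range, which is the main appeal of a coarse-to-fine scheme.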