Video is increasingly used in various advanced applications. Many of these applications require common video representations oriented towards how people describe video content. In this paper we first discuss the background of high-level video representations. We then introduce a computational framework for high-level video representation that evolves towards how people describe video content. Our framework represents a video shot in terms of its moving objects and their related semantic features, such as events and other high-level motion features. To achieve higher applicability, content should be extracted independently of the type and the context of the input video. Our representation system, tested on 6371 images with multi-object occlusion and artifacts, produces stable results in real time. This is due to the adaptation to noise, the compensation of estimation errors at the various processing levels, and the division of the processing system into simple but effective tasks.
We propose a novel Mixture of Gaussians (MOG)-based real-time background update technique. The proposed technique consists of a new selective matching scheme based on the combined approaches of component ordering and winner-takes-all. This matching scheme not only selects the most probable component for the first matching test with new pixel data, greatly improving performance, but also simplifies pixel classification and component replacement in the case of no match. A further performance improvement is achieved by a new, simple, and functional component variance adaptation formula. In addition, the proposed hysteresis-based component matching and temporal motion history schemes improve segmentation quality. Hysteresis-based component matching improves detected foreground object blobs by reducing cracks and added shadows, while motion history preserves the integrity of moving object boundaries, both with minimal computational overhead. The proposed background update technique implicitly handles both gradual illumination change and temporal clutter. The problem of shadows and ghosts is partially addressed by the proposed hysteresis-based matching scheme. The problems of persistent sudden illumination changes and camera movement are addressed at frame level, depending on the percentage of pixels classified as foreground. We implemented three different state-of-the-art background update techniques and compared their segmentation quality and computational performance with those of the proposed technique. Experimental results on reference outdoor sequences and real traffic surveillance streams show that the proposed technique improves segmentation accuracy for extracting moving objects of interest compared to the reference techniques.
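As an illustration, a per-pixel sketch of an ordered, winner-takes-all MOG matching scheme of this general kind might look as follows. This is a minimal sketch under assumptions: the component count, the 2.5-sigma match test, the learning rate, and the replacement weight are common MOG conventions, not the paper's actual values, and background classification is simplified to "matched the top-ranked component" rather than the full cumulative-weight test.

```python
import numpy as np

class MogPixel:
    """Per-pixel mixture of K Gaussians, ordered by the fitness ratio w/sigma."""

    def __init__(self, k=3, init_var=225.0, alpha=0.01):
        self.k, self.alpha = k, alpha
        self.means = np.zeros(k)
        self.vars = np.full(k, init_var)
        self.weights = np.full(k, 1.0 / k)

    def _order(self):
        # Most probable background components first (component ordering).
        return np.argsort(-self.weights / np.sqrt(self.vars))

    def update(self, x):
        """Update the mixture with sample x; return True if x is background."""
        order = self._order()
        for rank, i in enumerate(order):
            if abs(x - self.means[i]) <= 2.5 * np.sqrt(self.vars[i]):
                # Winner-takes-all: only the first matching component is updated.
                d = x - self.means[i]
                self.means[i] += self.alpha * d
                self.vars[i] += self.alpha * (d * d - self.vars[i])
                self.weights = (1 - self.alpha) * self.weights
                self.weights[i] += self.alpha
                self.weights /= self.weights.sum()
                # Simplification: background means matching the top-ranked component.
                return rank == 0
        # No match: replace the least probable component with a new Gaussian at x.
        worst = order[-1]
        self.means[worst], self.vars[worst] = x, 225.0
        self.weights[worst] = 0.05
        self.weights /= self.weights.sum()
        return False
```

Because components are tested in fitness order, a stable background pixel usually matches on the first test, which is where the performance gain of a selective scheme comes from.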
This paper proposes a novel algorithm for the real-time detection and correction of occlusion and split in feature-based tracking of objects for surveillance applications. The proposed algorithm detects sudden variations in the spatio-temporal features of objects in order to identify possible occlusion or split events. The detection is followed by a validation stage that uses past tracking information to prevent false detection of occlusion or split. Special care is taken in the case of heavy occlusion, when there is a large superposition of objects. In this case the system relies on the long-term temporal behavior of objects to avoid updating the video object features with unreliable information (e.g. shape and motion). Occlusion is corrected by separating the occluded objects. For the detection of splits, in addition to the analysis of spatio-temporal changes in object features, our algorithm analyzes the temporal behavior of split objects to discriminate between segmentation errors and the real separation of objects, such as the deposit of an object. Split is corrected by physically merging the objects detected to be split. To validate the proposed approach, objective and visual results are presented. Experimental results show the ability of the proposed algorithm to detect and correct both split and occlusion of objects. The proposed algorithm is well suited to video surveillance applications due to its good performance under multiple, heavy, and total occlusion; its distinction between real object separation and faulty object split; its handling of simultaneous occlusion and split events; and its low computational complexity.
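The sudden-variation test at the heart of the detection stage can be illustrated with a minimal, hypothetical sketch. The single area feature and the 40% relative-change threshold below are assumptions for illustration only; the actual algorithm uses several spatio-temporal features and follows detection with a validation stage.

```python
def detect_event(prev_area, curr_area, threshold=0.4):
    """Flag a candidate occlusion (merge) or split from a sudden area change.

    A large relative growth suggests that two objects merged (occlusion);
    a large relative shrink suggests that an object split. Either result
    is only a candidate event and would still need validation against
    past tracking information.
    """
    change = (curr_area - prev_area) / float(prev_area)
    if change > threshold:
        return "occlusion"
    if change < -threshold:
        return "split"
    return "none"
```

A gradual size change (e.g. an object approaching the camera) stays below the threshold and is not flagged, which is why the test targets *sudden* variations.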
In the context of content-oriented applications such as video surveillance and video retrieval, this paper proposes a stable object tracking method based on both object segmentation and motion estimation. The method focuses on the issues of speed of execution and reliability in the presence of noise, coding artifacts, shadows, occlusion, and object split. Objects are tracked based on the similarity of their features in successive images. This is done in three steps: object segmentation and motion estimation, object matching, and feature monitoring and correction. In the first step, objects are segmented and their spatial and temporal features are computed. In the second step, using a non-linear voting strategy, each object of the previous image is matched with an object of the current image, creating a unique correspondence. In the third step, object segmentation errors, such as when objects occlude or split, are detected and corrected. These new data are then used to update the results of the previous steps, i.e., object segmentation and motion estimation. The contributions of this paper are the multi-voting strategy and the monitoring and correction of segmentation errors.
Extensive experiments on indoor and outdoor video shots containing over 6000 images, including images with multi-object occlusion, noise, and coding artifacts, have demonstrated the reliability and real-time response of the proposed method.
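The voting-based matching step could be sketched as follows. The three features used here (area and centroid coordinates), their tolerances, and the majority-vote acceptance rule are illustrative assumptions, not the paper's actual feature set or voting function.

```python
def match_objects(prev_objs, curr_objs, tol=None):
    """Return {prev_id: curr_id} one-to-one matches by per-feature voting.

    prev_objs/curr_objs: {id: {"area": ..., "cx": ..., "cy": ...}}.
    """
    tol = tol or {"area": 0.3, "cx": 20.0, "cy": 20.0}
    matches, used = {}, set()
    for pid, p in prev_objs.items():
        best, best_votes = None, 0
        for cid, c in curr_objs.items():
            if cid in used:
                continue
            votes = 0  # each feature casts one binary vote
            votes += abs(c["area"] - p["area"]) / p["area"] <= tol["area"]
            votes += abs(c["cx"] - p["cx"]) <= tol["cx"]
            votes += abs(c["cy"] - p["cy"]) <= tol["cy"]
            if votes > best_votes:
                best, best_votes = cid, votes
        # Non-linear rule: a match needs a majority of feature votes,
        # not merely the smallest summed distance.
        if best_votes >= 2:
            matches[pid] = best
            used.add(best)
    return matches
```

Marking matched objects as used enforces the unique correspondence mentioned above; unmatched objects would then be candidates for the occlusion/split monitoring step.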
The steadily increasing need for video content accessibility necessitates the development of stable systems that represent video sequences based on their high-level (semantic) content. The core of such systems is the automatic extraction of video content. In this paper, a computational layered framework to effectively extract multiple high-level features of a video shot is presented. The objective of this framework is to extract rich high-level video descriptions of real-world scenes. In our framework, high-level descriptions are related to moving objects, which are represented by their spatio-temporal low-level features. High-level features are represented by generic high-level object features such as events. To achieve higher applicability, descriptions are extracted independently of the video context. Our framework is based on four interacting video processing layers: enhancement to estimate and reduce noise, stabilization to compensate for global changes, analysis to extract meaningful objects, and interpretation to extract context-independent semantic features. The effectiveness and real-time response of our framework are demonstrated by extensive experimentation on indoor and outdoor video shots in the presence of multi-object occlusion, noise, and artifacts.
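Schematically, the four interacting layers could be chained as a simple per-frame pipeline. The layer bodies and annotation keys below are placeholders for illustration only, not the framework's actual interfaces.

```python
def enhancement(data):
    data["noise_sigma"] = 2.0       # would estimate and reduce noise
    return data

def stabilization(data):
    data["global_motion"] = (0, 0)  # would compensate for global changes
    return data

def analysis(data):
    data["objects"] = []            # would extract meaningful objects
    return data

def interpretation(data):
    data["events"] = []             # would extract semantic features
    return data

def process_frame(frame):
    """Pass one frame through the four layers, accumulating annotations."""
    data = {"frame": frame}
    for layer in (enhancement, stabilization, analysis, interpretation):
        data = layer(data)
    return data
```

Passing a single annotation dictionary through the chain is one simple way to let the layers interact: each layer can read what earlier layers estimated (e.g. analysis can use the noise estimate from enhancement).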
In real-time content-oriented video applications, fast unsupervised object segmentation is required. This paper proposes a real-time unsupervised object segmentation method that remains stable throughout long video shots. It trades precise segmentation at object boundaries for speed of execution and reliability under varying image conditions. This trade-off is most appropriate for applications such as surveillance and video retrieval, where speed and temporal reliability matter more than accurate object boundaries. Both objective and subjective evaluations, and comparisons to other methods, show the robustness of the proposed method at reduced complexity. The proposed algorithm needs on average 0.15 seconds per image. The proposed segmentation consists of four steps: motion detection, morphological edge detection, contour analysis, and object labeling. The contributions of this paper are: a segmentation process of simple but effective tasks that avoids complex operations, a reliable memory-based noise-adaptive motion detection, and a memory-based contour tracing and analysis method. The proposed contour tracing aims 1) to find contours with complex structure, such as those containing dead or inner branches, and 2) to select contours adaptively in space and time. The motion detection is spatio-temporally adaptive, as it uses the estimated intra-image noise variance and the detected inter-image motion.
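A minimal sketch of noise-adaptive motion detection in this spirit: threshold the inter-frame difference with a value scaled to the estimated intra-image noise standard deviation. The median-based noise estimator, the MAD scaling constant, and the factor k are illustrative choices, not the paper's actual formulas.

```python
import numpy as np

def estimate_noise_sigma(frame):
    """Rough noise estimate from the median absolute horizontal difference.

    In flat image regions the pixel-to-pixel difference is dominated by
    noise, so its median absolute deviation tracks the noise level.
    """
    d = np.abs(np.diff(frame.astype(np.float64), axis=1))
    return np.median(d) / 0.6745  # MAD-to-sigma conversion for Gaussian noise

def detect_motion(prev, curr, k=3.0):
    """Binary motion mask: frame difference exceeds k times the noise sigma."""
    sigma = estimate_noise_sigma(curr)
    diff = np.abs(curr.astype(np.float64) - prev.astype(np.float64))
    # Floor of 1.0 keeps the threshold sensible on noiseless synthetic input.
    return diff > max(k * sigma, 1.0)
```

Because the threshold follows the measured noise level frame by frame, the same detector stays usable on both clean and noisy material without manual tuning.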
It is likely that block-matching techniques for motion estimation will continue to be used in many applications. In this paper, a novel object-based approach for the enhancement of motion fields generated by block matching is proposed. Block matching is first applied in parallel with a fast spatial image segmentation. Then, a rule-based object post-processing strategy is used in which each object is partitioned into sub-objects and each sub-object motion histogram is first analyzed separately. The sub-object treatment is particularly useful when image segmentation errors occur. Then, using plausibility histogram tests, object motions are classified as translational or non-translational. For non-translational motion, a single motion vector per sub-object is first assigned. The motion vectors of the sub-objects are then examined according to plausibility criteria and adjusted in order to create smooth motion inside the whole object. As a result, blocking artifacts are reduced and a more accurate estimation is achieved. Another interesting result is that motion vectors are implicitly assigned to pixels of covered/exposed areas. In the paper, a performance comparison between the new approach and block-matching methods is given. Furthermore, a fast unsupervised image segmentation method of reduced complexity aimed at separating objects is proposed. This method is based on a binarization method and morphological edge detection. The binarization combines local and global texture-homogeneity tests based on special homogeneity masks which implicitly take possible edges into account for object separation. The paper also contributes a novel formulation of binary morphological erosion, dilation, and binary edge detection. The presented segmentation uses few parameters, which are automatically adjusted to the amount of noise in the image and to the local standard deviation.
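The sub-object histogram test could be sketched as follows, with an assumed dominance ratio standing in for the paper's plausibility criteria: if one block vector clearly dominates the sub-object's histogram, the motion is treated as translational and that single vector is assigned.

```python
from collections import Counter

def classify_sub_object(vectors, dominance=0.6):
    """Classify one sub-object's motion from its block motion vectors.

    vectors: list of (dx, dy) block vectors inside the sub-object.
    Returns ("translational", vec) when one vector dominates the histogram,
    else ("non-translational", None), in which case per-block vectors would
    be kept and then smoothed against the neighbouring sub-objects.
    """
    hist = Counter(vectors)                 # motion-vector histogram
    vec, count = hist.most_common(1)[0]
    if count / len(vectors) >= dominance:
        return "translational", vec
    return "non-translational", None
```

Analyzing each sub-object's histogram separately is what makes the scheme tolerant of segmentation errors: a wrongly merged object still yields clean per-sub-object histograms.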
In this paper, novel techniques for image segmentation and explicit object-matching-based motion estimation are presented. The principal aims of this work are to reconstruct motion-compensated images without introducing significant artifacts and to introduce an explicit object-matching and noise-robust segmentation technique with low computational cost and regular operations. A main feature of the new motion estimation technique is its tolerance of image segmentation errors such as the fusion or separation of objects. In addition, motion types inside recognized objects are detected. Depending on the detected object motion types, either 'object/unique motion-vector' relations or 'object/several motion-vectors' relations are established. For example, in the case of translation and rotation, objects are divided into different regions and a 'region/one motion-vector' relation is achieved using interpolation techniques. Further, the suitability (computational cost) of the proposed methods for online applications (e.g. image interpolation) is shown. Experimental results are used to evaluate the performance of the proposed methods and to compare them with block-based motion estimation techniques. At this stage of our work, the segmentation part is based on intensity and contour information (scalar segmentation). The integration of other statistical properties of objects, e.g. texture (vector segmentation), to further stabilize the segmentation and hence the estimation process is the subject of our current research.
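As an illustration of the 'object/one motion-vector' relation, a hypothetical motion-compensated reconstruction given a label map and per-object vectors might look like this. Border handling (clipping) and the object representation are simplified assumptions for the sketch.

```python
import numpy as np

def compensate(prev, labels, motions):
    """Reconstruct a frame by shifting each object by its single motion vector.

    prev:    HxW previous image.
    labels:  HxW integer object-id map for the current frame.
    motions: {object_id: (dy, dx)} one vector per object; pixels of objects
             without an entry are left at zero in this simplified sketch.
    """
    h, w = prev.shape
    out = np.zeros_like(prev)
    for obj_id, (dy, dx) in motions.items():
        ys, xs = np.nonzero(labels == obj_id)
        # Fetch each object pixel from its displaced position in prev,
        # clipping at the image border.
        sy = np.clip(ys - dy, 0, h - 1)
        sx = np.clip(xs - dx, 0, w - 1)
        out[ys, xs] = prev[sy, sx]
    return out
```

Because every pixel of an object is fetched with the same vector, the reconstruction has no block boundaries inside objects, which is where the reduced-artifact claim of object-based compensation comes from.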
Digital transmission of video signals and block-based coding/decoding schemes produce new artifacts such as blocking, dirty-window, ringing, and mosquito effects. These artifacts become worse with decreasing MPEG-2 data rates. Therefore, the reduction of MPEG artifacts becomes an attractive feature for digital TV receivers. Another important feature of digital receivers is the performance of their postprocessing techniques, such as object recognition, motion estimation, vector-based upconversion, and noise reduction, on MPEG signals that are decoded in a receiver-side module called a 'set-top box'. In this paper, different models dealing with the interaction between the set-top box and the digital receiver are discussed, and the influence of MPEG artifacts on postprocessing is presented. A vector-based upconversion algorithm which applies nonlinear center-weighted median filters is presented. Assuming a two-channel model of the human visual system with different spatio-temporal characteristics, errors of the separated channels can be orthogonalized and avoided by an adequate splitting of the spectrum. This yields a very robust, vector-error-tolerant upconversion method which significantly improves the interpolation quality. This paper also describes a concept for temporal recursive noise and MPEG-artifact filtering on TV images based on visual noise perception characteristics. Different procedures in the spatial subbands lead to results well matched to the requirements of the human visual system. Using a subband-based noise filter, temporally non-correlated MPEG artifacts can be significantly reduced. Image analysis using object recognition for video postprocessing is becoming more important. Therefore, a morphological, contour-based multilevel object recognition method which remains robust even in strongly corrupted MPEG-2 images is also introduced.
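The nonlinear center-weighted median underlying such an upconversion can be sketched as follows; the center weight and the 1-D window are illustrative choices. Repeating the center sample before taking the median biases the output toward it while still rejecting isolated outliers, which is what makes the filter tolerant of motion-vector errors.

```python
def center_weighted_median(window, center_weight=3):
    """Center-weighted median of an odd-length window of samples.

    The middle sample is duplicated so that it appears center_weight times
    in the sorted list, biasing (but not forcing) the output toward it.
    """
    center = window[len(window) // 2]
    samples = sorted(list(window) + [center] * (center_weight - 1))
    return samples[len(samples) // 2]
```

With a window of five samples and a center weight of three, a single outlier at the center is still voted down by the neighbours, whereas a plausible center value is preserved exactly.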