We describe a system for vehicle make and model recognition (MMR) that automatically detects and classifies the make and model of a car from a live camera mounted above the highway. Vehicles are detected using a histogram of oriented gradients (HOG) detector and then classified by a convolutional neural network (CNN) operating on the frontal view of the car. We propose a semiautomatic data-selection approach for the vehicle detector and the classifier, which uses an automatic number plate recognition engine to minimize human effort. The resulting classification has a top-1 accuracy of 97.3% for 500 vehicle models. This paper presents a more extensive in-depth evaluation. We evaluate the effect of occlusion and find that the most informative vehicle region is the grill at the front. Recognition remains accurate when the left or right part of the vehicle is occluded. The small fraction of misclassifications mainly originates from errors in the dataset or from insufficient visual information for specific vehicle models. A comparison of state-of-the-art CNN architectures shows similar performance for the MMR problem, supporting our finding that the classification performance is dominated by the dataset quality.
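As an illustration of the top-1 accuracy metric reported above, the following minimal sketch counts how often the highest-scoring class equals the true label. The function name, the score layout (one list of per-class scores per image), and the toy numbers are assumptions made for this example; they are not taken from the described system.

```python
def top1_accuracy(scores, labels):
    """Fraction of samples whose highest-scoring class equals the true label.

    scores: list of per-class score lists (one list per image), hypothetical layout.
    labels: list of true class indices, same length as scores.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not labels:
        raise ValueError("at least one sample is required")
    correct = 0
    for row, label in zip(scores, labels):
        # Index of the maximum score is the top-1 prediction.
        predicted = max(range(len(row)), key=row.__getitem__)
        if predicted == label:
            correct += 1
    return correct / len(labels)


# Toy example with 3 classes and 4 images (made-up scores).
example_scores = [
    [0.1, 0.7, 0.2],
    [0.8, 0.1, 0.1],
    [0.2, 0.3, 0.5],
    [0.6, 0.3, 0.1],
]
example_labels = [1, 0, 2, 1]
example_accuracy = top1_accuracy(example_scores, example_labels)
```

In this toy case three of the four predictions match their labels.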
This research considers gender classification in surveillance environments, which typically involve low-resolution images and a large amount of viewpoint variation and occlusion. Gender classification is inherently difficult due to the large intra-class variation and inter-class correlation. We have developed a gender classification system and evaluated it successfully on two novel datasets that realistically reflect these conditions, which are typical for surveillance. The system reaches a mean accuracy of up to 90% and approaches our human baseline of 92.6%, demonstrating a high-quality gender classification system. We also present an in-depth discussion of the fundamental differences between SVM and RF classifiers. We conclude that balancing the degree of randomization in any classifier is required for the highest classification accuracy. For our problem, an RF-SVM hybrid classifier exploiting the combination of HSV and LBP features results in the highest classification accuracy of 89.9 ± 0.2%, while the classification computation time is negligible compared to the detection time of pedestrians.
Object detection is an important technique for video surveillance applications. Although different detection algorithms have been proposed, they all have problems in detecting occluded objects. In this paper, we propose a novel system for occlusion handling and integrate it in a sliding-window detection framework using HOG features and linear classification. Occlusion handling is obtained by applying multiple classifiers, each covering a different level of occlusion and focusing on the non-occluded object parts. Experiments show that our approach, based on 17 classifiers, obtains an increase of 8% in detection performance. To limit computational complexity, we propose a cascaded implementation that increases the computational cost by only 3.4%. Although the paper presents results for pedestrian detection, our approach is not limited to this object class. Finally, our system does not need an additional training dataset that covers all possible types of occlusion.
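The idea of several linear classifiers, each restricted to the non-occluded part of the window and evaluated in a cascade, can be sketched as follows. This is only an illustrative sketch under stated assumptions: the class `MaskedLinearClassifier`, the two thresholds, the early-reject rule, and all weights are hypothetical and are not the configuration used in the paper.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass
class MaskedLinearClassifier:
    """Linear classifier that only looks at cells assumed to be visible (hypothetical)."""
    name: str
    weights: Sequence[float]
    bias: float
    visible: Sequence[bool]  # True for feature cells assumed non-occluded

    def score(self, features: Sequence[float]) -> float:
        return self.bias + sum(
            w * f for w, f, v in zip(self.weights, features, self.visible) if v
        )


def detect(features, full, occluded, reject_below=-1.0, accept_at=0.0) -> Tuple[bool, str]:
    """Assumed cascade: run the full-object classifier first and only evaluate
    the occlusion-specific classifiers for windows it leaves undecided."""
    s = full.score(features)
    if s >= accept_at:
        return True, full.name
    if s < reject_below:
        return False, "rejected early"  # most background windows stop here
    for clf in occluded:
        if clf.score(features) >= accept_at:
            return True, clf.name
    return False, "none"


# Toy 4-cell feature vector: cells 0-1 are the upper part, cells 2-3 the lower part.
full = MaskedLinearClassifier("full", [0.5] * 4, 0.0, [True] * 4)
upper = MaskedLinearClassifier("upper-visible", [0.5] * 4, -0.5, [True, True, False, False])

visible_object = detect([1.0, 1.0, 1.0, 1.0], full, [upper])
occluded_object = detect([1.0, 1.0, -1.5, -1.5], full, [upper])
background = detect([-2.0, -2.0, -2.0, -2.0], full, [upper])
```

Because clear background windows are rejected after a single classifier evaluation, the extra classifiers add cost only for the small set of ambiguous windows, which is one plausible way a cascade keeps the overhead low.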
This paper proposes an original approach to moving ship detection in video surveillance systems, concentrating especially on occlusion problems among ships and vegetation by using context information. First, an over-segmentation is performed, and the resulting segments are classified by an SVM (Support Vector Machine) as water or non-water, exploiting the context that ships move only in water. We assume that ship motion is characterized by motion saliency and consistency, such that each ship distinguishes itself. Therefore, based on the water context model, non-water segments are merged into regions with motion similarity. Then, moving ships are detected by measuring the motion saliency of those regions. Experiments on real-life surveillance videos demonstrate the accuracy and robustness of the proposed approach. We pay special attention to testing in cases of severe occlusion between ships and between ships and vegetation. The proposed algorithm outperforms, in terms of precision and recall, our earlier work and a proposal using SVM-based ship detection.
In port surveillance, video-based monitoring is a valuable supplement to a radar system, because it helps to detect smaller ships in the shadow of a larger ship and makes it possible to detect nonmetal ships. Therefore, automatic video-based ship detection is an important research area for security control in port regions. An approach that automatically detects moving ships in port surveillance videos with robustness to occlusions is presented. In our approach, important elements from the visual, spatial, and temporal features of the scene are used to create a model of the contextual information and perform a motion saliency analysis. We model the context of the scene by first segmenting the video frame and contextually labeling the segments, such as water, vegetation, etc. Then, based on the assumption that each object has its own motion, labeled segments are merged into individual semantic regions even when occlusions occur. The context is finally modeled to help locate the candidate ships by exploring semantic relations between ships and context, spatial adjacency, and size constraints of different regions. Additionally, we assume that the ship moves with a significant speed compared to its surroundings. As a result, ships are detected by checking motion saliency for candidate ships according to the predefined criteria. We compare this approach with the conventional technique for object classification based on a support vector machine. Experiments are carried out with real-life surveillance videos, where the obtained results outperform two recent algorithms and show the accuracy and robustness of the proposed ship detection approach. The inherent simplicity of our algorithmic subsystems enables real-time operation of our proposal in embedded video surveillance, such as port surveillance systems based on moving, nonstatic cameras.
This paper presents an automatic ship detection approach for video-based port surveillance systems. Our approach combines context and motion saliency analysis. The context is represented by the assumption that ships only travel inside a water region. We perform motion saliency analysis since we expect ships to move with higher speed compared to the water flow and static environment. A robust water detection is first employed to extract the water region as contextual information in the video frame, which is achieved by graph-based segmentation and region-based classification. After the water detection, the segments labeled as non-water are merged to form the regions containing candidate ships, based on the spatial adjacency. Finally, ships are detected by checking motion saliency for each candidate ship according to a set of criteria. Experiments are carried out with real-life surveillance videos, where the obtained results prove the accuracy and robustness of the proposed ship detection approach. The proposed algorithm outperforms a state-of-the-art algorithm when applied to the same sets of surveillance videos.
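The three stages described above (water detection, merging of non-water segments by spatial adjacency, and a motion saliency check) can be sketched on a toy grid as follows. This is a simplified sketch, not the published algorithm: the water mask is assumed to be given, 4-connectivity stands in for the segment-adjacency rule, and the saliency criterion (mean region motion compared with mean water motion via the hypothetical `saliency_ratio`, plus a hypothetical `min_size`) is an assumption replacing the unspecified set of criteria.

```python
from collections import deque


def merge_non_water(water_mask):
    """Group 4-connected non-water cells into candidate regions."""
    rows, cols = len(water_mask), len(water_mask[0])
    seen = [[False] * cols for _ in range(rows)]
    regions = []
    for r in range(rows):
        for c in range(cols):
            if water_mask[r][c] or seen[r][c]:
                continue
            queue = deque([(r, c)])
            seen[r][c] = True
            region = []
            while queue:
                y, x = queue.popleft()
                region.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and not water_mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            regions.append(region)
    return regions


def detect_ships(water_mask, motion, saliency_ratio=2.0, min_size=2):
    """Keep candidate regions that move clearly faster than the water (assumed criterion)."""
    water_values = [
        motion[r][c]
        for r in range(len(water_mask))
        for c in range(len(water_mask[0]))
        if water_mask[r][c]
    ]
    background = sum(water_values) / len(water_values) if water_values else 0.0
    ships = []
    for region in merge_non_water(water_mask):
        if len(region) < min_size:
            continue
        mean_motion = sum(motion[y][x] for y, x in region) / len(region)
        if mean_motion > 0 and mean_motion > saliency_ratio * background:
            ships.append(region)
    return ships


W, N = True, False  # water / non-water labels per grid cell
toy_mask = [
    [N, N, N, N, N],  # static shore
    [W, W, W, W, W],
    [W, N, N, W, W],  # two-cell candidate inside the water
    [W, W, W, W, W],
]
toy_motion = [
    [0.0, 0.0, 0.0, 0.0, 0.0],
    [0.5, 0.5, 0.5, 0.5, 0.5],
    [0.5, 3.0, 3.0, 0.5, 0.5],
    [0.5, 0.5, 0.5, 0.5, 0.5],
]
toy_ships = detect_ships(toy_mask, toy_motion)
```

In the toy frame the static shore forms a non-water region but is discarded because it does not move, while the two-cell region inside the water moves faster than the water flow and is kept.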
Text detection and recognition in natural images have conventionally been seen in the prior art as autonomous tasks executed in a strictly sequential processing chain with limited information sharing between sub-systems. This approach is flawed because it introduces (1) redundancy in extracting the same text properties multiple times and (2) error by prohibiting verification of hard (often binarized) detection results at later stages. We explore the possibilities for integration of detection and recognition modules by a feedforward multidimensional information stream. Integration involves suitable characterization of the text string at detection and application of the knowledge to ease recognition by a given OCR system. The choice of characterization properties generally depends on the OCR system, although some of them have proven universally applicable. We show that the proposed integration measures enable more robust recognition of text in complex, unconstrained natural environments. Specifically, integration by the proposed measures (1) eliminates textual input irregularities that recognition engines cannot handle and (2) adaptively tunes the recognition stage for each input image. The former function boosts correct detections, while the latter mainly reduces the number of false positives. Our validation experiments on a set of low-quality natural images show that adaptively tuning the OCR stage to the typical text-to-background transitions in the input image (gradient significance profiling) yields an improvement of 29% in the precision-recall performance, mostly through boosting precision.
Many proposed video content analysis algorithms for surveillance applications are computationally very intensive, which limits their integration into a complete system running on a single processing unit (e.g., a PC). To build flexible, low-cost prototyping systems, a distributed system with scalable processing power is therefore required. This paper discusses requirements for surveillance systems, considering two example applications. From these requirements, specifications for a prototyping architecture are derived. An implementation of the proposed architecture is presented, enabling the mapping of multiple software modules onto a number of processing units (PCs). The architecture enables fast prototyping of new algorithms for complex surveillance applications without considering resource constraints.
Partners of the CANDELA project are realizing a system for real-time image processing for traffic and video-surveillance applications. This system performs segmentation, labels the extracted blobs, and tracks them through the scene. We also address the problem of evaluating the results of such processes. We are developing a tool to generate and manage the results of the performance evaluation of video content analysis (VCA) systems. This evaluation is done by comparing the results of the global application and its components with a manually generated ground-truth file. Both the manually and the automatically generated description files are formatted in XML. These XML descriptions are then processed to assemble the appropriate parts of the documents and to evaluate the metadata. For scientific purposes, this tool will provide an objective measure of improvement and a means to choose between competing methods. In addition, it is a powerful tool for algorithm designers to measure the progress of their work at the different levels of the processing chain. For industrial purposes, this tool will assess the accuracy of the VCA, with an obvious marketing impact. We present the definition of the evaluation tool, its metrics, and specific implementations designed for our applications.
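A comparison of automatically generated results with a manually generated ground-truth file, both in XML, could look like the following minimal sketch. The XML schema (`frame` elements with an `id` attribute containing `object` elements with `x`, `y`, `w`, `h` attributes), the greedy overlap matching, and the 0.5 overlap threshold are all assumptions made for illustration; the actual CANDELA file format and metrics are not specified here.

```python
import xml.etree.ElementTree as ET


def load_boxes(xml_text):
    """Parse a hypothetical description file into {frame id: [(x, y, w, h), ...]}."""
    root = ET.fromstring(xml_text)
    frames = {}
    for frame in root.iter("frame"):
        frames[frame.get("id")] = [
            tuple(float(obj.get(key)) for key in ("x", "y", "w", "h"))
            for obj in frame.iter("object")
        ]
    return frames


def iou(a, b):
    """Intersection over union of two (x, y, w, h) boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    iw = min(ax + aw, bx + bw) - max(ax, bx)
    ih = min(ay + ah, by + bh) - max(ay, by)
    if iw <= 0 or ih <= 0:
        return 0.0
    inter = iw * ih
    return inter / (aw * ah + bw * bh - inter)


def evaluate(gt_xml, det_xml, min_iou=0.5):
    """Greedy per-frame matching; returns (precision, recall)."""
    gt, det = load_boxes(gt_xml), load_boxes(det_xml)
    tp = fp = fn = 0
    for frame_id in set(gt) | set(det):
        unmatched = list(gt.get(frame_id, []))
        for d in det.get(frame_id, []):
            best = max(unmatched, key=lambda g: iou(g, d), default=None)
            if best is not None and iou(best, d) >= min_iou:
                unmatched.remove(best)  # each ground-truth box is matched once
                tp += 1
            else:
                fp += 1
        fn += len(unmatched)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


GROUND_TRUTH = """<sequence>
  <frame id="0">
    <object x="0" y="0" w="10" h="10"/>
    <object x="20" y="20" w="10" h="10"/>
  </frame>
  <frame id="1"><object x="5" y="5" w="10" h="10"/></frame>
</sequence>"""

DETECTIONS = """<sequence>
  <frame id="0">
    <object x="1" y="1" w="10" h="10"/>
    <object x="50" y="50" w="5" h="5"/>
  </frame>
</sequence>"""

precision, recall = evaluate(GROUND_TRUTH, DETECTIONS)
```

The same function can be applied to the output of individual components (for example, segmentation or tracking) as long as their results are exported in the same assumed schema, which mirrors the idea of measuring progress at different levels of the processing chain.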