The problem of detecting objects of unseen classes can be posed as a semantic matching problem. A semantic matcher takes two images as input: the request image and the test image. The request image represents the object class to be found in the test image. In this paper, we propose a new region-proposal-based semantic matcher that follows the ideas of R-CNN. Our Body CNN generates proposals in the manner of the classical Faster R-CNN, and the Head CNN compares each proposal with a request descriptor extracted from the request image by the Request descriptor CNN. All three CNNs (Head, Body and Request descriptor) are trained jointly, end-to-end, for detection of seen-class objects by request and are then applied to both seen and unseen classes. We trained and tested our CNN on the Pascal VOC dataset.
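The comparison of proposals with a request descriptor can be illustrated with a minimal sketch. It assumes that each proposal already carries a descriptor vector and it scores proposals by cosine similarity. In the abstract the comparison is learned by the Head CNN, so the cosine score, the threshold and all names and values below are hypothetical stand-ins rather than the authors' method.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length descriptor vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def match_proposals(request_descriptor, proposals, threshold=0.8):
    """Return (box, score) pairs for proposals that match the request.

    `proposals` is a list of (box, descriptor) pairs. The threshold is an
    assumed value, not one taken from the paper.
    """
    matches = []
    for box, descriptor in proposals:
        score = cosine_similarity(request_descriptor, descriptor)
        if score >= threshold:
            matches.append((box, score))
    # Best-scoring proposals first.
    return sorted(matches, key=lambda m: m[1], reverse=True)

# Hypothetical hand-written descriptors standing in for CNN outputs.
request = [1.0, 0.0, 1.0]
proposals = [
    ((10, 10, 50, 80), [0.9, 0.1, 1.1]),  # close to the request
    ((60, 20, 90, 70), [0.0, 1.0, 0.0]),  # orthogonal to the request
]
matches = match_proposals(request, proposals)
```

Only the first proposal passes the threshold here; a learned comparison would replace `cosine_similarity` while keeping the same request-versus-proposal structure.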
In this paper we propose a new algorithm for image filtering using a morphological thickness map. Compared with other smoothing methods, such as anisotropic diffusion, comparative filters, and guided and rolling guidance filters, the benefit of our method is that it works natively with the image structure, namely the thickness map, so it does not depend on the level of image noise or on lighting conditions and effects. We present the idea of the method, the algorithm itself and various experimental results. The filtering results can be applied in image processing tasks such as image segmentation, motion analysis, invariant feature transformation and data compression.
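The notion of a thickness map can be illustrated on a 1D binary signal, where the thickness of a foreground sample is the length of the run that contains it (equivalently, the longest line segment whose morphological opening still keeps the sample). This is only a sketch of the concept under that simplifying assumption. The abstract does not specify the structuring elements or how grayscale 2D images are handled, and the function names are hypothetical.

```python
def thickness_map_1d(binary):
    """Thickness of each foreground sample = length of the run containing it."""
    thickness = [0] * len(binary)
    i = 0
    while i < len(binary):
        if binary[i]:
            # Find the end of the current foreground run.
            j = i
            while j < len(binary) and binary[j]:
                j += 1
            for k in range(i, j):
                thickness[k] = j - i
            i = j
        else:
            i += 1
    return thickness

def remove_thin(binary, min_thickness):
    """Keep only samples whose thickness is at least `min_thickness`."""
    tmap = thickness_map_1d(binary)
    return [1 if t >= min_thickness else 0 for t in tmap]

signal = [0, 1, 0, 1, 1, 1, 0, 1, 1]
tmap = thickness_map_1d(signal)
filtered = remove_thin(signal, 2)
```

Filtering by thickness removes the isolated one-sample run regardless of its amplitude, which is the sense in which such a filter acts on structure rather than on intensity.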
More than 80% of video surveillance systems are used for monitoring people. Older human detection algorithms, based on background and foreground modelling, could not even deal with a group of people, to say nothing of a crowd. Recent robust and highly effective pedestrian detection algorithms mark a new milestone for video surveillance systems. Based on modern deep learning approaches, these algorithms produce highly discriminative features that allow robust inference in real visual scenes. They handle tasks such as distinguishing different persons in a group, coping with significant occlusion of human bodies by foreground objects, and detecting people in various poses. In our work we use a new approach that combines the detection and classification tasks into a single problem using convolutional neural networks. As a starting point we choose the YOLO CNN, whose authors propose a very efficient way of combining these tasks by training a single neural network. This approach has shown results competitive with state-of-the-art models such as Fast R-CNN while significantly outperforming them in speed, which allows us to apply it in real-time video surveillance and other video monitoring systems. Despite its advantages, it suffers from some known drawbacks related to the fully-connected layers, which prevent the CNN from being applied to images of different resolutions. This also limits the ability to distinguish small, closely spaced human figures in groups, which is crucial for our tasks since we work with rather low-quality images that often include dense groups of small figures. In this work we gradually change the network architecture to overcome these problems, train it on a complex pedestrian dataset and finally obtain a CNN that detects small pedestrians in real scenes.
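The resolution constraint imposed by fully-connected layers can be made concrete with a small weight count. The sketch below assumes a network that downsamples by a fixed stride and then flattens the last feature map into a fully-connected layer; the stride, channel depth and layer width are illustrative assumptions, not figures taken from the abstract.

```python
def fc_weight_count(feature_side, channels, out_units):
    """Weights of a fully-connected layer fed by a flattened feature map."""
    return feature_side * feature_side * channels * out_units

def conv_weight_count(in_channels, out_channels, kernel_size):
    """Weights of a convolutional layer; independent of the image size."""
    return in_channels * out_channels * kernel_size * kernel_size

STRIDE = 64      # assumed total downsampling factor
CHANNELS = 1024  # assumed depth of the last feature map
FC_UNITS = 4096  # assumed width of the fully-connected layer

for input_side in (448, 640):
    feature_side = input_side // STRIDE
    print(
        input_side,
        fc_weight_count(feature_side, CHANNELS, FC_UNITS),
        conv_weight_count(CHANNELS, CHANNELS, 3),
    )
```

The fully-connected weight matrix changes shape with the input resolution, so weights trained at one size cannot be reused at another, whereas the convolutional layer has the same weights at any size.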
The problem of automatic abandoned bag detection is of great importance for ensuring security in public areas. At the same time, emergency situations occur rarely in large-scale video surveillance systems. It is therefore important to keep the false alarm rate low while maintaining high detection accuracy. An approach to abandoned bag detection in complex environments that satisfies these requirements is proposed. It consists of two blocks. The first block performs preliminary pixel-level detection of abandoned bags by background modelling with a Gaussian mixture model. It ensures high speed and precise positioning of the bounding boxes on the objects of interest. The second block performs region-level bag recognition with a compact convolutional neural network. The use of the convolutional neural network is a key component of the approach's success. All processing is performed on a central processing unit, so the proposed approach is suitable for systems (microcomputers) that do not have powerful graphics subsystems. The experiments were conducted on real-world scenes. The results indicate that the proposed approach is efficient and provides acceptable quality characteristics.
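The two-block structure can be sketched as follows. For brevity the sketch swaps the Gaussian mixture model for a single running Gaussian per pixel, and the region classifier is passed in as a callable that stands in for the compact CNN. The class and function names, the learning rate and the thresholds are all assumptions made for illustration.

```python
class RunningGaussianBackground:
    """Per-pixel single-Gaussian background model (a simplification of a GMM)."""

    def __init__(self, first_frame, learning_rate=0.05, k=2.5, initial_var=25.0):
        self.mean = [[float(v) for v in row] for row in first_frame]
        self.var = [[initial_var for _ in row] for row in first_frame]
        self.lr = learning_rate
        self.k = k

    def apply(self, frame):
        """Return a binary foreground mask and update background pixels."""
        mask = []
        for y, row in enumerate(frame):
            mask_row = []
            for x, value in enumerate(row):
                diff = value - self.mean[y][x]
                is_fg = diff * diff > (self.k ** 2) * self.var[y][x]
                mask_row.append(1 if is_fg else 0)
                if not is_fg:
                    self.mean[y][x] += self.lr * diff
                    new_var = self.var[y][x] + self.lr * (diff * diff - self.var[y][x])
                    self.var[y][x] = max(new_var, 4.0)  # variance floor
            mask.append(mask_row)
        return mask

def bounding_box(mask):
    """Bounding box (x0, y0, x1, y1) of all foreground pixels, or None."""
    points = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    if not points:
        return None
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (min(xs), min(ys), max(xs), max(ys))

def detect(model, frame, classify_region):
    """Block 1 proposes a region; block 2 (`classify_region`) confirms it."""
    box = bounding_box(model.apply(frame))
    if box is None:
        return None
    x0, y0, x1, y1 = box
    region = [row[x0:x1 + 1] for row in frame[y0:y1 + 1]]
    return box if classify_region(region) else None

background = [[100] * 4 for _ in range(4)]
frame = [row[:] for row in background]
for y in (1, 2):
    for x in (1, 2):
        frame[y][x] = 200  # a bright object appears

model = RunningGaussianBackground(background)
box = detect(model, frame, classify_region=lambda region: True)
```

A real system would track how long a region stays static before calling the classifier and would handle several regions per frame; the sketch keeps a single box to show only the hand-off between the two blocks.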
Existing image fusion methods based on morphological image analysis, which expresses the geometrical idea of image shape as a label image, are quite sensitive to the quality of image segmentation and are therefore not sufficiently robust to noise and high-frequency distortions. On the other hand, a number of methods in the field of dimensionality reduction and data comparison make it possible to avoid the image segmentation step by using diffusion map techniques. The paper proposes a new approach to multispectral image fusion based on a combination of morphological image analysis and diffusion maps theory (i.e. Diffusion Morphology). A new image fusion algorithm is described that uses a matched diffusion filtering procedure instead of morphological projection. The algorithm is implemented for a three-channel Enhanced Vision System prototype. Comparative results of image fusion are shown on real images acquired in flight experiments.
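One way to picture filtering without a segmentation step is to build a diffusion (Markov) operator from one channel and apply it to another. The sketch below does this for short 1D signals, using Gaussian affinities on intensity differences with row normalization. It is an assumption-laden illustration of the general diffusion operator idea; the abstract does not define the matched diffusion filtering procedure, and the kernel, `sigma` and all names here are hypothetical.

```python
import math

def diffusion_operator(guide, sigma=10.0):
    """Row-stochastic matrix built from Gaussian affinities of guide values."""
    n = len(guide)
    matrix = []
    for i in range(n):
        weights = [
            math.exp(-((guide[i] - guide[j]) ** 2) / (2.0 * sigma ** 2))
            for j in range(n)
        ]
        total = sum(weights)  # always > 0 because the i == j weight is 1
        matrix.append([w / total for w in weights])
    return matrix

def diffusion_filter(target, operator):
    """Replace each target sample by a weighted mean over similar guide samples."""
    return [sum(p * t for p, t in zip(row, target)) for row in operator]

# Guide channel with two clearly different "shapes" (dark and bright).
guide = [10, 12, 11, 200, 198, 202]
# Target channel to be smoothed within the shapes of the guide channel.
target = [50, 70, 60, 90, 110, 100]

P = diffusion_operator(guide)
smoothed = diffusion_filter(target, P)
```

The target is averaged within each group of similar guide values and hardly at all across groups, so the guide's shape is transferred to the target without ever computing an explicit label image.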
Automated and accurate capture of the spatial motion of an object is necessary for a wide variety of applications in industry and science, virtual reality and film, medicine and sports. For most applications, the reliability and accuracy of the data obtained, as well as convenience for the user, are the main characteristics defining the quality of a motion capture system. Among the existing systems for 3D data acquisition, which are based on different physical principles (accelerometry, magnetometry, time-of-flight, vision-based), optical motion capture systems have a set of advantages such as high acquisition speed and the potential for high accuracy and automation based on advanced image processing algorithms. For vision-based motion capture, accurate and robust detection and tracking of object features through the video sequence are the key elements, along with the level of automation of the capturing process. To provide high accuracy of the obtained spatial data, the developed vision-based motion capture system “Mosca” is based on photogrammetric principles of 3D measurement and supports high-speed image acquisition in synchronized mode. It includes from 2 to 4 machine vision cameras for capturing video sequences of object motion. The original camera calibration and external orientation procedures provide the basis for high accuracy of 3D measurements. A set of algorithms has been developed and tested, both for detecting, identifying and tracking similar targets and for marker-less object motion capture. The evaluation of the algorithms shows high robustness and reliability for various motion analysis tasks in technical and biomechanical applications.
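The photogrammetric principle of recovering a 3D point from synchronized views can be sketched with generic two-ray triangulation. The example assumes that calibration and external orientation have already turned each image observation into a ray (camera centre plus direction) in a common coordinate system, and it returns the midpoint of the shortest segment between the two rays. This is a textbook construction, not the procedure used in “Mosca”, and the names are hypothetical.

```python
def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def triangulate_midpoint(origin1, dir1, origin2, dir2):
    """Midpoint of the shortest segment between two 3D rays.

    Returns None when the rays are (nearly) parallel.
    """
    w0 = [p - q for p, q in zip(origin1, origin2)]
    a = _dot(dir1, dir1)
    b = _dot(dir1, dir2)
    c = _dot(dir2, dir2)
    d = _dot(dir1, w0)
    e = _dot(dir2, w0)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        return None
    s = (b * e - c * d) / denom  # parameter along the first ray
    t = (a * e - b * d) / denom  # parameter along the second ray
    p1 = [o + s * v for o, v in zip(origin1, dir1)]
    p2 = [o + t * v for o, v in zip(origin2, dir2)]
    return [(u + v) / 2.0 for u, v in zip(p1, p2)]

# Two cameras 2 units apart, both observing a target at (0, 0, 10).
point = triangulate_midpoint(
    (-1.0, 0.0, 0.0), (1.0, 0.0, 10.0),
    (1.0, 0.0, 0.0), (-1.0, 0.0, 10.0),
)
```

With 3 or 4 cameras the same idea extends to a least-squares intersection of all rays, which is one reason additional synchronized views can improve accuracy.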
An improved stereo-based approach to dynamic road scene understanding in a Driver Assistance System (DAS) is presented. System calibration is addressed. Algorithms for road lane detection, road 3D model generation, obstacle predetection and object (vehicle) detection are described. Lane detection is based on evidence analysis. The obstacle predetection procedure compares radial orthophotos obtained from the left and right stereo images. The object detection algorithm is based on recognition of the rear part of cars using histograms of oriented gradients. The Car Stereo Sequences (CSS) dataset was captured by a vehicle-based laboratory and published for testing DAS algorithms.
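The building block of a histogram-of-oriented-gradients descriptor is the orientation histogram of a single cell, sketched below. The sketch uses central differences, unsigned orientations in 9 bins, and hard binning without interpolation or block normalization. These are common defaults chosen here as assumptions; the abstract does not state the parameters used for car detection.

```python
import math

def hog_cell_histogram(cell, bins=9):
    """Unsigned-orientation gradient histogram of one grayscale cell.

    Border pixels are skipped because central differences need both neighbours.
    """
    height, width = len(cell), len(cell[0])
    hist = [0.0] * bins
    bin_width = 180.0 / bins
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = cell[y][x + 1] - cell[y][x - 1]
            gy = cell[y + 1][x] - cell[y - 1][x]
            magnitude = math.hypot(gx, gy)
            # Fold the angle into [0, 180) so opposite directions share a bin.
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            hist[int(angle // bin_width) % bins] += magnitude
    return hist

# A vertical edge: intensity changes only along x.
vertical_edge = [[0, 0, 255, 255] for _ in range(4)]
# A horizontal edge: intensity changes only along y.
horizontal_edge = [[0] * 4, [0] * 4, [255] * 4, [255] * 4]

hist_v = hog_cell_histogram(vertical_edge)
hist_h = hog_cell_histogram(horizontal_edge)
```

The two edges vote into different bins, which is what lets a classifier trained on concatenated cell histograms respond to the characteristic outline of the rear of a car.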
In this paper, we propose a background stabilization method for arbitrary camera movement. We investigate state-of-the-art algorithms for feature point detection and introduce a composite LBP descriptor to describe the feature points, together with an algorithm for matching feature points across a sequence of images. In addition, we propose an algorithm for constructing an affine transformation of the old frame in the sequence into the new one for the tasks of stabilization and image stitching.
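For orientation, the basic 8-neighbour local binary pattern (LBP) code that such descriptors build on can be sketched as follows. This is the standard single-pixel LBP, not the composite descriptor introduced in the abstract, whose construction is not given. Comparing codes by Hamming distance is likewise an assumption made for illustration.

```python
# Neighbour offsets (dy, dx), clockwise from the top-left corner.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]

def lbp_code(image, y, x):
    """Basic 8-bit LBP code of the pixel at (y, x); (y, x) must not be on the border."""
    center = image[y][x]
    code = 0
    for bit, (dy, dx) in enumerate(OFFSETS):
        if image[y + dy][x + dx] >= center:
            code |= 1 << bit
    return code

def hamming_distance(code_a, code_b):
    """Number of differing bits between two LBP codes."""
    return bin(code_a ^ code_b).count("1")

flat = [[5, 5, 5], [5, 5, 5], [5, 5, 5]]
peak = [[1, 1, 1], [1, 9, 1], [1, 1, 1]]
corner = [[9, 1, 1], [1, 5, 1], [1, 1, 1]]

code_flat = lbp_code(flat, 1, 1)
code_peak = lbp_code(peak, 1, 1)
code_corner = lbp_code(corner, 1, 1)
```

Because each bit depends only on the sign of an intensity difference, the code is unchanged by monotonic brightness changes, a useful property when matching points between frames of a moving camera.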
In this paper we propose a method for classifying moving objects of the “human” and “car” types in computer vision systems using statistical hypotheses and integration of the results using two different decision rules. FAR-FRR graphs are plotted for all criteria and for the decision rule. Confusion matrices for both ways of integration are presented. An example of applying the method to public video databases is provided. Ways of improving accuracy are proposed.
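A FAR-FRR graph is obtained by sweeping a decision threshold and recording the false acceptance rate and false rejection rate at each value. The sketch below assumes that every object receives a scalar score for one hypothesis (for example “human”) and is accepted when the score reaches the threshold. The scores and function names are hypothetical, and the abstract does not say how its criteria produce scores.

```python
def far_frr(genuine_scores, impostor_scores, threshold):
    """FAR: share of impostors accepted; FRR: share of genuine objects rejected."""
    far = sum(s >= threshold for s in impostor_scores) / len(impostor_scores)
    frr = sum(s < threshold for s in genuine_scores) / len(genuine_scores)
    return far, frr

def far_frr_curve(genuine_scores, impostor_scores, thresholds):
    """List of (threshold, FAR, FRR) triples, ready for plotting."""
    return [
        (t,) + far_frr(genuine_scores, impostor_scores, t)
        for t in thresholds
    ]

# Hypothetical scores for the "human" hypothesis.
human_scores = [0.9, 0.8, 0.4]  # objects that really are humans
car_scores = [0.1, 0.5, 0.3]    # objects that really are cars

curve = far_frr_curve(human_scores, car_scores, [0.0, 0.45, 1.0])
```

Raising the threshold trades false acceptances for false rejections; the point where the two curves cross is often used to compare criteria.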