In this paper, we present a novel approach for multiple-feature, multiple-sensor classification and localization of three-dimensional objects in two-dimensional images. We use a hypothesize-and-test approach in which we fit three-dimensional geometric models to image data. A hypothesis consists of an object's class and its six degrees of freedom. Our models consist of the objects' geometric data, attributed with several local features (e.g., hotspots, edges, and textures) and their respective rules of applicability (e.g., visibility). The model-fitting process is divided into three parts: using the hypothesis, we first project the object onto the image plane while evaluating the rules of applicability for its local features. This yields a two-dimensional representation of the object which, in a second step, is aligned to the image data. In the last step, we perform a pose estimation to calculate the object's six degrees of freedom and to update the hypothesis from the alignment results. The paper describes the major components of our system: the management and generation of hypotheses, the matching process, the pose estimation, and the model-based prediction of the object's pose in six degrees of freedom. Finally, we demonstrate the performance, robustness, and accuracy of the system in two applications: optical inspection for quality control and airport ground-traffic surveillance.
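The first step described above, projecting the hypothesized object onto the image plane, can be sketched as follows. This is a minimal illustration, assuming an ideal pinhole camera with a single focal-length parameter and a six-degree-of-freedom pose given as translation plus Euler angles; all function names and conventions here are illustrative, not the paper's actual implementation.

```python
import numpy as np

def euler_to_rot(rx, ry, rz):
    """Rotation matrix from Euler angles (radians), Rz @ Ry @ Rx convention."""
    cx, sx = np.cos(rx), np.sin(rx)
    cy, sy = np.cos(ry), np.sin(ry)
    cz, sz = np.cos(rz), np.sin(rz)
    Rx = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    Rz = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]])
    return Rz @ Ry @ Rx

def project(points, pose, focal):
    """Project 3D model points into the image plane with a pinhole camera.

    pose = (tx, ty, tz, rx, ry, rz): the hypothesis' six degrees of freedom.
    points: (N, 3) array of model points in object coordinates.
    Returns (N, 2) image-plane coordinates.
    """
    t = np.asarray(pose[:3])
    R = euler_to_rot(*pose[3:])
    cam = points @ R.T + t                    # object -> camera coordinates
    return focal * cam[:, :2] / cam[:, 2:3]   # perspective division
```

In the full system, each projected feature position would additionally be filtered by its rule of applicability (e.g., a visibility test) before it enters the alignment step.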
This paper presents three novel matching algorithms in which a hypothesis of a 3D object is matched to a 2D image. The three algorithms are compared with respect to speed and precision on several examples.
A hypothesis consists of the object model and its six degrees of freedom. The hypothesis is projected into the image plane using a pinhole camera model. The object model is a feature-attributed 3D geometric model; it contains various local features and their rules of visibility. After the projection into the image plane, the local environment of each projected feature is searched for the position with the best match value. There is a trade-off between the rigidity of the object and the best-match positions of the local features in the image. After the matching, a 2D-3D pose estimation is run to derive an updated pose from the matching results.
Three novel algorithms for matching the local features while taking their geometric configuration into account are described in this paper. The first algorithm combines the local features into a graph. The graph is viewed as a network of springs whose forces constrain the object's rigidity; the quality of the local best matches is represented by additional forces introduced at the nodes of the graph. The second matching algorithm decouples the local features from each other so that they can move independently; it imposes no constraints on the object's rigidity and does not consider the feature quality. The third matching method takes the feature quality into account by using it within the pose estimation.
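The spring-network idea behind the first algorithm can be sketched as a simple iterative relaxation. All parameters here (stiffness, step size, iteration count) are hypothetical, and the force model is a plausible reading of the abstract, not the paper's exact formulation: each node is pulled toward its local best-match position with a force scaled by match quality, while springs between connected nodes penalize deviation from the projected geometry.

```python
import numpy as np

def relax_spring_graph(proj, best_match, edges, match_weight,
                       stiffness=1.0, step=0.1, iters=100):
    """Move feature nodes toward their best matches under spring constraints.

    proj         -- (N, 2) projected feature positions (rest configuration)
    best_match   -- (N, 2) locally best-matching image positions
    edges        -- list of (i, j) node pairs connected by springs
    match_weight -- (N,) match quality in [0, 1], scales the data pull
    """
    pos = proj.copy()
    # Rest displacement of each spring, taken from the projected geometry.
    rest = {(i, j): proj[j] - proj[i] for i, j in edges}
    for _ in range(iters):
        # Data force: pull each node toward its best match, weighted by quality.
        force = match_weight[:, None] * (best_match - pos)
        # Spring forces: penalize deviation from the rest displacement.
        for i, j in edges:
            d = (pos[j] - pos[i]) - rest[(i, j)]
            force[i] += stiffness * d
            force[j] -= stiffness * d
        pos += step * force
    return pos
```

When all best matches agree with a rigid shift of the projected geometry, the springs stay at rest and the graph converges to the shifted positions; conflicting matches are balanced against the rigidity constraint.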
One trend in modern object recognition from images is the use of multiple features and sensors that are combined for the recognition task. To obtain better classification results, the features used to classify the objects should be physically 'orthogonal'. To be independent of the kind of features and of the method used to combine them, it is necessary to represent each feature in a unified measure. This measure should express the quality of the feature in the examined image, and it must be unified, because only such a measure can be combined into a meaningful global result. This paper presents a method that normalizes different kinds of local features. A probabilistic approach is used to provide the unified measure. To map the feature information to a probabilistic interpretation, a generalized function model is used that is largely independent of the type of application. Two examples of the presented method are shown: the first uses the Chamfer distance to measure edge features, the second a gray-value correlation coefficient.
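The abstract does not specify the generalized function model, so the following is only an illustration of the normalization idea: a logistic curve with per-feature calibration parameters (midpoint, slope — both hypothetical here) maps raw feature scores of different scales and directions to a common quality value in (0, 1).

```python
import math

def feature_quality(score, midpoint, slope, higher_is_better=True):
    """Map a raw feature score to a unified quality value in (0, 1).

    A logistic curve serves as a stand-in for the paper's generalized
    function model; midpoint and slope are per-feature calibration
    parameters that must be chosen for each feature type.
    """
    x = slope * (score - midpoint)
    if not higher_is_better:   # e.g. Chamfer distance: smaller is better
        x = -x
    return 1.0 / (1.0 + math.exp(-x))

# Example calibrations (hypothetical values):
# Chamfer distance in pixels, smaller is better
q_edge = feature_quality(3.0, midpoint=5.0, slope=1.0, higher_is_better=False)
# gray-value correlation coefficient in [-1, 1], larger is better
q_corr = feature_quality(0.8, midpoint=0.5, slope=10.0)
```

Because both `q_edge` and `q_corr` live on the same (0, 1) scale, they can be combined (e.g., multiplied or averaged) into a meaningful global result regardless of the underlying feature type.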
This work presents a method for the detection, localization, classification, and pose estimation of objects in SAR image sequences. Such methods have to cope with the strong noise in SAR images and with the challenge that occurring shadows should not affect the recognition process. In the presented method, the disturbing effect of noise is significantly reduced by temporal integration of the SAR images using a motion model of the sensor. This makes it possible to segment the integrated images with quantile thresholds and a region-growing algorithm operating on an edge image created by a Canny edge detector. To be independent of the number of objects in the image and of the image brightness, a multi-threshold approach is used. By accumulating the segmented images and subsequently analyzing the homogeneity of the accumulated segments, stable segments can be identified as possible objects. An optimization process fits a generic house model into the stable segments; the results of a connected-pixel algorithm serve as initial values for the optimization. An application example is presented in which house objects in a village formation are separated from shadows and their pose is determined correctly.
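The multi-threshold segmentation and accumulation steps above can be sketched as follows. This is a simplified illustration (the chosen quantiles are hypothetical, and the edge-guided region growing and homogeneity analysis are omitted): thresholds derived from quantiles adapt to the image brightness, and accumulating the per-threshold masks highlights pixels that are segmented consistently.

```python
import numpy as np

def quantile_segmentations(image, quantiles=(0.90, 0.95, 0.99)):
    """Segment a (temporally integrated) SAR image at several quantile
    thresholds, staying independent of absolute brightness and of the
    number of objects. Returns one binary mask per quantile; bright
    scatterers exceed the thresholds, radar shadows do not.
    """
    thresholds = np.quantile(image, quantiles)
    return [image >= t for t in thresholds]

def accumulate(masks):
    """Accumulate the per-threshold masks; pixels segmented at many
    thresholds form the stable segments kept as object candidates."""
    return np.sum(np.stack([m.astype(int) for m in masks]), axis=0)
```

In the full method, the accumulated segments would still be checked for homogeneity before a generic house model is fitted to them.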