KEYWORDS: Facial recognition systems, Principal component analysis, 3D acquisition, Detection and tracking algorithms, Data modeling, 3D modeling, Feature selection, Algorithm development, Performance modeling, 3D image processing
We propose a novel method to improve the performance of existing three-dimensional (3D) human face recognition algorithms that employ Euclidean distances between facial fiducial points as features. We further investigate a novel 3D face recognition algorithm that employs geodesic and Euclidean distances between facial fiducial points. We demonstrate that this algorithm is robust to changes in facial expression. Geodesic and Euclidean distances were calculated between pairs of 25 facial fiducial points. For the proposed algorithm, geodesic distances and 'global curvature' characteristics, defined as the ratio of the geodesic to the Euclidean distance between a pair of points, were employed as features. The most discriminatory features were selected using stepwise linear discriminant analysis (LDA). These were projected onto 11 LDA directions, and face models were matched using the Euclidean distance metric. With a gallery set containing one image each of 105 subjects and a probe set containing 663 images of the same subjects, the algorithm produced an equal error rate (EER) of 1.4% and a rank-1 recognition rate (RR) of 98.64%. It performed significantly better than existing algorithms based on principal component analysis and LDA applied to face range images. Its verification performance for expressive faces was also significantly better than that of an algorithm that employed Euclidean distances between facial fiducial points as features.
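The feature construction described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the 25 fiducial points are given as 3D coordinates and that the matrix of surface geodesic distances between them has already been computed on the face mesh (the stepwise feature selection and LDA projection steps are omitted).

```python
import numpy as np

def pairwise_features(points, geodesic):
    """Build distance features for one face model.

    points   : (25, 3) array of 3D fiducial point coordinates.
    geodesic : (25, 25) matrix of surface geodesic distances between the
               same points (assumed precomputed on the face mesh).
    Returns the geodesic distances and the 'global curvature' ratios
    (geodesic / Euclidean) over all unordered point pairs.
    """
    n = len(points)
    iu = np.triu_indices(n, k=1)                 # indices of unordered pairs
    diff = points[:, None, :] - points[None, :, :]
    euclid = np.linalg.norm(diff, axis=-1)[iu]   # straight-line distances
    geo = geodesic[iu]                           # surface distances
    curvature = geo / euclid                     # 'global curvature' ratio
    return np.concatenate([geo, curvature])      # 2 * n*(n-1)/2 features

def match(probe_feat, gallery_feats):
    """Index of the nearest gallery model under the Euclidean metric
    (applied after feature selection and LDA projection in the paper)."""
    d = np.linalg.norm(gallery_feats - probe_feat, axis=1)
    return int(np.argmin(d))
```

With 25 points this yields 300 unordered pairs and a 600-dimensional raw feature vector, from which the most discriminatory components would then be selected.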
Understanding human behavior in video is essential in numerous applications including smart surveillance, video annotation/retrieval, and human-computer interaction.
However, recognizing human interactions is a challenging task due to ambiguity in body articulation, variations in body size and appearance, loose clothing, mutual occlusion, and shadows.
In this paper we present a framework for recognizing human actions and interactions in color video, and a hierarchical graphical model that unifies multiple levels of processing in video computing: the pixel level, blob level, object level, and event level. A mixture-of-Gaussians (MoG) model is used at the pixel level to train on and classify individual pixel colors. Relaxation labeling with an attributed relational graph (ARG) is used at the blob level to merge pixels into coherent blobs and to register inter-blob relations. At the object level, the poses of individual body parts are recognized using Bayesian networks (BNs). At the event level, the actions of a single person are modeled using a dynamic Bayesian network (DBN). The object-level descriptions for each person are juxtaposed along a common timeline to identify an interaction between two persons. The linguistic 'verb argument structure' is used to represent human actions as triplets, yielding meaningful semantic descriptions. Our system achieves semantic descriptions of positive, neutral, and negative interactions between two persons: hand-shaking, standing hand-in-hand, and hugging as the positive interactions; approaching, departing, and pointing as the neutral interactions; and pushing, punching, and kicking as the negative interactions.
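The pixel-level stage above classifies each color pixel by its likelihood under per-class Gaussian mixtures. A minimal sketch of that step, assuming diagonal-covariance components and illustrative class names (the paper's actual classes, training procedure, and covariance structure may differ):

```python
import numpy as np

def gaussian_pdf(x, mean, var):
    """Diagonal-covariance Gaussian density for an RGB pixel."""
    return np.prod(np.exp(-0.5 * (x - mean) ** 2 / var) /
                   np.sqrt(2.0 * np.pi * var))

def classify_pixel(x, mixtures):
    """Assign an RGB pixel to the class whose Gaussian mixture gives the
    highest likelihood. `mixtures` maps a class name to a list of
    (weight, mean, var) components; the names here are hypothetical."""
    scores = {c: sum(w * gaussian_pdf(x, m, v) for w, m, v in comps)
              for c, comps in mixtures.items()}
    return max(scores, key=scores.get)
```

The classified pixels would then be merged into coherent blobs by the relaxation-labeling stage.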
This paper presents a methodology for object recognition in complex scenes by learning multiple-feature object representations in second-generation Forward Looking InfraRed (FLIR) images. A hierarchical recognition framework is developed that solves the recognition task by performing classification using decisions from the lower levels together with the input features. The system uses new algorithms for detection and segmentation of objects and a Bayesian formulation for combining multiple object features for improved discrimination. Experimental results on a large database of FLIR images are presented to validate the robustness of the system and its applicability to FLIR imagery obtained from real scenes.
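One standard way to realize a Bayesian combination of multiple object features is naive-Bayes fusion: multiply the per-feature likelihoods with the class prior and normalize, under a conditional-independence assumption. The paper's exact formulation may differ; this is only an illustrative sketch.

```python
import numpy as np

def combine_posteriors(likelihoods, priors):
    """Fuse per-feature evidence into class posteriors.

    likelihoods : (n_features, n_classes) array, likelihoods[i, c] = p(f_i | c),
                  with features assumed conditionally independent given the class.
    priors      : (n_classes,) array of prior class probabilities.
    Returns the normalized posterior over classes, computed in log space
    for numerical stability.
    """
    log_post = np.log(priors) + np.log(likelihoods).sum(axis=0)
    post = np.exp(log_post - log_post.max())  # shift before exponentiating
    return post / post.sum()
```

The class with the highest fused posterior would then be reported as the recognized object.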
A computational framework is presented to extract geometric structures and a region of interest from a monochrome image for the detection of man-made objects in a non-urban scene. The framework is based on the principles of perceptual organization. Several new techniques are developed to implement this framework. Examples using real complex images are presented to show the effectiveness of the approach.
This paper presents a new approach to integrated modeling of three-dimensional objects for generating visual and thermal images under different viewing, ambient, and internal conditions. A volume surface octree is used for object modeling. It is shown that this representation is suitable for thermal modeling of complex objects with non-homogeneities and heat generation. The technique for incorporating non-homogeneities and heat generation using octree intersection is described. The proposed model may be used to predict a discriminatory feature for object recognition based on the surface temperature and intrinsic thermal properties under any desired ambient condition. The model is designed to be used in a multi-sensor vision system with a hypothesize-and-verify approach. Several examples of the generated thermal and visual images are presented, which illustrate the usefulness of the approach.
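The volume surface octree mentioned above recursively subdivides a cube into eight octants, keeping leaves that are fully inside or fully outside the object and refining the mixed (surface) cubes. A minimal sketch of that idea, assuming the object is given as a point-membership test and using corner sampling to classify a cube (a real volume surface octree construction would use exact geometric tests):

```python
def build_octree(center, size, inside, depth):
    """Classify a cube as fully inside ('full'), fully outside ('empty'),
    or mixed, subdividing mixed cubes until `depth` is exhausted.

    center : (x, y, z) cube center.
    size   : cube edge length.
    inside : point-membership test for the object (an assumption here;
             the paper builds the octree from an object model).
    """
    cx, cy, cz = center
    h = size / 2.0
    corners = [(cx + dx * h, cy + dy * h, cz + dz * h)
               for dx in (-1, 1) for dy in (-1, 1) for dz in (-1, 1)]
    flags = [inside(p) for p in corners]
    if all(flags):
        return 'full'                      # volume node inside the object
    if not any(flags) and not inside(center):
        return 'empty'                     # volume node outside the object
    if depth == 0:
        return 'surface'                   # mixed leaf: part of the surface
    q = size / 4.0                         # subdivide into eight octants
    return [build_octree((cx + dx * q, cy + dy * q, cz + dz * q),
                         h, inside, depth - 1)
            for dx in (-1, 1) for dy in (-1, 1) for dz in (-1, 1)]
```

Intersecting two such octrees node by node is what allows non-homogeneous material regions and internal heat sources to be embedded in the object model.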