We formulate a simple human-pose tracking theory for monocular video based on the fundamental relationship between changes in pose and image motion vectors. We investigate the natural embedding of the low-dimensional body pose space into a high-dimensional space of body configurations that behaves locally in a linear manner. The embedded manifold facilitates the decomposition of the image motion vectors into basis motion vector fields of the tangent space to the manifold. This approach benefits from the style invariance of image motion flow vectors, and experiments validating the fundamental theory show reasonable accuracy (within 4.9° of ground truth).
Point cloud data present a broad swath of intriguing problems in signal processing. Namely, the data may be sparse, may be non-uniformly sampled in space and time, and cannot be processed directly by conventional techniques such as convolutional filters. This paper addresses such data under the application umbrella of remote sensing. Specifically, we examine the potential of interferometric synthetic aperture radar for detecting geohazards that affect transportation. Using sparsely distributed coherent scatterers on the ground, our algorithms attempt to locate events in progress, such as sinkholes in the vicinity of highways. In theory, the problem reduces to the detection of Gaussian-shaped changes that evolve predictably in space and time. The solution to the detection problem involves two basic approaches, one grounded in pattern matching and the other in statistical signal processing. Essentially, the spatiotemporal pattern matching extends a Hough-like voting algorithm to a method that penalizes deviation from the known model in space and time. To confirm a geohazard location, we can exploit a fixed-time analysis of the distribution of subsidence in the point cloud data by computing mutual information. Results show that the detection and screening strategies agree with geological evidence.
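The spatiotemporal voting idea can be sketched as follows, under stated assumptions. The subsidence model (a Gaussian bowl whose depth grows linearly with time), the parameter names `rate`, `sigma`, and `tau`, and the soft-vote form exp(-residual²/2τ²) are all hypothetical choices for illustration; they are not the paper's exact model or penalty. Each scatterer votes for every candidate center, and its vote shrinks as its measured displacement departs from the model prediction.

```python
import numpy as np

def subsidence_model(points, center, rate, sigma):
    """Hypothetical model: Gaussian-shaped subsidence whose depth grows linearly with time.
    points: (N, 3) array of (x, y, t) for each coherent scatterer."""
    x, y, t = points[:, 0], points[:, 1], points[:, 2]
    r2 = (x - center[0]) ** 2 + (y - center[1]) ** 2
    return -rate * t * np.exp(-r2 / (2.0 * sigma ** 2))

def vote(points, displacement, candidates, rate, sigma, tau):
    """Hough-like accumulation: each scatterer casts a soft vote for each candidate center,
    and the vote decays with deviation from the model in space and time."""
    scores = np.empty(len(candidates))
    for k, c in enumerate(candidates):
        residual = displacement - subsidence_model(points, c, rate, sigma)
        scores[k] = np.sum(np.exp(-residual ** 2 / (2.0 * tau ** 2)))
    return scores

# Synthetic example: scatterers on an 11 x 11 grid observed at three epochs.
xs, ys, ts = np.meshgrid(np.arange(11), np.arange(11), np.arange(1, 4), indexing="ij")
points = np.stack([xs.ravel(), ys.ravel(), ts.ravel()], axis=1).astype(float)
displacement = subsidence_model(points, (5.0, 5.0), rate=1.0, sigma=2.0)
candidates = [(cx, cy) for cx in range(11) for cy in range(11)]
scores = vote(points, displacement, candidates, rate=1.0, sigma=2.0, tau=0.1)
best = candidates[int(np.argmax(scores))]
```

With noise-free synthetic data the highest score falls on the true center; with real InSAR data, `tau` controls how strictly deviations from the model are penalized.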
A grid-based Bayesian array (GBA) for robust visual tracking was recently developed; it introduced a novel method of deterministic sample generation and sample weighting for position estimation. In particular, a target motion model is constructed that predicts the target position in the next frame from the estimates in previous frames. Samples are generated by gridding within an ellipsoid centered at the prediction. For localization, radial edge detection is applied at each sample to determine whether it lies inside the target boundary. Sample weights are then assigned according to the number of edge points detected around the sample and its distance from the predicted position. The position estimate is computed as the weighted sum of the sample set. In this paper, we enhance the capacity of the GBA tracker to handle targets with erratic motion by introducing adaptation in the motion model and iterative position estimation. The improved tracking performance over the original GBA tracker is demonstrated by tracking a single leukocyte in vivo and a ground vehicle target observed in UAV video, both undergoing abrupt changes in motion. The experimental results show that the enhanced GBA tracker outperforms the original: it tracks more than 10% more of the total number of frames and increases the number of video sequences with all frames tracked by more than 20%.
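A minimal sketch of one GBA estimation step follows, assuming a 2-D image, a binary edge map, and an axis-aligned ellipse as the sampling region. The function names, the ray count and length, and the specific weight (edge-ray count times a Gaussian of the distance to the prediction) are illustrative assumptions, not the published GBA formulas.

```python
import numpy as np

def ellipse_grid(pred, axes, step):
    """Deterministic samples on a regular grid inside an ellipse centered at the prediction."""
    px, py = pred
    a, b = axes
    xs = np.arange(px - a, px + a + 1e-9, step)
    ys = np.arange(py - b, py + b + 1e-9, step)
    gx, gy = np.meshgrid(xs, ys, indexing="ij")
    inside = ((gx - px) / a) ** 2 + ((gy - py) / b) ** 2 <= 1.0
    return np.stack([gx[inside], gy[inside]], axis=1)

def radial_edge_count(edges, sample, n_rays=16, max_r=15):
    """Count rays from the sample (x = column, y = row) that hit an edge pixel within max_r."""
    h, w = edges.shape
    count = 0
    for theta in np.linspace(0.0, 2.0 * np.pi, n_rays, endpoint=False):
        for r in range(1, max_r + 1):
            col = int(round(sample[0] + r * np.cos(theta)))
            row = int(round(sample[1] + r * np.sin(theta)))
            if not (0 <= row < h and 0 <= col < w):
                break  # ray left the image without meeting an edge
            if edges[row, col]:
                count += 1
                break
    return count

def gba_estimate(edges, pred, axes, step=1.0, sigma=5.0):
    """Weighted mean of the grid samples; hypothetical weight = edge count * distance prior."""
    samples = ellipse_grid(pred, axes, step)
    counts = np.array([radial_edge_count(edges, s) for s in samples], dtype=float)
    dist2 = np.sum((samples - np.asarray(pred, dtype=float)) ** 2, axis=1)
    weights = counts * np.exp(-dist2 / (2.0 * sigma ** 2))
    if weights.sum() == 0:
        return np.asarray(pred, dtype=float)  # no evidence: keep the prediction
    return (weights[:, None] * samples).sum(axis=0) / weights.sum()
```

A sample inside a closed boundary sees edges along every ray, while a sample outside sees edges only in some directions, so interior samples dominate the weighted sum.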
There exists no measure to quantify the difficulty of a video tracking problem. Such difficulty depends upon the quality of the video and upon the ability to distinguish the target from the background and from other potential targets. We define a trackability measure in an information-theoretic framework. The tools of information theory allow a measure of trackability that seamlessly combines the video-dependent aspects with the target-dependent aspects of tracking difficulty using measures of rate and information content. Specifically, video quality is encapsulated into a term that measures spatial resolution, temporal resolution and signal-to-noise ratio by way of a Shannon-Hartley analysis. Then, the ability to correctly match a template to a target is evaluated through an analysis of the mutual information between the template, the detected signal and the interfering clutter. The trackability measure is compared to the performance of a recent tracker based on scale-space features computed via connected filters. The results show a high-magnitude Spearman correlation between the trackability measure and actual performance.
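The two information-theoretic ingredients can be sketched as follows. The Shannon-Hartley capacity C = B log2(1 + SNR) is standard; mapping video to a bandwidth by taking half the pixel rate (width × height × frames per second, a Nyquist-style assumption) is hypothetical and is not the paper's exact video-quality term. The mutual-information helper is a plain histogram estimate.

```python
import numpy as np

def shannon_hartley(bandwidth_hz, snr):
    """Channel capacity in bits per second for a given bandwidth and linear SNR."""
    return bandwidth_hz * np.log2(1.0 + snr)

def video_rate(width, height, fps, snr):
    # Hypothetical mapping: pixels per second as the sample rate, bandwidth = half of it.
    bandwidth = 0.5 * width * height * fps
    return shannon_hartley(bandwidth, snr)

def mutual_information(a, b, bins=16):
    """Histogram estimate of I(A; B) in bits."""
    joint, _, _ = np.histogram2d(a, b, bins=bins)
    p = joint / joint.sum()
    pa = p.sum(axis=1, keepdims=True)  # marginal of A, shape (bins, 1)
    pb = p.sum(axis=0, keepdims=True)  # marginal of B, shape (1, bins)
    nz = p > 0
    return float(np.sum(p[nz] * np.log2(p[nz] / (pa @ pb)[nz])))
```

For example, a unit bandwidth at SNR = 3 gives log2(4) = 2 bits per second, and two identical binary sequences share exactly 1 bit.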
Tracking human pose from monocular video sequences is a challenging problem due to the large number of independent parameters affecting image appearance and the nonlinear relationships between generating parameters and the resultant images. Unlike the current practice of fitting interpolation functions to point correspondences between underlying pose parameters and image appearance, we exploit the relationship between pose parameters and image motion flow vectors in a physically meaningful way. Change in image appearance due to pose change is realized as navigating a low-dimensional submanifold of the infinite-dimensional Lie group of diffeomorphisms of the two-dimensional sphere S². For small changes in pose, image motion flow vectors lie in the tangent space of the submanifold. Any observed image motion flow vector field is decomposed into the basis motion vector flow fields on the tangent space, and the combination weights are used to update the corresponding pose changes in the different dimensions of the pose parameter space. In experiments with synthetic and real data in which the subjects exhibit variation in appearance and clothing, image motion flow vectors are largely invariant to style changes. The experiments demonstrate the robustness of our method to style variance (within ±4° of ground truth).
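The decomposition step can be sketched as a linear least-squares problem, assuming each basis flow field corresponds to a unit change in one pose parameter and that the flow fields are sampled on a common pixel grid. The function names and array layout are hypothetical; how the basis fields themselves are obtained is not shown.

```python
import numpy as np

def decompose_flow(observed, basis):
    """observed: (H, W, 2) flow field; basis: (K, H, W, 2) tangent basis fields.
    Returns the K least-squares combination weights."""
    A = basis.reshape(basis.shape[0], -1).T  # columns are flattened basis fields
    b = observed.reshape(-1)
    weights, *_ = np.linalg.lstsq(A, b, rcond=None)
    return weights

def update_pose(pose, observed, basis):
    # Assumes basis field k is the flow produced by a unit change in pose parameter k.
    return pose + decompose_flow(observed, basis)
```

Because the tangent-space model is linear only locally, the update is meant for small pose changes between consecutive frames.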
Improvised explosive devices (IEDs) are common and lethal instruments of terrorism, and linking a terrorist entity to a specific device remains a difficult task. In the effort to identify persons associated with a given IED, we have implemented a specialized content-based image retrieval system to search and classify IED imagery. The system makes two contributions to the art. First, we introduce a shape-based matching technique exploiting shape, color, and texture (wavelet) information, based on novel vector field convolution active contours and a novel active contour initialization method that treats coarse segmentation as an inverse problem. Second, we introduce a unique graph-theoretic approach to match annotated printed circuit board images for which no schematic or connectivity information is available. The shape-based image retrieval method, in conjunction with the graph-theoretic tool, provides an effective system for matching IED images. For circuit imagery, the basic retrieval mechanism has a precision of 82.1% and the graph-based method has a precision of 98.1%. As of fall 2007, the working system had processed over 400,000 case images.
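The vector field convolution (VFC) external force used by such active contours can be sketched as the convolution of an edge map with a vector kernel whose vectors point toward the kernel origin with magnitude (r + ε)^(-γ). The kernel radius, γ = 1.5, and the FFT-based convolution helper are illustrative choices, not settings taken from this system.

```python
import numpy as np

def vfc_kernel(radius, gamma=1.5, eps=1e-8):
    """Vector kernel k = m(r) * n, with n the unit vector toward the origin."""
    y, x = np.mgrid[-radius:radius + 1, -radius:radius + 1].astype(float)
    r = np.hypot(x, y)
    m = (r + eps) ** (-gamma)  # magnitude decays as a power of distance
    r_safe = np.where(r == 0, 1.0, r)
    kx = np.where(r == 0, 0.0, -x / r_safe * m)
    ky = np.where(r == 0, 0.0, -y / r_safe * m)
    return kx, ky

def convolve_same(img, kernel):
    """Linear 2-D convolution via zero-padded FFT, cropped to the input size (odd kernels)."""
    h, w = img.shape
    kh, kw = kernel.shape
    shape = (h + kh - 1, w + kw - 1)
    full = np.fft.irfft2(np.fft.rfft2(img, s=shape) * np.fft.rfft2(kernel, s=shape), s=shape)
    top, left = kh // 2, kw // 2
    return full[top:top + h, left:left + w]

def vfc_field(edge_map, radius=10, gamma=1.5):
    """External force field (fx, fy); x is the column axis and y the row axis."""
    kx, ky = vfc_kernel(radius, gamma)
    return convolve_same(edge_map, kx), convolve_same(edge_map, ky)
```

For a single edge pixel, the resulting field at every nearby location points back toward that pixel, which is what pulls a contour onto edges from a distance.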
The marked point process (MPP) provides a useful and theoretically well-established tool for integrating spatial information into the image analysis process. We consider the problem of detecting rolling leukocytes within intravital microscopy images. A first stage of the detection method reduces the detection to a set of points, each one representing a possible leukocyte. Our task is then to decide which points are actual leukocytes. We propose an MPP-based approach that aims to improve both the accuracy and efficiency of the detection process by exploiting the spatial interrelationships. We construct a Markov chain Monte Carlo algorithm to obtain the maximum a posteriori (MAP) estimate of a set of points corresponding to the centroids of leukocytes observed in the image. The optimal solution, in terms of the MAP principle, is computed with respect to all leukocytes, rather than a single leukocyte. A quantitative study of our detection approach demonstrates results that compare very well to those achieved by manual detection and exceed the solution quality given by two competing methods. Our approach can serve as a fully automated substitute for the tedious and time-consuming manual rolling leukocyte detection process.
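A simplified sketch of the MAP search follows, assuming the first stage yields candidate points with a per-point data score and that the prior penalizes pairs of selected points closer than a minimum distance (cells should not overlap). The Metropolis birth/death moves on a fixed candidate set, the geometric cooling schedule, and all parameter values are illustrative assumptions rather than the paper's MCMC sampler.

```python
import numpy as np

def energy(state, scores, close, beta):
    """U(S) = -sum of data scores in S + beta * number of close pairs in S."""
    idx = np.flatnonzero(state)
    data = -scores[idx].sum()
    pairs = close[np.ix_(idx, idx)].sum() / 2  # each close pair is counted twice
    return data + beta * pairs

def mpp_map(points, scores, min_dist, beta=5.0, n_iter=5000, t0=2.0, t1=0.01, seed=0):
    rng = np.random.default_rng(seed)
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    close = (d < min_dist) & ~np.eye(len(points), dtype=bool)
    state = np.zeros(len(points), dtype=bool)
    e = energy(state, scores, close, beta)
    best, best_e = state.copy(), e
    for k in range(n_iter):
        temp = t0 * (t1 / t0) ** (k / (n_iter - 1))  # geometric cooling
        i = rng.integers(len(points))
        state[i] = ~state[i]  # propose birth or death of candidate i
        e_new = energy(state, scores, close, beta)
        if e_new <= e or rng.random() < np.exp(-(e_new - e) / temp):
            e = e_new
            if e < best_e:
                best, best_e = state.copy(), e
        else:
            state[i] = ~state[i]  # reject: undo the proposal
    return np.flatnonzero(best)
```

The interaction term is what makes the optimum joint over all cells: a strong candidate suppresses a weaker overlapping one, which a per-point threshold cannot do.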
An implementation of parametric snakes for object tracking is proposed via generalized deterministic annealing (GDA). Given an arbitrary energy functional that quantifies the quality of the contour solution, GDA computes the snake position by approximating the solution given by stochastic simulated annealing. First, the Markov chain representing the solution space for the snake position is broken into N smaller, local Markov chains representing the position of each discrete snake sample. At each annealing temperature, GDA directly approximates the stationary distribution of the local Markov chains using a mean field approximation for neighboring snake sample positions, and the final distribution reveals the solution. In contrast to the typical implementation via gradient descent, annealing methods can avoid suboptimal local solutions and can be used to compute snakes that are effective in the presence of severe noise and distant initial positions. Unlike simulated annealing, GDA does not rely on random moves to slowly locate a high-quality solution and is thus appropriate for time-critical applications. In this paper, synthetic experiments (on 231 images) compare the edge localization performance of snakes computed by GDA, simulated annealing and gradient descent under varying noise and varying initial snake position. The effectiveness of GDA is also demonstrated in a challenging real-data application (on 910 images) in which white blood cells are tracked in video microscopy.
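The local-chain idea can be sketched with a mean-field annealing loop, assuming a closed snake whose samples each choose among discrete radial positions, an external energy table `e_ext[i, m]`, and a quadratic continuity term with weight `alpha`. This is a generic mean-field annealing sketch in the spirit of GDA, not the exact GDA update rule; every name and schedule value here is an assumption.

```python
import numpy as np

def mean_field_snake(e_ext, radii, alpha=1.0, t0=10.0, t1=0.05, n_temps=40, sweeps=5):
    """e_ext: (N, M) external energy of each snake sample at each candidate radius.
    Each sample keeps a distribution over its M states (its local chain)."""
    n, m = e_ext.shape
    p = np.full((n, m), 1.0 / m)
    for temp in np.geomspace(t0, t1, n_temps):
        for _ in range(sweeps):
            for i in range(n):
                e_local = e_ext[i].copy()
                for j in ((i - 1) % n, (i + 1) % n):  # closed contour neighbors
                    mu = p[j] @ radii
                    var = p[j] @ radii ** 2 - mu ** 2
                    # expected continuity energy E[(r_i - r_j)^2] under neighbor j's distribution
                    e_local += alpha * ((radii - mu) ** 2 + var)
                w = np.exp(-(e_local - e_local.min()) / temp)
                p[i] = w / w.sum()  # Gibbs distribution at the current temperature
    return radii[np.argmax(p, axis=1)]
```

In the test below, one sample has a spurious external minimum far from its neighbors; the neighbor coupling in the mean field keeps it on the common boundary, which is the kind of local trap gradient descent can fall into.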
Cardiac magnetic resonance studies have led to a greater understanding of the pathophysiology of ischemic heart disease. Manual segmentation of myocardial borders, a major task in the data analysis of these studies, is a tedious and time-consuming process subject to observer bias. Automated segmentation reduces the time needed to process studies and removes observer bias. We propose an automated segmentation algorithm that uses an active surface to capture the endocardial and epicardial borders of the left ventricle in a mouse heart. The surface is initialized as the ellipsoid with the maximal gradient inverse coefficient of variation (GICOV) value. The GICOV is the mean divided by the normalized standard deviation of the image intensity gradient in the outward normal direction along the surface. The GICOV is maximal when the surface lies along strong, constant gradients. The surface is then evolved until it maximizes the GICOV value subject to shape constraints. The problem is formulated in a Bayesian framework and is implemented using a Markov chain Monte Carlo technique.
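A 2-D sketch of the GICOV score follows, evaluated on an axis-aligned ellipse rather than a 3-D surface. It assumes the "normalized standard deviation" is the sample standard deviation divided by the square root of the number of contour samples, which turns the score into a t-statistic; the sample count, the nearest-pixel lookup, and the small stabilizing constant are also assumptions.

```python
import numpy as np

def gicov(grad_x, grad_y, center, axes, n_points=64):
    """Mean outward-normal gradient along an ellipse divided by (std / sqrt(N))."""
    t = np.linspace(0.0, 2.0 * np.pi, n_points, endpoint=False)
    a, b = axes
    x = center[0] + a * np.cos(t)
    y = center[1] + b * np.sin(t)
    nx, ny = np.cos(t) / a, np.sin(t) / b  # outward normal direction of the ellipse
    norm = np.hypot(nx, ny)
    nx, ny = nx / norm, ny / norm
    rows = np.clip(np.round(y).astype(int), 0, grad_x.shape[0] - 1)
    cols = np.clip(np.round(x).astype(int), 0, grad_x.shape[1] - 1)
    g = grad_x[rows, cols] * nx + grad_y[rows, cols] * ny  # outward-normal gradient samples
    s = g.std(ddof=1)
    return g.mean() / (s / np.sqrt(n_points) + 1e-12)  # small constant avoids division by zero
```

A contour aligned with a boundary samples a strong and nearly constant normal gradient, so its GICOV is far larger than that of a misplaced contour, even when the misplaced contour also crosses strong gradients.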
A standard approach to extending gray-scale filters to color is to apply the gray-scale filter to each of the red, green, and blue bands separately. This RGB component method, when applied to connected filters, introduces new colors into the image, which could lead to incorrect classification or analysis. We develop a method of applying connected filters to color images using rotation in the hue band of the hue-saturation-value (HSV) color space. The rotation shifts the discontinuity in hue such that the distance to the nearest k-means cluster mean is maximized and the histogram value is simultaneously minimized. When applied to the sample image set, the error from the RGB component approach is roughly 10 times greater than that of our hue rotation approach. Additionally, our approach yields qualitatively improved images, with fewer new colors introduced than the RGB component method.
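One way to choose the hue cut is sketched below, with hues in degrees. The cluster means are taken as inputs (assumed to come from a separate k-means run on hue), and the combined score, normalized distance to the nearest mean minus a weighted, normalized histogram count, is a hypothetical way of trading off the two criteria rather than the paper's rule.

```python
import numpy as np

def circ_dist(a, b):
    """Circular distance between hues in degrees."""
    d = np.abs(a - b) % 360.0
    return np.minimum(d, 360.0 - d)

def choose_hue_cut(hues, cluster_means, n_bins=360, weight=1.0):
    """Pick the hue at which to place the 0/360 discontinuity."""
    hist, edges = np.histogram(hues % 360.0, bins=n_bins, range=(0.0, 360.0))
    centers = (edges[:-1] + edges[1:]) / 2.0
    means = np.asarray(cluster_means, dtype=float)
    dist = np.min(circ_dist(centers[:, None], means[None, :]), axis=1)
    # Hypothetical trade-off: far from every cluster mean, and few pixels at the cut.
    score = dist / 180.0 - weight * hist / max(hist.max(), 1)
    return centers[np.argmax(score)]

def rotate_hue(hues, cut):
    """Rotate hues so that the chosen cut maps to 0 degrees."""
    return (hues - cut) % 360.0
```

After rotation, a connected filter can treat hue as a linear quantity without splitting a color cluster across the 0/360 wrap-around.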
We propose a method that uses projection models in conjunction with a sequential Monte Carlo approach to track rigid targets. We specifically address the problems associated with tracking objects in scenarios characterized by cluttered images and high variability in target scale. The projection model snake is introduced to track a target boundary over a variety of scales by geometrically transforming the boundary to account for three-dimensional relative motion between the target and camera. The complete solution combines the projection model snake with a sequential Monte Carlo method. The projection model Monte Carlo method randomly generates the parameters of target motion and pose from empirically derived distributions. The resultant "particles" are then weighted according to a likelihood determined by integrating the mean gradient magnitude around the target contour, yielding the expected target path and pose. We demonstrate the effectiveness of this approach for tracking dynamic targets in sequences with noise, clutter, occlusion, and scale variability.
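The particle weighting step can be sketched as follows. As a simplifying assumption, the projection model is replaced by a 2-D similarity transform (translation `tx`, `ty` and scale `s`) of a template contour, the random motion uses hypothetical Gaussian noise levels instead of empirically derived distributions, and resampling is plain multinomial.

```python
import numpy as np

def unit_square_contour(n=21):
    """Hypothetical template contour: perimeter of a unit square centered at the origin."""
    t = np.linspace(-0.5, 0.5, n)
    half = np.full(n, 0.5)
    return np.concatenate([
        np.stack([t, -half], axis=1), np.stack([t, half], axis=1),
        np.stack([-half, t], axis=1), np.stack([half, t], axis=1)])

def contour_score(grad_mag, contour, params):
    """Mean gradient magnitude along the transformed contour (x = column, y = row)."""
    tx, ty, s = params
    pts = contour * s + np.array([tx, ty])
    cols = np.clip(np.round(pts[:, 0]).astype(int), 0, grad_mag.shape[1] - 1)
    rows = np.clip(np.round(pts[:, 1]).astype(int), 0, grad_mag.shape[0] - 1)
    return grad_mag[rows, cols].mean()

def particle_step(grad_mag, contour, particles, rng, noise=(1.0, 1.0, 0.05)):
    """Perturb (tx, ty, s) particles, weight them by contour likelihood, estimate, resample."""
    particles = particles + rng.normal(0.0, noise, size=particles.shape)
    w = np.array([contour_score(grad_mag, contour, p) for p in particles]) + 1e-12
    w /= w.sum()
    estimate = w @ particles  # expected motion and scale parameters
    idx = rng.choice(len(particles), size=len(particles), p=w)
    return particles[idx], estimate
```

Carrying scale as a particle dimension is what lets the same template follow a target as it approaches or recedes from the camera.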
Multi-scale, multi-resolution image decompositions are efficacious for real-time target tracking applications. In these real-time systems, objects are initially located using coarse descriptions of the original image. These coarse-scale results then guide and refine further inspection, with queries of higher-resolution image representations restricted to regions of potential object occurrence. The result is the classical coarse-to-fine search. In this paper, we describe a method for generating an adaptive template within the coarse-to-fine framework. Causality properties between image representations are directly exploited and lead to a template mechanism that is resilient to noise and occlusion. With minimal computational requirements, the method is well suited for real-time application.
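A classical coarse-to-fine search can be sketched as below: exhaustive sum-of-squared-differences matching at the coarsest level of a 2×2-averaging pyramid, followed by a small local search at each finer level. The blending update in `adapt_template` is a hypothetical stand-in for an adaptive template; it does not reproduce the causality-based mechanism described above.

```python
import numpy as np

def downsample(img):
    """One pyramid level: average non-overlapping 2x2 blocks."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    img = img[:h, :w]
    return 0.25 * (img[0::2, 0::2] + img[1::2, 0::2] + img[0::2, 1::2] + img[1::2, 1::2])

def ssd_search(img, tmpl, rows, cols):
    """Top-left position minimizing the SSD over the given candidate rows and columns."""
    th, tw = tmpl.shape
    best, best_pos = np.inf, None
    for r in rows:
        for c in cols:
            if 0 <= r <= img.shape[0] - th and 0 <= c <= img.shape[1] - tw:
                d = np.sum((img[r:r + th, c:c + tw] - tmpl) ** 2)
                if d < best:
                    best, best_pos = d, (r, c)
    return best_pos

def coarse_to_fine(img, tmpl, levels=2, radius=1):
    imgs, tmpls = [img], [tmpl]
    for _ in range(levels):
        imgs.append(downsample(imgs[-1]))
        tmpls.append(downsample(tmpls[-1]))
    # Exhaustive search only at the coarsest (cheapest) level.
    r, c = ssd_search(imgs[-1], tmpls[-1], range(imgs[-1].shape[0]), range(imgs[-1].shape[1]))
    for lvl in range(levels - 1, -1, -1):
        r, c = 2 * r, 2 * c  # project the match to the next finer level
        r, c = ssd_search(imgs[lvl], tmpls[lvl],
                          range(r - radius, r + radius + 1), range(c - radius, c + radius + 1))
    return r, c

def adapt_template(tmpl, img, pos, alpha=0.2):
    """Hypothetical update: blend the template toward the matched patch."""
    r, c = pos
    patch = img[r:r + tmpl.shape[0], c:c + tmpl.shape[1]]
    return (1 - alpha) * tmpl + alpha * patch
```

The cost saving comes from the exhaustive search running on an image with 1/16 of the pixels when `levels=2`, while finer levels examine only a 3×3 neighborhood of candidate positions.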
Multi-resolution image analysis uses subsampled image representations for applications such as image coding, hierarchical image segmentation and fast image smoothing. An anti-aliasing filter may be used to ensure that the sampled signals adequately represent the frequency components and features of the higher-resolution signal. Sampling theories associated with linear anti-aliasing filtering are well defined, and conditions for nonlinear filters are emerging. This paper analyzes sampling conditions associated with anisotropic diffusion, an adaptive nonlinear filter implemented by partial differential equations (PDEs). Sampling criteria will be defined within the context of edge causality, and conditions will be prescribed that guarantee removal of all features unsupported in the sample domain. Initially, sampling definitions will use a simple, piecewise linear approximation of the anisotropic diffusion mechanism. Results will then demonstrate the viability of the sampling approach through the computation of reconstruction errors. Extension to more practical diffusion operators will also be considered.
This paper addresses the construction of the edge-preservation coefficient of anisotropic diffusion for image enhancement. We evaluate the performance of existing diffusion techniques in the presence of several corruptive processes. The results of this study were used to design a hybrid algorithm that capitalizes on the strengths of the current diffusion coefficients. The new edge-preserving algorithm adaptively determines the presence of noise or edges at each image location and selects the appropriate diffusion coefficient. The results generated by this algorithm exhibit improvements in mean squared error, signal-to-noise ratio, and visual quality. A comparative study shows the performance of the standard diffusion techniques and the hybrid algorithm for images corrupted by Laplacian-distributed, Gaussian-distributed, and "salt and pepper" impulse noise.
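A sketch of a hybrid diffusion loop follows, using an explicit four-neighbor scheme. The per-pixel selection rule is a hypothetical stand-in for the paper's detector: a pixel that deviates from its 3×3 median by more than `noise_thresh` is treated as an impulse and diffused isotropically (coefficient 1), and all other pixels use the Perona-Malik exponential coefficient exp(-(|∇I|/k)²). With at most four unit-weight neighbors, the step dt = 0.2 keeps each update a convex combination of neighboring values.

```python
import numpy as np

def c_exp(g, k):
    """Perona-Malik edge-stopping coefficient exp(-(g/k)^2)."""
    return np.exp(-(g / k) ** 2)

def median3(img):
    """3x3 median filter with edge replication."""
    p = np.pad(img, 1, mode="edge")
    h, w = img.shape
    stack = [p[r:r + h, c:c + w] for r in range(3) for c in range(3)]
    return np.median(stack, axis=0)

def hybrid_diffusion(img, k=10.0, dt=0.2, n_iter=20, noise_thresh=30.0):
    u = img.astype(float).copy()
    for _ in range(n_iter):
        noisy = np.abs(u - median3(u)) > noise_thresh  # hypothetical impulse detector
        p = np.pad(u, 1, mode="edge")
        diffs = [p[:-2, 1:-1] - u, p[2:, 1:-1] - u, p[1:-1, :-2] - u, p[1:-1, 2:] - u]
        flux = np.zeros_like(u)
        for d in diffs:
            # Impulses get full smoothing; elsewhere large differences (edges) stop diffusion.
            coeff = np.where(noisy, 1.0, c_exp(np.abs(d), k))
            flux += coeff * d
        u += dt * flux
    return u
```

With the exponential coefficient alone, an isolated impulse looks like a strong edge and survives; the per-pixel switch lets the impulse decay while a true step edge keeps its contrast.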
We introduce a novel unsupervised image segmentation technique that is based on piecewise constant (PICO) regression. Given an input image, a PICO output image for a specified feature size (scale) is computed via nonlinear regression. The regression effectively provides the constant region segmentation of the input image that has a minimum deviation from the input image. PICO regression-based segmentation avoids the problems of region merging, poor localization, region boundary ambiguity, and region fragmentation. Additionally, our segmentation method is particularly well-suited for corrupted (noisy) input data. An application to segmentation and classification of remotely sensed imagery is provided.
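A 1-D illustration of PICO regression follows, assuming that "feature size" means every constant segment must span at least `min_len` samples and that deviation is measured by squared error. For 1-D signals this problem can be solved exactly by dynamic programming, which is the technique used here; it is a swapped-in solver for illustration, not the paper's optimization over 2-D images.

```python
import numpy as np

def pico_1d(signal, min_len):
    """Least-squares piecewise constant fit with every segment at least min_len samples."""
    x = np.asarray(signal, dtype=float)
    n = len(x)
    cs = np.concatenate([[0.0], np.cumsum(x)])
    cs2 = np.concatenate([[0.0], np.cumsum(x ** 2)])

    def sse(i, j):
        """Squared error of fitting x[i:j] by its mean, via prefix sums."""
        s = cs[j] - cs[i]
        return (cs2[j] - cs2[i]) - s * s / (j - i)

    cost = np.full(n + 1, np.inf)  # cost[j]: best error for the prefix x[:j]
    cost[0] = 0.0
    prev = np.zeros(n + 1, dtype=int)
    for j in range(min_len, n + 1):
        for i in range(0, j - min_len + 1):
            if np.isfinite(cost[i]):
                c = cost[i] + sse(i, j)
                if c < cost[j]:
                    cost[j], prev[j] = c, i
    out = np.empty(n)
    j = n
    while j > 0:  # backtrack the segment boundaries
        i = prev[j]
        out[i:j] = x[i:j].mean()
        j = i
    return out
```

A one-sample spike cannot form its own region, so it is absorbed into a constant segment of the minimum width; this is how the scale parameter suppresses features smaller than the chosen size.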
We introduce new classes of image enhancement techniques that are based on optimizing local characteristics of the image. Using a new optimization technique for nonconvex combinatorial optimization problems, generalized deterministic annealing (GDA), we compute fuzzy nonlinear regressions of noisy images with respect to characteristic image sets defined by certain local image models. The image enhancement results demonstrate the power of nonlinear regression and the low-cost, high-quality optimization provided by GDA.
Generalized Deterministic Annealing (GDA) is a useful new tool for fast multi-state combinatorial optimization of difficult nonconvex problems. By estimating the stationary distribution of simulated annealing (SA), GDA yields solutions equivalent to those of practical SA algorithms while providing a significant speed improvement. Standard GDA can reduce the computational time of SA by an order of magnitude, and with a new implementation improvement, Windowed GDA, the time savings reach two orders of magnitude with a negligible loss in solution quality. The fast optimization of GDA has enabled expeditious computation of complex nonlinear image enhancement paradigms, such as the Piecewise Constant (PICO) regression examples used in this paper. To validate our analytical results, we apply GDA to the PICO regression problem and compare the results to other optimization methods. Several full-image examples are provided that show successful PICO image enhancement using GDA in the presence of both Laplacian and Gaussian additive noise.
This paper presents a new cooperative technique for solving the dense stereo correspondence problem in natural images using mean field theory (MFT). Given a gray-scale stereo image pair, the disparity map for the scene is modeled as a locally interconnected network of graded neurons. The network encodes the correspondence problem as an energy function composed of terms representing disparity uniqueness, disparity continuity, and system stability evaluated at each neuron. An MFT approximation to the simulated annealing process commonly used to locate the minimum energy solution for the disparity map is introduced and developed. Results using this approach are compared with those from a standard simulated annealing algorithm and demonstrate a significant improvement in rate of convergence with comparable solution quality.
We have developed a system, Generalized cylinder Recognition Using Perceptual Organization (GRUPO), that performs model-based recognition of the projections of generalized cylinders. Motivated by psychological theory, the approach uses perceptual organization, the grouping of structurally significant features, to limit the object and viewpoint search spaces in recognition. The system receives feature data from a segmentation based on perceptual organization and ranks the object space according to estimates of conditional object probabilities. Depth information is not used in the approach.
To complete the recognition system, several problems were solved. For modeling, theoretical contributions include a proof for the invariance of discontinuities to projection, a method to find the axis of symmetry, and a technique for determining self-occlusion. For the recognition process, solutions to search administration, feature matching, probabilistic search of the object space, and final template matching have been developed. The theory has been implemented and tested on synthetic data.