In this paper, we propose a robust visual tracking algorithm based on online learning of a joint sparse dictionary. The joint sparse dictionary consists of positive and negative sub-dictionaries, which model foreground and background objects respectively. An online dictionary learning method is developed to update the joint sparse dictionary by selecting both positive and negative bases from bags of positive and negative image patches/templates during tracking. A linear classifier is trained with sparse coefficients of image patches in the current frame, which are calculated using the joint sparse dictionary. This classifier is then used to locate the target in the next frame. Experimental results show that our tracking method is robust against object variation, occlusion and illumination change.
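The core mechanic of the abstract above, coding a patch over a joint positive/negative dictionary and labelling it from the code, can be sketched as follows. This is a minimal illustration, not the paper's method: the dictionary here is random rather than learned online, sparse coding uses a simple orthogonal matching pursuit, and an energy comparison over the two sub-dictionaries stands in for the trained linear classifier.

```python
import numpy as np

rng = np.random.default_rng(0)

def omp(D, x, k):
    """Greedy orthogonal matching pursuit: a k-sparse code of x over
    the columns (unit-norm atoms) of D."""
    idx = []
    coef = np.zeros(D.shape[1])
    residual = x.astype(float).copy()
    for _ in range(k):
        j = int(np.argmax(np.abs(D.T @ residual)))
        if j in idx:
            break
        idx.append(j)
        c, *_ = np.linalg.lstsq(D[:, idx], x, rcond=None)
        residual = x - D[:, idx] @ c
    coef[idx] = c
    return coef

# Toy joint dictionary in R^16: 4 foreground (positive) and 4 background
# (negative) atoms, stacked side by side and normalised.
D_pos = rng.normal(size=(16, 4))
D_neg = rng.normal(size=(16, 4))
D = np.hstack([D_pos, D_neg])
D /= np.linalg.norm(D, axis=0)

def classify(patch, k=2):
    """Label a patch by where its sparse-code energy falls: +1 if the
    positive sub-dictionary explains more of it, -1 otherwise.  (The paper
    trains a linear classifier on the codes; this rule is a stand-in.)"""
    a = omp(D, patch, k)
    return 1 if np.sum(a[:4] ** 2) >= np.sum(a[4:] ** 2) else -1

# A patch built from a foreground atom should be labelled positive.
patch = 3.0 * D[:, 0] + 0.01 * rng.normal(size=16)
print(classify(patch))
```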
This paper presents a new, fully automatic approach to multi-view urban scene reconstruction. Our algorithm is based on the Manhattan-World assumption, which yields compact models while preserving the fidelity of the synthesized architecture. Starting from a dense point cloud, we extract its main axes by global optimization and construct a non-uniform volume based on them. A graph model is created from volume facets rather than voxels, and appropriate edge weights are defined to ensure the validity and quality of the surface reconstruction. Compared with common point-cloud-to-model methods, the proposed methodology exploits image information to unveil the real structures of holes in the point cloud. Experiments demonstrate the encouraging performance of the algorithm.
Vehicle color recognition plays an important role in intelligent transportation systems. Most state-of-the-art methods roughly take all pixels into consideration, even though many parts of a car, such as the windows and wheels, carry no color information; these methods also do not adequately reduce the influence of sunlight. In this paper, we propose a novel approach that estimates the RGB value of the car body rather than merely classifying the vehicle's color, and achieves state-of-the-art performance. We filter out the uninformative parts automatically and estimate the influence of sunlight on each pixel by introducing the specular-free image and the weighted-light-influence image. Experimental results demonstrate the performance of the proposed scheme in differentiating cars with very similar colors.
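The specular-free image mentioned above rests on the dichromatic reflection model: the specular (highlight) term is roughly equal in all three channels, so subtracting the per-pixel minimum channel cancels it. The sketch below shows only that core step, under stated assumptions; it is not the paper's full weighted-light-influence pipeline.

```python
import numpy as np

def specular_free(img):
    """Subtract the per-pixel minimum channel from every channel.  Under the
    dichromatic reflection model the specular component is achromatic
    (equal in R, G and B), so it cancels out -- the core of the
    specular-free image used to discount highlights."""
    img = np.asarray(img, dtype=float)
    return img - img.min(axis=-1, keepdims=True)

# A body-colour pixel with and without a white specular highlight of 60
# maps to the same specular-free value.
pixel = np.array([[200.0, 150.0, 100.0]])
print(specular_free(pixel))          # [[100.  50.   0.]]
print(specular_free(pixel + 60.0))   # identical
```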
In this paper, we propose a novel approach for violence detection and localization in public scenes. Violence detection is considerably under-researched compared with general action recognition. Although existing methods can detect the presence of violence in a video, they cannot precisely locate the regions of the scene where the violence is happening. This paper tackles that challenge and proposes a novel method to localize violence within the scene, which is important for public surveillance. The Gaussian Mixture Model is extended into the optical flow domain to detect candidate violence regions. In each region, a new descriptor, the Histogram of Optical Flow Orientation (HOFO), is proposed to measure spatial-temporal features, and a linear SVM is trained on the descriptor. The performance of the method is demonstrated on the publicly available BEHAVE and CAVIAR data sets.
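A histogram of optical flow orientation can be sketched in a few lines: bin the flow-vector angles, weight each vote by flow magnitude, and normalise. This is a minimal reading of the HOFO descriptor; the paper's spatial-temporal layout and bin count are not reproduced here.

```python
import numpy as np

def hofo(flow, n_bins=8):
    """Histogram of Optical Flow Orientation: bin flow-vector angles over
    [0, 2*pi), weight each vote by flow magnitude, then L1-normalise."""
    u, v = flow[..., 0].ravel(), flow[..., 1].ravel()
    mag = np.hypot(u, v)
    ang = np.arctan2(v, u) % (2.0 * np.pi)
    bins = np.minimum((ang / (2.0 * np.pi) * n_bins).astype(int), n_bins - 1)
    hist = np.bincount(bins, weights=mag, minlength=n_bins)
    total = hist.sum()
    return hist / total if total > 0 else hist

# Uniform rightward flow puts all mass in bin 0 (angle ~ 0).
flow = np.zeros((4, 4, 2))
flow[..., 0] = 1.0
print(hofo(flow))
```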
This paper presents a robust method for finding correct SIFT keypoint matches using an adaptive distance-ratio threshold. First, the reference image is analyzed by extracting characteristics of its SIFT keypoints, such as their distance to the object boundary and the number of neighboring keypoints, and a matching credit for each keypoint is evaluated from these characteristics. Second, an adaptive distance-ratio threshold is determined for each keypoint from its matching credit and used to judge the correctness of its best match in the source image. The adaptive threshold loosens the matching conditions for keypoints with high matching credit and tightens them for keypoints with low matching credit. Our approach improves SIFT keypoint matching by applying an adaptive distance-ratio threshold rather than a global threshold that ignores the differing matching credits of individual keypoints. Experimental results show that our algorithm outperforms the standard SIFT matching method in some complicated object recognition cases, discarding more false matches while preserving more correct matches.
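The decision rule can be sketched as a credit-dependent variant of Lowe's ratio test. The linear mapping and the constants `base` and `spread` below are illustrative assumptions; the paper derives the credit from boundary distance and neighbourhood density, which is not reproduced here.

```python
def adaptive_ratio_match(d_best, d_second, credit, base=0.8, spread=0.2):
    """Lowe-style distance-ratio test with a credit-adaptive threshold:
    a keypoint with matching credit 1 keeps the loose threshold `base`,
    while credit 0 tightens it to `base - spread`.  `credit` in [0, 1]."""
    threshold = base - spread * (1.0 - credit)
    return (d_best / d_second) < threshold
```

With these defaults, the same candidate match (best distance 0.7, second-best 1.0) is accepted for a high-credit keypoint (0.7 < 0.8) but rejected for a low-credit one (0.7 > 0.6), which is exactly the loosen/tighten behaviour the abstract describes.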
Acquisition-to-acquisition signal intensity variations (non-standardness) are inherent in MR images. Standardization is a post-processing method for correcting inter-subject intensity variations by transforming all images from their given gray scales into a standard gray scale in which similar intensities acquire similar tissue meanings. The lack of a standard image intensity scale in MRI leads to many difficulties in tissue characterizability, image display, and analysis, including image segmentation. This phenomenon has been well documented; however, the effects of standardization on medical image registration have not yet been studied. In this paper, we investigate the influence of intensity standardization on registration tasks through systematic and analytic evaluations involving clinical MR images. We conducted nearly 20,000 clinical MR image registration experiments and evaluated the quality of the registrations both quantitatively and qualitatively. The evaluations show that intensity variations between images degrade the accuracy of registration. The results imply that the accuracy of image registration depends not only on spatial and geometric similarity but also on the similarity of the intensity values for the same tissues in different images.
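Intensity standardization of the kind discussed above is commonly done with a piecewise-linear map from an image's own intensity percentiles onto fixed standard-scale landmarks. The sketch below follows that spirit; the landmark values and percentiles are illustrative assumptions, not the protocol used in the paper's experiments.

```python
import numpy as np

def standardize(img, standard_landmarks=(0.0, 2048.0, 4096.0), pcts=(1, 50, 99)):
    """Piecewise-linear intensity standardization: map the image's own
    intensity percentiles onto fixed standard-scale landmark values, so
    similar intensities acquire similar tissue meanings across scans."""
    img = np.asarray(img, dtype=float)
    src = np.percentile(img, pcts)          # this image's landmark positions
    return np.interp(img, src, standard_landmarks)

# Two acquisitions of the same anatomy that differ only by gain and offset
# land on the same standard scale.
img = np.linspace(0.0, 200.0, 101)
print(np.allclose(standardize(img), standardize(img * 3.0 + 10.0)))
```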
Scale is a fundamental concept in computer vision and pattern recognition, especially in shape analysis, image segmentation, and registration. It represents the level of detail of object information in scenes. Global scale methods in image processing process the scene at each of several fixed scales and combine the results, as in scale-space approaches. Local scale approaches define the largest homogeneous region at each point and treat these regions as fundamental units. A similar dichotomy exists for describing shapes. To vary the level of detail depending on the application, it is desirable to be able to detect dominant points on shape boundaries at different scales. In this paper, we compare global and local scale approaches to shape analysis. For global scale, we select the Curvature Scale Space (CSS) method, a state-of-the-art shape descriptor used in the MPEG-7 standard. The local scale approach is based on the notion of curvature-scale (c-scale), a new local scale concept that brings the idea of local morphometric scale (such as ball-, tensor-, and generalized scale), developed for images, to the realm of boundaries. All previous methods of extracting dominant points lack this concept of a local scale. We present a thorough evaluation of these global and local scale methods. Our analysis indicates that locally adaptive scale has advantages over global scale in shape description, just as has been demonstrated for image filtering, segmentation, and registration.
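One simplified reading of the c-scale idea, the extent of the largest connected boundary run with homogeneous curvature around each element, can be sketched as follows. The curvature estimator, the growth rule, and the homogeneity threshold `tol` are all illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def curvature(pts):
    """Discrete curvature of a closed polygonal boundary: turning angle at
    each vertex divided by the local arc length."""
    prev, nxt = np.roll(pts, 1, axis=0), np.roll(pts, -1, axis=0)
    v1, v2 = pts - prev, nxt - pts
    cross = v1[:, 0] * v2[:, 1] - v1[:, 1] * v2[:, 0]
    dot = np.einsum('ij,ij->i', v1, v2)
    turn = np.arctan2(cross, dot)
    ds = 0.5 * (np.linalg.norm(v1, axis=1) + np.linalg.norm(v2, axis=1))
    return turn / ds

def c_scale(pts, tol=0.05):
    """For each boundary element, the arc length of the largest connected
    run of elements whose curvature stays within `tol` of its own."""
    k = curvature(pts)
    n = len(pts)
    step_len = np.linalg.norm(np.roll(pts, -1, axis=0) - pts, axis=1)
    out = np.zeros(n)
    for i in range(n):
        total, seen = 0.0, {i}
        for d in (1, -1):                 # grow forward, then backward
            j = (i + d) % n
            while j not in seen and abs(k[j] - k[i]) <= tol:
                seen.add(j)
                total += step_len[j]
                j = (j + d) % n
        out[i] = total
    return out

# On a circle, curvature is homogeneous everywhere, so every boundary
# element's c-scale is (nearly) the whole perimeter.
t = np.linspace(0.0, 2.0 * np.pi, 200, endpoint=False)
circle = np.c_[5.0 * np.cos(t), 5.0 * np.sin(t)]
print(c_scale(circle)[0])
```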
Model-based segmentation approaches, such as those employing Active Shape Models (ASMs), have proved to be
useful for medical image segmentation and understanding. To build the model, however, we need an annotated
training set of shapes wherein corresponding landmarks are identified in every shape. Manual positioning of
landmarks is a tedious, time-consuming, and error-prone task, and almost impossible in 3D. In an
attempt to overcome some of these drawbacks, we have devised several automatic methods under two approaches:
c-scale based and shape variance based. The c-scale based methods use the concept of local curvature to find
landmarks on the mean shape of the training set. These landmarks are then propagated to all the shapes of
the training set to establish correspondence in a local-to-global manner. The variance-based method is guided
by the strategy of equalizing the shape variance contained in the training set when selecting landmarks. The
main premise is that this strategy itself handles the correspondence issue while deploying landmarks frugally
and optimally with respect to shape variations. The desired landmarks are positioned
around each contour so as to equally distribute the total variance existing in the training set in a global-to-local
manner. The methods are evaluated on 40 MRI foot data sets and compared in terms of compactness. The
results show that, for the same number of landmarks, the proposed methods are more compact than manual and
equally spaced methods of annotation, and the variance equalization method tops the list.
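The variance-equalization strategy above can be sketched as follows: compute the per-point variance across the training set and place landmarks so that each inter-landmark stretch carries an equal share of it. The dense pre-alignment and resampling of the shapes, and the simple cumulative-sum placement, are assumptions of this sketch.

```python
import numpy as np

def variance_equalized_landmarks(shapes, n_landmarks):
    """Pick landmark indices so each inter-landmark stretch of the boundary
    carries an equal share of the training set's point-wise variance.
    `shapes` is an (m, n, 2) array of m aligned training shapes, each
    densely resampled to the same n boundary points."""
    var = shapes.var(axis=0).sum(axis=1)          # per-point variance, (n,)
    cum = np.cumsum(var)
    targets = np.linspace(0.0, cum[-1], n_landmarks, endpoint=False)
    return np.searchsorted(cum, targets)

rng = np.random.default_rng(1)
# High variance on the first 50 boundary points, almost none on the rest:
# the landmarks should crowd into the variable first half.
shapes = np.concatenate([rng.normal(0.0, 1.0, (50, 50, 2)),
                         rng.normal(0.0, 0.01, (50, 50, 2))], axis=1)
print(variance_equalized_landmarks(shapes, 4))
```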
A multi-tensor model with identifiable parameters is developed for diffusion weighted MR images. A new parameterization method guarantees the symmetric positive-definiteness of the diffusion tensor. We set up a Bayesian method for parameter estimation. To investigate properties of the method, Monte Carlo simulated data from three distinct DTI direction schemes have been analyzed. The multi-tensor model with automatic model selection has also been applied to a healthy human brain dataset. Standard tensor-derived maps are obtained when the single-tensor model is fitted to a region of interest with a single dominant fiber direction. High anisotropy diffusion flows and main diffusion directions can be shown clearly in the FA map and diffusion ellipsoid map. For another region containing crossing fiber bundles, we estimate and display the ellipsoid map under the single tensor and double-tensor regimes of the multi-tensor model, suitably thresholding the Bayes factor for model selection.
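A parameterization that guarantees symmetric positive-definiteness, as the abstract requires, is commonly built from a Cholesky factor. The sketch below shows one standard identifiable mapping of this kind; the paper's exact parameterization may differ in detail.

```python
import numpy as np

def tensor_from_params(p):
    """Map an unconstrained 6-vector to a symmetric positive-definite 3x3
    diffusion tensor through a Cholesky factor, D = L L^T, with exp() on
    the diagonal so every parameter vector yields a valid tensor."""
    p = np.asarray(p, dtype=float)
    L = np.zeros((3, 3))
    L[np.diag_indices(3)] = np.exp(p[:3])       # strictly positive diagonal
    L[np.tril_indices(3, -1)] = p[3:]           # free off-diagonal entries
    return L @ L.T

D = tensor_from_params([-0.2, 0.1, 0.3, 0.5, -0.4, 0.2])
print(np.linalg.eigvalsh(D))   # all positive
```

Because every point of R^6 maps to a valid tensor, a Bayesian sampler can explore the parameter space freely without rejecting non-physical proposals.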
Segmentation of organs in medical images is a difficult task requiring very often the use of model-based approaches.
To build the model, we need an annotated training set of shape examples with correspondences
indicated among shapes. Manual positioning of landmarks is a tedious, time-consuming, and error-prone task,
and almost impossible in 3D. To overcome some of these drawbacks, we devised an automatic method
based on the notion of c-scale, a new local scale concept. For each boundary element b, the arc length of the
largest homogeneous curvature region connected to b is estimated as well as the orientation of the tangent at b.
With this shape description method, we can automatically locate mathematical landmarks selected at different
levels of detail. The method avoids the use of landmarks for the generation of the mean shape. The selection of
landmarks on the mean shape is done automatically using the c-scale method. Then, these landmarks are propagated
to each shape in the training set, thereby defining the correspondences among the shapes. Altogether
12 strategies are described along these lines. The methods are evaluated on 40 MRI foot data sets, the object of
interest being the talus bone. The results show that, for the same number of landmarks, the proposed methods
are more compact than manual and equally spaced annotations. The approach is applicable to spaces of any
dimensionality, although in this paper we have focused on 2D shapes.
In this paper, we propose three novel and important methods for the registration of histological images for 3D
reconstruction. First, possible intensity variations and nonstandardness in images are corrected by an intensity
standardization process that maps the image scale onto a standard scale in which similar intensities correspond
to similar tissue meanings. Second, 2D histological images are mapped into a feature space where continuous
variables are used as high confidence image features for accurate registration. Third, we propose an automatic
best reference slice selection algorithm that improves reconstruction quality based on both image entropy and
mean square error of the registration process. We demonstrate that the choice of reference slice has a significant
impact on registration error, standardization, feature space and entropy information. After 2D histological slices
are registered through an affine transformation with respect to an automatically chosen reference, the 3D volume
is reconstructed by co-registering 2D slices elastically.
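The reference slice selection criterion, favouring high image entropy and low registration error, can be sketched as below. The equal weighting and max-normalisation of the two terms are illustrative assumptions, not the paper's exact selection rule.

```python
import numpy as np

def image_entropy(img, bins=64):
    """Shannon entropy (bits) of the grey-level histogram."""
    hist, _ = np.histogram(img, bins=bins)
    p = hist / hist.sum()
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

def best_reference_slice(slices, mse_to_neighbours):
    """Score each slice by high entropy (information content) and low mean
    registration MSE to its neighbours; return the best-scoring index."""
    ent = np.array([image_entropy(s) for s in slices])
    mse = np.asarray(mse_to_neighbours, dtype=float)
    score = ent / ent.max() - mse / mse.max()
    return int(np.argmax(score))

rng = np.random.default_rng(2)
# One information-rich slice among two nearly blank ones.
slices = [rng.integers(0, 256, (32, 32)),
          np.full((32, 32), 5), np.full((32, 32), 7)]
print(best_reference_slice(slices, [1.0, 1.0, 1.0]))   # -> 0
```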
A new boundary shape description based on the notion of curvature-scale is presented. This shape descriptor outperforms Rosenfeld's commonly used method of curvature estimation and can be applied directly to digital boundaries without requiring prior approximations. It can extract special points of interest such as convex and concave corners, straight lines, circular segments, and inflection points. The results show that this method produces a complete boundary shape description capable of handling different levels of shape detail. It also has numerous potential applications, such as automatic landmark tagging, which is necessary for building model-based approaches toward organ modelling and segmentation.
In this paper we propose a novel 3D face recognition system. Furthermore we propose and discuss the
development of a 3D reconstruction system designed specifically for the purpose of face recognition. The
reconstruction subsystem utilises a capture rig comprising six cameras to obtain two independent stereo
pairs of the subject face during a structured light projection with the remaining two cameras obtaining texture
data under normal lighting conditions. Whilst the most common approaches to 3D reconstruction use
least-squares comparison of image intensity values, our system achieves dense point matching using Gabor
Wavelets as the primary correspondence measure. The matching process is aided by Voronoi segmentation
of the input images using strong confidence correlations as Voronoi seeds. Additional matches are then
propagated outwards from the initial seed matches to produce a dense point cloud and surface model. Within
the recognition subsystem models are first registered to a generic head model, and then an ICP variant is
applied between the recognition subject and each model in the comparison database, using the average
point-to-plane error as the recognition metric. Our system takes full advantage of the additional information
obtained from the shape and structure of the face, thus combating some of the inherent weaknesses of
traditional 2D methods such as pose and illumination variations. This novel reconstruction/recognition
process achieves 98.2% accuracy on databases containing in excess of 175 meshes.
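The recognition metric described above, average point-to-plane error, can be sketched as follows. Correspondence search (the nearest-neighbour step inside the ICP variant) is assumed already done; this only evaluates the metric over corresponded points.

```python
import numpy as np

def point_to_plane_error(src_pts, tgt_pts, tgt_normals):
    """Average point-to-plane distance between corresponded points: each
    residual is projected onto the target surface normal, so in-plane
    sliding does not count as error."""
    n = tgt_normals / np.linalg.norm(tgt_normals, axis=1, keepdims=True)
    d = np.einsum('ij,ij->i', src_pts - tgt_pts, n)
    return float(np.mean(np.abs(d)))

# Points on the z = 0 plane: sliding in-plane contributes nothing, and
# only the 0.5 lift along the normal is measured.
tgt = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
normals = np.tile([0.0, 0.0, 1.0], (3, 1))
src = tgt + [0.3, -0.2, 0.5]
print(point_to_plane_error(src, tgt, normals))   # 0.5
```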
A feature extraction method based on wavelets for Fourier Transform Infrared (FTIR) cancer data analysis is presented in this paper. A set of low-frequency wavelet basis functions is used to represent the FTIR data, reducing its dimension and removing noise. The fuzzy C-means algorithm is used to classify the data. Experiments are conducted to compare classification performance using the wavelet features against the original FTIR data, provided by the Derby City General Hospital in the UK. They show that only 30 wavelet features are needed to represent the 901 wave numbers of the FTIR data and produce good clustering results.
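The dimension reduction step above can be sketched with a multi-level wavelet decomposition that keeps only the low-frequency approximation coefficients. The Haar basis and five decomposition levels below are assumptions (the abstract does not name the basis); they happen to shrink 901 wave numbers to 29 features, close to the 30 reported.

```python
import numpy as np

def haar_approx(signal, levels=5):
    """Keep only the low-frequency approximation after `levels` Haar
    decompositions, discarding the detail (high-frequency) coefficients
    at every level -- dimension reduction plus noise removal in one pass."""
    a = np.asarray(signal, dtype=float)
    for _ in range(levels):
        if len(a) % 2:                        # pad odd lengths
            a = np.append(a, a[-1])
        a = (a[0::2] + a[1::2]) / np.sqrt(2.0)
    return a

# 901 wave numbers shrink to 29 low-frequency features.
print(len(haar_approx(np.ones(901))))   # 29
```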