Most visual tracking algorithms are very sensitive to the initial bounding box of the tracked object, yet how to obtain a precise bounding box in the first frame remains an open problem. In this paper, we propose an automatic algorithm that refines the reference bounding box of the tracked object after a rough bounding box has been selected in the first frame. Based on the rough input location and scale, the proposed algorithm uses a region merging algorithm based on maximal similarity to label superpixel regions as foreground or background. To improve the segmentation, a feature clustering strategy is used to obtain reliable foreground and background labels, and a color histogram in HSI space is used to describe each superpixel. The final refined bounding box is the minimal enclosing rectangle of the foreground region. Extensive experiments indicate that the proposed algorithm can reliably refine the initial bounding box using only first-frame information and markedly improves the robustness of tracking algorithms.
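The last step described above, taking the minimal enclosing rectangle of the foreground region, can be illustrated with a minimal sketch. It assumes the segmentation result is available as a boolean mask and that the rectangle is axis-aligned; the function name `enclosing_box` and the `(x, y, width, height)` convention are hypothetical and are not taken from the paper.

```python
import numpy as np

def enclosing_box(foreground):
    """Return (x, y, width, height) of the tightest axis-aligned box
    around the True pixels of a 2D mask, or None if the mask is empty."""
    mask = np.asarray(foreground, dtype=bool)
    rows = np.flatnonzero(mask.any(axis=1))  # row indices containing foreground
    cols = np.flatnonzero(mask.any(axis=0))  # column indices containing foreground
    if rows.size == 0:
        return None
    y0, y1 = rows[0], rows[-1]
    x0, x1 = cols[0], cols[-1]
    return int(x0), int(y0), int(x1 - x0 + 1), int(y1 - y0 + 1)

# Toy foreground mask: a 3x4 block plus one extra pixel below it.
mask = np.zeros((8, 10), dtype=bool)
mask[2:5, 3:7] = True
mask[5, 4] = True
box = enclosing_box(mask)
```

In this toy case the extra pixel extends the box by one row, which shows why a clean foreground labeling matters: a single mislabeled superpixel can enlarge the refined box.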
In recent years, several visual tracking methods have applied multilayer convolutional features to correlation filters, but they mostly use fixed weights to fuse the multilayer response maps, which adapts poorly to changing scenes. To address this problem, a robust tracking algorithm based on adaptive fusion of multilayer response maps is proposed. We extract multilayer convolutional features from the target’s candidate area to improve tracking robustness, and a translation correlation filter is fed with the CNN features extracted from each layer. Unlike previous methods, we propose a fast covariance intersection algorithm to adaptively fuse the multilayer response maps. After the final target center position is determined, we apply a 1D scale filter with HOG features over multi-scale samples to handle large scale variations. Moreover, to reduce tracking drift caused by severe occlusion and error accumulation, we present a new random update mechanism for the translation filters. Experimental results on challenging benchmark datasets show that the proposed algorithm achieves outstanding performance compared with state-of-the-art tracking methods.
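As an illustration of covariance intersection fusion, the sketch below fuses several position estimates, each with its own covariance. It assumes one commonly used non-iterative weighting, in which each weight is inversely proportional to the trace of the corresponding covariance; the abstract does not state which weighting the authors use, nor how an uncertainty is derived from each layer's response map, so the function `fast_ci_fuse` and its inputs are hypothetical.

```python
import numpy as np

def fast_ci_fuse(means, covs):
    """Fuse estimates (mean_i, cov_i) by covariance intersection.

    Assumed weighting: w_i proportional to 1 / trace(cov_i), normalized to sum to 1.
    Fused information matrix: sum_i w_i * inv(cov_i).
    """
    means = [np.asarray(m, dtype=float) for m in means]
    covs = [np.asarray(c, dtype=float) for c in covs]
    inv_traces = np.array([1.0 / np.trace(c) for c in covs])
    weights = inv_traces / inv_traces.sum()
    info = sum(w * np.linalg.inv(c) for w, c in zip(weights, covs))
    fused_cov = np.linalg.inv(info)
    fused_mean = fused_cov @ sum(
        w * np.linalg.inv(c) @ m for w, c, m in zip(weights, covs, means)
    )
    return fused_mean, fused_cov, weights

# Two hypothetical per-layer position estimates; the second is less certain.
mean, cov, weights = fast_ci_fuse(
    means=[[10.0, 20.0], [14.0, 24.0]],
    covs=[np.eye(2), 3.0 * np.eye(2)],
)
```

The less certain estimate receives the smaller weight, so the fused position stays closer to the more confident layer instead of being a fixed average.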
To address complicated dynamic scenes in visual target tracking, a multi-feature fusion tracking algorithm based on covariance matrices is proposed to improve tracking robustness. Within the framework of a quantum genetic algorithm, this paper uses the region covariance descriptor to fuse color, edge, and texture features, and uses a fast covariance intersection algorithm to update the model. The low dimension of the region covariance descriptor, the fast convergence and strong global optimization ability of the quantum genetic algorithm, and the low computational cost of fast covariance intersection together improve the efficiency of the fusion, matching, and updating processes, so that the algorithm achieves fast and effective multi-feature fusion tracking. Experiments show that the proposed algorithm not only achieves fast and robust tracking but also effectively handles interference such as occlusion, rotation, deformation, and motion blur.
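The region covariance descriptor mentioned above can be sketched as follows. The sketch assumes a simplified per-pixel feature vector (pixel coordinates, intensity, and absolute gradients) on a grayscale image; the paper's actual color, edge, and texture features are not specified in the abstract, and the function name `region_covariance` is hypothetical.

```python
import numpy as np

def region_covariance(gray, box):
    """Covariance of per-pixel features inside box = (x, y, width, height).

    Assumed features per pixel: [x, y, intensity, |dI/dx|, |dI/dy|].
    The result is a 5x5 matrix regardless of the region size.
    """
    x, y, w, h = box
    patch = np.asarray(gray, dtype=float)[y:y + h, x:x + w]
    gy, gx = np.gradient(patch)          # derivatives along rows, then columns
    ys, xs = np.mgrid[0:h, 0:w]          # pixel coordinates inside the patch
    feats = np.stack([xs, ys, patch, np.abs(gx), np.abs(gy)], axis=-1)
    return np.cov(feats.reshape(-1, 5), rowvar=False)

rng = np.random.default_rng(0)
image = rng.random((60, 80))
descriptor = region_covariance(image, (10, 5, 30, 20))
```

Because the descriptor size depends only on the number of features and not on the region size, regions of different scales can be compared directly, which is the low-dimension property the abstract refers to.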
In visual tracking, deep learning with offline pretraining can extract more intrinsic and robust features, and it has been notably successful in reducing tracking drift in complicated environments. However, offline pretraining requires numerous auxiliary training datasets and is considerably time-consuming for tracking tasks. To solve these problems, a multiscale sparse networks-based tracker (MSNT) under the particle filter framework is proposed. Built on stacked sparse autoencoders and rectified linear units, the tracker has a flexible and adjustable architecture that needs no offline pretraining and learns robust and powerful features solely through online training on limited labeled data. The tracker builds four deep sparse networks of different scales, one for each target profile type, and during tracking it adaptively selects the network that matches the profile type of the initial target. This preserves the inherent structural information more efficiently than single-scale networks. Additionally, a corresponding update strategy is proposed to improve the robustness of the tracker. Extensive experimental results on a large-scale benchmark dataset show that the proposed method performs favorably against state-of-the-art methods in challenging environments.
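For orientation, a single sparse autoencoder layer with rectified linear units can be sketched as below. This is a generic illustration, not the MSNT architecture: the L1 penalty on hidden activations, the linear decoder, and all names (`sparse_autoencoder_loss`, `sparsity_weight`) are assumptions, since the abstract does not give the loss or layer sizes.

```python
import numpy as np

def relu(z):
    """Rectified linear unit: max(z, 0) elementwise."""
    return np.maximum(z, 0.0)

def sparse_autoencoder_loss(x, w_enc, b_enc, w_dec, b_dec, sparsity_weight=1e-3):
    """Reconstruction error plus an L1 sparsity penalty on hidden activations.

    x has shape (batch, n_inputs); returns (loss, hidden_activations).
    """
    hidden = relu(x @ w_enc + b_enc)              # encoder with ReLU activation
    recon = hidden @ w_dec + b_dec                # linear decoder (assumed)
    recon_err = np.mean(np.sum((recon - x) ** 2, axis=1))
    sparsity = np.mean(np.sum(np.abs(hidden), axis=1))
    return recon_err + sparsity_weight * sparsity, hidden

# Toy forward pass with random weights: 16 inputs compressed to 8 hidden units.
rng = np.random.default_rng(0)
x = rng.random((4, 16))
w_enc, w_dec = rng.normal(size=(16, 8)), rng.normal(size=(8, 16))
loss, hidden = sparse_autoencoder_loss(x, w_enc, np.zeros(8), w_dec, np.zeros(16))
```

Stacking such layers, with each layer trained on the hidden activations of the previous one, gives a stacked sparse autoencoder in the general sense used above.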
Convolutional Neural Networks (CNNs) have dramatically boosted the performance of various computer vision tasks, with the exception of visual tracking, owing to the lack of training data. In this paper, we pre-train a deep CNN offline to classify 1 million images from 256 classes, using very leaky non-saturating neurons to accelerate training, and then transform it into a discriminative classifier by adding an additional classification layer. In addition, we propose a novel approach for incrementally combining our CNN classifiers in a “cascade” structure through a modification of the AdaBoost framework. The discriminative features selected from the ensemble of CNN classifiers are then transferred to the robust visual tracking task and updated online, so that background regions are robustly discarded from promising object-like regions and appearance changes of the target are handled. Extensive experimental evaluations on an open tracker benchmark demonstrate the outstanding performance of our tracker, which improves tracking success rate and tracking precision by at least 9.2% and 13.9% on average, respectively, over other state-of-the-art trackers.
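To make the boosting step concrete, the sketch below shows one round of standard discrete AdaBoost: a weak classifier receives a vote based on its weighted error, and sample weights are increased on the examples it misclassifies. This is the textbook update only; the abstract does not describe the authors' modification or the cascade structure, and the function name `adaboost_round` is hypothetical.

```python
import numpy as np

def adaboost_round(sample_weights, labels, predictions):
    """One round of discrete AdaBoost with labels and predictions in {-1, +1}.

    Returns (alpha, new_weights): the classifier's vote and the
    renormalized sample weights for the next round.
    """
    w = np.asarray(sample_weights, dtype=float)
    y = np.asarray(labels)
    h = np.asarray(predictions)
    err = np.sum(w[h != y]) / np.sum(w)           # weighted error rate
    err = np.clip(err, 1e-12, 1 - 1e-12)          # avoid division by zero / log(0)
    alpha = 0.5 * np.log((1 - err) / err)         # vote of this weak classifier
    new_w = w * np.exp(-alpha * y * h)            # up-weight misclassified samples
    return alpha, new_w / new_w.sum()

# Four samples with uniform weights; the classifier misclassifies the last one.
alpha, weights = adaboost_round(
    sample_weights=[0.25, 0.25, 0.25, 0.25],
    labels=[1, 1, -1, -1],
    predictions=[1, 1, -1, 1],
)
```

After this round the single misclassified sample carries half of the total weight, so the next classifier in the ensemble is pushed to handle it correctly.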