Recently, kernel-based target-tracking methods have received considerable attention in the computer vision field (Refs. 1-3). A key issue in the development of these methods is the construction of a target model. Comaniciu designed the target model with an isotropic kernel (Ref. 1). Yilmaz defined the target model by cascading two Epanechnikov kernels (Ref. 2). Hager constructed the target model with multiple kernels of different tracking structures (Ref. 3).
IR images are thermal images that are extremely noisy because of rampant systemic noise or color noise introduced by the sensing instrument, together with noise from the environment (Ref. 4). In most cases, a common tracker locates the target region only vaguely because of this noise, so a target model based on the located region is improperly computed. This may cause the tracker to capture the target only partially, or even to lose it in the successive tracking process. A more realistic target model of the IR target is therefore required for the tracking task. This letter aims to extend the current kernel-based target-tracking method to achieve robust tracking performance with a well-designed target model.
Multi-Information Incorporation Kernel-Based Target Model
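Let $\{x_i\}_{i=1,\ldots,n}$ be the normalized pixel locations in the target region with center $x_0$ in the current frame. The function $b:\mathbb{R}^2\rightarrow\{1,\ldots,m\}$ ($m$-bin histograms are used) associates to the pixel at location $x_i$ the index $b(x_i)$ of its bin in the quantized feature space. The probability of the feature $u=1,\ldots,m$ (intensity values are commonly used) in the target model is computed as
$$\hat{q}_u = C\sum_{i=1}^{n} k\left(\left\|\frac{x_i-x_0}{h}\right\|^2\right)\delta[b(x_i)-u], \qquad (1)$$
where $\delta$ is the Kronecker delta function, $C$ is the normalization constant, $k$ is the common profile used in the corresponding feature domain, and $h$ is the kernel bandwidth.
Cascading two kernels is another way to estimate the kernel density in the target region (Refs. 2 and 5). In Ref. 5, the kernel is defined as
$$K_{h_s,h_r}(x) = \frac{C}{h_s^{2}h_r^{p}}\,k_s\left(\left\|\frac{x^s}{h_s}\right\|^2\right)k_r\left(\left\|\frac{x^r}{h_r}\right\|^2\right), \qquad (2)$$
where $x^s$ is the spatial part and $x^r$ is the range part of a feature vector, $k_s$ and $k_r$ are the common profiles used in the corresponding domains, $h_s$ and $h_r$ are the employed kernel bandwidths, and $p$ is the image vector dimension. Thus, the probability of the feature in the target model is given by
$$\hat{q}_u = C\sum_{i=1}^{n} k_s\left(\left\|\frac{x_i^s-x_0^s}{h_s}\right\|^2\right)k_r\left(\left\|\frac{x_i^r-x_0^r}{h_r}\right\|^2\right)\delta[b(x_i)-u], \qquad (3)$$
where $x_0^s$ and $x_0^r$ are the centers of the corresponding kernels. Here $k_s$ defines the spatial relation of the intensity values through the Euclidean distance of each pixel's spatial position from the target center, and $k_r$ is used as a weighting factor in the intensity value histogram.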
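The two histogram constructions described above can be illustrated with a minimal sketch. It is not the implementation used in this letter: the function name `kernel_histogram`, its parameters, the choice of an Epanechnikov spatial profile and a Gaussian range profile, and the 3 x 3 toy patch are all assumptions made for illustration.

```python
import math


def epanechnikov_profile(r2):
    """Epanechnikov profile k(r^2), up to a constant factor."""
    return 1.0 - r2 if r2 <= 1.0 else 0.0


def gaussian_profile(r2):
    """Gaussian profile k(r^2), up to a constant factor."""
    return math.exp(-0.5 * r2)


def kernel_histogram(patch, center, h_s, n_bins=32, levels=256,
                     spatial_profile=epanechnikov_profile,
                     range_center=None, h_r=None,
                     range_profile=gaussian_profile):
    """Kernel-weighted gray-level histogram of a patch (hypothetical helper).

    patch  : list of rows of integer gray values in [0, levels)
    center : (row, col) of the spatial kernel center
    h_s    : spatial bandwidth in pixels
    If range_center and h_r are both given, each pixel is additionally
    weighted by a range kernel on its gray value (cascaded form);
    otherwise only the spatial kernel is used.
    """
    hist = [0.0] * n_bins
    cy, cx = center
    for y, row in enumerate(patch):
        for x, value in enumerate(row):
            r2 = ((y - cy) ** 2 + (x - cx) ** 2) / h_s ** 2
            weight = spatial_profile(r2)
            if range_center is not None and h_r is not None:
                weight *= range_profile(((value - range_center) / h_r) ** 2)
            u = value * n_bins // levels  # bin index of this pixel
            hist[u] += weight
    total = sum(hist)
    if total == 0.0:
        raise ValueError("all kernel weights are zero; increase the bandwidth")
    return [v / total for v in hist]  # dividing by the total plays the role of C


# Toy patch: one bright pixel on a dark background (illustrative data only).
patch = [[0, 0, 0],
         [0, 255, 0],
         [0, 0, 0]]
q_spatial = kernel_histogram(patch, center=(1, 1), h_s=2.0, n_bins=4)
q_cascaded = kernel_histogram(patch, center=(1, 1), h_s=2.0, n_bins=4,
                              range_center=255, h_r=10.0)
```

In the spatial-only case the bright center pixel receives the largest single weight but the dark pixels still dominate the histogram. In the cascaded case, with the range kernel centered on the bright value, almost all of the probability mass moves to the bright bin.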
Equations 1 and 3 do not account for the uneven distribution of the intensity values of the pixels in the target region. Moreover, they do not clearly specify the target center and the kernel bandwidth, which are important parameters for kernel density estimation. Here we present a new method for designing a well-performing kernel-based target model. The final target model incorporates the intensity value, spatial relation, and local standard deviation information of the pixels in the target region into the kernel density estimate. As a result, the computed kernel density is a closer approximation to the true distribution of the intensity values of the tracked target.
For an IR image, the local standard deviation of pixel $i$ is computed from the gray values $g_i$ and $g_j$ of pixel $i$ and pixel $j$, respectively (pixel $j$ is a pixel around pixel $i$ in a predefined window), and from the number $N$ of pixels in the neighborhood. Figure 1 shows that the target region and its rough contour are clearly emphasized in the local standard deviation images, which indicates that this information can be used to set the target center and the kernel bandwidth. For a discrete 2-D local standard deviation image, the zeroth moment can be defined as
$$M_{00} = \sum_{x}\sum_{y}\sigma(x,y), \qquad (5)$$
where $\sigma(x,y)$ is the local standard deviation of a pixel at position $(x,y)$. The first moments are given by
$$M_{10} = \sum_{x}\sum_{y}x\,\sigma(x,y), \qquad M_{01} = \sum_{x}\sum_{y}y\,\sigma(x,y). \qquad (6)$$
Then the spatial component of the center of the kernel is computed as
$$x_c = \frac{M_{10}}{M_{00}}, \qquad y_c = \frac{M_{01}}{M_{00}}, \qquad (7)$$
and the range component of the center of the kernel is defined as the quantized intensity value at position $(x_c, y_c)$. Zeroth-moment information is also used to set the search window size in Ref. 6. Inspired by this work, we set the kernel bandwidth as a function of the zeroth moment of the local standard deviation image. With the maximum local standard deviation value in the target region of a given IR image denoted as $\sigma_{\max}$, the kernel bandwidth is defined from the zeroth moment and $\sigma_{\max}$ (Eq. 8), with factors that are determined by our understanding of the target distribution. The target model representation is then defined by Eq. 9, in which the spatial kernel center and the kernel bandwidth are computed by Eqs. 7 and 8, respectively, and the range kernel center is obtained from the value at position $(x_c, y_c)$ in the quantized intensity value space.
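The moment-based choice of the kernel center can be sketched as follows. This is only an illustration under stated assumptions: the local standard deviation is taken here as the root-mean-square difference between a pixel and its neighbors in a square window, which is one plausible reading of the definition above, and the function names `local_std_image` and `moments_and_center` are hypothetical. The bandwidth rule is not reproduced because its exact form and factors are not recoverable from the text.

```python
import math


def local_std_image(image, radius=1):
    """Local standard deviation map (assumed form: RMS difference between
    each pixel and the other pixels in a (2*radius+1)^2 window)."""
    rows, cols = len(image), len(image[0])
    sigma = [[0.0] * cols for _ in range(rows)]
    for y in range(rows):
        for x in range(cols):
            sq_sum, count = 0.0, 0
            for ny in range(max(0, y - radius), min(rows, y + radius + 1)):
                for nx in range(max(0, x - radius), min(cols, x + radius + 1)):
                    if (ny, nx) == (y, x):
                        continue  # skip the center pixel itself
                    sq_sum += (image[ny][nx] - image[y][x]) ** 2
                    count += 1
            sigma[y][x] = math.sqrt(sq_sum / count) if count else 0.0
    return sigma


def moments_and_center(sigma):
    """Zeroth moment and centroid (x_c, y_c) of a local standard deviation map.

    Returns (m00, None) when the map is identically zero, because the
    centroid is undefined in that case.
    """
    m00 = m10 = m01 = 0.0
    for y, row in enumerate(sigma):
        for x, s in enumerate(row):
            m00 += s
            m10 += x * s
            m01 += y * s
    if m00 == 0.0:
        return m00, None
    return m00, (m10 / m00, m01 / m00)


# Toy 5 x 5 image with a single warm pixel in the middle (illustrative only).
image = [[0] * 5 for _ in range(5)]
image[2][2] = 8
sigma = local_std_image(image, radius=1)
m00, center = moments_and_center(sigma)
```

For this symmetric toy image the centroid of the local standard deviation map coincides with the warm pixel, which is the behavior the moment-based center is meant to provide.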
In our experiments, an outer margin of from the target region forms the background sample. For the target region, a Gaussian kernel is adopted, while for the background region, we use a reverse Gaussian kernel.
Our insight is that the best-designed target model is the one that best distinguishes between target and background for a robust tracking task. The discrimination power of different target models can be measured by the relative entropy between the target kernel density distribution and the background kernel density distribution. Here the relative entropy is a negative value, and a small value means a high separation power between target and background for the corresponding kernel density estimation method. In Fig. 2, eight typical -pixels IR images are selected to confirm the validity of our approach. The rectangles in the IR images show the target regions. Table 1 shows the relative entropy values of different representations of the target model with several kernel density estimation methods. Here, we find that the method that incorporates multi-information of the target region is more effective when used in a tracking framework, because it gives better discrimination between target and background, as indicated by the relative entropy values. Moreover, when the target region is poorly located, the superiority of our method over the two other methods is evident.
Table 1. Relative entropy values of different target model representations.
|Information Used in Kernel Density Estimation|Relative Entropy Values|
|Spatial relation [Eq. 1]|
|Our method [Eq. 9]|
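The comparison criterion can be sketched in a few lines. Since the text describes the relative entropy as a negative quantity for which smaller values mean better separation, the sketch assumes the sign convention $\sum_u q_u \log(p_u/q_u)$, that is, minus the Kullback-Leibler divergence from the background distribution $p$ to the target distribution $q$. This convention, the function name `relative_entropy`, and the small smoothing constant `eps` used to avoid taking the logarithm of zero are assumptions, not details taken from the letter.

```python
import math


def relative_entropy(q, p, eps=1e-10):
    """Return sum_u q_u * log(p_u / q_u) for two histograms.

    Assumed sign convention: the result is <= 0 and becomes more negative
    as the target histogram q and the background histogram p separate.
    eps is added to every bin before renormalizing so that empty bins do
    not produce log(0).
    """
    if len(q) != len(p):
        raise ValueError("histograms must have the same number of bins")
    q_s = [v + eps for v in q]
    p_s = [v + eps for v in p]
    zq, zp = sum(q_s), sum(p_s)
    return sum((a / zq) * math.log((b / zp) / (a / zq))
               for a, b in zip(q_s, p_s))


# Illustrative two-bin histograms (made-up numbers, not data from Table 1).
d_same = relative_entropy([0.5, 0.5], [0.5, 0.5])
d_partial = relative_entropy([0.9, 0.1], [0.1, 0.9])
d_disjoint = relative_entropy([1.0, 0.0], [0.0, 1.0])
```

Identical histograms give a value of essentially zero, partially overlapping ones give a moderately negative value, and disjoint ones give the most negative value, which matches the reading that a smaller value indicates higher separation power.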
We also embedded the proposed kernel density estimation in a mean shift tracking system. Figure 3 shows some selected frames of a 180-frame test video sequence where each frame is . Here, the intensity space is taken as the feature space and is quantized into 32 bins. The tracking algorithm with different target model constructions was developed in Matlab 7.0 on a Pentium 4 platform. In Figs. 4a and 4b, the rectangles on the left IR images show the initial target bounding box, and the plots on the right show the tracking performance of the mean shift tracking algorithm with different target model representations. The proposed method tracks the target with smaller prediction errors, and its superior performance is most obvious when the initially selected target region is poorly located. The additional computational cost per frame incurred by the proposed target model representation is dominated by the computation of the local standard deviation and the moments. Based on the target information in the previous frames, we perform this computation in a region that is larger than the actual target region size. The current implementations of the mean shift tracking algorithm with the initial target bounding box illustrated in Fig. 4a are capable of tracking at 15, 14, and for the target models obtained with Eqs. 1, 3, and 9, respectively. Similarly, if the tracking algorithm adopts the initial target bounding box shown in Fig. 4b, the target models represented by Eqs. 1, 3, and 9 enable tracking at frame rates of 15, 14, and , respectively. From this, we find that the tracking algorithm with the proposed target model construction remains competitive, at the price of a slightly higher computational complexity and implementation cost.
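For readers unfamiliar with the tracking loop, the following sketch shows one location update of the standard mean shift tracker from the kernel-based tracking literature, into which a target model such as the one proposed here would be plugged. It is not the Matlab code used in the experiments. The function name `mean_shift_step`, the Gaussian profile, and the toy frame and histograms are assumptions for illustration; each pixel is weighted by $\sqrt{q_u/p_u}$, where $q$ is the target model and $p$ is the candidate histogram at the current location.

```python
import math


def mean_shift_step(frame, q, p, center, bandwidth, n_bins=32, levels=256):
    """One mean shift location update (hypothetical helper).

    frame     : list of rows of integer gray values in [0, levels)
    q, p      : target model and current candidate histograms (n_bins each)
    center    : current (row, col) estimate of the target center
    bandwidth : spatial kernel bandwidth in pixels
    Returns the updated (row, col) center. A Gaussian profile is assumed,
    so the derivative-based weight g(r^2) is proportional to exp(-r^2 / 2).
    """
    cy, cx = center
    num_y = num_x = den = 0.0
    for y, row in enumerate(frame):
        for x, value in enumerate(row):
            u = value * n_bins // levels  # bin index of this pixel
            if p[u] <= 0.0:
                continue  # bin absent from the candidate; no information
            r2 = ((y - cy) ** 2 + (x - cx) ** 2) / bandwidth ** 2
            w = math.sqrt(q[u] / p[u]) * math.exp(-0.5 * r2)
            num_y += y * w
            num_x += x * w
            den += w
    if den == 0.0:
        return center  # nothing matched the model; keep the old estimate
    return (num_y / den, num_x / den)


# Toy frame: the target is a single bright pixel away from the current center.
frame = [[0] * 5 for _ in range(5)]
frame[1][3] = 255
q_model = [0.0, 0.0, 0.0, 1.0]      # target model: bright bin only
p_candidate = [0.9, 0.0, 0.0, 0.1]  # candidate histogram at the old center
new_center = mean_shift_step(frame, q_model, p_candidate,
                             center=(2.0, 2.0), bandwidth=2.0, n_bins=4)
```

In a full tracker this update is repeated, with the candidate histogram recomputed at each new location, until the shift falls below a small threshold.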
A new method that incorporates multi-information into the kernel density estimation of an IR target model was proposed. The local standard deviation information was used to select an appropriate target center and kernel bandwidth. The constructed target model was evaluated based on the relative entropy of two classes, target and background, and was applied in a mean shift tracking system for IR target tracking to verify its effectiveness.
We would like to thank the anonymous reviewers for their valuable comments. This work is partially supported by Aeronautics Science Fund (China) under Grant No. 04F57004.