Multi-information incorporation approach to kernel-based infrared target model construction with application to target tracking (1 November 2006)
We present an approach that incorporates multi-information, namely the intensity values, spatial relations, and local standard deviations of the pixels in the target region, into kernel density estimation to construct a kernel-based infrared (IR) target model. The incorporated cues complement each other in a target-tracking task. The constructed target model is evaluated by the relative entropy of the two classes (target and background) and is applied in a mean shift tracking system for IR target tracking to verify its effectiveness.



Recently, kernel-based target-tracking methods have received considerable attention in the computer vision field (Refs. 1, 2, 3). A key issue in the development of these methods is the construction of a target model. Comaniciu designed the target model with an isotropic kernel (Ref. 1). Yilmaz defined the target model by cascading two Epanechnikov kernels (Ref. 2). Hager constructed the target model with multiple kernels of different tracking structures (Ref. 3). IR images are thermal images that are often extremely noisy, owing both to systemic or color noise introduced by the sensing instrument and to noise from the environment (Ref. 4). In most cases a common tracker locates the target region only vaguely because of this noise, so a target model based on the located region is improperly computed. This may cause the tracker to fail to capture the target completely, or even to lose the target in the subsequent tracking process. It is therefore necessary to identify a more realistic model of the IR target for the tracking task. This letter extends the current kernel-based target-tracking method to achieve robust tracking performance with a well-designed target model.


Multi-Information Incorporation Kernel-Based Target Model

Let {x_i}, i = 1, ..., n, be the normalized pixel locations in the target region, with center c, in the current frame. The function b: R^2 -> {1, ..., m} (m-bin histograms are used) associates with the pixel at location x_i the index b(x_i) of its bin in the quantized feature space. The probability of the feature (intensity values are commonly used) u = 1, ..., m in the target model is computed as (Ref. 1)
q_u = C \sum_{i=1}^{n} k( \| (x_i - c) / h \|^2 ) \delta[ b(x_i) - u ],   (1)

where δ is the Kronecker delta function, C is a normalization constant, k(·) is the profile used in the corresponding feature domain, and h is the kernel bandwidth. Cascading two kernels is another way to estimate the kernel density in the target region (Refs. 2, 5). In Ref. 5, the kernel is defined as
K_{h_s, h_r}(x) = \frac{C}{h_s^2 h_r^p} k_s( \| x^s / h_s \|^2 ) k_r( \| x^r / h_r \|^2 ),   (2)

where x^s is the spatial part and x^r is the range part of a feature vector, k_s(·) and k_r(·) are the profiles used in the corresponding domains, h_s and h_r are the employed kernel bandwidths, and p is the dimension of the range part. Thus, the probability of the feature u = 1, ..., m in the target model is given by
q_u = C \sum_{i=1}^{n} k_s( \| (x_i - c) / h_s \|^2 ) k_r( \| (I(x_i) - v) / h_r \|^2 ) \delta[ b(x_i) - u ],   (3)

where c and v are the centers of the corresponding kernels. Here k_s(·) defines the spatial relation of the intensity values through the Euclidean distance of each pixel from the target center, and k_r(·) acts as a weighting factor in the intensity value histogram.
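As an illustrative sketch of the cascaded-kernel model of Eq. 3 (assuming Epanechnikov profiles, 8-bit gray values, and illustrative function names, none of which are fixed by the letter), the kernel-weighted histogram can be computed as:

```python
import numpy as np

def epanechnikov_profile(x):
    # k(x) = 1 - x for 0 <= x <= 1, and 0 otherwise
    return np.where((x >= 0) & (x <= 1), 1.0 - x, 0.0)

def target_model(patch, c, v, h_s, h_r, m=32):
    """Kernel-weighted m-bin intensity histogram of a target patch (cf. Eq. 3)."""
    rows, cols = patch.shape
    yy, xx = np.mgrid[0:rows, 0:cols]
    # spatial kernel: squared distance of each pixel to the center c = (cx, cy)
    d2 = ((xx - c[0]) / h_s[0]) ** 2 + ((yy - c[1]) / h_s[1]) ** 2
    w = epanechnikov_profile(d2)
    # range kernel: squared intensity distance to the center value v
    w = w * epanechnikov_profile(((patch.astype(float) - v) / h_r) ** 2)
    bins = (patch.astype(int) * m) // 256      # b(x_i): quantize 8-bit gray values
    q = np.bincount(bins.ravel(), weights=w.ravel(), minlength=m)
    return q / max(q.sum(), 1e-12)             # normalization constant C
```

The returned histogram sums to one, so it can be compared directly against a candidate histogram during tracking.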

Equations 1 and 3 do not pay much attention to the uneven distribution of the intensity values of the pixels in the target region. Moreover, the target center and the kernel bandwidth, which are important parameters for kernel density estimation, are not clearly specified. Here we present a new method for designing a well-performing kernel-based target model. The final target model incorporates the intensity values, spatial relations, and local standard deviations of the pixels in the target region into the kernel density estimate. As a result, the computed kernel density is closer to the true distribution of the intensity values of the tracked target.

For an IR image, the local standard deviation of the pixel x_i can be computed as (Ref. 2)
S(x_i) = \sqrt{ \frac{1}{M} \sum_{X} [ I(X) - I(x_i) ]^2 },   (4)

where I(x_i) and I(X) denote the gray values of pixel x_i and of a pixel X in a predefined window around x_i, respectively, and M denotes the number of pixels in that neighborhood. Figure 1 shows that the target region and its rough contour are clearly emphasized in the local standard deviation images, an indication that this information can be used to set the target center and the kernel bandwidth. For a discrete 2-D local standard deviation image, the zeroth moment can be defined as
M_{00} = \sum_{i=1}^{rows} \sum_{j=1}^{cols} S(i, j),   (5)

where rows and cols are the sizes of the analyzed target region along the two orientations, and S(i, j) is the local standard deviation of the pixel at position (i, j). The first moments are given by
M_{10} = \sum_{i=1}^{rows} \sum_{j=1}^{cols} i \, S(i, j),   M_{01} = \sum_{i=1}^{rows} \sum_{j=1}^{cols} j \, S(i, j).   (6)


Fig. 1

Original images and the corresponding local standard deviation images: (a) and (c) original IR images and (b) and (d) the corresponding local standard deviation images.


Then the components of the center c = (c_x, c_y) of kernel k_s(·) are computed as
c_x = M_{10} / M_{00},   c_y = M_{01} / M_{00}.   (7)

In addition, the center v of kernel k_r(·) is defined as the quantized intensity value at position (c_x, c_y). Zeroth-moment information is also used to set the search window size in Ref. 6. Inspired by that work, we set the kernel bandwidth as a function of the zeroth moment of the local standard deviation image. If the maximum local standard deviation value in the target region of a given IR image is denoted δ_max, the kernel bandwidth h_s = h_r = (h_x, h_y) is defined as
h_x = \alpha \sqrt{ M_{00} / \delta_{max} },   h_y = \beta \sqrt{ M_{00} / \delta_{max} },   (8)

where α and β are factors determined by our understanding of the target distribution. The target model representation is then defined by
q_u = C \sum_{i=1}^{n} k_s( \| (x_i - c) / h_s \|^2 ) k_r( \| (I(x_i) - v) / h_r \|^2 ) \delta[ b(x_i) - u ],   (9)

where c = (c_x, c_y) and h_s = h_r = (h_x, h_y) are computed by Eqs. 7 and 8, respectively, and v is the value at position (c_x, c_y) in the quantized intensity value space.
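The construction above (local standard deviation image, moments, center, and bandwidth) can be sketched as follows. The neighborhood definition and the square-root bandwidth formula are our reading of the text and of the CAMShift-style window sizing it cites, so treat them as assumptions:

```python
import numpy as np

def local_std_image(img, win=3):
    """Local standard deviation of each pixel w.r.t. a win x win neighborhood."""
    img = img.astype(float)
    pad = win // 2
    padded = np.pad(img, pad, mode='edge')
    S = np.zeros_like(img)
    rows, cols = img.shape
    for i in range(rows):
        for j in range(cols):
            nb = padded[i:i + win, j:j + win]  # the M pixels around (i, j)
            # deviation of the neighbors from the central pixel's gray value
            S[i, j] = np.sqrt(np.mean((nb - img[i, j]) ** 2))
    return S

def center_and_bandwidth(S, alpha=1.0, beta=1.0):
    """Target center and kernel bandwidth from the local standard deviation image."""
    rows, cols = S.shape
    ii, jj = np.mgrid[0:rows, 0:cols]
    m00 = S.sum()                      # zeroth moment
    cx = (ii * S).sum() / m00          # first moments divided by the zeroth moment
    cy = (jj * S).sum() / m00
    d_max = S.max()                    # maximum local standard deviation
    hx = alpha * np.sqrt(m00 / d_max)  # window sizing from the zeroth moment (assumed form)
    hy = beta * np.sqrt(m00 / d_max)
    return (cx, cy), (hx, hy)
```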


Experimental Results

In our experiments, an outer margin of 10 pixels around the target region forms the background sample. For the target region a Gaussian kernel is adopted, while for the background region we use a reverse Gaussian kernel.
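The Gaussian and reverse Gaussian weightings can be sketched as follows (a minimal illustration; the exact profile form and the `sigma` parameter are assumptions, since the letter does not specify them):

```python
import numpy as np

def gaussian_profile(d2, sigma=1.0):
    # weights pixels near the target center most heavily
    return np.exp(-d2 / (2.0 * sigma ** 2))

def reverse_gaussian_profile(d2, sigma=1.0):
    # emphasizes background pixels away from the target center
    return 1.0 - np.exp(-d2 / (2.0 * sigma ** 2))
```

Both take the normalized squared distance d2 from the region center, so the two weightings are complementary by construction.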

Our insight is that the best-designed target model is the one that best distinguishes target from background for robust tracking. The discrimination achieved by different target models can be quantified by relative entropy values, given by
W(p, b) = \sum_{u=1}^{m} p_u \log( b_u / p_u ),   (10)

where p and b are the target and background kernel density distributions, respectively. Here W(p, b) is a negative value, and a smaller W(p, b) means a higher separation power between target and background for the corresponding kernel density estimation method. In Fig. 2, eight typical 128×128-pixel IR images are selected to confirm the validity of our approach; the rectangles in the IR images mark the target regions. Table 1 lists the W(p, b) values for target models constructed with several kernel density estimation methods. The method that incorporates multi-information of the target region is the most effective for a tracking framework, as indicated by the target-background discrimination reflected in the W(p, b) values. Moreover, when the target region is poorly located, the superiority of our method over the two other methods is evident.
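A minimal sketch of this evaluation, under the assumption that W(p, b) = sum_u p_u log(b_u / p_u) (the sign convention matching the statement that W is negative; the smoothing constant `eps` is ours):

```python
import numpy as np

def relative_entropy(p, b, eps=1e-12):
    """W(p, b) = sum_u p_u * log(b_u / p_u); non-positive, and more negative
    means better separation between target and background histograms."""
    p = np.asarray(p, float) + eps   # smooth empty bins to keep the log finite
    b = np.asarray(b, float) + eps
    p = p / p.sum()
    b = b / b.sum()
    return float(np.sum(p * np.log(b / p)))
```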

Fig. 2

Eight typical IR images and the marked target regions: A1, B1, C1, and D1, IR images with appropriate target regions; A2, B2, C2, and D2, poorly located target regions in the corresponding IR images.


Table 1

Relative entropy values of different target model representations.

Information used in kernel density estimation    Relative entropy values (eight images of Fig. 2)
Spatial relation [Eq. 1]          8.03   4.59   9.11   5.10   3.54   1.76   3.95   2.64
Intensity + spatial [Eq. 3]       8.15   4.61  11.28   8.09   2.70   1.45   4.60   2.84
Our method [Eq. 9]               11.67  10.18  12.67  11.00   4.83   6.62   4.97   3.99

We also embedded the proposed kernel density estimation in a mean shift tracking system. Figure 3 shows selected frames of a 180-frame test video sequence in which each frame is 128×128 pixels. The intensity space is taken as the feature space and is quantized into 32 bins. The tracking algorithms with the different target model constructions were implemented in Matlab 7.0 on a Pentium 4 platform. In Figs. 4a and 4b, the rectangles on the left IR images show the initial target bounding boxes, and the plots on the right show the tracking performance of the mean shift tracking algorithm with the different target model representations. The proposed method helps track the target with smaller prediction errors, and its superior performance is obvious when the initially selected target region is poorly located. The additional computational complexity incurred per frame by the proposed target model representation is dominated by the computation of the local standard deviation and the moments. Based on the target information in the previous frames, we perform this computation in a region 2 to 3 pixels larger than the actual target region. With the initial target bounding box illustrated in Fig. 4a, the current implementations of the mean shift tracking algorithm track at 15, 14, and 13 frames/s for the target models obtained with Eqs. 1, 3, and 9, respectively. Similarly, with the initial target bounding box shown in Fig. 4b, the target models represented by Eqs. 1, 3, and 9 enable tracking at 15, 14, and 12 frames/s, respectively. Thus the tracking algorithm with the proposed target model construction remains competitive, at only a slightly higher computational cost.
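For reference, one location update of the mean shift iteration driven by a target histogram q can be sketched as follows (a simplification that omits the candidate-window kernel weighting; the function name and quantization are illustrative):

```python
import numpy as np

def mean_shift_step(patch, q, m=32):
    """One mean shift location update from the current candidate patch toward
    the target histogram q, using the standard sqrt(q/p) pixel weights."""
    rows, cols = patch.shape
    yy, xx = np.mgrid[0:rows, 0:cols]
    bins = (patch.astype(int) * m) // 256            # quantized 8-bit intensities
    p = np.bincount(bins.ravel(), minlength=m).astype(float)
    p = p / p.sum()                                  # candidate histogram (unweighted)
    # pixels whose bin is over-represented in q relative to p pull the window
    w = np.sqrt(q[bins] / np.maximum(p[bins], 1e-12))
    return (xx * w).sum() / w.sum(), (yy * w).sum() / w.sum()
```

Iterating this step until the center displacement falls below a threshold gives the per-frame tracking loop.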

Fig. 3

Test sequence.


Fig. 4

Prediction errors (between prediction and ground truth) for the different target model representations: (a) appropriate target region initially selected (target region is 13×9 pixels) and (b) initial target region poorly located (target region is 18×12 pixels).




A new method that incorporates multi-information into the kernel density estimation of an IR target model has been proposed. The local standard deviation information is used to select an appropriate target center and kernel bandwidth. The constructed target model was evaluated by the relative entropy of the two classes and applied in a mean shift tracking system for IR target tracking to verify its effectiveness.


We would like to thank the anonymous reviewers for their valuable comments. This work was partially supported by the Aeronautics Science Fund (China) under Grant No. 04F57004.


1. D. Comaniciu, V. Ramesh, and P. Meer, "Kernel-based object tracking," IEEE Trans. Pattern Anal. Mach. Intell. 25(5), 564–577 (2003).

2. A. Yilmaz, K. Shafique, and M. Shah, "Target tracking in airborne forward looking infrared imagery," Image Vis. Comput. 21(7), 623–635 (2003).

3. G. D. Hager, M. Dewan, and C. V. Stewart, "Multiple kernel tracking with SSD," in Proc. IEEE Conf. on Computer Vision and Pattern Recognition, Vol. 1, pp. 790–797 (2004).

4. J. Wei and I. Gertner, "Discrimination, tracking, and recognition of small and fast moving objects," Proc. SPIE 4726, 253–266 (2002).

5. D. Comaniciu and P. Meer, "Mean shift: a robust approach toward feature space analysis," IEEE Trans. Pattern Anal. Mach. Intell. 24(5), 603–619 (2002).

6. G. R. Bradski, "Computer vision face tracking for use in a perceptual user interface," Intel Technol. J. 2(2), 12–21 (1998).

© (2006) Society of Photo-Optical Instrumentation Engineers (SPIE)
Jianguo Ling, Erqi Liu, Lei Yang, and Jie Yang, "Multi-information incorporation approach to kernel-based infrared target model construction with application to target tracking," Optical Engineering 45(11), 110502 (1 November 2006). https://doi.org/10.1117/1.2388341
