## 1. Introduction

Recently, kernel-based target-tracking methods have received considerable attention in the computer vision field.^{1, 2, 3} A key issue in the development of these methods is the construction of a target model. Comaniciu designed the target model with an isotropic kernel.^{1} Yilmaz defined the target model by cascading two Epanechnikov kernels.^{2} Hager constructed the target model with multiple kernels of different tracking structures.^{3}

IR images are thermal images that are extremely noisy owing to rampant systemic noise or color noise incurred by the sensing instrument and to noise from the environment.^{4} In most cases, a common tracker locates the target region only vaguely because of this noise. A target model computed from the located region is therefore inaccurate. This may cause the tracker to capture the target incompletely, or even to lose it in subsequent frames. A more realistic model of the IR target is thus required for the tracking task. This letter extends the current kernel-based target-tracking method to achieve robust tracking performance with a well-designed target model.

## 2. Multi-Information Incorporation Kernel-Based Target Model

Let ${\left\{{x}_{i}\right\}}_{i=1\cdots n}$ be the normalized pixel locations in the target region with center $c$ in the current frame. The function $b:{R}^{2}\to \{1\dots m\}$ ($m$-bin histograms are used) associates to the pixel at location ${x}_{i}$ the index $b\left({x}_{i}\right)$ of its bin in the quantized feature space. The probability of the feature $u=1\dots m$ (intensity values are commonly used) in the target model is computed as^{1}

$${q}_{u}=C\sum _{i=1}^{n}k\left({\parallel \frac{{x}_{i}-c}{h}\parallel}^{2}\right)\delta [b\left({x}_{i}\right)-u], \tag{1}$$

^{2, 5} In Ref. 5, the kernel is defined as:

$${K}_{{h}_{s},{h}_{r}}\left(x\right)=\frac{C}{{h}_{s}^{2}{h}_{r}^{p}}{k}_{s}\left({\parallel \frac{{x}^{s}}{{h}_{s}}\parallel}^{2}\right){k}_{r}\left({\parallel \frac{{x}^{r}}{{h}_{r}}\parallel}^{2}\right), \tag{2}$$

$${q}_{u}=\frac{C}{{h}_{s}^{2}{h}_{r}^{p}}\sum _{i=1}^{n}{k}_{s}\left({\parallel \frac{{x}_{i}^{s}-c}{{h}_{s}}\parallel}^{2}\right){k}_{r}\left({\parallel \frac{{x}_{i}^{r}-v}{{h}_{r}}\parallel}^{2}\right)\delta [b\left({x}_{i}\right)-u], \tag{3}$$

Equations 1 and 3 pay little attention to the uneven distribution of the intensity values of the pixels in the target region. Moreover, the target center and the kernel bandwidth, which are important parameters for kernel density estimation, are not clearly specified. Here we present a new method for designing a well-performing kernel-based target model. The final target model, built with kernel density estimation, incorporates the intensity value, spatial relation, and local standard deviation information of the pixels in the target region. Furthermore, the computed kernel density is closer to the true distribution of the intensity values of the tracked target.
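As a rough illustration of Eqs. 1 and 3, the following Python sketch builds a kernel-weighted intensity histogram for a grayscale patch. Several choices here are assumptions made for the example rather than details taken from the text: the function name `target_model`, the Gaussian profile, the center placed at the patch center, the spatial bandwidth set to the patch half-size, and `v` and `h_r` given in raw intensity units.

```python
import numpy as np

def gaussian_profile(r2):
    """Gaussian kernel profile k(r^2) = exp(-r^2 / 2), up to a constant factor."""
    return np.exp(-0.5 * r2)

def target_model(patch, n_bins=32, levels=256, v=None, h_r=None):
    """Kernel-weighted intensity histogram of a grayscale target patch.

    With v and h_r left as None this follows Eq. 1 (spatial kernel only);
    with both given it follows Eq. 3 (spatial kernel times range kernel).
    """
    rows, cols = patch.shape
    ys, xs = np.mgrid[0:rows, 0:cols]
    # Assumption: center c is the patch center, bandwidth h_s its half-size.
    cy, cx = (rows - 1) / 2.0, (cols - 1) / 2.0
    hy, hx = rows / 2.0, cols / 2.0
    weights = gaussian_profile(((ys - cy) / hy) ** 2 + ((xs - cx) / hx) ** 2)
    if v is not None and h_r is not None:
        # Range kernel: down-weight pixels whose intensity is far from v.
        weights = weights * gaussian_profile(((patch.astype(float) - v) / h_r) ** 2)
    # b(x_i): 0-based bin index of each pixel's intensity.
    bins = (patch.astype(np.int64) * n_bins) // levels
    q = np.bincount(bins.ravel(), weights=weights.ravel(), minlength=n_bins)
    return q / q.sum()  # the constant C makes the bins sum to 1
```

Calling `target_model(patch)` gives the spatial-only model, while `target_model(patch, v=100, h_r=10.0)` additionally suppresses pixels whose intensity lies far from `v`, which is the effect the range kernel in Eq. 3 is meant to have.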

For an IR image, the local standard deviation of the pixel ${x}_{i}$ can be computed as^{2}

$$S\left({x}_{i}\right)={\left\{\frac{1}{\mid M\mid -1}\sum _{X\in M}{[I\left({x}_{i}\right)-I\left(X\right)]}^{2}\right\}}^{1/2}, \tag{4}$$
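A minimal sketch of Eq. 4 in Python follows. It assumes that $M$ is a square $(2r+1)\times(2r+1)$ window centered on the pixel (including the pixel itself) and that borders are handled by edge replication; the function name `local_std` and the `radius` parameter are illustrative only. Following the equation as written, deviations are measured from the center pixel value $I(x_i)$ rather than from the window mean.

```python
import numpy as np

def local_std(image, radius=1):
    """Local standard deviation image following Eq. 4.

    Assumption: M is the (2*radius+1)^2 window around each pixel,
    with image borders extended by edge replication.
    """
    img = image.astype(np.float64)
    size = 2 * radius + 1
    n = size * size  # |M|
    padded = np.pad(img, radius, mode="edge")
    rows, cols = img.shape
    acc = np.zeros_like(img)
    for dy in range(size):
        for dx in range(size):
            # Neighbor image shifted by (dy - radius, dx - radius).
            neighbor = padded[dy:dy + rows, dx:dx + cols]
            acc += (img - neighbor) ** 2
    return np.sqrt(acc / (n - 1))
```

On a flat region the result is zero, while an isolated hot pixel produces a strong response at that pixel and a weaker one at its neighbors.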

Then the components of the center $c=({c}_{x},{c}_{y})$ of kernel ${k}_{s}(\cdot)$ are computed as

In addition, the center $v$ of kernel ${k}_{r}(\cdot)$ is defined as the quantized intensity value at position $({c}_{x},{c}_{y})$. Zeroth-moment information is also used to set the search window size in Ref. 6. Inspired by that work, we set the kernel bandwidth as a function of the zeroth moment of the local standard deviation image. If the maximum local standard deviation value in the target region of a given IR image is denoted as ${\delta}_{\mathrm{max}}$, the kernel bandwidth ${h}_{s}={h}_{r}=({h}_{x},{h}_{y})$ is defined as

The final target model is then

$${q}_{u}=\frac{C}{{h}_{s}^{2}{h}_{r}^{p}}\sum _{i=1}^{n}{k}_{s}\left({\parallel \frac{{x}_{i}^{s}-c}{{h}_{s}}\parallel}^{2}\right){k}_{r}\left({\parallel \frac{{x}_{i}^{r}-v}{{h}_{r}}\parallel}^{2}\right)\delta [b\left({x}_{i}\right)-u], \tag{9}$$

## 3. Experimental Results

In our experiments, an outer margin of 10 pixels around the target region forms the background sample. For the target region, a Gaussian kernel is adopted, while for the background region, we use a reverse Gaussian kernel.

Our insight is that the best-designed target model is the one that best distinguishes the target from the background, which is what a robust tracking task requires. The discriminative power of different target models can be measured by their relative entropy values, which are given by

$$W(p,b)=-\sum _{u=1}^{m}p\left(u\right)\mathrm{log}[p\left(u\right)/b\left(u\right)]-\sum _{u=1}^{m}b\left(u\right)\mathrm{log}[b\left(u\right)/p\left(u\right)] \tag{10}$$

## Table 1

Relative entropy values of different target model representations.

| Information Used in Kernel Density Estimation | A1 | A2 | B1 | B2 | C1 | C2 | D1 | D2 |
|---|---|---|---|---|---|---|---|---|
| Spatial relation [Eq. 1] | $-8.03$ | $-4.59$ | $-9.11$ | $-5.10$ | $-3.54$ | $-1.76$ | $-3.95$ | $-2.64$ |
| Intensity + spatial [Eq. 3] | $-8.15$ | $-4.61$ | $-11.28$ | $-8.09$ | $-2.70$ | $-1.45$ | $-4.60$ | $-2.84$ |
| Our method [Eq. 9] | $-11.67$ | $-10.18$ | $-12.67$ | $-11.00$ | $-4.83$ | $-6.62$ | $-4.97$ | $-3.99$ |
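The measure in Eq. 10 can be sketched in a few lines of Python. The natural logarithm, the small `eps` added to avoid taking the logarithm of zero in empty bins, and the function name `relative_entropy` are assumptions of this example, not details given in the text.

```python
import numpy as np

def relative_entropy(p, b, eps=1e-12):
    """W(p, b) from Eq. 10 for two m-bin histograms.

    Assumption: eps regularizes empty bins; both inputs are renormalized.
    """
    p = np.asarray(p, dtype=float) + eps
    b = np.asarray(b, dtype=float) + eps
    p = p / p.sum()
    b = b / b.sum()
    return float(-np.sum(p * np.log(p / b)) - np.sum(b * np.log(b / p)))
```

As written, $W$ is the negated symmetric relative entropy, so it is zero for identical histograms and becomes more negative as the target and background histograms separate, which matches the sign of the entries in Table 1.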

We also embedded the proposed kernel density estimation in a mean shift tracking system. Figure 3 shows selected frames of a 180-frame test video sequence in which each frame is $128\times 128$ pixels. Here, the intensity space is taken as the feature space and is quantized into 32 bins. The tracking algorithm with the different target model constructions was developed in MATLAB 7.0 on a Pentium 4 platform. In Figs. 4a and 4b, the rectangles on the left IR images show the initial target bounding box, and the plots on the right show the tracking performance of the mean shift tracking algorithm with the different target model representations. The results show that the proposed method tracks the target more effectively, with only minor prediction errors, and that its advantage is most evident when the initially selected target region is poorly located.

The additional per-frame computational cost incurred by the proposed target model representation is dominated by the computation of the local standard deviation and moment. Based on the target information in the previous frames, we perform this computation in a region that is 2 to 3 pixels larger than the actual target region. With the initial target bounding box illustrated in Fig. 4a, the current implementations of the mean shift tracking algorithm track at 15, 14, and 13 frames/s for the target models obtained with Eqs. 1, 3, and 9, respectively. With the initial target bounding box shown in Fig. 4b, the corresponding frame rates are 15, 14, and 12 frames/s. The tracking algorithm with the proposed target model construction therefore remains effective at the price of a slightly higher computational complexity and implementation cost.
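The tracker itself is not listed in the text. For orientation, here is a sketch of a single location update from the standard mean shift tracking formulation in the kernel-based tracking literature (Ref. 1), not the authors' implementation. It assumes an Epanechnikov profile, for which the update reduces to a weighted mean of pixel positions, instead of the Gaussian kernel used in the experiments; the function name and its parameters are hypothetical.

```python
import numpy as np

def mean_shift_step(image, center, half_size, q, n_bins=32, levels=256):
    """One mean shift location update toward the target model q (NumPy array)."""
    cy, cx = center
    hy, hx = half_size
    rows, cols = image.shape
    # Candidate window around the current center, clipped to the frame.
    y0, y1 = max(int(round(cy - hy)), 0), min(int(round(cy + hy)) + 1, rows)
    x0, x1 = max(int(round(cx - hx)), 0), min(int(round(cx + hx)) + 1, cols)
    ys, xs = np.mgrid[y0:y1, x0:x1]
    bins = (image[y0:y1, x0:x1].astype(np.int64) * n_bins) // levels
    r2 = ((ys - cy) / hy) ** 2 + ((xs - cx) / hx) ** 2
    inside = r2 <= 1.0
    # Candidate model p at the current center (Epanechnikov profile).
    k = np.where(inside, 1.0 - r2, 0.0)
    p = np.bincount(bins.ravel(), weights=k.ravel(), minlength=n_bins)
    p = p / max(p.sum(), 1e-12)
    # Per-pixel weights sqrt(q_u / p_u) for the bin u of each pixel.
    w = np.sqrt(q[bins] / np.maximum(p[bins], 1e-12)) * inside
    total = w.sum()
    if total == 0.0:
        return float(cy), float(cx)  # no target-like pixels in the window
    return float((ys * w).sum() / total), float((xs * w).sum() / total)
```

In a tracking loop this step would be repeated in each frame until the center moves by less than a chosen tolerance, with `q` built once from the initial target region using whichever target model (Eq. 1, 3, or 9) is being compared.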

## 4. Conclusions

A new method that incorporates multiple kinds of information into the kernel density estimation of an IR target model has been proposed. The local standard deviation information is used to select an appropriate target center and kernel bandwidth. The resulting target model was evaluated using the relative entropy between the two classes and was applied in a mean shift tracking system for IR target tracking to verify its effectiveness.

## Acknowledgments

We would like to thank the anonymous reviewers for their valuable comments. This work is partially supported by the Aeronautics Science Fund (China) under Grant No. 04F57004.