## Introduction

Kernel-based tracking (KBT), also called “mean-shift” tracking, has attracted growing attention because of its speed and simplicity.^{1} Generally, a visual target is tracked by extracting a set of target features in the first frame, which serves as a template, and then finding the image region that matches the template as closely as possible in the remaining frames. One of the most critical challenges in building a robust tracker is how to update the template. In KBT, the target model can be regarded as the tracking template, and the tracking window’s scale is the only parameter. However, in traditional KBT, the target model is not updated during the whole tracking period, which leads to poor localization when the object changes its scale or appearance. Thus, the template-updating problem of KBT has two aspects: target-scale update and target-model update.

A simple solution is to update the target model periodically according to a constant threshold on the similarity metric used in tracking: the model is updated whenever the correlation of the model with the target is higher than a predefined threshold. However, this simple solution tends to lead to tracking failure, because each time the target model is updated, small errors are introduced in the location of the template. With each update, these errors accumulate and the tracking region steadily drifts away from the object. To deal with the template-updating problem of KBT, more effective updating methods have been proposed. The technique of “tracking through scale space” is proposed in Ref. 2; it uses Lindeberg’s theory to select the best scale of the tracking window. In Ref. 3, a multiple-kernel tracking method is proposed, in which the dimensionality of the measurement space is increased by using multiple kernels. In Ref. 4, the target model is updated according to the distributions of the target intensity and the local standard deviation measures. In Ref. 5, the mean shift is modified to deal with dynamically changing color probability distributions derived from video frame sequences; the modified algorithm is called the continuously adaptive mean shift (CAMSHIFT). In CAMSHIFT, the continuously adaptive distribution image can be used directly to track a human face. However, KBT is more robust than CAMSHIFT in many other applications, because KBT uses a more static kernel histogram as the target model. If CAMSHIFT and KBT are regarded as two extremes of mean shift, then the proposed tracking algorithm seeks a balance between the stability of KBT and the adaptability of CAMSHIFT to create a more robust tracker.

In KBT, both the target model and the target candidate are characterized by a kernel-based histogram vector, an image-probability distribution obtained by weighting the histogram with a simple monotonically decreasing kernel profile. A kernel-based histogram can be denoted as follows:

## Eq. 1

$$\vec{q}={\left[{q}_{u}\right]}_{u=1,\dots ,m},\quad \text{and}\quad {q}_{u}=\frac{1}{{C}_{h}}\sum _{i=1}^{n}\text{Kernel}[{X}_{i}-{c}^{0}]\,\delta [b\left({X}_{i}\right),u],$$

Originally, KBT is derived from the second-order Taylor expansion of the Bhattacharyya coefficient, which is defined as follows:

## Eq. 3

$$B[\vec{p}\left({c}^{k}\right),\vec{q}]=\sum _{u=1}^{m}\sqrt{{p}_{u}\left({c}^{k}\right){q}_{u}}.$$

## Eq. 4

$$\Delta {c}^{*}=\underset{\Delta {c}^{k}}{\mathrm{arg}\,\mathrm{max}}\;B[\vec{q},\vec{p}({c}^{k}+\Delta {c}^{k})].$$

## Eq. 5

$$\Delta {c}^{k}=\frac{{\sum}_{i=1}^{n}\text{Kernel}({X}_{i}-{c}^{k})w\left({X}_{i}\right)({X}_{i}-{c}^{k})}{{\sum}_{i=1}^{n}\text{Kernel}({X}_{i}-{c}^{k})w\left({X}_{i}\right)},$$

To find the proper $\Delta {c}^{*}$ in Eq. 4, an iterated computation is needed: $w\left({X}_{i}\right)$ is computed using Eq. 6 and $\Delta {c}^{k}$ is derived using Eq. 5. The Bhattacharyya coefficient is computed after the algorithm completes to evaluate the similarity between the target model and the chosen candidate model.
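The quantities in Eqs. 1, 3, and 5 can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions rather than the letter's implementation: the Epanechnikov profile is one possible monotonically decreasing kernel, `bins` is assumed to be a precomputed integer array holding the bin index $b(X_i)$ of every pixel in the window, and the weight $w(X_i)=\sqrt{q_u/p_u}$ is the form used in the standard KBT formulation. All function names are hypothetical.

```python
import numpy as np

def epanechnikov_weights(height, width):
    """Assumed kernel profile: decreases monotonically from the window centre."""
    ys, xs = np.mgrid[0:height, 0:width]
    # Normalise coordinates so the centre is 0 and the border is close to 1.
    ny = (ys - (height - 1) / 2.0) / (height / 2.0)
    nx = (xs - (width - 1) / 2.0) / (width / 2.0)
    return np.clip(1.0 - (nx ** 2 + ny ** 2), 0.0, None)

def kernel_histogram(bins, weights, m):
    """Eq. 1: kernel-weighted histogram; dividing by the sum plays the role of 1/C_h."""
    hist = np.bincount(bins.ravel(), weights=weights.ravel(), minlength=m)
    return hist / hist.sum()

def bhattacharyya(p, q):
    """Eq. 3: similarity between a candidate histogram p and the model q."""
    return float(np.sum(np.sqrt(p * q)))

def mean_shift_vector(bins, weights, p, q, eps=1e-12):
    """Eq. 5: weighted mean offset (dx, dy) of the pixels from the window centre."""
    height, width = bins.shape
    ys, xs = np.mgrid[0:height, 0:width]
    dy = ys - (height - 1) / 2.0
    dx = xs - (width - 1) / 2.0
    # Assumed weight w(X_i) = sqrt(q_u / p_u) for the bin u = b(X_i).
    w = np.sqrt(q / np.maximum(p, eps))[bins]
    total = np.sum(weights * w)
    if total <= 0:
        # No overlap between model and candidate: no shift can be estimated.
        return 0.0, 0.0
    return float(np.sum(weights * w * dx) / total), float(np.sum(weights * w * dy) / total)
```

In an iterated tracker, the window would be moved by the returned offset, the candidate histogram recomputed at the new position, and the step repeated until the offset becomes small.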

## Continuously Adaptive Distribution

The primary difference between CAMSHIFT and mean shift is that CAMSHIFT uses a continuously adaptive probability distribution, which means the kernel-based histogram is recomputed for each frame. In KBT, the candidate model can also be recomputed and used to create the probability distribution image; thus, the candidate model is regarded as a continuously adaptive distribution in this letter. The target’s probability distribution image is then created by histogram backprojection (HBP), which replaces each pixel value of the input image with the value of the corresponding histogram bin. Thus, if the candidate model keeps proper coherence with the target model during tracking, the target is enhanced by HBP. To map the probability values into the range 0 to 255, the histogram bin values are rescaled as follows:

## Eq. 7

$$\widehat{p}={\left[{\widehat{p}}_{u}\right]}_{u=1,\dots ,m}\quad \text{and}\quad {\widehat{p}}_{u}=\mathrm{min}\left(\frac{255}{\mathrm{max}\left(\vec{p}\right)}{p}_{u},255\right).$$

## Zeroth Moment of HBP Image

In CAMSHIFT, the scale of the tracking window is determined by the zeroth moment. After HBP, let $I(x,y)$ be the intensity of the HBP image at $(x,y)$ within the tracking window. The zeroth moment is computed as follows:

## Eq. 8

$${M}_{00}=\sum _{x}\sum _{y}I(x,y).$$
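The rescaling of Eq. 7, the backprojection step, and the zeroth moment of the backprojected window can be sketched as follows. This is an illustrative NumPy sketch, not the letter's code: the function names are hypothetical, and `bins` is assumed to hold the histogram bin index of every pixel in the tracking window.

```python
import numpy as np

def rescale_histogram(p):
    """Eq. 7: stretch the bin values so that the largest bin maps to 255."""
    peak = p.max()
    if peak <= 0:
        # Empty histogram: nothing to stretch.
        return np.zeros_like(p, dtype=float)
    return np.minimum(255.0 / peak * p, 255.0)

def back_project(bins, p_hat):
    """HBP: replace each pixel's bin index with the rescaled value of that bin."""
    return p_hat[bins]

def zeroth_moment(hbp_window):
    """Zeroth moment: sum of the HBP intensities I(x, y) over the tracking window."""
    return float(hbp_window.sum())
```

For instance, a three-bin histogram `[0.1, 0.4, 0.5]` is rescaled to approximately `[51, 204, 255]`, so pixels in the dominant bin appear brightest in the HBP image and contribute most to the zeroth moment.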

The tracking window’s scale $({S}_{x},{S}_{y})$ is then determined from ${M}_{00}$ according to Eqs. 9 and 10, where $\alpha $ is the ratio of the tracking window’s scale to the target’s scale and $\beta $ is the ratio of ${S}_{y}$ to ${S}_{x}$. Because the size of the search window is rounded up to the current or next-greatest odd number, in practice the tracking window is set larger than the target. In this letter, $\alpha =1.1$.

## Visual Tracking with Continuous Adaptive Distribution

The proposed algorithm can be divided into three phases. First, the tracked target’s position is estimated using the KBT algorithm. Second, the tracking window’s scale is determined using the zeroth moment, which is computed by CAMSHIFT. Third, the tracking window’s scale and the target model are updated. The proposed algorithm is illustrated in Fig. 1.

As Fig. 1 shows, the target region is denoted as $T{R}^{k-1}$ and the corresponding target model as ${\vec{q}}^{k-1}$. The target’s template information is completely contained in $T{R}^{k-1}$ and ${\vec{q}}^{k-1}$. After KBT is executed, the tracking window is moved to the candidate region, denoted as $C{R}^{k}$, in which the candidate model is denoted as ${\vec{p}}^{k}$. On the basis of $C{R}^{k}$ and ${\vec{p}}^{k}$, a probability distribution image is created, and CAMSHIFT is then executed to obtain the target’s new scale information $({S}_{x},{S}_{y})$. Finally, ${\vec{p}}^{k}$, ${\vec{q}}^{k-1}$, and $({S}_{x},{S}_{y})$ are used to update the template. The proposed algorithm is given in detail here. After the initial stage, tracking starts from frame $k$ ($k=2$) as follows:

**Input:** the target’s center position $\langle {x}_{c}^{k-1},{y}_{c}^{k-1}\rangle$ and scale $({S}_{x}^{k-1},{S}_{y}^{k-1})$ in the previous frame.

**Step 1:** Determine the tracking region: $T{R}^{k-1}=\mathrm{RECT}({x}_{c}^{k-1},{y}_{c}^{k-1},{S}_{x}^{k-1},{S}_{y}^{k-1})$.

**Step 2:** Compute the mean-shift vector $\Delta {c}^{k}=(\Delta {x}^{k},\Delta {y}^{k})$ using Eqs. 5 and 6. Update the candidate region: $C{R}^{k}=\mathrm{RECT}({x}_{c}^{k-1}+\Delta {x}^{k},{y}_{c}^{k-1}+\Delta {y}^{k},{S}_{x}^{k-1},{S}_{y}^{k-1})$. Extract the new candidate model ${\vec{p}}^{k}$ in $C{R}^{k}$ using Eq. 2.

**Step 3:** Transform $C{R}^{k}$ by histogram backprojection according to Eq. 7 and ${\vec{p}}^{k}$.

**Step 4:** Compute ${M}_{00}$ using Eq. 8 iteratively until CAMSHIFT converges.

**Step 5:** Compute the tracking window scale $({S}_{x},{S}_{y})$ using Eqs. 9 and 10.

**Step 6:** According to Eq. 3, compute $B({\vec{q}}^{k-1},{\vec{p}}^{k})$ with the new scale $({S}_{x},{S}_{y})$.

**Output:** **If** $B({\vec{q}}^{k-1},{\vec{p}}^{k-1})<B({\vec{q}}^{k-1},{\vec{p}}^{k})$

**Else**

**End**.

In the proposed algorithm, in addition to the target’s kernel histogram, the target’s continuously adaptive probability distribution image is also used as a tracking feature. As in CAMSHIFT, the zeroth moment of the HBP image is used to estimate the target scale. The differences are that the target’s position is computed by KBT in the tracking phase, and that the Bhattacharyya coefficient is used as an updating indicator and weighting factor in the updating phase. As the updating phase shows, the target model becomes a dynamic distribution that adapts to the likelihood between the current and previous distributions.
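As an illustration only, one way a similarity score can act both as an update indicator and as a weighting factor is a convex blend of the previous model and the new candidate. The rule below, including the function name `blend_model`, the blend weights, and the choice to keep the old model when the similarity does not improve, is an assumption made for this sketch and should not be read as the letter's update formula.

```python
import numpy as np

def blend_model(q_prev, p_new, b_prev, b_new):
    """Hypothetical update: blend the old model with the new candidate.

    b_prev = B(q_prev, p_prev) and b_new = B(q_prev, p_new) are Bhattacharyya
    coefficients in [0, 1], computed beforehand.
    """
    if b_prev < b_new:
        # Indicator satisfied: weight the new candidate by its similarity.
        q = (1.0 - b_new) * q_prev + b_new * p_new
        return q / q.sum()
    # Assumed fallback: keep the previous model unchanged.
    return q_prev.copy()
```

With such a blend, a candidate that closely matches the model is absorbed almost fully, while a poorly matching one changes the model only slightly, which limits drift.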

## Results

In the test video sequence, the target’s scale and appearance change constantly against a dynamic, cluttered background. As Fig. 2 shows, traditional KBT cannot adapt to the target’s varying scale during tracking. Figure 2 also shows that the tracking performance of CAMSHIFT is poor: because of the cluttered background, CAMSHIFT always produces a tracking window that is too large. The tracking results of the proposed algorithm are also shown in Fig. 2; the tracking window’s scale is properly updated and the target is effectively discriminated from the cluttered background.

In Fig. 3, the similarity between the target model and the candidate model is illustrated. Compared with traditional KBT, the proposed algorithm keeps better coherence with the tracking template, which indicates that the target model has been updated properly.

## Conclusion and Discussion

There are two major contributions in this letter. First, a new scale-updating method is proposed: by using the target’s continuously adaptive distribution image, the proposed algorithm can update the tracking window’s scale. Second, by integrating the continuous updating method into KBT, the target model can also be updated within the mean-shift framework. Experiments show that the proposed method keeps track of the target even when the target’s scale and appearance change constantly against a cluttered background. On the basis of the proposed algorithm, further extensions and improvements could be made by combining multiple color or texture spaces.

## Acknowledgment

This work was supported by the National Natural Science Foundation of China (Grant No. 60775022).

## References

1. “Kernel-based object tracking,” IEEE Trans. Pattern Anal. Mach. Intell. 25, 564–577 (2003). https://doi.org/10.1109/TPAMI.2003.1195991

2. “Mean-shift blob tracking through scale space,” 234–240 (2003). https://doi.org/10.1109/CVPR.2003.1211475

3. “Multiple kernel tracking with SSD,” 790–797 (2004). https://doi.org/10.1109/CVPR.2004.1315112

4. “Target tracking in airborne forward looking infrared imagery,” Image Vis. Comput. 21, 623–635 (2003). https://doi.org/10.1016/S0262-8856(03)00059-3

5. “Computer vision face tracking as a component of a perceptual user interface,” 214–219 (1998).