## 1.

## Introduction

Object tracking is a common vision task that finds and follows moving objects across consecutive frames. It has widespread applications in fields ranging from video coding, visual surveillance, and human-computer interaction to intelligent robotics.

Among numerous object tracking algorithms, mean shift (MS) object tracking has received growing interest since it was introduced by Comaniciu, Ramesh, and Meer.^{1} This method tracks an object region represented by a spatially weighted intensity histogram. An objective function that compares target and candidate kernel densities is formulated using the Bhattacharyya coefficient, and tracking is achieved by optimizing this objective function with the iterative MS algorithm. Although the MS object tracking algorithm performs well on sequences with relatively small object displacement, its performance is not guaranteed when objects move fast or undergo partial or full occlusion.

To overcome this disadvantage of the MS tracking method, an improved MS object tracking algorithm was proposed in Ref. 2 that initializes MS with the predicted value of a Kalman filter (KF). In Ref. 3, the exact target center is obtained at each frame by combining the two target centers estimated by the KF and the MS algorithm, respectively. Since the prediction and measurement errors of KF are set as constant, this algorithm is not robust enough. A new object tracking scheme is proposed in Ref. 4 that combines the sum-of-squared-differences object tracking method and the MS object tracking method in the KF framework. In this method, to handle partial occlusion, the whole object is represented by a number of elementary MS modules embedded within the object, rather than by a single global MS tracker. Consequently, this scheme is time consuming.

In this work, a novel object tracking algorithm based on MS and KF is proposed. First, the system model of KF is constructed, and the center of the object predicted by KF is used as the initial value of the MS algorithm. The MS search result is then fed back as the measurement of KF, and the estimated parameters of KF are adjusted adaptively by the Bhattacharyya coefficient. The proposed algorithm can accurately capture the object’s position when the object undergoes large displacements or occlusion.

The remainder of this work is organized as follows. In Sec. 2, we review MS object tracking briefly. The proposed object tracking algorithm is presented in Sec. 3. Experimental results are given in Sec. 4, followed by conclusions in Sec. 5.

## 2.

## Mean Shift Object Tracking

In the MS object tracking method,^{1} the target model is defined as its normalized color histogram $\mathbf{q}={\left\{{q}_{u}\right\}}_{u=1,\dots ,m}$, where $m$ is the number of bins. The normalized color distribution of a target candidate $\mathbf{p}\left(\mathbf{y}\right)={\left\{{p}_{u}\left(\mathbf{y}\right)\right\}}_{u=1,\dots ,m}$ centered at $\mathbf{y}$ in the current frame can be calculated as

## Eq. 1

$${p}_{u}\left(\mathbf{y}\right)={C}_{h}\sum _{i=1}^{{n}_{h}}k\left({\Vert \frac{\mathbf{y}-{\mathbf{x}}_{i}}{h}\Vert}^{2}\right)\delta [b\left({\mathbf{x}}_{i}\right)-u],$$

The Bhattacharyya coefficient, which evaluates the similarity of the target model and the target candidate model, is defined as

## Eq. 2

$$\rho \left(\mathbf{y}\right)=\rho [\mathbf{p}\left(\mathbf{y}\right),\mathbf{q}]=\sum _{u=1}^{m}{\left[{p}_{u}\left(\mathbf{y}\right){q}_{u}\right]}^{1/2}.$$

To find the location corresponding to the target in the current frame, the Bhattacharyya coefficient in Eq. 2 should be maximized as a function of $\mathbf{y}$, which can be solved by running the MS iterations. We assume that the search for the new target location in the current frame starts at the location ${\mathbf{y}}_{0}$. At each step of the iterative process, the estimated target moves from ${\mathbf{y}}_{0}$ to the new location ${\mathbf{y}}_{1}$, defined as

## Eq. 3

$${\mathbf{y}}_{1}=\frac{\sum _{i=1}^{{n}_{h}}{\mathbf{x}}_{i}{w}_{i}g\left({\Vert ({\mathbf{y}}_{0}-{\mathbf{x}}_{i})/h\Vert}^{2}\right)}{\sum _{i=1}^{{n}_{h}}{w}_{i}g\left({\Vert ({\mathbf{y}}_{0}-{\mathbf{x}}_{i})/h\Vert}^{2}\right)},$$

## Eq. 4

$${w}_{i}=\sum _{u=1}^{m}{\left[\frac{{q}_{u}}{{p}_{u}\left({\mathbf{y}}_{0}\right)}\right]}^{1/2}\delta [b\left({\mathbf{x}}_{i}\right)-u],$$

## 3.

## Adaptive Kalman Filter for Object Tracking

In this work, the MS object tracking method is integrated into the KF framework, and an adaptive KF algorithm for object tracking is proposed. First, MS initialized by the predicted value of KF is used to search for the target position. The MS search result is then fed back as the measurement of KF, and the estimated parameters of KF are adjusted adaptively by the Bhattacharyya coefficient. For faster implementation, two independent KF trackers are defined, one for horizontal and one for vertical movement.
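The MS search that the KF prediction initializes can be sketched as a single update of Eqs. 3 and 4. This is a minimal illustration, not the authors' implementation: it assumes the Epanechnikov profile reported in Sec. 4, for which $g$ is constant inside the kernel window and cancels in Eq. 3, and it assumes the caller has already collected the pixels inside the window at ${\mathbf{y}}_{0}$ together with their histogram bin indices. The function name `mean_shift_step` and its argument layout are hypothetical.

```python
import math


def mean_shift_step(pixels, q, p):
    """One MS update (Eqs. 3 and 4), assuming a constant g inside the window.

    pixels: list of ((x, y), u) for pixels inside the kernel window at y0,
            where u is the histogram bin index b(x_i) of the pixel
    q, p:   target histogram and candidate histogram at y0 (length m each)
    Returns the new location y1 as (x, y), or None if all weights are zero.
    """
    num_x = num_y = den = 0.0
    for (x, y), u in pixels:
        if p[u] <= 0.0:
            continue  # guard against division by zero for an empty bin
        w = math.sqrt(q[u] / p[u])  # Eq. 4: weight of pixel i
        num_x += x * w
        num_y += y * w
        den += w
    if den == 0.0:
        return None
    return (num_x / den, num_y / den)  # Eq. 3 with g constant


# Two pixels: the second falls in a bin that is under-represented in the
# candidate (p < q), so it receives the larger weight and pulls the center.
y1 = mean_shift_step([((0.0, 0.0), 0), ((2.0, 0.0), 1)],
                     q=[0.2, 0.8], p=[0.5, 0.5])
print(y1)
```

In a full tracker this step would be repeated, recomputing the candidate histogram at each new location, until the displacement becomes small.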

## 3.1.

### Model of the Kalman Filter

We define the following variables: discrete time $t$, state vector $\mathbf{X}\left(t\right)$, measurement vector $\mathbf{Z}\left(t\right)$, state transition matrix $\mathbf{A}$, measurement matrix $\mathbf{C}$, state noise $\mathbf{\nu}\left(t\right)$, and measurement noise $\mathbf{\mu}\left(t\right)$. The system is expressed as

## Eq. 5

$$\left\{\begin{array}{l}\mathbf{X}\left(t\right)=\mathbf{A}\mathbf{X}(t-1)+\mathbf{\nu}(t-1)\\ \mathbf{Z}\left(t\right)=\mathbf{C}\mathbf{X}\left(t\right)+\mathbf{\mu}\left(t\right)\end{array}\right..$$

We assume that $\mathbf{\nu}(t-1)$ and $\mathbf{\mu}\left(t\right)$ are zero-mean Gaussian random variables, so their probability density functions are $N[0,\mathbf{Q}(t-1)]$ and $N[0,\mathbf{R}\left(t\right)]$, where the covariance matrices $\mathbf{Q}(t-1)$ and $\mathbf{R}\left(t\right)$ are referred to as the transition noise covariance matrix and the measurement noise covariance matrix, respectively.

The model used to track the object is as follows. The state vector is $\mathbf{X}={(x,v,a)}^{T}$, where $x$, $v$, and $a$ represent the (horizontal or vertical) center, velocity, and acceleration, respectively. The measurement vector is $\mathbf{Z}=x$. The state transition matrix is

## Eq. 6

$$\mathbf{A}=\left(\begin{array}{ccc}1& \Delta t& 0.5\Delta {t}^{2}\\ 0& 1& \Delta t\\ 0& 0& 1\end{array}\right),$$

## Eq. 7

$$\mathbf{Q}(t-1)=\left[\begin{array}{ccc}{\sigma}_{1}^{2}(t-1)& 0& 0\\ 0& 0.5{\sigma}_{1}^{2}(t-1)& 0\\ 0& 0& 0.2{\sigma}_{1}^{2}(t-1)\end{array}\right],$$

## 3.2.

### Adaptive Kalman Filter

In the KF algorithm, the measurement error covariance $\mathbf{R}\left(t\right)$ and the Kalman gain are inversely related. As the covariance matrix $\mathbf{R}\left(t\right)$ approaches zero, the Kalman gain weights the residual more heavily. In this case, the measurement is trusted more and more, while the predicted result is trusted less and less. On the other hand, as the *a priori* estimate error covariance of KF approaches zero, the Kalman gain weights the residual less heavily. The actual measurement is trusted less and less, while the predicted result is trusted more and more.^{5} Therefore, the system will achieve a near-optimal result if we can decide which one to trust. In this work, the so-called adaptive KF allows the estimated parameters $\mathbf{R}\left(t\right)$ and $\mathbf{Q}(t-1)$ of KF to adjust automatically according to the Bhattacharyya coefficient of MS object tracking.
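This trade-off can be read directly from the standard KF gain expression. The notation ${\mathbf{P}}^{-}\left(t\right)$ for the *a priori* estimate error covariance, the scalar measurement variance $r$, and the measurement matrix $\mathbf{C}=(1\;0\;0)$ are assumptions introduced here for illustration; the last is chosen to match the scalar position measurement $\mathbf{Z}=x$ of Sec. 3.1.

$$\mathbf{K}\left(t\right)={\mathbf{P}}^{-}\left(t\right){\mathbf{C}}^{T}{\left[\mathbf{C}{\mathbf{P}}^{-}\left(t\right){\mathbf{C}}^{T}+\mathbf{R}\left(t\right)\right]}^{-1}$$

With $\mathbf{R}\left(t\right)=r$ and $\mathbf{C}$ as assumed, the gain component acting on the position is

$${K}_{1}\left(t\right)=\frac{{P}_{11}^{-}\left(t\right)}{{P}_{11}^{-}\left(t\right)+r},$$

so ${K}_{1}\to 1$ as $r\to 0$ (the measurement replaces the prediction), and ${K}_{1}\to 0$ as ${P}_{11}^{-}\to 0$ or as $r\to \infty$ (the prediction is kept).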

In the MS object tracking method, the Bhattacharyya coefficient evaluates the similarity of the target and candidate models. When the tracked object is occluded by other objects or by the background, the Bhattacharyya coefficient drops sharply. Thus, we define a threshold ${T}_{h}$ to determine whether occlusion has occurred.
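The occlusion test can be sketched in a few lines. This is a minimal illustration under stated assumptions: histograms are plain lists that each sum to one, the default threshold is the value ${T}_{h}=0.6$ reported in Sec. 4, and the function names `bhattacharyya` and `is_occluded` are hypothetical.

```python
import math


def bhattacharyya(p, q):
    """Bhattacharyya coefficient of two normalized histograms (Eq. 2)."""
    return sum(math.sqrt(pu * qu) for pu, qu in zip(p, q))


def is_occluded(p, q, t_h=0.6):
    """Flag occlusion when the similarity falls below the threshold T_h."""
    return bhattacharyya(p, q) < t_h


target = [0.5, 0.5, 0.0, 0.0]
same = [0.5, 0.5, 0.0, 0.0]      # candidate identical to the target
blocked = [0.0, 0.1, 0.4, 0.5]   # candidate dominated by occluder colors

print(bhattacharyya(target, same), is_occluded(target, same))
print(bhattacharyya(target, blocked), is_occluded(target, blocked))
```

The coefficient is 1 for identical histograms and 0 for histograms with no shared bins, so a single threshold separates a good match from a candidate dominated by the occluder.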

Let the search result of MS in the current frame $t$ be ${\widehat{\mathbf{y}}}_{t}$; the Bhattacharyya coefficient $\rho \left({\widehat{\mathbf{y}}}_{t}\right)$ then evaluates the similarity of the target model and the candidate model centered at ${\widehat{\mathbf{y}}}_{t}$. Since the search result of MS is used as the measurement of KF, the Bhattacharyya coefficient is used in the correction step to adjust the estimated parameters of the adaptive KF. If the Bhattacharyya coefficient $\rho \left({\widehat{\mathbf{y}}}_{t}\right)$ is at least the threshold ${T}_{h}$, then ${\sigma}_{1}^{2}(t-1)$, which scales $\mathbf{Q}(t-1)$ in Eq. 7, is set to $\rho \left({\widehat{\mathbf{y}}}_{t}\right)$, and ${\sigma}_{2}^{2}\left(t\right)$, the measurement noise variance that defines $\mathbf{R}\left(t\right)$, is set to $1-\rho \left({\widehat{\mathbf{y}}}_{t}\right)$. Otherwise, it is reasonable to let ${\sigma}_{1}^{2}(t-1)$ and ${\sigma}_{2}^{2}\left(t\right)$ be zero and infinity, respectively, so that the Kalman gain is zero. To smooth temporal variations, the parameters associated with the current frame are obtained through temporal filtering,

## Eq. 8

$$\left\{\begin{array}{l}{\sigma}_{1}^{2}(t-1)=(1-\lambda ){\widehat{\sigma}}_{1}^{2}(t-1)+\lambda {\sigma}_{1}^{2}(t-2)\\ {\sigma}_{2}^{2}\left(t\right)=(1-\lambda ){\widehat{\sigma}}_{2}^{2}\left(t\right)+\lambda {\sigma}_{2}^{2}(t-1)\end{array}\right.,$$

## Eq. 9

$${\widehat{\sigma}}_{1}^{2}(t-1)=\left\{\begin{array}{cc}\rho \left({\widehat{\mathbf{y}}}_{t}\right)& \text{if}\ \rho \left({\widehat{\mathbf{y}}}_{t}\right)\geq {T}_{h}\\ 0& \text{otherwise}\end{array}\right.,$$

## Eq. 10

$${\widehat{\sigma}}_{2}^{2}\left(t\right)=\left\{\begin{array}{cc}1-\rho \left({\widehat{\mathbf{y}}}_{t}\right)& \text{if}\ \rho \left({\widehat{\mathbf{y}}}_{t}\right)\geq {T}_{h}\\ T& \text{otherwise}\end{array}\right..$$

Here $T$ is a large constant that stands in for infinity, so that the *a posteriori* estimate of KF approximates its predicted value, and $\lambda \in [0,1]$ is the forgetting factor. The lower $\lambda $ is, the faster ${\sigma}_{1}^{2}(t-1)$ and ${\sigma}_{2}^{2}\left(t\right)$ are updated.
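As a worked example of Eqs. 8, 9, and 10, take the experimental settings of Sec. 4 ($T=1000$, ${T}_{h}=0.6$, $\lambda =0.1$) and assume, purely for illustration, previous values ${\sigma}_{1}^{2}(t-2)=0.8$ and ${\sigma}_{2}^{2}(t-1)=0.2$. For an unoccluded frame with $\rho \left({\widehat{\mathbf{y}}}_{t}\right)=0.9\geq {T}_{h}$:

$${\sigma}_{1}^{2}(t-1)=0.9\times 0.9+0.1\times 0.8=0.89,\qquad {\sigma}_{2}^{2}\left(t\right)=0.9\times 0.1+0.1\times 0.2=0.11.$$

For an occluded frame with $\rho \left({\widehat{\mathbf{y}}}_{t}\right)=0.3<{T}_{h}$:

$${\sigma}_{1}^{2}(t-1)=0.9\times 0+0.1\times 0.8=0.08,\qquad {\sigma}_{2}^{2}\left(t\right)=0.9\times 1000+0.1\times 0.2=900.02.$$

In the occluded case the measurement variance grows by several orders of magnitude within one frame, so the correction step gives the MS result almost no weight.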

According to the Bhattacharyya coefficient, the KF system can be adjusted automatically to estimate the center of the tracked object. For the sake of clarity, we present here the whole algorithm.

Input: state vector ${\mathbf{X}}_{x}\left(t\right)$ of the target’s horizontal center; state vector ${\mathbf{X}}_{y}\left(t\right)$ of the target’s vertical center and the target model $\mathbf{q}={\left\{{q}_{u}\right\}}_{u=1,\dots ,m}$ .

Step 1: predict the target’s horizontal center and vertical center by using the state equation of KF, respectively.

Step 2: employ MS initialized by the predicted value of KF to search for the center of the object in the current frame $t+1$, obtaining the search result ${\widehat{\mathbf{y}}}_{t+1}=({\widehat{x}}_{t+1},{\widehat{y}}_{t+1})$ .

Step 3: compute the Bhattacharyya coefficient $\rho \left({\widehat{\mathbf{y}}}_{t+1}\right)$ .

Step 4: according to Eqs. 8, 9, 10, compute the parameters $\mathbf{Q}\left(t\right)$ and $\mathbf{R}(t+1)$ .

Step 5: using ${\widehat{x}}_{t+1}$ and ${\widehat{y}}_{t+1}$ as the measurements of two KFs, compute ${\mathbf{X}}_{x}(t+1)$ and ${\mathbf{X}}_{y}(t+1)$ by the correction step of KF, respectively.
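Steps 1 to 5 can be sketched for a single axis as follows. This is a minimal illustration, not the authors' implementation, and it rests on several assumptions: the measurement matrix is taken as $\mathbf{C}=(1\;0\;0)$ and $\mathbf{R}={\sigma}_{2}^{2}$; the error covariance is propagated inside Step 5, after Step 4 has produced $\mathbf{Q}$; the initial covariance is the identity; the initial ${\sigma}_{1}^{2}$ and ${\sigma}_{2}^{2}$ are 0.8 and 0.2; and the MS search of Steps 2 and 3 is replaced by caller-supplied pairs of measurement and Bhattacharyya coefficient. All function names are hypothetical.

```python
DT = 1.0  # frame interval (assumed)
A = [[1.0, DT, 0.5 * DT * DT],
     [0.0, 1.0, DT],
     [0.0, 0.0, 1.0]]  # Eq. 6


def adapt_noise(rho, s1_prev, s2_prev, t_h=0.6, big_t=1000.0, lam=0.1):
    """Eqs. 8-10: map the Bhattacharyya coefficient to sigma_1^2, sigma_2^2."""
    s1_hat, s2_hat = (rho, 1.0 - rho) if rho >= t_h else (0.0, big_t)
    return ((1 - lam) * s1_hat + lam * s1_prev,
            (1 - lam) * s2_hat + lam * s2_prev)


def predict_state(x):
    """Step 1: x_pred = A x."""
    return [sum(A[i][k] * x[k] for k in range(3)) for i in range(3)]


def predict_cov(P, s1):
    """P_pred = A P A^T + Q, with the diagonal Q of Eq. 7."""
    AP = [[sum(A[i][k] * P[k][j] for k in range(3)) for j in range(3)]
          for i in range(3)]
    q = [s1, 0.5 * s1, 0.2 * s1]
    return [[sum(AP[i][k] * A[j][k] for k in range(3))
             + (q[i] if i == j else 0.0)
             for j in range(3)] for i in range(3)]


def correct(x_pred, P_pred, z, s2):
    """Step 5 for a scalar measurement, assuming C = (1 0 0), R = sigma_2^2."""
    s = P_pred[0][0] + s2                       # innovation variance
    K = [P_pred[i][0] / s for i in range(3)]    # Kalman gain
    resid = z - x_pred[0]
    x = [x_pred[i] + K[i] * resid for i in range(3)]
    P = [[P_pred[i][j] - K[i] * P_pred[0][j] for j in range(3)]
         for i in range(3)]
    return x, P


def track_axis(measurements, x0, s1=0.8, s2=0.2):
    """Run Steps 1-5 over (z, rho) pairs; returns the position estimates."""
    x = list(x0)
    P = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
    out = []
    for z, rho in measurements:
        x_pred = predict_state(x)               # Step 1
        # Steps 2-3: a real tracker would start MS at x_pred[0] to get z, rho.
        s1, s2 = adapt_noise(rho, s1, s2)       # Step 4
        x, P = correct(x_pred, predict_cov(P, s1), z, s2)  # Step 5
        out.append(x[0])
    return out


# Toy run: the object moves 2 px/frame. In the last three frames it is
# "occluded": the simulated MS result stays stuck at 18 and rho drops to 0.2.
meas = [(2.0 * t, 0.9) for t in range(1, 10)] + [(18.0, 0.2)] * 3
est = track_axis(meas, x0=[0.0, 2.0, 0.0])
print(est[-1])  # true position in the last frame is 24
```

In the toy run the estimate keeps advancing along the predicted trajectory during the simulated occlusion rather than locking onto the stuck measurement, which is the behavior the adaptive parameters are meant to produce.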

## 4.

## Experimental Results

To demonstrate the robustness and validity of the proposed algorithm, we report experimental results on real-life tracking scenarios and compare the tracking results of the proposed algorithm with those of the MS object tracking algorithm and the typical KF algorithm. In the typical KF algorithm, the system model of KF is the same as in Sec. 3.1. Both the prediction and measurement errors are held constant, at the experimentally chosen values of 0.8 and 0.2, respectively. In the experiment, the RGB color space was taken as the feature space, and it was quantized into $16\times 16\times 16$ bins. We chose the parameters $T=1000$, ${T}_{h}=0.6$, and $\lambda =0.1$ experimentally. The Epanechnikov profile is used for histogram computations.
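The histogram computation described here can be sketched as follows. This is a minimal illustration of Eq. 1 under stated assumptions: pixels are 8-bit RGB triples, the bandwidth $h$ is a single scalar as in Eq. 1, the Epanechnikov profile is used up to a constant factor (which is absorbed by the normalization ${C}_{h}$), and the function names are hypothetical.

```python
def bin_index(r, g, b, bins=16):
    """b(x): map an 8-bit RGB pixel to one of bins**3 histogram bins."""
    step = 256 // bins
    return (r // step) * bins * bins + (g // step) * bins + (b // step)


def epanechnikov(d2):
    """Epanechnikov profile k(d2), up to a constant; d2 is a squared distance."""
    return 1.0 - d2 if d2 < 1.0 else 0.0


def weighted_histogram(pixels, center, h, bins=16):
    """Eq. 1: kernel-weighted, normalized color histogram centered at `center`.

    pixels: list of ((x, y), (r, g, b)); h: scalar kernel bandwidth.
    """
    hist = [0.0] * bins ** 3
    for (x, y), (r, g, b) in pixels:
        d2 = ((x - center[0]) / h) ** 2 + ((y - center[1]) / h) ** 2
        hist[bin_index(r, g, b, bins)] += epanechnikov(d2)
    total = sum(hist)  # plays the role of 1 / C_h
    return [v / total for v in hist] if total > 0.0 else hist


# A red pixel at the center and a blue pixel halfway to the kernel edge:
# the blue pixel is down-weighted by the kernel.
hist = weighted_histogram([((0.0, 0.0), (255, 0, 0)),
                           ((1.0, 0.0), (0, 0, 255))],
                          center=(0.0, 0.0), h=2.0)
print(hist[bin_index(255, 0, 0)], hist[bin_index(0, 0, 255)])
```

Weighting by distance from the center makes the histogram less sensitive to pixels near the region boundary, which are the most likely to belong to the background or an occluder.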

The test video sequence has 140 frames of $360\times 640$ pixels. The results of frames 17, 105, 120, and 140 are shown in Fig. 1. The target was initialized with a hand-drawn elliptical region of size $45\times 25$. While the person walks slowly, the proposed algorithm, MS, and the typical KF algorithm all capture the target’s position accurately. At frame 79, the person suddenly speeds up, and the MS algorithm loses the target completely at frame 105. From frame 117 to 129, the person is occluded by a tree. The MS and typical KF algorithms fail after full occlusion, whereas the proposed algorithm still captures the target accurately. The Bhattacharyya coefficient values in the proposed algorithm are shown in Fig. 2. It can be seen that the Bhattacharyya coefficient values drop sharply while the person is occluded by the tree.

## 5.

## Conclusion

In this work, the MS object tracking method is integrated into the KF framework and an adaptive KF algorithm is proposed. First, MS initialized by the predicted value of KF is used to track the target position. The MS tracking result is then fed back as the measurement of KF, and the estimated parameters of KF are adjusted adaptively by the Bhattacharyya coefficient. In this way, the KF adjusts automatically to estimate the center of the tracked object. The experimental results demonstrate the robustness and validity of the proposed algorithm.

## References

1. “*Kernel-based object tracking*,” IEEE Trans. Pattern Anal. Mach. Intell., 25 (5), 564–577 (2003). https://doi.org/10.1109/TPAMI.2003.1195991

2. “*Mean shift and optimal prediction for efficient object tracking*,” 70–73 (2000).

3. “*Hybrid real-time tracking of non-rigid objects under occlusions*,” Proc. SPIE, 7252, 72520F (2009). https://doi.org/10.1117/12.806150

4. “*Robust tracking with motion estimation and local kernel-based color modeling*,” Image Vis. Comput., 25 (8), 1205–1216 (2007). https://doi.org/10.1016/j.imavis.2006.07.016

5. “*An introduction to the Kalman filter, SIGGRAPH 2001 course 8 in computer graphics*,” (2001). http://www.cs.unc.edu/~welch/publications.html