1 May 2009 Kernel-based visual tracking with continuous adaptive distribution
Abstract
The template-updating problem of kernel-based tracking (KBT) has two aspects: target-scale update and target-model update. The proposed algorithm updates both the tracking window's scale and the target model by making use of a continuously adaptive distribution. The capability of KBT is thus extended within its own framework at modest computational cost. The proposed tracking algorithm seeks a balance between the stability of KBT and the adaptability of CAMSHIFT to create a robust tracker.
Han, Jing, and Li: Kernel-based visual tracking with continuous adaptive distribution

1. Introduction

Kernel-based tracking (KBT), also called "mean-shift" tracking, has attracted more and more attention because of its speed and simplicity.1 Generally, a visual target is tracked by extracting some features of the target (which can be called a template) in the first frame and then finding the image region that matches the template as closely as possible in the remaining frames. One of the most critical challenges in creating a robust tracker is how to update the template. In KBT, the target model can be regarded as the tracking template, and the tracking window's scale is the only other parameter. However, in traditional KBT, the target model is not updated during the whole tracking period, which leads to poor localization when the object changes its scale or appearance. Thus, the template-updating problem of KBT includes two aspects: target-scale update and target-model update.

A simple solution is to update the target model periodically according to a constant threshold on the similarity metric used in tracking: the model is updated whenever its correlation with the target exceeds a predefined threshold. However, this simple solution often leads to tracking failure, because each time the target model is updated, small errors are introduced in the location of the template. These errors accumulate with each update, and the tracking region steadily drifts away from the object. To deal with the template-updating problem of KBT, more effective updating methods have been proposed. The technique of tracking through scale space is proposed in Ref. 2; it uses Lindeberg's scale-space theory to select the best scale of the tracking window. In Ref. 3, a multiple-kernel tracking method is proposed in which the dimensionality of the measurement space is increased by using multiple kernels. In Ref. 4, the target model is updated according to the distributions of the target intensity and local standard-deviation measures. In Ref. 5, mean shift is modified to deal with dynamically changing color probability distributions derived from video frame sequences; the modified algorithm is called the continuously adaptive mean shift (CAMSHIFT). In CAMSHIFT, the continuously adaptive distribution image can be used directly to track a human face. However, KBT is more robust than CAMSHIFT in many other applications because KBT uses a static kernel histogram as the target model. If CAMSHIFT and KBT are regarded as two extremes of mean shift, then the proposed tracking algorithm tries to strike a balance between the stability of KBT and the adaptability of CAMSHIFT to create a more robust tracker.

In KBT, both the target model and the target candidate are characterized by a kernel-based histogram vector. This vector is a kind of image probability distribution obtained by weighting the histogram with a simple monotonically decreasing kernel profile. A kernel-based histogram can be denoted as follows:

$$q = [q_u]_{u=1,\dots,m}, \qquad q_u = \frac{1}{C_h}\sum_{i=1}^{n}\mathrm{Kernel}(X_i - c_0)\,\delta[b(X_i),u], \tag{1}$$
where $\{X_i\}_{i=1,\dots,n}$ are the pixel locations of the target and $\mathrm{Kernel}$ is a spatial weighting function centered at $c_0$, the initial center of the target. $\delta(\cdot)$ is the Kronecker delta function, $b(X_i)$ is a binning function that maps the color of $X_i$ into a histogram bin $u \in \{1,\dots,m\}$, and $C_h$ is a normalization term that makes $\sum_{u=1}^{m} q_u = 1$. Similarly, the tracking features, called the "candidate model," can be denoted as follows:

$$p(c_k) = [p_u(c_k)]_{u=1,\dots,m}, \tag{2}$$
where $p_u(c_k) = \frac{1}{C_h}\sum_{i=1}^{n}\mathrm{Kernel}(X_i - c_k)\,\delta[b(X_i),u]$ and $c_k$ denotes the center of the candidate region in the subsequent frame $k$.
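For concreteness, the kernel-weighted histograms of Eqs. (1) and (2) can be sketched in NumPy as follows. The Epanechnikov kernel profile and the use of a precomputed map of per-pixel bin indices $b(X_i)$ are assumptions for illustration; the letter does not fix the kernel or the color quantization.

```python
import numpy as np

def kernel_histogram(patch_bins, m=16):
    """Kernel-weighted histogram of Eqs. (1)/(2).

    patch_bins: 2-D array of quantized bin indices b(X_i) in {0, ..., m-1}
                covering the target (or candidate) region.
    Each pixel is weighted by an Epanechnikov profile centered on the
    patch, so pixels near the region center count more than peripheral ones.
    """
    h, w = patch_bins.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # squared distance from the patch center, scaled to reach 1 at the border
    r2 = (((ys - (h - 1) / 2.0) / (h / 2.0)) ** 2 +
          ((xs - (w - 1) / 2.0) / (w / 2.0)) ** 2)
    weights = np.maximum(1.0 - r2, 0.0)            # Epanechnikov profile
    q = np.bincount(patch_bins.ravel(), weights=weights.ravel(), minlength=m)
    return q / q.sum()                             # C_h enforces sum(q_u) = 1
```

A target model is then `q = kernel_histogram(target_bins)`, and a candidate model is the same computation over the window centered at $c_k$.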

Originally, KBT is derived from a second-order Taylor expansion of the Bhattacharyya coefficient, which is defined as follows:

$$B[p(c_k), q] = \sum_{u=1}^{m}\sqrt{p_u(c_k)\,q_u}. \tag{3}$$
The Bhattacharyya coefficient is a popular likelihood measure between two vectors, and the kernel-based method tracks the target model by maximizing the Bhattacharyya coefficient, as in

$$\Delta c^{*} = \arg\max_{\Delta c_k} B[q, p(c_k + \Delta c_k)]. \tag{4}$$
The KBT algorithm outputs $\Delta c^{*}$, which determines the displacement of the object. $\Delta c^{*}$ is also called the "mean-shift" vector and can be computed iteratively using the following formulas:

$$\Delta c_k = \frac{\sum_{i=1}^{n}\mathrm{Kernel}(X_i - c_k)\,w(X_i)\,(X_i - c_k)}{\sum_{i=1}^{n}\mathrm{Kernel}(X_i - c_k)\,w(X_i)}, \tag{5}$$

where

$$w(X_i) = \sqrt{\frac{q_u}{p_u(c_k)}}, \qquad u = b(X_i). \tag{6}$$

To find the proper $\Delta c^{*}$ in Eq. (4), an iterative computation is needed: compute $w(X_i)$ using Eq. (6) and derive $\Delta c_k$ using Eq. (5). After the algorithm converges, the Bhattacharyya coefficient is computed to evaluate the similarity between the target model and the chosen candidate model.
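One iteration of Eqs. (5) and (6) can be sketched as below. A uniform kernel is assumed for the shift computation (the derivative of the Epanechnikov profile is constant, a common simplification), and the frame is assumed pre-quantized to per-pixel bin indices; the function name and interface are illustrative, not the authors'.

```python
import numpy as np

def mean_shift_step(frame_bins, center, half_size, q):
    """One iteration of Eqs. (5)-(6): returns the shifted window center.

    frame_bins: 2-D array of per-pixel histogram bin indices b(X_i).
    center:     current window center (cx, cy).
    half_size:  window half-extent (hx, hy).
    q:          target-model histogram.
    """
    cx, cy = center
    hx, hy = half_size
    x0, x1 = int(cx - hx), int(cx + hx) + 1
    y0, y1 = int(cy - hy), int(cy + hy) + 1
    window = frame_bins[y0:y1, x0:x1]
    # candidate histogram p(c_k) over the current window (uniform kernel)
    p = np.bincount(window.ravel(), minlength=q.size).astype(float)
    p /= p.sum()
    # Eq. (6): w(X_i) = sqrt(q_u / p_u) with u = b(X_i); zero where p_u = 0
    w = np.sqrt(np.divide(q, p, out=np.zeros_like(p), where=p > 0))
    wi = w[window]
    ys, xs = np.mgrid[y0:y1, x0:x1]
    # Eq. (5): weighted mean of pixel offsets from the current center
    dx = (wi * (xs - cx)).sum() / wi.sum()
    dy = (wi * (ys - cy)).sum() / wi.sum()
    return cx + dx, cy + dy
```

In practice the step is repeated until the shift magnitude falls below a small threshold (e.g., one pixel).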

Continuously Adaptive Distribution

The primary difference between CAMSHIFT and mean shift is that CAMSHIFT uses a continuously adaptive probability distribution, which means the kernel-based histogram is recomputed for each frame. In KBT, the candidate model can also be recomputed and used to create the probability distribution image; thus, the candidate model can be regarded as a continuously adaptive distribution in this letter. The target's probability distribution image can then be created by histogram backprojection (HBP), which replaces each pixel value of the input image with the value of the corresponding histogram bin. Thus, if the candidate model keeps proper coherence with the target model during tracking, the target can be enhanced by HBP. To map the probability values to the range between 0 and 255, the histogram bin values are rescaled as follows:

$$\hat{p} = [\hat{p}_u]_{u=1,\dots,m}, \qquad \hat{p}_u = \min\!\left(\frac{255}{\max(p)}\,p_u,\;255\right), \quad u = 1,\dots,m. \tag{7}$$
According to Eq. (7), the histogram bin values $p_u$ are rescaled from $[0, \max(p)]$ to the new range $[0, 255]$.
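A minimal sketch of HBP with the rescaling of Eq. (7), again assuming a frame pre-quantized to per-pixel bin indices:

```python
import numpy as np

def backproject(frame_bins, p):
    """Histogram backprojection with Eq. (7)'s rescaling to [0, 255].

    frame_bins: 2-D array of per-pixel bin indices b(X_i).
    p:          candidate-model histogram.
    Each pixel is replaced by the (rescaled) value of its histogram bin,
    so colors that dominate the candidate model appear bright.
    """
    p_hat = np.minimum(255.0 / p.max() * p, 255.0)   # Eq. (7)
    return p_hat[frame_bins].astype(np.uint8)
```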

Zeroth Moment of HBP Image

In CAMSHIFT, the scale of the tracking window is determined by the zeroth moment. After HBP, let $I(x,y)$ be the intensity of the HBP image at $(x,y)$ within the tracking window. The zeroth moment is computed as follows:

$$M_{00} = \sum_{x}\sum_{y} I(x,y). \tag{8}$$
The tracking window's scale can then be determined as follows:

$$S_x = \alpha\sqrt{M_{00}}, \tag{9}$$

$$S_y = \beta S_x, \tag{10}$$
where $\alpha$ is the ratio of the tracking window's scale to the target's scale and $\beta$ is the ratio of $S_y$ to $S_x$. Because the size of the search window is rounded up to the current or next-greatest odd number, in practice the tracking window is set slightly larger than the target. In this letter, $\alpha = 1.1$.
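Eqs. (8)-(10) can be sketched as follows. The division by 255 (so that $M_{00}$ approximates the target area when the backprojected image is 8-bit) and the default aspect ratio $\beta$ are illustrative assumptions; the letter leaves both implicit.

```python
import numpy as np

def window_scale(bp_window, alpha=1.1, beta=1.0):
    """Tracking-window scale from the zeroth moment, Eqs. (8)-(10).

    bp_window: backprojected probability image inside the tracking window.
    alpha:     window-to-target scale margin (1.1 in this letter).
    beta:      aspect ratio S_y / S_x (default assumed here).
    """
    # Eq. (8): zeroth moment; /255 so a fully confident region of
    # area A yields M00 close to A (normalization assumed).
    m00 = bp_window.sum() / 255.0
    sx = alpha * np.sqrt(m00)            # Eq. (9)
    sy = beta * sx                       # Eq. (10)
    return sx, sy
```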

Visual Tracking with Continuous Adaptive Distribution

The proposed algorithm can be divided into three phases. First, estimate the tracked target's position using the KBT algorithm. Second, determine the tracking window's scale using the zeroth moment, as computed in CAMSHIFT. Third, update the tracking window's scale and the target model. The proposed algorithm is illustrated in Fig. 1.

Fig. 1

Procedure of the proposed algorithm.


As Fig. 1 shows, the target region is denoted as $TR_{k-1}$ and the corresponding target model as $q_{k-1}$; together they completely describe the target's template. After KBT is executed, the tracking window is moved to the candidate region, denoted $CR_k$, in which the candidate model is denoted $p_k$. On the basis of $CR_k$ and $p_k$, a probability distribution image is created, and CAMSHIFT is then executed to obtain the target's new scale information $(S_x, S_y)$. Finally, $p_k$, $q_{k-1}$, and $(S_x, S_y)$ are used to update the template. The proposed algorithm is given in detail below. After the initialization stage, tracking starts from frame $k$ $(k = 2)$ as follows:

Input: the target's center position $(x_c^{k-1}, y_c^{k-1})$ and scale $(S_x^{k-1}, S_y^{k-1})$ in the previous frame.

Step 1: Determine the tracking region: $TR_{k-1} = \mathrm{RECT}(x_c^{k-1}, y_c^{k-1}, S_x^{k-1}, S_y^{k-1})$.

Step 2: Compute the mean-shift vector $\Delta c_k = (\Delta x_k, \Delta y_k)$ using Eqs. (5) and (6). Update the candidate region: $CR_k = \mathrm{RECT}(x_c^{k-1} + \Delta x_k, y_c^{k-1} + \Delta y_k, S_x^{k-1}, S_y^{k-1})$. Extract the new candidate model $p_k$ in $CR_k$ using Eq. (2).

Step 3: Transform $CR_k$ by histogram backprojection according to Eq. (7) and $p_k$.

Step 4: Compute $M_{00}$ using Eq. (8) iteratively until CAMSHIFT converges.

Step 5: Compute the tracking window scale $(S_x, S_y)$ using Eqs. (9) and (10).

Step 6: According to Eq. (3), compute $B(q_{k-1}, p_k)$ with the new scale $(S_x, S_y)$.

Output: If $B(q_{k-1}, p_{k-1}) < B(q_{k-1}, p_k)$:

$$S_x^{k} = S_x, \qquad S_y^{k} = S_y,$$

$$q_k = B(q_{k-1}, p_k)\,p_k + [1 - B(q_{k-1}, p_k)]\,q_{k-1}.$$

Else:

$$S_x^{k} = S_x^{k-1}, \qquad S_y^{k} = S_y^{k-1}, \qquad q_k = q_{k-1}.$$

End.
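The output stage above can be sketched as follows, with Eq. (3) as the similarity measure; the function names and the scale-tuple interface are illustrative, not the authors'.

```python
import numpy as np

def bhattacharyya(p, q):
    """Eq. (3): similarity between two normalized histograms."""
    return np.sum(np.sqrt(p * q))

def update_template(q_prev, p_prev, p_new, scale_prev, scale_new):
    """Scale- and model-update decision of the proposed algorithm.

    The template is updated only if the new candidate matches the previous
    target model better than the old candidate did; the new model blends
    candidate and model, weighted by their Bhattacharyya similarity.
    """
    b_new = bhattacharyya(q_prev, p_new)
    if bhattacharyya(q_prev, p_prev) < b_new:
        q_new = b_new * p_new + (1.0 - b_new) * q_prev
        return q_new, scale_new      # accept the CAMSHIFT scale
    return q_prev, scale_prev        # keep the previous template
```

Because the blend is weighted by the similarity itself, a near-perfect match replaces the model almost entirely, while a marginal match changes it only slightly, which damps the drift that a fixed-threshold update would accumulate.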

In the proposed algorithm, in addition to the target's kernel histogram, the target's continuous probability distribution image is also used as a tracking feature, and the zeroth moment of the HBP image is used to estimate the target scale, as in CAMSHIFT. The differences are that the target's position is computed by KBT in the tracking phase and that the Bhattacharyya coefficient is used as an update indicator and weighting factor in the updating phase. As the updating phase shows, the target model becomes a dynamic distribution that adapts according to the likelihood between the current and previous distributions.

Results

In the test video sequence, the target's scale and appearance change constantly against a dynamic, cluttered background. As Fig. 2 shows, traditional KBT cannot follow the target's varying scale during tracking. The tracking performance of CAMSHIFT is also poor: because of the cluttered background, CAMSHIFT always produces a tracking window that is too large. The tracking results of the proposed algorithm, also shown in Fig. 2, demonstrate that the tracking window's scale can be updated properly and the target can be discriminated from the cluttered background effectively.

Fig. 2

Tracking results comparison: the proposed algorithm can update the tracking window’s scale properly while KBT and CAMSHIFT cannot fit the target’s varying scale properly.


Figure 3 illustrates the similarity between the target model and the candidate model. Compared to traditional KBT, the proposed algorithm keeps better coherence with the tracking template, which means the target model has been updated properly.

Fig. 3

Bhattacharyya coefficient values produced by KBT and the proposed algorithm.


Conclusion and Discussion

This letter makes two major contributions. First, a new scale-updating method is proposed: by using the target's continuously adaptive distribution image, the proposed algorithm can update the tracking window's scale. Second, by integrating the continuous updating method into KBT, the target model can also be updated within the mean-shift framework. Experiments show that the proposed method keeps tracking the target even when its scale and appearance change constantly against a cluttered background. Building on the proposed algorithm, further extensions and improvements could be made by combining multiple color or texture spaces.

Acknowledgment

This work was supported by the National Natural Science Foundation of China (Grant No. 60775022).

References

1. D. Comaniciu, V. Ramesh, and P. Meer, "Kernel-based object tracking," IEEE Trans. Pattern Anal. Mach. Intell. 25, 564–577 (2003). 10.1109/TPAMI.2003.1195991

2. R. Collins, "Mean-shift blob tracking through scale space," in Proc. IEEE Conf. Computer Vision and Pattern Recognition, vol. 2, pp. 234–240, IEEE Computer Society, Washington, DC (2003). 10.1109/CVPR.2003.1211475

3. G. D. Hager, M. Dewan, and C. V. Stewart, "Multiple kernel tracking with SSD," in Proc. IEEE Conf. Computer Vision and Pattern Recognition, vol. 1, pp. 790–797, IEEE Computer Society, Washington, DC (2004). 10.1109/CVPR.2004.1315112

4. A. Yilmaz, K. Shafique, and M. Shah, "Target tracking in airborne forward looking infrared imagery," Image Vis. Comput. 21, 623–635 (2003). 10.1016/S0262-8856(03)00059-3

5. G. R. Bradski, "Computer vision face tracking as a component of a perceptual user interface," in Proc. IEEE Workshop on Applications of Computer Vision, pp. 214–219 (1998).

© (2009) Society of Photo-Optical Instrumentation Engineers (SPIE)
Risheng Han, Zhongliang Jing, Yuanxiang Li, "Kernel-based visual tracking with continuous adaptive distribution," Optical Engineering 48(5), 050501 (1 May 2009). https://doi.org/10.1117/1.3125423