1 July 2005 Performance analysis for tracking of variable scale objects using mean-shift algorithm
Author Affiliations +
Optical Engineering, 44(7), 070505 (2005). doi:10.1117/1.1985487
Classic mean-shift trackers have no integrated scale adaptation, which limits their performance in tracking variable scale objects. By analyzing the similarity of object kernel histograms, we found that the changes of object scale and position within the fixed kernel make the Bhattacharyya coefficient monotonic decreasing. The work plays a guiding role in solving scaling problems within the mean-shift framework.
Peng, Yang, and Liu: Performance analysis for tracking of variable scale objects using mean-shift algorithm

The mean-shift algorithm1 is an efficient method for mode seeking without doing an exhaustive search, which leads to a real-time property. It has been introduced recently for tracking applications.2, 3, 4, 5 However, the fixed kernel bandwidth is always leading to poor localization in tracking objects changing in scale. A moment is used to compute the size of the tracking windows.2 However, the computational complexity is too high to meet the real-time requirement. In general, an object scale is detected by calculating the Bhattacharyya coefficient for three different sizes (same scale, ±5% change) and choosing the size that gives the highest similarity to the target model.5 Since it is a naive method for scale adaptation without considering the underlying relationship between the similarity and the object scale changes, the size of the tracking windows cannot always keep up with the object scale changes. In this paper, this relationship is theoretically analyzed for a possible total solution in the future.

Definition 1. A round region T containing the whole object region F and some background region B is called a tracking window. Function c(T) and c(F) denote the center of T and F , respectively. Their distance is measured by d(T,F)=c(T)c(F) .

Definition 2. Let {xi}i=1n be the pixel locations with c(T) as the origin point. The kernel histogram5 of T with m bins is defined by P={pμ}μ=1m where


Here k is the kernel function and r is the kernel bandwidth, which determines the radius of T . Function q:R2{1m} associates the pixel at location xi to the index q(xi) of the kernel-histogram bin corresponding to the color of that pixel. C is derived by imposing the constraint μ=1mpμ=1 . Suppose the color distribution of F is distinguished from B . It can be approximately satisfied in many applications, e.g., traffic surveillance, and described by μ=1mpμipμj=0 where the color distribution of F and B are represented by Pi={pμi}μ=1m and Pj={pμj}μ=1m , respectively.

Definition 3. The similarity of two kernel histograms Pi and Pj with m bins is measured by the Bhattacharyya coefficient5


where pμi and pμj are the value of bin μ in Pi and Pj , respectively.

Theorem 1. Given T1 with c(F1)=c(T1) in frame i and T2 with the same position of T1 in frame i+1 where object scale and position are changed, T3frame i+1 , if d(T2,F2)<d(T3,F3) then ρ(2,1)> ρ(3,1) .

Proof. By assuming without loss of generality that (1) the object shrinks its scale from frame i to i+1 . (2) Fi , i=1,2,3 consists of u subregions with different intensity levels, i.e., Fi={fj}j=1u , while Bi , i=1,2,3 consists of νi subregions with different intensity levels, i.e., Bi={bj}j=1νi . (3) Consider Ti ; suppose its kernel histogram Pi={pμi}μ=1m consists of two entries, sets {fpji}j=1u and {bpji}j=1νi , corresponding to the subregion {fj}j=1u and {bj}j=1νi , respectively, where u+max(ν1,ν2,ν3)m .

The continuous form of Eq. 1 is as follows:

By using integral theorem of mean, we have


where Sfi2 and Sfi3 are areas of subregion fj in F2 and F3 , respectively.

The fixed kernel bandwidth leads to C2=1σ=T2k(xr2)dσ=C3 , and it is clear that Sfj2=Sfj3 owing to F2=F3 . Since k is monotonic decreasing1 and d(T2,F2)<d(T3,F3) , we have k(ξfj2r2)> k(ξfj3r2) . Consequently, we obtain fpj2> fpj3 . Moreover, j=1ν2bpj2<j=1ν3bpj3 holds owing to the constraint


Since the scale of F2 is less than F1 , the area of B2 is greater than B1 . Thus, j=1ν1bpj1<j=1ν2bpj2 holds and then j=1ufpj2<j=1ufpj1 . Therefore,


{1> j=1ufpj1> j=1ufpj2> j=1ufpj3> 00<j=1v1bpj1<j=1v2bpj2<j=1v3bpj3<1}.
According to Eqs. 2, 4, the geometric interpretation of the Bhattacharyya coefficient is the cosine of the angle between the m -dimensional unit vectors (p1ipmi) and (p1jpmj) . The smaller angle they have, the more similar the two kernel histograms are. For the target tracking application, this angle is equal to the angle between two 2-D unit vectors: Zi=[(j=1ufpji)12,(j=1νibpji)12] and Zj=[(k=1ufpkj)12,(k=1νjbpkj)12] . Then, ρ(i,j) can be measured by (Zi,Zj) . Using Eqs. 4, 5 in conjunction with the geometric relationship, it is clear that (Z3,Z1)> (Z2,Z1) . Finally, ρ(2,1)> ρ(3,1) .

Using theorem 1, we can easily determine that the Bhattacharyya coefficient ρ(t,1) is monotonic decreasing and achieves its maximum in the case where d(Tt,Ft)=0 . It means the image in Tt [d(Tt,Ft)=0] is most similar to the image in T1 . As long as some parts of the object in the next frame reside inside the kernel, theorem 1 ensures mean-shift iterations converge to the object center.2, 5

In our experiments, the object kernel histogram computed by the Gaussian kernel has been derived in the RGB space with 32×32×32 bins. Figure 1 shows two video clips where the size of tracking window (white circle) is unchanged. The top row shows the tracking results where the object expands its scale, while the bottom row demonstrates the results for the object shrinking its scale. In the first frame of each clip, the initial kernel histogram is obtained from the initial tracking window whose center overlaps the object center. Figure 2 shows the Bhattacharyya coefficients corresponding to the tracking windows centered in a 60×60 neighborhood around the object center. Figures 2a and 2b correspond to Figs. 1b and 1d, respectively. The Bhattacharyya coefficient in Fig. 2b is monotonic decreasing and the maximum corresponds to the object center, which validates our theorem. In the case where the object expands its scale and can not be enwrapped by the tracking window, the monotonic decreasing profile in Fig. 2b no longer holds and poor localization potentially occurs; see also top row in Fig. 1. The reason lies in the fact that there are more local maxima in Fig. 2a and any location of a tracking window that is too small will yield a similar value of the Bhattacharyya coefficient.

Fig. 1

Tracking results with fixed kernel bandwidth (left to right).


Fig. 2

Surface plot of Bhattacharyya coefficient around the object center.


In conclusion, the changes of object scale and position within the fixed kernel will not impact the localization accuracy of the mean-shift tracking algorithm. When the object scale exceeds the size of the tracking window, the tracker outputs poor localization. On the contrary, when the object shrinks its scale, the center of the tracking window locates the object center all the time. Indeed, our previous work4 for tracking rigid objects with scale changes is based on this conclusion. We hope this paper will valuable for fully solving scaling problems within the mean-shift framework in the future.


1.  K. Fukanaga and L. D. Hostetler, “The estimation of the gradient of a density function, with applications in pattern recognition,” IEEE Trans. Inf. Theory0018-9448 10.1109/TIT.1975.1055330 21, 32–40 (1975). Google Scholar

2.  G. R. Bradski, “Computer vision face tracking for use in a perceptual user interface,” in IEEE Workshop on Applications of Computer Vision, p. 214–219, Princeton, NJ (1998). Google Scholar

3.  A. Yilmaz, K. Shafique, and M. Shah, “Target tracking in airborne forward looking infrared imagery,” Image Vis. Comput.0262-8856 10.1016/S0262-8856(03)00059-3 21, 623–635 (2003). Google Scholar

4.  N. S. Peng, J. Yang, and J. X. Chen, “Kernel-bandwidth adaptation for tracking object changing in size,” Lecture Notes in Computer Science 3212, 581–588 (2004). Google Scholar

5.  D. Comaniciu, V. Ramesh, and P. Meer, “Kernel-based object tracking,” IEEE Trans. Pattern Anal. Mach. Intell.0162-8828 5, 564–575 (2003). Google Scholar

Ning Song Peng, Jie Yang, Zhi Liu, "Performance analysis for tracking of variable scale objects using mean-shift algorithm," Optical Engineering 44(7), 070505 (1 July 2005). https://doi.org/10.1117/1.1985487


Adaptive model MeanShift tracking
Proceedings of SPIE (March 13 2013)
Consecutive pedestrian tracking in large scale space
Proceedings of SPIE (September 28 2016)
Group tracking of occluded targets
Proceedings of SPIE (August 21 2001)

Back to Top