The mean-shift algorithm1 is an efficient mode-seeking method that avoids an exhaustive search, which makes it suitable for real-time use. It has recently been introduced for tracking applications.2, 3, 4, 5 However, a fixed kernel bandwidth can lead to poor localization when the tracked object changes in scale. Moments have been used to compute the size of the tracking window,2 but the computational complexity is too high to meet the real-time requirement. More commonly, the object scale is detected by calculating the Bhattacharyya coefficient for three different window sizes (the current scale and a change in each direction) and choosing the size that gives the highest similarity to the target model.5 Because this naive scale adaptation does not consider the underlying relationship between the similarity and the object scale changes, the size of the tracking window cannot always keep up with those changes. In this paper, this relationship is analyzed theoretically as a step toward a complete solution.
Definition 1. A round region containing the whole object region and some background region is called a tracking window. Function and denote the center of and , respectively. Their distance is measured by .
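Definition 2. Let $\{x_i\}_{i=1,\dots,n}$ be the pixel locations of a tracking window, with the window center as the origin point. The kernel histogram5 of the window with $m$ bins is defined by $p_u = C \sum_{i=1}^{n} k\left(\left\| x_i / h \right\|^2\right) \delta[b(x_i) - u]$, $u = 1, \dots, m$, where $k$ is the kernel function, $\delta$ is the Kronecker delta function, and $h$ is the kernel bandwidth, which determines the radius of the window. Function $b$ associates the pixel at location $x_i$ to the index $b(x_i)$ of the kernel-histogram bin corresponding to the color of that pixel. The constant $C$ is derived by imposing the constraint $\sum_{u=1}^{m} p_u = 1$. Suppose the color distribution of is distinguished from . It can be approximately satisfied in many applications, e.g., traffic surveillance, and described by where the color distribution of and are represented by and , respectively.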
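The kernel histogram of Definition 2 can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the implementation used in the paper: the function names `gaussian_profile` and `kernel_histogram`, the 8 bins per RGB channel, the truncated Gaussian profile, and the synthetic test image are all assumptions made for the example.

```python
import numpy as np

def gaussian_profile(r2):
    """Kernel profile evaluated at the squared normalized distance r2.
    Assumed form: a Gaussian, which is monotonically decreasing."""
    return np.exp(-0.5 * r2)

def kernel_histogram(image, center, h, bins_per_channel=8, profile=gaussian_profile):
    """Kernel histogram of the round window of radius h centered at (row, col).

    image is an H x W x 3 uint8 array; the window must lie inside the image.
    """
    rows, cols = np.indices(image.shape[:2])
    # Squared distance to the window center, normalized by the bandwidth h.
    r2 = ((rows - center[0]) ** 2 + (cols - center[1]) ** 2) / float(h) ** 2
    inside = r2 <= 1.0
    # Bin-index function: quantize each channel, then flatten to one index.
    q = (image[inside].astype(np.int64) * bins_per_channel) // 256
    bin_index = (q[:, 0] * bins_per_channel + q[:, 1]) * bins_per_channel + q[:, 2]
    # Each pixel votes for its bin with a weight given by the kernel profile.
    hist = np.bincount(bin_index, weights=profile(r2[inside]),
                       minlength=bins_per_channel ** 3)
    # Normalization constant: enforce that the bins sum to one.
    return hist / hist.sum()

# Hypothetical scene: a red disc (object) on a gray background.
image = np.full((41, 41, 3), 100, dtype=np.uint8)
rows, cols = np.indices((41, 41))
image[(rows - 20) ** 2 + (cols - 20) ** 2 <= 8 ** 2] = (200, 0, 0)

hist = kernel_histogram(image, center=(20, 20), h=15)
print(np.count_nonzero(hist), hist.sum())
```

Because pixels near the window center receive larger weights than pixels near its border, the object color dominates the histogram even though the window also covers some background.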
Definition 3. The similarity of two kernel histograms $p$ and $q$ with $m$ bins is measured by the Bhattacharyya coefficient5 $\rho(p, q) = \sum_{u=1}^{m} \sqrt{p_u q_u}$, where $p_u$ and $q_u$ are the values of bin $u$ in $p$ and $q$, respectively.
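As a small worked illustration of Definition 3, the Bhattacharyya coefficient is the sum over bins of the square root of the product of the two bin values. The sketch below assumes both histograms are already normalized to sum to one; the function names and the three-bin histograms are made up for the example. The helper `angle_between` uses the fact that the coefficient equals the cosine of the angle between the unit vectors formed by the square roots of the bin values.

```python
import numpy as np

def bhattacharyya(p, q):
    """Bhattacharyya coefficient of two normalized histograms of equal length."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(np.sum(np.sqrt(p * q)))

def angle_between(p, q):
    """Angle (radians) between the unit vectors sqrt(p) and sqrt(q)."""
    # Clip guards against values slightly above 1 caused by rounding.
    return float(np.arccos(np.clip(bhattacharyya(p, q), 0.0, 1.0)))

# Hypothetical three-bin histograms.
p = np.array([0.5, 0.5, 0.0])
q = np.array([0.5, 0.0, 0.5])

print(bhattacharyya(p, p))   # identical histograms
print(bhattacharyya(p, q))   # histograms that share only one bin
print(angle_between(p, q))
```

Identical histograms give a coefficient of 1 (angle 0), and the coefficient drops toward 0 as the histograms share less mass.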
Theorem 1. Given with in frame and with the same position of in frame where object scale and position are changed, , if then .
Proof. By assuming without loss of generality that (1) the object shrinks its scale from frame to . (2) , consists of subregions with different intensity levels, i.e., , while , consists of subregions with different intensity levels, i.e., . (3) Consider ; suppose its kernel histogram consists of two entries, sets and , corresponding to the subregion and , respectively, where .
The continuous form of Eq. 1 is as follows:and are areas of subregion in and , respectively.
The fixed kernel bandwidth leads to , and it is clear that owing to . Since is monotonic decreasing1 and , we have . Consequently, we obtain . Moreover, holds owing to the constraintis less than , the area of is greater than . Thus, holds and then . Therefore,2, 4, the geometric interpretation of the Bhattacharyya coefficient is the cosine of the angle between the -dimensional unit vectors and . The smaller angle they have, the more similar the two kernel histograms are. For the target tracking application, this angle is equal to the angle between two 2-D unit vectors: and . Then, can be measured by . Using Eqs. 4, 5 in conjunction with the geometric relationship, it is clear that . Finally, .
Using Theorem 1, we can easily determine that the Bhattacharyya coefficient is monotonically decreasing and achieves its maximum in the case where . It means the image in is most similar to the image in . As long as some part of the object in the next frame resides inside the kernel, Theorem 1 ensures that the mean-shift iterations converge to the object center.2, 5
In our experiments, the object kernel histogram is computed with the Gaussian kernel in the RGB space with bins. Figure 1 shows two video clips in which the size of the tracking window (white circle) is unchanged. The top row shows the tracking results when the object expands its scale, while the bottom row shows the results when the object shrinks its scale. In the first frame of each clip, the initial kernel histogram is obtained from the initial tracking window, whose center coincides with the object center. Figure 2 shows the Bhattacharyya coefficients corresponding to tracking windows centered in a neighborhood around the object center. Figures 2a and 2b correspond to Figs. 1b and 1d, respectively. The Bhattacharyya coefficient in Fig. 2b is monotonically decreasing and its maximum corresponds to the object center, which validates our theorem. When the object expands its scale and can no longer be enclosed by the tracking window, the monotonically decreasing profile of Fig. 2b no longer holds and poor localization may occur; see also the top row in Fig. 1. The reason is that there are more local maxima in Fig. 2a: any location of a tracking window that is too small yields a similar value of the Bhattacharyya coefficient.
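The qualitative behavior described above can be reproduced on a synthetic two-color scene. The following sketch is not the experiment of Figs. 1 and 2: it assumes a disc-shaped object, a two-bin histogram (object and background), a truncated Gaussian profile, and hypothetical sizes chosen only for illustration. It computes the Bhattacharyya coefficient between a fixed target model and windows shifted away from the object center, once for a shrunken object and once for an object that has grown beyond the window.

```python
import numpy as np

def kernel_histogram(bin_map, center, h, n_bins):
    """Kernel histogram of a round window (radius h) over a map of bin indices."""
    rows, cols = np.indices(bin_map.shape)
    r2 = ((rows - center[0]) ** 2 + (cols - center[1]) ** 2) / float(h) ** 2
    inside = r2 <= 1.0
    weights = np.exp(-0.5 * r2[inside])  # Gaussian profile (assumed form)
    hist = np.bincount(bin_map[inside], weights=weights, minlength=n_bins)
    return hist / hist.sum()

def bhattacharyya(p, q):
    return float(np.sum(np.sqrt(p * q)))

def disc_scene(size, center, radius):
    """Two-bin scene: bin 1 inside the disc (object), bin 0 elsewhere (background)."""
    rows, cols = np.indices((size, size))
    disc = (rows - center[0]) ** 2 + (cols - center[1]) ** 2 <= radius ** 2
    return disc.astype(np.int64)

size, h = 121, 20            # image side and fixed window radius (hypothetical)
object_center = (60, 60)

# Target model: object of radius 12 fully enclosed by the window of radius 20.
target = kernel_histogram(disc_scene(size, object_center, 12), object_center, h, 2)

def similarity_profile(object_radius, max_offset=15):
    """Coefficient for windows shifted 0..max_offset pixels from the object center."""
    scene = disc_scene(size, object_center, object_radius)
    return np.array([
        bhattacharyya(kernel_histogram(scene, (60, 60 + d), h, 2), target)
        for d in range(max_offset + 1)
    ])

rho_shrunk = similarity_profile(object_radius=7)     # object smaller than in the model
rho_expanded = similarity_profile(object_radius=40)  # object larger than the window

print("shrunk object:  ", np.round(rho_shrunk, 3))
print("expanded object:", np.round(rho_expanded, 3))
```

In this toy setting the profile for the shrunken object peaks at zero offset and decreases with distance, while the profile for the oversized object is flat because every shifted window sees only object pixels, so the similarity gives no cue about where the object center is.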
In conclusion, changes of object scale and position within the fixed kernel do not impair the localization accuracy of the mean-shift tracking algorithm. When the object scale exceeds the size of the tracking window, the tracker localizes poorly. In contrast, when the object shrinks its scale, the center of the tracking window stays on the object center. Indeed, our previous work4 on tracking rigid objects with scale changes is based on this conclusion. We hope this paper will be valuable for fully solving the scaling problem within the mean-shift framework in the future.