## 1. Introduction

Scale-adaptive tracking using only color information is a difficult and important problem. One of the most successful approaches using only color information is the kernel-based approach, which has become popular due to its tracking speed.^{1, 2} However, because color-based tracking is inherently unstable, various stabilization strategies have been proposed: introducing scale space,^{3} changing the kernel function,^{4, 5} using multiple kernels or patches,^{6, 7, 8, 9} incorporating temporal filtering,^{10} and discriminating locally reliable regions.^{11} Most color-based scale-adaptive tracking methods use the histogram to determine the target size.^{1, 2, 3, 4, 12} In certain cases, however, the histogram fails to provide a good estimate of the target size, e.g., under occlusion or when colors similar to the target colors appear in the background. This is because the histogram reflects only the numbers of pixels corresponding to the target colors. Background colors similar to the target colors can therefore cause the target region to spread into the background; in other cases, the target window shrinks too much, because small partial regions of the target show a color distribution similar to that of the entire target region.

In this letter, we propose a stable scale-adaptive tracking algorithm that combines the color centroids–based tracking algorithm of Ref. 13 with a new scale-adaptation algorithm, also based on centroids. Using centroids to compute the scale has two advantages. First, centroids have a direct relationship with the scale of the target, which makes the scale estimation simple and fast. Second, centroids can be filtered, leaving only the reliable centroids to be used in the rescaling. The proposed tracking algorithm produces stable scale-adaptive tracking results even in difficult cases, i.e., when the background is cluttered or contains target colors, or when occlusion occurs.

## 2. Scale-Adaptive Centroid-Based Tracking Algorithm

In this section, we first introduce the color centroids–based target localization algorithm. Then, we propose the scale-adaptation algorithm, which is also based on the use of the color centroids. The combination of the two algorithms will be explained at the end of this section.

### 2.1. Target Window Shifting Using Centroids of the Target Colors

In Ref. 13, we proposed a stable target-shifting algorithm that shifts the current target location to the next target location based on the centroids of the target colors. The target position in the current frame, ${\widehat{\mathbf{y}}}_{0}$, is obtained as the area-weighted mean of the color centroids in the current frame, i.e.,

## Eq. 1

$${\widehat{\mathbf{y}}}_{0}=\frac{\sum_{u=1}^{m}{\widehat{q}}_{u}\,{\mathbf{C}}_{u}^{n}}{\sum_{u=1}^{m}{\widehat{q}}_{u}},$$

where ${\mathbf{C}}_{u}^{n}$ denotes the centroid of the pixels of color bin $u$ in the current frame, and ${\{{\widehat{q}}_{u}\}}_{u=1,\ldots,m}$ represents the $m$-bin histogram of the target model obtained in the initial frame. The weights ${\widehat{q}}_{u}$ correspond to the areas that the color bins cover in the initial frame.

The location of the target in the next frame, ${\widehat{\mathbf{y}}}_{1}$, can be calculated in the same way as ${\widehat{\mathbf{y}}}_{0}$ by using ${\mathbf{C}}_{u}^{n+1}$, the color centroids in the next frame, instead of ${\mathbf{C}}_{u}^{n}$ in Eq. 1. The shifting vector ${\widehat{\mathbf{y}}}_{\mathrm{shift}}$, which shifts the location of the target in the current frame to its location in the next frame, is then computed as ${\widehat{\mathbf{y}}}_{\mathrm{shift}}={\widehat{\mathbf{y}}}_{1}-{\widehat{\mathbf{y}}}_{0}$.
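As a concrete illustration, the area-weighted mean of Eq. 1 and the shifting vector can be sketched in a few lines of NumPy. This is a toy sketch with our own variable names, not the implementation of Ref. 13:

```python
import numpy as np

def weighted_position(centroids, q_hat):
    """Area-weighted mean of per-color-bin centroids (Eq. 1)."""
    w = q_hat / q_hat.sum()          # normalize histogram weights
    return w @ centroids             # (m,) @ (m, 2) -> (x, y)

# Toy data: two color bins with equal model-histogram weights
C_n  = np.array([[10.0, 20.0], [30.0, 40.0]])   # centroids, current frame
C_n1 = np.array([[12.0, 21.0], [32.0, 41.0]])   # centroids, next frame
q    = np.array([1.0, 1.0])

y0 = weighted_position(C_n, q)       # target position in current frame
y1 = weighted_position(C_n1, q)      # target position in next frame
y_shift = y1 - y0                    # shifting vector of Sec. 2.1
```

Because the histogram weights are fixed from the initial frame, only the centroids need to be recomputed per frame, which is what makes the shift estimate fast.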

### 2.2. Scale Adaptation Based on Centroid Difference Vectors

Although most scale-adaptation algorithms utilize histograms of the target colors, we utilize the centroids of the colors to rescale the target region, for the following reasons:

- The centroids contain spatial information and have a direct relationship to the scale of the target region.
- Centroids that deviate greatly from their true positions because of background colors can easily be detected and excluded from the scale estimation.
- Centroid-based rescaling is not as sensitive to the appearance of background colors similar to the target colors as rescaling based on histogram similarity.

For scale adaptation, we first divide the target region, which is centered at ${\widehat{\mathbf{y}}}_{1}$, into four equal subregions, as shown in Fig. 1 for both the current frame and the next frame; the scale of the target changes between the two frames. After dividing the target region into subregions, we calculate the centroids in each subregion independently. As Fig. 1 shows, the distances between centroids of the same color in different subregions have a direct relationship to the size of the target region. For example, the ratio between $d_{\mathrm{A},\mathrm{D},u,x}^{n}$ and $d_{\mathrm{A},\mathrm{D},u,x}^{n+1}$ can be used as an estimate of the ratio of the target's width in the next frame to that in the current frame.
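To make the subregion construction concrete, the following sketch divides a window of a label image (each pixel already quantized to a color-bin index) into four equal subregions A, B, C, and D and computes the per-bin centroids of each. The function name, the container layout, and the top-left/top-right/bottom-left/bottom-right assignment of A–D are our assumptions; the letter does not fix a coordinate convention:

```python
import numpy as np

def subregion_centroids(labels, window, m):
    """Per-color-bin centroids in the four equal subregions of a window.

    labels: 2-D array of color-bin indices (one per pixel)
    window: (x0, y0, w, h) target window
    m:      number of color bins
    Returns {quad: (m, 2) array of (x, y) centroids, NaN where a bin is empty}.
    """
    x0, y0, w, h = window
    hw, hh = w // 2, h // 2
    quads = {"A": (x0, y0),      "B": (x0 + hw, y0),       # top row
             "C": (x0, y0 + hh), "D": (x0 + hw, y0 + hh)}  # bottom row
    out = {}
    for name, (qx, qy) in quads.items():
        sub = labels[qy:qy + hh, qx:qx + hw]
        cent = np.full((m, 2), np.nan)
        for u in range(m):
            ys, xs = np.nonzero(sub == u)
            if xs.size:                       # centroid in frame coordinates
                cent[u] = (qx + xs.mean(), qy + ys.mean())
        out[name] = cent
    return out
```

For a uniform 4×4 label image of a single color, each subregion centroid lands at the center of its 2×2 quadrant, as expected.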

We first introduce some notation: let $d_{\mathrm{A},\mathrm{B},u,x}^{n+1}$ denote the absolute difference between the $x$-coordinates of the centroids in subregions A and B corresponding to the color $u$:

## Eq. 2

$$d_{\mathrm{A},\mathrm{B},u,x}^{n+1}=\big|C_{\mathrm{A},u,x}^{n+1}-C_{\mathrm{B},u,x}^{n+1}\big|,$$

where $C_{\mathrm{A},u,x}^{n+1}$ and $C_{\mathrm{B},u,x}^{n+1}$ denote the $x$-coordinates of the centroids of color $u$ in subregions A and B, respectively, and the superscript $n+1$ denotes the next frame. Likewise, we can define $d_{\mathrm{A},\mathrm{D},u,x}^{n+1}$, $d_{\mathrm{C},\mathrm{B},u,x}^{n+1}$, and $d_{\mathrm{C},\mathrm{D},u,x}^{n+1}$. We then compute the weighted average of the four distances to obtain $d_{u,x}^{n+1}$, where each weight is related to the numbers of pixels of color bin $u$ in the corresponding subregions:

## Eq. 3

$$d_{u,x}^{n+1}=W_{\mathrm{A},\mathrm{B},u}\,d_{\mathrm{A},\mathrm{B},u,x}^{n+1}+W_{\mathrm{A},\mathrm{D},u}\,d_{\mathrm{A},\mathrm{D},u,x}^{n+1}+W_{\mathrm{C},\mathrm{B},u}\,d_{\mathrm{C},\mathrm{B},u,x}^{n+1}+W_{\mathrm{C},\mathrm{D},u}\,d_{\mathrm{C},\mathrm{D},u,x}^{n+1},$$

with

## Eq. 4

$$W_{\mathrm{A},\mathrm{B},u}=\frac{N_{\mathrm{A},u}N_{\mathrm{B},u}}{N_{\mathrm{A},u}N_{\mathrm{B},u}+N_{\mathrm{A},u}N_{\mathrm{D},u}+N_{\mathrm{C},u}N_{\mathrm{B},u}+N_{\mathrm{C},u}N_{\mathrm{D},u}},$$

and similarly for $W_{\mathrm{A},\mathrm{D},u}$, $W_{\mathrm{C},\mathrm{B},u}$, and $W_{\mathrm{C},\mathrm{D},u}$. Here, $N_{\mathrm{A},u}$, $N_{\mathrm{B},u}$, $N_{\mathrm{C},u}$, and $N_{\mathrm{D},u}$ denote the numbers of pixels corresponding to the color $u$ in the subregions A, B, C, and D, respectively.
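A sketch of Eqs. 2–4 for one color bin follows. The `cent`/`counts` containers are our own assumed layout, and the vertical pair list for the $y$-direction is our reading of "in the same way" later in the text:

```python
import numpy as np

# Subregion pairs used for the x-direction distances (Eq. 3); for the
# y-direction we assume the analogous vertical pairs.
X_PAIRS = [("A", "B"), ("A", "D"), ("C", "B"), ("C", "D")]
Y_PAIRS = [("A", "C"), ("A", "D"), ("B", "C"), ("B", "D")]

def weighted_distance(cent, counts, u, pairs, axis):
    """Pixel-count-weighted average of centroid differences (Eqs. 2-4).

    cent:   {quad: (m, 2) array of (x, y) centroids}
    counts: {quad: (m,) pixel counts N_{quad,u}}
    axis:   0 for x-coordinates, 1 for y-coordinates
    """
    # Eq. 2: absolute coordinate differences between subregion centroids
    d = np.array([abs(cent[a][u, axis] - cent[b][u, axis]) for a, b in pairs])
    # Eq. 4: weights proportional to the products of the pixel counts
    w = np.array([counts[a][u] * counts[b][u] for a, b in pairs])
    return (w / w.sum()) @ d          # Eq. 3
```

Normalizing the weights to sum to one is equivalent to the shared denominator in Eq. 4, so each weight down-weights pairs in which either subregion contains few pixels of color $u$.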

The zooming factor corresponding to the color *u* is then defined as

## Eq. 5

$$z_{u,x}^{n+1}=\frac{d_{u,x}^{n+1}}{d_{u,x}^{n}},$$

where $d_{u,x}^{n}$ is the weighted average of the distances computed in the current frame for color $u$. The zooming factor with respect to the $y$-axis can be obtained in the same way. The pair of zooming factors $z_{u,x}^{n+1}$ and $z_{u,y}^{n+1}$ can be used as estimates of the resizing ratios of the target: $z_{u,x}^{n+1}$ for the width and $z_{u,y}^{n+1}$ for the height.
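In code, Eq. 5 is just an element-wise ratio of the averaged distances. A toy sketch, where the `d_*` arrays stand for hypothetical per-bin values of Eq. 3 in the two frames:

```python
import numpy as np

# Hypothetical per-color-bin averaged distances (Eq. 3), frames n and n+1
d_x_n  = np.array([8.0, 10.0])
d_x_n1 = np.array([9.6, 11.0])
d_y_n  = np.array([6.0,  5.0])
d_y_n1 = np.array([6.6,  5.5])

z_x = d_x_n1 / d_x_n    # per-bin width resizing ratios (Eq. 5)
z_y = d_y_n1 / d_y_n    # per-bin height resizing ratios
```

Each color bin yields its own estimate of the resizing ratio, which is exactly why the filtering step described next is needed.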

However, the centroid corresponding to a given color $u$ is not guaranteed to be reliable. We therefore filter the estimated zooming factors to obtain a pair of reliable ones. The filtering first trims off all the extreme values, because the size of the target cannot change dramatically between frames; the remaining zooming factor values are then averaged. A filter well suited to this kind of problem is the α-trimmed mean filter,^{14} which first sorts the remaining zooming factor values to get

## Eq. 6

$$\big\lbrace z_{x}^{n}(1),\ldots,z_{x}^{n}(k)\big\rbrace,\quad\big\lbrace z_{y}^{n}(1),\ldots,z_{y}^{n}(k)\big\rbrace,$$

and then averages the values that remain after trimming off the $[\alpha k]$ smallest and $[\alpha k]$ largest ones:

## Eq. 7

$$z_{x}^{n}=\frac{1}{k-2[\alpha k]}\sum_{j=[\alpha k]+1}^{k-[\alpha k]}z_{x}^{n}(j),$$

and similarly for $z_{y}^{n}$.

### 2.3. Scale-Adaptive Tracking Algorithm

The principal steps of the tracking algorithm, which combines the centroid-based target shifting with the centroid-based scale adaptation, are as follows:

1. Initialize the target window in the initial frame using any motion detection algorithm.
2. In the next frame, set a search window with the same center as the target window but a larger size, and compute the color centroids within this window.
3. Using the computed centroids, estimate the location of the target in the next frame and shift the target window to the estimated location.
4. Partition the target window into four subregions with respect to its center and calculate all the zoom factors from the four subregions.
5. Filter the zoom factors to obtain a pair of reliable zoom factors, and resize the target window accordingly.
6. Acquire the next frame and repeat steps 2–5.
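The steps above can be sketched as a single loop. Here `localize` and `zoom_factors` are injected as callables standing in for the centroid machinery of Secs. 2.1 and 2.2 (so the sketch stays self-contained and is not the actual implementation), while the α-trimmed mean of step 5 follows Eq. 7:

```python
import numpy as np

def alpha_trimmed_mean(values, alpha=0.1):
    """Eq. 7: sort, drop the [alpha*k] smallest/largest values, average."""
    z = np.sort(np.asarray(values, dtype=float))
    t = int(alpha * len(z))
    return z[t:len(z) - t].mean()

def track(frames, window, localize, zoom_factors, alpha=0.1):
    """Steps 2-6: shift the window, then rescale it, frame by frame.

    localize(frame, window)     -> shifted (x, y, w, h)    [steps 2-3]
    zoom_factors(frame, window) -> (z_x list, z_y list)    [step 4]
    """
    history = [window]
    for frame in frames:
        x, y, w, h = localize(frame, window)
        zx = alpha_trimmed_mean(zoom_factors(frame, (x, y, w, h))[0], alpha)
        zy = alpha_trimmed_mean(zoom_factors(frame, (x, y, w, h))[1], alpha)
        window = (x, y, int(round(w * zx)), int(round(h * zy)))  # step 5
        history.append(window)
    return history
```

With stub callables that shift the window by one pixel per frame and report zoom factors containing a few outliers, the trimmed mean suppresses the outliers and the window size stays stable.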

## 3. Experimental Results

We performed several experiments and compared the results with those of Camshift,^{1} the mean-shift blob,^{3} mean-shift particle filtering,^{10} and the Kwon–Lee algorithm,^{9} which we chose as a representative of patch-based tracking. For all the algorithms, the background colors were not removed when obtaining the initial target colors. Figure 2 shows the experimental results for a cluttered background. This is a very difficult case for color-based tracking, because colors similar to the target's colors are included in the target window; here, all the algorithms except the proposed one fail to track the target. Figure 3 shows a case in which the object enlarges rapidly against a complex background. The mean-shift blob and mean-shift particle-filtering trackers fail in this case and are not shown. The Kwon–Lee algorithm can track the object with a small initial target window that does not include too much of the background colors; in comparison, the proposed scheme succeeds even with a rather large target window. Table 1 shows the quantitative error results for the rainy sequence.

## Table 1

Summary of the localization and scaling errors.

| Sequence | Algorithm | Localization error (distance) | Scale error (%) |
|---|---|---|---|
| rainy | Camshift | 181.98 | 367.2 |
| | Mean-shift particle | 70.34 | 70.82 |
| | Mean-shift blob | 71.98 | 73.2 |
| | Kwon–Lee method | 68.98 | 69.4 |
| | Proposed | 21.42 | 15.86 |

## Acknowledgments

This work was supported by the National Research Foundation of Korea grant funded by the Korea government (MEST) (Grant No. 2011-0000096) and by the "Dongseo Frontier Project" Research Fund of 2009 of Dongseo University.

## References

- "Computer vision face tracking for use in a perceptual user interface," Intel Technol. J., 2, 1–15 (1998).
- "Kernel-based object tracking," IEEE Trans. Pattern Anal. Mach. Intell., 25, 564–575 (2003). https://doi.org/10.1109/TPAMI.2003.1195991
- "Object tracking by asymmetric kernel mean shift with automatic scale and orientation selection" (2007).
- "Efficient mean-shift tracking via a new similarity measure" (2005).
- "Robust object tracking using local kernels and background information" (2007).
- "Tracking of a non-rigid object via patch-based dynamic appearance modeling and adaptive basin hopping monte carlo sampling" (2009).
- "Structured combination of particle filter and kernel mean-shift tracking" (2006).
- "Discriminative spatial attention for robust tracking" (2010).
- "Adaptive object tracking based on an effective appearance filter," IEEE Trans. Pattern Anal. Mach. Intell., 29(9), 1661–1667 (2007). https://doi.org/10.1109/TPAMI.2007.1112
- "Motion tracking based on area and level set weighted centroid shifting," IET Comput. Vis., 4(2), 73–84 (2010). https://doi.org/10.1049/iet-cvi.2008.0017
- "Adaptive alpha-trimmed mean filters under deviations from assumed noise model," IEEE Trans. Image Process., 13, 627–639 (2004). https://doi.org/10.1109/TIP.2003.821115