Real-time unmanned aerial vehicle tracking of fast moving small target on ground

Abstract. To solve the problems of occlusion and fast motion of small targets in unmanned aerial vehicle target tracking, an adaptive algorithm that fuses the improved color histogram tracking response and the correlation filter tracking response based on multichannel histogram of oriented gradient features is proposed to realize small target tracking with high accuracy. The state judgment index is used to determine whether the target is in a fast motion or an occlusion state. In the fast motion state, the search area is enlarged, and the color optimal model that suppresses the suspected area is used for rough detection. Then, redetection in the location of multiple peaks in the rough detection response is carried out using the correlation filter to accurately locate the target. In an occlusion state, the model stops updating, the search area is expanded, and the current color model is used for rough detection. Then, redetection in the place of multiple peaks in the rough detection response is carried out using the correlation filter to accurately locate the target. Experimental results show that the proposed method can track small targets accurately. The frame rate of the proposed method is 40.23 frames/s, indicating usable real-time performance.


Introduction
During unmanned aerial vehicle (UAV) target tracking, the target is far away from the camera; hence, the target pixel size in the image (the number of pixels occupied by the target) is small. In addition, when the UAV moves swiftly, the camera is actively adjusted, and the target position shift between adjacent frames may exceed 20 pixels. In a target tracking review article with more than 1000 citations, 1 the above two cases are classified as low resolution and fast motion, respectively. Low resolution can also cause the target to be blocked easily. These factors make it challenging to track a small moving ground target accurately and in real time.
When the target occupies a small number of pixels, limited feature information is obtained from target pixels. High-level features that are capable of more powerful feature expression are favored in such circumstances to ensure the robustness of the tracking method. Danelljan et al. 2 effectively improved the tracking effect of the method in their paper 3 using multichannel color features instead of grayscale features. However, a single color feature is not sufficient for capturing all illumination changes. Henriques et al. 4 used the multichannel histogram of oriented gradient (HOG) 5 to represent the target, which can well represent the local shape feature of the target. Hence, the tracking effect of the correlation filtering was significantly improved. However, trackers based on HOG often perform poorly when the target undergoes motion or serious deformation. Experiments showed that the above two single-feature models cannot favorably cope with small targets, resulting in target drifting. Danelljan et al. 6 won the 2016 VOT challenge with a comprehensive combination of the multichannel color feature, a well-trained convolutional neural network (CNN) feature, and the HOG feature. However, due to the limited number of online training samples in target tracking, the overdimensioned feature vector easily leads to overfitting; this method requires the updating of more than 800,000 model parameters with each use, making it difficult to fulfill the real-time requirement of target tracking.
Expanding the search area to obtain a larger sampling area is one of the ways to deal with fast moving targets. However, the amount of computation is increased and the false alarm rate rises due to the introduction of objects similar to the target. To cope with fast movements of the target, Ma et al. 7 introduced an online random fern classifier, which is similar to tracking-learning-detection, 8 to redetect targets. However, the redetection module is based on grayscale features, so it is difficult to achieve good redetection performance in a large area. Zhang et al. 9 used multiple trackers as an expert group and conducted a semisupervised loss judgment on the expert group's tracking results to select the optimal tracking result and improve the reliability of the tracker. However, it is still difficult for the method to deal with disturbing objects in the search area based on a single grayscale feature. Additionally, each frame requires multiple tracking and detection passes, making it difficult to achieve real-time performance. Zhu et al. 10 used edge boxes 11 to obtain areas with more closed edge information as a global candidate area instead of using a local search area. However, when the target is small, its edge information is relatively limited, making it difficult for edge boxes to accurately locate the target. In addition, the edge box method requires sampling of a large number of areas to improve the probability that the target is detected, compromising real-time performance.
Small targets are easily obscured, which increases the difficulty of tracking. Jia 12 used a local sparse representation of the target to cope with partial occlusions of the target. Zhao et al. 13 used an innovative keypoint matching-based tracker to handle the partial occlusion problem, yet these two methods cannot cope with relatively large occlusion sizes. Also, the average frame rate of this method on the OTB2013 1 dataset is 8.5 frames/s, not satisfying the real-time requirement of target tracking. In addition, small targets that lack information are not suitable for local sparse representation. Kalal et al. 8 introduced the online random fern classifier to redetect targets, but the redetection module is based on simple grayscale features, so it is difficult to obtain good redetection results. In the case that the target is completely obscured, Yan et al. 14 used the Kalman filter method to estimate the target position to achieve the target tracking, though the position estimation method cannot accurately estimate how the target would separate from the occlusion.
In this paper, to solve the problems of fast motions and occlusions of the target in UAV target tracking, an adaptive algorithm that fuses the improved color histogram tracking response and the correlation filter tracking response based on multichannel HOG features is proposed to realize stronger feature expression for small targets. The state judgment index is used to determine whether the target is in a fast motion or an occlusion state. In the fast motion state, the search area is enlarged, and the color optimal model that suppresses the suspected area is used for rough detection. Then, redetection in the location of multiple peaks in the rough detection response is carried out using the correlation filter to accurately locate the target. In the occlusion state, the model stops updating, the search area is expanded, and the current color model is used for rough detection. Then, redetection in the location of multiple peaks in the rough detection response is carried out using the correlation filter to accurately locate the target. The block diagram of real-time UAV tracking of a fast moving small target on the ground is shown in Fig. 1.

Target Tracking Method by Fusing Two Tracking Models
The HOG is a statistical feature based on the local gradient direction, which cannot cope with target deformations well. The global color distribution of the target does not change greatly with target deformations. Therefore, the global color feature can better deal with target deformations. By contrast, the color feature cannot deal with illumination changes very well, whereas the HOG uses gamma correction to normalize the contrast of the original image and can better deal with illumination changes. The color feature and the HOG complement each other. Hence, a tracking model based on the fusion of these two features is expected to represent the small target more powerfully and track it more accurately. A flowchart of the proposed target tracking method by the fusion of two tracking models is shown in Fig. 2.

Tracking Response of the Correlation Filter Model Based on Local Multichannel HOG Features
The correlation filter tracking method based on multichannel local HOG features is divided into the training stage and the detection stage. In the training stage, the optimal correlation filter is obtained by training on the sample set, and the optimal filter is updated according to the fast updating strategy. Multichannel local HOG features are extracted for each pixel in the local search area of the previous frame and arranged into a matrix, and the rows and columns of the matrix are cyclically shifted to obtain a training sample set. Owing to the characteristics of the circulant matrix, the ridge regression for the correlation filter is solved in the discrete Fourier domain, which avoids matrix inversion, reduces the complexity of the algorithm by several orders of magnitude, and achieves real-time performance. 4 In the detection stage, multichannel local HOG features are extracted for each pixel in the local search area of the current frame and arranged into a matrix, and the rows and columns of the matrix are cyclically shifted to obtain a to-be-detected sample set. The correlation filter response score for each sample is obtained according to the updated optimal filter, and the coordinates of the sample with the highest score are set as the center location.

Characteristics of the circulant matrix
In this section, the one-dimensional (1-D) single-channel signal, which is methodologically similar to the two-dimensional (2-D) multichannel signal, is used to describe the acceleration characteristic of the circulant matrix. 4 Suppose that the 1-D single-channel signal is represented by an $n \times 1$ vector, denoted as $x = [x_0, x_1, x_2, \ldots, x_{n-1}]$; then the circulant matrix $X$ is obtained by the cyclic shift $C(x)$ of $x$:
$$X = C(x) = \begin{bmatrix} x_0 & x_1 & x_2 & \cdots & x_{n-1} \\ x_{n-1} & x_0 & x_1 & \cdots & x_{n-2} \\ x_{n-2} & x_{n-1} & x_0 & \cdots & x_{n-3} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ x_1 & x_2 & x_3 & \cdots & x_0 \end{bmatrix} \tag{1}$$
The circulant matrix $X$ is the training sample set. Each row vector $x_q$ is a sample, and its corresponding label vector is $y$:
$$y = [y_0, y_1, y_2, \ldots, y_{n-1}] \tag{2}$$
The goal of training is to find a function $f(x) = w^{T}x$ that minimizes the squared error between the samples $x_q$ and their label values $y_q$, as shown in Eq. (3):
$$\min_{w} \sum_{q} \left( f(x_q) - y_q \right)^2 + \lambda \|w\|^2 \tag{3}$$
where $w$ is the coefficient vector to be solved for and $\lambda$ is a regularization parameter that controls overfitting. The regularized least-squares solution for $w$ is
$$w = (X^{H}X + \lambda I)^{-1} X^{H} y \tag{4}$$
where the superscript $H$ denotes the conjugate transpose. Directly solving Eq. (4) involves matrix inversion, which demands a huge amount of computation, compromising the real-time performance of the tracking method.
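To reduce computational complexity, Eq. (4) is transformed into the frequency domain. According to Ref. 15, the circulant matrix is diagonalized by the discrete Fourier transform matrix $F$:
$$X = F\,\mathrm{diag}(\hat{x})\,F^{H} \tag{5}$$
where $\hat{x}$ represents the discrete Fourier transform of $x$, $\hat{x} = \mathcal{F}(x)$. Substituting Eq. (5) into Eq. (4) and using $F^{H}F = I$ gives
$$w = \left( F\,\mathrm{diag}(\hat{x}^{*})\,\mathrm{diag}(\hat{x})\,F^{H} + \lambda F\,\mathrm{diag}(\delta)\,F^{H} \right)^{-1} F\,\mathrm{diag}(\hat{x}^{*})\,F^{H} y = F\,\mathrm{diag}\!\left( \frac{\hat{x}^{*}}{\hat{x}^{*} \odot \hat{x} + \lambda\delta} \right) F^{H} y \tag{6}$$
In Eq. (6), $\mathrm{diag}(\hat{x}^{*})\,\mathrm{diag}(\hat{x}) = \mathrm{diag}(\hat{x}^{*} \odot \hat{x})$; $\delta$ is an all-1 vector and is omitted in the following equations; $\hat{x}^{*}$ represents the complex conjugate of $\hat{x}$.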
Then, the convolution property of the circulant matrix discussed in Ref. 16 is used:
$$\mathcal{F}\big(C(x)\,y\big) = \mathcal{F}(\bar{x}) \odot \hat{y} = \hat{x}^{*} \odot \hat{y} \tag{7}$$
where $\bar{x}$ represents the reverse order of $x$. A Fourier transform is carried out on both sides of Eq. (6) to solve for $w$:
$$\hat{w} = \frac{\hat{x}^{*} \odot \hat{y}}{\hat{x}^{*} \odot \hat{x} + \lambda} \tag{8}$$
… the width and height of the target of the previous frame. 17 The multichannel local HOG feature $dh^{N}_{w,h}$ is extracted for each pixel $(w, h)$ in $D_{t-1}$, where $N$ is the number of channels of the feature. A $W \times H$ matrix $DH^{N}$ is constructed using $dh^{N}_{w,h}$. Each element $dh^{N}_{w,h}$ in the matrix is an $N$-dimensional vector. Training samples $\{DH^{N}_{w,h} \mid w \in \{0, 1, \ldots, W-1\},\ h \in \{0, 1, \ldots, H-1\}\}$ are generated by a cyclic shift operation on $DH^{N}$. The training samples are used to train the optimal correlation filter $h^{N}_{cf}$ so that it has the highest filtering response to the sample centered on $(w, h)$ in $D_{t-1}$. The training process is a ridge regression process. Its purpose is to minimize the loss, as shown in Eq. (9):
$$\arg\min_{h_{cf}} \left\| \sum_{n=1}^{N} h^{n}_{cf} * DH^{n}_{w,h} - y \right\|^{2} + \lambda \sum_{n=1}^{N} \left\| h^{n}_{cf} \right\|^{2} \tag{9}$$
where $*$ represents the convolution operation, $DH^{n}_{w,h}$ ($n = 1, 2, \ldots, N$) is the $n$'th feature channel of the sample, $y$ is the desired response (label), and $\lambda \sum_{n=1}^{N} \|h^{n}_{cf}\|^{2}$ is the regularization term that prevents overfitting, which must be $> 0$. Note that $\lambda$ is a regularization parameter and is assigned the optimal value 0.001 derived in Ref. 4. The idea discussed in Sec. 2.1.1 is applied to solve Eq. (9), and the optimal correlation filter of the previous frame in the frequency domain is obtained:
$$\ldots \tag{10, 11}$$
The optimal filter is updated according to the fast updating strategy proposed in Ref. 16:
$$B_t = (1 - \eta)\,B_{t-1} + \ldots \tag{12}$$
where $\eta$ is an update parameter, which determines the update rate. A larger $\eta$ means a greater impact of the current frame on the model, indicating a faster model update. In this paper, $\eta$ is assigned the optimal value of 0.01 derived in Ref. 16.
Detection stage: The local search area $D_t$ for detection is selected by setting the center pixel $P_{t-1}$ of the target tracking box of the previous frame $I_{t-1}$ as the center of $D_t$. The multichannel local HOG feature $dh^{N}_{w,h}$ is extracted for each pixel $(w, h)$ in $D_t$, where $N$ is the number of channels of the feature. A $W \times H$ matrix $DH^{N}$ is constructed using $dh^{N}_{w,h}$. Each element $dh^{N}_{w,h}$ in the matrix is an $N$-dimensional vector. Detection samples $\{DH^{N}_{w,h} \mid w \in \{0, 1, \ldots, W-1\},\ h \in \{0, 1, \ldots, H-1\}\}$ are generated by a cyclic shift operation on $DH^{N}$. According to the updated optimal filter, the correlation filtering response score of each sample is obtained as follows:
$$S_{cf}(x) = \ldots \tag{13}$$
The position of the target center pixel $x$ in $D_t$ is set to the coordinates of the point $(w_{max}, h_{max})$ with the highest response score, and the correlation filtering tracking response score is $S_{cf}(x)$.
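As a concrete check of the training and detection stages, the following NumPy sketch works through the 1-D single-channel case described above: it builds the circulant sample matrix, solves the ridge regression once with an explicit linear solve and once elementwise in the Fourier domain, and then locates a shifted copy of the signal from the response peak. The function names, the random test signal, and the Gaussian label width are illustrative assumptions, not the authors' implementation. Where the complex conjugate falls in the closed form depends on the shift direction and on the DFT sign convention; the form used here is the one that matches `np.fft` with rows built by `np.roll`, and it is checked against the direct solution.

```python
import numpy as np

def circulant_samples(x):
    """Stack all cyclic shifts of x as rows: row i is x shifted right by i."""
    return np.stack([np.roll(x, i) for i in range(len(x))])

def train_direct(x, y, lam):
    """Ridge regression solved with an explicit linear solve (O(n^3))."""
    X = circulant_samples(x)
    return np.linalg.solve(X.T @ X + lam * np.eye(len(x)), X.T @ y)

def train_fft(x, y, lam):
    """Same solution computed elementwise in the Fourier domain (O(n log n))."""
    x_hat, y_hat = np.fft.fft(x), np.fft.fft(y)
    return (x_hat * y_hat) / (np.conj(x_hat) * x_hat + lam)

def detect(w_hat, z):
    """Filter response for every cyclic shift of the new signal z."""
    return np.real(np.fft.ifft(np.conj(np.fft.fft(z)) * w_hat))

rng = np.random.default_rng(0)
n, lam = 64, 1e-3                      # lam = 0.001 as stated in the text
x = rng.standard_normal(n)             # assumed stand-in for a feature row

# Gaussian-shaped labels peaked at shift 0 (circular distance, sigma = 2).
dist = np.minimum(np.arange(n), n - np.arange(n))
y = np.exp(-0.5 * (dist / 2.0) ** 2)

w_direct = train_direct(x, y, lam)
w_hat = train_fft(x, y, lam)
w_fft = np.real(np.fft.ifft(w_hat))

shift = 5
z = np.roll(x, shift)                  # the "target" displaced by 5 samples
response = detect(w_hat, z)
peak = int(np.argmax(response))        # row index that realigns z with x
```

With these conventions the response peak appears at index `(n - shift) % n`, the row of the circulant matrix that realigns `z` with the training signal. The two training routes agree to numerical precision, which is the practical content of the diagonalization argument.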

Target Tracking Response Based on Improved Global Color Feature
In the color histogram tracking, the probability of the pixel $x$ belonging to the target in the current local search area $D_t$ is obtained by constructing the target normalized RGB color histogram and looking up the table. According to the normalized color histogram $\mathrm{Hist}_{fg}$ of the foreground and the normalized color histogram $\mathrm{Hist}_{bg}$ of the background of the current frame, the probability $p_{fg}(x)$ that the pixel $x$ belongs to the foreground and the probability $p_{bg}(x)$ that the pixel $x$ belongs to the background are, respectively, calculated:
$$p_{fg}(x) = \mathrm{Hist}_{fg}(i_x) \tag{14}$$
$$p_{bg}(x) = \mathrm{Hist}_{bg}(i_x) \tag{15}$$
where $i_x$ indicates that the pixel $x$ belongs to the $i$'th bin in the color histogram. According to Ref. 18, the probability that pixel $x$ belongs to the target in the search area is denoted as
$$P_t(x \in O \mid D_t) = \frac{p_{fg}(x)}{p_{fg}(x) + p_{bg}(x)}$$
To adapt the representation to changing object appearance and illumination conditions, the object model is updated on a regular basis using linear interpolation. The probability integral graph $I$ in the search area $D_t$ is calculated, and the response score $S_{hist}(x)$ of the target box in $D_t$, with pixel $x$ as the center and the target size as the size of the box, is obtained:
$$S_{hist}(x) = \ldots \tag{17}$$
where $W$ and $H$ are the width and height of the current target, respectively, and $(i, j)$ represents the horizontal and vertical coordinates of the pixel $x$.
The position of the target center pixel $x$ in $D_t$ is set to the coordinates of the point $(i_{max}, j_{max})$ with the highest response score, and the color tracking response score is $S_{hist}(x)$.
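The lookup-and-integrate procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the 8 bins per channel, the small `eps` guard, the normalization of the box sum by the box area, and all function names are hypothetical choices. The per-pixel probability uses the foreground-to-(foreground + background) ratio of normalized histograms, and the box score is read from an integral image with four lookups per position.

```python
import numpy as np

def bin_index(pixels, bins_per_channel=8):
    """Map uint8 RGB pixels of shape (..., 3) to a single histogram bin index."""
    q = (pixels.astype(np.int64) * bins_per_channel) // 256
    return (q[..., 0] * bins_per_channel + q[..., 1]) * bins_per_channel + q[..., 2]

def normalized_hist(pixels, bins_per_channel=8):
    """Normalized color histogram of a set of pixels."""
    idx = bin_index(pixels, bins_per_channel).ravel()
    hist = np.bincount(idx, minlength=bins_per_channel ** 3).astype(float)
    return hist / max(hist.sum(), 1.0)

def target_probability(search, hist_fg, hist_bg, eps=1e-12):
    """Per-pixel probability of belonging to the target, by table lookup."""
    idx = bin_index(search)
    p_fg, p_bg = hist_fg[idx], hist_bg[idx]
    return p_fg / (p_fg + p_bg + eps)

def box_scores(prob, box_h, box_w):
    """Mean probability in every box_h x box_w window, via an integral image."""
    integral = np.pad(prob, ((1, 0), (1, 0))).cumsum(0).cumsum(1)
    sums = (integral[box_h:, box_w:] - integral[:-box_h, box_w:]
            - integral[box_h:, :-box_w] + integral[:-box_h, :-box_w])
    return sums / (box_h * box_w)

# Synthetic search area: dark noisy background with a small red 6 x 4 target.
rng = np.random.default_rng(0)
search = rng.integers(0, 64, size=(40, 60, 3), dtype=np.uint8)
search[20:26, 30:34] = (220, 30, 30)

mask = np.ones(search.shape[:2], dtype=bool)
mask[20:26, 30:34] = False
hist_fg = normalized_hist(search[20:26, 30:34].reshape(-1, 3))
hist_bg = normalized_hist(search[mask])

prob = target_probability(search, hist_fg, hist_bg)
scores = box_scores(prob, 6, 4)
top, left = (int(v) for v in np.unravel_index(np.argmax(scores), scores.shape))
center = (top + 3, left + 2)           # box center of the best-scoring window
```

Because the integral image is computed once per frame, the cost of scoring every candidate box is independent of the target size, which is what keeps the color branch cheap when the search area is enlarged.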
If the target is relatively small, drifting to areas with a similar color is likely to happen. To cope with drifting, the proposed method suppresses areas with suspected color similarities to reduce interference from these areas.
When the response score $S_{hist}(x)$ of the box area satisfies Eq. (18), it is considered to be a suspected area:
$$S_{hist}(x_{dis}) > \theta_0 \max[S_{hist}(x)] \tag{18}$$
where $x_{dis}$ represents the central position of the suspected rectangular area and $\theta_0$ is the threshold parameter, which is arbitrarily set to be 0.8 here. The suspected areas are sorted according to their response scores. The normalized color histogram set $\{\mathrm{Hist}^{n}_{dis} \mid n = 1, \ldots, N\}$ for the first $N$ suspected areas is calculated. Then, the probability that pixel $x$ belongs to each suspected area is calculated, followed by recalculation of the probability that pixel $x$ in $D_t$ belongs to the target as shown in Eq. (19):
$$P_t(x \in O \mid D_t) = \ldots \tag{19}$$
Then, the color tracking response score $S_{hist}(x)$ of the target tracking box in the search area $D_t$ is recalculated using Eq. (17).
To test whether the suppression of color suspicious areas can effectively reduce interference from suspected areas, a comparative experiment on images with small targets and color-like areas in the UAV123 dataset is carried out. Some experimental results are shown in Fig. 3.
The second column in Fig. 3 is the probability map of pixels belonging to the target without suppression. Probability values at the target area are high, yet those of color suspicious areas are also high, causing interference to target tracking. The third column in Fig. 3 is the probability map of pixels belonging to the target with suppression. Responses in suspected areas are suppressed. The decrease in probability values at the target area in the map is less than that at suspected areas, which makes the probability value of the target more prominent. Thus, experiments show that suppression of color-like areas can effectively reduce interference from suspected areas.

Fusion Dual Model Tracking Response
Adaptive fusion of the improved color histogram tracking response and the correlation filter tracking response based on the multichannel HOG feature is carried out to determine the center position $P_t$ of the target tracking box in the current frame $I_t$:
$$P_t = \arg\max_{x} S_f(x) \tag{20}$$
$$S_f(x) = S_{cf}(x) + f[S_{hist}(x)] \tag{21}$$
where $S_f(x)$ is the tracking response score at $x$ of the fusion dual model. When there are many suspected areas, the target tracking box determined by the target tracking response based on the improved global color feature is likely to drift to areas with a similar color. To guarantee the exact location of the target tracking box determined by the fusion dual model tracking response, the impact of the color tracking response on the overall fusion probability must be lowered. Hence, the color tracking response score $S_{hist}(x)$ is adaptively adjusted as follows:
$$f[S_{hist}(x)] = \ldots \tag{22}$$
where $N$ is the number of suspected areas. The color tracking response score is affected by the number of suspected areas.
When $N$ is large, the color tracking response score is considered not credible enough, so its value is reduced to lower the impact of the color tracking response on the overall fusion probability. Conversely, if $N$ is small, indicating that the color tracking response score is credible, its value is increased for better tracking results. Target tracking results of the adaptive fusion dual model are compared with target tracking results of single models in Fig. 4. In each image in Fig. 4, the green box shows the tracking result of the correlation filter model …
The overlap score (OS) is used to measure the accuracy of target tracking results. OS is calculated as follows:
$$OS = \frac{|B_t \cap B_{gt}|}{|B_t \cup B_{gt}|} \tag{23}$$
where $B_{gt}$ represents the true target position and $B_t$ represents the target location identified by the tracking method. Higher OS scores indicate higher accuracy. Figure 4 and Table 1 show that the proposed target tracking method that fuses multiple tracking models can achieve better OS, indicating higher tracking accuracy.
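For reference, the overlap score can be computed as below, assuming OS is the usual intersection-over-union of the tracked box and the ground-truth box. The `(x, y, width, height)` box layout and the function name are assumptions made for this sketch.

```python
def overlap_score(box_t, box_gt):
    """Intersection over union of two boxes given as (x, y, width, height)."""
    xt, yt, wt, ht = box_t
    xg, yg, wg, hg = box_gt
    inter_w = max(0.0, min(xt + wt, xg + wg) - max(xt, xg))
    inter_h = max(0.0, min(yt + ht, yg + hg) - max(yt, yg))
    inter = inter_w * inter_h
    union = wt * ht + wg * hg - inter
    return inter / union if union > 0 else 0.0

same = overlap_score((0, 0, 10, 10), (0, 0, 10, 10))       # identical boxes
half = overlap_score((0, 0, 10, 10), (5, 0, 10, 10))       # shifted by half a width
apart = overlap_score((0, 0, 10, 10), (20, 20, 5, 5))      # disjoint boxes
```

For small targets the score is sensitive to offsets of only a few pixels: a 10-pixel-wide box shifted by 5 pixels already drops to one third.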

Fast Moving or Occluded Target Tracking
In the process of target tracking, rapid movement of the target leads to rapid location changes of the target in the video. Consequently, the target easily moves out of the local search area, resulting in tracking failure. In addition, the small size of the target makes it an easy victim of occlusion. In this paper, a tracking method that copes with target fast motions and target occlusions is proposed.

Target Tracking under Target Fast Motion
In a target tracking review article with more than 1000 citations, 1 the two cases described above are classified as fast motion and low resolution. A target is considered to be in the fast motion state when its position offset between adjacent frames exceeds 20 pixels:
$$\sqrt{(i_{t-1} - i_{t-2})^{2} + (j_{t-1} - j_{t-2})^{2}} > 20 \tag{24}$$
where $(i_{t-1}, j_{t-1})$ and $(i_{t-2}, j_{t-2})$ are the coordinates of the centers $P_{t-1}$ and $P_{t-2}$ of the tracking box in frame $t-1$ and frame $t-2$, respectively.
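The fast motion test can be written in a few lines. The sketch assumes that the position offset is the Euclidean distance between the two most recent tracking-box centers; the function and constant names are hypothetical.

```python
import math

FAST_MOTION_PIXELS = 20.0  # offset threshold stated in the text

def is_fast_motion(center_prev, center_prev2, threshold=FAST_MOTION_PIXELS):
    """True when the center moved more than `threshold` pixels between frames."""
    (i1, j1), (i2, j2) = center_prev, center_prev2
    return math.hypot(i1 - i2, j1 - j2) > threshold

slow = is_fast_motion((112, 116), (100, 100))   # offset of exactly 20 pixels
fast = is_fast_motion((115, 120), (100, 100))   # offset of 25 pixels
```

Because the test uses the centers from frames t-1 and t-2, it is available before the current frame is searched, so the enlarged search area can be chosen up front.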

Update the model parameters of the correlation filter model and color feature model
When the target is in the fast motion state, the parameters of the correlation filter model and the color histogram model need to be adjusted to cope with changes in target posture and illumination. For each of the 2FR frames preceding the current frame (FR indicates the video frame rate), the correlation filter model parameters $A^{n}$ and $B$, the foreground histogram feature $\mathrm{Hist}_{fg}$, and the fusion response peak value $\max(S_f)$ form a set, which is divided into $L$ segments in time order. From each segment, the parameters $A^{n}_{l}$, $B_{l}$, and $\mathrm{Hist}_{fg(l)}$ of the frame with the largest response peak value are selected to form an expert group $[A^{n}_{l}, B_{l}, \mathrm{Hist}_{fg(l)}]$. A weighted summation over the members of the expert group yields the optimal correlation filter model and the foreground histogram feature needed in color tracking. Taking into account the temporal correlation between frames in a video sequence, greater weights are assigned to parameters from frames that are closer to the current frame:
$$\ldots \tag{25, 26, 27}$$
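The expert-group selection can be sketched as follows. Two points are assumptions of this illustration rather than the paper's definition: each frame's parameters are represented by a single array (in the method they would be the filter parameters and the foreground histogram), and the weights grow linearly with recency and are normalized to sum to one, since the text only states that more recent frames receive greater weights. The function names are hypothetical.

```python
import numpy as np

def select_experts(history, num_segments):
    """history: list of (params, peak) pairs in time order, oldest first.
    Returns the params of the highest-peak frame in each time segment."""
    segments = np.array_split(np.arange(len(history)), num_segments)
    experts = []
    for seg in segments:
        if len(seg) == 0:
            continue
        best = max(seg, key=lambda k: history[k][1])
        experts.append(history[best][0])
    return experts

def fuse_experts(experts):
    """Weighted sum of experts; linearly increasing weights are an assumption."""
    weights = np.arange(1, len(experts) + 1, dtype=float)
    weights /= weights.sum()
    return sum(w * np.asarray(e, dtype=float) for w, e in zip(weights, experts))

# Six past frames; the "parameters" of frame k are just the value k here.
peaks = [0.2, 0.9, 0.5, 0.4, 0.8, 0.3]
history = [(np.array([float(k)]), p) for k, p in enumerate(peaks)]

experts = select_experts(history, num_segments=3)   # frames 1, 2 and 4
fused = fuse_experts(experts)                        # (1*1 + 2*2 + 3*4) / 6
```

Selecting by response peak within each segment discards frames where the model was likely contaminated, while the segmenting keeps the group spread over the whole 2FR window.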

Redetect to find the true target
To track the fast moving target in real time, the local search area in color tracking is expanded to $2D_t$, and color primary detection is performed to obtain the color tracking response score $S_{hist}(x)$.
In color tracking, the true target area can be mistaken as a suspected area and, hence, be suppressed. Also, when the true target area and the suspected object are close, the suspected object can be omitted because it yields a subpeak response close to the peak response area. In either case, multiple peak areas appear in the final color tracking response, and the highest peak response does not necessarily represent the true target. For instance, as shown in Fig. 5, in the upper left image, the red box marks the true target, whereas the green box is the suspected object, and the response value of the suspected object in the color tracking response is higher than that of the true target.
To accurately track the true target, this paper uses the optimal correlation filter obtained in Sec. 3.1.1 to redetect the multipeak positions in the color tracking response to determine the true target. The multipeak positions $\{x^{n}_{p} \mid n \in 1, \ldots, N_1\}$ of the color tracking response are determined first, where $N_1$ is the number of peak positions. The multipeak positions are determined using Eq. (28):
$$S_{hist}(x_p) > \theta_1 \max[S_{hist}(x)] \tag{28}$$
where $\theta_1$ is assigned 0.8. Then, the local search areas $\{D^{n}_{t} \mid n \in 1, \ldots, N_1\}$ are selected at $x^{n}_{p}$. The optimal correlation filter is used to redetect each local area to obtain the multipeak redetection response set $\{S^{n}_{cf}(x) \mid n \in 1, \ldots, N_1\}$, where the peak position of $\max[S^{n}_{cf}(x)]$ is the target center position $P_t$.

Target Tracking in Occlusion
In the first column of Fig. 6, the target is not occluded and there is only a single peak corresponding to the target in the tracking response map. Except for the sharp peak at the center of the target, the rest of the map is relatively smooth.
In the second column, the target is partially occluded, and there are many peaks in the tracking response map, with no single maximum peak value and with large fluctuations.
In the third column, the target is more occluded, and an additional peak appears in the tracking response map, with an even larger overall fluctuation.
To cope with this problem, an indicator OCC for judging the degree of occlusion of the target is proposed:
$$OCC = \ldots \tag{29}$$
where $W$ and $H$ represent the width and the height, respectively, of the response map corresponding to the local search area. This indicator reflects the degree of smoothness of the response map and the confidence level that the peak is in the center of the target.
In the process of target tracking, the value of OCC, which is used to judge the degree of occlusion of the target, is shown in the third row of Fig. 6. The first column has the largest OCC value. The second column has a smaller OCC value, and the third column has an even smaller OCC value.
Figure 6 shows that the OCC value can be used to judge the degree of occlusion of the target. In this paper, when the OCC value of frame $I_t$ is less than $\beta$ times the OCC value of frame $I_{t-1}$, the target is considered occluded. Here, $\beta$ is assigned 0.8.
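The occlusion decision rule can be illustrated as below. Because the OCC formula itself is not reproduced here, the sketch substitutes the average peak-to-correlation energy (APCE), a response-map sharpness measure from the correlation filter literature; this is an assumed stand-in that behaves as the text describes (large for a single sharp peak on a smooth map, smaller when several peaks and fluctuations appear), not the paper's definition. Only the comparison against β = 0.8 times the previous value is taken from the text.

```python
import numpy as np

def apce(response, eps=1e-12):
    """Average peak-to-correlation energy; assumed stand-in for the OCC indicator."""
    r_max, r_min = response.max(), response.min()
    return (r_max - r_min) ** 2 / (np.mean((response - r_min) ** 2) + eps)

def is_occluded(occ_curr, occ_prev, beta=0.8):
    """Occlusion is declared when the indicator drops below beta * previous value."""
    return occ_curr < beta * occ_prev

def gaussian_peak(shape, center, sigma=2.0):
    """Synthetic response bump used only for this demonstration."""
    rows, cols = np.indices(shape)
    d2 = (rows - center[0]) ** 2 + (cols - center[1]) ** 2
    return np.exp(-d2 / (2.0 * sigma ** 2))

shape = (41, 41)
clean = gaussian_peak(shape, (20, 20))                      # single sharp peak
occluded = (0.5 * gaussian_peak(shape, (20, 20))            # weakened true peak
            + 0.45 * gaussian_peak(shape, (10, 10))         # plus two side peaks
            + 0.45 * gaussian_peak(shape, (30, 28)))

occ_prev, occ_curr = apce(clean), apce(occluded)
flag = is_occluded(occ_curr, occ_prev)
```

Using a ratio between consecutive frames, rather than an absolute threshold, makes the decision insensitive to the overall scale of the response, which varies from sequence to sequence.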

Redetect the occluded target
A small target is easily obscured. When the occlusion indicator OCC indicates that the target is occluded, the local search area is expanded to $2D_t$. First, the color primary detection is performed to determine the multipeak positions $\{x^{n}_{p} \mid n \in 1, \ldots, N_2\}$, where $N_2$ is the number of peak positions. The multipeak positions are determined by Eq. (30):
$$S_{hist}(x_p) > \theta_2 \max[S_{hist}(x)] \tag{30}$$
where $\theta_2$ is assigned a value of 0.7. Then, the local search areas $\{D^{n}_{t} \mid n \in 1, \ldots, N_2\}$ are selected at $x^{n}_{p}$. The preocclusion correlation filter is used to redetect each local area to obtain the multipeak redetection response set $\{S^{n}_{cf}(x) \mid n \in 1, \ldots, N_2\}$, where the peak position of $\max[S^{n}_{cf}(x)]$ is the target center position $P_t$.
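The coarse-to-fine redetection used in both the fast motion state and the occlusion state can be sketched as below. The peak picker keeps local maxima of the color response above a fraction θ of the global maximum (0.8 in the fast motion state, 0.7 in the occlusion state, per the text). The 8-neighborhood local-maximum test and the `cf_score` callable, which stands in for evaluating the correlation filter around a candidate and returning its peak response, are assumptions of this sketch.

```python
import numpy as np

def multipeak_positions(s_hist, theta):
    """Local maxima of the color response that exceed theta * global maximum."""
    h, w = s_hist.shape
    padded = np.pad(s_hist, 1, mode="constant", constant_values=-np.inf)
    neighbors = [padded[1 + dr:1 + dr + h, 1 + dc:1 + dc + w]
                 for dr in (-1, 0, 1) for dc in (-1, 0, 1) if (dr, dc) != (0, 0)]
    is_local_max = np.all([s_hist >= nb for nb in neighbors], axis=0)
    keep = is_local_max & (s_hist > theta * s_hist.max())
    return [(int(r), int(c)) for r, c in np.argwhere(keep)]

def redetect(peaks, cf_score):
    """Return the candidate with the largest correlation-filter peak response."""
    scores = [cf_score(p) for p in peaks]
    return peaks[int(np.argmax(scores))]

# Color response with a strong distractor, the true target, and a weak bump.
s_hist = np.zeros((20, 20))
s_hist[5, 5] = 1.0      # distractor: highest color response
s_hist[12, 14] = 0.9    # true target: slightly lower color response
s_hist[3, 17] = 0.5     # below the threshold, ignored

peaks = multipeak_positions(s_hist, theta=0.8)

# Hypothetical correlation-filter scores for the two surviving candidates.
cf_lookup = {(5, 5): 0.2, (12, 14): 0.7}
target = redetect(peaks, lambda p: cf_lookup[p])
```

The example mirrors the situation shown for the suspected object: the color response alone would pick the distractor, while the shape-sensitive correlation filter resolves the ambiguity among the few surviving candidates.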

Flowchart of the Proposed Method
The flowchart of the proposed real-time UAV tracking of a fast moving small target on the ground is shown in Fig. 7.
In the target tracking process, first, the center of the local search area $D_t$ in the current frame $I_t$ is set to be the center position $P_{t-1}$ of the target tracking result of frame $I_{t-1}$. The correlation filter tracking response $S_{cf}(x)$ and the color tracking response $S_{hist}(x)$ of the pixel $x$ in $D_t$ are calculated, and the score $S_f(x)$ is obtained by adaptively combining the two responses. The peak position of $S_f(x)$ is taken as the target center position $P_t$ of frame $I_t$. In the fast motion state, the proposed method uses the optimal correlation filter to redetect the multipeak positions of the color tracking response to determine the true target. The optimal correlation filter is used to redetect each local area to obtain the multipeak redetection response set $\{S^{n}_{cf}(x) \mid n \in 1, \ldots, N_1\}$. The peak position of $\max[S^{n}_{cf}(x)]$ is the target center position $P_t$. In the occlusion state, the color primary detection is performed. The preocclusion correlation filter is used to redetect the local areas at the multipeak positions to obtain the multipeak redetection response set $\{S^{n}_{cf}(x) \mid n \in 1, \ldots, N_2\}$. The peak position of $\max[S^{n}_{cf}(x)]$ is the target center position $P_t$.

Experimental Results and Analysis
A set of video sequences containing small targets, fast motion, and occlusion characteristics from the UAV123 database is selected for the experiment, including a total of 15 groups and 6611 images. 19 The image size is 1280 × 720 pixels. The tracking targets include people, cars, and other objects. All targets have fine manual annotation. The proposed method is compared with eight other state-of-the-art methods, including the CN tracker that uses color attributes as effective features, 2 the KCF tracker that uses the multichannel HOG feature, 4 the DSST tracker that relieves the scaling issue using the feature pyramid and the three-dimensional (3-D) correlation filter, 16 the LCT tracker that uses the online random fern classifier as the redetection component for long-term tracking, 7 the DAT tracker that uses the color histogram feature and suppresses the background area, 20 and the Staple tracker that fuses the color tracker and correlation filter tracker linearly. 18 The above-mentioned six methods have outstanding tracking results, and their tracking speed meets the real-time requirement. Also, the MEEM 8 tracker … All experimental results and related performance evaluations were obtained using the same data and initialization conditions. Experimental environment: MATLAB 2016; experimental platform: 3.60 GHz Intel i7 CPU, 64-bit Windows 7 operating system, with 8 GB of memory.

Comparison of Tracking Results for the Different Methods
The tracking performance of the proposed method and that of CN, KCF, DSST, DAT, LCT, Staple, MEEM, and C-COT are compared on the video sequence set in which the target is small, fast moving, and occluded, as shown in Fig. 8. The white box is the ground-truth location of the target, which is used for comparison with the tracking position obtained by each algorithm.
The targets in the first and second rows of Bike2 and Truck4 were <200 pixels in size, and the target in the third row in Car11 was <100 pixels in size. Experimental results show that the tracking boxes of the other eight methods easily lose the target or drift to suspected objects. The proposed method is able to better characterize small targets with little feature information because of the adaptive fusion of multifeature models, improving the success rate of target tracking. In the third and the fourth rows of Bike3 and Car11, the targets are blocked and the other eight methods were unable to deal with occlusion, resulting in tracking failure. The proposed method efficiently judges whether the target is occluded and initiates the corresponding tracking method when occlusion is detected, ensuring successful target tracking. In the fifth and the sixth rows of Wakeboard5 and Car14, the between-frame target position distance is >20 pixels. The other eight methods lost the target in this situation. The proposed method efficiently judges whether the target is in fast motion and initiates the corresponding tracking method when fast motion is detected, ensuring successful target tracking. In the seventh and the eighth rows of Car13 and Truck3, there are strong interfering objects near the true target, and most of the eight methods failed to track. The proposed method suppressed the suspected areas effectively and greatly reduced the interference from the suspected areas. It can resist the impact of strong interfering objects on small targets and track the target successfully.

Experiment of overlap success rate
If the overlap score of the tracking result in frame $I_t$ exceeds a given threshold, the proposed method is considered to have tracked the target successfully in frame $I_t$. The overlap success rate 1 is the ratio of the number of successfully tracked frames to the total number of frames. The overlap score is defined in Eq. (23).
The overlap success rate of the proposed method is compared with that of the other eight methods in Fig. 9. In this paper, the area under curve (AUC) of the overlap success rate curve is used to evaluate the tracking methods because it is considered a more accurate measure of overall tracking performance. The AUC value of each method is listed after the method name in the legend of Fig. 9.
As shown in Fig. 9, the proposed method has the highest AUC, indicating that it achieves a high overlap success rate overall. When the overlap threshold is <0.5, the overlap success rate of the proposed method is substantially higher than that of the other methods. However, the success rate of the proposed method is slightly lower than that of C-COT when the overlap threshold is high. This is because the proposed method mainly targets small objects and therefore does not adopt a complex scale-adaptive strategy.
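The overlap success rate and its AUC can be illustrated with a minimal Python sketch. It assumes that the overlap score of Eq. (23) is the usual intersection-over-union of the tracked and ground-truth boxes and that boxes are stored as (x, y, width, height); the function names, the box layout, and the 21 evenly spaced thresholds are illustrative assumptions, not details taken from the paper.

```python
from typing import List, Sequence, Tuple

# Assumed box layout: (x, y, width, height) in pixels.
Box = Tuple[float, float, float, float]


def overlap_score(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes (assumed form of the overlap score)."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def success_rate(scores: Sequence[float], threshold: float) -> float:
    """Fraction of frames whose overlap score exceeds the threshold."""
    return sum(s > threshold for s in scores) / len(scores)


def success_auc(scores: Sequence[float], steps: int = 21) -> float:
    """Approximate the AUC as the mean success rate over evenly spaced thresholds in [0, 1]."""
    thresholds = [i / (steps - 1) for i in range(steps)]
    return sum(success_rate(scores, t) for t in thresholds) / steps


# Toy data: three frames with perfect, partial, and no overlap.
truth: List[Box] = [(0, 0, 10, 10)] * 3
tracked: List[Box] = [(0, 0, 10, 10), (5, 0, 10, 10), (20, 20, 10, 10)]
scores = [overlap_score(p, g) for p, g in zip(tracked, truth)]
```

Averaging the success rate over a uniform threshold grid only approximates the area under the curve; a finer grid or trapezoidal integration would give a closer value.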

Experiment of distance precision rate
If the Euclidean distance between the center of the tracking result in frame $I_t$ and the given target center is within a given location error threshold, the proposed method is considered to have tracked the target precisely in frame $I_t$.
The distance precision rate is the ratio of the number of precisely tracked frames to the total number of frames. The distance precision rate of the proposed method is compared with that of the other eight methods in Fig. 10.
The horizontal axis denotes the location error threshold and the vertical axis denotes the distance precision rate. The AUC value is again used as the evaluation index because it more accurately evaluates the overall performance of the methods. The AUC value of each method is listed after the method name in the legend of Fig. 10.
As shown in Fig. 10, the proposed method has the highest AUC, indicating that it achieves a high distance precision rate overall. When the location error threshold is <5, the distance precision rate of the proposed method is substantially higher than that of the other methods. However, the success rate of the proposed method is slightly lower than that of C-COT when the location error threshold is small. This is because the proposed method mainly targets small objects and therefore does not adopt a complex scale-adaptive strategy.
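The distance precision rate described above can be sketched in the same way. The snippet below is a hedged illustration that assumes box centers are given as (x, y) pixel coordinates and sweeps the location error threshold from 0 to 50 pixels; that range, like the function names, is an assumption borrowed from common benchmark practice rather than a value stated in this paper.

```python
import math
from typing import List, Sequence, Tuple

# Assumed center layout: (x, y) in pixels.
Point = Tuple[float, float]


def center_errors(tracked: Sequence[Point], truth: Sequence[Point]) -> List[float]:
    """Per-frame Euclidean distance between tracked and ground-truth centers."""
    return [math.hypot(px - gx, py - gy) for (px, py), (gx, gy) in zip(tracked, truth)]


def precision_rate(errors: Sequence[float], threshold: float) -> float:
    """Fraction of frames whose center error is within the threshold."""
    return sum(e <= threshold for e in errors) / len(errors)


def precision_curve(errors: Sequence[float], max_threshold: int = 50) -> List[float]:
    """Distance precision rate at integer thresholds 0..max_threshold (assumed range)."""
    return [precision_rate(errors, t) for t in range(max_threshold + 1)]


# Toy data: center errors of 0, 5, and 50 pixels.
truth_centers: List[Point] = [(0.0, 0.0)] * 3
tracked_centers: List[Point] = [(0.0, 0.0), (3.0, 4.0), (30.0, 40.0)]
errors = center_errors(tracked_centers, truth_centers)
curve = precision_curve(errors)
```

The curve is nondecreasing in the threshold, so its AUC summarizes how quickly a tracker reaches a high precision rate as the tolerance is relaxed.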

Experiment of average center location error
The center location error is the average Euclidean distance between the center of the tracking result and the given target center. Table 2 lists the center location error of the proposed method and of the other eight methods.
Table 2 shows that the average center location error of the proposed method is much smaller than that of the other eight methods, indicating that its tracking performance is better than that of the other methods.
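As a small illustration of this metric, the following sketch averages the per-frame center distance over a sequence. The function name and the (x, y) center layout are assumptions made for the example.

```python
import math
from typing import Sequence, Tuple

# Assumed center layout: (x, y) in pixels.
Point = Tuple[float, float]


def average_center_location_error(tracked: Sequence[Point], truth: Sequence[Point]) -> float:
    """Mean Euclidean distance between tracked and ground-truth centers over all frames."""
    if not tracked or len(tracked) != len(truth):
        raise ValueError("tracked and truth must be non-empty and of equal length")
    total = sum(math.hypot(px - gx, py - gy) for (px, py), (gx, gy) in zip(tracked, truth))
    return total / len(tracked)
```

Because the mean is unbounded, a few frames in which the tracker has drifted far from the target can dominate this value, which is one reason it is usually reported alongside the threshold-based curves.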

Comparison of the real-time performance between methods
The proposed method is compared with the other eight methods in terms of real-time performance, which is evaluated in frames per second (fps). The fps of each method is shown in Table 3.
According to Table 3, the proposed method has a higher fps than C-COT, MEEM, DAT, DSST, and LCT, indicating better real-time performance. The fps of Staple, KCF, and CN is higher than that of the proposed method; however, the tracking performance of the proposed method is superior to theirs owing to its multifeature model and its strategies for coping with fast motion and occlusion of small targets.
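A frame rate of this kind is typically obtained by timing the tracker over a whole sequence. The sketch below shows one way to do so; `process_frame` is a hypothetical stand-in for a tracker's per-frame update and is not an interface from the paper.

```python
import time
from typing import Callable, Iterable


def measure_fps(process_frame: Callable[[object], object], frames: Iterable[object]) -> float:
    """Return processed frames per second of wall-clock time."""
    count = 0
    start = time.perf_counter()
    for frame in frames:
        process_frame(frame)  # hypothetical per-frame tracker update
        count += 1
    elapsed = time.perf_counter() - start
    return count / elapsed if elapsed > 0 else float("inf")
```

Measured values depend on hardware, image size, and whether frame loading is included in the timed loop, so fps figures are comparable only when all methods run under the same conditions.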

Conclusion
An adaptive algorithm that fuses the improved color histogram tracking response and the correlation filter tracking response based on multichannel HOG features is proposed to realize small target tracking with high accuracy. The state judgment index is used to determine whether the target is in a fast motion state or an occlusion state. In the fast motion state, the search area is enlarged, and the color optimal model that suppresses the suspected area is used for rough detection. Then, redetection at the locations of multiple peaks in the rough detection response is carried out using the correlation filter to accurately locate the target. In the occlusion state, the model stops updating, the search area is expanded, and the current color model is used for rough detection. Then, redetection at the locations of multiple peaks in the rough detection response is carried out using the correlation filter to accurately locate the target. The proposed method is compared with other state-of-the-art methods on the UAV123 dataset. Experimental results show that the proposed method can accurately track a fast moving small target in real time. The fps of the proposed method is 40.23, indicating good real-time performance. This paper studies single target tracking. Future research will address multitarget tracking: based on multitarget time-domain information and airspace information, an accurate real-time method for UAV multitarget tracking will be developed.
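The state-dependent strategy summarized above can be outlined as a small decision table. The sketch below is only an illustration of the control flow: the names `TargetState`, `StepPlan`, and `plan_step` are hypothetical, the behavior of the normal state and the model update in the fast motion state are assumptions, and the state judgment index itself is not reproduced.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class TargetState(Enum):
    NORMAL = "normal"
    FAST_MOTION = "fast_motion"
    OCCLUSION = "occlusion"


@dataclass(frozen=True)
class StepPlan:
    enlarge_search_area: bool
    rough_color_model: Optional[str]  # color model used for rough detection, if any
    redetect_at_peaks: bool           # correlation filter redetection at response peaks
    update_model: bool


def plan_step(state: TargetState) -> StepPlan:
    """Map the judged target state to the tracking actions described in the text."""
    if state is TargetState.FAST_MOTION:
        # Enlarged search, suppressed color optimal model, then redetection.
        # Continuing the model update here is an assumption.
        return StepPlan(True, "optimal_with_suppression", True, True)
    if state is TargetState.OCCLUSION:
        # Model update is frozen; the current color model drives rough detection.
        return StepPlan(True, "current", True, False)
    # Assumed default: ordinary fused-response tracking without redetection.
    return StepPlan(False, None, False, True)
```

Keeping the decision separate from the detectors makes it easy to see that the two special states differ only in which color model is used and in whether the model is updated.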

Fig. 3 Color tracking responses with and without color suspicious area suppression: (a) original image (the target is within the black box), (b) probability map of pixels belonging to the target without suppression, and (c) probability map of pixels belonging to the target with suppression.

Fig. 4 Comparison of target tracking results of the adaptive fusion dual model and two single models.

3.2.1 Judge the degree of occlusion of the target
The target tracking response of the fusion dual model, $S_f(x)$, is shown in Fig. 6. The original images are shown in the first row of Fig. 6. The target is a pedestrian who is obscured by a car during tracking. The fusion tracking response maps $S_f(x)$ corresponding to the local search area are shown in the second row.

Fig. 9 Comparison of the overlap success rate of the proposed method with the other eight methods.

Fig. 10 Comparison of the distance precision rate of the proposed method with the other eight methods.
Block diagram of real time UAV tracking of fast moving small target on ground.

Table 1 Comparison of OS scores of target tracking results of the adaptive fusion dual model and two single models.

Table 2 Comparison of the center location error of the proposed method with that of the other eight methods.

Table 3 Comparison of the frames per second of the proposed method with that of the other eight methods.