Foreground object detection in a complex scene is an important step in many computer vision applications, such as visual surveillance,1–3 intelligent behavior recognition,4–7 and vehicle motion tracking.8–12 These applications depend heavily on the results of foreground object detection, so accurate extraction of the foreground object(s) is always desirable. A common and efficient approach for extracting foreground objects from a complex background is background subtraction,13–16 which detects foreground objects by comparing each current frame with a background model. A difficulty with background subtraction is that complex scenes are usually dynamic: waving trees, falling rain, illumination changes, and other background changes all alter the background over time. To process these complex scenes, a robust background model is crucial.
Early background models17–20 had the advantage of low memory consumption and high processing speed. These approaches work well with stationary scenes, but they usually cannot handle complex scenes properly. Therefore, a number of background modeling methods have been proposed, the most common being the Mixture of Gaussians (MoG) model.21 By using more than one Gaussian distribution per pixel, MoG is able to handle complex scenes, but it consumes more memory and processing time. Several methods22–24 have been proposed to mitigate this drawback. Instead of MoG, Kim13 presented a real-time object detection algorithm based on a codebook model that is efficient in terms of both memory and processing time, but it does not take into account the dependence between adjacent pixels. Heikkila and Pietikainen25 presented a novel approach to background subtraction, in which the background is modeled by texture features. It is capable of real-time processing under image size. Wang3 presented a background modeling method called SACON that computes sample consensus and estimates a statistical model of the background scene. Chan and Chien26 used a multi-background registration technique to calculate a weight value for each pixel to update their background model. According to the weight value, the updating mechanism determines whether the pixel is replaced or not; thus, it consumes less memory and computation time. Chiu27 proposed a fast background model construction algorithm that improves the original weighted-average method. It uses a probability-based algorithm to construct the background model and to detect the object. This approach works well with slight background changes, but it needs connected-component labeling to overcome the challenges of complex scenes.
In recent years, video compression technology and neural networks have been used to solve many problems in video surveillance. Wang28 proposed a background modeling approach in the frequency domain, which constructs the background model using DCT coefficients to achieve lower processing time. However, it has difficulty handling complex scenes. Maddalena and Petrosino29 presented a self-organizing method that models the background by learning motion patterns and was employed to model complex scenes. This method needs a large amount of memory to store the neuronal map that records the weight vector at each pixel. Tsai30 proposed a Fourier spectrum background model that can adapt to illumination changes, but it is only suitable for indoor scenes and grayscale images. Barnich and Van Droogenbroeck31 proposed a background subtraction algorithm called ViBe that can be initialized with a single frame. This algorithm was embedded in a digital camera with a low-speed ARM processor. Lanza et al.32 presented a real-time approach for background subtraction that can overcome gradual and sudden illumination changes. This approach segments each pixel by using a probabilistic, mixture-based, non-parametric model.
In this paper, we propose a hybrid background modeling approach based on stable and multi-layer astable records. It offers effective foreground object detection in complex scenes and is applicable to background pixels that vary over time. In the detection phase, it takes into account the dependence of adjacent pixels in the astable background record by using homogeneous background subtraction; therefore, the foreground object can be extracted with a low error rate. In this way, our hybrid pixel-based background (HPB) model and detection method are resistant to erroneous detection in complex scenes.
Proposed HPB Model and Homogeneous Background Subtraction
The block diagram of our proposed foreground object detection system, which includes learning and detection phases, is illustrated in Fig. 1. At the beginning, the video input is switched to the learning phase. Based on pixel variation analysis, steady pixels are kept in a stable record and varying pixels are saved in the multi-layer astable records. Once these records have been used to construct a complete HPB model, the system switches the input sequence to the detection phase and starts to perform foreground object detection.
Creating a Stable Background Record
In the learning phase, stable pixel analysis is applied to build a stable background record (SBR). For each frame of the video sequence, the input pixel is the RGB value at a given location and time, as shown in Eq. (1). Equation (5) is used to assess the similarity between two pixels. These two pixels are regarded as similar if their difference is smaller than a threshold value. A counter called Stable Time stores the duration for which the pixels stay alike; its value indicates how long the pixel has remained unchanged. If the two pixels are not similar, Stable Time is reset to 0 and counting resumes. Thus, the value of Stable Time reflects the stability of a pixel, and the stable background record can be built according to Eqs. (1)–(3).
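The Stable Time bookkeeping described above can be sketched for a single pixel location as follows. This is a minimal illustration, not the authors' implementation: the similarity measure (sum of absolute RGB differences), the comparison against the previous frame, and the names `diff_threshold` and `stable_threshold` are all assumptions made for the example.

```python
def is_similar(p, q, threshold):
    """Assumed similarity test: sum of absolute RGB differences below a threshold."""
    return sum(abs(a - b) for a, b in zip(p, q)) < threshold


def learn_stable_record(frames, diff_threshold, stable_threshold):
    """Track one pixel location over the learning frames.

    frames: sequence of (R, G, B) tuples observed at the same location.
    Returns (stable_pixel, stable_time); stable_pixel is None when the
    pixel never stays similar for at least stable_threshold frames.
    """
    stable_time = 0
    stable_pixel = None
    previous = None
    for current in frames:
        if previous is not None and is_similar(current, previous, diff_threshold):
            stable_time += 1   # pixel unchanged: keep counting
        else:
            stable_time = 0    # pixel changed: reset and resume counting
        if stable_time >= stable_threshold:
            stable_pixel = current  # stable long enough to record as background
        previous = current
    return stable_pixel, stable_time
```

A pixel that flickers between two very different colors never accumulates Stable Time in this sketch, so it is left for the astable records.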
Constructing the Astable Background Record
Building the astable background record (ABR) is important because a stable background record alone is not sufficient to obtain a foreground object precisely. The ABR consists of multi-layer two-dimensional buffers that store the complex scene. In the learning phase, each ABR entry stands for one background pixel stored at a given location of a given layer buffer, and a match counter counts how many input pixels have matched that entry. The multi-layer ABR is established gradually, frame by frame.
Step 1: Initialization at the first frame;
Step 2: Major learning for the pixels of the subsequent learning frames;
Step 3: Release of the ABR entries that meet criterion (a) or (b).
In the first frame, the input pixel is stored in the first layer of the ABR. For the second frame, the pixel is compared with the corresponding ABR entry. If they are similar, the match counter is increased. If there is no match, the pixel is stored in the second layer of the ABR, and so on. At the end of the learning phase, the useless ABR entries that meet criterion (a) or (b) are deleted to reduce the memory requirements. When the match counter of an entry is less than a threshold, the entry could be a temporarily appearing foreground element or noise. If an ABR entry is similar to the stable background record, the corresponding entry should be deleted to save memory space.
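The layer-by-layer learning and the final release step can be sketched for one pixel location as below. The sketch rests on several assumptions that the text does not fix: the similarity measure, a match counter that starts at 0 when a layer is created, ignoring unmatched pixels once `max_layers` layers exist, and the reading of the two release criteria as "too few matches" and "similar to the stable pixel". All function and parameter names are hypothetical.

```python
def is_similar(p, q, threshold):
    """Assumed similarity test: sum of absolute RGB differences below a threshold."""
    return sum(abs(a - b) for a, b in zip(p, q)) < threshold


def learn_astable_layers(frames, diff_threshold, max_layers):
    """Build the per-pixel layer list; each layer is [pixel, match_count]."""
    layers = []
    for pixel in frames:
        for layer in layers:
            if is_similar(pixel, layer[0], diff_threshold):
                layer[1] += 1              # matched an existing layer
                break
        else:
            # No layer matched: open the next layer if one is free (assumption).
            if len(layers) < max_layers:
                layers.append([pixel, 0])
    return layers


def release_layers(layers, stable_pixel, diff_threshold, min_matches):
    """Drop layers that look like noise or duplicate the stable record."""
    kept = []
    for pixel, count in layers:
        too_rare = count < min_matches     # assumed reading of the first criterion
        redundant = stable_pixel is not None and is_similar(
            pixel, stable_pixel, diff_threshold
        )                                  # assumed reading of the second criterion
        if not (too_rare or redundant):
            kept.append([pixel, count])
    return kept
```

Keeping the release step separate from learning mirrors the text, where useless entries are removed only at the end of the learning phase.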
Figure 2 shows an example of a stable background record (SBR) and a 3-layer ABR. Figure 2(a) illustrates the stable background record after 300 learning frames. Figure 2(b) shows the first layer of the ABR, in which shaking leaves, falling rain, and water were recorded correctly as astable background. Figures 2(c) and 2(d), which show other shaking leaves and falling rain, are the second and third layers of the ABR, respectively.
Foreground Object Detection with Homogeneous Background Subtraction
After the HPB model is constructed, foreground objects can be obtained by background subtraction. However, Figs. 2(b)–2(d) show that the astable background records are composed of homogeneous blob movements in the shaky areas. To reduce detection errors and save recording memory, this homogeneity within an area has to be taken into account when performing background subtraction. Thus, the input pixel is compared with the ABR entries in a neighboring area centered at the location of the input pixel. A foreground object (FO) can be detected by homogeneous background subtraction, as in Eq. (4):
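One way to picture the neighborhood comparison is the sketch below. The decision rule is an assumption rather than a transcription of Eq. (4): a pixel is treated as background if it matches the stable record at its own location or any astable entry inside a square window around it. The grid layout, the `radius` parameter, and the similarity measure are hypothetical choices for the example.

```python
def is_similar(p, q, threshold):
    """Assumed similarity test: sum of absolute RGB differences below a threshold."""
    return sum(abs(a - b) for a, b in zip(p, q)) < threshold


def is_foreground(pixel, x, y, sbr, abr_layers, diff_threshold, radius=1):
    """Return True when the pixel matches no background record.

    sbr: 2-D grid (list of rows) of RGB tuples or None.
    abr_layers: list of grids with the same shape as sbr.
    radius: half-width of the square neighborhood (1 gives a 3x3 window).
    """
    height, width = len(sbr), len(sbr[0])
    stable = sbr[y][x]
    if stable is not None and is_similar(pixel, stable, diff_threshold):
        return False
    for layer in abr_layers:
        # Scan the window, clipped at the image border.
        for ny in range(max(0, y - radius), min(height, y + radius + 1)):
            for nx in range(max(0, x - radius), min(width, x + radius + 1)):
                entry = layer[ny][nx]
                if entry is not None and is_similar(pixel, entry, diff_threshold):
                    return False
    return True
```

Searching the neighbors lets a leaf color recorded at an adjacent location explain the current pixel, so the same motion does not have to be stored at every location it sweeps through.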
In the detection phase, we must update the HPB model over time to prevent detection errors resulting from outdated background information. Since the SBR and ABR are established in different ways, methods for updating them are different as well. The SBR is composed of stable pixels, therefore only background information updates are needed. On the other hand, updating the background information for ABRs needs a more complex replacement mechanism.
In Eq. (5), when the input pixel and the SBR pixel match, we use the running average to update the corresponding SBR pixel.
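A running-average update of a matched background pixel can be sketched as follows. The blending weight `alpha` and the per-channel form are assumptions; the text does not state the learning rate it uses.

```python
def running_average(background, pixel, alpha):
    """Blend the input pixel into the stored background pixel, channel by channel.

    alpha is a hypothetical learning rate in [0, 1]; larger values adapt faster.
    """
    return tuple((1.0 - alpha) * b + alpha * p for b, p in zip(background, pixel))
```

A small `alpha` makes the record follow gradual illumination drift while ignoring single-frame noise.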
Similarly, as in Eq. (6), when the input pixel and an ABR entry match, we also use the running average to update the ABR. However, when they do not match, we use an exponential probability density function to determine whether an ABR entry should be replaced; its value represents the probability that the replacement should occur. The ABR update and replacement procedure compares the input pixel with the ABR entries of all layers.
Step 1: If the input pixel matches an ABR entry, update that entry with the running average.
Step 2: If the input pixel does not match any ABR entry, compare the replacement probability with a probability threshold to decide whether an entry is replaced.
The probability value is obtained as in Eq. (8).
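The two-step ABR update can be sketched as below. Several choices here are assumptions rather than the paper's equations: the exponential density is evaluated on the match counter of the least-matched layer, replacement happens when that density exceeds the threshold, and the names `rate`, `alpha`, and `prob_threshold` are hypothetical.

```python
import math


def is_similar(p, q, threshold):
    """Assumed similarity test: sum of absolute RGB differences below a threshold."""
    return sum(abs(a - b) for a, b in zip(p, q)) < threshold


def replacement_probability(match_count, rate):
    """Exponential density rate * exp(-rate * x), evaluated at the match count (assumption)."""
    return rate * math.exp(-rate * match_count)


def update_astable_layers(layers, pixel, diff_threshold, alpha, rate, prob_threshold):
    """Update the per-pixel layer list in place; return True if the pixel matched.

    layers: list of [pixel, match_count] entries.
    """
    # Step 1: a matching layer is refreshed with a running average.
    for layer in layers:
        if is_similar(pixel, layer[0], diff_threshold):
            layer[0] = tuple((1.0 - alpha) * b + alpha * p
                             for b, p in zip(layer[0], pixel))
            layer[1] += 1
            return True
    # Step 2: no match, so the least-matched layer may be replaced.
    if layers:
        weakest = min(layers, key=lambda layer: layer[1])
        if replacement_probability(weakest[1], rate) > prob_threshold:
            weakest[0] = pixel
            weakest[1] = 0
    return False
```

Under these assumptions, a layer that has matched many times has a very small replacement probability, so well-established background entries are protected from occasional foreground pixels.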
Experimental Results and Comparison
To evaluate the performance of background subtraction, three test video sequences, namely waving trees, torrential rain and wind, and PETS’2001 Dataset 3,33 were used in the experiments. The performance of the proposed method was compared with that of Codebook,13 MoG (Wu),23 SACON,3 Chien,26 and ViBe.31 A pixel-based error rate based on ground truth is a fair and often adopted assessment method,27 and it was used to evaluate each method’s performance. The error rate is given in Eq. (9).
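A pixel-based error rate against ground truth can be computed as in the sketch below. The definition used here, the fraction of pixels whose foreground/background label differs from the ground truth, is an assumption about the form of Eq. (9); the mask layout (lists of rows with truthy values for foreground) is also hypothetical.

```python
def pixel_error_rate(detected, ground_truth):
    """Fraction of pixels labeled differently from the ground-truth mask.

    Both masks are lists of rows; truthy values mean foreground.
    Counts false positives and false negatives together.
    """
    total = 0
    errors = 0
    for det_row, gt_row in zip(detected, ground_truth):
        for det, gt in zip(det_row, gt_row):
            total += 1
            if bool(det) != bool(gt):
                errors += 1
    return errors / total
```

Because false positives and false negatives are counted together, a method cannot lower this rate simply by labeling everything as background in a scene that contains foreground objects.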
It is important to choose a proper number of background layers to trade off between the hardware memory and the number of ABR layers based on the scene requirements. Figures 3(c)–3(g) show the foreground detection results for the waving trees video images, based on our proposed HPB model with 1 to 5-layer ABRs. In Fig. 4, the error rates of the 3-layer, 4-layer, and 5-layer ABRs are small. However, the 3-layer ABR uses much less memory.
In Figs. 5–7, we demonstrate that the proposed approach exhibits better foreground detection than the other methods in three benchmarks. Furthermore, Figs. 8–10 demonstrate that the proposed method presents a lower error rate in the ground truth comparison. The average error rates of the six methods for the various sequences are listed in Table 1, which shows that the proposed approach has a lower average error rate than the other methods.
Table 1. Average error rate in various benchmarks.
| Video sequence | Codebook | MoG (Wu) | SACON | Chien | ViBe (N=20) | Proposed |
As shown in Fig. 11, we use the TI TMS320DM6446 DaVinci development kit as our development platform, which has a dual-core device including an ARM926EJ-S and a C64x+ DSP. The resources of an embedded platform are limited, so the implementation has to consider memory consumption. Table 2 lists the actual memory utilization of all six methods when applied to the different video sequences. It shows that the memory requirement of our proposed method is lower than that of the other approaches; thus, our approach can achieve real-time operation at 23 frames per second for the waving trees video. The proposed method is suitable for implementation on an embedded platform.
Table 2. Memory comparison of background models.
| Video sequence | Codebook | MoG (Wu) | SACON | Chien | ViBe (N=20) | Proposed |
| Waving tree (160×120) | 1638 KB | 1920 KB | 1459 KB | 1094 KB | 1152 KB | 1056 KB |
| Raining (320×240) | 5330 KB | 7680 KB | 5836 KB | 4377 KB | 4608 KB | 4224 KB |
| PETS 2001 (768×576) | 26.5 MB | 44.2 MB | 33.6 MB | 25.2 MB | 26.5 MB | 24.3 MB |
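The memory figures can be easier to compare as a per-pixel footprint. The short sketch below divides a reported total by the frame size; it assumes 1 KB = 1000 bytes, a convention the text does not state but under which the ViBe (N=20) figure works out to a whole 60 bytes per pixel. The function name is hypothetical.

```python
def bytes_per_pixel(memory_kb, width, height):
    """Per-pixel memory footprint, assuming 1 KB = 1000 bytes."""
    return memory_kb * 1000 / (width * height)


# Waving tree sequence (160x120), values taken from the memory comparison table.
waving_tree_kb = {"MoG (Wu)": 1920, "ViBe (N=20)": 1152, "Proposed": 1056}
footprints = {name: bytes_per_pixel(kb, 160, 120)
              for name, kb in waving_tree_kb.items()}
```

Under this assumption the proposed model uses 55 bytes per pixel on the waving tree sequence, against 60 for ViBe and 100 for MoG, and the same 55 bytes per pixel results for the 320×240 raining sequence.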
An efficient and precise foreground object detection method was proposed in this paper. The proposed method applies a stable background record and multi-layer astable background records to construct a correct background model. Although using more layers records more background information and improves precision, it also requires more memory and more computation. Thus, it is important to choose a proper number of background layers to trade off between the memory load and the number of dynamic background layers required by the scene. To save memory space and calculation time, the 3-layer dynamic background model was used in our approach. According to our experimental results, the error rates of the 3-layer, 4-layer, and 5-layer HPB models are similar in many video benchmarks. The results demonstrate that the proposed method has a lower error rate against ground truth than the five other models tested. Furthermore, the proposed approach detects objects with higher precision than the other methods for various sequences. The final verification was done using a 2.66 GHz CPU in a complex background scene. In addition, the proposed method can achieve real-time operation for complex scenes on a DaVinci embedded platform.
We would like to thank Olivier Barnich and M. Van Droogenbroeck, who provided the C-like source code for their algorithm.
C. Stauffer and W. Grimson, “Adaptive background mixture models for real-time tracking,” in Proc. IEEE Conf. Computer Vision and Pattern Recogn., Vol. 2, pp. 246–252 (1999).
A. Cavallaro, O. Steiger, and T. Ebrahimi, “Tracking video objects in cluttered background,” IEEE Trans. Circuits Syst. Video Technol. 15(4), 575–584 (2005). http://dx.doi.org/10.1109/TCSVT.2005.844447
H. Wang and D. Suter, “A consensus-based method for tracking: modeling background scenario and foreground appearance,” Pattern Recogn. 40(3), 1091–1105 (2006). http://dx.doi.org/10.1016/j.patcog.2006.05.024
J. W. Hsieh et al., “Video-based human movement analysis and its application to surveillance systems,” IEEE Trans. Multimedia 10(3), 372–384 (2008). http://dx.doi.org/10.1109/TMM.2008.917403
C. F. Juang et al., “Computer vision-based human body segmentation and posture estimation,” IEEE Trans. Syst. Man Cybernet. A: Syst. Hum. 39(1), 119–133 (2009). http://dx.doi.org/10.1109/TSMCA.2009.2008397
W. Lao, J. Han, and P. H. N. de With, “Automatic video-based human motion analyzer for consumer surveillance system,” IEEE Trans. Consumer Electron. 55(22), 591–598 (2009). http://dx.doi.org/10.1109/TCE.2009.5174427
C. H. Chuang et al., “Carried object detection using ratio histogram and its application to suspicious event analysis,” IEEE Trans. Circuits Syst. Video Technol. 19(6), 911–916 (2009). http://dx.doi.org/10.1109/TCSVT.2009.2017415
S. C. Chen et al., “Learning-based spatio-temporal vehicle tracking and indexing for transportation multimedia database systems,” IEEE Trans. Intel. Transport. Syst. 4(3), 154–167 (2003). http://dx.doi.org/10.1109/TITS.2003.821290
J. W. Hsieh et al., “Automatic traffic surveillance system for vehicle tracking and classification,” IEEE Trans. Intel. Transport. Syst. 7(2), 175–187 (2006). http://dx.doi.org/10.1109/TITS.2006.874722
W. Zhang and Q. M. Jonathan Wu, “Moving vehicles detection based on adaptive motion histogram,” Digital Signal Process. 20(3), 793–805 (2010). http://dx.doi.org/10.1016/j.dsp.2009.10.006
N. Buch, J. Orwell, and S. A. Velastin, “Detection and classification of vehicles for urban traffic scenes,” in Int. Conf. Vis. Inform. Eng., pp. 182–187 (2008).
P. Spagnolo et al., “Moving object segmentation by background subtraction and temporal analysis,” Image Vis. Comput. 24(5), 411–423 (2006). http://dx.doi.org/10.1016/j.imavis.2006.01.001
J. C. Nascimento and J. S. Marques, “Performance evaluation of object detection algorithms for video surveillance,” IEEE Trans. Multimedia 8(4), 761–774 (2006). http://dx.doi.org/10.1109/TMM.2006.876287
P. M. Jodoin, M. Mignotte, and J. Konrad, “Statistical background subtraction using spatial cues,” IEEE Trans. Circuits Syst. Video Technol. 17(12), 1758–1763 (2007). http://dx.doi.org/10.1109/TCSVT.2007.906935
I. Haritaoglu, D. Harwood, and L. S. Davis, “W4: real-time surveillance of people and their activities,” IEEE Trans. Pattern Anal. Mach. Intel. 22(8), 809–830 (2000). http://dx.doi.org/10.1109/34.868683
C. Kim and J. N. Hwang, “Object-based video abstraction for video surveillance systems,” IEEE Trans. Circuits Syst. Video Technol. 12(12), 1128–1138 (2002). http://dx.doi.org/10.1109/TCSVT.2002.806813
S. Y. Chien, S. Y. Ma, and L. G. Chen, “Efficient moving object segmentation algorithm using background registration technique,” IEEE Trans. Circuits Syst. Video Technol. 12(7), 577–586 (2002). http://dx.doi.org/10.1109/TCSVT.2002.800516
C. Stauffer and W. Grimson, “Learning patterns of activity using real-time tracking,” IEEE Trans. Pattern Anal. Mach. Intel. 22(8), 747–757 (2000). http://dx.doi.org/10.1109/34.868677
D. S. Lee, “Effective Gaussian mixture learning for video background subtraction,” IEEE Trans. Pattern Anal. Mach. Intel. 27(5), 827–832 (2005). http://dx.doi.org/10.1109/TPAMI.2005.102
M. Heikkila and M. Pietikainen, “A texture-based method for modeling the background and detecting moving objects,” IEEE Trans. Pattern Anal. Mach. Intel. 28(4), 657–662 (2006). http://dx.doi.org/10.1109/TPAMI.2006.68
W. K. Chan et al., “Efficient content analysis engine for visual surveillance network,” IEEE Trans. Circuits Syst. Video Technol. 19(5), 693–703 (2009). http://dx.doi.org/10.1109/TCSVT.2009.2017408
C. C. Chiu, M. Y. Ku, and L. W. Liang, “A robust object segmentation system using a probability-based background extraction algorithm,” IEEE Trans. Circuits Syst. Video Technol. 20(4), 518–528 (2010). http://dx.doi.org/10.1109/TCSVT.2009.2035843
W. Wang, J. Yang, and W. Gao, “Modeling background and segmenting moving objects from compressed video,” IEEE Trans. Circuits Syst. Video Technol. 18(5), 670–681 (2008). http://dx.doi.org/10.1109/TCSVT.2008.918800
L. Maddalena and A. Petrosino, “A self-organizing approach to background subtraction for visual surveillance applications,” IEEE Trans. Image Process. 17(7), 1168–1177 (2008). http://dx.doi.org/10.1109/TIP.2008.924285
O. Barnich and M. Van Droogenbroeck, “ViBe: a universal background subtraction algorithm for video sequences,” IEEE Trans. Image Process. 20(6), 1709–1724 (2011). http://dx.doi.org/10.1109/TIP.2010.2101613
A. Lanza, S. Salti, and L. Di Stefano, “Background subtraction by non-parametric probabilistic clustering,” in IEEE Int. Conf. Adv. Video Signal-Based Surveil., IEEE Society, Klagenfurt, pp. 243–248 (2011).
PETS’2001 Dataset 3, http://www.cvg.cs.rdg.ac.uk/PETS2001/pets2001-dataset.html (last modified 6 July 2009).
Wen-Kai Tsai is a PhD candidate at Graduate School of Engineering Science and Technology, National Yunlin University of Science and Technology, Taiwan. He received BS and MS degrees in electronics engineering from National Yunlin University of Science and Technology, Taiwan, in 2004 and 2006, respectively. His research interests include digital signal processing and image processing.
Chung-Chi Lin received the MS degree in computer science from University of Houston, TX, USA, in 1983, and the PhD degree in engineering science and technology from National Yunlin University of Science and Technology, Taiwan, in 2009. Since 2009, he has been an Associate Professor with the Department of Computer Science, Tunghai University, Taiwan. His current research interests include image processing, digital signal processing, and system-on-chip design.
Ming-Hwa Sheu received the BS degree in electronics engineering from National Taiwan University of Science and Technology, Taipei, Taiwan, in 1986, and the MS and PhD degrees in Electrical Engineering from National Cheng Kung University, Tainan, Taiwan, in 1989 and 1993, respectively. From 1994 to 2003, he was an Associate Professor at National Yunlin University of Science and Technology, Touliu, Taiwan. Currently, he is a Professor in the Department of Electronics Engineering, National Yunlin University of Science and Technology. His research interests include CAD/VLSI, digital signal processing, algorithm analysis, and system-on-chip design.