## 1. Introduction

Video surveillance is often more important in dark environments, since many activities of interest occur there.^{1, 2} Video enhancement plays a key part in night-time video surveillance so that the objects or activities of interest can be clearly monitored. The problem of enhancing low-quality video has become increasingly acute.^{3} The goal of the enhancement^{4} is to improve the visual appearance of the video, or to provide a “better” representation for further processing, such as analysis, detection, segmentation, and recognition. However, it remains a challenging problem for night-time video applications. With traditional algorithms, when the background is enhanced, the contrast often remains low or the noise is greatly amplified. Since day-time videos are also available in many surveillance applications, there have been many attempts to exploit them by combining the day-time and night-time scenes to enhance the night-time videos.^{5, 6, 7} However, because the moving objects in the day-time and night-time videos differ, and the background conditions may differ as well, it is not easy to produce good results with fusion.

To further improve the enhancement quality, it is desirable to give larger weight to the moving objects. However, accurate extraction of the moving foreground is difficult, especially in low-contrast, noisy videos. Reference 8 used a multi-color background model per pixel to enhance the foreground moving objects. However, the method suffers from slow learning at the beginning, especially with busy backgrounds. It also cannot distinguish moving shadows from moving objects. In addition, the model does not update over time and therefore often fails in outdoor environments where the scene lighting changes frequently. Reference 9 presented a method that improves this adaptive background mixture model.

In this paper, a new method is proposed to fuse high-quality day-time and night-time background images with low-quality night-time video frames. With the proposed algorithm, day-time and night-time images are combined to provide a much better enhanced background. To enhance the moving objects as well, we also propose a moving-object region fusion method that improves the sharpness of the moving objects.

## 2. The Proposed Algorithms

The detailed procedure of the proposed method is shown in Fig. 1. Note that in Fig. 1, the ‘day-time image’ and the ‘night-time image’ are high-quality images used to provide a better enhanced background, while the night-time video is the actual low-quality input that we need to enhance. In the following, we describe each step of the algorithm.

### 2.1. Illumination Segmentation

We first decouple an input color image *f*(*x*, *y*) into an intensity layer *I*(*x*, *y*) and a color layer *C*(*x*, *y*); our algorithm operates mainly on the intensity layer *I*(*x*, *y*). The intensity layer of the day-time and night-time background images is then separated into an illumination layer *L*(*x*, *y*) and a reflectance layer *R*(*x*, *y*) according to Retinex theory.^{10} It is assumed that the available luminance data in the image is the product of illumination and reflectance, so the input image *I*(*x*, *y*) is represented as follows:

$$I(x,y) = R(x,y) \times L(x,y) \tag{1}$$

where *L*(*x*, *y*) is assumed to be contained in the low frequency components of the image while the reflectance *R*(*x*, *y*) mainly represents the high frequency components of the image. The Gaussian low-pass filtered result of the intensity image is used as the estimation of the illumination. The filtering process is actually a 2D discrete convolution with a Gaussian kernel, which can be mathematically expressed as^{11}:

$$L(x,y) = \sum_{m=0}^{M-1} \sum_{n=0}^{N-1} I(m,n)\, G(m+x, n+y) \tag{2}$$

where *G* is the 2D Gaussian function with size *M* × *N*. The Gaussian kernel *G* is defined as:

$$G(x,y) = q \cdot \exp\left( \frac{-(x^2 + y^2)}{c^2} \right) \tag{3}$$

where *q* is a normalization factor such that

$$\sum_{x} \sum_{y} q \cdot \exp\left( \frac{-(x^2 + y^2)}{c^2} \right) = 1 \tag{4}$$

and *c* is a scale constant (*c* = 2 ∼ 5 is commonly used). Here, *c* is set to 3.
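The illumination estimate of Eqs. 1 to 4 can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the authors' implementation: the function names, the 15 × 15 kernel size, the edge-replicating border handling, and the small `eps` guard against division by zero are all assumptions that the text does not specify. Only *c* = 3 comes from the paper.

```python
import numpy as np

def gaussian_kernel(size=15, c=3.0):
    """Kernel G(x, y) = q * exp(-(x^2 + y^2) / c^2), normalized to sum to 1 (Eqs. 3-4).

    `size` must be odd; 15 is an assumed value, c = 3 follows the text.
    """
    half = size // 2
    ax = np.arange(-half, half + 1)
    xx, yy = np.meshgrid(ax, ax)
    g = np.exp(-(xx ** 2 + yy ** 2) / c ** 2)
    return g / g.sum()  # dividing by the sum plays the role of q

def estimate_illumination(intensity, size=15, c=3.0):
    """Gaussian low-pass filter of the intensity layer, used as L(x, y) (Eq. 2)."""
    g = gaussian_kernel(size, c)
    half = size // 2
    h, w = intensity.shape
    # Replicate border pixels so the output keeps the input size (an assumption).
    padded = np.pad(intensity.astype(np.float64), half, mode="edge")
    out = np.zeros((h, w), dtype=np.float64)
    # Accumulate one shifted, weighted copy of the image per kernel tap.
    for dy in range(size):
        for dx in range(size):
            out += g[dy, dx] * padded[dy:dy + h, dx:dx + w]
    return out

def reflectance(intensity, illumination, eps=1e-6):
    """Reflectance R = I / L, rearranged from Eq. 1; eps avoids division by zero."""
    return intensity / (illumination + eps)
```

Because the kernel is symmetric, the shifted-sum loop gives the same result as a true convolution. On a constant image the estimated illumination equals the image itself and the reflectance is approximately 1, which is a quick sanity check for the normalization in Eq. 4.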

### 2.2. Enhanced Background

We adopt a weighted-average image-fusion algorithm to enhance the night-time background using the illumination images *L*(*x*, *y*). The proposed fusion equation is as follows.

$$B_L(x,y) = \alpha N_L(x,y) + (1 - \alpha) D_L(x,y) \tag{5}$$

### 2.3. Enhanced (Night-time) Video

Because of the low contrast, we cannot clearly extract moving objects from the dark background. We therefore add a video-enhancement step to facilitate the extraction of moving objects. A tone-mapping approach is used to enhance the video frames and to separate an image into details and large-scale features. More specifically, a nonlinear tone-mapping function is used to attenuate image details and to adjust the contrast of large-scale features,^{12} as in Eqn. 6.

$$m(x,\psi) = \frac{\log\left( \frac{x}{xMax}(\psi - 1) + 1 \right)}{\log(\psi)} \tag{6}$$

where *xMax* and ψ control the attenuation profile. This mapping function exhibits characteristics similar to those of traditional gamma correction.
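As a rough illustration of Eq. 6, the mapping can be written as a scalar function. The defaults `x_max = 255` and `psi = 50` are assumed values chosen only for the example; the text does not state the settings actually used.

```python
import math

def tone_map(x, x_max=255.0, psi=50.0):
    """Eq. 6: m(x, psi) = log((x / x_max) * (psi - 1) + 1) / log(psi).

    x_max and psi are illustrative assumptions, not values from the paper.
    """
    if psi <= 1.0:
        raise ValueError("psi must be greater than 1")
    return math.log((x / x_max) * (psi - 1.0) + 1.0) / math.log(psi)
```

The function maps 0 to 0 and `x_max` to 1, and it lifts dark and mid-range values above the straight line between those endpoints, which is the gamma-like behavior noted above. A larger `psi` gives a stronger lift of the dark range.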

### 2.4. Motion Segmentation

After the night-time video has been enhanced, motion detection is performed to extract the foreground moving objects,^{9} as in Eqn. 7. Each pixel in the scene is modeled by a mixture of *K* Gaussian distributions. The probability that a certain pixel has a value of $X_N$ at time *N* can be written as

$$p(X_N) = \sum_{j=1}^{K} w_j \, \eta(X_N; \theta_j) \tag{7}$$

$$\eta(X; \theta_k) = \eta(X; \mu_k, \Sigma_k) = \frac{1}{(2\pi)^{D/2} \left| \Sigma_k \right|^{1/2}} \, e^{-\frac{1}{2} (X - \mu_k)^T \Sigma_k^{-1} (X - \mu_k)} \tag{8}$$

The *K* distributions are ordered based on the fitness value $w_k / \sigma_k$ and the first *B* distributions are used as a model of the background of the scene, where *B* is estimated as

$$B = \mathop{\arg\min}\limits_{b} \left( \sum_{j=1}^{b} w_j > T \right) \tag{9}$$

where *T* is the minimum fraction of the background model. Under this method, a pixel will be detected as a foreground pixel if it is more than 2.5 standard deviations away from any of the *B* distributions.
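The background selection of Eq. 9 and the 2.5-standard-deviation test can be sketched for a single grayscale pixel as follows. This is a minimal sketch under stated assumptions: the function names and the threshold `T = 0.7` are hypothetical, scalar means and standard deviations stand in for the general multivariate form of Eq. 8, and "away from any of the *B* distributions" is read as "matching none of them". The online update of the mixture parameters is not shown.

```python
def select_background(weights, sigmas, T=0.7):
    """Indices of the first B components, ordered by fitness w / sigma (Eq. 9).

    B is the smallest count whose cumulative weight exceeds T.
    T = 0.7 is an assumed value, not taken from the paper.
    """
    order = sorted(range(len(weights)),
                   key=lambda k: weights[k] / sigmas[k],
                   reverse=True)
    chosen, total = [], 0.0
    for k in order:
        chosen.append(k)
        total += weights[k]
        if total > T:
            break
    return chosen

def is_foreground(x, means, sigmas, background, n_sigma=2.5):
    """True when x lies more than n_sigma standard deviations
    from every selected background component."""
    return all(abs(x - means[k]) > n_sigma * sigmas[k] for k in background)
```

For example, with weights `[0.6, 0.3, 0.1]` and standard deviations `[5, 10, 20]`, the first two components are kept as background because their cumulative weight is the first to exceed 0.7. A pixel value close to one of their means is then classified as background, and a value far from both as foreground.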

### 2.5. Final Fusion and Enhancement

After obtaining the weighted-fusion background image and the motion-detection video frames, we perform the final video enhancement by combining illumination fusion with moving-object region fusion. The proposed combined illumination and region fusion equation is as follows:

$$F_L(x,y) = \beta M(x,y) + \gamma N_L(x,y) + (1 - \gamma) B_L(x,y) \tag{10}$$

where *M*(*x*, *y*) is the motion-detection video frame, $N_L(x,y)$ is the night-time illumination image and $B_L(x,y)$ is the enhanced background illumination as in Eqn. 5. β and γ are the weights for these input images. In our algorithm, β and γ are determined in the following way.

$$\begin{cases} \beta = 1 \text{ and } \gamma = 1 & \text{if } M(x,y) + N_L(x,y) + B_L(x,y) \ge 1 \\ \beta = 0 \text{ and } \gamma = K & \text{if } M(x,y) + N_L(x,y) + B_L(x,y) < 1 \end{cases} \tag{11}$$

where *K* = 0.4. If $M(x,y) + N_L(x,y) + B_L(x,y) \ge 1$, it is assumed that there are moving objects at the current pixel, and in this case both β and γ are set to 1. Furthermore, to keep $F_L(x,y)$ within the illumination range [0, 255], any pixel whose value exceeds 255 is set to 255.
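A compact NumPy sketch of the two fusion steps (Eq. 5, then Eqs. 10 and 11) is given below. It is an illustration under assumptions rather than the authors' code: the function names are hypothetical, `alpha = 0.5` is an arbitrary example value, $D_L$ is assumed to denote the day-time illumination layer, and the threshold of 1 is applied literally to whatever value range the inputs use, since the text does not state their scaling. Only *K* = 0.4 and the cap at 255 come from the paper.

```python
import numpy as np

def fuse_background(n_l, d_l, alpha=0.5):
    """Eq. 5: weighted average of night-time and (assumed) day-time illumination.

    alpha = 0.5 is an assumed example value.
    """
    return alpha * n_l + (1.0 - alpha) * d_l

def final_fusion(m, n_l, b_l, k=0.4):
    """Eqs. 10-11: per-pixel weights beta and gamma, then a cap at 255."""
    moving = (m + n_l + b_l) >= 1          # condition of Eq. 11
    beta = np.where(moving, 1.0, 0.0)
    gamma = np.where(moving, 1.0, k)
    f_l = beta * m + gamma * n_l + (1.0 - gamma) * b_l   # Eq. 10
    return np.minimum(f_l, 255.0)          # values above 255 are set to 255
```

When the condition holds, γ = 1 removes the background term and the output reduces to $M + N_L$. Otherwise the motion term is dropped and the pixel is a 0.4 / 0.6 blend of the night-time illumination and the enhanced background.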

## 3. Final Experimental Results and Conclusions

In this paper, we propose a novel algorithm to enhance night-time video. We focus on two key issues for video enhancement: (1) illumination-based fusion for enhancing the background image, and (2) moving-object region fusion for improving the sharpness of the moving objects. Figures 2 and 3 show the experimental results. The results demonstrate that the proposed algorithm uses the color resources (i.e., color levels) more efficiently and that it is robust and effective.

## Acknowledgments

This work is partly supported by the National High-Tech Program 863 of China (Grant Nos. 2007AA010407 and 2009GZ0017), the National Research Program of China (Grant No. 9140A06060208DZ0207), the National Science Foundation of China (61001146), and the China Scholarships Council.