Multisensor image fusion is the process of combining two or more images of a scene to create a single image that is more informative than any of the input images.1 Image-fusion technology is employed in numerous applications including visual interpretation, image drawing, geographical information gathering, and military target reconnaissance and surveillance. In particular, research into techniques for image fusion by contrast reversal in local image regions has important theoretical and practical significance.1
Image-fusion methods are classified as spatial- or transform-domain techniques. Spatial-domain methods are simple, but generally result in images with insufficient detail. Transform-domain strategies based on image-fusion arithmetic and wavelet transformations (WTs) represent the current state of the art. Wavelets can be used to resolve an original image into a series of subimages with different spatial resolutions and frequency-domain characteristics. This representation fully reflects local variations in the original image. In addition, WTs can effect multiresolution analysis,2,3 perfect reconstruction, and orthogonality.4 Image-fusion arithmetic based on WT coefficients can flexibly resolve multidimensional low-frequency and high-frequency image components. Wavelet transforms can also realize multisensor image fusion using rules that emphasize critical features of the scene.5,6
Traditional convolution-based WT methods for multiresolution analysis have been widely applied to image fusion for images with a large number of pixels, but the memory and the computational requirements for these techniques, and their Fourier-domain equivalents, can be substantial. Attempts to create more efficient algorithms in the transform domain have employed the lifting wavelet transform (LWT).7–9 Also known as the second-generation WT,10 the LWT is not dependent upon the Fourier transform. Rather, all operations are carried out in the spatial domain. Image reconstruction is achieved by simply adjusting the calculation and sign orders in the decomposition process,11 thereby reducing two-dimensional image data computation by half, and the data storage to about 75%.
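The lifting idea can be sketched with a minimal one-level, one-dimensional example (a Haar-like split/predict/update scheme in Python; the function names are illustrative, not from the paper). As described above, the inverse is obtained simply by reversing the order and signs of the lifting steps:

```python
# Sketch of a one-level 1-D lifting wavelet transform (Haar-like).
# All operations take place in the spatial domain; no Fourier transform
# is involved. Function names are illustrative.

def lwt_forward(x):
    """Split/predict/update lifting steps for an even-length signal."""
    s = x[0::2]                                   # split: even samples
    d = x[1::2]                                   # split: odd samples
    d = [di - si for si, di in zip(s, d)]         # predict: detail = odd - even
    s = [si + di / 2 for si, di in zip(s, d)]     # update: approximation
    return s, d

def lwt_inverse(s, d):
    """Reconstruct by reversing the order and signs of the lifting steps."""
    s = [si - di / 2 for si, di in zip(s, d)]     # undo update
    d = [di + si for si, di in zip(s, d)]         # undo predict
    x = [0.0] * (2 * len(s))
    x[0::2], x[1::2] = s, d                       # merge even/odd samples
    return x
```

Perfect reconstruction holds exactly: `lwt_inverse(*lwt_forward(x))` returns `x` for any even-length input.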
One important motivation for the use of WTs in image processing is their ability to segregate low-frequency content that is critical for interpretation. Traditional image-fusion methods are based on selecting these significant wavelet decomposition coefficients.12–14 Even with the effective separation and processing of low-frequency components afforded by WT decomposition, such an approach fails to take into full account the relationships among multiple input images. The result can be adverse fusion effects. Significant information can be lost when local area variance corresponding to pixels across images is small.8,9
Other algorithms use principal component analysis (PCA) to estimate the wavelet coefficients. This method works well in low-noise environments, but PCA breaks down when corruption is severe, even if only very few of the observations are affected.15 For example, consider the two PCA simulation results shown in Fig. 1. Suppose that the light line in Fig. 1(a) represents an object in an image, and that the point markers represent samples of that object that have been corrupted by low-level Gaussian noise. The reconstruction of the object from the samples using the classical PCA approach is shown as a heavy line. The results of a similar experiment are shown in Fig. 1(b), where the PCA reconstruction is seriously in error as the result of a single noise outlier in the sampling process.
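The sensitivity of classical PCA to a single outlier can be reproduced in a few lines. The sketch below (pure Python; the data and function names are illustrative, not the paper's experiment) uses the closed-form leading eigenvector of a 2-D covariance matrix to estimate the principal direction of nearly collinear points, with and without one gross outlier:

```python
import math

def principal_angle(pts):
    """Angle (radians) of the leading principal component of 2-D points,
    via the closed form for the eigenvector of the 2x2 covariance matrix."""
    n = len(pts)
    mx = sum(p[0] for p in pts) / n
    my = sum(p[1] for p in pts) / n
    a = sum((p[0] - mx) ** 2 for p in pts) / n          # var(x)
    c = sum((p[1] - my) ** 2 for p in pts) / n          # var(y)
    b = sum((p[0] - mx) * (p[1] - my) for p in pts) / n  # cov(x, y)
    return 0.5 * math.atan2(2 * b, a - c)

# Points along the line y = x with low-level noise (principal angle ~45 deg),
# then the same points plus a single gross outlier.
clean = [(t, t + 0.01 * ((-1) ** i)) for i, t in enumerate(range(10))]
corrupt = clean + [(9.0, -40.0)]
```

On the clean data the recovered direction is within a fraction of a degree of the true 45-degree line; the single outlier drags it far away, mirroring the behavior illustrated in Fig. 1(b).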
To remedy shortcomings in the current methods, this paper presents an improved image-fusion algorithm based on the LWT. For low-frequency image components represented in the LWT decomposition, scale coefficients are determined through matrix completion16 instead of PCA. For the high-frequency detail and edge information, the LWT coefficients are chosen through self-adaptive regional variance estimation.
Matrix Completion and Robust Principal Component Analysis
The matrix completion problem has been the subject of intense research in recent years. Candès et al.17 verify that the $\ell_1$-norm optimization problem is equivalent to $\ell_0$-norm optimization under a restricted isometry property. Candès and Recht16 demonstrate exact matrix completion using convex optimization. Minimizing the "nuclear norm" of the matrix, $\|A\|_* = \sum_i \sigma_i(A)$ (the sum of its singular values), they prove that if the number, $m$, of sampled entries obeys $m \ge C\,n^{1.2}\,r\log n$ for a positive constant $C$, then most $n \times n$ matrices of rank $r$ are recovered exactly with high probability.16
Lin and Ma15 report a fast, scalable algorithm for solving the robust PCA (RPCA) problem. The method is based on recovering a low-rank matrix with an unknown fraction of corrupted entries. The mathematical model for estimating the low-dimensional subspace is to find a low-rank matrix: given a matrix $D \in \mathbb{R}^{m \times n}$, the rank $r$ of the sought matrix $A$ is the target dimension of the subspace. The observation matrix $D$ is modeled as $D = A + E$, where $A$ is the low-rank matrix to be recovered and $E$ is an error matrix.
The objective of matrix completion is to recover, in the low-dimensional subspace, the truly low-rank matrix $A$ from $D$, under the working assumption that $E$ is zero. That is, we seek16 $\min_A \operatorname{rank}(A)$ subject to $A_{ij} = D_{ij}$ for the observed entries $(i,j)$. Further, the recovery is robust to noise with small magnitude bounds; that is, when the elements of $E$ are small and bounded. For example, if $E$ is a white noise matrix with standard deviation $\sigma$ and Frobenius norm $\|E\|_F \le \delta$, then the recovered $A$ will be in a small neighborhood of the true matrix with high probability when $\delta$ is sufficiently small.18
Robust Principal Component Analysis
Conventional PCA is often used to estimate a low-dimensional subspace via a constrained optimization problem: in the observation model of Eq. (5), minimize the difference between the matrices $D$ and $A$ by solving $\min_{A} \|D - A\|$ subject to $\operatorname{rank}(A) \le r$, where $\|\cdot\|$ denotes the spectral norm.
RPCA employs an identity observation operator and a sparse error matrix $E$, which differ from those in the matrix completion and PCA approaches. Wright et al.19 and Candès et al.20 have shown that, for a sufficiently sparse error matrix, a low-rank matrix can be recovered exactly from the observation matrix by solving the following convex optimization problem:21,22 $\min_{A,E} \|A\|_* + \lambda\|E\|_1$ subject to $D = A + E$, where $\lambda > 0$ balances the two terms.
In the present paper, RPCA is coupled with the “inexact augmented Lagrange multiplier” (IALM)15 method to determine the low-frequency LWT coefficients for fusion of corrupted images. The IALM method is described in Sec. 3.2 after introducing the general procedure.
Frequency-Domain Fusion Rules
By adopting separate fusion strategies for high- and low-frequency components, the WT can differentially preserve the critical features that accompany these separate bands. The procedure that exploits this property is shown in Fig. 2. The source images are converted to frequency-domain coefficients by the LWT. Frequency-band-dependent fusion rules are applied to the low- and high-frequency components of each image. The inverse lifting wavelet transform (ILWT) is used to reconstruct the fused image.
Low-Frequency Fusion Based on Inexact Augmented Lagrange Multiplier
Weighted average coefficients are often employed to fuse low-frequency wavelet coefficients. This method is effective when the coefficients of the fused images are similar. However, when contrast reversal occurs in local regions of an image, this procedure results in a loss of image detail in the fused image due to reduced contrast. Further, erroneous or missing regions of corrupted images strongly affect PCA results. These inadequacies of the weighted average method and PCA provide the motivation for using RPCA to determine the weighting of low-frequency coefficients.
There is ordinarily little difference in the low-frequency coefficient values extracted by the LWT from different images of the same scene. RPCA coefficients are used to represent low-frequency content in an attempt to preserve fidelity and coherency between the subbands. Algorithms have been developed in this research to solve the RPCA problem that is the basis for the recovery of the low-rank matrix $A$ and the estimation of the sparse matrix $E$ from the observation matrix $D$. We employ the IALM method to compute the low-frequency subband coefficients. The method is sketched as follows.
Let $\{I_1, \ldots, I_K\}$ denote a set of corrupted images from $K$ sensors, and let $\{L_1, \ldots, L_K\}$ be the corresponding set of low-frequency subimages computed using the LWT over $l$ layers. For simplicity, we assume square $n \times n$ subimages. Stack all columns of each $L_k$ into a single vector of dimension $n^2$, then use these vectors as the columns of an observation matrix $D$. After normalizing the data, we denote by $d_{ij}$ the $(i,j)$'th element of $D$.
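The stacking step can be sketched as follows (a hypothetical `stack_columns` helper, assuming column-major unwrapping of each subimage; normalization is omitted):

```python
def stack_columns(subimages):
    """Build the observation matrix D: each n-by-n low-frequency subimage
    (a list of rows) is unwrapped column-major into one column of D, so
    that column k of D corresponds to sensor k."""
    rows = len(subimages[0])
    cols = len(subimages[0][0])
    D = [[0.0] * len(subimages) for _ in range(rows * cols)]
    for k, img in enumerate(subimages):
        idx = 0
        for c in range(cols):          # column-major unwrap
            for r in range(rows):
                D[idx][k] = img[r][c]
                idx += 1
    return D
```

For two 2x2 subimages, `stack_columns` produces a 4x2 observation matrix whose columns are the unwrapped subimages.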
A flowchart of the IALM algorithm is shown in Fig. 3. Definitions of the notation used in the flowchart appear in Table 1. The algorithm is recursive, with superscript $k$ indicating the iteration number. The quantity $A^{(k)}$ is the recovered low-rank matrix for a sufficiently large iteration number $k$. A reasonable strategy for transforming the resulting $A^{(k)}$ to the final low-frequency subimage is to unwrap its first column to form the original image structure, yielding the final fused low-frequency subimage.
Notation used in the IALM algorithm.
|$D$|Low-frequency subimage observation matrix|
|$E^{(k)}$|Error (sparse) matrix, iteration $k$|
|$A^{(k)}$|Recovered low-rank subimage matrix, iteration $k$|
|$Y^{(k)}$|Lagrange multiplier matrix, iteration $k$|
|$\varepsilon$|Mean-squared-error tolerance bound|
|$\operatorname{svd}(\cdot)$|Singular value decomposition (SVD) of a general matrix|
|$U$ and $V$|Customary notation for orthogonal matrices of SVD|
|$\Sigma$|Customary notation for diagonal matrix of singular values|
|$S_{\varepsilon}[\cdot]$|Soft-shrinkage operator applied to a scalar15|
In this process, $Y^{(0)}$ is initialized from the normalized observation matrix $D$; $E^{(0)}$ is initialized to a zero matrix of the same size as $D$; $\lambda$ is initialized to $1/\sqrt{n}$, where $n$ is the column size of $D$; the tolerance $\varepsilon$ for the stopping criterion is initialized to a small positive value; and the iteration counter $k$ is set to zero for the loop computation.
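The sparse-matrix update inside IALM relies on the soft-shrinkage operator listed in Table 1. A minimal scalar sketch (illustrative Python; the full IALM iteration additionally requires an SVD for the low-rank update, which is omitted here):

```python
def shrink(x, eps):
    """Soft-shrinkage operator S_eps[x] = sign(x) * max(|x| - eps, 0).
    Values within [-eps, eps] are driven to zero, which is what makes
    the recovered error matrix E sparse."""
    if x > eps:
        return x - eps
    if x < -eps:
        return x + eps
    return 0.0
```

Applied elementwise to the residual $D - A^{(k)} + Y^{(k)}/\mu$, this operator produces the updated sparse matrix $E^{(k+1)}$ in each IALM iteration.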
High-Frequency Fusion Based on Self-Adapting Regional Variance Estimation
Processing of high-frequency wavelet coefficients has a direct effect on salient details which affect the overall clarity of the image. As the variance of a subimage characterizes the degree of gray-level change in a corresponding image region, the variance is a key indicator in processing of high-frequency components. In addition, there is generally a strong correlation among adjacent pixels in a local area, so that there is a significant amount of shared information among neighboring pixels. When variances in corresponding local regions across subimages vary widely, a high-frequency fusion rule that selects the source image of greatest variance has been shown to be effective at preserving image features.8,9 However, if the local variances of two source images are similar, this method can result in the loss of information by discarding subtle variations among different subimages. An empirical procedure has been developed in which thresholding is used to segregate local areas that have sufficiently large variance differences. This allows the entire set to be represented by the single maximum-variance set member. The selection of this difference threshold, $T$, is discussed below.
Let us return to the original set of images $\{I_1, \ldots, I_K\}$. Denote by $g_k(i,j)$ the gray-scale value at pixel $(i,j)$ in the $k$'th image. Also let $V_k$ denote a matrix associated with image $I_k$ in which matrix element $V_k(i,j)$ contains the normalized sample variance of the window of pixels centered on pixel $(i,j)$. The normalized sample variance means that all variance values are in the interval [0,1]. Without loss of generality, we select images $I_1$ and $I_2$ with which to describe the steps of the high-frequency fusion algorithm:
1. Compute the normalized sample variance matrices $V_1$ and $V_2$. Then $V_k(i,j)$ denotes the normalized variance value of pixel $(i,j)$ in image $I_k$ for $k = 1$, 2.
2. Implement the LWT over $l$ layers against $I_1$, $I_2$, $V_1$, and $V_2$. Multiresolution structures for each matrix are obtained, in which a superscript takes one of three designators of direction—horizontal ($H$), vertical ($V$), or diagonal ($D$)—associated with each structure matrix.
Let $\Delta(i,j)$ denote the sum of the differences in the horizontal, vertical, and diagonal directions.
3. Compare the threshold value $T$ and $\Delta(i,j)$. If $\Delta(i,j) \ge T$, take the coefficient of the pixel with the larger variance as the wavelet coefficient after fusion; otherwise, use a weighted sum of the two coefficients to compute the wavelet coefficient of the multiresolution structure after fusion.
In this study, the value of $T$ is set to 0.8. This means that when the normalized variance of a pixel in one image is much greater than in the other, the source image of greater variance is selected. Otherwise, the coefficient is obtained by averaging as in Eq. (13). This fusion rule for high-frequency subimages not only results in the retention of details, but it also prevents the loss of image information caused by redundant data. It ensures the consistency of the fused image.
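The per-coefficient rule above can be sketched as follows. The paper's exact weighted-sum formula, Eq. (13), is not reproduced here, so the averaging branch below assumes variance-proportional weights as an illustration; `fuse_high` is a hypothetical helper name:

```python
def fuse_high(c1, c2, v1, v2, T=0.8):
    """Fuse two high-frequency coefficients c1, c2 given the normalized
    local variances v1, v2 in [0, 1]: choose-max when the variance
    difference exceeds the threshold T, else a weighted sum
    (variance-proportional weights assumed for illustration)."""
    if abs(v1 - v2) >= T:
        return c1 if v1 > v2 else c2       # clearly dominant region wins
    if v1 + v2 == 0:
        return 0.5 * (c1 + c2)             # degenerate flat region: average
    w = v1 / (v1 + v2)                     # weight toward higher variance
    return w * c1 + (1 - w) * c2
```

When one region's normalized variance dominates (difference at least 0.8), its coefficient is taken outright; otherwise both subimages contribute, preserving subtle variations.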
In summary, IALM is used to determine the low-frequency component to be fused, and self-adapting regional variance is employed to estimate the high-frequency contribution. The fused wavelet coefficients are combined by ILWT to create the final result.
Experimental Results and Analysis
Comparison of Robust Principal Component Analysis Algorithms
To validate the new procedure, four groups of experiments are reported. The objective of the first is to compare the performance of RPCA algorithms with that of IALM. The results are shown in Table 2. Two mainstream algorithms—singular value thresholding (SVT) and the accelerated proximal gradient (APG) method—are compared with IALM.
Comparison of RPCA algorithms.
In this table, the input dataset is the observation matrix $D$ of Eq. (6). It has some random missing or broken pixels. For fair comparison, we fix $r$, the rank of $A$, and define the normalized mean squared error (NMSE) as $\mathrm{NMSE} = \|\hat{A} - A\|_F^2 / \|A\|_F^2$, where $\hat{A}$ is the recovered matrix.
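A minimal sketch of the NMSE computation (pure Python over matrices stored as lists of rows; this assumes the common Frobenius-norm normalization, as the paper's displayed formula is not reproduced here):

```python
def nmse(A_hat, A):
    """Normalized mean squared error between a recovered matrix A_hat
    and a reference matrix A: ||A_hat - A||_F^2 / ||A||_F^2."""
    num = sum((a - b) ** 2
              for row_hat, row in zip(A_hat, A)
              for a, b in zip(row_hat, row))
    den = sum(a ** 2 for row in A for a in row)
    return num / den
```

A perfect recovery gives NMSE of zero; values near one indicate the recovery error is as large as the signal itself.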
In Table 2, the column labeled #SVD indicates the number of iterations. The "times" column displays the number of seconds to run the algorithm. The oversampling rate is six, meaning the number of sampled entries is six times $d_r$, where $d_r = r(2n - r)$ indicates the number of degrees of freedom in the set of rank-$r$ matrices. That many elements from $A$ are then sampled uniformly to form the known samples in $D$.16
Among the three algorithms, IALM exhibits superior performance in all three measures. The results indicate that run time increases proportionately with the problem size. Note, however, that #SVD is not dependent upon it.
Fusion of Clean Images
For convenience, we will refer to the new algorithm as the proposed algorithm. The next two groups of experiments involve processing of left-focus–right-focus images and visible-light–infrared-light images, comparing different image-fusion algorithms with the proposed algorithm. The source images are not corrupted by noise or errors. The spline wavelet basis23 was selected for the LWT process. Through factorization, the equivalent lifting wavelet was obtained. The experimental results are shown in Figs. 4 and 5.
The first group of source images involves those with eccentric focus; the second contains visible-light and infrared images. Figure 4(a) shows a left-focused source image, whereas Fig. 4(b) is right-focused; Fig. 5(a) is a visible-light source image, while Fig. 5(b) uses an infrared source. Figures 4(c)–4(f) and 5(c)–5(f) show, respectively, the fusion results of the weighted average over low frequencies with the absolute-value maximum method over high frequencies (WA_AM); the weighted average over low frequencies with the local-area maximum method over high frequencies (WA_LM); the improved pulse-coupled neural networks (PCNN) method;24,25 PCA weighting over low frequencies with the self-adaptive regional variance estimation method over high frequencies; and the algorithm developed in this paper.
The processed images empirically suggest that a clearer fused image is obtained through the proposed algorithm. More detailed information is evident, e.g., in Figs. 4(e) and 4(f), in which the image information on the left edge of the large alarm clock is apparently richer than the same feature in the other three fused images. This also suggests that the proposed algorithm is at least as effective as the PCA-based variant, while retaining more detailed information (Table 2). Furthermore, the new algorithm achieves a fusion result with finer detail. For example, the barbed wire in Fig. 5(d) is more clearly visible than the same feature in 5(c). In Fig. 5, the person in 5(c) is better defined than in 5(d), while in 5(e) and 5(f), both the barbed wire and the person, and even the smoke in the upper-right corner of the image, are easier to identify than in the others. This enhanced clarity admits more effective subsequent processing.
The following objective criteria were evaluated:
1. The “mutual information” (MI) is a measure of statistical dependence that can be interpreted as the amount of information transmitted from the source images to the fused image.26 To assess the MI between a source image and the fused image, we use the estimator $\mathrm{MI} = \sum_{a,f} p_{AF}(a,f)\,\log_2 \frac{p_{AF}(a,f)}{p_A(a)\,p_F(f)}$, where $p_{AF}$ is the joint gray-level histogram of the two images and $p_A$, $p_F$ are the marginal histograms.
2. The “average gradient” (AG), or “clarity,” reflects the preservation of gray-level changes in the image; larger values of AG imply greater clarity and edge preservation. Gray-level differentials are important, e.g., in texture rendering. For an $M \times N$ image $f$, the AG is defined as $\mathrm{AG} = \frac{1}{(M-1)(N-1)} \sum_{i,j} \sqrt{\tfrac{1}{2}\left[(\Delta_x f)^2 + (\Delta_y f)^2\right]}$, where $\Delta_x f$ and $\Delta_y f$ are the gray-level differences in the row and column directions.
3. The “correlation coefficient” (CC) is used to compare two images of the same object (or scene). CC, which measures the correlation (degree of linear coherence) between the original image $g$ and the fused image $f$, is defined as $\mathrm{CC} = \frac{\sum_{i,j}(f_{ij}-\bar{f})(g_{ij}-\bar{g})}{\sqrt{\sum_{i,j}(f_{ij}-\bar{f})^2 \sum_{i,j}(g_{ij}-\bar{g})^2}}$.
4. The “degree of distortion” (DD), a direct indicator of image fidelity, is defined as $\mathrm{DD} = \frac{1}{MN}\sum_{i,j} |f_{ij} - g_{ij}|$.
5. An edge-transfer metric quantifies the amount of edge information transferred from the two source images to the fused image.26 It is calculated from the edge strength and orientation preservation values at each pixel.
6. The “peak signal-to-noise ratio” (PSNR) is an expression for the ratio between the maximum possible power of a signal and the power of distorting noise that affects the quality of its representation. This objective metric is used to compare the effectiveness of algorithms by measuring the proximity of the fused image and the original image. The PSNR is computed as $\mathrm{PSNR} = 10\log_{10}\!\left(\frac{255^2}{\mathrm{MSE}}\right)$, where MSE is the mean squared error between the two images.
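The PSNR computation for 8-bit gray-scale images can be sketched as follows (pure Python over images stored as lists of rows; `psnr` is an illustrative helper, with the standard peak value of 255 assumed):

```python
import math

def psnr(img_a, img_b, peak=255.0):
    """Peak signal-to-noise ratio (dB) between two equal-size gray images.
    Identical images yield infinity; larger values mean closer images."""
    n = 0
    se = 0.0
    for row_a, row_b in zip(img_a, img_b):
        for a, b in zip(row_a, row_b):
            se += (a - b) ** 2
            n += 1
    mse = se / n
    if mse == 0:
        return float("inf")
    return 10 * math.log10(peak ** 2 / mse)
```
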
Experimental objective evaluation measures of Fig. 4.
Evaluation comparison of Fig. 5.
Relative to the other algorithms, the proposed algorithm obtains the largest MI and AG for the fused images, suggesting that it can provide fused images with higher information content and better clarity. The objective indicators of fidelity to the source image also favor the performance of the IALM and self-adaptive regional variance estimation algorithm.
Fusion of Corrupted Images
To assess whether the proposed algorithm is robust to missing data and image corruption, we return to the clean multifocus clock images. At a 0.15 error rate, 15% of the pixels of the original image are corrupted, and an additional 15% are missing (gray-level values set to zero). This implies an effective data corruption rate of 30%. The results of the test of the four algorithms are shown in Fig. 6. Figures 6(a) and 6(b) show, respectively, Fig. 4(a) with errors and Fig. 4(b) with errors. Figure 6(c) shows the result of fusing the corrupted images without a denoising filter, while Fig. 6(d) shows the corresponding result with an adaptive median filter. The result of using PCNN with an adaptive median filter appears in Fig. 6(e). To achieve this outcome, we use the adaptive median filtering strategy proposed by Chen and Wu27 to identify pixels corrupted by impulsive noise and replace each damaged pixel by the median of its neighborhood. The adaptive median filter can employ varying window sizes to accommodate different noise conditions and to reduce distortions like excessive thinning or thickening of object boundaries. Figure 6(f) shows results using the proposed algorithm without denoising. The clarity of result 6(f) relative to those in 6(c), 6(d), and 6(e) is quite apparent. The empirical image quality tracks the improvement in PSNR as reported in the captions. Figures 6(g) and 6(h) show 400% blow-ups of portions of 6(e) and 6(f).
These results demonstrate the ability of the proposed algorithm to recover the missing or erroneous data, while preserving image detail in both corrupted and clean images.
Traditional convolution-based wavelet transform processing for image fusion has shortcomings, including large memory requirements and high computational complexity. The approach to fusion taken in this research uses different fusion rules for low-frequency and high-frequency decomposition components represented on a lifting-wavelet basis set. Low-frequency components are characterized by the matrix completion and RPCA method IALM, whereas the high-frequency components critical for image details are represented by taking into account the variance differences among proximal neighborhoods. Furthermore, strong correlation between pixels in a local area is captured by a self-adaptive regional variance assessment.
Experimental results show that the new algorithm not only improves the amount of information and the correlation between the fused and source images, but also reduces the level of distortion. Significant clarity improvement relative to state-of-the-art methods is also demonstrated for corrupted images.
This research was supported in part by the National Natural Science Foundation of China (Grant No. 30970780) and by the General Program of Science and Technology Development Project of Beijing Municipal Education Commission of China (Grant No. KM201110005033). The efforts of J.D. and D.B. were supported in part by the U.S. National Science Foundation under Cooperative Agreement DBI-0939454. Any opinions, conclusions, or recommendations expressed are those of the authors and do not necessarily reflect the views of the NSF. This work was undertaken in part while Z.W. was a visiting research scholar at Michigan State University. The authors thank the Beijing University of Technology’s Multimedia Information Processing Lab for assistance.
Zhuozheng Wang is an associate professor at Beijing University of Technology and a visiting scholar at Michigan State University sponsored by the China Scholarship Council. He received his MS and PhD degrees in electronic engineering from Beijing University of Technology in 2005 and 2013. He is the first author of more than 10 academic papers and has written one book chapter. His current research interests include image processing, electroencephalography, and virtual reality technology. He has been a reviewer and is a member of SPIE.
J. R. Deller Jr. is an IEEE fellow and professor of electrical and computer engineering at Michigan State University, where he received the distinguished faculty award in 2004. He received a PhD in biomedical engineering in 1979, an MS degree in electrical and computer engineering in 1976, and an MS degree in biomedical engineering in 1975 from the University of Michigan, and his BS degree in electrical engineering (summa cum laude) in 1974 from Ohio State University. His research interests include statistical signal processing with applications to speech and hearing, genomics, and other aspects of biomedicine.
Blair D. Fleet received her BS degree (summa cum laude) from Morgan State University, Baltimore, MD, in 2010, and her MS degree from Michigan State University in 2012, both in electrical engineering. She is a National Science Foundation graduate research fellowship award recipient, as well as a GEM (the National Consortium for graduate degrees for Minorities in Engineering and Science, Inc.) fellow. She is currently pursuing her PhD in electrical engineering at Michigan State University. Her research interests include merging signal/image processing with the evolutionary computation fields to solve challenging engineering processing problems, especially in the biomedical domain.