Pixel-level multisensor image fusion based on matrix completion and robust principal component analysis

Abstract. Acquired digital images are often corrupted by a lack of camera focus, faulty illumination, or missing data. An algorithm is presented for fusion of multiple corrupted images of a scene using the lifting wavelet transform. The method employs adaptive fusion arithmetic based on matrix completion and self-adaptive regional variance estimation. Characteristics of the wavelet coefficients are used to adaptively select fusion rules. Robust principal component analysis is applied to low-frequency image components, and regional variance estimation is applied to high-frequency components. Experiments reveal that the method is effective for multifocus, visible-light, and infrared image fusion. Compared with traditional algorithms, the new algorithm not only increases the amount of preserved information and clarity but also improves robustness.


Introduction
Multisensor image fusion is the process of combining two or more images of a scene to create a single image that is more informative than any of the input images.1 Image-fusion technology is employed in numerous applications, including visual interpretation, image drawing, geographical information gathering, and military target reconnaissance and surveillance. In particular, research into techniques for image fusion by contrast reversal in local image regions has important theoretical and practical significance.1 Image-fusion methods are classified as spatial- or transform-domain techniques. Spatial-domain methods are simple but generally result in images with insufficient detail. Transform-domain strategies based on image-fusion arithmetic and wavelet transforms (WTs) represent the current state of the art. Wavelets can be used to resolve an original image into a series of subimages with different spatial resolutions and frequency-domain characteristics, a representation that fully reflects local variations in the original image. In addition, WTs can effect multiresolution analysis,2,3 perfect reconstruction, and orthogonality.4 Image-fusion arithmetic based on WT coefficients can flexibly resolve multidimensional low-frequency and high-frequency image components. Wavelet transforms can also realize multisensor image fusion using rules that emphasize critical features of the scene.5,6 Traditional convolution-based WT methods for multiresolution analysis have been widely applied to image fusion for images with a large number of pixels, but the memory and computational requirements for these techniques, and their Fourier-domain equivalents, can be substantial.
Attempts to create more efficient algorithms in the transform domain have employed the lifting wavelet transform (LWT).7-9 Also known as the second-generation WT,10 the LWT is not dependent upon the Fourier transform; rather, all operations are carried out in the spatial domain. Image reconstruction is achieved by simply reversing the calculation order and signs of the decomposition process,11 thereby reducing two-dimensional image data computation by half and data storage to about 75%.
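A minimal numpy sketch of the lifting idea (split, predict, update) for the Haar wavelet follows. This is illustrative only: the experiments in this paper use the spline 5/3 basis, and all function names here are ours. Note how the inverse simply reverses the order and signs of the lifting steps, as described above.

```python
import numpy as np

def lwt_haar_1d(x):
    """One Haar lifting step along the last axis.

    Split into even/odd samples, predict the odd samples from the even
    ones, then update the even samples to carry the local average.
    """
    even, odd = x[..., ::2], x[..., 1::2]
    d = odd - even            # predict step: detail coefficients
    a = even + d / 2          # update step: approximation coefficients
    return a, d

def ilwt_haar_1d(a, d):
    """Invert the lifting steps by reversing their order and signs."""
    even = a - d / 2
    odd = d + even
    out = np.empty(a.shape[:-1] + (2 * a.shape[-1],))
    out[..., ::2], out[..., 1::2] = even, odd
    return out

def lwt2(img):
    """One 2-D decomposition level: transform rows, then columns."""
    a, d = lwt_haar_1d(img)            # along rows
    ll, lh = lwt_haar_1d(a.T)          # along columns of the approximation
    hl, hh = lwt_haar_1d(d.T)          # along columns of the detail
    return ll.T, lh.T, hl.T, hh.T

def ilwt2(ll, lh, hl, hh):
    """Invert one 2-D level: columns first, then rows."""
    a = ilwt_haar_1d(ll.T, lh.T).T
    d = ilwt_haar_1d(hl.T, hh.T).T
    return ilwt_haar_1d(a, d)
```

Perfect reconstruction holds exactly, since each inverse step algebraically cancels the corresponding forward step.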
One important motivation for the use of WTs in image processing is their ability to segregate low-frequency content that is critical for interpretation. Traditional image-fusion methods are based on selecting these significant wavelet decomposition coefficients.12-14 Even with the effective separation and processing of low-frequency components afforded by WT decomposition, such an approach fails to take full account of the relationships among multiple input images. The result can be adverse fusion effects. Significant information can be lost when the local area variance corresponding to pixels across images is small.8,9 Other algorithms use principal component analysis (PCA) to estimate the wavelet coefficients. This method works well in low-noise environments, but PCA breaks down when corruption is severe, even if only very few of the observations are affected.15 For example, consider the two PCA simulation results shown in Fig. 1. Suppose that the light line in Fig. 1(a) represents an object in an image, and that the "×" markers represent samples of that object that have been corrupted by low-level Gaussian noise. The reconstruction of the object from the samples using the classical PCA approach is shown as a heavy line. The results of a similar experiment are shown in Fig. 1(b), where the PCA reconstruction is seriously in error as the result of a single noise outlier in the sampling process.
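The breakdown illustrated in Fig. 1 can be reproduced in a few lines of numpy. The data, noise level, and outlier below are our own stand-ins for the figure's setup: a line sampled with small Gaussian noise versus the same samples with one gross outlier, each fit by the first principal direction.

```python
import numpy as np

def pca_direction(points):
    """First principal direction of centered 2-D points (via SVD)."""
    centered = points - points.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[0]  # unit vector along the direction of maximum variance

rng = np.random.default_rng(1)
x = np.linspace(0, 1, 50)
y = x + rng.normal(scale=0.01, size=x.size)   # samples of the line y = x
clean = np.column_stack([x, y])
d_clean = pca_direction(clean)

corrupt = clean.copy()
corrupt[25, 1] += 20.0                        # a single gross outlier
d_corrupt = pca_direction(corrupt)

# Angle (degrees) between each estimate and the true direction (1,1)/sqrt(2).
true_dir = np.array([1.0, 1.0]) / np.sqrt(2)
ang = lambda v: np.degrees(np.arccos(min(1.0, abs(v @ true_dir))))
print(ang(d_clean), ang(d_corrupt))
```

With only the small Gaussian noise the estimated direction is essentially exact; the single outlier swings it toward the vertical, mirroring the failure in Fig. 1(b).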
To remedy shortcomings in the current methods, this paper presents an improved image-fusion algorithm based on the LWT. For low-frequency image components represented in the LWT decomposition, scale coefficients are determined through matrix completion 16 instead of PCA. For the high-frequency detail and edge information, the LWT coefficients are chosen through self-adaptive regional variance estimation.

Matrix Completion and Robust Principal Component Analysis

Overview
The matrix completion problem has been the subject of intense research in recent years. Candès et al.17 verify that the l0-norm optimization problem is equivalent to l1-norm optimization under a restricted isometry property. Candès and Recht16 demonstrate exact matrix completion using convex optimization. The "nuclear norm" of a matrix X ∈ R^{N×N},

‖X‖_* = Σ_k σ_k(X),   (1)

in which σ_k(X) denotes the k'th largest singular value, can be used to approximate the matrix rank ρ(X). The result is a convex minimization problem for which there are numerous efficient solutions. Candès and Recht16 prove that if the number S of sampled entries obeys

S ≥ C N^{1.2} ρ(X) log N   (2)

for some positive constant C, then the N × N matrix X can be perfectly recovered with probability ≈ 1 by solving a simple convex optimization problem. Lin and Ma15 report a fast, scalable algorithm for solving the robust PCA (RPCA) problem, based on recovering a low-rank matrix with an unknown fraction of corrupted entries. The mathematical model for estimating the low-dimensional subspace is to find a low-rank matrix A ∈ R^{M×N} with ρ(A) ≪ min(M, N); the rank is the target dimension of the subspace. The observation matrix D is modeled as

D = P_Ω(A + E),   (3)

in which P_Ω(·) is a subsampling projection operator and E represents a matrix of unmodeled perturbations that is assumed sparse relative to A.
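As a small illustration of Eq. (1) (numpy; the example matrix and names are ours), the nuclear norm is simply the sum of singular values, and for a low-rank matrix only the first ρ(X) of them are nonzero:

```python
import numpy as np

def nuclear_norm(x):
    """Sum of singular values: the convex surrogate for rank in Eq. (1)."""
    return np.linalg.svd(x, compute_uv=False).sum()

rng = np.random.default_rng(0)
# Build an exactly rank-2 matrix as a product of N x 2 and 2 x N factors.
n, r = 50, 2
a = rng.standard_normal((n, r)) @ rng.standard_normal((r, n))

print(np.linalg.matrix_rank(a))   # 2
print(nuclear_norm(a))            # only the top two singular values contribute
```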

Matrix Completion
The objective of matrix completion is to recover, in the low-dimensional subspace, the truly low-rank matrix A from D under the working assumption that E is zero. That is, we seek

Â = argmin_A ‖A‖_*   subject to   P_Ω(A) = P_Ω(D).   (4)

It has been shown that the solution to this convex relaxation represents an exact recovery of the matrix A under quite general conditions.16 Further, the recovery is robust to noise with small magnitude bounds, that is, when the elements of E are small and bounded. For example, if E is a white noise matrix with standard deviation σ and Frobenius norm ‖E‖_F < ϵ, then the recovered D will lie in a small neighborhood of A with high probability if ϵ² ≤ (M + √(8M)) σ².18
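A hedged sketch of one standard solver for Eq. (4), singular value thresholding (SVT), follows. The parameter defaults are common recommendations from the SVT literature (τ = 5N, step ≈ 1.2·N²/m for m observed entries), not values specified in this paper, and the function names are ours.

```python
import numpy as np

def svt_complete(d, mask, tau=None, step=None, iters=1000):
    """Matrix completion by singular value thresholding (SVT).

    Solves a regularized form of Eq. (4): each iteration shrinks the
    singular values of the dual iterate, driving the estimate toward the
    minimum-nuclear-norm matrix that agrees with the observed entries.
    """
    if tau is None:
        tau = 5 * np.mean(d.shape)
    if step is None:
        step = 1.2 / mask.mean()          # recommended delta = 1.2 N^2 / m
    y = np.zeros_like(d)
    for _ in range(iters):
        u, s, vt = np.linalg.svd(y, full_matrices=False)
        a = (u * np.maximum(s - tau, 0)) @ vt   # shrink singular values
        y += step * mask * (d - a)              # gradient step on observed set
        if np.linalg.norm(mask * (d - a)) <= 1e-6 * np.linalg.norm(mask * d):
            break
    return a

rng = np.random.default_rng(0)
n, r = 40, 2
truth = rng.standard_normal((n, r)) @ rng.standard_normal((r, n))
mask = rng.random((n, n)) < 0.6               # observe ~60% of the entries
recovered = svt_complete(mask * truth, mask)
print(np.linalg.norm(recovered - truth) / np.linalg.norm(truth))
```

With 60% of entries observed, the sampling budget here is roughly six times the d_r = r(2N − r) degrees of freedom of a rank-r matrix, comparable to the oversampling rate used in the experiments below.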

Robust Principal Component Analysis
Conventional PCA is often used to estimate a low-dimensional subspace via the constrained optimization problem

Â = argmin_A ‖D − A‖_F   subject to   ρ(A) ≤ r,   (5)

where r ≪ min{M, N} is the target dimension of the subspace, and the use of the Frobenius norm represents an assumption that the matrix elements are corrupted by additive i.i.d. Gaussian noise. PCA works well in practice as long as the magnitude of the noise is small. To compute the PCA solution, the singular value decomposition (SVD) of D is used to project the columns of D onto the subspace spanned by the r principal left singular vectors of D. RPCA employs an identity operator P_Ω(·) and a sparse matrix E, which differ from those in the matrix completion and PCA approaches. Wright et al.19 and Candès et al.20 have shown that, for a sufficiently sparse error matrix, a low-rank matrix A can be recovered exactly from the observation matrix D by solving the convex optimization problem

(Â, Ê) = argmin_{A,E} ‖A‖_* + λ‖E‖_1   subject to   D = A + E,   (6)

where λ is a positive weighting parameter. RPCA has been used for background modeling, shadow removal from face images, human face alignment, and video denoising.21,22 In the present paper, RPCA is coupled with the "inexact augmented Lagrange multiplier" (IALM)15 method to determine the low-frequency LWT coefficients for fusion of corrupted images. The IALM method is described in Sec. 3.2 after introducing the general procedure.

Overview
By adopting separate fusion strategies for high- and low-frequency components, the WT can differentially preserve the critical features that accompany these separate bands. The procedure that exploits this property is shown in Fig. 2. The source images are converted to frequency-domain coefficients by the LWT. Frequency-band-dependent fusion rules are applied to the low- and high-frequency components of each image. The inverse lifting wavelet transform (ILWT) is used to reconstruct the fused image.

Low-Frequency Fusion Based on Inexact Augmented Lagrange Multiplier

Weighted average coefficients are often employed to fuse low-frequency wavelet coefficients. This method is effective when the coefficients of the fused images are similar. However, when contrast reversal occurs in local regions of an image, the procedure results in a loss of detail in the fused image due to reduced contrast. Further, erroneous or missing regions of corrupted images strongly affect PCA results. These inadequacies of the weighted average method and of PCA motivate the use of RPCA to determine the weighting of the low-frequency coefficients.
There is ordinarily little difference in the low-frequency coefficient values extracted by the LWT from different images of the same scene. RPCA coefficients are used to represent low-frequency content in an attempt to preserve fidelity and coherency between the subbands. Algorithms have been developed in this research to solve the RPCA problem that is the basis for the recovery of the low-rank matrix A and the estimation of the sparse matrix E from the observation matrix D. We employ the IALM method to compute the low-frequency subband coefficients. The method is sketched as follows.
Let Γ = {I_k ∈ R^{N1×N2}}, k = 1, …, K, denote a set of corrupted images from K sensors, and let Γ̃ = {Ĩ_k ∈ R^{N×N}}, k = 1, …, K, be the corresponding set of low-frequency subimages computed using the LWT, where L is the number of LWT layers. For simplicity, we assume square images, so that N = N1/2^L = N2/2^L. Stack all N columns of each Ĩ_k into a single vector of dimension N², then use these vectors as the K columns of a matrix Ĩ_D. After normalizing the data [Eq. (7)], we denote by ĩ_lk the (l, k) element of Ĩ_D. The cumulative low-frequency subimage matrix is modeled similarly to Eq. (3),

Ĩ_D = Ĩ_A + Ĩ_E,   (8)

in which Ĩ_A ∈ R^{N²×K} denotes the noise-free and integrated low-frequency subimage sequence matrix, and Ĩ_E ∈ R^{N²×K} denotes the sparse error matrix from which high-frequency content has been attenuated by the selection of LWT coefficients. The low-frequency LWT coefficients are similar across multiple subimages of the same scene. According to the model, Ĩ_A is noise-free and will ideally, therefore, consist of K identical columns. Accordingly, Ĩ_A will be of low rank, as required by the matrix completion procedure. Thus, Ĩ_A can be estimated via matrix completion and RPCA by solving

min_{Ĩ_A, Ĩ_E} ‖Ĩ_A‖_* + λ‖Ĩ_E‖_1   subject to   Ĩ_D = Ĩ_A + Ĩ_E,   (9)

where the augmented Lagrangian is

L(Ĩ_A, Ĩ_E, Y, μ) = ‖Ĩ_A‖_* + λ‖Ĩ_E‖_1 + Tr{Y, Ĩ_D − Ĩ_A − Ĩ_E} + (μ/2)‖Ĩ_D − Ĩ_A − Ĩ_E‖²_F.   (10)

In this equation, λ is an estimated positive weighting parameter representing the proportion of the sparse matrix Ĩ_E in the low-rank matrix Ĩ_A; the default value for this fraction is 1/N. μ is a positive tuning parameter balancing accuracy and computational effort, Tr{A, B} is the trace of the product A^T B, and Y is the iterated Lagrange multiplier. A flowchart of the IALM algorithm is shown in Fig. 3; definitions of the notation used in the flowchart appear in Table 1. The algorithm is recursive, with superscript j indicating the iteration number.
The quantity Ĩ_A^{(j0)} ∈ R^{N²×K} is the recovered low-rank matrix for some sufficiently large iteration number j, say j0. A reasonable strategy for transforming the resulting Ĩ_A^{(j0)} into the final low-frequency subimage is to unwrap its first column to recover the original N × N image structure. The final low-frequency subimage is denoted Ĩ_∂.
In Table 1, m is the column size of Ĩ_D; the tolerance for the stopping criterion, τ, is initialized to 1 × 10⁻⁷; and j is set to zero before the loop computation.
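The IALM iteration sketched in Fig. 3 can be written compactly in numpy. This is a generic IALM implementation for D = A + E, not the authors' code; the initialization and the λ default follow common practice (λ = 1/√max(M, N)) rather than the 1/N value quoted above, and the function names are ours.

```python
import numpy as np

def shrink(x, tau):
    """Elementwise soft-thresholding: the proximal operator of the l1 norm."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0)

def ialm_rpca(d, lam=None, tol=1e-7, max_iter=100):
    """Inexact augmented Lagrange multiplier method for D = A + E.

    Alternates a singular-value shrinkage step for the low-rank part A
    and an elementwise shrinkage step for the sparse part E, then updates
    the multiplier Y and the penalty mu, as in the flowchart of Fig. 3.
    """
    if lam is None:
        lam = 1.0 / np.sqrt(max(d.shape))   # common default weighting
    norm_two = np.linalg.norm(d, 2)
    y = d / max(norm_two, np.abs(d).max() / lam)   # standard dual init
    mu, rho = 1.25 / norm_two, 1.5
    e = np.zeros_like(d)
    for _ in range(max_iter):
        u, s, vt = np.linalg.svd(d - e + y / mu, full_matrices=False)
        a = (u * shrink(s, 1.0 / mu)) @ vt   # low-rank update
        e = shrink(d - a + y / mu, lam / mu) # sparse update
        y += mu * (d - a - e)                # multiplier update
        mu *= rho
        if np.linalg.norm(d - a - e) <= tol * np.linalg.norm(d):
            break
    return a, e
```

Applied to the stacked subimage matrix Ĩ_D, the returned low-rank part plays the role of Ĩ_A and the sparse part that of Ĩ_E.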

3.3 High-Frequency Fusion Based on Self-Adapting Regional Variance Estimation

Processing of high-frequency wavelet coefficients has a direct effect on the salient details that determine the overall clarity of the image. Because the variance of a subimage characterizes the degree of gray-level change in the corresponding image region, variance is a key indicator in the processing of high-frequency components. In addition, there is generally a strong correlation among adjacent pixels in a local area, so that a significant amount of information is shared among neighboring pixels. When the variances in corresponding local regions across subimages differ widely, a high-frequency fusion rule that selects the source image of greatest variance has been shown to be effective at preserving image features.8,9 However, if the local variances of two source images are similar, this rule can lose information by discarding subtle variations among the subimages. An empirical procedure has therefore been developed in which a thresholding step segregates local areas with a sufficiently large variance difference, allowing the entire set to be represented by the single maximum-variance member. The selection of this difference threshold, ξ, is discussed below.
Let us return to the original set of images Γ = {I_k ∈ R^{N1×N2}}, k = 1, …, K. Denote by I_k(x, y) the gray-scale value at pixel (x, y) in the k'th image. Also let V_k ∈ R^{N1×N2} denote a matrix associated with image I_k in which element V_k(x, y) contains the normalized sample variance of the 3 × 3 window of pixels centered on pixel (x, y); normalization means that all variance values lie in the interval [0, 1]. Without loss of generality, we select images I_1 and I_2 to describe the steps of the high-frequency fusion algorithm:

1. Compute the normalized sample variance matrices V_1 and V_2, so that V_k(x, y) denotes the normalized variance value of pixel (x, y) in image I_k for k = 1, 2.

2. Implement the LWT over L = 2 layers against I_1, I_2, V_1, and V_2. Multiresolution structures are obtained for each matrix,

{I_1^θ, I_2^θ, V_1^θ, V_2^θ},   (11)

in which the superscript θ takes one of three direction designators, horizontal (h), vertical (v), or diagonal (d). Let Δ_V(x, y) denote the sum of the thresholded variance differences in the three directions,

Δ_V(x, y) = Σ_θ T[V_1^θ(x, y) − V_2^θ(x, y)],   where   T(x) = x − ε for x > ε; x + ε for x < −ε; 0 otherwise,   (12)

in which V_k^θ(x, y) indicates the normalized variance of the k'th image in direction θ.
3. Compare the threshold value ξ with |Δ_V(x, y)|. If |Δ_V(x, y)| ≥ ξ, take the coefficient from the image with the larger variance as the fused wavelet coefficient; otherwise, use a weighted sum. With D_F^θ denoting the multiresolution structure after fusion,

D_F^θ(x, y) = I_1^θ(x, y) for Δ_V(x, y) ≥ ξ; I_2^θ(x, y) for Δ_V(x, y) ≤ −ξ; [I_1^θ(x, y) + I_2^θ(x, y)]/2 otherwise.   (13)

In this study, the value of ξ is set to 0.8: when the normalized variance of pixel (x, y) in one image is much greater than in the other, the higher-variance source is selected; otherwise, the coefficient is obtained by averaging as in Eq. (13). This fusion rule for high-frequency subimages not only retains details but also prevents the loss of image information caused by redundant data, and it ensures the consistency of the fused image.
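A simplified numpy sketch of steps 1-3 follows. The window radius, the ε and default ξ values, and all function names are our assumptions, and for brevity the thresholded difference is computed per subband rather than summed over the three directions as in Eq. (13)'s full form:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def local_variance(img, radius=1):
    """Normalized sample variance over a (2r+1) x (2r+1) sliding window."""
    padded = np.pad(img, radius, mode='edge')
    win = sliding_window_view(padded, (2 * radius + 1, 2 * radius + 1))
    var = win.var(axis=(-2, -1))
    return var / var.max() if var.max() > 0 else var   # scale into [0, 1]

def soft(x, eps):
    """Thresholding function T of the variance difference."""
    return np.sign(x) * np.maximum(np.abs(x) - eps, 0)

def fuse_highfreq(c1, c2, v1, v2, eps=0.05, xi=0.8):
    """Fuse two high-frequency subbands by the rule of Eq. (13).

    Where the thresholded variance difference is decisive, keep the
    coefficient from the higher-variance source; otherwise average.
    """
    delta = soft(v1 - v2, eps)
    pick1 = (np.abs(delta) >= xi) & (v1 >= v2)
    pick2 = (np.abs(delta) >= xi) & (v1 < v2)
    fused = 0.5 * (c1 + c2)        # weighted sum when variances are close
    fused[pick1] = c1[pick1]
    fused[pick2] = c2[pick2]
    return fused
```

For example, a pixel whose variance in one subband greatly exceeds its counterpart keeps that subband's coefficient, while all other pixels receive the average of the two coefficients.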
In summary, IALM is used to determine the low-frequency component to be fused, and self-adapting regional variance is employed to estimate the high-frequency contribution. The fused wavelet coefficients are combined by ILWT to create the final result.

Comparison of Robust Principal Component Analysis Algorithms

To validate the new procedure, four groups of experiments are reported. The objective of the first is to compare the performance of mainstream RPCA algorithms with that of IALM. The results are shown in Table 2. Two mainstream algorithms, singular value thresholding (SVT) and the accelerated proximal gradient (APG) method, are compared with IALM.
In this table, the input dataset, the observation matrix D of Eq. (6), is of dimension N × N with randomly missing or broken pixels. For fair comparison, we set r, the rank of A, to 0.05N and define the normalized mean squared error (NMSE) as

NMSE = ‖Â − A‖_F / ‖A‖_F.   (14)

In Table 2, the column labeled #SVD indicates the number of iterations, and the "time" column displays the number of seconds required to run the algorithm. The oversampling rate (p/d_r) is six, implying ∼60% downsampling of the data appearing in the observation matrix, in which d_r = r(2N − r) indicates the number of degrees of freedom in rank-r matrices; p elements from A are then sampled uniformly to form the known samples in D.16 Among the three algorithms, IALM exhibits superior performance in all three measures. The results indicate that time increases proportionately with N², whereas #SVD is not dependent upon N.

Fusion of Clean Images
For convenience, we will refer to the new algorithm as IALM_∂. The next two groups of experiments involve processing of left-focus/right-focus images and visible-light/infrared images, comparing different image-fusion algorithms with IALM_∂. The source images are not corrupted by noise or errors, and the spline 5/3 wavelet basis23 was used. The first group of source images comprises images with eccentric focus; the second contains contrasting visible-light and infrared images. Figure 4(a) shows a left-focused source image, whereas Fig. 4(b) is right-focused; Fig. 5(a) is a visible-light source image, while Fig. 5(b) uses an infrared source. Figures 4(c)-4(f) and 5(c)-5(f) show, respectively, the fusion results of the weighted average over low frequencies with the absolute-value maximum method over high frequencies (WA_AM); the weighted average over low frequencies with the local-area maximum method over high frequencies (WA_LM); the improved pulse-coupled neural networks (PCNN) method;24,25 PCA weighting over low frequencies with the self-adaptive regional variance estimation method over high frequencies (PCA_∂); and the algorithm developed in this paper (IALM_∂).
The processed images empirically suggest that a clearer fused image is obtained through IALM_∂. More detailed information is evident, e.g., in Figs. 4(e) and 4(f), in which the image information on the left edge of the large alarm clock is apparently richer than the same feature in the other fused images. Algorithm IALM_∂ is thus at least as effective as algorithm PCA_∂, while preserving more detailed information (Table 2). Furthermore, the new algorithm achieves a fusion result with finer detail. For example, the barbed wire in Fig. 5(d) is more clearly visible than the same feature in Fig. 5(c), while the person in Fig. 5(c) is better defined than in Fig. 5(d); in Figs. 5(e) and 5(f), both the barbed wire and the person, and even the smoke in the upper-right corner of the image, are easier to identify than in the others. This enhanced clarity admits more effective subsequent processing.
The following objective criteria were evaluated:

1. The "mutual information" (MI) is a measure of statistical dependence that can be interpreted as the amount of information transmitted from the source images to the fused image.26 To assess the MI between source image I_1 and the fused image I_F, we use the estimator

M_{1,F} = Σ_{l_1, l_F} h_{1,F}(l_1, l_F) log₂ [ h_{1,F}(l_1, l_F) / (h_1(l_1) h_F(l_F)) ],   (15)

where h_{1,F}(l_1, l_F) is the normalized joint histogram of I_1 and I_F, h_1(l_1) and h_F(l_F) represent the normalized histograms of source image I_1 and fused image I_F, respectively, and l_1 and l_F each take integer values indicating one of the 2⁸ gray levels {0, 1, …, 255}.

2. The "average gradient" (AG) reflects image clarity,

AG = [1/((N_1 − 1)(N_2 − 1))] Σ_x Σ_y √{ [ΔI_x(x, y)² + ΔI_y(x, y)²] / 2 },   (16)

where ΔI_x(x, y) = I(x + 1, y) − I(x, y) and ΔI_y(x, y) = I(x, y + 1) − I(x, y) are the gray-value differentials in the coordinate x and y directions, respectively.

3. The "correlation coefficient" (CC) is used to compare two images of the same object (or scene). CC, which measures the correlation (degree of linear coherence) between the original and fused images, is defined as

C_{F,1} = Σ_x Σ_y [I_F(x, y) − Ī_F][I_1(x, y) − Ī_1] / √{ Σ_x Σ_y [I_F(x, y) − Ī_F]² · Σ_x Σ_y [I_1(x, y) − Ī_1]² },   (18)

where I_F(x, y) and I_1(x, y) are the gray levels at pixel (x, y) in the fused and original images, and Ī_F and Ī_1 denote the average gray levels of the two images.

4. The "degree of distortion" (DD), a direct indicator of image fidelity, is defined as

D_{F,1} = [1/(N_1 N_2)] Σ_x Σ_y |I_F(x, y) − I_1(x, y)|,   (19)

in which I_F(x, y) and I_1(x, y) are as defined above.
5. The Q^{AB/F} metric quantifies the amount of edge information transferred from two source images I_A and I_B to a fused image I_F.26 It is calculated as

Q^{AB/F} = Σ_x Σ_y [Q^{AF}(x, y) w^A(x, y) + Q^{BF}(x, y) w^B(x, y)] / Σ_x Σ_y [w^A(x, y) + w^B(x, y)],   (20)

where each image is of size N_1 × N_2, and α and β represent, respectively, edge strength and orientation. Q^{AF}(x, y) is the product of Q_α^{AF}(x, y) and Q_β^{AF}(x, y), which represent, respectively, how well the edge strength and orientation values of pixel (x, y) in I_A are represented in the fused image I_F. Similarly, Q^{BF}(x, y) is the product of Q_α^{BF}(x, y) and Q_β^{BF}(x, y), which represent how well the edge strength and orientation values of pixel (x, y) in I_B are represented in I_F. The weights w^A(x, y) and w^B(x, y) reflect the relative importance of Q^{AF}(x, y) and Q^{BF}(x, y). The dynamic range of Q^{AB/F} is [0, 1], and it should be as close to 1 as possible.

6. The "peak signal-to-noise ratio" (PSNR) is an expression for the ratio between the maximum possible power of a signal and the power of the distorting noise that affects the quality of its representation. This objective metric compares the effectiveness of algorithms by measuring the proximity of the fused image to the original image. The PSNR is computed as

PSNR = 20 log₁₀ [(L − 1) / RMSE],   (21)

where RMSE denotes the root-mean-square error between the reference and fused images, and L = 256 is the number of gray levels used in representing an image. A larger PSNR value indicates a better fusion result.

Tables 3 and 4 report the objective performance evaluation measures for the fusion algorithms. Relative to the other algorithms, IALM_∂ obtains the largest MI and AG for the fused images, suggesting that this algorithm can provide fused images with higher information content and better clarity.
The objective indicators of fidelity to the source image also favor the IALM and self-adaptive regional variance estimation algorithm performance.
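Hedged numpy sketches of metrics 1-4 and 6 follow (the Q^{AB/F} metric requires edge-strength and orientation maps and is omitted here). Eight-bit gray-scale arrays are assumed, and the function names are ours:

```python
import numpy as np

def mutual_information(img1, img2, bins=256):
    """MI estimator of Eq. (15), from the joint and marginal histograms."""
    joint, _, _ = np.histogram2d(img1.ravel(), img2.ravel(),
                                 bins=bins, range=[[0, 256], [0, 256]])
    joint /= joint.sum()
    p1, p2 = joint.sum(axis=1), joint.sum(axis=0)
    nz = joint > 0                      # avoid log(0) on empty histogram bins
    return float((joint[nz] *
                  np.log2(joint[nz] / (p1[:, None] * p2[None, :])[nz])).sum())

def average_gradient(img):
    """AG: mean RMS of the x- and y-direction gray-value differentials."""
    dx = np.diff(img, axis=1)[:-1, :]
    dy = np.diff(img, axis=0)[:, :-1]
    return float(np.sqrt((dx ** 2 + dy ** 2) / 2).mean())

def correlation_coefficient(img_f, img_1):
    """CC of Eq. (18): linear coherence between fused and original images."""
    a, b = img_f - img_f.mean(), img_1 - img_1.mean()
    return float((a * b).sum() / np.sqrt((a ** 2).sum() * (b ** 2).sum()))

def degree_of_distortion(img_f, img_1):
    """DD of Eq. (19): mean absolute gray-level difference."""
    return float(np.abs(img_f - img_1).mean())

def psnr(reference, fused, levels=256):
    """PSNR against a reference image; larger is better."""
    rmse = np.sqrt(np.mean((reference - fused) ** 2))
    if rmse == 0:
        return float('inf')
    return float(20 * np.log10((levels - 1) / rmse))
```

As a sanity check, an image compared with itself gives CC = 1, DD = 0, infinite PSNR, and an MI equal to the image's own entropy.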

Fusion of Corrupted Images
To assess whether IALM_∂ is robust to missing data and image corruption, we again use the multifocus clock images. At a 0.15 error rate, 15% of the pixels of the original image are corrupted, and an additional 15% are missing (gray-level values set to zero), implying an effective data corruption rate of 30%. The results of the test of the four algorithms are shown in Fig. 6. Figures 6(a) and 6(b) show, respectively, Figs. 4(a) and 4(b) with errors. Figure 6(c) shows the result of using PCA_∂ without a denoising filter, while Fig. 6(d), labeled PCA_∂,F, shows the result of using PCA_∂ with an adaptive median filter. The result of using PCNN with an adaptive median filter is labeled PCNN_F and appears in Fig. 6(e). For these results, we use the adaptive median filtering strategy proposed by Chen and Wu27 to identify pixels corrupted by impulsive noise and to replace each damaged pixel by the median of its neighborhood. The adaptive median filter can employ varying window sizes to accommodate different noise conditions and to reduce distortions such as excessive thinning or thickening of object boundaries. Figure 6(f) shows results using IALM_∂ without denoising. The clarity of result 6(f) relative to those in 6(c), 6(d), and 6(e) is quite apparent, and the empirical image quality tracks the improvement in PSNR reported in the captions. Figures 6(g) and 6(h) show 400% enlargements of portions of 6(e) and 6(f).
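A sketch of the varying-window adaptive median idea follows. This is a simplified version of the standard adaptive median filter, not Chen and Wu's detection scheme; the fallback behavior and window limit are our choices.

```python
import numpy as np

def adaptive_median(img, max_radius=3):
    """Adaptive median filter: grow the window until its median is not an
    extreme value, then replace the center pixel only if it looks impulsive.

    When even the largest window has a degenerate (extreme) median, fall
    back to that window's median.
    """
    out = img.astype(float).copy()
    padded = np.pad(img.astype(float), max_radius, mode='edge')
    h, w = img.shape
    for y in range(h):
        for x in range(w):
            for r in range(1, max_radius + 1):
                win = padded[y + max_radius - r:y + max_radius + r + 1,
                             x + max_radius - r:x + max_radius + r + 1]
                zmin, zmed, zmax = win.min(), np.median(win), win.max()
                if zmin < zmed < zmax:            # median is not an impulse
                    if not (zmin < img[y, x] < zmax):
                        out[y, x] = zmed          # replace impulsive pixel
                    break
            else:
                out[y, x] = zmed  # window maxed out: use the last median
    return out
```

On a flat region with a single salt pixel, the filter removes the impulse while leaving the uncorrupted pixels unchanged.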
These results demonstrate the ability of IALM_∂ to recover the missing or erroneous data, while preserving image detail in both corrupted and clean images.

Conclusions
Traditional convolution-based wavelet transform processing for image fusion has shortcomings, including large memory requirements and high computational complexity. The approach to fusion taken in this research applies different fusion rules to the low-frequency and high-frequency decomposition components represented on a lifting wavelet basis. Low-frequency components are characterized by the matrix completion and RPCA method IALM, whereas the high-frequency components critical for image detail are represented by taking into account the variance differences among proximal neighborhoods; the strong correlation between pixels in a local area is captured by a self-adaptive regional variance assessment. Experimental results show that the new algorithm not only improves the amount of information and the correlation between the fused and source images, but also reduces the level of distortion. Significant clarity improvement relative to state-of-the-art methods is also demonstrated for corrupted images.