## 1. Introduction

Multisensor image fusion is the process of combining two or more images of a scene to create a single image that is more informative than any of the input images.^{1} Image-fusion technology is employed in numerous applications including visual interpretation, image drawing, geographical information gathering, and military target reconnaissance and surveillance. In particular, research into techniques for image fusion by contrast reversal in local image regions has important theoretical and practical significance.^{1}

Image-fusion methods are classified as spatial- or transform-domain techniques. Spatial-domain methods are simple, but generally result in images with insufficient detail. Transform-domain strategies based on image-fusion arithmetic and wavelet transformations (WTs) represent the current state of the art. Wavelets can be used to resolve an original image into a series of subimages with different spatial resolutions and frequency-domain characteristics. This representation fully reflects local variations in the original image. In addition, WTs provide multiresolution analysis,^{2}^{,}^{3} perfect reconstruction, and orthogonality.^{4} Image-fusion arithmetic based on WT coefficients can flexibly resolve multidimensional low-frequency and high-frequency image components. Wavelet transforms can also realize multisensor image fusion using rules that emphasize critical features of the scene.^{5}^{,}^{6}

Traditional convolution-based WT methods for multiresolution analysis have been widely applied to image fusion for images with a large number of pixels, but the memory and the computational requirements for these techniques, and their Fourier-domain equivalents, can be substantial. Attempts to create more efficient algorithms in the transform domain have employed the lifting wavelet transform (LWT).^{7}^{–}^{9} Also known as the second-generation WT,^{10} the LWT is not dependent upon the Fourier transform. Rather, all operations are carried out in the spatial domain. Image reconstruction is achieved by simply adjusting the calculation and sign orders in the decomposition process,^{11} thereby reducing two-dimensional image data computation by half, and the data storage to about 75%.
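The lifting operations described here can be illustrated with a minimal one-dimensional sketch. It uses the LeGall 5/3 lifting steps with periodic boundary handling; the function names and boundary treatment are illustrative choices, not taken from the text:

```python
import numpy as np

def lwt53(x):
    """One level of the LeGall 5/3 lifting wavelet transform (1-D, float).
    All operations stay in the signal domain: split, predict, update."""
    s = x[0::2].astype(float).copy()      # even samples -> approximation
    d = x[1::2].astype(float).copy()      # odd samples  -> detail
    d -= 0.5 * (s + np.roll(s, -1))       # predict: remove even-sample trend
    s += 0.25 * (np.roll(d, 1) + d)       # update: preserve the running average
    return s, d

def ilwt53(s, d):
    """Inverse transform: the same lifting steps, reversed order and signs."""
    s = s - 0.25 * (np.roll(d, 1) + d)
    d = d + 0.5 * (s + np.roll(s, -1))
    x = np.empty(s.size + d.size)
    x[0::2], x[1::2] = s, d
    return x
```

Because the inverse simply reverses the calculation and sign order, reconstruction is exact, which is the property credited above for the reduced computation and storage of the LWT.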

One important motivation for the use of WTs in image processing is their ability to segregate low-frequency content that is critical for interpretation. Traditional image-fusion methods are based on selecting these significant wavelet decomposition coefficients.^{12}^{–}^{14} Even with the effective separation and processing of low-frequency components afforded by WT decomposition, such an approach fails to take into full account the relationships among multiple input images. The result can be adverse fusion effects. Significant information can be lost when local area variance corresponding to pixels across images is small.^{8}^{,}^{9}

Other algorithms use principal component analysis (PCA) to estimate the wavelet coefficients. This method works well in low-noise environments, but PCA breaks down when corruption is severe, even if only very few of the observations are affected.^{15} For example, consider the two PCA simulation results shown in Fig. 1. Suppose that the light line in Fig. 1(a) represents an object in an image, and that the “$\times $” markers represent samples of that object that have been corrupted by low-level Gaussian noise. The reconstruction of the object from the samples using the classical PCA approach is shown as a heavy line. The results of a similar experiment are shown in Fig. 1(b) where the PCA reconstruction is seriously in error as the result of a single noise outlier in the sampling process.
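The sensitivity illustrated in Fig. 1 is easy to reproduce. The following sketch (data values and the outlier are invented for illustration) fits a first principal component to nearly collinear points, with and without a single gross outlier:

```python
import numpy as np

def first_pc(X):
    """Direction of the first principal component of the rows of X."""
    Xc = X - X.mean(axis=0)               # center the data
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[0]                          # leading right singular vector

rng = np.random.default_rng(0)
t = np.linspace(-1.0, 1.0, 100)
# Samples of the line y = 2x, perturbed by low-level Gaussian noise.
X = np.column_stack([t, 2.0 * t]) + 0.01 * rng.standard_normal((100, 2))
true_dir = np.array([1.0, 2.0]) / np.sqrt(5.0)

clean_dir = first_pc(X)                        # tracks the line closely
outlier = np.array([[100.0, -50.0]])           # one gross sampling error
tilted_dir = first_pc(np.vstack([X, outlier])) # pulled far off the line
```

A single corrupted sample dominates the covariance and rotates the estimated subspace, mirroring the failure shown in Fig. 1(b).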

To remedy shortcomings in the current methods, this paper presents an improved image-fusion algorithm based on the LWT. For low-frequency image components represented in the LWT decomposition, scale coefficients are determined through matrix completion^{16} instead of PCA. For the high-frequency detail and edge information, the LWT coefficients are chosen through self-adaptive regional variance estimation.

## 2. Matrix Completion and Robust Principal Component Analysis

### 2.1. Overview

The matrix completion problem has been the subject of intense research in recent years. Candès et al.^{17} verify that the ${\ell}_{0}$-norm optimization problem is equivalent to ${\ell}_{1}$-norm optimization under a restricted isometry property. Candès and Recht^{16} demonstrate exact matrix completion using convex optimization. The “nuclear norm” of the matrix $\mathbf{X}\in {\mathbb{R}}^{N\times N}$ is the sum of its singular values,

## (1)

$${\Vert \mathbf{X}\Vert}_{*}\stackrel{\mathrm{def}}{=}\sum _{i}{\sigma}_{i}(\mathbf{X}).$$

Candès and Recht^{16} prove that if the number, $S$, of sampled entries obeys

## (2)

$$S\ge C{N}^{1.2}\rho (\mathbf{X})\mathrm{log}\text{\hspace{0.17em}}N$$

for some positive constant $C$, then the $N\times N$ matrix $\mathbf{X}$ of rank $\rho (\mathbf{X})$ can be perfectly recovered with probability $\approx 1$, by solving a simple convex optimization problem.

Lin and Ma^{15} report a fast, scalable algorithm for solving the robust PCA (RPCA) problem. The method is based on recovering a low-rank matrix with an unknown fraction of corrupted entries. The mathematical model estimates the low-dimensional subspace by finding a low-rank matrix: given a matrix $\mathbf{A}\in {\mathbb{R}}^{M\times N}$ with $\rho (\mathbf{A})\ll \mathrm{min}(M,N)$, the rank is the target dimension of the subspace. The observation matrix $\mathbf{D}$ is modeled as

## (3)

$$\mathbf{D}=\mathbf{A}+\mathbf{E},$$

where $\mathbf{E}$ is an error matrix.

### 2.2. Matrix Completion

The objective of matrix completion is to recover in the low-dimensional subspace the truly low-rank matrix $\mathbf{A}$ from $\mathbf{D}$, under the working assumption that $\mathbf{E}$ is zero. That is, we seek

## (4)

$$\mathbf{A}=\underset{{\mathbf{A}}^{\prime}\in {\mathbb{R}}^{N\times M}}{\mathrm{argmin}}{\Vert {\mathbf{A}}^{\prime}\Vert}_{*},\phantom{\rule[-0.0ex]{1em}{0.0ex}}\text{subject to}\text{\hspace{0.17em}\hspace{0.17em}}{P}_{\mathrm{\Omega}}({\mathbf{A}}^{\prime})={P}_{\mathrm{\Omega}}(\mathbf{D}).$$

Candès and Recht^{16} further show that the recovery is robust to noise of small bounded magnitude; that is, when the elements of $\mathbf{E}$ are small and bounded. For example, if $\mathbf{E}$ is a white noise matrix with standard deviation $\sigma $, and Frobenius norm ${\Vert \mathbf{E}\Vert}_{\mathcal{F}}<\epsilon $, then the recovered matrix will lie in a small neighborhood of $\mathbf{A}$ with high probability if ${\epsilon}^{2}\le (M+\sqrt{8M}){\sigma}^{2}$.
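The convex program of Eq. (4) can be approached by iterative singular-value shrinkage. The sketch below uses the SoftImpute-style variant, substituted here for brevity; `tau` and the iteration count are illustrative values, not taken from the text:

```python
import numpy as np

def complete(D_obs, mask, tau=2.0, iters=300):
    """Nuclear-norm matrix completion by iterative singular-value shrinkage.
    D_obs holds the observed entries (zeros elsewhere); mask is 1 where observed."""
    A = np.zeros_like(D_obs)
    for _ in range(iters):
        # Keep the observed entries, fill the rest with the current estimate.
        X = mask * D_obs + (1.0 - mask) * A
        # Shrink the singular values to promote low rank.
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        A = (U * np.maximum(s - tau, 0.0)) @ Vt
    return A
```

On a rank-2 matrix with 70% of its entries observed, the iteration recovers the missing entries to within a few percent, consistent with the exact-recovery results cited above.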


### 2.3. Robust Principal Component Analysis

Conventional PCA is often used to estimate a low-dimensional subspace via the following constrained optimization problem: in the observation model of Eq. (5), minimize the difference between the matrices $\mathbf{A}$ and $\mathbf{D}$ by solving

## (5)

$$\underset{\mathbf{A},\mathbf{E}}{\mathrm{min}}\text{\hspace{0.17em}}{\Vert \mathbf{E}\Vert}_{\mathcal{F}},\phantom{\rule[-0.0ex]{1em}{0.0ex}}\text{subject to}\text{\hspace{0.17em}\hspace{0.17em}}\rho (\mathbf{A})\le r,\phantom{\rule[-0.0ex]{1em}{0.0ex}}\mathbf{D}=\mathbf{A}+\mathbf{E}.$$

RPCA employs an identity operator ${P}_{\mathrm{\Omega}}(\cdot)$ and a sparse matrix $\mathbf{E}$, which differ from those in the matrix completion and PCA approaches. Wright et al.^{19} and Candès et al.^{20} have shown that, for a sufficiently sparse error matrix, a low-rank matrix $\mathbf{A}$ can be recovered exactly from the observation matrix $\mathbf{D}$ by solving the following convex optimization problem:

## (6)

$$\mathbf{A}=\underset{{\mathbf{A}}^{\prime},\mathbf{E}}{\mathrm{argmin}}\{{\Vert {\mathbf{A}}^{\prime}\Vert}_{*}+\lambda {\Vert \mathbf{E}\Vert}_{1}\},\phantom{\rule[-0.0ex]{1em}{0.0ex}}\text{subject to}\text{\hspace{0.17em}\hspace{0.17em}}\mathbf{D}={\mathbf{A}}^{\prime}+\mathbf{E},$$

where $\lambda $ is a positive weighting parameter.^{21}^{,}^{22}

In the present paper, RPCA is coupled with the “inexact augmented Lagrange multiplier” (IALM)^{15} method to determine the low-frequency LWT coefficients for fusion of corrupted images. The IALM method is described in Sec. 3.2 after introducing the general procedure.

## 3. Frequency-Domain Fusion Rules

### 3.1. Overview

By adopting separate fusion strategies for high- and low-frequency components, the WT can differentially preserve the critical features that accompany these separate bands. The procedure that exploits this property is shown in Fig. 2. The source images are converted to frequency-domain coefficients by the LWT. Frequency-band-dependent fusion rules are applied to the low- and high-frequency components of each image. The inverse lifting wavelet transform (ILWT) is used to reconstruct the fused image.

### 3.2. Low-Frequency Fusion Based on Inexact Augmented Lagrange Multiplier

Weighted average coefficients are often employed to fuse low-frequency wavelet coefficients. This method is effective when the coefficients of the fused images are similar. However, when contrast reversal occurs in local regions of an image, this procedure results in a loss of image detail in the fused image due to reduced contrast. Further, erroneous or missing regions of corrupted images strongly affect PCA results. These inadequacies of the weighted average method and PCA provide the motivation for using RPCA to determine the weighting of low-frequency coefficients.

There is ordinarily little difference in the low-frequency coefficient values extracted by the LWT from different images of the same scene. RPCA coefficients are used to represent low-frequency content in an attempt to preserve fidelity and coherency between the subbands. Algorithms have been developed in this research to solve the RPCA problem that is the basis for the recovery of the low-rank matrix $\mathbf{A}$ and the estimation of the sparse matrix $\mathbf{E}$ from the observation matrix $\mathbf{D}$. We employ the IALM method to compute the low-frequency subband coefficients. The method is sketched as follows.

Let $\mathrm{\Gamma}={\{{\mathbf{I}}_{k}\in {\mathbb{R}}^{{N}_{1}\times {N}_{2}}\}}_{k=1}^{K}$ denote a set of corrupted images from $K$ sensors, and let $\tilde{\mathrm{\Gamma}}={\{{\tilde{\mathbf{I}}}_{k}\in {\mathbb{R}}^{({N}_{1}/{2}^{L})\times ({N}_{2}/{2}^{L})}\}}_{k=1}^{K}$ be the corresponding set of low-frequency subimages computed using the LWT, where $L$ is the number of LWT layers. For simplicity, we assume square images so that ${N}_{1}/{2}^{L}={N}_{2}/{2}^{L}\stackrel{\mathrm{def}}{=}N$. Stack all $N$ columns of each ${\tilde{\mathbf{I}}}_{k}$ into a single vector of dimension ${N}^{2}$, then use these vectors as the $K$ columns of a matrix ${\tilde{\mathbf{I}}}_{D}$. After normalizing the data, we denote by ${i}_{\ell k}$ the $(\ell ,k)$ element of ${\tilde{\mathbf{I}}}_{D}$,

## (7)

$${\tilde{\mathbf{I}}}_{D}=\left(\begin{array}{cccc}{i}_{11}& {i}_{12}& \cdots & {i}_{1K}\\ {i}_{21}& {i}_{22}& \cdots & {i}_{2K}\\ \vdots & \vdots & \ddots & \vdots \\ {i}_{{N}^{2}1}& {i}_{{N}^{2}2}& \cdots & {i}_{{N}^{2}K}\end{array}\right).$$

The observation matrix is decomposed into a low-rank part ${\tilde{\mathbf{I}}}_{A}$ and a sparse error part ${\tilde{\mathbf{I}}}_{E}$,

## (8)

$${\tilde{\mathbf{I}}}_{D}={\tilde{\mathbf{I}}}_{A}+{\tilde{\mathbf{I}}}_{E},$$

and the corresponding RPCA problem is

## (9)

$$\underset{{\tilde{\mathbf{I}}}_{A},{\tilde{\mathbf{I}}}_{E}}{\mathrm{min}}{\Vert {\tilde{\mathbf{I}}}_{A}\Vert}_{*}+\lambda {\Vert {P}_{\mathrm{\Omega}}({\tilde{\mathbf{I}}}_{E})\Vert}_{1}\phantom{\rule[-0.0ex]{1em}{0.0ex}}\text{subject to}\text{\hspace{0.17em}\hspace{0.17em}}{\tilde{\mathbf{I}}}_{A}+{\tilde{\mathbf{I}}}_{E}={\tilde{\mathbf{I}}}_{D},$$

with augmented Lagrangian

## (10)

$$L({\tilde{\mathbf{I}}}_{A},{\tilde{\mathbf{I}}}_{E},\mathbf{Y},\mu )={\Vert {\tilde{\mathbf{I}}}_{A}\Vert}_{*}+\lambda {\Vert {P}_{\mathrm{\Omega}}({\tilde{\mathbf{I}}}_{E})\Vert}_{1}+\mathrm{Tr}\{{\mathbf{Y}}^{T}({\tilde{\mathbf{I}}}_{D}-{\tilde{\mathbf{I}}}_{A}-{\tilde{\mathbf{I}}}_{E})\}+\frac{\mu}{2}{\Vert {\tilde{\mathbf{I}}}_{D}-{\tilde{\mathbf{I}}}_{A}-{\tilde{\mathbf{I}}}_{E}\Vert}_{\mathcal{F}}^{2}.$$

A flowchart of the IALM algorithm is shown in Fig. 3. Definitions of the notation used in the flowchart appear in Table 1. The algorithm is recursive, with superscript $j$ indicating the iteration number. The quantity ${\tilde{\mathbf{I}}}_{A}^{({j}^{\prime})}\in {\mathbb{R}}^{{N}^{2}\times K}$ is the recovered low-rank matrix for some sufficiently large $j$, say ${j}^{\prime}$. A reasonable strategy for transforming the resulting ${\tilde{\mathbf{I}}}_{A}^{({j}^{\prime})}$ to the final low-frequency subimage is to unwrap its first column to form the original $N\times N$ image structure. The final low-frequency subimage is denoted ${\tilde{\mathbf{I}}}_{\partial}$.

## Table 1

Notation used in the IALM algorithm.

Notation | Definition |
---|---|
${\tilde{\mathbf{I}}}_{D}$ | Low-frequency subimage observation matrix |
${\tilde{\mathbf{I}}}_{E}^{(j)}$ | Error (sparse) matrix, iteration $j$ |
${\tilde{\mathbf{I}}}_{A}^{(j)}$ | Recovered low-rank subimage matrix, iteration $j$ |
${\mathbf{Y}}^{(j)}$ | Lagrange multiplier matrix, iteration $j$ |
$\tau $ | Mean-squared-error tolerance bound |
$\nabla (\mathbf{X})$ | Singular value decomposition (SVD) of general matrix $\mathbf{X}$ |
$\mathbf{U}$ and $\mathbf{V}$ | Customary notation for orthogonal matrices of SVD |
$\mathbf{S}$ | Customary notation for diagonal matrix of singular values |
${\mathbf{S}}_{\epsilon}[x]$ | Soft-shrinkage operator applied to scalar $x$:^{15} ${\mathbf{S}}_{\epsilon}[x]\stackrel{\mathrm{def}}{=}\{\begin{array}{cc}x-\epsilon ,& x>\epsilon \\ x+\epsilon ,& x<-\epsilon \\ 0,& \text{otherwise}\end{array}$, with $x\in \mathbb{R}$, $\epsilon >0$ |

In this process, ${\mathbf{Y}}^{(0)}$ is initialized to ${\tilde{\mathbf{I}}}_{D}/\mathrm{max}({\Vert {\tilde{\mathbf{I}}}_{D}\Vert}_{2},{\Vert {\tilde{\mathbf{I}}}_{D}\Vert}_{\infty})$; ${\tilde{\mathbf{I}}}_{E}^{(0)}$ is initialized to the zero matrix of the same size as ${\tilde{\mathbf{I}}}_{D}$; $\lambda $ is initialized to $1/\sqrt{m}$, where $m$ is the column length (number of rows) of ${\tilde{\mathbf{I}}}_{D}$; the tolerance for the stopping criterion, $\tau $, is initialized to $1\times {10}^{-7}$; and $j$ is set to zero for the loop computation.
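The iteration of Fig. 3 follows the standard IALM scheme. A minimal NumPy sketch is given below; the $\mu$ update schedule and the $\lambda$ default follow common IALM practice^{15} and are assumptions where the text leaves them unspecified:

```python
import numpy as np

def soft_shrink(X, eps):
    """Soft-shrinkage operator S_eps, applied elementwise."""
    return np.sign(X) * np.maximum(np.abs(X) - eps, 0.0)

def ialm_rpca(D, lam=None, tol=1e-7, max_iter=500):
    """Sketch of the inexact augmented Lagrange multiplier method:
    split D into a low-rank part A and a sparse error part E."""
    m, n = D.shape
    if lam is None:
        lam = 1.0 / np.sqrt(max(m, n))
    norm2 = np.linalg.norm(D, 2)
    Y = D / max(norm2, np.abs(D).max() / lam)   # dual variable initialization
    E = np.zeros_like(D)
    mu, rho = 1.25 / norm2, 1.5                 # common IALM defaults (assumed)
    for _ in range(max_iter):
        # A-step: singular-value shrinkage of D - E + Y/mu.
        U, s, Vt = np.linalg.svd(D - E + Y / mu, full_matrices=False)
        A = (U * np.maximum(s - 1.0 / mu, 0.0)) @ Vt
        # E-step: elementwise soft shrinkage.
        E = soft_shrink(D - A + Y / mu, lam / mu)
        R = D - A - E                           # constraint residual
        Y = Y + mu * R
        mu *= rho
        if np.linalg.norm(R, 'fro') <= tol * np.linalg.norm(D, 'fro'):
            break
    return A, E
```

Applied to the ${N}^{2}\times K$ matrix ${\tilde{\mathbf{I}}}_{D}$, the recovered low-rank factor plays the role of the fused low-frequency subband.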

### 3.3. High-Frequency Fusion Based on Self-Adapting Regional Variance Estimation

Processing of high-frequency wavelet coefficients has a direct effect on the salient details that determine the overall clarity of the image. Because the variance of a subimage characterizes the degree of gray-level change in the corresponding image region, variance is a key indicator in the processing of high-frequency components. In addition, there is generally a strong correlation among adjacent pixels in a local area, so that a significant amount of information is shared among neighboring pixels. When variances in corresponding local regions across subimages differ widely, a high-frequency fusion rule that selects the source image of greatest variance has been shown to be effective at preserving image features.^{8}^{,}^{9} However, if the local variances of two source images are similar, this method can lose information by discarding subtle variations among different subimages. An empirical procedure has been developed in which thresholding is used to segregate local areas that have sufficiently large variance differences; such an area is then represented by the single maximum-variance set member. The selection of this difference threshold, $\xi $, is discussed below.

Let us return to the original set of images $\mathrm{\Gamma}={\{{\mathbf{I}}_{k}\in {\mathbb{R}}^{{N}_{1}\times {N}_{2}}\}}_{k=1}^{K}$. Denote by ${\mathbf{I}}_{k}(x,y)$ the gray-scale value at pixel $(x,y)$ in the $k$’th image. Also let ${\mathbf{V}}_{k}\in {\mathbb{R}}^{{N}_{1}\times {N}_{2}}$ denote a matrix associated with image ${\mathbf{I}}_{k}$ in which element ${\mathbf{V}}_{k}(x,y)$ contains the normalized sample variance of the $3\times 3$ window of pixels centered on pixel $(x,y)$; normalization means that all variance values lie in the interval [0,1]. Without loss of generality, we select images ${\mathbf{I}}_{1}$ and ${\mathbf{I}}_{2}$ with which to describe the steps of the high-frequency fusion algorithm:

1. Compute normalized sample variance matrices ${\mathbf{V}}_{1}$ and ${\mathbf{V}}_{2}$. Then ${\mathbf{V}}_{k}(x,y)$ denotes the normalized variance value of pixel $(x,y)$ in image ${\mathbf{I}}_{k}$ for $k=1$, 2.

2. Implement the LWT over $L=2$ layers on ${\mathbf{I}}_{1}$, ${\mathbf{I}}_{2}$, ${\mathbf{V}}_{1}$, and ${\mathbf{V}}_{2}$. Multiresolution structures for each matrix are obtained: ${\mathbf{I}}_{1}^{\theta}$, ${\mathbf{I}}_{2}^{\theta}$, ${\mathbf{V}}_{1}^{\theta}$, and ${\mathbf{V}}_{2}^{\theta}$, in which the superscript $\theta $ takes one of three designators of direction: horizontal ($h$), vertical ($v$), or diagonal ($d$). For each direction, form the variance difference

## (11)

$${\mathrm{\Delta}}_{V}^{\theta}(x,y)={\mathbf{V}}_{1}^{\theta}(x,y)-{\mathbf{V}}_{2}^{\theta}(x,y),$$

in which ${\mathbf{V}}_{k}^{\theta}(x,y)$ indicates the normalized variance of the $k$’th image in direction $\theta $. Let ${\mathrm{\Delta}}_{V}(x,y)$ denote the sum of the differences in the horizontal, vertical, and diagonal directions

## (12)

$${\mathrm{\Delta}}_{V}(x,y)=[{\mathbf{V}}_{1}^{h}(x,y)-{\mathbf{V}}_{2}^{h}(x,y)]+[{\mathbf{V}}_{1}^{v}(x,y)-{\mathbf{V}}_{2}^{v}(x,y)]+[{\mathbf{V}}_{1}^{d}(x,y)-{\mathbf{V}}_{2}^{d}(x,y)].$$

3. Compare $|{\mathrm{\Delta}}_{V}(x,y)|$ with the threshold $\xi $. If $|{\mathrm{\Delta}}_{V}(x,y)|\ge \xi $, take the pixel value from the image with the larger variance as the fused wavelet coefficient; otherwise, use a weighted sum to compute the coefficient. With ${D}_{F}^{\theta}$ denoting the multiresolution structure after fusion, namely

## (13)

$${D}_{F}^{\theta}(x,y)=\{\begin{array}{ll}{\mathbf{I}}_{1}^{\theta}(x,y),& \text{when}\text{\hspace{0.17em}\hspace{0.17em}}{\mathrm{\Delta}}_{V}(x,y)>0\text{\hspace{0.17em}}\text{and}\text{\hspace{0.17em}}|{\mathrm{\Delta}}_{V}(x,y)|\ge \xi ,\\ {\mathbf{I}}_{2}^{\theta}(x,y),& \text{when}\text{\hspace{0.17em}\hspace{0.17em}}{\mathrm{\Delta}}_{V}(x,y)<0\text{\hspace{0.17em}}\text{and}\text{\hspace{0.17em}}|{\mathrm{\Delta}}_{V}(x,y)|\ge \xi ,\\ {\mathbf{V}}_{1}^{\theta}(x,y){\mathbf{I}}_{1}^{\theta}(x,y)+{\mathbf{V}}_{2}^{\theta}(x,y){\mathbf{I}}_{2}^{\theta}(x,y),& \text{when}\text{\hspace{0.17em}\hspace{0.17em}}|{\mathrm{\Delta}}_{V}(x,y)|<\xi .\end{array}$$

In this study, the value of $\xi $ is set to 0.8. This means that when the normalized variance of the pixel $(x,y)$ in one image is much greater than in the other, the source image of greater variance is selected. Otherwise, the coefficient is obtained by weighted averaging as in Eq. (13). This fusion rule for high-frequency subimages not only results in the retention of details, but it also prevents the loss of image information caused by redundant data. It ensures the consistency of the fused image.
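The steps above can be sketched for a single directional subband as follows. For brevity, the per-direction variance difference stands in for the three-direction sum of Eq. (12), and the window handling and normalization are illustrative choices:

```python
import numpy as np

def local_variance(I, win=3):
    """Local variance of each pixel over a win x win window, scaled to [0, 1]."""
    pad = win // 2
    P = np.pad(np.asarray(I, dtype=float), pad, mode='edge')
    # Gather all window offsets, then take the variance across them per pixel.
    stack = np.stack([P[dy:dy + I.shape[0], dx:dx + I.shape[1]]
                      for dy in range(win) for dx in range(win)])
    V = stack.var(axis=0)
    vmax = V.max()
    return V / vmax if vmax > 0 else V

def fuse_highpass(D1, D2, V1, V2, xi=0.8):
    """Variance-guided selection/weighting for one high-frequency subband."""
    dV = V1 - V2
    F = V1 * D1 + V2 * D2                 # weighted sum when variances are close
    F = np.where(dV >= xi, D1, F)         # image 1 clearly more active
    F = np.where(dV <= -xi, D2, F)        # image 2 clearly more active
    return F
```

The selection branches fire only where one image's local activity clearly dominates; elsewhere the variance-weighted sum retains contributions from both subimages.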

In summary, IALM is used to determine the low-frequency component to be fused, and self-adapting regional variance is employed to estimate the high-frequency contribution. The fused wavelet coefficients are combined by ILWT to create the final result.

## 4. Experimental Results and Analysis

### 4.1. Comparison of Robust Principal Component Analysis Algorithms

To validate the new procedure, four groups of experiments are reported. The objective of the first is to compare the performance of mainstream RPCA algorithms with that of IALM. The results are shown in Table 2. Two mainstream algorithms, singular value thresholding (SVT) and the accelerated proximal gradient (APG) method, are compared with IALM.

## Table 2

Comparison of RPCA algorithms.

N | Algorithm | r | NMSE | #SVD | Time (s) |
---|---|---|---|---|---|
500 | SVT | 25 | $1.35\times {10}^{-4}$ | 78 | 13.72 |
500 | APG | 25 | $2.33\times {10}^{-5}$ | 56 | 10.34 |
500 | IALM | 25 | $4.73\times {10}^{-7}$ | 34 | 3.32 |
600 | SVT | 30 | $1.27\times {10}^{-4}$ | 77 | 19.02 |
600 | APG | 30 | $2.11\times {10}^{-5}$ | 58 | 16.92 |
600 | IALM | 30 | $4.61\times {10}^{-7}$ | 34 | 5.64 |
700 | SVT | 35 | $1.36\times {10}^{-4}$ | 74 | 24.77 |
700 | APG | 35 | $2.25\times {10}^{-5}$ | 58 | 26.25 |
700 | IALM | 35 | $4.62\times {10}^{-7}$ | 34 | 8.41 |
800 | SVT | 40 | $1.26\times {10}^{-4}$ | 75 | 33.95 |
800 | APG | 40 | $2.14\times {10}^{-5}$ | 59 | 42.14 |
800 | IALM | 40 | $4.30\times {10}^{-7}$ | 34 | 12.09 |
900 | SVT | 45 | $1.27\times {10}^{-4}$ | 75 | 42.52 |
900 | APG | 45 | $2.03\times {10}^{-5}$ | 60 | 59.24 |
900 | IALM | 45 | $4.45\times {10}^{-7}$ | 34 | 16.78 |
1000 | SVT | 50 | $1.25\times {10}^{-4}$ | 73 | 52.65 |
1000 | APG | 50 | $2.16\times {10}^{-5}$ | 60 | 72.26 |
1000 | IALM | 50 | $4.45\times {10}^{-7}$ | 34 | 22.54 |
2000 | SVT | 100 | $1.30\times {10}^{-4}$ | 71 | 257.17 |
2000 | APG | 100 | $2.05\times {10}^{-4}$ | 64 | 387.42 |
2000 | IALM | 100 | $4.39\times {10}^{-7}$ | 34 | 154.43 |

In this table, the input dataset, the observation matrix $\mathbf{D}$ of Eq. (6), is of dimension $N\times N$ and has randomly missing or broken pixels. For fair comparison, we set $r$, the rank of $\mathbf{A}$, to $0.05N$, and define the normalized mean squared error (NMSE) as

## (14)

$$\mathrm{NMSE}=\frac{{\Vert \mathbf{D}-\mathbf{A}-\mathbf{E}\Vert}_{\mathcal{F}}}{{\Vert \mathbf{D}\Vert}_{\mathcal{F}}}.$$

In Table 2, the column labeled #SVD indicates the number of iterations, and the Time column displays the number of seconds required to run the algorithm. The oversampling rate $(p/{d}_{r})$ is six, implying $\sim 60\%$ downsampling of the data appearing in the observation matrix, in which ${d}_{r}=r(2N-r)$ indicates the number of degrees of freedom of a rank-$r$ matrix. $p$ elements from $\mathbf{A}$ are then sampled uniformly to form the known samples in $\mathbf{D}$.^{16}
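The sampling bookkeeping can be checked directly; this snippet simply reproduces the arithmetic for the $N=500$ row of Table 2:

```python
# Degrees of freedom of a rank-r N x N matrix, and the implied sampling
# fraction for an oversampling rate of six (N = 500 row of Table 2).
N, r = 500, 25
d_r = r * (2 * N - r)     # 24,375 degrees of freedom
p = 6 * d_r               # 146,250 sampled entries
frac = p / N**2           # 0.585: roughly 60% of the entries are sampled
```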

Among the three algorithms, IALM exhibits superior performance on all three measures. The results indicate that run time increases proportionately with ${N}^{2}$. Note, however, that #SVD is not dependent upon $N$.

### 4.2. Fusion of Clean Images

For convenience, we will refer to the new algorithm as ${\mathrm{IALM}}_{\partial}$. The next two groups of experiments involve processing of left-focus–right-focus images and visible-light–infrared-light images, comparing different image-fusion algorithms with ${\mathrm{IALM}}_{\partial}$. The source images are not corrupted by noise or errors. The spline $5/3$ wavelet basis^{23} was selected for the LWT process. Through factorization, the equivalent lifting wavelet was obtained. The experimental results are shown in Figs. 4 and 5.

The first group of source images has differing focus; the second contains visible-light and infrared images. Figure 4(a) shows a left-focused source image, whereas Fig. 4(b) is right-focused; Fig. 5(a) is a visible-light source image, while Fig. 5(b) uses an infrared source. Shown in Figs. 4(c)–4(f) and 5(c)–5(f) are, respectively, the fusion results of the weighted average over low frequencies with the local-area-maximum method over high frequencies (WA_LM), the improved pulse-coupled neural network (PCNN) method,^{24}^{,}^{25} PCA weighting over low frequencies with the self-adaptive regional variance estimation method over high frequencies (${\mathrm{PCA}}_{\partial}$), and the algorithm developed in this paper (${\mathrm{IALM}}_{\partial}$).

The processed images suggest that a clearer fused image is obtained through ${\mathrm{IALM}}_{\partial}$. More detailed information is evident, e.g., in Figs. 4(e) and 4(f), in which the image information on the left edge of the large alarm clock is richer than the same feature in the other fused images. Algorithm ${\mathrm{IALM}}_{\partial}$ is thus at least as effective as algorithm ${\mathrm{PCA}}_{\partial}$, while recovering more detailed information (Table 2). Furthermore, the new algorithm achieves fusion results with finer detail. For example, the barbed wire in Fig. 5(d) is more clearly visible than the same feature in Fig. 5(c), while the person in Fig. 5(c) is better defined than in Fig. 5(d); in Figs. 5(e) and 5(f), the barbed wire, the person, and even the smoke in the upper-right corner of the image are all easier to identify than in the other results. This enhanced clarity admits more effective subsequent processing.

The following objective criteria were evaluated:

1. The “mutual information” (MI) is a measure of statistical dependence that can be interpreted as the amount of information transmitted from the source images to the fused image.^{26} To assess the MI between source image ${I}_{1}$ and the fused image, say ${I}_{F}$, we use the estimator

## (15)

$${M}_{1,F}=\sum _{{l}_{1},{l}_{F}}{h}_{1,F}({l}_{1},{l}_{F})\text{\hspace{0.17em}}\mathrm{log}\frac{{h}_{1,F}({l}_{1},{l}_{F})}{{h}_{1}({l}_{1}){h}_{F}({l}_{F})},$$

where ${h}_{1}({l}_{1})$ and ${h}_{F}({l}_{F})$ represent the normalized histograms of source image ${I}_{1}$ and fused image ${I}_{F}$, respectively; ${l}_{1}$ and ${l}_{F}$ each take integer values indexing one of the ${2}^{8}$ gray levels $\{\mathrm{0,1},\dots ,255\}$; and ${h}_{1,F}({l}_{1},{l}_{F})$ denotes the jointly normalized histogram of ${I}_{1}$ and ${I}_{F}$. Similarly, ${M}_{2,F}$ denotes the mutual information between image ${I}_{2}$ and the fused image ${I}_{F}$. The MI between the source images ${I}_{1}$ and ${I}_{2}$ and the fused image ${I}_{F}$ is

## (16)

$${M}_{F}={M}_{1,F}+{M}_{2,F}.$$

A larger MI value indicates that the fused image includes more information from the original images.

2. The “average gradient” (AG), or “clarity,” reflects the preservation of gray-level changes in the image. With dimensions ${N}_{1}={N}_{2}=N$, larger values of AG imply greater clarity and edge preservation. Gray-level differentials are important, e.g., in texture rendering. The AG is defined as

## (17)

$$\nabla \overline{g}=\frac{1}{{N}^{2}}\sum _{x=1}^{N-1}\sum _{y=1}^{N-1}\sqrt{[\mathrm{\Delta}{I}_{x}^{2}(x,y)+\mathrm{\Delta}{I}_{y}^{2}(x,y)]/2},$$

where $\mathrm{\Delta}{I}_{x}(x,y)=I(x+1,y)-I(x,y)$ and $\mathrm{\Delta}{I}_{y}(x,y)=I(x,y+1)-I(x,y)$ are the gray-value differentials in the coordinate $x$ and $y$ directions, respectively.

3. The “correlation coefficient” (CC) is used to compare two images of the same object (or scene). CC, which measures the correlation (degree of linear coherence) between the original and the fused images, is defined as

## (18)

$${C}_{F,1}=\frac{\sum _{x,y}[{I}_{F}(x,y)-{\overline{I}}_{F}][{I}_{1}(x,y)-{\overline{I}}_{1}]}{\sqrt{\sum _{x,y}{[{I}_{F}(x,y)-{\overline{I}}_{F}]}^{2}\sum _{x,y}{[{I}_{1}(x,y)-{\overline{I}}_{1}]}^{2}}},$$

where ${I}_{F}(x,y)$ and ${I}_{1}(x,y)$ are the gray levels at pixel $(x,y)$ in the fused and original images, and ${\overline{I}}_{F}$ and ${\overline{I}}_{1}$ denote the average gray levels in the two images.

4. The “degree of distortion” (DD), a direct indicator of image fidelity, is defined as

## (19)

$${D}_{F,1}=\frac{1}{{N}_{1}\times {N}_{2}}\sum _{x=1}^{{N}_{1}}\sum _{y=1}^{{N}_{2}}|{I}_{F}(x,y)-{I}_{1}(x,y)|,$$

in which ${I}_{F}(x,y)$ and ${I}_{1}(x,y)$ are as defined above.

5. The ${Q}^{AB/F}$ metric quantifies the amount of edge information transferred from two source images ${I}_{A}$ and ${I}_{B}$ to a fused image ${I}_{F}$.^{26} It is calculated as

## (20)

$${Q}^{AB/F}=\frac{{\sum}_{x=1}^{{N}_{1}}{\sum}_{y=1}^{{N}_{2}}[{Q}^{AF}(x,y){w}^{A}(x,y)+{Q}^{BF}(x,y){w}^{B}(x,y)]}{{\sum}_{x=1}^{{N}_{1}}{\sum}_{y=1}^{{N}_{2}}[{w}^{A}(x,y)+{w}^{B}(x,y)]},$$

where each image is of size ${N}_{1}\times {N}_{2}$, and $\alpha $ and $\beta $ represent, respectively, edge strength and orientation. ${Q}^{AF}(x,y)$ is the product of ${Q}_{\alpha}^{AF}(x,y)$ and ${Q}_{\beta}^{AF}(x,y)$, which represent, respectively, how well the edge strength and orientation values of pixel $(x,y)$ in ${I}_{A}$ are represented in the fused image ${I}_{F}$. Similarly, ${Q}^{BF}(x,y)$ is the product of ${Q}_{\alpha}^{BF}(x,y)$ and ${Q}_{\beta}^{BF}(x,y)$, which represent how well the edge strength and orientation values of pixel $(x,y)$ in ${I}_{B}$ are represented in ${I}_{F}$. ${w}^{A}(x,y)$ and ${w}^{B}(x,y)$ are weights reflecting the relative importance of ${Q}^{AF}(x,y)$ and ${Q}^{BF}(x,y)$. The dynamic range of ${Q}^{AB/F}$ is $[0,1]$, and it should be as close to 1 as possible.

6. The “peak signal-to-noise ratio” (PSNR) is the ratio between the maximum possible power of a signal and the power of the distorting noise that affects the quality of its representation. This objective metric compares the effectiveness of algorithms by measuring the proximity of the fused image to the original image. The PSNR is computed as

## (21)

$$\mathrm{PSNR}=10\text{\hspace{0.17em}}\mathrm{lg}\text{\hspace{0.17em}}\frac{{(L-1)}^{2}}{{\mathrm{RMSE}}^{2}},$$

where RMSE denotes the root-mean-square error between the reference and fused images and $L=256$ is the number of gray levels used in representing an image. A larger PSNR value indicates a better fusion result.
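Two of these criteria, MI and PSNR, can be sketched directly in NumPy. The histogram binning and the logarithm base are implementation choices the text leaves unspecified:

```python
import numpy as np

def mutual_information(I1, IF, bins=256):
    """Histogram estimator of the MI criterion of Eq. (15).
    Natural log is an assumption; the text does not fix the base."""
    h12, _, _ = np.histogram2d(I1.ravel(), IF.ravel(), bins=bins,
                               range=[[0, 256], [0, 256]])
    h12 /= h12.sum()                      # jointly normalized histogram
    h1 = h12.sum(axis=1)                  # marginal of the source image
    hF = h12.sum(axis=0)                  # marginal of the fused image
    nz = h12 > 0                          # avoid log(0) on empty bins
    return float((h12[nz] * np.log(h12[nz] / np.outer(h1, hF)[nz])).sum())

def psnr(ref, fused, L=256):
    """PSNR of Eq. (21), in decibels (assumes ref and fused differ somewhere)."""
    rmse = np.sqrt(np.mean((np.asarray(ref, float) - np.asarray(fused, float)) ** 2))
    return 10.0 * np.log10((L - 1) ** 2 / rmse ** 2)
```

For a two-level image fused onto itself, the MI estimate reduces to the entropy of the gray-level distribution, which is a convenient sanity check.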

Tables 3 and 4 report the objective performance evaluation measures for the four fusion algorithms.

## Table 3

Experimental objective evaluation measures of Fig. 4.

Evaluation indicator | WA_LM | PCNN | PCA∂ | IALM∂ |
---|---|---|---|---|
MI | 6.1604 | 7.0814 | 7.2788 | 7.5191 |
AG | 4.2067 | 6.8089 | 6.8096 | 6.8089 |
CC | 0.9768 | 0.9749 | 0.9836 | 0.9927 |
DD | 3.9762 | 3.9089 | 3.6406 | 3.5743 |
${Q}^{AB/F}$ | 0.6133 | 0.6897 | 0.6987 | 0.6929 |
PSNR | 22.6195 | 28.1095 | 28.1270 | 31.3846 |

## Table 4

Evaluation comparison of Fig. 5.

Evaluation indicator | WA_LM | PCNN | PCA∂ | IALM∂ |
---|---|---|---|---|
MI | 2.7595 | 3.7565 | 3.8953 | 3.8938 |
AG | 6.8556 | 7.8666 | 7.9206 | 8.1982 |
CC | 0.7873 | 0.8729 | 0.8808 | 0.8976 |
DD | 17.1008 | 11.0016 | 10.9259 | 10.2100 |
${Q}^{AB/F}$ | 0.5988 | 0.6978 | 0.6798 | 0.7548 |
PSNR | 20.4271 | 24.3396 | 25.1234 | 25.3540 |

Relative to the other algorithms, ${\mathrm{IALM}}_{\partial}$ obtains the largest MI and AG for the fused images, suggesting that this algorithm can provide fused images with higher information content and better clarity. The objective indicators of fidelity to the source image also favor the combination of IALM and self-adaptive regional variance estimation.

### 4.3. Fusion of Corrupted Images

To assess whether ${\mathrm{IALM}}_{\partial}$ is robust to missing data and image corruption, we again use the multifocus clock images for processing. At a 0.15 error rate, 15% of the pixels of the original image are corrupted, and an additional 15% are missing (gray-level values set to zero). This implies an effective data corruption rate of 30%. The results of the test of the four algorithms are shown in Fig. 6. Figures 6(a) and 6(b) show, respectively, Fig. 4(a) with errors and Fig. 4(b) with errors. Figure 6(c) shows the result of using ${\mathrm{PCA}}_{\partial}$ without a denoising filter, while Fig. 6(d), labeled ${\mathrm{PCA}}_{\partial ,\mathcal{F}}$, shows the result of using ${\mathrm{PCA}}_{\partial}$ with an adaptive median filter. The result of using PCNN with an adaptive median filter is labeled ${\mathrm{PCNN}}_{\mathcal{F}}$ and appears in Fig. 6(e). To achieve this outcome, we use the adaptive median filtering strategy proposed by Chen and Wu^{27} to identify pixels corrupted by impulsive noise and replace each damaged pixel by the median of its neighborhood. The adaptive median filter can employ varying window sizes to accommodate different noise conditions and to reduce distortions like excessive thinning or thickening of object boundaries. Figure 6(f) shows results using ${\mathrm{IALM}}_{\partial}$ without denoising. The clarity of the result in Fig. 6(f) relative to those in Figs. 6(c), 6(d), and 6(e) is quite apparent. The empirical image quality tracks the improvement in PSNR as reported in the captions. Figures 6(g) and 6(h) show 400% blow-ups of portions of Figs. 6(e) and 6(f).

These results demonstrate the ability of ${\mathrm{IALM}}_{\partial}$ to recover the missing or erroneous data, while preserving image detail in both corrupted and clean images.

## 5. Conclusions

Traditional convolution-based wavelet transform processing for image fusion has shortcomings, including large memory requirements and high computational complexity. The approach to fusion taken in this research uses different fusion rules for the low-frequency and high-frequency decomposition components represented on a lifting wavelet basis. Low-frequency components are characterized using matrix completion and the RPCA method IALM, whereas the high-frequency components critical for image detail are represented by taking into account the variance differences among proximal neighborhoods. The strong correlation between pixels in a local area is captured by a self-adaptive regional variance assessment.

Experimental results show that the new algorithm not only improves the amount of information and the correlation between the fused and source images, but also reduces the level of distortion. Significant clarity improvement relative to state-of-the-art methods is also demonstrated for corrupted images.

## Acknowledgments

This research was supported in part by the National Natural Science Foundation of China (Grant No. 30970780) and by the General Program of Science and Technology Development Project of Beijing Municipal Education Commission of China (Grant No. KM201110005033). The efforts of J.D. and B.D.F. were supported in part by the U.S. National Science Foundation under Cooperative Agreement DBI-0939454. Any opinions, conclusions, or recommendations expressed are those of the authors and do not necessarily reflect the views of the NSF. This work was undertaken in part while Z.W. was a visiting research scholar at Michigan State University. The authors thank the Beijing University of Technology’s Multimedia Information Processing Lab for assistance.

## References

## Biography

**Zhuozheng Wang** is an associate professor at Beijing University of Technology and a visiting scholar at Michigan State University sponsored by the China Scholarship Council. He received his MS and PhD degrees in electronic engineering from Beijing University of Technology in 2005 and 2013. He is the first author of more than 10 academic papers and has written one book chapter. His current research interests include image processing, electroencephalography, and virtual reality technology. He has been a reviewer and is a member of SPIE.

**J. R. Deller Jr.** is an IEEE fellow and professor of electrical and computer engineering at Michigan State University, where he received the distinguished faculty award in 2004. He received a PhD in biomedical engineering in 1979, an MS degree in electrical and computer engineering in 1976, and an MS degree in biomedical engineering in 1975 from the University of Michigan, and his BS degree in electrical engineering (summa cum laude) in 1974 from Ohio State University. His research interests include statistical signal processing with applications to speech and hearing, genomics, and other aspects of biomedicine.

**Blair D. Fleet** received her BS degree (summa cum laude) from Morgan State University, Baltimore, MD, in 2010, and her MS degree from Michigan State University in 2012, both in electrical engineering. She is a National Science Foundation graduate research fellowship award recipient, as well as a GEM (the National Consortium for graduate degrees for Minorities in Engineering and Science, Inc.) fellow. She is currently pursuing her PhD in electrical engineering at Michigan State University. Her research interests include merging signal/image processing with the evolutionary computation fields to solve challenging engineering processing problems, especially in the biomedical domain.