## 1. Introduction

Due to launch cost and technical limitations, most Earth observation satellites, such as IKONOS, QuickBird, GeoEye, WorldView-2, GaoFen-1, and GaoFen-2, provide a high-resolution (HR) panchromatic (PAN) image and a low-resolution (LR) multispectral (MS) image with several spectral bands.^{1–4} With the increasing requirement for remote sensing data of higher spatial and spectral resolution in various applications, such as feature extraction, land-cover classification, and climate change evaluation, the ability to capture high-quality images at lower cost is becoming increasingly important. Pansharpening is a special type of data fusion that emerged from this practical requirement. It integrates the complementary spectral and spatial characteristics of the provided MS image and PAN image into the desired product.^{2} Up to now, a large number of pansharpening approaches have been developed. Broadly, three methodologies have been commonly used, namely, component substitution, high-frequency information injection, and model-based methods.

The basic framework of component substitution methods is to transform the MS image into another space using a suitable transformation, after which the intensity channel or the first principal component is substituted by the PAN image. The classical component substitution methods are intensity–hue–saturation (IHS)^{4} and principal component substitution.^{5} These methods may yield poor results in terms of spectral fidelity,^{6} since an inadequate spectral model of the pansharpening method generates spectral distortion. Some variant improvements to the original IHS-based method have been proposed. Rahmani et al.^{7} proposed an adaptive IHS (AIHS) method, which represents the intensity component by an adaptive linear combination of the MS bands, with the combination coefficients obtained by solving an optimization problem. Masoudi and Kabiri^{8} proposed a new IHS method using texture analysis and genetic algorithm adaptation.

The high-frequency information injection methods can be briefly summarized as the sequential operations of detail extraction and injection.^{9–12} Since most of the directional and structural information is contained in the PAN image, several researchers proposed using wavelets^{11} and contourlet transforms^{12} to capture it from the PAN image. The details missing in the LR MS image can then be extracted from the PAN image and injected. Compared with component substitution methods, these methods preserve spectral information better, but spatial distortions, accompanied by some blurring and artifacts, may occur.

The model-based fusion approach is another important category. On the basis of image restoration, some research regards the solution of the fused image as an inverse optimization problem.^{13–17} Recently, inspired by sparse representation techniques, some researchers have achieved great success in data fusion.^{18–21} The initial work was proposed by Li and Yang.^{18} Then, Jiang et al.^{19} proposed a two-step sparse coding method with patch normalization (PN-TSSC). Zhu and Bamler^{20} proposed a new pansharpening method named sparse fusion of images (SparseFI), which explores the sparse representation of MS image patches in a dictionary trained only from the PAN image. To take into account the signal correlation among individual MS channels, the joint SparseFI (J-SparseFI) algorithm was proposed.^{21} Although these methods perform well, the sparse coefficients of the LR MS image patches are assumed to be identical to those of the corresponding HR MS patches. Due to the complex and varied content of real-world images, the mapping between the sparse coefficients of LR MS and HR MS patches should be more complex.^{22,23}

To address this issue, we propose a pansharpening approach via sparse regression aimed at finding the intrinsic and implicit relationship among the sparse coefficients to improve the robustness and stability of remote sensing image pansharpening. To achieve this goal, on the one hand, we learn a pair of compact dictionaries based on patch pairs sampled from the high- and low-resolution images. The learned dictionary pair can well characterize the structural information of the LR MS and HR MS images. On the other hand, taking the complex relationship between the coding coefficients of the LR MS and HR MS images into account, we model this relationship by a ridge regression and an elastic-net regression: the ridge regression characterizes the relationship of intrapatches, and the elastic-net regression describes the relationship of interpatches. The theoretical analyses and experimental results in this paper indicate that the proposed method can generate competitive fusion results. The flowchart of the proposed method is summarized in Fig. 1.

## 2. Pansharpening via Mapping of Sparse Coefficients

In this section, the scheme of dictionary learning and the ridge regression of intrapatches are introduced first. Thereafter, the interpatch regression mapping based on the elastic-net model is discussed in detail. Finally, we present the HR MS image reconstruction.

### 2.1. Dictionary Learning and the Ridge Regression of Intrapatches

The HR PAN image patches ${\{{\mathbf{X}}_{P}^{(b)}\}}_{b=1}^{B}$ and the corresponding LR PAN image patches ${\{{\mathbf{Y}}_{P}^{(b)}\}}_{b=1}^{B}$ have the sparse coefficients ${\{{\alpha}_{H}^{(b)}\}}_{b=1}^{B}$ and ${\{{\alpha}_{L}^{(b)}\}}_{b=1}^{B}$ under the corresponding dictionaries ${\mathbf{D}}_{H}$ and ${\mathbf{D}}_{L}$, respectively. Inspired by Wang et al.,^{22} we assume that there is an implicit mapping function $\mathbf{M}\in {\mathbf{R}}^{N\times N}$ between the sparse coefficients of LR MS and HR MS patch pairs, and that the sparse coefficients of LR PAN and HR PAN patch pairs share this mapping function (see Fig. 2). The mapping can be estimated by solving the following least squares problem:

## (2)

$$\underset{\mathbf{M}}{\mathrm{arg}\text{\hspace{0.17em}}\mathrm{min}}\sum _{b=1}^{B}{\Vert {\alpha}_{H}^{(b)}-\mathbf{M}{\alpha}_{L}^{(b)}\Vert}_{2}^{2}.$$

Equation (2) is an ordinary least squares (OLS) problem. However, OLS often does poorly in prediction, so penalization techniques have been proposed to improve OLS and obtain solutions with desirable properties; ridge regression^{24} and the lasso^{25} are two popular and representative methods.
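As a toy illustration of why the penalty helps (a numpy sketch with made-up, nearly collinear data, not the paper's setting):

```python
import numpy as np

# Hypothetical toy data: two nearly collinear predictors make the
# normal equations ill-conditioned, so plain least squares is unstable.
A = np.array([[1.0, 1.0],
              [1.0, 1.0001]])
y1 = np.array([2.0, 2.0001])
y2 = np.array([2.0001, 2.0])   # tiny perturbation of y1

def ols(A, y):
    # Plain least squares solution of min ||y - A x||_2^2.
    return np.linalg.lstsq(A, y, rcond=None)[0]

def ridge(A, y, beta=0.1):
    # Ridge solution of min ||y - A x||_2^2 + beta ||x||_2^2,
    # i.e., x = (A^T A + beta I)^{-1} A^T y.
    n = A.shape[1]
    return np.linalg.solve(A.T @ A + beta * np.eye(n), A.T @ y)

# OLS swings wildly under the tiny perturbation; ridge barely moves.
ols_shift = np.linalg.norm(ols(A, y1) - ols(A, y2))
ridge_shift = np.linalg.norm(ridge(A, y1) - ridge(A, y2))
```

The same mechanism motivates the $F$-norm penalty on $\mathbf{M}$ introduced below: it trades a small bias for a stable, well-conditioned solution.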

In our work, to obtain the mapping function $\mathbf{M}$ with a low computational burden and a stable least squares solution, we impose an $F$-norm regularization penalty. Including this regularization term in the above minimization gives

## (3)

$$\underset{\mathbf{M}}{\mathrm{arg}\text{\hspace{0.17em}}\mathrm{min}}\sum _{b=1}^{B}{\Vert {\alpha}_{H}^{(b)}-\mathbf{M}{\alpha}_{L}^{(b)}\Vert}_{2}^{2}+\beta {\Vert \mathbf{M}\Vert}_{F}^{2}.$$

To enforce that the image patch pairs have corresponding sparse coefficients with respect to the HR dictionary ${\mathbf{D}}_{H}$ and the LR dictionary ${\mathbf{D}}_{L}$, the following joint learning model is proposed to find the desired dictionary pair as well as the desired intrapatch mapping:

## (4)

$$\underset{{\mathbf{D}}_{H},{\mathbf{D}}_{L},{\alpha}_{H},{\alpha}_{L},\mathbf{M}}{\mathrm{arg}\text{\hspace{0.17em}}\mathrm{min}}\sum _{b=1}^{B}{\Vert {\mathbf{X}}_{P}^{(b)}-{\mathbf{D}}_{H}{\alpha}_{H}^{(b)}\Vert}_{2}^{2}+\sum _{b=1}^{B}{\Vert {\mathbf{Y}}_{P}^{(b)}-{\mathbf{D}}_{L}{\alpha}_{L}^{(b)}\Vert}_{2}^{2}+{\lambda}_{1}\sum _{b=1}^{B}{\Vert {\alpha}_{H}^{(b)}\Vert}_{1}+{\lambda}_{2}\sum _{b=1}^{B}{\Vert {\alpha}_{L}^{(b)}\Vert}_{1}+{\lambda}_{3}\sum _{b=1}^{B}{\Vert {\alpha}_{H}^{(b)}-\mathbf{M}{\alpha}_{L}^{(b)}\Vert}_{2}^{2}+{\lambda}_{4}{\Vert \mathbf{M}\Vert}_{F}^{2}\phantom{\rule{0ex}{0ex}}\mathrm{s.t.}\text{\hspace{0.17em}\hspace{0.17em}}{\Vert {\mathbf{D}}_{H,i}\Vert}_{2}^{2}\le 1,\phantom{\rule[-0.0ex]{1em}{0.0ex}}{\Vert {\mathbf{D}}_{L,i}\Vert}_{2}^{2}\le 1,\phantom{\rule[-0.0ex]{1em}{0.0ex}}i=\mathrm{1,2},\dots ,N.$$

Given ${\mathbf{X}}_{P}=[{\mathbf{X}}_{P}^{(1)},{\mathbf{X}}_{P}^{(2)},\dots ,{\mathbf{X}}_{P}^{(B)}]$, ${\mathbf{Y}}_{P}=[{\mathbf{Y}}_{P}^{(1)},{\mathbf{Y}}_{P}^{(2)},\dots ,{\mathbf{Y}}_{P}^{(B)}]$, ${\alpha}_{H}=[{\alpha}_{H}^{(1)},{\alpha}_{H}^{(2)},\dots ,{\alpha}_{H}^{(B)}]$, ${\alpha}_{L}=[{\alpha}_{L}^{(1)},{\alpha}_{L}^{(2)},\dots ,{\alpha}_{L}^{(B)}]$, ${\Vert {\alpha}_{H}\Vert}_{\mathrm{1,1}}=\sum _{b=1}^{B}{\Vert {\alpha}_{H}^{(b)}\Vert}_{1}$, and ${\Vert {\alpha}_{L}\Vert}_{\mathrm{1,1}}=\sum _{b=1}^{B}{\Vert {\alpha}_{L}^{(b)}\Vert}_{1}$, Eq. (4) can be rewritten as follows:

## (5)

$$\underset{{\mathbf{D}}_{H},{\mathbf{D}}_{L},{\alpha}_{H},{\alpha}_{L},\mathbf{M}}{\mathrm{arg}\text{\hspace{0.17em}}\mathrm{min}}\text{\hspace{0.17em}}{\Vert {\mathbf{X}}_{P}-{\mathbf{D}}_{H}{\alpha}_{H}\Vert}_{F}^{2}+{\Vert {\mathbf{Y}}_{P}-{\mathbf{D}}_{L}{\alpha}_{L}\Vert}_{F}^{2}+{\lambda}_{1}{\Vert {\alpha}_{H}\Vert}_{\mathrm{1,1}}+{\lambda}_{2}{\Vert {\alpha}_{L}\Vert}_{\mathrm{1,1}}+{\lambda}_{3}{\Vert {\alpha}_{H}-\mathbf{M}{\alpha}_{L}\Vert}_{F}^{2}+{\lambda}_{4}{\Vert \mathbf{M}\Vert}_{F}^{2}\phantom{\rule{0ex}{0ex}}\mathrm{s.t.}\text{\hspace{0.17em}\hspace{0.17em}}{\Vert {\mathbf{D}}_{H,i}\Vert}_{2}^{2}\le 1,\phantom{\rule[-0.0ex]{1em}{0.0ex}}{\Vert {\mathbf{D}}_{L,i}\Vert}_{2}^{2}\le 1,\phantom{\rule[-0.0ex]{1em}{0.0ex}}i=\mathrm{1,2},\dots ,N.$$

Following Refs. 26 and 27, the mapping function $\mathbf{M}$ is initialized as the identity matrix and the dictionary pair with the DCT basis, and the optimization is performed in an alternating scheme over three stages, which we call "sparse coefficients update," "dictionary update," and "mapping function update," corresponding to Eqs. (6)–(11). In other words, Eq. (5) is solved by splitting it into three subproblems.
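As a minimal illustration of this alternating scheme, the following numpy sketch implements simplified versions of the three stages (this is not the authors' implementation; ISTA is used here as one standard solver for the $\ell_1$ subproblems, and all dimensions, step sizes, and parameter values are illustrative):

```python
import numpy as np

def soft_threshold(v, t):
    # Entrywise shrinkage, the proximal operator of t * ||.||_1.
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def update_codes(Xp, Yp, Dh, Dl, aH, aL, M,
                 lam1=0.01, lam2=0.01, lam3=0.1, mu=0.02, iters=100):
    # Stage 1: ISTA steps on the subproblems of Eqs. (6) and (7)
    # with the dictionaries and the mapping M held fixed.
    for _ in range(iters):
        gH = 2 * Dh.T @ (Dh @ aH - Xp) + 2 * lam3 * (aH - M @ aL)
        aH = soft_threshold(aH - mu * gH, mu * lam1)
        gL = 2 * Dl.T @ (Dl @ aL - Yp) - 2 * lam3 * M.T @ (aH - M @ aL)
        aL = soft_threshold(aL - mu * gL, mu * lam2)
    return aH, aL

def update_dicts(Xp, Yp, aH, aL):
    # Stage 2: least squares refit of the dictionaries, then scale
    # each atom so that the constraint ||D_i||_2 <= 1 holds.
    Dh = Xp @ np.linalg.pinv(aH)
    Dl = Yp @ np.linalg.pinv(aL)
    Dh /= np.maximum(np.linalg.norm(Dh, axis=0, keepdims=True), 1.0)
    Dl /= np.maximum(np.linalg.norm(Dl, axis=0, keepdims=True), 1.0)
    return Dh, Dl

def update_mapping(aH, aL, beta=0.1):
    # Stage 3: the ridge problem of Eq. (11), solved in closed form:
    # M* = aH aL^T (aL aL^T + beta I)^{-1}.
    n = aL.shape[0]
    return aH @ aL.T @ np.linalg.inv(aL @ aL.T + beta * np.eye(n))
```

One outer iteration applies the three stages in turn; in practice they are repeated until the objective of Eq. (5) stops decreasing.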

In the sparse coefficients update stage, the mapping function $\mathbf{M}$ and the dictionaries $\{{\mathbf{D}}_{H},{\mathbf{D}}_{L}\}$ are fixed, and the sparse coefficients ${\alpha}_{H}$ and ${\alpha}_{L}$ are obtained as follows:

## (6)

$$\underset{{\alpha}_{H}}{\mathrm{arg}\text{\hspace{0.17em}}\mathrm{min}}{\Vert {\mathbf{X}}_{P}-{\mathbf{D}}_{H}{\alpha}_{H}\Vert}_{F}^{2}+{\lambda}_{1}{\Vert {\alpha}_{H}\Vert}_{\mathrm{1,1}}+{\lambda}_{3}{\Vert {\alpha}_{H}-\mathbf{M}{\alpha}_{L}\Vert}_{F}^{2},$$

## (7)

$$\underset{{\alpha}_{L}}{\mathrm{arg}\text{\hspace{0.17em}}\mathrm{min}}{\Vert {\mathbf{Y}}_{P}-{\mathbf{D}}_{L}{\alpha}_{L}\Vert}_{F}^{2}+{\lambda}_{2}{\Vert {\alpha}_{L}\Vert}_{\mathrm{1,1}}+{\lambda}_{3}{\Vert {\alpha}_{H}-\mathbf{M}{\alpha}_{L}\Vert}_{F}^{2}.$$

Taking Eq. (6) as an example, it can be rewritten in the standard sparse coding form

## (8)

$$\underset{{\alpha}_{H}}{\mathrm{arg}\text{\hspace{0.17em}}\mathrm{min}}{\Vert {\overline{\mathbf{X}}}_{P}-{\overline{\mathbf{D}}}_{H}{\alpha}_{H}\Vert}_{F}^{2}+{\lambda}_{1}{\Vert {\alpha}_{H}\Vert}_{\mathrm{1,1}},$$

where ${\overline{\mathbf{X}}}_{P}=\left[\begin{array}{c}{\mathbf{X}}_{P}\\ \sqrt{{\lambda}_{3}}\mathbf{M}{\alpha}_{L}\end{array}\right]$ and ${\overline{\mathbf{D}}}_{H}=\left[\begin{array}{c}{\mathbf{D}}_{H}\\ \sqrt{{\lambda}_{3}}\mathbf{I}\end{array}\right]$ are the augmented data matrix and dictionary. This standard lasso problem can be solved by an efficient sparse coding algorithm.^{28}
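The rewriting uses the standard stacking identity; the following snippet (with random, purely illustrative matrices) checks numerically that the augmented formulation matches the objective of Eq. (6):

```python
import numpy as np

np.random.seed(1)
m, n, B = 6, 10, 4          # illustrative sizes
Xp = np.random.randn(m, B)
Dh = np.random.randn(m, n)
M = np.random.randn(n, n)
aL = np.random.randn(n, B)
aH = np.random.randn(n, B)
lam1, lam3 = 0.01, 0.1

# Original objective of Eq. (6).
obj6 = (np.linalg.norm(Xp - Dh @ aH, 'fro')**2
        + lam1 * np.abs(aH).sum()
        + lam3 * np.linalg.norm(aH - M @ aL, 'fro')**2)

# Augmented data matrix and dictionary of Eq. (8):
# stack X_P on sqrt(lam3) * M aL, and D_H on sqrt(lam3) * I.
Xbar = np.vstack([Xp, np.sqrt(lam3) * (M @ aL)])
Dbar = np.vstack([Dh, np.sqrt(lam3) * np.eye(n)])
obj8 = (np.linalg.norm(Xbar - Dbar @ aH, 'fro')**2
        + lam1 * np.abs(aH).sum())
```

So Eq. (6) reduces to a plain lasso in ${\alpha}_{H}$, which any standard sparse coding solver can handle.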

Similar to the sparse coefficients update stage, with the sparse coefficients $\{{\alpha}_{H},{\alpha}_{L}\}$ fixed, we update the dictionaries according to the following equations:

## (9)

$$\underset{{\mathbf{D}}_{H}}{\mathrm{arg}\text{\hspace{0.17em}}\mathrm{min}}{\Vert {\mathbf{X}}_{P}-{\mathbf{D}}_{H}{\alpha}_{H}\Vert}_{F}^{2}\phantom{\rule[-0.0ex]{1em}{0.0ex}}\mathrm{s.t.}\text{\hspace{0.17em}\hspace{0.17em}}{\Vert {\mathbf{D}}_{H,i}\Vert}_{2}^{2}\le 1,\phantom{\rule[-0.0ex]{1em}{0.0ex}}i=\mathrm{1,2},\dots ,N,$$

## (10)

$$\underset{{\mathbf{D}}_{L}}{\mathrm{arg}\text{\hspace{0.17em}}\mathrm{min}}{\Vert {\mathbf{Y}}_{P}-{\mathbf{D}}_{L}{\alpha}_{L}\Vert}_{F}^{2}\phantom{\rule[-0.0ex]{1em}{0.0ex}}\mathrm{s.t.}\text{\hspace{0.17em}\hspace{0.17em}}{\Vert {\mathbf{D}}_{L,i}\Vert}_{2}^{2}\le 1,\phantom{\rule[-0.0ex]{1em}{0.0ex}}i=\mathrm{1,2},\dots ,N.$$

The method in Ref. 29 can be used to solve $\{{\mathbf{D}}_{H},{\mathbf{D}}_{L}\}$.

Finally, with the dictionaries and sparse coefficients fixed, we update the mapping function $\mathbf{M}$:

## (11)

$${\mathbf{M}}^{*}=\underset{\mathbf{M}}{\mathrm{arg}\text{\hspace{0.17em}}\mathrm{min}}{\Vert {\alpha}_{H}-\mathbf{M}{\alpha}_{L}\Vert}_{F}^{2}+\beta {\Vert \mathbf{M}\Vert}_{F}^{2},$$

which has the closed-form solution ${\mathbf{M}}^{*}={\alpha}_{H}{\alpha}_{L}^{\mathrm{T}}{({\alpha}_{L}{\alpha}_{L}^{\mathrm{T}}+\beta \mathbf{I})}^{-1}$.

### 2.2. Learning Interpatch Regression Mapping Based on Elastic-Net Model

The regression mapping model above only expresses the relationship between the sparse coefficients of corresponding (intra-) patches. In this subsection, the relationship between the sparse coefficients ${\alpha}_{H}^{(b)}$ and all of the sparse coefficients ${\left\{{\alpha}_{L}^{(b)}\right\}}_{b=1}^{B}$ of the LR patches is taken into account.

Let ${\left\{{\alpha}_{L}^{(b)}\right\}}_{b=1}^{B}$ be the predictors and ${\alpha}_{H}^{(b)}$ be the response.^{30} Then, the model between response and predictors can be assumed as

## (13)

$${\alpha}_{H}^{(b)}\approx \sum _{p=1}^{B}{\alpha}_{L}^{(p)}{w}_{b}^{(p)},\phantom{\rule[-0.0ex]{2em}{0.0ex}}b=\mathrm{1,2},\dots ,B.$$

Given $\mathbf{A}=[{\alpha}_{L}^{(1)},{\alpha}_{L}^{(2)},\dots ,{\alpha}_{L}^{(B)}]$, the response is centered, and the predictors are standardized. Then, we reformulate Eq. (13) as

## (14)

$${\alpha}_{H}^{(b)}={\mathbf{Aw}}_{b}+\mathbf{n},\phantom{\rule[-0.0ex]{2em}{0.0ex}}b=\mathrm{1,2},\dots ,B,$$

where $\mathbf{n}$ denotes the residual. Based on the elastic-net regression model,^{30} we propose the following model:

## (15)

$${\widehat{\mathbf{w}}}_{b}=\underset{{\mathbf{w}}_{b}}{\mathrm{arg}\text{\hspace{0.17em}}\mathrm{min}}{\Vert {\alpha}_{H}^{(b)}-{\mathbf{Aw}}_{b}\Vert}_{2}^{2}+{\gamma}_{1}{\Vert {\mathbf{w}}_{b}\Vert}_{1}+{\gamma}_{2}{\Vert {\mathbf{w}}_{b}\Vert}_{2}^{2},$$

where ${\gamma}_{1}$ and ${\gamma}_{2}$ are regularization parameters. Let us define $\overline{\mathbf{A}}={(1+{\gamma}_{2})}^{-\frac{1}{2}}\left[\begin{array}{c}\mathbf{A}\\ \sqrt{{\gamma}_{2}}\mathbf{I}\end{array}\right]$, ${\overline{\alpha}}_{H}^{(b)}=\left[\begin{array}{c}{\alpha}_{H}^{(b)}\\ \mathbf{0}\end{array}\right]$, $\gamma =\frac{{\gamma}_{1}}{\sqrt{1+{\gamma}_{2}}}$, and ${\overline{\mathbf{w}}}_{b}=\left[\begin{array}{c}{\mathbf{w}}_{b}\\ \mathbf{0}\end{array}\right]$. Then, the elastic-net model can be described as a lasso-type problem

## (16)

$${\widehat{\mathbf{w}}}_{b}=\underset{{\overline{\mathbf{w}}}_{b}}{\mathrm{arg}\text{\hspace{0.17em}}\mathrm{min}}{\Vert {\overline{\alpha}}_{H}^{(b)}-{\overline{\mathbf{A}}\overline{\mathbf{w}}}_{b}\Vert}_{2}^{2}+\gamma {\Vert {\overline{\mathbf{w}}}_{b}\Vert}_{1}.$$

The forward–backward operator splitting algorithm^{31} is employed to solve the problem

## (17)

$$\{\begin{array}{l}{\mathbf{v}}_{(b)}^{(n)}={\overline{\mathbf{w}}}_{b}^{(n)}-\mu {\overline{\mathbf{A}}}^{\mathrm{T}}({\overline{\mathbf{A}}\overline{\mathbf{w}}}_{b}^{(n)}-{\alpha}_{H}^{(b)})\\ {\overline{\mathbf{w}}}_{b}^{(n+1)}=\underset{{\overline{\mathbf{w}}}_{b}}{\mathrm{arg}\text{\hspace{0.17em}}\mathrm{min}}\text{\hspace{0.17em}}\frac{1}{\mu}{\Vert {\overline{\mathbf{w}}}_{b}-{\mathbf{v}}_{(b)}^{(n)}\Vert}_{2}^{2}+\gamma {\Vert {\overline{\mathbf{w}}}_{b}\Vert}_{1}\end{array},$$

where $\mu$ is the step size. The second subproblem of Eq. (17) has a closed-form solution given by the shrinkage operator^{32}

## (18)

$${\overline{\mathbf{w}}}_{b}^{(n+1)}=\text{shrink}({\mathbf{v}}_{(b)}^{(n)},\frac{\mu \gamma}{2}),$$

where the shrinkage operator $\text{shrink}(\mathbf{v},t)=\mathrm{sign}(\mathbf{v})\mathrm{max}(|\mathbf{v}|-t,0)$ is applied elementwise.

### 2.3. High-Resolution Multispectral Image Reconstruction

After learning the dictionary pair ${\mathbf{D}}_{H}$, ${\mathbf{D}}_{L}$, the intrapatch mapping function ${\mathbf{M}}^{*}$, and the interpatch mapping function ${\mathbf{W}}^{*}$ from the HR PAN image patches and the degraded ones, we can reconstruct the HR MS image under the following assumptions:

1. The input LR MS images ${\{{\mathbf{Y}}_{\mathrm{MS},k}\}}_{k=1}^{K}$ and LR PAN image share the LR dictionary ${\mathbf{D}}_{L}$.

2. The reconstructed HR MS images ${\{{\mathbf{X}}_{\mathrm{MS},k}\}}_{k=1}^{K}$ and HR PAN image share the HR dictionary ${\mathbf{D}}_{H}$.

3. The sparse coefficients of the LR MS images ${\alpha}_{\mathrm{LMS}}$ and HR MS images ${\alpha}_{\mathrm{HMS}}$ share the learned mapping functions ${\mathbf{M}}^{*}$ and ${\mathbf{W}}^{*}$.

The sparse coefficients of the $k$’th band of the LR MS image ${\{{\mathbf{Y}}_{\mathrm{MS},k}\}}_{k=1}^{K}$ can be calculated by the following step (each band is processed independently):

## (20)

$$\underset{{\alpha}_{\mathrm{LMS},k}}{\mathrm{arg}\text{\hspace{0.17em}}\mathrm{min}}{\Vert {\mathbf{Y}}_{\mathrm{MS},k}-{\mathbf{D}}_{L}{\alpha}_{\mathrm{LMS},k}\Vert}_{F}^{2}+{\lambda}_{2}{\Vert {\alpha}_{\mathrm{LMS},k}\Vert}_{\mathrm{1,1}}.$$

Then, we generate the corresponding HR coding coefficients ${\overline{\alpha}}_{\mathrm{HMS},k}$ and ${\widehat{\alpha}}_{\mathrm{HMS},k}$ from ${\alpha}_{\mathrm{LMS},k}$ with the regression functions ${\mathbf{M}}^{*}$ and ${\mathbf{W}}^{*}$, respectively.

The sparse coefficients of the $k$’th band of the HR MS image ${\{{\mathbf{X}}_{\mathrm{MS},k}\}}_{k=1}^{K}$ are then ${\alpha}_{\mathrm{HMS},k}=(1-p)\cdot {\overline{\alpha}}_{\mathrm{HMS},k}+p\cdot {\widehat{\alpha}}_{\mathrm{HMS},k}$, where $p$ is the weight parameter (here, $p=0.45$). Finally, each band can be reconstructed as ${\mathbf{X}}_{\mathrm{MS},k}={\mathbf{D}}_{H}{\alpha}_{\mathrm{HMS},k}$.
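The reconstruction stage can be sketched as follows, assuming the dictionary pair, ${\mathbf{M}}^{*}$, and the interpatch weight matrix ${\mathbf{W}}^{*}$ (whose columns are the ${\mathbf{w}}_{b}$ of Eq. (15)) have already been learned; `sparse_code` is a hypothetical placeholder for any $\ell_1$ solver of Eq. (20):

```python
import numpy as np

def reconstruct_band(Y_ms_k, Dl, Dh, M_star, W_star, sparse_code, p=0.45):
    """Pansharpen one MS band (sketch of Sec. 2.3).

    sparse_code(Y, D) is a placeholder for any l1 solver of Eq. (20);
    coefficient vectors are stored as columns.
    """
    # Step 1: code the LR MS band over the LR dictionary (Eq. (20)).
    a_lms = sparse_code(Y_ms_k, Dl)
    # Step 2a: intrapatch prediction of the HR coefficients via M*.
    a_bar = M_star @ a_lms
    # Step 2b: interpatch prediction; each HR coefficient vector is a
    # weighted combination of all LR coefficient vectors (Eq. (13)).
    a_hat = a_lms @ W_star
    # Step 3: blend the two predictions with weight p.
    a_hms = (1 - p) * a_bar + p * a_hat
    # Step 4: reconstruct the HR band over the HR dictionary.
    return Dh @ a_hms
```

With ${\mathbf{M}}^{*}$ and ${\mathbf{W}}^{*}$ both set to the identity, the function simply re-expresses the LR coefficients over the HR dictionary, which makes the role of the two learned mappings explicit.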

## 3. Experimental Results and Analysis

To assess the performance of the proposed method, both simulated experiments and real data experiments are carried out. The simulated experiments are based on the strategy proposed by Wald et al.^{33} First, the original PAN and MS images are blurred with a low-pass filter and downsampled by a decimation factor of 4 to obtain degraded PAN and MS images. Then, these degraded PAN and MS images are used to yield HR MS images with the same spatial resolution as the original MS images. Finally, the fused HR MS images are compared with the original MS images. QuickBird and WorldView-2 data are employed to test the performance of the proposed method. Five typical evaluation metrics are adopted to quantitatively evaluate the pansharpened results. The correlation coefficient (CC)^{34} and root-mean-square error (RMSE) are calculated for each band between the fused MS images and the reference original MS image. Erreur relative globale adimensionnelle de synthèse (ERGAS)^{33} and ${\mathrm{Q}}_{4}$,^{35} two comprehensive evaluation indexes, provide single measures of the fusion performance over all the MS bands. Furthermore, the spectral angle mapper (SAM) index^{34} is also considered to measure the spectral distortion. A better fusion result tends to achieve smaller RMSE, SAM, and ERGAS values and larger CC and Q4 values. In the real data experiments, since there is no reference HR MS image, the "quality with no reference" (QNR) measurement is used to evaluate the different pansharpened results objectively.^{35}
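For reference, simplified numpy versions of several of these metrics under their common definitions (the exact formulas in Refs. 33 and 34 include details omitted here, and Q4 is left out because it requires a quaternion-based computation):

```python
import numpy as np

def cc(x, y):
    # Correlation coefficient between one fused band and its reference.
    x, y = x.ravel() - x.mean(), y.ravel() - y.mean()
    return (x @ y) / (np.linalg.norm(x) * np.linalg.norm(y))

def rmse(x, y):
    # Root-mean-square error between a fused band and its reference.
    return np.sqrt(np.mean((x - y) ** 2))

def sam(X, Y):
    # Mean spectral angle (degrees) between fused and reference
    # spectra; X and Y have shape (bands, pixels).
    num = np.sum(X * Y, axis=0)
    den = np.linalg.norm(X, axis=0) * np.linalg.norm(Y, axis=0) + 1e-12
    ang = np.arccos(np.clip(num / den, -1.0, 1.0))
    return np.degrees(ang.mean())

def ergas(X, Y, ratio=4):
    # ERGAS: 100/ratio * sqrt(mean over bands of (RMSE_k / mean_k)^2),
    # where ratio is the resolution ratio (4 in these experiments).
    terms = [(rmse(X[k], Y[k]) / Y[k].mean()) ** 2
             for k in range(X.shape[0])]
    return 100.0 / ratio * np.sqrt(np.mean(terms))
```

A perfect fusion gives CC = 1 and RMSE = SAM = ERGAS = 0, matching the ranking rule stated above.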

The proposed method is compared with five popular fusion algorithms: AIHS,^{7} Wavelet,^{11} PN-TSSC,^{19} SparseFI,^{20} and J-SparseFI.^{21} The results of the AIHS method are obtained with the software developed by Rahmani et al.; the implementation is available online in Ref. 36. The default parameters given in the respective implementations are adopted. Two levels of decomposition are used for the Wavelet method. The LR patch size in the SparseFI method is $7\times 7$, and a total of 10,000 patch pairs are selected to construct the dictionary pairs. For the J-SparseFI method, the patch size is $7\times 7$ and the dictionary size is 1000.

For our proposed pansharpening method, several parameters must be selected. In our experiments, we set the weight parameter $p=0.45$; the regularization parameters ${\lambda}_{1}=0.01$, ${\lambda}_{2}=0.01$, ${\lambda}_{3}=0.1$, ${\lambda}_{4}=0.1$, ${\gamma}_{1}=0.3$, and ${\gamma}_{2}=0.5$; the patch size $5\times 5$; and the dictionary size 512. In Sec. 3.1, we provide a recipe for selecting these parameters to achieve a promising pansharpened result. Experimental results using parameters selected according to this recipe are presented through evolution curves and final fused images on different datasets.

### 3.1. Parameter Analysis

In this section, we investigate the effect of the weight parameter $p$ and the regularization parameters ${\lambda}_{1}$, ${\lambda}_{2}$, ${\lambda}_{3}$, ${\lambda}_{4}$ and ${\gamma}_{1}$, ${\gamma}_{2}$ of the proposed method.

1. The effect of the weight parameter $p$: The pansharpening performance of the proposed method is strongly linked to the weight parameter $p$. To demonstrate this, we evaluate the proposed method with different values of $p$, using CC, SAM, and Q4 to determine $p$. The corresponding curves are plotted in Figs. 4 and 5, respectively. As shown in Fig. 4, when $p=0$, which corresponds to using only the ridge regression of intrapatches, the performance of the proposed method is restricted. As $p$ increases, more benefits in performance are gained, which implies that the interpatch regression model is also essential for the fused result. However, the value should not be set too high. Therefore, with a proper weight parameter $p$, the proposed method gains the best results. A similar conclusion can be drawn for the QuickBird data in Fig. 5. Quantitatively, the best pansharpening result is yielded at $p=0.45$.

2. The influence of the regularization parameters ${\lambda}_{1}$, ${\lambda}_{2}$, ${\lambda}_{3}$, ${\lambda}_{4}$ and ${\gamma}_{1}$, ${\gamma}_{2}$: In this section, we probe how the regularization parameters affect the pansharpening performance of the proposed method. To test the influence, we tune one of them with the others fixed. Tables 1 and 2 report the objective assessment results of the proposed method on WorldView-2 data for different values of ${\lambda}_{1}$ and ${\lambda}_{2}$. Table 3 reports the influence of ${\gamma}_{1}$ on QuickBird data. It should be emphasized that the pansharpened results are consistent across different test datasets and parameter settings. From the tables, we can see that the regularization parameters play important roles in the final pansharpening performance. Slightly worse performance is obtained for small deviations of the regularization parameters around the selected values, whereas tuning them far away from these values clearly degrades the final fused image. When ${\lambda}_{1}$, ${\lambda}_{2}$, and ${\gamma}_{1}$ are on the order of 0.01, 0.01, and 0.3, respectively, the proposed method continually obtains stable and near-optimal performance. The same holds for the other regularization parameters ${\lambda}_{3}$, ${\lambda}_{4}$, and ${\gamma}_{2}$. To balance the reconstruction error, we set ${\lambda}_{1}=0.01$, ${\lambda}_{2}=0.01$, ${\lambda}_{3}=0.1$, ${\lambda}_{4}=0.1$, ${\gamma}_{1}=0.3$, and ${\gamma}_{2}=0.5$.

## Table 1

The effects of the proposed method under different λ1 on WorldView-2 data.

| λ1 | 0.008 | 0.009 | 0.01 | 0.011 | 0.012 | 0.015 | 0.02 | 0.05 | 0.1 | 0.2 | 0.5 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| CC | 0.9828 | 0.9829 | 0.9830 | 0.9830 | 0.9825 | 0.9826 | 0.9798 | 0.9821 | 0.9229 | 0.8511 | 0.7392 |
| RMSE | 0.0456 | 0.0452 | 0.0451 | 0.0455 | 0.0511 | 0.0498 | 0.0502 | 0.0510 | 0.7829 | 0.9525 | 1.4834 |
| SAM | 3.1892 | 3.1877 | 3.1816 | 3.1782 | 3.1940 | 3.2352 | 3.2355 | 3.2408 | 4.5768 | 6.2065 | 8.8587 |
| ERGAS | 3.5991 | 3.5708 | 3.5547 | 3.5912 | 3.6576 | 3.6766 | 3.6700 | 3.6701 | 4.2308 | 6.9293 | 9.0720 |
| Q4 | 0.8195 | 0.8196 | 0.8197 | 0.8189 | 0.8192 | 0.8190 | 0.8191 | 0.8180 | 0.7877 | 0.7104 | 0.5266 |

The bold and italic values represent the best results.

## Table 2

The effects of the proposed method under different λ2 on WorldView-2 data.

| λ2 | 0.008 | 0.009 | 0.01 | 0.011 | 0.012 | 0.015 | 0.02 | 0.05 | 0.1 | 0.2 | 0.5 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| CC | 0.9825 | 0.9828 | 0.9830 | 0.9829 | 0.9828 | 0.9820 | 0.9808 | 0.9779 | 0.9378 | 0.8821 | 0.7901 |
| RMSE | 0.0457 | 0.0453 | 0.0451 | 0.0448 | 0.0455 | 0.0498 | 0.0673 | 0.1998 | 1.0273 | 1.6532 | 2.7903 |
| SAM | 3.2211 | 3.1945 | 3.1816 | 3.1808 | 3.1949 | 3.5775 | 3.8128 | 4.0106 | 6.0818 | 6.9501 | 8.0002 |
| ERGAS | 3.5987 | 3.6213 | 3.5547 | 3.6111 | 3.5654 | 3.6799 | 3.7991 | 4.2257 | 5.9802 | 6.7733 | 8.7616 |
| Q4 | 0.8195 | 0.8197 | 0.8197 | 0.8196 | 0.8194 | 0.8193 | 0.8193 | 0.8192 | 0.8050 | 0.7608 | 0.6191 |

The bold and italic values represent the best results.

## Table 3

The effects of the proposed method under different γ1 on QuickBird data.

| γ1 | 0.25 | 0.27 | 0.28 | 0.29 | 0.3 | 0.31 | 0.32 | 0.33 | 0.35 | 0.4 | 0.5 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| CC | 0.9626 | 0.9630 | 0.9633 | 0.9632 | 0.9632 | 0.9632 | 0.9629 | 0.9625 | 0.9618 | 0.9587 | 0.9427 |
| RMSE | 0.0677 | 0.0661 | 0.0656 | 0.0652 | 0.0651 | 0.0653 | 0.0659 | 0.0673 | 0.0722 | 0.0865 | 0.0991 |
| SAM | 4.0602 | 4.0584 | 4.0571 | 4.0563 | 4.0563 | 4.0563 | 4.0576 | 4.0583 | 4.0626 | 4.0744 | 4.1378 |
| ERGAS | 5.3766 | 5.3689 | 5.3607 | 5.3595 | 5.3587 | 5.3592 | 5.3611 | 5.3639 | 5.3705 | 5.3876 | 5.4305 |
| Q4 | 0.7850 | 0.7857 | 0.7861 | 0.7864 | 0.7865 | 0.7865 | 0.7862 | 0.7860 | 0.7852 | 0.7831 | 0.7804 |

The bold values represent the best results.

### 3.2. Effects of Patch Size

As described in Sec. 2, the proposed method is based on patch learning. Thus, the effects of patch size on the pansharpening performance are evaluated in this section. To be fair, the dictionary size is fixed at 512. Four different patch sizes for the LR PAN image are studied: $3\times 3$, $5\times 5$, $7\times 7$, and $9\times 9$. Figure 6 shows how the patch size affects the visual quality of the fused results on WorldView-2 data. We can observe that the difference in spectral distortion is very small under different patch sizes. To better illustrate how performance changes with patch size, the quality indexes are calculated, where the average CC and RMSE over the eight bands are presented. In addition, all the index values are normalized to the range [0, 1]. The normalized results with respect to the different patch sizes are plotted in Fig. 7, where the horizontal axis is the patch size and the vertical axis is the normalized result. Larger CC and Q4 indicate a better fused result, whereas smaller RMSE, SAM, and ERGAS values are better. Based on the curves in Fig. 7, the best performance of the proposed method is obtained when the patch size is $7\times 7$. The performance with patch size $5\times 5$ is slightly worse, but the proposed method then has lower space complexity. Taking into account the trade-off between practical application and performance, we choose the patch size $5\times 5$ in the following experiments.
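The patch-based processing underlying these experiments can be sketched as follows (an illustrative helper; the stride and border handling are simplifications rather than the authors' exact settings):

```python
import numpy as np

def extract_patches(img, size=5, stride=1):
    # Collect vectorized overlapping size x size patches as columns,
    # matching the column-per-patch layout used for dictionary learning.
    H, W = img.shape
    cols = []
    for i in range(0, H - size + 1, stride):
        for j in range(0, W - size + 1, stride):
            cols.append(img[i:i + size, j:j + size].ravel())
    return np.array(cols).T   # shape: (size*size, num_patches)
```

For a $5\times 5$ patch size, each column is a 25-dimensional vector, so the dictionary atoms in Sec. 2 have 25 entries per (single-band) patch.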

### 3.3. Effects of Dictionary Size

In the above experiments, the dictionary size was fixed at 512. In general, larger dictionaries should possess more expressive power and thus may yield more accurate approximations while increasing the computational cost. In this section, we evaluate the effect of the dictionary size, i.e., the number of atoms in the dictionary, on pansharpening. From the sampled image patch pairs, we train four dictionaries of size 256, 512, 1024, and 2048 and apply them to the same remote sensing image. The results are evaluated both visually and quantitatively in Figs. 8–10 and Table 4.

## Table 4

The objective indexes of the pansharpened images with dictionaries of different sizes.

| Dictionary size | CC | RMSE | SAM | ERGAS | Q4 |
|---|---|---|---|---|---|
| 256 | 0.9641 | 0.0815 | 4.1271 | 5.3470 | 0.7788 |
| 512 | 0.9642 | 0.0801 | 4.1023 | 5.2598 | 0.7820 |
| 1024 | 0.9638 | 0.0786 | 4.0882 | 5.1753 | 0.7852 |
| 2048 | 0.9640 | 0.0758 | 3.9789 | 4.9666 | 0.7932 |

The bold values represent the best results.

Figure 8 shows the fused results for the QuickBird image using dictionaries of different sizes. The human visual system is not sensitive to weak spectral distortions; the distortion is usually judged from color changes.^{10} Thus, in Fig. 9, we display the difference of pixel values measured between each pansharpened image and the reference HR MS image, under the different dictionary sizes. Deep blue represents the smallest difference, whereas red represents the largest difference. While there are not many visual differences among the results for dictionary sizes from 256 to 2048, we indeed observe that the artifacts gradually diminish with larger dictionaries (i.e., the subtle differences in the yellow circle in Fig. 9). In Table 4, we list the five indexes of the pansharpened image for dictionaries of different sizes. As shown in the table, larger dictionaries yield better quantitative indexes. However, the computation is approximately linear in the size of the dictionary, i.e., larger dictionaries result in heavier computation. Figure 10 shows the computation time in seconds for generating the fused image. In practice, one chooses an appropriate dictionary size as a trade-off between pansharpening quality and computational cost. We find that a dictionary size of 512 can yield decent outputs while allowing fast computation.
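The underlying computation for such difference maps is simple (a minimal sketch; the published figures render the map with a blue-to-red color scale):

```python
import numpy as np

def difference_map(fused, reference):
    # Per-pixel error magnitude, averaged over the spectral bands;
    # fused and reference have shape (bands, H, W).
    return np.mean(np.abs(fused - reference), axis=0)
```

Pixels with small values correspond to the deep blue regions and large values to the red outliers discussed above.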

### 3.4. Simulation Results and Analysis

The LR MS image in the simulated WorldView-2 experiment is $128\times 128\text{\hspace{0.17em}\hspace{0.17em}}\text{pixels}$ with eight bands, and the corresponding PAN image is $512\times 512\text{\hspace{0.17em}\hspace{0.17em}}\text{pixels}$. In Fig. 11(i), there are many varieties of ground objects, such as vegetation, buildings, and roads. In terms of visual effects, the Wavelet method suffers from both spectral distortion and blocky artifacts in the building regions, as shown in Fig. 11(d). Meanwhile, the proposed method and the J-SparseFI method show especially significant improvement in spatial resolution; the details in their fused images are as clear as those in the PAN image. The AIHS and PN-TSSC methods are not good enough at improving the spatial resolution. In Figs. 11(c) and 11(e), many details are lost in the buildings, and noise is introduced into the fused images. Compared with the AIHS and PN-TSSC methods, the SparseFI method injects more details. However, spectral distortion is apparent: the color of the whole fused image is dark, and the visual effect is not close to the reference HR MS image.

To evaluate the performance of the various methods objectively, Table 5 presents the objective evaluations of the different methods. On the whole, the proposed method demonstrates the best objective performance, ranking first for CC, RMSE, SAM, ERGAS, and Q4. The CC value of the J-SparseFI method is inferior to that of the proposed method, which means the J-SparseFI method cannot preserve the spectral information as well as the latter. The high Q4 value reflects the effectiveness of the improvement in spatial resolution. The relatively high SAM and ERGAS values of the AIHS, PN-TSSC, and SparseFI methods reveal their shortcomings in preserving spectral information. We also note that the Q4 values of the SparseFI and J-SparseFI methods are very high, which reflects their advantage in improving the spatial resolution.

## Table 5

Objective performance for different pansharpening methods on WorldView-2 data.

| | AIHS | Wavelet | PN-TSSC | SparseFI | J-SparseFI | Proposed |
|---|---|---|---|---|---|---|
| CC | 0.9712 | 0.9745 | 0.9779 | 0.9785 | 0.9816 | 0.9830 |
| RMSE | 0.0691 | 0.0911 | 0.0662 | 0.0594 | 0.0490 | 0.0451 |
| SAM | 5.3163 | 7.0537 | 4.0698 | 3.1437 | 3.0297 | 3.0009 |
| ERGAS | 5.4726 | 7.5020 | 4.1187 | 3.6593 | 3.6208 | 3.5547 |
| Q4 | 0.7405 | 0.6712 | 0.8089 | 0.8172 | 0.8196 | 0.8197 |

Figure 12 shows the difference of pixel values measured between each pansharpened image and the reference HR MS image. From the corresponding difference images, we can see that the fused images of all methods contain large blue and deep blue areas. The proposed method shows the best performance: the smallest red area, in Fig. 12(f), indicates that the proposed method has the fewest outliers, which is consistent with the objective and subjective analyses above.

## 3.5.

### Experiments on Real Data

To verify the effectiveness of the proposed method in practical applications, we conduct experiments on real data. Figures 13 and 14 present the fused results of the six methods. In Fig. 13, the AIHS method overenhances textures, which is visible in the mountain area. Figures 13(e)–13(g) show the pansharpened images produced by PN-TSSC, SparseFI, and J-SparseFI, respectively, which obtain promising results without obvious spectral or spatial distortion. Since our work models the complex mapping relationship between the LR MS and HR MS images, experiments are also performed on complex urban scenes, as shown in Fig. 14. The pansharpened results of the various methods are presented in Figs. 14(c)–14(h). Comparing all the results, the component substitution method (AIHS) and the model-based methods (PN-TSSC, SparseFI, J-SparseFI, and the proposed method) outperform the high-frequency information injection method (Wavelet) in preserving spectral information. In particular, the proposed method produces natural and satisfactory pansharpened images, whose spatial structures are similar to those of the PAN image and whose spectral information is close to that of the MS image, as shown in Fig. 14(h). Tables 6 and 7 show the quantitative assessment results.^{35} The ranking of the QNR scores confirms that the proposed method outperforms the other methods in sharpening real data.

## Table 6

Comparison of the proposed method with other methods on real data shown in Fig. 13.

| Metric | AIHS | Wavelet | PN-TSSC | SparseFI | J-SparseFI | Proposed |
|---|---|---|---|---|---|---|
| ${D}_{\lambda}$ | 0.0524 | 0.0376 | 0.0465 | 0.0623 | 0.0596 | 0.0352 |
| ${D}_{s}$ | 0.1315 | 0.1003 | 0.0917 | 0.0667 | 0.0632 | 0.0618 |
| QNR | 0.8230 | 0.8659 | 0.8661 | 0.8752 | 0.8810 | 0.9052 |

## Table 7

Comparison of the proposed method with other methods on real data shown in Fig. 14.

| Metric | AIHS | Wavelet | PN-TSSC | SparseFI | J-SparseFI | Proposed |
|---|---|---|---|---|---|---|
| ${D}_{\lambda}$ | 0.0837 | 0.1212 | 0.0706 | 0.0705 | 0.0673 | 0.0611 |
| ${D}_{s}$ | 0.1761 | 0.1953 | 0.1418 | 0.1469 | 0.1407 | 0.1386 |
| QNR | 0.7549 | 0.7072 | 0.7976 | 0.7930 | 0.8015 | 0.8088 |
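The QNR scores in Tables 6 and 7 follow from the spectral distortion ${D}_{\lambda}$ and spatial distortion ${D}_{s}$ via $\mathrm{QNR} = (1 - D_{\lambda})^{\alpha}(1 - D_{s})^{\beta}$, with $\alpha = \beta = 1$ in the common setting. A one-line sketch that reproduces the tabulated values:

```python
def qnr(d_lambda, d_s, alpha=1.0, beta=1.0):
    """Quality with No Reference index: combines the spectral (D_lambda)
    and spatial (D_s) distortion indices; 1.0 is the ideal score."""
    return (1.0 - d_lambda) ** alpha * (1.0 - d_s) ** beta

# Proposed method, Table 7: (1 - 0.0611) * (1 - 0.1386)
print(round(qnr(0.0611, 0.1386), 4))  # 0.8088
```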

## 3.6.

### Time Consumption

All the methods are implemented in MATLAB^{®} 2010b and run on an Intel Core i7 @ 3.6-GHz PC with 32-GB RAM. For fusing a $512\times 512$ PAN image and a $128\times 128\times 4$ MS image, the AIHS and Wavelet methods need less than 1 s. The PN-TSSC, SparseFI, and J-SparseFI methods take about 147 s, 115 s, and 5 min, respectively, and our proposed method takes 133 s. The running times of the patch-based methods thus still leave room for improvement. However, it is reasonable to believe that, with the rapid development of computer hardware and computation techniques, the time cost of the proposed method will soon no longer be an issue. We could also learn numerous projection matrices from the LR MS feature spaces to the corresponding HR MS feature spaces, which may decrease the computational time of the pansharpening phase. This will be investigated in our future work.

## 4.

## Conclusion

In this paper, a pansharpening method combining ridge regression and the elastic net of sparse representation is proposed for merging a PAN image with an MS image. The method uses HR PAN image patches and their degraded versions to construct a training database. A semicoupled dictionary learning method then trains the LR–HR dictionary pair, which depicts the MS structural information. The relationship between the coding coefficients of the LR MS and HR MS image patches is predicted by a within-patch ridge regression and an among-patch elastic-net model. Our method enhances the spatial resolution of MS images while reducing spectral distortion. The experimental results demonstrate that the proposed method is competitive with other state-of-the-art fusion methods.
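To illustrate the within-patch ridge component only, the closed-form mapping from LR to HR coding coefficients can be sketched as follows. This is a simplified sketch under our own notation; the full method additionally uses the among-patch elastic-net model and semicoupled dictionary learning, which are omitted here:

```python
import numpy as np

def ridge_mapping(A_lr, A_hr, lam=0.1):
    """Closed-form ridge regression mapping LR coding coefficients to
    HR ones: minimizes ||A_hr - M @ A_lr||_F^2 + lam * ||M||_F^2.
    A_lr, A_hr: (d, n) coefficient matrices, one column per patch."""
    d = A_lr.shape[0]
    return A_hr @ A_lr.T @ np.linalg.inv(A_lr @ A_lr.T + lam * np.eye(d))

# At test time, HR coefficients for new LR patches are predicted as:
#   A_hr_pred = M @ A_lr_new
```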

## Acknowledgments

This work was supported in part by the Fundamental Research Funds for the Central Universities (Grant No. LGZD201702), the National Natural Science Foundation of China (Grant Nos. 61302178, 61171165, 11431015, and 61571230), the Foundation of Guangxi Province (Grant No. 2014GXNSFAA118360), the Foundation of Jiangsu Province (Grant No. BK20161055), and the National Scientific Equipment Developing Project of China under Grant No. 2012YQ050250.

## References

## Biography

**Songze Tang** is currently an assistant professor in the Department of Criminal Science and Technology, Nanjing Forest Police College. He received his BS degree in information and computation science from Anhui Agriculture University, Hefei, China, in 2009, and his PhD in computer science and technology from Nanjing University of Science and Technology, Nanjing, in 2015. His research interests include computer vision and biometrics. He serves as a reviewer for three international academic journals.

**Liang Xiao** is currently a professor at the School of Computer Science, Nanjing University of Science and Technology (NUST). He received his PhD in computer science from NUST, China, in 2004. He has authored two books and around 100 technical articles in refereed journals, including the *IEEE Transactions on Image Processing*, the *IEEE Transactions on Geoscience and Remote Sensing*, and *Pattern Recognition*. His main research areas include computer vision and pattern recognition.

**Pengfei Liu** is currently an assistant professor at the School of Computer Science, Nanjing University of Posts and Telecommunications. He received his BS degree in information and computation science from Hefei Normal University, Hefei, China, in 2011, and his PhD in computer science and technology from Nanjing University of Science and Technology, Nanjing, in 2016. His research interests include computer vision and remote sensing data processing.

**Lili Huang** is currently an associate professor at Guangxi University of Science and Technology, China. She received her PhD in pattern recognition and intelligent systems from Nanjing University of Science and Technology, Nanjing, Jiangsu, China, in 2012. Her research interests cover image processing, image modeling, and superresolution reconstruction.

**Yang Xu** received his PhD in computer science from the Nanjing University of Science and Technology (NUST), Nanjing, China, in 2016. Currently, he is an assistant professor at NUST. His research interests are in the areas of hyperspectral image classification, image processing, and machine learning.