Optical and SAR image fusion method with coupling gain injection and guided filtering

Abstract. Significant radiometric differences and weak grayscale correlations exist between optical and SAR images. As a result, there are severe spectral and spatial distortions in the fused images. We propose a fusion method of optical and SAR remote sensing images that couples the gain injection method and the guided filter. The proposed method is based on the fusion framework of generalized intensity-hue-saturation non-subsampled contourlet transform, and the gain injection is used for the low-frequency coefficient fusion to reduce the spectral distortion. Then, the divergence is used as the activity measure operator to calculate the initial weight template for the high-frequency coefficients. The guided filter is used to optimize the edge details of the initial weight template. The fused high-frequency coefficients are obtained by weighted average. Through comparison experiments with existing fusion methods, the results show that the proposed method has the best quality of fusion and the proposed method has the best performance.


Introduction
SAR has strong penetrating power and can generate remote sensing images without being restricted by weather and time, and the structural features of the images are apparent. However, the lack of spectral information and severe noise make SAR image interpretation difficult. Optical images are rich in texture and spectral information, but the imaging conditions are volatile and easily obscured by clouds az. 1 The pixel-level fusion of optical and SAR images can integrate both advantages and obtain complementary information, which is of great significance to overcoming the limitations of single-source remote sensing images and improving the interpretation capability of images. The fused images of SAR and optical images have been widely used in many fields to improve the interpretation of remote sensing images, such as land cover classification, 2,3 sea ice identification, 4,5 biomass estimation, 6 change detection, 7,8 flood monitoring, 9,10 and urban feature extraction and classification. 11,12 Pixel-level fusion methods of optical and SAR images can be divided into four categories: component substitution (CS) fusion methods, multi-scale decomposition (MSD) fusion methods, hybrid methods based on CS and MSD, and model-based methods. The most used method is the hybrid method, which integrates the advantages of both CS and MSD methods. Compared to single methods, hybrid methods can reduce spatial structure and spectral distortion in the fused image and are more suitable for optical and SAR image fusion. 13 Hong et al. proposed a method based on the intensity-hue-saturation (IHS) and wavelet transform. This method uses global statistical features as the active measure and then achieves fusion by directly replacing sub-bands. However, this method ignores the specificity of individual image elements and may introduce a large amount of noise, leading to significant spectral distortions in the fusion results. 14 Subsequently, Han et al. performs IHS transform and à trous wavelet transforms on the images to be fused and then uses local statistical parameters as activity measures to calculate the pixelwise fusion weight. The fusion weight estimated by this method fully considers the unique characteristic of a single pixel and the influence of neighboring pixels on the central pixel. The resulting fused image retains more spatial structure and spectral information. 15 To further reduce the spatial distortion in the fused images, Anandhi and Valli 16 calculated the fusion weights based on non-subsampled contourlet transform (NSCT) with minimum likelihood ratio, local gradient, and maximum edge intensity as the active measure operators, which can retain more edge and contour features in the fused image. Kulkarni et al. used a hybrid method of the principal component analysis (PCA) and discrete wavelet transform (DWT) transform as the base fusion framework, calculated the fusion weights using the image element local energy as the active measure operator, and performed a weighted average fusion of the components further to reduce the spectral distortion in the fused images. 17 Zhou et al. used an adaptive IHS fusion method based on phase coherence feature preservation to fuse SAR and optical remote sensing images, and more spectral and spatial structure information was retained in the results. 18 Although many scholars have used the hybrid method as the basic framework and continuously introduced better-performing activity measure operators and improved the fusion weight calculation method for multi-scale components, there are still two problems: 1. Because the fusion method of multi-scale component weighted averaging cannot overcome the nonlinear radiometric differences between SAR and optical images, significant spectral distortions are inevitable in the fusion results. 2. The noise in SAR images is serious. However, the existing multi-scale feature activity measurement operator has poor noise immunity. The fusion weight template calculated this way cannot effectively reduce the spatial and spectral distortions caused by noise.
Given the problems of existing fusion methods, this paper proposed a coupled gain injection and guided filtering method for optical and SAR image fusion. The proposed method uses generalized intensity-hue-saturation non-subsampled contourlet transform (GIHS-NSCT) as the basic fusion framework. First, GIHS extracts the luminance component I of the optical image. NSCT then decomposes the I and SAR images into multi-scale and multi-directional. Next, the low-frequency coefficients of I and SAR are fused using the gain injection method. The gain injection method is used by solving the unique features of the low-frequency coefficients of the SAR image and injecting the unique features into the low-frequency coefficients of I as gain. Fusing only the unique features of the low-frequency coefficients of SAR images can effectively reduce the spectral distortion.

Fundamental Theories and Methods
This section introduces some fundamental theories involved in the proposed fusion method, including the GIHS ensemble method, the NSCT method, and the guided filter.

GIHS Fusion Method
GIHS extends the classical IHS method of the CS class fusion method. Compared with IHS, it can acquire the luminance components of images with more than three channels. It does not require forward and inverse transformation of the image color space, which is computationally tiny and improves the fusion efficiency. 19 Therefore, GIHS is widely used in image fusion, 20,21 and we extend it to optical and SAR image fusion. The fused image calculation process of the GIHS method is as follows: E Q -T A R G E T ; t e m p : i n t r a l i n k -; e 0 0 1 ; 1 1 6 ; 1 2 3 E Q -T A R G E T ; t e m p : i n t r a l i n k -; e 0 0 2 ; 1 1 6 ; 8 0 I ¼ where F; M, and SAR represent the fused image, optical image, and SAR image, respectively. I is the brightness component of the optical image and B is the number of bands. λ and ω represent the corresponding weights of each band of the optical image, respectively.

NSCT Method
NSCT is an image MSD method proposed by Da Cunha et al. 22 It consists of the nonsubsampling pyramid filter bank (NSPFB) and the non-subsampling directional filter bank (NSDFB). NSPFB can perform MSD of images. Its non-down sampling decomposition can reduce the distortion of image elements caused by up-sampling and down-sampling processes and has translation invariance. NSDFB is a multi-directional filter bank that decomposes the image into multiple directions and preserves multi-directional detail features. The NSCT method that coupled NSPFB and NSDFB has the advantages of multi-scale, multi-directional, and nondown sampling. 23 So NSCT is widely used in image fusion. 24,25 The schematic diagram of NSCT MSD is shown in Fig. 1.

Guided Filter
The guided filter is an edge-preserving filter based on a local linear model. It works by correcting the noisy image with reference to the guiding image and has the properties of noise reduction and edge retention. Therefore, guided filters are widely used to optimize fusion weight maps in image fusion. 26,27 The guiding image is the key to determining the filtering effect, which can be the same or different from the input image but must be given in advance. The guided filter is implemented by a sliding calculation of the local window. For a square sliding window w k of size r × r, the linear relationship between the guiding image G and the output image O can be expressed as E Q -T A R G E T ; t e m p : i n t r a l i n k -; e 0 0 3 ; 1 1 6 ; 2 6 2 where ða k ; b k Þ is the linearity factor of the sliding window w k . The linear coefficients are significant, and solving for them is a least-squares optimization process. Optimization aims to solve a set of ða k ; b k Þ such that the difference between the input image window T and the output image window O is minimized. Based on the above, the optimization objective function can be defined as follows: E Q -T A R G E T ; t e m p : i n t r a l i n k -; e 0 0 4 ; 1 1 6 ; 1 7 1 where ε is the regularization parameter and ε > 0, a k and b k are calculated as E Q -T A R G E T ; t e m p : i n t r a l i n k -; e 0 0 5 ; 1 1 6 ; 1 1 5 where μ k and σ 2 k are the average values and variances values of G and the window w k . T k is the mean value of the window w k in T.

Proposed Fusion Method
In this section, we elaborate on the implementation process of the proposed fusion method, including the basic framework, the rules for low-frequency coefficients, and high-frequency coefficients fusion.

Basic Framework
We use the hybrid method of GIHS-NSCT to fuse optical and SAR images. The overall methodological framework of the algorithm is shown in Fig. 2. In Fig. 2, the input optical and SAR images have been registered using the method proposed by Yan et al. 28 and can be directly used for pixel-level fusion. GIHS-NSCT first acquires the luminance image of the optical image with GIHS. Then the luminance and SAR images are multi-scale fused based on NSCT to obtain the fused luminance image. Finally, the original optical and the new luminance image are fused using GIHS. The key to determining the quality of fusion is the feature maps and fusion weight maps corresponding to the low-frequency and high-frequency coefficients. The quality of the feature map depends on the feature extraction method and the feature measurement used. The key to the fusion weight map lies in the fusion weight calculation method and the activity measurement operator. The main steps of the GIHS-NSCT fusion method are as follows: 1. Perform basic pre-processing of optical and SAR images, respectively, and upsample the optical image to the exact resolution as the SAR image.

Rule for Low-frequency Coefficients
The low-frequency sub-band approximates the image, which contains the main contour features. The low-frequency sub-bands are also crucial for determining the fused image's spectral distortion. Therefore, considering the significant nonlinear radiometric differences between optical and SAR images, we use the feature gain method for fusion when calculating the fused lowfrequency coefficients. The fusion is weighted only at specific features of the low-frequency subband of the SAR image. The weights of the SAR image elements at non-specific features are all 0, while the weights of the optical image elements are set to 1. This fusion method can effectively reduce the spectral distortion in the fused image caused by the nonlinear radiation difference. 29 The fusion process of the low-frequency sub-band is shown in Fig. 3. The images used in Fig. 3 are rendered for easy observation.
In the fusion process, the common features of the low-frequency sub-bands of SAR and I are firstly calculated according to the Eq. (7): E Q -T A R G E T ; t e m p : i n t r a l i n k -; e 0 0 7 ; 1 1 6 ; 5 7 3 FL common ¼ minfFL I ; FL SAR g: Since the low-frequency sub-bands are the approximation of the image features, we take the lowfrequency sub-bands of I and SAR directly as the feature maps, which is to let FL I ¼ L I ; FL SAR ¼ L SAR . Thus, the common feature FL common of the low-frequency sub-bands of I and SAR images is calculated as follows: E Q -T A R G E T ; t e m p : i n t r a l i n k -; e 0 0 8 ; 1 1 6 ; 4 9 2 FL common ¼ minfFL I ; FL SAR g ¼ minfL I ; L SAR g: Based on Eq. (8), the peculiar features of the LF sub-band of the SAR image are given as follows: E Q -T A R G E T ; t e m p : i n t r a l i n k -; e 0 0 9 ; 1 1 6 ; 4 4 8 The method of the fused low-frequency sub-band calculated based on the feature gain injection method is given as E Q -T A R G E T ; t e m p : i n t r a l i n k -; e 0 1 0 ; 1 1 6 ; 7 1 1 where L new is the sub-band of fused low-frequency, ρ is the injection coefficient of the unique features of the low-frequency sub-band of SAR, calculated as E Q -T A R G E T ; t e m p : i n t r a l i n k -; e 0 1 1 ; 1 1 6 ; 6 5 6 where entropyð·Þdenotes the entropy of the corresponding image.

Rule for High-frequency Coefficients
The high-frequency sub-bands are multi-directional detailed images of the original image, rich in details and textures. Meanwhile, the high-frequency coefficients are also crucial in determining the degree of spatial distortion of the fused image. Therefore, when fusing high-frequency, we introduce the image divergence, which is sensitive to the points near the texture edge, as the feature activity metric to accurately extract and describe the point features of the high-frequency sub-bands. The divergence of a point in the image precisely describes its degree of clustering in the gradient field. The larger divergence value indicates a greater divergence of the point in the gradient field and a higher probability of the point being a feature point on the edge of the texture. 30 Therefore, using divergence as the active measure in high-frequency fusion can accurately describe the feature saliency of all image elements and thus acquire complete feature maps. The NSCT method allows us to obtain detailed images of the source in multiple directions at multiple scales. Each detailed image can be considered as a single-channel image in a twodimensional (2D) cartesian coordinate space, and the divergence of the image is calculated in the gradient field. For a 2D field Uðx; yÞ, the gradient at ðx; yÞ is calculated as For a 2D vector field, the divergence of Vðx; yÞat ðx; yÞ is formulated as E Q -T A R G E T ; t e m p : i n t r a l i n k -; e 0 1 3 ; 1 1 6 ; 3 3 6 divVðx; yÞ ¼ ∇Vðx; yÞ ¼ ∂V ∂x þ ∂V ∂y : Based on the Eqs. (12) and (13), the divergences of an image are given as Since SAR images are seriously polluted by noise, there is still some noise in the speckle-filtered SAR images, which may reduce the fusion quality. Unfortunately, according to the calculation principle of divergence, the image divergence is the second-order image gradient, and the gradient operator is not robust to the noise in the image. Therefore, the divergence is used as the activity measure for the fusion of high-frequency sub-bands to calculate the fusion weights, which is challenging to overcome the influence of noise on fusion quality. To address the above problems, we utilized the guided filter in the fusion process of high-frequency sub-bands, optimized the weight maps obtained from the divergence calculation, and used the strong correlation between pixels to improve the fusion quality of detail images. 31 The fusion process of high-frequency sub-bands is shown in Fig. 4

Experiment
This section introduces the datasets used in the experiments, the indicators for objective evaluation of the fusion results, and the comparative analysis of the experimental results. All experiments were performed using MATLAB2020a on a computer with NVIDIA Quadro P4000 GPU and Intel Xeon W-2102 CPU.

Datasets
We arranged three sets of experiments, and the datasets used in the experiments consisted of three groups of optical and SAR images. The datasets used in the experiments contain three groups of optical and SAR images. In experiment 1, the main scene of the data is farmland, which includes a scene of airborne SAR images and a scene of Google Optical images with sub-meter resolution. In experiment 2, the main scene of the data is the city, which includes a scene of the GaoFen-3 SAR image and a scene of the GaoFen-1 multispectral image with meter-level resolution. In experiment 3, the main scenes of the data are mountains and lakes, including one scene of Sentinel-1 SAR image and one scene of Landsat8 image, with 10-m level resolution. Through three sets of experiments, the proposed algorithm's effectiveness is verified from multi-source, multi-scale, and multi-scene perspectives. It is worth stating that the test images used in the experiment were completely pre-processed. The SAR images are processed in the SARscape toolbox in ENVI, which includes import, multilooking, speckle filtering, geocoding, and radiometric calibration. The specific parameters of the experimental data set are shown in Table 1.

Evaluation Metrics
To evaluate the performance of the fusion methods, root mean square error (RMSE), 32 Erreur relative globale adimensionnelle de synthèse (ERGAS), 33 universal image quality index (UIQI), 34 spectral angle mapper (SAM), 35 and quality with no reference (QNR) 36 are used to evaluate the quality of fusion results quantitatively. Among them, SAM measures the degree of spectral distortion of the fusion result, and the smaller the value, the smaller the spectral distortion. UIQI also called the Q index, measures the fused image's correlation, luminance, and contrast distortion. Its value ranges from [−1; 1], and a higher Q value indicates higher image quality. RMSE measures the global spectral distortion, and the smaller the value, the smaller the global spectral distortion. ERGAS can reflect the overall image quality, and the smaller the ERGAS value of the fused image, the higher the fusion quality. QNR is a comprehensive evaluation index that contains two parts: spectral distortion D λ and spatial distortionD β . The smaller the value of D λ and D β , the smaller the spectral and spatial distortion, while the larger the value of QNR, the higher the quality.

Experimental Results and Comparison
The comparison methods used in the experiments include the IHS 37 and PCA 38 methods that belong to the CS class, the NSCT-PC 39 method that belongs to the MSD class, and the IHSwavelet 14 and the NSCT-mean 40 method in the hybrid method that couples CS and MSD.

Subjective evaluation
The fusion results of experiment 1 are shown in Fig. 5. The IHS and PCA methods of the CS class can inject the spatial structure information of the SAR image into the optical image more completely. Still, simultaneously, they also cause severe spectral distortion. In contrast, the hybrid methods can effectively reduce the spectral distortion while retaining more spatial information. The IHS-wavelet and the NSCT-mean methods show different degrees of global brightness reduction than the original optical image. NSCT-PC and the proposed method have similar  results in spectral retention, while the fused image obtained by the proposed method has more distinct features; therefore, the proposed method has the best fusion performance in experiment 1.
The fusion results of experiment 2 are shown in Fig. 6. The IHS and PCA methods of the CS class can thoroughly remove the clouds when fusing SAR and optical remote sensing images affected by cloud occlusion. Although the direct component replacement method does not need to take into account the information of the optical images and can thoroughly remove the occluded clouds, it severely distorts the spectral information in the fusion results. The principle of the CS method determines this. The direct component replacement method does not need to consider the information of the optical image. It uses the SAR image replacement directly, which can remove the occluded clouds, but it also severely distorts the spectral information in the fusion result. The fusion results of the hybrid methods inject the spatial information of the SAR image into the part occluded by the cloud in the optical image. Such methods cannot thoroughly remove the obscured clouds but retain more spectral information and effectively inject spatial information. Among them, the spectral preservation of the IHS-Wavelet and NSCT-mean methods are relatively low. The spectral protection of the NSCT-PC method and the proposed method achieve similar results. Nonetheless, since the features of NSCT-PC injection are less evident than those of the proposed method, the fused image of the proposed method is of the highest quality in experiment 2 in a comprehensive view.
The fusion results of experiment 3 are shown in Fig. 7. The main scenes of the experimental data are mountains and lakes. By fusing Landsat8 and Sentinel-1 SAR images, the distinct features in the SAR images are injected into the optical images, making fusion images rich with structural and spectral information. In terms of structural feature integrity, the PCA and NSCT-PC methods of injection do not achieve the expected results of the experiment. IHS, IHS-Wavelet, NSCT-mean, and the proposed method are all capable of injecting intact, stereoscopic structural features from SAR images into optical images. Among them, the fusion results of NSCT-mean and the proposed method have similar results, but the mountainous features in the fused image of the NSCT-mean are not as evident as those of the proposed method. Thus, the proposed fusion method performs the best in experiment 3. Tables 2-4 show the evaluation results of the fusion results for the three groups of experiments. As seen from the three tables, compared with the IHS and PCA methods of the CS class, the hybrid method can retain more spectral information of the optical images while injecting the spatial information of SAR images into the optical images completely and clearly. Therefore, the hybrid method is more suitable for fusing SAR and optical remote sensing images. SAM, RMSE, ERGAS, Q, and D λ can measure the spectral distortion of the fused images. From the index results, the spectral distortion of IHS and PCA methods of the CS class is the most severe, while the spectral distortion of the proposed method is the smallest. The evaluation results of   D β show that among the hybrid methods, IHS-wavelet and NSCT-mean have similar spatial information retention. In contrast, the proposed method has the highest spatial information retention. In terms of the comprehensive quality of the fusion results, the evaluation of the QNR showed that the hybrid methods have higher comprehensive qualities of the fused images than the methods of the CS and MSD classes.

Conclusions
We have made improvements in two aspects to solve the problems that fused images of SAR and optical remote sensing images often have significant spectral and spatial distortion.
1. We use gain injection to achieve fusion in the low-frequency sub-band fusion process.
Since the gain injection is performed only at the unique features of the SAR image for feature gain injection, this effectively reduces the spectral distortion of the fused image. 2. We introduced the guide filter when fusing high-frequency sub-bands. The noise reduction features and edge preservation of the guide filter are used to optimize the fusion weight template, effectively reducing the fused image's spatial distortion.
A limitation of the proposed method in this paper is that it is time-consuming. Therefore, reducing the time consumption of the fusion process will be one direction of our next research in the future. Our experiments found that the time consumption is mainly concentrated in NSCT MSD and reconstruction. Therefore, we will try some fast MSD methods, such as fast NSCT, 41 fast finite Shearlet transform, 42 etc. In addition, we plan to improve the fusion quality by using some active metric operators that are robust to noise and nonlinear radiometric differences, such as phase congruency features. Zhuang Shi is currently working toward his PhD degree in surveying and mapping from Lanzhou Jiaotong University, Lanzhou, China. His research focuses on remote sensing image processing and analysis.
Xiaoqiang Hu is currently working toward his MS degree in surveying and mapping from Lanzhou Jiaotong University, Lanzhou, China. His research focuses on remote sensing image processing and analysis.