## 1.

## Introduction

Modern space-borne imaging sensors deliver high-spectral resolution multispectral [multispectral spectral image (MSI)] as well as corresponding high-spatial resolution panchromatic images. While one can choose between the two sets of images based on the application, the tradeoff between spectral and spatial resolutions has existed for decades. Recently, several applications, such as feature extraction, image segmentation, change detection, and land-cover classification, require both spatial and spectral images for detection of fine features in suburban or urban scenes. For example, in building detection, a typical residential 43-ft wide house occupies fewer than 7 pixels on one side (at 2-m resolution in the MSI), which makes it very difficult for detection algorithms to discern salient line or edge features. While the current instruments are not capable of providing both spatial and spectral high-resolution images either by design or by observational constraints, pan-sharpening can well serve as a tool to fuse the MSI and panchromatic images.

The literature shows a large collection of pan-sharpening methods developed,^{1}2.3.4.5.6.^{–}^{7} and recent reviews can be found in Refs. 89.–10. The existing methods can be roughly categorized into three groups: component substitution (CS) method,^{11}12.^{–}^{13} relative spectral contribution (RSC) method,^{14}^{,}^{15} and multiresolution (MR) method.^{16}17.18.^{–}^{19} More detailed categorizations are surveyed in Refs. 8–10. The CS method substitutes a high-resolution image for the selected band after spectral transformation. The RSC methods increase the spatial details through arithmetical calculations with the panchromatic image. The MR methods, based on wavelet decompositions and Laplacian pyramids, inject high-frequency components to each band of the MSI. Various methods have been proposed based on this framework to reduce spatial and spectral distortions such as context-based decision injection,^{16} mean-square-error minimization,^{18} and the introduction of sensor spectral response.^{20} However, these approaches enhance the image by adding details to each multispectral band weighted by certain coefficients separately, which is by nature a band-by-band process. Furthermore, they assume a correlation between the PAN and each MS band. Such an assumption may not hold for some spectral bands such as the near infrared band (NIR-2) for WorldView-2 images, which has no spectral overlap with the PAN band.^{21}

Among the existing methods, some of them have been shown to produce very high-quality fused imagery such as Gram–Schmidt method,^{22} generalized minimum mean-square-error (gMMSE) method,^{18} and University of New Brunswick (UNB) sharpening method.^{13} The Gram–Schmidt method is a multivariate statistics-based approach. It is patented by Kodak and widely used in many commercial image-processing packages. The gMMSE method is an optimized, MR analysis-based approach that uses general Laplacian pyramid. The gMMSE method injects higher frequency resolution from the PAN to the MSI and minimizes the root mean squared errors by optimizing fusion parameters through enhancing a degraded version of the MSI and the PAN. The authors of gMMSE claim to produce very satisfactory results that outperform the winner of the IEEE Data Fusion Contest of 2006.^{23} The UNB method is a proprietary algorithm and can be classified as a CS-based statistical approach.^{23} The method has been shown to produce a pan sharpened image of very high spatial quality.^{24}

In this article, a novel pan-sharpening method is proposed which uses the pixel spectrum as its smallest unit of operation and generates resolution-enhanced spectral images using a mixture model. The underlying assumption of our approach is that each new spectrum in the high-resolution fused image is a weighted combination of the immediate neighboring superpixel spectra in the low-resolution spectral image. The weights are controlled by a diffusion model inferred from the panchromatic image that relates the similarity of the pixel of interest to the neighboring superpixels. The per-pixel-spectrum operation nature is different from the existing algorithms, which mostly rely on a band-by-band processing. The results show that our algorithm is capable of preserving very sharp spatial features indiscernible from the multispectral images while preserving spectral information from multispectral images. This is particularly important for applications (such as land-cover classification) that rely on accurate spectral information beyond traditional visual inspection. In addition, our approach is fairly straightforward and highly parallel. It can be implemented using OpenMP and compute unified device architecture (CUDA) parallel processing techniques, which significantly reduces the processing time to a satisfactory timeframe.

This article is organized as follows. Section 2 introduces the algorithm for our nearest-neighbor diffusion-based pan-sharpening method. The results of our method are shown in Sec. 3 together with a discussion of comparison with state-of-the-art algorithms. Finally, we provide a discussion and future work in Sec. 4.

## 2.

## Method

The pan-sharpening algorithm follows the flowchart shown in Fig. 1. The algorithm works in two branches. In the left branch of Fig. 1, a spectral band photometric contribution vector $\mathbf{T}$ is obtained through linear regression. The vector is of size $b\times 1$, where $b$ is the number of MSI bands. It relates the contribution of the digital counts of each multispectral band to the panchromatic image. We assume

where $\tilde{P}(u,v)$ is the digital count of pixel $(u,v)$ in the downsampled PAN, $T(i)$ is the $i$’th value in vector $\mathbf{T}$, ${M}_{i}(u,v)$ is the digital count of corresponding pixel in the $i$’th band of MSI, and accounts for regression error. This assumption is valid, since the spectral response function of each single band in MSI does not overlap much with each other and that the ensemble of all MSI bands can cover the spectral range of the PAN in general. In practice, if certain MSI bands do not overlap with the PAN spectral bandwidth, $T(i)$ for these bands should be zero or very close to zero. Since PAN and MSI are not of the same spatial size, to obtain $\mathbf{T}$, we need to downsample the PAN to fit the size of the MSI and then perform the linear regression. The vector $\mathbf{T}$ is useful in the following steps to normalize the spectral values. $\mathbf{T}$ can also be obtained from the sensor spectral radiance responses.^{21}

In the other branch of the flowchart, the difference factors $N[9]$ from neighboring superpixels are acquired for each pixels from the PAN at the original resolution. The factors are calculated from

## (2)

$${N}_{j}(x,y)=\sum _{(p,q)\in {\mathrm{\Omega}}_{j}(x,y)}|P(x,y)-P(p,q)|\phantom{\rule[-0.0ex]{2em}{0.0ex}}\phantom{\rule{0ex}{0ex}}j=1,2,\dots 9,$$^{25}

^{,}

^{26}Eq. (2) poses as a valid approximation to this ideal case, since the diffusion areas are fairly small. In addition, the estimation can significantly reduce the computation time. It is worth mentioning that the integration regions are not only suitable for $4\times 4$ resolution scale, but also applicable to any integer resolution scale as well. The general rule for mapping the integration regions is to cover the $i$’th superpixel as well as the center subpixels leading to that superpixel. Although there is no rigorous physical evidence to support such mapping scheme, it is consistent with our intuition and has been empirically tested to perform very well in various scenarios.

${N}_{j}(x,y)$ provides a similarity metric between pixel $(x,y)$ to its neighboring superpixels. It is then possible to generate the new spectrum simulating the problem of anisotropic diffusion^{27} as

## (3)

$$\mathbf{HM}(x,y)=\frac{1}{k(x,y)}\sum _{j=1}^{9}\mathrm{exp}[-\frac{{N}_{j}(x,y)}{{\sigma}^{2}}]\times \mathrm{exp}[-\frac{\parallel (x,y)-({x}_{u,v},{y}_{u,v}){|}_{x,y,j}\parallel}{{\sigma}_{s}^{2}}]\mathbf{M}(u,v;x,y,j),$$## 3.

## Results

Our algorithm has been implemented to work with no sensor dependency. In this article, we tested our algorithm on a number of WorldView-2, GeoEye-1, and USGS EO-1 sensor images. These scenes are summarized in Table 1. The spatial smoothness factor ${\sigma}_{s}$ is set to 2.5 for $1:4$ spatial ratio and 1.9 for $1:3$ spatial ratio. The intensity factor $\sigma $ is set adaptively using local similarity, so that

## (5)

$${\sigma}^{2}(x,y)=\mathrm{min}[{N}_{j}(x,y)]\phantom{\rule[-0.0ex]{2em}{0.0ex}}j=1,2,\dots 9,$$## Table 1

Description of the dataset used in this article.28–31

Scene name | PAN size (pixels) | Sensor type | PAN resolution (m) | Resolution scale | PAN spectral range (nm) | MSI spectral range (nm) |
---|---|---|---|---|---|---|

Parking lot | 652×652 | WorldView-2 | 0.52 | 4 | 450 to 800 | 400 to 1040 |

Recycling site | 652×652 | WorldView-2 | 0.52 | 4 | 450 to 800 | 400 to 1040 |

Rome | 480×480 | GeoEye-1 | 0.41 | 4 | 450 to 800 | 450 to 920 |

Victor mall | 903×903 | EO-1 | 10 | 3 | 480 to 690 | 433 to 2350 |

## 3.1.

### Spatial Analysis

The first test scene is a WorldView-2 image of a parking area in Rochester, New York, captured from June 2009 shown in Fig. 3. The scene is very complex, in which it comprises abundant objects (roads, vegetation, houses, and cars) and a great number of them are fairly small. We performed the fusion technique described in Sec. 2. The spectral band contribution vector $\mathbf{T}$ is fitted from linear regression using all the pixels across the entire scene and is shown in Table 2. In this step, the PAN image is downsampled to match the spatial size of the MSI; thus the regression can be carried out using all the pixels in the downsampled PAN and the MSI. The error is calculated as the root mean squared error over the mean value of the PAN. Since $\mathbf{T}$ is obtained through linear regression, it may not exactly reflect the contribution of each multispectral band to the panchromatic band; but the values in Table 2 show that the digital counts in PAN are mostly contributed from the blue to the red edge bands, which is consistent with the spectral radiance response of the WorldView-2 sensor.^{21} Although we see a slight negative contribution from the Coastal band, the value is fairly small and does not appear to have any effect on our final results.

## Table 2

The spectral band contribution vector for the parking area scene.

Band no. (i) | Coastal | Blue | Green | Yellow | Red | Red edge | NIR1 | NIR2 | Error |
---|---|---|---|---|---|---|---|---|---|

Ti | −0.026 | 0.214 | 0.059 | 0.174 | 0.190 | 0.136 | 0.060 | 0.038 | 2% |

The synthesized image of our diffusion-based pan-sharpening approach is shown in Fig. 4, where the RGB bands and the color-IR bands are displayed and their histograms are matched with the images shown in Fig. 3 for visual comparison. The pan-sharpened images show good spatial and color quality. The algorithm preserves the strong edges well in both the PAN and the MSI. One can clearly see the cars and the parking lane marks as well as the edges of the buildings. The RGB colors in the MSI are well maintained for the uniform areas such as the building rooftops and the parking spots. The color-IR image also indicates that our method works not only for RGB, but also for other bands as well.

The second test scene is a recycling site also from the WorldView-2 sensor. The PAN, RGB, and color-IR from the MSI are shown in Fig. 5. This site possesses a lot of fine spatial details. The pan-sharpening result is shown in Fig. 6. Similar to the results of the parking site scene, the image preserves well the spatial details with very accurate spectral/color information.

We also tested our algorithm on a GeoEye-1 sensor image of Rome, Italy, shown in Fig. 7. The imagery is composed of four bands covering visible and near-infrared. This scene contains rich structural details in the Colosseum and the building complexes. The pan-sharpened image is shown in Fig. 8. The spectral band contribution vector, obtained through linear regression, shows that the PAN band correlates mostly with the green and red bands, next highest with near-infrared and least with the blue band; this is reasonable considering that the PAN band covers from visible to near-IR wavelengths from 450 to 900 nm. The fused image shows that our algorithm also works well for complex urban GeoEye-1 sensor imagery. The method can successfully recover missing structural elements in the MSI from the PAN; on the other hand, the color-IR image indicates that the algorithm also properly preserves the signals in the NIR-2.

In an effort to evaluate the performance of our methods on lower resolution imagery, we tested our algorithm on the USGS EO-1 dataset scene of Victor, New York, shown in Fig. 9. The EO-1 multispectral dataset contains nine spectral bands with 30-m spatial resolution for the spectral image and 10-m resolution for the PAN band. Although the spatial resolution ratio between the PAN band and the multispectral bands is $3:1$, we can still follow the similar pattern in Fig. 2 to calculate local difference factors. The resultant pan-sharpened image shown in Fig. 10 indicates that our algorithm also works well on lower resolution imagery.

## 3.2.

### Spectral Analysis

In addition to visually observing the four complex scenes shown in the previous section, we also compared our results with several existing state-of-the-art pan-sharpening methods. These methods include Gram–Schmidt method,^{22} gMMSE method,^{18} and UNB sharpening method.^{13} These methods have been shown to produce high-quality fusion imagery, and brief descriptions of them can be found in the Sec. 1. The Gram–Schmidt method is integrated in ENVI; the authors of gMMSE have published a standalone package on their website, and the UNB method is distributed in the FuzeGo package that works on a trial license. Because these algorithms are available, a comparison is possible.

The visual comparison of these algorithms can be made in Fig. 11 on a residential area from WorldView-2 image. It can be seen that all of these pan-sharpening approaches produce acceptable results. However, the edges in the Gram–Schmidt pan-sharpened image appear blurred. The gMMSE method produces better results, but the edges do not resemble the same level of sharpness in the PAN band. One possible reason is that both of these methods rely on a bicubic interpolation (or an alternative) as their basis for further processing, which leads to overly smooth edges if not correctly compensated. In comparison, both the UNB and our method produce images of very sharp contrast.

## 3.2.1.

#### Pixel spectra evaluation

The spectral fidelity is another factor for evaluation of pan-sharpening algorithms. In this effort, we sample a number of signature pixels in relatively uniform areas to assess the spectral difference. These pixels are marked in the PAN image in Fig. 11. These pixels cover a large set of materials in the scene and are uniform around their neighborhood. We use the spectra from the original MSI image as ground truth for comparison with a reasonable assumption that the spectra around such uniform areas are most likely unchanged. The spectra resulting from the pan-sharpening algorithms are shown in Fig. 12, and the spectral difference expressed in spectral angle difference and Euclidean distance difference are shown in Table 3. It can be seen that both the gMMSE and our method can well preserve the spectra of these uniform pixels, whereas the spectra from Gram–Schmidt and UNB methods are very different from the truth. The majority of the error in our method comes from slight spectral bleeding from regions with weak edges. For example, the rooftop edge between pixel #7 and the grass is very weak in the PAN band; due to the fact that our algorithm is based on the difference distribution on the PAN, it allows a small amount of diffusion from the grass spectrum. The result is a slight increased digital count in the infrared band of the pan-sharpened pixel spectrum. In general, our method is seen to produce very small error by comparison. The spectral distortion by the UNB method can also be reflected by observing the color RGB images in Fig. 11; for example, the asphalt pavement is much darker in the UNB pan-sharpened image than in the original MSI image, and the color of the house rooftops is shifted. While the spectra in the visible regions are intact, the Gram–Schmidt method also suffers from severe spectral distortion in the near-infrared region. The Gram–Schmidt method assumes a statistical correlation between the PAN and each MSI band; however, the PAN band for the WorldView-2 imagery does not cover the NIR-2. The lack of correlation or even existence of anti-correlation results in the distortion of spectra in the infrared bands. One can also observe such spectral distortion in the NIR-2 from the images fused by both gMMSE and UNB; while our algorithm performs very well on most of the infrared regions, as shown in Fig. 12. In fact, most of the existing algorithms produce pan-sharpened images based on the band-to-band correlation, whereas our method treats the pixel spectra as the smallest element of operation which leads to relatively smaller spectral distortion even for the noncorrelated or low-correlated bands. In addition, our method operates locally and the results are always identical with no dependence on the size of the scene with given $\mathbf{T}$, $\sigma $, and ${\sigma}_{s}$. In comparison, most of the existing pan-sharpening algorithms rely, to a certain degree, on the global statistics of the scene, which may lead to somewhat different results depending on the scene content.

## Table 3

Spectral differences of sampled pixels in Fig. 12.

Euclidean distance difference | Spectral angle difference | |||||||
---|---|---|---|---|---|---|---|---|

G-S | gMMSE | UNB | NNDiffuse | G-S | gMMSE | UNB | NNDiffuse | |

Road (#1) | 324.08 | 79.04 | 199.73 | 8.92 | 0.182 | 0.064 | 0.152 | 0.0021 |

Path (#2) | 193.54 | 71.94 | 104.38 | 18.90 | 0.188 | 0.084 | 0.119 | 0.0038 |

Roof (#3) | 212.46 | 49.97 | 140.43 | 5.37 | 0.161 | 0.055 | 0.161 | 0.0015 |

Water (#4) | 177.09 | 154.70 | 162.74 | 40.96 | 0.121 | 0.145 | 0.148 | 0.0056 |

Shadow (#5) | 120.62 | 27.00 | 87.22 | 2.93 | 0.109 | 0.033 | 0.082 | 0.0032 |

Grass (#6) | 63.40 | 42.64 | 49.64 | 35.87 | 0.029 | 0.017 | 0.024 | 0.0162 |

Roof (#7) | 368.92 | 37.72 | 250.53 | 29.87 | 0.191 | 0.029 | 0.162 | 0.0017 |

Path (#8) | 183.79 | 96.91 | 108.04 | 43.18 | 0.117 | 0.073 | 0.099 | 0.0176 |

Note: Best (smallest) values are shown in bold.

## 3.2.2.

#### Statistical comparison

Another popular evaluation technique degrades the PAN and MSI images and uses the original MSI as the ground truth for comparison; the metrics include spectral angle mapper (SAM), spectral Euclidean distance (EUD), and erreur relative globale adimensionnelle de synthese ( ERGAS).^{32} SAM between two spectral vectors ${\overrightarrow{v}}_{1}$ and ${\overrightarrow{v}}_{2}$ is defined as

## (6)

$$\mathrm{SAM}({\overrightarrow{v}}_{1},{\overrightarrow{v}}_{2})={\mathrm{cos}}^{-1}\left(\frac{{\overrightarrow{v}}_{1}\xb7{\overrightarrow{v}}_{2}}{\Vert {\overrightarrow{v}}_{1}\Vert \times \Vert {\overrightarrow{v}}_{2}\Vert}\right),$$## (7)

$$\mathrm{EUD}({\overrightarrow{v}}_{1},{\overrightarrow{v}}_{2})=\Vert {\overrightarrow{v}}_{1}-{\overrightarrow{v}}_{2}\Vert ,$$^{18}

^{,}

^{24}

^{,}

^{33}

^{,}

^{34}that such an evaluation scheme is not practical especially for high-resolution data particularly in highly detailed urban areas where pan-sharpening is most needed. Thus, such comparison is only provided as one factor for evaluation.

## Table 4

Comparison of the accuracy statistics for the degraded WorldView-2 recycling site scene.

SAM | EUD | ERGAS | |
---|---|---|---|

Bi-cubic | 0.0588 | 31.5032 | 3.0698 |

Gram–Schmidt | 0.0834 | 40.0608 | 3.3547 |

gMMSE | 0.0608 | 23.2981 | 2.1721 |

UNB | 0.0798 | 31.8118 | 2.6984 |

NNDiffuse | 0.0578 | 22.9945 | 2.1679 |

Note: Best (smallest) values are shown in bold.

In this section, we performed a comprehensive comparison of different pan-sharpening algorithms through visual comparison, spectral comparison, and global statistics. Visual comparison indicates both UNB and our nearest-neighbor diffusion algorithm are more capable of enhancing the spatial sharpness than the other two methods; on the other hand, spectral accuracy is better reserved by both gMMSE and our algorithm through spectral and global statistics comparisons. Our method has shown to significantly suppress spectral angle distortion in comparison with the other methods. The comprehensive comparisons carried out in this section suggest that our nearest-neighbor diffusion-based approach can produce higher quality pan-sharpened spectral images than some state-of-the-art existing algorithms.

## 4.

## Discussion and Future Work

In this article, we have shown our nearest-neighbor diffusion-based pan-sharpening method to have superior performance both in spatial/spectral quality and in computational time. The novelty of our approach lies in its per-spectra operation and the utilization of a diffusion model to solve the multispectral fusion problem. Our method operates locally and does not rely on global optimization and thus will produce identical results regardless of the scene size or scene contents. The parallel nature of our algorithm allows fast processing on multicore devices and achieves significant reduction of processing time.

Our method relies on two external parameters: intensity smoothness factor $\sigma $ and spatial smoothness factor ${\sigma}_{s}$ that control the smoothness of the diffusion. Smaller values of $\sigma $ restricts diffusion and thus produces sharper images but also introduces more noise, whereas larger values of $\sigma $ will produce smoother contents with less noise. The proper choice of $\sigma $ can be determined based on the application of the pan-sharpened image. $\sigma $ is suggested to be tuned to a smaller value if it is for visual observation and inspection, while applications like classification and segmentation can use a larger $\sigma $ to enhance strong structural features and to suppress noise artifacts. In addition, the scene contrast and complexity can also affect the choice of $\sigma $. Scenes with high contrast in PAN will need less diffusion sensitivity thus entail higher $\sigma $, and the same is true for complex scenes to reduce the influence of possible noise. In practice, ${\sigma}^{2}$ is set to dynamically adjust to local similarity, as shown in Eq. (5), and ${\sigma}_{s}$ is set to a value that will mostly resemble a bicubic interpolation kernel, which roughly gives ${\sigma}_{s}=\text{scale}\times 0.62$. It is necessary to point out that the choice of $\sigma $ and ${\sigma}_{s}$ does not appear to have much impact on the visual results but will produce better numerical results compared with our previous findings.^{35}

General readers may be interested in the execution speed of our algorithm, and we have implemented the algorithm in C/C++ to examine this. Our algorithm performs intensive localized operations. It is natural to extend our algorithm to work in parallel by taking advantage of the multicore architectures in modern day central processing unit (CPUs) and graphics processing unit (GPUs). We used OpenMP to accelerate our algorithm on the CPU. On the GPU, the algorithm is implemented using the CUDA architecture. Empirical analysis on an image set made up of a $4000\times 4000$ panchromatic image and a $1000\times 1000$ multispectral image showed an increase in processing speed of one order of magnitude on the OpenMP CPU and two orders of magnitude on the GPU. More analysis is required to fully characterize the computational speed-up potential of this algorithm.

The proposed algorithm works well for urban scenes especially for objects with sharp edges observable in the PAN, but it may occasionally suffer from spectral bleeding at lower contrast PAN edges. In these cases, our algorithm may benefit by using IHS or UNB pan-sharpened images as a prior to produce difference factor $N[9]$, since these two methods have been known to preserve very high-spatial details. Furthermore, the difference factors can incorporate not only difference in the PAN, but also spectral difference from the MSI.^{36} In our method, we normalized the final spectra by stipulating pixel radiometric integrity ($\mathbf{HM}\times \mathbf{T}=p$); however, other constraints can also be incorporated to obtain more accurate results, such as the spectra radiometric integrity calculated by

In addition, our algorithm always assumes positive diffusion weights due to the exponent, and the implication is that only spectral summation, but no subtraction, is considered. In practice, spectral subtraction is needed to produce subpixel spectral accuracy; thus, adjusting diffusion weights to incorporate negative diffusion can also be designed to improve spectral and spatial accuracies.

## Acknowledgments

The authors are grateful to DigitalGlobe and USGS for generously providing the WorldView-2, GeoEye-1 and EO-1 imagery. The author (WS) would like to thank Dr. Sangmook Lee from Corning Incorporated for his useful suggestions in programming with CUDA and Dr. John R. Schott for providing valuable resources for algorithm improvement. This work is supported by Department of Energy Grant Number DE-NA0000444.

## References

## Biography

**Weihua Sun** is a PhD student at the Center for Imaging Science at Rochester Institute of Technology (RIT). His research interests are imaging processing, machine learning, and algorithm development on multispectral and hyperspectral images. Currently, his work is focused on feature extraction from aerial images to aid scene development. He also has a MS and BS degrees in physics from Nanjing University, where his research area was design, simulation, and fabrication of optical metamaterials.

**Bin Chen** received his BS degree in optical information science and the MS degree in optical engineering from Nanjing University of Science and Technology, Nanjing, China, in 2006 and 2008, respectively. He is currently pursuing the PhD degree in imaging science at Rochester Institute of Technology, Rochester, New York. His research interests include object recognition in high-resolution remote sensing images using hybrid algorithms and multisource data fusion.

**David W. Messinger** received a bachelor’s degree in physics from Clarkson University and a PhD in physics from Rensselaer Polytechnic Institute. He is an associate research professor in the Chester F. Carlson Center for Imaging Science at the Rochester Institute of Technology, where he is the director of the digital imaging and remote sensing laboratory. His research focuses on remotely sensed spectral image exploitation using physics-based approaches and advanced mathematical techniques.