Decision fusion for dual-window-based hyperspectral anomaly detector

Abstract. In hyperspectral anomaly detection, the dual-window-based detector is a widely used technique that employs two windows to capture the nonstationary statistics of anomalies and background. However, its detection performance is usually sensitive to the choice of window sizes and degrades under inappropriate window settings. In this work, a decision-fusion approach is proposed to alleviate this sensitivity by merging the results of multiple detectors with different window sizes. The proposed approach is compared with the classic Reed-Xiaoli (RX) algorithm as well as kernel RX (KRX) using two real hyperspectral datasets. Experimental results demonstrate that it outperforms existing detectors such as RX, KRX, and multiple-window-based RX. The overall detection framework is suitable for parallel computing, which can greatly reduce the computational time when processing large-scale remote sensing images.


Introduction
Hyperspectral imagery (HSI) contains hundreds of contiguous spectral bands, which enable the discrimination of different materials and make a variety of civilian and military applications possible [1,2]. Target detection aims to detect a low-probability target with a known spectral signature against an unknown background [3–5]. When the target spectral signature is unknown, unsupervised anomaly detection has to be applied instead; it seeks anomalous pixels whose spectral signatures differ from those of their surroundings [6,7]. As a classic anomaly detector, the Reed-Xiaoli (RX) algorithm [8–10] was derived from a hypothesis test in which the conditional probability density functions under the two hypotheses (without and with an anomaly) are assumed to be Gaussian. The solution turns out to be an adaptive Mahalanobis distance between the pixel under test and the local background. A local background is preferred because it captures nonstationary statistics, and its advantage over a global background covariance matrix has been demonstrated in the literature [11–13]. The RX detector has become the benchmark for anomaly detection algorithms in HSI. Clearly, the key to its success is an appropriate estimate of the local background covariance matrix for effective background suppression. The adaptive RX detector employs a dual-window strategy: the inner window is slightly larger than the expected anomaly size, the outer window is larger than the inner one, and only the samples in the outer region (i.e., between the borders of the inner and outer windows) are used to estimate the background covariance matrix, so that potentially anomalous pixels are excluded. Intuitively, the number of pixels in the outer region (determined by the sizes of the inner and outer windows) should exceed the number of bands so that the resulting covariance matrix is full rank and can be inverted.
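The sample-size condition can be checked with simple arithmetic. The Python sketch below (function names are hypothetical) counts the background samples s = w_out² − w_in² in the outer region and tests whether a d-band sample covariance estimated from them could be full rank; because the mean is subtracted, its rank is at most s − 1. It assumes square, odd-sized windows centered on the test pixel.

```python
def background_sample_count(w_in: int, w_out: int) -> int:
    """Pixels between the borders of the inner and outer square windows."""
    # Assumption: both windows are square, odd-sized, and centered on the test pixel.
    if w_in % 2 == 0 or w_out % 2 == 0 or not 0 < w_in < w_out:
        raise ValueError("expected odd window sizes with 0 < w_in < w_out")
    return w_out * w_out - w_in * w_in


def can_be_full_rank(w_in: int, w_out: int, d: int) -> bool:
    """A mean-centered sample covariance of s samples has rank <= s - 1,
    so a d x d estimate can only be full rank when s - 1 >= d."""
    return background_sample_count(w_in, w_out) - 1 >= d


if __name__ == "__main__":
    for w_in, w_out, d in [(7, 9, 175), (7, 11, 126), (7, 15, 175)]:
        s = background_sample_count(w_in, w_out)
        print(f"(w_in, w_out)=({w_in}, {w_out}), d={d}: s={s}, "
              f"full rank possible: {can_be_full_rank(w_in, w_out, d)}")
```

For example, with the 175-band HYDICE data and the (7, 9) window pair, only s = 32 background samples are available, so the local covariance is necessarily rank deficient; with 126 bands and (7, 11), s = 72 is still too small.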
However, even when the covariance matrix is rank deficient, its inverse can still be approximated by several strategies, such as an eigendecomposition that retains only the nonzero eigenvalues and their eigenvectors (i.e., a pseudoinverse), data dimensionality reduction, or simple matrix regularization. Thus, in this work, we do not restrict our discussion to the case of a full-rank local covariance matrix.
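Two of these strategies, the eigendecomposition-based pseudoinverse and regularization by diagonal loading, can be sketched in Python with NumPy as follows. The relative tolerance and the loading factor `lam` are illustrative assumptions, not values used in this work, and the function names are hypothetical.

```python
import numpy as np


def pinv_via_eigh(cov: np.ndarray, rel_tol: float = 1e-10) -> np.ndarray:
    """Pseudoinverse of a symmetric positive semidefinite matrix.

    Keeps only the eigenpairs whose eigenvalues exceed rel_tol times the
    largest eigenvalue and inverts those: V_k diag(1 / lambda_k) V_k^T.
    """
    eigvals, eigvecs = np.linalg.eigh(cov)
    keep = eigvals > rel_tol * eigvals.max()
    v = eigvecs[:, keep]
    return (v / eigvals[keep]) @ v.T


def regularized_inverse(cov: np.ndarray, lam: float = 1e-3) -> np.ndarray:
    """Diagonal loading: invert cov + delta * I with delta = lam * trace(cov) / d."""
    d = cov.shape[0]
    delta = lam * np.trace(cov) / d
    return np.linalg.inv(cov + delta * np.eye(d))
```

The pseudoinverse leaves the directions with zero variance out of the Mahalanobis distance entirely, whereas diagonal loading keeps all d directions but bounds their weight through `delta`.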
In addition to the classical RX detector, a number of extensions and other anomaly detection algorithms have been proposed for hyperspectral data. A time-efficient anomaly detection method was introduced in Ref. 14, kurtosis-maximization-based anomaly detection was improved in Ref. 15, subpixel anomaly detection was discussed in Ref. 16, a random-selection-based anomaly detector was introduced in Ref. 17, weighted and linear-filter-based RX was analyzed in Ref. 18, subspace-projection-based detectors were proposed in Ref. 19, and discriminative metric learning was applied to anomaly detection in Ref. 20. In particular, kernel-based anomaly detectors were introduced, such as kernel RX (KRX) [21], the kernel eigenspace separation transform [22], and kernel regression analysis [23]. Different background modeling approaches have also been proposed, such as support vector data description [24], the automated modeling methods in Ref. 25, and the collaborative-representation-based method [26]. Nevertheless, the dual-window-based RX algorithm remains the benchmark owing to its relative robustness and ease of implementation.
A multiple-window-based RX (MW-RX) detector was recently discussed in Ref. 27, whose final output does not hinge on a single pair of window sizes. In MW-RX, RX is run several times with different dual windows, and for each pixel only the maximum RX output is used to generate the final detection map. In this paper, we propose a decision-fusion approach for hyperspectral anomaly detection using multiple windows, in which a decision map is produced for each dual-window detector and the final decision map is generated with a voting strategy. Experimental results demonstrate that the proposed strategy can reduce the false alarm rate while maintaining the same true positive rate.
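For comparison, the MW-RX fusion rule described above reduces to a pixel-wise maximum followed by a single threshold. The following Python sketch assumes that the m detection maps are already on a comparable scale (for example, normalized to [0, 1]); this normalization is our assumption rather than a detail taken from Ref. 27, and the function names are hypothetical.

```python
from typing import Sequence

import numpy as np


def mw_max_fusion(detection_maps: Sequence[np.ndarray]) -> np.ndarray:
    """Pixel-wise maximum over m detection maps, one per dual-window pair."""
    return np.max(np.stack(detection_maps, axis=0), axis=0)


def mw_decision(detection_maps: Sequence[np.ndarray], eta: float) -> np.ndarray:
    """Label a pixel anomalous (1) when its maximum output exceeds eta, else 0."""
    return (mw_max_fusion(detection_maps) > eta).astype(int)
```

Because of the maximum, a single window that produces a large response is enough for a pixel to exceed `eta`, regardless of what the other windows report.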

Dual-Window RX Detector
Consider a three-dimensional hyperspectral cube whose pixels are reshaped into samples $X = \{x_i\}_{i=1}^{n}$ with $x_i \in \mathbb{R}^{d}$, where $d$ is the number of spectral bands and $n$ is the total number of samples. For each pixel $y$ (of size $d \times 1$), the surrounding data are collected inside the outer window (of size $w_{out} \times w_{out}$) but outside the inner window (of size $w_{in} \times w_{in}$), both centered at $y$. The selected data are reshaped into a two-dimensional matrix $X_s = \{x_i\}_{i=1}^{s}$, where $s = w_{out}^2 - w_{in}^2$ is the number of chosen samples. Hence, a matrix $X_s$ of size $d \times s$ is obtained for every pixel $y$ from its own local window.
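Extracting $X_s$ for one pixel can be sketched in Python with NumPy, assuming the cube is stored as a (rows, cols, d) array. The function name is hypothetical, and the handling of pixels near the image border is not specified above; this sketch simply clips the outer window there, so fewer than $s$ samples are returned at the borders.

```python
import numpy as np


def dual_window_samples(cube: np.ndarray, row: int, col: int,
                        w_in: int, w_out: int) -> np.ndarray:
    """Return X_s (d x s): pixels inside the w_out x w_out window but outside
    the w_in x w_in window, both centered at (row, col).

    Assumption: near the image border the outer window is clipped.
    """
    rows, cols, _ = cube.shape
    h_in, h_out = w_in // 2, w_out // 2
    r0, r1 = max(row - h_out, 0), min(row + h_out + 1, rows)
    c0, c1 = max(col - h_out, 0), min(col + h_out + 1, cols)
    rr, cc = np.meshgrid(np.arange(r0, r1), np.arange(c0, c1), indexing="ij")
    # A pixel belongs to the background ring if it lies outside the inner window.
    outside_inner = (np.abs(rr - row) > h_in) | (np.abs(cc - col) > h_in)
    return cube[rr[outside_inner], cc[outside_inner], :].T
```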
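A single-pixel form of the RX algorithm is often approximated by the following expression [6,13,28]:

$$ r(y) = \left(y - \mu_{local}\right)^{T} \Sigma_{local}^{-1} \left(y - \mu_{local}\right), \tag{1} $$

where $\Sigma_{local}$ is the $d \times d$ covariance matrix of the background data and $\mu_{local} = \frac{1}{s}\sum_{i=1}^{s} x_i$ is the background mean vector, both estimated from $X_s$. The test statistic $r(y)$ is compared with a prescribed threshold $\eta$: if $r(y) > \eta$, the pixel is declared an anomaly; otherwise, it is a normal pixel.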
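Equation (1) translates into a few lines of NumPy. This minimal sketch rests on two assumptions: the covariance is normalized by $s$ (the maximum-likelihood estimate), and it is inverted through its eigendecomposition, keeping only eigenvalues above a relative tolerance, so that the score remains defined when the covariance is rank deficient. The function name `rx_score` is hypothetical.

```python
import numpy as np


def rx_score(y: np.ndarray, Xs: np.ndarray, rel_tol: float = 1e-10) -> float:
    """RX statistic r(y) = (y - mu)^T Sigma^+ (y - mu) for a pixel y of shape (d,)
    and a d x s background matrix Xs (one sample per column)."""
    mu = Xs.mean(axis=1)
    centered = Xs - mu[:, None]
    cov = centered @ centered.T / Xs.shape[1]      # assumption: normalized by s
    eigvals, eigvecs = np.linalg.eigh(cov)
    keep = eigvals > rel_tol * eigvals.max()       # drop (numerically) zero eigenvalues
    # Project y - mu onto the retained eigenvectors and whiten each component.
    z = eigvecs[:, keep].T @ (y - mu)
    return float(np.sum(z ** 2 / eigvals[keep]))
```

When the covariance is full rank, this equals the textbook Mahalanobis distance with the ordinary inverse; scanning an image amounts to calling `rx_score` for every pixel with its own $X_s$.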
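In Ref. 21, KRX was investigated by projecting the data into a high-dimensional feature space in which the data become more separable. In the kernel-induced feature space, the mapping function $\Phi$ maps the pixel $y \rightarrow \Phi(y) \in \mathbb{R}^{d' \times 1}$, where $d' \gg d$ is the dimension of the kernel feature space, and $\mathbf{\Phi} = [\Phi(x_1), \Phi(x_2), \ldots, \Phi(x_s)] \in \mathbb{R}^{d' \times s}$. The corresponding output of KRX is

$$ r_{\Phi}(y) = \left(\Phi(y) - \mu^{\Phi}_{local}\right)^{T} \left(\Sigma^{\Phi}_{local}\right)^{-1} \left(\Phi(y) - \mu^{\Phi}_{local}\right), \tag{2} $$

where $\Sigma^{\Phi}_{local}$ and $\mu^{\Phi}_{local}$ are the estimated covariance matrix and mean vector of the background data in the kernel feature space. In practice, this expression is evaluated through kernel functions without computing $\Phi$ explicitly; more implementation details can be found in Ref. 21.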
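One way to evaluate Eq. (2) without forming $\Phi$ is to write the pseudoinverse of the feature-space covariance in terms of the centered $s \times s$ Gram matrix $K_c$, which gives $r = s\, k_c^{T} (K_c^{+})^{2} k_c$, where $k_c$ is the centered kernel vector between the background samples and $y$. The Python sketch below follows that derivation; we have not confirmed that it matches the exact implementation of Ref. 21. The kernel form $\exp(-\|a-b\|^2/c)$, reading the "kernel parameter" of 50 as $c$, the eigenvalue tolerance, and the function names are all assumptions.

```python
import numpy as np


def rbf_kernel(A: np.ndarray, B: np.ndarray, c: float) -> np.ndarray:
    """Gaussian RBF kernel k(a, b) = exp(-||a - b||^2 / c) between the
    columns of A (d x p) and B (d x q); returns a p x q matrix."""
    sq = (A * A).sum(axis=0)[:, None] + (B * B).sum(axis=0)[None, :] - 2.0 * A.T @ B
    return np.exp(-np.maximum(sq, 0.0) / c)


def krx_score(y: np.ndarray, Xs: np.ndarray, kernel, rel_tol: float = 1e-10) -> float:
    """Kernel RX score of pixel y (d,) against background samples Xs (d x s).

    Evaluates r = s * k_c^T (K_c^+)^2 k_c, where K_c is the centered Gram matrix
    and K_c^+ keeps only eigenvalues above rel_tol times the largest one.
    """
    s = Xs.shape[1]
    K = kernel(Xs, Xs)                          # s x s Gram matrix
    k_y = kernel(Xs, y[:, None])[:, 0]          # k(x_i, y) for i = 1..s
    H = np.eye(s) - np.full((s, s), 1.0 / s)    # centering matrix
    K_c = H @ K @ H
    k_c = H @ (k_y - K.mean(axis=1))            # centered kernel vector
    eigvals, eigvecs = np.linalg.eigh(K_c)
    keep = eigvals > rel_tol * eigvals.max()
    z = eigvecs[:, keep].T @ k_c
    return float(s * np.sum(z ** 2 / eigvals[keep] ** 2))
```

As a sanity check on the derivation, with a linear kernel `lambda A, B: A.T @ B` the score reduces exactly to RX with a pseudoinverse covariance normalized by $s$. With the RBF kernel, the Gram spectrum decays quickly, so `rel_tol` noticeably affects numerical stability.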

Proposed Decision-Fusion Detector
Adaptive anomaly detection aims to detect anomalies whose spectral signatures differ from the local background; depending on how "local" is defined, the resulting detection performance will differ. In the dual-window implementation, the pixels between the inner and outer windows are considered the local background, so changing the dual-window sizes leads to different detection performance. Note that the purpose of the inner window is to prevent the background estimate from being contaminated by the central pixel when it is a target; thus, the inner window should be slightly larger than the target size. In a completely unknown environment, however, the target size is unknown as well. Inspired by multiclassifier fusion [29], this difficulty in window setting may be mitigated by detector fusion.
In the proposed decision-fusion approach, the detection outputs for a pixel $y$ from $m$ detectors with $m$ different windows are denoted by $\{r_i(y)\}_{i=1}^{m}$, where $r_i(y)$ is the output obtained with the $i$th window pair $(w_{in}, w_{out})$ via Eq. (1) or Eq. (2). The outputs over the entire image are normalized to the range [0, 1] and compared with a prescribed threshold $\eta$; a detector declares a pixel an anomaly if its output is larger than $\eta$. The number of detectors that declare $y$ an anomaly is then counted:

$$ N(y) = \left|\left\{\, i : r_i(y) - \eta > 0,\ i = 1, 2, \ldots, m \,\right\}\right|. \tag{3} $$

The final class label follows a voting process:

$$ \mathrm{class}(y) = \begin{cases} 1, & N(y) \ge t, \\ 0, & \text{otherwise}, \end{cases} \tag{4} $$

where $1 \le t \le m$, 1 means that $y$ is an anomaly, and 0 means that $y$ is normal. In MW-RX, by contrast, the maximum of the RX outputs obtained with multiple dual windows is taken for each pixel [27] and compared with a threshold for the decision. The differences between MW-RX and the proposed RX-Fusion approach are twofold. First, MW-RX considers only the maximum value and ignores the others, whereas RX-Fusion conducts multiple detection (thresholding) processes and genuinely fuses multiple decisions. Second, the winner-take-all rule in MW-RX favors anomalies but not background: a background pixel is flagged as soon as a single window yields a large output, which makes MW-RX prone to a high false alarm rate, whereas RX-Fusion exploits the additional parameter $t$ to control this effect. In the experiments, we show that $t$ can easily be selected to achieve a suboptimal performance close to the best one, which means that the proposed RX-Fusion and KRX-Fusion can be operated as essentially parameter-free.
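The normalize, threshold, and vote steps above can be sketched in a few lines of Python with NumPy. Each detection map is min-max normalized over the whole image, as described, and a normalized output exactly equal to $\eta$ casts no vote, following the strict inequality $r_i(y) - \eta > 0$. The helper names are hypothetical.

```python
from typing import Sequence

import numpy as np


def minmax_normalize(score_map: np.ndarray) -> np.ndarray:
    """Rescale a detection map to [0, 1] over the entire image."""
    lo, hi = score_map.min(), score_map.max()
    if hi == lo:                       # constant map: no pixel stands out
        return np.zeros_like(score_map, dtype=float)
    return (score_map - lo) / (hi - lo)


def decision_fusion(score_maps: Sequence[np.ndarray], eta: float, t: int) -> np.ndarray:
    """Count votes N(y) = #{i : r_i(y) > eta} over m normalized maps and label
    a pixel anomalous (1) when N(y) >= t, otherwise normal (0)."""
    m = len(score_maps)
    if not 1 <= t <= m:
        raise ValueError("t must satisfy 1 <= t <= m")
    votes = sum((minmax_normalize(r) > eta).astype(int) for r in score_maps)
    return (votes >= t).astype(int)
```

With the 12 window pairs used in the experiments, the empirical rule of voting with half of the detectors corresponds to `decision_fusion(maps, eta, t=6)`.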

Hyperspectral Data
The first dataset is a hyperspectral digital imagery collection experiment (HYDICE) image [30]. This scene consists of 80 × 100 pixels over an urban area, with a spatial resolution of approximately 1 m. After removal of the water-vapor absorption bands, 175 bands covering 0.4 to 2.5 μm remain. There are approximately 21 anomalous pixels, representing cars and roofs. The scene and the ground-truth map of the anomalies are shown in Fig. 1.
The second dataset was acquired by the HyMap airborne hyperspectral imaging sensor [31], which provides 126 spectral bands spanning the wavelength interval 0.4 to 2.5 μm. The image, covering an area of Cooke City, Montana, was collected on July 4, 2006, with a spatial size of 200 × 800 pixels and a ground resolution of approximately 3 m. Seven types of targets, including four fabric panel targets and three vehicle targets, were deployed in the region of interest. In our experiment, we crop a subimage of 100 × 300 pixels that contains all these targets (anomalies), as depicted in Fig. 2. Figure 3 further illustrates the spectral signatures of the seven targets, which differ significantly from the background mean.

Detection Performance
We investigate the effectiveness of the proposed RX-Fusion and KRX-Fusion. For KRX, the commonly used Gaussian radial basis function kernel is adopted [21]; based on our experimental study, the kernel parameter is set to 50 for both datasets. As for the windows $(w_{in}, w_{out})$, since anomalies are usually small, we use the general choices listed in Table 1, which comprise 12 pairs in total. Figure 4 first illustrates the performance with varying window sizes $(w_{in}, w_{out})$ on the HYDICE urban data, where the receiver operating characteristic (ROC) curve is employed to quantitatively evaluate detection ability. The results clearly show that detector performance changes significantly with $(w_{in}, w_{out})$ and deteriorates when an inappropriate window is chosen, which motivates us to design a window-independent detector. The proposed RX-Fusion and KRX-Fusion, based on the decision-fusion strategy, simultaneously adopt multiple windows and produce the final decision map via a voting process. For the HYDICE urban data, as shown in Fig. 5(a), the best $(w_{in}, w_{out})$ for both RX and KRX is (7, 9); moreover, the area under the ROC curve (AUC) of RX and KRX is sensitive to the choice of window sizes, which is consistent with Fig. 4. In Fig. 5(b), the optimal $t$ values (out of 12) for RX-Fusion and KRX-Fusion are 5 and 4, respectively. Note that when $t = 6$, the performance of RX-Fusion and KRX-Fusion is very similar to the best ones, which are also close to the case with the best window settings, as shown in Fig. 5(b). In Fig. 6, for the HyMap data, the best $(w_{in}, w_{out})$ for both RX and KRX is (7, 11), and the best $t$ values for RX-Fusion and KRX-Fusion are 9 and 8, respectively. In Fig. 6(b), with $t = 6$, the performance of both RX-Fusion and KRX-Fusion is slightly worse than at the optimal $t$, but much better than with inappropriate window sizes, as shown in Fig. 6(a).
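ROC curves ($P_d$ versus $P_f$) and the AUC can be computed from a continuous detection map and a binary ground-truth map as sketched below in Python with NumPy. The function names are hypothetical, and how the ROC of the binary fusion output is traced (for example, by sweeping $\eta$) is not specified here, so the sketch only covers continuous scores.

```python
import numpy as np


def roc_curve(scores, labels):
    """Return (P_f, P_d) arrays obtained by sweeping the decision threshold
    from the highest score to the lowest; a pixel is declared anomalous when
    its score is at or above the threshold."""
    scores = np.asarray(scores, dtype=float).ravel()
    labels = np.asarray(labels, dtype=bool).ravel()
    order = np.argsort(-scores, kind="mergesort")
    s_sorted, l_sorted = scores[order], labels[order]
    tp = np.cumsum(l_sorted)     # detected anomalies at each cut
    fp = np.cumsum(~l_sorted)    # false alarms at each cut
    # Use only the last position of each run of tied scores, so that tied
    # pixels are always accepted or rejected together.
    last = np.r_[np.nonzero(np.diff(s_sorted))[0], s_sorted.size - 1]
    p_d = np.r_[0.0, tp[last] / labels.sum()]
    p_f = np.r_[0.0, fp[last] / (~labels).sum()]
    return p_f, p_d


def auc(p_f, p_d) -> float:
    """Area under the ROC curve by the trapezoidal rule."""
    return float(np.sum(np.diff(p_f) * (p_d[1:] + p_d[:-1]) / 2.0))
```

For a detection map `score_map` and a ground-truth map `gt` of the same shape (hypothetical names), `auc(*roc_curve(score_map, gt))` gives the AUC.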
Under the best parameters, Figs. 7 and 8 compare the ROC performance of the proposed RX-Fusion and KRX-Fusion with that of RX, KRX, MW-RX, and MW-KRX; for better visualization, the RX-Fusion and KRX-Fusion cases are shown separately. The results show that the proposed RX-Fusion is consistently superior to RX and MW-RX, and the proposed KRX-Fusion outperforms KRX and MW-KRX. For the HYDICE urban data, MW-KRX performs better than KRX; however, this is not the case for the HyMap data. To further investigate detection performance on the HYDICE urban data, Fig. 9 shows the detection maps when the false-alarm probability $P_f$ is fixed at a small value (e.g., 0.005) and the detection probability $P_d$ is maximized. The proposed RX-Fusion and KRX-Fusion still perform best, with the largest $P_d$, which is consistent with the results in Fig. 7. Furthermore, although the suboptimal RX-Fusion and KRX-Fusion (i.e., $t = 6$ with $m = 12$) perform slightly worse than the best RX and KRX (whose window settings are unknown in practice), respectively, they are much better than the worst and average performances of RX and KRX. This means that, in practice, we can empirically set $t$ to 50% of the total number of detectors; in other words, if half of the detectors declare a pixel an anomaly, it is an anomaly in the final decision.
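An operating point with a fixed $P_f$, such as the one used for the detection maps, can be obtained by thresholding at an order statistic of the background scores. The Python sketch below picks the lowest threshold whose false-alarm rate stays at or below the target, which maximizes $P_d$ under a strict "score > threshold" rule; the function name and this decision rule are assumptions made for illustration.

```python
import numpy as np


def pd_at_fixed_pf(scores, labels, pf_target=0.005):
    """Return (threshold, P_f, P_d) for the lowest threshold whose
    false-alarm rate does not exceed pf_target (anomaly if score > threshold)."""
    scores = np.asarray(scores, dtype=float).ravel()
    labels = np.asarray(labels, dtype=bool).ravel()
    bg = np.sort(scores[~labels])[::-1]        # background scores, descending
    k = int(np.floor(pf_target * bg.size))     # largest allowed number of false alarms
    # At most k background scores are strictly greater than bg[k] (0-based),
    # and any lower threshold would admit at least k + 1 false alarms.
    thr = bg[k] if k < bg.size else -np.inf
    p_f = float(np.mean(scores[~labels] > thr))
    p_d = float(np.mean(scores[labels] > thr))
    return thr, p_f, p_d
```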

Conclusions
In this work, we proposed an effective decision-fusion strategy for dual-window-based anomaly detection in HSI. For each test sample, the outputs of a detector with multiple windows are first obtained, and the final detection is achieved through a voting process. Experimental results on two hyperspectral datasets demonstrated that the proposed RX-Fusion/KRX-Fusion outperformed the existing RX, KRX, MW-RX, and MW-KRX. Although the final decision depends on a voting parameter, we found that voting with $t$ equal to 50% of the number of detectors yields suboptimal (close to optimal) performance, which is significantly better than that of a single detector with an unfortunately poor window setting. The base detector operates like a spatial convolution with a sliding dual window and is therefore suitable for parallel computing [32,33], because the output of one pixel is independent of the output of another. In the proposed decision-fusion framework, the multiple dual windows can also be implemented simultaneously, which will be investigated in future work.
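The independence of the per-pixel outputs, and of the $m$ window pairs, can be illustrated with Python's standard `concurrent.futures` module. In this sketch (function names are hypothetical), each window pair's RX map is computed as a separate task; clipping the outer window at the image border and the pseudoinverse tolerance are our assumptions. A thread pool keeps the example self-contained, but because the per-pixel loop is pure Python, real speedups would call for a process pool, vectorized code, or a GPU implementation.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def rx_map(cube: np.ndarray, w_in: int, w_out: int, rel_tol: float = 1e-10) -> np.ndarray:
    """Dual-window RX map for one (w_in, w_out) pair; each pixel is scored independently."""
    rows, cols, _ = cube.shape
    h_in, h_out = w_in // 2, w_out // 2
    out = np.zeros((rows, cols))
    for r in range(rows):
        for c in range(cols):
            # Assumption: the outer window is clipped at the image border.
            r0, r1 = max(r - h_out, 0), min(r + h_out + 1, rows)
            c0, c1 = max(c - h_out, 0), min(c + h_out + 1, cols)
            rr, cc = np.meshgrid(np.arange(r0, r1), np.arange(c0, c1), indexing="ij")
            mask = (np.abs(rr - r) > h_in) | (np.abs(cc - c) > h_in)
            X = cube[rr[mask], cc[mask], :]                  # s x d background samples
            diff = cube[r, c] - X.mean(axis=0)
            vals, vecs = np.linalg.eigh(np.cov(X, rowvar=False, bias=True))
            keep = vals > rel_tol * vals.max()               # pseudoinverse via eigenpairs
            z = vecs[:, keep].T @ diff
            out[r, c] = np.sum(z ** 2 / vals[keep])
    return out


def rx_maps_parallel(cube: np.ndarray, window_pairs, max_workers: int = 4):
    """Compute one RX map per (w_in, w_out) pair as concurrent tasks."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda p: rx_map(cube, *p), window_pairs))
```

The resulting maps can then be fused pixel by pixel, since neither the maps nor the votes depend on one another.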