## 1. Introduction

One of the latest advancements in remote sensing is the advent of hyperspectral imaging spectrometers that are able to acquire data simultaneously in hundreds of bands with narrow bandwidths. Hyperspectral data can provide detailed contiguous spectral curves, a trait that traditional multispectral sensors cannot offer. For example, NASA’s Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) has 224 contiguous bands with a nominal bandwidth of 10 nm for each, while the multispectral sensor Landsat thematic mapper (TM) only provides seven noncontiguous broad bands with an average bandwidth of 100 nm, which leaves large gaps between bands. This increase in spectral dimensionality not only allows the delineation of the land surface at the land use and land cover level, but also potentially enables the characterization of various minerals, soils, rocks, and vegetation at the material level.^{1} However, the greater spectral detail comes at a cost of huge data volumes with high dimensionality, which poses great challenges in extracting thematic information from these images.

The conventional multispectral classification approaches, such as maximum likelihood, could not deliver satisfactory results when applied to hyperspectral images.^{2} This is because most multispectral classifiers are statistics-based, assuming multivariate normal distributions, linearity, and an absence of collinearity for the input bands. These assumptions are hardly achievable for hyperspectral images with hundreds of bands. For supervised classifiers, the number of training samples for each class, at a minimum, should be more than the number of spectral bands to avoid the creation of singular matrices.^{3} In practice, it is suggested that the number of training samples for each class be 10 times the number of input image bands in order to obtain acceptable results,^{1} which becomes very difficult, if not completely impossible, for hyperspectral images, due to their high spectral dimensionality. For these reasons, nonparametric approaches specifically designed for processing hyperspectral data have been proposed, such as spectral angle mapper, linear spectral unmixing, and spectroscopic library matching.^{1} These methods are supervised techniques based on the availability of reference endmembers, which are spectrally pure reflectance readings of different materials. Since these endmember-based approaches are nonparametric, there is no need to collect a large number of training samples for each class. However, as supervised techniques, they require *a priori* knowledge of the ground reference information in the form of a complete set of endmembers. Endmembers can be derived either from extensive *in situ* and lab work using a spectroradiometer or from a hyperspectral image if spectrally pure pixels of the materials can be identified on the image. However, *in situ* data collection to obtain ground reference information in advance is not always possible prior to image classification.
Additionally, endmember determination from the image itself is very difficult, due to the existence of mixed pixels in the images. Therefore, when no or very little *a priori* ground reference information is available, unsupervised approaches appear to be an attractive alternative to the existing endmember-based methods.^{1} However, most conventional unsupervised image classifiers, such as the Iterative Self-organizing Data Analysis (ISODATA) technique, were designed to analyze multispectral imagery and may be inadequate for hyperspectral data analysis for two main reasons. First, these methods are computationally intensive, because all pixels in the image must be compared with all clusters through multiple iterations in order to assign them to the closest cluster. Second, they tend to suffer from performance degradation with increased spectral dimensionality of the hyperspectral images.^{4}

Artificial intelligence (AI)-based approaches, especially artificial neural network and fuzzy logic techniques, have been extensively employed to analyze multispectral images. Mas and Flores^{5} have reviewed these techniques for remote sensing applications. Like conventional image classifiers, AI-based approaches can be supervised or unsupervised. Among the various unsupervised classifiers, self-organizing maps (SOM),^{6} fuzzy $c$-means (FCM),^{7} and descending fuzzy learning vector quantization (DFLVQ)^{8} have historically been used for processing multispectral images or hyperspectral images with reduced spectral dimensionality.^{9–14} Despite avoiding problems with singular matrices and the need for obtaining pure pixels as with the statistics-based and endmember-based approaches, these AI-based classifiers have not been widely used in hyperspectral image classification. This may be attributed to certain limitations with these AI-based classifiers. As with the ISODATA approach, these classifiers involve computationally intensive algorithms. Further, when assigning a pixel to a specific cluster, these classifiers only consider the center of the clusters and ignore the data dispersion within each cluster.^{15} In addition, the exponent weighting parameter involved with some of these classifiers (such as FCM and DFLVQ) is difficult to specify.^{16}

To address these deficiencies of the SOM, FCM, and DFLVQ approaches, we propose an unsupervised Gaussian fuzzy self-organizing map (GFSOM), a neuro-fuzzy system specifically designed for hyperspectral image classification. A neuro-fuzzy system combines the advantages of neural network and fuzzy logic systems and avoids the shortcomings of each individual system when they are used separately for image classifications. These shortcomings include the black box problem of a neural network and the fuzzy system’s lack of an automatic knowledge acquisition capability.^{17} The GFSOM classifier is built upon the success of its supervised counterpart, Gaussian fuzzy learning vector quantization (GFLVQ).^{2} We also investigate whether the optimal learning sample selection strategy and the prototype initialization system developed for the GFSOM system can be adopted by the SOM, FCM, and DFLVQ approaches in classifying hyperspectral images.

## 2. Background

The basic algorithms underlying the three classic AI-based unsupervised classifiers (SOM, FCM, and DFLVQ) are described below, and their limitations are briefly examined. They will be used as benchmarks to evaluate the proposed GFSOM system.

### 2.1. SOM

SOM is a “winner-take-all” unsupervised learning neural network. It was first proposed by Kohonen^{6} and is often used to classify inputs into different categories. SOM has two layers: an input layer and a competitive or output layer. The input layer has the same dimensionality as the input vector, and the output layer consists of a physical net of neurons located at fixed positions. Different from many other neural networks, SOM is unique in that it is independent of any activation function, and it does not have a hidden layer. For a given input vector, outputs are computed for the neurons in the output layer, and the winner is the neuron whose weight has the minimum distance from the input vector. Then the network updates the weight of the winner and those of its predefined neighbors via a learning rule. In this way, the output neurons become selectively tuned to the input patterns presented during this competitive-learning procedure. In the case of remotely sensed image classification, for a pixel with $N$ bands as an input vector $(x_1, x_2, \dots, x_N)$, the winner output neuron is decided by:

$$\text{winner}=\underset{j}{\arg\min}\sqrt{\sum_{i=1}^{N}(x_i-c_{ij})^{2}},\tag{1}$$

where $c_{ij}$ is the weight connecting the $i$’th input neuron to the $j$’th output neuron. The weights are then updated with the learning rate $\eta$ as follows:

$$\Delta c_{ij}=\eta(x_i-c_{ij}),\quad\text{if output neuron } j \text{ is the winner},\tag{2}$$

$$\Delta c_{ij}=0,\quad\text{otherwise}.\tag{3}$$

Like other AI-based unsupervised algorithms, the objective of SOM learning is to generate the weights to model the prototypes, also known in the literature as the code vectors or centers of the clusters.^{16} SOM uses the Euclidean distance to determine the closeness of an input pixel to the center of a cluster, but it does not consider the data dispersion in the cluster. SOM has been extensively applied in various areas^{18} but has not yet been used for clustering hyperspectral images, due to certain limitations.^{12} For example, it is difficult to select the initial neighborhood size that will be altered during the learning iterations in order to achieve “useful” results.^{8} When it is used for hyperspectral image classification, the first Kohonen heuristic rule that monotonically decreases the learning rate with time works well, but the second rule that decreases the neighborhood size adaptively may not be applicable for images with a small number of spectral clusters (say, $\le 4$). The initial values for the two parameters, $\eta$ and $c_{ij}$, need to be set before the learning starts, and these values can greatly influence both the speed of the computations and the learning outcome.
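The winner rule and the update in Eq. (2) can be illustrated with a minimal Python sketch. It is a simplification under stated assumptions: only the winning prototype is updated (the neighborhood is ignored), prototypes are stored as plain lists, and the function name `som_step` is hypothetical.

```python
import math

def som_step(x, prototypes, eta):
    """One winner-take-all SOM update for a single pixel.

    x          -- list of N band values for one pixel
    prototypes -- list of M prototypes, each a list of N weights (c_ij)
    eta        -- learning rate
    Returns the index of the winning output neuron.
    """
    # Winner: the prototype with the minimum Euclidean distance to x.
    distances = [math.dist(x, c) for c in prototypes]
    winner = distances.index(min(distances))
    # Eq. (2): move only the winner toward the input; the others stay unchanged.
    prototypes[winner] = [c + eta * (xi - c) for xi, c in zip(x, prototypes[winner])]
    return winner

# Two hypothetical prototypes for a two-band pixel.
prototypes = [[0.0, 0.0], [1.0, 1.0]]
winner = som_step([0.8, 0.6], prototypes, eta=0.5)
print(winner, prototypes)
```

Because the distance is purely Euclidean, two clusters with very different spreads are treated identically, which is the limitation discussed above.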

### 2.2. FCM

The learning rules in SOM are basically a local updating strategy that ignores the global relationships between the winner and the rest of the weights, in that all the emphasis is put on the winning prototype.^{16} Unlike SOM, FCM considers the global geometric structure present in the data and updates all the prototypes during the learning iterations. FCM was first developed by Dunn^{19} and improved by Bezdek.^{7} It has been widely utilized in pattern recognition, but its application in hyperspectral image classification is rare. The algorithm for FCM is based on minimization of an objective function:

$$J_m=\sum_{i=1}^{N}\sum_{j=1}^{M}u_{ij}^{m}\Vert x_i-c_{ij}\Vert^{2},\quad 1\le m<\infty,\tag{4}$$

where $m$ is the weighting exponent and $u_{ij}$ is the fuzzy membership grade, calculated as

$$u_{ij}=\left(\sum_{k=1}^{M}\left(\frac{\Vert x_i-c_{ij}\Vert}{\Vert x_i-c_{ik}\Vert}\right)^{\frac{2}{m-1}}\right)^{-1}.\tag{5}$$

FCM requires setting an appropriate value for $m$, an important parameter depicting the fuzziness of the system. If $m$ is small (e.g., close to 1), FCM tends to produce almost crisp vectors, and only the winner prototype is updated, leading to a process similar to SOM. If $m$ is large (e.g., $>7$), FCM becomes fuzzy and updates every prototype to a very small extent. To achieve a satisfactory result, neither very small nor very large values of $m$ are desirable.^{16} Similarly, $c_{ij}$ or $u_{ij}$ needs to be initialized, because they determine the speed of convergence and its outcome. FCM does not explicitly consider the data dispersion of each cluster, although it does model the global geometric structure of the data. For FCM, the membership $u_{ij}$ needs to be calculated for all the learning samples in each run, which may lead to very long computation times when used to process hyperspectral data.
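The effect of $m$ on the memberships in Eq. (5) can be checked numerically. The sketch below is illustrative only: the function name `fcm_memberships`, the toy sample, and the two prototypes are hypothetical, and the handling of a sample that coincides with a prototype is an added assumption.

```python
import math

def fcm_memberships(x, prototypes, m):
    """Fuzzy memberships of one sample x to every prototype, following Eq. (5)."""
    d = [math.dist(x, c) for c in prototypes]
    # Assumption: a sample lying exactly on a prototype belongs fully to it.
    if 0.0 in d:
        hits = [1.0 if dj == 0.0 else 0.0 for dj in d]
        return [h / sum(hits) for h in hits]
    exponent = 2.0 / (m - 1.0)
    return [1.0 / sum((dj / dk) ** exponent for dk in d) for dj in d]

# One-band toy sample at 0.0 with prototypes at 1.0 and 3.0.
for m in (1.1, 3.0, 7.0):
    print(m, fcm_memberships([0.0], [[1.0], [3.0]], m))
```

With $m$ close to 1 the nearer prototype receives almost the entire membership, while with $m=7$ the two grades move toward each other, which matches the crisp and fuzzy regimes described above.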

### 2.3. DFLVQ

DFLVQ was developed by Tsao et al.^{8} to improve the performance of SOM and FCM by combining the hard Kohonen SOM and the soft FCM algorithms. In DFLVQ, the learning rate $\eta$ in SOM is renamed $\alpha$ and is calculated as a function of the fuzzy membership $u_{ij}$ in FCM and of a weighting exponent $m_t$ that varies with the learning step $t$. The prototypes are then updated as

$$c_{ij,t}=c_{ij,t-1}+\frac{\sum_{i=1}^{N}\alpha_{ij,t}(x_i-c_{ij,t-1})}{\sum_{i=1}^{N}\alpha_{ij,t}}.\tag{8}$$

We can see that in DFLVQ, the learning rule depends not only on the distance between the input spectral vector and the prototype, but also on the learning rate, which is adjusted over time with $m_t$. A larger $m_t$ leads to a lower learning rate, whereas a smaller $m_t$ results in a higher learning rate. Thus, it is preferable to vary $m_t$ in order to control the amount of fuzziness by adjusting it from a larger initial value ($m_0$) to a smaller final one ($m_f$) using a simple linear function decreasing with time. Hence, this method is referred to as descending FLVQ (DFLVQ) in the literature. This improved algorithm not only adjusts the learning rate but also updates all neighborhoods with time, thus achieving a good balance between SOM and FCM. However, it is often criticized because of the difficulty in setting the right range for $m$ and the appropriate initial values for $c_{ij}$, which may severely influence the computation cost and results. Again, this method does not consider data dispersion within each cluster explicitly, and it is computationally very intensive. Consequently, practical applications of DFLVQ in hyperspectral image classification are very limited.
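A rough Python sketch of one descending pass is given below. Two points are assumptions rather than statements of the original method: the learning rate is taken as the FCM membership raised to the power $m_t$, and the linear schedule is written as an interpolation between $m_0$ and $m_f$. The function names and the toy data are hypothetical, and $m_t$ must stay above 1.

```python
import math

def descending_m(t, total_steps, m0, mf):
    """Linearly move the weighting exponent from m0 (t = 0) to mf (t = total_steps)."""
    return m0 + (mf - m0) * t / total_steps

def dflvq_update(samples, prototypes, m_t):
    """One batch update of all prototypes in the form of Eq. (8)."""
    exponent = 2.0 / (m_t - 1.0)
    num = [[0.0] * len(c) for c in prototypes]
    den = [0.0] * len(prototypes)
    for x in samples:
        # Clamp distances so a sample sitting on a prototype does not divide by zero.
        d = [max(math.dist(x, c), 1e-12) for c in prototypes]
        for j, c in enumerate(prototypes):
            u = 1.0 / sum((d[j] / dk) ** exponent for dk in d)
            alpha = u ** m_t  # assumed learning rate: membership raised to m_t
            den[j] += alpha
            for i, (xi, ci) in enumerate(zip(x, c)):
                num[j][i] += alpha * (xi - ci)
    updated = []
    for j, c in enumerate(prototypes):
        if den[j] == 0.0:
            updated.append(list(c))  # no sample carried any weight for this prototype
        else:
            updated.append([ci + n / den[j] for ci, n in zip(c, num[j])])
    return updated

# Toy one-band data with two obvious groups.
samples = [[0.0], [0.1], [1.0], [0.9]]
prototypes = [[0.2], [0.8]]
steps = 10
for t in range(steps + 1):
    prototypes = dflvq_update(samples, prototypes, descending_m(t, steps, m0=7.0, mf=1.1))
print(prototypes)
```

Since membership grades lie between 0 and 1, raising them to a larger power yields a smaller learning rate, which is consistent with the behavior of $m_t$ described above.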

## 3. GFSOM

As mentioned above, SOM, FCM, and DFLVQ only model the center of a cluster without explicitly considering the dispersion of data within the cluster in the learning procedure. To address this problem, we propose a neuro-fuzzy system that utilizes the Gaussian membership function to fuzzify the SOM neural network, which leads to the GFSOM system. The development of this unsupervised GFSOM was inspired by its supervised counterpart, Gaussian fuzzy learning vector quantization (GFLVQ),^{2} which was first developed by Qiu and Jensen^{17} in an attempt to open the black box of neural systems in multispectral image classification.

The structure of the GFSOM system is similar to that of Kohonen’s SOM. It has two layers: the input layer and the competitive layer, also known as the output layer (Fig. 1). The neurons in the input layer correspond to all the hyperspectral bands of an input pixel, and the number of output neurons is associated with the number of resulting clusters defined by users. In addition to these fundamental components of the SOM neural network, the GFSOM system has four further components: system initialization, fuzzification, neuro-fuzzy learning, and clustering and defuzzification.

### 3.1. System Initialization

#### 3.1.1. Selection of learning samples

Unsupervised classification searches for natural groupings of spectral properties of input pixels. The selection of the input pixels used as learning samples is critical for the performance of unsupervised AI-based approaches to hyperspectral image classification. For multispectral image classifications, the unsupervised methods often use all the pixels of an image scene as the learning samples. This strategy may be feasible for small hyperspectral images but is unlikely to succeed when processing regular hyperspectral images with hundreds of bands. Even if it succeeds, the learning procedure is likely to be extremely time-consuming for these types of images. The use of all the pixels as learning samples may also encounter problems due to spatial autocorrelation, which can lead to an overestimation of the contrast between categories.^{20} Tubbs and Coverly^{21} suggest that classification algorithms need to be modified in order to take into account the problems caused by the spatial autocorrelation structure of homogeneous samples. To avoid such problems, Craig^{22} recommends sampling pixels no closer than every 10th pixel for accuracy assessment. Therefore, using all pixels as learning samples is not only inefficient but also theoretically problematic for clustering hyperspectral images.

To address this problem, we propose a random sampling scheme (RSS) to select learning samples during the learning iteration for GFSOM, a method that can also be used by the other three AI-based unsupervised classifiers for hyperspectral images. With the RSS method, an input image is randomly sampled based on a user-defined number. This number, representing the number of pixels to be randomly selected, is set depending on the number of output clusters, the size of an input image, and the degree of spatial autocorrelation existing among the input pixels. One case study illustrated that a hyperspectral image with 160,000 pixels and 112 spectral bands only needs approximately 1000 randomly selected learning pixels to generate eight stable clusters. In each iterative learning step, the randomly selected pixels are used to update the weights of the AI-based classifiers. This proposed RSS method can greatly improve the efficiency of the learning process and reduce the spatial autocorrelation among the learning data.
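The sampling step can be sketched in a few lines of Python. This is an illustration under assumptions: pixel positions are drawn uniformly without replacement, one draw is made per call (the text does not state whether a fresh draw is made at every iteration), and the function name is hypothetical.

```python
import random

def random_sample_pixels(n_rows, n_cols, n_samples, rng=None):
    """Draw n_samples distinct (row, col) pixel positions uniformly at random."""
    rng = rng or random.Random()
    flat = rng.sample(range(n_rows * n_cols), n_samples)
    # Convert flat indices back to (row, col) positions.
    return [divmod(k, n_cols) for k in flat]

# Sizes taken from the case described above: 160,000 pixels, 1000 samples.
positions = random_sample_pixels(400, 400, 1000, random.Random(0))
print(len(positions), positions[:3])
```

Sampling positions rather than copying spectra keeps the memory cost independent of the number of bands until the selected pixels are actually read.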

#### 3.1.2. Initialization of the prototypes

As with SOM, FCM, and DFLVQ, prototypes (${c}_{ij}$) need to be initialized before the self-learning starts. Random assignment of prototypes is often used by the existing AI-based approaches to multispectral image classification. In SOM and GFSOM, a winner will be determined for each input learning sample. When prototypes are assigned randomly, the first winner remains the winner for every subsequent sample, because the spectral profiles of the other prototypes are still random, and the learning procedure fails completely. For FCM and DFLVQ, starting from random spectral profiles of the prototypes may cause the learning procedure to be extremely slow. Thus, we propose a scheme for generating initial prototypes from the selected learning samples. This initialization strategy is based on a simplified K-mean scheme (SKS) and involves several steps. The first step is to randomly select initial prototypes from the first $M$ learning samples. The second step is to classify the remaining learning samples to the closest prototype to form clusters. The last step is to calculate the new prototypes based on the samples assigned to individual clusters. The SKS is different from K-means clustering, because it needs neither to cluster all of the pixels of an image nor to iterate over multiple passes. The prototypes derived from this SKS approach can then be used as the initial prototypes for GFSOM. Similarly, this scheme can also be adopted by SOM, FCM, and DFLVQ to classify hyperspectral images.
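The three SKS steps can be sketched as a single pass over the learning samples. The sketch makes several assumptions that the text does not settle: the first $M$ (already randomly drawn) learning samples serve directly as the seeds, closeness is measured with the Euclidean distance, and each seed is counted as a member of its own cluster. The function name `sks_init` is hypothetical.

```python
import math

def sks_init(samples, n_clusters):
    """Single-pass prototype initialization in the spirit of the SKS.

    Returns (prototypes, members), where members[j] holds the samples
    assigned to cluster j.
    """
    # Step 1: use the first n_clusters learning samples as seed prototypes.
    seeds = [list(s) for s in samples[:n_clusters]]
    members = [[s] for s in seeds]
    # Step 2: assign every remaining sample to its closest seed.
    for x in samples[n_clusters:]:
        d = [math.dist(x, c) for c in seeds]
        members[d.index(min(d))].append(x)
    # Step 3: new prototype = per-band mean of the samples in each cluster.
    prototypes = [[sum(band) / len(group) for band in zip(*group)] for group in members]
    return prototypes, members

prototypes, members = sks_init([[0.0], [1.0], [0.2], [0.8], [0.1]], n_clusters=2)
print(prototypes)
```

Unlike full K-means clustering, the assignment and averaging are performed once, so the cost is a single sweep over the selected learning samples.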

In GFSOM, it is also necessary to initialize the standard deviations that measure the dispersion of data within a cluster, a parameter lacking from the other three AI-based unsupervised classifiers. The initial value for this parameter can be calculated by applying the normal standard deviation equation to the initial value of a prototype (as the mean) and the values of the selected learning samples assigned to its related cluster. Finally, prior to running the learning procedure, all the pixel values of the selected learning samples and the original input image need to be normalized to [0, 1] as in the SOM neural network.
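A small sketch of these two preparatory steps follows. It assumes a population-style standard deviation taken around the initial prototype and a min-max rescaling to [0, 1]; the text does not specify either detail, nor whether scaling is done per band or over the whole image, and both function names are hypothetical.

```python
import math

def init_sigma(prototype, group):
    """Per-band standard deviation of a cluster's samples around its prototype."""
    n = len(group)
    return [
        math.sqrt(sum((x[i] - c) ** 2 for x in group) / n)
        for i, c in enumerate(prototype)
    ]

def min_max_scale(values):
    """Rescale a list of pixel values linearly to the interval [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant input: no spread to preserve
    return [(v - lo) / (hi - lo) for v in values]

print(init_sigma([0.1], [[0.0], [0.2], [0.1]]))
print(min_max_scale([10, 20, 30]))
```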

### 3.2. Fuzzification

Since the input pixel values fed into the GFSOM system are not fuzzy numbers, they must be converted into a set of fuzzy numbers through a fuzzification process. The fuzzification of a pixel value of a single band is based on the Gaussian fuzzy membership function:

$$u_{ij}=e^{-\left[\frac{(x_i-c_{ij})^{2}}{2\times\sigma_{ij}}\right]},\tag{9}$$

where ${u}_{ij}$ is the fuzzy membership grade, ${c}_{ij}$ is the mean parameter of the Gaussian function, corresponding to the center of the $j$’th cluster of the $i$’th band, ${x}_{i}$ is the $i$’th input variable (i.e., the pixel value for the $i$’th band) of the input learning pixel, and ${\sigma}_{ij}$ represents the Gaussian standard deviation parameter characterizing dispersion of data in the cluster. The parameters ${u}_{ij}$, ${c}_{ij}$, ${x}_{i}$, $i$, and $j$ are the same parameters as in FCM and DFLVQ. The GFSOM system uses the derived fuzzy membership grade, rather than the Euclidean distance, as a relative distance to determine the closeness of the input pixel to a cluster prototype. This fuzzification process models the spectral properties of input data not only by capturing their central tendency but also by modeling their data dispersion. The values ${c}_{ij}$ and ${\sigma}_{ij}$ are initialized using the proposed prototype initialization system. Only the selected learning samples are entered into the learning iterations.

Equation (9) can only be used to determine the fuzzy membership grade of one band for an input pixel. For hyperspectral data, to assign a pixel to a particular cluster, all the bands of this pixel need to be considered. To achieve this, an *and-or* fuzzy operator in the form of the geometric mean is utilized, so that an overall membership grade for this pixel can be obtained as:

$$\alpha_j=\left\{\prod_{i=1}^{N}e^{-\left[\frac{(x_i-c_{ij})^{2}}{2\times\sigma_{ij}}\right]}\right\}^{\frac{1}{N}},\tag{10}$$

where $\alpha_j$ is the overall membership grade of the input pixel for the $j$’th cluster and $N$ is the number of bands. The geometric mean is referred to as an *and-or* fuzzy operator, because it sits between the fuzzy *and* (intersection) and the fuzzy *or* (union) operators. As an averaging operator, it allows a low membership grade in one band to be compensated for by a high membership grade in another band, so that a missing or noisy value in one band will not heavily affect the clustering output of the entire pixel. In addition, the *and-or* operator is an “idempotent” function, such that *and-or*$(a,a,\dots ,a)=a$, which delivers a more reasonable final membership grade. The winner is then assigned to the output neuron with the maximum overall membership grade.
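The overall grade in Eq. (10) can be evaluated in log space, which avoids numerical underflow when a product over more than a hundred bands is formed. The sketch below follows the exponent as printed in Eq. (10), with $2\times\sigma_{ij}$ in the denominator; a textbook Gaussian would use $2\sigma_{ij}^{2}$ instead, which would be a one-line change. The function names are hypothetical.

```python
import math

def band_membership(x_i, c_ij, sigma_ij):
    """Gaussian membership grade of one band value (the factor inside Eq. (10))."""
    return math.exp(-((x_i - c_ij) ** 2) / (2.0 * sigma_ij))

def overall_membership(x, c_j, sigma_j):
    """Geometric mean of the per-band grades, Eq. (10), computed in log space."""
    n = len(x)
    log_sum = sum(-((xi - ci) ** 2) / (2.0 * si) for xi, ci, si in zip(x, c_j, sigma_j))
    return math.exp(log_sum / n)

# A poor match in the second band is partly compensated by a perfect first band.
grades = [band_membership(0.5, 0.5, 0.1), band_membership(0.9, 0.5, 0.1)]
print(grades, overall_membership([0.5, 0.9], [0.5, 0.5], [0.1, 0.1]))
```

The printed overall grade lies between the smallest and the largest per-band grade, which is the compensation property described above; when all band grades are equal, the function returns that common value.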

### 3.3. Neuro-Fuzzy Learning

GFSOM performs fuzzy partitioning of the input vector based on competitive learning as in SOM, but unlike the original SOM that searches only the center of each cluster, GFSOM seeks to learn both the center of a cluster and the average deviation from the center of this cluster based on the pixels assigned to it. Thus, the parameters ${c}_{ij}$ and ${\sigma}_{ij}$ in the Gaussian membership function are used in conjunction to determine the “winner” output neuron. The GFSOM updates both parameters for the output neuron. Unlike the supervised system, where these two parameters are updated based on the true target class information provided by the training data and the competition result, the unsupervised learning of GFSOM has to rely solely on the competition outcome. The fuzzy competitive learning for these two parameters in GFSOM is as follows:

$$\Delta c_{ij}=\eta(x_i-c_{ij}),\quad\text{if the } j\text{’th output neuron is the winner},\tag{11}$$

$$\Delta c_{ij}=0,\quad\text{otherwise},\tag{12}$$

$$\Delta\sigma_{ij}=\eta(|x_i-c_{ij}|-\sigma_{ij}),\quad\text{if the } j\text{’th output neuron is the winner},\tag{13}$$

$$\Delta\sigma_{ij}=0,\quad\text{otherwise}.\tag{14}$$

The updating rule for ${c}_{ij}$ is the same as that used in SOM [see Eqs. (2) and (3)]. However, additional updating rules for the standard deviation weights were created [Eqs. (13) and (14)]. When the absolute deviation of the input pattern ${x}_{i}$ from the center of the matched cluster (centered at ${c}_{ij}$) is larger than the current standard deviation ${\sigma}_{ij}$ of the cluster, the standard deviation weight will be increased by an $\eta$ portion of the difference. If the absolute deviation is smaller, then the standard deviation will be decreased by the same portion of the difference. No update is needed if they are the same. This ensures that the size of the cluster will be shrunk or enlarged adaptively based on the deviations of the matched input patterns from the cluster center. The updating rule for the standard deviation works well both in the supervised GFLVQ and unsupervised GFSOM systems.
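One competitive-learning step with the winner updates of Eqs. (11) and (13) might look as follows in Python. The sketch assumes that the winner is chosen by the geometric-mean grade of Eq. (10) as printed, and that the deviation in Eq. (13) is measured from the center before it is moved; the text does not specify the order. The function name `gfsom_step` and the toy values are hypothetical.

```python
import math

def gfsom_step(x, centers, sigmas, eta):
    """One competitive-learning step for a single learning pixel.

    centers[j][i] and sigmas[j][i] hold c_ij and sigma_ij for cluster j, band i.
    Both lists are updated in place; the winner index is returned.
    """
    n = len(x)
    # Overall membership grade of the pixel for every cluster (Eq. (10)).
    grades = [
        math.exp(sum(-((xi - ci) ** 2) / (2.0 * si) for xi, ci, si in zip(x, c, s)) / n)
        for c, s in zip(centers, sigmas)
    ]
    winner = grades.index(max(grades))
    c, s = centers[winner], sigmas[winner]
    for i, xi in enumerate(x):
        deviation = abs(xi - c[i])        # measured from the current center
        c[i] += eta * (xi - c[i])         # Eq. (11)
        s[i] += eta * (deviation - s[i])  # Eq. (13)
    return winner

centers = [[0.2], [0.8]]
sigmas = [[0.1], [0.1]]
winner = gfsom_step([1.0], centers, sigmas, eta=0.5)
print(winner, centers, sigmas)
```

In this toy run the pixel lies farther from the winning center than the current dispersion, so the dispersion of the winner grows. A practical implementation would probably also keep a small positive floor on the dispersion values, since a zero value would cause a division by zero in the membership grade.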

### 3.4. Clustering and Defuzzification

When the centers (${c}_{ij}$) and standard deviations (${\sigma}_{ij}$) of all clusters have been fine tuned in the learning procedure, they can then be directly used to cluster all the pixels of the input image. For a hard clustering algorithm, the result is a single clustering map where each pixel is assigned to only one cluster. The output of the GFSOM system, however, is fuzzy membership grades for all clusters, affording much richer information that can be used to estimate various constituents found in a mixed pixel.^{1} If a hard clustering map is desired, defuzzification, a reverse of the fuzzification process, can be used to convert various fuzzy membership grades to a single cluster for each pixel. Defuzzification is achieved by comparing a pixel’s membership grades for all clusters and then assigning the pixel to the cluster with the maximum membership grade. A final thematic map is obtained through visually labeling the clusters into land use/land cover classes, as with other unsupervised classification approaches.
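Defuzzification as described here amounts to an arg-max over the membership grades of each pixel. A minimal sketch, assuming the fuzzy output is stored as nested lists indexed by row, column, and cluster (a hypothetical layout), is:

```python
def defuzzify(grades):
    """Return the index of the cluster with the maximum membership grade."""
    return max(range(len(grades)), key=grades.__getitem__)

def hard_cluster_map(fuzzy_map):
    """Convert a rows x cols x clusters fuzzy output into a hard cluster map."""
    return [[defuzzify(g) for g in row] for row in fuzzy_map]

# A 1 x 2 toy image with three clusters.
fuzzy_map = [[[0.1, 0.7, 0.2], [0.6, 0.3, 0.1]]]
print(hard_cluster_map(fuzzy_map))
```

When two clusters tie, this sketch keeps the lower cluster index; the text does not prescribe a tie-breaking rule.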

## 4. Case Study

The proposed GFSOM and the other three AI-based unsupervised classifiers (SOM, FCM, and DFLVQ) were used to classify a subset of a Hyperion image acquired in Wuxi, China (centered at +31°33′10.80″, +120°9′54.00″) on August 19, 2004. This subset image has a total of $400\times 400$ pixels. Hyperion is a pushbroom hyperspectral imaging spectrometer on board the Earth Observing-1 (EO-1) spacecraft. Unlike widely used airborne hyperspectral instruments, Hyperion is a satellite sensor system with a coarse spatial resolution of $30\times 30$ m. There are 242 unique spectral channels with a bandwidth of 10 nm for each band in the visible/near infrared (VNIR) spectral region (400 to 1000 nm) and short-wave infrared (SWIR) region (900 to 2500 nm). Because some channels have low signal strengths, and others fall into an overlap region between VNIR and SWIR, only 196 valid bands are available for use. To alleviate the collinearity problem, we selected most of the calibrated VNIR bands and discarded the overlapped SWIR bands. A uniform feature design (UFD) was then applied as suggested by Filippi and Jensen^{13} to reduce the dimensionality of the data set while retaining as much spectral shape information as possible in the SWIR range, resulting in 112 bands being used for the subsequent analysis. A VNIR color composite map of the study area is shown in Fig. 2(a) with a band combination of 993, 603, and 711 nm. Four land cover types at level I of the USGS Classification System^{23} were identified in this image, including urban/built-up, agriculture, forest, and water. We also randomly collected 491 validating pixels (130 for urban, 118 for agriculture, 124 for forest, and 119 for water), which were used as references for accuracy assessment purposes.

Successful applications of AI-based classifiers in remote sensing largely depend on the setting of appropriate parameters. The common parameters that need to be set for GFSOM and the other three AI classifiers include the number of learning cycles (i.e., iterations), number of input neurons (i.e., input bands), number of output neurons (i.e., clusters), number of learning samples, and initial values of the prototypes (${c}_{ij}$). To ensure comparability of results, these parameters should be kept consistent across the four classifiers. Several tests were conducted in order to obtain the optimal settings of these parameters. In this case study, the outputs were obtained by setting the number of learning iterations to 100, the number of input neurons equal to the number of image bands (112 in this case), the number of output neurons to eight, and the number of randomly selected learning samples in each iterative step to 1000. Aside from these common parameters, other parameters that are unique to each classifier also need to be specified. In SOM and GFSOM, the learning rate ($\eta $) was set to decrease monotonically with time from 0.5 to 0.05, following the first Kohonen heuristic rule. This range was empirically determined based on experiments. A large value for the learning rate makes the algorithm unstable, while a small value takes a long time to converge. This adaptive learning rate keeps the learning step size as large as possible, while keeping the learning procedure as stable as possible.^{24} In FCM and DFLVQ, setting the exponential weight ($m$) appropriately is critical for the successful use of these two algorithms. For FCM, most users suggest a value in the range of 1.1 to 5 to yield a suitable interpretation. In this case study, a value of 3 for $m$ generated the best results. 
For DFLVQ, a varying $m$ approach was adopted with a range of 1.1 to 7.0, as suggested by Bezdek and Pal.^{16} All the parameters initialized for each classifier are summarized in Table 1. Tests were conducted on a PC with a 3.4-GHz Pentium CPU and 1 GB of RAM. Once the learning procedure achieves convergence, the whole image is then input into the system to be grouped into natural spectral clusters, which are finally transformed into classification maps with four identified classes, as presented in Figs. 2(b)–2(e).
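The learning-rate schedule used for SOM and GFSOM can be written compactly. The text only states that the rate decreases monotonically from 0.5 to 0.05, so the linear form below is an assumption, as is the function name.

```python
def learning_rate(t, n_iterations=100, eta_start=0.5, eta_end=0.05):
    """Linearly decreasing learning rate for iteration t = 0 .. n_iterations - 1."""
    if n_iterations == 1:
        return eta_start
    return eta_start + (eta_end - eta_start) * t / (n_iterations - 1)

print([round(learning_rate(t), 3) for t in (0, 25, 50, 75, 99)])
```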

## Table 1

Parameter set for SOM, FCM, DFLVQ, and GFSOM in classifying an EO-1/Hyperion hyperspectral image with 400×400 pixels and 112 bands.

| Parameter | Value |
|---|---|
| **Common parameters for all the classifiers** | |
| Number of input neurons | 112 |
| Number of output clusters | 8 |
| Number of learning cycles | 100 |
| Number of selected learning samples | 1000 |
| Initialization of prototypes ($c_{ij}$) | simplified K-mean scheme (SKS) |
| **SOM** | |
| Initial learning rate | 0.5 |
| Final learning rate | 0.05 |
| **FCM** | |
| Weighting exponent ($m$) | 3 |
| **Descending FLVQ** | |
| Initial weighting exponent ($m_0$) | 1.1 |
| Final weighting exponent ($m_f$) | 7.0 |
| **GFSOM** | |
| Initial learning rate | 0.5 |
| Final learning rate | 0.05 |

## 5. Results and Discussion

To evaluate the performance of the RSS strategy for learning sample selection, classification was also attempted using all pixels (160,000) in the input image. Similarly, the random assignment approach to ${c}_{ij}$ prototype initialization was also tested to evaluate the performance of the proposed SKS approach.

The utilization of all the pixels from the input image caused all the AI classifiers to crash before convergence, due to the large data volume of the image. This suggests that the use of all pixels of hyperspectral images as learning samples is problematic for the AI-based unsupervised classifiers. Conversely, all four classifiers were able to deliver desired results using a small number of learning samples selected with the proposed RSS strategy. We found that for the image in our case study with 160,000 pixels, we only needed to collect 1000 learning samples to achieve stable learning outcomes. Therefore, the proposed RSS strategy for learning sample selection is both an efficient and effective solution for AI-based unsupervised classification of hyperspectral images.

The SOM and GFSOM algorithms also completely failed with a random initialization approach. Although FCM and DFLVQ could converge, they took a longer time to do so. However, employing the proposed SKS approach to prototype initialization, SOM and GFSOM were able to achieve convergence, while the convergence of FCM and DFLVQ was also substantially faster. By combining the RSS learning sample selection strategy and SKS prototype initialization system, all classifiers were able to produce desired clustering maps within seconds (Table 2). SOM was the most efficient classifier and finished the self-learning process in 14 sec. GFSOM took a little longer (20 sec) to complete the self-learning process but was faster than FCM and DFLVQ, which took 35 and 33 sec, respectively.

## Table 2

Computational cost of each AI-based algorithm.

| | SOM | FCM | DFLVQ | GFSOM |
|---|---|---|---|---|
| Total CPU (sec) | 14 | 35 | 33 | 20 |

Note: The computer used has a 3.4 GHz Pentium processor with 1 GB RAM.

For comparison purposes, the performance of the conventional unsupervised ISODATA method was also evaluated, and the final classification map derived from this method is shown in Fig. 2(f). The accuracy of the classifications from all the unsupervised classifiers was assessed using the confusion matrix based on 491 randomly selected samples. The producer’s, user’s, and overall accuracy thus obtained are given in Table 3. Kappa coefficients were also calculated to quantify the classification accuracy. The Kappa coefficient ($\hat{K}$) is believed to be a better representation of the general quality of an image classification, because it removes the effects caused by differences in sample size and accounts for the off-diagonal elements in the error matrix.^{25} It also allows different classifications to be compared statistically.^{26} The results of these significance tests, which are based on the Z (Normal) distribution, between GFSOM and the other classifiers using the $\hat{K}$ values are also shown in Table 3. According to Fleiss,^{27} Kappa coefficients larger than 0.75 suggest strong agreement. Landis and Koch^{28} suggest that Kappa coefficients larger than 0.81 indicate an almost perfect agreement. $Z$-scores larger than 1.96 suggest that the two methods are significantly different at the 95% statistical confidence level.
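For readers who wish to reproduce this kind of assessment, the sketch below computes the overall, producer’s, and user’s accuracies and the Kappa coefficient from a confusion matrix, together with the usual pairwise $Z$ statistic. The matrix and the variance values are hypothetical and are not taken from this study; the matrix is assumed to have classified classes in rows and reference classes in columns, and the Kappa variances are treated as given inputs.

```python
import math

def accuracy_metrics(matrix):
    """Accuracy measures from a confusion matrix.

    matrix[i][j] = number of reference pixels of class j classified as class i.
    Returns (overall accuracy, kappa, producer's accuracies, user's accuracies).
    """
    n = sum(map(sum, matrix))
    diag = [matrix[i][i] for i in range(len(matrix))]
    row_tot = [sum(row) for row in matrix]        # classified totals
    col_tot = [sum(col) for col in zip(*matrix)]  # reference totals
    overall = sum(diag) / n
    chance = sum(r * c for r, c in zip(row_tot, col_tot)) / n ** 2
    kappa = (overall - chance) / (1 - chance)
    producers = [d / c for d, c in zip(diag, col_tot)]
    users = [d / r for d, r in zip(diag, row_tot)]
    return overall, kappa, producers, users

def kappa_z(k1, var1, k2, var2):
    """Z statistic for the difference between two independent Kappa coefficients."""
    return abs(k1 - k2) / math.sqrt(var1 + var2)

# Hypothetical two-class confusion matrix.
overall, kappa, producers, users = accuracy_metrics([[40, 10], [5, 45]])
print(overall, kappa, producers, users)
# Hypothetical Kappa values and variances for two classifiers.
print(kappa_z(0.8, 0.0009, 0.6, 0.0016))
```

A $Z$ value above 1.96 from `kappa_z` would correspond to the 95% significance threshold mentioned above.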

## Table 3

Results of accuracy assessment for each algorithm.

| Class | SOM PA | SOM UA | FCM PA | FCM UA | DFLVQ PA | DFLVQ UA | GFSOM PA | GFSOM UA | ISODATA PA | ISODATA UA |
|---|---|---|---|---|---|---|---|---|---|---|
| Urban | 66.2 | 96.6 | 40.7 | 88.9 | 97.7 | 82.5 | 98.5 | 86.5 | 11.5 | 16.5 |
| Forest | 85.4 | 59.9 | 87.9 | 60.6 | 75.8 | 70.7 | 72.6 | 85.7 | 61.9 | 89.0 |
| Agriculture | 40.7 | 84.2 | 93.9 | 89.0 | 66.1 | 83.9 | 83.5 | 84.5 | 44.4 | 44.0 |
| Water | 100 | 70.8 | 99.2 | 98.3 | 93.2 | 100 | 100 | 97.5 | 100 | 61.7 |

| Metric | SOM | FCM | DFLVQ | GFSOM | ISODATA |
|---|---|---|---|---|---|
| Overall accuracy (%) | 73.1 | 80.9 | 83.5 | 88.6 | 53.4 |
| Kappa coefficient | 0.64 | 0.74 | 0.78 | 0.85 | 0.38 |
| Z-score (vs. GFSOM) | 6.38 | 3.45 | 2.33 | | 13.33 |

Note: PA, producer’s accuracy (%); UA, user’s accuracy (%).

Comparing the classification results, the best overall accuracy of 88.6% was achieved by GFSOM, with a Kappa coefficient (${K}_{\text{hat}}$) of 0.85. DFLVQ obtained an overall accuracy of 83.5% with a Kappa coefficient of 0.78, and FCM obtained 80.9% with a Kappa coefficient of 0.74. SOM produced the poorest result among the AI-based methods, with an overall accuracy of 73.1% and a Kappa coefficient of 0.64, primarily because of the misclassification of agriculture on the east side of the image as forest and of urban as water [Fig. 2(c)]. However, SOM still outperformed ISODATA, which had an overall accuracy of only 53.4% with a Kappa coefficient of 0.38 [Fig. 2(f)]. This is consistent with the conclusion in the literature that ISODATA is not appropriate for clustering hyperspectral data. The $Z$-tests indicate that the classification accuracy of GFSOM is significantly better than that of all other unsupervised classifiers at the 95% confidence level.

When the producer’s and user’s accuracy are examined, all methods generate their best accuracy for the water class and obtain varying degrees of agreement for categorizing other classes. Water is often the easiest feature to classify in remotely sensed images, due to the homogeneous nature of most water surfaces. Urban is a difficult category to identify, because of its spatial heterogeneity, resulting in larger data dispersion within this class. GFSOM achieved the best accuracy for identifying the urban class among these methods, owing to its capability to capture the dispersion of data within a class. It obtained comparable accuracy to FCM and DFLVQ for categorizing forest and agriculture classes, which have relatively homogeneous spatial patterns.
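For readers less familiar with these measures, the sketch below shows how producer's, user's, and overall accuracy are typically derived from an error matrix. It assumes rows hold the classified (map) labels and columns hold the reference labels; the function name and the toy matrix are hypothetical and are not data from this study.

```python
import numpy as np

def accuracy_measures(cm):
    """Return (producer's, user's, overall) accuracy in percent.

    Assumed layout: cm[i, j] counts samples classified as class i
    whose reference label is class j.
    """
    cm = np.asarray(cm, dtype=float)
    diag = np.diag(cm)
    producers = 100.0 * diag / cm.sum(axis=0)  # correct / reference totals
    users = 100.0 * diag / cm.sum(axis=1)      # correct / classified totals
    overall = 100.0 * diag.sum() / cm.sum()
    return producers, users, overall

# Hypothetical two-class example (e.g., water versus urban).
pa, ua, oa = accuracy_measures([[45, 5], [10, 40]])
```

A class with high producer's accuracy but low user's accuracy is one that is rarely missed yet frequently over-assigned, which is the kind of pattern the per-class columns of Table 3 make visible.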

Using the proposed approaches for selecting learning samples and initializing prototypes, all AI-based classifiers were able to complete the learning and clustering process within a short period of time. As a result, computational efficiency should not be a problem for these AI-based unsupervised classifiers in classifying hyperspectral images. In our experiment, SOM achieved the best computational efficiency, due to its simple learning process, but it had the lowest classification accuracy among all AI-based classifiers. As a hard classification algorithm, SOM does not have the capability to provide the rich membership information of a fuzzy classifier^{1} such as FCM, DFLVQ, and GFSOM. Although FCM is a popular fuzzy classifier for multispectral data analysis, to the best of our knowledge, this study is the first to explore its applicability for hyperspectral image processing. By adopting the proposed learning sample selection and prototype initialization approaches, it has been shown to be applicable to hyperspectral data. Nevertheless, for FCM and DFLVQ, a critical issue is still the choice of an appropriate weighting exponent $m$, which can take any value in the range $(1,\infty )$. A varying $m$ approach was adopted in the DFLVQ algorithm used in this case study, but its range is still difficult to determine. In contrast, GFSOM, as a fuzzy neural network relying on a Gaussian membership function learned adaptively from the sample data, does not suffer from the problem of choosing an appropriate value for parameter $m$ to determine its fuzziness. Further, the learning algorithms of SOM, FCM, and DFLVQ all model the central tendency of the individual clusters without considering the data dispersion within each cluster.
GFSOM, on the contrary, captures both the typicality of the input vectors and the atypicality of the data within a cluster using fuzzy competitive learning, which enables it to achieve the best accuracy among all the tested unsupervised classifiers. It is also the most computationally efficient of the fuzzy classifiers tested (20 sec, versus 35 and 33 sec for FCM and DFLVQ; Table 2), since it only updates the winners in the learning iterations.
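The contrast drawn above between an exponent-controlled membership and a dispersion-aware Gaussian membership can be illustrated with a minimal sketch. The first function is the standard FCM membership formula; the second is a generic Gaussian membership with one spread value per cluster, used here as an assumption to convey the idea rather than as the exact GFSOM learning rule. All names and the toy values are hypothetical.

```python
import numpy as np

def fcm_memberships(x, prototypes, m):
    """Standard FCM memberships of sample x; m > 1 sets the fuzziness."""
    d = np.linalg.norm(prototypes - x, axis=1)
    d = np.maximum(d, 1e-12)  # guard against a zero distance
    ratio = (d[:, None] / d[None, :]) ** (2.0 / (m - 1.0))
    return 1.0 / ratio.sum(axis=1)

def gaussian_memberships(x, prototypes, sigmas):
    """Generic Gaussian memberships with a per-cluster spread (assumed form)."""
    d2 = np.sum((prototypes - x) ** 2, axis=1)
    return np.exp(-d2 / (2.0 * sigmas ** 2))

x = np.array([0.0, 0.0])
prototypes = np.array([[1.0, 0.0], [2.0, 0.0]])

# The same sample receives crisper or fuzzier memberships depending on m.
for m in (1.1, 2.0, 10.0):
    print(m, fcm_memberships(x, prototypes, m))

# With per-cluster spreads, a more dispersed cluster can claim a distant sample.
print(gaussian_memberships(x, prototypes, np.array([1.0, 2.0])))
```

In the FCM case the result depends strongly on the user-chosen $m$, whereas in the Gaussian case the spread of each cluster, which can be estimated from the data, determines how quickly membership decays with distance.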

## 6.

## Summary

We proposed an unsupervised GFSOM classifier for hyperspectral image analysis with the purpose of solving the various problems inherent in the three standard AI-based approaches: SOM, FCM, and DFLVQ. These three methods have been primarily employed for multispectral image analysis, with limited hyperspectral application. We thus additionally explored the potential of these techniques for hyperspectral image processing. To apply these AI-based methods effectively and efficiently, we developed a learning sample selection strategy and a prototype initialization process. A case study in classifying an EO-1/Hyperion image illustrated that the proposed GFSOM achieved the best classification result among the unsupervised classifiers tested, owing to its ability to model explicitly not only the central tendency of data groups, but also the data dispersion characteristics within the groups. The results also demonstrate that all the AI-based techniques have the capability to classify hyperspectral images if they adopt the learning sample selection strategy and prototype initialization process developed for GFSOM.

Although our proposed method achieved the best accuracy in this case study, this does not imply the ultimate superiority of this technique over the other approaches under all circumstances. Because different approaches require different parameters, they are not comparable in a strict sense. Similar to other classification approaches, the accuracy of GFSOM depends heavily on the number of classes to be identified, the degree of spatial heterogeneity of the image, and the associated parameters specified. The results obtained here are certainly encouraging, but further research is needed using images with more land use/cover types, as well as different spatial and spectral resolutions, in order to confirm the robustness and extensibility of this method.

## Acknowledgments

This research was jointly supported by the National Basic Research Program of China under Project 2009CB723903, and the National High-Tech Research and Development Program of China under Project 2008AA09A404.