PROCEEDINGS ARTICLE | September 23, 2003

Proc. SPIE. 5093, Algorithms and Technologies for Multispectral, Hyperspectral, and Ultraspectral Imagery IX

KEYWORDS: Hyperspectral imaging, Statistical analysis, Data modeling, Sensors, Scanning electron microscopy, Algorithm development, Stochastic processes, Statistical modeling, Expectation maximization algorithms, Mahalanobis distance

Developing proper models for hyperspectral imaging (HSI) data allows for useful and reliable algorithms for data exploitation. These models provide the foundation for the development and evaluation of detection, classification, clustering, and estimation algorithms. To date, real-world HSI data has been modeled as a single multivariate Gaussian; however, it is well known that real data often exhibits non-Gaussian behavior with multimodal distributions. Instead of a single multivariate Gaussian distribution, HSI data can be modeled as a finite mixture model, where each of the mixture components need not be Gaussian. This paper will focus on techniques used to segment HSI data into homogeneous clusters. Once the data has been segmented, each individual cluster can be modeled, and the benefits of homogeneous clustering of the data versus non-clustering can be explored. One of the promising techniques uses the Expectation-Maximization (EM) algorithm to cluster the data into Elliptically Contoured Distributions (ECDs). The family of ECDs is a larger family of distributions that includes the multivariate Gaussian distribution and exhibits most of its properties. An ECD is uniquely defined by its multivariate mean, its covariance, and the distribution of its Mahalanobis (or quadratic) distance metric. This metric lets multivariate data be identified using a univariate statistic and can be adjusted to more closely match the longer-tailed distributions of real data. This paper will focus on three issues. First, the definition of the multivariate Elliptically Contoured Distribution mixture model will be developed. Second, various techniques will be described that segment the mixed data into homogeneous clusters. Most of this work will focus on the EM algorithm and the multivariate t-distribution, which is a member of the family of ECDs and provides longer-tailed distributions than the Gaussian. Lastly, results using HSI data from the AVIRIS sensor will be shown, and the benefits of clustered data will be presented.
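
The clustering idea summarized above can be illustrated with a minimal sketch of EM for a mixture of multivariate t-distributions. This is not the authors' implementation. It assumes a fixed degrees-of-freedom value `nu` shared by all clusters, random initialization from data points, a fixed number of iterations, and at least two spectral bands per pixel. The function names (`mahalanobis_sq`, `t_logpdf`, `fit_t_mixture`) and all default values are hypothetical. The squared Mahalanobis distance is the univariate statistic that drives both the density evaluation and the per-pixel weights `(nu + p) / (nu + d2)`, which shrink the influence of pixels lying far out in the tails.

```python
import math
import numpy as np


def mahalanobis_sq(X, mean, scale):
    """Squared Mahalanobis distance of each row of X from (mean, scale)."""
    diff = X - mean
    solved = np.linalg.solve(scale, diff.T).T  # scale^{-1} (x - mean) per row
    return np.einsum("ij,ij->i", diff, solved)


def t_logpdf(X, mean, scale, nu):
    """Log density of a multivariate t with scale matrix `scale` and nu d.o.f."""
    p = X.shape[1]
    d2 = mahalanobis_sq(X, mean, scale)
    _, logdet = np.linalg.slogdet(scale)
    const = (math.lgamma((nu + p) / 2) - math.lgamma(nu / 2)
             - 0.5 * p * math.log(nu * math.pi) - 0.5 * logdet)
    return const - 0.5 * (nu + p) * np.log1p(d2 / nu)


def fit_t_mixture(X, k, nu=5.0, n_iter=50, seed=0):
    """EM for a k-component t mixture with fixed nu (illustrative sketch only)."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    # Assumed initialization: k random pixels as means, global covariance as scale.
    means = X[rng.choice(n, size=k, replace=False)].copy()
    scales = np.array([np.cov(X, rowvar=False) + 1e-6 * np.eye(p)] * k)
    weights = np.full(k, 1.0 / k)

    for _ in range(n_iter):
        # E-step: cluster responsibilities, normalized in the log domain.
        log_r = np.stack(
            [np.log(weights[j]) + t_logpdf(X, means[j], scales[j], nu)
             for j in range(k)], axis=1)
        log_r -= log_r.max(axis=1, keepdims=True)
        resp = np.exp(log_r)
        resp /= resp.sum(axis=1, keepdims=True)
        # E-step: tail weights; large Mahalanobis distance gives a small weight.
        u = np.stack(
            [(nu + p) / (nu + mahalanobis_sq(X, means[j], scales[j]))
             for j in range(k)], axis=1)

        # M-step: weighted updates of mean, scale matrix, and mixing weight.
        for j in range(k):
            w = resp[:, j] * u[:, j]
            means[j] = w @ X / w.sum()
            diff = X - means[j]
            scales[j] = ((diff * w[:, None]).T @ diff / resp[:, j].sum()
                         + 1e-6 * np.eye(p))  # small ridge for stability
        weights = resp.mean(axis=0)

    labels = resp.argmax(axis=1)
    return weights, means, scales, labels
```

Here `X` would be an array of shape (pixels, bands), for example a hyperspectral cube reshaped to two dimensions. Note that the returned `scales` are t-distribution scale matrices; for `nu > 2` the covariance of a cluster is `nu / (nu - 2)` times its scale matrix. Estimating `nu` per cluster, rather than fixing it, would require an additional update step that is not shown here.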