30 March 2000 Fuzzy c-means clustering of partially missing data sets
The fuzzy c-means algorithm is a useful tool for clustering real s-dimensional data. Typically, each observation consists of numerical values for s features such as height, length, etc. In some cases, data sets contain vectors that are missing one or more feature values. For example, a particular datum might have the form (254.3, x, 36.2, 112.7, x), where the second and fifth feature values are missing. The (standard) fuzzy c-means algorithm cannot be applied in this case, since the required computations reference numerical feature values for all s features of every data point. Two adaptations of fuzzy c-means to the incomplete data case are presented here. One adaptation replaces unknown feature values with additional variables that are optimized to produce an extrapolated data set yielding the smallest possible value of the fuzzy c-means criterion. The other uses only the available feature values in distance calculations, and then adjusts for the missing feature values by an appropriately chosen scaling of the computed distances. Numerical convergence properties of the adaptations and computational costs are discussed. Artificial data sets are used to demonstrate the two new approaches.
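The second adaptation (distances computed from available features only, then rescaled) can be illustrated with a minimal sketch. The abstract does not state the scaling factor; the code below assumes the factor s / (number of observed features), which is one natural choice, and assumes missing values are encoded as NaN. The function names `partial_distance_sq` and `memberships` are hypothetical and are not taken from the paper.

```python
import math

def partial_distance_sq(x, v):
    """Squared Euclidean distance between datum x and center v,
    using only the observed (non-NaN) features of x.
    ASSUMPTION: the result is rescaled by s / (number of observed features)."""
    s = len(x)
    observed = [(xj, vj) for xj, vj in zip(x, v) if not math.isnan(xj)]
    if not observed:
        raise ValueError("datum has no observed features")
    partial = sum((xj - vj) ** 2 for xj, vj in observed)
    return (s / len(observed)) * partial

def memberships(x, centers, m=2.0):
    """Standard fuzzy c-means membership update for a single datum,
    with the partial distance substituted for the full distance."""
    d = [partial_distance_sq(x, v) for v in centers]
    # If the datum coincides with one or more centers, share membership among them.
    zeros = [i for i, di in enumerate(d) if di == 0.0]
    if zeros:
        return [1.0 / len(zeros) if i in zeros else 0.0 for i in range(len(d))]
    power = 1.0 / (m - 1.0)  # exponent applied to squared distances
    inv = [(1.0 / di) ** power for di in d]
    total = sum(inv)
    return [w / total for w in inv]

# The datum from the abstract, with the second and fifth features missing.
x = [254.3, math.nan, 36.2, 112.7, math.nan]
# Two made-up cluster centers, for illustration only.
centers = [
    [250.0, 10.0, 35.0, 110.0, 5.0],
    [100.0, 20.0, 80.0, 50.0, 9.0],
]
u = memberships(x, centers)
```

With this scaling, a datum with no missing features gets the ordinary squared Euclidean distance, so the sketch reduces to the standard membership update on complete data.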
© (2000) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Richard J. Hathaway, Dessa D. Overstreet, and James C. Bezdek, "Fuzzy c-means clustering of partially missing data sets", Proc. SPIE 4055, Applications and Science of Computational Intelligence III, (30 March 2000).
