Iterative clustering (e.g., K-Means, EM) is among the most commonly used families of clustering methods. Such methods
iteratively search for a local optimum starting from initial conditions, namely the initial centroids and the initial
number of clusters. Research has shown that these initial conditions are crucial to both the clustering quality and the
running time of a clustering computation. Using a novel visualization tool, CComViz (Cluster Comparison
Visualization), we present an innovative approach to refine the initial centroids and the number of clusters by visually
analyzing multiple clustering results generated by different clustering algorithms. As an example, we apply our new
approach to a gene expression case study to obtain a better, converging clustering. The proposed approach can be
viewed as an extension of cluster ensembles: it reuses the original data sources, whereas classic cluster
ensembles do not.
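The sensitivity to initial centroids described above can be illustrated with a minimal sketch. This is a plain K-Means loop written for illustration only; it is not CComViz or the refinement procedure of the paper, and the helper names (`kmeans`, `inertia`) and the four-point toy dataset are assumptions introduced here.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def squared_distance(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two 2-D points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: Sequence[Point], initial_centroids: Sequence[Point],
           max_iter: int = 100) -> Tuple[List[int], List[Point]]:
    """Plain K-Means: alternate assignment and centroid update until labels stop changing."""
    centroids = list(initial_centroids)
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(len(centroids)), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # converged to a (possibly local) optimum
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centroids


def inertia(points: Sequence[Point], labels: Sequence[int],
            centroids: Sequence[Point]) -> float:
    """Sum of squared distances from each point to its assigned centroid."""
    return sum(squared_distance(p, centroids[l]) for p, l in zip(points, labels))


# Hypothetical toy data: corners of a wide rectangle.
points = [(0.0, 0.0), (0.0, 1.0), (4.0, 0.0), (4.0, 1.0)]

# Two different initial conditions for the same data and the same k.
labels_lr, centroids_lr = kmeans(points, [(0.0, 0.5), (4.0, 0.5)])
labels_tb, centroids_tb = kmeans(points, [(2.0, 0.0), (2.0, 1.0)])

print(labels_lr, inertia(points, labels_lr, centroids_lr))
print(labels_tb, inertia(points, labels_tb, centroids_tb))
```

Both runs converge, but to different partitions: the first splits the rectangle into left and right halves, the second into bottom and top halves with a much larger within-cluster sum of squares. Comparing such alternative results is the kind of situation in which refining the initial conditions becomes relevant.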
Tightly coupled visualization and analysis is a powerful approach to data exploration, especially for clustering. We describe one such integration of analysis and visualization for the evaluation of multiple partitions of a dataset. A partition is a decomposition of a dataset into a family of disjoint subsets. Partitions may be the results of clustering, of groupings of categorical dimensions, of binned numerical dimensions, of predetermined class labeling dimensions, or of prior knowledge structured in a mutually exclusive format (each data item associated with one and only one outcome).
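The definition of a partition given above can be made concrete with a short sketch. The function names `labels_to_partition` and `is_partition` are hypothetical, and rejecting empty blocks follows the usual mathematical convention rather than anything stated in the text.

```python
from typing import Dict, Hashable, Iterable, Sequence, Set


def labels_to_partition(labels: Sequence[Hashable]) -> Dict[Hashable, Set[int]]:
    """Group record indices by label (cluster id, category, bin, class, ...)."""
    blocks: Dict[Hashable, Set[int]] = {}
    for index, label in enumerate(labels):
        blocks.setdefault(label, set()).add(index)
    return blocks


def is_partition(universe: Iterable[int], blocks: Iterable[Set[int]]) -> bool:
    """True if the blocks are non-empty, pairwise disjoint, and cover the universe."""
    seen: Set[int] = set()
    for block in blocks:
        if not block:
            return False  # assumption: empty blocks are not allowed
        if seen & block:
            return False  # a record appears in two blocks
        seen |= block
    return seen == set(universe)


# Hypothetical cluster labels for five records.
cluster_labels = ["a", "b", "a", "c", "b"]
blocks = labels_to_partition(cluster_labels)

print(blocks)
print(is_partition(range(5), blocks.values()))      # one label per record
print(is_partition(range(3), [{0, 1}, {1, 2}]))     # overlapping subsets
print(is_partition(range(3), [{0}, {1}]))           # record 2 is not covered
```

Any single label vector, whether it comes from a clustering, a categorical column, or a binned numerical column, yields a valid partition by construction, which is why such sources can be compared on equal footing.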
Partition or cluster stability analysis can be used to identify near-optimal structures, build ensembles, or conduct validation. We extend Parallel Sets into a new visualization tool that supports the mutual comparison and evaluation of multiple partitions of the same dataset. We describe a novel layout algorithm that rearranges the order of records and dimensions to make the display more informative. We provide examples of its application to data stability and correlation at the record, cluster, and dimension levels within a single interactive display.
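As a rough numerical counterpart to comparing two partitions of the same dataset, the sketch below computes the overlap counts between their blocks and the Rand index as one possible agreement score. It does not reproduce the layout algorithm or the tool described here; the Rand index is swapped in as a standard measure, and the function names and sample labels are assumptions.

```python
from collections import Counter
from itertools import combinations
from typing import Hashable, Sequence, Tuple


def contingency(labels_a: Sequence[Hashable],
                labels_b: Sequence[Hashable]) -> Counter:
    """Count the records shared by each pair of blocks (block of A, block of B)."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both partitions must label the same records")
    return Counter(zip(labels_a, labels_b))


def rand_index(labels_a: Sequence[Hashable],
               labels_b: Sequence[Hashable]) -> float:
    """Fraction of record pairs on which the two partitions agree."""
    n = len(labels_a)
    if n != len(labels_b):
        raise ValueError("both partitions must label the same records")
    if n < 2:
        raise ValueError("need at least two records to form a pair")
    agreements = 0
    total = 0
    for i, j in combinations(range(n), 2):
        same_in_a = labels_a[i] == labels_a[j]
        same_in_b = labels_b[i] == labels_b[j]
        agreements += same_in_a == same_in_b  # together in both, or apart in both
        total += 1
    return agreements / total


# Hypothetical partitions of six records, e.g. from two clustering algorithms.
partition_a = [0, 0, 1, 1, 2, 2]
partition_b = ["x", "x", "y", "y", "y", "z"]

overlaps = contingency(partition_a, partition_b)
for (block_a, block_b), count in sorted(overlaps.items()):
    print(block_a, block_b, count)

print(rand_index(partition_a, partition_b))
```

The overlap counts correspond to the quantities that a Parallel Sets style display encodes as ribbon widths between adjacent dimensions, while a pairwise score such as the Rand index summarizes stability in a single number. Block names do not need to match across partitions, since only co-membership is compared.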
Although there are a number of visualization systems to choose from when analyzing data, only a few of these allow for the integration of other visualization and analysis techniques. There are even fewer visualization toolkits and frameworks from which one can develop one's own visualization applications. Even within the research community, scientists either make do with the available tools or start from scratch, writing a program in which they can develop new or modified visualization techniques and analysis algorithms. Presented here is a new general-purpose platform for constructing a wide range of visualization and analysis applications. The system focuses on the design of and experimentation with new techniques, and on making sharing and integration with other tools second nature. Moreover, this platform supports multiple large data sets, as well as the recording and visualization of user sessions. Here we introduce the Universal Visualization Platform (UVP) as a modern data visualization and analysis system.