User-generated Structured Query Language (SQL) queries are a rich source of information for database analysts,
information scientists, and the end users of databases. In this study, a group of astronomers, computer scientists, and
information scientists work together to analyze a large volume of SQL log data generated by users of the Sloan Digital
Sky Survey (SDSS) data archive in order to better understand users' data-seeking behavior. While statistical analysis of
such logs is useful at the aggregate level, efficiently exploring specific patterns of queries is often challenging due
to the typically large volume of the data, its multivariate features, and the data requirements specified in SQL queries. To
enable and facilitate effective and efficient exploration of the SDSS log data, we designed an interactive visualization
tool, called the SDSS Log Viewer, which integrates time series visualization, text visualization, and dynamic query
techniques. We describe two analysis scenarios for the visual exploration of SDSS log data: understanding
unusually high daily query traffic and modeling the types of data-seeking behavior of massive query generators. The
two scenarios demonstrate that the SDSS Log Viewer provides a novel and potentially valuable approach to support these
This paper introduces a new method for creating an interactive sequence similarity map of all known influenza virus
protein sequences and integrating the map with existing general-purpose analytical tools. The NCBI data model was
designed to provide a high degree of interconnectedness among data objects. The substantial and continuous increase in
data volume has led to a large and highly connected information space. Researchers seeking to explore this space are
challenged to identify a starting point, and they often choose data that is popular in the literature. References in the literature
follow a power-law distribution, and popular data points may bias explorers toward paths that lead only to the dead end of
what is already known. To help discover the unexpected, we developed an interactive visual analytics system to map the
information space of influenza protein sequence data. The design is motivated by the needs of eScience researchers.
A research paradigm is a dynamical system of scientific works, including their perceived value to peer scientists, governed by intrinsic intellectual value and the associated endurance and decay of citations. Identifying an emerging research paradigm and monitoring changes in an existing paradigm have been challenging tasks due to the scale and complexity involved. In this article, we describe an exploratory data analysis method for identifying a research paradigm by clustering scientific articles according to their citation half-life and betweenness centrality as well as their citation frequency. The Expectation-Maximization (EM) algorithm is used to cluster articles based on these attributes. It is hypothesized that the resulting clusters correspond to dynamic groupings of articles manifested by a research paradigm. The method is tested with three example datasets: <i>Social Network Analysis</i> (1992-2004), <i>Mass Extinction</i> (1981-2004), and <i>Terrorism</i> (1989-2004). All three subject domains have known emergent paradigms that were identified independently. The resulting clusters are interpreted and assessed with reference to clusters identified by co-citation links. The consistency and discrepancy between the EM clusters and the link-based co-citation clusters are also discussed.
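The clustering step can be illustrated with a minimal sketch. It assumes each article has already been reduced to three numbers (citation half-life, betweenness centrality, citation frequency) and fits a Gaussian mixture with diagonal covariances by EM. The mixture model, the z-score standardization, the farthest-point initialization, and the function name `em_cluster` are illustrative assumptions, not details given in the abstract.

```python
import math
import random
from typing import List, Sequence


def _log_gauss(x: Sequence[float], mean: Sequence[float], var: Sequence[float]) -> float:
    """Log-density of a Gaussian with diagonal covariance."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))


def em_cluster(points: Sequence[Sequence[float]], k: int, iters: int = 100,
               seed: int = 0, min_var: float = 1e-6) -> List[int]:
    """Hypothetical helper: cluster feature vectors with a diagonal Gaussian mixture fitted by EM.

    Each point is assumed to be (citation half-life, betweenness centrality, citation count).
    Returns the most probable component index for every point.
    """
    n, d = len(points), len(points[0])
    # Standardize each feature (assumption) so that no single attribute dominates.
    mu = [sum(p[j] for p in points) / n for j in range(d)]
    sd = [math.sqrt(sum((p[j] - mu[j]) ** 2 for p in points) / n) or 1.0 for j in range(d)]
    x = [[(p[j] - mu[j]) / sd[j] for j in range(d)] for p in points]

    # Farthest-point initialization (assumption): start from a random point, then
    # repeatedly add the point farthest from all means chosen so far.
    rng = random.Random(seed)
    means = [list(rng.choice(x))]
    while len(means) < k:
        far = max(x, key=lambda p: min(sum((a - b) ** 2 for a, b in zip(p, m)) for m in means))
        means.append(list(far))
    variances = [[1.0] * d for _ in range(k)]  # unit variance after standardization
    weights = [1.0 / k] * k

    resp: List[List[float]] = []
    for _ in range(iters):
        # E-step: posterior probability that each point belongs to each component.
        resp = []
        for p in x:
            logs = [math.log(weights[c]) + _log_gauss(p, means[c], variances[c]) for c in range(k)]
            top = max(logs)  # subtract the maximum (log-sum-exp trick) for numerical stability
            probs = [math.exp(lg - top) for lg in logs]
            total = sum(probs)
            resp.append([q / total for q in probs])
        # M-step: re-estimate mixing weights, means, and variances from the responsibilities.
        for c in range(k):
            nc = sum(r[c] for r in resp)
            if nc < 1e-12:
                continue  # effectively empty component: keep its previous parameters
            weights[c] = nc / n
            means[c] = [sum(r[c] * p[j] for r, p in zip(resp, x)) / nc for j in range(d)]
            variances[c] = [max(sum(r[c] * (p[j] - means[c][j]) ** 2 for r, p in zip(resp, x)) / nc, min_var)
                            for j in range(d)]
    # Hard assignment: the component with the highest responsibility.
    return [max(range(k), key=lambda c: r[c]) for r in resp]
```

Standardizing first keeps raw citation counts, which can run into the hundreds, from swamping normalized betweenness values between 0 and 1 when the initial means are chosen.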
Scatter graphs are a popular medium for visualizing spatial-semantic structures derived from abstract information spaces. For small spaces, such graphs can be an effective means of reducing high-dimensional information to two or three spatial dimensions. As dimensionality increases, representing the thematic diversity of documents using spatial proximity alone becomes less and less effective. This paper reports an experiment designed to determine whether, for larger spaces, benefits are to be gained from adding visual links between document nodes as an additional means of representing the most important semantic relationships. Two well-known algorithms, minimum spanning trees (MST) and Pathfinder associative networks (PFNET), were tested against both a scatter graph visualization derived from factor analysis and a traditional list-based hypertext interface. It was hypothesized that visual links would facilitate users' comprehension of the information space, with corresponding gains in information-seeking performance. Navigation performance and user impressions were analyzed across a range of different search tasks. Results indicate both significant performance gains and more positive user feedback for MST and PFNET visualizations over scatter graphs. Performance on all visualizations was generally poorer than, and never better than, that achieved on the text list interface, although the magnitude of these differences was found to be highly task dependent.
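To make the two link structures concrete, here is a minimal sketch over a symmetric matrix of document dissimilarities (higher means less related). It assumes Pathfinder parameters r = ∞ and q = n − 1, under which a link survives only if no other path between the two documents has a smaller maximum link weight. The abstract does not state which parameters the study used, and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Set, Tuple

Edge = Tuple[int, int]


def minimum_spanning_tree(dist: Sequence[Sequence[float]]) -> Set[Edge]:
    """Prim's algorithm, O(n^2), on a dense symmetric dissimilarity matrix."""
    n = len(dist)
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known link from each node into the tree
    parent = [-1] * n
    best[0] = 0.0
    edges: Set[Edge] = set()
    for _ in range(n):
        u = min((v for v in range(n) if not in_tree[v]), key=lambda v: best[v])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.add((min(u, parent[u]), max(u, parent[u])))
        for v in range(n):
            if not in_tree[v] and dist[u][v] < best[v]:
                best[v] = dist[u][v]
                parent[v] = u
    return edges


def pfnet_inf(dist: Sequence[Sequence[float]]) -> Set[Edge]:
    """PFNET(r=inf, q=n-1): keep (i, j) iff its weight equals the minimax path distance."""
    n = len(dist)
    minimax: List[List[float]] = [list(row) for row in dist]
    # Floyd-Warshall with "max" in place of "+": smallest possible bottleneck link per pair.
    for k in range(n):
        for i in range(n):
            for j in range(n):
                via_k = max(minimax[i][k], minimax[k][j])
                if via_k < minimax[i][j]:
                    minimax[i][j] = via_k
    return {(i, j) for i in range(n) for j in range(i + 1, n)
            if math.isfinite(dist[i][j]) and dist[i][j] <= minimax[i][j]}
```

With these parameters PFNET is the union of all minimum spanning trees: when link weights are distinct it coincides with the MST, and on ties it keeps every tied link, so it can show somewhat more relationships than a single MST.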