Detecting and tracking objects in spatio-temporal datasets is an
active research area with applications in many domains. A common
approach is to segment the 2D frames in order to separate the objects
of interest from the background, then estimate the motion of the
objects and track them over time. Most existing algorithms assume
that the objects to be tracked are rigid. In many scientific
simulations, however, the objects of interest evolve over time and
thus pose additional challenges for the segmentation and tracking
tasks. We investigate efficient segmentation methods in the context of
scientific simulation data. Instead of segmenting each frame
separately, we propose an incremental approach which incorporates the
segmentation result from the previous time frame when segmenting the
data at the current time frame. We start with the simple K-means
method, then study more sophisticated segmentation techniques based
on Markov random fields. We compare the incremental methods to the
corresponding sequential ones in terms of both result quality and
computational complexity.
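The incremental scheme can be sketched with a minimal one-dimensional K-means: rather than re-initializing the centroids for every frame, the centroids converged at frame t-1 warm-start the run at frame t. The two-class synthetic "frames" and the `kmeans` helper below are illustrative assumptions, not the paper's actual implementation:

```python
import numpy as np

def kmeans(data, centroids, max_iter=100, tol=1e-6):
    """Lloyd's algorithm on 1-D data: returns (labels, centroids, n_iter)."""
    for it in range(max_iter):
        # assign each sample to its nearest centroid
        dists = np.abs(data[:, None] - centroids[None, :])
        labels = dists.argmin(axis=1)
        new_centroids = np.array([
            data[labels == k].mean() if np.any(labels == k) else centroids[k]
            for k in range(len(centroids))
        ])
        if np.max(np.abs(new_centroids - centroids)) < tol:
            return labels, new_centroids, it + 1
        centroids = new_centroids
    return labels, centroids, max_iter

rng = np.random.default_rng(0)
# frame t-1 and a slightly evolved frame t (two intensity classes)
frame_prev = np.concatenate([rng.normal(0.2, 0.05, 500), rng.normal(0.8, 0.05, 500)])
frame_curr = np.concatenate([rng.normal(0.25, 0.05, 500), rng.normal(0.75, 0.05, 500)])

# sequential: re-initialize from scratch on every frame
_, c_seq, it_seq = kmeans(frame_curr, np.array([0.0, 1.0]))
# incremental: warm-start from the previous frame's converged centroids
_, c_prev, _ = kmeans(frame_prev, np.array([0.0, 1.0]))
labels, c_inc, it_inc = kmeans(frame_curr, c_prev)
```

Because the objects drift only slightly between frames, the warm-started run needs no more iterations than the cold-started one; the interesting regime studied in the paper is objects that evolve over time.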
Proc. SPIE 5102, Independent Component Analyses, Wavelets, and Neural Networks
Observed and simulated global temperature series include the effects
of many different sources, such as volcanic eruptions and El Niño
Southern Oscillation (ENSO) variations. In order to compare the
results of different models to each other, and to the observed data,
it is necessary to first remove contributions from sources that are
not commonly shared across the models considered. Such a separation
of sources is also desired in order to assess the effect of human
contributions on the global climate. Atmospheric scientists currently use parametric models and iterative techniques to remove the effects of volcanic eruptions and ENSO variations from global temperature trends. Drawbacks of the parametric approach include the non-robustness of the results to the estimated parameter values and the possible lack of fit of the data to the model. In this paper, we investigate independent component analysis (ICA) as an alternative method for separating independent sources in global temperature series. Instead of fitting parametric models, we let the data guide the estimation and automatically separate the effects of the underlying sources. We first assess ICA on simple artificial datasets to establish the conditions under which ICA is feasible in our context, then study its results on climate data from the National Centers for Environmental Prediction.
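As a sketch of the source-separation step, the following implements the standard symmetric FastICA iteration (centering, whitening, tanh nonlinearity, symmetric decorrelation) on artificial mixtures of an oscillatory source and a spiky one, loosely analogous to ENSO-like and episodic volcanic signals. The sources, mixing matrix, and iteration count are invented for illustration and are not the paper's climate data or exact procedure:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000
# two independent, non-Gaussian toy sources (illustrative stand-ins only)
s1 = np.sign(np.sin(np.linspace(0, 40, n)))   # square-ish oscillation
s2 = rng.laplace(size=n)                      # heavy-tailed spikes
S = np.vstack([s1, s2])
A = np.array([[1.0, 0.6], [0.4, 1.0]])        # "unknown" mixing matrix
X = A @ S                                     # observed mixtures

# center and whiten the observations
X = X - X.mean(axis=1, keepdims=True)
cov = X @ X.T / n
d, E = np.linalg.eigh(cov)
Xw = E @ np.diag(d ** -0.5) @ E.T @ X

# symmetric FastICA fixed-point iteration with the tanh contrast
W = rng.standard_normal((2, 2))
for _ in range(200):
    G = np.tanh(W @ Xw)
    Gp = 1 - G ** 2
    W_new = (G @ Xw.T) / n - np.diag(Gp.mean(axis=1)) @ W
    # symmetric decorrelation: W <- (W W^T)^(-1/2) W
    u, _, vt = np.linalg.svd(W_new)
    W = u @ vt

Y = W @ Xw   # recovered sources, up to sign and permutation
```

ICA recovers the sources only up to scale, sign, and ordering, which is why the recovered components must be matched to physical sources (e.g. ENSO indices) after the fact.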
In this paper, we describe the use of data mining techniques to search for radio-emitting galaxies with a bent-double morphology. In the past, astronomers from the FIRST (Faint Images of the Radio Sky at Twenty-cm) survey identified these galaxies through visual inspection. This was not only subjective but also tedious, as the on-going survey now covers 8000 square degrees, with each square degree containing about 90 galaxies. We describe how data mining can be used to automate the identification of these galaxies, discuss the challenges faced in defining meaningful features that represent the shape of a galaxy, and report our experiences with ensembles of decision trees for the classification of bent-double galaxies.
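The ensemble idea can be illustrated with a small bagged classifier. The two "shape features" and the labeling rule below are hypothetical stand-ins (the actual FIRST features are more elaborate), and bagged decision stumps replace the full decision trees used in the paper:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 600
# hypothetical shape features, e.g. a bending angle and a symmetry score
angle = rng.uniform(0, 90, n)
symmetry = rng.uniform(0, 1, n)
X = np.column_stack([angle, symmetry])
# toy rule: call a galaxy "bent-double" when the angle is large,
# with a little label noise to make the task non-trivial
y = (angle > 35).astype(int)
flip = rng.random(n) < 0.05
y[flip] ^= 1

def fit_stump(X, y):
    """Best single-feature threshold classifier (decision stump)."""
    best = (0, 0.0, 1, 1.0)  # (feature, threshold, polarity, error)
    for f in range(X.shape[1]):
        for t in np.quantile(X[:, f], np.linspace(0.05, 0.95, 19)):
            for pol in (1, -1):
                pred = ((X[:, f] > t).astype(int) if pol == 1
                        else (X[:, f] <= t).astype(int))
                err = np.mean(pred != y)
                if err < best[3]:
                    best = (f, t, pol, err)
    return best

def predict_stump(stump, X):
    f, t, pol, _ = stump
    return (X[:, f] > t).astype(int) if pol == 1 else (X[:, f] <= t).astype(int)

# bagging: fit each stump on a bootstrap sample, average the votes
stumps = []
for _ in range(25):
    idx = rng.integers(0, n, n)
    stumps.append(fit_stump(X[idx], y[idx]))

votes = np.mean([predict_stump(s, X) for s in stumps], axis=0)
ensemble_pred = (votes > 0.5).astype(int)
# evaluated on the training sample for brevity; a real study would hold out data
accuracy = np.mean(ensemble_pred == y)
```

Averaging many trees fit on bootstrap resamples reduces the variance of any single tree, which is the main appeal of ensembles for noisy, hand-labeled training sets like visually classified galaxies.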
Advances in technology have enabled us to collect data from observations, experiments, and simulations at an ever-increasing pace. As these data sets approach the terabyte and petabyte range, scientists are increasingly using semi-automated techniques from data mining and pattern recognition to find useful information in the data. In order for data mining to be successful, the raw data must first be processed into a form suitable for the detection of patterns. When the data is in the form of images, this can involve a substantial amount of processing on very large data sets. To help make this task more efficient, we are designing and implementing an object-oriented image processing toolkit that specifically targets massively parallel, distributed-memory architectures. We first show that it is possible to use object-oriented technology to effectively address the diverse needs of image applications. Next, we describe how we abstract out the similarities in image processing algorithms to enable re-use in our software. We also discuss the difficulties encountered in parallelizing image algorithms on massively parallel machines, as well as the bottlenecks to high performance. We demonstrate our work using images from an astronomical data set, and illustrate how techniques such as filtering and denoising through the thresholding of wavelet coefficients can be applied when a large image is distributed across several processors.
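Setting aside the distributed-memory aspect, the wavelet-thresholding denoising mentioned above can be sketched in serial on a one-dimensional signal with the Haar wavelet; the signal, noise level, and threshold value are illustrative choices, not the toolkit's actual code:

```python
import numpy as np

def haar_fwd(x):
    """One level of the orthonormal Haar wavelet transform."""
    a = (x[0::2] + x[1::2]) / np.sqrt(2)   # approximation coefficients
    d = (x[0::2] - x[1::2]) / np.sqrt(2)   # detail coefficients
    return a, d

def haar_inv(a, d):
    """Invert one level of the Haar transform."""
    x = np.empty(2 * len(a))
    x[0::2] = (a + d) / np.sqrt(2)
    x[1::2] = (a - d) / np.sqrt(2)
    return x

def denoise(x, levels=3, thresh=0.5):
    """Multi-level Haar decomposition with soft thresholding of the details."""
    details = []
    a = x
    for _ in range(levels):
        a, d = haar_fwd(a)
        # soft threshold: shrink small detail coefficients to zero
        d = np.sign(d) * np.maximum(np.abs(d) - thresh, 0.0)
        details.append(d)
    for d in reversed(details):
        a = haar_inv(a, d)
    return a

rng = np.random.default_rng(3)
n = 256
clean = np.where(np.arange(n) < n // 2, 1.0, -1.0)   # piecewise-constant "image row"
noisy = clean + rng.normal(0, 0.3, n)
denoised = denoise(noisy)

mse_noisy = np.mean((noisy - clean) ** 2)
mse_denoised = np.mean((denoised - clean) ** 2)
```

In the distributed setting described above, the extra difficulty is that wavelet coefficients near processor boundaries depend on pixels owned by neighboring processors, so ghost regions must be exchanged before each transform level.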