Comparing the output of a physics simulation with an experiment is
often done by visual inspection of the two outputs. To determine
which simulation best matches the experiment, more quantitative
measures are needed. This paper describes our early experiences with
this task, approached through the slightly simpler problem of
finding objects in an image that are similar to a given
query object. Focusing on a dataset from a fluid mixing problem, we
report on our experiments using classification techniques from machine
learning to retrieve the objects of interest in the simulation data.
The early results reported in this paper suggest that machine learning
techniques can retrieve more objects that are similar to the query
than distance-based similarity methods.
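For reference, a distance-based similarity baseline of the kind this abstract compares against can be sketched as ranking every object by the Euclidean distance between its feature vector and the query's feature vector. This is a minimal sketch under stated assumptions: each object is assumed to be already summarized by a fixed-length numeric feature vector, and the function names, the Euclidean metric, and the toy feature values are illustrative choices, not the paper's actual features or distance measure.

```python
import math
from typing import Dict, List, Sequence, Tuple


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two feature vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def retrieve_similar(query: Sequence[float],
                     objects: Dict[str, Sequence[float]],
                     k: int) -> List[Tuple[str, float]]:
    """Return the k objects whose feature vectors lie closest to the query."""
    scored = [(name, euclidean(query, feats)) for name, feats in objects.items()]
    scored.sort(key=lambda item: item[1])  # smallest distance = most similar
    return scored[:k]


# Hypothetical objects, each summarized by two made-up features
# (for example, area and elongation of a segmented region).
objects = {"obj_a": (1.0, 2.0), "obj_b": (5.0, 5.0), "obj_c": (1.5, 2.5)}
hits = retrieve_similar((1.0, 2.0), objects, k=2)
print(hits)
```

In practice the features would usually be normalized first, since otherwise a feature with a large numeric range dominates the distance.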
In this paper, we describe the use of data mining techniques to search for radio-emitting galaxies with a bent-double morphology. In the past, astronomers from the FIRST (Faint Images of the Radio Sky at Twenty-cm) survey identified these galaxies through visual inspection. This was not only subjective but also tedious, as the ongoing survey now covers 8000 square degrees, each containing about 90 galaxies (roughly 720,000 galaxies in all). We explain how data mining can be used to automate the identification of these galaxies, discuss the challenges of defining meaningful features that represent the shape of a galaxy, and report our experiences with ensembles of decision trees for classifying bent-double galaxies.
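The classification step can be illustrated with a small bagged ensemble of trees. This minimal sketch assumes shape features have already been extracted into numeric vectors (the single "opening angle" feature and its values are hypothetical), uses bootstrap aggregation (bagging) as the ensemble method, and substitutes depth-one trees (decision stumps) for full decision trees to keep the code short. The abstract does not say which ensemble scheme or tree depth was used, so treat these as illustrative assumptions.

```python
import random
from collections import Counter
from typing import List, Sequence, Tuple

# A decision stump: (feature index, threshold, label if x[f] <= t, label if x[f] > t)
Stump = Tuple[int, float, int, int]


def majority(labels: Sequence[int], default: int) -> int:
    """Most common label, or `default` when the list is empty."""
    return Counter(labels).most_common(1)[0][0] if labels else default


def fit_stump(X: List[Sequence[float]], y: List[int]) -> Stump:
    """Fit a depth-one decision tree that minimizes training misclassifications."""
    overall = majority(y, 0)
    # Fallback: predict the overall majority label everywhere.
    best: Stump = (0, float("inf"), overall, overall)
    best_err = sum(label != overall for label in y)
    for f in range(len(X[0])):
        values = sorted({row[f] for row in X})
        # Candidate thresholds are midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            l_lab, r_lab = majority(left, overall), majority(right, overall)
            err = sum(lab != l_lab for lab in left) + sum(lab != r_lab for lab in right)
            if err < best_err:
                best, best_err = (f, t, l_lab, r_lab), err
    return best


def predict_stump(stump: Stump, x: Sequence[float]) -> int:
    f, t, l_lab, r_lab = stump
    return l_lab if x[f] <= t else r_lab


def fit_bagged_stumps(X, y, n_models: int = 25, seed: int = 0) -> List[Stump]:
    """Bagging: fit each stump on a bootstrap sample (n draws with replacement)."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return models


def predict_ensemble(models: List[Stump], x: Sequence[float]) -> int:
    """Classify by majority vote over the ensemble members."""
    votes = Counter(predict_stump(m, x) for m in models)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: one shape feature (an "opening angle" in degrees);
# label 1 = bent-double, 0 = not bent.
X = [[10.0], [20.0], [30.0], [150.0], [160.0], [170.0]]
y = [1, 1, 1, 0, 0, 0]
models = fit_bagged_stumps(X, y)
print(predict_ensemble(models, [25.0]), predict_ensemble(models, [165.0]))
```

Each bootstrap sample gives a slightly different tree, and voting over them tends to reduce the variance of a single tree; a practical version would grow deeper trees and use many more shape features.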
Decision trees have long been popular in classification because they use simple, easy-to-understand tests at each node. Most variants of decision trees test a single attribute at a node, leading to axis-parallel trees, in which each test corresponds to a hyperplane parallel to one of the axes of the attribute space. These trees can be rather large and inaccurate when the concept to be learned is best approximated by oblique hyperplanes. In such cases, it may be more appropriate to use an oblique decision tree, where the test at each node is based on a linear combination of the attributes. Oblique decision trees have not gained wide popularity, in part because of the complexity of constructing good oblique splits and the tendency of existing splitting algorithms to get stuck in local minima. Several alternatives have been proposed to handle these problems, including randomization in conjunction with deterministic hill-climbing and the use of simulated annealing. In this paper, we use evolutionary algorithms (EAs) to determine the split. EAs are well suited to this problem because of their global search properties, their tolerance to noisy fitness evaluations, and their scalability to high-dimensional search spaces. We demonstrate our technique on a synthetic data set and then apply it to a practical problem from astronomy, namely, the classification of galaxies with a bent-double morphology. In addition, we describe our experiences with several split evaluation criteria. Our results suggest that, in some cases, the evolutionary approach is faster and more accurate than existing oblique decision tree algorithms. However, for our astronomical data, its accuracy is not significantly different from that of axis-parallel trees.
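The idea of evolving a single oblique split can be sketched as follows. An oblique test sends a point x to the right child when w·x + b > 0, and an EA searches over the coefficients (w, b). This minimal sketch makes several assumptions that are ours, not the paper's: the split is scored with size-weighted Gini impurity (one common evaluation criterion), the EA is a simple (1+1) evolution strategy with a fixed Gaussian mutation step, the search starts from the best axis-parallel split, and the 10×10 grid labeled by the diagonal x + y > 1 is toy data rather than the paper's synthetic set. All function names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

# An oblique test sends x to the right child when sum_i w[i] * x[i] + b > 0.
Hyperplane = Tuple[List[float], float]


def gini(labels: Sequence[int]) -> float:
    """Gini impurity 1 - sum_k p_k^2 of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for lab in labels:
        counts[lab] = counts.get(lab, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def split_impurity(w: Sequence[float], b: float, X, y) -> float:
    """Size-weighted Gini impurity of the two children produced by w . x + b > 0."""
    left, right = [], []
    for row, lab in zip(X, y):
        goes_right = sum(wi * xi for wi, xi in zip(w, row)) + b > 0
        (right if goes_right else left).append(lab)
    n = len(y)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


def best_axis_parallel(X, y) -> Tuple[Hyperplane, float]:
    """Exhaustive search over single-attribute tests x[f] > t."""
    d = len(X[0])
    best: Hyperplane = ([0.0] * d, -1.0)  # degenerate test: everything goes left
    best_imp = split_impurity(best[0], best[1], X, y)
    for f in range(d):
        values = sorted({row[f] for row in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0
            w = [0.0] * d
            w[f] = 1.0  # x[f] > t is the hyperplane w = e_f, b = -t
            imp = split_impurity(w, -t, X, y)
            if imp < best_imp:
                best, best_imp = (w, -t), imp
    return best, best_imp


def evolve_oblique(X, y, generations: int = 2000, sigma: float = 0.3,
                   seed: int = 0) -> Tuple[Hyperplane, float]:
    """(1+1) evolution strategy over the d+1 hyperplane coefficients."""
    rng = random.Random(seed)
    (w, b), fit = best_axis_parallel(X, y)  # start from the best axis-parallel split
    for _ in range(generations):
        # Mutate every coefficient with Gaussian noise of fixed standard deviation.
        child_w = [wi + rng.gauss(0.0, sigma) for wi in w]
        child_b = b + rng.gauss(0.0, sigma)
        child_fit = split_impurity(child_w, child_b, X, y)
        if child_fit <= fit:  # accept ties so the search can drift across plateaus
            w, b, fit = child_w, child_b, child_fit
    return (w, b), fit


# Toy data: a 10x10 grid on [0, 1]^2, class 1 when x + y > 1 (i + j > 9).
X = [[i / 9, j / 9] for i in range(10) for j in range(10)]
y = [1 if i + j > 9 else 0 for i in range(10) for j in range(10)]
_, axis_imp = best_axis_parallel(X, y)
_, oblique_imp = evolve_oblique(X, y)
print(f"axis-parallel: {axis_imp:.3f}  oblique (EA): {oblique_imp:.3f}")
```

On this grid the best single-attribute split has weighted Gini impurity 0.37, while a diagonal hyperplane can separate the classes perfectly. Because the sketch starts from the axis-parallel split and only accepts children that are no worse, its result can never be worse than that baseline. A full tree inducer would apply this search recursively at every node.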