In this paper, Difference-Similitude Set Theory (DSST) is proposed, which differs fundamentally from known algorithms. Two new concepts, the "available member set" and the "upper-approximate set", are defined. In the DSST framework, knowledge reduction on any information system is equivalent to a series of calculations on the difference sets and similitude sets of both the whole information system and each object. It is concluded and demonstrated that: (1) attribute reduction amounts to finding an upper-approximate set of the available member set of the difference set with cardinality as small as possible; (2) rule construction for a given object amounts to finding an upper-approximate set of the available member set of that object's difference set with cardinality as small as possible. Algorithms for both tasks are described. The first task can be separated into two steps: (a) find the base attributes; (b) find an upper-approximate set of the available member set of the non-base set. The union of the results of these two steps gives the reserved attributes. The second task can likewise be separated into two steps: (a) find the object's base attributes; (b) find an upper-approximate set of the object's non-base set. By comparing the overlap of the rules and rejecting, one by one, the objects that the reserved rules cover, a synthesized rule set that is as small as possible can be found.
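As a rough, hypothetical illustration of step (b): finding a small upper-approximate set that covers every difference set can be viewed as a greedy hitting-set computation. The function name and the greedy selection rule below are ours, not the paper's, and the sketch ignores the base-attribute step:

```python
# Hypothetical sketch: treat the search for a small-cardinality
# upper-approximate set of the difference set as a greedy hitting-set
# problem over the objects' difference sets.
def greedy_hitting_set(difference_sets):
    """Pick attributes until every non-empty difference set is covered."""
    uncovered = [set(d) for d in difference_sets if d]
    chosen = set()
    while uncovered:
        # count how many uncovered difference sets each attribute appears in
        counts = {}
        for d in uncovered:
            for a in d:
                counts[a] = counts.get(a, 0) + 1
        best = max(counts, key=counts.get)  # most frequent attribute
        chosen.add(best)
        uncovered = [d for d in uncovered if best not in d]
    return chosen
```

The greedy rule does not guarantee a minimum-cardinality result, only a small one, which matches the "as small as possible" phrasing of the abstract.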
Support Vector Regression (SVR) is a well-established, robust method for function estimation. The Support Vector Machine uses inner-product kernels between support vectors and the input vectors to transform nonlinear classification and regression problems into linear ones, approximating the target function with a linear combination of the kernel function evaluated at the support vectors. In many applications, the number of support vectors can be quite large, which lengthens the prediction phase for large data sets. Here we study a technique for reducing the number of support vectors while achieving comparable function estimation accuracy. The method identifies support vectors that are close to the ε-tube and uses them to approximate the function estimate of the original algorithm.
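A sketch of the reduction idea in numpy, under our own simplifying assumptions: an RBF kernel, a trained model summarized by its support vectors and dual coefficients, and a residual-based rule for "close to the ε-tube". None of these specifics are taken from the abstract:

```python
import numpy as np

# Hypothetical sketch: given a trained SVR model (support vectors sv,
# dual coefficients alpha, bias b, RBF width gamma), keep only the
# support vectors whose fit residual lies near the eps-tube boundary,
# and predict with the reduced set. The selection rule is illustrative.

def rbf(X, sv, gamma=1.0):
    d2 = ((X[:, None, :] - sv[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def predict(X, sv, alpha, b, gamma=1.0):
    return rbf(X, sv, gamma) @ alpha + b

def reduce_svs(X_sv, y_sv, alpha, b, eps, tol=0.1, gamma=1.0):
    """Keep support vectors whose residual is within tol of the tube."""
    resid = np.abs(predict(X_sv, X_sv, alpha, b, gamma) - y_sv)
    keep = np.abs(resid - eps) <= tol  # close to the eps-tube
    return X_sv[keep], alpha[keep]
```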
Exploratory data-driven methods such as unsupervised clustering and independent component analysis (ICA) are considered hypothesis-generating procedures, complementary to the hypothesis-led statistical inferential methods used in functional magnetic resonance imaging (fMRI). In this paper, we present a comparison between unsupervised clustering and ICA in a systematic fMRI study. The comparative results were evaluated using a detailed receiver operating characteristic (ROC) analysis.
For the fMRI data, a comparative quantitative evaluation was performed among three clustering techniques (SOM, the "neural gas" network, and fuzzy clustering based on deterministic annealing) and three ICA methods (FastICA, Infomax, and topographic ICA). The ICA methods extracted features relatively well for a small number of independent components but were limited by the linear mixture assumption. Unsupervised clustering outperformed ICA in terms of classification results but required a longer processing time than the ICA methods.
Structure from motion is a technique that attempts to reconstruct the 3D structure of a scene from a sequence of images taken by a camera moving within the scene. It can be used on an Unmanned Aerial Vehicle or Unmanned Ground Vehicle for obstacle detection as well as for path planning and navigation. The 3D structure of the scene is estimated using the optical flow values found at a set of feature points in the image. Typically, this is done under the assumption that all optical flow values have the same level of accuracy and that each value is equally accurate in the horizontal and vertical directions. These assumptions are not entirely correct: the accuracies of different optical flow values are not the same, and they can be measured and quantified using an optical flow probability distribution. We will present a novel structure from motion algorithm that is more accurate and more robust than other methods because it uses optical flow probability distributions. We will present one method designed to work on only two frames and another designed to work on any number of frames.
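One standard way to exploit per-measurement flow distributions, sketched here under our own assumptions (Gaussian flow errors, an already-linearized system `A x = b` with one 2-row block per feature), is to weight each flow measurement by its inverse covariance in the least-squares structure estimate. This is a generic weighted-least-squares step, not the paper's specific algorithm:

```python
import numpy as np

# Hypothetical sketch: fold per-feature optical-flow covariances into a
# linearized structure-from-motion solve. A and b encode the flow
# equations (2 rows per feature); covs[i] is the 2x2 covariance of the
# i-th flow measurement.
def weighted_solve(A, b, covs):
    """Solve A x = b, weighting each 2-row flow block by its inverse covariance."""
    W = np.zeros((A.shape[0], A.shape[0]))
    for i, S in enumerate(covs):
        W[2 * i:2 * i + 2, 2 * i:2 * i + 2] = np.linalg.inv(S)
    # normal equations of the weighted least-squares problem
    return np.linalg.solve(A.T @ W @ A, A.T @ W @ b)
```

Anisotropic covariances (more horizontal than vertical uncertainty, say) automatically down-weight the unreliable direction, which is exactly what the equal-accuracy assumption fails to do.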
A priori information given by the complete modelling of the ballistic behavior (trajectory, attitude) of the projectile is simplified to give a pertinent reduced evolution model. An algorithm based on extended Kalman filters is designed to determine:
• position: x, y, z references in the earth frame;
• the value and direction of the velocity vector, its direction being given by two angles (η and θ);
• the attitude around the velocity vector, given by three angles: the roll angle in the range [0, 2π], the angle of attack α, and the side-slip angle β, the latter two in the range of a few milliradians.
The estimation is based on measurements of the earth's magnetic field given by a three-axis magnetometer embedded in the projectile. The algorithm also needs the direction of the earth's magnetic field in the earth frame and the aerodynamic coefficients of the projectile. The algorithm has been tested in simulation, using real attitude evolution data for a shot with a 155 mm rotating projectile over a distance of 16 km, with wind and measurement noise. The results show that, despite the non-linear equations and approximations, these milliradian-scale angles can be estimated with good precision.
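The extended Kalman filter at the core of such an estimator follows the standard predict/update cycle. The sketch below is generic, not the paper's ballistic model: `f`/`h` stand for the reduced evolution model and the magnetometer measurement model, `F`/`H` for their Jacobians, and all names are illustrative:

```python
import numpy as np

# Minimal extended-Kalman-filter step (generic sketch, not the paper's
# full ballistic model). f: state evolution; h: magnetometer measurement
# model; F, H: their Jacobians at the current estimate; Q, R: process
# and measurement noise covariances.
def ekf_step(x, P, z, f, F, h, H, Q, R):
    # predict with the (reduced) evolution model
    x_pred = f(x)
    P_pred = F @ P @ F.T + Q
    # update with the three-axis magnetometer measurement z
    y = z - h(x_pred)                    # innovation
    S = H @ P_pred @ H.T + R             # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)  # Kalman gain
    x_new = x_pred + K @ y
    P_new = (np.eye(len(x)) - K @ H) @ P_pred
    return x_new, P_new
```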
Clinical diagnosis is often supported by a wide variety of medical images, and different images, each with its own specialized information, complement each other. Biomedical image registration addresses the mapping of two or more images of the same patient. In this paper, a Distance Transform (DT) is applied to the edges of the natural brain tissue in the CT image. Registration based on the Chamfer matching algorithm is then accomplished by using the edges of the natural brain tissue in the MR-T1 and MR-T2 images as masks. Once the two images are precisely matched, the fusion of CT and MR is performed. The example used in this paper is the registration of CT and MR images of a brain tumor patient. After matching and fusion, the resultant image shows clear structure for all tissues and distinct contrast between gray matter and white matter. The tumor stands out clearly in the final image, and the details showing how the tumor presses on the surrounding brain parenchyma are vivid. The algorithm can also serve as a reference for the matching of other medical images such as SPECT, PET, DSA, etc.
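The core of Chamfer matching can be sketched briefly: the distance transform of the CT edge map scores how far each MR edge pixel lies from the nearest CT edge, and the best transformation minimizes the mean distance. The sketch below, assuming binary edge masks and a search restricted to integer translations, is ours; edge extraction and the full transformation search are omitted:

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

# Hypothetical sketch of the Chamfer-matching step. ct_edges and
# mr_edges are boolean edge masks of equal shape.
def chamfer_score(ct_edges, mr_edges):
    """Mean distance from MR edge pixels to the nearest CT edge pixel."""
    dt = distance_transform_edt(~ct_edges)  # 0 on CT edges, grows away from them
    return dt[mr_edges].mean()

def best_shift(ct_edges, mr_edges, shifts):
    """Pick the integer translation of the MR mask with the lowest score."""
    scores = {s: chamfer_score(ct_edges, np.roll(mr_edges, s, axis=(0, 1)))
              for s in shifts}
    return min(scores, key=scores.get)
```

Precomputing the distance transform once makes each candidate transformation cheap to score, which is the main practical appeal of Chamfer matching.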
With the proliferation of online resources, there is an increasing need to retrieve data and knowledge effectively and efficiently from distributed geospatial databases. One of the key challenges is that geospatial databases are usually large and dynamic. In this paper, we address this problem by developing a large-scale distributed intelligent foraging, gathering and matching (I-FGM) framework for massive and dynamic information spaces. We assess the effectiveness of our approach by comparing a prototype I-FGM against two simpler control systems (a randomized-selection system and a partially intelligent system). We designed and employed a medium-sized testbed to obtain accurate measures of retrieval precision and recall for each system. The results show that I-FGM retrieves relevant information more quickly than the two control approaches.
Many situations involving strategic interaction between agents offer a continuous set of choices, so it is natural to model these problems as continuous-space games. Consequently, the population of agents playing the game is represented by a density function defined over the continuous set of strategy choices.
Simulating the evolution of this population is a challenging problem. We present a method for simulating replicator dynamics in continuous-space games using sequential Monte Carlo methods. The particle approach to density estimation provides a computationally efficient way of modeling the evolution of probability density functions.
A resampling step and smoothing methods are used to prevent the particle-degeneration problem associated with particle estimates. Finally, we compare and contrast the resulting simulation algorithm with genetic programming techniques.
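The particle scheme described above can be sketched in a few lines for a one-dimensional strategy space. The payoff function, bandwidth, and weight normalization below are our illustrative choices, not the paper's:

```python
import numpy as np

# Hypothetical sketch of particle-based replicator dynamics on a 1-D
# strategy space. Each particle is a strategy; fitter strategies are
# duplicated during resampling, and Gaussian jitter plays the role of
# the smoothing step that fights particle degeneration.
rng = np.random.default_rng(0)

def replicator_step(particles, fitness, bandwidth=0.01):
    f = fitness(particles)
    w = f - f.min() + 1e-9          # shift so resampling weights are positive
    w = w / w.sum()
    # resample: strategies with higher payoff are duplicated more often
    idx = rng.choice(len(particles), size=len(particles), p=w)
    new = particles[idx]
    # smoothing step to prevent particle degeneration
    return new + rng.normal(0.0, bandwidth, size=len(new))
```

Iterating the step on a payoff peaked at some strategy concentrates the particle cloud there, mimicking the replicator dynamics' convergence to the fittest strategies.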
A genetic program (GP) based data mining (DM) procedure has been developed that automatically creates decision theoretic algorithms. A GP is an algorithm that uses the theory of evolution to automatically evolve other computer programs or mathematical expressions. The output of the GP is a computer program or mathematical expression that is optimal in the sense that it maximizes a fitness function. The decision theoretic algorithms created by the DM algorithm are typically designed for making real-time decisions about the behavior of systems. The database mined by the DM typically consists of many scenarios characterized by sensor output and labeled by experts as to the status of the scenario. The DM procedure calls a GP as a data mining function; the GP incorporates the database and the experts' rules into its fitness function to evolve an optimal decision theoretic algorithm. A decision theoretic algorithm created through this process will be discussed, along with validation efforts showing its utility. GP based data mining to determine equations related to scientific theories, and automatic simplification methods based on computer algebra, will also be discussed.
Numerous applications of topical interest call for knowledge discovery and classification from information that may be inaccurate and/or incomplete. For example, in an airport threat-classification scenario, data from heterogeneous sensors are used to extract features for classifying potential threats. This requires a training set that utilizes non-traditional information sources (e.g., domain experts) to assign a threat level to each training-set instance. Sensor reliability, accuracy, noise, etc., all contribute to feature-level ambiguities; conflicting opinions of experts generate class-label ambiguities that may nevertheless carry important clues. To accommodate these, a belief-theoretic approach is proposed. It utilizes a data structure that facilitates belief/plausibility queries regarding "ambiguous" itemsets. An efficient apriori-like algorithm is then developed to extract such frequent itemsets and to generate the corresponding association rules. These are then used to assign an incoming "ambiguous" data instance a class label (which may be "hard" or "soft"). To test its performance, the proposed algorithm is compared with C4.5 on several databases from the UCI repository and on a threat-assessment application scenario.
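The belief/plausibility queries mentioned above follow standard Dempster-Shafer definitions: belief sums the masses of all subsets of a query set, plausibility the masses of all sets intersecting it. A minimal sketch, assuming a mass function stored as a dict keyed by frozensets (the paper's actual data structure is more elaborate):

```python
# Minimal Dempster-Shafer sketch: Bel(A) sums masses of focal sets
# contained in A; Pl(A) sums masses of focal sets intersecting A.
# `mass` maps frozensets of items to their mass values.
def belief(mass, A):
    return sum(m for B, m in mass.items() if B <= A)

def plausibility(mass, A):
    return sum(m for B, m in mass.items() if B & A)
```

Belief and plausibility bracket the unknown "true" support of an itemset, which is what lets the algorithm reason about ambiguous feature values and class labels.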
The purpose of our work is to mine streaming data from hundreds of diverse automotive sensors in order to develop methods that minimize driver distraction from in-vehicle communications and entertainment systems such as audio/video devices, cell phones, PDAs, fax, e-mail, and other messaging devices. Our goal is to create a safer driving environment by providing assistance in the form of warning about, delaying, or re-routing incoming signals if the assistance system detects that the driver is performing, or is about to perform, a critical maneuver such as passing, changing lanes, making a turn, or a sudden evasive maneuver. To accomplish this, our assistance system relies on maneuver detection: it continuously evaluates various embedded vehicle sensors, such as speed, steering, acceleration, lane distance, and many others, combined to represent an instance of the vehicle's "state". One key issue is how to monitor many sensors with constant data streams effectively and efficiently. Data streams have unique characteristics and may produce data that is not relevant or pertinent to a maneuver. We propose an adaptive sampling method that takes advantage of these characteristics, and we develop algorithms that attempt to select relevant and important instances, determining which sensors to monitor and how to provide quick and effective responses in these mission-critical situations. This work can be extended to many similar sensor applications with data streams.
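One simple form adaptive sampling can take, sketched here as our own illustration rather than the paper's method, is variance-triggered rate selection: a sensor stream is polled densely only while its recent values vary enough to matter for maneuver detection. All thresholds and names are hypothetical:

```python
import numpy as np

# Hypothetical sketch of variance-triggered adaptive sampling: quiet
# streams are polled at a low base rate; active streams get a rate that
# grows with their recent variability, capped at max_rate.
def adaptive_rate(window, base_rate=1.0, max_rate=50.0, thresh=0.05):
    """Return a sampling rate (Hz) for one sensor from its recent window."""
    spread = np.std(window)
    if spread < thresh:
        return base_rate  # stream is quiet: sample slowly
    return min(max_rate, base_rate * (1 + spread / thresh))
```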
Starting with the supposition that the product of smart sensors - whether autonomous, networked, or fused - is in all cases information, it is shown here, using information theory, how a metric Q, ranging between 0 and 100%, can be derived to assess the quality of the information provided. The analogy with student grades is immediately evident, and it is elaborated here. As with student grades, numerical percentages suggest more precision than can be justified, so a conversion to letter grades A+ to D- is desirable. Owing to the close analogy with familiar academic grades, moreover, the new grade is a measure of effectiveness (MOE) that commanders and decision makers should immediately appreciate and find quite natural, even if they do not care to follow the methodology behind the performance test as they focus on higher-level strategic matters of sensor deployment or procurement. The metric is illustrated by translating three specialist performance tests - the Receiver Operating Characteristic (ROC) curve, the Constant False Alarm Rate (CFAR) approach, and confusion matrices - into letter grades for subsequent use by strategists. Actual military and security systems are included among the examples.
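One plausible information-theoretic construction of such a percentage score, offered purely as our own illustration (the paper's exact derivation of Q may differ, and the grade cut-offs below are hypothetical), is mutual information between truth and decision, normalized by the entropy of the truth, computed from a confusion matrix:

```python
import numpy as np

# Hypothetical sketch: turn a confusion matrix into a 0-100% information
# score (mutual information between truth and decision, normalized by
# the entropy of the truth), then into a coarse letter grade.
def info_score(confusion):
    p = confusion / confusion.sum()
    px, py = p.sum(1), p.sum(0)        # marginals: truth rows, decision cols
    mi = sum(p[i, j] * np.log2(p[i, j] / (px[i] * py[j]))
             for i in range(p.shape[0]) for j in range(p.shape[1])
             if p[i, j] > 0)
    hx = -sum(v * np.log2(v) for v in px if v > 0)
    return 100.0 * mi / hx

def letter_grade(q):
    for cut, g in [(90, "A"), (80, "B"), (70, "C"), (0, "D")]:
        if q >= cut:
            return g
```

A perfect detector scores 100% (grade A); a detector whose decisions are independent of the truth scores 0%.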
Traditional centralized approaches to security are difficult to apply to the multi-agent systems now used in e-commerce applications. Developing a notion of trust based on an agent's reputation provides a softer notion of security that is sufficient for many multi-agent applications. Our paper proposes a mechanism by which a trustier agent computes the reputation of a trustee agent. The trustier agent computes the reputation based on its own experience as well as the experience that peer agents have had with the trustee agent. The trustier agent intentionally interacts with the peer agents to obtain their experience information in the form of recommendations. We also consider unintentional encounters between the referee agents and the trustee agent, which can occur directly between them or indirectly through a set of interacting agents. Clustering is performed to filter out noise in the recommendations in the form of outliers: the trustier agent clusters the recommendations received from referee agents, on the basis of the distances between them, using the hierarchical agglomerative method. The resulting dendrogram is cut at the required similarity level, which bounds the maximum distance between any two recommendations within a cluster. The cluster with the largest number of elements represents the views of the majority of recommenders, and its center, which can be computed using the c-means algorithm, gives the reputation of the trustee agent.
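The filtering pipeline above can be sketched with scipy's hierarchical clustering. We assume scalar recommendation scores and substitute a plain mean of the majority cluster for the paper's c-means center; both simplifications are ours:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Hypothetical sketch of outlier filtering: cluster scalar
# recommendations agglomeratively, cut the dendrogram at a distance
# threshold, and take the mean of the largest cluster as the reputation
# (a simple stand-in for the c-means center used in the paper).
def reputation(recs, max_dist=0.15):
    Z = linkage(np.array(recs).reshape(-1, 1), method="average")
    labels = fcluster(Z, t=max_dist, criterion="distance")
    biggest = np.bincount(labels).argmax()  # majority cluster label
    return float(np.mean([r for r, l in zip(recs, labels) if l == biggest]))
```

An outlying recommendation ends up in its own small cluster and simply never contributes to the majority-cluster mean.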
Recently there has been a renewed interest in deploying large numbers of networked sensors for applications ranging from environmental monitoring to surveillance. In a typical scenario a number of sensors, each equipped with sensing, processing and communication capabilities, are distributed in a region of interest. The information gathered from the sensors can be used to detect, track and classify objects of interest. For many applications, the sensors' locations are crucial for interpreting the data they collect. Scalability requirements dictate sensor nodes that are inexpensive devices without dedicated localization hardware such as GPS, so the network has to rely on information collected within the network to self-localize. In the literature a number of algorithms have been proposed for network localization that use measurements informative of range, angle, or proximity between nodes. Recent work by Patwari and Hero relies on sensor data without explicit range estimates; the assumption is that the correlation structure in the data is a monotone function of the inter-sensor distances. In this paper we propose a new method based on unsupervised learning techniques to extract location information from the sensor data itself. We consider a grid of virtual nodes and fit this grid to the actual sensor network data using the method of self-organizing maps. Known sensor-network geometry can then be used to rotate and scale the grid into a global coordinate system. Finally, we illustrate how the virtual nodes' location information can be used to track a target.
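The grid-fitting step can be sketched as a plain self-organizing map in numpy. We assume, for illustration only, a one-dimensional chain of virtual nodes and 2-D observations; the rotation/scaling into a global frame and the tracking step are omitted:

```python
import numpy as np

# Minimal self-organizing-map sketch: fit a chain of virtual nodes to
# 2-D sensor observations. Learning rate and neighborhood width decay
# over the iterations; all hyperparameters are illustrative.
rng = np.random.default_rng(1)

def fit_som(data, n_nodes=10, iters=2000, lr0=0.5, sigma0=2.0):
    nodes = data[rng.choice(len(data), n_nodes)].astype(float)
    grid = np.arange(n_nodes)  # 1-D grid topology of virtual nodes
    for t in range(iters):
        x = data[rng.integers(len(data))]
        frac = t / iters
        lr, sigma = lr0 * (1 - frac), sigma0 * (1 - frac) + 0.1
        bmu = np.argmin(((nodes - x) ** 2).sum(1))  # best-matching unit
        h = np.exp(-((grid - bmu) ** 2) / (2 * sigma ** 2))
        nodes += lr * h[:, None] * (x - nodes)      # pull the neighborhood
    return nodes
```

Because each update is a convex combination of a node and a data point, the fitted virtual nodes stay inside the spatial extent of the observations, as one would want for localization.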
Current urban operations require intelligent methods for integrating data and transmitting fused information to users. In this paper, we evaluate the capability to deliver accurate and timely data to both a commander and the user on the ground. The ground user requires data on immediate threats for rapid reaction, whereas the commander has time to reason over information on potential threats for preventative action. Using predicted data and information affords proactive decision making on anticipated threats. Proactive action includes gathering new information, relocating for safety, and hindering the opposition from acting. Complexities abound in urban operations and sensor fusion strategies, which revolve around delivering quality information (i.e., timely, accurate, confident, high-throughput, and minimal-cost). New strategies are needed to account for high-density targets, sensor obscurations, and rapid response to meet the demands of Sustainable and Security Operations (SASO). The purpose of this paper is to evaluate the inherent responsibility of the fusion system to deliver a consistent and succinct set of information over the appropriate time window. This paper will highlight (1) proactive use of sensor resources, (2) integration of users with the fielded system, and (3) modeling of communication and decision making to meet operational timeliness needs.
The advent of reliable spontaneous networking technologies (commonly known as wireless ad-hoc networks) has raised the stakes for the conception of computing-intensive environments that use intelligent devices as their interface with the external world. These smart devices serve as data gateways for the computing units and are employed in highly volatile environments where the secure exchange of data between the devices and their computing centers is of paramount importance. Moreover, their mission-critical applications require dependable measures against attacks such as denial of service (DoS), eavesdropping, and masquerading. In this paper, we propose a mechanism to assure reliable data exchange between an intelligent environment composed of smart devices and distributed computing units collectively called a 'computational grid'. The notion of the infosphere is used to define a digital space made up of a persistent and a volatile asset in an often indefinite geographical space. We study different infospheres and present general evolutions and issues in the security of such technology-rich and intelligent environments. These environments will no doubt face a proliferation of users, applications, networked devices, and interactions on a scale never experienced before, so it is preferable to build in the ability to deal with these systems uniformly. As a solution, we propose a concept of virtualization of security services, attempting to solve the difficult problems of implementing and maintaining trust on the one hand, and of security management in heterogeneous infrastructure on the other.
This paper introduces a drift-analysis approach to studying the convergence and hitting times of evolutionary algorithms. First, the methodology of drift analysis is introduced, which links evolutionary algorithms with Markov chains or supermartingales. Then, the drift conditions that guarantee the convergence of evolutionary algorithms are described. Finally, the drift conditions used to estimate the hitting times of evolutionary algorithms are presented.
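The prototypical hitting-time result of this kind is the additive drift condition, stated here in standard form with our own notation (V is a distance-to-optimum function on the states of the algorithm):

```latex
% Additive drift (standard form): if, for some constant c > 0,
\mathbb{E}\left[\, V(X_t) - V(X_{t+1}) \,\middle|\, V(X_t) > 0 \,\right] \;\ge\; c ,
% then the expected first hitting time \tau of the set \{V = 0\} satisfies
\mathbb{E}\left[\, \tau \mid X_0 \,\right] \;\le\; \frac{V(X_0)}{c} .
```

In words: if the algorithm makes expected progress of at least c per step toward the optimum, the expected hitting time is bounded by the initial distance divided by c.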
We present a probability-density-based data stream clustering approach that requires only the newly arrived data, not the entire historical data, to be kept in memory. The approach incrementally updates the density estimate using only the newly arrived data and the previously estimated density. The idea rests on a theorem of estimator updating and works naturally with Gaussian mixture models. We implement it through the expectation-maximization algorithm and a cluster-merging strategy based on multivariate statistical tests for equality of covariance and mean. Our approach is highly efficient in clustering voluminous online data streams compared with the standard EM algorithm. We demonstrate the performance of our algorithm on clustering a simulated Gaussian mixture data stream and on clustering real noisy spike signals extracted from neuronal recordings.
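The incremental flavor of such an update can be sketched for a single Gaussian component: the previous estimate is summarized by a count, mean, and covariance, and a new data chunk updates these sufficient statistics exactly, without revisiting old data. This is a generic exact update (population covariance), not the paper's full EM-plus-merging procedure, and component assignment is ignored:

```python
import numpy as np

# Sketch of an incremental sufficient-statistics update for one Gaussian
# component: n, mean, cov summarize all data seen so far; X is the newly
# arrived chunk. No historical data is needed.
def update_component(n, mean, cov, X):
    m = len(X)
    new_mean = (n * mean + X.sum(0)) / (n + m)
    # old scatter, re-centered around the updated mean
    old_scatter = n * (cov + np.outer(mean - new_mean, mean - new_mean))
    # scatter of the new chunk around the updated mean
    new_scatter = (X - new_mean).T @ (X - new_mean)
    return n + m, new_mean, (old_scatter + new_scatter) / (n + m)
```

The update is exact: the returned statistics equal those that would be computed from the full pooled data, which is the property that lets the stream be clustered with bounded memory.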