Granular Computing (GrC) is an emerging theory that simulates how the human brain understands and solves problems. Rough set theory is a tool for dealing with the uncertainty and vagueness of knowledge models. The SLMGrC algorithm introduces GrC into classical rough set algorithms and keeps the generated rules relatively short, but it cannot process massive data sets. To solve this problem, based on an analysis of the hierarchical granular model of an information table, the Granular Distribution List (GDL) method is introduced to generate granules, and the SLMGrC algorithm is improved accordingly. A Sample Covered Factor (SCF) is also introduced to control rule generation when the algorithm produces conflicting rules. The improved algorithm can process massive data sets directly without affecting the validity of SLMGrC. Experiments demonstrate the validity and flexibility of our method.
Proc. SPIE. 6241, Data Mining, Intrusion Detection, Information Assurance, and Data Networks Security 2006
KEYWORDS: Principal component analysis, Detection and tracking algorithms, Data modeling, Sensors, Error analysis, Data processing, Computer intrusion detection, Reconstruction algorithms, Systems modeling, Network security
Intrusion Detection Systems (IDSs) require large amounts of labeled data during training, which hampers the application and popularity of traditional IDSs. Classical principal component analysis is highly sensitive to outliers in the training data, which leads to poor classification accuracy. This paper proposes a novel scheme based on a robust principal component classifier, which obtains principal components that are not much influenced by outliers. An anomaly detection model is constructed from the distances in the principal component space and the reconstruction error of the training data. Experiments show that the proposed approach can detect unknown intrusions effectively and performs especially well in detection rate and false positive rate.
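As a rough illustration of the idea (not the paper's actual classifier), the sketch below standardizes training data with the median and MAD so outliers influence the components less, then scores a point by its scaled distance within the principal component subspace plus its reconstruction error outside it. The synthetic data, threshold choice, and all names are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))   # "normal" training traffic features
X[:5] += 10.0                   # a few unlabeled outliers contaminate training

# Robust-ish standardization: median and MAD instead of mean and standard
# deviation, so the training outliers pull the components less.
center = np.median(X, axis=0)
scale = np.median(np.abs(X - center), axis=0) + 1e-9
Z = (X - center) / scale

# Principal components of the standardized data.
U, s, Vt = np.linalg.svd(Z, full_matrices=False)
k = 2
P = Vt[:k].T                    # top-k principal directions (5 x k)
lam = (s[:k] ** 2) / len(Z)     # variance along each direction

def score(x):
    """Anomaly score: scaled distance inside the principal component
    subspace plus the reconstruction error outside it."""
    z = (x - center) / scale
    y = z @ P
    in_space = float(np.sum(y ** 2 / lam))
    recon = float(np.sum((z - y @ P.T) ** 2))
    return in_space + recon

# Threshold from the empirical distribution of training scores.
threshold = np.quantile([score(x) for x in X], 0.95)

def is_anomaly(x):
    return score(x) > threshold
```

Combining the in-subspace distance with the residual catches both points that are extreme along the principal directions and points that lie far outside the learned subspace.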
Practical Intrusion Detection Systems (IDSs) based on data mining face two key problems: discovering intrusion knowledge from real-time network data, and automatically updating it when new intrusions appear. Most data mining algorithms work on labeled data. To set up a basic data set for mining, huge volumes of network data must be collected and labeled manually. In practice, labeling intrusions is difficult and often impractical, which has been a major restriction for current IDSs and has limited their ability to identify all kinds of intrusion types. An improved unsupervised clustering-based intrusion model working on unlabeled training data is introduced. In this model, the center of a cluster is defined and used as a representative of that cluster, and all cluster centers are then used to detect intrusions. Tested on the KDD CUP '99 data sets, our method shows good performance in detection rate. Furthermore, an incremental-learning method is adopted to detect unknown-type intrusions and decrease the false positive rate.
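A minimal sketch of the cluster-center idea, assuming plain 2-means with a deterministic far-apart initialization and the common heuristic from clustering-based IDS work that small clusters correspond to intrusions; the synthetic data and function names are illustrative, not the paper's algorithm:

```python
import numpy as np

rng = np.random.default_rng(1)
normal = rng.normal(0, 1, size=(300, 4))   # bulk of traffic: normal connections
attack = rng.normal(6, 1, size=(15, 4))    # small minority: intrusions
X = np.vstack([normal, attack])

def two_means(X, iters=20):
    """Plain 2-means; afterwards each cluster is represented only by its
    center. Deterministic initialization: one center near the data mean,
    the other at the point farthest from it."""
    c0 = X[np.argmin(((X - X.mean(0)) ** 2).sum(-1))]
    c1 = X[np.argmax(((X - c0) ** 2).sum(-1))]
    centers = np.stack([c0, c1])
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centers) ** 2).sum(-1), axis=1)
        centers = np.array([X[labels == j].mean(0) for j in range(2)])
    return centers, labels

centers, labels = two_means(X)
sizes = np.bincount(labels, minlength=2)
normal_cluster = int(np.argmax(sizes))     # heuristic: the big cluster is normal

def is_intrusion(x):
    """Detect by nearest cluster center, as in the model above."""
    return int(np.argmin(((centers - x) ** 2).sum(-1))) != normal_cluster
```

Keeping only the centers makes detection cheap at test time: one distance computation per cluster rather than per training point.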
There are already several extensions of rough set theory for incomplete information systems, such as the tolerance relation, limited tolerance relation, and similarity relation, but corresponding approaches and algorithms for these extensions are lacking. A direct approach for processing incomplete information systems is developed in this paper, covering discretization, attribute reduction, value reduction, and rule matching. This approach can be used with all kinds of extensions of rough set theory for incomplete information systems, and it is effective in both complete and incomplete information systems.
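For instance, the tolerance relation mentioned above treats a missing value as compatible with anything: two objects are tolerant if their known attribute values never disagree. A minimal sketch, with an assumed toy table where '*' marks a missing value:

```python
MISSING = "*"

def tolerant(x, y):
    """Tolerance relation for incomplete tables: two objects are tolerant
    when their known attribute values never disagree ('*' matches anything)."""
    return all(a == b or a == MISSING or b == MISSING for a, b in zip(x, y))

# Toy incomplete information table; '*' marks a missing value.
table = [
    ("high", "*",    "yes"),
    ("high", "blue", "yes"),
    ("low",  "blue", "*"),
]

# Tolerance class of each object: all objects it is tolerant with.
classes = [[j for j, y in enumerate(table) if tolerant(x, y)] for x in table]
```

Unlike the equivalence classes of classical rough sets, tolerance classes may overlap, since tolerance is reflexive and symmetric but not transitive.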
Intrusion detection is an essential component of critical infrastructure protection. Since many current IDSs are constructed by manually encoding expert knowledge, updating their knowledge is time-consuming. To solve this problem, an effective, low-cost, and efficient method for misuse intrusion detection is presented. This paper gives an overview of our research on building a detection model for identifying known intrusions, their variations, and novel attacks of unknown nature. The method is based on rough set theory and is capable of extracting a set of detection rules from network packet features. After a decision table is obtained by preprocessing raw packet data, rough-set-based reduction and rule generation algorithms are applied to derive useful rules for intrusion detection. In addition, a rough set and rule-tree based incremental knowledge acquisition algorithm is presented to update the rule set when new attacks appear. Compared with other methods, ours requires a smaller training data set and less effort to collect training data. Experimental results demonstrate that our system is effective and well suited for online intrusion detection.
One main technical means of fighting spam is to build filters along the email transfer route. However, many junk mail filter designs do not exploit all of the security information in an email, most of which resides in the mail header rather than in the body and attachments. In this paper, rough-set-based data mining is introduced to design a new anti-spam filter. First, by recording and analyzing the header of every collected email sample, we gather the necessary raw data. Next, by selecting and computing features from the header data, we build a decision table with several condition attributes and one decision attribute. Then a rough-set-based data mining technique, consisting mainly of relative reduction and rule generation, is applied to mine this decision table, yielding useful anti-spam knowledge from the email headers. Finally, we tested the resulting rules by judging different mails. The tests demonstrate that, when mining a selected malicious email corpus with a specific spam rate, our anti-spam filter achieves high efficiency and a high identification rate. Mining email headers can also uncover potential security problems of some email systems and the deception methods of spam senders.
Wavelet transforms built via the lifting scheme are called second-generation wavelet transforms. However, in some lifting schemes the coefficients are derived mathematically from first-generation wavelets, so the better-performing filters that can be used in lifting are limited. The spatial structures of lifting schemes are also simple: the classical scheme, predict-update, has two stages, and most researchers simply adopt this structure. In addition, in most design results the lifting filters are not only hard to obtain but also fixed. In our former work, we presented a new three-stage lifting scheme, predict-update-adapt, whose designed filters are no longer fixed. In this paper, we continue to study the spatial model of the lifting scheme. A group of general multi-stage lifting schemes is derived and designed: all lifting filters are designed in the spatial domain with appropriate mathematical methods, and the resulting coefficients are flexible and can be adjusted to different data. We give the mathematical design details in this paper. Finally, all the designed lifting models are applied to image compression, with satisfactory results.
To meet the demand for adaptive wavelet transforms via lifting, a three-stage lifting scheme (predict-update-adapt) is proposed in this paper, extending the common two-stage scheme (predict-update). The second stage is the updating stage, and the third is the adaptive predicting stage. Ours is thus an update-then-predict scheme that can detect jumps in an image from the updated data and needs no additional side information. The first stage is the key to our scheme: it is an interim updating stage whose coefficient can be adjusted to the data to achieve better results. In the adaptive predicting stage, we use symmetric prediction filters in smooth areas of the image and asymmetric prediction filters at the edges of jumps to reduce prediction errors. We design these filters directly with a spatial method. The inherent relationships between the coefficients of the first stage and those of the other stages are found and expressed as equations, so the design result is a class of filters whose coefficients are no longer invariant. Simulation results of image coding with our scheme are good.
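The adaptive three-stage scheme itself is given in the paper; for orientation, the classical two-stage predict-update lifting it extends can be sketched as follows, here with linear (CDF 5/3-style) predict and update filters and simple boundary replication, an illustrative choice rather than the paper's filters. Because each stage only adds a function of the other channel, the transform inverts exactly by undoing the stages in reverse order:

```python
def lifting_forward(x):
    """One level of the two-stage lifting scheme (predict-update) with
    linear filters; len(x) must be even."""
    even, odd = x[0::2], x[1::2]
    # Predict: estimate each odd sample from its even neighbours;
    # the detail d is the prediction error.
    d = [o - (even[i] + even[min(i + 1, len(even) - 1)]) / 2
         for i, o in enumerate(odd)]
    # Update: smooth the even samples with the details so the
    # approximation s preserves the signal's running average.
    s = [e + (d[max(i - 1, 0)] + d[i]) / 4 for i, e in enumerate(even)]
    return s, d

def lifting_inverse(s, d):
    """Invert by undoing the stages in reverse order; lifting is
    perfectly invertible by construction."""
    even = [e - (d[max(i - 1, 0)] + d[i]) / 4 for i, e in enumerate(s)]
    odd = [dd + (even[i] + even[min(i + 1, len(even) - 1)]) / 2
           for i, dd in enumerate(d)]
    out = []
    for e, o in zip(even, odd):
        out += [e, o]
    return out
```

An adaptive scheme swaps the fixed predict filter for one chosen per sample (e.g. symmetric in smooth regions, asymmetric at jumps); invertibility is preserved as long as the inverse makes the same per-sample choice.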
In this paper, we propose an approach that generates logical rules from an information system, based on Pawlak's rough set theory. There are two steps in our rule generation approach. First, attribute reduction is performed on the information table according to Skowron's discernibility matrix and logic function simplification, extracting the important and valuable attributes. Then, value reduction is performed and the corresponding logical rules are generated. All reducts of an information system, including the minimal reduct, can be obtained through these two reductions. Our approach can generate both the maximal generalized decision rules and potentially interesting and useful rules, according to requirements.
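On a toy decision table, the two reductions can be illustrated directly; the brute-force subset search below stands in for the discernibility-function simplification described above, and the table and all names are assumptions for illustration only:

```python
from itertools import combinations

# Toy decision table: condition attributes 0..2, last column is the decision.
table = [
    ("sunny", "hot",  "high",   "no"),
    ("sunny", "hot",  "normal", "no"),
    ("rainy", "hot",  "high",   "yes"),
    ("rainy", "mild", "normal", "yes"),
]
n_cond = 3

def discerns(attrs, a, b):
    """True if some attribute in attrs distinguishes objects a and b."""
    return any(a[i] != b[i] for i in attrs)

def is_consistent(attrs):
    """attrs suffices if it discerns every pair with different decisions."""
    return all(discerns(attrs, a, b)
               for a, b in combinations(table, 2) if a[-1] != b[-1])

# Attribute reduction: a smallest consistent attribute subset (a reduct).
minimal = next(set(c) for r in range(1, n_cond + 1)
               for c in combinations(range(n_cond), r)
               if is_consistent(set(c)))

# Value reduction on the reduct (a single attribute here) yields the rules.
attr = min(minimal)
rules = {row[attr]: row[-1] for row in table}
```

Here the first attribute alone discerns every pair of objects with different decisions, so the reduct is a single column and each of its values maps directly to a decision rule.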