Machine learning classifiers can be used to discover the patterns hidden within large data sets, and one of the largest data sets is the information passing through a network every day. Many information technology applications have been proposed and used to classify network traffic [1, 2, 3, 4, 5]. Intrusion detection systems (IDS) monitor system or network events to detect violations of, or threats to, computer security policies, acceptable use policies, or standard security practices, and are one of the most significant countermeasures [7, 8, 9, 10] against security threats. Intrusions can be found using signature-based detection of known threats or anomaly-based detection. Signature-based detectors look for specific log entries or a specific payload in a data packet known to be indicative of misuse.
The IDS monitors the network traffic from a system or through a network and looks for any abnormal behavior in the network activity that indicates possible unwanted or malicious traffic, taking appropriate action when such a situation occurs. To analyze the data, the IDS uses signature detection for specific known threats or anomaly detection for unknown threats. However, many unknown threats are merely updated versions of known threats. Since machine learning techniques can determine whether new threats are similar to known threats, there is the potential to combine anomaly detection with approximate signature detection. One of the most significant aspects of an IDS is the use of artificial intelligence to train the IDS about possible threats. The IDS can gather information about the various traffic patterns, and rules can be formed from these patterns to distinguish between normal and anomalous traffic in the network. Machine learning techniques can generalize from limited, noisy, and incomplete data to broader categories of new data. This generalization capability provides the potential to recognize patterns that are similar to known patterns but do not match them exactly. The IDS should ideally recognize not only previously observed attacks but also future attacks that have not yet been seen.
Some significant contributions to IDS have been made using fuzzy logic. Fuzzy inference combined with artificial neural networks was used for real-time traffic analysis by building a signature pattern database using protocol analysis and neuro-fuzzy learning techniques. Fuzzy rule-based classifiers for IDS were modeled. A fuzzy intrusion recognition engine (FIRE) used fuzzy logic and data mining techniques to produce fuzzy sets based on the input traffic data to detect security threats. Association-based classification of normal and anomalous attacks was performed on the basis of a compatibility threshold. Association rules, along with data mining techniques and classification, were used on suspicious events in real time. Fuzzy rules gave the best detection rate when compared to linear genetic programming, decision trees, and support vector machines on the DARPA 1998 dataset. Fuzzy logic with an expert system achieved a detection rate better than 91.5% over all attack types, with reduced complexity compared with traditional fuzzy number ranking techniques. Fuzzy adaptive resonance theory has also been used to implement network IDS, as have fuzzy rules [21, 22].
A lot of work has been done on IDS using genetic algorithms. Genetic algorithms using both temporal and spatial information of the network connection during the encoding phase were used to identify anomalous network behaviors. Genetic algorithms were used to find the best possible fuzzy function and select the most significant network features. Genetic programming was used to derive classification rules with traffic data on the network. Multiple agent technology with genetic programming was used to detect anomalies in the network. A combination of information theory to filter the traffic data with genetic algorithms was used to detect anomalous behaviors in the network with reduced complexity.
Artificial neural networks are a popular machine learning technique and have been applied to IDS. A hybrid neural network was proposed using a combination of a Self-Organizing Map (SOM) and a Resilient Back-Propagation Neural Network (BPNN). Another hybrid system using a BPNN and a C4.5 decision tree was built, which showed that certain network attack types could not be detected without a hybrid system. A multi-layer artificial neural network was used to classify network activity. A multi-classification IDS was built that showed a higher detection rate in each classification category than when only a single class was used to classify all non-normal data.
A system that can detect network intrusion while an attack is occurring is called a real-time detection system. There are very few real-time network IDS approaches. A real-time IDS using Self-Organizing Maps (SOM) to detect normal network activity and differentiate it from a DoS attack was proposed. A Bayesian classification model for anomaly detection was also built. A real-time IDS was built using two unsupervised neural network algorithms with a detection rate over 97%, separating normal traffic data from network attacks. A real-time network IDS using fuzzy association rules could separate the normal network activity from network attacks. A high-speed intrusion detection model using TCP/IP header information was built to detect denial of service (DoS) attacks.
One of the most widely used and well-known IDS is SNORT, which has become a standard in IDS. SNORT is a commercial tool that does not use machine learning, basing its detection on regular expressions that match known signatures of network attacks. Its attack signature rules are available only to its registered customers. The signature rules or patches have to be frequently updated and installed in order to detect current attack types or variations of known attack types.
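The signature matching described above can be illustrated with a minimal sketch. The two patterns, their names, and the function match_signatures are hypothetical and chosen only for this example; they are not SNORT rules and do not use SNORT rule syntax.

```python
import re

# Hypothetical signatures for illustration only; these are not real SNORT rules.
SIGNATURES = {
    "directory-traversal": re.compile(rb"\.\./\.\./"),
    "shell-invocation": re.compile(rb"/bin/sh"),
}

def match_signatures(payload: bytes) -> list:
    """Return the names of all signatures whose pattern occurs in the payload."""
    return [name for name, pattern in SIGNATURES.items() if pattern.search(payload)]
```

In this sketch a payload such as b"GET /../../etc/passwd" matches the first pattern, while an encoded variant such as b"GET /..%2f..%2fetc/passwd" matches nothing. This is the kind of near-miss that motivates frequent signature updates and the generalization offered by a learned classifier.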
Although some researchers are investigating real-time IDS with machine learning techniques, most of the work is based on accurate learning without good real-time performance measures and without good generalization capabilities. This paper reports its processing speed and results with SNORT as a standard.
Many researchers have proposed IDS classification algorithms based on machine learning techniques, but they have used older datasets from DARPA and others to evaluate their approaches. The dataset typically used is a network packet dataset consisting of normal network activity as well as many network attack types. It is based on the DARPA98 dataset from MIT Lincoln Laboratory, which provides answer classes (labeled data) for the evaluation of intrusion detection. This dataset was created in 1998 and lacks many current attack types. This paper uses current signatures from an IDS as an oracle for machine learning to form a new, faster IDS with the generalization capabilities of machine learning built in. This avoids the work of manually labeling a dataset and provides more current signature information, but the quality of the initial IDS information determines the baseline for the new artificial intelligence based IDS.
Network packets are small collections of text. An n-gram can be used to break the text into series of letters of a specified length for classification. This proven text processing technique is applied to the network packets. However, unlike a word or a sentence, an entire network packet maps into a large space of possible n-gram sequences. This places the packet in a very high-dimensional space where machine learning can be challenging. The high-dimensional space can be hashed into a lower-dimensional space without losing the ability to directly match the same packet [44, 45]. However, the hash is not a unique identifier, and other similar packets may have the same hash. In the high-dimensional space the network packet was unique, but that uniqueness is lost when hashing down to a lower-dimensional space. This approximation makes the system run faster and improves the classification, but it can produce a large number of false positives if the dimension of the hash is too small. For the networking data set that was chosen, a hash length of more than eight bits was sufficient to minimize the false positives and provide nearly perfect true positives at 99.9% using a linear classifier. However, the runtime depends on the hash length: it stays nearly constant for hashes as long as eleven bits but increases exponentially after that. The runtime of the learned system was always faster, and often much faster, than SNORT.
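The hashing of packet n-grams into a lower-dimensional space can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in this work: the n-gram length of 3, the use of CRC32 as the hash function, and the function name hashed_ngram_features are all choices made for the example. Only the idea of a hash length measured in bits comes from the text.

```python
import zlib

def hashed_ngram_features(payload: bytes, n: int = 3, hash_bits: int = 10) -> list:
    """Count the byte n-grams of a packet in a table of 2**hash_bits buckets.

    The n-gram length and the CRC32 hash are assumptions made for this sketch.
    """
    buckets = [0] * (1 << hash_bits)
    mask = (1 << hash_bits) - 1  # keep only the low hash_bits bits of the hash
    for i in range(len(payload) - n + 1):
        gram = payload[i:i + n]
        buckets[zlib.crc32(gram) & mask] += 1
    return buckets
```

The same packet always yields the same feature vector, so exact matches are preserved. Different n-grams can share a bucket, however, and with a small hash_bits value many of them must, which is the source of the false positives discussed above. The resulting fixed-length vector could then be passed to a linear classifier.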
There are multiple advantages to a machine-learning based system over a signature-based system. A signature-based system needs to store attack signatures and download new signatures when they are updated, while a machine-learning system merely updates the weights of its classifier. A signature system can be difficult to parallelize with a shared signature database, while a machine-learned system can run multiple instances because it is lightweight. The speed of the machine-learned system was better, and that advantage only increases as the signature database to be searched grows. The machine-learned system did have slightly more false positives and did not give detailed information about the true positives, so a version of SNORT should be run on the output from the machine-learned system for labeling.
The primary advantage of a machine learning system over a signature system is the ability to generalize to new but similar data. This was the dream of machine learning with an IDS: that the IDS should ideally recognize not only previously observed attacks but also future attacks that have not yet been seen. There are some systems that can generalize their detection well from learned attack patterns to new attack patterns, especially on probing attacks. This system also has some ability to generalize to patterns not seen in the training data, which was observed anecdotally in this project. Five-fold cross-validation was done for the linear classifier, and attacks that appeared in only one testing fold were still recognized. The detection of new attacks was not the primary goal of this project and was simply a welcome result of using machine learning techniques. Nevertheless, this project does present a path for analyzing the ability of machine learning to detect new attack types, since it uses SNORT as the oracle for labeling the data. By using an earlier version of SNORT as the oracle for training and a current version as the oracle for testing, the ability to detect new signatures can be analyzed across a large span of time. A more interesting dataset may be necessary for this analysis, as well as improved categorization of the detections.
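The proposed evaluation with two oracle versions can be sketched as follows. This is only an illustration of the measurement, assuming that each oracle and the trained classifier can be treated as a function returning True for a packet it flags; the function name new_signature_recall and all of its parameters are hypothetical.

```python
def new_signature_recall(packets, old_oracle, new_oracle, classifier):
    """Fraction of packets flagged only by the newer oracle that the classifier also flags.

    old_oracle, new_oracle, and classifier are assumed to be callables that take
    a packet and return True when they flag it. Returns None when the newer
    oracle flags nothing that the older oracle missed.
    """
    novel = [p for p in packets if new_oracle(p) and not old_oracle(p)]
    if not novel:
        return None
    detected = sum(1 for p in novel if classifier(p))
    return detected / len(novel)
```

A classifier trained only on labels from the older oracle would be passed as classifier, so the returned fraction estimates how well it generalizes to signatures added between the two versions.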
NORMALCY CLASSIFIER AND A HYBRID INTRUSION DETECTION SYSTEM
This project aimed at prototyping a faster, less resource-intensive version of SNORT. The current network IDS setup is shown in Figure 1. The front-end, which can replicate the detections of a SNORT oracle, is in place, with a slightly higher false positive rate. However, the labeling and analysis of detections is not implemented, so a version of SNORT should be run on the detections for labeling and analysis of the suspicious network traffic.
A hybrid system will outperform standalone SNORT in speed, because the high percentage of network traffic that is classified as normal is not sent to the labeler, but the cost will be a marginal loss in accuracy. A hybrid system will also be more scalable, since additional normalcy classifiers can be run with significantly less overhead. The resulting system will produce the same level of labeling quality as SNORT, since the abnormal traffic would be routed through SNORT. The level of false alarms would not rise, since SNORT would be run on the abnormal output from the normalcy classifier. The hybrid system would run faster, scale more easily, and use far fewer resources than a series of SNORT instances. The cost is a slightly increased false negative rate. However, the abnormal output from the normalcy classifier may contain information about a new or unrecognized attack pattern. This output can be sent to an analyst or to an anomaly classifier.
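The routing performed by the hybrid system can be sketched as follows. This is a minimal illustration under assumptions, not the prototype itself: is_normal stands in for the learned normalcy classifier, label_packet stands in for a SNORT instance used as the labeler, and the decision to collect flagged but unlabeled packets separately is one reading of the last two sentences above.

```python
from typing import Callable, Iterable, List, Tuple

def hybrid_ids(
    packets: Iterable[bytes],
    is_normal: Callable[[bytes], bool],
    label_packet: Callable[[bytes], List[str]],
) -> Tuple[int, List[Tuple[bytes, List[str]]], List[bytes]]:
    """Route packets through a fast normalcy classifier, then a slower labeler.

    Returns the number of packets passed as normal, the labeled alerts, and the
    packets that were flagged as abnormal but received no label.
    """
    passed = 0
    alerts = []
    unlabeled = []
    for packet in packets:
        if is_normal(packet):
            passed += 1  # the bulk of the traffic never reaches the labeler
            continue
        labels = label_packet(packet)
        if labels:
            alerts.append((packet, labels))
        else:
            unlabeled.append(packet)  # candidate for an analyst or anomaly classifier
    return passed, alerts, unlabeled
```

Because only the labeler raises alerts, a false positive from the normalcy classifier costs one extra labeler call rather than a false alarm. A false negative from the normalcy classifier is never seen by the labeler, which is the accuracy cost noted above.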
Updates are one large advantage of a hybrid system over developing a brand-new system. Since the classifier is trained on SNORT outputs, a new signature inserted into SNORT can trigger retraining of the classifier and redistribution of the training weights. Additionally, the guarantee of no increase in false alarms as well as the consistent labeling when running the output through SNORT provide additional incentives for maintaining a hybrid system over building from scratch.
We report on a machine learning classifier that can be used to discover the patterns hidden within large networking data flows. It utilizes SNORT as an oracle to learn a faster, less resource intensive normalcy classifier as a front-end to a hybrid network intrusion detection system. This system has the capability to recognize new attacks that are similar to known attack signatures. It is also more highly scalable and distributable than SNORT. The new hybrid design also allows distributed updates and retraining of the normalcy classifier to stay up-to-date with current threats.
Chen, R. C., Cheng, K. F., & Hsieh, C. F. (2010). Using rough set and support vector machine for network intrusion detection. arXiv preprint arXiv:1004.0567.
Scarfone, K., & Mell, P. (2007). Guide to intrusion detection and prevention systems (idps). NIST special publication, 800(2007), 94.
Yao, J. T., Zhao, S. L., & Saxton, L. V. (2005, March). A study on fuzzy intrusion detection. In Defense and Security (pp. 23–30). International Society for Optics and Photonics.
Bace, R. G. (2000). Intrusion detection. Sams Publishing.
Bobor, V. (2006). Efficient Intrusion Detection System Architecture Based on Neural Networks and Genetic Algorithms. Department of Computer and Systems Sciences, Stockholm University/Royal Institute of Technology, KTH/DSV.
Chavan, S., Shah, K., Dave, N., Mukherjee, S., Abraham, A., & Sanyal, S. (2004, April). Adaptive neuro-fuzzy intrusion detection systems. In Information Technology: Coding and Computing, 2004. Proceedings. ITCC 2004. International Conference on (Vol. 1, pp. 70–74). IEEE.
Dickerson, J. E., & Dickerson, J. A. (2000). Fuzzy network profiling for intrusion detection. In Fuzzy Information Processing Society, 2000. NAFIPS. 19th International Conference of the North American (pp. 301–306). IEEE.
Abraham, A., & Jain, R. (2005). Soft computing models for network intrusion detection systems. In Classification and clustering for knowledge discovery (pp. 191–207). Springer Berlin Heidelberg.
Li, W. (2004). Using genetic algorithm for network intrusion detection. Proceedings of the United States Department of Energy Cyber Security Group, 1–8.
Bridges, S. M., & Vaughn, R. B. (2000, October). Fuzzy data mining and genetic algorithms applied to intrusion detection. In Proceedings twenty third National Information Security Conference.
Crosbie, M., & Spafford, G. (1995, November). Applying genetic programming to intrusion detection. In Working Notes for the AAAI Symposium on Genetic Programming (pp. 1–8). MIT, Cambridge, MA, USA: AAAI.
Xia, T., Qu, G., Hariri, S., & Yousif, M. (2005, April). An efficient network intrusion detection method based on information theory and genetic algorithm. In Performance, Computing, and Communications Conference, 2005. IPCCC 2005. 24th IEEE International (pp. 11–17). IEEE.
Jirapummin, C., Wattanapongsakorn, N., & Kanthamanon, P. (2002, July). Hybrid neural networks for intrusion detection system. In Proceedings of International Conference on Circuits, Computers and Communications (pp. 928–931).
Pan, Z. S., Chen, S. C., Hu, G. B., & Zhang, D. Q. (2003, November). Hybrid neural network and C4.5 for misuse detection. In Machine Learning and Cybernetics, 2003 International Conference on (Vol. 4, pp. 2463–2467). IEEE.
Moradi, M., & Zulkernine, M. (2004, November). A neural network based system for intrusion detection and classification of attacks. In Proceedings of the 2004 IEEE international conference on advances in intelligent systems-theory and applications.
Ngamwitthayanon, N., Wattanapongsakorn, N., Charnsripinyo, C., & Coit, D. W. (2008). Multi-stage network-based intrusion detection system using back propagation neural networks. In Asian International Workshop on Advanced Reliability Modeling (AIWARM), Taiwan (pp. 609–619).
Pukkawanna, S., Visoottiviseth, V., & Pongpaibool, P. (2007, November). Lightweight detection of DoS attacks. In Networks, 2007. ICON 2007. 15th IEEE International Conference on (pp. 77–82). IEEE.
Lee, J. H., Lee, J. H., Sohn, S. G., Ryu, J. H., & Chung, T. M. (2008, February). Effective value of decision tree with KDD 99 intrusion detection datasets for intrusion detection system. In Advanced Communication Technology, 2008. ICACT 2008. 10th International Conference on (Vol. 2, pp. 1170–1175). IEEE.
Lee, W., Stolfo, S. J., & Mok, K. W. (1999, August). Mining in a data-flow environment: Experience in network intrusion detection. In Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining (pp. 114–124). ACM.
Labib, K., & Vemuri, R. (2002). NSOM: A real-time network-based intrusion detection system using self-organizing maps. Networks and Security, 1–6.
Puttini, R. S., Marrakchi, Z., & Me, L. (2003, March). A Bayesian classification model for real-time intrusion detection. In AIP Conference Proceedings (pp. 150–162).
Chakrabarti, S., Chakraborty, M., & Mukhopadhyay, I. (2010, February). Study of snort-based IDS. In Proceedings of the International Conference and Workshop on Emerging Trends in Technology (pp. 43–47). ACM.
Brown, P. F., Desouza, P. V., Mercer, R. L., Pietra, V. J. D., & Lai, J. C. (1992). Class-based n-gram models of natural language. Computational linguistics, 18(4), 467–479.
Shi, Q., Petterson, J., Dror, G., Langford, J., Strehl, A. L., Smola, A. J., & Vishwanathan, S. V. N. (2009). Hash kernels. In International Conference on Artificial Intelligence and Statistics (pp. 496–503).
Shi, Q., Petterson, J., Dror, G., Langford, J., Smola, A., & Vishwanathan, S. V. N. (2009). Hash kernels for structured data. The Journal of Machine Learning Research, 10, 2615–2637.
Hwang, T. S., Lee, T. J., & Lee, Y. J. (2007, June). A three-tier IDS via data mining approach. In Proceedings of the 3rd annual ACM workshop on Mining network data (pp. 1–6). ACM.