Applied Anomaly Based IDS Craig Buchanan University of Illinois at Urbana-Champaign CS 598 MCC 4/30/13
Outline • K-Nearest Neighbor • Neural Networks • Support Vector Machines • Lightweight Network Intrusion Detection (LNID)
K-Nearest Neighbor • “Use of K-Nearest Neighbor classifier for intrusion detection” [Liao and Vemuri, Computers & Security 2002]
K-nearest neighbor on text • Categorize training documents into vector space model, A • Word-by-document matrix A • Rows = words • Columns = documents • Represents weight of each word in set of documents • Build vector for test document, X • Classify X into A using K-nearest neighbor
Text categorization • Create vector space model A • a_ij – weight of word i in document j • Useful variables • N – number of documents in the collection • M – number of distinct words in the collection • f_ij – frequency of word i in document j • n_i – total number of times word i occurs in the collection
Text categorization • Frequency weighting: a_ij = f_ij • Term frequency – inverse document frequency (tf*idf) weighting: scale f_ij by how rare word i is across the collection
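A minimal sketch of the two weighting schemes, using f_ij and N as defined above; the n_i here (number of documents containing word i) follows the common tf*idf form and is an assumption, since the paper's exact variant is not shown on the slide:

```python
import math

def frequency_weight(f_ij):
    """Frequency weighting: a_ij is simply the raw count f_ij."""
    return float(f_ij)

def tfidf_weight(f_ij, N, n_i):
    """tf*idf weighting: term frequency scaled by inverse document frequency.

    f_ij : frequency of word i in document j
    N    : number of documents in the collection
    n_i  : number of documents containing word i (assumed definition)
    """
    if f_ij == 0 or n_i == 0:
        return 0.0
    return f_ij * math.log(N / n_i)
```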
Text categorization • System call = “word” • Program execution = “document” • Close, execve, open, mmap, open, mmap, munmap, mmap, mmap, close, …, exit
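To make the system-call-as-word analogy concrete, a short sketch of turning one program execution (its system-call trace) into a column of the word-by-document matrix; the trace is the example from the slide:

```python
from collections import Counter

# One program execution = one "document"; each system call = one "word".
trace = ["close", "execve", "open", "mmap", "open", "mmap",
         "munmap", "mmap", "mmap", "close", "exit"]

counts = Counter(trace)                    # f_ij for this execution
vocab = sorted(set(trace))                 # distinct "words" seen in this trace
column = [counts[call] for call in vocab]  # one column of the matrix A
print(dict(zip(vocab, column)))
```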
Document Classification • Similarity between the test document and each training document measured with cosine similarity: sim(X, Dj) = Σ xi · dij / (||X|| · ||Dj||), summed over words ti shared by X and Dj • X – test document • Dj – jth training document • ti – word shared by X and Dj • xi – weight of word ti in X • dij – weight of word ti in Dj
Anomaly detection • If X has unknown system call then abnormal • If X is the same as any Dj then normal • K-nearest neighbor • Calculate sim_avg for k-nearest neighbors • If sim_avg > threshold then normal • Else abnormal
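A sketch of the classification rule on this and the previous slide, assuming cosine similarity; the values of k and the threshold are illustrative, not the paper's:

```python
import numpy as np

def similarity(x, d):
    """Cosine similarity between test vector x and training vector d."""
    denom = np.linalg.norm(x) * np.linalg.norm(d)
    return float(np.dot(x, d)) / denom if denom else 0.0

def classify(x, training_vectors, known_calls, trace_calls, k=5, threshold=0.8):
    """Label a new program execution as 'normal' or 'abnormal'.

    x                : weight vector of the new execution
    training_vectors : weight vectors of normal executions (columns of A)
    known_calls      : set of system calls seen during training
    trace_calls      : system calls in the new trace
    """
    if not set(trace_calls) <= known_calls:      # unknown system call -> abnormal
        return "abnormal"
    sims = sorted((similarity(x, d) for d in training_vectors), reverse=True)
    if sims and np.isclose(sims[0], 1.0):        # same as some training document
        return "normal"
    sim_avg = sum(sims[:k]) / k                  # average over k nearest neighbors
    return "normal" if sim_avg > threshold else "abnormal"
```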
Neural Networks • Intrusion Detection with Neural Networks [Ryan, AAAI Technical Report 1997] • Learn user profiles (“prints”) to detect intrusion
NNID System • Collect training data • Audit logs from each user • Train the neural network • Obtain new command distribution vector • Compare to training data • Anomaly if: • Associated with a different user • Not clearly associated with any user
Collect training data • Type of data • as, awk, bc, bibtex, calendar, cat, chmod, comsat, cp, cpp, cut, cvs, date, df, diff, du, dvips, egrep, elm, emacs, …, w, wc, whereis, xbiff++, xcalc, xdvi, xhost, xterm • Type of platform • Audit trail logging • Small number of users • Not a large target
Train Neural Network • Map frequency of command use to a nonlinear scale • 0.0 to 1.0 in 0.1 increments • 0.0 – never used • 0.1 – used once or twice • 1.0 – used more than 500 times • Concatenate values into a 100-dimensional command distribution vector
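A sketch of the frequency-to-scale mapping; the slide only gives the endpoints (0.0 = never, 0.1 = once or twice, 1.0 = more than 500 uses), so the intermediate cut points below are assumptions:

```python
def usage_level(count):
    """Map how often a command was used to the 0.0-1.0 scale."""
    # Only 0, 2 and 500 come from the slide; the other cut points are illustrative.
    cut_points = [0, 2, 5, 10, 25, 50, 100, 200, 350, 500]
    for level, upper in enumerate(cut_points):
        if count <= upper:
            return level / 10.0
    return 1.0   # used more than 500 times

# 100-dimensional command distribution vector for one user:
# vector = [usage_level(counts.get(cmd, 0)) for cmd in COMMANDS]  # len(COMMANDS) == 100
```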
Neural Network • 3-layer backpropagation architecture: 100 input units → 30 hidden units → 10 output units (one per user)
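A sketch of the 100-30-10 network using scikit-learn's MLP; the library, activation, and training settings are assumptions, since the slide only specifies the layer sizes and backpropagation:

```python
from sklearn.neural_network import MLPClassifier

# 100 inputs (command distribution vector) -> 30 hidden units -> 10 outputs (users).
net = MLPClassifier(hidden_layer_sizes=(30,), activation="logistic",
                    solver="sgd", max_iter=2000, random_state=0)

# X: (n_samples, 100) command distribution vectors, y: user ids 0..9
# net.fit(X, y)
# probs = net.predict_proba(x_new.reshape(1, -1))[0]
# Flag an anomaly if the most likely user is not the logged-in user,
# or if no user is a clear match (max(probs) below a threshold).
```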
Results • Rejected 63% of randomly generated command vectors • Anomaly detection rate: 96% • Correctly identified the user: 93% • False alarm rate: 7%
Support Vector Machines • Intrusion Detection Using Neural Networks and Support Vector Machines [Mukkamala, IEEE 2002]
SVM IDS • Preprocess randomly selected raw TCP/IP traffic • Train SVM • 41 input features • +1 – normal • -1 – attack • Classify new traffic as normal or anomalous
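A sketch of the train-and-classify step using scikit-learn; the kernel, scaling, and library are assumptions, since the slide specifies only the 41 features and the +1/-1 labels:

```python
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# X_train: (n_samples, 41) features extracted from raw TCP/IP records
# y_train: +1 for normal connections, -1 for attacks
model = make_pipeline(StandardScaler(), SVC(kernel="rbf"))  # kernel is an assumption

# model.fit(X_train, y_train)
# labels = model.predict(X_test)   # +1 -> normal, -1 -> anomaly
```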
Recent Anomaly-based IDS • An efficient network intrusion detection [Chen, Computer Communications 2010] • Lightweight Network Intrusion Detection (LNID) system
LNID Approach • Detect R2L (remote-to-local) and U2R (user-to-root) attacks • Assume the attack appears in the first few packets of a connection • Calculate an anomaly score for those packets
Anomaly Score • Based on Mahoney’s network IDS [21-24] • M.V. Mahoney, P.K. Chan, PHAD: packet header anomaly detection for identifying hostile network traffic, Florida Institute of Technology Technical Report CS-2001-04, 2001. • M.V. Mahoney, P.K. Chan, Learning nonstationary models of normal network traffic for detecting novel attacks, in: Proceedings of the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2002, pp. 376-385. • M.V. Mahoney, P.K. Chan, Learning models of network traffic for detecting novel attacks, Florida Institute of Technology Technical Report CS-2002-08, 2002. • M.V. Mahoney, Network traffic anomaly detection based on packet bytes, in: Proceedings of the 2003 ACM Symposium on Applied Computing, 2003, pp. 346-350.
Anomaly Score (Mahoney) • score = t · n / r • t – time elapsed since the last time the attribute was anomalous • n – number of training or observed instances • r – number of novel values of the attribute
Anomaly Score (revised) • score = n / r (the time factor t is dropped) • n – number of training or observed instances • r – number of novel values of the attribute
Attributes • Attribute = packet byte • 256 possible values • 48 attributes (packet bytes) • 20 bytes of IP header • 20 bytes of TCP header • 8 bytes of payload
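A sketch of scoring one packet with the revised score over the 48 byte attributes; the state layout (per-attribute counts n and r, and the sets of values seen in training) is an illustrative assumption:

```python
def packet_attributes(packet):
    """The 48 attributes: 20 IP header + 20 TCP header + 8 payload bytes."""
    return list(packet[:48])            # each attribute takes one of 256 values

def anomaly_score(packet, n, r, seen):
    """Sum n_i / r_i over attributes whose value never appeared in training.

    n[i]    : number of training/observed instances of attribute i
    r[i]    : number of novel values attribute i took during training
    seen[i] : set of values observed for attribute i during training
    """
    score = 0.0
    for i, value in enumerate(packet_attributes(packet)):
        if value not in seen[i] and r[i] > 0:
            # Mahoney's original score also multiplies by t_i (time since the
            # attribute was last anomalous); the revised score drops that factor.
            score += n[i] / r[i]
    return score
```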
Results • Detection rate • Workload (fraction of traffic that must be examined) • LNID – 0.3% of traffic • NETAD – 3.16% of traffic • Lee et al. – 100% of traffic
Results • Hard-to-detect attacks