A Hybrid Anomaly Detection Model using G-LDA
Bhavesh Kasliwal, Shraey Bhatia, Shubham Saini, I. Sumaiya Thaseen, Ch. Aswani Kumar. VIT University – Chennai
Typical IDS
This work focuses mainly on intrusion identification.
Attribute Selection
“With more data, the simpler solution can be more accurate than the sophisticated solution.”
• Selection is based on the means and modes of the numeric attributes
• An attribute is selected when its mode values for anomaly and normal patterns contrast clearly, with the corresponding means inclined towards those modes
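The mean/mode criterion above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' exact rule. Each packet is assumed to be a dict of numeric attributes. An attribute is kept when its anomaly and normal modes differ and, in both classes, the mean lies within a hypothetical tolerance `tol` of the mode, measured as a fraction of the attribute's observed range. The names `select_attributes` and `mode_of`, the tolerance, and the range scaling are all illustrative assumptions.

```python
from collections import Counter
from statistics import mean


def mode_of(values):
    """Most frequent value; ties are broken by the smaller value for determinism."""
    counts = Counter(values)
    best = max(counts.values())
    return min(v for v, c in counts.items() if c == best)


def select_attributes(anomaly, normal, attributes, tol=0.1):
    """Keep attributes whose anomaly/normal modes differ and whose means lean towards the modes.

    anomaly, normal: lists of dicts mapping attribute name -> numeric value.
    tol: hypothetical tolerance, as a fraction of the attribute's observed range.
    """
    selected = []
    for attr in attributes:
        a_vals = [p[attr] for p in anomaly]
        n_vals = [p[attr] for p in normal]
        a_mode, n_mode = mode_of(a_vals), mode_of(n_vals)
        if a_mode == n_mode:
            continue  # no contrast between the two classes
        span = (max(a_vals + n_vals) - min(a_vals + n_vals)) or 1
        # "Means inclined towards the modes": mean close to the mode in both classes
        a_close = abs(mean(a_vals) - a_mode) <= tol * span
        n_close = abs(mean(n_vals) - n_mode) <= tol * span
        if a_close and n_close:
            selected.append(attr)
    return selected
```

An attribute whose anomaly and normal modes coincide is discarded immediately. One whose modes differ but whose means drift far from them, as with a bimodal spread, is also rejected.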
Selected Attributes
The trends of a selected attribute and a discarded attribute show a strong, clearly visible contrast.
Training Set Selection (using LDA)
• Latent Dirichlet Allocation (LDA) is a generative model that allows sets of observations to be explained by unobserved groups that account for why some parts of the data are similar.
• LDA is applied separately to anomaly and normal packets to obtain 200 sets of 10 packets each; each set is dominated by a particular packet type.
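The training-set selection step can be sketched with a small collapsed Gibbs sampler for LDA. This is a sketch only: the slides do not say how packets were tokenized, so the encoding here is an assumption. Each packet is treated as one document whose words are hypothetical `name=value` tokens. After sampling, each topic's set is the `set_size` packets with the highest proportion of that topic. The function names `tokenize`, `lda_gibbs` and `packet_sets`, and the hyperparameters `alpha` and `beta`, are illustrative choices.

```python
import random


def tokenize(record, names):
    """Turn one packet (list of field values) into 'name=value' words (assumed encoding)."""
    return [f"{n}={v}" for n, v in zip(names, record)]


def lda_gibbs(docs, num_topics, iterations=100, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA; returns theta[d][k], the topic mix of document d."""
    rng = random.Random(seed)
    K = num_topics
    V = len({w for doc in docs for w in doc})
    n_dk = [[0] * K for _ in docs]  # tokens of topic k in document d
    n_kw = [{} for _ in range(K)]   # count of word w assigned to topic k
    n_k = [0] * K                   # total tokens assigned to topic k
    z = []                          # current topic of every token
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(K)
            z[d].append(k)
            n_dk[d][k] += 1
            n_kw[k][w] = n_kw[k].get(w, 0) + 1
            n_k[k] += 1
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # Remove this token's current assignment from all counts.
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                # Full conditional p(z_i = t | all other assignments), unnormalised.
                weights = [(n_dk[d][t] + alpha) * (n_kw[t].get(w, 0) + beta) / (n_k[t] + V * beta)
                           for t in range(K)]
                k = rng.choices(range(K), weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][w] = n_kw[k].get(w, 0) + 1
                n_k[k] += 1
    return [[(n_dk[d][k] + alpha) / (len(doc) + K * alpha) for k in range(K)]
            for d, doc in enumerate(docs)]


def packet_sets(theta, set_size=10):
    """For each topic, the indices of the set_size packets it dominates most."""
    K = len(theta[0])
    return [sorted(range(len(theta)), key=lambda d: -theta[d][k])[:set_size] for k in range(K)]
```

Under these assumptions, `packet_sets(lda_gibbs(docs, num_topics=200))` would yield 200 sets of 10 packet indices, run once on anomaly packets and once on normal packets. A pure-Python sampler like this is slow at KDD Cup scale, so a dedicated LDA implementation would be the practical choice.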
Sample LDA Output
Topic 0th:
0,icmp,eco_i,SF,18,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,1,0,0,1,1,0,1,0.25,0,0,0,0,anomaly
0,icmp,eco_i,SF,18,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,1,0,0,1,1,0,1,0.25,0,0,0,0,anomaly
0,icmp,eco_i,SF,18,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,1,0,0,1,1,0,1,0.25,0,0,0,0,anomaly
0,icmp,eco_i,SF,18,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,1,0,0,1,1,0,1,0.25,0,0,0,0,anomaly
0,icmp,eco_i,SF,18,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,1,0,0,1,1,0,1,0.25,0,0,0,0,anomaly
0,icmp,eco_i,SF,18,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,1,0,0,1,1,0,1,0.26,0,0,0,0,anomaly
0,icmp,eco_i,SF,18,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,1,0,0,1,1,0,1,0.26,0,0,0,0,anomaly
0,tcp,telnet,S0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,125,13,1,1,0,0,0.1,0.06,0,255,0.03,0.07,0,0,1,1,0,0,anomaly
0,tcp,uucp,S0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,135,9,1,1,0,0,0.07,0.06,0,255,0.04,0.07,0,0,1,1,0,0,anomaly
0,tcp,vmnet,S0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,258,10,1,1,0,0,0.04,0.05,0,255,0.04,0.05,0,0,1,1,0,0,anomaly
Topic 1th:
0,tcp,finger,S0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,14,3,1,1,0,0,0.21,0.29,0,255,0.25,0.02,0.01,0,1,1,0,0,anomaly
0,tcp,finger,S0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,246,20,1,1,0,0,0.08,0.06,0,255,0.08,0.07,0,0,1,1,0,0,anomaly
0,icmp,ecr_i,SF,1032,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,511,511,0,0,0,0,1,0,0,255,0.55,0.01,0.55,0,0,0,0,0,anomaly
0,icmp,ecr_i,SF,1032,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,511,511,0,0,0,0,1,0,0,255,0.56,0.02,0.56,0,0.01,0,0,0,anomaly
0,icmp,ecr_i,SF,1032,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,511,511,0,0,0,0,1,0,0,255,0.6,0.01,0.6,0,0,0,0,0,anomaly
0,icmp,ecr_i,SF,1032,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,511,511,0,0,0,0,1,0,0,255,0.64,0.02,0.64,0,0,0,0.02,0,anomaly
0,icmp,ecr_i,SF,1032,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,511,511,0,0,0,0,1,0,0,255,0.64,0.02,0.64,0,0,0,0,0,anomaly
………………
Genetic Algorithm
• Applied separately to normal and anomaly packets
• A threshold value is used to assign a negative weight
• Run for 3 generations
• The top 3 values for anomaly and normal packets are used
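The slide gives only the GA's parameters: 3 generations, the top 3 values, and a threshold that triggers a negative weight. The sketch below is one hypothetical way to wire those together for a single attribute of one class, and it is not the authors' encoding. Here a chromosome is a set of k distinct observed values. Fitness sums each value's relative frequency, and values rarer than `threshold` contribute a negative weight instead. Selection is truncation to the fitter half, crossover draws k values from the union of two parents, and mutation swaps in an unused value. The function name, population size, threshold, penalty and mutation rate are all assumptions.

```python
import random
from collections import Counter


def ga_top_values(values, k=3, generations=3, pop_size=8,
                  threshold=0.05, penalty=-1.0, mutation_rate=0.3, seed=0):
    """Pick k representative values of one attribute for one class (hypothetical encoding)."""
    rng = random.Random(seed)
    counts = Counter(values)
    total = len(values)
    candidates = sorted(counts)  # distinct observed values
    k = min(k, len(candidates))

    def score(v):
        # Values below the threshold frequency get a negative weight.
        f = counts[v] / total
        return f if f >= threshold else penalty * f

    def fitness(chrom):
        return sum(score(v) for v in chrom)

    def crossover(a, b):
        # Child draws k distinct values from the union of both parents.
        return tuple(sorted(rng.sample(sorted(set(a) | set(b)), k)))

    def mutate(chrom):
        chrom = list(chrom)
        outside = [v for v in candidates if v not in chrom]
        if outside and rng.random() < mutation_rate:
            chrom[rng.randrange(k)] = rng.choice(outside)
        return tuple(sorted(chrom))

    population = [tuple(sorted(rng.sample(candidates, k))) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]  # truncation selection; parents survive
        children = [mutate(crossover(rng.choice(parents), rng.choice(parents)))
                    for _ in range(pop_size - len(parents))]
        population = parents + children
    best = max(population, key=fitness)
    return sorted(best, key=lambda v: -counts[v])  # most frequent value first
```

Running it once per selected attribute on anomaly packets and once on normal packets would give the small value sets consulted at classification time. Because the parents survive each generation, the best fitness never decreases across the 3 generations.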
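Identifying the nature of an incoming packet
• For each selected attribute value F_i in the incoming packet:
  ◦ if F_i ∈ V_i, then S_i = A × (frequency of F_i in anomaly packets) − (frequency of F_i in normal packets)
  ◦ else S_i = 0
• C = Σ S_i
• If C > 0, the packet is classified as Anomaly; otherwise it is Normal
(V_i: the set of selected values for attribute i; A: the additional weight)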
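The decision rule above translates almost line for line into code. In this sketch the dict-based inputs and the name `classify` are illustrative assumptions. The slide does not say whether "frequency" means raw counts or relative frequencies, and the example values used with it are relative frequencies.

```python
def classify(packet, value_sets, freq_anomaly, freq_normal, A=1.0):
    """Score a packet as C = sum of S_i over the selected attributes.

    packet:       dict attribute -> observed value F_i
    value_sets:   dict attribute -> set V_i of selected values
    freq_anomaly: dict attribute -> {value: frequency in anomaly training packets}
    freq_normal:  dict attribute -> {value: frequency in normal training packets}
    A:            additional weight multiplied with the anomaly frequency
    """
    C = 0.0
    for attr, V_i in value_sets.items():
        F_i = packet.get(attr)
        if F_i in V_i:
            S_i = A * freq_anomaly[attr].get(F_i, 0) - freq_normal[attr].get(F_i, 0)
        else:
            S_i = 0.0  # values outside V_i contribute nothing
        C += S_i
    return ("anomaly" if C > 0 else "normal"), C
```

Consider a packet with `service=eco_i` and `flag=SF`, where the anomaly frequencies are 0.4 and 0.3 and the normal frequencies are 0.01 and 0.9. With A = 1, C = 0.39 − 0.6 < 0, so the packet is scored normal. Raising A to 3 tips C above zero and flips it to anomaly.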
Additional Weight (A)
• Multiplied with the anomaly frequency
• Why? Anomalies in general take diverse values, unlike normal packets, whose values lie in a particular range
• Choosing A requires a trade-off between accuracy and the false positive rate
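Because each S_i is linear in A, the packet score is C = A·a − n. Here a is the sum of anomaly frequencies over the packet's matched values and n is the sum of their normal frequencies. So a labelled validation set can be reduced to (a, n) pairs once and then swept over candidate weights cheaply. The sketch below reports accuracy and false positive rate per A. The name `sweep_weight` and the (a, n, is_anomaly) input format are hypothetical.

```python
def sweep_weight(scored, weights):
    """Accuracy and false positive rate for each candidate additional weight A.

    scored: list of (a, n, is_anomaly) per labelled packet, where
            a = sum of anomaly frequencies of the packet's matched values and
            n = sum of normal frequencies of the same values, so C = A * a - n.
    """
    normals = sum(1 for _, _, y in scored if not y)
    results = []
    for A in weights:
        correct = false_pos = 0
        for a, n, y in scored:
            predicted_anomaly = A * a - n > 0
            correct += int(predicted_anomaly == y)
            false_pos += int(predicted_anomaly and not y)
        fpr = false_pos / normals if normals else 0.0
        results.append((A, correct / len(scored), fpr))
    return results
```

On toy data, a larger A catches anomalies that a smaller A misses, but it also starts flagging normal packets. That is the accuracy-versus-FPR trade-off the slide refers to.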
Results
• Tested on 50,000 anomaly and 50,000 normal packets from the KDD Cup ’99 dataset
• 88.5% accuracy with a 6% false positive rate (FPR)
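The two headline numbers together imply a full confusion matrix. This assumes the standard definitions, which the slides do not state: accuracy is the fraction of all 100,000 test packets classified correctly, and FPR is the fraction of normal packets flagged as anomalies. Under those assumptions, 6% of 50,000 normal packets gives 3,000 false positives and 47,000 true negatives. 88.5% of 100,000 gives 88,500 correct decisions, leaving 41,500 true positives, which is a detection rate of about 83% on the anomaly packets. The helper name `implied_counts` is hypothetical.

```python
def implied_counts(n_anomaly, n_normal, accuracy, fpr):
    """Confusion-matrix counts implied by overall accuracy and FPR on a labelled test set."""
    total = n_anomaly + n_normal
    false_pos = round(fpr * n_normal)                 # normal packets flagged as anomalies
    true_neg = n_normal - false_pos
    true_pos = round(accuracy * total) - true_neg     # remaining correct decisions
    false_neg = n_anomaly - true_pos
    return {"TP": true_pos, "FN": false_neg, "FP": false_pos, "TN": true_neg,
            "detection_rate": true_pos / n_anomaly}


counts = implied_counts(50_000, 50_000, 0.885, 0.06)
```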
Future Work
• Focus on specific anomaly types
• Better attribute selection algorithms? oneR, entropy-based, chi-squared, randomForest
• Better classification techniques? Clustering (hierarchical, K-means), decision trees