This presentation proposes a new benchmark data set for evaluating intrusion detection algorithms. The widely used KDD Cup 99 data set has limitations, and the proposed data set is meant to address them and support improvements in detection performance. The authors suggest collecting data with both IDSs and honeypots, sanitizing IP addresses to protect privacy, and comparing IDS alerts with honeypot traffic data.
A Proposal of New Benchmark Data to Evaluate Mining Algorithms for Intrusion Detection Jungsuk SONG†, Hiroki TAKAKURA‡, Yasuo OKABE‡ †Graduate School of Informatics, Kyoto Univ. ‡Academic Center for Computing and Media Studies, Kyoto Univ. oaktree@net.ist.i.kyoto-u.ac.jp, takakura@media.kyoto-u.ac.jp, okabe@i.kyoto-u.ac.jp
Overview • Introduction • Intrusion Detection System • Intrusion Detection Evaluation Data • KDD Cup 99 Data Set • Details • Problems • Our Experimental Results • Our Proposal 23rd Asia Pacific Advanced Network Meeting
Introduction • Intrusion Detection System (IDS) • A combination of software and hardware that attempts to perform intrusion detection • Raises an alarm when a possible intrusion or suspicious pattern is observed • [Figure: Attacker, The Internet, Firewall, IDS, Internal Network]
Introduction • Why do we need an IDS? • Unknown weaknesses or bugs • Complex, unforeseen attacks • Firewalls and security policies • Using the detected information, we can: • Recover compromised systems • Understand attack mechanisms • Detect novel attacks • Defend our systems
Introduction • We need evaluation data for IDSs, for: • Performance improvement • Technical progress • Research guidance… • KDD Cup 99 Data Set • The most commonly used evaluation data, but… • We propose new benchmark data
KDD Cup 99 Data Set • A modification of the DARPA 1998 data set • DARPA 1998 data set • Managed by MIT Lincoln Laboratory (under DARPA sponsorship) • Nine weeks of raw tcpdump data from a simulated network • Attacks • 38 different attacks against Unix/Linux machines • DoS, scans, buffer overflows, and so on • Normal traffic • 1000's of virtual hosts and 100's of user automata
KDD Cup 99 Data Set • Each connection ⇒ 41-dimensional feature vector • Samples:
5,tcp,smtp,SF,959,337,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,1,0.00,0.00,0.00,0.00,1.00,0.00,0.00,144,192,0.70,0.02,0.01,0.01,0.00,0.00,0.00,0.00,normal.
0,tcp,http,SF,54540,8314,0,0,0,2,0,1,1,0,0,0,0,0,0,0,0,0,2,2,0.00,0.00,0.00,0.00,1.00,0.00,0.00,118,118,1.00,0.00,0.01,0.00,0.00,0.00,0.02,0.02,back.
0,tcp,http_443,S0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,114,2,1.00,1.00,0.00,0.00,0.02,0.06,0.00,255,2,0.01,0.07,0.00,0.00,1.00,1.00,0.00,0.00,neptune.
• Numerical: 34, Categorical: 7 • Basic features: "duration", "protocol"… • Statistical features: "number of connections to the same host as the current connection in the past two seconds"… • Label ⇒ "normal" or the name of the attack
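To make the record layout concrete, here is a minimal Python sketch that splits one KDD Cup 99 line into its 41 features and its label. It assumes the public comma-separated format shown in the samples, with the label as the 42nd field ending in a period. The function and constant names are hypothetical. Only the three string-valued fields (protocol_type, service, flag) are kept as text; the four binary categorical fields are 0/1 and are simply parsed as numbers here.

```python
from typing import List, Tuple, Union

N_FEATURES = 41
# Positions of the string-valued fields: protocol_type, service, flag.
SYMBOLIC_INDICES = (1, 2, 3)

Feature = Union[str, float]


def parse_kdd_record(line: str) -> Tuple[List[Feature], str]:
    """Return (features, label) for one comma-separated KDD Cup 99 record."""
    fields = [f.strip() for f in line.strip().split(",")]
    if len(fields) != N_FEATURES + 1:
        raise ValueError(f"expected {N_FEATURES + 1} fields, got {len(fields)}")
    *raw_features, raw_label = fields
    label = raw_label.rstrip(".")  # labels end with a period, e.g. "back."
    features: List[Feature] = []
    for i, value in enumerate(raw_features):
        # Keep symbolic fields as strings; parse all other fields as floats
        # (this includes the 0/1 binary fields such as logged_in).
        features.append(value if i in SYMBOLIC_INDICES else float(value))
    return features, label
```

For example, parsing the second sample above yields 41 features, the label `back`, and `54540.0` as the source-bytes value at index 4.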
KDD Cup 99 Data Set • Problems • Attacks • Cannot reflect current malicious activities • No stealthy scans ⇒ scans occur within short time intervals and do not use multiple IP addresses • No attacks against Windows machines • Protocol types • Only TCP, UDP, and ICMP • Cannot be used to evaluate detection of attacks such as ARP spoofing • Simplicity • Only 3 real victim hosts • 1000's of virtual hosts and 100's of user automata (custom software)
Our Experimental Results • PCA (Principal Component Analysis) • A technique for reducing the dimensionality of a data set • Transforms the data to a new coordinate system • What PCA tells us • The number of dimensions actually required to represent the original data • Cumulative contribution ratio • Indicates what percentage of the variance in the original data is represented by the leading components • For example • 2 dimensions ⇒ 90%: the first two principal components represent 90% of the original data's variance
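The cumulative contribution ratio can be computed from the eigenvalues of the feature covariance matrix. Below is a minimal NumPy sketch of that computation. It is not the authors' code. The slides do not say whether features were standardized first; this sketch works on the raw columns. The function names are hypothetical.

```python
import numpy as np


def cumulative_contribution_ratio(X: np.ndarray) -> np.ndarray:
    """Cumulative share of total variance captured by the top-k principal components.

    X has one row per sample (connection) and one column per numerical feature.
    """
    cov = np.cov(X, rowvar=False)            # feature-by-feature covariance matrix
    eigvals = np.linalg.eigvalsh(cov)[::-1]  # eigvalsh returns ascending order; reverse it
    eigvals = np.clip(eigvals, 0.0, None)    # drop tiny negative values caused by rounding
    return np.cumsum(eigvals) / eigvals.sum()


def components_needed(ratios: np.ndarray, threshold: float) -> int:
    """Smallest number of components whose cumulative ratio reaches the threshold."""
    reached = np.nonzero(ratios >= threshold)[0]
    return int(reached[0]) + 1 if reached.size else len(ratios)


# Toy data: the first axis has much larger variance than the second.
X = np.array([[3.0, 0.0], [-3.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
ratios = cumulative_contribution_ratio(X)
```

In this toy example the covariance matrix is diagonal with variances 6 and 2/3, so the first component alone represents 90% of the variance (ratios ≈ [0.9, 1.0]). By the same reading, "2 dimensions ⇒ 90%" on the slide means the second entry of this array is 0.9.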
Our Experimental Results • There is no guarantee that detection algorithms that perform well on this data set will also perform well in a real environment
Our Proposal • New benchmark data collected with • IDS • Honeypots • Privacy problems • Sanitize IP addresses • Remove payload data • Goal • Comparative analysis of IDS alerts and honeypot traffic data • Detect the attacks that are missed by the IDS • Published in the KDD Cup 99 format • Open • Updated every month
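The slide describes the goals but not the mechanism for the privacy steps and the comparison. The following Python sketch is one possible approach, not the authors' design. The `Session` record, its field names, the HMAC-based pseudonyms, and the (source, destination, port) matching rule are all hypothetical. IP addresses are replaced with keyed HMAC-SHA256 pseudonyms, and payloads are dropped. Honeypot sessions that have no matching IDS alert are then flagged as possible misses.

```python
import hashlib
import hmac
from dataclasses import dataclass, replace
from typing import Iterable, List, Optional, Set, Tuple


@dataclass(frozen=True)
class Session:
    """Hypothetical honeypot session record; field names are illustrative only."""
    src_ip: str
    dst_ip: str
    dst_port: int
    start_time: int                 # seconds since the epoch
    payload: Optional[bytes] = None


def sanitize_ip(ip: str, key: bytes) -> str:
    """Replace an IP address with a consistent keyed pseudonym (HMAC-SHA256)."""
    digest = hmac.new(key, ip.encode("ascii"), hashlib.sha256).hexdigest()
    return "ip-" + digest[:12]


def sanitize_session(s: Session, key: bytes) -> Session:
    """Pseudonymize both addresses and remove the payload."""
    return replace(s, src_ip=sanitize_ip(s.src_ip, key),
                   dst_ip=sanitize_ip(s.dst_ip, key), payload=None)


def missed_by_ids(sessions: Iterable[Session],
                  alerts: Iterable[Tuple[str, str, int]]) -> List[Session]:
    """Honeypot sessions with no IDS alert on the same (src_ip, dst_ip, dst_port).

    Simplification: alert timestamps are ignored.
    """
    alerted: Set[Tuple[str, str, int]] = set(alerts)
    return [s for s in sessions if (s.src_ip, s.dst_ip, s.dst_port) not in alerted]
```

The pseudonyms are deterministic for a given secret key. The comparison therefore still works if both the IDS alerts and the honeypot sessions are sanitized with the same key before matching. A keyed hash does not preserve subnet structure; a prefix-preserving scheme such as Crypto-PAn would, if that matters for the benchmark.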
Thank you for your attention!