Network Intrusion Detection Using Random Forests
Jiong Zhang, Mohammad Zulkernine
School of Computing, Queen's University
Kingston, Ontario, Canada

Outline
• Motivation
• Intrusion detection system
• Data mining meets intrusion detection
• Proposed architecture
• Challenges and solutions
• Experimental results
• Conclusion and future work

Motivation
• An intrusion prevention system (firewall) cannot prevent all attacks.
[Figure: diagram with the labels Intruder, Victim, Intruder, Firewall, Internet]

Motivation (contd.)
Statistical data for intrusions
• Total reported losses in 2004: $141,496,560 (source: FBI survey for 2004)
• 50% of security breaches go undetected (source: FBI statistics for 2000)

Intrusion Detection Techniques
• Misuse Detection
  • Extracts patterns of known intrusions
  • Cannot detect novel intrusions
  • Has a low false positive rate
• Anomaly Detection
  • Builds profiles for normal activities
  • Uses deviations from the profiles to detect attacks
  • Can detect unknown attacks
  • Has a high false positive rate

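The two techniques above can be contrasted with a minimal sketch. It assumes a signature is a plain substring and a profile is just the mean and standard deviation of one numeric feature. The function names, the threshold `k`, and the sample signatures are all hypothetical and are not taken from the slides.

```python
from statistics import mean, stdev

def matches_signature(payload, signatures):
    """Misuse detection: flag traffic that contains a known attack pattern."""
    return any(sig in payload for sig in signatures)

def build_profile(normal_values):
    """Anomaly detection, step 1: summarise normal activity (assumed: mean and std)."""
    return mean(normal_values), stdev(normal_values)

def is_anomalous(value, profile, k=3.0):
    """Anomaly detection, step 2: flag values more than k standard deviations away."""
    mu, sigma = profile
    return abs(value - mu) > k * sigma

# Hypothetical signatures and a hypothetical "connections per second" feature.
signatures = ["/etc/passwd", "cmd.exe"]
profile = build_profile([10, 12, 11, 9, 10, 11])

known_attack = matches_signature("GET /../../etc/passwd", signatures)
novel_attack = matches_signature("GET /new-exploit", signatures)
normal_rate = is_anomalous(11, profile)
burst_rate = is_anomalous(60, profile)
```

In this toy setting the signature matcher misses the unseen request, while the profile flags any large deviation, whether or not it is an attack. That mirrors the trade-off between novelty detection and false positives listed above.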
Network Intrusion Detection System (NIDS)
• Monitors network traffic to detect intrusions
• Monitors more targets on a network than a host-based system does
• Detects some attacks that host-based systems miss
• Does not affect network operations

Current NIDS
Many current NIDSs (such as Snort) are:
• Rule-based
• Unable to detect novel attacks
• Costly to maintain

Rule Based vs. Data Mining
• Rule based systems: Intrusion Data → Security Experts → Rules
• Data mining based systems: Labeled Data → Data Mining Engine → Patterns

Data Mining Meets Intrusion Detection
• Extract patterns of intrusions for misuse detection
• Build profiles of normal activities for anomaly detection
• Build classifiers to detect attacks
• Some IDSs have successfully applied data mining techniques to intrusion detection

Proposed Architecture
[Figure: Architecture of the proposed NIDS. On-line components: Networks, Sensors, On-line Pre-Processors, Detector, Alarmer, Database (On line). Off-line components: Data Set, Off-line Pre-processor, Pattern Builder, Database (Off line). Data labels: Packets, Audited data, Feature vectors, Patterns, Alarms, Training data.]

Random Forests
• Unsurpassed in accuracy among current data mining algorithms
• Run efficiently on large data sets with many features
• Give estimates of which features are important
• Handle nominal (categorical) features directly
• Do not over-fit as more trees are added

Imbalanced Intrusion
• Problems
  • Higher error rate for minority intrusions
  • Some minority intrusions are more dangerous
  • Need to improve the performance for the minority intrusions
• Proposed Solution
  • Down-sample the majority intrusions and over-sample the minority intrusions

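The proposed solution could look roughly like the sketch below. The slides do not give the sampling ratios, so the per-class `target` size, the function name `rebalance`, and the toy class counts are assumptions made for illustration only.

```python
import random
from collections import Counter, defaultdict

def rebalance(records, labels, target, seed=0):
    """Bring every class to `target` records (hypothetical parameter):
    down-sample larger classes, over-sample smaller ones."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for rec, lab in zip(records, labels):
        by_class[lab].append(rec)

    out_records, out_labels = [], []
    for lab, recs in by_class.items():
        if len(recs) > target:
            # Majority class: keep a random subset, without replacement.
            chosen = rng.sample(recs, target)
        else:
            # Minority class: keep everything, then repeat random records.
            chosen = recs + rng.choices(recs, k=target - len(recs))
        out_records.extend(chosen)
        out_labels.extend([lab] * target)
    return out_records, out_labels

# Toy data: 1000 "dos" records against 10 "u2r" records (illustrative counts).
records = list(range(1010))
labels = ["dos"] * 1000 + ["u2r"] * 10
new_records, new_labels = rebalance(records, labels, target=100)
class_sizes = Counter(new_labels)
```

Here both classes end up with 100 records. Over-sampling by repetition adds no new information, so the choice of `target` trades training time against the weight given to rare attacks.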
Feature Selection
• Essential for improving detection rate
• Reduces the computational cost
• Many NIDSs select features by intuition or domain knowledge

Feature Selection over the KDD’99 Dataset
• Calculate variable importance using random forests.
• Select the 38 most important features for detection.

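A minimal sketch of importance-based selection follows, assuming scikit-learn's `RandomForestClassifier` as the forest implementation. The slides do not name the tool or the importance measure; scikit-learn's `feature_importances_` is impurity-based. The synthetic data stands in for KDD’99 records, and `rank_features` is a hypothetical helper.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def rank_features(X, y, n_keep, n_trees=100, seed=0):
    """Fit a forest and return the indices of the n_keep most important features."""
    forest = RandomForestClassifier(n_estimators=n_trees, random_state=seed)
    forest.fit(X, y)
    importances = forest.feature_importances_
    order = np.argsort(importances)[::-1]  # most important first
    return order[:n_keep], importances

# Synthetic stand-in data: only columns 0 and 1 determine the label,
# columns 2 to 5 are pure noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 6))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

kept, importances = rank_features(X, y, n_keep=2)
X_selected = X[:, kept]  # training data restricted to the selected features
```

On the real dataset the same call with `n_keep=38` would drop the three lowest-ranked of the 41 features.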
Some Features
• The two most important features
  • Feature 3. service type, such as http, telnet, and ftp
  • Feature 23. count, number of connections to the same host as the current connection during the past two seconds
• The three least important features
  • Feature 7. land, 1 if the connection is from/to the same host/port; 0 otherwise
  • Feature 20. num_outbound_cmds, number of outbound commands in an ftp session
  • Feature 21. is_hot_login, 1 if the login belongs to the “hot” list; 0 otherwise

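To make the time-window definition of the count feature concrete, here is a small sketch. It assumes connections arrive as time-sorted `(timestamp, destination_host)` pairs and that the current connection is not counted. Both are assumptions, and `same_host_counts` is a hypothetical name.

```python
from collections import defaultdict, deque

def same_host_counts(connections, window=2.0):
    """For each connection, count earlier connections to the same host
    within the last `window` seconds (current connection excluded)."""
    recent = defaultdict(deque)  # host -> timestamps still inside the window
    counts = []
    for ts, host in connections:
        queue = recent[host]
        # Drop timestamps that have fallen out of the window.
        while queue and ts - queue[0] > window:
            queue.popleft()
        counts.append(len(queue))
        queue.append(ts)
    return counts

# Illustrative trace: (seconds, destination host).
trace = [(0.0, "a"), (0.5, "a"), (1.0, "b"), (2.2, "a"), (2.4, "a")]
counts = same_host_counts(trace)
```

A burst of connections to one host within two seconds drives this value up, which is one plausible reason the feature ranks so high.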
Parameter Optimization for Random Forests
• Optimize the parameter Mtry (the number of features tried at each split) to improve the detection rate.
• Choose Mtry = 15, the value at which the out-of-bag (OOB) error rate reaches its minimum.

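The tuning step can be sketched as a sweep over candidate values, again assuming scikit-learn, where Mtry corresponds to `max_features` and the OOB error is `1 - oob_score_`. The candidate list, the synthetic data, and the helper `tune_mtry` are illustrative and not the authors' setup.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def tune_mtry(X, y, candidates, n_trees=100, seed=0):
    """Return the candidate with the lowest out-of-bag error, plus all errors."""
    errors = {}
    for mtry in candidates:
        forest = RandomForestClassifier(
            n_estimators=n_trees,
            max_features=mtry,   # number of features tried at each split
            oob_score=True,      # score each sample with trees that did not see it
            random_state=seed,
        )
        forest.fit(X, y)
        errors[mtry] = 1.0 - forest.oob_score_
    best = min(errors, key=errors.get)
    return best, errors

# Synthetic stand-in data with 6 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 6))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

candidates = [1, 2, 4, 6]
best_mtry, oob_errors = tune_mtry(X, y, candidates)
```

Because the OOB estimate comes from the training run itself, the sweep needs no separate validation set.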
Performance Comparison on the KDD’99 Dataset
• Our approach achieves a lower overall error rate and cost than the best KDD’99 result.
• Feature selection can improve the performance of intrusion detection.

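The two metrics compared above can both be derived from a confusion matrix. The sketch below assumes that cost means the average misclassification cost per test record. The two-class matrices are made-up illustrations, not the KDD’99 cost matrix or the authors' results.

```python
def error_rate_and_cost(confusion, cost):
    """confusion[i][j]: records of true class i predicted as class j.
    cost[i][j]: penalty for that outcome (0 on the diagonal)."""
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    errors = sum(confusion[i][j] for i in range(n) for j in range(n) if i != j)
    weighted = sum(confusion[i][j] * cost[i][j] for i in range(n) for j in range(n))
    return errors / total, weighted / total

# Hypothetical numbers: class 0 = normal, class 1 = attack.
confusion = [[90, 10],
             [5, 95]]
cost = [[0, 1],   # a false alarm costs 1
        [3, 0]]   # a missed attack costs 3
error_rate, avg_cost = error_rate_and_cost(confusion, cost)
```

With these numbers the error rate is 15/200 and the average cost is 25/200, so a classifier can improve on one metric without improving on the other.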
Conclusion and Future Work
• The random forests algorithm can help improve detection performance and select features.
• Sampling techniques can reduce the time to build patterns and increase the detection rate of minority intrusions.
• In future work, we will focus on anomaly detection and a multiple classifier architecture.