Rule-Based Anomaly Detection on IP Flows Nick Duffield, Patrick Haffner, Balachander Krishnamurthy, Haakon Ringberg
Unwanted traffic detection • Intrusion Detection Systems (IDSes) protect the edge of an enterprise network • Inspect IP packets • Look for worms, DoS attacks, scans, instant messaging, etc. • Many IDSes leverage known signatures of traffic • e.g., Slammer packets contain a characteristic string (say, “MS-SQL”) in the payload • AOL IM packets use specific TCP ports and application headers
Packet and rule-based IDSes • A predicate is a boolean function on a packet feature • e.g., TCP port = 80 • A signature (or rule) is a set of predicates • Benefits • Programmable • Leverage the existing community: many rules already exist (CERT, SANS Institute, etc.) • Classification “for free”
Packet and rule-based IDSes: drawbacks • Packet inspection at the edge requires deployment at many interfaces • Too many packets per second • DPI predicates can be computationally expensive • e.g., a rule may require: port number X, Y, or Z; pattern “foo” within the first 20 bytes; pattern “ba*r” within the first 40 bytes
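To make the predicate-and-signature abstraction concrete, here is a minimal Python sketch (illustrative only: the Rule class, the matches method, and the example ports and thresholds are hypothetical, not Snort’s actual rule engine or option syntax):

```python
import re

# A predicate is a boolean function on a packet; a rule is a set (conjunction) of predicates.
def port_in(ports):
    return lambda pkt: pkt["dst_port"] in ports

def payload_contains(pattern, within):
    # Regex search restricted to the first `within` payload bytes.
    compiled = re.compile(pattern)
    return lambda pkt: compiled.search(pkt["payload"][:within]) is not None

class Rule:
    def __init__(self, name, predicates):
        self.name = name
        self.predicates = predicates

    def matches(self, pkt):
        # The rule fires only if every predicate holds for the packet.
        return all(pred(pkt) for pred in self.predicates)

# Hypothetical rule mirroring the example above.
example_rule = Rule("expensive-dpi-rule", [
    port_in({80, 8080, 443}),        # port number X, Y, or Z (made-up ports)
    payload_contains(b"foo", 20),    # "foo" within the first 20 bytes
    payload_contains(b"ba.*r", 40),  # "ba*r" (wildcard) within the first 40 bytes
])

pkt = {"dst_port": 80, "payload": b"foo baaar and more data"}
print(example_rule.matches(pkt))  # True
```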
Our idea: IDS on IP flows • Efficient • Only fixed-offset rule predicates • More compact (no payload) • Flow collection infrastructure is ubiquitous • IP flows capture the concept of a connection • How well can rule-based IDSes be mimicked on IP flows?
Idea • IDSes associate a “label” with every packet • An IP flow is associated with a set of packets • Our system associates the labels with flows (sketched below)
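A minimal sketch of that labeling step, under the assumption (ours, for illustration) that a flow is keyed by the usual 5-tuple and inherits the label of every alarmed packet that falls within it; flow_key and label_flows are hypothetical names, not from the paper:

```python
from collections import defaultdict

def flow_key(pkt):
    # Key a packet by the standard 5-tuple (an illustrative choice of flow definition).
    return (pkt["src_ip"], pkt["dst_ip"], pkt["src_port"], pkt["dst_port"], pkt["proto"])

def label_flows(packets, packet_alarms):
    """Associate per-packet IDS labels with the flows containing those packets.

    packets       : iterable of packet dicts
    packet_alarms : dict mapping a packet id to the set of rule names raised on it
    Returns a dict mapping flow key -> set of rule labels for that flow.
    """
    flow_labels = defaultdict(set)
    for pkt in packets:
        alarms = packet_alarms.get(pkt["id"], set())
        if alarms:
            # A flow inherits every label raised on any of its packets.
            flow_labels[flow_key(pkt)] |= alarms
    return dict(flow_labels)

packets = [{"id": 1, "src_ip": "10.0.0.1", "dst_ip": "10.0.0.2",
            "src_port": 4321, "dst_port": 1434, "proto": 17}]
print(label_flows(packets, {1: {"slammer"}}))
```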
Snort rule taxonomy • Snort relies on features that cannot be exactly reproduced in the IP flow realm
Simple translation • Our system associates the labels with flows • Simple rule translation would capture only the flow predicates • Low accuracy or low applicability • Example (Slammer worm): Snort rule: dst port = MS SQL and payload contains “Slammer”; flow-only translation: dst port = MS SQL
Machine Learning (ML) • Our system associates the labels with flows • Leverage ML to learn a mapping from the “IP flow space” to a label • IP flow space = src port × # packets × flags × duration • [Figure: flows plotted by src port and # packets, labeled +1 if the alarm was raised and −1 otherwise, with the learned decision region]
Boosting • Boosting combines a set of weak learners to create a strong learner • [Figure: weak classifiers h1, h2, h3 combined into a final classifier Hfinal via a sign function]
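As a sketch, the standard AdaBoost-style combination (the specific boosting variant used here may differ) builds the final classifier as a weighted vote over weak learners, each typically a simple threshold test on one flow feature:

```latex
H_{\mathrm{final}}(x) \;=\; \operatorname{sign}\!\left(\sum_{t=1}^{T} \alpha_t\, h_t(x)\right),
\qquad
\alpha_t \;=\; \frac{1}{2}\ln\frac{1-\epsilon_t}{\epsilon_t}
```

Here $h_t(x) \in \{-1,+1\}$ is the weak learner chosen in round $t$ and $\epsilon_t$ is its weighted training error; weak learners with lower error receive larger weight in the vote.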
Benefit of Machine Learning (ML) • Example (Slammer worm) • Snort rule: dst port = MS SQL, payload contains “Slammer” • Only flow predicates: dst port = MS SQL • ML-generated rule: dst port = MS SQL, packet size = 404, flow duration • Rule translation would capture flow-only predicates • Low accuracy or low applicability • ML algorithms discover new predicates that capture the rule • Latent correlations between predicates • Capturing the same subspace using different dimensions
Architecture • Operate at a small # of interfaces • Use ML algorithms to learn to classify on IP flows • Apply learned classifiers across all/other interfaces
Evaluation • Border router on OC-3 link • Used Snort rules in place • Unsampled NetFlow v5 and packet traces • Statistics • One month, 2 MB/s average, 1 billion flows • 400k Snort alarms
Accuracy metrics • Receiver Operating Characteristic (ROC) • Full FP vs. TP tradeoff • But we need a single number • Area Under the Curve (AUC) • Average Precision (AP): an AP of roughly 1 − p corresponds to about p FP per TP
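For reference, a common way to define average precision over flows ranked by classifier score (a sketch; the exact estimator used in the evaluation may differ):

```latex
\mathrm{AP} \;=\; \frac{1}{|P|} \sum_{k \,\in\, \text{ranks of true positives}} \mathrm{Prec}@k,
\qquad
\mathrm{Prec}@k \;=\; \frac{\#\{\text{true positives in the top } k\}}{k}
```

where $|P|$ is the total number of positive (alarmed) flows. Under this definition an AP near $1-p$ indicates on the order of $p$ false positives per true positive, which is how the operating points on the next slide can be read.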
Classifier accuracy • Training on week 1, testing on week n • High degree of accuracy for header and meta rules • Minimal drift within a month • [Figure: per-rule accuracy over time, with reference levels at 5 FP per 100 TP and 43 FP per 100 TP]
Difference in rule accuracy • Accuracy is a function of correlation between flow and packet-level features
Choosing an operating point • X = alarms we want raised • Z = alarms that are raised • Y = X ∩ Z, the alarms that are both wanted and raised • Precision (exactness) = |Y| / |Z| • Recall (completeness) = |Y| / |X| • AP is a single number, but not the most intuitive • Precision & recall are useful for operators • “I need to detect 99% of these alarms!” (worked example below)
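An illustrative worked example with made-up numbers: suppose $|X| = 100$ alarms should be raised, the flow classifier raises $|Z| = 90$, and $|Y| = 85$ of those are correct. Then

```latex
\mathrm{Precision} = \frac{85}{90} \approx 0.94,
\qquad
\mathrm{Recall} = \frac{85}{100} = 0.85
```

An operator who needs 99% recall would lower the alarm threshold so that more of X is covered, accepting a drop in precision.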
Computational efficiency • Machine learning (boosting) • 33 hours per rule to train on one week of OC-48 traffic • Classification of flows • 57k flows/sec on a 1.5 GHz Itanium 2 • Line-rate classification for OC-48
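A rough sanity check on the line-rate claim, using the evaluation numbers above and assuming (for illustration) a similar traffic mix and utilization at 16 times the OC-3 capacity:

```latex
\frac{10^{9}\ \text{flows}}{30 \times 86{,}400\ \text{s}} \approx 386\ \text{flows/s on OC-3},
\qquad
16 \times 386 \approx 6{,}200\ \text{flows/s on OC-48} \;\ll\; 57{,}000\ \text{flows/s}
```

Under these assumptions the prototype has close to an order of magnitude of headroom over the expected OC-48 flow rate.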
Conclusion • Applying Snort alarms to flows is feasible • ML algorithms discover latent correlations between packet and flow predicates • High degree of accuracy for many rules • Minimal drift within a month • Prototype can scale up to OC-48 speeds • Qualitatively predictive rule taxonomy • Future work • Performance on sampled NetFlow • Cross-site training/classification
Thank you! • Questions? Nick Duffield, Patrick Haffner, Balachander Krishnamurthy, Haakon Ringberg