Rule-based Anomaly Detection on IP Flows. Nick Duffield, Patrick Haffner, Balachander Krishnamurthy (AT&T), Haakon Ringberg (Princeton Univ.), INFOCOM’09.
Rule-based Anomaly Detection on Packets: Snort • Snort is a powerful, flexible, open-source NIDS • A Snort rule: • alert udp $EXTERNAL_NET any -> $HOME_NET 1434 (msg:"MS-SQL version ove…"; dsize:>100; content:"|04|"; …) • Parts of the rule: rule action, protocol, source IP & port, direction, destination IP & port, and the rule details (message text, packet size, patterns in the packet’s payload) Speaker: Li-Ming Chen
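A minimal Python sketch of the rule anatomy above, assuming a well-formed single-line rule whose header fields contain no spaces: it splits the header into its seven fields and keeps the option list as raw text. `SnortRuleHeader` and `parse_rule` are hypothetical names; this is not Snort's own parser (address lists with spaces, line continuations, and variable expansion are not handled).

```python
from dataclasses import dataclass


@dataclass
class SnortRuleHeader:
    action: str      # e.g. "alert"
    protocol: str    # e.g. "udp"
    src_ip: str      # e.g. "$EXTERNAL_NET"
    src_port: str    # e.g. "any"
    direction: str   # "->" or "<>"
    dst_ip: str      # e.g. "$HOME_NET"
    dst_port: str    # e.g. "1434"
    options: str     # raw text between the outer parentheses


def parse_rule(rule: str) -> SnortRuleHeader:
    """Split a rule into its seven header fields and its raw option string."""
    # The first '(' opens the option list; parentheses inside options are kept.
    head, sep, rest = rule.partition("(")
    rest = rest.rstrip()
    if not sep or not rest.endswith(")"):
        raise ValueError("rule has no (options) section")
    fields = head.split()
    if len(fields) != 7:
        raise ValueError(f"expected 7 header fields, got {len(fields)}")
    return SnortRuleHeader(*fields, options=rest[:-1].strip())
```

For example, `parse_rule('alert udp $EXTERNAL_NET any -> $HOME_NET 1434 (msg:"MS-SQL version overflow attempt"; dsize:>100; content:"|04|"; depth:1;)')` yields protocol `udp`, destination port `1434`, and the option string starting at `msg:`.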
Challenges of Deploying Snort over a Large Network (e.g., a Tier-1 ISP) • Deploy at the edge: • The network scale is huge • Deployment issues • Deploy at the core: • Link capacity is high • Performance issues • Hundreds of rules may need to be evaluated concurrently on each packet
Idea: Rules for IP Flows! • Is it possible to construct rules at the flow level that accurately reproduce the action of packet-level rules? • e.g., an alert should be raised for a flow if some packets of that flow trigger packet-level rules • Why? • IP flows are easy to obtain: ISPs already collect flow statistics ubiquitously (e.g., NetFlow) • More scalable
Think about Rules for IP Flows… (1/2) • If a packet-level rule looks like: • alert udp $EXTERNAL_NET any -> $HOME_NET 1434 (msg:"MS-SQL version ove…"; dsize:>100; content:"|04|"; …) • At the flow level, maybe we can do: • Alert on UDP flows that come from $EXTERNAL_NET to $HOME_NET at port 1434 with a mean packet size larger than 100 • Yes, we ignore the content! • We don’t know the exact size of each packet, but we can measure the mean packet size of each flow • What is the resulting detection accuracy?
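The hand-translated flow rule above can be sketched as a predicate over one flow record. Assumptions: `HOME_NET` is a hypothetical prefix, $EXTERNAL_NET is taken to mean "not $HOME_NET", and the `Flow` fields are illustrative. Flow exporters usually count IP bytes including headers, so "mean packet size > 100" is a looser test than the per-packet payload test `dsize:>100`; the slide's threshold is kept as-is.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network

# Hypothetical stand-in for $HOME_NET; $EXTERNAL_NET is assumed to be "not $HOME_NET".
HOME_NET = ip_network("10.0.0.0/8")


@dataclass
class Flow:
    proto: str        # "tcp", "udp" or "icmp"
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    packets: int
    byte_count: int   # total bytes in the flow, as counted by the flow exporter


def mean_packet_size(flow: Flow) -> float:
    return flow.byte_count / flow.packets


def ms_sql_flow_rule(flow: Flow) -> bool:
    """Flow-level approximation of:
    alert udp $EXTERNAL_NET any -> $HOME_NET 1434 (... dsize:>100; content:"|04|"; ...)

    - protocol, addresses and destination port are checked exactly (flow header);
    - dsize (payload size of one packet) is approximated by the mean packet size;
    - content:"|04|" is dropped, since the flow record carries no payload.
    """
    return (
        flow.proto == "udp"
        and ip_address(flow.src_ip) not in HOME_NET
        and ip_address(flow.dst_ip) in HOME_NET
        and flow.dst_port == 1434
        and mean_packet_size(flow) > 100
    )
```

Any flow that passes this check but whose packets never carried the `|04|` byte becomes a false positive, which is exactly the accuracy question the slide raises.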
Think about Rules for IP Flows… (2/2) • What if the packet-level rule is: • alert icmp any any -> any any (msg:"ICMP Dest. Unreachable Comm. Administratively Prohibited"; icode:13; itype:3; …) • What can we do at the flow level? • An ICMP Destination Unreachable message is generated by the host or its inbound gateway to inform the client that the destination is unreachable for some reason • e.g., every packet destined to IP address A will trigger this event • Can we LEARN this kind of event?
Motivation & Goal • For a NIDS, inspecting every packet would be ideal, but it is impractical • Signature-based NIDSs have scale and performance problems • Goal: develop an architecture that translates many existing packet signatures so that they operate effectively on IP flows instead • Premise: flow statistics are compact and already collected within most ISPs’ networks
Build Flow Rules via Learning • The authors use machine learning (ML) to learn the association between flow features and packet payloads • Problem: • Flows aggregate packet header information and lose payload information • Flow rules: loss of accuracy? • Can ML mitigate the impact of losing payload information?
Outline • Motivation & Goal • Packet Rule Classification • Packet Rules → Flow Rules • Dataset & Evaluation Methodology • Experimental Results • Real Deployment Issues • Conclusion & My Comments
Packet Rule Classification (1/3): Why classify packet rules? • Not all packet rules can be learned effectively… • Use a taxonomy of packet rules to understand their impact, and • to evaluate the performance of the proposed ML method • For example: • Which rules can the ML method learn perfectly? • Which rules is it likely to learn very well? • For which rules does its accuracy vary with the nature of the rule?
Packet Rule Classification (2/3): What kinds of predicates are in a packet rule? • A packet rule consists of three sets of predicates: • FH (flow header): packet fields reported exactly in the flow record • PP (packet payload): content signatures • MI (meta-information): other packet header information that is reported either inexactly or not at all in the flow record • Example: alert udp $EXTERNAL_NET any -> $HOME_NET 1434 (msg:"MS-SQL version ove…"; dsize:>100; content:"|04|"; …) • FH: protocol (udp), source IP ($EXTERNAL_NET), source port (any), destination IP ($HOME_NET), destination port (1434) • MI: dsize:>100 • PP: content:"|04|"
Packet Rule Classification (3/3): How to classify packet rules? • Partition packet rules into disjoint classes based on the types of predicates present: • Header rules: comprise only FH predicates • Meta-information rules: no PP predicates, at least one MI predicate, may include FH predicates • Payload rules: include at least one PP predicate
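The three-way partition can be sketched as a function of a rule's option keywords. The keyword sets below are partial and assumed for illustration: `itype`/`icode` are treated as FH (consistent with the appendix, where rules testing only these are Header rules), known payload keywords as PP, descriptive options such as `msg` or `sid` as non-predicates, and every other option as MI. With these sets the function will not necessarily reproduce every class assignment in the appendix.

```python
from __future__ import annotations

# Assumed keyword sets (partial, for illustration only).
METADATA = {"msg", "reference", "classtype", "sid", "rev", "priority", "metadata"}
FH_OPTIONS = {"itype", "icode"}
PP_OPTIONS = {"content", "uricontent", "pcre", "depth", "offset", "distance",
              "within", "nocase", "rawbytes", "byte_test", "byte_jump"}


def option_keywords(options: str) -> list[str]:
    """Return the keyword of each ';'-separated option, ignoring ';' inside quotes."""
    keywords, current, in_quotes = [], [], False
    for ch in options + ";":          # the extra ';' flushes the last option
        if ch == '"':
            in_quotes = not in_quotes
        if ch == ";" and not in_quotes:
            option = "".join(current).strip()
            if option:
                keywords.append(option.split(":", 1)[0].strip())
            current = []
        else:
            current.append(ch)
    return keywords


def classify_rule(options: str) -> str:
    """Header / Meta-Info / Payload class of a rule, given its option string.

    Header fields (protocol, IPs, ports) are always FH, so only options matter.
    """
    keywords = set(option_keywords(options))
    if keywords & PP_OPTIONS:
        return "Payload"
    # Anything that is neither metadata nor FH (dsize, flags, flow, ...) counts as MI.
    if keywords - METADATA - FH_OPTIONS:
        return "Meta-Info"
    return "Header"
```

Because the classes are checked in the order PP, then MI, then FH, every rule lands in exactly one class, matching the "disjoint classes" requirement.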
Rules in Practice: FH, MI & PP • Snort rules: • A Boolean formula composed of predicates that check for specific values of various fields present in the IP header, the transport header, and the payload • Features used to construct flow rules in this paper: • Src. port, Dst. port • Src. IP address, Dst. IP address • #packets, #bytes, mean packet size • Duration, mean packet interarrival time • TCP flags, protocol, ToS
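A sketch of turning one flow record into the feature list above. The `FlowRecord` layout loosely follows NetFlow v5 conventions (e.g., `tcp_flags` as the cumulative OR of the flags seen in the flow), but the field names, the millisecond timestamps, and the definition of mean interarrival time as duration / (packets - 1) are assumptions, not the paper's stated definitions. IP addresses are passed through unchanged; how they are encoded for the learner is left open.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class FlowRecord:
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: int     # IP protocol number (1 = ICMP, 6 = TCP, 17 = UDP)
    tos: int
    tcp_flags: int    # cumulative OR of the TCP flags seen in the flow
    packets: int
    octets: int       # total bytes in the flow
    first_ms: int     # timestamp of the first packet (ms)
    last_ms: int      # timestamp of the last packet (ms)


def flow_features(f: FlowRecord) -> dict[str, float | int | str]:
    duration = (f.last_ms - f.first_ms) / 1000.0          # seconds
    # n packets have n - 1 gaps; single-packet flows get an interarrival time of 0.
    mean_iat = duration / (f.packets - 1) if f.packets > 1 else 0.0
    return {
        "src_port": f.src_port, "dst_port": f.dst_port,
        "src_ip": f.src_ip, "dst_ip": f.dst_ip,
        "packets": f.packets, "bytes": f.octets,
        "mean_pkt_size": f.octets / f.packets,
        "duration": duration, "mean_iat": mean_iat,
        "tcp_flags": f.tcp_flags, "protocol": f.protocol, "tos": f.tos,
    }
```

Single-packet flows (a large share of this dataset) have zero duration and zero interarrival time, so for them only the size, port, address, flag, protocol, and ToS features carry information.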
… Packet Rules → Flow Rules • Packets → Snort → Snort alerts • Packets → flow collection (e.g., NetFlow) → IP flows • Snort alerts + IP flows → build training data (associate each packet alert with the corresponding flow) → ML method → flow rules
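The "associate each packet alert with the corresponding flow" step can be sketched as a join on the 5-tuple plus a time window. Assumptions: both sides carry the same 5-tuple, and an alert belongs to a flow when its timestamp falls within the flow's start/end widened by a hypothetical `slack` that absorbs clock skew between the Snort sensor and the flow exporter. The paper's actual matching procedure may differ.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass

FiveTuple = tuple[str, str, int, int, str]  # (src_ip, dst_ip, src_port, dst_port, proto)


@dataclass
class Alert:
    key: FiveTuple
    ts: float     # packet timestamp (s)
    sid: int      # Snort rule id


@dataclass
class Flow:
    key: FiveTuple
    start: float  # first-packet time (s)
    end: float    # last-packet time (s)


def label_flows(flows: list[Flow], alerts: list[Alert], slack: float = 1.0) -> list[set[int]]:
    """For each flow, return the set of Snort sids whose alerts fall inside it."""
    by_key = defaultdict(list)            # 5-tuple -> indices of flows with that tuple
    for i, f in enumerate(flows):
        by_key[f.key].append(i)
    labels = [set() for _ in flows]
    for a in alerts:
        for i in by_key.get(a.key, []):
            f = flows[i]
            if f.start - slack <= a.ts <= f.end + slack:
                labels[i].add(a.sid)
    return labels


def targets(labels: list[set[int]], sid: int) -> list[int]:
    """Per-rule training targets: +1 if flow i triggered rule `sid`, else -1."""
    return [1 if sid in s else -1 for s in labels]
```

Keeping a set of sids per flow reflects that one flow can carry alerts from several rules; each rule then gets its own ±1 target vector.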
Packet Rules → Flow Rules (detailed) • For each Snort rule: • Training data (x_i, y_i): flow i has flow features x_i, and y_i ∈ {–1, +1} indicates whether flow i triggered this Snort rule • Then run an ML algorithm that minimizes the classification error on the training data • Assign each Snort rule a score • Give each feature a weight • Learn these weights to minimize the training error • Pipeline: Snort → Snort alerts → build training data (x_i, y_i) → ML method → flow rules
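The slide's own equation was not preserved. One standard way to write a per-rule objective that matches its description (give each feature a weight, learn the weights to minimize training error) is below; the weak predictors h_j, the weights w_j, and the use of the 0-1 loss over m training flows are assumptions. A flow is flagged when its score f(x) exceeds a chosen threshold.

```latex
f(x_i) = \sum_{j=1}^{J} w_j \, h_j(x_i), \qquad
\min_{w_1,\dots,w_J} \; \frac{1}{m} \sum_{i=1}^{m} \mathbb{1}\big[\, y_i \, f(x_i) \le 0 \,\big],
\qquad y_i \in \{-1, +1\}
```

AdaBoost does not minimize this 0-1 error directly; it greedily minimizes the exponential loss \sum_i \exp(-y_i f(x_i)), which upper-bounds it, since \exp(-y_i f(x_i)) \ge 1 whenever y_i f(x_i) \le 0.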
Learning Flow Rules • Note that: • A single packet may raise multiple Snort alerts • An individual flow can be associated with many Snort alerts • Machine learning algorithm: AdaBoost was chosen as the candidate algorithm, because: • The actual number of features is large, and AdaBoost uses an incremental, greedy training procedure that only adds the features needed for finer discrimination • Good generalization (compared with SVM) • Low level of noise in the training data
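A compact, from-scratch sketch of discrete AdaBoost with one-feature threshold stumps, to show what "only adds the features needed" means: each round greedily picks the single feature/threshold with the lowest weighted error, then re-weights the flows so the next round focuses on the ones still misclassified. This is textbook AdaBoost on a toy feature matrix, not the authors' implementation or parameter setup; the stump learner and the `rounds` default are assumptions.

```python
import math


def stump_predict(x, j, thr, polarity):
    """Decision stump on feature j: +polarity above the threshold, -polarity otherwise."""
    return polarity if x[j] > thr else -polarity


def train_stump(X, y, w):
    """Greedily pick the (feature, threshold, polarity) with the lowest weighted error."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({x[j] for x in X}):
            for polarity in (1, -1):
                err = sum(wi for xi, yi, wi in zip(X, y, w)
                          if stump_predict(xi, j, thr, polarity) != yi)
                if best is None or err < best[0]:
                    best = (err, j, thr, polarity)
    return best


def adaboost(X, y, rounds=10):
    """Discrete AdaBoost; X is a list of feature vectors, y holds labels in {-1, +1}."""
    n = len(X)
    w = [1.0 / n] * n
    model = []  # list of (alpha, feature index, threshold, polarity)
    for _ in range(rounds):
        err, j, thr, pol = train_stump(X, y, w)
        if err >= 0.5:        # no stump beats chance: stop
            break
        alpha = 0.5 * math.log((1 - err) / max(err, 1e-12))
        model.append((alpha, j, thr, pol))
        if err == 0:          # training data separated: further rounds add nothing
            break
        # Misclassified flows gain weight, correctly classified flows lose weight.
        w = [wi * math.exp(-alpha * yi * stump_predict(xi, j, thr, pol))
             for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model


def score(model, x):
    """Confidence score for flow x; alert when it exceeds a chosen threshold."""
    return sum(alpha * stump_predict(x, j, thr, pol) for alpha, j, thr, pol in model)
```

Each round adds at most one feature test, so a rule that one or two flow features already reproduce (for example, a mean-packet-size cut) yields a very small model.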
Dataset (Aug–Sep 2005, OC-3 link) • 29 days (4 weeks) • Total: >10^6 flows, >5 TBytes • Average rate: 2 MBytes/sec • Average: 14.5 pkts/flow • 55% of flows comprised a single packet! • For machine learning: • Week 1: training • Week 2: training & testing • Weeks 3 & 4: testing • Collection at the border router: all packets (for Snort) and unsampled NetFlow (IP flows)
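A quick arithmetic check that the slide's figures are consistent: spreading 5 TBytes over 29 days gives roughly the stated 2 MBytes/sec. Decimal units (1 TB = 10^12 bytes) are assumed, and 5 TB is the slide's lower bound.

```python
TOTAL_BYTES = 5e12            # ">5 TBytes" (lower bound, decimal TB assumed)
DAYS = 29

seconds = DAYS * 24 * 3600    # 2,505,600 s
rate = TOTAL_BYTES / seconds  # average bytes per second
print(f"{rate / 1e6:.2f} MB/s")  # prints "2.00 MB/s"
```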
Dataset (learning performance…?) • [Table: number of flows (10^6) per week, split into normal flows and anomalous flows (Neg: True Negative, Pos: True Positive)] • Further speedup: • Remove deterministic features → reduces the amount of training data: • 1) remove flows whose source is part of the local network • 2) each Snort rule applies to only a single protocol → train separately for each protocol (TCP, UDP, ICMP) • The number of unique examples is small (→ speeds up training)
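A sketch of the three data reductions above: dropping flows sourced inside the local network, keeping one training set per protocol, and collapsing identical examples into counts. `LOCAL_NET` is a hypothetical prefix, and the `(src_ip, protocol, features, label)` example layout is an assumption for illustration.

```python
from collections import Counter, defaultdict
from ipaddress import ip_address, ip_network

LOCAL_NET = ip_network("10.0.0.0/8")   # hypothetical local (home) prefix


def prepare_training_sets(examples):
    """examples: iterable of (src_ip, protocol, features, label), features hashable.

    1) drop flows whose source is part of the local network;
    2) keep one training set per protocol (each rule applies to a single protocol);
    3) collapse identical (features, label) pairs into counts.
    Returns {protocol: Counter({(features, label): count})}.
    """
    per_protocol = defaultdict(Counter)
    for src_ip, protocol, features, label in examples:
        if ip_address(src_ip) in LOCAL_NET:
            continue
        per_protocol[protocol][(features, label)] += 1
    return per_protocol
```

When few examples are unique, a learner can train on the distinct examples only, for instance by using the counts as initial example weights; whether the authors did exactly this is not stated on the slide.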
Evaluation Criteria • A detection is a Boolean action (T or F) • For each rule, the classifier outputs a confidence score for each test flow • A threshold is required to decide T or F • Use precision and recall as evaluation criteria • Precision = TP_k/(TP_k + FP_k), Recall = TP_k/(TP_k + FN_k) • Average Precision (AP): a value closer to 1 is better!
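A sketch of the evaluation quantities: precision and recall at one score threshold, and average precision over the whole ranking. AP is computed here in its common non-interpolated form (the mean of the precision measured at the rank of each positive flow); that the paper uses exactly this variant is an assumption. Labels follow the slide's {–1, +1} convention.

```python
def precision_recall_at(scores, labels, threshold):
    """Precision and recall when every flow scoring >= threshold raises an alert."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y != 1)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    precision = tp / (tp + fp) if tp + fp else 1.0   # convention: no alerts -> 1.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def average_precision(scores, labels):
    """Non-interpolated AP: mean precision at the rank of each positive flow."""
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    tp, precisions = 0, []
    for k, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            tp += 1
            precisions.append(tp / k)
    return sum(precisions) / len(precisions) if precisions else 0.0
```

For scores `[0.9, 0.8, 0.7, 0.6]` with labels `[1, -1, 1, -1]`, the positives sit at ranks 1 and 3, so AP = (1/1 + 2/3) / 2 = 5/6. AP summarizes all thresholds at once, which is why it needs no single threshold choice.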
Evaluation Methodology • Focus on the 21 most frequently triggered rules over weeks 1 & 2 (see the next slide) • Compare the AP (average precision) for: • 1) Baseline behavior: train on one full week and test on the following week, e.g., wk1-2 = train on wk 1, test on wk 2 • 2) Data drift: determine how often re-training should be applied (e.g., wk1-3) • 3) Sampling of negative examples: normal flows are the majority; can reducing the number of normal flows keep accuracy while reducing training time?
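Negative-example sampling (item 3) can be sketched as keeping every anomalous flow and a random fraction of the normal ones. `keep_fraction` and the fixed `seed` are hypothetical parameters; the sampling rate actually used in the experiments is not given here.

```python
import random


def sample_negatives(examples, keep_fraction, seed=0):
    """Keep every positive (y = +1) and a random fraction of negatives (y = -1).

    examples: list of (features, label) pairs. A fixed seed makes runs repeatable.
    """
    rng = random.Random(seed)
    return [(x, y) for x, y in examples
            if y == 1 or rng.random() < keep_fraction]
```

Training time shrinks roughly with the number of kept negatives. One optional refinement is to weight each kept negative by 1 / keep_fraction so the class balance seen by the learner matches the full data.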
[Figure: Snort alerts for the 21 rules (rules 1, 3, 4, 9, 10, 15, 20 labeled), showing the complexity of a unique flow; annotations: ICMP, content?, flag, size, flag. See the appendix for alert details]
[Figure: AP per rule (rules 1, 3, 4, 9, 10, 15, 20 labeled), grouped into Header, Meta-Info, and Payload rules] • Data Drift: • 2-week drift is acceptable • 3-week drift → loss of performance, especially for Meta-Info & Payload rules • Payload rules show great variability
[Figure: AP per rule (rules 1, 3, 4, 9, 10, 15, 20 labeled), grouped into Header, Meta-Info, and Payload rules] • Sampling of negative (normal) examples: • measurable loss in performance • but 6x faster training
Which features are more important than others? [Figure: AP when a feature is removed during detection] • Payload rules are hard to reproduce in a flow setting • Some rules have several predicates (that could be learned)
Architecture • Other issues: • Can rules learned at one site be used at other sites? • Some flow features (e.g., duration) are link/network dependent…
Other issues • Computational efficiency • Initial correlation of flows and Snort alarms • AdaBoost parameter setup and learning time • Run-time classification
Conclusion
My Comments
Appendix – 21 Snort Rules Used in This Paper (from snort-rules-version)
Header (1/2) • 1) alert icmp any any -> any any (msg:"ICMP Destination Unreachable Communication Administratively Prohibited"; icode:13; itype:3; classtype:misc-activity; sid:485; rev:4;) • 2) alert icmp any any -> any any (msg:"ICMP Destination Unreachable Communication with Destination Host is Administratively Prohibited"; icode:10; itype:3; classtype:misc-activity; sid:486; rev:4;)
Header (2/2) • 3) alert icmp $EXTERNAL_NET any -> $HOME_NET any (msg:"ICMP Source Quench"; icode:0; itype:4; classtype:bad-unknown; sid:477; rev:2;)
Meta-Information (1/3) • 4) alert icmp $EXTERNAL_NET any -> $HOME_NET any (msg:"ICMP webtrends scanner"; icode:0; itype:8; content:"|00 00 00 00|EEEEEEEEEEEE"; reference:arachnids,307; classtype:attempted-recon; sid:476; rev:4;) • 5) alert tcp $EXTERNAL_NET any -> $HOME_NET any (msg:"BAD-TRAFFIC data in TCP SYN packet"; flow:stateless; dsize:>6; flags:S,12; reference:url,www.cert.org/incident_notes/IN-99-07.html; classtype:misc-activity; sid:526; rev:11;)
Meta-Information (2/3) • 6) alert icmp $EXTERNAL_NET any -> $HOME_NET any (msg:"ICMP Large ICMP Packet"; dsize:>800; reference:arachnids,246; classtype:bad-unknown; sid:499; rev:4;) • 7) alert icmp $EXTERNAL_NET any -> $HOME_NET any (msg:"ICMP PING NMAP"; dsize:0; itype:8; reference:arachnids,162; classtype:attempted-recon; sid:469; rev:3;)
Meta-Information (3/3) • 8) alert tcp $EXTERNAL_NET any -> $HOME_NET any (msg:"SCAN FIN"; flow:stateless; flags:F,12; reference:arachnids,27; classtype:attempted-recon; sid:621; rev:7;) • 9) 111 || 8 || spp_stream4: FIN Stealth Scan • gid 111: Snort preprocessor alert (stream4 preprocessor) • alert id: 8
Payload (1/6) • 10) alert udp $EXTERNAL_NET any -> $HOME_NET 1434 (msg:"MS-SQL version overflow attempt"; flowbits:isnotset,ms_sql_seen_dns; dsize:>100; content:"|04|"; depth:1; reference:bugtraq,5310; reference:cve,2002-0649; reference:nessus,10674; classtype:misc-activity; sid:2050; rev:8;) • 11) alert tcp $AIM_SERVERS any -> $HOME_NET any (msg:"CHAT AIM receive message"; flow:to_client; content:"*|02|"; depth:2; content:"|00 04 00 07|"; depth:4; offset:6; classtype:policy-violation; sid:1633; rev:6;)
Payload (2/6) • 12) 2376 || EXPLOIT ISAKMP first payload certificate request length overflow attempt || bugtraq,9582 || cve,2004-0040 • 13) 483 || ICMP PING CyberKit 2.2 Windows || arachnids,154 • 14) 480 || ICMP PING speedera
Payload (3/6)
Payload (4/6)
Payload (5/6)
Payload (6/6)