Towards Scalable and Robust Distributed Intrusion Alert Fusion with Good Load Balancing • Zhichun Li, Yan Chen and Aaron Beach • Lab for Internet & Security Technology (LIST), Northwestern University • http://list.cs.northwestern.edu
Distributed IDSes • Distributed Intrusion Detection Systems (IDSes) • Crucial to identify large-scale attacks early • Robust to various scan techniques • Locate the attackers/zombies when spoofed • E.g., Symantec has 20,000 sensors in 180 countries • General architecture • IDS nodes • Generate the alarms • Heterogeneous: host- or network-based • Sensor fusion centers (SFCs) • Fuse the alarms • A subset of IDSes or dedicated hosts
Desired Features of DIDS Infrastructure • Scalability • 15 million daily intrusion alerts reported to DShield • Route only related alarms to the same SFC • Over 18,000 vulnerabilities found [CERT] • 17,500 Win32 threats and their variants [Symantec] • Hierarchical fusion cannot scale w/ diverse alerts • Distributed queries over multiple SFCs • Good load balancing • Attack resiliency
Outline • Motivation • CDDHT Design • Features of CDDHT • Evaluation • Related Work • Conclusion
Cyber Disease Distributed Hash Tables (CDDHT) • General intrusion alert fusion framework; any alert generation or alert fusion algorithm can be plugged in • Part of the Router-based Anomaly/Intrusion Detection and Mitigation (RAIDM) system in LIST • High-speed network measurement with reversible sketches [IMC 2004, INFOCOM 2006] • Online flow-level anomaly/intrusion detection [IEEE ICDCS 2006] [IEEE CG&A, Security Visualization 06] • Router-based polymorphic worm signature generation [IEEE Symposium on Security and Privacy 2006]
CDDHT Design • Leverage DHT systems • O(log(n)) hops distance where n is the # of nodes • O(log(n)) maintenance overhead for routing • Guaranteed success for deterministic routing • Fault-tolerant, robust, and DoS attack resilient • Becoming increasingly popular for serious use • E.g., the eMule P2P system uses Kademlia • Primitives of CDDHT • Put (disease key, symptom report) • Summary report = Get (disease key)
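The two primitives above can be sketched with a toy in-memory stand-in for the DHT. This is only an illustration under stated assumptions: the class name `ToyCDDHT`, the SFC names, the use of SHA-256, and the shape of the summary report are all hypothetical and not taken from the slides; a real deployment would route over Chord or a similar overlay.

```python
import hashlib
from bisect import bisect_left
from collections import defaultdict


class ToyCDDHT:
    """In-memory stand-in for a DHT: each key is owned by its successor SFC on a hash ring."""

    def __init__(self, sfc_names):
        # Place every SFC on the ring at the hash of its (hypothetical) name.
        self.ring = sorted((self._h(name), name) for name in sfc_names)
        self.store = defaultdict(list)

    @staticmethod
    def _h(text):
        return int.from_bytes(hashlib.sha256(text.encode()).digest(), "big")

    def owner(self, disease_key):
        """Deterministically pick the SFC responsible for a disease key."""
        ids = [node_id for node_id, _ in self.ring]
        idx = bisect_left(ids, self._h(disease_key)) % len(self.ring)
        return self.ring[idx][1]

    def put(self, disease_key, symptom_report):
        """Put(disease key, symptom report): store the report at the owning SFC."""
        self.store[(self.owner(disease_key), disease_key)].append(symptom_report)

    def get(self, disease_key):
        """Summary report = Get(disease key): here simply a count plus the raw reports."""
        sfc = self.owner(disease_key)
        reports = self.store.get((sfc, disease_key), [])
        return {"sfc": sfc, "count": len(reports), "reports": list(reports)}
```

Because `owner` depends only on the key, two IDSes that derive the same disease key always reach the same SFC, which is the property the fusion relies on.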
Architecture of CDDHT • IDS node ID: “0” + sha2(IP of the IDS) • IDS + SFC node ID: “1” + sha2(IP of the IDS) • [Figure: DIDS coverage across the Internet, with attacks injected at two points]
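The node ID rule on this slide can be sketched as follows. The slide only says "sha2", so the choice of SHA-256, the hex encoding, and treating the "0"/"1" prefix as a leading character are assumptions made for illustration; `node_id` is a hypothetical helper name.

```python
import hashlib


def node_id(ip, is_sfc):
    """Build a node ID as a one-character role prefix followed by a SHA-2 hash of the IP.

    Assumption: SHA-256 in hex; the slide does not specify the SHA-2 variant or encoding.
    """
    prefix = "1" if is_sfc else "0"  # "1" = IDS that also acts as an SFC, "0" = plain IDS
    return prefix + hashlib.sha256(ip.encode("ascii")).hexdigest()
```

With this layout every SFC ID sorts after every plain IDS ID, so the SFCs occupy one contiguous half of the identifier space.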
Disease Key Design • Challenge: fuse the vast, diverse symptoms from heterogeneous IDSes with different views • Key generation in a decentralized and deterministic manner • Key idea: generate the disease keys which capture the uniqueness of certain attacks • Focus on popular types of attacks • Improve with features • Load balancing • Attack resilience
The Disease Key • Currently, model four types of attacks • Extensible design
Port Scan Disease Key Design • Vertical scan and block scan • Source IP • Horizontal scan and Coordinated scan • Scan port • Horizontal: + Source IP
Viruses/Worms and Botnets Disease Key Design • Viruses/Worms • Known worms: hash of the worm name • Unknown worms: worm scan port # • Botnets • Assume botnets use centralized C&C • IRC based bots: dynamic DNS • Web based bots: URL • Botnet ID = hash of the DDNS or URL
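The key rules for port scans, worms, and botnets can be sketched as small deterministic functions. The function names, the SHA-256 hash, the "|" field separator, and the lower-casing of names are assumptions for illustration; the slides only state which attributes go into each key.

```python
import hashlib


def _h(*parts):
    """Hash the given fields into a fixed-length key (SHA-256 is an assumption)."""
    return hashlib.sha256("|".join(str(p) for p in parts).encode()).hexdigest()


def port_scan_key(scan_type, src_ip=None, port=None):
    """Vertical/block scans key on source IP; horizontal adds the port; coordinated uses the port only."""
    if scan_type in ("vertical", "block"):
        return _h("scan-src", src_ip)
    if scan_type == "horizontal":
        return _h("scan-port-src", port, src_ip)
    if scan_type == "coordinated":
        return _h("scan-port", port)
    raise ValueError(f"unknown scan type: {scan_type}")


def worm_key(name=None, scan_port=None):
    """Known worms key on the worm name; unknown worms fall back to the scan port number."""
    if name is not None:
        return _h("worm-name", name.lower())
    return _h("worm-port", scan_port)


def botnet_key(command_and_control):
    """Botnet ID = hash of the dynamic DNS name (IRC bots) or URL (web bots)."""
    return _h("botnet", command_and_control.lower())
```

Since no function consults local state, heterogeneous IDSes that observe the same attack attributes derive the same key without coordination.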
Load Balancing • Challenges to load balancing • Large key space in DHT • Highly skewed alert distribution • [Figures: alert distribution by number of ports picked and by number of subnets picked]
Load Balancing II • Proactive balancing with stable hot spots • Reduce key space of port # to 7 bits • 64 buckets for 64 most popular port # • Remaining 64 buckets randomly assigned to other port # • Balancing load of the key space • Node migration • Virtual node • Load-aware bootstrap • Balancing load of single hot key • IDS alarm rate limiting • Aggregation tree for large-scale attacks • Received alarms by the final SFC bounded by O(log(n))
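The 7-bit port encoding above can be sketched as a bucket lookup. Two assumptions are made here: the list of 64 hot ports is supplied by the caller (the slides do not list them), and the "random" assignment of other ports is done with a hash so that every node computes the same bucket. `port_bucket` is a hypothetical name.

```python
import hashlib


def port_bucket(port, hot_ports):
    """Map a port number to one of 128 buckets (7 bits).

    Buckets 0..63 are reserved one-to-one for the 64 most popular ports;
    every other port is spread over buckets 64..127.
    """
    if len(hot_ports) != 64:
        raise ValueError("expected exactly 64 hot ports")
    if port in hot_ports:
        return hot_ports.index(port)
    # Assumption: a hash stands in for the random assignment so it is reproducible everywhere.
    digest = hashlib.sha256(str(port).encode()).digest()
    return 64 + digest[0] % 64
```

Giving each hot port its own bucket keeps two heavily scanned ports from colliding on one SFC, while the long tail of rarely scanned ports shares the remaining buckets.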
Attack Resilience • DoS resilience comparison with hierarchical model • Proved the average number of alerts unreachable to their corresponding SFCs given one node loss • Hierarchical DIDS: O(log (n)) • CDDHT: O(1) • More in the paper • Authenticity of alarms • Dealing with compromised nodes
Methodology • Implementation • Preliminary CDDHT system based on Chord simulator • Event-driven simulation • Each alarm is an event with a timestamp from certain IDSes • Datasets • DShield firewall logs (Jan. 2004) • Results from each day’s data are similar • Use January 2nd 2004 as illustration • 25 million scan logs from 1,417 providers • Randomly choose 10% to be SFCs
Evaluation Metrics • Fusion effectiveness • 100% due to deterministic routing of CDDHT • Load balancing • Consider number of alerts received at each SFC • Maximum vs. mean ratio (MMR) • Coefficient of variation (CV)
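The two load-balancing metrics can be computed directly from the per-SFC alert counts. This is a minimal sketch; using the population standard deviation for CV is an assumption, since the slides do not say which estimator was used, and `load_metrics` is a hypothetical name.

```python
from statistics import mean, pstdev


def load_metrics(alerts_per_sfc):
    """Return (MMR, CV) for a list of alert counts, one entry per SFC.

    MMR = maximum load / mean load; CV = standard deviation / mean load.
    """
    avg = mean(alerts_per_sfc)
    if avg == 0:
        raise ValueError("no alerts received; MMR and CV are undefined")
    mmr = max(alerts_per_sfc) / avg
    cv = pstdev(alerts_per_sfc) / avg  # assumption: population standard deviation
    return mmr, cv
```

A perfectly balanced system gives MMR = 1 and CV = 0; both grow as the load concentrates on a few SFCs.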
Proactive Balancing with Stable Hot Ports Proactive load balancing can reduce CV by 60% and reduce MMR by 40%
The Load Variation Comparison Between Hierarchical Scheme and CDDHT • [Figures: Hierarchical vs. CDDHT, CDDHT w/ PB, and CDDHT w/ PB+VN; median, 10- and 90-percentile of 10 runs] • CDDHT with proactive balancing (PB) and virtual nodes (VN) • Compared with hierarchical schemes, CDDHT reduces the MMR by a factor of 5.5 and CV by a factor of 5.2
Related Works • WormShield uses DHT specifically to find popular content fingerprints as worm signatures, but does not work for polymorphic worms
Conclusion • The large number and diversity of alerts from many distributed IDSes call for efficient fusion of these alerts • CDDHT: Cyber Disease DHT • Efficiently routes alarms of different intrusions to different SFCs • Highly scalable and robust • Good load balancing • High attack resilience • Future work • Disease keys for more types of attacks and querying of CDDHT
Introduction to DHT • DHT (Distributed Hash Table): An infrastructure that enables the distribution of an ordinary hash table onto a set of cooperating nodes • Basic operations • Put(Key, Object): Use the Key to find the corresponding node via DHT routing and store the Object on that node • Object = Get(Key): Use the Key to find the corresponding node via DHT routing and retrieve the Object from that node • [Figure: nodes A, B, C and D; each node only stores part of the hash table]
Introduction to DHT II • Different DHT systems • Chord • CAN • Pastry • Tapestry • Kademlia • Kademlia has been used in the eMule P2P software • DHT routing • Distributed and deterministic routing • The max hops to find the node corresponding to a key is bounded by O(log(n)) • [Figure: Chord DHT routing on a ring of identifiers 0 to 15]
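Chord-style routing on a 16-identifier ring like the one in the figure can be sketched as follows. This is a simplified simulation, not the simulator used in the evaluation: the node set in the usage note is hypothetical, finger tables are recomputed on the fly instead of being maintained, and a hop is counted each time the query moves to another node.

```python
M = 4  # identifier bits, so the ring holds identifiers 0..15


def successor(nodes, ident):
    """First node at or clockwise after ident (nodes must be sorted)."""
    for n in nodes:
        if n >= ident:
            return n
    return nodes[0]  # wrap around the ring


def fingers(nodes, n):
    """Finger i of node n points at successor(n + 2**i)."""
    return [successor(nodes, (n + 2 ** i) % 2 ** M) for i in range(M)]


def in_half_open(x, a, b):
    """True if x lies in the ring interval (a, b]."""
    return a < x <= b if a < b else (x > a or x <= b)


def in_open(x, a, b):
    """True if x lies in the ring interval (a, b)."""
    return a < x < b if a < b else (x > a or x < b)


def lookup(nodes, start, key):
    """Route from start to the node owning key; return (owner, hops)."""
    nodes = sorted(nodes)
    if successor(nodes, key) == start:
        return start, 0
    cur, hops = start, 0
    while True:
        table = fingers(nodes, cur)
        if in_half_open(key, cur, table[0]):
            return table[0], hops + 1  # final hop to the immediate successor
        # Otherwise forward to the closest finger that precedes the key.
        cur = next(f for f in reversed(table) if in_open(f, cur, key))
        hops += 1
```

For example, with hypothetical nodes `[1, 4, 7, 9, 12, 15]`, `lookup(nodes, 1, 14)` travels 1 → 9 → 12 → 15. Each forwarding step uses a strictly smaller finger than the previous one, which is where the logarithmic hop bound comes from.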
DoS Attack Disease Key Design • Most DoS attacks target specific IP addresses (the server) or a subnet (bandwidth-consuming attack) • But the victim IP (subnet) can be the destination or the source (in backscatter) • All other fields can vary
Related Works • Centralized/Hierarchical Model • Publish/subscribe Model • O(n^2) communication vs. O(n) • P2P Query • Scalability with frequent fusion
Attack Resilience • DoS resilience comparison with hierarchical model • Proved the average number of disconnected nodes given one node loss • In a k-way hierarchical DIDS it is O(log(n)) • In the DHT-based design it is O(1) • Authenticity of alarms • Validate the source subnets of IDSes using Whois and BGP tables • Use PKI to verify the messages sent by IDSes/SFCs
Attack Resilience II • Dealing with compromised nodes • IDS nodes • Weight the importance of the results by voting on the # of IDSes and IP coverage • Probability-based verification for alarm aggregation • SFC nodes • The “trust but verify” principle • Envision a centralized authority that randomly checks the fusion results of the SFCs
Proactive Balancing with Stable Hot Ports • Using the 7-bit encoding can reduce MMR by 60% and CV by 40%
Dynamics of Load Variation over Time • MMR for CDDHT is much smaller and smoother • CV also improves