Automated Signature Extraction for High Volume Attacks. Yehuda Afek, Anat Bremler-Barr, Shir Landau Feibish. This work is part of the Kabarnit-Cyber Consortium (2012-2014) under the Magnet program, funded by the Chief Scientist in the Israeli Ministry of Industry, Trade and Labor.
This research was also partly supported by European Research Council (ERC) Starting Grant no. 259085.
Current DDoS Attack
• Zombies on innocent computers
• Infrastructure-level DDoS attacks
• Server-level DDoS attacks
• Bandwidth-level DDoS attacks
High volume attacks - Current Defense
• Many different types of attackers
• Defense Line 1, Defense Line 2, Defense Line 3, … Defense Line n: SYN cookies, challenge-response, access control list filtering, behavioral analysis
• Remaining attacks:
  • Botnets (millions of computers)
  • Hard to identify behaviorally, under the radar
  • Zero-day: no known signatures
• Call for HELP!!
Signature-based DDoS Attack Detection
• Unknown (zero-day) attacks:
  • Some hope: attack tools usually leave a unique footprint (a repeating pattern)
  • Example in packet: Connection: KEEP-ALIVE
• Today: signatures are found manually (by human eye)
• Our goal: find them automatically
• Signatures are used by anti-DDoS devices and firewalls to stop the attack
• Mitigation in minutes, good enough for these types of attacks
Signatures also used in
• NIDS/IPS (Snort, Bro, etc.)
• Worm detection (automated extraction)
• Previous work:
  • Worm behavior (address dispersion, suspicious code, etc.)
  • Fixed-length signatures
  • Non-scalable
• Notable works:
  • Kephart et al. '94
  • Honeycomb [Kreibich et al. '04]
  • Earlybird [Singh et al. '04]
  • Autograph [Kim et al. '04]
  • Hancock [Griffin et al. '09]
System Overview
• Our challenge: automatically find signatures that appear frequently only during an attack
• Where (input collection):
  • In the mitigation box (DDoS Guard, firewall, anti-DDoS device, etc.)
  • In the cloud: collect data from several collectors
• Diagram: peace time traffic sample and attack time traffic sample → Signature Extraction → attack signatures, e.g. Connection: KEEP-ALIVE
Signature Extraction - High Level
• Inputs: peace time traffic sample, attack time traffic sample
• Find frequent strings in peace time traffic
• Find frequent strings in attack time traffic
• Take only strings found in attack and not in peace
• Output: attack signatures, e.g. Connection: KEEP-ALIVE
Our Goal
• Automatically find signatures that appear frequently only during an attack
• Requirements:
  • Find a minimal set of signatures
    • Some filtering devices have limited capacity
  • Allow signatures of varying lengths
  • Don't include signatures found in legitimate traffic
    • Minimum false positives
  • Minimize space and time usage
    • Large amounts of data
    • Quick response
Finding Frequent Strings in Traffic
• Input: sequence of packets
• Output: strings that appear frequently in packets
• Common stringology solution: use suffix trees/arrays
  • Too much space
• Our solution uses heavy hitters
Heavy Hitters (Frequent Items)
• Input: N values, integer v
• Output: at most v values, each appearing at least N/v times
• Approximate solution:
  • Uses O(v) space!
  • One pass over input!
• Known counter-based HH algorithms:
  • Misra & Gries 1982
  • Lossy Counting - Manku and Motwani 2002
  • Space Saving - Metwally et al. 2005 (currently using)
Space Saving Heavy Hitters [Metwally et al. 2005]
• Algorithm:
  • Maintain v values and their counters.
  • If the next value x is one of the v, increment its counter.
  • Else take the item with the minimal counter c:
    • Replace its value with x
    • New counter is c+1
• Error rate: N/v
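
The steps above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the class name `SpaceSaving` and its methods are hypothetical, and the linear scan for the minimal counter is a simplification that a production version would replace with a faster structure.

```python
class SpaceSaving:
    """Approximate heavy hitters using at most v counters (illustrative sketch)."""

    def __init__(self, v):
        self.v = v
        self.counters = {}  # value -> estimated count

    def add(self, x):
        if x in self.counters:
            # x is one of the v monitored values: increment its counter.
            self.counters[x] += 1
        elif len(self.counters) < self.v:
            # Still room: start monitoring x.
            self.counters[x] = 1
        else:
            # Take the item with the minimal counter c, replace it with x,
            # and give x the counter c + 1.
            victim = min(self.counters, key=self.counters.get)
            c = self.counters.pop(victim)
            self.counters[x] = c + 1


ss = SpaceSaving(v=2)
for value in "aabacada":   # N = 8 values, 5 of them are "a"
    ss.add(value)
print(ss.counters)
```

In this run the count for "a" is exact, while the estimate for "d" is too high by 2, which is within the N/v = 4 bound.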
Our Solution
• Heavy hitters is usually done on numbers… how do we use it for text?
• k-grams: strings of length exactly k
• Trivial idea: for each packet:
  • Take all k-grams (sliding window)
  • Do heavy hitters on them
• Fixed length is not good enough
  • Either too short: cuts up longer signatures
    • Substring pollution: too many heavy hitters for one signature
  • Or too long: noisy signatures
• Example: payload abcabcadefgfsdghjghnfdghfgsdhfjs gives k-grams b1 = abca, b2 = bcab, b3 = cabc
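
A minimal sketch of the sliding-window k-gram extraction, using the slide's example payload with k = 4. The helper name `kgrams` is hypothetical. The second call illustrates substring pollution: a single 10-character signature is cut into 7 separate 4-grams.

```python
def kgrams(payload, k):
    """Return all substrings of length exactly k, in order (sliding window)."""
    return [payload[i:i + k] for i in range(len(payload) - k + 1)]


grams = kgrams("abcabcadefgfsdghjghnfdghfgsdhfjs", 4)
print(grams[:3])                        # b1, b2, b3 from the slide

pieces = kgrams("KEEP-ALIVE", 4)        # one signature, many short pieces
print(len(pieces), pieces)
```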
Our Solution: Double Heavy Hitters
• Double Heavy Hitters algorithm: two separate instances of heavy hitters
  • Heavy Hitters 1: find heavy hitters of k-grams
  • Heavy Hitters 2: find heavy hitters of varying-length strings created during the run of Heavy Hitters 1
• Input to Heavy Hitters 1: k-grams
• Input to Heavy Hitters 2: strings
• Output is the output of Heavy Hitters 2
Double Heavy Hitters Algorithm
• While processing k-grams in Heavy Hitters 1:
  • Find the maximal run of consecutive k-grams that:
    • Are already in Heavy Hitters 1
    • Have counters that maintain a predefined ratio
  • Create a string from the run
  • Insert the string into Heavy Hitters 2
• Example (figure): k-grams bcab abca cabc dabc bcab abca cabc abcd cdab abca bcda; already in Heavy Hitters 1? N N N Y Y Y N N N N Y; strings abca, abcabc; check ratio
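
One way the run-building step could look in code is sketched below. Several details are assumptions, since the slides do not specify them: the ratio test (here the smaller of two consecutive counters divided by the larger must be at least `ratio`), the reset of runs at packet boundaries, and all names (`SpaceSaving`, `double_heavy_hitters`, `v1`, `v2`).

```python
class SpaceSaving:
    """Compact Space Saving counter set (illustrative, linear-time eviction)."""

    def __init__(self, v):
        self.v = v
        self.counters = {}

    def add(self, x):
        if x in self.counters:
            self.counters[x] += 1
        elif len(self.counters) < self.v:
            self.counters[x] = 1
        else:
            victim = min(self.counters, key=self.counters.get)
            self.counters[x] = self.counters.pop(victim) + 1


def double_heavy_hitters(packets, k, v1, v2, ratio=0.5):
    hh1, hh2 = SpaceSaving(v1), SpaceSaving(v2)
    for payload in packets:
        run, prev_count = "", 0              # assumption: runs never span packets
        for i in range(len(payload) - k + 1):
            gram = payload[i:i + k]
            seen_before = gram in hh1.counters   # "already in Heavy Hitters 1?"
            hh1.add(gram)
            count = hh1.counters[gram]
            # Assumed ratio rule: consecutive counters must be close to each other.
            similar = (not run) or min(count, prev_count) / max(count, prev_count) >= ratio
            if seen_before and similar:
                # Extend the run by one character (or start it with this k-gram).
                run = run + gram[-1] if run else gram
            else:
                if run:
                    hh2.add(run)             # the run ended: insert its string into HH2
                run = gram if seen_before else ""
            prev_count = count
        if run:
            hh2.add(run)
    return hh1, hh2


hh1, hh2 = double_heavy_hitters(["abcabcabcd"], k=4, v1=10, v2=10)
print(hh2.counters)
```

On the input abcabcabcd the run grows as abca, abcab, abcabc and is then inserted into Heavy Hitters 2 when the unseen k-gram abcd ends it.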
Double Heavy Hitters Algorithm
• Example: input abcabcabcd
  • String = abca
  • String = abcab
  • String = abcabc
Heavy Hitters on text – improving the estimation
• Problem: substrings in heavy hitters
  • Only the longest run is given as input to HH2
• Correct the count:
  • After the run of the algorithm
  • For all strings s in Heavy Hitters 2:
    • Find other strings which contain s and add their counters to s's counter
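
The count correction can be sketched as a post-processing pass over the Heavy Hitters 2 counters. The function name `correct_counts` and the sample counters are hypothetical. The sketch assumes that a containing string contributes its counter once, even if s occurs in it several times, and that the original (uncorrected) counters are the ones added.

```python
def correct_counts(hh2_counts):
    """For each string s, add the counters of all other strings that contain s."""
    corrected = {}
    for s, count in hh2_counts.items():
        extra = sum(c for t, c in hh2_counts.items() if t != s and s in t)
        corrected[s] = count + extra
    return corrected


counts = {"abca": 3, "abcabc": 5, "xyz": 2}   # made-up HH2 output
corrected = correct_counts(counts)
print(corrected)                               # abca gains the 5 occurrences of abcabc
```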
Double Heavy Hitters Algorithm Analysis
• Input:
  • Input to HH1: N k-grams
  • Input to HH2: C consecutive grams
• Error bounds:
  • For HH1 with v items: N/v
  • For HH2 with v items: C/v
• We prove:
  • C ≤ N/(k + 1)
• Overall: error bound of the Double Heavy Hitters algorithm
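
A worked instance of the two per-stage bounds, with made-up values N = 10^6, v = 10^3 and k = 8 chosen purely for illustration. It only combines the bounds stated above and does not reproduce the overall bound from the talk.

```latex
\[
\text{HH1 error} \le \frac{N}{v} = \frac{10^{6}}{10^{3}} = 1000,
\qquad
\text{HH2 error} \le \frac{C}{v} \le \frac{N}{v\,(k+1)} = \frac{10^{6}}{10^{3}\cdot 9} \approx 111 .
\]
```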
Signature Extraction - High Level
• Inputs: peace time traffic sample, attack time traffic sample
• Find frequent strings in peace time traffic
• Find frequent strings in attack time traffic
• Take only strings found in attack and not in peace
  • Formalize with thresholds
• Output: attack signatures, e.g. Connection: KEEP-ALIVE
Choose Signatures
• Create signatures that never appear in legitimate traffic
• Thresholds: Attack-high, Peace-low, Peace-high, Delta
• Diagram: strings in attack with frequency > Attack-high
Choose Signatures
• Create signatures that never appear in legitimate traffic
• Thresholds: Attack-high, Peace-low, Peace-high, Delta
• Diagram: strings in attack with frequency > Attack-high; strings in peace time
• Regions labeled: signatures, false positives
Choose Signatures
• Create signatures that rarely appear in legitimate traffic
• Thresholds: Attack-high, Peace-low, Peace-high, Delta
• Diagram: strings in attack with frequency > Attack-high; strings in peace with frequency > Peace-low
• Regions labeled: signatures, false positives
Choose Signatures
• Create signatures that may appear in legitimate traffic, but appear in attack traffic much more
• Thresholds: Attack-high, Peace-low, Peace-high, Delta
• Diagram: strings in attack with frequency > Attack-high; strings in peace with frequency > Peace-high; strings in peace with frequency > Peace-low
• Signatures only if attack frequency is at least delta more than peace frequency
• Regions labeled: signatures, false positives
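
The threshold rules can be read as a small decision procedure, sketched below. This is one interpretation of the slides, not a confirmed specification: delta is assumed to be an additive margin on frequencies, frequencies are assumed to be fractions in [0, 1], and the function name, threshold values and sample frequencies are all made up for illustration.

```python
def choose_signatures(attack_freq, peace_freq,
                      attack_high, peace_low, peace_high, delta):
    """Select attack strings as signatures using the four thresholds."""
    signatures = []
    for s, a in attack_freq.items():
        if a <= attack_high:
            continue                      # not frequent enough during the attack
        p = peace_freq.get(s, 0.0)
        if p > peace_high:
            continue                      # common in peace time: never a signature
        if p > peace_low and a - p < delta:
            continue                      # borderline string without a delta margin
        signatures.append(s)
    return signatures


attack = {"Connection: KEEP-ALIVE": 0.9, "GET / HTTP/1.1": 0.95,
          "Accept: */*": 0.6, "X-Rare": 0.1}
peace = {"GET / HTTP/1.1": 0.8, "Accept: */*": 0.2}

sigs = choose_signatures(attack, peace, attack_high=0.5,
                         peace_low=0.05, peace_high=0.5, delta=0.3)
print(sigs)
```

Here "GET / HTTP/1.1" is dropped because it is frequent in peace time, while "Accept: */*" is kept because its attack frequency exceeds its peace frequency by more than delta.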
Use peace traffic to create filters
• Use our Double Heavy Hitters algorithm on peace time traffic
• Input: peace time traffic packet payloads, e.g. abcabcadefgfsdghjghnfdghfg......, with k-grams b1 = abca, b2 = bcab, b3 = cabc, ……
• Output values are classified by frequency (scale from 0% to 100%):
  • frequency > Peace-high: white list
  • Peace-low < frequency ≤ Peace-high: maybe white list
  • frequency ≤ Peace-low: not white list
Extracting Attack Signatures
• Now use the Double Heavy Hitters algorithm on attack time traffic, with filters (modified DHH)
• Input: attack traffic packet payloads, e.g. hagdhdadjashdklahdjkasfjasbfjabfhfgahfvhsbdfjkasnkiaywtqyeffcgfacsdxasdbas, with k-grams b1 = hagd, b2 = agdh, b3 = gdhd, ……
• Heavy Hitters 1 → Heavy Hitters 2 → output values with frequency > Attack-high
• Filters: maybe white list; white list (discard a string if it is contained in a white list string)
• Output: signatures
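
The white list rule ("discard if contained in whitelist string") amounts to a substring test, sketched here. The function name and the sample strings are hypothetical, and the handling of the maybe white list is left out because the slide does not describe it.

```python
def filter_whitelisted(candidates, whitelist):
    """Drop every candidate that occurs inside some white list string."""
    return [s for s in candidates if not any(s in w for w in whitelist)]


whitelist = ["GET / HTTP/1.1", "Host: "]             # made-up peace time strings
candidates = ["KEEP-ALIVE", "HTTP/1.1", "xyz"]       # made-up attack time strings
kept = filter_whitelisted(candidates, whitelist)
print(kept)
```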
Evaluations
• Eleven tests overall:
  • Ten real attack captures
    • 5 captures of peacetime traffic
    • 5 synthetic peacetime captures
  • One synthetic attack in real peace time traffic
• Compare to human expert
Sample Signatures
• Could not be identified manually:
  • Extra newline between header fields
  • Use of upper-case characters where lower-case is usual
  • Use of a rarely used HTTP field
  • Use of a rare user agent
Results – Accuracy of Double Heavy Hitters estimation
• Graph of frequency of signatures
  • RED: actual count (frequency) in attack traffic
  • BLUE: algorithm (DHH) estimation of frequency of signatures
• Axes: Percent, Signatures
Results - Attack Rate Estimation
• Panels: tests with synthetic peace time traffic; tests with real peace time traffic
• Axes: Attack rate, Test Number
Results – Recall and Precision Estimation
• Precision: relevant packets out of all identified packets
• Recall: identified packets out of all relevant packets
• Panels: tests with real peace time traffic; tests with synthetic peace time traffic
• Average: 99.96, Worst case: 99.8
• Axes: Percent, Test Number
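
The two measures as defined above can be computed from sets of packet identifiers, as in this small sketch. The function name and the packet IDs are made up and are unrelated to the reported results.

```python
def precision_recall(identified, relevant):
    """Precision = hits / identified, recall = hits / relevant."""
    identified, relevant = set(identified), set(relevant)
    hits = identified & relevant
    precision = len(hits) / len(identified) if identified else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Packets 2, 3 and 4 are both identified by the signatures and truly attack packets.
p, r = precision_recall(identified={1, 2, 3, 4}, relevant={2, 3, 4, 5, 6})
print(p, r)
```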
Future Work
• Identify signatures always found in the same packets
• Good synthetic peace-time traffic, global white list
• Support regular expression signatures