Anomaly/Intrusion Detection and Prevention in Challenging Network Environments

Yan Chen, Department of Electrical Engineering and Computer Science, Northwestern University, Lab for Internet & Security Technology (LIST), http://list.cs.northwestern.edu

The Spread of Sapphire/Slammer Worms

Current Intrusion Detection Systems (IDS) • Mostly host-based and not scalable to high-speed networks • Slammer worm infected 75,000 machines in <10 mins • Host-based schemes are inefficient and user dependent • Have to install IDS on all user machines! • Mostly simple signature-based • Inaccurate, e.g., with polymorphism • Cannot recognize unknown anomalies/intrusions

Current Intrusion Detection Systems (II) • Cannot provide quality info for forensics or situational-aware analysis • Hard to differentiate malicious events from unintentional anomalies • Anomalies can be caused by network element faults (e.g., router misconfiguration, link failures) or by application misconfiguration (such as P2P) • Cannot provide situational-aware info: attack scope/target/strategy, attacker (botnet) size, etc.

Network-based Intrusion Detection, Prevention, and Forensics System • Online traffic recording [SIGCOMM IMC 2004, INFOCOM 2006, ToN 2007] [INFOCOM 2008] • Reversible sketch for data streaming computation • Record millions of flows (GB traffic) in a few hundred KB • Small number of memory accesses per packet • Scalable to large key space size (2^32 or 2^64) • Online sketch-based flow-level anomaly detection [IEEE ICDCS 2006] [IEEE CG&A, Security Visualization 2006] • Adaptively learn the traffic pattern changes • As a first step, detect TCP SYN flooding, horizontal and vertical scans even when mixed • Online stealthy spreader (botnet scan) detection [IEEE IWQoS 2007]
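The idea of recording many flows in a small, fixed amount of memory with a few memory accesses per packet can be illustrated with a count-min sketch. This is a simpler stand-in and not the reversible sketch cited above (it cannot recover keys from its counters); the class name, table dimensions, and hashing scheme are assumptions made for the illustration.

```python
import hashlib


class CountMinSketch:
    """Fixed-size counter table; estimates never undercount a key."""

    def __init__(self, rows=4, cols=1024):
        self.rows = rows
        self.cols = cols
        self.table = [[0] * cols for _ in range(rows)]

    def _index(self, row, key):
        # One hash function per row, derived by salting with the row number.
        digest = hashlib.blake2b(
            key.encode(), digest_size=8, salt=bytes([row])
        ).digest()
        return int.from_bytes(digest, "big") % self.cols

    def update(self, key, count=1):
        # Constant work per packet: one counter touched in each row.
        for row in range(self.rows):
            self.table[row][self._index(row, key)] += count

    def estimate(self, key):
        # Collisions only inflate counters, so the minimum is the best bound.
        return min(
            self.table[row][self._index(row, key)] for row in range(self.rows)
        )


# Hypothetical flow keys; real keys could be IP/port tuples.
sketch = CountMinSketch()
for _ in range(5):
    sketch.update("10.0.0.1:80")
sketch.update("10.0.0.2:443")
```

Memory use depends only on `rows * cols`, not on the number of distinct flows, which is the property the slide relies on.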

Network-based Intrusion Detection, Prevention, and Forensics System (II) • Polymorphic worm signature generation & detection [IEEE Symposium on Security and Privacy 2006] [IEEE ICNP 2007] • Accurate network diagnostics [SIGCOMM IMC 2003, SIGCOMM 2004, ToN 2007] [SIGCOMM 2006] [INFOCOM 2007 (2)] • Scalable distributed intrusion alert fusion w/ DHT [SIGCOMM Workshop on Large Scale Attack Defense 2006]

Network-based Intrusion Detection, Prevention, and Forensics System (III) • Large-scale botnet and P2P misconfiguration event situational-aware forensics [work under submission] • Botnet attack target/strategy inference • Root cause analysis of the P2P misconfiguration/poisoning traffic • NetShield: vulnerability signature based NIDS for high performance network defense [work in progress] • Vulnerability analysis of wireless network protocols and its defense [work in progress]

System Deployment • Attached to a router/switch as a black box • Edge network detection particularly powerful [Figure: three deployment diagrams (a)-(c) showing the HPNAIDM/RAND system attached via scan ports and splitters to routers and switches between LANs and the Internet; panel labels: "Original configuration", "Monitor each port separately", "Monitor aggregated traffic from all ports"]

NetShield: Matching with a Large Vulnerability Signature Ruleset for High Performance Network Defense

Outline Motivation Feasibility Study: a Measurement Approach High Speed Parsing High Speed Matching for Large Rulesets. Evaluation Conclusions

Motivation • Desired Features for Signature-based NIDS/NIPS • Accuracy (especially for IPS) • Speed • Coverage: Large ruleset [Figure annotations: "Focus of this work"; "Cannot capture vulnerability condition well!"; Shield [SIGCOMM'04]]

Vision of NetShield

Research Challenges • Background • Use protocol semantics to express vulnerability • Protocol state machine & predicates for each state • Example: ver==1 && method==“put” && len(buf)>300 • Challenges • Matching thousands of vulnerability signatures simultaneously • Sequential matching → parallel matching • High speed parsing • Applicability for large NIDS/NIPS rulesets
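The example predicate on this slide can be written out as a check over already-parsed protocol fields. The sketch below assumes a hypothetical `ParsedRequest` record produced by some protocol parser; only the predicate itself comes from the slide.

```python
from dataclasses import dataclass


@dataclass
class ParsedRequest:
    # Hypothetical fields a protocol parser would expose for one PDU.
    ver: int
    method: str
    buf: bytes


def matches_example_signature(pdu: ParsedRequest) -> bool:
    # ver==1 && method=="put" && len(buf)>300
    return pdu.ver == 1 and pdu.method == "put" and len(pdu.buf) > 300


exploit = ParsedRequest(ver=1, method="put", buf=b"A" * 400)
benign = ParsedRequest(ver=1, method="put", buf=b"A" * 10)
```

The condition is stated on field values and lengths rather than on byte patterns, so it does not depend on what the overlong buffer actually contains.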

Outline Motivation Feasibility Study: a Measurement Approach Given a large NIDS/NIPS ruleset, what percentage of the rules can be improved with protocol semantic vulnerability signatures? High Speed Parsing High Speed Matching for Large Rulesets. Evaluation Conclusions

Measure Snort Rules • Semi-manually classify the rules. • Group by CVE-ID • Manually look at each vulnerability • Results • 86.7% of rules can be improved by protocol semantic vulnerability signatures. • Most of the remaining rules (9.9%) relate to web DHTML and scripts, which are not suitable for a signature-based approach. • On average 4.5 Snort rules are reduced to one vulnerability signature. • For binary protocols the reduction ratio is much higher than for text-based ones. • For netbios.rules the ratio is 67.6.

Observation • PDU → parse tree • Leaf nodes are integers or strings • Vulnerability signatures are mostly based on leaf nodes • Only need to parse the fields related to signatures • Traditional recursive descent parsers (BINPAC), which need one function call per node, are too expensive. [Figure label: PDU array]
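The observation that only signature-related fields need to be parsed can be illustrated with a single-pass field extractor. The sketch assumes a toy type-length-value PDU layout (1-byte field id, 2-byte big-endian length, value); this layout and the function names are invented for the example and are neither a real protocol nor the parsing state machine from this talk.

```python
import struct


def extract_fields(pdu: bytes, wanted: set) -> dict:
    """Walk a toy TLV-encoded PDU once, keeping only the wanted field ids."""
    found = {}
    offset = 0
    while offset + 3 <= len(pdu):
        field_id, length = struct.unpack_from("!BH", pdu, offset)
        offset += 3
        if field_id in wanted:
            found[field_id] = pdu[offset:offset + length]
        # Unwanted fields are skipped without being copied or decoded.
        offset += length
    return found


def tlv(field_id: int, value: bytes) -> bytes:
    # Helper that builds one field in the toy layout.
    return struct.pack("!BH", field_id, len(value)) + value


pdu = tlv(1, b"\x00\x01") + tlv(2, b"x" * 50) + tlv(3, b"put")
fields = extract_fields(pdu, wanted={1, 3})
```

No parse tree is built and no per-node function call is made; the 50-byte field that no signature refers to is stepped over by adjusting an offset.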

Problem Formulation • Data representations • For all the vulnerability signatures we studied, we only need integers and strings • Integer operators: ==, >, < • String operators: ==, match_re(.,.), len(.) • Buffer constraint • The string fields could be too long to buffer. • Field dependency • Array • Associative array • Mutually exclusive fields • PDU level protocol state machine

Matching Problems (cont.) • Example signature for Blaster worm • Single PDU matching problem (SPM) • Multiple PDU matching problem (MPM)

Requirements of matching • Suppose we have n signatures, each defined on k matching dimensions (matchers) • A matcher is a two-tuple (field, operation), or a four-tuple for associative array elements. • Challenges for SPM • Large number of signatures n • Large number of matchers k • Large number of “don’t cares” • Cannot reorder the matchers arbitrarily (buffer constraint) • Field dependency • Array • Associative array • Mutually exclusive fields
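A minimal sketch of this formulation, assuming signatures are stored as maps from a (field, operation) matcher to its operand, with omitted matchers treated as "don't cares". The signature contents, operation names, and the sequential baseline are illustrative assumptions, shown only to make the cost of naive matching concrete.

```python
import operator
import re

# Supported operations; the names are illustrative.
OPS = {
    "==": operator.eq,
    ">": operator.gt,
    "<": operator.lt,
    "len>": lambda value, bound: len(value) > bound,
    "match_re": lambda value, pattern: re.search(pattern, value) is not None,
}

# Each signature lists only the matchers it constrains; any (field, operation)
# pair it omits is a "don't care".
SIGNATURES = {
    "sig_a": {("ver", "=="): 1, ("method", "=="): "put", ("buf", "len>"): 300},
    "sig_b": {("method", "=="): "get", ("uri", "match_re"): r"\.\./"},
}


def sequential_match(pdu: dict) -> list:
    """Baseline: test every signature in turn, up to n * k matcher evaluations."""
    hits = []
    for name, matchers in SIGNATURES.items():
        if all(
            field in pdu and OPS[op](pdu[field], operand)
            for (field, op), operand in matchers.items()
        ):
            hits.append(name)
    return hits
```

With thousands of signatures this loop re-evaluates the same matcher many times, which is the motivation for evaluating each matcher once against all signatures.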

Observations • Observation 1: Most matchers are good. • After matching against them, only a small number of signatures can pass (candidates). • String matchers are all good, and most integer matchers are good. • We can buffer bad matchers to change the matching order. • Observation 2: Real world traffic mostly does not match any signature. An even stronger statement holds: in most traffic, no matcher is met. • Observation 3: NIDS/NIPS will report all the matched rules regardless of the ordering, unlike firewall rules.

Outline Motivation Feasibility Study: a Measurement Approach Problem Statement High Speed Parsing High Speed Matching for Large Rulesets. Evaluation Conclusions

Evaluation Methodology • Fully implemented and deployed to sniff a campus router hosting university Web servers and several labs. • Run on a P4 3.8 GHz single core PC w/ 4 GB memory. • Much smaller memory usage, e.g., for HTTP (791 vulnerability signatures from 941 Snort rules): DFA 5.29 GB vs. NetShield 1.08 MB

Stress Test Results • Traces from Tsinghua Univ. (TH) and Northwestern Univ. (NU) • Measured after TCP reassembly, with the PDUs preloaded in memory • For DNS we only evaluate parsing. • For WINRPC we have 45 vulnerability signatures which cover 3,519 Snort rules. • For HTTP we have 791 vulnerability signatures which cover 941 Snort rules.

Conclusions • A novel network-based vulnerability signature matching engine • Through a measurement study on the Snort ruleset, show that vulnerability signatures can improve most of the signatures in NIDS/NIPS • Propose a parsing state machine for fast parsing • Propose a candidate selection algorithm for matching a large number of vulnerability signatures simultaneously

With Our Solutions • Ongoing work: apply NetShield to the Cisco signature ruleset • Build a better Snort alternative

Backup

Parsing State Machine • Studied eight popular protocols (HTTP, FTP, SMTP, eMule, BitTorrent, WINRPC, SNMP and DNS) and their vulnerability signatures. • Protocol semantics are context sensitive • Common relationships among leaf nodes.

Example for WINRPC • Rectangles are states • Parsing variables: R0 .. R4 • 0.61 instructions/byte for BIND PDU

Matching Algorithm • Match each matcher against all the rules and combine the results together • Match single matcher • Integer range checking: Binary search tree • String exact matching: Trie • String regular expression matching: DFA, XFA, etc. • String length checking: Binary search tree
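Two of the single-matcher structures listed above can be sketched briefly. The example uses a sorted array with binary search in place of a binary search tree, and a dictionary-based trie for exact string matching; class names and the signature ids are hypothetical. Each matcher returns the set of signatures whose condition on that one field is satisfied.

```python
from bisect import bisect_left


class GreaterThanMatcher:
    """Finds the signatures requiring field > t for which t < value."""

    def __init__(self, thresholds: dict):
        # thresholds maps signature id -> t; sort once at setup time.
        pairs = sorted((t, sig) for sig, t in thresholds.items())
        self._bounds = [t for t, _ in pairs]
        self._sigs = [sig for _, sig in pairs]

    def match(self, value: int) -> set:
        # Every threshold strictly below value is satisfied.
        return set(self._sigs[:bisect_left(self._bounds, value)])


class ExactStringMatcher:
    """Character trie mapping exact strings to the signatures requiring them."""

    def __init__(self, patterns: dict):
        self._root = {}
        for sig, text in patterns.items():
            node = self._root
            for ch in text:
                node = node.setdefault(ch, {})
            node.setdefault(None, set()).add(sig)  # None marks end of string

    def match(self, value: str) -> set:
        node = self._root
        for ch in value:
            node = node.get(ch)
            if node is None:
                return set()
        return set(node.get(None, set()))


length_matcher = GreaterThanMatcher({"s1": 300, "s2": 1000, "s3": 300})
method_matcher = ExactStringMatcher({"s1": "put", "s2": "get", "s3": "put"})
```

Both lookups cost time that depends on the input value (logarithmic in the number of thresholds, linear in the string length) rather than on the number of signatures.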

Candidate Selection for SPM • Basic algorithm: pre-computation
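The slide does not spell out the algorithm, so the following is only one plausible reading of candidate selection with pre-computation: for each matcher, pre-compute the signatures that do not care about it, then narrow a candidate set one matcher at a time. The function names, the string labels used for matchers, and the two-signature ruleset are assumptions for the illustration.

```python
def build_dont_care(signatures: dict, matchers: list) -> dict:
    """Pre-computation: for each matcher, the signatures that do not use it."""
    return {
        m: {sig for sig, required in signatures.items() if m not in required}
        for m in matchers
    }


def select_candidates(signatures, matchers, dont_care, matched_by):
    """Narrow the candidate set one matcher at a time.

    matched_by[m] is the set of signatures whose condition on matcher m
    is satisfied by the current PDU (the output of a single matcher).
    """
    candidates = set(signatures)
    for m in matchers:
        # Keep signatures that matched on m or do not care about m.
        candidates &= matched_by.get(m, set()) | dont_care[m]
        if not candidates:
            break  # common case: most traffic matches nothing
    return candidates


# Hypothetical ruleset: each signature names the matchers it requires.
signatures = {
    "sig_a": {"ver==1", "method==put", "len(buf)>300"},
    "sig_b": {"method==get"},
}
matchers = ["ver==1", "method==put", "method==get", "len(buf)>300"]
dont_care = build_dont_care(signatures, matchers)

# Matcher results for a PDU with ver=1, method=put, and a 400-byte buf.
matched_by = {
    "ver==1": {"sig_a"},
    "method==put": {"sig_a"},
    "len(buf)>300": {"sig_a"},
}
result = select_candidates(signatures, matchers, dont_care, matched_by)
```

Because the don't-care sets are built once, the per-PDU work is a handful of set operations, and the loop exits early as soon as no candidate remains.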

Matching Illustration

Refinement • SPM improvement • Allow negative conditions • Handle array case • Handle associative array case • Handle mutually exclusive case • Report the matched rules as early as possible • Extend to MPM • Allow checkpoints.

Limitations of Regular Expression Signatures • Signature: 10.*01 • Polymorphic attack (worm/botnet) might not have exact regular expression based signature [Figure: traffic filtering between the Internet and our network; sample bit strings 1010101, 10111101, 11111100, 00010111 checked against the signature, some blocked (X); annotation "Polymorphism!"]
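To make the signature concrete, the snippet below evaluates 10.*01 against the four bit strings that appear on this slide. Which of those strings the original figure marks as attacks is not recoverable here, so the example only shows which ones the pattern flags.

```python
import re

SIGNATURE = re.compile(r"10.*01")

samples = ["1010101", "10111101", "11111100", "00010111"]

# A sample is flagged when "10" is followed, anywhere later, by "01".
flagged = [s for s in samples if SIGNATURE.search(s)]
# flagged == ["1010101", "10111101"]
```

The pattern constrains only byte sequences, so a variant that rewrites the matched bytes passes the filter while unrelated traffic that happens to contain them is blocked.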

Reason • Regular expressions are not powerful enough to capture the exact vulnerability condition! • RE: cannot express exact condition • Shield: can express exact condition

Outline Motivation Feasibility Study: a measurement approach Problem Statement High Speed Parsing High Speed Matching for massive vulnerability Signatures. Evaluation Conclusions

What Do We Do? • Build a NIDS/NIPS with much better accuracy and similar speed compared with regular expression based approaches • Feasibility: in the Snort ruleset (6,735 signatures), 86.7% can be improved by vulnerability signatures. • High speed parsing: 2.7~12 Gbps • High speed matching: • Efficient algorithm for matching a large number of vulnerability rules • HTTP, 791 vulnerability signatures at ~1 Gbps

Network based IDS/IPS • Accuracy (especially for IPS) • False positive • False negative • Speed • Coverage: Large ruleset • Regular expressions are not powerful enough to capture the exact vulnerability condition!
