Hamsa: Fast Signature Generation for Zero-day Polymorphic Worms with Provable Attack Resilience Lab for Internet & Security Technology (LIST), Northwestern University
Desired Requirements for Polymorphic Worm Signature Generation • Network-based signature generation • Worms spread at exponential speed, so detecting them in their early stage is crucial… However • At that early stage, only a limited number of worm samples are available. • A high-speed network router may see more worm samples… But • It must keep up with the network speed! • It can only use network-level information
Desired Requirements for Polymorphic Worm Signature Generation • Noise tolerant • Most network flow classifiers suffer from false positives. • Even host-based approaches can be injected with noise. • Attack resilience • Attackers always try to evade detection systems • Efficient signature matching for high-speed links No existing work satisfies all these requirements!
Outline • Motivation • Hamsa Design • Model-based Signature Generation • Evaluation • Related Work • Conclusion
Choice of Signatures • Two classes of signatures • Content based • Token: a substring with reasonable coverage of the suspicious traffic • Signature: a conjunction of tokens • Behavior based • Our choice: content based • Fast signature matching: ASIC-based approaches can achieve 6 ~ 8 Gb/s • Generic, independent of any protocol or server
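To make the content-based choice concrete, here is a minimal sketch of how a token-conjunction signature would be matched against a flow payload. The token set shown is a hypothetical combination for illustration (one token borrowed from the Code-Red II example, one from the CLET example), not a real worm signature:

```python
def matches(signature, payload):
    """A conjunction signature matches a flow iff every token
    appears somewhere in the payload (order does not matter)."""
    return all(token in payload for token in signature)

# Hypothetical token set, for illustration only
signature = [b'.ida?', b'\xff\xff\xff']
payload = b'GET /x.ida?AAAA...\xff\xff\xff...'
print(matches(signature, payload))   # True
```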
Unique Invariants of Worms • Protocol frame • The code path to the vulnerable code, usually infrequently exercised • Code-Red II: '.ida?' or '.idq?' • Control data: leads to control-flow hijacking • A hard-coded value that overwrites a jump target or a function call • Worm executable payload • CLET polymorphic engine: '0\x8b', '\xff\xff\xff' and 't\x07\xeb' • Worms with no such invariants are possible, but very hard to build
Components from existing work • Worm flow classifiers • Scan based detector [Autograph] • Byte spectrum based approach [PAYL] • Honeynet/Honeyfarm sensors [Honeycomb]
Hamsa Design • Key idea: model the uniqueness of worm invariants • Greedy algorithm for finding token-conjunction signatures • Highly accurate yet much faster • Both analytically and experimentally • Compared with the latest work, Polygraph • Suffix-array-based token extraction • Provable attack-resilience guarantee • Noise tolerant
Outline • Motivation • Hamsa Design • Model-based Signature Generation • Evaluation • Related Work • Conclusion
Hamsa Signature Generator • Core part: Model-based Greedy Signature Generation • Iterative approach for multiple worms
Problem Formulation • Input: a suspicious pool and a normal pool • The signature generator outputs a signature that maximizes coverage of the suspicious pool, subject to a false positive bound r in the normal pool • Without noise, this can be solved in linear time using token extraction • With noise: NP-Hard!
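Restating the slide's formulation in notation, with M the suspicious pool, N the normal pool, S a candidate token set, and r the false positive bound:

```latex
\max_{S}\ \mathrm{COV}(S)=\frac{|\{x\in\mathcal{M}: x \text{ matches } S\}|}{|\mathcal{M}|}
\quad\text{subject to}\quad
\mathrm{FP}(S)=\frac{|\{x\in\mathcal{N}: x \text{ matches } S\}|}{|\mathcal{N}|}\le r
```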
Model Uniqueness of Invariants • U(1) = upper bound on FP(t1), the false positive rate of a single token • U(2) = upper bound on FP(t1, t2), the joint false positive rate of a token pair • The total number of tokens is bounded by k* • [Slide table: example per-token false positives (21%, 9%, 17%, 5%) and joint false positives with t1 (2%, 0.5%, 1%), showing how joint false positives shrink as tokens are conjoined]
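The model generalizes to every prefix of the signature: for each i up to the token budget k*, the joint false positive rate of the first i chosen tokens is assumed bounded,

```latex
\mathrm{FP}(t_1, t_2, \ldots, t_i) \;\le\; u(i), \qquad i = 1, \ldots, k^{*},
\qquad u(1) \ge u(2) \ge \cdots \ge u(k^{*})
```

where the u(i) are non-increasing because adding a token to a conjunction can only shrink the set of flows it matches.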
Signature Generation Algorithm • Step 1: token extraction from the suspicious pool; order candidate tokens by coverage • Select the first token t1 subject to u(1) = 15% • [Slide figure: candidate tokens as (COV, FP) pairs: (82%, 50%), (70%, 11%), (67%, 30%), (62%, 15%), (50%, 25%), (41%, 55%), (36%, 41%), (12%, 9%); the highest-coverage token whose FP is within u(1) is chosen, here (70%, 11%)]
Signature Generation Algorithm (continued) • Step 2: order the remaining tokens by joint coverage with t1; select t2 subject to u(2) = 7.5% • [Slide figure: joint (COV, FP) pairs with t1: (69%, 9.8%), (68%, 8.5%), (67%, 1%), (40%, 2.5%), (35%, 12%), (31%, 9%), (10%, 0.5%); the highest joint-coverage pair whose joint FP is within u(2), here (67%, 1%), yields the signature {t1, t2}]
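A minimal sketch of the greedy selection loop the two figures illustrate, reusing matches() from the earlier sketch; cov() doubles as the FP measure when run over the normal pool. This is a simplification of Hamsa's core algorithm, not a faithful reimplementation:

```python
def cov(sig, pool):
    """Fraction of flows in `pool` matched by the token set `sig`."""
    return sum(matches(sig, x) for x in pool) / len(pool)

def greedy_signature(tokens, suspicious, normal, u, r):
    """u[i] is the model bound on the joint FP of an i-token
    signature (u[0] unused); stop once the FP goal r is met."""
    sig = []
    for i in range(1, len(u)):
        # candidates whose joint FP stays within the model bound u(i)
        eligible = [sig + [t] for t in tokens
                    if t not in sig and cov(sig + [t], normal) <= u[i]]
        if not eligible:
            break
        # greedy step: keep the candidate with the highest coverage
        sig = max(eligible, key=lambda c: cov(c, suspicious))
        if cov(sig, normal) <= r:
            return sig
    return None   # no signature meets the false positive bound
```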
Algorithm Runtime Analysis • Preprocessing: O(m + n + T*l + T*(|M| + |N|)) • Running time: O(T*(|M| + |N|)) • In most cases |M| < |N|, so this reduces to O(T*|N|)
Provable Attack Resilience Guarantee • Proved a worst-case bound on the false negative rate for a given false positive bound • Analytically bounds the damage the worst-case attacker can do! • Example: k* = 5, u(1) = 0.2, u(2) = 0.08, u(3) = 0.04, u(4) = 0.02, u(5) = 0.01 and r = 0.01 • The better the flow classifier, the lower the false negatives
Attack Resilience Assumptions • Common assumptions for any signature generation system • The attacker cannot control which worm samples are encountered by Hamsa • The attacker cannot control which of the encountered worm samples are classified as worm samples by the flow classifier • Unique assumptions for token-based schemes • The attacker cannot change the frequency of tokens in normal traffic • The attacker cannot control which of the encountered normal samples are (mis)classified as worm samples by the flow classifier
Attack Resilience Assumptions (II) • Attacks on the flow classifier • Our approach does not depend on perfect flow classifiers • But with 99% noise, no approach can work! • High noise injection makes the worm propagate less efficiently • Enhancing flow classifiers • Cluster suspicious flows by return messages • Information-theory-based approaches (DePaul Univ.)
Generalizing Signature Generation with Noise • BEST signature = balanced signature • Balance sensitivity against specificity • Define a scoring function score(cov, fp, …) to evaluate the goodness of a signature • Intuition behind the one currently used: it is worth reducing the coverage by a factor of a if the false positive becomes 10 times smaller • Add a small weight on signature length (LEN) to break ties between signatures with the same coverage and false positive (see the sketch below)
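A hedged sketch of such a scoring function; the exact form and constants used in Hamsa are not shown on the slide, and a is a hypothetical trade-off constant. It is built so the score is unchanged when coverage drops by a factor of a while the false positive drops by a factor of 10, with a tiny length term as tie-breaker:

```python
import math

def score(cov, fp, length, a=5.0, eps=1e-9):
    """Balanced scoring sketch: score(cov/a, fp/10, L) == score(cov, fp, L),
    matching the slide's trade-off intuition. `length` only breaks ties
    between otherwise equal signatures; `a` is a hypothetical constant."""
    return cov * a ** (-math.log10(fp + eps)) + eps * length
```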
Hamsa Signature Generator Next: Token extraction and token identification
Token Extraction • Problem formulation: • Input: a set of strings, a minimum length l, and a minimum coverage COVmin • Output: • A set of tokens (substrings) that meet the minimum length and coverage requirements • Coverage: the fraction of strings containing the token • Corresponding sample vectors for each token • Main techniques: • Suffix array • LCP (Longest Common Prefix) array and LCP intervals • Token Extraction Algorithm (TEA)
Suffix Array • Illustration by example (see the sketch below) • String1: abrac, String2: adabra • Concatenated: abracadabra$ • All suffixes: a$, ra$, bra$, abra$, dabra$, … • Sort all the suffixes • 4n space • Sorting can be done in 4n space and O(n log(n)) time
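A naive construction for the slide's example; a real implementation would use a linear-time algorithm such as SA-IS to meet the 4n-space figure, but this is enough to see what the structure is:

```python
def suffix_array(s):
    """Sort suffix start positions by suffix text.
    O(n log n) comparisons, each up to O(n): fine for illustration."""
    return sorted(range(len(s)), key=lambda i: s[i:])

s = "abrac" + "adabra" + "$"        # the slide's example: abracadabra$
for i in suffix_array(s)[:5]:
    print(i, s[i:])                  # '$', 'a$', 'abra$', 'abracadabra$', 'acadabra$'
```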
LCP Array and LCP Intervals • LCP intervals correspond to tokens • [Slide figure: example LCP intervals over the suffix array, e.g. 0-[0,10], 1-[0,4], 2-[9,10], 3-[5,6], 4-[1..2]]
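The LCP array (length of the common prefix between each suffix and its predecessor in sorted order) can be computed in O(n) with Kasai's algorithm. A sketch reusing suffix_array() from above:

```python
def lcp_array(s, sa):
    """Kasai's algorithm: lcp[i] = length of the longest common
    prefix of suffixes sa[i-1] and sa[i]; lcp[0] = 0."""
    n = len(s)
    rank = [0] * n
    for i, start in enumerate(sa):
        rank[start] = i
    lcp, h = [0] * n, 0
    for i in range(n):               # walk suffixes in text order
        if rank[i] > 0:
            j = sa[rank[i] - 1]      # predecessor in sorted order
            while i + h < n and j + h < n and s[i + h] == s[j + h]:
                h += 1
            lcp[rank[i]] = h
            h = max(h - 1, 0)        # h can drop by at most 1 per step
        else:
            h = 0
    return lcp
```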
Token Extraction Algorithm (TEA) • Find eligible LCP intervals first • Then extract the tokens (a simplified sketch follows)
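A deliberately simplified sketch of the TEA idea, reusing suffix_array() and lcp_array() from above: it scans the suffix array for runs of suffixes sharing a length-l prefix and counts how many distinct samples they cover. The real TEA walks full LCP intervals to recover maximal tokens of every length; this version reports only length-l tokens:

```python
def extract_tokens(samples, l, cov_min):
    """Return {token: set of sample indices containing it} for the
    length-l substrings covering >= cov_min of the samples.
    Assumes samples contain no NUL bytes (used as separator)."""
    sep = "\x00"
    text = sep.join(samples) + sep
    owner = []                         # owner[p] = sample index of position p
    for idx, smp in enumerate(samples):
        owner.extend([idx] * (len(smp) + 1))
    sa = suffix_array(text)
    lcp = lcp_array(text, sa)
    tokens, i, n = {}, 0, len(sa)
    while i < n:
        j = i
        while j + 1 < n and lcp[j + 1] >= l:
            j += 1                     # suffixes sa[i..j] share >= l chars
        pref = text[sa[i]:sa[i] + l]
        if len(pref) == l and sep not in pref:
            covered = {owner[p] for p in sa[i:j + 1]}
            if len(covered) >= cov_min * len(samples):
                tokens[pref] = covered
        i = j + 1
    return tokens
```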
Token Identification • For normal traffic, pre-compute and store the suffix array offline • For a given token, a binary search in the suffix array yields the corresponding LCP interval • O(log(n)) time complexity • A more sophisticated O(1) algorithm is possible, but may require more space
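A sketch of the lookup against the precomputed normal-pool suffix array; each comparison examines up to |token| characters, so this is O(|token| log n) rather than the slide's idealized O(log n):

```python
def sa_range(text, sa, token):
    """Lower/upper binary search for the block of suffixes that start
    with `token`; its size is the occurrence count, from which the
    token's false positive rate over the normal pool can be derived."""
    def pref(i):
        return text[sa[i]:sa[i] + len(token)]
    lo, hi = 0, len(sa)
    while lo < hi:                     # lower bound
        mid = (lo + hi) // 2
        if pref(mid) < token:
            lo = mid + 1
        else:
            hi = mid
    start, hi = lo, len(sa)
    while lo < hi:                     # upper bound
        mid = (lo + hi) // 2
        if pref(mid) <= token:
            lo = mid + 1
        else:
            hi = mid
    return start, lo                   # occurrences = lo - start
```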
Implementation Details • Token extraction: extract a set of tokens with minimum length l and minimum coverage COVmin • Polygraph uses a suffix-tree-based approach: 20n space and time consuming • Our approach: enhanced suffix array, 8n space and much faster (at least 20 times)! • Calculate false positives when checking the U-bounds (token identification) • Again suffix-array based, but for a 300MB normal pool the 1.2GB suffix array is still large! • Optimization: using mmap, memory usage drops to 150 ~ 250MB
Hamsa Signature Generator Next: signature refinement
Signature Refinement • Why refinement? • To produce a signature with the same sensitivity but better specificity • How? • After the core algorithm produces the greedy signature, we assume the samples matched by it are all worm samples • This reduces to signature generation without noise: do another round of token extraction
Extend to Detect Multiple Worms • Iteratively apply the single-worm detector to detect multiple worms (see the sketch below) • In the first iteration, the algorithm finds the signature of the most popular worm in the suspicious pool • All other worms and normal traffic are treated as noise
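A sketch of that iteration, assuming greedy_signature() and matches() from the earlier sketches; tokens_of is a hypothetical token-extraction callback (e.g. wrapping extract_tokens above):

```python
def detect_multiple_worms(suspicious, normal, tokens_of, u, r):
    """Peel off one worm per iteration: generate a signature for the
    dominant worm, drop the flows it matches, and repeat until no
    signature meets the false positive bound."""
    signatures, pool = [], list(suspicious)
    while pool:
        sig = greedy_signature(tokens_of(pool), pool, normal, u, r)
        if sig is None:                # remainder is noise only
            break
        signatures.append(sig)
        pool = [x for x in pool if not matches(sig, x)]
    return signatures
```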
Practical Issues on Data Normalization • Typical cases that need data normalization • IP packet fragmentation • TCP flow reassembly (defends against fragroute) • RPC fragmentation • URL obfuscation • HTML obfuscation • Telnet/FTP evasion via \backspace or \delete keys • Normalization translates data into a canonical form
Practical Issues on Data Normalization (II) • Hamsa works better with data normalization • Without it, or with weak normalization, Hamsa still works • But because the data may arrive in different encodings, it may produce multiple signatures for a single worm • Needs sufficient samples for each form of encoding
Outline • Motivation • Hamsa Design • Model-based Signature Generation • Evaluation • Related Work • Conclusion
Experiment Methodology • Experimental setup: • Suspicious pool: • Three pseudo-polymorphic worms based on real exploits (Code-Red II, Apache-Knacker and ATPhttpd) • Two polymorphic engines from the Internet (CLET and TAPiON) • Normal pool: 2-hour departmental HTTP trace (326MB) • Signature evaluation: • False negatives: 5000 generated worm samples per worm • False positives: • 4-day departmental HTTP trace (12.6GB) • 3.7GB web crawl including .mp3, .rm, .ppt, .pdf, .swf etc. • /usr/bin of Linux Fedora Core 4
Results on Signature Quality • Single worm with noise • Suspicious pool size: 100 and 200 samples • Noise ratio: 0%, 10%, 30%, 50%, 70% • Noise samples randomly picked from the normal pool • In every setting, Hamsa produced the same correct signatures with the same accuracy
Results on Signature Quality (II) • Suspicious pool with high noise ratio: • For noise ratios of 50% and 70%, we sometimes produce two signatures: one is the true worm signature, the other comes solely from noise, due to the locality of the noise • The false positives of these noise-only signatures are very small: • Mean: 0.09% • Maximum: 0.7% • Multiple worms with noise give similar results
Experiment: U-bound Evaluation • To be conservative we chose k* = 15 • u(k*) = u(15) = 9.16×10⁻⁶ • u(1) and ur evaluation • We tested u(1) = [0.02, 0.04, 0.06, 0.08, 0.10, 0.20, 0.30, 0.40, 0.5] • and ur = [0.20, 0.40, 0.60, 0.8] • The minimum (u(1), ur) that worked for all our worms was (0.08, 0.20) • In practice, we use the conservative values (0.15, 0.5)
Speed Results • Implemented in C++ and Python • 500 samples with 20% noise and a 100MB normal traffic pool: 15 seconds on a Xeon 2.8GHz, 112MB memory consumption • Speed comparison with Polygraph • Asymptotic runtime: O(T) vs. O(|M|²); as |M| increases, T grows much more slowly than |M|! • Experimental: 64 to 361 times faster (Polygraph vs. ours, both in Python)
Experiment: Sample Requirement • Coincidental-pattern attack [Polygraph] • Results • For the three pseudo worms, 10 samples give good results • CLET and TAPiON need at least 50 samples • Conclusion • For better signatures, to be conservative, at least 100+ samples are needed • This requires scalable and fast signature generation!
Token-fit Attack Can Fail Polygraph • Polygraph uses hierarchical clustering to find the signatures with the smallest false positives • Knowing the token distribution of the noise in the suspicious pool, the attacker can make worm samples look more like noise traffic • Different worm samples encode different noise tokens • Our approach still works!
Token-fit attack could make Polygraph fail • [Slide figure: noise samples N1, N2, N3 paired with worm samples W1, W2, W3 as merge candidates 1, 2, 3; hierarchical clustering CANNOT merge further, so NO true signature is found!]
Experiment: Token-fit Attack • Suspicious pool of 50 samples with 50% noise • Each worm sample is crafted to resemble a different noise sample • Results • Polygraph: 100% false negatives • Hamsa still obtains the correct signature, as before!
Outline • Motivation • Hamsa Design • Model-based Signature Generation • Evaluation • Related Work • Conclusion
Conclusion • Network based signature generation and matching are important and challenging • Hamsa: automated signature generation • Fast • Noise tolerant • Provable attack resilience • Capable of detecting multiple worms in a single application protocol • Proposed a model to describe the worm invariants