Polygraph: Automatically Generating Signatures for Polymorphic Worms • Authors: James Newsome (CMU), Brad Karp (Intel Research), Dawn Song (CMU) • Presenter: Abhishek Karnik
Background • IDSes block Internet worm flows using signatures derived from a worm's payload. Signatures match strings at: • Fixed payload offsets • Arbitrary payload offsets • Positions described by regular expressions • Signatures are generated manually by experts after hours or days of observation • Researchers have recently turned to automating this slow process [Honeycomb, Autograph, EarlyBird] • Automated systems produce signatures by extracting common byte patterns across different suspicious flows
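The three matching styles listed on this slide can be illustrated with a minimal Python sketch. It assumes payloads are byte strings; the function names and the sample payload are illustrative only and are not taken from any particular IDS.

```python
import re

def match_fixed_offset(payload: bytes, pattern: bytes, offset: int) -> bool:
    """True if `pattern` appears at exactly `offset` in the payload."""
    return payload[offset:offset + len(pattern)] == pattern

def match_any_offset(payload: bytes, pattern: bytes) -> bool:
    """True if `pattern` appears anywhere in the payload."""
    return pattern in payload

def match_regex(payload: bytes, regex: bytes) -> bool:
    """True if the regular expression matches somewhere in the payload."""
    # DOTALL lets '.' span newline bytes, which are common in payloads.
    return re.search(regex, payload, re.DOTALL) is not None

# Hypothetical payload used only for demonstration.
payload = b"GET /index.html HTTP/1.1\r\nHost: example.org\r\n\r\n"
print(match_fixed_offset(payload, b"GET ", 0))
print(match_any_offset(payload, b"HTTP/1.1"))
print(match_regex(payload, rb"GET .*HTTP/1\.1\r\nHost:"))
```

A fixed-offset match is the cheapest to evaluate but the easiest to evade by shifting content; the regular-expression form is the most flexible of the three.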
Previous Automated Methods • Signatures are based on a single contiguous substring of sufficient length from a worm's payload • Assumptions: • A single payload substring remains invariant across all connections of a given worm • The invariant substring is long enough to be specific to the worm and does not occur in any non-worm payload
Motivation • Future worms may be polymorphic and may thus evade signatures based on a single substring • Polymorphic obfuscators are available that leave nearly no multi-byte regions in common across their outputs
Goal of Polygraph • Present algorithms that automatically generate signatures suited to matching polymorphic worm payloads • Evaluate these algorithms to demonstrate that Polygraph produces signatures with low false negative and false positive rates
Assumptions • A worm must exploit one or more specific server software vulnerabilities • A real-world exploit contains multiple disjoint invariant substrings in all variant payloads • Invariant bytes include protocol framing bytes, which steer the server down the code path where the vulnerability exists, and possibly the value that overwrites a jump address
Approach – Exploits • A worm's payload contains three classes of bytes: • Invariant bytes • Wildcard bytes • Code bytes • More than 15 software vulnerabilities spanning various operating systems and applications were surveyed; nearly all require invariant content in any exploit • Two sources of invariant content: • Invariant exploit framing • Invariant overwrite values
Approach - Examples • Apache-Knacker exploit
Approach - Examples • Lion Exploit
Architecture • The flow classifier reassembles flows and sorts flows that share the same IP address and port number into innocuous and suspicious pools
Architecture • Anomalous or suspicious traffic is identified using honeypots or port-scan activity • Assumptions for the flow classifier: • Classification may introduce noise • The flow classifier does not distinguish between different worms, so the suspicious pool may contain a mixture of worms, which may or may not be polymorphic
Signature Generator Goals • Signature quality: few false positives on innocuous traffic and few false negatives on worm instances • Efficient signature generation • Efficient signature matching • Small signature sets (a small number of signatures) • Robustness against noise and multiple worms • Robustness against evasion and subversion
Signature Algorithms • All signatures are built from substrings called tokens • Each signature consists of one or more tokens • The following algorithms extract and analyze tokens, which are then used to create signatures • Token extraction eliminates irrelevant parts of suspicious flows • Preprocessing: • Extract the distinct substrings of minimum length α that occur in at least K of the n samples in the suspicious pool, using longest-substring algorithms • Represent each suspicious flow as a sequence of tokens and discard the rest of the payload
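The preprocessing step can be sketched in Python as follows. This is a brute-force illustration under stated assumptions, not the method used by the authors: it enumerates every substring instead of using an efficient longest-substring algorithm, it treats "distinct" as "not contained in a longer surviving token", and the names `extract_tokens` and `to_token_sequence` are hypothetical.

```python
from collections import Counter

def extract_tokens(samples, alpha, k):
    """Return substrings of length >= alpha found in at least k samples.

    Simplification (assumption): a token contained in a longer surviving
    token is dropped, so only maximal tokens are returned.
    """
    counts = Counter()
    for s in samples:
        seen = set()  # count each substring at most once per sample
        for i in range(len(s)):
            for j in range(i + alpha, len(s) + 1):
                seen.add(s[i:j])
        counts.update(seen)
    frequent = {t for t, c in counts.items() if c >= k}
    maximal = [t for t in frequent
               if not any(t != u and t in u for u in frequent)]
    return sorted(maximal, key=lambda t: (-len(t), t))

def to_token_sequence(sample, tokens):
    """Rewrite a flow as its ordered tokens, discarding all other bytes."""
    by_len = sorted(tokens, key=len, reverse=True)  # prefer longer tokens
    seq, i = [], 0
    while i < len(sample):
        for t in by_len:
            if sample.startswith(t, i):
                seq.append(t)
                i += len(t)
                break
        else:
            i += 1  # no token starts here; skip this byte
    return seq

# Hypothetical suspicious pool used only for demonstration.
pool = [b"xxonexxxtwox", b"oneyyyyyyytwoyyy", b"zzonezztwo"]
tokens = extract_tokens(pool, alpha=2, k=3)
print(tokens)
print(to_token_sequence(pool[0], tokens))
```

The enumeration is quadratic in the flow length per sample, which is acceptable for a demonstration but not for real traffic volumes.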
Signature Algorithms • Conjunction signatures • The signature is a set of tokens; a flow matches if it contains every token in the set, in any order • Matching multiple invariant tokens is more specific than matching a single token • Token-subsequence signatures • The signature is an ordered sequence of tokens; a flow matches if it contains the tokens in that order • Can be expressed as a regular expression • The signature is generated from the ordered subsequence of tokens that is present in every sample in the suspicious pool
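The difference between the two signature types shows up when the tokens appear out of order. The following Python sketch assumes byte-string payloads and tokens; the function names are illustrative.

```python
import re

def matches_conjunction(payload, tokens):
    """Conjunction signature: every token is present, in any order."""
    return all(t in payload for t in tokens)

def matches_subsequence(payload, tokens):
    """Token-subsequence signature: tokens appear in the given order."""
    pos = 0
    for t in tokens:
        idx = payload.find(t, pos)
        if idx < 0:
            return False
        pos = idx + len(t)  # the next token must start after this one
    return True

def subsequence_regex(tokens):
    """Equivalent regular expression, e.g. b'.*one.*two.*'."""
    return b".*" + b".*".join(re.escape(t) for t in tokens) + b".*"

sig = [b"one", b"two"]
print(matches_conjunction(b"xxtwoxxonexx", sig))  # tokens present, wrong order
print(matches_subsequence(b"xxtwoxxonexx", sig))
print(subsequence_regex(sig))
```

Because ordering is an extra constraint, a token-subsequence signature never matches a flow that the conjunction signature over the same tokens rejects.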
Signature Algorithms • String 1: xxonexxxtwox • String 2: oneyyyyyyytwoyyy • The longest common subsequence is "onetwo" • String alignment inserts gaps (-) so that the matching bytes line up:
x x o n e x x x - - t w o x - -
- - o n e y y y y y t w o y y y
• The resulting regular expression is ".*one.*two.*" • An alignment is assigned a score by adding 1 for each matched byte and subtracting a gap penalty Wg for each gap between matched bytes; the examples use Wg = 0.8 • ".*o.*n.*e.*z.*" has a value of 4 – 3 × 0.8 = 1.6 • ".*two.*" has a value of 3 – 0 × 0.8 = 3
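The two scores on this slide can be reproduced with a short Python sketch. It assumes that a signature's score is the total number of token bytes minus Wg for each gap between consecutive tokens, which is how the slide's arithmetic works out; the function name is hypothetical.

```python
def signature_score(tokens, gap_penalty=0.8):
    """Score = matched bytes - gap_penalty * (gaps between tokens)."""
    matched = sum(len(t) for t in tokens)
    gaps = max(len(tokens) - 1, 0)  # leading and trailing '.*' are free
    return matched - gap_penalty * gaps

# ".*o.*n.*e.*z.*": four one-byte tokens, three gaps
print(round(signature_score(["o", "n", "e", "z"]), 2))
# ".*two.*": one three-byte token, no gaps
print(signature_score(["two"]))
```

The penalty makes one contiguous three-byte token score higher than four scattered single bytes, so the alignment favors contiguous matches.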
Signature Algorithms • Bayes signatures • A probabilistic matching method • The signature is a set of tokens, each associated with a score, plus an overall threshold • Construction and matching are less rigid than for conjunction and token-subsequence signatures • Allows signatures to be learned from suspicious pools that also contain unrelated worms and innocuous flows • A flow is classified according to the distribution from which its token set is more likely to have been generated
Signature Algorithms • The classifier compares Pr[worm | x] / Pr[~worm | x] for a flow x • The threshold is set so that the classifier reports a positive only if the flow is sufficiently far from the decision boundary, which helps with handling noise • Each token is assigned a score based on its probability of coming from a given pool • The scores of the tokens present in a flow are added together; if the total is greater than the threshold, the flow is classified as a worm
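A rough Python sketch of Bayes-style scoring and matching follows. Several details are assumptions made for illustration and are not stated on the slides: each token's score is the log of the ratio of its smoothed frequency in the suspicious pool to its smoothed frequency in the innocuous pool, add-one smoothing is used, and the threshold value is arbitrary.

```python
import math

def train_bayes_scores(tokens, suspicious, innocuous, smoothing=1.0):
    """Assign each token a log-likelihood-ratio score (assumed form)."""
    scores = {}
    for t in tokens:
        in_worm = sum(t in f for f in suspicious)
        in_ok = sum(t in f for f in innocuous)
        # Smoothing (assumption) avoids division by zero and log(0).
        p_worm = (in_worm + smoothing) / (len(suspicious) + 2 * smoothing)
        p_ok = (in_ok + smoothing) / (len(innocuous) + 2 * smoothing)
        scores[t] = math.log(p_worm / p_ok)
    return scores

def bayes_match(payload, scores, threshold):
    """Add the scores of the tokens present; compare with the threshold."""
    total = sum(s for t, s in scores.items() if t in payload)
    return total > threshold

# Hypothetical pools used only for demonstration.
suspicious = [b"aaONEbbTWO", b"ONEccTWOdd", b"eeONEffTWO"]
innocuous = [b"hello world", b"ONE fish", b"plain text", b"more text"]
scores = train_bayes_scores([b"ONE", b"TWO"], suspicious, innocuous)
print(bayes_match(b"xxONEyyTWOzz", scores, threshold=2.0))
print(bayes_match(b"ONE fish", scores, threshold=2.0))
```

In this toy data the token that also occurs in innocuous traffic receives a lower score, so its presence alone is not enough to cross the threshold.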
Generating Multiple Signatures • The suspicious pool could contain more than one type of worm • The suspicious pool is divided into clusters, each containing similar flows • One signature is output per cluster • Quality of clusters: • Clusters should not be too general • Clusters should not be too specific
Hierarchical Clustering • Used for the token-subsequence and conjunction algorithms • Start with s clusters, which yield s signatures • Iteratively merge clusters, each merge producing a more sensitive signature • For each candidate merge, determine what the merged signature would be and use innocuous flows to estimate its false positive rate • The lower the false positive rate, the more specific the signature and the more similar the two clusters • Stop clustering when merging any two remaining clusters would give a high false positive rate, or when only one cluster remains
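The merge loop described above can be sketched in Python using conjunction signatures. The sketch makes several assumptions for brevity: each flow is already reduced to a set of tokens, the signature of a merged cluster is the intersection of the two clusters' token sets, the innocuous pool is non-empty, and `max_fp` is a hypothetical cutoff for a "high" false positive rate.

```python
from itertools import combinations

def false_positive_rate(signature, innocuous):
    """Fraction of innocuous flows that contain every token."""
    if not signature:
        return 1.0  # an empty signature matches everything
    hits = sum(all(t in f for t in signature) for f in innocuous)
    return hits / len(innocuous)

def cluster_signatures(flow_token_sets, innocuous, max_fp=0.01):
    """Greedily merge the pair whose merged signature has the lowest
    estimated false positive rate; stop when that rate is too high."""
    clusters = [frozenset(s) for s in flow_token_sets]
    while len(clusters) > 1:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            merged = clusters[i] & clusters[j]  # tokens common to both
            fp = false_positive_rate(merged, innocuous)
            if best is None or fp < best[0]:
                best = (fp, i, j, merged)
        fp, i, j, merged = best
        if fp > max_fp:
            break  # every remaining merge is too general
        clusters = [c for n, c in enumerate(clusters) if n not in (i, j)]
        clusters.append(merged)
    return clusters

# Hypothetical token sets for two unrelated worms, two flows each.
flows = [{b"GET", b"evil1", b"ret1"}, {b"GET", b"evil1", b"ret1", b"junk"},
         {b"POST", b"evil2"}, {b"POST", b"evil2", b"pad"}]
innocuous = [b"GET /index", b"POST /form", b"GET /a then POST /b"]
for sig in cluster_signatures(flows, innocuous):
    print(sorted(sig))
```

With this toy input the loop stops at two clusters, because merging the two worms would leave an empty signature that matches every innocuous flow.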
Experiments • K = 3 • α = 2 • Minimum cluster size = 3 • Network traces: Intel Research Pittsburgh, October 2004 • DNS traces from a major academic institution • Intel Pentium III running Linux 2.4.20
Conclusions • Polygraph works for polymorphic worms • Content variability is limited by the nature of the software vulnerability • Signatures use multiple disjoint strings that are invariant across copies of a worm • Accurate signatures can be generated automatically for polymorphic worms • Low false positive rates were demonstrated with real exploits on real traffic traces
Strengths • A new concept in intrusion detection that merits further exploration • Well-written paper that covers almost all relevant aspects and provides three algorithms
Weaknesses • Vulnerable to overtraining attacks • Vulnerable to long-tail attacks
Potential Extensions • Applying Polygraph to a distributed IDS • Adapting to IPv6