Polygraph: Automatically Generating Signatures for Polymorphic Worms James Newsome, Brad Karp, and Dawn Song Carnegie Mellon University. Presented by Ryan Gates. Overview. Goal Composition of a worm Invariant bytes and Tokens Types of signatures Conjunction Token Subsequence Bayes
Polygraph: Automatically Generating Signatures for Polymorphic Worms • James Newsome, Brad Karp, and Dawn Song, Carnegie Mellon University • Presented by Ryan Gates
Overview • Goal • Composition of a worm • Invariant bytes and Tokens • Types of signatures • Conjunction • Token Subsequence • Bayes • Polygraph Signature Generator • Metrics • Results • Evaluation
Goal • Automate the generation of worm signatures • Specifically polymorphic worms • Prevent polymorphic worms from going undetected • Including perfectly polymorphic instances
Decomposition of a worm • Figure 1. Polymorphed ApacheKnacker • Invariant bytes • Wildcard bytes • Code bytes
Invariant Bytes • Invariant framing • Reserved keywords or well-known binary constants that are part of the wire protocol • For example "HTTP" or "GET" • Invariant overwrite values • High-order bytes of the overwritten address • For example in BIND-TSIG "\xFF\xBF" • Many invariant substrings are too short to prevent false positives on their own. • The solution is to represent each run of invariant bytes as a token and build signatures from several tokens
Tokens • A token must not be a substring of another token • For example, keep HTTP, not TTP • Tokens are combined into one of three signature types • Conjunction Signature • Token Subsequence Signature • Bayes Signature • In a Bayes signature, each token's score reflects the probability of that token being present in an actual worm flow.
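The substring rule on the Tokens slide can be illustrated with a small sketch. It uses a brute-force search rather than any specific algorithm from the paper, and it assumes a token must appear in every sample; the function names and the `min_len` parameter are hypothetical.

```python
def common_substrings(samples, min_len=2):
    """Return every substring of length >= min_len found in all samples."""
    shortest = min(samples, key=len)
    found = set()
    for start in range(len(shortest)):
        for end in range(start + min_len, len(shortest) + 1):
            candidate = shortest[start:end]
            if not all(candidate in sample for sample in samples):
                break  # a longer extension from this start cannot match either
            found.add(candidate)
    return found


def drop_nested(tokens):
    """Keep only tokens that are not a substring of a different token."""
    return {t for t in tokens if not any(t != u and t in u for u in tokens)}


def extract_tokens(samples, min_len=2):
    """Assumed pipeline: collect shared substrings, then remove nested ones."""
    return drop_nested(common_substrings(samples, min_len))
```

With this sketch, `drop_nested({b"HTTP", b"TTP"})` keeps only `b"HTTP"`, matching the example on the slide.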
Conjunction Signatures • Every token in the conjunction signature must be found in the payload for there to be a match, in any order • Requiring all tokens reduces false positives • For example, in the Apache-Knacker signature, 'GET', 'HTTP/1.1\r\n', and ':' are tokens in a conjunction signature
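As a minimal sketch of how a conjunction signature could be matched, assuming the payload and tokens are byte strings (the function name and `APACHE_TOKENS` are illustrative, built only from the tokens listed on the slide):

```python
def matches_conjunction(payload: bytes, tokens) -> bool:
    """True only if every token occurs somewhere in the payload (order ignored)."""
    return all(token in payload for token in tokens)


# Tokens quoted on the slide for the Apache-Knacker example.
APACHE_TOKENS = [b"GET", b"HTTP/1.1\r\n", b":"]
```

A request such as `b"GET /x HTTP/1.1\r\nHost: a\r\n"` matches, while a payload missing any one of the three tokens does not.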
Token Subsequence Signatures • Similar to the conjunction signature, but more restrictive. • All tokens must be present, and in the correct order, which further reduces false positives • Typically modeled using regular expressions • For example in the Apache-Knacker signature, "GET.*HTTP/1.1\r\n.*…"
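The ordering requirement can be sketched in two equivalent ways: a left-to-right search, or a regular expression built by joining the escaped tokens with `.*`. Both helpers below are illustrative names, not code from the paper.

```python
import re


def matches_subsequence(payload: bytes, tokens) -> bool:
    """True if all tokens occur in the payload in the given order, without overlap."""
    position = 0
    for token in tokens:
        index = payload.find(token, position)
        if index == -1:
            return False
        position = index + len(token)  # the next token must start after this one
    return True


def subsequence_regex(tokens):
    """Build the equivalent pattern token1.*token2.*... over bytes."""
    return re.compile(b".*".join(re.escape(t) for t in tokens), re.DOTALL)
```

`re.DOTALL` is used so that `.*` can also span the `\r\n` bytes between tokens.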
Bayes Signature • A set of tokens, each with a score • If the sum of the scores of the tokens present in a flow exceeds a threshold, the flow is considered a match. • A sample signature would include '\x00\x00\xFA': 1.7574 • Benefits • Less rigid, which helps prevent false positives for common tokens. • Higher quality signatures with a more diverse suspicious pool.
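A rough sketch of the scoring rule follows. Only the `'\x00\x00\xFA': 1.7574` entry comes from the slide; the second token's score and the threshold are made-up values for illustration.

```python
def bayes_score(payload: bytes, scores: dict) -> float:
    """Sum the scores of every signature token that occurs in the payload."""
    return sum(score for token, score in scores.items() if token in payload)


def matches_bayes(payload: bytes, scores: dict, threshold: float) -> bool:
    """A flow matches when its total score exceeds the threshold."""
    return bayes_score(payload, scores) > threshold


SCORES = {
    b"\x00\x00\xFA": 1.7574,  # score quoted on the slide
    b"\xFF\xBF": 0.9,         # hypothetical score, for illustration only
}
THRESHOLD = 2.0               # hypothetical threshold
```

Unlike the conjunction and subsequence forms, a flow missing one token can still match if the remaining tokens score high enough.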
Limitations of Signature Types • The Bayes signature is unaffected by noise until the noise grows beyond 80%; past that point it produces 100% false negatives. • Noise at that level means the flow classifier did a very poor job of classifying the flows. • Conjunction and Token Subsequence signatures cannot handle a suspicious pool that contains multiple types of worms • The solution is to use clustering to separate the suspicious flows into manageable clusters
Clustering • Clustering helps the conjunction and token subsequence signatures deal with variety • Divide the suspicious pool into several clusters, each containing one type of flow, and generate a signature per cluster • Clusters should not be too general, or their signatures will match innocuous flows • Clusters should not be too specific, or their signatures will miss other instances of the same worm
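One way to picture the clustering step is a greedy, bottom-up merge: start with one cluster per suspicious flow, repeatedly merge the pair whose combined signature has the lowest false positive rate on the innocuous pool, and stop once the best merge would exceed a tolerated rate. This is a sketch under those assumptions, not the presenter's or the paper's exact procedure; all names and the `max_fp` parameter are hypothetical.

```python
from itertools import combinations


def conjunction_signature(flows, min_len=2):
    """Brute-force set of non-nested substrings shared by every flow."""
    shortest = min(flows, key=len)
    common = set()
    for start in range(len(shortest)):
        for end in range(start + min_len, len(shortest) + 1):
            piece = shortest[start:end]
            if not all(piece in flow for flow in flows):
                break
            common.add(piece)
    return {t for t in common if not any(t != u and t in u for u in common)}


def false_positive_rate(signature, innocuous):
    """Fraction of innocuous flows containing every token of the signature."""
    if not signature:
        return 1.0  # an empty signature would match everything
    hits = sum(all(t in flow for t in signature) for flow in innocuous)
    return hits / len(innocuous)


def cluster(suspicious, innocuous, max_fp=0.0):
    """Greedily merge clusters while the merged signature stays specific enough."""
    clusters = [[flow] for flow in suspicious]
    while len(clusters) > 1:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            merged = clusters[i] + clusters[j]
            rate = false_positive_rate(conjunction_signature(merged), innocuous)
            if best is None or rate < best[0]:
                best = (rate, i, j, merged)
        rate, i, j, merged = best
        if rate > max_fp:
            break  # any further merge would make a cluster too general
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters
```

Stopping on the false positive rate is what keeps clusters from becoming too general, while merging as long as possible keeps them from staying too specific.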
Polygraph Signature Generator • The polygraph monitor must have access to the network's packet flow. • An imperfect flow classifier sorts packet flows into either the suspicious or innocuous pool.
Polygraph Signature Generator • The flow classifier does not distinguish between different worms; it only separates suspicious flows from innocuous flows. • The flow classifier is reliable, but imperfect. • The result is noise: some flows end up in the wrong pool.
Polygraph Signature Generator • Uses samples to determine appropriate signatures for worms present in the suspicious flow pool. • Resilient to noise in the system
Metrics • Quality • Low percentage of false positives and false negatives • Efficiency in generation • Lower computational cost • Efficiency in matching • Should not inhibit the network traffic • Generate small signature sets • Limit the number of signatures • Robustness • Yield high quality signature even with noise and a variety of worms • Resistance to clever evasion by worms
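The quality metric can be made concrete with a short sketch that measures both error rates for any matcher. The function name and argument layout are assumptions for illustration; `matches` is any callable that returns True when a flow matches the signature.

```python
def error_rates(matches, worm_flows, innocuous_flows):
    """Return (false_positive_rate, false_negative_rate) for a signature matcher."""
    false_positives = sum(1 for flow in innocuous_flows if matches(flow))
    false_negatives = sum(1 for flow in worm_flows if not matches(flow))
    return (
        false_positives / len(innocuous_flows),
        false_negatives / len(worm_flows),
    )


def example_matcher(flow: bytes) -> bool:
    """Toy matcher: flags any flow containing the bytes \\xFF\\xBF."""
    return b"\xff\xbf" in flow
```

A good signature drives both numbers toward zero; lowering one at the cost of the other is the trade-off the three signature types handle differently.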
Results | ApacheKnacker • Table 1. ApacheKnacker signatures. These signatures were successfully generated for suspicious pools containing at least 3 worm samples. • The best performer was Token Subsequence • The ordering enforced by the Token Subsequence signature helps reduce the number of false positives.
Results | BIND-TSIG • Table 2. BIND-TSIG signatures. These signatures were successfully generated for suspicious pools containing at least 3 worm samples. • The best performers were Conjunction and Token Subsequence. • Bayes signature quality is degraded when its tokens are also common in innocuous flows.
Results | Coincidental Pattern • The Coincidental Pattern attack injects invariant bytes into the wildcard bytes to confuse the signature generator.
Contribution • Polygraph helps to automate signature generation • Examined the effects that implementing polymorphism on worms could have on worm signature generation and matching. • Introduced imperfections in the classifying of network flows
Limitations • Worms that lack invariant code • Requires a flow classifier and at least 3 worm samples • If the innocuous pool is too diverse, there will be too many false positives.
Improvements and Future Work • Take advantage of multiple cores. • Incorporate the design of an efficient flow classifier • Determine how feasible it is to inspect network traffic • Determine an algorithm to choose best signature to use
References • J. Newsome, B. Karp, and D. Song. Polygraph: Automatically generating signatures for polymorphic worms. In IEEE Security and Privacy Symposium, 2005.