Mining Specifications of Malicious Behavior
Mihai Christodorescu, IBM Research (work done at University of Wisconsin)
Somesh Jha, University of Wisconsin
Christopher Kruegel, Technical University Vienna

Wide Spectrum of Detectors
• Static detectors: …
• Dynamic/hybrid detectors, host IDS: …
• Behavior-based spyware detection [Kirda et al. 2006]
• Semantics-aware malware detection [Christodorescu et al. 2005]
• Model checking-based malware detection [Kinder et al. 2005]
• Shadow Honeypots [Anagnostakis et al. 2005]

Misuse Detection
Distinct techniques, fundamentally similar: they all require high-quality specifications of malicious behavior.

Current Specification Generation
• Specifications are manually developed by experts.
Two issues:
• Time consuming
• Error prone

Problem 1: Specification Delay
Time from appearance of new malware to availability of a specification (the window of vulnerability):
• Manual analysis of the binary
• Testing of the specification
Anti-virus industry: 4-18 hours to generate a new specification.

Problem 2: Spec. Imprecision
• Too general = false positives (angry users)
• Too specific = false negatives (infected machines)

Our Solution
MINIMAL: a technique for mining specifications of malicious behavior
• Automatic
• Flexible specification language
• Fast
• Performs well (compared to a human expert)

What’s In a Specification?
Requirements for obfuscation resilience:
• Describe only relevant operations
• Capture dependencies where present
• Preserve independence of operations

Specifying Malicious Operations
• We chose system calls:
  • Compatible with specifications for behavior-based detectors
  • Define the interface between the trusted OS and untrusted programs
• The mining algorithm is not restricted to the system-call interface.

Specifying Malicious Constraints
• Program operations alone are insufficient to distinguish malicious from benign behavior.
• We need to capture relations between operations:
F=open(“file”) ; read(F,buf) ; send(S,buf)
Constraints = logical formulas over system-call arguments

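The open/read/send example can be made concrete with a small checker. This is an illustrative sketch only: the trace record layout (`call`, `args`, `ret`) and the function name `leaks_file_over_network` are assumptions made here, not part of MINIMAL. The constraints are modeled as equalities over system-call arguments: `read` must use the handle returned by `open`, and `send` must use the buffer filled by `read`.

```python
# Hypothetical trace format: each event is a dict holding the call name,
# its argument tuple, and its return value (None when irrelevant).

def leaks_file_over_network(trace):
    """Return True if the trace contains open -> read -> send where read
    uses the handle returned by open and send uses read's buffer."""
    for i, op in enumerate(trace):
        if op["call"] != "open":
            continue
        for j in range(i + 1, len(trace)):
            rd = trace[j]
            # Constraint 1: read's first argument equals open's result.
            if rd["call"] != "read" or rd["args"][0] != op["ret"]:
                continue
            for snd in trace[j + 1:]:
                # Constraint 2: send's buffer equals read's buffer.
                if snd["call"] == "send" and snd["args"][1] == rd["args"][1]:
                    return True
    return False


# Same three operations in both traces; only the buffer constraint differs.
malicious = [
    {"call": "open", "args": ("file",), "ret": "F"},
    {"call": "read", "args": ("F", "buf"), "ret": None},
    {"call": "send", "args": ("S", "buf"), "ret": None},
]
benign = [
    {"call": "open", "args": ("file",), "ret": "F"},
    {"call": "read", "args": ("F", "buf"), "ret": None},
    {"call": "send", "args": ("S", "other"), "ret": None},
]
```

Both traces contain the same sequence of operations, so a specification listing only the operations could not separate them; the argument constraints do.
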
A Sample Specification (Malspec)
• Mass-mailing malware (nodes of the malspec graph):
X:=socket()
S:=process_name()
connect(X2)
Z:=open(S2)
send(X3,“EHLO”)
Y:=read(Z2)
send(X4,“DATA”)
send(X5,T)

The Specification Mining Problem
• Inputs: known malware, known benign programs
• Tool: MINIMAL
• Output: specification of malicious behavior

The Basic Mining Operation
• Step 1: Compute dependence graphs (known malware gives the malware dependence graph; a known benign program gives the benign dependence graph)
• Step 2: Compute graph difference (yields the minimal malspec)

Multi-Program Mining
[Figure: maximal union of malspecs]

Step 1: System-Call Dependence Graph
• We use dynamic analysis to construct the dependence graph
  • Static analysis is too imprecise on binary code
• Steps:
  • Collect system-call trace
  • Infer dependencies between system calls
  • Construct (an underapproximation of the) dependence graph

Step 1: Discovering Dependences
NtOpenKey( 372, 0x20019, {24, 356, "ActiveComputerName", 0x40, 0, 0} )
NtQueryValueKey( 372, "ComputerName", Full, { TitleIdx=0, Type=1, Name="ComputerName", Data="Z..." }, 108, 76 )
NtClose( 372 )
• Def-use dependences
• Substring dependences

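The two kinds of dependence can be sketched over a recorded trace. This is a minimal illustration under stated assumptions: the `Call` record with separate `uses` and `defs` lists, the function `infer_dependences`, and every value in the sample trace other than the handle 372 are hypothetical and chosen only to exercise both rules.

```python
from dataclasses import dataclass, field


@dataclass
class Call:
    name: str
    uses: list = field(default_factory=list)  # values consumed by the call
    defs: list = field(default_factory=list)  # values produced by the call


def infer_dependences(trace):
    """Return sorted (earlier_index, later_index, kind) edges.

    def-use:   a value produced earlier is consumed later, unchanged.
    substring: a string produced earlier occurs inside a later string argument.
    Note: a real analysis would also track handle lifetimes, since the OS
    may reuse a handle value after it has been closed.
    """
    edges = set()
    for j, later in enumerate(trace):
        for i in range(j):
            for d in trace[i].defs:
                for u in later.uses:
                    if d == u:
                        edges.add((i, j, "def-use"))
                    elif isinstance(d, str) and isinstance(u, str) and d and d in u:
                        edges.add((i, j, "substring"))
    return sorted(edges)


# Hypothetical trace: only the handle value 372 is taken from the slide.
trace = [
    Call("NtOpenKey", uses=["ActiveComputerName"], defs=[372]),
    Call("NtQueryValueKey", uses=[372, "ComputerName"], defs=["HOSTNAME"]),
    Call("NtCreateFile", uses=["C:\\tmp\\HOSTNAME.log"], defs=[400]),
    Call("NtClose", uses=[372]),
]
dependences = infer_dependences(trace)
```

Because the graph is built only from values seen in one execution, it is an underapproximation: dependences that pass through transformations the rules do not model are missed.
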
Step 1: Discovering Local Constraints
• Access to well-defined resources:
  • Windows registry
  • Access to self
  • System files/directories
NtCreateFile ( …, { …, "I-Worm.Mydoom.l.exe", … }, … )

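One way to read local constraints is as a labeling of individual arguments with the kind of resource they name, so that a concrete file name such as the worm's own executable generalizes to "access to self". The sketch below is an assumption-laden illustration: the function `local_constraint`, the label names, the directory prefixes, and the example paths are all hypothetical and not taken from MINIMAL.

```python
import ntpath  # Windows path rules, usable on any platform

# Assumed system directory prefixes (lowercase), for illustration only.
SYSTEM_PREFIXES = ("c:\\windows\\", "c:\\winnt\\")


def local_constraint(arg, process_image):
    """Label a system-call argument with the resource class it names,
    or return None when no class applies."""
    if not isinstance(arg, str):
        return None
    lowered = arg.lower()
    # Access to self: the argument names the running program's own image.
    if ntpath.basename(lowered) == ntpath.basename(process_image).lower():
        return "SELF"
    if lowered.startswith("\\registry\\"):
        return "REGISTRY"
    if lowered.startswith(SYSTEM_PREFIXES):
        return "SYSTEM_DIR"
    return None


# Hypothetical location of the running sample.
image = "C:\\temp\\I-Worm.Mydoom.l.exe"
```

With these labels, a node such as `NtCreateFile` on the sample's own file name can be recorded with the constraint `SELF` instead of a literal string, which keeps the specification valid when the malware is renamed.
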
Step 2: Graph Differencing
Problem: Find the smallest subgraph of malicious operations that does not appear in any benign graph.
Solution: Minimal Contrast Subgraph [Ting, Bailey SDM 2006]

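The problem statement can be illustrated with a brute-force search on toy graphs. This is not the Ting and Bailey algorithm and would not scale beyond a handful of nodes; it only makes the definition concrete. The graph encoding (a dict from node id to system-call label plus a set of directed edges) and the function names are assumptions made for this sketch.

```python
from itertools import combinations, permutations


def embeds(p_nodes, p_edges, g_nodes, g_edges):
    """True if the labeled pattern maps injectively into graph g,
    preserving node labels and every pattern edge."""
    p_ids = list(p_nodes)
    for image in permutations(g_nodes, len(p_ids)):
        m = dict(zip(p_ids, image))
        if all(p_nodes[n] == g_nodes[m[n]] for n in p_ids) and all(
            (m[u], m[v]) in g_edges for u, v in p_edges
        ):
            return True
    return False


def minimal_contrast_edge_sets(mal_nodes, mal_edges, benign_graphs):
    """Return the smallest edge subsets of the malware graph that embed
    in none of the benign graphs (all subsets of that minimal size)."""
    edges = sorted(mal_edges)
    for k in range(1, len(edges) + 1):
        found = []
        for subset in combinations(edges, k):
            nodes = {n: mal_nodes[n] for e in subset for n in e}
            if not any(embeds(nodes, subset, bn, be) for bn, be in benign_graphs):
                found.append(subset)
        if found:
            return found
    return []


# Toy malware graph: open -> read -> send.
mal_nodes = {1: "open", 2: "read", 3: "send"}
mal_edges = {(1, 2), (2, 3)}
# Toy benign graph: open -> read and recv -> send, but no read -> send.
benign = [({1: "open", 2: "read", 3: "recv", 4: "send"}, {(1, 2), (3, 4)})]
contrast = minimal_contrast_edge_sets(mal_nodes, mal_edges, benign)
```

In this toy case the `open` to `read` edge also occurs in the benign graph, so the only minimal contrast is the `read` to `send` dependence.
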
Step 2: Minimal Contrast Subgraphs
• Idea: Minimal contrast subgraphs and maximal common edge sets are duals.
• Vertex and edge labels (i.e., system calls and constraint formulas) help the search.

Step 2: Mining Contrast Subgraphs
• Size of graphs: 100K-1.5M nodes, similar for edges
• Worst-case complexity: O(N!)
[Figure: malware dependence graph and benign dependence graph]

Heuristics Reduce Problem Size
• Normalize dependence graph
• Replace system-call sequences with shorter equivalents
• Eliminate disconnected subgraphs
• Eliminate trivial subgraphs
[see paper for details]

Evaluating MINIMAL
• Goals:
  • Compare MINIMAL malspecs with those from a human expert
  • Use mined malspecs with a behavior-based detector

Experimental Setup
• Trace collection in Windows 2000:
  • Malware samples run with no user input (cf. expected execution model)
  • Benign samples run with normal user input
  • Execution for 1 or 2 minutes
• 16 malware samples:
  • Netsky, MyDoom, Beagle
• 6 benign programs:
  • Firefox, Thunderbird, installers

MINIMAL vs. Human Expert
• Average success rate: 77.26%
• Average mining time: 8 minutes
Behavioral features as given by the Symantec website.

Mined Malspecs for Netsky.A

Limitations of MINIMAL (Future Work)
• Sensitive to test environment
  • Malicious behavior might not be observed during tracing.
• Underapproximation of dependence graph
  • Complex constraints are not discovered.
• Sensitive to test-set selection
  • Not all differences are malicious behaviors.

Questions?
Mining Specifications of Malicious Behavior
Mihai Christodorescu
mihai@cs.wisc.edu

Mining Malspecs from Malware
• Dynamic differential analysis
• Inputs: malware system-call trace, benign program system-call trace
• Output: malspec

Specifying Malicious Behavior
• Example
• Now manual
• NEED: automatic

Only Relevant Operations
• System calls
Approach: Collect system-call traces.

Dependences

Independence of Operations

Good Specifications
• One can write specifications satisfying the requirements.
• The algorithm to generate specifications must write specifications satisfying the requirements.

Misuse-Detection Fundamentals
• Database of specifications (“signatures”) defines what is malicious.
[Figure: unknown executable, malware detector with signature DB, protected computers]

Failures of Current Detectors
• Byte signatures are not rich enough to capture malicious behavior.
• Hackers evade detection through obfuscation.
[Figure: all mass-mailing worms, overapproximating byte signature, underapproximating byte signature]

Behavior-Based Detectors
• New detection techniques use higher-level specifications:
  • Semantics-Aware Malware Detection (2005)
  • Model Checking-based Malware Detection (2005)
  • Behavior-based Spyware Detection (2006)
These still depend on the quality of the specifications!

MINIMAL: Malware-Mining Algorithm
• Step 1: Monitor system calls during execution (known malware gives a system-call trace)
NtOpenKey( 372, 0x20019, {24, 356, "ActiveComputerName", 0x40, 0, 0} )

MINIMAL: Malware-Mining Algorithm
• Step 2: Discover dependences between syscalls (system-call trace gives a data dependence graph)

MINIMAL: Malware-Mining Algorithm
• Step 3: Find malicious–benign graph difference (malware dependence graph and benign dependence graph give the malspec)

Step 2: Dependence Graph Normalization
• Many sequences of system calls are equivalent:
read( …, 1 ) read( …, 1 ) read( …, 1 ) read( …, 1 ) read( …, 1 ) ≡ read( …, 5 )
• Aggregation replaces such sequences with a canonical sequence. (see paper for details)

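The read example can be sketched as a single pass over the trace. This is a simplified illustration, not the paper's aggregation procedure: the `(name, handle, count)` tuple layout and the function `aggregate_reads` are assumptions, and the sketch only merges consecutive `read` calls on the same handle.

```python
def aggregate_reads(trace):
    """Merge runs of consecutive read calls on the same handle into one
    read whose byte count is the sum of the merged counts."""
    out = []
    for name, handle, count in trace:
        prev = out[-1] if out else None
        if prev and name == "read" and prev[0] == "read" and prev[1] == handle:
            # Extend the previous read instead of appending a new event.
            out[-1] = ("read", handle, prev[2] + count)
        else:
            out.append((name, handle, count))
    return out


# Five one-byte reads on (hypothetical) handle 3 become one five-byte read.
byte_by_byte = [("read", 3, 1)] * 5
normalized = aggregate_reads(byte_by_byte)
```

Normalizing both malware and benign traces this way means that a program reading byte by byte and one reading in a single call produce the same graph fragment, so the difference step does not report the reading style as a contrast.
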
MINIMAL Specs in Detection
• Missed malspecs:
  • Due to unrecovered dependences (e.g., ZIP compression of data)
  • Due to incompleteness of trace data (e.g., certain behaviors did not execute)
• Using mined malspecs in semantics-aware malware detection (Netsky.A malspec; Netsky.D, E, F, …)

Conclusions
• The mining of malicious behavior specifications can be automated
• Mined malspecs compare well with those from human experts
• Mining time significantly reduced over manual specification