280 likes | 509 Views
EECS 583 – Class 21 Research Topic 3: Dynamic Taint Analysis. University of Michigan December 5, 2012. Announcements. Exams returned Answer Key can be found on the course website One week to turn in written requests for exam regrades Sign up for project presentation slots
E N D
EECS 583 – Class 21Research Topic 3: Dynamic Taint Analysis University of Michigan December 5, 2012
Announcements • Exams returned • Answer Key can be found on the course website • One week to turn in written requests for exam regrades • Sign up for project presentation slots • Friday on Scott’s office door, Monday in class • Today’s class reading • "Dynamic taint analysis for automatic detection, analysis, and signature generation of exploits on commodity software,"James Newsome and Dawn Song, Proceedings of the Network and Distributed System Security Symposium, Feb. 2005. • Optional background reading: "All You Ever Wanted to Know About Dynamic Taint Analysis and Forward Symbolic Execution (but might have been afraid to ask),"Edward J. Schwartz, Thanassis Avgerinos, David Brumley, Proceedings of the 2010 IEEE Symposium on Security and Privacy, May 2010.
EECS 583 Exam Statistics Exam Score Mean: 70.6 StDev: 12.8 High: 94 Low: 44
The Problem • Unknown Vulnerability Detection ** • Buffer Overflows • Format string vulnerabilties • The goal of TaintCheck • Malware Analysis • What is the software doing with sensitive data? • Data propagation from triggers • Ex. TaintDroid • More Generally - Tracking the flow of data through a program
Buffer Overflow • Example Stack Buffer Overflow
Format String Vulnerability • Attacker controls a format string
Format String Vulnerability • Attacker controls a format string • Overwrite arbitrary memory • Take control of program execution • %n commonly used in conjunction with other format specifiers • Writes number of bytes written thus far to the stack • Dump memory • ex. printf ("%08x.%08x.%08x.%08x.%08x\n"); • Crash the program • ex. printf ("%s%s%s%s%s%s%s%s%s%s%s%s"); • Denial of Service
Dynamic Taint Analysis • Track information flow through a program at runtime • Identify sources of taint – “TaintSeed” • What are you tracking? • Untrusted input • Sensitive data • Taint Policy – “TaintTracker” • Propagation of taint • Identify taint sinks – “TaintAssert” • Taint checking • Special calls • Jump statements • Format strings • Outside network
TaintCheck • Performed on x86 binary • No need for source • Implemented using Valgrind skin • X86 -> Valgrind’s Ucode • Taint instrumentation added • Ucode -> x86 • Sources -> TaintSeed • Taint Policy -> TaintTracker • Sinks -> TaintAssert • Add on “Exploit Analyzer”
tainted untainted Δ t = IsUntrusted(src) Input Var Val x = get_input() y = x + 42 … gotoy get_input(src)↓ t x 7 Input is tainted τ Tainted? Var TaintSeed T x All You Ever Wanted to Know AboutDynamic Taint Analysis
tainted untainted Δ Var Val x = get_input() y = x + 42 … gotoy x 7 y 49 Data derived from user input is tainted τ Tainted? Var TaintTracker t1 = τ[x1] , t2 = τ[x2] BinOp T x x1 + x2 ↓ t1 v t2 T y All You Ever Wanted to Know AboutDynamic Taint Analysis
tainted untainted Δ Var Val x = get_input() y = x + 42 … gotoy x 7 y 49 Policy Violation Detected τ Tainted? Var TaintAssert T x T y Pgoto(ta) = ¬ ta(Must be true to execute) All You Ever Wanted to Know AboutDynamic Taint Analysis
x = get_input() y = … … gotoy Jumping to overwritten return address … strcpy(buffer,argv[1]) ; … return ; All You Ever Wanted to Know AboutDynamic Taint Analysis
TaintCheck - Exploit Analyzer • TaintPolicy - Keep track of Taint propagation info • Backtrace chain of taint structures • Allow generation of attack signatures • Transfer control to sandbox for analysis
TaintCheck - Evaluation • False Negatives • Use control flow to change value without gathering taint • Example: if (x == 0) y=0; else if (x == 1) y=1; • Equivalent to x=y; • Tainted index into a hardcoded table • Policy – value translation is not tainted • Enumerating all sources of taint • False Positives • Vulnerable code? • Sanity Checks not removing taint • Requires fine-tuning • Taint sanitization problem • 0 false positives? • Thoughts?
Memory Load Δ μ Variables Memory Var Addr Val Val x 7 42 7 τμ τ Tainted? Tainted? Addr Var 7 T F x Carnegie Mellon University
Problem: Memory Addresses Var Val Δ x = get_input() y = load( x ) … goto y x 7 Addr Val μ 7 42 Tainted? Addr All values derived from user input are tainted?? τμ 7 F Carnegie Mellon University
Policy 1: Taint depends only on the memory cell v = Δ[x] , t = τμ[v] Load Var Val load(x) ↓ t Δ x = get_input() y = load(x ) … goto y Undertainting Failing to identify tainted values - e.g., missing exploits x 7 Jump target could be any untainted memory cell value Addr Val μ 7 42 Tainted? Addr τμ Taint Propagation 7 F All You Ever Wanted to Know AboutDynamic Taint Analysis
v = Δ[x] , t = τμ[v], ta = τ[x] Policy 2: If either the address or the memory cell is tainted, then the value is tainted Load load(x) ↓ t v ta x = get_input() y = load(jmp_table +x % 2 ) … goto y Overtainting Unaffected values are tainted - e.g., exploits on safe inputs Address expression is tainted jmp_table Policy Violation? Taint Propagation All You Ever Wanted to Know AboutDynamic Taint Analysis
General ChallengeState-of-the-Art is not perfect for all programs Undertainting:Policy may miss taint Overtainting:Policy may wrongly detect taint All You Ever Wanted to Know AboutDynamic Taint Analysis
TaintCheck - Attack Detection • Synthetic Exploits • Buffer overflow -> function pointer • Buffer overflow -> format string • Format string -> info leak • Success! • Actual Exploits • 3 real world examples • Random sample? • Prevalence of protected exploits? slightly more convincing
Performance • Implementation decisions • Use of Valgrind • Better on IO bound tasks • How much performance would you give up?
TaintCheck Uses • Individual Sites • Cost? • Honeypots • “TaintCheck plus OS Randomization” • Rely on OS randomization to cause crashes • Log and reproduce with tainting • Sampling • Users rather than requests • Do I just need more infected computers? • Distributed Sampling
Signatures • Automatic Semantic Analysis • Exploit Analyzer • Generated signature validation
Other considerations • Effectiveness in the wild • Side-channel attacks • Can the attacker determine when someone is using taint tracking? • Speed? • Perform the attack only on native systems • Similar to existing virtual machine and honeypot problems • Circumventing the taint system • Clean your data before exploit • Depends on policy • Example: Hex to ASCII translation • Cause false positives • Raises cost of running the system • Admins may turn it off • Difficult to evaluate without motivated attackers