Temporal Search: Detecting Hidden Malware Timebombs with Virtual Machines. Jedidiah R. Crandall Related paper accepted to ASPLOS-XII (pending shepherd approval) Joint work with Gary Wassermann, Daniela A. S. de Oliveira, Zhendong Su, S. Felix Wu, and Frederic T. Chong
Temporal Search: Detecting Hidden Malware Timebombs with Virtual Machines Jedidiah R. Crandall Related paper accepted to ASPLOS-XII (pending shepherd approval) Joint work with Gary Wassermann, Daniela A. S. de Oliveira, Zhendong Su, S. Felix Wu, and Frederic T. Chong University of California, Davis and University of California, Santa Barbara
Conclusions • Automated, behavior-based analysis not only faster, but potentially more accurate • Malware time-dependent behavior does not follow a linear timetable • Automated temporal search is possible but more work is needed
Automated, Behavior-Based Analysis • Faster and potentially more accurate • [Diagram labels: Automation; Traditional malware analysis techniques; Appearance-based; Environment; Behavior-based]
Why Behavior-Based Analysis? (1) “An ant, viewed as a behaving system, is quite simple. The apparent complexity of its behavior over time is largely a reflection of the complexity of the environment in which it finds itself.” –Herbert Simon
Why Behavior-Based Analysis? (2) • Malware obfuscation • Packing, polymorphism, metamorphism, cryptovirology • Malware speed • Drawback: Blackbox (complexity you put into it = complexity you get out)
Other Behavior-Based Work • “Siren: Detecting Evasive Malware”, Borders et al. Oakland 2006 • “Behavior-based Spyware Detection”, Kirda et al. USENIX Security 2006 • Probably many I’m missing and many more to come…
Automated Temporal Search • Speedy analysis makes aversion possible (example: Sober.X) • Complexity of the environment • Kernel rootkits • Drawback: Automated techniques can be detected, averted, misled
Botnets • Malware obfuscation • Cryptocounters • Malware speed • Attack payload may be loaded minutes before it is executed • Difficulty of Analysis • Quirks in Gregorian calendar, file creation time dependency, time zones, various synchronization protocols, etc.
Outline • Time Tutorial • Finding Timers • Formalizing Temporal Search • Lessons from Code Red, Kama Sutra, Sober.X, MyParty
Time Hardware • PIT running at 1.193182 MHz • RAM refresh • PC speaker tone • Programmable interrupt • Others: CMOS real time clock, local APIC timers, ACPI timers, the Pentium CPU’s Time Stamp Counter, or the High Precision Event Timer
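The PIT divides its 1.193182 MHz input clock by a 16-bit reload value to produce periodic interrupts. As a rough sketch of that arithmetic (the function names here are illustrative, not taken from any kernel), the reload values for common tick rates can be worked out as follows:

```python
PIT_INPUT_HZ = 1_193_182  # PIT input clock from the slide (1.193182 MHz)


def pit_divisor(target_hz: int) -> int:
    """Return the reload value that gets closest to target_hz."""
    divisor = round(PIT_INPUT_HZ / target_hz)
    # The counter is 16 bits wide; a programmed value of 0 means 65536.
    if not 1 <= divisor <= 65536:
        raise ValueError("target rate is outside the PIT's 16-bit range")
    return divisor


def actual_hz(divisor: int) -> float:
    """Interrupt rate actually produced by a given reload value."""
    return PIT_INPUT_HZ / divisor


for hz in (100, 1000):
    d = pit_divisor(hz)
    print(f"target {hz} Hz -> divisor {d}, actual {actual_hz(d):.4f} Hz")
```

Because the divisor must be an integer, the achieved rate is only approximately the requested one, which is one reason software clocks drift against wall-clock time.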
OS Timekeeping • Linux kernel 2.4, PIT @ 100 Hz • Seconds since 1970 • Linux kernel 2.6, PIT @ 1000 Hz • Ditto • Windows, PIT @ 64 Hz to 1000 Hz • Hectonanoseconds (100 ns units) since 1601 • Only epoch that matters is the CMOS on boot (or NTP, time protocol, …) • Shouldn’t make assumptions about the integrity of the OS kernel
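To make the two epochs concrete: Unix time counts seconds from January 1, 1970 (UTC), while the Windows FILETIME format counts 100 ns units from January 1, 1601 (UTC). A minimal conversion sketch, with helper names chosen only for illustration:

```python
from datetime import datetime

# Seconds between the Windows epoch (1601-01-01) and the Unix epoch (1970-01-01).
EPOCH_DELTA_SECONDS = int(
    (datetime(1970, 1, 1) - datetime(1601, 1, 1)).total_seconds()
)
TICKS_PER_SECOND = 10_000_000  # one FILETIME tick is 100 ns


def unix_to_filetime(unix_seconds: int) -> int:
    """Convert whole Unix seconds to 100 ns ticks since 1601."""
    return (unix_seconds + EPOCH_DELTA_SECONDS) * TICKS_PER_SECOND


def filetime_to_unix(filetime: int) -> int:
    """Convert 100 ns ticks since 1601 to whole Unix seconds."""
    return filetime // TICKS_PER_SECOND - EPOCH_DELTA_SECONDS


print(EPOCH_DELTA_SECONDS)
print(unix_to_filetime(0))
```

An analysis tool that skews time for a guest has to account for whichever representation the guest derives from the hardware clock at boot.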
Outline • Time Tutorial • Finding Timers • Formalizing Temporal Search • Lessons from Code Red, Kama Sutra, Sober.X, MyParty
Past Work • “On Deriving Unknown Vulnerabilities…” Crandall et al. CCS 2005 • Full-system symbolic execution on every machine instruction
Finding Timers: Basic Idea • Run with PIT at different rates of perceived time • Correlation between PIT interrupts and updates of a physical memory location • Symbolic execution to discover a series • Predicate inversion to discover dependent timers or behaviors
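The correlation step can be illustrated with a toy model. The sketch below is not the DACODA/Bochs implementation; it assumes a hypothetical guest whose memory is snapshotted after every PIT interrupt, and it flags addresses that advance by the same nonzero amount per interrupt regardless of how many interrupts a run delivers. The addresses and the `simulate_run` guest are made up for the example.

```python
import random


def simulate_run(num_interrupts, seed):
    """Hypothetical guest: memory snapshots taken after each PIT interrupt."""
    rng = random.Random(seed)
    mem = {0x1000: 0, 0x2000: 0, 0x3000: 0}
    snapshots = []
    for _ in range(num_interrupts):
        mem[0x1000] += 1                   # behaves like a tick counter
        mem[0x2000] = rng.randrange(256)   # noise unrelated to time
        mem[0x3000] ^= 1                   # toggles like a lock word
        snapshots.append(dict(mem))
    return snapshots


def constant_step(snapshots, addr):
    """Return the per-interrupt step if it is constant and nonzero, else None."""
    values = [snap[addr] for snap in snapshots]
    steps = {b - a for a, b in zip(values, values[1:])}
    if len(steps) == 1:
        step = steps.pop()
        return step if step != 0 else None
    return None


def find_timer_candidates(runs):
    """Addresses whose per-interrupt step is identical across all runs."""
    candidates = {}
    for addr in runs[0][0]:
        steps = {constant_step(run, addr) for run in runs}
        if len(steps) == 1 and None not in steps:
            candidates[addr] = steps.pop()
    return candidates


# Two runs that deliver different numbers of interrupts (different perceived time rates).
runs = [simulate_run(n, seed) for n, seed in ((50, 1), (200, 2))]
print(find_timer_candidates(runs))
```

In this toy model only the counter at `0x1000` survives. The toggling word changes on every interrupt but not by a constant step, so it is rejected, much as a lock variable would be.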
Symbolic Execution (1) • Linux “jiffies”: • Linux “xtime.tv_usec”:
Symbolic Execution (2) • Not a timer (“xtime_lock”):
Predicate Inversion (1) • Predicate on “xtime.tv_usec”:
Predicate Inversion (2) • Discovering “xtime.tv_sec”:
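A toy illustration of predicate inversion, under stated assumptions: the `tick` handler below is a hypothetical stand-in for a timer interrupt handler (100 Hz tick, microsecond counter that rolls over into a seconds counter), not the real Linux code. Forcing the rollover predicate to take the opposite outcome exposes a write to a second location, which is how a dependent timer such as the seconds counter could be discovered from a known one.

```python
def tick(state, force_branch=None):
    """Hypothetical tick handler; returns (fields written, branch outcome)."""
    written = set()
    state["tv_usec"] += 10_000          # assumes a 100 Hz tick
    written.add("tv_usec")
    taken = state["tv_usec"] >= 1_000_000
    if force_branch is not None:
        taken = force_branch            # analyst overrides the predicate
    if taken:
        state["tv_usec"] -= 1_000_000
        state["tv_sec"] += 1
        written.add("tv_sec")
    return written, taken


def discover_dependents(state):
    """Run once naturally and once with the predicate inverted; diff the writes."""
    natural_writes, natural_taken = tick(dict(state))
    inverted_writes, _ = tick(dict(state), force_branch=not natural_taken)
    return natural_writes ^ inverted_writes


print(discover_dependents({"tv_usec": 0, "tv_sec": 100}))
```

Both runs operate on copies of the state, so the forced branch never corrupts the original; a VM-based tool would get the same effect from a snapshot or replay.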
Outline • Time Tutorial • Finding Timers • Formalizing Temporal Search • Lessons from Code Red, Kama Sutra, Sober.X, MyParty
Subversion • Detect VM • Don’t use Presburger arithmetic • Use homegrown version of NTP • Create a lot of noise • …
Outline • Time Tutorial • Finding Timers • Formalizing Temporal Search • Lessons from Code Red, Kama Sutra, Sober.X, MyParty
VM-based Analysis • Working: • Code Red • MyParty.A • Not fully working (yet): • Sober.X • Kama Sutra • Our analysis could be wrong (we still need to clarify/independently confirm some of this)
Setup ARP cache poisoning, DNS spoofing, etc. using scapy Windows XP @ 192.168.33.2 Host @ 192.168.33.1 w/ DNS, NTP, HTTP, TIME, etc. Bochs emulator w/ DACODA tuntap interface
Code Red (eEye analysis) • “Each worm thread checks the infected computer's system time.” • “If the date is past the 20th of the month (GMT), the thread will stop searching for systems to infect and will instead attack www.whitehouse.gov.” … • “If the date is between the 1st and the 19th of the month, this worm thread will not attack www.whitehouse.gov and will continue to try to find and infect new web servers.”
Code Red CAIDA Analysis • “The worm is programmed to stop infecting other machines on the 20th of every month. In its next attack phase, the worm launches a Denial-of-Service attack against www1.whitehouse.gov from the 20th-28th of each month.” • Only re-infection can turn a spreading host into a DoS host.
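One way to read the two descriptions above is as a function from the perceived day of the month to a behavior, which a temporal search can sweep for transitions. The sketch below encodes only what the CAIDA quote states (spreading on days 1 to 19, DoS on days 20 to 28) and labels every other day "unspecified"; the function names and labels are assumptions made for illustration.

```python
def code_red_behavior(day_of_month: int) -> str:
    """Behavior by day of month, following the CAIDA description quoted above."""
    if 1 <= day_of_month <= 19:
        return "spread"
    if 20 <= day_of_month <= 28:
        return "dos"
    return "unspecified"  # the quoted analyses do not cover these days


def find_transitions(behavior, days):
    """Sweep perceived dates and record each day where the behavior changes."""
    transitions = []
    previous = None
    for day in days:
        current = behavior(day)
        if previous is not None and current != previous:
            transitions.append((day, previous, current))
        previous = current
    return transitions


print(find_transitions(code_red_behavior, range(1, 32)))
# prints [(20, 'spread', 'dos'), (29, 'dos', 'unspecified')]
```

A real search cannot call the behavior function directly; it has to infer it from observed behavior under a skewed clock, and per-thread state (such as the re-infection caveat above) means a simple date sweep can miss transitions.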
A Thought • We need formal ways to specify malware behaviors for dissemination
Kama Sutra • “…programmed to overwrite files on Friday February 3, and the third day of every month thereafter.” (www.theregister.co.uk) • “But computer security groups … said few users lost data because of the bug. Experts speculated that the publicity prior to [the] trigger date may have prompted people to clean up machines and prepare defences [sic].” (BBC news)
Sober.X • Symantec Security Response: “Checks the network connection of the compromised computer, and the current date, by connecting to one of the following NTP servers on TCP port 37” • Lists 40 servers
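Port 37 is the port assigned to the TIME protocol (RFC 868), whose reply is a single 32-bit big-endian count of seconds since January 1, 1900 (UTC). A minimal sketch of how an analysis host could build or decode such a reply in order to present an arbitrary date to the malware; the helper names and the example date are hypothetical:

```python
import struct
from datetime import datetime, timezone

# Seconds between the RFC 868 epoch (1900-01-01) and the Unix epoch (1970-01-01).
RFC868_TO_UNIX = 2_208_988_800


def encode_time_reply(when: datetime) -> bytes:
    """Build the 4-byte TIME protocol payload for a timezone-aware datetime."""
    seconds = int(when.timestamp()) + RFC868_TO_UNIX
    return struct.pack("!I", seconds & 0xFFFFFFFF)


def decode_time_reply(payload: bytes) -> datetime:
    """Parse a 4-byte TIME protocol payload into a UTC datetime."""
    (seconds,) = struct.unpack("!I", payload[:4])
    return datetime.fromtimestamp(seconds - RFC868_TO_UNIX, tz=timezone.utc)


# Arbitrary example date, chosen only to exercise the round trip.
spoofed = datetime(2006, 1, 5, tzinfo=timezone.utc)
reply = encode_time_reply(spoofed)
print(reply.hex(), decode_time_reply(reply).isoformat())
```

The 32-bit field wraps in 2036, which is why the encoder masks the value rather than letting `struct.pack` raise on overflow.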
MyParty • McAfee.com: “This virus only attempts to massmail itself on January 25, 26, 27, 28 or 29, 2002.” • Symantec Security Response: “This worm is capable of spreading itself only between January 25, 2002, and January 29, 2002” • Our analysis: equality check with file creation time • ????????
A Thought • Could we use weaknesses in a botnet’s time-dependent behavior to take it down (e.g. Sober.X, Code Red, Kama Sutra)? • Or any sort of behavior, for that matter.
Conclusions • Automated, behavior-based analysis not only faster, but potentially more accurate • Malware time-dependent behavior does not follow a linear timetable • Automated temporal search is possible but more work is needed
Future (or ongoing) work • Full-system deterministic replay • (ReVirt only works for UMLinux) • Replay-based entropy control • Malware’s use of entropy • Alternative to taint marking (challenges of tainting spelled out in Fenton’s 1973 Ph.D. thesis)