DroidScope: Seamlessly Reconstructing the OS and Dalvik Semantic Views for Dynamic Android Malware Analysis. Lok Kwong Yan and Heng Yin, Syracuse University, Air Force Research Laboratory. USENIX 2012. Presentation: 2012-09-11 曾毓傑.
Outline • Introduction • Background • Architecture • Interface & Plugins • Evaluation • Discussion & Conclusion
Introduction • Malicious applications exist in official and unofficial marketplaces at rates of 0.02% and 0.2%, respectively • Virtualization-based analysis approach • Analysis runs underneath the entire virtual machine • Difficult for an attack within the VM to disrupt the analysis • Semantic context is lost when the analysis component is moved out of the box • We need to intercept certain kernel events and parse kernel data structures to reconstruct the semantic knowledge
DroidScope • Reconstructs two levels of semantic knowledge • OS-level: understand the activities of the malware process and its native components • Java-level: understand the behavior of the Java components • Built on top of the QEMU emulator • Provides tools for analysis • Native instruction tracer • Dalvik instruction tracer • API tracer • Taint tracker
Android System Overview • [Figure: Android system stack, with annotations] • Zygote: parent process for all Android processes • libdvm.so provides the Java-level abstraction • Kernel data structures
Architecture • All changes are integrated into the QEMU emulator • The emulator comes from the Android SDK • The Android system is left unchanged • Different virtual devices can therefore be loaded • Reconstructs OS-level and Java-level views • Monitors how the malware's Java components communicate with the Android Java Framework • Monitors how the malware's native components interact with the Linux kernel • Monitors how the malware's Java components and native components communicate through the JNI interface
Reconstructing OS-level View • Basic Instrumentation • Insert extra instructions during the code translation phase to observe system status • [Figure: target instructions → Tiny Code Generator (TCG), where additional code for detection is added → native instructions]
Reconstructing OS-level View (Cont.) • For example, a context switch on the ARM architecture changes the c2_base0 and c2_base1 registers, which store the page table address • Extract semantic knowledge • System calls • Running processes and threads • Memory maps
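The context-switch example above can be sketched in Python as a small monitor that remembers the last page table base and reports a switch when the value changes. This is only an illustration: the class and method names are hypothetical and are not part of DroidScope or QEMU. Only the idea of watching c2_base0 comes from the slide.

```python
class ContextSwitchMonitor:
    """Tracks the page table base to notice address-space switches (illustrative)."""

    def __init__(self):
        self.current_pgd = None   # last observed value of c2_base0
        self.switches = []        # (old_pgd, new_pgd) pairs, in order of occurrence

    def on_c2_base0_write(self, new_pgd):
        """Hypothetical hook, called whenever the guest writes the page table base."""
        if new_pgd != self.current_pgd:
            self.switches.append((self.current_pgd, new_pgd))
            self.current_pgd = new_pgd
            return True    # a different address space is now active
        return False       # same value rewritten, no switch recorded


monitor = ContextSwitchMonitor()
monitor.on_c2_base0_write(0x1F000000)   # first process observed
monitor.on_c2_base0_write(0x1F000000)   # unchanged, ignored
monitor.on_c2_base0_write(0x1E400000)   # switch to another address space
```

The recorded page table base can then serve as a key for looking up the process whose task_struct carries the same pgd.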
Reconstructing OS-level View (Cont.) • System calls • The ARM architecture uses the service zero instruction, svc #0, to make system calls, with the system call number in register R7 • Processes and Threads • Read the task_struct structure for process information • pid, tgid, pgd, uid, gid, euid, egid, comm, cmdline, thread_info • The sys_fork, sys_execve, sys_clone, and sys_prctl system calls trigger an information update • Memory maps • Read the mm_struct structure • sys_mmap2 triggers an information update
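A minimal Python sketch of the system call check described above, assuming 32-bit ARM-mode instruction words (Thumb encodings are not handled) and the ARM EABI numbers for the five calls named on the slide. The function names are hypothetical and do not come from DroidScope.

```python
# ARM EABI system call numbers for the calls named on the slide.
SYSCALLS_OF_INTEREST = {
    2: "sys_fork",
    11: "sys_execve",
    120: "sys_clone",
    172: "sys_prctl",
    192: "sys_mmap2",
}


def is_svc_zero(insn):
    """Return True if a 32-bit ARM-mode instruction word encodes `svc #0`."""
    cond = (insn >> 28) & 0xF
    # SVC has bits 27..24 all set; condition 0b1111 is not a conditional SVC.
    is_svc = cond != 0xF and (insn & 0x0F000000) == 0x0F000000
    imm24 = insn & 0x00FFFFFF          # immediate operand of the SVC
    return is_svc and imm24 == 0


def classify_syscall(insn, regs):
    """Return the name of the system call being made, 'other', or None."""
    if not is_svc_zero(insn):
        return None                    # not a system call instruction
    return SYSCALLS_OF_INTEREST.get(regs[7], "other")   # number is in R7


regs = [0] * 16
regs[7] = 11                           # pretend the guest loaded 11 into R7
result = classify_syscall(0xEF000000, regs)   # 0xEF000000 is `svc #0`
```

In this sketch, a result of `sys_fork`, `sys_execve`, `sys_clone`, or `sys_prctl` would prompt a refresh of the process list, and `sys_mmap2` a refresh of the memory map.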
Reconstructing Java-level View • Dalvik Instructions • Goal: know which Dalvik instruction is executing right now • Register R15, the ARM program counter, points into the mterp handler for the currently executing Dalvik instruction
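One way to read the slide above, sketched in Python. It assumes the mterp interpreter places one fixed-size handler per opcode, back to back from a base address, so the opcode follows from where R15 falls. The 64-byte slot size and the 256-entry table are assumptions of this sketch, and the function name is hypothetical.

```python
HANDLER_SIZE = 64    # assumption: each mterp opcode handler occupies a 64-byte slot
NUM_OPCODES = 256    # assumption: one slot per possible opcode byte


def dalvik_opcode_from_pc(r15, ibase):
    """Map the native program counter to a Dalvik opcode number.

    r15   -- native program counter of the emulated ARM CPU
    ibase -- base address of the mterp handler table
    Returns None when R15 lies outside the handler table.
    """
    offset = r15 - ibase
    if 0 <= offset < HANDLER_SIZE * NUM_OPCODES:
        return offset // HANDLER_SIZE
    return None


# R15 sits 8 bytes into the handler slot for opcode 0x6E.
opcode = dalvik_opcode_from_pc(0x1000 + 0x6E * 64 + 8, 0x1000)
```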
Reconstructing Java-level View (Cont.) • Just-In-Time Compiler • Hot, heavily used code is compiled into native machine code • Execution of that compiled code bypasses the mterp component • The DVM calls dvmGetCodeAddr() for the address of compiled code • To disable the JIT: flush the JIT cache, make dvmGetCodeAddr() return NULL, and reset the counter
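The JIT handling above can be illustrated with a small Python model. It is a sketch under stated assumptions, not the DroidScope implementation: `JitGate`, its fields, and the idea of restricting the effect to monitored address ranges are choices made for this example, and `get_code_addr` only stands in for dvmGetCodeAddr(), with `None` playing the role of NULL.

```python
class JitGate:
    """Toy model of forcing monitored bytecode back into the interpreter."""

    def __init__(self):
        self.monitored = []    # (start, end) Dalvik address ranges under analysis
        self.code_cache = {}   # Dalvik address -> address of compiled native code

    def monitor(self, start, end):
        """Start analyzing [start, end) and flush its cached translations."""
        self.monitored.append((start, end))
        for addr in [a for a in self.code_cache if start <= a < end]:
            del self.code_cache[addr]

    def get_code_addr(self, dalvik_pc):
        """Stand-in for dvmGetCodeAddr(): None means 'no compiled code'."""
        if any(start <= dalvik_pc < end for start, end in self.monitored):
            return None        # keep execution inside the interpreter
        return self.code_cache.get(dalvik_pc)


gate = JitGate()
gate.code_cache = {0x100: 0xA000, 0x900: 0xB000}
gate.monitor(0x000, 0x200)     # analyze the first region only
```

After the call to `monitor`, a lookup for 0x100 yields `None`, while 0x900 still resolves to its compiled code, so unmonitored code keeps running at JIT speed.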
Reconstructing Java-level View (Cont.) • Dalvik Virtual Machine States • Registers R4 to R8 hold the DVM state and are recorded • R4: Program Counter • R5: Stack Frame Pointer • R6: InterpState Structure • R7: Current Instruction • R8: mterp Base Address
Reconstructing Java-level View (Cont.) • Java Objects • Obtaining data inside Java objects such as string data
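A hedged Python sketch of reading string data out of a Java object in guest memory. The field offsets below are assumptions made for illustration (a 32-bit little-endian layout with an 8-byte object header, a string that references a UTF-16 character array, and array contents starting at offset 16). A real tool would take these offsets from the running DVM. All names are hypothetical.

```python
import struct

# Assumed layout, for illustration only.
STR_VALUE_OFF = 8     # reference to the backing char array
STR_OFFSET_OFF = 16   # index of the first character in that array
STR_COUNT_OFF = 20    # number of characters in the string
ARR_DATA_OFF = 16     # start of the array contents


def read_u32(mem, addr):
    """Read a little-endian 32-bit value from guest memory."""
    return struct.unpack_from("<I", mem, addr)[0]


def read_java_string(mem, str_addr):
    """Follow a string object to its char array and decode the characters."""
    array_addr = read_u32(mem, str_addr + STR_VALUE_OFF)
    offset = read_u32(mem, str_addr + STR_OFFSET_OFF)
    count = read_u32(mem, str_addr + STR_COUNT_OFF)
    start = array_addr + ARR_DATA_OFF + 2 * offset    # 2 bytes per UTF-16 unit
    return bytes(mem[start:start + 2 * count]).decode("utf-16-le")


# Build a fake guest memory image: string object at 0x00, char array at 0x40.
mem = bytearray(128)
struct.pack_into("<I", mem, 0x00 + STR_VALUE_OFF, 0x40)
struct.pack_into("<I", mem, 0x00 + STR_OFFSET_OFF, 1)
struct.pack_into("<I", mem, 0x00 + STR_COUNT_OFF, 5)
mem[0x40 + ARR_DATA_OFF:0x40 + ARR_DATA_OFF + 12] = "xhello".encode("utf-16-le")

text = read_java_string(mem, 0x00)
```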
Symbol Information • Native library symbols • Use objdump to retrieve symbol information • Malware is often stripped of all symbol information • Dalvik or Java symbols • Use dexdump to retrieve symbol information • DVM data structures also contain some symbol information • The InterpState structure (register R6) has a method field that points to the Method structure of the currently executing method • The Method structure has a name field that points to the method name
Interface & Plugins • APIs for analysis customization • The instrumentation logic in DroidScope is complex and dynamic • An event-based interface facilitates custom analysis tool development
Sample Plugin • Sets up which program is to be analyzed and prints all Dalvik opcode information
API Implementation • API tracer • Instruments the invoke* and execute* Dalvik bytecodes to identify and log method invocations • Native instruction tracer • Gathers information about each instruction, including the raw instruction, its operands, and their values • Dalvik instruction tracer • Decodes instructions into dexdump format, including values and all available symbol information • Taint tracker • Monitors sensitive information and keeps track of data propagation
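Once symbols have been extracted offline (for example with objdump), an address seen during tracing still has to be mapped back to a name. A minimal Python sketch of that lookup follows, using a sorted table and binary search. The class name and the symbol names `func_a` and `func_b` are made up for the example.

```python
import bisect


class SymbolTable:
    """Resolve an address to the closest preceding symbol (illustrative)."""

    def __init__(self, symbols):
        # symbols: iterable of (start_address, name) pairs, in any order
        self._syms = sorted(symbols)
        self._starts = [addr for addr, _ in self._syms]

    def resolve(self, addr):
        """Return (name, offset_into_symbol), or None if addr precedes all symbols."""
        i = bisect.bisect_right(self._starts, addr) - 1
        if i < 0:
            return None
        start, name = self._syms[i]
        return name, addr - start


table = SymbolTable([(0x2000, "func_b"), (0x1000, "func_a")])
hit = table.resolve(0x1004)
```

A stripped binary simply yields an empty table, in which case every lookup returns `None` and the trace falls back to raw addresses.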
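The plugin code from the original slide is not preserved in this transcript. The following Python sketch only illustrates the event-based idea: a plugin registers callbacks, selects one program by name, and prints the opcode of every Dalvik instruction that program executes. The event names, `EventBus`, and `OpcodePrinter` are hypothetical and are not DroidScope's actual API.

```python
from collections import defaultdict


class EventBus:
    """Tiny publish/subscribe hub standing in for the instrumentation layer."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def register(self, event, handler):
        self._handlers[event].append(handler)

    def emit(self, event, **info):
        for handler in self._handlers[event]:
            handler(**info)


class OpcodePrinter:
    """Plugin: watch one program and print its Dalvik opcodes."""

    def __init__(self, bus, target_name):
        self.target_name = target_name
        self.target_pid = None     # filled in once the target process appears
        self.lines = []
        bus.register("process_start", self.on_process_start)
        bus.register("dalvik_insn", self.on_dalvik_insn)

    def on_process_start(self, pid, name):
        if name == self.target_name:
            self.target_pid = pid

    def on_dalvik_insn(self, pid, pc, opcode):
        if pid == self.target_pid:         # ignore every other process
            line = f"pc={pc:#x} opcode={opcode:#04x}"
            self.lines.append(line)
            print(line)


bus = EventBus()
plugin = OpcodePrinter(bus, "com.example.sample")
bus.emit("process_start", pid=42, name="com.example.sample")
bus.emit("dalvik_insn", pid=42, pc=0x4000, opcode=0x6E)
bus.emit("dalvik_insn", pid=7, pc=0x5000, opcode=0x0E)    # other process, ignored
```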
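The taint tracker's propagation idea can be sketched with a shadow map from locations (registers or memory addresses) to sets of taint labels. This Python sketch is generic and assumes a simple policy: copies overwrite the destination's taint, and arithmetic unions the operands' taint. It does not claim to be DroidScope's policy, and all names are hypothetical.

```python
class TaintTracker:
    """Shadow-state taint propagation over abstract locations (illustrative)."""

    def __init__(self):
        self.shadow = {}    # location -> set of taint labels

    def taint(self, loc, label):
        """Mark a location as carrying sensitive data from the given source."""
        self.shadow.setdefault(loc, set()).add(label)

    def _assign(self, dst, labels):
        if labels:
            self.shadow[dst] = set(labels)
        else:
            self.shadow.pop(dst, None)      # overwritten with clean data

    def move(self, dst, src):
        """dst = src: the destination takes over the source's taint."""
        self._assign(dst, self.shadow.get(src, set()))

    def binary_op(self, dst, a, b):
        """dst = a op b: the result carries the taint of both operands."""
        self._assign(dst, self.shadow.get(a, set()) | self.shadow.get(b, set()))

    def labels(self, loc):
        return set(self.shadow.get(loc, set()))

    def is_tainted(self, loc):
        return bool(self.shadow.get(loc))


tracker = TaintTracker()
tracker.taint("r0", "IMEI")            # a sensitive value enters r0
tracker.move("r1", "r0")               # copy spreads the taint
tracker.binary_op("r2", "r1", "r3")    # the result stays tainted
tracker.move("r0", "r4")               # r0 overwritten with clean data
```

A sink check, for instance before a network send, would then query `is_tainted` on the buffer being written.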
Evaluation • Benchmarks to assess efficiency and capability • 7 benchmark apps • AnTuTu Benchmark • AnTuTuCaffeineMark • CaffeineMark • CF-Bench • Mobile Processor Benchmark • Benchmark by Softweg • Linpack
Evaluation • Performance • Capability • Analysis of DroidKungFu • Analysis of DroidDream
Discussion • Limited Code Coverage • A general drawback of dynamic analysis • Manipulating the return values of function calls may increase code coverage • Other Dalvik Analysis Tools • Dalvik/Java Static Analysis: Woodpecker, DroidMOSS • Native Static Analysis: IDA, binutils, BAP • Android Dynamic Analysis: TaintDroid, DroidRanger • Linux Kernel Dynamic Analysis: logcat, adb
Conclusion • We presented DroidScope, a fine-grained dynamic binary instrumentation tool for Android that rebuilds two levels of semantic information