
AccessMiner: Using System-Centric Models for Malware Protection

Andrea Lanzi, Davide Balzarotti, Christopher Kruegel, Mihai Christodorescu, and Engin Kirda. ACM CCS, October 2010.



  1. AccessMiner: Using System-Centric Models for Malware Protection • Andrea Lanzi, Davide Balzarotti, Christopher Kruegel, Mihai Christodorescu, and Engin Kirda • ACM CCS, October 2010

  2. OUTLINE • Malware Detection • System Call Data Collection • Program-Centric Models and Detection • System-Centric Models and Detection • Discussion and Conclusion

  3. OUTLINE • Malware Detection • System Call Data Collection • Program-Centric Models and Detection • System-Centric Models and Detection • Discussion and Conclusion

  4. Malware Detection • Signature • Static content • Byte strings, instruction sequences => defeated by code obfuscation • Behavior • Dynamic actions • Sequences of system calls, API functions • A program-centric approach • …good results?

  5. Malware Detection Problem • Test case • Small scale • About 10 benign applications • Limited execution • A few minutes, sandbox • Synthetic inputs • Single machine

  6. Malware Detection Problem (cont.) • Program-centric model • Narrow view of a program • Diversity of system call information • How do benign programs interact with their environment? • Existing models may be specific to only a small set of benign applications

  7. OUTLINE • Malware Detection • System Call Data Collection • Program-Centric Models and Detection • System-Centric Models and Detection • Discussion and Conclusion

  8. System Call Data Collection • A Microsoft Windows kernel module • Collects, anonymizes, and uploads system call logs • Hooks the System Service Descriptor Table • Mindful of system resources

  9. Kernel collector • 79 different system calls • Related to files, the registry, processes and threads, networking, and memory • Same subset as used in Anubis • Record format: <timestamp, program, pid, ppid, system call, args, result>
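
The record layout on this slide can be sketched as a small data structure. This is a minimal illustration only: the slides give the seven fields but not the on-disk encoding, so the tab-separated line format, the `|` separator for arguments, and the names `SyscallRecord` and `parse_record` are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class SyscallRecord:
    """One logged event: <timestamp, program, pid, ppid, system call, args, result>."""
    timestamp: float
    program: str
    pid: int
    ppid: int
    syscall: str
    args: Tuple[str, ...]
    result: int


def parse_record(line: str) -> SyscallRecord:
    """Parse one log line.

    Assumed (hypothetical) encoding: seven tab-separated fields,
    with the arguments joined by '|'.
    """
    ts, program, pid, ppid, syscall, args, result = line.rstrip("\n").split("\t")
    return SyscallRecord(
        timestamp=float(ts),
        program=program,
        pid=int(pid),
        ppid=int(ppid),
        syscall=syscall,
        args=tuple(a for a in args.split("|") if a),
        result=int(result),
    )
```

A frozen dataclass keeps each record immutable, which suits log entries that are only read after collection.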

  10. System Call Data • Sensitive data are replaced • Non-system paths, user-root registry key, IP addresses

  11. System Call Data Collection • Large and diverse set of system call traces • Ten different machines, different users • Several weeks • 114.5 GB of data • 1.556 billion system calls • 362,600 processes • 242 applications

  12. Data set • 2~4 days with 2~12 hours • Production systems, development systems

  13. Data Normalization • Raw data (system call logs) => accessed resources and access types • Tracking the access operations • The set of resources open at any given time • OS handles • Until the resource is released (NtClose) • Execution path and file name: • NtOpenFile, NtCreateSection, NtCreateThread
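
The handle tracking described on this slide can be sketched as follows. This is a simplified illustration, not the authors' implementation: it assumes each event has already been reduced to a hypothetical `(pid, syscall, handle, path)` tuple, it covers only file reads and writes, and it leaves out the execute case (which the slide ties to NtOpenFile, NtCreateSection, and NtCreateThread).

```python
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical simplified event: (pid, syscall name, handle, path or None)
Event = Tuple[int, str, int, Optional[str]]


def normalize(events: Iterable[Event]) -> List[Tuple[int, str, str]]:
    """Turn raw system call events into (pid, resource path, access type) tuples."""
    open_handles: Dict[Tuple[int, int], str] = {}  # (pid, handle) -> path
    accesses: List[Tuple[int, str, str]] = []
    for pid, syscall, handle, path in events:
        key = (pid, handle)
        if syscall in ("NtOpenFile", "NtCreateFile") and path is not None:
            # The handle now refers to this resource until it is closed.
            open_handles[key] = path
        elif syscall == "NtReadFile" and key in open_handles:
            accesses.append((pid, open_handles[key], "read"))
        elif syscall == "NtWriteFile" and key in open_handles:
            accesses.append((pid, open_handles[key], "write"))
        elif syscall == "NtClose":
            # The resource is released; later use of the handle is ignored.
            open_handles.pop(key, None)
    return accesses
```

Keying the table on `(pid, handle)` reflects that handle values are only meaningful within a single process.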

  14. OUTLINE • Malware Detection • System Call Data Collection • Program-Centric Models and Detection • System-Centric Models and Detection • Discussion and Conclusion

  15. Analysis of System Call Data • How diverse is the collected system call data? • Focus on system call types • Long tradition in the security community • Most models rely on characteristic call patterns • Argument values are ignored

  16. Creating n-gram Models • Follow a “standard” approach • 1. Extract n-gram models for a set of malware programs and a set of benign programs • 2. Find all n-grams that appear in malware programs but not in benign programs • 3. Hope that those n-grams are characteristic of malware programs
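
The three steps on this slide can be sketched in a few lines. This is a minimal illustration under assumptions: a trace is taken to be a plain list of system call names, and the function names (`ngrams`, `malware_only_ngrams`, `is_flagged`) are hypothetical, not taken from the paper.

```python
from typing import Iterable, List, Set, Tuple

NGram = Tuple[str, ...]


def ngrams(trace: List[str], n: int) -> Set[NGram]:
    """Step 1: all contiguous length-n windows of a system call trace."""
    return {tuple(trace[i:i + n]) for i in range(len(trace) - n + 1)}


def malware_only_ngrams(malware_traces: Iterable[List[str]],
                        benign_traces: Iterable[List[str]],
                        n: int) -> Set[NGram]:
    """Step 2: n-grams seen in some malware trace and in no benign trace."""
    malicious: Set[NGram] = set()
    for trace in malware_traces:
        malicious |= ngrams(trace, n)
    benign: Set[NGram] = set()
    for trace in benign_traces:
        benign |= ngrams(trace, n)
    return malicious - benign


def is_flagged(trace: List[str], signatures: Set[NGram], n: int) -> bool:
    """Step 3: flag a trace if it contains any malware-only n-gram."""
    return not ngrams(trace, n).isdisjoint(signatures)
```

Because the signature set is a set difference, every additional benign trace can only shrink it, which is one way to see why diverse benign behavior weakens this kind of model.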

  17. Unique n-gram analysis

  18. n-gram Models • 10,838 malware samples from Anubis • Ten experiments (one per machine) • Train an n-gram model on the system call traces from 9 machines and 2/3 of the malware set • Perform detection on the remaining machine's system call traces and the remaining 1/3 of the malware set

  19. Detection Results

  20. Program-Centric Models and Detection • System call sequences invoked by benign applications are diverse • Models therefore have difficulty distinguishing normal from malicious behavior • A large amount of data is needed

  21. OUTLINE • Malware Detection • System Call Data Collection • Program-Centric Models and Detection • System-Centric Models and Detection • Discussion and Conclusion

  22. System-Centric Models and Detection • Generalize how benign programs interact with the operating system • Record the files and registry entries that are accessed • Read, write, execute • It is “convergence”

  23. Access Activity Model • A set of labels for operating system resources • A label “L” is a set of access tokens {t0, t1, …, tn} • A token “t” is a pair <a, op>, e.g. <firefox, write>, <*, execute> • a => application • op => type of access

  24. Initial Access Activity Model (1) • Use system call traces of all benign processes • A virtual file system tree • Example: application “a” accesses C:\foo\a.txt (write), application “b” accesses C:\foo\bar\b.rar (exec) • Resulting tree: C: -> foo <a,write> -> bar <b,exec>
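
The example tree on this slide can be reproduced with a short sketch. It rests on assumptions: the tree is stored as a flat dictionary keyed by folder path components rather than as linked nodes, each token is attached to the folder that directly contains the accessed file (as in the slide's example), and the name `build_model` is hypothetical.

```python
from collections import defaultdict
from pathlib import PureWindowsPath
from typing import Dict, Iterable, Set, Tuple

Token = Tuple[str, str]        # <application, operation>
Folder = Tuple[str, ...]       # path components, e.g. ("C:\\", "foo")


def build_model(accesses: Iterable[Tuple[str, str, str]]) -> Dict[Folder, Set[Token]]:
    """Build folder labels from (application, file path, operation) tuples."""
    model: Dict[Folder, Set[Token]] = defaultdict(set)
    for app, path, op in accesses:
        # Label the folder containing the file with the token <app, op>.
        folder = PureWindowsPath(path).parent.parts
        model[folder].add((app, op))
    return dict(model)


model = build_model([
    ("a", "C:\\foo\\a.txt", "write"),
    ("b", "C:\\foo\\bar\\b.rar", "exec"),
])
```

With the slide's two accesses, `foo` carries the label `{<a,write>}` and `bar` carries `{<b,exec>}`.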

  25. Model Pre-processing(2) • Remove some elements in the tree • Microsoft Windows services • Desktop indexing programs • Anti-virus software • Identify applications that start processes with different names • C:\Windows\system32 => win_core

  26. Model Generalization (3) • Propagated • Container • All children are private (without *) • C:\Program Files • Merged • <x.write> => <x.read>

  27. System-Centric Model Detection • For any op • Find the longest prefix P shared between the path to the resource and the folders in the virtual tree stored by our model • Ten experiments • File system access activity model • About 100 labels • Registry access activity model • About 3000 labels • Full access activity model
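
The longest-prefix lookup on this slide can be sketched as below. This is an illustration under stated assumptions, not the paper's implementation: the model is a flat dictionary from folder path components to a set of `<application, operation>` tokens, `*` matches any application, and an access whose path shares no prefix with any labeled folder is allowed (the slides do not say how that case is handled). The name `check_access` is hypothetical.

```python
from pathlib import PureWindowsPath
from typing import Dict, Set, Tuple

Token = Tuple[str, str]
Folder = Tuple[str, ...]


def check_access(model: Dict[Folder, Set[Token]], app: str, path: str, op: str) -> bool:
    """Return True if the access is permitted by the label of the longest matching folder."""
    parts = PureWindowsPath(path).parent.parts
    # Walk from the deepest folder upward to find the longest labeled prefix P.
    for end in range(len(parts), 0, -1):
        label = model.get(parts[:end])
        if label is not None:
            return (app, op) in label or ("*", op) in label
    # Assumption: no labeled prefix means no policy applies to this resource.
    return True
```

An access that returns `False` here would count as a policy violation, and a process with such violations is the kind the detector would report.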

  28. Detection Results (Files) • Results look sobering • Many malware samples don’t work (!) • 10,838 -> 7,847 • Use only write operations • Our own logging component • Software updates

  29. Detection Results (Registry) • HKEY_USER\Software\Microsoft • Need a larger training set

  30. OUTLINE • Malware Detection • System Call Data Collection • Program-Centric Models and Detection • System-Centric Models and Detection • Discussion and Conclusion

  31. Discussion and Conclusion • Full access activity model • 91% detection / 0% false positives • System-centric approach • Policy violations occurred only for a few specific classes of programs • Network limitation • MAC policy • SELinux
