
Exploiting diverse observation perspectives to get insights on the malware landscape

Corrado Leita (Symantec Research Labs), Ulrich Bayer (Technical University Vienna), Engin Kirda (Institute Eurecom) @ iSecLab. Outline: Introduction, Related Work, SGNET and EPM Clustering, Results.


Presentation Transcript


  1. Corrado Leita (Symantec Research Labs), Ulrich Bayer (Technical University Vienna), Engin Kirda (Institute Eurecom) @ iSecLab. Exploiting diverse observation perspectives to get insights on the malware landscape

  2. Outline • Introduction • Related Work • SGNET and EPM Clustering • Results • Conclusion ADLab Meeting

  3. Introduction

  4. Introduction • 30,000 samples per day are submitted to the VirusTotal website • On the order of millions of samples per month • Malware writers can generate new code from existing code bases or by re-packing binaries with code obfuscation tools • e.g., the Allaple worm

  5. Introduction • A complete picture of the complexity of the malware landscape is possible only by discerning polymorphic instances from new variants • Goal: get quantitative insights on the interrelations among the different families, and on the extent to which malware writers share code and produce patches to known variants

  6. Introduction • Uses the SGNET dataset • Combines clustering techniques based on either static or behavioral characteristics of the malware samples

  7. Related Work

  8. Related Work • Gheorghescu, 2005 • Disassembles the samples • Compares their basic blocks • Kolter and Maloof, 2006 • Compare hex dumps of the samples' code segments • Wicherski, 2009, peHash • Polymorphic binaries receive the same hash value • The hash is computed from the portions of the PE header that are not mutated

  9. Related Work • Lee and Mody, 2006 • Based on system call traces • One of the first attempts to cluster malware according to its behavior • Bailey et al., 2007 • The first to build a clustering system that describes a sample’s behavior in more abstract terms • O(n^2)

  10. Related Work • Anubis • http://anubis.iseclab.org/ • Dynamic analysis system for capturing a sample’s behavior • Data tainting • Tracking of sensitive compare operations

  11. SGNET and EPM Clustering

  12. SGNET and EPM Clustering • SGNET focuses on collecting detailed information on code injection attacks and on the sources responsible for these attacks • VirusTotal • Anubis

  13. SGNET and EPM Clustering • SGNET • ScriptGen • Learning 0-day behavior • Argos • Program flow hijack detection • Nepenthes • Shellcode emulation • Malware download

  14. SGNET and EPM Clustering • Sensor: ScriptGen FSM • Sample Factory: Argos • Shellcode handlers: Nepenthes

  15. EPM Clustering

  16. EPM Clustering • Epsilon-Gamma-Pi-Mu (EGPM) model • Exploit (ε) • Bogus control data (γ) • Payload (π) • Malware (μ) • Assumption: any randomization performed by the attacker has a limited scope • γ is not considered, due to the lack of host-based information in the SGNET dataset

  17. EPM Clustering • Phase 1: feature definition

  18. EPM Clustering • Pi • PUSH-based interaction • PULL-based interaction • Central repository • Mu • PE header characteristics seem to be more difficult to mutate • A change in their values is likely to be associated with a modification or recompilation of an existing codebase

  19. EPM Clustering • Clearly, all of the features taken into account for the classification could be easily randomized by the malware writer • More complex (costly) polymorphic approaches might appear in the future

  20. EPM Clustering • Phase 2: invariant discovery • An invariant value is a value that is not specific to a certain: • Attack instance • Attacker • Destination • Threshold-based: an invariant value must be seen in • At least 10 different attack instances • From at least 3 different attackers • On at least 3 honeypot IPs
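The threshold rule on this slide can be illustrated with a minimal Python sketch. The event layout (keys `instance_id`, `attacker_ip`, `honeypot_ip`) and the function name `find_invariants` are assumptions made for illustration; the slides do not describe how SGNET stores its data. Only the three threshold values come from the slide.

```python
from collections import defaultdict

# Thresholds taken from the slide.
MIN_INSTANCES = 10
MIN_ATTACKERS = 3
MIN_HONEYPOTS = 3

def find_invariants(events, feature):
    """Return the values of `feature` that pass all three thresholds.

    `events` is an iterable of dicts with the hypothetical keys
    'instance_id', 'attacker_ip', 'honeypot_ip', plus one key per feature.
    """
    # For each feature value, track the distinct instances, attackers
    # and honeypots that have been observed with it.
    seen = defaultdict(lambda: (set(), set(), set()))
    for ev in events:
        instances, attackers, honeypots = seen[ev[feature]]
        instances.add(ev["instance_id"])
        attackers.add(ev["attacker_ip"])
        honeypots.add(ev["honeypot_ip"])
    return {
        value
        for value, (instances, attackers, honeypots) in seen.items()
        if len(instances) >= MIN_INSTANCES
        and len(attackers) >= MIN_ATTACKERS
        and len(honeypots) >= MIN_HONEYPOTS
    }

# Made-up data: 12 attack instances, 4 attackers, 3 honeypots.
# The port is constant, while the MD5 changes at every instance.
events = [
    {
        "instance_id": i,
        "attacker_ip": f"10.0.0.{i % 4}",
        "honeypot_ip": f"192.0.2.{i % 3}",
        "port": 9988,
        "md5": f"hash{i}",
    }
    for i in range(12)
]
port_invariants = find_invariants(events, "port")
md5_invariants = find_invariants(events, "md5")
```

In this made-up data the port value passes all thresholds, while no MD5 value does, since each MD5 is tied to a single attack instance.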

  21. EPM Clustering • Phase 3: pattern discovery • T = v1, v2, v3, …, vn

  22. EPM Clustering • Phase 4: pattern-based classification • Clustering • Multiple patterns could match the same instance • Each instance is always associated with the most specific pattern matching its feature values • All the instances associated with the same pattern are said to belong to the same EPM cluster
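Phases 3 and 4 can be sketched together in Python. This is an illustration under stated assumptions, not the authors' implementation: it assumes that each position of a pattern T holds either an invariant value or a wildcard (the slides do not spell this out), and that "most specific" means the fewest wildcards. The names `WILDCARD`, `to_pattern`, `matches`, `specificity` and `classify` are hypothetical.

```python
WILDCARD = None  # assumption: marks a "do not care" position in a pattern

def to_pattern(values, invariants):
    """Keep invariant values and replace everything else with WILDCARD.

    `values` is a tuple (v1, ..., vn); `invariants` is a list of n sets,
    one set of invariant values per feature.
    """
    return tuple(v if v in inv else WILDCARD for v, inv in zip(values, invariants))

def matches(pattern, values):
    """A pattern matches when every non-wildcard position is equal."""
    return all(p is WILDCARD or p == v for p, v in zip(pattern, values))

def specificity(pattern):
    """Number of non-wildcard positions in the pattern."""
    return sum(p is not WILDCARD for p in pattern)

def classify(values, patterns):
    """Return the most specific pattern matching `values`, or None."""
    candidates = [p for p in patterns if matches(p, values)]
    return max(candidates, key=specificity, default=None)

# Two hypothetical payload features: interaction type and TCP port.
invariants = [{"PUSH"}, {9988}]
instances = [("PUSH", 9988), ("PUSH", 4444), ("PULL", 80)]
patterns = {to_pattern(v, invariants) for v in instances}
cluster_of = {v: classify(v, patterns) for v in instances}
```

Instances that map to the same pattern end up in the same cluster. Ties between equally specific patterns are not resolved in any particular order here; the slides do not say how such ties are handled.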

  23. EPM Clustering • E-clusters • Exploit • P-clusters • Payload • M-clusters • Malware

  24. EPM Clustering • B-Cluster • Anubis • Compare two samples based on their behavioral profile

  25. Results

  26. Results • Data: Jan 2008 to May 2009, collected by the SGNET deployment • 6353 malware samples • Only 5165 could be correctly executed in Anubis • Some malware samples could not be downloaded correctly by Nepenthes

  27. Results • 39 E-clusters • 27 P-clusters • 260 M-clusters • 972 B-clusters

  28. Results

  29. Results • #(exploit/payload combinations) is low • Most malware variants seem to share few distinct exploitation routines for propagation • #(B-clusters) is lower than #(M-clusters) • Some M-clusters are likely to correspond to variations of the same codebase

  30. Results • Clustering anomalies • 860 B-clusters are composed of a single malware sample and are associated with a single attack instance in the SGNET dataset • A small number of size-1 B-clusters have a 1-1 association with a static M-cluster • Mostly…
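The anomaly check on this slide can be sketched as a cross-reference between behavioral and static cluster assignments. The data layout (a dict from sample ID to an `(m_cluster, b_cluster)` pair) and all cluster labels below are made up for illustration. "1-1 association" is read here as "the only sample of the B-cluster is also the only sample of its M-cluster", which is an assumption; the per-attack-instance condition from the slide is not modeled.

```python
from collections import defaultdict

def singleton_b_clusters(assignments):
    """Find size-1 B-clusters and those with a 1-1 M-cluster association.

    `assignments` maps sample_id -> (m_cluster, b_cluster); this layout
    is hypothetical. Returns (singletons, one_to_one).
    """
    by_b = defaultdict(set)
    by_m = defaultdict(set)
    for sample, (m, b) in assignments.items():
        by_b[b].add(sample)
        by_m[m].add(sample)

    singletons = {b for b, samples in by_b.items() if len(samples) == 1}

    one_to_one = set()
    for b in singletons:
        (sample,) = by_b[b]          # the single sample in this B-cluster
        m, _ = assignments[sample]
        if len(by_m[m]) == 1:        # its M-cluster holds no other sample
            one_to_one.add(b)
    return singletons, one_to_one

# Made-up assignments: one M-cluster scattered over three singleton
# B-clusters, one true 1-1 pair, and one ordinary two-sample cluster.
assignments = {
    "s1": ("M13", "B1"),
    "s2": ("M13", "B2"),
    "s3": ("M13", "B3"),
    "s4": ("M7", "B4"),
    "s5": ("M2", "B5"),
    "s6": ("M2", "B5"),
}
singletons, one_to_one = singleton_b_clusters(assignments)
```

Singleton B-clusters that are not in `one_to_one` share a static M-cluster with other samples, which is the case the slide flags as the common one.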

  31. Results

  32. Results • P-pattern 45: • PUSH-based download • TCP port 9988

  33. Results • M-cluster 13:

  34. Results • M-cluster 13 is a polymorphic malware associated with several different B-clusters • MD5 is not an invariant • Allaple mutates its content at each attack instance

  35. Results • Each behavioral profile corresponds to an execution time of 4 minutes • Bot? Honeypots may help!

  36. Results

  37. Results • Allaple • Worm exploiting MS04-007 • DoS attacks

  38. Results • IRC servers

  39. Conclusion

  40. Conclusion • Combine different clustering techniques • Improve effectiveness in building intelligence on the threat economy
