1 / 25

Binary Analysis for Botnet Reverse Engineering & Defense

Binary Analysis for Botnet Reverse Engineering & Defense. Dawn Song UC Berkeley. Binary Analysis Is Important for Botnet Defense. Botnet programs: no source code, only binary Botnet defense needs internal understanding of botnet programs C&C reverse engineering

sophie
Download Presentation

Binary Analysis for Botnet Reverse Engineering & Defense

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Binary Analysis for Botnet Reverse Engineering & Defense Dawn Song UC Berkeley

  2. Binary Analysis Is Important for Botnet Defense • Botnet programs: no source code, only binary • Botnet defense needs internal understanding of botnet programs • C&C reverse engineering • Different possible commands, encryption/decryption • Botnet traffic rewriting • Botnet infiltration • Botnet vulnerability discovery

  3. BitBlaze Binary Analysis Infrastructure: Architecture • The first infrastructure: • Novel fusion of static, dynamic, formal analysis methods • Loop extended symbolic execution • Grammar-aware symbolic execution • Whole system analysis (including OS kernel) • Analyzing packed/encrypted/obfuscated code Vine: Static Analysis Component TEMU: Dynamic Analysis Component Rudder: Mixed Execution Component BitBlazeBinary Analysis Infrastructure

  4. BitBlaze: Security Solutions via Program Binary Analysis • Unified platform to accurately analyze security properties of binaries • Security evaluation & audit of third-party code • Defense against morphing threats • Faster & deeper analysis of malware Detecting Vulnerabilities Generating Filters Dissecting Malware BitBlazeBinary Analysis Infrastructure

  5. The BitBlazeApproach & Research Foci • Semantics based, focus on root cause:Automatically extracting security-related properties from binary code for effective vulnerability detection & defense • Build a unified binary analysis platform for security • Identify & cater common needs of different security applications • Leverage recent advances in program analysis, formal methods, binary instrumentation/analysis techniques for new capabilities • Solve real-world security problems via binary analysis • Extracting security related models for vulnerability detection • Generating vulnerability signatures to filter out exploits • Dissecting malware for real-time diagnosis & offense: e.g., botnet infiltration • More than a dozen security applications & publications

  6. Plans • Building on BitBlaze to develop new techniques • Automatic Reverse Engineering of C&C protocols of botnets • Automatic rewriting of botnet traffic to facilitate botnet infiltration • Vulnerability discovery of botnet

  7. Preliminary Work • Dispatcher: Enabling Active Botnet Infiltration using Automatic Protocol Reverse-Engineering • Binary code extraction and interface identification for botnet traffic rewriting • Botnet analysis for vulnerability discovery

  8. Dispatcher: Enabling Active Botnet Infiltration using Automatic Protocol Reverse-Engineering Juan Caballero Pongsin Poosankam Christian Kreibich Dawn Song

  9. Automatic Protocol Reverse-Engineering • Process of extracting the application-level protocol used by a program, without the specification • Automatic process • Many undocumented protocols (C&C, Skype, Yahoo) • Encompasses extracting: • the Protocol Grammar • the Protocol State Machine • Message format extraction is prerequisite

  10. Challenges for Active Botnet Infiltration • Goal: Rewrite C&C messages on either dialog side • Access to one side of dialog only • Understand both sides of C&C protocol • Message structure • Field semantics • Handle encryption/obfuscation

  11. Technical Contributions • Buffer deconstruction, a technique to extract the format of sent messages • Earlier work only handles received messages • Field semantics inference techniques, for messages sent and received • Designing and developing Dispatcher • Extending a technique to handle encryption • Rewriting a botnetdialog using information extracted by Dispatcher

  12. Message Format Extraction • Extract format of a single message • Required by Grammar and State Machine extraction GET / HTTP/1.1 HTTP/1.1 200 OK [Polyglot] [Dispatcher]

  13. Message Field Tree HTTP/1.1 200 OK\r\n\r\n Field Range: [3:3] Field Boundary: Fixed Field Semantics: Delimiter Field Keywords: <none> Target: Version MSG [0:18] Delimiter StatusLine [17:18] [0:16] Reason Version Delimiter Status-Code Delimiter Delimiter [0:7] [8:8] [9:11] [13:14] [15:16] [12:12] Message format extraction has 2 steps: • Extract tree structure • Extract field attributes

  14. Sent vs. Received • Both protocol directions from single binary • Different problems • Taint information harder to leverage • Focus on how message is constructed, not processed • Different techniques needed: • Tree structure  Buffer Deconstruction • Field attributes  New heuristics

  15. Outline Problem Techniques Buffer Deconstruction Field Semantics Inference Handling encryption Evaluation Introduction

  16. Buffer Deconstruction • Intuition • Programs keep fields in separate memory buffers • Combine those buffers to construct sent message • Output buffer • Holds message when “send” function invoked • Or holds unencrypted message before encryption • Recursive process • Decompose a buffer into buffers used to fill it • Starts with output buffer • Stops when there’s nothing to recurse

  17. Buffer Deconstruction HTTP/1.1 200 OK\r\n\r\n • Message field tree = inverse of output buffer structure • Output is structure of message field tree • No field attributes, except range MSG [0:18] C(8) D(1) E(3) F(1) G(2) H(2) Status Line Delimiter [0:16] [17:18] A(17) B(2) [0:7] [8:8] [9:11] [12:12] [13:14] [15:16] Output Buffer (19) Version Delimiter StatusCode Reason Delimiter Delimiter

  18. Field Attributes Inference • Attributes capture extra information • E.g., inter-field relationships • Techniques identify • Keywords • Length fields • Delimiters • Variable-length field • Arrays

  19. Field Semantics • A field attribute in the message field tree • Captures the type of data in the field • Programs contain much semantic info  leverage it! • Semantics in well-defined functions and instructions • Prototype • Similar to type inference • Differs for received and sent messages

  20. Field Semantic Inference File path GET /index.html HTTP/1.1 HTTP/1.1 200 OK Content-Length: 25 <html>Hello world!</html> File length stat(“index.html”, &file_info); OUT IN OUT int stat(const char*path, struct stat *buf); struct stat { … off_tst_size; /* total size in bytes */ … }

  21. Detecting Encoding Functions • Encoding functions = (de)compression, (de)(en)cryption, (de)obfuscation… • High ratio of arithmetic & bitwise instructions • Use read/write set to identify buffers • Work-in-progress on extracting and reusing encoding functions

  22. MegaD C&C protocol type MegaD_Message = record { msg_len : uint16; encrypted_payload: bytestring &length = 8*msg_len; } &byteorder = bigendian; type encrypted_payload= record { version : uint16; mtype : uint16; data : MegaD_data (mtype); }; • type MegaD_data (msg_type: uint16) = • case msg_type of { • 0x00 -> m00 : msg_0; • […] • default -> unknown : bytestring &restofdata; • }; • C&C on tcp/443 using proprietary encryption • Use Dispatcher’s output to generate grammar • 15 different messages seen (7 recv, 8 sent) • 11 field semantics

  23. MegaD Dialog C&C Server SMTP Test Server TestSMTP Cmd? EHLO Failed

  24. MegaD Rewriting C&C Server SMTP Test Server Template Server TestSMTP EHLO Cmd? Failed Get Template Grammar Success Template?

  25. Summary • Buffer deconstruction, a technique to extract the format of sent messages • Field semantics inference techniques, for messages sent and received • Designed and developed Dispatcher • Extended technique to handle encryption • Rewrote MegaD dialog using information extracted by Dispatcher

More Related