
Automatic Protocol Format Reverse Engineering through Context-Aware Monitored Execution

The 15th Annual Network and Distributed System Security Symposium. Automatic Protocol Format Reverse Engineering through Context-Aware Monitored Execution. Zhiqiang Lin (1), Xuxian Jiang (2), Dongyan Xu (1), Xiangyu Zhang (1). (1) Purdue University, (2) George Mason University. February 12th, 2007.


Presentation Transcript


  1. Automatic Protocol Format Reverse Engineering through Context-Aware Monitored Execution
     The 15th Annual Network and Distributed System Security Symposium
     Zhiqiang Lin (1), Xuxian Jiang (2), Dongyan Xu (1), Xiangyu Zhang (1)
     (1) Purdue University, (2) George Mason University
     February 12th, 2007

  2. Motivation
     • Protocol reverse engineering
       • A process to recover protocol specifications
       • E.g., fields and their relationships
     • Applications:
       • Network-based intrusion detection: DoS attacks, port scans, computer systems
       • Network management: correctly recognize and monitor traffic
       • Fuzz testing: a software testing technique
       • …

  3. Challenges
     • Multiple fields in a single message
     • Non-static size of fields
     • Complex relationships among protocol fields: hierarchical, parallel, sequential

       0x0040: cd46 4745 5420 2f6e 6577 732e 6874 6d6c
       0x0050: 2048 5454 502f 312e 300d 0a55 7365 722d
       0x0060: 4167 656e 743a 2057 6765 742f 312e 3130
       0x0070: 2e32 2028 5265 6420 4861 7420 6d6f 6469
       0x0080: 6669 6564 290d 0a41 6363 6570 743a 202a
       0x0090: 2f2a 0d0a 486f 7374 3a20 3132 392e 3137
       0x00a0: 342e 3838 2e37 310d 0a43 6f6e 6e65 6374
       0x00b0: 696f 6e3a 204b 6565 702d 416c 6976 650d
       0x00c0: 0a0d 0a

  4. Challenges
     A BNF specification of the HTTP request (RFC 2616):

       HTTP-Request = Request-Line
                      (( general-header | request-header | entity-header ) CRLF)*
                      CRLF
                      [ message-body ]
       Request-Line = Method SP Request-URI SP HTTP-Version CRLF

     • Hierarchical relation: a field can be further divided into multiple sub-fields.
     • Sequential relation: captures the ordering between adjacent fields in a protocol.
     • Parallel relation: the positions of two or more fields are exchangeable in the protocol specification.
     Note: SP and CRLF are separators.
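The three relations can be illustrated on a concrete request. The following is a minimal Python sketch, not part of the original slides: the function name `split_request` and the returned layout are assumptions made for illustration. It splits the Request-Line into its sequential sub-fields (hierarchical relation) and stores the header lines in a dictionary, so that reordering them (parallel relation) does not change the result.

```python
def split_request(raw: str):
    """Split a raw HTTP request into Request-Line sub-fields and headers.

    Illustrative helper (hypothetical name); assumes a well-formed request
    that uses single SP separators and CRLF line endings.
    """
    # Everything before the empty line is the head; the rest is the body.
    head, _, _body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")

    # Hierarchical + sequential: the Request-Line has three ordered sub-fields.
    method, uri, version = lines[0].split(" ")

    # Parallel: header lines may appear in any order, so a dict models them.
    headers = dict(line.split(": ", 1) for line in lines[1:])
    return {"method": method, "uri": uri, "version": version}, headers
```

Two requests that differ only in the order of their header lines yield the same result, which is what the parallel relation expresses.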

  5. Related Work
     • Network trace
       • Protocol Informatics
       • Discoverer [W. Cui et al., Security’07]
     • Binary analysis
       • Polyglot [J. Caballero et al., CCS’07]
       • Automatic Network Protocol Analysis [G. Wondracek et al., NDSS’08]

  6. Observation
     Code snippet from http.c (null-httpd-0.5.0):

       int read_header(int sid) {
           ...
           /* Read one line of the request from the socket. */
           sgets(line, sizeof(line)-1, conn[sid].socket);
           ...
           /* Split the request line into its three sub-fields. */
           if (sscanf(line, "%[^ ] %[^ ] %[^ ]",
                      conn[sid].dat->in_RequestMethod,
                      conn[sid].dat->in_RequestURI,
                      conn[sid].dat->in_Protocol) != 3)
           ...
           while (strlen(line) > 0) {
               ...
               /* Each header is matched and copied by its own branch. */
               if (strncasecmp(line, "Cookie: ", 8) == 0)
                   strncpy(conn[sid].dat->in_Cookie, (char *)&line+8, sizeof(conn[sid].dat->in_Cookie)-1);
               if (strncasecmp(line, "Host: ", 6) == 0)
                   strncpy(conn[sid].dat->in_Host, (char *)&line+6, sizeof(conn[sid].dat->in_Host)-1);
               ...
               if (strncasecmp(line, "User-Agent: ", 12) == 0)
                   strncpy(conn[sid].dat->in_UserAgent, (char *)&line+12, sizeof(conn[sid].dat->in_UserAgent)-1);
           }
           ...
       }

     • The REQUEST LINE field is divided into METHOD, REQUEST URI, and HTTP VERSION.
     • Cookie, Host, and User-Agent are parallel fields.

  7. AutoFormat: Basic Idea
     [Figure: the input bytes "G E T / n e w s …" (protocol fields) are related to the execution context in which they are processed. Labels: Context, One Field, Another Field.]

  8. System Overview
     Input "GET /news.html" → Context-aware Execution Monitor → Log
     Each log record gives the input offset, the input byte, the EIP, and the call stack.

       0 'G'   EIP 0x4BA56A2
         main->ap_mpm_run->0x15C57->0x15B38->0x15941->ap_process_connection->ap_run_process_connection->0xF5A8->ap_read_request->ap_rgetline_core->ap_get_brigade->0x2D2CE->ap_get_brigade->0x2D667->apr_brigade_split_line->memchr
       1 'E'   EIP 0x4BA56A2
         main->ap_mpm_run->0x15C57->0x15B38->0x15941->ap_process_connection->ap_run_process_connection->0xF5A8->ap_read_request->ap_rgetline_core->ap_get_brigade->0x2D2CE->ap_get_brigade->0x2D667->apr_brigade_split_line->memchr
       2 'T'   EIP 0x4BA56A2
         main->ap_mpm_run->0x15C57->0x15B38->0x15941->ap_process_connection->ap_run_process_connection->0xF5A8->ap_read_request->ap_rgetline_core->ap_get_brigade->0x2D2CE->ap_get_brigade->0x2D667->apr_brigade_split_line->memchr
       …
       24 '\n'   EIP 0x4BA56A2
         main->ap_mpm_run->0x15C57->0x15B38->0x15941->ap_process_connection->ap_run_process_connection->0xF5A8->ap_read_request->ap_rgetline_core->ap_get_brigade->0x2D2CE->ap_get_brigade->0x2D667->apr_brigade_split_line->memchr
       …
       0 'G'   EIP 0x1F7F3
         main->ap_mpm_run->0x15C57->0x15B38->0x15941->ap_process_connection->ap_run_process_connection->0xF5A8->ap_read_request->ap_getword_white
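A log record of this kind can be held in a small data structure. The sketch below is an assumption made for illustration, not AutoFormat's actual log format: it supposes a hypothetical tab-separated line with four columns (offset, byte, call stack, EIP), and the names `LogRecord` and `parse_record` are invented here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogRecord:
    offset: int        # position of the byte in the input message
    byte: str          # the input byte itself
    call_stack: tuple  # function names/addresses, outermost first
    eip: int           # instruction pointer at the time of the access


def parse_record(line: str) -> LogRecord:
    """Parse one hypothetical tab-separated record: offset, byte, stack, EIP."""
    offset, byte, stack, eip = line.split("\t")
    return LogRecord(int(offset), byte, tuple(stack.split("->")), int(eip, 16))
```

Keeping the call stack as a tuple makes it hashable, so records can later be grouped by execution context with an ordinary dictionary.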

  9. Protocol Field Identifier
     • Analyzes the log file
     • Step 1: build the protocol field tree from the logged data.
     • Step 2: refine the tree using three heuristics.
     • Step 3: output the result.

  10. Example: Apache log data
      Each log record gives the input offset, the input byte, the EIP, and the call stack.

        0 'G'   EIP 0x4BA56A2
          main->ap_mpm_run->0x15C57->0x15B38->0x15941->ap_process_connection->ap_run_process_connection->0xF5A8->ap_read_request->ap_rgetline_core->ap_get_brigade->0x2D2CE->ap_get_brigade->0x2D667->apr_brigade_split_line->memchr
        1 'E'   EIP 0x4BA56A2
          main->ap_mpm_run->0x15C57->0x15B38->0x15941->ap_process_connection->ap_run_process_connection->0xF5A8->ap_read_request->ap_rgetline_core->ap_get_brigade->0x2D2CE->ap_get_brigade->0x2D667->apr_brigade_split_line->memchr
        2 'T'   EIP 0x4BA56A2
          main->ap_mpm_run->0x15C57->0x15B38->0x15941->ap_process_connection->ap_run_process_connection->0xF5A8->ap_read_request->ap_rgetline_core->ap_get_brigade->0x2D2CE->ap_get_brigade->0x2D667->apr_brigade_split_line->memchr
        …
        24 '\n'   EIP 0x4BA56A2
          main->ap_mpm_run->0x15C57->0x15B38->0x15941->ap_process_connection->ap_run_process_connection->0xF5A8->ap_read_request->ap_rgetline_core->ap_get_brigade->0x2D2CE->ap_get_brigade->0x2D667->apr_brigade_split_line->memchr
        …
        24 '\n'   EIP 0x26187
          main->ap_mpm_run->0x15C57->0x15B38->0x15941->ap_process_connection->ap_run_process_connection->0xF5A8->ap_read_request->ap_rgetline_core
        23 '\r'   EIP 0x26322
          main->ap_mpm_run->0x15C57->0x15B38->0x15941->ap_process_connection->ap_run_process_connection->0xF5A8->ap_read_request->ap_rgetline_core
        0 'G'   EIP 0x1F7F3
          main->ap_mpm_run->0x15C57->0x15B38->0x15941->ap_process_connection->ap_run_process_connection->0xF5A8->ap_read_request->ap_getword_white
        1 'E'   EIP 0x1F7F3
          main->ap_mpm_run->0x15C57->0x15B38->0x15941->ap_process_connection->ap_run_process_connection->0xF5A8->ap_read_request->ap_getword_white
        2 'T'   EIP 0x1F7F3
          main->ap_mpm_run->0x15C57->0x15B38->0x15941->ap_process_connection->ap_run_process_connection->0xF5A8->ap_read_request->ap_getword_white
        …

      [Figure: fields highlighted from these records: "GET /news.html HTTP/1.0\r\n", "\n", "\r", and "GET".]

  11. Step 1: Building Protocol Field Tree
      • The root contains the offsets of all input data.
      • A parent node contains the offsets of its children.
      [Figure: field tree for the example request. Node labels: root, "GET /news.html HTTP/1.0\r\n", "User-Agent: Wget/1.10.2 (Red Hat modified)\r\nAccept: */*\r\n….", "GET", "/news.html", "HTTP/1.0".]
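The two tree invariants above (the root holds every offset, a parent holds the offsets of its children) can be sketched in Python. This is a simplified illustration under stated assumptions, not the algorithm from the paper: it assumes each distinct call stack contributes one candidate field consisting of all offsets touched under it, and it nests candidates by strict set containment. The function name `build_field_tree` and the dictionary layout are hypothetical.

```python
from collections import defaultdict


def build_field_tree(records, message_length):
    """Build a containment tree from (offset, call_stack) pairs.

    Assumption: all offsets seen under one call stack form one candidate field.
    """
    by_context = defaultdict(set)
    for offset, call_stack in records:
        by_context[call_stack].add(offset)

    # The root covers every offset of the input message.
    root = {"offsets": frozenset(range(message_length)), "children": []}

    # Place larger candidates first so parents exist before their children.
    candidates = sorted({frozenset(s) for s in by_context.values()},
                        key=lambda s: (-len(s), min(s)))
    for offsets in candidates:
        if offsets == root["offsets"]:
            continue  # identical to the root, nothing to add
        node = root
        while True:
            # Descend into a child that strictly contains this candidate.
            parent = next((c for c in node["children"]
                           if offsets < c["offsets"]), None)
            if parent is None:
                break
            node = parent
        node["children"].append({"offsets": offsets, "children": []})
    return root
```

Candidates that only partially overlap an existing node are attached as siblings here; a real implementation would need a policy for that case.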

  12. Step 1: Building Protocol Field Tree
      Problems in the initial tree:
      • Redundancy in fields
      • Overly fine-grained fields
      • Missing SPACE before “/n”
      [Figure: initial field tree for "GET /news.html HTTP/1.0\r\n". The full line appears in several nested nodes; fine-grained nodes include "GET", "/", "news.html", "H", "TTP/1.0", "\r", and "\n".]

  13. Step 2: Refinement (Tokenization)
      • Merge two child nodes if their content can form one token (applies to text-based protocols).
      [Figure: before merging, the tree contains the fragments "/" and "news.html", "H" and "TTP/1.0", "\r" and "\n"; after merging, it contains "/news.html", "HTTP/1.0", and "\r\n" next to "GET".]
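The merge rule can be sketched as follows. The token test used here is an assumption chosen for illustration, since the slide does not define "token" precisely: two adjacent spans are merged when they are contiguous and the characters on both sides of the boundary belong to the same class (both whitespace or both non-whitespace). The name `merge_tokens` and the half-open `(start, end)` span representation are hypothetical.

```python
def merge_tokens(message, spans):
    """Merge adjacent spans whose contents form a single token.

    spans: sorted, non-overlapping half-open (start, end) offset ranges.
    """
    merged = []
    for start, end in spans:
        if merged:
            prev_start, prev_end = merged[-1]
            contiguous = prev_end == start
            # Same character class on both sides of the boundary.
            same_class = (contiguous and
                          message[prev_end - 1].isspace() == message[start].isspace())
            if same_class:
                merged[-1] = (prev_start, end)  # extend the previous span
                continue
        merged.append((start, end))
    return merged
```

On the request line from the slides, this merges "/" with "news.html", "H" with "TTP/1.0", and "\r" with "\n", while "GET" stays separate because a space lies between it and the next span.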

  14. Step 2: Refinement (Redundant Node Deletion)
      • An internal node is redundant if it has only one child.
      [Figure: the chain of nested "GET /news.html HTTP/1.0\r\n" nodes is collapsed, leaving one node for the full line above "GET /news.html HTTP/1.0" and "\r\n", with "GET", "/news.html", and "HTTP/1.0" below.]
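A minimal sketch of this deletion rule, assuming a tree of dictionaries with hypothetical `label` and `children` keys: a node with exactly one child is replaced by that child. Which of the two nodes survives is a choice made here for illustration; when both cover the same bytes, as in the slide's chain of identical nodes, the result is the same either way.

```python
def prune(node):
    """Return a copy of the tree with single-child internal nodes removed."""
    # Prune the subtrees first, so each child is already in reduced form.
    children = [prune(child) for child in node["children"]]
    if len(children) == 1:
        # Redundant internal node: its only child takes its place.
        return children[0]
    return {"label": node["label"], "children": children}
```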

  15. Step 2: Refinement (Node Insertion)
      • Insert a new child node under a parent if the offsets of its children do not match those of the parent.
      [Figure: under "GET /news.html HTTP/1.0\r\n", the children "GET /news.html HTTP/1.0" and "\r\n" are shown, with "GET /news.html" below; a new node is inserted where the children do not cover the parent.]
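The insertion rule amounts to filling the gaps between a parent's span and the spans of its children. The sketch below is illustrative only: the name `fill_gaps` and the half-open `(start, end)` span representation are assumptions, not taken from the paper.

```python
def fill_gaps(parent, children):
    """Return the children plus new spans for uncovered parts of the parent.

    parent: (start, end) half-open range; children: ranges inside the parent.
    """
    start, end = parent
    result, cursor = [], start
    for c_start, c_end in sorted(children):
        if c_start > cursor:
            result.append((cursor, c_start))  # uncovered bytes before this child
        result.append((c_start, c_end))
        cursor = max(cursor, c_end)
    if cursor < end:
        result.append((cursor, end))          # uncovered bytes at the end
    return result
```

For the 25-byte line "GET /news.html HTTP/1.0\r\n", a parent `(0, 25)` whose only child is `(23, 25)` ("\r\n") gains a new child `(0, 23)`, and the two single-space separators at offsets 3 and 14 are recovered in the same way one level down.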

  16. Step 3: Output the Result
      • Hierarchical: read from the parent-child structure of the tree.
      • Parallel:
        • Collect the execution history of each node.
        • For a parent, if its child nodes share a similar history, mark them as parallel.
      • Sequential:
        • Pre-order traversal of the tree.
        • The traversal lists the leaf nodes and the parents of multiple parallel nodes.
      [Figure: refined field tree for "GET /news.html HTTP/1.0\r\n" with labels 1 to 4, annotated with the hierarchical, parallel, and sequential relations.]
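Both checks can be sketched in a few lines of Python. The slide does not say how "similar history" is measured, so the Jaccard similarity and the 0.8 threshold below are assumptions chosen for illustration, as are the function names and the dictionary-based tree layout.

```python
def jaccard(a, b):
    """Similarity of two sets: size of intersection over size of union."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def parallel_children(histories, threshold=0.8):
    """Return the children whose execution history resembles a sibling's.

    histories: dict mapping child name -> set of call sites in its history.
    """
    names = list(histories)
    marked = set()
    for i, x in enumerate(names):
        for y in names[i + 1:]:
            if jaccard(histories[x], histories[y]) >= threshold:
                marked.update((x, y))
    return marked


def leaves_preorder(node):
    """List leaf labels in pre-order, giving the sequential order of fields."""
    if not node["children"]:
        return [node["label"]]
    out = []
    for child in node["children"]:
        out.extend(leaves_preorder(child))
    return out
```

In the null-httpd example, the header fields are all handled by the same `strncasecmp`/`strncpy` pattern and would be marked parallel, while the request line, parsed with `sscanf`, would not.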

  17. Evaluation
      • Implemented on top of Valgrind-3.2.3 (for the context-aware execution monitor)
        • The approach also applies to QEMU and PIN
      • Benchmark
        • 30 messages from six known protocols and one unknown protocol
      • Evaluation metric
        • Re: ratio of exact matches, Re = |A ∩ W| / |W|
        • A: set of fields identified by AutoFormat
        • W: set of fields identified by Wireshark
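The metric is a plain set ratio and can be computed directly. In this sketch, fields are represented as `(start, end)` offset pairs, which is an assumption for illustration; returning `None` when W is empty mirrors the "(-)" entries reported when Wireshark identifies no fields of a given kind.

```python
def exact_match_ratio(autoformat_fields, wireshark_fields):
    """Compute Re = |A ∩ W| / |W|, or None when W is empty."""
    a, w = set(autoformat_fields), set(wireshark_fields)
    if not w:
        return None  # ratio undefined: the reference identified no fields
    return len(a & w) / len(w)
```

For example, if Wireshark reports four fields and AutoFormat recovers three of them exactly, Re is 0.75; extra fields reported only by AutoFormat do not change the ratio.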

  18. Overall Result
      • Re(F): Re for finest-grained fields
      • Re(H): Re for hierarchical fields
      • Re(P): Re for parallel fields
      [Table: per-protocol results, annotated "100% match with Wireshark". An entry of (-) means |P| = 0 for Wireshark.]
      Averages: Re(F) = 88.5%, Re(H) = 98.0%, Re(P) = 100.0%, Re = 93.4%

  19. Discussion
      • Dynamic trace dependency: AutoFormat does not detect message formats that are not present in the execution trace.
      • Byte granularity: AutoFormat does not detect protocol fields at the bit level.
      • Protocol state machine: AutoFormat does not correlate multiple messages of the same protocol session.
      • Obfuscated binaries: AutoFormat does not handle this type of input.

  20. Conclusion
      • The paper also includes Slapper worm messages as part of a second set of experimental results.
      • AutoFormat
        • A tool for automatic protocol format extraction.
      • Key insight
        • A protocol implementation is programmed to recognize the protocol format and usually contains execution context that is specific to each protocol field. This context can be used to infer the hierarchical structure of the protocol fields, and even to recover their BNF structure.

  21. Q & A
      Thank you.
      For more information: {zlin, dxu, xyzhang}@cs.purdue.edu, xjiang@gmu.edu
