Bin-Carver: Automatic recovery of binary executable files

Bin-Carver: Automatic recovery of binary executable files. Presented by: Ryan O’Donnell. What is file carving? The process of reassembling files from disk fragments in the absence of metadata. When would we need file carving? Accidental user deletions, intentional user deletions, and malware.

Presentation Transcript


  1. Bin-Carver: Automatic recovery of binary executable files Presented by: Ryan O’Donnell

  2. What is file carving? The process of reassembling files from disk fragments in the absence of metadata.

  3. When would we need file carving? • Accidental user deletions • Intentional user deletions • Malware

  4. Traditional file carving method Using a .jpeg file as an example • Find the header (FF D8) • Find the matching footer (FF D9) • Extract all contiguous data between them
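The header/footer approach on this slide can be sketched in a few lines. This is a minimal illustration, not the behavior of any particular carving tool; the function name `carve_jpegs` is hypothetical, and the sketch assumes every file is stored contiguously, which is exactly the assumption that breaks under fragmentation.

```python
def carve_jpegs(image: bytes) -> list[bytes]:
    """Naive header/footer carving: assumes each JPEG is stored contiguously."""
    header, footer = b"\xff\xd8", b"\xff\xd9"
    carved = []
    pos = 0
    while True:
        start = image.find(header, pos)
        if start == -1:
            break
        end = image.find(footer, start + len(header))
        if end == -1:
            break
        # Keep everything from the header through the footer, inclusive.
        carved.append(image[start:end + len(footer)])
        pos = end + len(footer)
    return carved
```

If a foreign block sits between the header and the footer, the carved output silently includes it.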

  5. Problems with traditional method • fragmentation • doesn’t work without exact header and footer information • doesn’t work with all file types • focuses on documents of forensic interest • binary executables not included

  6. Bin-carver overview -1 • recover Executable and Linkable Format (ELF) file e from disk image D • D strictly consists of file content blocks • Assume D is an EXT2 file system, block size 4 KB

  7. Bin-carver overview -2 • file content has not been overwritten • file content is stored in increasing order • ELF file e has n blocks in the disk We want to link these n blocks together utilizing internal graph node logic.

  8. Bin-carver overview -3

  9. Challenges • Filename recovery is typically not possible without the file system metadata • Fragmentation

  10. Components • ELF-header scanner • scan all possible ELF headers hi using ELF-file magic value • block node linker • scans disk image, identifies nodes and links them • conflict-node resolver • removes conflict nodes and outputs ELF-file ei

  11. System Overview Diagram

  12. Scanner -1 Headers hold a “road map” describing ELF file organization. Searching for the magic number sequence 7f 45 4c 46 allows us to locate headers, telling us how to traverse all other sections .
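A header scan of this kind might look like the sketch below. It assumes, as in the overview slides, a 4 KB block size and that every file starts on a block boundary; `find_elf_headers` is an illustrative name, not part of Bin-Carver.

```python
ELF_MAGIC = b"\x7fELF"  # the byte sequence 7f 45 4c 46
BLOCK_SIZE = 4096       # assumed block size (4 KB)

def find_elf_headers(image: bytes, block_size: int = BLOCK_SIZE) -> list[int]:
    """Return the block numbers whose first four bytes are the ELF magic."""
    return [
        offset // block_size
        for offset in range(0, len(image), block_size)
        if image[offset:offset + 4] == ELF_MAGIC
    ]
```

Checking only block starts keeps the scan linear in the number of blocks and avoids matching the magic bytes where they happen to occur inside file content.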

  13. Scanner -2 Each header is 52 bytes (for 32-bit ELF) and gives the location of: • program header table (PHT) • array of program headers • section header table (SHT) • array of section headers
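As a rough illustration, the 32-bit ELF header can be unpacked with Python's `struct` module. The sketch assumes a little-endian, 32-bit file; the field names follow the ELF specification with the `e_` prefix dropped, and `parse_elf32_header` is a hypothetical helper.

```python
import struct
from collections import namedtuple

# Elf32_Ehdr layout, little-endian: 16-byte ident followed by 13 fixed fields.
ELF32_HEADER = struct.Struct("<16sHHIIIIIHHHHHH")

Elf32Header = namedtuple(
    "Elf32Header",
    "ident type machine version entry phoff shoff flags "
    "ehsize phentsize phnum shentsize shnum shstrndx",
)

def parse_elf32_header(block: bytes) -> Elf32Header:
    """Parse the ELF header at the start of a candidate header block."""
    if block[:4] != b"\x7fELF":
        raise ValueError("block does not start with the ELF magic")
    return Elf32Header._make(ELF32_HEADER.unpack_from(block))
```

The `phoff`/`phnum` and `shoff`/`shnum` fields are the entries of the "road map" that point at the PHT and SHT.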

  14. Searching SHT • usually located at end of ELF file • can serve as a footer because of this • since A(footer) > A(hi) we can start our search at the 0x14 disk block • gives us a multitude of other constraints that allow us to calculate the location of the footer
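One of the constraints the header provides can be worked out directly: if the SHT really is the last thing in the file, its end offset is the file size. The sketch below assumes that layout (the slide only says "usually") and a non-empty SHT; `sht_block_span` is an illustrative name.

```python
def sht_block_span(e_shoff: int, e_shnum: int, e_shentsize: int,
                   block_size: int = 4096) -> tuple[int, int, int]:
    """Return (first_block, last_block, file_size) for the section header table.

    Block numbers are relative to the file's own first block. Assumes the
    SHT is the final structure in the file and that e_shnum > 0.
    """
    end = e_shoff + e_shnum * e_shentsize  # one past the last SHT byte
    first_block = e_shoff // block_size
    last_block = (end - 1) // block_size
    return first_block, last_block, end
```

For example, with `e_shoff = 8292`, ten section headers of 40 bytes each end at offset 8692, so the footer lies in the file's third block (index 2).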

  15. Searching PHT • locates segments that create memory image of the program • each program header is 32 bytes • usually starts right after ELF headers • same 4k block
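Reading the PHT could be sketched as follows, again assuming a little-endian 32-bit ELF file whose PHT sits in the same block as the header. The helper names are hypothetical; the field order is that of `Elf32_Phdr` in the ELF specification.

```python
import struct

PHDR32 = struct.Struct("<8I")  # Elf32_Phdr: eight 32-bit fields, 32 bytes
PT_LOAD = 1                    # segment type for loadable segments

FIELDS = ("type", "offset", "vaddr", "paddr", "filesz", "memsz", "flags", "align")

def parse_program_headers(block: bytes, phoff: int, phnum: int) -> list[dict]:
    """Decode phnum program headers starting at byte offset phoff."""
    return [
        dict(zip(FIELDS, PHDR32.unpack_from(block, phoff + i * PHDR32.size)))
        for i in range(phnum)
    ]

def loadable(headers: list[dict]) -> list[dict]:
    """Keep only the segments that contribute to the memory image."""
    return [h for h in headers if h["type"] == PT_LOAD]
```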

  16. Searching PHT • from the program headers, infer the base virtual address of the image file • keep iterating and build our road map • our goal is to fill every slot in this road map with content (bi)
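A small worked example of the base-address step, under two assumptions that are not stated on the slide: the base is the lowest loadable virtual address rounded down to a page boundary (as the ELF specification defines it), and a virtual address in the first loadable segment maps to file offset `vaddr - base`. Both function names are illustrative.

```python
def base_address(load_vaddrs: list[int], page_size: int = 4096) -> int:
    """Lowest PT_LOAD virtual address, truncated to a page boundary."""
    lowest = min(load_vaddrs)
    return lowest - lowest % page_size

def block_index(vaddr: int, base: int, block_size: int = 4096) -> int:
    """Road-map slot (file block) holding vaddr, assuming offset = vaddr - base."""
    return (vaddr - base) // block_size
```

With loadable segments at 0x08048000 and 0x0804a800, the base is 0x08048000, and the address 0x08049234 falls into slot 1 of the road map.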

  17. Finished? With no fragmentation, our job is done. But, with any garbage gap, this approach would fail. So how do we link each individual bi if the disk is fragmented?

  18. Block-node linker -1 We have to logically connect bi and bj. We explore the caller-callee relationship: • fill block place of bcaller and bcallee • find address • logically link them together • function prologue signature (local calls) • PLT instruction sequence (library calls)

  19. Block-node linker -2 On a library call • use PLT block number as an anchor • use this anchor to identify absolute block number of the caller block On a local call • only determines distance • only works with blocks starting with e8 (CALL opcode) In most cases, library calls are used to resolve block numbers
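To see why a local call only yields a distance, consider how an x86 `call rel32` (opcode e8) is encoded: the 32-bit operand is relative to the address of the next instruction. The sketch below decodes one such instruction at a known offset inside a block; it is an illustration under a 4 KB block size, not Bin-Carver's linker, and `call_block_distance` is a hypothetical name.

```python
import struct

def call_block_distance(block: bytes, offset: int, block_size: int = 4096) -> int:
    """Blocks between the caller's block and the callee's block.

    Positive means the callee lies in a later block, negative an earlier
    one, and 0 the same block. Only the relative distance is recoverable.
    """
    if block[offset] != 0xE8:
        raise ValueError("no CALL rel32 opcode at this offset")
    (rel32,) = struct.unpack_from("<i", block, offset + 1)
    # Target position measured from the start of the caller's block;
    # the instruction is 5 bytes long (opcode + 4-byte displacement).
    target = offset + 5 + rel32
    return target // block_size  # floor division also handles backward calls
```

A plain byte scan for e8 will also hit data and operand bytes, so real candidates would still need validation, for example against the function prologue signature mentioned on the previous slide.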

  20. Conflict-node resolver -1 A particular placeholder i could have several candidates. To eliminate redundant placeholders: • use identified non-conflict nodes • explore logic connections • resolve node • iterate through until a fixed point is reached
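The fixed-point iteration can be illustrated with a deliberately simplified rule: a disk block that is the only candidate for one placeholder cannot belong to any other placeholder. The real resolver explores richer logic connections; this sketch assumes only that uniqueness constraint, and `resolve_conflicts` is a hypothetical name.

```python
def resolve_conflicts(candidates: dict[int, set[int]]) -> dict[int, set[int]]:
    """Prune candidate blocks per placeholder until nothing changes."""
    resolved = {slot: set(blocks) for slot, blocks in candidates.items()}
    changed = True
    while changed:
        changed = False
        # Blocks already pinned to a placeholder with a single candidate.
        fixed = {next(iter(b)) for b in resolved.values() if len(b) == 1}
        for slot, blocks in resolved.items():
            if len(blocks) > 1:
                pruned = blocks - fixed
                if pruned != blocks:
                    resolved[slot] = pruned
                    changed = True
    return resolved
```

Starting from `{0: {10}, 1: {10, 11}, 2: {11, 12}}`, the first pass pins block 11 to placeholder 1, the second pins block 12 to placeholder 2, and the third pass changes nothing, which is the fixed point.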

  21. Conflict-node resolver -2 Block-node linker only focuses on linking code blocks. Conflict-node resolver handles other data blocks (.data, .debug).

  22. Conflict-node resolver -3 To retrieve data blocks: • treat data sections as a block between the ELF header and the first block of code section • resolver explores constraints defined in PHT and SHT • worst case scenario: data section does not have identifiable sections and we must use dynamic execution to eliminate bogus permutations • essentially, if the recovered binary file doesn’t crash, it may have been recovered successfully

  23. Evaluation - Comparison Comparisons were intended to be made to other similar tools, but neither Foremost nor Scalpel supports carving for fragmented ELF binary files.

  24. Evaluation -1

  25. Evaluation -2 • All files are ELF binaries • worst case, high false positive rates • addition of heterogeneous data irrelevant • performance of algorithm is invariant to size of the disk • performance relies on number of files to be recovered

  26. Evaluation -3 To evaluate accuracy, we need to prove the recovered files are true ELF files. We need to create an MD5 hash of the first block and of every individual block for each true ELF binary to detect true data in the worst-case fragmentation scenario.
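Per-block hashing of a known-good binary might be sketched as below, using Python's standard `hashlib`. The 4 KB block size matches the earlier assumption; both function names are illustrative, and the matching rule (a recovered block counts if its hash appears anywhere in the reference set) is an assumption rather than the evaluation's exact procedure.

```python
import hashlib

def block_hashes(data: bytes, block_size: int = 4096) -> list[str]:
    """MD5 digest of every block of a file, in file order."""
    return [
        hashlib.md5(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]

def matching_blocks(recovered: bytes, original: bytes, block_size: int = 4096) -> int:
    """Count recovered blocks whose hash matches some block of the original."""
    known = set(block_hashes(original, block_size))
    return sum(h in known for h in block_hashes(recovered, block_size))
```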

  27. Effectiveness -1 Identification rate: • shows portion that can be identified no matter how fragmented the disk is • must be able to match hash values Recovery Rate • valid files in the system that were identified and recovered

  28. Effectiveness -2 Overall, very effective. On average: • Identification rate of 96.3% • Recovery rate of 93.1%

  29. Effectiveness -3

  30. Runtime Analysis -1 All performance slowdowns occur during the linker and resolver phases. Large gaps hurt performance, and the large number of caller-callee instructions causes performance penalties.

  31. Runtime Analysis -2