
Program Provenance: Guessing the Source Compiler from Binary Code





Presentation Transcript


  1. Program Provenance: Guessing the Source Compiler from Binary Code. Nathan Rosenblum

  2. Why compiler provenance? IDA Pro Guessing the Source Compiler

  3. Why should this work?

  4. int bar(int foo) {
         int i, j;
         for (i = 0; i < foo; ++i) {
             i = j + i;
             j *= i;
         }
         return j;
     }

     GCC:
         test edi,edi
         jle  4004ae <bar+0x16>
         mov  eax,0x0
         lea  eax,[rdx+rax]
         imul edx,eax
         add  eax,0x1
         cmp  edi,eax
         jg   4004a1 <bar+0x9>
         mov  eax,edx
         ret

     ICC:
         xor  edx,edx
         test edi,edi
         jle  400989 <bar+0x11>
         add  edx,eax
         imul eax,edx
         inc  edx
         cmp  edx,edi
         jl   40097e <bar+0x6>
         ret
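One way to make the contrast between the two listings concrete is to compare which mnemonics each compiler emitted for the same function. The sketch below is illustrative only: the mnemonic lists are transcribed by hand from the GCC and ICC listings above, and the helper name `distinguishing` is hypothetical, not something from the talk.

```python
# Mnemonics transcribed by hand from the two listings on the slide.
GCC = ["test", "jle", "mov", "lea", "imul", "add", "cmp", "jg", "mov", "ret"]
ICC = ["xor", "test", "jle", "add", "imul", "inc", "cmp", "jl", "ret"]


def distinguishing(a, b):
    """Return the sorted mnemonics that occur in listing a but never in b."""
    return sorted(set(a) - set(b))


gcc_only = distinguishing(GCC, ICC)
icc_only = distinguishing(ICC, GCC)

print("GCC only:", gcc_only)
print("ICC only:", icc_only)
```

Even on this tiny function each compiler leaves a few mnemonics the other never uses (for example `lea` on one side and `inc` on the other), which is the kind of signal a classifier could pick up. A single function is of course far too little evidence to generalize from.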

  5. Modeling binary code
     [Figure: a program binary drawn as a chain of labeled positions 𝑦ᵢ₋₁, 𝑦ᵢ, 𝑦ᵢ₊₁, 𝑦ᵢ₊₂. Callouts in the figure:]
     compiler labels: gcc, icc, none
     addrs.: 80 4c 90, 80 4c 94, 80 4c 98, 80 4c 9b, …
     padding: 8d b4 26 00 00 00 00, 8d bc 27 00 00 00 00, 90
     names: match_init, zp_init_keys, seekable, …
     data / underlying bytes: c7 04 24 10 70 05 08 ff d0 c9 c3 90 81 ec e4 00 00 00 8b b4 24 ec 00 00 00 …
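The chain of labels 𝑦ᵢ₋₁, 𝑦ᵢ, 𝑦ᵢ₊₁ suggests that neighboring positions in a binary are labeled jointly rather than one at a time. The transcript does not say which model is used, so the following is only a sketch of one standard way to do joint labeling: Viterbi decoding over a linear chain. The local scores, the `stay_bonus` parameter, and the restriction to two labels are all assumptions made for illustration.

```python
LABELS = ("gcc", "icc")


def viterbi(scores, stay_bonus=1.0):
    """Jointly label a chain of positions.

    scores: list of dicts mapping each label to a local score for that position.
    stay_bonus: hypothetical reward added when two neighbors share a label.
    Returns the label sequence with the highest total score.
    """
    best = dict(scores[0])  # best total score of any path ending in each label
    back = []               # back[i][cur] = best previous label for position i+1
    for local in scores[1:]:
        step, ptr = {}, {}
        for cur in LABELS:
            prev = max(
                LABELS,
                key=lambda p: best[p] + (stay_bonus if p == cur else 0.0),
            )
            bonus = stay_bonus if prev == cur else 0.0
            step[cur] = best[prev] + bonus + local[cur]
            ptr[cur] = prev
        back.append(ptr)
        best = step

    # Walk the back-pointers from the best final label to recover the path.
    path = [max(LABELS, key=lambda label: best[label])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]


# Made-up scores: the middle position weakly prefers icc on its own.
toy = [
    {"gcc": 2.0, "icc": 0.0},
    {"gcc": 0.4, "icc": 0.6},
    {"gcc": 2.0, "icc": 0.0},
]
smoothed = viterbi(toy, stay_bonus=1.0)
independent = viterbi(toy, stay_bonus=0.0)
```

With the bonus switched off each position is labeled on its own evidence; with it on, a weakly scored position is pulled toward the label of its neighbors.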

  6. Describing code
     instruction-level:
       abstracts several IA32 opcodes
       hide immediate values
       single-instruction wildcard: 〈mov [IMM], RAX ; * ; sub [IMM], RAX〉
     control flow-level:
       branch …
     Idioms shown:
       〈mov [IMM], RAX ; * ; sub [IMM], RAX〉
       〈add [IMM], RDX ; * ; sub RAX, RCX〉
       + [math elided]
       〈push EBP ; mov ESP, EBP〉
       〈* ; * ; sub [IMM], RAX〉
       〈shl [IMM], RAX ; shr [IMM], RAX〉
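The instruction-level idioms on this slide can be sketched as short windows of normalized instructions in which immediates are hidden and any single instruction may be replaced by the wildcard `*`. The code below is a minimal illustration under stated assumptions, not the talk's feature extractor: it works on disassembly text rather than raw opcodes, the function names `normalize` and `idioms` are hypothetical, the window length of three is a guess based on the examples shown, and it writes `IMM` without the slide's brackets.

```python
import re
from itertools import product

# Hex or decimal literals; the word boundary keeps register names like r8 intact.
IMM = re.compile(r"\b(0x[0-9a-fA-F]+|\d+)\b")


def normalize(insn):
    """Hide immediate values, e.g. 'mov 0x8, rax' -> 'mov IMM,RAX'."""
    mnemonic, _, operands = insn.strip().partition(" ")
    if not operands:
        return mnemonic
    operands = IMM.sub("IMM", operands).upper().replace(" ", "")
    return f"{mnemonic} {operands}"


def idioms(insns, max_len=3):
    """Collect every window of up to max_len instructions, with each position
    either kept or replaced by the single-instruction wildcard '*'."""
    norm = [normalize(i) for i in insns]
    found = set()
    for n in range(1, max_len + 1):
        for start in range(len(norm) - n + 1):
            window = norm[start:start + n]
            for mask in product((False, True), repeat=n):
                if all(mask):
                    continue  # a pattern made only of wildcards says nothing
                found.add(tuple("*" if wild else ins
                                for ins, wild in zip(window, mask)))
    return found


# Hypothetical instruction sequence, chosen to reproduce one idiom from the slide.
sample = ["mov 0x8, rax", "add rcx, rdx", "sub 0x10, rax"]
features = idioms(sample)
```

Because immediates are replaced before windows are formed, `mov 0x8, rax` and `mov 0x40, rax` contribute the same feature, which is the point of hiding them.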

  7. Results [R, Miller, Zhu PASTE '10]
     [Figure: charts of error types for GCC, ICC, MSVC]
     single compiler: 2.8%, 92.5%, 6.4%
     mixed compiler: 5.3%, 93.7%, 2.3%

  8. Finer detail: compiler versions, optimization
     Major versions? Easy: 99% (GCC 3.x vs 4.x)
     Minor versions? Easy: 85-99% (GCC 4.2 vs 4.3)
     Low optimization vs. high optimization? Easy: 99% (GCC -O0 vs -O3)
     Highly optimized code? Hard: 60% (GCC -O2 vs -O3)

  9. Future work
     int bar(int foo) {
         int i, j;
         for (i = 0; i < foo; ++i) {
             i = j + i;
             j *= i;
         }
         return j;
     }
     ...
     [Figure: the source code shown next to a binary bit string]
