1 / 29

Analyzing Memory Accesses in x86 Executables

Analyzing Memory Accesses in x86 Executables. Gogul Balakrishnan Thomas Reps University of Wisconsin. Motivation. There has been a growing need for tools that analyze executables. It’s important to be able to decipher the behavior of worms, mobile code and virus-infected code . (w/o source)

calla
Download Presentation

Analyzing Memory Accesses in x86 Executables

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Analyzing Memory Accessesin x86 Executables Gogul Balakrishnan Thomas Reps University of Wisconsin

  2. Motivation • There has been a growing need for tools that analyze executables. • It’s important to be able to decipher the behavior of worms, mobile code and virus-infected code. (w/o source) • Existing tools either treat memory accesses extremely conservatively, or assume the presence of symbol-table or debugging information. – Neither approach is satisfactory.

  3. Goal (1) • Create an intermediate representation (IR) that is similar to the IR used in a compiler • CFGs • call graph • used, killed, may-killed variables for CFG nodes • points-to sets • Why? • a tool for a security analyst • a general infrastructure for binary analysis

  4. Whole-program analysis CodeSurfer • CFGs • call graph • used, killed, may-killed • variables for CFG nodes • points-to sets • Initial estimate of • memory addr. & offsets • procedures and call sites • malloc sites Codesurfer/x86 Architecture Binary ParseBinary IDA Pro Build CFGs Connector Client Applications Value-setAnalysis Build SDG Browse

  5. Outline • Example • Challenges • Value-set analysis (VSA) • Performance • [Future work]

  6. Tutorial on x86 Instructions • mov ecx, edx ; ecx = edx • mov ecx, [edx] ; ecx = *edx • mov [ecx], edx ;*ecx = edx • lea ecx, [esp+8] ; ecx = &a[2]

  7. Running Example int arrVal=0, *pArray2; int main() { int i, a[10], *p; /* Initialize pointers */ pArray2 = &a[2]; p = &a[0]; /* Initialize Array */ for( i = 0 ; i<10 ; ++i ) { *p = arrVal; p++; } /* Return a[2] */ return *pArray2; } ; ebx  i ; ecx  variable p 1 sub esp, 40 ;adjust stack 2 lea edx, [esp+8] ; 3 mov [4], edx ;pArray2=&a[2] 4 lea ecx, [esp] ;p=&a[0] 5 mov edx, [0] ; 6 loc_9: 7 mov [ecx], edx ;*p=arrVal 8 add ecx, 4 ;p++ 9 inc ebx ;i++ 10 cmp ebx, 10 ;i<10? 11 jl short loc_9 ; 12 mov edi, [4] ; 13 mov eax, [edi] ;return *pArray2 14 add esp, 40 15 retn

  8. return_address Running Example – Address Space 0ffffh a(40 bytes) ; ebx  i ; ecx  variable p sub esp, 40 ;adjust stack lea edx, [esp+8] ; mov [4], edx ;pArray2=&a[2] lea ecx, [esp] ;p=&a[0] mov edx, [0] ; loc_9: mov [ecx], edx ;*p=arrVal add ecx, 4 ;p++ inc ebx ;i++ cmp ebx, 10 ;i<10? jl short loc_9 ; mov edi, [4] ; mov eax, [edi] ;return *pArray2 add esp, 40 retn Data local to main (Activation Record) pArray2(4 bytes) Global data 4h arrVal(4 bytes) 0h

  9. Running Example – Address Space 0ffffh return_address Data local to main (Activation Record) No debugging information Global data 0h

  10. Challenges • No debugging/symbol-table information • Explicit memory addresses • need something similar to C variables • a-locs • Only have an initial estimate of • code, data, procedures, call sites, malloc sites • extend IR • disassemble data, add to CFG, . . . • similar to elaboration of CFG/call-graph in a compiler because of calls via function pointers

  11. … g f … g f global Memory-regions • An abstraction of the address space • Idea : group similar runtime addresses • collapse the runtime ARs for each procedure • one region for all global data • one region for each malloc site f f g global

  12. ret_main Example – Memory-regions ; ebx  i ; ecx  variable p sub esp, 40 ;adjust stack lea edx, [esp+8] ; mov [4], edx ;pArray2=&a[2] lea ecx, [esp] ;p=&a[0] mov edx, [0] ; loc_9: mov [ecx], edx ;*p=arrVal add ecx, 4 ;p++ inc ebx ;i++ cmp ebx, 10 ;i<10? jl short loc_9 ; mov edi, [4] ; mov eax, [edi] ;return *pArray2 add esp, 40 retn (main, 0) (GL,8) (GL,0) Global Region (main, -40) Region for main

  13. “Need Something Similar to C Variables” • A-locs • locations between consecutive addresses • locations between consecutive offsets • Registers • Use a-locs instead of variables in static analysis • e.g., killed a-loc  killed variable

  14. ret_main Example – A-locs ; ebx  i ; ecx  variable p sub esp, 40 ;adjust stack lea edx, [esp+8] ; mov [4], edx ;pArray2=&a[2] lea ecx, [esp] ;p=&a[0] mov edx, [0] ; loc_9: mov [ecx], edx ;*p=arrVal add ecx, 4 ;p++ inc ebx ;i++ cmp ebx, 10 ;i<10? jl short loc_9 ; mov edi, [4] ; mov eax, [edi] ;return *pArray2 add esp, 40 retn (main, 0) (GL,8) [4] (GL,4) [0] (GL,0) [esp+8] (main, -32) Global Region [esp] (main, -40) Region for main

  15. ret_main Example – A-locs ; ebx  i ; ecx  variable p sub esp, 40 ;adjust stack lea edx, [esp+8] ; mov [4], edx ;pArray2=&a[2] lea ecx, [esp] ;p=&a[0] mov edx, [0] ; loc_9: mov [ecx], edx ;*p=arrVal add ecx, 4 ;p++ inc ebx ;i++ cmp ebx, 10 ;i<10? jl short loc_9 ; mov edi, [4] ; mov eax, [edi] ;return *pArray2 add esp, 40 retn (main, 0) (GL,8) mem_4 mainv_20 (GL,4) mem_0 (GL,0) (main, -32) Global Region mainv_28 (main, -40) Region for main

  16. ret_main Example – A-locs ; ebx  i ; ecx  variable p sub esp, 40 ;adjust stack lea edx, &mainv_20 ; mov mem_4, edx ;pArray2=&a[2] lea ecx, &mainv_28 ;p=&a[0] mov edx, mem_0; loc_9: mov [ecx], edx ;*p=arrVal add ecx, 4 ;p++ inc ebx ;i++ cmp ebx, 10 ;i<10? jl short loc_9 ; mov edi, mem_4 ; mov eax, [edi] ;return *pArray2 add esp, 40 retn (main, 0) (GL,8) mem_4 mainv_20 (GL,4) mem_0 (GL,0) (main, -32) Global Region mainv_28 (main, -40) Region for main

  17. Example – A-locs Locals : mainv_28, mainv_20 {a[0], a[2]} Globals : mem_0, mem_4 {arrVal, pArray2} ; ebx  i ; ecx  variable p sub esp, 40 ;adjust stack lea edx, &mainv_20 ; mov mem_4, edx ;pArray2=&a[2] lea ecx, &mainv_28 ;p=&a[0] mov edx, mem_0; loc_9: mov [ecx], edx ;*p=arrVal add ecx, 4 ;p++ inc ebx ;i++ cmp ebx, 10 ;i<10? jl short loc_9 ; mov edi, mem_4 ; mov eax, [edi] ;return *pArray2 add esp, 40 retn edx mainv_20 mem_4 edi ecx mainv_28

  18. Value-Set Analysis • Resembles a pointer-analysis algorithm • interprets pointer-manipulation operations • pointer arithmetic, too • Resembles a numeric-analysis algorithm • over-approximate the set of values/addresses held by an a-loc • range information • stride information • interprets arithmetic operations on sets of values/addresses

  19. Value-Set Analysis • Over approximate/estimate • global variable • local variable • struct • array • (static address) • (static stack-frame offset) or (register) • (static address) + (offset) • (static address) + (offest) C compiler Base + (index * scale) + offset

  20. Value-set • An a-loc  a variable • the address of an a-loc (memory-region, offset within the region) , (rgn, offset) • An a-loc  an aggregate variable • addresses of elements of an a-loc (rgn, {o1, o2, …, on}) • Value-set = a set of such addresses {(rgn1, {o1, o2, …, on}), …, (rgnr, {o1, o2, …, om})} “r” – number of regions in the program

  21. Value-set - RIC • Idea: approximate {o1, …, ok} with a numeric domain • {1, 3, 5, 7} represented as 2[0,3]+1 • Reduced Interval Congruence (RIC) (e.g.)[0,3] == 0,1,2,3 • common stride • lower and upper bounds • displacement • Set of addresses is an r-tuple: (ric1, …, ricr) • ric1: offsets in global region • a set of numbers: (ric1, , …, )

  22. ret_main 1 2 2 1 Example – Value-set analysis (main, 0) ; ebx  i ; ecx  variable p sub esp, 40 ;adjust stack lea edx, [esp+8 ] ; mov [4], edx ;pArray2=&a[2] lea ecx, [esp] ;p=&a[0] mov edx, [0]; loc_9: mov [ecx], edx ;*p=arrVal add ecx, 4 ;p++ inc ebx ;i++ cmp ebx, 10 ;i<10? jl short loc_9 ; mov edi, [4] ; mov eax, [edi] ;return *pArray2 add esp, 40 retn (GL,8) mem_4 mainv_20 (GL,4) mem_0 (GL,0) (main, -32) Global Region mainv_28 (main, -40) Region for main Global main ecx (  , 4[0,∞]-40) ebx (1[0,9] , ) esp ( , -40) edi (  , -32) esp (  , -40)

  23. ret_main 1 2 2 1 Example – Value-set analysis (main, 0) ; ebx  i ; ecx  variable p sub esp, 40 ;adjust stack lea edx, [esp+8] ; mov [4], edx ;pArray2=&a[2] lea ecx, [esp] ;p=&a[0] mov edx, [0] ; loc_9: mov [ecx], edx ;*p=arrVal add ecx, 4 ;p++ inc ebx ;i++ cmp ebx, 10 ;i<10? jl short loc_9 ; mov edi, [4] ; mov eax, [edi] ;return *pArray2 add esp, 40 retn (GL,8) mem_4 mainv_20 (main, -32) (GL,4) mem_0 (main, -40) (GL,0) Global Region mainv_28 ecx (, 4[0,∞]-40) Region for main edi (, -32) A stack-smashing attack?

  24. Affine-Relation Analysis • Value-set domain is non-relational • cannot capture relationships among a-locs • Imprecise results • e.g. no upper bound for ecx at loc_9 • ecx  (, 4[0,∞]-40) . . . loc_9: mov [ecx], edx ;*p=arrVal add ecx, 4 ;p++ inc ebx ;i++ cmp ebx, 10 ;i<10? jl short loc_9 ; . . .

  25. Affine-Relation Analysis • Obtain affine relations via static analysis • Use affine relations to improve precision • e.g., at loc_9 ecx=esp+(4ebx), ebx=([0,9],), esp=(,-40) •  ecx=(,-40)+4([0,9]) •  ecx=(,4[0,9]-40) •  upper bound for ecx at loc_9 …… . . . loc_9: mov [ecx], edx ;*p=arrVal add ecx, 4 ;p++ inc ebx ;i++ cmp ebx, 10 ;i<10? jl short loc_9 ; . . . ecx esp (==pArray2) ebx(==i)

  26. ret_main 1 2 1 Example – Value-set analysis (main, 0) ; ebx  i ; ecx  variable p sub esp, 40 ;adjust stack lea edx, [esp+8] ; mov [4], edx ;pArray2=&a[2] lea ecx, [esp] ;p=&a[0] mov edx, [0] ; loc_9: mov [ecx], edx ;*p=arrVal add ecx, 4 ;p++ inc ebx ;i++ cmp ebx, 10 ;i<10? jl short loc_9 ; mov edi, [4] ; mov eax, [edi] ;return *pArray2 add esp, 40 retn (GL,8) mem_4 mainv_20 (main, -32) (GL,4) mem_0 (main, -40) (GL,0) Global Region mainv_28 ecx (, 4[0,9]-40) Region for main No stack-smashing attack reported

  27. Affine-Relation Analysis • Affine relation • x1, x2, …, xn – a-locs • a0, a1, …, an – integer constants • a0 +i=1..n(ai xi) = 0 • Idea: determine affine relations on registers • use such relations to improve precision

  28. Performance Table 1.Running times for VSA and Affine-relation analysis

  29. Future Work • Aggregate Structure Identification • Ramalingam et al. [POPL 99] • Ignore declarative information • Identify fields from the access patterns • Useful for • improving the a-loc abstraction • discovering type information

More Related