Recovery of Variables and Heap Structure in x86 Executables

Gogul Balakrishnan, Thomas Reps (University of Wisconsin)

Overview • Introduction • Challenges • Background • Recovering A-locs via Iteration • An Abstraction for Heap-Allocated Storage • Experiments

Introduction • The Need for Analyzing Executables • What You See Is Not What You eXecute • Many Obstacles in Analyzing Executables • Data Objects Are Not Easily Identifiable • Absence of Symbol Table & Debugging Information • Determining the Memory Addresses of Data Objects • Difficult to Track the Flow of Data through Memory • Challenging to Get Useful Information about the Heap • e.g., memset(password, '\0', len); free(password);

Challenges(1/3) • Recovering Variable-like Entities • The Layout of Memory is Known at Compile Time or Assembly Time (IDAPro's Approach) • To Recover y, the Set of Values that eax Holds at Instruction 5 Needs to be Determined.
C source:
    void main() {
        int x, y;
        x = 1;
        y = 2;
        return;
    }
Disassembly:
    proc main
    1  mov ebp, esp
    2  sub esp, 8
    3  mov [ebp-8], 1
    4  mov eax, ebp
    5  mov [eax-4], 2
    6  add esp, 8
    7  retn

Challenges(2/3) • Granularity of Recovered Variable-like Entities • Affects the Complexity and Accuracy of Subsequent Analyses • The Structure of Heap-Allocated Objects • Only the Size of the Allocated Block is Known. • Using an Abstraction-Refinement Algorithm

Challenges(3/3) • Resolving Virtual-Function Calls • With the one-variable-per-malloc-site abstraction, only weak updates are possible, so a definite link between the object and the virtual-function table is never established.

Background(1/6) • Abstract Locations (A-locs) • Memory-Region • A Set of Disjoint Memory Areas • Represents a Group of Locations that Have Similar Runtime Properties • Abstract Location • The Locations between Two Addresses/Offsets in a Memory-Region • Addresses & Offsets are Statically Determined

Background(2/6) • Abstract Locations (cont'd)
    proc main
        mov ebp, esp
        sub esp, 40
        mov ecx, 0
        lea eax, [ebp-40]
    L1: mov [eax], 1
        mov [eax+4], 2
        add eax, 8
        inc ecx
        cmp ecx, 5
        jl L1
        mov eax, [ebp-36]
        add esp, 40
        retn

Background(3/6) • Value-Set Analysis (VSA) • Combined Numeric-Analysis & Pointer-Analysis • Over-Approximation of the Values that Each A-loc Holds at Each Program Point • Value-Set • The Set of Addresses and Numeric Values • N-tuple of Strided Intervals of the Form s[l, u], where N is the Number of Memory-Regions • (Global Region, Procedure Region, …) • (1[0, 9], ∅) versus (∅, 8[-40, -8]) • e.g., 8[-40, -8] = {-40, -32, -24, -16, -8}
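The strided-interval notation above can be made concrete with a small sketch. The function name `concretize` is hypothetical (it does not come from the slides), and the convention that a singleton is written with stride 0 is an assumption of this sketch.

```python
def concretize(s, l, u):
    """Enumerate the integers denoted by the strided interval s[l, u].

    s is the stride, l the lower bound, u the upper bound.
    Assumption: a singleton is written with stride 0 and l == u.
    """
    if s == 0:
        if l != u:
            raise ValueError("stride 0 is only meaningful for a singleton")
        return [l]
    # Every value l, l+s, l+2s, ... that does not exceed u.
    return list(range(l, u + 1, s))


# The two value-set components used on the slide.
offsets_in_ar_main = concretize(8, -40, -8)
small_numbers = concretize(1, 0, 9)
```

Under these assumptions, `offsets_in_ar_main` is the set of five offsets listed on the slide, and `small_numbers` is the range 0 through 9.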

Background(4/6) • Value-Set Analysis (cont'd) • The Value-Set of eax at L1 • (∅, 8[-40, -8]) • eax Holds the Offsets {-40, -32, -24, -16, -8} • These are the Starting Addresses of Field x of the Elements of p
Disassembly:
    proc main
        mov ebp, esp
        sub esp, 40
        mov ecx, 0
        lea eax, [ebp-40]
    L1: mov [eax], 1
        mov [eax+4], 2
        add eax, 8
        inc ecx
        cmp ecx, 5
        jl L1
        mov eax, [ebp-36]
        add esp, 40
        retn
C source:
    typedef struct {
        int x, y;
    } Point;
    int main() {
        int i;
        Point p[5];
        for (i = 0; i < 5; ++i) {
            p[i].x = 1;
            p[i].y = 2;
        }
        return p[0].y;
    }

Background(5/6) • Aggregate Structure Identification (ASI) • Can Distinguish between Accesses to Different Parts of the Same Aggregate • An Aggregate is Broken Up into Smaller Parts (Atoms) • Data-Access Constraint Language (DAC) • Specifies the Data-Access Patterns in the Program

Background(6/6) • Aggregate Structure Identification (cont'd) • Data-Access Constraint Language (DAC) • DataRef[l : u] refers to bytes l through u in DataRef • DataRef\n treats DataRef as an array, where n is the number of elements, and refers to any one element • e.g., P[0:11]\3 = P[0:3], P[4:7], or P[8:11]
DAC constraints for main:
    p[0:39]\5[0:3] ≈ const_1[0:3];
    p[0:39]\5[4:7] ≈ const_2[0:3];
    return_main[0:3] ≈ p[4:7]
(Figure: ASI DAG for return_main and p.)
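To illustrate how an array reference in the DAC language denotes byte ranges, here is a minimal sketch. The helper names `array_elements` and `sub_range` are hypothetical, and the sketch assumes the byte range divides evenly into n elements.

```python
def array_elements(lo, hi, n):
    """Split bytes lo..hi (inclusive) into n equal elements.

    Returns the inclusive byte range of each element, mirroring what a
    reference of the form DataRef[lo:hi]\\n can denote.
    """
    total = hi - lo + 1
    if total % n != 0:
        raise ValueError("range does not divide evenly into n elements")
    size = total // n
    return [(lo + i * size, lo + (i + 1) * size - 1) for i in range(n)]


def sub_range(elements, l, u):
    """Select bytes l..u (relative to each element start) in every element."""
    return [(start + l, start + u) for (start, _end) in elements]


# P[0:11]\3 from the slide.
p_elems = array_elements(0, 11, 3)
# p[0:39]\5[4:7]: bytes 4..7 of each of the five 8-byte elements of p.
y_fields = sub_range(array_elements(0, 39, 5), 4, 7)
```

Here `p_elems` lists the three 4-byte ranges given in the slide's example, and `y_fields` lists the byte ranges that the second constraint relates to const_2.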

Recovering A-locs via Iteration • Problems of VSA with IDAPro-Derived A-locs • An A-loc Can Only Represent a Contiguous Sequence of Memory Locations • Cannot Detect Internal Substructure • Basic Idea • VSA is used to obtain the memory-access patterns in the executable. • ASI is used as a heuristic to determine a set of a-locs according to the memory-access patterns recovered by VSA.
(Figure: iteration loop over IDAPro, VSA, and ASI, producing the final value-sets.)

Recovering A-locs via Iteration • Generating Data-Access Constraints from Value-Sets
Algorithm 1 SI2ASI
    Input: (r, s[l, u], length)
    Output: (ASI Ref, Boolean)
    if s[l, u] is a singleton then        (i.e., of the form s[l, l])
        return <"r[l : l+length-1]", true>
    else
        size ← max(s, length)
        n ← (u - l + size - 1) / size
        ref ← "r[l : u+size-1]\n[0 : size-1]"
        return <ref, (s = size)>
    end if
In ref, r[l : u+size-1] is the actual byte range and n is the number of array elements.
e.g., (AR_main, 8[-40, -8], length) ⇒ AR_main[(-40):(-1)]\5[0:7], which stands for
    AR_main[-40:-33][0:7]
    AR_main[-32:-25][0:7]
    AR_main[-24:-17][0:7]
    AR_main[-16:-9][0:7]
    AR_main[-8:-1][0:7]
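The SI2ASI algorithm above can be sketched in Python roughly as follows. This is an illustrative reading, not the authors' implementation. Assumptions of this sketch: the function name `si2asi` and the flat argument list are hypothetical, the backslash in the output string follows the DAC array notation, and the element count n is computed as (u - l) // size + 1, chosen so that the sketch reproduces the 5-element example shown with the algorithm.

```python
def si2asi(r, s, l, u, length):
    """Turn a strided interval s[l, u] in memory-region r into an ASI reference.

    Returns (reference string, exact flag), following the shape of SI2ASI.
    """
    if l == u:
        # Singleton: a plain access to `length` bytes starting at offset l.
        return f"{r}[{l}:{l + length - 1}]", True
    size = max(s, length)
    # Assumption: number of size-strided elements from l up to u.
    n = (u - l) // size + 1
    ref = f"{r}[{l}:{u + size - 1}]\\{n}[0:{size - 1}]"
    # The reference is exact when the stride already covers the access.
    return ref, s == size


array_ref = si2asi("AR_main", 8, -40, -8, 4)
single_ref = si2asi("AR_main", 0, -8, -8, 4)
```

With these inputs, `array_ref` describes an array of five 8-byte elements in AR_main, and `single_ref` describes a single 4-byte access at offset -8.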

Recovering A-locs via Iteration • Generating Data-Access Constraints from Value-Sets
e.g., eax : (1[0, 9], ∅), ecx : (∅, 16[-160, -16])
    For [ecx+eax] ⇒ AR[-160:-1]\10[0:15][0:9]\10[0:0]
    (ecx supplies the base address, eax the index address; row-major order)
Algorithm 2
    if s1[l1, u1] or s2[l2, u2] is a singleton then
        return SI2ASI(r, s1[l1, u1] ⊕ s2[l2, u2], length)
    end if
    (determine which operand is the base)
    if s1 ≥ (u2 - l2 + length) then
        baseSI ← s1[l1, u1]
        indexSI ← s2[l2, u2]
    else if s2 ≥ (u1 - l1 + length) then
        baseSI ← s2[l2, u2]
        indexSI ← s1[l1, u1]
    else
        return SI2ASI(r, s1[l1, u1] ⊕ s2[l2, u2], length)
    end if
    <baseRef, exactRef> ← SI2ASI(r, baseSI, stride(baseSI))
    if exactRef is false then
        return SI2ASI(r, s1[l1, u1] ⊕ s2[l2, u2], length)
    else
        return concat(baseRef, SI2ASI('', indexSI, length))
    end if

Recovering A-locs via Iteration • Interpreting Indirect Memory-References • Lookup Algorithm • NodeDesc: <name, length> • name: the name associated with the ASI tree node • length: the length of that node • NodeDescList: an ordered list of NodeDesc, e.g., [nd1, nd2, …, ndn] • Three Operations: GetChildren, GetRange, GetArrayElements

Recovering A-locs via Iteration • Lookup Algorithm Examples • e.g., Lookup p[0:39]\5[0:3]
    GetChildren(p) = [<a3, 4>, <a4, 4>, <i2, 32>]
    GetRange(0, 39) = [<a3, 4>, <a4, 4>, <i2, 32>]
    GetArrayElements(5) = [<a3, 4>, <a4, 4>], [<a5, 4>, <a6, 4>]
    GetRange(0, 3) = [<a3, 4>, <a5, 4>]

An Abstraction for Heap-Allocated Storage • Previous Abstraction • All of the nodes allocated at a given allocation site s are folded together into a single summary node n_s. • Recency Abstraction • Allows VSA & ASI to recover information about virtual-function tables • Uses two memory-regions per allocation site s • MRAB[s]: Most-Recently-Allocated Block • NMRAB[s]: Non-Most-Recently-Allocated Blocks • count: how many concrete blocks the memory-region represents (MRAB[s].count, NMRAB[s].count) • SmallRange = {[0, 0], [0, 1], [1, 1], [0, ∞], [1, ∞], [2, ∞]} • size: over-approximation of the size of the block (MRAB[s].size, NMRAB[s].size)

An Abstraction for Heap-Allocated Storage • Operation • AbsEnv[s]: MRAB[s]/NMRAB[s] → <count, size, alocEnv> • AlocEnv = a-loc → ValueSet • Allocation site s transforms absEnv to absEnv'
    absEnv'(MRAB[s]) = <[0, 1], size, a-loc.Value-Set>
    absEnv'(NMRAB[s]).count = absEnv(NMRAB[s]).count + absEnv(MRAB[s]).count
    absEnv'(NMRAB[s]).size = absEnv(NMRAB[s]).size ∪ absEnv(MRAB[s]).size
    absEnv'(NMRAB[s]).alocEnv = absEnv(NMRAB[s]).alocEnv ∪ absEnv(MRAB[s]).alocEnv
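The allocation-site transformer above can be illustrated with a small sketch. Everything here is a simplification under stated assumptions: the names `allocate`, `add_counts`, and `join_aloc_env` are hypothetical, sizes and value-sets are modelled as plain Python sets rather than strided intervals, counts are rounded up to the nearest SmallRange element, and the fresh MRAB block starts with an empty a-loc environment.

```python
import math

INF = math.inf
# <count, size, alocEnv> for a memory-region that represents no blocks yet.
EMPTY = ((0, 0), frozenset(), {})


def add_counts(a, b):
    """Add two counts and round the result up into SmallRange (assumption)."""
    lo = min(a[0] + b[0], 2)
    hi = a[1] + b[1]
    return (lo, hi if hi <= 1 else INF)


def join_aloc_env(e1, e2):
    """Pointwise union of two a-loc environments (a-loc -> set of values)."""
    return {k: e1.get(k, frozenset()) | e2.get(k, frozenset())
            for k in e1.keys() | e2.keys()}


def allocate(abs_env, site, size):
    """Abstract effect of one execution of allocation site `site`."""
    m_count, m_size, m_env = abs_env.get(("MRAB", site), EMPTY)
    n_count, n_size, n_env = abs_env.get(("NMRAB", site), EMPTY)
    new_env = dict(abs_env)
    # The old most-recent block is folded into the NMRAB summary.
    new_env[("NMRAB", site)] = (add_counts(n_count, m_count),
                                n_size | m_size,
                                join_aloc_env(n_env, m_env))
    # The new block becomes MRAB; count [0, 1] as on the slide.
    new_env[("MRAB", site)] = ((0, 1), frozenset({size}), {})
    return new_env


env1 = allocate({}, "s", 8)
env2 = allocate(env1, "s", 16)
env3 = allocate(env2, "s", 8)
```

After three abstract allocations at the same site, MRAB[s] still stands for at most one block, while NMRAB[s] summarizes an unbounded number of older blocks of either size.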

An Abstraction for Heap-Allocated Storage

Experiments • Environments • Software

Experiments • Results of Virtual-Function Call Resolution

Experiments • Results of A-loc Identification • Comparing the Results of the Algorithm with Debugging Information • The structure of 87% of the local variables is correct.

Experiments • Results of A-loc Identification • The structure of 72% of the objects in the heap is correct.

Q & A
