Overview of Back-end for CComp Zhaopeng Li Software Security Lab. June 8, 2009
Outline • Design Points • Assembly Language : “x86” • Low-level Intermediate Language • Future Work
Design Points • Assembly Language • Target: SCAP with an x86 abstract machine; • The program logic may change in the next version; • Or another machine may be targeted. • Low-level Intermediate Language • Hides some machine-specific details; • Note that this level may serve only as a helper for generating code and proofs.
Some Topics about “x86” • Data Representation • 32-bit vs “fake” 32-bit • Don’t care how to store the data as bits. • Integer : 4 bytes • Pointer : 4 bytes • Data Alignment • Callee-saved Registers • EBX, ESI, EDI, EBP
Some Topics about “x86” (cont.) • Calling convention: • Parameters are passed on the stack, pushed from right to left; alternatively, the first three are passed in registers EAX, ECX, and EDX, and the rest are passed on the stack; • The callee may use registers EAX, ECX, and EDX freely; any other register it uses must be saved on the stack and popped before the function returns; • The return value is stored in register EAX; • The caller cleans up the stack (parameters).
Some Topics about “x86” (cont.)
Prolog (typical):
    _function:
        push ebp        ; store the old base pointer
        mov esp, ebp    ; make the base pointer point to the current stack location
        sub x, esp      ; x is the size, in bytes
    Short form: enter x, 0
Epilog (typical):
        mov ebp, esp    ; reset the stack to "clean" away the local variables
        pop ebp         ; restore the original base pointer
        ret             ; return from the function
    Short form: leave, then ret
[Figure: stack layout at function entry, after stack frame setup, and after the return, showing the parameters, old eip, old ebp, and local variables relative to esp and ebp.]
Assembly Abstract Machine “m86” • Code Heap (C) • Code storage, • Unchanged during execution • Machine State • Memory (M) • Register File (R) • Instruction Pointer (eip), • current instruction c = C(eip) • Or just use instruction sequence (I)
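The machine on this slide can be sketched as a small interpreter. This is only an illustration under stated assumptions: the class `State`, the function `step`, the tuple encoding of instructions, and the choice to let eip index instructions (rather than byte addresses) are all hypothetical, and only a handful of instructions are modeled.

```python
from dataclasses import dataclass, field

@dataclass
class State:
    M: dict = field(default_factory=dict)   # memory: address -> value
    R: dict = field(default_factory=dict)   # register file: name -> value
    eip: int = 0                            # instruction pointer (an index into C here)

def step(C, s):
    """Execute the current instruction c = C(eip). The code heap C is never modified."""
    op, *args = C[s.eip]
    R, M = dict(s.R), dict(s.M)
    nxt = s.eip + 1
    if op == "movi":            # movi n, r : r := n
        n, r = args
        R[r] = n
    elif op == "add":           # add r1, r2 : r2 := r2 + r1 (source first, AT&T order)
        r1, r2 = args
        R[r2] = R[r2] + R[r1]
    elif op == "push":          # push r : esp := esp - 4; M[esp] := r
        (r,) = args
        v = R[r]
        R["esp"] -= 4
        M[R["esp"]] = v
    elif op == "pop":           # pop r : r := M[esp]; esp := esp + 4
        (r,) = args
        v = M[R["esp"]]
        R["esp"] += 4
        R[r] = v
    elif op == "jmp":           # jmp b : continue at instruction b
        (nxt,) = args
    else:
        raise ValueError(f"instruction not modeled: {op}")
    return State(M, R, nxt)

# A code heap is an immutable tuple, matching "unchanged during execution".
C = (("movi", 3, "eax"), ("movi", 4, "ebx"), ("add", "eax", "ebx"), ("push", "ebx"))
s = State(R={"esp": 100})
for _ in range(len(C)):
    s = step(C, s)
```

Each call to `step` returns a fresh state, which keeps the before and after states available side by side, the shape needed by a relation of type State -> State -> Prop.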
Assembly Language : “x86” • “AT&T-syntax”
• Reg. r ::= eax | ebx | ecx | edx | esi | edi | esp | ebp
• FReg. fr ::= sf | zf
• Int. b ::= n (integer)
• Instr. i ::= add r1, r2 | addi n, r | sub r1, r2 | subi n, r | mul r1, r2 | muli n, r
    | mov r1, r2 | movi n, r | movs r1, n(r2) | movl n(r1), r2
    | push r | pop r | cmp r1, r2 | cmpi n, r
    | je r, b | jne r, b | jg r, b | jge r, b | jmp b
    | call b | ret | enter n, 0 | leave | malloc r | free r
Program Logic • Based on SCAP • Specification (p, g) • p : State -> Prop • g : State -> State -> Prop • Inference Rules • Well-formed program • Well-formed basic block • Well-formed instruction
Main Objectives • Code Generation • Minimize the proof size • E.g., temporary results should be kept in registers, not on the stack • Assertion • Building (p, g) for each basic block • Generating (p, g) for each program point • Proof • Generating proofs for functions/basic blocks • (reusing the proofs of the source-level VCs)
Assertion Relationship
Intermediate Language:
    f : {p} //{q}
        Basic block1
    L1 : {p1}
        Basic block2
x86 Assembly Language:
    f : {(p’, g)}
        Basic block1
    L1 : {(p’1, g1)}
        Basic block2
Relationship between the assertions:
    p’ = trans(p) /\ param_p /\ stack-reg_p
    g = trans(q) /\ callee-saved-reg_g /\ stack_g
    p’1 = trans(p1) /\ param_p1 /\ stack-reg_p1
    g1 = ?
Figure Out G
Code and specification:
    f : {R’(ebp) = R(ebp) /\ R’(esp) = R(esp)+4}
        push ebp
        mov esp, ebp
        sub $12, esp
    L1 : {g1}
        Basic block2
        leave
        ret
States: R at function entry, R0 after push ebp, R’ after ret.
Step relation for push ebp:
    R0(ebp) = R(ebp) /\ R0(esp) = R(esp)-4
Combined with the g of f:
    R’(ebp) = R(ebp) /\ R’(esp) = R(esp)+4 /\ R0(ebp) = R(ebp) /\ R0(esp) = R(esp)-4
Result, g0:
    R’(ebp) = R0(ebp) /\ R’(esp) = R0(esp)+8
• The method: • Get state relation by rule of operational semantics; • Use the g of previous program point; • Do substitution and arithmetic.
Figure Out G (cont.)
Code and specification:
    f : {R’(ebp) = R(ebp) /\ R’(esp) = R(esp)+4}
        push ebp
        mov esp, ebp
        sub $12, esp
    L1 : {g1}
        Basic block2
        leave
        ret
States: R at function entry, R0 after push ebp, R1 after mov esp, ebp, R’ after ret.
Previous result, g0:
    R’(ebp) = R0(ebp) /\ R’(esp) = R0(esp)+8
Step relation for mov esp, ebp:
    R1(ebp) = R0(esp) /\ R1(esp) = R0(esp)
Combined with g0:
    R’(ebp) = R0(ebp) /\ R’(esp) = R0(esp)+8 /\ R1(ebp) = R0(esp) /\ R1(esp) = R0(esp)
Result, g1:
    R’(ebp) = M1(R1(ebp)) /\ R’(esp) = R1(esp)+8
• The method: • Get state relation by rule of operational semantics; • Use the g of previous program point; • Do substitution and arithmetic.
Figure Out G (cont.)
Code and specification:
    f : {R’(ebp) = R(ebp) /\ R’(esp) = R(esp)+4}
        push ebp
        mov esp, ebp
        sub $12, esp
    L1 : {g1}
        Basic block2
        leave
        ret
States: R at function entry, R0 after push ebp, R1 after mov esp, ebp, R2 after sub $12, esp, R’ after ret.
Previous results:
    g0: R’(ebp) = R0(ebp) /\ R’(esp) = R0(esp)+8
    g1: R’(ebp) = M1(R1(ebp)) /\ R’(esp) = R1(esp)+8
Step relation for sub $12, esp:
    R2(ebp) = R1(ebp) /\ R2(esp) = R1(esp)-12
Combined with g1:
    R’(ebp) = M1(R1(ebp)) /\ R’(esp) = R1(esp)+8 /\ R2(ebp) = R1(ebp) /\ R2(esp) = R1(esp)-12
Result, g2 (substituting R1(esp) = R2(esp)+12):
    R’(ebp) = M2(R2(ebp)) /\ R’(esp) = R2(esp)+20
• The method: • Get state relation by rule of operational semantics; • Use the g of previous program point; • Do substitution and arithmetic.
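The three derivation steps can be sanity-checked by running the prolog and epilog on concrete values. The sketch below is not the proof generator; the function `trace`, the tuple encoding of instructions, and the starting values are hypothetical, and Basic block2 is left out on the assumption that it preserves esp, ebp, and the saved old ebp. The esp clause at the last point is stated relative to R2, as R’(esp) = R2(esp)+20, which follows from R’(esp) = R1(esp)+8 and R2(esp) = R1(esp)-12.

```python
def trace(R, M, code):
    """Run code from registers R and memory M; return a (R, M) snapshot after each instruction."""
    R, M = dict(R), dict(M)
    snaps = []
    for op, *a in code:
        if op == "push":            # push r : esp := esp - 4; M[esp] := r
            v = R[a[0]]
            R["esp"] -= 4
            M[R["esp"]] = v
        elif op == "mov":           # mov r1, r2 : r2 := r1
            R[a[1]] = R[a[0]]
        elif op == "subi":          # subi n, r : r := r - n (the slide writes sub $12, esp)
            R[a[1]] -= a[0]
        elif op == "leave":         # esp := ebp; ebp := M[esp]; esp := esp + 4
            R["esp"] = R["ebp"]
            R["ebp"] = M[R["esp"]]
            R["esp"] += 4
        elif op == "ret":           # pop the return address
            R["esp"] += 4
        else:
            raise ValueError(f"instruction not modeled: {op}")
        snaps.append((dict(R), dict(M)))
    return snaps

# Entry state: esp points at the return address (old eip).
R = {"esp": 1000, "ebp": 2000}
M = {1000: 0x42}
code = [("push", "ebp"), ("mov", "esp", "ebp"), ("subi", 12, "esp"), ("leave",), ("ret",)]
(R0, M0), (R1, M1), (R2, M2), _, (Rf, Mf) = trace(R, M, code)   # Rf plays the role of R'

g  = Rf["ebp"] == R["ebp"]      and Rf["esp"] == R["esp"] + 4
g0 = Rf["ebp"] == R0["ebp"]     and Rf["esp"] == R0["esp"] + 8
g1 = Rf["ebp"] == M1[R1["ebp"]] and Rf["esp"] == R1["esp"] + 8
g2 = Rf["ebp"] == M2[R2["ebp"]] and Rf["esp"] == R2["esp"] + 20
print(g, g0, g1, g2)
```

A single concrete run only tests the relations on one input; the slides derive them symbolically for all states.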
Potential Benefits • Hide some machine-specific details; • Some optimizations could be done (optional); • Make the implementation simple and reusable • (*Note that this level is just a helper to generate code and proof.*) • When targeting a different assembly logic, only the code that translates from this level needs to be added
The Language
• Loc. l ::= r | s
• Int. o, b ::= n (integer)
• Slot. s ::= local(o) | incoming(o) | outgoing(o)
• Reg. r ::= r1 | r2 | r3 | … //infinite pseudo-registers
• Instr. i ::= bop(bop, l1, l2, l) | uop(uop, l1, l) | load(r, o, l) | store(l, r, o)
    | getstack(s, r) | setstack(r, s) | call(id, l) | return r
    | malloc(r) | free(r) | goto b | label(b) | cond(l1, cmp, l2, btrue)
• BinOp. bop ::= add | sub | mul | …
• UnOp. uop ::= minus | …
• Comp. cmp ::= gt | ge | eq | ne | lt | le
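One possible encoding of part of this grammar as data types is sketched below. All class and field names are hypothetical, only four instruction forms are covered, and the reading of the last argument of bop as the destination is an assumption that the slide does not state.

```python
from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class Reg:              # pseudo-register r1, r2, r3, ...
    n: int

@dataclass(frozen=True)
class Slot:             # stack slot: local(o), incoming(o) or outgoing(o)
    kind: str
    offset: int

Loc = Union[Reg, Slot]  # l ::= r | s

@dataclass(frozen=True)
class Bop:              # bop(bop, l1, l2, l); l is assumed to be the destination
    op: str
    l1: Loc
    l2: Loc
    dst: Loc

@dataclass(frozen=True)
class GetStack:         # getstack(s, r)
    s: Slot
    r: Reg

@dataclass(frozen=True)
class SetStack:         # setstack(r, s)
    r: Reg
    s: Slot

@dataclass(frozen=True)
class Return:           # return r
    r: Reg

def show_loc(l):
    """Print a location in the slide's concrete syntax."""
    return f"r{l.n}" if isinstance(l, Reg) else f"{l.kind}({l.offset})"

def show(i):
    """Print an instruction in the slide's concrete syntax."""
    if isinstance(i, Bop):
        return f"bop({i.op}, {show_loc(i.l1)}, {show_loc(i.l2)}, {show_loc(i.dst)})"
    if isinstance(i, GetStack):
        return f"getstack({show_loc(i.s)}, {show_loc(i.r)})"
    if isinstance(i, SetStack):
        return f"setstack({show_loc(i.r)}, {show_loc(i.s)})"
    if isinstance(i, Return):
        return f"return {show_loc(i.r)}"
    raise TypeError(f"instruction not modeled: {i!r}")

# A function body that adds its two incoming parameters and returns the sum.
prog = [
    GetStack(Slot("incoming", 0), Reg(1)),
    GetStack(Slot("incoming", 4), Reg(2)),
    Bop("add", Reg(1), Reg(2), Reg(3)),
    Return(Reg(3)),
]
text = [show(i) for i in prog]
```

Slots name their role (local, incoming, outgoing) rather than a concrete ebp or esp offset, which is the kind of machine-specific detail this level is meant to hide.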
Code Generation (optional) • Do some optimizations which do not affect the proof, such as: • Branch tunneling • Dead code elimination • Future optimizations • Other low-level optimizations may be done here
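Branch tunneling, the first optimization listed, redirects a jump whose target label is immediately followed by another goto. A minimal sketch follows, assuming instructions are tuples such as ("label", b), ("goto", b), and ("cond", l1, cmp, l2, b); the function `tunnel` and this encoding are hypothetical and are not taken from the CComp implementation.

```python
def tunnel(code):
    """Redirect every goto/cond target that leads straight to another goto."""
    # fwd[b] = b2 when label b is followed (possibly through more labels) by "goto b2".
    fwd = {}
    for i, ins in enumerate(code):
        if ins[0] == "label":
            j = i + 1
            while j < len(code) and code[j][0] == "label":
                j += 1
            if j < len(code) and code[j][0] == "goto":
                fwd[ins[1]] = code[j][1]

    def resolve(b):
        # Follow the chain of forwarding labels; the seen set guards against goto cycles.
        seen = set()
        while b in fwd and b not in seen:
            seen.add(b)
            b = fwd[b]
        return b

    out = []
    for ins in code:
        if ins[0] == "goto":
            out.append(("goto", resolve(ins[1])))
        elif ins[0] == "cond":
            out.append(ins[:-1] + (resolve(ins[-1]),))   # the branch target is the last field
        else:
            out.append(ins)
    return out

code = [
    ("cond", "r1", "eq", "r2", "L1"),
    ("goto", "L3"),
    ("label", "L1"),
    ("goto", "L2"),
    ("label", "L2"),
    ("goto", "L3"),
    ("label", "L3"),
    ("return", "r1"),
]
optimized = tunnel(code)
```

After tunneling, the blocks at L1 and L2 are no longer jump targets, so a later dead code elimination pass could remove them.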