
Overview of Back-end for CComp




  1. Overview of Back-end for CComp. Zhaopeng Li, Software Security Lab, June 8, 2009

  2. Outline • Design Points • Assembly Language : “x86” • Low-level Intermediate Language • Future Work

  3. Design Points • Assembly Language • Target : SCAP with an x86 abstract machine; • The program logic may change in the next version; • Or another machine may be used. • Low-level Intermediate Language • Hides some machine-specific details; • Note that this level can serve just as a helper for generating code and proofs.

  4. Assembly Language : “x86”

  5. Some Topics about “x86” • Data Representation • 32-bit vs “fake” 32-bit • We don’t care how the data is stored as bits. • Integer : 4 bytes • Pointer : 4 bytes • Data Alignment • Callee-saved Registers • EBX, ESI, EDI, EBP
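To make the alignment point concrete, here is a minimal sketch of how the 4-byte integer and pointer sizes above turn into field offsets. It assumes, beyond what the slide says, a 1-byte char and that every field is aligned to its own size; SIZES, align_up, and layout are hypothetical names.

```python
# Sizes in bytes for the 32-bit model on the slide (int and pointer: 4 bytes).
# The 1-byte char is an assumption added for illustration.
SIZES = {"char": 1, "int": 4, "ptr": 4}


def align_up(offset: int, alignment: int) -> int:
    """Round offset up to the next multiple of alignment (a power of two)."""
    return (offset + alignment - 1) & ~(alignment - 1)


def layout(fields):
    """Return (offsets, total_size) for a struct whose fields are aligned to their own size."""
    offsets = {}
    offset = 0
    max_align = 1
    for name, ty in fields:
        size = SIZES[ty]
        offset = align_up(offset, size)   # insert padding before the field if needed
        offsets[name] = offset
        offset += size
        max_align = max(max_align, size)
    # Pad the end so that every element of an array of such structs stays aligned.
    return offsets, align_up(offset, max_align)


offs, total = layout([("tag", "char"), ("count", "int"), ("next", "ptr")])
print(offs, total)  # {'tag': 0, 'count': 4, 'next': 8} 12
```

Here 3 bytes of padding follow tag so that count starts on a 4-byte boundary.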

  6. Some Topics about “x86” (cont.) • Calling convention: • Parameters are passed on the stack, pushed from right to left; or the first three are passed in registers EAX, ECX, and EDX and the rest on the stack; • Registers EAX, ECX, and EDX may be used freely by the callee; any other register the callee uses must be saved on the stack and popped before the function returns; • The return value is stored in register EAX; • The caller cleans up the stack (the parameters).
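The stack-based variant can be simulated on a toy machine. This sketch assumes a dict-based memory and 4-byte stack slots, and it does not model the register-based variant; Machine, callee_sum3, and call_stack_convention are hypothetical names. It shows why, once call has pushed the return address, the first parameter sits at esp + 4, and why the caller has to remove the parameters itself.

```python
from dataclasses import dataclass, field

WORD = 4  # 4-byte stack slots, matching the 4-byte integers and pointers


@dataclass
class Machine:
    """Toy model of the stack part of the x86 state: registers plus word-addressed memory."""
    regs: dict = field(default_factory=lambda: {"eax": 0, "esp": 0x1000})
    mem: dict = field(default_factory=dict)

    def push(self, value: int) -> None:
        self.regs["esp"] -= WORD          # the stack grows toward lower addresses
        self.mem[self.regs["esp"]] = value

    def pop(self) -> int:
        value = self.mem[self.regs["esp"]]
        self.regs["esp"] += WORD
        return value


def callee_sum3(m: Machine) -> None:
    """Hypothetical callee: reads three stack parameters and returns their sum in EAX."""
    esp = m.regs["esp"]                   # at entry, [esp] holds the return address
    a, b, c = (m.mem[esp + WORD * (i + 1)] for i in range(3))
    m.regs["eax"] = a + b + c             # the return value goes in EAX
    m.regs["eip"] = m.pop()               # ret: pop the return address


def call_stack_convention(m: Machine, args, return_addr: int) -> int:
    for arg in reversed(args):            # push parameters from right to left
        m.push(arg)
    m.push(return_addr)                   # call: push the return address
    callee_sum3(m)
    m.regs["esp"] += WORD * len(args)     # the caller cleans up the parameters
    return m.regs["eax"]


m = Machine()
print(call_stack_convention(m, [1, 20, 300], return_addr=0x40), hex(m.regs["esp"]))
# 321 0x1000
```

Pushing right to left is what puts the first parameter at the lowest address, closest to the return address, so the callee can find it at a fixed offset regardless of how many parameters follow.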

  7. Some Topics about “x86” (cont.) (operands in AT&T order: source first, then destination)
  • Prolog (typical):
        _function:
        push ebp          ;store the old base pointer
        mov esp, ebp      ;make the base pointer point to the current stack location
        sub x, esp        ;x is the size, in bytes, of the local variables
    or, equivalently: enter x, 0
  • Epilog (typical):
        mov ebp, esp      ;reset the stack to "clean" away the local variables
        pop ebp           ;restore the original base pointer
        ret               ;return from the function
    or, equivalently: leave; ret
  • [Figure: the stack at function entry, after stack-frame setup, and after the return, showing the parameters, old eip (return address), old ebp, and local variables, with the positions of esp and ebp.]
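The equivalence of the long and short forms, and the net effect of the prolog and epilog, can be checked on a toy model. This is a sketch under stated assumptions: the register file and memory are dicts, operands follow the AT&T order used on the slide, and run is a hypothetical helper.

```python
WORD = 4


def run(program, esp=0x1000, ebp=0x2000, ret_addr=0x40):
    """Execute (opcode, *operands) tuples on a toy state; [esp] starts as the return address."""
    r = {"esp": esp, "ebp": ebp, "eip": None}
    mem = {esp: ret_addr}

    def push(v):
        r["esp"] -= WORD
        mem[r["esp"]] = v

    def pop():
        v = mem[r["esp"]]
        r["esp"] += WORD
        return v

    trace = []
    for op, *args in program:
        if op == "push":
            push(r[args[0]])
        elif op == "pop":
            r[args[0]] = pop()
        elif op == "mov":                 # mov src, dst
            r[args[1]] = r[args[0]]
        elif op == "sub":                 # sub n, dst   (dst := dst - n)
            r[args[1]] -= args[0]
        elif op == "enter":               # enter n, 0 == push ebp; mov esp, ebp; sub n, esp
            push(r["ebp"])
            r["ebp"] = r["esp"]
            r["esp"] -= args[0]
        elif op == "leave":               # leave == mov ebp, esp; pop ebp
            r["esp"] = r["ebp"]
            r["ebp"] = pop()
        elif op == "ret":
            r["eip"] = pop()
        else:
            raise ValueError(f"unknown instruction {op}")
        trace.append((op, r["esp"], r["ebp"]))
    return r, trace


long_form = [("push", "ebp"), ("mov", "esp", "ebp"), ("sub", 12, "esp"),
             ("mov", "ebp", "esp"), ("pop", "ebp"), ("ret",)]
short_form = [("enter", 12, 0), ("leave",), ("ret",)]

r1, t1 = run(long_form)
r2, _ = run(short_form)
print(r1 == r2, hex(r1["esp"]), hex(r1["ebp"]))  # True 0x1004 0x2000
print(t1[2])  # state after sub: ('sub', 4080, 4092), i.e. esp 0xff0, ebp 0xffc
```

After ret, ebp has its original value and esp is 4 bytes higher than at entry, because the return address has been popped; the parameters are still on the stack for the caller to remove.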

  8. Assembly Abstract Machine “m86” • Code Heap (C) • Code storage; • Unchanged during execution • Machine State • Memory (M) • Register File (R) • Instruction Pointer (eip); the current instruction is c = C(eip) • Or just use an instruction sequence (I)
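A minimal sketch of the m86 state, assuming instructions are encoded as tuples and code addresses are small consecutive integers (real x86 instructions are variable-length); State, make_code_heap, and fetch are hypothetical names. The code heap is kept outside the machine state and wrapped in a read-only view, reflecting that it does not change during execution.

```python
from dataclasses import dataclass
from types import MappingProxyType
from typing import Mapping

Instr = tuple  # e.g. ("movi", 5, "eax"); this tuple encoding is an assumption


@dataclass(frozen=True)
class State:
    """Machine state: memory M, register file R, and instruction pointer eip."""
    mem: Mapping[int, int]
    regs: Mapping[str, int]
    eip: int


def make_code_heap(code: dict) -> Mapping[int, Instr]:
    """Build the code heap C as a read-only view: code is unchanged during execution."""
    return MappingProxyType(dict(code))


def fetch(C: Mapping[int, Instr], s: State) -> Instr:
    """Return the current instruction c = C(eip)."""
    return C[s.eip]


C = make_code_heap({0: ("movi", 5, "eax"), 1: ("ret",)})
s = State(mem={}, regs={"eax": 0, "esp": 0x1000}, eip=0)
print(fetch(C, s))  # ('movi', 5, 'eax')
```

With the instruction-sequence alternative (I), the same idea applies with C replaced by a list indexed from the current position.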

  9. Assembly Language : “x86” • “AT&T-syntax” • Reg. r ::= eax | ebx | ecx | edx | esi | edi | esp | ebp • FReg. fr ::= sf | zf • Int. b ::= n (integer) • Instr. i ::= add r1, r2 | addi n, r | sub r1, r2 | subi n, r | mul r1, r2 | muli n, r | mov r1, r2 | movi n, r | movs r1, n(r2) | movl n(r1), r2 | push r | pop r | cmp r1, r2 | cmpi n, r | je r, b | jne r, b | jg r, b | jge r, b | jmp b | call b | ret | enter n, 0 | leave | malloc r | free r
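To make the instruction set concrete, here is a minimal interpreter for most of it (conditional jumps, call, enter/leave, and malloc/free are left out). The semantics are an assumption based on the AT&T reading of the operands: for example, movs r1, n(r2) is taken to store r1 at address R(r2) + n, and movl n(r1), r2 to load from R(r1) + n into r2. Instructions are encoded as tuples, and step and run are hypothetical names.

```python
def step(C, M, R, F, eip):
    """Execute the instruction C[eip] and return the next eip (None means halt).

    Assumptions not fixed by the slide: operands in AT&T order (destination last),
    unbounded Python integers (no 32-bit wraparound), consecutive code addresses
    (fall-through is eip + 1), and ret simply halts (return addresses are ignored).
    Conditional jumps are omitted because the role of their register operand is
    not specified on the slide.
    """
    op, *a = C[eip]
    nxt = eip + 1
    if op in ("add", "sub", "mul"):          # op r1, r2     : r2 := r2 op r1
        x, y = R[a[0]], R[a[1]]
        R[a[1]] = {"add": y + x, "sub": y - x, "mul": y * x}[op]
    elif op in ("addi", "subi", "muli"):     # opi n, r      : r := r op n
        n, y = a[0], R[a[1]]
        R[a[1]] = {"addi": y + n, "subi": y - n, "muli": y * n}[op]
    elif op == "mov":                        # mov r1, r2    : r2 := r1
        R[a[1]] = R[a[0]]
    elif op == "movi":                       # movi n, r     : r := n
        R[a[1]] = a[0]
    elif op == "movs":                       # movs r1, n(r2): M[R(r2) + n] := R(r1)
        r1, n, r2 = a
        M[R[r2] + n] = R[r1]
    elif op == "movl":                       # movl n(r1), r2: R(r2) := M[R(r1) + n]
        n, r1, r2 = a
        R[r2] = M[R[r1] + n]
    elif op == "push":                       # esp := esp - 4; M[esp] := r
        R["esp"] -= 4
        M[R["esp"]] = R[a[0]]
    elif op == "pop":                        # r := M[esp]; esp := esp + 4
        R[a[0]] = M[R["esp"]]
        R["esp"] += 4
    elif op in ("cmp", "cmpi"):              # flags from r2 - r1 (or r - n)
        src = R[a[0]] if op == "cmp" else a[0]
        diff = R[a[1]] - src
        F["zf"], F["sf"] = diff == 0, diff < 0
    elif op == "jmp":                        # jmp b
        nxt = a[0]
    elif op == "ret":
        nxt = None
    else:
        raise NotImplementedError(op)
    return nxt


def run(C, R):
    """Run from address 0 until ret; return the final registers, memory, and flags."""
    M, F, eip = {}, {"zf": False, "sf": False}, 0
    while eip is not None:
        eip = step(C, M, R, F, eip)
    return R, M, F


C = {0: ("movi", 3, "eax"), 1: ("movi", 4, "ecx"), 2: ("add", "ecx", "eax"),
     3: ("muli", 5, "eax"), 4: ("movs", "eax", -4, "ebp"),
     5: ("movl", -4, "ebp", "edx"), 6: ("cmpi", 35, "edx"), 7: ("ret",)}
R, M, F = run(C, {"eax": 0, "ecx": 0, "edx": 0, "ebp": 0x1000, "esp": 0x1000})
print(R["edx"], M[0x1000 - 4], F)  # 35 35 {'zf': True, 'sf': False}
```

The example computes (3 + 4) * 5, stores it in the stack slot at -4(ebp), loads it back, and compares it with 35, which sets zf.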

  10. Program Logic • Based on SCAP • Specification (p, g) • p : State -> Prop • g : State -> State -> Prop • Inference Rules • Well-formed program • Well-formed basic block • Well-formed instruction
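The slide treats p and g as propositions over machine states. As an executable approximation, this sketch models them as Python predicates over a register file (memory is left out), so a specification can be tested on concrete runs but not proved. check_run, p_aligned, g_frame, and the two example bodies are hypothetical; the example guarantee has the same shape as the guarantee of f on slide 13.

```python
from typing import Callable, Dict

State = Dict[str, int]                       # register file only, for illustration
Pre = Callable[[State], bool]                # p : State -> Prop
Guarantee = Callable[[State, State], bool]   # g : State -> State -> Prop


def p_aligned(s: State) -> bool:
    """Example precondition: the stack pointer is 4-byte aligned."""
    return s["esp"] % 4 == 0


def g_frame(s: State, s2: State) -> bool:
    """Example guarantee: ebp is preserved and esp ends 4 bytes higher."""
    return s2["ebp"] == s["ebp"] and s2["esp"] == s["esp"] + 4


def check_run(p: Pre, g: Guarantee, body: Callable[[State], State], s: State) -> bool:
    """Test (not prove) a spec on one run: if p holds in s, g must relate s to the result."""
    if not p(s):
        return True                  # the spec constrains nothing when p fails
    return g(s, body(dict(s)))       # body gets a copy, so s stays the "before" state


def ret_only(s: State) -> State:
    """A body consisting of a bare ret: it pops the 4-byte return address."""
    return {**s, "esp": s["esp"] + 4}


def forgets_ret(s: State) -> State:
    """A buggy body that leaves esp unchanged."""
    return dict(s)


s0 = {"esp": 0x1000, "ebp": 0x2000}
print(check_run(p_aligned, g_frame, ret_only, s0),
      check_run(p_aligned, g_frame, forgets_ret, s0))  # True False
```

Because g relates the state at a program point to the state at the function's return, it is a two-state relation rather than a postcondition on the final state alone.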

  11. Main Objectives • Code Generation • Minimize the proof size • E.g., temporary results should be kept in registers, not on the stack • Assertion • Building (p, g) for each basic block • Generating (p, g) for each program point • Proof • Generating proofs for functions/basic blocks • (reusing the proofs of the verification conditions (VCs) at the source level)

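  12. Assertion Relationship
  • Intermediate Language: f : {p} // {q}; Basic block1; L1 : {p1}; Basic block2
  • x86 Assembly Language: f : {(p', g)}; Basic block1; L1 : {(p'1, g1)}; Basic block2
  • where
    $p' = \mathit{trans}(p) \wedge \mathit{param}_p \wedge \textit{stack-reg}_p$
    $g = \mathit{trans}(q) \wedge \textit{callee-saved-reg}_g \wedge \mathit{stack}_g$
    $p'_1 = \mathit{trans}(p_1) \wedge \mathit{param}_{p_1} \wedge \textit{stack-reg}_{p_1}$
    $g_1 = {?}$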

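  13. Figure Out G
  • Function f with guarantee $f : \{R'(\mathit{ebp}) = R(\mathit{ebp}) \wedge R'(\mathit{esp}) = R(\mathit{esp}) + 4\}$, where R is the register file at entry and R' the one after ret:
        push ebp          ; R -> R0
        mov esp, ebp
        sub $12, esp
    L1 : {g1}
        Basic block2
        leave
        ret               ; -> R'
  • Effect of push ebp: $R_0(\mathit{ebp}) = R(\mathit{ebp}) \wedge R_0(\mathit{esp}) = R(\mathit{esp}) - 4$
  • Conjoined with the guarantee of f: $R'(\mathit{ebp}) = R(\mathit{ebp}) \wedge R_0(\mathit{ebp}) = R(\mathit{ebp}) \wedge R'(\mathit{esp}) = R(\mathit{esp}) + 4 \wedge R_0(\mathit{esp}) = R(\mathit{esp}) - 4$
  • Hence $g_0$: $R'(\mathit{ebp}) = R_0(\mathit{ebp}) \wedge R'(\mathit{esp}) = R_0(\mathit{esp}) + 8$
  • The method:
    • Get the state relation from the rule of the operational semantics;
    • Use the g of the previous program point;
    • Do substitution and arithmetic.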

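  14. Figure Out G (cont.)
  • Same code and guarantee of f as on slide 13; R1 is the register file after mov esp, ebp, and M1 the memory at that point.
  • From $g_0$: $R'(\mathit{ebp}) = R_0(\mathit{ebp}) \wedge R'(\mathit{esp}) = R_0(\mathit{esp}) + 8$
  • Effect of mov esp, ebp: $R_1(\mathit{ebp}) = R_0(\mathit{esp}) \wedge R_1(\mathit{esp}) = R_0(\mathit{esp})$
  • Conjunction: $R'(\mathit{ebp}) = R_0(\mathit{ebp}) \wedge R_1(\mathit{ebp}) = R_0(\mathit{esp}) \wedge R'(\mathit{esp}) = R_0(\mathit{esp}) + 8 \wedge R_1(\mathit{esp}) = R_0(\mathit{esp})$
  • Since push ebp saved the old ebp at address $R_0(\mathit{esp}) = R_1(\mathit{ebp})$, we have $M_1(R_1(\mathit{ebp})) = R_0(\mathit{ebp})$, which gives $g_1$: $R'(\mathit{ebp}) = M_1(R_1(\mathit{ebp})) \wedge R'(\mathit{esp}) = R_1(\mathit{esp}) + 8$
  • The method: as on slide 13.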

  15. Figure Out G (cont.)
  • Same code and guarantee of f as on slide 13; R2 and M2 are the register file and memory after subtracting 12 from esp.
  • From $g_1$: $R'(\mathit{ebp}) = M_1(R_1(\mathit{ebp})) \wedge R'(\mathit{esp}) = R_1(\mathit{esp}) + 8$
  • Effect of the sub instruction (memory is unchanged, so $M_2 = M_1$): $R_2(\mathit{ebp}) = R_1(\mathit{ebp}) \wedge R_2(\mathit{esp}) = R_1(\mathit{esp}) - 12$
  • Conjunction: $R'(\mathit{ebp}) = M_1(R_1(\mathit{ebp})) \wedge R_2(\mathit{ebp}) = R_1(\mathit{ebp}) \wedge R'(\mathit{esp}) = R_1(\mathit{esp}) + 8 \wedge R_2(\mathit{esp}) = R_1(\mathit{esp}) - 12$
  • Substituting $R_1(\mathit{esp}) = R_2(\mathit{esp}) + 12$ gives $g_2$: $R'(\mathit{ebp}) = M_2(R_2(\mathit{ebp})) \wedge R'(\mathit{esp}) = R_2(\mathit{esp}) + 20$
  • The method: as on slide 13.
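The derivation of g, g0, g1, and g2 can be sanity-checked by running f concretely from many random starting states. This sketch assumes Basic block2 is empty (or leaves esp, ebp, and the saved-ebp slot alone) and uses a dict-based toy state; run_f and the g functions are hypothetical names. g2 is checked in the form R'(esp) = R2(esp) + 20, which follows from g1 by substituting R1(esp) = R2(esp) + 12. Random testing like this can only expose mistakes; it does not replace the SCAP proof.

```python
import random

WORD = 4


def run_f(esp, ebp, ret_addr=0x40):
    """Run f's prolog, an (assumed empty) Basic block2, then leave and ret.

    Returns (registers, memory) snapshots at entry (R), after push ebp (R0),
    after mov esp, ebp (R1) and after the sub (R2), plus the final registers R'.
    """
    R, M = {"esp": esp, "ebp": ebp}, {esp: ret_addr}   # [esp] holds the return address
    snaps = [(dict(R), dict(M))]                       # R, M
    R["esp"] -= WORD                                   # push ebp
    M[R["esp"]] = R["ebp"]
    snaps.append((dict(R), dict(M)))                   # R0, M0
    R["ebp"] = R["esp"]                                # mov esp, ebp
    snaps.append((dict(R), dict(M)))                   # R1, M1
    R["esp"] -= 12                                     # sub 12 from esp
    snaps.append((dict(R), dict(M)))                   # R2, M2
    # Basic block2 is assumed to leave esp, ebp and the saved-ebp slot unchanged.
    R["esp"] = R["ebp"]                                # leave = mov ebp, esp ...
    R["ebp"] = M[R["esp"]]                             # ... followed by pop ebp
    R["esp"] += WORD
    R["esp"] += WORD                                   # ret pops the return address
    return snaps, dict(R)


def g(R, Rp):
    return Rp["ebp"] == R["ebp"] and Rp["esp"] == R["esp"] + 4


def g0(R0, Rp):
    return Rp["ebp"] == R0["ebp"] and Rp["esp"] == R0["esp"] + 8


def g1(R1, M1, Rp):
    return Rp["ebp"] == M1[R1["ebp"]] and Rp["esp"] == R1["esp"] + 8


def g2(R2, M2, Rp):
    return Rp["ebp"] == M2[R2["ebp"]] and Rp["esp"] == R2["esp"] + 20


random.seed(0)
for _trial in range(1000):
    esp = random.randrange(0x1000, 0x10000, WORD)
    ebp = random.randrange(0x10000, 0x20000, WORD)
    snaps, Rp = run_f(esp, ebp)
    (R, _), (R0, _), (R1, M1), (R2, M2) = snaps
    assert g(R, Rp) and g0(R0, Rp) and g1(R1, M1, Rp) and g2(R2, M2, Rp)
print("g, g0, g1 and g2 held on every sampled run")
```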

  16. Low-level Intermediate Language

  17. Potential Benefits • Hides some machine-specific details; • Allows some (optional) optimizations; • Keeps the implementation simple and reusable • (*Note that this level is just a helper for generating code and proofs.*) • When targeting a different assembly logic, only the translation from this level needs to be added

  18. The Language • Loc. l ::= r | s • Int. o, b ::= n (integer) • Slot. s ::= local(o) | incoming(o) | outgoing(o) • Reg. r ::= r1 | r2 | r3 | … (infinitely many pseudo-registers) • Instr. i ::= bop(bop, l1, l2, l) | uop(uop, l1, l) | load(r, o, l) | store(l, r, o) | getstack(s, r) | setstack(r, s) | call(id, l) | return r | malloc(r) | free(r) | goto b | label(b) | cond(l1, cmp, l2, btrue) • BinOp. bop ::= add | sub | mul | … • UnOp. uop ::= minus | … • Comp. cmp ::= gt | ge | eq | ne | lt | le
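A minimal sketch of part of this IR as Python data, with an interpreter for the arithmetic and control-flow instructions only (load/store, getstack/setstack, call, and malloc/free are omitted). The semantics are assumptions read off the grammar: bop(bop, l1, l2, l) stores l1 bop l2 into l, and cond jumps to btrue when the comparison holds and falls through otherwise. All class and function names are hypothetical.

```python
import operator
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass(frozen=True)
class Reg:                 # r ::= r1 | r2 | r3 | ...  (unbounded pseudo-registers)
    n: int


@dataclass(frozen=True)
class Slot:                # s ::= local(o) | incoming(o) | outgoing(o)
    kind: str              # "local", "incoming" or "outgoing"
    ofs: int


Loc = Union[Reg, Slot]     # l ::= r | s


@dataclass(frozen=True)
class Bop:                 # bop(bop, l1, l2, l): l := l1 bop l2
    op: str
    l1: Loc
    l2: Loc
    dst: Loc


@dataclass(frozen=True)
class Uop:                 # uop(uop, l1, l): l := uop l1
    op: str
    l1: Loc
    dst: Loc


@dataclass(frozen=True)
class Cond:                # cond(l1, cmp, l2, btrue): go to btrue if "l1 cmp l2" holds
    l1: Loc
    cmp: str
    l2: Loc
    btrue: int


@dataclass(frozen=True)
class Goto:                # goto b
    b: int


@dataclass(frozen=True)
class Label:               # label(b)
    b: int


@dataclass(frozen=True)
class Return:              # return r
    r: Reg


BOPS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}
UOPS = {"minus": operator.neg}
CMPS = {"gt": operator.gt, "ge": operator.ge, "eq": operator.eq,
        "ne": operator.ne, "lt": operator.lt, "le": operator.le}


def run(code: List[object], env: Dict[Loc, int]) -> int:
    """Interpret the arithmetic/control-flow subset; env maps locations to values."""
    labels = {ins.b: pc for pc, ins in enumerate(code) if isinstance(ins, Label)}
    pc = 0
    while True:
        ins = code[pc]
        pc += 1                                    # default: fall through
        if isinstance(ins, Bop):
            env[ins.dst] = BOPS[ins.op](env[ins.l1], env[ins.l2])
        elif isinstance(ins, Uop):
            env[ins.dst] = UOPS[ins.op](env[ins.l1])
        elif isinstance(ins, Cond) and CMPS[ins.cmp](env[ins.l1], env[ins.l2]):
            pc = labels[ins.btrue]
        elif isinstance(ins, Goto):
            pc = labels[ins.b]
        elif isinstance(ins, Return):
            return env[ins.r]
        # A Label, or a Cond whose test fails, simply falls through.


# Sum n + (n-1) + ... + 1. The grammar has no load-constant instruction,
# so the constants 1 and 0 are supplied in the initial environment.
n, acc, one, zero = Reg(1), Reg(2), Reg(3), Reg(4)
prog = [Label(1), Cond(n, "le", zero, 2), Bop("add", acc, n, acc),
        Bop("sub", n, one, n), Goto(1), Label(2), Return(acc)]
print(run(prog, {n: 4, acc: 0, one: 1, zero: 0}))  # 10
```

Because locations are either pseudo-registers or abstract stack slots, nothing here mentions esp, ebp, or byte offsets; those details appear only when translating down to the x86 level.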

  19. Code Generation (optional) • Do some optimizations that do not affect the proof, such as: • Branch tunneling • Dead code elimination • Future optimizations • Other low-level optimizations may be done here
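Branch tunneling redirects a jump whose target label is immediately followed by another goto straight to the final destination. The sketch below shows it together with unreachable-code removal, which is only one simple form of dead code elimination (removing computations whose results are never used needs a liveness analysis and is not shown). Instructions are plain tuples here, an assumption made to keep the example self-contained; final_targets, tunnel, and remove_unreachable are hypothetical names.

```python
from typing import Dict, List, Tuple

# Instructions as tuples (an assumption for this sketch):
# ("label", b) | ("goto", b) | ("cond", l1, cmp, l2, b) | ("return", r) | anything else
Instr = Tuple


def final_targets(code: List[Instr]) -> Dict[int, int]:
    """For each label followed directly by a goto, find where a jump to it finally lands."""
    direct = {}
    for pc, ins in enumerate(code):
        if ins[0] == "label":
            nxt = pc + 1
            while nxt < len(code) and code[nxt][0] == "label":
                nxt += 1                      # skip other labels at the same point
            if nxt < len(code) and code[nxt][0] == "goto":
                direct[ins[1]] = code[nxt][1]
    final = {}
    for b, t in direct.items():
        seen = {b}
        while t in direct and t not in seen:  # follow goto chains; stop on a cycle
            seen.add(t)
            t = direct[t]
        final[b] = t
    return final


def tunnel(code: List[Instr]) -> List[Instr]:
    """Branch tunneling: retarget goto/cond to the end of any goto chain."""
    final = final_targets(code)
    out = []
    for ins in code:
        if ins[0] == "goto":
            ins = ("goto", final.get(ins[1], ins[1]))
        elif ins[0] == "cond":
            ins = ins[:-1] + (final.get(ins[-1], ins[-1]),)
        out.append(ins)
    return out


def remove_unreachable(code: List[Instr]) -> List[Instr]:
    """Drop instructions between a goto/return and the next label (every label is kept)."""
    out, reachable = [], True
    for ins in code:
        if ins[0] == "label":
            reachable = True
        if reachable:
            out.append(ins)
        if ins[0] in ("goto", "return"):
            reachable = False
    return out


prog = [("cond", "r1", "eq", "r2", 1),
        ("goto", 2),
        ("bop", "add", "r1", "r2", "r3"),     # unreachable
        ("label", 1), ("goto", 3),
        ("label", 2), ("return", "r1"),
        ("label", 3), ("return", "r2")]
for ins in remove_unreachable(tunnel(prog)):
    print(ins)
```

In the result, the cond jumps straight to label 3 and the add after goto 2 is gone; label 1 and its goto remain, since this sketch does not delete labels that are no longer referenced.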
