480 likes | 603 Views
Mobility, Security, and Proof-Carrying Code Peter Lee Carnegie Mellon University. Lecture 3 July 12, 2001 VC Generation and Proof Representation. Lipari School on Foundations of Wide Area Network Programming. Whew!. Recap. When the host system receives certified code, it
E N D
Mobility, Security, andProof-Carrying Code Peter LeeCarnegie Mellon University Lecture 3 July 12, 2001 VC Generation and Proof Representation Lipari School on Foundations of Wide Area Network Programming
Recap • When the host system receives certified code, it • inspects the code, generating verification conditions (VCs), and • finds a proof for each VC (if it can). • [Abstractly, one thinks of generating a single predicate, which is the conjunction of all the VCs.] • Generation of VCs is done relative to a safety policy.
High-Level Architecture Code Verification condition generator Checker Explanation Agent Safety policy Host
What Is a “Safety Policy”? • Yesterday, we gave the intuition of a reference interpreter that aborts the program just prior to any unsafe operation. • In this case, the reference interpreter essentially defines the safety policy.
Safety Policies • More formally, we begin by defining the small-step operational semantics of a machine, call it the s86. • , , pcinstr’, pc’ • We define the machine so that only safe executions are defined. program program counter register state
Safety Policies, cont’d • For convenience we choose the s86 to be a restriction of the x86. • Hence all s86 programs will execute faithfully on a real x86. • The goal then is to prove that any given program always makes progress (or returns) in the s86. • With such a proof, the x86 is then just as good as an s86.
Verification Conditions • The point of the verification conditions, then, is to provide such progress theorems for each instruction in the program. • In other words, a VC’s validity says that the corresponding instruction has a defined execution in the s86 operational semantics.
Symbolic Evaluator • We can define the verification condition generator (VCGen) via a symbolic evaluator • SE,,0,Post(i, , L) • The result of symbolic evaluation is a conjunction of VCs, so the overall progress theorem is then • Pre SE,,0,Post(i, , L) annotations LF signature entry point postcondition
Soundness • For particular operational semantics (a safe x86 and a safe Alpha), we have presented theorems that say, essentially: • Thm: If Pre SE,,0,Post(i, , L), then execution of , given Pre and 0, and starting from entry point i, will always make progress (or return).
Getting from Concept to Implementation • In an actual implementation, it is also handy to have a bit more than just a VC generator. • Precise syntax for VCs. • Pre/post-conditions for each entry point expected by the host in any downloaded code. • Precisely specified logical system for proving the VCs.
Safety Policy Implementations • Safety policies are thus given in three parts: • A verification-condition generator (VCGen). • A specification of the pre & post conditions for all required procedures. • A specification of the inference rules for constructing valid proofs. • LF is used for the rule and pre/post specifications, C for the VCGen.
C?!@$#@! • The use of C to define and implement the VCGen is, at best, expedient and at worst dubious. • However, since any code-inspection system must parse object files (not trivial!) and understand the instruction set, this seems to have practical benefits. • Clearly, a more formal approach would be desirable.
ExampleJava Type-Safety Specification • Our largest example of a safety-policy specification is for the “SpecialJ” Java native-code compiler. • It contains about 140 inference rules. • Roughly speaking, these rules can be separated into 5 classes.
Safety PolicyRule Excerpts 1. Standard syntax and rules for first-order logic. Syntax of predicates. /\ : pred -> pred -> pred. \/ : pred -> pred -> pred. => : pred -> pred -> pred. all : (exp -> pred) -> pred. pf : pred -> type. truei : pf true. andi : {P:pred} {Q:pred} pf P -> pf Q -> pf (/\ P Q). andel : {P:pred} {Q:pred} pf (/\ P Q) -> pf P. ander : {P:pred} {Q:pred} pf (/\ P Q) -> pf Q. … Type of valid proofs, indexed by predicate. Inference rules.
Safety PolicyRule Excerpts 2. Syntax and rules for arithmetic and equality. “csuble” means in the x86 machine. = : exp -> exp -> pred. <> : exp -> exp -> pred. eq_le : {E:exp} {E':exp} pf (csubeq E E') -> pf (csuble E E'). moddist+: {E:exp} {E':exp} {D:exp} pf (= (mod (+ E E') D) (mod (+ (mod E D) E') D)). =sym : {E:exp} {E':exp} pf (= E E') -> pf (= E' E). <>sym : {E:exp} {E':exp} pf (<> E E') -> pf (<> E' E). =tr : {E:exp} {E':exp} {E'':exp} pf (= E E') -> pf (= E' E'') -> pf (= E E'').
Safety PolicyRule Excerpts 3. Syntax and rules for the Java type system. jint : exp. jfloat : exp. jarray : exp -> exp. jinstof : exp -> exp. of : exp -> exp -> pred. faddf : {E:exp} {E':exp} pf (of E jfloat) -> pf (of E' jfloat) -> pf (of (fadd E E') jfloat). ext : {E:exp} {C:exp} {D:exp} pf (jextends C D) -> pf (of E (jinstof C)) -> pf (of E (jinstof D)).
Safety PolicySample Rules 4. Rules describing the layout of data structures. aidxi : {I:exp} {LEN:exp} {SIZE:exp} pf (below I LEN) -> pf (arridx (add (imul I SIZE) 8) SIZE LEN). wrArray4: {M:exp} {A:exp} {T:exp} {OFF:exp} {E:exp} pf (of A (jarray T)) -> pf (of M mem) -> pf (nonnull A) -> pf (size T 4) -> pf (arridx OFF 4 (sel4 M (add A 4))) -> pf (of E T) -> pf (safewr4 (add A OFF) E). This “sel4” means the result of reading 4 bytes from heap M at address A+4.
Safety PolicySample Rules 5. Quick hacks. nlt0_0 : pf (csubnlt 0 0). nlt1_0 : pf (csubnlt 1 0). nlt2_0 : pf (csubnlt 2 0). nlt3_0 : pf (csubnlt 3 0). nlt4_0 : pf (csubnlt 4 0). Sometimes “unclean” things are put into the specification...
Homework Exercise • 4. Some of the proof rules are specific to the type system of the source language (Java), even though we are actually verifying x86 machine code. • Why has this been done?
A Note about Memory • We define a type for valid heap memory states: • mem : exp • and operators for reading and writing heap memory: • (sel M A) • (upd M A E)
High-Level Architecture Code Verification condition generator Checker Explanation Agent Safety policy Host
Example: Source Code public class Bcopy { public static void bcopy(int[] src, int[] dst) { int l = src.length; int i = 0; for(i=0; i<l; i++) { dst[i] = src[i]; } } }
Example: Target Code L7: ANN_LOOP(INV = { (csubneq ebx 0), (csubneq eax 0), (csubb edx ecx), (of rm mem)}, MODREG = (EDI,EDX,EFLAGS,FFLAGS,RM)) cmpl %esi, %edx jae L13 movl 8(%ebx, %edx, 4), %edi movl %edi, 8(%eax, %edx, 4) incl %edx cmpl %ecx, %edx jl L7 ret L13: call __Jv_ThrowBadArrayIndex ANN_UNREACHABLE nop L6: call __Jv_ThrowNullPointer ANN_UNREACHABLE nop ANN_LOCALS(_bcopy__6arrays5BcopyAIAI, 3) .text .align 4 .globl _bcopy__6arrays5BcopyAIAI _bcopy__6arrays5BcopyAIAI: cmpl $0, 4(%esp) je L6 movl 4(%esp), %ebx movl 4(%ebx), %ecx testl %ecx, %ecx jg L22 ret L22: xorl %edx, %edx cmpl $0, 8(%esp) je L6 movl 8(%esp), %eax movl 4(%eax), %esi
Cut Points • Each loop entry must be annotated as a cut point. • VCGen requires this so that checking can be performed in a single scan of the code. • As a convenience, the modified registers are also declared in the cut annotations.
Example: Target Code L7: ANN_LOOP(INV = { (csubneq ebx 0), (csubneq eax 0), (csubb edx ecx), (of rm mem)}, MODREG = (EDI,EDX,EFLAGS,FFLAGS,RM)) cmpl %esi, %edx jae L13 movl 8(%ebx, %edx, 4), %edi movl %edi, 8(%eax, %edx, 4) incl %edx cmpl %ecx, %edx jl L7 ret L13: call __Jv_ThrowBadArrayIndex ANN_UNREACHABLE nop L6: call __Jv_ThrowNullPointer ANN_UNREACHABLE nop ANN_LOCALS(_bcopy__6arrays5BcopyAIAI, 3) .text .align 4 .globl _bcopy__6arrays5BcopyAIAI _bcopy__6arrays5BcopyAIAI: cmpl $0, 4(%esp) je L6 movl 4(%esp), %ebx movl 4(%ebx), %ecx testl %ecx, %ecx jg L22 ret L22: xorl %edx, %edx cmpl $0, 8(%esp) je L6 movl 8(%esp), %eax movl 4(%eax), %esi VCGen requires annotations in order to simplify the process.
Example: Source Code public class Bcopy { public static void bcopy(int[] src, int[] dst) { int l = src.length; int i = 0; for(i=0; i<l; i++) { dst[i] = src[i]; } } }
The VCGen Process (1) _bcopy__6arrays5BcopyAIAI: cmpl $0, src je L6 movl src, %ebx movl 4(%ebx), %ecx testl %ecx, %ecx jg L22 ret L22: xorl %edx, %edx cmpl $0, dst je L6 movl dst, %eax movl 4(%eax), %esi L7: ANN_LOOP(INV = … A0 = (type src_1 (jarray jint)) A1 = (type dst_1 (jarray jint)) A2 = (type rm_1 mem) A3 = (csubneq src_1 0) ebx := src_1 ecx := (sel4 rm_1 (add src_1 4)) A4 = (csubgt (sel4 rm_1 (add src_1 4)) 0) edx := 0 A5 = (csubneq dst_1 0) eax := dst_1 esi := (sel4 rm_1 (add dst_1 4))
The VCGen Process (2) L7: ANN_LOOP(INV = { (csubneq ebx 0), (csubneq eax 0), (csubb edx ecx), (of rm mem)}, MODREG = (EDI, EDX, EFLAGS,FFLAGS,RM)) cmpl %esi, %edx jae L13 movl 8(%ebx,%edx,4), %edi movl %edi, 8(%eax,%edx,4) … A3 A5 A6 = (csubb 0 (sel4 rm_1 (add src_1 4))) edi := edi_1 edx := edx_1 rm := rm_2 A7 = (csubb edx_1 (sel4 rm_2 (add dst_1 4)) !!Verify!! (saferd4 (add src_1 (add (imul edx_1 4) 8)))
The Checker (1) The checker is asked to verify that (saferd4 (add src_1 (add (imul edx_1 4) 8))) under assumptions A0 = (type src_1 (jarray jint)) A1 = (type dst_1 (jarray jint)) A2 = (type rm_1 mem) A3 = (csubneq src_1 0) A4 = (csubgt (sel4 rm_1 (add src_1 4)) 0) A5 = (csubneq dst_1 0) A6 = (csubb 0 (sel4 rm_1 (add src_1 4))) A7 = (csubb edx_1 (sel4 rm_2 (add dst_1 4)) The checker looks in the PCC for a proof of this VC.
The Checker (2) In addition to the assumptions, the proof may use axioms and proof rules defined by the host, such as szint : pf (size jint 4) rdArray4: {M:exp} {A:exp} {T:exp} {OFF:exp} pf (type A (jarray T)) -> pf (type M mem) -> pf (nonnull A) -> pf (size T 4) -> pf (arridx OFF 4 (sel4 M (add A 4))) -> pf (saferd4 (add A OFF)).
Checker (3) A proof for (saferd4 (add src_1 (add (imul edx_1 4) 8))) in the Java specification looks like this (excerpt): (rdArray4 A0 A2 (sub0chk A3) szint (aidxi 4 (below1 A7))) This proof can be easily validated via LF type checking.
VCGenSummary • VCGen is a symbolic evaluator for the object language. • It essentially implements a reference interpreter, except: • it uses symbolic values in order to model all possible executions, and • instead of performing run-time checks, it asks a Checker to verify the safety of “dangerous” instructions.
Homework Exercises • 5. When a loop invariant is encountered for the second time, what actions must the VCGen perform? • 6. In principle, how big can a VC get, relative to the size of the program? • 7. What kind of program might make a VC get very large?
Another Example[by George Necula] void fir (int *data, int dlen, int *filter, int flen) { int i, j; for (i=0; i<=dlen-flen; i++) { int s = 0; for (j=0; j<flen; j++) s += filter[j] * data[i+j]; data[i] = s; } } Skip this example
Compiled Example /* rd=data, rdl=dlen, rf=filter, rfl=flen */ ri = 0 sub t1 = rdl, rfl L0: CUT(ri,rj,rs,t2,t3,t4,rm) le t2 = ri, t1 jeq t2, L3 rs = 0 rj = 0 L1: CUT(rj,rs,t2,t3,t4) lt t2 = rj, rfl jeq t2, L2 ult t2 = rj, rfl jeq t2, Labort ld t3 = [rf + 4*rj] add t2 = ri, rj ult t4 = t2, rdl jeq t4, Labort ld t2 = [rd + 4*t2] mul t2 = t3, t2 add rs = rs, t2 add rj = rj, 1 jmp L1 L2: ult t2 = ri, rdl jeq t2, Labort st [rd + 4*ri] = rs add ri = ri, 1 jmp L0 L3: ret Labort: call abort
The Safety Policy • The safety policy defines verification conditions of the form: • true, E = E • saferd(M, E), safewr(M, E, E) • array(EA, ES, EL), vector(EA, ES, EL) • Prefir = array(rd,4,rdl), vector(rf,4,rfl) • Postfir = true
VCGen Example Set rd=cd; rdl=cdl; rf=cf; rfl=cfl; rm=cm Assume precondition: array(cd,4,cdl) vector(cf,4,cfl) Set ri = 0 ri = 0 sub t1 = rdl, rfl L0: CUT(ri,rj,rs,t2,t3,t4,rm) le t2 = ri, t1 jeq t2, L3 … L3: ret Set t1 = sub(cdl,cfl) Set ri=ci; rj=cj; rs=cs; t2=c2; t3=c3; t4=c4; rm=cm’ Set t2 = le(ci, sub(cdl,cfl)) Assume not(le(ci, sub(cdl,cfl))) Checkpostcondition; Checkrd,rdl,rf,rfl have initial values
VCGen Example Set ri = 0 ri = 0 sub t1 = rdl, rfl L0: CUT(ri,rj,rs,t2,t3,t4,rm) le t2 = ri, t1 jeq t2, L3 rs = 0 rj = 0 L1: CUT(rj,rs,t2,t3,t4) lt t2 = rj, rfl jeq t2, L2 … L2: ult t2 = ri, rdl jeq t2, Labort st [rd + 4*ri] = rs Set t1 = sub(cdl,cfl) Set ri=ci; rj=cj; rs=cs; t2=c2 t3=c3; t4=c4; rm=cm’ Set t2 = le(ci, sub(cdl,cfl)) Assume le(ci, sub(cdl,cfl)) Set rs = 0 Set rj = 0 Set rj=cj’; rs=cs’; t2=c2’; t3=c3’; t4=c4’ Set t2 = lt(cj’, cfl) Assume not(lt(cj’, cfl)) Set t2 = ult(ci, cdl) Assume ult(ci, cdl) Check safewr(cm’, add(cd,mul(4,ci)),cs’)
More on the Safety Policy • The safety policy is defined as an LF signature. rdarray : saferd(M,add(A,mul(S,I))) <- array(A,S,L), ult(I,L). rdvector : saferd(M,add(A,mul(S,I))) <- vector(A,S,L), ult(I,L). wrarray : safewr(M,add(A,mul(S,I)),V) <- array(A,S,L), ult(I,L).
The Checker • When the Checker is invoked on • safewr(cm’, add(cd,mul(4,ci)), cs’) • There are assumptions: • assume0 : ult(ci,cdl). • assume1 : not(lt(cj’,cfl)). • assume2 : le(ci, sub(cdl,cfl)). • assume3 : vector(cf,4,cfl). • assume4 : array(cd,4,cdl).
The Checker, cont’d • The VC • safewr(cm’, add(cd,mul(4,ci)), cs’) • can be verified by using the rule • wrarray : safewr(M,add(A,mul(S,I)),V) <- • array(A,S,L), ult(I,L). • and assumptions • assume0 : ult(ci,cdl). • assume4 : array(cd,4,cdl).
Proof Representation • A simple (but somewhat naïve) representation of the proof is simply the sequence of proof rules: • wrarray, assume4, assume0 • We shall see that better representations are possible. • LF typechecking is sufficient for proofchecking.
Optimized Code • The previous example was somewhat simplified. • More realistic code is optimized, usually based on inferences about integer values. • Such optimizations require that arithmetic invariants be placed in the cut points.
Optimized Example /* rd=data, rdl=dlen, rf=filter, rfl=flen */ ri = 0 sub t1 = rdl, rfl L0: CUT(ri>0,{ri,rj,…}) le t2 = ri, t1 jeq t2, L3 rs = 0 rj = 0 L1: CUT(rj>0,{rj,rs,…}) lt t2 = rj, rfl jeq t2, L2 ld t3 = [rf + 4*rj] add t2 = ri, rj ld t2 = [rd + 4*t2] mul t2 = t3, t2 add rs = rs, t2 add rj = rj, 1 jmp L1 L2: st [rd + 4*ri] = rs add ri = ri, 1 jmp L0 L3: ret
VCGen Example Set ri = 0 ri = 0 sub t1 = rdl, rfl L0: CUT(ri>0, {ri,rj,rs,t2,t3,t4,rm} le t2 = ri, t1 jeq t2, L3 rs = 0 rj = 0 … Set t1 = sub(cdl,cfl) Set ri=ci; rj=cj; rs=cs; t2=c2 t3=c3; t4=c4; rm=cm’ Assume >(ci,0) Set t2 = le(ci, sub(cdl,cfl)) Assume le(ci, sub(cdl,cfl))