CS711 Foundational PCC
Greg Morrisett
Cornell University

Claimed Contributions
• Types and typing rules not baked in
  • PCC has to prove soundness & consistency of these rules at meta-level
• Allocation & initialization
  • PCC/TAL treated these issues in a funky way
• Much wider variety of type constructors
  • records, tagged unions, 1st class functions and labels, ADTs, unions, intersections, covariant recursive types
• Remove VCGen (i.e., decompilation of machine code into logic)
  • no need to prove correctness of VCGen
Lang. Based Security

The Logic
• Higher-order logic
• Natural numbers, arithmetic, induction
• Semantics of machine instructions
• Predicates describing readability/writeability/jumpability of machine addresses

Semantics: Relational
upd(f,d,x,f') =def= All z. (d = z & f'(z) = x) or (d != z & f'(z) = f(z))
add(d,s1,s2)(r,m,r',m') =def= upd(r,d,r(s1)+r(s2),r') & m = m'
load(d,s,c)(r,m,r',m') =def= readable(r(s)+c) & upd(r,d,m(r(s)+c),r') & m = m'
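
The relational definitions above can be read as executable checks. The following is a minimal Python sketch, assuming register files and memories are finite dicts so that the quantifier "All z" becomes a loop over a small domain. The names REGS and the readable argument, and the dict encoding itself, are illustrative assumptions and not part of the slides.

```python
# Assumption: registers are numbered 0..3 and stored in a dict.
REGS = range(4)

def upd(f, d, x, f2, domain):
    """upd(f,d,x,f'): f' maps d to x and agrees with f everywhere else."""
    return all((d == z and f2[z] == x) or (d != z and f2[z] == f[z])
               for z in domain)

def add(d, s1, s2):
    """add(d,s1,s2) as a relation on (r, m, r', m')."""
    def rel(r, m, r2, m2):
        return upd(r, d, r[s1] + r[s2], r2, REGS) and m == m2
    return rel

def load(d, s, c, readable):
    """load(d,s,c): only related to a next state if the address is readable."""
    def rel(r, m, r2, m2):
        addr = r[s] + c
        # 'and' short-circuits, so m[addr] is only read when readable.
        return readable(addr) and upd(r, d, m[addr], r2, REGS) and m == m2
    return rel

# Example states: r[0] + r[1] = 12 should land in register 2.
r = {0: 10, 1: 2, 2: 0, 3: 0}
m = {10: 7, 11: 8}
holds = add(2, 0, 1)(r, m, {**r, 2: 12}, m)
```

Note that an instruction is a predicate on before/after states rather than a function: an unsafe load simply relates the current state to no successor.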

Semantics Contd.
store(s1,s2,c)(r,m,r',m') =def= writable(r(s2)+c) & upd(m,r(s2)+c,r(s1),m') & r = r'
jump(d,s,c)(r,m,r',m') =def= Exists r''. upd(r,pc,r(s)+c,r'') & upd(r'',d,r(pc),r') & m = m'

Decoding is Explicit
• decode(e,m,i)
  • address e of memory m holds instruction i
  • Exists d,s1,s2. format(m(e),0,d,s1,s2) & i = add(d,s1,s2) or
    Exists d,s1,s2. format(m(e),1,d,s1,s2) & i = addi(d,s1,s2) or ...
• Here, format(...) is a predicate for showing that a given instruction assembles into a particular number.

The Step and Multi-Step Relations
• step(r,m,r',m') =def= Exists i, r''. decode(r(pc),m,i) & upd(r,pc,r(pc)+1,r'') & i(r'',m,r',m')
  • only holds when code is "safe"
• The multi-step relation is a co-inductive extension of the single-step relation
  • intuition: start with the set of all (finite & infinite) sequences of states and weed out those that can't arise from the step relation.
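
To make the "step only holds when code is safe" reading concrete, here is a small Python sketch of a next-state function. It is a simplification under stated assumptions: the decimal instruction encoding in decode, the ten numbered registers, and the run helper are all invented for illustration, and a bounded run only approximates the co-inductive multi-step relation up to n steps.

```python
PC = "pc"

def decode(word):
    """Hypothetical encoding: word = opcode*1000 + d*100 + s1*10 + s2."""
    op, rest = divmod(word, 1000)
    d, rest = divmod(rest, 100)
    s1, s2 = divmod(rest, 10)
    return op, d, s1, s2

def step(r, m, readable):
    """Return the next (r', m'), or None when no step exists (stuck)."""
    if r[PC] not in m:
        return None
    op, d, s1, s2 = decode(m[r[PC]])
    r2 = {**r, PC: r[PC] + 1}          # upd(r, pc, r(pc)+1, r'')
    if op == 0:                        # add d, s1, s2
        return {**r2, d: r2[s1] + r2[s2]}, m
    if op == 1:                        # load d, s1, offset s2
        addr = r2[s1] + s2
        if not readable(addr):
            return None                # unsafe load: no successor state
        return {**r2, d: m[addr]}, m
    return None                        # word does not decode

def run(r, m, readable, n):
    """Take n steps; None means the machine got stuck before n steps."""
    for _ in range(n):
        nxt = step(r, m, readable)
        if nxt is None:
            return None
        r, m = nxt
    return r, m

# Program: r2 := r0 + r1 ; r3 := m[r2 + 0]. Data word 99 lives at address 42.
m = {0: 201, 1: 1320, 42: 99}
r = {PC: 0, **{i: 0 for i in range(10)}, 0: 40, 1: 2}
two_steps = run(r, m, lambda a: a in m, 2)
```

In this sketch, running off the end of the program or loading from an unreadable address both show up the same way: the state has no successor.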

Neat Things
• No restriction (in principle) on self-modifying code
• In practice, you immediately assume as part of the safety policy that code is immutable.

Types
• int(m)(v) =def= true
  • any value is an integer
• record(t1,t2) m v =def= readable(v) & readable(v+1) & t1 m (m v) & t2 m (m(v + 1))
  • v and v+1 must be readable
  • the values in memory m at locations v and v+1 must have types t1 and t2 respectively (in memory m)
• Necula's rule for traversal of a pair is then trivial

Problem: how to create a pair?
• Because the types are inductively defined, any write to memory could invalidate the type of something else.
• record(record(int,int),int) m r1
  • means in memory m, r1 has type (int*int)*int
• if I do a store, then I get a new memory m'
  • so it is no longer necessarily the case that r1 has type (int*int)*int in m'
• we could re-establish this fact as long as we knew the update either:
  • (a) preserved the type of r1 or
  • (b) didn't write any locations in common with r1's traversal (e.g., r1, m(r1), m(r1+1), m(m(r1)), m(m(r1)+1))
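
The problem can be reproduced in a few lines of Python. This is a sketch only: it assumes memory is a dict and that "readable" means "is a key of that dict", neither of which comes from the slides.

```python
def int_ty(m, v):
    """int: any value is an integer."""
    return True

def record(t1, t2):
    """record(t1,t2) m v, with readable(x) modeled as 'x in m' (assumption)."""
    def ty(m, v):
        return (v in m and v + 1 in m
                and t1(m, m[v]) and t2(m, m[v + 1]))
    return ty

pair_of_pair = record(record(int_ty, int_ty), int_ty)   # (int*int)*int

# r1 = 100 points at a pair whose first field points at another pair at 200.
m = {100: 200, 101: 5, 200: 1, 201: 2}
before = pair_of_pair(m, 100)

# Case (a): the store keeps the type of r1 intact (an int replaces an int).
m_ok = {**m, 201: 9}
# Otherwise: overwriting the inner pointer with 7, which is not readable.
m_bad = {**m, 100: 7}

after_ok = pair_of_pair(m_ok, 100)
after_bad = pair_of_pair(m_bad, 100)
```

Here before and after_ok hold while after_bad does not, even though the store to location 100 never mentioned the type of r1.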

Solution for allocation:
• Types indexed not only by memory, but also by the current set of allocated locations:
  • record(t1,t2)(a,m) v =def= v in a & v+1 in a & readable(v) & readable(v+1) & t1(a,m)(m v) & t2(a,m)(m(v+1))
• a is a predicate on the set of allocated values.
  • In this paper, just the set of values in some pre-determined range (e.g., 100 up to r6)

Extension
• To support allocation, a value should retain its type even if:
  • the allocated set grows
  • there are writes to the unallocated set
• Hence, the notion of validity on types:
  • valid(t) =def=
      (All a,a',m,v. (a subset a') => t(a,m)v => t(a',m)v) &
      (All a,m,m',v. (All x in a. m(x) = m'(x)) => t(a,m)v => t(a,m')v)
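
Over a tiny finite universe, the two conjuncts of valid(t) can be checked by brute force. The Python sketch below assumes three addresses that double as values, treats every address as readable, and uses made-up example types (ptr_zero, fresh, peek) purely for illustration. It is a sanity check of the definition, not a proof technique.

```python
from itertools import combinations, product

ADDRS = range(3)   # assumed universe: addresses and values are 0, 1, 2
SETS = [frozenset(c) for n in range(4) for c in combinations(ADDRS, n)]
MEMS = [dict(zip(ADDRS, vals)) for vals in product(ADDRS, repeat=3)]

def valid(t):
    # Conjunct 1: the type survives growth of the allocated set.
    grows = all(t(a2, m, v)
                for a in SETS for a2 in SETS if a <= a2
                for m in MEMS for v in ADDRS if t(a, m, v))
    # Conjunct 2: the type survives writes outside the allocated set.
    stable = all(t(a, m2, v)
                 for a in SETS for m in MEMS for m2 in MEMS
                 if all(m[x] == m2[x] for x in a)
                 for v in ADDRS if t(a, m, v))
    return grows and stable

# Pointer to the constant 0: depends only on allocated memory.
ptr_zero = lambda a, m, v: v in a and m[v] == 0
# "v is not yet allocated": breaks as soon as the allocated set grows.
fresh = lambda a, m, v: v not in a
# Reads m(v) without requiring v to be allocated: broken by outside writes.
peek = lambda a, m, v: m[v] == 0
```

Under these assumptions valid(ptr_zero) holds, while fresh fails the first conjunct and peek fails the second.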

Remarks
• What they're doing is giving a semantics for types using the machine.
  • We're used to seeing a semantics for types as sets or PERs or some other mathematical objects.
• These extension properties had to be shown for TAL as well.
  • if a heap H is described by T, then H[x->v] is also described by T.
  • Of course, we had to show this was true.
  • Here, you can't define a type unless it is true.

Given This Setup
constty i (a,m) v =def= v = i
char(a,m) v =def= 0 <= v < 256
boxed(a,m) v =def= v >= 256
ptr t (a,m) v =def= v in a & readable(v) & t(a,m)(m v)
offset i t (a,m) v =def= t(a,m)(v+i)
field i t =def= offset i (ptr t)
union(t1,t2)(a,m) v =def= t1(a,m)v or t2(a,m)v
intersect(t1,t2)(a,m) v =def= t1(a,m)v & t2(a,m)v
record2(t1,t2) =def= intersect(field 0 t1, field 1 t2)
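
These constructors translate almost line for line into predicate combinators. The Python sketch below assumes a is a set of addresses, m is a dict, and readable(v) is modeled as "v in m"; those modeling choices are assumptions for illustration.

```python
def constty(i):
    return lambda a, m, v: v == i

def char(a, m, v):
    return 0 <= v < 256

def boxed(a, m, v):
    return v >= 256

def ptr(t):
    # readable(v) is modeled as 'v in m' (assumption).
    return lambda a, m, v: v in a and v in m and t(a, m, m[v])

def offset(i, t):
    return lambda a, m, v: t(a, m, v + i)

def field(i, t):
    return offset(i, ptr(t))

def union(t1, t2):
    return lambda a, m, v: t1(a, m, v) or t2(a, m, v)

def intersect(t1, t2):
    return lambda a, m, v: t1(a, m, v) and t2(a, m, v)

def record2(t1, t2):
    # A two-field record is derived, not primitive.
    return intersect(field(0, t1), field(1, t2))

# Example heap: a (char, 0) pair stored at addresses 300 and 301.
m = {300: 65, 301: 0}
a = {300, 301}
pair = record2(char, constty(0))
ok = pair(a, m, 300)
```

The point of the design is visible in record2: once offset, ptr, and intersect are defined, records come for free, with no new primitive rule to justify.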

More Constructors
aref t (a,m) v =def= v in a & readable(v) & Exists a'. ((a' subset a) & v not in a' & t(a',m)(m v))
• acyclic mutable refs
• giving a semantics for (possibly) cyclic refs is hard (see next paper)
existential(F)(a,m) v =def= Exists t. (F t)(a,m)v & valid(t)
universal(F)(a,m) v =def= All t. valid(t) => (F t)(a,m)v

Code Pointers
codeptr(P)(a,m) v =def= All r',m'.
    r'(pc) = v &
    stdp(r',m') &                 // global invariants
    P(stda(r',m'),m')(r')         // "call convention"
    => safe(r',m')
• The code type wraps up the implicit global invariants (e.g., the heap can grow without invalidating the old types) as well as the calling convention (e.g., this register holds the allocation pointer.)

Recursive Types
• subtype(t1,t2) =def= All a,m,v. t1(a,m)v => t2(a,m)v
• rec(F)(a,m) v =def= All t. valid(t) => subtype(F(t),t) => t(a,m)v
• e.g., list(char) =def= rec(fn t => union(constty 0, intersect(boxed, record2(char,t))))

Validity & Recursive Types
• rec(F) is valid for any function that preserves validity and is monotone:
  • preserves_valid(F) =def= All t. valid(t) => valid(F(t))
  • monotone(F) =def= All t1,t2. subtype(t1,t2) => subtype(F(t1),F(t2))
• In particular, this gives us: rec(F)(a,m)(v) <=> (F(rec(F)))(a,m)(v)
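
The unrolling property can be observed on a small heap. Python cannot quantify over all types as rec(F) does, so the sketch below swaps in a different technique: for a fixed (a, m) and a finite set of candidate values, it computes the least fixed point of a monotone F by iterating from the empty type. The lfp helper, the heap layout, and the choice of candidate values are assumptions for illustration; for non-monotone F the loop need not terminate.

```python
def constty(i): return lambda a, m, v: v == i
def char(a, m, v): return 0 <= v < 256
def boxed(a, m, v): return v >= 256
def ptr(t): return lambda a, m, v: v in a and v in m and t(a, m, m[v])
def offset(i, t): return lambda a, m, v: t(a, m, v + i)
def union(t1, t2): return lambda a, m, v: t1(a, m, v) or t2(a, m, v)
def intersect(t1, t2): return lambda a, m, v: t1(a, m, v) and t2(a, m, v)
def record2(t1, t2):
    return intersect(offset(0, ptr(t1)), offset(1, ptr(t2)))

def list_F(t):
    """The body of list(char): nil (0) or a boxed (char, t) cell."""
    return union(constty(0), intersect(boxed, record2(char, t)))

def as_type(s):
    """View a finite set of values as a type at the fixed (a, m)."""
    return lambda a, m, v: v in s

def lfp(F, a, m, values):
    """Iterate F from the empty type until nothing changes (F monotone)."""
    cur = frozenset()
    while True:
        nxt = frozenset(v for v in values if F(as_type(cur))(a, m, v))
        if nxt == cur:
            return cur
        cur = nxt

# Heap holding the list [65, 66]: cell at 300 points to cell at 302.
m = {300: 65, 301: 302, 302: 66, 303: 0}
a = set(m)
lists = lfp(list_F, a, m, range(305))

# Same heap, but the tail of the last cell points back to itself.
m_cyclic = {**m, 303: 302}
lists_cyclic = lfp(list_F, a, m_cyclic, range(305))
```

On the acyclic heap the result contains 0, 302, and 300, and applying list_F once more returns the same set, which is the unrolling equivalence in miniature. On the cyclic heap only 0 remains, because a least fixed point admits only finite lists.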

Bad News
• Does not handle code types (in general)
  • e.g., datatype d = D of d->d
  • any negative occurrence will not be monotone
• This is (also) the problem with supporting refs in general
  • when the heap has cycles, you need recursive types to describe it.
• Could construct a more elaborate semantics based on domain theory
  • but then you have to code up all of domain theory in the logic

Good News
• See the next paper with McAllester
  • there is a simple model
  • easy to code up in the logic
  • (go over this next time)

Stepping Back
What's been accomplished with FPCC?
• assume only h.o. logic and machine semantics (TCB)
• build enough math to encode the semantics of your types
• proofs are much more detailed
• less brittle than syntactic approaches?
  • yes: working at the machine level, can define new derived type constructors w/out reproving correctness, admissible rules, etc.
  • no: have to bake in details (e.g., allocation framework, safety policy, etc.) and scaling the semantics to realistic languages is much harder