PhD Thesis Defense. Modular Machine Code Verification. Zhaozhong Ni Advisor: Zhong Shao Committee: Zhong Shao, Paul Hudak, Carsten Schürmann, David Walker Department of Computer Science, Yale University Nov. 29, 2006. 19 Lines of Code on Every PC. ; load new context
PhD Thesis Defense Modular Machine Code Verification Zhaozhong Ni Advisor: Zhong Shao Committee: Zhong Shao, Paul Hudak Carsten Schürmann, David Walker Department of Computer Science, Yale University Nov. 29, 2006
19 Lines of Code on Every PC
swapcontext:
  ; store old context
  mov eax, [esp+4]
  mov [eax+0], OK
  mov [eax+4], ebx
  mov [eax+8], ecx
  mov [eax+12], edx
  mov [eax+16], esi
  mov [eax+20], edi
  mov [eax+24], ebp
  mov [eax+28], esp
  ; load new context
  mov eax, [esp+8]
  mov esp, [eax+28]
  mov ebp, [eax+24]
  mov edi, [eax+20]
  mov esi, [eax+16]
  mov edx, [eax+12]
  mov ecx, [eax+8]
  mov ebx, [eax+4]
  mov eax, [eax+0]
  ret
19 Lines of Code in Every ms swapcontext: • Runs thousands of times per second • Used by assembly, C, MSIL, JVML, etc. • Basis of multi-tasking, OS, and software • Safety and correctness taken for granted
19 Lines of Code Looks Simple swapcontext: … call swapcontext … [Figure: register values (eax, ebx, ecx, edx, esi, edi, ebp, esp) and the old and new context records around the call; the return pointer retp becomes retp’.]
19 Lines of Code Proven Hard swapcontext: • Simple code, complex reasoning! • stack / heap / memory mutation • procedure call / first-class code pointer • protection / polymorphism • Lacks a specification and verification that are • formal (machine checkable in a sound logic) • general (allows all possible usage of context) • realistic (usable from assembly and C level)
Outline • Introduction • The XCAP Framework • Mini Thread Library • Connect XCAP to TAL • Conclusion
Software Reliability • Bugs are costly • Especially important for • mission-critical software • consumer electronics software • internet software
Test-Patch Approach • Works most of the time • Gives no guarantee • Could make things worse [Flowchart: test, debug, pre-release? (yes / no), create patch]
Language-based Approach • Uses types and other formal specifications • Excludes all bugs in certain categories: illegal command, overflow, dangling pointer, etc. • Successful and popular: ML, Java, C#, etc. • Reached virtual machine code level: JVML, MSIL, TIL, TAL, etc. • Meta-theorems can make guarantees
Traditional Assumptions • Types are for application software: you cannot write an OS without (void *) • Types are for high-level languages: not much to talk about in 89 84 24 07 5B CD 15 • Types are only for “no blue screen”: how about “variable x is a prime number”? • Type safety is bad for performance: turn off array-bound checking before release
Program Specification • syntactic types • machine-logical specifications • meta-logical specifications
bool prime (int n) {
  assert (n > 0);
  for (int i = 2; i < n; i ++)
    // n mod 2,…,i-1 ≠ 0
    if (n % i == 0) return false;
  // n mod 2,…,n-1 ≠ 0
  return true;
}
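The loop invariant that the slide states only as a comment can also be checked at run time. The following is a minimal sketch, not code from the talk: the name prime_checked and the inner checking loop are assumptions added for illustration, and the function keeps the slide's behavior, including returning true for n = 1.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical variant of the slide's prime(): the invariant comment
   "n mod 2,...,i-1 != 0" is turned into an executable assertion. */
bool prime_checked(int n) {
    assert(n > 0);                      /* precondition from the slide */
    for (int i = 2; i < n; i++) {
        /* invariant: no k in 2..i-1 divides n */
        for (int k = 2; k < i; k++)
            assert(n % k != 0);
        if (n % i == 0)
            return false;
    }
    /* here no k in 2..n-1 divides n */
    return true;
}
```

A runtime check only tests the invariant on the inputs that are actually run; a logical specification, as discussed in the talk, is meant to establish it for every input.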
Machine Code Verification • Motivations • everything goes down to binary • high-level safety efforts lost in compilation • critical code directly written in low level • Challenges • Expressiveness • Modularity • Goals • both user and system level code • modular specification + certification
Proof-Carrying Code • Proposed 10 years ago [Necula & Lee] • machine code • machine-checkable proof [Diagram: Code, Specification, Proof, Meta theory, Checker]
Foundational PCC • Proposed by [Appel] [Diagram: Code, Specification, Proof, Meta theory, Checker, mathematical logic theory, mathematical logic checker]
Approaches to PCC • Type-based PCC • TAL [Morrisett98] • Touchstone PCC [Colby00] • Syntactic FPCC [Hamid02] • FTAL [Crary03] • LTAL [Chen03] • … • Modular • Generate proof easily • Type safety • Logic-based PCC • Original PCC [Necula98] • Semantic FPCC [Appel01] • CAP [Yu03] • Open Verifier [Chang05] • CCAP/CMAP [Yu04, Feng05] • … • Expressive • Advanced properties • Good interoperability
PCC After 10 Years In principle, can verify any machine code! In reality, many programs are not verified. For some code, we do not know HOW! [Diagram: Code, Specification, Proof, Meta theory, Checker]
User-level Code: List Append Adapted from [Reynolds02] ……
ECP Problem with Hoare Logic • Embedded code pointers (ECP) Examples: computed GOTOs, higher-order functions, indirect jumps, continuations, return addresses “… are difficult to describe in … Hoare logic” [Reynolds02] • Previous approaches • Ignore ECP [Necula98, Yu04] • Limit ECP specifications to types [Hamid04] • Sacrifice modularity [Yu03] • Use complex indexed semantic models [Appel01]
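To make the term concrete, here is a small C illustration of an embedded code pointer: a function address stored inside a data structure and later used for an indirect call. All names (struct task, run_task, add_one, twice) are hypothetical and chosen for this sketch only; they do not come from the talk.

```c
#include <assert.h>

/* A code pointer type: any function from int to int. */
typedef int (*step_fn)(int);

int add_one(int x) { return x + 1; }
int twice(int x)   { return 2 * x; }

/* The code pointer is embedded in ordinary data, next to its argument. */
struct task {
    step_fn code;
    int     arg;
};

/* Indirect call: the callee is only known through the value of t->code.
   To reason about this call, a verifier needs a specification for
   whatever code t->code points to. */
int run_task(const struct task *t) {
    return t->code(t->arg);
}
```

The difficulty the slide points at is visible in run_task: an assertion about the struct has to say something about the behavior of the code stored in it, not just about its bits.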
Outline • Introduction • The XCAP Framework • Mini Thread Library • Connect XCAP to TAL • Conclusion
The XCAP Framework [POPL’06] • A logic-based PCC framework • modular verification of machine code • supports ECP without compromise • Supports both system and user code • Consists of • target machine (not fixed) • assertion language (consistency) • inference rules (soundness)
Certified Assembly Programming [Yu03, Hamid04, Yu04, Feng05] • Hoare logic in CPS • Use general predicate logic for assertions example: • Mechanized in a proof assistant (Coq) • Extensions made: CCAP, CMAP, etc.
The ECP Problem cptr(f, a) = ?
Previous Approach • Internalize Hoare-derivation for ECP Circularity! • Stratification [OHearn97, Naumann01] • Works for simple case • Hard for assembly • Hard for polymorphism • Step-Indexing [Appel01, Appel02, Schneck03] • Works for polymorphism • Heavyweight • Not standard Hoare logic
CAP’s Approach • Specify ECP by checking against code spec • Verify all code specs are indeed valid • Modularity problem
The XCAP Approach • Specify ECP independent of code spec • Check ECP against global code spec • Verify global code spec is indeed valid
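The idea of checking an embedded code pointer against a global code specification can be mimicked with a toy lookup table. This is only an analogy under strong assumptions: the table global_spec, the function cptr, the use of C function pointers as "assertions", and the reading that cptr(f, a) holds when the table maps f to a are all illustrative choices, not the formal definitions used in XCAP.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef int  (*code_ptr)(int);    /* a code label                     */
typedef bool (*assertion)(int);   /* toy precondition on the argument */

bool nonneg(int x)  { return x >= 0; }
bool any_int(int x) { (void)x; return true; }

int half(int x)   { return x / 2; }
int negate(int x) { return -x; }

/* Toy "global code spec": one precondition per code label. */
struct spec_entry { code_ptr label; assertion pre; };
const struct spec_entry global_spec[] = {
    { half,   nonneg  },
    { negate, any_int },
};

/* cptr(f, a): f is in the global spec and its recorded precondition is a.
   The claim can be written down without looking at the code of f. */
bool cptr(code_ptr f, assertion a) {
    size_t n = sizeof global_spec / sizeof global_spec[0];
    for (size_t i = 0; i < n; i++)
        if (global_spec[i].label == f)
            return global_spec[i].pre == a;
    return false;
}

/* Indirect jump guarded by the toy check. */
int jump_indirect(code_ptr f, assertion a, int arg) {
    assert(cptr(f, a));   /* the pointer claim matches the global spec */
    assert(a(arg));       /* the precondition holds for this call      */
    return f(arg);
}
```

In this sketch a module can state cptr(half, nonneg) knowing only the label and the assertion; whether the table itself is valid is a separate, global obligation.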
How XCAP Works with ECP (SEQ) (ECP) (JMP) (JD)
Impredicative Polymorphisms • Important for ECP • Naïve interpretation function fails
New Interpretation Interpretation Soundness of interpretation Consistency
Recursive Specification • Simple recursive data structures • linked list, queue, stack, tree, etc. • supported via inductive definition of Prop • Complex recursive structures with ECP • object (self refers to the entire object) • threading invariant (each thread assumes others) • Recursive specification
Memory Mutation • Strong update • special conjunction (p * q) in separation logic • directly definable in Prop and PropX • explicit alias control, popular in system level • Weak update (general reference) • mutable reference (int ref) in ML • managed data pointers (int __gc*) in .NET • rely on GC to recycle memory • popular in user level
Weak Update • Reference cell • Interpretation • Record macro
Implementation in Coq • PropX can share similar tactics with Prop
Outline • Introduction • The XCAP Framework • Mini Thread Library • Connect XCAP to TAL • Conclusion
Why Thread Library? • Concurrent verification • primitives’ correctness is assumed • primitives are not really “primitive”! • poor portability due to lack of formal spec • Core of OS kernel • assignment 1 of OS course • written in C and Assembly • requires both safety and efficiency
A Mini Thread Library • Modeled after Pth • Non-preemptive user level threads • Written in (subset of) x86 assembly
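For readers who have not used non-preemptive user-level threads, the following sketch shows cooperative switching with the POSIX ucontext functions (getcontext, makecontext, swapcontext). It is not the mini thread library from the talk: that library is written in x86 assembly, and the POSIX swapcontext has a different interface from the 19-line routine. The names run_demo, thread_a, thread_b, and record are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <ucontext.h>

/* Demo state: two cooperative threads plus the caller's context. */
static ucontext_t main_ctx, ctx_a, ctx_b;
static char stack_a[64 * 1024], stack_b[64 * 1024];
static char trace[16];
static size_t trace_len;

/* Append one character to the trace, keeping it NUL-terminated. */
static void record(char c) {
    if (trace_len + 1 < sizeof trace) {
        trace[trace_len++] = c;
        trace[trace_len] = '\0';
    }
}

static void thread_a(void) {
    record('a');
    swapcontext(&ctx_a, &ctx_b);   /* yield to B */
    record('A');
    swapcontext(&ctx_a, &ctx_b);   /* yield to B again */
}

static void thread_b(void) {
    record('b');
    swapcontext(&ctx_b, &ctx_a);   /* yield back to A */
    record('B');
    /* returning here resumes uc_link, i.e. main_ctx */
}

/* Run both threads and return the order in which they executed. */
const char *run_demo(void) {
    trace_len = 0;
    trace[0] = '\0';

    getcontext(&ctx_a);
    ctx_a.uc_stack.ss_sp = stack_a;
    ctx_a.uc_stack.ss_size = sizeof stack_a;
    ctx_a.uc_link = &main_ctx;
    makecontext(&ctx_a, thread_a, 0);

    getcontext(&ctx_b);
    ctx_b.uc_stack.ss_sp = stack_b;
    ctx_b.uc_stack.ss_size = sizeof stack_b;
    ctx_b.uc_link = &main_ctx;
    makecontext(&ctx_b, thread_b, 0);

    swapcontext(&main_ctx, &ctx_a);   /* start thread A */
    return trace;
}
```

Each thread runs until it explicitly yields, so the interleaving is fully determined by the swapcontext calls; there is no timer or preemption involved.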
Verify That 19 Lines of Code Step 1: specify machine context Step 2: specify function call/return Step 3: specify swapcontext() Step 4: prove it!
Machine Context
typedef struct mctx_st *mctx_t;
struct mctx_st { int eax, ebx, ecx, edx, esi, edi, ebp, esp; };
[Figure: mctx record layout with a public part and a private part; slot labels retv, bx, cx, dx, cs, si, di, bp, sp, ret]
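The field order of the context record has to agree with the offsets used by the swapcontext code ([eax+0] for eax through [eax+28] for esp). A minimal sketch of how that agreement can be checked at compile time follows. It assumes 4-byte slots and therefore uses uint32_t instead of the slide's int, which is an assumption made here so that the offsets hold regardless of the compiler's int size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct mctx_st *mctx_t;

/* One 4-byte slot per saved register, in the order used by swapcontext.
   uint32_t is an assumption of this sketch; the slide declares int. */
struct mctx_st {
    uint32_t eax, ebx, ecx, edx, esi, edi, ebp, esp;
};

/* Compile-time checks that the layout matches the assembly offsets. */
static_assert(offsetof(struct mctx_st, eax) == 0,  "eax at [eax+0]");
static_assert(offsetof(struct mctx_st, ebx) == 4,  "ebx at [eax+4]");
static_assert(offsetof(struct mctx_st, ecx) == 8,  "ecx at [eax+8]");
static_assert(offsetof(struct mctx_st, edx) == 12, "edx at [eax+12]");
static_assert(offsetof(struct mctx_st, esi) == 16, "esi at [eax+16]");
static_assert(offsetof(struct mctx_st, edi) == 20, "edi at [eax+20]");
static_assert(offsetof(struct mctx_st, ebp) == 24, "ebp at [eax+24]");
static_assert(offsetof(struct mctx_st, esp) == 28, "esp at [eax+28]");
static_assert(sizeof(struct mctx_st) == 32, "eight 4-byte slots");
```

If a field is added or reordered, the build fails instead of the assembly silently reading the wrong slot.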
Function Call / Return [Figure: stack layout, in the order shown on the slide: excess space, local storage, esp, return address, argument 1, argument 2, …, argument n, caller frames]