Doctoral Dissertation Defense: Formalizing Memory Consistency Models for Program Analysis
Jason Yue Yang
This work was supported in part by NSF Research Grant No. CCR-0081406 and SRC Task 1031.001.
Motivation
Memory architectures are becoming more aggressive: data dependence, load/store, load-acquire/store-release, memory fences, write atomicity, semaphores.
• Multithreaded software is popular, BUT hard to analyze
  - Thread libraries: e.g., Pthreads, Win32, Solaris threads
  - Language-level support for threads: e.g., Java
• Central problem: shared memory consistency models
  - Need a clear specification of memory ordering rules
  - Need an executable version of the memory ordering rules
  - Need a method to analyze thread executions against the rules
What Is a Memory Model?
It defines the legal orderings of memory operations that can be perceived at the user level.
Example (Itanium assembly code; initially a = b = 0; <v> marks the value a load returns):
• Plain store/load (less restriction):
    CPU 1: st a,1;  st b,1;
    CPU 2: ld r1,b; <1>  ld r2,a; <0>
  Observing 0 for r2 is OK.
• Store-release/load-acquire (more restriction):
    CPU 1: st a,1;  st.rel b,1;
    CPU 2: ld.acq r1,b; <1>  ld r2,a;
  If r1 = 1, the load of a can't observe 0.
Classical Memory Models
Sequential Consistency (SC)
• Non-operational view:
  - Common total order
  - Program order
  - A read sees the "latest" write
• Operational view:
  - Processors execute as if connected to a single memory through a non-deterministic switch
Other weaker models: Pipelined RAM (PRAM), Coherence, Causal Consistency, Processor Consistency, Release Consistency, Lazy Release Consistency, Location Consistency, and more ...
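The operational view of SC can be made concrete with a small enumerator: threads take turns sending one instruction to a single memory, and every complete interleaving yields one outcome. The sketch below is illustrative only; the class `ScInterleavings` and its string-array instruction encoding (`{"st", var, value}` / `{"ld", var, register}`) are hypothetical, and Java 17 is assumed. It applies the enumerator to the plain store/load Itanium example.

```java
import java.util.*;

/**
 * Sketch of SC's operational view: a non-deterministic "switch" lets one thread at a
 * time issue its next instruction to a single shared memory (all names hypothetical).
 */
final class ScInterleavings {
    /** Returns every final register assignment reachable under SC. */
    static Set<Map<String, Integer>> outcomes(List<List<String[]>> threads) {
        Set<Map<String, Integer>> results = new HashSet<>();
        explore(threads, new int[threads.size()], new HashMap<>(), new TreeMap<>(), results);
        return results;
    }

    private static void explore(List<List<String[]>> threads, int[] pc,
                                Map<String, Integer> mem, Map<String, Integer> regs,
                                Set<Map<String, Integer>> results) {
        boolean progressed = false;
        for (int t = 0; t < threads.size(); t++) {
            if (pc[t] == threads.get(t).size()) continue;          // thread t has finished
            progressed = true;
            String[] ins = threads.get(t).get(pc[t]);
            Map<String, Integer> mem2 = new HashMap<>(mem);
            Map<String, Integer> regs2 = new TreeMap<>(regs);
            if (ins[0].equals("st")) {
                mem2.put(ins[1], Integer.parseInt(ins[2]));         // store goes straight to the single memory
            } else {
                regs2.put(ins[2], mem2.getOrDefault(ins[1], 0));    // load sees the latest store (initially 0)
            }
            int[] pc2 = pc.clone();
            pc2[t]++;
            explore(threads, pc2, mem2, regs2, results);
        }
        if (!progressed) results.add(regs);                          // every thread done: record the outcome
    }

    public static void main(String[] args) {
        // Plain st/ld example, interpreted under SC; initially a = b = 0.
        List<List<String[]>> prog = List.of(
            List.of(new String[]{"st", "a", "1"}, new String[]{"st", "b", "1"}),
            List.of(new String[]{"ld", "b", "r1"}, new String[]{"ld", "a", "r2"}));
        for (Map<String, Integer> o : outcomes(prog)) System.out.println(o);
    }
}
```

Under SC the enumerator finds three outcomes; the missing combination r1 = 1, r2 = 0 is exactly the one the plain-store Itanium example permits, which is one way to see that the Itanium model is weaker than SC.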
Industrial Memory Models
Example: the Intel Itanium® Memory Model
• The Intel application note contains more than 30 pages of semi-formal rules
  - English plus a large amount of special notation
• Many non-obvious consequences
• Litmus tests are used to illustrate properties
  - Litmus tests cannot be executed automatically
  - Pencil-and-paper reasoning is used instead
Language-Level Memory Models
Example: the Java Memory Model (JMM)
• Original JMM: Chapter 17 of the Java Language Specification
  - Poorly understood
  - Flawed: too weak (may introduce security holes) and too strong (prevents common optimizations)
• Currently under revision (JSR-133)
  - Extensive discussions for more than 3 years
  - Several replacement proposals
  - Issues still remain
Why Does a Memory Model Matter?
Example: Peterson's algorithm for mutual exclusion. Initially, flag1 = flag2 = false, turn = 0.

    // Thread 1
    flag1 = true;
    turn = 2;
    while (turn == 2 && flag2) ;
    <critical section>
    flag1 = false;

    // Thread 2
    flag2 = true;
    turn = 1;
    while (turn == 1 && flag1) ;
    <critical section>
    flag2 = false;

• Can both threads enter the critical section simultaneously?
  - Under sequential consistency: no (the intended behavior is guaranteed)
  - Under many weaker models: yes (the algorithm is broken)
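Whether Peterson's algorithm survives a given model can be checked by brute force on a program this small. The sketch below explores every reachable state. With `tso == false` every store goes straight to memory (SC); with `tso == true` each thread gets a FIFO store buffer that it reads from first, an assumed TSO-style stand-in for "a weaker model", not one of the models studied in this thesis. All names are hypothetical; Java 17 is assumed.

```java
import java.util.*;

/** Sketch: exhaustive state search of Peterson's algorithm, one entry per thread. */
final class PetersonCheck {
    static final int FLAG1 = 0, FLAG2 = 1, TURN = 2;   // shared-variable indices
    static final int CS = 4, DONE = 6;                 // program-counter values

    static final class State {
        final int[] pc, mem;
        final List<List<int[]>> buf;                   // per-thread pending stores {var, value}
        State(int[] pc, int[] mem, List<List<int[]>> buf) { this.pc = pc; this.mem = mem; this.buf = buf; }
        State copy() {
            List<List<int[]>> b = new ArrayList<>();
            for (List<int[]> q : buf) b.add(new ArrayList<>(q));
            return new State(pc.clone(), mem.clone(), b);
        }
        String key() {
            StringBuilder sb = new StringBuilder(Arrays.toString(pc)).append(Arrays.toString(mem));
            for (List<int[]> q : buf) {
                sb.append('|');
                for (int[] w : q) sb.append(w[0]).append('=').append(w[1]).append(',');
            }
            return sb.toString();
        }
    }

    static int read(State s, int t, int var) {
        List<int[]> q = s.buf.get(t);
        for (int k = q.size() - 1; k >= 0; k--)        // a thread sees its own newest buffered store
            if (q.get(k)[0] == var) return q.get(k)[1];
        return s.mem[var];
    }

    static void write(State s, int t, int var, int val, boolean tso) {
        if (tso) s.buf.get(t).add(new int[]{var, val});
        else s.mem[var] = val;
    }

    /** Executes one instruction of thread t; returns null if t has finished. */
    static State step(State s, int t, boolean tso) {
        int other = 1 - t, myFlag = t == 0 ? FLAG1 : FLAG2, otherFlag = t == 0 ? FLAG2 : FLAG1;
        State n = s.copy();
        switch (s.pc[t]) {
            case 0 -> { write(n, t, myFlag, 1, tso); n.pc[t] = 1; }          // flag_i = true
            case 1 -> { write(n, t, TURN, other + 1, tso); n.pc[t] = 2; }     // turn = other thread's id
            case 2 -> n.pc[t] = read(n, t, TURN) == other + 1 ? 3 : CS;       // turn == other ?
            case 3 -> n.pc[t] = read(n, t, otherFlag) == 1 ? 2 : CS;          // && flag_other -> spin
            case CS -> n.pc[t] = 5;                                            // leave critical section
            case 5 -> { write(n, t, myFlag, 0, tso); n.pc[t] = DONE; }        // flag_i = false
            default -> { return null; }
        }
        return n;
    }

    /** Store-buffer model only: drain thread t's oldest buffered store to memory. */
    static State flush(State s, int t) {
        if (s.buf.get(t).isEmpty()) return null;
        State n = s.copy();
        int[] w = n.buf.get(t).remove(0);
        n.mem[w[0]] = w[1];
        return n;
    }

    static boolean mutualExclusionHolds(boolean tso) {
        List<List<int[]>> empty = new ArrayList<>();
        empty.add(new ArrayList<>());
        empty.add(new ArrayList<>());
        State init = new State(new int[]{0, 0}, new int[]{0, 0, 0}, empty);  // flags false, turn = 0
        Deque<State> stack = new ArrayDeque<>(List.of(init));
        Set<String> seen = new HashSet<>(Set.of(init.key()));
        while (!stack.isEmpty()) {
            State s = stack.pop();
            if (s.pc[0] == CS && s.pc[1] == CS) return false;                 // both threads inside
            for (int t = 0; t < 2; t++) {
                State[] succ = {step(s, t, tso), tso ? flush(s, t) : null};
                for (State n : succ)
                    if (n != null && seen.add(n.key())) stack.push(n);
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("SC: mutual exclusion holds = " + mutualExclusionHolds(false));
        System.out.println("store buffers: mutual exclusion holds = " + mutualExclusionHolds(true));
    }
}
```

Under SC the search finds no state with both threads in the critical section. With store buffers it finds one: each thread's `flag = true` is still buffered when the other thread reads the flag from memory.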
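Do Programmers Really Care?
Another example: Double-Checked Locking for Singleton creation
• Only use locking as needed
• "Double-check" the reference

    class foo {
        private static Helper helper = null;

        public static Helper get() {
            if (helper == null) {              // first check, without locking
                synchronized (foo.class) {     // a static method has no `this`, so lock the class object
                    if (helper == null)        // "double-check" while holding the lock
                        helper = new Helper();
                }
            }
            return helper;
        }
    }

    class Helper { }                           // stand-in so the example compiles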
Broken Under the Current JMM
Problem: the double-checked locking code above is broken under the JMM!
• On weak architectures
• With race conditions
• The reference can become "visible" before the constructor completes
We can't guarantee that Helper is fully constructed!
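For comparison, under the revised model from JSR-133 (adopted in Java 5) a commonly cited repair is to declare the field `volatile`: the volatile write and the later volatile read order the constructor's writes before any use of the published reference. This does not help under the original Chapter 17 model discussed here. The sketch is illustrative; `SafeFoo` and its nested `Helper` are hypothetical names.

```java
/** Sketch of double-checked locking with a volatile field (JSR-133 / Java 5+ semantics). */
final class SafeFoo {
    static final class Helper {
        final int value;
        Helper() { value = 42; }               // constructor writes that readers must see
    }

    // volatile: a reader that sees the reference happens-after the writes that built the object
    private static volatile Helper helper;

    static Helper get() {
        Helper h = helper;                     // one volatile read on the fast path
        if (h == null) {
            synchronized (SafeFoo.class) {
                h = helper;                    // re-check while holding the lock
                if (h == null) {
                    h = new Helper();
                    helper = h;                // volatile write publishes the fully constructed object
                }
            }
        }
        return h;
    }

    public static void main(String[] args) {
        System.out.println(get() == get());    // the same instance is returned every time
    }
}
```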
Problems with Previous Approaches
• Virtually all industrial weak memory models lack formal specifications
• Those that do have a formal specification on paper cannot be executed
• Those that have a machine-readable formal specification use a "state machine" approach that
  - employs architecture-specific data structures
  - cannot be decomposed into orthogonal components
  - has not been verified against higher-level rules
• No support for verifying "programmer expectations" in multithreaded software
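Analysis of Multithreaded Software
Figure: analysis approaches placed between "more scalable" and "more precise" along three dimensions (intra-thread vs. inter-thread, intra-procedural vs. inter-procedural, memory-model insensitive vs. memory-model sensitive), with the region covered by "My thesis work" marked.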
Contributions
• Operational Specification Method: operational-style framework (UMM)
  - Applications: Java Memory Model, classical memory models
• Axiomatic Specification Method: non-operational-style framework (Nemos)
  - Applications: Intel Itanium Memory Model, classical memory models
• Constraint Solving Method: prototype tools based on various solvers (CLP, SAT, QBF)
  - Incremental SAT solving; different encodings
• Concurrency Analysis: language-level memory model issues
  - Applications: execution validation, race detection, atomicity verification
Operational Approach: UMM (Uniform Memory Model)
• Supports formal verification
  - Integrates a model checker (Murphi)
  - Inspired by Park & Dill's work on SPARC
• Employs a generic memory abstraction
  - Eliminates architecture-specific complexities
  - Uniform notation
  - A parameterized method
UMM Abstract Machine
Figure: threads i and j, each with its own local instruction buffer (LIB_i, LIB_j), sharing one global instruction buffer (GIB).
• Only two layers
  - LIB: Local Instruction Buffer
  - GIB: Global Instruction Buffer
• The GIB can grow as needed
Key insight: make it easy to configure program order and visibility order.
General Strategy in UMM
• Enabling mechanism
  - Program order may be relaxed to enable certain interleavings
  - Controlled via a bypassing table
• Filtering mechanism
  - Visibility order is constructed from the GIB following the proper ordering requirements
  - Enforced in the read selection rules
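UMM Example: Sequential Consistency
Transition table (figure). Program order (enabling) and visibility order (filtering) are given by:

$$
\mathit{ready}(i) \iff \neg\exists\, j \in \mathit{LIB}_{t(i)} :\ \mathit{pc}(j) < \mathit{pc}(i) \land \mathit{BYPASS}[\mathit{op}(j)][\mathit{op}(i)] = \mathrm{No}
$$

$$
\begin{aligned}
\mathit{legalWrite}(r, w) \iff\ & \mathit{op}(w) = \mathrm{Write} \land \mathit{var}(w) = \mathit{var}(r) \\
\land\ & \neg\big(\exists\, w' \in \mathit{GIB} :\ \mathit{op}(w') = \mathrm{Write} \land \mathit{var}(w') = \mathit{var}(r) \land \mathit{time}(r) > \mathit{time}(w') \land \mathit{time}(w') > \mathit{time}(w)\big)
\end{aligned}
$$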
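The two SC rules translate almost literally into code. The sketch below implements `ready` for the enabling step and `legalWrite` for the filtering step. The `Instr` record, the boolean encoding of BYPASS (true means "may bypass"), and all names are assumptions for illustration, not UMM's Murphi source; Java 17 is assumed.

```java
import java.util.*;

/** Sketch of UMM's enabling (ready) and filtering (legalWrite) rules; names are hypothetical. */
final class UmmRules {
    enum Op { READ, WRITE }

    /** pc = position in the issuing thread's program; time = position in the GIB. */
    record Instr(int pc, Op op, String var, int time) {}

    /** bypass[earlier][later] == true lets a later op overtake an earlier one; SC allows none. */
    static boolean[][] scBypass() { return new boolean[Op.values().length][Op.values().length]; }

    /** Enabling: i is ready unless an earlier instruction in its LIB may not be bypassed. */
    static boolean ready(Instr i, List<Instr> lib, boolean[][] bypass) {
        for (Instr j : lib)
            if (j.pc() < i.pc() && !bypass[j.op().ordinal()][i.op().ordinal()]) return false;
        return true;
    }

    /** Filtering: r may read w unless another write to the same variable lies between them in the GIB. */
    static boolean legalWrite(Instr r, Instr w, List<Instr> gib) {
        if (w.op() != Op.WRITE || !w.var().equals(r.var())) return false;
        for (Instr w2 : gib)
            if (w2.op() == Op.WRITE && w2.var().equals(r.var())
                    && r.time() > w2.time() && w2.time() > w.time()) return false;
        return true;
    }

    public static void main(String[] args) {
        Instr w = new Instr(0, Op.WRITE, "a", -1), r = new Instr(1, Op.READ, "b", -1);  // time -1: not yet in GIB
        System.out.println("read ready under SC: " + ready(r, List.of(w, r), scBypass()));
    }
}
```

With the all-false SC table, a later read is never ready while an earlier write is still in the LIB; setting the single WRITE→READ entry to true lets it bypass, which is the sense in which the table configures program order.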
Non-Operational Approach: Nemos (Non-operational yet Executable Memory Ordering Specifications)
Desired features and solutions:
• Easy to understand, flexible: declarative (axiomatic)
• Precise: predicate logic
• Compositional, modular: "higher-order" logic
• Executable: make "hidden" rules explicit
Key insights:
(1) Make the rules higher order: pass the order relation down through all the rules
  - Compositional, reusable, scalable, easy to compare
(2) Make all rules explicit
  - Executable using a constraint-programming system
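Nemos Example: Sequential Consistency
Formal definition of SC (ops is the execution; order is the ordering relation):

$$
\begin{aligned}
\mathit{legal}\ \mathit{ops}\ \mathit{order} \equiv\ & \mathit{requireProgramOrder}\ \mathit{ops}\ \mathit{order} && \text{(program order)}\\
\land\ & \mathit{requireReadValue}\ \mathit{ops}\ \mathit{order} && \text{(a read sees the latest write)}\\
\land\ & \mathit{requireWeakTotalOrder}\ \mathit{ops}\ \mathit{order} && \text{(common total order)}\\
\land\ & \mathit{requireTransitiveOrder}\ \mathit{ops}\ \mathit{order} && \\
\land\ & \mathit{requireAsymmetricOrder}\ \mathit{ops}\ \mathit{order} &&
\end{aligned}
$$

$$
\mathit{requireProgramOrder}\ \mathit{ops}\ \mathit{order} \equiv \forall\, i, j \in \mathit{ops}.\ \big((t\,i = t\,j \land \mathit{pc}\,i < \mathit{pc}\,j) \lor (t\,i = t_{\mathit{init}} \land t\,j \neq t_{\mathit{init}})\big) \Rightarrow \mathit{order}\ i\ j
$$

$$
\mathit{requireTransitiveOrder}\ \mathit{ops}\ \mathit{order} \equiv \forall\, i, j, k \in \mathit{ops}.\ (\mathit{order}\ i\ j \land \mathit{order}\ j\ k) \Rightarrow \mathit{order}\ i\ k
$$

• order is repeatedly refined as it is passed through every rule
• Hidden rules (such as transitivity) are explicit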
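Because every rule takes the same ops and order, the definition can be checked directly against a candidate order. The sketch below encodes order as a Boolean matrix and writes each rule, including the usually hidden ones, as an explicit check. The `Op` record, `INIT` standing for t_init, and the `fromSequence` helper are hypothetical; Java 17 is assumed.

```java
import java.util.*;

/** Sketch of the Nemos-style SC definition: legal is a conjunction of rules over (ops, order). */
final class NemosSc {
    static final int INIT = -1;                                   // t_init: thread of the initial writes

    record Op(int thread, int pc, boolean isWrite, String var, int value) {}

    static boolean requireProgramOrder(List<Op> ops, boolean[][] order) {
        for (int i = 0; i < ops.size(); i++)
            for (int j = 0; j < ops.size(); j++) {
                Op a = ops.get(i), b = ops.get(j);
                boolean po = (a.thread() == b.thread() && a.pc() < b.pc())
                          || (a.thread() == INIT && b.thread() != INIT);
                if (po && !order[i][j]) return false;
            }
        return true;
    }

    static boolean requireTransitiveOrder(List<Op> ops, boolean[][] order) {
        int n = ops.size();
        for (int i = 0; i < n; i++) for (int j = 0; j < n; j++) for (int k = 0; k < n; k++)
            if (order[i][j] && order[j][k] && !order[i][k]) return false;
        return true;
    }

    static boolean requireAsymmetricOrder(List<Op> ops, boolean[][] order) {
        for (int i = 0; i < ops.size(); i++) for (int j = 0; j < ops.size(); j++)
            if (order[i][j] && order[j][i]) return false;        // also rules out order i i
        return true;
    }

    static boolean requireWeakTotalOrder(List<Op> ops, boolean[][] order) {
        for (int i = 0; i < ops.size(); i++) for (int j = 0; j < ops.size(); j++)
            if (i != j && !order[i][j] && !order[j][i]) return false;
        return true;
    }

    /** Each read returns the value of a preceding write to its variable with no write in between. */
    static boolean requireReadValue(List<Op> ops, boolean[][] order) {
        for (int r = 0; r < ops.size(); r++) {
            Op rd = ops.get(r);
            if (rd.isWrite()) continue;
            boolean ok = false;
            for (int w = 0; w < ops.size() && !ok; w++) {
                Op wr = ops.get(w);
                if (!wr.isWrite() || !wr.var().equals(rd.var()) || wr.value() != rd.value() || !order[w][r]) continue;
                ok = true;
                for (int w2 = 0; w2 < ops.size(); w2++) {
                    Op o = ops.get(w2);
                    if (o.isWrite() && o.var().equals(rd.var()) && order[w][w2] && order[w2][r]) ok = false;
                }
            }
            if (!ok) return false;
        }
        return true;
    }

    static boolean legal(List<Op> ops, boolean[][] order) {
        return requireProgramOrder(ops, order) && requireReadValue(ops, order)
            && requireWeakTotalOrder(ops, order) && requireTransitiveOrder(ops, order)
            && requireAsymmetricOrder(ops, order);
    }

    /** Builds order[i][j] = true iff op i appears before op j in the given sequence of indices. */
    static boolean[][] fromSequence(int n, int... seq) {
        boolean[][] order = new boolean[n][n];
        for (int x = 0; x < seq.length; x++)
            for (int y = x + 1; y < seq.length; y++) order[seq[x]][seq[y]] = true;
        return order;
    }

    public static void main(String[] args) {
        List<Op> ops = List.of(new Op(INIT, 0, true, "a", 0), new Op(INIT, 0, true, "b", 0),
                new Op(0, 0, true, "a", 1), new Op(0, 1, true, "b", 1),
                new Op(1, 0, false, "b", 1), new Op(1, 1, false, "a", 1));
        System.out.println(legal(ops, fromSequence(6, 0, 1, 2, 3, 4, 5)));
    }
}
```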
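The Itanium Memory Ordering Rules

$$
\begin{aligned}
\mathit{legal}\ \mathit{ops}\ \mathit{order} \equiv\ & \mathit{requireLinearOrder}\ \mathit{ops}\ \mathit{order} \land \mathit{requireWriteOperationOrder}\ \mathit{ops}\ \mathit{order}\\
\land\ & \mathit{requirePO}\ \mathit{ops}\ \mathit{order} \land \mathit{requireMemoryDataDependence}\ \mathit{ops}\ \mathit{order}\\
\land\ & \mathit{requireDataFlowDependence}\ \mathit{ops}\ \mathit{order} \land \mathit{requireCoherence}\ \mathit{ops}\ \mathit{order}\\
\land\ & \mathit{requireReadValue}\ \mathit{ops}\ \mathit{order} \land \mathit{requireAtomicWBRelease}\ \mathit{ops}\ \mathit{order}\\
\land\ & \mathit{requireNoUCBypass}\ \mathit{ops}\ \mathit{order}
\end{aligned}
$$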
Specification Hierarchy for Itanium
• requireCoherence
  - Local/Local case
  - Remote/Remote case
• requireReadValue
  - ValidWr, ValidLocalWr, ValidRemoteWr, ValidDefaultWr, ValidRd
• requireAtomicWBRelease
• requireSequentialUC
  - RAR, RAW, WAR, WAW rules
• requireNoUCBypass
• requireLinearOrder
  - Irreflexive, Transitive, Total, Asymmetric
• requireWriteOperationOrder
  - Local/Remote case
  - Remote/Remote case
• requireProgramOrder
  - Acquire rule, Release rule, Fence rule
• requireMemoryDataDependence
  - MD:RAW, MD:WAR, MD:WAW
• requireDataFlowDependence
  - DF:RAW, DF:WAR, DF:WAW
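How to Make an Axiomatic Specification Executable?
Figure: a test program and the memory model specification are translated into constraints for a solver (CLP, SAT, or QBF); the outcome is either a satisfying order or UNSAT.
Execution validation:

$$
\mathit{validateExecution}\ \mathit{ops} \equiv \exists\, \mathit{order}.\ \mathit{legal}\ \mathit{ops}\ \mathit{order}
$$

• Effective for revealing critical properties
• Effective for verifying common programming patterns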
Using Constraint Logic Programming (CLP)
• Implementation in FD-Prolog is straightforward
  - Universal quantification is handled via enumeration
  - Existential quantification is handled via backtracking
• Built-in constraint solver from FD-Prolog:
  - logical variables
  - finite-domain (FD) variables
How to Encode the Ordering Relation?
The n×n encoding method:
• Given a test program with N operations, use a 2D precedence matrix M with N² constraint variables
  - M_ij = 1: i is ordered before j
  - M_ij = 0: i is not ordered before j
  - M_ij = x: value not bound yet
• Interpret the symbolic execution and impose constraints on the matrix
• When interpretation finishes, the remaining x values reveal the latitude in the weak order
• Once an x has become 1, a later attempt to set it to 0 triggers backtracking
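A rough sketch of the matrix method, written as a small hand-rolled solver rather than FD-Prolog or a SAT solver: ordering axioms are propagated whenever an entry is bound, an unbound x is guessed as 1 and then 0, and a conflicting assignment backtracks. For brevity the read-value rule is only checked once the matrix is fully bound (a CLP implementation would constrain it earlier). `PrecedenceMatrix`, `validate`, and the `Op` record are hypothetical names; the rules are those of SC; Java 17 is assumed.

```java
import java.util.*;

/** Sketch of the N x N precedence-matrix method with propagation and backtracking (SC rules). */
final class PrecedenceMatrix {
    static final int X = -1;     // unbound entry ("x")
    static final int INIT = -1;  // thread id used for initial writes (t_init)

    record Op(int thread, int pc, boolean isWrite, String var, int value) {}

    /** Binds m[i][j] = v; binding an entry that already holds the other value is a conflict. */
    static boolean set(int[][] m, int i, int j, int v, Deque<int[]> work) {
        if (m[i][j] == v) return true;
        if (m[i][j] != X) return false;
        m[i][j] = v;
        work.push(new int[]{i, j});
        return true;
    }

    /** Propagates asymmetry, totality and transitivity until nothing changes. */
    static boolean propagate(int[][] m, Deque<int[]> work) {
        while (!work.isEmpty()) {
            int[] e = work.pop();
            int i = e[0], j = e[1];
            if (m[i][j] == 1) {
                if (!set(m, j, i, 0, work)) return false;                       // asymmetric
                for (int k = 0; k < m.length; k++) {
                    if (m[j][k] == 1 && !set(m, i, k, 1, work)) return false;   // i<j, j<k => i<k
                    if (m[k][i] == 1 && !set(m, k, j, 1, work)) return false;   // k<i, i<j => k<j
                }
            } else if (!set(m, j, i, 1, work)) return false;                     // total: not i<j => j<i
        }
        return true;
    }

    /** Read-value rule on a fully bound matrix: each read sees the latest preceding write. */
    static boolean readValueOk(List<Op> ops, int[][] m) {
        for (int r = 0; r < ops.size(); r++) {
            Op rd = ops.get(r);
            if (rd.isWrite()) continue;
            boolean ok = false;
            for (int w = 0; w < ops.size() && !ok; w++) {
                Op wr = ops.get(w);
                if (!wr.isWrite() || !wr.var().equals(rd.var()) || wr.value() != rd.value() || m[w][r] != 1) continue;
                ok = true;
                for (int w2 = 0; w2 < ops.size(); w2++) {
                    Op o = ops.get(w2);
                    if (o.isWrite() && o.var().equals(rd.var()) && m[w][w2] == 1 && m[w2][r] == 1) ok = false;
                }
            }
            if (!ok) return false;
        }
        return true;
    }

    /** Guesses the first x as 1, then 0; if both guesses fail, backtrack. */
    static int[][] search(List<Op> ops, int[][] m) {
        for (int i = 0; i < m.length; i++)
            for (int j = 0; j < m.length; j++) {
                if (m[i][j] != X) continue;
                for (int guess : new int[]{1, 0}) {
                    int[][] c = new int[m.length][];
                    for (int r = 0; r < m.length; r++) c[r] = m[r].clone();
                    Deque<int[]> work = new ArrayDeque<>();
                    if (set(c, i, j, guess, work) && propagate(c, work)) {
                        int[][] found = search(ops, c);
                        if (found != null) return found;
                    }
                }
                return null;
            }
        return readValueOk(ops, m) ? m : null;
    }

    /** validateExecution: returns an instantiated order, or null if no legal order exists. */
    static int[][] validate(List<Op> ops) {
        int n = ops.size();
        int[][] m = new int[n][n];
        for (int i = 0; i < n; i++) { Arrays.fill(m[i], X); m[i][i] = 0; }   // irreflexive
        Deque<int[]> work = new ArrayDeque<>();
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++) {
                Op a = ops.get(i), b = ops.get(j);
                boolean po = (a.thread() == b.thread() && a.pc() < b.pc())
                          || (a.thread() == INIT && b.thread() != INIT);
                if (po && !set(m, i, j, 1, work)) return null;                // program order
            }
        return propagate(m, work) ? search(ops, m) : null;
    }
}
```

On the store/load litmus test, `validate` returns an instantiated (total) order for r1 = 1, r2 = 1 and `null` for r1 = 1, r2 = 0, since no SC order satisfies both reads.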
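Example of Prolog Implementation
Formal specification (e.g., requireProgramOrder):

$$
\mathit{requireProgramOrder}\ \mathit{ops}\ \mathit{order} \equiv \forall\, i, j \in \mathit{ops}.\ \big((t\,i = t\,j \land \mathit{pc}\,i < \mathit{pc}\,j) \lor (t\,i = t_{\mathit{init}} \land t\,j \neq t_{\mathit{init}})\big) \Rightarrow \mathit{order}\ i\ j
$$

SICStus Prolog code:

    % for_each_elem/3, matrix_elem/5, p/2 (assumed: the thread of an op) and pc/2 are
    % helper predicates of the tool (not shown); t_init is encoded as thread 0.
    requireProgramOrder(Ops, Order) :-
        for_each_elem(Ops, Order, doProgramOrder).

    elem_prog(doProgramOrder, Ops, Order, I, J) :-
        nth(I, Ops, Oi), nth(J, Ops, Oj),
        p(Oi, T_i), p(Oj, T_j),
        pc(Oi, PC_i), pc(Oj, PC_j),
        length(Ops, N),
        matrix_elem(Order, N, I, J, Oij),
        ((T_i #= T_j #/\ PC_i #< PC_j) #\/ (T_i #= 0 #/\ T_j #\= 0)) #=> Oij.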
Interactive and Incremental Analysis
Itanium test program (initially a = b = 0):
    P1: st a,1;  st b,1;
    P2: ld r1,b; <1>  ld r2,a; <0>
Can r1 = 1 and r2 = 0?
Execution (ops), with stores decomposed into local and remote operations:
    P1: (1) st_local(a,1); (2) st_remote1(a,1); (3) st_remote2(a,1);
        (4) st_local(b,1); (5) st_remote1(b,1); (6) st_remote2(b,1);
    P2: (7) ld(1,b); (8) ld(0,a);
Figure: the 8×8 precedence matrix for ops after constraint propagation (entries 1, 0, or x), and an instantiated order satisfying all constraints.
Result: legal. Interleaving: 8 4 5 6 7 1 2 3.
The SAT/QBF Approach
• Initially, we "retro-fitted" our Prolog version with SAT-generating code
  - Showed a speed improvement in constraint solving, BUT ...
  - Still slow in CNF generation
  - Very difficult to debug
• So we re-engineered our tool (done by Prof. Ganesh Gopalakrishnan):
  - "Stamping out" a finite execution as a QBF formula
  - "Stamping out" a finite execution as a CNF formula
  - Experimenting with different encoding methods: nn vs. n log n
  - Checkpointing SAT generation
Gist of Results
• SAT seems to be better than QBF
• The nn encoding method is better than n log n
  - despite using more bits
  - many unit clauses, which is good for SAT solving
• The checkpointing method pays off up to 64 tuples
• We can easily handle 128 operations
• Latest result: completed an Intel-provided test run (experiment done by Hemanthkumar Sivaraj)
  - the test contains 500 Itanium memory operations
  - had to suppress the total-order constraint (UNSAT)
  - takes 10 sec to generate the SAT instance and 0.1 sec to solve it
  - still lots of room for improvement
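To put "despite using more bits" in numbers, assume the nn encoding uses one Boolean variable per ordered pair (i, j) and the n log n encoding gives each operation a ⌈log₂ N⌉-bit position whose comparison defines the order (an assumption about the encodings, not a description of the tool). For the 128-operation case:

$$
N^2 = 128^2 = 16384 \qquad\text{vs.}\qquad N\,\lceil \log_2 N \rceil = 128 \cdot 7 = 896
$$

Under that assumption the nn encoding spends about 18 times as many variables, but each ordering fact is a single variable, so rules such as program order become unit clauses, whereas comparing bit-vector positions requires comparator clauses with auxiliary variables.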
How to Verify Programmer Expectations?
Figure: program semantics, memory model semantics, and program properties (e.g., race, atomicity) for a test program are translated into constraints for a SAT solver (SAT or UNSAT).
(1) Define both intra-thread and inter-thread semantics as constraints
(2) Model correctness properties as additional constraints
(3) Reduce a verification problem to a constraint satisfaction problem and solve it automatically
Race Detection
• What's a data race?
  - Informally: conflicting and concurrent accesses
Is this program race-free? Initially, a = b = 0. Are the two writes conflicting and concurrent?

    // Thread 1
    r1 = a;
    if (r1 > 0)
        b = 1;

    // Thread 2
    r2 = b;
    if (r2 > 0)
        a = 1;

• Control flow is interwoven with memory consistency requirements
• Hence, the answer depends on the memory model
  - Under SC, this program is race-free
  - Under a weaker model, this program might contain races
Constraints for Control Flow
• Treat control operations like memory operations
  - Imagine "assigns" and "uses" of "control variables"
• Add an auxiliary control variable c_k for each branch statement k, and convert the if-statement into an auxiliary assignment to c_k
  - E.g., if (r1 > 0) becomes c1 = (r1 > 0)
• Every op k has a path predicate ctrExpr
  - k is a use of the control variables in its ctrExpr
  - k is feasible if its ctrExpr evaluates to true
• The feasibility of ops is checked when the rules are imposed
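The control-variable idea can be illustrated on the race example. In the sketch below each branch becomes an auxiliary assignment c_t = (r_t > 0), and the store executes only when its path predicate c_t is true; an SC interleaving search then reports whether any store is ever feasible. The class name, the fixed two-thread program, and the `anyStoreFeasible` parameters are hypothetical; Java 17 is assumed.

```java
/**
 * Sketch: "if (r > 0) x = 1" becomes c = (r > 0) plus a store guarded by path predicate c.
 * Thread 0: r0 = a; c0 = (r0 > 0); [c0] b = 1.   Thread 1: r1 = b; c1 = (r1 > 0); [c1] a = 1.
 */
final class ControlFlowSc {
    static final int A = 0, B = 1;                         // indices of shared variables a and b

    /** True if some SC interleaving makes a guarded store feasible (its c evaluates to true). */
    static boolean anyStoreFeasible(int initA, int initB) {
        return explore(new int[]{0, 0}, new int[]{initA, initB}, new int[2], new boolean[2]);
    }

    private static boolean explore(int[] pc, int[] mem, int[] reg, boolean[] ctl) {
        boolean feasible = false;
        for (int t = 0; t < 2; t++) {
            if (pc[t] == 3) continue;                          // thread t has finished
            int[] pc2 = pc.clone(), mem2 = mem.clone(), reg2 = reg.clone();
            boolean[] ctl2 = ctl.clone();
            int readVar = t == 0 ? A : B, writeVar = t == 0 ? B : A;
            switch (pc[t]) {
                case 0 -> reg2[t] = mem2[readVar];             // global read: r_t = a (or b)
                case 1 -> ctl2[t] = reg2[t] > 0;               // auxiliary control assignment c_t
                default -> {                                    // store guarded by path predicate c_t
                    if (ctl2[t]) { mem2[writeVar] = 1; feasible = true; }
                }
            }
            pc2[t]++;
            feasible |= explore(pc2, mem2, reg2, ctl2);
        }
        return feasible;
    }

    public static void main(String[] args) {
        System.out.println("store feasible under SC: " + anyStoreFeasible(0, 0));
    }
}
```

With a = b = 0 no store is feasible in any SC interleaving, so both threads only read and the program is race-free under SC; starting from a = 1 instead makes the store to b feasible.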
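Data and Control Dependence
Data and control flow can be treated like the global read-value rule, i.e., a read should see the "latest" write:
• Global reads: for every r = x, there exists a write x = …
• Local reads: for every x = r, there exists an assignment r = …
• Control reads: for every op that depends on c, there exists an assignment c = …

$$
\mathit{requireReadValue}\ \mathit{ops}\ \mathit{order} \equiv \mathit{globalReadValue}\ \mathit{ops}\ \mathit{order} \land \mathit{localReadValue}\ \mathit{ops}\ \mathit{order} \land \mathit{controlReadValue}\ \mathit{ops}\ \mathit{order}
$$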
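How to Formalize a Data Race?

$$
\begin{aligned}
\mathit{detectDataRace}\ \mathit{ops} \equiv\ & \exists\, \mathit{scOrder}, \mathit{hbOrder}.\ \mathit{legalSC}\ \mathit{ops}\ \mathit{scOrder} \land \mathit{requireHbOrder}\ \mathit{ops}\ \mathit{hbOrder}\\
& \land\ \mathit{mapConstraints}\ \mathit{ops}\ \mathit{hbOrder}\ \mathit{scOrder} \land \mathit{existDataRace}\ \mathit{ops}\ \mathit{hbOrder}\\[4pt]
\mathit{requireHbOrder}\ \mathit{ops}\ \mathit{hbOrder} \equiv\ & \mathit{requireProgramOrder}\ \mathit{ops}\ \mathit{hbOrder} \land \mathit{requireSyncOrder}\ \mathit{ops}\ \mathit{hbOrder} \land \mathit{requireTransitiveOrder}\ \mathit{ops}\ \mathit{hbOrder}\\[4pt]
\mathit{existDataRace}\ \mathit{ops}\ \mathit{hbOrder} \equiv\ & \exists\, i, j \in \mathit{ops}.\ \mathit{conflictingAccess}\ i\ j \land \neg(\mathit{hbOrder}\ i\ j) \land \neg(\mathit{hbOrder}\ j\ i)
\end{aligned}
$$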
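On a single concrete execution, existDataRace reduces to a reachability check on hbOrder. The sketch below simplifies the formulation: synchronization edges are passed in explicitly instead of being derived from an SC order through mapConstraints, and it checks one execution rather than searching over all of them. The `Access` record (with an `isSync` flag for lock-style operations) and the class name are hypothetical; Java 17 is assumed.

```java
import java.util.*;

/** Sketch of existDataRace for one execution: hb = closure of program order and sync edges. */
final class HbRace {
    /** isSync marks lock/unlock-style operations, which never count as data accesses. */
    record Access(int thread, int pc, String var, boolean isWrite, boolean isSync) {}

    static boolean conflictingAccess(Access a, Access b) {
        return !a.isSync() && !b.isSync() && a.thread() != b.thread()
            && a.var().equals(b.var()) && (a.isWrite() || b.isWrite());
    }

    /** syncEdges holds {from, to} index pairs, e.g. an unlock followed by the next lock of the same lock. */
    static boolean existDataRace(List<Access> ops, List<int[]> syncEdges) {
        int n = ops.size();
        boolean[][] hb = new boolean[n][n];
        for (int i = 0; i < n; i++)                                   // requireProgramOrder
            for (int j = 0; j < n; j++)
                hb[i][j] = ops.get(i).thread() == ops.get(j).thread() && ops.get(i).pc() < ops.get(j).pc();
        for (int[] e : syncEdges) hb[e[0]][e[1]] = true;              // requireSyncOrder
        for (int k = 0; k < n; k++)                                   // requireTransitiveOrder (Warshall closure)
            for (int i = 0; i < n; i++)
                for (int j = 0; j < n; j++)
                    if (hb[i][k] && hb[k][j]) hb[i][j] = true;
        for (int i = 0; i < n; i++)                                   // conflicting and unordered both ways
            for (int j = i + 1; j < n; j++)
                if (conflictingAccess(ops.get(i), ops.get(j)) && !hb[i][j] && !hb[j][i]) return true;
        return false;
    }

    public static void main(String[] args) {
        List<Access> ops = List.of(new Access(0, 0, "x", true, false), new Access(0, 1, "m", true, true),
                                   new Access(1, 0, "m", true, true), new Access(1, 1, "x", false, false));
        System.out.println("race without sync edge: " + existDataRace(ops, List.of()));
        System.out.println("race with unlock->lock edge: " + existDataRace(ops, List.of(new int[]{1, 2})));
    }
}
```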
Atomicity Verification
• What's atomicity?
  - Informally: a block of code that executes atomically
  - Neither a necessary nor a sufficient condition for race-freedom
• Our approach:
  - Annotate the atomic block with AtomicEnter and AtomicExit
  - Verify it automatically
  - Our definition is generic and can be fine-tuned
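Constraints for Atomicity

$$
\begin{aligned}
\mathit{verifyAtomicity}\ \mathit{ops} \equiv\ & \exists\, \mathit{order}.\ \mathit{legalSC}\ \mathit{ops}\ \mathit{order} \land \mathit{existsAtomicityViolation}\ \mathit{ops}\ \mathit{order}\\[4pt]
\mathit{existsAtomicityViolation}\ \mathit{ops}\ \mathit{order} \equiv\ & \exists\, i, j, k \in \mathit{ops}.\ \mathit{matchedAtomicPair}\ i\ j \land (t\,k \neq t\,i) \land \neg(\mathit{order}\ k\ i) \land \neg(\mathit{order}\ j\ k)
\end{aligned}
$$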
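For one SC execution listed as a sequence (a total order), ¬(order k i) ∧ ¬(order j k) simply says that k lies strictly between the matched AtomicEnter i and AtomicExit j. The sketch below checks exactly that for a single given sequence; it does not perform the existential search over all SC orders, it assumes atomic blocks do not nest, and the `Event`/`Kind` types are hypothetical (Java 17 assumed).

```java
import java.util.*;

/** Sketch of existsAtomicityViolation on one total (SC) order given as a sequence of events. */
final class AtomicityCheck {
    enum Kind { ENTER, EXIT, OTHER }

    record Event(int thread, Kind kind) {}

    static boolean existsAtomicityViolation(List<Event> seq) {
        for (int i = 0; i < seq.size(); i++) {
            if (seq.get(i).kind() != Kind.ENTER) continue;
            int t = seq.get(i).thread(), j = i + 1;
            while (j < seq.size() && !(seq.get(j).thread() == t && seq.get(j).kind() == Kind.EXIT)) j++;
            if (j == seq.size()) continue;                  // no matching AtomicExit: not a matched pair
            for (int k = i + 1; k < j; k++)                 // k strictly between i and j in the order
                if (seq.get(k).thread() != t) return true;  // ... and issued by another thread
        }
        return false;
    }

    public static void main(String[] args) {
        List<Event> seq = List.of(new Event(0, Kind.ENTER), new Event(1, Kind.OTHER), new Event(0, Kind.EXIT));
        System.out.println("violation: " + existsAtomicityViolation(seq));
    }
}
```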
Conclusion
• My thesis addressed the following issues:
  - How to make memory ordering rules clear and executable?
  - How to analyze thread executions against these rules?
• Our methods have been shown to be practical
  - Applied to a wide range of academic memory models as well as real-world models (Itanium, JMM)
  - Validation of test cases far exceeded others' in both speed and scale
  - Being applied to post-silicon verification in industry
• Many "customers" can benefit from our methods
  - Software developers, compiler writers, system designers
Publications
Operational Specification Method
• Analyzing the CRF Java Memory Model (APSEC'01)
• Specifying Java Thread Semantics Using a Uniform Memory Model (JGI'02)
• UMM: An Operational Memory Model Specification Framework with Integrated Model Checking Capability (CCPE)
Axiomatic Specification Method and Constraint Solving Method
• Analyzing the Intel Itanium Memory Ordering Rules Using Logic Programming and SAT (CHARME'03)
• Nemos: A Framework for Axiomatic and Executable Specifications of Memory Consistency Models (IPDPS'04)
• A Constraint-Based Approach for Specifying Memory Consistency Models (sent to TPLP)
• QB or not QB: An Efficient Execution Verification Tool for Memory Orderings (sent to CAV)
Concurrency Analysis
• Rigorous Concurrency Analysis of Multithreaded Programs (sent to ISSTA)
Continuing Research Opportunities
• Scale up our approach even further
  - Give up certain precision
  - Compositional methods
  - Create an assertion language to help abstraction
• Improve solving algorithms
  - Exploit structural information
• "Memory-model-sensitive" compilers
  - Code synthesis, optimization
• Other application domains
  - Security, embedded systems
Thank You!
The dissertation is available at http://www.cs.utah.edu/~yyang/papers/thesis.pdf
The prototype tools are available at http://www.cs.utah.edu/~yyang/research.html