This research focuses on the deductive verification of advanced out-of-order microprocessors, drawing on deductive methods and model checking techniques. It examines what makes processor verification complex and aims to automate proofs so that users are relieved of interactive theorem proving.
Deductive Verification of Advanced Out-of-Order Microprocessors
Shuvendu K. Lahiri, Randal E. Bryant
Carnegie Mellon University
[Figure: OOO Processor Model. Components: PC Unit, Branch Predictor, Instruction Mem, Decode, Register Rename Unit, Reorder Buffer (head, tail; per-entry fields src1valid, src1val, src1tag, src2valid, src2val, src2tag, dest, type, pc, target, predict, imm, epc), Arithmetic Unit, Branch Unit, Memory Unit (lsq, stq), Mem, and Result Bus.]
Complexity of Out-of-Order Processor Verification
• Unbounded data
  • Integer data paths
• Parameterized computation
  • Uninterpreted functions and predicates: ALU, ExceptionRaise?, decoding logic
• Unbounded data structures
  • Memory
  • Ordered data structures
• Highly concurrent
  • Retire, execute, and dispatch happen concurrently
• Proving sequential semantics
  • With respect to an Instruction Set Architecture (ISA)
Related Work
• Deductive methods
  • Theorem-prover based: Hosabettu et al. and Sawada et al.
  • Large proof scripts
  • Manual intervention needed to discharge the proofs
  • Use the “flushing” technique
• Compositional model checking based: McMillan et al.
  • Does not apply to deep or superscalar processors
  • Exploits symmetry in the design
  • User decomposes the proof
  • Does not need auxiliary invariants
Earlier Work
• Lahiri, Seshia and Bryant, FMCAD’02
  • Modeling and verification of out-of-order processors
• Simple out-of-order execution unit
  • Only arithmetic instructions
• All proof obligations handled by the decision procedure for UCLID
This Work
• Apply the earlier work to more complex designs
  • Handle speculation and exceptions
  • Memory instructions, store forwarding, etc.
  • Superscalar out-of-order processors
• Can we model the new components in UCLID?
  • Load-store queues, exceptions
• Is refinement-based deductive verification feasible?
  • Earlier deductive methods use the Burch-Dill technique: a recursive “flushing” function
  • Arons & Pnueli use “refinement” for simpler models
• Can we retain the automation of proofs?
  • Relieve the user from interactively proving theorems
Access Modes for Reorder Buffer
[Figure: reorder buffer between head and tail; Dispatch inserts at the tail, Retire removes at the head, and the ALU executes an entry and broadcasts its result on the result bus.]
• FIFO
  • Insert on dispatch
  • Remove on retire
• Content addressable
  • Broadcast the result to all entries with a matching source tag
• Directly addressable
  • Select a particular entry for execution
  • Retrieve the result value from an executed instruction
• Global
  • Flush all queue entries when the instruction at the head causes an exception
CLU: Logic of UCLID
• Terms (T): integer expressions
  • $\mathrm{ITE}(F, T_1, T_2)$: if-then-else
  • $\mathit{Fun}(T_1, \ldots, T_k)$: function application
  • $\mathrm{succ}(T)$: increment
  • $\mathrm{pred}(T)$: decrement
• Formulas (F): Boolean expressions
  • $\neg F$, $F_1 \land F_2$, $F_1 \lor F_2$: Boolean connectives
  • $T_1 = T_2$: equation
  • $T_1 < T_2$: inequality
  • $P(T_1, \ldots, T_k)$: predicate application
• Functions (Fun): $\mathrm{Integer}^k \to \mathrm{Integer}$
  • $f$: uninterpreted function symbol
  • $\lambda x_1, \ldots, x_k .\, T$: function definition
• Predicates (P): $\mathrm{Integer}^k \to \mathrm{Boolean}$
  • $p$: uninterpreted predicate symbol
  • $\lambda x_1, \ldots, x_k .\, F$: predicate definition
Modeling Memories with λ's
• Memory M modeled as a function
  • $M(a)$: value at location $a$
• Initially: arbitrary state
  • Modeled by the uninterpreted function $m_0$
• Writing transforms memory
  • $M' = \mathrm{Write}(M, wa, wd) = \lambda a .\, \mathrm{ITE}(a = wa, wd, M(a))$
  • Future reads of address $wa$ will get $wd$
[Figure: M' returns wd when a = wa and otherwise delegates to M, which ultimately falls back to m0.]
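The λ-based memory model above can be mimicked with ordinary closures. The following is a minimal Python sketch, not UCLID syntax: `Memory`, `m0`, and `write` are hypothetical names, and `m0` is an arbitrary fixed function standing in for the uninterpreted initial memory.

```python
from typing import Callable

# Hypothetical Python analogue of a lambda-modeled memory: a memory is a
# function from addresses to values, never a mutable array.
Memory = Callable[[int], int]


def m0(a: int) -> int:
    """Stand-in for the uninterpreted initial memory m0 (any fixed function works)."""
    return 1000 + a


def write(M: Memory, wa: int, wd: int) -> Memory:
    """Write(M, wa, wd) = lambda a. ITE(a = wa, wd, M(a))."""
    return lambda a: wd if a == wa else M(a)


M1 = write(m0, 5, 42)   # store 42 at address 5
M2 = write(M1, 7, 99)   # then store 99 at address 7
print(M2(5), M2(7), M2(3))  # 42 99 1003
```

Each write wraps the previous memory instead of mutating it, so a read walks back through the chain of `ITE`s until it hits a matching write or reaches `m0`.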
Modeling Parallel Updates
• Simultaneous-update memories
  • Update an arbitrary subset of entries in the same step
  • Useful for modeling the reorder buffer: forwarding data to all dependent instructions
• $\mathit{next}[M] := \lambda i .\, \mathrm{ITE}(P(i), D(i), M(i))$
  • If entry $i$ satisfies predicate $P(i)$, it is updated with $D(i)$
[Figure: entries M(i+1), M(i+2), …, M(j+1), M(j+3), for which P holds, are replaced by D(i+1), D(i+2), …, D(j+1), D(j+3); all other entries keep their old values.]
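A parallel update is a single λ over all entries. In this hypothetical Python sketch, `parallel_update` builds `next[M]` from `M`, `P`, and `D`, and the tag-matching example (with made-up names `src1tag` and `src1val`) imitates a result-bus broadcast to dependent entries.

```python
from typing import Callable

Memory = Callable[[int], int]
Pred = Callable[[int], bool]


def parallel_update(M: Memory, P: Pred, D: Memory) -> Memory:
    """next[M] := lambda i. ITE(P(i), D(i), M(i)); every entry satisfying P changes at once."""
    return lambda i: D(i) if P(i) else M(i)


# Hypothetical example: the instruction with tag 3 broadcasts result 7 on the
# result bus, and every entry waiting on source tag 3 picks it up.
src1tag = {0: 3, 1: 5, 2: 3, 3: 1}


def src1val(i: int) -> int:
    return -1  # -1 marks "operand not yet available" in this sketch


next_src1val = parallel_update(src1val, lambda i: src1tag[i] == 3, lambda i: 7)
print([next_src1val(i) for i in range(4)])  # [7, -1, 7, -1]
```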
Modeling Unbounded FIFO Buffer
• Queue is a subrange of an infinite sequence
  • Q.head = h: index of the oldest element
  • Q.tail = t: index of the insertion location
  • Q.val = q: function mapping indices to values
  • q(i) valid only when $h \le i < t$
• Initial state: arbitrary queue
  • Q.head = $h_0$, Q.tail = $t_0$
  • Impose the constraint $h_0 \le t_0$
  • Q.val = $q_0$, an uninterpreted function
[Figure: increasing indices …, q(h–2), q(h–1) (already popped), q(h) at head, q(h+1), …, q(t–2), q(t–1), then q(t) at tail, q(t+1), … (not yet inserted).]
Modeling FIFO Buffer (cont.)
• Example step: op = PUSH, input = x
• $\mathit{next}[h] := \mathrm{ITE}(\mathit{operation} = \mathrm{POP}, \mathrm{succ}(h), h)$
• $\mathit{next}[t] := \mathrm{ITE}(\mathit{operation} = \mathrm{PUSH}, \mathrm{succ}(t), t)$
• $\mathit{next}[q] := \lambda i .\, \mathrm{ITE}(\mathit{operation} = \mathrm{PUSH} \land i = t, x, q(i))$
[Figure: after the PUSH, q(t) = x and next[t] = t + 1; h and all other entries are unchanged.]
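The three next-state equations can be exercised directly. Below is a minimal Python sketch assuming Python integers for the unbounded indices; `Fifo`, `step`, and `contents` are illustrative names, and the initial `q0` is just a constant function.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Fifo:
    h: int                   # Q.head: index of the oldest element
    t: int                   # Q.tail: index of the insertion location
    q: Callable[[int], int]  # Q.val: meaningful only for h <= i < t


def step(Q: Fifo, operation: str, x: int = 0) -> Fifo:
    """One transition following next[h], next[t], next[q].

    Like the model, this does not guard against POP on an empty queue;
    the environment must ensure h < t before a POP.
    """
    next_h = Q.h + 1 if operation == "POP" else Q.h
    next_t = Q.t + 1 if operation == "PUSH" else Q.t
    old_q, t = Q.q, Q.t  # capture the current state, not the next one
    next_q = lambda i: x if (operation == "PUSH" and i == t) else old_q(i)
    return Fifo(next_h, next_t, next_q)


def contents(Q: Fifo) -> list:
    return [Q.q(i) for i in range(Q.h, Q.t)]


Q = Fifo(h=10, t=10, q=lambda i: 0)  # h0 = t0 = 10, q0 arbitrary
Q = step(Q, "PUSH", 1)
Q = step(Q, "PUSH", 2)
Q = step(Q, "POP")
print(Q.h, Q.t, contents(Q))  # 11 12 [2]
```

Because `q` is only ever wrapped, never overwritten, popped entries remain in the function; they simply fall outside the valid range `h <= i < t`, matching the "already popped" region of the figure.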
Modeling Components of Processors
• Reorder Buffer
  • FIFO: instructions in program order
  • Parallel-update memory: updated from an executed instruction
  • Content addressable
• Load-Store Queue
  • FIFO
• Store Queue
  • FIFO
  • Associative lookup by content: find the latest entry containing an address
  • Flush part of the queue, but do not flush retired instructions
Verification Approach
• Extends the approach of FMCAD’02
  • Which worked with a simple OOO execution unit
  • No speculation or memory
• Deductive verification
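Deductive Verification
• $\delta$ is the state transition relation
• $F$ describes the initial states
• $p$ is the property to be proved
• $\varphi$ is an inductive invariant that implies $p$
• Proof obligations:
  • Prove $F \Rightarrow \varphi$
  • Prove $\varphi \land \delta \Rightarrow \varphi'$
  • Prove $\varphi \Rightarrow p$
• Together, these show that $p$ holds in every reachable state, so $p$ is proved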
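To make the three proof obligations concrete, the sketch below checks them by brute-force enumeration over a small finite state space. This is a swapped-in explicit-state technique for illustration only; UCLID discharges the same obligations symbolically with its decision procedure. The toy system, `check_inductive`, and all other names are hypothetical.

```python
from typing import Callable, Iterable


def check_inductive(states: Iterable[int],
                    init: Callable[[int], bool],
                    delta: Callable[[int, int], bool],
                    inv: Callable[[int], bool],
                    prop: Callable[[int], bool]) -> bool:
    """Check F => phi, (phi and delta) => phi', and phi => p by enumeration.
    The state space must contain every successor of an invariant state."""
    states = list(states)
    init_ok = all(inv(s) for s in states if init(s))           # F => phi
    step_ok = all(inv(s2) for s in states if inv(s)
                  for s2 in states if delta(s, s2))            # phi and delta => phi'
    prop_ok = all(prop(s) for s in states if inv(s))           # phi => p
    return init_ok and step_ok and prop_ok


# Toy system: x starts at 0 and grows by 2 until it reaches 10.
def init(x): return x == 0
def delta(x, y): return y == (x + 2 if x < 10 else x)
def p(x): return x != 7                          # property to prove
def phi(x): return x % 2 == 0 and 0 <= x <= 10   # strengthened invariant


print(check_inductive(range(21), init, delta, phi, p))  # True
print(check_inductive(range(21), init, delta, p, p))    # False: 5 -> 7 breaks p
```

The second call shows why strengthening matters: `p` holds in every reachable state, yet it is not inductive on its own, because the unreachable state 5 steps to 7.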
Restricted Invariants and Proofs
• Invariants of the form $\forall x_1 x_2 \ldots x_k .\, \Phi(x_1, \ldots, x_k)$
  • $\Phi(x_1, \ldots, x_k)$ is a CLU formula without quantifiers
  • $x_1, \ldots, x_k$ are integer variables free in $\Phi(x_1, \ldots, x_k)$
• Proving these invariants requires quantifiers:
  $\models (\forall x_1 x_2 \ldots x_k .\, \Phi(x_1, \ldots, x_k)) \Rightarrow \forall y_1 y_2 \ldots y_m .\, \Psi(y_1, \ldots, y_m)$
• Automatic instantiation of $x_1, \ldots, x_k$ with concrete terms
  • Sound but incomplete method
  • Reduces the quantified formula to a CLU formula
  • Can then use the decision procedure for CLU
Proving Correctness
• Refinement maps
  • Establish a relation between the OOO model and the sequential ISA model
  • One refinement map for each ISA-visible state element: register file, program counter, data memory
• Example
  • “If a register is not being modified in OOO, then it should have the same value as in the ISA”
Auxiliary Data Structures
• Shadow fields
  • “Predict” the correct value of OOO state elements
  • Updated by the ISA machine during DISPATCH
• Auxiliary fields
  • Needed to define a consistent internal state of the OOO model
  • Do not depend on the ISA machine
  • Usually additional maps
Adding Shadow State
• McMillan, ‘98; Arons & Pnueli, ‘99
• Provides a link between the ISA and OOO models
• Additional entries in the ROB
  • Do not affect OOO behavior
  • Generated when an instruction is dispatched
  • Predict the values of operands and result, taken from the ISA model
[Figure: ISA model (Reg. File, PC) supplying shadow entries to the OOO Reorder Buffer, alongside the OOO Reg. File and PC.]
Shadow States
• Operands and result of an instruction
  • Their correct values
• Shadow register rename unit
  • Latest non-speculative instruction to modify each register
• Shadow memory address map
  • Latest non-speculative instruction to modify each memory address
Auxiliary Structures
• Restricted invariant structure: $\forall x_1 x_2 \ldots x_k .\, \Phi(x_1, \ldots, x_k)$
• Adding complicated invariants
  • “For every non-executed memory instruction I in the ROB, there exists an entry in the Load-Store Queue (LSQ)”
  • Requires existential ($\exists$) properties
• Add an auxiliary structure as a witness for $\exists$
  • Add a map rob_lsq_ptr : $\mathrm{ROB} \to \mathrm{LSQ}$
  • “For every non-executed memory instruction I in the ROB, rob_lsq_ptr(I) is present in the LSQ”
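The effect of the witness map can be shown on a single concrete state. This Python sketch, with a hypothetical `Snapshot` layout, contrasts the existential form of the invariant, which must search the LSQ, with the purely universal form that reads the witness from `rob_lsq_ptr`. The extra check that `lsq_rob_ptr` points back to `t` is an assumption added here so that the witness form implies the existential one.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    rob_head: int
    rob_tail: int
    lsq_head: int
    lsq_tail: int
    is_mem: dict        # ROB tag -> is a memory instruction
    executed: dict      # ROB tag -> has executed
    lsq_rob_ptr: dict   # LSQ index -> ROB tag (already part of the model)
    rob_lsq_ptr: dict   # auxiliary witness map: ROB tag -> LSQ index


def pending_mem(s: Snapshot, t: int) -> bool:
    return s.is_mem[t] and not s.executed[t]


def inv_exists(s: Snapshot) -> bool:
    """forall t in ROB. pending_mem(t) => exists l in LSQ. lsq_rob_ptr(l) = t"""
    return all(any(s.lsq_rob_ptr[l] == t for l in range(s.lsq_head, s.lsq_tail))
               for t in range(s.rob_head, s.rob_tail) if pending_mem(s, t))


def inv_witness(s: Snapshot) -> bool:
    """forall t in ROB. pending_mem(t) => rob_lsq_ptr(t) is in the LSQ and points back to t.
    Purely universal: the map names the witness, so no search over the LSQ is needed."""
    return all(s.lsq_head <= s.rob_lsq_ptr[t] < s.lsq_tail
               and s.lsq_rob_ptr[s.rob_lsq_ptr[t]] == t
               for t in range(s.rob_head, s.rob_tail) if pending_mem(s, t))


s = Snapshot(rob_head=0, rob_tail=4, lsq_head=10, lsq_tail=13,
             is_mem={0: True, 1: False, 2: True, 3: True},
             executed={0: True, 1: False, 2: False, 3: False},
             lsq_rob_ptr={10: 0, 11: 2, 12: 3},
             rob_lsq_ptr={0: 10, 2: 11, 3: 12})
print(inv_exists(s), inv_witness(s))  # True True
```

The witness form is the stronger statement: if `rob_lsq_ptr` for instruction 3 pointed at an LSQ entry owned by a different instruction, the witness form would fail while the existential form could still hold, so the map has to be kept up to date alongside the LSQ.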
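Auxiliary Structures (cont.)
• rob_lsq_ptr : $\mathrm{ROB} \to \mathrm{LSQ}$
  • lsq_rob_ptr : $\mathrm{LSQ} \to \mathrm{ROB}$ is already part of the model
• rob_stq_ptr : $\mathrm{ROB} \to \mathrm{STQ}$, stq_rob_ptr : $\mathrm{STQ} \to \mathrm{ROB}$
  • Reverse maps are needed
• ld_stq_ptr : $\mathrm{ROB} \to \mathrm{STQ}$
  • For each load instruction, the STQ entry that would forward data to it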
Incremental Models
• Basic out-of-order execution unit (base)
  • Reorder buffer, register rename unit
• Exception handling (exc)
  • Arithmetic exceptions
• Branch prediction (exc/br)
• Memory instructions, simple (exc/br/mem-simp)
  • Stores commit during RETIRE
  • Illegal-address exceptions
• Memory instructions (exc/br/mem)
  • Stores commit some time after RETIRE
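Counterexamples
• Strengthen invariants
  • Use counterexamples to (manually) strengthen the invariants
• Example invariant: $\forall t \in \mathrm{ROB} .\, \neg\,\mathit{reg.valid}(\mathit{rob.dest}(t))$
  • Is the invariant inductive, i.e., is it preserved by the transition function?
• Counterexample
  • rob.hd = 1, rob.tl = 10
  • rob.valid[1] = true
  • t = 5
  • rob.dest[5] = r10
  • reg.tag[r10] = 1
  • reg.valid[r10] = false
  • operation = retire
  • Retiring instruction 1 marks r10 valid, although instruction 5, still in the ROB, writes r10
• Strengthening invariant: $\forall t \in \mathrm{ROB} .\, t \le \mathit{reg.tag}(\mathit{rob.dest}(t))$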
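The counterexample can be replayed on a toy state. In this Python sketch, `retire` is a hypothetical simplification that marks valid every register whose tag names the retiring head instruction, and the filler ROB entries other than `t = 5` are assumptions made only so that the state is complete.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class State:
    rob_hd: int
    rob_tl: int
    rob_dest: dict   # ROB tag -> destination register
    reg_tag: dict    # register -> ROB tag of its most recent writer
    reg_valid: dict  # register -> holds its architectural value


def retire(s: State) -> State:
    """Hypothetical RETIRE (assumes the head entry is valid, as in the counterexample):
    every register whose rename tag names the head instruction becomes valid."""
    valid = {r: v or s.reg_tag[r] == s.rob_hd for r, v in s.reg_valid.items()}
    return replace(s, rob_hd=s.rob_hd + 1, reg_valid=valid)


def inv(s: State) -> bool:  # forall t in ROB. not reg.valid(rob.dest(t))
    return all(not s.reg_valid[s.rob_dest[t]] for t in range(s.rob_hd, s.rob_tl))


def strengthening(s: State) -> bool:  # forall t in ROB. t <= reg.tag(rob.dest(t))
    return all(t <= s.reg_tag[s.rob_dest[t]] for t in range(s.rob_hd, s.rob_tl))


# The counterexample: t = 5 writes r10, yet r10's tag still names instruction 1.
# Entries other than t = 5 are filler chosen for this sketch.
regs = ["r%d" % t for t in range(1, 11)]
cex = State(rob_hd=1, rob_tl=10,
            rob_dest={t: ("r10" if t == 5 else "r%d" % t) for t in range(1, 10)},
            reg_tag={**{r: int(r[1:]) for r in regs}, "r10": 1},
            reg_valid={r: False for r in regs})

print(inv(cex), inv(retire(cex)), strengthening(cex))  # True False False
```

The strengthening conjunct is false in the counterexample's pre-state, so adding it to the invariant rules this counterexample out; whether the strengthened invariant is inductive still has to be checked separately.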
Misspeculation Invariants
• Predict the instruction that will cause misspeculation
  • The result of a branch misprediction or an exception
• Shadow entry to keep track of this instruction
  • shdw_exn_mpred_tag: tag in the ROB
  • Updated from the ISA machine during DISPATCH
  • Reset during a “flush” of the OOO state
• Invariants
  • Earliest misspeculated instruction: the instruction at shdw_exn_mpred_tag should raise an exception or be mispredicted
  • Others
Ordering Invariants
• Maintain program order in the different data structures
  • Reorder buffer
  • Load-store queue
  • Store queue
• Often the source of complicated invariants
  • For memory instructions I1, I2: I1 precedes I2 in the reorder buffer iff I1 precedes I2 in the load-store queue
  • If instruction I1 depends on instruction I2, then I2 precedes I1 in program order
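The ordering invariant between the reorder buffer and the load-store queue can be stated as a pairwise check. The Python sketch below assumes unbounded (non-wrapping) queue indices, so plain `<` captures "precedes"; the function name and the instruction labels are hypothetical.

```python
def same_relative_order(rob_pos: dict, lsq_pos: dict) -> bool:
    """For all memory instructions I1 != I2:
    I1 precedes I2 in the ROB  iff  I1 precedes I2 in the LSQ.
    rob_pos and lsq_pos map an instruction label to its queue index."""
    mem = list(lsq_pos)  # only memory instructions have LSQ entries
    return all((rob_pos[a] < rob_pos[b]) == (lsq_pos[a] < lsq_pos[b])
               for a in mem for b in mem if a != b)


rob_pos = {"ld1": 3, "add": 4, "st1": 5, "ld2": 7}
print(same_relative_order(rob_pos, {"ld1": 20, "st1": 21, "ld2": 22}))  # True
print(same_relative_order(rob_pos, {"ld1": 20, "st1": 23, "ld2": 22}))  # False
```

Non-memory instructions such as `add` occupy ROB slots but are ignored, since the invariant only relates instructions present in both structures.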
Load-Store Invariants
• Correct value of a load (r, A)
  • If A is present in the STQ: the value from the STQ
  • Else, if shdw.mem_tag(A) is in the ROB: the value of that store
  • Else: the value from memory
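The case split can be written as one lookup function. This Python sketch assumes that "value from the STQ" means the data of the latest STQ entry with address A (the latest-match lookup of the store queue); all parameter names are hypothetical.

```python
from typing import Callable, Dict, Optional


def expected_load_value(A: int,
                        stq_addr: Dict[int, int], stq_data: Dict[int, int],
                        stq_h: int, stq_t: int,
                        shdw_mem_tag: Dict[int, int], rob_h: int, rob_t: int,
                        rob_store_data: Dict[int, int],
                        mem: Callable[[int], int]) -> int:
    # Case 1: A is in the STQ -> data of the latest entry with address A.
    for i in range(stq_t - 1, stq_h - 1, -1):
        if stq_addr[i] == A:
            return stq_data[i]
    # Case 2: A is not in the STQ, but the last store to A (per the shadow map) is in the ROB.
    tag: Optional[int] = shdw_mem_tag.get(A)
    if tag is not None and rob_h <= tag < rob_t:
        return rob_store_data[tag]
    # Case 3: otherwise, the value from memory.
    return mem(A)


def mem(a: int) -> int:
    return 0  # placeholder memory contents for this sketch


stq = dict(stq_addr={0: 100, 1: 200, 2: 100}, stq_data={0: 1, 1: 2, 2: 3},
           stq_h=0, stq_t=3)
rob = dict(shdw_mem_tag={300: 7}, rob_h=5, rob_t=9, rob_store_data={7: 42})
print([expected_load_value(a, **stq, **rob, mem=mem) for a in (100, 200, 300, 400)])
# [3, 2, 42, 0]
```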
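Shadow Invariants
• Relate shadow variables to state variables
  • $\forall t \in \mathrm{ROB} .\, [\mathit{rob.valid}(t) \Rightarrow \mathit{rob.value}(t) = \mathit{shdw.value}(t)]$
  • $\forall t \in \mathrm{ROB} .\, [\mathit{rob.src1valid}(t) \Rightarrow \mathit{rob.src1val}(t) = \mathit{shdw.src1val}(t)]$
  • $\forall t \in \mathrm{ROB} .\, [\mathit{rob.src2valid}(t) \Rightarrow \mathit{rob.src2val}(t) = \mathit{shdw.src2val}(t)]$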
Comparative Verification Effort
• Proof script size substantially smaller
  • 67 KB, compared with 1909 KB (Hosabettu et al.)
• Very little user intervention in discharging proofs
  • Instantiation of quantifiers: mostly automatic, with a few manual instantiations for the larger examples
Going Superscalar
• Superscalar
  • Dispatch 0 … d instructions at each step
  • Retire 0 … r instructions at each step
• Complex control logic
  • Additional forwarding within the DISPATCH window
  • Additional forwarding within the RETIRE window
• Extended the base model
Statistics for Superscalar Models
• Does not require any change to the proof script
  • The control logic is more complicated, but the invariants still hold
• Scales well with increasing width
  • Almost linear in the (Dispatch × Retire) width
  • Instantiation considers terms in the (Dispatch + Retire) window
Conclusion
• Case study of complex processors in UCLID
  • CLU is expressive enough to model advanced features
• Reasonable automation in discharging proofs
  • Use of automatic decision procedures
  • Quantification strategy is robust
• Need to generate invariants
  • Using predicate abstraction
  • Automatically constructed an invariant for the OOO-base model, given the predicates
• Make deductive methods more attractive
Modeling Circular Queues
[Figure: queue slots indexed H0 … T0, with head and tail pointers that wrap from T0 back to H0.]
succ' := Lambda x. case x = T0 : H0; default : succ(x); esac;
next[head] := case (operation = POP) : succ'(head); default : head; esac
next[tail] := case (operation = PUSH) : succ'(tail); default : tail; esac
next[content] := Lambda i. case (operation = PUSH) & (i = tail) : D; default : content(i); esac
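For intuition, the UCLID circular-queue model translates almost line by line into Python. In this sketch the class, the method names, and the starting state `head = tail = H0` are assumptions; the model itself leaves the initial state open.

```python
class CircularQueue:
    """Python rendering of the circular-queue model; slot indices run from H0 to T0."""

    def __init__(self, H0: int, T0: int):
        self.H0, self.T0 = H0, T0
        self.head = self.tail = H0   # assumed initial state
        self.content = {}            # content(i); unwritten slots are arbitrary

    def succ_prime(self, x: int) -> int:
        """succ' := Lambda x. case x = T0 : H0; default : succ(x); esac"""
        return self.H0 if x == self.T0 else x + 1

    def step(self, operation: str, D=None) -> None:
        # Compute every next-state value from the current state, then commit together.
        next_head = self.succ_prime(self.head) if operation == "POP" else self.head
        next_tail = self.succ_prime(self.tail) if operation == "PUSH" else self.tail
        if operation == "PUSH":
            self.content[self.tail] = D  # next[content](i) = D exactly when i = tail
        self.head, self.tail = next_head, next_tail


q = CircularQueue(0, 2)  # three slots: 0, 1, 2
for op, d in [("PUSH", "a"), ("PUSH", "b"), ("PUSH", "c"), ("POP", None), ("PUSH", "d")]:
    q.step(op, d)
print(q.head, q.tail, q.content)  # 1 1 {0: 'd', 1: 'b', 2: 'c'}
```

With `head == tail` the queue may be either full or empty; the model as shown has no guard for this, so some environment constraint on PUSH and POP is presumably needed.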
Store Queue
[Figure: entries with address A(i) and data d(i) at increasing indices; entries from head h through r are retired, and later entries up to tail t are speculative.]
• Content addressable
  • Look for an address
  • The same address may appear at multiple indices
• Latest match
  • Latest index that matches the address
• Partial flush
  • Remove entries after an index
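The three store-queue operations can be sketched over unbounded indices. This Python version is illustrative only: `StoreQueue`, `latest_match`, and `flush_after` are hypothetical names, and the caller is assumed to respect the rule that retired entries are never flushed.

```python
from typing import Dict, Optional


class StoreQueue:
    """Unbounded store queue: valid entries have indices h <= i < t."""

    def __init__(self, h: int = 0):
        self.h = self.t = h
        self.addr: Dict[int, int] = {}   # A(i)
        self.data: Dict[int, int] = {}   # d(i)

    def push(self, a: int, d: int) -> None:
        self.addr[self.t], self.data[self.t] = a, d
        self.t += 1

    def latest_match(self, a: int) -> Optional[int]:
        """Latest (highest) index i in [h, t) with A(i) = a, or None if a is absent."""
        for i in range(self.t - 1, self.h - 1, -1):
            if self.addr[i] == a:
                return i
        return None

    def flush_after(self, k: int) -> None:
        """Partial flush: drop every entry with index > k. Callers should pass k >= r
        (the last retired entry) so that retired stores are never flushed."""
        self.t = max(self.h, min(self.t, k + 1))


sq = StoreQueue()
for a, d in [(100, 1), (200, 2), (100, 3), (300, 4)]:
    sq.push(a, d)
print(sq.latest_match(100), sq.latest_match(400))  # 2 None
sq.flush_after(1)
print(sq.t, sq.latest_match(100), sq.latest_match(300))  # 2 0 None
```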
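Quantifier Instantiation
• Prove $\models (\forall x_1 x_2 \ldots x_k .\, \Phi(x_1, \ldots, x_k)) \Rightarrow \forall y_1 y_2 \ldots y_m .\, \Psi(y_1, \ldots, y_m)$
• Introduce Skolem constants $(y^*_1, \ldots, y^*_m)$:
  $\models (\forall x_1 x_2 \ldots x_k .\, \Phi(x_1, \ldots, x_k)) \Rightarrow \Psi(y^*_1, \ldots, y^*_m)$
• Instantiate $x_1, \ldots, x_k$ with concrete terms
  • Assume single-arity functions and predicates
  • Let $F_x = \{ f \mid f(x) \text{ is a subexpression of } \Phi(x_1, \ldots, x_k) \}$
  • Let $T_f = \{ t \mid f(t) \text{ is a subexpression of } \Psi(y^*_1, \ldots, y^*_m) \}$
  • For each bound variable $x$: $A_x = \{ t \mid f \in F_x \text{ and } t \in T_f \}$
  • Instantiate over $A_{x_1} \times A_{x_2} \times \cdots \times A_{x_k}$
  • Formula size grows exponentially with the number of bound variables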
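A simplified rendition of this instantiation heuristic fits in a few lines of Python. Expressions are nested tuples (a hypothetical encoding), only single-arity applications are recognized, as the slide assumes, and this is not UCLID's implementation.

```python
from itertools import product

# Expressions are nested tuples; a str is a variable or constant.
# ("f", t) is the single-arity application f(t); other heads are connectives.
CONNECTIVES = {"and", "or", "not", "=>", "=", "<"}


def applications(e):
    """Yield every (f, t) such that f(t) is a subexpression of e."""
    if isinstance(e, tuple):
        head, *args = e
        if head not in CONNECTIVES:
            yield head, args[0]
        for a in args:
            yield from applications(a)


def instantiation_sets(phi, bound_vars, psi):
    """A_x = { t | f in F_x and t in T_f } for each bound variable x."""
    apps_phi, apps_psi = set(applications(phi)), set(applications(psi))
    A = {}
    for x in bound_vars:
        F_x = {f for f, arg in apps_phi if arg == x}
        A[x] = {t for f, t in apps_psi if f in F_x}
    return A


def substitute(e, env):
    """Replace variables by terms; function and connective heads are left alone."""
    if isinstance(e, str):
        return env.get(e, e)
    return (e[0],) + tuple(substitute(a, env) for a in e[1:])


def instantiate(phi, bound_vars, psi):
    """Quantifier-free conjunction of phi over A_x1 x ... x A_xk."""
    A = instantiation_sets(phi, bound_vars, psi)
    choices = [sorted(A[x], key=repr) for x in bound_vars]
    return ("and",) + tuple(substitute(phi, dict(zip(bound_vars, combo)))
                            for combo in product(*choices))


# Phi(x1, x2): tag(x1) < tag(x2) => x1 < x2;  Psi(y1*, y2*): tag(y1*) < tag(y2*)
phi = ("=>", ("<", ("tag", "x1"), ("tag", "x2")), ("<", "x1", "x2"))
psi = ("<", ("tag", "y1*"), ("tag", "y2*"))
A = instantiation_sets(phi, ["x1", "x2"], psi)
print(sorted(A["x1"]))                                # ['y1*', 'y2*']
print(len(instantiate(phi, ["x1", "x2"], psi)) - 1)   # 4
```

With two bound variables and two candidate terms each, the result already has 2 × 2 = 4 instances, which illustrates the exponential growth in the number of bound variables.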