Maximizing Sequential Equivalence Checking Efficiency This article explores sequential equivalence checking challeng

Sequential Equivalence Checking: Need and Challenges
Anmol Mathur, Chief Architect, Calypto Design Systems

Outline • Why sequential equivalence checking? • Combinational vs sequential equivalence checking • Existing system-level design/verification flows • System-level to RTL equivalence checking • RTL-RTL sequential equivalence checking • Comparison to sequential property checking • Taming sequential equivalence checking • Demonstration of SLEC from Calypto

Combinational Equivalence Checking • Most prevalent equivalence checking tools available today • Appropriate for RTL to gate-level verification • Expects exact 1-1 flop mapping and matching interfaces

The Power of One-One State Mapping • Very strong inductive invariant • Assuming that the one-one mapped flops (and inputs) are equal at time k, the next state functions (inputs to flops) and output functions (mapped primary outputs) are equal at time k+1 • No state space analysis required • Only combinational input constraints and output don’t cares needed
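The one-to-one mapping argument can be illustrated with a small exhaustive check. This is a toy sketch, not how a production tool works: the two 2-bit accumulator designs, their function names, and the bit width are assumptions made for illustration. Because the flops map one-to-one, comparing the next-state and output functions over all (state, input) pairs is enough, and no reachability analysis is performed.

```python
from itertools import product

WIDTH = 2                      # hypothetical flop width
MASK = (1 << WIDTH) - 1

def spec_step(state, inp):
    """Reference design: next state and output of a wrapping accumulator."""
    nxt = (state + inp) & MASK
    out = int(nxt == 0)
    return nxt, out

def impl_step(state, inp):
    """Restructured design with the same flops: bitwise ripple-carry adder."""
    nxt, carry = 0, 0
    for i in range(WIDTH):
        a, b = (state >> i) & 1, (inp >> i) & 1
        nxt |= (a ^ b ^ carry) << i
        carry = (a & b) | (carry & (a ^ b))
    out = int(nxt == 0)
    return nxt, out

def combinational_equivalent(step_a, step_b):
    """Assume mapped flops and inputs are equal at time k, then compare the
    next-state and output functions for every such assignment."""
    for state, inp in product(range(MASK + 1), repeat=2):
        if step_a(state, inp) != step_b(state, inp):
            return False, (state, inp)
    return True, None

print(combinational_equivalent(spec_step, impl_step))
```

The check never asks which states are reachable, which is why the invariant is described as very strong: it may reject designs that differ only in unreachable states.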

System-Level to GDSII
[Figure: flow diagram. Labels: SLM (Algorithmic); RTL (Micro-architecture); Imp.; Manual Process; Process Flow; User Control: Broad Control, Limited Control, Broad Control.]

Levels of Systems
[Figure labels, in extraction order: Software; Hardware/software interface verification; Hardware; SOC; Boundary assertion verification; IP; Block verification.]

System-level Models • Higher level of abstraction resulting in faster simulation turnaround time • Performing architectural tradeoffs and performance validation • Platform for software development

Functional Verification Landscape
• System level: simulation
• System level to RTL: RTL-SL co-simulation
• RTL level:
  • Simulation/emulation-based verification
  • Assertion-based verification
• RTL to gate level: RTL-gate equivalence checking

RTL-RTL Equivalence Checking
• Reasons
  • Incremental development
  • Feature creep
  • Performance tuning
  • Process migration
  • Reuse across projects
• Common refinements
  • Buffer/cache/memory resizing
  • Tuning cache replacement and coherence algorithms
  • Pipeline insertion for performance
  • Register retiming

Sequential Equivalence Checking
[Figure: process flow diagram with SEC added. Labels: SLM (Algorithmic); SEC; RTL (Micro-architecture); Imp.; Process Flow; User Control: Broad Control, Limited Control, Broad Control.]

Sequential Equivalence Checking
[Figure: SEC block. Labels: SLM; RTL; Interface mapping; Interface constraints; Abstraction mappings; Proved equivalences; Counterexamples.]

Key Advantages of SEC • Allows complete verification of the RTL with respect to the SLM (the independently verified golden model) without testbenches • Verification of the RTL is limited to the behaviors specified in the SLM • Allows verification of RTL blocks to happen before the whole RTL or SLM is completed

SEC vs Assertion Checking
• Usability issues with assertion-based RTL verification
  • Need to write design properties in a formal temporal logic
  • Properties not independently verifiable
  • How many properties are enough?
  • Capturing input constraints and output don’t cares
• Technology issues with assertion-based verification
  • Comparing a complete design against a very incomplete specification (property)
  • Sequential analysis problem is harder than equivalence checking of two designs!

Outline
• Why sequential equivalence checking?
• Taming sequential equivalence checking
  • Notions of sequential equivalence
  • Key technology challenges
• Demonstration of SLEC from Calypto

Combinational vs Sequential EC
[Figure: chart with two axes and two regions.
 Axis "Sequential differences": FFs match; re-encoding of state; serial vs parallel interfaces; scheduling; pipelining.
 Axis "Data representation differences": identical data types; composite data types; bit-accurate; precision/rounding differences.
 Regions: Combinational EC; Sequential EC.]

Scheduling
• The SLM could perform a computation in parallel, while the RTL schedules the operations over multiple cycles with a scheduling FSM
• Scheduling introduces additional states
• Operations that can use the same resource become temporally disjoint, which results in resource sharing
[Figure: the SLM computes O from A, B, C with two adders; the RTL computes o from A, B, C with a single adder, reset, and clk.]
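A minimal sketch of the scheduling difference, under stated assumptions: the class `ScheduledRtl`, its two-state FSM, and the `(out, out_valid)` return convention are hypothetical and only model the idea of one adder shared across two cycles. The untimed function plays the role of the SLM.

```python
def slm_sum(a, b, c):
    """SLM view: both additions happen in one untimed step."""
    return a + b + c

class ScheduledRtl:
    """RTL view: one shared adder driven by a two-state scheduling FSM."""
    def __init__(self):
        self.reset()

    def reset(self):
        self.state = 0      # 0: compute a + b, 1: add c and present the result
        self.acc = 0        # register holding the partial sum
        self.c_reg = 0      # c is captured in state 0 and used in state 1

    def clock(self, a, b, c):
        """Advance one cycle and return (out, out_valid)."""
        if self.state == 0:
            self.acc = a + b               # adder used for the first addition
            self.c_reg = c
            self.state = 1
            return None, False
        self.acc = self.acc + self.c_reg   # the same adder is reused
        self.state = 0
        return self.acc, True

def run_rtl(rtl, a, b, c):
    """Clock the RTL model from reset until its output is valid."""
    rtl.reset()
    cycles = 0
    while True:
        out, valid = rtl.clock(a, b, c)
        cycles += 1
        if valid:
            return out, cycles

print(slm_sum(1, 2, 3), run_rtl(ScheduledRtl(), 1, 2, 3))
```

The extra FSM state and the `acc` register have no counterpart in the SLM, so a one-to-one flop mapping between the two descriptions does not exist.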

Micro-architectural Abstractions
• RTL has detailed micro-architectures like:
  • Scan chains
  • Sleep mode logic
  • Clock gating
  • Memory caches
  • Bus-based communication along with bus arbitration
  • Serial communication with handshakes
[Figure: in the SLM, P is connected to C; in the RTL, P and C are connected through a handshake controller.]

Data Type Abstractions
• SLM expression (E) could use C-like data types such as float, int, long and user-defined data types
• RTL expression (E’) could use finite-precision bit-vectors (signed/unsigned) to represent fixed-point data values
• RTL may explicitly perform rounding and truncation on intermediate computations

SLM:
    float sum_of_product(float a, float b, float c) {
        return a * b + c;
    }

RTL:
    module sum_of_product(a, b, c, out);
      input  signed [7:0]  a, b, c;
      output signed [7:0]  out;
      wire   signed [15:0] prod;        // full-precision product
      wire   signed [7:0]  trunc_prod;  // product after truncation
      assign prod = a * b;
      assign trunc_prod = prod >> 8;    // keeps prod[15:8]
      assign out = trunc_prod + c;      // sum wraps to 8 bits
    endmodule

Notions of Sequential Equivalence
• Cycle-accurate equivalence
  • Starting from reset, designs produce identical outputs every cycle when equal inputs are applied
• Sequential hardware equivalence (Pixley)
  • Requires equivalence from a set of states reached via an initializing sequence
• Safe replacement (Singhal, Pixley, Aziz, Brayton)
  • No assumption about reset states

FSM Refinement
• The RTL states that correspond to SLM states are referred to as synchronizing states
[Figure: SLM states Si and Sj are related by a refinement mapping to RTL states Si’ and Sj’; the RTL also contains transient states.]

Sequential Equivalence
• Starting from corresponding reset states, for corresponding inputs, if the outputs are equal in all corresponding synchronizing states, then the SLM and RTL are equivalent
[Figure: SLM and RTL state graphs related by a refinement mapping; the RTL contains transient states.]

Transactions: State View
• A transaction encapsulates one or more units of computation for the design being verified
• A transaction is self-contained, since it brings the machine back to a synchronizing state
[Figure: an SL transaction in the SLM is related by the refinement mapping to an RTL transaction in the RTL.]

Transactions
• Functional decomposition of the behaviors of a machine
  • Transaction 1: opcodes ADD, SUB, MULT
  • Transaction 2: opcodes DIV, MOD
• Allows the sequential verification problem to be contained
  • Unconstrained problem is intractable
  • Verification plan naturally decomposes behaviors
  • Debugging ease
• Allows composition of different behaviors
  • Sequential composition
  • Parallel composition

Transaction: Memory
• Design 1 transaction: a single memory read/write occurring in a single cycle
• Design 2 transaction: a single memory read/write (potentially) happening over multiple cycles
[Figure: Design 1 is a memory (Mem) with ADDR, DATA, RD, WR, and OUT. Design 2 adds a cache (Cache, Cache ctl) in front of Mem, with the same ADDR, DATA, RD, WR, and OUT signals.]

Transaction Equivalence
• Starting at reset, for corresponding input sequences, if the design outputs are equal at transaction boundaries, then the designs are equivalent
• Transactions can be pipelined
[Figure: SLM and RTL timelines, each divided into transactions T0, T1, T2.]
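The memory read/write transaction can be sketched as a pair of transaction-level models. Everything here is an illustrative assumption: the class names, the direct-mapped write-through cache, the two cache lines, and the three-cycle miss penalty are not taken from any real design. The comparison looks only at the output of each transaction and ignores how many cycles the transaction took. It simulates one trace and is not a proof.

```python
class FlatMemory:
    """Design 1: every read or write completes in a single cycle."""
    def __init__(self):
        self.mem = {}

    def transact(self, op, addr, data=None):
        """Run one transaction and return (out, cycles)."""
        if op == "WR":
            self.mem[addr] = data
            return None, 1
        return self.mem.get(addr, 0), 1

class CachedMemory:
    """Design 2: direct-mapped write-through cache in front of the memory.
    A read miss costs extra cycles (hypothetical latency)."""
    MISS_PENALTY = 3

    def __init__(self, lines=2):
        self.mem = {}
        self.lines = lines
        self.cache = {}        # line index -> (addr, data)

    def transact(self, op, addr, data=None):
        idx = addr % self.lines
        if op == "WR":
            self.mem[addr] = data
            self.cache[idx] = (addr, data)
            return None, 1
        entry = self.cache.get(idx)
        if entry is not None and entry[0] == addr:
            return entry[1], 1                      # hit
        value = self.mem.get(addr, 0)
        self.cache[idx] = (addr, value)
        return value, 1 + self.MISS_PENALTY         # miss

def outputs_match_at_boundaries(design1, design2, transactions):
    """Compare outputs once per transaction, ignoring cycle counts."""
    for txn in transactions:
        out1, _ = design1.transact(*txn)
        out2, _ = design2.transact(*txn)
        if out1 != out2:
            return False
    return True

trace = [("WR", 0, 11), ("WR", 2, 22), ("RD", 0), ("RD", 2), ("RD", 5)]
print(outputs_match_at_boundaries(FlatMemory(), CachedMemory(), trace))
```

A cycle-accurate comparison of the same two models would fail on the first read miss, which is the motivation for checking at transaction boundaries instead.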

Arithmetic Equivalence
• Exact equivalence
  • For all possible values in the input space, E and E’ evaluate to exactly the same value
• Bounded error equivalence
  • For all corresponding values in the input spaces of E and E’, |E - E’| < ε
• Infinite precision equivalence
  • Ignoring loss of information at intermediate points in the expressions, the expressions evaluate the same function
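The difference between exact and bounded error equivalence can be checked exhaustively on a small case. The sketch below is an assumption-laden illustration: E is taken to be the product of two signed 8-bit values scaled by 1/256 with no loss of information, E’ is the same product with its low 8 bits dropped by a right shift, and ε is chosen as 1. None of these choices come from a real design.

```python
from itertools import product

def exact_expr(a, b):
    """E: scaled product with no loss of information."""
    return (a * b) / 256

def truncated_expr(a, b):
    """E': product with the low 8 bits dropped.
    Python's >> on ints is an arithmetic shift, i.e. floor division by 256."""
    return (a * b) >> 8

def exactly_equivalent():
    values = range(-128, 128)          # all signed 8-bit inputs
    return all(exact_expr(a, b) == truncated_expr(a, b)
               for a, b in product(values, repeat=2))

def max_abs_error():
    values = range(-128, 128)
    return max(abs(exact_expr(a, b) - truncated_expr(a, b))
               for a, b in product(values, repeat=2))

EPSILON = 1.0                          # assumed error bound: one unit of E'
print(exactly_equivalent(), max_abs_error() < EPSILON)
```

The two expressions fail the exact check but satisfy the bounded error check, since truncation discards less than one unit of the result.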

Outline
• Why sequential equivalence checking?
• Taming sequential equivalence checking
  • Notions of sequential equivalence
  • Key technology challenges
    • Specifying interface differences
    • Sequential analysis
    • High-capacity solvers
• Demonstration of SLEC from Calypto

Specifying Interface Differences
• Specification of input/output don’t cares
  • Sequential signal relationships
  • Combinational relations between signals
  • Reset/non-reset values of signals
• Input mappings
  • Factoring delay and throughput differences
  • Handling protocol differences
    • Blocking vs non-blocking communication
    • Serial vs parallel communication

Specifying Interface Differences
• Output checks
  • Handling latency and throughput differences
  • Conditional output checks
  • Variable delay or handshake-based checks
  • Out-of-order checks
• Specifying transaction boundaries
  • When can a new transaction start in the specification or implementation machine?
• Only differences in the input and output interfaces need to be specified, not the actual input/output protocols

Specifying Interface Differences
• Specifying a transaction requires:
  • Begin-transaction sequence
  • During-transaction invariants
  • Ready-for-next-transaction condition
  • Output valid condition
[Figure: cached memory design (Mem, Cache, Cache ctl) with signals reset, RD, WR, ADDR, DATA, OUT, out_rdy, and a waveform of clk, reset, RD, ADDR, OUT, out_rdy marking one transaction.]

Sequential Analysis
[Figure: SLM and RTL each execute transactions T0, T1, T2, with an output check (= ?) after each transaction. Labels: Output checks; Machine acceleration; State induction check.]

Sequential Analysis Issues
• Efficient machine acceleration
  • Cannot afford to replicate next-state/output functions in unrolling over many cycles
  • Elimination of pipelining/transient states
• Aligning the machines
  • Accounting for data-dependent delay between synchronizing states
  • Accounting for out-of-order output checks

Induction and Sequential EC
• Base case
  • The corresponding outputs are equal in transaction 0, assuming the spec and impl machines start in their reset states and the input mappings/constraints are obeyed
• Induction step
  • If the spec and impl have equal outputs for the first k transactions under the input mappings/constraints, then the outputs are also equal for transaction k+1
• Problem
  • Corresponding states are no longer known
  • Induction must rely either on the property that all reachable states (from reset) have been explored, or purely on the constraint that the outputs matched in the first k transactions

Inductive Proofs
• State-based forward induction
  • Accumulating reachable synchronizing states during forward symbolic co-simulation
  • SAT-based forward reachability
• Output-based induction
  • Using equality of outputs in the first k transactions to prove equivalence of outputs at transaction k+1
• Strengthening the induction invariant
  • Mapping flops (user-driven or automatic)
  • Finding flop maps automatically in the presence of latency/throughput differences
  • Automatic refinement of cut flops on falsification
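State-based forward induction can be sketched with explicit states on a toy pair of machines. This is a simplified stand-in for symbolic co-simulation: the mod-4 counter, its one-hot re-encoding, and all function names are assumptions for illustration. The two machines have different numbers of flops, so no one-to-one flop mapping exists, yet exploring the reachable pairs of states from reset is enough to compare outputs.

```python
from collections import deque

# Spec: mod-4 counter with binary state encoding; output is 1 when the count is 3.
def spec_next(s, en):
    return (s + en) % 4

def spec_out(s):
    return int(s == 3)

SPEC_RESET = 0

# Impl: the same counter with a one-hot state encoding (hypothetical re-encoding).
def impl_next(s, en):
    if not en:
        return s
    return ((s << 1) | (s >> 3)) & 0b1111   # rotate the hot bit left

def impl_out(s):
    return int(s == 0b1000)

IMPL_RESET = 0b0001

def forward_equivalence():
    """Accumulate reachable (spec, impl) state pairs from reset and
    compare the outputs in every pair."""
    start = (SPEC_RESET, IMPL_RESET)
    seen, frontier = {start}, deque([start])
    while frontier:
        s, t = frontier.popleft()
        if spec_out(s) != impl_out(t):
            return False, seen
        for en in (0, 1):
            nxt = (spec_next(s, en), impl_next(t, en))
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return True, seen

print(forward_equivalence())
```

The set of reachable pairs doubles as a discovered state correspondence, which is the kind of information a flop mapping would otherwise have to supply.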

Reasons for Incomplete Proofs • Weaker induction hypothesis in sequential EC • No state point or state mapping typically available • Harder solver problems generated since the next state and output checks may require unrolling across multiple cycles • Reset required for base case – over-reset can cause incomplete proofs • Constraints that span across transactions can invalidate induction

Bounded Equivalence Checking • Symbolic simulation of the spec and impl machines for a fixed number of transactions from reset • Bug-finding mode • Coverage metrics • How to quantify confidence in the equivalence of the machines from a bounded k-transaction proof?
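The confidence question can be made concrete with a toy bounded check. The sketch below enumerates input sequences explicitly rather than simulating symbolically, and both machines (a saturating counter and a copy with a seeded bug) are hypothetical. The bug is only observable after six asserted inputs, so a shallower bound reports no mismatch.

```python
from itertools import product

def spec_step(state, inp):
    """Spec: counts asserted inputs, saturating at 7; the output is the count."""
    nxt = min(state + inp, 7)
    return nxt, nxt

def impl_step(state, inp):
    """Impl with a seeded bug: the counter saturates at 5 instead of 7."""
    nxt = min(state + inp, 5)
    return nxt, nxt

def bounded_check(k, reset=0):
    """Try every input sequence of length k from reset.
    Returns a counterexample sequence, or None if none exists within the bound."""
    for seq in product((0, 1), repeat=k):
        s = t = reset
        for inp in seq:
            s, out_s = spec_step(s, inp)
            t, out_t = impl_step(t, inp)
            if out_s != out_t:
                return seq
    return None

print(bounded_check(5), bounded_check(6))
```

A clean result at bound 5 says nothing about depth 6, which is why a bounded run is best read as bug finding rather than proof.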

Solvers
[Figure: hybrid solver. Labels: Word-level Solver; Hybrid Solver; Simulation; Bit-level Solver.]

Word Level Solver
• Word-level solver (WLS)
  • Strength: proving arithmetic expressions equivalent
  • Weakness: generating counterexamples
• Bit-level solver (BLS, SAT-based)
  • Strength: proving expressions not equivalent
  • Weakness: proving arithmetic expressions equivalent
[Figure: plot of run time against expression size, with curves labeled BLS and WLS.]

Finite Precision Reasoning
• The familiar algebraic properties of + and * (such as associativity) do not hold when arithmetic is done in finite precision

Design A:
    wire signed [7:0] a, b, c;
    wire signed [7:0] tmp;
    wire signed [8:0] out;
    assign tmp = a + b;
    assign out = tmp + c;

Design B:
    wire signed [7:0] a, b, c;
    wire signed [7:0] tmp;
    wire signed [8:0] out;
    assign tmp = b + c;
    assign out = tmp + a;

Design A != Design B for a = 2^7 - 1, b = 1, c = -1:
• Design A: tmp = a + b overflows 8 bits and wraps to -2^7, so out = -2^7 - 1
• Design B: tmp = b + c = 0, so out = 2^7 - 1
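The two orderings can be modeled in a few lines to see the mismatch directly. This is a sketch that assumes two's-complement wrapping of the 8-bit intermediate and the 9-bit result; the helper `to_signed` and the function names are introduced here for illustration.

```python
def to_signed(value, bits):
    """Wrap an integer to a two's-complement value of the given width."""
    mask = (1 << bits) - 1
    value &= mask
    return value - (1 << bits) if value >> (bits - 1) else value

def sum_ab_first(a, b, c):
    tmp = to_signed(a + b, 8)      # 8-bit intermediate, may overflow
    return to_signed(tmp + c, 9)   # 9-bit result

def sum_bc_first(a, b, c):
    tmp = to_signed(b + c, 8)      # 8-bit intermediate, may overflow
    return to_signed(tmp + a, 9)   # 9-bit result

a, b, c = 2**7 - 1, 1, -1
print(sum_ab_first(a, b, c), sum_bc_first(a, b, c))
print(sum_ab_first(a, b, c) == sum_bc_first(a, b, c))
```

With unbounded integers both functions would return a + b + c, so the difference comes entirely from the narrow intermediate.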

Finite Precision + Control
• Mixed control-arithmetic reasoning
• Infinite precision canonization
• Combination of theories:
  • Finite precision arithmetic
  • Propositional logic

Version 1:
    if (a + b < 0) x = a;
    else if (a + b > 0) x = b;
    else x = a + b;

Version 2:
    if (a == -b) x = 0;
    else if (a > -b) x = b;
    else if (a < -b) x = a;
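The two control fragments can be compared exhaustively under both interpretations of the arithmetic. This is a sketch under stated assumptions: inputs are signed 8-bit values, finite precision means that `a + b` and `-b` wrap to 8 bits, and the helper names (`wrap8`, `ident`, `version1`, `version2`, `mismatches`) are made up for illustration.

```python
from itertools import product

def wrap8(value):
    """Two's-complement wrap to 8 bits."""
    value &= 0xFF
    return value - 256 if value & 0x80 else value

def ident(value):
    """Infinite precision: no wrapping."""
    return value

def version1(a, b, fit):
    s = fit(a + b)
    if s < 0:
        return a
    if s > 0:
        return b
    return s

def version2(a, b, fit):
    neg_b = fit(-b)
    if a == neg_b:
        return 0
    if a > neg_b:
        return b
    return a

def mismatches(fit):
    """All signed 8-bit input pairs on which the two versions disagree."""
    vals = range(-128, 128)
    return [(a, b) for a, b in product(vals, repeat=2)
            if version1(a, b, fit) != version2(a, b, fit)]

print(len(mismatches(ident)), (127, 1) in mismatches(wrap8))
```

The fragments agree everywhere in infinite precision but not once overflow is modeled, so a proof has to reason about the branch conditions and the wrapping arithmetic together.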

WLS – BLS Interface • Leveraging word-level information in bit-level solvers • Exploiting word-level symmetry information • Using information about bits in a bus for variable ordering in BDDs and decision ordering in SAT • Intelligent ordering of bit-level problems based on word-level analysis

Simulation + Formal Solvers
• Simulation: intermediate equivalent points
• Word and bit-level solvers work together
• Cut sets based on proven intermediate equivalences for proof simplification
[Figure: expressions E1 and E2 with cuts at intermediate points.]
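One way to picture how simulation and formal solving cooperate is signature-based grouping of internal nodes. The sketch below is illustrative only: the node names, the 4-bit word width, the directed vector, and the exhaustive `prove_equal` stand-in for a formal solver are all assumptions. Simulation proposes candidate equivalent points cheaply, and each candidate must still be proved before it is used as a cut.

```python
import random
from itertools import product

MASK = 0xF   # hypothetical 4-bit words

# Hypothetical internal nodes of two designs, as functions of inputs (a, b, c).
NODES = {
    "spec.n1": lambda a, b, c: (a + b) & MASK,
    "spec.n2": lambda a, b, c: ((a + b) * c) & MASK,
    "impl.m1": lambda a, b, c: (b + a) & MASK,
    "impl.m2": lambda a, b, c: (a * c + b * c) & MASK,
    "impl.m3": lambda a, b, c: (a ^ b) & MASK,
}

def candidate_classes(num_vectors=32, seed=0):
    """Group nodes whose simulated values agree on every vector."""
    rng = random.Random(seed)
    vectors = [(1, 1, 2)]   # one directed vector, then random ones
    vectors += [tuple(rng.randrange(16) for _ in range(3))
                for _ in range(num_vectors)]
    buckets = {}
    for name, fn in NODES.items():
        signature = tuple(fn(*v) for v in vectors)
        buckets.setdefault(signature, []).append(name)
    return [sorted(names) for names in buckets.values() if len(names) > 1]

def prove_equal(name_a, name_b):
    """Stand-in for a formal solver: exhaustive check over the input space."""
    fa, fb = NODES[name_a], NODES[name_b]
    return all(fa(*v) == fb(*v) for v in product(range(16), repeat=3))

for names in candidate_classes():
    print(names, prove_equal(names[0], names[1]))
```

Once `spec.n1` and `impl.m1` are proved equal, they can be replaced by a shared cut variable, which leaves a smaller problem for the downstream outputs.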

Open Issues in Solvers • Efficient identification of PENs in the presence of latency and throughput differences in the designs • Which PENs to prove? • Ordering of PEN proofs • Use of predicate abstraction to simplify arithmetic-heavy proofs

Outline
• Why sequential equivalence checking?
• Taming sequential equivalence checking
  • Notions of sequential equivalence
  • Key technology challenges
• Demonstration of SLEC from Calypto

Frontend Architecture
• Language neutrality
  • Support multiple languages scalably
  • Language-independent transforms
• Transforms listed in the figure:
  • Loop unrolling
  • Dependence analysis
  • Flop/mux inferencing
  • Constant propagation
  • Dead code elimination
  • Smart memory modeling
[Figure: block diagram. Labels: SystemC; Verilog; Sys.Verilog; language X; CPT; CPT API; CPT to CDB xforms; CDB; CDB API; SLS Synthesis Engine; SLEC Verification Engine; Future Products.]

Verification Engine Architecture
[Figure: block diagram. Labels, in extraction order: Setup; CDB; Orchestration; Proof Decomposition; Simulation Engine; Structural Decomposition; Name based mappings; Inductive Analysis; Sequential Analysis; Convergence Analysis; Machine Acceleration; Fixed point Analysis; Solver; WLS; BLS; WSAT; BDD; SAT; Simulation; IPBDP.]

Demonstration Example • DES Encryption Block • Symmetric key encryption/decryption • 64 bit data message, 64 bit key • 16 rounds of computation

RTL – System Continuum • C0 – Untimed Functional (C/SystemC) • C1 – Timed Functional (SystemC) • V2 – Serial RTL (Verilog) • V3 – Pipelined RTL (Verilog)
