Automatic Verification of Floating Point Units
Udo Krautz, Viresh Paruthi, Anand Arunagiri, Sujeet Kumar
IBM™ Corporation
Authors
- Udo Krautz, IBM Deutschland, Boeblingen, Germany, krautz@de.ibm.com, +49-7031-16-2347
- Viresh Paruthi, IBM Corporation, Austin, TX, USA, vparuthi@us.ibm.com, +1-512-286-7922
- Anand B Arunagiri, IBM Corporation, Bangalore, India, aarunagi@in.ibm.com, +91-80-41777187
- Sujeet Kumar, IBM Corporation, Bangalore, India, sujkumak@in.ibm.com, +91-80-41777283
Abstract
Verification of floating point units (FPUs) is one of the most successful applications of formal verification methods. The large and complex data paths and intricate control structures of FPUs make verification with coverage-driven simulation incomplete and error prone. Formal verification (FV) has been successfully leveraged to achieve the high level of quality desired of this critical logic. Typically, FV-based approaches to verifying FPUs rely on introducing higher-level abstractions to allow reasoning. This, however, has to be done manually and quickly becomes tedious for the highly optimized bit-level implementations found in high-performance microprocessors. Automated formal methods that work directly on the bit level and provide a full end-to-end check for FPUs exist, but they are limited to single instructions (issued in an empty pipeline) and hence do not check the control aspects of the logic that relate to inter-instruction interactions or pipeline control. In this talk we present an approach based on equivalence checking that overcomes the single-instruction limitation for automated bit-level proofs in the formal verification of FPUs. The sequential execution of instructions is modeled by two instances of the design under test, one of which acts as a reference model for the other. This allows a large number of internal equivalences to be leveraged by equivalence checking techniques. We show that this method is capable of proving instruction sequences for highly optimized industrial FPU designs. Together with a proof of correctness of individual instructions by model checking, it guarantees correctness of the FPU design as a whole. In our experience, no other approach provides the level of automation and ease of use of the proposed method.
Motivation
- Floating-point units (FPUs) are inherently difficult to verify
  - Data path challenges
    - Complex floating-point algorithms and hardware, e.g. alignment shifter, leading zero anticipator (LZA), rounding, …
    - Intricate corner cases, e.g. denormal inputs/outputs, cancellation, sticky bits, …
  - Control complexity
    - Pipelined out-of-order speculative execution, microcode ops, …
- Various verification techniques are deployed to verify FPUs
  - Incomplete methods to find bugs
    - Random/manual/targeted testcase generation, coverage analysis, …
    - Bugs may slip into silicon (e.g. the Pentium FP bug)
  - Complete (formal) methods to establish correctness
    - Model checking (automatic) techniques: restricted to a single instruction issued in an empty pipeline (data path verification)
    - Higher-level reasoning: manual, requiring the creation of dedicated models (end-to-end verification)
Contribution
- We propose to enhance automated methods to enable verification of control aspects in addition to the data path
- Automated end-to-end verification of bit-level FPUs, inclusive of control and data path
  - Data path verified with model checking (existing state of the art)
    - Submit a single instruction in an empty pipeline
    - Checks for “numerical correctness” of different ops
  - Control-related aspects verified with sequential equivalence checking
    - The design serves as its own reference
    - Instruction sequence submitted to allow inter-instruction interactions
    - Allows leveraging internal equivalence points to alleviate capacity issues
- Results bear out the effectiveness of the approach
Data Path Verification
- Checks numerical correctness of the FPU data path against
  - The IEEE 754 standard
  - Implementation constraints (timing, area, power, performance)
- Fused multiply-add (FMA) instruction: A*B + C
- Example bugs:
  - If two nearly equal numbers are subtracted (causing cancellation), the wrong exponent is returned
  - If the result is near underflow, the wrong guard bit is chosen
- Restricted to a single instruction issued in an empty FPU
  - Influence of other instructions not considered
- Provides complete data path coverage; remaining verification resources may focus on other aspects (e.g. inter-instruction interactions)
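As an illustration of what “numerical correctness” of A*B + C means, the following minimal sketch builds a software reference for FMA from exact rational arithmetic and compares it with a separately rounded multiply and add. The function names are hypothetical and not taken from the talk; the sketch assumes finite double-precision operands (infinities and NaNs are not handled) and relies on Python rounding a `Fraction` to the nearest double.

```python
from fractions import Fraction


def fma_reference(a: float, b: float, c: float) -> float:
    """Compute a*b + c exactly, then round once to the nearest double."""
    exact = Fraction(a) * Fraction(b) + Fraction(c)  # no intermediate rounding
    return float(exact)                              # single final rounding


def multiply_then_add(a: float, b: float, c: float) -> float:
    """Non-fused variant: the product is rounded before the addition."""
    return a * b + c


# Operands chosen so that the addition cancels almost all of the product.
a = 1.0 + 2.0 ** -27
b = 1.0 - 2.0 ** -27
c = -1.0

fused = fma_reference(a, b, c)         # exact product is 1 - 2**-54
separate = multiply_then_add(a, b, c)  # product rounds to 1.0 first

print(fused, separate)
```

The two results differ because the exact product 1 - 2^-54 is not representable as a double; cancellation then exposes the rounding of the intermediate product. Corner cases of this kind are what a data path check has to cover for every operand combination.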
Data Path Verification Testbench
- A “driver” issues an instruction into both the real FPU and the reference FPU
- A “checker” compares the results of the two FPUs for equality
- FP operations are bounded by the longest-latency operation
  - The verification problem is thus a bounded model check
[Figure: the same operands feed the reference model and the real FPU; the two results are compared for equality (=).]
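The driver/checker structure can be sketched on a toy scale. The example below is an assumption-laden stand-in, not the authors' testbench: the “real” design is a hypothetical 16-bit leading-zero counter written as a binary search, the reference is a one-line specification, and exhaustive enumeration of all operands replaces the formal engine.

```python
def real_clz16(x: int) -> int:
    """Optimized 'design': count leading zeros of a 16-bit value by binary search."""
    if x == 0:
        return 16
    n = 0
    if (x & 0xFF00) == 0:
        n += 8
        x <<= 8
    if (x & 0xF000) == 0:
        n += 4
        x <<= 4
    if (x & 0xC000) == 0:
        n += 2
        x <<= 2
    if (x & 0x8000) == 0:
        n += 1
    return n


def buggy_clz16(x: int) -> int:
    """Same design with an injected bug: the all-zero corner case is mishandled."""
    return real_clz16(x) if x != 0 else 15


def ref_clz16(x: int) -> int:
    """Simple reference model."""
    return 16 - x.bit_length()


def check_datapath(real, reference):
    """Driver + checker: feed every operand to both models, compare the results.

    Returns the first operand on which the models disagree, or None.
    """
    for operand in range(1 << 16):
        if real(operand) != reference(operand):
            return operand
    return None


print(check_datapath(real_clz16, ref_clz16))   # no counterexample expected
print(check_datapath(buggy_clz16, ref_clz16))  # counterexample: the zero operand
```

Enumeration only works here because the operand space is tiny; for real FPU operand widths the same comparison has to be discharged symbolically.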
Control Verification
- Verifies pipeline control and complex micro-architectural features
  - Speculative execution, functional clock gating, blocking, …
- Example bugs:
  - If a speculatively executed instruction stream should not be executed (e.g. due to a branch not being taken), does a “kill” generate any side effects?
  - Does the issue of overlapping instructions cause resource conflicts?
  - Does forwarding of data to a subsequent instruction yield a wrong result?
- Requires submission of a continuous stream of instructions
  - Activates inter-instruction interactions/dependencies
  - Correctness must hold irrespective of previously executed instructions or the initial state
Control Verification Testbench
- The design serves as its own “reference”
- A “driver” issues a single instruction into the “reference” FPU, and the same instruction plus an additional sequence of instructions into the real FPU
- A “checker” compares the results of the “followed” instruction
- The verification problem is a sequential equivalence check
  - Internal equivalences can be effectively leveraged
[Figure: an instruction sequence feeds the real FPU and a single instruction feeds the reference FPU; the results of the followed instruction are compared for equality (=).]
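A minimal sketch of this self-referencing setup, under stated assumptions: `ToyFPU` is a hypothetical two-stage pipeline over 8-bit values with an optional injected bug (a stale overflow latch), and a small enumeration of irritator instructions and issue gaps stands in for the sequential equivalence check. None of these names come from the talk.

```python
from itertools import product


class ToyFPU:
    """Hypothetical two-stage pipeline over 8-bit values; not a real FPU."""

    def __init__(self, leak_bug=False):
        self.leak_bug = leak_bug
        self.stage1 = None   # instruction being executed: (tag, op, x)
        self.stage2 = None   # finished result: (tag, value)
        self.overflow = 0    # internal latch, a potential "residual state"

    def step(self, instr=None):
        """Advance one cycle, optionally issuing instr; return the retiring result."""
        retired = self.stage2
        if self.stage1 is None:
            self.stage2 = None
        else:
            tag, op, x = self.stage1
            r = x + 1 if op == "inc" else 2 * x
            if self.leak_bug:
                r += self.overflow  # injected bug: stale latch leaks into the result
            self.overflow = 1 if r > 0xFF else 0
            self.stage2 = (tag, r & 0xFF)
        self.stage1 = instr
        return retired


def followed_result(fpu, schedule):
    """Issue the schedule, drain the pipeline, return the result tagged 'F'."""
    for instr in list(schedule) + [None, None, None]:
        out = fpu.step(instr)
        if out is not None and out[0] == "F":
            return out[1]
    return None


def check_control(make_fpu, max_gap=2):
    """Compare the followed instruction alone (reference) and after an irritator (real)."""
    instrs = [(op, x) for op in ("inc", "dbl") for x in (0, 1, 127, 128, 255)]
    for (fop, fx), (iop, ix), gap in product(instrs, instrs, range(max_gap + 1)):
        followed = ("F", fop, fx)
        irritator = ("I", iop, ix)
        reference = followed_result(make_fpu(), [followed])
        real = followed_result(make_fpu(), [irritator] + [None] * gap + [followed])
        if reference != real:
            return followed, irritator, gap, reference, real
    return None


print(check_control(lambda: ToyFPU()))               # no interference expected
print(check_control(lambda: ToyFPU(leak_bug=True)))  # irritator disturbs the result
```

Because both instances are the same design, this check only detects interference between instructions. A numerical error that affects both instances equally goes unnoticed, which is why the single-instruction check against a reference model is still required.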
Conditional Equivalence
- A single instruction of the sequence is executed in both FPUs
- Restricted to conditional equivalence (not general sequential equivalence checking, SEC)
  - Pipeline stages in which the “followed” instruction is active should be equivalent in a specific cycle
  - Final check only on the result of the “followed” instruction
- Bounded checking allows the pipeline to be unfolded: only equivalent pipeline stages should be in the cone of influence (COI) of the result property
[Figure: unfolded pipelines of the two FPUs. Legend: other instruction; inactive stage; active pipeline stage of the followed instruction. The results of the followed instruction are compared (=).]
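The difference between conditional and general equivalence can be made concrete with a small sketch. It assumes a hypothetical three-stage shift pipeline in which every stage adds 1 to the value it holds; the per-stage comparison looks only at the stage and cycle in which the followed instruction (tag "F") is active.

```python
def run_pipeline(schedule, depth=3):
    """Return per-cycle snapshots of a toy shift pipeline (each stage adds 1)."""
    stages = [None] * depth
    snapshots = []
    for instr in list(schedule) + [None] * depth:
        shifted = [instr] + stages[:-1]
        stages = [None if s is None else (s[0], s[1] + 1) for s in shifted]
        snapshots.append(tuple(stages))
    return snapshots


def stage_checks(ref, real, ref_issue, real_issue, depth=3):
    """Compare stage s only in the cycle where the followed instruction occupies it."""
    results = []
    for s in range(depth):
        a = ref[ref_issue + s][s]
        b = real[real_issue + s][s]
        results.append(a is not None and a[0] == "F" and a == b)
    return results


# Reference: followed instruction issued alone in cycle 0.
ref = run_pipeline([("F", 10)])
# Real: an irritator, one idle cycle, then the followed instruction in cycle 2.
real = run_pipeline([("I", 7), None, ("F", 10)])

print(ref[0] == real[2])              # whole-pipeline states differ
print(stage_checks(ref, real, 0, 2))  # active stages agree
```

The full pipeline states differ because the irritator still occupies a later stage in the real instance, so a general equivalence check would fail on a correct design. The stages in which the followed instruction is active agree cycle by cycle, and those per-stage equalities are the internal points an equivalence checker can exploit.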
Sequential Equivalence Tenets
- Several degrees of equivalence/correctness:
  - Identical result of the “followed” instruction regardless of the initial state
    - Possible with model checking if the legal initial states are known
    - Manual computation of initial states is tedious for complex pipelines
  - “Followed” instruction not influenced by “residual states”
    - Both FPUs should be equivalent for the “followed” instruction irrespective of a previously executed instruction
    - All timing windows between instructions need to be considered
- Requires an infinite sequence of instructions
  - The infinite sequence is made finite to allow bounded checking
Verification Technology
- SAT-based bounded model check
  - Performs a satisfiability check on a k-step unfolded netlist
  - Hybrid SAT engine: integrates structural netlist transformations, BDDs, simulation, CNF clauses and a SAT procedure in one framework
- Conditional equivalence checking
  - Automatic checkers for pipeline stages being activated
  - Added for every stage; each is either proven or disproven
  - Leveraged as “lighthouses” to enable the end-to-end SAT check
- Encapsulated as engines in IBM’s semi-formal tool SixthSense
  - Uses a Transformation-Based Verification (TBV) paradigm that maximally exploits the synergy between algorithms
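The idea of checking a k-step unfolding can be sketched without a SAT solver. The example below is a simplification under stated assumptions: it enumerates every input trace up to length k explicitly, whereas a SAT-based engine encodes the same unfolding as one formula. The counter design and the function name `bounded_check` are illustrative only and have no relation to SixthSense.

```python
from itertools import product


def bounded_check(init, step, inputs, bad, k):
    """Return an input trace of length <= k that reaches a bad state, else None."""
    for depth in range(k + 1):
        for trace in product(inputs, repeat=depth):
            state = init
            for value in trace:
                state = step(state, value)
            if bad(state):
                return trace
    return None


def counter_step(state, enable):
    """Toy design: a 3-bit counter that increments when enable is 1."""
    return (state + enable) % 8


def reaches_five(state):
    """Property violation: the counter holds the value 5."""
    return state == 5


print(bounded_check(0, counter_step, (0, 1), reaches_five, k=4))  # bound too small
print(bounded_check(0, counter_step, (0, 1), reaches_five, k=5))  # violation found
```

A bounded check says nothing about behaviour beyond depth k. It becomes a complete argument only when the bound is known to cover everything relevant, which for an FPU is the latency of the longest operation.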
Verification Results – Setup
- Single instruction checks
  - FPU vs high-level reference model
  - 45 instructions require case splits
  - 24 instructions covered by semi-formal
  - 410 instructions fully covered
  - Model: 10k variables / 100k latches / 3352k ANDs
- Instruction sequence checks
  - FPU (sequence) vs FPU (with single followed op)
  - Different types of instruction: pipelined, fixed-latency multicycle, variable-latency multicycle
  - 9 scenarios of sequence types defined
  - Two models: back-to-back (B2B) issue only; infinite sequences
  - Model: 7.6k variables / 254k latches / 1398k ANDs
Results – Single Instruction
Running on Linux™ 2.6 64-bit, Xeon™ E5-2680, 2.7 GHz
Results – Sequences

Followed instruction                 Irritator instruction                                 Runtime        Memory
Pipelined (extract exponent)         Pipelined (convert decimal integer to decimal fp)     1min:07s       1GB
                                     Fixed latency (128bit decimal fp add)                 1min:14s       0.94GB
                                     Variable latency (convert binary fp to decimal fp)    21min:17s      1.1GB
Fixed latency (compare decimal fp)   Pipelined (convert decimal integer to decimal fp)     1min:52s       1.1GB
                                     Fixed latency (128bit decimal fp add)                 1min:22s       1GB
                                     Variable latency (convert binary fp to decimal fp)    1h:13min:22s   3.6GB
(not named in source)                Pipelined (convert decimal integer to decimal fp)     13min:29s      1.3GB
                                     Fixed latency (128bit decimal fp add)                 24min:37s      1.8GB
                                     Variable latency (convert binary fp to decimal fp)    6h:6min:17s    7GB
Conclusions and Future Work
- Presented an end-to-end automated approach to verify FPUs
  - Inclusive of dataflow and control
  - Dataflow verified instruction by instruction against a reference
  - Control verified via a sequential equivalence check
- Future work
  - Extend B2B sequences to random sequences to cover all possible sequences
    - Random sequences with pipelined instructions are solvable
    - Random sequences with multicycle instructions remain unsolved within 24h
  - Include forwarding of operands
    - Internal equivalences do not hold due to latency differences
Related Work
- Intel™ uses a combination of automatic methods and symbolic trajectory evaluation (STE)
  - Published in CAV 2009 and FMCAD 2012
  - Reported results attribute most defects to STE
  - Likely requires manual, implementation-specific effort
  - Full details for reproducibility not disclosed
- Most other works focus on data path verification
  - Focus on specific instructions and design artifacts, e.g. the FMA instruction together with the multiplier
  - Largely manual, as they rely on methods such as theorem proving
    - Tedious proofs which are implementation specific
  - If automatic, they use special-purpose data structures, e.g. Chen’98 uses PHDDs vs SAT/BDDs