Logic Simulation : Languages, Algorithms, Simulators . Wolfgang Roesner Verification Tools Development IBM Corp Austin, TX. Hardware Design Languages Modeling Levels - A Taxonomy of HDL Constructs General Purpose HDL Simulators - Event-Driven Sim Improving Simulator Performance
Logic Simulation : Languages, Algorithms, Simulators Wolfgang Roesner Verification Tools Development IBM Corp Austin, TX
Outline
• Hardware Design Languages
• Modeling Levels - A Taxonomy of HDL Constructs
• General Purpose HDL Simulators - Event-Driven Sim
• Improving Simulator Performance
• Synchronous Design Methodology and Cycle-Based Sim
• Let's Design a Cycle-Based Simulator
Simulation of Hardware Design Languages
• Logic or "functional" simulation today is done mostly with HDLs (Hardware Design Languages)
• Most popular languages today (both are IEEE standards)
  • Verilog
  • VHDL
• Verilog:
  • logic modeling and simulation language
  • started at an EDA start-up in the 1980s
  • was acquired by Cadence
  • donated to IEEE as a general industry standard
  • approx. 60% market share in U.S. EDA market
• VHDL:
  • committee-designed language commissioned by the U.S. Department of Defense (DoD), derived from Ada
  • functional/logic modeling and simulation language
  • approx. 40% market share in U.S. EDA market
Modeling Levels - Highest Level: Interface
• Let's look at the common constructs we use to specify the functionality of a piece of hardware:
[Figure: a model drawn as a box with inputs and outputs; input behavior over time t on one side, output behavior over time t on the other]
Modeling Levels - Major Dimensions (I)
• Temporal Dimension:
  • continuous (analog)
  • gate delay (psec?)
  • clock cycle
  • instruction cycle
  • events
• Data Abstraction:
  • continuous (analog)
  • bit : multiple values
  • bit : binary
  • abstract value
  • composite value ("struct")
[Figure annotations: "discrete time", "discrete value"]
Modeling Levels - Major Dimensions (II)
• Functional Dimension:
  • continuous functions (e.g. differential equations)
  • Switch-level (transistors as switches)
  • Boolean Logic
  • Algorithmic (e.g. sort procedure)
  • Abstract mathematical formula (e.g. matrix multiplication)
• Structural Dimension:
  • Single black box
  • Functional blocks
  • Detailed hierarchy with primitive library elements
• A good VHDL-centric taxonomy can be found at: http://rassp.scra.org
Modeling Levels - Major Dimensions (III)
[Figure: the four dimensions drawn as axes]
• Temporal: Continuous, Gate Delay, Clock Cycle, Instruction Cycle, Events
• Data: Continuous, Multivalue Bit, Bit, abstract value, "struct"
• Functional: Continuous, Switch Level, Boolean Logic, Algorithmic, Abstract Mathematical
• Structural: Single Black Box, Functional Blocks, Detailed Component Hierarchy
Coverage of Modeling Levels - Verilog
[Figure: the same four axes (Temporal, Data, Functional, Structural) with the levels covered by Verilog marked]
Coverage of Modeling Levels - VHDL
[Figure: the same four axes (Temporal, Data, Functional, Structural) with the levels covered by VHDL marked; one level carries a footnote marker *]
* extremely inefficient compared to Verilog
In HDL Terms
• VHDL and Verilog were both defined as simulation languages
• Big emphasis on structural refinement
  • VHDL: entity/architecture/component/port/signal
  • Verilog: module/instance/port/signal/reg
• Specification of function
  • General: programming language constructs
    • VHDL: user-defined data type, package, procedure, function, sequential code
    • Verilog: function, task, sequential code
  • Parallelism:
    • VHDL: process, signal update, wait
    • Verilog: "always" block, fork/join, wait, event construct
  • Special purpose H/W constructs
    • VHDL: concurrent assignment, delayed assignment, signal, Boolean logic
    • Verilog: continuous assignment, 4-value Boolean logic, switch-level support
General HDL Simulators: Event-Driven Execution
• Before we can look at architectures of simulators we need to understand the execution model that the HDLs imply
• VHDL's execution model is defined in detail in the IEEE LRM (Language Reference Manual)
• Verilog's execution model is defined by Cadence's Verilog-XL simulator ("reference implementation")
An event-driven VHDL example

Block 1:
    process (trigger)
    begin
        if (count <= 15) then
            count <= count + 1 after 1 ns;
        else
            count <= 0 after 1 ns;
        end if;
    end process;

Block 2:
    process (count)
    begin
        my_count <= count;
        trigger <= not trigger;
    end process;

Each process:
• loops forever
• waits for a change in a signal driven by the other process
A more hardware-oriented example

    s(0) <= a(0) xor b(0) after 2 ns;
    c(0) <= a(0) and b(0) after 1 ns;
    s(1) <= a(1) xor b(1) xor c(0) after 2 ns;
    c(1) <= (a(1) and b(1)) or (b(1) and c(0)) or (c(0) and a(1)) after 1 ns;
    sum_out(1 downto 0) <= s(1 downto 0);
    carry_out <= c(1);

[Figure: gate network (xor, and, or gates) for the statements above, with the signals a, b, s(0), s(1), c(0), sum_out(0), sum_out(1) and carry_out labeled]
Let's simulate: a=11 b=01
[Figure sequence: the hardware-oriented example and its gate network, annotated with signal values, one slide per evaluation step. Red boxes: statements evaluated in the current step]
• Time = 0ns: step1 to step4
• Time = 1ns: step1 to step5
Let's simulate: a=11 b=01, Time = 1ns (step5)
That's enough - you get the point
• The statement (process) dependencies define a network
• Changes are dynamically propagated through the network
Event-Driven Simulation
• The simulator maintains
  • a list of all atomic executable blocks
  • a data structure that represents the interconnect of the blocks via signals
  • a value table that holds all current signal values
• At start time the simulator schedules all executable blocks of the models
Core Architecture of an Event-Driven Simulator
[Flowchart nodes: Scheduler; select next scheduled block; block code; schedule signal updates; more blocks?; signal updates; more blocks?; increment time; done]
• Scheduler data: signal sink lists, event queue
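The loop above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the architecture of any real simulator: the class name `EventSimulator`, its methods, and the flat signal names (`a0`, `sum_out1`, ...) are hypothetical, every signal starts at 0, delays are treated as simple transport delays, and zero-delay updates are processed as an extra pass at the same time value. The model is the two-bit adder from the hardware-oriented example with a=11, b=01.

```python
import heapq
from collections import defaultdict


class EventSimulator:
    """Toy event-driven simulator: blocks, sink lists, value table, event queue."""

    def __init__(self, signals):
        self.values = dict.fromkeys(signals, 0)   # value table, everything starts at 0
        self.sinks = defaultdict(list)            # signal -> blocks that read it
        self.blocks = []                          # all atomic executable blocks
        self.queue = []                           # heap of (time, seq, signal, value)
        self.seq = 0                              # tie-breaker keeps scheduling order
        self.now = 0
        self.evaluations = 0

    def add_block(self, output, func, inputs, delay=0):
        block = (output, func, inputs, delay)
        self.blocks.append(block)
        for signal in inputs:
            self.sinks[signal].append(block)

    def _evaluate(self, block):
        output, func, inputs, delay = block
        self.evaluations += 1
        value = func(*(self.values[s] for s in inputs))
        heapq.heappush(self.queue, (self.now + delay, self.seq, output, value))
        self.seq += 1

    def run(self):
        for block in self.blocks:                 # at start time, schedule every block
            self._evaluate(block)
        while self.queue:
            self.now = self.queue[0][0]           # advance to the next event time
            changed = []
            while self.queue and self.queue[0][0] == self.now:
                _, _, signal, value = heapq.heappop(self.queue)
                if self.values[signal] != value:  # only real changes count as events
                    self.values[signal] = value
                    changed.append(signal)
            triggered = []
            for signal in changed:                # look up the sink list of each change
                for block in self.sinks[signal]:
                    if block not in triggered:
                        triggered.append(block)
            for block in triggered:
                self._evaluate(block)
        return self.values


sim = EventSimulator(["a0", "a1", "b0", "b1", "s0", "s1", "c0", "c1",
                      "sum_out0", "sum_out1", "carry_out"])
sim.values.update(a1=1, a0=1, b1=0, b0=1)         # a=11, b=01
sim.add_block("s0", lambda a, b: a ^ b, ("a0", "b0"), delay=2)
sim.add_block("c0", lambda a, b: a & b, ("a0", "b0"), delay=1)
sim.add_block("s1", lambda a, b, c: a ^ b ^ c, ("a1", "b1", "c0"), delay=2)
sim.add_block("c1", lambda a, b, c: (a & b) | (b & c) | (c & a),
              ("a1", "b1", "c0"), delay=1)
sim.add_block("sum_out0", lambda s: s, ("s0",))
sim.add_block("sum_out1", lambda s: s, ("s1",))
sim.add_block("carry_out", lambda c: c, ("c1",))
final = sim.run()
print(final["carry_out"], final["sum_out1"], final["sum_out0"], "at t =", sim.now)
```

In this sketch `sum_out1` briefly goes to 1 at t=2 before settling to 0 at t=3, which is the kind of intermediate activity an event-driven simulator has to schedule and evaluate.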
Improving Simulation Speed
• The most obvious bottleneck for functional verification is simulation throughput
• There are two basic ways to improve throughput
  • Simulator performance
  • Running many simulations in parallel
• Parallelization
  • Hard: parallel simulation algorithms
    • much parallel event-driven simulation research
    • has not yielded a breakthrough
    • hard to compete against "trivial parallelization"
  • Simple: run independent testcases on separate machines
    • Workstation "SimFarms"
    • 100s - 1000s of engineers' workstations run simulation in the background
    • ideal parallelization factor
Improving Simulator Performance (I)
• Full-HDL support
  • If full coverage of VHDL/Verilog is important
• Optimizing compiler techniques
  • treat sequential code constructs like a general programming language
  • all optimizations for language compilers apply:
    • data/control-flow analysis
    • global optimizations
    • local optimizations (loop unrolling, constant propagation)
    • register allocation
    • pipeline optimizations
    • etc.
• Global optimizations are limited because of model-build turn-around time requirements
  • Example: a modern microprocessor is designed with ~1 million lines of HDL
  • Imagine the compile time for a C program with 1M lines!
Improving Simulator Performance (II)
• Full-HDL support
• Better scheduling algorithms
  • scheduling is clearly the bottleneck in all event simulators
• Use hybrid techniques:
  • some of the simplifications discussed in the following can be applied to localized "islands" of HDL
  • requires an HDL compile process that automatically analyzes the structure of the model and uses "speed-up" modes for sub-partitions
• Problem:
  • assume such islands make up 50% of the model
  • even if we could speed up simulation of those parts to take 0 time, we would only gain a speedup factor of 2x
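The 2x limit is an instance of Amdahl's law. A small worked sketch, assuming that "50% of the model" means 50% of the simulation run time; the function name `overall_speedup` is made up for this illustration:

```python
def overall_speedup(fraction, local_speedup):
    """Amdahl's law: overall gain when `fraction` of the run time
    is accelerated by a factor of `local_speedup`."""
    return 1.0 / ((1.0 - fraction) + fraction / local_speedup)


# Islands cover 50% of the run time and become infinitely fast:
print(overall_speedup(0.5, float("inf")))   # 2.0
# A more realistic 10x on the islands yields clearly less than 2x overall:
print(overall_speedup(0.5, 10.0))
```

The remaining, unaccelerated half of the run time bounds the total gain, however fast the islands become.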
Improving Simulator Performance (III)
• Simplifications
  • Use higher-level HDL specification

    s(0 to 2) <= ('0' & a(0 to 1)) + ('0' & b(0 to 1));
    sum_out(0 to 1) <= s(1 to 2);
    carry_out <= s(0);

vs.

    s(0) <= a(0) xor b(0) after 2 ns;
    c(0) <= a(0) and b(0) after 1 ns;
    s(1) <= a(1) xor b(1) xor c(0) after 2 ns;
    c(1) <= (a(1) and b(1)) or (b(1) and c(0)) or (c(0) and a(1)) after 1 ns;
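That the two descriptions compute the same function (ignoring delays) can be checked exhaustively, since there are only 16 input combinations. A Python sketch, assuming `a1`/`b1` are the high bits and `a0`/`b0` the low bits; the function names are hypothetical and the bit numbering of the two VHDL fragments is mapped onto this one convention:

```python
from itertools import product


def bit_level(a1, a0, b1, b0):
    """Gate-level equations, one Boolean operator per signal."""
    s0 = a0 ^ b0
    c0 = a0 & b0
    s1 = a1 ^ b1 ^ c0
    c1 = (a1 & b1) | (b1 & c0) | (c0 & a1)
    return c1, s1, s0


def word_level(a1, a0, b1, b0):
    """Single word-level addition, then split into carry and sum bits."""
    total = ((a1 << 1) | a0) + ((b1 << 1) | b0)
    return (total >> 2) & 1, (total >> 1) & 1, total & 1


mismatches = [bits for bits in product((0, 1), repeat=4)
              if bit_level(*bits) != word_level(*bits)]
print("mismatching input vectors:", mismatches)
```

The word-level version is one operation for the simulator to schedule, where the bit-level version is four separately scheduled statements.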
Improving Simulator Performance (IV)
• Principles of using higher-level HDL specification
• Common theme: cut down the number of scheduled events
  • create larger sections of uninterrupted sequential code
  • use a coarser granularity for the model structure
    • -> smaller number of schedulable blocks
  • use higher-level operators
  • use zero-delay wherever possible
    • methodology implication: timing verification is not done together with functional simulation
• Data abstractions
  • use binary over multi-value bit values
    • multi-value: use only for bus contention situations, to resolve several drivers with different strengths (strong, resistive, high-impedance)
  • use word-level operations over bit-level operations
• NEXT: Most Powerful - methodology-based subset of HDLs
Synchronous Design Methodology
[Figure: clocked gate network (xor, and, or gates); clock: dependent on critical path]
• Clock the design only so fast that the longest possible combinational delay path settles before the cycle is over
• Cycle time depends on the longest topological path
• Hazards/races do not disturb function
• The longest topological path can be calculated analytically without simulation -> stronger result, without simulation patterns
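A minimal Python sketch of that analytic calculation, assuming the two-bit adder from the earlier example with its 1 ns and 2 ns gate delays and inputs that are stable at time 0. The netlist encoding `GATES` and the function `arrival` are illustrative choices, not part of any tool:

```python
from functools import lru_cache

# output signal -> (gate delay in ns, input signals)
GATES = {
    "s0": (2, ("a0", "b0")),
    "c0": (1, ("a0", "b0")),
    "s1": (2, ("a1", "b1", "c0")),
    "c1": (1, ("a1", "b1", "c0")),
}


@lru_cache(maxsize=None)
def arrival(signal):
    """Latest time at which `signal` can still change after the inputs settle."""
    if signal not in GATES:          # primary input: stable at time 0
        return 0
    delay, inputs = GATES[signal]
    return delay + max(arrival(i) for i in inputs)


critical_delay = max(arrival(s) for s in GATES)
print({s: arrival(s) for s in GATES}, "-> minimum cycle time:", critical_delay, "ns")
```

No input pattern is needed: the result (3 ns, through c0 into s1) holds for every possible stimulus, which is why it is a stronger statement than anything a finite set of simulation patterns can show.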
Logic Design Groundrules
• Synchronous design
• LSSD: design for testability
• Critical delay path defines the clock frequency
• Behavioral function and timing correctness can be verified independently
• Design can be verified independently of its implementation
  • -> Logic Synthesis
  • -> Custom Design
  • -> Synthesizable, high-level HDL as main vehicle for functional verification
  • IF: Boolean Equivalence Checking proves closure between the functional and the implementation view
• Functional Verification can use zero-delay functional simulation
  • --> Cycle-Based Simulation, FSM-based Formal Verification
Functional View of Synchronous Designs
[Figure: latches and arrays around a Boolean logic network]
• Logic mapped to a non-cyclic network of Boolean functions
• Network also contains state-holding primitives: latch/reg/array
• HDL does not contain any timing information
• Function can be evaluated by zero-delay signal evaluation
• --> Speed of Simulation + Simplicity of Tools
Cycle Simulation Algorithm
[Figure: latches and levelized combinational logic with blocks A, B and C]
• Logic is ordered into levels so that the order of evaluation is correct. E.g., A and B are computed before C.
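Levelization can be sketched as follows in Python. The sketch assumes that C reads the outputs of A and B, and that A and B read only latch outputs or primary inputs; the names `x`, `y`, `z` and the function `levelize` are hypothetical:

```python
def levelize(gates, sources):
    """Assign each gate a level: 1 + the highest level among its inputs.

    gates:   dict mapping output name -> tuple of input names
    sources: latch outputs / primary inputs, defined to be level 0
    """
    level = {name: 0 for name in sources}
    remaining = dict(gates)
    while remaining:
        # A gate is ready once every one of its inputs has a level.
        ready = [out for out, ins in remaining.items()
                 if all(i in level for i in ins)]
        if not ready:
            raise ValueError("combinational loop: network cannot be levelized")
        for out in ready:
            level[out] = 1 + max((level[i] for i in remaining[out]), default=0)
        for out in ready:
            del remaining[out]
    return level


gates = {"C": ("A", "B"), "A": ("x", "y"), "B": ("y", "z")}
level = levelize(gates, sources=("x", "y", "z"))
order = sorted(gates, key=level.get)      # evaluation order for one cycle
print(level, order)
```

Evaluating the gates once per cycle in `order` is enough, because every gate sees final input values by the time it is computed.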
Why Is This Faster?
[Figure: the two-bit adder gate network (xor, and, or gates) with signals a, b, s(0), s(1), c(0), sum_out(0), sum_out(1) and carry_out]
• As an example, let's assume we compile our circuit into a stream of pseudo instructions:

    Load  temp1, a(0)
    Load  temp2, b(0)
    Xor   temp1, temp2, temp3
    Store temp3, s(0)
    And   temp1, temp2, temp3
    Store temp3, c(0)
    Load  temp1, a(1)
    Load  temp2, b(1)
    Xor   temp1, temp2, temp3
    Load  temp4, c(0)
    Xor   temp3, temp4, temp5
    Store temp5, s(1)
    ....

• We can convert every Boolean function into a minimal sequence of instructions (~4 or better)
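To make the instruction stream concrete, here is a small Python sketch that executes it against a value table. It is an interpreter for illustration only (a compiled cycle simulator would emit machine instructions instead), and it assumes an operand order the slide does not spell out: `Load reg, signal`, `Store reg, signal`, and `Xor`/`And` writing their result to the third operand.

```python
PROGRAM = [
    ("load", "temp1", "a(0)"),
    ("load", "temp2", "b(0)"),
    ("xor", "temp1", "temp2", "temp3"),
    ("store", "temp3", "s(0)"),
    ("and", "temp1", "temp2", "temp3"),
    ("store", "temp3", "c(0)"),
    ("load", "temp1", "a(1)"),
    ("load", "temp2", "b(1)"),
    ("xor", "temp1", "temp2", "temp3"),
    ("load", "temp4", "c(0)"),
    ("xor", "temp3", "temp4", "temp5"),
    ("store", "temp5", "s(1)"),
]


def run_cycle(program, values):
    """Execute the straight-line program once: no scheduler, no event queue."""
    regs = {}
    for op, x, y, *dest in program:
        if op == "load":
            regs[x] = values[y]
        elif op == "store":
            values[y] = regs[x]
        elif op == "xor":
            regs[dest[0]] = regs[x] ^ regs[y]
        elif op == "and":
            regs[dest[0]] = regs[x] & regs[y]
        else:
            raise ValueError(f"unknown instruction: {op}")
    return values


values = {"a(0)": 1, "a(1)": 1, "b(0)": 1, "b(1)": 0}   # a=11, b=01
run_cycle(PROGRAM, values)
print(values["s(1)"], values["s(0)"], values["c(0)"])
```

Each signal is computed exactly once per cycle with a handful of instructions and no bookkeeping in between, which is where the speed advantage over event scheduling comes from.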
Methodology Flow
[Flow diagram boxes: RTL (VHDL, Verilog); Language Compile; Model Build; Cycle-Based Simulation Model; Cycle-Based Simulator; Physical VLSI Design Tools / Custom Design; Formal Verification: Boolean Equivalence Check (VERITY)]
Storage Elements
• Latch inference for sequential HDL
  • if there is a possible set of signal evaluations that leaves the value of a signal undefined -> that signal is assumed to be a storage element
• Verification and Logic Synthesis must interpret the HDL the same way (check with Boolean Equivalence Checking)
Implementation Options For Cycle Sim
• HDL Compilation
• Evaluation Sequence
• Function Evaluation
HDL Compilation Options For Cycle Sim
• Preserve the HDL structures
  • Compile HDL like a programming language
  • Preserve design hierarchy, processes, modules, functions, procedures
  • Implementation process very similar to programming language compilers
  • Incremental processing is trivial
  • Model optimization is hard (it has to cross functional boundaries) and limited
• Map HDL to a library of primitive functions (e.g. IBM "Texsim")
  • Crush design hierarchy to increase optimization potential
  • Synthesis-like process, but simpler because of missing physical constraints
  • Incremental model build is very hard
  • Designer view of hierarchy must be preserved for the model debug process
Evaluation Sequence Options For Cycle Sim
• Oblivious Evaluation
  • For every cycle, evaluate all Boolean logic
  • Minimal amount of book-keeping and runtime control data structures
  • Large amount of redundant evaluation for large models with low change activity
• Event-Driven Evaluation
  • Evaluate changes only
  • Minimize redundant work for low-activity models
  • Book-keeping overhead (But: use synchronicity constraint)
• Hybrid Techniques (example: Texsim)
  • Use design partitions as base granularity for event-driven evaluation
  • Use “key controlling signals” as guards for event-driven evaluation
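The trade-off between the first two options can be sketched in Python on the two-bit adder. This is an illustration under assumptions, not how Texsim or any other simulator is implemented: the gate list is already in levelized order, the stimulus is made up to have low activity, and the names `oblivious_cycle` and `event_driven_cycle` are hypothetical.

```python
# Zero-delay gates in levelized order: (output, function, inputs)
GATES = [
    ("s0", lambda a0, b0: a0 ^ b0, ("a0", "b0")),
    ("c0", lambda a0, b0: a0 & b0, ("a0", "b0")),
    ("s1", lambda a1, b1, c0: a1 ^ b1 ^ c0, ("a1", "b1", "c0")),
    ("c1", lambda a1, b1, c0: (a1 & b1) | (b1 & c0) | (c0 & a1),
     ("a1", "b1", "c0")),
]
INPUTS = ("a1", "a0", "b1", "b0")


def oblivious_cycle(values):
    """Evaluate every gate, whether or not anything changed."""
    for out, func, ins in GATES:
        values[out] = func(*(values[i] for i in ins))
    return len(GATES)


def event_driven_cycle(values, changed_inputs):
    """Evaluate only gates with a changed input; one pass suffices
    because the gates are visited in levelized order."""
    changed = set(changed_inputs)
    evaluated = 0
    for out, func, ins in GATES:
        if changed.intersection(ins):
            evaluated += 1
            new = func(*(values[i] for i in ins))
            if new != values[out]:
                values[out] = new
                changed.add(out)       # book-keeping: propagate the change
    return evaluated


# Low-activity stimulus: the inputs change in cycles 1 and 4 only.
stimulus = [(1, 1, 0, 1), (1, 1, 0, 1), (1, 1, 0, 1), (0, 1, 0, 1)]
signals = INPUTS + tuple(out for out, _, _ in GATES)
obl = dict.fromkeys(signals, 0)        # all-zero start state is consistent
evt = dict.fromkeys(signals, 0)
oblivious_total = event_total = 0
for vector in stimulus:
    new_inputs = dict(zip(INPUTS, vector))
    changed = {s for s, v in new_inputs.items() if evt[s] != v}
    obl.update(new_inputs)
    evt.update(new_inputs)
    oblivious_total += oblivious_cycle(obl)
    event_total += event_driven_cycle(evt, changed)
    assert obl == evt                  # both strategies agree every cycle

print("gate evaluations - oblivious:", oblivious_total, "event-driven:", event_total)
```

With this stimulus the event-driven pass does less than half the gate evaluations, at the cost of the change-set book-keeping; with inputs that toggle every cycle the book-keeping would be pure overhead.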
Function Evaluation Options For Cycle Sim
• Interpreted
  • Map design network into an efficient data structure which a simulator “walks” at run-time
  • “Model” is data, highly portable
  • Few successful examples where interpretation of a data structure didn’t add significant performance loss due to indirection
• Compiled (example: Texsim)
  • Map design network into a sequence of instructions
  • Generating “C-source” is not an industrial-strength option
  • “Model” consists of machine instructions, platform dependent
  • Many programming language compiler optimizations apply, but the problem is simpler (more constrained)
IBM's Texsim simulator
• Flattening of the hierarchy
• Network optimizations - using levelized network
  • Constant propagation
  • DeMorgan’s law and other Boolean optimizations
  • Equivalent function removal
  • Merging of functions with only one fanout
• Final levelization
• Structural analysis and logic partitioning (e.g. latch-partitioning, below)
• Create model symbol table and allocate value table
• Compile logic by partition and level
• Generate reference history for use by register allocation
• Generate machine independent code & perform peephole optimization
• Perform instruction scheduling
• Generate target machine object code
Orders of Magnitude in Speed
• These numbers are meant to give you an intuitive feel:

    Simulation technique            Relative speed
    Event Simulator                 1
    Cycle Simulator                 20
    Event-driven cycle Simulator    50
    Acceleration                    1000
    Emulation                       100000

• The actual numbers depend on many other factors:
  • Model activity rate
  • Test bench activity & implementation language
  • Multiple clock domains
We could write a simulator now...
• But we have not talked about many areas:
  • Model-build software principles (how to make a gigantic model in minutes)
  • Simulator user interface (how to talk to the user and to programs)
  • How a cycle simulator deals with multi-value signals
    • do we need those? Yes, mainly for bus logic and power-on-reset sim
  • How do we take the cycle-sim algorithm and implement it in special purpose hardware: simulation acceleration, emulation
  • Where is there still a need for pure event-simulation?
• Good research: only few papers in the field during the last 10 years
  • Good place to start is Peter Maurer at Florida State. Extremely interesting for folks who want to explore different algorithms
And... • This field is just one segment of verification tools development which is just one segment of Electronic Design Automation