Part III Logic Emulation
What is a Logic Emulation System? 1. Programmable hardware built from programmable logic devices (FPGAs) and programmable interconnect devices (PIDs). 2. Software that automatically programs the hardware according to the circuit under design. 3. Control hardware and software that let the emulated design operate as a hardware component in real time.
Typical Logic Emulation Environment [Figure labels: workstation; compiler, runtime software; logic emulator; logic module; probe module; in-circuit interface; target system; stimulus generator, logic analyzer]
Why do we need Logic Emulation? • Design verification issues. • Real-time operation. • System-level testing. • Rapid prototyping.
Design Verification Issues • Simulation-based verification methods run out of steam as chip complexity grows. • Emulation is a verification technology that scales with design size.
Real-Time Operation • Simulation requires test vector development, which is costly and difficult. • Verification then depends on the correctness of the test vectors. • Certain applications must be verified in real time because they depend on human perception, e.g. audio and video. • Emulation connected to actual hardware can run: • real diagnostic code, • operating systems, and • applications.
System-Level Testing • Often the chip meets its specifications but fails in the system. • We have to verify the system-level interactions between the chip and other components. They are hard to formalize. • Internal probing is impossible once the chip is fabricated and placed in a system, • but it is possible using emulation.
Rapid Prototyping • Once the emulated design is debugged, it is immediately available to software developers for software debugging. • The emulated design is available for demos and for architecture experiments on real applications and data.
Programmable Hardware includes programmable interconnect [Figure labels: programmable interconnect, interface, logic elements, memory element, VLSI core]
Considerations for programmable interconnect • The capacity of logic and interconnection depends on package constraints. • This forces a hierarchical system: chips => boards => boxes => system. • The interconnect structure must: 1. Provide successful connectivity, 2. Maximize FPGA utilization, and 3. Minimize delay and skew. • Rent’s rule can be used to predict the interconnect needs.
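Rent's rule estimates the number of external pins T of a partition from the number of logic blocks g it contains: T = t * g^p, where t is the average number of pins per block and p is the Rent exponent. The following is a minimal Python sketch. The function names and the default values of t and p are illustrative assumptions, not figures from these slides.

```python
def rent_pins(blocks: int, t: float = 2.5, p: float = 0.6) -> float:
    """Estimate external pins needed by a partition of `blocks` logic blocks.

    t (pins per block) and p (Rent exponent) are assumed example values.
    """
    return t * blocks ** p


def max_blocks_for_pins(pins: int, t: float = 2.5, p: float = 0.6) -> int:
    """Largest partition whose estimated pin demand still fits in `pins` pins."""
    return int((pins / t) ** (1.0 / p))


if __name__ == "__main__":
    # How many blocks can a 200-pin package serve before it runs out of pins?
    print(max_blocks_for_pins(200))
```

Under these assumptions, the package pin count rather than the logic capacity can bound the usable partition size, which is why the interconnect structure has to be considered together with FPGA utilization.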
Structures of Multi-FPGA Systems • Topologies: - Mesh: nearest neighbor. - Crossbar: full and partial. • Interconnect scheme: - Circuit switched. - Time multiplexed.
Nearest Neighbor Interconnection [Figure: nine FPGAs arranged in a mesh, each connected to its nearest neighbors]
Advantages and Disadvantages of Nearest Neighbor Interconnection • Advantages: • Uniform: all chips the same. • Easy to lay out on PCB. • Disadvantages: • Routing is easily blocked. • The “through pins” limit the logic utilization of FPGAs. • Long and unpredictable delays. • No natural hierarchical extension.
Nearest Neighbor Extensions [Figure: nine-FPGA mesh with two extensions: connect to non-neighbors; add more neighbors]
Advantages and Disadvantages of nearest-neighbor extended architectures • Advantages: • More choices for router by adding diagonal lines & skip lines. • Disadvantages: • More complex PCB. • More complex routing software.
Partial Crossbar Interconnect [Figure labels: logic blocks, each with pin subsets A, B, C, D; crossbars for the A pins, B pins, C pins, and D pins; second-level crossbars]
Partial Crossbar Interconnect • A partial crossbar consists of a set of small full crossbars that are connected to the logic blocks but not to each other. • The I/O pins of each FPGA are divided into subsets. • Each subset is connected by a full crossbar circuit switch. • A partial crossbar is a potentially blocking network.
Characteristics of “Partial Crossbar Architecture” • The size of a partial crossbar is proportional to the number of FPGA pins. • Every interconnection passes through one crossbar chip in a one-level partial crossbar, or through three in a two-level one, • so delays are uniform and bounded.
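The pin-subset structure described above can be sketched in a few lines of Python. This is a simplified one-level model under stated assumptions: every FPGA has the same number of pins to each crossbar, a net must be routed through a single crossbar, and the greedy selection rule is an illustrative choice, not the router used by any commercial emulator. The function name and parameters are hypothetical.

```python
def route_partial_crossbar(nets, num_fpgas, num_crossbars, pins_per_subset):
    """Greedily assign each net to one crossbar of a one-level partial crossbar.

    nets: list of sets of FPGA indices that each net must connect.
    Returns one crossbar index per net, or None where the net is blocked.
    """
    # free[k][f] = unused pins between crossbar k and FPGA f
    free = [[pins_per_subset] * num_fpgas for _ in range(num_crossbars)]
    assignment = []
    for net in nets:
        # A crossbar is usable only if every FPGA on the net has a free pin to it.
        candidates = [k for k in range(num_crossbars)
                      if all(free[k][f] > 0 for f in net)]
        if not candidates:
            assignment.append(None)  # blocked: no single crossbar reaches all FPGAs
            continue
        # Prefer the crossbar with the most spare pins on this net's FPGAs.
        best = max(candidates, key=lambda k: min(free[k][f] for f in net))
        for f in net:
            free[best][f] -= 1
        assignment.append(best)
    return assignment


if __name__ == "__main__":
    # Three FPGAs, two crossbars, one pin per FPGA per crossbar.
    print(route_partial_crossbar([{0, 1}, {0, 1}, {0, 2}], 3, 2, 1))
```

In the example, the third net is blocked because FPGA 0 has already used its pin on both crossbars. Each routed net crosses exactly one crossbar, which is where the uniform, bounded delay comes from.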
Mixed Full and Partial Crossbar [Figure labels: two global FPICs, three local FPICs, six FPGAs, external connections, partial crossbar, full crossbar]
Circuit Switched versus Time Multiplexed Interconnect Schemes • Trade-offs between the operating speed and the hardware cost. • Time-multiplexing method: • can greatly expand available interconnect. • allows lower cost IC package and PCB. • makes partitioning easier. • BUT • System power increases due to frequent signal switching (higher hardware cost). • Complex scheduling software. • Slow operating speed.
Virtual Wires [Figure: the logical outputs of one FPGA pass through a Mux onto shared physical wires, and a DeMux on the receiving FPGA restores them as logical inputs] Idea: trade space for time.
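The space-for-time trade can be illustrated with a toy Python model of one physical wire shared by several logical signals. This is only a sketch: it assumes a fixed slot order and ignores the dependency-aware scheduling that a real virtual-wire compiler would need. All function names are hypothetical.

```python
def mux(logical_outputs):
    """Sender side: put one logical signal on the physical wire per time slot."""
    for slot, value in enumerate(logical_outputs):
        yield slot, value


def demux(wire, num_signals):
    """Receiver side: latch each time slot back into its logical input."""
    logical_inputs = [None] * num_signals
    for slot, value in wire:
        logical_inputs[slot] = value
    return logical_inputs


def slots_per_emulated_cycle(num_signals, num_wires):
    """Time slots needed per emulated clock cycle (ceiling division)."""
    return -(-num_signals // num_wires)


if __name__ == "__main__":
    print(demux(mux([1, 0, 1, 1]), 4))
    print(slots_per_emulated_cycle(10, 3))
```

The slot count shows the cost side of the trade: the more logical signals share a physical wire, the more time slots each emulated clock cycle needs, so the emulation clock slows down.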
Logic Emulation Systems and their interconnection schemes • System with mesh topology - Quickturn’s RPM and Virtual Machine Works (IKOS). • System with partial crossbar - Quickturn’s Enterprise, Mars, and System Realizer. • System with mixed full and partial crossbar - Aptix Prototyping System. • System using time-multiplexed interconnect - Virtual Machine Works (IKOS), CoBALT and Arkos (Quickturn).
Memory Solutions in Emulators and future devices/systems • Goal: programmable memories with different width/depth/port combinations. • FPGA-based memories: • use logic resources inefficiently. • make timing correctness difficult to ensure. • require large or highly multi-ported memories to be partitioned across several FPGAs. • SRAMs with dedicated or programmable controllers.
Logic Emulation Design Flow [Figure: flow chart with steps HDL synthesis, synthesis, partitioning, system mapping, P & R, design downloading, emulators; phases labeled pre-configuration preparation, full-chip configuration, in-circuit emulation]
Logic Emulation Design Compiler and its components • The logic emulation design compiler is a large, complex EDA tool that includes: • Front-end design importer. • HDL-based synthesizer. • Clock and timing analyzer. • Partitioner. • System-level placer and router. • FPGA-based placer and router.
Objectives of logic emulation compiler • Fast compilation time. • Fast emulation clock. • Timing correctness. • Easy ECO (Engineering Change Order). • Minimal circuit size.
Design Considerations for Logic Emulators • HDL synthesis: • Trade-off between run time and quality. • CLB-based vs. gate-based designs. • Clock and timing analysis: • Timing correctness, free of hold-time violations. • Clock skew minimization. • Partitioning: • Run time. • Timing and area.
Design Considerations for Logic Emulators • System placement and routing: • Timing. • Completeness of routing. • FPGA-based placement and routing: • Fast run time. • Parallel compilation. Remember: the logic you emulate is not identical to the logic of your design.
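Hold-Time Violation [Figure: two flip-flops in CLBs; data passes from Q of the first through a LUT to D of the second, while the clock reaches the second flip-flop through a routing delay] Clock distribution problem (skew): a hold-time violation occurs when the clock routing delay exceeds the LUT delay on the data path.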
Timing Correctness: delay insertion [Figure: the same two flip-flops, CLB, LUT, and routing delay, with a delay element added]
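The hold-time condition and the delay-insertion fix can be written as a short check. This is a minimal Python sketch assuming a single launch/capture flip-flop pair and the standard hold inequality; the function and parameter names are illustrative, and all delays are in the same arbitrary time unit.

```python
def hold_slack(clk_to_q, data_delay, clock_skew, hold_time):
    """Hold slack at the capturing flip-flop; a negative value is a violation.

    New data reaches D after clk_to_q + data_delay, while the capturing
    clock edge arrives clock_skew late and needs D stable for hold_time.
    """
    return (clk_to_q + data_delay) - (clock_skew + hold_time)


def delay_to_insert(clk_to_q, data_delay, clock_skew, hold_time):
    """Smallest extra data-path delay that removes a hold violation (0 if none)."""
    return max(0, -hold_slack(clk_to_q, data_delay, clock_skew, hold_time))


if __name__ == "__main__":
    # Clock routing delay (5) exceeds the data-path delay: violation.
    print(hold_slack(1, 2, 5, 1))
    print(delay_to_insert(1, 2, 5, 1))
```

A compiler that inserts delay elements trades some emulation speed for timing correctness, since the added delay also lengthens the data path seen by the setup check.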
Timing Correctness: use clock enables for gated clocks [Figure labels: two flip-flops (D, Q, CK, CE) in a CLB, LUT, clock path, primary clock, low-skew net]
Methodology and components of Logic Emulator System • Pre-configuration preparation - prepare netlists and control files for configuration. • Testbed preparation - prepare emulation-based operation environment. • Full-chip configuration - download design to the emulator. • In-circuit emulation - test the design.
Pre-Configuration in Emulator System • Translate the leaf-cell libraries into emulation primitives. • Verify that the translated libraries are functionally equivalent to the originals. • Modify or redesign components that are incompatible with emulation techniques, such as precharge logic circuits. • Assemble all the gate-level netlists for the entire design.
Testbed in Logic Emulator • Design and implement the target ICE board combining the emulated design with real hardware. • Slow down the testbed to emulation speed. • Assemble the testbed and emulation equipment.
Full-Chip Configuration & In-Circuit Emulation • Full-chip configuration: • Prepare control files. • Partition the design to fit into the emulation system. • Download design into the system. • Verify that the emulation model faithfully implements the design as specified by RTL. • In-circuit emulation
Part IV Reconfigurable Computing and Systems
General-Purpose Computing vs. Custom Computing • General-purpose computing - running applications on a general-purpose computer. • Custom computing - running applications on custom-made, application-specific hardware. • Field-programmable devices make custom computing a reality.
Goals of Reconfigurable Computing • Tailor the architecture to the application. • Minimize or eliminate instruction interpretation. • Exploit fine grained parallelism. • Map software to hardware.
Applications of reconfigurable computing • Database search and analysis. • Image processing and machine vision. • Data compression. • Signal processing. • Neural networks. • Biology computing. • Medical computing. • Design Automation (PSU) • Many more.
Multi-Mode Systems map various applications to a reconfigurable system [Figure labels: ROM, reconfigurable system, Application 1, Application 2] • Different configurations for read & write operations of a tape driver (Honeywell). • Different configurations for different printer controllers (Tektronix).
Run-Time Reconfiguration in military image recognition system [Figure labels: image data I/O; Truck? Jeep? Tank?] • Break a single computation into multiple pieces. • Page in components as needed (virtual hardware), e.g. automatic target recognition.
Custom Computing • Application-specific systems. • Numerous applications for similar reconfigurable systems. • Offers hardware performance with the flexibility to handle numerous algorithms. • Multi-FPGA systems can be viewed as hardware supercomputers. (Speaker note: tell about DEC PeRLe.)
Reconfigurable Co-processors [Figure labels: Program 1 with Inst1, Program 2 with Inst2, processor, coprocessor] • Provide custom instructions on a per-application basis.
Types of Reprogrammable Systems: three ways to attach custom computing units [Figure labels: CPU, memory caches, I/O interface, coprocessor, attached processing unit, standalone PU] PU = processing unit
Types of Reprogrammable Systems • Attached and standalone processing units are reprogrammable systems housed on computer add-on cards and in separate cabinets, respectively. • Consideration: a large communication overhead may overshadow the speed gain. • Application-specific coprocessors can achieve significant improvement over a wide range of applications.
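The communication-overhead consideration can be made concrete with a simple model: the net speedup is the software run time divided by the sum of the hardware compute time and the data transfer time. This Python sketch is an illustrative assumption (no overlap between transfer and computation); the function name and the numbers are hypothetical.

```python
def effective_speedup(t_software, t_hardware, t_transfer):
    """Net speedup of an attached unit once data transfer time is included.

    Assumes transfer and computation do not overlap; all times share one unit.
    """
    return t_software / (t_hardware + t_transfer)


if __name__ == "__main__":
    # Raw 10x hardware gain with no transfer cost, then with growing overhead.
    for t_transfer in (0, 90, 190):
        print(t_transfer, effective_speedup(100, 10, t_transfer))
```

In this model a raw tenfold hardware gain vanishes once the transfer time reaches nine times the hardware compute time, and beyond that point the attached unit is slower than software alone.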
Types of Reprogrammable Systems • Integrate the reprogrammable logic into the processor itself. • A reprogrammable functional unit can be configured on a per-algorithm basis. • This provides special-purpose instructions tailored to the needs of a given application.
Architectures of Multi-FPGA (Reconfigurable) Systems • The most commonly used topologies: • Mesh: 1D (linear array), 2D, and 3D. • Crossbar: full, partial, mixed, and hierarchical. • Hybrid between mesh and crossbar. • Application-specific architecture.
Hybrid Topology of a reconfigurable system [Figure labels: external interfaces, 16 FPGAs, RAMs] Splash 2 augments a linear array of FPGAs with a crossbar switch. Goal: support systolic circuits.
Hybrid Topology [Figure labels: host interface, FPGAs, RAMs] Anyboard: a linear array of FPGAs augmented by global buses.
Hybrid Topology [Figure labels: host interface, RAMs, 4 x 4 mesh of FPGAs] DEC PeRLe-1: a 4 x 4 mesh of FPGAs augmented with shared global buses.