630 likes | 859 Views
Reconfigurable Computing. Nehir Sönmez 25-11-2004. Reconfigurable Computing. Standard Def inition : A reconf igurable computer is a device which computes by using post-fabrication spatial co m ponents of compute elements. [ Dehon ]
E N D
Reconfigurable Computing Nehir Sönmez 25-11-2004
Reconfigurable Computing Standard Definition: A reconfigurable computer is a device which computes by using post-fabrication spatial components of compute elements. [Dehon] • FPGA implementation of a processor core to run a program is excluded - not spatial mapping of problem. • ASIC implementations excluded – not postfabrication programmable. The definition restricts RC to mapping to fine-grained devices (such as FPGAs). Whereas General Purpose computers compute by making connections in time.
What is Reconfigurable Computing? Computation using hardware that can adapt at the logic level to solve specific problems • Why is this interesting? • Some applications are poorly suited to microprocessor. • VLSI “explosion” provides increasing resources. • Hardware/Software • Relatively new research area.
Spatial Computation • Example: • grade = 0.2 × mt1 + 0.2 × mt2 • + 0.2 × mt3 + 0.4 × project; • A hardware resource • (multiplier or adder) is • allocated for each operator • in the compute graph. • The abstract computation • graph becomes the • implementation template.
Temporal Computation • A hardware resource is • time-multiplexed to • implement the actions of • the operators in the • compute graph. • Close to a sequential • processor/software • solution. Many inbetween • cases exist.
Why is Custom Logic Faster Than Software? • Spatial vs. Temporal Computation • Processors divide computation across time, dedicated logic divides across space
Why is Custom Logic Faster Than Software? • Specialization • Instruction set may not provide the operations your program needs • Processors provide hardware that may not be useful in every program or in every cycle of a given program • Multipliers • Dividers • Instruction Memory • Processors need lots of memory to hold the instructions that make up a program and to hold intermediate results. • Bit Width Mismatches • In general, processors have a fixed bit width, and all computations are performed on that many bits • Multimedia vector instructions (MMX) a response to this
Data Storage (Register File) A B C ALU 64 Microprocessor-based Systems • Generalized to perform many functions well. • Operates on fixed data sizes. • Inherently sequential.
A H B L Functional Unit Reconfigurable Computing If (A > B) { H = A; L = B; } Else { H = B; L = A; } • Create specialized hardware for each application. • Functional units optimized to perform a special task.
Dataflow • Superscalar must find dataflow graph at run time • RC constructs data flow graph at compile time • no logic control overhead • no window size limitations
Implementation Spectrum Microprocessor Reconfigurable Hardware ASIC • ASIC gives high performance at cost of inflexibility. • Processor is very flexible but not tuned to the application. • Reconfigurable hardware is a nice compromise.
LE LE LE LE LE LE LE LE LE LE LE LE Field-Programmable Gate Array Tracks Logic Element • Each logic element outputs one data bit. • Interconnect programmable between elements. • Interconnect tracks grouped into channels.
Logic Element FPGA Architecture Issues • Need to explore architectural issues. • How much functionality should go in a logic element? • How many routing tracks per channel? • Switch “population”?
S S Real World Physical Issues • Modelling FPGA delay. • Improving performance through buffering/segmentation. • Technology dependent. • The cost of reconfigurability. Wires have real cost
Translating a Design to an FPGA C program . . C = A+B . Circuit Array A + C B • CAD to translate circuit from text description to physical implementation well understood. • CAD to translate from C program to circuit not well understood. • Very difficult for application designers to successfully write high-performance applications Need for design automation!
A B for (i = 0; i<n, i++) { . . } C = A+B + C High-level Compilers • Difficult to estimate hardware resources. • Some parts of program more appropriate for processor (hardware/software codesign). • Compiler must parallelize computation across many resources. • Engineers like to write in C rather than pushing little blocks around.
Reconfigurable Hardware Logic Element A B Out • Each logic element operates on four one-bit inputs. • Output is one data bit. • Can perform any boolean function of four inputs 2 = 64K functions! C D A B C D = out 4 2
Xilinx - Spartan II Architecture • IOBs provide the interfacebetween the package pins andthe internal logic • CLBs provide the functionalelements for constructing mostlogic • Dedicated block RAMmemories of 4096 bits each • Clock DLLs for clockdistributiondelay compensationand clock domain control • Versatile multi-levelinterconnect structure
Spartan II Configurable Logic Block • Basic block is a logic cell (LC) • – A 4-input function generator (LUT), • – Carry logic • – storage element. • • Each CLB contains • – four LCs, organized in two similar slices. • – logic that combines function generators toprovide functions of five or six inputs. LUT capacity is completely determined by the number of inputs, not the complexity
A B Co FA Ci LUT LUT S A A B B S Co Ci Ci Example: Two Bit Adder Made of Full Adders A+B = D Logic synthesis tool reduces circuit to SOP form S = ABCi + ABCi + ABCi + ABCi Co = ABCi + ABCi + ABCi + ABCi
LUT LUT ? Circuit Compilation • Technology Mapping • Placement • Routing Assign a logical LUT to a physical location. Select wire segments And switches for Interconnection.
Processor + FPGA Three possibilities daughtercard Proc FPGA chip Backplane bus (e.g. PCI) 1. FPGA serves as coprocessor for data intensive applications. FPGA chip Proc 2. FPGA serves as embedded computer for low latency transfer. “Reconfigurable Functional Unit”
RF FPGA ALU 3. Processor integration Processor + FPGA (cont..) Processor • FPGA logic embedded inside processor. • A number of problems with 2 and 3. • Process technology an issue. • ALU much faster than FPGA generally. • FPGA much faster than the entire processor.
F F F F F F F F F Multi-FPGA Systems • Most applications don’t fit on one device. • Create need for partitioning designs across many devices. • Effectively a “netlist computer” Each FPGA is a logic processor interconnected in a given topology.
Xilinx XC4000 Cell • 2 4-input look-up tables • 1 3-input look-up table • 2 D flip flops
Reconfiguration • Reconfiguration methodology • Static • Partially static (=partial reconfiguration) • Dynamic
The Design Process • Partition a program into sections to beimplemented on hardware and softwareseparately • Synthesize the computations destined forreconfigurable hardware into gate-level orcircuit level description. • Map the circuit onto reconfigurable blocksand connect them using reconfigurablerouting. • After compilation, the circuit is ready for configuration onto the hardware at runtime.
Performance • Power consumption • Flexibility • Programming Specialization RC Objectives • RC objectives: Specialization, performance, flexibility • Basic idea: “Programmable Hardware”
B B A A C C Continuous Routing Structured Routing Reconfigurable Devices • Routing strategies Reconfigurable Computing
Reconfigurable Instruction Set Processors • By including reconfigurability we can increase flexibility with high specialization Reconfigurable Instruction Set Processors Processor PLD Reconfigurable Processor
· · · · · · Task 1 Task K Task K+1 Task N Software Hardware · · · Software Hardware Task 1 Task 2 Task N • Coprocessor based approach • ASIP based approach Reconfigurable Instruction Set Processors
Coprocessor based approach (I) • Typical example: CPU + PCI board • Altera ARC-PCI • Compaq Pamette • System on Chip (SoC) • Altera´s Excalibur device • Chameleon Systems, Inc. Reconfigurable Instruction Set Processors
Coprocessor based approach (II) • Altera ARC-PCI Reconfigurable Instruction Set Processors
Coprocessor based approach (III) • Compaq Pamette Reconfigurable Instruction Set Processors
Coprocessor based approach (IV) • Altera´s Excalibur device • Embedded Processor: ARM, MIPS or NIOS Reconfigurable Instruction Set Processors
Coprocessor based approach (V) • Chameleon Systems, Inc. Reconfigurable Instruction Set Processors
Fetch Decode Issue Integer Unit FP Unit Branch Unit LD/ST Unit Reconfigurable Unit ASIP based approach (I) • Reconfigurable unit within CPU Reconfigurable Instruction Set Processors
C Code Compiler Instruction Description (Configuration bits) Assembly Code ASIP based approach (II) • Challenge: CAD tools Reconfigurable Instruction Set Processors
C Code Compiler Structure C Parsing Optimizations Hardware Estimator Inst. Identification Inst. Selection Hardware Generation Config. Scheduling Code Generation Assembly Code Configuration bits ASIP based approach (III) Reconfigurable Instruction Set Processors
32 32 32 32 32 32 5 5 4 5 Register File ALU MUX Encoded Instruction Word RFU ASIP based approach (II) • Example: Philips CinCISe Architecture Reconfigurable Instruction Set Processors
Why Compute With FPGAs? • Huge performance gap between software and hand-designed hardware systems • Often 100-to-1 ratio of performance or performance/area • Hardware systems not so good for general computing • Big design, cost barriers to implementation • Not practical to buy a new machine every time you want to run a different program • Reconfigurable systems offer best-of-both-worlds • Run-time programmability • Hardware-level performance
Good Applications for Reconfigurable Computing • Relatively small application graph • FPGAs have limited capacity • Simple control flow helps a lot • Data Parallelism • Execute same computations on many independent data elements • Pipeline computations through the hardware • Small and/or varying bit widths • Take advantage of the ability to customize the size of operators
Reconfigurable Computing Successes • RSA Decryption • Programmable-Active-Memory machine set record for decryption of RSA-encrypted data • DNA Sequence Matching • Reconfigurable hardware has achieved 100x better performance than contemporary supercomputers • Signal Processing • FPGA-based filters often get 10x better performance than DSP chips • Benefit from customization of hardware to the application • Emulation • Use reconfigurable logic to simulate new processors at high speeds • Cryptographic Attacks • High-performance low-cost implementations for breaking encryption algorithms