Reconfigurable Computing - Performance Issues
John Morris, Chung-Ang University / The University of Auckland
‘Iolanthe’ at 13 knots on Cockburn Sound, Western Australia
FPGA Architectures
• Design Flow
  • Good engineering practice requires that design exercises follow a defined procedure
• User’s specification
  • This is your starting point
  • It may take several forms
    • Informal requirements given to you by your user / client / …
    • Formal written requirements
      • All functional and non-functional requirements are precisely stated
      • Sometimes resulting in a very large (and dull) document!
    • Something in between
      • Your tutorial assignment was in this category
      • Mostly formal, but with some gaps you would need to fill in
      • Using research / further discussion with client / … etc
Typical FPGA Architecture
• Logic blocks embedded in a ‘sea’ of connection resources
  • CLB = logic block
  • IOB = I/O buffer
  • PSM = programmable switch matrix
• Interconnections critical
  • Transmission gates on paths
  • Flexibility
    • Connect any LB to any other
  • but
    • Much slower than connections within a logic block
    • Much slower than long lines on an ASIC
• Aside:
  • This is a ‘universal’ problem, not restricted to FPGAs!
  • Applies to
    • custom VLSI,
    • ASICs,
    • systems,
    • parallel processors
  • Small transistors → high speed → high density → long, wide datapaths
Logic Blocks
• Combination of
  • And-or array or Look-Up-Table (LUT)
  • Flip-flops
  • Multiplexors
• General aim
  • Arbitrary boolean function of several variables
  • Storage
• Adders are critical
  • All modern FPGAs have ‘fast carry logic’
  • High speed lines connecting LBs directly
  • Very fast ripple carry adders
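To illustrate how a LUT realises an "arbitrary boolean function of several variables", here is a minimal Python sketch. It is a software model only, not a description of any vendor's logic block; the names `make_lut` and `lut_eval` are hypothetical, chosen for this example.

```python
from itertools import product

def make_lut(func, n_inputs):
    """Build a LUT: one stored output bit for each of the 2**n_inputs input combinations."""
    return [func(*bits) & 1 for bits in product((0, 1), repeat=n_inputs)]

def lut_eval(lut, *bits):
    """Evaluate the LUT: the input bits form an address into the stored table."""
    addr = 0
    for b in bits:
        addr = (addr << 1) | b  # first input is the most significant address bit
    return lut[addr]

# A full adder expressed as two 3-input LUTs (inputs: a, b, cin).
sum_lut = make_lut(lambda a, b, c: a ^ b ^ c, 3)
carry_lut = make_lut(lambda a, b, c: (a & b) | (c & (a ^ b)), 3)
```

Changing the function only changes the table contents, which is why the same hardware structure can implement any function of its inputs.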
Ripple Carry Adder
[Figure: n-bit ripple carry adder. A chain of full adders (FA) takes inputs a0 to an-1 and b0 to bn-1 and produces sum bits s0 to sn-1; each FA's cout feeds the next FA's cin, and the last cout is the carry out.]
• The simplest and best known adder
• Time to complete
  • n × propagation delay (FA: (a or b) → carry)
• We can do better than this, using one of many known better structures
• but
• What are the advantages of a ripple carry adder?
  • Small
  • Regular
  • Fits easily into a 2-D layout!
• Very important in packing circuitry into the fixed 2-D layout of an FPGA!
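The ripple carry structure can be sketched in a few lines of Python. This is a bit-level software model under a simple assumption: every full adder contributes one unit of carry propagation delay, so the reported delay is just n. The helper names (`full_adder`, `ripple_carry_add`, `to_bits`, `from_bits`) are made up for this illustration.

```python
def full_adder(a, b, cin):
    """One-bit full adder: returns (sum, carry out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout

def ripple_carry_add(a_bits, b_bits, cin=0):
    """Add two equal-length bit lists (least significant bit first).

    Returns (sum_bits, carry_out, delay), where delay is counted in units
    of one full adder's carry propagation delay (assumed equal for all FAs).
    """
    sum_bits = []
    carry = cin
    for a, b in zip(a_bits, b_bits):
        s, carry = full_adder(a, b, carry)  # carry ripples to the next stage
        sum_bits.append(s)
    return sum_bits, carry, len(a_bits)

def to_bits(x, n):
    """Integer to n-bit list, least significant bit first."""
    return [(x >> i) & 1 for i in range(n)]

def from_bits(bits):
    """Bit list (least significant bit first) back to an integer."""
    return sum(b << i for i, b in enumerate(bits))
```

For example, `ripple_carry_add(to_bits(200, 8), to_bits(100, 8))` adds two 8-bit values; the sum overflows 8 bits, so the carry out is set and the delay is 8 units.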
Ripple Carry Adders
[Figure: ripple carry adder with its full adders (FA) grouped in pairs into logic blocks (LB); the carry passes from each LB to the next.]
• Ripple carry adder performance is limited by propagation of carries
• The logic for a 2-bit adder fits in a Xilinx CLB (enough logic for 5 inputs and 2 outputs)
• But the carry signals would need to be carried by the general routing resources (slow!)
  • (In fact, you can’t fit a 2-bit adder with carry out in a CLB because there aren’t enough outputs!)
• The fast carry logic provides special (low R) lines for carry-in and carry-out
  • → fast adder with 2 bits/CLB
‘Fast Carry’ Logic
• Critical delay
  • Transmission of carry out from one logic block to the next
• Solution (most modern FPGAs)
  • ‘Fast carry’ logic
  • Special paths between logic blocks used specifically for carry out
  • Very fast ripple carry adders!
• More sophisticated adders?
  • Carry select
    • Uses ripple carry blocks, so can use fast carry logic
    • Should be faster for wide datapaths?
  • Carry lookahead
    • Uses large amounts of logic and multiple logic blocks
    • Hard to make it faster for small adders!
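Carry Select Adder
[Figure: 8-bit carry select adder built from three n-bit ripple carry adders. The low-order block adds a0-3, b0-3 and cin, producing sum0-3 and cout3. Two high-order blocks both add a4-7 and b4-7: one with carry in 0 (producing sum^0_4-7 and cout^0_7) and one with carry in 1 (producing sum^1_4-7 and cout^1_7). Multiplexors select sum4-7 and the carry.]
• ‘Standard’ n-bit ripple carry adders
  • n = any suitable value
• Here we build an 8-bit adder from 4-bit blocks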
Carry Select Adder
[Figure: the same 8-bit carry select adder, with annotations on the three ripple carry blocks.]
• The low-order block adds the 4 low order bits
  • After 4*tpd it will produce a carry out
• The two high-order blocks ‘speculate’ on the value of cout3
  • One assumes it will be 0, the other assumes 1
Carry Select Adder
[Figure: the same 8-bit carry select adder. The low-order block adds the 4 low order bits; after 4*tpd it will produce a carry out.]
• After 4*tpd we will have:
  • sum0-3 (final sum bits)
  • cout3 (from low order block)
  • sum^0_4-7
  • cout^0_7 (from block assuming 0 cin)
  • sum^1_4-7
  • cout^1_7 (from block assuming 1 cin)
Carry Select Adder
[Figure: the same 8-bit carry select adder, with the multiplexors highlighted.]
• cout3 selects the correct sum4-7 and carry out
• All 8 bits + carry are available after 4*tpd(FA) + tpd(multiplexor)
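As a worked comparison, the 8-bit delays can be written side by side. This assumes the simple model used on these slides: each full adder adds one t_pd(FA) to the carry chain, and the multiplexor adds a single t_pd(mux). The symbols t_ripple and t_select are introduced here only for this comparison.

```latex
\begin{align*}
t_{\text{ripple}}(8) &= 8\,t_{pd}(\mathrm{FA}) \\
t_{\text{select}}(8) &= 4\,t_{pd}(\mathrm{FA}) + t_{pd}(\mathrm{mux}) \\
t_{\text{select}}(8) < t_{\text{ripple}}(8) &\iff t_{pd}(\mathrm{mux}) < 4\,t_{pd}(\mathrm{FA})
\end{align*}
```

Under this model the carry select adder wins whenever the multiplexor is faster than four full adder carry stages, at the cost of one extra 4-bit ripple carry block plus the multiplexors.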
Carry Select Adder
• This scheme can be generalized to any number of bits
  • Select a suitable block size (e.g. 4, 8)
  • Replicate all blocks except the first
    • One with cin = 0
    • One with cin = 1
  • Use final cout from preceding block to select correct set of outputs for current block
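The generalized scheme can be sketched as a small Python model. It is a functional illustration only (software evaluates the blocks one after another, whereas the hardware blocks run in parallel); `ripple_block` and `carry_select_add` are hypothetical names, and the width is assumed to be a multiple of the block size.

```python
def ripple_block(a_bits, b_bits, cin):
    """Ripple carry block over bit lists (LSB first): returns (sum_bits, carry out)."""
    carry = cin
    out = []
    for a, b in zip(a_bits, b_bits):
        out.append(a ^ b ^ carry)
        carry = (a & b) | (carry & (a ^ b))
    return out, carry

def carry_select_add(a, b, width=8, block=4, cin=0):
    """Carry select addition of two width-bit integers; returns (sum, carry out)."""
    assert width % block == 0, "this sketch assumes equal-sized blocks"
    a_bits = [(a >> i) & 1 for i in range(width)]
    b_bits = [(b >> i) & 1 for i in range(width)]

    # The first (low-order) block is not replicated: its carry in is known.
    sum_bits, carry = ripple_block(a_bits[:block], b_bits[:block], cin)

    for lo in range(block, width, block):
        hi = lo + block
        # Replicated blocks: one speculates cin = 0, the other cin = 1.
        sum0, cout0 = ripple_block(a_bits[lo:hi], b_bits[lo:hi], 0)
        sum1, cout1 = ripple_block(a_bits[lo:hi], b_bits[lo:hi], 1)
        # Multiplexor: the preceding block's final carry selects the outputs.
        chosen, carry = (sum1, cout1) if carry else (sum0, cout0)
        sum_bits.extend(chosen)

    total = sum(bit << i for i, bit in enumerate(sum_bits))
    return total, carry
```

Under the same simple delay model as before, with k equal blocks all blocks finish after one block's ripple time, and the selection then passes through k - 1 multiplexors in sequence.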