Implementation Approaches with FPGAs • Compile-time reconfiguration (CTR): a static implementation strategy where each application consists of one configuration. • Run-time reconfiguration (RTR): a dynamic implementation strategy where each application consists of multiple cooperating configurations.
Compile Time Reconfiguration • Consists of a single system-wide configuration • The static hardware configuration remains on the FPGAs for the duration of the application • Similar to an ASIC from the application's point of view • Conventional design tools provide adequate support for application development • Examples: Splash, Nano processor
Run Time Reconfiguration • Applications reconfigure hardware resources during application execution • Each configuration implements some fraction of the application • Optimizes hardware resources • Lack of sufficient design tools and a well-defined design methodology
New Design Problems • Divide the algorithm into time-exclusive segments that do not need to (or cannot) run concurrently • Each segment should remain configured for a reasonable amount of time • Tasks should be relatively independent of each other • Coordinate the behavior between configurations • Intermediate results
Two RTR Approaches (1) • Global Approach • Each phase of the application is implemented as a single system-wide configuration; it allocates all hardware resources in each configuration step • Relatively simple, coarse grained • Implementation issues • Divide the application into roughly equal-sized partitions • Interfaces between configurations are fixed • Example: RRANN
Two RTR Approaches (2) • Local Approach • Applications locally reconfigure subsets of the logic as the application executes • Flexible, finer granularity • Ability to create fine-grained functional operators • Implementation issues • Interfaces are not fixed • The designer needs to ensure both structural compliance and physical compliance • No good design tool support • Example: DISC, RRANN2
Run-time Reconfiguration Paper (1) • FPGA and Neural Networks • Implementation of random topologies • Training versus operation • Multiple training algorithms • Run-time reconfiguration
Run-time Reconfiguration Paper (2) • Problem: Backpropagation Training Algorithm • Feed-forward stage: • Backpropagation
Run-time Reconfiguration Paper (3) • Backpropagation • Update
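The three stages named on these two slides (feed-forward, backpropagation, update) are usually written as below. This is the common textbook formulation, given here as an assumption; the symbols ($o_j$ for a neuron output, $w_{ji}$ for the weight from neuron $i$ to neuron $j$, $t_j$ for the target, $\eta$ for the learning rate, $f$ for the activation function) are illustrative and not taken from the paper.

```latex
% Feed-forward: each neuron sums its weighted inputs and applies f
\mathrm{net}_j = \sum_i w_{ji}\, o_i, \qquad o_j = f(\mathrm{net}_j)

% Backpropagation: error terms for output neurons, then hidden neurons
\delta_j = (t_j - o_j)\, f'(\mathrm{net}_j) \quad \text{(output layer)}
\delta_j = f'(\mathrm{net}_j) \sum_k \delta_k\, w_{kj} \quad \text{(hidden layer)}

% Update: adjust each weight along its error gradient
w_{ji} \leftarrow w_{ji} + \eta\, \delta_j\, o_i
```

Each stage uses only values produced by the previous one, which is what makes the algorithm a natural candidate for time-exclusive configurations.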
Run-time Reconfiguration Paper (4) • Approach 1: • Combine all three stages of execution into the same circuit module and configure this module onto FPGAs • No reconfiguration • Approach 2: • Combine the feed-forward and update stages into one circuit and the backpropagation stage into another. • Reconfigure twice (per cycle)
Run-time Reconfiguration Paper (5) • Approach 3: • Treat feed-forward, backpropagation and update as three circuit modules • Need to reconfigure three times per cycle • Each stage consists of a global controller occupying one FPGA and many neural processors occupying the balance of the available FPGAs • 6 neurons per FPGA
Run-time Reconfiguration Paper (6) • Global Controller • Sequences the execution of local hardware subroutines on the neural processors • Supplies data to the neural processors • Neural Processor • Performs computations • Has six hardware neurons, pre- and post-processing, memory interfacing, local control, and a local RAM
Run-time Reconfiguration Paper (7) • Multiplexed Interconnection • A broadcast bus connects the outputs of all neurons on layer m to the inputs of all neurons on layer m+1 • The Feed-forward Stage • The Backpropagation Stage • The Update Stage
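The multiplexed interconnection can be pictured with a small simulation. This is a minimal sketch, assuming each neuron on layer m takes one bus cycle to broadcast its output while every neuron on layer m+1 multiplies that value by its own weight and accumulates. The function name, the weight layout, and the default `tanh` activation are illustrative assumptions, not details of the RRANN hardware.

```python
import math


def feed_forward_broadcast(outputs_m, weights, activation=math.tanh):
    """Simulate one layer-to-layer transfer over a shared broadcast bus.

    outputs_m: outputs of the neurons on layer m.
    weights:   weights[j][i] is the weight from neuron i (layer m)
               to neuron j (layer m+1).
    Returns (activations of layer m+1, number of bus cycles used).
    """
    accumulators = [0.0] * len(weights)
    bus_cycles = 0
    # One bus cycle per sender: its output is visible to every receiver.
    for i, value_on_bus in enumerate(outputs_m):
        for j, row in enumerate(weights):
            accumulators[j] += row[i] * value_on_bus
        bus_cycles += 1
    return [activation(a) for a in accumulators], bus_cycles
```

In this model the number of bus cycles grows with the number of senders on layer m, not with the number of connections, which is the appeal of a single shared bus over point-to-point wiring.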
Run-time Reconfiguration Paper (8) • Implementation • Xilinx XC3090 • Host PC • Comparison of space capacity • Option 1: One hardware neuron per XC3090 • Option 2: Four hardware neurons per XC3090 • Option 3: Six hardware neurons per XC3090
Run-time Reconfiguration Paper (9) • Comparison of time efficiency • Option 1: 0ms reconfiguration time • Option 2: 14ms per pass reconfiguration time • Option 3: 21ms per pass reconfiguration time • Time / Space tradeoff • When more hardware is needed, the same space on an FPGA could be reused many times through reconfiguration, but doing so reduces the amount of time that the FPGA could spend executing
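The time/space tradeoff can be made concrete with a rough model. The sketch below takes the neurons-per-FPGA and reconfiguration-time figures from the two comparison slides; the execution time per pass is a hypothetical input, and the model assumes it is the same for every option, which a real design will not satisfy exactly.

```python
# (hardware neurons per XC3090, reconfiguration time per pass in ms),
# as listed on the comparison slides.
OPTIONS = {
    "option 1": (1, 0.0),
    "option 2": (4, 14.0),
    "option 3": (6, 21.0),
}


def executing_fraction(exec_ms, reconfig_ms):
    """Fraction of each pass that the FPGA spends doing useful work."""
    return exec_ms / (exec_ms + reconfig_ms)


def effective_neurons(neurons_per_fpga, exec_ms, reconfig_ms):
    """Neurons per FPGA, discounted by the time lost to reconfiguration."""
    return neurons_per_fpga * executing_fraction(exec_ms, reconfig_ms)


def compare(exec_ms):
    """Effective neurons per FPGA for each option at a given (assumed) exec time."""
    return {
        name: effective_neurons(n, exec_ms, reconfig_ms)
        for name, (n, reconfig_ms) in OPTIONS.items()
    }
```

Calling `compare` with a short and a long execution time shows the pattern the slide describes: reconfiguration pays off only when each configuration runs long enough to amortize its download time.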
Run-time Reconfiguration Paper (10) • Functional Density Metric D • Functional density is a composite area-time metric used to identify the computational throughput (operations per second) of unit hardware resources • Area (A) is measured as the FPGA cell count of the circuit; operating time (T) is measured as the execution time of the system
RRANN2: Partial Reconfiguration (1) • Run-time reconfiguration (RTR) is an implementation approach that divides an application into a series of sequentially executed stages, with each stage implemented as a separate circuit module • Partial RTR extends the approach by partitioning these stages and designing their circuitry such that they exhibit a high degree of functional and physical commonality • By leaving common circuitry resident, the transition between configurations can be accomplished by updating only the difference between configurations
RRANN2: Partial Reconfiguration (2) • Design goal • To reach the break-even point with fewer neurons per layer • Advantages • A smaller reconfiguration bit-stream downloads faster • Eliminating part of the routing and control circuitry increases hardware neural density • Static versus Dynamic Circuitry
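The metric is commonly written as D = 1 / (A · T). The sketch below assumes that form, and also assumes that for a run-time reconfigured design T is the sum of execution and configuration time; both are stated here as assumptions because the slide gives only the definitions of A and T.

```python
def functional_density(area_cells, exec_time_s, config_time_s=0.0):
    """Functional density D = 1 / (A * T), under the assumptions above.

    area_cells:    circuit area A, in FPGA cells.
    exec_time_s:   execution time per operation, in seconds.
    config_time_s: reconfiguration time charged to the operation (0 for CTR).
    """
    total_time = exec_time_s + config_time_s
    if area_cells <= 0 or total_time <= 0:
        raise ValueError("area and total time must be positive")
    return 1.0 / (area_cells * total_time)
```

Under this model a reconfigured circuit wins only when its area savings outweigh the configuration time added to T.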
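Updating only the difference between configurations can be sketched as a frame-level diff. This is a simplified model, assuming a configuration is a list of equal-position frames (here plain `bytes` values); the frame granularity and the function names are hypothetical and do not describe any particular device's bit-stream format.

```python
def partial_update(current, target):
    """Return the (index, frame) pairs that differ between two configurations."""
    if len(current) != len(target):
        raise ValueError("configurations must cover the same frames")
    return [
        (i, new)
        for i, (old, new) in enumerate(zip(current, target))
        if old != new
    ]


def apply_update(current, update):
    """Write only the changed frames, leaving the common circuitry untouched."""
    result = list(current)
    for index, frame in update:
        result[index] = frame
    return result
```

For example, if two stages share their control and arithmetic frames and differ in one frame, `partial_update` yields a single entry, and the download cost scales with that one frame instead of the whole device.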
RRANN2: Partial Reconfiguration (3) • Fully static circuitry • Combinational logic • Storage devices (preserves both configuration and current value of the storage device) • Mostly static circuitry • Precision: two devices only differ in their precision • Constant value: two blocks differ by a constant value • Function: two blocks perform logically different functions but their construction is almost identical • Subsets: one block is structurally and functionally contained within the bounds of the other
RRANN2: Partial Reconfiguration (4) • Physical design issues • Each block should contain the same physical implementation and occupy the same position on the device • A common logic block is also constrained by the physical context of its surroundings, many of which might be unknown at design time • Further constraints have to be placed on the design to group the static circuitry and so ensure that the resulting bit-stream shrinks • No good design-tool support
RRANN2: Partial Reconfiguration (5) • Implementation • Step 1: The circuit modules are placed and routed by hand to physically map the schematics to corresponding FPGA resources • Step 2: The physical representation is converted to downloadable configuration bit streams • Performance (CLAy31) • Reconfiguration time: 600us • Training performance: 4 times the performance of RRANN • FPGA density: 50% more neurons per FPGA than RRANN
Research Issues (1) • Scheduling designs into a time-multiplexed FPGA • An algorithm is proposed to split an FPGA design into multiple configurations of a time-multiplexed FPGA • ASAP (as soon as possible) scheduling • ALAP (as late as possible) scheduling • Optimize the schedule by identifying the units not on the critical path and rescheduling their evaluation into other cycles
Research Issues (2) • Wormhole run-time reconfiguration • Altering the configuration has so far relied on global control strategies, which present a fundamental bottleneck to the potential bandwidth of configuration information flow • Serial configuration: Xilinx 4000 • Random access configuration: CLAy • Wormhole run-time reconfiguration
Research Issues (3) • Interaction of pipeline and reconfigurable FPGAs • An ideal virtualized FPGA would be capable of executing any hardware design, regardless of the size of that design. The execution speed would be proportional to the physical capacity of the FPGA and inversely proportional to the size of the hardware design • Similar to DISC? • Granularity of the swapping unit
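ASAP and ALAP levels, and the slack between them, can be computed directly from a dependency graph. The sketch below is a generic illustration of those two schedules, not the proposed algorithm itself; it assumes the design is given as a hypothetical mapping from each unit to the units it depends on, and that each unit takes one cycle.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def asap_alap(preds):
    """Compute ASAP level, ALAP level, and slack for each unit.

    preds: dict mapping each unit to the list of units it depends on.
    Units with zero slack lie on the critical path.
    """
    order = list(TopologicalSorter(preds).static_order())

    # ASAP: one cycle after the latest predecessor (cycle 0 if none).
    asap = {}
    for unit in order:
        asap[unit] = 1 + max((asap[p] for p in preds.get(unit, ())), default=-1)
    last_cycle = max(asap.values())

    # Invert the graph to find each unit's successors.
    succs = {unit: [] for unit in order}
    for unit in order:
        for p in preds.get(unit, ()):
            succs[p].append(unit)

    # ALAP: one cycle before the earliest successor (last cycle if none).
    alap = {}
    for unit in reversed(order):
        alap[unit] = min((alap[s] for s in succs[unit]), default=last_cycle + 1) - 1

    slack = {unit: alap[unit] - asap[unit] for unit in order}
    return asap, alap, slack
```

A unit with positive slack can be moved to any cycle between its ASAP and ALAP levels, which is the freedom a scheduler uses to balance the configurations.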