Physical 2D Morphware and Power Reduction Methods for Everyone Jürgen Becker, Katarina Paulsson, Michael Hübner Head of Institute: Prof. Dr.-Ing. K.D. Müller-Glaser Prof. Dr.-Ing. J. Becker
Contents • Motivation • 2D-Online Placement and Routing • Applications • Configuration Management • Online Routing • Integration and Test • Audio- Streaming Application • Analysis: Power Consumption of Communication Primitives • Routing dependent Power Consumption • Power Aware Routing Strategies • Conclusion and Future Work
Motivation: Traditional Dynamic and Partially Reconfigurable Systems Separation into 2 areas: • Static Area • Dynamic Area Separation of the dynamic area into slots: • Height of slots is related to chip size Communication structure: • Horizontal Bus-Macros [Figure: block diagram with static area (processor, IOs, interface, ICAP, external memory) and dynamic area (Slot 0, Slot 1, Module A, Module B)] Source: Markus Stitz
Motivation: Disadvantages of the Traditional Approach Disadvantages of the slot based approach: • Fixed width of reconfigurable areas: specified by the functionality with maximum utilization • Communication only via bus topology • Unutilized area with smaller functions • Optimized for automotive application Introduction of a new approach with the possibility of 2D placement: • Exploitation of the complete reconfigurable area • Size of area adjusted to the size of the function • NoC to enable inter-module communication and adaptation of topology • Generic approach for a wide range of applications
Motivation: 2-Dimensional Approach Slot based approach: • Flexible positioning of function blocks within a slot • Multiple modules in a slot • Optimized chip-area utilization • Decreased delay times • High adaptivity Module based approach: • Autonomous modules • Various placement Question: Why not? [Figure: FPGA with MicroBlaze / PowerPC, UART interface to PC, OPB, external bus, flash controller and flash memory, HWIcap and ICAP, user IP, and Modules A to E; static area, macro, and a configuration slot holding M1, M2 and M3]
Hardware Basics: Virtex-II Configuration Memory • Configuration memory is built out of frames • A minor frame configures a fraction of all CLB resources in a column • A minor frame is the smallest reconfigurable unit • Dynamic and partial reconfiguration: only "column wise" • Solution: Read-Modify-Writeback method [Figure: major frame and minor frame]
Read-Modify-Writeback Method: Principle Basic requirements: • Partial read-back of the current configuration is possible • "Glitchless switching" Basic principle: • Read-back of the partial configuration • Manipulation of the bitstream • Write-back of the manipulated bitstream Challenge: • No documentation on how the individual resources are addressed [Figure: ICAP interface]
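The three steps of the basic principle can be sketched in Python. This is a toy model under stated assumptions, not the ICAP or JBits interface: the configuration memory is a plain list of frames, and `ConfigMemory`, `read_frame`, `write_frame` and the 32-bit frame length are hypothetical names and values chosen for illustration.

```python
# Toy model of read-modify-writeback on frame-organized configuration memory.
# Frame length, bit layout and all names are illustrative assumptions.

FRAME_BITS = 32  # hypothetical frame length, far smaller than a real frame


class ConfigMemory:
    def __init__(self, num_frames):
        self.frames = [[0] * FRAME_BITS for _ in range(num_frames)]

    def read_frame(self, index):
        # Read-back: return a copy so the live configuration stays untouched.
        return list(self.frames[index])

    def write_frame(self, index, bits):
        if len(bits) != FRAME_BITS:
            raise ValueError("frame length mismatch")
        self.frames[index] = list(bits)


def read_modify_writeback(mem, frame_index, offset, new_bits):
    """Change only new_bits at offset; all other bits of the frame are kept."""
    if offset < 0 or offset + len(new_bits) > FRAME_BITS:
        raise ValueError("modification does not fit into the frame")
    frame = mem.read_frame(frame_index)                # 1. read back
    frame[offset:offset + len(new_bits)] = new_bits    # 2. modify
    mem.write_frame(frame_index, frame)                # 3. write back
    return frame


mem = ConfigMemory(num_frames=4)
mem.write_frame(2, [1] * FRAME_BITS)
read_modify_writeback(mem, 2, offset=8, new_bits=[0, 0, 0, 0])
```

The point of the sketch is that the frame is the smallest unit that can be written, so changing a few bits inside it requires reading the whole frame first and writing it back unchanged everywhere else.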
Read-Modify-Writeback Method: Reverse Engineering with JBits • API for the Xilinx Virtex-II configuration bitstream • Manipulation of single FPGA resources • from the complete bitstream • from the readback bitstream • JBits.read() • JBits.getCLBBits(row,col,source) • JBits.setCLBBits(row,col,source,bits) • JBits.generatePartial() [Figure: host system with the JBits API manipulating the bitstream, connected over a serial interface to the FPGA and its ICAP]
Read-Modify-Writeback Method: Features of Elementary-Block Addressing • On-chip realization • Moving of elementary blocks • Swapping of elementary blocks • Fetching of an elementary block • Loading of an elementary-block configuration from external memory • Merging of elementary blocks into complete blocks (modules) • Moving of modules • Swapping of modules • Loading of modules
Read-Modify-Writeback Method: Storage of a Module to External Memory • Cut out of frames from the complete bitstream • Scaling by movement to ground line • Frame by frame storage… [Figure: memory mapping of the module columns COL 0 and COL 1, frames 0 to 21 each, into external memory as a module without placement information]
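The storage steps above can be sketched as follows. This is a minimal illustration under assumptions: the nested-list layout `bitstream[col][frame]` and the function name `cut_module` are hypothetical, and the 22 frames per column are taken from the frame indices 0 to 21 shown in the figure.

```python
# Sketch: cut a module out of a full configuration, drop its placement
# information, and store it frame by frame. The data layout is an
# illustrative assumption; 22 frames per column follow the figure (0..21).

FRAMES_PER_COLUMN = 22


def cut_module(bitstream, first_col, num_cols, first_row, num_rows):
    """bitstream[col][frame] is a list with one entry per CLB row.

    Returns a flat list of frames that contains only the module's rows.
    Slicing from first_row shifts the module so that it starts at row 0,
    which removes the vertical placement information.
    """
    stored = []
    for col in range(first_col, first_col + num_cols):
        for frame in range(FRAMES_PER_COLUMN):
            rows = bitstream[col][frame]
            stored.append(rows[first_row:first_row + num_rows])
    return stored


# Hypothetical device: 4 columns x 22 frames x 8 rows, entries tagged by origin.
device = [[[(c, f, r) for r in range(8)] for f in range(FRAMES_PER_COLUMN)]
          for c in range(4)]
module = cut_module(device, first_col=1, num_cols=2, first_row=3, num_rows=2)
```

Because the stored frames carry no column or row offset, the same data could later be written to a different position, which is what makes moving and swapping modules possible.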
Application: Configuration Management System Properties: • Platform independent for all Virtex-II devices • Generic, not application-specific approach • API for the run-time system Tasks of the FPGA basis configuration and the configuration management system: • Reception of reconfiguration commands • Feedback of system status • Administration of the reconfigurable area • Management of external memory • Dynamic routing
Online Routing: Online Routing Elements Routing channel: • Includes dynamic routing blocks Module interface: • Interfaces to routing blocks Connection macro: • Interface to static part • Interface to routing channel Routing block: • Type I • Type II • Type III • Variable module position [Figure: module in a slot]
Online Routing: Online Routing Process Initialization: • TYPE-I Blocks • TYPE-III End Routing: • Exchange: TYPE-I → TYPE-II Un-Routing: • Exchange: TYPE-II → TYPE-I [Figure: module, slot and IP-user interface]
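The exchange pattern of the routing process can be sketched as a small state model. Only the exchange pattern itself comes from the slide; modelling the channel as a Python list, reading "TYPE-III End" as a closing block at the end of the channel, and all function names are assumptions.

```python
# Sketch of the block exchange in a routing channel. Only the exchange
# pattern comes from the slide; the list model and names are assumptions.

TYPE_I, TYPE_II, TYPE_III = "I", "II", "III"


def init_channel(length):
    """Initialization: TYPE-I blocks, closed by one TYPE-III block (assumed)."""
    if length < 2:
        raise ValueError("channel needs at least two blocks")
    return [TYPE_I] * (length - 1) + [TYPE_III]


def route(channel, position):
    """Routing: exchange the TYPE-I block at the module position for TYPE-II."""
    if channel[position] != TYPE_I:
        raise ValueError("only a TYPE-I block can be routed")
    channel[position] = TYPE_II


def unroute(channel, position):
    """Un-routing: exchange the TYPE-II block back to TYPE-I."""
    if channel[position] != TYPE_II:
        raise ValueError("only a TYPE-II block can be un-routed")
    channel[position] = TYPE_I


channel = init_channel(5)
route(channel, 2)
```

In the real system each exchange would be a read-modify-writeback operation on the configuration of that routing block; the sketch tracks only which block type sits at which position.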
Modular Software Configuration Manager [Figure: software stack with the blocks Configuration-management, Serial Interface, Flash Manager, Icap Manager, Flash Driver, Uart Utils, Icap Utils, EDK-LIB, Uart Driver and Icap Driver, and the hardware UART, FLASH and ICAP-PRIMITIVE]
Integration and Test: Audio Streaming Application Performance of the RMW method: • Per E-Block = 20 ms • Virtex-II XC2V1000 device • Frequency of 100 MHz [Figure: system with MicroBlaze or PowerPC, OPB-HwIcap, memory controller, external memory, two OPB-UartLite, OPB-GP-IO, D/A converter, speaker and host]
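As a rough worked example of what the 20 ms figure implies, the sketch below scales it to a whole module. It assumes that elementary blocks are processed one after another and that the time grows linearly with their number, which the slide does not state; the module size of 6 blocks is hypothetical.

```python
# Back-of-the-envelope estimate from the slide's figures. Linear scaling with
# the number of elementary blocks is an assumption, not a measured result.

E_BLOCK_TIME_MS = 20        # per elementary block, from the slide
CLOCK_HZ = 100_000_000      # 100 MHz, from the slide


def rmw_time_ms(num_e_blocks):
    """Estimated read-modify-writeback time for a module of num_e_blocks."""
    return num_e_blocks * E_BLOCK_TIME_MS


def cycles_per_e_block():
    """Clock cycles spent per elementary block at the given frequency."""
    return E_BLOCK_TIME_MS * CLOCK_HZ // 1000


print(rmw_time_ms(6))         # hypothetical 6-block module: 120 ms
print(cycles_per_e_block())   # 2000000 cycles
```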
Basic Modules Basic modules with fixed connection points: • Simple router functionality included • Communication interfaces with fixed distance: guaranteed connection to any neighbor module • Inter-module communication: adaptable topology • Generic interface to local application External ports (example: East_out), 16 bit standard logic: • 8 bit data • 7 bit additional • 1 bit control line Area allocation: • Function module • Router module [Figure: BasicModule1X1, BasicModule2X1, BasicModule1X2, BasicModule2X2]
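The 16-bit port format (8 data bits, 7 additional bits, 1 control bit) can be illustrated with a pack and unpack pair. The field order is an assumption: the slide gives the widths but not the bit positions, so the sketch places data in bits 0 to 7, the additional bits in 8 to 14, and the control line in bit 15. The function names are hypothetical.

```python
# Sketch of a 16-bit port word with the field widths from the slide.
# The bit positions of the fields are an assumption.

DATA_BITS, EXTRA_BITS, CTRL_BITS = 8, 7, 1


def pack_port(data, extra, ctrl):
    """Combine the three fields into one 16-bit word."""
    if not 0 <= data < (1 << DATA_BITS):
        raise ValueError("data must fit into 8 bits")
    if not 0 <= extra < (1 << EXTRA_BITS):
        raise ValueError("additional field must fit into 7 bits")
    if ctrl not in (0, 1):
        raise ValueError("control line is a single bit")
    return data | (extra << DATA_BITS) | (ctrl << (DATA_BITS + EXTRA_BITS))


def unpack_port(word):
    """Split a 16-bit word back into (data, extra, ctrl)."""
    data = word & ((1 << DATA_BITS) - 1)
    extra = (word >> DATA_BITS) & ((1 << EXTRA_BITS) - 1)
    ctrl = (word >> (DATA_BITS + EXTRA_BITS)) & 1
    return data, extra, ctrl


word = pack_port(0xAB, 0x15, 1)
fields = unpack_port(word)
```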
Design Flow: Design of Reconfigurable System from High Abstraction Level [Figure: design flow with UCF file, bus macros, permutation of slots and bitstream (JBits)]
On-Line Placement Process • The control unit initiates the load process from external memory • The run-time system calculates the necessary and available area for placement • After placement, the communication primitives are established [Figure: BasisModule1X1]
Implemented System: Virtex-II Pro (XC2VP30) [Figure: router network and processor with connect macros, router connect module, PPC system, router unit and function unit]
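The area calculation in the second step could look like the sketch below. It uses a simple first-fit search on a grid of elementary blocks, which is a swapped-in textbook strategy and not necessarily what the run-time system does; the grid size, the module names and all function names are hypothetical.

```python
# First-fit 2D placement sketch on a grid of elementary blocks. First-fit is
# an illustrative stand-in; the slides do not name the placement algorithm.


def free_cells(grid):
    """Available area: number of unoccupied elementary blocks."""
    return sum(row.count(None) for row in grid)


def find_position(grid, width, height):
    """Return the first (x, y) where a width x height module fits, else None."""
    rows, cols = len(grid), len(grid[0])
    for y in range(rows - height + 1):
        for x in range(cols - width + 1):
            if all(grid[y + dy][x + dx] is None
                   for dy in range(height) for dx in range(width)):
                return x, y
    return None


def place(grid, name, width, height):
    """Check necessary vs. available area, then occupy the first free region."""
    if width * height > free_cells(grid):
        return None
    pos = find_position(grid, width, height)
    if pos is None:
        return None
    x, y = pos
    for dy in range(height):
        for dx in range(width):
            grid[y + dy][x + dx] = name
    return pos


grid = [[None] * 4 for _ in range(3)]   # hypothetical 4 x 3 reconfigurable area
a = place(grid, "A", 2, 2)              # a 2X2 basic module
b = place(grid, "B", 2, 1)              # a 2X1 basic module
c = place(grid, "C", 4, 2)              # too large for the remaining area
```

Establishing the communication primitives after placement is not modelled here; the sketch covers only the area bookkeeping.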
Motivation Investigation of communication lines on standard reconfigurable hardware regarding energy consumption and performance • Longer communication lines cause higher power consumption (due to capacitance) while providing higher performance • Power consumption and delay were estimated for the different lines • This information can be exploited during the routing phase (in future even dynamically) to optimize power consumption while meeting performance constraints Source: Xilinx
Implementation of the signal lines Slice based macros for implementing the different communication lines on a Virtex-II 2000 FPGA (Xilinx): • Direct line • Double line • Hex line • Long line
Estimation of the power consumption with XPower XPower estimates the power consumption based on a timing simulation in ModelSim: • ModelSim: simulation and generation of a VCD* file • XPower: estimation of the power consumption (based on the number of signal transitions) *Value Change Dump
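To show what "number of signal transitions" means for a VCD file, the sketch below counts value changes per signal in a tiny hand-written dump. It is not a model of XPower: it handles only one-bit signals in a simplified VCD layout, and the function name and the sample signals `clk` and `data` are hypothetical.

```python
# Minimal transition counter for scalar signals in a simplified VCD dump.
# Real VCD files also contain vectors and multi-line declarations, which
# this sketch ignores.


def count_transitions(vcd_text):
    names = {}     # identifier code -> signal name
    last = {}      # identifier code -> last seen value
    toggles = {}   # signal name -> number of value changes
    in_body = False
    for line in vcd_text.splitlines():
        tok = line.split()
        if not tok:
            continue
        if not in_body:
            if tok[0] == "$var":
                # Format: $var <type> <width> <code> <name> $end
                names[tok[3]] = tok[4]
                toggles[tok[4]] = 0
            elif tok[0] == "$enddefinitions":
                in_body = True
            continue
        # Scalar value change, for example "1!" (value followed by code).
        if len(tok) == 1 and tok[0][0] in "01xzXZ":
            value, code = tok[0][0], tok[0][1:]
            if code in names:
                if code in last and last[code] != value:
                    toggles[names[code]] += 1
                last[code] = value
    return toggles


SAMPLE_VCD = """$timescale 1ns $end
$scope module top $end
$var wire 1 ! clk $end
$var wire 1 % data $end
$upscope $end
$enddefinitions $end
#0
$dumpvars
0!
0%
$end
#5
1!
#10
0!
1%
#15
1!
"""

result = count_transitions(SAMPLE_VCD)
```

For the sample dump, `clk` changes three times and `data` once, so a transition-based estimate would attribute more dynamic power to the net driving `clk`.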
Results Power consumption and delays of different FPGA signal lines:
Optimization possibilities in larger FPGA designs Can multiple short lines be more power efficient than one longer line? Example: Hex line or 3 double lines? Long line or multiple hex lines?
1 long line vs. multiple short lines Results from estimating the power consumption: using multiple hex lines instead of 1 long line saves ca. 20% of the power consumption!
Exploitation as a routing strategy to minimize power consumption Designs can be analyzed to identify nets that can be optimized while respecting the timing constraints Example: DA converter [Figure: FPGA]
Example: DA converter DA converter, DCM and sine generator, implemented on a Spartan-3 200 FPGA (Xilinx) at 250 MHz with the Xilinx ISE 8.1 tools • Long net consisting of hex lines, could be replaced by double lines; worst delay: 1.848 ns • Hex line, could be replaced by double lines; worst delay: 1.701 ns Optimization possibilities can easily be identified in the FPGA Editor; using a text file could simplify this process
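The selection step of such a strategy can be sketched as: among the possible realizations of a net, pick the one with the lowest power that still meets the delay budget. All numbers in the candidate table are made-up placeholders, not the measurements from the results slide, and the names `CANDIDATES` and `pick_route` are hypothetical.

```python
# Power-aware choice between line types under a timing constraint.
# All power and delay values are made-up placeholders for illustration.

CANDIDATES = {
    "1 long line":    {"power_mw": 1.00, "delay_ns": 1.0},
    "3 hex lines":    {"power_mw": 0.80, "delay_ns": 1.4},
    "9 double lines": {"power_mw": 0.70, "delay_ns": 2.5},
}


def pick_route(candidates, max_delay_ns):
    """Return the lowest-power candidate that meets the delay budget."""
    feasible = {name: c for name, c in candidates.items()
                if c["delay_ns"] <= max_delay_ns}
    if not feasible:
        return None   # no realization meets the timing constraint
    return min(feasible, key=lambda name: feasible[name]["power_mw"])


choice = pick_route(CANDIDATES, max_delay_ns=1.5)
```

With these placeholder values a net with a relaxed timing budget is moved to shorter lines, while a timing-critical net keeps the long line.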
Conclusion RP-Platform • Local approach with JBits • Tool for moving and cutting out modules • Addressing of elementary blocks: • On-chip realization of the Read-Modify-Writeback method • Loading of modules from external memory • Novel design methods possible • Platform independent API: • Online routing possible • “Regular routing” without macro in dynamic area possible • Simplified design flow • Toolbox to prepare modules • Adaptive routing strategies for power/performance trade-off • Consideration of application requirements (dataflow) and placement (routing)
Future Work • Increase of performance • Extension of the Run-time system enabling 2D-Approach • Netlist based Online-Routing • Methods for Monitoring Transitions and Run-Time negotiation for Power optimization • Basic Investigation of Novel Reconfigurable Architecture