Reconfigurable HPC, part 3: Architectural Resources. Reiner Hartenstein, TU Kaiserslautern. May 14, 2004, TU Tallinn, Estonia.
Terms:
• DPU: datapath unit
• DPA: datapath array
• rDPU: reconfigurable DPU
• rDPA: reconfigurable DPA
Converging Design Flows
• The same synthesis method may be used for mapping an algorithm onto both an rDPA [Kress, 1995] and a DPA [Broderson, 2000].
• This synthesis method is a generalization of systolic array synthesis: super systolic synthesis.
>> Time to space migration <<
Outline:
• Time to space migration
• Flowware languages
• Data Sequencers
• Sequencing through 2-D memory
• MoM architecture
• Acceleration mechanisms
http://www.uni-kl.de
Time to space migration of algorithms
Problems in time to space migration of algorithms:
• Some have moderate interconnect requirements: many DSP algorithms require just a pipeline.
• Some algorithms require excessive interconnect. Example: the Viterbi algorithm.
• A comprehensive taxonomy of algorithms is missing.
Intel IC interconnect: metal layers
• Foundries offer up to 9 metal layers and up to 3 poly layers.
• The reconfigurable interconnect fabric is laid out over the rDPU cell.
KressArray Family generic Fabrics: a few examples (http://kressarray.de)
• Select Function Repertory
• Select Nearest Neighbour (NN) Interconnect: select mode, number, and width of NNports
• rDPU modes: route-through only, or route-through and function
• More NNports: rich routing resources
• 4 examples of 2nd-level interconnect: laid out over the rDPU cell, no separate routing areas
[Figure: example fabrics around an rDPU; numeric labels 2, 8, 16, 24, 32]
KressArray Xplorer (Platform Design Space Explorer)
[Figure: Xplorer block diagram. Recoverable block labels: Xplorer Application Set, ALE-X Compiler (ALEX code, expression tree, intermediate forms 2 and 3), Architecture Estimator, Architecture Editor, Mapper, Mapping Editor, Scheduler, Datapath Generator, Kress rDPU Layout, HDL Generator (VHDL, Verilog), Simulator, Delay Estimator, Power Estimator, Analyzer (statistical data, power data), Inference Engine (FOX), Improvement Proposal Generator (suggestion, selection), Design Rules, User Interface, data stream schedule, KressArray family parameters, DPSS]
KressArray DPSS published at ASP-DAC 1995
SNN filter: KressArray Mapping Example (http://kressarray.de)
• Array size: 10 x 16 = 160 rDPUs
[Figure: mapping plot. Legend: route-through only, not used, backbus connect]
Xplorer Plot: SNN Filter Example [13] (http://kressarray.de)
• 2 horizontal NNports, 32 bit
• 3 vertical NNports, 32 bit
[Figure: plot detail. Legend: operator, operand, result, route-through, backbus connect, route-through-only rDPU]
Communication resource editor panel of the Xplorer user interface
Elements of the Xplorer mapping editor: a) Routing editor panel
Elements of the Xplorer mapping editor: b) Input port editor panel
KressArray DPSS (Datapath Synthesis System)
• FPGA-style mapping for coarse-grain reconfigurable arrays
• Blocks shown: Compiler, Mapper, Scheduler
• The DPSS Scheduler specifies and assembles the data streams from / to the array
Dissertation Ulrich Nageldinger (Infineon Technologies, Munich), http://hartenstein.de/Ph-D-Theses.html
• ... on mapping applications onto KressArrays
• ... simultaneous routing and placement by simulated annealing
• Supporting a huge family of KressArrays
• Fuzzy logic improvement proposal generator
• Profiling
• Design space exploration
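The simulated-annealing idea named on this slide can be illustrated with a minimal Python sketch. It is not the DPSS mapper: it handles placement only, uses total Manhattan wirelength as a stand-in for routing cost, and all names (`wirelength`, `anneal_placement`, the cooling parameters) are hypothetical choices made for this illustration.

```python
import math
import random

def wirelength(placement, nets):
    """Total Manhattan distance over all two-point nets (stand-in for routing cost)."""
    return sum(abs(placement[a][0] - placement[b][0]) +
               abs(placement[a][1] - placement[b][1]) for a, b in nets)

def anneal_placement(ops, nets, cols, rows, seed=0, steps=5000, t0=5.0, alpha=0.999):
    """Place operators on a cols x rows array by swapping cell contents.

    Illustrative assumption: geometric cooling (temp *= alpha) and
    Metropolis acceptance; the real mapper's cost function is not known here.
    """
    if len(ops) > cols * rows:
        raise ValueError("array too small for the operator set")
    rng = random.Random(seed)
    cells = [(x, y) for x in range(cols) for y in range(rows)]
    rng.shuffle(cells)
    occupant = {cell: None for cell in cells}      # None marks an unused rDPU
    for op, cell in zip(ops, cells):
        occupant[cell] = op

    def as_placement():
        return {op: cell for cell, op in occupant.items() if op is not None}

    current = initial = wirelength(as_placement(), nets)
    best, best_cost = as_placement(), current
    temp = t0
    for _ in range(steps):
        a, b = rng.sample(cells, 2)                # propose swapping two cells
        occupant[a], occupant[b] = occupant[b], occupant[a]
        cost = wirelength(as_placement(), nets)
        delta = cost - current
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current = cost                         # accept the move
            if cost < best_cost:
                best, best_cost = as_placement(), cost
        else:
            occupant[a], occupant[b] = occupant[b], occupant[a]  # undo the move
        temp *= alpha                              # cool down
    return best, best_cost, initial
```

A call such as `anneal_placement(["a", "b", "c", "d"], [("a", "b"), ("b", "c"), ("c", "d")], cols=3, rows=3)` returns the best placement found, its cost, and the cost of the random starting placement, so the improvement can be compared directly.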
>> Flowware languages <<
Similar Programming Language Paradigms: very easy to learn
Flowware language example (MoPL): JPEG zigzag scan pattern

    *> Declarations
    EastScan is
      step by [1,0]
    end EastScan;

    SouthScan is
      step by [0,1]
    end SouthScan;

    NorthEastScan is
      loop 8 times until [*,1]
        step by [1,-1]
      endloop
    end NorthEastScan;

    SouthWestScan is
      loop 8 times until [1,*]
        step by [-1,1]
      endloop
    end SouthWestScan;

    HalfZigZag is
      EastScan
      loop 3 times
        SouthWestScan
        SouthScan
        NorthEastScan
        EastScan
      endloop
    end HalfZigZag;

    goto PixMap[1,1]
    HalfZigZag;
    SouthWestScan
    uturn (HalfZigZag)

[Figure: scan path over the x/y pixel map, driven by data counters; the two HalfZigZag halves are labelled]
The same language principles
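To make the scan pattern concrete, here is a minimal Python sketch that replays the MoPL example step by step and records the visited addresses. Two semantics are assumptions of this sketch, not statements from the slide: `until [*,1]` is read as "stop once y equals 1" (and `until [1,*]` as "stop once x equals 1"), and `uturn (HalfZigZag)` is read as "replay the recorded HalfZigZag steps in reverse order". The function name `jpeg_zigzag` is hypothetical.

```python
SIZE = 8  # the example scans an 8 x 8 pixel map, addresses [1..8, 1..8]

def jpeg_zigzag():
    """Return the list of (x, y) addresses visited by the MoPL example."""
    pos = [1, 1]                      # goto PixMap[1,1]
    path = [tuple(pos)]

    def step(dx, dy):
        pos[0] += dx
        pos[1] += dy
        path.append(tuple(pos))
        return (dx, dy)

    def east_scan():                  # step by [1,0]
        return [step(1, 0)]

    def south_scan():                 # step by [0,1]
        return [step(0, 1)]

    def north_east_scan():            # loop 8 times until [*,1] step by [1,-1]
        steps = []
        for _ in range(SIZE):
            if pos[1] == 1:
                break
            steps.append(step(1, -1))
        return steps

    def south_west_scan():            # loop 8 times until [1,*] step by [-1,1]
        steps = []
        for _ in range(SIZE):
            if pos[0] == 1:
                break
            steps.append(step(-1, 1))
        return steps

    def half_zigzag():
        steps = east_scan()
        for _ in range(3):            # loop 3 times
            steps += south_west_scan()
            steps += south_scan()
            steps += north_east_scan()
            steps += east_scan()
        return steps

    recorded = half_zigzag()          # HalfZigZag;
    south_west_scan()                 # SouthWestScan (the main anti-diagonal)
    for dx, dy in reversed(recorded): # uturn (HalfZigZag), assumed semantics
        step(dx, dy)
    return path
```

Under these assumptions the path starts at [1,1], ends at [8,8], and visits each of the 64 addresses exactly once in the order of the standard JPEG zigzag.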
MoPL-3 Grammar
The grammar of the Map-oriented Programming Language version 3 (MoPL-3), a data-procedural programming language:
• to specify functions and operators to be mapped onto a DataPath Array (DPA) or other pipe network (hardwired as well as reconfigurable)
• and to procedurally program the data streams associated with these functions or operators
MoPL grammar 1 (of 14): 1. Program Definition, 2. Boundary Declarations
[Syntax diagrams: Program Definition, Declaration Part, MoPL Subroutine; Boundary Declaration, Array Declaration. Note: rALU = rDPU]
MoPL grammar 2 (of 14): 3. Scan Window Declarations (SW = Scan Window)
[Syntax diagrams: Compound Window Declaration, SW Group Name, Window Spec, Window Names, Window Size]
MoPL grammar 3 (of 14): 4. rALU Set-up Declarations
[Syntax diagrams: rALU Config, Top Structure, rALU Name, Structural Part, Do Structure, While Structure]
MoPL grammar 4 (of 14)
[Syntax diagrams: Sub Structure, Sub Structure List, Condition, Set Structure, Local Branch Flag, If Structure]
MoPL grammar 5 (of 14)
[Syntax diagrams: rALU Activation, Ident, rALU Subnet Name]
For missing production rules see the Ph.D. thesis by Jürgen Becker: http://hartenstein.de/Ph-D-Theses.html
MoPL grammar 6 (of 14): 5. Scan Pattern Declarations
[Syntax diagrams: Compound Scan Pattern Decl, Simple Pattern Decl, Pattern Name, rALUsubnet Flag, Local Branch Flag]
MoPL grammar 7 (of 14): 6. Scan Statement Declarations
[Syntax diagrams: Scan Statement Part, Scan Statement Block, Scan_Pattern_Name, Scan_Window_Name]
MoPL grammar 8 (of 14)
[Syntax diagrams: Scan Pattern Call, Array Name, Scan Statement]
MoPL grammar 9 (of 14): 7. Scan Action Declarations
[Syntax diagrams: Scan Pattern Sequence, Pattern Spec, Scan Action]
MoPL grammar 10 (of 14)
[Syntax diagrams: Shortest Step, Transformation, Simple Scan, Stretching, Shearing; two entries are marked t.b.d.]
MoPL grammar 11 (of 14)
[Syntax diagrams: Lib Scan Name, Scan Name, SizeXY, Escape Clause, Scan Ident, Condition Clause, Library Scan, StepWidthXY]
MoPL grammar 12 (of 14): 8. Expression Declarations
[Syntax diagrams: Assignment, Sign, Factor, Term, Rel Op, Simple Expression, Expression, SW Variable]
MoPL grammar 13 (of 14): 9. Lexical Declarations
[Syntax diagrams: Ident, Digit, Letter, Underscore, Unsigned Real, Point, Number, Scale Factor, FourBitVector]
MoPL grammar 14 (of 14): 10. Common Production Rules
[Syntax diagrams: Decl-Size, Range, Data Type, Name-List]
>> Data Sequencers <<
Application-specific distributed memory*
• Application-specific memory: rapidly growing markets:
  • IP cores
  • Module generators
  • EDA environments
• Optimization of memory bandwidth for application-specific distributed memory
• Power and area optimization as a further benefit
• Key issues of address generators will be discussed
*) see books by Francky Catthoor et al.
Significance of Address Generators
• Address generators have the potential to reduce computation time significantly.
• In a grid-based design rule check, a speed-up of more than 2000 has been achieved compared to a VAX-11/750.
• Dedicated address generators contributed a factor of 10 by avoiding memory cycles for address computation overhead.
Smart Address Generators
• 1983: The Structured Memory Access (SMA) Machine
• 1984: The GAG (generic address generator)
• 1989: Application-specific Address Generator (ASAG)
• 1990: The slider method: GAG of the MoM-2 machine
• 1991: The AGU
• 1994: The GAG of the MoM-3 machine
• 1997: The Texas Instruments TMS320C54x DSP
• 1997: Intersil HSP45240 Address Sequencer
• 1999: Adopt (IMEC)
Adopt (from IMEC)
• Customized MMU (cMMU)
• Address Expression (AE)
• Address Sequence (AS)
• Address Calculation Unit (ACU)
• Application-Specific Unit (ASU)
• cMMU synthesis environment:
  • application-specific ACUs for array index reference
  • ACU as a counter modified by multi-level logic filter
  • ACU with ASUs from a Cathedral-3 library
  • distributed ACU alleviates interconnect overhead (delay, power, area)
  • nested loop minimization by algebraic transformations
  • AE splitting/clustering
  • AE multiplexing to obtain interleaved ASs
  • other features
For more details on Adopt see the paper in the proceedings CD-ROM.
Distributed Memory
• SA: scrambling and descrambling the data?
• Data address generators: 20 years of research
• Just in time, a new research area: application-specific distributed memory, e.g. book by F. Catthoor et al. ...
>> Sequencing through 2-D memory <<
Speedup by MoM
• MoM architecture (MoM anti machine): 2-D memory space, adjustable scan window (example: 4x4 scan window)
• Smart memory interface; (r)DPU; asMA distributed memory (asM memory banks, each with a data counter)
• Grid-based design rule check example: speed-up > 1000
• Complex Boolean expressions in 1 clock cycle
• Address computation overhead: 94 %
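As a rough plausibility check on the 94 % figure, and assuming (this is not stated on the slide) that it denotes the fraction f of run time a conventional machine spends on address computation, removing that overhead entirely would bound the gain from address generation alone as follows:

```latex
S_{\mathrm{addr}} \le \frac{1}{1 - f} = \frac{1}{1 - 0.94} \approx 16.7
```

Under that assumption, dedicated address generation would account for at most a factor of about 17, so the rest of the reported speed-up would have to come from other mechanisms, such as evaluating complex Boolean expressions in one clock cycle.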
Vary-size scan windows
• Size adjustable at run time
• Square or rectangular shape
• Each location's individual access mode: R, W, R/W, no-op
• By no-op placements: any wild window shape
• Avoid multiple read / multiple write for overlapping successive scan window positions
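The benefit of not re-reading the overlap between successive window positions can be estimated with a small counting model. The Python sketch below is only that, a model: it does not describe how the MoM memory interface achieves the reuse, and the helper names `window_cells` and `count_reads` are hypothetical.

```python
def window_cells(x, y, w, h):
    """Set of memory locations covered by a w x h scan window at origin (x, y)."""
    return {(x + i, y + j) for i in range(w) for j in range(h)}

def count_reads(positions, w, h, reuse):
    """Count memory reads over a sequence of window positions.

    With reuse=True, locations already covered by the previous window
    position are assumed to be kept and are not read again.
    """
    reads = 0
    previous = set()
    for x, y in positions:
        current = window_cells(x, y, w, h)
        reads += len(current - previous) if reuse else len(current)
        previous = current
    return reads

# A 4 x 4 window stepping east by one location across a 16-wide row.
east_scan = [(x, 0) for x in range(13)]
naive_reads = count_reads(east_scan, 4, 4, reuse=False)   # 13 * 16 = 208
reused_reads = count_reads(east_scan, 4, 4, reuse=True)   # 16 + 12 * 4 = 64
```

In this example each step exposes only one new 4-location column, so the read count drops from 208 to 64.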
Generic Address Generator (GAG): Slider Model
[Figure: GAG with Address Stepper, Base Slider, and Limit Slider; signal labels A, B0, L0, DA]
GAG = Generic Address Generator; GAU = generic address unit
• Address Stepper, Base Slider, and Limit Slider: all 3 are copies of the same BSU stepper circuit
• Scheme published in 1990
[Figure: GAU block diagram; signal labels A, B, L, B0, L0, DA, limit]
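The slides name the three parts of the slider model but give no update rule, so the following Python sketch is a guess at the general idea for one address dimension, not the published GAG circuit. Assumed here: the address stepper sweeps from the current base to the current limit in steps of `da`, and after each sweep the base and limit sliders move by `db` and `dl`. All parameter names are hypothetical.

```python
def slider_scan(b0, l0, da, db=0, dl=0, sweeps=1):
    """Yield addresses for one dimension under the assumed slider rule.

    b0, l0 : initial base and limit
    da     : address step per access (must be non-zero)
    db, dl : how far base and limit slide after each sweep
    sweeps : number of base-to-limit sweeps to generate
    """
    if da == 0:
        raise ValueError("da must be non-zero")
    base, limit = b0, l0
    for _ in range(sweeps):
        a = base
        # Address stepper: walk from base towards limit (inclusive).
        while (da > 0 and a <= limit) or (da < 0 and a >= limit):
            yield a
            a += da
        base += db    # base slider
        limit += dl   # limit slider

# A fixed range repeated twice, and a shrinking (triangular) scan.
repeated = list(slider_scan(0, 3, 1, sweeps=2))          # [0, 1, 2, 3, 0, 1, 2, 3]
triangular = list(slider_scan(0, 3, 1, db=1, sweeps=4))  # [0, 1, 2, 3, 1, 2, 3, 2, 3, 3]
```

Sliding the base while keeping the limit fixed yields the triangular pattern, which hints at why three copies of one stepper circuit could cover many scan shapes without computing addresses in the datapath.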