Reconfigurable HPC part 3 Architectural Resources

May 14, 2004, TU Tallinn, Estonia. Reconfigurable HPC, part 3: Architectural Resources. Reiner Hartenstein, TU Kaiserslautern. Terms: DPU: datapath unit; DPA: datapath array; rDPU: reconfigurable DPU; rDPA: reconfigurable DPA. Converging design flows.


Presentation Transcript


  1. May 14, 2004, TU Tallinn, Estonia. Reconfigurable HPC, part 3: Architectural Resources. Reiner Hartenstein, TU Kaiserslautern

  2. Terms: DPU: datapath unit; DPA: datapath array; rDPU: reconfigurable DPU; rDPA: reconfigurable DPA. Converging design flows: the same synthesis method may be used for mapping an algorithm onto both an rDPA [Kress, 1995] and a DPA [Brodersen, 2000]. This synthesis method is a generalization of systolic array synthesis: super-systolic synthesis.

  3. >> Time to space migration << • Time to space migration • Flowware languages • Data Sequencers • Sequencing through 2-D memory • MoM architecture • Acceleration mechanisms http://www.uni-kl.de

  4. Time to space migration of algorithms. Problems in time to space migration of algorithms: • Some have moderate interconnect requirements • Many DSP algorithms require just a pipeline • Some algorithms require excessive interconnect; example: the Viterbi algorithm • A comprehensive taxonomy of algorithms is missing

  5. Foundries offer up to 9 metal layers and up to 3 poly layers. [Figure: Intel IC interconnect, metal layers.] The reconfigurable interconnect fabric is laid out over the rDPU cell.

  6. KressArray family, generic fabrics: a few examples. • Select the function repertory • Select the Nearest Neighbour (NN) interconnect: mode, number, and width of NNports (route-through only, or route-through and function); more NNports give rich routing resources • Examples of 2nd-level interconnect: laid out over the rDPU cell, no separate routing areas. [Figure: an example rDPU with a "+" operator; numeric labels in the figure: 16, 8, 32, 24, 2, 4.] http://kressarray.de

  7. KressArray Xplorer (Platform Design Space Explorer). [Block diagram; recoverable labels: Xplorer application set, ALE-X compiler (ALE-X code, expression tree, intermediate forms 2 and 3), architecture estimator, architecture editor, HDL generator (VHDL, Verilog), simulator, user, user interface, selection, suggestion, design rules, mapper, mapping editor, improvement proposal generator, inference engine (FOX), analyzer, scheduler (data stream schedule), datapath generator (Kress rDPU layout), delay estimator, power estimator (power data, statistical data), DPSS, KressArray family parameters.] The KressArray DPSS was published at ASP-DAC 1995.

  8. Xplorer GUI

  9. KressArray mapping example: SNN filter. Array size: 10 x 16 = 160 rDPUs. [Figure legend: route-through only, not used, backbus connect.] http://kressarray.de

  10. Xplorer plot: SNN filter example. Fabric: 2 horizontal NNports, 32 bit; 3 vertical NNports, 32 bit. [Figure legend: operator (+), operand ([13]), result, route-through, backbus connect, route-through-only rDPU.] http://kressarray.de

  11. Communication resource editor panel of the Xplorer user interface

  12. Elements of the Xplorer mapping editor: a) Routing editor panel

  13. Elements of the Xplorer mapping editor: b) Input port editor panel

  14. Xplorer: Improvement Proposal Generator

  15. Xplorer: conditional swap operator

  16. Xplorer: Macro cells

  17. KressArray DPSS (Datapath Synthesis System): FPGA-style mapping for coarse-grain reconfigurable arrays. Components: compiler, mapper, and scheduler. The scheduler specifies and assembles the data streams from and to the array.

  18. Dissertation of Ulrich Nageldinger (Infineon Technologies, Munich), http://hartenstein.de/Ph-D-Theses.html: • ... on mapping applications onto KressArrays • ... simultaneous routing and placement by simulated annealing • Supporting a huge family of KressArrays • Fuzzy logic improvement proposal generator • Profiling • Design space exploration
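As a rough illustration of placement by simulated annealing, the following Python sketch places operators on a small rDPU grid and uses the total Manhattan distance between connected operators as a stand-in for routing cost. It is not the KressArray mapper: the cost function, the linear cooling schedule, and all names (`anneal_placement`, `wirelength`, the example netlist) are assumptions made for illustration, and routing is only approximated rather than performed simultaneously with placement.

```python
import math
import random


def wirelength(positions, nets):
    """Total Manhattan distance over all two-point nets (a proxy for routing cost)."""
    return sum(
        abs(positions[a][0] - positions[b][0]) + abs(positions[a][1] - positions[b][1])
        for a, b in nets
    )


def anneal_placement(nodes, nets, width, height, steps=4000, t0=4.0, seed=0):
    """Place `nodes` on a width x height grid, trying to keep connected nodes close."""
    if len(nodes) > width * height:
        raise ValueError("more operators than grid cells")
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    # One slot per grid cell; None marks an unused cell.
    slots = list(nodes) + [None] * (width * height - len(nodes))
    rng.shuffle(slots)

    def positions():
        return {n: (i % width, i // width) for i, n in enumerate(slots) if n is not None}

    cost = wirelength(positions(), nets)
    best, best_cost = positions(), cost
    for k in range(steps):
        temperature = t0 * (1.0 - k / steps) + 1e-6  # linear cooling (assumed)
        i, j = rng.randrange(len(slots)), rng.randrange(len(slots))
        slots[i], slots[j] = slots[j], slots[i]  # propose swapping two cells
        new_cost = wirelength(positions(), nets)
        accept = new_cost <= cost or rng.random() < math.exp((cost - new_cost) / temperature)
        if accept:
            cost = new_cost
            if cost < best_cost:
                best, best_cost = positions(), cost
        else:
            slots[i], slots[j] = slots[j], slots[i]  # undo the rejected move
    return best, best_cost


# Hypothetical example: a six-operator pipeline on a 3 x 3 array.
nodes = ["a", "b", "c", "d", "e", "f"]
nets = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e"), ("e", "f")]
placement, cost = anneal_placement(nodes, nets, width=3, height=3)
```

Swapping two grid cells (either of which may be empty) keeps every intermediate placement legal, so no separate legalization step is needed in this sketch.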

  19. >> Flowware languages << • Time to space migration • Flowware languages • Data Sequencers • Sequencing through 2-D memory • MoM architecture • Acceleration mechanisms http://www.uni-kl.de

  20. Similar programming language paradigms: very easy to learn

  21. Flowware language example (MoPL): JPEG zigzag scan pattern. The same language principles.
      *> Declarations
      EastScan is step by [1,0] end EastScan;
      SouthScan is step by [0,1] end SouthScan;
      NorthEastScan is loop 8 times until [*,1] step by [1,-1] endloop end NorthEastScan;
      SouthWestScan is loop 8 times until [1,*] step by [-1,1] endloop end SouthWestScan;
      HalfZigZag is EastScan loop 3 times SouthWestScan SouthScan NorthEastScan EastScan endloop end HalfZigZag;
      goto PixMap[1,1] HalfZigZag; SouthWestScan uturn (HalfZigZag)
      [Figure labels: x, y, data counter, HalfZigZag.]
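The MoPL scan patterns on this slide can be emulated in a few lines of Python to check that they really produce the JPEG zigzag order over an 8 x 8 PixMap. This is only a sketch: the Python names are hypothetical, the `until` clause is assumed to be tested after each step, and `uturn (HalfZigZag)` is assumed to replay the step vectors of HalfZigZag in reverse order (which is the same path rotated by 180 degrees).

```python
EAST, SOUTH = (1, 0), (0, 1)
NORTH_EAST, SOUTH_WEST = (1, -1), (-1, 1)


def run(pos, step, times=1, until=None):
    """Apply `step` up to `times` times; stop early once `until(pos)` holds."""
    path = []
    for _ in range(times):
        pos = (pos[0] + step[0], pos[1] + step[1])
        path.append(pos)
        if until is not None and until(pos):
            break
    return path


def east_scan(pos):
    return run(pos, EAST)


def south_scan(pos):
    return run(pos, SOUTH)


def north_east_scan(pos):  # loop 8 times until [*,1] step by [1,-1]
    return run(pos, NORTH_EAST, 8, lambda p: p[1] == 1)


def south_west_scan(pos):  # loop 8 times until [1,*] step by [-1,1]
    return run(pos, SOUTH_WEST, 8, lambda p: p[0] == 1)


def half_zigzag(pos):
    path = east_scan(pos)
    for _ in range(3):
        for scan in (south_west_scan, south_scan, north_east_scan, east_scan):
            path += scan(path[-1])
    return path


def zigzag():
    start = (1, 1)                      # goto PixMap[1,1]
    first = half_zigzag(start)          # HalfZigZag
    path = [start] + first
    path += south_west_scan(path[-1])   # SouthWestScan (the main anti-diagonal)
    # uturn (HalfZigZag): assumed to replay the same steps in reverse order.
    points = [start] + first
    steps = [(b[0] - a[0], b[1] - a[1]) for a, b in zip(points, points[1:])]
    pos = path[-1]
    for dx, dy in reversed(steps):
        pos = (pos[0] + dx, pos[1] + dy)
        path.append(pos)
    return path


path = zigzag()  # 64 [x, y] positions, starting at (1, 1) and ending at (8, 8)
```

Each position in `path` corresponds to one value of the data counter: the scan pattern replaces the address arithmetic that an instruction-stream program would spend on walking the block.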

  22. MoPL-3 grammar: the grammar of the Map-oriented Programming Language version 3 (MoPL-3), a data-procedural programming language • to specify functions and operators to be mapped onto a DataPath Array (DPA) or other pipe network (hardwired as well as reconfigurable) • and to procedurally program the data streams associated with these functions or operators

  23. MoPL grammar 1 (14): 1. Program Definition, 2. Boundary Declarations. [Syntax diagrams; labels: Program Definition Declaration Part MoPL Subroutine Boundary Declaration Array Declaration; numeric labels: 15 3 16 4 19 5.] Note: rALU = rDPU.

  24. MoPL grammar 2 (14): 3. Scan Window Declarations (SW = Scan Window). [Syntax diagrams; labels: Compound Window Declaration SW Group Name Window Spec Window Names Window Size; numeric label: 27.]

  25. MoPL grammar 3 (14): 4. rALU Set-up Declarations. [Syntax diagrams; labels: rALU Config Top Structure rALU Name Structural Part Do Structure While Structure; numeric labels: 17 15.]

  26. MoPL grammar 4. [Syntax diagrams; labels: Sub Structure Condition Sub Structure List Set Structure Local Branch Flag If Structure; numeric label: 25.]

  27. MoPL grammar 5. [Syntax diagrams; labels: rALU Activation Ident rALU Subnet Name.] For missing production rules see the Ph.D. thesis by Jürgen Becker: http://hartenstein.de/Ph-D-Theses.html

  28. MoPL grammar 6 (14): 5. Scan Pattern Declarations. [Syntax diagrams; labels: Compound Scan Pattern Decl Simple Pattern Decl Pattern Name rALUsubnet Flag Local Branch Flag; numeric label: 22.]

  29. MoPL grammar 7: 6. Scan Statement Declarations. [Syntax diagrams; labels: Scan Statement Part Scan Statement Block Scan_Pattern_Name Scan_Window_Name; numeric labels: 15 16.]

  30. MoPL grammar 8 (14). [Syntax diagrams; labels: Scan Pattern Call Array Name Scan Statement; numeric label: 18.]

  31. MoPL grammar 9 (14): 7. Scan Action Declarations. [Syntax diagrams; labels: Scan Pattern Sequence Pattern Spec Scan Action; numeric labels: 24 23 24 24 24.]

  32. MoPL grammar 10 (14). [Syntax diagrams; labels: Shortest Step Transformation Simple Scan Stretching Shearing; two entries are marked t.b.d.]

  33. MoPL grammar 11 (14). [Syntax diagrams; labels: Lib Scan Name Scan Name SizeXY Escape Clause Scan Ident Condition Clause Library Scan StepWidthXY.]

  34. MoPL grammar 12 (14): 8. Expression Declarations. [Syntax diagrams; labels: Assignment Sign Factor Term Rel Op Simple Expression Expression SW Variable.]

  35. MoPL grammar 13: 9. Lexical Declarations. [Syntax diagrams; labels: Ident Digit Letter Underscore Unsigned Real Point Number Scale Factor FourBitVector.]

  36. MoPL grammar 14 (14): 10. Common Production Rules. [Syntax diagrams; labels: Decl-Size Range Data Type Name-List.]

  37. >> Data Sequencers << • Time to space migration • Flowware languages • Data Sequencers • Sequencing through 2-D memory • MoM architecture • Acceleration mechanisms http://www.uni-kl.de

  38. Application-specific distributed memory* • Application-specific memory, rapidly growing markets: IP cores, module generators, EDA environments • Optimization of memory bandwidth for application-specific distributed memory • Power and area optimization as a further benefit • Key issues of address generators will be discussed. *) See books by Francky Catthoor et al.

  39. Significance of Address Generators • Address generators have the potential to reduce computation time significantly. • In a grid-based design rule check, a speed-up of more than 2000 was achieved compared to a VAX-11/750. • Dedicated address generators contributed a factor of 10 by avoiding memory cycles for address computation overhead.

  40. Smart Address Generators • 1983: The Structured Memory Access (SMA) Machine • 1984: The GAG (generic address generator) • 1989: Application-specific Address Generator (ASAG) • 1990: The slider method: GAG of the MoM-2 machine • 1991: The AGU • 1994: The GAG of the MoM-3 machine • 1997: The Texas Instruments TMS320C54x DSP • 1997: Intersil HSP45240 Address Sequencer • 1999: Adopt (IMEC)

  41. Adopt (from IMEC). Terms: • customized MMU (cMMU) • address expression (AE) • address sequence (AS) • Address Calculation Unit (ACU) • Application-Specific Unit (ASU). cMMU synthesis environment: • application-specific ACUs for array index reference • ACU as a counter modified by a multi-level logic filter • ACU with ASUs from a Cathedral-3 library • distributed ACU alleviates interconnect overhead (delay, power, area) • nested loop minimization by algebraic transformations • AE splitting/clustering • AE multiplexing to obtain interleaved ASs • other features. For more details on Adopt see the paper in the proceedings CD-ROM.

  42. Distributed memory. SA: scrambling and descrambling the data? Just in time, a new research area: application-specific distributed memory, e.g. the book by F. Catthoor et al. ... Data address generators: 20 years of research.

  43. >> Sequencing through 2-D memory << • Time to space migration • Flowware languages • Data Sequencers • Sequencing through 2-D memory • MoM architecture • Acceleration mechanisms http://www.uni-kl.de

  44. Speedup by MoM. MoM architecture (MoM anti machine): 2-D memory space with adjustable scan window (example: 4x4 scan window). [Figure labels: asM, memory bank, data counter, asMA, distributed memory, smart memory interface, (r)DPU.] Grid-based design rule check example: speed-up > 1000; complex boolean expressions in 1 clock cycle; address computation overhead: 94 %.

  45. Xputer Lab at Kaiserslautern: MoM I and II

  46. Antimachine: MoM architecture

  47. Variable-size scan windows • Size adjustable at run time • Square or rectangular shape • Individual access mode per location: R, W, R/W, no-op • No-op placements allow arbitrarily shaped windows • Avoids multiple reads and multiple writes for overlapping successive scan window positions
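The benefit of not re-reading overlapping window positions can be made concrete with a small Python sketch that counts memory reads while a scan window slides east by one location per step. Everything here is an assumption for illustration: the mask encoding ("R", "W", "RW", "-" for no-op), the function names, and the simplification that the window holds only the contents of its previous position and that writes do not force re-reads.

```python
def read_cells(origin, mask):
    """Memory locations that must be available for reading at this window position."""
    ox, oy = origin
    return {
        (ox + dx, oy + dy)
        for dy, row in enumerate(mask)
        for dx, mode in enumerate(row)
        if mode in ("R", "RW")
    }


def count_reads(origins, mask, reuse):
    """Count memory reads over a sequence of window positions.

    With reuse=True, locations already held from the previous window
    position are not fetched again.
    """
    held = set()
    reads = 0
    for origin in origins:
        needed = read_cells(origin, mask)
        reads += len(needed - held) if reuse else len(needed)
        held = needed
    return reads


FULL_4X4 = [["R"] * 4 for _ in range(4)]   # the 4x4 window, every location read
PLUS = [["-", "R", "-"],                    # a non-rectangular shape built
        ["R", "RW", "R"],                   # from no-op placements
        ["-", "R", "-"]]
east_sweep = [(x, 0) for x in range(5)]     # five successive positions

naive_full = count_reads(east_sweep, FULL_4X4, reuse=False)   # 5 * 16 = 80
reuse_full = count_reads(east_sweep, FULL_4X4, reuse=True)    # 16 + 4 * 4 = 32
naive_plus = count_reads(east_sweep, PLUS, reuse=False)       # 5 * 5 = 25
reuse_plus = count_reads(east_sweep, PLUS, reuse=True)        # 5 + 4 * 3 = 17
```

For the full 4x4 window each one-step move brings in only one new column of four locations, so the read traffic of the sweep drops from 80 to 32 accesses in this model.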

  48. 2-D Generic Data Sequence Examples

  49. GAG slider model (GAG = Generic Address Generator). [Figure labels: DA, L0, B0, Address Stepper, Base Slider, Limit Slider, A.]

  50. GAG = Generic Address Generator; GAU = generic address unit. [Figure labels: DA, L0, B0, limit, Address Stepper, Base Slider, Limit Slider, A.] All 3 (Address Stepper, Base Slider, Limit Slider) are copies of the same BSU stepper circuit. Scheme published in 1990.
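One way to read the slider model is sketched below in Python for a single dimension. The slide only names the three blocks and the labels DA, L0, B0, and A, so the stepping rule here is an assumption: the address stepper sweeps from the current base to the current limit in increments of `da`, and after each sweep the base and limit sliders advance by their own increments. The parameters `db`, `dl`, and `sweeps` and all function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Stepper:
    """One stepper circuit: holds a value and advances it by a fixed increment."""
    value: int
    delta: int

    def step(self):
        self.value += self.delta


def gag_addresses(b0, l0, da, db=0, dl=0, sweeps=1):
    """Generate an address sequence from three copies of the same stepper.

    b0, l0: initial base and limit; da: address increment;
    db, dl: per-sweep increments of the base and limit sliders (assumed).
    """
    if da == 0:
        raise ValueError("da must be non-zero")
    base, limit = Stepper(b0, db), Stepper(l0, dl)
    addresses = []
    for _ in range(sweeps):
        addr = Stepper(base.value, da)  # address stepper restarts at the base
        while (addr.value <= limit.value) if da > 0 else (addr.value >= limit.value):
            addresses.append(addr.value)
            addr.step()
        base.step()   # base slider moves
        limit.step()  # limit slider moves
    return addresses


# Sliding range: base and limit move together.
window = gag_addresses(b0=0, l0=3, da=1, db=1, dl=1, sweeps=3)
# Growing range: only the limit slides.
triangle = gag_addresses(b0=0, l0=0, da=1, dl=1, sweeps=3)
```

A 2-D generator in this style would use one such unit per dimension, which matches the slide's remark that all three blocks are copies of the same stepper circuit; how the actual GAU combines them is not recoverable from the transcript.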
