Reconfigurable Computing: a New Business Model – and its Impact on SoC Design

Warsaw, Sept. 4 - 6, 2001. Reconfigurable Computing: a New Business Model – and its Impact on SoC Design. Reiner Hartenstein, University of Kaiserslautern. If you use part of it, please quote me and e-mail me. A “handout” and the viewgraphs are downloadable.


Presentation Transcript


  1. Warsaw, Sept. 4 - 6, 2001. Reconfigurable Computing: a New Business Model – and its Impact on SoC Design. Reiner Hartenstein, University of Kaiserslautern

  2. If you use part of it, please quote me and e-mail me. Downloadable “handout”: • Viewgraphs downloadable from: http://www.fpl.uni-kl.de/staff/hartenstein/lot/HartensteinWarsaw01.ppt • Paper downloadable from: http://www.fpl.unikl.de/papers/paper112.pdf

  3. Conferences on Reconfigurable Logic • FCCM, FPGA (founded 1992), and FPL (founded 1991 at Oxford, UK): The International Conference on Field-programmable Logic and Applications • topic adoption by congresses: ASP-DAC, DAC, DATE, ISCAS, SPIE ... • FPL 2002, La Grande Motte (Montpellier, France), Sept. 2 – 4 (Laboratoire d'Informatique, de Robotique et de Microélectronique de Montpellier) • Paper submission deadline: 15th March 2002 • Notification of acceptance: 20th May 2002 • http://www.lirmm.fr/fpl2002/

  4. >> Introduction • Introduction • FPGA boom • Coarse Grain Architectures • Fascinating Paradigm Shift • Programming Coarse Grain rDPAs • Principles of Soft Computing Machines • Future developments expected • Conclusions [Side labels: fine grain; coarse grain; rDPAs; fundamental issues] http://www.uni-kl.de

  5. The Impact of Reconfigurable Logic • Reconfigurable platforms bring a new dimension to digital system development and have a strong impact on SoC design. • A large, rapidly growing user base of HDL-savvy designers with FPGA experience. • Flexibility supports turn-around times of minutes instead of months for real-time in-system debugging, profiling, verification, tuning, field maintenance, and field upgrades (pro and contra) • However, completely ignored by CS & CSE Curricula (major obstacle)

  6. The History of Paradigm Shifts: Makimoto’s Wave (published in 1989) • “Mainstream Silicon Application is switching every 10 Years” • “The Programmable System-on-a-Chip is the next wave” • What’s coming next? [Figure: Makimoto’s Wave swinging between “standard” and “custom”, 1957 to 2007; labels: TTL; LSI, MSI; µproc., memory; ASICs, accel’s; reconfigurable; ?; 1st Design Crisis; 2nd Design Crisis]

  7. Paradigm Shifts: How’s next Wave? • 4th wave? rDPAs? • no further wave! [Figure: Makimoto’s Wave (standard / custom, 1957 to 2007) overlaid with Hartenstein’s Curve and Tredennick’s classification; column labels: hardwired, procedural programming, structural programming; row labels: algorithm fixed or variable, resources fixed or variable; also labelled: FPGAs]

  8. The Impact of Makimoto’s Paradigm Shifts (Dr. Makimoto: FPL 2000 keynote) • Software Industry’s Secret of Success: procedural personalization via RAM-based Machine Paradigm • Repeat Success Story by new Machine Paradigm! • structural personalization: RAM-based, before run time • Personalization (CAD) before fabrication [Figure: Makimoto’s Wave swinging between “standard” and “custom”, 1957 to 2007; labels: TTL; LSI, MSI; µproc., memory; ASICs, accel’s; reconfigurable]

  9. Terminology

  10. Reconfigurable Logic going Mainstream • Fine grain: FPGAs killing the ASIC market • Fastest growing segment of semiconductor market • Please, Lobby for New Curricula. • Substantially improved design flow and libraries • Coarse grain: several startups • Comprehensive Methodology • One of the goals of this talk: to motivate You by Key Issues and Visionary Highlights.

  11. >> FPGA boom • Introduction • FPGA boom • Coarse Grain Architectures • Fascinating Paradigm Shift • Programming Coarse Grain RAs • Principles of Soft Computing Machines • Future development expected • Conclusions http://www.uni-kl.de

  12. What is an FPGA? • configurable logic blocks (CLBs) • reconfigurable interconnect fabric: single-length lines, double-length lines, longlines • S = Switch Box, L = Logic Block [Figure: array of logic blocks (L) and switch boxes (S); Xilinx XC4000E]

  13. Top 4 PLD / FPGA Manufacturers 2000 ($3.7 billion) • Xilinx: 42% • Altera: 37% • Lattice: 15% • Actel: 6%

  14. FPGAs going Mainstream • IP reuse and "pre-fabricated" components for the efficiency of design and use for PLDs • [Dataquest] PLD market > $7 billion by 2003. • FPGAs are going into every type of application. • FPGAs, from an IP standpoint, look like ASICs (soft IPs) • PLD vendors provide libraries to support their products. • Today Altera and Xilinx own >65% of the PLD business. • FPGAs will soon reach 50 million system gates

  15. EDA trends: away from complex design flow [S. Guccione] • Traditional flow: Schematics / HDL → Netlister → Netlist → Place and Route → Bitstream • HLL flow: User Code → Compiler → Executable

  16. Embedded CPU and memory available [S. Guccione] • embedded hardw. CPU & memory cores [Figure: HLL compiler targeting an FPGA core next to a CPU core and memory core(s)]

  17. EDA trends: CPU for configuration management [S. Guccione] • on-board microprocessor CPU is available anyhow - even along with a little RTOS [Figure: HLL compilers]

  18. Configuration Architectures (dynamic vs. static configuration) • straight forward: host (Compiler, Mapper, RTOS etc.), RAM, Soft Data Path • Configuration caching*: host, RAM, Config. Cache, Soft Data Path • multi-context: host, several RAMs, Soft Data Path • Dynamic (RTR): RA part computes code for other RA part (self reconfiguration) • Configuration Loading Resources: separate configuration fabrics (e.g. FPGA); wormhole routing (KressArray, Colt, PipeRench) *) no cache as usual!

  19. Converging factors for RTR [S. Guccione] • million gate FPGAs and co-processing with standard microprocessor are commonplace • direct implementation of complex algorithms • new tools like the Xilinx JBits tool suite directly support coprocessing and Run Time Reconfiguration (RTR) [Figure: User Java Code → Java Compiler → Executable, using the JBits API; FPGA core, CPU core, Memory core]

  20. static vs. dynamic reconfiguration (RTR) • ... by on-board / on-chip flash, other memory • supported by on-board / on-chip CPU core • supports ASAT, adaptable devices • supports in-field debugging and upgrading (new business model) • requires disciplined implementation to avoid a testing nightmare [Figure [Kean]: Revenue / month over Time / months (1, 10, 20, 30) for ASIC Product, reconfigurable Product, and Product with download (Update 1, Update 2)]

  21. Configware as the Key Enabler • Configware market is taking off for mainstream • FPGA-based designs more complex, even SoC • No design productivity and quality without good configware libraries (soft IP cores) from various application areas. • Xilinx AllianceCORE & Reference Design Alliance et al. • Growing no. of independent configware houses (soft IP core vendors) and design services • Currently the top FPGA vendors are key innovators and meet most of the configware demand.

  22. “Drivers” & “OS” for FPGAs • separate EDA software market, comparable to the compiler / OS market in computers • Cadence, Mentor, Synopsys just jumped in. • Xilinx and Altera are fabless FPGA vendors • < 5% Xilinx / Altera income from EDA software • > 50% Xilinx people work on support, EDA & Configware

  23. >> Coarse Grain Architectures (for detailed overview see proceedings) • Introduction • FPGA boom • Coarse Grain Architectures • Fascinating Paradigm Shift • Programming Coarse Grain rDPAs • Principles of Soft Computing Machines • Future developments expected • Conclusions http://www.uni-kl.de

  24. Why coarse grain? [Figure: log-scale plot over 1960 to 2010; curves and labels: Algorithmic Complexity (Shannon’s Law) with wireless generations 1G to 4G; Transistors/chip; memory; microprocessor / DSP; computational efficiency (StrongARM, SH7752); battery performance (mA/MIP); normalized processor speed. Sources: Proc ISSCC, ICSPAT, DAC, DSPWorld]

  25. Fine-grained vs. coarse-grained reconfiguration • fine grain is general purpose • slow and area-inefficient, but high parallelism • coarse grain is application domain-specific • coarse grain is highly area-efficient • extremely high performance

  26. Reconfigurability Overhead • area used by application • resources needed for reconfigurability, partly for configuration code storage • “hidden RAM” not shown [Figure: array of logic blocks (L) and switch boxes (S)]

  27. Why Coarse Grain instead of FPGA? • reduced reconfigurability overhead by up to ~ 1000 [Figure: Transistors / chip (log scale, 1000 to 100 000 000 000) over 1980 to 2010; curves: memory, microprocessor, FPGA physical, FPGA logical, FPGA routed, supersystolic; gaps of ~ 10 and ~ 10 000 marked. Sources: Proc ISSCC, ICSPAT, DAC, DSPWorld]

  28. Commercial rDPAs • XPU family (IP cores): PACT corp., Munich • CALISTO: Silicon Spice* • CS2000 family: Chameleon Systems • MECA family: Malleable* • flexible array: MorphICs • ACM: Quicksilver Tech • CHESS array: Elixent • FIPSOC: SIDSA *) bought [Figure: XPU128]

  29. KressArray Family generic Fabrics: a few examples • Select Function Repertory • Select mode, number, width of NNports • select Nearest Neighbour (NN) Interconnect: an example • more NNports: rich Rout Resources • rDPU: rout-through only, or rout-through and function • Examples of 2nd Level Interconnect: layouted over rDPU cell - no separate routing areas! [Figure: example rDPU fabrics; numeric labels 16, 8, 32, 24, 2, 4] http://kressarray.de

  30. KressArray Mapping Example: SNN filter • array size: 10 x 16 = 160 rDPUs • Legend: rout thru only; not used; backbus connect http://kressarray.de

  31. >> Fascinating Paradigm Shift • Introduction • FPGA boom • Coarse Grain Architectures • Fascinating Paradigm Shift • Programming Coarse Grain rDPAs • Principles of Soft Computing Machines • Future development expected • Conclusions http://www.uni-kl.de

  32. Development of Hypergrowth Markets (Harper Business 1995) [Figure labels: Mainstream; Tornado; Paradigm Shift]

  33. The next EDA Industry Revolution • EDA industry paradigm switching every 7 years • 1978: Transistor entry: Applicon, Calma, CV ... • 1985: Schematics entry: Daisy, Mentor, Valid ... • 1992: Synthesis: Cadence, Synopsys ... • 1999: (Co-) Compilation, Stream-based DPU arrays • 2006 [Other figure labels: Makimoto’s 3rd wave; [Keutzer / Newton]; [Hartenstein]]

  34. It’s a General Paradigm Shift! • Using FPGAs (fine grain reconfigurable): just Logic Synthesis on a strange platform • Coarse Grain Reconfigurable Arrays (Reconfigurable Computing): a fundamental Paradigm Shift • Replacing Concurrent Processes by much more efficient parallelism: Stream-based Computing Arrays: systolic array* [1980], KressArray** [1995], chip-on-a-day* [2000] • ignored by Curricula & most R&D scenes *) hardwired **) reconfigurable

  35. Stream-based Computing (2) terms: • DPU: datapath unit • DPA: datapath array • rDPU: reconfigurable DPU • rDPA: reconfigurable DPA • stream-based computing: using complex pipe network (super-systolic: Kress et al.)

  36. Converging Design Flows • terms: DPU: datapath unit; DPA: datapath array; rDPU: reconfigurable DPU; rDPA: reconfigurable DPA • the same synthesis method may be used for mapping an algorithm onto both: rDPA [Kress, 1995], and DPA [Broderson, 2000] • this synthesis method is a generalization of systolic array synthesis: super systolic synthesis

  37. Concurrent Computing • extremely inefficient [Figure: several CPUs, each a DPU with its own instruction sequencer, connected by bus(es) or switch box]

  38. Stream-based Computing • driven by data stream from/to memory or from/to peripheral interface • no instruction sequencer inside! • transport-triggered execution [Figure: chain of DPUs]

  39. Stream-based Computing: (r)DPU array • driven by data streams • for both reconfigurable and hardwired [Figure: array of DPUs]

  40. • avoiding address computation overhead • avoiding instruction fetch and interpretation overhead • high parallelism, massively multiple deep pipelines • much less configuration memory • no routing areas to configure functions from CLBs >>> extremely high efficiency
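
The contrast between slides 37 and 38 to 40 can be illustrated with a minimal Python sketch. It is not taken from the talk: the names `dpu` and `configure_pipeline` are hypothetical, and Python generators merely stand in for hardware DPUs. The pipe network is set up once ("configuration"); afterwards each DPU fires only when an operand arrives, with no instruction fetch per data item.

```python
from typing import Callable, Iterable, Iterator


def dpu(op: Callable[[int], int], stream: Iterable[int]) -> Iterator[int]:
    """Hypothetical DPU: applies its fixed operation whenever an operand arrives."""
    for value in stream:
        yield op(value)


def configure_pipeline(ops: list[Callable[[int], int]],
                       source: Iterable[int]) -> Iterator[int]:
    """Chain the DPUs once (configuration); afterwards only data moves."""
    stream: Iterable[int] = source
    for op in ops:
        stream = dpu(op, stream)
    return iter(stream)


# Example configuration: y = ((x + 1) * 3) - 2, one DPU per operation.
pipeline = configure_pipeline(
    [lambda x: x + 1, lambda x: x * 3, lambda x: x - 2],
    source=range(5),
)
result = list(pipeline)
print(result)  # [1, 4, 7, 10, 13]
```

The sketch only models the execution style (data-driven, no per-item instruction sequencing); it says nothing about the area or speed figures claimed on the slides.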

  41. >> Programming Coarse Grain RAs • Introduction • FPGA boom • Coarse Grain Architectures • Fascinating Paradigm Shift • Programming Coarse Grain rDPAs • Principles of Soft Computing Machines • Future development expected • Conclusions http://www.uni-kl.de

  42. The Mathematician’s Synthesis Method: Systolic Stream-based Computing System • Systolic Array [H. T. Kung, 1980]: an array of DPUs (Data Path Units) • equations → linear projection or algebraic mapping → placement (no routing!) • linear pipelines and uniform arrays only • DPU architecture: y + a * x [Figure: data streams x_1 ... x_3 and coefficients a_11 ... a_33 flowing through the array; initial values (y_1)_0 ... (y_3)_0 enter and results y_1 ... y_3 leave]
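
As a rough illustration of the systolic principle on this slide, the following Python sketch simulates a linear array in which every DPU performs the slide's `y + a * x` step. It assumes that the figure shows a matrix-vector product y = A x with initial values y_0; the function name `systolic_matvec`, the register model, and the cycle-by-cycle schedule are illustrative assumptions, not the author's synthesis method.

```python
def systolic_matvec(a, x, y0=None):
    """Simulate a linear systolic array: DPU i accumulates y[i] += a[i][j] * x[j]
    while the x stream is shifted through the array, one DPU per clock cycle."""
    n_rows, n_cols = len(a), len(x)
    y = list(y0) if y0 is not None else [0] * n_rows
    x_regs = [None] * n_rows  # x operand currently held in each DPU
    for cycle in range(n_cols + n_rows - 1):
        # Shift the x stream one DPU to the right and feed the next x at the left.
        incoming = x[cycle] if cycle < n_cols else None
        x_regs = [incoming] + x_regs[:-1]
        for i, operand in enumerate(x_regs):
            if operand is not None:
                j = cycle - i  # index of the x element now sitting in DPU i
                y[i] = y[i] + a[i][j] * operand
    return y


a = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]
x = [1, 0, 2]
print(systolic_matvec(a, x))             # [7, 16, 25]
print(systolic_matvec(a, x, [1, 1, 1]))  # [8, 17, 26]
```

Only nearest-neighbour transfers occur (each x value moves to the adjacent DPU), which is why no routing step is needed for such uniform arrays.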

  43. Computing in space and time • systolic arrays etc.: computing in space vs. computing in time • migration by re-timing and other transformations • this dichotomy is completely ignored by our CS curricula [Figure: systolic array with placement and data streams x_1 ... x_3, a_11 ... a_33, (y_1)_0 ... (y_3)_0, y_1 ... y_3]

  44. General Stream-based Computing System • heterogeneous Array of DPUs (data path units) • expression tree → DPU architectures • Mapper: simultaneous placement & routing by simulated annealing • Scheduler: data streams • free form pipe network • The same mapper for both: Reconfigurable, or hardwired • Kress DPSS [1995] [Figure: expression tree with operators +, *, -, sh, xf mapped onto the array]
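
To make the role of simulated annealing in such a mapper more concrete, here is a small Python sketch. It is a simplification under stated assumptions, not the DPSS mapper: it performs placement only (no routing), uses total Manhattan wire length as the cost, and the names `anneal_placement`, the cooling schedule, and the example expression tree are all hypothetical.

```python
import math
import random


def anneal_placement(nodes, edges, rows, cols, steps=4000, seed=0):
    """Place expression-tree nodes on a rows x cols DPU grid by simulated annealing."""
    rng = random.Random(seed)
    cells = [(r, c) for r in range(rows) for c in range(cols)]
    rng.shuffle(cells)  # cells[i] hosts nodes[i]; remaining cells stay free
    index = {name: i for i, name in enumerate(nodes)}

    def cost():
        # Total Manhattan distance over all producer/consumer connections.
        return sum(abs(cells[index[u]][0] - cells[index[v]][0]) +
                   abs(cells[index[u]][1] - cells[index[v]][1])
                   for u, v in edges)

    current = initial = cost()
    best_cells, best = list(cells), current
    temperature = 2.0
    for _ in range(steps):
        i = rng.randrange(len(nodes))  # a placed DPU
        j = rng.randrange(len(cells))  # any cell: another DPU or a free one
        cells[i], cells[j] = cells[j], cells[i]
        candidate = cost()
        if (candidate <= current or
                rng.random() < math.exp((current - candidate) / temperature)):
            current = candidate  # accept, possibly uphill
            if current < best:
                best_cells, best = list(cells), current
        else:
            cells[i], cells[j] = cells[j], cells[i]  # reject: undo the swap
        temperature = max(0.01, temperature * 0.999)  # geometric cooling
    placement = {name: best_cells[index[name]] for name in nodes}
    return placement, best, initial


# Hypothetical expression tree for y + a * x.
nodes = ["a", "x", "mul", "y", "add"]
edges = [("a", "mul"), ("x", "mul"), ("mul", "add"), ("y", "add")]
placement, best_cost, initial_cost = anneal_placement(nodes, edges, rows=3, cols=3)
print(placement, best_cost, initial_cost)
```

Occasionally accepting a worse placement while the temperature is high lets the search escape local minima; a real mapper would additionally check routability through the nearest-neighbour ports.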

  45. Super Pipe Networks • The key is mapping, rather than architecture* *) KressArray [1995]

  46. Processor Memory Performance Gap • µProc (CPU): 60%/yr. • DRAM: 7%/yr. • Processor-Memory Performance Gap: grows 50% / year [Figure: Performance (log scale, 1 to 1000) over 1980 to 2000]
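
The "50% / year" figure follows from the two growth rates on the slide. A short Python check, assuming both rates compound annually and stay constant (an idealization of the plotted curves):

```python
cpu_growth = 0.60   # processor performance growth per year (from the slide)
dram_growth = 0.07  # DRAM performance growth per year (from the slide)

# The gap is the ratio of the two curves, so its yearly growth is a ratio as well.
gap_growth = (1 + cpu_growth) / (1 + dram_growth) - 1
print(f"gap grows by about {gap_growth:.1%} per year")  # about 49.5%

# Compounded over the 20 years shown on the time axis (1980 to 2000).
years = 20
gap_after = (1 + gap_growth) ** years
print(f"gap factor after {years} years: about {gap_after:.0f}")
```

Under these assumptions the ratio 1.60 / 1.07 is roughly 1.495, which the slide rounds to 50% per year.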

  47. Synthesizable Memory Communication • Efficient Memory Communication should be directly supported by the Mapper Tools • An example by Nageldinger’s KressArray Xplorer: Optimized Parallel Memory Controller • Legend: sequencers; memory ports; application; not used http://kressarray.de

  48. Memory Communication Architecture • hot research topic in embedded systems • storage context transformations [Herz, others] • for low power • for high performance • startups provide memory IP or generators

  49. Stream-based Soft Machine [Figure: Compiler supplying “instructions” to the rDPA; Scheduler supplying Sequencers (data stream generator); Memory (data memory) consisting of several memory banks]

  50. Hot Research Topic: Memory Architectures • High Performance Embedded Memory Architectures • High Performance Memory Communication Architectures [Herz] • Custom Memory Management Methodology [Catthoor] • Data Reuse Transformations [Kougia et al.] • Data Reuse Exploration [Soudris, Wuytack]
