Stream-based Arrays: Converging Design Flows for both, Reconfigurable and Hardwired
Reiner Hartenstein, University of Kaiserslautern
December 2-4, 2001, Montpellier, France
>> Stream-based Computing
• Stream-based Computing
• Stream-based Compilation Techniques
• Use in Co-Design
• Now it’s up to You !
http://www.uni-kl.de
rDPA (coarse grain) becoming important
commercial rDPAs:
• XPU family (IP cores): PACT Corp., Munich
• CALISTO: Silicon Spice **
• CS2000 family: Chameleon Systems
• MECA family: Malleable **
• flexible array: MorphICs
• ACM: Quicksilver Tech
• CHESS array: Elixent
• MorphoSys: Morpho Tech
• FIPSOC: SIDSA
**) bought
Figure: XPU128 (http://pactcorp.com)
KressArray Family
• KressArray Xplorer: http://kressarray.de (you may use it on your Netscape)
• Example: SNN filter
• array size: 10 x 16 = 160 rDPUs
Figure legend: rout thru only; not used; backbus connect
Rapidly toward the Break-through
• replace Concurrent Processes by more efficient parallelism: stream-based DPAs 1), stream-based rDPAs 2)
1) systolic array* [1980]; Bee Project chip-on-a-day* [2000] [Brodersen]
2) KressArray** [1995] and others [later]
*) hardwired
**) reconfigurable
Generalization of the Systolic Array
• Kress: a generalization of systolic array synthesis: super systolic synthesis
terms:
• DPU: datapath unit
• DPA: datapath array
• rDPU: reconfigurable DPU
• rDPA: reconfigurable DPA
compare Concurrent Computing
Figure: CPUs (each a DPU with its own instruction sequencer) connected by bus(es) or switch box
extremely inefficient:
• massive bottleneck phenomena at run time
• control flow overhead
• instruction fetch / interpretation overhead
• address computation overhead - may be massive
with Stream-based Computing: (r)DPA
Figure: array of DPUs, driven by data stream from/to memory or from/to peripheral interface
• no instruction sequencer inside !
• „instruction fetch“: at compile time
• transport-triggered execution
• for both, reconfigurable and hardwired [Brodersen]
• avoids run time overhead and bottleneck phenomena
• rDPA: drastically reduced reconfigurability overhead
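A minimal sketch of the idea on this slide, not the author's implementation: each DPU is given one fixed operation before the run ("instruction fetch" at compile time) and then fires whenever a data item arrives, with no instruction sequencer inside. The names `dpu` and `configure_pipeline` are hypothetical and only serve the illustration.

```python
from typing import Callable, Iterable, Iterator, Sequence


def dpu(op: Callable[[int], int], stream: Iterable[int]) -> Iterator[int]:
    """One DPU: its operation is fixed before run time (assumed model)."""
    for item in stream:      # execution is triggered by the arrival of data
        yield op(item)       # no instruction fetch or decode at run time


def configure_pipeline(ops: Sequence[Callable[[int], int]],
                       source: Iterable[int]) -> Iterator[int]:
    """Chain DPUs into a linear pipe; the data stream drives all of them."""
    stream: Iterable[int] = source
    for op in ops:
        stream = dpu(op, stream)
    return iter(stream)


# Usage: a two-DPU pipe computing (v + 1) * 2 on a stream from "memory".
result = list(configure_pipeline([lambda v: v + 1, lambda v: v * 2], [1, 2, 3]))
```

The configuration step (`configure_pipeline`) runs once; afterwards only data moves, which is the point the slide makes about avoiding run time overhead.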
Soft rDPA ?
Figure: HLL Compiler; soft CPU; Memory; soft DPU array; miscellaneous
• 50 million system gates soon
• even large rDPAs as soft IPs become feasible
• by >2005: don’t care about area efficiency ?
>> Stream-based Compilation Techniques
Systolic Stream-based Computing System
Systolic Array [H. T. Kung, 1980]: a DPA (Data Path Array)
The Mathematician’s Synthesis Method:
• from equations, by linear projection or algebraic mapping, to systolic arrays etc.
• result: placement and data streams, i.e. computing in space and computing in time (this dichotomy is completely ignored by our CS curricula)
• migration by re-timing and other transformations
• linear pipelines and uniform arrays only
• no routing!
Figure: DPU architecture (operators +, *, -) and a 3 x 3 example with streams a11 ... a33, x1 ... x3 and y1 ... y3
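To make "computing in space" versus "computing in time" concrete, here is a small simulation of a linear systolic array. It assumes the 3 x 3 example is a matrix-vector product y = A x; the slide's exact equations are not recoverable from the transcript, so the function name and the schedule are illustrative only. DPU i holds y[i] (space), while x is shifted one DPU per time step (time).

```python
def systolic_matvec(a, x):
    """Simulate a linear systolic array computing y = A x (assumed example).

    DPU i accumulates y[i]; the x stream moves one DPU to the right per step,
    so DPU i sees x[j] at time step t = i + j.
    """
    n, m = len(a), len(x)
    y = [0] * n
    x_reg = [None] * n                      # pipeline register of each DPU
    for t in range(m + n - 1):              # m + n - 1 steps drain the pipe
        # shift the x stream by one DPU and feed the next element on the left
        x_reg = [x[t] if t < m else None] + x_reg[:-1]
        for i in range(n):                  # all DPUs work in the same step
            if x_reg[i] is not None:
                j = t - i                   # index of the x element at DPU i
                y[i] += a[i][j] * x_reg[i]  # multiply-accumulate inside the DPU
    return y


y = systolic_matvec([[1, 2], [3, 4]], [5, 6])
```

Each DPU only talks to its neighbour, which is why this class of arrays needs placement but no routing.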
General Stream-based Computing System
heterogeneous DPA or rDPA
• inputs: expression tree, DPU architectures
• Mapper: simultaneous placement & routing by simulated annealing
• Scheduler: data streams
• result: free form pipe network
Figure: expression tree and pipe network with operators +, *, -, sh, xf and operands x, y, a
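The mapper step can be sketched as a simulated-annealing placement. This is a simplified stand-in, not the KressArray mapper: it only places operators on a grid and uses Manhattan wire length as a proxy for routing cost, whereas the slide describes simultaneous placement and routing. All names (`wire_length`, `anneal_placement`) and parameters are hypothetical.

```python
import math
import random


def wire_length(place, edges):
    """Manhattan distance summed over all edges (a proxy for routing cost)."""
    return sum(abs(place[u][0] - place[v][0]) + abs(place[u][1] - place[v][1])
               for u, v in edges)


def anneal_placement(nodes, edges, rows, cols, steps=2000, t0=2.0, seed=0):
    """Place each operator on its own grid cell, minimizing wire length."""
    rng = random.Random(seed)
    cells = [(r, c) for r in range(rows) for c in range(cols)]
    rng.shuffle(cells)
    place = {n: cells[i] for i, n in enumerate(nodes)}
    free = cells[len(nodes):]                    # unused rDPU positions
    cost = wire_length(place, edges)
    best, best_cost = dict(place), cost
    for step in range(steps):
        temp = t0 * (1 - step / steps) + 1e-9    # linear cooling schedule
        n = rng.choice(nodes)
        old = place[n]
        if free and rng.random() < 0.5:          # move to a free cell
            k = rng.randrange(len(free))
            place[n], free[k] = free[k], old
            undo = ("free", k)
        else:                                    # swap with another operator
            m = rng.choice(nodes)
            place[n], place[m] = place[m], old
            undo = ("swap", m)
        new_cost = wire_length(place, edges)
        if new_cost <= cost or rng.random() < math.exp((cost - new_cost) / temp):
            cost = new_cost                      # accept the move
            if cost < best_cost:
                best, best_cost = dict(place), cost
        elif undo[0] == "free":                  # reject: restore the free cell
            free[undo[1]], place[n] = place[n], old
        else:                                    # reject: swap back
            place[undo[1]], place[n] = place[n], old
    return best, best_cost


# Usage: a small expression chain placed on a 3 x 3 array.
ops = ["mul", "add", "sub", "sh", "xf"]
nets = [("mul", "add"), ("add", "sub"), ("sub", "sh"), ("sh", "xf")]
placement, cost = anneal_placement(ops, nets, rows=3, cols=3)
```

A real mapper would also score routing resources such as rout-through cells and backbus connections; the cost function is the place where that would go.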
Memory Communication Architecture …
• hot research topic in embedded systems
• storage context transformations [Catthoor, Herz, Kougia, Soudris]
• startups provide memory IPs or generators
• methodology available
Herz: Synthesizable Memory Communication Architecture
• an example by Nageldinger’s KressArray Xplorer
• Optimized Parallel Memory Controller
• GAG generic sequencer
Figure legend: sequencers; memory ports; application; not used
>> Use in Co-Design
Computer: the wrong Machine Paradigm
University of Kaiserslautern
• Computer (“von Neumann”): Compiler, Memory, Sequencer and Datapath tightly coupled by compact instruction code; state register: program counter; Datapath hardwired; does not support soft data paths
• Xputer (The Soft Machine Paradigm): Compiler with Scheduler, Memory, (multiple) sequencer and Datapath Array loosely coupled by decision data bits only; state register: data counter(s); reconfigurable, also for hardwired [Brodersen]
• enabling technology published ≤ 10 years ago
• now a hot topic area
• full day course last week at Tampere, Finland
Hardware / Software Co-Design turns to Configware / Software Co-Design
Jürgen Becker’s Co-DE-X Co-Compiler [ASP-DAC’95]
Co-Compilation:
• source: X-C high level programming language
• partitioning compiler: Partitioner, Analyzer / Profiler, Resource Parameters; supporting different platforms
• GNU C compiler: Software running on the Computer (µProcessor), Computer Machine Paradigm
• X-C compiler, DPSS: Configware running on the Xputer (KressArray), “Soft” Machine Paradigm
• interface to the Reconfigurable Accelerators
Loop Transformation Examples
resource parameter driven Co-Compilation
• sequential processes, host: loop 1-16 body endloop
• loop unrolling: loop 1-8 body body endloop
• strip mining: fork (loop 1-8 body endloop; loop 9-16 body endloop) join
• reconf. array: loop 1-8 trigger endloop; loop 1-4 trigger endloop; loop 1-2 trigger endloop
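The two host-side transformations can be sketched as follows. This is an illustration under stated assumptions, not the Co-DE-X partitioner: `body` is a hypothetical side-effect-free loop body, the unroll factor is 2, and the fork/join of strip mining is modelled with a thread pool.

```python
from concurrent.futures import ThreadPoolExecutor


def sequential(body, n=16):
    """loop 1-16 body endloop"""
    return [body(i) for i in range(1, n + 1)]


def unrolled(body, n=16):
    """loop 1-8 body body endloop (unroll factor 2; assumes n is even)"""
    out = []
    for i in range(1, n + 1, 2):
        out.append(body(i))       # first copy of the body
        out.append(body(i + 1))   # second copy of the body
    return out


def strip_mined(body, n=16, strip=8):
    """fork (loop 1-8 body; loop 9-16 body) join"""
    strips = [range(s, min(s + strip, n + 1)) for s in range(1, n + 1, strip)]
    with ThreadPoolExecutor() as pool:            # fork: one task per strip
        parts = pool.map(lambda r: [body(i) for i in r], strips)
        return [v for part in parts for v in part]  # join: results in order


square = lambda i: i * i
reference = sequential(square)
```

Both transformations keep the result identical to the sequential loop; what changes is how much work each iteration or each parallel strip carries, which is what the resource parameters steer.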
>> Now it’s up to You !
However, current CS Education … is based on the Submarine Model
Figure (Submarine Model): Algorithm; Software; procedural high level Programming Language; Assembly Language; Hardware (invisible: under the surface)
• Brain usage: procedural-only
• This model disables ...
• Software Faculty Colleagues shy away from the Paradigm Shift: their Brain hurts? - can’t be: this Half has been amputated
... this model disables Hardware and Software as Alternatives
Figure: Algorithm; partitioning into Software only, Software & Hardw/Configw, or Hardw/Configw only
• Software: procedural
• Hardware, Configware: structural
• Brain Usage: both Hemispheres
The Dominance of the Submarine Model ...
• ... indicates that our CS education system produces zillions of mentally disabled Persons
• … completely disabled to cope with solutions other than software only
Figure: (procedural); Hardware; structurally disabled
It’s time to attack the software faculty dictatorship. Get involved!
It’s up to You !
thank you for listening
END
The Impact of Reconfigurable Logic
Figure [T. Kean]: Revenue / month over Time / months (1, 10, 20, 30) for ASIC Product, reconfigurable Product, and Product with download (Update 1, Update 2)
• Reconfigurable platforms bring a new dimension to digital system development and have a strong impact on SoC design.
• A rapidly growing, large user base of HDL-savvy designers with FPGA experience.
• Flexibility promises spin-around times down to minutes instead of months for real-time in-system debugging, profiling, verification, tuning, field maintenance, and field upgrades.
• A New Business Model (in-field debugging and upgrading ... )
• A Fundamental Paradigm Shift in Silicon Application
The History of Paradigm Shifts
“Mainstream Silicon Application is switching every 10 Years”
“The Programmable System-on-a-Chip is the next wave“
Figure (Makimoto’s Wave, published in 1989): alternation between standard and custom, 1957 to 2007 in 10-year steps; labels: TTL; LSI, MSI; µproc., memory; ASICs, accel’s; reconfigurable; ?
How’s next Wave ?
Tredennick’s Paradigm Shifts:
• hardwired: algorithm: fixed, resources: fixed
• procedural programming: algorithm: variable, resources: fixed
• structural programming: algorithm: variable, resources: variable
Figure (Hartenstein’s Curve): standard versus custom, 1957 to 2007 in 10-year steps; FPGAs; Coarse grain RAs
no further wave !
The Impact of Makimoto’s Paradigm Shifts
Dr. Makimoto: FPL 2000 keynote
• Software Industry’s Secret of Success: procedural personalization via RAM-based Machine Paradigm
• Configware Success Story by new Machine Paradigm: structural personalization, RAM-based, before run time
Figure (Makimoto’s Wave): alternation between standard and custom, 1957 to 2007 in 10-year steps; labels: TTL; LSI, MSI; µproc., memory; ASICs, accel’s; reconfigurable
The History of Paradigm Shifts
“Mainstream Silicon Application is switching every 10 Years”
Figure (Makimoto’s Wave): alternation between standard and custom, 1957 to 2007 in 10-year steps; labels: TTL; LSI, MSI; µproc., memory; ASICs, accel’s; FPGAs; coarse grain
KressArray Family generic Fabrics: a few examples
• Select Function Repertory
• Select mode, number, width of NNports
• select Nearest Neighbour (NN) Interconnect: an example
• more NNports: rich Rout Resources
• rDPU modes: rout-through only; rout-through and function
• Examples of 2nd Level Interconnect: layouted over rDPU cell - no separate routing areas !
• Wired by Abutment
Figure labels: rDPU with + operator; numeric labels 16, 8, 32, 24, 2, 4