This talk, given in Brasilia, Brazil, discusses the present and future of reconfigurable systems. It focuses on the migration of programming to the structural domain and on the opportunity to introduce this domain to programmers through clever abstraction mechanisms. It also explores data-stream-based computing and the anti-machine paradigm, and closes with final remarks on the potential of reconfigurable systems.
November 14, 2003, Brasilia, Brazil Present and Future of Reconfigurable Systems Reiner Hartenstein* University of Kaiserslautern *) IEEE fellow
Literature (also downloads): http://hartenstein.de (also click "recent talks"). This page also links to available Ph.D. theses: Becker, Herz, Kress, Nageldinger 2
Reconfigurable Computing: a second programming domain • Migration of programming to the structural domain • The structural domain has become RAM-based • The opportunity to introduce the structural domain to programmers ... • ... to bridge the gap by clever abstraction mechanisms using a simple new machine paradigm 3
IT ages [figure: timeline 1957, 1967, 1977, 1987, 1997, 2007]: mainframe age, computer age (PC age), morphware age (here?). von Neumann does not support morphware. Data streams ... flowware 4
>> outline << • fine grain reconfigurable • Placement and routing • coarse grain reconfigurable • Flowware • Datastream-based Computing • The Anti Machine Paradigm • Final Remarks http://www.uni-kl.de 5
Fine grain • Fine-grain morphware platforms are already mainstream: reconfigurable logic • Just logic design on a strange platform? • Speed-up of up to 3 orders of magnitude 6
rGAs: you don't need specific silicon! • FPGAs are going into every type of application, also SoC • fastest growing segment of the semiconductor market • [Dataquest] > $7 billion by 2003. [Figure: NRE and mask cost [Dataquest]; mask set cost in million $ and number of masks (12 to >30) versus feature size (0.8 down to 0.07) [eASIC]] [Figure: Top 4 PLD manufacturers 2000, total $3.7 billion: Xilinx 42%, Altera 37%, Lattice 15%, Actel 6%; by application: others 31%, PC 25%, communication 22%, consumer 16%, automotive 6%] 7
switch rGA with island architecture (detail) connect switch 8
switch box switch point • Reconfigurable switch box 9
connect box part of configuration memory • Reconfigurable connect point connect box 10
illustration: connect point (enlarged) reconfigurable logic box connect point • Reconfigurable 11
connection activated illustration (the feed line for the function selection of the rLB is not shown) reconfigurable logic box 12
connect point activated • Routing 13
3 switch points, the 4th switch point, the 5th switch point: switch points activated • Routing switch box 14
Routing continued • Routing 15
Routing A B: routing completed for 1 net: 20 transistors + 20 flip-flops. Placement and routing software has been known for 25 years: in 1979 Silva Lisco (Silicon Valley Research Corp.) offered CALM-P. Such network problems can be handled manually or with the help of graph theory. 16
A B Routing: long-distance nets. Passing through: long-distance wiring from rLBs outside this region. A path can be used by only one net at a time ... 17
A B C D routing congestion C and D are not reachable. C cannot be connected with D. A bridge can be passed only once (bridges of Königsberg) 18
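The congestion effect on this slide can be reproduced with a small sketch. The Python example below is an illustration only, not the placement and routing software mentioned earlier: it assumes a hypothetical 3 x 3 grid of routing cells, a Lee-style breadth-first search, and the rule that each cell can carry only one net at a time. The names `route` and `route_nets` and the coordinates of A, B, C and D are assumptions made for this example.

```python
from collections import deque

def route(width, height, used, src, dst):
    """Breadth-first search for a shortest path of free cells from src to dst.

    `used` holds cells already taken by earlier nets (hypothetical model:
    one net per cell). Returns the list of cells on the path, or None.
    """
    if src in used or dst in used:
        return None
    prev = {src: None}
    queue = deque([src])
    while queue:
        cell = queue.popleft()
        if cell == dst:
            path = []
            while cell is not None:      # walk back to the source
                path.append(cell)
                cell = prev[cell]
            return path[::-1]
        x, y = cell
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            nx, ny = nxt
            inside = 0 <= nx < width and 0 <= ny < height
            if inside and nxt not in used and nxt not in prev:
                prev[nxt] = cell
                queue.append(nxt)
    return None

def route_nets(width, height, nets):
    """Route nets one after another; each routed path blocks its cells."""
    used = set()
    result = {}
    for name, (src, dst) in nets.items():
        path = route(width, height, used, src, dst)
        result[name] = path
        if path:
            used.update(path)
    return result

# Assumed example: A-B runs through the middle row, C-D would have to cross it.
nets = {"A-B": ((0, 1), (2, 1)), "C-D": ((1, 0), (1, 2))}
print(route_nets(3, 3, nets))
```

With net A-B routed straight through the middle row, no free cell remains for C-D to cross, so `route_nets` reports `None` for it. Routing the nets in a different order, or on a larger grid, can change the outcome, which is why net ordering and detours matter in practice.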
Leonhard Euler: Euler's problem of the bridges of Königsberg (1736) is such a network problem: find a route that crosses each bridge exactly once ... also an optimization: none of the bridges remains unused. 20
Graph (nodes and edges): Right Bank, Kneiphof Island, Other Island, Left Bank. L. Euler: Solutio problematis ad geometriam situs pertinentis; Commentarii Academiae Scientiarum Imperialis Petropolitanae 8 (1736), pp. 128-140 21
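Euler's argument can be checked in a few lines. The sketch below encodes the historical seven bridges as a multigraph over the four land masses named on the slide and applies the degree criterion: a connected multigraph has a walk using every edge exactly once only if zero or two nodes have odd degree. Which bank is called "left" or "right" is an assumption here (the layout is symmetric in the two banks), and the function names are made up for this example.

```python
from collections import Counter

# The seven bridges of Königsberg as an edge list (multigraph).
BRIDGES = [
    ("Left Bank", "Kneiphof Island"),
    ("Left Bank", "Kneiphof Island"),
    ("Right Bank", "Kneiphof Island"),
    ("Right Bank", "Kneiphof Island"),
    ("Left Bank", "Other Island"),
    ("Right Bank", "Other Island"),
    ("Kneiphof Island", "Other Island"),
]

def node_degrees(edges):
    """Count how many bridge ends touch each land mass."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return degree

def has_euler_walk(edges):
    """Degree criterion; assumes the graph is connected."""
    odd = [node for node, d in node_degrees(edges).items() if d % 2 == 1]
    return len(odd) in (0, 2)

print(dict(node_degrees(BRIDGES)))
print(has_euler_walk(BRIDGES))
```

All four nodes have odd degree (5, 3, 3, 3), so no route crosses each bridge exactly once.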
Data structures for graphs: adjacency matrix and graph list, shown for a directed graph and an undirected graph on nodes 1 to 4 [figure: from/to matrices and linked lists]. J. E. Hopcroft, R. E. Tarjan: Efficient algorithms for graph manipulation; Comm. ACM, 1973 22
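As a minimal sketch of the two data structures named on this slide, the Python example below builds an adjacency matrix and an adjacency list from an edge list. The 4-node graph used here is hypothetical and is not the graph drawn on the slide; the function names are assumptions for this example.

```python
def adjacency_matrix(n, edges, directed=True):
    """n x n matrix; entry [u-1][v-1] counts edges from node u to node v."""
    matrix = [[0] * n for _ in range(n)]
    for u, v in edges:
        matrix[u - 1][v - 1] += 1
        if not directed and u != v:
            matrix[v - 1][u - 1] += 1
    return matrix

def adjacency_list(n, edges, directed=True):
    """Dictionary mapping each node 1..n to the list of its successors."""
    adj = {node: [] for node in range(1, n + 1)}
    for u, v in edges:
        adj[u].append(v)
        if not directed and u != v:
            adj[v].append(u)
    return adj

# Hypothetical example graph on nodes 1..4.
edges = [(1, 2), (2, 3), (3, 1), (3, 4)]
print(adjacency_matrix(4, edges))                 # directed
print(adjacency_list(4, edges, directed=False))   # undirected
```

The matrix needs n x n entries regardless of the number of edges, while the list grows with the number of edges, which is the usual reason to prefer lists for sparse netlists.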
Large-scale routing (ENIAC, completed 1945) • Partitioning over racks in the hall • Partitioning over card cages in the rack • Partitioning over boards (cards) in card cages • Partitioning over chips etc. on the card (e.g. SBC) • Partitioning over blocks on the chip (e.g. microprocessor) 23
PCBs (printed circuit boards): planar "wiring" for 40 years. MULTEC at Böblingen has produced printed circuit boards since 1963. The number of pins is limited. 24
Integrated Circuit (Chip): limited number of pins, "wiring" on a planar surface 25
Hierarchy (more levels): rack, card cage, card, chip, macro cell, basic cell [figure: layout with blocks labeled IMS1 to IMS3, KL2 to KL4, FTI1, FTI2, JWGU, Kaiserslautern 1] 26
Wiring hierarchy • cables in the rack connect the card cages • card cage wiring connects the cards • card wiring connects the chips • on-chip wiring connects the cells (macro cell, cell) *) 1930s: telephone switching (without chips; crossbar / two-motion selectors instead of cards); 1940s: first computers (without chips) 27
An obsolete application area: before fabrication? after fabrication? 28
Emulators: Quickturn; Celaro Pro (Mentor); Dini Group (PCI bus extender) 29
Crossbar: full crossbar versus partial crossbar, number of crossbar chips for n crossbar chips in a row. Full crossbar: n x n/2 chips (n = 8: 32; n = 100: 5000). Partial crossbar: n chips (n = 8: 8; n = 100: 100). 30
Logic card, card cage, rack • 14 logic chips (Lchip) with 128 pins each (occasionally for route-through) • 32 crossbar chips (Xchip) with 72 I/O pins each (for route-through only) • each Xchip: 4 pins connected to each Lchip • Routing: 8 logic cards per card cage, 8 Ychip cards per card cage, 8 card cages per rack • Backplane: 8 Zboard cards per rack 31
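The pin counts on this slide are consistent with each other, which a short calculation makes visible. The sketch below only uses the numbers stated on the slide and reads "4 pins connected to each Lchip" as 4 pins per Xchip/Lchip pair; the function name `pin_budget` is an assumption for this example.

```python
def pin_budget(lchips=14, xchips=32, pins_per_link=4,
               lchip_pins=128, xchip_io_pins=72):
    """Pins consumed on one logic card by the Lchip-to-Xchip connections."""
    lchip_used = xchips * pins_per_link   # each Lchip talks to every Xchip
    xchip_used = lchips * pins_per_link   # each Xchip talks to every Lchip
    return {
        "lchip_pins_used": lchip_used,
        "lchip_pins_free": lchip_pins - lchip_used,
        "xchip_pins_used": xchip_used,
        "xchip_pins_free": xchip_io_pins - xchip_used,
    }

print(pin_budget())
```

Under this reading every one of the 128 Lchip pins goes to the crossbar (32 x 4), while each Xchip uses 56 of its 72 I/O pins (14 x 4). What the remaining 16 Xchip pins are used for is not stated on the slide.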
Crossbar? • 1913: J. N. Reynolds' crossbar switch • 1915: patent granted • 1919: Betulander's crossbar switch • 1926: first public telephone switching application, in Sweden • 1964: NASA telemetrics crossbar array 32
RWC Real World Computing, Japan, 40 TFLOPS Crossbar weight: 220 tons, 3000 km cable, 5120 processors with 5000 pins each 33
Routing congestion example [figure: array of rGAs]: direct connection impossible; detour connection by route-through 34
Routing-only configuration (2 examples) • Routing rLB: identity function configured 35
Graphs, Partitioning, Algorithms • T. Uehara, W. M. van Cleemput: Optimal Layout of CMOS Functional Arrays; IEEE Trans. C-30, pp. 305-312, May 1981 • B. Kernighan, S. Lin: An Efficient Heuristic Procedure for Partitioning Graphs; BSTJ 49, 1970 • C. Alpert, A. Kahng: Recent Directions in Netlist Partitioning: A Survey; Integration, vol. 19 (1-2), pp. 1-81, 1995 • T. Cormen et al.: Introduction to Algorithms; MIT Press / McGraw-Hill, 1991 36
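To give a flavour of the partitioning literature listed here, the sketch below computes the cut size of a two-way partition and the gain of swapping one node from each side, using the gain formula from the Kernighan-Lin heuristic (gain = D_a + D_b - 2 c_ab, where D is external minus internal connection count). It is not the full multi-pass heuristic from the paper, only its basic building block, and the tiny netlist and function names are hypothetical.

```python
def cut_size(edges, part_a):
    """Number of edges with exactly one endpoint in part_a."""
    return sum(1 for u, v in edges if (u in part_a) != (v in part_a))

def swap_gain(edges, part_a, a, b):
    """Reduction in cut size if a (in part_a) and b (outside) swap sides."""
    def d_value(node):
        external = internal = 0
        for u, v in edges:
            if node in (u, v):
                other = v if u == node else u
                if (other in part_a) == (node in part_a):
                    internal += 1
                else:
                    external += 1
        return external - internal
    c_ab = sum(1 for u, v in edges if {u, v} == {a, b})
    return d_value(a) + d_value(b) - 2 * c_ab

# Hypothetical netlist: 4 cells, 3 two-pin nets, initial partition {1, 3} | {2, 4}.
edges = [(1, 2), (3, 4), (1, 4)]
part_a = {1, 3}
print(cut_size(edges, part_a), swap_gain(edges, part_a, 3, 2))
```

In this example the initial cut size is 3 and swapping cells 3 and 2 has gain 2, leaving a cut of 1. The full heuristic repeats such swaps in passes, tentatively accepting negative gains to escape local minima.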
Why emulators are obsolete [figure: system gates per rGA chip by year, 1984 to 2004, logarithmic scale from 1 000 to 10 000 000; data points XC 4085XL, XC 40250XV, Virtex, Virtex II, planned] [Xilinx Data] 37
Why a declining ASIC business? [Figure: number of design starts, curve labeled "rGA-based"] [N. Tredennick, Gilder Technology Report, 2003] You don't need specific silicon! More and more, the prototyping platform of rGA-based systems is delivered directly to the customer as the product, fully configured. ASICs lost the battle; rGAs are the winners. ASIC emulators have been a transient solution, now with declining commercial significance. 38
Xilinx: from rack to chip, the full hierarchy on chip • Xilinx Virtex-II Pro FPGA architecture • PowerPC 405 RISC CPU (PPC405) cores • FPGA fabric based on Virtex-II architecture [figure labels: Rocket IO, PowerPC core, on-chip memory controller, embedded RAM] Source: Ivo Bolsens, Xilinx 39
Reconfigurable Computing: not that new, but shocking the fundamentals of CS curricula. Focusing on coarse grain • Fine-grain morphware platforms are already mainstream: reconfigurable logic, just logic design on a strange platform • Coarse-grain platforms: an order of magnitude more MIPS/mW than fine grain 41
Why coarse grain [figure: throughput in MOPS / mW, 0.001 to 1000, versus feature size, 2 down to 0.07 µ, with flexibility as the trade-off; T. Claasen et al.: ISSCC 1999]. Curves: hardwired; coarse-grain rDPAs (reconfigurable computing)*; FPGAs (reconfigurable logic); DSP; standard microprocessor (instruction set processors, von Neumann). Coarse grain goes far beyond bridging the gap. *) R. Hartenstein: ISIS 1997 42
rDPA (Reconfigurable Datapath Array) [figure: array of rDPUs]. Reconfigurable Interconnect Fabric (RIF) laid out over the rDPUs: rDPA wired by abutment; separate routing area 43
CMOS interconnect resources: foundries offer up to 9 metal layers and up to 3 poly layers. Reconfigurable interconnect fabric laid out over the rDPU cell 44
Commercial rDPAs: XPU family (IP cores), PACT Corp., Munich; XPU128 45
Mapping algorithms efficiently onto rDPA: SNN filter on KressArray ("Structured Configware Design" [R. H.]). Array size: 10 x 16 = 160 rDPUs. Legend: route-through only, not used, backbus connect. By the way: an example of scalability / relocatability by EDA support 46
Routing: hundreds of rGAs or very large rGAs scale badly; routing congestion grows exponentially 47
Communication resource requirements: often, functional resources are not the throughput bottleneck. In some application areas, e.g. wireless communication, reconfigurable computing arrays need extraordinarily rich and powerful communication resources. The solution: generators for domain-specific RA platforms 48
KressArray family generic fabrics: a few examples • rDPU: select function repertory • Select nearest-neighbour (NN) interconnect, an example: select mode, number, and width of NNports [figure values: 2, 8, 16, 24, 32]; route-through only, or route-through and function; more NNports: rich routing resources • 4 examples of 2nd-level interconnect: laid out over the rDPU cell, no separate routing areas! http://kressarray.de 49
Super pipe networks: the key is mapping, rather than architecture* *) KressArray [ASP-DAC-1995] 50