Reconfigurable Computing. Dominique LAVENIER, IRISA / CNRS, Rennes, lavenier@irisa.fr
Reconfigurable Computing Idea (1) • Microprocessor: programmable, slow • ASIC: not programmable, fast • FPGA [diagram; other labels: program, architecture]
Reconfigurable Computing Idea (2) • Example: $Y(i) = \sum_k X(i-k)\,W(k)$ • Von Neumann model: sequence of pre-defined instructions • FPGA: assembly of boolean functions [diagram labels: memory, memory]
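A minimal sketch of the "sequence of pre-defined instructions" view, reading the formula on this slide as a sum over k (an assumption: no summation sign survives in the transcript). The function name `convolve` and the zero-padding at the borders are illustrative choices, not taken from the talk.

```python
def convolve(x, w):
    """Compute y[i] = sum over k of x[i - k] * w[k].

    Samples of x outside its bounds are treated as zero (an assumption).
    """
    y = []
    for i in range(len(x) + len(w) - 1):
        acc = 0
        for k in range(len(w)):
            if 0 <= i - k < len(x):
                # One multiply-accumulate at a time, as a processor would issue it.
                acc += x[i - k] * w[k]
        y.append(acc)
    return y


print(convolve([1, 2, 3], [1, 1]))  # prints [1, 3, 5, 3]
```

Every multiply-accumulate here is executed one after the other; on an FPGA the same operators could be laid out side by side as hardware.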
Talk overview • FPGA Technology • Reconfigurable Architectures • Reconfigurable Processor Arrays • Perspectives
FPGA in short • FPGA: Field Programmable Gate Array • Introduced by Xilinx in 1985 • Implements a few million logic gates • Market: [chart: millions of dollars per year, 1995 to 2001; axis from 500 to 2500]
FPGA Structure • I/O • Logic block • Switching box • Routing network [diagram labels]
CLB (configurable logic block) • Register (REG) • RAM • Look-up table [diagram labels]
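A look-up table implements a boolean function by storing its truth table and using the input bits as an address. The sketch below models that idea in software. The helper names `make_lut` and `lut_read` and the choice of three inputs are hypothetical; the slide does not give the LUT size.

```python
def make_lut(func, n_inputs):
    """Precompute the truth table of a boolean function of n_inputs bits."""
    table = []
    for address in range(2 ** n_inputs):
        # Bit b of the address is the value of input number b.
        bits = [(address >> b) & 1 for b in range(n_inputs)]
        table.append(func(*bits) & 1)
    return table


def lut_read(table, *bits):
    """Evaluate the function by a single table access."""
    address = sum(bit << b for b, bit in enumerate(bits))
    return table[address]


# Two 3-input LUTs that together behave like a 1-bit full adder.
sum_lut = make_lut(lambda a, b, c: a ^ b ^ c, 3)
carry_lut = make_lut(lambda a, b, c: (a & b) | (a & c) | (b & c), 3)

print(sum_lut)    # prints [0, 1, 1, 0, 1, 0, 0, 1]
print(carry_lut)  # prints [0, 0, 0, 1, 0, 1, 1, 1]
```

Reconfiguring the logic block then amounts to loading a different table, which is what the configuration data does.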
Traditional Design Flow • Inputs (diagram labels): VHDL, EDIF, RTL • Steps: technology-independent optimization, LUT mapping, placement, routing, bitstream generation • Output: configuration data • Duration: a few minutes to a few hours
FPGA Component Use • FPGA components are used for • ASIC substitution • Rapid prototyping • VHDL simulation • Reconfigurable Computing • . . .
Reconfigurable Architectures • Functional Unit • Co-processor • Accelerator • System
Reconfigurable Functional Unit • FPGA integrated into the datapath • Idea: tailor the operations/instructions to the application • Level of reconfigurability: instructions [diagram labels: ALU, MEM]
Spyder Project • C. Iseli (Swiss Federal Institute of Technology, Lausanne) [diagram labels: RFU1, RFU2, RFU3, registers]
Why does it not work? • RFUs are slow: 5 to 10 times slower than standard functional units • No programming tools: the synthesis of specific operators must be automatic
Reconfigurable Co-Processor • Close connection to the CPU • Integrated on the same die • Not (yet?) available • Level of reconfigurability: functions [diagram labels: ALU, MEM]
ArMen • B. Pottier (UBO, Brest) [diagram labels: P and M, repeated]
Reconfigurable Accelerator • Communicates through the I/O bus • External board • Matrix of FPGA components with external RAM • Commercial boards available • Level of reconfigurability: application [diagram labels: ALU, MEM]
PAM boards • PAM: Programmable Active Memory • J. Vuillemin, P. Bertin, D. Roncin (DEC PRL) • Perle-0 (87), Perle-1 (91), Pamette (95), … [diagram labels: host computer, FPGA, memory]
Reconfigurable System • System on Chip: one reconfigurable zone connected to several components; available soon • Virtex + PowerPC (Xilinx/IBM) • Level of reconfigurability: system [diagram labels: P, M]
Architectures - Applications • Architectures: functional unit, co-processor, accelerator, system • Applications: intensive computation (cryptography, image processing, DNA sequencing, …); embedded systems (third-generation mobile phones, ...) [diagram links marked: ? ? ?]
Reconfigurable Processor Arrays • Principle: parallelize intensive computation on an array of hundreds (thousands) of tailored processors • Performance comes from: the parallelization, the customization
Parallelization • Initial code: … for ( … ) for ( … ) for ( … ) … • Code on each processor (diagram): … send ( … ) receive ( … ) …
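The slide elides the actual code, so the following is only a sketch of the general idea: a loop computing a weighted sum y[i] = sum over k of x[i - k] * w[k] (a hypothetical workload chosen for illustration) is spread over a line of processing elements, each holding one weight and passing its partial sum to a neighbour every cycle. The class `ProcessingElement`, the function `run_array`, and this particular mapping (input sample broadcast, partial sums sent between neighbours) are assumptions, not the author's design.

```python
class ProcessingElement:
    """One tailored processor: a single weight and one partial-sum register."""

    def __init__(self, weight):
        self.weight = weight
        self.partial = 0  # value sent to the neighbour on the next cycle

    def step(self, sample, received):
        # receive(partial sum), multiply-accumulate, keep the result to send.
        self.partial = received + self.weight * sample


def run_array(x, w):
    """Simulate the array cycle by cycle and collect the output of element 0."""
    pes = [ProcessingElement(wk) for wk in w]
    y = []
    # Trailing zeros flush the partial sums still travelling through the array.
    for sample in list(x) + [0] * (len(w) - 1):
        # Snapshot of what every element sends this cycle (last cycle's results).
        sent = [pe.partial for pe in pes]
        for k, pe in enumerate(pes):
            received = sent[k + 1] if k + 1 < len(pes) else 0
            pe.step(sample, received)
        y.append(pes[0].partial)
    return y


print(run_array([1, 2, 3], [2, 5]))  # prints [2, 9, 16, 15]
```

All elements work in the same cycle, so one output is produced per cycle regardless of the number of weights, whereas the sequential loop needs one multiply-accumulate step per weight for each output.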
Customization • Data-path width • Dedicated operator • Parallelism [diagram labels: A, B, C, D]
Design of Reconfigurable Processor Arrays • Fast design time thanks to: • Regular structure: specify one processor, then replicate • Local interconnection: optimizes the place-and-route step
Reconfigurable Processor Arrays: Applications • Image processing • Signal processing • Bio-computing • Cryptography • Text processing • ... Today: mostly integer applications
Performance examples • DNA search: PeRLe-1 board (16 Xilinx 3090, 1991), speed-up = 50 • K-means clustering: Wildforce board (4 Xilinx 4036, 1997), speed-up = 100 • PPI algorithm: Spyder board (1 Xilinx V800, 2000), speed-up = 200 • Reference: host of the same technology
Limitations • Host-board data bandwidth: bottleneck • Programming tools: automatic parallelization, partitioning, hardware generation • Portability! [diagram label: host]
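A back-of-the-envelope sketch of why host-board bandwidth becomes the bottleneck. It assumes that data transfer and computation do not overlap, and every number in the example is hypothetical rather than measured; the function name `effective_speedup` is illustrative.

```python
def effective_speedup(host_time_s, compute_speedup, data_bytes, bandwidth_bytes_per_s):
    """Speed-up seen by the application once the transfer time is included.

    Assumes the transfer over the I/O bus and the computation on the board
    happen one after the other (no overlap).
    """
    board_compute_s = host_time_s / compute_speedup
    transfer_s = data_bytes / bandwidth_bytes_per_s
    return host_time_s / (board_compute_s + transfer_s)


# Hypothetical case: 10 s on the host, array 100 times faster,
# 100 MB moved over a 100 MB/s bus.
print(round(effective_speedup(10.0, 100, 100e6, 100e6), 2))  # prints 9.09
```

In this made-up case the one second spent on the bus dwarfs the 0.1 s of computation, so a raw speed-up of 100 shrinks to about 9.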
Perspectives • Technology • Applications • Architecture
Exponential Growth in Density [chart, 1994 to 2006: LUT logic cells on a log axis from 1 000 to 1 000 000; logic gates from 12 K to 12 M]
Technology [timeline: 1998, 2000, 2002, 2005] • Xilinx Virtex XCV300 (0.3M gates) • Xilinx Virtex XCV3200 (2M gates) • Altera APEX20K1500 (2.4M gates): 30 x 32-bit Nios processor (80K gates) • Xilinx Virtex II (10M gates) • 30-50M gates: 400 Nios
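The Nios counts on this slide follow from dividing the gate capacity by the 80K gates quoted for one 32-bit Nios processor. A rough check, assuming the whole device is available for processors (no routing or memory overhead); the function name `nios_capacity` is illustrative:

```python
NIOS_GATES = 80_000  # gates per 32-bit Nios processor, as quoted on the slide


def nios_capacity(device_gates, gates_per_core=NIOS_GATES):
    """Number of whole cores that fit, ignoring routing and memory overhead."""
    return device_gates // gates_per_core


print(nios_capacity(2_400_000))   # prints 30  (Altera APEX20K1500)
print(nios_capacity(30_000_000))  # prints 375
print(nios_capacity(50_000_000))  # prints 625
```

The figure of 400 Nios for 30-50M gates sits inside the 375 to 625 range this estimate gives.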
Applications • Until now: performance has been demonstrated on integer applications with a high degree of parallelism • From now on: it becomes "reasonable" to investigate the implementation of floating-point applications
Floating-point operators • Estimation based on current research at IRISA • Component: Xilinx XCV1000 (1 Mgates) • Pipelined operators
                  Single precision   Double precision
adder area        3%                 5%
multiplier area   5%                 20%
frequency         50 MHz             100 MHz
Floating point performance [chart: Giga Flops on a log axis from 0.1 to 100] • 1998: 5 FPA at 25 MHz • 2000: 25 FPA at 50 MHz • 2002: 125 FPA at 100 MHz • 2005: 500 FPA at 200 MHz • FPA: double-precision floating-point adder
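The Giga Flops axis of this chart can be checked by multiplying the number of adders by the clock frequency. The sketch assumes that each pipelined adder delivers one result per cycle, and that the four years pair with the four data points in order; both are assumptions, and the names `peak_gflops` and `POINTS` are illustrative.

```python
def peak_gflops(num_adders, frequency_mhz):
    """Peak rate if every pipelined adder produces one result per clock cycle."""
    return num_adders * frequency_mhz * 1e6 / 1e9


# (number of FPA, clock in MHz) per year, as read from the slide.
POINTS = {1998: (5, 25), 2000: (25, 50), 2002: (125, 100), 2005: (500, 200)}

for year, (adders, mhz) in POINTS.items():
    print(year, peak_gflops(adders, mhz))
```

The four values come out at 0.125, 1.25, 12.5 and 100 Giga Flops, which line up with the 0.1, 1, 10 and 100 marks on the axis.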
Architecture • Today's accelerator boards: restricted bandwidth, parallelism on a 1D array
Architecture • Dual-port RAM connection [diagram label: fast dual-port memory]
Architecture • On-chip FPGA: an alternative way of using the one-billion-transistor processors of the next decade
Conclusion • The technology is available for reconfigurable computing: 30-50M gates in 2005 • Application domains are expanding: floating point • No programming tools: model? portability?