Progress Report FPGA-based Infrastructure Henry Chen henryic@ee.ucla.edu June 11, 2010
Motivation • Architectural & algorithmic exploration/optimization • High-performance/high-throughput computation • Closed-loop test environment [1,2]
Platform Architecture [3] • Large design effort; amortize widely • As general-purpose as possible • Large memories • High I/O bandwidth • Use embedded CPU to provide high-level interface to FPGA resources
IBOB • IBOB (Interconnect Break-Out Board) • 1x Virtex-II Pro (FPGA + PowerPC405) • 2x 18Mb (36-bit) SRAMs (~250MHz) • 2x CX4 10Gb high-speed-serial • 2x Z-DOK+ high-speed differential GPIO (80 diff pairs) • 80x LVCMOS/LVTTL GPIO • RS232 UART to PPC; major I/O bottleneck • read_xps/write_xps • Our primary test platform; have 2 in-house
ROACH • ROACH (Reconfigurable Open Architecture Compute Hardware) • 1x Virtex 5 FPGA • External PPC440 • 1x DDR2 DIMM • 2x 72Mbit (18-bit) QDR SRAMs (~350MHz) • 4x CX4 • 2x Z-DOK+ (80 diff pairs) • External PPC provides much faster interface to FPGA resources (1GbE) • None in-house (for now)
BEE2 • BEE2 (Berkeley Emulation Engine) • 5x Virtex-II Pro • 20x DDR2 DRAM DIMMs (200MHz) • 18x CX4 ports • High-End Reconfigurable Computer • High I/O bandwidth per FPGA • High memory bandwidth per FPGA • High memory capacity per FPGA • Have one in-house
BORPH [4] • Linux kernel modification for hardware abstraction; runs on embedded CPU connected to FPGA • “Hardware process” • Programming an FPGA is like running a Linux executable • Some FPGA resources accessible in Linux process memory space • Makes FPGA board look just like Linux workstation • Used on BEE2, ROACH; limited version on IBOB w/ expansion board
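As a rough illustration of the file-style access that BORPH gives a hardware process, the sketch below reads and writes a 32-bit register through ordinary file I/O. The path layout mentioned in the comments (`/proc/<pid>/hw/ioreg/<name>`), the register name `ctrl`, and the big-endian binary encoding are assumptions made for illustration, not details taken from these slides; a temporary file stands in for the register so the example runs on any workstation.

```python
import os
import struct
import tempfile


def write_reg(path, value):
    """Write a 32-bit unsigned value to a register file (big-endian assumed)."""
    with open(path, "r+b") as f:
        f.write(struct.pack(">I", value))


def read_reg(path):
    """Read a 32-bit unsigned value from a register file (big-endian assumed)."""
    with open(path, "rb") as f:
        return struct.unpack(">I", f.read(4))[0]


def make_demo_reg():
    """Create a 4-byte temporary file that stands in for a register.

    On a BORPH system the path would instead look something like
    /proc/<pid>/hw/ioreg/ctrl (assumed layout, hypothetical register name).
    """
    fd, path = tempfile.mkstemp()
    os.write(fd, b"\x00" * 4)
    os.close(fd)
    return path


if __name__ == "__main__":
    reg = make_demo_reg()
    write_reg(reg, 0xDEADBEEF)
    print(hex(read_reg(reg)))
    os.remove(reg)
```

The point of the sketch is only that, under this model, software talks to FPGA resources with the same open/read/write calls it would use for any other file.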
Design Environment • Simulink • Schematic-like • Integration w/ Matlab for analysis • Good for dataflow designs (i.e., DSP) • Designed by BWRC, now maintained by international collaboration • Tutorials aplenty! See wiki
Design Environment • Xilinx System Generator for Simulink • Custom DSP and system blocksets • One-click design compilation
Testing w/ ROACH + KATCP • Digital frontend receiver (Rashmi)
[Block diagram; labels: Matlab, 1GbE, PowerPC, FPGA, BRAM, QDR SRAM, LVDS IO, ASIC Test Board, ASIC]
Testing Requirements • High TX clock rate (400MHz target) • Beyond practical limits of IBOB’s V2P • Long test vectors (~4Mb) • Asynchronous clock domains for TX and RX
Asynchronous Clock Domains • Easily supported by FPGA hardware • XSG has very limited capability for expressing multiple clocks; CE toggling • Further restricted by bee_xps tool automation; assumes single clock design (though many different clocks available)
Asynchronous Clock Domains • Manually merged separate designs for test vector and readback datapaths • Fixed 60MHz RX • 255-315MHz TX
Results • Test up to 315MHz w/ loadable vectors in QDR; up to 340MHz with pre-compiled vectors in ROMs • 55dB SNR @ 20MHz bandwidth
Limitations • DDR output FF critical path @ 340MHz (clock out) • QDR SRAM bus interface critical path @ 315MHz • Output clock jitter? • LVDS receivers usually only 400-500Mbps • OK for data, not good for faster clocks • Get LVDS I/O cells?
Future Design Recommendations • Send source-synchronous clock with returned data • Send synchronization information with returned data • “Vector warning” or frame start • Data valid
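To show why the recommended sideband signals help on the receive side, here is a minimal sketch of how readback software could cut a captured stream into frames once each sample carries a data-valid flag and a frame-start marker. The `(data, valid, frame_start)` tuple layout and the function name `extract_frames` are hypothetical, chosen only for this example.

```python
from typing import Iterable, List, Tuple

# One captured sample: (data word, data-valid flag, frame-start flag).
# This layout is a hypothetical stand-in for whatever the ASIC returns.
Sample = Tuple[int, bool, bool]


def extract_frames(samples: Iterable[Sample]) -> List[List[int]]:
    """Group valid data words into frames delimited by frame-start markers.

    Samples with valid == False are skipped. Valid words seen before the
    first frame-start marker are dropped, since their alignment is unknown.
    """
    frames: List[List[int]] = []
    current = None
    for data, valid, frame_start in samples:
        if not valid:
            continue
        if frame_start:
            current = []
            frames.append(current)
        if current is not None:
            current.append(data)
    return frames


if __name__ == "__main__":
    stream = [
        (9, True, False),   # before first frame start: dropped
        (1, True, True),
        (2, True, False),
        (0, False, False),  # not valid: skipped
        (3, True, False),
        (4, True, True),
        (5, True, False),
    ]
    print(extract_frames(stream))
```

Without these flags, the receiver has to infer alignment from the data itself or from known latencies, which is fragile when TX and RX run in asynchronous clock domains.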
KATCP • Comm. protocol interfacing to BORPH • Can be implemented over TCP telnet connection • Libraries and clients for C, Python
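KATCP messages are lines of text, which is why a plain telnet session is enough to talk to a server. The sketch below formats and parses such lines using the type prefixes (`?` request, `!` reply, `#` inform) and backslash escapes from the KATCP specification as best understood here. It is a simplified illustration and not a replacement for the official libraries: message IDs are not handled, and the `read` request and `sys_scratchpad` register in the usage lines are illustrative names only.

```python
import re

MESSAGE_TYPES = ("?", "!", "#")  # request, reply, inform

# Characters that must be escaped inside an argument.
ESCAPES = {
    "\\": "\\\\",
    " ": "\\_",
    "\0": "\\0",
    "\n": "\\n",
    "\r": "\\r",
    "\x1b": "\\e",
    "\t": "\\t",
}
UNESCAPES = {esc[1]: raw for raw, esc in ESCAPES.items()}


def escape_arg(arg):
    """Escape one argument; the empty string is written as backslash-@."""
    if arg == "":
        return "\\@"
    return "".join(ESCAPES.get(ch, ch) for ch in arg)


def unescape_arg(arg):
    """Reverse escape_arg for one argument."""
    if arg == "\\@":
        return ""
    out = []
    chars = iter(arg)
    for ch in chars:
        if ch == "\\":
            code = next(chars, None)
            if code not in UNESCAPES:
                raise ValueError("bad escape in argument: %r" % arg)
            out.append(UNESCAPES[code])
        else:
            out.append(ch)
    return "".join(out)


def format_message(mtype, name, *args):
    """Build one newline-terminated message line."""
    if mtype not in MESSAGE_TYPES:
        raise ValueError("unknown message type: %r" % mtype)
    fields = [name] + [escape_arg(a) for a in args]
    return mtype + " ".join(fields) + "\n"


def parse_message(line):
    """Split one message line into (type, name, [arguments])."""
    line = line.rstrip("\r\n")
    if not line or line[0] not in MESSAGE_TYPES:
        raise ValueError("not a KATCP message: %r" % line)
    fields = re.split(r"[ \t]+", line[1:].strip(" \t"))
    return line[0], fields[0], [unescape_arg(a) for a in fields[1:]]


if __name__ == "__main__":
    print(repr(format_message("?", "read", "sys_scratchpad", "0", "4")))
    print(parse_message("!read ok \\0\\0\\0\\_\n"))
```

Because binary payloads must be escaped byte by byte into text, a format like this carries noticeable overhead for bulk transfers.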
KATCP Matlab Client • For our purposes, replaces read_xps, write_xps • Can program FPGA directly from Matlab; no more JTAG cable! • Provides byte-level read/write granularity • Increases speed from ~KB/s to ~MB/s • Room for improvement; currently high protocol overhead
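To put the ~KB/s to ~MB/s improvement in perspective, the back-of-the-envelope sketch below estimates how long one ~4Mb test vector takes to load at each order of magnitude. The nominal rates of exactly 1 KB/s and 1 MB/s, and the reading of "Mb" as 10^6 bits, are assumptions made for the arithmetic; the slides give only approximate figures.

```python
def transfer_seconds(size_bits, rate_bytes_per_s):
    """Time to move size_bits of data at a sustained byte rate."""
    return (size_bits / 8) / rate_bytes_per_s


VECTOR_BITS = 4e6  # ~4Mb test vector, taking Mb as 10^6 bits (assumption)

if __name__ == "__main__":
    for label, rate in [("~KB/s", 1e3), ("~MB/s", 1e6)]:
        print("%s: %g s" % (label, transfer_seconds(VECTOR_BITS, rate)))
```

Under these assumptions a single vector load drops from roughly 500 seconds to about half a second, which is the difference between batch-style and interactive testing.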
Towards Streaming • Transition to TCP/IP-based protocols facilitates streaming • Osort test vectors: 10Mb of data at ~Mb/s (IBOB) • Single-vector load and read via SRAM • LWIP UDP read/write_xps • Ethernet streaming w/o going through shared memory
New Windows Server(s) • dsp experiencing severe stability problems • eecls-{1, 2, 3, 4}.ee.ucla.edu • Windows Server 2008 (32-bit) • Matlab R2007b (+ XSG 10.1) • Matlab R2009b (+ XSG 11.5, Synphony 2009.12) • Xilinx Suite 10.1 • Xilinx Suite 11.5 • ModelSim 6.6a • Synplify 2010.03 • sherwin is now a print server
References [1] D. Marković et al., “ASIC Design and Verification in an FPGA Environment,” IEEE CICC, 2007. [2] D. Marković, UCLA EEM216A, Fall 2008, Lecture 20. [3] C. Chang et al., “BEE2: A High-End Reconfigurable Computing System,” IEEE Design & Test of Computers, 2005. [4] H. So and R. Brodersen, “A Unified Hardware/Software Runtime Environment for FPGA-Based Reconfigurable Computers using BORPH,” ACM TECS, 2008.