i3 on Internet in a Box
Zhangxi Tan, Wei Xu, David Patterson
UC Berkeley
Outline
• Project Overview
• System Architecture
• Sample Software and Demo
• Future Work
Project Overview
• Testing and debugging large-scale distributed systems is difficult
• Problems with existing approaches
  • Scalability: O(100) nodes
  • Reproducibility: PlanetLab
  • Observability: no visibility into what is going on inside
  • Cost, space, and power
• IIAB: building a distributed-system testbed with over 1,000 nodes using a multi-modular FPGA system
• Version 0: basic hardware building block, operating system, and TCP/IP network support
Methodology
Version 0 platform: Xilinx XUP boards
• Price: $299
• Xilinx Virtex-II Pro FPGA (VP30)
• 256 MB DDR memory
• 10/100 Mbps Ethernet
Target platform: Research Accelerator for MultiProcessing (RAMP)
• 5 Xilinx Virtex-II Pro FPGAs (VP70)
• DDR II memory per FPGA
• 10 Gbps Ethernet
Version 0 Status
• Four 32-bit RISC processors (MicroBlaze) on one chip
  • Running at 100 MHz with L1 cache (16 KB instruction, 16 KB data)
  • Hardware divider/multiplier, barrel shifter, etc.
  • 64 MB DDR memory (100 MHz) for each processor (separate address spaces)
  • 50 MIPS (measured from the Linux kernel)
  • Running the uClinux 2.4.32 kernel
• Inter-processor connection
  • Point-to-point (P2P) 32-bit high-speed FIFO links
  • 3.2 Gbps throughput, 1-cycle access latency
  • Emulates an Ethernet device through a Linux kernel driver
• TCP/IP protocol stack support
  • Standard UNIX socket programming interface
  • Software implementation (polling/interrupt)
  • Software router through the Linux kernel
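Note: the 3.2 Gbps figure is consistent with one 32-bit word moving across the FIFO link per cycle of a 100 MHz clock. The slide does not state the link clock, so the sketch below assumes it matches the 100 MHz processor clock; the function names and the 1500-byte frame size are illustrative only.

```python
def link_throughput_bps(width_bits, clock_hz):
    """Peak throughput of a link that moves one word per clock cycle."""
    return width_bits * clock_hz


def cycles_per_frame(frame_bytes, width_bits):
    """Clock cycles needed to push one frame through the link (ceiling division)."""
    frame_bits = frame_bytes * 8
    return -(-frame_bits // width_bits)


if __name__ == "__main__":
    # Assumption: the FIFO link is clocked at the 100 MHz processor clock.
    bps = link_throughput_bps(width_bits=32, clock_hz=100_000_000)
    print(f"peak link throughput: {bps / 1e9:.1f} Gbps")
    # Assumption: a 1500-byte payload, the usual Ethernet MTU.
    print("cycles per 1500-byte frame:", cycles_per_frame(1500, 32))
```

This is only the raw hardware bound. The TCP/IP stack runs in software on a 50 MIPS core, so end-to-end socket throughput should be expected to sit well below it.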
Chip Layout
Chip utilization
• 97% BRAM utilization
• 61% LUT utilization
• Over 9 million equivalent gates
No floorplanning!
• Xilinx tools are difficult to use!
• Interconnect creates hot spots
• Excessive BRAM usage affects the layout
An XUP Cluster
• 3 XUP boards with 12 nodes
• Connected by a 100 Mbps Ethernet switch
Network Performance
• Measured with the TTCP program (polling mode)
• Software networking throughput is far below the 3.2 Gbps physical bound
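Note: the measurement idea behind TTCP is to push a fixed number of bytes through a TCP connection and divide by the elapsed time. The sketch below is not TTCP itself. It is a minimal Python stand-in that runs over the loopback interface using the standard socket interface; the function names, buffer size, and transfer size are all assumptions chosen for illustration.

```python
import socket
import threading
import time


def _sink(server_sock, result):
    """Accept one connection and count bytes until the sender closes it."""
    conn, _ = server_sock.accept()
    total = 0
    with conn:
        while True:
            chunk = conn.recv(65536)
            if not chunk:
                break
            total += len(chunk)
    result["bytes"] = total


def measure_throughput(total_bytes=4 * 1024 * 1024, buf_size=8192, host="127.0.0.1"):
    """Send total_bytes over TCP; return (bytes_received, megabits_per_second)."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind((host, 0))  # port 0 lets the OS pick a free port
    server.listen(1)
    port = server.getsockname()[1]

    result = {}
    receiver = threading.Thread(target=_sink, args=(server, result))
    receiver.start()

    payload = b"\x00" * buf_size
    sent = 0
    start = time.perf_counter()
    with socket.create_connection((host, port)) as client:
        while sent < total_bytes:
            n = min(buf_size, total_bytes - sent)
            client.sendall(payload[:n])
            sent += n
    receiver.join()  # wait until the receiver has drained everything
    elapsed = time.perf_counter() - start
    server.close()

    mbps = result["bytes"] * 8 / elapsed / 1e6
    return result["bytes"], mbps


if __name__ == "__main__":
    received, mbps = measure_throughput()
    print(f"received {received} bytes at {mbps:.1f} Mbit/s")
```

On a development machine the loopback number says nothing about the FPGA nodes; the point is only the shape of the measurement.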
i3 on IIAB
Internet Indirection Infrastructure (i3)
• A new Internet architecture from Berkeley
• Multicast, unicast, anycast, etc.
• C implementation based on the Chord DHT
[Figure: i3 rendezvous between Sender and Receiver (R); labels: trigger (id, R), packet (id, data)]
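Note: the rendezvous idea in i3 is that a receiver R inserts a trigger (id, R), a sender addresses a packet (id, data) to the identifier rather than to R, and the infrastructure forwards the data to every receiver whose trigger matches. The sketch below is a deliberately simplified, single-node illustration in Python. It uses exact identifier matching and an in-memory table, whereas the real system locates the responsible server through Chord and has a richer matching rule. All class and method names here are hypothetical.

```python
from collections import defaultdict


class Receiver:
    """Hypothetical end host that records whatever is delivered to it."""

    def __init__(self, name):
        self.name = name
        self.inbox = []

    def deliver(self, ident, data):
        self.inbox.append((ident, data))


class I3Server:
    """Toy single-node stand-in for the i3 infrastructure (no Chord routing)."""

    def __init__(self):
        self._triggers = defaultdict(list)  # id -> list of receivers

    def insert_trigger(self, ident, receiver):
        if receiver not in self._triggers[ident]:
            self._triggers[ident].append(receiver)

    def remove_trigger(self, ident, receiver):
        if receiver in self._triggers.get(ident, []):
            self._triggers[ident].remove(receiver)

    def send(self, ident, data):
        """Forward data to every receiver with a trigger on ident; return the count."""
        receivers = list(self._triggers.get(ident, []))
        for r in receivers:
            r.deliver(ident, data)
        return len(receivers)


if __name__ == "__main__":
    i3 = I3Server()
    r1, r2 = Receiver("R1"), Receiver("R2")
    i3.insert_trigger("id-42", r1)   # unicast: one trigger on the id
    i3.send("id-42", b"hello")
    i3.insert_trigger("id-42", r2)   # multicast: a second trigger on the same id
    i3.send("id-42", b"world")
    print(r1.inbox, r2.inbox)
```

The sender never learns who the receivers are, which is what lets the same primitive cover unicast and multicast.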
Future Work (1/2)
• LEON3 (SPARC V8) as the next supported processor
  • MMU support
  • Double-precision floating point
  • Reconfigurable parameters (cache, MMU, etc.)
  • Cache coherence (snooping)
  • 0.85 MIPS/MHz (5000 LUTs, 90 MHz on Virtex-II)
• Huge benefit for software support
  • Full Linux support (Linux 2.6 kernel)
  • Java support
• Fitting multiple LEON3 cores on a chip will be more challenging
  • Floorplanning and physical synthesis (to reduce PAR time and improve QoR)
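Note: the quoted figures allow a rough per-core comparison between the current MicroBlaze nodes (50 MIPS measured at 100 MHz) and a LEON3 core (0.85 MIPS/MHz at 90 MHz). The sketch below only does that arithmetic. It assumes the two MIPS figures are comparable, which the slides do not establish, since one is measured from the Linux kernel and the other is a quoted rating.

```python
def mips(mips_per_mhz, clock_mhz):
    """Estimated MIPS from an efficiency rating and a clock frequency."""
    return mips_per_mhz * clock_mhz


def mips_per_mhz(measured_mips, clock_mhz):
    """Efficiency implied by a measured MIPS figure at a given clock."""
    return measured_mips / clock_mhz


if __name__ == "__main__":
    microblaze_eff = mips_per_mhz(50, 100)   # from the Version 0 status figures
    leon3_mips = mips(0.85, 90)              # from the LEON3 figures quoted above
    print(f"MicroBlaze efficiency: {microblaze_eff:.2f} MIPS/MHz")
    print(f"LEON3 estimate: {leon3_mips:.1f} MIPS")
    print(f"estimated per-core speedup: {leon3_mips / 50:.2f}x")
```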
Future Work (2/2)
• Time dilation
  • Make a 50 MIPS processor look like a 1000 MIPS processor to software
  • Network/link emulation: delay, bandwidth, jitter, etc.
  • Disk emulation
  • An abstraction layer (HW/SW approach) to software
• Better internal architecture
  • Processor/memory subsystem
  • High-performance internal network (1 Gigabit Ethernet)
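Note: making a 50 MIPS processor look like a 1000 MIPS processor implies a time dilation factor of 1000 / 50 = 20, meaning the software's clock advances 20 times more slowly than wall-clock time. For the emulated network to stay consistent with that slowed clock, link parameters have to be scaled by the same factor. The sketch below works through that scaling under the usual time-dilation reasoning; the function names and the example link figures are assumptions, not part of the planned design.

```python
def dilation_factor(target_mips, real_mips):
    """How many real seconds pass per second perceived by the software."""
    return target_mips / real_mips


def physical_bandwidth_bps(perceived_bps, tdf):
    """Real link rate needed so the dilated software perceives perceived_bps."""
    return perceived_bps / tdf


def physical_delay_s(perceived_delay_s, tdf):
    """Real delay to inject so the dilated software perceives perceived_delay_s."""
    return perceived_delay_s * tdf


if __name__ == "__main__":
    tdf = dilation_factor(target_mips=1000, real_mips=50)
    print("time dilation factor:", tdf)
    # Assumed example: emulate a 1 Gbps link with 1 ms one-way delay.
    print("physical bandwidth (bps):", physical_bandwidth_bps(1e9, tdf))
    print("physical delay (s):", physical_delay_s(0.001, tdf))
```

Under these assumptions, a link that should appear as 1 Gbps only needs to sustain 50 Mbps in real time, which is one reason time dilation pairs naturally with a slow software network stack.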