ATOLL - Performance And Cost Optimization of a SAN Interconnect
Dipl.-Inf. Patrick R. Schulz, schulz@uni-mannheim.de
Computer Architecture Group, University of Mannheim, Germany
Nov. 4th PDCS2002
Presentation Outline
• Design Considerations and Goals
• Basic Architecture of ATOLL
• Optimization for Performance and Cost
• Special Features of ATOLL
• Performance Results
• Future Developments and Conclusion
ATOLL SAN
• Design considerations for ATOLL:
  • Design for highest performance and lowest cost
  • Minimization of communication latency
  • Optimization of bandwidth for small and large messages
  • Realization of basic communication functions in hardware
  • Simplification of program access to the NIC
  • Avoiding software overhead
ATOLL NIC
• Design goals of ATOLL:
  • Integration of all network components into a single chip
    • the external switch moves onto the NIC
  • Provides 4 replicated, independent NI devices on the host side to serve 2/4-way SMP nodes without OS intervention
  • 4 bidirectional link ports to the SAN
  • User-level communication
  • Hardware message handler
  • Many support functions for parallel processing (atomic message startup, thread synchronization, ...)
ATOLL Basic Architecture
• ATOLL chip: 4.5 million transistors, 0.18 µm CMOS process, 5.7 x 5.7 mm die
• Fastest and second-biggest chip design of a European university
ATOLL HW Architecture
• PCI Interface
  • 64-bit, 66/100/133 MHz, PCI-X 1.0 compliant
  • also runs as a 32-bit/33 MHz PCI interface (3.3 V)
  • master (DMA) and slave (PIO) functionality
  • capable of combining several transactions into one burst where applicable
ATOLL HW Architecture
• Host Port (Network Interface)
  • four fully featured devices
  • running at 250 MHz
  • PIO mode for efficient send/receive of small messages, utilizing write-combining and read-prefetching
  • DMA engines for autonomous transfer of large messages
  • small NI context of two cache lines, fully loadable (virtual interfaces)
ATOLL HW Architecture
• 4 x 4 bidirectional Crossbar
  • fully integrated network switch on-chip
  • running at 250 MHz
  • 2 GBytes/s bisection bandwidth
  • fully pipelined, wormhole routing
  • fall-through latency of 6 cycles (24 ns)
  • reverse flow control through the crossbar
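The crossbar figures above follow directly from the 250 MHz clock. The sketch below reproduces the arithmetic; it assumes that the 2 GBytes/s figure counts four byte-wide ports at 250 MBytes/s in each direction, which matches the numbers on the slides but is not stated there. All function and constant names are illustrative.

```python
# Arithmetic check of the crossbar figures (names are illustrative).
CLOCK_HZ = 250_000_000   # crossbar clock from the slide
PORTS = 4                # 4 x 4 crossbar
BYTES_PER_CYCLE = 1      # assumption: byte-wide data path per direction


def cycles_to_ns(cycles, clock_hz=CLOCK_HZ):
    """Convert a cycle count into nanoseconds at the given clock."""
    return cycles * 1e9 / clock_hz


def aggregate_bandwidth(ports=PORTS, clock_hz=CLOCK_HZ,
                        bytes_per_cycle=BYTES_PER_CYCLE):
    """Bytes per second over all ports, counting both directions."""
    per_direction = clock_hz * bytes_per_cycle
    return ports * 2 * per_direction


fall_through_ns = cycles_to_ns(6)
total_bytes_per_s = aggregate_bandwidth()
print(f"fall-through: {fall_through_ns:.0f} ns")
print(f"aggregate bandwidth: {total_bytes_per_s / 1e9:.1f} GBytes/s")
```

Under these assumptions the 6-cycle fall-through corresponds to 24 ns and the four ports add up to 2 GBytes/s.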
ATOLL HW Architecture
• Link Interface
  • bidirectional byte-wide LVDS links (2 x 250 MBytes/s)
  • running at 250 MHz
  • reverse flow control characters are exchanged to prevent buffer overflow
  • CRC protection and automatic retransmission for 64-byte link packets
  • guaranteed message delivery after injection into the network
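The idea behind CRC protection with automatic retransmission can be modelled in a few lines. This is a software sketch only: ATOLL does this in hardware at the link level, and the slides do not give the CRC polynomial or the retry policy. The sketch uses `zlib.crc32` as a stand-in, and the channel model, `max_tries`, and all function names are hypothetical.

```python
import zlib

PACKET_BYTES = 64  # link packet size from the slide


def frame(payload: bytes) -> bytes:
    """Append a 4-byte CRC (CRC-32 as a stand-in) to a link packet."""
    if len(payload) > PACKET_BYTES:
        raise ValueError("payload exceeds one link packet")
    return payload + zlib.crc32(payload).to_bytes(4, "big")


def check(framed: bytes) -> bool:
    """Return True if the trailing CRC matches the payload."""
    payload, crc = framed[:-4], framed[-4:]
    return zlib.crc32(payload).to_bytes(4, "big") == crc


def send_with_retransmit(payload: bytes, channel, max_tries: int = 8):
    """Resend the same packet until the receiver's CRC check passes."""
    framed = frame(payload)
    for attempt in range(1, max_tries + 1):
        received = channel(framed)
        if check(received):
            return received[:-4], attempt
    raise RuntimeError("link failed after retries")


def make_flaky_channel(bad_transmissions: int):
    """Hypothetical channel that corrupts the first N transmissions."""
    state = {"left": bad_transmissions}

    def channel(framed: bytes) -> bytes:
        if state["left"] > 0:
            state["left"] -= 1
            corrupted = bytearray(framed)
            corrupted[0] ^= 0x01  # flip one bit in the payload
            return bytes(corrupted)
        return framed

    return channel


payload = bytes(range(PACKET_BYTES))
data, attempts = send_with_retransmit(payload, make_flaky_channel(2))
print(attempts, data == payload)
```

Because the retry loop sits below the message layer, software above it sees only intact packets, which is the property the slide calls guaranteed delivery after injection.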
ATOLL 2D Torus Topology Example
[Figure: 2D torus; each node is equipped with an ATOLL NIC]
• All topologies fitting the 4 interconnects are supported ...
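A 2D torus uses exactly four links per node, which is why it maps naturally onto the four link ports of the NIC. The sketch below shows one way to compute a node's neighbours and the minimal hop count between two nodes. The coordinate scheme and the port names (east, west, north, south) are hypothetical and do not describe ATOLL's actual port numbering or routing.

```python
def torus_neighbors(x, y, width, height):
    """Neighbours of node (x, y) in a width x height 2D torus.

    One neighbour per link port; the port names are illustrative.
    """
    return {
        "east": ((x + 1) % width, y),
        "west": ((x - 1) % width, y),
        "north": (x, (y - 1) % height),
        "south": (x, (y + 1) % height),
    }


def torus_hops(src, dst, width, height):
    """Minimal number of hops between two nodes, using wrap-around links."""
    dx = abs(src[0] - dst[0])
    dy = abs(src[1] - dst[1])
    return min(dx, width - dx) + min(dy, height - dy)


print(torus_neighbors(0, 0, 4, 4))
print(torus_hops((0, 0), (3, 3), 4, 4))
```

In a 4 x 4 torus the corner nodes (0, 0) and (3, 3) are only two hops apart, because each dimension wraps around.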
ATOLL Tree Topology Example
[Figure: tree topology built from ATOLL NICs]
Optimization for Performance and Cost
Regarding cost:
• Wormhole philosophy eliminates memory on the NIC
• Link cables and connectors (HD 68-pin), PCB, and chip package (custom BGA) are highly optimized for routability => ONLY a 2+2 layer PCB and a single-layer package
• LVDS signalling => high speed, low power, low EMI
• I/O cells (LVDS, PCI-X) designed by partner university
• Free standard cell library (VST, 0.18 µm)
• Low-cost backend service, wire-length driven, traditional design flow
Optimization for Performance and Cost
Regarding performance:
• Hardware retransmission => low software overhead
• PCI-X => high-performance node interface
• User-level communication (multiple devices) => low latency
• High clock frequency (250 MHz) => high bandwidth (2 GB/s)
• Low latency (3 clock cycles for crossbar arbitration)
• NO kernel traps or IRQs when accessing the device, and NO polling on the PCI bus
  • important status registers are mirrored in main memory using cache coherence
Special Hardware Features
Regarding performance and cost:
• Programmable clock period (14 MHz steps) => speed grades
• Cables with controlled impedance and low skew => transmission-line characteristics => wave pipelining
• Double-pumped data on the cables => only one frequency, no phase shift
ATOLL Bandwidth
[Plot: link utilization [%] versus message size [bytes]; 100% link utilization = 250 MByte/s; annotated levels: ~225 MByte/s and >100 MByte/s]
ATOLL Latency
• Test system: P3-1000 (Serverworks), PCI 64-bit/66 MHz, ATOLL @ 245 MHz
• ONLY 27 clock cycles (~100 ns) latency per hop
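The per-hop figure can be checked from the cycle count and the test clock: 27 cycles at 245 MHz come to about 110 ns, which the slide rounds to ~100 ns. The sketch below also scales this to a multi-hop path. It covers network hops only and assumes that hop latencies simply add up; host-side send and receive overhead is not modelled. The function names are illustrative.

```python
def hop_latency_ns(cycles=27, clock_mhz=245.0):
    """Latency of one hop in nanoseconds (cycle count / clock frequency)."""
    return cycles * 1000.0 / clock_mhz


def path_latency_ns(hops, cycles=27, clock_mhz=245.0):
    """Network latency over several hops, assuming hop latencies add up."""
    return hops * hop_latency_ns(cycles, clock_mhz)


print(f"one hop:   {hop_latency_ns():.1f} ns")
print(f"four hops: {path_latency_ns(4):.1f} ns")
```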
Cost Comparison
[Chart: performance and cost of Fast-Ethernet, ATOLL, and Myrinet 2000]
• Performance:
  • ATOLL: 16 Gb/s (2 GB/s)
  • Myrinet 2000: 4 Gb/s (0.5 GB/s)
  • Fast-Ethernet: 100 Mb/s (12 MB/s)
• Cost:
  • Myrinet 2000: 1x NIC + 1x 4-port switch ~ $2700
  • ATOLL: 1x NIC ~ $900
  • Fast-Ethernet: 4x NIC + 1x 4-port switch ~ $540
• ATOLL versus Myrinet 2000: 4x the performance at 0.3x the cost
• ATOLL: cost-effectiveness of 4 x (1/0.3) ≈ 12x that of Myrinet
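The 12x claim can be reproduced as a performance ratio divided by a cost ratio. The sketch below assumes that the chart's cost labels read as roughly $900 for ATOLL and $2700 for Myrinet 2000, which is an interpretation of the chart that fits the 0.3x annotation. With the exact ratio 900/2700 = 1/3 the result is 12, while the rounded 0.3 gives about 13.3. The function name is illustrative.

```python
from fractions import Fraction


def cost_effectiveness_gain(perf_ratio, cost_ratio):
    """Performance-per-dollar advantage: performance ratio / cost ratio."""
    return Fraction(perf_ratio) / Fraction(cost_ratio)


# Performance in MB/s: 2 GB/s versus 0.5 GB/s.
perf_ratio = Fraction(2000, 500)          # = 4

# Assumed reading of the chart's cost labels, in dollars.
cost_ratio_exact = Fraction(900, 2700)    # = 1/3
cost_ratio_rounded = Fraction(3, 10)      # the slide's 0.3x

gain_exact = cost_effectiveness_gain(perf_ratio, cost_ratio_exact)
gain_rounded = cost_effectiveness_gain(perf_ratio, cost_ratio_rounded)
print(gain_exact, float(gain_rounded))
```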
ATOLL Team
• University of Mannheim, Chair of Computer Architecture (LS Rechnerarchitektur): Ulrich Brüning, Lambert Schälicke, Patrick R. Schulz, Holger Fröning, Lars Rzymianowicz
  • Basic architecture, HW implementation, architectural enhancements
Thanks to:
• University of Kaiserslautern, Chair of Circuit Design (LS Schaltungstechnik): Prof. Tielert, Mark Wegener
  • I/O cells
• IMEC Belgium: Carl Das
  • Layout, backend service
• SUN Microsystems
• Synopsys
Future Development
• Future of ATOLL hardware development
  • Optical link interconnect
    • based on a high-performance SERDES chip (2 x 250 MB/s to 2.5 Gb/s)
    • short-distance (up to 100 m) serial optical interconnect
    • plug-compatible with the electrical interface
    • very cost-effective implementation
  • ATOLL 2
    • 500 MHz clock
    • higher-dimensional crossbar for multidimensional IN structures
    • multithreaded, cached host interface
    • memory management support
    • command extension for direct memory operations (put, get, …) => MPI-2
Conclusion
• A radically new design approach leads to a single-chip solution that integrates a whole network on a chip.
• A low-budget design, implemented from the architecture down to the chip.
• It is now reality (we are lucky: it was first-time right).
ATOLL: A New Contender in the System Area Network Market
Thank you for your attention! Questions?
Further information: www.atoll-net.de, schulz@uni-mannheim.de
[Figure: Chip Photo]
[Figure: Interconnect]