Communications in ISTORE Dan Hettena
Communication Goals • Goals: • Fault tolerance through redundancy • Tolerate any single hardware failure • High bandwidth with commodity hardware • Use redundancy for extra bandwidth • Lower latency alternative • For latency-sensitive apps, such as Titanium • Provide Active Messages interface
Outline • ISTORE Network Hardware • IP Communication • Active Messages Communication
ISTORE Network Hardware • Components • 64 ISTORE Bricks, each with: • Pentium II 266MHz • IBM 10kRPM SCSI disk • Can sometimes be read faster than 30MB/s • 4 100Mbps ethernet interfaces • Intel EtherExpress Pro/100 (82557/8) • Total bandwidth = 4*100Mbps = 400Mbps (about 40 to 50MB/s)
ISTORE Networking Hardware • Components (continued) • 14 “Little” Routing Switches • Packet Engines/Alcatel PowerRail 1000 • 20 100Mbps interfaces (copper) • 2 1Gbps interfaces (fiber) • 2 “Big” Routing Switches • Packet Engines/Alcatel PowerRail 5200 • More-than-enough 1Gbps interfaces (fiber)
ISTORE Networking Hardware • Routes between bricks
ISTORE Networking Hardware • Routes between bricks (continued) • Short routes • Only if connected to the same “little” switches • No need to go through a “big” switch • 2 hops
ISTORE Networking Hardware • Routes between bricks (continued) • Long routes • Must choose a big switch • 4 hops
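The short/long route distinction above can be sketched as a small function. This is only an illustration: the slides do not say which little switches each brick is wired to, so the switch sets below are hypothetical, and the function assumes a short route exists whenever two bricks share at least one little switch.

```python
def route_hops(src_switches, dst_switches):
    """Hop count of the shortest route between two bricks.

    src_switches / dst_switches: IDs of the "little" switches each brick
    is plugged into (hypothetical wiring, not taken from the slides).
    """
    shared = set(src_switches) & set(dst_switches)
    if shared:
        # brick -> little switch -> brick
        return 2
    # brick -> little switch -> big switch -> little switch -> brick
    return 4


# Hypothetical wiring for three bricks.
brick_a = {0, 1, 2, 3}
brick_b = {3, 4, 5, 6}
brick_c = {7, 8, 9, 10}

print(route_hops(brick_a, brick_b))
print(route_hops(brick_a, brick_c))
```

A fuller version would also return which big switch a long route uses, since that choice is where the redundancy comes from.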
ISTORE Networking Hardware • Performance observations • Switches are store-and-forward • Ethernet packets are all at least 60 bytes • 0-padded by sender if necessary • Time per 100Mbps copper hop is 15µs + (10ns/bit)(size - 60 bytes)
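The per-hop timing formula above can be written out as a short calculation. This is a sketch under one stated assumption: the fixed per-hop cost is taken as 15 microseconds, since a 60-byte frame occupies a 100Mbps link for only 4.8 microseconds. The function names are illustrative.

```python
MIN_FRAME_BYTES = 60   # shorter frames are zero-padded by the sender
BASE_HOP_US = 15.0     # assumed fixed cost per hop, in microseconds
NS_PER_BIT = 10.0      # one bit time at 100 Mbps


def hop_time_us(size_bytes):
    """Estimated time for one store-and-forward 100 Mbps copper hop."""
    padded = max(size_bytes, MIN_FRAME_BYTES)
    extra_bits = (padded - MIN_FRAME_BYTES) * 8
    return BASE_HOP_US + extra_bits * NS_PER_BIT / 1000.0


def copper_route_time_us(size_bytes, hops):
    """Estimated one-way time over a route made only of copper hops."""
    return hops * hop_time_us(size_bytes)


for size in (40, 60, 1514):
    print(size, hop_time_us(size), copper_route_time_us(size, hops=2))
```

The estimate only covers 100Mbps copper hops, so it applies directly to short routes; long routes also cross 1Gbps fiber links, which the formula does not describe.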
ISTORE Networking Hardware • Future work • Plug in the wires
IP Communication • Goals • Stripe packets across all 4 interfaces • 4x increase in available TCP/UDP bandwidth • Automatically handle link and router faults • Transparent to TCP/UDP applications • Transparent backward-compatibility with hosts that do not support striping
IP Communication • Nested outline • Previous work • Kernel driver overview • Providing fault tolerance • Providing backward compatibility • Making sure it scales
IP Communication • Previous work • Linux bonding driver (drivers/net/bonding.c) • Generic driver to “bond” links • Ignores faults • Does not prevent packet reordering • Only supports one remote host • “An Architecture for Packet-Striping Protocols” (Hari, Varghese, Parulkar)
IP Communication • Kernel striping driver • Cooperates with ethernet driver • Use special MAC addresses • 49:53:54:4F:<00NNNNNN>:<000000II> • Easy to determine if host supports striping • Store striping information in headers • Link status • Reordering data
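The special MAC address layout above can be illustrated with a small encoder and decoder. This is a sketch, not the driver's code: it assumes the six N bits carry a node ID (0 to 63) and the two I bits carry an interface index (0 to 3), which fits 64 bricks with 4 interfaces each but is not stated in the slides. All function names are hypothetical.

```python
STRIPING_PREFIX = bytes([0x49, 0x53, 0x54, 0x4F])  # ASCII "ISTO"


def make_striping_mac(node_id, iface):
    """Build a 6-byte striping MAC from an assumed node ID and interface index."""
    if not 0 <= node_id < 64:
        raise ValueError("node_id must fit in 6 bits")
    if not 0 <= iface < 4:
        raise ValueError("iface must fit in 2 bits")
    return STRIPING_PREFIX + bytes([node_id, iface])


def parse_striping_mac(mac):
    """Return (node_id, iface) for a striping MAC, or None for any other address."""
    if len(mac) != 6 or mac[:4] != STRIPING_PREFIX:
        return None
    node_id, iface = mac[4], mac[5]
    if node_id >= 64 or iface >= 4:
        return None  # upper bits must be zero in this layout
    return node_id, iface


def format_mac(mac):
    """Render a MAC address in the usual colon-separated hex form."""
    return ":".join(f"{b:02X}" for b in mac)


mac = make_striping_mac(node_id=5, iface=2)
print(format_mac(mac), parse_striping_mac(mac))
```

A prefix check like the one in `parse_striping_mac` is what makes it cheap to tell whether a remote host supports striping: any address without the prefix is treated as an ordinary host.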
IP Communication • Fault tolerance • User process periodically tests links • Notifies striping driver • Striping driver will not use broken links • Need to detect performance faults, too • Backward compatibility • Same IP address for both modes
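The fault-tolerance scheme above (a monitor reports link status, and the striping code avoids broken links) can be sketched in user-space Python. The real driver lives in the kernel; the class and method names here are hypothetical, and plain round robin is assumed as the striping policy.

```python
class LinkStriper:
    """Round-robin link selector that skips links reported as down."""

    def __init__(self, n_links=4):
        self.up = [True] * n_links
        self.next_link = 0

    def set_link_status(self, link, is_up):
        """Called by the link monitor after each periodic test."""
        self.up[link] = is_up

    def pick_link(self):
        """Return the next working link, or raise if every link is down."""
        n = len(self.up)
        for offset in range(n):
            link = (self.next_link + offset) % n
            if self.up[link]:
                self.next_link = (link + 1) % n
                return link
        raise RuntimeError("no working links")


striper = LinkStriper()
print([striper.pick_link() for _ in range(5)])
striper.set_link_status(2, False)   # monitor reports link 2 as broken
print([striper.pick_link() for _ in range(3)])
```

Detecting performance faults would need more than an up/down flag, for example a per-link throughput or loss estimate, which this sketch does not attempt.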
IP Communication • Scaling • Scales automatically, unless packets arrive out of order • Reordering is possible with multiple routes (e.g. striping) • TCP relies on a consistent round-trip time • Packet reordering confuses TCP • Result is unnecessary retransmissions • This will be an issue in ISTORE
IP Communication • Scaling (continued) • Need to reorder packets before handing them up to IP • Solution (almost implemented) • Clever use of queuing algorithms on sender and receiver • Makes it unlikely that the receiver will dequeue packets out of order • This is previous work (Hari et al.)
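The idea of matching queuing on sender and receiver can be illustrated with the simplest possible case. This is not the algorithm from Hari et al., whose details are not given in these slides; it is a plain round-robin sketch that assumes equal-sized packets, FIFO links, and no packet loss. The class names are hypothetical.

```python
from collections import deque


class RoundRobinSender:
    """Stripes packets across links in a fixed round-robin order."""

    def __init__(self, links):
        self.links = links          # one deque per link, standing in for the wire
        self.next_link = 0

    def send(self, packet):
        self.links[self.next_link].append(packet)
        self.next_link = (self.next_link + 1) % len(self.links)


class RoundRobinReceiver:
    """Dequeues from per-link queues in the same order the sender used."""

    def __init__(self, queues):
        self.queues = queues        # one deque per link, holding arrived packets
        self.next_link = 0

    def receive_available(self):
        """Return packets in send order, stopping when the expected link is empty."""
        out = []
        while self.queues[self.next_link]:
            out.append(self.queues[self.next_link].popleft())
            self.next_link = (self.next_link + 1) % len(self.queues)
        return out


wires = [deque() for _ in range(4)]
arrived = [deque() for _ in range(4)]
sender = RoundRobinSender(wires)
receiver = RoundRobinReceiver(arrived)

for seq in range(8):
    sender.send(seq)

# Links deliver in an arbitrary order: link 3 first, then 0, 1, 2.
for link in (3, 0, 1, 2):
    arrived[link].extend(wires[link])
    wires[link].clear()
    print("after link", link, "->", receiver.receive_available())
```

Because the receiver waits on the link the sender would have used next, packets that arrive early on a fast link are held back instead of being handed to IP out of order. A lost packet would stall this simple version, which is one reason a real implementation needs more than plain round robin.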
IP Communication • Future Work • Complete reordering support • Automatic detection of node ID • Automatic detection of incorrect wiring • Automatic configuration of the switches • By the Diagnostic Processors
Active Messages Support • Goals • Support latency-sensitive apps • Titanium, for example • This is not a primary goal for ISTORE • As reflected in the networking hardware • Non-goals • Transparency • Support for malicious users
Active Messages Support • Problem: kernel ruins latency • Protocol stacks (UDP, TCP) are slow • User-kernel interaction is slow • Solution: remove kernel from critical path • Previous work: U-Net • User level ethernet driver • Cooperates with kernel driver • Only accessible to trusted users
Active Messages Support • Custom AM implementation • By Dan Bonachea (not me) • Based on HPAM • But also supports big bulk transfers • Supports ISTORE user-level networking • Automatically gets a new network card • (if a link dies) • Also supports UDP
Active Messages Support • Performance comparison • Using a ping program (client of AM) • Using short route (2 hops each way, four hops round trip) • Ethernet latency = 4*(17µs) = 68µs • UDP mean round-trip time is 160µs • User-level ethernet mean is 80µs • Includes check-summing and AM overhead
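The comparison above can be split into wire time and software overhead with a few lines of arithmetic. The numbers are the slide's own, in the slide's time unit; the function names are illustrative, and treating the difference between round-trip time and wire time as "software overhead" is a simplification.

```python
HOPS_ROUND_TRIP = 4   # hop count used in the slide's latency calculation
PER_HOP = 17          # per-hop time used in the slide's latency calculation


def wire_latency(hops=HOPS_ROUND_TRIP, per_hop=PER_HOP):
    """Time spent crossing switches for one round trip."""
    return hops * per_hop


def software_overhead(measured_rtt):
    """Portion of a measured round trip not explained by the wire."""
    return measured_rtt - wire_latency()


for name, rtt in (("UDP", 160), ("user-level ethernet", 80)):
    print(name, "rtt:", rtt, "wire:", wire_latency(), "overhead:", software_overhead(rtt))
```

On these figures the wire accounts for 68 of the round trip in both cases, so removing the kernel from the critical path cuts the remaining overhead from 92 to 12.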
Active Messages Support • Future work • Compile Titanium for ISTORE • and see what happens.
Conclusions • Kernel hacking is fun • And my talking is done.