Hybrid Reliable Multicast with TCP-XM • Karl Jeacle • Jon Crowcroft • Marinho P Barcellos, UNISINOS • Stefano Pettini, ESA
Rationale • Would like to achieve high-speed bulk data delivery to multiple sites • Multicasting would make sense • Existing multicast research has focused on sending to a large number of receivers • But Grid is an applied example where sending to a moderate number of receivers would be extremely beneficial
Multicast availability • Deployment is a problem! • Protocols have been defined and implemented • Valid concerns about scalability; much FUD • “chicken & egg” means limited coverage • Clouds of native multicast • But can’t reach all destinations via multicast • So applications abandon in favour of unicast • What if we could multicast when possible… • …but fall back to unicast when necessary?
Multicast TCP? • TCP • “single reliable stream between two hosts” • Multicast TCP • “multiple reliable streams from one to n hosts” • May seem a little odd, but there is precedent… • TCP-SMO – Liang & Cheriton • M-TCP – Mysore & Varghese • M/TCP – Visoottiviseth et al • PRMP – Barcellos et al • SCE – Talpade & Ammar
Building Multicast TCP • Want to test multicast/unicast TCP approach • But new protocol == kernel change • Widespread test deployment difficult • Build new TCP-like engine • Encapsulate packets in UDP • Run in userspace • Performance is sacrificed… • …but widespread testing now possible
[Figure: sending and receiving application protocol stacks. If natively implemented: Application / TCP / IP on both sides. For test deployment: the userspace TCP engine’s packets are encapsulated in UDP, giving a TCP/IP/UDP/IP stack on both sides.]
TCP engine • Where does initial TCP come from? • Could use BSD or Linux • Extracting from kernel could be problematic • More compact alternative • lwIP = Lightweight IP • Small but fully RFC-compliant TCP/IP stack • lwIP + multicast extensions = “TCP-XM”
TCP-XM overview • Primarily aimed at push applications • Sender initiated – advance knowledge of receivers • Opens sessions to n destination hosts simultaneously • Unicast is used when multicast not available • Options headers used to exchange multicast info • API changes • Sender incorporates multiple destination and group addresses • Receiver requires no changes • TCP friendly, by definition
[Figure: standard TCP exchange between Sender and Receiver: SYN / SYNACK / ACK handshake, DATA / ACK transfer, then FIN / ACK, FIN / ACK teardown.]
[Figure: TCP-XM: one Sender with simultaneous connections to Receiver 1, Receiver 2 and Receiver 3.]
TCP-XM connection • Connection • User connects to multiple unicast destinations • Multiple TCP PCBs created • Independent 3-way handshakes take place • SSM or random ASM group address allocated • (if not specified in advance by user/application) • Group address sent as TCP option • Ability to multicast depends on TCP option
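As a rough sketch of that connect step (one PCB and one independent handshake per destination, all sharing one group address): xm_link(), xm_connected() and the group_addr field are assumptions layered over lwIP’s tcp_new()/tcp_connect(), not TCP-XM identifiers.

static struct tcp_pcb *xm_connect_all(struct ip_addr *dests, int ndests,
                                      struct ip_addr group, u16_t port)
{
  struct tcp_pcb *first = NULL;
  int i;
  for (i = 0; i < ndests; i++) {
    struct tcp_pcb *pcb = tcp_new();                 /* one PCB per destination */
    pcb->group_addr = group;                         /* sent as the SYN group option */
    tcp_connect(pcb, &dests[i], port, xm_connected); /* independent 3-way handshake */
    first = xm_link(first, pcb);                     /* chain onto the connection's nextm list */
  }
  return first;
}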
TCP Group Option • Layout: kind=50 (1 byte), len=6 (1 byte), Multicast Group Address (4 bytes) • New group option sent in all TCP-XM SYN packets • Presence implies multicast capability • Non TCP-XM hosts will ignore it (no option in their SYNACK) • Sender will automatically revert to unicast for such hosts
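For illustration, the group option layout could be captured in C roughly as below; the struct and field names are assumptions, not identifiers from the TCP-XM source.

#define TCP_OPT_XM_GROUP 50  /* option kind, per the layout above */

struct tcpxm_group_opt {
  u8_t  kind;    /* = 50 */
  u8_t  len;     /* = 6 (total option length in bytes) */
  u32_t group;   /* IPv4 multicast group address, network byte order */
};                /* packed on the wire: 1 + 1 + 4 bytes */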
TCP-XM transmission • Data transfer • Data replicated/enqueued on all send queues • PCB variables dictate transmission mode • Data packets are multicast (if possible) • Retransmissions are unicast • Auto fall back/forward to unicast/multicast • Close • Connections closed as per unicast TCP
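A sketch of that transmit decision; the helper names and TX_* values are assumed, while txmode and firstpcb come from the PCB-changes slide later in the deck.

static void xm_transmit(struct tcp_pcb *pcb, struct tcp_seg *seg, int is_retransmit)
{
  if (is_retransmit || pcb->txmode == TX_UNICAST) {
    xm_send_unicast(pcb, seg);     /* retransmissions (and unicast mode) go point-to-point */
  } else if (pcb == pcb->firstpcb) {
    xm_send_to_group(pcb, seg);    /* one multicast copy covers every receiver */
  }
  /* the segment stays on each member PCB's send queue until that receiver ACKs it */
}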
Fall back / fall forward • TCP-XM principle • “Multicast if possible, unicast when necessary” • Initial transmission mode is group unicast • Ensures successful initial data transfer • Fall forward to multicast on positive feedback • Typically after ~75K unicast data • Fall back to unicast on repeated mcast failure
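A sketch of the switch itself, using the txmode/nrtxm/ubytessent fields from the PCB-changes slide; the threshold constants are assumptions and the exact TCP-XM conditions may differ. The feedback percentage comes from the option described on the next slide.

#define XM_UNICAST_WARMUP (75 * 1024)  /* ~75K of unicast data before falling forward */
#define XM_MAX_MCAST_RETX 3            /* assumed limit on multicast retransmits */

static void xm_update_txmode(struct tcp_pcb *pcb, u8_t mcast_feedback_pct)
{
  if (pcb->txmode == TX_UNICAST &&
      pcb->ubytessent > XM_UNICAST_WARMUP &&
      mcast_feedback_pct > 0) {
    pcb->txmode = TX_MULTICAST;        /* fall forward on positive mcast feedback */
  } else if (pcb->txmode == TX_MULTICAST &&
             pcb->nrtxm > XM_MAX_MCAST_RETX) {
    pcb->txmode = TX_UNICAST;          /* fall back after repeated mcast failure */
  }
}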
Multicast Feedback Option • Layout: kind=51 (1 byte), len=3 (1 byte), % Multicast Packets Received (1 byte) • New feedback option sent in ACKs from the receiver • Only used between TCP-XM hosts • Indicates % of last n packets received via multicast • Used by sender to fall forward to multicast transmission
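Similarly, a sketch of the feedback option layout (names again assumed):

#define TCP_OPT_XM_FEEDBACK 51  /* option kind, per the layout above */

struct tcpxm_feedback_opt {
  u8_t kind;       /* = 51 */
  u8_t len;        /* = 3 (total option length in bytes) */
  u8_t mcast_pct;  /* % of last n packets received via multicast */
};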
TCP-XM reception • Receiver • No API-level changes • Normal TCP listen • Auto-IGMP join on TCP-XM connect • Accepts data on both unicast/multicast ports • tcp_input() accepts: • packets addressed to existing unicast destination… • …but now also those addressed to multicast group • Tracks how last n segs received (u/m)
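A sketch of that bookkeeping, using the msegrcv[]/msegrcvper fields from the PCB-changes slide; the ring indexing and the ismcast member are assumptions based on the comments there.

#define XM_SEG_HISTORY 128  /* matches msegrcv[128] in the PCB */

static void xm_note_segment(struct tcp_pcb *pcb, u32_t seg_idx, u8_t via_multicast)
{
  u16_t i, hits = 0;

  /* record whether this segment arrived on the multicast group address */
  pcb->msegrcv[seg_idx % XM_SEG_HISTORY].ismcast = via_multicast;

  for (i = 0; i < XM_SEG_HISTORY; i++)
    if (pcb->msegrcv[i].ismcast)
      hits++;

  /* percentage echoed back to the sender in the feedback option */
  pcb->msegrcvper = (u8_t)((hits * 100) / XM_SEG_HISTORY);
}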
API changes • Only relevant if natively implemented! • Sender API changes • New connection type • Connect to port on array of destinations • Single write sends data to all hosts • TCP-XM in use:

conn = netconn_new(NETCONN_TCPXM);
netconn_connectxm(conn, remotedest, numdests, group, port);
netconn_write(conn, data, len, …);
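On the receiving side there are no API-level changes: a plain lwIP netconn listener like the sketch below is enough. Signatures follow the lwIP netconn API of that era and may differ in later versions; the port number is illustrative.

struct netconn *listener, *conn;
struct netbuf *buf;
void *data;
u16_t len;
u16_t port = 4321;                            /* illustrative port */

listener = netconn_new(NETCONN_TCP);          /* ordinary TCP listener */
netconn_bind(listener, IP_ADDR_ANY, port);
netconn_listen(listener);

conn = netconn_accept(listener);              /* incoming TCP-XM sender */
while ((buf = netconn_recv(conn)) != NULL) {
  netbuf_data(buf, &data, &len);              /* consume received bytes */
  /* ... write data out ... */
  netbuf_delete(buf);
}
netconn_close(conn);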
PCB changes • Every TCP connection has an associated Protocol Control Block (PCB) • TCP-XM adds:

struct tcp_pcb {
  …
  struct tcp_pcb *firstpcb;    /* first of the mpcbs */
  struct tcp_pcb *nextm;       /* next tcpxm pcb */
  enum tx_mode txmode;         /* unicasting or multicasting */
  u8_t nrtxm;                  /* number of retransmits for multicast */
  u32_t nrtxmtime;             /* time since last retransmit */
  u32_t mbytessent;            /* total bytes sent via multicast */
  u32_t mbytesrcvd;            /* total bytes received via multicast */
  u32_t ubytessent;            /* total bytes sent via unicast */
  u32_t ubytesrcvd;            /* total bytes received via unicast */
  struct segrcv msegrcv[128];  /* ismcast boolean for last n segs */
  u8_t msegrcvper;             /* % of last segs received via mcast */
  u8_t msegrcvcnt;             /* counter for segs recvd via mcast */
  u8_t msegsntper;             /* % of last segs delivered via mcast */
};
Linking PCBs • [Figure: PCBs U1, U2, M1, M3, M2 chained by *next; the TCP-XM PCBs are additionally chained by *nextm.] • *next points to the next TCP session • *nextm points to the next TCP session that’s part of a particular TCP-XM connection • Minimal timer and state machine changes
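For example, replicating data onto every member of a TCP-XM connection is just a walk along nextm (sketch; xm_enqueue() stands in for lwIP’s real enqueue path):

static void xm_enqueue_all(struct tcp_pcb *first, const void *data, u16_t len)
{
  struct tcp_pcb *pcb;
  for (pcb = first; pcb != NULL; pcb = pcb->nextm)
    xm_enqueue(pcb, data, len);   /* copy onto this member's send queue */
}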
What happens to the cwin? • Multiple receivers • Multiple PCBs • Multiple congestion windows • Default to min(cwin) • i.e. send at rate of slowest receiver • Is this really so bad? • Compare to time taken for n unicast transfers
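In other words, the effective window is the minimum congestion window across the connection’s PCBs, as in this sketch (using lwIP’s cwnd field and the nextm chain):

static u16_t xm_effective_cwnd(struct tcp_pcb *first)
{
  u16_t min_cwnd = 0xFFFF;        /* effective window = slowest receiver's cwnd */
  struct tcp_pcb *pcb;
  for (pcb = first; pcb != NULL; pcb = pcb->nextm)
    if (pcb->cwnd < min_cwnd)
      min_cwnd = pcb->cwnd;
  return min_cwnd;
}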
Grid multicast? • How can multicast be used in a Grid environment? • TCP-XM is a new multicast-capable protocol • Globus is the de facto Grid middleware • Would like TCP-XM support in Globus…
Globus XIO • eXtensible Input Output library • Allows “i/o plugins” to Globus • API • Single POSIX-like API / set of semantics • Simple open/close/read/write API • Driver abstraction • Hides protocol details / Allows for extensibility • Stack of 1 transport & n transform drivers • Drivers can be selected at runtime
XIO/XM driver specifics • Two important XIO data structures • Handle • Returned to user when XIO framework ready • Used for all open/close/read/write calls • lwIP netconn connection structure used • Attribute • Used to set XIO driver-specific parameters… • … and TCP-XM protocol-specific options • List of destination addresses
XIO code example

// init stack
globus_xio_stack_init(&stack, NULL);

// load drivers onto stack
globus_xio_driver_load("tcpxm", &txdriver);
globus_xio_stack_push_driver(stack, txdriver);

// init attributes
globus_xio_attr_init(&attr);
globus_xio_attr_cntl(attr, txdriver, GLOBUS_XIO_TCPXM_SET_REMOTE_HOSTS, hosts, numhosts);

// create handle
globus_xio_handle_create(&handle, stack);

// send data
globus_xio_open(&handle, NULL, target);
globus_xio_write(handle, "hello\n", 6, 1, &nbytes, NULL);
globus_xio_close(handle, NULL);
One-to-many issues • Stack assumes one-to-one connections • XIO user interface requires modification • Needs support for one-to-many protocols • Minimal user API changes • Framework changes more significant • GSI is one-to-one • Authentication with peer on connection setup • But cannot authenticate with n peers • Need some form of “GSI-M”
Driver availability • Multicast transport driver for Globus XIO • Requires Globus 3.2 or later • Source code online • Sample client • Sample server • Driver installation instructions • http://www.cl.cam.ac.uk/~kj234/xio/
mcp & mcpd • Multicast file transfer application using TCP-XM • ‘mcpd &’ on servers • ‘mcp file host1 host2… hostN’ on client • http://www.cl.cam.ac.uk/~kj234/mcp/ • Full source code online • FreeBSD, Linux, Solaris
All done! • Thanks for listening! • Questions?