Reliable IP Multicast: status and selected topics
YE PU & YAN SUN
A CSE620 Presentation
Overview
• Introduction
• Reliable Multicast Protocols
• Case Studies
• Multicast Congestion Control
• Routing for Multicast
• The MBone and the Internet2
• Summary
Reliable Multicast
Introduction
Why Multicast?
• In many emerging applications, one sender transmits to a group of receivers simultaneously
Why Reliable?
• Audio/video applications do not require reliability
• Many other exciting applications do, e.g. remote whiteboard (WB), collaborative VR, data dissemination
[Figure: unicasting vs. multicasting]
Reliable Multicast: Basic Questions
• What is the "right" definition of reliable multicast?
  • Is there a baseline (e.g., reliable delivery of all data)?
  • Should ordering/causality be part of the networking semantics of reliable multicast?
  • Where should the line be drawn between network-level and application-level functionality?
• Design Approaches
  • How important is scalability (large number of participants)?
  • Are there fundamental differences from one setting to another (1-many vs. many-many) that require different approaches?
  • Are separate designs, each optimized for a different scenario, the way to go?
  • Can one protocol (or protocol framework) fit all requirements? Will n protocols (or frameworks) fit k (n<k) scenarios?
• Framework
  • Is there value (and if so, what is it) in developing a common framework (a la RTP) in which various reliable multicast protocols can be built?
  • What should that framework look like?
  • In terms of the IETF, is there any part (and which one) that should be standardized?
-- ACM SIGCOMM Multicast Workshop, Stanford, August 27, 1996
Reliability Mechanism: Who’s Responsible?
• Sender Initiated
  • The sender is responsible for packet loss detection
  • Based on positive acknowledgements (ACKs)
  • ACK implosion in large-scale multicast; poor scalability
• Receiver Initiated
  • Each receiver is responsible for packet loss detection
  • Based on negative acknowledgements (NACKs)
  • Alleviates ACK implosion; better performance
  • NACK implosion is still possible
[Figure: ACK implosion; NACK trigger; NACK implosion]
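The difference in feedback volume between the two approaches can be sketched with a back-of-the-envelope calculation. This is only an illustration: the function name is hypothetical, and it assumes every receiver loses a packet independently with the same probability and that no feedback suppression is in use.

```python
def expected_feedback(receivers, loss_prob):
    """Expected feedback packets per data packet (assumes independent, equal loss)."""
    # Sender-initiated: every receiver that got the packet returns an ACK.
    acks = receivers * (1.0 - loss_prob)
    # Receiver-initiated: only receivers that missed the packet return a NACK.
    nacks = receivers * loss_prob
    return acks, nacks


acks, nacks = expected_feedback(10_000, 0.01)
print(f"ACKs per packet: {acks:.0f}, NACKs per packet: {nacks:.0f}")
```

With a low loss rate the NACK count is far smaller than the ACK count, but it still grows linearly with group size, which is why NACK implosion remains possible.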
Loss Recovery: What did ja say?
• Loss recovery: detection and retransmission of lost packets
• Global Recovery
  • Repairs are multicast to the entire group
  • Efficient when loss is concentrated at a backbone gateway
• Local Recovery
  • Try to recover from packet loss without going all the way to the source
  • The response is multicast within a scope just large enough to reach each affected receiver
• Forward Error Correction (FEC)
  • Transmit error-correcting (parity) packets instead of the original packet data
  • A single repair packet can simultaneously repair different packet losses at different receivers
Feedback Control
Feedback control: a mechanism that restricts the amount of feedback generated by the multicast group
• Structure based: rely on a designated receiver (DR) to process and filter feedback traffic
• Timer based: delay each retransmission request by a random interval, drawn uniformly between zero and the one-way trip time to the source
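The timer-based approach can be illustrated with a small simulation. This is a simplified sketch, not any protocol's actual procedure: the function name is hypothetical, and `hear_delay` is an assumed single value for how long the first request takes to reach every other receiver.

```python
import random


def count_nacks(one_way_times, hear_delay, rng):
    """Count the retransmission requests sent for one lost packet.

    one_way_times: one-way trip time to the source for each receiver (seconds).
    hear_delay: assumed time for the first request to reach every other receiver.
    """
    # Each receiver draws its request timer uniformly from [0, one-way trip time].
    timers = [rng.uniform(0.0, d) for d in one_way_times]
    first = min(timers)
    # A receiver is suppressed only if the first request arrives before its timer fires.
    return sum(1 for t in timers if t <= first + hear_delay)


rng = random.Random(1)
for delay in (0.0, 0.005, 0.05):
    sent = count_nacks([0.05] * 100, delay, rng)
    print(f"hear_delay={delay}: {sent} of 100 receivers sent a request")
```

The smaller the propagation delay among receivers relative to the timer spread, the fewer duplicate requests are sent.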
Multicast Protocols by 1997
Case Study: SRM
• SRM: Scalable Reliable Multicast
• Originally designed for wb, the shared whiteboard tool
• Currently operational over the MBone
• Receiver-reliable, NACK-based
• Any receiver can multicast a NACK or a repair packet
SRM Loss Recovery Principle
• Data-driven recovery (sequence gap detection)
• Control-driven recovery (session message sequence)
• Each source assigns unique sequence numbers to its packets
• A NACK is generated when missing data is detected
SRM: Session Messages
• Each member multicasts periodic session messages that report the sequence number state for active sources
• Session messages let receivers detect the loss of the last packet in a burst
• Members also use session messages to determine the current participants of the session
• Average session message bandwidth: 5% of data bandwidth
SRM NACK Suppression
• A NACK is multicast to the entire group
• A receiver that needs the same data and hears the NACK can suppress its own NACK
• When several receivers detect a loss simultaneously, each waits a random delay, and the receiver with the smallest delay sends the NACK
SRM: Loss Recovery Algorithm
Here d(X) is the local estimate of the one-way delay to host X, and C1 and C2 are protocol constants.
Loss Detection
1. Set the backoff parameter b = 1
2. Upon missing data D from host S, choose a random delay t uniformly on 2^b * [C1*d(S), (C1+C2)*d(S)]
3. Schedule a request packet, REQ_D, for transmission in t seconds
4. If REQ_D is received from some other host before t seconds, set b = b + 1 and restart the request timer
5. Otherwise, if the data D or the repair reply REP_D is received before t seconds, cancel REQ_D
6. Otherwise, send REQ_D after t seconds
Retransmission
1. Upon receipt of REQ_D from host A, if D is locally available, choose a random delay t uniformly on [C1*d(A), (C1+C2)*d(A)]
2. Schedule the repair packet REP_D for transmission in t seconds
3. If REP_D is received before t seconds, cancel the repair timer
4. Otherwise, send REP_D after t seconds
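The request and repair timers above can be sketched as a small state machine. This is a minimal illustration of the listed steps, not SRM's reference implementation: the class and function names are hypothetical, the values C1 = C2 = 2 and the delay estimates are made up for the example, and event delivery and real clocks are left out.

```python
import random


def request_delay(b, c1, c2, d_s, rng):
    """Draw t uniformly from 2^b * [C1*d(S), (C1+C2)*d(S)]."""
    scale = 2 ** b
    return rng.uniform(scale * c1 * d_s, scale * (c1 + c2) * d_s)


def repair_delay(c1, c2, d_a, rng):
    """Draw t uniformly from [C1*d(A), (C1+C2)*d(A)]."""
    return rng.uniform(c1 * d_a, (c1 + c2) * d_a)


class RequestTimer:
    """Tracks one pending request REQ_D for missing data D from host S."""

    def __init__(self, c1, c2, d_s, rng):
        self.c1, self.c2, self.d_s, self.rng = c1, c2, d_s, rng
        self.b = 1                       # step 1: initial backoff parameter
        self.cancelled = False
        self.delay = self._draw()        # steps 2-3: schedule REQ_D in t seconds

    def _draw(self):
        return request_delay(self.b, self.c1, self.c2, self.d_s, self.rng)

    def on_request_heard(self):
        """Step 4: another host sent REQ_D first, so back off and restart."""
        self.b += 1
        self.delay = self._draw()

    def on_data_or_repair(self):
        """Step 5: D or REP_D arrived, so the request is no longer needed."""
        self.cancelled = True

    def should_send(self):
        """Step 6: called when the timer expires."""
        return not self.cancelled


rng = random.Random(0)
timer = RequestTimer(c1=2.0, c2=2.0, d_s=0.1, rng=rng)
print("initial delay:", timer.delay)
timer.on_request_heard()
print("delay after hearing another request:", timer.delay)
```

Doubling the interval each time a duplicate request is heard spreads out later requests, so a host that was beaten once is unlikely to collide again.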
Case Study: RMTP
• RMTP: Reliable Multicast Transport Protocol
• Designed for file dissemination (single sender)
• Deployed in AT&T’s billing network
• Based on a hierarchical structure
• A special Designated Receiver (DR) in each region is responsible for sending ACKs to the sender
RMTP: Network Topology
• Receivers are grouped into local regions
• The source multicasts packets to the receivers
• Receivers periodically unicast ACKs to their AP/DR
• A DR provides local repair if the data is available
• Each DR unicasts its own ACK to its parent, consolidating feedback traffic toward the next DR in the hierarchy
• The source determines retransmissions based on the status sent by the DRs
RMTP: ACK Processing & Retransmission
[Figure: a sender’s send window and a receiver’s receive window]
RMTP: Formation of Local Region
• RMTP assumes there is some information about the approximate location of receivers
• Some receivers and servers are chosen as DRs
• Each DR periodically sends a special packet, SEND_ACK_TOME, in which the TTL field is set to a predetermined value (say 64)
• Each receiver chooses the DR whose SEND_ACK_TOME arrives with the largest remaining TTL value, i.e., the DR the fewest hops away
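The DR selection rule can be sketched in a few lines. This is an illustration only: the function name, the DR labels, and the TTL values are hypothetical, and the initial TTL of 64 follows the "say 64" example above.

```python
INITIAL_TTL = 64  # the predetermined value from the example above


def choose_dr(received_ttls):
    """Pick the DR whose SEND_ACK_TOME arrived with the largest remaining TTL.

    received_ttls: dict mapping a DR identifier to the TTL seen on arrival.
    """
    return max(received_ttls, key=received_ttls.get)


seen = {"dr-a": 60, "dr-b": 62, "dr-c": 57}
best = choose_dr(seen)
print(best, "is", INITIAL_TTL - seen[best], "hops away")
```

Because every router decrements the TTL by one, the largest remaining TTL identifies the nearest DR without any explicit distance measurement.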
Case Study: PGM
• PGM: Pragmatic General Multicast
• Relies on router support for scalability
• Provides no notion of group membership
• NACK-based, with suppression
PGM: Data Packet Types
• ODATA: original content data
• NACK: selective negative acknowledgement
• NCF: NACK confirmation
• RDATA: retransmission data (repair)
• SPM: source path message
• TSI: each PGM packet contains a Transport Session Identifier (TSI) that identifies the session and the source of the data
PGM: NACK/NCF Dialogue
• After a random delay, a NACK is unicast from router to router upstream toward the source
• A PGM-aware router keeps forwarding a NACK until it sees a matching NCF or RDATA
• Only one NACK is forwarded for every packet loss
• The source multicasts NCFs to the whole group to make NACK delivery reliable
PGM: Source Path Message
• SPMs are multicast downstream, interleaved with ODATA
• PGM-aware routers use SPMs to determine the unicast path for forwarding NACKs
• Receivers use SPMs to determine the last PGM-aware router, to which they forward NACKs
PGM: Retransmission
• Sender
  • Retransmits immediately after getting a NACK
• Router
  • Maintains retransmission state for every interface on which a NACK was received
  • Forwards a retransmission only on the interfaces on which a NACK for it was received
PGM-aware Router Features
• Routers intercept SPMs and use them to establish source path state for the corresponding source and group
• Routers forward only the first copy of any NACK they receive to the upstream PGM-aware router, to constrain NACK forwarding
• Routers discard exact duplicates of any NACK for which they already have repair state
• Routers use NACKs to maintain repair state consisting of a list of interfaces on which a given NACK was received, and return the RDATA only on these interfaces
• Routers can also optionally redirect NACKs to a designated local retransmitter (DLR) rather than to the source
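The NACK elimination and constrained RDATA forwarding described above can be sketched as follows. This is a deliberately simplified model, not the PGM specification: the class name, method names, and interface labels are hypothetical, and timers, NCF generation, and DLR redirection are omitted.

```python
class PgmRouterSketch:
    """Keeps per-packet repair state: which interfaces asked for a repair."""

    def __init__(self):
        # (tsi, sequence number) -> set of interfaces a NACK arrived on
        self.repair_state = {}

    def on_nack(self, tsi, seq, iface):
        """Record the NACK; return True only if it should be forwarded upstream."""
        key = (tsi, seq)
        is_first = key not in self.repair_state
        self.repair_state.setdefault(key, set()).add(iface)
        return is_first

    def on_rdata(self, tsi, seq):
        """Return the interfaces to forward the repair on, then drop the state."""
        return sorted(self.repair_state.pop((tsi, seq), set()))


router = PgmRouterSketch()
print(router.on_nack("session-1", 7, "eth1"))  # first NACK: forward upstream
print(router.on_nack("session-1", 7, "eth2"))  # later NACK: recorded, not forwarded
print(router.on_rdata("session-1", 7))         # repair goes only to requesting interfaces
```

Keying the state on the TSI and sequence number keeps repairs for different sessions and packets independent of one another.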
Congestion Control
• Why Congestion Control?
  • Available bandwidth must be shared fairly among multiple best-effort flows over a shared link
• TCP Congestion Control
  • Multiplicative decrease at the indication of congestion
  • Linear increase when there is no congestion
  • Encourages fair sharing of bandwidth
  • No safeguard against aggressive flows (control relies on end-to-end feedback)
• Multicast without CC
  • Non-TCP-compatible flows can lock out competing TCP flows
  • Simultaneous congestion collapse
  • Need an end-to-end, feedback-based, TCP-compatible congestion control mechanism
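The linear-increase, multiplicative-decrease rule and its tendency toward fair sharing can be sketched with two flows on one link. This is a toy model under stated assumptions, not TCP itself: the function names, the increase of 1, the decrease factor of 0.5, and the link capacity are illustrative, and both flows are assumed to see congestion at the same moment.

```python
def aimd_step(cwnd, congested, increase=1.0, decrease=0.5, floor=1.0):
    """One update of a congestion window under additive increase, multiplicative decrease."""
    if congested:
        return max(floor, cwnd * decrease)
    return cwnd + increase


def share_link(x, y, capacity, rounds):
    """Run two AIMD flows that both see congestion when their sum exceeds capacity."""
    for _ in range(rounds):
        congested = (x + y) > capacity
        x = aimd_step(x, congested)
        y = aimd_step(y, congested)
    return x, y


x, y = share_link(1.0, 20.0, capacity=30.0, rounds=200)
print(f"flow windows after 200 rounds: {x:.2f} and {y:.2f}")
```

Each multiplicative decrease halves the gap between the two windows while each additive increase leaves it unchanged, so the flows drift toward equal shares.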
Control Metrics
• Fairness: how the algorithm shares bandwidth with other connections, and how it discriminates against connections of different lengths. This is the closest thing to the "performance" of a connection
• Safety: how wide a range of operating conditions the algorithm can support without driving the network into an unstable operating range
• Responsiveness: how fast the algorithm adapts to changes in the network load
• Variability (or accuracy): how consistent the performance of the algorithm is in a given environment, i.e., what is the variance in throughput?
• Scalability: how these metrics scale for large groups
Control Approaches
• Window-based: “slow start” TCP-style sliding window algorithm
• Rate-adaptive: adjust the transmission rate upon receipt of NACKs
• Forward Error Correction (FEC): rarely used due to encoding/decoding overhead
MTCP: Hierarchical Congestion Control
• Hierarchical congestion reports
  • Internal tree nodes act as sender's agents (SAs)
  • Receivers send feedback to their SAs
  • SAs send a summary of the congestion level of their children to their parents
MTCP: Hierarchical CC (cont’d)
• Window-based control
  • The sender controls its rate based on the congestion summary it receives
• Congestion window adjustment (when CWND goes down)
  • RTD timeout
  • Fast retransmission (in conjunction with selective acknowledgment)
  • Three NACKs for the same packet reduce the window (note that not every loss causes CWND to be halved)
  • Based on the TCP Vegas scheme (i.e., a long RTT causes the window to go down)
Forward-error Correction Coding (FEC)
• "Simultaneous repair" utilizes (n, k) block codes
• The packet stream is grouped into platoons of n packets each
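The idea of simultaneous repair can be shown with the simplest possible block code, a single XOR parity packet over k data packets. This is a stand-in chosen for brevity, not the code the protocols above necessarily use: the function names are hypothetical, all packets are assumed to have equal length, and only one loss per block can be repaired.

```python
def xor_parity(packets):
    """Return the byte-wise XOR of equal-length packets."""
    parity = bytearray(len(packets[0]))
    for packet in packets:
        for i, byte in enumerate(packet):
            parity[i] ^= byte
    return bytes(parity)


def recover(packets_with_gap, parity):
    """Rebuild the single missing packet (marked None) from the parity packet."""
    missing = packets_with_gap.index(None)
    present = [p for p in packets_with_gap if p is not None]
    rebuilt = xor_parity(present + [parity])
    repaired = list(packets_with_gap)
    repaired[missing] = rebuilt
    return repaired


data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_parity(data)

# Two receivers lost different packets; the same parity packet repairs both.
print(recover([None, b"BBBB", b"CCCC"], parity))
print(recover([b"AAAA", None, b"CCCC"], parity))
```

A code with more parity packets per block extends the same property to several losses per receiver.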
FEC/ARQ
• On a detected loss, the receiver NACKs the platoon rather than the individual packet
• If each receiver indicates the number m of packets it lost from that platoon, the responder need only send m of the k parity packets
Proactive FEC/ARQ
• Proactive: send some repairs before any loss is reported
• Proactive factor: r
• The sender sends round(r*k) packets
• Receivers send NACKs to get additional repairs
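The packet counts involved can be sketched numerically. This is an illustration under assumptions: the function names and the sample values are hypothetical, and answering several NACKs with the largest requested count assumes a code in which any parity packet can repair any loss in the platoon.

```python
def initial_packets(k, r):
    """Packets sent up front for a platoon with k data packets and proactive factor r."""
    # Note: Python's round() rounds exact halves to the nearest even integer.
    return round(r * k)


def extra_repairs(nack_counts):
    """Parity packets to multicast in response to NACKs.

    nack_counts: dict mapping a receiver to the number m of packets it still lacks.
    """
    return max(nack_counts.values(), default=0)


k, r = 8, 1.25
sent = initial_packets(k, r)
print(f"{sent} packets sent up front: {k} data + {sent - k} proactive repairs")
print("extra repairs needed:", extra_repairs({"rx-a": 1, "rx-b": 3, "rx-c": 2}))
```

Raising r trades extra bandwidth up front for fewer NACK rounds afterwards.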
Multicast Routing
• Requires a significant amount of state and complexity in routers (at least per-group state and often even per-source state), so deployment and use have been very slow by Internet standards
• Dense Mode: the sender broadcasts traffic, which triggers prune messages (DVMRP, PIM-DM)
• Sparse Mode: group members explicitly send join messages (MOSPF, CBT, PIM-SM)
Advantages
• Less routing state to keep (only routers on the multicast path keep state)
• Explicit join: multicast traffic flows only across links leading to identified receivers
Disadvantages
• Single point of failure at the rendezvous point (RP)
• Hot spot of multicast traffic at the RP and non-optimal paths on the multicast tree
Multicast Routing in Early MBone
MBone on a non-multicast-capable Internet
1. MR3 and MR4, running the multicast router daemon (mrouted), support IGMP. mrouted encapsulates multicast datagrams in unicast datagrams before sending them, and decapsulates multicast datagrams from the unicast datagrams it receives
2. R1 and R2 are routers without multicast support. They forward the unicast-encapsulated multicast packets just like any other unicast datagram
DVMRP
• First protocol developed to support multicast routing
• The tree is constructed on demand using a “broadcast and prune” technique
• Reverse Path Forwarding (RPF) ensures that the tree has no loops and includes only shortest paths
• RPF uses the unicast routing table
• Does not scale to support multicast groups that are sparsely distributed over a large network
Broadcast-and-prune example:
1. The message reaches router 1
2. The message reaches routers 2, 3, and 4
3. Routers 3 and 4 exchange messages. Each one just drops the message, because it did not arrive over the interface that gives the shortest path back to the source
4. The message reaches router 7. Router 7 realizes it is a leaf router and there are no group members on its subnet, so it sends a prune message back to router 6, the upstream router. Router 6, in turn, sends a prune message to router 4. Router 3 also sends a prune message to router 1
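The reverse-path check that makes routers 3 and 4 drop each other's copies can be sketched as follows. This is a minimal illustration, not DVMRP's actual data structures: the function name, the interface labels, and the dictionary standing in for the unicast routing table are hypothetical.

```python
def rpf_forward(unicast_routes, source, arrival_iface, all_ifaces, pruned=frozenset()):
    """Return the interfaces on which to forward a multicast packet.

    unicast_routes: dict mapping a source address to the interface on the
                    shortest unicast path back to that source.
    pruned:         interfaces from which a prune message has been received.
    """
    # RPF check: accept only packets arriving on the shortest path back to the source.
    if unicast_routes.get(source) != arrival_iface:
        return []
    # Broadcast on every other interface that has not been pruned.
    return [i for i in all_ifaces if i != arrival_iface and i not in pruned]


routes = {"10.0.0.1": "eth0"}
ifaces = ["eth0", "eth1", "eth2"]
print(rpf_forward(routes, "10.0.0.1", "eth0", ifaces))                   # accepted and broadcast
print(rpf_forward(routes, "10.0.0.1", "eth1", ifaces))                   # fails RPF: dropped
print(rpf_forward(routes, "10.0.0.1", "eth0", ifaces, pruned={"eth2"}))  # pruned branch skipped
```

Because the check reuses the unicast routing table, no separate multicast topology is needed to avoid loops.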
MOSPF
• Intended for use within a single routing domain
• Dependent on the use of OSPF
• The tree is calculated only when a router receives the first datagram in a stream
• All routers calculate exactly the same tree
• Does not scale well due to periodic flooding of group membership reports
Example:
1. MR 1 computes the tree: it knows the members of the group via IGMP and hence knows that the path to MR 4 is via MR 2, the path to MR 8 is via MR 5, etc.
2. MR 2 computes the tree and determines that the path to MR 4 is direct and the path to MR 8 is via MR 5; MR 3 computes the tree and determines that the path to MR 9 is direct
3. MR 5 computes the tree and determines that the path to MR 8 is direct
Note that the multicast transmission triggers this process (i.e., it is data driven), and each router, when it receives a message, calculates exactly the same distribution tree as its predecessors and uses it to forward the message.
Core Based Tree (CBT)
• A single tree is shared by all members of the group; multicast traffic for the entire group is sent and received over the same tree, regardless of the source
• Significant savings in the amount of multicast state information stored in individual routers
• Concentration of traffic around the core
• Load balancing might be achieved by using more than one core
PIM-SM
• Initial group-shared tree construction similar to CBT
• Supports both a group-shared tree and a shortest-path tree
• Relies on unicast routing tables to adapt to network topology changes
• Independent of the particular unicast routing protocol
Example:
1. The sender at Source 2 registers at the Rendezvous Point multicast router RPt
2. A receiver joins at RPt; there is now a bigger shared tree
3. The receiver is receiving lots of data from Source 2. The receiver sends an explicit join to Source 2 to construct a shortest-path route
Interdomain Multicast Routing
Near-term Solution - PIM-SM/MBGP/MSDP:
• Multicast Border Gateway Protocol (MBGP): extends the Border Gateway Protocol (BGP), which provides route aggregation and abstraction as well as hop-by-hop policy routing for unicast, so that it can also carry multicast routes
• Multicast Source Discovery Protocol (MSDP): works by having representatives in each domain announce to other domains the existence of active sources. MSDP is run in the same router as a domain's RP (or one of the RPs)
Long-term Solution - BGMP/MAAA:
• Border Gateway Multicast Protocol (BGMP): first proposed as a long-term solution to Internet-wide, inter-domain multicast
• Multicast Address Allocation Architecture (MAAA): consists of the Multicast Address-Set Claim (MASC) protocol (domain level), the Address Allocation Protocol (AAP) (within a domain), and the Multicast Address Dynamic Client Allocation Protocol (MADCAP) (for requesting addresses from a Multicast Address Allocation Server (MAAS))
Alternative Solution - Root Addressed Multicast Architecture (RAMA)
The MBone
• A virtual network layered on top of the physical Internet to support routing of IP multicast packets
• Initially a test bed for multicast
• Extensively exploits tunnels
• Routing mainly with DVMRP
“MBONE is truly the start of mass-communication that may supplant television. Used well, it could become an important component of mass communication.” -- John December
The Internet2
Internet2 is a collaboration among more than 100 U.S. universities to develop networking and advanced applications for learning and research. Its multicast effort covers the design and implementation of a deployment strategy to provide a consistent and ubiquitous multicast service within the Internet2 community.
Internet2 multicast-peering sites: Abilene, vBNS, NREN, DREN, ESnet, CANARIE, TEN-155/34 (DANTE), NORDUnet, SURFnet, APAN
• Abilene is an advanced backbone network that connects regional network aggregation points, called gigaPoPs, to support the work of Internet2 universities as they develop advanced Internet applications.
• vBNS maintains a native IP multicast service via a PIM sparse-dense-mode configuration among all vBNS Cisco routers. MBGP routing is used internally in combination with an MBGP default route representing MBone sources. vBNS belongs to MCI WorldCom.
The Internet2 (cont’d)
For Internet2, the plan has always been to do multicast “the right way” insofar as possible given the currently available set of protocols. As a result, the multicast deployment plan follows guidelines set forth by the Internet2 Multicast Working Group.
Guidelines
• All multicast deployed in Internet2 must be native and sparse mode
• No tunnels are allowed
• All routers must support inter-domain multicast routing using MBGP/MSDP
[Figure: Multicast on the Abilene network]
[Figure: Multicast on vBNS]
Summary
• IP multicast is emerging as a critically important topic for the future Internet
• Achieving reliability: ACKs vs. NACKs, local recovery, FEC, …
• Reliable multicast protocols: SRM, RMTP, and PGM
• Multicast congestion control
• Routing in multicast: DVMRP, MOSPF, CBT, PIM-SM
• Interdomain multicast and multicast deployment on the MBone and the Internet2