
State of IP Multicast

Presentation Transcript


  1. State of IP Multicast Radia Perlman radia.perlman@sun.com

  2. Outline • Addresses • IGMP • Various Routing Protocols • review of DVMRP, MOSPF, CBT, PIM-DM, PIM-SM, MSDP, BGMP/MASC • problems (scaling, etc) • potential solutions: Simple Multicast/Express

  3. Addresses • IP Address is 4 bytes • “Class A” top bit is 0 • “Class B” top bits 10 • “Class C” top bits 110 • IP Multicast address is “class D”, top bits are 1110 • Mapping to layer 2: copy the bottom 23 bits of the IP address into the MAC address; the top 24 bits are the OUI (01-00-5E), and one more bit is fixed at 0 so ISOC keeps the other half of the block
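A minimal sketch of the two address rules on this slide, assuming the standard classful prefixes (A = 0, B = 10, C = 110, D = 1110) and the 01-00-5E multicast OUI. The function names are illustrative, not from any library.

```python
import ipaddress

def address_class(addr: str) -> str:
    """Classify an IPv4 address by its leading bits (classful addressing)."""
    first = int(ipaddress.IPv4Address(addr)) >> 24  # top byte
    if first >> 7 == 0b0:
        return "A"
    if first >> 6 == 0b10:
        return "B"
    if first >> 5 == 0b110:
        return "C"
    if first >> 4 == 0b1110:
        return "D"  # multicast
    return "E"  # reserved

def multicast_mac(group: str) -> str:
    """Map an IPv4 multicast address to its Ethernet multicast address.

    The top 25 MAC bits are fixed (OUI 01-00-5E plus one 0 bit);
    the bottom 23 bits are copied from the IP address.
    """
    ip = int(ipaddress.IPv4Address(group))
    if ip >> 28 != 0b1110:
        raise ValueError(f"{group} is not a class D address")
    mac = (0x01005E << 24) | (ip & 0x7FFFFF)
    # Print the 48-bit value as six hyphen-separated bytes.
    return "-".join(f"{(mac >> s) & 0xFF:02x}" for s in range(40, -8, -8))
```

Because only 23 of the 28 variable group bits survive, 32 IP groups share each MAC address (for example 224.0.0.1 and 224.128.0.1), so hosts still have to filter on the IP destination.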

  4. IGMP (Internet Group Management Protocol) • Purpose: router on a LAN discovers which multicast addresses have receivers on the LAN • Rtr sends query; members respond • V1: member sends IGMP response to the derived layer 2 multicast address after a random delay, so other members of that group hear it and suppress their own; rtr listens promiscuously • V2: adds an explicit leave (“resign”); rtr then queries again to check for remaining members • V3: join ({S’s},G) sent to the rtrs’ layer 2 address
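The IGMPv1 random-delay trick can be sketched as a tiny simulation. It assumes zero propagation delay on the LAN (so the first report for a group always suppresses the rest) and a hypothetical `max_delay` of 10 seconds; the router ends up learning every group with at most one report per group.

```python
import random

def igmpv1_responses(members, max_delay=10.0, rng=None):
    """Simulate the reports sent in response to one IGMPv1 query.

    members: dict host -> set of groups that host has joined.
    Returns the list of (time, host, group) reports actually sent.
    """
    if rng is None:
        rng = random.Random()
    # Every (host, group) pair starts its own random report timer.
    timers = [(rng.uniform(0, max_delay), host, g)
              for host, groups in members.items() for g in groups]
    timers.sort()
    sent, heard = [], set()
    for t, host, g in timers:
        if g in heard:
            continue  # another member already reported g: suppress
        sent.append((t, host, g))
        heard.add(g)  # report went to g's own address, so members of g hear it
    return sent
```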

  5. There are two ways of constructing a design. One way is to make it so simple there are obviously no deficiencies. The other way is to make it so complicated that there are no obvious deficiencies. ---Tony Hoare

  6. DVMRP • Flood and prune • send data everywhere (optimization: reverse path forwarding) • send prune (S,G) • remember who you sent prunes to (in case join happens, so you can de-prune) • remember prunes you received (so you can filter)

  7. Flooding/RPF • Forward received packet onto all links except the one it was received on • exponential overhead • RPF: only accept pkt with source S on link L if you’d send to S via L • n² overhead: each pkt goes on each link
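Reverse path forwarding can be checked with a small simulation. As an assumption, unicast routing is modeled as a BFS tree toward S over symmetric links (ties broken by name); the RPF check then lets every router accept exactly one copy, while the packet still crosses every link, which is where the per-link overhead comes from.

```python
from collections import deque

def next_hops_toward(graph, source):
    """BFS from source: for each router, the neighbor it would use to reach source."""
    nh = {source: None}
    q = deque([source])
    while q:
        u = q.popleft()
        for v in sorted(graph[u]):
            if v not in nh:
                nh[v] = u  # v reaches source via u (links are symmetric)
                q.append(v)
    return nh

def rpf_flood(graph, source):
    """Flood one packet from source using the RPF check.

    Returns (accepted, transmissions): copies accepted per router,
    and the total number of link transmissions.
    """
    nh = next_hops_toward(graph, source)
    accepted = {r: 0 for r in graph}
    transmissions = 0
    q = deque((source, v) for v in graph[source])  # (from, to) pairs
    while q:
        frm, to = q.popleft()
        transmissions += 1
        if nh[to] != frm:
            continue  # RPF fails: not the link we'd use to send to the source
        accepted[to] += 1
        q.extend((to, v) for v in graph[to] if v != frm)
    return accepted, transmissions
```

On a four-router graph with five links, the packet makes 7 transmissions (2 × links − (routers − 1)) yet each router accepts it once.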

  8. Why DVMRP Doesn’t Scale • Leaking even a few packets periodically for each of millions of sessions • Prune state: (S,G) pairs per neighbor for groups they DON’T want (most of the millions)

  9. MOSPF • Pass information about all members for all groups in routing protocol • Calculate spanning tree from source when packet arrives from (S,G) (and cache result) • Scaling issues: • routing control overhead (all group members) • CPU for multiple Dijkstra calculations
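A rough sketch of MOSPF-style forwarding as described on this slide: every router already knows all members of all groups, runs Dijkstra from the source on the first (S,G) packet, trims the tree to branches that lead to members, and caches the result. The class and its fields are hypothetical; the `dijkstra_runs` counter makes the CPU cost per (S,G) visible.

```python
import heapq

def shortest_path_tree(graph, source):
    """Dijkstra: parent map of the shortest-path tree rooted at source.
    graph: dict node -> dict neighbor -> link cost."""
    dist, parent = {source: 0}, {source: None}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, cost in graph[u].items():
            nd = d + cost
            if nd < dist.get(v, float("inf")):
                dist[v], parent[v] = nd, u
                heapq.heappush(heap, (nd, v))
    return parent

class MospfRouter:
    def __init__(self, name, graph, members):
        self.name, self.graph, self.members = name, graph, members
        self.cache = {}         # (S,G) -> downstream neighbors
        self.dijkstra_runs = 0

    def downstream(self, source, group):
        key = (source, group)
        if key not in self.cache:
            self.dijkstra_runs += 1
            parent = shortest_path_tree(self.graph, source)
            # Mark every node on a path from a member back to the source.
            on_tree = set()
            for m in self.members.get(group, ()):
                node = m
                while node is not None and node not in on_tree:
                    on_tree.add(node)
                    node = parent[node]
            # Forward only to our children that lead to members.
            self.cache[key] = sorted(v for v in on_tree if parent.get(v) == self.name)
        return self.cache[key]
```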

  10. CBT [figure: tree with nodes A, B, C, D, F and routers R]

  11. CBT • Build bidirectional tree rooted at Core • Only routers on tree need to know about tree • Only problem: Who is the core? • Two mechanisms specified • configure the routers with (C,G) mappings • do PIM-SM bootstrap protocol (see next)
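A minimal sketch of how a CBT-style join builds the bidirectional tree, assuming unicast next hops toward the core are given as a plain dict. A join walks toward the core until it reaches the core or a router already on the tree; only routers on that path get state, and data can then enter the tree anywhere and spread in both directions.

```python
class CbtDomain:
    def __init__(self, next_hop):
        # next_hop[(router, core)] -> neighbor toward the core (unicast routing)
        self.next_hop = next_hop
        self.tree = {}  # (router, group) -> set of tree neighbors

    def join(self, router, core, group):
        """Graft router onto the group's tree; return the path the join took."""
        path = [router]
        while path[-1] != core and (path[-1], group) not in self.tree:
            path.append(self.next_hop[(path[-1], core)])
        # path[-1] is the core or an on-tree router: add each hop as a tree link.
        for child, parent in zip(path, path[1:]):
            self.tree.setdefault((child, group), set()).add(parent)
            self.tree.setdefault((parent, group), set()).add(child)
        return path

    def forward(self, router, group):
        """Send a packet from router over the bidirectional tree; return routers reached."""
        reached, stack = set(), [(router, None)]
        while stack:
            node, frm = stack.pop()
            reached.add(node)
            for nbr in self.tree.get((node, group), ()):
                if nbr != frm:  # every tree port except the arrival port
                    stack.append((nbr, node))
        return reached
```

The second join below stops at R2 because R2 is already on the tree, and router X, which never joined, holds no state for G.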

  12. PIM-SM • Unidirectional Shared Tree (tunnel packet to core) • Plus dynamically formed per-source trees when (enough) traffic occurs

  13. Unidirectional Tree [figure: nodes A, B, C, D, F and routers R]

  14. Bidirectional Tree [figure: nodes A, B, C, D, F, G and routers R]

  15. Dynamically Formed Per-Source Trees • If enough traffic from S • join a tree rooted at S • prune off from shared tree for (S,G) • Routers keep more trees and more prune state • State timed out. Bursty source problem
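The switch to a per-source tree, and the bursty-source problem, can be sketched at a last-hop router. The `threshold` (packets seen on the shared tree before switching) and `timeout` (seconds of silence before (S,G) state is dropped) are hypothetical parameters chosen for illustration, not values from any specification.

```python
class LastHopRouter:
    def __init__(self, threshold=3, timeout=210.0):
        self.threshold, self.timeout = threshold, timeout
        self.counts = {}  # (S,G) -> packets seen on the shared tree
        self.spt = {}     # (S,G) -> time of last packet on the source tree

    def expire(self, now):
        for key, last in list(self.spt.items()):
            if now - last > self.timeout:
                del self.spt[key]  # state timed out: back to the shared tree

    def receive(self, source, group, now):
        key = (source, group)
        self.expire(now)
        if key in self.spt:
            self.spt[key] = now
            return "source tree"
        self.counts[key] = self.counts.get(key, 0) + 1
        if self.counts[key] >= self.threshold:
            # Here a real router would join toward S and prune (S,G) off the shared tree.
            del self.counts[key]
            self.spt[key] = now
            return "switched"
        return "shared tree"
```

A source that goes quiet longer than `timeout` loses its tree, so its next burst starts over on the shared tree and the router pays the join/prune cost again.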

  16. (Simplified) PIM core mapping • PIM: “bootstrap” routers flood advertisements throughout domain • Core capable routers register with elected BSR • BSR announces list of cores

  17. PIM core mapping (cont’d) • Hash alg to map M to one of the set of currently alive core capable routers • Core not necessarily near group, so shared tree can be really bad • Advertisements don’t scale, so this is intra-domain only
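A sketch of the hash step, modeled on the RP hash function in the PIM-SM specification: each candidate C gets Value(G,M,C) = (1103515245 × ((1103515245 × (G & M) + 12345) XOR C) + 12345) mod 2^31, the highest value wins, and ties go to the higher address. Here M is the hash mask (commonly 30 bits for IPv4), not the slide's multicast address M. Priorities and group-range filtering from the real protocol are left out.

```python
import ipaddress

def pim_hash_value(group: int, mask: int, candidate: int) -> int:
    return (1103515245 * ((1103515245 * (group & mask) + 12345) ^ candidate) + 12345) % 2**31

def choose_rp(group: str, candidates: list, mask_len: int = 30) -> str:
    """Pick the core for a group among the currently alive candidates."""
    g = int(ipaddress.IPv4Address(group))
    mask = (0xFFFFFFFF << (32 - mask_len)) & 0xFFFFFFFF

    def key(c):
        ci = int(ipaddress.IPv4Address(c))
        return (pim_hash_value(g, mask, ci), ci)  # ties broken by higher address

    return max(candidates, key=key)
```

Because each candidate's score depends only on the group and that candidate, losing a router that was not the winner never moves a group, and groups that agree under the mask share a core.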

  18. Interdomain • Use protocols that don’t scale within domains • Find some way of gluing domains together • BGMP/MASC • MSDP

  19. BGMP/MASC • For interdomain: have each domain dynamically choose and defend a block of multicast addresses • Have interdomain routing protocol pass around “reachability” of multicast address blocks • Join is in direction of multicast address prefix
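“Join is in direction of multicast address prefix” amounts to a longest-prefix match of the group address against the advertised blocks. A minimal sketch, with hypothetical prefixes and neighbor names standing in for what MASC and the interdomain routing protocol would supply:

```python
import ipaddress
from typing import Dict, Optional

def join_next_hop(group: str, routes: Dict[str, str]) -> Optional[str]:
    """Return the neighbor to send a join for `group` toward.

    routes: advertised multicast block (e.g. "225.16.0.0/16") -> neighbor
    toward the domain that owns it.
    """
    addr = ipaddress.IPv4Address(group)
    best = None
    for prefix, nbr in routes.items():
        net = ipaddress.IPv4Network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, nbr)  # keep the most specific matching block
    return best[1] if best else None
```

Every router on the join path must carry these block routes, which is the extra burden on BGP that the next slide complains about.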

  20. Scaling Problems • MASC • Harder than asking entire Internet to automatically number itself with IP addresses. • Too much bandwidth used • Too hard to debug • Too much of a burden on BGP • Will run out of addresses

  21. MSDP • Multicast Source Discovery Protocol • “Interim solution” until BGMP/MASC done • Configure tunnels between core capable routers in various domains, enough so hopefully Internet is connected • Flood (S,G) for all active (S,G)’s, throughout Internet

  22. MSDP [figure]

  23. Why MSDP Won’t Scale • Too much information to pass around (all active (S,G) pairs) • Too many tunnels to configure

  24. “Current approach” • Use protocols that don’t scale within a domain • Find some way of hooking domains together for groups with members in different domains • MSDP or MASC

  25. Simple Multicast • What causes the greatest complexity, scalability problems in the design? • Remove the need for those • Result: one scalable mechanism that will work both inside and between domains • Doesn’t need to be called “new protocol”. Can be modification of something else

  26. Solve 90% of the problem as simply as possible. Then remove the remaining 10% from the problem requirements --- Marshall Rose

  27. First Simplification • Don’t bother dynamically creating per-source trees • Instead use a single shared, good bidirectional tree • Less state • Better shared tree (bidirectional)

  28. Bidirectional Suboptimal? • Cost to network to deliver data NOT MORE • Core is NOT a bottleneck • Core can be an endnode, does not need to forward data • With single exit point from “domain”, delay difference from source tree is negligible • Don’t need “optimal”. Need “good enough”

  29. Bidirectional Trees Best • Per-Source Trees • Do NOT make network overhead lower (unless core is poorly chosen) • More state for net (n trees rather than one) • Only metric under which per-source tree is better is delay from source to each receiver • Bidirectional tree, with slight care, can ensure short paths to nearby members from any source

  30. Choosing good bidirectional tree • From each domain (or region separated by expensive links), have routers agree on one exit point per IP address prefix • Choose core to be a member of the group, or close to a member of the group • No “bandwidth bottleneck” around core--it’s just a node in the tree • C can be endnode (only fwd tunneled pkts)

  31. Good Bidirectional Tree [figure: routers R1, R2, R3, R4]

  32. Next simplification • Forcing all routers in Internet to figure out C from M is too expensive and complicated • Instead, make group ID 8 bytes • Only extra work for endnode: look up 8 byte group ID rather than 4 bytes. • Eliminate need for multicast address allocation, domain-wide core advertisements, etc.

  33. Simple Multicast • Bidirectional Tree • Group ID is (C,M) • To create group: choose C, ask C for M • Member discovers 8 byte (C,M) • via email, web page, SDR, directory, etc. • Include C and M in join or IGMP reply • Include C and M in data messages
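Why (C,M) makes address allocation trivial, as a sketch: M only has to be unique at its own core, and the 8-byte group ID is assumed here to be the 4-byte core IPv4 address followed by a 4-byte M. The `Core` class and its counter-based allocator are hypothetical.

```python
import ipaddress
import itertools
import struct

class Core:
    """A core hands out M values that only need to be unique at this core."""
    def __init__(self, address: str):
        self.address = address
        self._next_m = itertools.count(1)

    def allocate(self) -> tuple:
        return (self.address, next(self._next_m))

def group_id(core: str, m: int) -> bytes:
    """Pack (C,M) into an 8-byte group ID: 4-byte core address + 4-byte M."""
    return ipaddress.IPv4Address(core).packed + struct.pack("!I", m)

def parse_group_id(gid: bytes) -> tuple:
    """Recover (C,M), e.g. so a router can send a join toward C."""
    return (str(ipaddress.IPv4Address(gid[:4])), struct.unpack("!I", gid[4:])[0])
```

Two cores can both hand out M = 1 with no coordination, because their group IDs still differ in the core half.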

  34. Simple Multicast Variants • (C,G) in join, not in data messages • requires unique G’s • what if disagreement about C for G? • (C,G) in both join and data • explicitly (e.g., IP option) • MPLS • use link-local destination address

  35. Link Local Destination Address [figure: message exchange among A, R1, R2, C] • Join C,G is sent on each hop • Each join is acked: “Ack C,G, use X1”, “Ack C,G, use X2”, “Ack C,G, use X3” • Data then carries dest=X1, dest=X2, dest=X3 on the corresponding hops
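One reading of the link-local destination variant, sketched under assumptions: the router receiving a join picks a short local address X for (C,G) and returns it in the ack, and the neighbor then uses X as the data destination on that hop, much like an MPLS label. The `Router` class and the shared `label_pool` that produces X1, X2, X3 are hypothetical; only the upstream direction is modeled.

```python
import itertools

class Router:
    def __init__(self, name, label_pool):
        self.name = name
        self.label_pool = label_pool  # hypothetical source of fresh link-local addresses
        self.in_label = {}   # X -> (C,G) for labels this router handed out
        self.out_label = {}  # ((C,G), neighbor name) -> X that neighbor told us to use

    def handle_join(self, child, group):
        """Accept a join from `child` and ack it with a fresh address X."""
        x = f"X{next(self.label_pool)}"
        self.in_label[x] = group
        child.out_label[(group, self.name)] = x  # models "Ack C,G, use X"
        return x

def join_path(path, group):
    for child, parent in zip(path, path[1:]):
        parent.handle_join(child, group)

def send_toward_core(path, group):
    """Send one data packet from path[0] to the core; return the dest used per hop."""
    dests = []
    for sender, receiver in zip(path, path[1:]):
        x = sender.out_label[(group, receiver.name)]
        if receiver.in_label.get(x) != group:
            raise LookupError(f"{receiver.name} has no state for {x}")
        dests.append(x)  # receiver maps the short X back to (C,G)
    return dests
```

Data packets then carry no C or G at all; each router needs only a small per-link table.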

  36. Simple Multicast Variants, Cont’d • Express • 8-byte group ID (S,G) • Unidirectional Tree • If multiple senders • create multiple trees • tunnel to S

  37. Issues (with good answers) • Access Control: controlling who sends by configuring “one” node • Reliability if core goes down • Backward compatibility (migrating nodes one at a time)

  38. “Access Control” • Suppose want to restrict senders? • Express: S can choose not to forward from others • PIM: RP can be configured with authorized senders. Refuse to forward. (but members below 1st hop router will receive pkts) • SM: Core can be configured, and tell others in heartbeat

  39. Access Control, Cont’d • What if list doesn’t fit in the heartbeat msg? • Only say no S “if needed” (after bad S sends) • Only say yes S if needed (S tunnels to core or asks permission of core) • Can have list of yes’s, no’s, or both
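The heartbeat access-control idea, as a hypothetical sketch: the core holds yes and no lists, advertises them in full when they fit in a heartbeat of `max_entries`, and otherwise advertises only the “no” entries for senders that have actually tried to send. The `SenderPolicy` type, the deny-wins rule, and the heartbeat entry format are all assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class SenderPolicy:
    allow: set = field(default_factory=set)  # if non-empty, only these may send
    deny: set = field(default_factory=set)   # never allowed; deny wins over allow

    def permits(self, sender: str) -> bool:
        if sender in self.deny:
            return False
        return not self.allow or sender in self.allow

def heartbeat_entries(policy: SenderPolicy, max_entries: int, seen_senders: set) -> list:
    """Policy entries to carry in the core's heartbeat."""
    full = ([("yes", s) for s in sorted(policy.allow)]
            + [("no", s) for s in sorted(policy.deny)])
    if len(full) <= max_entries:
        return full
    # List doesn't fit: only say "no" for bad senders that have shown up.
    bad = [("no", s) for s in sorted(seen_senders) if not policy.permits(s)]
    return bad[:max_entries]
```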

  40. Multiple Groups for Availability • Rather than “backup core”, just create multiple groups (C1,M1), (C2,M2) and members join both • Transmit on one (one where you’re getting heartbeat). Receive on both. • Or if application requires absolute timeliness, transmit on both • Also, create multiple for load sharing
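A member's side of “multiple groups instead of a backup core” can be sketched as follows. It assumes a heartbeat counts as fresh for a hypothetical `hb_timeout` of 3 seconds, and that data carries a (sender, sequence number) pair so copies arriving on both groups can be dropped.

```python
class DualGroupMember:
    def __init__(self, groups, hb_timeout=3.0):
        self.groups = list(groups)  # e.g. [("C1","M1"), ("C2","M2")]
        self.hb_timeout = hb_timeout
        self.last_hb = {g: float("-inf") for g in self.groups}
        self.delivered = set()  # (sender, seq) already passed to the application

    def heartbeat(self, group, now):
        self.last_hb[group] = now

    def send_groups(self, now, absolute_timeliness=False):
        """Groups to transmit on: all of them, or one whose core we still hear."""
        if absolute_timeliness:
            return list(self.groups)
        live = [g for g in self.groups if now - self.last_hb[g] <= self.hb_timeout]
        return live[:1]

    def receive(self, group, sender, seq):
        """Receive on every group; return True only for the first copy."""
        if (sender, seq) in self.delivered:
            return False
        self.delivered.add((sender, seq))
        return True
```

If core C1 fails, its heartbeats stop and senders simply move to (C2,M2); no core failover protocol is involved.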

  41. Multiple Groups • Interdomain policy might require a tree per source domain. • Create a single tree for each domain rather than one per source in that domain. • Can use shared tree like RP: If create extra auxiliary tree, have it advertised via heartbeat

  42. Distributed Cores • If really want failover to another core • Have protocol among core capable routers • They advertise among themselves • Winner injects host route • Will be less overhead than PIM BSR protocol advertising throughout domain

  43. Backward Compatibility • Simplest: look different so other multicast protocols won’t forward the packet • Assume incremental deployment • Join sent to Core. Unicast by non-SM rtrs • Data destination=core or M or tunnel endpoint

  44. Automatically discovering Tunnel • R1 sends “join”. Destination=core • Forwarded until it reaches R2 • R2 notes pkt rcv’d from non-neighbor R1 • Adds “tunnel port” to R1 to state for (C,M) • Sends join-ack to R1 • R1 creates “tunnel port” to R2 as parent port for (C,M)
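The tunnel-discovery steps above can be sketched for a single (C,M). Assumptions: the join's unicast path to the core is given as a list, `sm_capable` is the set of Simple Multicast routers, and `neighbors` is each router's set of direct neighbors. Non-SM routers just forward the join; the first SM router to see it records a child port (a tunnel if the sender is not a neighbor), the sender records the parent port from the join-ack, and the join stops once it reaches a router already on the tree.

```python
def propagate_join(unicast_path, sm_capable, neighbors, state):
    """unicast_path: routers from the joining router to the core, inclusive.
    state: router -> {"parent": (kind, router) or None, "children": set of (kind, router)}."""
    sender = unicast_path[0]
    for router in unicast_path[1:]:
        if router not in sm_capable:
            continue  # non-SM router: forwards the join as ordinary unicast
        kind = "link" if router in neighbors.get(sender, set()) else "tunnel"
        already_on_tree = router in state
        entry = state.setdefault(router, {"parent": None, "children": set()})
        entry["children"].add((kind, sender))  # e.g. R2 adds a tunnel port to R1
        sender_entry = state.setdefault(sender, {"parent": None, "children": set()})
        sender_entry["parent"] = (kind, router)  # set when the join-ack arrives
        if already_on_tree:
            break  # the join has reached the tree
        sender = router  # this router continues the join toward the core
    return state
```

With non-SM routers r1 and r2 in between, R1 to R2 and R2 to C come out as tunnels, matching the “Tunnel needed” slide, and the non-SM routers keep no multicast state.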

  45. Tunnel needed [figure: routers R1, R2, R3 and r; core C; nodes A, B, D] • R1 -- R2 and R2 -- C are “tunnels” • IP option contains both C and M • IP destination address has C or tunnel endpoint or M

  46. New Protocol or New version of existing protocol? • No reason to do “totally new thing” • Two suggestions: bidirectional shared trees, and group ID=(C,G) • Suggestions orthogonal • CBT and BGMP already do bidirectional trees. PIM could be modified to do it • Easy to modify any of them to get core from pkt

  47. Summary • Shared bidirectional trees • fewer trees to keep track of and maintain • more efficient than tunneling to core • Group ID C+M • trivial address allocation • no extra info for BGP to pass around • no “core capable router advertisements” • controlled selection of core for group

  48. Summary • This stuff doesn’t have to be so complicated • It would be good for Internet if multicast really could allow millions of groups, easily formed by anyone
