Part III: Overlays, peer-to-peer Jinyang Li
Overlays are everywhere • Internet is an overlay on top of telephone networks • Overlays: a network on top of Internet • Endpoints (instead of routers) are nodes • Multi-hop paths among routers are links • Instant deployment!
What can overlays do? • Routing • Improve routing robustness (e.g. convergence speed) • Multicast • Anonymous communication • New applications • Peer-to-peer file sharing and lookup • Content distribution networks • Peer-to-peer live streaming • Your imagination is the limit
Why overlays? • Internet is ossified • IPv6 proposed in 1992, still not widely deployed • Multicast (1988), QoS (early 90s) etc. • Avoid burdening routers with new features • End hosts are cheap and capable • Copy and store files • Perform expensive cryptographic operations • Perform expensive coding/decoding operations • …
Today’s class • Overlays that take over routers’ jobs • Resilient Overlay Networks (RON) • Application-level multicast (NICE)
RON’s motivation • Internet routing is not reliable
Internet routing is unsatisfactory • Slow to detect outages and recover • Unable to use multiple redundant paths • Unable to detect badly performing paths • Applications have no control over paths Q: Why can’t we fix BGP? Q2: Hasn’t multi-homing already solved the fault-tolerance problem?
BGP converges slowly • Given a failure, BGP can take up to 15 minutes to converge; sometimes it never does [Feamster]
RON in a nutshell • A small set of nodes (<100) forms an overlay on top of the scalable BGP-based IP routing substrate • What failures? • Outages: configuration/software errors, broken links • Performance failures: severe congestion, DoS attacks
RON’s goals • Fast failure detection and recovery • Detect & fail-over within seconds • Applications influence path selection • Applications define failures • Applications define path metrics • Expressive and fine-grained policies • Who and what applications are allowed to use what paths
Why would RON work? • RON testbed study (2003): about 60% of failures occur within two hops of the edge • RON routes around many link “failures” • Works if there exists a node whose paths to S and D do not contain the failed link • RON cannot route around an access-link failure
RON design • Nodes sit in different ASes • [Architecture figure: each RON node runs a conduit, forwarder, router, and prober, backed by the RON library, a performance database, application-specific routing tables, and a policy routing module] • Link-state routing protocol disseminates info using RON itself!
RON reduces loss rate • [Scatter plot: 30-min avg loss rate on the Internet vs. 30-min avg loss rate with RON] • RON loss rate is never more than 30%
RON routes around failures • 30-minute average loss rates; 6,825 “path hours” represented here • 5 “path hours” of 100% loss (complete outage) • 38 “path hours” of TCP outage (>= 30% loss) • RON routed around all of these! • One indirection hop provides almost all the benefit!
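The “one indirection hop” result can be sketched as a path-selection rule: given measured loss rates on each overlay link, pick the best of the direct path and every single-intermediate path. The node names and loss rates below are hypothetical.

```python
def best_one_hop_path(src, dst, loss):
    """Pick the path (direct, or via one intermediate) with the lowest
    end-to-end loss rate. loss[(a, b)] is the measured loss rate on
    overlay link a->b; success probabilities multiply across hops."""
    nodes = {a for a, _ in loss} | {b for _, b in loss}
    # The direct path is the baseline.
    best = ([src, dst], loss[(src, dst)])
    for mid in nodes - {src, dst}:
        # End-to-end delivery succeeds only if both hops succeed.
        p = 1 - (1 - loss[(src, mid)]) * (1 - loss[(mid, dst)])
        if p < best[1]:
            best = ([src, mid, dst], p)
    return best

# Hypothetical 30-min average loss rates on a 3-node RON.
loss = {
    ("S", "D"): 0.40,                      # direct path badly congested
    ("S", "R"): 0.01, ("R", "D"): 0.02,    # indirect hops are healthy
}
path, p = best_one_hop_path("S", "D", loss)
# One intermediate node R cuts loss from 40% to about 3%.
```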
Lessons of RON • End hosts know better about performance and outages than routers • Internet routing trades off scalability for performance and fast failover • A small amount of redundancy goes a long way
RON’s tradeoff • [Design-space triangle: scalability vs. performance (fast convergence etc.) vs. flexibility (application-specific metric & policy)] • BGP picks scalability • Routing overlays (e.g., RON) pick performance and flexibility • ???
Open Questions • Efficiency • generates redundant traffic on access links • Scaling • Probing traffic is O(N^2) • Can a RON be made to scale to > 50 nodes? • Is a 1000 node RON much better than 50-node? • Interaction of overlays and IP network • Interaction of multiple overlays
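The O(N^2) scaling limit can be made concrete with back-of-the-envelope arithmetic; the probe size and probing rate below are illustrative assumptions, not figures from the paper.

```python
def probe_overhead_bps(n_nodes, probe_bytes=64, probes_per_sec=0.1):
    """Total probe traffic in a full-mesh RON: each of the N nodes
    probes the other N-1, so aggregate overhead grows as N*(N-1)."""
    return n_nodes * (n_nodes - 1) * probe_bytes * 8 * probes_per_sec

small = probe_overhead_bps(50)     # a 50-node RON
big = probe_overhead_bps(1000)     # a 1000-node RON
# Going from 50 to 1000 nodes multiplies probe traffic ~400x,
# one reason RON is pitched at small (<100 node) deployments.
```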
Application-level multicast • A.k.a. overlay multicast, end-host multicast
Why multicast? • Send the same stream of data to many hosts • Internet radio/TV/conference • Stock quote dissemination • Multiplayer network games • An efficient way to send data to many hosts
Naïve approach is wasteful • Sender’s outgoing link carries n copies of data • 128Kbps mp3 stream, 10,000 listeners = 1.28Gbps
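The bandwidth claim above checks out with simple arithmetic (a minimal sketch):

```python
# Naive unicast: the sender's outgoing link carries one copy per receiver.
stream_kbps = 128            # one mp3 stream
listeners = 10_000
sender_load_gbps = stream_kbps * listeners / 1_000_000  # Kbps -> Gbps
# 128 Kbps x 10,000 listeners = 1.28 Gbps on a single access link.
```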
IP multicast service model • Mimic LAN broadcast • Anyone can send, everyone hears • Use multicast address • 224.0.0.0 -- 239.255.255.255 (2^28 addresses) • Each address is called a “group” • End hosts register with routers to receive packets
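The class D range can be checked with Python’s standard ipaddress module: the top four address bits are 1110, which is exactly the 224.0.0.0/4 block of 2^28 group addresses.

```python
import ipaddress

def is_multicast_group(addr: str) -> bool:
    """True iff addr falls in 224.0.0.0 - 239.255.255.255 (class D)."""
    return ipaddress.ip_address(addr) in ipaddress.ip_network("224.0.0.0/4")

assert is_multicast_group("224.0.0.1")
assert is_multicast_group("239.255.255.255")
assert not is_multicast_group("10.1.2.3")
# The standard library also exposes the same check directly:
assert ipaddress.ip_address("224.0.0.1").is_multicast
assert ipaddress.ip_network("224.0.0.0/4").num_addresses == 2**28
```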
Basic multicast techniques • Construct trees • Why trees? (why not meshes?) • How many trees? • Shared vs. source-specific trees • Criteria for a “good” tree? • Who builds trees? • Routers vs. end hosts
IP multicast • Routers construct multicast trees for packet replication and forwarding • Efficient (low latency, no dup pkts on links)
IP multicast: augmenting DV • How to broadcast using DV routing tables without loops? • Idea: the shortest paths from S to all nodes form a tree • RPF protocol: a router duplicates and forwards a packet from S only if it arrives via the router’s shortest path back to S
Reverse path flooding (RPF) • [Figure: four routers a, b, c, d with per-node DV tables (destination: next hop, cost); unit-cost links a–b, b–c, c–d plus a cost-10 link a<->c] • C does not forward packets from A arriving over the a<->c link (and vice versa), since that link is not on the shortest path • However, link a <--> c still sees two packets
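The RPF rule reduces to a one-line routing-table check. The next-hop table below assumes unit-cost links a–b, b–c, c–d plus a cost-10 a<->c link, matching the figure; the representation is an illustrative sketch.

```python
def rpf_forward(router, src, arriving_from, next_hop):
    """RPF check: duplicate and forward a packet from source `src` only
    if it arrived from the neighbor on this router's shortest path back
    to `src` (looked up straight from the DV routing table)."""
    return next_hop[(router, src)] == arriving_from

# DV next hops toward source a: c's shortest path to a runs via b
# (cost 2), not over the cost-10 direct a<->c link.
next_hop = {("b", "a"): "a", ("c", "a"): "b", ("d", "a"): "c"}
assert rpf_forward("b", "a", "a", next_hop)       # on the tree: forward
assert rpf_forward("c", "a", "b", next_hop)       # via b: forward
assert not rpf_forward("c", "a", "a", next_hop)   # copy over a<->c: drop
```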
Reverse path broadcast (RPB) • RPF causes every “upstream” router on a LAN (link) to send a copy • RPB: only one router sends a copy • Routers listen to each other’s DV advertisements • Only the one with the lowest hopcount sends
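The RPB election can be sketched as picking a designated forwarder per LAN; the tiebreak rule and router IDs here are illustrative assumptions.

```python
def designated_forwarder(lan_routers, hops_to_src):
    """RPB: among all routers attached to a LAN, only the one with the
    lowest advertised hopcount to the source forwards onto the LAN;
    ties are broken deterministically (here: lowest router ID)."""
    return min(lan_routers, key=lambda r: (hops_to_src[r], r))

# Hypothetical LAN with three attached routers.
hops = {"r1": 3, "r2": 2, "r3": 2}
forwarder = designated_forwarder(["r1", "r2", "r3"], hops)
# r2 and r3 tie on hopcount; r2 wins the tiebreak and sends the copy.
```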
IP multicast: augmenting DV • Requires symmetric paths • Needs to prune unnecessary broadcast packets to achieve multicast [Deering et al., SIGCOMM 1988; TOCS 1990]
IP multicast: augmenting LS • Basic LS: each router floods with changes in link state • LS w/ multicast: routers monitor local multicast group membership and changes result in flooding • Routers use Dijkstra to compute SP trees • How expensive to compute trees for N nodes, E edges, G groups?
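As a rough answer to the cost question: with a binary heap, one Dijkstra run is O(E log N), and an MOSPF-style router may recompute a shortest-path tree per (source, group), multiplying that cost by the number of active groups. A minimal sketch:

```python
import heapq

def shortest_path_tree(adj, src):
    """Dijkstra from `src`; O(E log N) with a binary heap.
    adj: {node: [(neighbor, cost), ...]}. Returns each node's parent
    in the shortest-path tree."""
    dist, parent = {src: 0}, {src: None}
    pq = [(0, src)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in adj[u]:
            if v not in dist or d + w < dist[v]:
                dist[v], parent[v] = d + w, u
                heapq.heappush(pq, (d + w, v))
    return parent

# The four-router topology from the RPF slide.
adj = {"a": [("b", 1), ("c", 10)], "b": [("a", 1), ("c", 1)],
       "c": [("a", 10), ("b", 1), ("d", 1)], "d": [("c", 1)]}
parent = shortest_path_tree(adj, "a")
# Tree from a: a -> b -> c -> d; the cost-10 link stays unused.
```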
IP multicast has not taken off • Requires support from routers • Do ISPs have incentives to support multicast? • Not scalable • Routers keep state for every active group! • Multicast group addresses cannot be aggregated • Group membership changes much more frequently than links going up and down • Difficult to provide congestion/flow control, reliability and security
Overlay multicast • Multicast code run on end hosts • End hosts can copy&store data • No change to IP infrastructure needed • Easy to implement complex functionalities: flow control, security, layered multicast etc. • Less efficient: higher delay, duplicate pkts per link
Overlay multicast challenge • How can hosts form an efficient tree? • Hosts do not know all that routers know • What’s wrong with a random tree? • Stretch: packets travel farther than they have to • Stress: packets traverse some links multiple times • A particular concern for access links and cross-country links
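Stress is easy to quantify once you know which physical links lie under each overlay edge. The topology below is a hypothetical illustration of why the sender’s access link is the main worry.

```python
from collections import Counter

def link_stress(overlay_edges, underlay_path):
    """Stress: how many copies of one packet cross each physical link.
    underlay_path(u, v) lists the IP-level links under overlay edge u-v."""
    stress = Counter()
    for u, v in overlay_edges:
        for link in underlay_path(u, v):
            stress[link] += 1
    return stress

# Overlay tree S->A and S->B whose two edges share S's access link L0.
paths = {("S", "A"): ["L0", "L1"], ("S", "B"): ["L0", "L2"]}
stress = link_stress(paths, lambda u, v: paths[(u, v)])
# L0 carries 2 copies of every packet: the "stress" the slide warns about.
```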
Cluster-based trees (NICE) • A hierarchy of clusters • Each cluster consists of [k, 3k-1] members • Log N depth • [Figure: ordinary members reside in 1 cluster; cluster heads reside in 2 clusters; the top-level head resides in 3 clusters]
Cluster-based trees (NICE) • Each node knows all members of its cluster(s)
Cluster-based trees • Cluster nodes according to latency, so packets do not travel too far out of the way • Not perfect: packets are sent to cluster heads (who are in the middle), so they might overshoot
NICE in action • How to join a hierarchy? • Which is the right cluster? • How long does join take? • How to split/merge clusters? • What if a cluster head fails?
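The join procedure can be sketched as a top-down walk: probe the members of the current cluster, descend into the closest head’s cluster one layer down, and join when the walk reaches layer 0, taking O(log N) probe rounds. The cluster names and latency function below are illustrative assumptions.

```python
def nice_join(new_node, cluster, lower, latency, layer):
    """Walk down the NICE hierarchy: at each layer probe the current
    cluster's members, descend into the closest head's cluster one
    layer below, and join the layer-0 cluster where the walk ends.
    lower[head] is the next-layer-down cluster led by `head`."""
    if layer == 0:
        cluster.append(new_node)  # join the bottom-layer cluster
        return cluster
    head = min(cluster, key=lambda m: latency(new_node, m))
    return nice_join(new_node, lower[head], lower, latency, layer - 1)

# Hypothetical two-layer hierarchy: layer-1 cluster of heads {h1, h2},
# each leading a layer-0 cluster.
lower = {"h1": ["h1", "a", "b"], "h2": ["h2", "c", "d"]}
lat = lambda x, y: 1 if y in ("h2", "c", "d") else 5  # n is near h2's side
joined = nice_join("n", ["h1", "h2"], lower, lat, layer=1)
# n probes {h1, h2}, picks the closer head h2, joins h2's layer-0 cluster.
```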
When does clustering not work well? • Key assumption: low latency is transitive • As a node descends the tree to join, it assumes children of a close-by cluster head are also close-by • [Figure: MIT & Harvard peer with each other, while Boston U connects via upstream providers Cogent and MCI, so low latency to one does not imply low latency to the other]
Lessons • Where should a functionality reside? Routers vs. end hosts • End hosts • Scalability vs. Performance • Flexibility • Instant deployment! • Routers • Efficiency
Project draft report • You should be able to reuse your draft for the final report • You should have completed the related work by now • You should have a complete plan • Most of the system design • Most of the experiment designs • If you have preliminary graphs, use them and try to explain them
The sandwich method for explanation • An easy example illustrating the basic idea • Detailed explanations of challenges and how your system addresses them • Does it work in general environments?