Informed Content Delivery Across Adaptive Overlay Networks. Presented by Kelly Whitacre. Written by John W. Byers, Jeffrey Considine, Michael Mitzenmacher, Member, IEEE, and Stanislav Rost.
Problem Distributing a large new file across the Internet to millions of users simultaneously has proven to be challenging
Possible Solution: Point-to-Point? Wasted Bandwidth: • Individual point-to-point connections from a single source waste bandwidth • Server must handle the load of possibly many clients • Bandwidth costs money • Server should utilize available bandwidth Limited Transfer Rates: • Transfer rates are limited by the characteristics of the end-to-end paths
Possible Solution: IP Multicast? Pros: • Solves bandwidth problems of point-to-point • Server sends one copy • Network handles the rest Cons: • No flow control • No retransmission of lost packets • Limited deployment
Reliable Multicast • Digital fountain approach • Erasure codes: parity information is sent with the packets so that lost packets can be recovered (no feedback channels are needed to ensure reliable delivery) • Recirculation: information is re-circulated (fountain) for asynchronous client arrivals • Parallel transfer rates: heterogeneous client transfer rates are supported so as not to flood the network
Digital Fountain Approach [Figure: a k-packet source, instantaneous encoding, the encoding stream, transmission, k packets received, and instantaneous recovery of the k-packet message] Can recover file from any set of k encoding packets. Source: http://www.sigcomm.org/sigcomm98/tp/abs_05.html
Digital Fountain Approach [Figure: timeline from 0 to 5 hours showing the transmission of the file and the reception periods of User 1 and User 2] Source: http://www.sigcomm.org/sigcomm98/tp/abs_05.html
Cyclic Interleaving [Figure: the file, its Tornado encoding, encoding copies 1 and 2, the encoded blocks, the interleaved encoding blocks, and the transmission] Source: http://www.sigcomm.org/sigcomm98/tp/abs_05.html
Solution: Adaptive Overlay Networks Source: http://www.cs.virginia.edu/~mngroup/hypercast/designdoc/Chp1-Overview/Chp1-Overview.html
Adaptive Overlay Networks: How They Differ from IP Multicast • Do not use a multicast tree • Flexibly adapt to changing network conditions • End systems are explicitly required to collaborate! • Can improve performance by additional cross-connections and active collaboration
Addressing Limitations: Content Delivery Scenario Consider: Initial Delivery Tree S = Source Shaded Area = each node has a working set of packets, the subset of packets it has received
Addressing Limitations: Improving Transfer Rates Harnessing the Power of Parallel Downloads [Figure panels: Tree; Directed Acyclic Graph] Establishing concurrent connections to multiple servers or peers with complete copies of the file
Addressing Limitations: Improving Transfer Rates Harnessing the Power of Collaborative Transfer Establishing concurrent connections to multiple peers
Addressing Limitations: Improving Transfer Rates Power of Cross-Connections & Collaboration (d) depicts the portions of content which can be beneficially exchanged via pair-wise transfers
Considerations • (a) & (b) impede the full flow of content to downstream receivers • Opportunistic connections of (c) & (d) allow for higher transfer rates • Yet, demand more careful orchestration between end systems • Must determine set difference of working sets • Reconciliation is simple in working sets limited to small contiguous blocks • Limits flexibility of frequent changes that arise in AON
Content Delivery Across Adaptive Overlay Networks Challenges Stateful vs. Non-Stateful Solutions
Adaptive Overlay Networks in a Fluid Internet Challenges: • Asynchrony: receivers may open and close connections or leave and rejoin the infrastructure at arbitrary times • Heterogeneity: connections vary in speed and loss rates • Transience: routers, links, and end systems may fail and their performance may fluctuate over time • Scalability: the service must scale to large receiver populations and large content Need to: • Adaptively detect and avoid congested or temporarily unstable areas of the network • Dynamically establish paths with the most desirable end-to-end characteristics • Deliver useful content, often in parallel, with a minimum of setup overhead and message complexity
Limitations of Stateful Solutions Addresses: • Issues of connection • Connections that vary in speed and loss rates • Clients coming and going at arbitrary times Requires significant per-connection state: • Is highly unscalable • May impact performance • State must be maintained in the face of reconfiguration and reconnection • Is problematic with parallel downloading
Alternative: Encoded Content through Digital Fountain Approach • Digital Fountain Approach • Resilience to packet loss through an erasure-correcting code • Guarantee • Claim: the original source file can be recovered from any subset of distinct symbols in the encoding stream equal in size to the original file • In practice: a file can be recovered from a few percent more symbols than the number of symbols in the original file
Encoded Content through Digital Fountain Approach Pros • Continuous Encoding • Senders with a complete copy of a file may continuously produce fresh encoding symbols • Time Invariance • New encoding symbols are produced independently from symbols produced in the past • Tolerance • Digital fountain streams are useful to all receivers regardless of the times of their connections or disconnections and their rates of sampling the stream • Additivity • Parallel downloads from multiple servers with complete copies of the content require no orchestration Stateless!
Encoded Content through Digital Fountain Approach Cons • Encoding/Decoding Overhead • Reconciliation methods are needed for collaborating end systems that have only a portion of the content
Reconciliation and Informed Delivery Coarse-grained reconciliation Speculative transfers Fine-grained reconciliation
Note: The approaches proposed are local in scope and typically involve a pair or a small number of end systems. The goal is to provide the most cost-effective reconciliation mechanisms, measuring cost in both computation and message complexity.
Coarse-Grained Reconciliation • Estimate the resemblance of the working sets of pairs of nodes prior to establishing connections • Quick estimates of the fraction of symbols common to the working sets of both peers • Approach 1: Employs random sampling • Approach 2: Employs sketches of each peer’s working set • High-level information • Lightweight, computed efficiently • Incrementally updated • Fit into a single 1-kB packet
Notation & Framework • Let peers A and B have working sets S_A and S_B containing symbols from an encoding of the file • Containment • The containment of B in A is the quantity $|S_A \cap S_B| / |S_B|$ • Resemblance • The resemblance of A and B is the quantity $|S_A \cap S_B| / |S_A \cup S_B|$
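The two quantities can be illustrated with ordinary Python sets. This is a minimal sketch that assumes the usual definitions (containment of B in A as |S_A ∩ S_B| / |S_B|, resemblance as |S_A ∩ S_B| / |S_A ∪ S_B|); the function names and the small example sets are hypothetical and chosen only for illustration.

```python
def containment(s_a: set, s_b: set) -> float:
    """Containment of B in A: fraction of B's symbols that A already holds."""
    if not s_b:
        return 0.0
    return len(s_a & s_b) / len(s_b)


def resemblance(s_a: set, s_b: set) -> float:
    """Resemblance of A and B: shared symbols over all symbols held by either."""
    union = s_a | s_b
    if not union:
        return 0.0
    return len(s_a & s_b) / len(union)


# Hypothetical working sets of integer keys.
s_a = {1, 2, 3, 4, 5, 6}
s_b = {4, 5, 6, 7}

c = containment(s_a, s_b)   # 3 shared out of 4 in B -> 0.75
r = resemblance(s_a, s_b)   # 3 shared out of 7 distinct -> 3/7
print(c, r)
```

Note that containment is asymmetric (the containment of A in B here is 3/6), while resemblance is symmetric.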
Notation & Framework • Each element of a working set is identified by an integer key (sending an element entails sending its key) • Keys are distributed over the key space uniformly at random • With 64-bit keys, a 1-kB packet can hold roughly 128 keys • Two distinct keys can correspond to the same element • If the elements are determined by a hash function seeded by the key, two keys may generate the same element with small probability • The impact is minimal
Random Sampling Select elements of the working set at random and transport those to the peer.
Random Sampling Pros: • Unbiased estimate of containment • Can be incrementally updated using reservoir sampling Cons: • A peer must search its own working set for each element in the random sample • Does not easily allow one peer to check the resemblance between prospective peers • A cannot check resemblance between B & C
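The random-sampling estimate can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the sample is drawn directly with `random.Random.sample` rather than maintained by reservoir sampling, the working sets are made-up integer ranges, and the sample size of 128 is chosen to match the number of 64-bit keys that fit in a 1-kB packet.

```python
import random


def estimate_containment(sample_from_b, s_a):
    """Fraction of B's sampled keys that A finds in its own working set.

    This estimates the containment of B in A, |S_A & S_B| / |S_B|.
    """
    hits = sum(1 for key in sample_from_b if key in s_a)
    return hits / len(sample_from_b)


rng = random.Random(0)              # fixed seed so the run is repeatable
s_a = set(range(0, 8000))           # hypothetical working set of peer A
s_b = set(range(6000, 10000))       # hypothetical working set of peer B

# B sends 128 keys drawn uniformly at random from its working set.
sample = rng.sample(sorted(s_b), 128)

est = estimate_containment(sample, s_a)
true_value = len(s_a & s_b) / len(s_b)   # 2000 / 4000 = 0.5
print(f"estimated {est:.3f}, true {true_value:.3f}")
```

The membership test `key in s_a` is the per-element search of A's own working set listed among the cons above.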
Min-Wise Sketches Calculates working set resemblance based on min-wise sketches
Min-Wise Sketches • π_i represents a random permutation on the key universe • For each i, every peer computes the minimum of π_i over its own working set • A sends B the vector of A’s minima • B counts the number of positions where the two vectors are equal (the minima agree exactly when the minimum over the union lies in both sets) • B divides by the total number of permutations • The result is an unbiased estimate of the resemblance
Min-Wise Sketches Pros: • Unbiased estimate of resemblance • Allows similarity comparisons given any two sketches for any two peers • A can check resemblance between B and C Cons: • Truly random permutations cannot be used • Storage requirements are impractical • Possibility of false positives • π_i values are hashed to fewer bits to allow for more sketch elements in a packet • (Details not discussed)
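A min-wise sketch can be illustrated in a few lines. Because truly random permutations are impractical, this sketch substitutes simple linear maps of the form (a·x + b) mod P as stand-in "permutations"; that substitution, the modulus, the seeds, and the randomly generated keys are all assumptions made for illustration, and the resulting estimate is only approximately unbiased.

```python
import random

P = (1 << 61) - 1   # a Mersenne prime used as modulus for the stand-in permutations


def make_permutations(count, seed=0):
    """Return `count` random (a, b) pairs defining x -> (a*x + b) mod P."""
    rng = random.Random(seed)
    return [(rng.randrange(1, P), rng.randrange(0, P)) for _ in range(count)]


def sketch(working_set, perms):
    """Vector of minima: one minimum per permutation over the working set."""
    return [min((a * x + b) % P for x in working_set) for a, b in perms]


def estimate_resemblance(sketch_1, sketch_2):
    """Fraction of positions where the two vectors of minima agree."""
    matches = sum(1 for u, v in zip(sketch_1, sketch_2) if u == v)
    return matches / len(sketch_1)


# Hypothetical keys, drawn uniformly at random from a 60-bit key space.
key_rng = random.Random(42)
keys = [key_rng.getrandbits(60) for _ in range(4000)]
s_a = set(keys[:3000])
s_b = set(keys[1000:])              # the two sets share 2000 of 4000 keys

perms = make_permutations(180)      # 180 permutations, as in the simulations
est = estimate_resemblance(sketch(s_a, perms), sketch(s_b, perms))
true_value = len(s_a & s_b) / len(s_a | s_b)
print(f"estimated {est:.3f}, true {true_value:.3f}")
```

Since a sketch depends only on the peer's own working set and the shared permutations, any peer holding two sketches can compare them, which is why A can check the resemblance between B and C.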
Speculative Transfers • Involve a sender performing “educated guesses” as to which symbols to generate and transfer • Send symbols which are probably useful to the other • This process can be fine-tuned using the results of coarse-grained reconciliation
Speculative Transfers • When the containment of B in A is low, speculative transfer is trivial, since most of B’s symbols are useful to A • When the containment of B in A is high, this strategy is inefficient, so use recoding
Recoding • A recoding symbol is simply the bitwise XOR of a set of encoding symbols • Must be accompanied by a specification of the encoding symbols blended to create it • Must explicitly list the random seeds of the encoding symbols from which it was produced
Encoding/Decoding Recoding Symbols • Similar to the substitution rule • Example: a peer holding y5, y8, y13 generates the recoding symbols: • Z1 = y13 • Z2 = y5 XOR y8 • Z3 = y5 XOR y13 • A peer that receives Z1, Z2, Z3 recovers y13 immediately from Z1 • By substitution it then recovers y5 (from Z3) and y8 (from Z2)
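The example can be traced in code. This is a minimal sketch of XOR recoding and decoding by substitution, assuming each recoding symbol travels with the list of encoding-symbol identifiers blended into it; the 4-byte symbol contents and the helper names are hypothetical.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bitwise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


# Hypothetical 4-byte encoding symbols held by the sending peer.
y = {5: b"\x10\x20\x30\x40", 8: b"\x0a\x0b\x0c\x0d", 13: b"\xff\x00\xff\x00"}

# Recoding symbols Z1, Z2, Z3, each paired with the ids it was built from.
recoded = [
    ({13}, y[13]),
    ({5, 8}, xor_bytes(y[5], y[8])),
    ({5, 13}, xor_bytes(y[5], y[13])),
]


def decode(recoded_symbols):
    """Recover encoding symbols by repeatedly substituting known ones."""
    known = {}
    pending = [(set(ids), data) for ids, data in recoded_symbols]
    progress = True
    while pending and progress:
        progress = False
        remaining = []
        for ids, data in pending:
            # XOR out every encoding symbol that is already known.
            for symbol_id in list(ids):
                if symbol_id in known:
                    data = xor_bytes(data, known[symbol_id])
                    ids.discard(symbol_id)
            if len(ids) == 1:
                known[ids.pop()] = data      # exactly one unknown left: solved
                progress = True
            elif ids:
                remaining.append((ids, data))
        pending = remaining
    return known


recovered = decode(recoded)
print(sorted(recovered))   # ids of the recovered encoding symbols
```

Z1 yields y13 directly, substituting y13 into Z3 yields y5, and substituting y5 into Z2 yields y8.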
Fine-grained Reconciliation • Is a set-difference problem • Tries to determine the exact difference between the working sets S_A and S_B • Many approaches • Polynomial-Based • Enumeration-Based • Bloom filter • Search-Based • Approximate Reconciliation Trees (ART), which combine the compact representation of Bloom filters with the speed of a search-based approach
Bloom Filter • An array of bits that compactly represents the n elements of a working set; each element sets the bits selected by several independent random hash functions • Flow • Peer A sends B a Bloom filter F_A of S_A • Peer B then checks each element of S_B against F_A • Any element of S_B not found in F_A is certainly not in S_A, so peer B has determined (up to false positives) the elements of S_B - S_A • This solution is particularly effective when the number of differences is a large fraction of the set size
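The flow above can be sketched with a small Bloom filter. This is an illustrative implementation, not the one used in the paper: deriving the hash positions from SHA-256 is an assumption, as are the class and variable names and the example working sets. The parameters (6 hash functions, 8 bits per element) follow the simulation settings.

```python
import hashlib


class BloomFilter:
    """A bit array in which each added key sets `num_hashes` positions."""

    def __init__(self, num_bits: int, num_hashes: int):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: int):
        # Assumption: independent hash functions are emulated by salting SHA-256.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: int) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, key: int) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))


s_a = set(range(0, 1000))       # hypothetical working set of peer A
s_b = set(range(500, 1500))     # hypothetical working set of peer B

# Peer A builds F_A with 8 bits per element and 6 hash functions.
f_a = BloomFilter(num_bits=8 * len(s_a), num_hashes=6)
for key in s_a:
    f_a.add(key)

# Peer B keeps the elements of S_B that F_A reports as absent.
useful_to_a = {key for key in s_b if key not in f_a}
print(len(useful_to_a), "of", len(s_b - s_a), "differences found")
```

A Bloom filter has no false negatives, so everything B selects is genuinely missing from S_A; false positives only cause B to overlook a small fraction of the useful elements.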
Experimental Results Demonstrate the benefits and costs of using reconciliation in peer-to-peer transfers and in parallel downloads
Simulation Parameters • All consider transfer of a 128-MB file • Origin server • Divides this file into input symbols of 1400 bytes each (so that each fits in an Ethernet packet along with headers) • Encodes this file into a large set of encoding symbols • Associates each encoding symbol with a 64-bit identifier representing the set of input symbols used to produce it • Min-wise sketches used 180 permutations, yielding 180 entries of 64 bits each for a total of 1440 bytes per summary • Bloom filters used 6 hash functions and 8(1 + 0.0025)L bits for a total of 96 kB per filter
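The sizes quoted above can be checked with a little arithmetic. This sketch assumes that 1 MB means 2**20 bytes, that 1 kB means 1024 bytes for the packet and 1000 bytes for the filter total, and that L in the Bloom filter formula is the number of input symbols; none of these is stated on the slide.

```python
import math

FILE_BYTES = 128 * 2**20      # assumption: 1 MB = 2**20 bytes
SYMBOL_BYTES = 1400           # input symbol size from the slide
KEY_BITS = 64                 # identifier size from the slide

# Number of 1400-byte input symbols needed to cover the file.
num_input_symbols = math.ceil(FILE_BYTES / SYMBOL_BYTES)    # 95870

# 64-bit keys that fit in a 1-kB (1024-byte) packet.
keys_per_packet = (1024 * 8) // KEY_BITS                    # 128

# Min-wise sketch: 180 entries of 64 bits each.
sketch_bytes = 180 * KEY_BITS // 8                          # 1440

# Bloom filter: 8(1 + 0.0025)L bits, assuming L = number of input symbols.
bloom_bits = 8 * (1 + 0.0025) * num_input_symbols
bloom_kb = bloom_bits / 8 / 1000                            # roughly 96 kB

print(num_input_symbols, keys_per_packet, sketch_bytes, round(bloom_kb, 1))
```

Under these assumptions the sketch size and the Bloom filter size agree with the 1440 bytes and 96 kB quoted on the slide.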
Collaboration Methods • Uninformed • The sending peer picks a symbol to send at random • Speculative • The sending peer uses a min-wise sketch from the receiving peer to estimate the containment • Reconciled • The sending peer uses either a Bloom filter or an ART from the receiving peer to filter out duplicate symbols and sends a random permutation of the differences.
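The gap between uninformed and reconciled collaboration can be illustrated with a toy simulation. This is a sketch under simplifying assumptions that are not from the slides: the uninformed sender draws symbols uniformly with replacement, the reconciled sender has a perfect filter with no false positives and no filter transmission cost, and the working sets and function names are hypothetical.

```python
import random


def uninformed_sends(sender_set, receiver_set, needed, rng):
    """Count transmissions until the receiver gains `needed` new symbols
    when the sender picks symbols uniformly at random (with replacement)."""
    if needed > len(sender_set - receiver_set):
        raise ValueError("sender does not hold enough useful symbols")
    pool = sorted(sender_set)
    have = set(receiver_set)
    sends = gained = 0
    while gained < needed:
        symbol = rng.choice(pool)
        sends += 1
        if symbol not in have:      # only previously unseen symbols help
            have.add(symbol)
            gained += 1
    return sends


def reconciled_sends(sender_set, receiver_set, needed):
    """With an idealized filter, every transmitted symbol is useful."""
    if needed > len(sender_set - receiver_set):
        raise ValueError("sender does not hold enough useful symbols")
    return needed


rng = random.Random(1)
sender = set(range(0, 1000))        # hypothetical sender working set
receiver = set(range(500, 1500))    # half of the sender's symbols are shared
needed = 250

u = uninformed_sends(sender, receiver, needed, rng)
r = reconciled_sends(sender, receiver, needed)
print(f"uninformed: {u} sends, reconciled: {r} sends")
```

Raising the overlap between the two sets makes the uninformed count grow, which matches the qualitative trend reported for the uninformed method as containment increases.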
Scenarios and Evaluation • Varying 3 experimental factors: • Set of connections in the overlay formed between sources and peers • Distribution of content among collaborating peers • Slack of the scenario (1.1 & 1.3) • When smaller than (1+ decoding overhead), the set of peers will be unable to recover the file • When larger than (1+decoding overhead), the set of peers will most likely recover the file • Methods provide the most significant benefits over naive methods when there is only a small amount of slack
Scenario 1: Two Peers with Partial Content • One peer sends symbols to the other [Figure: overhead plotted against % of shared encoding symbols] • Uninformed collaboration performs poorly and degrades significantly as the containment increases • Speculative collaboration is more efficient, but the overhead still increases slowly with containment • Overhead of reconciliation is purely from the cost of transmitting a Bloom filter or ART (less than 1%)
Scenario 2: Download from a Server with Complete Content • With concurrent transfer from a peer [Figure: overhead plotted against % of shared encoding symbols] • Uninformed collaboration overhead is considerably lower than in scenario 1 (a larger fraction of the content is sent directly via fresh symbols from the server) • Speculative collaboration performs similarly to scenario 1 • Reconciled collaboration has overhead slightly higher than receiving symbols directly from the server
Scenario 3: Parallel Download from Peers with Partial Content • Collaborating with multiple peers in parallel [Figure: overhead plotted against % of shared encoding symbols] • Can leverage bandwidth from peers with partial content with only a slight increase in overhead • Uninformed collaboration performs extremely poorly • Speculative collaboration dramatically improves as containment increases • Reconciled collaboration has much higher overhead than before
Conclusions • Adaptive overlay networks offer a powerful alternative to traditional mechanisms for content delivery • Flexibility, scalability, and deployability • Informed and effective collaboration between end systems can be achieved through the digital fountain approach • Care is needed to provide methods for representing and transmitting the content in a manner that is as flexible and scalable as the underlying capabilities of the delivery model
Supplemental Reading and Resources • A Digital Fountain Approach to Reliable Distribution of Bulk Data http://www.ecse.rpi.edu/Homepages/shivkuma/teaching/sp2001/readings/digital-fountain.pdf • ACM SIGCOMM ’98, A Digital Fountain Approach to Reliable Distribution of Bulk Data http://www.sigcomm.org/sigcomm98/tp/abs_05.html