Network Coding for Large Scale Content Distribution • Pablo Rodriguez, Microsoft Research • Christos Gkantsidis, Georgia Institute of Technology • IEEE INFOCOM 2005 • Presented by Ryan
Outline • Introduction • Related Works • Model for Cooperative Content Distribution • Performance Evaluation • Conclusion and Future Works
Introduction • Large Scale Content Distribution • Typical content distribution solutions • CDN – Content Delivery Network • Placing dedicated equipment around the network • e.g. Akamai • Cooperative content distribution solutions • Self-scalable • Preventing sudden surges of traffic to the source • e.g. BitTorrent
Introduction • Network Coding • Allowing intermediate nodes to encode packets • Making optimal use of the available network resources
Introduction • An example • Without a globally coordinated scheduler • Should Node B receive Packet 1 or Packet 2 from Node A?
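The dilemma can be made concrete with a small hypothetical scenario that is not spelled out on the slide: Node A holds Packets 1 and 2, and Node B will also obtain one of the two packets from some other neighbor, but nobody coordinates which. The sketch below (all names are illustrative) represents packets as coefficient bit-vectors over GF(2) and counts the equally likely cases in which B ends up able to recover both packets, once when A forwards a random original packet and once when A forwards the XOR of both.

```python
from itertools import product

P1, P2 = 0b01, 0b10   # Packets 1 and 2 as coefficient bit-vectors over GF(2)
CODED = P1 ^ P2       # network coding: A sends Packet 1 + Packet 2 (XOR)

def gf2_rank(vectors):
    """Rank over GF(2) of bit-vectors stored as ints, via an XOR basis."""
    basis = []
    for v in vectors:
        for b in basis:
            v = min(v, v ^ b)  # clear b's leading bit from v if it is set
        if v:
            basis.append(v)
    return len(basis)

def useful_fraction(a_options):
    """Share of equally likely cases in which B can recover both packets.

    Each case pairs the packet B gets from its other neighbor (1 or 2)
    with the packet A chooses from a_options.
    """
    cases = list(product([P1, P2], a_options))
    good = sum(gf2_rank(pair) == 2 for pair in cases)
    return good / len(cases)

print(useful_fraction([P1, P2]))  # no coding -> 0.5
print(useful_fraction([CODED]))   # network coding -> 1.0
```

Under these assumptions, an uncoordinated choice of original packet is redundant half the time, while the coded packet is useful whichever packet B receives elsewhere.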
Introduction • Contributions in the Paper • Proposing a practical system based on network coding • Does not require knowledge of the underlying topology or centralized scheduling • Robust to extreme situations, such as sudden departures of the server and nodes • Better performance than source coding and no-coding schemes
Related Works • Tree-Based Cooperative Systems • Creating and maintaining shortest-path multicast trees • Bandwidth-limited (by the bottleneck link on the path from the server) • e.g. SplitStream
Related Works • Mesh Cooperative Architectures • Improving the download rates by using parallel downloads • Under-utilizing the network resources (the same block traveling over multiple competing paths) • e.g. BitTorrent
Related Works • Erasure Codes • Reconstructing the original content of size n from any subset of roughly n symbols drawn from a large universe of encoded symbols • Network Coding • Previous results based on theoretical calculations (requiring detailed knowledge of the topology and a centralized scheduler)
The Model • Server • Dividing the file into k blocks • Uploading blocks at random to different clients • Clients (Users) • Collaborating with each other to assemble the blocks and reconstruct the original file • Exchanging information and data with only a small subset of others (neighbors) • Symmetric neighborhood and links
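The server's two duties on this slide, splitting the file into k blocks and uploading blocks at random, can be sketched as follows. The zero-padding of the last block, the `(client, block index)` output format, and both function names are assumptions made for illustration, not details from the paper.

```python
import random
from typing import List, Tuple

def split_into_blocks(data: bytes, k: int) -> List[bytes]:
    """Cut the file into k equal-size blocks, zero-padding the last one."""
    size = -(-len(data) // k)  # ceiling division
    padded = data.ljust(size * k, b"\0")
    return [padded[i * size:(i + 1) * size] for i in range(k)]

def server_round(k: int, clients: List[str], capacity: int,
                 rng: random.Random) -> List[Tuple[str, int]]:
    """One round of server uploads: `capacity` (client, block index) pairs chosen at random."""
    return [(rng.choice(clients), rng.randrange(k)) for _ in range(capacity)]

blocks = split_into_blocks(b"abcdefghij", 3)
print(blocks)  # [b'abcd', b'efgh', b'ij\x00\x00']
print(server_round(len(blocks), ["c1", "c2", "c3"], capacity=2, rng=random.Random(7)))
```

Because of the padding, a receiver would also need the original file length to trim the last block after reconstruction.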
The Model • Upon arrival • Contacting a centralized server (like the tracker in BitTorrent) to get a random list of users in the system • Connecting to the returned users to construct the neighborhood
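A minimal sketch of the join procedure, assuming a hypothetical in-memory `Tracker` that returns a uniform random sample of current users. Links are added in both directions to match the symmetric-neighborhood assumption; the defaults `join_degree=4` and `max_neighbors=6` are borrowed from the evaluation setup, and skipping peers that are already full is an assumption of this sketch.

```python
import random

class Tracker:
    """Hypothetical tracker: remembers which users are in the system."""
    def __init__(self, rng=None):
        self.users = []
        self.rng = rng or random.Random()

    def random_peers(self, count, exclude):
        pool = [u for u in self.users if u != exclude]
        return self.rng.sample(pool, min(count, len(pool)))

def join(node, tracker, neighbors, join_degree=4, max_neighbors=6):
    """Link a new node to random peers that still have room, then register it."""
    neighbors.setdefault(node, set())
    for peer in tracker.random_peers(join_degree, exclude=node):
        if len(neighbors[node]) >= max_neighbors:
            break
        if len(neighbors.setdefault(peer, set())) < max_neighbors:
            neighbors[node].add(peer)
            neighbors[peer].add(node)  # symmetric link
    tracker.users.append(node)

tracker, neighbors = Tracker(random.Random(0)), {}
for n in range(20):
    join(n, tracker, neighbors)
print({n: sorted(p) for n, p in neighbors.items() if n < 3})
```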
The Model • Content Propagation • 1) No Coding • 2) Source Coding • 3) Network Coding
The Model • No Coding and Source Coding • Based only on local information for deciding which block to transfer • Random • A random block • Local Rarest • The rarest block in the neighborhood
The Model • e.g. BitTorrent system • A combination of the Random and Local Rarest schemes • Random for the first few blocks • Local Rarest afterwards
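The Random, Local Rarest, and BitTorrent-style hybrid policies can be sketched in one function for the no-coding case. Rarity is estimated only from the block sets of the node's own neighbors, ties are broken at random, and the switch-over point `random_first=4` is a hypothetical value standing in for "the first few blocks".

```python
import random
from collections import Counter

def pick_block(my_blocks, uploader_blocks, neighbor_holdings,
               random_first=4, rng=random):
    """Choose which block to download from one neighbor (no coding).

    my_blocks: block ids this node already has.
    uploader_blocks: block ids the uploading neighbor has.
    neighbor_holdings: one set of block ids per neighbor, used to estimate rarity.
    random_first: hypothetical switch-over point from Random to Local Rarest.
    """
    candidates = sorted(set(uploader_blocks) - set(my_blocks))
    if not candidates:
        return None  # the neighbor has nothing new for us
    if len(my_blocks) < random_first:
        return rng.choice(candidates)  # Random phase
    counts = Counter(b for held in neighbor_holdings for b in held)
    rarest = min(counts[b] for b in candidates)
    # Local Rarest phase, ties broken at random
    return rng.choice([b for b in candidates if counts[b] == rarest])

holdings = [{5, 6, 7, 8}, {5, 6, 7}, {6, 9}]
print(pick_block(set(range(5)), set(range(10)), holdings, rng=random.Random(1)))
# prints 8 or 9 (each is held by a single neighbor)
```

Note that every decision uses only local information, which is exactly why these schemes can pick a block that turns out to be redundant elsewhere in the network.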
The Model • Network Coding • The node generates and sends a linear combination of all the information available to it
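A sketch of the encoding step, assuming arithmetic over the prime field GF(65537) for simplicity (practical systems typically use a binary extension field such as GF(2^8) or GF(2^16)). Each coded block carries its coefficient vector relative to the k original blocks; a node combines everything it holds with random nonzero weights, and the same function works for the server and for clients re-encoding already-coded blocks. The function names and the nonzero-weight choice are illustrative.

```python
import random

P = 65537  # assumption: a prime field instead of GF(2^q)

def source_blocks(originals):
    """Wrap the k original blocks as coded blocks with unit coefficient vectors."""
    k = len(originals)
    return [([1 if i == j else 0 for j in range(k)], list(b))
            for i, b in enumerate(originals)]

def encode(held, rng=random):
    """Random linear combination of all (non-empty list of) coded blocks a node holds."""
    k, n = len(held[0][0]), len(held[0][1])
    coeffs, payload = [0] * k, [0] * n
    for vec, data in held:
        c = rng.randrange(1, P)  # random nonzero weight
        for j in range(k):
            coeffs[j] = (coeffs[j] + c * vec[j]) % P
        for j in range(n):
            payload[j] = (payload[j] + c * data[j]) % P
    return coeffs, payload

def consistent(block, originals):
    """Check that the payload equals the combination its coefficients describe."""
    coeffs, payload = block
    expected = [sum(c * b[j] for c, b in zip(coeffs, originals)) % P
                for j in range(len(payload))]
    return expected == payload

originals = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
held = source_blocks(originals)                # what the server starts with
b1 = encode(held, random.Random(0))            # server-generated coded block
b2 = encode([b1, held[0]], random.Random(1))   # re-encoding at a client
print(consistent(b1, originals), consistent(b2, originals))  # True True
```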
The Model • The original file is recovered after receiving k blocks whose associated coefficient vectors are linearly independent • Simply solve the resulting system of linear equations
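Solving that linear system can be sketched with Gauss-Jordan elimination over the coefficient matrix augmented with the payloads. As in any self-contained sketch here, the prime field GF(65537) is an assumption, and `decode` is a hypothetical helper that returns `None` while the received coefficient vectors do not yet span all k dimensions.

```python
P = 65537  # assumption: same prime field as the encoding sketch

def decode(blocks):
    """Recover the k original blocks from (coeffs, payload) pairs, or None if rank < k."""
    k = len(blocks[0][0])
    rows = [[x % P for x in list(vec) + list(data)] for vec, data in blocks]
    rank = 0
    for col in range(k):
        pivot = next((r for r in range(rank, len(rows)) if rows[r][col]), None)
        if pivot is None:
            return None  # not enough linearly independent blocks yet
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        inv = pow(rows[rank][col], -1, P)  # modular inverse (Python 3.8+)
        rows[rank] = [(x * inv) % P for x in rows[rank]]
        for r in range(len(rows)):
            if r != rank and rows[r][col]:
                f = rows[r][col]
                rows[r] = [(a - f * b) % P for a, b in zip(rows[r], rows[rank])]
        rank += 1
    return [row[k:] for row in rows[:k]]

coded = [([1, 1], [40, 60]),    # block 1 + block 2
         ([1, 2], [70, 100])]   # block 1 + 2 * block 2
print(decode(coded))            # [[10, 20], [30, 40]]
print(decode([([1, 1], [40, 60]), ([2, 2], [80, 120])]))  # None: dependent vectors
```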
The Model • Incentive Mechanisms • Discouraging free-riding • Scheme 1 • Preference for mutual exchanges • Scheme 2 (Tit-for-tat) • Bounding the absolute difference between the blocks a node uploads to and downloads from each neighbor
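The tit-for-tat rule (Scheme 2) can be sketched as a per-neighbor ledger that refuses an upload if it would push |uploaded - downloaded| past a bound. The default bound of 2 matches the value used in the incentive experiments; the class and method names are hypothetical.

```python
from collections import defaultdict

class ExchangeLedger:
    """Per-neighbor block counters for a tit-for-tat check (hypothetical helper)."""
    def __init__(self, max_diff=2):
        self.max_diff = max_diff
        self.uploaded = defaultdict(int)
        self.downloaded = defaultdict(int)

    def may_upload_to(self, peer):
        # One more upload must keep |uploaded - downloaded| within the bound.
        return abs(self.uploaded[peer] + 1 - self.downloaded[peer]) <= self.max_diff

    def record_upload(self, peer):
        self.uploaded[peer] += 1

    def record_download(self, peer):
        self.downloaded[peer] += 1

ledger = ExchangeLedger()
ledger.record_upload("b")
ledger.record_upload("b")
print(ledger.may_upload_to("b"))  # False: a third unreciprocated block would exceed 2
ledger.record_download("b")
print(ledger.may_upload_to("b"))  # True
```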
Performance Evaluation • Round-based simulator • Input • Overlay topology • Users’ upload and download capacities • Server’s capacity • Capacity: number of blocks that can be downloaded/uploaded in a single round • Size of file to distribute • Metric • Download finish time
Performance Evaluation • Connecting to 4 peers when joining • Max number of neighbors = 6 • Discovering new neighbors when the utilization of the download capacity is below a certain threshold (10%)
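The neighbor-discovery trigger can be sketched as a per-round check using the parameters above (10% utilization threshold, at most 6 neighbors). The function name is hypothetical, and never exceeding the cap (rather than, say, dropping an idle neighbor) is an assumption of this sketch.

```python
def should_discover(download_used, download_capacity, n_neighbors,
                    threshold=0.10, max_neighbors=6):
    """True if a node should ask the tracker for more neighbors this round."""
    utilization = download_used / download_capacity if download_capacity else 0.0
    return utilization < threshold and n_neighbors < max_neighbors

print(should_discover(0, 1, 4))  # True: idle download link, room for more neighbors
print(should_discover(1, 1, 4))  # False: download capacity fully used
print(should_discover(0, 8, 6))  # False: already at the neighbor cap
```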
Performance Evaluation • Homogeneous topologies • 200 users with capacity = 1 • Server’s capacity = 1 • File size = 100 blocks • [Figure: results for No Coding, Source Coding, and Network Coding]
Performance Evaluation • Topologies with clusters • Two clusters, 100 users each • Capacity • Within cluster = 8 • Cluster to cluster = 4 • Server • Capacity = 4 • Departing at round 30 • File size = 100 blocks
Performance Evaluation • [Figure: results for No Coding, Source Coding, and Network Coding]
Performance Evaluation • Heterogeneous capacities • 10 fast users with capacity = 4 • 190 slow users with capacity = 1 • Server’s capacity = 4 • File size = 400 blocks • [Figure: results for No Coding, Source Coding, and Network Coding]
Performance Evaluation • Minimum finish time for the fast users = 50 rounds
Performance Evaluation • Dynamic Arrivals • 40 new (empty) nodes arrive every 20 rounds • Capacity = 1 • Nodes stay in the system for 10 more rounds after finishing • Server’s capacity = 1 • File size = 100 blocks
Performance Evaluation • Robustness to node departures
Performance Evaluation • Leaving after serving 5% extra blocks • Network coding: 100% finish • Source coding: 40% finish • No coding: 10% finish • [Figure: results for Network Coding, Source Coding, and No Coding]
Performance Evaluation • Incentive mechanisms • Max difference = 2 (tit-for-tat)
Conclusion • A new content distribution system • Does not require knowledge of the whole network topology • Content propagation is easy to schedule • Good performance in simulations • Download finish time • Robust to server and user departures • Avalanche – a real system implementation using network coding
Future Works • Speed of encoding and decoding • Encoding: $O(k)$ block operations per coded block • Decoding: inverting the coefficient matrix $O(k^3)$, reconstructing the file $O(k^2)$ block operations • Dominated by reconstruction, since each of its operations touches a full block • Many reads of large blocks from the hard disk • Protection against malicious nodes • Introducing arbitrary blocks • Making the reconstruction of the original file impossible
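Why reconstruction dominates can be seen from rough symbol-operation counts, ignoring constant factors. Inversion works only on the k x k coefficients, while reconstruction combines full blocks. The example uses k = 100 (the file size in most of the simulations) and a hypothetical block of 65,536 symbols; the function is an illustrative cost model, not a measurement.

```python
def coding_costs(k, block_symbols):
    """Rough symbol-operation counts (constant factors ignored)."""
    encode_one = k * block_symbols        # one coded block: k scaled adds of full blocks
    invert = k ** 3                       # Gauss-Jordan on the k x k coefficient matrix
    reconstruct = k * k * block_symbols   # k output blocks, each a k-term combination
    return encode_one, invert, reconstruct

enc, inv, rec = coding_costs(k=100, block_symbols=65_536)
print(enc, inv, rec)  # 6553600 1000000 655360000
print(rec // inv)     # 655
```

Whenever the block size is much larger than k, the $O(k^2)$ full-block term outweighs the $O(k^3)$ inversion, and every one of those operations reads block data, which is where the disk traffic comes from.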