FastReplica: Efficient Large File Distribution within Content Delivery Networks

  1. FastReplica: Efficient Large File Distribution within Content Delivery Networks
  Lucy Cherkasova (HP Labs, Palo Alto) and Jangwon Lee (UT Austin)

  2. What Is the Problem?
  • Content Delivery Networks (CDNs):
    • a large-scale distributed network of servers,
    • servers are located closer to the edges of the Internet.
  • The main goals of a CDN architecture are to
    • minimize the network impact in the content delivery path,
    • overcome the server overload problem for popular sites.
  • Content distribution within CDNs, i.e. to the edge servers:
    • pull model: the performance penalty is insignificant for small/medium documents;
    • push model: active replication of the original content is desirable for large documents such as software download packages, media files, etc.
  • Replicating a large file to a large set of edge servers is a challenging and resource-intensive task!

  3. Content Distribution in the Internet Environment
  • Satellite distribution:
    • the content distribution server (or origin site) has a transmitting antenna,
    • replica servers (edge servers) have a satellite receiving dish,
    • the content distribution server broadcasts a file via a satellite channel,
    • requires special hardware; expensive.
  • Multicast distribution:
    • requires multicast support in routers,
    • not widely available across the Internet infrastructure.
  • Application-level multicast distribution:
    • nodes act as intermediate routers to distribute content along a predefined mesh or tree,
    • performance is limited by the bottleneck link in the path,
    • informed content delivery across adaptive overlay networks (SIGCOMM 2002).

  4. What Do We Propose? FastReplica
  Presentation Outline:
  • FastReplica in the small (algorithm core, applicable to 10-30 nodes)
  • Preliminary performance analysis of FastReplica in the small
  • FastReplica in the large (scaling the algorithm core to thousands of nodes)
  • Reliable FastReplica algorithm
  • Performance evaluation of the FastReplica prototype in a wide-area testbed

  5. FastReplica in the Small
  • Problem statement:
    • Let N0 be a node which has an original file F, and let Size(F) denote the size of file F in bytes;
    • Let R = {N1, ..., Nn} be a replication set of nodes. The problem consists in replicating file F across nodes N1, ..., Nn while minimizing the overall replication time.
    • Let the set N1, ..., Nn be in the range of 10-30 nodes.
  • File F is divided into n equal subsequent subfiles F1, ..., Fn, where Size(Fi) = Size(F) / n bytes for each i = 1, ..., n (see the sketch below).
  • FastReplica in the small consists of two steps:
    • distribution step,
    • collection step.
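  As a concrete sketch of the partitioning (our illustration in Python; partition is a hypothetical helper, not code from the paper), the subfile boundaries follow directly from Size(F) and n, with the last subfile absorbing any remainder:

```python
# Sketch: split file F into n equal subsequent subfiles F1, ..., Fn.
# Hypothetical helper, not the prototype's code; the last subfile
# absorbs the remainder when Size(F) is not divisible by n.
def partition(path, n):
    with open(path, "rb") as f:
        data = f.read()
    chunk = len(data) // n
    subfiles = [data[i * chunk:(i + 1) * chunk] for i in range(n - 1)]
    subfiles.append(data[(n - 1) * chunk:])  # Fn takes the remainder
    return subfiles
```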

  6. FastReplica in the Small: Distribution Step
  [Figure: origin node N0 holds file F = F1, F2, ..., Fn and sends subfile Fi to node Ni over n concurrent connections.]
  • Origin node N0 opens n concurrent connections to nodes N1, ..., Nn and sends to each node Ni the following items:
    • a distribution list of nodes R = {N1, ..., Nn} to which subfile Fi has to be sent on the next step;
    • subfile Fi.
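  A minimal sketch of the distribution step, assuming TCP connections and a simple length-prefixed framing for the (distribution list, subfile) pair; the wire format and helper names are our illustration, not the prototype's actual protocol:

```python
import json, socket, struct, threading

def send_msg(sock, payload):
    # Length-prefixed framing (illustrative, not the prototype's format).
    sock.sendall(struct.pack("!I", len(payload)) + payload)

def distribute(replica_set, subfiles):
    # replica_set: [(host, port), ...] for N1, ..., Nn; subfiles: [F1, ..., Fn].
    def send_one(i, addr):
        with socket.create_connection(addr) as sock:
            # Distribution list R = {N1, ..., Nn}, so Ni knows where to relay Fi.
            send_msg(sock, json.dumps(replica_set).encode())
            send_msg(sock, subfiles[i])  # subfile Fi
    threads = [threading.Thread(target=send_one, args=(i, addr))
               for i, addr in enumerate(replica_set)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```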

  7. FastReplica in the Small: Collection Step (View "from a Node")
  [Figure: node N1, having received F1 from N0, sends F1 to each of the remaining nodes N2, ..., Nn.]
  After receiving Fi, node Ni opens (n-1) concurrent network connections to the remaining nodes in the group and sends subfile Fi to them.
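  Continuing the sketch above, the collection step at node Ni relays its subfile to the other (n-1) group members (again illustrative; send_msg is the framing helper from the distribution sketch):

```python
import socket, threading

def collect(i, replica_set, subfile_i):
    # Node Ni relays subfile Fi to the other (n-1) nodes over
    # concurrent connections; send_msg is defined in the sketch above.
    def relay(addr):
        with socket.create_connection(addr) as sock:
            send_msg(sock, subfile_i)
    peers = [addr for j, addr in enumerate(replica_set) if j != i]
    threads = [threading.Thread(target=relay, args=(addr,)) for addr in peers]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```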

  8. FastReplica in the Small: Collection Step (View "to a Node")
  [Figure: node N1 receives the complementary subfiles F2, F3, ..., Fn from nodes N2, ..., Nn.]
  Thus each node Ni has:
  • (n - 1) outgoing connections for sending subfile Fi,
  • (n - 1) incoming connections from the remaining nodes in the group, delivering the complementary subfiles F1, ..., Fi-1, Fi+1, ..., Fn.

  9. What Is the Main Idea of FastReplica?
  Instead of the typical replication of the entire file F to n nodes using n Internet paths, FastReplica exploits (n x n) different Internet paths within the replication group, where each path is used for transferring 1/n-th of file F.
  Benefits:
  • The impact of congestion along any of the involved paths is limited to a transfer of 1/n-th of the file,
  • FastReplica takes advantage of the upload and download bandwidth of recipient nodes.

  10. Preliminary Performance Analysis of FastReplica in the Small
  • Two performance metrics: average and maximum replication time.
  • Idealistic setting: all the nodes and links are homogeneous, and each node can support n network connections to other nodes at B bytes/sec.
  Time_distr = Size(F) / (n x B)
  Time_collect = Size(F) / (n x B)
  FastReplica: Time_FR = Time_distr + Time_collect = 2 x Size(F) / (n x B)
  Multiple Unicast: Time_MU = Size(F) / B
  Replication_Time_Speedup = Time_MU / Time_FR = n / 2
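  Plugging illustrative numbers into these formulas (our example, not from the slides): n = 10 nodes, B = 1 MB/s, and a 100 MB file:

```python
# Idealized model from the slide: homogeneous nodes and links,
# each connection runs at B bytes/sec (numbers are illustrative).
n, B, size_f = 10, 1e6, 100e6

t_distr = size_f / (n * B)    # 10 s: each connection carries Size(F)/n
t_collect = size_f / (n * B)  # 10 s
t_fr = t_distr + t_collect    # FastReplica: 20 s
t_mu = size_f / B             # Multiple Unicast: 100 s
print(t_mu / t_fr)            # speedup = n / 2 = 5.0
```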

  11. Uniform-Random Model
  Let BW denote the bandwidth matrix, where BW[i][j] reflects the available bandwidth of the path from Ni to Nj.
  Let BW[i][j] = B x random(1, Var), where Var is the bandwidth variance.
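  A quick Monte Carlo sketch of this model (our illustration, not the paper's simulator; it approximates FastReplica's maximum replication time by the worst two-segment path, as discussed on the next slide):

```python
import random

def sim(n=10, B=1.0, var=10, size_f=1.0, trials=1000):
    # Index 0 is the origin N0; BW[i][j] = B * random(1, Var).
    mu_total = fr_total = 0.0
    for _ in range(trials):
        bw = [[B * random.uniform(1, var) for _ in range(n + 1)]
              for _ in range(n + 1)]
        # Multiple Unicast: the entire file over each path N0 -> Ni.
        mu_total += max(size_f / bw[0][i] for i in range(1, n + 1))
        # FastReplica: Fi over N0 -> Ni, then Fi over Ni -> Nj.
        fr_total += max(size_f / n / bw[0][i] + size_f / n / bw[i][j]
                        for i in range(1, n + 1)
                        for j in range(1, n + 1) if j != i)
    # Average maximum replication time under each scheme.
    return mu_total / trials, fr_total / trials
```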

  12. Maximum Latency Speedup under Uniform-Random Model
  [Figure: origin node N0 distributing subfiles F1, ..., Fn to nodes N1, ..., Nn.]
  Comparing the worst path transferring the entire file F against the worst two-segment path transferring 1/n-th of file F leads to an n/2 improvement in maximum latency.

  13. Example with Skewed Path Bandwidth
  [Figure: 10 recipient nodes N1, ..., N10; the path bandwidths are a mix of B and 0.1B, with cross-node connections on the slower (0.1B) side.]
  At first glance, the cross-node connections have significantly worse available bandwidth.
  Question: What is FastReplica's performance in this configuration?

  14. FastReplica Performance for the "Skewed" Example
  While the average replication time is almost the same under FastReplica and Multiple Unicast, the maximum replication time under FastReplica provides a 5x performance benefit!

  15. Modified Example
  [Figure: 10 recipient nodes N1, ..., N10; all paths from the origin node N0 have bandwidth B, all cross-node paths have bandwidth 0.1B.]
  Let all the connections from the origin node to the recipient nodes be B, while all the cross-node connections have an available bandwidth of 0.1B.
  Question: What is the performance of FastReplica in this configuration?

  16. FastReplica Performance for the Modified "Skewed" Example
  • In this configuration, FastReplica does not provide any performance benefits compared to Multiple Unicast.
  • The number n of nodes in FastReplica in the small plays an important role here: a larger value of n provides a higher "safety" level for FastReplica's performance.
  • A larger value of n helps to offset a higher difference between
    • the available bandwidth from the origin node to the nodes in the replication group, and
    • the available bandwidth within the replication group.

  17. FastReplica in the Large
  • Scaling process:
    • All the nodes are partitioned into groups of k nodes, where k is the number of network connections chosen for concurrent transfers between a single node and multiple receiving nodes.
    • Once a group of nodes receives the entire file F, its nodes act as origin nodes and replicate file F to the next set of nodes.
  • Example: let k = 10. In 3 iterations (each taking 2 steps: distribution and collection), the original file can be replicated to 1000 nodes (10 x 10 x 10), as sketched below.
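  The scaling arithmetic, as a sketch (our illustration): each iteration multiplies the number of nodes holding file F by k, so covering m nodes takes about ceil(log_k m) iterations:

```python
import math

def iterations_needed(m, k=10):
    # After i iterations, k**i nodes hold the entire file F
    # (each group of k fans out to k new groups).
    return math.ceil(math.log(m, k))

print(iterations_needed(1000, k=10))  # 3 iterations for 10 x 10 x 10 nodes
```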

  18. Reliable FastReplica
  • The basic algorithm is sensitive to node failures:
    • if node N1 fails during either the distribution or the collection step, then this event may impact all the nodes N2, ..., Nn in the group, because each node depends on node N1 to receive subfile F1;
    • if node N1 fails while it acts as an origin node, this failure impacts all of the replication groups in the dependent replication subtree.
  • Goal: to design an algorithm which efficiently deals with node failures by making local repair decisions within the particular group of nodes.

  19. Reliable FastReplica
  • Heartbeat group: the origin node and its recipient nodes:
    • the recipient nodes send heartbeat messages to the origin node: "I'm alive. I perform a distribution (or collection) step to nodes {Ni1, ..., Nij} in group G'."
  • Different failure modes of a node:
    • the node acts as an origin node;
    • the node acts as a recipient node performing a distribution/collection step.
  • If node N0' fails while acting as the origin node for replication group G', then G' should be "reattached" to a higher-level origin node N0, and N0 acts as a replacement node for N0'.
  [Figure: group G with origin N0 above group G' with origin N0' and nodes N1', ..., Ni', ..., Nk'.]

  20. Reliable FastReplica (cont.)
  If Ni' fails while acting as a recipient node during either the collection (or distribution) step, then N0' performs the following repair step:
  [Figure: N0' sends subfile Fi to the remaining nodes N1', N2', ..., Nk' on behalf of the failed node Ni'.]

  21. Reliable FastReplica (cont.)
  • The proposed algorithm handles a single node failure within a group with a minimal performance penalty.
  • The number of heartbeat messages in such a group is very small (because only the recipient nodes send heartbeat messages to their origin node). This structure significantly simplifies the protocol.
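  A minimal sketch of a heartbeat message (the field names and JSON encoding are our assumptions; the slides only specify the message's meaning):

```python
import json, time

def heartbeat(node_id, step, targets, group_id):
    # "I'm alive. I perform a distribution (or collection) step
    #  to nodes {Ni1, ..., Nij} in group G'."
    return json.dumps({
        "node": node_id,      # e.g. "N3'" (field names are illustrative)
        "step": step,         # "distribution" or "collection"
        "targets": targets,   # nodes currently being sent the subfile
        "group": group_id,    # e.g. "G'"
        "time": time.time(),  # lets the origin detect stale senders
    }).encode()
```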

  22. Performance Evaluation of the FastReplica Prototype in a Wide-Area Testbed
  Thanks to our summer interns, we built a wide-area testbed of 9 nodes and used it for the performance evaluation of the FastReplica prototype.

  23. Experimental Wide-Area Testbed
  [Figure: map showing the geographic location of hosts N0-N8.]

  24. Goals of Performance Study
  • We compare the following distribution schemes:
    • FastReplica in the small;
    • Sequential Unicast: approximates distribution via IP multicast; measures the transfer time of the entire file from the source to each recipient independently;
    • Multiple Unicast: simultaneously transfers the entire file to all the recipient nodes using concurrent connections.
  • We evaluate two metrics (see the sketch below):
    • average replication time,
    • maximum replication time.
  • We experimented with 9 files of different sizes: 80 KB, 750 KB, 1.5 MB, 3 MB, 4.5 MB, 6 MB, 7.5 MB, 9 MB, 36 MB.
  • Each point in the results averages 10 different runs, which were performed over a 10-day period.
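  Both metrics reduce to a simple aggregation over per-node completion times, e.g. (sketch, our illustration):

```python
def replication_metrics(start_time, completion_times):
    # completion_times: wall-clock time at which each recipient
    # node finished assembling the entire file F.
    times = [t - start_time for t in completion_times]
    return sum(times) / len(times), max(times)  # (average, maximum)
```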

  25. Average Replication Time
  n paths transferring the entire file vs (n x n) paths transferring only 1/n-th of the file.
  Congestion on any of the n paths from the origin node to the recipient nodes impacts both Multiple Unicast and Sequential Unicast. FastReplica uses any of those paths for transferring only 1/n-th of the file.
  FastReplica significantly outperforms Multiple Unicast and, in most cases, outperforms Sequential Unicast.

  26. Maximum Replication Time
  FastReplica significantly outperforms both Multiple Unicast and Sequential Unicast.
  The maximum replication time under Multiple Unicast and Sequential Unicast is much higher than the corresponding average replication time.

  27. FastReplica: Average and Maximum Replication Times
  The maximum and average replication times under FastReplica are very close. These results demonstrate the robustness and predictability of performance under the new strategy.

  28. FastReplica Performance (cont.)
  The figure shows the average replication time measured by different, individual recipient nodes for a 9 MB file and 8 nodes in the replication set.
  There is high variability of replication time under Multiple Unicast and Sequential Unicast. File replication times under FastReplica across different nodes in the replication set are much more stable and predictable.

  29. Average and Maximum Time Speedup under FastReplica
  FastReplica significantly outperforms Multiple Unicast. For a configuration of 8 nodes, the performance benefits are:
  • 4x (average) to 13x (maximum) for a 1.5 MB file,
  • 3.5x (average) to 5x (maximum) for a 9 MB file,
  • 4x (average) to 6.5x (maximum) for a 36 MB file.

  30. File Size Sensitivity Analysis
  The files of 80 KB and 750 KB are the smallest ones used in our experiments. For the 80 KB file, FastReplica is not efficient, while for the 750 KB file it becomes efficient. (These results depend on the number of nodes in the replication set!)

  31. Experiments with a Different Configuration
  • Additional analysis revealed that the available bandwidth of the paths between the origin node N0 (hp.com) and nodes N1, N2, ..., N7 (universities' machines) is significantly lower than the cross bandwidth between nodes N1, N2, ..., N7. Node N8 also had a very limited incoming bandwidth from N0, N1, ..., N7, while the outgoing bandwidth from N8 to N0, N1, ..., N7 was significantly higher.
  • Different configuration: let N1 (utexas.edu) be the origin node.
  • What is FastReplica's performance in the new configuration?

  32. FastReplica Speedup in the New Configuration
  In the new configuration, the average replication times under FastReplica and Multiple Unicast are similar, but the maximum speedup under FastReplica is significantly better than under Multiple Unicast.

  33. Conclusion and Future Directions
  • In this work, we introduce FastReplica for efficient and reliable replication of large files in the Internet environment.
  • FastReplica is simple and inexpensive. It does not require any changes or modifications to the existing Internet infrastructure, and it significantly reduces the file replication time.
  • Interesting future directions are:
    • how to better cluster nodes into replication groups?
    • how to build an efficient overlay tree on top of those groups?
    • designing ALM-FastReplica by combining FastReplica's ideas with ALM (Application-Level Multicast).

  34. Acknowledgements
  We would like to thank:
  • the HP Labs summer interns who helped us build the wide-area testbed: Yun Fu, Weidong Cui, Taehyun Kim, Kevin Fu, Zhiheng Wang, Shiva Chetan, Xiaoping Wei, and Jehan Wickramasuriya;
  • John Apostolopoulos for motivating discussions;
  • John Sontag for his active support of this work;
  • our shepherd Srinivasan Seshan and the anonymous referees for useful remarks and insightful questions. Their help is highly appreciated!
