Performance Analysis of a Parallel Downloading Scheme from Mirror Sites Throughout the Internet
Allen Miu, Eugene Shih
6.892 Class Project, December 3, 1999
Overview
• Problem Statement
• Advantages/Disadvantages
• Operation of Paraloading
• Goals of Experiment
• Setup of Experiment
• Current Results
• Summary
• Questions
Problem Statement: Is “Paraloading” Good?
Paraloading is downloading a file from multiple mirror sites in parallel.
[Figure: a paraloader client connected to Mirror A, Mirror B, and Mirror C]
Advantages of Paraloading
• Performance is proportional to the realized aggregate bandwidth of the parallel connections
• Less prone to complete download failure than a single-connection download
• Facilitates dynamic load balancing among the parallel connections
• Facilitates reliable, out-of-order delivery (similar to Netscape)
Disadvantages of Paraloading
• Can be overly aggressive
• Consumes more server resources
• Incurs overhead for scheduling, maintaining buffers, and sending block request messages
• Only effective when mirror servers are available
Step 1: Obtain Mirror List
• Hard-coded
• DNS?
Step 2: Obtain File Length
Step 3: Send Block Requests
Step 4: Re-order
Step 5: Send Next Request
[Figure, repeated on each step slide: the paraloader and Mirrors A, B, and C; the Step 1 slide also shows the mirror list]
Goals of Experiment
• Main goal: compare the performance of serial and parallel downloading
• Verify the results of Rodriguez et al.
• Examine whether varying the degree of parallelism (the number of mirror servers used) affects performance
• Gain experience with paraloading and identify the issues involved in designing efficient paraloading systems
Experiment Setup
• Implemented a paraloader application in Java, using HTTP/1.1 (range requests and persistent connections)
• Files are downloaded at MIT from 3 different sets (kernel, mars, tucows) of 7 mirror servers
• Degrees of parallelism examined: M = 1, 3, 5, 7
• Downloaded a 1 MB and a 300 KB file (S = 1 MB, 300 KB) at 1-hour intervals for 7 days
• Block size = 32 KB
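
To illustrate how a file of the sizes above might be divided into 32 KB blocks and requested with HTTP/1.1 range requests, here is a minimal Python sketch. It is not the authors' Java implementation: the function names are hypothetical, and it assumes 1 KB = 1024 bytes, which the slides do not state.

```python
def block_ranges(file_length, block_size):
    """Return inclusive (first_byte, last_byte) pairs that cover the file.

    HTTP/1.1 byte ranges are inclusive at both ends, so a block that starts
    at offset `start` ends at `start + block_size - 1` (or at the last byte
    of the file, whichever comes first).
    """
    return [(start, min(start + block_size, file_length) - 1)
            for start in range(0, file_length, block_size)]


def range_header(first_byte, last_byte):
    """Format the value of an HTTP/1.1 Range request header for one block."""
    return f"bytes={first_byte}-{last_byte}"


KB = 1024  # assumption: the slides do not say whether KB means 1000 or 1024 bytes

blocks = block_ranges(300 * KB, 32 * KB)
print(len(blocks))                 # 10 (nine full blocks plus one 12 KB block)
print(range_header(*blocks[0]))    # bytes=0-32767
print(range_header(*blocks[-1]))   # bytes=294912-307199
```

Under the same assumption, the 1 MB file splits into exactly 32 blocks, so with M = 7 each connection would serve only a handful of blocks per download.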
Results
• Paraloading decreases download time relative to the average single-connection case
• Speedup is far from the optimal case (aggregate bandwidth)
• Gaps between block requests result in wasted bandwidth
• Gaps are proportional to RTT
• Congestion at the client? Possible but unlikely.
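
The observation that block request gaps waste bandwidth and grow with RTT can be illustrated with a simple back-of-the-envelope model. The sketch below assumes that a connection sits idle for one round trip between finishing a block and receiving the first byte of the next (no pipelining of requests) and that it transfers at a constant rate otherwise. This model, the function name, and the example numbers are illustrative assumptions, not measurements or analysis from the experiment.

```python
def link_utilization(block_bytes, rate_bytes_per_s, rtt_s):
    """Fraction of time a connection spends transferring data.

    Assumed model: every block costs one idle round trip (the request gap)
    plus block_bytes / rate seconds of actual transfer.
    """
    transfer_s = block_bytes / rate_bytes_per_s
    return transfer_s / (transfer_s + rtt_s)


# Hypothetical connection: 32 KB blocks at 100 KB/s, for two different RTTs.
for rtt in (0.02, 0.2):
    print(rtt, round(link_utilization(32 * 1024, 100 * 1024, rtt), 2))
```

In this model the idle share of each block cycle grows with RTT and shrinks with larger blocks, which is one reason the effect of block size on performance is worth examining.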
[Figure: results for S = 763 KB, B = 30, M = 4]
Acknowledgements
• Dave Anderson
• Dorothy Curtis
• Wendi Heinzelman
• WIND Group
Summary of Contributions
• Implemented a paraloader
• Verified that paraloading indeed provides a performance gain… sometimes
• Increasing the degree of parallelism improves overall performance
• Performance gains are not as good as those reported by Rodriguez et al.
Future Work
• Examine how block size affects performance gain
• Examine cost of paraloading
• Implement and test various optimization techniques
• Perform measurements at different client sites
Paraloading Will Not Be Effective In All Situations
• Clients should have enough “slack” bandwidth capacity to open more than one connection
• Parallel connections are bottleneck-disjoint
• Target data on mirror servers is consistent and static
• Security and authentication services are installed where appropriate
• Data transport is reliable
• Mirror locations are quickly and easily obtained
Step-by-step Process of the Block Scheduling Paraloading Scheme
1. Obtain a list of mirror sites
2. Open a connection to a mirror server and obtain the file length
3. Divide the file into blocks
4. Send a block request on each open connection
5. Wait for a response
6. Send a new block request on the first connection that finishes downloading a block
7. Loop back to step 5 until all blocks are retrieved
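
The block scheduling steps above can be sketched as a small event-driven simulation in Python. This is a hedged illustration, not the authors' Java paraloader: `paraload_schedule` and `block_times` are hypothetical names, and each mirror is assumed to need a fixed time per block (request round trip included), which real mirrors would not guarantee.

```python
import heapq


def paraload_schedule(num_blocks, block_times):
    """Simulate block scheduling across several mirror connections.

    block_times[i] is the assumed time in seconds that mirror i needs to
    serve one block request. Returns (finish_time, assignment), where
    assignment[b] is the index of the mirror that served block b.
    """
    assignment = [None] * num_blocks
    next_block = 0
    in_flight = []  # min-heap of (completion_time, mirror_index)

    # Step 4: send one block request on each open connection.
    for mirror, seconds in enumerate(block_times):
        if next_block == num_blocks:
            break
        assignment[next_block] = mirror
        heapq.heappush(in_flight, (seconds, mirror))
        next_block += 1

    finish_time = 0.0
    # Steps 5 to 7: wait for the earliest response, then reuse that connection.
    while in_flight:
        finish_time, mirror = heapq.heappop(in_flight)
        if next_block < num_blocks:
            assignment[next_block] = mirror
            heapq.heappush(in_flight, (finish_time + block_times[mirror], mirror))
            next_block += 1
    return finish_time, assignment


finish, served_by = paraload_schedule(7, [1.0, 2.0, 4.0])
print(finish)      # 4.0
print(served_by)   # [0, 1, 2, 0, 0, 1, 0]
```

In this example the fastest mirror ends up serving four of the seven blocks, which is the dynamic load balancing the scheme relies on; fetching all seven blocks serially from that mirror would take 7.0 seconds under the same assumptions. Because blocks are tracked by index, the client can place each one at its offset in the output regardless of arrival order.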
Paraloading is Not a Well-studied Concept
• Byers et al. proposed using Tornado codes to facilitate paraloading
• Rodriguez et al. proposed the block scheduling paraloading scheme used in our project