1 / 24

New Algorithms for Planning Bulk Transfer via Internet and Shipping Networks

New Algorithms for Planning Bulk Transfer via Internet and Shipping Networks. Brian Cho Indranil Gupta University of Illinois at Urbana-Champaign. Motivation: Ad-hoc Data Processing. Data-intensive research on OpenCirrus Federated cloud: diverse geographic locations

gina
Download Presentation

New Algorithms for Planning Bulk Transfer via Internet and Shipping Networks

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. New Algorithms for Planning Bulk Transfervia Internet and Shipping Networks Brian Cho Indranil Gupta University of Illinois atUrbana-Champaign

  2. Motivation: Ad-hoc Data Processing • Data-intensive research on OpenCirrus • Federated cloud: diverse geographic locations • Data scale of TBs • Limited wide area bandwidth is a big bottleneck : Can take days or weeks to transfer over internet [Garfinkel 07] • Success story: Washington Post • Hillary Clinton White House schedule • Released as 17,481 pages non-searchable PDF images • Convert to searchable text and deliver to newsroom within the same news cycle • Done within 26 hours with Amazon AWS • Pay for bandwidth and computer usage

  3. Bulk Transfer Options • Internet Transfer • Grid: [GridFTP] • PlanetLab: [CoBlitz 06] • Disk Shipping Transfer • [Jim Gray 03] • [PostManet 04] • [DOT 06] • Amazon AWS Import/Export • Pandora (People and networks moving data around) • First ever solution to transfer data cooperatively between multiple sources with internet and shipping edges • Produce optimal transfer plans that obey time deadlines and minimize dollar cost • Better than internet-only and shipping-only strategies

  4. Option 1: Internet Transfer 5-20 Mbps 1TB: 5-20 days $0.10 per GB Computation Provider (Amazon) Data Source (CMU) No Cost Data Source (Illinois)

  5. Option 2: Disk Shipping Transfer Overnight: $60 per Disk Two-Day: $30 per Disk Ground: $10 per Disk Disk Interface 40 MB/s $0.02 per GB $80 per Disk Computation Provider (Amazon) Data Source (CMU) Overnight: $40 per Disk Two-Day: $15 per Disk Ground: $5 per Disk Data Source (Illinois) Overnight: $50 per Disk Two-Day: $25 per Disk Ground: $5 per Disk

  6. Cooperative Transfer Solutions • Good solutions • Meet deadlines • Minimize dollar cost • Complexity • Global scale • Many strategies • Collaboration helps • How to find the best solution? Open Cirrus Sites

  7. Example: Minimize Dollar Cost 0.8 TB Data Source A Cloud Service Provider Total Cost: $125 Total Time: 20 Days 15 Days No Cost Data Source B Loading: $40 Handling: $80 5 Days . 1.2 TB Ground: $5 14 hours

  8. Example: Meet Deadline (3 days)while Minimizing Dollar Cost 0.8 TB Data Source A Cloud Service Provider 1 Day Total Cost: $210 Total Time: 3 Days Overnight: $40 6 hours Data Source B Loading: $40 Handling: $80 1 Day . 1.2 TB Overnight: $50 . 14 hours

  9. Outline • Motivation • Problem Formulation • Graph Model • Flow Over Time • Solution: Pandora • Experimental Results • Conclusion

  10. Graph Model: Internet Links Capacity (Mb/s) Cost ($/GB) Transit time (almost instantaneous) Incoming/ Outgoing BW inet_out inet_out inet_in inet_in Site A Site B

  11. Graph Model: Shipment Links Capacity (Mb/s) Cost ($/GB) Transit time (almost instantaneous) Incoming/ Outgoing BW inet_out inet_out inet_in inet_in ship_in ship_in Site A Site B Capacity (almost infinite) Cost: Shipping and Handling ($/Disk) Transit time (Hrs) Disk Interface BW e.g., 40 MB/s Cost: Loading ($/GB)

  12. Data Transfer Over Time • Goal: Meet time deadline T while minimizing dollar cost C • Hard problem on graph with both Internet and Shipment links • NP-Hard • Formal problem and proof in paper • Solution: Pandora computes optimal and approximate solutions

  13. Solution: Pandora Overview • Transform into static time-expanded network • Decomposition of shipping edges • Solve min-cost flow on static network • Mixed Integer Program • Optimizations to reduce computation time

  14. Time-expanded Network • Intuitively, incorporate time into graph to create an extended graph representation • Make T=deadlinecopies of each vertex • Draw edges according to transit time • Draw holdover edges • [Ford Fulkerson 58] • Disk shipment represented as time-expanded network τ = 3 τ = 1 T = 5 time

  15. Decomposed Shipping Edges • Decompose shipping edges to fixed cost edges • Transit time • Fixed cost • Capacity capacity = 2 TB cap = 2 TB cost = $130 cost = $110 cost = $100 capacity = 2 TB

  16. Solution: Min-cost Flow Calculation using Mixed-Integer Program • Fixed-cost edges make min-cost flow calculation NP-Hard • Mixed-Integer Program (MIP) • Binary variable yedefined on fixed-cost edges • Goal: Minimize dollar cost • Subject to • Capacity constraints (flowe ≤ capacitye ∙ ye) • Conservation of flow • Demands of sources and sink • Proof of NP-Hardness and formal MIP in paper

  17. Optimizations: Overview • Size of MIP grows linearly with deadline T • Worst-case running time grows exponentially with T • Reduce size of the MIP • Reduce number of shipment edges • Δ -condensed time-expanded networks • More optimizations in paper

  18. Optimizations: Reduce numberof shipment edges • Can remove redundant shipment edges • Example: • Overnight shipment sent anytime before 4pm will arrive at destination at 8am 8am 7am 4pm 3pm 2pm 1pm noon

  19. Optimization: Δ-condensedTime-expanded Network • Each batch of consecutive Δ time units condensed into one virtual time unit • Solution has • Minimum cost • Deadline approximation depending on Δ • More details in paper • [Fleischer Skutella 07] Δ = 2

  20. Experimental Setup • Trace-driven • Wrote scripts to communicate with FedEx web services: queried package rates and destination time • Internet BW from PlanetLab measurements • GNU Linear Programming Kit (GLPK)

  21. Experimental Results:8 sources, 0.25 TB per node, Heterogeneous BW • Direct Internet • Cost: $200 • Time: 280 hrs • Cannot take advantage of heterogeneous bandwidth • Direct Overnight • Cost: $1,500 • Time: 38hrs • Cannot fill disks to capacity 4 5 3 6 2 7 1 8 x 8 Width proportional to BW t 0.25 TB

  22. Experimental Results:8 sources, 0.25 TB per node, Heterogeneous BW • Pandora Deadline=96hrs • Cost: $183 • Time: < 96 hrs • Direct Internet • Cost: $200 • Time: 280 hrs • Cannot take advantage of heterogeneous bandwidth • Direct Overnight • Cost: $1,500 • Time: 38hrs • Cannot fill disks to capacity 3 2 4 1 5 6 0.06 TB 0.08 TB 7 8 t 0.14 TB 1.92 TB

  23. Experimental Results: Optimizations • Reducing shipment edges decreases computation time • Using Δ-condensed time-expanded networks decreases computation time • Deadlines met in our experiments 2 sources 1 source

  24. Conclusion • First ever solution to transfer data cooperatively between multiple sources with internet and shipping edges • Produce optimal transfer plans that obey time deadlines and minimize dollar cost • Better than internet-only and shipping-only strategies • Reasonable computation time by using optimizations

More Related