
Beyond Music Sharing: An Evaluation of Peer-to-Peer Data Dissemination Techniques in Large Scientific Collaborations. Thesis defense: Samer Al-Kiswany.


Presentation Transcript


  1. Beyond Music Sharing: An Evaluation of Peer-to-Peer Data Dissemination Techniques in Large Scientific Collaborations. Thesis defense: Samer Al-Kiswany

  2. Introduction
     • Data-intensive science: large-scale simulations and new scientific instruments generate huge volumes of data (petabytes).
     • User communities: large and geographically dispersed.
     Requirement: efficient data dissemination tools.

  3. Introduction - Example

  4. Question
     Data dissemination solutions: IP-Multicast, Bullet, BitTorrent, SPIDER, OMNI, ALMI, Logistical-Multicast, Narada, Scribe, Grido, FastReplica, and many others.
     What data dissemination strategies perform best in today's Grid deployments?

  5. Roadmap
     What data dissemination strategies perform best in today's Grid deployments?
     • Workload characteristics
     • Deployment platform characteristics
     • Proposed data dissemination solutions
     • Evaluation
     • Recommendations

  6. Workload and Deployment Platform
     Data-intensive scientific collaboration characteristics:
     • Scale of data: massive data collections (terabytes).
     • Data usage: uniform popularity distributions and co-usage.
     • Near-real-time processing.
     Deployment platform characteristics:
     • Resource availability: low churn rate, high node availability, well-provisioned networks.
     • Collaborative environments: no freeriding, so less effort is needed to enforce fair resource sharing.

  7. Roadmap
     What data dissemination strategies perform best in today's Grid deployments?
     • Workload characteristics
     • Deployment platform characteristics
     • Proposed data dissemination solutions
     • Evaluation
     • Recommendations

  8. Classification of Approaches
     Base cases:
     • IP-Multicast.
     • Parallel transfers: separate data channels from the source to each destination.

  9. Separate Transfer from the Source to Every Destination
     Drawbacks:
     • Overwhelms the source; does not scale.
     • Generates high duplicate traffic on the links around the source.
     • Does not exploit all available transport capacity.
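To make the "does not scale" drawback on slide 9 concrete, here is a minimal back-of-the-envelope sketch. It assumes the source uplink is the only bottleneck and is shared equally among the destinations; the function name and the numbers are illustrative and do not come from the thesis.

```python
def separate_transfer_time(file_size_mb, source_uplink_mbps, num_destinations):
    """Seconds until every destination has the file, assuming (hypothetically)
    that the source uplink is the only bottleneck and is split equally."""
    # Each destination gets an equal slice of the source uplink (megabits/s).
    per_destination_rate = source_uplink_mbps / num_destinations
    # Convert megabytes to megabits, then divide by the per-destination rate.
    return file_size_mb * 8 / per_destination_rate


# Illustrative numbers only: a 1000 MB file over a 1000 Mbit/s uplink.
one = separate_transfer_time(1000, 1000, 1)
ten = separate_transfer_time(1000, 1000, 10)
print(one, ten)
```

Under these assumptions the completion time grows linearly with the number of destinations, which is the scaling problem the slide points at.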

  10. IP Multicasting
      [Figure: network topology with link capacities labeled 10 and 5]

  11. IP Multicast
      Drawbacks:
      • Limited deployment.
      • Vulnerability to node failures.
      • Does not exploit all available transport capacity.
      • Throughput is limited by the bottleneck link.
      [Figure: network topology with link capacities labeled 10 and 5]
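The bottleneck drawback on slide 11 can be sketched as follows, assuming a single-rate multicast session in which every receiver is served at the same rate. The link capacities 10 and 5 echo the labels in the slide figure, but the topology and node names below are hypothetical.

```python
def multicast_rate(tree_link_capacity):
    """Rate of a single-rate multicast session: the slowest link in the
    distribution tree caps the rate seen by every receiver."""
    return min(tree_link_capacity.values())


# Hypothetical distribution tree; each key is a (from, to) link.
links = {
    ("source", "r1"): 10,
    ("r1", "a"): 10,
    ("r1", "r2"): 5,   # the bottleneck link
    ("r2", "b"): 10,
}
print(multicast_rate(links))
```

In this sketch receiver "a" is held to the bottleneck rate even though its own path from the source has twice that capacity.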

  12. Tree-Based Techniques: Application-Level Multicast (ALM)
      [Figure: a source and nodes 1 to 6, shown with the ALM tree built over them]

  13. Tree-Based Techniques: Application-Level Multicast (ALM)
      Drawbacks:
      • Vulnerability to node failures.
      • Does not exploit all possible routes in the network.
      [Figure: a source and nodes 1 to 6, shown with the ALM tree built over them]

  14. Swarming Techniques: BitTorrent and Bullet
      [Figure: a complete file split into blocks 1 to 4]

  15. Swarming Techniques: BitTorrent and Bullet
      [Figure: the complete file (blocks 1 to 4) and peers holding different subsets of its blocks]

  16. Swarming Techniques: BitTorrent and Bullet
      Drawbacks:
      • Generates high duplicate traffic.
      [Figure: the complete file (blocks 1 to 4) and peers holding different subsets of its blocks]
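The block-exchange idea pictured on slides 14 to 16 can be illustrated with a toy round-based model. This is only a sketch under simplifying assumptions (each node uploads at most one block per round, each peer downloads at most one block per round, deterministic peer and block choice); it is not the actual BitTorrent or Bullet protocol, and `swarm_rounds` is a hypothetical helper.

```python
def swarm_rounds(num_peers, num_blocks, peers_upload=True):
    """Rounds until every peer holds every block in a toy model where each
    node uploads at most one block per round and each peer downloads at
    most one block per round."""
    all_blocks = set(range(num_blocks))
    have = {p: set() for p in range(num_peers)}
    rounds = 0
    while any(have[p] != all_blocks for p in have):
        rounds += 1
        # Peers may only forward blocks they held when the round started.
        snapshot = {p: set(blocks) for p, blocks in have.items()}
        receiving = set()  # peers that already downloaded in this round
        uploaders = [("source", all_blocks)]
        if peers_upload:
            uploaders += [(p, snapshot[p]) for p in range(num_peers)]
        for uploader, blocks in uploaders:
            for p in range(num_peers):
                if p == uploader or p in receiving:
                    continue
                missing = blocks - have[p]
                if missing:
                    have[p].add(min(missing))  # send lowest missing block
                    receiving.add(p)
                    break  # this uploader is done for the round
    return rounds


source_only = swarm_rounds(3, 4, peers_upload=False)
with_swarming = swarm_rounds(3, 4, peers_upload=True)
print(source_only, with_swarming)
```

With three peers and four blocks, the source-only run needs one round per (peer, block) pair, while letting peers forward the blocks they already hold finishes in fewer rounds.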

  17. Logistical Multicasting

  18. Roadmap: Evaluation
      Question: What data dissemination strategies perform best in today's Grid deployments?
      Evaluation approaches:
      • Analytical modeling
      • Deployment-based
      • Simulation

  19. Methodology
      Inputs:
      • Real topologies of three deployed Grid testbeds: LCG, GridPP, EGEE.
      • Generated topologies: 100 (using BRITE).
      Simulator design:
      • Block-level simulation.
      • Simulates physical-layer link contention.
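Slide 19 says only that the simulator models link contention at the physical layer. One simple way such contention can be modeled is sketched below, assuming every link splits its capacity equally among the flows crossing it and each flow runs at the smallest share along its path. This equal-share rule, the function name, and the example topology are assumptions for illustration, not the thesis simulator's actual design.

```python
from collections import Counter


def flow_rates(link_capacity, flow_paths):
    """Per-flow rate under an equal-share contention model (an assumption):
    each link divides its capacity evenly among the flows that cross it,
    and a flow is limited by its most contended link."""
    # Count how many flows cross each link.
    load = Counter(link for path in flow_paths.values() for link in path)
    return {
        flow: min(link_capacity[link] / load[link] for link in path)
        for flow, path in flow_paths.items()
    }


# Hypothetical topology: two flows share the core link, then split.
capacity = {"core": 10, "edge_a": 10, "edge_b": 10}
paths = {"to_a": ["core", "edge_a"], "to_b": ["core", "edge_b"]}
print(flow_rates(capacity, paths))
```

In a block-level simulation, rates like these would determine how long each block takes to cross the network before the next block is scheduled.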

  20. Methodology

  21. Transfer Time
      Figure: Number of destinations that have completed the file transfer, for the original EGEE topology.

  22. Transfer Time: With Reduced Core-Link Bandwidth
      Figure: Number of destinations that have completed the file transfer, for the EGEE topology with core bandwidth reduced to 1/8 of the original.
      Conclusions:
      • On well-provisioned topologies, even naïve algorithms perform well.
      • On constrained topologies, application-level techniques perform uniformly well: they are among the first to finish the transfer, with good intermediate progress.

  23. Summary
      Motivating question: What data dissemination strategies perform best in today's Grid deployments?
      In this project, we:
      • Simulated representative solutions, taking into account the characteristics of the workload and the deployed platforms.
      • Obtained results that provide guidelines for selecting a data dissemination technique, depending on the:
        • Target environment.
        • Overall system workload characteristics.
        • Success criteria.

  24. Research Publications
      This work resulted in two refereed publications and one journal submission:
      • Beyond Music Sharing: An Evaluation of Peer-to-Peer Data Dissemination Techniques in Large Scientific Collaborations, S. Al-Kiswany, M. Ripeanu, A. Iamnitchi, and S. Vazhkudai, submitted to the Journal of Grid Computing.
      • Are P2P Data-Dissemination Techniques Viable in Today's Data-Intensive Scientific Collaborations?, S. Al-Kiswany, M. Ripeanu, A. Iamnitchi, and S. Vazhkudai, Euro-Par 2007, France (acceptance rate = 26%).
      • A Simulation Study of Data Distribution Strategies for Large-scale Scientific Data Collaborations, S. Al-Kiswany and M. Ripeanu, IEEE CCECE 2007.

  25. Other Research Work
      I am involved in two other research projects:
      Scavenged Storage System
      • stdchk: A Checkpoint Storage System for Desktop Grid Computing
      • A High-Performance GridFTP Server at Desktop Cost
      StoreGPU
      • Exploiting the GPU for computationally intensive storage system operations.

  26. Thank you www.ece.ubc.ca/~samera
