
Cloud MapReduce: A MapReduce Implementation on top of a Cloud Operating System



Presentation Transcript


  1. Cloud MapReduce: A MapReduce Implementation on top of a Cloud Operating System. Huan Liu, Dan Orban, Accenture Technology Labs. 9962161 江嘉福, 100062228 徐光成, 100062229 章博遠. 2011, 11th IEEE/ACM International Symposium on

  2. OUTLINE: I. Introduction II. Cloud MapReduce Architecture & Implementation III. Pros & Cons of Cloud MapReduce IV. Experimental Evaluation V. Conclusions & Future Work VI. References

  3. INTRODUCTION: 1. What is a cloud OS? 2. Challenges posed by a cloud OS 3. What is Cloud MapReduce? 4. Advantages of Cloud MapReduce

  4. What is a cloud OS? 1. It manages the low-level cloud resources. 2. It presents a high-level interface to application programmers. 3. Key difference from a traditional OS: it is scalable. (Figure 1)
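As a concrete picture of the "high-level interface" idea, a cloud OS exposes services such as a message queue rather than raw machines. The Python sketch below is a hypothetical, single-process stand-in for such a queue, loosely modeled on the send/receive/delete pattern with a visibility timeout used by services like Amazon SQS. `ToyCloudQueue` and its methods are illustrative names, not a real API.

```python
import itertools
import time
from typing import Callable, Dict, Optional, Tuple


class ToyCloudQueue:
    """Hypothetical in-memory queue. Receiving a message hides it for
    `visibility_timeout` seconds instead of removing it; only delete() removes it."""

    def __init__(self, visibility_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._timeout = visibility_timeout
        self._clock = clock
        self._next_id = itertools.count()
        # message id -> (body, time at which the message becomes visible again)
        self._messages: Dict[int, Tuple[str, float]] = {}

    def send(self, body: str) -> int:
        msg_id = next(self._next_id)
        self._messages[msg_id] = (body, float("-inf"))  # visible immediately
        return msg_id

    def receive(self) -> Optional[Tuple[int, str]]:
        """Return one visible (id, body) and hide it, or None if nothing is visible."""
        now = self._clock()
        for msg_id, (body, visible_at) in self._messages.items():
            if visible_at <= now:
                self._messages[msg_id] = (body, now + self._timeout)
                return msg_id, body
        return None

    def delete(self, msg_id: int) -> None:
        """Called by a worker only after it has finished processing the message."""
        self._messages.pop(msg_id, None)


if __name__ == "__main__":
    fake_now = [0.0]
    q = ToyCloudQueue(visibility_timeout=10.0, clock=lambda: fake_now[0])
    q.send("map task 1")
    print(q.receive())  # a worker takes the task
    print(q.receive())  # hidden from other workers while it is being processed
    fake_now[0] = 11.0  # the worker died without calling delete()
    print(q.receive())  # the task is visible again for another worker
```

One consequence of this design is that a crashed worker needs no explicit failure detector: its unfinished task simply reappears after the timeout. The flip side is that a slow but still alive worker can cause the same task to be processed twice.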

  5. Challenges posed by a cloud OS: 1. Scalability comes at a price. 2. Data consistency, system availability, and tolerance to network partitions cannot all be guaranteed at the same time. (Figure 2)

  6. What is Cloud MapReduce? 1. An implementation of the MapReduce programming model 2. Scales horizontally 3. Works under eventual consistency 4. Overcomes the cloud OS limitations
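For readers new to the model, the classic word count shows the three steps any MapReduce implementation, CMR included, has to provide: map, shuffle (group by key) and reduce. This single-process Python sketch only illustrates the programming model; it is not CMR's code, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

Mapper = Callable[[str, str], Iterable[Tuple[str, int]]]
Reducer = Callable[[str, List[int]], Tuple[str, int]]


def map_word_count(doc_id: str, text: str) -> Iterator[Tuple[str, int]]:
    """Map: emit (word, 1) for every word of one input record."""
    for word in text.lower().split():
        yield word, 1


def reduce_word_count(word: str, counts: List[int]) -> Tuple[str, int]:
    """Reduce: add up every partial count emitted for one word."""
    return word, sum(counts)


def run_mapreduce(records: Iterable[Tuple[str, str]],
                  mapper: Mapper, reducer: Reducer) -> Dict[str, int]:
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in records:                     # map phase
        for out_key, out_value in mapper(key, value):
            groups[out_key].append(out_value)      # shuffle: group values by key
    return dict(reducer(k, vs) for k, vs in groups.items())  # reduce phase


if __name__ == "__main__":
    docs = [("d1", "the cat sat"), ("d2", "The dog sat")]
    print(run_mapreduce(docs, map_word_count, reduce_word_count))
```

In a distributed implementation the shuffle step is the expensive part, because intermediate pairs must move between nodes; in CMR that movement goes through cloud queues.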

  7. Advantages of Cloud MapReduce: 1. Incremental scalability: it can scale incrementally in the number of computing nodes. 2. Symmetry and decentralization: every node has the same set of responsibilities. 3. Heterogeneity: nodes may have varying computation capacity.
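The symmetry and heterogeneity points fit a pull-based design: every node runs the same loop and takes the next task from a shared queue, so a faster node simply takes more tasks, and adding a node needs no reconfiguration. The Python sketch below shows this general pattern, with threads standing in for nodes and `queue.Queue` standing in for a cloud queue; it is an assumed illustration, and the node names and speeds are made up.

```python
import queue
import threading
import time
from typing import Dict, List, Tuple


def worker(name: str, tasks: "queue.Queue[int]",
           done: "queue.Queue[Tuple[str, int]]", seconds_per_task: float) -> None:
    """Every node runs this same loop; no master assigns tasks (symmetry)."""
    while True:
        try:
            task = tasks.get_nowait()
        except queue.Empty:
            return                    # no work left: the node simply stops
        time.sleep(seconds_per_task)  # stands in for the node's compute speed
        done.put((name, task))


def run_pool(num_tasks: int, speeds: Dict[str, float]) -> List[Tuple[str, int]]:
    tasks: "queue.Queue[int]" = queue.Queue()
    for t in range(num_tasks):
        tasks.put(t)
    done: "queue.Queue[Tuple[str, int]]" = queue.Queue()
    threads = [threading.Thread(target=worker, args=(n, tasks, done, s))
               for n, s in speeds.items()]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    # All workers have exited, so qsize() is exact here.
    return [done.get() for _ in range(done.qsize())]


if __name__ == "__main__":
    results = run_pool(20, {"fast-node": 0.001, "slow-node": 0.01})
    per_node: Dict[str, int] = {}
    for name, _ in results:
        per_node[name] = per_node.get(name, 0) + 1
    print(per_node)  # the faster node usually ends up with most of the tasks
```

Incremental scalability in this picture is just starting one more worker against the same queue.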

  8. Cloud MapReduce Architecture and Implementation: 1. The architecture 2. Cloud challenges 3. General solution approaches

  9. The Architecture

  10. Cloud challenges & general solution approaches: 1. Long latency 2. Horizontal scaling 3. Not knowing when a queue is created for the first time
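Two of these challenges can be illustrated together. Horizontal scaling means spreading intermediate data over many queues instead of one fast server, and one common technique, assumed here rather than taken from the slides, is stable hash partitioning of keys over N reduce queues. Not knowing whether a queue already exists is sidestepped if queue creation is idempotent, so every node can simply call it. `QueueRegistry`, `route` and the `reduce-<n>` naming are hypothetical.

```python
import zlib
from typing import Dict, Iterable, List, Tuple


def partition(key: str, num_reduce_queues: int) -> int:
    """Stable hash, so every mapper on every node routes a key to the same queue.
    (Python's built-in hash() of str is randomized per process, so it is avoided.)"""
    return zlib.crc32(key.encode("utf-8")) % num_reduce_queues


class QueueRegistry:
    """Hypothetical queue service whose create call is idempotent: any node may
    call create_queue() without knowing whether the queue exists yet."""

    def __init__(self) -> None:
        self._queues: Dict[str, List[Tuple[str, int]]] = {}

    def create_queue(self, name: str) -> List[Tuple[str, int]]:
        return self._queues.setdefault(name, [])


def route(pairs: Iterable[Tuple[str, int]], registry: QueueRegistry,
          num_reduce_queues: int) -> None:
    """Send each intermediate (key, value) pair to its reduce queue."""
    for key, value in pairs:
        q = registry.create_queue(f"reduce-{partition(key, num_reduce_queues)}")
        q.append((key, value))


if __name__ == "__main__":
    reg = QueueRegistry()
    route([("cat", 1), ("dog", 1), ("cat", 1)], reg, num_reduce_queues=4)
    print(reg.create_queue(f"reduce-{partition('cat', 4)}"))
```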

  11. (Cont'd) 4. Duplicate messages 5. Potential node failures 6. Indeterministic eventual consistency windows
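The duplicate-message, node-failure and consistency-window challenges can be handled together on the reduce side. The Python sketch below assumes a hypothetical protocol, which may differ from CMR's actual one: every message carries a unique id and the map attempt that produced it, and each finished map task records its winning attempt and how many messages it sent to this reduce queue. A reducer then drops messages from other attempts and repeated deliveries, and keeps polling until the expected count has arrived, because an eventually consistent queue may not show every message yet.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set, Tuple


@dataclass(frozen=True)
class Message:
    msg_id: str    # unique per message; a redelivered copy keeps the same id
    map_task: int
    attempt: int   # which execution of the map task produced this message
    key: str
    value: int


def collect(messages: Iterable[Message],
            committed: Dict[int, Tuple[int, int]],
            num_map_tasks: int) -> Tuple[List[Message], bool]:
    """`committed` maps map_task -> (winning attempt, messages it sent to this queue).

    Returns the accepted messages and whether the input is complete. False means
    keep polling: some map tasks are unfinished or messages are not yet visible."""
    seen: Set[str] = set()
    accepted: List[Message] = []
    for m in messages:
        winner = committed.get(m.map_task)
        if winner is None or m.attempt != winner[0]:
            continue   # from a failed attempt, or its task has not committed yet
        if m.msg_id in seen:
            continue   # duplicate delivery of a message already accepted
        seen.add(m.msg_id)
        accepted.append(m)
    expected = sum(count for _, count in committed.values())
    complete = len(committed) == num_map_tasks and len(accepted) == expected
    return accepted, complete
```

Counting expected messages turns an unknown consistency window into a simple stopping rule: the reducer never has to guess how long to wait.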

  12. Pros • 3,000 lines of Java code (LOC) vs. 285,375 LOC for Hadoop • Large, reliable file system • High bandwidth (fast reads/writes) • Single point of contact (high throughput)

  13. Cons • Uses only the network (no local storage) • This can lead to a bottleneck

  14. Evaluation: almost twice as fast!

  15. Evaluation • Hadoop: 385 s total, network/CPU underutilized • CMR: 210 s, more efficient network/CPU usage

  16. Evaluation: Wiki Word Count • With combiner: Hadoop 747 s, CMR 436 s • Without combiner: Hadoop 1733 s, CMR 1247 s
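Both systems slow down without a combiner, which is what one would expect: a combiner pre-aggregates map output locally, so fewer intermediate key-value pairs have to cross the network (for CMR, through the queues). The Python sketch below counts the pairs one map call would emit with and without local combining; the function names are hypothetical.

```python
from collections import Counter
from typing import List, Tuple


def map_without_combiner(text: str) -> List[Tuple[str, int]]:
    """Every word occurrence becomes its own intermediate (word, 1) pair."""
    return [(word, 1) for word in text.lower().split()]


def map_with_combiner(text: str) -> List[Tuple[str, int]]:
    """A combiner sums counts locally first, so each distinct word is emitted once."""
    return list(Counter(text.lower().split()).items())


if __name__ == "__main__":
    text = "to be or not to be"
    print(len(map_without_combiner(text)), len(map_with_combiner(text)))  # 6 4
```

The total count is unchanged; only the number of pairs that have to be shipped shrinks, and the saving grows with how often words repeat within one map input.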

  17. Evaluation on Amazon • Word count over 400GB using 100 nodes • Approx. 1 hr • 983,152 requests -> $0.98 • Using SimpleDB: 3.7 hrs -> $0.52
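The cost figures can be reproduced with simple arithmetic under assumptions the slide does not state: that the 983,152 requests are Amazon SQS requests billed at $0.01 per 10,000, and that the 3.7 hrs are SimpleDB machine-hours billed at $0.14 each. Both rates are assumed prices from that era, not current pricing. With them, the Python sketch below gives the slide's $0.98 and $0.52.

```python
from decimal import Decimal, ROUND_HALF_UP

# Assumed historical prices, not taken from the slides.
SQS_USD_PER_10K_REQUESTS = Decimal("0.01")
SIMPLEDB_USD_PER_MACHINE_HOUR = Decimal("0.14")


def to_cents(amount: Decimal) -> Decimal:
    """Round a dollar amount to whole cents."""
    return amount.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


def sqs_cost(requests: int) -> Decimal:
    return to_cents(Decimal(requests) / 10_000 * SQS_USD_PER_10K_REQUESTS)


def simpledb_cost(machine_hours: Decimal) -> Decimal:
    return to_cents(machine_hours * SIMPLEDB_USD_PER_MACHINE_HOUR)


if __name__ == "__main__":
    print(sqs_cost(983_152))              # 0.98
    print(simpledb_cost(Decimal("3.7")))  # 0.52
```

`Decimal` is used instead of `float` so the cent rounding is exact.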

  18. Evaluation comparison • Distributed Grep Word Count -> 13GB of data • CMR: 962 seconds • Hadoop: 1047 seconds • Why are the results almost the same? • The tasks are more CPU-intensive

  19. Evaluation: 12GB in 923,670 HTML files • Hadoop -> 6+ hrs • CMR -> 297 seconds • Hadoop: high overhead from task creation
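The headline "almost twice as fast" and the other comparisons can be checked by dividing each Hadoop time by the matching CMR time. The job labels below are shorthand for slides 15 to 19. Because the Hadoop time for the HTML job is given only as "6hrs+", 6 hours is a lower bound, so that ratio is a lower bound too.

```python
# Hadoop vs. CMR running times in seconds, as reported on slides 15-19.
RESULTS = {
    "slide 15 job": (385, 210),
    "wiki word count, combiner": (747, 436),
    "wiki word count, no combiner": (1733, 1247),
    "13GB comparison": (1047, 962),
    "923,670 HTML files (lower bound)": (6 * 3600, 297),
}


def speedup(hadoop_s: float, cmr_s: float) -> float:
    """How many times faster CMR finished than Hadoop."""
    return hadoop_s / cmr_s


if __name__ == "__main__":
    for job, (hadoop_s, cmr_s) in RESULTS.items():
        print(f"{job}: {speedup(hadoop_s, cmr_s):.2f}x")
```

This gives 1.83x, 1.71x, 1.39x, 1.09x and at least 72.73x. The first ratio matches the "almost twice as fast" headline, and the 13GB comparison shows how small the gap becomes for the more CPU-intensive job.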

  20. Conclusion • Not every system can be implemented directly on a cloud OS • Doing so naively gives poor performance • CMR's techniques overcome the cloud OS limitations • No performance degradation • The techniques are good to use for other systems too

  21. REFERENCES • Figure 1: http://techcrunch.com/ • Figure 2: http://blog.csdn.net/zouqingfang/article/details/7269920 • http://zh.wikipedia.org/ • https://code.google.com/p/cloudmapreduce/ • http://searchcloudcomputing.techtarget.com/definition/MapReduce • http://myblog-maurice.blogspot.tw/2012/08/nosqlcap.html
