Cloud MapReduce: A MapReduce Implementation on top of a Cloud Operating System
Huan Liu, Dan Orban
Accenture Technology Labs
9962161 江嘉福, 100062228 徐光成, 100062229 章博遠
2011, 11th IEEE/ACM International Symposium on

OUTLINE
I. Introduction
II. Cloud MapReduce Architecture & Implementation
III. Pros & Cons of Cloud MapReduce
IV. Experimental Evaluation
V. Conclusions & Future Work
VI. References

INTRODUCTION
1. What is a Cloud OS?
2. Challenges posed by a cloud OS
3. Cloud MapReduce?
4. Advantages of Cloud MapReduce

What is a Cloud OS?
1. Manages the low-level cloud resources
2. Presents a high-level interface to application programmers
3. Key difference: it is scalable
[Figure 1]

Challenges posed by a cloud OS
1. Scalability comes at a price.
2. Data consistency, system availability, and tolerance to network partition cannot all be guaranteed at once (the CAP trade-off).
[Figure 2]

Cloud MapReduce?
1. Implements the MapReduce programming model
2. Relies on horizontal scaling
3. Works with eventual consistency
4. Overcomes the limitations of the cloud OS

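As a reminder of what the MapReduce programming model looks like, here is a minimal single-process sketch of word count in Python. It is not taken from Cloud MapReduce (which is written in Java); the function names `map_words`, `reduce_counts`, and `run_word_count` are made up for illustration, and the in-memory grouping step stands in for the shuffle that a real system distributes across nodes.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_words(document: str) -> Iterator[tuple[str, int]]:
    """Map step: emit a (word, 1) pair for every word in the document."""
    for word in document.lower().split():
        yield word, 1


def reduce_counts(word: str, counts: Iterable[int]) -> tuple[str, int]:
    """Reduce step: sum all the counts emitted for one word."""
    return word, sum(counts)


def run_word_count(documents: list[str]) -> dict[str, int]:
    """Run map, group by key (the 'shuffle'), then reduce, all in memory."""
    grouped: dict[str, list[int]] = defaultdict(list)
    for document in documents:
        for word, one in map_words(document):
            grouped[word].append(one)
    return dict(reduce_counts(word, counts) for word, counts in grouped.items())


if __name__ == "__main__":
    print(run_word_count(["the cloud os", "the cloud scales"]))
```

The user only writes the map and reduce functions; everything in `run_word_count` is the part a framework such as Hadoop or Cloud MapReduce takes over.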
Advantages of Cloud MapReduce
1. Incremental scalability: the number of computing nodes can be scaled incrementally.
2. Symmetry and decentralization: every node has the same set of responsibilities.
3. Heterogeneity: nodes may have varying computation capacity.

Cloud MapReduce Architecture and Implementation
1. The architecture
2. Cloud challenges
3. General solution approaches

The Architecture

Cloud challenges & general solution approaches
1. Long latency
2. Horizontal scaling
3. No way to know when a queue is created for the first time

Cloud challenges & general solution approaches (cont'd)
4. Duplicate messages
5. Potential node failure
6. Nondeterministic eventual consistency windows

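The slides list duplicate messages as a challenge without spelling out the remedy. One common approach for queues that may deliver a message more than once is to tag every message with a unique identifier and have the consumer discard tags it has already seen. The sketch below illustrates that general idea only; the `Message` type and its `producer_id` and `sequence` fields are hypothetical and are not claimed to match the Cloud MapReduce code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    producer_id: str  # hypothetical: ID of the task that wrote the message
    sequence: int     # hypothetical: per-producer sequence number
    payload: str


def drop_duplicates(messages: list[Message]) -> list[Message]:
    """Keep the first copy of each (producer_id, sequence) tag, in arrival order."""
    seen: set[tuple[str, int]] = set()
    unique: list[Message] = []
    for message in messages:
        tag = (message.producer_id, message.sequence)
        if tag in seen:
            continue  # a redelivered copy: ignore it
        seen.add(tag)
        unique.append(message)
    return unique


if __name__ == "__main__":
    received = [
        Message("map-1", 0, "cloud,1"),
        Message("map-1", 0, "cloud,1"),  # duplicate delivery
        Message("map-2", 0, "cloud,1"),  # same payload, different producer
    ]
    for message in drop_duplicates(received):
        print(message)
```

Deduplicating on the tag rather than on the payload matters for jobs like word count, where two distinct messages can legitimately carry identical content.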
Pros
• 3,000 lines of Java code (LOC) vs. 285,375 LOC for Hadoop
• Large & reliable FS
• High bandwidth (fast read/write)
• Single point of contact (high throughput)

Cons
• Uses only the network (no local storage)
• Leads to a bottleneck

Evaluation
Almost twice as fast!

Evaluation
• Hadoop: 385 s total, network/CPU underutilized
• CMR: 210 s, more efficient network/CPU usage

Evaluation: Wiki Word Count
• With combiner: Hadoop 747 s, CMR 436 s
• No combiner: Hadoop 1733 s, CMR 1247 s

Evaluation: Amazon
• Word Count -> 400 GB using 100 nodes
• Approx. 1 hr
• 983,152 requests -> $0.98
• Using SimpleDB?
• 3.7 hrs -> $0.52

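The request count and the bill on this slide imply a per-request price, which is easy to check. The short calculation below uses only the two figures from the slide; the resulting rate is an inference from those numbers, not a price quoted in the presentation, and the function name is made up for illustration.

```python
REQUESTS = 983_152      # requests reported on the slide
TOTAL_COST_USD = 0.98   # cost reported on the slide


def implied_cost_per_million(requests: int, total_cost_usd: float) -> float:
    """Return the implied price in USD per one million requests."""
    return total_cost_usd / requests * 1_000_000


if __name__ == "__main__":
    rate = implied_cost_per_million(REQUESTS, TOTAL_COST_USD)
    print(f"Implied rate: ${rate:.2f} per million requests")
```

The implied rate comes out at roughly one dollar per million requests, so the request traffic for a 400 GB word count costs about a dollar.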
Evaluation: Comparison
• Distributed Grep Word Count -> 13 GB of data
• CMR: 962 seconds
• Hadoop: 1047 seconds
• Results are almost the same. Why?
• The tasks are more CPU intensive

Evaluation: 12 GB, 923,670 HTML files
• Hadoop -> 6 hrs+
• CMR -> 297 seconds
• Hadoop: high overhead from task creation

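The timings on the evaluation slides can be compared as speedup ratios (Hadoop time divided by CMR time). The sketch below uses only the numbers reported above; the benchmark labels are informal descriptions chosen here, and the last entry treats "6 hrs+" as exactly 6 hours, so its ratio is a lower bound.

```python
# (Hadoop seconds, CMR seconds) as reported on the evaluation slides.
TIMINGS = {
    "385 s vs 210 s run": (385, 210),
    "Wiki word count, with combiner": (747, 436),
    "Wiki word count, no combiner": (1733, 1247),
    "13 GB comparison": (1047, 962),
    "923,670 HTML files (lower bound)": (6 * 3600, 297),
}


def speedup(hadoop_seconds: float, cmr_seconds: float) -> float:
    """Return how many times faster CMR finished than Hadoop."""
    return hadoop_seconds / cmr_seconds


if __name__ == "__main__":
    for name, (hadoop_s, cmr_s) in TIMINGS.items():
        print(f"{name}: {speedup(hadoop_s, cmr_s):.2f}x")
```

The ratios range from close to 1 on the CPU-intensive 13 GB job to more than 70 on the many-small-files job, where Hadoop's task creation overhead dominates.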
Conclusion
• Cloud cannot be implemented on any system
• Poor performance
• CMR techniques overcome cloud limitations
• No performance degradation
• Good to use for other systems

REFERENCES
Figure 1: http://techcrunch.com/
Figure 2: http://blog.csdn.net/zouqingfang/article/details/7269920
http://zh.wikipedia.org/
https://code.google.com/p/cloudmapreduce/
http://searchcloudcomputing.techtarget.com/definition/MapReduce
http://myblog-maurice.blogspot.tw/2012/08/nosqlcap.html