
MapReduce: Simplified Data Processing on Large Clusters



Presentation Transcript


  1. MapReduce: Simplified Data Processing on Large Clusters B97902029 葉彥廷 B97902083 林廷韋 B97902085 王頃恩

  2. Outline • Why we chose this topic • Introduction • Programming Model • Example • Implementation • Conclusion

  3. Why we chose this topic • Trend Micro 騰雲駕霧 ("Riding the Clouds") Programming Contest (2010) • A miserable memory from last summer vacation. • In the end, we did not manage to design a working distributed system. • So we want to learn more about the ideas behind cloud computing.

  4. Introduction(1) • How long can you stand searching for the answers to your automata homework? • A week? • A day? • Or do you ask Google for instant answers?

  5. Introduction(2) • But how can Google do it so fast? • Is Google good at automata? • It’s MapReduce!! • And what can MapReduce do?

  6. Introduction(3) • MapReduce can: • Simplify the processing of large amounts of data. • Split the work into independent tasks that can be computed across a distributed cluster. • The programmer only needs to implement the Map and Reduce interfaces, which takes little effort. • But how does it work?

  7. Programming Model(1) • Map function: • Takes one input KEY/VALUE pair. • Splits the VALUE into several intermediate key/value pairs using a user-defined implementation (which may or may not use the KEY). • Sends the intermediate key/value pairs to the Reduce functions.

  8. Programming Model(2) • Reduce function: • Receives the intermediate key/value pairs from the Map function, grouped by key. • Merges the values that share a key into a possibly smaller set of values. • The output from all workers is collected and shown to the user.
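The Map and Reduce steps on slides 7 and 8 can be sketched in a single process. This is only an illustration under stated assumptions, not the distributed implementation: `run_mapreduce`, `wc_map`, and `wc_reduce` are hypothetical names, word counting is a stand-in workload, and an in-memory dictionary stands in for sending intermediate pairs to the Reduce functions.

```python
from collections import defaultdict


def run_mapreduce(inputs, map_fn, reduce_fn):
    """Single-process stand-in for the Map -> group-by-key -> Reduce flow."""
    intermediate = defaultdict(list)
    # Map phase: each input (key, value) pair yields intermediate pairs.
    for key, value in inputs:
        for out_key, out_value in map_fn(key, value):
            # Grouping by key stands in for sending pairs to Reduce.
            intermediate[out_key].append(out_value)
    # Reduce phase: merge all values that share an intermediate key.
    return {k: reduce_fn(k, vs) for k, vs in intermediate.items()}


def wc_map(doc_name, text):
    """Hypothetical Map: emit <word, 1> for every word (doc_name is unused)."""
    for word in text.split():
        yield word, 1


def wc_reduce(word, counts):
    """Hypothetical Reduce: merge the counts for one word into a total."""
    return sum(counts)


if __name__ == "__main__":
    docs = [("doc1", "a b a"), ("doc2", "b c")]
    print(run_mapreduce(docs, wc_map, wc_reduce))
```

Because the user supplies only `map_fn` and `reduce_fn`, the same driver can be reused for other workloads without changes.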

  9. Example • Assume we have a log file of web page requests, along with its name. • We want to know which web pages appear in the log file and how often each is requested. • Map function • Input: <log file name, web page requests> • Output: <URL, 1> • Reduce function • Input: <URL, 1> • Output: <URL, total count>
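The example on slide 9 could look roughly like the sketch below. The slides do not specify a log format, so it assumes one request per line with the URL as the first whitespace-separated field. The function names are hypothetical, and sorting followed by `itertools.groupby` stands in for the step that routes all pairs for one URL to the same Reduce call.

```python
from itertools import groupby
from operator import itemgetter


def map_requests(log_name, requests):
    """Map: emit <URL, 1> for every request line in the log."""
    for line in requests.splitlines():
        parts = line.split()
        if parts:  # skip blank lines
            # Assumption: the URL is the first whitespace-separated field.
            yield parts[0], 1


def reduce_counts(url, counts):
    """Reduce: sum the 1s emitted for one URL."""
    return url, sum(counts)


def count_urls(log_name, requests):
    """Run Map, bring equal URLs together, then run Reduce per URL."""
    # Sorting makes equal URLs adjacent so groupby can collect them.
    pairs = sorted(map_requests(log_name, requests), key=itemgetter(0))
    return [
        reduce_counts(url, (count for _, count in group))
        for url, group in groupby(pairs, key=itemgetter(0))
    ]


if __name__ == "__main__":
    sample = "/index.html\n/about.html\n/index.html\n"
    for url, total in count_urls("access.log", sample):
        print(url, total)
```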

  10. Implementation(1)

  11. Implementation(2) • Master Data Structure • For each map and reduce task, the master stores its state and the identity of the worker machine. • Fault Tolerance • Worker Failure • Master Failure
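The master's bookkeeping and its reaction to a worker failure can be sketched as follows. The slide names only the stored fields. The three task states and the reset rules follow the original MapReduce paper. The class and function names are hypothetical, and the sketch assumes that map output sits on the worker's local disk while reduce output is written to a shared file system.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class State(Enum):
    IDLE = "idle"
    IN_PROGRESS = "in-progress"
    COMPLETED = "completed"


@dataclass
class Task:
    kind: str  # "map" or "reduce"
    state: State = State.IDLE
    worker: Optional[str] = None  # identity of the worker machine, if assigned


def handle_worker_failure(tasks: List[Task], failed_worker: str) -> None:
    """Reset the tasks that must be re-run after a worker stops responding."""
    for task in tasks:
        if task.worker != failed_worker:
            continue
        lost_in_progress = task.state is State.IN_PROGRESS
        # Assumption: completed map output lived on the failed machine's
        # local disk, so it is lost; completed reduce output is assumed safe.
        lost_map_output = task.kind == "map" and task.state is State.COMPLETED
        if lost_in_progress or lost_map_output:
            task.state = State.IDLE  # eligible for rescheduling
            task.worker = None
```

Resetting a task to idle is enough here, because an idle task is simply handed to the next available worker.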

  12. Implementation(3) • Locality • Read the input from local disk where possible, to keep network use low. • Task Granularity • Backup Tasks
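The locality idea can be illustrated with a small scheduling sketch. It assumes the master knows which machines hold a replica of each input split. `pick_worker` and its arguments are hypothetical names, and the fallback of taking the first idle worker is a simplification rather than the actual scheduling policy.

```python
from typing import List, Optional, Set, Tuple


def pick_worker(
    split_replicas: Set[str], idle_workers: List[str]
) -> Tuple[Optional[str], bool]:
    """Choose a worker for a map task; the flag says whether the read is local."""
    # Prefer an idle worker that already stores a replica of the input split.
    for worker in idle_workers:
        if worker in split_replicas:
            return worker, True
    # Otherwise fall back to any idle worker, which must read over the network.
    if idle_workers:
        return idle_workers[0], False
    return None, False


if __name__ == "__main__":
    print(pick_worker({"m2", "m5"}, ["m1", "m2", "m3"]))
```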

  13. Conclusion • Please DO NOT assign papers without informing us at the beginning of the semester. • Please stop FLIRTING with CHINA student. • Please PREPARE the course content instead of just discussing for 5 minutes. • Please OK?
