MapReduce: Simplified Data Processing on Large Clusters 2009-21146 Lim JunSeok
Contents 1. Introduction 2. Programming Model 3. Structure 4. Performance & Experience 5. Conclusion
Introduction • What is MapReduce? • A simple and powerful interface that enables automatic parallelization and distribution of large-scale computations. • A programming model • executes computations in a distributed manner • exploits a large set of commodity computers • for large data sets (> 1 TB) • with an underlying runtime system that • parallelizes the computation across large-scale clusters of machines • handles machine failures • schedules inter-machine communication to make efficient use of the network and disks
Motivation • Want to process lots of data (> 1 TB) • E.g. • Raw data: crawled documents, Web request logs, … • Derived data: inverted indices, summaries of the number of pages, the set of most frequent queries in a given day • Want to parallelize across hundreds/thousands of CPUs • And, want to make this easy [Figures: The Digital Universe 2009-2020; Google Data Centers – Distributed File System]
Motivation • Application: Sifting through large amounts of data • Used for • Generating the Google search index • Clustering problems for Google News and Froogle products • Extraction of data used to produce reports of popular queries • Large-scale graph computation • Large-scale machine learning • … [Figures: Google Search; PageRank; Machine learning]
Motivation • Platform: clusters of inexpensive machines • Commodity computers (15,000 machines in 2003) • Scale to large clusters: thousands of machines • Data distributed and replicated across the machines of the cluster • Recovery from machine failure • E.g. Hadoop, Google File System [Figures: Google File System; Hadoop]
Programming Model
MapReduce Programming Model • MapReduce framework • Partitioning function: hash(key) mod R (default) • Results in well-balanced partitions • The partitioning function and R can be specified by users [Figure: Map → Partitioning function → Reduce]
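As a rough sketch (illustrative names, not the actual library API), the default partitioner and a user-specified variant along the lines of the paper's hostname example could look like this in Python:

```python
# Sketch of the default partitioning function described above: hash(key) mod R.
# Names here are illustrative; they are not the real MapReduce library API.
from urllib.parse import urlparse

def default_partition(key: str, R: int) -> int:
    """Assign an intermediate key to one of R reduce partitions."""
    return hash(key) % R

# A user-specified alternative: partition URL keys by hostname so that all
# keys from the same host end up in the same output file.
def hostname_partition(url_key: str, R: int) -> int:
    return hash(urlparse(url_key).hostname) % R
```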
MapReduce Programming Model • Map phase • Local computation • Process each record independently and locally • Reduce phase • Aggregate the filtered output [Figure: Commodity computers → Map → Local Storage → Reduce → Result]
Example: Word Counting • Input • File 1: "Hello World Bye SQL" • File 2: "Hello Map Bye Reduce" • Map procedure • File 1 → <Hello, 1> <World, 1> <Bye, 1> <SQL, 1> • File 2 → <Hello, 1> <Map, 1> <Bye, 1> <Reduce, 1> • Partitioning function • <Hello, {1,1}> <World, 1> <Map, 1> <Bye, {1,1}> <SQL, 1> <Reduce, 1> • Reduce procedure • <Hello, 2> <World, 1> <Map, 1> <Bye, 2> <SQL, 1> <Reduce, 1>
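A minimal single-process Python sketch of this word-count pipeline (the map_fn/reduce_fn names and the in-memory shuffle are illustrative, not the library's real API):

```python
# Minimal in-memory word count in the MapReduce style (illustrative sketch).
from collections import defaultdict
from typing import Dict, Iterator, List, Tuple

def map_fn(filename: str, contents: str) -> Iterator[Tuple[str, int]]:
    # Emit <word, 1> for every word in the input record.
    for word in contents.split():
        yield (word, 1)

def reduce_fn(word: str, counts: List[int]) -> Tuple[str, int]:
    # Sum all counts emitted for the same word.
    return (word, sum(counts))

files = {
    "File 1": "Hello World Bye SQL",
    "File 2": "Hello Map Bye Reduce",
}

# Map phase: run map_fn over every input file.
intermediate: Dict[str, List[int]] = defaultdict(list)
for name, text in files.items():
    for word, count in map_fn(name, text):
        intermediate[word].append(count)   # shuffle/group by key

# Reduce phase: aggregate the grouped values.
result = dict(reduce_fn(w, counts) for w, counts in intermediate.items())
print(result)  # {'Hello': 2, 'World': 1, 'Bye': 2, 'SQL': 1, 'Map': 1, 'Reduce': 1}
```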
Example: PageRank • PageRank review: • Link analysis algorithm • $PR(p_i) = \frac{1-d}{|P|} + d \sum_{p_j \in M(p_i)} \frac{PR(p_j)}{L(p_j)}$ • $P$: set of all Web pages • $M(p_i)$: set of pages that link to the page $p_i$ • $L(p_j)$: the total number of links going out of $p_j$ • $PR(p_i)$: the PageRank of page $p_i$
Example: PageRank • Key ideas for MapReduce • The PageRank calculation depends only on the PageRank values of the previous iteration • The PageRank calculation for each Web page can be processed in parallel • Algorithm: • Map: Provide each page's PageRank 'fragments' to the pages it links to • Reduce: Sum up the PageRank fragments for each page
Example: PageRank • Key ideas for MapReduce [Figure]
Example: PageRank • PageRank calculation with 4 pages [Figure]
Example: PageRank • Map phase: Provide each page's PageRank 'fragments' to the pages it links to [Figures: PageRank fragment computation of pages 1 and 2]
Example: PageRank • Map phase: Provide each page's PageRank 'fragments' to the pages it links to [Figures: PageRank fragment computation of pages 3 and 4]
Example: PageRank • Reduce phase: Sum up the PageRank fragments for each page
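A compact Python sketch of one PageRank iteration in the MapReduce style; the 4-page link graph, the function names, and the damping constant below are illustrative assumptions, not the slides' actual example:

```python
# One PageRank iteration in MapReduce style (illustrative sketch).
# The 4-page link graph below is a made-up example, not the one from the slides.
from collections import defaultdict

graph = {            # page -> pages it links to
    1: [2, 3],
    2: [3],
    3: [1, 4],
    4: [1],
}
ranks = {page: 0.25 for page in graph}   # initial PageRank of each page
d = 0.85                                  # damping factor
N = len(graph)

def map_fn(page, out_links):
    # Emit a PageRank 'fragment' PR(page)/L(page) to every page it links to.
    share = ranks[page] / len(out_links)
    for target in out_links:
        yield (target, share)

def reduce_fn(page, fragments):
    # Sum the incoming fragments and apply the PageRank formula.
    return (page, (1 - d) / N + d * sum(fragments))

# Map phase + shuffle (group fragments by target page).
intermediate = defaultdict(list)
for page, out_links in graph.items():
    for target, share in map_fn(page, out_links):
        intermediate[target].append(share)

# Reduce phase: new PageRank for every page.
ranks = dict(reduce_fn(p, frags) for p, frags in intermediate.items())
print(ranks)
```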
Execution Overview (1) Split the input files into M pieces of 16-64 MB per piece. Then start up many copies of the program (2) The master is special: the rest are workers that are assigned work by the master • M map tasks and R reduce tasks (3) Map phase • The assigned worker reads the corresponding input split • Parses the input data into key/value pairs • Produces intermediate key/value pairs
Execution Overview (4) Buffered pairs are written to local disk, partitioned into R regions by the partitioning function • The locations are passed back to the master • The master forwards these locations to the reduce workers (5) Reduce phase 1: read and sort • Each reduce worker reads the intermediate data for its partition • Sorts the intermediate key/value pairs to group data by the same key
Execution Overview (6) Reduce phase 2: reduce function • Iterate over the sorted intermediate data and pass each key and its values to the reduce function • The output is appended to a final output file for this reduce partition (7) Return to user code • The master wakes up the user program • The MapReduce call returns back to the user code
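Putting the steps together, a single-process Python sketch that mimics split → map → partition into R regions → sort → reduce → return; the helper names and the choice of R are illustrative assumptions, not the real implementation:

```python
# Single-process sketch of the execution flow: split -> map -> partition -> sort -> reduce.
from collections import defaultdict

def run_mapreduce(inputs, map_fn, reduce_fn, R=2):
    # (3) Map phase: each "map task" processes one input split and emits
    #     intermediate pairs; (4) pairs are partitioned into R regions.
    regions = [defaultdict(list) for _ in range(R)]
    for split_id, record in enumerate(inputs):
        for key, value in map_fn(split_id, record):
            regions[hash(key) % R][key].append(value)

    # (5)+(6) Reduce phase: each "reduce task" sorts its region by key and
    #         applies the reduce function, producing one output per task.
    outputs = []
    for r, region in enumerate(regions):
        out = {key: reduce_fn(key, values) for key, values in sorted(region.items())}
        outputs.append(out)          # stands in for output file r
    return outputs                   # (7) return to user code

# Example usage with word-count map/reduce functions.
word_map = lambda _sid, text: ((w, 1) for w in text.split())
word_reduce = lambda _key, values: sum(values)
print(run_mapreduce(["Hello World Bye SQL", "Hello Map Bye Reduce"], word_map, word_reduce))
```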
Failure Tolerance • Worker failure: handled via re-execution • Failure detection: heartbeat • The master pings every worker periodically • Handling failure: re-execution • Map tasks: • Re-execute both completed and in-progress map tasks, since their output is stored on the failed machine's local disk • Reset the state of these map tasks and re-schedule them • Reduce tasks: • Re-execute only in-progress reduce tasks • Completed reduce tasks do NOT need to be re-executed • Their results are stored in the global file system
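A minimal sketch of this re-execution rule; the task/worker data structures are assumed for illustration, not the actual implementation:

```python
# Sketch of the master's re-execution rule when a worker stops responding
# to heartbeats (illustrative data structures).

def handle_worker_failure(failed_worker, tasks):
    """tasks: list of dicts with 'kind' ('map'/'reduce'), 'state', 'worker'."""
    for task in tasks:
        if task["worker"] != failed_worker:
            continue
        if task["kind"] == "map":
            # Completed and in-progress map output lives on the lost local disk,
            # so both are reset to idle and re-scheduled on another worker.
            task["state"] = "idle"
        elif task["state"] == "in-progress":
            # Only in-progress reduce tasks are reset; completed reduce output
            # already sits in the global file system.
            task["state"] = "idle"
```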
Failure Tolerance • Master failure: • Job state is checkpointed to the global file system • A new master recovers and continues the tasks from the checkpoint • Robust to large-scale worker failure: • Simply re-execute the tasks! • Simply start a new master! • E.g. • Lost 1600 of 1800 machines once, but finished fine
Locality • Network bandwidth is a relatively scarce resource • Input data is stored on the local disks of the machines • GFS divides each file into 64 MB blocks • Stores several copies of each block on different machines • Local computation: • The master takes into account the location of the input data's replicas • A map task is scheduled on the machine whose local disk contains a replica of the input data • Failing that, the master schedules the map task near a replica • E.g.: on a worker on the same network switch • Most input data is read locally and consumes no network bandwidth
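A rough sketch of locality-aware scheduling; the replica and rack lookups below are assumed helpers for illustration, not real GFS or MapReduce APIs:

```python
# Sketch of locality-aware map task scheduling (illustrative only).

def pick_worker_for_map(split_replicas, idle_workers, rack_of):
    """split_replicas: machines holding a replica of the input split;
    idle_workers: workers currently free; rack_of: machine -> rack id."""
    # 1) Prefer a worker that already stores a replica of the split (no network I/O).
    for worker in idle_workers:
        if worker in split_replicas:
            return worker
    # 2) Otherwise prefer a worker on the same rack/switch as some replica.
    replica_racks = {machine: rack_of[machine] for machine in split_replicas}.values()
    for worker in idle_workers:
        if rack_of[worker] in replica_racks:
            return worker
    # 3) Fall back to any idle worker.
    return next(iter(idle_workers), None)
```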
Task Granularity • Fine-granularity tasks • Many more map tasks than machines • Improves dynamic load balancing and speeds up recovery: the completed map tasks of a failed worker can be spread out across all the other worker machines • Practical bounds on the size of M and R • The master makes O(M + R) scheduling decisions • The master keeps O(M × R) state in memory • The constant factors for memory usage are small • One piece of the state is approximately one byte of data per map/reduce task pair
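For a sense of scale, a quick back-of-the-envelope computation using the paper's example configuration (M = 200,000, R = 5,000) and the one-byte-per-pair estimate above:

```python
# Back-of-the-envelope check of the master's memory for task state,
# using the example configuration from the paper (M=200,000, R=5,000).
M, R = 200_000, 5_000
bytes_per_pair = 1                      # ~1 byte per map/reduce task pair
state_bytes = M * R * bytes_per_pair
print(f"O(M*R) state: {state_bytes / 1e9:.1f} GB")   # ~1.0 GB
print(f"O(M+R) scheduling decisions: {M + R:,}")     # 205,000
```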
Backup Tasks • Slow workers significantly lengthen completion time • Other jobs consuming resources on the machine • Bad disks with soft errors • Data transfers very slowly • Weird things • Processor caches disabled • Solution: Near the end of a phase, spawn backup copies of the remaining in-progress tasks • Whichever copy finishes first wins • As a result, job completion time is dramatically shortened • E.g. the sort takes 44% longer to complete if the backup task mechanism is disabled
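A rough sketch of the backup-task heuristic; the 95% threshold and data structures are illustrative assumptions, not the paper's exact policy:

```python
# Sketch: when a phase is nearly done, schedule duplicate copies of the
# still-running ("straggler") tasks; whichever copy finishes first wins.

def schedule_backups(tasks, idle_workers, nearly_done_fraction=0.95):
    completed = sum(1 for t in tasks if t["state"] == "completed")
    if completed / len(tasks) < nearly_done_fraction:
        return []                                   # phase not close enough to done
    backups = []
    stragglers = [t for t in tasks if t["state"] == "in-progress"]
    for task, worker in zip(stragglers, idle_workers):
        backups.append({"task_id": task["id"], "worker": worker, "backup": True})
    return backups                                   # first finisher wins
```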
Performance & Experience
Performance • Experiment setting • 1,800 machines • 4 GB of memory • Dual-processor 2 GHz Xeons with Hyper-Threading • Dual 160 GB IDE disks • Gigabit Ethernet per machine • Approximately 100-200 Gbps of aggregate bandwidth
Performance • MR_Grep: Grep task with MapReduce • Grep: search for a relatively rare three-character pattern through 1 terabyte of data • Input rate hits zero about 80 seconds into the computation • Computation peaks at over 30 GB/s when 1764 workers are assigned • Locality optimization helps • Without it, rack switches would limit reads to 10 GB/s [Figure: Data transfer rate over time]
Performance • MR_Sort: Sorting task with MapReduce • Sort: sort 1 terabyte of 100-byte records • Takes about 14 min. • Input rate is higher than the shuffle rate and the output rate, thanks to locality • Shuffle rate is higher than the output rate • The output phase writes two copies for reliability
Performance • MR_Sort: Backup tasks and failure tolerance • Backup tasks reduce job completion time significantly • The system deals well with failures
Experience • Large-scale indexing • MapReduce is used for the Google Web search service • As a result, • The indexing code is simpler, smaller, and easier to understand • Performance is good enough • MapReduce makes it easy to change the indexing process • A few months → a few days • MapReduce takes care of failures and slow machines • Easy to make indexing faster by adding more machines
Experience • The number of MapReduce instances grows significantly over time • 2003/02: first version • 2004/09: almost 900 • 2006/03: about 4,000 • 2007/01: over 6,000 [Figure: MapReduce instances over time]
Experience • New MapReduce Programs Per Month • The number of new MapReduce programs increases continuously
Experience • MapReduce statistics for different months
Is every task suitable for MapReduce? • NOT every task is suitable for MapReduce: • NOT suitable if… • Suitable if…
Is it a trend? Really? • Job market trend: • 'World says 'No' to NoSQL' – written by IBM (2011.9, BNT RackSwitch G8264) • Compared to SQL, • Much harder to learn • It cannot solve all problems in the world • E.g. Fibonacci: each value depends on the previous ones, so it does not parallelize • Mainstream enterprises don't need it • they already have skilled engineers in other languages [Figure: Percentage of matching job postings – SQL: 4%, MapReduce: 0…%]
Conclusion • Focus on the problem: • let the library deal with the messy details • Automatic parallelization and distribution • MapReduce has proven to be a useful abstraction • MapReduce simplifies large-scale computations at Google • The functional programming paradigm can be applied to large-scale applications