Lecture #6: MapReduce (II)
CS492 Special Topics in Computer Science: Distributed Algorithms and Systems
MapReduce Assumptions
• Hardware
  • Components are reliable
  • Components are homogeneous
• Software
  • It’s correct
• Network
  • Latency is zero
  • Bandwidth is infinite
  • It’s secure
• Overall system
  • Configuration is stable
  • There is one administrator
Question of the Day
What goes on underneath?
Step #1
The MapReduce library splits the input files into M pieces (64 MB each), then starts up many copies of the program.
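The splitting in Step #1 can be sketched in a few lines. This is only an illustration: the function name `compute_splits` is hypothetical, and the sketch cuts purely on byte offsets, whereas a real input reader would also have to respect record boundaries.

```python
from typing import List, Tuple

MB = 1024 * 1024


def compute_splits(file_size: int, split_size: int = 64 * MB) -> List[Tuple[int, int]]:
    """Return (offset, length) pairs that cover a file of file_size bytes.

    Hypothetical helper: the number of pairs returned plays the role of M.
    """
    splits = []
    offset = 0
    while offset < file_size:
        # The last piece may be shorter than split_size.
        length = min(split_size, file_size - offset)
        splits.append((offset, length))
        offset += length
    return splits


splits = compute_splits(200 * MB)
print(len(splits))  # 4: three 64 MB pieces plus one 8 MB piece
```

Each (offset, length) pair would then become the input of one map task.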
Step #2
One special copy of the program (the master) assigns work to the remaining copies (the workers). There are M map tasks and R reduce tasks to assign.
Step #3
A worker that is assigned a map task runs the Map function on its input split. The output is buffered in memory.
Step #4
Periodically, the buffered output is written to local disk, partitioned into R regions by the partitioning function. The locations of these regions are passed on to the master.
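The partitioning in Step #4 is commonly a hash of the key modulo R. The sketch below assumes string keys and uses `zlib.crc32` as the hash so that the result is the same in every process (Python's built-in `hash` is randomized for strings). The names `partition` and `partition_buffer` are made up for this example.

```python
import zlib
from typing import List, Tuple


def partition(key: str, R: int) -> int:
    """Map a key to one of R regions with a deterministic hash mod R."""
    return zlib.crc32(key.encode("utf-8")) % R


def partition_buffer(pairs: List[Tuple[str, int]], R: int) -> List[List[Tuple[str, int]]]:
    """Split buffered (key, value) pairs into R regions, one per reduce task."""
    regions: List[List[Tuple[str, int]]] = [[] for _ in range(R)]
    for key, value in pairs:
        regions[partition(key, R)].append((key, value))
    return regions


buffered = [("apple", 1), ("pear", 1), ("apple", 1), ("fig", 1)]
regions = partition_buffer(buffered, R=3)
# Every occurrence of a key lands in the same region, so exactly one
# reduce task sees all values for that key.
```

The important property is not which region a key lands in, but that the same key always lands in the same region on every map worker.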
Step #5
When a reduce worker is notified by the master about these locations, it uses RPC to read the buffered data from the local disks of the map workers. The reduce worker then sorts the data by intermediate key.
Step #6
The reduce worker goes through the sorted data and, for each unique key, performs Reduce on that key and its values.
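Steps #5 and #6 amount to sort, group by key, then reduce. A minimal single-process sketch, assuming the intermediate data already sits in memory (the RPC reads are left out) and using a word-count style reduce function; `run_reduce` and `sum_counts` are names chosen for this example.

```python
from itertools import groupby
from operator import itemgetter
from typing import Callable, List, Tuple


def run_reduce(
    intermediate: List[Tuple[str, int]],
    reduce_fn: Callable[[str, List[int]], int],
) -> List[Tuple[str, int]]:
    """Sort pairs by key, then call reduce_fn once per unique key."""
    # Sorting brings all occurrences of the same key next to each other,
    # which is what groupby needs.
    ordered = sorted(intermediate, key=itemgetter(0))
    results = []
    for key, group in groupby(ordered, key=itemgetter(0)):
        values = [value for _, value in group]
        results.append((key, reduce_fn(key, values)))
    return results


def sum_counts(key: str, values: List[int]) -> int:
    """Example reduce function: add up the counts for one key."""
    return sum(values)


pairs = [("b", 1), ("a", 1), ("b", 1), ("c", 1), ("a", 1)]
print(run_reduce(pairs, sum_counts))  # [('a', 2), ('b', 2), ('c', 1)]
```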
Step #7
When all Map and Reduce tasks are complete, the master wakes up the user program.
Fault Tolerance
• Master detects worker failures
  • How?
• What if a Map worker dies?
• What if a Reduce worker dies after completing its task?
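One way to answer these questions follows the description in the original MapReduce paper: the master pings workers periodically and treats a worker as failed if it gets no reply within some time. Completed map tasks on a failed worker are rescheduled because their output sits on that worker's local disk, while completed reduce tasks are not, because their output is already in the global file system. The sketch below is a simplified model; the `Task` class, the state names, and the timeout handling are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set

IDLE, IN_PROGRESS, COMPLETED = "idle", "in_progress", "completed"


@dataclass
class Task:
    kind: str                     # "map" or "reduce"
    state: str = IDLE
    worker: Optional[str] = None  # worker the task was assigned to


def failed_workers(last_reply: Dict[str, float], now: float, timeout: float) -> Set[str]:
    """Workers whose last ping reply is older than timeout seconds."""
    return {w for w, t in last_reply.items() if now - t > timeout}


def handle_failure(tasks: List[Task], worker: str) -> None:
    """Reset the tasks that must be re-executed after worker fails."""
    for task in tasks:
        if task.worker != worker:
            continue
        # Completed map output lived on the failed machine's local disk,
        # so it is lost. Completed reduce output is in the global file
        # system and survives.
        lost_map_output = task.kind == "map" and task.state == COMPLETED
        if task.state == IN_PROGRESS or lost_map_output:
            task.state = IDLE
            task.worker = None
```

After `handle_failure`, the reset tasks are idle again and can be handed to any healthy worker.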
Locality
• Task assignment by the Master
• Mapping between the input file and workers?
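A possible assignment policy, in the spirit of the original MapReduce paper: prefer a worker on a machine that already stores a replica of the input split, then a worker on the same rack as a replica, then any idle worker. The function `pick_worker` and the `rack_of` lookup table are hypothetical, and a real master would get replica locations from the file system.

```python
from typing import Dict, List, Optional, Set


def pick_worker(
    replica_hosts: Set[str],
    idle_workers: List[str],
    rack_of: Dict[str, str],
) -> Optional[str]:
    """Choose an idle worker for a map task, preferring data locality."""
    # 1. A worker on a machine that holds a replica reads from local disk.
    for worker in idle_workers:
        if worker in replica_hosts:
            return worker
    # 2. Otherwise, a worker on the same rack as some replica.
    replica_racks = {rack_of[h] for h in replica_hosts if h in rack_of}
    for worker in idle_workers:
        if rack_of.get(worker) in replica_racks:
            return worker
    # 3. Otherwise, any idle worker (or None if there is none).
    return idle_workers[0] if idle_workers else None


rack_of = {"a": "rack1", "b": "rack1", "c": "rack2"}
print(pick_worker({"a"}, ["c", "b"], rack_of))  # b (same rack as the replica on a)
```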
Backup
How to deal with stragglers?
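The approach described in the original MapReduce paper is to schedule backup executions of the remaining in-progress tasks once the job is close to completion, and to count a task as done when either copy finishes. The sketch below shows only the selection step. The `threshold` parameter is an assumption for illustration; the paper does not give a specific cutoff.

```python
from typing import Dict, List


def tasks_needing_backup(states: Dict[int, str], threshold: float = 0.9) -> List[int]:
    """Return ids of in-progress tasks once enough of the job is completed.

    states maps a task id to "idle", "in_progress", or "completed".
    threshold is a hypothetical cutoff for "close to completion".
    """
    total = len(states)
    if total == 0:
        return []
    done = sum(1 for s in states.values() if s == "completed")
    if done / total < threshold:
        return []  # too early: duplicating work now would waste resources
    return sorted(t for t, s in states.items() if s == "in_progress")


states = {i: "completed" for i in range(19)}
states[19] = "in_progress"  # the straggler
print(tasks_needing_backup(states))  # [19]
```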
Refinements
• Partitioning functions
• Ordering guarantees
• Combiner function
• Input and output types
• Side-effects
• Skipping bad records
• Local execution
• Status information
• Counters
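As an illustration of one refinement, the combiner function does partial merging on the map worker before the data crosses the network. The sketch below assumes a word-count job, where the combiner can do the same summing as the reduce function; `map_words` and `combine` are names chosen for this example.

```python
from collections import Counter
from typing import List, Tuple


def map_words(text: str) -> List[Tuple[str, int]]:
    """Map function for word count: emit (word, 1) for every word."""
    return [(word, 1) for word in text.split()]


def combine(pairs: List[Tuple[str, int]]) -> List[Tuple[str, int]]:
    """Combiner: merge counts per word locally on the map worker."""
    totals: Counter = Counter()
    for key, value in pairs:
        totals[key] += value
    return sorted(totals.items())


pairs = map_words("the cat saw the dog and the bird")
combined = combine(pairs)
print(len(pairs), len(combined))  # 8 6
```

The saving grows with how often keys repeat within one map task's input, which is why it helps most for skewed keys such as very frequent words.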
Reading for next class
• “Lessons from Giant-Scale Services” by Eric Brewer, IEEE Internet Computing, July-August 2001
• “The Google File System” by Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung, SOSP 2003, NY
• Short quiz on “Lessons ...”