Introduction to MapReduce

“Evaluating MapReduce for Multi-core and Multiprocessor Systems” by Colby Ranger, Ramanan Raghuraman, Arun Penmetsa, Gary Bradski, and Christos Kozyrakis, Computer Systems Laboratory, Stanford University. Presented by JP Cafaro.

Presentation Transcript


  1. “Evaluating MapReduce for Multi-core and Multiprocessor Systems”
     Colby Ranger, Ramanan Raghuraman, Arun Penmetsa, Gary Bradski, Christos Kozyrakis
     Computer Systems Laboratory, Stanford University
     Presented by JP Cafaro

  2. Introduction to MapReduce
     • MapReduce is a programming model created by Google to automate the parallelization and distribution of code across thousands of servers.
     • It lets the programmer write simple functional code without worrying about the low-level parallelization under the hood.
     • It works by taking input data and mapping it to intermediate <key,value> pairs. Disjoint portions of the input can be processed in parallel.
     • The intermediate pairs are then reduced to produce the final output. This step can also run in parallel.
     ECE 259 / CPS 221
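The map, group, and reduce steps above can be sketched as a toy word count. This is a minimal illustration under stated assumptions, not Phoenix's or Google's API: `map_fn`, `reduce_fn`, and `map_reduce` are hypothetical names, and Python threads stand in for shared-memory workers. Because of CPython's global interpreter lock, the sketch shows the structure of the model rather than real CPU speedup.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from itertools import chain


def map_fn(chunk):
    """Map step (hypothetical): emit a (word, 1) pair for every word in one chunk."""
    return [(word.lower(), 1) for word in chunk.split()]


def reduce_fn(key, values):
    """Reduce step (hypothetical): sum all counts emitted for one key."""
    return key, sum(values)


def map_reduce(chunks, mapper, reducer, workers=4):
    """Toy shared-memory MapReduce driver (hypothetical, not Phoenix's API)."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Map phase: disjoint chunks of the input are processed in parallel.
        mapped = pool.map(mapper, chunks)
        # Group the intermediate <key,value> pairs by key.
        groups = defaultdict(list)
        for key, value in chain.from_iterable(mapped):
            groups[key].append(value)
        # Reduce phase: keys are independent, so they are reduced in parallel.
        return dict(pool.map(reducer, groups.keys(), groups.values()))


if __name__ == "__main__":
    docs = ["the quick brown fox", "the lazy dog", "The fox"]
    print(map_reduce(docs, map_fn, reduce_fn))
    # → {'the': 3, 'quick': 1, 'brown': 1, 'fox': 2, 'lazy': 1, 'dog': 1}
```

The programmer only writes `map_fn` and `reduce_fn`; the driver owns the threading, which is the division of labor the slide describes.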

  3. Proposal and Features
     • Google's MapReduce targets clusters of thousands of distributed machines and relies on remote file accesses. The researchers wanted a shared-memory implementation of MapReduce for commercial multi-core and multiprocessor systems, which they called Phoenix.
     • Phoenix does a number of useful things, such as dynamically spawning threads based on the number of cores, hardware threads per core, system load, etc.
     • Other features: work stealing/load balancing, prefetching, granularity control, and fault tolerance.
     • It handles much of the low-level work automatically, giving a simple programming model that greatly improves programmer productivity.
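Two of the features above, choosing a thread count from the hardware and balancing load dynamically, can be approximated in a few lines. This is a sketch under loud assumptions: the worker count comes only from `os.cpu_count()` (Phoenix also weighs hardware threads per core and system load), and load balancing uses a single shared queue that idle workers pull from, which is a simpler scheme than per-worker work stealing. `default_worker_count`, `split_into_units`, and `run_dynamic` are hypothetical names, not Phoenix functions.

```python
import os
import queue
import threading


def default_worker_count():
    # Hypothetical policy: one worker per logical CPU reported by the OS.
    return os.cpu_count() or 1


def split_into_units(data, unit_size):
    # Granularity: unit_size sets how many tasks exist. Smaller units
    # balance load better but add per-task overhead.
    return [data[i:i + unit_size] for i in range(0, len(data), unit_size)]


def run_dynamic(units, task, workers=None):
    """Apply task to every unit; idle workers pull the next unit from a shared queue."""
    workers = workers or default_worker_count()
    todo = queue.Queue()
    for index, unit in enumerate(units):
        todo.put((index, unit))
    results = [None] * len(units)

    def worker():
        while True:
            try:
                index, unit = todo.get_nowait()
            except queue.Empty:
                return  # all units were queued up front, so empty means done
            results[index] = task(unit)  # each index is written by one worker only

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    units = split_into_units(list(range(10)), unit_size=3)
    print(units)                    # [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]
    print(run_dynamic(units, sum))  # [3, 12, 21, 9]
```

Results are stored by unit index, so the output order is deterministic even though the order in which workers finish units is not.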

  4. Benchmarks and Results
     • The researchers evaluated a number of parallelizable programs, including word count, matrix multiply, reverse index, etc.
     • Speedups were measured relative to sequential versions of the code.
     • In all cases, the MapReduce implementation was faster than the sequential version.
     • In some cases, the overhead introduced by Phoenix made it less efficient than a low-level Pthreads implementation.
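To show how a benchmark such as reverse index fits the map/reduce shape, here is a simplified, word-based variant. It is an assumption for illustration only, not the benchmark's actual code or input format: the map step emits (word, document id) pairs and the reduce step collapses each word's ids into a sorted list. `index_map`, `index_reduce`, and `reverse_index` are hypothetical names.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def index_map(doc):
    """Map (hypothetical): emit (word, doc_id) once per distinct word in a document."""
    doc_id, text = doc
    return [(word, doc_id) for word in set(text.lower().split())]


def index_reduce(word, doc_ids):
    """Reduce (hypothetical): the sorted list of documents containing the word."""
    return word, sorted(set(doc_ids))


def reverse_index(docs, workers=4):
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Map each document in parallel, then group ids by word.
        groups = defaultdict(list)
        for pairs in pool.map(index_map, docs):
            for word, doc_id in pairs:
                groups[word].append(doc_id)
        # Reduce each word's id list in parallel.
        return dict(pool.map(index_reduce, groups.keys(), groups.values()))


if __name__ == "__main__":
    docs = [(1, "map reduce model"), (2, "reduce phase"), (3, "map phase")]
    index = reverse_index(docs)
    print(index["map"], index["reduce"], index["phase"])  # [1, 3] [1, 2] [2, 3]
```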

  5. Questions
     • The main question is the tradeoff between programming simplicity and performance.
     • The low-level Pthreads implementation didn’t use dynamic scheduling because of its programming complexity, even though dynamic scheduling would probably have made Phoenix look less attractive from a performance standpoint.
     • Are we giving up too much to make programmers’ lives easier?
     • How many types of applications can this MapReduce implementation be used for?
     • Are there other programming models, similar to MapReduce, that we could fit to other problem types?
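Why dynamic scheduling matters can be seen in a small simulation that compares the finishing time (makespan) of a static schedule, where each worker gets a fixed contiguous block of tasks, with a dynamic one, where whichever worker is free first takes the next task. The task costs are made-up numbers chosen to be unbalanced, and the model ignores per-task dispatch overhead, the kind of runtime cost the slides attribute to Phoenix. `static_makespan` and `dynamic_makespan` are hypothetical helpers.

```python
import heapq


def static_makespan(costs, workers):
    """Static schedule: split tasks into contiguous blocks, one block per worker."""
    block = max(1, -(-len(costs) // workers))  # ceiling division, at least 1
    loads = [sum(costs[i:i + block]) for i in range(0, len(costs), block)]
    return max(loads, default=0)


def dynamic_makespan(costs, workers):
    """Dynamic schedule: the worker that becomes free first takes the next task."""
    finish_times = [0] * workers  # min-heap of when each worker is next free
    for cost in costs:
        earliest = heapq.heappop(finish_times)
        heapq.heappush(finish_times, earliest + cost)
    return max(finish_times)


if __name__ == "__main__":
    # Made-up costs: the expensive tasks are clustered at the front.
    costs = [8, 8, 1, 1, 1, 1, 1, 1]
    print(static_makespan(costs, 2))   # 18
    print(dynamic_makespan(costs, 2))  # 11
```

With these costs the static split leaves one worker with 18 units of work and the other with 4, while dynamic assignment finishes at 11, the ideal 22 / 2. With evenly sized tasks the two schedules tie, which is why a statically scheduled Pthreads baseline can still win once dispatch overhead is counted.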

  6. Conclusions
     • MapReduce/Phoenix can be very useful for algorithms that map naturally onto this programming model, as the results show.
     • Programs that the model doesn’t naturally suit see smaller speedups; for them, the overhead introduced by Phoenix lets alternatives such as a lower-level Pthreads implementation perform better.
     • Overall, this model is extremely simple, and techniques such as MapReduce that automatically parallelize code are important to consider as we figure out how to write software for many-core systems.
