
Data Parallelism & MapReduce | CS492 Special Topics


isisj

Presentation Transcript


  1. Lecture #4: Introduction to Data Parallelism and MapReduce • CS492 Special Topics in Computer Science: Distributed Algorithms and Systems

  2. Today’s Topics to Cover • Short quiz on programming in OCaml

  3. How to parallelize (I) • Run-length encoding • Fibonacci function • Calculation of π • Word count • Inverted index
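As an illustration of one item on this slide, the calculation of π can be split into independent chunks whose partial sums are added at the end. The sketch below is not from the lecture: it assumes the midpoint-rule integral of 4 / (1 + x²) over [0, 1], and the names `partial_pi` and `estimate_pi` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_pi(bounds):
    """Midpoint-rule sum of 4 / (1 + x^2) over steps [start, stop) out of n."""
    start, stop, n = bounds
    h = 1.0 / n
    return h * sum(4.0 / (1.0 + ((i + 0.5) * h) ** 2) for i in range(start, stop))


def estimate_pi(n=100_000, workers=4):
    """Split the n steps into independent chunks and add up the partial sums."""
    chunk = (n + workers - 1) // workers
    tasks = [(lo, min(lo + chunk, n), n) for lo in range(0, n, chunk)]
    # Threads keep the sketch self-contained; in CPython a process pool or
    # separate machines would be needed to see an actual speedup.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(partial_pi, tasks))
```

No chunk reads another chunk's data, so the same decomposition would carry over to processes or machines. A recursive Fibonacci function, by contrast, has a dependency between calls and does not split this cleanly.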

  4. How to parallelize (II) • SIMD • MIMD via shared memory • MIMD via message passing • Distributed computing

  5. MapReduce • Functional programming “Map / Reduce” way of thinking about problem solving • Google’s runtime library supporting MR paradigm at a very large scale
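The "Map / Reduce" way of thinking can be sketched with the word-count problem listed earlier in the lecture. This is a single-process illustration only, not Google's library or its API: the function names `map_phase`, `shuffle`, `reduce_phase`, and `word_count` are assumptions made for the example.

```python
from collections import defaultdict
from functools import reduce


def map_phase(document):
    """Emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group emitted values by key, standing in for the runtime's grouping step."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all counts emitted for one word."""
    return key, reduce(lambda a, b: a + b, values)


def word_count(documents):
    """Run map over every document, group by word, then reduce each group."""
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
```

Calling `word_count(["the cat", "The dog the"])` counts "the" three times and the other two words once each. Each `map_phase` call depends only on its own document and each `reduce_phase` call only on its own key, which is what lets a runtime spread both phases across many machines.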

  6. Fall 2008 CS492 MapReduce Execution Overview

  7. How popular is MapReduce? • In September 2007, Google used 11,081 “machine-years” (roughly, CPU-years) on MapReduce jobs alone • Assume all machines were 100% busy and ran only MR for the 30-day month: 11,081 x 365 / 30 ≈ 134,818 CPUs • If a rack holds 176 CPUs (88 1U dual-processor machines): 134,818 / 176 ≈ 766 racks
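The back-of-the-envelope estimate on this slide can be reproduced in a few lines. The constants are the slide's own figures; the function names `busy_cpus` and `racks_needed` are hypothetical and exist only for this check.

```python
MACHINE_YEARS = 11_081   # MapReduce usage quoted for September 2007
DAYS_PER_YEAR = 365
DAYS_IN_MONTH = 30
CPUS_PER_RACK = 176      # 88 1U dual-processor machines per rack


def busy_cpus(machine_years=MACHINE_YEARS):
    """CPUs needed if every one ran MapReduce nonstop for the whole month."""
    return machine_years * DAYS_PER_YEAR / DAYS_IN_MONTH


def racks_needed(cpus):
    """Racks required to house that many CPUs at the slide's rack density."""
    return cpus / CPUS_PER_RACK
```

Converting machine-years to machine-days and dividing by the 30 days in the month gives the number of CPUs that would have to be busy the whole time; dividing by the rack density gives the rack count.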

  8. Reading material • “MapReduce: Simplified Data Processing on Large Clusters” by J. Dean and S. Ghemawat, Communications of the ACM, Vol. 51, No. 1, Jan. 2008 • “MapReduce: Simplified Data Processing on Large Clusters” by J. Dean and S. Ghemawat, USENIX OSDI 2004
