
Large-scale file systems and Map-Reduce


Presentation Transcript


  1. Large-scale file systems and Map-Reduce • Google example: • 20+ billion web pages x 20KB = 400+ Terabytes • 1 computer reads 30-35 MB/sec from disk • ~4 months to read the web • ~1,000 hard drives to store the web • Takes even more to do something useful with the data • New standard architecture is emerging: • Cluster of commodity Linux nodes • Gigabit ethernet interconnect • [Diagram: single-node architecture with CPU, Memory, Disk]
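The arithmetic on the slide can be checked directly. A quick sketch in Python; the per-drive capacity is an assumption (roughly 400 GB per drive, consistent with the slide's ~1,000 drives for 400 TB):

```python
# Back-of-envelope check of the slide's numbers (values taken from the slide).
pages = 20e9              # 20+ billion web pages
page_size = 20e3          # 20 KB per page
total_bytes = pages * page_size        # = 4e14 bytes = 400 TB
read_rate = 35e6          # one disk reads roughly 30-35 MB/sec

months = total_bytes / read_rate / (30 * 24 * 3600)
drives = total_bytes / 400e9           # assumed ~400 GB per drive

print(f"{months:.1f} months, {drives:.0f} drives")
```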

  2. Distributed File Systems • Files are very large, read/append. • They are divided into chunks. • Typically 64MB to a chunk. • Chunks are replicated at several compute-nodes. • A master (possibly replicated) keeps track of all locations of all chunks.

  3. Commodity clusters: compute nodes • Organized into racks. • Intra-rack connection typically gigabit speed. • Inter-rack connection faster by a small factor. • Recall that chunks are replicated. • Some implementations: • GFS (Google File System, proprietary); in Aug 2006 Google had ~450,000 machines • HDFS (Hadoop Distributed File System, open source) • CloudStore (Kosmos File System, open source)

  4. File Chunks • [Diagram: replication of file chunks across racks of compute nodes]

  5. Replication 3-way replication of files, with copies on different racks.

  6. Map-Reduce • You write two functions, Map and Reduce. • They each have a special form to be explained. • System (e.g., Hadoop) creates a large number of tasks for each function. • Work is divided among tasks in a precise way.

  7. Map-Reduce Algorithms • Map tasks convert inputs to key-value pairs. • “keys” are not necessarily unique. • Outputs of Map tasks are sorted by key, and each key is assigned to one Reduce task. • Reduce tasks combine values associated with a key.

  8. Simple map-reduce example: Word Count • We have a large file of words, one word to a line • Count the number of times each distinct word appears in the file • Sample application: analyze web server logs to find popular URLs • Different scenarios: • Case 1: Entire file fits in main memory • Case 2: File too large for main mem, but all <word, count> pairs fit in main mem • Case 3: File on disk, too many distinct words to fit in memory

  9. Word Count • Map task: for each line, e.g. CAT, output (CAT, 1) • Total output: (w1,1), (w1,1), …, (w1,1); (w2,1), (w2,1), …, (w2,1); … • Hash each (w,1) to bucket h(w) in [0, r-1] in a local intermediate file, where r is the number of reducers • Master: group by key: (w1,[1,1,…,1]), (w2,[1,1,…,1]), … • Push group (w,[1,1,…,1]) to reducer h(w) • Reduce task: read (wi,[1,1,…,1]), (wi,[1,1,…,1]), …; (wj,[1,1,…,1]), (wj,[1,1,…,1]), …; … • Aggregate each (w,[1,1,…,1]) into (w, sum) • Output (wi, sumi), (wj, sumj), … into a common output file • Since addition is commutative and associative, the map task could have sent (w1, sum1), (w2, sum2), … instead • The reduce task would then receive (wi, sumi,1), (wi, sumi,2), …; (wj, sumj,1), (wj, sumj,2), …, and output (wi, sumi), (wj, sumj), …
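The pipeline above can be simulated in one process. A minimal sketch in Python; the function names and the choice of r = 2 reducers are illustrative, not part of any framework:

```python
from collections import defaultdict

def map_task(lines):
    # emit (word, 1) for every line, as on the slide
    for w in lines:
        yield (w, 1)

def shuffle(pairs, r):
    # hash each key into one of r per-reducer groups
    buckets = [defaultdict(list) for _ in range(r)]
    for k, v in pairs:
        buckets[hash(k) % r][k].append(v)
    return buckets

def reduce_task(groups):
    # aggregate each (w, [1, 1, ..., 1]) into (w, sum)
    return {w: sum(vs) for w, vs in groups.items()}

lines = ["CAT", "DOG", "CAT"]
counts = {}
for bucket in shuffle(map_task(lines), r=2):
    counts.update(reduce_task(bucket))
print(counts)  # CAT -> 2, DOG -> 1
```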

  10. Partition Function • Inputs to map tasks are created by contiguous splits of input file • For reduce, we need to ensure that records with the same intermediate key end up at the same worker • System uses a default partition function e.g., hash(key) mod R • Sometimes useful to override • E.g., hash(hostname(URL)) mod R ensures URLs from a host end up in the same output file
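The default and overridden partition functions can be sketched as follows; R = 4 and the function names are assumptions, and Python's built-in `hash` stands in for the system's hash function:

```python
from urllib.parse import urlparse

R = 4  # number of reduce tasks (an assumed value)

def default_partition(key):
    # the system default: hash(key) mod R
    return hash(key) % R

def host_partition(url):
    # override: hash(hostname(URL)) mod R, so all URLs from one
    # host land in the same output file
    return hash(urlparse(url).netloc) % R

a = host_partition("http://example.com/page1")
b = host_partition("http://example.com/dir/page2")
assert a == b  # same host, same reducer
```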

  11. Coordination • Master data structures • Task status: (idle, in-progress, completed) • Idle tasks get scheduled as workers become available • When a map task completes, it sends the master the location and sizes of its R intermediate files, one for each reducer • Master pushes this info to reducers • Master pings workers periodically to detect failures

  12. Data flow • Input, final output are stored on a distributed file system • Scheduler tries to schedule map tasks “close” to the physical storage location of input data • Intermediate results are stored on the local FS of map and reduce workers • Output is often input to another map-reduce task

  13. Failures • Map worker failure • Map tasks completed or in-progress at worker are reset to idle (result sits locally at worker) • Reduce workers are notified when task is rescheduled on another worker • Reduce worker failure • Only in-progress tasks are reset to idle • Master failure • Map-reduce task is aborted and client is notified

  14. How many Map and Reduce jobs? • M map tasks, R reduce tasks • Rule of thumb: • Make M and R much larger than the number of nodes in cluster • One DFS chunk per map is common • Improves dynamic load balancing and speeds recovery from worker failure • Usually R is smaller than M, because output is spread across R files

  15. Matrix-vector multiplication using map-reduce

  16. If the vector doesn’t fit in main memory • Divide the matrix into vertical stripes and the vector into corresponding horizontal stripes. • Each map task gets a chunk of stripe i of the matrix and the entire stripe i of the vector, and produces pairs (r, m_rj·v_j) for each element m_rj in its chunk, keyed by row number r. • Reduce task r gets all pairs with key r and produces the pair (r, x_r), where x_r = Σ_j m_rj·v_j.
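A single-process sketch of the scheme above, assuming the matrix arrives as sparse (row, col, value) triples and that one "stripe" covers the whole (made-up) vector for simplicity:

```python
from collections import defaultdict

# Sparse matrix M as (row, col, value) triples; values are illustrative.
M = [(0, 0, 1.0), (0, 1, 2.0), (1, 0, 3.0), (1, 1, 4.0)]
v = [10.0, 20.0]

def map_task(triples, vec):
    # each map task holds the stripe of v matching its chunk of M
    for i, j, mij in triples:
        yield (i, mij * vec[j])   # key = row number i

def reduce_task(pairs):
    x = defaultdict(float)
    for i, val in pairs:
        x[i] += val               # reduce task i sums values for key i
    return dict(x)

x = reduce_task(map_task(M, v))
print(x)  # row 0: 1*10 + 2*20 = 50; row 1: 3*10 + 4*20 = 110
```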

  17. Example • [Diagram: mapper outputs and reducer inputs for the striped matrix-vector multiplication; content lost in transcription]

  18. Relational operators with map-reduce • Selection σ_C(R) • Map task: for each tuple t, if C(t) is true, output the pair (t, t). • Reduce task: with input (t, t), output t. • Selection is not really suitable for map-reduce; everything could have been done in the map task.

  19. Relational operators with map-reduce • Projection • Map task: let t’ be the projection of t; output the pair (t’, t’). • Reduce task: with input (t’, [t’, t’, …, t’]), output t’. • Here the duplicate elimination is done by the reduce task.

  20. Relational operators with map-reduce • Union R ∪ S • Map task: for each tuple t of the chunk of R or S, output (t, t) • Reduce task: input is (t,[t]) or (t,[t, t]); output t • Intersection R ∩ S • Map task: for each tuple t of the chunk, output (t, t) • Reduce task: if input is (t,[t, t]), output t; if input is (t,[t]), output nothing • Difference R - S • Map task: for each tuple t of R output (t, R); for each tuple t of S output (t, S) • Reduce task: if input is (t,[R]), output t; if input is (t,[R, S]), output nothing
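Assuming set semantics (no duplicate tuples within R or within S), the three reducer rules above can be simulated in a few lines; the function name and sample tuples are made up for illustration:

```python
def set_ops(R, S):
    # map phase: tag each tuple with the relation it came from
    groups = {}
    for t in R:
        groups.setdefault(t, []).append("R")
    for t in S:
        groups.setdefault(t, []).append("S")
    # reduce phase: the tag list for each tuple decides the output
    union = sorted(groups)                                         # any tag list outputs t
    inter = sorted(t for t, tags in groups.items() if tags == ["R", "S"])
    diff = sorted(t for t, tags in groups.items() if tags == ["R"])
    return union, inter, diff

u, i, d = set_ops([(1,), (2,)], [(2,), (3,)])
```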

  21. Joining by Map-Reduce • Suppose we want to compute • R(A,B) JOIN S(B,C), using k Reduce tasks. • I.e., find tuples with matching B-values. • R and S are each stored in a chunked file. • Use a hash function h from B-values to k buckets. • Bucket = Reduce task. • The Map tasks take chunks from R and S, and send: • Tuple R(a,b) to Reduce task h(b). • Key = b value = R(a,b). • Tuple S(b,c) to Reduce task h(b). • Key = b; value = S(b,c).

  22. Reduce task i • Map tasks send R(a,b) to Reduce task i if h(b) = i, and S(b,c) to Reduce task i if h(b) = i. • Reduce task i produces all (a,b,c) such that h(b) = i, (a,b) is in R, and (b,c) is in S. • Key point: If R(a,b) joins with S(b,c), then both tuples are sent to Reduce task h(b). • Thus, their join (a,b,c) will be produced there and shipped to the output file.
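A single-process sketch of this hash join, with k = 3 reduce tasks and h(b) = b mod k as an assumed hash function; the data matches the R(1,2), R(4,2), S(2,3), S(5,6) example on the following slides:

```python
from collections import defaultdict

k = 3                      # number of Reduce tasks (assumed)
def h(b): return b % k     # assumed hash function on B-values

R = [(1, 2), (4, 2)]       # R(A,B)
S = [(2, 3), (5, 6)]       # S(B,C)

# Map phase: key every tuple by its B-value, tagged with its relation.
buckets = defaultdict(list)
for a, b in R:
    buckets[h(b)].append((b, ("R", a)))
for b, c in S:
    buckets[h(b)].append((b, ("S", c)))

# Reduce task i: join the R- and S-tuples sharing each B-value it received.
out = []
for pairs in buckets.values():
    by_b = defaultdict(lambda: ([], []))
    for b, (rel, x) in pairs:
        by_b[b][0 if rel == "R" else 1].append(x)
    for b, (a_vals, c_vals) in by_b.items():
        out.extend((a, b, c) for a in a_vals for c in c_vals)
```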

  23. Mapping tuples in joins • Mapper for R(1,2): R(1,2) → (2, (R,1)) • Mapper for R(4,2): R(4,2) → (2, (R,4)) • Mapper for S(2,3): S(2,3) → (2, (S,3)) • Mapper for S(5,6): S(5,6) → (5, (S,6)) • Reducer for B = 2 receives (2, [(R,1), (R,4), (S,3)]) • Reducer for B = 5 receives (5, [(S,6)])

  24. Output of the Reducers • Reducer for B = 2: input (2, [(R,1), (R,4), (S,3)]), output (1,2,3), (4,2,3) • Reducer for B = 5: input (5, [(S,6)]), output nothing

  25. Relational operators with map-reduce • Grouping and aggregation: γ_{A, agg(B)}(R(A,B,C)) • Map task: for each tuple (a,b,c), output (a,b) • Reduce task: if input is (a,[b1, b2, …, bn]), output (a, agg(b1, b2, …, bn)), for example (a, b1+b2+…+bn)
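With agg = SUM, the two phases above reduce to a few lines of Python; the sample tuples are made up for illustration:

```python
from collections import defaultdict

# R(A,B,C) tuples; values are illustrative.
R = [(1, 10, "x"), (1, 5, "y"), (2, 7, "z")]

grouped = defaultdict(list)
for a, b, c in R:        # map task: output (a, b), discarding c
    grouped[a].append(b)

# reduce task: (a, [b1, ..., bn]) -> (a, b1 + ... + bn)
result = {a: sum(bs) for a, bs in grouped.items()}
```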

  26. Relational operators with map-reduce Three-Way Join • We shall consider a simple join of three relations, the natural join R(A,B) ⋈ S(B,C) ⋈ T(C,D). • One way: cascade of two 2-way joins, each implemented by map-reduce. • Fine, unless the 2-way joins produce large intermediate relations.

  27. Another 3-Way Join • Reduce processes use hash values of entire S(B,C) tuples as key. • Choose a hash function h that maps B- and C-values to k buckets. • There are k² Reduce processes, one for each (B-bucket, C-bucket) pair.

  28. Job of the Reducers • Each reducer gets, for certain B-values b and C-values c : • All tuples from R with B = b, • All tuples from T with C = c, and • The tuple S(b,c) if it exists. • Thus it can create every tuple of the form (a, b, c, d) in the join.

  29. Mapping for 3-Way Join • Aside: even normal map-reduce allows one input to map to several key-value pairs. • We map each tuple S(b,c) to ((h(b), h(c)), (S, b, c)). • We map each R(a,b) tuple to ((h(b), y), (R, a, b)) for all y = 1, 2, …, k. • We map each T(c,d) tuple to ((x, h(c)), (T, c, d)) for all x = 1, 2, …, k.
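The three mappings above can be sketched directly; k = 2, h(x) = x mod k, and the sample tuples are assumptions for illustration. The check at the end confirms the key property: the one reducer that gets S(b,c) also gets a replica of every matching R and T tuple.

```python
k = 2                    # buckets per attribute, so k * k reducers (assumed)
def h(x): return x % k   # assumed hash function

def map_S(b, c):
    # S goes to exactly one reducer
    yield ((h(b), h(c)), ("S", b, c))

def map_R(a, b):
    # R is replicated across all k C-buckets
    for y in range(k):
        yield ((h(b), y), ("R", a, b))

def map_T(c, d):
    # T is replicated across all k B-buckets
    for x in range(k):
        yield ((x, h(c)), ("T", c, d))

keys_R = {key for key, _ in map_R(1, 3)}
keys_S = {key for key, _ in map_S(3, 5)}
keys_T = {key for key, _ in map_T(5, 7)}
assert keys_S <= keys_R and keys_S <= keys_T
```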

  30. Assigning Tuples to Reducers • [Diagram: a grid of reducers indexed by (h(b), h(c)) with h(b), h(c) ∈ {0, 1, 2, 3}; S(b,c) with h(b)=1, h(c)=2 goes to a single cell, R(a,b) with h(b)=2 is replicated across the whole row h(b)=2, and T(c,d) with h(c)=3 is replicated down the whole column h(c)=3]

  31. [Diagram: worked 3-way join example over DB = {R(1,1), R(2,1), S(1,1), S(1,2), T(1,1), T(1,2)}; e.g., the mapper for R(1,1) emits (1,1,(R,1)) and (1,2,(R,1)), and the mapper for S(1,2) emits (1,2,(S,1,2)), etc.]
