ecs251 Spring 2012: Operating System #4: Map/Reduce, HDFS, Applications
Dr. S. Felix Wu, Computer Science Department, University of California, Davis
Map and Reduce
[Figure: MapReduce execution overview. The user program forks a master and several workers. The master assigns map tasks and reduce tasks. Map workers read input splits (Split 0, 1, 2) from the distributed file system and write intermediate results to local disk; reduce workers do a remote read and sort, then write Output File 0 and Output File 1.]
MapReduce Programming Model
• Data type: key-value records
• Map function: (Kin, Vin) → list(Kinter, Vinter)
• Reduce function: (Kinter, list(Vinter)) → list(Kout, Vout)
MapReduce
• Parallelism
• Saves network bandwidth through data locality
• Failure handling and transparency
• Simple programming model
Example: Word Count

def mapper(line):
    for word in line.split():
        output(word, 1)

def reducer(key, values):
    output(key, sum(values))
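The pseudocode above can be run in a single process by simulating the shuffle phase with a dictionary. This is a minimal sketch; run_mapreduce and its grouping logic are illustrative stand-ins for the framework, not a real Hadoop API.

```python
from collections import defaultdict

def mapper(line):
    # Emit (word, 1) for every word on the line.
    return [(word, 1) for word in line.split()]

def reducer(key, values):
    # Sum all counts collected for one word.
    return (key, sum(values))

def run_mapreduce(lines):
    # Shuffle & sort, simulated with a dictionary keyed by word.
    groups = defaultdict(list)
    for line in lines:
        for key, value in mapper(line):
            groups[key].append(value)
    return dict(reducer(k, vs) for k, vs in groups.items())

counts = run_mapreduce(["the quick brown fox",
                        "the fox ate the mouse",
                        "how now brown cow"])
# counts["the"] == 3 and counts["fox"] == 2
```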
Word Count Execution
Input → Map → Shuffle & Sort → Reduce → Output
Map:
  "the quick brown fox" → the,1 quick,1 brown,1 fox,1
  "the fox ate the mouse" → the,1 fox,1 ate,1 the,1 mouse,1
  "how now brown cow" → how,1 now,1 brown,1 cow,1
Reduce:
  brown,2 fox,2 how,1 now,1 the,3
  ate,1 cow,1 mouse,1 quick,1
An Optimization: The Combiner
• A local reduce function for repeated keys produced by the same map
• Works for associative operations like sum, count, max
• Decreases the amount of intermediate data
• Example: local counting for Word Count:

def combiner(key, values):
    output(key, sum(values))
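A single-process sketch of the combiner's effect. For illustration, this combiner takes a map task's whole output list rather than a (key, values) pair, a simplification of the real interface.

```python
from collections import Counter

def mapper(line):
    # Emit (word, 1) for every word on the line.
    return [(word, 1) for word in line.split()]

def combiner(pairs):
    # Locally merge repeated keys from one map task's output.
    counts = Counter()
    for key, value in pairs:
        counts[key] += value
    return sorted(counts.items())

raw = mapper("the fox ate the mouse")   # 5 intermediate pairs
combined = combiner(raw)                # 4 pairs: the two ("the", 1) merge into ("the", 2)
```

Less intermediate data now crosses the network: the shuffle ships 4 pairs instead of 5, and the saving grows with the number of repeated keys per map task.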
Word Count with Combiner
Input → Map (+ Combiner) → Shuffle & Sort → Reduce → Output
Map + combiner:
  "the quick brown fox" → the,1 quick,1 brown,1 fox,1
  "the fox ate the mouse" → the,2 fox,1 ate,1 mouse,1
  "how now brown cow" → how,1 now,1 brown,1 cow,1
Reduce:
  brown,2 fox,2 how,1 now,1 the,3
  ate,1 cow,1 mouse,1 quick,1
MapReduce Execution Details
• Mappers are preferentially scheduled on the same node or the same rack as their input block
  – Minimizes network use to improve performance
• Mappers save outputs to local disk before serving them to reducers
  – Allows recovery if a reducer crashes
  – Allows running more reducers than # of nodes
Fault Tolerance in MapReduce
1. If a task crashes:
• Retry it on another node
  – OK for a map because it has no dependencies
  – OK for a reduce because map outputs are on disk
• If the same task repeatedly fails, fail the job or ignore that input block
• Note: for this fault tolerance to work, user tasks must be deterministic and side-effect-free
Fault Tolerance in MapReduce
2. If a node crashes:
• Relaunch its current tasks on other nodes
• Relaunch any maps the node previously ran
  – Necessary because their output files were lost along with the crashed node
Fault Tolerance in MapReduce
3. If a task is going slowly (a straggler):
• Launch a second copy of the task on another node
• Take the output of whichever copy finishes first, and kill the other
• Critical for performance in large clusters (stragglers have many possible causes)
Takeaways
• By providing a restricted data-parallel programming model, MapReduce can control job execution in useful ways:
  – Automatic division of a job into tasks
  – Placement of computation near data
  – Load balancing
  – Recovery from failures & stragglers
Amazon Elastic MapReduce
• Web interface and command-line tools for running Hadoop jobs on EC2
• Data stored in Amazon S3
• Monitors jobs and shuts down machines after use
Map: A Higher-Order Function
• F(x: int) returns r: int
• Let V be an array of integers.
• W = map(F, V)
• W[i] = F(V[i]) for all i
• i.e., apply F to every element of V
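The same definition in Python, using the built-in map; this is a direct restatement of the bullets above with an increment function standing in for F.

```python
def F(x: int) -> int:
    # Any per-element function; here, increment.
    return x + 1

V = [1, 2, 3, 4, 5]
W = list(map(F, V))   # W[i] == F(V[i]) for all i
# W == [2, 3, 4, 5, 6]
```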
Map Examples in Haskell
• map (+1) [1,2,3,4,5] == [2,3,4,5,6]
• map toLower "abcDEFG12!@#" == "abcdefg12!@#"
• map (`mod` 3) [1..10] == [1,2,0,1,2,0,1,2,0,1]
Word Count Example
• Read text files and count how often words occur.
• The input is text files
• The output is a text file
  – each line: word, tab, count
• Map: produce (word, count) pairs
• Reduce: for each word, sum up the counts
[Figure: the map step. Input key-value pairs (k1,v1), (k2,v2), …, (kn,vn), e.g. (doc-id, doc-content), are each passed through map to produce intermediate key-value pairs, e.g. (word, wordcount-in-a-doc).]
[Figure: the group and reduce steps. Intermediate key-value pairs, e.g. (word, wordcount-in-a-doc), are grouped by key into key-value groups, e.g. (word, list-of-wordcount), analogous to SQL GROUP BY. Each group is passed through reduce to produce output key-value pairs, e.g. (word, final-count), analogous to SQL aggregation.]
Example: "I am a tiger, you are also a tiger"
Map outputs (split across three mappers):
  I,1 am,1 a,1
  tiger,1 you,1 are,1
  also,1 a,1 tiger,1
Shuffle & sort, then reduce (split across three reducers):
  a,2 also,1 am,1
  are,1 I,1
  tiger,2 you,1
Final output: a,2 also,1 am,1 are,1 I,1 tiger,2 you,1
MapReduce: Execution
[Figure: MapReduce execution diagram.]
Inverted Index Example
• Generate an inverted index of words from a given set of files
• Map: parses a document and emits <word, docId> pairs
• Reduce: takes all pairs for a given word, sorts the docId values, and emits a <word, list(docId)> pair
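A minimal single-process sketch of the inverted index described above; the mapper, reducer, and inline grouping loop are illustrative, not a framework API.

```python
from collections import defaultdict

def mapper(doc_id, text):
    # Parse a document and emit (word, docId) pairs.
    return [(word, doc_id) for word in text.split()]

def reducer(word, doc_ids):
    # Sort (and de-duplicate) the docIds for one word.
    return (word, sorted(set(doc_ids)))

docs = {1: "the fox", 2: "the cow", 3: "fox and cow"}

groups = defaultdict(list)              # shuffle & sort
for doc_id, text in docs.items():
    for word, d in mapper(doc_id, text):
        groups[word].append(d)

index = dict(reducer(w, ds) for w, ds in groups.items())
# index["fox"] == [1, 3] and index["the"] == [1, 2]
```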
Example input record (a JSON-encoded Facebook post):
{"id":"204722549606084_230354117042927","from":{"name":"Rufino Beniga","id":"100000037203591"},"to":{"data":[{"version":1,"name":"ecs30 Programming and Problem Solving","id":"204722549606084"}]},"message":"i'm at the \"help Command: man\" part and i typed \"man ls\" and it listed a bunch of stuff about what it is, how do i get out of it? it says i need to type exit but nothing is happening","actions":[{"name":"Comment","link":"http:\/\/www.facebook.com\/204722549606084\/posts\/230354117042927"},{"name":"Like","link":"http:\/\/www.facebook.com\/204722549606084\/posts\/230354117042927"}],"type":"status","created_time":"2012-01-11T05:48:09+0000","updated_time":"2012-01-11T05:49:17+0000","comments":{"data":[{"id":"204722549606084_230354117042927_230354333709572","from":{"name":"Chris Schwarz","id":"5100058"},"message":"Q","created_time":"2012-01-11T05:48:50+0000","likes":1},{"id":"204722549606084_230354117042927_230354447042894","from":{"name":"Rufino Beniga","id":"100000037203591"},"message":"haha i forgot about that. Thanks!!!! :D","created_time":"2012-01-11T05:49:17+0000"},{"id":"204722549606084_230354117042927_230354443709561","from":{"name":"Connor Wilson","id":"1596499591"},"message":"try ^c? lol","created_time":"2012-01-11T05:49:17+0000"}],"count":3}}
HDFS Architecture
[Figure: a Client sends a filename to the NameNode (1), receives block IDs and DataNode locations (2), and reads data directly from the DataNodes (3). A Secondary NameNode assists the NameNode; DataNodes report cluster membership.]
• NameNode: maps a file to its list of blocks, and each block to the DataNodes holding it
• DataNode: maps a block-id to a physical location on disk
• SecondaryNameNode: periodic merge of the transaction log
Hadoop Distributed File System
• Files split into 128 MB blocks
• Blocks replicated across several DataNodes (often 3)
• NameNode stores metadata (file names, block locations, etc.)
• Optimized for large files and sequential reads
• Files are append-only
[Figure: File1 split into blocks 1-4, each block replicated across the DataNodes.]
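The block arithmetic implied by the bullets can be sketched as follows; BLOCK_SIZE and REPLICATION mirror the slide's 128 MB blocks and 3 replicas, and split_into_blocks is a hypothetical helper, not an HDFS API.

```python
BLOCK_SIZE = 128 * 1024 * 1024   # 128 MB, per the slide
REPLICATION = 3                  # typical replication factor

def split_into_blocks(file_size):
    # Number of fixed-size blocks needed to hold file_size bytes.
    return (file_size + BLOCK_SIZE - 1) // BLOCK_SIZE

n_blocks = split_into_blocks(300 * 1024 * 1024)  # a 300 MB file
replicas = n_blocks * REPLICATION                # block copies stored cluster-wide
# n_blocks == 3 and replicas == 9
```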
HDFS Architecture
[Figure: the NameNode holds metadata (name, replicas, …; e.g. /home/foo/data, 6, …) and serves metadata ops to Clients and block ops to DataNodes. Clients write blocks to and read blocks from DataNodes on Rack1 and Rack2; DataNodes replicate blocks among themselves.]
NameNode Metadata
• Metadata in memory
  – The entire metadata is in main memory
  – No demand paging of metadata
• Types of metadata
  – List of files
  – List of blocks for each file
  – List of DataNodes for each block
  – File attributes, e.g. creation time, replication factor
• A transaction log
  – Records file creations, file deletions, etc.
DataNode
• A block server
  – Stores data in the local file system (e.g. ext3)
  – Stores metadata of a block (e.g. CRC)
  – Serves data and metadata to clients
• Block report
  – Periodically sends a report of all existing blocks to the NameNode
• Facilitates pipelining of data
  – Forwards data to other specified DataNodes
Block Placement
• Current strategy
  – First replica on the local node
  – Second replica on a remote rack
  – Third replica on the same remote rack
  – Additional replicas are randomly placed
• Clients read from the nearest replica
• Would like to make this policy pluggable
Data Correctness
• Use checksums to validate data
  – CRC32
• File creation
  – Client computes a checksum per 512 bytes
  – DataNode stores the checksums
• File access
  – Client retrieves the data and checksums from the DataNode
  – If validation fails, the client tries other replicas
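A sketch of per-chunk checksumming with CRC32, using zlib.crc32 from the Python standard library; the 512-byte chunk size follows the slide, while checksums and validate are hypothetical helpers, not the HDFS client code.

```python
import zlib

CHUNK = 512  # bytes covered by each checksum, per the slide

def checksums(data: bytes):
    # One CRC32 per 512-byte chunk, as the DataNode would store.
    return [zlib.crc32(data[i:i + CHUNK]) for i in range(0, len(data), CHUNK)]

def validate(data: bytes, stored):
    # Client recomputes the checksums and compares with the stored list.
    return checksums(data) == stored

payload = b"spam" * 400                          # 1600 bytes -> 4 chunks
stored = checksums(payload)
ok = validate(payload, stored)                   # intact data passes
bad = validate(payload[:-4] + b"SPAM", stored)   # corrupted last chunk fails
```

On a validation failure like `bad`, the client would fall back to another replica of the same block.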
NameNode Failure
• A single point of failure
• Transaction log stored in multiple directories
  – A directory on the local file system
  – A directory on a remote file system (NFS/CIFS)
• Need to develop a real HA solution