CS 540 Database Management Systems Map/Reduce
Cluster Computing • Large number of commodity servers, connected by a high-speed commodity network • A rack holds a small number of servers • A data center holds many racks • Massive parallelism: • 100s, 1000s, or 10000s of servers, running jobs for many hours • Failure becomes a fact of life: • If the mean time between failures is 1 year • Then 10000 servers see about one failure per hour
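The failure arithmetic on this slide can be checked directly; a quick sketch, assuming a per-server mean time between failures of one year (about 8760 hours) and independent failures:

```python
# Expected cluster-wide failure rate, assuming independent server failures
# and a per-server mean time between failures (MTBF) of one year.
HOURS_PER_YEAR = 365 * 24                 # 8760 hours
servers = 10_000
per_server_rate = 1 / HOURS_PER_YEAR      # failures per server per hour
cluster_rate = servers * per_server_rate  # failures per hour, cluster-wide

print(f"{cluster_rate:.2f} failures/hour")  # roughly one failure every hour
```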
Distributed File System (DFS) • Large files, on the order of TBs or PBs • Each file is partitioned into chunks, e.g. 64MB • Each chunk is replicated multiple times over different racks for fault tolerance • DFS implementations • Google’s DFS (GFS) • Hadoop’s DFS (HDFS)
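As a rough illustration of chunking, the storage cost of the scheme above can be computed; a sketch using the slide's 64 MB chunk size, with a hypothetical 1 TB file and a replication factor of 3 (a common default, not stated on the slide):

```python
import math

# Hypothetical sizes for illustration: a 1 TB file split into 64 MB chunks,
# each chunk replicated 3 times across different racks.
file_size = 1 * 1024**4        # 1 TB in bytes
chunk_size = 64 * 1024**2      # 64 MB in bytes
replication = 3

chunks = math.ceil(file_size / chunk_size)       # number of chunks in the file
stored_bytes = chunks * chunk_size * replication # total bytes actually stored

print(chunks, stored_bytes // 1024**4)  # 16384 chunks, 3 TB stored
```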
Map/Reduce • Google researchers introduced the Map/Reduce framework in a paper published in 2004. • A high-level programming model and implementation for large-scale parallel data processing. • Apache Hadoop is an open source variant of Map/Reduce.
Map/Reduce Programs • Read and process a lot of data • MAP: • Extract some relevant information from each tuple. • The system shuffles and sorts the output tuples. • REDUCE: • Aggregate the information over a bag of tuples • Summarize, filter, transform • Write the results
Data Model • File as a bag of (key, value) • Like key/value stores • A map/reduce program • Input: a bag of (input_key, value) • Output: a bag of (output_key, value) • Input and output may have different keys.
Map Step • User provides the MAP function • Input: (input key, value) • Output: bag of (intermediate key, value) • System applies the map function in parallel to all (input key, value) pairs in the input file.
Reduce Step • User provides the REDUCE function • Input: (intermediate key, bag of values) • Output: bag of output values • System groups all pairs with the same intermediate key, and passes the bag of values to the REDUCE function
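The two steps above can be sketched as a tiny in-memory framework; `run_mapreduce` below is a hypothetical helper (not part of Hadoop or Google's implementation), assuming the user supplies `map_fn` and `reduce_fn` with the signatures described on these slides:

```python
from collections import defaultdict

def run_mapreduce(map_fn, reduce_fn, input_pairs):
    """Minimal in-memory sketch of the Map/Reduce data flow.

    map_fn(key, value)     -> iterable of (intermediate_key, value)
    reduce_fn(key, values) -> iterable of output values
    """
    # Map step: apply map_fn to every (input key, value) pair.
    groups = defaultdict(list)
    for k, v in input_pairs:
        for ik, iv in map_fn(k, v):
            groups[ik].append(iv)  # shuffle: group by intermediate key

    # Reduce step: one call per intermediate key, over its bag of values.
    return {ik: list(reduce_fn(ik, vals)) for ik, vals in groups.items()}

# Usage: build an inverted index (word -> sorted list of document ids).
docs = [("d1", "apple pear"), ("d2", "pear")]
index = run_mapreduce(
    lambda doc_id, text: [(w, doc_id) for w in text.split()],
    lambda w, doc_ids: sorted(set(doc_ids)),
    docs)
print(index)  # {'apple': ['d1'], 'pear': ['d1', 'd2']}
```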
Example • Counting the number of occurrences of each word in a large collection of documents

map(String key, String value) {
  // key: document id
  // value: document contents
  for each word w in value
    Output-interim(w, "1");
}

reduce(String key, Iterator values) {
  // key: a word
  // values: a bag of counts
  int result = 0;
  for each v in values
    result += parseInt(v);
  Output(String.valueOf(result));
}
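The word-count pseudocode above translates directly to Python; a self-contained sketch, where the grouping the framework performs between map and reduce is simulated with a dictionary:

```python
from collections import defaultdict

def map_fn(doc_id, contents):
    # key: document id, value: document contents
    for word in contents.split():
        yield word, "1"              # emit an interim count of 1 per word

def reduce_fn(word, values):
    # key: a word, values: a bag of string counts
    return str(sum(int(v) for v in values))

# Simulate the framework: run map, shuffle/sort by word, then reduce.
documents = {"doc1": "to be or not to be", "doc2": "to do"}
interim = defaultdict(list)
for doc_id, contents in documents.items():
    for word, count in map_fn(doc_id, contents):
        interim[word].append(count)

counts = {word: reduce_fn(word, values) for word, values in interim.items()}
print(counts)  # "to" appears 3 times across both documents
```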
Schedule (figure: scheduling of MAP and REDUCE tasks across workers)
Map Reduce Phases (figure: input read from HDFS, map output written to local storage, reduce output written back to HDFS)
Map/Reduce Job versus Task • A Map/Reduce Job • One single “query”, e.g. count the words in all docs. • More complex programs may consist of multiple jobs. • A Map (or a Reduce) Task • A group of instantiations of the map (or reduce) function, which are scheduled on a single worker.
Implementation • Master node: • partitions the input file into M splits, by key. • assigns workers (= servers) to the M map tasks. • keeps track of their progress. • Workers write their output to local disk, partitioned into R regions. • Master assigns workers to the R reduce tasks. • Reduce workers read the regions from the map workers’ local disks.
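The split of map output into R regions is typically done with a hash partitioner (hash(key) mod R is the default described in the MapReduce paper); a minimal sketch, using the stable `zlib.crc32` instead of Python's salted built-in `hash` so the assignment is deterministic across runs:

```python
import zlib

R = 4  # number of reduce tasks, and hence regions per map worker

def partition(key: str, num_reducers: int = R) -> int:
    """Assign an intermediate key to one of R regions (reduce tasks)."""
    return zlib.crc32(key.encode()) % num_reducers

# Every occurrence of the same key lands in the same region, so a single
# reduce worker sees all the values for that key.
assert partition("apple") == partition("apple")
regions = [partition(w) for w in ["a", "b", "c", "d", "e"]]
print(regions)  # each entry is in range(R)
```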
Implementation • Master pings workers periodically • If a worker is down, the master reassigns its task to another worker. • Straggler: a server that takes an unusually long time to complete one of the last tasks, e.g. because: • The cluster scheduler has assigned other tasks to the server. • A bad disk causes frequent correctable errors. • Stragglers are a main cause of slowdown • Map/Reduce’s solution: pre-emptive backup execution of the last few remaining in-progress tasks
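The effect of backup tasks can be illustrated with a toy simulation; a sketch under simplifying assumptions (all numbers are hypothetical): tasks run in parallel, backups launch at a fixed time for every unfinished task, and a task finishes as soon as either its original or its backup copy finishes:

```python
# Toy model of pre-emptive backup (speculative) execution.
task_times = [10, 11, 12, 95]  # one straggler taking 95 time units
backup_start = 15              # launch backups for tasks still running then
backup_run = 12                # a backup re-runs the task from scratch

def finish_time(original):
    if original <= backup_start:               # finished before backups launch
        return original
    return min(original, backup_start + backup_run)  # whichever copy wins

without_backups = max(task_times)                       # job waits on straggler
with_backups = max(finish_time(t) for t in task_times)  # straggler cut short

print(without_backups, with_backups)  # 95 vs 27
```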
Tuning • It is very difficult. • Choice of M and R: • Larger is better for load balancing • Limitation: the master needs O(M×R) memory • Typical choice: • M: the number of input chunks • R: much smaller; rule of thumb: R = 1.5 × number of servers • Over 100 other parameters (partition function, sort factor, …); around 50 of them affect running time. • Active research area
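The rules of thumb above can be written out; a sketch with hypothetical cluster numbers (the 64 MB chunk size and the 1.5× factor come from the slides, the 200-server cluster is an assumption for illustration):

```python
import math

# Hypothetical cluster: 200 servers processing a 1 TB input file.
servers = 200
input_bytes = 1 * 1024**4
chunk_bytes = 64 * 1024**2

M = math.ceil(input_bytes / chunk_bytes)  # one map task per input chunk
R = int(1.5 * servers)                    # rule of thumb from the slides

# The master tracks O(M * R) scheduling state, which bounds how large
# both parameters can grow.
master_state = M * R
print(M, R, master_state)
```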
Map/Reduce Discussion • Advantage: hides scheduling and parallelization details. • Disadvantage: very limited queries • More complex tasks are difficult to write: thousands of lines of code, and debugging is not easy. • Usually need multiple map/reduce operations. • Solution: declarative query languages • PIG Latin (Yahoo!) • HiveQL (Facebook) • …
Declarative Languages over Map/Reduce • PIG Latin (Yahoo!) • New language, similar to Relational Algebra • Open source • HiveQL (Facebook) • SQL-like language • Open source • BigQuery (Google) • SQL on Map/Reduce • Proprietary
SQL Operations • Map: Group By • Reduce: Aggregate • How to compute the join between R(A,B) and S(B,C)? • Map: group R by R.B and S by S.B • Input: a tuple R(a,b) or a tuple S(b,c) • Output: (b,R(a,b)) or (b,S(b,c)) • Reduce: • Input: (b,{R(a1,b),R(a2,b),...,S(b,c1),S(b,c2),...}) • Output: {R(a1,b),R(a2,b),...} × {S(b,c1),S(b,c2),...} • It relies on the MR framework for partitioning • We can do better (covered in the next course).
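The join algorithm on this slide can be written out in Python; a minimal sketch of the reduce-side join of R(A,B) and S(B,C), tagging each tuple with its relation so the reducer can separate the two bags before taking their cross product:

```python
from collections import defaultdict

R = [(1, "x"), (2, "x"), (3, "y")]  # tuples (a, b) of R(A, B)
S = [("x", 10), ("y", 20)]          # tuples (b, c) of S(B, C)

# Map: emit each tuple under its join key b, tagged with its relation.
mapped = [(b, ("R", a)) for a, b in R] + [(b, ("S", c)) for b, c in S]

# Shuffle: group by the join key b.
groups = defaultdict(list)
for b, tagged in mapped:
    groups[b].append(tagged)

# Reduce: for each b, output the cross product of R-tuples and S-tuples.
joined = []
for b, tuples in groups.items():
    r_side = [a for tag, a in tuples if tag == "R"]
    s_side = [c for tag, c in tuples if tag == "S"]
    joined.extend((a, b, c) for a in r_side for c in s_side)

print(sorted(joined))  # [(1, 'x', 10), (2, 'x', 10), (3, 'y', 20)]
```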
More Operations • This simple algorithm relies on the MR framework for partitioning • The framework does not know about the join. • We can do better. • Other algorithms • Multi-way joins • Computational algorithms • Matrix multiplication • Covered in the next course: big data analytics.
Parallel DBMS versus Map/Reduce • Parallel DBMS • Relational data model and schema • Declarative query language: SQL • Many pre-defined operators • Can easily combine operators into complex queries • Query optimization, indexing, and physical tuning • Pipelines data from one operator to the next • Does more than just running queries • Updates and transactions • Constraints • Security • …
Parallel DBMS versus Map/Reduce • Map/Reduce • Data model is a file of (key, value) pairs. • No need to transform and load data. • Easy to write user-defined operators. • Can easily add more nodes to the cluster. • Intra-query fault tolerance, because it stores intermediate results on disk. • Handles problems such as stragglers • More scalable, but needs more nodes
Lessons • Usability, usability, usability! • The main cause of Map/Reduce’s popularity: it is easy for developers to use. • There is still a lot of room for improvement. • Sometimes we have to rebuild a framework from scratch: • Map/Reduce’s designers did not extend parallel databases.