Map Reduce Basics – Chapter 2
Basics • Divide and conquer • Partition a large problem into smaller subproblems • Workers work on the subproblems in parallel • Threads in a core, cores in a multi-core processor, multiple processors in a machine, machines in a cluster • Combine intermediate results from the workers into the final result • Issues • How to break the problem into smaller tasks • How to assign tasks to workers • How workers get the data they need • How to coordinate synchronization among workers • How to share partial results • How to do all of this in the presence of software errors and hardware faults
Basics • MR – an abstraction that hides system-level details from the programmer • Move code to the data • Spread data across disks • A DFS manages storage
Topics • Functional programming • MapReduce • Distributed file system
Functional Programming Roots • MapReduce = functional programming plus distributed processing on steroids • Not a new idea… dates back to the 50’s (or even 30’s) • What is functional programming? • Computation as application of functions • Computation is evaluation of mathematical functions • Avoids state and mutable data • Emphasizes application of functions instead of changes in state
Functional Programming Roots • How is it different? • Traditional notions of “data” and “instructions” are not applicable • Data flows are implicit in the program • Different orders of execution are possible • Theoretical foundation provided by the lambda calculus • a formal system for function definition • Exemplified by LISP, Scheme
Overview of Lisp • Functions written in prefix notation • (+ 1 2) → 3 • (* 3 4) → 12 • (sqrt (+ (* 3 3) (* 4 4))) → 5 • (define x 3) → x • (* x 5) → 15
Functions • Functions = lambda expressions bound to variables • Example expressed with lambda: λx.λy.(x + y), applied as in (+ 1 2) → 3 • (define foo (lambda (x y) (sqrt (+ (* x x) (* y y))))) • The above expression is equivalent to: (define (foo x y) (sqrt (+ (* x x) (* y y)))) • Once defined, the function can be applied: (foo 3 4) → 5
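As a rough analogue in Java (the language Hadoop itself is written in, and the one used for the remaining sketches in this chapter), binding a name to an anonymous function looks like this; a minimal sketch, with LambdaDemo and foo as illustrative names:

import java.util.function.BinaryOperator;

public class LambdaDemo {
    public static void main(String[] args) {
        // Bind a name to a lambda expression, like (define foo (lambda (x y) ...))
        BinaryOperator<Double> foo = (x, y) -> Math.sqrt(x * x + y * y);
        System.out.println(foo.apply(3.0, 4.0));  // prints 5.0
    }
}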
Functional Programming Roots • Two important concepts in functional programming • Map: do something to everything in a list • Fold: combine results of a list in some way
Functional Programming Map • Higher-order functions – accept other functions as arguments • Map • Takes a function f and its argument, which is a list • Applies f to all elements in the list • Returns a list as the result • Lists are primitive data types • [1 2 3 4 5] • [[a 1] [b 2] [c 3]]
Map/Fold in Action • Simple map example: (map (lambda (x) (* x x)) [1 2 3 4 5]) → [1 4 9 16 25]
Functional Programming Reduce • Fold • Takes a two-argument function g, an initial value, and a list • g is applied to the initial value and the 1st item in the list • The result is stored in an intermediate variable • The intermediate variable and the next item in the list are the arguments to the 2nd application of g, and so on • Fold returns the final value of the intermediate variable
Map/Fold in Action • Simple map example: (map (lambda (x) (* x x)) [1 2 3 4 5]) → [1 4 9 16 25] • Fold examples: (fold + 0 [1 2 3 4 5]) → 15, (fold * 1 [1 2 3 4 5]) → 120 • Sum of squares: (define (sum-of-squares v) (fold + 0 (map (lambda (x) (* x x)) v))) ; where v is a list • (sum-of-squares [1 2 3 4 5]) → 55
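The same map and fold steps can be sketched with Java's Streams API; a minimal illustration (MapFoldDemo and sumOfSquares are names chosen here, not from the slides):

import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class MapFoldDemo {
    public static void main(String[] args) {
        List<Integer> v = Arrays.asList(1, 2, 3, 4, 5);

        // map: apply a function to every element of the list
        List<Integer> squares = v.stream().map(x -> x * x).collect(Collectors.toList());  // [1, 4, 9, 16, 25]

        // fold: combine the list with an initial value and a two-argument function
        int sum = v.stream().reduce(0, Integer::sum);         // 15
        int product = v.stream().reduce(1, (a, b) -> a * b);  // 120

        // sum of squares: a map followed by a fold
        int sumOfSquares = v.stream().map(x -> x * x).reduce(0, Integer::sum);  // 55

        System.out.println(squares + " " + sum + " " + product + " " + sumOfSquares);
    }
}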
Functional Programming Roots • Use map/fold in combination • Map – a transformation of a dataset • Fold – an aggregation operation • Map can be applied in parallel • Fold has more restrictions – elements must be brought together • But many applications do not require g to be applied to all elements of the list at once, so fold aggregations can also run in parallel
Functional Programming Roots • Map in MapReduce is the same as in functional programming • Reduce corresponds to fold • 2 stages: • A user-specified computation is applied over all input records; it can occur in parallel and returns intermediate output • The intermediate output is aggregated by another user-specified computation
Mappers/Reducers • Key-value pair (k,v) – the basic data structure in MR • Keys and values – ints, strings, etc.; user defined • e.g. keys – URLs, values – HTML content • e.g. keys – node ids, values – adjacency lists of nodes • Map: (k1, v1) -> [(k2, v2)] • Reduce: (k2, [v2]) -> [(k3, v3)] • Where […] denotes a list
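As a sketch, these signatures can be written down as plain Java interfaces; the interface and type-parameter names below are illustrative, not part of any MapReduce API:

import java.util.List;
import java.util.Map;

// The MapReduce contract as plain Java interfaces (names are illustrative).
interface MapFn<K1, V1, K2, V2> {
    // map: (k1, v1) -> [(k2, v2)]
    List<Map.Entry<K2, V2>> map(K1 key, V1 value);
}

interface ReduceFn<K2, V2, K3, V3> {
    // reduce: (k2, [v2]) -> [(k3, v3)]
    List<Map.Entry<K3, V3>> reduce(K2 key, List<V2> values);
}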
General Flow • Map • Apply the mapper to every input key-value pair stored in the DFS • Generate an arbitrary number of intermediate (k,v) pairs • Group-by operation on intermediate keys within the mapper (really a sort? but called a shuffle) • Distribute intermediate results by key across the network to the reducers (really a shuffle? but called a sort) • Reduce • Aggregate intermediate results • Generate final output to the DFS – one file per reducer
Another Example: unigram (word count) • Input: (docid, doc) pairs on the DFS, where doc is the text • Mapper tokenizes the doc and emits a (word, 1) pair for every word • The execution framework brings all pairs with the same key together at a reducer • Reducer sums all the counts (of 1) for each word • Each reducer writes one output file • Words within a file are sorted; each file gets roughly the same number of words • The output can be used as the input to another MR job
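A minimal word-count mapper and reducer sketched against the Hadoop 2.x Java API (the class names TokenizerMapper and IntSumReducer are illustrative, and whitespace tokenization is an assumption):

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Mapper: tokenizes each line of the document and emits (word, 1)
class TokenizerMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        StringTokenizer itr = new StringTokenizer(value.toString());
        while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            context.write(word, ONE);
        }
    }
}

// Reducer: sums the 1s for each word brought together by the framework
class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable v : values) sum += v.get();
        context.write(key, new IntWritable(sum));
    }
}

Note that grouping all (word, 1) pairs by key is done by the execution framework, not by this code.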
Combine – Bandwidth Optimization • Issue: there can be a very large number of intermediate key-value pairs • Example – word count emits (word, 1) for every occurrence • If copied across the network as-is, the intermediate data can exceed the input • Solution: use Combiner functions • Allow local aggregation (after the mapper) before the shuffle/sort • Word count – aggregate (count each word locally); intermediate data per mapper = # unique words • Executed on the same machine as the mapper – sees no output from other mappers • Results in a “mini-reduce” right after the map phase • Its (k,v) types must match the mapper output / reducer input types • If the operation is associative and commutative, the reducer can be reused as the combiner • Reduces the number of key-value pairs copied, saving bandwidth
Partitioners – Load Balance • Issue: the intermediate results could all end up at one reducer • Solution: use Partitioner functions • Divide up the intermediate key space and assign (k,v) pairs to reducers • Specifies the reduce task to which each (k,v) pair is copied • Each reducer processes its keys in sorted order • The partitioner applies a function to the key • Hopefully each reducer receives about the same number of pairs • But the key distribution may be Zipfian (skewed)
MapReduce • Programmers specify two functions: map (k, v) → <k’, v’>* and reduce (k’, v’) → <k’, v’>* • All v’ with the same k’ are reduced together • Usually, programmers also specify: partition (k’, number of partitions) → partition for k’ • Often a simple hash of the key, e.g. hash(k’) mod n, where n is the number of reducers • Allows reduce operations for different keys to run in parallel
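A hedged sketch of such a partitioner in the Hadoop 2.x API; Hadoop's built-in HashPartitioner does essentially this, and WordPartitioner is just an illustrative name:

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

// Assigns each intermediate key to a reducer: hash(k') mod n, where n = number of reducers.
class WordPartitioner extends Partitioner<Text, IntWritable> {
    @Override
    public int getPartition(Text key, IntWritable value, int numReduceTasks) {
        // Mask off the sign bit so the result of mod is never negative.
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }
}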
It’s not just Map and Reduce • Map • Apply the mapper to every input key-value pair stored in the DFS • Generate an arbitrary number of intermediate (k,v) pairs • Combine • Aggregate locally • Partition • Assign (k,v) pairs to reducers • Group-by operation on intermediate keys • Distribute intermediate results by key across the network • Reduce • Aggregate intermediate results • Generate final output to the DFS – one file per reducer
Execution Framework • A MapReduce program (job) contains • Code for mappers • Combiners • Partitioners • Code for reducers • Configuration parameters (where the input is, where to store the output) • The execution framework takes care of everything else • The developer submits the job to the submission node of the cluster (the jobtracker)
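A hedged sketch of such a job, assuming the Hadoop 2.x Job API and the illustrative TokenizerMapper/IntSumReducer classes sketched earlier; the job name and argument-based paths are placeholders:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");   // a job = code + configuration
        job.setJarByClass(WordCountDriver.class);

        job.setMapperClass(TokenizerMapper.class);    // code for mappers
        job.setCombinerClass(IntSumReducer.class);    // combiner: local aggregation (sum is associative and commutative)
        job.setReducerClass(IntSumReducer.class);     // code for reducers
        // job.setPartitionerClass(WordPartitioner.class);  // optional: the default is already a hash partitioner
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));    // where the input is
        FileOutputFormat.setOutputPath(job, new Path(args[1]));  // where to store the output
        System.exit(job.waitForCompletion(true) ? 0 : 1);        // submit and wait; framework handles the rest
    }
}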
Recall these problems? • How do we assign work units to workers? • What if we have more work units than workers? • What if workers need to share partial results? • How do we aggregate partial results? • How do we know all the workers have finished? • What if workers die?
Execution Framework • Scheduling • A job is divided into tasks (each task handles a certain block of (k,v) pairs) • There can be 1000s of tasks that need to be assigned • This may exceed the number that can run concurrently • Tasks wait in a task queue • Coordination is needed among tasks from different jobs
Execution Framework • Speculative execution • The map phase is only as fast as? • the slowest map task • Problem: stragglers, flaky hardware • Solution: use speculative execution: • An exact copy of the same task runs on a different machine • The result of whichever copy finishes first is used • Better for map or reduce? • Can improve running time by 44% (Google) • Doesn’t help if the distribution of values is skewed
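In Hadoop 2.x, speculative execution is switched on or off through two configuration properties; a small hedged fragment (defaults vary by version):

import org.apache.hadoop.conf.Configuration;

public class SpeculativeConfig {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Launch backup copies of straggling tasks and use whichever copy finishes first.
        conf.setBoolean("mapreduce.map.speculative", true);
        conf.setBoolean("mapreduce.reduce.speculative", true);
        System.out.println(conf.getBoolean("mapreduce.map.speculative", false));
    }
}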
Execution Framework • Data/code co-location • Execute code near the data • If that is not possible, the data must be streamed across the network • Try to stay within the same rack
Execution Framework • Synchronization • Concurrently running processes must join up • Intermediate (k,v) pairs are grouped by key; the intermediate data is copied over the network and shuffled/sorted • Number of copy operations? Worst case: • M × R copy operations • Each mapper may send intermediate results to every reducer (e.g. 1,000 mappers and 100 reducers give up to 100,000 copies) • The reduce computation cannot start until all mappers have finished and the (k,v) pairs have been shuffled/sorted • This differs from functional programming • Intermediate (k,v) pairs can, however, be copied over the network to reducers as soon as a mapper finishes
Execution Framework • Error/fault handling • Failures are the norm • Disk failures, RAM errors, datacenter outages • Software errors • Corrupted data
Map Reduce • Implementations: • Google has a proprietary implementation in C++ • Hadoop is an open-source implementation in Java (led by Yahoo)
Differences in MapReduce Implementations • Hadoop (Apache) vs. Google • Hadoop – values are arbitrarily ordered; the key can be changed in the reducer • Google – the program can specify a secondary sort; the key cannot be changed in the reducer • Hadoop • The programmer can specify the number of map tasks, but the framework makes the final decision • For reduce, the programmer-specified number of tasks is used
Hadoop • Be careful using external resources (e.g. querying a SQL DB can become a bottleneck) • Mappers can emit an arbitrary number of intermediate (k,v) pairs, which can be of a different type than the input • Reducers can emit an arbitrary number of final (k,v) pairs, which can be of a different type than the intermediate (k,v) pairs • Different from functional programming: mappers/reducers can have side effects (internal state changes may cause problems; external side effects may write to files) • A MapReduce job can have no reduce, but must have a mapper • Can just pass the identity function to the reducer • May not have any input at all • e.g. compute pi
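Two of these cases correspond to one-liners in the Hadoop 2.x Job API; a hedged sketch (ReducePhaseOptions is just an illustrative wrapper class):

import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Reducer;

public class ReducePhaseOptions {
    // Map-only job: no reduce (and no shuffle) phase at all.
    static void makeMapOnly(Job job) {
        job.setNumReduceTasks(0);
    }

    // Identity reduce: the base Reducer class passes every (k, v) pair straight through.
    static void useIdentityReducer(Job job) {
        job.setReducerClass(Reducer.class);
    }
}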
Other Sources • Other sources can serve as the source/destination for MapReduce data • Google – BigTable • HBase – BigTable clone • Hadoop – can be integrated with relational DBs for parallel processing and can write to DB tables
Distributed File System (DFS) • In HPC, storage is distinct from computation • NAS (network-attached storage) and SANs are common • Separate, dedicated nodes for storage • Fetch, load, process, write • This becomes a bottleneck • Higher-performance networks cost $$ (10G Ethernet); special-purpose interconnects cost $$$ (InfiniBand) • Cost increases non-linearly • In GFS, computation and storage are not distinct components
Hadoop Distributed File System – HDFS • GFS supports Google’s proprietary MapReduce • HDFS supports Hadoop • MapReduce doesn’t have to run on a distributed file system, but without one it misses the advantages • Differences of GFS and HDFS vs. a traditional DFS: • Adapted to large-data processing • User data is divided into large chunks/blocks • These are replicated across the local disks of nodes in the cluster • Master-slave architecture
HDFS vs GFS (Google File System) • Differences in HDFS: • Master-slave architecture • GFS: master (master), slave (chunkserver) • HDFS: master (namenode), slave (datanode) • Master – manages the namespace (metadata, directory structure, file-to-block mapping, location of blocks, access permissions) • Slaves – manage the actual data blocks • A client contacts the namenode, then gets the data from the slaves; 3 copies of each block, etc. • Blocks are 64 MB • Initially files were immutable – once closed they cannot be modified
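A hedged configuration sketch for the replication factor and block size mentioned here, using the Hadoop 2.x property names dfs.replication and dfs.blocksize (newer clusters typically default to 128 MB blocks):

import org.apache.hadoop.conf.Configuration;

public class HdfsBlockConfig {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        conf.setInt("dfs.replication", 3);                 // three copies of each block
        conf.setLong("dfs.blocksize", 64L * 1024 * 1024);  // 64 MB blocks, as on the slide
        System.out.println(conf.get("dfs.blocksize"));
    }
}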
HDFS • Namenode • Namespace management • Coordinates file operations • Lazy garbage collection • Maintains file system health • Heartbeats, under-replication, balancing • Supports a subset of the POSIX API; the rest is pushed to the application • No security
Hadoop Cluster Architecture • The HDFS namenode runs the namenode daemon • The job submission node runs the jobtracker • Point of contact for clients running MapReduce • Monitors the progress of MapReduce jobs, coordinates mappers and reducers • Slaves run the tasktracker • Runs the user’s code; the datanode daemon serves HDFS data • Sends heartbeat messages to the jobtracker
Hadoop Cluster Architecture • The number of reduce tasks depends on the number of reducers specified by the programmer • The number of map tasks depends on • A hint from the programmer • The number of input files • The number of HDFS data blocks of those files
Hadoop Cluster Architecture • Each map task is assigned a chunk of (k,v) pairs called an input split • Input splits are computed automatically • They are aligned on HDFS block boundaries so each split is associated with a single block, which simplifies scheduling • Data locality: if the data is not local, it is streamed across the network (from the same rack if possible)
Hadoop Cluster Architecture • Mappers in Hadoop • Java objects with a MAP method • A mapper object is instantiated for every map task by the tasktracker • Life cycle – instantiation, then a hook in the API for program-specified initialization code • Mappers can load state, static data sources, dictionaries, etc. • After initialization: the MAP method is called by the framework on all (k,v) pairs in the input split • Since these are method calls on the same Java object, state can be preserved across multiple (k,v) pairs in the same task • Programmer-specified termination code can run at the end
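A hedged sketch of that life cycle against the Hadoop 2.x Mapper API; the stop-word dictionary and the pair counter are illustrative details, not from the slides:

import java.io.IOException;
import java.util.HashSet;
import java.util.Set;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

class LifecycleMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private final Set<String> stopWords = new HashSet<>();  // state preserved across map() calls
    private long pairsSeen = 0;

    @Override
    protected void setup(Context context) {
        // Initialization hook: load state, static data sources, dictionaries, etc.
        stopWords.add("the");
        stopWords.add("a");
    }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        pairsSeen++;  // called once for every (k,v) pair in this task's input split
        for (String token : value.toString().split("\\s+")) {
            if (!token.isEmpty() && !stopWords.contains(token)) {
                context.write(new Text(token), new IntWritable(1));
            }
        }
    }

    @Override
    protected void cleanup(Context context) {
        // Termination hook: runs after the last (k,v) pair of the split.
        System.err.println("Processed " + pairsSeen + " input pairs");
    }
}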