CSC 536 Lecture 3
Outline • Akka example: mapreduce • Distributed transactions
MapReduce Framework: Motivation • Want to process lots of data (> 1 TB) • Want to parallelize the job across hundreds/thousands of commodity CPUs connected by a commodity network • Want to make this easy and reusable
Example Uses at Google • PageRank • wordcount • distributed grep • distributed sort • web link-graph reversal • term-vector per host • web access log stats • inverted index construction • document clustering • machine learning • statistical machine translation • …
Programming Model Users implement an interface of two functions: mapper(in_key, in_value) -> list((out_key, intermediate_value)) reducer(out_key, intermediate_value_list) -> (out_key, out_value)
Map phase Records from the data source are fed into the mapper function as (key, value) pairs: (filename, content) (goal: wordcount), (web page URL, web page content) (goal: web link-graph reversal). mapper produces one or more intermediate (output key, intermediate value) pairs from the input: (word, 1), (link URL, web page URL)
Reduce phase After the Map phase is over, all the intermediate values for a given output key are combined together into a list: (“hello”, 1), (“hello”, 1), (“hello”, 1) -> (“hello”, [1,1,1]) Done by an intermediate aggregator step of MapReduce. The reducer function combines those intermediate values into one or more final values for that same output key: (“hello”, [1,1,1]) -> (“hello”, 3)
Parallelism mapper functions run in parallel, creating different intermediate values from different input data sets. reducer functions also run in parallel, each working on a different output key. All values are processed independently
MapReduce example: wordcount • Problem: Count the number of occurrences of words in a set of files • Input to any MapReduce job: A set of (input_key, input_value) pairs • In wordcount: (input_key, input_value) = (filename, content)

filenames = ["a.txt", "b.txt", "c.txt"]
content = {}
for filename in filenames:
    with open(filename) as f:
        content[filename] = f.read()
MapReduce example: wordcount • The content of the input files a.txt: The quick brown fox jumped over the lazy grey dogs. b.txt: That's one small step for a man, one giant leap for mankind. c.txt: Mary had a little lamb, Its fleece was white as snow; And everywhere that Mary went, The lamb was sure to go.
MapReduce example: wordcount • Map phase: • Function mapper is applied to every (filename, content) pair • mapper moves through the words in the file • for each word it encounters, it returns the intermediate key and value (word, 1) • A call to mapper("a.txt", content["a.txt"]) returns: [('the', 1), ('quick', 1), ('brown', 1), ('fox', 1), ('jumped', 1), ('over', 1), ('the', 1), ('lazy', 1), ('grey', 1), ('dogs', 1)] • The output of the Map phase is the concatenation of the lists for mapper("a.txt", content["a.txt"]), mapper("b.txt", content["b.txt"]), and mapper("c.txt", content["c.txt"])
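As a concrete illustration, a wordcount mapper matching the slides' output might be sketched like this; the lowercasing and apostrophe-stripping are our assumptions, chosen so that tokens come out like the slides' ('thats', 1):

```python
import re

def mapper(filename, content):
    # Illustrative wordcount mapper: emit a (word, 1) pair for every
    # word in the file's content. Lowercasing and dropping apostrophes
    # mirrors the slides' output ("That's" -> 'thats').
    words = re.findall(r"[a-z]+", content.lower().replace("'", ""))
    return [(word, 1) for word in words]
```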
MapReduce example: wordcount • The output of the Map phase:
[('the', 1), ('quick', 1), ('brown', 1), ('fox', 1),
 ('jumped', 1), ('over', 1), ('the', 1), ('lazy', 1),
 ('grey', 1), ('dogs', 1), ('mary', 1), ('had', 1),
 ('a', 1), ('little', 1), ('lamb', 1), ('its', 1),
 ('fleece', 1), ('was', 1), ('white', 1), ('as', 1),
 ('snow', 1), ('and', 1), ('everywhere', 1), ('that', 1),
 ('mary', 1), ('went', 1), ('the', 1), ('lamb', 1),
 ('was', 1), ('sure', 1), ('to', 1), ('go', 1),
 ('thats', 1), ('one', 1), ('small', 1), ('step', 1),
 ('for', 1), ('a', 1), ('man', 1), ('one', 1),
 ('giant', 1), ('leap', 1), ('for', 1), ('mankind', 1)]
MapReduce example: wordcount • The Map phase of MapReduce is logically trivial • But when the input dictionary has, say, 10 billion keys, and those keys point to files held on thousands of different machines, implementing the map phase is actually quite non-trivial. • The MapReduce library should handle: • knowing which files are stored on what machines, • making sure that machine failures don’t affect the computation, • making efficient use of the network, and • storing the output in a useable form. • The programmer only writes the mapper function • The MapReduce framework takes care of everything else
MapReduce example: wordcount • In preparation for the Reduce phase, the MapReduce library groups together all the intermediate values which have the same key to obtain this intermediate dictionary: • {'and': [1], 'fox': [1], 'over': [1], 'one': [1, 1], 'as': [1], 'go': [1], 'its': [1], 'lamb': [1, 1], 'giant': [1], 'for': [1, 1], 'jumped': [1], 'had': [1], 'snow': [1], 'to': [1], 'leap': [1], 'white': [1], 'was': [1, 1], 'mary': [1, 1], 'brown': [1], 'lazy': [1], 'sure': [1], 'that': [1], 'little': [1], 'small': [1], 'step': [1], 'everywhere': [1], 'mankind': [1], 'went': [1], 'man': [1], 'a': [1, 1], 'fleece': [1], 'grey': [1], 'dogs': [1], 'quick': [1], 'the': [1, 1, 1], 'thats': [1]}
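The grouping the library performs can be sketched in a few lines (illustrative, single-machine; the function name is ours):

```python
from collections import defaultdict

def group(mapped_pairs):
    # Sketch of the grouping step the MapReduce library performs
    # between Map and Reduce: collect every intermediate value
    # under its output key.
    intermediate = defaultdict(list)
    for key, value in mapped_pairs:
        intermediate[key].append(value)
    return dict(intermediate)
```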
MapReduce example: wordcount • In the Reduce phase, a programmer-defined function • reducer(out_key, intermediate_value_list) • is applied to each entry in the intermediate dictionary. • For wordcount, reducer • sums up the list of intermediate values, and • returns both out_key and the sum as the output.

def reducer(out_key, intermediate_value_list):
    return (out_key, sum(intermediate_value_list))
MapReduce example: wordcount • The output from the Reduce phase, and from the complete MapReduce computation, is:
[('and', 1), ('fox', 1), ('over', 1), ('one', 2), ('as', 1),
 ('go', 1), ('its', 1), ('lamb', 2), ('giant', 1), ('for', 2),
 ('jumped', 1), ('had', 1), ('snow', 1), ('to', 1), ('leap', 1),
 ('white', 1), ('was', 2), ('mary', 2), ('brown', 1), ('lazy', 1),
 ('sure', 1), ('that', 1), ('little', 1), ('small', 1),
 ('step', 1), ('everywhere', 1), ('mankind', 1), ('went', 1),
 ('man', 1), ('a', 2), ('fleece', 1), ('grey', 1), ('dogs', 1),
 ('quick', 1), ('the', 3), ('thats', 1)]
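Putting the three phases together, the whole wordcount job can be sketched on a single machine (all names illustrative; a real MapReduce run distributes each step):

```python
import re
from collections import defaultdict

def run_wordcount(content):
    # One-machine sketch of the whole job: Map, group, then Reduce.
    # `content` maps filename -> file text, as in the earlier slides.
    intermediate = defaultdict(list)
    for filename, text in content.items():          # Map phase
        for word in re.findall(r"[a-z]+", text.lower().replace("'", "")):
            intermediate[word].append(1)            # grouping step
    return {word: sum(counts)                       # Reduce phase
            for word, counts in intermediate.items()}
```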
MapReduce example: wordcount • Map and Reduce can be done in parallel... but how is the grouping step that takes place between the Map phase and the Reduce phase done? • For the reducer functions to work in parallel, we need to ensure that all the intermediate values corresponding to the same key get sent to the same machine • The general idea: • Imagine you’ve got 1000 machines that you’re going to use to run reducers on. • As the mapper functions compute the output keys and intermediate value lists, they compute hash(out_key) mod 1000 for some hash function. • This number is used to identify the machine in the cluster that the corresponding reducer will be run on, and the resulting output key and value list is then sent to that machine. • Because every machine running mapper uses the same hash function, this ensures that value lists corresponding to the same output key all end up at the same machine. • Furthermore, by using a hash we ensure that the output keys end up pretty evenly spread over machines in the cluster
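A minimal sketch of the hashing idea; md5 is used here only because Python's built-in hash() of strings is randomized per process, while the scheme above needs every mapper machine to compute the same value:

```python
import hashlib

def partition(out_key, num_reducers=1000):
    # Deterministic hash of the output key: every machine running
    # mapper computes the same number, so all intermediate pairs for
    # a given key are sent to the same reducer machine.
    digest = hashlib.md5(out_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % num_reducers
```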
mapreduce example • project FirstAkkaApplication in lecture 3 code (from Akka Essentials by Gupta) • project mapreduce in lecture 3 code
MapReduce optimizations Locality Fault Tolerance Time optimization Bandwidth optimization
Locality Master program divvies up tasks based on location of data: tries to have mapper tasks on same machine as physical file data, or at least same rack. mapper task inputs are divided into 64 MB blocks, same size as Google File System chunks
Redundancy for Fault Tolerance Master detects worker failures via periodic heartbeats. Re-executes completed & in-progress mapper tasks. Re-executes in-progress reducer tasks
Redundancy for time optimization Reduce phase can’t start until Map phase is complete. Slow workers significantly lengthen completion time • A single slow disk controller can rate-limit the whole process • Other jobs consuming resources on machine • Bad disks with soft errors transfer data very slowly • Weird things: processor caches disabled Solution: Near end of phase, spawn backup copies of tasks • Whichever one finishes first "wins" • Effect: Dramatically shortens job completion time
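The backup-task idea can be sketched with threads (illustrative only; real MapReduce schedules backup copies on different machines):

```python
from concurrent.futures import ThreadPoolExecutor, FIRST_COMPLETED, wait

def run_with_backup(task, *args):
    # Speculative-execution sketch: launch the task and a backup copy;
    # return the result of whichever copy finishes first ("wins").
    with ThreadPoolExecutor(max_workers=2) as pool:
        futures = [pool.submit(task, *args), pool.submit(task, *args)]
        done, _ = wait(futures, return_when=FIRST_COMPLETED)
        return next(iter(done)).result()
```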
Bandwidth Optimizations "Aggregator" function can run on same machine as a mapper function Causes a mini-reduce phase to occur before the real Reduce phase, to save bandwidth
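For wordcount, such an aggregator (often called a combiner) is just a local mini-reduce; this sketch and its function name are ours:

```python
def combine(pairs):
    # Aggregator/combiner sketch: a mini-reduce run on the mapper's
    # machine, so only per-file totals cross the network instead of
    # one (word, 1) pair per occurrence.
    totals = {}
    for word, count in pairs:
        totals[word] = totals.get(word, 0) + count
    return sorted(totals.items())
```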
Distributed transactions • Transactions, like mutual exclusion, protect shared data against simultaneous access by several concurrent processes. • Transactions allow a process to access and modify multiple data items as a single atomic transaction. • If the process backs out halfway during the transaction, everything is restored to the point just before the transaction started.
Distributed transactions: example 1 • A customer dials into her bank web account and does the following: • Withdraws amount x from account 1. • Deposits amount x to account 2. • If the connection is broken after the first step but before the second, what happens? • Either both or neither should be completed. • Requires special primitives provided by the distributed system.
The Transaction Model • Examples of primitives for transactions:
BEGIN_TRANSACTION: mark the start of a transaction
END_TRANSACTION: terminate the transaction and try to commit
ABORT_TRANSACTION: kill the transaction and restore the old values
READ: read data from a file, a table, or otherwise
WRITE: write data to a file, a table, or otherwise
Distributed transactions: example 2
(a) Transaction to reserve three flights commits:
BEGIN_TRANSACTION
  reserve WP -> JFK;
  reserve JFK -> Nairobi;
  reserve Nairobi -> Malindi;
END_TRANSACTION
(b) Transaction aborts when third flight is unavailable:
BEGIN_TRANSACTION
  reserve WP -> JFK;
  reserve JFK -> Nairobi;
  reserve Nairobi -> Malindi full =>
ABORT_TRANSACTION
ACID • Transactions are • Atomic: to the outside world, the transaction happens indivisibly. • Consistent: the transaction does not violate system invariants. • Isolated (or serializable): concurrent transactions do not interfere with each other. • Durable: once a transaction commits, the changes are permanent.
Flat, nested and distributed transactions • A nested transaction • A distributed transaction
Implementation of distributed transactions • For simplicity, we consider transactions on a file system. • Note that if each process executing a transaction just updates the file in place, transactions will not be atomic, and changes will not vanish if the transaction aborts. • Other methods required.
Atomicity • If each process executing a transaction just updates the file in place, transactions will not be atomic: changes will not vanish if the transaction aborts.
Solution 1: Private Workspace • The file index and disk blocks for a three-block file • The situation after a transaction has modified block 0 and appended block 3 • After committing
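The private-workspace scheme in the figure can be sketched as a shadow copy layered over the original block index (all names illustrative, in-memory only):

```python
class PrivateWorkspace:
    # Sketch of solution 1: reads fall through to the original block
    # index, writes go only to a private shadow copy, and commit
    # installs the shadow blocks into the original.
    def __init__(self, original):
        self.original = original   # e.g. block index of a file
        self.shadow = {}

    def read(self, block):
        return self.shadow.get(block, self.original.get(block))

    def write(self, block, data):
        self.shadow[block] = data

    def commit(self):
        self.original.update(self.shadow)
        self.shadow = {}
```

An aborted transaction simply discards its shadow copy; the original file is never touched before commit.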
Solution 2: Writeahead Log
(a) A transaction:
x = 0; y = 0;
BEGIN_TRANSACTION;
  x = x + 1;
  y = y + 2;
  x = y * y;
END_TRANSACTION;
(b) - (d) The log before each statement is executed:
(b) Log: [x = 0/1]
(c) Log: [x = 0/1] [y = 0/2]
(d) Log: [x = 0/1] [y = 0/2] [x = 0/4]
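The log discipline can be sketched as follows (in-memory and illustrative): each write appends a (variable, old value, new value) record before updating the store, and abort replays the log backwards:

```python
class Transaction:
    # Write-ahead-log sketch: log the old/new values before applying
    # each write, so abort can restore the old values in reverse order.
    def __init__(self, store):
        self.store = store
        self.log = []

    def write(self, name, new_value):
        self.log.append((name, self.store[name], new_value))
        self.store[name] = new_value

    def abort(self):
        for name, old_value, _ in reversed(self.log):
            self.store[name] = old_value
        self.log = []
```

Running the slide's transaction on {'x': 0, 'y': 0} builds the log [('x', 0, 1), ('y', 0, 2), ('x', 1, 4)], matching entries (b) through (d) above.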
Concurrency control (1) • We just learned how to achieve atomicity; we will learn about durability when discussing fault tolerance • Need to handle consistency and isolation • Concurrency control allows several transactions to be executed simultaneously, while making sure that the data is left in a consistent state • This is done by scheduling operations on data in an order whereby the final result is the same as if all transactions had run sequentially
Concurrency control (2) • General organization of managers for handling transactions
Concurrency control (3) • General organization of managers for handling distributed transactions.
Serializability • The main issue in concurrency control is the scheduling of conflicting operations (operating on same data item and one of which is a write operation) • Read/Write operations can be synchronized using: • Mutual exclusion mechanisms, or • Scheduling using timestamps • Pessimistic/optimistic concurrency control
The lost update problem
Transaction T: balance = b.getBalance(); b.setBalance(balance*1.1); a.withdraw(balance/10)
Transaction U: balance = b.getBalance(); b.setBalance(balance*1.1); c.withdraw(balance/10)
Interleaved execution (accounts a, b, and c start with $100, $200, and $300, respectively):
T: balance = b.getBalance()      $200
U: balance = b.getBalance()      $200
T: b.setBalance(balance*1.1)     $220
U: b.setBalance(balance*1.1)     $220
T: a.withdraw(balance/10)        $80
U: c.withdraw(balance/10)        $280
U's update to b is lost: after both transactions, b should hold $242 but holds $220.
The inconsistent retrievals problem
Transaction V: a.withdraw(100); b.deposit(100)
Transaction W: aBranch.branchTotal()
Interleaved execution (accounts a and b start with $200 each):
V: a.withdraw(100);              $100
W: total = a.getBalance()        $100
W: total = total+b.getBalance()  $300
W: total = total+c.getBalance()
V: b.deposit(100)                $300
W's branch total sees a after the withdrawal but b before the deposit, so the $100 in transit is missed.
A serialized interleaving of T and U
Transaction T: balance = b.getBalance(); b.setBalance(balance*1.1); a.withdraw(balance/10)
Transaction U: balance = b.getBalance(); b.setBalance(balance*1.1); c.withdraw(balance/10)
Interleaved execution:
T: balance = b.getBalance()      $200
T: b.setBalance(balance*1.1)     $220
U: balance = b.getBalance()      $220
U: b.setBalance(balance*1.1)     $242
T: a.withdraw(balance/10)        $80
U: c.withdraw(balance/10)        $278
A serialized interleaving of V and W
Transaction V: a.withdraw(100); b.deposit(100)
Transaction W: aBranch.branchTotal()
Interleaved execution:
V: a.withdraw(100);              $100
V: b.deposit(100)                $300
W: total = a.getBalance()        $100
W: total = total+b.getBalance()  $400
W: total = total+c.getBalance()
...
Read and write operation conflict rules
Operations of different transactions | Conflict | Reason
read / read   | No  | the effect of a pair of read operations does not depend on the order in which they are executed
read / write  | Yes | the effect of a read and a write operation depends on the order of their execution
write / write | Yes | the effect of a pair of write operations depends on the order of their execution
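The rules above reduce to a short predicate (a sketch; the (transaction, kind, item) encoding of an operation is ours):

```python
def conflicts(op1, op2):
    # Two operations conflict iff they belong to different
    # transactions, touch the same data item, and at least one of
    # them is a write. Each op is a (transaction, kind, item) tuple.
    (t1, kind1, item1), (t2, kind2, item2) = op1, op2
    return t1 != t2 and item1 == item2 and 'write' in (kind1, kind2)
```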
Serializability • Two transactions are serialized • if and only if • All pairs of conflicting operations of the two transactions are executed in the same order at all objects they both access.
A non-serialized interleaving of operations of transactions T and U
T: x = read(i)
T: write(i, 10)
U: y = read(j)
U: write(j, 30)
T: write(j, 20)
U: z = read(i)
The conflicting operations on j run U before T, while those on i run T before U, so no serial order is equivalent to this interleaving.
Recoverability of aborts • Aborted transactions must be prevented from affecting other concurrent transactions • Dirty reads • Cascading aborts • Premature writes
A dirty read when transaction T aborts
Transaction T: balance = a.getBalance(); a.setBalance(balance + 10); abort
Transaction U: balance = a.getBalance(); a.setBalance(balance + 20); commit
Interleaved execution:
T: balance = a.getBalance()      $100
T: a.setBalance(balance + 10)    $110
U: balance = a.getBalance()      $110
U: a.setBalance(balance + 20)    $130
U: commit transaction
T: abort transaction
U has read, and committed a value based on, T's uncommitted update.
Cascading aborts • Suppose: • U delays committing until concurrent transaction T decides whether to commit or abort • Transaction V has seen the effects due to transaction U • T decides to abort • Then U must abort, and V, having seen U's effects, must abort as well: the abort cascades