430 likes | 514 Views
Phase Reconciliation for Contended In-Memory Transactions. Neha Narula, Cody Cutler, Eddie Kohler, Robert Morris MIT CSAIL and Harvard. Cloud Computing and Databases. Two trends: multicore and in-memory databases
E N D
Phase Reconciliation for Contended In-Memory Transactions Neha Narula, Cody Cutler, Eddie Kohler, Robert Morris MIT CSAIL and Harvard
Cloud Computing and Databases Two trends: multicore and in-memory databases Multicore databases face increased contention due to skewed workloads and rising core counts
BEGIN Transaction ADD(x, 1) ADD(y, 2) ENDTransaction
Concurrency Control Forces Serialization core 0 ADD(x)ADD(y) core 1 ADD(x)ADD(y) core 2 ADD(x)ADD(y) time
BEGIN Transaction ADD(x, 1) ADD(y, 2) ENDTransaction
Multicore OS Scalable Counters Kernel can apply increments in parallel using per-core counters x0 = x0+1; core 0 core 1 x1 = x1+1; core 2 x2 = x2+1; time
Multicore OS Scalable Counters To read per-core data, system has to stop all writes and reconcile x x0 = x0+1; core 0 core 1 x1 = x1+1; core 2 print(x); x2 = x2+1; x=x0+x1+x2; print(x) time
BEGIN Transaction ADD(x, 1) ADD(y, 2) ENDTransaction BEGIN Transaction GET(x) GET(y) ENDTransaction
Challenges • Deciding what records should be split among cores • Transactions operating on some contentious and some normal data • Different kinds of operations on the same records
Phase Reconciliation • Database automatically detects contention to split a record among cores • Database cycles through phases: split and joined • OCC serializes access to non-split data • Split records have assigned operations for split phases Doppel, an in-memory transactional database
Outline • Phases • Splittable operations • Doppel optimizations • Performance evaluation
Split Phase split phase • A transaction can operate on split and non-split data • Split records are written to per-core (x) • Rest of the records use OCC (u, v, w) • OCC ensures serializability for the non-split parts of the transaction core 0 ADD(x) ADD(u) GET(x)GET(v) x0=x0+1 core 1 ADD(x) ADD(v) x1=x1+1 core 2 ADD(x) ADD(w) x2=x2+1
Reconciliation reconciliation split phase • Cannot correctly process a read of x in current state • Abort read transaction • Reconcile per-core data to global store x=x+x0 x0=0 core 0 ADD(x) ADD(u) GET(x)GET(v) x0=x0+1 core 1 ADD(x) ADD(v) x=x+x1 x1=0 x1=x1+1 core 2 ADD(x) ADD(w) x=x+x2 x2=0 x2=x2+1
Joined Phase joined phase reconciliation split phase • Wait until all processes have finished reconciliation • Resume aborted read transactions using OCC • Process all new transactions using OCC • No split data x=x+x0 x0=0 core 0 GET(x) GET(v) ADD(x) ADD(u) GET(x)GET(v) x0=x0+1 core 1 ADD(x) ADD(v) x=x+x1 x1=0 x1=x1+1 core 2 ADD(x) ADD(w) x=x+x2 x2=0 x2=x2+1
Resume Split Phase joined phase reconciliation split phase split phase • When done, transition to split phase again • Wait for all processes to acknowledge they have completed joined phase before writing to per-core data x=x+x0 x0=0 core 0 GET(x) GET(v) ADD(x) ADD(u) GET(x)GET(v) x0=x0+1 core 1 ADD(x) ADD(v) ADD(x) ADD(v) x=x+x1 x1=0 x1=x1+1 x1=x1+1 ADD(x) ADD(v) core 2 ADD(x) ADD(w) x=x+x2 x2=0 x2=x2+1 x2=x2+1
Outline • Phases • Splittable operations • Doppel Optimizations • Performance evaluation
Transactions and Operations BEGIN Transaction (y) ADD(x,1) ADD(y,2) ENDTransaction • Transactions are composed of one or more operations • Only some operations are amenable to splitting
Supported Operations Splittable ADD(k,n) MAX(k,n) MULT(k,n) OPUT(k,v,o) TOPKINSERT(k,v,o) Not Splittable GET(k) PUT(k,v)
MAX Example • Each core keeps one piece of summarized state xi • MAX is commutative so results can be determined out of order MAX(x,55) core 0 x0 = 55 x0 = MAX(x0,55) MAX(x,10) core 1 x1 = 10 x1 = MAX(x1,10) core 2 MAX(x,21) x2 = 21 x2 = MAX(x2,21)
MAX Example • Each core keeps one piece of summarized state xi • MAX is commutative so results can be determined out of order MAX(x,55) MAX(x,2) core 0 x0 = 55 x0 = MAX(x0,55) x1 = MAX(x1,2) MAX(x,10) MAX(x,27) core 1 x1 = 27 x1 = MAX(x1,10) x1 = MAX(x1,27) core 2 MAX(x,21) x2 = 21 x2 = MAX(x2,21) x = 55
MAX Example • Each core keeps one piece of summarized state xi • MAX is commutative so results can be determined out of order MAX(x,55) MAX(x,2) core 0 x0 = 55 x0 = MAX(x0,55) x1 = MAX(x1,2) MAX(x,27) MAX(x,10) core 1 x1 = 27 x1 = MAX(x1,27) x1 = MAX(x1,10) MAX(x,21) core 2 x2 = 21 x2 = MAX(x2,21) x = 55
What Can Be Split? Operations that can be split must be: • Commutative • Pre-reconcilable • On a single key
What Can’t Be Split? • Operations that return a value • ADD_AND_GET(x,4) • Operations on multiple keys • ADD(x,y) • Different operations in the same phase, even if arguments make them commutative • ADD(x,0) and MULT(x,1)
Outline • Phases • Splittable operations • Doppel Optimizations • Performance evaluation
Batching Transactions joined phase reconciliation split phase x=x+x0 x0=0 core 0 GET(x) GET(v) ADD(x) ADD(u) GET(x)GET(v) x0=x0+1 core 1 ADD(x) ADD(v) x=x+x1 x1=0 x1=x1+1 core 2 ADD(x) ADD(w) x=x+x2 x2=0 x2=x2+1
Batching Transactions joined phase reconciliation split phase • Don’t switch phases immediately; stash reads • Wait to accumulate stashed transactions • Batch for joined phase x=x+x0 x0=0 core 0 GET(x) GET(v) ADD(x) ADD(u) GET(x)GET(v) x0=x0+1 core 1 ADD(x) ADD(v) ADD(x) ADD(v) x=x+x1 x1=0 x1=x1+1 x1=x1+1 GET(x) GET(x) core 2 GET(x) ADD(x) ADD(w) GET(x) x=x+x2 x2=0 x2=x2+1
Ideal World • Transactions with contentious operations happen in the split phase • Transactions with incompatible operations happen correctly in the joined phase
How To Decide What Should Be Split Data • Database starts out with no split data • Count conflicts on records • Make key split if #conflicts > conflictThreshold • Count stashes on records in the split phase • Move key back to non-split if #stashes too high
Outline • Phases • Splittable operations • Data classification • Performance evaluation
Performance • Contention vs. parallelism • What kinds of workloads benefit? • A realistic application: RUBiS Doppel implementation: • Multithreaded Go server; worker thread per core • All experiments run on an 80 core intel server running 64 bit Linux 3.12 • Transactions are one-shot procedures written in Go
Parallel Performance on Conflicting Workloads Throughput (millions txns/sec) 20 cores, 1M 16 byte keys, transaction: ADD(x,1)
Increasing Performance with More Cores 1M 16 byte keys, transaction: ADD(x,1) all writing same key
Varying Number of Hot Keys 20 cores, transaction: ADD(x,1)
How Much Stashing Is Too Much? 20 cores, transactions: LIKE read, LIKE write
RUBiS • Auction application modeled after eBay • Users bid on auctions, comment, list new items, search • 1M users and 33K auctions • 7 tables, 26 interactions • 85% read only transactions • More contentious workload • Skewed distribution of bids • More writes
StoreBid Transaction BEGIN Transaction (bidder, amount, item) num_bids_item = num_bids_item + 1 if amount > max_bid_item: max_bid_item= amount max_bidder_item= bidder bidder_item_ts= Bid{bidder, amount, item, ts} ENDTransaction
StoreBid Transaction BEGIN Transaction (bidder, amount, item) ADD(num_bids_item,1) MAX(max_bid_item, amount) OPUT(max_bidder_item, bidder, amount) PUT(new_bid_key(), Bid{bidder, amount, item, ts}) ENDTransaction All commutative operations on potentially conflicting auction metadata
RUBiS Throughput Throughput (millions txns/sec) 20 cores, 1M users 33K auctions
Related Work • Split counters in multicore OSes • Linux kernel • Commutativity in distributed systems and concurrency control • [Shapiro ‘11] • [Li ‘12] • [Lloyd ‘12] • [Weihl‘88] • Optimistic concurrency control • [Kung ’81] • [Tu ‘13]
Summary • Contention is rising with increased core counts • Commutative operations are amenable to being applied in parallel, per-core • We can get good parallel performance on contentious transactional workloads by combining per-core split data with concurrency control Neha Narula http://nehanaru.la @neha