1 / 29

Concurrency control in transactional systems

Concurrency control in transactional systems. Jinyang Li. Some slides adapted from Stonebraker and Madden. What we ’ ve learnt so far…. All-or-nothing atomicity in transactions So that failures do not result in (undesirable) intermediate state How to realize all-or-nothing atomicity?

mayten
Download Presentation

Concurrency control in transactional systems

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Concurrency control in transactional systems Jinyang Li Some slides adapted from Stonebraker and Madden

  2. What we’ve learnt so far… • All-or-nothing atomicity in transactions • So that failures do not result in (undesirable) intermediate state • How to realize all-or-nothing atomicity? • Logging • REDO/UNDO logging vs. REDO-only logging

  3. Concurrency control • Many application clients are concurrently accessing a storage system. • Concurrent transactions might interfere • Similar to data races in parallel programs

  4. Transfer $100 from A to B Report sum of money Withdraw $50 from A A_bal = READ(A) If (A_bal >100) { B_bal = READ(B) B_bal += 100 A_bal -= 100 WRITE(A, A_bal) WRITE(B, B_bal) } A_bal = READ(A) B_bal = READ(B) Print(A_bal+B_bal) A_bal = READ(A) If (A_bal >50) { A_bal -= 50 WRITE(A, A_bal) dispense $50 to user } An example Storage system

  5. yfs client 1 (CREATE) d = GET(dir) newd = append “x” to d PUT(newd) Yfs lab example yfs client 2 (CREATE) d = GET(dir) newd = append “y”to d PUT(newd) extent server

  6. Solutions? • Leave it to application programmers • Application programmers are responsible for locking data items • Storage system performs automatic concurrency control • Concurrent transactions execute as if serially

  7. ACID transactions • A (Atomicity) • All-or-nothing w.r.t. failures • C (Consistency) • Transactions maintain any internal storage state invariants • I (Isolation) • Concurrently executing transactions do not interfere • D (Durability) • Effect of transactions survive failures

  8. What’s ACID? an example T1: Transfer $100 from A to B T2: Transfer $50 from B to A • AT1 completes or nothing (ditto for T2) • D once T1/T2 commits, stays done, no updates lost • Ino races, as if T1 happens either before or after T2 • C preserves invarants, e.g. account balance > 0

  9. Ideal isolation semantics: serializability • Definition: execution of a set of transactions is equivalent to some serial order • Two executions are equivalent if they have the same effect on database and produce same output.

  10. Conflict serializability • An execution schedule is the ordering of read/write/commit/abort operations A_bal = READ(A) B_bal = READ(B) B_bal += 100 A_bal -= 100 WRITE(A, A_bal) WRITE(B, B_bal) Commit A_bal = READ(A) B_bal = READ(B) Print(A_bal+B_bal) Commit A (serial) schedule: R(A),R(B),W(A),W(B),C,R(A),R(B),C

  11. Conflict serializability • Two schedules are equivalent if they: • contain same operations • order conflicting operations the same way • A schedule is serializable if it’s identical to some serial schedule Two ops conflict if they access the same data item and one is a write.

  12. Examples A_bal = READ(A) B_bal = READ(B) B_bal += 100 A_bal -= 100 WRITE(A, A_bal) WRITE(B, B_bal) A_bal = READ(A) B_bal = READ(B) Print(A_bal+B_bal) Serializable? R(A),R(B),R(A),R(B),C W(A),W(B),C Equivalent serial schedule: R(A),R(B),C,R(A),R(B),W(A),W(B),C Serializable? R(A),R(B), W(A), R(A),R(B),C, W(B),C

  13. Realize a serializable schedule • Locking-based approach • Strawman solution 1: • Grab global lock before transaction starts • Release global lock after transaction commits • Strawman solution 2: • Grab lock on item X before reading/writing X • Release lock on X after reading/writing X

  14. Strawman 2’s problem A_bal = READ(A) B_bal = READ(B) B_bal += 100 A_bal -= 100 WRITE(A, A_bal) WRITE(B, B_bal) A_bal = READ(A) B_bal = READ(B) Print(A_bal+B_bal) Possible with strawman 2? (short-duration locks on writes) R(A),R(B), W(A), R(A),R(B),C, W(B),C Read an uncommitted value Locks on writes should be held till end of transaction

  15. More Strawmans • Strawman 3 • Grab lock on item X before writing X • Release locks at the end of transaction • Grab lock on item X before reading X • Release locks immediately after reading Read locks must be held till commit time Non-repeatable reads Possible with strawman 3? (short-duration locks on reads) R(A),R(B), W(A), R(A),R(B),C, W(B),C R(A), R(A),R(B), W(A), W(B),C, R(B), C

  16. Strawman 3’s problem A_bal = READ(A) B_bal = READ(B) B_bal += 100 A_bal -= 100 WRITE(A, A_bal) WRITE(B, B_bal) A_bal = READ(A) B_bal = READ(B) Print(A_bal+B_bal) Possible with strawman 3? (short-duration locks on reads) R(A),R(B), W(A), R(A),R(B),C, W(B),C Non-repeatable reads Read locks must be held till commit time R(A), R(A),R(B), W(A), W(B),C, R(B), C

  17. Realize a serializable schedule • 2 phase locking (2PL) • A growing phase in which the transaction is acquiring locks • A shrinking phase in which locks are released • In practice, • The growing phase is the entire transaction • The shrinking phase is at the commit time • Optimization: • Use read/write locks instead of exclusive locks

  18. 2PL: an example R_lock(A) A_bal = READ(A) R_lock(B) B_bal = READ(B) B_bal += 100 A_bal -= 100 W_lock(A) WRITE(A, A_bal) W_lock(B) WRITE(B, B_bal) W_unlock(A) W_unlock(B) R_lock(A) A_bal = READ(A) R_lock(B) B_bal = READ(B) Print(A_bal+B_bal) R_unlock(A) R_unlock(B) Possible? R(A),R(B), W(A), R(A),R(B),C W(B),C

  19. More on 2PL • What if a lock is unavailable? • Deadlocks possible? • How to cope with deadlock? • Grab locks in order? No always possible • Transaction manager detects deadlock cycles and aborts affected transactions • Alternative: timeout and abort yourself wait detect & abort Use crash recovery mechanism (e.g. UNDO records) to clean up

  20. The Phantom problem T1: begin_tx update emp (set salary =1.1 * salary) where dept = ‘CS’ end_tx T2: begin_tx Update emp (set dept = ‘CS’) where emp.name = ‘jinyang’ end_tx T1: update Bob T1: update Herbert T2: move Jinyang T2: move Lakshmi T2 commits and releases all locks T1: update Lakshmi(!!!!!) T1: update Michael T1: commits

  21. The Phantom problem • Issue is lock things you touch – but must guarantee non-existence of any new ones! • Solutions: • Predicate locking • Range locks in a B-tree index (assumes an index on dept). Otherwise, lock entire table • Often ignored in practice

  22. Why less than serializability? T1: begin_tx Select avg (sal) from emp end_tx T2: begin_tx Update emp … end_tx • Performance of locking? • What to do: • run analytics on a companion data warehouse (common practice) • take the stall (rarely done) • run less than serializability

  23. Degrees of isolation • Degree 0 (read uncommitted) • no locks (guarantees nothing) • Degree 1 (read committed) • Short read locks, long write locks • Degree 2 • Long read/write locks. (serializable unless you squint) • Degree 3 • Long read/write locks plus solving the phantom problem

  24. Disadvantages of locking-based approach • Need to detect deadlocks • Distributed implemention needs distributed deadlock detection (bad) • Read-only transactions can block update transactions • Big performance hit if there are long running read-only transactions

  25. Multi-version concurrency control • No locking during transaction execution! • Each data item is associated with multiple versions • Multi-version transactions: • Buffer writes during execution • Reads choose the appropriate version • At commit time, system validates if okay to make writes visible (by generating new versions)

  26. Snapshot isolation • A popular multi-version concurrency control scheme • A transaction: • reads a “snapshot” of database image • Can commit only if there are no write-write conflict

  27. Implementing snapshot isolation • T is assigned a commit timestamp • System checks forall T’, • s.t. T’.cts > T.sts && T’.cts < T.cts • T’.wset and T.wset do not overlap • T is assigned • a start timestamp,T.sts R(A) W(B) time • T reads the biggest version • of A, A(i), such that i <= T.sts • T buffers writes to B and adds • B to its writeset, T.wset += {B}

  28. R(A) R(B) An example No read-uncommitted R(A) R(B) W(A) W(B) 50 55 60 10 5 R(A) R(B) No unrepeatable-read

  29. Snapshot isolation < serializability • The write-skew problem Possible under SI? R(A),R(B),W(B), W(A),C,C Serializable?

More Related