Distributed Concurrency Control, Lecture 4 (BHG, Chap. 4 + Comp. Surveys Article) (c) Oded Shmueli 2004
Motivation
• Distributed usage.
• Local autonomy.
• Maintainability.
• Allows for growth.
• Reliability – a number of copies.
• Components:
  • Reliable communication.
  • Local DBs – may be identical.
• Problems:
  • Query processing.
  • Maintaining multiple copies.
  • Concurrency control and recovery.
  • Distributed data dictionary.
(c) Oded Shmueli 2004
Topics
• Part I: Distributed 2PL
• Part II: Distributed Deadlocks
• Part III: Timestamp based Algorithms
• Part IV: Optimistic Concurrency Control
(c) Oded Shmueli 2004
Part I: Distributed 2PL
• Each item may have a number of copies.
• Intuitively – behave as if there is a single copy.
• Mechanisms:
  • Writers lock all copies.
  • Central copy.
  • Central locking site.
  • Majority locking.
  • A generalization.
  • Moving central copy – not covered.
(c) Oded Shmueli 2004
Writers lock all copies
• Each copy may be locked individually.
• Read[x]: lock some copy of x.
• Write[x]: lock all copies of x.
• Resulting executions are SR.
• Problems:
  • Writers tend to deadlock with one another.
  • Many messages.
(c) Oded Shmueli 2004
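A minimal sketch of the read-one/write-all rule above, in Python. The `copies` map and the `lock(site, item, mode)` call are assumptions of the sketch, not an API from the lecture:

```python
# Sketch of "writers lock all copies" (read-one / write-all).
# `copies` maps each item to the list of sites holding a copy;
# `lock(site, item, mode)` stands for a hypothetical per-site lock request.

def read_lock(copies, lock, x):
    # A reader needs a read lock on just one copy of x.
    lock(copies[x][0], x, "read")      # any copy will do

def write_lock(copies, lock, x):
    # A writer must write-lock every copy, so it conflicts with any
    # reader (which holds some copy) and with any other writer.
    for site in copies[x]:
        lock(site, x, "write")
```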
Central copy
• A central copy per item.
• Read[x]: read-lock the central copy.
• Write[x]: write-lock the central copy.
• Advantage: fewer messages.
(c) Oded Shmueli 2004
Central locking site
• A single site that maintains a global lock table.
• Advantages:
  • Few messages.
  • Deadlock detection by checking the WFG (waits-for graph) at one site.
• Disadvantage: a possible bottleneck.
(c) Oded Shmueli 2004
Majority locking
• The previous solutions are vulnerable to site failure (any site in the first, the central site in the other two).
• Read[x]: lock a majority of x’s copies.
• Write[x]: lock a majority of x’s copies.
• Thus, for every x, two transactions that conflict on x cannot both hold a majority – x is effectively locked.
• Disadvantage: many messages; one can trade time for number of messages using “forwarding”.
(c) Oded Shmueli 2004
A generalization
• Suppose there are n copies of x.
• Let k, l be such that k + l > n and l > n/2.
• Read[x]: obtain locks on k out of the n copies.
• Write[x]: obtain locks on l out of the n copies.
• Then a reader and a writer of x (since k + l > n), or two writers of x (since 2l > n), cannot hold their quorums concurrently – x is effectively locked.
• Choosing k, l:
  • Many readers: small k.
  • Many writers: small l.
(c) Oded Shmueli 2004
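A small sketch of the quorum conditions, with majority locking as the special case k = l = ⌊n/2⌋ + 1. The function names and `try_lock` stub are illustrative assumptions:

```python
# Sketch of quorum locking. The quorum sizes only need:
#   k + l > n   (a read quorum and a write quorum must intersect)
#   2 * l > n   (two write quorums must intersect)

def valid_quorums(n, k, l):
    return k + l > n and 2 * l > n

def acquire(sites, need, try_lock):
    # Request locks at the given sites; succeed if at least `need`
    # copies were locked. `try_lock(site)` is a hypothetical call
    # returning True when the lock is granted.
    held = [s for s in sites if try_lock(s)]
    return held if len(held) >= need else None   # None: quorum not obtained

# Example: n = 5 copies.
assert valid_quorums(5, k=2, l=4)      # cheap reads: lock 2 copies, writers 4
assert valid_quorums(5, k=3, l=3)      # majority locking
assert not valid_quorums(5, k=2, l=3)  # k + l = 5: quorums may miss each other
```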
Part II: Distributed Deadlocks
• Left as reading material.
(c) Oded Shmueli 2004
Part III: Timestamp based Algorithms
• A system model.
• Assumptions.
• Operations in a distributed environment.
• Timestamp Ordering (TO).
• Conservative Timestamp Ordering (CTO).
• Transaction classes.
(c) Oded Shmueli 2004
A system model
[Diagram: transactions submit operations to TMs (transaction managers); each TM talks to DMs (data managers), which manage the stored data.]
(c) Oded Shmueli 2004
Assumptions
• No concurrency within a transaction.
• Writes go into private workspaces at the various DMs.
• Each transaction is managed by a single TM.
• Each item x may have a number of physical copies x1, … , xn.
(c) Oded Shmueli 2004
Operations in a distributed environment
• Begin: set up a private workspace.
• Read[x]: if x is in the workspace, read it from there; otherwise read x from some copy xi by issuing dm_read.
• Write[x,v]: the single copy of x in the private workspace is assigned v.
• End: perform a 2-phase commit:
  • For each updated x, for all copies of x: issue a pre-stable-write command to store x on stable storage.
  • Once all DMs confirm: issue dm-write commands to the DMs to install the new value in the database.
(c) Oded Shmueli 2004
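A rough sketch of the End step described above. `dms_for`, `pre_stable_write`, and `dm_write` are stand-in stubs for the real messaging, introduced only for illustration:

```python
# Sketch of End: a two-phase write of all updated items.

def end_transaction(updated_items, dms_for, pre_stable_write, dm_write):
    # Phase 1: ask every DM holding a copy to force the new value
    # to stable storage (without installing it yet).
    for x, value in updated_items.items():
        for dm in dms_for(x):
            if not pre_stable_write(dm, x, value):
                return "abort"            # some DM could not prepare
    # Phase 2: all DMs confirmed, so install the new values.
    for x, value in updated_items.items():
        for dm in dms_for(x):
            dm_write(dm, x, value)
    return "commit"
```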
Timestamp Ordering (TO) - skip
• Idea: executions are conflict equivalent to a serial history in timestamp order.
• Item = <S_read, S_write, stable, value>
  • S_read – set of readers’ timestamps of the item.
  • S_write – set of writers’ timestamps of the item.
  • stable – a flag indicating a committed value.
(c) Oded Shmueli 2004
Timestamp Ordering (TO) - skip
• On accessing an item with stable = no:
  • wait – possible deadlock.
  • abort – may be wasteful.
• DM_Read with ts:
  • if ts < max {t | t ∈ S_write} abort.
  • otherwise, read and add ts to S_read.
• DM_Write with ts:
  • if ts < max {t | t ∈ S_read} abort.
  • if ts < max {t | t ∈ S_write} ignore (TWR – Thomas Write Rule).
  • otherwise, set stable = no; write and add ts to S_write.
• Commit: after all writes are performed, set stable = yes.
• Abort: remove ts from S_read and S_write. Make all items the transaction updated stable = yes.
(c) Oded Shmueli 2004
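A minimal single-item sketch of the TO rules above (the stable-flag waiting and distribution across DMs are omitted; the class and field names are assumptions of this sketch):

```python
# Sketch of the TO read/write rules for one item, kept as
# <S_read, S_write, stable, value>.

class Item:
    def __init__(self, value):
        self.s_read, self.s_write = set(), set()
        self.stable, self.value = True, value

def dm_read(item, ts):
    if item.s_write and ts < max(item.s_write):
        return "abort"        # a later write already happened
    item.s_read.add(ts)
    return item.value

def dm_write(item, ts, value):
    if item.s_read and ts < max(item.s_read):
        return "abort"        # a later reader already saw the old value
    if item.s_write and ts < max(item.s_write):
        return "ignore"       # Thomas Write Rule: obsolete write
    item.stable = False       # uncommitted until the writer commits
    item.value = value
    item.s_write.add(ts)
    return "ok"
```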
Another Timestamp Ordering Algorithm
• Terminology:
  • DM_r: read item.
  • DM_w: write item at transaction end.
  • p[x]: synchronization due to private write (a prewrite).
  • WTS(x), RTS(x): ts of the latest dm-write, dm-read.
  • Buffering: delaying operations for future execution.
  • min_r(x): ts of the earliest buffered read operation.
  • min_w(x), min_p(x): same idea for buffered writes and prewrites.
• DM_r[x] is ready if ts(DM_r[x]) < min_p(x) over the buffered p[x] operations, if any.
• DM_w[x] is ready if ts(DM_w[x]) < min_r(x), if any, and min_p(x) = ts(DM_w[x]) in the buffer.
(c) Oded Shmueli 2004
Another Timestamp Ordering Algorithm
• DM_r[x]:
  • if ts(r[x]) < WTS(x) abort.
  • if ts(r[x]) > min_p(x) put in buffer.
  • otherwise, perform and update RTS(x).
• p[x]:
  • if ts(p[x]) < RTS(x) abort.
  • if ts(p[x]) < WTS(x) abort.
  • otherwise, put in buffer.
• DM_w[x]: (note: a p[x] was previously executed, so no abort)
  • if ts(w[x]) > min_r(x) put in buffer.
  • if ts(w[x]) > min_p(x) put in buffer.
  • otherwise, perform, update WTS(x) and discard p[x].
• Occasionally check whether actions changed min_r(x) or min_p(x) so that some buffered operation is now ready.
• Observations:
  • No deadlocks are possible (why?).
  • Upon abort, discard the private workspace and all the transaction’s operations. Need to update RTS(x).
(c) Oded Shmueli 2004
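A sketch of the two readiness tests from the previous slide; the `Op` record layout (a kind "r", "w", or "p" plus a ts) is an assumption of the sketch:

```python
from collections import namedtuple

# A buffered operation on x: kind is "r", "w", or "p"; ts its timestamp.
Op = namedtuple("Op", ["kind", "ts"])

def min_ts(buffer, kind):
    # min_r / min_w / min_p: smallest ts among buffered ops of a kind.
    tss = [op.ts for op in buffer if op.kind == kind]
    return min(tss) if tss else None

def read_is_ready(op, buffer):
    # DM_r[x] is ready if its ts is below min_p(x), when a p[x] is buffered.
    mp = min_ts(buffer, "p")
    return mp is None or op.ts < mp

def write_is_ready(op, buffer):
    # DM_w[x] is ready if its ts is below min_r(x), when a read is
    # buffered, and its own p[x] is the earliest buffered prewrite.
    mr = min_ts(buffer, "r")
    return (mr is None or op.ts < mr) and min_ts(buffer, "p") == op.ts
```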
Conservative Timestamp Ordering (CTO)
• Goal: prevent aborts.
• Perform an operation only when it is certain that no later-arriving operation with a smaller ts will force a restart.
• No aborts and no deadlocks, but less concurrency.
• But, how long to wait?
• Solution – use CTO.
• Operations: DM_r, DM_w.
(c) Oded Shmueli 2004
Conservative Timestamp Ordering Architecture
[Diagram: transactions run at TMs TM1 … TMn; each DM (DM1 … DMk) buffers incoming operations in queues kept in ts order.]
(c) Oded Shmueli 2004
Conservative Timestamp Ordering Algorithm
• TMs must submit dm-read operations in ts order: once an operation with ts t has been issued, no operation with ts s < t will be issued later.
• Similarly for dm-writes.
• This is achieved by:
  • each TM working serially.
  • each transaction first reading all its data and then writing all its results at the end (still in ts order, but this allows execution parallelism). Transactions then terminate in ts order.
• Data items need no associated timestamps.
(c) Oded Shmueli 2004
CTO - Ready Operations
• Maintain at each DM a queue for read and write ops.
• Buffer DM_r and DM_w operations.
• Output a DM_r operation if:
  • there is a DM_w operation from each TMi and all such operations have higher timestamps.
• Output a DM_w operation if:
  • there is a DM_r operation from each TMi and all such operations have higher timestamps, and
  • there is a DM_w operation from each TMi and all such operations have higher timestamps.
• DM_w operations are never rejected!
• Overall effect: a serial execution in timestamp order!
(c) Oded Shmueli 2004
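A sketch of the output tests at a DM. The per-TM queue layout and the operation records (a kind "r" or "w" plus a ts) are assumptions of the sketch:

```python
# Each DM keeps, per TM, a queue of buffered operations in ts order.

def can_output_read(read_op, queues):
    # A DM_r may be output only if every TM has already sent a DM_w
    # with a higher timestamp (so no earlier write can still arrive).
    return all(any(op.kind == "w" and op.ts > read_op.ts for op in q)
               for q in queues.values())

def can_output_write(write_op, queues):
    # A DM_w additionally needs a DM_r with a higher ts from every TM.
    return all(any(op.kind == "r" and op.ts > write_op.ts for op in q)
               and any(op.kind == "w" and op.ts > write_op.ts for op in q)
               for q in queues.values())
```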
Conservative Timestamp Ordering - Problems
• Problem: what if a TM issues no operations to this queue?
• Solution: null operations (they carry a ts but are no-ops).
• A TM can send an ‘infinite timestamp’ to indicate expected long inactivity.
(c) Oded Shmueli 2004
Transaction classes
• CTO synchronizes everything – an overkill!
• Transaction class = <readset, writeset>.
• If transactions are known in advance, each transaction can be assigned to one or more classes.
• If T reads X and writes Y, T belongs to a class c = <rs, ws> if X ⊆ rs and Y ⊆ ws.
• A TM manages a single class.
• A transaction must belong to the class of the TM managing it.
• Run CTO, but only ops of the relevant TMs are considered:
  • To output DM_r[x], wait until there are DM_w operations with higher ts from all TMs (classes) that have x in their write sets.
  • To output DM_w[x], wait until there are DM_r operations with higher ts from all TMs (classes) that have x in their read sets, and DM_w operations with higher ts from all TMs (classes) that have x in their write sets.
(c) Oded Shmueli 2004
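A small sketch of class membership and of restricting the wait to relevant TMs; the example classes at the end are made up for illustration:

```python
# A class is a pair (readset, writeset).

def belongs(txn_reads, txn_writes, cls):
    # T belongs to c = <rs, ws> if its reads fit rs and writes fit ws.
    readset, writeset = cls
    return txn_reads <= readset and txn_writes <= writeset

def relevant_tms_for_read(x, classes):
    # DM_r[x] need only wait for TMs whose class may write x.
    return [tm for tm, (rs, ws) in classes.items() if x in ws]

# Example: TM "A" manages <{x, y}, {y}>, TM "B" manages <{y}, {x}>.
classes = {"A": ({"x", "y"}, {"y"}), "B": ({"y"}, {"x"})}
assert belongs({"x"}, {"y"}, classes["A"])
assert relevant_tms_for_read("x", classes) == ["B"]
```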
Part IV: Optimistic Concurrency Control
• Can be based on locking or timestamps.
• First, a centralized algorithm: a timestamp-based algorithm (Kung-Robinson).
• Then, adaptation to a distributed environment.
(c) Oded Shmueli 2004
Rules for Validation: Centralized
• A transaction has read, validate and write phases. During the read phase it also computes and writes to a private space.
• Executions will be serializable in timestamp order.
• To ensure this, for all transactions Tk s.t. ts(Tk) < ts(T), one of the following should hold:
  • (a) Tk completed its write phase before T started its read phase.
  • (b) Tk completed its write phase while T is in its read phase, and write-set(Tk) ∩ read-set(T) = ∅.
  • (c) Tk completed its read phase before T completes its read phase, write-set(Tk) ∩ read-set(T) = ∅, and write-set(Tk) ∩ write-set(T) = ∅.
• A timestamp is assigned only once validation succeeds. Do it after the write phase.
• Different validations can be executed in parallel.
• So, each transaction T uses START(T) and FINISH(T) to determine the transactions against which it should be validated.
(c) Oded Shmueli 2004
Rules for T and Tk
• Rule a: Tk completed its write phase before T started its read phase; ts(Tk) < start(T).
• Rule b: Tk completed its write phase while T is in its read phase, and write-set(Tk) ∩ read-set(T) = ∅; start(T) < ts(Tk) < finish(T).
• Rule c: Tk completed its read phase before T completes its read phase, write-set(Tk) ∩ read-set(T) = ∅, and write-set(Tk) ∩ write-set(T) = ∅; finish(Tk) < finish(T).
[Diagram: R–V–W phase timelines for Tk and T illustrating each rule.]
(c) Oded Shmueli 2004
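A compact sketch of validating T against an earlier-numbered Tk using rules a–c. The `Txn` record and its phase markers are assumptions of this sketch, not the full Kung-Robinson bookkeeping:

```python
from dataclasses import dataclass, field

@dataclass
class Txn:
    start_read: int        # when the read phase began
    finish_read: int       # when the read phase ended
    finish_write: int      # when the write phase ended
    read_set: set = field(default_factory=set)
    write_set: set = field(default_factory=set)

def validates(T, Tk):
    # Rule a: Tk finished writing before T started reading.
    if Tk.finish_write < T.start_read:
        return True
    # Rule b: Tk finished writing during T's read phase and wrote
    # nothing that T read.
    if (Tk.finish_write < T.finish_read
            and not (Tk.write_set & T.read_set)):
        return True
    # Rule c: Tk finished reading before T did, and Tk's writes touch
    # neither T's read set nor T's write set.
    if (Tk.finish_read < T.finish_read
            and not (Tk.write_set & T.read_set)
            and not (Tk.write_set & T.write_set)):
        return True
    return False
```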
Distributed Setting
• A transaction can execute at many sites.
• Perform a validation phase at each site at which T operated. This is called local validation.
• A site may host purely local transactions as well as sub-transactions of global transactions.
• If validation is successful at all sites, ensure global consistency:
  • Build HB(Tj) for each sub-transaction of T at site j. This is the set of ids of global transactions that must precede T; it is built during local validation.
  • Global validation is done by making sure that each transaction in the HB set is either committed or aborted.
  • Deadlocks are possible.
• After the global validation phase, a timestamp can be issued. It will be the same for all local sub-transactions of a global transaction.
• Use 2-phase commit. Notify local sub-transactions.
(c) Oded Shmueli 2004
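A sketch of the global step: after local validation succeeds everywhere, T may receive its timestamp only once every transaction in its HB sets has terminated. `status_of` is a hypothetical query to the coordinator, not an API from the lecture:

```python
def globally_valid(hb_sets, status_of):
    # hb_sets: one set of global-transaction ids per site where T ran.
    must_precede = set().union(*hb_sets)
    # T passes global validation once every predecessor has terminated.
    return all(status_of(t) in ("committed", "aborted")
               for t in must_precede)
```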