Transaction Before-or-after Principles of Computer System (2012 Fall)
Where are we? • System Complexity • Modularity & Naming • Enforced Modularity • Network • Fault Tolerance • Transaction • All-or-nothing • Before-or-after
Review • Atomicity • All-or-nothing: to tolerate crashes • By shadow copy • By log • Before-or-after: to get more concurrency
Logging Protocol • CHANGE record • The identity of the all-or-nothing action • A component action, redo • Installs the intended value in cell storage • After commit, if the system crashes, the recovery procedure can perform the install on behalf of the action • A second component action, undo • Reverses the effect of the install on cell storage • After an abort or a system crash, the recovery procedure may need to reverse the effect
Logging Protocol • The application • NEW_ACTION • log a BEGIN record that contains just the new identity • As the all-or-nothing action proceeds through its pre-commit phase, it logs CHANGE records • To implement COMMIT or ABORT • logs an OUTCOME record • commit point
Logging Protocol
procedure TRANSFER (debit_account, credit_account, amount)
    my_id ← LOG (BEGIN_TRANSACTION)
    dbvalue.old ← GET (debit_account)
    dbvalue.new ← dbvalue.old - amount
    crvalue.old ← GET (credit_account)
    crvalue.new ← crvalue.old + amount
    LOG (CHANGE, my_id,
        “PUT (debit_account, dbvalue.new)”,   // redo action
        “PUT (debit_account, dbvalue.old)”)   // undo action
Logging Protocol (TRANSFER, continued)
    LOG (CHANGE, my_id,
        “PUT (credit_account, crvalue.new)”,  // redo action
        “PUT (credit_account, crvalue.old)”)  // undo action
    PUT (debit_account, dbvalue.new)          // install
    PUT (credit_account, crvalue.new)         // install
    if dbvalue.new > 0 then
        LOG (OUTCOME, COMMIT, my_id)
    else
        LOG (OUTCOME, ABORT, my_id)
        signal (“Action not allowed. Would make debit account negative.”)
    LOG (END_TRANSACTION, my_id)
Logging Protocol
procedure ABORT (action_id)
    starting at end of log repeat until beginning
        log_record ← previous record of log
        if log_record.id = action_id then
            if (log_record.type = OUTCOME)
                then signal (“Can’t abort an already completed action.”)
            if (log_record.type = CHANGE)
                then perform undo_action of log_record
            if (log_record.type = BEGIN)
                then break repeat
    LOG (action_id, OUTCOME, ABORTED)   // Block future undos.
    LOG (action_id, END)
Summary • Logging is a general technique for achieving all-or-nothing atomicity. • Widely used: databases, file systems, .. • Can achieve reasonable performance with logging. • Writes are always fast: sequential. • Reads can be fast with cell storage. • Key idea 1: write-ahead logging, makes it safe to update cell storage. • Key idea 2: recovery protocol, undo losers / redo winners.
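The "undo losers / redo winners" idea can be sketched in a few lines of Python. This is a minimal illustration, not the course's recovery procedure: the tuple layout of the log records, the `recover` function, and the sample balances are all assumptions made for the example. A CHANGE record is assumed to carry `(type, action_id, cell, new_value, old_value)`.

```python
def recover(log, cells):
    """Rebuild cell storage from the log after a crash (illustrative sketch)."""
    # Winners are the actions that logged a COMMITTED outcome before the crash.
    winners = {rec[1] for rec in log
               if rec[0] == "OUTCOME" and rec[2] == "COMMITTED"}
    # Backward pass: undo every CHANGE made by a loser.
    for rec in reversed(log):
        if rec[0] == "CHANGE" and rec[1] not in winners:
            _, _, cell, _new, old = rec
            cells[cell] = old
    # Forward pass: redo every CHANGE made by a winner.
    for rec in log:
        if rec[0] == "CHANGE" and rec[1] in winners:
            _, _, cell, new, _old = rec
            cells[cell] = new
    return cells


# Hypothetical log: action 1 committed, action 2 was still pending at the crash.
log = [
    ("BEGIN", 1),
    ("CHANGE", 1, "A", 90, 100),
    ("CHANGE", 1, "B", 60, 50),
    ("OUTCOME", 1, "COMMITTED"),
    ("BEGIN", 2),
    ("CHANGE", 2, "A", 40, 90),
]
# Cell storage at the crash: action 2's install of A landed, action 1's install of B did not.
cells_after_crash = {"A": 40, "B": 50}
recovered = recover(log, dict(cells_after_crash))
```

Because every CHANGE record was logged before the corresponding install, the log alone is enough to repair cell storage, whichever installs happened to reach it.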
Before-or-after atomicity • All-or-nothing + before-or-after => transaction • All-or-nothing: handles crashes • Before-or-after: handles concurrency • Concurrent actions must not observe each other in the midst of a multiple-step atomic action • Definition of “correctness” (as in 9.1.5) • Coordination among concurrent actions is correct if every result is guaranteed to be one that could have been obtained by some purely serial application of those same actions
Simple serialization • Simple serialization: similar to lock-step • Transaction n waits for transaction n-1 to complete • Drawback: too strict • Prevents all concurrency among transactions • Suitable for applications without many transactions • The rest of this chapter is nothing but optimizations
More relaxed disciplines that still guarantee correctness • We don’t care when things happen • Transaction-3 can create new version of C before transaction-2 • Transaction-4 can run concurrently with transaction-3
Mark-point discipline • Step-1: wait for pending versions • Step-2: mark data to be updated • Create a pending version of every variable the transaction intends to modify (the mark point) • Announce when it has finished doing so • MARK_POINT_ANNOUNCE simply sets a flag • Step-3: keep the mark-point discipline • No transaction can begin reading its inputs until the preceding transaction has reached its mark point or is no longer pending
Mark-point discipline • Delays are distributed • Some in BEGIN_TRANSACTION() • Some in READ_CURRENT_VALUE() • Bootstrapping • Goal: before-or-after for general programs • Solution: a special case, the “new outcome record” • Three layers • MARK_POINT -> before-or-after • NEW_OUTCOME_RECORD • TICKET, ACQUIRE, RELEASE
Mark-point: no deadlock • A transaction waits only for earlier transactions • The earliest one waits for no one • Progress is guaranteed • Locks alone do not guarantee progress • They require additional mechanisms to ensure no deadlock • Two minor points • Reduces to the simple serialization discipline • If a transaction waits to announce its mark point until commit or abort • Two possible errors to guard against • Calling NEW_VERSION after the mark point • Trying to write a value without first creating a new version
Read-capture: optimistic atomicity • Pessimistic methods • Presume that interference is likely • Actively prevent any possibility of interference • Simple serialization & mark-point • Optimistic methods • Allow writes in any order and at any time • At the risk of being told “sorry, that write would interfere; you must abort, clear the history, and retry” • Read-capture discipline
Example • Transaction-4 was late • Transaction-6 has already read A • Transaction-4 has to abort and redo as transaction-7
Read-capture’s correctness • Correctness • 1. WAIT for PENDING in READ ensures that transaction n will wait for every earlier transaction k (k < n) to commit or abort • 2. The high-water mark in READ and the test in NEW_VERSION ensure that transaction j will abort if a later transaction n has already read the object (j < n) • 3. Therefore, every value that READ returns to transaction n will include the effects of transactions 1…n-1 • 4. Therefore, every transaction n will act as if it serially follows transaction n-1 • Price of optimism • A later transaction may cause earlier ones to abort • Suitable for applications without a lot of data interference
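The high-water-mark test can be sketched as follows. This is a simplified, single-threaded illustration: the `Obj` class, the `read` and `new_version` helpers, and the `Abort` exception are hypothetical names, every version is treated as already committed, and the WAIT for PENDING versions is left out.

```python
class Abort(Exception):
    """Raised when a write arrives after a later transaction has read the object."""


class Obj:
    def __init__(self, value):
        self.versions = {0: value}  # transaction id -> value (all treated as committed)
        self.high_water = 0         # highest transaction id that has read this object


def read(obj, tx):
    # Record the read so that earlier transactions can no longer write underneath it.
    obj.high_water = max(obj.high_water, tx)
    latest = max(i for i in obj.versions if i < tx)
    return obj.versions[latest]


def new_version(obj, tx, value):
    # The test in NEW_VERSION: a later transaction has already captured this object.
    if tx < obj.high_water:
        raise Abort(f"transaction {tx} overtaken by reader {obj.high_water}")
    obj.versions[tx] = value


A = Obj(100)
seen_by_6 = read(A, 6)          # transaction-6 reads A first
try:
    new_version(A, 4, 50)       # transaction-4 is late and must abort
    aborted = False
except Abort:
    aborted = True
new_version(A, 7, 50)           # the retry runs as transaction-7 and succeeds
seen_by_8 = read(A, 8)
```

The retry gets a fresh, larger identifier, so it serializes after the reader that forced the abort.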
Use version histories for before-or-after atomicity • Register renaming • Pentium 4 has only 8 architectural registers • But 128 physical registers • Renaming by reorder buffer • Assigning a slot in the reorder buffer • NEW_OUTCOME_RECORD & NEW_VERSION • Committing an instruction • WRITE_VALUE & COMMIT
Use version histories for before-or-after atomicity • Oracle database’s serializable isolation level • Snapshot isolation • When a transaction begins • The system takes a snapshot of every committed value • The transaction reads all of its inputs from that snapshot • If two concurrent transactions modify the same variable • The first one to commit wins • The system aborts the other one with a “serialization error”
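A toy model of the first-committer-wins rule is shown below. It is a sketch under stated assumptions, not Oracle's implementation: `Store`, `Txn`, and `SerializationError` are hypothetical names, the snapshot is a plain dictionary copy, and everything runs in one thread.

```python
class SerializationError(Exception):
    pass


class Store:
    def __init__(self, data):
        self.data = dict(data)
        self.version = {k: 0 for k in data}  # commit time of each variable's latest value
        self.clock = 0                       # counts committed transactions

    def begin(self):
        return Txn(self)


class Txn:
    def __init__(self, store):
        self.store = store
        self.start = store.clock             # snapshot time
        self.snapshot = dict(store.data)     # copy of every committed value
        self.writes = {}

    def read(self, key):
        # A transaction sees its own writes, otherwise its snapshot.
        return self.writes.get(key, self.snapshot[key])

    def write(self, key, value):
        self.writes[key] = value

    def commit(self):
        # First committer wins: abort if someone committed a write to the
        # same variable after this transaction took its snapshot.
        for key in self.writes:
            if self.store.version.get(key, 0) > self.start:
                raise SerializationError(key)
        self.store.clock += 1
        for key, value in self.writes.items():
            self.store.data[key] = value
            self.store.version[key] = self.store.clock


store = Store({"x": 1})
t1, t2 = store.begin(), store.begin()
t1.write("x", 2)
t2.write("x", 3)
t1.commit()
try:
    t2.commit()
    second_committed = True
except SerializationError:
    second_committed = False
```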
Use version histories for before-or-after atomicity • Transactional memory • Allows concurrent threads without locks • Mark the beginning of an atomic instruction sequence with a “begin transaction” instruction • Direct all STOREs to a hidden copy • Check for interference at the end • Even more optimistic than read-capture • Most useful if interference is possible but unlikely • Hardware or software implementation
Pragmatics: lock • Lock: a flag associated with a data object to warn concurrent actions not to read or write the object • ACQUIRE (A.lock) / RELEASE (A.lock) • If several actions try to acquire at once, only one will succeed • Problems • Easy to make mistakes that cause races • Difficult to find out why • Three steps • A discipline specifies which locks must be acquired and when • Establish a compelling line of reasoning that concurrent transactions that follow the discipline will have the before-or-after property • Interpose a lock manager to enforce the discipline
System-wide locking • System-wide lock • begin_transaction ACQUIRE (System.lock) … • … RELEASE (System.lock) end_transaction • Allow only one transaction to run at a time • Serialize potentially concurrent transactions in the order that they call ACQUIRE
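A minimal sketch of the system-wide lock, using Python's `threading.Lock`. The `balances` table and the `transfer` function are made up for the illustration; the point is only that every transaction body runs between one ACQUIRE and one RELEASE of the same lock.

```python
import threading

system_lock = threading.Lock()
balances = {"A": 100, "B": 100}


def transfer(src, dst, amount):
    with system_lock:               # ACQUIRE (System.lock)
        balances[src] -= amount
        balances[dst] += amount
    # Leaving the with-block performs RELEASE (System.lock).


threads = [threading.Thread(target=transfer, args=("A", "B", 1)) for _ in range(50)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Transactions are serialized in the order in which they obtain the lock, so the result matches some serial execution, at the cost of all concurrency.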
Simple locking • Simple locking • 1. Acquire a lock for every shared data object in advance • 2. Release locks only after commit or abort • Lock point (similar to mark point) • Lock set: the locks acquired by the time the transaction reaches its lock point • Lock manager’s enforcement • Intercept read/write/commit/abort calls, and check them against the lock set • Problems • How to enumerate all the shared objects a transaction will access? • The set of objects it might access may be larger than the set it does access
Two-phase locking • Two-phase locking (similar to read-capture) • Avoids the need to know the lock set in advance • Acquire locks as the transaction proceeds; access data as soon as the lock is acquired • Constraints • Do not release any lock until the transaction passes its lock point • Release a lock only if the transaction will never need to read or write that data again • Problems • Widely used, but hard to argue correct
Two-phase locking • Unnecessary blocking • Example • T1: READ X • T2: WRITE Y • T1: WRITE Y • T1’s and T2’s lock sets intersect at Y • Two-phase locking prevents this interleaving • But the interleaving T1/T2/T1 gives the same result as the serial order T2/T1/T1 • Identifying all such safe interleavings: NP-complete
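The two-phase constraint (a growing phase followed by a shrinking phase) can be enforced by a small wrapper around ordinary locks. This is a hedged sketch: `TwoPhaseTxn` is a hypothetical helper, it releases all locks at once at the end rather than one by one after the lock point, and it does nothing about deadlock.

```python
import threading


class TwoPhaseTxn:
    """Tracks one transaction's locks and rejects any acquire after a release."""

    def __init__(self, locks):
        self.locks = locks          # name -> threading.Lock, shared by all transactions
        self.held = []
        self.shrinking = False

    def acquire(self, name):
        if self.shrinking:
            raise RuntimeError("two-phase violation: acquire after release")
        self.locks[name].acquire()  # growing phase: take locks as the work proceeds
        self.held.append(name)

    def release_all(self):
        self.shrinking = True       # past the lock point: no more acquires allowed
        for name in reversed(self.held):
            self.locks[name].release()
        self.held.clear()


locks = {"X": threading.Lock(), "Y": threading.Lock()}
t1 = TwoPhaseTxn(locks)
t1.acquire("X")                     # READ X
t1.acquire("Y")                     # WRITE Y
held_both = locks["X"].locked() and locks["Y"].locked()
t1.release_all()                    # commit
try:
    t1.acquire("X")
    violation_caught = False
except RuntimeError:
    violation_caught = True
```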
Interactions between locks and logs • Transaction abort • Restore the changed data before releasing locks • Then it is just like a committed transaction that did nothing • System recovery • Should the locks themselves be logged? • No: no transaction is pending after recovery, thus no locks are needed • Locks are kept in volatile memory • Incomplete transactions had non-overlapping lock sets at the moment of the crash
Performance optimizations • Physical locking vs. logical locking • Choose the lock granularity • E.g. change a 6-byte object in a 1000-byte disk sector, or change a 1500-byte object that spans two disk sectors • Which to lock: the object or the sector? • Logical locking • If objects are small: more concurrency, more logging • Physical locking • New logical layer between app and disk • E.g. data object management and garbage collection • Tailor the logging and locking design to match disk granularity • Locking disk sectors rather than objects is common practice
Performance optimizations • Lock compatibility modes • Multiple-reader, single-writer protocol • Any number of readers is safe • Only one writer at a time, and it waits for all readers to finish • Suitable for applications with a lot of reading • A writer may be delayed indefinitely • More specific modes: more complex
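One way to sketch the multiple-reader, single-writer protocol is with a condition variable. `ReadWriteLock` and its method names are assumptions for this example (Python's standard library has no such class). The sketch gives readers priority, so it also shows why a writer may be delayed indefinitely while readers keep arriving.

```python
import threading


class ReadWriteLock:
    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0       # number of readers currently inside
        self._writer = False    # True while a writer holds the lock

    def acquire_read(self):
        with self._cond:
            while self._writer:             # readers only wait for a writer
                self._cond.wait()
            self._readers += 1

    def release_read(self):
        with self._cond:
            self._readers -= 1
            if self._readers == 0:
                self._cond.notify_all()     # a waiting writer may now proceed

    def acquire_write(self):
        with self._cond:
            while self._writer or self._readers > 0:
                self._cond.wait()           # wait for all readers to finish
            self._writer = True

    def release_write(self):
        with self._cond:
            self._writer = False
            self._cond.notify_all()


rw = ReadWriteLock()
rw.acquire_read()
rw.acquire_read()                   # a second reader is admitted immediately
concurrent_readers = rw._readers
rw.release_read()
rw.release_read()
rw.acquire_write()                  # no readers left, so the writer gets in
writer_active = rw._writer
rw.release_write()
```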
Deadlock & making progress • Inevitable when using locks for concurrency • 1. Transactions waiting for one another • 2. Transactions waiting for a lock held by a deadlocked one • Correctness arguments ensure correctness, but not progress • Methods • Pessimistic ones: take a priori action to prevent deadlock • Optimistic ones: detect deadlocks, then fix them up
Methods for solving deadlock • Lock ordering (pessimistic) • Number the locks uniquely • Require transactions to acquire locks in ascending order • Problem: some applications cannot predict all of the locks they need before acquiring the first one • Backing out (optimistic) • Allow a transaction to acquire locks in any order • If it encounters an already-acquired lock with a number lower than one it has previously acquired itself, then • UNDO: back up to release its higher-numbered locks • Wait for the lower-numbered lock and REDO
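Lock ordering is easy to sketch when each transaction knows its locks up front. In the example below, `NumberedLock`, `acquire_in_order`, and `release_all` are hypothetical helpers; two threads ask for the same pair of locks in opposite orders, and sorting by lock number keeps them from deadlocking.

```python
import threading


class NumberedLock:
    def __init__(self, number):
        self.number = number            # unique number assigned to this lock
        self.lock = threading.Lock()


def acquire_in_order(wanted):
    # Always acquire in ascending lock number, whatever order the caller listed.
    ordered = sorted(wanted, key=lambda nl: nl.number)
    for nl in ordered:
        nl.lock.acquire()
    return ordered


def release_all(ordered):
    for nl in reversed(ordered):
        nl.lock.release()


a, b = NumberedLock(1), NumberedLock(2)
counter = {"n": 0}


def worker(wanted, rounds=200):
    for _ in range(rounds):
        held = acquire_in_order(wanted)
        counter["n"] += 1               # protected by both locks
        release_all(held)


t1 = threading.Thread(target=worker, args=([a, b],))
t2 = threading.Thread(target=worker, args=([b, a],))   # opposite request order
t1.start()
t2.start()
t1.join()
t2.join()
```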
Methods for solving deadlock • Timer expiration (optimistic) • Set a timer at begin_transaction; abort on timeout • If there is still no progress, another transaction may abort • Problem: how to choose the interval? • Cycle detection (optimistic) • Maintain a wait-for graph in the lock manager • Shows lock owners and the transactions waiting for them • Check when a transaction tries to acquire a lock • Prevent a cycle (deadlock) • Select some cycle member to be the victim
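The check a lock manager might run before letting a transaction wait can be sketched as a reachability test on the wait-for graph. The graph representation (a dictionary from each transaction to the set of transactions it waits for) and the name `would_deadlock` are assumptions for the example.

```python
def would_deadlock(wait_for, waiter, owner):
    """Return True if adding the edge waiter -> owner would close a cycle."""
    # A cycle appears exactly when the owner already waits, directly or
    # transitively, for the would-be waiter.
    stack, seen = [owner], set()
    while stack:
        tx = stack.pop()
        if tx == waiter:
            return True
        if tx in seen:
            continue
        seen.add(tx)
        stack.extend(wait_for.get(tx, ()))
    return False


# T1 waits for T2, and T2 waits for T3.
wait_for = {"T1": {"T2"}, "T2": {"T3"}}
closes_cycle = would_deadlock(wait_for, "T3", "T1")   # T3 -> T1 would complete a cycle
harmless = would_deadlock(wait_for, "T3", "T4")       # T4 waits for nobody
```

When the check returns True, the lock manager can pick a member of the cycle as the victim and abort it instead of letting the wait begin.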
Deadlock & making progress • Livelock is still possible • E.g. two transactions with the same timeout value • Exponential random backoff • Delays the thread for a random time • Repeated failed retries indicate a deeper problem • End-to-end argument • Worth the effort?
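Exponential random backoff can be sketched as a retry loop whose random delay range doubles after each failure. The function name `run_with_backoff`, its parameters, and the default values are illustrative assumptions; the random source and the sleep function are passed in so the behavior can be checked deterministically.

```python
import random
import time


def run_with_backoff(attempt, max_tries=5, base=0.001,
                     rng=random.random, sleep=time.sleep):
    """Call attempt() until it returns True; return the number of tries used."""
    for n in range(max_tries):
        if attempt():
            return n + 1
        # Random delay in [0, base * 2**n): the range doubles after every failure,
        # so two colliding transactions are unlikely to stay in step.
        sleep(rng() * base * (2 ** n))
    # Repeated failures point to a deeper problem than bad timing.
    raise RuntimeError("still failing after retries")


# Deterministic demonstration: fail twice, then succeed.
outcomes = iter([False, False, True])
delays = []
tries = run_with_backoff(lambda: next(outcomes),
                         rng=lambda: 1.0, sleep=delays.append)
```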
Two problems: multi-layer & multi-site • How to commit in a multi-layer transaction? • UNDO: require that the results of a low-layer transaction be visible only within the higher-layer one • Delay: delay the low-layer commit until the high-layer transaction commits • How to provide atomicity across multiple sites? • Communication delay & reliability, independent failures • Solution • Nested transaction => two-phase commit • Specialized form of RPC to coordinate steps
Hierarchical composition of transactions • Three layers all need atomicity • TRANSFER, PAY_INTEREST, MONTH_END_INTEREST • What if money is transferred from A to B between two PAY_INTEREST calls in one MONTH_END_INTEREST? • [Figure: nesting of MONTH_END_INTEREST, PAY_INTEREST, and TRANSFER]