Ch. 10. Transaction Manager Concepts Dr. Hua COP 6730
Transaction Manager Concepts
• The transaction manager (TM) furnishes the A, C, and D of ACID.
• It provides the all-or-nothing property (atomicity) by undoing aborted transactions, redoing committed ones, and coordinating commitment with other TMs if a transaction happens to be distributed.
• It provides consistency by aborting any transaction that fails to pass the resource manager (RM) consistency tests at commit.
• It provides durability by forcing all log records of committed transactions to durable memory as part of commit processing, and by redoing any recently committed work at restart.
• The TM, together with the log manager and the lock manager, supplies the mechanisms to build RMs and computations with the ACID properties.
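Normal Execution
[Figure: normal execution. The application calls Begin_Work( ) and receives a TRID for the new transaction, then sends work requests to the RMs' normal functions. The RMs issue lock requests to the lock manager and write log records through the log manager. The TM reaches the RMs through their callback functions (UNDO, REDO, COMMIT).]
Commit steps shown in the figure:
1. Application to TM: want to commit, Commit_Work( ).
2. TM to RMs: commit phase 1?
3. RMs to TM: YES to phase 1.
4. TM to log manager: write the commit log record and force it.
5. TM to RMs: commit phase 2.
6. RMs to TM: acknowledge.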
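Transaction Abort
[Figure: transaction abort, showing the application, the TM, the log manager, and the RMs with their normal and callback functions.]
Steps shown in the figure:
1. The application calls Rollback_Work( ) to roll back the transaction.
2. The TM reads the transaction’s log records from the log manager.
3. UNDO (log record): the TM invokes the RMs' UNDO callback for each record.
4. Aborted (TRID): the RMs are told that the transaction has aborted.
5. The TM writes the abort records to the log.
Note: Rollback to a savepoint has similar logic.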
DO-UNDO-REDO Protocol
• The DO-UNDO-REDO protocol is a programming style for RMs implementing transactional objects:
  • DO program: old state → new state + log record
  • UNDO program: new state + log record → old state
  • REDO program: old state + log record → new state
• An RM has the following structure:
  • Normal function: the DO program
  • Callback functions: the UNDO and REDO programs
Restart
• The TM regularly invokes checkpoints during normal processing: it informs each RM to checkpoint its state to persistent memory.
• At restart, the transaction manager scans the log table forward from the most recent checkpoint to the end.
• For each transaction that has not committed (e.g., T ? ), the TM calls the UNDO( ) callback of the RMs to undo it to the most recent persistent savepoint.
[Figure: timeline showing a checkpoint, a crash, and transactions T1, T2, T3.]
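The forward scan at restart can be sketched in C as follows. This is a minimal illustration under stated assumptions, not the TM's actual code: `struct log_rec`, the record kinds, `MAX_TRID`, and `find_losers` are hypothetical names. The sketch also assumes that every transaction of interest wrote at least one log record after the checkpoint; a real TM would presumably also consult the transactions recorded at the checkpoint.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record kinds and layout, for illustration only. */
enum rec_kind { REC_BEGIN, REC_UPDATE, REC_COMMIT };

struct log_rec {
    int trid;               /* transaction that wrote the record */
    enum rec_kind kind;
};

#define MAX_TRID 16         /* illustrative bound on transaction ids */

/* Scan recs[start..n) forward and mark every transaction that appears in
 * the scanned range but has no commit record there. loser[t] is set to 1
 * for each such transaction t. Returns the number of losers. */
int find_losers(const struct log_rec *recs, size_t start, size_t n,
                int loser[MAX_TRID])
{
    int seen[MAX_TRID] = {0};
    int committed[MAX_TRID] = {0};
    int count = 0;

    assert(recs != NULL || n == 0);
    for (size_t i = start; i < n; i++) {
        int t = recs[i].trid;
        if (t < 0 || t >= MAX_TRID)
            continue;                      /* ignore ids outside the sketch's range */
        seen[t] = 1;
        if (recs[i].kind == REC_COMMIT)
            committed[t] = 1;
    }
    for (int t = 0; t < MAX_TRID; t++) {
        loser[t] = seen[t] && !committed[t];
        count += loser[t];
    }
    return count;
}
```

The TM would then call the UNDO( ) callback for the log records of each transaction marked in `loser`.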
Value Logging
Each log record contains the old and the new states of the object.
• UNDO program: set the object to the old state.
• REDO program: set the object to the new state.
Example:
    struct value_log_record_for_page_update {
        int      opcode;               /* opcode will say page update */
        filename fname;                /* name of file that was updated */
        long     pageno;               /* page that was updated */
        char     old_value[PAGESIZE];  /* old value of page */
        char     new_value[PAGESIZE];  /* new value of page */
    };
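As a rough sketch of how the UNDO and REDO programs use such a record, the C fragment below applies a value log record to an in-memory page. It is an illustration under assumptions: the record is cut down to the fields needed here, `PAGESIZE` is set to a tiny value, and `do_update`, `undo_update`, and `redo_update` are hypothetical names.

```c
#include <assert.h>
#include <string.h>

#define PAGESIZE 8                   /* tiny page, for illustration only */

struct value_log_record {
    long pageno;                     /* page that was updated */
    char old_value[PAGESIZE];        /* old value of page */
    char new_value[PAGESIZE];        /* new value of page */
};

/* DO: apply the update and fill in the log record that describes it. */
void do_update(char *page, long pageno, const char *new_value,
               struct value_log_record *rec)
{
    rec->pageno = pageno;
    memcpy(rec->old_value, page, PAGESIZE);
    memcpy(rec->new_value, new_value, PAGESIZE);
    memcpy(page, new_value, PAGESIZE);
}

/* UNDO: set the object to the old state. */
void undo_update(char *page, const struct value_log_record *rec)
{
    memcpy(page, rec->old_value, PAGESIZE);
}

/* REDO: set the object to the new state. */
void redo_update(char *page, const struct value_log_record *rec)
{
    memcpy(page, rec->new_value, PAGESIZE);
}
```

Because both callbacks simply overwrite the page with a stored value, applying either of them more than once leaves the page in the same state.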
Logical Logging
• Value logging is often called physical logging because it records the physical addresses and values of objects.
• Logical (or operation) logging records the name of an UNDO-REDO function and its parameters.
• It assumes that each action is atomic and that in each failure situation the system state will be action consistent: each logical action will have been completely done or completely undone.
Logical Logging (Cont’d)
Problem: Partially complete actions can fail, and the UNDO of these partial actions will not be presented with an action-consistent state.
[Figure: a transaction made of logical action 1 and logical action 2, each consisting of step 1, step 2, and step 3.]
Physiological Logging: Motivation
• Physiological logging is a compromise between logical and physical logging. It uses logical logging where possible.
• These are the ideas that motivate physiological logging:
  • Page actions: Complex actions can be structured as a sequence of page actions.
  • Mini-transactions: Page actions can be structured as mini-transactions that use logical logging.
    • When the action completes, the object is updated.
    • An UNDO-REDO log record is created to cover that action.
    • These actions are atomic, consistent, and isolated.
Physiological Logging: Motivation (Cont’d)
  • Log-object consistency: It is possible to structure the system so that at restart, the persistent state is page-action consistent.
    • The log can then be used to transform this action-consistent state into a transaction-consistent state at restart.
Note: Physiological log records are physical to a page, and logical within a page.
Physiological Logging: An Example
Consider the insert that has the following logical log record:
<insert op, tablename = A, record value = r>
[Figure: Table T is stored in File A and has two indexes, index 1 and index 2, stored in File B (on Key B) and File C (on Key C).]
Physiological Logging: An Example (Cont’d)
This insert operation involves three page actions (we assume that B-tree splits do not happen). The corresponding physiological record bodies are:
<insert op, base filename = A, page number = 508, record value = r>
<insert op, index filename = B, page number = 72, index record value = s>
<insert op, index filename = C, page number = 94, index record value = t>
Fundamental idea:
• Log records are generated on a per-page basis.
• Log records are designed to make logical transformations of pages.
Physiological Logging: During Online Operation
• We call normal operations without failures online operations.
• To allow updates, all page changes must be structured as mini-transactions of this form:
    Mini_trans()
        lock the object in exclusive mode
        transform the object
        generate an UNDO-REDO log record
        unlock the object
Physiological Logging: During Online Operation (Cont’d)
• The mini-transaction approach ensures online consistency:
  • Page-action consistency: Volatile and persistent memory are in a page-consistent state, and each page reflects the most recent updates to it.
  • Log consistency: The log contains a history of all updates to pages.
One-Bit Resource Manager: Requirements
This RM manages an array of bits stored in a single page. Each bit is either free (TRUE) or busy (FALSE).
[Figure: the one-bit RM and its one page, which holds locked and unlocked bits and the page lsn. Client 1 calls get_bit( ) and receives “3”; Client 2 calls give_bit(5).]
One-Bit Resource Manager: Requirements (Cont’d)
Requirements:
• Page consistency
  • No clean free bit has been given to any transaction.
  • Every clean busy bit has been given to exactly one transaction.
  • Dirty bits are locked in exclusive mode by the transaction that modified them.
  • The log sequence number (page lsn) reflects the most recent log record for this page.
• Log consistency
  • The log contains a log record for every completed mini-transaction update to the page.
One-Bit Resource Manager: give_bit( )
    give_bit(int i)                               /* free a bit */
        get XLOCK on the bit;
        if the XLOCK is granted then {
            get the page semaphore;
            free the bit;
            generate log record saying bit is free;
            write log record and update lsn;      /* page is now consistent */
            free page semaphore;
        }
        else abort caller’s transaction;          /* caller does not own the bit */
One-Bit Resource Manager: give_bit( ) (Cont’d)
Note: This code has all the elements of a mini-transaction.
• It is well formed and two-phased with respect to the page semaphore.
• It provides a page action-consistent transformation of the page.
One-Bit Resource Manager: get_bit( )
    get_bit(void)              /* allocates a free bit to the caller and returns the bit index */
        get the page semaphore;
        repeat until end of bit array {
            find the next free bit and XLOCK it;
            if lock is granted then {                 /* the bit is free */
                mark the bit busy;
                generate log record describing update;
                write log record and update lsn;      /* page is now consistent */
                give up semaphore;
                return the bit index to caller;
            }
        }
        /* no free bits were found during the repeat loop */
        give up semaphore;
        abort transaction;
        return “-1” to caller;
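The two routines of the one-bit RM can be sketched as single-threaded C. This is only an illustration of the mini-transaction shape, under several assumptions: the page semaphore is modeled by a `fixed` flag, the lock manager by a per-bit `lock_owner` array that never waits, and the log manager by a counter that hands out LSNs. None of these names come from the original slides, and aborting the transaction is left to the caller.

```c
#include <assert.h>
#include <stdbool.h>

#define NBITS    8
#define NO_OWNER (-1)

struct page {
    bool free_bit[NBITS];    /* true = free, false = busy */
    int  lock_owner[NBITS];  /* TRID holding the XLOCK, or NO_OWNER */
    long lsn;                /* LSN of the latest log record for this page */
    bool fixed;              /* stands in for the page semaphore */
};

static long next_lsn = 1;    /* stand-in for the log manager */

static long write_log_record(void) { return next_lsn++; }

void page_init(struct page *p)
{
    for (int i = 0; i < NBITS; i++) {
        p->free_bit[i] = true;
        p->lock_owner[i] = NO_OWNER;
    }
    p->lsn = 0;
    p->fixed = false;
}

/* Try to XLOCK bit i for trid without waiting. */
static bool xlock(struct page *p, int i, int trid)
{
    if (p->lock_owner[i] != NO_OWNER && p->lock_owner[i] != trid)
        return false;
    p->lock_owner[i] = trid;
    return true;
}

/* Allocate a free bit to trid; returns its index, or -1 if none is available. */
int get_bit(struct page *p, int trid)
{
    assert(!p->fixed);
    p->fixed = true;                        /* get the page semaphore */
    for (int i = 0; i < NBITS; i++) {
        if (p->free_bit[i] && xlock(p, i, trid)) {
            p->free_bit[i] = false;         /* mark the bit busy */
            p->lsn = write_log_record();    /* log the update, advance page lsn */
            p->fixed = false;               /* page and log are consistent again */
            return i;
        }
    }
    p->fixed = false;                       /* release before reporting failure */
    return -1;                              /* caller is expected to abort */
}

/* Free bit i on behalf of trid; returns false if the XLOCK is not granted. */
bool give_bit(struct page *p, int i, int trid)
{
    if (!xlock(p, i, trid))
        return false;                       /* caller does not own the bit */
    assert(!p->fixed);
    p->fixed = true;
    p->free_bit[i] = true;                  /* free the bit */
    p->lsn = write_log_record();
    p->fixed = false;
    return true;
}
```

Note that a bit freed by one transaction stays locked by it, so another transaction's `get_bit` skips that dirty bit, which is the behavior the page-consistency requirements ask for.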
The FIX Rule
While the semaphore is set, the page is said to be fixed, and releasing the page is called unfixing it.
Fix rule:
• Get the page semaphore in exclusive mode prior to altering the page.
• Get the semaphore in shared or exclusive mode prior to reading the page.
• Hold the semaphore until the page and log are again consistent and the read or update is complete.
The FIX Rule (Cont’d)
Note: This is just two-phase locking at the page-semaphore level.
• The Isolation Theorem tells us that all read and write actions on the page will be isolated.
• Page updates are actually mini-transactions.
• When the page is unfixed, the page should be consistent and the log record should allow UNDO or REDO of the page transformation.
Multi-Page Actions
• Some actions modify several pages at once. Examples:
  • Inserting a multi-page record.
  • Splitting a B-tree node.
• These actions are structured as follows:
  • Fix all the relevant pages.
  • Do all the modifications and generate many log records.
  • Unfix the pages.
Dealing with Failures
• Page actions provide page consistency even if they fault.
  • Copy the page at the beginning of the page action.
  • If anything goes wrong with the page action prior to writing the log record, the page action returns the page to its original value by copying it back.
• Complex operations depend on transaction UNDO to roll back.
  • Each complex action should start by declaring a savepoint.
  • If anything goes wrong during a page action, the operation first makes that page consistent.
  • The action can then call Rollback_Work( ) to return to the savepoint.
Note: The savepoint wraps the complex action within a subtransaction so that the complex action can be undone if it fails.
Online Consistency & Restart Consistency
• Online log consistency requires that the volatile log contain all log records up to and including VVlsn:
    VVlsn ≤ VLlsn
[Figure: page versions and log records laid out along an lsn (time stamp) axis. Volatile page versions run up to VVlsn, volatile log records up to VLlsn, durable log records up to DLlsn, and persistent page versions up to PVlsn.]
Online Consistency & Restart Consistency (Cont’d)
• Restart consistency ensures that if a transaction has committed with commit_lsn, then that commit record is in the durable log:
    commit_lsn ≤ DLlsn
• In addition, restart consistency guarantees that if version X of the volatile copy overwrites the durable copy, then the log records for version X are already present in the durable log:
    VVlsn ≤ DLlsn
Note: At restart, all volatile memory is reset and must be reconstructed from persistent memory. We must have:
    PVlsn ≤ DLlsn
    commit_lsn ≤ DLlsn
Write Ahead Log (WAL) Protocol
Protocol:
• Each volatile page has an LSN field naming the log record of the most recent update to the page.
• Each update must maintain the page LSN field.
• When a page is about to be copied to persistent memory, the copier must first use the log manager to copy all log records up to and including the page’s LSN to durable memory (force them).
• Once the force completes, the volatile version of the page can overwrite the persistent version of the page.
• The page must be fixed during the writes and during the copies, to guarantee page-action consistency.
Effect: The log records for a page must be moved to durable memory prior to overwriting the page in persistent memory.
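The ordering that WAL imposes can be sketched in C as follows. The sketch is an assumption-laden model rather than a real buffer manager: the log manager is reduced to two counters (`vl_lsn`, `dl_lsn`), the page to an LSN and one integer of data, and `log_force` and `write_page` are hypothetical names. Fixing the page is omitted.

```c
#include <assert.h>

/* Log manager reduced to the two high-water marks the protocol talks about. */
struct log_mgr {
    long vl_lsn;   /* highest LSN in the volatile log */
    long dl_lsn;   /* highest LSN in the durable log */
};

struct page {
    long lsn;      /* LSN of the most recent update to this page */
    int  data;     /* stand-in for the page contents */
};

/* Force all log records up to and including lsn to durable memory. */
void log_force(struct log_mgr *lm, long lsn)
{
    assert(lsn <= lm->vl_lsn);    /* cannot force records that were never written */
    if (lm->dl_lsn < lsn)
        lm->dl_lsn = lsn;
}

/* Copy a volatile page over its persistent version, obeying WAL. */
void write_page(struct log_mgr *lm, const struct page *vol, struct page *pers)
{
    log_force(lm, vol->lsn);            /* 1. log records first */
    *pers = *vol;                       /* 2. then overwrite the persistent copy */
    assert(pers->lsn <= lm->dl_lsn);    /* persistent page never gets ahead of the durable log */
}
```

The final assertion is the restart-consistency condition for the page: its persistent LSN does not exceed the durable log's LSN.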
Force-Log-at-Commit
Question: What if no pages were copied to persistent memory, and the transaction committed? If the system were to restart immediately, there would be no record of the transaction’s updates, and the transaction could not be redone.
Solution: Force-log-at-commit rule.
Rule: The transaction’s log records must be moved to durable memory as part of commit.
Implementation: When a transaction commits, the TM writes a commit log record and requests the log manager to flush the log. As a consequence, all the log records prior to the commit record are flushed as well.
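A minimal C sketch of the implementation note above, assuming a log manager modeled by two counters and hypothetical helpers `log_append`, `log_force`, and `commit_work`:

```c
#include <assert.h>

struct log_mgr {
    long vl_lsn;   /* highest LSN in the volatile log */
    long dl_lsn;   /* highest LSN in the durable log */
};

/* Append one record to the volatile log and return its LSN. */
long log_append(struct log_mgr *lm)
{
    return ++lm->vl_lsn;
}

/* Flush the log up to and including lsn; earlier records go with it. */
void log_force(struct log_mgr *lm, long lsn)
{
    assert(lsn <= lm->vl_lsn);
    if (lm->dl_lsn < lsn)
        lm->dl_lsn = lsn;
}

/* Commit: write the commit record, then force the log through it. */
long commit_work(struct log_mgr *lm)
{
    long commit_lsn = log_append(lm);
    log_force(lm, commit_lsn);
    return commit_lsn;
}
```

Because the log is flushed sequentially up to the commit record, the transaction's earlier update records become durable in the same force.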
Physiological Logging: Summary
The RM must observe the following three rules:
• Fix rule: Cover all page reads and page writes with the page semaphore.
• Write-ahead log (WAL): Force the page’s log records prior to overwriting its persistent copy.
• Force-log-at-commit: Force the transaction’s log records as part of commit.
Note: Many systems use the physiological design.
UNDO: Compensation Log Records
Question: What should the page LSN become when an action is undone? If subsequent updates to the page by other transactions have advanced the log sequence number, the LSN should not be set back to its original value.
Strategy: The UNDO looks just like a new action that generates a new log record, called a compensation log record.
• This approach makes page LSNs monotonic, an essential property for write-ahead logging.
• A transaction that produced n new log records during forward processing will produce n new log records when the transaction is aborted.
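The strategy can be illustrated with a small C sketch in which UNDO appends a compensation record instead of resetting the page LSN. The layout is assumed for the example: a page with two integer slots, a fixed-size in-memory log, and the hypothetical functions `do_update` and `undo_update`.

```c
#include <assert.h>

enum rec_kind { REC_UPDATE, REC_COMPENSATION };

struct log_rec {
    long lsn;
    enum rec_kind kind;
    int slot;                 /* which slot of the page was changed */
    int old_value;
    int new_value;
};

#define LOG_CAP 16
struct log {
    struct log_rec rec[LOG_CAP];
    int n;
};

struct page {
    long lsn;                 /* LSN of the latest record applied to the page */
    int  value[2];            /* two objects sharing one page */
};

/* Append a record; LSNs simply count records in this sketch. */
static long append(struct log *lg, enum rec_kind k, int slot, int old_v, int new_v)
{
    assert(lg->n < LOG_CAP);
    long lsn = lg->n + 1;
    lg->rec[lg->n++] = (struct log_rec){ lsn, k, slot, old_v, new_v };
    return lsn;
}

void do_update(struct log *lg, struct page *p, int slot, int new_value)
{
    p->lsn = append(lg, REC_UPDATE, slot, p->value[slot], new_value);
    p->value[slot] = new_value;
}

/* UNDO as a new action: restore the old value, log a compensation
 * record, and advance (never rewind) the page LSN. */
void undo_update(struct log *lg, struct page *p, const struct log_rec *r)
{
    long before = p->lsn;
    p->lsn = append(lg, REC_COMPENSATION, r->slot, r->new_value, r->old_value);
    p->value[r->slot] = r->old_value;
    assert(p->lsn > before);  /* page LSNs stay monotonic */
}
```

If one transaction updates slot 0, another updates slot 1, and the first is then undone, the page LSN moves forward to the compensation record rather than back to its original value.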
Idempotence and Testability
• Idempotent operation: If the UNDO or REDO operation can be repeated an arbitrary number of times and still result in the correct state, the operation is idempotent.
Example: The operation “move the reactor rods to position 35” is idempotent. The operation “move the reactor rods down 2 cm” is not idempotent.
Note: Repeated REDOs can arise from repeated failures.
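The reactor-rod example can be written as two C functions. The `struct reactor` type and both function names are made up for the illustration, and "down" is assumed to mean a smaller position value.

```c
#include <assert.h>

struct reactor {
    int rod_position_cm;
};

/* Idempotent: the result does not depend on how often it is applied. */
void move_rods_to(struct reactor *r, int position_cm)
{
    r->rod_position_cm = position_cm;
}

/* Not idempotent: every repetition changes the state again. */
void move_rods_down(struct reactor *r, int delta_cm)
{
    assert(delta_cm >= 0);
    r->rod_position_cm -= delta_cm;
}
```

Repeating `move_rods_to(&r, 35)` leaves the rods at 35, while each extra `move_rods_down(&r, 2)` moves them a further 2 cm, which is why a repeated REDO of the second operation would corrupt the state.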
Idempotence and Testability (Cont’d)
• Testable state: If the old and new states can be discriminated by the system, the state is testable.
[Figure: a test applied to an unknown state determines whether it is the old state or the new state.]
If an operation is not idempotent and the state is not testable, the operation cannot be made atomic.
Idempotence of Physiological REDO
• Repeated REDOs can arise from repeated failures during restart.
Example: Suppose the following physiological log record were redone many times:
<insert op, base filename, page number, record value>
If no special care were taken, this repeated REDO would result in many inserts of the record into the page.
Idempotence of Physiological REDO (Cont’d)
• The following logic makes physiological REDOs idempotent:
    idempotent_physiologic_redo (page, logrec)
    {
        if (page_lsn < logrec_lsn)
            redo (page, logrec);
    }
Note: The first successful REDO will advance the page LSN and cause all subsequent REDOs of this log record to be null operations.
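A self-contained C version of this test might look as follows. The page and log record layouts, `PAGE_CAP`, and `redo_insert` are assumptions made for the sketch; only the LSN comparison comes from the logic above.

```c
#include <assert.h>

#define PAGE_CAP 8

struct page {
    long lsn;               /* LSN of the latest record applied to the page */
    int  nrec;              /* number of records stored in the page */
    int  rec[PAGE_CAP];
};

struct log_rec {
    long lsn;
    int  record_value;      /* value inserted by this log record */
};

/* The raw REDO of an insert: not idempotent on its own. */
static void redo_insert(struct page *p, const struct log_rec *r)
{
    assert(p->nrec < PAGE_CAP);
    p->rec[p->nrec++] = r->record_value;
}

/* Apply the REDO only if the page has not yet seen this log record. */
void idempotent_physiologic_redo(struct page *p, const struct log_rec *r)
{
    if (p->lsn < r->lsn) {
        redo_insert(p, r);
        p->lsn = r->lsn;    /* later REDOs of this record become null operations */
    }
}
```

Calling `idempotent_physiologic_redo` several times with the same record inserts the value once; the second and later calls see `p->lsn == r->lsn` and do nothing.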
The Need for the 2-Phase Commit Protocol
• Cancel key: The client may hit the cancel key at any time during the transaction.
• Server logic: A server may require that a certain set of steps be performed in order to make a complete transaction. Example: At commit, many forms-processing systems check the completeness of the data.
• Integrity check: SQL has the option to defer referential integrity checks to transaction commit. If any integrity checks are violated at commit, the transaction changes cannot be committed, and SQL wants to abort the transaction.
The Need for the 2-Phase Commit Protocol (Cont’d)
• Field calls: It is possible that field calls cannot acquire the locks or that the predicates become false at the end of the transaction. In such cases, the RM wants to abort the transaction.
2-Phase Commit Protocol: When a transaction is about to commit, each participant in the transaction is given a chance to vote on whether the transaction is a consistent state transformation. If all the RMs vote yes, the transaction can commit. If any votes no, the transaction is aborted.
2-Phase Commit: Commit
Phase I:
• Prepare: Invoke each RM asking for its vote.
• Decide: If all vote yes, durably write the transaction commit log record.
Note: The commit record write is what makes a transaction atomic and durable. If the system fails prior to that instant, the transaction will be undone at restart; otherwise, phase 2 will be carried forward by the restart logic.
2-Phase Commit: Commit (Cont’d)
Phase II:
• Commit: Invoke each RM telling it the commit decision. Note: The RM can now release locks, deliver real messages, and perform other clean-up tasks.
• Complete: When all RMs acknowledge the commit message, write a commit completion record to the log, indicating that phase 2 ended. When the completion record is durable, deallocate the live transaction state.
Note: The phase 2 completion record is used at restart to indicate that the RMs have all been informed about the transaction.
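The two phases can be summarized in a short C sketch of a local commit coordinator. It is deliberately simplified and rests on assumptions: each RM is a struct whose vote is fixed in advance, the log is two flags, acknowledgements are implicit, and `commit_work` is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum outcome { COMMITTED, ABORTED };

struct rm {
    bool vote_yes;      /* how this RM will vote in phase 1 */
    bool committed;     /* set when the RM is told the commit decision */
    bool aborted;       /* set when the RM is told to abort */
};

struct tm_log {
    bool commit_record;       /* durably written at the decide step */
    bool completion_record;   /* written when phase 2 has ended */
};

enum outcome commit_work(struct rm *rms, size_t n, struct tm_log *tlog)
{
    /* Phase 1, prepare: ask each RM for its vote. */
    bool all_yes = true;
    for (size_t i = 0; i < n; i++)
        if (!rms[i].vote_yes)
            all_yes = false;

    if (!all_yes) {                       /* any "no" vote aborts the transaction */
        for (size_t i = 0; i < n; i++)
            rms[i].aborted = true;
        return ABORTED;
    }

    /* Phase 1, decide: the commit record makes the outcome durable. */
    tlog->commit_record = true;

    /* Phase 2, commit: tell each RM the decision. */
    for (size_t i = 0; i < n; i++)
        rms[i].committed = true;

    /* Phase 2, complete: all RMs have been informed. */
    tlog->completion_record = true;
    return COMMITTED;
}
```

The sketch writes the commit record only after every vote is in, so a crash before that point leaves a transaction that restart would undo.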
Performance Advantage of Logging
• Commit copies no objects, only log records, to durable memory.
• Logging converts random write I/Os to sequential write I/Os.
2-Phase Commit: Abort
• If any RM votes no during the prepare step, or does not respond at all, then the transaction cannot commit.
• The simplest thing to do in this case is to roll back the transaction by calling Abort_work( ).
2-Phase Commit: Abort (Cont’d)
• The logic for Abort_work( ) is as follows:
  • Undo: Read the transaction’s log backwards, issuing UNDO of each record. The RM that wrote the record is invoked to undo the operation.
  • Broadcast: At each savepoint, invoke each RM telling it the transaction is at the savepoint.
  • Abort: Write the transaction abort record to the log (UNDO of begin_work( )).
  • Complete: Write a complete record to the log indicating that abort ended. Deallocate the live transaction state.
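The Undo step, reading the transaction's log backwards, can be sketched in C. The record layout and the function `undo_transaction` are assumptions for the illustration; the savepoint broadcast and the abort and complete records are left out.

```c
#include <assert.h>
#include <stddef.h>

struct log_rec {
    int trid;        /* transaction that wrote the record */
    int slot;        /* object that was updated */
    int old_value;   /* value to restore on UNDO */
};

/* Walk the log from newest to oldest and undo every record written by
 * trid. Returns the number of records undone. */
int undo_transaction(const struct log_rec *recs, int n, int trid, int *state)
{
    int undone = 0;
    assert(recs != NULL || n == 0);
    for (int i = n - 1; i >= 0; i--) {
        if (recs[i].trid != trid)
            continue;                          /* other transactions are untouched */
        state[recs[i].slot] = recs[i].old_value;
        undone++;
    }
    return undone;
}
```

The backward order matters when a transaction updated the same object more than once: the last UNDO applied is the one for the earliest update, which restores the value the object had before the transaction began.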
Transaction Trees
• How does a transaction manager first hear about a distributed transaction? There are two cases:
  • Outgoing case: A local transaction sends a request to another node.
  • Incoming case: A new transaction request arrives from a remote transaction manager.
Transaction Trees (Cont’d)
• The TMs involved in a transaction form the transaction tree.
• The root TM (coordinator) performs the original Begin_Work( ).
[Figure: a transaction tree. TMs are connected by sessions, and each TM has its local RMs. One participant TM is highlighted.]
• This TM has one incoming session and two outgoing sessions.
• It is both a participant (on the incoming session) and a coordinator (on the outgoing sessions).
Distributed 2-Phase Commit: Commit Coordinator
The root commit coordinator executes the following logic when a successful commit_work( ) is invoked on a distributed transaction.
• Local prepare: Invoke each local RM to prepare for commit.
• Distributed prepare: Send a prepare request on each of the transaction’s outgoing sessions.
• Decide: If all RMs vote yes and all outgoing sessions respond yes, then durably write the transaction commit log record containing a list of participating RMs and TMs.
Distributed 2-Phase Commit: Commit Coordinator (Cont’d)
• Commit: Invoke each participating RM, telling it the commit decision. Send a “commit” message on each of the transaction’s outgoing sessions.
• Complete: When all local RMs and all outgoing sessions acknowledge commit, write a completion record to the log indicating that phase 2 completed. When the completion record is durable, deallocate the live transaction state.
Distributed 2-Phase Commit: Commit Participant
• When the prepare message arrives, the participant executes the following logic:
Prepare( )
• Local prepare: Invoke each local RM to prepare for commit.
• Distributed prepare: Send prepare requests on the outgoing sessions.
• Decide: If all RMs vote yes and all outgoing sessions respond yes, then the local node is almost prepared.
• Prepared: Durably write the transaction prepare log record containing a list of participating RMs, participating TMs, and the parent TM.
• Respond: Send yes as the response (vote) to the prepare message on the incoming session.
• Wait: Wait (forever) for a commit message from the coordinator.