Explore the Grid Atomic Commit Protocol for ensuring transaction atomicity in distributed Grid databases. Understand its state diagram, algorithm, failure handling, and impact on ACID properties. Discover challenges and solutions for achieving durability.
Chapter 12: Grid Transaction Atomicity and Durability
12.1 Motivation
12.2 Grid Atomic Commit Protocol (Grid-ACP)
12.3 Handling Failure of Sites with Grid-ACP
12.4 Summary
12.5 Bibliographical Notes
12.6 Exercises
Grid Transaction Atomicity and Durability
• The ‘A’ and ‘D’ of the ACID properties are discussed
• Two-Phase Commit (2PC), 3PC and other variants are used for homogeneous and heterogeneous distributed DBMSs
• These protocols are synchronous and tightly coupled
• They need a global management layer
• This chapter describes
  • An Atomic Commitment Protocol (ACP) for the Grid environment
  • The effect of failure on transaction execution
D. Taniar, C.H.C. Leung, W. Rahayu, S. Goel: High-Performance Parallel Database Processing and Grid Databases, John Wiley & Sons, 2008
12.1 Motivation
Homogeneous Distributed DBMS
• 2PC is the most widely accepted ACP
• 2PC is a consensus-based protocol
• It is a blocking protocol
• It needs 3n messages for n participants and 3 rounds of message exchange to reach a final decision:
  • Coordinator’s broadcast to request votes
  • Participants’ vote messages
  • Coordinator’s broadcast of the final decision
Heterogeneous / multi-DBMS
• Mainly designed for short-lived / non-collaborative transactions
• Design philosophy is bottom-up
Grid DBMS
• Synchronous communication is not practical
• The environment is dynamic in nature, hence the design should be adaptable
• The absence of a global management layer presents extra design challenges
• Multi-database systems employ redo, retry and compensate approaches to achieve atomic commitment. These approaches cannot be implemented in a Grid database
• A Grid database will need to access distributed data over the WWW
• Traditionally, most distributed data architectures use distributed objects for communication (e.g. CORBA), which are not designed to work over the WWW
12.2 Grid Atomic Commit Protocol (Grid-ACP)
• Grid-ACP uses semantic atomicity
• Semantic atomicity, as used in the rest of this chapter, means that either all sub-transactions of a global transaction commit, or the effects of those that completed are undone by compensating transactions (they are semantically aborted)
State Diagram of Grid-ACP
• A pre-abort state is introduced for the originator
• Sleep and compensate states are introduced for participating sites
• Participants enter the sleep state after execution and can release all resources
• Resources are released without waiting for the global decision (a requirement of autonomous sites)
• If the global decision is to abort after a site has entered the sleep state, the local site starts the compensating process
• If the compensating transaction commits successfully, the sub-transaction is semantically aborted
• If the global decision is to commit, the originator can commit directly
• If the global decision is to abort, the originator enters the pre-abort state
• Only when all sub-transactions have successfully compensated can the originator enter the abort state
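The transitions described above can be written down as a small lookup table. The following is a minimal sketch, not the authors' state diagram: the event names (`executed_ok`, `global_abort`, and so on) and the use of a single `RUNNING` state for the originator before it decides are assumptions made for illustration.

```python
from enum import Enum


class State(Enum):
    RUNNING = "running"
    SLEEP = "sleep"
    COMPENSATE = "compensate"
    PRE_ABORT = "pre-abort"
    COMMIT = "commit"
    ABORT = "abort"


# Participant side. Event names are hypothetical labels for this sketch.
PARTICIPANT = {
    (State.RUNNING, "executed_ok"): State.SLEEP,    # resources are released here
    (State.RUNNING, "local_abort"): State.ABORT,    # unilateral abort
    (State.SLEEP, "global_commit"): State.COMMIT,
    (State.SLEEP, "global_abort"): State.COMPENSATE,
    (State.COMPENSATE, "compensation_failed"): State.COMPENSATE,  # resubmit
    (State.COMPENSATE, "compensation_committed"): State.ABORT,    # semantic abort
}

# Originator side. Pre-abort is held until every participant has compensated.
ORIGINATOR = {
    (State.RUNNING, "all_sleep"): State.COMMIT,
    (State.RUNNING, "any_abort"): State.PRE_ABORT,
    (State.PRE_ABORT, "all_compensated"): State.ABORT,
}


def step(table, state, event):
    """Return the next state, or raise ValueError if the move is not allowed."""
    try:
        return table[(state, event)]
    except KeyError:
        raise ValueError(f"illegal transition: {state.value} on {event}") from None
```

For example, `step(PARTICIPANT, State.SLEEP, "global_abort")` yields `State.COMPENSATE`, while a sleeping participant has no direct path to `ABORT` because its locks are already released.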
Grid-ACP Algorithm
• Step-1) The global transaction is divided into sub-transactions, which are submitted to the participating sites
• Step-2) After successful completion, each participant enters the ‘sleep’ state and informs the originator. The necessary logs are recorded
• Step-3) If all participants decided to go to ‘sleep’, the originator decides to commit and informs all participants. If even one participant decided to abort, the global decision is to abort, and the participants that decided to commit are informed to compensate
• Step-4a) If the participant is in the ‘sleep’ state and the global decision is to commit, the participant can go directly to the commit state
• Step-4b) If the participant is in the ‘sleep’ state and the global decision is to abort, compensating transactions must be executed. A plain abort is not possible, as all resources such as locks have already been released
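Steps 3 and 4 can be sketched as a single decision function. This is an illustrative sketch only: the function name, the site labels and the string values `'sleep'`, `'abort'`, `'commit'`, `'compensate'` and `'none'` are assumptions, and logging and messaging are left out.

```python
def grid_acp_decide(local_outcomes):
    """Apply steps 3 and 4 of Grid-ACP to one global transaction.

    local_outcomes maps a site name to 'sleep' (executed successfully and
    released its resources) or 'abort' (aborted unilaterally).
    Returns (global_decision, {site: action}).
    """
    if not local_outcomes:
        raise ValueError("a global transaction needs at least one participant")
    for site, outcome in local_outcomes.items():
        if outcome not in ("sleep", "abort"):
            raise ValueError(f"unknown outcome {outcome!r} from {site}")

    # Step 3: commit only if every participant went to sleep.
    all_asleep = all(o == "sleep" for o in local_outcomes.values())
    decision = "commit" if all_asleep else "abort"

    actions = {}
    for site, outcome in local_outcomes.items():
        if outcome == "abort":
            actions[site] = "none"        # already aborted locally
        elif decision == "commit":
            actions[site] = "commit"      # step 4a
        else:
            actions[site] = "compensate"  # step 4b: locks are gone, so undo semantically
    return decision, actions
```

With two participants where one aborts, `grid_acp_decide({"site2": "sleep", "site3": "abort"})` returns an abort decision and asks only `site2` to compensate.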
[Figure: Originator’s algorithm for Grid-ACP]
[Figure: Participant’s algorithm for Grid-ACP]
Early-Abort Grid-ACP
• Step-3 of Grid-ACP can be modified to improve performance
• The originator can decide to abort as soon as it receives the first abort, rather than waiting for all responses
• With this strategy, however, the abort response needs to be sent to all participants
• The participant’s algorithm is the same as in Grid-ACP
• The modified originator algorithm for Grid-ACP:
[Figure: Originator’s algorithm for early-abort Grid-ACP]
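The early-abort variant can be sketched as a loop that stops reading responses at the first abort. This is a hedged illustration, not the modified algorithm from the original figure: the function name, the `(site, vote)` tuples and the returned triple are assumptions.

```python
def early_abort_originator(participants, responses):
    """Decide as soon as possible on a stream of participant responses.

    participants: list of all participating sites.
    responses: iterable of (site, vote) pairs in arrival order, where vote
               is 'sleep' or 'abort'.
    Returns (decision, sites_to_notify, responses_consumed).
    """
    pending = set(participants)
    consumed = 0
    for site, vote in responses:
        consumed += 1
        pending.discard(site)
        if vote == "abort":
            # Decide immediately. The originator does not know what the
            # outstanding participants will answer, so it notifies all of them.
            return "abort", sorted(participants), consumed
    if pending:
        raise RuntimeError(f"no response from: {sorted(pending)}")
    return "commit", sorted(participants), consumed
```

The third element of the result shows the saving: with an abort arriving first, only one response is consumed before the decision is made.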
Example
• Case 1 (atomicity of a single transaction): consider the execution of the same sub-transactions as in the previous chapter (equations 11.1 and 11.2):
  ST12 = r12(O1) r12(O2) w12(O1) C12
  ST13 = w13(O3) C13
• Sub-transactions running to successful completion:
  • The sub-transactions execute and enter the ‘sleep’ state
  • As both sub-transactions decided to commit, the global decision is to commit
  • The originator decides to commit and informs the participants
  • The participants enter the commit state directly
• Any sub-transaction decides to abort:
  • Suppose ST13 decides to abort. ST13 aborts unilaterally and this information is sent to the originator
  • The originator only needs to inform site 2, where ST12 is executing
  • As ST12 is in the ‘sleep’ state, it needs to execute the compensating transaction. If the compensating transaction is not successful, it is resubmitted until it completes successfully, so that semantic atomicity is achieved
Example
• Case 2 (atomicity in the presence of multiple transactions):
  • This scenario is more complicated, as other transactions may have read data written by ‘sleeping’ transactions
  • Only the case of aborting transactions is discussed (if all transactions execute to successful completion, the situation is similar to Case 1)
• Consider the global transactions from Chapter 11:
  T1 = r1(O1) r1(O2) w1(O3) w1(O1) C1
  T2 = r2(O1) r2(O3) w2(O4) w2(O1) C2
• Consider the following history (S12 means ST12 is in the sleep state):
  H1 = r12(O1) r12(O2) w12(O1) S12 r22(O1) w22(O1)
• In the above history, ST12 is waiting for the final decision of T1
• Suppose T1 decides to abort. ST12 must then be compensated, and ST22 must also abort as it has read from ST12
• This may lead to cascading aborts, which are undesirable. Given the autonomous behaviour of Grids, however, they are unavoidable
• As a preventive measure, either (a) a cap can be implemented to restrict the number of cascades, or (b) a conflicting global transaction may not be submitted
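The cascade in history H1 can be found mechanically from the reads-from relation. The sketch below is an illustration under assumptions: the history is encoded as `(operation, sub_transaction, object)` tuples, and the function name is hypothetical.

```python
def cascading_aborts(history, aborted):
    """Return the sub-transactions that must abort because `aborted` does.

    history: list of (op, sub_transaction, obj) with op in {'r', 'w'};
             other operations (such as the sleep marker) are ignored.
    """
    last_writer = {}   # object -> sub-transaction that wrote it last
    reads_from = {}    # reader -> set of writers it read from
    for op, txn, obj in history:
        if op == "r":
            writer = last_writer.get(obj)
            if writer is not None and writer != txn:
                reads_from.setdefault(txn, set()).add(writer)
        elif op == "w":
            last_writer[obj] = txn

    # Propagate: anything that read from a doomed sub-transaction is doomed too.
    doomed = {aborted}
    changed = True
    while changed:
        changed = False
        for reader, writers in reads_from.items():
            if reader not in doomed and writers & doomed:
                doomed.add(reader)
                changed = True
    return doomed - {aborted}


# History H1 from the example; ('s', 'ST12', None) is the sleep marker S12.
H1 = [
    ("r", "ST12", "O1"), ("r", "ST12", "O2"), ("w", "ST12", "O1"),
    ("s", "ST12", None),
    ("r", "ST22", "O1"), ("w", "ST22", "O1"),
]
```

Calling `cascading_aborts(H1, "ST12")` reports ST22, because ST22 read O1 after the sleeping ST12 wrote it. A cap on cascades, as suggested in measure (a), could be enforced by limiting the size of the returned set.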
Discussion
• Grid-ACP deals with heterogeneity and autonomy between sites
• Due to autonomy, synchronous communication is not possible
• The algorithm pays a price for releasing resources (e.g. locks) unilaterally: other transactions may read intermediate values
• To avoid cascading aborts, any transaction that reads from a ‘sleeping’ transaction should also not commit
• Traditional DBMSs solve this issue by using a global DBMS
• Grids cannot use a global DBMS
• The ‘sleep’ state has a two-fold purpose: (a) it acts as an intermediate step before commit; (b) it can be used to cap the number of cascading aborts
• The ‘sleep’ state is defined in the interface and hence does not need any modification of the local transaction manager
Message and Time Complexity Comparison Analysis
[Table: message and time complexity of different ACPs]
Correctness of Grid-ACP
• An ACP ensures that all processes reach a decision such that:
  • AC1: All processes that reach a decision reach the same one
  • AC2: A process cannot reverse its decision after it has reached one
  • AC3: The commit decision can only be reached if all processes voted ‘yes’
  • AC4: If there are no failures and all processes voted ‘yes’, then the decision will be to commit
• Condition AC2 is not valid for our discussion, as Grid-ACP does not use a ‘wait’ state
• Lemma 12.1: If one sub-transaction aborts, all participating sub-transactions also abort.
  • Step-3 ensures that if even one participant decided to abort, the global decision is to abort. The abort decision is conveyed to all participants that decided to ‘sleep’
  • If there is no communication failure, all participants will eventually receive the abort message and all sub-transactions will abort or semantically abort
• Theorem 12.1: All participating sites reach the same final decision.
• Part I (consistent commit): from Step-2 of the algorithm it is clear that participants execute autonomously
  • If the local decision is to commit, the information is stored in the local log and the sub-transaction goes to the ‘sleep’ state
  • If the originator receives positive feedback from all sub-transactions, it decides to commit and the final decision is sent to all ‘sleeping’ sub-transactions
  • Participants do not need to do anything, as all resources such as locks have already been released. The transaction’s state is simply marked as ‘commit’
  • Thus, a consistent commit state is achieved by all participants
• Part II (consistent abort): participants have to do more computation to achieve atomicity when the global decision is to abort
  • Participants that decided to abort have already aborted unilaterally
  • From Lemma 12.1, it is clear that all participants that decided to commit now receive an ‘abort’ decision from the originator
  • As the participants in the ‘sleep’ state have already released their locks, they cannot be aborted. Hence compensating transactions are constructed using event-condition-action or compensation rules
  • Compensating transactions are executed to achieve semantic atomicity (step-4b of the algorithm)
• Compensating transactions must commit successfully to achieve semantic atomicity. If a compensating transaction aborts, it must be resubmitted until it commits successfully (Line 2 of the participant’s algorithm)
• Thus all participants will eventually abort consistently if the global decision is to abort
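The resubmission rule amounts to a retry loop around the compensating transaction. The following is a minimal sketch: `run_compensation` is a hypothetical callable that returns `True` when the compensating transaction commits, and the optional `max_attempts` guard is an assumption added for safety, since the source resubmits without a bound.

```python
def compensate_until_committed(run_compensation, max_attempts=None):
    """Resubmit a compensating transaction until it commits.

    run_compensation: callable returning True on commit, False on abort.
    max_attempts: optional safety bound (not part of the protocol as described).
    Returns the number of attempts that were needed.
    """
    attempts = 0
    while max_attempts is None or attempts < max_attempts:
        attempts += 1
        if run_compensation():
            return attempts  # sub-transaction is now semantically aborted
    raise RuntimeError(f"compensation still failing after {attempts} attempts")
```

In a real deployment the loop would normally pause between attempts; that is omitted here to keep the sketch short.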
12.3 Handling Failure of Sites with Grid-ACP
• So far, Grid-ACP has been discussed in a failure-free environment
• Failures are inevitable in real life. Grid-ACP is extended here to handle failures
Model for Storing Log Files at the Originator and Participating Sites
• A traditional DBMS stores global logs
• A Grid DBMS cannot store global logs
• In the absence of global logs, data may become corrupted
• Thus, local sites must store logs to recover from failures
• The following figure shows the model for storing logs at various sites
[Figure: model for storing log files at the originator and participating sites]
• Any site can act as participant and originator simultaneously
• For simplicity, the figure distinguishes between the two
• Information about active global transactions must be stored at the participants’ sites as well as at the originator’s site
• These logs are in addition to the local logs
• They are implemented in the interface, without any modification of the local transaction managers
Logs Required at the Originator Site
• Global transaction active log: When a Global Transaction (GT) is submitted to the Grid middleware, the middleware generates a globally unique identifier (GTID). Sub-transactions are created and the GT active log at the originator is updated with the GTID
• Global sub-transaction ready log: When a global sub-transaction decides to commit, it informs the originator. If the sub-transaction is not the last cohort of the global transaction, the Global Sub-transaction (GST) ready log is updated
• Global transaction termination log: If the last sub-transaction, along with all the others, decides to commit, the GT termination log is updated for the respective GTID. If the global decision is to abort, the GT termination log is updated accordingly
Logs Required at the Participant Site
• Global sub-transaction active log:
  • As soon as the participant receives the sub-transaction, it becomes active and the sub-transaction is added to the GST active log. When the transaction executes successfully and enters the ‘sleep’ state, the GST active log is updated
• Global sub-transaction termination log:
  • If the GST has to abort, it can do so unilaterally and the GST termination log is updated. Otherwise the GST termination log is updated when a global decision is received
• Participants do not need a ready log; the ready state can be inferred from the combination of the GST active log and the GST termination log
• The ‘sleep’ state is recorded in the active log, which indicates that the local decision is to commit the sub-transaction, hence the ready state
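How a participant can infer its state from the two logs alone can be sketched as follows. The dictionary layout of the logs and the function name are assumptions for illustration; the rule for the compensate state (an ‘abort’ entry while the GTID is still active) is the one used by the participant recovery procedure later in this section.

```python
def participant_state(gtid, gst_active_log, gst_termination_log):
    """Infer the state of a sub-transaction from the participant's two logs.

    gst_active_log: dict GTID -> 'running' or 'sleep' (assumed layout).
    gst_termination_log: dict GTID -> 'commit' or 'abort' (assumed layout).
    """
    decision = gst_termination_log.get(gtid)
    if gtid not in gst_active_log:
        # No longer active: either terminated or never seen at this site.
        return decision if decision is not None else "unknown"
    if decision == "abort":
        return "compensate"  # abort logged, but compensation not yet finished
    if decision == "commit":
        return "commit"      # decision logged, GTID not yet removed
    # No decision yet: 'sleep' in the active log plays the role of a ready entry.
    return gst_active_log[gtid]
```

For instance, a GTID marked `'sleep'` in the active log with no termination entry is reported as `'sleep'`, which is exactly the ready state that a separate ready log would otherwise record.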
Failure Recovery Algorithm for Grid-ACP
• Sites may fail at any time during transaction execution
• Various combinations are possible: (a) participant failure, (b) originator failure, (c) a combination of the two
• The recovery procedure can handle failures while the GT is executing in different states
Participant Recovery Procedure:
• Step-1: Restart the participating DBMS.
• Step-2: Recover local transactions by using the information stored in the log. Local transactions access only a single database site. Hence, local transactions can be recovered using centralised database system techniques.
• Step-3: The participating site then checks in the Global sub-transaction active log whether it is executing a sub-transaction of any global transaction.
• Step-3A: If the site does not have any cohort of global transactions, the site can recover independently by using local logs.
• Step-3B: If the site is executing local sub-transactions of any global transactions, the originator site of the respective global transaction is informed. The site is then in global recovery mode, and normal access to the site is blocked until the recovery process is completed.
• Step-3B Case-I: Participating site failed in the running state. The sub-transaction is aborted, the GTID is removed from the Global sub-transaction active log, ‘abort’ is appended to the Global sub-transaction termination log and the originator is informed of the decision.
• Step-3B Case-II: Participating site failed during the compensate state. The participant failed after the global decision was received but before the compensation was successfully executed. Hence the sub-transaction must be semantically aborted. After recovery, if the GST termination log contains ‘abort’ but the GTID still exists in the GST active log, the participant knows that it failed during the compensate state. The compensating transaction is then rerun to completion. After successful compensation, the GTID is removed from the Global sub-transaction active log and an acknowledgement is sent to the originator.
• Step-3B Case-III: Participating site failed during the sleep state. A participant in this state may not proceed unless it receives the decision from the originator. If the GTID exists in the GST active log and no decision (commit or abort) can be found in the GST termination log for that GTID, the participant knows that it failed during the sleep state.
• Case-III A: The GT termination log at the originator contains ‘commit’. All other participants decided to commit. The failed participant also decided to commit, as it was in the ‘sleep’ state. The originator replies with ‘commit’; the participant recovers, updates the GST termination log and removes the GTID from the GST active log.
• Case-III B: The GT termination log at the originator contains ‘abort’. The failed participant decided to commit, but some other site must have decided to abort, hence the final decision was to abort. The originator replies with ‘abort’ and the participant executes the compensating transaction. ‘abort’ is then appended to the GST termination log and the GTID is removed from the GST active log.
• Case-III C: The GT is active (i.e. the GT termination log has no information on transaction termination). This implies that the originator is still waiting for the decision of other participants. The originator replies with ‘active’ and the participant can safely recover to the state in which it failed, i.e. sleep. No new entry in the participant’s log is required.
• Case-III D: The GT termination log at the originator contains ‘pre-abort’. This indicates that the global decision to abort has been made and the originator is waiting for acknowledgements. If ‘abort’ is not found in the GST termination log at the participant, the participant appends ‘abort’ to the GST termination log. The participant should then execute the compensation rules, acknowledge the abortion of the sub-transaction and remove the GTID from the GST active log.
• The originator makes the final decision when all sub-transactions of the global transaction have a ready entry in the GST ready log, or when any of the sub-transactions decides to abort
• Step-4: A decision is made depending on the message that the participant receives from the originator in step-2 or step-3. The participant’s logs are updated accordingly.
• Step-5: The participating DBMS regains normal operation and starts accepting external requests.
• Step-6: The participant’s recovery process is terminated.
[Figure 1: Recovery algorithm of Grid-ACP for participant site]
Description of Figure 1: The algorithm first checks whether the site can recover locally, i.e. whether no active sub-transactions can be found at the participant (line (1) of Figure 1, step-2 of the recovery procedure). If the participant had any active sub-transaction at the time of failure (line (2)), it checks the state of the global sub-transaction. If the sub-transaction executing at the participant was in the ‘running’ state (line (3), step-3B (case-I)), the decision is to abort the sub-transaction and the originator is informed. If the sub-transaction was in the ‘compensate’ state during the failure (line (4), step-3B (case-II)), the compensating transaction is rerun to completion. If the sub-transaction was in the ‘sleep’ state during the failure (line (5), step-3B (case-III)), the participant checks the status of the originator before making any decision. The originator could be in the commit (line (6)), abort/pre-abort (line (7)) or active state (line (8)).
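The case analysis of the participant recovery procedure can be sketched as a dispatch on the participant's state at failure and, for the sleep state, on the originator's reply. This is an illustrative sketch, not the algorithm of Figure 1: the function name is hypothetical, and the actions are returned as descriptive strings rather than executed.

```python
def participant_recovery_actions(state, originator_reply=None):
    """List the recovery actions for one sub-transaction.

    state: None (no global sub-transaction), 'running', 'compensate' or 'sleep'.
    originator_reply: 'commit', 'abort', 'pre-abort' or 'active'; needed only
                      when state is 'sleep'.
    """
    compensate = [
        "run compensating transaction to completion",
        "remove GTID from GST active log",
        "acknowledge originator",
    ]
    if state is None:                      # step 3A
        return ["recover using local logs"]
    if state == "running":                 # step 3B, case I
        return [
            "abort sub-transaction",
            "remove GTID from GST active log",
            "append 'abort' to GST termination log",
            "inform originator",
        ]
    if state == "compensate":              # step 3B, case II
        return compensate
    if state == "sleep":                   # step 3B, case III
        by_reply = {
            "commit": ["append 'commit' to GST termination log",
                       "remove GTID from GST active log"],               # III A
            "abort": ["append 'abort' to GST termination log"] + compensate,      # III B
            "active": ["return to sleep state"],                         # III C
            "pre-abort": ["append 'abort' to GST termination log"] + compensate,  # III D
        }
        if originator_reply not in by_reply:
            raise ValueError(f"unexpected originator reply: {originator_reply!r}")
        return by_reply[originator_reply]
    raise ValueError(f"unknown state: {state!r}")
```

Note that the ‘active’ reply is the only case in which the participant writes nothing to its logs; it simply resumes sleeping.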
Originator Recovery Procedure:
• Step-1: Restart the originator site and restore the values from the log.
• Step-2: Determine the status of outstanding sub-transactions executing at multiple participants.
• Step-2 Case I: The originator is in the running state (the sub-transaction running at the originator is active). If the sub-transaction of the global transaction executing at the originator was active during the failure, the originator decides to abort, informs all participants to abort and appends ‘abort’ to the GT termination log.
• Step-2 Case II: The originator is in the wait state (the sub-transaction executing at the originator has completed successfully but is waiting for the responses of other participants), i.e. the GTID can be found in the GT active log and there is no entry for the GTID in the GT termination log. The number of ‘ready’ entries in the GST ready log is also less than the number of sub-transactions. The originator checks the status of the participants before taking the final decision.
  • Case II A: If all the participating sub-transactions of the corresponding global transaction are in the running state, the originator allows them to continue normally.
  • Case II B: If all the participating sub-transactions are in the sleep state, the originator decides to commit the global transaction. If some participants are running and some are in the sleep state, the originator records the information in the GST ready log for the sub-transactions in the ‘sleep’ state and lets the active sub-transactions complete normal execution.
  • Case II C: If any of the participating sites is in either the ‘abort’ or the ‘compensate’ state, this signifies that the originator failed after the global decision to abort the transaction was made, but before it could update the log. The GT termination log is updated with ‘pre-abort’. The originator then informs all participants and waits for their acknowledgements.
  • Case II D: If the originator does not receive any status information from a participant, it assumes that the participating DBMS has failed and is not operational. The recovery process is then blocked and waits for the participant to recover. For performance reasons, the originator may be designed to wait only for a pre-decided amount of time, i.e. a timeout period is fixed. The originator starts the abort procedure if the participant does not recover within the specified timeout period.
• Step-2 Case III: The originator is in the commit state, i.e. a ‘commit’ entry is found in the GT termination log, but the global transaction is still active, i.e. the GTID still exists in the GT active log. Since the originator decided to commit, all sub-transactions executed to successful completion. Hence all sub-transactions can only be in the sleep or commit state.
  • Case III A: If the participant is in the sleep state, the originator informs the participant of the successful completion of the global transaction and updates the originator’s log. The participant then enters the commit state.
  • Case III B: If the participant is already in the commit state, the originator just has to update its log.
  • After the response is sent to all participants, the GTID is removed from the GT active log. This case is also valid if the ‘commit’ entry is not found in the GT termination log, but the number of ‘ready’ entries in the GST ready log is equal to the number of executing sub-transactions.
• Step-2 Case IV: The originator is in the pre-abort/abort state, i.e. a ‘pre-abort’ or ‘abort’ entry is found in the GT termination log. Since the originator decided to abort, at least one of the sub-transactions must have decided to abort. If the originator is in the ‘abort’ state, all participants must be in the ‘abort’ state, since the originator enters the ‘abort’ state only after receiving all acknowledgements.
• If the originator is in the ‘pre-abort’ state, it is waiting for acknowledgements from some of the participants. Thus the participants can be in either the ‘sleep’ or the ‘abort’ state.
  • Case IV A: If the participant is in the ‘sleep’ state, the originator communicates the ‘abort’ decision to it. The participant then sends an acknowledgement to the originator after successful execution of the compensation procedure.
  • Case IV B: If the participant is in the ‘abort’ state, the acknowledgement from the participant is recorded at the originator site. When all acknowledgements have been received from the participants, the originator moves from the ‘pre-abort’ to the ‘abort’ state and the GTID is removed from the GT active log.
• Step-3: Depending on the above scenarios, responses from all participants are collected. If all participants’ responses were to commit, i.e. they are in the ‘sleep’ state, the global decision is to commit, which is conveyed to all participants. If any of the participants decided to abort, the global abort decision is conveyed to all participants. The GT termination log is updated accordingly.
• Step-4: The global recovery process terminates.
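The originator's side of recovery can be sketched in two parts: classifying the logged state of a global transaction, and resolving the wait state from the participants' status replies. This is a hedged sketch: the log layouts and function names are assumptions, Case I (originator's own sub-transaction still running) is not modelled because it cannot be told apart from these three logs alone, and the order in which the status checks are applied is an assumption that the source does not spell out.

```python
def originator_case(gtid, gt_active, gst_ready, gt_termination, n_subtransactions):
    """Classify a global transaction from the originator's logs.

    gt_active: set of active GTIDs; gst_ready: dict GTID -> set of ready sites;
    gt_termination: dict GTID -> 'commit' | 'pre-abort' | 'abort'.
    """
    if gtid not in gt_active:
        return "finished"
    entry = gt_termination.get(gtid)
    if entry in ("commit", "pre-abort", "abort"):
        return entry                                   # Cases III and IV
    if len(gst_ready.get(gtid, ())) == n_subtransactions:
        return "commit"                                # all ready, decision not yet logged
    return "wait"                                      # Case II


def resolve_wait_state(statuses):
    """Resolve Case II from participant replies.

    statuses maps site -> 'running' | 'sleep' | 'abort' | 'compensate',
    or None when the site did not reply.
    """
    if not statuses:
        raise ValueError("no participants")
    values = list(statuses.values())
    if any(s in ("abort", "compensate") for s in values):
        return "pre-abort"   # Case II C: an abort decision was already taken
    if any(s is None for s in values):
        return "block"       # Case II D: wait for recovery, abort after timeout
    if all(s == "sleep" for s in values):
        return "commit"      # Case II B
    return "continue"        # Case II A, or a mix of running and sleeping sites
```

A caller would first use `originator_case` to pick the branch and invoke `resolve_wait_state` only for the `'wait'` result; for `'block'`, the timeout handling described in Case II D would be layered on top.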
[Figure 2: Recovery algorithm of Grid-ACP for originator site]
Description of Figure 2: If the GTID is in the GT active log and no termination decision has been made (line (1)), the global transaction is active and thus the abort procedure is commenced. If the termination log has a wait entry at the originator site and the transaction is active (line (2)), the originator must check the status of the sub-transactions executing at all participants. All sub-transactions of the global transaction could be active and running (line (3)); some or all of the sub-transactions can be in the sleep state (line (4) and line (5)); any of the sub-transactions can be in an abort or pre-abort state (line (6)); or there can be no reply from the participants (line (7)). If the GTID exists in the originator’s termination log and the state of the global transaction was ‘commit’ (line (8)), the participants can be in either the ‘commit’ or the ‘sleep’ state.
If the failure occurred during the pre-abort state of the global transaction (line (9)), the participants can be in the ‘sleep’ or ‘abort’ state. If the transaction was active and the termination log had an abort entry (line (10)), the site is recovered to its earlier state. The transaction enters the abort state only after receiving acknowledgements from all participants. This implies that the failure occurred after the global transaction was aborted, but before the GTID was removed from the active log. Thus the GTID is removed from the active log after recovery (line (11)). Hence the pre-abort state is important in Grid-ACP: it acts as an intermediary state while the global transaction collects acknowledgements from all sub-transactions.
Comparison of Recovery Protocols
[Figure: Recovery model in centralised DBMS]
[Figure: Recovery model for DBMSs with global recovery manager (distributed DBMS and multidatabase)]
[Figure: Recovery model for Grid database architecture]
Correctness of Recovery Algorithm
• A recovery protocol is correct if it maintains the consistency of data and restores the database to its state before the failure
• Three combinations of failure are possible: (i) only participant site failure, (ii) only originator site failure and (iii) simultaneous originator and participant failure
• Transaction Submission Procedure:
  • The log is updated with a begin operation upon arrival of the transaction. The global transaction is then subdivided into multiple logical sub-transactions. A transaction identifier along with the sub-transaction identifiers is also recorded in the active log.
  • The sub-transactions are then submitted to the respective database sites. A sub-transaction active log is updated at each participating site of the global transaction. The originator must wait if a sub-transaction cannot be submitted, i.e. if the participant is not operational.
  • The decision to commit or abort the global transaction is made after the responses from all participants have been gathered (similar to 2PC, except that the participants do not wait for the global decision). If all participants’ responses were positive, i.e. they are all in the ‘sleep’ state, the originator decides to commit; otherwise the decision is to abort. The decision is recorded in the global transaction termination log and all sites participating in this global transaction are informed.
Correctness
• Lemma 12.2: Only the effects of committed transactions are reflected in all databases. Uncommitted data is reflected neither at the participant(s) nor at the originator after failure recovery
• The correctness of the lemma can be proved by considering three possibilities: Case I: only participant site failure; Case II: only originator site failure; Case III: originator and participant fail simultaneously
12.4 Summary
• The atomicity property in Grid databases is addressed
• The Grid Atomic Commit Protocol (Grid-ACP) for global transactions in the absence of a global management layer is described
• The failure recovery procedure is discussed