
Recovery in Distributed Systems: Transaction Recovery (see Coulouris et al.)

genehansen

Presentation Transcript


  1. Recovery in Distributed Systems: Transaction Recovery (see Coulouris et al.)

  2. Transaction Recovery-1 • The atomic property of transactions means that the effect of performing a transaction on behalf of one client is free from interference from concurrent transactions performed on behalf of other clients • It requires that the effects of all committed transactions are reflected in the data items, and that none of the effects of incomplete or aborted transactions are

  3. Transaction Recovery-2 • Two aspects to consider • Durability - requires that data items are saved in permanent storage and remain available indefinitely at the servers or sites of storage • Failure atomicity - requires that the effects of a transaction are atomic even when the server fails • These two aspects are not completely independent, and both can be handled by a so-called recovery manager, which is based on a two-phase commit protocol.

  4. Recovery Manager (RM)-1 • Restores the server’s database from the Recovery File (RF) after a crash; the RF itself must be resilient to media failure, so it is kept in stable storage • Reorganizes the RF to improve the performance of recovery • Reclaims storage space in the RF, through the execution of the application

  5. Recovery Manager (RM)-2 • The Recovery File (kept as a log) is used to recover a server involved in a distributed transaction. • The RF contains: • The transaction id and the status of the transaction - prepared, committed or aborted • The data items that are part of the transaction and their values • The intentions list for the transaction
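The three kinds of RF entry listed above could be modelled roughly as follows. This is only a sketch: the class names (`DataItemEntry`, `StatusEntry`, `IntentionsEntry`, `RecoveryFile`) and the in-memory list standing in for stable storage are assumptions made for illustration, not something the slides or the textbook define.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass
class DataItemEntry:
    name: str    # data item name, e.g. "A"
    value: int   # value written by a transaction


@dataclass
class StatusEntry:
    trans_id: str
    status: str  # "prepared", "committed" or "aborted"


@dataclass
class IntentionsEntry:
    trans_id: str
    # pairs of (data item name, position in the RF holding its value)
    intentions: List[Tuple[str, int]]


LogEntry = Union[DataItemEntry, StatusEntry, IntentionsEntry]


class RecoveryFile:
    """Append-only log; an entry's position is its index in the list."""

    def __init__(self) -> None:
        self.entries: List[LogEntry] = []

    def append(self, entry: LogEntry) -> int:
        self.entries.append(entry)
        return len(self.entries) - 1  # position of the new entry


rf = RecoveryFile()
pos = rf.append(DataItemEntry("A", 80))
rf.append(IntentionsEntry("T", [("A", pos)]))
rf.append(StatusEntry("T", "prepared"))
```

Because entries are only ever appended, their order in the list records the order in which transactions prepared, committed and aborted.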

  6. Recovery Manager (RM)-3 • RF represents a log containing the history of all the transactions performed • Contains a Checkpoint • Order of entries reflects the order in which transactions have prepared, committed and aborted

  7. Intentions List • Contains a list of data item names and the positions in the RF where the values of the data items altered by that transaction reside • When a server is prepared to commit a transaction, the RM must save the intentions list in the RF; this ensures the server is able to carry out the commitment later, even if it crashes in the interim • When a transaction is aborted, the RM uses the intentions list to delete all the tentative versions of data items made by that transaction
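A minimal sketch of how an intentions list might be written at prepare time is shown below. The `Server` class, the tuple layout of the log entries and the method names are hypothetical; a real recovery manager would also force the log to stable storage before replying to the coordinator.

```python
class Server:
    """Toy server: the RF is a list, tentative versions live in memory."""

    def __init__(self):
        self.log = []        # recovery file
        self.tentative = {}  # trans_id -> {item name: tentative value}
        self.committed = {}  # item name -> committed value

    def write(self, tid, item, value):
        self.tentative.setdefault(tid, {})[item] = value

    def prepare(self, tid):
        # Save each tentative value, then the intentions list pointing at them.
        intentions = []
        for item, value in self.tentative.get(tid, {}).items():
            self.log.append(("value", item, value))
            intentions.append((item, len(self.log) - 1))
        self.log.append(("prepared", tid, intentions))

    def commit(self, tid):
        self.log.append(("committed", tid))
        self.committed.update(self.tentative.pop(tid, {}))

    def abort(self, tid):
        self.log.append(("aborted", tid))
        self.tentative.pop(tid, None)  # discard the tentative versions


s = Server()
s.write("T", "A", 80)
s.prepare("T")
s.commit("T")
```

Once `prepare` has returned, the values and their positions are in the log, so the commit can still be carried out from the log after a crash.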

  8. Example-1 • Recovery File (as a log) - fig 15.1, for the banking service transactions T and U (refer to fig 12.6) • In fig 15.1, left of the double line is the checkpoint starting at P0, which represents a snapshot of the values of A, B and C before transactions T and U started • The server crashes after the RM has recorded that U is prepared to commit and has written its intentions list • In this case, the values of A, B and C must be restored

  9. Example-2 • The RM is responsible for restoring the data items so that they include the effects of all the committed transactions and none of the effects of incomplete or aborted transactions • The RM starts recovery from the end of the log, at entry P7 • It concludes that U has not committed and its effects can be ignored • It moves to P4 and concludes that T has committed • To recover the data items affected by T, it moves to entry P3 and finds the intentions list for T • It restores data items A and B from the values at P1 and P2 • To restore C, it moves to P0 and uses the checkpoint value

  10. Example-3 • For each transaction whose status is still prepared, the RM adds an aborted status entry; it then completes a new checkpoint and creates a new RF
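The backward scan walked through in the example can be sketched as below. The tuple layout of the log and the numeric values are illustrative assumptions (the slides give only the entry positions P0 to P7, not the values), and `recover` is a hypothetical helper name.

```python
# Index in the list corresponds to positions P0..P7 in the example.
log = [
    ("checkpoint", {"A": 100, "B": 200, "C": 300}),  # P0 (values assumed)
    ("value", "A", 80),                               # P1
    ("value", "B", 220),                              # P2
    ("prepared", "T", [("A", 1), ("B", 2)]),          # P3 intentions list of T
    ("committed", "T"),                               # P4
    ("value", "C", 278),                              # P5
    ("value", "B", 242),                              # P6
    ("prepared", "U", [("C", 5), ("B", 6)]),          # P7 intentions list of U
]


def recover(log):
    """Scan the log backwards, restoring only committed values."""
    restored = {}
    committed, aborted, unresolved = set(), set(), []
    for entry in reversed(log):
        kind = entry[0]
        if kind == "committed":
            committed.add(entry[1])
        elif kind == "aborted":
            aborted.add(entry[1])
        elif kind == "prepared":
            _, tid, intentions = entry
            if tid in committed:
                for item, pos in intentions:
                    # Newer committed values were seen first, so keep them.
                    restored.setdefault(item, log[pos][2])
            elif tid not in aborted:
                unresolved.append(tid)  # prepared, but no outcome recorded
        elif kind == "checkpoint":
            for item, value in entry[1].items():
                restored.setdefault(item, value)
            break  # nothing older than the checkpoint is needed
    return restored, unresolved


restored, unresolved = recover(log)
# Example-3: record an aborted status for every still-prepared transaction.
for tid in unresolved:
    log.append(("aborted", tid))
```

With the assumed values, A and B are restored from P1 and P2, C comes from the checkpoint at P0, and U ends up with an aborted entry at the end of the log.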

  11. Checkpointing • The process of writing the current committed values of a server’s data items to a new RF, together with the transaction status entries and intentions lists of transactions that have not yet been fully resolved • Its purpose is to reduce the number of transactions to be dealt with during recovery and to reclaim file space • A checkpoint that fails partway through must itself be recoverable…
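A rough sketch of checkpointing under the same kind of tuple-based log is given below. The function name `checkpoint` and the entry layout are assumptions for illustration. The sketch builds the new RF in memory only; a real implementation would presumably write it to a separate file and switch over in one step, so that a crash during checkpointing leaves the old RF usable.

```python
def checkpoint(committed_values, old_log):
    """Build a new RF: committed values plus unresolved transactions."""
    resolved = {e[1] for e in old_log if e[0] in ("committed", "aborted")}
    new_log = [("checkpoint", dict(committed_values))]
    for entry in old_log:
        if entry[0] == "prepared" and entry[1] not in resolved:
            _, tid, intentions = entry
            new_intentions = []
            for item, pos in intentions:
                # Copy the tentative value and re-point the intentions list.
                new_log.append(("value", item, old_log[pos][2]))
                new_intentions.append((item, len(new_log) - 1))
            new_log.append(("prepared", tid, new_intentions))
    return new_log


old_log = [
    ("value", "A", 80),
    ("prepared", "T", [("A", 0)]),
    ("committed", "T"),
    ("value", "C", 278),
    ("prepared", "U", [("C", 3)]),
]
new_log = checkpoint({"A": 80, "C": 300}, old_log)
```

Here T is fully resolved, so only its committed value survives in the checkpoint, while the still-prepared U keeps its value entry and intentions list in the new RF.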

  12. Recovery of Two-Phase Commit Protocol-1 • In a distributed transaction, each server (worker or coordinator) keeps its own RF • Recovery management must be extended to handle a server failing while a distributed transaction is being performed using the two-phase commit protocol • The RM at the coordinator records a coordinator entry - (Trans Id, list of workers) - in the coordinator’s RF

  13. Recovery of Two-Phase Commit Protocol-2 • RMs use two new transaction status values, done and uncertain, which can be written to the RF. Both done and uncertain are used when the RF is reorganized • The RM of the coordinator uses done to indicate that the two-phase commit is complete • The RM of a worker uses uncertain to indicate that the worker has voted Yes but does not know the outcome • The RM at the coordinator records a coordinator entry - (Trans Id, list of workers) - in the coordinator’s RF • The RM at a worker records a worker entry - (Trans Id, coordinator) - in the worker’s RF

  14. Recovery of Two-Phase Commit Protocol-3 • During Phase 1 - Voting • When the coordinator is prepared to commit, its RM writes prepared and a coordinator entry to the RF • If a worker votes Yes, its RM writes prepared, a worker entry and uncertain to the RF • If a worker votes No, its RM writes aborted to the RF

  15. Recovery of Two-Phase Commit Protocol-4 • During Phase 2 - Completion • The RM of the coordinator writes either committed or aborted to the RF according to the decision made • The RMs of the workers write committed or aborted to their RFs depending on the message received from the coordinator • The RM of the coordinator writes done to the RF when the coordinator has received a have committed message from all its workers
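The log entries written in the two phases can be sketched as follows. `CoordinatorRM`, `WorkerRM` and their method names are hypothetical, messages are replaced by direct method calls, and the sketch ignores the need to force each entry to stable storage before the corresponding message is sent.

```python
class CoordinatorRM:
    def __init__(self):
        self.rf = []
        self.pending = {}  # trans_id -> workers yet to confirm commit

    def prepare(self, tid, workers):
        # Phase 1: prepared status plus the coordinator entry.
        self.rf.append(("prepared", tid))
        self.rf.append(("coordinator", tid, list(workers)))
        self.pending[tid] = set(workers)

    def decide(self, tid, votes):
        # Phase 2: commit only if every worker voted yes.
        ok = all(v == "yes" for v in votes.values())
        outcome = "committed" if ok else "aborted"
        self.rf.append((outcome, tid))
        return outcome

    def have_committed(self, tid, worker):
        self.pending[tid].discard(worker)
        if not self.pending[tid]:
            self.rf.append(("done", tid))  # two-phase commit is complete


class WorkerRM:
    def __init__(self, coordinator):
        self.rf = []
        self.coordinator = coordinator

    def vote(self, tid, can_commit):
        if can_commit:
            self.rf.append(("prepared", tid))
            self.rf.append(("worker", tid, self.coordinator))
            self.rf.append(("uncertain", tid))  # voted yes, outcome unknown
            return "yes"
        self.rf.append(("aborted", tid))
        return "no"

    def decision(self, tid, outcome):
        self.rf.append((outcome, tid))


coord = CoordinatorRM()
workers = {"w1": WorkerRM("coord"), "w2": WorkerRM("coord")}
coord.prepare("T", workers)
votes = {name: w.vote("T", True) for name, w in workers.items()}
outcome = coord.decide("T", votes)
for name, w in workers.items():
    w.decision("T", outcome)
    coord.have_committed("T", name)
```

In this run both workers vote yes, so the coordinator's RF ends with committed followed by done, and each worker's RF ends with uncertain followed by committed.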

  16. Recovery of Two-Phase Commit Protocol-5 • Recovery of Two-Phase Commit Protocol • Refer to fig 15.2, which shows entries in an RF for transaction T, where the server is the coordinator, and transaction U, where the server is a worker • The action of the RM when a server restarts after a crash is shown in fig 15.3 • Reorganization of the RF when performing a checkpoint: • coordinator entries of transactions without status done are not removed • worker entries with status uncertain are not removed
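The two reorganization rules above can be expressed as a small filter. This is a sketch under assumptions: the tuple layout and the helper names `latest_status` and `entries_to_keep` are invented for illustration, and a transaction's status is taken to be the most recent status entry written for it.

```python
STATUSES = {"prepared", "uncertain", "committed", "aborted", "done"}


def latest_status(rf):
    """Map each transaction id to the last status written for it."""
    status = {}
    for entry in rf:
        if entry[0] in STATUSES:
            status[entry[1]] = entry[0]
    return status


def entries_to_keep(rf):
    """Coordinator/worker entries that a checkpoint must not remove."""
    status = latest_status(rf)
    kept = []
    for entry in rf:
        kind, tid = entry[0], entry[1]
        if kind == "coordinator" and status.get(tid) != "done":
            kept.append(entry)  # two-phase commit not yet complete
        elif kind == "worker" and status.get(tid) == "uncertain":
            kept.append(entry)  # outcome still unknown to this worker
    return kept


rf = [
    ("prepared", "T"), ("coordinator", "T", ["w1"]), ("committed", "T"),
    ("prepared", "U"), ("worker", "U", "c1"), ("uncertain", "U"),
    ("prepared", "V"), ("coordinator", "V", ["w2"]), ("committed", "V"),
    ("done", "V"),
]
kept = entries_to_keep(rf)
```

In this example the coordinator entry for T (committed but not done) and the worker entry for U (uncertain) survive reorganization, while the entry for V, which has reached done, can be dropped.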
