1 / 9

Distributed Commit

Distributed Commit. Example. Consider a chain of stores and suppose a manager wants to query all the stores, find the inventory of toothbrushes at each, and issue instructions to move toothbrushes from store to store in order to balance the inventory.

bueche
Download Presentation

Distributed Commit

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Distributed Commit

  2. Example • Consider a chain of stores and suppose a manager • wants to query all the stores, • find the inventory of toothbrushes at each, and • issue instructions to move toothbrushes from store to store in order to balance the inventory. • The operation is done by a global transaction T that has component Ti at the ith store and a component T0 at the office where the manager is located. • The sequence of activities performed by T is as follows: • Component T0 is created at the site of the manager. • T0 sends messages to all the stores instructing them to create components Ti. • Each Tiexecutes a query at store i to discover the number of toothbrushes in inventory and reports this number to T0. • T0 takes these numbers and determines, by some algorithm, what shipments of toothbrushes are desired. • T0 then sends messages such as "store 10 should ship 500 toothbrushes to store 7" to the appropriate stores. • Stores receiving instructions update their inventory and perform the shipments.

  3. What can go wrong? Example 1 • Suppose a bug in the algorithm to redistribute toothbrushes might cause store 10 to be instructed to ship more toothbrushes than it has. • T10 will abort, and no toothbrushes will be shipped from store 10; • Neither will the inventory at store 10 be changed. • However, T7 detects no problems and commits at store 7, updating its inventory to reflect the supposedly shipped toothbrushes. • Now, not only has T failed to execute atomically (since T10 never completes), but it has left the distributed database in an inconsistent state. Example 2 • Suppose T10 replies to T0'sfirst message by telling its inventory of toothbrushes. • However, the machine at store 10 then crashes, and the instructions from T0are never received by T10. • Can distributed transaction T ever commit? What should T10 do when its site recovers?

  4. Two­Phase Commit Assume a transaction T has parts executing at several sites. How can they agree to commit T? Phase 1 1. Coordinator (site originating the transaction) does: • Log <Prepare T>. • Send prepare T messages to all sites involved in T . 2. Each site receiving this message decides to commit T or abort T , and • Logs either <Ready T> or <Don't commit T>, and • Sends to the coordinator the corresponding message: either ready T or don't commit T.

  5. Phase 2 Phase 2 1. Coordinator decides to commit or abort. Commit if all ready messages received; abort if at least one abort or timeout. • Log <Commit T> or <Abort T>. • Send commit T or abort T messages to all sites involved in T . 2. Sites receiving these messages log them and follow instruction.

  6. Exercise Site 0 is the coordinator. Sites 1 and 2 are the components. Give an example of the sequence of messages that would occur if site 1 wants to commit and site 2 wants to abort. (0,1,P) (0,2,P) (1,0,R) (2,0,D) (0,1,A) (0,2,A)

  7. Recovery • If a site fails with <Commit T> or <Abort T> on its log, it can redo or undo, respectively. • If a site fails with <Don't commit T> as its last T entry, it knows the coordinator could not have signaled commit, so it may abort T and undo. • If a site fails with <Ready T> as its last T entry, it doesn't know what has happened to T; it must ask the coordinator (or any other site).

  8. Recovery When Coordinator Fails Other sites elect a leader and try to figure out what happened. • Some site has <Commit T>on its log. • Then the original coordinator must have wanted to send commit Tmessages everywhere, and it is safe to commit T. • Some site has <Abort T>on its log. • Then the original coordinator must have decided to abort T, and it is safe for the new coordinator to order that action. • No site has <Commit T>or <Abort T>on its log, but at least one site does not have <Ready T>on its log. • Then we know that the old coordinator never received ready Tfrom this site and therefore could not have decided to commit. • So, it’s safe for the new coordinator to decide to abort T.

  9. Recovery When Coordinator Fails 4. There is no <Commit T>or <Abort T>to be found, but every surviving site has <Ready T>. • We can’t be sure whether the old coordinator found some reason to abort Tor not; • it could have decided to do so because of actions at its own site, or • because of a don't commit Tmessage from another failed site. • Or the old coordinator may have decided to commit Tand already committed its local component of T. • Thus, the new coordinator is not able to decide whether to commit or abort Tand must wait until the original coordinator recovers.

More Related