Distributed Transactions in Chain Stores Management

Distributed Commit

Example • Consider a chain of stores and suppose a manager • wants to query all the stores, • find the inventory of toothbrushes at each, and • issue instructions to move toothbrushes from store to store in order to balance the inventory. • The operation is done by a global transaction T that has component Ti at the ith store and a component T0 at the office where the manager is located. • The sequence of activities performed by T is as follows: • Component T0 is created at the site of the manager. • T0 sends messages to all the stores instructing them to create components Ti. • Each Tiexecutes a query at store i to discover the number of toothbrushes in inventory and reports this number to T0. • T0 takes these numbers and determines, by some algorithm, what shipments of toothbrushes are desired. • T0 then sends messages such as "store 10 should ship 500 toothbrushes to store 7" to the appropriate stores. • Stores receiving instructions update their inventory and perform the shipments.

What can go wrong? Example 1 • Suppose a bug in the algorithm to redistribute toothbrushes might cause store 10 to be instructed to ship more toothbrushes than it has. • T10 will abort, and no toothbrushes will be shipped from store 10; • Neither will the inventory at store 10 be changed. • However, T7 detects no problems and commits at store 7, updating its inventory to reflect the supposedly shipped toothbrushes. • Now, not only has T failed to execute atomically (since T10 never completes), but it has left the distributed database in an inconsistent state. Example 2 • Suppose T10 replies to T0'sfirst message by telling its inventory of toothbrushes. • However, the machine at store 10 then crashes, and the instructions from T0are never received by T10. • Can distributed transaction T ever commit? What should T10 do when its site recovers?

TwoPhase Commit Assume a transaction T has parts executing at several sites. How can they agree to commit T? Phase 1 1. Coordinator (site originating the transaction) does: • Log <Prepare T>. • Send prepare T messages to all sites involved in T . 2. Each site receiving this message decides to commit T or abort T , and • Logs either <Ready T> or <Don't commit T>, and • Sends to the coordinator the corresponding message: either ready T or don't commit T.

Phase 2 Phase 2 1. Coordinator decides to commit or abort. Commit if all ready messages received; abort if at least one abort or timeout. • Log <Commit T> or <Abort T>. • Send commit T or abort T messages to all sites involved in T . 2. Sites receiving these messages log them and follow instruction.

Exercise Site 0 is the coordinator. Sites 1 and 2 are the components. Give an example of the sequence of messages that would occur if site 1 wants to commit and site 2 wants to abort. (0,1,P) (0,2,P) (1,0,R) (2,0,D) (0,1,A) (0,2,A)

Recovery • If a site fails with <Commit T> or <Abort T> on its log, it can redo or undo, respectively. • If a site fails with <Don't commit T> as its last T entry, it knows the coordinator could not have signaled commit, so it may abort T and undo. • If a site fails with <Ready T> as its last T entry, it doesn't know what has happened to T; it must ask the coordinator (or any other site).

Recovery When Coordinator Fails Other sites elect a leader and try to figure out what happened. • Some site has <Commit T>on its log. • Then the original coordinator must have wanted to send commit Tmessages everywhere, and it is safe to commit T. • Some site has <Abort T>on its log. • Then the original coordinator must have decided to abort T, and it is safe for the new coordinator to order that action. • No site has <Commit T>or <Abort T>on its log, but at least one site does not have <Ready T>on its log. • Then we know that the old coordinator never received ready Tfrom this site and therefore could not have decided to commit. • So, it’s safe for the new coordinator to decide to abort T.

Recovery When Coordinator Fails 4. There is no <Commit T>or <Abort T>to be found, but every surviving site has <Ready T>. • We can’t be sure whether the old coordinator found some reason to abort Tor not; • it could have decided to do so because of actions at its own site, or • because of a don't commit Tmessage from another failed site. • Or the old coordinator may have decided to commit Tand already committed its local component of T. • Thus, the new coordinator is not able to decide whether to commit or abort Tand must wait until the original coordinator recovers.

Distributed Transactions in Chain Stores Management

Distributed Transactions in Chain Stores Management

Presentation Transcript

Formal Models for Distributed Negotiations Commit Protocols

Paxos Commit

Multi-phase Commit Protocols

Commit Protocols

Two-Phase Commit

Why Not Commit Crime?

commit, communicate, concentrate

SQL*plus commit policy

Commit Algorithms

PRACTICAL DISTRIBUTED COMMIT IN MODERN ENVIRONMENTS

Pray. Commit. Give.

Two phase commit

Commit” ” to Program Reviews

Two-Phase Commit

Two-Phase Commit

Concurrency Control and Reliable Commit Protocol in Distributed Database Systems

Two phase commit

Why People Commit Crime

Commit Algorithms

CS 194: Distributed Systems Distributed Commit, Recovery

Commit-based Flag Commit (CFC)

PRACTICAL DISTRIBUTED COMMIT IN MODERN ENVIRONMENTS