480 likes | 497 Views
Explore the extension of all-or-nothing atomicity to transactions on multiple servers in a distributed system. Learn about the necessity of commit protocols and guidelines for coordinated transaction recovery in case of failures.
E N D
Goal • All_or_nothing semantics of atomicity must be extended to a transaction’s updates on multiple servers • Distributed system can fail partially • The solution is to set up a special handshake protocol between servers from a family of distributed commit protocols • The protocol implies that there are certain circumstances under which a failed server must communicate to other servers during its restart to find out the systemwide decisionabout the termination status of in-doubt transactions (uncertain transactions)
Goal • Servers are no longer autonomous in that they can always independently recover „in splendid isolation” • Commit protocols make very few assumptions about how the various servers implement their local recovery • The requirement there is that the rules of the commit protocol are followed by all parties of a distributed system • The 2PC protocol has been standardized and is generally accepted in the software industry
Distributed Commit • The problem: How can we ensure atomicity of a distributed transaction? Example: Consider a chain of stores belonging to a supermarket. Suppose a manager of the chain wants to query all the stores, find the inventory of pants at each, and issue instructions to move pants from store to store in order to balance the inventory. The whole operation is done by a single global transaction T that has component Ti at each ith store and component To at the office where the manager is located
Distributed Commit The sequence of activity performed by T are summarized below: • Component To is created at the site of the manager • To sends messages to all the stores instructing them to create components Ti • Each Ti executes a query at store i to discover the number of pants in inventory and reports this number to To • To takes these numbers and determines what shipments of pants are desired. To sends messages such as “store 10 should ship 100 pants to store 7” to the appropriate stores • Stores receiving instructions update their inventory and perform the shipments
Failed Coordination of a Distributed Transaction(1) Committed T1 Active Aborted Crash/ recover T2 Committed Active Aborted Both T1 and T2 are ready to commit Crash/ recover
Failed Coordination of a Distributed Transaction (2) Committed T1 T1 committed; T2: crashes and recovers Active Aborted Crash/ recover T2 Committed What we need is some new state that has more flexibility than this simple state diagram Active Aborted Crash/ recover
Transaction State Diagram with Durable Prepared State Committed Active Precommitted Aborted Crash/ recover
Precommitted state • Precommitted: a transaction becomes precommitted as a result of a request by a distributed transaction coordinator (TM). From a precommitted state, the transaction can enter either committed state or aborted state. In the event of a system crash and subsequent recovery, a transaction in a precommitted state returns to precommitted state.
Two-Phase Commit Protocol • We assume that the atomicity mechanisms at each site assure that either the local component commits or it has no effect on the database state at that site – i.e. local components of a global transaction are atomic • By enforcing the rule that either all components of a distributed transaction commit or none does, we make the distributed transaction itself atomic
Two-Phase Commit Protocol • Some assumptions about the 2PC protocol: • Each site logs actions at that site, but there is no global log • One site, coordinator, plays a special role in deciding whether or not the distributed transaction can commit • 2PC protocol involves sending certain messages between the coordinator and the other sites. As the message is sent, it is logged at the sending site, to aid in recovery if should it be necessary
Two-Phase Commit Protocol – Phase 1 • The coordinator places a log record <Prepare T> on the log at its site • The coordinator sends to each component’s site the message <prepare T> • Each site receiving the message <prepare T> decides whether to commit or abort its component of T. The site can delay if the component has not yet completed its activity, but must eventually send a response • If a site wants to commit its component of T, it must enter a state called precommitted. Once in the precommitted state, the site cannot abort its component of T without a directive to do so from the coordinator.
Two-Phase Commit Protocol – Phase 1 The following steps are done to become precommitted: • Perform whatever steps are necessary to be sure the local component of T will not have to abort, even if there is a system failure followed by recovery at the site. So, all appropriate actions regarding the log must be taken so that T will be redone rather than undone in a recovery • Place the record <Ready T> on the local log and flush the log to disk • Send to the coordinator the message <ready T> • If a site wants to abort its component of T, then it logs the record <Don’t commit T> and sends the message <don’t commit T> to the coordinator. It is safe to abort the component at this time, since T will surely abort even if only one component wants to abort
Two-Phase Commit Protocol – Phase 2 • If the coordinator has received <ready T> from all components of T, then it decides to commit T. The coordinator: • Logs <Commit T> at its site, and • Sends message <commit T> to all sites involved in T • If the coordinator has received <don’t commit T> from one or more sites, then it decides to abort T. The coordinator: • Logs <Abort T> at its site, and • Sends message <abort T> to all sites involved in T
Two-Phase Commit Protocol – Phase 2 • If a site receives a <commit T> message, it commits the component of T at that site, logging <Commit T> as it does • If a site receives the message <abort T>, it aborts the component of T at that site, and writes the log record <Abort T>
Failure model • Message losses: a message does nor arrive at the destination process because of a network failure • Message duplications: some network component may end up duplicating a message • Transient process failures: one or more of the involved processes, participants or coordinators, exhibit a soft crash and need to be restarted, but without any damage to data on secondary storage • Idea: distribute the responsibilities for handling various failure classes among transactional federation and the underlying communication system
Failure model • 2PC does not make any assumptions about the underlying communication system (datagrams as well as session oriented protocols) • Omission failures versus commission failures – the later ones would lead to the class of distributed consensus protocols known as Byzantine agreement • We take into account only omission failures
Prepared log entry for in-doubt transactions • Participant that replies „yes” to the coordinator’s poll in the first message round can then crashes. • The participant can no longer perform local crash recovery as if there were no distributed transactions at all • The restarted participant needs to check back with the coordinator first before it can decide to consider the transaction commited. • Thus, every participant that votes „yes” must actually be prepared to go either way – redo or undo teh updates.
Two Phase Commit (cd) There are three places in 2PC where a process is waiting for a message: • the participant is waiting for the <prepare T> message from the coordinator • the coordinator is waiting for the <ready T> message from all the participants • the participant is waiting for the <commit T> or <abort T> message from the coordinator
Two Phase Commit (cd) Timeout Actions: • ad.1.unilateral abort after time_out (participant) • ad.2.unilateral abort after time_out (coordinator) • ad.3.the site can be blocked
Restart and Termination Protocols • 2PC is robust with respect to failures in the sense that each failed and restarted process resumes its work in the last remembered state • 2PC is not robust with respect to failures in the sense that all processes guarantee active progress toward a global commit or rollback • Two extensions of the basic 2PC • A restart protocol – specifies how failed and restarted protocol should proceed • A termination protocol – specifies how a process should behave upon a timeout while it is waiting for some message
Restart and Termination Protocols • As all participants follow the same protocol, we have to specify four cases: • The coordinator restart protocol – the continuation of the coordinator’s protocol after a coordinator restart • The coordinator termination protocol – the coordinator behavior upon timeout • The participant restart protocol – the continuation of the participant’s protocol after a participant restart • The participant termination protocol – the participant behavior upon timeout
Termination protocol for 2PC • The simplest termination protocol is the following: The participant P remains blocked until it can re-establish communication with the coordinator. Then the coordinator can tell P the appropriate decision Drawback: P may be unnecessarily blocked for a long time • Participants can exchange messages directly (without the mediation of the coordinator) • We assume that coordinator attaches the list of the participant’s identities to the <prepare T> message it sends to each of them • The fact that participants do not know each other before receiving the message is of no consequence, since a participant may unilaterally decide to abort transaction
Termination protocol for 2PC • The cooperative termination protocol: A participant P (initiator) that times out while in its uncertainty period sends a <decision request> message to every other process Q (responder) to inquire whether Q either knows the decision or can unilaterally reach one • Q has already decided Commit (or Abort) – Q sends a commit or abort to P, and P decides accordingly • Q has not voted yet – Q can unilaterally decide Abort. It then sends an <abort T> to P, and P decides Abort • Q has voted Yes but has not yet reached a decision
2PC with restart and termination protocols • Statechart for 2PC with restart and termination protocols: • F transitions are triggered during restart after a process failure. Once the process’s last state is determined from the log entries on the process’s stable log, the transition is made without any further preconditions • T transitions are triggered upon timeout and are also made without further preconditions
2PC statechart - coordinator T or F initial T or F prepare_to_commit variant collecting commit abort abort commit T or F Ack T or F C-pending A-pending forgotten
2PC statechart - participant T or F initial Prepared/ no Prepared/ yes T or F prepared commit/ ack abort/ ack committed aborted Commit/ack Abort/ack
Communication Topologies for 2PC • The communication topology of a protocol is the specification of who sends messages to whom Centralized 2PC Coordinator Participants
Communication Topologies for 2PC Decentralized 2PC (reduce time complexity) Linear 2PC (reduce message complexity)
Three Phase Commit (3PC) • The protocol is non-blocking in the absence of total site failures • The protocol may cause inconsistent decisions to be reached in the event of communication failures We assume that only site failures occur • All operational sites can communicate with each other • Process that times out waiting for a message from process q knows that q is down and therefore that no processing can be taking place there (no other process can be communicating with q)
Three Phase Commit (3PC) • Which of the processes involved in a transaction takes the role of the coordinator ? • The choice of the coordinator depends on: • Transaction initiator client PC, application server, Web server, database server, ORB • Reliability and speed of participants – how many participants does the transaction involve? • Communication topology and protocol • The simplest way – transaction initiator is chosen as the coordinator • If the initiator is a client, it often makes more sense to choose a data server as a coordinator
Three Phase Commit (3PC) Nonblocking property: If any operational process is uncertain then no process (whether operational or failed) can have decided to Commit If the operational sites discover they are all uncertain, they can decide Abort, safe in their knowledge that the failed processes had not decided Commit. When the failed processes recover they can be told to decide Abort. This way blocking is prevented • Does 2PC obeys NB property? NO
Three Phase Commit (3PC) • The coordinator sends a <prepare T> message to all participants • When a participant receives a <prepare T> message, it responds with YES or NO message, depending on its vote. If a participant sends NO, it decides Abort and stops • The coordinator collects the vote messages from all participants. If any of them was NO or if the coordinator voted No, then the coordinator decides Abort, sends <abort T> to all participants that voted YES, and stops. Otherwise, the coordinator sends <pre-commit T> messages to all participants
Three Phase Commit (3PC) • A participant that voted Yes waits for <pre-commit T> or <abort T> message from the coordinator. If it receives an <abort T> , the participant decides Abort and stops. If it receives a <pre-commit T> , it responds with an ACK message to the coordinator • The coordinator collects the ACKs. When they have all been received, it decides Commit, sends <commit T> to all participants, and stops • A participant waits for a <commit T> message from the coordinator. When it receives that message, it decides Commit and stops
3PC – timeout actions What a process should do when it times out depends on the message it is was waiting for: • In step (2) participants wait for <prepare T> message • In step (3) the coordinator waits for the votes • In step (4) participants wait for a <pre-commit T> or <abort T> message • In step (5) the coordinator waits for ACKs • In step (6) participants wait for <commit T> message
3PC – timeout actions • Cases (1) and (2) are handled exactly as in 2PC • In case (4) the coordinator may decide Commit while some failed participant is uncertain • In case (3) the participant must communicate with other processes to reach a consistent decision • In case (5) the participant must communicate with other processes to reach a consistent decision. Why can participant ignore the timeout and simply decide commit? The participant could violates condition NB (the coordinator failed after sending pre-commit to p but before sending it to q
Termination protocol for 3PC • The state of the process relative to the message it has sent or received: • Aborted: the process has not voted No, has voted No, or has received <abort T> • Uncertain: the process is in its uncertain period • Committable: the process has received a <pre-commit T>, but has not yet received a <commit T> • Committed: the process has received a <commit T> • When a participant times out in cases (3) or (5) it initiates an election protocol • The election protocol involves all processes that are operational and results in the “election” of a new coordinator (the old one must have failed, otherwise no participant would have timed out!)
Termination protocol for 3PC The coordinator sends a <state-req> message to all participants that participated in the election. Each participant responds to this message by sending its state. The coordinator collects these states and proceeds according to the following termination protocol: TR1: If some process is Aborted, the coordinator decides Abort, sends <abort T> messages to all participants, and stops. TR2: If some process is Committed, the coordinator decides Commit, sends <commit T> messages to all participants, and stops.
Termination protocol for 3PC TR3: If all processes that reported their state are Uncertain, the coordinator decides Abort, sends <abort T> messages to all participants, and stops. TR4: If some process is Committable but none is Committed, the coordinator first sends <pre-commit T> messages to all processes that reported Uncertain, and waits for acknowledgements from these processes. After having received these acknowledgements the coordinator decides Commit, sends <commit T> messages to all participants, and stops. A participant that receives a <commit T> (or <abort T>) message, decides Commit (or Abort), and stops
Correctness of 3PC and its Termination Protocol Theorem: In the absence of total failures, 3PC and its termination protocol never cause processes to block Theorem: Under 3PC and its termination protocol, all operational processes reach the same decision
Tree 2PC algorithm • Which of the processes involved in a transaction takes the role of the coordinator ? • LAN vs WAN • Everybody can talk to everybody • The initiator does not know all participants (and probably cannot communicate with them directly and efficiently) • Observation: • During the execution of a transaction, the involved processes dynamically form a tree with the transaction initiator as the root; each edge in the tree corresponds to a dynamically established communication link
Tree 2PC algorithm • In the hierarchical commit protocol the message flow and writing of log entries follows from the two roles of an intermediate node, participant with regard to its caller and coordinator for its subtree Process 0 initiator Process 3 Process 1 Process 4 Process 5 Process 2 Communication during commit protocol
initiator Process 1 Process 2 Process 3 Process 4 Process 5 prepare prepare prepare prepare prepare Yes Yes Yes Yes Yes commit commit ..........
Optimized Algorithms for Distributed Commit What we can do: • Reducing the number of messages and the number of log writes • Shortening the „critical path” from the begin of the commit protocol to the point when local locks can be released • Reducing the probability of blocking
Optimized Algorithms for Distributed Commit • The number of messages and forced log writes can be reduced by introducing specific conventions for the presumed behavior of a process (i.e. default reaction) in the absence of more explicit information e.g. If a participant did not force a commit or abort log entry and thus could lose this info in a local crash, this participant could by default contact the coordinator to obtain missing information (but coordinator has already truncated its log and forgotten the transaction)
Optimized Algorithms for Distributed Commit • Basic 2PC: • 4n messages; 2N+2 forced log entries (coordinator - begin and commit or rollback log entry) • When a certain piece of information about transaction’s behavior is missing – two variations of 2PC • Presumed abort (PA) • Presumed commit (PC) • Presumed abort 2PC has been considered as the method of choice and has been selected for the industry standard XA
Heterogeneous Commit Coordinators • As networks grow together, applications increasingly must access servers and data residing on heterogeneous systems, performing transactions that involve multiple nodes of a network. These nodes have different TP monitors and different commit protocols • If each transaction manager supports a standard and open commit protocol, then portability and interoperability among them are relatively easy to achieve • There are two standard commit protocols and application programming interfaces: • LU6.2 (IBM) - embodied by CICS and CICS API • OSI-TP - combined with X/Open DTP
Closed versus Open Transaction Managers • Some transaction managers have an open commit protocol: resource managers can participate in the commit decision, and the commit message formats and protocols are public CICS, Decdtm, TOPEND (NCR), Transarc TM • Some transaction managers have a closed commit protocol: resource managers cannot participate in the commit decision. The term is used to describe TP systems that have only private protocols and therefore cannot cooperate with other transaction processing systems IBM’s IMS, Tandem’s TMF