ICS 214B: Transaction Processing and Distributed Data Management
Lecture 13: 3PC (cont.) and Project Part 2
Professor Chen Li
Next: 3PC with communication failures
Simple 3PC is unsafe with communication failures!
[Figure: a partition separates processes in states P and W; one side decides abort, the other commit]
Notes 13
Majority 3PC
• Coordinator can reach a decision only if it can communicate with a majority of processes (including itself)
• N processes (including coordinator)
• Majority = ?
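One way to answer the question on the slide is a strict-majority count. This is a minimal sketch, assuming the usual definition (strictly more than half of N), which agrees with the N=5, Majority=3 case in Example 1. The class and method names are hypothetical and not part of the course code.

```java
// Hypothetical helper, not part of the course code.
final class Majority {
    private Majority() {}

    /** Smallest group size that is strictly more than half of n processes. */
    static int of(int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        return n / 2 + 1; // integer division: 5 -> 3, 4 -> 3
    }

    /** True if a group of `reachable` processes (counting itself) is a majority of n. */
    static boolean has(int reachable, int n) {
        return reachable >= of(n);
    }
}
```

Note that with an even N, exactly half is not enough: two groups of size N/2 could otherwise decide independently.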
Example 1:
[Figure: Coord and P1 are cut off from P2, P3 and P4, which are all in state W]
• N=5, Majority=3
• Since P2, P3, P4 have majority, they know Coord and P1 could not have gone to “C” without at least one of their votes
• Therefore, T can be aborted!
Example 2:
[Figure: processes 0 to 4 in states P and W; later frames show processes moving to P and then to C]
Example 2: Go directly to “C”? Problem?
[Figure: same processes; one has moved to C]
Example 2: What if the network is repartitioned?
[Figure: same processes; one has moved to C]
Example 2: What if the network is repartitioned? Blocked.
[Figure: same processes; one has moved to C]
Example 2: Need majority in P state before committing!
[Figure: same processes; processes move to P, then to C]
Example 2: Thus: new group can also try to commit
[Figure: same processes; a new group with processes in P and C]
Summary: Majority rule ensures that any decision will be known to any future group making a decision
[Figure: processes 1 to 5 grouped into two majorities, labeled decision #1 and decision #2]
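Why the rule holds can be seen with a short counting argument. This is a sketch, where $G_1$ and $G_2$ (names introduced here for illustration) are the groups behind two decisions among the same $N$ processes, and $|G_1 \cup G_2| \le N$:

```latex
|G_1| > \frac{N}{2},\quad |G_2| > \frac{N}{2}
\;\Longrightarrow\;
|G_1 \cap G_2| \;=\; |G_1| + |G_2| - |G_1 \cup G_2|
\;>\; \frac{N}{2} + \frac{N}{2} - N \;=\; 0
```

So at least one process belongs to both groups and can carry the earlier decision into the later one.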
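Improvement: “Precommit” versus “Preabort”
[Figure: message sequence, commit case]
• Coordinator to Participant: REQUEST-TO-PREPARE
• Participant to Coordinator: PREPARED
• Coordinator to Participant: PRECOMMIT
• Participant to Coordinator: PRECOMMIT-ACK
• Coordinator to Participant: COMMIT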
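Improvement: “Precommit” versus “Preabort”
[Figure: message sequence, abort case]
• Coordinator to Participant: REQUEST-TO-PREPARE
• Participant to Coordinator: NO
• Coordinator to Participant: PREABORT
• Participant to Coordinator: PREABORT-ACK
• Coordinator to Participant: ABORT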
Termination Protocol
• Process states
  • Abortable (A)
  • Uncertain (W)
  • Precommitted (PC)
  • Preaborted (PA)
  • Committed (C)
Termination Rules
• First rule that matches:
  • If any state is C, decide to commit
  • If any state is A, decide to abort
  • If survivors have majority and
    • states in {W,PC}, with at least one PC: try to commit
    • states in {W,PA}: try to abort
  • Otherwise block
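The rules above can be read as a first-match decision function. This is a minimal sketch under stated assumptions: the enum and class names are hypothetical, "survivors" is the collection of states of the processes the group can currently reach, and n is the total number of processes. It only picks the action; actually carrying out "try to commit" or "try to abort" (moving the group to PC or PA and collecting acknowledgements) is not shown.

```java
import java.util.Collection;
import java.util.EnumSet;

// Hypothetical names; the states follow the slide: A, W, PC, PA, C.
enum State { A, W, PC, PA, C }

enum Decision { COMMIT, ABORT, TRY_COMMIT, TRY_ABORT, BLOCK }

final class Termination {
    private Termination() {}

    /** Applies the first matching rule to the states of the reachable processes. */
    static Decision decide(Collection<State> survivors, int n) {
        // Rules 1 and 2: a final decision is already known to someone.
        if (survivors.contains(State.C)) return Decision.COMMIT;
        if (survivors.contains(State.A)) return Decision.ABORT;

        // Rule 3 applies only if the survivors are a strict majority of n.
        boolean majority = survivors.size() > n / 2;
        if (majority) {
            if (EnumSet.of(State.W, State.PC).containsAll(survivors)
                    && survivors.contains(State.PC)) {
                return Decision.TRY_COMMIT;
            }
            if (EnumSet.of(State.W, State.PA).containsAll(survivors)) {
                return Decision.TRY_ABORT;
            }
        }
        // Rule 4: no majority, or a mix of PC and PA.
        return Decision.BLOCK;
    }
}
```

With this sketch, Example 1 (three survivors out of five, all in W) yields TRY_ABORT, while two survivors out of five yield BLOCK regardless of their W/PC/PA states.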
Important note
• Recovered nodes can safely participate in majority 3PC
  • As if a network failure has been corrected
  • Impossible to distinguish between communication failure and node failure!
Simple 3PC vs Majority 3PC
• Simple (or Basic) 3PC
  • Only operational nodes can participate
  • Any size group can commit/abort (even one node)
  • After total failure, may have to wait until ALL nodes recover (blocking)
  • Not tolerant to communication failures
• Majority 3PC
  • Operational and recovered nodes can participate
  • Need majority to commit/abort (blocking)
  • Tolerant to communication failures
Project Part 2
• A little more “design”, but not much more coding
Part 1: Simple Travel Resource Manager
[Figure: four Clients call, through the interface ResourceManager, a single Resource Manager holding Flights, Hotels, Cars, Customers]
• start();
• queryFlightPrice();
• reserveFlight();
• queryCarPrice();
• reserveCar();
• …
• commit();
Part 2: Distributed Travel Reservation System
[Figure: four Clients call the Workflow Controller, which talks to the Transaction Manager and to four Resource Managers: Flights, Hotels, Cars, Customers]
• start();
• queryFlightPrice();
• reserveFlight();
• queryCarPrice();
• reserveCar();
• …
• commit();
Overview
• WorkflowController:
  • “front end” to the Client
  • Forwards calls to either RM or TM
• ResourceManager:
  • Query/reserve (read/write/lock data)
  • Participants of 2PC
• TransactionManager:
  • Start/commit/abort
  • Coordinator of 2PC
WorkflowController.java (interface)
• Only interface Client sees
• Do NOT modify
• Very similar to RM interface in Part 1
• reserveItinerary(int xid, String custName, List flightNumList, String location, boolean needCar, boolean needRoom);
ResourceManager.java (interface)
• Need to modify
• Called by WC:
  • Query/reserve
• Called by TM:
  • Prepare/commit/abort
TransactionManager.java (interface)
• Need to modify
• Called by WC:
  • Start/commit/abort
• Called by RM:
  • Enlist, didTransactionCommit
Simplifications
• Single site: single TM
• Fixed data partitioning:
  • 4 tables -> 4 RMs
• Regular 2-phase commit
  • Centralized: TM is the coordinator
  • No cooperative termination: RM contacts only TM at recovery
Simplifications
• “Atomic” writes:
  • Update transaction list/log
  • Swap master pointer
• Transaction IDs: No garbage collection
  • Ok to remember status forever
  • No reuse (even if committed)
  • Don’t worry about wrap-around
Xid
• Committed ID:
  • Query/reserve, commit, abort: InvalidTransactionException
• Aborted ID (by the user or forcibly):
  • Query/reserve: TransactionAbortedException (or even success in some cases)
  • Commit: TransactionAbortedException
  • Abort: just return
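The rules on this slide can be sketched as a small status table consulted before each call. This is only an illustration under stated assumptions: the two exception classes are stand-ins for the project's own (their real constructors may differ), XidTable and its methods are hypothetical, treating an unknown xid as invalid is an assumption the slide does not state, and the "or even success in some cases" option for queries on an aborted xid is not modeled.

```java
import java.util.HashMap;
import java.util.Map;

// Stand-ins for the project's exception classes; real definitions may differ.
class InvalidTransactionException extends Exception {
    InvalidTransactionException(int xid) { super("invalid xid " + xid); }
}

class TransactionAbortedException extends Exception {
    TransactionAbortedException(int xid) { super("aborted xid " + xid); }
}

// Hypothetical status table: xids are never reused or garbage-collected.
final class XidTable {
    enum Status { ACTIVE, COMMITTED, ABORTED }

    private final Map<Integer, Status> status = new HashMap<>();
    private int nextXid = 1;

    synchronized int start() {
        int xid = nextXid++;
        status.put(xid, Status.ACTIVE);
        return xid;
    }

    synchronized void markCommitted(int xid) { status.put(xid, Status.COMMITTED); }

    synchronized void markAborted(int xid) { status.put(xid, Status.ABORTED); }

    /** Check made before a query/reserve or commit call. */
    synchronized void checkUsable(int xid)
            throws InvalidTransactionException, TransactionAbortedException {
        Status s = status.get(xid);
        // Assumption: an xid that was never started is treated like a committed one.
        if (s == null || s == Status.COMMITTED) throw new InvalidTransactionException(xid);
        if (s == Status.ABORTED) throw new TransactionAbortedException(xid);
    }

    /** Check made before an abort call: aborting an aborted xid just returns. */
    synchronized void checkAbortable(int xid) throws InvalidTransactionException {
        Status s = status.get(xid);
        if (s == null || s == Status.COMMITTED) throw new InvalidTransactionException(xid);
    }
}
```

Because statuses are kept forever and xids only increase, a lookup never has to distinguish "forgotten" from "never existed" in this sketch.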
When Things Go Wrong
• Your responsibility to figure out what to log and when
• Abort if:
  • TM dies before logging “committed”
  • Any participating RM dies before replying “prepared”
Recovery
• RM recovery:
  • Find out the status of all prepared transactions; keep trying if TM is down
  • No need to acquire locks
• TM recovery:
  • Not required to tell RMs to commit or abort
  • No need to save participant list
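The "keep trying if TM down" step for a recovering RM might look like the retry loop below. This is a sketch, not the project's code: OutcomeSource is a hypothetical slice of the TM interface (the real didTransactionCommit signature may differ), and TmLocator is a hypothetical hook that re-obtains the TM reference, for example with Naming.lookup. What the RM logs and when is deliberately left out.

```java
import java.rmi.RemoteException;

// Hypothetical slice of the TM interface; the real signature may differ.
interface OutcomeSource {
    boolean didTransactionCommit(int xid) throws RemoteException;
}

// Hypothetical hook that re-obtains the TM reference (e.g. via Naming.lookup).
interface TmLocator {
    OutcomeSource lookup() throws Exception;
}

final class RmRecovery {
    private RmRecovery() {}

    /** Blocks until the TM answers; returns true if xid committed. */
    static boolean resolve(int xid, TmLocator locator, long retryMillis)
            throws InterruptedException {
        while (true) {
            try {
                // Look up a fresh reference on every attempt: a reference
                // obtained before the TM restarted is no longer valid.
                return locator.lookup().didTransactionCommit(xid);
            } catch (Exception e) {
                // TM is down or not yet registered; wait, then try again.
                Thread.sleep(retryMillis);
            }
        }
    }
}
```

A recovering RM would call resolve once per prepared transaction found in its own log and then apply or undo that transaction's updates accordingly.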
RemoteException (RMI)
• Means component has died
• “X from Y” below: X catches the exception on a call to Y
• WC from RM or TM:
  • forwards to Client
• RM from TM:
  • Enlist: forwards to WC
  • didTransactionCommit: keep trying
• TM from RM:
  • Should not forward to WC
RMI References
• On startup:
  • RM has reference to TM
  • WC has reference to both RM & TM
  • TM only obtains references to RMs through enlist
• If referenced component dies and restarts, reference is no longer valid
  • Need to “Naming.lookup” again
  • Client calls wc.reconnect() after all components are back up
    • Which calls RM’s reconnect to tell it to reconnect to TM
Testing
• Client:
  wc.start();
  wc.reserve…();
  …
  wc.dieTMAfterCommitLog(); // returns true
  wc.commit(); // RemoteException
  “make runtm”
  wc.reconnect();
  wc.start();
Hints
• WorkflowController:
  • No logging needed
  • Thus, no “recovery”
• Implement one round-trip message exchange as one method call. Ex.: rm.prepare()
  • Calling: request_prepare
  • Normal return: prepared
  • RemoteException: RM died before prepared
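The rm.prepare() hint can be sketched as the voting phase of the coordinator. This is a minimal illustration under assumptions: Participant is a hypothetical one-method view of the RM interface, and the boolean return value (true meaning "prepared") is an assumption, since the slide only names the call. Logging and the commit/abort phase are left out.

```java
import java.rmi.RemoteException;
import java.util.List;

// Hypothetical participant view; the slide only names rm.prepare().
interface Participant {
    // Assumption: true means the RM replied "prepared", false means it voted no.
    boolean prepare(int xid) throws RemoteException;
}

final class Coordinator {
    private Coordinator() {}

    /** Returns true only if every enlisted RM replied "prepared". */
    static boolean allPrepared(int xid, List<Participant> rms) {
        for (Participant rm : rms) {
            try {
                // One method call carries request_prepare out and "prepared" back.
                if (!rm.prepare(xid)) {
                    return false; // RM voted no
                }
            } catch (RemoteException e) {
                return false; // RM died before replying "prepared"
            }
        }
        return true;
    }
}
```

A false result corresponds to the abort case for a participating RM that dies before replying "prepared"; the TM handles it itself rather than forwarding the RemoteException to the WC.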