830 likes | 976 Views
CS 347: Parallel and Distributed Data Management Notes06: Reliable Distributed Database Management. Hector Garcia-Molina. Reliable distributed database management. Reliability Failure models Scenarios. Reliability. Correctness Serializability Atomicity Persistence Availability.
E N D
CS 347: Parallel and DistributedData ManagementNotes06: Reliable DistributedDatabase Management Hector Garcia-Molina Notes06
Reliable distributed database management • Reliability • Failure models • Scenarios Notes06
Reliability • Correctness • Serializability • Atomicity • Persistence • Availability Notes06
Types of failures • Processor failures • Halt, delay, restart, bezerk, ... • Storage failures • Volatile, non-volatile, atomic write, transient errors, spontaneous failures • Network failures • Lost message, out-of-order messages, partitions, bounded delay Notes06
More Types of failures • Malevolent failures • Multiple failures • Detectable failures Notes06
Failure models • Cannot protect against everything • Unlikely failures (e.g., flooding in the Sahara) • Expensive to protect failures (e.g., earthquake) • Failures we know how to protect against (e.g., message sequence numbers; stable storage) Notes06
Failure model: Desired Events Expected Undesired Unexpected Notes06
Node models (1) Fail-stop nodes time perfect halted recovery perfect Volatile memory lost Stable storage ok Notes06
Node models (2) Byzantine nodes A Perfect Perfect Arbitrary failure Recovery B C At any given time, at most some fraction f of nodes failed (typically f < 1/2 or f < 1/3) Notes06
Network models (1) Reliable network - in order messages - no spontaneous messages - timeout TD I.e., no lost messages, except for node failures Destination down (not paused) If no ack in TD sec. Notes06
Variation of reliable net: • Persistent messages • If destination down, net will eventually deliver message • Simplifies node recovery, but leads to inefficiencies (hides too much) • Not considered here Notes06
Network models (2) Partitionable network - In order messages - No spontaneous messages - no timeout; nodes can have different view of failures Notes06
Scenarios • Reliable network • Fail-stop nodes • No data replication (1) • Data replication (2) • Partitionable network • Fail-stop nodes (3) Notes06
No Data Replication • Reliable network, fail-stop nodes • Basic idea: node P controls X P net Item X Notes06
No Data Replication • Reliable network, fail-stop nodes • Basic idea: node P controls X P net Item X - Single control point simplifies concurrency control, recovery - Not an availability hit: if P down, X unavailable too! Notes06
“P controls X” means - P does concurrency control for X - P does recovery for X Notes06
Say transaction T wants to access X: req PT is process that represents T at this node PT Local DMBS X Lock mgr LOG Notes06
Process models (A) Cohorts Spawn process Communication Data Access USER T1 T2 T3 Local DMBS Local DMBS Local DMBS Notes06
Process models (B) Transaction servers (manager) USER Trans MGR Trans MGR Trans MGR Local DMBS Local DMBS Local DMBS Notes06
Cohorts: application code responsible for remote access • Transaction manager: “system” handles distribution, remote access Notes06
Distributed commit problem Transaction T . Action: a1,a2 Action: a3 Action: a4,a5 Notes06
Centralized two-phase commit Coordinator Participant I I exec nok go exec* nok abort* exec ok W A W A ok* commit * abort - commit - C C Notes06
Notation: Incoming message Outgoing message ( * = everyone) • When participant enters “W” state: • it must have acquired all resources • it can only abort or commit if so instructed by a coordinator • Coordinator only enters “C” state if all participants are in “W”, i.e., it is certain that all will eventually commit Notes06
Handling node failures • Coordinator and participant logs are used to reconstruct state before failure Notes06
-> Example: after participant fails: Log: T1 X undo/redo info T1 Y info T1 “W” state ... ... Notes06
At recovery: • T1 is in “W” state • Obtain X,Y write locks (no read locks!) • Wait for message from coordinator (or ask coordinator for outcome) Notes06
Other examples: • No “W” record on log abort T1 • See “C” record on log finish T1 Notes06
Next • Add timeouts to cope with messages lost during crashes • Add finish (“F”) state for coordinator – all done, can forget outcome Notes06
I nok abort* _go_ exec* Coordinator W A ok* commit* nok* - C F c-ok* - t=timeout cping=coord. ping Notes06
ping abort ping - _t_ cping _t_ abort* ping commit _t_ cping I nok abort* _go_ exec* Coordinator W A ok* commit* nok* - C F c-ok* - t=timeout cping=coord. ping Notes06
exec nok I Participant exec ok abort nok W A commit c-ok C Notes06
cping done cping , _t - ping cping done “done” message counts as either c-ok or n-ok for coordinator exec nok I Participant exec ok abort nok W A commit c-ok C Notes06
cping done cping , _t - ping equivalent to finish state cping done “done” message counts as either c-ok or n-ok for coordinator exec nok I Participant exec ok abort nok W A commit c-ok C Notes06
Presumed abort protocol • “F” and “A” states combined in coordinator • Saves persistent space (forget quicker) • Presumed commit is analogous Notes06
Presumed abort-coordinator(participant unchanged) I ping abort nok, t abort* _go_ exec* ping - A/F W ok* commit* c-ok* - C ping commit _t_ cping Notes06
T1 start part={a,b} T1 OK from a RCV ... ... Remember: all state transitions must be logged Example: tracking who has sent “OK” msgs Log at coord: • After failure, we know still waiting for OK from node b • Alternative: do not log receipts of “OK”s abort T1 ... Notes06
Example: logging receipt of C-OK messages • If logged, can recover state • If not logged: • resend commit * • participants reply “done” if duplicate Notes06
2PC is blocking Sample scenario: Coord P2 W P1 P3 W P4 W Notes06
coord P2 w P3 P1 w w P4 Case I: P1 “W”; coordinator sent commits P1 “C” Case II: P1 NOK; P1 A P2, P3, P4 (surviving participants) cannot safely abort or commit transaction Notes06
Variants of 2PC ok ok ok • Linear Coord • Hierarchical commit commit commit Notes06
Variants of 2PC • Distributed • Nodes broadcast all messages • Every node knows when to commit Notes06
3PC = non-blocking commit • Assume: failed node is down forever • Key idea: before committing, coordinator tells participants everyone is ok Notes06
3PC I I exec nok _exec_ ok _go_ exec* Coordinator Participant nok abort* W A W A abort - pre ack ok* pre * P P ack** commit* commit - ** means allnon-failed nodes C C Notes06
3PC recovery rules: termination protocol • Survivors try to complete transaction, based on their current states • Goal: • If dead nodes committed or aborted, then survivors should not contradict! • Else, survivors can do as they please... survivors Notes06
survivors • Let {S1,S2,…Sn} be survivor sites • If one or more Si = COMMIT COMMIT T • If one or more Si = ABORT ABORT T • If one or more Si = PREPARE T could not have aborted COMMIT T • If no Si = PREPARE (or COMMIT) T could not have committed ABORT T Notes06
? ? W W P Example: Notes06
? ? W W I Example: Notes06
? ? C P P Example: Notes06
? ? W A P Example: Notes06
Once survivors make decision, they must select new coordinator to continue 3PC P P C C W P C C W P P C Decide to commit Time 1 Time 2 Time 3 Time 4 Notes06