1 / 83

Hector Garcia-Molina

CS 347: Parallel and Distributed Data Management Notes06: Reliable Distributed Database Management. Hector Garcia-Molina. Reliable distributed database management. Reliability Failure models Scenarios. Reliability. Correctness Serializability Atomicity Persistence Availability.

chelsa
Download Presentation

Hector Garcia-Molina

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. CS 347: Parallel and DistributedData ManagementNotes06: Reliable DistributedDatabase Management Hector Garcia-Molina Notes06

  2. Reliable distributed database management • Reliability • Failure models • Scenarios Notes06

  3. Reliability • Correctness • Serializability • Atomicity • Persistence • Availability Notes06

  4. Types of failures • Processor failures • Halt, delay, restart, bezerk, ... • Storage failures • Volatile, non-volatile, atomic write, transient errors, spontaneous failures • Network failures • Lost message, out-of-order messages, partitions, bounded delay Notes06

  5. More Types of failures • Malevolent failures • Multiple failures • Detectable failures Notes06

  6. Failure models • Cannot protect against everything • Unlikely failures (e.g., flooding in the Sahara) • Expensive to protect failures (e.g., earthquake) • Failures we know how to protect against (e.g., message sequence numbers; stable storage) Notes06

  7. Failure model: Desired Events Expected Undesired Unexpected Notes06

  8. Node models (1) Fail-stop nodes time perfect halted recovery perfect Volatile memory lost Stable storage ok Notes06

  9. Node models (2) Byzantine nodes A Perfect Perfect Arbitrary failure Recovery B C At any given time, at most some fraction f of nodes failed (typically f < 1/2 or f < 1/3) Notes06

  10. Network models (1) Reliable network - in order messages - no spontaneous messages - timeout TD I.e., no lost messages, except for node failures Destination down (not paused) If no ack in TD sec. Notes06

  11. Variation of reliable net: • Persistent messages • If destination down, net will eventually deliver message • Simplifies node recovery, but leads to inefficiencies (hides too much) • Not considered here Notes06

  12. Network models (2) Partitionable network - In order messages - No spontaneous messages - no timeout; nodes can have different view of failures Notes06

  13. Scenarios • Reliable network • Fail-stop nodes • No data replication (1) • Data replication (2) • Partitionable network • Fail-stop nodes (3) Notes06

  14. No Data Replication • Reliable network, fail-stop nodes • Basic idea: node P controls X P net Item X Notes06

  15. No Data Replication • Reliable network, fail-stop nodes • Basic idea: node P controls X P net Item X - Single control point simplifies concurrency control, recovery - Not an availability hit: if P down, X unavailable too! Notes06

  16. “P controls X” means - P does concurrency control for X - P does recovery for X Notes06

  17. Say transaction T wants to access X: req PT is process that represents T at this node PT Local DMBS X Lock mgr LOG Notes06

  18. Process models (A) Cohorts Spawn process Communication Data Access USER T1 T2 T3 Local DMBS Local DMBS Local DMBS Notes06

  19. Process models (B) Transaction servers (manager) USER Trans MGR Trans MGR Trans MGR Local DMBS Local DMBS Local DMBS Notes06

  20. Cohorts: application code responsible for remote access • Transaction manager: “system” handles distribution, remote access Notes06

  21. Distributed commit problem Transaction T . Action: a1,a2 Action: a3 Action: a4,a5 Notes06

  22. Centralized two-phase commit Coordinator Participant I I exec nok go exec* nok abort* exec ok W A W A ok* commit * abort - commit - C C Notes06

  23. Notation: Incoming message Outgoing message ( * = everyone) • When participant enters “W” state: • it must have acquired all resources • it can only abort or commit if so instructed by a coordinator • Coordinator only enters “C” state if all participants are in “W”, i.e., it is certain that all will eventually commit Notes06

  24. Handling node failures • Coordinator and participant logs are used to reconstruct state before failure Notes06

  25. -> Example: after participant fails: Log: T1 X undo/redo info T1 Y info T1 “W” state ... ... Notes06

  26.  At recovery: • T1 is in “W” state • Obtain X,Y write locks (no read locks!) • Wait for message from coordinator (or ask coordinator for outcome) Notes06

  27.  Other examples: • No “W” record on log  abort T1 • See “C” record on log  finish T1 Notes06

  28. Next • Add timeouts to cope with messages lost during crashes • Add finish (“F”) state for coordinator – all done, can forget outcome Notes06

  29. I nok abort* _go_ exec* Coordinator W A ok* commit* nok* - C F c-ok* - t=timeout cping=coord. ping Notes06

  30. ping abort ping - _t_ cping _t_ abort* ping commit _t_ cping I nok abort* _go_ exec* Coordinator W A ok* commit* nok* - C F c-ok* - t=timeout cping=coord. ping Notes06

  31. exec nok I Participant exec ok abort nok W A commit c-ok C Notes06

  32. cping done cping , _t - ping cping done “done” message counts as either c-ok or n-ok for coordinator exec nok I Participant exec ok abort nok W A commit c-ok C Notes06

  33. cping done cping , _t - ping equivalent to finish state cping done “done” message counts as either c-ok or n-ok for coordinator exec nok I Participant exec ok abort nok W A commit c-ok C Notes06

  34. Presumed abort protocol • “F” and “A” states combined in coordinator • Saves persistent space (forget quicker) • Presumed commit is analogous Notes06

  35. Presumed abort-coordinator(participant unchanged) I ping abort nok, t abort* _go_ exec* ping - A/F W ok* commit* c-ok* - C ping commit _t_ cping Notes06

  36. T1 start part={a,b} T1 OK from a RCV ... ... Remember: all state transitions must be logged Example: tracking who has sent “OK” msgs Log at coord: • After failure, we know still waiting for OK from node b • Alternative: do not log receipts of “OK”s abort T1 ... Notes06

  37. Example: logging receipt of C-OK messages • If logged, can recover state • If not logged: • resend commit * • participants reply “done” if duplicate Notes06

  38. 2PC is blocking Sample scenario: Coord P2 W P1 P3 W P4 W Notes06

  39. coord P2 w P3 P1 w w P4 Case I: P1 “W”; coordinator sent commits P1 “C” Case II: P1 NOK; P1 A  P2, P3, P4 (surviving participants) cannot safely abort or commit transaction Notes06

  40. Variants of 2PC ok ok ok • Linear Coord • Hierarchical commit commit commit Notes06

  41. Variants of 2PC • Distributed • Nodes broadcast all messages • Every node knows when to commit Notes06

  42. 3PC = non-blocking commit • Assume: failed node is down forever • Key idea: before committing, coordinator tells participants everyone is ok Notes06

  43. 3PC I I exec nok _exec_ ok _go_ exec* Coordinator Participant nok abort* W A W A abort - pre ack ok* pre * P P ack** commit* commit - ** means allnon-failed nodes C C Notes06

  44. 3PC recovery rules: termination protocol • Survivors try to complete transaction, based on their current states • Goal: • If dead nodes committed or aborted, then survivors should not contradict! • Else, survivors can do as they please... survivors Notes06

  45. survivors • Let {S1,S2,…Sn} be survivor sites • If one or more Si = COMMIT  COMMIT T • If one or more Si = ABORT  ABORT T • If one or more Si = PREPARE  T could not have aborted  COMMIT T • If no Si = PREPARE (or COMMIT) T could not have committed  ABORT T Notes06

  46. ? ? W W P Example: Notes06

  47. ? ? W W I Example: Notes06

  48. ? ? C P P Example: Notes06

  49. ? ? W A P Example: Notes06

  50.  Once survivors make decision, they must select new coordinator to continue 3PC P P C C W P C C W P P C Decide to commit Time 1 Time 2 Time 3 Time 4 Notes06

More Related