Transaction All-or-nothing Principles of Computer System (2012 Fall)
Where are we? • System Complexity • Modularity & Naming • Enforced Modularity • Network • Fault Tolerance • Transaction • All-or-nothing • Before-or-after
Review • Goal • Building reliable system out of unreliable components • Fault, Error, Failure • MTTF, MTBF, MTTR • Bathtub curve • RAID
Review • Disk Failure Tolerance • Use replication • Checksum used to detect faults • If cannot read one disk, read data from replica(s) instead • Weaker (but still very effective): error-correcting codes on disk platter • Replication allows masking component failure
All-or-Nothing and Before-or-After Atomicity • An action is atomic • If there is no way for a higher layer to discover the internal structure of its implementation
All-or-Nothing and Before-or-After Atomicity • From the point of view of a procedure that invokes an atomic action • The atomic action always appears either to complete as anticipated, or to do nothing • This consequence is the one that makes atomic actions useful in recovering from failures
All-or-Nothing and Before-or-After Atomicity • From the point of view of a concurrent thread • An atomic action acts as though it occurs either completely before or completely after every other concurrent atomic action • This consequence is the one that makes atomic actions useful for coordinating concurrent threads • Atomicity hides • not just the details of which steps form the atomic action • but the very fact that it has structure
All-or-Nothing and Before-or-After Atomicity • 1. Data abstraction • Hide the internal structure of data • 2. Client/server organization • Hide the internal structure of major subsystems • 3. Atomicity • Hide the internal structure of an action • Enforce industrial-strength modularity • Guarantee absence of unanticipated interactions among components of a complex system • The implementer’s point of view • Painful
Atomic actions’ benevolent side effects • Audit log • atomic actions that run into trouble record • the nature of the detected failure and • the recovery sequence • for later analysis • Data management system • when inserting a record, it may rearrange the file into a better physical order • Cache • Garbage collection • They are all hidden from upper levels
Overall system fault tolerance model • 1. Error-free operation: • All work goes according to expectations • The user initiates actions and the system confirms the actions by displaying messages to the user • 2. Tolerated error: • The user who has initiated an action notices that the system failed before it confirmed completion of the action • when the system is operating again, checks to see whether or not it actually performed that action
Overall system fault tolerance model • 3. Un-tolerated error: • The system fails without the user noticing • the user does not realize that he or she should check or retry an action that the system may not have completed
Disk Storage System Fault Tolerance Model • A perfect-disk assumption • a disk never decays and has no hard errors • only one thing can go wrong: a system crash at just the wrong time
Disk Storage System Fault Tolerance Model • The fault tolerance model • Error-free operation: • CAREFUL_GET returns the result of the most recent call to CAREFUL_PUT • at sector_number on track, with status = OK. • Detectable error: • The operating system crashes during a CAREFUL_PUT • and corrupts the disk buffer in volatile storage • and CAREFUL_PUT writes corrupted data on one sector of the disk.
ALL_OR_NOTHING_PUT • procedure ALMOST_ALL_OR_NOTHING_PUT (data, all_or_nothing_sector) • CAREFUL_PUT (data, all_or_nothing_sector.S1) • CAREFUL_PUT (data, all_or_nothing_sector.S2) • CAREFUL_PUT (data, all_or_nothing_sector.S3) • procedure ALL_OR_NOTHING_GET (reference data, all_or_nothing_sector) • CAREFUL_GET (data1, all_or_nothing_sector.S1) • CAREFUL_GET (data2, all_or_nothing_sector.S2) • CAREFUL_GET (data3, all_or_nothing_sector.S3) • if (data1 = data2) data ← data1 • else data ← data3
ALL_OR_NOTHING_PUT • procedure ALL_OR_NOTHING_PUT (data, all_or_nothing_sector) • CHECK_AND_REPAIR (all_or_nothing_sector) • ALMOST_ALL_OR_NOTHING_PUT (data, all_or_nothing_sector) • procedure CHECK_AND_REPAIR (all_or_nothing_sector) // Ensure copies match • CAREFUL_GET (data1, all_or_nothing_sector.S1) • CAREFUL_GET (data2, all_or_nothing_sector.S2) • CAREFUL_GET (data3, all_or_nothing_sector.S3)
ALL_OR_NOTHING_PUT • if (data1 = data2) and (data2 = data3) return // State 1 or 7, no repair • if (data1 = data2) • CAREFUL_PUT (data1, all_or_nothing_sector.S3) return // State 5 or 6. • if (data2 = data3) • CAREFUL_PUT (data2, all_or_nothing_sector.S1) return // State 2 or 3. • CAREFUL_PUT (data1, all_or_nothing_sector.S2) // State 4, go to state 5 • CAREFUL_PUT (data1, all_or_nothing_sector.S3) // State 5, go to state 7
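The three-copy scheme above can be tried out with a small in-memory simulation. This is only a sketch under assumptions that are not in the slides: the `Disk` class, the `crash_on_put` knob, and the use of a fresh `object()` to stand for a corrupted sector are all hypothetical, chosen so that a crash damages exactly one sector, as the fault tolerance model states.

```python
class Crash(Exception):
    """Simulated system crash in the middle of a sector write."""


class Disk:
    """Hypothetical in-memory disk; not part of the original slides."""

    def __init__(self):
        self.sectors = {}
        self.crash_on_put = None  # if set to k, the k-th upcoming put crashes

    def careful_put(self, data, sector):
        if self.crash_on_put is not None:
            self.crash_on_put -= 1
            if self.crash_on_put == 0:
                self.crash_on_put = None
                # The crash corrupts this one sector; a fresh object never
                # compares equal to any real data or to other garbage.
                self.sectors[sector] = object()
                raise Crash()
        self.sectors[sector] = data

    def careful_get(self, sector):
        return self.sectors.get(sector)


COPIES = ("S1", "S2", "S3")


def almost_all_or_nothing_put(disk, data, name):
    for copy in COPIES:
        disk.careful_put(data, (name, copy))


def all_or_nothing_get(disk, name):
    d1, d2, d3 = (disk.careful_get((name, c)) for c in COPIES)
    return d1 if d1 == d2 else d3


def check_and_repair(disk, name):
    d1, d2, d3 = (disk.careful_get((name, c)) for c in COPIES)
    if d1 == d2 and d2 == d3:
        return                                   # state 1 or 7, no repair
    if d1 == d2:
        disk.careful_put(d1, (name, "S3"))       # state 5 or 6
        return
    if d2 == d3:
        disk.careful_put(d2, (name, "S1"))       # state 2 or 3
        return
    disk.careful_put(d1, (name, "S2"))           # state 4, go to state 5
    disk.careful_put(d1, (name, "S3"))           # state 5, go to state 7


def all_or_nothing_put(disk, data, name):
    check_and_repair(disk, name)
    almost_all_or_nothing_put(disk, data, name)


def crash_experiment(k):
    """Crash during the k-th sector write of an update from "old" to "new"."""
    disk = Disk()
    all_or_nothing_put(disk, "old", "acct")
    disk.crash_on_put = k
    try:
        all_or_nothing_put(disk, "new", "acct")
    except Crash:
        pass
    return all_or_nothing_get(disk, "acct")
```

In this simulation a crash during the first or second write leaves the old value readable, and a crash during the third write leaves the new value readable, so a reader never sees a partial update.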
Commit • Pre-commit • identify all the resources needed • establish their availability • maintain the ability to abort at any instant • shared resources, once reserved, cannot be released until the commit point is passed • should not do anything externally visible • Post-commit • release reserved resources that are no longer needed • perform externally visible actions • cannot try to acquire additional resources • Q: where’s commit in ALL_OR_NOTHING_PUT?
Shadow Copy • Pre-commit: • Create a complete duplicate working copy of the file that is to be modified • make all changes to the working copy • Commit point: • Carefully exchange the working copy with the original • Typically this step is bootstrapped • Post-commit: • Release the space that was occupied by the original • The golden rule of atomicity: Never modify the only copy!
Bank account transfer
xfer(bank, a, b, amt):
    bank[a] = bank[a] - amt
    bank[b] = bank[b] + amt
Bank account transfer
Call: xfer(bank, a, b, 50)
xfer(bank, a, b, amt):              <- a=100, b=100
    bank[a] = bank[a] - amt         <- a=50, b=100
    bank[b] = bank[b] + amt         <- a=50, b=150
Bank account transfer
xfer(bank, a, b, amt):
    bank[a] = bank[a] - amt
    bank[b] = bank[b] + amt
audit(bank):
    sum = 0
    for acct in bank:
        sum = sum + bank[acct]
    return sum
Bank account transfer
Result of audit(bank) if it runs at each point during xfer:
xfer(bank, a, b, amt):              <- sum=200
    bank[a] = bank[a] - amt         <- sum=150
    bank[b] = bank[b] + amt         <- sum=200
audit(bank):
    sum = 0
    for acct in bank:
        sum = sum + bank[acct]
    return sum
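The inconsistent audit can be reproduced with a minimal sketch. It assumes the starting balances of 100 and the transfer of 50 used on the earlier slide. The generator form of `xfer` is a hypothetical device, not part of the slides, that pauses between the two writes so that `audit` can run at each point.

```python
def xfer_steps(bank, a, b, amt):
    """xfer written as a generator: it pauses before, between, and after the two writes."""
    yield                      # nothing written yet
    bank[a] = bank[a] - amt
    yield                      # debit done, credit not yet done
    bank[b] = bank[b] + amt
    yield                      # both writes done


def audit(bank):
    return sum(bank.values())


bank = {"a": 100, "b": 100}
observed = []
for _ in xfer_steps(bank, "a", "b", 50):
    observed.append(audit(bank))  # run the audit at every pause point

print(observed)  # [200, 150, 200]
```

The middle value is the one that before-or-after atomicity is meant to rule out: the audit ran neither completely before nor completely after the transfer.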
Eventual goal: transactions
xfer(bank, a, b, amt):
    begin
    bank[a] = bank[a] - amt
    bank[b] = bank[b] + amt
    commit
audit(bank):
    begin
    sum = 0
    for acct in bank:
        sum = sum + bank[acct]
    commit
    return sum
Strawman implementation
xfer(bank, a, b, amt):
    bank = read_accounts(bankfile)
    bank[a] = bank[a] - amt
    bank[b] = bank[b] + amt
    write_accounts(bankfile)
Shadow copy
xfer(bank, a, b, amt):
    bank = read_accounts(bankfile)
    bank[a] = bank[a] - amt
    bank[b] = bank[b] + amt
    write_accounts("#bankfile")
    rename("#bankfile", bankfile)
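A runnable version of the shadow-copy transfer might look like the sketch below. Several details are assumptions rather than anything stated on the slide: accounts are stored as JSON, the shadow file lives in the same directory as the bank file, and `os.replace` plays the role of `rename`. `os.replace` overwrites the destination and is atomic on POSIX file systems when both paths are on the same file system. Full durability would also need an `fsync` of the directory, which is omitted here.

```python
import json
import os
import tempfile


def read_accounts(path):
    with open(path) as f:
        return json.load(f)


def write_accounts(path, bank):
    with open(path, "w") as f:
        json.dump(bank, f)
        f.flush()
        os.fsync(f.fileno())  # push the file contents to disk before committing


def xfer(bankfile, a, b, amt):
    bank = read_accounts(bankfile)
    bank[a] = bank[a] - amt
    bank[b] = bank[b] + amt
    # Pre-commit: all changes go to the shadow copy, never to the live file.
    shadow = os.path.join(os.path.dirname(bankfile),
                          "#" + os.path.basename(bankfile))
    write_accounts(shadow, bank)
    # Commit point: a single rename switches readers to the new copy.
    os.replace(shadow, bankfile)


def demo():
    with tempfile.TemporaryDirectory() as d:
        bankfile = os.path.join(d, "bank")
        write_accounts(bankfile, {"a": 100, "b": 100})
        xfer(bankfile, "a", "b", 50)
        return read_accounts(bankfile), os.listdir(d)
```

If the program stops before `os.replace`, the live file still holds the old balances and only a stray shadow file is left behind.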
Rename • Rename • unlink(bankname) • link("newfile", bankname) • unlink("newfile")
File system data structures • directory data blocks: • filename “bank” → inode 12 • filename “#bank” → inode 13 • inode 12: • data blocks: 3, 4, 5 • refcount: 1 • inode 13: • data blocks: 6, 7, 8 • refcount: 1
Rename • What needs to happen during rename? • Point "bank" directory entry at inode 13 • Remove "newfile" directory entry • Decrement refcount on inode 12 • First try at rename(x, y): • y's dirent gets x's inode # • decref(y's original inode) • remove x's dirent • Problems • What happens if we crash after modifying y's dirent? • Two names point to inode 13, refcount is 1. • What if we increment refcount before, and decrement it afterwards?
Increase ref-count before
rename(x, y):
    newino = lookup(x)
    oldino = lookup(y)
    incref(newino)
    change y's dirent to newino
    decref(oldino)
    remove x's dirent
    decref(newino)
rename(“#bank”, “bank”): after incref(newino) • directory data blocks: • filename “bank” → inode 12 • filename “#bank” → inode 13 • inode 12: • data blocks: 3, 4, 5 • refcount: 1 • inode 13: • data blocks: 6, 7, 8 • refcount: 2
rename(“#bank”, “bank”): after changing y's dirent to newino • directory data blocks: • filename “bank” → inode 13 • filename “#bank” → inode 13 • inode 12: • data blocks: 3, 4, 5 • refcount: 1 • inode 13: • data blocks: 6, 7, 8 • refcount: 2
rename(“#bank”, “bank”): after decref(oldino) • directory data blocks: • filename “bank” → inode 13 • filename “#bank” → inode 13 • inode 12: • data blocks: 3, 4, 5 • refcount: 0 • inode 13: • data blocks: 6, 7, 8 • refcount: 2
rename(“#bank”, “bank”): after removing x's dirent • directory data blocks: • filename “bank” → inode 13 • (“#bank” entry removed) • inode 12: • data blocks: 3, 4, 5 • refcount: 0 • inode 13: • data blocks: 6, 7, 8 • refcount: 2
rename(“#bank”, “bank”): after decref(newino) • directory data blocks: • filename “bank” → inode 13 • (“#bank” entry removed) • inode 12: • data blocks: 3, 4, 5 • refcount: 0 • inode 13: • data blocks: 6, 7, 8 • refcount: 1
Recovery after crash
salvage(disk):
    for inode in disk.inodes:
        inode.refcnt = find_all_refs(disk.root_dir, inode)
    if exists("#bank"):
        unlink("#bank")
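The rename steps and the salvage pass can be exercised together in a small simulation. This is a sketch under stated assumptions: the directory is a plain dictionary, refcounts are a second dictionary, and the `crash_after` parameter is a hypothetical hook that stops the rename after a given step. None of these names come from a real file system API.

```python
class Crash(Exception):
    """Simulated crash partway through rename."""


class Disk:
    """Hypothetical model: one directory and two inodes, as on the slides."""

    def __init__(self):
        self.root_dir = {"bank": 12, "#bank": 13}
        self.refcount = {12: 1, 13: 1}


def rename(disk, x, y, crash_after=None):
    done = 0

    def checkpoint():
        nonlocal done
        done += 1
        if done == crash_after:
            raise Crash()

    newino = disk.root_dir[x]
    oldino = disk.root_dir[y]
    disk.refcount[newino] += 1      # incref(newino)
    checkpoint()
    disk.root_dir[y] = newino       # change y's dirent: the commit point
    checkpoint()
    disk.refcount[oldino] -= 1      # decref(oldino)
    checkpoint()
    del disk.root_dir[x]            # remove x's dirent
    checkpoint()
    disk.refcount[newino] -= 1      # decref(newino)
    checkpoint()


def salvage(disk):
    # Recompute every refcount from the directory entries that actually exist.
    for ino in disk.refcount:
        disk.refcount[ino] = sum(1 for t in disk.root_dir.values() if t == ino)
    # A leftover shadow file is unlinked, dropping one reference to its inode.
    if "#bank" in disk.root_dir:
        ino = disk.root_dir.pop("#bank")
        disk.refcount[ino] -= 1


def outcome(crash_after):
    disk = Disk()
    try:
        rename(disk, "#bank", "bank", crash_after)
    except Crash:
        pass
    salvage(disk)
    return disk.root_dir, disk.refcount
```

In this model a crash after step 1 recovers to the old file ("bank" → inode 12), and a crash after any later step recovers to the new file ("bank" → inode 13), which matches the directory-entry write being the commit point.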
Shadow copy • Write to a copy of data, atomically switch to new copy • Switching can be done with one all-or-nothing operation (sector write) • Requires a small amount of all-or-nothing atomicity from lower layer (disk) • Main rule: only make one write to current/live copy of data • In our example, sector write for rename. • Creates a well-defined commit point.
Shadow copy • Does the shadow copy approach work in general? • +: Works well for a single file. • -: Hard to generalize to multiple files or directories. • Might have to place all files in a single directory, or rename subdirs. • -: Requires copying the entire file for any (small) change. • -: Only one operation can happen at a time. • -: Only works for operations that happen on a single computer, single disk.
Summary • Not always possible to use replication to avoid failures. • To recover from failure, need to reason about what happened to the system • Atomicity makes it much easier to reason about possible system states: • All-or-nothing atomicity • Before-or-after atomicity • Shadow copy: one approach to all-or-nothing atomicity • Make all modifications to a shadow copy of data. • Replace old copy with new copy with an all-or-nothing write to one sector. • -> commit point • Rule: never modify live data (unless one sector write is all you need)
Summary • Two kinds of atomicity: all-or-nothing, before-or-after • Shadow copy can provide all-or-nothing atomicity • Golden rule of atomicity: never modify the only copy! • Typical way to achieve all-or-nothing atomicity. • Works because you can fall back to the old copy in case of failure. • Software can also use all-or-nothing atomicity to abort in case of error. • Drawbacks of shadow file approach: • only works for single file (maybe fixable with shadow dirs) • copy the entire file for every all-or-nothing action (harder to avoid) • Still, shadow copy is a simple and effective design when it suffices. • Many Unix applications (e.g., text editors) use it, owing to rename.
All-or-nothing: Logging • A more general technique for achieving all-or-nothing atomicity • Idea: keep a log of all changes, and whether each change commits or aborts • We will start out with a simple scheme that's all-or-nothing but slow • Then, we will optimize its performance while preserving atomicity
Journal Storage • A store operation • does not overwrite old data • creates a new, tentative version of the data • which remains invisible to any reader outside this all-or-nothing action • until the action commits
Journal Storage • procedure NEW_ACTION () • id ← NEW_OUTCOME_RECORD () • id.outcome_record.state ← PENDING • return id • procedure COMMIT (reference id) • id.outcome_record.state ← COMMITTED • procedure ABORT (reference id) • id.outcome_record.state ← ABORTED
Journal Storage • procedure READ_CURRENT_VALUE (data_id, caller_id) • starting at end of data_id repeat until beginning • v ← previous version of data_id // Get next older version • a ← v.action_id // Identify the action a that created it • s ← a.outcome_record.state // Check action a’s outcome record • if s = COMMITTED then • return v.value • else skip v // Continue backward search • signal (“Tried to read an uninitialized variable!”)
Journal Storage • procedure WRITE_NEW_VALUE (reference data_id, new_value, caller_id) • if caller_id.outcome_record.state = PENDING • append new version v to data_id • v.value ← new_value • v.action_id ← caller_id • else signal (“Tried to write outside of an all-or-nothing action!”)
Journal Storage • procedure TRANSFER (reference debit_account, reference credit_account, amount) • my_id ← NEW_ACTION () • xvalue ← READ_CURRENT_VALUE (debit_account, my_id) • xvalue ← xvalue - amount • WRITE_NEW_VALUE (debit_account, xvalue, my_id) • yvalue ← READ_CURRENT_VALUE (credit_account, my_id) • yvalue ← yvalue + amount • WRITE_NEW_VALUE (credit_account, yvalue, my_id) • if xvalue > 0 then • COMMIT (my_id) • else • ABORT (my_id) • signal (“Negative transfers are not allowed.”)
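The journal storage procedures above translate fairly directly into Python. The sketch below keeps everything in memory, which is an assumption made for illustration; a real journal would live in non-volatile storage. The `Journal` class and `demo` function are hypothetical names, and `transfer` returns `False` on abort where the slide raises a signal.

```python
PENDING, COMMITTED, ABORTED = "PENDING", "COMMITTED", "ABORTED"


class Journal:
    """Hypothetical in-memory journal storage; not a real library API."""

    def __init__(self):
        self.outcome = {}    # action id -> outcome record state
        self.versions = {}   # data id -> list of (value, action_id), oldest first
        self.next_id = 0

    def new_action(self):
        self.next_id += 1
        self.outcome[self.next_id] = PENDING
        return self.next_id

    def commit(self, action_id):
        self.outcome[action_id] = COMMITTED

    def abort(self, action_id):
        self.outcome[action_id] = ABORTED

    def read_current_value(self, data_id, caller_id):
        # Search backward from the newest version for one whose action committed.
        for value, action_id in reversed(self.versions.get(data_id, [])):
            if self.outcome[action_id] == COMMITTED:
                return value
        raise LookupError("Tried to read an uninitialized variable!")

    def write_new_value(self, data_id, new_value, caller_id):
        if self.outcome.get(caller_id) != PENDING:
            raise RuntimeError("Tried to write outside of an all-or-nothing action!")
        # Append a tentative version; older versions are never overwritten.
        self.versions.setdefault(data_id, []).append((new_value, caller_id))


def transfer(journal, debit_account, credit_account, amount):
    my_id = journal.new_action()
    xvalue = journal.read_current_value(debit_account, my_id) - amount
    journal.write_new_value(debit_account, xvalue, my_id)
    yvalue = journal.read_current_value(credit_account, my_id) + amount
    journal.write_new_value(credit_account, yvalue, my_id)
    if xvalue > 0:
        journal.commit(my_id)
        return True
    journal.abort(my_id)   # tentative versions stay in the journal but are never read
    return False


def demo():
    j = Journal()
    setup = j.new_action()
    j.write_new_value("a", 100, setup)
    j.write_new_value("b", 100, setup)
    j.commit(setup)
    ok1 = transfer(j, "a", "b", 50)   # leaves a=50, commits
    ok2 = transfer(j, "a", "b", 80)   # would leave a=-30, aborts
    reader = j.new_action()
    return (ok1, ok2,
            j.read_current_value("a", reader),
            j.read_current_value("b", reader))
```

The aborted transfer leaves two tentative versions in the journal, yet later reads still return 50 and 150, because a version only becomes visible once its outcome record says COMMITTED.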