CSC 536 Lecture 10
Outline
• Recovery
• Case study: Google Spanner
Recovery
• Error recovery: replace a present erroneous state with an error-free state
• Backward recovery: bring the system into a previously correct state
  • Requires recording the system's state from time to time (checkpoints)
  • Example: retransmit a message
• Forward recovery: bring the system to a correct new state from which it can continue to execute
  • Only works with known errors
  • Example: erasure correction (sketched below)
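To make the forward-recovery example concrete, here is a minimal Python sketch of XOR-based erasure correction (illustrative only, not from the lecture): one parity block lets us rebuild any single lost data block, moving forward to a correct state instead of rolling back.

```python
def make_parity(blocks):
    """XOR all data blocks into one parity block (all blocks equal length)."""
    parity = bytearray(len(blocks[0]))
    for block in blocks:
        for i, b in enumerate(block):
            parity[i] ^= b
    return bytes(parity)

def recover(blocks, parity, lost_index):
    """Forward recovery: rebuild the single lost block from the survivors."""
    rebuilt = bytearray(parity)
    for j, block in enumerate(blocks):
        if j == lost_index:
            continue
        for i, b in enumerate(block):
            rebuilt[i] ^= b
    return bytes(rebuilt)

# Usage: lose block 1, then reconstruct it from the others plus parity.
blocks = [b"aaaa", b"bbbb", b"cccc"]
parity = make_parity(blocks)
assert recover(blocks, parity, 1) == b"bbbb"
```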
Backward recovery
• Backward recovery is typically used
  • It is more general
• However:
  • Recovery is expensive
  • Sometimes we can't go back (e.g., a file has been deleted)
  • Checkpoints are expensive
• Solution for the last point: message logging
  • Sender-based
  • Receiver-based
Checkpoints: Common approach
• Periodically make a “big” checkpoint
• Then, more frequently, make incremental additions to it
• For example, the checkpoint could be copies of some files or of a database
• Looking ahead, the incremental data could be the “operations” run on the database since the last transaction finished (committed); see the sketch below
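A minimal Python sketch of this approach (all names illustrative, not from the lecture): a full checkpoint plus an incremental operation log, where recovery restores the last checkpoint and replays the logged operations.

```python
import json, os

class CheckpointedStore:
    """Toy key-value store: periodic full checkpoint plus an incremental op log."""

    def __init__(self, checkpoint_path="checkpoint.json", log_path="ops.log"):
        self.checkpoint_path = checkpoint_path
        self.log_path = log_path
        self.state = {}

    def apply(self, key, value):
        # Log the operation first, then apply it to the in-memory state.
        with open(self.log_path, "a") as log:
            log.write(json.dumps({"key": key, "value": value}) + "\n")
        self.state[key] = value

    def checkpoint(self):
        # Take the "big" checkpoint, then truncate the incremental log.
        with open(self.checkpoint_path, "w") as f:
            json.dump(self.state, f)
        open(self.log_path, "w").close()

    def recover(self):
        # Restore the last checkpoint, then replay operations logged since.
        if os.path.exists(self.checkpoint_path):
            with open(self.checkpoint_path) as f:
                self.state = json.load(f)
        if os.path.exists(self.log_path):
            with open(self.log_path) as f:
                for line in f:
                    op = json.loads(line)
                    self.state[op["key"]] = op["value"]
```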
Problems with checkpoints
[diagram: P sends a request to Q; Q sends a reply]
• P and Q are interacting, and each makes independent checkpoints now and then
• Q crashes and rolls back to its checkpoint
• It will have “forgotten” the message from P…
• …yet Q may even have replied
• Who would care? Suppose the reply was “OK to release the cash. Account has been debited”
Two related concerns
• First, Q needs to see that request again, so that it will reenter the state in which it sent the reply
  • We need to regenerate the input request
• But if Q is non-deterministic, it might not repeat those actions even with identical input
  • So regenerating the request might not be enough
Rollback can leave inconsistency!
• In this example, we see that checkpoints must somehow be coordinated with communication
• If we allow programs to communicate and don’t coordinate checkpoints with message passing, the system state becomes inconsistent even if the individual processes are otherwise healthy
More problems with checkpoints
[diagram: P sends a request to Q; Q sends a reply]
• P crashes and rolls back
• Will P “reissue” the same request? Recall our non-determinism assumption: it might not!
Solution?
• One idea: if a process rolls back, roll others back to a consistent state
  • If a message was sent after the checkpoint, roll the receiver back to a state before that message was received
  • If a message was received after the checkpoint, roll the sender back to a state prior to sending it
• Assumes channels will be “empty” after doing this
Solution?
[diagram: Q rolled back to a state before the request was received and before the reply was sent]
• Q crashes and rolls back
• P must also roll back
• Now it won’t upset us if P happens not to resend the same request
Implementation
• Implementing independent checkpointing requires that dependencies be recorded so processes can jointly roll back to a consistent global state
• Let CPi(m) be the m-th checkpoint taken by process Pi, and let INTi(m) denote the interval between CPi(m-1) and CPi(m)
• When Pi sends a message in interval INTi(m):
  • Pi attaches to it the pair (i, m)
• When Pj receives a message with attachment (i, m) in interval INTj(n):
  • Pj records the dependency INTi(m) → INTj(n)
  • When Pj takes checkpoint CPj(n), it logs this dependency as well
• When Pi rolls back to checkpoint CPi(m-1):
  • We need to ensure that all processes that received messages sent by Pi in interval INTi(m) are rolled back to a checkpoint preceding the receipt of those messages
  • So Pj will have to roll back to at least checkpoint CPj(n-1)
  • Further rolling back may be necessary… (see the sketch below)
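A minimal Python sketch of this dependency tracking (all names illustrative; a real implementation would piggyback the tags on actual messages and persist the logs): each process tags outgoing messages with its current interval, records incoming tags, and a rollback of the failed process past interval m forces dependent processes back before the receipt of those messages.

```python
from collections import defaultdict

class Process:
    """Toy participant in interval-based dependency tracking."""

    def __init__(self, pid):
        self.pid = pid
        self.interval = 1                 # current interval INT(pid, interval)
        self.deps = defaultdict(set)      # interval n -> {(sender, m), ...}

    def send(self):
        # Attach the pair (i, m) to the outgoing message.
        return (self.pid, self.interval)

    def receive(self, tag):
        # Record the dependency INT_sender(m) -> INT_self(n).
        self.deps[self.interval].add(tag)

    def take_checkpoint(self):
        # CP(n) logs the dependencies of INT(n); a new interval begins.
        self.interval += 1

def rollback_targets(processes, failed_pid, restored_interval):
    """One step of the (potentially cascading) rollback computation:
    which checkpoint each other process must roll back to, at most."""
    targets = {}
    for p in processes:
        if p.pid == failed_pid:
            continue
        for n, tags in p.deps.items():
            # INT_p(n) saw a message from a discarded interval of the failed process
            if any(s == failed_pid and m > restored_interval for s, m in tags):
                targets[p.pid] = min(targets.get(p.pid, n - 1), n - 1)
    return targets
```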
Problems with checkpoints
[diagram: successive rollbacks of P and Q]
• But now we can get a cascade effect:
  • Q crashes and restarts from its checkpoint…
  • forcing P to roll back for consistency…
  • a new inconsistency forces Q to roll back even further…
  • and then a new inconsistency forces P to roll back even further
This is a “cascaded” rollback
• Also called the “domino effect”
• It arises when the creation of checkpoints is uncoordinated w.r.t. communication
• It can force a system to roll back to its initial state
  • Clearly undesirable in the extreme case…
• It could be avoided in our example if we had a log for the channel from P to Q
Sometimes an action is “external” to the system, and we can’t roll back
• Suppose that P is an ATM machine
  • P asks: “Can I give Ken $100?”
  • Q debits the account and says “OK”
  • P gives out the money
• We can’t roll P back in this case, since the money is already gone
Bigger issue is non-determinism
• P’s actions could be tied to something random
  • For example, perhaps a timeout caused P to send this message
• After rollback, these non-deterministic events might occur in some other order
• Results in different behavior, like not sending that same request… yet Q saw it, acted on it, and even replied!
Issue has two sides
• One involves reconstructing P’s message to Q in our examples
  • We don’t want P to roll back, since it might not send the same message
  • But if we had a log with P’s message in it we would be fine: we could just replay it
• The other is that Q might not send the same response (non-determinism)
  • If Q did send a response and doesn’t send the identical one again, we must roll P back
Options?
• One idea is to coordinate the creation of checkpoints and the logging of messages
• In effect, find a point at which we can pause the system
  • All processes make a checkpoint in a coordinated way: the consistent snapshot (seen that, done that; sketched below)
  • Then resume
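As a refresher, a compact Python sketch of marker-based coordinated snapshotting in the Chandy-Lamport style (simplified and illustrative: channels are modeled as plain lists, and all names are ours, not from any library):

```python
class SnapshotProcess:
    """Simplified marker-based (Chandy-Lamport-style) snapshot participant."""

    MARKER = "MARKER"

    def __init__(self, pid, incoming, outgoing):
        self.pid = pid
        self.state = 0                          # application state: a counter
        self.recorded_state = None
        self.outgoing = outgoing                # list of outgoing channel queues
        self.recording = {c: False for c in incoming}
        self.channel_snapshot = {c: [] for c in incoming}

    def start_snapshot(self):
        # Record local state, then send a marker on every outgoing channel.
        self.recorded_state = self.state
        for c in self.recording:
            self.recording[c] = True
        for q in self.outgoing:
            q.append(self.MARKER)

    def on_message(self, channel, msg):
        if msg == self.MARKER:
            if self.recorded_state is None:     # first marker seen: record now
                self.start_snapshot()
            self.recording[channel] = False     # this channel's snapshot is done
        else:
            self.state += msg                   # normal application processing
            if self.recording[channel]:
                # In-flight message: belongs to the channel's recorded state.
                self.channel_snapshot[channel].append(msg)
```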
Why isn’t this common?
• Often we can’t control processes we didn’t code ourselves
• Most systems have many black-box components
  • We can’t expect them to implement the checkpoint/rollback policy
• Hence it isn’t really practical to do coordinated checkpointing if it includes such system components
Why isn’t this common?
• A further concern: not every process can make a checkpoint “on request”
  • It might be in the middle of a costly computation that left big data structures around
  • Or it might adopt the policy “I won’t do checkpoints while I’m waiting for responses from black-box components”
• This interferes with coordination protocols
Implications?
• Ensure that devices, timers, etc. can behave identically if we roll a process back and then restart it
• Knowing that programs will re-do identical actions eliminates the need to cascade rollbacks
Implications?
• We must also cope with thread preemption
  • It occurs when we use lightweight threads, as in Java or C#
  • The thread scheduler might context-switch at times determined by when an interrupt happens
• We must force the same behavior again later, when restarting, or the program could behave differently
Determinism
• Despite these issues, we often see mechanisms that assume determinism
• Basically they are saying:
  • Either don’t use threads, timers, I/O from multiple incoming channels, shared memory, etc.
  • Or use a “determinism-forcing mechanism” (one simple form is sketched below)
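One simple determinism-forcing mechanism is record/replay of nondeterministic inputs: log every nondeterministic value (clock readings, random draws) the first time around, and feed the log back on recovery. A hedged Python sketch with illustrative names:

```python
import random
import time

class RecordReplay:
    """Force determinism by logging nondeterministic values and replaying them."""

    def __init__(self, log=None):
        self.replaying = log is not None
        self.log = list(log) if log else []
        self._cursor = 0

    def _intercept(self, value):
        if self.replaying:
            value = self.log[self._cursor]   # replay the previously logged value
            self._cursor += 1
        else:
            self.log.append(value)           # record it for a possible replay
        return value

    def now(self):
        return self._intercept(time.time())

    def rand(self):
        return self._intercept(random.random())

# First run records; a re-run after rollback replays the identical sequence.
first = RecordReplay()
observed = [first.rand(), first.now(), first.rand()]
rerun = RecordReplay(log=first.log)
assert [rerun.rand(), rerun.now(), rerun.rand()] == observed
```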
With determinism…
• We can revisit the checkpoint/rollback problem and do much better
  • It eliminates the need for cascaded rollbacks
  • But we do need a way to replay the identical inputs that were received after the checkpoint was made
• This forces us to think about keeping logs of the channels between processes
Two popular options
• Receiver-based logging
  • Log received messages; like an “extension” of the checkpoint
• Sender-based logging
  • Log messages when you send them, which ensures you can resend them if needed (sketched below)
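A minimal Python sketch of sender-based logging (illustrative names; `channel` is assumed to be any object with a `deliver` method): the sender keeps every message sent since the receiver's last checkpoint, so a recovering receiver can ask for a replay starting from the last sequence number it checkpointed.

```python
class LoggingSender:
    """Sender-based message logging: keep sent messages so they can be resent."""

    def __init__(self):
        self.log = []     # messages sent since the receiver's last checkpoint
        self.seq = 0      # sequence number attached to each message

    def send(self, channel, payload):
        self.seq += 1
        msg = (self.seq, payload)
        self.log.append(msg)              # log before handing to the channel
        channel.deliver(msg)

    def replay_after(self, seq):
        # The recovering receiver asks for everything after its checkpoint.
        return [m for m in self.log if m[0] > seq]

    def truncate(self, seq):
        # Receiver checkpointed through seq: older entries are no longer needed.
        self.log = [m for m in self.log if m[0] > seq]
```

On recovery, the receiver restores its checkpoint and replays `replay_after(last_seq)`; `truncate` keeps the sender's log bounded as receiver checkpoints advance.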
Why do these work?
• Recall the reason for cascaded rollback
• A cascade occurs if Q received a message, then rolls back to “before” that happened, and the message cannot be reproduced
• With a log, Q can regenerate the input and re-read the message, so nobody else needs to roll back
With these varied options
• When Q rolls back, we can re-run Q with identical inputs if:
  • Q is deterministic, or
  • nobody saw messages from Q after the checkpointed state was recorded, or
  • we roll back the receivers of those messages
• An issue: deterministic programs often crash in the identical way if we force identical execution
• But here we have the flexibility to either force identical executions or do a coordinated rollback
Google Spanner
• Scalable, globally-distributed, multi-versioned database
• Main features:
  • Focus on cross-datacenter data replication
    • for availability and geographical locality
  • Automatic sharding and shard migration
    • for load balancing and failure tolerance
  • Scales to millions of servers across hundreds of datacenters
    • and to database tables with trillions of rows
  • Schematized, semi-relational (tabular) data model
    • to handle more structured data (than Bigtable, say)
  • Strong replica consistency model
    • synchronous replication
Google Spanner
• Scalable globally-distributed database
• Follow-up to Google’s Bigtable and Megastore
• Detailed DB features:
  • SQL-like query interface
    • to support the schematized, semi-relational (tabular) data model
  • General-purpose distributed ACID transactions
    • even across distant datacenters
  • Externally (strongly) consistent global write transactions with synchronous replication
  • Lock-free read-only transactions
  • Timestamped multiple versions of data (sketched below)
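The last two features go together: because each write is stamped with a commit timestamp and old versions are kept, a read-only transaction can read at a snapshot timestamp without taking locks. A minimal sketch of the idea (a toy single-node store, not Spanner's actual storage code):

```python
class MultiVersionStore:
    """Toy timestamped multi-version map: snapshot reads need no locks."""

    def __init__(self):
        self.versions = {}   # key -> list of (commit_ts, value)

    def write(self, key, value, commit_ts):
        self.versions.setdefault(key, []).append((commit_ts, value))

    def read(self, key, snapshot_ts):
        # Return the newest version with commit_ts <= snapshot_ts.
        best = None
        for ts, value in self.versions.get(key, []):
            if ts <= snapshot_ts and (best is None or ts > best[0]):
                best = (ts, value)
        return best[1] if best else None

# A read at timestamp 5 sees the value committed at 3, not the one at 7.
store = MultiVersionStore()
store.write("x", "old", commit_ts=3)
store.write("x", "new", commit_ts=7)
assert store.read("x", snapshot_ts=5) == "old"
```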
Google Spanner
• Scalable globally-distributed database
• Follow-up to Google’s Bigtable and Megastore
• Detailed DS features:
  • Auto-sharding, auto-rebalancing, automatic failure response
  • Replication and an external (strong) consistency model
  • App/user control of data replication and placement (illustrated below)
    • number of replicas and replica locations (datacenters)
    • how far the closest replica can be (to control read latency)
    • how distant replicas are from each other (to control write latency)
  • Wide-area system
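Purely as an illustration of these knobs (hypothetical structure and names; Spanner actually exposes this through a small set of named replication configurations that applications pick for their data):

```python
# Hypothetical placement policy showing the control surface described above.
placement_policy = {
    "num_replicas": 5,                       # more replicas: higher availability
    "replica_locations": ["us-east", "us-central", "us-west",
                          "eu-west", "asia-east"],
    "max_read_distance": "same_continent",   # closest-replica bound -> read latency
    "max_replica_spread": "global",          # replica-to-replica distance -> write latency
}
```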
Google Spanner
• Scalable globally-distributed database
• Follow-up to Google’s Bigtable and Megastore
• Key implementation design choices:
  • Integration of concurrency control, replication, and 2PC
  • Transaction serialization via global, wall-clock timestamps
    • using the TrueTime API (sketched below)
  • The TrueTime API uses GPS devices and atomic clocks to get accurate time
    • it acknowledges clock uncertainty and guarantees a bound on it
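A hedged Python sketch of the TrueTime idea (the epsilon value and helper names are illustrative; the published interface exposes TT.now() as an interval plus TT.after()/TT.before()): now() returns an (earliest, latest) interval guaranteed to contain the true time, and commit-wait delays until a chosen timestamp is definitely in the past, which is what makes timestamp order match real-time order.

```python
import time

class TrueTimeSketch:
    """TrueTime-style clock: now() returns an interval containing true time."""

    def __init__(self, epsilon=0.007):       # stand-in uncertainty bound (~7 ms)
        self.epsilon = epsilon

    def now(self):
        t = time.time()
        return (t - self.epsilon, t + self.epsilon)   # (earliest, latest)

    def after(self, t):
        # True only once t is guaranteed to be in the past.
        earliest, _ = self.now()
        return earliest > t

def commit_wait(tt, commit_ts):
    """Wait out the uncertainty so commit_ts is definitely past before the
    transaction's effects become visible."""
    while not tt.after(commit_ts):
        time.sleep(0.001)

# Usage: take the latest bound as the commit timestamp, then wait it out.
tt = TrueTimeSketch()
_, commit_ts = tt.now()
commit_wait(tt, commit_ts)
```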
Google Spanner
• Scalable globally-distributed database
• Follow-up to Google’s Bigtable and Megastore
• Production use:
  • Rolled out in Fall 2012
  • Used by F1, Google’s advertising backend
    • replaced a sharded MySQL database
    • 5 replicas across the US
    • a less critical app might need only 3 replicas in a single region, which would decrease latency (but also availability)
  • Future use: Gmail, Picasa, Calendar, Android Market, AppEngine, etc.
Source: spanner-osdi2012.pptx