CSC 536 Lecture 10
Outline
• Recovery
• Case study: Google Spanner
Recovery
• Error recovery: replace a present erroneous state with an error-free state
• Backward recovery: bring the system into a previously correct state
  • Requires recording the system's state from time to time (checkpoints)
  • Example: retransmit a message
• Forward recovery: bring the system to a correct new state from which it can continue to execute
  • Only works with known errors
  • Example: erasure correction (sketched below)
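To make the forward-recovery example concrete, here is a minimal Python sketch of XOR-based erasure correction (illustrative only, not from the lecture): one parity block lets us rebuild any single lost data block, moving forward to a correct state instead of rolling back.

```python
def make_parity(blocks):
    """XOR all data blocks into one parity block (all blocks equal length)."""
    parity = bytearray(len(blocks[0]))
    for block in blocks:
        for i, b in enumerate(block):
            parity[i] ^= b
    return bytes(parity)

def recover(blocks, parity, lost_index):
    """Forward recovery: rebuild the single lost block from the survivors."""
    rebuilt = bytearray(parity)
    for j, block in enumerate(blocks):
        if j == lost_index:
            continue
        for i, b in enumerate(block):
            rebuilt[i] ^= b
    return bytes(rebuilt)

# Usage: lose block 1, then reconstruct it from the others plus parity.
blocks = [b"aaaa", b"bbbb", b"cccc"]
parity = make_parity(blocks)
assert recover(blocks, parity, 1) == b"bbbb"
```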
Backward recovery
• Backward recovery is typically used
  • It is more general
• However:
  • Recovery is expensive
  • Sometimes we can't go back (e.g., a file has been deleted)
  • Checkpoints are expensive
• Solution for the last point: message logging
  • Sender-based
  • Receiver-based
Checkpoints: Common approach
• Periodically make a “big” checkpoint
• Then, more frequently, make incremental additions to it
• For example, the checkpoint could be copies of some files or of a database
• Looking ahead, the incremental data could be the “operations” run on the database since the last transaction finished (committed); see the sketch below
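A minimal Python sketch of this approach (all names illustrative, not from the lecture): a full checkpoint plus an incremental operation log, where recovery restores the last checkpoint and replays the logged operations.

```python
import json, os

class CheckpointedStore:
    """Toy key-value store: periodic full checkpoint plus an incremental op log."""

    def __init__(self, checkpoint_path="checkpoint.json", log_path="ops.log"):
        self.checkpoint_path = checkpoint_path
        self.log_path = log_path
        self.state = {}

    def apply(self, key, value):
        # Log the operation first, then apply it to the in-memory state.
        with open(self.log_path, "a") as log:
            log.write(json.dumps({"key": key, "value": value}) + "\n")
        self.state[key] = value

    def checkpoint(self):
        # Take the "big" checkpoint, then truncate the incremental log.
        with open(self.checkpoint_path, "w") as f:
            json.dump(self.state, f)
        open(self.log_path, "w").close()

    def recover(self):
        # Restore the last checkpoint, then replay operations logged since.
        if os.path.exists(self.checkpoint_path):
            with open(self.checkpoint_path) as f:
                self.state = json.load(f)
        if os.path.exists(self.log_path):
            with open(self.log_path) as f:
                for line in f:
                    op = json.loads(line)
                    self.state[op["key"]] = op["value"]
```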
Problems with checkpoints
[diagram: P sends a request to Q; Q sends a reply]
• P and Q are interacting, and each makes independent checkpoints now and then
• Q crashes and rolls back to its checkpoint
• It will have “forgotten” the message from P…
• …yet Q may even have replied
• Who would care? Suppose the reply was “OK to release the cash. Account has been debited”
Two related concerns
• First, Q needs to see that request again, so that it will reenter the state in which it sent the reply
  • We need to regenerate the input request
• But if Q is non-deterministic, it might not repeat those actions even with identical input
  • So regenerating the request might not be enough
Rollback can leave inconsistency!
• In this example, we see that checkpoints must somehow be coordinated with communication
• If we allow programs to communicate and don’t coordinate checkpoints with message passing, the system state becomes inconsistent even if the individual processes are otherwise healthy
More problems with checkpoints
[diagram: P sends a request to Q; Q sends a reply]
• P crashes and rolls back
• Will P “reissue” the same request? Recall our non-determinism assumption: it might not!
Solution?
• One idea: if a process rolls back, roll others back to a consistent state
  • If a message was sent after the checkpoint, roll the receiver back to a state before that message was received
  • If a message was received after the checkpoint, roll the sender back to a state prior to sending it
• Assumes channels will be “empty” after doing this
Solution?
[diagram: Q rolled back to a state before the request was received and before the reply was sent]
• Q crashes and rolls back
• P must also roll back
• Now it won’t upset us if P happens not to resend the same request
Implementation
• Implementing independent checkpointing requires that dependencies be recorded so processes can jointly roll back to a consistent global state
• Let CPi(m) be the m-th checkpoint taken by process Pi, and let INTi(m) denote the interval between CPi(m-1) and CPi(m)
• When Pi sends a message in interval INTi(m):
  • Pi attaches to it the pair (i, m)
• When Pj receives a message with attachment (i, m) in interval INTj(n):
  • Pj records the dependency INTi(m) → INTj(n)
  • When Pj takes checkpoint CPj(n), it logs this dependency as well
• When Pi rolls back to checkpoint CPi(m-1):
  • We need to ensure that all processes that received messages sent by Pi in interval INTi(m) are rolled back to a checkpoint preceding the receipt of those messages
  • So Pj will have to roll back to at least checkpoint CPj(n-1)
  • Further rolling back may be necessary… (see the sketch below)
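A minimal Python sketch of this dependency tracking (all names illustrative; a real implementation would piggyback the tags on actual messages and persist the logs): each process tags outgoing messages with its current interval, records incoming tags, and a rollback of the failed process past interval m forces dependent processes back before the receipt of those messages.

```python
from collections import defaultdict

class Process:
    """Toy participant in interval-based dependency tracking."""

    def __init__(self, pid):
        self.pid = pid
        self.interval = 1                 # current interval INT(pid, interval)
        self.deps = defaultdict(set)      # interval n -> {(sender, m), ...}

    def send(self):
        # Attach the pair (i, m) to the outgoing message.
        return (self.pid, self.interval)

    def receive(self, tag):
        # Record the dependency INT_sender(m) -> INT_self(n).
        self.deps[self.interval].add(tag)

    def take_checkpoint(self):
        # CP(n) logs the dependencies of INT(n); a new interval begins.
        self.interval += 1

def rollback_targets(processes, failed_pid, restored_interval):
    """One step of the (potentially cascading) rollback computation:
    which checkpoint each other process must roll back to, at most."""
    targets = {}
    for p in processes:
        if p.pid == failed_pid:
            continue
        for n, tags in p.deps.items():
            # INT_p(n) saw a message from a discarded interval of the failed process
            if any(s == failed_pid and m > restored_interval for s, m in tags):
                targets[p.pid] = min(targets.get(p.pid, n - 1), n - 1)
    return targets
```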
Problems with checkpoints
[diagram: successive rollbacks of P and Q]
• But now we can get a cascade effect:
  • Q crashes and restarts from its checkpoint…
  • forcing P to roll back for consistency…
  • a new inconsistency forces Q to roll back even further…
  • and then a new inconsistency forces P to roll back even further
This is a “cascaded” rollback
• Also called the “domino effect”
• It arises when the creation of checkpoints is uncoordinated w.r.t. communication
• It can force a system to roll back to its initial state
  • Clearly undesirable in the extreme case…
• It could be avoided in our example if we had a log for the channel from P to Q
Sometimes an action is “external” to the system, and we can’t roll back
• Suppose that P is an ATM machine
  • P asks: “Can I give Ken $100?”
  • Q debits the account and says “OK”
  • P gives out the money
• We can’t roll P back in this case, since the money is already gone
Bigger issue is non-determinism
• P’s actions could be tied to something random
  • For example, perhaps a timeout caused P to send this message
• After rollback, these non-deterministic events might occur in some other order
• Results in different behavior, like not sending that same request… yet Q saw it, acted on it, and even replied!
Issue has two sides
• One involves reconstructing P’s message to Q in our examples
  • We don’t want P to roll back, since it might not send the same message
  • But if we had a log with P’s message in it we would be fine: we could just replay it
• The other is that Q might not send the same response (non-determinism)
  • If Q did send a response and doesn’t send the identical one again, we must roll P back
Options?
• One idea is to coordinate the creation of checkpoints and the logging of messages
• In effect, find a point at which we can pause the system
  • All processes make a checkpoint in a coordinated way: the consistent snapshot (seen that, done that; sketched below)
  • Then resume
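As a refresher, a compact Python sketch of marker-based coordinated snapshotting in the Chandy-Lamport style (simplified and illustrative: channels are modeled as plain lists, and all names are ours, not from any library):

```python
class SnapshotProcess:
    """Simplified marker-based (Chandy-Lamport-style) snapshot participant."""

    MARKER = "MARKER"

    def __init__(self, pid, incoming, outgoing):
        self.pid = pid
        self.state = 0                          # application state: a counter
        self.recorded_state = None
        self.outgoing = outgoing                # list of outgoing channel queues
        self.recording = {c: False for c in incoming}
        self.channel_snapshot = {c: [] for c in incoming}

    def start_snapshot(self):
        # Record local state, then send a marker on every outgoing channel.
        self.recorded_state = self.state
        for c in self.recording:
            self.recording[c] = True
        for q in self.outgoing:
            q.append(self.MARKER)

    def on_message(self, channel, msg):
        if msg == self.MARKER:
            if self.recorded_state is None:     # first marker seen: record now
                self.start_snapshot()
            self.recording[channel] = False     # this channel's snapshot is done
        else:
            self.state += msg                   # normal application processing
            if self.recording[channel]:
                # In-flight message: belongs to the channel's recorded state.
                self.channel_snapshot[channel].append(msg)
```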
Why isn’t this common?
• Often we can’t control processes we didn’t code ourselves
• Most systems have many black-box components
  • We can’t expect them to implement the checkpoint/rollback policy
• Hence it isn’t really practical to do coordinated checkpointing if it includes such system components
Why isn’t this common?
• A further concern: not every process can make a checkpoint “on request”
  • It might be in the middle of a costly computation that left big data structures around
  • Or it might adopt the policy “I won’t do checkpoints while I’m waiting for responses from black-box components”
• This interferes with coordination protocols
Implications?
• Ensure that devices, timers, etc. can behave identically if we roll a process back and then restart it
• Knowing that programs will re-do identical actions eliminates the need to cascade rollbacks
Implications?
• We must also cope with thread preemption
  • It occurs when we use lightweight threads, as in Java or C#
  • The thread scheduler might context-switch at times determined by when an interrupt happens
• We must force the same behavior again later, when restarting, or the program could behave differently
Determinism
• Despite these issues, we often see mechanisms that assume determinism
• Basically they are saying:
  • Either don’t use threads, timers, I/O from multiple incoming channels, shared memory, etc.
  • Or use a “determinism-forcing mechanism” (one simple form is sketched below)
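One simple determinism-forcing mechanism is record/replay of nondeterministic inputs: log every nondeterministic value (clock readings, random draws) the first time around, and feed the log back on recovery. A hedged Python sketch with illustrative names:

```python
import random
import time

class RecordReplay:
    """Force determinism by logging nondeterministic values and replaying them."""

    def __init__(self, log=None):
        self.replaying = log is not None
        self.log = list(log) if log else []
        self._cursor = 0

    def _intercept(self, value):
        if self.replaying:
            value = self.log[self._cursor]   # replay the previously logged value
            self._cursor += 1
        else:
            self.log.append(value)           # record it for a possible replay
        return value

    def now(self):
        return self._intercept(time.time())

    def rand(self):
        return self._intercept(random.random())

# First run records; a re-run after rollback replays the identical sequence.
first = RecordReplay()
observed = [first.rand(), first.now(), first.rand()]
rerun = RecordReplay(log=first.log)
assert [rerun.rand(), rerun.now(), rerun.rand()] == observed
```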
With determinism…
• We can revisit the checkpoint/rollback problem and do much better
  • It eliminates the need for cascaded rollbacks
  • But we do need a way to replay the identical inputs that were received after the checkpoint was made
• This forces us to think about keeping logs of the channels between processes
Two popular options
• Receiver-based logging
  • Log received messages; like an “extension” of the checkpoint
• Sender-based logging
  • Log messages when you send them, which ensures you can resend them if needed (sketched below)
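A minimal Python sketch of sender-based logging (illustrative names; `channel` is assumed to be any object with a `deliver` method): the sender keeps every message sent since the receiver's last checkpoint, so a recovering receiver can ask for a replay starting from the last sequence number it checkpointed.

```python
class LoggingSender:
    """Sender-based message logging: keep sent messages so they can be resent."""

    def __init__(self):
        self.log = []     # messages sent since the receiver's last checkpoint
        self.seq = 0      # sequence number attached to each message

    def send(self, channel, payload):
        self.seq += 1
        msg = (self.seq, payload)
        self.log.append(msg)              # log before handing to the channel
        channel.deliver(msg)

    def replay_after(self, seq):
        # The recovering receiver asks for everything after its checkpoint.
        return [m for m in self.log if m[0] > seq]

    def truncate(self, seq):
        # Receiver checkpointed through seq: older entries are no longer needed.
        self.log = [m for m in self.log if m[0] > seq]
```

On recovery, the receiver restores its checkpoint and replays `replay_after(last_seq)`; `truncate` keeps the sender's log bounded as receiver checkpoints advance.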
Why do these work?
• Recall the reason for cascaded rollback
• A cascade occurs if Q received a message, then rolls back to “before” that happened, and the message cannot be reproduced
• With a log, Q can regenerate the input and re-read the message, so nobody else needs to roll back
With these varied options
• When Q rolls back, we can re-run Q with identical inputs if:
  • Q is deterministic, or
  • nobody saw messages from Q after the checkpointed state was recorded, or
  • we roll back the receivers of those messages
• An issue: deterministic programs often crash in the identical way if we force identical execution
• But here we have the flexibility to either force identical executions or do a coordinated rollback
Google Spanner
• Scalable, globally-distributed, multi-versioned database
• Main features:
  • Focus on cross-datacenter data replication
    • for availability and geographical locality
  • Automatic sharding and shard migration
    • for load balancing and failure tolerance
  • Scales to millions of servers across hundreds of datacenters
    • and to database tables with trillions of rows
  • Schematized, semi-relational (tabular) data model
    • to handle more structured data (than Bigtable, say)
  • Strong replica consistency model
    • synchronous replication
Google Spanner
• Scalable globally-distributed database
• Follow-up to Google’s Bigtable and Megastore
• Detailed DB features:
  • SQL-like query interface
    • to support the schematized, semi-relational (tabular) data model
  • General-purpose distributed ACID transactions
    • even across distant datacenters
  • Externally (strongly) consistent global write transactions with synchronous replication
  • Lock-free read-only transactions
  • Timestamped multiple versions of data (sketched below)
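The last two features go together: because each write is stamped with a commit timestamp and old versions are kept, a read-only transaction can read at a snapshot timestamp without taking locks. A minimal sketch of the idea (a toy single-node store, not Spanner's actual storage code):

```python
class MultiVersionStore:
    """Toy timestamped multi-version map: snapshot reads need no locks."""

    def __init__(self):
        self.versions = {}   # key -> list of (commit_ts, value)

    def write(self, key, value, commit_ts):
        self.versions.setdefault(key, []).append((commit_ts, value))

    def read(self, key, snapshot_ts):
        # Return the newest version with commit_ts <= snapshot_ts.
        best = None
        for ts, value in self.versions.get(key, []):
            if ts <= snapshot_ts and (best is None or ts > best[0]):
                best = (ts, value)
        return best[1] if best else None

# A read at timestamp 5 sees the value committed at 3, not the one at 7.
store = MultiVersionStore()
store.write("x", "old", commit_ts=3)
store.write("x", "new", commit_ts=7)
assert store.read("x", snapshot_ts=5) == "old"
```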
Google Spanner
• Scalable globally-distributed database
• Follow-up to Google’s Bigtable and Megastore
• Detailed DS features:
  • Auto-sharding, auto-rebalancing, automatic failure response
  • Replication and an external (strong) consistency model
  • App/user control of data replication and placement (illustrated below)
    • number of replicas and replica locations (datacenters)
    • how far the closest replica can be (to control read latency)
    • how distant replicas are from each other (to control write latency)
  • Wide-area system
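Purely as an illustration of these knobs (hypothetical structure and names; Spanner actually exposes this through a small set of named replication configurations that applications pick for their data):

```python
# Hypothetical placement policy showing the control surface described above.
placement_policy = {
    "num_replicas": 5,                       # more replicas: higher availability
    "replica_locations": ["us-east", "us-central", "us-west",
                          "eu-west", "asia-east"],
    "max_read_distance": "same_continent",   # closest-replica bound -> read latency
    "max_replica_spread": "global",          # replica-to-replica distance -> write latency
}
```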
Google Spanner
• Scalable globally-distributed database
• Follow-up to Google’s Bigtable and Megastore
• Key implementation design choices:
  • Integration of concurrency control, replication, and 2PC
  • Transaction serialization via global, wall-clock timestamps
    • using the TrueTime API (sketched below)
  • The TrueTime API uses GPS devices and atomic clocks to get accurate time
    • it acknowledges clock uncertainty and guarantees a bound on it
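A hedged Python sketch of the TrueTime idea (the epsilon value and helper names are illustrative; the published interface exposes TT.now() as an interval plus TT.after()/TT.before()): now() returns an (earliest, latest) interval guaranteed to contain the true time, and commit-wait delays until a chosen timestamp is definitely in the past, which is what makes timestamp order match real-time order.

```python
import time

class TrueTimeSketch:
    """TrueTime-style clock: now() returns an interval containing true time."""

    def __init__(self, epsilon=0.007):       # stand-in uncertainty bound (~7 ms)
        self.epsilon = epsilon

    def now(self):
        t = time.time()
        return (t - self.epsilon, t + self.epsilon)   # (earliest, latest)

    def after(self, t):
        # True only once t is guaranteed to be in the past.
        earliest, _ = self.now()
        return earliest > t

def commit_wait(tt, commit_ts):
    """Wait out the uncertainty so commit_ts is definitely past before the
    transaction's effects become visible."""
    while not tt.after(commit_ts):
        time.sleep(0.001)

# Usage: take the latest bound as the commit timestamp, then wait it out.
tt = TrueTimeSketch()
_, commit_ts = tt.now()
commit_wait(tt, commit_ts)
```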
Google Spanner
• Scalable globally-distributed database
• Follow-up to Google’s Bigtable and Megastore
• Production use:
  • Rolled out in Fall 2012
  • Used by F1, Google’s advertising backend
    • replaced a sharded MySQL database
    • 5 replicas across the US
    • a less critical app might need only 3 replicas in a single region, which would decrease latency (but also availability)
  • Future use: Gmail, Picasa, Calendar, Android Market, AppEngine, etc.
Source: spanner-osdi2012.pptx