Taking Transactions Mainstream: Social Failure Modes and Recovery Don Box, Microsoft Corp
Disclaimers • This talk is mostly looking backwards • Future is way less clear than the past • Lots of interesting data to be harvested • STM may look more like the past than we think • Microsoft stack used in examples • It’s the one I know best • I’ll qualify where things are different • When in doubt, assume we have it worse
Transactional Mechanics 101 • Three Party System • Application – establishes transaction boundaries and initiates work • Resource Manager (RM) – performs transactional work on behalf of application • Transaction Manager (TM) – coordinates outcome with application and resource managers • Resource manager may be durable (survives crashes) or volatile (doesn’t)
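To make the three-party split concrete, here is a minimal System.Transactions sketch of a volatile RM enlisting with the TM; the CounterResource class and its fields are made up for illustration.

using System.Transactions;

class CounterResource : IEnlistmentNotification
{
    private int value, pending;

    public void Increment()
    {
        // Assumes the application has already established an ambient transaction
        pending = value + 1;
        Transaction.Current.EnlistVolatile(this, EnlistmentOptions.None);  // RM joins the TM's transaction
    }

    // The TM drives the outcome through these callbacks
    public void Prepare(PreparingEnlistment e) { e.Prepared(); }
    public void Commit(Enlistment e) { value = pending; e.Done(); }
    public void Rollback(Enlistment e) { pending = value; e.Done(); }
    public void InDoubt(Enlistment e) { e.Done(); }
}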
Isolation and the Three Parties • The Application specifies the desired isolation level when it asks the TM for new transaction • RMs discover isolation level from transaction at enlistment-time • RMs implement isolation however they see fit • May use two-phase locking • May use multi-versioning • May provide RM-specific overrides
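A minimal sketch of the first two bullets using System.Transactions; the serializable level is just an example choice.

var options = new TransactionOptions { IsolationLevel = IsolationLevel.Serializable };
using (var scope = new TransactionScope(TransactionScopeOption.Required, options))
{
    // RMs that enlist inside this scope read Transaction.Current.IsolationLevel
    // and implement it however they see fit (2PL, multi-versioning, ...)
    scope.Complete();
}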
Three Parties and Performance • In the limit, every party is on a distinct box • Lots of marshaling and x-host communication • If there’s only one RM, no 2PC needed • TM delegates commit to RM • If all resources are volatile, TM needn’t log • If all resources are volatile and in-proc, 2PC reduces to virtcalls++
Transactional Mechanics 102 • Transactions protect operations on a managed resource • Some resource managers dynamically enlist with the transaction at the time of invocation • Op1(tx, args) • Op2(tx, args) • Some resource managers statically enlist a session with the transaction • Bind(conn, tx) • Op1(conn, args) • Op2(conn, args)
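As a rough mapping onto the .NET stack (the connection string and SQL text are placeholders, and the ambient Transaction.Current stands in for the explicit tx argument):

// Static enlistment: Bind(conn, tx), then Op1(conn, args), Op2(conn, args)
// (requires System.Transactions and System.Data.SqlClient)
using (SqlConnection conn = new SqlConnection(connectionString))
{
    conn.Open();
    conn.EnlistTransaction(Transaction.Current);        // Bind(conn, tx); ADO.NET usually does this automatically at Open time
    new SqlCommand("SELECT 1", conn).ExecuteScalar();   // Op1(conn, args)
    new SqlCommand("SELECT 2", conn).ExecuteScalar();   // Op2(conn, args)
}
// Dynamic enlistment is the shape in the sketch after "Transactional Mechanics 101":
// the RM calls Transaction.Current.EnlistVolatile(...) inside each operation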
Static vs. Dynamic Enlistment • Dynamic enlistment is what people expect • Transactions are temporal phenomena • Transactions “flow” with thread of control • Static enlistment was a performance optimization • Most early RMs were x-proc/x-host • Marshaling transactions non-trivial due to logging • Matters get worse when transactions are passed implicitly • Does op use cached tx, current tx, or no tx? • State of the practice is RTFM
The App Server Era • 1996-2006+: App Server era • OLTP + Distributed Objects + the Web • Microsoft: MTS + DCOM + IIS/ASP • Sun et al: EJB + RMI + Servlets/JSP • Emblematic features • Declarative Transactions • Managed execution/deployment environment • N-tier design style (transaction composition)
Declarative Transactions • Captured transactional requirements as data • The “system” guaranteed that your code ran with an implicit transaction (typically in TLS) • Your implicit transaction propagated to other chunks of code you called by default • Your code gets to influence transaction outcome via implicit context
Declarative Transactions in MTS

int MyMethod(string name)
{
    GetObjectContext().SetAbort();
    int id = CreateUser(name);
    AuditChange(id, name + " was added");
    GetObjectContext().SetComplete();
    return id;
}

Class MyClass
Transaction=Required
Trouble in Paradise? • Previous example shows the ideal • Declaration of intent visible to system • Work composes without manual transaction enlistment or propagation • Most important: Code knows it's transactional • Potential Problem #1: Transaction Extraction • Potential Problem #2: Transaction Injection
Problem: Transaction Extraction • Developer writes code with expectations of atomicity • Declaration captures this expectation • At least one system allows admin to change declaration without developer consent • It took us years to fix this • ASP got it right from day one
Avoiding Transaction Extraction • Obvious Solution: Embed it with (or in) code

int MyMethod(string name)
{
    int id = 0;
    using (TransactionScope scope = new TransactionScope())
    {
        id = CreateUser(name);
        AuditChange(id, name + " was added");
        scope.Complete();
    }
    return id;
}

[OperationBehavior(TransactionScopeRequired=true)]
int MyMethod(string name)
{
    int id = CreateUser(name);
    AuditChange(id, name + " was added");
    return id;
}
Problem: Transaction Injection • Transaction Injection problem way more insidious (and harder to fix) • Transaction-ignorant code tends to: • Interact with non-transactional resources • Interact with transactional resources • Cache connections to static-binding RMs • Interact with user
Avoiding Transaction Injection • If you know about transactions, you can turn them off (or at least suppress propagation)

int MyMethod(string name)
{
    int id = 0;
    using (TransactionScope scope = new TransactionScope(TransactionScopeOption.RequiresNew))
    {
        id = CreateUser(name);
        AuditChange(id, name + " was added");
        scope.Complete();
    }
    return id;
}
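The "turn them off" case uses Suppress rather than RequiresNew; a minimal variant of the sketch above:

int MyMethod(string name)
{
    int id = 0;
    // No ambient transaction inside this scope: the work is neither protected by
    // nor able to roll back the caller's transaction
    using (TransactionScope scope = new TransactionScope(TransactionScopeOption.Suppress))
    {
        id = CreateUser(name);
        AuditChange(id, name + " was added");
        scope.Complete();
    }
    return id;
}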
The Dangers of Suppressing Propagation • Previous example forced itself into a transaction that was distinct from its caller’s • Two independent outcomes (by design) • These are not nested transactions (by design) • If caller and callee access a common resource, we have an isolation problem • Will likely tank one of the transactions due to timeout (deadlock) or validation error
Transactions and Trust • Previous example demonstrated one side of composition problem (callee distrusts caller) • Problem also applies in reverse direction • Passing transaction to callee gives ability to rollback (feature and bug) • Passing transaction to callee gives ability to significantly increase transaction time • The latter flies in the face of TX dogma
Transactions and Time • Transaction isolation discourages long-running transactions • Two-phase locking leads to blocking/starvation • Multi-versioning leads to rollbacks from validation failures • Neither of these is an issue with private resources • Unfortunately, we’re still screwed due to heterogeneity • No central lock manager means deadlock avoidance is done using timeouts
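In System.Transactions, that timeout is set per transaction; the 30-second value here is just an illustration.

var options = new TransactionOptions
{
    IsolationLevel = IsolationLevel.ReadCommitted,
    Timeout = TimeSpan.FromSeconds(30)   // TM aborts the transaction if it outlives this
};
using (var scope = new TransactionScope(TransactionScopeOption.Required, options))
{
    // Long-running work hits this timeout instead of a centrally detected deadlock
    scope.Complete();
}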
Core Problem: Composition • Most problems shown are a result of composing code (including RMs) in a single transaction • Here are a few other problems to address: • Mixing isolation levels • Mixing 2PL and multiversion RMs • Mixing capabilities (nesting, chaining) • Shared data and quiescence • Mixing enlisted and non-enlisted work • Non-Idempotency • Declarative notations for all of the above
Life After A Transaction • Getting things right within a single transaction is the simple problem • Things get much dicier after the transaction is complete • What if the transaction returned a result? • What if the transaction’s success implies more work needs to be done?
Transactions and Results • Care is needed when RM state leaks out of a transaction • All locks have been released • Results are potentially stale • MTS/COM+ took draconian measures to make it hard to retain results across transactions • Enter the “stateless object” • Common practice is to reassert assumptions in subsequent transactions • Most data access stacks provide the basics for free
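One common way to "reassert assumptions" is an optimistic concurrency check in the follow-up transaction; this sketch uses plain ADO.NET, and the table, columns, and variables are invented for illustration.

var cmd = new SqlCommand(
    "UPDATE Users SET Name = @name " +
    "WHERE Id = @id AND RowVersion = @rowVersionFromFirstTx", conn);
cmd.Parameters.AddWithValue("@name", newName);
cmd.Parameters.AddWithValue("@id", id);
cmd.Parameters.AddWithValue("@rowVersionFromFirstTx", rowVersionFromFirstTx);
if (cmd.ExecuteNonQuery() == 0)
{
    // The result we carried across transactions went stale; someone changed the row
    throw new DBConcurrencyException("row changed since it was read");
}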
Transactions and Future Work • Ensuring future work requires the ability to trigger work on successful TX outcome • E.g., do next step IFF this tx commits • Classically done with transacted message queues (JMS, MSMQ, MQ) • Control flow state implicit in queue state • Increasingly done using “process” engines on top of TP system • Control flow state explicit in persisted process • Huge growth area – likely successor to App Server
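A sketch of the queue-based flavor using MSMQ via System.Messaging; the queue path and the DoThisStep helper are invented for illustration.

using (var scope = new TransactionScope())
{
    DoThisStep();                                          // enlisted RM work for this step
    var queue = new MessageQueue(@".\private$\nextStep");  // must be a transactional queue
    queue.Send("do next step", MessageQueueTransactionType.Automatic);
    scope.Complete();
}
// The message becomes visible to the next-step processor only if this transaction commits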
So, Where are we? • Implicit transactions are a hit despite the difficulties • SQL 2K5 + System.Transactions nailed sweet spot • STM is a logical progression in this trend • Composition and non-TX operations still hard • As an industry, we’re way less baked in the cross-transaction problem space • Lots of ad hoc machinery being built • This is where the enterpri$e is hurting • The web is hurting too