Recovery Management in Quicksilver

Haskin, Malachi, Sawdon, Chan • IBM Almaden • ACM TOCS (6:1), February 1988

Introduction
• Distributed, extensible system
• Partition computation and data
• “Lean” kernel
• System services are processes
• Message-oriented IPC
• How to deal with more complicated failure modes?
• Provide atomic transactions as a system service

Recovery Techniques
• Timeouts
  • How to distinguish slow from dead?
• Connectionless protocols / stateless servers
  • Some actions can’t be made idempotent
  • Retries can cause problems
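The retry problem can be illustrated with a small sketch. Everything here is hypothetical (the `Account` class, `call_with_retry`, and the lost-reply simulation are not from the paper): the server performs the operation, the reply is lost, and the client retries after a timeout.

```python
class Account:
    """Hypothetical server state used only for illustration."""

    def __init__(self):
        self.balance = 0

    def deposit(self, amount):
        # Not idempotent: every execution changes the result.
        self.balance += amount

    def set_balance(self, amount):
        # Idempotent: repeating the call leaves the same state.
        self.balance = amount


def call_with_retry(op, lost_replies=1):
    """Run op, pretending the first `lost_replies` replies never arrive.

    The server still executes the request each time; only the reply is lost,
    so the client times out and sends the request again.
    """
    attempts = 0
    while True:
        op()  # server-side effect happens on every attempt
        attempts += 1
        if attempts > lost_replies:
            return attempts
```

With one lost reply, a retried `deposit(10)` is applied twice, while a retried `set_balance(10)` leaves the same state as a single call.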

Recovery Techniques
• Virtual circuits
  • Can’t handle multiple servers
• Replication
  • Too expensive for some uses
• How to detect failures?

Transactions
• Basic idea: use transactions as a single, system-wide recovery paradigm
• Transactions are heavyweight
  • Not every server needs them
• Different server classes
  • Volatile (window manager)
  • Replicated + volatile (name server, uses transactions for commit atomicity)
  • Recoverable (file server)
• Long-running transactions need log support

Structure of Transactions
• Everything belongs to a transaction
  • Default transaction ID for processes
• Globally unique transaction identifiers
• Each transaction has an owner and multiple participants
  • Owner can commit or abort
  • Participants can only abort
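The owner/participant rule can be sketched as follows. This is a minimal illustration, not QuickSilver's interface: the `Transaction` class, its method names, and the use of a UUID as a stand-in for a globally unique transaction identifier are all assumptions.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class Transaction:
    """Hypothetical model: one owner, any number of participants."""

    owner: str
    # A UUID stands in for a globally unique transaction identifier.
    tid: str = field(default_factory=lambda: uuid.uuid4().hex)
    participants: set = field(default_factory=set)
    state: str = "active"

    def join(self, process: str) -> None:
        self.participants.add(process)

    def abort(self, caller: str) -> None:
        # The owner and every participant may abort.
        if caller != self.owner and caller not in self.participants:
            raise PermissionError(f"{caller} is not part of {self.tid}")
        if self.state == "committed":
            raise RuntimeError("transaction already committed")
        self.state = "aborted"

    def commit(self, caller: str) -> None:
        # Only the owner may commit.
        if caller != self.owner:
            raise PermissionError("only the owner can commit")
        if self.state != "active":
            raise RuntimeError(f"transaction is {self.state}")
        self.state = "committed"
```

A participant calling `commit` raises `PermissionError`, while the same participant calling `abort` succeeds.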

Recovery Manager
• One transaction-based recovery manager per host
• Three components
  • Transaction Manager
  • Log Manager
  • Deadlock Detector

Transaction Manager
• Tracks transactions for processes on host
• Manages distributed commit protocol
• Distributed transaction is a tree
  • Only need to know your superior and your immediate subordinates
• Failure vs. termination
  • Termination causes commit/abort to proceed immediately
  • Failure is remembered and transaction aborted when it finally terminates
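The tree structure can be sketched as a recursive vote collection in which each node talks only to its direct subordinates. The `TMNode` class and its fields are hypothetical names chosen for illustration, and the boolean vote is a simplification of the real protocol.

```python
class TMNode:
    """Hypothetical per-host transaction manager node in the commit tree."""

    def __init__(self, host, vote_commit=True):
        self.host = host
        self.vote_commit = vote_commit  # this host's own vote
        self.superior = None            # the only upward link a node keeps
        self.subordinates = []          # immediate children only

    def add_subordinate(self, node):
        node.superior = self
        self.subordinates.append(node)
        return node

    def prepare(self):
        """Return True if this node and its whole subtree vote to commit.

        Each node asks only its immediate subordinates; they in turn ask
        theirs, so no node needs a global view of the tree.
        """
        return self.vote_commit and all(s.prepare() for s in self.subordinates)
```

A single "no" vote anywhere in the subtree propagates up to the root as an abort decision.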

Transaction Manager
• Participants can say whether their failure causes transaction failure or termination
• Subordinates can reclaim resources early
• Several alternative commit protocols available to servers
  • 1-phase: used by volatile servers
  • 2-phase: used by recoverable servers

2-phase Commit
• Different voting options
  • abort: undo my action, announce abort to others in 2nd phase
  • commit-read-only: no recoverable resources modified, don’t include me in 2nd phase
  • commit-volatile: same as read-only, but notify me of results of 2nd phase
  • commit-recoverable: recoverable state modified, notify me of results of 2nd phase
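The four votes can be sketched as a small decision function. The vote names follow the slide; the `Vote` enum, the `two_phase_commit` function, and the choice not to re-notify a participant that itself voted abort are assumptions made for illustration.

```python
from enum import Enum
from typing import Dict, List, Tuple


class Vote(Enum):
    ABORT = "abort"
    COMMIT_READ_ONLY = "commit-read-only"
    COMMIT_VOLATILE = "commit-volatile"
    COMMIT_RECOVERABLE = "commit-recoverable"


def two_phase_commit(votes: Dict[str, Vote]) -> Tuple[str, List[str]]:
    """Return (outcome, participants to notify in the second phase).

    Any abort vote aborts the transaction. Read-only voters drop out after
    phase one; volatile and recoverable voters are told the outcome.
    Assumption: a participant that voted abort already knows the outcome
    and is not notified again.
    """
    outcome = "abort" if Vote.ABORT in votes.values() else "commit"
    notify = sorted(
        name
        for name, vote in votes.items()
        if vote in (Vote.COMMIT_VOLATILE, Vote.COMMIT_RECOVERABLE)
    )
    return outcome, notify
```

For example, votes of commit-recoverable, commit-volatile, and commit-read-only from three servers yield a commit in which only the first two take part in the second phase.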

Commit Processing
• Special rules to handle special cases
  • Commit before participate (late joining)
  • Cycles in transaction graph
  • New requests after being prepared to commit
• Rules
  • TM must accept new participants and let them vote until commit
  • All requests that could force an abort must complete before commit
  • 1-phase-commit servers cannot commit before making requests that might force an abort

Commit Processing
• Transaction coordinator at transaction birth-site
  • Usually a user workstation, likely to fail
• Migrate or replicate coordinator for reliability

Log Manager
• Log manager provides optional services
  • Backpointers for log replay
  • Block I/O access
  • Log replication
  • Log archival
• Servers tell LM what they need
  • Not penalized for services they don’t use
• LM does not interpret data; servers determine recovery strategy
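A rough sketch of the "optional services" idea, using backpointers as the example. The `LogManager` class and its methods are hypothetical and keep records in memory; the point is only that payloads stay opaque bytes and that backpointer bookkeeping happens solely for servers that request it.

```python
class LogManager:
    """Hypothetical in-memory log; payloads are opaque bytes."""

    def __init__(self):
        self._records = []  # list of (server, payload, previous index or None)
        self._options = {}  # per-server requested services
        self._last = {}     # last record index, kept only when backpointers are on

    def register(self, server, backpointers=False):
        # Servers tell the log manager which services they need.
        self._options[server] = {"backpointers": backpointers}

    def append(self, server, payload: bytes) -> int:
        wants_bp = self._options[server]["backpointers"]
        prev = self._last.get(server) if wants_bp else None
        self._records.append((server, payload, prev))
        index = len(self._records) - 1
        if wants_bp:
            self._last[server] = index  # bookkeeping only for servers that asked
        return index

    def replay_backward(self, server):
        """Yield one server's payloads, newest first, by following backpointers."""
        if not self._options[server]["backpointers"]:
            raise ValueError(f"{server} did not request backpointers")
        index = self._last.get(server)
        while index is not None:
            _, payload, prev = self._records[index]
            yield payload  # interpreting the payload is left to the server
            index = prev
```

A server that registered without backpointers pays no per-record bookkeeping, and a replay for a server that did register skips other servers' interleaved records.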

Deadlock Detector
• Distributed deadlock detection is hard!
• So, they didn’t do it.


Criticisms
• IPC is responsible for a lot
  • Guaranteed delivery
  • Message ordering
  • Security constraints
  • Keeping transaction graphs together
• For a system that claims to not make you pay for services you don’t use…

Why Do We Care?
• Transactions as a core OS mechanism
• Mechanism, not policy
• Customize effort to need
• Optional cost for optional services
