Recovery Management in QuickSilver

Roger Haskin, Yoni Malachi, Wayne Sawdon, and Gregory Chan
IBM Almaden Research Center

Introduction: Problem Domain
• Recovery management in distributed OSs
• Trends in contemporary research:
  • Extensibility and distribution

Contemporary Recovery Techniques
• Timeouts
  • How do you distinguish a slow process from a dead one?
• Connectionless protocols / stateless servers
  • Some actions can't be made idempotent
  • Retries can cause problems
• Virtual circuits
  • Can't handle multiple servers
• Replication
  • Too expensive for some uses
  • How do you detect failures?
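
To make the retry problem concrete, here is a minimal Python sketch. It assumes a hypothetical server (`Server`) and a hypothetical client helper (`call_with_retry`) that retries whenever a reply is lost after the server has already acted; neither is part of Quicksilver. The idempotent `set_balance` survives a retry unchanged, while the non-idempotent `deposit` is applied twice.

```python
class Server:
    """Hypothetical stateless server: applies every request it receives."""

    def __init__(self):
        self.balance = 0

    def set_balance(self, value):
        # Idempotent: applying it twice leaves the same state.
        self.balance = value

    def deposit(self, amount):
        # Not idempotent: each application changes the state again.
        self.balance += amount


def call_with_retry(op, arg, lost_replies):
    """Invoke op; whenever its reply is lost, the client times out and retries.

    lost_replies is how many replies are dropped *after* the server has
    already performed the operation (a simulated network fault).
    Returns the number of attempts the client made.
    """
    attempts = 0
    while True:
        attempts += 1
        op(arg)  # the server executes the request
        if lost_replies == 0:
            return attempts
        lost_replies -= 1  # reply lost: the client cannot tell, so it retries


s = Server()
call_with_retry(s.set_balance, 100, lost_replies=1)
print(s.balance)  # 100

s = Server()
call_with_retry(s.deposit, 100, lost_replies=1)
print(s.balance)  # 200
```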

Quicksilver: what's so special?
• Fundamental trade-off:
  • Generality and efficiency (Quicksilver) vs. ease of use (Camelot, Argus, etc.)
• Transparency isn't always best!

Quicksilver: specs and features
• Client-server model
• System services are processes
• IPC via message passing
• More complicated set of failure modes (to handle more specific cases)
• Atomic transactions

Server Classes
Common server classes:
• Volatile (window manager)
• Replicated + volatile (name server)
• Recoverable (file server)
• Long-running transactions need log support

Design Goals
• Programs should be resilient to external process and machine failures
• Server processes should contain their own recovery code
• Uniform system-wide architecture for recovery management
• Logically related activities must execute atomically

Transaction Structure
• Everything belongs to a transaction
• Globally unique transaction identifiers (tid)
• Each transaction has one owner and multiple participants
  • Owner can commit or abort
  • Participants can only abort
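
The ownership rules on this slide can be sketched as a small Python class. This is an illustrative model, not Quicksilver's interface: the class and method names are hypothetical, and `uuid.uuid4()` merely stands in for whatever globally unique tid scheme the system actually uses.

```python
import uuid


class TransactionError(Exception):
    pass


class Transaction:
    """Hypothetical model of the slide's rules: one owner, many participants;
    only the owner may commit, while the owner or any participant may abort."""

    def __init__(self, owner):
        # Stand-in for a globally unique tid (not Quicksilver's real format).
        self.tid = uuid.uuid4()
        self.owner = owner
        self.participants = set()
        self.state = "active"

    def join(self, process):
        self._require_active()
        self.participants.add(process)

    def commit(self, caller):
        self._require_active()
        if caller != self.owner:
            raise TransactionError(f"{caller} is not the owner; only the owner can commit")
        self.state = "committed"

    def abort(self, caller):
        self._require_active()
        if caller != self.owner and caller not in self.participants:
            raise TransactionError(f"{caller} is not part of this transaction")
        self.state = "aborted"

    def _require_active(self):
        if self.state != "active":
            raise TransactionError(f"transaction already {self.state}")


t = Transaction(owner="client")
t.join("file_server")
try:
    t.commit("file_server")
except TransactionError as e:
    print(e)  # file_server is not the owner; only the owner can commit
t.abort("file_server")
print(t.state)  # aborted
```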

Recovery Manager: Components
• Transaction Manager (TM): manages commit coordination by communicating with servers at its own node and with transaction managers at other nodes
• Log Manager (LM): serves as a common recovery log both for the TM's commit log and for servers' recovery data
• Deadlock Detector: detects and resolves global deadlocks (not implemented)

Quicksilver System Structure

Transaction Manager
• Tracks transactions for processes on its host
• Manages the distributed commit protocol
• A distributed transaction is a tree
  • Each node only needs to know its superior and its immediate subordinates
• Several alternative commit protocols are available to servers
  • 1-phase: used by volatile servers
  • 2-phase: used by recoverable servers
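
The tree structure can be sketched as follows: each hypothetical `TMNode` stores only a reference to its superior and a list of its immediate subordinates, and a decision reaches every node because each node forwards it one level down. The names and the recursive propagation are assumptions for illustration, not taken from Quicksilver's code.

```python
class TMNode:
    """Hypothetical transaction-manager node in a distributed commit tree.

    Each node stores only its superior and its immediate subordinates,
    never the whole tree.
    """

    def __init__(self, name, superior=None):
        self.name = name
        self.superior = superior
        self.subordinates = []
        if superior is not None:
            superior.subordinates.append(self)

    def is_coordinator(self):
        # The root of the tree (the transaction's birth site) has no superior.
        return self.superior is None

    def broadcast(self, decision, log):
        # Record the decision locally, then forward it one level down;
        # each subordinate forwards it further, so no node needs global knowledge.
        log.append((self.name, decision))
        for sub in self.subordinates:
            sub.broadcast(decision, log)


root = TMNode("workstation")
a = TMNode("node-A", superior=root)
b = TMNode("node-B", superior=root)
c = TMNode("node-C", superior=a)

log = []
root.broadcast("commit", log)
print(log)
# [('workstation', 'commit'), ('node-A', 'commit'), ('node-C', 'commit'), ('node-B', 'commit')]
```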

2-Phase Commit
• Voting options:
  • abort: undo my action; announce abort to the others in the 2nd phase
  • commit-read-only: no recoverable resources modified; don't include me in the 2nd phase
  • commit-volatile: same as read-only, but notify me of the results of the 2nd phase
  • commit-recoverable: recoverable state modified; notify me of the results of the 2nd phase
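
A minimal sketch of how a coordinator might combine these four votes, assuming phase-1 votes arrive as a dictionary and that a participant voting abort (having already undone its own work) needs no phase-2 message. The `Vote` enum and `decide` function are hypothetical names, not Quicksilver's API.

```python
from enum import Enum


class Vote(Enum):
    ABORT = "abort"
    COMMIT_READ_ONLY = "commit-read-only"
    COMMIT_VOLATILE = "commit-volatile"
    COMMIT_RECOVERABLE = "commit-recoverable"


def decide(votes):
    """Given {participant: Vote} from phase 1, return (outcome, phase-2 recipients).

    - Any ABORT vote aborts the whole transaction; the aborting participant
      has already undone its work, so (by assumption) it is not notified here.
    - COMMIT_READ_ONLY participants are dropped from phase 2.
    - COMMIT_VOLATILE and COMMIT_RECOVERABLE participants are told the outcome.
    """
    outcome = "abort" if Vote.ABORT in votes.values() else "commit"
    notify = sorted(
        p for p, v in votes.items()
        if v in (Vote.COMMIT_VOLATILE, Vote.COMMIT_RECOVERABLE)
    )
    return outcome, notify


votes = {
    "fs": Vote.COMMIT_RECOVERABLE,  # file server that modified recoverable state
    "wm": Vote.COMMIT_VOLATILE,     # window manager with volatile state
    "ns": Vote.COMMIT_READ_ONLY,    # name server that only read
}
print(decide(votes))  # ('commit', ['fs', 'wm'])
```

The read-only voter disappears from the second phase, which is what saves messages for participants that changed nothing.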

Transaction Coordination
• Transaction coordinator is at the transaction's birth site
  • Usually a user workstation, which is likely to fail
• Migrate or replicate the coordinator for reliability

Log Manager
• Log manager provides optional services:
  • Backpointers for log replay
  • Block I/O access
  • Log replication
  • Log archival
• Servers tell the LM what they need
  • Not penalized for services they don't use
• LM does not interpret data; servers determine their own recovery strategy
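
A minimal sketch of one of these optional services, backpointers for log replay, assuming a hypothetical in-memory `LogManager` in which log sequence numbers (LSNs) are just list indices. Payloads are opaque `bytes`, mirroring the point that the LM does not interpret data; a real log manager would also write records to stable storage, which this sketch omits.

```python
class LogManager:
    """Hypothetical append-only log shared by several servers.

    Records are opaque bytes: the log manager never interprets them.
    Each record carries a backpointer to the same server's previous record,
    so a server can replay its own records without scanning everyone else's.
    """

    def __init__(self):
        self.records = []    # list of (server, backpointer LSN or None, payload)
        self.last_lsn = {}   # server -> LSN of its most recent record

    def append(self, server, payload: bytes) -> int:
        lsn = len(self.records)
        back = self.last_lsn.get(server)  # None for the server's first record
        self.records.append((server, back, payload))
        self.last_lsn[server] = lsn
        return lsn

    def replay_backward(self, server):
        """Yield one server's payloads newest-first by following backpointers."""
        lsn = self.last_lsn.get(server)
        while lsn is not None:
            _, back, payload = self.records[lsn]
            yield payload
            lsn = back


log = LogManager()
log.append("fs", b"write A")
log.append("tm", b"prepare t1")
log.append("fs", b"write B")
print(list(log.replay_backward("fs")))  # [b'write B', b'write A']
```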

Quicksilver Distributed IPC

Structure of a Distributed Transaction

Open questions - ???
• Efficiency vs. transparency?
• Still relevant for today's hardware?
• …
