Rewind, Repair, Replay: Three R’s to improve dependability
Aaron Brown and David Patterson
ROC Research Group, University of California at Berkeley
SIGOPS European Workshop, 23 September 2002

What if computer systems could travel in time?
• We could have retroactive repair
  • travel back and fix problems before they had a chance to corrupt data
• We could eliminate human operator error
  • make a mistake? Just travel back and try it again.
• Our systems could be more robust
  • we could eliminate the dangers of upgrades
  • we could better tolerate buggy software
  • we might even be able to tolerate viruses and hackers
• We could make more dependable systems

Sci-fi vs. computer time travel
Sci-fi time travel
• our hero loses a loved one or lives through disaster
• hero uses time machine to travel back in time
• hero alters the past to avert the future disaster
• hero returns to the present; past changes have been merged into the original timeline
Computer time travel
• human error, software bug, or attack causes data loss
• Rewind: roll system state backwards in time
• Repair: make changes to avert foretold disaster
• Replay: roll system state forward, merging the original timeline with the effects of repairs
Three R’s are the fundamental primitives of computer time travel

Key properties of the 3R’s
• Recovery from problems at any system layer
  • rewind, repair, replay cover OS through application
• Recovery from unanticipated problems
  • arbitrary repair
• No assumptions about correct application behavior
  • physical rewind
• Integrated interface
  • provide “undo for sysadmins”

Designing a 3R system
• Goals
  • application-neutrality
  • provide abstractions for reasoning about 3R behavior
• Target domain: network services
  • accessed by remote users via well-defined interfaces
  • email, messaging, e-commerce, auctions, forums, web hosting, enterprise applications (J2EE, .NET), ...
• Challenges, learned from first attempt
  • integrating history and repair during replay
  • managing inconsistency in externally-visible state

Basic architecture
• Application-independent undo manager
  • coordinates 3R cycle; manages external inconsistencies
  • linked via a set of APIs to application, time-travel storage, history log, and control UI
[Figure: architecture diagram with components Control UI, App. Service (includes user state, application, operating system), Undo Manager, Time-travel storage layer, and History Log; links labeled "3R API" and "control"]

Abstracting the application service
• To the undo manager, the application is:
  • a collection of state
  • a history of events affecting the state
    • an event is typically a user interaction with the service
  • a model of acceptable external consistency
• These are encoded into application-defined verbs
  • high-level encodings of user interactions (events)
  • records of intent to alter state, not actual state changes
  • reference application state by opaque UIDs
  • provide policies that define external consistency

Verbs and the 3R cycle
• Normal operation
  • undo manager logs application-provided verbs to disk
[Figure: architecture diagram (Control UI, App. Service, Undo Manager, History Log, Time-travel storage layer) with flows labeled "User interaction" and "Verbs"]

Verbs and the 3R cycle
• Rewind
  • time-travel storage layer reverts system hard state to rewind point
  • all changes since rewind point are discarded
[Figure: architecture diagram (Control UI, App. Service, Undo Manager, History Log, Time-travel storage layer)]

Verbs and the 3R cycle
• Repair
  • operator edits logged history and/or makes arbitrary changes to system
[Figure: architecture diagram (Control UI, App. Service, Undo Manager, History Log, Time-travel storage layer) with flows labeled "Repairs" and "Edits"]

Verbs and the 3R cycle
• Replay
  • undo manager feeds verbs back to application for re-execution in the context of repaired system
[Figure: architecture diagram (Control UI, App. Service, Undo Manager, History Log, Time-travel storage layer) with a flow labeled "Verbs"]

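The four phases above (normal operation, rewind, repair, replay) can be sketched in a few lines of Java. This is only an illustration under stated assumptions, not the prototype's code: the names Verb, UndoManagerSketch, and ThreeRDemo are hypothetical, an in-memory map stands in for the time-travel storage layer, and the "bug" is a made-up filter setting that reverses delivered text.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

/** Hypothetical verb: a record of intent, re-executed against whatever state exists at replay time. */
interface Verb {
    void execute(Map<String, String> state);
}

/** Hypothetical undo manager; an in-memory map stands in for the time-travel storage layer. */
class UndoManagerSketch {
    private Map<String, String> state = new HashMap<>();
    private Map<String, String> rewindPoint = new HashMap<>();
    private final List<Verb> history = new ArrayList<>();

    Map<String, String> state() { return state; }

    /** Mark the current state as the rewind point and start a fresh history. */
    void checkpoint() { rewindPoint = new HashMap<>(state); history.clear(); }

    /** Normal operation: log the verb, then let the service execute it. */
    void perform(Verb verb) { history.add(verb); verb.execute(state); }

    /** Rewind: discard every change made since the rewind point. */
    void rewind() { state = new HashMap<>(rewindPoint); }

    /** Repair: the operator makes an arbitrary change to the rewound system. */
    void repair(Consumer<Map<String, String>> fix) { fix.accept(state); }

    /** Replay: re-execute the logged verbs in the context of the repaired system. */
    void replay() { for (Verb verb : history) verb.execute(state); }
}

class ThreeRDemo {
    /** Hypothetical delivery verb; a misconfigured "filter" entry reverses the stored text. */
    static Verb deliver(String uid, String text) {
        return s -> {
            boolean buggy = "reverse".equals(s.get("filter"));
            s.put(uid, buggy ? new StringBuilder(text).reverse().toString() : text);
        };
    }

    public static void main(String[] args) {
        UndoManagerSketch um = new UndoManagerSketch();
        um.state().put("filter", "reverse");      // the latent problem
        um.checkpoint();
        um.perform(deliver("m", "Hello"));
        System.out.println(um.state().get("m"));  // prints "olleH"
        um.rewind();
        um.repair(s -> s.put("filter", "none"));
        um.replay();
        System.out.println(um.state().get("m"));  // prints "Hello"
    }
}
```

Because the verb records the intent ("deliver Hello") rather than the resulting state, replay picks up the effect of the repair automatically.
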
The fundamental roles of verbs
• Providing application-independence
  • verbs encapsulate application semantics, but remain semi-opaque to undo manager
• Integration of repair into history
  • high-level specification of intent makes verbs relatively independent of system changes
  • verbs are re-executed, not restored, so they inherit effects of repairs
• Scoping restored history
  • only changes logged as verbs will be preserved by the 3R’s
  • effects of bugs, corruption, human error are discarded
  • can reason about what is preserved/lost in 3R cycle

Managing external inconsistency
• External inconsistency == time paradox?
  • system is internally-consistent after a 3R cycle
  • but external observers see inexplicable state changes
  • external inconsistency is OK unless affected state was externalized (observed) before the 3R cycle
• Coping with external inconsistency
  • cannot eliminate
  • must manage: ignore, explain, compensate, encompass
• Verbs let us manage external inconsistency

Managing inconsistency with verbs
• To detect inconsistencies:
  • verbs specify the state that they depend upon
  • undo manager tracks signatures of that state
  • if verb is altered or if signatures don’t match, there is an inconsistency
  • applications supporting relaxed consistency can replace signature-check with arbitrary consistency predicates
• To detect state viewed externally:
  • verbs indicate what state they externalize
  • example: IMAP fetch verb externalizes email message
• To handle externalized inconsistencies:
  • verb supplies compensation functions

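A minimal Java sketch of the signature check described above. All names here (LoggedVerb, InconsistencyDetector) are hypothetical, and CRC32 is only an assumed stand-in for whatever signature method the application supplies; the deck does not say which hash the prototype uses.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.zip.CRC32;

/** Hypothetical log entry: the state UIDs a verb depends on and externalizes, plus recorded signatures. */
class LoggedVerb {
    final String name;
    final List<String> contentDeps;
    final List<String> externalizes;
    final Map<String, Long> recordedSignatures = new HashMap<>();

    LoggedVerb(String name, List<String> contentDeps, List<String> externalizes) {
        this.name = name;
        this.contentDeps = contentDeps;
        this.externalizes = externalizes;
    }
}

class InconsistencyDetector {
    /** Assumed signature method: CRC32 over the UTF-8 bytes of the state item. */
    static long signature(String content) {
        CRC32 crc = new CRC32();
        crc.update(content.getBytes(StandardCharsets.UTF_8));
        return crc.getValue();
    }

    /** Normal operation: remember the signature of every state item the verb depends on. */
    static void record(LoggedVerb verb, Map<String, String> state) {
        for (String uid : verb.contentDeps) {
            String content = state.get(uid);
            if (content == null) {
                throw new IllegalStateException("missing dependency: " + uid);
            }
            verb.recordedSignatures.put(uid, signature(content));
        }
    }

    /** Replay: the verb is inconsistent if any dependency is missing or has a different signature. */
    static boolean isInconsistent(LoggedVerb verb, Map<String, String> state) {
        for (String uid : verb.contentDeps) {
            String now = state.get(uid);
            Long before = verb.recordedSignatures.get(uid);
            if (now == null || before == null || before != signature(now)) {
                return true;
            }
        }
        return false;
    }

    /** Compensation only matters when an inconsistent verb externalized state. */
    static boolean needsCompensation(LoggedVerb verb, Map<String, String> state) {
        return isInconsistent(verb, state) && !verb.externalizes.isEmpty();
    }
}
```

With this sketch, a fetch verb recorded against message content "olleH" and replayed against "Hello" reports an inconsistency that needs compensation, while a verb with no content dependencies never does.
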
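Email example: original timeline
[Figure: timeline with rows for system boundary, system state, verbs, and history log. The input “Hello” crosses the system boundary and is delivered to Inbox as message m, whose stored content reads “olleH” (marked “!”); m is moved to Folder1 and then fetched, so “olleH” is externalized.]
History log:
• DeliverMsg: Externalizes: none; ContentDep: none; ExistsDep: Inbox; + input “Hello”
• MoveMsg: Externalizes: none; ContentDep: none; ExistsDep: Inbox, Folder1
• FetchMsg: Externalizes: m; ContentDep: m; ExistsDep: m, Folder1; + Signature(m)=“olleH”
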
Email example: replay timeline
[Figure: the same timeline during replay, overlaid on the original. The input “Hello” is delivered to Inbox as m with content “Hello”, then moved to Folder1 and fetched. The content of m no longer matches the signature recorded for FetchMsg in the original timeline: mismatch! => inconsistency]
History log:
• DeliverMsg: Externalizes: none; ContentDep: none; ExistsDep: Inbox; + input “Hello”
• MoveMsg: Externalizes: none; ContentDep: none; ExistsDep: Inbox, Folder1
• FetchMsg: Externalizes: m; ContentDep: m; ExistsDep: m, Folder1; + Signature(m)=“olleH”

Recap: 3R architecture
• Goal: application-neutral implementation of 3R’s
  • verb abstraction couples generic undo manager to app.
  • verbs provide tools to reason about 3R behavior
• Challenges
  • integrating history and repair during replay
    • re-executing verbs restores intent of history
  • managing inconsistency in externally-visible state
    • verbs track externalization, state dependencies, and define compensations

Status
• Prototype implementation of 3R primitives nearly complete
  • app-independent undo manager written in Java
  • all APIs defined as Java interfaces
  • Network Appliance filer as time-travel storage layer
  • BerkeleyDB as history log
• First target app: web-based email service
  • 3R-enhanced JavaMail API provider classes
    • plus additional hooks to verb-ify operator maintenance tasks like account creation
  • JWebMail web front-end
  • RDBMS-based backend mail store (DB2 or MySQL)
  • implementation in progress

Open issues & future work
• Resource impact of the 3R’s
  • what are the performance/space penalties for the 3R’s?
• Verb definition
  • can we specify verbs & consistency policy declaratively?
• Providing the 3R’s at multiple granularities
  • can we track & manage cross-granularity dependencies?
• Measuring the dependability benefit of 3R’s
  • how do we build recovery/dependability benchmarks?
• Other uses for verb-based characterizations
  • easy georeplication? online self-checking? automatic verification of upgrades?

Conclusions
• We can build time travel for computers
  • using the 3R’s: Rewind, Repair, Replay
• An architecture for the 3R primitives
  • generic undo manager coupled to application by verbs
• Verbs are a useful abstraction for the 3R’s
  • can use to reason about effects of 3R’s on state
  • help address problem of external inconsistencies
• Prototype 3R-enabled email system under construction
  • hope to demonstrate increased dependability and faster recovery from problems

Rewind, Repair, Replay: Three R’s to improve dependability
For more information:
http://roc.cs.berkeley.edu/
abrown@cs.berkeley.edu

Verbs vs. transactions
• Both encapsulate state-altering events
• But, unlike transactions:
  • verbs are higher-level, recording end-user intent, not specific state changes
  • verbs do not depend on internal data models (but do depend on external protocols)
    • transactions are the reverse
  • verbs do not necessarily conform to ACID consistency
    • verbs inherit consistency model provided by application at the external-protocol level

Implementing verbs
• Verbs are defined by a type hierarchy
  • base type defines interfaces for state dependencies, externalizations, predicates, compensations
  • applications subclass the base type for their verbs
  • additions to the type are opaque to the undo manager
• Referencing state
  • all user-visible state named by time-invariant UIDs
  • undo manager requires signature method for all state
• Consistency predicates and compensations are application-supplied functions
  • they encode the app’s external consistency model

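One way such a type hierarchy might look in Java is sketched below. The class and method names (BaseVerb, FetchMsgVerb, contentDependencies, and so on) are hypothetical and are not taken from the prototype's actual interfaces; the dependency lists for the fetch verb follow the email example earlier in the deck.

```java
import java.util.List;

/** Hypothetical base type: the only part of a verb that the undo manager sees. */
abstract class BaseVerb {
    /** UIDs of state whose content this verb depends on. */
    abstract List<String> contentDependencies();

    /** UIDs of state that must exist for this verb to run. */
    abstract List<String> existenceDependencies();

    /** UIDs of state this verb exposes to external observers. */
    abstract List<String> externalizedState();

    /** Default consistency predicate: strict signature equality. Subclasses may relax it. */
    boolean isConsistent(String originalSignature, String replaySignature) {
        return originalSignature.equals(replaySignature);
    }

    /** Application-supplied compensation, described here as text for illustration only. */
    String compensate() {
        return "no compensation";
    }
}

/** Hypothetical application verb for an IMAP-style fetch of one message. */
class FetchMsgVerb extends BaseVerb {
    // Application-specific fields like these stay opaque to the undo manager.
    private final String messageUid;
    private final String folderUid;

    FetchMsgVerb(String messageUid, String folderUid) {
        this.messageUid = messageUid;
        this.folderUid = folderUid;
    }

    @Override
    List<String> contentDependencies() { return List.of(messageUid); }

    @Override
    List<String> existenceDependencies() { return List.of(messageUid, folderUid); }

    @Override
    List<String> externalizedState() { return List.of(messageUid); }

    @Override
    String compensate() {
        return "insert explanatory text into message " + messageUid;
    }
}
```

The undo manager would work only against BaseVerb, which is what keeps it application-independent in this sketch.
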
Defining verbs
• Currently, verbs are defined procedurally
  • provide dependency information via lists of state IDs
  • provide functions for special consistency predicates
  • provide functions for compensation
• Better: declarative specification
  • compile textual specification into verb code using libraries of predicates and compensation fns
  • reduces complexity of adding 3R’s to the application
  • increases confidence in undo system via easier testing

External consistency policies
• Verbs capture external consistency policies
• Example: email
  • message order in folder is irrelevant
    • AppendMessage verb does not express dependency on content of target folder, only its existence
  • content of messages is relevant, except for headers
    • ReadMsg verb depends on hash of target message body; if changed, compensate by inserting explanatory text
• Example: e-commerce
  • order total depends on item prices, not descriptions
    • Checkout verb depends on prices of items in cart, not their hash-values; if sum of prices changed, compensate by emailing customer for approval

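The e-commerce policy can be written as a relaxed consistency predicate that replaces the signature check. The sketch below is a hypothetical illustration: CheckoutPolicySketch, its method names, and the use of integer cents are assumptions, and the compensation only returns a description instead of sending mail.

```java
import java.util.List;

/** Hypothetical consistency policy for a Checkout verb. */
class CheckoutPolicySketch {
    /** Order total in cents for the items in the cart. */
    static long total(List<Long> pricesInCents) {
        long sum = 0;
        for (long price : pricesInCents) {
            sum += price;
        }
        return sum;
    }

    /** Relaxed predicate: only the sum of prices matters, not item order or descriptions. */
    static boolean isConsistent(List<Long> originalPrices, List<Long> replayPrices) {
        return total(originalPrices) == total(replayPrices);
    }

    /** Compensation from the slide: ask the customer to approve the changed total. */
    static String compensate(String customerEmail, long oldTotal, long newTotal) {
        return "email " + customerEmail + ": total changed from " + oldTotal
                + " to " + newTotal + " cents, please approve";
    }
}
```

For example, a cart priced at 500 and 250 cents stays consistent if replay sees the same prices in a different order, but not if one price changes.
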
External consistency policies (2)
• Example: auctions
  • new bid must be larger than prior bids
    • PlaceBid verb depends on content of all bids in bid set; if one is now larger than new bid, compensate by canceling new bid and informing bidder

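The auction rule can be sketched the same way. PlaceBidPolicySketch and its methods are hypothetical names; treating a tie as a violation is an assumption drawn from "new bid must be larger than prior bids".

```java
import java.util.List;

/** Hypothetical consistency policy for a PlaceBid verb. */
class PlaceBidPolicySketch {
    /** The replayed bid is consistent only if it still exceeds every bid in the bid set. */
    static boolean isConsistent(long newBidCents, List<Long> bidSetCents) {
        for (long prior : bidSetCents) {
            if (prior >= newBidCents) {
                return false; // assumption: an equal bid also violates "must be larger"
            }
        }
        return true;
    }

    /** Compensation from the slide: cancel the new bid and inform the bidder. */
    static String compensate(String bidder, long newBidCents) {
        return "cancel bid of " + newBidCents + " cents and inform " + bidder;
    }
}
```

If a repair or replay introduces a higher bid into the set, the predicate fails for the logged bid and the compensation applies.
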
Application implications
• To support the 3R’s, an application must have:
  • a high-level, verb-structured interface/API for user, operator, and external actions
  • a state model where all user-visible state:
    • is nameable via the API
    • is tagged with GUIDs
    • supports a signature/hash method
  • a relaxed external consistency model that allows compensation for externalized inconsistent verbs

Example: a 3R email store
• State
  • mailstores, folders, messages, user properties, aliases
• Verbs
  • transport: create/delete/alter mapping; deliver msg
  • directory: create/alter/delete user-entry; create/alter/delete filter-rule; add/remove maildrop
  • store: create/delete store; create/rename/delete folder; expunge folder; list folder; set folder flags; copy msg; append msg; fetch msg; set msg flags
[Figure: components WebUI, Transport, Store, Directory/Auth., and UndoMgr; interface labels HTTP, SMTP, IMAP, LDAP, and internal; flows labeled "verbs" lead to the UndoMgr]