CSE 490dp Check-pointing and Migration
Robert Grimm
Problem
• How to capture the state of an application?
  • Save and restore application
  • Clone application
  • Move application to a different node
• Technical issue
Motivation
• Failure resilience
  • Restart application after failure
• Performance
  • Balance load across several nodes
  • Co-locate application with (remote) data
• Availability
  • Move away from nodes that are going to go down
  • Follow a user as she moves through the physical world
Application State
• Internal data
  • Memory, objects
• Execution state
  • Thread-based: Stack, registers
  • Event-based: Event queue
• Connections
  • Open files, sockets
• Outside data
  • Executables
  • Stored data
What State to Capture?
• Issue: Degree of transparency
  • Fully transparent
    • Application cannot tell the difference
  • No transparency
    • Application needs to do everything itself
Internal Data
• Most basic application state
• Memory – copy
  • C, C++
• Objects – serialize
  • Modula-3
  • Java
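For the "Objects – serialize" option, a Java checkpoint of internal data can be a single `ObjectOutputStream` round trip over the object graph. The sketch below is an illustration, not code from any of the systems in these slides: `AppData` and `Snapshot` are hypothetical names, and it assumes every object reachable from the root implements `Serializable`. The C/C++ "copy memory" option is not shown.

```java
import java.io.*;
import java.util.ArrayList;
import java.util.List;

// Hypothetical application data: everything reachable from this root
// is captured when the root is serialized.
class AppData implements Serializable {
    private static final long serialVersionUID = 1L;
    final List<String> items = new ArrayList<>();
    int version;
}

// Minimal checkpoint helper: Java serialization into an in-memory byte array.
final class Snapshot {
    private Snapshot() {}

    // Capture the whole reachable object graph as a byte image.
    static byte[] take(Serializable root) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(root);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Rebuild a fresh, independent copy of the graph from the image.
    static <T> T revive(byte[] image, Class<T> type) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(image))) {
            return type.cast(in.readObject());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Because `revive` always builds new objects, the same round trip doubles as a clone; shipping the byte array to another node and reviving it there is the core of a move.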
Execution State
• System must be quiescent
  • All execution is suspended
• Thread-based: State is implicit
  • Stack
  • Registers, including PC
  • Condition variable queues
  • Very low level
• Event-based: State is explicit
  • Event queue
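In an event-based system, suspending between events leaves nothing on the stack, so the component's data plus its pending event queue is the complete execution state. A minimal sketch, assuming a hypothetical `Counter` component whose events are plain serializable objects (this is not one.world's actual API); `Snapshot` is the same kind of serialization helper, repeated so the example stands alone.

```java
import java.io.*;
import java.util.ArrayDeque;

// A pending event (hypothetical shape): a kind and an amount.
final class Event implements Serializable {
    private static final long serialVersionUID = 1L;
    final String kind;
    final int amount;

    Event(String kind, int amount) {
        this.kind = kind;
        this.amount = amount;
    }
}

// Hypothetical event-based component. Its data plus its queue of pending
// events is its entire execution state; no thread stack is involved.
final class Counter implements Serializable {
    private static final long serialVersionUID = 1L;
    private final ArrayDeque<Event> queue = new ArrayDeque<>();
    private int total;

    void enqueue(Event e) { queue.addLast(e); }

    // Handle exactly one event. Between calls the component is quiescent,
    // so that is when a checkpoint should be taken.
    boolean step() {
        Event e = queue.pollFirst();
        if (e == null) return false;
        switch (e.kind) {
            case "add": total += e.amount; break;
            case "sub": total -= e.amount; break;
            default: throw new IllegalArgumentException("unknown event: " + e.kind);
        }
        return true;
    }

    void runToCompletion() {
        while (step()) {
            // keep draining the queue
        }
    }

    int total() { return total; }
    int pending() { return queue.size(); }
}

// Minimal checkpoint helper based on Java serialization.
final class Snapshot {
    private Snapshot() {}

    static byte[] take(Serializable root) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(root);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    static <T> T revive(byte[] image, Class<T> type) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(image))) {
            return type.cast(in.readObject());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

A thread-based equivalent would have to capture the stack, registers, and condition variable queues, which plain Java serialization cannot reach.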
Connections
• Open files, sockets, etc.
• Problems
  • May change while application is not executing
    • Check-points
  • May not be available on new node
    • Migration
Alternative
• Let application restore its connections
  • Harder for thread-based systems
    • Thread may be accessing file or socket
  • Easier for event-based systems
    • Tell application to restore connections
    • Explicit event
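The event-based alternative can be sketched in Java with a `transient` field: the checkpoint keeps the address needed to reconnect, drops the live connection, and the application reopens it when it receives an explicit "restored" event. `Connection`, `ChatClient`, `connect`, and `onRestored` are hypothetical names chosen for illustration, not part of any system in these slides.

```java
import java.io.*;
import java.util.function.Function;

// Hypothetical stand-in for an open file or socket. It is deliberately not
// Serializable: a live connection cannot be carried inside a checkpoint.
interface Connection {
    String send(String message);
}

// Hypothetical event-based application that owns one connection.
final class ChatClient implements Serializable {
    private static final long serialVersionUID = 1L;
    private final String serverAddress;   // captured: enough to reconnect later
    private transient Connection conn;    // not captured: null after a restore

    ChatClient(String serverAddress) { this.serverAddress = serverAddress; }

    void connect(Function<String, Connection> connector) {
        conn = connector.apply(serverAddress);
    }

    // Explicit event delivered by the (hypothetical) runtime after a restore
    // or move; the application, not the system, re-opens its connections.
    void onRestored(Function<String, Connection> connector) {
        connect(connector);
    }

    boolean connected() { return conn != null; }

    String send(String message) {
        if (conn == null) throw new IllegalStateException("connection not restored yet");
        return conn.send(message);
    }
}

// Minimal checkpoint helper based on Java serialization.
final class Snapshot {
    private Snapshot() {}

    static byte[] take(Serializable root) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(root);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    static <T> T revive(byte[] image, Class<T> type) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(image))) {
            return type.cast(in.readObject());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Because the application only runs inside event handlers, `onRestored` never races with a thread that is halfway through using the old connection, which is why this is easier for event-based systems.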
Outside Data
• Executables, stored data
• Make data available everywhere
  • Distributed file system
• Move executable(s) with application
  • Support moving code but not other data
• Group data and applications
  • Environments in one.world
  • Hierarchy moved as one unit
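Grouping data and applications in a hierarchy turns "move as one unit" into a single capture of a subtree. The sketch assumes a hypothetical `Env` tree loosely modeled on the idea of one.world environments (not its real classes), and both "nodes" are simply two roots in one JVM. The parent link is `transient` so that capturing a subtree does not also capture its ancestors.

```java
import java.io.*;
import java.util.*;

// Hypothetical environment: owns stored data and nested child environments.
final class Env implements Serializable {
    private static final long serialVersionUID = 1L;

    final String name;
    final Map<String, String> store = new HashMap<>();
    final List<Env> children = new ArrayList<>();
    // Transient so that serializing a subtree does not drag in its ancestors.
    private transient Env parent;

    Env(String name) { this.name = name; }

    Env child(String childName) {
        Env c = new Env(childName);
        c.attachTo(this);
        return c;
    }

    void attachTo(Env newParent) {
        parent = newParent;
        newParent.children.add(this);
    }

    void detach() {
        if (parent != null) {
            parent.children.remove(this);
            parent = null;
        }
    }

    Env parent() { return parent; }

    int size() {
        int n = 1;
        for (Env c : children) n += c.size();
        return n;
    }

    // Rebuild the transient parent links inside the restored subtree.
    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        in.defaultReadObject();
        for (Env c : children) c.parent = this;
    }
}

final class Mover {
    private Mover() {}

    // Move a subtree as one unit: capture it, detach the original, and
    // graft the restored copy under the new parent.
    static Env move(Env env, Env newParent) {
        byte[] image = Snapshot.take(env);
        env.detach();
        Env copy = Snapshot.revive(image, Env.class);
        copy.attachTo(newParent);
        return copy;
    }
}

// Minimal checkpoint helper based on Java serialization.
final class Snapshot {
    private Snapshot() {}

    static byte[] take(Serializable root) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(root);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    static <T> T revive(byte[] image, Class<T> type) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(image))) {
            return type.cast(in.readObject());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```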
Three Points in the Design Space
• Sprite [Douglis & Ousterhout 91]
• Aglets [Lange & Oshima 98]
  • Representative of Java-based agent systems
• one.world
Sprite
• Process migration motivated by performance
  • Use idle machines
• Transferred application state
  • Data
  • Execution state
  • Open connections
• “It turned out to be particularly difficult in Sprite to migrate the state associated with open files”
Transparency in Sprite
• Application seems to be on “home machine”
• Location-independent kernel calls
  • File system
• Transfer execution state
  • VM, open files, PIDs, UIDs, resource usage statistics
• Call back to home machine
  • gettimeofday
• Modify state on both machines
  • fork, exit, wait
Aglets
• Mobile agent system
  • “Clean” platform for experimenting with mobile agents
• Transferred application state
  • Data
    • Relies on Java serialization
  • Executables
    • Lazily – only currently used classes
Limitations
• Not transferred
  • Execution state
    • Not supported by Java
    • Applications need to implement their own state machines
  • Outside data beyond executables
    • Not part of platform
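Since a Java checkpoint captures fields but not a thread's stack or program counter, a migrating agent has to record its own progress. A minimal sketch of such a hand-written state machine, assuming a hypothetical `PriceAgent` that the runtime calls once per arrival; the names and the `run(host)` contract are illustrative and are not the Aglets API.

```java
import java.io.*;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical mobile agent: visits each shop on its itinerary, collects a
// quote, then returns home to report. "Where am I in the task" lives in the
// explicit fields 'phase' and 'nextStop', because the stack is not migrated.
final class PriceAgent implements Serializable {
    private static final long serialVersionUID = 1L;

    enum Phase { COLLECT, REPORT, DONE }

    private final List<String> itinerary;
    private final List<String> quotes = new ArrayList<>();
    private Phase phase = Phase.COLLECT;
    private int nextStop;

    PriceAgent(List<String> itinerary) {
        if (itinerary.isEmpty()) throw new IllegalArgumentException("empty itinerary");
        this.itinerary = new ArrayList<>(itinerary);
    }

    // Called by the (hypothetical) runtime each time the agent arrives on a
    // host. Returns the host to move to next, or null to stop.
    String run(String host) {
        switch (phase) {
            case COLLECT:
                quotes.add(host + ":quote");   // placeholder for real work on 'host'
                nextStop++;
                if (nextStop < itinerary.size()) {
                    return itinerary.get(nextStop);
                }
                phase = Phase.REPORT;
                return "home";
            case REPORT:
                // Reporting to the owner would happen here.
                phase = Phase.DONE;
                return null;
            default:
                return null;
        }
    }

    Phase phase() { return phase; }
    List<String> quotes() { return Collections.unmodifiableList(quotes); }
}

// Minimal checkpoint helper based on Java serialization.
final class Snapshot {
    private Snapshot() {}

    static byte[] take(Serializable root) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(root);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    static <T> T revive(byte[] image, Class<T> type) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(image))) {
            return type.cast(in.readObject());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

A driver that serializes the agent before every hop and calls `run` on the revived copy exercises exactly the path a real migration would take.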
one.world
• Failure resilience, availability, (performance)
  • checkpoint, restore, move, clone
• Transferred state
  • Data
  • Execution state
    • Event queue
  • Outside data
    • Environment hierarchy
• Not transferred
  • Open connections
Programming for Change
• Pervasive computing environment
  • Highly dynamic
  • Tens of thousands of nodes and services come and go
• Applications
  • Cannot assume existence or availability of resources
  • Need to be prepared to re-acquire any resource at any time
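One way to avoid assuming that a resource stays available is to treat every handle as disposable: when a use fails, drop the handle and acquire a fresh one. The `Reacquiring` wrapper below is a hypothetical helper, not part of Sprite, Aglets, or one.world; it assumes failures surface as unchecked exceptions and that re-running the acquisition (for example, a discovery lookup) is safe to repeat.

```java
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical helper that never treats a resource handle as permanent.
// If acquiring or using the resource fails, the handle is dropped and
// acquired again, up to a fixed number of attempts per call.
final class Reacquiring<R> {
    private final Supplier<R> acquire;
    private final int maxAttempts;
    private R current;

    Reacquiring(Supplier<R> acquire, int maxAttempts) {
        if (maxAttempts < 1) throw new IllegalArgumentException("maxAttempts must be >= 1");
        this.acquire = acquire;
        this.maxAttempts = maxAttempts;
    }

    <T> T use(Function<R, T> action) {
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                if (current == null) current = acquire.get();
                return action.apply(current);
            } catch (RuntimeException e) {
                last = e;
                current = null;   // resource went away: forget it and look again
            }
        }
        throw last;   // every attempt failed
    }
}
```

Bounding the attempts keeps a resource that has truly vanished from turning into an endless loop; a real system would likely also back off between attempts.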
Summary
• Sprite
  • Full migration, full transparency
  • Does not scale across a global network
• Aglets
  • Limited environment with limited migration
• one.world
  • Better balance between no migration and full migration (?)