Explore the proposed solution of movement-based check-pointing and logging for efficient recovery in mobile computing systems. Analyze performance, system model, results, and future work. Address challenges like mobility, failure recovery, and resource limitations in the mobile computing environment.
Movement-Based Check-pointing and Logging for Recovery in Mobile Computing Systems Sapna E. George, Ing-Ray Chen, Ying Jin Dept. of Computer Science, Virginia Polytechnic Institute and State University
Outline • Background • Problem Definition – Failure Recovery in the Mobile Computing Environment • Proposed Solution – Movement-Based Check-pointing and Logging • Performance Analysis • Analytic Model of the System • Analysis Results and Conclusions • Future Work
Mobile Computing • Advances in wireless networking and portable device technologies are revolutionizing computing • Mobile Computing – A type of distributed computing • Involves hosts that may be mobile • Host network connectivity maintained through wireless communications
Fault-tolerance in Distributed systems Check-pointing, Logging, Rollback recovery • Check-pointing failure-free operations • Save system state to stable storage • This snapshot is called a checkpoint • Logging failure-free operations • All non-deterministic events and the information necessary to replay these events are logged to the stable storage • In addition to checkpoints
Fault-tolerance in Distributed systems • Failure Recovery • Failed process rolls back to the latest checkpoint • Replays all the logged events in their original order • Recreates pre-failure state independently
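The checkpoint, log, and rollback steps above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the class `RecoverableProcess`, the `(key, delta)` event format, and the in-memory stand-ins for stable storage are all assumptions made for the example.

```python
import copy


class RecoverableProcess:
    """Toy process whose state is a dict of counters (hypothetical example)."""

    def __init__(self):
        self.state = {}
        # In-memory stand-ins for stable storage (assumption for this sketch).
        self.checkpoint = {}
        self.log = []

    def apply(self, event):
        # An event is a (key, delta) pair; applying it is deterministic.
        key, delta = event
        self.state[key] = self.state.get(key, 0) + delta

    def handle(self, event):
        # Log the event first so it can be replayed after a failure.
        self.log.append(event)
        self.apply(event)

    def take_checkpoint(self):
        # Save a snapshot; events before it no longer need to be replayed.
        self.checkpoint = copy.deepcopy(self.state)
        self.log.clear()

    def recover(self):
        # Roll back to the latest checkpoint, then replay the log in order.
        self.state = copy.deepcopy(self.checkpoint)
        for event in self.log:
            self.apply(event)
```

Because every event since the last checkpoint is logged, the process can rebuild its pre-failure state without contacting any other process.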
Problem Definition Failure Recovery in the Mobile Computing Environment
Effects of Properties of MC Env. • Mobility of hosts • If checkpointing requires coordination, the mobile host (MH) must first be located before control messages can be delivered; this increases communication delay • Data related to recovery, such as checkpoints and logs, may be distributed over many mobile support stations (MSSs); a mechanism is required for efficient storage, retrieval and management of this dispersed information
Effects of Properties of MC Env. • Low bandwidth and unreliable network connectivity • A recovery mechanism that requires a large number of messages or large size of messages imposes undue burden on the wireless resources and increases the cost of providing fault tolerance.
Effects of Properties of MC Env. • Limited battery life of host devices • Communication is energy intensive. • Recovery mechanism must keep communication (the number of messages and the size of messages) to a minimum.
Effects of Properties of MC Env. • Lack of stable storage on host devices • Devices are vulnerable to physical damage • Devices are small and are equipped with limited memory • MH’s disk cannot reliably function as the stable storage required to store recovery information.
Effects of Properties of MC Env. • Different types of ‘failures’ • Voluntary disconnection and hardware failure must be handled differently • A disconnected host may reconnect after a while and expect to resume operations • An MH that is currently unreachable cannot be expected to participate in a checkpointing or recovery operation. • A scheme that requires synchronization or coordination with other MHs would either block until the MH reconnects or fail.
The Problem… • Traditional recovery schemes suffer from many shortcomings when applied to the mobile computing environment. • The failure-prone nature of the environment makes it essential to provide some form of explicit recovery mechanism.
The Problem… • In general, application recovery mechanisms try to balance • Recovery cost (failure-free operational cost) • Recovery time • Storage requirements for recovery related information
The Problem… • Adaptations of traditional recovery schemes for the mobile computing environment • Do not consider mobility in the selection of checkpointing interval • Use periodic checkpointing • Subsequently control the proliferation of recovery information using techniques that merge logs and move the information closer to the MH.
Proposed Solution Movement-Based Check-pointing and Logging
Assumed Mobile Computing System • A set of mobile hosts (MHs) • They maintain network connectivity through a wireless link to a static mobile support station (MSS) • An MSS handles all communications to and from MHs within its area of influence, known as a cell • Each MSS is equipped with enough stable storage to store the state and log information
Assumed Mobile Computing System • Interactions between the MH and the network infrastructure relevant to failure recovery • Handoff – Cell boundary crossing • Disconnection – For power conservation • Reconnection – Possibly in a cell different from the one in which it disconnected
Assumed Mobile Computation • A distributed computation consists of a number of processes executing concurrently on multiple hosts. • Process states: • Normal – executes application-related computations, receives user inputs, or sends and receives messages • Save – saves its state as a checkpoint to the stable storage • Between checkpoints, the process also logs all events (Normal state) • Recovery – loads the checkpoint and applies the logs
Movement-Based Checkpointing and Logging • Interval between checkpoints is governed by the number of handoffs experienced by the MH and is not fixed • MH maintains a handoff counter which is incremented by 1 every time a handoff occurs. • When the value of the counter becomes greater than a threshold M, a checkpoint is taken. • In between checkpoints, all write events related to an MH are also logged to the local MSS.
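The handoff counter and threshold rule can be sketched as below. The class name `MovementBasedCheckpointer`, its method names, and the string MSS identifiers are hypothetical. Resetting the counter and discarding older logs once a checkpoint is taken are assumptions of this sketch; the slides do not state them.

```python
class MovementBasedCheckpointer:
    """Tracks handoffs for one MH and checkpoints when the count exceeds M."""

    def __init__(self, threshold_m, initial_mss):
        self.threshold_m = threshold_m
        self.current_mss = initial_mss
        self.handoff_count = 0
        self.checkpoint_mss = initial_mss  # MSS holding the latest checkpoint
        self.logs = {}                     # MSS id -> write events logged there

    def on_write(self, event):
        # Write events are logged at the MSS of the cell the MH is in.
        self.logs.setdefault(self.current_mss, []).append(event)

    def on_handoff(self, new_mss):
        # Each cell-boundary crossing increments the counter by 1.
        self.current_mss = new_mss
        self.handoff_count += 1
        if self.handoff_count > self.threshold_m:
            self.take_checkpoint()

    def take_checkpoint(self):
        # Assumption: the checkpoint is stored at the current MSS, after which
        # older logs are discarded and the counter starts again from zero.
        self.checkpoint_mss = self.current_mss
        self.logs.clear()
        self.handoff_count = 0
```

The comparison follows the wording above (counter greater than M). A variant that checkpoints on the M-th handoff itself would use `>=` instead.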
Movement-Based Checkpointing and Logging • The threshold M is a configurable parameter. It depends on: • User mobility rate • Failure rate • Application log arrival rate
Movement-Based Checkpointing and Logging • Thus, depending on the variability in the MH’s mobility, the time interval between successive checkpoints differs. • Recovery – MH recovers independently without coordination with other MHs • Upon reconnection, MH informs local MSS. • Local MSS contacts MSS with latest checkpoint • Local MSS contacts all MSS storing logs • All data transferred to local MSS via wired network and to MH via wireless link • MH rolls back and applies logs
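The recovery steps can be illustrated with a small sketch in which the local MSS has already gathered the latest checkpoint and the logs from the other MSSs. The function `recover_state`, the `(key, value)` write-event format, and the `visit_order` argument are assumptions for illustration. The slides do not specify how log order across MSSs is tracked.

```python
def recover_state(checkpoint_state, logs_by_mss, visit_order):
    """Rebuild MH state from the latest checkpoint plus logs held at MSSs.

    checkpoint_state: dict snapshot fetched from the MSS with the checkpoint.
    logs_by_mss: dict mapping MSS id -> list of (key, value) write events.
    visit_order: MSS ids in the order the MH visited them since the checkpoint.
    """
    # Roll back: start from a copy of the checkpointed state.
    state = dict(checkpoint_state)
    # Apply logs MSS by MSS in visit order, preserving the original event order.
    for mss_id in visit_order:
        for key, value in logs_by_mss.get(mss_id, []):
            state[key] = value
    return state
```

No other MH takes part in this procedure, which matches the independent recovery described above.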
Movement-Based Checkpointing and Logging • The performance of this scheme depends on identifying the optimal movement threshold M per user and application. • Checkpoints and logs remain within an acceptable range of the MH’s current location, which eliminates the need for information consolidation. • Ensures acceptable recovery time, since M bounds the number of MSSs from which logs must be retrieved.
Performance Analysis Analytic Model
SPN Model Parameters • Parameter Θk- Checkpoint rate of the MH • Parameter Θi- Recovery rate of the MH = inverse of recovery time • i - number of handoffs experienced by the MH since the last checkpoint and before failure.
Analytic Model – Recovery Time • Treq_rec - Time spent on recovery information requests • Nmss_logs – Number of MSSs storing logs • Dmss - average hop count between MSScp and MSSrec
Analytic Model – Recovery Time • Tckp_tx - Time spent on transmitting the latest checkpoint to the MH • Tlog_tx - Time spent on transmitting the logs to the MH • Trec - Time spent to rollback to the last checkpoint and apply the logs
Analytic Model – Cost of Recovery • Tr – Average recovery time per failure • Fr – Recovery probability • Tc – Cost of recovery (equation terms labeled on the slide: no. of checkpoints before failure, no. of logs before failure)
SPN Evaluation Parameters • Size of a log entry – 50 B • Size of a checkpoint – 2000 B • Bandwidth of wired network – 2 Mbps • Ratio of bandwidth of wireless to wired network (r) – 0.1 • Time required to apply a log entry (Telog) – 0.0001 s • Time required to transmit a log entry through the wireless channel (Tlog_w) – 0.002 s • Time required to transmit a checkpoint through the wireless channel (Tckp_w) – 0.08 s
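The two wireless transmission times follow from the sizes and bandwidths listed above, as this short check shows. It assumes 1 Mbps = 10^6 bits per second and 8 bits per byte. The helper names `tx_time` and `wireless_transfer_time` are hypothetical, and the latter covers only the wireless hop plus log application, not request or wired-network time.

```python
WIRED_BPS = 2_000_000     # 2 Mbps wired bandwidth (assuming 1 Mbps = 10**6 bit/s)
WIRELESS_RATIO = 0.1      # r: wireless-to-wired bandwidth ratio
LOG_ENTRY_BYTES = 50
CHECKPOINT_BYTES = 2000
T_APPLY_LOG = 0.0001      # Telog: seconds to apply one log entry

wireless_bps = WIRED_BPS * WIRELESS_RATIO


def tx_time(size_bytes, bits_per_second):
    """Seconds needed to send size_bytes over a link of the given bandwidth."""
    return size_bytes * 8 / bits_per_second


t_log_w = tx_time(LOG_ENTRY_BYTES, wireless_bps)    # matches Tlog_w
t_ckp_w = tx_time(CHECKPOINT_BYTES, wireless_bps)   # matches Tckp_w


def wireless_transfer_time(n_log_entries):
    """Hypothetical helper: time to send one checkpoint and n log entries over
    the wireless link and then apply the log entries."""
    return t_ckp_w + n_log_entries * (t_log_w + T_APPLY_LOG)
```

With these values, transmitting a log entry over the wireless link takes 20 times longer than applying it, so transmission dominates the per-entry cost in this part of the recovery.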
Performance Analysis Results and Conclusions
Determining Optimal Movement Threshold that Minimizes Recovery Cost Per Failure
Conclusion – Proposed Scheme • An efficient failure recovery scheme for mobile computing systems based on movement-based checkpointing and logging • Movement-based checkpointing and logging scheme takes a checkpoint only after the mobile node has made M movements (mobility handoffs). • The value of M is governed by the failure rate, log arrival rate, and the mobility rate of the application and MH. • Identify the optimal movement threshold M, when given the failure, mobility and log arrival rates, to minimize the cost of recovery per failure.
Conclusion – Practical Application • Build a table at configuration time covering possible parameter values of the mobility rate and failure rate of the MH and log arrival rate of the mobile applications, and listing the optimal M value that would minimize the recovery cost per failure. • At runtime, based on the measured rates, the optimal M may be selected dynamically to minimize the recovery cost per failure. • Optimal M selected must also satisfy the specified recovery probability when given an application deadline to recover from a failure.
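The table-driven selection of M could look like the sketch below. Everything named here is hypothetical: `build_m_table`, `lookup_m`, the nearest-grid-point matching, and especially `cost_fn`, which stands in for the recovery cost per failure obtained from the analytic model and is not reproduced here. The recovery-probability constraint in the last bullet is not modeled.

```python
import itertools


def build_m_table(mobility_rates, failure_rates, log_rates, cost_fn, m_candidates):
    """Configuration time: map each (mobility, failure, log) rate triple on the
    grid to the candidate M that minimizes cost_fn(m, mobility, failure, log)."""
    table = {}
    for key in itertools.product(mobility_rates, failure_rates, log_rates):
        table[key] = min(m_candidates, key=lambda m: cost_fn(m, *key))
    return table


def lookup_m(table, mobility, failure, log_rate):
    """Runtime: return the M stored for the grid point nearest to the measured
    rates. Uses summed relative differences; grid rates must be positive."""
    measured = (mobility, failure, log_rate)

    def distance(key):
        return sum(abs(g - x) / g for g, x in zip(key, measured))

    return table[min(table, key=distance)]
```

A caller would supply its own cost function and rate grid, build the table once, and then call `lookup_m` whenever the measured rates change.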