1 / 1

Crash Fault Detection in Celerating Environments

Real Time. And so on, leading to an infinite stream of mistakes!. Msg Send. Msg Recv. 4k action-clock ticks. Conclusion. Implementing ◊P. Failure Detectors. Measuring Time. Estimate on Round-trip time is k real-time ticks. Eventually Perfect Failure Detector. ….

auryon
Download Presentation

Crash Fault Detection in Celerating Environments

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Real Time And so on, leading to an infinite stream of mistakes! Msg Send Msg Recv 4k action-clock ticks Conclusion Implementing ◊P Failure Detectors Measuring Time Estimate on Round-trip time is k real-time ticks Eventually Perfect Failure Detector …. 2k action-clock ticks Timeout! False suspicion …. Msg Send Msg Recv New estimate on RTT is now 2k action ticks Timeout! False suspicion Process Speed 2k action-clock ticks New estimate on Round-trip time is now 2k real-time ticks (Process Speed ) Process Step Time k action-clock ticks …. Timeout! False suspicion Estimated bound on RTT - k action ticks De facto bound on Round-Trip Time (RTT) …. Msg Send Msg Recv k action-clock ticks Timeout! False suspicion And so on, leading to an infinite stream of mistakes! Real Time Real-time Clocks in Decelerating Environments Solving the Celeration Problem Bi-Chronal Timers in Non-Celerating Environments Slower processes  Longer duration to generate and process messages  Unbounded RTT (in real time) • Bi-chronal timer • A vectored composition of action timer and real-time timer. • Measures time in terms of actions as well as real-time. • All processes use separate local bi-chronal timers. • Timer expires only when both action timer and the real-time timer expire. • The action timer insulates ◊P from deceleration. • The real-time timer insulates ◊P from acceleration. • Bi-chronal clocks insulate ◊P from transient network behavior. • Hardware upgrades often accelerate process speeds • Action clocks precipitate ◊P mistakes during acceleration • Bi-chronal clocks are immune to acceleration • Multiple process crashes (in a server farm), DoS attacks, and such can decelerate processes to a crawl • Real-time clocks precipitate ◊P mistakes during deceleration • Bi-chronal clocks are immune to deceleration • Many existing ◊P implementations are subtly broken • Bi-chronal clocks provide a simple solution • Additionally, they insulate systems from transient behavior • Future work: • Properties and behavior of Bi-chronal clocks • Use of Bi-chronal clocks in other applications • Other approaches to dealing with Celeration Crash Fault Detection in Celerating Environments Distributed Systems Crash Detection and System Models A collection of autonomous computers (processes) connected through a communication network ◊P may make mistakes initially, but eventually provides perfect information • Asynchrony: Unbounded message delay and process speeds • Synchrony: Known bounds on message delay and process speeds • Partial Synchrony: Between synchrony and asynchrony • Failure detectors: Distributed system service to detect process crashes. • Failure detector provide (potentially) incorrect information. • Still powerful enough to solve important problems. • E.g., distributed consensus, leader election, wait-free scheduling, contention management. • Failure detector implementations often require partial synchrony. • One well known failure detector is ◊P, the eventually perfect failure detector. Fault Pattern 1 Live Crashed … Crash! ◊P outputs Crashed … Live Crashed … Asynchrony Permissive Model Crash Detection Impossible Synchrony Restrictive Model Crash Detection Possible Crash! Live Crashed … Fault Pattern 2 • But processes can crash! • Maintain correctness despite crashes • Fault tolerance through crash detection • Crash detection determined by synchronism in the system Live Partial Synchrony Crash Detection Possible Greater Fidelity to Real World Systems ◊P outputs Live Crashed Live Live Crashed Live Local Adaptive Estimation of RTT Action Clocks in Accelerating Environments Faster processes  More action-clock ticks per RTT  Action clock timer continually times out • Implementable under (some models of) partial synchrony. • Popular model: Unknown bounds on message delay () • and relative process speeds (). • Start timer with some arbitrary (small) value • If timer expires without receiving a message, suspect the process • If a message arrives after timer expiry, trust the process and increase the timer value. • Eventually timer value exceeds the bound on RTT. • After which correct processes will never be suspected. • Any crashed process is permanently suspected. • Two techniques: • Action clocks: Counting the number of actions • Real-time clocks: Independent device to measure time (e.g., hardware clocks, NTP). • Either technique works in environments that do NOT accelerate or decelerate arbitrarily • But in Celerating environments, where processes can accelerate or decelerate arbitrarily, each technique fails independently. Round Trip Time (RTT) = Outgoing message delay + message processing time + incoming message delay Incoming message delay ≤ ACK ≤ f() Outgoing message delay ≤  PING Local ◊P module But how do processes measure time? Ack generation Time ≤ f() RTT ≤  + f() +   RTT is bounded above! This bound on RTT can be adaptively estimated.

More Related