30 likes | 279 Views
Real Time. And so on, leading to an infinite stream of mistakes!. Msg Send. Msg Recv. 4k action-clock ticks. Conclusion. Implementing ◊P. Failure Detectors. Measuring Time. Estimate on Round-trip time is k real-time ticks. Eventually Perfect Failure Detector. ….
E N D
Real Time And so on, leading to an infinite stream of mistakes! Msg Send Msg Recv 4k action-clock ticks Conclusion Implementing ◊P Failure Detectors Measuring Time Estimate on Round-trip time is k real-time ticks Eventually Perfect Failure Detector …. 2k action-clock ticks Timeout! False suspicion …. Msg Send Msg Recv New estimate on RTT is now 2k action ticks Timeout! False suspicion Process Speed 2k action-clock ticks New estimate on Round-trip time is now 2k real-time ticks (Process Speed ) Process Step Time k action-clock ticks …. Timeout! False suspicion Estimated bound on RTT - k action ticks De facto bound on Round-Trip Time (RTT) …. Msg Send Msg Recv k action-clock ticks Timeout! False suspicion And so on, leading to an infinite stream of mistakes! Real Time Real-time Clocks in Decelerating Environments Solving the Celeration Problem Bi-Chronal Timers in Non-Celerating Environments Slower processes Longer duration to generate and process messages Unbounded RTT (in real time) • Bi-chronal timer • A vectored composition of action timer and real-time timer. • Measures time in terms of actions as well as real-time. • All processes use separate local bi-chronal timers. • Timer expires only when both action timer and the real-time timer expire. • The action timer insulates ◊P from deceleration. • The real-time timer insulates ◊P from acceleration. • Bi-chronal clocks insulate ◊P from transient network behavior. • Hardware upgrades often accelerate process speeds • Action clocks precipitate ◊P mistakes during acceleration • Bi-chronal clocks are immune to acceleration • Multiple process crashes (in a server farm), DoS attacks, and such can decelerate processes to a crawl • Real-time clocks precipitate ◊P mistakes during deceleration • Bi-chronal clocks are immune to deceleration • Many existing ◊P implementations are subtly broken • Bi-chronal clocks provide a simple solution • Additionally, they insulate systems from transient behavior • Future work: • Properties and behavior of Bi-chronal clocks • Use of Bi-chronal clocks in other applications • Other approaches to dealing with Celeration Crash Fault Detection in Celerating Environments Distributed Systems Crash Detection and System Models A collection of autonomous computers (processes) connected through a communication network ◊P may make mistakes initially, but eventually provides perfect information • Asynchrony: Unbounded message delay and process speeds • Synchrony: Known bounds on message delay and process speeds • Partial Synchrony: Between synchrony and asynchrony • Failure detectors: Distributed system service to detect process crashes. • Failure detector provide (potentially) incorrect information. • Still powerful enough to solve important problems. • E.g., distributed consensus, leader election, wait-free scheduling, contention management. • Failure detector implementations often require partial synchrony. • One well known failure detector is ◊P, the eventually perfect failure detector. Fault Pattern 1 Live Crashed … Crash! ◊P outputs Crashed … Live Crashed … Asynchrony Permissive Model Crash Detection Impossible Synchrony Restrictive Model Crash Detection Possible Crash! Live Crashed … Fault Pattern 2 • But processes can crash! • Maintain correctness despite crashes • Fault tolerance through crash detection • Crash detection determined by synchronism in the system Live Partial Synchrony Crash Detection Possible Greater Fidelity to Real World Systems ◊P outputs Live Crashed Live Live Crashed Live Local Adaptive Estimation of RTT Action Clocks in Accelerating Environments Faster processes More action-clock ticks per RTT Action clock timer continually times out • Implementable under (some models of) partial synchrony. • Popular model: Unknown bounds on message delay () • and relative process speeds (). • Start timer with some arbitrary (small) value • If timer expires without receiving a message, suspect the process • If a message arrives after timer expiry, trust the process and increase the timer value. • Eventually timer value exceeds the bound on RTT. • After which correct processes will never be suspected. • Any crashed process is permanently suspected. • Two techniques: • Action clocks: Counting the number of actions • Real-time clocks: Independent device to measure time (e.g., hardware clocks, NTP). • Either technique works in environments that do NOT accelerate or decelerate arbitrarily • But in Celerating environments, where processes can accelerate or decelerate arbitrarily, each technique fails independently. Round Trip Time (RTT) = Outgoing message delay + message processing time + incoming message delay Incoming message delay ≤ ACK ≤ f() Outgoing message delay ≤ PING Local ◊P module But how do processes measure time? Ack generation Time ≤ f() RTT ≤ + f() + RTT is bounded above! This bound on RTT can be adaptively estimated.