Linear Time Byzantine Self-Stabilizing Clock Synchronization. Ariel Daliot¹, Danny Dolev¹, Hanna Parnas². ¹School of Engineering and Computer Science, ²Department of Neurobiology and the Otto Loewi Center for Cellular and Molecular Neurobiology, The Hebrew University of Jerusalem, Israel
This research is supported in part by Intel COMM Grant - Internet Network/Transport Layer & QoS Environment (IXA)
Lecture Outline • What is “Pulse Synchronization”? • Examples of pulse synchronization in nature • A biologically inspired pulse synchronization algorithm for distributed computer networks • Efficient Byzantine self-stabilizing clock synchronization on top of pulse synchronization
The target is to synchronize pulses from any state and in the presence of faults
[Figure: pulse trains of several nodes along a time axis t, with one cycle marked; the pulses move from an arbitrary state to a synchronized state]
Self-Stabilizing “Pulse Synchronization” • Convergence: starting from an arbitrary state s, the system reaches a synchronized state in finite time • Closure: if s is a synchronized state of the system at real time $t_0$, then for every real time $t \ge t_0$ the system state at time $t$ is a synchronized state • «Linear Envelope»: for every correct node $p$, $\alpha (t - t_0) + \beta \le \psi_p(t_0, t) \le \gamma (t - t_0) + \eta$, where $\psi_p(t_1, t_2)$ is the number of pulses a correct node $p$ invoked during a real-time interval $[t_1, t_2]$ within which $p$ was continuously correct
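The «Linear Envelope» condition can be checked mechanically for a recorded pulse trace. The sketch below is a minimal illustration, assuming a sorted list of pulse times for one continuously correct node; the function names are hypothetical and the four envelope constants are made-up values (0.1 corresponds to one pulse per 10 time units), since the slides give none.

```python
from bisect import bisect_left, bisect_right


def pulses_in_interval(pulse_times, t1, t2):
    """psi_p(t1, t2): number of pulses invoked in [t1, t2]; pulse_times must be sorted."""
    return bisect_right(pulse_times, t2) - bisect_left(pulse_times, t1)


def within_linear_envelope(pulse_times, t0, t, alpha, beta, gamma, eta):
    """Check alpha*(t - t0) + beta <= psi_p(t0, t) <= gamma*(t - t0) + eta."""
    elapsed = t - t0
    psi = pulses_in_interval(pulse_times, t0, t)
    return alpha * elapsed + beta <= psi <= gamma * elapsed + eta


regular = list(range(0, 101, 10))   # hypothetical trace: a pulse every 10 time units
sparse = [0, 50, 100]               # hypothetical trace with too few pulses
print(within_linear_envelope(regular, 0, 100, 0.1, -1, 0.1, 2))   # True
print(within_linear_envelope(sparse, 0, 100, 0.1, -1, 0.1, 2))    # False
```

The lower bound rules out a node that stops pulsing; the upper bound rules out a node that pulses too often.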
Fault Models • Many problems are trivial without faults; some are unsolvable with even a single fault (e.g. Byzantine Generals) • Common fault models: crash, link and message faults • Byzantine failures (“malicious” faults) • Usually proven to require n > 3f to tolerate f faults • Not solvable for some problems • Transient faults (the system is in an arbitrary state, i.e. total chaos) • Require self-stabilizing algorithms in order to overcome them • Not solvable for some problems (e.g. real-time clock synchronization)
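The n > 3f requirement translates into simple arithmetic: the largest tolerable f for a given n, and the smallest n for a given f. A small sketch (function names are illustrative only):

```python
def max_byzantine_faults(n: int) -> int:
    """Largest f satisfying n > 3f, i.e. floor((n - 1) / 3)."""
    if n < 1:
        raise ValueError("n must be positive")
    return (n - 1) // 3


def min_nodes(f: int) -> int:
    """Smallest n satisfying n > 3f."""
    return 3 * f + 1


for n in (3, 4, 7, 10):
    print(n, max_byzantine_faults(n))
```

For example, three nodes cannot tolerate a single Byzantine node, while four can.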
Self-Stabilization • Addresses the situation in which ALL nodes can concurrently be faulty for a limited period of time • A self-stabilizing (SS) algorithm realizes its task once the system is back within its assumption boundaries • Is orthogonal to Byzantine failures, i.e. these are uncorrelated fault models • Byzantine algorithms typically focus on limiting the influence of faulty nodes once the task has been realized • Self-stabilizing algorithms focus on realizing the task following a “catastrophic” state
Synchrony phenomena in biology • The phenomenon of synchronization is displayed by many biological systems • Synchronized flashing of the male malaccae fireflies • Oscillations of the neurons in the circadian pacemaker, determining the day-night rhythm • Crickets that chirp in unison • Coordinated mass spawning in corals • Audience clapping together after a “good” performance • We were inspired by the pacemaker network in the cardiac ganglion of lobsters
Cardiac ganglion of the lobster (Sivan, Dolev & Parnas, 2000)
[Figure: pulse trains of the interneurons and of the motor neurons they drive]
• Four interneurons tightly synchronize their pulses in order to give the heart its optimal pulse rate (though one is enough for activation) • Able to adjust the synchronized firing pace, up to a certain bound (e.g. while escaping a predator)
A related problem: real-time Clock Synchronization (rCS) There exist $\gamma$, $t_0$, $\nu$, $a$ and $b$ such that for all $t \ge t_0$: • Agreement: for any correct nodes $p, q$: $|C_p(t) - C_q(t)| \le \gamma$ (precision) • Validity: for every correct node $p$: $(1+\nu)^{-1} t + a \le C_p(t) \le (1+\nu) t + b$ (accuracy) • Optimal precision is $d(1 - 1/n)$ • Optimal accuracy is $\nu =$ …
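Both rCS conditions can be checked at a single real time t for a snapshot of correct clock values. This is a minimal sketch, assuming the snapshot is given as a dict from node to clock value; the function name and all numbers are hypothetical. Pairwise agreement within γ is equivalent to the spread (max minus min) being at most γ.

```python
def check_rcs(clocks, t, gamma, nu, a, b):
    """Check the rCS conditions at one real time t.

    clocks maps each correct node to its clock value C_p(t).
    Returns (agreement, validity).
    """
    values = list(clocks.values())
    # Agreement: every pair within gamma  <=>  max - min <= gamma.
    agreement = max(values) - min(values) <= gamma
    # Validity: (1 + nu)^-1 * t + a <= C_p(t) <= (1 + nu) * t + b.
    lower = t / (1 + nu) + a
    upper = (1 + nu) * t + b
    validity = all(lower <= c <= upper for c in values)
    return agreement, validity


print(check_rcs({"p": 100.2, "q": 99.9}, t=100, gamma=0.5, nu=0.01, a=-1, b=1))
```

Clocks that agree with each other but have drifted far from real time (e.g. both reading 150 at t = 100) pass agreement and fail validity, which is exactly the gap between pulse synchronization and rCS.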
real-time Clock Synchronization • rCS has two additional constraints compared to pulse synchronization: • The pulses have labels (“the time”) • The time needs to approximate real time • Most Byzantine rCS algorithms follow these principles: • Periodically, the computers exchange clock values • They apply some function to the received values (which seeks to neutralize the effect of the Byzantine values and set the clocks close to each other)
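The slide leaves the adjustment function unspecified. One well-known choice in the clock synchronization literature is the fault-tolerant midpoint, sketched here as an illustration only (not necessarily the function the authors use): discard the f smallest and f largest readings, then take the midpoint of the remaining extremes.

```python
def fault_tolerant_midpoint(values, f):
    """Drop the f smallest and f largest readings; return the midpoint of the rest.

    With at most f Byzantine readings among more than 2f, the result lies
    within the range of the correct readings.
    """
    if len(values) <= 2 * f:
        raise ValueError("need more than 2f readings")
    trimmed = sorted(values)[f:len(values) - f]
    return (trimmed[0] + trimmed[-1]) / 2


# n = 4, f = 1: one Byzantine node reports a wild value.
print(fault_tolerant_midpoint([100, 102, 101, 9999], f=1))   # 101.5
```

Trimming f values from each end guarantees that both remaining extremes are bracketed by correct values, so a single liar cannot drag the result outside the correct range.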
real-time Clock Synchronization: impossibility result with no external time source => Exchanging clock values and adjusting them works only if the clocks initially have close values => This implies rCS cannot be solved when all clocks hold arbitrary times => This means there is no self-stabilizing algorithm for rCS, i.e. if the clocks are initially far apart they cannot both synchronize AND estimate real time => Internal rCS algorithms assume the clocks are initially synchronized
FAB8 AIT WSR - ww16-17/2003 Main outages and issues: …. Synchronize problem in VAX - On WW16.5 - Impact: 8 CW SC's unable to introduce lots for a period of 4 hr and 15 min (from 23:00 until 03:15). Root cause: the job which synchronizes the time between the VAXes failed at 22:00 (Thursday night) and created gaps between the machines' clocks. This gap caused the remotes which worked with CW* to get into loop status, with an FCM error message. Solution: Time synchronized. Helpdesk will get alerts if this job fails again.
A Distributed System according to Lamport “A distributed system is one in which the failure of a computer you didn't even know existed can render your own computer unusable.”
Applicability of logical clocks • Many algorithms that depend on clock synchronization actually require only synchronized logical clocks • E.g. TDMA, Kerberos tickets, DHCP leases, global snapshots, database timestamps and many others • Why not, then, use a self-stabilizing Byzantine clock synchronization algorithm that synchronizes logical time?
Self-Stabilizing Byzantine Clock Synchronization • Because the best previously known self-stabilizing Byzantine clock synchronization algorithm converges in expected $(n-f) \cdot n^{6(n-f)}$ time! (Dolev-Welch, 95) • The difficulty lies in the fact that: • the initial clock values can differ arbitrarily • there is no agreed time for exchanging the values and setting the clock according to the values received • the clocks can wrap around
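To get a feel for the growth of the Dolev-Welch figure, it can be evaluated for small systems. This sketch reads the slide's bound as $(n-f) \cdot n^{6(n-f)}$ and treats it as a pure number (the time unit of the original analysis is not stated here); for each n it uses the largest f with n > 3f.

```python
def dolev_welch_expected(n: int, f: int) -> int:
    """(n - f) * n**(6 * (n - f)), the expected convergence figure quoted on the slide."""
    return (n - f) * n ** (6 * (n - f))


for n in (4, 7, 10):
    f = (n - 1) // 3   # largest f with n > 3f
    print(n, f, dolev_welch_expected(n, f))
```

Even the smallest Byzantine-tolerant system (n = 4, f = 1) yields about 2 × 10^11, which motivates the search for a linear-time alternative.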
Clock Synchronization using synchronized pulses • We assume no outside source of real time • At every pulse, exchange clock values and apply some clock adjustment function to the received multiset • If the clocks were initially close to real time, they will stay close to real time • If not, the clocks will proceed synchronously, close to logical time. This scheme yields a Byzantine self-stabilizing clock synchronization algorithm whose convergence time, accuracy and precision are on the order of those of existing rCS algorithms
The Byzantine Self-Stabilizing Clock Synchronization Algorithm
At “pulse” event
Begin
  Clock := ET;
  Wait for every correct node to invoke a pulse;
  ET := SS-Strong-Byz-Agreement((ET + cycle) mod M);
End
ET: Expected Time of the next pulse • cycle: expected elapsed logical time until the next pulse • M: bound on the clock value
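A minimal simulation of the pulse handler, under strong simplifying assumptions: all nodes are correct, all invoke the pulse at the same instant (so the "wait" step is trivial), and SS-Strong-Byz-Agreement is replaced by a hypothetical idealized stub, `idealized_agreement`, that returns the same value to every node and returns the common value when all proposals are equal. M = 1000 and CYCLE = 100 are hypothetical values.

```python
from collections import Counter

M = 1000      # hypothetical bound on clock values
CYCLE = 100   # hypothetical logical time between consecutive pulses


def idealized_agreement(proposals):
    """Stand-in for SS-Strong-Byz-Agreement (an assumption, not the real protocol).

    Every node receives the same output; if all proposals are equal, that value
    is returned. Here: the most frequent proposal, ties broken by the smallest.
    """
    counts = Counter(proposals)
    best = max(counts.values())
    return min(value for value, count in counts.items() if count == best)


def on_pulse(et_values):
    """Run the pulse handler at every node; et_values[i] is node i's ET."""
    clocks = list(et_values)                            # Clock := ET
    proposals = [(et + CYCLE) % M for et in et_values]  # (ET + cycle) mod M
    agreed = idealized_agreement(proposals)             # same output at every node
    return clocks, [agreed] * len(et_values)            # new ET at every node


# Arbitrary initial ETs: clocks disagree after the first pulse, agree from the second on.
clocks, et = on_pulse([5, 730, 42, 999])
clocks, et = on_pulse(et)
print(clocks, et)
```

One agreement suffices to make every ET identical; afterwards the clocks advance in lockstep by `CYCLE` per pulse, and the `mod M` handles wrap-around.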
The Byzantine Self-Stabilizing Pulse Synchronization Algorithm
if (cycle_countdown = 0) then send “Propose-Pulse” message to all;
if (received f+1 distinct “Propose-Pulse” messages) then send “Propose-Pulse” message to all;
if (received n-f distinct “Propose-Pulse” messages) then
  invoke “pulse” event;
  cycle_countdown := cycle;
  flush “Propose-Pulse” message counter;
  ignore “Propose-Pulse” messages for 2d(1+ρ) time;
(ρ: bound on the clock drift rate)
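The threshold rules above can be exercised in a lock-step simulation. This is a sketch under assumptions not in the slides: all simulated nodes are correct, broadcast is instantaneous and reliable, time advances in integer ticks, and the post-pulse ignore window is modeled by a hypothetical `ignore_window` parameter measured in ticks. The class and driver names are illustrative.

```python
class PulseNode:
    """One correct node running the threshold rules of the pulse algorithm."""

    def __init__(self, node_id, n, f, cycle, countdown, ignore_window=2):
        self.node_id, self.n, self.f, self.cycle = node_id, n, f, cycle
        self.countdown = countdown          # arbitrary initial value (self-stabilization)
        self.ignore_window = ignore_window  # hypothetical stand-in for the ignore period
        self.ignore = 0
        self.senders = set()                # distinct senders of "Propose-Pulse"
        self.sent = False
        self.pulses = 0

    def _propose(self):
        self.sent = True
        return [self.node_id]               # broadcast "Propose-Pulse" from this node

    def tick(self):
        """Advance local time by one unit."""
        if self.ignore > 0:
            self.ignore -= 1
        if self.countdown > 0:
            self.countdown -= 1
        if self.countdown == 0 and not self.sent:
            return self._propose()
        return []

    def receive(self, sender):
        """Handle a "Propose-Pulse" from sender; return messages to broadcast."""
        if self.ignore > 0:
            return []
        self.senders.add(sender)
        out = []
        if len(self.senders) >= self.f + 1 and not self.sent:
            out = self._propose()           # relay: at least one correct node proposed
        if len(self.senders) >= self.n - self.f:
            self.pulses += 1                # invoke "pulse" event
            self.countdown = self.cycle
            self.senders.clear()            # flush the message counter
            self.sent = False
            self.ignore = self.ignore_window
        return out


def run(nodes, steps):
    """Lock-step simulation with instantaneous, reliable broadcast."""
    for _ in range(steps):
        queue = []
        for node in nodes:
            queue.extend(node.tick())
        while queue:
            sender = queue.pop(0)
            for node in nodes:
                queue.extend(node.receive(sender))


nodes = [PulseNode(i, n=4, f=1, cycle=100, countdown=c)
         for i, c in enumerate([3, 50, 80, 100])]
run(nodes, 150)
print([node.pulses for node in nodes])
```

A lone proposal (which a single Byzantine node could also send) is never relayed; f+1 distinct proposals force correct nodes to join, and n-f trigger the pulse. The ignore window discards proposals still in flight from the pulse that just fired, which would otherwise be counted towards the next one.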
The Self-Stabilizing Byzantine Strong Agreement Algorithm • Any Strong Byzantine Agreement algorithm can be used • Agreement and validity are not ensured until the pulses synchronize • Self-stabilization is supported by counting recovering nodes as correct only after cycle + time-for-BA of correct behavior • We use a slightly modified version of the Toueg, Perry and Srikanth (1987) Strong Byzantine Agreement algorithm • It has the advantage of “early stopping”: if all correct nodes start with identical values, termination is within 2 rounds • Hence, during continuous correct system behavior, clock synchronization is maintained with very little overhead
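The rule for counting recovering nodes can be phrased as a predicate over a model of the system. The sketch below takes an analysis viewpoint, assuming we know when each node last started behaving correctly; `counted_as_correct` and `t_ba` (a bound on the duration of one Byzantine agreement) are hypothetical names.

```python
def counted_as_correct(correct_since, now, cycle, t_ba):
    """Return the nodes that count as correct at time `now`.

    correct_since maps node -> time it last started behaving correctly
    (None if currently faulty). A recovering node counts only after
    cycle + t_ba of continuous correct behavior.
    """
    return {
        node
        for node, since in correct_since.items()
        if since is not None and now - since >= cycle + t_ba
    }


print(sorted(counted_as_correct({"a": 0, "b": 90, "c": None}, now=130, cycle=100, t_ba=20)))
```

Node "b" has recovered but has not yet gone through a full cycle plus one agreement, so it is not yet relied upon.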
The teaching of Pythagoras • “Evolution is the law of life” • “Unity is the law of God” • “Number is the law of the universe”
Questions?
Related Problems • Digital Clock Synchronization • Agreement on pulse counters, with or without a global pulse • Clock Synchronization • Common notion of real time, high precision and accuracy • Phase Clocks • Agreement on pulse counters in asynchronous settings • Synchronized Rates • Clocks progress at approximately the same rate, though the times may differ • Firing Squad • All nodes enter the same state in step k after a process has initiated fire • Pulse Synchronization • Precise synchronization of regular pulses, with slack linear-envelope accuracy