Learn how to specify and reason about network protocols, including state-based specifications, correctness reasoning, safety, progress, invariants, fault-tolerance, and self-stabilization. Explore issues in distributed computing and a self-stabilizing leader election protocol example.
Specifying and reasoning about network protocols Vinod Kulathumani West Virginia University
Outline • Why we need tools for specifying network protocols • State-based specification • Guarded commands • Reasoning about correctness • Safety • Progress • Invariants • Fault-tolerance and self-stabilization
Network protocols • Constituted by the interaction of a set of processes • Networks are closed • Processes communicate by messages • Can be specified • In English • Using timing diagrams • In C [very specific to implementation] • In a state-based language
Issues in distributed computing • Absence of shared clock • Absence of shared memory • Handling faults • Hard to even detect failures • Handling concurrency • Handling scale
State-based specification • Process process-name • Variables var • Actions • guard1 -> action1 • guard2 -> action2 • Actions are guarded by predicates • Any enabled action can be picked non-deterministically • Fairness assumptions may be made
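The guarded-command model above can be sketched as a small interpreter. This is a minimal illustration, not part of the original slides: the `run` function, the `(name, guard, body)` tuple layout, and the two-counter example process are all hypothetical. Picking uniformly at random among enabled actions stands in for non-deterministic choice and gives fairness only in a probabilistic sense.

```python
import random
from typing import Callable, Dict, List, Tuple

State = Dict[str, int]
# An action is (name, guard, body): the body may run only when the guard holds.
Action = Tuple[str, Callable[[State], bool], Callable[[State], None]]


def run(state: State, actions: List[Action], rng: random.Random,
        max_steps: int = 1000) -> int:
    """Repeatedly execute any one enabled action; return the number of steps taken."""
    for step in range(max_steps):
        enabled = [a for a in actions if a[1](state)]
        if not enabled:
            return step  # no guard holds: the program is at a fixed point
        _, _, body = rng.choice(enabled)  # non-deterministic pick, modeled as random
        body(state)
    return max_steps


def inc(var: str) -> Callable[[State], None]:
    """Build an action body that increments one variable."""
    def body(s: State) -> None:
        s[var] += 1
    return body


# Hypothetical example process: two counters that each climb to 3.
actions: List[Action] = [
    ("inc_x", lambda s: s["x"] < 3, inc("x")),
    ("inc_y", lambda s: s["y"] < 3, inc("y")),
]
state: State = {"x": 0, "y": 0}
steps = run(state, actions, random.Random(0))
```

Whatever order the scheduler picks, this example ends in the same fixed point, which is the property the later slides reason about.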
Reasoning • Invariant • A state predicate that continues to hold when the actions are executed in any process in any order • Proving safety • Find an invariant that satisfies “correctness” and show that it holds for the protocol • Fixed point • A predicate that belongs to invariant and is a terminating condition [no more actions are enabled] • Proving progress • Show that program eventually terminates • Sometimes shown using a “variant” function
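For a small enough state space, the reasoning steps above can be checked exhaustively. The sketch below uses a hypothetical one-action program (move one unit from `x` to `y` while `x > 0`); the names `invariant`, `variant`, and `check` are illustrative assumptions. It checks closure of the invariant under every enabled action, checks that a variant function strictly decreases, and lists the fixed points.

```python
from typing import Callable, List, Tuple

State = Tuple[int, int]  # (x, y)
N = 5                    # hypothetical total number of units in the system

# Each action is (guard, effect); the effect returns the successor state.
actions: List[Tuple[Callable[[State], bool], Callable[[State], State]]] = [
    (lambda s: s[0] > 0, lambda s: (s[0] - 1, s[1] + 1)),  # move one unit x -> y
]


def invariant(s: State) -> bool:
    """Safety property: units are neither created nor destroyed."""
    return s[0] >= 0 and s[1] >= 0 and s[0] + s[1] == N


def variant(s: State) -> int:
    """Bounded below by 0 inside the invariant; must drop on every step."""
    return s[0]


def check(states: List[State]) -> bool:
    """True if the invariant is closed and the variant decreases on every step."""
    for s in states:
        if not invariant(s):
            continue
        for guard, effect in actions:
            if guard(s):
                t = effect(s)
                if not invariant(t) or not variant(t) < variant(s):
                    return False
    return True


all_states = [(x, y) for x in range(N + 1) for y in range(N + 1)]
closed_and_progressing = check(all_states)
# Fixed points: invariant states in which no action is enabled.
fixed_points = [s for s in all_states
                if invariant(s) and not any(g(s) for g, _ in actions)]
```

Closure gives safety; the decreasing, bounded variant gives progress, since the program cannot take steps forever without reaching a fixed point.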
Reasoning • Fault model • A set of faults that can move the program out of its “invariant” states • For example, node faults and network faults • Self-stabilization • Show that irrespective of the initial condition, the protocol converges to the “invariant” • If no more faults occur, the protocol stays within the invariant
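Convergence from an arbitrary initial state can be illustrated with a toy program that is not from the slides: nodes on a line, where node j copies node j-1 whenever the two differ. The invariant is "all values equal node 0's value". The function names and the 4-node, 3-value setup are assumptions chosen so that every initial state can be enumerated.

```python
import itertools
import random
from typing import List, Sequence, Tuple


def enabled(vals: Sequence[int]) -> List[int]:
    """Indices j whose guard vals[j] != vals[j-1] currently holds."""
    return [j for j in range(1, len(vals)) if vals[j] != vals[j - 1]]


def in_invariant(vals: Sequence[int]) -> bool:
    """Invariant: every node agrees with node 0."""
    return all(v == vals[0] for v in vals)


def stabilize(vals: Sequence[int], rng: random.Random,
              max_steps: int = 100) -> Tuple[List[int], int]:
    """Run enabled actions in random order until none is enabled."""
    vals = list(vals)
    steps = 0
    while enabled(vals):
        if steps == max_steps:
            raise RuntimeError("did not converge")
        j = rng.choice(enabled(vals))
        vals[j] = vals[j - 1]  # action body: copy the left neighbour
        steps += 1
    return vals, steps


# Treat every possible state as a post-fault initial state.
rng = random.Random(0)
worst = 0
for init in itertools.product(range(3), repeat=4):
    final, steps = stabilize(init, rng)
    assert in_invariant(final)
    worst = max(worst, steps)
# Node k acts at most k times, so worst never exceeds 1 + 2 + 3 = 6 here.
```

Because states inside the invariant have no enabled action, the program also stays in the invariant once it arrives, which is the second half of the self-stabilization claim.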
An example program Self-stabilizing leader election Given a set of motes within a 2 hop neighborhood, write a distributed program which ensures that a unique leader is appointed. This leader should have its leds on. The program should stabilize when nodes (including the leader) are added or removed. Programs that use node ids are not recommended as they will re-elect a leader every time a new node is added.
Leader election protocol [2 hops] • Process j • Variables: • status [one of idle, candidate, leader, follower] • cluster_id [id of leader] • Actions • Timeout I (j.idle) -> • j.status = candidate; bcast[cand_msg(j)] • Timeout II ((j.follower or j.leader) ^ recv[cand_msg(i)]) -> • bcast[conflict_msg(j, j.cluster_id)] • Timeout III (j.candidate) -> • j.status = leader; j.cluster_id = j; bcast[leader_msg(j)] • (j.idle or j.candidate) ^ recv[conflict_msg(i, m)] -> • j.status = follower; j.cluster_id = m; • (j.idle or j.candidate) ^ recv[leader_msg(i)] -> • j.status = follower; j.cluster_id = i;
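The five actions above can be sketched as event handlers on a process object. This is a simplified model under stated assumptions: timers are fired by hand rather than by a clock, `bcast` delivers instantly and reliably to every other mote in one broadcast domain, and the `heard_candidate` flag is a hypothetical way to remember that a `cand_msg` arrived before Timeout II fires. The `Mote` class and message tuples are illustrative names, not from the slides.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Msg = Tuple  # ("cand", i) | ("conflict", i, m) | ("leader", i)


@dataclass
class Mote:
    ident: str
    status: str = "idle"
    cluster_id: Optional[str] = None
    heard_candidate: bool = False  # assumption: latches recv[cand_msg(i)]

    def timeout_I(self) -> List[Msg]:
        if self.status == "idle":
            self.status = "candidate"
            return [("cand", self.ident)]
        return []

    def timeout_II(self) -> List[Msg]:
        if self.status in ("follower", "leader") and self.heard_candidate:
            self.heard_candidate = False
            return [("conflict", self.ident, self.cluster_id)]
        return []

    def timeout_III(self) -> List[Msg]:
        if self.status == "candidate":
            self.status, self.cluster_id = "leader", self.ident
            return [("leader", self.ident)]
        return []

    def receive(self, msg: Msg) -> None:
        kind = msg[0]
        if kind == "cand" and self.status in ("follower", "leader"):
            self.heard_candidate = True
        elif self.status in ("idle", "candidate"):
            if kind == "conflict":
                self.status, self.cluster_id = "follower", msg[2]
            elif kind == "leader":
                self.status, self.cluster_id = "follower", msg[1]


def bcast(sender: Mote, msgs: List[Msg], motes: List[Mote]) -> None:
    """Assumed ideal broadcast: every other mote receives every message."""
    for msg in msgs:
        for m in motes:
            if m is not sender:
                m.receive(msg)


# Scenario: a's Timeout I fires first, then its Timeout III; d joins later.
a, b, c = Mote("a"), Mote("b"), Mote("c")
motes = [a, b, c]
bcast(a, a.timeout_I(), motes)
bcast(a, a.timeout_III(), motes)
d = Mote("d")
motes.append(d)
bcast(d, d.timeout_I(), motes)   # d becomes a candidate
bcast(b, b.timeout_II(), motes)  # b reports the existing cluster to d
```

In this run the late joiner `d` learns the existing leader from a conflict message and never reaches Timeout III as a candidate.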
Analysis • Conditions to guarantee atomicity of election • Timeout I is a random timer chosen from [0 .. T] and T should be large enough so that multiple nodes do not become candidates at same instant • Timeout III is a fixed value T3 • Conditions to avoid collisions of conflict messages • Timeout II is a random timer chosen from [0 .. T2] where T2 is large enough (depending on size of neighborhood) • T3 should be large enough to ensure that even if messages collide, at least one will reach the candidate
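How large T must be can be estimated with a simple model that is an assumption of this sketch, not a claim from the slides: divide [0 .. T] into equal slots, each long enough for one candidate message to be heard, and let every node pick a slot uniformly at random. The election stays atomic when exactly one node holds the earliest slot. The function names are hypothetical; the brute-force version is included only to cross-check the closed form on small inputs.

```python
from fractions import Fraction
from itertools import product


def p_unique_first(n: int, slots: int) -> Fraction:
    """Probability that exactly one of n nodes holds the earliest slot."""
    total = Fraction(0)
    for k in range(1, slots + 1):
        # One particular node picks slot k (prob 1/slots) and the other
        # n-1 nodes all pick strictly later slots; n choices for that node.
        total += n * Fraction(1, slots) * Fraction(slots - k, slots) ** (n - 1)
    return total


def brute_force(n: int, slots: int) -> Fraction:
    """Enumerate every assignment of slots to nodes (small inputs only)."""
    wins = sum(1 for picks in product(range(slots), repeat=n)
               if picks.count(min(picks)) == 1)
    return Fraction(wins, slots ** n)


# Example: 5 nodes with a short and a long timer window.
short_window = float(p_unique_first(5, 10))
long_window = float(p_unique_first(5, 1000))
```

Under this model a larger window raises the chance of a unique first candidate, which matches the slide's requirement that T be "large enough".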
Analysis • Invariant G [proof of safety] • (j.idle or j.candidate) ≡ j.cluster_id = null • j.leader ≡ j.cluster_id = j • j.follower ^ j.cluster_id = k => k.leader • k.leader ^ j is nbr of k => j.follower ^ j.cluster_id = k • Fixed point?
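The four conjuncts of G can be written as a predicate over a global snapshot, which is handy for testing a simulation. The snapshot layout (a dict from node id to `(status, cluster_id)` plus a neighbour map) and the function name `holds_G` are assumptions of this sketch.

```python
from typing import Dict, Optional, Set, Tuple

Snapshot = Dict[str, Tuple[str, Optional[str]]]  # node id -> (status, cluster_id)


def holds_G(nodes: Snapshot, nbrs: Dict[str, Set[str]]) -> bool:
    """Evaluate invariant G on one global state."""
    for j, (status, cid) in nodes.items():
        # (j.idle or j.candidate) is equivalent to j.cluster_id = null
        if (status in ("idle", "candidate")) != (cid is None):
            return False
        # j.leader is equivalent to j.cluster_id = j
        if (status == "leader") != (cid == j):
            return False
        # j.follower ^ j.cluster_id = k  =>  k.leader
        if status == "follower" and nodes.get(cid, ("", None))[0] != "leader":
            return False
        # k.leader ^ j is nbr of k  =>  j.follower ^ j.cluster_id = k
        if status == "leader":
            if any(nodes[n] != ("follower", j) for n in nbrs[j]):
                return False
    return True


full = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"}}
elected = {"a": ("leader", "a"), "b": ("follower", "a"), "c": ("follower", "a")}
two_leaders = {"a": ("leader", "a"), "b": ("leader", "b"), "c": ("follower", "a")}
orphan = {"a": ("idle", None), "b": ("follower", "a"), "c": ("idle", None)}
```

Here `elected` satisfies G, while `two_leaders` and `orphan` are the kinds of states a fault can produce.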
Analysis • Proof of progress • Action 1 enabled within T time • If cluster already had leader, node assigned leader within T3 time by action 2 and action 4 • If cluster has no leader, and no conflict received, self elect as leader within T3 time
Analysis • Fault model • What if the clusterhead dies? • What if a new node joins • And its leader is set to something else! • What if, by the slightest of chances, atomicity fails [two nodes become candidates at the same instant] and there are two leaders? • Additional stabilizing actions needed
Analysis • Stabilizing actions • Timeout (j.leader) -> • bcast(clusterhead_msg(j)) • j.follower ^ timeout(rcv_clusterhead_msg) -> • j.status = idle; • bcast(demote_msg(j.cluster_id)); j.cluster_id = null; • recv(demote_msg(m)) ^ j.cluster_id = m -> • j.status = idle; j.cluster_id = null; • j.follower ^ recv(clusterhead_msg(i)) ^ j.cluster_id != i -> • j.status = idle; • bcast(demote_msg(j.cluster_id)); j.cluster_id = null;
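The stabilizing actions can be sketched as handlers that return the messages to broadcast. This is an illustration under assumptions: the stored leader id is kept in `cluster_id`, as in the protocol's variable list; timers are fired by hand; and the `Node` class, `_reset` helper, and message tuples are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Msg = Tuple[str, Optional[str]]  # ("clusterhead", i) | ("demote", m)


@dataclass
class Node:
    ident: str
    status: str = "idle"
    cluster_id: Optional[str] = None

    def _reset(self) -> List[Msg]:
        """Go idle and ask everyone in the old cluster to do the same."""
        old = self.cluster_id
        self.status, self.cluster_id = "idle", None
        return [("demote", old)]

    def heartbeat_timeout(self) -> List[Msg]:
        # Leader periodically announces itself.
        return [("clusterhead", self.ident)] if self.status == "leader" else []

    def clusterhead_timeout(self) -> List[Msg]:
        # Follower has not heard its leader for too long.
        return self._reset() if self.status == "follower" else []

    def receive(self, msg: Msg) -> List[Msg]:
        kind, who = msg
        if kind == "demote" and self.cluster_id == who:
            self.status, self.cluster_id = "idle", None
            return []
        if (kind == "clusterhead" and self.status == "follower"
                and self.cluster_id != who):
            return self._reset()
        return []


# Fault: two leaders, a and x, exist; b follows a but hears x's heartbeat.
a = Node("a", "leader", "a")
x = Node("x", "leader", "x")
b = Node("b", "follower", "a")
out = b.receive(x.heartbeat_timeout()[0])  # b goes idle, emits demote("a")
a.receive(out[0])                          # a matches and steps down
x.receive(out[0])                          # x is unaffected

# Fault: leader x dies, so follower c's clusterhead timer expires.
c = Node("c", "follower", "x")
c_out = c.clusterhead_timeout()
```

Nodes that return to idle re-enter the election through Timeout I, so these actions only need to dismantle inconsistent clusters, not rebuild them.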
Ping pong program analysis • Process Ping-pong at node j • Variables: • id [either server or returner] • count • Actions • Booted -> • If (id==server) start hit_timer; Turn Led on • Timeout (hit_timer) -> • Turn Led off; count++; • If (count<5000) Send (ping ball message, count) • If (count==5000) Turn green Led on; • Receive (ping ball message) -> • Turn Led on; copy count; start hit_timer;
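The ping pong program is small enough to simulate directly. The sketch below assumes two nodes, lossless instant delivery, and that only the server turns its LED on at boot (the slide is ambiguous on that point). The `Player` class and `play` function are hypothetical names. The "at most one LED on" check is one plausible property to test, not the invariant intended by the slides.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

LIMIT = 5000


@dataclass
class Player:
    role: str  # "server" or "returner"
    count: int = 0
    led_on: bool = False
    green_on: bool = False
    timer_running: bool = False

    def booted(self) -> None:
        if self.role == "server":
            self.timer_running = True
            self.led_on = True

    def timeout(self) -> Optional[int]:
        """hit_timer fired: return the count to send, or None at the limit."""
        self.timer_running = False
        self.led_on = False
        self.count += 1
        if self.count == LIMIT:
            self.green_on = True
        return self.count if self.count < LIMIT else None

    def receive(self, count: int) -> None:
        self.led_on = True
        self.count = count
        self.timer_running = True


def play() -> Tuple[Player, Player, int, bool]:
    server, returner = Player("server"), Player("returner")
    server.booted()
    returner.booted()
    holder, other = server, returner
    hits = 0
    at_most_one_led = True
    while holder.timer_running:
        at_most_one_led = at_most_one_led and not (holder.led_on and other.led_on)
        ball = holder.timeout()
        hits += 1
        if ball is not None:
            other.receive(ball)            # assumed lossless, instant delivery
            holder, other = other, holder  # the ball changes sides
    return server, returner, hits, at_most_one_led


server, returner, hits, at_most_one_led = play()
```

In this fault-free run the server makes the odd-numbered hits and the returner the even-numbered ones, so the returner is the node that reaches 5000 and lights the green LED. A lost message would leave both timers stopped, which is the kind of fault the earlier slides motivate reasoning about.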
Ping pong program analysis • Invariant