This lecture discusses group communication systems, focusing on reliable and ordered multicast. It covers types of total ordering, services provided by GCS, and approaches to total ordering.
EEC 688/788 Secure and Dependable Computing
Lecture 10
Wenbing Zhao
Department of Electrical and Computer Engineering
Cleveland State University
wenbing@ieee.org
EEC688/788: Secure & Dependable Computing
Outline
- Upcoming schedule
  - This Thursday 10/16: IEEE event at LCCC (no lecture)
  - Next Tuesday 10/21: Spread toolkit lab
  - Next Thursday 10/23: lecture 11
- Group communication systems
  - Reliable, ordered multicast
  - Types of total ordering
  - GCS services
  - How to implement GCS
Group Communication System
- Services provided by the GCS
  - Membership service: who is up and who is down (deals with failure detection and more)
  - Reliable, totally ordered multicast service
  - Virtual synchrony service: synchronizes membership changes with multicasts
- A GCS makes the implementation of state machine replication much easier
Main Approaches to Total Ordering
- Sequencer based: one node in the membership is designated to assign a global sequence number (representing the total order) to each application message
  - Fixed sequencer
  - Rotating sequencer
- Sender based: the nodes in the membership take turns to multicast => all multicast msgs are naturally totally ordered
  - A virtual token is passed around the nodes
- Vector clock based: the causal relationship among different messages can be captured using vector clocks
  - Each multicast message is piggybacked with a vector timestamp
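To make the vector clock approach concrete, here is a minimal sketch of how piggybacked vector timestamps expose causal relationships. The names `Node`, `happened_before`, and `concurrent` are illustrative assumptions, not part of the lecture. Vector clocks alone only capture causality, so concurrent messages would still need some additional deterministic rule to reach a total order.

```python
def vc_leq(a, b):
    """True if timestamp a is component-wise <= timestamp b."""
    return all(x <= y for x, y in zip(a, b))

def happened_before(a, b):
    """a causally precedes b: a <= b component-wise and a != b."""
    return vc_leq(a, b) and a != b

def concurrent(a, b):
    """Neither timestamp causally precedes the other."""
    return not vc_leq(a, b) and not vc_leq(b, a)

class Node:
    """Hypothetical group member that keeps one vector clock entry per node."""
    def __init__(self, node_id, n):
        self.id = node_id
        self.clock = [0] * n

    def multicast(self, payload):
        # Tick our own entry, then piggyback a copy of the clock on the msg.
        self.clock[self.id] += 1
        return (payload, list(self.clock))

    def receive(self, timestamp):
        # Merge the sender's knowledge into our clock (component-wise max).
        self.clock = [max(x, y) for x, y in zip(self.clock, timestamp)]

a, b, c = Node(0, 3), Node(1, 3), Node(2, 3)
m1 = a.multicast("x")
b.receive(m1[1])
m2 = b.multicast("y")   # sent after b saw m1, so m1 causally precedes m2
m3 = c.multicast("z")   # c has seen nothing, so m3 is concurrent with m1
```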
System Model
- An asynchronous system with N nodes that communicate with each other directly by sending and receiving messages
- A node may become faulty and stop participating in the group communication protocol (i.e., a fail-stop fault model is used)
- A failed node might recover; however, it must rejoin the system via a membership change protocol
- We assume a closed, single-group system: foreign msgs are ignored
Protocol Design
- A group communication system must define two protocols:
  - One for normal operation, when all nodes in the current membership can communicate with each other in a timely fashion
  - One for membership change, when one or more nodes are suspected to have failed, or when failed nodes are restarted
- These protocols work together to ensure the safety properties and the liveness property of the group communication system
Protocol Design
- Liveness: if a nonfaulty node multicasts a message, it will eventually be delivered in a total order at the other nodes
  - Message loss is addressed by retransmission
  - Node failures and extended delays in processing or message propagation are addressed by membership reconfigurations (i.e., view changes)
Two Types of Total Ordering
- Uniform total ordering
  - Given any msg that is broadcast, if it is delivered by a node according to some total order, it is delivered at every other node in the same total order unless that node has failed
- Nonuniform total ordering
  - Given a set of messages that have been broadcast and totally ordered, no node delivers any of them out of the total order
  - However, there is no guarantee that if a node delivers a message, then all other nodes deliver the same message
Example
Implementing Total Ordering
- Use a sequencer to order all multicasts
  - The sequencer determines the order of each message
  - Each sender can take turns serving as the sequencer (rotating sequencer)
- Use a token that moves around
  - The token carries a sequence number
  - The sender determines the total order: when you hold the token, you can send the next burst of multicasts
- Use vector clocks
  - Each process maintains a vector clock
  - Each msg is piggybacked with a vector timestamp
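The token-based idea can be illustrated with a small single-process simulation: the token carries the next sequence number, and only the current holder may stamp and send a burst of multicasts. The function `circulate` and its `max_burst` parameter are hypothetical names chosen for this sketch; a real protocol would also have to handle token loss and retransmission.

```python
def circulate(outboxes, rounds=1, max_burst=2):
    """Simulate a token visiting each node in ring order.

    outboxes: one list of pending payloads per node (index = ring position).
    Returns the log of (seq, sender, payload) in total order.
    """
    token_seq = 0          # sequence number carried by the token
    log = []
    for _ in range(rounds):
        for holder, outbox in enumerate(outboxes):
            # The holder sends at most max_burst msgs, then passes the token.
            burst, outbox[:] = outbox[:max_burst], outbox[max_burst:]
            for payload in burst:
                log.append((token_seq, holder, payload))
                token_seq += 1
    return log

log = circulate([["a1", "a2", "a3"], [], ["c1"]], rounds=2)
```

Because sequence numbers are only ever assigned by the token holder, no two messages can receive the same number, which is why the resulting order is total without a separate sequencer.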
Sequencer Based GCS
- First practical solution for GCS
- The system is structured as a combination of two subsystems
  - Multiple senders with a single receiver
  - A single sender with multiple receivers
- The single receiver and the single sender are collocated at the same node => all msgs are funneled through this node, i.e., the sequencer
Sequencer Based GCS
- The sequencer is responsible for assigning a global sequence number to each message funneled through it
- Each node delivers a msg only if it has received and delivered all msgs with smaller sequence numbers
- The sequencer is a single point of failure
- Rotating sequencer: overcomes the single point of failure
  - Assume up to f nodes could fail; total number of nodes N > 2f
  - Each node takes a turn acting as the sequencer (e.g., one msg at a time)
  - A node does not deliver a msg until it receives f+1 sequencing msgs
  - Achieves fault tolerance as well as uniform total ordering
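The basic delivery rule (deliver a msg only after all msgs with smaller sequence numbers) is commonly implemented with a hold-back buffer. The sketch below assumes global sequence numbers start at 0; the class name `OrderedDelivery` is illustrative. It covers only the ordering rule, not the additional f+1 commit rule of the rotating sequencer.

```python
class OrderedDelivery:
    """Hold back sequenced msgs until every smaller seq# has been delivered."""
    def __init__(self):
        self.next_seq = 0      # smallest global seq# not yet delivered
        self.pending = {}      # seq# -> msg, received but not yet deliverable
        self.delivered = []    # msgs in delivery order

    def on_sequenced(self, seq, msg):
        if seq >= self.next_seq:           # ignore duplicates of delivered msgs
            self.pending[seq] = msg
        # Drain the buffer for as long as the next expected msg is present.
        while self.next_seq in self.pending:
            self.delivered.append(self.pending.pop(self.next_seq))
            self.next_seq += 1

d = OrderedDelivery()
d.on_sequenced(1, "b")   # arrives early, held back
d.on_sequenced(0, "a")   # fills the gap, both are delivered in order
```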
Rotating Sequencer: Data Structure
- View number v and the list of node ids in the current view
  - Each node has a rank: it knows when it should take over as the next sequencer
- A local sequence number vector M[], each element representing the expected local seq# for the corresponding node: for reliable delivery
  - M[i] is the expected local seq# carried by the next msg sent by node i
  - Each element is initialized to 0
- Expected global seq# s carried in the next sequencing msg sent by the sequencer node: for total ordering
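One possible in-memory layout for this per-node state is sketched below. The class name `NodeState`, the field names, and the choice to derive rank from the sorted list of node ids are assumptions made for illustration; the slide only states that each node has a rank. The helper `is_sequencer_for` assumes the sequencer for global seq# s is the node whose rank equals s mod N, which matches the role-transfer condition (s+1)%N = i used later in the lecture.

```python
from dataclasses import dataclass, field

@dataclass
class NodeState:
    node_id: int
    view: int                                            # view number v
    members: list                                        # node ids in the view
    expected_local: dict = field(default_factory=dict)   # M[]: node id -> seq#
    expected_global: int = 0                             # s

    def __post_init__(self):
        self.members = sorted(self.members)
        for m in self.members:
            self.expected_local.setdefault(m, 0)         # init each M[i] to 0

    @property
    def rank(self):
        # Assumption: rank is the position of this node in the sorted view.
        return self.members.index(self.node_id)

    def is_sequencer_for(self, s):
        # Assumption: global seq# s is assigned by the node of rank s mod N.
        return s % len(self.members) == self.rank

st = NodeState(node_id=7, view=1, members=[9, 7, 3])
```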
Rotating Sequencer: Normal Operation
- Transmitting phase
  - A node i broadcasts a msg, B(v,i,n), to all nodes
    - n: local seq#, initially 0, incremented for each msg broadcast => reliable broadcast
  - It then waits for a sequencing msg for the broadcast msg
  - A node j accepts a msg B(v,i,n) if it is in the same view, and buffers it
- Sequencing phase
- Committing phase
Rotating Sequencer: Normal Operation
- Sequencing phase
  - When the sequencer receives a broadcast msg B(v,i,n), it
    - Verifies that it is the next expected msg from node i, i.e., M[i] = n
    - Assigns the current global seq# s to B(v,i,n)
    - Broadcasts a sequencing msg: SEQ(s,v,[i,n])
  - When a node j receives SEQ(s,v,[i,n]), it accepts it provided that
    - s is the expected global seq#
    - It has B(v,i,n) in its buffer; otherwise, it requests retransmission
  - It then updates its data structures:
    - Increment the expected global seq#
    - Increment the expected local seq#
  - SEQ(s,v,[i,n]) also serves as a positive ack for broadcast msg B(v,i,n)
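The checks in the sequencing phase can be sketched as follows. This is a simplified single-process model under stated assumptions: the names `Receiver`, `sequence`, and the string return values are hypothetical, messages are plain tuples, and an out-of-order SEQ is simply rejected here, whereas a real implementation might buffer it.

```python
class Receiver:
    """Per-node state used when handling B and SEQ msgs (illustrative)."""
    def __init__(self, view, node_ids):
        self.view = view
        self.M = {i: 0 for i in node_ids}   # expected local seq# per sender
        self.s = 0                          # expected global seq#
        self.buffer = {}                    # (i, n) -> payload of B(v,i,n)
        self.ordered = []                   # accepted (s, i, n) in total order

    def on_broadcast(self, v, i, n, payload):
        if v == self.view:                  # accept only msgs of the same view
            self.buffer[(i, n)] = payload

    def on_seq(self, s, v, i, n):
        if v != self.view or s != self.s:
            return "rejected"               # not the expected global seq#
        if (i, n) not in self.buffer:
            return "request-retransmission" # B(v,i,n) is missing
        self.ordered.append((s, i, n))
        self.s += 1                         # increment expected global seq#
        self.M[i] = n + 1                   # increment expected local seq#
        return "accepted"

def sequence(state, v, i, n):
    """Sequencer side: build SEQ(s,v,[i,n]) if B(v,i,n) is next from node i."""
    if v != state.view or state.M[i] != n or (i, n) not in state.buffer:
        return None
    return ("SEQ", state.s, v, (i, n))

r = Receiver(view=1, node_ids=[0, 1, 2])
r.on_broadcast(1, 2, 0, "hello")
seq_msg = sequence(r, 1, 2, 0)      # the sequencer assigns global seq# 0
status = r.on_seq(0, 1, 2, 0)       # the sequencer also applies its own SEQ
```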
Rotating Sequencer: Normal Operation
- Committing phase
  - A node does not deliver a broadcast msg B(v,i,n) until it receives SEQ(s,v,[i,n]) and f subsequent SEQ msgs
  - This ensures uniform total ordering
    - Even if f nodes fail, at least one node will have received both B(v,i,n) and SEQ(s,v,[i,n])
    - This node ensures that the message is delivered at other nodes in the same total order
- How to transfer the sequencer role
  - The transfer can be achieved implicitly by the sending of a new sequencing message
  - The next node i assumes the sequencer role when it receives a SEQ(s) msg and the following conditions are met
    - (s+1)%N = i
    - It has received all previous SEQ msgs and B msgs
  - If no one broadcasts B msgs, the sequencer sends null SEQ msgs
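The commit rule and the implicit role transfer reduce to two small predicates, sketched here. The function names are hypothetical, and the sketch assumes SEQ msgs are accepted strictly in order, so that "SEQ(s) plus f subsequent SEQ msgs" is equivalent to the highest accepted global seq# being at least s + f. It also treats i in (s+1)%N = i as the node's rank in the view.

```python
def can_deliver(s, highest_seq_accepted, f):
    """True once SEQ(s) and f subsequent SEQ msgs have been accepted."""
    return highest_seq_accepted >= s + f

def next_sequencer_rank(s, n):
    """Rank of the node that takes over after SEQ(s), i.e., (s+1) % N."""
    return (s + 1) % n

def takes_over(rank, s, n, has_all_previous):
    """A node becomes sequencer only if it is next in rotation and has
    received all previous SEQ and B msgs."""
    return next_sequencer_rank(s, n) == rank and has_all_previous

# With f = 1, receiving SEQ(10) alone is not enough; SEQ(11) commits it.
just_sequenced = can_deliver(10, 10, f=1)
one_more_seq = can_deliver(10, 11, f=1)
```

With N = 5 and f = 1, this means a msg sequenced at s becomes deliverable only after the next sequencer in the rotation has issued SEQ(s+1), possibly as a null SEQ msg.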
Normal Operation: Example
- N=5, f=1
- Can a node deliver B as soon as it receives the corresponding SEQ msg?
- When will B(v,4,20) be delivered?
Rotating Sequencer: Membership Change
- A membership change is triggered by
  - The detection of a failure: a node fails to receive the corresponding SEQ msg for its B msg => the sequencer has failed
  - The recovery of a failed node: when a node recovers from a failure, it tries to rejoin the membership
- Objectives of the membership change protocol
  - Only one valid membership view can be formed by the system
  - If a B msg is committed at some nodes in a view, then all nodes in the new view must commit B in the same total order
Rotating Sequencer: Membership Change
- Operates in three phases
- Phase I: the node that detected a failure (the originator) sets new view# = v+1 and broadcasts an invitation msg
  - The invitation msg carries the new view#
  - A node accepts the invitation and acks it provided that it has not accepted an invitation for a competing view
    - Note: a node joins at most one membership view at a time
  - The ack carries the node’s current view# and its next expected global seq#
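A node's side of Phase I might look like the sketch below. Several details are assumptions rather than statements from the lecture: an invitation is identified by the pair (new view#, originator) so that two originators proposing the same view# count as competing, a stale view# is refused, refusals are modeled as a `NACK` tuple, and the names `Member`, `on_invitation`, and `abort` are hypothetical.

```python
class Member:
    """Node-side handling of membership invitations (illustrative)."""
    def __init__(self, view, expected_global):
        self.view = view                        # current view#
        self.expected_global = expected_global  # next expected global seq#
        self.accepted = None                    # (new_view, originator) or None

    def on_invitation(self, new_view, originator):
        proposal = (new_view, originator)
        if new_view <= self.view:
            return ("NACK", new_view)           # assumption: refuse stale views
        if self.accepted is not None and self.accepted != proposal:
            return ("NACK", new_view)           # already in a competing view
        self.accepted = proposal
        # The ack carries the current view# and next expected global seq#.
        return ("ACK", new_view, self.view, self.expected_global)

    def abort(self):
        self.accepted = None                    # free to accept a new invitation

m = Member(view=3, expected_global=17)
first = m.on_invitation(4, "A")     # accepted
second = m.on_invitation(4, "B")    # competing originator, refused
```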
Rotating Sequencer: Membership Change, Phase II
- The originator keeps collecting acks until
  - Either it has received an ack from every node in the new membership, or
  - It has collected at least N-f acks and a timeout has occurred (for liveness)
- If all acks are positive, the originator builds a node list for the new view and broadcasts it
- The originator also learns the msg ordering history of the previous view
  - Highest global seq#: smax
  - Originator's expected global seq#: s0
  - If smax > s0, the originator is missing msgs
    - smax ≥ the seq# of the last msg committed in the previous view
    - Request retransmission
    - Use smax as the starting global seq# for the new view provided that it can receive all missing msgs; otherwise, use the largest s with the corresponding B received
- If negative responses are received, abort and retry
- A node aborts when (1) it receives an abort msg from the originator, or (2) its membership change times out
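The two decisions the originator makes in Phase II can be sketched as below. The function names and parameters are hypothetical. The sketch also fixes one interpretation that the slide leaves open: all values are treated as "next expected" global seq#s, so the originator already holds every msg below s0, is missing s0 through smax-1, and falls back to the first seq# it could not recover.

```python
def enough_acks(num_acks, new_membership_size, n, f, timed_out):
    """Stop collecting when everyone answered, or N-f answered and we timed out."""
    return num_acks == new_membership_size or (num_acks >= n - f and timed_out)

def starting_global_seq(s0, reported, recovered):
    """Pick the starting global seq# for the new view.

    s0:        originator's expected global seq#
    reported:  expected global seq#s carried in the acks
    recovered: set of missing seq#s obtained through retransmission
    """
    smax = max(reported + [s0])
    s = s0
    # Advance over the missing msgs for as long as each one was recovered.
    while s < smax and s in recovered:
        s += 1
    return s

all_recovered = starting_global_seq(5, [5, 7, 6], recovered={5, 6})
gap_remains = starting_global_seq(5, [5, 7], recovered={5})
```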
Rotating Sequencer: Membership Change, Phase III
- The originator collects responses to its new membership view msg
  - If it receives positive responses from every node in the new view, it commits to the new view
  - Otherwise, it aborts, waits for a random amount of time, and retries with a larger view number
Rotating Sequencer: Membership Change Examples: competing originators
Rotating Sequencer: Membership Change Examples: premature timeout
Rotating Sequencer: Membership Change Examples: network partitioning
Token Based GCS: Totem
- Totem consists of:
  - Total ordering protocol
  - Membership protocol
  - Recovery protocol
  - Flow control mechanisms
- Total ordering msg delivery types
  - Safe delivery: a message is delivered only when all correct processes have received it => uniform total ordering
  - Agreed delivery: a message is delivered as long as it is the next message in total order => nonuniform total ordering
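The difference between the two delivery types can be expressed as a single predicate. This is a sketch under assumptions, not Totem's actual code: `can_deliver` and `received_upto` are hypothetical names, and `received_upto` is taken to map each correct process to the highest seq# up to which it has received every message. How a node learns those values (in Totem, from information carried on the token) is outside the sketch.

```python
def can_deliver(seq, next_to_deliver, my_id, received_upto, mode="agreed"):
    """Decide whether the msg with sequence number seq may be delivered.

    received_upto: process id -> highest seq# received without gaps.
    mode: "agreed" (nonuniform) or "safe" (uniform).
    """
    # Both modes respect the total order and require the msg locally.
    if seq != next_to_deliver or received_upto[my_id] < seq:
        return False
    if mode == "agreed":
        return True
    # Safe delivery additionally waits until every process has the msg.
    return all(upto >= seq for upto in received_upto.values())

progress = {0: 3, 1: 5, 2: 2}       # node 2 has not yet received msg 3
agreed_ok = can_deliver(3, 3, 0, progress, mode="agreed")
safe_ok = can_deliver(3, 3, 0, progress, mode="safe")
```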
FSM for Totem
Totem Membership Protocol
Totem Membership Protocol: the case of 2 concurrent memberships
Totem Recovery Protocol
Totem Recovery Protocol: Network Partition Scenario