290 likes | 539 Views
TOTEM: A FAULT-TOLERANT MULTICAST GROUP COMMUNICATION SYSTEM. L. E. Moser, P. M. Melliar Smith, D. A. Agarwal, B. K. Budhia C. A. Lingley-Papadopoulos University of California, Santa Barbara. INTRODUCTION. Totem provides reliable totally-ordered multicasting of messages over LANs
E N D
TOTEM: A FAULT-TOLERANT MULTICASTGROUP COMMUNICATION SYSTEM L. E. Moser, P. M. Melliar Smith,D. A. Agarwal, B. K. BudhiaC. A. Lingley-Papadopoulos University of California, Santa Barbara
INTRODUCTION • Totem provides reliable totally-ordered multicasting of messages over LANs • Intended for complex applications with critical requirements for • fault tolerance • real-time performance • Exploits hardware broadcast of most LANs
TOTEM SERVICES • Built as a hierarchy of protocols: Application layer Process group interface Multiple-ring protocol Single-ring protocol Physical medium
Single-ring protocol • Built on top of a best-effort multicast service, using UDP to exploit the hardware broadcasts of the LAN • Converts these multicasts into the service of reliable totally ordered delivery of messages on a single LAN • Also provides fault-detection, recovery and configuration change service
Multiple-ring protocol • Uses information from the process group interface above it • Provides total ordering of messages as well as network topology maintenance services
Process group interface • Delivers messages to the application processes in the appropriate process groups • Provides process group membership services.
Services provided by Totem • Two reliable totally ordered message delivery services: • Agreed delivery • Safe delivery • Both services deliver messages in a single system-wide total order that respects Lamport’s causal order
Agreed Delivery • Guarantees that a processor will not deliver a message before it has delivered all prior messages that: • Have been issued by processors in the current configuration and • Have time-stamps within the duration of that configuration • All processes receive all messages in theorder they were sent
Safe Delivery • Further guarantee that a processor will not deliver a message unless all processors in its configuration have received it (everyone or nobody). • All processes receive all messages in the same order at the same time
Why Lamport’s causal order? • Otherwise processes that belong to two or more groups could receive message from different groups in different order • A and B both in groups G and H • A receives m from group G then m’ from group H and finally m’’ from group G • B could receive m’ from group H then m from group G and finally m’’ from group G
Example Group G sends messages m and m’’to A and B Group H sends message m’to A and B Both A and B will receive m and m’’in the same order Without total ordering, A could receive m’ before m’’ and B could receive m’’ before m’
Delivery guarantees • Extended virtual synchrony ensures that these guarantees are honored within every configuration • When a fault occurs, Totem forms a transitional configuration with a reduced membership • Message order is guaranteed even in the presence of network partitions
Extended virtual synchrony (I) • We want to ensure that • Messages are received in the same order by all processes • All processes share the same view of the process group to which they belong
Extended virtual synchrony (II) • Virtual synchrony model (K. Birman, ISIS) orders group membership changes along with the regular messages • Ensures that failures do not result in • Incomplete delivery of multicast messages • Holes in the causal delivery order • Problems remain if network can partition
Extended virtual synchrony (III) • Extended virtual synchrony model (Totem) extends the virtual synchrony model to systems • Processes can fail and recover • Network can partition and remerge • Guarantees that same message sent to processes in two or more components of a partitioned network will be in a consistent order in all these components
Ordering of messages • Messages are born-ordered: • Each message includes a time-stamp • Relative order of messages is determined by the message themselves as created by their senders
The single-ring protocol (I) • Uses a circulating token containing among others: • A seq field with the sequence number of the last message that was sent • An aru field with the sequence number of the last message that has been received by all processors • Only the processor that holds the token can send a message
The single-ring protocol (II) • aru field used to implement safe delivery: • Tells processors which messages have been received by every processor in the ring • Token also provides information about the aggregate message backlog of the processors on the ring • Results in a fairer bandwidth allocation among processors than FDDI
Local membership protocol (I) • Part of the single-ring protocol • Allows • Inclusion of new or recovering processors • Deletion of faulty processors
Local membership protocol (II) • Ensures: • Consensus among all members of a configuration about the configuration membership • Terminationas each configuration will be installed on every processor within a bounded time or not at all.
The multiple-ring protocol (I) • Operates over several LANs linked by gateways • Each LAN is organized as a virtual token ring and managed by the single-ring protocol • Offers same services and same guarantees as single-ring protocol Ring A Ring B Ring C
The multiple-ring protocol (II) • Uses Lamport’s timestamps and delivers messages in timestamp order • When a gateway forwards a message from one ring to another, it gives to the message a new sequence number for the new ring • Processor faults and network partitions are detected by the single-ring protocol
Ring B C A Messages mB1, mB2 mC1 mA1, mA2, mA3 Message delivery (I) • Each processor maintains one recv_msgs list of messages received but not yet delivered for each ring from which it can receive messages
Message delivery (II) • A processor will deliver a message as an agreed message as soon as • Message has the lowest time stamp of all the messages in its recv_msgs lists • None of these lists is empty • We could wait forever for rings that have no messages to send but for guaranteed vector messages
Ring Messages (with Timestamps) A mA1(8), mA2(12), mA3(16) B mB1(9), mB2(13) C mC1(10) Example (I) • Consider the following recv_msgs list where the numbers in parentheses indicate the message timestamps.
Ring Messages (with Timestamps) A mA1(8), mA2(12), mA3(16) B mB1(9), mB2(13) C mC1(10) Example (II) • We can deliver messages mA!, mB1 and mC1
Ring Messages (with Timestamps) A mA2(12), mA3(16) B mB2(13) C --- Example (III) • We cannot deliver more messages because we might miss a message mCnfrom ring C with an earlier timestamp, say ts = 11.
Related issues • Totem sends from time to time guaranteed vector messages • They specify among other things, which rings have sent messages • Processor faults and network partitions are detected by the single-ring protocol • Configuration and topology change messages have timestamps and are delivered in strict timestamp order