TOTEM: A FAULT-TOLERANT MULTICAST GROUP COMMUNICATION SYSTEM

TOTEM: A FAULT-TOLERANT MULTICASTGROUP COMMUNICATION SYSTEM L. E. Moser, P. M. Melliar Smith,D. A. Agarwal, B. K. BudhiaC. A. Lingley-Papadopoulos University of California, Santa Barbara

INTRODUCTION • Totem provides reliable totally-ordered multicasting of messages over LANs • Intended for complex applications with critical requirements for • fault tolerance • real-time performance • Exploits hardware broadcast of most LANs

TOTEM SERVICES • Built as a hierarchy of protocols: Application layer Process group interface Multiple-ring protocol Single-ring protocol Physical medium

Single-ring protocol • Built on top of a best-effort multicast service, using UDP to exploit the hardware broadcasts of the LAN • Converts these multicasts into the service of reliable totally ordered delivery of messages on a single LAN • Also provides fault-detection, recovery and configuration change service

Multiple-ring protocol • Uses information from the process group interface above it • Provides total ordering of messages as well as network topology maintenance services

Process group interface • Delivers messages to the application processes in the appropriate process groups • Provides process group membership services.

Services provided by Totem • Two reliable totally ordered message delivery services: • Agreed delivery • Safe delivery • Both services deliver messages in a single system-wide total order that respects Lamport’s causal order

Agreed Delivery • Guarantees that a processor will not deliver a message before it has delivered all prior messages that: • Have been issued by processors in the current configuration and • Have time-stamps within the duration of that configuration • All processes receive all messages in theorder they were sent

Safe Delivery • Further guarantee that a processor will not deliver a message unless all processors in its configuration have received it (everyone or nobody). • All processes receive all messages in the same order at the same time

Why Lamport’s causal order? • Otherwise processes that belong to two or more groups could receive message from different groups in different order • A and B both in groups G and H • A receives m from group G then m’ from group H and finally m’’ from group G • B could receive m’ from group H then m from group G and finally m’’ from group G

Example Group G sends messages m and m’’to A and B Group H sends message m’to A and B Both A and B will receive m and m’’in the same order Without total ordering, A could receive m’ before m’’ and B could receive m’’ before m’

Delivery guarantees • Extended virtual synchrony ensures that these guarantees are honored within every configuration • When a fault occurs, Totem forms a transitional configuration with a reduced membership • Message order is guaranteed even in the presence of network partitions

Extended virtual synchrony (I) • We want to ensure that • Messages are received in the same order by all processes • All processes share the same view of the process group to which they belong

Extended virtual synchrony (II) • Virtual synchrony model (K. Birman, ISIS) orders group membership changes along with the regular messages • Ensures that failures do not result in • Incomplete delivery of multicast messages • Holes in the causal delivery order • Problems remain if network can partition

Extended virtual synchrony (III) • Extended virtual synchrony model (Totem) extends the virtual synchrony model to systems • Processes can fail and recover • Network can partition and remerge • Guarantees that same message sent to processes in two or more components of a partitioned network will be in a consistent order in all these components

Ordering of messages • Messages are born-ordered: • Each message includes a time-stamp • Relative order of messages is determined by the message themselves as created by their senders

The single-ring protocol (I) • Uses a circulating token containing among others: • A seq field with the sequence number of the last message that was sent • An aru field with the sequence number of the last message that has been received by all processors • Only the processor that holds the token can send a message

The single-ring protocol (II) • aru field used to implement safe delivery: • Tells processors which messages have been received by every processor in the ring • Token also provides information about the aggregate message backlog of the processors on the ring • Results in a fairer bandwidth allocation among processors than FDDI

Local membership protocol (I) • Part of the single-ring protocol • Allows • Inclusion of new or recovering processors • Deletion of faulty processors

Local membership protocol (II) • Ensures: • Consensus among all members of a configuration about the configuration membership • Terminationas each configuration will be installed on every processor within a bounded time or not at all.

The multiple-ring protocol (I) • Operates over several LANs linked by gateways • Each LAN is organized as a virtual token ring and managed by the single-ring protocol • Offers same services and same guarantees as single-ring protocol Ring A Ring B Ring C

The multiple-ring protocol (II) • Uses Lamport’s timestamps and delivers messages in timestamp order • When a gateway forwards a message from one ring to another, it gives to the message a new sequence number for the new ring • Processor faults and network partitions are detected by the single-ring protocol

Ring B C A Messages mB1, mB2 mC1 mA1, mA2, mA3 Message delivery (I) • Each processor maintains one recv_msgs list of messages received but not yet delivered for each ring from which it can receive messages

Message delivery (II) • A processor will deliver a message as an agreed message as soon as • Message has the lowest time stamp of all the messages in its recv_msgs lists • None of these lists is empty • We could wait forever for rings that have no messages to send but for guaranteed vector messages

Ring Messages (with Timestamps) A mA1(8), mA2(12), mA3(16) B mB1(9), mB2(13) C mC1(10) Example (I) • Consider the following recv_msgs list where the numbers in parentheses indicate the message timestamps.

Ring Messages (with Timestamps) A mA1(8), mA2(12), mA3(16) B mB1(9), mB2(13) C mC1(10) Example (II) • We can deliver messages mA!, mB1 and mC1

Ring Messages (with Timestamps) A mA2(12), mA3(16) B mB2(13) C --- Example (III) • We cannot deliver more messages because we might miss a message mCnfrom ring C with an earlier timestamp, say ts = 11.

Related issues • Totem sends from time to time guaranteed vector messages • They specify among other things, which rings have sent messages • Processor faults and network partitions are detected by the single-ring protocol • Configuration and topology change messages have timestamps and are delivered in strict timestamp order

TOTEM: A FAULT-TOLERANT MULTICAST GROUP COMMUNICATION SYSTEM

TOTEM: A FAULT-TOLERANT MULTICAST GROUP COMMUNICATION SYSTEM

Presentation Transcript

Chapter 1 : Introduction to Electronic Communications System

IP-Multicast (outline)

Fault Tolerance in Distributed Systems

Fault Tolerance

Computing in the

Client-Server and Multicast Communication

Fault-Tolerance: Practice Chapter 7

Fault Tolerance

Organizational Communication

Deploying Interdomain IP Multicast

Hazard Communication (HazComm2012) and the Globally Harmonized System (GHS )

IP-Multicast (outline)

Border Gateway Protocol, Redistribution, and IP Multicast

CS 347: Parallel and Distributed Data Management Notes X: S4

Computer Networks (Graduate level)

IP multicast and multimedia

IP Multicasting: Explaining Multicast

Dave Angell Idaho Power 21st Annual Hands-On Relay School

Measurements of forward physics in the TOTEM experiment at the LHC

2. Communication in Distributed Systems