
Membership and Clique Avoidance in TTP/C

Membership and Clique Avoidance in TTP/C. Gunther Bauer, Michael Paulitsch Presented by Michael Sirivianos 02/01/2005. Overview. Membership in hard Real Time systems. What is it and why? Objectives TTP/C Overview Group membership. Clique Avoidance and Implicit Acks


Presentation Transcript


  1. Membership and Clique Avoidance in TTP/C Gunther Bauer, Michael Paulitsch Presented by Michael Sirivianos 02/01/2005

  2. Overview • Membership in hard Real Time systems. What is it and why? • Objectives • TTP/C Overview • Group membership. • Clique Avoidance and Implicit Acks • Cluster Model-Fault Model • General Properties • Analysis • Conclusions

  3. What is an RT Membership Service? • Safety-critical RT systems use a bus system for communication. • A class C system offers the required fault tolerance (FT). • A membership service gives timely and consistent information on the state of all nodes.

  4. Why do we need it? • Membership service • Establishes replica-deterministic agreement on all messages. • Prevents clique formation and certain classes of arbitrary faults. • Provides global knowledge, and thus a consistent and timely reaction to faults. • Membership is a critical function for the correct operation of the communication system. It should be placed below the application layer, within the TTP layer.

  5. TTP/C Overview • Services: • Message transport at specific time instants, with minimal jitter. • Fault-tolerant clock synchronization. • Fault-tolerant membership management. • TDMA media access • Time slots are not necessarily of equal size. • The Message Descriptor List (MEDL) contains the TDMA schedule and groups several TDMA rounds into cluster rounds. It is statically assigned to all nodes.

  6. TTP/C Overview, cont. • State of the distributed system (C-state). It comprises: • Membership • The global time at which the last frame B/C started. • Number of the current TDMA slot • I (protocol state info) and X (protocol + application data info) frames are transmitted periodically and carry the C-state. • N (application data info) frames: consistency of the C-state is determined by calculating the CRC over both the application data and the C-state.
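
The N-frame idea above (the C-state is not transmitted, but is covered by the CRC) can be sketched as follows. This is only an illustration under stated assumptions: the field widths in `cstate_bytes`, the helper names, and the use of `zlib.crc32` as a stand-in checksum are all hypothetical and are not taken from the TTP/C specification.

```python
import struct
import zlib


def cstate_bytes(global_time: int, slot: int, membership_bits: int) -> bytes:
    """Pack a toy C-state (field sizes are an assumption for this sketch)."""
    return struct.pack(">IHH", global_time, slot, membership_bits)


def build_n_frame(app_data: bytes, sender_cstate: bytes) -> bytes:
    """Send only the application data plus a CRC computed over data AND C-state."""
    crc = zlib.crc32(app_data + sender_cstate)  # stand-in for the real frame CRC
    return app_data + struct.pack(">I", crc)


def accept_n_frame(frame: bytes, local_cstate: bytes) -> bool:
    """The receiver recomputes the CRC with its own C-state.

    A mismatch means either corrupted data or disagreeing C-states.
    """
    app_data, crc_field = frame[:-4], frame[-4:]
    (crc,) = struct.unpack(">I", crc_field)
    return zlib.crc32(app_data + local_cstate) == crc


sender_state = cstate_bytes(global_time=1000, slot=3, membership_bits=0b1111)
frame = build_n_frame(b"sensor=42", sender_state)
same_view = accept_n_frame(frame, sender_state)
other_view = accept_n_frame(frame, cstate_bytes(1000, 3, 0b0111))
```

A receiver whose membership differs from the sender's rejects the frame even though the application data arrived intact, which is what lets N-frames take part in the membership agreement.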

  7. TTP/C Overview, cont. • A node in the cluster, which is included in the schedule but has been inactive, can be integrated using global time and C-state info from the I/X frames.

  8. TTP Protocol Stack • Host Layer: application software in the host • FTU CNI • FTU Layer: FTU membership • Basic CNI • RM Layer: redundancy management • SRU Layer: SRU membership, clock synchronization • Data Link/Physical Layer: media access (TDMA)

  9. TTP Protocol Stack (cont.) • Data Link/Physical Layer • Provide the means to exchange frames between the nodes • SRU Layer • Store the data fields of the received frames • RM Layer • Provide the mechanisms for the cold start of a TTP/C cluster • FTU Layer • Group two or more nodes into FTUs • Host Layer • Provide the application software • Basic CNI • A data-sharing interface between the RM layer and FTU layer • FTU CNI • The interface between FTU layer and Host Layer

  10. Structure of a TTP/C Based System

  11. Timeline in TTP/C • TDMA Cycle • One FTU sends message twice • The pattern is repeated when TDMA round ends • Cluster Cycle • Cluster cycle involves scheduling all possible messages and tasks

  12. TTP/C Frame Structure N-Frame:

  13. Paper Objective • Investigate properties of the Clique Avoidance algorithm. Performance analysis and study of interaction with Implicit Acks mechanism. • Study ability to resolve and detect conflicts in membership views of nodes within a cluster. • Provide time bounds for detecting and removing faulty members. • For their analysis, they assume arbitrary failures with bounded frequency.

  14. Initial TTP/C Fault Hypothesis: Nodes • Only one node is faulty within the duration of a TDMA round. • A node may become faulty only after any previously faulty node has either shut down or operates correctly again. • A transmission fault is consistent (all nodes will consistently consider the respective frame faulty or correct). • A node does not send data, faulty or correct, outside its assigned sending slots. • A node never hides its identity when sending frames.

  15. Initial TTP/C Fault Hypothesis: Network • Only one channel can be faulty during a TDMA slot. • A channel does not spontaneously create correct frames. • A channel will deliver a frame either within some known time bounds or never. • The Bus Guardian transforms node errors so that they comply with the hypothesis. • The Central Guardian is a more cost-effective solution and handles several arbitrary faults.

  16. Cluster Model - Extended Fault Hypothesis • No failures other than the one that caused a cluster partition can occur within two TDMA rounds before and after that failure. Thus, initially there is a single clique to which all nodes are assigned. • A partition failure should leave both partitions with more than one member. It should affect both channels and be inconsistent, contrary to the initial hypothesis. • TTP/C can handle faults that violate the hypothesis, but in that case there is no guarantee that it selects the correct clique.

  17. Group Membership Protocol • Clique Avoidance algorithm • Removes faulty nodes from cluster • Prevents several coexisting cliques • Implicit Acknowledgement • The node inspects the membership list sent by the receiving nodes, to determine whether its message was correctly received.

  18. Cluster Model - Slot n slots per TDMA round

  19. Clique Avoidance • A reception is considered correct if the received C-state matches the local C-state and the data are not corrupted, i.e. the transmission time is correct and the memberships match after adding the sender. • After a successful reception, the sender is added to the receiver's membership list (ML). • After an incorrect reception, the sender is removed from the ML. • If the ML of the receiver differs only by the sender, then the reception is successful. • The Accept Counter (AC) is increased for every successful reception. • The Failed Counter (FC) is increased for every incorrect reception. • If FC >= AC, the node raises an Ack Error and shuts down (freezes). • FC and AC are reset to 0 in each TDMA round.
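
The counter rules on this slide can be sketched in a few lines. This is a minimal model, not the TTP/C controller logic: the `Node` class, its method names, and the use of Python sets for the ML are assumptions made for illustration, and the freeze test is the `FC >= AC` rule exactly as the slide states it.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    membership: set = field(default_factory=set)  # local membership list (ML)
    accept_counter: int = 0                       # AC
    failed_counter: int = 0                       # FC
    frozen: bool = False

    def receive(self, sender: str, sender_ml: set, data_ok: bool = True) -> bool:
        """Judge one slot: the MLs must match once the sender is added to both."""
        agrees = data_ok and (self.membership | {sender}) == (set(sender_ml) | {sender})
        if agrees:
            self.membership.add(sender)
            self.accept_counter += 1
        else:
            self.membership.discard(sender)
            self.failed_counter += 1
        return agrees

    def clique_check(self) -> bool:
        """Once per TDMA round: freeze if FC >= AC, then reset both counters.

        Note: with both counters at 0 this rule also freezes the node, so the
        sketch assumes at least one slot was observed in the round.
        """
        if self.failed_counter >= self.accept_counter:
            self.frozen = True
        self.accept_counter = 0
        self.failed_counter = 0
        return not self.frozen
```

A node that agrees with the majority of the frames it saw in a round keeps running; a node that ended up in the minority view freezes at its check.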

  20. Clique formation under the extended fault hypothesis • Prior to the failure, there is consensus on membership. • A transient failure (an asymmetric send fault) occurs at slot 0, while node A is transmitting. • As a result, some nodes in the cluster receive A's transmission correctly and the rest do not. • Two cliques are formed: one whose members have A in their membership and one whose members do not.
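
The split described above can be illustrated with a small model. The function name, the node labels, and the representation of a clique as "all nodes sharing the same membership view" are assumptions of this sketch, not definitions from the paper.

```python
def cliques_after_asymmetric_fault(members, sender, received_ok):
    """Group nodes by the membership view they hold after the sender's slot.

    Nodes in `received_ok` (and the sender itself) keep the sender in their
    view; every other node removes it.
    """
    full_view = frozenset(members)
    cliques = {}
    for node in members:
        if node == sender or node in received_ok:
            view = full_view
        else:
            view = full_view - {sender}
        cliques.setdefault(view, set()).add(node)
    return cliques


cluster = {"A", "B", "C", "D", "E"}
result = cliques_after_asymmetric_fault(cluster, sender="A", received_ok={"B", "C"})
```

Here `result` maps each distinct membership view to the set of nodes that hold it, so two keys mean two cliques.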

  21. Implicit Acks - Successors • After a successful transmission, A increases its AC. B checks the frame for correctness. • A waits for the expected message from B. • If the reception was successful, B adds A to its ML and transmits an uncorrupted message. • If the MLs are the same, or B's differs only by A, then A considers B its successor. • If the MLs are the same, then A is acknowledged (case 1). It increases its AC and adds B to its ML. • If B's ML differs by A, then B's reception was not successful and B removed A; A increases its FC and removes B (case 2). • Otherwise, A removes B from its ML and increases its FC, unless B did not transmit at all. A goes back to step 1 (case 3).

  22. Implicit Acks - Successors • A waits for the expected message from the subsequent node C. • If A finds a successor C whose ML contains A, then A is acknowledged. • B is assumed faulty, and both FC and ML were updated correctly. • A increases its AC and adds C to its ML (case 4). • However, if C's ML does not include A, A considers itself erroneous. A removes itself from its local list, adds both B and C, and increases its AC. It then has the same ML as B and C (case 5).
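
The five cases on the two slides above can be sketched from sender A's point of view. The `Sender` class and both function names are hypothetical; the ML comparison ignores the successor's own entry, which is an assumption of this sketch, and the counter updates follow the slides literally.

```python
from dataclasses import dataclass


@dataclass
class Sender:
    name: str
    ml: set          # A's local membership list, including A itself
    ac: int = 1      # A already increased AC after its own transmission
    fc: int = 0


def on_first_successor(a: Sender, b: str, b_ml) -> int:
    """Evaluate the frame of the first successor candidate B (cases 1 to 3)."""
    if b_ml is None:                 # B did not transmit at all: FC unchanged
        a.ml.discard(b)
        return 3
    mine = a.ml - {b}
    theirs = set(b_ml) - {b}
    if theirs == mine:               # case 1: A is acknowledged
        a.ac += 1
        a.ml.add(b)
        return 1
    a.fc += 1
    a.ml.discard(b)
    # case 2: B's ML differs only by A; otherwise case 3 (keep looking)
    return 2 if theirs == mine - {a.name} else 3


def on_second_successor(a: Sender, b: str, c: str, c_ml) -> int:
    """Evaluate the second successor C after case 2 (cases 4 and 5)."""
    a.ac += 1
    if a.name in c_ml:               # case 4: C confirms A, so B was faulty
        a.ml.add(c)
        return 4
    a.ml.discard(a.name)             # case 5: A defects to B's and C's view
    a.ml |= {b, c}
    return 5
```

After case 5 the sender holds the view of the other clique, which is the defector situation treated on the next slide.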

  23. Implicit Acks - Defector • In case 5, A changes clique membership and becomes a defector. • Other nodes become aware of a defector only in its next sending round, from the ML it transmits. • If the defector is implicitly acknowledged, it is no longer a defector. If not, it freezes due to clique avoidance (CA).

  24. Partition failure. Slot 0, Preparation Phase

  25. Partition failure. Slot 0, Transmission Phase

  26. Partition failure. Slot 0, Evaluation Phase

  27. Partition failure. Slot 1, Preparation Phase

  28. Partition failure. Slot 1, Transmission Phase

  29. Partition failure. Slot 1, Evaluation Phase

  30. Partition failure. Slot 2, Preparation Phase

  31. Partition failure. Slot 2, Transmission Phase

  32. Partition failure. Slot 2, Evaluation Phase

  33. Partition failure. Slot 3, Preparation Phase

  34. Partition failure. Slot 3, Transmission Phase

  35. Partition failure. Slot 3, Evaluation Phase

  36. Partition failure. Slot 4, Preparation Phase

  37. Partition failure. Slot 4, Preparation Phase. FC > AC: Node A4 freezes!

  38. Partition failure. Slot 4, Evaluation Phase

  39. Partition failure. Slot 5, Preparation Phase

  40. Partition failure. Slot 5, Transmission Phase

  41. Partition failure. Slot 5, Evaluation Phase

  42. Partition failure. Slot 6, Preparation Phase

  43. Partition failure. Slot 6, Preparation Phase. FC > AC: Node A1 freezes!

  44. Partition failure. Slot 6, Evaluation Phase

  45. Partition failure-Defection. Slot 0, Preparation Phase

  46. Partition failure-Defection. Slot 0, Transmission Phase

  47. Partition failure-Defection. Slot 0, Evaluation Phase

  48. Partition failure-Defection. Slot 1, Preparation Phase

  49. Partition failure-Defection. Slot 1, Transmission Phase

  50. Partition failure-Defection. Slot 1, Evaluation Phase
