
EEC 688/788 Secure and Dependable Computing

This lecture covers checkpoint-based and logging-based protocols for secure and dependable computing, including distributed snapshot protocols, fault handling mechanisms, and recovery strategies. Topics include pessimistic, optimistic, and causal logging, as well as the benefits and challenges of the different logging mechanisms. The discussion also touches on application-level reliable messaging and the importance of message logging for system recovery. Overall, the focus is on understanding the complexities and trade-offs of the different recovery techniques.


Presentation Transcript


  1. EEC 688/788 Secure and Dependable Computing, Lecture 7. Wenbing Zhao, Department of Electrical and Computer Engineering, Cleveland State University. wenbing@ieee.org

  2. Outline: Checkpointing and logging. Checkpoint-based protocols: uncoordinated checkpointing, coordinated checkpointing. Logging-based protocols: pessimistic logging, optimistic logging, causal logging.

  3. Chandy and Lamport Distributed Snapshot Protocol. The CL snapshot protocol is a nonblocking protocol, whereas the TS checkpointing protocol is blocking; CL is therefore more desirable for applications that do not wish to suspend normal operation. However, the CL protocol is only concerned with how to obtain a consistent global checkpoint. CL protocol: there is no coordinator; any node may initiate a global checkpointing round. Data structures: Marker message: equivalent to the CHECKPOINT message. Marker certificate: keeps track of whether a marker has been received from every incoming channel.
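To make the marker rules above concrete, here is a minimal Python sketch of the marker handling at one process. The class and helper names (CLProcess, take_local_checkpoint, deliver, snapshot_complete) are illustrative assumptions, not part of the original protocol description.

```python
class CLProcess:
    """A sketch of Chandy-Lamport marker handling for one process (illustrative only)."""

    def __init__(self, pid, incoming, outgoing):
        self.pid = pid
        self.incoming = set(incoming)      # ids of processes with a channel into this process
        self.outgoing = set(outgoing)      # ids of processes this process has a channel to
        self.recorded_state = None         # local checkpoint, once taken
        self.channel_state = {}            # incoming channel id -> list of logged messages
        self.marker_certificate = set()    # incoming channels a marker has arrived on

    def initiate_snapshot(self):
        # Any process may initiate: record local state, then send a marker on every outgoing channel.
        self.recorded_state = self.take_local_checkpoint()
        self.channel_state = {c: [] for c in self.incoming}
        for c in self.outgoing:
            self.send(c, ("MARKER", self.pid))

    def on_marker(self, channel):
        if self.recorded_state is None:
            # First marker seen: take the local checkpoint immediately and propagate markers.
            self.initiate_snapshot()
        # Stop logging this channel; its recorded state is whatever was logged so far (possibly empty).
        self.marker_certificate.add(channel)
        if self.marker_certificate == self.incoming:
            self.snapshot_complete()       # a marker has been received on every incoming channel

    def on_message(self, channel, msg):
        # After the local checkpoint and before the marker on this channel, log the channel state.
        if self.recorded_state is not None and channel not in self.marker_certificate:
            self.channel_state[channel].append(msg)
        self.deliver(msg)

    # --- illustrative stubs ---
    def take_local_checkpoint(self):
        return {"pid": self.pid}           # placeholder for the real application state

    def send(self, dest, msg):
        print(f"P{self.pid} -> P{dest}: {msg}")

    def deliver(self, msg):
        pass                               # hand the message to the application

    def snapshot_complete(self):
        print(f"P{self.pid} snapshot complete; channel states: {self.channel_state}")
```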

  4. CL Distributed Snapshot Protocol

  5. Example. P0 channel state: m0 (P1-to-P0 channel). P1 channel state: m1 (P2-to-P1 channel). P2 channel state: empty.

  6. Comparison of TS & CL Protocols. Similarities: both rely on control messages to coordinate checkpointing, and both capture channel state in virtually the same way: start logging channel state upon receiving the 1st checkpoint message from another channel, and stop logging channel state once a checkpoint message has been received on the incoming channel. The communication overhead is similar.

  7. Comparison of TS & CL Protocols. The differences lie in the strategies for producing a global checkpoint: the TS protocol suspends normal operation upon the 1st checkpoint message while CL does not; the TS protocol captures channel state prior to taking a checkpoint, while CL captures channel state after taking a checkpoint; and the TS protocol is more complete and robust than CL because it includes a fault handling mechanism.

  8. Log Based Protocols. Work might be lost upon recovery using checkpoint-based protocols; by logging messages, we may be able to recover the system to where it was prior to the failure. System model: the execution of a process is modeled as a set of consecutive state intervals, each initiated by a nondeterministic event or by the initial state. We assume the only type of nondeterministic event is the receipt of a message.

  9. Log Based Protocols. In practice, logging is always used together with checkpointing: it limits the recovery time (start from the latest checkpoint instead of from the initial state) and limits the size of the log (after taking a checkpoint, previously logged events can be purged). Logging protocol types: Pessimistic logging: messages are logged prior to execution. Optimistic logging: messages are logged asynchronously. Causal logging: nondeterministic events that have not yet been logged to stable storage are piggybacked on each message sent. For optimistic and causal logging, the dependencies between processes have to be tracked => more complexity and longer recovery time.

  10. Pessimistic Logging. Synchronously log every incoming message to stable storage prior to execution. Each process periodically checkpoints its state; no coordination is needed. Recovery: a process restores its state using the last checkpoint and replays all logged incoming messages.
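A minimal sketch of the pessimistic logging rule, assuming a hypothetical PessimisticLogger class and an application-specific execute stub; the essential point is that the write to stable storage (flush plus fsync) completes before the message is executed.

```python
import json
import os

class PessimisticLogger:
    """Log each incoming message to stable storage before executing it (sketch)."""

    def __init__(self, log_path):
        self.log_file = open(log_path, "a+b")

    def on_receive(self, msg: dict):
        # 1. Synchronously log the message: do not proceed until the record is on stable storage.
        record = json.dumps(msg).encode() + b"\n"
        self.log_file.write(record)
        self.log_file.flush()
        os.fsync(self.log_file.fileno())
        # 2. Only now is it safe to execute the message.
        self.execute(msg)

    def replay(self):
        # Recovery: after restoring the latest checkpoint, replay the logged messages in order.
        self.log_file.seek(0)
        for line in self.log_file:
            self.execute(json.loads(line))

    def execute(self, msg):
        pass  # application-specific processing
```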

  11. Pessimistic Logging: Example Pessimistic logging can cope with concurrent failures and the recovery of two or more processes

  12. Benefits of Pessimistic Logging. Processes do not need to track their dependencies; the logging mechanism is easy to implement and less error prone; output commit is automatically ensured; there is no need to carry out coordinated global checkpointing; by replaying the logged messages, a process can always bring itself to be consistent with other processes; and recovery can be done completely locally. The only impact on other processes is duplicate messages (which can be discarded).

  13. Pessimistic Logging: Discussion. Reconnection: a process must be able to cope with temporary connection failures and be ready to accept reconnections from other processes; application logic should be made independent of transport-level events (an event-based or document-based computing paradigm). Message duplicate detection: messages may be replayed during recovery => duplicate messages; transport-level duplicate detection is irrelevant here, so a mechanism must be added in application-level protocols, e.g., WS-ReliableMessaging. Atomic message receiving and logging: a process may fail right after receiving a message, before it has a chance to log it to stable storage; this calls for an application-level reliable messaging mechanism.

  14. Application-Level Reliable Messaging. The sender buffers each message sent until it receives an application-level ack. Benefits of application-level reliable messaging: it provides atomic message receiving and logging; it facilitates distributed system recovery from process failures by enabling reconnection; and it enables optimizations: a received message can be executed immediately and its logging deferred until another message is to be sent, logging and message execution can be done concurrently, and if a process sends out a message after receiving several messages, the logging of those messages can be batched.
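A rough sketch of the sender side of application-level reliable messaging, under the assumption of a simple in-memory buffer and a caller-supplied transport_send function (both names are hypothetical): each message stays buffered until the application-level ack arrives, and anything still unacknowledged is retransmitted after a reconnection.

```python
class ReliableSender:
    """Buffer each outgoing message until an application-level ack arrives (sketch)."""

    def __init__(self, transport_send):
        self.transport_send = transport_send     # e.g., a socket or queue write, supplied by caller
        self.next_seq = 0
        self.unacked = {}                        # seq -> (dest, payload), kept until acked

    def send(self, dest, payload):
        seq = self.next_seq
        self.next_seq += 1
        self.unacked[seq] = (dest, payload)
        self.transport_send(dest, {"type": "REGULAR", "seq": seq, "payload": payload})
        return seq

    def on_app_ack(self, seq):
        # The receiver has logged/processed the message; it can now be dropped from the buffer.
        self.unacked.pop(seq, None)

    def on_reconnect(self, dest):
        # After a connection failure, retransmit everything still unacknowledged for that peer.
        for seq, (d, payload) in sorted(self.unacked.items()):
            if d == dest:
                self.transport_send(dest, {"type": "REGULAR", "seq": seq, "payload": payload})
```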

  15. Sender Based Message Logging. Basic idea: log the message at the sending side in volatile memory; should the receiving process fail, it can obtain the messages logged at the sending processes for recovery. To avoid restarting from the initial state after a failure, a process can periodically checkpoint its local state and write the message log to stable storage (as part of the checkpoint) asynchronously. Tradeoff: the relative ordering of messages must be explicitly supplied by the receiver to the sender (quite counter-intuitive!), and the receiver must wait for an explicit ack for the ordering message before it sends any messages to other processes (however, it can execute the received message immediately without delay). This mechanism prevents the formation of orphan messages and orphan processes.

  16. Orphan Message and Orphan Process. An orphan message is one that was sent by a process prior to a failure, but cannot be guaranteed to be regenerated upon the recovery of the process. An orphan process is a process that receives an orphan message. If a process sends out a message and subsequently fails before the determinants of the messages it has received are properly logged, the message sent becomes an orphan message.

  17. Sender Based Message Logging Protocol: Data Structures. A counter, seq_counter, used to assign a sequence number (the current value of the counter) to each outgoing message; this is needed for duplicate detection. A table for duplicate detection: each entry has the form <process_id, max_seq>, where max_seq is the maximum sequence number that the current process has received from the process with identifier process_id; a message is deemed a duplicate if it carries a sequence number lower than or equal to max_seq for the corresponding process. Another counter, rsn_counter, used to record the receiving/execution order of each incoming message; this counter is initialized to 0 and incremented by one for each message received.

  18. Sender Based Message Logging Protocol: Data Structures. A message log (in volatile memory) for messages sent by the process; in addition to the message sent, the following metadata is also recorded: destination process id, receiver_id; sending sequence number, seq; receiving sequence number, rsn. A history list for the messages received since the last checkpoint; it is used to find the receiving order number for a duplicate message: upon receiving a duplicate message, the process should supply the corresponding (original) receiving order number so that the sender of the message can log this ordering information properly. Each entry in the list has the following information: sending process id, sender_id; sending sequence number, seq; receiving sequence number, rsn (assigned by the current process).
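The data structures of slides 17 and 18 might be represented along the following lines; the Python names (SBMLState, LogEntry, HistoryEntry, blocked_on_ack) are assumptions made for illustration, and the later sketches build on them.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class LogEntry:
    """One entry in the sender's volatile message log."""
    receiver_id: int
    seq: int                              # sending sequence number
    msg: object                           # the application payload that was sent
    rsn: Optional[int] = None             # receiving sequence number, filled in when ORDER arrives

@dataclass
class HistoryEntry:
    """Receiving order assigned to one message since the last checkpoint."""
    sender_id: int
    seq: int
    rsn: int

@dataclass
class SBMLState:
    """Per-process state for sender-based message logging (sketch)."""
    seq_counter: int = 0                  # used to assign sending sequence numbers
    rsn_counter: int = 0                  # used to record the receiving/execution order
    max_seq_seen: Dict[int, int] = field(default_factory=dict)   # duplicate-detection table
    message_log: List[LogEntry] = field(default_factory=list)    # volatile log of sent messages
    history: List[HistoryEntry] = field(default_factory=list)    # receive order since last checkpoint
    blocked_on_ack: bool = False          # True while waiting for an ACK to a pending ORDER message
```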

  19. What Should be Checkpointed? All the data structures described above except the history list must be checkpointed together with the process state. The two counters, one for assigning the message sequence number and the other for assigning the message receiving order, are needed so that the process can continue doing so upon recovery from the checkpoint; the table for duplicate detection is needed for a similar reason. Why must the message log be checkpointed? The log is needed for the receiving processes to recover from a failure and hence cannot be garbage collected upon a checkpointing operation. An additional mechanism is necessary to ensure that the message log does not grow indefinitely.

  20. Sender Based Message Logging Protocol: Message Types. REGULAR: used for sending regular messages generated by the application process; it has the form <REGULAR, seq, rsn, m>. ORDER: used by the receiving process to notify the sending process of the receiving order of the message; an order message has the form <ORDER, [m], rsn>, where [m] is the message identifier consisting of a tuple <sender_id, receiver_id, seq>. ACK: used by the sending process (of a regular message) to acknowledge the receipt of the order message; it assumes the form <ACK, [m]>.

  21. Sender Based Message Logging Protocol: Normal Operation. The protocol operates in three steps for each message: A regular message, <REGULAR, seq, rsn, m>, is sent from one process, say Pi, to another process, say Pj. Process Pj determines the receiving/execution order, rsn, of the regular message and reports this determinant information to Pi in an order message <ORDER, [m], rsn>. Process Pj waits until it has received the corresponding acknowledgment message, <ACK, [m]>, before it sends out any regular message.
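A sketch of this three-step exchange, reusing the hypothetical SBMLState, LogEntry, and HistoryEntry structures above. For brevity the REGULAR message here carries only the sender id, seq, and payload, and the net object stands in for whatever messaging layer is available.

```python
def send_regular(p, my_id, dest, payload, net):
    # Step 1: Pi logs the message (rsn still unknown) and sends it as a REGULAR message.
    assert not p.blocked_on_ack, "must not send REGULAR messages while an ORDER is unacknowledged"
    p.seq_counter += 1
    p.message_log.append(LogEntry(receiver_id=dest, seq=p.seq_counter, msg=payload))
    net.send(dest, ("REGULAR", my_id, p.seq_counter, payload))

def on_regular(p, my_id, sender_id, seq, payload, net):
    # Step 2: Pj assigns the receiving order (rsn), may execute the message right away,
    # and reports the determinant to Pi in an ORDER message.
    p.rsn_counter += 1
    p.history.append(HistoryEntry(sender_id, seq, p.rsn_counter))
    p.blocked_on_ack = True                       # no new REGULAR messages until the ACK arrives
    net.send(sender_id, ("ORDER", (sender_id, my_id, seq), p.rsn_counter))

def on_order(p, m_id, rsn, net):
    # Step 3a: Pi records the rsn in its log entry for [m] and acknowledges the ORDER message.
    _, receiver_id, seq = m_id
    for entry in p.message_log:
        if entry.receiver_id == receiver_id and entry.seq == seq:
            entry.rsn = rsn
    net.send(receiver_id, ("ACK", m_id))

def on_ack(p, m_id):
    # Step 3b: Pj may now resume sending REGULAR messages to other processes.
    p.blocked_on_ack = False
```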

  22. Sender Based Message Logging Protocol: Recovery Mechanism. On recovering from a failure, a process first restores its state using the latest local checkpoint, and then it must broadcast a request to all other processes in the system to retransmit all their logged messages that were sent to it. The recovering process retransmits regular messages or ack messages based on the following rule: if the entry in its log for a message contains no rsn value, the regular message is retransmitted, because the intended receiving process might not have received it; if the entry in the log for a message contains a valid rsn value, an ack message is sent so that the receiving process can resume sending regular messages. When a process receives a regular message, it always sends a corresponding order message in response.
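The recovery rule could look roughly like this, again using the hypothetical structures above; restore_latest_checkpoint and the RETRANSMIT_REQUEST message name are illustrative placeholders.

```python
def recover(p, my_id, net, all_pids, restore_latest_checkpoint):
    # Restore counters, duplicate-detection table, and the message log from the latest checkpoint.
    restore_latest_checkpoint(p)
    # Ask every other process to retransmit the logged messages it sent to this process.
    for pid in all_pids:
        if pid != my_id:
            net.send(pid, ("RETRANSMIT_REQUEST", my_id))
    # Apply the retransmission rule to this process's own restored log.
    for entry in p.message_log:
        m_id = (my_id, entry.receiver_id, entry.seq)
        if entry.rsn is None:
            # No rsn recorded: the receiver might never have gotten the message, so resend it.
            net.send(entry.receiver_id, ("REGULAR", my_id, entry.seq, entry.msg))
        else:
            # rsn already known: resend the ACK so the receiver can resume sending REGULAR messages.
            net.send(entry.receiver_id, ("ACK", m_id))
```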

  23. Actions upon Receiving a Regular Message. A process always sends a corresponding order message in response. Three scenarios arise (mainly during recovery): (1) The message is not a duplicate: the current rsn counter value is assigned to the message and the order message is sent; the process must wait until it receives the ack message before it can send any regular message. (2) The message is a duplicate and the corresponding rsn is found in the history list: the actions are identical to the above, except that the rsn is not newly assigned. (3) The message is a duplicate and no rsn is found in the history list: the process must have checkpointed its state after receiving the message, so the message is no longer needed for recovery; hence the order message includes a special constant indicating so, and the sender can then purge the message from its log. The recovering process may receive two types of retransmitted regular messages: those with a valid rsn value (the rsn must already be part of the checkpoint, so it executes the message according to that order) and those without (it can assign the message any order).
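The three cases might be coded as follows, a sketch built on the same hypothetical structures; ALREADY_CHECKPOINTED stands in for the special constant mentioned above, and execute is an application-specific callback.

```python
ALREADY_CHECKPOINTED = -1    # illustrative stand-in for the "no longer needed" constant

def on_regular_msg(p, my_id, sender_id, seq, payload, net, execute):
    """Always answer with an ORDER message; pick the rsn according to the three cases."""
    if seq > p.max_seq_seen.get(sender_id, 0):
        # Case 1: not a duplicate -- assign a fresh rsn, execute, and wait for the ACK.
        p.max_seq_seen[sender_id] = seq
        p.rsn_counter += 1
        rsn = p.rsn_counter
        p.history.append(HistoryEntry(sender_id, seq, rsn))
        p.blocked_on_ack = True
        execute(payload)
    else:
        old = next((h.rsn for h in p.history
                    if h.sender_id == sender_id and h.seq == seq), None)
        if old is not None:
            # Case 2: duplicate with a known rsn -- report the original rsn, do not re-execute.
            rsn = old
            p.blocked_on_ack = True
        else:
            # Case 3: duplicate that predates the last checkpoint -- tell the sender it can
            # purge this message from its log.
            rsn = ALREADY_CHECKPOINTED
    net.send(sender_id, ("ORDER", (sender_id, my_id, seq), rsn))
```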

  24. Limitations of Sender Based Msg Logging Protocol. It won't work in the presence of 2 or more concurrent failures: the determinant for some regular messages (i.e., the rsn) might be lost => orphan processes and cascading rollbacks. In the example, P2 may become an orphan process if P0 and P1 both crash: it has received a message (mt) that, after recovery, no process has sent.

  25. Truncating Sender's Message Log. Once a process completes a local checkpoint, it broadcasts a message containing the highest rsn value among the messages that it executed prior to the checkpoint. All messages sent by other processes to this process that were assigned an rsn smaller than or equal to this value can now be purged from their message logs (including copies in stable storage as part of a checkpoint). Alternatively, this highest rsn value can be piggybacked on each message (regular or control) sent to another process, to enable asynchronous purging of logged messages that are no longer needed.
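A sketch of the checkpoint-driven purge, using the same hypothetical structures; LOG_TRUNCATE is an illustrative message name.

```python
def on_checkpoint_complete(p, my_id, net, all_pids):
    # Announce the highest rsn executed before the checkpoint that was just completed.
    for pid in all_pids:
        if pid != my_id:
            net.send(pid, ("LOG_TRUNCATE", my_id, p.rsn_counter))

def on_log_truncate(p, from_pid, highest_rsn):
    # Purge log entries for messages sent to from_pid whose rsn is covered by its checkpoint;
    # entries whose rsn is still unknown are kept.
    p.message_log = [
        e for e in p.message_log
        if not (e.receiver_id == from_pid and e.rsn is not None and e.rsn <= highest_rsn)
    ]
```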
