ICS362 – Distributed Systems Dr. Ken Cosh Lecture 8
Review • Replication & Consistency • Data Centric Consistency Models • Continuous Consistency • Sequential Consistency • Causal Consistency • Entry (/Release) Consistency • Client Centric Consistency Models • Eventual Consistency • Monotonic Reads • Monotonic Writes • Read Your Writes • Writes Follow Reads • Replica Management • Replica & Content Placement • Protocols • Remote Write Protocols • Local Write Protocols • Active Replication – Quorum Based Protocols
This Week • Fault Tolerance • Process Resilience • Reliable Client-Server Communication • Reliable Group Communication • Distributed Commit • Recovery
Fault Tolerance • One of our primary Distributed Systems goals was Fault Tolerance • i.e. a partial failure may leave some components not working while other components are totally unaffected • Whereas in a non-distributed system, a failure may bring down the whole system.
Dependable Systems • Fault tolerance is closely related to the concept of dependability • i.e. the degree of trust users can place in a system. • In distributed systems we consider the following properties affecting dependability • Availability • Reliability • Safety • Maintainability • (Security)
Availability / Reliability • Availability • The property that a system is ready to be used when requested • Measured as a probability • Reliability • The property that a system can run continuously without failure • Measured as a time interval • Note: These are different definitions from those discussed in other courses… ;)
Availability / Reliability • If a system goes down for one millisecond every hour • It is highly available (>99.9999%) • But highly unreliable • If a system never crashes, but is shut down for 2 weeks each year • It is highly reliable • But not very available (~96%)
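A quick sanity check of these two figures, as a minimal sketch (the availability helper is mine, not part of the lecture material):

```python
# Availability = uptime / total time. A rough check of the two
# scenarios above (all numbers are illustrative).

def availability(downtime_s: float, period_s: float) -> float:
    """Fraction of the period the system is up."""
    return 1.0 - downtime_s / period_s

# 1 ms of downtime every hour: highly available, yet it fails often.
print(f"{availability(0.001, 3600):.6%}")              # ~99.999972%

# Shut down 2 weeks per year: reliable (never crashes), less available.
print(f"{availability(14 * 86400, 365 * 86400):.2%}")  # ~96.16%
```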
Safety / Maintainability • Safety • Situations where a temporary failure in a system leads to something catastrophic • Human life, injury, environmental damage etc. • Maintainability • Refers to how easily a failed system can be repaired • Highly maintainable systems are often highly available • Especially if the failures can be automatically detected and corrected
Failures • A system fails when it doesn’t perform as promised • When one or more services can’t be provided • An error is the system state that leads to the failure • Perhaps a message sent across the network is damaged • An error is caused by a fault (hence fault tolerance) • The fault could be a faulty transmission medium (which is easily corrected), or poor weather conditions (which is not so easily corrected).
Faults • Faults lead to Errors, Errors lead to Failures • But there are different types of faults. • Transient Faults • Occur once and then disappear. • E.g. bird flies through a microwave beam transmitter • The operation can simply be repeated • Intermittent Faults • Occur, then vanish then reappear • E.g. loose contact on a connector • Typically disappear when the engineer arrives! • Permanent Faults • Occur until the faulty component is replaced • E.g. Burnt out disk, software bug
Failure Models • Distributed Systems • Collection of Clients & Servers communicating and providing services • Both machine and communication channels could cause faults • Complex dependencies between servers • A faulty server may be caused by a fault within a different server • There are several different types of failures
Crash Failure • Server prematurely halts, but was working until it stopped. • Perhaps caused by the operating system in which case there is one solution • Reboot it! • Our PCs suffer from crash failures so frequently that we just ‘accept it’ • the reset button is now on the front of the case.
Omission Failure • Server fails to respond to a request • Receive Omission • When the server didn’t receive the request in the first place. • Send Omission • When the server fails to send the response • Perhaps a send buffer overflow.
Timing Failure • When the server’s response is outside of a specified time interval • Remember isochronous data streams? • Providing data too soon can cause as many problems as being too late…
Response failure • When the server’s response is just incorrect • Value Failure • When the server simply sends the wrong reply to a request • State Transition Failure • When the server reacts unexpectedly to an incoming request • Perhaps it can’t recognise the message, or perhaps it has no code for dealing with the message.
Arbitrary Failures • Perhaps the most serious failures, also known as Byzantine Failures. • Server produces output that it shouldn’t have, but it can’t be detected as being incorrect. • Worse is when the server works maliciously with other servers to produce intentionally wrong answers • We’ll return to Byzantine later…
Redundancy • The key to masking failures is Redundancy • Information Redundancy • Extra bits added to allow recovery from damaged bits (e.g. Hamming codes; see the sketch below) • Time Redundancy • If need be, the action is performed again after a period of time (perhaps if a transaction aborts) • Physical Redundancy • Extra equipment / processes to make it possible to continue with broken components (replication!)
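To make information redundancy concrete, here is a minimal Hamming(7,4) sketch: 3 parity bits protect 4 data bits, so any single flipped bit can be located and corrected. The function names are illustrative assumptions:

```python
# Hamming(7,4): encode 4 data bits with 3 parity bits so that any
# single-bit error can be located and corrected (information redundancy).
# Bit positions 1..7 are laid out as: p1 p2 d1 p3 d2 d3 d4.

def encode(d1, d2, d3, d4):
    p1 = d1 ^ d2 ^ d4        # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4        # covers positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4        # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]

def correct(word):
    # Recompute each parity check; the syndrome is the (1-based)
    # position of the flipped bit, or 0 if the word is clean.
    s1 = word[0] ^ word[2] ^ word[4] ^ word[6]
    s2 = word[1] ^ word[2] ^ word[5] ^ word[6]
    s3 = word[3] ^ word[4] ^ word[5] ^ word[6]
    syndrome = s1 + 2 * s2 + 4 * s3
    if syndrome:
        word[syndrome - 1] ^= 1   # flip the damaged bit back
    return word

codeword = encode(1, 0, 1, 1)
codeword[4] ^= 1                  # damage one bit in transit
assert correct(codeword) == encode(1, 0, 1, 1)
```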
Physical Redundancy • We have 2 eyes, 2 ears, 2 lungs… • A Boeing 747 has 4 engines, but can fly with only 3. • In football we have a referee and 2 assistant referees (linesmen) • TMR (Triple Modular Redundancy) works by having 3 copies of each component
Triple Modular Redundancy • In the classic TMR circuit, each stage is triplicated (A1, A2, A3 feeding B1, B2, B3) with triplicated voters (V1-V3, then V4-V6) between the stages. • Suppose A2 fails. • Each voter (V1, V2, V3) still gets 2 good inputs, allowing it to pass the correct value to stage B. • Suppose voter V1 fails. • B1 will get an incorrect input, but B2 & B3 can produce the correct output, so V4-V6 can choose the correct response. • A minimal voter sketch follows.
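A TMR voter simply takes the majority of its inputs. A minimal sketch, assuming each replicated component hands the voter one value:

```python
from collections import Counter

def vote(inputs):
    """Majority vote over replicated component outputs.
    With 2k+1 replicas, up to k faulty values are outvoted."""
    value, count = Counter(inputs).most_common(1)[0]
    if count <= len(inputs) // 2:
        raise RuntimeError("no majority: too many faulty inputs")
    return value

# A2 fails and emits garbage; each voter still sees two good inputs.
print(vote([42, "garbage", 42]))   # -> 42
```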
Process Resilience • Similar to TMR, the key to tolerating faulty processes is organising multiple identical processes in a group. • When a message is sent to the group, all processes receive it, in the hope that one can deal with it. • Process groups are dynamic • A process can join or leave, and a process could be part of multiple groups at the same time • The group can be considered as a single abstraction • i.e. a message can be sent to the group regardless of which processes are in the group
Flat Groups vs Hierarchical Groups • In a Flat Group all processes are equal • Decisions are made collectively • In a Hierarchical Group one process may be the co-ordinator • The co-ordinator decides which worker process is best to perform some request
Flat Groups vs Hierarchical Groups • Flat Groups have no single point of failure • If one crashes, the group continues but just becomes smaller • But, decision making is complicated, often involving a vote • Hierarchical Groups are the opposite • If the coordinator breaks, the group breaks • But, the coordinator can make decisions without interrupting the others
Group Membership Management • How do we know which processes are part of a group? • We could have a group server responsible for creating, deleting groups and allowing processes to join and leave a group • This is efficient, but again results in a single point of failure • Alternatively it could be managed in a distributed style • To join or leave a group a process simply lets everyone know they are there or they are leaving • Assuming they leave voluntarily and don’t just crash
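A minimal sketch of the group-server approach described above; the class and method names are assumptions for illustration, not an established API:

```python
# Centralised group membership: one server tracks every group.
# Simple and efficient, but a single point of failure.

class GroupServer:
    def __init__(self):
        self.groups: dict[str, set[str]] = {}

    def create(self, group: str) -> None:
        self.groups.setdefault(group, set())

    def delete(self, group: str) -> None:
        self.groups.pop(group, None)

    def join(self, group: str, process: str) -> None:
        self.groups[group].add(process)

    def leave(self, group: str, process: str) -> None:
        self.groups[group].discard(process)

    def members(self, group: str) -> set[str]:
        return set(self.groups[group])

server = GroupServer()
server.create("replicas")
server.join("replicas", "p1")
server.join("replicas", "p2")
print(server.members("replicas"))   # {'p1', 'p2'}
```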
Group Membership Management • A further issue with distributed management is that joining / leaving needs to be synchronous with messages being sent • i.e. when a process joins it should then receive all subsequent messages and should stop receiving messages when it leaves • Which means joining and leaving are added to the process queue • Also, what happens when too many processes leave and the group can’t function any longer? • We need to rebuild the group – what if multiple processes attempt to rebuild the group simultaneously?
How many processes are needed? • A system is k fault tolerant if it continues working when k components fail. • If processes fail silently, k+1 processes are needed • one survivor is enough to keep providing the service • If processes exhibit Byzantine failures, 2k+1 are needed • Byzantine failures occur when a process continues to send erroneous or random replies • with 2k+1 processes, the k+1 correct replies still form a majority • But how do we determine (with certainty) that k processes might fail, but k+1 won’t?
What are the processes deciding? • Who should be coordinator? • Whether or not to commit a transaction? • How do we divide up tasks? • How / When should we synchronise? • …
Failure Detection • How can we know when a process has failed? • Ping - “Are you alive?” • But is it the process or the communication channel that has failed? • False Positives • Gossiping – “I’m alive!”
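One common realisation of pinging and gossiping is a timeout-based heartbeat detector. A minimal sketch, with the caveat from the slide that a silent peer may simply sit behind a failed channel (false positives):

```python
import time

# Heartbeat-style failure detection: a peer is suspected once no
# heartbeat has arrived within the timeout. This cannot distinguish
# a crashed process from a broken communication channel.

TIMEOUT = 5.0                      # seconds of silence before suspicion
last_heard: dict[str, float] = {}  # peer -> time of last heartbeat

def on_heartbeat(peer: str) -> None:
    """Called whenever an "I'm alive!" message arrives (gossip style)."""
    last_heard[peer] = time.monotonic()

def suspected(peer: str) -> bool:
    """True if we have not heard from the peer recently."""
    seen = last_heard.get(peer)
    return seen is None or time.monotonic() - seen > TIMEOUT
```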
Reliable Communication: Client / Server • As well as processes being ‘unreliable’, the communication between processes is ‘unreliable’. • Building a fault tolerant DS involves managing point-to-point communication. • TCP masks omission failures such as lost messages using acknowledgements and retransmissions • But this doesn’t resolve crash failures when the server may crash during transmission
RPC Semantics • RPC works well when client and server are functioning. If there is a crash it’s not easy to mask the difference between local and remote calls. • The client is unable to locate the server • The request message from the client to the server is lost • The server crashes after receiving a request • The reply message from the server to the client is lost • The client crashes after sending a request • Each pose different problems
Client Cannot Locate Server • Server could be down, or perhaps has been upgraded and is now using a different communication format • We could raise an exception (e.g. in Java) • Not every language has exceptions • Exceptions destroy the transparency • If the RPC responds with an exception “Cannot Locate Server”, it is clear that it isn’t a single processor system.
Lost Request Messages • Easiest to deal with • Start a timer; if the timer expires before an acknowledgement or a reply arrives, send the message again (see the sketch below). • The server just needs to detect whether it is an original message or a retransmission • But if too many messages are lost, the client will conclude “Cannot Locate Server”
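The timer-and-retransmit scheme, sketched over UDP; this is exactly what later slides call at-least-once semantics. The socket plumbing is an illustrative assumption:

```python
import socket

# Timer-based retransmission: resend the request whenever the timer
# expires before a reply arrives. The server may execute the request
# more than once, so it had better be idempotent (see later slides).

def call(sock: socket.socket, server, request: bytes,
         timeout: float = 1.0, max_tries: int = 5) -> bytes:
    sock.settimeout(timeout)
    for _ in range(max_tries):
        sock.sendto(request, server)          # (re)transmit the request
        try:
            reply, _ = sock.recvfrom(4096)    # wait for reply or timeout
            return reply
        except socket.timeout:
            continue                          # timer expired: try again
    # Too many losses: conclude "Cannot Locate Server".
    raise ConnectionError("cannot locate server")
```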
Server Crashes • Tricky, as there are different scenarios: (a) the normal case, where the server executes the request and replies; (b) the server crashes after executing the request but before replying; (c) the server crashes before it even executes the request. • The client can’t tell the difference between b and c, but they need different responses
Server Crashes • The server has 2 options • At Least Once Semantics • At Most Once Semantics • While we would like • Exactly Once Semantics • There is no way to arrange this
Semantics • At Least Once Semantics • Wait until the server reboots and try the operation again. • Keep trying until you get a response • The RPC will be carried out at least once, but possibly more. • At Most Once Semantics • Give up immediately! • The RPC may have been carried out, but won’t be carried out more than once. • Alternative: • Give no guarantees, so the RPC may happen anywhere from zero to a large number of times.
Server Crashes • The client also has 4 options • Never reissue a request • Always reissue a request • Reissue a request only if no acknowledgement was received • Reissue a request only if an acknowledgement, but no reply, was received
Server Crashes • With 2 server strategies and 4 client strategies, there are 8 possible combinations • None of them are satisfactory • In short, the possibility of server crashes radically changes the nature of RPC, very different from single processor systems.
Lost Reply Messages • Also difficult • Did the reply get lost, or is the server just slow? • Resend the request based on a client timer? • That depends on whether the request is idempotent • Idempotency • Can the request be performed more than once without any damage being done?
Idempotency • Consider a request for the first 1,024 bytes of data from file “xyz.txt” • repeating it does no harm (idempotent) • Consider a request to transfer 1,000,000B from your account to mine • What happens if the reply is lost 10 times?
Lost Reply Messages • An alternative is to include a sequence number within each request • A retransmission then carries the same sequence number as the original request, so the server can recognise it as a duplicate rather than a new request. • However, this requires the server to maintain administration for each client • A further option is to send a bit in the message header indicating whether it is an original request or a retransmission • Original requests can be performed safely, but care should be taken with retransmissions. • A server-side sketch follows.
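Server-side, the sequence-number scheme amounts to remembering the reply for each (client, sequence number) pair. A minimal sketch; execute() is a placeholder for the real operation:

```python
# Duplicate suppression with per-client sequence numbers: the
# retransmission carries the same number as the original, so the server
# returns the cached reply instead of executing the request twice.

completed: dict[tuple[str, int], bytes] = {}  # (client, seq) -> reply

def handle(client: str, seq: int, request: bytes) -> bytes:
    key = (client, seq)
    if key in completed:            # retransmission: don't re-execute
        return completed[key]
    reply = execute(request)        # placeholder for the real operation
    completed[key] = reply          # remember the reply for retries
    return reply
```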
Client Crashes • When the client (parent) crashes after sending an RPC, the remote computation becomes an ‘orphan’. • i.e. there is no parent waiting for the results of the process. • Orphans cause problems • They waste CPU (and other) resources • They can cause confusion if they send their result just after the client reboots • How can we deal with orphans? • Extermination • Reincarnation • Gentle reincarnation • Expiration
Orphan Extermination • Each time a client sends an RPC message, it logs to hard disk what it is about to do. • When it reboots, it checks the log and explicitly kills off any orphans. • Downsides: • It’s expensive writing to disk • It might not work, as the orphans may themselves have made RPC calls, creating grand-orphans • If the network is broken, it might not be possible to reach the orphans at all • If an orphan holds a lock on some resource, that lock may remain in place forever
Reincarnation • When the client returns, it broadcasts a message to all other machines declaring a new epoch • Complete with a new epoch number • All servers can check whether they hold remote computations for that client and, if so, kill them • If any are missed, when they report back they will carry an old epoch number, so they are easy to detect (see the sketch below)
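A minimal sketch of the epoch idea, with all names assumed for illustration: computations are tagged with the epoch that spawned them, and an epoch announcement kills everything older:

```python
# Reincarnation sketch: each remote computation is tagged with the
# epoch of the client that spawned it. When a rebooted client declares
# a new epoch, servers kill everything from earlier epochs.

class Worker:
    def __init__(self):
        self.current_epoch = 0
        self.computations: dict[int, int] = {}  # task id -> epoch

    def start(self, task_id: int) -> None:
        self.computations[task_id] = self.current_epoch

    def on_new_epoch(self, epoch: int) -> None:
        self.current_epoch = epoch
        # Kill every orphan spawned under an older epoch.
        self.computations = {t: e for t, e in self.computations.items()
                             if e >= epoch}
```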
Gentle Reincarnation • When an epoch request comes in, each machine tries to locate the owner of their remote computations • If the owner can’t be located, the computation is killed.
Expiration • Each RPC is given a standard amount of time T to complete the job • If it can’t finish in time, it explicitly asks for a new quantum • If a client crashes and waits T before rebooting, all orphans are sure to be gone. • The problem is choosing a suitable T (see the lease sketch below).
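Expiration behaves like a lease: each computation gets quantum T and must renew before it runs out. A minimal sketch, with names assumed for illustration:

```python
import time

# Expiration sketch: every remote computation holds a lease of length T
# and must renew it before it expires. If the client has crashed,
# nobody renews, so the orphan dies within T. Choosing T is the problem.

T = 30.0                       # quantum in seconds (the hard part to tune)
leases: dict[int, float] = {}  # task id -> expiry time

def grant(task_id: int) -> None:
    leases[task_id] = time.monotonic() + T

def renew(task_id: int) -> None:
    grant(task_id)             # explicitly ask for a new quantum

def reap() -> None:
    """Kill every computation whose lease has expired."""
    now = time.monotonic()
    for task in [t for t, expiry in leases.items() if expiry < now]:
        del leases[task]       # in a real system: terminate the orphan
```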
Reliable Group Communication: Process Groups • Reliable Multicasting enables messages to be delivered to all members of a process group • Unfortunately enabling reliable multicasting is not that easy • Most transport layers support reliable point-to-point communication channels, but not reliable communication to groups. • At its simplest we can use multiple point-to-point messages
Reliable Multicasting • What happens when a process joins during the communication? • Should it get the message? • What happens if the sending process crashes? • To simplify, let’s assume that we know who is in the group and nobody is going to join or leave
Basic Reliable Multicasting • The sender gives each message a sequence number and stores the message until it receives an acknowledgement from every other process. • If a receiver missed a message it can simply request retransmission • Or, if the sender doesn’t get all the acknowledgements within a certain amount of time, it can resend the message. • A sketch of this scheme follows.
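A minimal sketch of this scheme; unicast() stands in for the underlying point-to-point send:

```python
# Basic reliable multicast: number each message, buffer it until every
# member has acknowledged, retransmit on timeout. At its simplest, the
# multicast itself is just n point-to-point sends.

class ReliableMulticaster:
    def __init__(self, members):
        self.members = set(members)
        self.next_seq = 0
        self.pending = {}          # seq -> (message, members still to ack)

    def multicast(self, message, unicast):
        seq = self.next_seq
        self.next_seq += 1
        self.pending[seq] = (message, set(self.members))
        for m in self.members:     # simplest form: n point-to-point sends
            unicast(m, seq, message)

    def on_ack(self, member, seq):
        if seq in self.pending:
            self.pending[seq][1].discard(member)
            if not self.pending[seq][1]:
                del self.pending[seq]   # everyone has it: drop the buffer

    def on_timeout(self, seq, unicast):
        if seq not in self.pending:
            return
        # Resend to everyone who has not yet acknowledged.
        message, missing = self.pending[seq]
        for m in missing:
            unicast(m, seq, message)
```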