Distributed Systems Docent: Vincent Naessens
Chapter 7: Consistency and replication • 7.1. Introduction • 7.2. Data-centric consistency models • 7.3. Client-centric consistency models • 7.4. Replication management • 7.5. Consistency protocols
7.1. Introduction • Data are replicated to increase the reliability of a system • Replication for performance • Scaling in numbers • Scaling in geographical area • Side effects • Gain in performance • Cost of increased bandwidth for keeping the replicas consistent
7.2. Data-centric consistency models • Problem: • Process that reads from DB wants recent data • Writes must be propagated to all copies of data store • Replication poses consistency problems
7.2. Data-centric consistency models A consistency model is a contract between processes and the data store
7.2.1. Continuous Consistency • Deviations between replicas are sometimes allowed • Numerical deviations: • Prices of two copies may not deviate by more than $0.02 • Number of updates not yet seen by other copies • Staleness deviations: • A replica may contain a weather forecast that is 2 hours old • Ordering deviations: • Tentatively update 1 replica • A rollback mechanism must be provided!
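As an illustration of a numerical deviation bound, here is a minimal Python sketch (all names are hypothetical, not from the course): a replica accepts local updates tentatively, and only synchronizes with its peers once the local value drifts more than a given bound from the last committed value.

```python
# Minimal sketch of a numerical-deviation bound on a replicated value.
# BoundedReplica and synchronize() are made-up names for illustration.

class BoundedReplica:
    def __init__(self, value, max_deviation):
        self.value = value              # local (possibly tentative) copy
        self.committed = value          # last value agreed with the peers
        self.max_deviation = max_deviation

    def local_update(self, new_value):
        self.value = new_value
        # If the local copy drifts too far from the committed value,
        # the update must be propagated before proceeding.
        if abs(self.value - self.committed) > self.max_deviation:
            self.synchronize()

    def synchronize(self):
        # Placeholder: in a real system, push the value to the replicas.
        self.committed = self.value

r = BoundedReplica(10.00, max_deviation=0.02)
r.local_update(10.01)   # within the bound: stays tentative
print(r.committed)      # 10.0
r.local_update(10.05)   # exceeds the bound: forces synchronization
print(r.committed)      # 10.05
```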
7.2.1. Continuous Consistency • The notion of a conit • A conit = a unit over which consistency is measured • A single stock in a stock market • An individual weather report • Coarse-grained conits (a) • bring replicas into an inconsistent state sooner • Fine-grained conits (b) • More state information must be kept (~more management overhead)
7.2.2. Consistent ordering of operations • Models for consistent ordering • Sequential consistency • Causal consistency • Terminology
7.2.2. Consistent ordering of operations • Sequential consistency The result of any execution is the same as if the (read and write) operations by all processes on the data store were executed in some sequential order and the operations of each individual process appear in this sequence in the order specified by its program. (Figure: one interleaving that is sequentially consistent (OK) and one that is not (NOK).)
7.2.2. Consistent ordering of operations • Sequential consistency • 720 (= 6!) possible execution sequences • 90 valid execution sequences (those that respect each process's program order) • Fewer than 64 different program results are possible under the assumption of sequential consistency
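The counts above can be checked by brute force. A small Python sketch (the operation labels are hypothetical) enumerates all orderings of the six operations of three two-operation processes and keeps those that respect each process's program order:

```python
from itertools import permutations

# Three processes, each issuing two operations (labels are made up).
programs = [("a1", "a2"), ("b1", "b2"), ("c1", "c2")]
ops = [op for prog in programs for op in prog]

def respects_program_order(seq):
    # A sequence is a candidate sequentially consistent execution only
    # if each process's own operations appear in program order.
    return all(seq.index(p[0]) < seq.index(p[1]) for p in programs)

all_seqs = list(permutations(ops))   # 6! = 720 interleavings
valid = [s for s in all_seqs if respects_program_order(s)]

print(len(all_seqs))  # 720
print(len(valid))     # 90
```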
7.2.2. Consistent ordering of operations • Sequential consistency • Four valid execution sequences • Invalid sequences: • Signature: 000000 • Signature: 001001
7.2.2. Consistent ordering of operations • Causal consistency Writes that are potentially causally related must be seen by all processes in the same order. Concurrent writes may be seen in a different order on different machines. (Figure: an execution that is not sequentially consistent (NOK) but is causally consistent (OK).)
7.2.2. Consistent ordering of operations • Causal consistency Writes that are potentially causally related must be seen by all processes in the same order. Concurrent writes may be seen in a different order on different machines. (Figure: one execution that violates causal consistency (NOK) and one that satisfies it (OK).)
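One common way to implement the "potentially causally related" test is with vector clocks; the sketch below is not part of the slides, but shows how two writes' vector timestamps decide whether their order must be preserved:

```python
# Sketch: deciding whether two writes are potentially causally related,
# using vector clocks (one counter per process).

def happened_before(vc_a, vc_b):
    # a -> b iff vc_a <= vc_b componentwise and vc_a != vc_b
    return all(x <= y for x, y in zip(vc_a, vc_b)) and vc_a != vc_b

def concurrent(vc_a, vc_b):
    return not happened_before(vc_a, vc_b) and not happened_before(vc_b, vc_a)

w1 = (1, 0, 0)   # write by P1
w2 = (1, 1, 0)   # write by P2 after seeing w1: causally related
w3 = (0, 0, 1)   # independent write by P3

print(happened_before(w1, w2))  # True  -> all replicas apply w1 before w2
print(concurrent(w2, w3))       # True  -> replicas may order them differently
```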
7.2.2. Consistent ordering of operations • Grouping operations • ENTER_CS • LEAVE_CS • A LEAVE_CS completes only when all the guarded shared data have been brought up to date • An ENTER_CS must be issued before updating a shared data item • If a process wants to ENTER_CS in non-exclusive mode, it must first fetch the most recent copies of the guarded shared data
7.2.2. Consistent ordering of operations • Consistency versus coherence • Consistency models: apply to a set of data items • Coherence models: apply to a single data item
7.3. Client-centric consistency models • Eventual consistency • All replicas will evolve to the same version • Works well in many systems: • Ex: DNS • Ex: caching websites at client • Some problems with eventual consistency • Mobile users work on different replicas • Solution: client-centric consistency • Guarantees for a single client concerning consistency of accesses • See next slide
7.3. Client-centric consistency models • Problem statement • A user performs updates at replica L_i • The user moves to replica L_j • The user sees an old copy at L_j
7.3. Client-centric consistency models • Four client-centric consistency models • Monotonic reads • Monotonic writes • Read your writes • Writes follow reads
7.3. Client-centric consistency models • Monotonic reads (ex: replicated mailboxes) If a process reads the value of a data item x, any successive read operation on x by that process will always return that same value or a more recent value. (Figure: one read sequence that satisfies monotonic reads (OK) and one that violates it (NOK).)
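Monotonic reads can be enforced on the client side by remembering the highest version read so far; the following is a rough sketch (class and method names are made up for illustration), where a stale replica is simply rejected rather than retried:

```python
# Sketch: a client session that enforces monotonic reads by tracking
# the highest data-item version it has seen.

class Replica:
    def __init__(self, version, value):
        self.version, self.value = version, value
    def get(self):
        return self.version, self.value

class Session:
    def __init__(self):
        self.last_seen = -1   # highest version read so far

    def read(self, replica):
        version, value = replica.get()
        if version < self.last_seen:
            # In a real system the client would retry on another replica.
            raise RuntimeError("stale replica: monotonic reads violated")
        self.last_seen = version
        return value

s = Session()
print(s.read(Replica(3, "mail v3")))   # ok: session is now at version 3
try:
    s.read(Replica(2, "mail v2"))      # older replica: rejected
except RuntimeError as e:
    print(e)
```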
7.3. Client-centric consistency models • Monotonic writes (ex: updates on a SW library) A write operation by a process on a data item x is completed before any successive write operation on x by the same process. (Figure: one write sequence that satisfies monotonic writes (OK) and one that violates it (NOK).)
7.3. Client-centric consistency models • Read your writes (ex: updating your password) The effect of a write operation by a process on data item x will always be seen by a successive read operation on x by the same process. (Figure: one execution that satisfies read-your-writes (OK) and one that violates it (NOK).)
7.3. Client-centric consistency models • Writes follow reads (ex: a reaction on a newsgroup is only posted after the original message) A write operation by a process on a data item x following a previous read operation on x by the same process is guaranteed to take place on the same or a more recent value of x that was read. (Figure: one execution that satisfies writes-follow-reads (OK) and one that violates it (NOK).)
7.4. Replica management • 2 questions: • Q1: Where to place replica servers? • Q2: Where to place content on replica servers? • Replica server placement • Based on the distance between clients and candidate locations • Splitting the space into cells
7.4. Replica management • Content replication and placement • The logical organization of different kinds of copies of a data store into three concentric rings.
7.4. Replica management • Content replication and placement • Permanent replicas • Site with replicated servers at a single location • Mirroring: mirror sites geographically spread across the Internet • Server-initiated replicas • Why? A sudden burst of requests at a certain location • Used in web hosting services
7.4. Replica management • Content replication and placement • Server-initiated replicas • Each server Q maintains cnt_Q(P,F) • The number of times server Q is contacted for file F by clients close to server P • Notation: • del(S,F): deletion threshold for file F at server S • rep(S,F): replication threshold for file F at server S • Algorithm • If access_count(S,F) < del(S,F), then delete the copy, unless it is the last one • If del(S,F) < access_count(S,F) < rep(S,F), then the file may migrate • Consider cnt_Q(P,F) when deciding on migration/replication
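The threshold algorithm above can be sketched as a single decision function per (server, file) pair; this is a simplified sketch in which the thresholds and counts are passed in, and the choice of a migration target (using the per-neighbour counts) is omitted:

```python
# Sketch of the per-(server, file) decision in server-initiated
# replication, using the deletion and replication thresholds.

def decide(access_count, del_threshold, rep_threshold, copies_left):
    if access_count < del_threshold:
        # Too few requests: drop the copy, but never the last one.
        return "keep" if copies_left == 1 else "delete"
    if access_count > rep_threshold:
        # Heavy demand: create an additional replica.
        return "replicate"
    # Between the thresholds: the file may migrate closer to its clients.
    return "consider-migration"

print(decide(3,  10, 50, copies_left=4))  # delete
print(decide(3,  10, 50, copies_left=1))  # keep (last copy)
print(decide(70, 10, 50, copies_left=4))  # replicate
print(decide(25, 10, 50, copies_left=4))  # consider-migration
```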
7.4. Replica management • Client-initiated replicas • Where to cache data? • At the client machine • At a separate machine in the same LAN
7.4. Replica management • Content distribution • 3 possible strategies • Propagate only a notification of an update • If the number of writes is large compared to the number of reads • Saves bandwidth • Transfer the data from one copy to another • If the read-to-write ratio is high • Propagate the update operation to the other copies • If the data is large, but the parameters of the operation are small
7.4. Replica management • Content distribution • Push-based approach (~server-based protocols) • Mainly used between servers • Push invalidations OR push data • Pull-based approach (~client-based protocols) • Mainly used by client caches • Hybrid approach: leases • Server promises to push data for a limited period of time
7.5. Consistency protocols • Protocols for continuous consistency (see book) • Bounding numerical deviations • Bounding staleness deviations • Bounding ordering deviations • Not very intuitive to application developers • Protocols that are frequently applied in practice • Primary-based protocols • Replicated-write protocols
7.5. Consistency protocols • Primary-based protocols • Remote write protocol
7.5. Consistency protocols • Primary-based protocols • Local write protocol
7.5. Consistency protocols • Replicated-write protocols • Active replication • Unordered approach • The client sends the update to all servers directly • Note: how to ensure the operations are applied in the same order everywhere? • Ordered approach • The client sends the update to a sequencer • The sequencer adds a sequence number and forwards the update to the servers • Note: similar to primary-based protocols • Quorum-based protocols • See next slides
7.5. Consistency protocols • Replicated-write protocols • Quorum-based protocols (~ voting) • Ask permission from the other servers before reading/writing • N_R = read quorum; N_W = write quorum • Properties • N_R + N_W > N (prevents read-write conflicts) • N_W > N/2 (prevents write-write conflicts) • Special case ROWA (read one, write all): N_R = 1, N_W = N (Figure: two valid quorum choices (OK, OK) and one invalid choice (NOK).)
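The two quorum properties are easy to check mechanically. A small sketch, with ROWA (read one, write all) appearing as the degenerate case N_R = 1, N_W = N:

```python
# Sketch: checking whether an (N_R, N_W) quorum configuration is valid.
# N_R + N_W > N prevents read-write conflicts (every read quorum
# overlaps every write quorum); N_W > N/2 prevents write-write conflicts.

def valid_quorum(n, n_r, n_w):
    return n_r + n_w > n and 2 * n_w > n

N = 12
print(valid_quorum(N, 3, 10))   # True
print(valid_quorum(N, 7, 6))    # False: N_W <= N/2, write-write conflict
print(valid_quorum(N, 1, 12))   # True: ROWA (read one, write all)
```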
Chapter 8: fault tolerance • 8.1. Introduction to fault tolerance • 8.2. Process resilience • 8.3. Reliable client-server communication • 8.4. Reliable group communication • 8.5. Distributed commit • 8.6. Recovery
8.1. Introduction to fault tolerance • Requirements for dependable systems • Availability: • the probability that the system works correctly at any given moment • Reliability: • the probability that the system works correctly during a given interval • Safety: • nothing catastrophic happens when the system fails • Maintainability: • how easily a failed system can be repaired
8.1. Introduction to fault tolerance • Types of faults • Transient fault: a bird flying through a transmission beam • Intermittent fault: a loose contact on a connector • Permanent fault: a burnt-out chip • Types of failures
8.1. Introduction to fault tolerance • Masking faults through redundancy • Information redundancy: e.g. Hamming codes • Time redundancy: e.g. resending messages • Physical redundancy: e.g. replicating SW or HW components, as in Triple Modular Redundancy (TMR)
8.2. Process resilience • Flat groups versus hierarchical groups • Flat group: symmetrical, no single point of failure, but decision making is more complicated • Hierarchical group: easier coordination, but the coordinator is a single point of failure
8.2. Process resilience • Group membership • Two approaches, each with drawbacks: • Distributed group membership algorithms: more complex • Centralized approach (~group server): single point of failure • Failure masking and replication • Primary-based protocols • Primary-backup protocol • An election algorithm is needed if the primary server crashes • Replicated-write protocols
8.2. Process resilience • Failure masking and replication • k fault tolerant system = a system that can survive faults in k components • How many replicas are needed? • Fail-silent failures: k + 1 components • Byzantine failures: 2k + 1 components
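The two replication rules translate directly into a one-line function; a trivial sketch:

```python
# Sketch: minimum number of replicas needed to mask k failures,
# following the rules on the slide.

def replicas_needed(k, byzantine=False):
    # Fail-silent: k + 1 suffices (one survivor gives the answer).
    # Byzantine: 2k + 1 needed (correct replicas must outvote the faulty).
    return 2 * k + 1 if byzantine else k + 1

print(replicas_needed(2))                  # 3
print(replicas_needed(2, byzantine=True))  # 5
```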
8.2. Process resilience • Agreement in faulty systems • Dimensions of the problem: • Synchronous versus asynchronous systems • Communication delay is bounded or not • Message delivery is ordered or not • Message transmission is done through unicasting or multicasting • Distributed agreement is only possible in some of these cases
8.2. Process resilience • Agreement in faulty systems (example) • 4 servers, 1 faulty: agreement reached (OK)
8.2. Process resilience • Agreement in faulty systems (example) • 3 servers, 1 faulty: agreement impossible (NOK)
8.2. Process resilience • Agreement in faulty systems (example) • In general: • k faulty processes can be tolerated in this set-up only with at least 3k + 1 processes in total
8.2. Process resilience • Failure detection • Pinging: a pull-based approach • Gossiping: each node announces that it is alive • Types of failures • Node failures • Network failures
8.3. Reliable client-server communication • Point-to-point communication • Masking omission failures by using acknowledgements and retransmission • RPC semantics in the presence of failures • 5 types of failures • The client is unable to locate the server. • The request message from the client to the server is lost. • The server crashes after receiving a request. • The reply message from the server to the client is lost. • The client crashes after sending a request.
8.3. Reliable client-server communication • RPC semantics in the presence of failures • The client is unable to locate the server • Solution: raise an exception at the client process • Disadvantage: remote procedures behave differently from local ones • Lost request messages • Acknowledgement by the server • Timer at the client: if no ACK arrives, resend the message
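The acknowledgement-plus-timer scheme can be sketched as a retry loop. In this sketch the lossy channel and the timeout are simulated (the function names are hypothetical); a real client would resend when a network timer expires:

```python
# Sketch: masking lost request messages with acknowledgements and
# retransmission. deliver(request) returns an ACK, or raises
# TimeoutError when the message (or its ACK) was lost.

import random

def send_with_retry(request, deliver, max_retries=5):
    for attempt in range(max_retries):
        try:
            return deliver(request)
        except TimeoutError:
            continue   # no ACK within the timeout: resend the request
    raise RuntimeError("server unreachable")

random.seed(1)
def lossy_deliver(req):
    if random.random() < 0.5:        # simulate a lost message
        raise TimeoutError
    return "ACK:" + req

print(send_with_retry("getTime", lossy_deliver))
```

Note that blind retransmission gives at-least-once semantics: if the request arrived but the ACK was lost, the server executes the operation twice, which is why duplicate filtering is usually added on the server side.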