Distributed Systems – Paxos Prof. Orhan Gemikonakli Module Leader: Prof. Leonardo Mostarda Università di Camerino
Last lecture • Basic definitions of consistency and replication • Data-centric consistency models • Continuous consistency • Sequential consistency • Causal consistency • Client-centric consistency models • Monotonic reads • Monotonic writes • Reads your writes • Writes follow reads
Outline • State machine replication approach • Paxos protocol
Learning outcomes • Understand the basic concepts of the state machine replication approach • Understand the use of the state machine replication approach • Understand the Paxos protocol • Understand the relationship between the Paxos protocol and the state machine based approach
State machine replication • The state machine approach is a method for implementing a fault-tolerant service by replicating servers and coordinating client interactions with server replicas. • State machine definition • States • Inputs • Outputs • A transition function (Input x State -> State) • An output function (Input x State -> Output) • A distinguished State called Start.
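The definition above can be made concrete with a small sketch. The toy key-value store and the names `transition`, `output`, `START` and `run` are illustrative assumptions, not part of the lecture. Both functions depend only on the current input and the current state, which is what makes the machine deterministic.

```python
# A minimal deterministic state machine, assuming a toy key-value store.
# States are dicts; inputs are tuples such as ("put", key, value) or ("get", key).
from typing import Any, Dict, List, Tuple

State = Dict[str, Any]
Input = Tuple[Any, ...]

START: State = {}  # the distinguished Start state

def transition(inp: Input, state: State) -> State:
    """Input x State -> State. Returns a new state and never mutates its argument."""
    if inp[0] == "put":
        _, key, value = inp
        return {**state, key: value}
    return state  # "get" leaves the state unchanged

def output(inp: Input, state: State) -> Any:
    """Input x State -> Output, computed from the state before the transition."""
    if inp[0] == "get":
        return state.get(inp[1])
    return "ok"

def run(inputs: List[Input], state: State = START) -> Tuple[State, List[Any]]:
    """Feed inputs one at a time, collecting the output of each."""
    outputs = []
    for inp in inputs:
        outputs.append(output(inp, state))
        state = transition(inp, state)
    return state, outputs

final, outs = run([("put", "x", 1), ("get", "x"), ("get", "y")])
print(final, outs)  # {'x': 1} ['ok', 1, None]
```

Because `transition` never mutates, `START` can safely be shared as the default starting state of every run.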
State machine replication Deterministic state machine
State machine replication • Place copies of the State Machine on multiple, independent servers. • Receive client requests, interpreted as Inputs to the State Machine. • Choose an ordering for the Inputs. • Execute Inputs in the chosen order on each server. • Respond to clients with the Output from the State Machine. • Monitor replicas for differences in State or Output.
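The steps listed above can be sketched as a single-process simulation. The `Replica` class, the bank-balance machine and its `deposit`/`interest` inputs are hypothetical; the "choose an ordering" step is simply assumed to have produced one list that every replica receives. The second half shows why the ordering matters: the same inputs applied in a different order leave two replicas in different states, which the monitoring step would flag.

```python
# Sketch of state machine replication with a hypothetical bank-balance machine.
from typing import List, Tuple

Input = Tuple[str, int]

def transition(balance: int, inp: Input) -> int:
    op, amount = inp
    if op == "deposit":
        return balance + amount
    if op == "interest":  # add `amount` percent, rounded down
        return balance * (100 + amount) // 100
    raise ValueError(f"unknown input {inp!r}")

class Replica:
    def __init__(self, name: str) -> None:
        self.name = name
        self.state = 0  # Start state: an empty account

    def execute(self, inp: Input) -> int:
        self.state = transition(self.state, inp)
        return self.state  # Output: the new balance

def replicate(replicas: List[Replica], ordered_inputs: List[Input]) -> List[List[int]]:
    """Execute each input, in the chosen order, on every replica; collect outputs."""
    return [[r.execute(inp) for r in replicas] for inp in ordered_inputs]

def diverged(replicas: List[Replica]) -> bool:
    """Monitoring step: True if any two replicas disagree on their State."""
    return len({r.state for r in replicas}) > 1

# Same order on every replica: identical states and outputs.
group = [Replica(n) for n in ("A", "B", "C")]
print(replicate(group, [("deposit", 100), ("interest", 10)]))  # [[100, 100, 100], [110, 110, 110]]
print(diverged(group))  # False

# No agreed order: two replicas see the same inputs in different orders.
x, y = Replica("X"), Replica("Y")
replicate([x], [("deposit", 100), ("interest", 10)])
replicate([y], [("interest", 10), ("deposit", 100)])
print(x.state, y.state, diverged([x, y]))  # 110 100 True
```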
State machine replication • If multiple copies of a system exist, a fault in one would be noticeable as a difference in its State or Output from the others. • The minimum number of copies needed for fault tolerance is three: one that has a fault, and two others against which we compare State and Output. • In general, a system that supports F failures must have 2F+1 copies, so that the F+1 correct copies still outnumber the faulty ones. • All of this reasoning presupposes that replicas experience only random, independent faults such as memory errors or hard-drive crashes.
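The 2F+1 rule can be checked with a short voting sketch. It assumes what the slide assumes: at most F replicas fail, and the remaining replicas are correct and deterministic, so they all report the same output. `vote` and `replicas_needed` are hypothetical helper names. With at least F+1 matching correct outputs and at most F faulty ones, the correct output always holds a strict majority, even if the faulty replicas happen to agree with each other.

```python
from collections import Counter
from typing import Hashable, Sequence

def replicas_needed(f: int) -> int:
    """Copies required to tolerate f faulty replicas by voting."""
    return 2 * f + 1

def vote(outputs: Sequence[Hashable], f: int) -> Hashable:
    """Return the output reported by at least f + 1 of the 2f + 1 replicas."""
    if len(outputs) != replicas_needed(f):
        raise ValueError(f"expected {replicas_needed(f)} outputs, got {len(outputs)}")
    value, count = Counter(outputs).most_common(1)[0]
    if count < f + 1:
        # Only possible if more than f replicas are faulty.
        raise RuntimeError("no majority: the failure bound f was exceeded")
    return value

print(replicas_needed(1))          # 3
print(vote([42, 42, 7], f=1))      # 42 (one faulty replica out-voted)
print(vote([5, 9, 5, 9, 5], f=2))  # 5 (two faulty replicas out-voted)
```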
Problem • In a distributed system: • There is no master for issuing locks. • Failures can occur. • How can we reach consensus (data consistency) in a distributed system that tolerates non-malicious failures?
Paxos and consensus Paxos is a family of protocols for solving consensus in a network of unreliable processors. The Paxos protocol was first described in 1989 and published as a journal article in 1998; it is named after a fictional legislative consensus system used on the Greek island of Paxos. Consensus is the process of agreeing on one result among a group of participants. This problem becomes difficult when the participants or their communication medium may experience failures.
PAXOS • Paxos is usually used where durability is required and the amount of durable state could be large, e.g. to replicate a file or a database. • The protocol attempts to make progress even during periods when some bounded number of replicas are unresponsive. • There is also a mechanism to drop a permanently failed replica or to add a new replica.
Paxos – assumptions • Processors • Processors operate at arbitrary speed. • Processors may experience failures. • Processors with stable storage may re-join the protocol after failures. • Processors do not collude, lie, or otherwise attempt to subvert the protocol. • Network • Processors can send messages to any other processor. • Messages are sent asynchronously and may take arbitrarily long to deliver. • Messages may be lost, reordered, or duplicated. • Messages are delivered without corruption.
Paxos - roles • Client: The Client issues a request to the distributed system and waits for a response, for instance a write request on a file in a distributed file server. • Acceptor (Voter): The Acceptors act as the fault-tolerant "memory" of the protocol. Acceptors are collected into groups called Quorums. Any message sent to an Acceptor must be sent to a Quorum of Acceptors. Any message received from an Acceptor is ignored unless a copy is received from each Acceptor in a Quorum. • Proposer: A Proposer advocates a client request, attempting to convince the Acceptors to agree on it, and acts as a coordinator to move the protocol forward when conflicts occur. • Learner: Learners act as the replication factor for the protocol. Once a Client request has been agreed on by the Acceptors, the Learner may take action (i.e. execute the request and send a response to the client).
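The slide leaves the choice of Quorums open. A common choice, assumed here, is majority quorums: any set containing more than half of the Acceptors. The sketch below (all helper names are hypothetical) checks the property Paxos relies on, namely that any two quorums share at least one Acceptor, so a later quorum always contains someone who remembers what an earlier quorum accepted.

```python
from itertools import combinations
from typing import Iterable, List, Set

def is_quorum(responders: Iterable[str], acceptors: List[str]) -> bool:
    """Majority quorum: strictly more than half of all acceptors responded."""
    return len(set(responders) & set(acceptors)) > len(acceptors) // 2

def majority_quorums(acceptors: List[str]) -> List[Set[str]]:
    """All minimal majority quorums (every subset of size n // 2 + 1)."""
    size = len(acceptors) // 2 + 1
    return [set(c) for c in combinations(acceptors, size)]

def pairwise_intersect(quorums: List[Set[str]]) -> bool:
    """True if every pair of quorums shares at least one acceptor."""
    return all(a & b for a, b in combinations(quorums, 2))

acceptors = ["A1", "A2", "A3", "A4", "A5"]
print(is_quorum({"A1", "A2", "A3"}, acceptors))          # True
print(is_quorum({"A4", "A5"}, acceptors))                # False
print(pairwise_intersect(majority_quorums(acceptors)))   # True
print(pairwise_intersect([{"A1", "A2"}, {"A4", "A5"}]))  # False: not majorities
```

Checking only the minimal quorums is enough, since any superset of an intersecting pair also intersects.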
PAXOS - roles • Leader: Paxos requires a distinguished Proposer (called the leader) to make progress. • Many processes may believe they are leaders, but the protocol only guarantees progress if one of them is eventually chosen. • If two processes believe they are leaders, they may stall the protocol by continuously proposing conflicting updates. • However, the safety properties are still preserved in that case. • In most deployments of Paxos, each participating process acts in three roles: Proposer, Acceptor and Learner.
PAXOS – safety and liveness properties • Safety • Only a value that has been proposed may be chosen. • Only a single value is chosen. • A node never learns that a value has been chosen unless it actually has been. • Liveness • Some proposed value is eventually chosen. • If a value has been chosen, a node can eventually learn the value.
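The safety properties are statements about a finished run, so they can be turned into a checker for a test harness. The sketch below assumes a harness that records the proposed values, the values chosen in any round, and what each node learned; `check_safety` is a hypothetical name. Liveness cannot be checked this way, because a finite trace can never show that a value will not eventually be chosen.

```python
from typing import Dict, Hashable, Iterable

def check_safety(proposed: Iterable[Hashable],
                 chosen: Iterable[Hashable],
                 learned: Dict[str, Hashable]) -> Dict[str, bool]:
    """Evaluate the three safety properties on a recorded run.

    proposed: every value any proposer put forward
    chosen:   every value that was chosen (accepted by a quorum) in any round
    learned:  node name -> value that node learned as chosen
    """
    p, c = set(proposed), set(chosen)
    return {
        "only proposed values chosen": c <= p,
        "at most one value chosen": len(c) <= 1,
        "only chosen values learned": all(v in c for v in learned.values()),
    }

ok = check_safety({"A", "B"}, ["A", "A"], {"n1": "A", "n2": "A"})
bad = check_safety({"A"}, ["A", "B"], {"n1": "C"})
print(all(ok.values()))  # True
print(any(bad.values()))  # False: all three properties are violated
```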
PAXOS Algorithm • Phase 1 (prepare): • A proposer selects a proposal number n and sends a prepare request with number n to a majority of acceptors. • If an acceptor receives a prepare request with number n greater than that of any prepare request to which it has already responded, it responds YES with a promise not to accept any more proposals numbered less than n, and includes the highest-numbered proposal (if any) that it has accepted.
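Phase 1 can be sketched from both sides, assuming integer proposal numbers and in-memory state. In a real deployment `promised_n` and `accepted` would have to be written to stable storage before replying, since the assumptions allow a processor to crash and re-join. The `Acceptor` class and `phase1` helper are hypothetical names.

```python
from dataclasses import dataclass
from typing import Any, List, Optional, Tuple

Proposal = Tuple[int, Any]  # (proposal number n, value v)

@dataclass
class Acceptor:
    promised_n: int = -1                 # highest prepare number responded to so far
    accepted: Optional[Proposal] = None  # highest-numbered proposal accepted so far

    def on_prepare(self, n: int) -> Tuple[bool, Optional[Proposal]]:
        """Answer prepare(n): YES plus the accepted proposal, or NO."""
        if n > self.promised_n:
            self.promised_n = n  # promise not to accept proposals numbered less than n
            return True, self.accepted
        return False, None

def phase1(acceptors: List[Acceptor], n: int) -> Optional[List[Optional[Proposal]]]:
    """Proposer side: send prepare(n); return reported proposals if a majority said YES."""
    reports = [prev for ok, prev in (a.on_prepare(n) for a in acceptors) if ok]
    return reports if len(reports) > len(acceptors) // 2 else None

accs = [Acceptor(), Acceptor(), Acceptor(promised_n=3, accepted=(3, "x"))]
print(phase1(accs, 5))  # [None, None, (3, 'x')]
print(phase1(accs, 4))  # None: every acceptor has already promised 5
```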
PAXOS Algorithm • Phase 2 (accept): • If the proposer receives a YES response to its prepare requests from a majority of acceptors, it sends an accept request to each of those acceptors for a proposal numbered n with a value v, where v is the value of the highest-numbered proposal among the responses, or any value of the proposer's choosing if the responses reported no proposals. • If an acceptor receives an accept request for a proposal numbered n, it accepts the proposal unless it has already responded to a prepare request having a number greater than n.
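Putting both phases together gives a single-decree sketch that runs in one process. Assumptions: proposal numbers are integers that are unique per proposer, lost messages are modelled by passing only the `reachable` acceptors, and the class and function names are hypothetical. The run illustrates the key rule of Phase 2: a second proposer that wanted "B" learns from an overlapping acceptor that "A" may already have been chosen, and proposes "A" instead.

```python
from dataclasses import dataclass
from typing import Any, List, Optional, Tuple

Proposal = Tuple[int, Any]  # (proposal number n, value v)

@dataclass
class Acceptor:
    promised_n: int = -1
    accepted: Optional[Proposal] = None

    def on_prepare(self, n: int) -> Tuple[bool, Optional[Proposal]]:
        if n > self.promised_n:   # Phase 1: promise only if n beats earlier prepares
            self.promised_n = n
            return True, self.accepted
        return False, None

    def on_accept(self, n: int, v: Any) -> bool:
        if n >= self.promised_n:  # Phase 2: refuse only if promised to a higher n
            self.promised_n = n
            self.accepted = (n, v)
            return True
        return False

def choose_value(reports: List[Optional[Proposal]], own_value: Any) -> Any:
    """Value of the highest-numbered reported proposal, else the proposer's own value."""
    proposals = [p for p in reports if p is not None]
    return max(proposals, key=lambda p: p[0])[1] if proposals else own_value

def propose(acceptors: List[Acceptor], reachable: List[Acceptor],
            n: int, own_value: Any) -> Optional[Any]:
    """Run both phases against the reachable acceptors; return the chosen value or None."""
    majority = len(acceptors) // 2 + 1
    promised = []
    for a in reachable:
        ok, prev = a.on_prepare(n)
        if ok:
            promised.append((a, prev))
    if len(promised) < majority:
        return None  # no majority of YES responses: retry later with a larger n
    v = choose_value([prev for _, prev in promised], own_value)
    accepted = sum(a.on_accept(n, v) for a, _ in promised)
    return v if accepted >= majority else None

accs = [Acceptor() for _ in range(5)]
print(propose(accs, accs[:3], 1, "A"))  # A: acceptors 0-2 accept (1, 'A')
print(propose(accs, accs[2:], 2, "B"))  # A: acceptor 2 reports (1, 'A'), so "B" is dropped
print(propose(accs, accs, 1, "C"))      # None: every acceptor has promised n >= 1
```

Because any two majorities overlap, the second proposer is guaranteed to hear from at least one acceptor that took part in the first round, which is what keeps a single value chosen.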
PAXOS Applications • Google: Chubby lock service • IBM SAN • Amazon • Petal: Distributed virtual disks • Frangipani: A scalable distributed file system • …
Summary • State machine replication approach • Paxos protocol • The general algorithm • Note that PAXOS has extended versions • Cheap PAXOS • Fast PAXOS • Byzantine Paxos • Fast Byzantine Multi-Paxos
Next Lecture • Consistency – 2 • Replica management • Permanent replicas • Server initiated replicas • Client initiated replicas • Pull versus push protocols • Consistency protocols