280 likes | 298 Views
Database replication is the process of sharing information to ensure consistency across redundant resources, enhancing reliability, fault tolerance, and accessibility. This involves creating and maintaining multiple copies of databases, optimizing reads on replicated servers, transparency in access, and utilizing active vs. passive and multi-master replication models. Learn about load balancing, data distribution, backup strategies, replication in distributed systems, and various replication models like Transactional and State Machine Replication.
E N D
Database Replication Replication Replication is the process of sharing information so as to ensure consistency between redundant resources, such as software or hardward components, to improve reliability, fault tolerance or accessibility.
Database Replication Data Replication It could be data replication if the same data is stored on multiple storage devices or computation replication if the same computing task is executed many times. A computational task is typically replicated in space, i.e. executed on separate devices, or it could be replicated in time, if it is executed repeatedly on a single device.
Database Replication Database Replication Databaes replication is the creation and maintenance of multiple copies of the same database. In most implementations of database replication, one database server maintains the master copy of the database and additional database servers maintain slave copies of the database. Database writes are sent to the master database server and are then replicated by the slave database servers.
Database Replication Database Reads on Replicated Servers Database reads are divided among all of the database servers, which results in a large performance advantage due to load sharing. In addition, database replication can also improve availability because the slave database servers can be configured to take over the master role if the master database server becomes unavailable.
Database Replication Transparency The access to a replicated entity is typically uniform with access to a single, non-replicated entity. The replication itself should be transparent to an external user. Also, in a failure scenario, a failover of replicas is hidden as much as possible.
Database Replication Transparency The isolation property is the most often relaxed ACID property in a DBMS (Database Management System). This is because to maintain the highest level of isolation a DBMS must acquire locks on data, which may result in a loss of concurrency or else implement multiversioning (version control), which may require additional logic to function correctly.
Database Replication Active vs. Passive Replication It is common to talk about active and passive replication in systems that replicate data or services. Active replication is performed by processing the same request at every replica. In passive replication, each single request is processed on a single replica and then its state is transferred to the other replicas. If at any time one master replica is designated to process all the requests, then we are talking about the primary-backup scheme (master-slave scheme) predominant in high availalibility clusters.
Database Replication Multi-Master Replication On the other side, if any replica processes a request and then distributes a new state, then this is a multi-primary scheme (called multi-master in database field). In the multi-primary scheme, some form of distributed concurrency control must be used, such as distributed lock manager.
Database Replication Load Balancing Load balancing is different from task replication, since it distributes a load of different (not the same) computations across machines, and allows a single computation to be dropped in case of failure. Load balancing, however, sometimes uses data replication (esp. multi-master) internally, to distribute its data among machines.
Database Replication Backup Backup is different from replication, since it saves a copy of data unchanged for a long period of time. Replicas on the other hand are frequently updated and quickly lose any historical state.
Database Replication Replication in distributed systems Whether one replicates data or computation, the objective is to have some group of processes that handle incoming events. If we replicate data, these processes are passive and operate only to maintain the stored data, reply to read requests, and apply updates. When we replicate computation, the usual goal is to provide fault-tolerance. But the underlying needs are the same in both cases: by ensuring that the replicas see the same events in equivalent orders, they stay in consistent states and hence any replica can respond to queries.
Database Replication Replication Models Replication models in distributed systems A number of widely cited models exist for data replication, each having its own properties and performance: • Transactional Replication • State Machine Replication • Virtual Synchrony
Database Replication Transactional Replication This is the model for replicating transactional data, for example a database or some other form of transactional storage structure. The one copy serializeability model is employed in this case, which defines legal outcomes of a transaction on replicated data in accordance with ttion on replicated data in accordance with the overall ACID properties that transactional systems seek to guarantee.
Database Replication State Machine Replication This model assumes that replicated process is a deterministic finite state machine and that atomic broadcast of every event is possible. It is based on a distributed computing problem called distributed consensus and has a great deal in common with the transactional replication model. This is sometimes mistakenly used as synonym of active replication.
Database Replication Virtual Synchronomy This computational model is used when a group of processes cooperate to replicate in-memory data or to coordinate actions. The model defines a new distributed entity called a process group. A process can join a group, which is much like opening a file: the process is added to the group, but is also provided with a checkpoint containing the current state of the data replicated by group members. Processes can then send events (multicasts) to the group and will see incoming events in the identical order, even if events are sent concurrently. Membership changes are handled as a special kind of platform-generated event that delivers a new membership view to the processes in the group.
Database Replication Performance Comparison Transactional replication is slowest, at least when one-copy serializability guarantees are desired (better performance can be obtained when a database uses log-based replication, but at the cost of possible inconsistencies if a failure causes part of the log to be lost). Virtual synchrony is the fastest of the three models, but the handling of failures is less rigorous than in the transactional model. State machine replication lies somewhere in between; the model is faster than transactions, but much slower than virtual synchrony.
Database Replication Master/Slave Database replication can be used on many database management systems, usually with a master/slave relationship between the original and the copies. The master logs the updates, which then ripple through to the slaves. The slave outputs a message stating that it has received the update successfully, thus allowing the sending (and potentially re-sending until successfully applied) of subsequent updates.
Database Replication Master/Master Master/Master replication is where updates can be submitted to any database node, and then ripple through to other servers, is often desired, but introduces substantially increased costs and complexity which may make it impractical in some situations. The most common challenge that exists in multi-master replication is transactional conflict prevention or resolution. Most synchronous or eager replication solutions do conflict prevention, while asynchronous solutions have to do conflict resolution. For instance, if a record is changed on two nodes simultaneously, an eager replication system would detect the conflict before confirming the commit and abort one of the transactions.
Database Replication Lazy Replication Would allow both transactions to commit and run a conflict resolution during resynchronization. The resolution of such a conflict may be based on a timestamp of the transaction, on the hierarchy of the origin nodes or on much more complex logic, which decides consistently on all nodes.
Database Replication Multi-Master Replication Multi-master replication is a method of replication employed by databases to transfer data or changes to data across multiple computers within a group. Multi-master replication can be contrasted with a master-slave method (also known as single-master replication). The term Multi-master can also be applied to systems in general where a single piece of information can be updated by one of several systems. That is, no one system can be said to own the information and be able to control it consistency and accuracy.
Database Replication Benefits of Multi-Master Replication : If one master fails, other masters will continue to update the database. Masters can be located in several physical sites i.e. distributed across the network.
Database Replication Disadvantages of Multi-Master Replication Most multi-master replication systems are only loosely consistent, i.e. lazy and asynchronous, violating ACID properties. Eager replication systems are complex and introduce some communication latency. Issues such as conflict resolution can become intractable as the number of nodes involved rises and the required latency decreases
Database Replication Methods Log Based: A database transaction log is referenced to capture changes made to the database. For log-based transaction capturing, database changes can only be distributed asynchronously. Trigger Based: Triggers at the subscriber capture changes made to the database and submit them to the publisher. With trigger-based transaction capturing, database changes can be distributed either synchronously or asynchronously.
Database Replication Real World Example (For Lab) We perform one phase transactions using Java Database Connectivity (JDBC) which is part of the Java Software Development Kit (SDK). We add a JDBC driver to the SDK for a one phase implementation. We do a Class.forName to load the driver. We use a DriverManger.getConnection() to get the connection. We use the connection in processing. All Objects are interfaces implemented by the JDBC driver.
Database Replication Two-phase commit Real World Example To do an XA (2 phase) transactions we require an implementation of the javax packages. This occurs in an application server and is part of the JEE implementation stack. Your instructor will show a real world example.
Two Phased Commit Server 1 Savings Server 2 Checking Update Savings decrease by $500 Update Checking Increase by $500