Distributed Concurrency Control
Motivation • World-wide telephone system • World-wide computer network • World-wide database system • Collaborative projects – the project has a database composed of the smaller local databases of each researcher • A travel company organizing a vacation – it consults local subcontractors (local companies), which list prices and quality ratings for hotels, restaurants, and fares • A library service – people looking for articles query two or more libraries
Types of distributed systems • Homogeneous federation – servers participating in the federation are logically part of a single system; they all run the same suite of protocols, and they may even be under the control of a "master site" • A homogeneous federation is characterized by distribution transparency • Heterogeneous federation – servers participating in the federation are autonomous and heterogeneous; they may run different protocols, and there is no "master site"
Types of transactions and schedules • Local transactions • Global transactions
Preliminaries • Let the federation consist of n sites, and let T = {T1, ..., Tm} be a set of global transactions • Let s1, ..., sn be local schedules • Let D = ∪ Di, where Di is the local database at site i • We assume no replication (each replica is treated as a separate data item) • A global schedule for T and s1, ..., sn is a schedule s for T such that its local projection equals the local schedule at each site, i.e. Πi(s) = si for all i, 1 ≤ i ≤ n
Preliminaries • Πi(s) denotes the projection of the schedule s onto site i • We call the projection of a transaction T onto site i a subtransaction of T (Ti), which comprises all steps of T at site i • Global transactions formally have to have Commit operations at all sites at which they are active • Conflict serializability – a global [local] schedule s is globally [locally] conflict serializable if there exists a serial schedule over the global [local] (sub-)transactions that is conflict equivalent to s
Example 1 • Consider a federation of two sites, where D1 = {x} and D2 = {y}. Then s1 = r1(x) w2(x) and s2 = w1(y) r2(y) are local schedules, and s = r1(x) w1(y) w2(x) c1 r2(y) c2 is a global schedule • Π1(s) = s1 and Π2(s) = s2 • Another form of the schedule: server 1: r1(x) w2(x); server 2: w1(y) r2(y)
Example 2 • Consider a federation of two sites, where D1 = {x} and D2 = {y}. Assume the following schedule: server 1: r1(x) w2(x); server 2: r2(y) w1(y) • The schedule is not conflict serializable, since the conflict serialization graph has a cycle: server 1 orders T1 → T2, while server 2 orders T2 → T1
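The cycle argument of Example 2 can be checked mechanically. The following is a minimal sketch, not part of the original slides: the helper names `parse`, `conflict_edges`, and `has_cycle` are hypothetical, and the sketch assumes schedules written in the slides' `r1(x) w2(x)` notation without commit steps.

```python
from itertools import combinations

def parse(schedule):
    """Turn 'r1(x) w2(x)' into [('r', 1, 'x'), ('w', 2, 'x')]."""
    ops = []
    for tok in schedule.split():
        kind, rest = tok[0], tok[1:]
        txn, item = rest.rstrip(")").split("(")
        ops.append((kind, int(txn), item))
    return ops

def conflict_edges(ops):
    """Edges (Ti, Tj): a step of Ti precedes a conflicting step of Tj."""
    edges = set()
    for (k1, t1, x1), (k2, t2, x2) in combinations(ops, 2):
        # Two steps conflict if they belong to different transactions,
        # touch the same item, and at least one of them is a write.
        if t1 != t2 and x1 == x2 and "w" in (k1, k2):
            edges.add((t1, t2))
    return edges

def has_cycle(edges):
    """Depth-first search for a cycle in a directed graph of edge pairs."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set())
    state = {}  # node -> 'open' while on the DFS stack, 'done' afterwards

    def visit(n):
        state[n] = "open"
        for m in graph[n]:
            if state.get(m) == "open" or (m not in state and visit(m)):
                return True
        state[n] = "done"
        return False

    return any(n not in state and visit(n) for n in graph)

# Example 1: both servers order T1 before T2.
ex1 = conflict_edges(parse("r1(x) w2(x)")) | conflict_edges(parse("w1(y) r2(y)"))
# Example 2: server 2 orders T2 before T1.
ex2 = conflict_edges(parse("r1(x) w2(x)")) | conflict_edges(parse("r2(y) w1(y)"))
print(has_cycle(ex1), has_cycle(ex2))  # False True
```

Taking the union of the per-server edge sets is sufficient here because, without replication, conflicting steps always occur at the same server.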
Global conflict serializability • Let s be a global schedule with local schedules s1, s2, ..., sn involving a set T of transactions such that each si, 1 ≤ i ≤ n, is conflict serializable. Then the following holds: s is globally conflict serializable iff there exists a total order '<' on T that is consistent with the local serialization orders of the transactions
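The theorem suggests a simple test: collect the local serialization orders and try to merge them into one total order. A minimal sketch follows, under the assumption that each local serialization order is already known and given as a list of transaction identifiers; `global_order` is a hypothetical helper name, and `graphlib` requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter, CycleError

def global_order(local_orders):
    """Return a total order consistent with every local order, or None."""
    preds = {}  # transaction -> set of transactions that must come earlier
    for order in local_orders:
        for pos, txn in enumerate(order):
            preds.setdefault(txn, set()).update(order[:pos])
    try:
        return list(TopologicalSorter(preds).static_order())
    except CycleError:
        # The local orders contradict each other: no total order exists.
        return None

print(global_order([[1, 2], [1, 2]]))  # [1, 2]
print(global_order([[1, 2], [2, 1]]))  # None
```

The second call corresponds to Example 2, where the two servers serialize T1 and T2 in opposite directions.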
Concurrency Control Algorithms • Distributed 2PL locking algorithms • Distributed T/O algorithms • Distributed optimistic algorithms
Distributed 2PL locking algorithms • The main problem is how to determine that a transaction has reached its 'lock point' • Primary site 2PL – lock management is done exclusively at a distinguished site, the primary site • Distributed 2PL – when a server wants to start the unlocking phase for a transaction, it communicates with all other servers regarding the lock point of that transaction • Strong 2PL – all locks acquired on behalf of a transaction are held until the transaction wants to commit (2PC)
Distributed T/O algorithms • Assume that each local site (scheduler) executes its private T/O protocol for synchronizing accesses to its portion of the database: server 1: r1(x) w2(x); server 2: r2(y) w1(y) • If timestamps were assigned as in the centralized case, each of the two servers would assign the value 1 to the first transaction that it sees locally (T1 on server 1 and T2 on server 2), which would lead to a globally incorrect result
Distributed T/O algorithms • We have to find a way to assign globally unique timestamps to transactions at all sites: • Centralized approach – a particular server is responsible for generating and distributing timestamps • Distributed approach – each server generates a unique local timestamp using a clock or counter, and the global timestamp is the pair (local timestamp, server number) • Example: server 1: r1(x) w2(x); server 2: r2(y) w1(y); TS(T1) = (1,1), TS(T2) = (1,2)
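The distributed approach can be sketched as follows. This is an illustration only: the class name `TimestampGenerator` is hypothetical, and the sketch assumes a timestamp is the pair (local counter, server number), compared lexicographically.

```python
class TimestampGenerator:
    """Per-server source of globally unique timestamps."""

    def __init__(self, server_id):
        self.server_id = server_id
        self.counter = 0

    def next(self):
        # The counter orders transactions locally; the server number
        # breaks ties between servers, so no two timestamps are equal.
        self.counter += 1
        return (self.counter, self.server_id)

server1 = TimestampGenerator(1)
server2 = TimestampGenerator(2)
ts_t1 = server1.next()  # T1 is first seen at server 1
ts_t2 = server2.next()  # T2 is first seen at server 2
print(ts_t1, ts_t2, ts_t1 < ts_t2)  # (1, 1) (1, 2) True
```

Python compares tuples lexicographically, so `ts_t1 < ts_t2` gives both servers the same relative order for T1 and T2.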
Distributed T/O algorithms • Lamport clock – used to solve the more general problem of fixing the notion of logical time in an asynchronous network • Sites communicate through messages • Logical time is a pair (c, i), where c is a nonnegative integer and i is a transaction number • The clock variable is increased by 1 at every transaction operation; the logical time of the operation is defined as the value of the clock immediately after the operation
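A minimal sketch of a Lamport clock is given below. The increment on every operation follows the slide; the receive rule (take the maximum of the local clock and the clock carried by the message, then add 1) is the standard Lamport rule and is not stated on the slide. The class and method names are hypothetical.

```python
class LamportClock:
    """Logical clock producing times (c, ident)."""

    def __init__(self, ident):
        self.ident = ident
        self.c = 0

    def local_step(self):
        # The logical time of an operation is the clock value
        # immediately after the operation.
        self.c += 1
        return (self.c, self.ident)

    def send(self):
        # Sending is an operation too; the message carries its time.
        return self.local_step()

    def receive(self, stamp):
        # Standard Lamport rule (assumption, not on the slide):
        # jump past the sender's clock so that receive follows send.
        self.c = max(self.c, stamp[0]) + 1
        return (self.c, self.ident)

a, b = LamportClock(1), LamportClock(2)
first = a.local_step()        # (1, 1)
msg = a.send()                # (2, 1)
received = b.receive(msg)     # (3, 2)
print(first, msg, received)
```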
Distributed optimistic algorithms • Under the optimistic approach, every transaction is processed in three phases • Problem: how to ensure that validation comes to the same result at every site where a global transaction has been active • Not implemented
Distributed Deadlock Detection • Problem: global deadlock, which cannot be detected by local means only (each server keeps a WFG locally) • [Figure: wait-for graph over transactions T1, T2, T3 spread across Site 1, Site 2, and Site 3, with "wait for lock" edges inside a site and "wait for message" edges between sites]
Distributed Deadlock Detection • Centralized detection – a centralized monitor collects the local WFGs; its drawbacks are performance and false deadlocks • Timeout approach • Distributed approaches: edge chasing and path pushing
Distributed Deadlock Detection • Edge chasing – each transaction that becomes blocked in a wait relationship sends its identifier in a special message, called a probe, to the blocking transaction. If a transaction receives a probe, it forwards it to all transactions by which it is itself blocked. If the probe comes back to the transaction that initiated it, this transaction knows that it is participating in a cycle and hence is part of a deadlock
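The probe forwarding can be simulated in a few lines. The sketch below is a single-process simulation, not a distributed implementation: the dictionary `waits_for` (transaction to the set of transactions blocking it) and the function name `detects_deadlock` are assumptions made for illustration, and the global view exists only to drive the simulation.

```python
from collections import deque

def detects_deadlock(waits_for, initiator):
    """Simulate probes started by `initiator`; True if one returns to it."""
    # The initiator sends its probe to every transaction blocking it.
    queue = deque(waits_for.get(initiator, ()))
    seen = set()
    while queue:
        txn = queue.popleft()
        if txn == initiator:
            return True  # the probe came back: initiator is on a cycle
        if txn in seen:
            continue     # this transaction already forwarded the probe
        seen.add(txn)
        # A blocked receiver forwards the probe to its own blockers.
        queue.extend(waits_for.get(txn, ()))
    return False

cycle = {"T1": {"T2"}, "T2": {"T3"}, "T3": {"T1"}}
chain = {"T1": {"T2"}, "T2": {"T3"}}
print(detects_deadlock(cycle, "T1"), detects_deadlock(chain, "T1"))  # True False
```

Note that a transaction only learns about cycles it participates in: a probe that reaches a cycle not containing its initiator never returns.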
Distributed Deadlock Detection • Path pushing – entire paths are circulated between transactions instead of single transaction identifiers • The basic algorithm is as follows: • Each server that has a waits-for path from transaction Ti to transaction Tj, such that Ti has an incoming waits-for message edge and Tj has an outgoing waits-for message edge, sends that path to the server along the outgoing edge, provided the identifier of Ti is smaller than that of Tj • Upon receiving a path, the server concatenates it with the local paths that already exist, and forwards the result along its outgoing edges again. If there exists a cycle among n servers, at least one of them will detect that cycle in at most n such rounds
Distributed Deadlock Detection • Consider the deadlock example: Site 1 pushes the path T1 → T2 to Site 2; Site 2 knows T2 → T3 and pushes T1 → T2 → T3 to Site 3; Site 3 knows that T3 → T1 locally and detects the global deadlock
Preliminaries • A heterogeneous distributed database system (HDDBS) integrates pre-existing external data sources to support global applications that access more than one external data source • HDDBS vs LDBS • Local autonomy and heterogeneity of local data sources: design autonomy, communication autonomy, execution autonomy • Local autonomy reflects the fact that the local data sources were designed and implemented independently and were totally unaware of the integration process
Preliminaries • Design autonomy: it refers to the capability of a database system to choose its own data model and implementation procedures • Communication autonomy: it refers to the capability of a database system to decide what other systems it will communicate with and what information it will exchange with them • Execution autonomy: it refers to the capability of a database system to decide how and when to execute requests received from other systems
Difficulties • Actions of a transaction may be executed in different EDSs, one of which may use locks to guarantee serializability, while another may use timestamps • Guaranteeing the properties of transactions may restrict local autonomy, e.g. to guarantee atomicity, the participating EDSs must execute some type of commit protocol • EDSs may not provide the functionality needed to implement the required global coordination protocols. For a commit protocol, an EDS must be able to become prepared, guaranteeing that the local actions of a transaction can be completed; existing EDSs may not allow a transaction to enter this state
HDDBS model • [Figure: global transactions enter the Global Transaction Manager (GTM), which passes their subtransactions to one Local Transaction Manager (LTM) per site; each LTM also receives local transactions and sits on top of an External Data Source (EDS1, EDS2)]
Basic notation • An HDDBS consists of a set D of external data sources and a set 𝒯 of transactions • D = {D1, D2, ..., Dn}, where Di is the i-th external data source • 𝒯 = T ∪ T1 ∪ T2 ∪ ... ∪ Tn • T – a set of global transactions • Ti – a set of local transactions that access Di only
Example • Given a federation of two servers: D1 = { a, b} D2 = {c, d, e} D={a, b, c, d, e} • Local transactions: T1 = r(a) w(b) T2 = w(d) r(e) • Global transactions: T3 = w(a) r(d) T4 = w(b) r(c) w(e) • Local schedules: s1: r1(a) w3(a) c3 w1(b) c1 w4(b) c4 s2: r4(c) w2(d) r3(d) c3 r2(e) c2 w4(e) c4
Global schedule • Let the heterogeneous federation consist of n sites, let T1, ..., Tn be the sets of local transactions at sites 1, ..., n, and let T be a set of global transactions. Finally, let s1, s2, ..., sn be local schedules. A (heterogeneous) global schedule (for s1, ..., sn) is a schedule s for 𝒯 = T ∪ T1 ∪ ... ∪ Tn such that its local projection equals the local schedule at each site, i.e. Πi(s) = si for all i, 1 ≤ i ≤ n
Correctness of schedules • Given a federation of two servers: D1 = {a}, D2 = {b, c} • Given two global transactions T1 and T2 and a local transaction T3: T1 = r(a) w(b), T2 = w(a) r(c), T3 = r(b) w(c) • Assume the following local schedules: server 1: r1(a) w2(a); server 2: r3(b) w1(b) r2(c) w3(c) • Transactions T1 and T2 are executed strictly serially at both sites, yet the global schedule is not globally serializable because of an indirect conflict: server 1 orders T1 → T2, while at server 2 the local transaction T3 induces T2 → T3 → T1
Global serializability • In a heterogeneous federation the GTM has no direct control over local schedules; the best it can do is to control the serialization order of global transactions by carefully controlling the order in which operations are sent to local systems for execution and in which these get acknowledged • Indirect conflict: Ti and Tk are in indirect conflict in si if there exists a sequence T1, ..., Tr of transactions in si such that Ti is in si in a direct conflict with T1; Tj is in si in a direct conflict with Tj+1, 1 ≤ j ≤ r-1; and Tr is in si in a direct conflict with Tk • Conflict equivalence: two schedules contain the same operations and the same direct and indirect conflicts
Global serializability • Global Conflict Serialization Graph: let s be a global schedule for the local schedules s1, s2, ..., sn; let G(si) denote the conflict serialization graph of si, 1 ≤ i ≤ n, derived from direct and indirect conflicts. The global conflict serialization graph of s is defined as the union of all G(si), 1 ≤ i ≤ n, i.e. G(s) = G(s1) ∪ ... ∪ G(sn) • Global serializability theorem: let the local schedules s1, s2, ..., sn be given, where each G(si), 1 ≤ i ≤ n, is acyclic. Let s be a global schedule for the si, 1 ≤ i ≤ n. The global schedule s is globally conflict serializable iff G(s) is acyclic
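The theorem can be tried out on the two-server example with global transactions T1, T2 and local transaction T3. The sketch below is illustrative only: it assumes local schedules are lists of (kind, transaction, item) triples without commit steps, models indirect conflicts as the transitive closure of the direct conflict edges, and uses hypothetical helper names. `graphlib` requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter, CycleError
from itertools import combinations

def direct_conflicts(ops):
    """Edges (Ti, Tj) for conflicting steps, earlier step first."""
    return {(t1, t2)
            for (k1, t1, x1), (k2, t2, x2) in combinations(ops, 2)
            if t1 != t2 and x1 == x2 and "w" in (k1, k2)}

def with_indirect(edges):
    """Add indirect conflicts: transitive closure of the direct edges."""
    closure = set(edges)
    changed = True
    while changed:
        new = {(a, d) for (a, b) in closure for (c, d) in closure
               if b == c and a != d}
        changed = not new <= closure
        closure |= new
    return closure

def globally_serializable(local_schedules):
    """Build G(s) as the union of all G(si) and test it for cycles."""
    preds = {}
    for ops in local_schedules:
        for a, b in with_indirect(direct_conflicts(ops)):
            preds.setdefault(b, set()).add(a)
            preds.setdefault(a, set())
    try:
        list(TopologicalSorter(preds).static_order())
        return True
    except CycleError:
        return False

s1 = [("r", 1, "a"), ("w", 2, "a")]
s2 = [("r", 3, "b"), ("w", 1, "b"), ("r", 2, "c"), ("w", 3, "c")]
print(with_indirect(direct_conflicts(s2)))  # contains the indirect edge (2, 1)
print(globally_serializable([s1, s2]))      # False
```

The indirect edge (2, 1) at server 2 contradicts the direct edge (1, 2) at server 1, which is exactly the situation the GTM cannot observe on its own.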
Global serializability - problems • To ensure global serializability, the serialization order of global transactions must be the same at all sites where they execute • Serialization orders of local schedules must be validated by the HDDBS • These orders are neither reported by the EDSs, nor can they be determined by controlling the submission of the global subtransactions or by observing their execution order
Example • Globally non-serializable schedule: s1: w1(a) r2(a), giving T1 → T2; s2: w2(c) r3(c) w3(b) r1(b), giving T2 → T3 → T1 • Globally serializable schedule: s1: w1(a) r2(a), giving T1 → T2; s2: w2(c) r1(b), with no conflict • Globally non-serializable schedule: s1: w1(a) r2(a), giving T1 → T2; s2: w3(b) r1(b) w2(c) r3(c), giving T2 → T3 → T1
Quasi serializability • Rejecting global serializability as the correctness criterion • The basic idea: we assume that no value dependencies exist among EDSs, so indirect conflicts can be ignored • In order to preserve global database consistency, only global transactions need to be executed in a serializable way, with proper consideration of the effects of local transactions
Quasi serializability • Quasi-serial schedule: a set of local schedules {s1, ..., sn} is quasi serial if each si is conflict serializable and there exists a total order '<' on the set T of global transactions such that Ti < Tj for Ti, Tj ∈ T, i ≠ j, implies that in each local schedule si, 1 ≤ i ≤ n, the Ti subtransaction occurs completely before the Tj subtransaction • Quasi serializability: a set of local schedules {s1, ..., sn} is quasi serializable if there exists a set {s1', ..., sn'} of quasi serial local schedules such that si is conflict equivalent to si' for 1 ≤ i ≤ n
Example (1) • Given a federation of two servers: D1 = {a, b}, D2 = {c, d, e} • Given two global transactions T1 and T2 and two local transactions T3 and T4: T1 = w(a) r(d), T2 = r(b) r(c) w(e), T3 = r(a) w(b), T4 = w(d) r(e) • Assume the following local schedules: s1: w1(a) r3(a) w3(b) r2(b); s2: r2(c) w4(d) r1(d) w2(e) r4(e)
Example (2) • The set {s1, s2} is quasi serializable, since it is conflict equivalent to the quasi serial set {s1, s2’}, where s2’ : w4(d) r1(d) r2(c) w2(e) r4(e) • The global schedule s: w1(a) r3(a) r2(c) w4(d) r1(d) c1 w3(b) c3 r2(b) w2(e) c2 r4(e) c4 is quasi serializable; however, s is not globally serializable • Since the quasi-serialization order is always compatible with the orderings of subtransactions in the various local schedules, quasi serializability is relatively easy to achieve for a GTM
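The ordering condition in the definition of a quasi-serial set can be checked directly. The sketch below covers only that condition: it assumes each local schedule is already known to be conflict serializable, takes schedules as lists of (kind, transaction, item) triples, and uses the hypothetical helper name `quasi_serial_order`. `graphlib` requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter, CycleError
from itertools import combinations

def quasi_serial_order(local_schedules, global_txns):
    """Return a total order of the global transactions, or None."""
    preds = {t: set() for t in global_txns}
    for ops in local_schedules:
        # Positions of every global transaction's steps at this site.
        pos = {}
        for i, (_kind, txn, _item) in enumerate(ops):
            if txn in global_txns:
                pos.setdefault(txn, []).append(i)
        for a, b in combinations(pos, 2):
            if max(pos[a]) < min(pos[b]):
                preds[b].add(a)      # a occurs completely before b
            elif max(pos[b]) < min(pos[a]):
                preds[a].add(b)      # b occurs completely before a
            else:
                return None          # subtransactions are interleaved
    try:
        return list(TopologicalSorter(preds).static_order())
    except CycleError:
        return None                  # sites disagree on the order

s1 = [("w", 1, "a"), ("r", 3, "a"), ("w", 3, "b"), ("r", 2, "b")]
s2 = [("r", 2, "c"), ("w", 4, "d"), ("r", 1, "d"), ("w", 2, "e"), ("r", 4, "e")]
s2_prime = [("w", 4, "d"), ("r", 1, "d"), ("r", 2, "c"), ("w", 2, "e"), ("r", 4, "e")]
print(quasi_serial_order([s1, s2], {1, 2}))        # None
print(quasi_serial_order([s1, s2_prime], {1, 2}))  # [1, 2]
```

The set {s1, s2} itself is not quasi serial because T1 and T2 interleave in s2, whereas {s1, s2'} is quasi serial with T1 < T2; deciding quasi serializability additionally requires finding such a conflict-equivalent set, which this sketch does not attempt.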
Achieving Global Serializability through Local Guarantees - Rigorousness • The GTM assumes that local schedules are conflict serializable • There are various scenarios for guaranteeing global serializability • Rigorousness: local schedulers produce conflict-serializable rigorous schedules. A schedule s is rigorous if it satisfies the following condition: oi(x) <s oj(x), i ≠ j, oi, oj in conflict ⇒ ai <s oj(x) or ci <s oj(x) • Schedules in RG avoid any type of rw, wr, or ww conflict between uncommitted transactions
Achieving Global Serializability through Local Guarantees - Rigorousness • Given a federation of two servers: D1 = {a, b}, D2 = {c, d} • Given two global transactions T1 and T2 and two local transactions T3 and T4: T1 = w(a) w(d), T2 = w(c) w(b), T3 = r(a) r(b), T4 = r(c) r(d) • Assume the following local schedules: s1: w1(a) c1 r3(a) r3(b) c3 w2(b) c2; s2: w2(c) c2 r4(c) r4(d) c4 w1(d) c1 • Both schedules are rigorous, but they yield different serialization orders: T1 → T3 → T2 in s1 and T2 → T4 → T1 in s2
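The rigorousness condition is easy to test on a single schedule. The following sketch is an illustration under stated assumptions: a schedule is a list of (kind, transaction, item) triples, commit and abort steps use the kinds "c" and "a" with item `None`, and `is_rigorous` is a hypothetical helper name.

```python
def is_rigorous(ops):
    """True if every conflicting pair is separated by a commit or abort."""
    finished = set()   # transactions that have committed or aborted
    touched = {}       # item -> list of (kind, txn) steps seen so far
    for kind, txn, item in ops:
        if kind in ("c", "a"):
            finished.add(txn)
            continue
        for earlier_kind, earlier_txn in touched.get(item, []):
            in_conflict = earlier_txn != txn and "w" in (earlier_kind, kind)
            # The earlier transaction must have terminated before this
            # conflicting step; otherwise the schedule is not rigorous.
            if in_conflict and earlier_txn not in finished:
                return False
        touched.setdefault(item, []).append((kind, txn))
    return True

s1 = [("w", 1, "a"), ("c", 1, None), ("r", 3, "a"), ("r", 3, "b"),
      ("c", 3, None), ("w", 2, "b"), ("c", 2, None)]
s2 = [("w", 2, "c"), ("c", 2, None), ("r", 4, "c"), ("r", 4, "d"),
      ("c", 4, None), ("w", 1, "d"), ("c", 1, None)]
bad = [("r", 1, "x"), ("w", 2, "x"), ("c", 1, None), ("c", 2, None)]
print(is_rigorous(s1), is_rigorous(s2), is_rigorous(bad))  # True True False
```

Both local schedules of the example pass the test even though they serialize T1 and T2 in opposite orders, which shows that rigorousness alone does not give global serializability.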
Achieving Global Serializability through Local Guarantees - Rigorousness • Commit-deferred transactions: a global transaction T is commit-deferred if its commit operation is sent by the GTM to the local sites only after the local executions of all data operations from T have been acknowledged at all sites • Theorem: if si ∈ RG, 1 ≤ i ≤ n, and all global transactions are commit-deferred, then s is globally serializable
Possible solutions • Bottom-up approach: observing the execution of global transactions at each EDS. Idea: the execution order of global transactions is determined by their serialization orders at each EDS. Problem: how to determine the serialization order of global transactions • Top-down approach: controlling the submission and execution order of global transactions. Idea: the GTM determines a global serialization order for global transactions before submitting them to the EDSs; it is the EDSs' responsibility to enforce the order at the local sites. Problem: how the order is enforced at the local sites
Ticket-Based Method • How can the GTM obtain information about the relative order of subtransactions of global transactions at each EDS? • How can the GTM guarantee that the subtransactions of each global transaction have the same relative order in all participating EDSs? • Idea: force local direct conflicts between global transactions, i.e. convert indirect conflicts (not observable by the GTM) into direct (observable) conflicts
Ticket-Based Method • Ticket: a ticket is a logical timestamp whose value is stored as a special data item in each EDS • Each subtransaction is required to issue the Take_A_Ticket operation: r(ticket) w(ticket+1) (critical section) • Only subtransactions of global transactions have to take tickets • Theorem: if global transaction T1 takes its ticket before global transaction T2 in a server, then T1 will be serialized before T2 by that server • In other words, the tickets obtained by subtransactions determine their relative serialization order
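The Take_A_Ticket operation amounts to a read followed by a write on one data item. The sketch below models it in a single process: the class name `TicketSite` is hypothetical, and a `threading.Lock` stands in for whatever mechanism the EDS uses to make the read-increment pair a critical section.

```python
import threading

class TicketSite:
    """One EDS holding the special ticket data item."""

    def __init__(self):
        self.ticket = 0
        self._lock = threading.Lock()

    def take_a_ticket(self):
        # Critical section: r(ticket) followed by w(ticket + 1).
        with self._lock:
            value = self.ticket
            self.ticket = value + 1
            return value

site1, site2 = TicketSite(), TicketSite()
# At site 1, T1 takes its ticket before T2; at site 2 the order is reversed.
tickets_site1 = {"T1": site1.take_a_ticket(), "T2": site1.take_a_ticket()}
tickets_site2 = {"T2": site2.take_a_ticket(), "T1": site2.take_a_ticket()}
print(tickets_site1, tickets_site2)
# {'T1': 0, 'T2': 1} {'T2': 0, 'T1': 1}
```

Because every global subtransaction writes the same item, any two of them are in direct conflict at the site, and comparing ticket values reveals their relative serialization order there.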
Example (1) • Given a federation of two servers: D1 = {a}, D2 = {b, c} • Given two global transactions T1 and T2 and a local transaction T3: T1 = r(a) w(b), T2 = w(a) r(c), T3 = r(b) w(c) • Assume the following local schedules: s1: r1(a) c1 w2(a) c2, giving T1 → T2; s2: r3(b) w1(b) c1 r2(c) c2 w3(c) c3, giving T2 → T3 → T1 • The schedule is not globally serializable
Example (2) • Using tickets, the local schedules look as follows: s1: r1(I1) w1(I1+1) r1(a) c1 r2(I1) w2(I1+1) w2(a) c2; s2: r3(b) r1(I2) w1(I2+1) w1(b) c1 r2(I2) w2(I2+1) r2(c) c2 w3(c) c3 • The indirect conflict between global transactions in the schedule s2 has been turned into an explicit one; the schedule s2 is not conflict serializable, since it contains the cycle T1 → T2 → T3 → T1
Example (3) • Consider another set of schedules: s1: r1(I1) w1(I1+1) r1(a) c1 r2(I1) w2(I1+1) w2(a) c2; s2: r3(b) r2(I2) w2(I2+1) r1(I2) w1(I2+1) w1(b) c1 r2(c) c2 w3(c) c3 • Now both schedules are conflict serializable – the tickets obtained by the transactions determine their serialization order
Optimistic ticket method • Optimistic ticket method (OTM): the GTM must ensure that the subtransactions have the same relative serialization order in their corresponding EDSs • Idea: allow the subtransactions to proceed, but commit them only if their ticket values have the same relative order in all participating EDSs • Requirement: EDSs must support a visible 'prepare_to_commit' state for all subtransactions • The 'prepare_to_commit' state is visible if the application program can decide whether the transaction should commit or abort
Optimistic ticket method • A global transaction T proceeds as follows: • The GTM sets a timeout for T • It submits all subtransactions of T to their corresponding EDSs • If they enter their 'p_t_c' state, they wait for the GTM to validate T • Commit or abort is broadcast • The GTM validates T using the ticket graph – the graph is tested for cycles involving T • Problems with OTM • Global aborts caused by ticket operations • The probability of global deadlocks increases
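The validation step can be sketched as a cycle test on a ticket graph. This is a simplified illustration, not the published OTM algorithm: it assumes the GTM holds, per EDS, a dictionary from global transaction to ticket value, draws an edge Ti → Tj whenever Ti holds the smaller ticket at some EDS, and does not model timeouts or the pruning of finished transactions. The function name `validate` is hypothetical.

```python
def validate(tickets_by_site, txn):
    """True if the ticket graph has no cycle through `txn`."""
    edges = {}
    for tickets in tickets_by_site:
        for a in tickets:
            for b in tickets:
                if tickets[a] < tickets[b]:
                    edges.setdefault(a, set()).add(b)  # a before b here
    # Depth-first search from txn, looking for a path back to txn.
    stack, seen = list(edges.get(txn, ())), set()
    while stack:
        node = stack.pop()
        if node == txn:
            return False  # cycle involving txn: it must be aborted
        if node not in seen:
            seen.add(node)
            stack.extend(edges.get(node, ()))
    return True

same_order = [{"T1": 0, "T2": 1}, {"T1": 0, "T2": 1}]
mixed_order = [{"T1": 0, "T2": 1}, {"T2": 0, "T1": 1}]
print(validate(same_order, "T1"), validate(mixed_order, "T1"))  # True False
```

In the second case the two EDSs issued tickets in opposite orders, so T1 fails validation and the GTM would broadcast an abort, which illustrates the global aborts listed among the problems with OTM.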
Cache Coherence and Concurrency Control for Data-Sharing Systems