370 likes | 384 Views
Explore the fundamentals of transactions, concurrent processing, types of failures, and recovery mechanisms in database management systems. Learn the principles of transaction viewpoints and interleaved executions. Discover OLTP and OLAP processing categories.
E N D
3. Transaction processing concepts What is a transaction? • Logical unit of work • Execution of a program that accesses or changes the database Fundamental property of a database: • The data are shared among several users and applications concurrent processing, possibly conflicting interests. AdvDB-3 J. Teuhola 2015
Transaction viewpoints • A transaction should lead the database from one consistent state to another. • Partial execution not allowed (principle:all or nothing) • Concurrent access to data by multiple transactions should be supported. • Transactions may end prematurely, due to system (hardware/software) failure;recovery mechanisms are needed. AdvDB-3 J. Teuhola 2015
Categories of database processing • OLTP = On-Line Transaction Processing • Needed for the everyday functioning of the organization’s main activities, accessing the ’operative’ databases • Short transactions and response times • OLAP = On-Line Analytic Processing • Decision support, data mining • Long read-only transactions • ’Data warehouse’ • The data need not be exactly up-to-date; refreshed periodically AdvDB-3 J. Teuhola 2015
Interleaved execution • Concurrenttransactions may be executed in an interleaved fashion. • Transactions consist of steps, between which the data may be inconsistent. • Arbitrary interleaving of steps will cause problems (interference): • Lost update problem • Temporary update problem (‘dirty read’) • Incorrect summary problem AdvDB-3 J. Teuhola 2015
Time Transaction 1: Transaction 2:Withdraw 200 from account 123 Withdraw 100 from account 123 bal = read_balance(123) = 1000 € bal = bal – 200 € bal = read_balance(123) = 1000 € bal = bal - 100 € write_balance(123, bal) = 800 € write_balance(123, bal) = 900 € Lost update problem Example: Bank account, two parallel withdrawals • The first update is lost. Note that the transactions are not aware of each other’s internal program variables (bal). AdvDB-3 J. Teuhola 2015
Time Transaction 1 Transaction 2Transfer 100 € from accountWithdraw 200 € from account 123123 to account 789 bal = read_balance(123)=1000€ bal = bal - 100€ write_balance(123, bal)=900€ bal = read_balance(123) = 900€ bal = read_balance(789)=2000€abort; recover_balance(123)=1000€ bal = bal - 200 € write_balance(123, bal)=700 €// Should be 800 € Temporary update (‘dirty read’) problem Example transactions on bank accounts: AdvDB-3 J. Teuhola 2015
Time Transaction 1 Transaction 2Sum of balances Transfer 100 € from 789 to 123 sum = 0 bal = read_balance(123) = 1000 € sum = sum + bal = 1000 € bal = read_balance(456) = 2000 € sum = sum + bal = 3000€ bal = read_balance(789) = 3000 € bal = bal – 100 € write_balance(789, bal) = 2900 € bal = read_balance(123) = 1000 € bal = bal + 100€ write_balance(123, bal) = 1100 € bal = read_balance(789) = 2900 € sum = sum + 2900 € = 5900 € // Should be 6000 € Incorrect summary problem AdvDB-3 J. Teuhola 2015
Reasons for recovery • DBMS must ensure that each transaction is either(1) executed in totality (all operations completed successfully; updates made permanent), or(2) has no effect on the database contents nor on other transactions. • In case of failure, the DBMS must roll backthe (partially executed) transaction so that (2) holds. AdvDB-3 J. Teuhola 2015
Types of failures • Hardware/software failure: contents of main memory may be lost. • Transaction error: e.g. illegal operation or user interrupt. • Logical error: Violation of database consistency constraints. • Concurrency control error: Interference of transactions, e.g. deadlock. • Disk failure, accidents, etc.: Some data are usually lost. AdvDB-3 J. Teuhola 2015
Transaction events • begin_transaction • read/write • commit: successful end; fix the changes • rollback (abort): unsuccessful ending • undo effects of a failed transaction • redo effects of a successful transaction • end_transaction AdvDB-3 J. Teuhola 2015
System log • On disk (permanent storage) • Enables recovery from failures • Stores data about transaction states and performed updates (before-/after-images) • Transactions identified by a unique id. • Log entries:start, [read,] write, commit, abort • Undo needs before-image (old value) • Redo needs after-image (new value) AdvDB-3 J. Teuhola 2015
Commit point • Database operations performed successfully • Effects recorded in the log. • At commit the log page in buffer must be force-written to disk. • Writing other buffer pages may be postponed. • At failure: • Uncommitted transactions are undone. • Committed transactions may have to be redone. AdvDB-3 J. Teuhola 2015
Checkpoint • Effects of write operations of committed transactions are forced to disk. • A transaction need not be redone at failure,if its commit-entry in the log is before the last checkpoint. • Checkpoint is taken periodically. AdvDB-3 J. Teuhola 2015
Checkpoint actions • Suspend execution. • Force-write committed updates. • Write the checkpoint info to the log: • active transactions • pointers to the first and last log entries of transactions (to speed-up recovery) • Resume execution. AdvDB-3 J. Teuhola 2015
Main memory Program 1 DBMS Program 2 DB buffer manages Logbuffer ‘Page frames’ Force-write at checkpoint Force-write at commit Before-/after-imagesof changedtuples Disk pages Log pages DB Transaction processing scenario AdvDB-3 J. Teuhola 2015
Desirable (ACID) properties of transactions • Atomicity: All or nothing. • Consistency: From one consistent state of the database to another. • Isolation: Updates are invisible before commit; no need for cascading rollbacks. • Durability: Committed changes are not lost (responsibility of the recovery method), exceptin catastrophic failures. AdvDB-3 J. Teuhola 2015
Levels of isolation 0: No overwrite of dirty reads. 1: No lost updates. 2. No dirty reads & no lost updates. 3. ‘True isolation’: No dirty reads & no lost updates & repeatable read. AdvDB-3 J. Teuhola 2015
Schedule • Assume a set of transactions {T1, ..., Tn}with ordered operations: Ti = <op1>i ; <op2>i ; ... • A schedule S is an ordered set of operations, where each Ti-sequence is a subsequenceof S. • Notice: Operations of different transactions may be interleaved. AdvDB-3 J. Teuhola 2015
Example schedule • Operation sequences of three transactions: T1: r1(x); r1(y); w1(x); c1; T2: r2(y); w2(y); c2; T3: r3(x); r3(y); w3(x); c3; • One possible interleaved schedule: r1(x); r2(y); r1(y); w1(x); r3(x); r3(y); w2(y); w3(x); c2; c1; c3; AdvDB-3 J. Teuhola 2015
Conflict of operations • Notation for operations: <op><trans-id>(<data item>)where <op> is one of {r, w, c, a} = {read, write, commit, abort} • Conflict (= danger): Two operations <op1>i(X) and <op2>j(X)may cause trouble if i j and (<op1> = w or <op2> = w) AdvDB-3 J. Teuhola 2015
Example of conflicts • Schedule:r1(x); r1(y); r2(x); w1(x); r2(y); w2(x); c1; c2; • Conflicting pairs of operations: • r1(x) and w2(x) • r2(x) and w1(x) • w1(x) and w2(x) • Not in conflict: • r1(y) and r2(y) AdvDB-3 J. Teuhola 2015
Complete schedule S is a complete schedule of T1, T2, ..., Tn, if (1) S contains operations of T1, T2, ..., Tn (but no others). (2) Operation order of each Ti is preserved. (3) Order of operations is fixed for each conflicting pair; otherwise ordering may be partial. (4) Commit/abort is the last operation of each Ti. AdvDB-3 J. Teuhola 2015
Committed projection • Committed projection of S =All operations of S belonging to committed transactions. • Reason for this concept:New transactions appear all the time;difficult to find a moment when no active transactions exist. AdvDB-3 J. Teuhola 2015
Recoverable schedule • Transaction T2 that reads X does not commit until T1 that wrote X (before the read) has committed. • Explanation:T2 can be rolled back as long as it has not committed. If T1 aborts, T2 should be rolled back to avoid consequences of dirty read. AdvDB-3 J. Teuhola 2015
Cascading rollback Cascade phenomenon: T2 has to be aborted because T1 aborted;T3 has to be aborted because T2 aborted; ... Prevention: Read an item only after the writing transaction has committed (cascadeless schedule). Strict schedule: Read/write X only after the last writer transaction of X has committed.Advantage: Simpler undo.Disadvantage: Lower level of concurrency AdvDB-3 J. Teuhola 2015
Serializability Aim: • Correct (but not necessarily serial) schedulesfor interfering transactions Means: • Sufficient isolation Serial schedule: • All operations of each transaction Ti are executed consecutively, without interleaving. AdvDB-3 J. Teuhola 2015
Serializability (cont.) Axiom: For independent transactions everyserial schedule is correct (n! alternativesfor n transactions). Problem: Limited concurrency; if one transaction waits for I/O, another cannot take over; the processor isidle. Serializable schedule: Equivalent to some serial schedule (same result, though interleaved execution); means logical correctness. AdvDB-3 J. Teuhola 2015
Forms of schedule equivalence • Result equivalence:May be accidental; it should hold for allpossible database states. • Conflict equivalence:The order of any twoconflicting operations is the same in both schedules. • View equivalence: A read operation reads the result of the same write operation (or none) in both schedules. AdvDB-3 J. Teuhola 2015
Forms of serializability Corresponding to the forms of equivalence: • Conflict serializability:Simple testing algorithm • View serializability:Less restrictive than conflict serializability.Testing is anNP-complete problem. Note. Every conflict-serializable schedule isalso view-serializable, but not vice versa. AdvDB-3 J. Teuhola 2015
Testing conflict serializability 1. For each transaction, create a node in a precedence graph. 2. For each pair of conflicting operations <op>Ti(X) and <op>Tj(X), occurring in this order, create an edge from Ti to Tj. 3. The schedule is serializable if the precedence graph has no cycles. AdvDB-3 J. Teuhola 2015
Serializable schedule equivalent serial schedule 1. Derive the precedence graph of transactionsas above. 2. Perform a topological sort of the graph nodes. 3. The resulting sequence is the solution(which may not be unique, in general). AdvDB-3 J. Teuhola 2015
Example of a precedence graph Schedule: r1(x), r2(y), w1(x), r3(x), r3(y), w3(x), w3(y), w2(y), c1, c2, c3 1 2 y x The schedule is not serializable, due to the cycle. y 3 AdvDB-3 J. Teuhola 2015
Problems of serializability • Difficult to determine the order of operations beforehand. • Scheduling must be done on-the-fly, based on waiting, priorities, system load, etc. • Testing serializability afterwards may be too optimistic (cf. optimistic concurrency control). • Serializability can be applied only to the committed projection. AdvDB-3 J. Teuhola 2015
Solution to maintaining serializability • Control the interleaving of operations by obeying rules (a protocol) that guarantee serializability without checking it.(E.g. two-phase locking protocol, see later) • Conclusion: Serializability theory helps to gain better understanding of protocols, to obtain correct interleaving and improved concurrency. AdvDB-3 J. Teuhola 2015
Debit-credit transactions • Less restrictive form of equivalence • Addition to and subtraction from anitem can be done in any order(commutativity). • A schedule is correct if<read, add/subtract, write> triplesare not interrupted. AdvDB-3 J. Teuhola 2015
Granularity • To be decided: Unit of data considered in scheduling • A smaller unit enables a higher level of concurrency (but involves also more effort to manage) • A (disk) page is often the simplest unit of operation. AdvDB-3 J. Teuhola 2015
Transaction support in SQL • Implicit BEGIN_TRANSACTION • Explicit COMMIT or ROLLBACK • SET TRANSACTION characteristics: • Access mode: READ ONLY or READ WRITE • Diagnostic area size: Number of feedback conditions held simultaneously. • Isolation level (from weakest to strongest): • READ UNCOMMITTED (allows dirty reads) • READ COMMITTED (allows non-repeatable reads) • REPEATABLE READ (allows ‘phantoms’) • SERIALIZABLE (actually stronger than defined earlier) AdvDB-3 J. Teuhola 2015