280 likes | 379 Views
§1. Recovery. Recovery (and concurrency) is tied to notion of transaction processing Transaction is a logical unit of work.
E N D
§1. Recovery Recovery (and concurrency) is tied to notion of transaction processing Transaction is a logical unit of work. Specifically in DBMS context, it is a sequence of (usually) several database operations that transform a consistent state of the database into another consistent state (or do not change it). Consistencyis not necessarily preservedat intermediate points during processing. http://csiweb.ucd.ie/Staff/acater/comp30150.html
Example: consider the relational-database tables: Supplier S# Sname City (S) S3 Acme Cork … … … Shipments S# P# Qty (SP) S3 P117 0144 … … … and the user request: “Change supplier no S3 to no S9” => must change both tables above -- “cascade” the update. http://csiweb.ucd.ie/Staff/acater/comp30150.html
TRANEX1: PROC OPTIONS ( MAIN ) ; /* declarations omitted */ GET LIST ( SX , SY ); /* get values from user */ EXEC SQL UPDATE S SET S# = :SY WHERE S# = :SX ; EXEC SQL UPDATE SP SET S# = :SY WHERE S# = :SX ; RETURN ; END /* TRANEX1 */ ; http://csiweb.ucd.ie/Staff/acater/comp30150.html
NB: This “simple update” involves 2 actual updates to database. In between, database is not consistent, SP records might exist with no S records corresponding. Must guard against doing 1 update but not the other, which would leave database inconsistent. Ideally one wants a guarantee that both updates will be carried out - but that is impossible. Many possible sources of failure: OS errors DBMS errors Program errors Operator mistake Hardware failures Power cuts Fire Sabotage etc http://csiweb.ucd.ie/Staff/acater/comp30150.html
But all is not lost - you can be guaranteed recovery The key to recovery is redundancy : redundant information is stored in dumps and logs, and provides the ability to partially or completely reconstruct the database. Outline of commonest recovery mechanism: Dump total database regularly; Record all changes in log; If failure occurs, either • Restore DB from dump and redo completed txns if possible • Undo uncompleted txns Duplexing is also possible: keep 2 copies of database and update both simultaneously. But ensure that they have different failure modes. http://csiweb.ucd.ie/Staff/acater/comp30150.html
With "transaction processing" you are guaranteed that if a failure occurs during updates, the work of transactions will either be redone (if properly finished) or undone. Transactions must appear atomic to the end-user -- executed in their entirety or not at all. Achieve this by commit and rollback operations. • Commit signals successful end-of-transaction: database is believed consistent, update can be made permanent -- “committed”. • Rollback = unsuccessful end-of-transaction; tells system that all updates within the transaction must be “rolled back” i.e. undone. http://csiweb.ucd.ie/Staff/acater/comp30150.html
TRANEX2: PROC OPTIONS ( MAIN ) ; /* declarations omitted */ EXEC SQL WHENEVER SQLERROR GO TO UNDO ; GET LIST ( SX , SY ); /* get values from user */ EXEC SQL UPDATE S SET S# = :SY WHERE S# = :SX ; EXEC SQL UPDATE SP SET S# = :SY WHERE S# = :SX ; EXEC SQL COMMIT ; GO TO FINISH ; UNDO: EXEC SQL ROLLBACK ; FINISH: RETURN ; END /* TRANEX2 */ ; http://csiweb.ucd.ie/Staff/acater/comp30150.html
In “TRANEX2” example we issue commit if we get through the updates successfully. But if either update fails it “raises an error condition” and a program-initiated rollback is issued to undo changes. Commit/Rollback operations may not appear in code but may be implicit - depends on implementation. Rollback of transactions in progress at the time of a crash should also happen automatically upon DBMS restart. Messages are also an issue: (see later). http://csiweb.ucd.ie/Staff/acater/comp30150.html
Undoing updates System keeps log ( = journal ) of all update operations. Log records values of items before and after any change, identifying the item changed/ deleted/ inserted, and the id of the transaction. This can be used to restore database to consistent state. When a transaction commits, log also records that fact. Log may be stored online or online/offline combination. For busy multi-user systems the log can quickly become very large. http://csiweb.ucd.ie/Staff/acater/comp30150.html
Synchronisation points (SYNCH points) Executing a Commit or Rollback operation establishes a synchpoint: it represents the boundary between two consecutive transactions - a point where the database is consistent. Synchpoints are only established by Commit, Rollback, and normal “program termination”. Once established: - all updates since previous synchpoint are committed or undone; - all database positioning is lost (see below - “cursors” are closed) - all record locks are released (see later - “concurrency”) http://csiweb.ucd.ie/Staff/acater/comp30150.html
Cursors When SQL is embedded in a host language like COBOL (or C or C++ or Java or …), there is a need to bridge between the set-at-a-time nature of SQL and the record-at-a-time nature of COBOL (…). This is done using cursors - pointers that allow you to run through a set of records, pointing to each one in turn. Cursors allow the host language to process the records in the way natural for it; SQL on its own does not need them. http://csiweb.ucd.ie/Staff/acater/comp30150.html
The process is illustrated in outline in the example, which is intended to retrieve supplier details (S#, SNAME, and STATUS) for all suppliers in the city given by the host variable Y. EXEC SQL DECLARE X CURSOR FOR /* define cursor X */ SELECT S#, SNAME, STATUS FROM S WHERE CITY = :Y; EXEC SQL OPEN X; /* execute the query /* DO WHILE (more-records-to-come); EXEC SQL FETCH X INTO :S#,:SNAME,:STATUS; /* fetch next supplier */ ........ /* and then do something! */ END; EXEC SQL CLOSE X; /*deactivate the cursor*/ http://csiweb.ucd.ie/Staff/acater/comp30150.html
Commit / Rollback terminate transactions, not the program. A given program may carry out numerous consecutive transactions. A transaction is a unit of work; it is also a unit of recovery. If a program commits, then its updates must be guaranteed even if a crash occurs before the updates are flushed to disk. System restart after a crash should install updates in the database from entries in the log. Implication: one must write the log before Commit operation finishes - so-called “Write-Ahead Log Protocol” http://csiweb.ucd.ie/Staff/acater/comp30150.html
System and Media Recovery Local failure -- affects only current transaction e.g. arithmetic overflow error Global failure -- affects all transactions in progress at time of failure. Include: • System failure ("soft crash") • no physical damage • Media failure ("hard crash") • database is physically damaged, eg disk head crash http://csiweb.ucd.ie/Staff/acater/comp30150.html
Media failure - physical damage. • Restore database from backup, redo transactions that had completed. • No need to undo. • Can use standard dump/restore software. System failure -- no damage, but • The contents of memory are lost; • so the state of transactions is lost; • so transactions cannot be completed; • so they must be undone at restart time. • May also have to redo transactions that had been finished but not flushed-to-disk. • If the log is very large, restart can be very expensive. http://csiweb.ucd.ie/Staff/acater/comp30150.html
T1 … T5 here are meant to be classes of transactions System uses checkpoints to help identify rapidly which transactions to undo, which transactions to redo. System "takes a checkpoint" at regular intervals: it flushes buffers to disk, and writes checkpoint record to physical log. Thus it records all transactions in progress at time of checkpoint. http://csiweb.ucd.ie/Staff/acater/comp30150.html
Classes of transactions to be undone: those in checkpoint record (like T3) or begun after it (like T5) without Commit in log Classes of transactions to be redone: those in checkpoint record (like T2) or begun after it (like T4) that do have a Commit in log (but their changes are perhaps not flushed to disk) System restart will use log to undo, and then redo, appropriate transactions. Only then will normal activity begin. http://csiweb.ucd.ie/Staff/acater/comp30150.html
Message handling Non-trivial in transaction processing. Eg: Transfer $100 from 68224 to 97636 "Transaction" should update database and issue message to user. If it does a Commit (or a voluntary Rollback), then an appropriate message should be sent. But in the event of a system failure, neither message should be displayed - just as if the transaction had never started. (One could display system-generated failure message.) So should not output messages until end-of-transaction. http://csiweb.ucd.ie/Staff/acater/comp30150.html
TRANSFER: PROC; GET (FROM, TO, AMOUNT); /* input message */ FIND UNIQUE (ACCOUNT WHERE ACCOUNT# = FROM); /* now decrement the FROM balance */ ASSIGN (BALANCE - AMOUNT) TO BALANCE; IF BALANCE < 0 THEN DO; PUT ('INSUFFICIENT FUNDS'); /*output msg*/ /* undo update & terminate transaction */ ROLLBACK; END; ELSE DO; FIND UNIQUE (ACCOUNT WHERE ACCOUNT# = TO); /* now increment the TO balance*/ ASSIGN (BALANCE + AMOUNT) TO BALANCE; PUT ('TRANSFER COMPLETE'); /*output msg*/ /* commit update and terminate transaction*/ COMMIT; END; END /* TRANSFER */ http://csiweb.ucd.ie/Staff/acater/comp30150.html
Place messages in pending queue, to be delivered on termination or discarded on failure. [In the case of a cash dispensing terminal, the delivery of your money is one message] Messages are handled by Data Communications Manager (DCM): when it receives input it places it in a queue. GET retrieves a copy from queue & logs it. PUT outputs to queue Commit/Rollback cause DCM to log messages, transmit them or clear input queues. DCM cancels output messages on failure. Log is used for redo. http://csiweb.ucd.ie/Staff/acater/comp30150.html
Transaction structure - general format • - accept input • - perform database processing • - send output http://csiweb.ucd.ie/Staff/acater/comp30150.html
Undo & Redo are idempotent : System can fail during Undo/Redo. Must ensure that • Undo(Undo(Undo...(x))) = Undo(x) • Redo(Redo(Redo...(x))) = Redo(x) ie that Undoing a change any number of times has the same effect as undoing it once; and similarly, Redoing it any number of times has the same effect as Redoing once. http://csiweb.ucd.ie/Staff/acater/comp30150.html
3 types of System Startup • Cold start: start from scratch. Normally only when 1st installed, but also possible after disasters (esp media failure) • Warm start: Startup after controlled shutdown. No need for Redo/Undo. • Emergency restart: a process invoked by operator after failure. Involves Redo/Undo and - perhaps - reloading the database. http://csiweb.ucd.ie/Staff/acater/comp30150.html
Two phase Commit When a transaction commits, you are guaranteed that a recovery manager will be able to redo the transaction in the event of a failure. (force-write log). If transaction involves 2 (or more) systems - eg in a distributed DBMS - recovery is more difficult: there are 2 (or more) independent recovery mechanisms. The aim is still to preserve the "all or nothing" principle of transaction processing. This leads to two phase commit. Need to be able to exercise control over different systems, so need one system to act as coordinator component. http://csiweb.ucd.ie/Staff/acater/comp30150.html
Individual transactions now send Commit/ Rollback to coordinator, which operates in 2 phases: • request all participants in the transaction to get ready to go, and send an acknowledgement to coordinator - OK or NOT OK • coordinator then broadcasts Commit to all participants, if all replies were OK; or Rollback if not all OK or if timeout occurred. http://csiweb.ucd.ie/Staff/acater/comp30150.html
IN COORDINATOR: for each participant send “get ready to commit” to participant; wait for reply or timeout; if all participants replied ‘OK’ then forcewrite “broadcasting COMMIT” to coordinator log for each participant until acknowledgement received send “COMMIT” to participant wait for acknowledgement or timeout else forcewrite “broadcasting ROLLBACK” to coordinator log for each participant until acknowledgement received send “ROLLBACK” to participant wait for acknowledgement or timeout http://csiweb.ucd.ie/Staff/acater/comp30150.html
IN PARTICIPANT: wait for “get ready to commit” message; force outstanding change records to local log; force “agree to commit” to local log; if errors occurred then send ‘NOT OK’ to coordinator else send ‘OK’ to coordinator wait for broadcast command from coordinator if command is “COMMIT” then commit changes to local resources if command is “ROLLBACK” then undo changes to local resources send acknowledgement to coordinator http://csiweb.ucd.ie/Staff/acater/comp30150.html
Note 1) Timeout 2) Resources are held until global termination - termination on all systems. In the event of failure :- • in coordinator • before broadcasting: restart should issue Rollback • after: restart procedure issues Commit or Rollback as appropriate • in participant • before "agree to commit”: restart issues "NOT OK" (note that timeout will have occurred anyway) • after: ask coordinator to rebroadcast message, and undo/redo transaction locally http://csiweb.ucd.ie/Staff/acater/comp30150.html