Backup and Recovery

Backup and Recovery Part 1

How oracle handle changes to data • Changes to data are results of: • Table being updated/deleted/inserted • Database objects (tables, views, users) being created/altered/dropped – (internally those result in changes to Oracle data dictionary) • Changes are always handled at the data block level: • Oracle keeps track of which blocks have changed and when they have changed

How oracle handle changes to data • When data block is changed: • It is modified in memory and marked as dirty • Change record (redo record) is added to the log buffer • Log buffer is later written to the current redo log file • Dirty blocks are later written to data files

Dirty blocks • Dirty blocks are written to data files: • When there is no space left in memory, e.g.: • Query execution needs to load data into memory • If there is no free memory, some old data needs to be flushed • If flushed buffers are dirty, they need to be written to data files • When database is being closed • During checkpoint • Sometimes, during log switch • In some other situations too (depending on configuration)

Log buffer • Log buffer – memory buffer • Every change to data block is recorded in a log buffer (redo record) • Log buffer is written to current redo log file: • At every commit • When the buffer is full (or almost full) • Redo records are never used during normal database operation, they are used for: • Automatic instance recovery • Manual media recovery

Changes to data – summary • Changes to data: • Are written to data files (delayed) • Are written to redo log files (almost immediately) • In case of instance failure: • Oracle uses redo log files to recover changes that were not written to data files

Redo log files • Database must contain at least two redo log files (online redo log files) • Redo log files are written sequentially: • Redo log 1 • Redo log 2 • … • Redo log n • Redo log 1 • At any time, there is exactly one current redo log file (the one being written by the database)

Redo log files • Online redo log can be in three states: • current – database is writing changes to this redo log (exactly one online redo log is always current) • active – if the database crashes now, redo log will be used for recovery – it contains changes not yet written to data files • inactive – redo log contains changes not required for database recovery – all information is already written to data files

Checkpoint • Redo log is active as long as there are dirty buffers in the database corresponding to changes written to that redo log • At checkpointall dirty buffers are written to data files • After checkpoint: • One redo log file is current • All other redo log files are inactive • Checkpoint can be triggered manually: • alter system checkpoint

Log switch • Online redo log files have fixed size • When the online redo log file becomes full, Oracle switches to the next file – log switch • log file to switch to must be inactive • log switch can be triggered manually by the DBA • alter system switch logfile

Checkpoint at log switch • Sometimes Oracle wants to do a log switch, but the next redo log is active, then Oracle: • stops all database operation • performs a checkpoint • when the next redo log file becomes inactive, resumes operation • prints message to alert log: "checkpoint not complete"

System change number • System change number – SCN: • Oracle counts all transactions in the database • Every time a transaction is committed, SCN is incremented • SCN is stored in: • Control file – to know the current SCN of the database • Data files – to "timestamp" the file. If file is replaced with an old version from backup – Oracle detects it by comparing the SCN in control file with SCN in data file

Oracle startup sequence • At startup Oracle performs 3 steps • STARTUP NOMOUNT – start the instance • STARTUP MOUNT – open all control files • OPEN DATABASE – open all remaining files – data files, temp files, online redo logs

Instance startup • During instance startup (STARTUP NOMOUNT) Oracle: • Reads instance parameters from PFILE or SPFILE • Starts database processes according to the parameters • Finds the location of control files, but does not open them

Mounting the database • During database mount: • Oracle opens and reads all copies of Control files • If there is a problem with any of the control files, the database cannot be mounted • Oracle reads location of all data files, temp files, redo log files, but does not open them

Opening the database • During database open Oracle: • Opens and reads headers of all data files • If the database was closed properly (SCN in each header file matches SCN in control file), there are no additional steps • If the database was not closed properly Oracle tries to perform automatic instance recovery

Automatic instance recovery • During instance recovery Oracle: • Restores all data files to the state just before the crash • Redo log files, which were active or current at the time of the crash are read and changes from them are applied to the data files • This phase is called rolling forward phase • Rolls back all transactions which were not committed at the time of the crash • This phase is called rolling back phase

Media recovery • Instance recovery can fail for the following reasons: • Data file can be missing, corrupted • Online redo log can be missing, corrupted • Data file can be too old to be recovered (e.g. data file was restored from old backup) • Data file is too new (e.g. all files except one were restored from the backup) • If instance recovery fails, manual media recovery must be used to recover the database

Backups • There are two basic types of backups: • Offline backup – backup of an inactive database, that was shut down properly • Online backup – backup of an open database • Oracle backups are performed by copying Oracle files at operating system level (Oracle is not involved).

Offline backup • When performing offline backup: • backup all data files, control files, server parameter file (or parameter file) • do not backup online redo log files! (redo log files are used for recovery, they are not used for clean database startup) • Note: database must be shut down cleanly (e.g. shutdown immediate). After shutdown abort redo logs are required to open the database

Restoring offline backup • To restore offline backup to original database directory: • Restore all control files, data files, temp files from backup • If necessary, restore parameter file or server parameter file • Do not restore online redo logs (you can delete old online redo logs if they are present) • Startup and open the database with resetlogs option: • STARTUP MOUNT • ALTER DATABASE OPEN RESETLOGS

Archivelog mode • Database can operate in two modes: • noarchivelogmode – redo logs can be overwritten as soon as they become inactive • archivelog mode – redo logs are archived to safe location before they can be overwritten • Archivelog mode enables to recover database after media failure • In noarchivelog • database can recover from instance failure • usually database cannot recover from media failure

Archivelog mode • To switch between archivelog and noarchivelog mode: • startup mount • alter database archivelog/noarchivelog • archive log list – shows information • alter database open • After switching database modes, shutdown the database and do offline backup.

Archivelog mode • Archivelog mode enables: • online backups – backups done while the database is running • media recovery – recovering from loss of a data file • point in time recovery – recovery until specified point in time or SCN – useful for recovering from human errors

Media recovery • Media recovery – recovery after a loss of a data file • Media recovery requires: • backup (online or offline) from before the crash • archived redo logs from the time of the backup to the time of the crash • online redo logs • Complete recovery – recovery of all committed transactions • Incomplete recovery – not all committed transactions are recovered, some transactions are lost

Incomplete recovery • Incomplete recovery is performed: • when some redo log files are missing, e.g. one of the archived redo logs is missingor online redo log is missing • when doing point in time recovery (recovery until specified time or SCN) • After incomplete recovery the database must be opened with RESETLOGS option

Incomplete recovery - example • Database running in ARCHIVELOG mode • SCN: 1000 – Full backup • SCN: 1250 – one of the archived redo logs is accidentally erased • SCN: 1500 – disk failure, data file is lost • Recovery: • restore the backup • recover database using archived redo logs • recovery stops at SCN 1250 and the remaining transactions are lost • we open the database at SCN 1250

Example cont. • After recovery – full backup at SCN 1250 • The database is running and fails again at SCN 1700 • We have two sets of archived logs: • SCN 1000 – 1500 (with missing 1250) • SCN 1250 – 1700 • If incorrect redo logs are used during recovery – database becomes corrupted • To prevent incorrect usage of archived logs, Oracle requires RESETLOGS option after incomplete recovery

Complete recovery • Online redo logs are required for complete recovery • Committed transactions are not lost (as in incomplete recovery) • There is no need to open database with RESETLOGS option

Performing recovery • Recovery can be performed on open or closed database • For open database recovery, recovered datafiles/tablespaces must be taken offline • SYSTEM tablespace cannot be taken offline - SYSTEM tablespace can only be recovered on closed database • It is possible to recover: • Single datafile: RECOVER DATAFILE 'path'; • Single tablespace: RECOVER TABLESPACE users; • Entire database: RECOVER DATABASE;

Closed database recovery • Copy damaged files from backup (only damaged files, not all data files) • Make archived redo logs available to database • startup mount • Issue recover command: • recover database (for entire database recovery) • recover tablespace users (tablespace recovery) • recover datafile ‘filename’ (datafile recovery) • alter database open (after complete recovery) • alter database open resetlogs (after incomplete recovery)

Open database recovery • Damaged datafiles are automatically taken offline by Oracle • Make damaged tablespace offline: • alter tablespace XXX offline temporary; • Copy damaged files from backup (only damaged files, not all data files) • Make archived redo logs available to database • Issue recover command: • recover tablespace XXX(tablespace recovery) • alter tablespace XXX online;

Opening damaged database • Database can be opened with some damaged datafiles (except files from SYSTEM tablespace) • In order to open the database: • startup mount; • alter database datafile ‘filename’ offline; • alter database open • alter tablespace XXX offline; • To recover the datafile/tablespace perform open database recovery

Recovery • In order to open database after recovery all datafiles must be recovered until the same SCN • Recovery on open database must be complete in order to make recovered file online • Incomplete recovery requires restoring full backup, not only damaged files • Incomplete recovery can be: • time based (recover until time) • change based (recover until SCN) • cancel based (user is prompted for redo logs and can stop the recovery at any time)

Incomplete recovery • Time based recovery: STARTUP MOUNT RECOVER DATABASE UNTIL TIME '2004-04-01:15:12:00‘ ALTER DATABASE OPEN RESETLOGS • Change based recovery: STARTUP MOUNT RECOVER DATABASE UNTIL CHANGE 100343 ALTER DATABASE OPEN RESETLOGS

Incomplete recovery • Cancel based recovery: STARTUP MOUNT RECOVER DATABASE UNTIL CANCEL ... answer questions ... CANCEL ALTER DATABASE OPEN RESETLOGS

NOARCHIVELOG database • Database in NOARCHIVELOG mode cannot be recovered • In case of failure – restore most recent backup of datafiles and controlfiles (don’t backup and restore online redo logs!) • Execute: STARTUP MOUNT RECOVER DATABASE UNTIL CANCEL CANCEL ALTER DATABASE OPEN RESETLOGS

Failures while Oracle is running • If Oracle detects disk failure while it is running: • control file or log file -> terminate instance • SYSTEM tablespace datafile -> terminate instance • other datafile: • in NOARCHIVELOG mode -> terminate instance • in ARCHIVELOG mode -> take datafile offline, continue running

Backup and Recovery