Standby Your Standby Too

Standby Your Standby Too Michael S. Abbey The Pythian Group Ottawa ON, Canada abbey@pythian.com

Agenda • Ensuring the standby can be used • Creating a sort segment • Standby too far behind master • Adding datafile or tablespace to master • Archived redo logs -- on the master and on the standby • A few special treats

Assumptions • Standby is set up • Archived redo logs are going over to remote server • Managed recovery or initiation of manual recovery in place • Mechanism exists to monitor standby

Do you wanna know a secret monitoring setup Setup is only a small piece of the pie, only gets you a fraction of the way there to having a true standby solution!!

Push INIT file to standby site Push successful hot backup to standby site Push archived redo logs to bring up-to-date De-compress hot backup if necessary Create standby control file Push and propogate standby control file Initiate recovery until standby runs out of archived redo logs Crontab master/standby Setup checklist – a review

Crontabs • Master • Timed log switches • Push archived redo logs to standby site • Standby • Timed recovery activity • Archived redo log cleanup

Ensuring the standby can be used • Double-edged sword • Is it up-to-date • Can it be opened read-only • Are the appropriate files in place • Oracle Net • tnsnames.ora • listener.ora • Initialization parameter file

Is it up-to-date SQL> select max(sequence#),max(recid), 2 to_char(max(completion_time), 3 'dd-mon-yyyy hh24:mi:ss') 4 from v$archived_log; 147 58 05-feb-2002 05:05:10 May differ

Can it be opened read-only set echo on spool stchk alter database open read only; select * from dual; shutdown immediate startup nomount alter database mount standby database; spool off exit 8i

Can it be opened read-only 9i set echo on spool stchk alter database open read only; select * from dual; spool off exit

Are the right files in the right places? • The right way • Tnsnames.ora, listener.ora, and initXYZ.ora • Replication of DBA-related scripts • Matching directory structure • Wrong way • Just run over to the master and get them as part of the failover !! • What master??

Time is of the essence • Make sure your failover does not become a failed-over • Ensure your standby does not mutate to a stood-by SQL> startup LRM-00109: could not open parameter file '/u01/app/oracle/product/8.1.7/dbs/inityatfg.ora' ORA-01078: failure in processing system parameters

Creating a sort segment • In read only mode • No changes can be made to any tablespaces • Data dictionary is read only • When activated • Archived redo logs do no tracking of temp file activity • Must match setup on master site

Creation of temp files Error: ORA-25153 Text: Temporary Tablespace is Empty --------------------------------------------------------------------------- Cause: An attempt was made to use space in a temporary tablespace with no files. Action: Add files to the tablespace using ADD TEMPFILE command. alter tablespace loc_temp add tempfile '/ora01/oradata/tpg/loc_temp01.dbf' reuse; alter tablespace loc_temp add tempfile '/ora02/oradata/tpg/loc_temp02.dbf' reuse;

Trap create statements as part of select 'alter tablespace '||dt.tablespace_name||' add tempfile'|| ''''||dtf.file_name||''''||' reuse;' from sys.dba_temp_files dtf, sys.dba_tablespaces dt where dtf.tablespace_name = dt.tablespace_name and dt.contents = 'TEMPORARY';

Time lag standby / master • Time to apply archived redo logs  check standby alert log Media Recovery Log /u01/app/… arch/prd_112286.arc Fri Mar 8 12:00:43 2002 • Gap between last written log on master and most recent applied log on standby select to_char(max(first_time), 'dd-mon-yyyy hh24:mi:ss') from v$log_history;

Time lag too long • Standby now 28 minutes behind master • Standby may never catch up (300Mb redo logs) • 300Mb of redo written on master every 2 minutes • 27 second push to standby • Standby applying 300Mb of redo in 2m10s • Falls behind 37 seconds every log switch

Time lag too long DBA-defined threshhold = 1 hour M S GAP 1h8m9s 10:49:52 09:41:43 1h26m39s 11:49:52 10:23:13 30 (switches) * 37 seconds = 18m30s further behind 30 log switches

Trouble!!! • Parallel recovery (if available) • Rebuild portion of standby affected on master • Adjust monitoring routine until standby "catches up"

Structural change on master • Adding a tablespace create tablespace mon123 datafile '/u01/oradata/tpg/mon123_01.dbf' size 2000m, '/u04/oradata/tpg/mon123_02.dbf' size 2000m; • Adding a datafile to an existing tablespace alter tablespace mon123 add datafile '/u02/oradata/tpg/mon123_03.dbf' size 2000m, '/u03/oradata/tpg/mon123_04.dbf' size 2000m;

Detection error stack ORA-00283: recovery session canceled due to errors ORA-01670: new datafile 147 needed for standby database recovery ORA-01157: cannot identify/lock data file 147 - see DBWR trace file ORA-01110: data file 147: '/u03/oradata/tpg/mon123_04.dbf' ORA-01112: media recovery not started

The fix alter database create datafile '/u03/oradata/tpg/mon123_04.dbf' as '/u03/oradata/tpg/mon123_04.dbf'; *ERROR at line 1:ORA-01516: nonexistent log file, datafile or tempfile '/u03/oradata/tpg/mon123_04.dbf'

tweak monitoring script manually run add datafile statements on standby remember standby is not truly accessible beware of many additions interfering with recovery The real fix (8i)

The real fix (9i & 10g) standby_file_management = AUTO

Archived redo logs • Impact on the master • Longer retention time to ensure received on standby • Compress after sending to standby(s) • On the standby • Alternate archived redo log storage • Compress after application

Archived redo logs Cannot find archived redo logs on standby?? #!/bin/ksh for f in $(find {arl_loc} -type f -mtime 3) do rm $f done

Archived redo logs • log_archive_dest_n on master can cause instance to suspend if standby site unreachable • Be conservative in keeping archived redo logs • Do it yourself rather than relying on Oracle Net Failover / failback of Oracle9i outside of EE may be impossible

A word on standby control files control_files = (/u01/oradata/tpg/control01.ctl, /u03/oradata/tpg/control02.ctl) IOUG> cp stdbyctl.ctl /u01/oradata/IOUG IOUG> cp stdbyctl.ctl /u03/oradata/IOUG IOUG> cd /u01/oradata/tpg IOUG> ln –s stdbyctl.ctl control01.ctl IOUG> cd /u01/oradata/tpg IOUG> ln –s stdbyctl.ctl control02.ctl . . . IOUG> cp stdbyctl.ctl /u01/oradata/tpg IOUG> cp stdbyctl.ctl /u03/oradata/tpg

Database cloning 401 • Replicate directory structure • Push hot backup to target server • Push INIT file to target server • Create standby control file • Push and propagate . . .

Database cloning 401 select 'scp '||{hot_backup_loc}|| file_name||'.gz'||' oracle@standby1:'||file_name||'.gz' from sys.dba_data_files; scp /hloc/u01/odata/tpg/system01.dbf.gz oracle@standby1:/u01/odata/tpg/system01.dbf.gz scp /hloc/u01/odata/tpg/gl01.dbf.gz oracle@standby1:/u01/odata/tpg/gl01.dbf.gz scp /hloc/u01/odata/tpg/apx01.dbf.gz oracle@standby1:/u01/odata/tpg/apx01.dbf.gz scp /hloc/u01/odata/tpg/mtl01.dbf.gz oracle@standby1:/u01/odata/tpg/mtl01.dbf.gz scp /hloc/u01/odata/tpg/ar01.dbf.gz oracle@standby1:/u01/odata/tpg/ar01.dbf.gz . . . . . .

Database cloning 401 Specify log: {=suggested | filename | AUTO | CANCEL}ORA-00279: change 9560077007 generated at 03/09/2002 13:10:35 needed for thread1ORA-00289: suggestion : /u01/oraarch/prd/arch1_3214.dbfORA-00280: change 9560077007 for thread 1 is in sequence #3214ORA-00278: log file '/u01/oraarch/prd/arch1_3213.dbf' no longer needed for this recoverySpecify log: {<Return>=suggested | filename | AUTO | CANCEL} <Return>ORA-00308: cannot open archived log '/u01/oraarch/prd/arch1_3214.dbf'ORA-27037: unable to obtain file statusSVR4 Error: 2: No such file or directoryAdditional information: 3

Database cloning 401 SQL> alter database activate 2 standby database; Database altered. SQL> shutdown SQL> startup mount SQL> alter database open;

Drawbacks • Previous backups invalid • Un-recoverable until first backup successfully written • Issues with database files created since most recent backup • Standby must be rebuilt from scratch

Reporting server? • One-time setup • Open readonly • Create tempfiles under LMTT • Schedule “out-of-synch” times • Liaise with user community • Adjust monitoring algorithm • Plan for potential recovery lag

Reporting server? Able to sacrifice time lost to reporting window ??

Switchover/switchback • Data Guard with EE • Register master site • Register standby site • Verify log transport services working • switchover to standby_db • background dump directory’s dcrSID.log • not as easy as it seems • well-versed at OSS for difficulties

Switchover/switchback • Last log switch on master • Shutdown master after archival • Ship last log to standby • Apply on standby • Shutdown standby • Ship online redo logs and control file from master to standby site

Switchover/switchback • Position online redo logs and control files • Startup master • Create standby control file • Ship to standby • Startup mount standby • Initiate recovery

Nothing short of a miracle ...

rman backups • As of 8.1.7.2, write backups on the standby • do not register the standby in the catalog • resync catalog; manually after every backup • Less overhead on the production site • Be very careful of test restores so as not to interfere with production • Different ORACLE_SID on standby

Standby Your Standby Too