760 likes | 955 Views
Oracle DataGuard Support Issues. Brian Hitchcock OCP 10g DBA Sun Microsystems brian.hitchcock@sun.com brhora@aol.com. www.brianhitchcock.net. Brian Hitchcock October 23, 2007. Page 1. Oracle DataGuard. Focus on Logical Standby Support Issues Maintains a standby database
E N D
Oracle DataGuardSupport Issues Brian HitchcockOCP 10g DBA Sun Microsystems brian.hitchcock@sun.com brhora@aol.com www.brianhitchcock.net Brian Hitchcock October 23, 2007 Page 1
Oracle DataGuard • Focus on Logical Standby Support Issues • Maintains a standby database • Archived redo logs on primary • Sent to standby and applied • Simple idea • Many configuration options • Can become very complicated
DataGuard • Must be SYS to make changes • Sqlplus / as sysdba • Changes to DataGuard standby database • Some can’t be made while apply process running • Change Guard status • Support Issues • Create physical standby • Convert to logical standby • After logical standby is running • Refresh process
DataGuard Errors • DataGuard reports lot of errors • Standby database alert log • Many are for normal operation • Why reported as errors? • Monitoring of db alert log • Will report these ‘errors’ • Hard to filter out normal ‘errors’
Create Physical Standby • On Primary database • Enable Forced Logging • Create password file • Setup init.ora/spfile parameters • Can’t connect to standby • SYS password • Verify archiving enabled • Backup db (hot or cold) • Create standby control file
Create Physical Standby • On Standby database • Copy db backup files from primary • Copy standby control file from primary • Setup init.ora/spfile parameters • Start physical standby db • Trace file • Verify physical standby working • May not ‘see’ redo logs, register them • Redo logs not deleted, use RMAN
Convert to Logical Standby • On Primary database • Build LogMiner dictionary • On Standby database • Stop redo apply • Errors, no impact • Convert database to logical standby • Two trace files • Restart db • Open resetlogs • Verify logical standby working
Logical Standby is Running • Business requirements • Standby frozen most of the day • Standby catches up once per day • Alert log messages while catching up • Disk space for archived redo logs • Other issues • Apply process is slow • How to detect, resolve • Primary versus Standby backups • Impact, resolution
Logical Standby is Running • Other Issues • Constraint violations • Errors, resolution • No data found • Errors, resolution • ORA-16211 • Errors, Oracle Support • Primary db XDB schema issues • Fixed on primary, errors on standby
Logical Standby is Running • Other issues • ORA-07445 • Refresh cures all • Refresh process • After refresh • ORA-16211: unsupported record found in the archived redo log • Compile invalid objects • Import into standby database
Primary Can’t Connect • Standby not available • Reported on primary production database • “ORACLE not available” • Looks like production primary is down • Your monitoring may need to be adjusted Thu Oct 18 16:59:20 2007 Error 1034 received logging on to the standby Thu Oct 18 16:59:20 2007 Errors in file /shared/orahome01/admin/BRHPROD/bdump/brhprod_arc1_2635.trc: ORA-01034: ORACLE not available PING[ARC1]: Heartbeat failed to connect to standby ‘BRHPRSB'. Error is 1034.
SYS Password Issue • Primary tries to connect to standby Mon Oct 8 15:31:36 2007 Error 1017 received logging on to the standby ------------------------------------------------------------ Check that the primary and standby are using a password file and remote_login_passwordfile is set to SHARED or EXCLUSIVE, and that the SYS password is same in the password files. returning error ORA-16191 ------------------------------------------------------------ Mon Oct 8 15:31:36 2007 Errors in file /orahome01/admin/BRHBETA/bdump/brhbeta_arc0_2309.trc: ORA-16191: Primary log shipping client not logged on standby PING[ARC0]: Heartbeat failed to connect to standby ‘BRHBRSB'. Error is 16191.
SYS Password Issue • Verify SYS password is the same • On primary and standby • Sqlplus sys/<password> • Verify password file has same password • On primary and standby • Cat $ORACLE_HOME/dbs/orapw<SID> • Refresh password file • Alter user SYS identified by <password> • Update password file
DataGuard Trace File • Physical Standby • Start log apply process • Trace file created • Stops when log apply process stops • See file contents later…
Can’t ‘See’ Redo Logs • Physical Standby • Creating or Refreshing standby • Primary configured, sending redo logs • Standby not yet created/running • Standby may not register redo logs • Our scripts maintain primary archived redo logs • Compress to save disk space, delete after 2 days • Manually register • Alter database register logfile ‘<logfile>’; • DataGuard applies redo log
Can’t ‘See’ Redo Logs BRHBETA> select * from v$archive_gap; THREAD# LOW_SEQUENCE# HIGH_SEQUENCE# ---------- ------------- -------------- 1 1959 1976 BRHBETA> select sequence#, applied from v$archived_log order by sequence#; SEQUENCE# APP ---------- --- 1956 YES 1957 YES 1958 YES 1977 NO 1978 NO 1979 NO 1980 NO 1981 NO 1982 NO 1983 NO 1984 NO 11 rows selected. 1959 thru 1976 on standby
Redo Logs Not Deleted • Physical Standby • After applied to standby • Unlike logical standby • SQL apply process does delete them • Use RMAN • Possible disk space issues on standby • How long will you need to store redo logs? • Not an issue if converting to logical soon
Stop Physical Standby • Log Apply Process start • Starts trace file • When physical standby first created • Ends when log apply stops • Normal processing • Trace file looks like a problem
Turn Off Apply Process • Physical standby • Turn off apply process • Regular maintenance • Converting to Logical Standby • Generates ‘error’ • Why is this an error? • Typical of DataGuard • Everything seems to be an ‘error’ • Even when it is perfectly routine • Makes support more difficult • When is an error something to worry about?
Standby Alert Log Tue Oct 9 16:34:34 2007 Physical Standby Database mounted. Completed: ALTER DATABASE MOUNT Tue Oct 9 16:34:36 2007 alter database recover managed standby database disconnect from session Tue Oct 9 16:34:36 2007 Attempt to start background Managed Standby Recovery process (BRHBETA) MRP0 started with pid=11, OS id=13474 Tue Oct 9 16:34:36 2007 MRP0: Background Managed Standby Recovery process started (BRHBETA) Managed Standby Recovery not using Real Time Apply parallel recovery started with 7 processes Log apply process started when physical standby created
Standby Alert Log Wed Oct 10 10:15:15 2007 alter database recover managed standby database cancel Wed Oct 10 10:15:19 2007 MRP0: Background Media Recovery cancelled with status 16037 Wed Oct 10 10:15:19 2007 Errors in file /orahome01/admin/BRHBETA/bdump/brhbeta_mrp0_13474.trc: ORA-16037: user requested cancel of managed recovery operation Recovery interrupted! Wed Oct 10 10:15:20 2007 Errors in file /orahome01/admin/BRHBETA/bdump/brhbeta_mrp0_13474.trc: ORA-16037: user requested cancel of managed recovery operation Wed Oct 10 10:15:20 2007 MRP0: Background Media Recovery process shutdown (BRHBETA) Wed Oct 10 10:15:21 2007 Managed Standby Recovery Canceled (BRHBETA) Wed Oct 10 10:15:21 2007 Completed: alter database recover managed standby database cancel Log apply process stopped – preparing to convert to logical standby
Trace File $ more /orahome01/admin/BRHBETA/bdump/brhbeta_mrp0_13474.trc /orahome01/admin/BRHBETA/bdump/brhbeta_mrp0_13474.trc Oracle Database 10g Enterprise Edition Release 10.2.0.2.0 - 64bit Production With the Partitioning, OLAP and Data Mining options ORACLE_HOME = /orahome01/product/10.2.0 System name: SunOS Node name: brh-beta1-zone04 Release: 5.10 Version: Generic_118833-36 Machine: sun4u Instance name: BRHBETA Redo thread mounted by this instance: 1 Oracle process number: 11 Unix process pid: 13474, image: oracle@beta1-zone04 (MRP0) *** SERVICE NAME:() 2007-10-09 16:34:36.298 *** SESSION ID:(394.1) 2007-10-09 16:34:36.298 ARCH: Connecting to console port... *** 2007-10-09 16:34:36.299 60639 kcrr.c MRP0: Background Managed Standby Recovery process started Start applying redo logs to physical standby
Trace File *** 2007-10-09 16:34:41.302 1018 krsm.c Managed Recovery: Initialization posted. *** 2007-10-09 16:34:41.303 60639 kcrr.c Managed Standby Recovery not using Real Time Apply Recovery target incarnation = 2, activation ID = 0 Influx buffer limit = 27762 (50% x 55524) Successfully allocated 7 recovery slaves Using 158 overflow buffers per recovery slave Start recovery at thread 1 ckpt scn 8257757517457 logseq 1956 block 5 *** 2007-10-09 16:34:42.124 Media Recovery add redo thread 1 *** 2007-10-09 16:34:42.124 1018 krsm.c Managed Recovery: Active posted. ORA-00367: checksum error in log file header ORA-00305: log 1 of thread 1 inconsistent; belongs to another database ORA-00312: online log 1 thread 1: '/shared/oralogs01/BRHBETA/redo01a.log' *** 2007-10-09 16:34:42.147 60639 kcrr.c Clearing online redo logfile 1 /shared/oralogs01/BRHBETA/redo01a.log *** 2007-10-09 16:36:15.066 *** 2007-10-09 16:36:15.066 60639 kcrr.c Clearing online redo logfile 1 complete ORA-00367: checksum error in log file header ORA-00305: log 2 of thread 1 inconsistent; belongs to another database ORA-00312: online log 2 thread 1: '/shared/oralogs01/BRHBETA/redo02a.log' Recreating redo logs
Trace File *** 2007-10-09 16:36:15.100 60639 kcrr.c Clearing online redo logfile 2 /shared/oralogs01/BRHBETA/redo02a.log *** 2007-10-09 16:37:51.473 *** 2007-10-09 16:37:51.473 60639 kcrr.c Clearing online redo logfile 2 complete ORA-00367: checksum error in log file header ORA-00305: log 3 of thread 1 inconsistent; belongs to another database ORA-00312: online log 3 thread 1: '/shared/oradata02/BRHBETA/redo03b.log' *** 2007-10-09 16:37:51.479 60639 kcrr.c Clearing online redo logfile 3 /shared/oradata02/BRHBETA/redo03b.log *** 2007-10-09 16:39:26.048 *** 2007-10-09 16:39:26.048 60639 kcrr.c Clearing online redo logfile 3 complete ORA-00367: checksum error in log file header ORA-00305: log 4 of thread 1 inconsistent; belongs to another database ORA-00312: online log 4 thread 1: '/shared/oradata02/BRHBETA/redo04b.log' *** 2007-10-09 16:39:26.488 60639 kcrr.c Clearing online redo logfile 4 /shared/oradata02/BRHBETA/redo04b.log *** 2007-10-09 16:41:00.447 *** 2007-10-09 16:41:00.447 60639 kcrr.c Clearing online redo logfile 4 complete *** 2007-10-09 16:41:00.469 60639 kcrr.c Media Recovery Waiting for thread 1 sequence 1956
Trace File Applying redo logs to physical standby *** 2007-10-09 16:41:00.469 60639 kcrr.c Fetching gap sequence in thread 1, gap sequence 1956-1976 *** 2007-10-09 16:41:30.782 ----------------------------------------------------------- Check that the CONTROL_FILE_RECORD_KEEP_TIME initialization parameter is defined to a value that is sufficiently large enough to maintain adequate log switch information to resolve archivelog gaps. ----------------------------------------------------------- *** 2007-10-09 16:54:31.045 *** 2007-10-09 16:54:31.045 60639 kcrr.c Fetching gap sequence in thread 1, gap sequence 1956-1956 *** 2007-10-09 16:55:01.154 ----------------------------------------------------------- Check that the CONTROL_FILE_RECORD_KEEP_TIME initialization parameter is defined to a value that is sufficiently large enough to maintain adequate log switch information to resolve archivelog gaps. ----------------------------------------------------------- *** 2007-10-09 16:56:31.179 Media Recovery Log /oraarch01/BRHBETA/LOG_1956_1_629245032.arc *** 2007-10-09 16:56:33.431 Media Recovery Log /oraarch01/BRHBETA/LOG_1957_1_629245032.arc *** 2007-10-09 16:56:44.495
Trace File *** 2007-10-09 16:56:44.495 60639 kcrr.c Media Recovery Waiting for thread 1 sequence 1958 *** 2007-10-09 16:56:44.495 60639 kcrr.c Fetching gap sequence in thread 1, gap sequence 1958-1976 *** 2007-10-09 16:57:14.647 ----------------------------------------------------------- Check that the CONTROL_FILE_RECORD_KEEP_TIME initialization parameter is defined to a value that is sufficiently large enough to maintain adequate log switch information to resolve archivelog gaps. ----------------------------------------------------------- *** 2007-10-09 17:05:14.785 Media Recovery Log /oraarch01/BRHBETA/LOG_1958_1_629245032.arc *** 2007-10-09 17:05:18.043 60639 kcrr.c Media Recovery Waiting for thread 1 sequence 1959 *** 2007-10-09 17:05:18.043 60639 kcrr.c Fetching gap sequence in thread 1, gap sequence 1959-1976 *** 2007-10-09 17:05:48.284 ----------------------------------------------------------- Check that the CONTROL_FILE_RECORD_KEEP_TIME initialization parameter is defined to a value that is sufficiently large enough to maintain adequate log switch information to resolve archivelog gaps. -----------------------------------------------------------
Trace File *** 2007-10-09 17:07:18.309 Media Recovery Log /oraarch01/BRHBETA/LOG_1959_1_629245032.arc *** 2007-10-09 17:07:21.114 Media Recovery Log /oraarch01/BRHBETA/LOG_1960_1_629245032.arc *** 2007-10-09 17:07:22.945 Media Recovery Log /oraarch01/BRHBETA/LOG_1961_1_629245032.arc *** 2007-10-09 17:07:27.300 Media Recovery Log /oraarch01/BRHBETA/LOG_1962_1_629245032.arc *** 2007-10-09 17:07:29.637 Media Recovery Log /oraarch01/BRHBETA/LOG_1963_1_629245032.arc *** 2007-10-09 17:07:29.709 60639 kcrr.c Media Recovery Waiting for thread 1 sequence 1964 *** 2007-10-09 17:07:29.709 60639 kcrr.c Fetching gap sequence in thread 1, gap sequence 1964-1976 *** 2007-10-09 17:07:59.858 ----------------------------------------------------------- Check that the CONTROL_FILE_RECORD_KEEP_TIME initialization parameter is defined to a value that is sufficiently large enough to maintain adequate log switch information to resolve archivelog gaps. ----------------------------------------------------------- *** 2007-10-09 17:08:29.866 Media Recovery Log /oraarch01/BRHBETA/LOG_1964_1_629245032.arc
Trace File *** 2007-10-09 17:08:31.924 Media Recovery Log /oraarch01/BRHBETA/LOG_1965_1_629245032.arc *** 2007-10-09 17:09:12.510 Media Recovery Log /oraarch01/BRHBETA/LOG_1966_1_629245032.arc *** 2007-10-09 17:09:21.050 Media Recovery Log /oraarch01/BRHBETA/LOG_1967_1_629245032.arc *** 2007-10-09 17:09:40.234 Media Recovery Log /oraarch01/BRHBETA/LOG_1968_1_629245032.arc *** 2007-10-09 17:09:45.055 Media Recovery Log /oraarch01/BRHBETA/LOG_1969_1_629245032.arc *** 2007-10-09 17:09:50.572 Media Recovery Log /oraarch01/BRHBETA/LOG_1970_1_629245032.arc *** 2007-10-09 17:09:58.968 Media Recovery Log /oraarch01/BRHBETA/LOG_1971_1_629245032.arc *** 2007-10-09 17:10:03.922 Media Recovery Log /oraarch01/BRHBETA/LOG_1972_1_629245032.arc *** 2007-10-09 17:10:13.196 Media Recovery Log /oraarch01/BRHBETA/LOG_1973_1_629245032.arc *** 2007-10-09 17:10:21.927 Media Recovery Log /oraarch01/BRHBETA/LOG_1974_1_629245032.arc *** 2007-10-09 17:10:34.064 Media Recovery Log /oraarch01/BRHBETA/LOG_1975_1_629245032.arc *** 2007-10-09 17:10:42.420 60639 kcrr.c Media Recovery Waiting for thread 1 sequence 1976
Trace File *** 2007-10-09 17:10:42.421 60639 kcrr.c Fetching gap sequence in thread 1, gap sequence 1976-1976 *** 2007-10-09 17:11:12.538 ----------------------------------------------------------- Check that the CONTROL_FILE_RECORD_KEEP_TIME initialization parameter is defined to a value that is sufficiently large enough to maintain adequate log switch information to resolve archivelog gaps. ----------------------------------------------------------- *** 2007-10-09 17:12:42.563 Media Recovery Log /oraarch01/BRHBETA/LOG_1976_1_629245032.arc *** 2007-10-09 17:12:45.563 Media Recovery Log /oraarch01/BRHBETA/LOG_1977_1_629245032.arc *** 2007-10-09 17:12:48.534 Media Recovery Log /oraarch01/BRHBETA/LOG_1978_1_629245032.arc *** 2007-10-09 17:13:00.505 Media Recovery Log /oraarch01/BRHBETA/LOG_1979_1_629245032.arc *** 2007-10-09 17:13:02.054 Media Recovery Log /oraarch01/BRHBETA/LOG_1980_1_629245032.arc *** 2007-10-09 17:13:03.231 Media Recovery Log /oraarch01/BRHBETA/LOG_1981_1_629245032.arc *** 2007-10-09 17:13:03.902 Media Recovery Log /oraarch01/BRHBETA/LOG_1982_1_629245032.arc
Trace File *** 2007-10-09 17:13:04.492 Media Recovery Log /oraarch01/BRHBETA/LOG_1983_1_629245032.arc *** 2007-10-09 17:13:08.171 Media Recovery Log /oraarch01/BRHBETA/LOG_1984_1_629245032.arc *** 2007-10-09 17:13:26.860 *** 2007-10-09 17:13:26.860 60639 kcrr.c Media Recovery Waiting for thread 1 sequence 1985 *** 2007-10-09 17:16:07.172 Media Recovery Log /oraarch01/BRHBETA/LOG_1985_1_629245032.arc *** 2007-10-09 17:16:08.067 Media Recovery Log /oraarch01/BRHBETA/LOG_1986_1_629245032.arc *** 2007-10-09 17:16:08.131 Media Recovery Log /oraarch01/BRHBETA/LOG_1987_1_629245032.arc *** 2007-10-09 17:16:08.195 60639 kcrr.c Media Recovery Waiting for thread 1 sequence 1988 *** 2007-10-09 17:16:13.202 Media Recovery Log /oraarch01/BRHBETA/LOG_1988_1_629245032.arc *** 2007-10-09 17:16:13.268 60639 kcrr.c Media Recovery Waiting for thread 1 sequence 1989 *** 2007-10-09 21:14:01.119 Media Recovery Log /oraarch01/BRHBETA/LOG_1989_1_629245032.arc *** 2007-10-09 21:14:16.922 *** 2007-10-09 21:14:16.922 60639 kcrr.c Media Recovery Waiting for thread 1 sequence 1990
Trace File *** 2007-10-10 09:32:33.399 *** 2007-10-10 09:32:33.399 60639 kcrr.c Fetching gap sequence in thread 1, gap sequence 1990-1990 *** 2007-10-10 09:33:05.187 Media Recovery Log /oraarch01/BRHBETA/LOG_1990_1_629245032.arc *** 2007-10-10 09:33:22.505 Media Recovery Log /oraarch01/BRHBETA/LOG_1991_1_629245032.arc *** 2007-10-10 09:33:22.570 Media Recovery Log /oraarch01/BRHBETA/LOG_1992_1_629245032.arc *** 2007-10-10 09:33:22.631 Media Recovery Log /oraarch01/BRHBETA/LOG_1993_1_629245032.arc *** 2007-10-10 09:33:22.693 Media Recovery Log /oraarch01/BRHBETA/LOG_1994_1_629245032.arc *** 2007-10-10 09:33:22.761 Media Recovery Log /oraarch01/BRHBETA/LOG_1995_1_629245032.arc *** 2007-10-10 09:33:22.807 Media Recovery Log /oraarch01/BRHBETA/LOG_1996_1_629245032.arc *** 2007-10-10 09:33:22.864 Media Recovery Log /oraarch01/BRHBETA/LOG_1997_1_629245032.arc *** 2007-10-10 09:33:22.918 Media Recovery Log /oraarch01/BRHBETA/LOG_1998_1_629245032.arc *** 2007-10-10 09:33:23.199 Media Recovery Log /oraarch01/BRHBETA/LOG_1999_1_629245032.arc
Trace File Stop Log Apply Process Ready to convert to Logical Standby *** 2007-10-10 09:33:23.255 60639 kcrr.c Media Recovery Waiting for thread 1 sequence 2000 *** 2007-10-10 10:11:07.685 Media Recovery Log /oraarch01/BRHBETA/LOG_2000_1_629245032.arc *** 2007-10-10 10:11:08.422 60639 kcrr.c Media Recovery Waiting for thread 1 sequence 2001 *** 2007-10-10 10:14:48.843 Media Recovery Log /oraarch01/BRHBETA/LOG_2001_1_629245032.arc *** 2007-10-10 10:14:49.013 60639 kcrr.c Media Recovery Waiting for thread 1 sequence 2002 *** 2007-10-10 10:15:19.072 *** 2007-10-10 10:15:19.072 60639 kcrr.c MRP0: Background Media Recovery cancelled with status 16037 ORA-16037: user requested cancel of managed recovery operation ----- Redo read statistics for thread 1 ----- Read rate (ASYNC): 619732Kb in 63640.12s => 0.01 Mb/sec Total physical reads: 619732Kb Longest record: 28Kb, moves: 0/2001133 (0%) Change moves: 779641/4101685 (19%), moved: 141Mb Longest LWN: 1023Kb, moves: 117/175493 (0%), moved: 23Mb Last redo scn: 0x0782.a8f27f37 (8257761607479) ---------------------------------------------- *** 2007-10-10 10:15:19.088 Media Recovery drop redo thread 1
Trace File *** 2007-10-10 10:15:20.864 1018 krsm.c Managed Recovery: Not Active posted. ORA-16037: user requested cancel of managed recovery operation ARCH: Connecting to console port... *** 2007-10-10 10:15:20.871 60639 kcrr.c MRP0: Background Media Recovery process shutdown *** 2007-10-10 10:15:20.871 1018 krsm.c oraarch01/BRHBETA $
Convert to Logical Standby • SQL Apply Process • When applying redo logs • Generates 2 trace files • What are they? • Trace files • One shows start of kcrrwkx • Second shows end of kcrrwkx • What are these for? • Neither show up in alert log • Both continue as long as SQL apply process runs
First Trace File /orahome01/admin/BRHBETA/bdump $ more brhbeta_arc0_13168.trc /orahome01/admin/BRHBETA/bdump/brhbeta_arc0_13168.trc Oracle Database 10g Enterprise Edition Release 10.2.0.2.0 - 64bit Production With the Partitioning, OLAP and Data Mining options ORACLE_HOME = /orahome01/product/10.2.0 System name: SunOS Node name: brh-beta1-zone04 Release: 5.10 Version: Generic_118833-36 Machine: sun4u Instance name: BRHBETA Redo thread mounted by this instance: 1 Oracle process number: 24 Unix process pid: 13168, image: oracle@beta1-zone04 (ARC0) *** SERVICE NAME:() 2007-10-10 10:40:26.358 *** SESSION ID:(188.2) 2007-10-10 10:40:26.358 kcrrwkx: nothing to do (start) *** 2007-10-10 10:45:26.240 kcrrwkx: nothing to do (start) *** 2007-10-10 10:46:35.388 kcrrwkx: nothing to do (end) ... ...
Second Trace File /orahome01/admin/BRHBETA/bdump $ more brhbeta_arc1_13170.trc /orahome01/admin/BRHBETA/bdump/brhbeta_arc1_13170.trc Oracle Database 10g Enterprise Edition Release 10.2.0.2.0 - 64bit Production With the Partitioning, OLAP and Data Mining options ORACLE_HOME = /orahome01/product/10.2.0 System name: SunOS Node name: brh-beta1-zone04 Release: 5.10 Version: Generic_118833-36 Machine: sun4u Instance name: BRHBETA Redo thread mounted by this instance: 1 Oracle process number: 9 Unix process pid: 13170, image: oracle@beta1-zone04 (ARC1) *** SERVICE NAME:() 2007-10-10 10:40:26.358 *** SESSION ID:(396.1) 2007-10-10 10:40:26.358 kcrrwkx: nothing to do (start) *** 2007-10-10 10:41:26.315 kcrrwkx: nothing to do (end) *** 2007-10-10 10:42:26.322 kcrrwkx: nothing to do (end) … …
DataGuard Likes to Chat • Physical Standby • While applying archived redo logs • Trace file documents everything standby does • Logical Standby • Once converted to logical standby • Two trace files generated • Contain messages for start/stop of each log apply • Why are these generated? • Why not have DataGuard alert logs? • Trace files tell me that something is wrong
Normal Operation • Logical Standby catching up to Primary • Apply process turned off during the day • Catches up at night • Apply process failed • Catch up after fix (skip table in the example) • Typical alert log messages • Redo log from primary registered with DG • Redo logs applied to standby • Redo logs deleted from standby
Standby Catching Up Tue Oct 16 15:13:22 2007 Completed: ALTER DATABASE STOP LOGICAL STANDBY APPLY Tue Oct 16 15:14:16 2007 Incremental checkpoint up to RBA [0x7.a0aa2.0], current log tail at RBA [0x7.b8e2c.0] Tue Oct 16 15:14:45 2007 ALTER DATABASE START LOGICAL STANDBY APPLY Tue Oct 16 15:14:45 2007 ALTER DATABASE START LOGICAL STANDBY APPLY (BRHBETA) Tue Oct 16 15:14:45 2007 No optional part Attempt to start background Logical Standby process LSP0 started with pid=21, OS id=5041 LOGSTDBY status: ORA-16111: log mining and apply setting up Tue Oct 16 15:14:46 2007 LOGMINER: Parameters summary for session# = 1 LOGMINER: Number of processes = 3, Transaction Chunk Size = 201 LOGMINER: Memory Size = 30M, Checkpoint interval = 150M Tue Oct 16 15:14:46 2007 Completed: ALTER DATABASE START LOGICAL STANDBY APPLY LOGMINER: session# = 1, builder process P001 started with pid=7 OS id=10018 LOGMINER: session# = 1, reader process P000 started with pid=34 OS id=10014 LOGMINER: session# = 1, preparer process P002 started with pid=36 OS id=10020 LSP2 started with pid=23, OS id=5043 Stop SQL Apply process Start SQL Apply process after skipping table
Standby Catching Up Tue Oct 16 15:14:48 2007 LOGMINER: Begin mining logfile: /oraarch01/BRHBETA/LOG_2048_1_629245032.arc LOGSTDBY Analyzer process P003 started with pid=13 OS id=10051 Tue Oct 16 15:14:48 2007 LOGMINER: Turning ON Log Auto Delete LOGSTDBY Apply process P004 started with pid=40 OS id=10054 LOGSTDBY Apply process P006 started with pid=42 OS id=10062 LOGSTDBY Apply process P007 started with pid=17 OS id=10064 LOGSTDBY Apply process P005 started with pid=15 OS id=10060 Tue Oct 16 15:22:02 2007 Beginning log switch checkpoint up to RBA [0x8.2.10], SCN: 8295181217591 Thread 1 advanced to log sequence 8 Current log# 4 seq# 8 mem# 0: /shared/oradata02/BRHBETA/redo04b.log Current log# 4 seq# 8 mem# 1: /shared/oralogs01/BRHBETA/redo04a.log Tue Oct 16 15:25:28 2007 Completed checkpoint up to RBA [0x8.2.10], SCN: 8295181217591 Tue Oct 16 15:34:32 2007 Incremental checkpoint up to RBA [0x8.4cbae.0], current log tail at RBA [0x8.65553.0] Tue Oct 16 15:42:40 2007 LOGMINER: End mining logfile: /oraarch01/BRHBETA/LOG_2048_1_629245032.arc Tue Oct 16 15:42:40 2007 LOGMINER: Begin mining logfile: /oraarch01/BRHBETA/LOG_2049_1_629245032.arc ... ... ... Processing redo logs
Standby Catching Up Tue Oct 16 17:20:48 2007 LOGMINER: End mining logfile: /oraarch01/BRHBETA/LOG_2049_1_629245032.arc Tue Oct 16 17:20:48 2007 LOGMINER: Begin mining logfile: /oraarch01/BRHBETA/LOG_2050_1_629245032.arc Tue Oct 16 17:20:54 2007 LOGMINER: End mining logfile: /oraarch01/BRHBETA/LOG_2050_1_629245032.arc ... ... ... Tue Oct 16 18:39:13 2007 LOGMINER: Log Auto Delete - deleting: /oraarch01/BRHBETA/LOG_2048_1_629245032.arc Deleted file /oraarch01/BRHBETA/LOG_2048_1_629245032.arc ... ... ... Tue Oct 16 18:43:40 2007 LOGMINER: Begin mining logfile: /oraarch01/BRHBETA/LOG_2082_1_629245032.arc Tue Oct 16 18:43:59 2007 LOGMINER: End mining logfile: /oraarch01/BRHBETA/LOG_2082_1_629245032.arc Tue Oct 16 18:43:59 2007 LOGMINER: Begin mining logfile: /oraarch01/BRHBETA/LOG_2083_1_629245032.arc Processing redo logs Deleting redo logs Processing redo logs
Standby Catching Up Tue Oct 16 18:44:01 2007 LOGMINER: Log Auto Delete - deleting: /oraarch01/BRHBETA/LOG_2056_1_629245032.arc Deleted file /oraarch01/BRHBETA/LOG_2056_1_629245032.arc Tue Oct 16 18:44:01 2007 LOGMINER: Log Auto Delete - deleting: /oraarch01/BRHBETA/LOG_2057_1_629245032.arc Deleted file /oraarch01/BRHBETA/LOG_2057_1_629245032.arc Tue Oct 16 18:44:01 2007 LOGMINER: Log Auto Delete - deleting: /oraarch01/BRHBETA/LOG_2058_1_629245032.arc Deleted file /oraarch01/BRHBETA/LOG_2058_1_629245032.arc ... … … Deleting redo logs
Standby Catching Up Standby is at 2087 Tue Oct 16 18:44:15 2007 LOGMINER: Begin mining logfile: /oraarch01/BRHBETA/LOG_2087_1_629245032.arc Tue Oct 16 18:48:37 2007 Completed checkpoint up to RBA [0xa.2.10], SCN: 8295181577382 Tue Oct 16 18:55:18 2007 Incremental checkpoint up to RBA [0xa.12dad.0], current log tail at RBA [0xa.1314b.0] Tue Oct 16 19:01:31 2007 RFS[1]: No standby redo logfiles created RFS[1]: Archived Log: '/oraarch01/BRHBETA/LOG_2153_1_629245032.arc' Tue Oct 16 19:01:32 2007 RFS LogMiner: Registered logfile [/oraarch01/BRHBETA/LOG_2153_1_629245032.arc] to LogMiner session id [1] Tue Oct 16 19:15:22 2007 Incremental checkpoint up to RBA [0xa.142b2.0], current log tail at RBA [0xa.143fe.0] Tue Oct 16 19:29:01 2007 LSP0: warning -- apply server 2, sid 384 waiting on user sid 196 for event (since 0 seconds): Tue Oct 16 19:29:01 2007 LOGMINER: End mining logfile: /oraarch01/BRHBETA/LOG_2087_1_629245032.arc ... … … Primary is at 2153
Standby Catching Up Unsupported DDL – Standby doesn’t execute Tue Oct 16 19:30:58 2007 LOGSTDBY stmt: CREATE PFILE = '/tmp/datatools/BRHBETA.PFILE.19144.1192413665' FROM SPFILE =… LOGSTDBY status: ORA-16226: DDL skipped due to lack of support LOGSTDBY id: XID 0x0003.02d.00013e70, hSCN 0x0782.a9c2fdb8, lSCN 0x0782.a9c2fdb8, Thread 1, RBA …LOGSTDBY stmt: create pfile='/orahome01/oradba/tmp/ora_adm_sqlbt_bkp.tmp1.17449.BRHBETA' from spfile …LOGSTDBY status: ORA-16226: DDL skipped due to lack of support LOGSTDBY id: XID 0x000b.001.000126cf, hSCN 0x0782.a9c2fe15, lSCN 0x0782.a9c2fe15, Thread 1, RBA… LOGSTDBY stmt: CREATE PFILE = '/tmp/datatools/BRHBETA.PFILE.19695.1192413687' FROM SPFILE = …LOGSTDBY status: ORA-16226: DDL skipped due to lack of support LOGSTDBY id: XID 0x0003.00c.00013e62, hSCN 0x0782.a9c2fe4a, lSCN 0x0782.a9c2fe4a, Thread 1, RBA… LOGSTDBY stmt: ALTER DATABASE BACKUP CONTROLFILE TO '/tmp/datatools/dtodump_…LOGSTDBY status: ORA-16226: DDL skipped due to lack of support LOGSTDBY id: XID 0x0009.007.00011453, hSCN 0x0782.a9c2feb4, lSCN 0x0782.a9c2feb4, Thread 1, RBA… Tue Oct 16 19:30:58 2007 ALTER TABLESPACE "SYSTEM" BEGIN BACKUP Completed: ALTER TABLESPACE "SYSTEM" BEGIN BACKUP Tue Oct 16 19:30:58 2007 ALTER TABLESPACE "SYSTEM" END BACKUP Completed: ALTER TABLESPACE "SYSTEM" END BACKUP ... … …
Standby Catching Up Standby catches up at 2158 Tue Oct 16 21:29:19 2007 LOGMINER: Begin mining logfile: /oraarch01/BRHBETA/LOG_2157_1_629245032.arc Tue Oct 16 21:30:03 2007 LOGMINER: End mining logfile: /oraarch01/BRHBETA/LOG_2157_1_629245032.arc Tue Oct 16 21:35:52 2007 Incremental checkpoint up to RBA [0xa.f41b7.0], current log tail at RBA [0xa.f41cc.0] Tue Oct 16 21:55:56 2007 Incremental checkpoint up to RBA [0xa.f43b5.0], current log tail at RBA [0xa.f43b5.0] Tue Oct 16 22:11:16 2007 RFS[1]: No standby redo logfiles created RFS[1]: Archived Log: '/oraarch01/BRHBETA/LOG_2158_1_629245032.arc' Tue Oct 16 22:11:16 2007 RFS LogMiner: Registered logfile [/oraarch01/BRHBETA/LOG_2158_1_629245032.arc] to LogMiner session id [1] Tue Oct 16 22:11:16 2007 LOGMINER: Begin mining logfile: /oraarch01/BRHBETA/LOG_2158_1_629245032.arc Tue Oct 16 22:11:20 2007 LOGMINER: End mining logfile: /oraarch01/BRHBETA/LOG_2158_1_629245032.arc
Archived Redo Logs • Logical Standby • After applied to standby • SQL apply process does delete them • Unlike physical standby • Possible disk space issues on standby • How long will you need to store redo logs? • If standby frozen all day • Weekends? Holidays? • If standby fails • How many days to fix failures?
Archived Redo Logs • How long are redo logs available on primary? • If not on disk when needed for standby • Recover from backup • Dataguard may not see these redo logs • Register redo logs • Logical standby • Also generates its own archived redo logs • Needed to recover standby db • Unique standby db objects?
SQL Apply Process Slow • Detect long-running transaction • Compute estimate of time to complete • Identify and skip problem table
Long Running Transaction • Standby Alert Log • SQL apply process applying redo log 2049 • Doesn’t move on within a few minutes • Current time is Tue Oct 16 08:09:55 2007 • Shows start time for this redo log • Has been processing for over 24 hours Mon Oct 15 05:52:29 2007 LOGMINER: Begin mining logfile: /oraarch01/BRHBETA/LOG_2049_1_629245032.arc
Long Running Transaction • What is apply process doing? • Check redo logs waiting to be applied • Where is processing in current redo log? • How long to complete current redo log?