300 likes | 521 Views
Data Guard Fast-Start Failover April 8, 2010. Justin Liu justin.liu@turner.com Turner Broadcasting System, Inc. Fast-Start Failover. A feature of Data Guard Oracle’s Disaster Recover (DR) solution Automate failover between primary db and standby db. Why Fast-Start Failover ?. Faster
E N D
Data Guard Fast-Start Failover April 8, 2010 Justin Liu justin.liu@turner.com Turner Broadcasting System, Inc.
Fast-Start Failover • A feature of Data Guard • Oracle’s Disaster Recover (DR) solution • Automate failover between primary db and standby db
Why Fast-Start Failover ? • Faster • Failover is automatic, no manual intervention • Reliable • Eliminates human error (switchover or failover) • Uses the correct procedure to minimize data loss • Simple • Automatically determines if failover criteria is met • Original primary database is automatically reinstated as a new standby database following failover
Fast-Start failover Requirements • Requirements • Require a Data Guard Broker configuration • Both Primary and Target Standbys are managed by the DG Broker. • Observer machine – DGMGRL utility is installed and Oracle Net connectivity to both primary and standby • Maximum Availability Mode – synchronous redo apply • Flashback database must be activated in both Primary and target standby
When is a Fast-Start Failover Triggered? • Database conditions: • Server crash or shutdown • Database instance failure (or last instance failure in a RAC configuration) • Shutdown abort (or shutdown abort of the last instance in a RAC configuration) • Datafiles offline due to I/O errors • Network conditions: • When both the Observer and the standby database lose their network connection to the primary database
FFO Setup: init.ora parameters – Primary db • *.local_listener='listener_demop1.turner.com • *.remote_listener='all_listeners_demop1.turner.com'*.db_file_name_convert='demos1','demop1'*.log_file_name_convert='demos1','demop1'*.db_unique_name='demop1'*.dg_broker_start=TRUE*.fal_client='demop1' # primary db unique id*.fal_server='demos1' # standby db unique id *.LOG_ARCHIVE_CONFIG='DG_CONFIG=(demop1,demos1)'*.LOG_ARCHIVE_DEST_1='LOCATION=/oracle/arc/demop1 VALID_FOR=(ALL_LOGFILES,ALL_ROLES) DB_UNIQUE_NAME=demop1'*.LOG_ARCHIVE_DEST_2='SERVICE=demos1 lgwr sync affirm VALID_FOR=(ONLINE_LOGFILES,PRIMARY_ROLE) DB_UNIQUE_NAME=demos1' Note: FAL - Fetch Archive Log
FFO Setup : init.ora parameters – Primary db • *.log_archive_dest_state_1='ENABLE'*.log_archive_dest_state_2='ENABLE'*.service_names='demop1'*.standby_archive_dest='/oracle/arc/demop1'*.standby_file_management='AUTO'*.dg_broker_config_file1='/oracle/admin/demop1/dgdat/dr1demop1.dat'*.dg_broker_config_file2='/oracle/admin/demop1/dgdat/dr2demop1.dat'*.db_recovery_file_dest_size=20G; • *.db_flashback_retention_target=720 # minutes -12 hours
FFO Setup : init.ora parameters – Standby db • local_listener='listener_demos1.turner.com'remote_listener='all_listeners_demop1.turner.com'dg_broker_start=TRUEservice_names='demos1'db_unique_name=demos1dg_broker_start=truedb_file_name_convert=('demop1','demos1')log_file_name_convert=('demop1','demos1')LOG_ARCHIVE_CONFIG='DG_CONFIG=(demop1,demos1)'LOG_ARCHIVE_DEST_1='LOCATION=/oracle/arc/demop1 VALID_FOR=(ALL_LOGFILES,ALL_ROLES) DB_UNIQUE_NAME=demos1'LOG_ARCHIVE_DEST_2='SERVICE=demop1 lgwr sync affirm VALID_FOR=(ONLINE_LOGFILES,PRIMARY_ROLE) DB_UNIQUE_NAME=demop1'
FFO Setup : init.ora parameters – Standby db 1 • LOG_ARCHIVE_DEST_STATE_1=ENABLELOG_ARCHIVE_DEST_STATE_2=ENABLESTANDBY_FILE_MANAGEMENT=AUTOSTANDBY_ARCHIVE_DEST='/oracle/arc/demop1'FAL_SERVER=demop1FAL_CLIENT=demos1*.dg_broker_config_file1='/oracle/admin/demop1/dgdat/dr1demos1.dat'*.dg_broker_config_file2='/oracle/admin/demop1/dgdat/dr2demos1.dat'*.db_recovery_file_dest_size=20G;*.db_flashback_retention_target=720
FFO Setup : Listener.ora on Primary demop1 • SID_LIST_LISTENER = (SID_LIST = (SID_DESC =(GLOBAL_DBNAME = demop1_DGMGRL.turner.com) (ORACLE_HOME = /oracle/product/database/10.2.0) (SID_NAME = demop1) ) (SID_DESC = (GLOBAL_DBNAME = demop1.turner.com) (ORACLE_HOME = /oracle/product/database/10.2.0) (SID_NAME = demop1) ) ) )LISTENER = (DESCRIPTION_LIST = (DESCRIPTION = (ADDRESS_LIST = (ADDRESS = (PROTOCOL = TCP)(HOST = demodb1)(PORT = 1530)) ) (ADDRESS = (PROTOCOL = IPC)(KEY = EXTPROC)) ) )Note: demop1_DGMGRL is to enable the broker to start the db when switchover
FFO Setup : Listener.ora on Standby demos1 • SID_LIST_LISTENER = (SID_LIST = (SID_DESC = (GLOBAL_DBNAME = demos1_DGMGRL.turner.com) (ORACLE_HOME = /oracle/product/database/10.2.0) (SID_NAME = demos1) ) (SID_DESC = (GLOBAL_DBNAME = demos1.turner.com) (ORACLE_HOME = /oracle/product/database/10.2.0) (SID_NAME = demos1) ) ) )LISTENER = (DESCRIPTION_LIST = (DESCRIPTION = (ADDRESS_LIST = (ADDRESS = (PROTOCOL = TCP)(HOST = demodb2)(PORT = 1530)) ) (ADDRESS = (PROTOCOL = IPC)(KEY = EXTPROC)) ) )
FFO Setup : tnsnames.ora on Primary demop1 • ALL_LISTENERS_demop1.turner.com= (ADDRESS_LIST = (ADDRESS = (PROTOCOL = TCP)(HOST = demodb1.turner.com)(PORT = 1530)) (ADDRESS = (PROTOCOL = TCP)(HOST = demodb2.turner.com)(PORT = 1530)) )LISTENER_demop1.TURNER.COM = (ADDRESS = (PROTOCOL = TCP)(HOST = demodb1.turner.com)(PORT = 1530))demop1.TURNER.COM = (DESCRIPTION = (ADDRESS_LIST = (ADDRESS = (PROTOCOL = TCP)(HOST =demodb1.turner.com)(PORT = 1530)) ) (CONNECT_DATA = (SERVICE_NAME = demop1.turner.com) • (instance_NAME = demop1) ) )demos1.TURNER.COM = (DESCRIPTION = (ADDRESS_LIST = (ADDRESS = (PROTOCOL = TCP)(HOST = demodb2.turner.com)(PORT = 1530)) ) (CONNECT_DATA = (SERVICE_NAME = demos1.turner.com) • (instance_NAME = demos1) ) )
FFO Setup : tnsnames.ora on Stamdby demos1 • ALL_LISTENERS_demop1.turner.com= (ADDRESS_LIST = (ADDRESS = (PROTOCOL = TCP)(HOST = demodb1.turner.com)(PORT = 1530)) (ADDRESS = (PROTOCOL = TCP)(HOST = demodb2.turner.com)(PORT = 1530)) )LISTENER_demos1.TURNER.COM = (ADDRESS = (PROTOCOL = TCP)(HOST = demodb2.turner.com)(PORT = 1530))demop1.TURNER.COM = (DESCRIPTION = (ADDRESS_LIST = (ADDRESS = (PROTOCOL = TCP)(HOST =demodb1.turner.com)(PORT = 1530)) ) (CONNECT_DATA = (SERVICE_NAME = demop1.turner.com) • (instance_NAME = demop1) ) )demos1.TURNER.COM = (DESCRIPTION = (ADDRESS_LIST = (ADDRESS = (PROTOCOL = TCP)(HOST = demodb2.turner.com)(PORT = 1530)) ) (CONNECT_DATA = (SERVICE_NAME = demos1.turner.com) • (instance_NAME = demos1) ) )
FFO Setup : Duplicate Standby DB using RMAN • At Primary Site • Use spfile with new dg parameters • SQL> startup mount pfile='/oracle/admin/demop1/pfile/initdemop1.ora';SQL> ALTER DATABASE ARCHIVELOG;SQL> alter database flashback on;SQL> alter database force logging;SQL> create spfile '/oracle/admin/demop1/spfile$SID.ora' from pfile='/oracle/admin/demop1/pfile/init$SID.ora';SQL> shutdown;SQL> startup • Add Standby logs – The total of standby logs = # of redo log +1 • SQl> ALTER DATABASE ADD STANDBY LOGFILE GROUP 4 ('/oracle/dbf/redo/demop1/srl01.log') SIZE 50M;SQL>ALTER DATABASE ADD STANDBY LOGFILE GROUP 5 ('/oracle/dbf/redo/demop1/srl02log') SIZE 50M;SQL>ALTER DATABASE ADD STANDBY LOGFILE GROUP 6 ('/oracle/dbf/redo/demop1/srl03log') SIZE 50M;
FFO Setup : Duplicate Standby Db using RMAN • At Primary site • 1) $rman • Rman> run { allocate channel c1 type disk format '/oracle/backup/demop1/%d_%p_%t.dbf'; backup database include current controlfile for standby; sql 'alter system archive log current'; } • 2) scp all rman backupsets to the standby host.
FFO Setup : Duplicate Standby Db using RMAN • At Standby Site • Startup standby db with spfile • SQL> startup nomount pfile='/oracle/admin/demos1/pfile/initSID.ora';SQL> ALTER DATABASE ARCHIVELOG;SQL> alter database flashback on;SQL> alter database force logging;SQL> create spfile '/oracle/admin/demos1/spfile$SID.ora' from pfile='/oracle/admin/demos1/pfile/init$SID.ora'; • SQL> shutdown immediate;SQL> startup nomount; • $rman • Rman> run{ duplicate target database for standby nofilenamecheck dorecover; } SQL> shutdown immediate;
FFO Setup : Configure DG Broker • $dgmgrl # dg broker command line utility • DGMGRL> connect sys@demop1--- use db_unique_name to configure DG BrokerDGMGRL> create configuration 'demop1_dg' as primary database is 'demop1' connect identifier is demop1;DGMGRL> add database 'demos1' as connect identifier is demos1 maintained as physical;DGMGRL> enable configuration;DGMGRL> edit database 'demop1' set property 'LogXptMode'=‘sync';DGMGRL> edit database 'demos1' set property 'LogXptMode'=‘sync';DGMGRL> edit database 'demop1' set property FastStartFailoverTarget='demos1';DGMGRL> edit database 'demos1' set property FastStartFailoverTarget='demop1';DGMGRL> edit configuration set protection mode as maxavailability; • DGMGRL> enable fast_start failover; # Observer process needs to be started.DGMGRL> edit configuration set property FastStartFailoverThreshold=60; • DGMGRL> show configuration;
FFO Setup : Start Observer • $ nohup dgmgrl sys/pwd@demop1 "start observer“ &
Demo: DG Broker • DGMGRL> show database verbose demop1 • Database Name: demop1 Role: PRIMARY Enabled: YES Intended State: ONLINE Instance(s): demop1 Properties: InitialConnectIdentifier = 'demop1.turner.com' LogXptMode = 'SYNC‘ .………… • FastStartFailoverTarget = 'demos1' HostName = 'demodb1' SidName = 'demop1' LocalListenerAddress = '(ADDRESS=(PROTOCOL=TCP)(HOST=demodb1)(PORT=1530))' LogArchiveFormat = 'demop1_%t_%r_%s.arc' • Current status for "demop1":SUCCESS
Demo: DG Broker • DGMGRL> show database verbose demos1 • Database Name: demos1 Role: PHYSICAL STANDBY Enabled: YES Intended State: ONLINE Instance(s): demos1 Properties: InitialConnectIdentifier = 'demos1.turner.com' LogXptMode = 'SYNC‘ .………… • FastStartFailoverTarget = 'demop1' HostName = 'demodb2' SidName = 'demos1' LocalListenerAddress = '(ADDRESS=(PROTOCOL=TCP)(HOST=demodb2)(PORT=1530))' LogArchiveFormat = 'demop1_%t_%r_%s.arc‘ • Current status for "demos1":SUCCESS
Connect Time Failover Setup – Create Service SQL> exec DBMS_SERVICE.CREATE_SERVICE ( service_name => ‘ECOM_TEST.TURNER.COM', network_name => ‘ECOM_TEST.TURNER.COM‘); SQL> exec DBMS_SERVICE.START_SERVICE(‘ECOM_TEST.TURNER.COM'); SQL> select name from v$active_services; NAME ------------------------------------------------ ECOM_TEST.TURNER.COM
Connect Time Failover Setup– Create a Trigger Configure startup trigger for service SQL> CREATE OR REPLACE TRIGGER set_svc after db_role_change or startup on database DECLARE role VARCHAR(30); BEGIN SELECT DATABASE_ROLE INTO role FROM V$DATABASE; IF role = 'PRIMARY' THEN DBMS_SERVICE.START_SERVICE(‘ECOM_TEST.turner.com'); ELSE DBMS_SERVICE.STOP_SERVICE(‘ECOM_TEST.turner.com'); END IF; END;
Connect Time Failover Setup – tnsname.ora Client TNS entry (OCI) Configuration • Ecom_test.turner.com = • (DESCRIPTION = • (SDU = 32767) • (ADDRESS = (PROTOCOL = TCP)(HOST = nrac8db1-vip.turner.com)(PORT = 1530)) • (ADDRESS = (PROTOCOL = TCP)(HOST = nrac8db2-vip.turner.com)(PORT = 1530)) • (ADDRESS = (PROTOCOL = TCP)(HOST = nrac8db3-vip.turner.com)(PORT = 1530)) • (ADDRESS = (PROTOCOL = TCP)(HOST = nrac8db4-vip.turner.com)(PORT = 1530)) • (CONNECT_DATA = • (SERVER = DEDICATED) • (SERVICE_NAME = ecom_test.turner.com) • (FAILOVER_MODE = • (TYPE = SELECT) • (METHOD = BASIC) • (RETRIES = 180) • (DELAY = 5) • ) • ) • )
Connect Time Failover Setup – Jdbc connection string Thin jdbc connection string jdbc:oracle:thin:@(DESCRIPTION= (ADDRESS=(PROTOCOL=TCP)(HOST=nrac8db1-vip.turner.com)(PORT=1530))(ADDRESS=(PROTOCOL=TCP)(HOST=nrac8db2-vip.turner.com)(PORT=1530))(ADDRESS=(PROTOCOL=TCP)(HOST=nrac8db3-vip.turner.com)(PORT=1530))(ADDRESS=(PROTOCOL=TCP)(HOST=nrac8db4-vip.turner.com)(PORT=1530))(CONNECT_DATA=(service_name=ecom_test.turner.com)))
Demo: Failover / Switchover 1. Configure Broker and Fast_Start Failover 2. Start Observer 3. Shutdown abort on the primary database (Failover) 4. Wait until Fast_Start failover occurs 5. Restart the old primary 6. Verify that DG reinstates old primary database
Demo: Switchover.. • DGMGRL> show configurationConfiguration Name: ecomtr_dg Enabled: YES Protection Mode: MaxAvailability Fast-Start Failover: ENABLED Databases: ecomtr - Primary database ecomtrs - Physical standby database - Fast-Start Failover targetCurrent status for "ecomtr_dg":SUCCESSDGMGRL>
Demo: Swtichover • DGMGRL> switchover to ecomtrs;Performing switchover NOW, please wait...Operation requires shutdown of instance "ecomt1" on database "ecomtr"Shutting down instance "ecomt1"...ORA-01109: database not openDatabase dismounted.ORACLE instance shut down.Operation requires shutdown of instance "ecomts1" on database "ecomtrs"Shutting down instance "ecomts1"...ORA-01109: database not openDatabase dismounted.ORACLE instance shut down.Operation requires startup of instance "ecomt1" on database "ecomtr"Starting instance "ecomt1"...ORACLE instance started.Database mounted.Operation requires startup of instance "ecomts1" on database "ecomtrs"Starting instance "ecomts1"...ORACLE instance started.Database mounted.Switchover succeeded, new primary is "ecomtrs"DGMGRL>
Demo: Switchover • DGMGRL> show configurationConfiguration Name: ecomtr_dg Enabled: YES Protection Mode: MaxAvailability Fast-Start Failover: ENABLED Databases: ecomtr - Physical standby database - Fast-Start Failover target ecomtrs - Primary databaseCurrent status for "ecomtr_dg":SUCCESS
Lessons and Learns • tcp timeout issue (thin jdbc vs fat jdbc (oci)) – • tcp_ip_abort_cinterval = 10000 (default is 180000) • tcp_keepalive_interval = 240000 (default is 7200000) • tcp_ip_abort_interval = 60000 (default is 480000) • or • sqlnet.outbound_connect_timeout=10 in sqlnet.ora if fat jdbc or oci driver is used. (oracle bug in 10.2.0.3) • Async redo apply in long distance via vpn connection • Vpn via internet might not be the good approach if standby is thousands miles away using async/sync redo apply • Bug in 10.2.0.3 – logical standby andsqlnet.outbound_connect_timeout • dgmgrl> show database primary_DB 'sendqentries'; # to check if both primary and standby are in sync.
Lessons and Learns • Observer location concern • Issue when observer is on the same network as primary • Issue when observer is on the same network as standby • Observer should reside on different network subnet • Need to know how to failover using sqlplus • Both alertSID.log and drSID.log should be monitored to make sure the data guard is functioning correctly • Cascade redo log destination [ID 409013.1 ] • Cascading logical standby databases from a logical standby database is not supported. • Cascading standby databases (logical or physical) from a primary database that is part of an Oracle Real Application Cluster (RAC) • a Data Guard Broker environment is not supported.