470 likes | 730 Views
Maximizing Oracle Real Application Clusters 11g: Sharing the Load with Services. James Harding - Database Manager Sabre Sherri Trojan - Database Architect HP October 2, 2009. Today. Tomorrow. Objective. Improve: Capacity & Scalability Operation Support
E N D
Maximizing Oracle Real Application Clusters 11g: Sharing the Load with Services James Harding - Database Manager Sabre Sherri Trojan - Database Architect HP October 2, 2009
Objective • Improve: • Capacity & Scalability • Operation Support • Performance Monitoring & Performance Management Features • Phase 1: • Connection Load Balance • Home grown connection pooling • Clients notify us that there is a problem. • Phase 2: • Runtime Load Balance • Oracle connection pooling • We notify the client that there is a problem.
Any sufficiently advanced technology is indistinguishable from magic. Arthur C. Clark
Services provide for: Performance Trace AWR Failover Load balance Monitor Event Load balance to server with least load
Sharing the Load with Services • Client side connection load balancing • Server side connection load balancing • Server side connection load balancing with Load Balancing advisor • Runtime Load balancing with Load balancing advisor • 4 Main Options:
Load Balance On Connection Client Side - TNSNAMES CLIENT1 = (DESCRIPTION = (LOAD_BALANCE = ON) # ON=Random Connections OFF=Sequential connections (FAILOVER = ON) # Activates failover when TCP timeout is reached (ADDRESS = (PROTOCOL = TCP)(HOST = raclinux1-vip)(PORT = 1521)) (ADDRESS = (PROTOCOL = TCP)(HOST = raclinux2-vip)(PORT = 1521)) (CONNECT_DATA = (SERVER = DEDICATED) (SERVICE_NAME = racdb) ) ) Random connections one listener is not overwhelmed
Server side connection load balancingBalances incoming connections across all of the instances that provide the requested database service.Load Balancing Advisor – Uses current service levels
On Connection – Server Side Load Balancing Advisor : ON or OFF Connection Load Balancing • Short - Use Load Balancing Advisory, when Load Balancing Advisory is enabled with Service Goals. • Long - Balances the number of connections per instance using session count per service. Service Goals • None • Service Time - Best response time – elapsed time per call metric. Example: Internet shopping • Throughput - Better throughput – rate that work is completed per second metric. Example: Batch
Adding a service Can be done with DBCA, OEM, SRVCTL • Add a service: srvctl add service -d RACDB1 -s SERV1 -r LINUX1 -a LINUX2 -P basic • -r = Preferred list • -a = available list (failover) • Start service: srvctl start service -d RACDB1 -s SERV1 • List services in OCR: srvctl config service -d RACDB1 -a • Create TNS Entry • Use DBMS_SERVICE.MODIFY_SERVICE to set goals and failover • Verify service in listener: lsnrctl status • List services in listener: lsnrctl services
Service with Load Balance Advisor From sqlplus, modify service for goals: BEGIN DBMS_SERVICE.MODIFY_SERVICE( service_name => 'racload' ,failover_method => DBMS_SERVICE.FAILOVER_METHOD_BASIC ,failover_type => DBMS_SERVICE.FAILOVER_TYPE_SESSION, ,failover_retries => 10, ,failover_delay => 1, ,goal => DBMS_SERVICE.GOAL_SERVICE_TIME ,clb_goal => DBMS_SERVICE.CLB_GOAL_SHORT ); END; /
Server Side Load Balancing • PMON process communicates with other listeners • PMON process registers database with database’s local listener • Cross-registers database with listeners on all other nodes in the cluster • PMON by default checks CPU every 3 seconds • PMON updates the listener with statistics. • By default, PMON updates TNSLSNR every 10 minutes. • Listener routes incoming connections to the least loaded instance. PMON Listener Listener Listener
Server Side Load Balancing How does PMON which listeners to register with? init.ora: REMOTE_LISTENERS=LISTENERS_RACDB PMON Listener Listener Listener TNSNAMES entry LISTENERS_RACDB = (ADDRESS_LIST = (ADDRESS = (PROTOCOL = TCP)(HOST = rac1-vip)(PORT = 1521)) (ADDRESS = (PROTOCOL = TCP)(HOST = rac2-vip)(PORT = 1521)) (ADDRESS = (PROTOCOL = TCP)(HOST = rac3-vip)(PORT = 1521))
Load Balancing Advisor processes Updated every 30s Has FAN events MMNL sys.sys$ Service_ metrics_tab MMON PMON Updated every few seconds - 10 minutes racgimon Fan Event Listener Listener Listener ONS $CRS_HOME/opmn/conf/ons.config AQ Fast connection failover OCI .NET Java JDBC MMNL - Memory Monitor Light MMON - Manageability Monitor ONS - Oracle Notification Service
LBSCORE – Surge Protection • Oracle 10.2 and higher: Listener uses a local value "lbscore“ • Lbscore : two dynamic values "goodness" and "delta" • Continuously updated by PMON. • PMON contacts the listener and provides starting values for goodness and delta... Listener lbscore = Goodness from PMON Listener Delta = Delta from PMON • To account for changes in load between PMON updates the listener will increment • the lbscore after each new incoming connection... • Listener lbscore = Listener lbscore(previous) + Listener Delta 2009-03-02 11:54:47.373624 : nsglb:instance:RACNY11 new lbscore:300 2009-03-02 11:54:48.422227 : nsglb:instance:RACNY21 new lbscore:100 2009-03-02 11:54:49.477567 : nsglb:instance:RACNY21 new lbscore:200 2009-03-02 11:54:50.536465 : nsglb:instance:RACNY21 new lbscore:300 2009-03-02 11:54:51.589902 : nsglb:instance:RACNY21 new lbscore:400 2009-03-02 11:54:52.642743 : nsglb:instance:RACNY11 new lbscore:400 2009-03-02 11:54:53.703777 : nsglb:instance:RACNY21 new lbscore:500
Update listener information • PMON updates the listener with instance load and dispatcher information • PMON SERVICE_UPDATE’s vary according to the workload of the instance • The maximum interval between these service updates is 10 minutes. Check Listener log. When busy, it updated every few seconds 05-MAR-2009 13:31:15 * service_update * RACNY11 * 0 05-MAR-2009 13:31:16 * service_update * RACTX22 * 0 05-MAR-2009 13:31:19 * service_update * RACTX21 * 0 05-MAR-2009 13:31:31 * service_update * RACTX22 * 0 05-MAR-2009 13:31:52 * service_update * RACNY12 * 0 Setting a Listener Trace:’ lsnrctl set trc_level 16’ produces a trace log that shows the Load Balance Advisory updates 2009-03-05 16:24:17.185722 : nsglgrDoRegister:inst loads: ld1:102 mld1:81920 ld2:17 mld2:6000 2009-03-05 16:24:17.185746 : nsglgrDoRegister:service:RACNY_STAGING what:4 value:1 2009-03-05 16:24:17.185763 : nsglgrDoRegister:service:RACNY_STAGING what:2 value:0 2009-03-05 16:24:17.185799 : nsglgrDoRegister:service:RACNY_RTG what:4 value:100 2009-03-05 16:24:17.185817 : nsglgrDoRegister:service:RACNY_RTG what:2 value:100 2009-03-05 16:24:17.186112 : nsglgrDoRegister:service:RACNY1 what:4 value:1 2009-03-05 16:24:17.186129 : nsglgrDoRegister:service:RACNY1 what:2 value:2 2009-03-05 16:24:17.186157 : nsglgrDoRegister:exit 2009-03-05 16:24:17.186193 : nsdo:entry What:4 = Delta What: 2 = Goodness ld1 = node load - load average ld2 = instance load - number user sessions mld1 = max load data/ max cpu capacity mld2 = max load data/sessions init.ora
Load Balancing Server side Load Balancing across clusters
Load Balancing Server side Load Balancing across clusters GOODNESS 1 Goodness: ranking for the quality of service that the service is experiencing at an instance level, including whether access has been restricted from an instance. Lower number is better
Server side Load Balancing across clusters Verify service in listener: lsnrctl status List services in listener: lsnrctl services
Bringing a multi-cluster online • Duplicate rac database • Start replication • Add services with goals • Add entry to tnsnames remote_listener entry: • Alter system set remote_listener='LISTENERS_RACDB1'; • $ lsnctl status should show all 4 instances for each oracle service Service "RACDB_MYAPP" has 4 instance(s). Instance "RACDB11", status READY, has 16 handler(s) for this service... Instance "RACDB12", status READY, has 16 handler(s) for this service... Instance "RACDB21", status READY, has 17 handler(s) for this service... Instance "RACDB22", status READY, has 16 handler(s) for this service... • Client should set a TCP connect timeout. Syntax varies by client type Example: SQLNET.OUTBOUND_CONNECT_TIMEOUT=4
Issue: Dispatcher blocked • Listener log: TNS-12520 errors • lsnrctl services: • Service "RACDB_AS" has 2 instance(s). Instance "RACDB1", status READY, has 11 handler(s) for this service... Handler(s): D009" established:15499 refused:0 current:972 max:972 state:blocked • Sqlplus: select count(*) from v$circuit does not show many sessions
Dispatcher Blocked • Reason: Dispatcher defaults to public host. • Check in: ‘network' column of v$dispatcher • 'lsnrctl services‘ • It's also described in metalink note: 578524.1 • Fix: • alter system set • dispatchers='(address=(protocol=tcp)(host=racdb-vip))(dispatchers=10)’ • (index=0) scope=both sid=‘RACDBxx'; • Info: • A listener will hand off a connection to a dispatcher if it is local. • If it is remote, it will send the client a new host/port and the client • will reconnect to the dispatcher. • The fix was to modify the dispatcher with a host vip entry. This specifies • the endpoint listener.
Load Balancing HOST LOAD COUNT RACNY RACTX
Swingbench with a slow ramp up of 250 sessions load racny11 racny12 ractx21 tactx22 cpu goodness
Runtime Load BalancingAllows Oracle Clients to provide intelligent allocations of connections based on the current service level provided by the database instances when applications request a connection to complete some work.
Interface • OCI - Oracle Call Interface. Used in C and C++ as well as other languages that are built on top of OCI layer such as perl and PHP • Thick JDBC - Build around OCI libraries to implement JCBC API standard for Java Language • Thin JDBC - Native JDBC. Pure JAVA Implementation that does not need underlying OCI layer. Does not support TAF. Does support Fast Connection Failover. • ODP.NET - Oracle Data Provider (ODP) for Microsoft’s .NET development platform. Based on OCI
Terms • Oracle Notification Service (ONS) • Communicates/interacts with components like the listener, application, etc • server-side ONS through ORA_CRS_HOME/opmn/conf/ons.config. Use racgons to add middle-tier node info to the OCR • No ONS daemon needed on the client when using Remote ONS Subscription • Oracle Clusterware and RAC utilize ONS to propagate FAN messages both within the RAC cluster, and to client or mid-tier machines • FAN - Fast Application Notification • quickly alert applications about configuration and workload service level changes • FCF - Fast Connection Failover. (do not use TAF if using FCF) • Supports both JDBC Thin and JDBC Oracle Call Interface (OCI) drivers. • Requires ONS • Subscribes to FAN events
Fast Application Notification • To use FAN callouts: • place an executable in the directory: CRS_home/racg/usrco on every node that runs CRS • Example • #! /bin/ksh • FAN_LOGFILE= [your path name]/admin/log/`hostname`_uptime.log • echo $* "reported="`date` >> $FAN_LOGFILE & • FAN events are published using ONS and Advanced Queuing. • The Connection Manager (CMAN) and Oracle Net Services Listeners are • integrated with FAN events. • Listener and CMAN immediately de-register services provided by a failed instance • ONS is used to propagate FAN messages. • Verify ONS is running: • srvctl status nodeapps -n {server} • $CRS_HOME/bin/onsctl ping • $CRS_HOME/bin/onsctl debug
Runtime Load Balancing • JAVA JDBC and JAVA thin • use ONS to subscribe to FAN events • Turning on Fast Connection Failover feature with JDBC includes Runtime Connection Load Balancing • OCI • Uses the System Alert Queue (aq) to subscribe to FAN events. • Runtime connection load balancing is enabled by default in a release 11.1 or higher client talking to server of 10.2 or higher. • The application must have been linked with the threads library. • The OCI environment must be created in OCI_EVENTS and OCI_THREADED mode • DBMS_SERVICE.MODIFY_SERVICE( service_name => 'MYSERVICE‘ ,aq_ha_notifications => true ,goal => DBMS_SERVICE.GOAL_SERVICE_TIME ,clb_goal => DBMS_SERVICE.CLB_GOAL_SHORT ); END; / • To view: DBA_OUTSTANDING_ALERTS or DBA_ALERT_HISTORY.
Runtime Load balancing • Balance connections across all of the Oracle RAC instances • when a service starts; this is preferable to directing the sessions that are defined • for the connection pool to the first Oracle RAC instance that supports the service • Remove terminated connections immediately when a service is declared • DOWN at an instance, and immediately when nodes are declared DOWN • Report errors to clients immediately when Oracle detects the • NOT RESTARTING state, instead of making the client wait while the service • repeatedly attempts to restart • Balance work requests at run time using load balancing advisory events
Transaction Application Failover - Transaction Application Failover can also be associated with a service using the dbms_service package when using the BASIC method TNSNAMES Entry STOCK = (DESCRIPTION = (ADDRESS = (PROTOCOL = TCP)(HOST = uspls330-vip)(PORT = 1521)) (ADDRESS = (PROTOCOL = TCP)(HOST = uspls331-vip)(PORT = 1521)) (LOAD_BALANCE = yes) (FAILOVER=on) (CONNECT_DATA = (SERVER = DEDICATED) (SERVICE_NAME = stock) (FAILOVER_MODE = (TYPE = SELECT) (METHOD = BASIC) (RETRIES = 180) (DELAY = 5) ) ) ) session: Fails over the session; Does not attempt to recover selects. select: Allows users with open cursors to continue fetching on them after failure. Involves overhead on the client side in normal select operations. none: This is the default, in which no failover functionality is used. METHOD: Specify how fast failover is to occur from the primary node to the backup node: basic: Establishes connections at failover time. This option requires almost no work on the backup database server until failover time. preconnect: Pre-establishes connections. This provides faster failover but requires that the backup instance be able to support all connections from every supported instance.
DBMS_SERVICE package SQL> BEGIN DBMS_SERVICE.MODIFY_SERVICE( service_name => ‘SERV1' ,goal => DBMS_SERVICE.GOAL_SERVICE_TIME ,clb_goal => DBMS_SERVICE.CLB_GOAL_SHORT ,failover_method => DBMS_SERVICE.FAILOVER_METHOD_BASIC ,failover_type => DBMS_SERVICE.FAILOVER_TYPE_SELECT ,failover_retries => 50 ,failover_delay => 5); END;
Transaction Application Failover • What To Expect After TAF Failover • TYPE=SESSION • If any uncommitted transaction existed in the session • ORA-25402: transaction must roll back • If no active transaction • SELECT statements will get ORA-25401: can not continue fetches • Even though error may be returned, session will be connected to • surviving instance and work may continue • TYPE=SELECT • Similar behavior to above, but SELECT statements will continue fetches without interruption if no transaction in progress
Monitoring Services Use OEM Database Maintenance page - Services – Cluster Management Database Services Set Thresholds per service on elapsed or cpu Identify notification of alerts via Notification Rules: Service CPU Time (per user call) (microseconds) Service Response Time (per user call) (microseconds)
Tracing Services Start: execute dbms_monitor.serv_mod_act_trace_enable( - service_name => 'SERV1', - module_name => 'xyz', - (default is all modules) waits => TRUE, - binds => FALSE); select * from dba_enabled_traces; Disable: execute dbms_monitor.serv_mod_act_trace_disable( - service_name => 'SERV1', - module_name => 'xyz'); select * from dba_enabled_traces; To set the module: BEGIN DBMS_APPLICATION_INFO.set_module(module_name => 'add_order', action_name => 'insert into orders'); END;
Stop a Service Stop a service srvctl stop service -d RACDB -s SERV1 Stop a service and force off connections srvctl stop service -d RACDB -s SERV1 -i RACDB1 -f Disconnect all sessions for a service: DBMS_SERVICE.DISCONNECT_SESSION(‘SERV1'); To gracefully take a service offline: Stop the service on an instance to prevent further connections. At a predefined time, disconnect the remaining sessions. srvctl stop service -d RACDB -s SERV1 DBMS_SERVICE.DISCONNECT_SESSION(‘SERV1'); App App App App App App App App
While at Oracle OpenWorld…. Visit HP in Moscone South, Booth #1301 Assess your IT environment – for a chance to win an HP Mini Notebook and donate to Feeding America! Take the TCO Challenge Modernization Impact Quiz Storage and Server Assessment Adaptive Infrastructure Maturity Model ERP Optimization Assessment 47 14 August 2014