RAC 11gR2 performance with OEM 12c and AWR data. Interconnect Troubleshooting: lessons learned from a critical production system. Session UGF9804. Tim Quinlan, Scotiabank, Toronto, Ontario, Canada
Introduction • Discuss networking issues in a virtualized environment • New RAC implementation of a critical system • Review in detail all of the tools used to resolve problems. • Interconnect issues • Including those in a virtualized environment • Quick review of global buffer flows and waits • Tracing & logging • oswatcher, sniffer traces and more • Standard reporting provided by Oracle • OEM 12c • AWR reports • ADDM • Customized reporting using AWR data
Background: The Situation • Major upgrade to a critical non-RAC system • Volume and size increase of over 50% • System had reached limits on vertical scalability • Performance issues when batch jobs ran into the online window • Very high availability requirements • RAC was chosen for availability and horizontal scaling • RAC was new to our company and to the app vendor • Oracle corp. was involved early as infrastructure consultant • The environment was a virtualized one
Background: The Situation Virtualized environment • Hardware: AIX ran on an LPAR using virtualized I/O • Servers were virtual/logical images • the os and hardware type are not critical to the concepts and tools covered in this session. • virtual I/O server shared by all LPARs on the physical server including all public network traffic and physical I/O. • Virtualized network • “virtual private” interconnect. • not a physically separate and private interconnect • only private interconnect traffic would go through it. • Physical switches used for interconnect were to be “quiet” • Shared storage was used
Background: The Situation Virtualized environment (cont.) • We had concerns about the above. • Shared SAN performance could change at any time for some outside activity (e.g. another app) • Oracle reiterated that best practice was to use a physically separate private interconnect. • This was well understood, but a very fast, high-performing network was promised for the interconnect. • Virtual I/O on the server was already being used in production, so not considered to be a big problem at the outset.
What Happened - and how did we deal with it? Node evictions • Started having regular node evictions - no apparent reason • happened even when we were not running anything • Switches: quiet & performing well according to network group • Servers were not busy • Virtual I/O servers & other LPARs on server not busy • we (the DBAs) could not easily verify this • RAC was thought to be the problem • Sys admins, network group and management all had the impression node evictions were due to a RAC bug • The project was in jeopardy • Up to the DBAs to figure things out in this virtual environment
What Happened - and how did we deal with it? What to do next? • Oracle Corp.'s RAC centre of excellence in Orlando performed a full RAC assessment • re-checked hardware, OS and software settings and parms • All was fine • Got network group and sys admins more involved • More monitoring was turned on • oswatcher: traceroute every 20 secs; vmstat, iostat, netstat… • procwatcher: monitor DB and G.I. processes on the server • network sniffer: detailed monitoring of packets • OEM 12c • AWR, ADDM and ASH reports • AWR data: our home-built reports
What Happened -and how did we deal with it? • DB and Grid Infrastructure (GI) logs showed us • a node eviction occurred • when it occurred (important for inspecting other logs) • action taken by the GI and DB software • They did not show us: • why the node eviction occurred • All fingers pointing to a RAC problem • Let’s take a look at the logs
Problem analysis – ocssd log No network heartbeat and we see the node eviction
Problem analysis – crsd log OCR master change; vip & service from evicted node failover
Problem analysis – DB alert log We see the instance shutting down and restarting on the evicted node
Problem analysis – oswatcher and procwatcher • Turned on oswatcher and procwatcher • procwatcher: provides information on DB and GI • background processes that were running • did not help us much in this situation • oswatcher • within it is a traceroute (like ping) • traceroute turned on to run every 20 seconds • normally should return: “0 ms 0 ms 0 ms” • before a node eviction we would see “* * *” • - the node could not be pinged • - we had a networking issue
Problem analysis – oswatcher We see the break in communication by * * * below.
Problem analysis – digging deeper • There were still questions whether the lack of communication could have been caused by the RAC software • We shut down the DB and Grid Infrastructure software and simply ran traceroutes • with oswatcher • turned off monitoring (including oswatcher) and ran native traceroute • Periodically still had problems • proved this was a server or network issue • Time for a sniffer trace on the private interconnect
Problem analysis – network sniffer DRDA packets on our private interconnect?
Problem Analysis - summary • Better network monitoring was needed • Could not rely on oswatcher, procwatcher or network sniffers on a continual basis in production • reviewing traces was very time consuming. • almost too much information: easy to miss things • one sniffer trace had a 10 second gap • no error message. Only traceroute, ping or close reading of sniffer trace can find this. • Start with tools provided to us • OEM 12c (12.1.0.2 cloud control) • AWR standard reports including rac report • ADDM • ASH (through oem) • Add customized reporting
Flow: to get a block not cached locally • A gc block request is sent • If the requesting instance is the block's resource master, it's 2-way: a "direct request" is made to the other node (delay) • If the requester is not the master, it's 3-way: a small message is sent to the LMS at the resource master (delay); the msg is processed there & another msg is sent (next bullet) • A small message is sent to the node with the block. Incurs transmission and delay time of the process msg & block • If the block exists at the other node & updates are not yet logged: LMS gets the lgwr to flush rcvy info to redo (delay) • If "curr", the block is pinned and if "cr" it is a build (delay) • The block is shipped back: creates transmission delay • The requesting instance gets the block. Shows up as gc current block [2/3]-way. Wait event is complete!
What if the block is not cached anywhere? Not at local or any remote node • Resource master sends a “grant” to the requesting node to perform a disk read. • If the requesting node is the resource master • This happens immediately. • If the requesting node is not the master • The grant is sent as a small message. • This is always 2-way • Once the grant is received a disk read is performed.
There are a lot of possible waits Here are some of them: • gc curr/cr request: placeholder wait during a cr block or curr buffer rqst. • gc curr/cr block 2/3 way: block received immediate. Not congested/busy • gc curr/cr block busy: request received by LMS but send was delayed • gc curr/cr grant 2-way: grant is given without delay. • gc curr grant busy: blocked due to others ahead of it or other delay. • gc curr/cr block/grant congested: request spent > 1ms in internal queues. • gc curr/cr failure/retry: not received due to a failure • gc buffer busy: buffer arrival time < buffer pin time. • log flush: blocks requested were changed but not yet logged • message sent indirect: messages pooled/queued and sent. • gc blocks lost: could be a bug, bad os patch level, network issue. • ge waits: serialize access to a resource needed by > 1 instance • cr build: time to build the cr block with undo on holding instance • curr pin: wait at holding node while block being prepared to ship • dynamic resource mastering (drm): several events. • occurs due to heavy activity on blocks at a different node.
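A quick way to see which of these global cache waits are active at any moment is to query gv$session across the cluster. A minimal sketch, not from the session itself; the filter and grouping are illustrative:

```
-- Hedged sketch: sessions currently waiting on global cache (gc) events,
-- counted per instance. Adjust the LIKE pattern to narrow to specific waits.
select inst_id, event, count(*) as waiting_sessions
  from gv$session
 where state = 'WAITING'
   and event like 'gc%'
 group by inst_id, event
 order by waiting_sessions desc;
```

A sustained high count on busy/congested variants (rather than 2/3-way) is the usual cue to dig into the interconnect or LMS load.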
OEM 12c Useful for a quick view of the RAC Performance home for the database. Aim for < 1ms cache block access latency
OEM 12c Graphical chart (1st of 2 display formats for cluster cache coherency)
OEM 12c Tabular chart (2nd of 2 display formats for cluster cache coherency)
OEM 12c • Below is what cluster performance should look like • Latency < 1ms
OEM 12c Instance -> Performance -> ASH analytics • Can get more interconnect details. • Example below is Dynamic Resource Mastering
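Outside OEM, the same DRM activity can be spotted in the ASH history views. A hedged sketch; the event-name patterns are assumptions, since exact DRM wait-event names vary by version:

```
-- Hedged sketch: count ASH samples for remastering-related wait events
-- over the last hour. Patterns are illustrative, not exhaustive.
select event, count(*) as samples
  from dba_hist_active_sess_history
 where sample_time > systimestamp - interval '1' hour
   and (event like 'gcs drm%' or event like '%remaster%')
 group by event
 order by samples desc;
```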
Non-OEM reporting Provided out-of-the-box • awr rac global cluster-wide stats report: awrgrpt.sql • For the DB: all instances in cluster. Can be run on any instance • Useful to see that a problem did occur • BUT: difficult to spot trends and causes (we have a solution) • Includes several worthwhile sections • SysStat and Global Messaging – RAC • Global Cache efficiency • Global cache and enqueue workload and messaging • cr and current blocks served stats • Global cache transfer stats • Interconnect ping, client and device stats • Dynamic remastering stats • sql ordered by cluster wait time (global)
awrgrpt report messaging sections Failures & lost blocks; volumes of blocks rcvd/served & msgs sent/rcvd (direct & indirect); cpu seconds also “by trx”
awrgrpt report messaging sections Compare the local % to remote % & disk % Look for trends & low cr, cu & enqueue ms
awrgrpt report messaging sections Should have low times < 1ms Look for failures, errors, flush queue issues & disk reads. Also for trends or very high numbers
awrgrpt report messaging sections Look for high pct of long running pins, flushes & writes.
awrgrpt report messaging sections Can be used for trend analysis and changes in trends. Look for high pct immediate & low cr and curr times in ms
awrgrpt report messaging sections • Check for lost blocks and ratio of 2-hop to 3-hop transfers • cr and curr times should be low (< 1 ms)
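Lost blocks can also be checked live, outside the report. A minimal sketch against gv$sysstat; note the statistic values are cumulative since instance startup, so it is the deltas over time that matter:

```
-- Hedged sketch: cumulative lost/corrupt global cache blocks per instance.
-- Non-zero, growing values point at a network issue, OS patch level or bug.
select inst_id, name, value
  from gv$sysstat
 where name in ('gc blocks lost', 'gc blocks corrupt')
 order by inst_id, name;
```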
awrgrpt report messaging sections • List of private interconnect names and ip addresses
awrgrpt report messaging sections Good for problem solving. Ping times for 500 bytes & 8k with averages and std dev's. - Use of an interconnect device by an instance. - Look at volumes & ratios of all sent/rcvd: mb (cache, ipq, dlm, ping, misc) - Not as useful for performance due to lack of timings
awrgrpt report messaging sections • Same as previous report except this puts things in perspective by second. • Good for capacity analysis
awrgrpt report messaging sections There's an error in the query below. Predicates are on snap_id, dbid and instance_number only; they should also include if_name, ip_addr, net_mask, flags and mtu, so rows are incorrectly repeated 4 times. The correct interconnect device stats query is shown later. Also "by second": drops, overrun, lost, error
awrgrpt report messaging sections Dynamic Remastering Occurs when blocks mastered at 1 node are used heavily at another node. • Nbr. of remastering ops. in a period of time • Counts of objects involved in the remastering by type • secs per op and sync secs per op. by type • Can also see the locks sent and received per operation • Difficult to spot trends unless you look at multiple reports
awrgrpt report messaging sections • sql incurring high cluster wait times: compare to elapsed(s)
awrgrpt report messaging sections • All messaging stats: total and per sec • only a subset are shown here
Standard awrrpt report cluster info • Run on each instance • An instance view of stats that awrgrpt showed for the DB. • Includes similar sections • Global Cache load profile • Global Cache efficiency • Global cache and enqueue workload and messaging • Cluster interconnect • Global messaging statistics • Cr and current blocks served stats • Global cache transfer stats & times (including immediate stats) • Interconnect ping, client and device stats • Dynamic remastering stats • sql ordered by cluster wait time • See “Appendix A” for a sample of these sections
ADDM reporting • ADDM will report on cluster interconnect performance
or … ADDM for RAC in OEM • Cluster (database) home page • Choosing the ADDM findings count brings you to ADDM for the cluster
Augmenting Standard Reporting • It is difficult to spot problems and trends using the above reports • Goal is to capture awr data in a spreadsheet format • Generate quickly as needed • Get performance, capacity, problems and trends over time • Do busy times or errors occur at different times of day? • Look at trends with different degrees of granularity. • months, weeks, days, hours, minutes (e.g. 10 minutes) • When did we start getting errors? • Are volumes increasing? • Will we soon have a capacity issue? • Helps to: • Spot and resolve problems • Avoid problems: trend analysis stats help capacity planning
Augmenting Standard Reporting • Want to run these in an automated manner. For example: SQL>@awricdevicestats 2013-02-06-08:00:00 2013-02-06-20:00 • Run with start and end times that coincide with snapshot intervals. • Can generate many rows in a spreadsheet (.csv) format in a smaller time interval to spot problems. • Below creates 6 files (rows) that can be concatenated into a single spreadsheet for analysis purposes for a single hour. SQL>@awricdevicestats 2013-02-06-08:00:00 2013-02-06-08:10 SQL>@awricdevicestats 2013-02-06-08:10:00 2013-02-06-08:20 SQL>@awricdevicestats 2013-02-06-08:20:00 2013-02-06-08:30 SQL>@awricdevicestats 2013-02-06-08:30:00 2013-02-06-08:40 SQL>@awricdevicestats 2013-02-06-08:40:00 2013-02-06-08:50 SQL>@awricdevicestats 2013-02-06-08:50:00 2013-02-06-09:00
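The wrapper's core job is translating wall-clock times into snap_ids. One possible sketch of that lookup (the substitution variable names &start_time and &end_time are illustrative; the actual script is in the white paper's Appendix C):

```
-- Hedged sketch: translate a wall-clock window into begin/end AWR snap_ids.
-- Times must align with snapshot boundaries for the report to be meaningful.
select min(snap_id) as begin_snap, max(snap_id) as end_snap
  from dba_hist_snapshot
 where dbid = (select dbid from v$database)
   and end_interval_time   >= to_timestamp('&start_time', 'YYYY-MM-DD-HH24:MI:SS')
   and begin_interval_time <= to_timestamp('&end_time',   'YYYY-MM-DD-HH24:MI:SS');
```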
Augmenting Standard Reporting • Wrapper script to take time as input and translate to snap_id’s is in the white paper in “Appendix C” • Build report name • Gets snap_id’s by date and time
Augmenting Standard Reporting • Snap_id’s and variables are set. • Get the report name • Run the report
The queries to evaluate interconnect problems, performance and capacity trends • 13 queries are shown that can produce trending spreadsheets • We use the same queries that are used in the awrgrpt.sql and awrrpt.sql • Data is written in .csv (spreadsheet) format. • Could be inserted into tables if you like • Timeframes used must be awr snapshot boundaries • I keep 6 months of data in 10 minute intervals in awr. • The queries, reports and how they are used are shown next
The Queries: 1. Interconnect device statistics Table: dba_hist_ic_device_stats Useful for • Send/receive errors • Frame errors • Buffer overruns • Trends of buffer packets sent and received • Problem solving • Trend analysis • Use small time intervals for problem solving
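As noted earlier, the stock awrgrpt.sql device-stats query joins on snap_id, dbid and instance_number only and repeats rows; a corrected pull must carry the device identity (if_name etc.) through as well. A hedged sketch, with only identity columns selected since the error/overrun column names vary by version:

```
-- Hedged sketch: per-snapshot interconnect device rows for a snap window,
-- keyed on (snap_id, dbid, instance_number, if_name) so rows are not repeated.
-- &begin_snap / &end_snap are illustrative substitution variables.
select s.end_interval_time, d.instance_number, d.if_name, d.ip_addr, d.mtu
  from dba_hist_ic_device_stats d
  join dba_hist_snapshot s
    on s.snap_id         = d.snap_id
   and s.dbid            = d.dbid
   and s.instance_number = d.instance_number
 where d.snap_id between &begin_snap and &end_snap
 order by d.snap_id, d.instance_number, d.if_name;
```

Spooled in .csv format over small intervals, this is the trending spreadsheet described above.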
The Queries: 2. Dynamic remastering stats & ops Table: dba_hist_dyn_remaster_stats Use to find out: • Remaster op count • When they occur • Objects involved • Secs per op • Sync secs per op • Locks sent and received per op • Trend analysis
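A sketch of the remastering pull; the column list is an assumption based on the 11gR2 view, and most values are cumulative, so trend the deltas between snapshots rather than the raw numbers:

```
-- Hedged sketch: dynamic remastering activity per snapshot and instance.
select snap_id, instance_number,
       remaster_ops, remastered_objects,
       replayed_locks_sent, replayed_locks_received
  from dba_hist_dyn_remaster_stats
 where snap_id between &begin_snap and &end_snap
 order by snap_id, instance_number;
```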
The Queries: 2. Dynamic remastering stats & ops By instance