RAC 11gR2 performance with OEM 12c and AWR data. Interconnect Troubleshooting: lessons learned from a critical production system. Session UGF9804. Tim Quinlan, Scotiabank, Toronto, Ontario, Canada
Introduction • Discuss networking issues in a virtualized environment • New RAC implementation of a critical system • Review in detail all of the tools used to resolve problems. • Interconnect issues • Including those in a virtualized environment • Quick review of global buffer flows and waits • Tracing & logging • oswatcher, sniffer traces and more • Standard reporting provided by Oracle • OEM 12c • AWR reports • ADDM • Customized reporting using AWR data
Background: The Situation • Major upgrade to a critical non-RAC system • Volume and size increase of over 50% • System had reached limits on vertical scalability • Performance issues when batch jobs ran into the online window • Very high availability requirements • RAC was chosen for availability and horizontal scaling • RAC was new to our company and to the app vendor • Oracle corp. was involved early as infrastructure consultant • The environment was a virtualized one
Background: The Situation Virtualized environment • Hardware: AIX ran on an LPAR using virtualized I/O • Servers were virtual/logical images • the os and hardware type are not critical to the concepts and tools covered in this session. • virtual I/O server shared by all LPARs on the physical server including all public network traffic and physical I/O. • Virtualized network • “virtual private” interconnect. • not a physically separate and private interconnect • only private interconnect traffic would go through it. • Physical switches used for interconnect were to be “quiet” • Shared storage was used
Background: The Situation Virtualized environment (cont.) • We had concerns about the above. • Shared SAN performance could change at any time for some outside activity (e.g. another app) • Oracle reiterated that best practice was to use a physically separate private interconnect. • This was well understood, but a very fast, high-performing network was promised for the interconnect. • Virtual I/O on the server was already being used in production, so not considered to be a big problem at the outset.
What Happened - and how did we deal with it? Node evictions • Started having regular node evictions - no apparent reason • happened even when we were not running anything • Switches: quiet & performing well according to network group • Servers were not busy • Virtual I/O servers & other LPARs on server not busy • we (the DBAs) could not easily verify this • RAC was thought to be the problem • Sys admins, network group and management all had the impression node evictions were due to a RAC bug • The project was in jeopardy • Up to the DBAs to figure things out in this virtual environment
What Happened - and how did we deal with it? What to do next? • Oracle Corp.'s RAC centre of excellence in Orlando performed a full RAC assessment • re-checked hardware, OS and software settings and parms • All was fine • Got network group and sys admins more involved • More monitoring was turned on • oswatcher: traceroute every 20 secs; vmstat, iostat, netstat… • procwatcher: monitor DB and G.I. processes on the server • network sniffer: detailed monitoring of packets • OEM 12c • AWR, ADDM and ASH reports • AWR data: our home-built reports
What Happened -and how did we deal with it? • DB and Grid Infrastructure (GI) logs showed us • a node eviction occurred • when it occurred (important for inspecting other logs) • action taken by the GI and DB software • They did not show us: • why the node eviction occurred • All fingers pointing to a RAC problem • Let’s take a look at the logs
Problem analysis – ocssd log No network heartbeat and we see the node eviction
Problem analysis – crsd log OCR master change; vip & service from evicted node failover
Problem analysis – DB alert log We see the instance shutting down and restarting on the evicted node
Problem analysis – oswatcher and procwatcher • Turned on oswatcher and procwatcher • procwatcher: provides information on DB and GI • background processes that were running • did not help us much in this situation • oswatcher • within it is a traceroute (like ping) • traceroute turned on to run every 20 seconds • normally should return: “0 ms 0 ms 0 ms” • before a node eviction we would see “* * *” • - the node could not be pinged • - we had a networking issue
Problem analysis – oswatcher We see the break in communication by * * * below.
Problem analysis – digging deeper • There were still questions whether the lack of communication could have been caused by the RAC software • We shut down the DB and Grid Infrastructure software and simply ran traceroutes • with oswatcher • turned off monitoring (including oswatcher) and ran native traceroute • Periodically still had problems • proved this was a server or network issue • Time for a sniffer trace on the private interconnect
Problem analysis – network sniffer DRDA packets on our private interconnect?
Problem Analysis - summary • Better network monitoring was needed • Could not rely on oswatcher, procwatcher or network sniffers on a continual basis in production • reviewing traces was very time consuming. • almost too much information: easy to miss things • one sniffer trace had a 10 second gap • no error message. Only traceroute, ping or close reading of sniffer trace can find this. • Start with tools provided to us • OEM 12c (12.1.0.2 cloud control) • AWR standard reports including rac report • ADDM • ASH (through oem) • Add customized reporting
Flow: to get a block not cached locally • A gc block request is sent • If the requesting instance is the block's resource master, it's 2-way: a "direct request" is made to the other node (delay) • If the requester is not the master, it's 3-way: a small message is sent to the LMS at the resource master (delay); the msg is processed there & another msg is sent (next bullet) • A small message is sent to the node with the block. Incurs transmission and delay time of the process msg & block • If the block exists at the other node & updates are not yet logged: LMS gets the lgwr to flush rcvy info to redo (delay) • If "curr", the block is pinned and if "cr" it is a build (delay) • The block is shipped back: creates transmission delay • The requesting instance gets the block. Shows up as gc current block [2/3]-way. Wait event is complete!
What if the block is not cached anywhere? Not at local or any remote node • Resource master sends a “grant” to the requesting node to perform a disk read. • If the requesting node is the resource master • This happens immediately. • If the requesting node is not the master • The grant is sent as a small message. • This is always 2-way • Once the grant is received a disk read is performed.
There are a lot of possible waits Here are some of them: • gc curr/cr request: placeholder wait during a cr block or curr buffer rqst. • gc curr/cr block 2/3 way: block received immediate. Not congested/busy • gc curr/cr block busy: request received by LMS but send was delayed • gc curr/cr grant 2-way: grant is given without delay. • gc curr grant busy: blocked due to others ahead of it or other delay. • gc curr/cr block/grant congested: request spent > 1ms in internal queues. • gc curr/cr failure/retry: not received due to a failure • gc buffer busy: buffer arrival time < buffer pin time. • log flush: blocks requested were changed but not yet logged • message sent indirect: messages pooled/queued and sent. • gc blocks lost: could be a bug, bad os patch level, network issue. • ge waits: serialize access to a resource needed by > 1 instance • cr build: time to build the cr block with undo on holding instance • curr pin: wait at holding node while block being prepared to ship • dynamic resource mastering (drm): several events. • occurs due to heavy activity on blocks at a different node.
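A quick way to see which of these global cache waits are active at any moment is to query gv$session across the cluster. A minimal sketch, not from the session itself; the filter and grouping are illustrative:

```
-- Hedged sketch: sessions currently waiting on global cache (gc) events,
-- counted per instance. Adjust the LIKE pattern to narrow to specific waits.
select inst_id, event, count(*) as waiting_sessions
  from gv$session
 where state = 'WAITING'
   and event like 'gc%'
 group by inst_id, event
 order by waiting_sessions desc;
```

A sustained high count on busy/congested variants (rather than 2/3-way) is the usual cue to dig into the interconnect or LMS load.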
OEM 12c Useful for a quick view of the RAC Performance home for the database. Aim for < 1ms cache block access latency
OEM 12c Graphical chart (1st of 2 display formats for cluster cache coherency)
OEM 12c Tabular chart (2nd of 2 display formats for cluster cache coherency)
OEM 12c • Below is what cluster performance should look like • Latency < 1ms
OEM 12c Instance -> Performance -> ASH analytics • Can get more interconnect details. • Example below is Dynamic Resource Mastering
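Outside OEM, the same DRM activity can be spotted in the ASH history views. A hedged sketch; the event-name patterns are assumptions, since exact DRM wait-event names vary by version:

```
-- Hedged sketch: count ASH samples for remastering-related wait events
-- over the last hour. Patterns are illustrative, not exhaustive.
select event, count(*) as samples
  from dba_hist_active_sess_history
 where sample_time > systimestamp - interval '1' hour
   and (event like 'gcs drm%' or event like '%remaster%')
 group by event
 order by samples desc;
```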
Non-OEM reporting Provided out-of-the-box • awr rac global cluster-wide stats report: awrgrpt.sql • For the DB: all instances in cluster. Can be run on any instance • Useful to see that a problem did occur • BUT: difficult to spot trends and causes (we have a solution) • Includes several worthwhile sections • SysStat and Global Messaging – RAC • Global Cache efficiency • Global cache and enqueue workload and messaging • cr and current blocks served stats • Global cache transfer stats • Interconnect ping, client and device stats • Dynamic remastering stats • sql ordered by cluster wait time (global)
awrgrpt report messaging sections Failures & lost blocks; volumes of blocks rcvd/served & msgs sent/rcvd (direct & indirect); cpu seconds also “by trx”
awrgrpt report messaging sections Compare the local % to remote % & disk % Look for trends & low cr, cu & enqueue ms
awrgrpt report messaging sections Should have low times < 1ms Look for failures, errors, flush queue issues & disk reads. Also for trends or very high numbers
awrgrpt report messaging sections Look for high pct of long running pins, flushes & writes.
awrgrpt report messaging sections Can be used for trend analysis and changes in trends. Look for high pct immediate & low cr and curr times in ms
awrgrpt report messaging sections • Check for lost blocks and ratio of 2-hop to 3-hop transfers • cr and curr times should be low (< 1 ms)
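Lost blocks can also be checked live, outside the report. A minimal sketch against gv$sysstat; note the statistic values are cumulative since instance startup, so it is the deltas over time that matter:

```
-- Hedged sketch: cumulative lost/corrupt global cache blocks per instance.
-- Non-zero, growing values point at a network issue, OS patch level or bug.
select inst_id, name, value
  from gv$sysstat
 where name in ('gc blocks lost', 'gc blocks corrupt')
 order by inst_id, name;
```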
awrgrpt report messaging sections • List of private interconnect names and ip addresses
awrgrpt report messaging sections Good for problem solving. Ping times for 500 bytes & 8k with averages and std dev's. - Use of an interconnect device by an instance. - Look at volumes & ratios of all sent/rcvd: mb (cache, ipq, dlm, ping, misc) - Not as useful for performance due to lack of timings
awrgrpt report messaging sections • Same as previous report except this puts things in perspective by second. • Good for capacity analysis
awrgrpt report messaging sections There's an error in the query below. Predicates are on snap_id, dbid and instance_number only; they should also include if_name, ip_addr, net_mask, flags and mtu, so rows are incorrectly repeated 4 times. The correct interconnect device stats query is shown later. Also "by second": drops, overrun, lost, error
awrgrpt report messaging sections Dynamic Remastering Occurs when blocks mastered at 1 node are used heavily at another node. • Nbr. of remastering ops. in a period of time • Counts of objects involved in the remastering by type • secs per op and sync secs per op. by type • Can also see the locks sent and received per operation • Difficult to spot trends unless you look at multiple reports
awrgrpt report messaging sections • sql incurring high cluster wait times: compare to elapsed(s)
awrgrpt report messaging sections • All messaging stats: total and per sec • only a subset are shown here
Standard awrrpt report cluster info • Run on each instance • An instance view of stats that awrgrpt showed for the DB. • Includes similar sections • Global Cache load profile • Global Cache efficiency • Global cache and enqueue workload and messaging • Cluster interconnect • Global messaging statistics • Cr and current blocks served stats • Global cache transfer stats & times (including immediate stats) • Interconnect ping, client and device stats • Dynamic remastering stats • sql ordered by cluster wait time • See “Appendix A” for a sample of these sections
ADDM reporting • ADDM will report on cluster interconnect performance
or … ADDM for RAC in OEM • Cluster (database) home page • Choosing the ADDM findings count brings you to ADDM for the cluster
Augmenting Standard Reporting • It is difficult to spot problems and trends using the above reports • Goal is to capture awr data in a spreadsheet format • Generate quickly as needed • Get performance, capacity, problems and trends over time • Do busy times or errors occur at different times of day? • Look at trends with different degrees of granularity. • months, weeks, days, hours, minutes (e.g. 10 minutes) • When did we start getting errors? • Are volumes increasing? • Will we soon have a capacity issue? • Helps to: • Spot and resolve problems • Avoid problems: trend analysis stats help capacity planning
Augmenting Standard Reporting • Want to run these in an automated manner. For example: SQL>@awricdevicestats 2013-02-06-08:00:00 2013-02-06-20:00 • Run with start and end times that coincide with snapshot intervals. • Can generate many rows in a spreadsheet (.csv) format in a smaller time interval to spot problems. • Below creates 6 files (rows) that can be concatenated into a single spreadsheet for analysis purposes for a single hour. SQL>@awricdevicestats 2013-02-06-08:00:00 2013-02-06-08:10 SQL>@awricdevicestats 2013-02-06-08:10:00 2013-02-06-08:20 SQL>@awricdevicestats 2013-02-06-08:20:00 2013-02-06-08:30 SQL>@awricdevicestats 2013-02-06-08:30:00 2013-02-06-08:40 SQL>@awricdevicestats 2013-02-06-08:40:00 2013-02-06-08:50 SQL>@awricdevicestats 2013-02-06-08:50:00 2013-02-06-09:00
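The wrapper's core job is translating wall-clock times into snap_ids. One possible sketch of that lookup (the substitution variable names &start_time and &end_time are illustrative; the actual script is in the white paper's Appendix C):

```
-- Hedged sketch: translate a wall-clock window into begin/end AWR snap_ids.
-- Times must align with snapshot boundaries for the report to be meaningful.
select min(snap_id) as begin_snap, max(snap_id) as end_snap
  from dba_hist_snapshot
 where dbid = (select dbid from v$database)
   and end_interval_time   >= to_timestamp('&start_time', 'YYYY-MM-DD-HH24:MI:SS')
   and begin_interval_time <= to_timestamp('&end_time',   'YYYY-MM-DD-HH24:MI:SS');
```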
Augmenting Standard Reporting • Wrapper script to take time as input and translate to snap_id’s is in the white paper in “Appendix C” • Build report name • Gets snap_id’s by date and time
Augmenting Standard Reporting • Snap_id’s and variables are set. • Get the report name • Run the report
The queries to evaluate interconnect problems, performance and capacity trends • 13 queries are shown that can produce trending spreadsheets • We use the same queries that are used in the awrgrpt.sql and awrrpt.sql • Data is written in .csv (spreadsheet) format. • Could be inserted into tables if you like • Timeframes used must be awr snapshot boundaries • I keep 6 months of data in 10 minute intervals in awr. • The queries, reports and how they are used are shown next
The Queries: 1. Interconnect device statistics Table: dba_hist_ic_device_stats Useful for • Send/receive errors • Frame errors • Buffer overruns • Trends of buffer packets sent and received • Problem solving • Trend analysis • Use small time intervals for problem solving
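As noted earlier, the stock awrgrpt.sql device-stats query joins on snap_id, dbid and instance_number only and repeats rows; a corrected pull must carry the device identity (if_name etc.) through as well. A hedged sketch, with only identity columns selected since the error/overrun column names vary by version:

```
-- Hedged sketch: per-snapshot interconnect device rows for a snap window,
-- keyed on (snap_id, dbid, instance_number, if_name) so rows are not repeated.
-- &begin_snap / &end_snap are illustrative substitution variables.
select s.end_interval_time, d.instance_number, d.if_name, d.ip_addr, d.mtu
  from dba_hist_ic_device_stats d
  join dba_hist_snapshot s
    on s.snap_id         = d.snap_id
   and s.dbid            = d.dbid
   and s.instance_number = d.instance_number
 where d.snap_id between &begin_snap and &end_snap
 order by d.snap_id, d.instance_number, d.if_name;
```

Spooled in .csv format over small intervals, this is the trending spreadsheet described above.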
The Queries: 2. Dynamic remastering stats & ops Table: dba_hist_dyn_remaster_stats Use to find out: • Remaster op count • When they occur • Objects involved • Secs per op • Sync secs per op • Locks sent and received per op • Trend analysis
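A sketch of the remastering pull; the column list is an assumption based on the 11gR2 view, and most values are cumulative, so trend the deltas between snapshots rather than the raw numbers:

```
-- Hedged sketch: dynamic remastering activity per snapshot and instance.
select snap_id, instance_number,
       remaster_ops, remastered_objects,
       replayed_locks_sent, replayed_locks_received
  from dba_hist_dyn_remaster_stats
 where snap_id between &begin_snap and &end_snap
 order by snap_id, instance_number;
```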
The Queries: 2. Dynamic remastering stats & ops By instance