620 likes | 811 Views
S311441 Practical Performance Management for Oracle Real Application Clusters. Michael Zoll, Consulting Member of Technical Staff Barb Lundhild, Product Manager, Oracle Real Application Clusters.
E N D
S311441 Practical Performance Management for Oracle Real Application Clusters Michael Zoll, Consulting Member of Technical Staff Barb Lundhild, Product Manager, Oracle Real Application Clusters
The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions.The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.
Agenda <Insert Picture Here> • Oracle RAC Infrastructure and Technical Fundamentals • Application and Database Design • Common Problems and Symptoms • Diagnostics and Problem Determination • Appendix
Objective • Convey a few simple and fundamental concepts of Oracle RAC performance • Summarize application level performance and scalability information • Provide some simple sizing hints • Give exemplary overview of common problems and solutions • Builds on similar presentation from OOW 2008 http://www.oracle.com/technology/products/database/clustering/pdf/s298716_oow2008_perf.pdf
<Insert Picture Here> Oracle RAC Infrastructure: Technical Fundamentals, Sizing and Configuration
Oracle RAC Architecture /…/ public network VIPn VIP2 VIP1 Service Service Service Node 2 Node n Listener Node1 Listener Listener SCAN_Listener SCAN_Listener SCAN_Listener instance 1 instance 2 instance n ASM ASM ASM Oracle Clusterware Oracle Clusterware Oracle Clusterware Operating System Operating System Operating System shared storage Redo / Archive logs all instances Database / Control files Managed by ASM OCR and Voting Disks
Log buffer Buffer Cache Global Cache and Global Enqueue ServiceProcesses and Functions SGA Runs in Real Time Priority Library Cache Dictionary Cache Global Resource Directory Global Enqueue Service Global Cache Service Oracle Process Oracle Process LMD0 LMON LMSx DBW0 LGWR Cluster Private High Speed Network
Legend: Immediate direct send: > 96% Post 2 3 Log write and send : < 4% LMS : Buffer Cache 5 Post 6 Send 1 4 Flush redo Global Cache Access LGWR Shadow process: Receive
Basic Performance Facts • Global Cache access is 100 - 500 usecs ( roundtrip ) • Data immediately served from remote instances via private, high speed interconnect • Redo may have to be written to log file before send if data was changed and has not been committed yet • Performance varies with network infrastructure and network protocol • Maximum network hops is 3 messages • For clusters with more than 2 nodes, independent of total cluster size • CPU cost per OLTP transaction • Dependent on locality of access , I.E. messages per tx
Basic Performance Facts: Latency (UDP/GbE and RDS/IB ) Lower CPU cost relative to protocols and network infrastructure • Actual interconnect latency is generally not the problem unless you have exceeded capacity or you are experiencing errors
Private Interconnect • Network between the nodes of an Oracle RAC cluster MUST be private/dedicated to traffic between Oracle RAC nodes • Large ( Jumbo ) Frames for GbE recommended • Avoids fragmentation and reassembly ( 8K / 1500 MTU = 6 fragments ) • Interconnect bandwidth should be tested with non-Oracle utilities ( e.g. iPerf ) • No packet loss at 75% - 80% of bandwidth utilization
Interconnect Bandwidth • Generally, 1Gb/sec sufficient for performance and scalability in OLTP. • DSS/DW systems should be designed with > 1Gb/sec capacity categorically • Prediction of interconnect traffic is difficult • Depends on transaction instruction length per message • Empirical rule of thumb: 1Gb/sec per 32 CPU Cores • Infiniband and 10GbE are supported for scale-out
<Insert Picture Here> Performance and Scalability of Applications and Database Design with RAC
General Scalability • Scaling OLTP workloads, DML intensive • Scale well, if contention is little and database/working set size scales ( I.E. add node when demand grows) • Read intensive workloads scale predictably and linearly • Bigger cache when adding more nodes • Faster read access to global cache than to disk, less disk IO • If cluster-size and database size growth are balanced, system will perform and scale well
Performance and Scaling in Application and Database Design Response Time Impact • Index contention on INSERTS when index is right-growing • system generated “artificial” keys such as consecutive order numbers or “natural” keys such as dates • UPDATES or DELETES to rows in a small working set • Session logging and tracking • First-in first-out queues • State of messages in queues • Bulk INSERTS of large amounts of data • LOBS
DML Contention and Serialization Modification intensive operations on small set of ( cached) blocks “busy blocks” “busy blocks” Table T Table T’ Index I Index I …… …… INSERT INTO I WHERE Key = sequence UPDATE T SET … WHERE row in blocks[1..n] and n is a small number
Performance and Scaling in Application and Database Design CPU Cost due to Inter-Instance Messaging and non-linear scaling • In-memory databases • Working set spans multiple buffer caches • Frequent modifications and reads of recent modifications • Working set fits into memory of one instance • Locality of access worsens when node are added and users are load balanced • Scale as long as sufficient CPU power is available
Read-intensive Buffer Cache 32GB Buffer Cache 32GB Cache Transfer Disk Transfer Read Read Working Set on Disk 64GB …… …… Eventually all blocks cached, Larger read cache No messages in 11g
Performance and Scalability • Good linear or near-linear scaling out of box • IO and CPU intensive applications with large working sets and low proximity of access • Self-Service Web Applications ( Shopping Carts etc. ) • CRM • Document storage and retrieval • Business Analytics and Data Warehousing
Performance and Scalability • Partitioning or load direction may optimize performance • High proximity of access , e.g. adding and removing from message queues • Advanced Queuing and Workflow • Batch and bulk processes • Order processing and Inventory • Payroll processing
Identifying Performance and Scaling Bottlenecks in Database Design The Golden Rules: • #1:For first approximation, disregard read-mostly objects and focus on the INSERT, UPDATE and DELETE intensive indexes and tablespace • #2:If DML access to data is random, no worries if CPU is not an issue • #3:Standard SQL and schema tuning solves > 80% of performance problems. There is usually only a few problem SQL and Tables. • #4:Almost everything can be scaled out quickly with load-direction and load balancing
Identifying Performance and Scaling Bottlenecks in Database Design • Look for indexes with right-growing characteristics • Keys comprising DATE columns or keys generated by sequence numbers • Find frequent updates of “small” and compact tables • “small”=fits into a single buffer cache • Identify frequently and concurrently modified LOBs
HOW ? • Look at segment and SQL statistics in the Automatic Workload Repository • Use Oracle Enterprise Manager Access Advisories and Automatic Database Diagnostics Monitor (ADDM) • Instrumentation with MODULE and ACTION helps identify and quantify components of the workload
Quick FixesWithout modifying Application • Indexes with right-growing characteristics • Cache sequence numbers per instance • Hash or range partition table with LOCAL indexes • Frequent updates of “small” and compact tables • Reduce block size ( 2K ) and row density of blocks (PCTFREE 99 ) • Frequently modified LOBS • Hash partitions ( 128 – 256 ) • FREE POOLS
Quick Fixes • Application Modules which may not scale or cannot be quickly reorganized can be directed to particular nodes via cluster managed services • For Administrator Managed and older releases create service with 1 preferred node and the rest available • For Policy Managed databases use a singleton service • Some large scale and high performance applications may be optimized by Data Partitioning ( range, hash, or composites) and routing per partitioning key in application server tier • E.g. hash by CLIENT_ID, REGION etc.
RAC Database 30% Work I’m busy Pool Instance1 10% Work Application I’m very busy I’m idle Instance2 60% Work Instance3 Leverage Connection Pools UCP: Load Balancing and Affinity
Performance and Scalability Enhancements in 11.1 and 11.2 • Read Mostly • Automatic policy detects read and disk IO intensive tables • No interconnect messages when policy kicks in -> CPU savings • Direct reads for large ( serial and parallel ) scans • No locks , no buffer cache contention • Good when IO subsystem is fast or IO processing is offloaded to storage caches or servers ( e.g. Exadata ) • Fusion Compression • Reduces message sizes and therefore CPU cost • Dynamic policies to make trade-off between disk IO and global cache transfers
<Insert Picture Here> Performance Diagnostics and Checks: Metrics and Method
Normal Behaviour <Insert Picture Here> • It is normal to see time consumed in • CPU • Db file sequential/scattered read • Direct read • Gc cr/current block 2-way/3-way ( Transfer from remote cache ) • Gc cr/current grant 2-way ( Correlates with buffered disk IOs ) • Average latencies should be within baseline parameters • Most problems boil down to CPU, IO, network capacity or applications issues
AVG Waits Time(s) (ms) %Time db file sequential read 2,627,295 21,808 8 43.2 % CPU time 9,156 18.2 20.8% Normality, Baselines and Significance Most significant response time component gc current block 3-way 3,289,371 4,019 1 8.0 gc buffer busy acquire 373,777 3,272 9 6.5 gc current block 2-way 3,982,284 3,192 1 6.3 gc current block busy 125,595 2,931 4 GC waits are influenced by interconnect or remote effects which are not always obvious Contention Avg < 1 ms
Distributed Cause and Effect Example: Cluster-wide Impact of a Log File IO Problem Node 2 Node 2 ROOT CAUSE Node 1 Node 1 Disk Capacity Disk or Controller Bottleneck
Global Metrics View WORKLOAD REPOSITORY report for Instance OOW8 Host oowdb8 Local Symptom Event: gc current block busy 23ms WORKLOAD REPOSITORY report for Instance OOW4 Host oowdb4 Log file paralel write 20ms Cause Global Cache Transfer Stats Avg global cache current block flush time (ms): 21 ms Inst # Busy % 4 data block 114,426 95.9 24.1 Remote instance table
Global Cache Transfer Stats Inst Block Blocks % % No Class Received Immed Busy 4 data block 114,426 95.9 4.1 7 data bloc 162,630 76.6 23.4 Log file IO Avg global cache current block flush time (ms): 3.7 Investigate Serialization gc buffer busy 9 ms Waits for gc current block busy 4 ms Not OK!
Example: Segment Statistics Segments by Global Cache Buffer Busy ES_BILLING TABLE 97.41 % Segments by Current Blocks Received ES_BILLING TABLE 85.81 % ANALYSIS: TABLE ES_BILLING is frequently read and modified on all nodes. The majority of global cache accesses and serialization can be attributed to this .
Comprehensive Cluster-wide Analysis via Global ADDM Courtesy of Cecilia Gervasio, Oracle Server Technologies, Diagnostics and Manageability
<Insert Picture Here> Common Problems and Symptoms
Common Problems and Symptoms <Insert Picture Here> • Interconnect or Switch Problems • Slow or bottlenecked disks • High Log file Sync Latency • System load and scheduling
Symptoms of Interconnect Problems Serialization High latencies Capacity Limit Congestion Dropped Packets ROOT CAUSE
Symptoms of an Interconnect Problem Top 5 Timed EventsAvg %Total ~~~~~~~~~~~~~~~~~~ wait Call Event Waits Time(s)(ms) Time Wait Class ---------------------------------------------------------------------------------------------------- log file sync 286,038 49,872 174 41.7 Commit gc buffer busy 177,315 29,021 164 24.3 Cluster gc cr block busy 110,348 5,703 52 4.8 Cluster gc cr block lost 4,272 4,953 1159 4.1 Cluster cr request retry 6,316 4,668 739 3.9 Other Should never be here Always a severe performance problem
UDP: Packet receive errors Socket buffer overflows • IP: • 1201 Fragments dropped after timeout • Reassembly failure • Incoming packets discarded TX errors:135 dropped: overruns: RX errors: 0 dropped:27 overruns: Interconnect or IPC problems Applications: Oracle gc blocks lost Protocol processing:IP,UDP netstat –s Device Drivers Ifconfig -a NIC2 NIC1 Ports Queues Switch
Cluster-wide Impact of a Database File IO Problem ROOT CAUSE Node 2 Node 1 Disk Capacity Disk or Controller Bottleneck IO intensive Queries
Cluster-Wide Disk I/O Impact Node 1 Top 5 Timed EventsAvg %Total ~~~~~~~~~~~~~~~~~~ wait Call Event Waits Time(s)(ms) Time ------------------------------ ------------ ----------- ------ ------ log file sync 286,038 49,872 174 41.7 gc buffer busy 177,315 29,021 164 24.3 gc cr block busy 110,348 5,703 52 4.8 `` CAUSE: Expensive Query in Node 2 Causes IO bottleneck Node 2 1. IO on disk group containing redo logs is slow Load Profile ~~~~~~~~~~~~ Per Second --------------- Redo size: 40,982.21 Logical reads: 81,652.41 Physical reads: 51,193.37 2. Block shipping for frequently modified blocks is delayed by log flush IO 3. Serialization builds up
Log File Sync Latency: Causes and Symptoms Courtesy of Vinay Srihari, Oracle Server Technologies, Recovery
Causes of High Commit Latency • Symptom of Slow Log Writes • I/O service time spike may last only seconds or minutes • Threshold-based warning message in LGWR trace file • “Warning: log write elapsed time xx ms, size xxKB” • Dumped when write latency >= 500ms • Large log_buffer makes a bad situation worse. • Fixes • Smooth out log file IO on primary system and standby redo apply I/O pattern • Primary and Standby storage subsystem should be configured for peaks • Apply bug fixes in appendix Courtesy of Vinay Srihari, Oracle Server Technologies, Recovery
Block Server Process Busy or Starved Node 2 Node 1 ROOT CAUSE Too few LMSs LMS not in High Prio Memory Problems ( Swapping)
Block Server Process Busy or Starved Top 5 Timed Events Avg %Total ~~~~~~~~~~~~~~~~~~ wait Call Event Waits Time (s) (ms) Time Wait Class ------------------------------ - ----------- ----------- ------ ------ ---------- gc cr grant congested 26,146 28,761 1100 39.1 Cluster gc current block congested 13,237 13,703 1035 18.6 Cluster gc cr grant 2-way 340,281 12,810 38 17.4 Cluster gc current block 2-way 119,098 4,276 36 5.8 Cluster gc buffer busy 8,109 3,460 427 4.7 Cluster On remote note : Avg message sent queue time (ms): 16.1 “Congested” : LMS could not dequeue messages fast enough
Block Server Processes Busy • Increase # LMS based on • Occurrence of “congested” wait events • Heuristics: 75 – 80 % busy is ok • Avg send q time > 1ms • Caveat: # of CPUs should always be >= # of LMS to avoid starvation • On NUMA architectures and CMT • Bind LMS to NUMA board or cores in processor set • Fence off Hardware interrupts from the processor sets
High Latencies in Global Cache ROOT CAUSE
<Insert Picture Here> Transient Problems and Hangs