1. Diane Petersen, ServerCare, Inc., Session #121
2. Good Afternoon Diane Petersen, Sr. Oracle DBA, ServerCare, Inc.
16 years Oracle experience, 3+ years RAC
Financial, High-Tech and Bio-Tech Industries
Intended for everyone with basic knowledge of RAC
3. Today’s Agenda RAC Introduction, Architecture, Cache Fusion
Configuration & Monitoring
Tuning Areas
Best Practices
Conclusion
4. Objectives Identify areas requiring tuning
How to obtain metrics
How to resolve bottlenecks
Cover tuning areas with the most benefit
Only RAC specific items will be covered
5. RAC Introduction RAC provides high availability, is flexible and scalable
Shared database is accessible from all nodes in the cluster
Runs on lower cost hardware such as Linux-based x86
Requires proper monitoring and tuning
Capabilities and limitations should be understood
6. Glossary Of Terms ADDM - Automatic Database Diagnostic Monitor, tuning advice
AWR - Automatic Workload Repository, performance statistics
Cache Fusion - Shares data in memory across nodes
GCS - Global Cache Service, guarantees cache coherency
GRD - Global Resource Directory, maps data in memory
HBA – Host bus adapter, connects host to network and storage
7. Glossary Of Terms – cont’d Interconnect - High speed, low latency private network
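Jumbo Frames - Ethernet frames with a Maximum Transmission Unit (MTU) above the standard 1500 bytes, typically 9000
LMS - Lock Manager Server, the Global Cache Service process that transports blocks across nodes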
NIC Bonding - Logically combining 2 or more physical NICs
UDP - User Datagram Protocol, supported for the Interconnect
VIP - Virtual IP address, allows failover for high availability
8. RAC Architecture Clustered nodes
Interconnect network
Shared storage
9. Overview of Cache Fusion Major component of RAC
Enables sharing of data in memory across nodes
Performed by the Lock Manager Server (LMS) processes
Maintained in Global Resource Directory (GRD)
Guarantees cache coherency, read consistency
10. SGA Structure & Processes Details of Interconnect and Cache Fusion processes
11. Configuration & Monitoring
12. Interconnect Interconnect is non-routable, private network
Dedicated switch, gigabit or faster
Protocols: UDP, or RDS (supported in 10.2.0.3 and higher)
Typical bandwidth utilization
Normal 20 – 30%
Saturated >70%
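The utilization bands above can be turned into a quick check. This is a minimal sketch, assuming a nominal link speed in Gbit/s and a measured throughput in bytes per second; the function names and the labels for values outside the two bands named on the slide are illustrative, not Oracle terminology.

```python
def interconnect_utilization(bytes_per_sec: float, link_gbit: float = 1.0) -> float:
    """Return link utilization as a percentage of nominal bandwidth."""
    link_bytes_per_sec = link_gbit * 1_000_000_000 / 8
    return 100.0 * bytes_per_sec / link_bytes_per_sec


def classify(utilization_pct: float) -> str:
    """Bucket a utilization figure using the slide's thresholds.

    Only "normal" (20-30%) and "saturated" (>70%) come from the slide;
    the other two labels are hypothetical fillers for the gaps.
    """
    if utilization_pct > 70:
        return "saturated"
    if 20 <= utilization_pct <= 30:
        return "normal"
    if utilization_pct < 20:
        return "below normal range"
    return "elevated"


# Example: 30 MB/s measured on a gigabit interconnect.
pct = interconnect_utilization(30_000_000)
print(f"{pct:.1f}% -> {classify(pct)}")
```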
13. Verify Interconnect IP Addresses Ensure Interconnect IP is not using public network
[oracle@rac1]$ oifcfg getif
bond0 10.10.10.0 global cluster_interconnect
eth0 172.16.150.0 global public
[oracle@rac2]$ oifcfg getif
bond0 10.10.10.0 global cluster_interconnect
eth0 172.16.150.0 global public
The database instance alert log records the Interconnect address and protocol in use
Query from the database:
v$cluster_interconnects, v$configured_interconnects
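The check on the oifcfg output can be automated. The sketch below assumes the four-column layout shown on this slide (interface, subnet, scope, role) and flags an interconnect that shares a subnet with the public network or uses a non-private address range. The helper names are hypothetical.

```python
import ipaddress

SAMPLE = """\
[oracle@rac1]$ oifcfg getif
bond0 10.10.10.0 global cluster_interconnect
eth0 172.16.150.0 global public
"""


def parse_oifcfg(output):
    """Parse 'oifcfg getif' lines of the form: iface subnet scope role."""
    rows = []
    for line in output.splitlines():
        parts = line.split()
        if len(parts) != 4:
            continue  # skips the shell prompt and blank lines
        iface, subnet, scope, role = parts
        rows.append({
            "iface": iface,
            "subnet": ipaddress.ip_address(subnet),
            "scope": scope,
            "role": role,
        })
    return rows


def interconnect_problems(rows):
    """Return human-readable findings for interconnect entries."""
    public_subnets = {r["subnet"] for r in rows if r["role"] == "public"}
    problems = []
    for r in rows:
        if r["role"] != "cluster_interconnect":
            continue
        if r["subnet"] in public_subnets:
            problems.append(f"{r['iface']}: interconnect shares the public subnet")
        elif not r["subnet"].is_private:
            problems.append(f"{r['iface']}: interconnect subnet is not a private range")
    return problems


print(interconnect_problems(parse_oifcfg(SAMPLE)) or "interconnect looks private")
```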
14. Network Statistics Use ifconfig -a
Check configuration, RX & TX errors, overruns
[oracle@rac2]$ /sbin/ifconfig -a
bond0 Link encap:Ethernet HWaddr 00:11:25:A8:6C:35
inet addr:10.10.10.2 Bcast:10.10.10.3 Mask:255.255.255.252
. . . . . . . . . .
RX packets:657830061 errors:0 dropped:0 overruns:0 frame:0
TX packets:527418621 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:579340506510 (539.5 GiB) TX bytes:430094970294 (400.5 GiB)
eth0 Link encap:Ethernet HWaddr 00:11:25:A8:6C:34
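The RX and TX counters can be pulled out of this output with a short script. This is a sketch that assumes the older net-tools layout shown above ("RX packets:N errors:N ...") and a single interface per input. Newer ifconfig versions print a different format and would need a different pattern.

```python
import re

SAMPLE = """\
bond0 Link encap:Ethernet HWaddr 00:11:25:A8:6C:35
RX packets:657830061 errors:0 dropped:0 overruns:0 frame:0
TX packets:527418621 errors:0 dropped:0 overruns:0 carrier:0
"""

COUNTER_LINE = re.compile(r"^\s*(RX|TX) packets:(\d+)(.*)$")


def parse_counters(text):
    """Return {'RX': {...}, 'TX': {...}} for one interface's counter lines."""
    result = {}
    for line in text.splitlines():
        match = COUNTER_LINE.match(line)
        if not match:
            continue
        direction, packets, rest = match.groups()
        counters = {"packets": int(packets)}
        for key, value in re.findall(r"(\w+):(\d+)", rest):
            counters[key] = int(value)
        result[direction] = counters
    return result


def nonzero_faults(counters):
    """List every error-type counter (anything but 'packets') above zero."""
    return {
        f"{direction}.{name}": value
        for direction, fields in counters.items()
        for name, value in fields.items()
        if name != "packets" and value > 0
    }


print(nonzero_faults(parse_counters(SAMPLE)) or "no errors, drops or overruns")
```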
15. Network Packet Info by Protocol Use netstat -s
Contains details of network with packet information
[oracle@rac1]$ netstat -s
Ip: . . . .
Tcp: . . . .
Udp:
137338287 packets received
7376 packets to unknown port received.
0 packet receive errors
148822392 packets sent
Use the ping utility to determine packet loss and timing
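Since the Interconnect uses UDP, the Udp section is the part of this output worth tracking over time. The sketch below extracts that section and computes a receive error ratio. It assumes the Linux "netstat -s" text layout shown on the slide, and the function names are illustrative.

```python
import re

SAMPLE = """\
Tcp:
Udp:
    137338287 packets received
    7376 packets to unknown port received.
    0 packet receive errors
    148822392 packets sent
"""


def udp_stats(netstat_output):
    """Collect 'count description' pairs from the Udp: section only."""
    stats = {}
    in_udp = False
    for line in netstat_output.splitlines():
        stripped = line.strip()
        # Section headers look like "Ip:", "Tcp:", "Udp:" with no spaces.
        if stripped.endswith(":") and " " not in stripped:
            in_udp = stripped == "Udp:"
            continue
        if in_udp:
            match = re.match(r"(\d+)\s+(.*)", stripped)
            if match:
                stats[match.group(2).rstrip(".")] = int(match.group(1))
    return stats


def receive_error_ratio(stats):
    """Fraction of received UDP packets that were counted as errors."""
    received = stats.get("packets received", 0)
    errors = stats.get("packet receive errors", 0)
    return errors / received if received else 0.0


stats = udp_stats(SAMPLE)
print(stats["packets received"], receive_error_ratio(stats))
```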
16. Verify Cluster Configuration Make sure cluster connection configuration is correct
[oracle@rac1]$ cluvfy comp nodecon -n rac1
Verifying node connectivity...
Checking node connectivity...
Node connectivity check passed for subnet "10.10.10.0" with node(s) rac1.
Node connectivity check passed for subnet "172.16.150.0" with node(s) rac1.
17. System Monitoring CPU utilization – top, mpstat
Disk I/O times – iostat
Memory – free
Kernel messages - /var/log/messages, /var/log/dmesg
Obtain cluster resource status – crs_stat, srvctl
18. Tuning
19. General Stress test application on single instance database first
Simulate I/O load (tools such as Orion)
Modify OS parameters
Modify Clusterware parameters
Modify Database parameters
20. AWR Report Global Cache Load Profile
Global Cache Efficiency Percentages
Messaging Statistics
Consistent Read (CR) and Current Block Segments
Concentrate on top 5 wait events
21. Cache Fusion data block & messaging traffic
Global Cache Load Profile
~~~~~~~~~~~~~~~~~~~~~~~~~
                                  Per Second    Per Transaction
                                  ----------    ---------------
Global Cache blocks received:           4.30               3.65
Global Cache blocks served:            23.44              19.90
GCS/GES messages received:            133.03             112.96
GCS/GES messages sent:                 78.61              66.75
DBWR Fusion writes:                     0.11               0.10
Est Interconnect traffic (KB)         263.20
Calculate Network Traffic from AWR report
Network traffic received = Global Cache blocks received * DB block size = 4.30 * 8192 bytes = 35,226 bytes/sec (about 34.4 KB/sec)
Network traffic generated = Global Cache blocks served * DB block size = 23.44 * 8192 bytes = 192,020 bytes/sec (about 187.5 KB/sec)
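The same arithmetic can be scripted against the Global Cache Load Profile figures. The sketch below also reproduces the "Est Interconnect traffic (KB)" line using a commonly cited approximation that counts about 200 bytes per GCS/GES message. That message size is an assumption here, not something stated on the slide, and the function names are illustrative.

```python
BLOCK_SIZE = 8192      # DB block size in bytes, as used on the slide
MESSAGE_BYTES = 200    # assumed average GCS/GES message size


def block_traffic_kb(blocks_per_sec, block_size=BLOCK_SIZE):
    """Block traffic in KB/sec for a given blocks-per-second rate."""
    return blocks_per_sec * block_size / 1024


def estimated_interconnect_kb(blocks_received, blocks_served,
                              msgs_received, msgs_sent,
                              block_size=BLOCK_SIZE,
                              message_bytes=MESSAGE_BYTES):
    """Blocks plus messages in both directions, in KB/sec."""
    block_bytes = (blocks_received + blocks_served) * block_size
    message_total = (msgs_received + msgs_sent) * message_bytes
    return (block_bytes + message_total) / 1024


# Per-second values from the load profile above.
print(f"received: {block_traffic_kb(4.30):.1f} KB/sec")
print(f"served:   {block_traffic_kb(23.44):.1f} KB/sec")
print(f"estimate: {estimated_interconnect_kb(4.30, 23.44, 133.03, 78.61):.1f} KB/sec")
```

With the rounded per-second values from the report, the estimate lands within a fraction of a KB of the 263.20 figure AWR prints.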
22. Global Cache Efficiency Percentages
Data blocks retrieved from local cache or remote instance
Global Cache Efficiency Percentages (Target local+remote 100%)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Buffer access - local cache %: 99.12
Buffer access - remote cache %: 0.75
Buffer access - disk %: 0.13
Messaging Statistics
Statistics on messages sent and received
Average queue and process times should be less than 1 millisecond
Global Cache and Enqueue Services - Messaging Statistics
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Avg message sent queue time (ms): 0.4
Avg message sent queue time on ksxp (ms): 0.2
Avg message received queue time (ms): 0.0
Avg GCS message process time (ms): 0.0
Avg GES message process time (ms): 0.0
% of direct sent messages: 49.64
% of indirect sent messages: 24.73
% of flow controlled messages: 25.64
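A small check can apply the 1 millisecond guideline to the timing lines and confirm that the three "sent messages" percentages account for all messages. This is a sketch with the values typed in from the report above. The dictionary keys and function names are illustrative.

```python
THRESHOLD_MS = 1.0  # guideline from the slide

timings_ms = {
    "Avg message sent queue time": 0.4,
    "Avg message sent queue time on ksxp": 0.2,
    "Avg message received queue time": 0.0,
    "Avg GCS message process time": 0.0,
    "Avg GES message process time": 0.0,
}

sent_pct = {"direct": 49.64, "indirect": 24.73, "flow controlled": 25.64}


def over_threshold(timings, threshold_ms=THRESHOLD_MS):
    """Return the timings that meet or exceed the guideline."""
    return {name: ms for name, ms in timings.items() if ms >= threshold_ms}


def percentages_consistent(parts, tolerance=0.1):
    """True when the parts add up to 100% within rounding tolerance."""
    return abs(sum(parts.values()) - 100.0) <= tolerance


print(over_threshold(timings_ms) or "all timings under 1 ms")
print(percentages_consistent(sent_pct))
```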
23. Segments by CR Blocks Received
-> Total CR Blocks Received: 329
-> Captured Segments account for 84.2% of Total
                                                                      CR
           Tablespace                      Subobject  Obj.        Blocks
Owner      Name       Object Name          Name       Type      Received  %Total
---------- ---------- -------------------- ---------- ----- ------------ -------
PAYMENTECH DATA       BATCH                           TABLE           90   27.36
SYS        SYSTEM     SMON_SCN_TIME                   TABLE           25    7.60
PAYMENTECH DATA       IDX_BATCH_ORDER_ID              INDEX           21    6.38
SYS        SYSAUX     SYS_IOT_TOP_8782                INDEX           16    4.86
SYS        SYSAUX     WRI$_ADV_PARAMETERS_            INDEX           16    4.86
Segments by Current Blocks Received
-> Total Current Blocks Received: 2,667
-> Captured Segments account for 96.7% of Total
                                                                 Current
           Tablespace                      Subobject  Obj.        Blocks
Owner      Name       Object Name          Name       Type      Received  %Total
---------- ---------- -------------------- ---------- ----- ------------ -------
SCECOMM    DATA       ACCOUNT_SERVICE                 TABLE          461   17.29
SCECOMM    LDATA      ACCOUNT                         TABLE          377   14.14
SCECOMM    LDATA      PAYMENT_INSTRUMENT_F            TABLE          283   10.61
SCECOMM    LDATA      IDX_ACCT_EMAIL                  INDEX          211    7.91
SCECOMM    LDATA      PK_ACCOUNT_ID                   INDEX          191    7.16
24. RAC Wait Events Broader category called Cluster Wait Class
Characterized as Current or CR
Current - most recent version of a block, requested when the block is to be modified
Consistent Read (CR) - read-consistent version of a block, requested for read access
25. GC Current Block 2-way Occurs during cache fusion process
Instance A requests block from master instance B
If the block is available on B then it is sent to A
26. GC Current Block 3-way Maximum three hops, not dependent on number of nodes in cluster
Instance A requests block from master instance B
B does not have the block and forwards the request to the instance holding it, or
B directs A to read the block from disk
27. More Global Cache Waits
GC CR/current block congested
LMS not keeping up under heavy load
Block transfer process delayed, indicates low CPU resources
GC CR/current block busy
Delay before block is sent, indicates write contention
GC current grant busy
Permission to access block is granted, but is blocked
GC CR/current block request
Placeholder event, active while waiting for a block
28. Block Access Cost Cost of retrieving the block, made up of the following:
Message propagation delay
Inter process CPU
Block Server Load
29. Block Access Latency Factors affecting request processing time:
Operating System
Oracle processing time
Available Interconnect network throughput
CPU load on other nodes
30. Operating System Block latency related to CPU utilization
LMS process is CPU intensive
Typically one LMS for every 2 CPUs
Waits - GC CR/current block congested
Apply OS and kernel patches
31. I/O Capacity High I/O can be a result of:
Node addition, increased usage, database size
Bad queries
Dissimilar disks within disk group
Wait event “gc cr block busy” (Global Cache Consistent Read block busy) is an indicator
32. Best Practices
33. General Ensure adequate resources on surviving nodes
Benchmark cluster configuration
Load test on single instance first
Avoid serialization in application design
Apply few changes at a time
34. Network Use Jumbo Frames for Interconnect, increased MTU
JF lowers CPU utilization, reduces bonding overhead
Fewer frames needed for large I/Os
All components in network must support JF
Monitor dropped packets, timeouts, buffer overflows, transmit and receive errors
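The "fewer frames" point can be made concrete by counting IP fragments for one database block sent over UDP. This is a rough sketch, assuming IPv4 without options (20-byte header), an 8-byte UDP header, and the 8 KB block as the whole UDP payload. Oracle's own message headers are ignored, so real counts may differ slightly.

```python
import math

IP_HEADER = 20   # IPv4 header without options
UDP_HEADER = 8


def ip_fragments(udp_payload_bytes, mtu):
    """Number of IP packets needed to carry one UDP datagram at a given MTU."""
    ip_payload = udp_payload_bytes + UDP_HEADER
    if ip_payload <= mtu - IP_HEADER:
        return 1  # fits in a single unfragmented packet
    # Every fragment except the last carries a multiple of 8 bytes.
    per_fragment = (mtu - IP_HEADER) // 8 * 8
    return math.ceil(ip_payload / per_fragment)


for mtu in (1500, 9000):
    print(f"MTU {mtu}: {ip_fragments(8192, mtu)} packet(s) per 8 KB block")
```

Under these assumptions an 8 KB block needs six packets at the standard MTU and one with Jumbo Frames, which is where the reduction in per-packet CPU work comes from.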
35. Hardware Redundancy - server, storage, network components
Add HBA cards, switches, disk array controllers
Load balance LUNs across HBA ports
Enable hyperthreading at the OS level
Use asynchronous I/O
Set “aio-max-size” to 1,048,576, “aio-max-nr” to 56k
36. Monitoring & Tuning Use OEM Database Control or Grid Control
View overall system status, status of cluster, alert logs
Monitor throughput across Interconnect
Make decisions to add or redistribute resources
Tune SQL plans and schemas for better optimization
37. New Features
38. New Features in 10gR2 & 11g FAN – Fast Application Notification, aware of current cluster configuration, connects only to instances able to respond
ASM options – SYSASM role, new ASMCMD commands
AWM – Automatic Workload Management, manages distribution for optimal performance, services restored onto surviving nodes
Extended distance (stretch) clusters, physically separate
CRS - Now provides HA for non-Oracle applications
39. Conclusion
40. Items Learned in this Session RAC databases are complex in nature
Scalability, availability start with initial configuration
Proper configuration is essential
Monitoring and tuning requires RAC skills and knowledge
DBA needs specialized training, experience
41. Where to Find More Information Additional sessions here at Collaborate08
Plenty of information available on the internet
Oracle Technology Network
http://www.oracle.com/technology/index.html
Ask me
diane@servercare.com
1-888-918-6309
42. Questions? Covered many RAC topics today
Additional questions, please contact me
diane@servercare.com
1-888-918-6309
43. Session #121: RAC 11g Best Practices & Tuning
Thank You!
Please fill out evaluations!