
Diane Petersen ServerCare, Inc. Session 121



Presentation Transcript


    1. Diane Petersen ServerCare, Inc. Session #121

    2. Good Afternoon
       Diane Petersen, Sr. Oracle DBA, ServerCare, Inc.
       16 years Oracle experience, 3+ years RAC
       Financial, High-Tech and Bio-Tech Industries
       Intended for everyone with basic knowledge of RAC

    3. Today's Agenda
       RAC Introduction, Architecture, Cache Fusion
       Configuration & Monitoring
       Tuning Areas
       Best Practices
       Conclusion

    4. Objectives
       Identify areas requiring tuning
       How to obtain metrics
       How to resolve bottlenecks
       Cover tuning areas with the most benefit
       Only RAC specific items will be covered

    5. RAC Introduction
       RAC provides high availability, is flexible and scalable
       Shared database is accessible from all nodes in the cluster
       Runs on lower cost hardware such as Linux-based x86
       Requires proper monitoring and tuning
       Capabilities and limitations should be understood

    6. Glossary Of Terms
       ADDM - Automatic Database Diagnostic Monitor, tuning advice
       AWR - Automatic Workload Repository, performance statistics
       Cache Fusion - Shares data in memory across nodes
       GCS - Global Cache Service, guarantees cache coherency
       GRD - Global Resource Directory, maps data in memory
       HBA - Host bus adapter, connects host to network and storage

    7. Glossary Of Terms - cont'd
       Interconnect - High speed, low latency private network
       Jumbo Frames - Ethernet frames with a Maximum Transmission Unit (MTU) larger than the standard 1,500 bytes
       LMS - Lock Manager Service, transports blocks across nodes
       NIC Bonding - Logically combining 2 or more physical NICs
       UDP - User Datagram Protocol, supported for the Interconnect
       VIP - Virtual IP address, allows failover for high availability

    8. RAC Architecture
       Clustered nodes
       Interconnect network
       Shared storage

    9. Overview of Cache Fusion
       Major component of RAC
       Enables sharing of data in memory across nodes
       Performed by Lock Manager Service (LMS)
       Maintained in Global Resource Directory (GRD)
       Guarantees cache coherency, read consistency

    10. SGA Structure & Processes
        Details of Interconnect and Cache Fusion processes

    11. Configuration & Monitoring

    12. Interconnect
        Interconnect is a non-routable, private network
        Dedicated switch, gigabit or faster
        Protocols: UDP (RDS is new for use in 10.2.0.3 and higher)
        Typical bandwidth utilization
            Normal: 20 - 30%
            Saturated: >70%
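The utilization thresholds on this slide can be turned into a quick check. The following is a minimal sketch, assuming a 1 Gbit/s link by default; the function names and the "elevated" label for the band between 30% and 70% are illustrative choices, not terms from the presentation.

```python
def interconnect_utilization(bytes_per_sec: float, link_gbits: float = 1.0) -> float:
    """Return link utilization as a percentage of the nominal link speed."""
    link_bytes_per_sec = link_gbits * 1e9 / 8  # convert Gbit/s to bytes/s
    return 100.0 * bytes_per_sec / link_bytes_per_sec


def classify(utilization_pct: float) -> str:
    """Map a utilization percentage onto the bands quoted on the slide."""
    if utilization_pct > 70:
        return "saturated"
    if utilization_pct <= 30:
        return "normal"
    return "elevated"  # hypothetical label for the unnamed 30-70% band


if __name__ == "__main__":
    pct = interconnect_utilization(31_250_000)  # 31.25 MB/s on gigabit
    print(f"{pct:.1f}% -> {classify(pct)}")
```

The byte rate fed into the function could come from interface counters sampled over an interval or from the AWR interconnect estimate.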

    13. Verify Interconnect IP Addresses
        Ensure Interconnect IP is not using public network
            [oracle@rac1]$ oifcfg getif
            bond0 10.10.10.0 global cluster_interconnect
            eth0 172.16.150.0 global public
            [oracle@rac2]$ oifcfg getif
            bond0 10.10.10.0 global cluster_interconnect
            eth0 172.16.150.0 global public
        Database instance alert log posts Interconnect and protocol
        Query from the database: v$cluster_interconnects, v$configured_interconnects
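The check described on this slide can be automated by parsing the `oifcfg getif` output. This is a rough sketch that works on captured text rather than running the command; the four-column layout is taken from the slide's sample, and the helper names are hypothetical.

```python
import ipaddress

# Sample output copied from the slide.
SAMPLE = """\
bond0 10.10.10.0 global cluster_interconnect
eth0 172.16.150.0 global public
"""


def parse_getif(text):
    """Split each 'iface subnet scope role' line into a dict."""
    entries = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 4:
            iface, subnet, scope, role = parts
            entries.append(
                {"iface": iface, "subnet": subnet, "scope": scope, "role": role}
            )
    return entries


def interconnect_problems(entries):
    """Return human-readable problems found with interconnect entries."""
    public_subnets = {e["subnet"] for e in entries if e["role"] == "public"}
    problems = []
    for e in entries:
        if e["role"] != "cluster_interconnect":
            continue
        if e["subnet"] in public_subnets:
            problems.append(f"{e['iface']}: interconnect shares the public subnet")
        if not ipaddress.ip_address(e["subnet"]).is_private:
            problems.append(f"{e['iface']}: interconnect subnet is not a private range")
    return problems


if __name__ == "__main__":
    print(interconnect_problems(parse_getif(SAMPLE)) or "interconnect looks private")
```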

    14. Network Statistics
        Use ifconfig -a
        Check configuration, RX & TX errors, overruns
            [oracle@rac2]$ /sbin/ifconfig -a
            bond0 Link encap:Ethernet HWaddr 00:11:25:A8:6C:35
                  inet addr:10.10.10.2 Bcast:10.10.10.3 Mask:255.255.255.252
                  . . . . . . . . . .
                  RX packets:657830061 errors:0 dropped:0 overruns:0 frame:0
                  TX packets:527418621 errors:0 dropped:0 overruns:0 carrier:0
                  collisions:0 txqueuelen:0
                  RX bytes:579340506510 (539.5 GiB) TX bytes:430094970294 (400.5 GiB)
            eth0  Link encap:Ethernet HWaddr 00:11:25:A8:6C:34
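Scanning the RX and TX counter lines by eye gets tedious across nodes. The sketch below pulls them out of captured `ifconfig -a` text, assuming the older net-tools output layout shown on the slide; the regular expression and function names are illustrative only.

```python
import re

# Trimmed sample in the layout shown on the slide.
SAMPLE = """\
bond0 Link encap:Ethernet HWaddr 00:11:25:A8:6C:35
      inet addr:10.10.10.2 Bcast:10.10.10.3 Mask:255.255.255.252
      RX packets:657830061 errors:0 dropped:0 overruns:0 frame:0
      TX packets:527418621 errors:0 dropped:0 overruns:0 carrier:0
"""

COUNTER_LINE = re.compile(
    r"^\s*(RX|TX) packets:(\d+) errors:(\d+) dropped:(\d+) overruns:(\d+)"
)


def parse_counters(text):
    """Return {iface: {"RX"/"TX": {counter: value}}} from ifconfig text."""
    stats = {}
    iface = None
    for line in text.splitlines():
        if line and not line[0].isspace():
            iface = line.split()[0]  # an unindented line starts a new interface
            stats[iface] = {}
            continue
        match = COUNTER_LINE.match(line)
        if match and iface:
            packets, errors, dropped, overruns = map(int, match.groups()[1:])
            stats[iface][match.group(1)] = {
                "packets": packets,
                "errors": errors,
                "dropped": dropped,
                "overruns": overruns,
            }
    return stats


def unhealthy(stats):
    """List (iface, direction) pairs with any errors, drops, or overruns."""
    return [
        (iface, direction)
        for iface, directions in stats.items()
        for direction, c in directions.items()
        if c["errors"] or c["dropped"] or c["overruns"]
    ]


if __name__ == "__main__":
    print(unhealthy(parse_counters(SAMPLE)) or "no errors, drops, or overruns")
```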

    15. Network Packet Info by Protocol
        Use netstat -s
        Contains details of network with packet information
            [oracle@rac1]$ netstat -s
            Ip:
                . . . .
            Tcp:
                . . . .
            Udp:
                137338287 packets received
                7376 packets to unknown port received.
                0 packet receive errors
                148822392 packets sent
        Use the ping utility to determine packet loss and timing
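Since the interconnect runs over UDP, the `Udp:` section is the part of `netstat -s` worth tracking. Here is a small sketch that extracts those counters from captured text and derives a receive error rate; the parsing rules assume the indented layout shown on the slide, and the function names are hypothetical.

```python
# Udp section copied from the slide.
SAMPLE = """\
Udp:
    137338287 packets received
    7376 packets to unknown port received.
    0 packet receive errors
    148822392 packets sent
"""


def parse_udp_section(text):
    """Return {label: count} for the lines under the 'Udp:' header."""
    counters = {}
    in_udp = False
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.endswith(":") and not line[:1].isspace():
            in_udp = stripped == "Udp:"  # any other header ends the section
            continue
        if in_udp and stripped:
            count, _, label = stripped.partition(" ")
            if count.isdigit():
                counters[label.rstrip(".")] = int(count)
    return counters


def receive_error_rate(counters):
    """Receive errors as a fraction of packets received (0.0 if none seen)."""
    received = counters.get("packets received", 0)
    errors = counters.get("packet receive errors", 0)
    return errors / received if received else 0.0


if __name__ == "__main__":
    udp = parse_udp_section(SAMPLE)
    print(udp, receive_error_rate(udp))
```

Comparing two snapshots taken some minutes apart is usually more telling than the absolute totals, since the counters accumulate from boot.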

    16. Verify Cluster Configuration
        Make sure cluster connection configuration is correct
            [oracle@rac1]$ cluvfy comp nodecon -n rac1
            Verifying node connectivity...
            Checking node connectivity...
            Node connectivity check passed for subnet "10.10.10.0" with node(s) rac1.
            Node connectivity check passed for subnet "172.16.150.0" with node(s) rac1.

    17. System Monitoring
        CPU utilization - top, mpstat
        Disk I/O times - iostat
        Memory - free
        Kernel messages - /var/log/messages, /var/log/dmesg
        Obtain cluster statistics - crs_stat, srvctl

    18. Tuning

    19. General
        Stress test application on single instance database first
        Simulate I/O load (tools such as Orion)
        Modify OS parameters
        Modify Clusterware parameters
        Modify Database parameters

    20. AWR Report
        Global Cache Load Profile
        Global Cache Efficiency Percentages
        Messaging Statistics
        Consistent Read (CR) and Current Block Segments
        Concentrate on top 5 wait events

    21. Cache Fusion data block & messaging traffic
        Global Cache Load Profile
        ~~~~~~~~~~~~~~~~~~~~~~~~~          Per Second   Per Transaction
                                           ----------   ---------------
        Global Cache blocks received:            4.30              3.65
        Global Cache blocks served:             23.44             19.90
        GCS/GES messages received:             133.03            112.96
        GCS/GES messages sent:                  78.61             66.75
        DBWR Fusion writes:                      0.11              0.10
        Est Interconnect traffic (KB)          263.20
        Calculate Network Traffic from AWR report
        Network traffic received = Global Cache blocks received * DB block size
            = 4.30 * 8192 = 35,226 bytes/sec, about 0.035 MB/sec
        Network traffic generated = Global Cache blocks served * DB block size
            = 23.44 * 8192 = 192,020 bytes/sec, about 0.19 MB/sec
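The traffic arithmetic on this slide can be reproduced in a few lines. This sketch uses the slide's figures and its 8192-byte block size; the 200 bytes per GCS/GES message used for the overall estimate is an assumption (it happens to land close to the report's 263.20 KB line), and the function names are illustrative.

```python
BLOCK_SIZE = 8192   # DB block size in bytes, as used on the slide
MSG_BYTES = 200     # assumed average GCS/GES message size in bytes


def block_traffic_mb(blocks_per_sec, block_size=BLOCK_SIZE):
    """Block traffic in MB/sec (1 MB = 1,000,000 bytes)."""
    return blocks_per_sec * block_size / 1_000_000


def estimated_interconnect_kb(received, served, msgs_received, msgs_sent,
                              block_size=BLOCK_SIZE, msg_bytes=MSG_BYTES):
    """Blocks plus messages per second, expressed in KB/sec (1 KB = 1024 bytes)."""
    total_bytes = (received + served) * block_size
    total_bytes += (msgs_received + msgs_sent) * msg_bytes
    return total_bytes / 1024


if __name__ == "__main__":
    print(f"received:  {block_traffic_mb(4.30):.3f} MB/sec")
    print(f"generated: {block_traffic_mb(23.44):.3f} MB/sec")
    print(f"estimate:  {estimated_interconnect_kb(4.30, 23.44, 133.03, 78.61):.1f} KB/sec")
```

Multiplying the MB/sec figures by 8 gives megabits per second, which is the unit to compare against the interconnect's link speed.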

    22. Global Cache Efficiency Percentages
        Data blocks retrieved from local cache or remote instance
        Global Cache Efficiency Percentages (Target local+remote 100%)
        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        Buffer access - local cache %:    99.12
        Buffer access - remote cache %:    0.75
        Buffer access - disk %:            0.13
        Messaging Statistics
        Statistics on messages sent
        Should be less than 1 millisecond
        Global Cache and Enqueue Services - Messaging Statistics
        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        Avg message sent queue time (ms):             0.4
        Avg message sent queue time on ksxp (ms):     0.2
        Avg message received queue time (ms):         0.0
        Avg GCS message process time (ms):            0.0
        Avg GES message process time (ms):            0.0
        % of direct sent messages:                  49.64
        % of indirect sent messages:                24.73
        % of flow controlled messages:              25.64
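Both targets on this slide (local plus remote cache access near 100%, message times under 1 millisecond) are easy to check mechanically. A minimal sketch with the slide's values hard-coded; the dictionary and function names are illustrative.

```python
# Average messaging times in milliseconds, copied from the slide.
MESSAGING_MS = {
    "Avg message sent queue time": 0.4,
    "Avg message sent queue time on ksxp": 0.2,
    "Avg message received queue time": 0.0,
    "Avg GCS message process time": 0.0,
    "Avg GES message process time": 0.0,
}


def cache_served_pct(local_pct, remote_pct):
    """Share of buffer accesses satisfied from a cache, local or remote."""
    return local_pct + remote_pct


def slow_messages(timings_ms, threshold_ms=1.0):
    """Return the timings that meet or exceed the 1 ms guideline."""
    return {name: ms for name, ms in timings_ms.items() if ms >= threshold_ms}


if __name__ == "__main__":
    print(f"served from cache: {cache_served_pct(99.12, 0.75):.2f}%")
    print("slow messaging stats:", slow_messages(MESSAGING_MS) or "none")
```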

    23. Segments by CR Blocks Received
        -> Total CR Blocks Received: 329
        -> Captured Segments account for 84.2% of Total

                   Tablespace                       Subobject  Obj.   CR Blocks
        Owner      Name       Object Name           Name       Type    Received  %Total
        ---------- ---------- --------------------  ---------  -----  ---------  ------
        PAYMENTECH DATA       BATCH                            TABLE         90   27.36
        SYS        SYSTEM     SMON_SCN_TIME                    TABLE         25    7.60
        PAYMENTECH DATA       IDX_BATCH_ORDER_ID               INDEX         21    6.38
        SYS        SYSAUX     SYS_IOT_TOP_8782                 INDEX         16    4.86
        SYS        SYSAUX     WRI$_ADV_PARAMETERS_             INDEX         16    4.86

        Segments by Current Blocks Received
        -> Total Current Blocks Received: 2,667
        -> Captured Segments account for 96.7% of Total

                   Tablespace                       Subobject  Obj.     Current
        Owner      Name       Object Name           Name       Type  Blocks Rcvd  %Total
        ---------- ---------- --------------------  ---------  -----  ----------  ------
        SCECOMM    DATA       ACCOUNT_SERVICE                  TABLE         461   17.29
        SCECOMM    LDATA      ACCOUNT                          TABLE         377   14.14
        SCECOMM    LDATA      PAYMENT_INSTRUMENT_F             TABLE         283   10.61
        SCECOMM    LDATA      IDX_ACCT_EMAIL                   INDEX         211    7.91
        SCECOMM    LDATA      PK_ACCOUNT_ID                    INDEX         191    7.16

    24. RAC Wait Events
        Broader category called Cluster Wait Class
        Characterized as Current or CR
        Current - most recent version of a block, typically requested in order to modify it
        Consistent Read (CR) - read-consistent version of a block, requested for read access

    25. GC Current Block 2-way
        Occurs during cache fusion process
        Instance A requests block from master instance B
        If the block is available on B then it is sent to A

    26. GC Current Block 3-way
        Maximum three hops, not dependent on number of nodes in cluster
        Instance A requests block from master instance B
        B does not have block, directs to instance holding block
        or B directs request to disk
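The 2-way and 3-way paths can be pictured as a short list of messages between instances. The toy model below is only a sketch of the flow as these slides describe it, not of Oracle's actual protocol; the function name, the return shape, and the assumption that the requester is never the master are all illustrative.

```python
def resolve_request(requester, master, holder):
    """Return (kind, hops) for one block request.

    requester: instance asking for the block (assumed not to be the master)
    master:    instance that masters the block's resource
    holder:    instance caching the block, or None if no instance has it
    """
    if holder is None:
        # Master has nothing to ship; the requester is directed to disk.
        return "disk read", [(requester, master), (master, requester)]
    if holder == master:
        # Master holds the block and sends it straight back.
        return "2-way", [(requester, master), (master, requester)]
    # Master forwards the request; the holder ships the block.
    return "3-way", [(requester, master), (master, holder), (holder, requester)]


if __name__ == "__main__":
    for holder in ("B", "C", None):
        kind, hops = resolve_request("A", "B", holder)
        print(f"holder={holder}: {kind}, {len(hops)} hops")
```

However many instances the cluster has, the list never grows past three entries, which is the point the slide makes about hop count.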

    27. More Global Cache Waits
        GC CR/current block congested
            LMS not keeping up under heavy load
            Block transfer process delayed, indicates low CPU resources
        GC CR/current block busy
            Delay before block is sent, indicates write contention
        GC current grant busy
            Permission to access block is granted, but is blocked
        GC CR/current block request
            Placeholder event, active while waiting for a block

    28. Block Access Cost
        Cost of retrieving the block, made up of the following:
        Message propagation delay
        Inter process CPU
        Block Server Load

    29. Block Access Latency
        Factors affecting request processing time:
        Operating System
        Oracle processing time
        Available Interconnect network throughput
        CPU load on other nodes

    30. Operating System
        Block latency related to CPU utilization
        LMS process is CPU intensive
        Typically one LMS for every 2 CPUs
        Waits - GC CR/current block congested
        Apply OS and kernel patches

    31. I/O Capacity
        High I/O can be a result of:
            Node addition, increased usage, database size
            Bad queries
            Dissimilar disks within disk group
        Wait event "gc cr block busy" (Global Cache Consistent Read) is an indicator

    32. Best Practices

    33. General
        Ensure adequate resources on surviving nodes
        Benchmark cluster configuration
        Load test on single instance first
        Avoid serialization in application design
        Apply few changes at a time

    34. Network
        Use Jumbo Frames (JF) for Interconnect, increased MTU
        JF lowers CPU utilization, reduces bonding overhead
        Fewer frames needed for large I/Os
        All components in network must support JF
        Monitor dropped packets, timeouts, buffer overflows, transmit and receive errors
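The "fewer frames" point can be made concrete by counting IP fragments for one database block. This is a back-of-the-envelope sketch, assuming an 8 KB block sent as a single UDP datagram over IPv4 with no IP options, a jumbo MTU of 9000 bytes, and no extra Oracle message headers; the function name is illustrative.

```python
import math

IP_HEADER = 20   # bytes, IPv4 header without options
UDP_HEADER = 8   # bytes


def ip_fragments(payload_bytes, mtu):
    """Number of IP packets needed to carry one UDP datagram of this payload."""
    datagram = payload_bytes + UDP_HEADER
    if datagram <= mtu - IP_HEADER:
        return 1  # fits in a single packet, no fragmentation
    # Every fragment except the last must carry a multiple of 8 bytes.
    per_fragment = (mtu - IP_HEADER) // 8 * 8
    return math.ceil(datagram / per_fragment)


if __name__ == "__main__":
    for mtu in (1500, 9000):
        print(f"MTU {mtu}: {ip_fragments(8192, mtu)} packet(s) per 8 KB block")
```

Under these assumptions an 8 KB block needs six packets at the standard MTU and one with jumbo frames, which is where the reduced per-packet CPU overhead comes from.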

    35. Hardware
        Redundancy - server, storage, network components
        Add HBA cards, switches, disk array controllers
        Load balance LUNs across HBA ports
        Enable hyperthreading at the OS level
        Use asynchronous I/O
        Set "aio-max-size" to 1,048,576, "aio-max-nr" to 56k

    36. Monitoring & Tuning
        Use OEM Database Control or Grid Control
        View overall system status, status of cluster, alert logs
        Monitor throughput across Interconnect
        Make decisions to add or redistribute resources
        Tune SQL plans and schemas for better optimization

    37. New Features

    38. New Features in 10gR2 & 11g
        FAN - Fast Application Notification, aware of current cluster configuration, connects only to instances able to respond
        ASM options - SYSASM role, new ASMCMD commands
        AWM - Automatic Workload Management, manages distribution for optimal performance, services restored onto surviving nodes
        Extended distance (stretch) clusters, physically separate
        CRS - Now provides HA for non-Oracle applications

    39. Conclusion

    40. Items Learned in this Session
        RAC databases are complex in nature
        Scalability, availability start with initial configuration
        Proper configuration is essential
        Monitoring and tuning require RAC skills and knowledge
        DBA needs specialized training, experience

    41. Where to Find More Information
        Additional sessions here at Collaborate08
        Plenty of information available on the internet
        Oracle Technology Network http://www.oracle.com/technology/index.html
        Ask me: diane@servercare.com, 1-888-918-6309

    42. Questions?
        Covered many RAC topics today
        Additional questions, please contact me: diane@servercare.com, 1-888-918-6309

    43. Session #121: RAC 11g Best Practices & Tuning
        Thank You!
        Please fill out evaluations!
