11gR2 RAC/Grid Clusterware: Best Practices, Pitfalls, and Lessons Learned
Presented at the DOUG meeting on 10/21/2010 in Dallas, TX
Ramkumar Rajagopal
Introduction
• DBARAC is a specialty database consulting firm based in Austin, Texas, with expertise in a variety of industries.
• Our people are experts in Oracle Real Application Clusters (RAC) solutions for managing large database systems.
• We provide proactive database management services including, but not limited to, in-house and on-shore DBA support, remote DB support, database maintenance, and backup and recovery.
• Our DBA experts provide specialized services in the areas of:
  • Root cause analysis
  • Capacity planning
  • Performance tuning
  • Database migration and consolidation
  • Broad industry expertise
  • High-availability RAC database specialists
  • End-to-end database support
Introduction – Presenter
• Senior Database Consultant, DBARAC
• Oracle Database/Applications DBA since 1995
• Dell, JP Morgan Chase, Verizon
• Presenter at Oracle OpenWorld 2007
• Author of Dell Power Solutions articles
AGENDA
• Introduction
• Node eviction issue in 10g
• What is "11gR2 Grid Clusterware"?
• The Challenges
• What's different today?
• "We've seen this before, smart guy…"
• Architecture and Capacity Planning
• Upgrade Paths
• Pre-installation best practices
• Grid Clusterware Installation
• Clusterware startup sequence
• Post-install steps
• RAC database build steps
• Summary
• Q&A
Why Is a Node Evicted?
• Split-brain condition
• I/O fencing
• CRS keeps the lowest-numbered node up
• Node eviction detection
Root Causes of Node Eviction
• Network heartbeat lost
• Voting disk problems
• cssd is not healthy
• oprocd
• hangcheck-timer
• cssd and oclsomon race to suicide
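To see the CSS thresholds that drive these evictions on a running cluster, you can query them directly; a quick check, assuming $GRID_HOME points at the clusterware home:

    $GRID_HOME/bin/crsctl get css misscount    # network heartbeat timeout, in seconds
    $GRID_HOME/bin/crsctl get css disktimeout  # voting disk I/O timeout, in seconds
    $GRID_HOME/bin/crsctl get css reboottime   # time allowed for an evicted node to reboot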
11gR2 Grid Clusterware Improvements
• Node eviction algorithm is enhanced
• Resolves a split-brain problem without rebooting the node
• Oracle High Availability Services daemon (ohasd)
• Will still reboot in some cases
• Faster relocation of services on node failure in 11gR2
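A hedged way to observe this on your own cluster is to check stack health per node and the local resources that ohasd manages; the commands below assume an 11gR2 $GRID_HOME:

    $GRID_HOME/bin/crsctl check cluster -all  # CRS/CSS/EVM status on every node
    $GRID_HOME/bin/crsctl stat res -t -init   # local resources managed by ohasd (cssd, crsd, ...)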
9i/10g RAC Scenario
• Several separate database versions
• Several servers
• Space/resource issues
• Fewer resources
• Provisioning takes time
Top Concerns
• How many are using 11gR2 Grid Clusterware?
• Do you have more than one mission-critical database within a single RAC cluster?
• Can you allocate resources dynamically to handle peak volumes of various application loads without downtime?
• Issues with using shared infrastructure:
  • Will my database availability and recovery suffer?
  • Will my database performance suffer?
  • How do we manage a large clustered environment to meet SLAs for several applications?
Why 11gR2 Grid CRS?
• 11gR2 Grid Clusterware is…
  • An architecture
  • An IT strategy
• Clusterware & ASM storage deployed together
• Many, many Oracle Database instances
• Drives consolidation
Challenges
• Skilled resources
• Meeting SLAs
• End-to-end testing not possible
• Security controls
• Capacity issues
• Higher short-term costs
What's Different Today?
• 11gR2 Grid CRS & ASM support:
  • 11gR2, 11gR1, 10gR2, and 10gR1 single instances
• Powerful servers, 64-bit OS
• Provisioning framework to deploy:
  • Grid Control
11gR2 RAC DB Architecture Planning [architecture diagram]
Capacity Planning
• What are the current requirements?
• What are the future growth requirements in the next 6–12 months?
• Estimate the hardware requirements to meet the demand
• Data retention requirements
• Archiving and purging
Capacity Planning Metrics
• Database metrics for capacity planning:
  • CPU & memory utilization
  • I/O rates
  • Device utilization
  • Queue length
  • Storage utilization
  • Response time
  • Transaction rate
  • Network packet loss
  • Network bandwidth utilization
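Most of the OS-level metrics above can be sampled while the workload runs; a minimal Linux sketch, assuming the sysstat package is installed:

    sar -u 5 3        # CPU utilization: 3 samples at 5-second intervals
    iostat -x 5 3     # per-device I/O rates, queue length, and utilization
    sar -n DEV 5 3    # network throughput per interface
    df -h             # storage utilization per filesystem

Response time and transaction rate come from the database side, for example from AWR or Statspack reports.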
Capacity Planning Strategy
• Examine existing engagement processes
• Examine existing capacity of servers/storage
• Define hardware/database scalability
• Provisioning for adding capacity
• Integration testing
• Large clustered database
• SLA requirements
Comparison – 10g vs. 11gR2
• Server consolidation
• Database consolidation
• Instance consolidation
• Storage consolidation
AGENDA so far…
• Introduction
• Node eviction issue in 10g
• What is "11gR2 Grid Clusterware"?
• The Challenges
• What's different today?
• "We've seen this before, smart guy…"
• Architecture and Capacity Planning
• Upgrade Paths
• Pre-installation best practices
• Grid Clusterware Installation
• Clusterware startup sequence
• Post-install steps
• RAC database build steps
• Summary
• Q&A
Upgrade Paths
• Out-of-place clusterware upgrade
• Rolling upgrade:
  • From Oracle 10gR2 – 10.2.0.3 or later
  • From Oracle 11gR1 – 11.1.0.6 or later
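Before and during the upgrade it helps to compare the installed and active clusterware versions; in a rolling upgrade the two differ until the last node completes (commands assume an 11gR2-style $GRID_HOME):

    $GRID_HOME/bin/crsctl query crs softwareversion  # version installed on this node
    $GRID_HOME/bin/crsctl query crs activeversion    # version the cluster is actually running at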
Pre-installation Best Practices
• Network requirements
• Cluster hardware requirements
• ASM storage requirements
• Verification checks
Pre-installation Best Practices – Network Configuration
• SCAN – Single Client Access Name
• Failover – faster relocation of services
• Better load balancing
• MTU packet size of the network adapter (NIC)
• DNS forwarder, zone entries, and reverse lookup
• Ping tests (see the sketch below)
• Two dedicated interconnect switches for redundant interconnects
• Run cluvfy
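Two quick sanity checks before installing; the SCAN name and private host name below are hypothetical placeholders:

    # The SCAN should resolve round-robin to three addresses in DNS
    nslookup cluster01-scan.example.com

    # Verify the interconnect MTU end to end; with a 9000-byte MTU,
    # 8972 = 9000 minus 20 (IP header) minus 8 (ICMP header) bytes
    ping -c 2 -M do -s 8972 node2-priv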
Pre-installation – Network – SCAN Configuration [screenshot]
Pre-installation Network – SCAN VIP Troubleshooting
• SCAN configuration:
  • $GRID_HOME/bin/srvctl config scan
• SCAN listener configuration:
  • $GRID_HOME/bin/srvctl config scan_listener
• SCAN listener resource status:
  • $GRID_HOME/bin/crsctl stat res -w "TYPE = ora.scan_listener.type"
• $GRID_HOME/network/admin/listener.ora
• local_listener and remote_listener parameters
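Two further checks that pair with the commands above — where the SCAN VIPs and listeners are currently running, and whether the instance is registering with them (a sketch, run from the grid and database homes respectively):

    $GRID_HOME/bin/srvctl status scan            # node placement of the SCAN VIPs
    $GRID_HOME/bin/srvctl status scan_listener   # node placement of the SCAN listeners

    # From the database home, confirm listener registration:
    sqlplus -s / as sysdba <<EOF
    show parameter local_listener
    show parameter remote_listener
    EOF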
Pre-installation – Cluster Hardware Requirements
• Same OS/kernel version on all servers in the cluster
• Minimum 32 GB of RAM
• Minimum swap space: 16 GB
• Minimum Grid home free space: 16 GB
• Allocate 32 GB of space for each Oracle home directory (32 GB per database)
• Allocate adequate disk space for centralized backups
• Allocate adequate storage for ASM diskgroups – DATA and FRA
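These minimums are easy to verify up front on each node; a small sketch (the /u01 mount point is an assumption — substitute wherever your Grid and Oracle homes will live):

    grep MemTotal /proc/meminfo    # expect >= 32 GB per the sizing above
    grep SwapTotal /proc/meminfo   # expect >= 16 GB
    df -h /u01                     # free space for the Grid and Oracle homes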
Cluster Hardware Requirements, continued…
• Most cases: use UDP over 1 Gigabit Ethernet
• For large databases: InfiniBand/IP or 10 Gigabit Ethernet
• Use OS bonding/teaming to "virtualize" the interconnect
• Set UDP send/receive buffers high enough (see the sketch below)
• Crossover cables are not supported
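For the UDP buffer sizing, the values below are the 11gR2 install-guide minimums for Linux; treat them as a starting point and confirm against your platform's documentation:

    # /etc/sysctl.conf
    net.core.rmem_default = 262144
    net.core.rmem_max = 4194304
    net.core.wmem_default = 262144
    net.core.wmem_max = 1048576

    # Apply without a reboot:
    sysctl -p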
Pre-installation – ASM Storage Configuration
• In 11gR2, ASM diskgroups are used for:
  • Grid infrastructure – OCR, voting disk, and ASM spfile
  • Database – DATA and FRA
• OCR and voting disks for Grid Clusterware:
  • OCR can now be stored in Automatic Storage Management (ASM)
  • Add a second diskgroup for the OCR: ./ocrconfig -add +DATA02
  • Change the compatibility of the new diskgroup to 11.2 as follows:
    ALTER DISKGROUP DATA02 SET ATTRIBUTE 'compatible.asm'='11.2';
    ALTER DISKGROUP DATA02 SET ATTRIBUTE 'compatible.rdbms'='11.2';
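Once the diskgroups are in place, a quick verification of where the cluster registry and voting files actually live (a sketch, assuming $GRID_HOME is set):

    $GRID_HOME/bin/ocrcheck                    # OCR locations and integrity (run as root)
    $GRID_HOME/bin/crsctl query css votedisk   # voting disk placement inside ASM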
AGENDA so far…
• Introduction
• Node eviction issue in 10g
• What is "11gR2 Grid Clusterware"?
• The Challenges
• What's different today?
• "We've seen this before, smart guy…"
• Architecture and Capacity Planning
• Upgrade Paths
• Pre-installation best practices
• Grid Clusterware Installation
• Clusterware startup sequence
• Post-install steps
• RAC database build steps
• Q&A
Hardware/Software Details
• 10gR2 architecture – 9 database servers, 25 TB storage
  • Original database version: 10.2.0.5
  • Original RAC cluster version: 10.2.0.1
  • Original operating system: Red Hat Linux 5 AS, 64-bit
  • Storage type: ASM & raw storage
• 11gR2 Grid architecture – 4 database servers, 40 TB storage
  • New database version: 11.2.0.2
  • New Grid Clusterware/ASM version: 11.2.0.2
  • New operating system: Red Hat Linux 5 AS, 64-bit
• Data migration steps use RMAN backup and restore and Data Pump export dump files
11gR2 Migration Steps
• Install 11gR2 Grid Clusterware and ASM
• Install 11gR2 database binaries for each database separately
• Create the 11gR2 database
• Add additional ASM diskgroups
• Install 11gR1/10gR2 database binaries
• Create 11gR1/10gR2 databases
• Take a backup
• Restore the data (a Data Pump sketch follows)
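For the databases moved with Data Pump, the shape of the export/import is sketched below; the directory object, dump file names, and parallelism are hypothetical placeholders, and the RMAN backup/restore path mentioned earlier follows the same backup-then-restore pattern:

    # Full export from the source 10gR2 database
    expdp system DIRECTORY=dpump_dir DUMPFILE=full_%U.dmp LOGFILE=full_exp.log FULL=y PARALLEL=4

    # Full import into the newly created 11gR2 database
    impdp system DIRECTORY=dpump_dir DUMPFILE=full_%U.dmp LOGFILE=full_imp.log FULL=y PARALLEL=4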
Pre-verification Checks – cluvfy
• Before clusterware installation:
  ./cluvfy stage -pre crsinst -n node1,node2,node3 -verbose
• Before database installation:
  ./cluvfy stage -pre dbinst -n node1,node2,node3 -fixup -verbose
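The same tool can confirm the stack after the install completes; a post-check example with the same hypothetical node names:

    ./cluvfy stage -post crsinst -n node1,node2,node3 -verbose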
11gR2 Grid Clusterware Installation – Steps 1–14 [installer screenshots]
Runfixup.sh
root> /tmp/runfixup.sh
Response file being used is: /tmp/CVU_11.2.0.1.0_grid/fixup.response
Enable file being used is: /tmp/CVU_11.2.0.1.0_grid/fixup.enable
Log file location: /tmp/CVU_11.2.0.1.0_grid/orarun.log
Setting Kernel Parameters...
fs.file-max = 327679
fs.file-max = 6815744
net.ipv4.ip_local_port_range = 9000 65500
net.core.wmem_max = 262144
net.core.wmem_max = 1048576
uid=501(grid) gid=502(oinstall) groups=502(oinstall),503(asmadmin),504(asmdba)
Step 15 [installer screenshot]