Oracle 10g RAC Scalability – Lessons Learned Bert Scalzo, Ph.D. Bert.Scalzo@Quest.com
About the Author • Oracle Dev & DBA for 20 years, versions 4 through 10g • Worked for Oracle Education & Consulting • Holds several Oracle Masters (DBA & CASE) • BS, MS, PhD in Computer Science and also an MBA • LOMA insurance industry designations: FLMI and ACS • Books • The TOAD Handbook (March 2003) • Oracle DBA Guide to Data Warehousing and Star Schemas (June 2003) • TOAD Pocket Reference 2nd Edition (June 2005) • Articles • Oracle Magazine • Oracle Technology Network (OTN) • Oracle Informant • PC Week (now eWEEK) • Linux Journal • www.Linux.com
About Quest Software Used in this paper
Project Formation • This paper is based upon collaborative RAC research efforts between Quest Software and Dell Computers. • Quest: • Bert Scalzo • Murali Vallath – author of RAC articles and books • Dell: • Anthony Fernandez • Zafar Mahmood • Also an extra special thanks to Dell for allocating a million dollars worth of equipment to make such testing possible
Project Purpose • Quest: • To partner with a leading hardware vendor • To field test and showcase our RAC enabled software • Spotlight on RAC • Benchmark Factory • TOAD for Oracle with DBA module • Dell: • To write a Dell Power Edge Magazine article about the OLTP scalability of Oracle 10g RAC running on typical Dell servers and EMC storage arrays • To create a standard methodology for all benchmarking of database servers to be used for future articles and for lab testing & demonstration purposes
OLTP Benchmarking • TPC benchmark (www.tpc.org) • TPC Benchmark™ C (TPC-C) is an OLTP workload. It is a mixture of read-only and update intensive transactions that simulate the activities found in complex OLTP application environments. It does so by exercising a breadth of system components associated with such environments, which are characterized by: • • The simultaneous execution of multiple transaction types that span a breadth of complexity • • On-line and deferred transaction execution modes • • Multiple on-line terminal sessions • • Moderate system and application execution time • • Significant disk input/output • • Transaction integrity (ACID properties) • • Non-uniform distribution of data access through primary and secondary keys • • Databases consisting of many tables with a wide variety of sizes, attributes, and relationships • • Contention on data access and update Excerpt from “TPC BENCHMARK™ C: Standard Specification, Revision 3.5”
Create the Load - Benchmark Factory The TPC-C like benchmark measures on-line transaction processing (OLTP) workloads. It combines read-only and update intensive transactions simulating the activities found in complex OLTP enterprise environments.
Setup Planned vs. Actual • Planned: • Red Hat 4 Update 1 64-bit • Oracle 10.2.0.1 64-bit • Actual: • Red Hat 4 Update 1 32-bit • Oracle 10.1.0.4 32-bit • Issues: • Driver problems with 64-bit (no real surprise) • Some software incompatibilities with 10g R2 • Known ASM issues require 10.1.0.4, not earlier
Sweet Spot Lessons Learned • Cannot rely solely on the BMF transactions-per-second graph • Throughput can still be increasing while the server is beginning to thrash • Need to monitor the database server with vmstat and other tools • Must stop just shy of bandwidth limits (RAM, CPU, IO) • Must factor in multi-node overhead, and reduce accordingly • Prior to 10g R2, better to rely on app (BMF) load balancing • If you’re not careful on this step, you’ll run into roadblocks that either invalidate your results or keep the test from scaling!
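One way to act on the vmstat advice above is to scan its output for swap activity and vanishing CPU idle time. The sketch below is only an illustration, not the tooling used in these tests: it parses standard Linux `vmstat` text, finds the `si`/`so` (swap-in/swap-out) and `id` (CPU idle) columns by header name, and flags the samples that look like thrashing. The function name `find_pressure`, the `min_idle` threshold, and the sample numbers are all hypothetical.

```python
# Illustrative vmstat text; the numbers are made up, not taken from the tests.
SAMPLE = """\
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 2  0      0 812340  90212 2104332    0    0    12    40 1020 2100 35 10 50  5
 9  3 204800  10240   1024  51200  850  920  4000  5200 3000 9000 70 25  2  3
"""


def find_pressure(vmstat_text, min_idle=10):
    """Return (sample_index, reasons) for samples that swap or have low CPU idle.

    min_idle is a hypothetical threshold (percent idle), not a tested value.
    """
    header = None
    flagged = []
    sample = 0
    for line in vmstat_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if "si" in fields and "so" in fields:
            header = fields  # column-name row; vmstat repeats it periodically
            continue
        # Skip the banner row and anything that is not a full numeric sample.
        if header is None or len(fields) != len(header):
            continue
        if not all(f.isdigit() for f in fields):
            continue
        row = dict(zip(header, (int(f) for f in fields)))
        reasons = []
        if row["si"] > 0 or row["so"] > 0:
            reasons.append("swapping")
        if row["id"] < min_idle:
            reasons.append("low CPU idle")
        if reasons:
            flagged.append((sample, reasons))
        sample += 1
    return flagged


if __name__ == "__main__":
    for index, reasons in find_pressure(SAMPLE):
        print(f"sample {index}: {', '.join(reasons)}")
```

In practice the text would come from running `vmstat` at a fixed interval on each database server while the load test ramps up, so that a rising TPS graph can be checked against what the server itself reports.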
Some Speed Bumps Along the Way As illustrated below, when we reached our four-node tests we found that the CPUs on nodes racdb1 and racdb3 reached 84% and 76% respectively. Root-cause analysis traced the problem to a temporary overload of users on these servers and to the ASM response time.
Smooth Sailing After That Shown below are the cluster-level latency charts from Spotlight on RAC during our eight-node run. They indicated that the interconnect latency was well within expectations and on par with typical industry network latency numbers.
Full Steam Ahead! As shown below, ASM performed very well at this user load. With 10 instances and over 5000 users, ASM service time was excellent and I/O throughput was high, topping 2500 I/Os per second!
Final Results Other than some basic monitoring to make sure that all is well and the tests are working, there’s really not very much to do while these tests run – so bring a good book to read. The final results are shown below.
Projected RAC Scalability Using the 6 node graph results to project forward, the figure below shows a reasonable expectation in terms of realizable scalability – where 17 nodes should equal nearly 500 TPS and support about 10,000 concurrent users.
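The projection above implies rough per-node figures that are easy to sanity-check. The sketch below divides the projected totals from this slide (nearly 500 TPS and about 10,000 users at 17 nodes) by the node count and scales them linearly. Strictly linear scaling and the function name `linear_estimate` are assumptions made for illustration; the projection itself was read from the test graph.

```python
# Totals taken from the slide's projection; everything else is an assumption.
PROJECTED_NODES = 17
PROJECTED_TPS = 500
PROJECTED_USERS = 10_000


def linear_estimate(nodes):
    """Scale the 17-node projection linearly to another node count.

    Returns (tps, users). Linear scaling is an assumption; real clusters
    lose some throughput to multi-node overhead.
    """
    tps_per_node = PROJECTED_TPS / PROJECTED_NODES
    users_per_node = PROJECTED_USERS / PROJECTED_NODES
    return round(nodes * tps_per_node, 1), round(nodes * users_per_node)


if __name__ == "__main__":
    print(f"per node: ~{PROJECTED_TPS / PROJECTED_NODES:.1f} TPS, "
          f"~{PROJECTED_USERS / PROJECTED_NODES:.0f} users")
    tps, users = linear_estimate(PROJECTED_NODES)
    print(f"{PROJECTED_NODES} nodes: ~{tps} TPS, ~{users} users")
```

Comparing such a straight-line estimate against measured results at each node count is a quick way to see how much multi-node overhead is costing.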
Next Steps … • Since we were limited by memory in the first iteration of testing, we upgraded each database server from 4 to 8 GB RAM • Now able to scale up to 50% more users per node • Now doing zero percent paging and/or swapping • But – now CPU bound • Next step, replace each CPU with a dual-core Pentium • Increase from 4 CPUs (2 real/2 virtual) to 8 CPUs • Should be able to double users again ??? • Will we now reach IO bandwidth limits ??? • Will be writing about those results in future Dell articles…
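The upgrade reasoning above is simple multiplication, sketched below as a what-if calculation. The baseline users-per-node figure is a hypothetical input (the slide does not state one), the 50% gain is the RAM result reported above, and the doubling from the CPU upgrade is only the hoped-for outcome that the slide itself marks with question marks.

```python
def users_per_node_after_upgrades(baseline_users, ram_gain=0.50, cpu_factor=2.0):
    """Return (after_ram, after_cpu) users per node.

    baseline_users: hypothetical users per node before any upgrade.
    ram_gain: fractional gain from the 4 GB to 8 GB RAM upgrade (50% per the slide).
    cpu_factor: hoped-for multiplier from the dual-core CPU upgrade (unverified).
    """
    after_ram = baseline_users * (1 + ram_gain)
    after_cpu = after_ram * cpu_factor
    return after_ram, after_cpu


if __name__ == "__main__":
    # 400 is an arbitrary example baseline, not a measured value.
    after_ram, after_cpu = users_per_node_after_upgrades(400)
    print(f"after RAM upgrade: {after_ram:.0f} users/node")
    print(f"after CPU upgrade (if doubling holds): {after_cpu:.0f} users/node")
```

If the CPU upgrade does remove the CPU bottleneck, the next limit (possibly IO bandwidth, as the slide asks) would cap `cpu_factor` below 2.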
Questions … Thanks for coming