
Oracle Grid Computing: Trending for Capacity Planning



Presentation Transcript


  1. Oracle Grid Computing: Trending for Capacity Planning. Ashish Rege, SEI. Session # S307772

  2. What DBAs and sysadmins need
  • Database administrators and sys administrators are called on every day to make capacity-planning decisions.
  • Which database has trended high for CPU, memory, I/O, etc. over the last year, month, week, or day versus other databases on the same server?
  • Do these trends show unique cyclical patterns by year, month, week, or day for different applications?
  • Do we see new functionality, additional users, or increased concurrency over time?
  • Do we have the capacity to add one or more databases to existing servers?
  • Where do we re-allocate databases at the next opportunity, and where do we see spare capacity?
  • Do we need to buy new servers and factor this into next year's budget?

  3. Approach and Toolset
  • Top-down and bottom-up approach, with a sideways (cross-tier) view as well
  • SA-DBA-middleware cross-functional analysis
  • A toolset that can help trend CPU, memory, and elapsed time:
  • OS extended stats
  • OEM repository DB/host performance metrics
  • Oracle Services stats
  • App-DB-integration elapsed-time/time-out metrics
  • Apache logs in a fact-dimensional model
  • Batch job-stream runtime metrics
  • Business measures, e.g. positions, txn counts, tax lots, fees, etc.
  • Fill in the gaps with data points to construct a complete picture

  4. Requirements summary
  • Database administrators and sys administrators need consolidated data points that provide a holistic "Capacity Trend Analysis", from intraday to multiyear granularity, via a web interface, for the databases and their host servers from a CPU, memory, I/O, wait-bottleneck, throughput, and efficiency perspective.
  • Stack up trends for the different databases on a server on the same graph to enable side-by-side comparison and better awareness of percentage usage by database/application and its alignment with the business cycle.
  • Help trend the uptick or downward spiral of business measures, and correlate those with database and server statistics to model what-if analysis.
  • Provide end-to-end trending for:
  • UI response times
  • Batch jobs
  • Processed business measures

  5. Now to the details
  • OEM Grid Control captures metrics at a 15-minute interval; custom Perl scripts transfer this data to a trending utility that graphs it on the web, with reporting. A sample flat-file layout follows this slide.
  • Work at a grain of intraday to multiyear for performance counters such as CPU consumption, memory, I/O, throughput, efficiency, and wait stats for the different databases.
  • Capture metadata into flat files:
  • (1) Details per database
  • (2) Comparative numbers across different databases on the same or different servers
  • (3) Supplement the above with details from Statspack
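  Orca consumes whitespace-separated text files whose first line names the columns (the orcallator.cfg groups on slides 19 and 20 use column_description first_line and a timestamp column). A hypothetical hourly flat file from the OEM extract, with assumed column names and epoch-second timestamps, might look like:

      timestamp cpuUtil memUsedPct totIO
      1236708000 42.5 61.2 1830
      1236711600 47.1 63.0 2104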

  6. Putting it all together

  7. Reporting Classifications
  • The reporting that comes out of OEM is at two levels:
  • Database statistics
  • Host statistics, which also include a side-by-side comparison of the Oracle databases on that host
  • These statistics have associated flat files extracted out of OEM to feed the data to Orca/RRDtool, which generates the graphs/trends.
  • These graphs have proven very useful in:
  • Re-alignment decisions
  • Budget discussions around shared infrastructure resources for different cost centers

  8. Orca/RRDtool
  Orca, built on RRDtool, is a tool for plotting arbitrary data from text files onto a directory on a web server. It has the following features:
  • Plots data from one or more files into the same or different plots
  • Creates an HTML tree of HTML and image (PNG or GIF) files
  • Creates an index of URL links listing all available targets
  • Creates an index of URL links listing all different plot types
  • No separate CGI set-up required
  • Can be run under cron, or it can sleep itself, waiting for file updates based on when each file was last updated
  • Configuration-file based
  • Reads arbitrarily formatted text or binary data files
  • Watches data files for updates and sleeps between reads
  • Finds new files at specified times
  • Remembers the last modification times of files so they do not have to be reread continuously
  • Allows arbitrary grouping of data from different sources
  • Allows arbitrary math to be performed on data read from one file

  9. Orca architecture
  • Out of the box, Orca provides statistics for the host.
  • On clients, orcallator.se, a component of the SE Toolkit, collects data every 5 minutes and dumps it into orca's home directory; orcallator.se has a startup script in /etc/init.d. The data lands in /home/orca/data/<hostname>.
  • A crontab entry for the orca user polls each client every 5 minutes, grabs the current data, and prunes old data. This is done via scp from orca's crontab on the server, driven by a list of hosts on the Orca server at /apps/orca/xfer/hostlist:
  scp -r -p -v orca@$host:/home/orca/data/* /apps/orca/orcallator
  • Orca's crontab on the Orca server calls /apps/orca/xfer/pull.sh, which processes every host listed in /apps/orca/xfer/hostlist; a sketch follows.
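  pull.sh itself is not reproduced in this transcript. A minimal sketch of what such a script might look like, assuming the hostlist holds one hostname per line, is:

      #!/bin/sh
      # Hypothetical sketch of /apps/orca/xfer/pull.sh: pull orcallator data
      # from every client named in hostlist into the tree Orca watches.
      HOSTLIST=/apps/orca/xfer/hostlist
      DEST=/apps/orca/orcallator
      while read host; do
        [ -z "$host" ] && continue                        # skip blank lines
        # /dev/null keeps scp from consuming the loop's stdin
        scp -r -p "orca@$host:/home/orca/data/*" "$DEST/" < /dev/null
      done < "$HOSTLIST"

  On the server this would be driven by a crontab entry such as 0,5,10,15,20,25,30,35,40,45,50,55 * * * * /apps/orca/xfer/pull.sh (the schedule is an assumption based on the 5-minute polling described above).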

  10. Out-of-the-box OS performance statistics
  • The Orca server app simply checks a directory tree (/apps/orca/orcallator) every five minutes and graphs any new data that has appeared there.
  • Before adding a client, the orca user on the Orca server must ssh to the new client (ssh <client>) and, when prompted, accept the host key into known_hosts; unless this is done for each client added, no data transfer can succeed.
  • Then add the client to the file /apps/orca/xfer/hostlist. This file contains the list of all clients and is used by the script pull.sh, driven by orca's crontab (sketched after the previous slide). Orca's crontab also prunes some files on each client. Clients are completely passive: orcallator.se dumps performance data into orca's home directory on each client, and the server collects and processes it.
  • Orca produces HTML files, along with an index.html. A standard Apache server is required to publish the pages.
  • The URL for viewing Orca is: http://<orca server IP>/orca/

  11. OEM custom statistics
  • Extract performance data out of the OEM repository into flat files via a custom script running at an hourly frequency. These flat files are dropped into the same directory that Orca checks and processes (/apps/orca/orcallator), so the data is generated right on the Orca server, which connects remotely to the OEM repository.
  • Thus any data can be plotted, including business measures. It is that simple.
  • The script uses the Perl DBI with DBD::Oracle to query the OEM Grid repository, joining tables such as sysman.MGMT_TARGETS, sysman.MGMT_METRICS, and sysman.MGMT_METRICS_1HOUR and extracting the data into files; a minimal sketch follows.
  • A timestamp is written to a snapfile that tracks the time intervals already queried and reported on; the next run queries the sysman repository for snapshots with rollup_timestamp greater than the last run, e.g. 2008-11-12_19:00:00.
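  A minimal sketch of that extraction, assuming DBD::Oracle is installed; the TNS alias, credentials, and target name are hypothetical, and the query is Part 1 from the slides that follow:

      #!/usr/bin/perl
      use strict;
      use warnings;
      use DBI;

      # Hypothetical connection details; the real OEM repository alias
      # and read-only account are site-specific.
      my $dbh = DBI->connect('dbi:Oracle:oemrep', 'oem_ro', 'secret',
                             { RaiseError => 1 });

      # Find the first hourly rollup newer than the last snapshot processed
      # (the snapfile supplies the second bind value; see the SNAPFILE slide).
      my $sth = $dbh->prepare(q{
          select min(to_char(d.rollup_timestamp,'YYYY-MM-DD_HH24:MI:SS'))
          from   sysman.mgmt_targets tgt,
                 sysman.mgmt_metrics met,
                 sysman.mgmt_metrics_1hour d
          where  lower(tgt.target_name) = lower(?)
          and    tgt.target_type = 'oracle_database'
          and    tgt.target_guid  = d.target_guid
          and    met.metric_guid  = d.metric_guid
          and    d.rollup_timestamp > to_date(?,'YYYY-MM-DD_HH24:MI:SS')
      });
      $sth->execute('MYDB.seic.com', '2009-03-10_20:00:00');
      my ($next_snap) = $sth->fetchrow_array;   # next rollup to extract, if any
      $dbh->disconnect;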

  12. Quarterly Instance Efficiency

  13. Comparative Quarterly Instance CPU Utilization
  The sample compares waits (User CPU, User I/O, Other) between the databases GWMPE02, E2L1, E2L2, EC02, and ER02 on the DB server seidevdb34. E02 is the largest consumer, at times taking 50% or more of the CPU, while the other databases show negligible CPU consumption.

  14. SNAPFILE
  The snapfile tracks the last extraction time of the performance data.

  oracle@seieaas52z1# pwd
  /export/home/oracle/orca_oracle/snapfile
  oracle@seieaas52z1# tail -f <DBNAME>.seic.com_Oem_snapid
  2009-03-10_12:00:00
  2009-03-10_14:00:00
  2009-03-10_16:00:00
  2009-03-10_18:00:00
  2009-03-10_20:00:00
  oracle@seieaas52z1# tail -f <SERVERNAME>_OemHost_snapid
  2009-03-10_12:00:00
  2009-03-10_14:00:00
  2009-03-10_16:00:00
  2009-03-10_18:00:00
  2009-03-10_20:00:00
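  The snapfile layout is inferred from the tail output above: one timestamp per line, newest last. A small Perl sketch of how the extract script might read and update it (the sub names and the fallback default are assumptions):

      # Read the last recorded extraction time, or a default if none yet.
      sub read_last_snapshot {
          my ($snapfile) = @_;
          open my $fh, '<', $snapfile or return '2009-01-01_00:00:00';
          my $last;
          $last = $_ while <$fh>;                 # keep only the final line
          close $fh;
          return '2009-01-01_00:00:00' unless defined $last;
          chomp $last;
          return $last;
      }

      # After a successful extraction, append the snapshot just processed.
      sub record_snapshot {
          my ($snapfile, $ts) = @_;
          open my $fh, '>>', $snapfile or die "cannot append to $snapfile: $!";
          print $fh "$ts\n";
          close $fh;
      }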

  15. OEM SQL QUERY: HOST & DB PERFORMANCE DATA
  # Part 1: Decide on the beginning snapshot to process from the OEM repository
  "select min(to_char(d.rollup_timestamp,'YYYY-MM-DD_HH24:MI:SS'))
  FROM sysman.mgmt_targets tgt, sysman.mgmt_metrics met, sysman.mgmt_metrics_1hour d
  WHERE lower(tgt.target_name) = lower(?)
  AND tgt.target_type = 'oracle_database'
  AND tgt.target_guid = d.target_guid
  AND met.metric_guid = d.metric_guid
  AND d.rollup_timestamp > to_date(?,'YYYY-MM-DD_HH24:MI:SS')"

  16. OEM SQL QUERY: HOST & DB PERFORMANCE DATA
  # Part 2: Get the earliest and latest snapshot times
  "select min(to_char(d.rollup_timestamp,'YYYY-MM-DD_HH24:MI:SS')),
          max(to_char(d.rollup_timestamp,'YYYY-MM-DD_HH24:MI:SS'))
  FROM sysman.mgmt_targets tgt, sysman.mgmt_metrics met, sysman.mgmt_metrics_1hour d
  WHERE lower(tgt.target_name) = lower(?)
  AND tgt.target_type = 'oracle_database'
  AND tgt.target_guid = d.target_guid
  AND met.metric_guid = d.metric_guid"

  17. OEM SQL QUERY: HOST & DB PERFORMANCE DATA
  # Part 3: Collect the raw values from OEM for DB stats
  "SELECT DISTINCT met.metric_column, d.key_value, d.value_average, …
  FROM sysman.mgmt_targets tgt, sysman.mgmt_metrics met, sysman.mgmt_metrics_1hour d
  WHERE lower(tgt.target_name) = lower(?)
  AND tgt.target_type = 'oracle_database'
  AND tgt.target_guid = d.target_guid
  AND met.metric_guid = d.metric_guid
  AND d.rollup_timestamp = to_date(?,'YYYY-MM-DD_HH24:MI:SS')
  ORDER BY d.rollup_timestamp, met.metric_column"

  18. OEM SQL QUERY: HOST & DB PERFORMANCE DATA
  # Part 4: Collect the raw values from OEM for host stats
  "SELECT DISTINCT met.metric_column, d.key_value, d.value_average, …
  FROM sysman.mgmt_targets tgt, sysman.mgmt_metrics met, sysman.mgmt_metrics_1hour d
  WHERE lower(tgt.target_name) = lower(?)
  AND tgt.target_type = 'host'
  AND tgt.target_guid = d.target_guid
  AND met.metric_guid = d.metric_guid
  AND met.metric_column IN ('cpuLoad','cpuLoad_15min','cpuLoad_1min','longestServ','cpuIOWait','cpuKernel','cpuUser','cpuUtil','memUsedPct','memfreePct','swapUtil','noOfProcs','noOfUsers','totIO','pgScanRate')
  AND d.rollup_timestamp = to_date(?,'YYYY-MM-DD_HH24:MI:SS')"
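  A sketch of how these parameterized queries might be driven and their rows flattened into an Orca column layout; the handle $dbh, the output filehandle, the column list, and the epoch conversion are assumptions, not the deck's actual script:

      # Assumes $dbh is a DBI handle to the OEM repository and $part3_sql
      # holds the Part 3 query text shown above (three selected columns).
      my $sth = $dbh->prepare($part3_sql);
      $sth->execute($db_name, $snap);           # target name, snapshot time

      my %row;                                  # metric_column -> average
      while (my ($metric, $key, $avg) = $sth->fetchrow_array) {
          $row{$metric} = $avg;                 # last key_value wins here
      }

      # Append one space-separated line, in a fixed column order, to the
      # flat file that Orca watches; @columns and $epoch are assumed.
      print $out join(' ', $epoch,
                      map { defined $row{$_} ? $row{$_} : 0 } @columns), "\n";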

  19. Orcallator.cfg OEM group

  group oem {
  find_files /apps/orca/oem/(.*)/(?:oracle)-\d{4}-\d{2}-\d{2}(?:-\d{3,})?(?:\.(?:Z|gz|bz2))?
  column_description first_line
  date_source column_name timestamp
  interval 3600
  filename_compare sub {
      my ($ay, $am, $ad) = $a =~ /-(\d{4})-(\d\d)-(\d\d)/;
      my ($by, $bm, $bd) = $b =~ /-(\d{4})-(\d\d)-(\d\d)/;
      if (my $c = (($ay <=> $by) || ($am <=> $bm) || (($ad >> 3) <=> ($bd >> 3)))) {
          return 2*$c;
      }
      $ad <=> $bd;
  }
  }

  20. Orcallator.cfg SERVER grouping

  group <SERVER1> {
  find_files /apps/orca/oemhost/(SERVER1)/(?:(?:SERVER1)|(?:percol))-\d{4}-\d{2}-\d{2}-\d{3}
  column_description first_line
  date_source column_name timestamp
  interval 3600
  filename_compare sub {
      my ($ay, $am, $ad) = $a =~ /-(\d{4})-(\d\d)-(\d\d)/;
      my ($by, $bm, $bd) = $b =~ /-(\d{4})-(\d\d)-(\d\d)/;
      if (my $c = (($ay <=> $by) || ($am <=> $bm) || (($ad >> 3) <=> ($bd >> 3)))) {
          return 2*$c;
      }
      $ad <=> $bd;
  }
  }

  21. Services trending
  • Service-level performance data is available in the following views; a sample query follows.
  • V$SERVICE_STATS
  • V$SERVICE_EVENT
  • V$SERVICE_WAIT_CLASS
  • V$SERVICEMETRIC
  • V$SERVICEMETRIC_HISTORY
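  For example, per-service cumulative statistics can be pulled from V$SERVICE_STATS and fed into the same flat-file pipeline; the choice of statistic names below is illustrative (the view exposes a limited set of cumulative statistics per service):

      SELECT service_name, stat_name, value
      FROM   v$service_stats
      WHERE  stat_name IN ('DB CPU', 'DB time', 'user calls')
      ORDER  BY service_name, stat_name;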

  22. Items Learned in this Session
  • Database administrators and sys administrators need consolidated data points that provide holistic capacity analysis for the databases and their host servers from a CPU, memory, I/O, wait-bottleneck, throughput, and efficiency perspective.
  • This presentation outlined a methodology for "Capacity Trend Analysis" from intraday to multiyear granularity via a web interface. The key theme is the ability to stack up trends for the different databases on a server on the same graph, enabling side-by-side comparison and better awareness of percentage usage by database/application and its alignment with the business cycle.
