Modeling Retail Applications @ a Major Telecom Company: Predictive Analysis in a Multi-Tier Infrastructure. John Slobodnik, October 21, 2008, CMG Canada
Preparation for Modeling • Get an application infrastructure diagram. • Turn on Solaris Process Accounting. • Install TeamQuest Manager. • Install TeamQuest View. • Gather the Key Performance Indicator. • Perform Workload Characterization. • Perform predictive analysis using TeamQuest Model.
Infrastructure Diagram • You need this diagram to understand the infrastructure on which the multi-tier application runs. • Typically, an application support team is responsible for keeping these diagrams up to date.
Turn on Solaris Process Accounting • Adds minimal CPU overhead, since the data is already being collected. • Allows short-running tasks to be captured for workload characterization. • Normally, tasks shorter than 0.5 seconds get grouped together. • Applications that run thousands of short tasks are prime candidates for this extra level of accuracy.
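To show why the extra detail matters, here is a minimal sketch that contrasts grouping short tasks with keeping them separate. It assumes simplified process records with a command name and elapsed time. The `ProcRecord` type, the `summarize` function and the command names are hypothetical illustrations, not TeamQuest's data model.

```python
from collections import Counter
from typing import Iterable, NamedTuple

# Threshold taken from the slide: tasks under 0.5 s are normally grouped.
SHORT_TASK_SECONDS = 0.5


class ProcRecord(NamedTuple):
    """Hypothetical, simplified process record (not TeamQuest's format)."""
    command: str      # process name
    elapsed_s: float  # wall-clock lifetime of the process, in seconds


def summarize(records: Iterable[ProcRecord], keep_short: bool) -> Counter:
    """Count processes per command.

    keep_short=False lumps every task under SHORT_TASK_SECONDS into a single
    "<short tasks>" bucket. keep_short=True keeps every command separate,
    which is the extra detail process accounting is meant to provide.
    """
    counts: Counter = Counter()
    for rec in records:
        if not keep_short and rec.elapsed_s < SHORT_TASK_SECONDS:
            counts["<short tasks>"] += 1
        else:
            counts[rec.command] += 1
    return counts


if __name__ == "__main__":
    # Illustrative records only; the command names are hypothetical.
    sample = [
        ProcRecord("java", 3600.0),
        ProcRecord("order_check", 0.1),
        ProcRecord("order_check", 0.2),
        ProcRecord("cron_cleanup", 0.3),
    ]
    print(summarize(sample, keep_short=False))
    print(summarize(sample, keep_short=True))
```

In the grouped view, the application's short `order_check` tasks cannot be told apart from an unrelated `cron_cleanup` run. Workload characterization needs exactly that distinction.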
Install TeamQuest Manager • Install TeamQuest Manager on at least one server in each tier of the application architecture. • We installed at least one agent in each of the 4 tiers. • Customize the TQ database on each server. • Changed the retention of 10-minute data to 2 weeks. • Changed the retention of 1-minute data to 1 week. • Deactivated reductions. • Requires Process Accounting to be turned on. • Keep process information for 7 days. • Created a silent install script to install the agent and customize the database. • Create a script that customizes the database (using tqdbu) with the settings specified above. • Record the silent install script. • Syntax: • “install.sh -r silentinstallscriptnamehere tqmgr”
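These retention settings determine how much data the TQ database holds. A quick sanity check is to count the samples retained per metric at each interval. The sketch below does only that arithmetic. Bytes per sample are not given on the slide, so no disk size is computed, and the `Retention` class is a hypothetical helper.

```python
from dataclasses import dataclass

MINUTES_PER_DAY = 24 * 60


@dataclass(frozen=True)
class Retention:
    """Hypothetical helper describing one retention setting."""
    interval_min: int  # sampling interval, in minutes
    keep_days: int     # how long samples at this interval are kept

    def samples_kept(self) -> int:
        """Samples retained per metric once the retention window is full."""
        return self.keep_days * MINUTES_PER_DAY // self.interval_min


# Retention values from the slide: 1-minute data for 1 week,
# 10-minute data for 2 weeks.
SETTINGS = {
    "1-minute": Retention(interval_min=1, keep_days=7),
    "10-minute": Retention(interval_min=10, keep_days=14),
}

if __name__ == "__main__":
    for name, r in SETTINGS.items():
        print(f"{name}: {r.samples_kept()} samples per metric")
```

The 1-minute data dominates: it keeps 10,080 samples per metric against 2,016 for the 10-minute data. This helps explain why the filesystem holding the database needs watching.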
Install TeamQuest Manager • Create a daily backup of the specifications file for each TQ database. • This makes rebuilding the database easier in case of disaster. • The command to create a specifications file called “productionDBspec” is: • teamquesthomedirectory/bin/tqdbu -o productionDBspec • The command to recreate a new database from the specifications file is: • teamquesthomedirectory/bin/tqdbu -c productionDBspec • Put disk free space monitoring in place. • With process accounting on, a lot of data was gathered on our Oracle server. • There was barely enough space in the existing filesystem to keep a week’s worth of data. • The monitoring alerts us when there is less than 20% free space in the filesystem used by the TQ DB.
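The daily specifications backup and the free-space alert can be scripted together. The following is a minimal sketch under stated assumptions. `/opt/teamquest` stands in for the real TeamQuest home directory. The date-suffixed file name is a hypothetical convention. The only `tqdbu` usage it relies on is the `-o` form shown on the slide. The free-space check uses Python's standard `shutil.disk_usage`.

```python
from __future__ import annotations

import datetime as dt
import shutil
from pathlib import Path

# Hypothetical install location; substitute your own TeamQuest home directory.
TQ_HOME = Path("/opt/teamquest")
FREE_SPACE_ALERT = 0.20  # alert when less than 20% of the filesystem is free


def spec_backup_argv(tq_home: Path, day: dt.date) -> list[str]:
    """Build the `tqdbu -o <specfile>` command from the slide.

    The date suffix is a hypothetical naming convention, so a daily job
    keeps one specifications file per day instead of overwriting it.
    """
    spec_file = f"productionDBspec-{day:%Y%m%d}"
    return [str(tq_home / "bin" / "tqdbu"), "-o", spec_file]


def low_on_space(total_bytes: int, free_bytes: int,
                 threshold: float = FREE_SPACE_ALERT) -> bool:
    """True when the free fraction of the filesystem is below `threshold`."""
    return free_bytes / total_bytes < threshold


def check_filesystem(path: str) -> bool:
    """Check the filesystem that holds `path` (e.g. the TQ database directory)."""
    usage = shutil.disk_usage(path)
    return low_on_space(usage.total, usage.free)


if __name__ == "__main__":
    argv = spec_backup_argv(TQ_HOME, dt.date.today())
    # On a host with TeamQuest installed, pass argv to subprocess.run(argv, check=True).
    print("would run:", " ".join(argv))
    target = str(TQ_HOME) if TQ_HOME.exists() else "/"
    if check_filesystem(target):
        print(f"ALERT: less than {FREE_SPACE_ALERT:.0%} free on the filesystem holding {target}")
```

Scheduling this once a day (for example from cron) would cover both bullets. The recovery path stays the `tqdbu -c` command shown on the slide.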
Install TeamQuest View • We used TQ View to confirm that performance was consistent on each server. • This tells us the workload is consistent and reliable enough to use for modeling. • We analyzed a whole week of data to find the best timeframe to use for modeling.
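One simple way to put a number on "consistent across the week" is the coefficient of variation (standard deviation divided by mean) of each day's peak CPU. This is only a sketch. The slide does not say how consistency was judged, so the 10% cutoff and the sample values are hypothetical.

```python
from __future__ import annotations

from statistics import mean, pstdev

# Hypothetical rule of thumb: treat the week as consistent when the
# day-to-day coefficient of variation of peak CPU is under 10%.
MAX_CV = 0.10


def coefficient_of_variation(values: list[float]) -> float:
    """Population standard deviation divided by the mean."""
    return pstdev(values) / mean(values)


def week_is_consistent(daily_peak_cpu: dict[str, float], max_cv: float = MAX_CV) -> bool:
    """True when no day's peak CPU strays far from the others."""
    return coefficient_of_variation(list(daily_peak_cpu.values())) < max_cv


if __name__ == "__main__":
    # Illustrative numbers only, not data from the study.
    week = {"Mon": 62.0, "Tue": 60.5, "Wed": 63.1, "Thu": 61.8, "Fri": 64.0}
    print(f"CV = {coefficient_of_variation(list(week.values())):.3f}")
    print("consistent" if week_is_consistent(week) else "investigate outlier days")
```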
Gather Key Performance Indicator • We asked the business what their key performance indicator (or main business-driver metric) was. • They were tracking these sales numbers by hour in an Oracle database, using a customized SQL query. • That query can be turned into a “custom metric” and used to create historical reports.
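An hourly sales query of this kind might look like the sketch below. The business's actual query is not shown, so the `orders` table and `order_ts` column are hypothetical. Python's built-in `sqlite3` stands in for Oracle to keep the example self-contained. In Oracle, the hour would typically be extracted with `TO_CHAR(order_ts, 'HH24')` rather than SQLite's `strftime`.

```python
from __future__ import annotations

import sqlite3

# Hypothetical schema: one row per order, with an ISO-format timestamp.
HOURLY_SALES_SQL = """
    SELECT strftime('%H', order_ts) AS hour, COUNT(*) AS orders
    FROM orders
    WHERE date(order_ts) = ?
    GROUP BY hour
    ORDER BY hour
"""


def hourly_sales(conn: sqlite3.Connection, day: str) -> list[tuple[str, int]]:
    """Orders per hour for one day, as (hour, count) pairs."""
    return conn.execute(HOURLY_SALES_SQL, (day,)).fetchall()


def demo_connection() -> sqlite3.Connection:
    """In-memory database with a few illustrative orders."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, order_ts TEXT)")
    conn.executemany(
        "INSERT INTO orders (order_ts) VALUES (?)",
        [("2008-01-15 12:05:00",), ("2008-01-15 12:40:00",), ("2008-01-15 13:15:00",)],
    )
    return conn


if __name__ == "__main__":
    for hour, count in hourly_sales(demo_connection(), "2008-01-15"):
        print(hour, count)
```

The hourly counts are the series that would be loaded as a custom metric and lined up against CPU and I/O when choosing the modeling timeframe.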
Workload Characterization • Purpose: to uniquely identify the application-related work that runs on each server. This is a prerequisite for modeling. • Used TeamQuest View to list all processes that run on each server. • Grouped the processes into unique workloads. • This is the most labor-intensive part of the whole exercise (it can take days or weeks, depending on the level of co-operation). • Requires co-operation from the application experts to help identify which processes belong to their application. • Keep the number of workloads as small as possible. • Our goal was to create 2 workloads per server: one for the application-related work and OTHER for everything else. • Create the workload definitions using TeamQuest Manager. • On each server we created a new “Workload Set” containing a new “Workload” definition that uniquely identifies application-related activity. • Left the default “Example” workload set alone. • A “Login =” rule uniquely identified application-related work on our Web Services, authentication, WebLogic, and Oracle servers.
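The two-workload scheme amounts to a single rule applied to every process: if the login matches the application's login, the process is application work; otherwise it is OTHER. The sketch below mimics that rule outside TeamQuest to show the resulting split. The login names and the `Process` record are hypothetical. In practice the application experts supply the logins for each server.

```python
from __future__ import annotations

from typing import Iterable, NamedTuple

# Hypothetical application logins; each server can have its own list.
APP_LOGINS = frozenset({"weblogic", "oracle"})


class Process(NamedTuple):
    login: str    # owner of the process
    command: str  # process name
    cpu_s: float  # CPU seconds consumed in the interval


def workload_of(proc: Process, app_logins: frozenset[str] = APP_LOGINS) -> str:
    """Mimic a 'Login =' rule: application work by login, everything else is OTHER."""
    return "APP" if proc.login in app_logins else "OTHER"


def cpu_by_workload(procs: Iterable[Process],
                    app_logins: frozenset[str] = APP_LOGINS) -> dict[str, float]:
    """Total CPU seconds per workload under the two-workload scheme."""
    totals = {"APP": 0.0, "OTHER": 0.0}
    for p in procs:
        totals[workload_of(p, app_logins)] += p.cpu_s
    return totals


if __name__ == "__main__":
    procs = [
        Process("oracle", "ora_pmon", 2.0),
        Process("root", "sshd", 1.0),
        Process("weblogic", "java", 3.0),
    ]
    print(cpu_by_workload(procs))
```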
Using TeamQuest Model • The most important modeling decision is “What timeframe do I base my model on?” • The answer depends on the application’s peak usage time, from both a system-resource and a business-sales perspective. • I use a combination of the busiest CPU, I/O and sales periods to choose the timeframe. • Basing the model on a 1-hour timeframe has worked well for me (a 5-hour timeframe has also worked). • Stay away from “problem” times. • We then apply to that timeframe a growth percentage equal to the business’s estimated peak volume at its busiest time of year. • We frame the growth % as either less than or greater than 50% (LT & GT 50%). • If the model showed no weakness in the infrastructure at 50% growth, we created another model with enough growth applied to find one.
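The timeframe choice can be approximated in code by scoring each hour on CPU, I/O and sales together, then sliding a 1-hour or multi-hour window across the day. The slide does not say how the three are combined. Normalizing each series to its own peak and weighting them equally are assumptions of this sketch, as are the sample series. The growth helper assumes growth is measured against the current volume (50% means 1.5 times).

```python
from __future__ import annotations

from typing import Sequence


def normalize(xs: Sequence[float]) -> list[float]:
    """Scale a series so its peak is 1.0 (all zeros stay zero)."""
    peak = max(xs)
    return [x / peak for x in xs] if peak else [0.0] * len(xs)


def busiest_window(cpu: Sequence[float], io: Sequence[float], sales: Sequence[float],
                   width: int, excluded: frozenset[int] = frozenset()) -> int:
    """Start index of the busiest `width`-hour window.

    Each hour scores the equal-weighted sum of CPU, I/O and sales, each
    normalized to its own peak. Windows touching an excluded ("problem")
    hour are skipped.
    """
    score = [c + i + s for c, i, s in zip(normalize(cpu), normalize(io), normalize(sales))]
    best_start, best_total = -1, float("-inf")
    for start in range(len(score) - width + 1):
        hours = range(start, start + width)
        if any(h in excluded for h in hours):
            continue
        total = sum(score[h] for h in hours)
        if total > best_total:
            best_start, best_total = start, total
    if best_start < 0:
        raise ValueError("no window of that width avoids the excluded hours")
    return best_start


def projected_volume(base_volume: float, growth_pct: float) -> float:
    """Volume after growth, measured against the current state (50% -> 1.5x)."""
    return base_volume * (1 + growth_pct / 100)


if __name__ == "__main__":
    # Illustrative hourly series, not measurements from the study.
    cpu = [20, 35, 70, 80, 40, 25]
    io = [10, 15, 60, 75, 30, 10]
    sales = [5, 20, 90, 100, 30, 10]
    print("best 1-hour window starts at hour", busiest_window(cpu, io, sales, 1))
    print("best 2-hour window starts at hour", busiest_window(cpu, io, sales, 2))
    for pct in (25, 50, 100):
        print(f"{pct}% growth -> {projected_volume(1000, pct):.0f} orders")
```

The `excluded` parameter is one way to honor “stay away from problem times”: hours with known incidents are simply never part of a candidate window.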
Using TeamQuest Model • Outcome: • We successfully identified the need for an additional Oracle node in the infrastructure. • Other outcomes have been: • “Your infrastructure is sufficient to make it through this year’s peak period. However, once growth from the current state reaches 300%, the Web Services tier will become the bottleneck. Adding 2 more servers of the same build is recommended before then.”
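Statements like “at 300% growth the Web Services tier becomes the bottleneck” come from the model. A crude back-of-the-envelope version asks which tier's busiest resource reaches a utilization threshold first, assuming utilization grows linearly with business volume. This sketch is far simpler than what TeamQuest Model does: it predicts no response times. The 80% threshold and the per-tier utilizations are hypothetical.

```python
from __future__ import annotations

import math

SATURATION = 0.80  # hypothetical "weakness" threshold for a tier's busiest resource


def growth_to_threshold(utilization: float, threshold: float = SATURATION) -> float:
    """Growth fraction (1.0 == 100%) at which `utilization` reaches `threshold`,
    assuming utilization scales linearly with business volume."""
    if utilization <= 0:
        return math.inf
    return threshold / utilization - 1


def first_bottleneck(tiers: dict[str, float],
                     threshold: float = SATURATION) -> tuple[str, float]:
    """Tier that reaches the threshold first, and the growth at which it does."""
    name = min(tiers, key=lambda t: growth_to_threshold(tiers[t], threshold))
    return name, growth_to_threshold(tiers[name], threshold)


if __name__ == "__main__":
    # Illustrative peak-hour CPU utilizations per tier, not the study's data.
    tiers = {"web services": 0.25, "weblogic": 0.15, "oracle": 0.10}
    tier, growth = first_bottleneck(tiers)
    print(f"{tier} reaches {SATURATION:.0%} at about {growth:.0%} growth")
```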
Select data to build the Model • Select “Generate Input File”. • Fill out the time and date, then click “Next”. • Confirm the Workload Set, then click “Next”. • Click “Create Model Input File”. • “Save” the file. • Choose a filename, then save. • Confirmation.
TQ Model - Assumptions • TeamQuest was not installed on all the systems in the environment, so in the absence of that data we assume: • External webservers: the 4 Sun servers are load balanced. • WebLogic tier: the 3 Sun servers are load balanced. The 2 Sun WebLogic instances perform twice the work of a single WebLogic instance on the larger Sun server. • Applications such as iPlanet, WebLogic, and Oracle are well instrumented. • The orders come from the External Webserver.
TQ Model - Findings, Recommendations & Results • Findings for the multi-tier application environment: • The number of orders on mm/dd/yyyy from noon until 5 pm was n. • At 300% growth (nn orders from noon until 5 pm), the CPU in the UNIX web services (iPlanet) tier is maxed out, and response time is 382.4% higher than for n orders. • Recommendations: • Add 2 nodes to the external web tier. • Plan to add the additional servers in 2009. • Results: • Time spent in TeamQuest Model: less than 2 hours.
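A jump in response time as large as the one above is typical when a CPU approaches saturation, because queueing delay grows nonlinearly with utilization. The textbook single-queue (M/M/1) formula R = S / (1 - U) illustrates the shape of that curve. This is not the multi-tier model TeamQuest Model builds, and the 382.4% figure comes from that model, not from this formula. The service time and baseline utilization below are hypothetical.

```python
def mm1_response_time(service_time: float, utilization: float) -> float:
    """Open M/M/1 response time R = S / (1 - U); valid only for 0 <= U < 1."""
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be in [0, 1)")
    return service_time / (1 - utilization)


def pct_increase(base: float, new: float) -> float:
    """Percentage increase of `new` over `base`."""
    return (new / base - 1) * 100


if __name__ == "__main__":
    service_time = 0.05  # hypothetical seconds of CPU per order
    base_u = 0.20        # hypothetical current utilization
    base_r = mm1_response_time(service_time, base_u)
    for growth in (0.0, 1.0, 2.0, 3.0):  # 0%, 100%, 200%, 300% growth
        u = base_u * (1 + growth)  # assumes utilization scales with volume
        r = mm1_response_time(service_time, u)
        print(f"growth {growth:.0%}: U={u:.2f}, R={r:.3f}s (+{pct_increase(base_r, r):.0f}%)")
```

Each step of growth adds more delay than the one before. That is why response time, not just utilization, is worth reporting alongside the growth figure.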
TQ View – CPU Utilization [Chart: CPU utilization of all the systems] No day stands out as different from any other for CPU and I/O, so we chose the afternoon of mm/dd/yyyy, 12:00-17:00, and divided the work into application and non-application workloads.
Frequency of Modeling • At a minimum, during the application’s peak time of year and again 6 months later. • Before and after any major hardware change to the infrastructure. • After any major software change to the infrastructure. • This can be a change to the application code. • It can also be a vendor software version change: • A new version of WebLogic. • A new OS level. • The latest version of Oracle. • These changes happen frequently, so (in my experience) it is not realistic to redo the exercise monthly.
John Slobodnik Performance & Capacity Planner Infrastructure & Technology CGI (905) 858-7100 ext. 7355 Mobile: (416) 729-8356 John.Slobodnik@cgi.com capacityguy@gmail.com