
Presentation Transcript


  1. Modeling Retail Applications @ a Major Telecom Company: Predictive Analysis in a Multi-Tier Infrastructure. John Slobodnik, October 21, 2008, CMG Canada

  2. Preparation for Modeling • Get an application infrastructure diagram. • Turn on Solaris Process Accounting. • Install TeamQuest Manager. • Install TeamQuest View. • Gather the Key Performance Indicator. • Perform Workload Characterization. • Perform predictive analysis using TeamQuest Model.

  3. Infrastructure Diagram • It is important to get this diagram to understand the infrastructure that this multi-tier application resides on. • Typically, an application support team is responsible for keeping these diagrams up-to-date.

  4. Infrastructure Diagram

  5. Turn on Solaris Process Accounting • Turn on Solaris process accounting on each server. • Minimal additional CPU overhead, since the data is already collected. • Allows short-running tasks to be captured for workload characterization. • Normally, tasks shorter than 0.5 seconds get grouped together rather than captured individually. • Certain applications with thousands of short tasks are prime candidates for this extra level of accuracy.
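
A minimal sketch of turning process accounting on, assuming a root shell on a Solaris host; the deck does not say whether classic System V accounting or Solaris extended accounting was used, and the file paths shown are typical defaults rather than values from the presentation:

    #!/bin/sh
    # Sketch only: enable process accounting on a Solaris server (run as root).

    # Classic System V process accounting: every completed process is
    # recorded in /var/adm/pacct.
    /usr/lib/acct/accton /var/adm/pacct

    # Or, on Solaris 8 and later, extended accounting via acctadm:
    acctadm -e extended -f /var/adm/exacct/proc process

    # Show what is currently being collected for processes.
    acctadm process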

  6. Install TeamQuest Manager • Install TeamQuest Manager on at least one server from each tier of the application architecture. • At least one agent was installed in each of the 4 tiers. • Customize the TQ database on each server. • Changed retention of 10-minute data to 2 weeks. • Changed retention of 1-minute data to 1 week. • Deactivated reductions. • Requires process accounting to be turned on. • Keep process information for 7 days. • Created a silent install script to install the agent and customize the database. • Create a script to customize the database (using tqdbu) with the settings specified in the previous bullets. • Record the silent install script. • Syntax: “install.sh -r silentinstallscriptnamehere tqmgr”
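
The deck does not show the customization script itself, so the following is only a sketch of one way it could work: export the current database specifications with tqdbu -o (the flag shown on the next slide), adjust the retention settings in the exported file, and recreate the database from it with tqdbu -c. The install path and the sed patterns are assumptions, not actual TeamQuest specification-file syntax:

    #!/bin/ksh
    # Sketch only: customize a freshly installed TeamQuest database.
    # Only the tqdbu -o / -c usage comes from this deck; the edits below are
    # placeholders for whatever the exported specification file really contains.

    TQHOME=/opt/teamquest                 # assumed TeamQuest home directory
    SPEC=/var/tmp/tqdbspec.custom

    $TQHOME/bin/tqdbu -o $SPEC            # export the current DB specifications

    # Hypothetical edits: keep 10-minute data for 2 weeks and 1-minute data
    # for 1 week (field names are invented for illustration).
    sed -e 's/^10MIN_RETENTION=.*/10MIN_RETENTION=14d/' \
        -e 's/^1MIN_RETENTION=.*/1MIN_RETENTION=7d/' \
        $SPEC > $SPEC.new

    $TQHOME/bin/tqdbu -c $SPEC.new        # recreate the database from the edited spec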

  7. Install TeamQuest Manager • Create a specifications file backup for each TQ database daily. • Makes rebuilding the DB, in case of disaster, easier. • The command to create a specifications file called “productionDBspec” is: • teamquesthomedirectory/bin/tqdbu -o productionDBspec • The command to use the specifications file to recreate a new database is: • teamquesthomedirectory/bin/tqdbu -c productionDBspec • Put disk free space monitoring in place. • With process accounting on, a lot of data was gathered on our Oracle server. • There was barely enough space to keep a week’s worth of data in the existing filesystem. • Alerts us when there is <20% free space in the filesystem used by the TQ DB.
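
A sketch of the daily specification backup and the free-space alert described above, assuming a cron-driven ksh script; only the tqdbu -o syntax comes from the slide, while the paths, threshold handling, and mail recipient are illustrative assumptions:

    #!/bin/ksh
    # Sketch: nightly TeamQuest housekeeping, e.g. from cron:
    #   0 2 * * * /opt/scripts/tq_nightly.sh

    TQHOME=/opt/teamquest                  # assumed TeamQuest home directory
    TQFS=/var/opt/teamquest                # assumed filesystem holding the TQ database
    ALERT=capacity-team@example.com        # hypothetical alert address

    # 1. Back up the database specifications (command from the slide above).
    $TQHOME/bin/tqdbu -o /var/tmp/productionDBspec.$(date +%Y%m%d)

    # 2. Alert when the TQ filesystem drops below 20% free space.
    PCT_USED=$(df -k $TQFS | awk 'NR==2 { sub(/%/,"",$5); print $5 }')
    if [ "$PCT_USED" -gt 80 ]; then
        echo "TeamQuest filesystem $TQFS is ${PCT_USED}% used" |
            mailx -s "TQ database disk alert" $ALERT
    fi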

  8. Customize TQ Database

  9. Customize TQ Database

  10. Install TeamQuest View • TQ View was used to verify that performance was consistent across the servers. • This tells us that the workload is consistent and reliable to use for modeling. • Data for a whole week was analyzed to come up with the best time frame to use for modeling.

  11. TeamQuest View Data Analysis

  12. Gather Key Performance Indicator • We asked the business what their key performance indicator (or main business driver metric) was. • The KPI was sales, which they were tracking by hour in an Oracle database. • The sales numbers were retrieved using a customized SQL query, which can be turned into a “custom metric” and used to create historical reports.
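
The deck does not show the actual query, so the schema below (a retail_orders table with an order_time column) and the connection details are hypothetical; it only illustrates how an hourly order count could be pulled from Oracle to feed the “custom metric” mentioned above:

    -- hourly_orders.sql: hourly sales KPI for the last 7 days
    -- (table and column names are hypothetical placeholders)
    SET PAGESIZE 0 FEEDBACK OFF
    SELECT TO_CHAR(order_time, 'YYYY-MM-DD HH24') AS order_hour,
           COUNT(*)                               AS orders
    FROM   retail_orders
    WHERE  order_time >= TRUNC(SYSDATE) - 7
    GROUP  BY TO_CHAR(order_time, 'YYYY-MM-DD HH24')
    ORDER  BY 1;

    # Run from the shell, e.g.:
    sqlplus -s perf_user/password@RETAILDB @hourly_orders.sql

The hourly counts can then be loaded into TeamQuest as the custom metric and reported on alongside the CPU and I/O history.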

  13. Gather Key Performance Indicator

  14. Workload Characterization • Purpose: to uniquely identify application-related work that runs on each server; a prerequisite for modeling. • Used TeamQuest View to list all processes that run on each server. • Grouped the processes into unique workloads. • This is the most labor-intensive part of the whole exercise (it can take days or weeks depending upon the level of co-operation). • Requires co-operation of the application experts to help identify the processes that belong to their application. • Try to keep the number of workloads as small as possible. • Our goal was to create 2 workloads per server: one for the application-related work and one called OTHER for everything else. • Define the workloads using TeamQuest Manager. • On each server we created a new “Workload Set” containing a new “Workload” definition which uniquely identifies application-related activity. • Left the default “Example” workload set alone. • “Login =“ uniquely identified application-related work on our Web Services, authentication, WebLogic, and Oracle servers.
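
As an illustration of the grouping idea only, assuming the “Login =” string appears in the process arguments; the real workloads were defined as a Workload Set inside TeamQuest Manager, not with a script:

    #!/bin/ksh
    # Illustration: split running processes into the two buckets used per
    # server, application-related work vs. OTHER, by matching a signature
    # string in the process arguments.

    PATTERN="Login ="      # signature string from the slide above

    ps -eo pid,args | awk -v pat="$PATTERN" '
        NR == 1        { next }          # skip the ps header line
        index($0, pat) { app++; next }   # application workload
                       { other++ }       # everything else -> OTHER
        END { printf("application processes: %d\nOTHER processes: %d\n", app, other) }
    '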

  15. Workload Characterization

  16. Workload Characterization

  17. Using TeamQuest Model • The most important decision to make for modeling is: “What timeframe do I base my model on?” • The answer depends on the peak usage time of the application from both a system-resource and a business-sales perspective. • I use a combination of the busiest CPU, I/O, and sales periods to come up with the timeframe to use. • This has worked successfully for me using a 1-hour timeframe as the basis for modeling (and a 5-hour timeframe as well). • Stay away from “problem” times. • Then we apply a growth percentage to that timeframe that equates to the estimated peak volume the business expects at their busiest time of year. • We frame the growth percentage with models below and above 50% growth. • If the model did not show any weakness in the infrastructure at 50% growth, we created another model with enough growth applied to find a weakness.

  18. Using TeamQuest Model • Outcome: • We have successfully identified the need for an additional Oracle node in the infrastructure. • Other outcomes have been: • “Your infrastructure is sufficient to make it through this year’s peak period; however, once growth from the current state hits 300%, the Web Services tier will become the bottleneck. Adding 2 servers of the same build is recommended prior to that time.”

  19. Select data to build the Model • Select “Generate Input File” servername

  20. Select data to build the Model • Fill out time and date and click “Next” servername

  21. Select data to build the Model • Confirm Workload Set, click “Next” servername

  22. Select data to build the Model • Click “Create Model Input File” servername

  23. Select data to build the Model • “Save” the file servername

  24. Select data to build the Model • Choose a filename then save.

  25. Select data to build the Model • Confirmation servername

  26. TQ Model - Assumptions • TeamQuest was not installed on all the systems in the environment, so in the absence of that data we assume: • External webservers – The 4 Sun servers are load balanced. • WebLogic tier – The 3 Sun servers are load balanced. The 2 Sun WebLogic instances perform twice the work of a single WebLogic instance on the larger Sun server. • Applications such as iPlanet, WebLogic, and Oracle are well instrumented. • The orders are coming from the External Webserver.

  27. TQ Model - Findings, Recommendations & Results • Findings for the multi-tier application environment: • The number of orders on mm/dd/yyyy from noon until 5 pm was n. • At 300% growth, or nn orders from noon until 5 pm, the CPU in the UNIX web services iPlanet tier is maxed out and the response time is significantly higher than for n orders (382.4% higher). • Recommendations: • Add 2 additional nodes to the external web tier. • Plan to add the additional servers in 2009. • Results: • TeamQuest time spent on the model: less than 2 hours.

  28. TQ View – CPU Utilization. CPU utilization of all the systems: no one day stands out as looking any different from any other for CPU & I/O, so we chose the afternoon of mm/dd/yyyy, 12:00-17:00. We divided the work between application and non-application workloads.

  29. TeamQuest Model – Systems/Tier

  30. TeamQuest Model – Response Time with 300% growth applied

  31. TQ Model – CPU Utilization by Workload

  32. Active Resource utilization on web tier

  33. Active Resource Utilization on Web tier

  34. Components of Response – 3 DB nodes

  35. What if we add 2 servers to the external web server tier?

  36. What if we model the external web server on its own?

  37. Frequency of Modeling • During the peak time of year for the application and 6 months later (at a minimum). • Prior to and after any major hardware changes to the infrastructure. • After any major software changes to the infrastructure. • These can be changes to the application code. • They can also be vendor software version changes: • A new version of WebLogic. • A new OS level. • The latest version of Oracle. • These happen more frequently; it is not realistic (in my life) to redo the exercise monthly.

  38. John Slobodnik Performance & Capacity Planner Infrastructure & Technology CGI (905) 858-7100 ext. 7355 Mobile: (416) 729-8356 John.Slobodnik@cgi.com capacityguy@gmail.com
