
Seeing the Forest AND the Trees: Capacity Planning for a Large Number of Servers



Presentation Transcript


  1. Seeing the Forest AND the Trees: Capacity Planning for a Large Number of Servers • Linwood Merritt • Capital One Services, Inc. • linwood.merritt@capitalone.com

  2. Introduction: Environment • Capital One • 4th largest card issuer in the United States • Capital One added to the S&P 500 in 1998 • Fortune 500 company starting in 2000 • Managed loans of $71.2 billion as of Q4 2003 • 47.0 million accounts as of Q4 2003 • CIO 100 Award “Master of the Customer Connection” • Information Week “Innovation 100” Award Winner • ComputerWorld “Top 100 places to work in IT”

  3. Categories of Issues • Acquiring and using business data • Exception detection • Platform types and operating systems • Data capture analysis • Organization and reporting of server structure • Bulk Capacity Planning • Business driver based forecasting • Visualization techniques

  4. Acquiring Business Data • Chaney, Bob, “The Capacity Performance Council, Start Yours Today,” CMG1999 Proceedings • Chaney, Bob, “Divide and Conquer: Implementing the Capacity Performance Council in Pieces,” CMG2001 Proceedings • Merritt, Linwood, “A Capacity Planning Partnership with the Business,” CMG2002 Proceedings and UKCMG 2003

  5. Business Data Inputs • “Pull” • Capacity impact forms • Interview meetings • Phone calls • e-mails • “Push” • Capacity Councils

  6. Capacity Councils • Cross-organizational structure • Purpose: Bring together business and technical views of applications and supporting technologies. • Evolution: Single Capacity Council, multiple Capacity Councils along business lines, multiple Technical Councils along types of platforms (mainframe “MVS,” Unix, NT, etc.)

  7. Capacity Council Deliverables • Monthly meeting • Evaluation of Business Area Capacity Status (“stoplight” green/yellow/red color) • Business driver mapping to servers • Monthly report

  8. “Stoplight” Status • Green: Little concern about hitting capacity constraints in the next 6 months • Yellow: Concern about capacity in the next 4-6 months • Red: Very high concern about not meeting capacity needs in the next three months
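The stoplight rule on this slide can be sketched as a small function. This is an illustration only: the function and parameter names are hypothetical, and the slide leaves the range between three and four months unassigned, so treating it as yellow is an assumption.

```python
def stoplight(months_to_constraint: float) -> str:
    """Map the estimated months until a capacity constraint to a status color.

    Thresholds follow the slide: red within 3 months, yellow out to 6 months,
    green beyond that. The 3-to-4-month gap is treated as yellow (assumption).
    """
    if months_to_constraint <= 3:
        return "red"
    if months_to_constraint <= 6:
        return "yellow"
    return "green"


if __name__ == "__main__":
    for months in (2, 5, 12):
        print(months, stoplight(months))
```

In practice the status on the slide is a judgment made in the Capacity Council meeting; a rule like this would only give a starting point for that discussion.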

  9. Capacity Planners’ Responsibilities [Figure: diagram relating Capacity Planners (PlannerA, PlannerB, PlannerC, PlannerD, PlannerE, PlannerF) to Capacity Councils (Clubhouse Mgmt, Fairway shots, Mashies, Chip Mgmt, Putt Mgmt, Foursome, Pitch Mgmt, Tee shot)]

  10. Business Drivers • Capacity Councils: Business units responsible for capacity planning of “demand” side • Capacity Planners: Build “supply side” projections based on business drivers and historical trending

  11. Business Driver Based Forecasting • Map business drivers to servers. • Use historical data to correlate. • Use business driver projections to build forecast. • Technical approaches (Spreadsheet and SAS) • Exponential forecast comparison • Multivariate regression

  12. Business Driver and Resource Regression =FORECAST(Cx,$B$3:$B$26, $C$3:$C$26)
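The spreadsheet FORECAST function fits a straight line by least squares and evaluates it at a new x value. A minimal Python sketch of the same calculation follows. It assumes column B holds the measured resource (for example CPU utilization) and column C the business driver, which the slide does not state; the function name and sample numbers are invented for illustration.

```python
def forecast(x, known_ys, known_xs):
    """Least-squares linear prediction of y at x, like spreadsheet FORECAST."""
    n = len(known_xs)
    mean_x = sum(known_xs) / n
    mean_y = sum(known_ys) / n
    # Slope = covariance(x, y) / variance(x); the n divisors cancel.
    sxx = sum((xi - mean_x) ** 2 for xi in known_xs)
    sxy = sum((xi - mean_x) * (yi - mean_y)
              for xi, yi in zip(known_xs, known_ys))
    slope = sxy / sxx
    return mean_y + slope * (x - mean_x)


if __name__ == "__main__":
    # Hypothetical history: business driver volume vs. CPU utilization (%).
    driver = [100, 200, 300, 400]
    cpu = [12, 22, 32, 42]
    print(forecast(500, cpu, driver))  # projected CPU% at a driver volume of 500
```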

  13. Combination of Business Driver and Resource Date-Based Projections

  14. Forecast Using Date and Business Driver =FORECAST(A27,$B$7:$B$26,$A$7:$A$26)*C27/FORECAST(A27,$C$7:$C$26,$A$7:$A$26)
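Read literally, the formula on this slide takes the date-based trend of the resource and scales it by the ratio of the planned business driver value to the date-based trend of that driver. The sketch below reproduces that arithmetic in Python. It assumes column A is the date, column B the resource, and column C the business driver; all names and sample values are hypothetical.

```python
def forecast(x, known_ys, known_xs):
    """Least-squares linear prediction of y at x, like spreadsheet FORECAST."""
    n = len(known_xs)
    mean_x = sum(known_xs) / n
    mean_y = sum(known_ys) / n
    sxx = sum((xi - mean_x) ** 2 for xi in known_xs)
    sxy = sum((xi - mean_x) * (yi - mean_y)
              for xi, yi in zip(known_xs, known_ys))
    return mean_y + (sxy / sxx) * (x - mean_x)


def driver_adjusted_forecast(date, planned_driver, dates, resource, driver):
    """Date-trended resource, scaled by planned driver / date-trended driver."""
    resource_trend = forecast(date, resource, dates)
    driver_trend = forecast(date, driver, dates)
    return resource_trend * planned_driver / driver_trend


if __name__ == "__main__":
    months = [1, 2, 3, 4]            # hypothetical month index
    cpu = [10, 20, 30, 40]           # hypothetical CPU utilization (%)
    accounts = [100, 200, 300, 400]  # hypothetical business driver
    # The business plans 600 accounts in month 5, above the trend of 500.
    print(driver_adjusted_forecast(5, 600, months, cpu, accounts))
```

When the planned driver equals its own trend, the ratio is 1 and the result reduces to the plain date-based forecast.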

  15. SAS Regression: proc reg noprint data=forecast outest=regdata tableout; by machine shift; model cpureg = %CtReg / selection=rsquare noint;
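For readers without SAS, the idea behind this step (regress CPU on several business drivers with no intercept, trying each subset of drivers and ranking by R-square) can be sketched in plain Python. This is not a reimplementation of PROC REG: the helper names and data are hypothetical, the per-machine and per-shift grouping is omitted, and R-square is computed against the uncorrected sum of squares, the usual convention for a no-intercept model.

```python
from itertools import combinations


def solve(a, b):
    """Solve a*x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_noint(columns, y):
    """No-intercept least squares; returns (coefficients, r_square)."""
    k = len(columns)
    # Normal equations: (X'X) b = X'y
    xtx = [[sum(p * q for p, q in zip(columns[i], columns[j]))
            for j in range(k)] for i in range(k)]
    xty = [sum(p * q for p, q in zip(columns[i], y)) for i in range(k)]
    coefs = solve(xtx, xty)
    fitted = [sum(c * col[t] for c, col in zip(coefs, columns))
              for t in range(len(y))]
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    return coefs, 1 - sse / sum(yi * yi for yi in y)


def rank_subsets(drivers, y):
    """Fit every non-empty subset of drivers and sort by R-square, best first."""
    names = sorted(drivers)
    results = []
    for size in range(1, len(names) + 1):
        for subset in combinations(names, size):
            coefs, r2 = fit_noint([drivers[name] for name in subset], y)
            results.append((r2, subset, coefs))
    return sorted(results, key=lambda item: item[0], reverse=True)


if __name__ == "__main__":
    drivers = {"accounts": [1, 2, 3, 4], "calls": [2, 1, 4, 3]}  # hypothetical
    cpu = [7, 8, 17, 18]
    for r2, subset, coefs in rank_subsets(drivers, cpu):
        print(round(r2, 4), subset, [round(c, 3) for c in coefs])
```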

  16. Actual vs. Projected

  17. Individual Actual vs. Projected Analysis

  18. Actual vs. Projected Graphical Analysis

  19. Aggregate Actual vs. Projected (Relative)

  20. Aggregate Actual vs. Projected (Average Actual vs. Average Projected)

  21. Aggregate Actual vs. Projected (Absolute)

  22. Exception Detection • Statistical Analysis (standard deviations from mean) • Exception Detection System developed by Igor Trubin, Ph.D., of Capital One • “Exception Detection System, Based on the Statistical Process Control Concept,” CMG2001 • “Global and Application Level Exception Detection System, Based on MASF Technique,” CMG2002 and UKCMG 2003 • Reporting: E-mail, web reports

  23. Statistical Process Control

  24. Exception Detection E-Mail
      Exception Detection Report for 05/02
      _____________________________________________________
      CPU_Utilization exception unix/unisys/tandem/MVS
      5 boxes list: ServerA ServerC ServerE ServerL ServerZ
      _____________________________________________________
      CPU_Utilization NULL DATA unix/unisys/tandem/MVS
      0 boxes list:
      _____________________________________________________
      CPU_Utilization insufficient DATA unix/unisys/tandem/MVS
      1 boxes list: ServerG
      ===============================================
      CPU utilization was greater than 50% yesterday for: ServerA ServerD
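The "standard deviations from mean" test and the three report categories in the e-mail (exception, NULL DATA, insufficient DATA) can be illustrated with a short Python sketch. It is a simplified stand-in, not the Exception Detection System cited on slide 22: the three-sigma limit, the minimum sample count, and all names are assumptions.

```python
from statistics import mean, stdev


def find_exceptions(history, latest, sigmas=3.0, min_samples=5):
    """Classify servers by comparing the latest value with historical limits.

    history: server name -> list of past CPU utilization values
    latest:  server name -> most recent value (absent when no data arrived)
    """
    report = {"exception": [], "null": [], "insufficient": []}
    for server in sorted(history):
        value = latest.get(server)
        samples = history[server]
        if value is None:
            report["null"].append(server)          # no current data
        elif len(samples) < min_samples:
            report["insufficient"].append(server)  # too little history
        else:
            mu, sd = mean(samples), stdev(samples)
            # Flag values outside mean +/- sigmas * standard deviation.
            if abs(value - mu) > sigmas * sd:
                report["exception"].append(server)
    return report


if __name__ == "__main__":
    history = {
        "ServerA": [20, 22, 21, 19, 20, 21],
        "ServerB": [20, 22, 21, 19, 20, 21],
        "ServerG": [30, 31],
        "ServerN": [10, 11, 12, 10, 11],
    }
    latest = {"ServerA": 60, "ServerB": 21, "ServerG": 30}
    print(find_exceptions(history, latest))
```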

  25. Exception Reporting

  26. Platform Types • “MVS” mainframe • “Flavors” of Unix • NT servers • Non- “standard” such as Unisys, Tandem, HP3000, native commands, etc. • Different data formats and locations

  27. Level of Detail: Mainframe • Global (by Sysplex) • Partition (LPAR) • Service Class • SMF (use job names and account codes) • Hardware may be shared among workloads and business units.

  28. Level of Detail: Distributed • Global (by server) • Application or Workload • Assigned within data collection product • Assigned within Capacity/Performance database code • Processes by descending CPU% • Overall view of utilization (for consolidation opportunities)

  29. Integrated Products • Products with integrated reporting capabilities • Extract data, port to the existing Performance Database system. • Interface directly with the product databases (e.g. with SAS ODBC). • Link web HTML pages to product graphs.

  30. [Figure: data-flow diagram of performance data collection across multiple platform types. Labels on the diagram include: “MVS” Mainframe; Non-Standard Platforms (Native Commands); Remote Server with Product PDBs; Other Remote Platforms (SNMP); 5 or 10 Min Samples; Hourly Summaries; Hourly CPU trace (if available); Sequential Files; Product Database and Product Database Extract; NRJE; ftp; LAN / WAN; Capacity Mainframe; Capacity Programs; Workstation Graphics; Performance Data Reports; Html and Graphics Files; Web Files]

  31. Data Capture Analysis • Operational side to Capacity Planning: Automation of data collection, performance database population, and report creation • Automated process to check the successful and timely completion of each step of the process • Included in exception analysis mechanism • Ongoing tuning effort as complexity and volume increase

  32. Organization and Reporting of Server Structure • Database of server characteristics and assignments • Business unit classifications • Applications • Capacity Planner assignments • Configuration details • Status color codes

  33. Server Database

  34. Use of Server Database • Central repository of capacity information • Data source to build browser pages on the company’s Intranet • Color-coded view of servers by business area and application.

  35. Matrix-Based Reporting

  36. Application Mapping and Color Coding

  37. Bulk Capacity Planning • Analyze large number of servers in a single pass. • Import measured and trended CPU utilization of each server. • Assign servers to business areas. • Allow assignment of business drivers (with growth rates) to each server • Allow assignment of upgrade thresholds to each server.

  38. “Bulk Capacity Planning” Projections • For each server, calculate the month where projected CPU utilization crosses the upgrade threshold, for three growth rates. • Use “conditional formatting” to flag server upgrade dates as red or yellow if the date is of concern.

  39. Calculation of #Years Before Upgrade • Threshold = Base * (1+AnnualGrowth)^(#Years) • Log(Threshold) = Log(Base) + (#Years) * Log(1+AnnualGrowth) • (#Years) = ( Log(Threshold) - Log(Base) ) / Log(1+AnnualGrowth)
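The closed-form result on this slide translates directly into code. The sketch below is a minimal illustration with hypothetical function names and sample values; returning 0 for a server already at its threshold and None for non-positive growth are assumptions about edge cases the slide does not cover. Base must be greater than zero.

```python
import math


def years_to_threshold(base, threshold, annual_growth):
    """Years until base * (1 + annual_growth) ** years reaches threshold."""
    if base >= threshold:
        return 0.0   # already at or past the upgrade threshold (assumption)
    if annual_growth <= 0:
        return None  # utilization never grows to the threshold (assumption)
    return (math.log(threshold) - math.log(base)) / math.log(1 + annual_growth)


def upgrade_horizons(base, threshold, growth_rates):
    """Evaluate several growth scenarios for one server, as on slide 38."""
    return {g: years_to_threshold(base, threshold, g) for g in growth_rates}


if __name__ == "__main__":
    # Hypothetical server: 35% CPU today, 70% upgrade threshold.
    for growth, years in upgrade_horizons(35, 70, [0.10, 0.25, 0.50]).items():
        print(f"growth {growth:.0%}: {years:.1f} years")
```

Multiplying the result by 12 gives the month offset that slide 38 flags with conditional formatting.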

  40. Bulk Capacity Planning Spreadsheet

  41. Visualization Techniques • Different views of the same data • Web-based (HTML and Java) reports • “Stoplight” (green/yellow/red) coded status • Overlay presentation of trends and forecasts • “Thumbnail” charts with drilldown capabilities

  42. Visualization Techniques (Continued) • Automatic generation of static HTML • Dynamic HTML (CGI bin, web portal) • Representation of servers as color-coded rectangles on a single page, where the area of each rectangle represents its capacity rating.

  43. Different Views of Same Data • Servers can appear more than once (multiple applications and assignments). • “Production” vs. “All” • “All Departments” (no duplicates) vs. “By Department”

  44. HTML with Hyperlinks <I><P ALIGN="CENTER">Table of Contents</P> </I><B><DIR> <DIR><FONT SIZE=3> <P><A HREF="#Intro">Introduction</A></P> ………………………………………………………………. <P><A HREF="#DASD">Storage Analysis</A></P> ………………………………………………………………. <A NAME="DASD"></A> <FONT FACE="Arial" SIZE=5><B><P ALIGN="CENTER">Storage Analysis</P> </B></FONT><FONT FACE="Arial" SIZE=3> <P ALIGN="JUSTIFY">Presented below are workload-based DASD space projections. Additional detail can be found in the <A HREF="#DASDPRF">DASD Workload Profile</A>.</P> <P ALIGN=CENTER><IMG SRC="dasd99.gif" usemap="#Objmap" BORDER=1></P> </FONT><FONT FACE="Arial" SIZE=3><I> <P ALIGN="CENTER">Figure 4 - DASD Space Analysis by Workload</P></I> </FONT>

  45. Web Graphs from SAS: TITLE2 F=SIMPLEX C=RED J=C H=1.3 "ServerA CPU BY DAY"; PROC GCHART GOUT=GOUT.DATE3; WHERE ( SYSTEM =: 'AVG_ServerA' ); VBAR DATE / SUMVAR=CPU SPACE = 0 SUBGROUP=WKLD DISCRETE CAXIS=BLUE CTEXT=RED NAME="DATE" DESCRIPTION="DATE GRAPH"; /****************************************/ GOPTIONS DEVICE=GIF GSFNAME=GIFOUT GPROTOCOL=SASGPASC CBACK=BWH BORDER HSIZE=6 VSIZE=4 GSFMODE=REPLACE GSFLEN=128; PROC GREPLAY IGOUT=GOUT.DATE2 NOFS; REPLAY 1;

  46. Indexed Report

  47. Anchored Links [Figure: Web Page 1 and Web Page 2]

  48. Thumbnail Graphs

  49. Automatic Generation of HTML • Driven by server database • SAS or Visual Basic code - builds web pages and hyperlinks
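The presentation names SAS or Visual Basic for this step. Purely as an illustration of the idea, the Python sketch below builds a static index page with hyperlinks and stoplight colors from a list of server records. The record fields, color values, and per-server file naming are hypothetical and are not taken from the presentation.

```python
from html import escape

# Hypothetical stoplight colors for the status column.
STATUS_COLORS = {"green": "#00AA00", "yellow": "#FFCC00", "red": "#CC0000"}


def build_index(servers):
    """Return a static HTML page listing servers with links and status colors."""
    rows = []
    for server in servers:
        name = escape(server["name"])
        area = escape(server["area"])
        status = server["status"]
        color = STATUS_COLORS[status]
        # Each server name links to its own detail page (assumed naming scheme).
        rows.append(
            f'<tr><td><a href="{name}.html">{name}</a></td>'
            f'<td>{area}</td><td bgcolor="{color}">{status}</td></tr>'
        )
    return "<html><body><table>\n" + "\n".join(rows) + "\n</table></body></html>"


if __name__ == "__main__":
    server_db = [
        {"name": "ServerA", "area": "Card Services", "status": "red"},
        {"name": "ServerB", "area": "Collections", "status": "green"},
    ]
    print(build_index(server_db))
```

Regenerating the page from the server database on a schedule keeps the links and colors in step with the planners' latest assignments.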

  50. Color-Coded Rectangles • “Treemap” paper by Ben Shneiderman, University of Maryland: http://www.cs.umd.edu/hcil/treemaps
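To show how rectangle area can encode a capacity rating, here is a one-level "slice" layout in Python, in the spirit of the slice-and-dice treemap. It is a sketch only: the presentation does not say which layout algorithm was used, and the function name, server names, and ratings are hypothetical. A full treemap would apply the same split recursively, alternating direction at each level.

```python
def slice_layout(weights, x, y, width, height, horizontal=True):
    """Split one rectangle into strips whose areas are proportional to weights.

    Returns name -> (x, y, width, height).
    """
    total = sum(weights.values())
    rects = {}
    offset = 0.0
    for name, weight in weights.items():
        share = weight / total
        if horizontal:
            # Strips placed side by side, each taking a share of the width.
            rects[name] = (x + offset, y, width * share, height)
            offset += width * share
        else:
            # Strips stacked vertically, each taking a share of the height.
            rects[name] = (x, y + offset, width, height * share)
            offset += height * share
    return rects


if __name__ == "__main__":
    ratings = {"ServerA": 2, "ServerB": 1, "ServerC": 1}  # hypothetical ratings
    for name, rect in slice_layout(ratings, 0, 0, 100, 50).items():
        print(name, rect)
```

Each rectangle would then be filled with the server's stoplight color, so a page of rectangles shows both size and status at a glance.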
