Capacity Planning for the Newer Workloads. Linwood Merritt Capital One Services, Inc. linwood.merritt@capitalone.com. Disclaimer. These generic issues are addressed by this presentation: Vendor capacity ratings e-Commerce Continuous availability Data warehousing Growth rates
Capacity Planning for the Newer Workloads Linwood Merritt Capital One Services, Inc. linwood.merritt@capitalone.com
Disclaimer • These generic issues are addressed by this presentation: • Vendor capacity ratings • e-Commerce • Continuous availability • Data warehousing • Growth rates • This presentation contains no specific business-related information.
Introduction: Environment • Capital One • 5th largest card issuer in the United States • Added to the S&P 500 in 1998 • Fortune 500 company (#260) • Managed loans at $48.6 billion as of Q1 2002 • Accounts at 46.6 million as of Q1 2002 • Fortune 100 “Best Places to Work in America” • CIO 100 Award “Master of the Customer Connection” • Information Week “Innovation 100” Award Winner • ComputerWorld “Top 100 places to work in IT”
Outline of Approach • Understand behavior and issues around workloads, hardware, and data • Create projections and build recommendations. • Report the findings.
Outline of Presentation • Discussion of workload types and capacity projection approaches • Overall summary of issues and approaches • Examples
What Workloads? • E-Commerce • Relational database systems • Mainframe-class UNIX • Multiple platforms • New characteristics
e-Commerce Workloads: Direct to Client (business-to-business) • Access • Internet • Leased line • Services • Point of Care / Point of Sale • Value-added analysis
e-Commerce Workloads: Direct to Customer • Access • Internet • Dial-in • Services • Marketing • Account query
e-Commerce Workloads: How to Predict • Take business projections of volumes or users (include fudge factor) • Estimate transaction volumes and CPU/transaction • Convert to normalized unit such as MIPS
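The three prediction steps above can be sketched as a small calculation. This is only an illustrative sketch under stated assumptions, not the presenter's actual method: the function name, the 1.25 fudge factor, the transaction rates, and the 100-MIPS reference engine are all hypothetical values.

```python
def ecommerce_mips(users, tx_per_user_per_hour, cpu_sec_per_tx,
                   engine_mips, fudge_factor=1.25):
    """Estimate peak-hour MIPS demand for a new e-commerce workload.

    All inputs are planner-supplied assumptions:
      users                 business projection of active users
      tx_per_user_per_hour  estimated peak-hour transactions per user
      cpu_sec_per_tx        CPU seconds per transaction on a reference engine
      engine_mips           MIPS rating of that reference engine
      fudge_factor          safety margin applied to the business projection
    """
    planned_users = users * fudge_factor
    tx_per_sec = planned_users * tx_per_user_per_hour / 3600.0
    engines_busy = tx_per_sec * cpu_sec_per_tx  # CPU-seconds used per second
    return engines_busy * engine_mips           # normalize to MIPS


# Hypothetical inputs, for illustration only.
demand = ecommerce_mips(users=10_000, tx_per_user_per_hour=12,
                        cpu_sec_per_tx=0.05, engine_mips=100)
print(f"Estimated peak demand: {demand:.1f} MIPS")
```

Keeping the fudge factor as an explicit parameter makes it easy to show the business how much of the estimate is margin rather than projected demand.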
Relational Databases • Sub-second (OLTP), decision support / data mining • Distributed gateways • Database machines • Redundant data with extracts • How to predict: estimate a factor over current database demand or take usage estimates
Mainframe-Class Unix • Types: Mainframe USS or Linux, Future UNIX vendor offerings • Candidate applications • Web server • Vendor-ported applications • User-ported / new applications • How to predict: • Estimate by timeframe • Add factor to growth rates
Multiple Platforms • Mainframe: plan like existing applications (#users, transactions * CPU/transaction, application look-alikes, sizing tools) • Distributed: use vendor sizing, modeling tools, existing applications • Network: use network simulation tools, rules-of-thumb, bandwidth calculations
New Characteristics • External users • Continuous availability • New user interfaces • Cross-platform
External Users • Drive need for continuous availability • Different access patterns (e.g., doctor’s office vs. call center) • Service level measurement - harder to put agent on external workstations
Continuous Availability • Driven by external users • 24x7 schedule • Application redesign • Data Sharing: CPU overhead • Coupling Facility • Expansion of “prime shift” • 99.999% “up time” • Redundancy, overhead • Availability reporting
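To put the 99.999% "up time" target in concrete terms, the allowed downtime per year can be computed directly. A minimal sketch; the function name and the comparison targets (99.9% and 99.99%) are illustrative assumptions, not figures from the presentation.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a non-leap year


def downtime_minutes_per_year(availability_pct):
    """Allowed downtime per year for an availability target given in percent."""
    return (1.0 - availability_pct / 100.0) * MINUTES_PER_YEAR


for pct in (99.9, 99.99, 99.999):
    minutes = downtime_minutes_per_year(pct)
    print(f"{pct}% up time allows {minutes:.1f} minutes of downtime per year")
```

At 99.999% the budget is a little over five minutes per year, which is why the slide pairs the target with redundancy and availability reporting.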
User Interfaces • TCP/IP - no “definite response” (end-to-end response time measurement) • Multiple internal transactions per “mouse click” • Response time measurement: • Agent on workstations • Scripting from “robots”
Cross Platform Applications • Only unified view: simulation package • Each platform (“silo”) can be analyzed separately. • Different application development groups • May be able to cross-validate user numbers
Types of Implementation (1) • Standalone / “shrink-wrap” • Layered onto legacy applications • New mainframe application code • GUI front-end • Browser • Middle-tier (Unix or NT) • MQSeries - can add middle-tier and new mainframe applications
Types of Implementation (2) • Legacy extracts • Re-engineered legacy applications • Convergence of business rules / applications • Re-usable components • Redundant access • Salvage investment, fix Band-Aids • Simplify logic, reduce platform complexity
What Are We Analyzing? (Mainframe) • MIPS - growth, latent demand, software cost • Memory - track and watch 2 GB limit on central storage (goes away with 64-bit) • I/O - channels, gigabytes of disk, tape • Coupling Facility - Parallel Sysplex, Shared Data, continuous availability • Vendor upgrade paths • New partitions
What Are We Analyzing? (Distributed) • Number and types of platforms • CPU, memory, disk space • Bandwidth • Location of applications / processes • Platform limitations (CPU, memory) • Software pricing considerations • Porting opportunities
Measurement of New Workloads • Summarize by platform: • Workload rules (process or user names) • Processes by descending CPU% • Resources: CPU, memory, disk space, Coupling Facility, network traffic • Growth: • Resources/user/application • Number of users + application changes
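One way to apply workload rules and list workloads by descending CPU% is sketched below. The `mqm*` and `MDA*` patterns follow the naming used in the UNIX example later in the deck; the catch-all "Other" rule, the function names, and the sample measurements are assumptions made for illustration.

```python
from fnmatch import fnmatchcase

# Workload rules as (workload name, glob pattern), checked in order.
# The final catch-all rule is an assumption so every process is counted.
RULES = [("MQSeries", "mqm*"), ("Messaging", "MDA*"), ("Other", "*")]


def classify(name):
    """Return the first workload whose pattern matches a process or user name."""
    for workload, pattern in RULES:
        if fnmatchcase(name, pattern):
            return workload
    return "Unclassified"


def summarize(samples):
    """Sum CPU% per workload and sort by descending CPU%.

    samples: iterable of (process_or_user_name, cpu_pct) pairs.
    """
    totals = {}
    for name, cpu_pct in samples:
        workload = classify(name)
        totals[workload] = totals.get(workload, 0.0) + cpu_pct
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Hypothetical measurements, for illustration only.
samples = [("mqm01", 12.5), ("MDA_router", 48.0), ("mqm02", 7.5), ("cron", 1.0)]
report = summarize(samples)
for workload, cpu_pct in report:
    print(f"{workload:<10} {cpu_pct:5.1f}%")
```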
Distributed Approach • Consider tiers of service (not currently at Capital One) • Address service level measurement issue • Implement reporting • Add to Capacity Plan • “Silo” vs. “Application”
Tiers of Service: “Platinum” • Most expensive • Modeling product • Install in one server for each major application, use collection product for other servers
Tiers of Service: “Gold” • Collection product • Capacity planning with Rules of Thumb
Tiers of Service: “Brass” • Least expensive (man-hours only) • “Native” • Unix scripts • NT PerfMon
Service Level Measurement • API call at workstation - “Application Response Measurement” (ARM) or Windows 2000 trace API calls • Agents: software tracing of Windows API calls - can be installed in a subset of end-user base (sampling) • Scripting (“robots”) • Stop watch sampling and logging
Scope of Analysis • Silos • Look at each hardware/application environment independently. • Applications • Look at each application as a whole. • Application instrumentation • Inference: put platform silos together.
Analyzing the Data: Growth Rates • General list of business plans • List of technical scenarios • Timeline • Estimate median and maximum likely MIPS/CPU/users/business units • Derive scenario growth rates
Analyzing the Data: Additional Resources • Parallel Sysplex (Coupling Facility): important for continuous availability, level set functionality • Disk / channels / tape: disk megabytes, channel maximum, tape connectivity • Communications connectivity: new partitions for availability • Memory: 2 GB constraint, 64-bit
Growth • “Baseline” growth • “Scenario” growth • Independent events (merger/acquisition, potential major project)
Example 1: Mainframe Upgrade • Task force, led by Capacity Planner • Driven by expiring three-year lease (CPU replacement, three-year planning horizon) • “Vendor parade” - presentations and dialogues • Upgrade paths • Technology / service differences • References / site visits • Capacity sizing: MIPS charts, LSPR / sizing tools
Mainframe Upgrade Deliverables • Document • Business drivers and technical scenarios • Growth forecasts • Vendor options and growth paths • Coupling Facility / Parallel Sysplex • Evaluation • Difference thresholds: MIPS claims, price/MIPS, ICF • Differentiators
Business and Technical • Technical Scenarios: • Consolidation of distributed servers • Continuous availability • Significant external business • Data Warehousing • Acquisition/merger • Business Drivers: • Cost management • External business • Improved data access • Business expansion
Projections • Make educated guess by timeframe for each scenario • Add to “baseline” growth • Convert to growth rate • Use both “baseline” and “scenario growth” • Compare maximum scenario growth to maximum for platform family
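The projection steps above can be sketched as follows. This is a simplified illustration under stated assumptions: scenario increments are added to the compounded baseline without being compounded themselves, and the starting MIPS, growth rate, scenario increments, and platform maximum are all hypothetical.

```python
def scenario_projection(start_mips, baseline_rate, scenario_adds, years):
    """Combine baseline growth with scenario increments.

    start_mips     current demand
    baseline_rate  baseline annual growth as a fraction (0.25 = 25%/yr)
    scenario_adds  educated-guess MIPS added by each scenario in the period
    years          planning horizon in years
    Returns (baseline_end, scenario_end, scenario_rate).
    """
    baseline_end = start_mips * (1.0 + baseline_rate) ** years
    scenario_end = baseline_end + sum(scenario_adds)
    # Convert the combined end point back into an equivalent annual growth rate.
    scenario_rate = (scenario_end / start_mips) ** (1.0 / years) - 1.0
    return baseline_end, scenario_end, scenario_rate


# Hypothetical inputs, for illustration only.
baseline_end, scenario_end, scenario_rate = scenario_projection(
    start_mips=1000.0, baseline_rate=0.25, scenario_adds=[300.0, 200.0], years=3)

PLATFORM_MAX_MIPS = 2500.0  # hypothetical maximum for the platform family
fits = scenario_end <= PLATFORM_MAX_MIPS

print(f"Baseline end: {baseline_end:.0f} MIPS")
print(f"Scenario end: {scenario_end:.0f} MIPS ({scenario_rate:.1%}/yr)")
print("Fits platform family" if fits else "Exceeds platform family maximum")
```

Reporting both the baseline and the scenario rate keeps the two growth views separate, as the slide recommends.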
Scenario Timeline • Period1: Initial muck exploitation with 250 users; first Parallel Sysplex exploitation • Period2: First mainframe Wk1 application • Period3: (Potential acquisition); MajorProject A with 100 users, 150% CAGR; new DB2 functionality exploitation • Period4: 64-bit OS/390; full Data Sharing exploitation (IMS, CICS, DB2) • Period5: Full subsystem redundancy (IMS, CICS, DB2) • Period6: 24x7 operation • Period7
Vendor Upgrade Paths: Detail • Use logarithms: Start × CAGR^x = Threshold, so x (years) = log(Threshold/Start) / log(CAGR), where CAGR is the annual growth factor (for example, 1.40 for +40%/yr)
Model     MIPS   MSU   +40%/Yr   +25%/Yr
GS2068E    952   160   Aug-00    Sep-00
GS2074E   1013   171   Oct-00    Dec-00
GS2084E   1141   193   Apr-01    Jul-01
GS2094E   1260   213   Sep-01    Dec-01
GS2104E   1378   234   Nov-01    May-02
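The logarithm formula can be applied to each model rating as sketched below. The MIPS ratings are taken from the table; the starting demand of 900 MIPS is a hypothetical value, since the presentation does not state the starting point, so the output is not expected to reproduce the dates shown.

```python
import math


def years_to_threshold(start, threshold, growth_rate):
    """Years until demand growing at growth_rate per year reaches threshold.

    Solves start * (1 + growth_rate)**x = threshold for x.
    growth_rate is a fraction (0.40 = +40%/yr).
    """
    return math.log(threshold / start) / math.log(1.0 + growth_rate)


# Model ratings from the table; START_MIPS is an assumption.
MODEL_MIPS = {"GS2068E": 952, "GS2074E": 1013, "GS2084E": 1141,
              "GS2094E": 1260, "GS2104E": 1378}
START_MIPS = 900.0

for model, mips in MODEL_MIPS.items():
    fast = years_to_threshold(START_MIPS, mips, 0.40) * 12
    slow = years_to_threshold(START_MIPS, mips, 0.25) * 12
    print(f"{model}: {fast:5.1f} months at +40%/yr, {slow:5.1f} months at +25%/yr")
```

Adding the resulting months to the date of the starting measurement gives the month in which each model's rating would be reached under each growth rate.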
Example 2: UNIX Modeling • Modeling product installed on MQSeries server • Application running with a known number of users • Projected rollout schedule used to drive model • Mainframe side: CICS application, IMS load
UNIX Platform Workloads • Two primary workloads: • MQSeries userids (mqm*) - memory intensive • Messaging application processes (MDA*) - “CPU intensive”
Workload Modeling Methodology • MQSeries - Calculate relative workload intensity, enter model ratio. • Messaging application processes - Keep constant until application is removed from platform (“design loop” - always uses 1 CPU). Must adjust across CPU upgrade to continue using 1 CPU.
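The two modeling rules above can be sketched numerically. This sketch assumes that MQSeries workload intensity scales linearly with the number of users and that the "design loop" always consumes exactly one processor; the function names and all figures are hypothetical.

```python
def intensity_ratio(projected_users, baseline_users):
    """Relative workload intensity to enter into the model.

    Assumes intensity scales linearly with the user count.
    """
    return projected_users / baseline_users


def loop_share_of_machine(n_cpus):
    """Fraction of total machine capacity used by a loop that occupies one CPU."""
    return 1.0 / n_cpus


# Hypothetical rollout: 100 users measured, 150 projected.
ratio = intensity_ratio(projected_users=150, baseline_users=100)
print(f"MQSeries model ratio: {ratio:.2f}")

# The looping workload stays at one CPU, so its share of the machine
# changes when the processor count changes across an upgrade.
for cpus in (2, 4):
    print(f"{cpus}-way server: loop uses {loop_share_of_machine(cpus):.0%} of capacity")
```

This is why the looping workload has to be adjusted by hand across a CPU upgrade: a model that scales it like ordinary work would no longer show it using one CPU.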
Track Across Upgrade • [Chart annotated “CPU Upgrade”]
Model Presentation • Timeframe: April 2000 • #Users: 180, 100 • Ratios: 1.27, 1.00 • Config: F50/02, 2GB • Comment: Add Event1 Users
Validation - Tracking Users (on mainframe)
//ECLUSRS  EXEC SASV8,REGION=0M
//ECLD1    DD DSN=XYZ.PRD.A.AAAPRD.I.VOLFIL,DISP=SHR
//ECLDPDB  DD DSN=CAPLAN.PRD.ECLDPDB,DISP=OLD
//SYSIN    DD *,DLM=@@
data ecld1;
  format date date.;
  format dt datetime.;
  INFILE ECLD1 MISSOVER;
  INPUT @1  RECNUM  $CHAR5.
        @6  RECTYPE $CHAR8.
        @14 USERCT  $CHAR5.
        @19 USERMAX $CHAR5.;
  if recnum =: '99999' and rectype =: 'TCSCONFG';
  dt = datetime();
  date = datepart(dt);
  hour = hour(dt);
data ecldpdb.users;
  update ecldpdb.users ecld1;
  by date hour;
proc print;
  title 'Ecloud1 Users';
Example 3: Server Replacement • Project: replace “old” NT servers • Application: Imaging servers • Capacity sizing data: • Rules-of-thumb analysis by vendor, using projected claims/minute and processor clock speeds • Benchmark information