Unicenter Service Desk (USD) Preliminary Scalability Recommendations for r11 Revised January 11 2007
Objective • Primary objective: present sizing and location recommendations for the following Unicenter Service Desk components: • Primary Server • MDB • Secondary Server • Domserver/Webserver pairs (on primary/secondary) • Web and Java Clients • IIS Servers (for large clients) • Network infrastructure (latency and bandwidth)
The Architecture © 2005 Computer Associates International, Inc. (CA). All trademarks, trade names, services marks and logos referenced herein belong to their respective companies.
Components • USD Primary Server (with webdirector in all our tests) • USD Secondary Server(s) (optional) • Domserver/Webserver pair (on above servers) • MDB – may be local or remote to Primary Server • Java Client (optional – not preferred, will go away) • Web Client
Documented Minimum Recommendations • CPU: • Minimum: single processor, 2 GHz or better • Preferred: dual processor, 2 GHz or better • RAM: • Minimum: 2 GB • Preferred: 4 GB • Disk Space: 20 GB minimum, but allow for incremental growth to accommodate MDB growth • Java Client: single processor, 1 GHz (or better) with minimum 1 GB RAM
What you need to know • Location of MDB – local or remote • Number of boxes at each location • Bandwidth from each location to central/managing site • Network latency – USD can do lots of round trips • Network and firewall port/direction restrictions • Failover/Fault Tolerance requirements • Monitor CPU and memory usage, especially on Domserver/Webserver pairs – add more resources when the pairs are waiting on CPU or memory – add another pair if memory and CPU are available
Other Factors • Optimization and tuning tips to enhance scalability • Dedicated or shared machines? • Planning for future growth • Best practice guidelines for filtering, monitoring and policy can reduce load • Workflow
Architecting/Tuning Resources
Resources • CPU • Memory • I/O Subsystem • Network (bandwidth and round trip latency) • These are interrelated – one problem can mask others • Example: SQL does too much I/O, so you add memory • SQL then caches disk in memory and ends up using too much CPU • The real problem could be a missing index
CPU Consumption Remediation • Add CPU(s) • Remove load (fewer Webserver/DOMServer pairs) • Tune load – look at customization/configuration and adjust • Are 10M contacts bad? • Are 10M assets bad when you only need 10k? • Is logging everything bad? • Is reporting against the production DB bad?
I/O Subsystem Remediation • Split service desk disk from other applications • Split SQL log, tempdb, data, index • Add arms (more units in stripe set, faster stripe set) • Split reporting to a separate DB • Look at what I/O you are doing – may be logging level, audit, too much data or data not being used, index maintenance, shared MD, wrong phase of the moon
Memory Consumption Remediation • Add memory • Reduce load • Look at why memory is being used – could be missing index, reporting against an online transaction oriented database
Network Consumption Remediation • Top 3 things: measure – measure – measure • Add bandwidth or reduce latency – but fix the one that matters • Remove load • Deploy a secondary server in the geographic locale
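To illustrate "fix the one that matters", here is a minimal sketch of a page-time estimate that separates the latency cost of round trips from the transfer cost of bandwidth. The model and every input value below are hypothetical illustrations, not USD measurements; real figures should come from measuring your own network.

```python
def estimated_page_time_s(round_trips, rtt_ms, payload_kb, bandwidth_kbps):
    """Rough page time in seconds: round-trip latency cost plus transfer time.

    All parameters are illustrative assumptions, not values from the deck.
    """
    latency_s = round_trips * rtt_ms / 1000.0
    transfer_s = (payload_kb * 8.0) / bandwidth_kbps  # KB -> kilobits
    return latency_s + transfer_s


# Hypothetical page: 20 round trips, 200 KB payload, 150 ms RTT, 2 Mbit/s link.
baseline = estimated_page_time_s(20, 150, 200, 2000)
double_bandwidth = estimated_page_time_s(20, 150, 200, 4000)
half_latency = estimated_page_time_s(20, 75, 200, 2000)

print(f"baseline:         {baseline:.1f} s")
print(f"double bandwidth: {double_bandwidth:.1f} s")
print(f"half latency:     {half_latency:.1f} s")
```

With these made-up inputs, halving latency helps far more than doubling bandwidth, because the round trips dominate. A chatty page on a slow, wide link behaves the opposite way, which is why measuring comes first.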
Management Servers
Management Servers - Topics • Primary Server • Secondary Server(s) • Domserver/Webserver pairs • IIS Servers
Primary Server Guidelines • Dual CPU preferred with 2GB RAM minimum (4GB desirable) • Monitor CPU and memory consumption/availability on ALL servers – this is the best indication of load/reserve capacity
Secondary Server • Dual CPU preferred with 2GB RAM minimum (4GB desirable) • One Domserver/Webserver pair per CPU – 1GB RAM per pair • One Domserver/Webserver pair for each 300 simultaneous users (range is 200-400 depending on type of load, but it is unusual to need a pair for 200 users unless all are analysts) • Monitor CPU and memory usage – remove a Domserver/Webserver pair or add more resources if nearly full – add more pairs if both CPU and memory are available • Multiple secondary servers are required for large installations or those with resource constraints on one secondary server • Secondary servers may be placed within geographic locales
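The sizing rules on this slide can be turned into a rough calculation. The sketch below uses the slide's figures (300 users per pair, one pair per CPU, 1 GB RAM per pair); the function names are hypothetical, and the result ignores pairs hosted on the primary server and memory needed by the operating system.

```python
import math

USERS_PER_PAIR = 300   # slide's planning figure; stated range is 200-400
RAM_GB_PER_PAIR = 1    # slide: 1 GB RAM per Domserver/Webserver pair


def pairs_needed(simultaneous_users, users_per_pair=USERS_PER_PAIR):
    """Domserver/Webserver pairs for a given simultaneous-user count."""
    return max(1, math.ceil(simultaneous_users / users_per_pair))


def secondary_servers_needed(simultaneous_users, cpus_per_server=2,
                             users_per_pair=USERS_PER_PAIR):
    """Servers needed at one pair per CPU (dual CPU assumed by default)."""
    pairs = pairs_needed(simultaneous_users, users_per_pair)
    return math.ceil(pairs / cpus_per_server)


users = 3000  # hypothetical site size
pairs = pairs_needed(users)
print(pairs, "pairs,", pairs * RAM_GB_PER_PAIR, "GB RAM for pairs,",
      secondary_servers_needed(users), "dual-CPU servers")
```

For an analyst-heavy load, passing `users_per_pair=200` gives the conservative end of the slide's range.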
Secondary Server – Geographic Locales • Secondary servers may be placed within geographic locales • There is less bandwidth consumed from primary to secondary than from secondary to web client • Balance cost of secondary with improved performance benefit of a local secondary server • Poor or congested bandwidth from primary to a locale can drive use of a local secondary – but you still need reliable and reasonably fast access from secondary to primary • Growth in round trip latency is a driver for needing a secondary
Secondary Server – dom/web server pairs • CPU utilization • Our tests at target load were in the range of 20-40% utilization • We do not expect response time to vary until CPU utilization reaches 60-80% • If response time is good, there is no need to add CPUs even at 80% • If CPU utilization is consistently low and memory is available, consider adding a domserver/webserver pair to improve response time, if an improvement is needed • Memory utilization • Our tests had effectively no disk paging • Your mileage may vary but we favor minimizing disk paging • Add memory if constrained or if paging is observed
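The guidance on this slide reads as a small set of decision rules. The sketch below encodes one possible reading; the function name, the 40% "low" and 80% "high" thresholds, and the fallback advice for the middle range are assumptions chosen for illustration, not values the deck prescribes.

```python
def pair_tuning_advice(cpu_pct, memory_free, paging, response_ok,
                       low_cpu_pct=40, high_cpu_pct=80):
    """Return suggested actions for one server hosting dom/web server pairs.

    Thresholds are hypothetical defaults, not figures mandated by the slides.
    """
    advice = []
    # Slide: add memory if constrained or if paging is observed.
    if paging or not memory_free:
        advice.append("add memory")
    if response_ok:
        # Slide: if response time is good, no need to add CPUs even at 80%.
        advice.append("leave CPU as is")
    elif cpu_pct <= low_cpu_pct:
        # Slide: low CPU plus available memory suggests adding a pair.
        if memory_free and not paging:
            advice.append("add a domserver/webserver pair")
    elif cpu_pct >= high_cpu_pct:
        advice.append("add CPU or remove a pair")
    else:
        # Assumption: mid-range CPU with poor response points elsewhere.
        advice.append("check I/O and network before adding CPU")
    return advice


print(pair_tuning_advice(cpu_pct=30, memory_free=True, paging=False,
                         response_ok=False))
```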
Secondary Server • Recommended best practice is multiple machines or partitions versus one big machine • Additional separate IIS servers may be required to maximize throughput (10k user test used four IIS servers) – estimate is one IIS server (low end server) for each 2500 users • No point in using large hardware for IIS servers • Local secondary servers can reduce WAN bandwidth need to primary but will not “fix” poor or very slow connectivity
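The IIS estimate on this slide is a simple ceiling division. A minimal sketch, with a hypothetical function name, using the slide's figure of one low-end IIS server per 2500 users:

```python
import math

USERS_PER_IIS_SERVER = 2500  # slide's estimate for a low-end IIS server


def iis_servers_needed(users, users_per_server=USERS_PER_IIS_SERVER):
    """Estimated separate IIS servers for a given user count."""
    return max(1, math.ceil(users / users_per_server))


print(iis_servers_needed(10_000))  # the 10k user test used four IIS servers
```

The estimate agrees with the four IIS servers used in the 10k user test.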
The MDB
What is it? • MDB – single database containing common tables and multiple product-specific tables previously in separate product databases • Stores all USD data • As the number of rows increases, table size and disk space usage increase as well • The database needs disk space not only at the database location but also at the work location (e.g., for sorting, temp and transient files within the database)
Single or Multiple? • Multiple MDBs can be used to accommodate organizational requirements or network considerations • Distributed MDB – component databases on different computers (e.g., remote or local MDB) • Should have one enterprise MDB serving as the central database (provides a complete view of enterprise state that other products can use) • See Federated MDB presentation for more information on multiple MDBs
MDB Planning Guidelines • Use of Reiser file system is NOT recommended (not suitable for large databases) • Increase computer requirements as necessary for the enterprise MDB when the MDB is integrating information for multiple CA products • MDB is business critical! It should be highly available • MDB on a cluster can be a performance benefit • MDB on 64 bit is also a significant performance benefit • Cannot be installed on a Windows Domain Controller
The Clients
Client Options • Java Client: optional – if installed, admin functions are removed • Web Client: integrated, web-based user interface for all administrative functions – new for r11! • Web Client is the strategic direction for future use and where new features will be added
Scalability Testing
Testing Guidelines • Performance was tested based on the following: • Network latency (between primary and secondary) • Effect of DMZ and firewall requirements/placement • Potential for bottleneck if multiple users attach docs • Identify “breaking points” - point at which performance started to degrade, suggesting additional resources were required • Description of tests follows
Perfmon Data • Collected from Primary and Secondary Servers • Identifies: • System CPU utilization • System memory utilization • USD CPU utilization • USD Memory utilization • For MS SQL all basic SQL statistics • For Ingres IMA statistics and reports
Silk Performer Results • Saved from boxes running Silk Performer scripts • Identifies: • # failed and completed transactions • Page response time (min., max., ave.) • # users started and halted • Total # errors
Test Scenario 1: Web Engine Performance • Objective: identify max. # concurrent users supported per web engine before performance degrades • Configuration: 1 Primary Server w/ 1 DOM server and web engine on a quad server [Diagram labels: Primary Server; DOMserver; WebEngine ×3]
Test Scenario 1: Goal • Test: Simulate 300 analysts logging in to create issues and 700 users logging in to create requests with concurrent connections peaking at 900
Test Scenario 2: Web Engine & DOM Server • Objective: Compare performance of single, dedicated web engine/DOM server pairing vs. multiple web engines • Configuration: 300 users in single DOM server/three web engines (on one quad), and in a three paired DOM Server/web engine environment [Diagram labels: Primary Server; DOMserver ×3; WebEngine ×3]
Test Scenario 2: Goal • Users log in, create issues/requests, log out and repeat • # concurrent users gradually increase (1 user/8 seconds) to breaking point, then stay at this level.
Test Scenario 3: Multiple Servers • Objective: compare breaking point of running on multiple small servers vs. one large server • Configuration: 3 DOM server/web engine pairs on quad Primary Server [Diagram labels: Primary Server; Secondary server ×3; DS ×3; WE ×3]
Test Scenario 3: Goal • 300 users log in, create issues/requests, log out, repeat • # concurrent users increase (1 user/8 seconds) to breaking point and stay at this level
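The ramp rate of 1 user per 8 seconds implies how long a run takes to reach full load. The sketch below assumes the first user starts at time zero, which the slides do not state; the function names are hypothetical.

```python
SECONDS_PER_USER = 8  # slide: concurrent users increase at 1 user/8 seconds


def users_started_at(elapsed_s, target_users, seconds_per_user=SECONDS_PER_USER):
    """Users started after elapsed_s seconds, assuming the first starts at t=0."""
    return min(target_users, 1 + elapsed_s // seconds_per_user)


def time_to_full_load_s(target_users, seconds_per_user=SECONDS_PER_USER):
    """Seconds until the last of target_users has started."""
    return (target_users - 1) * seconds_per_user


ramp = time_to_full_load_s(300)
print(f"300 users reached after {ramp} s (about {ramp / 60:.0f} minutes)")
```

Under this assumption a 300-user ramp takes roughly 40 minutes before the steady-state portion of the test begins.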
Test Scenario 4: Max. Concurrent Users • Objective: identify maximum # concurrent users the web engine can handle before errors occur or users are halted • Configuration: quad server, start w/ single Primary server/DOM server/Web Engine and scale up [Diagram labels: Primary Server; Secondary Server; Secondary Server…; DS ×6; WE ×6]
Test Scenario 4: Goal • Add more secondary servers (up to 4), DOM Servers and Web Engines (up to 5 each) and scale up to Best Practice • Add Web Director, configure web engines to use SSL and rerun job
Test Scenario 5: Network Impact on Performance • Objective: determine impact of network physical/logical definitions on performance • Simulate scenario 4 and identify latency between primary and secondary servers.
Test Scenario 6: DMZ and Firewall Performance Impact • Objective: determine how performance is affected by DMZ and firewall definitions • Configuration: duplicate scenario 5 and restrict SLUMP port # to mimic implementation of USD across DMZ or firewall (i.e., Primary Server in intranet, Secondary Server in DMZ)
Test Scenario 7: MDB Location • Objective: to determine performance impact of locating the MDB locally vs. remote • Configuration: duplicate configurations (with the exception of MDB location) and simulated user actions
Test Scenario: Migration Impact • Objective: to determine average time required to migrate data from USPSD 6.0 to USD r11 • Identify elapsed time for data migration
Summary of Results – r11 • Sustained load of 10,000 concurrent users for 8 hour period • None of the partitions were taxed at any time • System was very responsive • Ticket save time ranged from 1.5-3.4 seconds • CPU utilization ranged from 20%-40% • Tickets created at a rate of 3.1 per second (roughly 180/minute) • At 180/minute, this is the equivalent of about 94.6 million tickets per year for a 24x7x365 helpdesk! • Latent capacity estimated at 20-50% (12k-15k users)
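The annual ticket figure and the latent capacity range on this slide follow from simple arithmetic on the 180 tickets/minute rate and the 10,000-user test load. A short check, with hypothetical function names:

```python
MINUTES_PER_YEAR = 60 * 24 * 365  # 525,600 minutes in a 365-day year


def tickets_per_year(tickets_per_minute):
    """Annual ticket volume for a helpdesk running around the clock."""
    return tickets_per_minute * MINUTES_PER_YEAR


def latent_capacity_users(tested_users, headroom_pct):
    """User count implied by a given percentage of spare capacity."""
    return tested_users * (100 + headroom_pct) // 100


print(tickets_per_year(180))              # 94608000
print(latent_capacity_users(10_000, 20))  # 12000
print(latent_capacity_users(10_000, 50))  # 15000
```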
Summary of Results – USPSD 6 • Sustained load of 8,000 concurrent users for 8 hour period • None of the partitions were taxed at any time • System was very responsive • Ticket save time was roughly 1 second • CPU utilization ranged from 10%-15% • Tickets created at an average rate of under 3 per second • Our hardware was not stressed at this load level • Latent capacity estimated to be comparable to r11