Horizontal Scaling and Reliability Planning and Testing for Heavy Load. Steven Goeke Bill Frikken. Outline. Project Background Our Motivation Testing Tools, Techniques, and Methods Results Conclusions. Background on Georgia Tech. Six Colleges 16,000 Graduate and Undergraduate
Horizontal Scaling and Reliability Planning and Testing for Heavy Load Steven Goeke Bill Frikken
Outline • Project Background • Our Motivation • Testing Tools, Techniques, and Methods • Results • Conclusions
Background on Georgia Tech • Six Colleges • 16,000 Graduate and Undergraduate • 5,000 Faculty and Staff • The NSF ranks Tech 2nd in engineering R&D and 4th in industry-sponsored R&D • Four Campuses
Background on WMU • Carnegie Research Extensive Institution • Seven Colleges • Six Regional Campuses • 28,000 Graduates and Undergraduates • 3,500 Faculty and Staff • Business Technology Research Park
Motivation • It started with Wireless Western • Anytime, anywhere access to resources • A better e-communication infrastructure • Multi-platform, open source, end-of-life system • Be innovative with the solutions
And then along came SIS • A much-needed replacement for the student information system • Eliminate Social Security numbers • Budget challenges – student records fee • Take advantage of a portal solution • GoWMU.wmich.edu – portal delivery • Content development in 4 weeks! • SSO (Single Sign-On) capabilities • Seamless access to Banner Self-Serve, WebCT, ECS, …
We Want a Portal!! • Facilitate student/faculty communication • Enhance the student experience • Prestige • uPortal or Luminis • Banner – 9 years • WebCT – 4 years
Motivation • BuzzPort is becoming mission critical • Expanding user base • Cost savings
Current GT Architecture (diagram) • GT Network: WebCT, Banner Self-Service, Others • Firewall(s) • Load Balancer • Private Network • Trusted Network • Production, Test, and Development environments, each with Luminis 3.2, Calendar, Portal DBs, and Banner
GT FOS Architecture (diagram) • GT Network: WebCT, Banner Self-Service, Others • Load Balancer • Private Network • WS ×9 • Firewall(s) • Trusted Network • Production, Test, and Development environments, each with Resource, Calendar, Portal DBs, and Banner
WMU Architecture • What technologies deliver these various services • Sun hardware • Cisco 11503 Load Balancers • StorageTek D280 Storage Area Network • Single enterprise UserID – “Bronco NetID” • Kerberos • LDAP – Sun JES Directory • “Legacy” provisioning services • Multiple web-authentication schemes
Test and production hardware WMU • Test environment • 3 – Sun V210s – 1.34GHz, 2GB • 1 back-end box – PDS • 2 front-end web servers • Production environment • 2 back-end boxes – Sun V480s – 4 × 1.0GHz, 8GB • 3 front-end – Sun V210s – 2 × 1.34GHz, 8GB
Performance and growth • Back-end services are clustered and highly redundant • Veritas HA Cluster for JES • Dual drive paths to SAN • Front-end services are load-balanced • Horizontal scaling wherever possible • Multiple SunFire V1xx and V2xx servers
Testing Tools and Techniques • Georgia Tech: • RadView WebLOAD • 200, 500, 1000 Users • Ramp up over 30 minutes to target users • Sustain the load for 30 minutes • Simple Agenda: • Log in, navigate to a group, post a message, log out • Measure: • Login Time, First Page Time, Average Page Time, and Response Time
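The load profile on this slide (ramp up over 30 minutes, then hold for 30 minutes) can be sketched as a small function. This is only an illustration: the slide does not say how users were added during the ramp, so a linear ramp is assumed, and the name active_users is hypothetical and not part of any load-testing tool.

```python
def active_users(t_sec, target_users, ramp_sec=30 * 60, sustain_sec=30 * 60):
    """Virtual users that should be running t_sec seconds into the test.

    Assumption: users are added linearly during the ramp-up period.
    """
    if t_sec < 0 or t_sec > ramp_sec + sustain_sec:
        return 0  # outside the test window
    if t_sec >= ramp_sec:
        return target_users  # sustain phase: hold the target load
    return target_users * t_sec // ramp_sec  # ramp-up phase


# Planned load at 0, 15, 30, 45, and 60 minutes for each target on the slide.
for target in (200, 500, 1000):
    profile = [active_users(m * 60, target) for m in (0, 15, 30, 45, 60)]
    print(target, profile)
```

Under the linear assumption, a 1000-user run would be at 500 users after 15 minutes and at full load from minute 30 onward.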
GT Load Test 1 • Date: 3/9/2005 • One web server (280R/2x1.2G/2G mem) • Time: 3:06PM – 3:44PM • Duration: 2327.48 sec • 1000 Sessions
GT Load Test 1 - Results • Max Time to First Page – 5.098 sec (1000VC) • Max Login Time – 9.294 sec (1000VC) • Average Time to 1st Page: 2.337 sec • Average Login Time: 2.913 sec
GT Load Test 2 • Date: 3/9/2005 • Three web servers • Time: 4:04PM – 5:06PM • Duration: 3766.32 sec • 500 Sessions
GT Load Test 2 - Results • Max Time to First Page – 5.098 sec (1000VC) • Max Login Time – 9.294 sec (1000VC) • Average Time to 1st Page: 2.337 sec • Average Login Time: 2.913 sec
Test Tools • JMeter • Apache tool to provide load testing and performance-based testing and evaluation • Badboy • Export functional test for JMeter load testing • 1000 users within 30 minutes
Test Results – WMU initiated • Date: 6/8/2005 • 1000 Users over 30 minutes • Avg Login Time: 3.5 Seconds • Avg Page Load: ~1 second – 2.4 seconds • Max CPU Utilization • 15% Server 1 • 13% Server 2 • Avg Session Activity – 47 seconds
Test Results – SCT initiated • Date: 6/6/2005 • 1000 Users over 4 Hours (20 min ramp up) • Avg Login Time: 3.932 Seconds • Max Login: 4.76 Seconds • Min Login: 2.758 Seconds • Avg Page Load: ~1 second – 2.4 seconds • Max CPU Utilization – 54% Single Server • Session Activity over 4 hours
Test Results – Joint evaluation • Anticipated environment exceeded expectations • 2 Sources provided validation • Confidence moving ahead
Luminis FOS – Features & Limitations • Limited failover capability - No session persistence • Still have single points of failure • Replicate the LDAP • Replicate the DB • Horizontal scalability at web tier • Phased patching
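What "limited failover, no session persistence" means for a user can be shown with a toy model. This is a sketch under stated assumptions, not a description of how Luminis or the load balancer works internally: WebNode, Balancer, and the session layout are all hypothetical names, sessions are assumed to live only in the memory of the web node that created them, and the balancer is assumed to use sticky routing with round-robin fallback.

```python
class WebNode:
    """Hypothetical web-tier node that keeps sessions in local memory only."""

    def __init__(self, name):
        self.name = name
        self.up = True
        self.sessions = {}  # not replicated to other nodes

    def handle(self, session_id):
        if session_id not in self.sessions:
            # Unknown session: the user has to authenticate on this node.
            self.sessions[session_id] = {"logged_in": True}
            return "login required"
        return "ok"


class Balancer:
    """Hypothetical balancer: sticky routing, round-robin over healthy nodes."""

    def __init__(self, nodes):
        self.nodes = nodes
        self.sticky = {}  # session_id -> node
        self._next = 0

    def _pick(self):
        for _ in range(len(self.nodes)):
            node = self.nodes[self._next % len(self.nodes)]
            self._next += 1
            if node.up:
                return node
        raise RuntimeError("no web nodes available")

    def route(self, session_id):
        node = self.sticky.get(session_id)
        if node is None or not node.up:
            node = self._pick()  # first request, or failover to another node
            self.sticky[session_id] = node
        return node.name, node.handle(session_id)


nodes = [WebNode("ws1"), WebNode("ws2"), WebNode("ws3")]
lb = Balancer(nodes)
first = lb.route("alice")        # lands on ws1 and logs in
second = lb.route("alice")       # same node, session is found
nodes[0].up = False              # ws1 fails
after_fail = lb.route("alice")   # request still served, but session is gone
print(first, second, after_fail)
```

In this model the site stays up when a web node fails, which is the horizontal-scaling benefit, but the user is asked to log in again because the session was never copied to another node.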
Conclusions • Luminis FOS is a significant improvement • More complex • Machine allocation • Will we be implementing it?
Next Steps • Test result conclusions • More stable testing environment • Production considerations • Test needs to resemble production • Horizontally scale before putting into production • Removing single points of failure
Critical Success Factors • Top-level support • Good planning • Flexible project plan • Being “big picture” while still attending to details • Solid infrastructure • Relationships
Contact Information • Steven Goeke • steven.goeke@oit.gatech.edu • Georgia Tech www.gatech.edu • Buzzport buzzport.gatech.edu • Bill Frikken • bill.frikken@wmich.edu • Western Michigan University www.wmich.edu • GoWMU portal gowmu.wmich.edu • Office of Information Technology www.wmich.edu/oit
GT Load Test 3 • Date: 3/10/2005 • Three web servers • Time: 1:21PM – 1:47PM • Duration: 1592.66 sec • 1000 Sessions
GT Load Test 3 – results • Max Time to First Page – 4.067 sec (786VC) • Max Login Time – 0.983 sec (76VC) • Average Time to 1st Page: 2.178 sec • Average Login Time: 0.564 sec
GT Load Test 4 • Date: 3/10/2005 • Three web servers • Time: 2:18PM – 3:15PM • Duration: 3162.2 sec • 200 Sessions
GT Load Test 4 – results • Max Time to First Page – 1.125 sec (34VC) • Max Login Time – 0.406 sec (150VC) • Average Time to 1st Page: 0.803 sec • Average Login Time: 0.283 sec
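The two runs at 1000 sessions (Test 1 on one web server, Test 3 on three) can be compared with simple arithmetic. The figures below are copied from the result slides; the ratio helper is a hypothetical name used only for this illustration. Each configuration was run once, on different days, so the ratios are indicative rather than statistically meaningful.

```python
# Averages in seconds, taken from the Test 1 and Test 3 result slides.
test1 = {"web_servers": 1, "sessions": 1000, "avg_login": 2.913, "avg_first_page": 2.337}
test3 = {"web_servers": 3, "sessions": 1000, "avg_login": 0.564, "avg_first_page": 2.178}


def ratio(before, after):
    """How many times smaller 'after' is than 'before'."""
    return before / after


login_ratio = ratio(test1["avg_login"], test3["avg_login"])
first_page_ratio = ratio(test1["avg_first_page"], test3["avg_first_page"])

print(f"Average login time: {login_ratio:.2f}x lower with three web servers")
print(f"Average time to first page: {first_page_ratio:.2f}x lower with three web servers")
```

On these numbers the average login time drops by roughly a factor of five, while the average time to first page changes only slightly.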
Results (Acadia1, CPU) • 3:06PM – 3:44PM (1000VC, 1 Tier) • 4:04PM – 5:06PM (500VC, 3 Tier)
Results (Acadia1, Free Memory) • 3:06PM – 3:44PM (1000VC, 1 Tier) • 4:04PM – 5:06PM (500VC, 3 Tier)
Results (Biscayne, CPU) • 3:06PM – 3:44PM (1000VC, 1 Tier) • 4:04PM – 5:06PM (500VC, 3 Tier)
Results (Biscayne, Free Memory) • 3:06PM – 3:44PM (1000VC, 1 Tier) • 4:04PM – 5:06PM (500VC, 3 Tier)