Understand performance testing types, tools, measurements, and key factors for performance engineers. Explore real-life case studies and their root causes.
All About Performance Testing · Liu Bo (刘博) · boliu@thoughtworks.com
WHERE WE ARE • 1. Basic Concept • 2. Troubleshooting
WHAT IS PERFORMANCE TESTING? • To determine how a system performs in terms of responsiveness and stability under a particular workload. • To investigate, measure, validate or verify other quality attributes of the system, such as scalability, reliability and resource usage.
PERFORMANCE TESTING TYPES • Load testing • Configuration testing • Isolation testing • Capacity testing • Stress testing • Soak testing • Spike testing
PERFORMANCE TESTING TOOLS • NeoLoad, LoadRunner • Silk Performer, Rational Performance Tester • LoadUI, Gatling, Grinder, JMeter
PERFORMANCE TESTING S.O.P • Identify Performance Acceptance Criteria • Plan and Design Tests • Identify, Configure and Validate the Test Environment • Implement, Validate and Verify the Test Script • Execute the Test (Warm-up first) • Analyze Results, Tune, and Retest
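The "Warm-up first" step can be sketched as follows. This is a minimal, single-threaded illustration rather than a replacement for the tools listed earlier: `run_load_test`, `summarize`, and the `operation` callable are hypothetical names, and a real test would drive many concurrent virtual users. The point it shows is that warm-up iterations run the same operation but their timings are discarded, so caches, connection pools and JIT compilation do not skew the measured results.

```python
import statistics
import time
from typing import Callable, Dict, List


def run_load_test(operation: Callable[[], None],
                  warmup_iterations: int,
                  measured_iterations: int) -> List[float]:
    """Run `operation` repeatedly; return latencies (seconds) of the measured phase only."""
    # Warm-up phase: exercise the system but throw the timings away.
    for _ in range(warmup_iterations):
        operation()

    # Measured phase: record one latency sample per call.
    latencies = []
    for _ in range(measured_iterations):
        start = time.perf_counter()
        operation()
        latencies.append(time.perf_counter() - start)
    return latencies


def summarize(latencies: List[float]) -> Dict[str, float]:
    """Summarize samples with mean, median and a nearest-rank 95th percentile."""
    ordered = sorted(latencies)
    # Integer ceiling of 0.95 * n, which avoids floating-point rounding surprises.
    rank = (95 * len(ordered) + 99) // 100
    return {
        "samples": len(ordered),
        "mean": statistics.mean(ordered),
        "median": statistics.median(ordered),
        "p95": ordered[rank - 1],
    }


if __name__ == "__main__":
    # Hypothetical operation standing in for one scripted request.
    results = run_load_test(lambda: time.sleep(0.001), 5, 20)
    print(summarize(results))
```

Reporting a percentile next to the mean is a deliberate choice: a few slow requests can hide behind an acceptable average.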
KEY MEASUREMENTS • Hardware Resource • CPU (Context Switches/sec, Processor Queue Length) • Memory (Pages/sec) • IO (Average Disk Queue Length, Network Usage) or IOPS • Software Resources • Web Server • Database • Customized Performance Counters • xVM • Logs
KEY MEASUREMENTS • Monitoring Tools • AppDynamics, Dynatrace, New Relic, OneAPM • Performance counter tools that ship with the testing tools or the OS • Zabbix, Nagios
KEY FACTORS FOR A PERFORMANCE ENGINEER • ALL-ROUND • For the target system • Architecture Design • Cluster Configuration • Network Topology • Capacity of Test Agents • Communication
CASE 1 – PHONE INTERVIEW SLOWS DOWN • Key Measurements • Get Sample • Start Interview • Page to Page • End Interview
CASE 1 – PHONE INTERVIEW SLOWS DOWN • Performance degrades steadily by ~10% with build 0615, only on page-to-page time • CPU usage is ~10% higher with build 0615 • No such issue with build 0501 • No errors in logs • No such issue on Web Interview • ~150 bugs fixed between 0501 and 0615 • No performance bugs fixed between 0501 and 0615
CASE 1 – ROOT CAUSE • One base class in the common framework was modified with extra features that are NOT supposed to be used by Phone Interview, causing unnecessary load/unload operations on every Next/Previous page operation • The simulated Next/Previous page clicks are extremely frequent, especially under heavy load • Actions?
CASE 2 – WEB INTERVIEW TIMES OUT • Many Web Interviews timed out randomly in production • After a restart everything is fine, but the error recurs as time goes on • Error calling WS method 'Method'. URL 'URL', Error codes: Client 5, HTTP -1, SOAP 0, TCP 10048 • IIS works well • Load is sometimes heavy but does not exceed the upper limit • Cannot reproduce in house with the given load/scenario • Not related to anti-virus software or firewall
CASE 2 – CONTINUE INVESTIGATION • Increase load and monitor pages/sec from customized counters • Pages/sec drops dramatically once the issue is reproduced • After that, the web tier server can only handle interviews at a slow rate • Drill down into the entire web interview process in the back end, i.e. from client to web server, and then to interview server • Every request to the web server opens a new TCP port! • netstat -an
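One way to turn the `netstat -an` observation into a number is to count TCP connections per state. The sketch below assumes the usual column layout in which the state is the last field of each TCP row; `count_tcp_states` is a hypothetical helper, and the sample text is made-up illustrative output, not data from the case.

```python
from collections import Counter


def count_tcp_states(netstat_output: str) -> Counter:
    """Count TCP connections per state from `netstat -an` text output."""
    states = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # A TCP row has at least: protocol, local address, foreign address, state.
        # Headers and UDP rows (which carry no state) are skipped.
        if len(fields) >= 4 and fields[0].lower().startswith("tcp"):
            states[fields[-1]] += 1
    return states


# Made-up sample in the Windows layout, for illustration only.
SAMPLE = """
Active Connections

  Proto  Local Address          Foreign Address        State
  TCP    0.0.0.0:80             0.0.0.0:0              LISTENING
  TCP    10.0.0.5:50001         10.0.0.9:8080          TIME_WAIT
  TCP    10.0.0.5:50002         10.0.0.9:8080          TIME_WAIT
  TCP    10.0.0.5:50003         10.0.0.9:8080          ESTABLISHED
  UDP    0.0.0.0:123            *:*
"""

if __name__ == "__main__":
    for state, count in count_tcp_states(SAMPLE).most_common():
        print(f"{state}: {count}")
```

Sampling this count at intervals during a load test shows whether the number of sockets in TIME_WAIT keeps climbing as load increases.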
CASE 2 – ROOT CAUSE • TCP port exhaustion on the web tier server • By default, Windows keeps a closed TCP connection in TIME_WAIT for 4 minutes before its port is released • Actions?
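A back-of-the-envelope calculation shows why this exhausts ports under load. If every request opens a new outbound connection and each closed connection holds its port for the TIME_WAIT period, the sustainable rate of new connections is roughly the number of available ephemeral ports divided by that period. The port count used below is an assumption (it depends on the Windows version and configuration); the 4-minute figure comes from the slide. Winsock error 10048 in the log is WSAEADDRINUSE, "address already in use", which is consistent with running out of local ports.

```python
def max_new_connections_per_second(ephemeral_ports: int,
                                   time_wait_seconds: float = 240.0) -> float:
    """Rough upper bound on new outbound connections per second.

    Each closed connection occupies one local port for `time_wait_seconds`,
    so at most `ephemeral_ports` connections can be opened in that window.
    """
    if ephemeral_ports <= 0 or time_wait_seconds <= 0:
        raise ValueError("ephemeral_ports and time_wait_seconds must be positive")
    return ephemeral_ports / time_wait_seconds


if __name__ == "__main__":
    # Assumption: 16384 dynamic ports (range 49152-65535); adjust to the real server.
    rate = max_new_connections_per_second(16384)
    print(f"about {rate:.1f} new connections per second before ports run out")
```

Under these assumptions the two obvious levers are reusing connections so that fewer ports are consumed, or shortening the TIME_WAIT period and widening the port range.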
CASE 3 – ERRORS IN MULTI-TENANT ONLY • Errors occur at exactly 10 minutes with multi-tenant • No such issue with single-tenant • Massive errors in logs - not so helpful • CPU usage is higher than with single-tenant • GC activity is much higher (5% to 10% of CPU time) • Adjusting -Xmx does not help since physical memory is not the bottleneck
CASE 3 – CONTINUE INVESTIGATION • Architect team guarantees this issue is not related to single vs. multiple tenants • System.gc() is called explicitly in code, but that call has existed for a long time • System.gc() is called only under a specific condition that is outside the test scope • Checked the Oracle Java documentation on GC policy and confirmed the correct one is in use • Check JVM startup parameters with Ops
CASE 3 – ROOT CAUSE • JVM startup parameter configuration on multi-tenant • Add -XX:NewRatio to adjust the ratio between the young and old generations and avoid frequent GC • Actions?
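The effect of -XX:NewRatio can be worked out with simple arithmetic. In HotSpot, NewRatio=N sets the old-to-young generation size ratio to N:1, so the young generation gets 1/(N+1) of the heap. The helper below is a hypothetical sketch of that calculation; the 4096 MB heap is an illustrative assumption, not the value used in the case.

```python
from typing import Tuple


def generation_sizes(heap_mb: float, new_ratio: int) -> Tuple[float, float]:
    """Return (young_mb, old_mb) for a heap split by -XX:NewRatio=new_ratio.

    NewRatio is the ratio old:young, so young = heap / (new_ratio + 1).
    """
    if heap_mb <= 0 or new_ratio < 1:
        raise ValueError("heap_mb must be positive and new_ratio at least 1")
    young_mb = heap_mb / (new_ratio + 1)
    return young_mb, heap_mb - young_mb


if __name__ == "__main__":
    # Assumption: a 4096 MB heap, compared across a few NewRatio values.
    for ratio in (1, 2, 3):
        young, old = generation_sizes(4096, ratio)
        print(f"-XX:NewRatio={ratio}: young {young:.0f} MB, old {old:.0f} MB")
```

A young generation that is too small for the allocation rate fills quickly and triggers frequent minor collections, which is one plausible reading of the higher GC activity reported for multi-tenant.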
THANK YOU • Q & A