Analyzing Performance test data (or how to convert your numbers to information) Carles Roch-Cunill Test Lead for System Performance McKesson Medical Imaging Group carles.roch-cunill@mckesson.com
Agenda • Performance testing as an experimental activity • Very fast review of Scientific Method • Errors, forget them at your own risk • About the meaning of data • Some statistical concepts • Analyzing data • Adjusting your data to a model • Summary
Performance testing as an experimental activity There are two approaches to testing: a) Without added value • This feature does not work • This requirement is not met b) With added value • This feature does not work, and this module/component/software artifact is the culprit • This requirement is not met, and it fails for this reason. Usually, things are not so clear, and testers' statements fall somewhere in the middle. Because performance testing gathers data that can be analyzed, the performance tester is well positioned to provide added-value information to the team.
Performance testing as an experimental activity If you want to provide added value and explain why the requirement is not met, you will • Formulate a hypothesis: “My performance degrades due to component X” • Test the hypothesis by developing an appropriate test environment • Gather results • Analyze the results to see if they confirm or reject your hypothesis If you are lucky and your guess (the hypothesis) was good, you will have explained at least a part of the performance behaviour. However, there will usually be other factors that also influence your performance, so you have only caught one low-hanging fruit.
Performance testing as an experimental activity You can create different tests that put more emphasis on one of the components of the system. For example, you may want to specifically measure the performance of the data repository tier, or the network, or only the UI. Depending on where your focus is, your methodology and your tools will change. In all cases, you need to fix all the parameters but one. For example, if you want to study the influence of the network on your system, you need to do the following (see the sketch after this list): • Determine the parameters that characterize the network (latency, bandwidth, utilization…) • Identify whether they are independent or not (utilization and latency may not be independent) • Modify one parameter at a time while keeping the others constant
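A minimal sketch of this one-factor-at-a-time approach, assuming Python. The functions set_network_conditions and run_load_test are hypothetical placeholders for whatever traffic-shaping tool and load-test harness you actually use; they are stubbed out here so the sketch runs.

```python
# Sketch: vary ONE network parameter (latency) while holding the others constant.
# set_network_conditions() and run_load_test() are hypothetical placeholders.
import random

def set_network_conditions(latency_ms, bandwidth_mbps, utilization):
    """Placeholder: configure your WAN emulator / traffic shaper here."""
    pass

def run_load_test(duration_s):
    """Placeholder: run your real load test and return e.g. the mean response time (s)."""
    return random.uniform(1.0, 5.0)  # dummy value so the sketch is runnable

FIXED_BANDWIDTH_MBPS = 100   # held constant for the whole experiment
FIXED_UTILIZATION = 0.30     # held constant for the whole experiment

for latency in [10, 25, 50, 100, 200]:          # the single parameter under study
    set_network_conditions(latency_ms=latency,
                           bandwidth_mbps=FIXED_BANDWIDTH_MBPS,
                           utilization=FIXED_UTILIZATION)
    response_time = run_load_test(duration_s=300)
    print(f"latency={latency} ms -> mean response time={response_time:.2f} s")
```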
Very fast review of Scientific Method • An effect has been observed. Example: performance degradation in your application • You try to reproduce it and learn the conditions to reproduce it at will • You may gather some data through testing • To explain the data you formulate a model (hypothesis) • You refine your testing and tailor it around your model • You analyze the new data and check if your model fits the data • If the model fits, you are on a good footing • If the model partially fits, you either refine your model or discard it • If the model does not fit, you formulate another model • In any case, new data obtained from other tests may force you to modify, rethink or even dump your model • Once your data fits the model, you draw conclusions based on the framework provided by the model.
Very fast review of Scientific Method Unstated principles: • Simpler is better • With the same procedure and the same system, you get the same results • A model should not introduce more questions than it answers • Usually, newer models include the older models as particular cases • Models are dynamic.
Errors, forget them at your own risk Errors happen… so take them into account. There are two main kinds of errors: • Human errors: stopping the watch at the wrong moment, confusing digits… • Instrument errors: your watch is not precise, has a mechanical defect…
Errors, forget them at your own risk In the graph beside: if your error bar is ± 1, we can say the trend is towards a larger value. However, if the error bar is ± 3, then we cannot say anything about the trend of this data.
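As a rough illustration of this point (with invented numbers), the sketch below only claims a trend when the difference between two averaged measurements exceeds the combined error bars, using a conservative worst-case combination:

```python
# Only claim a trend if the difference between two measurements exceeds the
# combined uncertainty (worst case: the two error bars simply add up).
def trend_is_meaningful(value_a, value_b, error_bar):
    return abs(value_b - value_a) > 2 * error_bar

run_1, run_2 = 10.0, 14.0   # invented averaged measurements

print(trend_is_meaningful(run_1, run_2, error_bar=1.0))  # True: trend towards a larger value
print(trend_is_meaningful(run_1, run_2, error_bar=3.0))  # False: the trend is lost in the error bars
```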
About the meaning of data Performance testing generates a lot of data. But what does all the data mean? To explain this data you need to take into account: • Hardware • Network characteristics • Network topology • Physical support for the data tier (storage, database…) • The architecture of your application • How your application is coded …
About the meaning of data In addition, you need to analyze the results in the context of the requirement or the question you are trying to answer. For example: “Event A should not take more than x seconds”. In most circumstances involving computer systems, you will have a stochastic component in your distribution. Assuming a normal one, you will get something like the bell-shaped distribution shown on the slide.
About the meaning of data But what exactly does the requirement mean? Strictly, it means that every single occurrence of Event A completes within x seconds: no part of the distribution may exceed x.
About the meaning of data However, the requirement is usually interpreted as: “The average of Event A should not take more than x seconds”. From a formal point of view, the requirement “Event A should not take more than x seconds” would have failed with the above distribution; however, the statement “The average of Event A should not take more than x seconds” would pass.
About the meaning of data The requirement can also be expressed as a percentile. In this case the requirement would be stated as “Event A should not take more than x seconds 50% of the time”.
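A minimal sketch of the three readings of the requirement (strict, average, percentile), using invented response times and an invented limit of x = 3 seconds:

```python
# Three readings of "Event A should not take more than x seconds",
# evaluated on invented response times (seconds) with an invented limit x = 3.0.
import statistics

samples = [2.1, 2.4, 2.2, 2.8, 3.6, 2.5, 2.3, 2.9, 2.2, 2.6]
x = 3.0

strict_pass = max(samples) <= x                 # strict reading: every sample within x
average_pass = statistics.mean(samples) <= x    # "the average of Event A ..."
median_pass = statistics.median(samples) <= x   # "... 50% of the time" (50th percentile)

print(f"strict: {strict_pass}, average: {average_pass}, 50th percentile: {median_pass}")
# With these numbers the strict reading fails (one 3.6 s sample) while the
# average and the 50th-percentile readings pass.
```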
Some statistical concepts Once we have defined the question, we can provide the answer. The answer will be obtained through measurements (either manual or automated). The more measurements you take, the better your statistics and the better your answers. However, the measurements need to be statistically significant, which means a measurement is only good enough to be included in your statistics if it is statistically equivalent to all the other measurements included there.
Some statistical concepts How do you determine whether your data is statistically equivalent? You can apply some complex mathematical analysis or apply common sense. Some rules of thumb: • If, in a single set of measurements, 20% of your data is very different, you either have a problem in your test system or you are observing different phenomena. • If you have done several runs, and the 90th percentile of a new test is bigger (smaller) than the maximum (minimum) of the previous tests, then the new data is not statistically similar and has no statistical significance for your results. • If you are expecting a specific distribution and you are not getting it, the current set cannot be compared (is not statistically equivalent) to the data you were expecting. • Outliers are not statistically equivalent to the rest of the set.
Some statistical concepts Example of the 90th percentile of Test 3 being bigger than the maximum of the other sets of measurements. In this context, Test 3 is not statistically equivalent and will be rejected.
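A small sketch of this rule of thumb, with invented data for the runs:

```python
# Rule of thumb: reject a new run whose 90th percentile exceeds the maximum
# of the previous runs. All numbers below are invented.
import statistics

previous_runs = [
    [2.1, 2.3, 2.2, 2.5, 2.4, 2.6, 2.2, 2.3, 2.4, 2.5],   # Test 1
    [2.2, 2.4, 2.3, 2.6, 2.5, 2.4, 2.3, 2.2, 2.5, 2.6],   # Test 2
]
new_run = [3.1, 3.4, 3.2, 3.6, 3.5, 3.3, 3.4, 3.2, 3.5, 3.7]  # Test 3

previous_max = max(max(run) for run in previous_runs)
p90_new = statistics.quantiles(new_run, n=10)[8]   # 90th percentile of the new run

if p90_new > previous_max:
    print("Test 3 is not statistically equivalent to the earlier runs; reject it.")
```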
Some statistical concepts Outliers are usually defined as • A measurement outside the overall pattern of a distribution (Moore and McCabe 1999). • More precisely, a point that is more than 1.5 times the interquartile range above the third quartile or below the first quartile. Usually, the presence of an outlier indicates either an error in the measurement or an incomplete model.
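A short sketch of the 1.5 × IQR rule on invented measurements:

```python
# 1.5 x interquartile-range rule for flagging outliers, on invented data.
import statistics

data = [2.1, 2.3, 2.2, 2.4, 2.5, 2.3, 2.2, 2.6, 2.4, 9.8]

q1, _, q3 = statistics.quantiles(data, n=4)        # first and third quartiles
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

outliers = [x for x in data if x < lower or x > upper]
print(outliers)   # [9.8] -> investigate the measurement or revisit the model
```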
Analyzing data • While testing a non-deterministic system you will always get a distribution of values, all of them valid in principle. • For example, if your average for a measure is 3 and you sample again and get 6, this ‘6’ is also correct and you cannot discard this number (unless you determine this point is an outlier). • The good news is that you can extract information from this succession of different numbers.
Analyzing data For example, we may have the following collection of raw data for a measure that generically we will describe as “query database”, in seconds: 4.18; 2.1; 1.9; 2.23; 4.5; 4.2; 2.19; 2.21; 4.24; 2.23; 1.99; 2.01; 2.39; 4.19; 2.42; 2.08; 2.27; 3.98; 2.21; 2.45; 4.32; average: 2.9. These results seem to be a mix of two series: 2.1; 1.9; 2.23; 2.19; 2.21; 2.23; 1.99; 2.01; 2.39; 2.42; 2.08; 2.27; 2.21; 2.45; average: 2.2. And: 4.18; 4.24; 4.19; 3.98; 4.32; 4.5; 4.2; average: 4.2.
Analyzing data • What is the previous slide telling us? • Averaging all the results tells us nothing. • The results point to a hidden effect: the system executes the query in different ways. • One possible cause could be that one query joins more tables and thus takes more time to return the results. • So, if you want to answer the question “What is the time to execute this query?”, you would need to be more nuanced, or you would need to know the frequency of these queries so you could compute a weighted average (see the sketch below).
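As a rough sketch, assuming the two series correspond to two distinct query paths and assuming an invented 70%/30% production frequency, one could split the raw data and weight the averages:

```python
# Split the raw "query database" timings into the two apparent series and
# compute a weighted average, assuming invented frequencies of 70% / 30%.
import statistics

raw = [4.18, 2.1, 1.9, 2.23, 4.5, 4.2, 2.19, 2.21, 4.24, 2.23, 1.99,
       2.01, 2.39, 4.19, 2.42, 2.08, 2.27, 3.98, 2.21, 2.45, 4.32]

fast = [t for t in raw if t < 3.0]   # the ~2.2 s series
slow = [t for t in raw if t >= 3.0]  # the ~4.2 s series

freq_fast, freq_slow = 0.7, 0.3      # invented production frequencies
weighted = freq_fast * statistics.mean(fast) + freq_slow * statistics.mean(slow)

print(f"fast: {statistics.mean(fast):.1f} s, slow: {statistics.mean(slow):.1f} s, "
      f"weighted: {weighted:.1f} s")
```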
Adjusting your data to a model The most common model is the usual Gaussian or normal distribution, where σ is the standard deviation and μ is the average. The importance of this distribution lies in the Central Limit Theorem, which indicates that the sum (or average) of many independent random variables tends towards a normal distribution. Example: if we assume that the latency experienced by users in a wireless network only depends on the distance to the hub, μ can be interpreted as the average distance of the users to the hub and σ will indicate how spread out the users are around the hub.
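For reference, the textbook density of the normal distribution is f(x) = exp(−(x − μ)² / (2σ²)) / (σ√(2π)). A minimal sketch of fitting it to latency samples (simulated here, since no real data is attached to the slide) could look like this:

```python
# Fit a normal distribution to latency samples. The samples are simulated here;
# in practice they would come from your own measurements.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
latencies = rng.normal(loc=40.0, scale=8.0, size=500)   # simulated latencies (ms)

mu, sigma = stats.norm.fit(latencies)   # maximum-likelihood estimates of mean and spread
print(f"estimated mu = {mu:.1f} ms, sigma = {sigma:.1f} ms")
```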
Adjusting your data to a model Another example of analysis: the chi distribution. In a first approximation it resembles the Gaussian distribution; however, it applies when a phenomenon depends on k independent parameters, each of which would individually produce a Gaussian distribution. Example: the latency observed in a city-wide ADSL network may depend on the network utilization and on the latency induced by the distance to the nearest hub. If we want to improve the performance of the system, then we need to tackle both problems.
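A numerical sketch of the textbook construction of a chi distribution (the magnitude of k independent, individually Gaussian components), purely as a simulation and not a model of any particular network:

```python
# Textbook construction of a chi distribution: the length of a vector of k
# independent standard-normal components. Pure simulation, not network data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
k = 2                                               # number of independent Gaussian factors
components = rng.standard_normal(size=(10_000, k))  # each factor individually Gaussian
samples = np.sqrt((components ** 2).sum(axis=1))    # chi-distributed combined magnitude

# The simulated mean should match the theoretical mean of a chi distribution with k dof.
print(samples.mean(), stats.chi(df=k).mean())
```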
Adjusting your data to a model This would be an example of two uniform distributions
Adjusting your data to a model • If your model cannot explain the results well, you need to change or improve the model • A useful model should have predictive capabilities, so you can design new tests to prove/disprove the model • Negative results (model disproved) can be as useful as positive results • The analysis of the performance data can help to prevent future bottlenecks and problems • The analyzed results will have a range of validity. Do not force too many consequences from them
Summary • Performance testers provide information beyond requirement compliance • Performance testing should be treated as an experimental activity • As an experimental activity, the scientific method is the most appropriate method of enquiry • In tune with the scientific method, you need to make assumptions, design your experiment accordingly and reduce the error bars • Data should be subject to a statistical analysis • After the analysis, you should try to explain your data with a model • If the model does not do a good job of explaining your data, you should change or refine the model • Your analysis should help to make the software better.
Analyzing Performance test data Questions?