HYPOTHESIS TESTING class of “Experimental Methods of Physics” Mikhail Yurov Kyungpook National University May 9th, 2005
Contents • Introduction • Use of weighted sum of squared deviations • Errors of two sorts • Types of hypothesis testing • Using the z-statistic
Introduction In probability theory, we start with some well-defined problem, and calculate from this the possible outcomes of a specific experiment. We thus proceed from theory to the data. In statistics we try to solve the inverse problem of using the data to deduce the rules or laws relevant for our experiment. The two basic sorts of problems that we deal with in the subject of statistics are hypothesis testing and parameter fitting. In the former, we test whether our data are consistent with a specific theory (which may contain some free parameters), and in the latter we use the data to determine the values of the free parameters.
Logically, hypothesis testing precedes parameter fitting, since if our hypothesis is incorrect, then there is no point in determining the values of the free parameters contained within the hypothesis. In fact, we deal with parameter fitting first, since it is easier to understand. In practice, one often does parameter fitting first anyway; it may be impossible to perform a sensible test of the hypothesis before its free parameters have been set at their optimum values. Example Suppose we have data on an angular distribution, consisting of a set of values cos θi for each interaction, where θi is the angle that the observed particle makes with some fixed direction. We can ask: are the data consistent with an angular distribution of the form 1 + (b/a)cos²θ? If the data look inconsistent with this, can we make a numerical estimate indicating how confident we are that the experimental data show that this angular distribution is incorrect?
Use of weighted sum of squared deviations So, the more fundamental question is whether our hypothesis concerning the form of the data is correct or not. In fact we will not be able to give a 'yes or no' answer, but simply to state how confident we are about accepting or rejecting the hypothesis. In simple cases, the hypothesis may consist simply of a particular value for some parameter. The figure illustrates the desirability of examining a distribution rather than simply determining a parameter when we are hypothesis testing. If we fit either the solid or the dashed distribution in cos θ by an expression [1 + (b/a)cos²θ], the value of b/a is liable to be close to zero. This does not imply that either distribution is isotropic.
It is preferable to perform distribution testing rather than parameter testing. Distributions are tested by the χ²-method. In order to test a hypothesis we have to • Construct S and minimize it with respect to the free parameters • Determine the number of degrees of freedom ν from ν = b − p, where b is the number of bins of the distribution and p is the number of free parameters • Look up in the relevant set of tables the probability that, for ν degrees of freedom, χ² is greater than or equal to our observed value Smin
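The three steps above can be sketched in a few lines of Python. The data, the fixed value of b/a, and the bin layout below are all illustrative inventions, not taken from the lecture; only the overall normalization N is left free, to keep the minimization one-dimensional.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import chi2

# Hypothetical binned angular distribution (illustrative numbers only)
cos_theta = np.linspace(-0.9, 0.9, 10)   # bin centres, b = 10 bins
observed = np.array([52, 48, 45, 44, 40, 42, 45, 47, 50, 55], dtype=float)
errors = np.sqrt(observed)               # Poisson errors per bin

def weighted_sum(norm):
    """S = sum of squared deviations, weighted by the bin errors."""
    # Assumed form N(1 + (b/a)cos^2 theta), with b/a fixed at 0.5 for this sketch
    expected = norm * (1.0 + 0.5 * cos_theta**2)
    return np.sum(((observed - expected) / errors) ** 2)

# Step 1: minimize S with respect to the free parameter (here, just N)
res = minimize_scalar(weighted_sum, bounds=(1.0, 100.0), method="bounded")
s_min = res.fun

# Step 2: degrees of freedom, nu = b - p (10 bins, 1 fitted parameter)
nu = len(observed) - 1

# Step 3: probability that chi^2 with nu dof exceeds S_min
p_value = chi2.sf(s_min, nu)
print(f"S_min = {s_min:.2f}, nu = {nu}, P(chi^2 >= S_min) = {p_value:.3f}")
```

A small p-value would lead us to doubt the hypothesized form; in a real analysis b/a would of course be fitted as well, reducing ν by one more unit.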
The χ²-distribution has the property that the expectation value ⟨χ²⟩ = ν and the variance σ²(χ²) = 2ν. [Figure: χ²-distributions for various numbers of degrees of freedom ν. As ν increases, so do the mean and variance of the distribution.] Thus, if our hypothesis is correct, Smin should be of order ν; large values of Smin are then unlikely, and so our hypothesis is probably wrong. Very small values of Smin are also unlikely, and so again something is suspicious.
More useful than the χ²-distribution itself is Fν(c) = Pν(χ² > c), i.e. the probability that, for the given number of degrees of freedom, the value of χ² will exceed a particular specified value c. Such distributions are available in almost all books on statistics.
If our experiment were repeated many times, and assuming that our hypothesis is correct, then because of fluctuations we would get a larger value of Smin than the particular one we are considering in a fraction F of the experiments. Example In a cos θ histogram, let's assume that there are 12 bins and that when we fit the expression N(1 + (b/a)cos²θ) to the data, we obtain a value of 20.0 for Smin. In this case we have ten degrees of freedom (12 bins less two parameters, N and b/a). From the figure, we see that the probability of getting a value of 20.0 or larger is about 3%.
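Instead of reading Fν(c) off a figure or a table, it can be computed directly; the survival function of the χ²-distribution reproduces the lecture's ~3% figure for Smin = 20.0 with ten degrees of freedom:

```python
from scipy.stats import chi2

# 12 bins, two fitted parameters (N and b/a) -> nu = 10 degrees of freedom
nu = 12 - 2
s_min = 20.0

# F_nu(c) = P(chi^2 > c): the survival function of the chi^2 distribution
p = chi2.sf(s_min, nu)
print(f"P(chi^2 >= {s_min} for {nu} dof) = {p:.4f}")  # ~0.029, i.e. about 3%
```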
Errors of two sorts In deciding whether or not to reject a hypothesis, we can make two sorts of incorrect decision. • Error of the first kind In this case we reject the hypothesis H when it is in fact correct. This should happen in a well-known fraction F of the tests, where F is determined by the maximum accepted value of Smin. But if we have biases in our experiment, so that the actual value of the answer is incorrect, or if our errors are incorrectly estimated, then such errors of the first kind can happen more or less frequently. The number of errors of the first kind can be reduced simply by increasing the limit on Smin above which we reject the hypothesis.
• Error of the second kind In this case we fail to reject the hypothesis when in fact it is false, and some other hypothesis is correct. The value of Smin accidentally turns out to be small, even though the hypothesis H (i.e. the theoretical curve yth that is being compared with the data) is incorrect. It is very difficult to estimate how frequent this effect is likely to be; it depends not only on the magnitude of the cut on Smin but also on the nature of the competing hypotheses. If these are known, then we may be able to predict what distribution they will give for Smin and hence how often we will be incorrect in accepting H.
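Both kinds of error can be illustrated with a toy Monte Carlo. Everything below is an illustrative construction, not from the lecture: when H is true, Smin follows a χ²-distribution with ν degrees of freedom, so the type I rate simply reproduces the chosen fraction F; for the competing hypothesis we arbitrarily model the inflated Smin as a non-central χ² just to show how a type II rate could be estimated once the alternative is specified.

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(0)
nu = 10
cut = chi2.ppf(0.95, nu)  # reject H if S_min exceeds this cut (F = 5%)

# Error of the first kind: H is true, but S_min fluctuates above the cut
s_true = rng.chisquare(nu, size=100_000)
type1_rate = np.mean(s_true > cut)       # should come out near F = 0.05

# Error of the second kind: H is false; toy alternative that shifts S_min
# upward (non-central chi^2 with an arbitrary non-centrality of 5.0)
s_false = rng.noncentral_chisquare(nu, nonc=5.0, size=100_000)
type2_rate = np.mean(s_false <= cut)     # fraction of false H accepted

print(f"type I rate ~ {type1_rate:.3f}, type II rate ~ {type2_rate:.3f}")
```

Raising the cut lowers the type I rate but raises the type II rate, and vice versa, which is the trade-off described in the text.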
Types of hypothesis testing The hypothesis we are testing may relate to the experiment as a whole, or alternatively it may be used as a selector for subsets of a data sample which satisfy specific criteria. • Hypothesis relates to the whole experiment We observe an angular distribution from the decay of a resonance. The question is "Does the resonance have spin zero?", which would imply that the angular distribution is isotropic. In this case an error of the first kind is serious, and in this example so is an error of the second kind; in the former case we reject the spin-zero hypothesis when it is in fact true, and in the latter we accept it when the spin is non-zero. In this experiment the alternative hypotheses are well defined: if the spin is not zero, it is 1, 2, 3, … It may also be possible to calculate the angular distributions for these cases, and hence we can deduce how often each of these gives a low value of Smin.
Example The angular distribution for the decay of a state whose spin we wish to determine. If the spin is zero, the distribution must be isotropic (dashed line). We calculate the value of Smin for this hypothesis. There are five experimental points and four degrees of freedom, since the only free parameter is the normalization. If Smin were larger than about 10, we would reject this hypothesis: the probability that χ² for 4 degrees of freedom exceeds 10 is only about 5%. In our case Smin is 8.7, so the hypothesis is not rejected. This does not necessarily mean that the spin is zero. If it were 1, the predicted decay distribution might be cos²θ (dotted curve). The Smin′ for this hypothesis is 4.1, which is also below our rejection cut. The errors on our data are so large that we have poor discrimination between these two hypotheses.
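The two probabilities in this spin example can be checked numerically; with four degrees of freedom, both Smin values sit below the 5% rejection cut, which is why neither hypothesis is excluded:

```python
from scipy.stats import chi2

nu = 4  # five experimental points, one free normalization parameter

# Spin-0 hypothesis (isotropic):      S_min  = 8.7
# Spin-1 hypothesis (cos^2 theta):    S_min' = 4.1
p_spin0 = chi2.sf(8.7, nu)
p_spin1 = chi2.sf(4.1, nu)

# The exact 5% rejection cut for 4 dof (approximated as 10 in the text)
cut = chi2.ppf(0.95, nu)
print(f"p(spin 0) = {p_spin0:.3f}, p(spin 1) = {p_spin1:.3f}, cut = {cut:.2f}")
# Neither S_min exceeds the cut, so neither hypothesis is rejected
```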
• Hypothesis used as data selector An experiment may consist of a large set of interactions of a beam of protons with a hydrogen target, in each of which four charged tracks are observed and measured. We test the hypothesis that these interactions are examples of the reaction pp→ppπ+π−. The hypothesis is tested by seeing whether the measured directions and momenta of the tracks are consistent with those expected for the reaction on the basis of energy and momentum conservation. Here we are using our hypothesis to select individual sets of data for further study, with a view to extracting some interesting physics. Errors of the first kind correspond to rejecting a small fraction of genuine examples of the reaction. This is not serious; the reduction in the size of the data sample due to the rejection of these events should be small.
Errors of the second kind correspond to accepting events as examples of the reaction when they are in fact produced by some other reaction with four visible charged tracks, for example pp→ppμ+μ− (*). Thus errors of the second kind constitute a potentially more dangerous problem: our data sample is contaminated. The extent of this contamination is difficult to estimate. It will depend on how frequently reaction (*) produces kinematical configurations resembling those of the reaction of interest. Since the μ mass is very close to that of the π, reaction (*) will be difficult to distinguish from the primary reaction simply on the basis of measurements of directions and momenta. This contamination is in general reduced by lowering the value of the cut on Smin.
Using the z-statistic When σ is known, it is possible to describe the distribution of the sample mean with a z-statistic: z = (x̄ − μ)/(σ/√n), where x̄ is the sample mean, μ is the population mean (either known or hypothesized under H0), σ is the population standard deviation, and n is the sample size. Critical region – the portion of the area under the curve which includes those values of the statistic that lead to the rejection of the null hypothesis. The most often used significance levels are 0.01, 0.05, and 0.1. For a one-tailed test using the z-statistic, these correspond to z-values of 2.33, 1.65, and 1.28 respectively. For a two-tailed test at the 0.01 level, the critical region is split into two equal outer areas marked by z-values of ±2.58.
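These tabulated critical values can be recovered from the inverse of the normal cumulative distribution (the `ppf` in scipy). Note that the exact one-tailed value for 0.05 is 1.6449, which statistical tables conventionally round to 1.65:

```python
from scipy.stats import norm

# One-tailed critical z-values for the common significance levels
for alpha in (0.01, 0.05, 0.10):
    z_crit = norm.ppf(1 - alpha)  # inverse CDF of the standard normal
    print(f"alpha = {alpha:4}: one-tailed z = {z_crit:.4f}")

# Two-tailed test at alpha = 0.01: split into two outer areas of 0.005 each
z_two = norm.ppf(1 - 0.005)
print(f"two-tailed |z| at alpha = 0.01: {z_two:.4f}")  # ~2.58
```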
Example Given a population with μ = 250 and σ = 50, what is the probability of drawing a sample of n = 100 values whose mean is at least 255? In this case, z = (255 − 250)/(50/√100) = 1.00. Looking at a table of areas under the normal curve, the area between the mean and z = 1.00 is 0.3413, so the area to its right is 0.1587 (= 0.5 − 0.3413), or 15.87%. Conclusion There are approximately 16 chances in 100 of obtaining a sample mean of at least 255 from this population when n = 100.
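The same calculation, without the table, uses the survival function of the standard normal in place of the subtraction from 0.5:

```python
import math
from scipy.stats import norm

mu, sigma, n, x_bar = 250.0, 50.0, 100, 255.0

# z-statistic for the sample mean: z = (x_bar - mu) / (sigma / sqrt(n))
z = (x_bar - mu) / (sigma / math.sqrt(n))
print(f"z = {z:.2f}")  # (255 - 250) / (50/10) = 1.00

# Probability of a sample mean of at least 255: the right-tail area beyond z
p = norm.sf(z)
print(f"P(mean >= 255) = {p:.4f}")  # ~0.1587, about 16 chances in 100
```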
References • L. Lyons, Statistics for Nuclear and Particle Physics, Cambridge University Press (1985) • W. R. Leo, Techniques for Nuclear and Particle Physics Experiments, Springer-Verlag, Berlin Heidelberg (1987) • http://rvgs.k12.va.us/statman/