
Hypothesis testing

Learn the concept of the null hypothesis, the procedure for hypothesis testing, and errors of type 1 and type 2. Learn when to take action based on statistical results.






  1. Hypothesis testing TRIBE statistics course Split, spring break 2016

  2. Goal Understand the concept of the null hypothesis H0 Know the procedure for hypothesis testing Understand errors of type 1 (false positive) and type 2 (false negative)

  3. When to take action Auric Goldfinger (in James Bond 'Goldfinger'): 'Mr. Bond, they have a saying in Chicago: Once is happenstance. Twice is coincidence. The third time it is enemy action.'

  4. Trouble with knowledge From which level of certainty on do you claim to know rather than believe? • Maybe never => hardly any knowledge at all • Personal choice => preferences matter • Varying personal α level in different settings: dice versus lottery In statistics • Only negative results for sure • The p-value serves as the threshold that gives the result for all α levels at once • Never sure despite statistically significant results

  5. Standard procedure Formulate a null hypothesis H0 Identify a test statistic Compute the p-value Compare p-value versus α level
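The four steps can be sketched in code. A minimal illustration (not from the slides), assuming a one-sample z-test with known σ and a normally distributed sample mean; all numbers are made up:

```python
import math

def z_test(sample_mean, mu0, sigma, n, alpha=0.05):
    """Two-sided one-sample z-test of H0: mu = mu0, assuming known sigma."""
    # Step 2: identify the test statistic
    z = (sample_mean - mu0) / (sigma / math.sqrt(n))
    # Step 3: compute the p-value via the standard normal CDF (error function)
    phi = 0.5 * (1 + math.erf(z / math.sqrt(2)))
    p_value = 2 * min(phi, 1 - phi)
    # Step 4: compare the p-value with the alpha level
    return p_value, p_value < alpha

# Step 1: formulate H0 (here: mu = 10.0), then test with illustrative numbers
p, reject = z_test(sample_mean=10.4, mu0=10.0, sigma=1.0, n=25)
# p is about 0.0455, so H0 is rejected at alpha = 0.05
```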

  6. The H0 world Virtual world: omission of everything unnecessary Model: connections between variables, distributions, parameters, ε Not necessarily wrong: otherwise, rejecting it would hardly be an achievement

  7. Model with no error = a definition Examples • Kelvin versus °C linear with slope 1 • Fahrenheit versus °C linear • Variance versus standard deviation quadratic • Measurement errors still possible

  8. The falsification principle for H0 An outcome (of a test statistic) in the sample that is too extreme (= less likely ex ante than α) leads to a rejection of the null hypothesis • If you live in an H0 world, you wrongly reject the null in a fraction α of independent samples
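The claim that living in an H0 world produces wrong rejections in a fraction α of samples can be checked by simulation. A sketch (assuming a z-test on normal data; seed, sample size, and trial count are arbitrary):

```python
import math
import random

def phi(z):
    """Standard normal CDF."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

random.seed(1)
alpha, n, trials = 0.05, 30, 2000
rejections = 0
for _ in range(trials):
    sample = [random.gauss(0, 1) for _ in range(n)]   # H0 world: mu = 0 is true
    z = (sum(sample) / n) / (1 / math.sqrt(n))        # z statistic for the mean
    p = 2 * min(phi(z), 1 - phi(z))                   # two-sided p-value
    rejections += p < alpha
rate = rejections / trials                            # close to alpha
```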

  9. Statistics can prove something wrong Absolute: realizations outside the distribution, like a 7 with a standard die With any (freely chosen) degree of conviction but never certainty: realizations that would have been unlikely ex ante (corresponds to standard hypothesis testing) Failure: a wrong decision about the null hypothesis due to random effects (errors of type 1 and 2)

  10. Statistics cannot prove something to be correct Without (model or measurement) errors, there is no need for statistics With errors, the result could (almost always) stem from those errors (depending on the possible outcomes of the error under the null) Even if the sample outcome is 'likely' under the null hypothesis, it could truly result from another distribution, and this other distribution must satisfy no condition other than assigning a positive probability to the outcome in the sample => Prove, no – Support, yes

  11. 1-sided versus 2-sided tests Choice depends on the alternative to H0 • If you suspect that the true value of your test statistic exceeds its average under the null hypothesis, a relatively low value in the sample does not support your alternative H1 • Once the direction of the deviation is given by the sample, the 2-sided test of course sets a stricter threshold for the rejection of H0 • The story matters ex ante, not only ex post (otherwise, the choice of 'only' a 1-sided test might be considered fiddling)
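For a symmetric test statistic such as z, the relation between the two variants is simple: when the deviation lies in the suspected direction, the 1-sided p-value is half the 2-sided one. A sketch with an arbitrary observed z value:

```python
import math

def phi(z):
    """Standard normal CDF."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

z = 1.8                            # observed z, in the suspected (positive) direction
p_one_sided = 1 - phi(z)           # H1: mean greater than under H0
p_two_sided = 2 * (1 - phi(z))     # H1: mean different from H0
# p_one_sided is about 0.036 and p_two_sided about 0.072:
# at alpha = 0.05 the 1-sided test rejects H0, the 2-sided test does not
```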

  12. Hypothesis testing calculator (example)

  13. Hypothesis testing in EViews Series → View → Descriptive Statistics & Tests → Simple hypothesis tests

  14. Limits to H1 None in principle but • statements only possible about the sample in relation to H0 • H1 bound to changes in the H0 model parameters in most tests • no specific indication for the choice among alternative Hx What to choose as the new null hypothesis after rejection • Rejection usually just indicates a region for better parameter values (like mean > 0 instead of mean = 0) • Lower/upper bound by parameters that result in rejection as H0 • Confidence intervals as a result (specific to the sample, not to H0)
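The confidence interval mentioned above (all parameter values that would not be rejected as H0) can be sketched directly. A minimal illustration, assuming a known σ and a normal model; the numbers are made up:

```python
import math

# 95% confidence interval for mu: all values mu0 that a two-sided z-test
# at alpha = 0.05 would NOT reject, given this sample
sample_mean, sigma, n = 10.4, 1.0, 25
z_crit = 1.96                               # standard normal quantile for alpha = 0.05
half_width = z_crit * sigma / math.sqrt(n)
ci = (sample_mean - half_width, sample_mean + half_width)
# ci is about (10.01, 10.79): H0 values like mu0 = 10.0 lie outside and are rejected
```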

  15. Type 1 error Situation • H0 is true • The sample exhibits an extreme test statistic • H0 is therefore rejected 'Extreme' is a matter of opinion • The type 1 error rate is therefore set by the investigator => α significance level

  16. Type 2 error Situation • H0 is not true • By chance, the sample test statistic does not classify as 'extreme' under the null hypothesis • H0 is therefore not rejected • The type 2 error rate is usually the result of the α level and the assumptions about H1
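The interplay of α, the true deviation from H0, and n can be made visible by simulation. A sketch (z-test with a true mean of 0.3 instead of the hypothesized 0; all numbers are illustrative):

```python
import math
import random

def phi(z):
    """Standard normal CDF."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

random.seed(2)
alpha, n, trials, true_mu = 0.05, 30, 2000, 0.3
misses = 0
for _ in range(trials):
    sample = [random.gauss(true_mu, 1) for _ in range(n)]   # H0 (mu = 0) is false
    z = (sum(sample) / n) / (1 / math.sqrt(n))
    p = 2 * min(phi(z), 1 - phi(z))
    misses += p >= alpha                # type 2 error: H0 not rejected
beta = misses / trials                  # type 2 error rate; power = 1 - beta
```

With these numbers β comes out around 0.6; a larger n or a larger true deviation lowers it.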

  17. Alternatives to the H0 tests

  18. To do list • Acknowledge if you are still not sure • Be aware of the assumptions that your H0 implies • Choose and justify your new null hypothesis in case of rejection • Do not chase rejection by • data selection • indiscriminate adjustment of your theory to the data • lowering the requirements (higher α level) • Explain what (no) rejection of H0 means in your setting • Make sure that your null hypothesis is not obviously wrong

  19. Questions?

  20. Conclusion • Hypothesis testing works as follows • Formulate a null hypothesis H0 • Identify a test statistic • Compute the p-value • Compare p-value and α level • More data usually helps • No rejection ≠ no effect • Choice of the α level (type 1 error), indirect control only over type 2 • No real alternative to H0 hypothesis testing

  21. H0 formulation TRIBE statistics course Split, spring break 2016

  22. Goal Formulate the desired result Formulate the desired result in a testable way Meet the requirements for a meaningful H0 and H1

  23. Scientific approach: make your statement testable Replication by repetition • at least in theory (some datasets are hard to replicate) • H0 provides a benchmark for every new sample • The more samples, the more likely rejection occurs under the null (also with a predictable and hence testable frequency) Predictions • best result of a theory if they come true • stronger in a different setting (X variables outside the first sample) • For equally not rejected hypotheses, trust the more convincing story

  24. Types of data stories (example online) Change over time Contrast Drill down Factors Intersections Outliers Zoom out …and more

  25. Use existing work: statistics as a tool 'Standing on the shoulders of giants' (Isaac Newton) • Confirmation in a new setting • Country • Time • Topic • Extension of an existing model • Variables • Structure (parameters) • Error • Green field model

  26. Model structure X only • Correlation • Independence • Time series X/Y • Form of the relationship (linear, logarithmic, etc.) • Parameters (number, flexibility, interaction) • Error distribution Omitted variables of no or not enough relevance

  27. Assumptions Again • Model type • Correlations • Error term • How much do deviations from the assumptions hurt? • Check on the parameters by significance tests • Check on the error term by distribution and independence tests • Check on practical consequences by the explanatory content • Consequences in the real world?

  28. Justification of the assumptions Generally: approximation, closeness confirmed by tests and words The law of large numbers helps Example: normally distributed sample means High explanatory content helps: the error is relatively unimportant

  29. Interpretation of the model Does the quantified version (= the model) represent the idea? • Interpretation error (misspecification) Example: face recognition (black persons ignored) => The algorithm may have looked for optically dark features on a bright face, while for some persons the relation is reversed • Something seemingly unusual which is actually 'normal' • If X and/or Y are proxies, how are they linked to the ideal measure?

  30. Admissible interpretation after significant results H0 rejected • acknowledging that the result may be driven by chance • at the α level (no certainty) • up to alternative α levels equal to the p-value Support for H1, indeed for any alternative not rejected when taken as H0 Inappropriate • H1 is true • Generalizing ('model is wrong' when only parameters are tested) • Any statement about the assumptions

  31. The reality check Appropriateness: Would H0 make sense? Insight: Does H1 make sense? Relevance: Does the rejection of H0 change anyone's behavior?

  32. Fix it in theory New story Transformation of • Y: effect of the explanatory variables on different aspects of Y (absolute values, growth rate, etc.) • X: change of the relations between the explanatory variables • ε: as a consequence only (the error term should not explain anything) Transformations should be monotone in order to preserve the order

  33. Fix it in practice More data • Broader coverage (application, geography, or time) • Clearer statistical results (higher N) • Robustness (more potential variables) Predictive prior research results (justification) Story (theoretical explanation)

  34. Transferability Transfer • geographically • outside the sample region of the x-variable • over time • to a technically analogous y-variable (similar behavior) • to an analogous y-variable in terms of content (similar explanation) Valuable for predictions Stability of the assumptions needed (model type, parameters, ε)

  35. Data availability • Access • Awareness • Costs • Coverage • Format (tractability) • Extension later on • Permission to use, especially publication • Reliability • Size • Time

  36. Simplification Acceptable if the results are clear enough • Low p-value • High explanatory content R² • Everything that is significant and relevant should be modeled Unequally distributed (but correlated) outside factors lead to distortions Parameters change with other variables even if p stays below α

  37. Application on other data sets Statistics as the lowest hurdle • Methods transferable • Assumptions and interpretations matter • Advantage when building on previous work

  38. To do list • Acknowledge if the desired H0/H1 combination cannot be rejected • ex ante: no existing test for the prevailing configuration • ex ante: not the appropriate data available • ex post: not the required sample properties • Anticipate the distribution but not the realization of the sample • Be aware of the assumptions that your H0 implies (again) • Justify the α level you require

  39. Questions?

  40. Conclusion Statistics of no help – H0 formulation is a purely conceptual process Aim at H1 and choose H0 accordingly Formulate H0 and H1, and only implementation & justification remain Ask specific questions Tests useful if you gain insights from (at least one possible) result

  41. Mean comparison TRIBE statistics course Split, spring break 2016

  42. Goal Answer to 'Is there a difference on average?'

  43. Why should we care about mean comparisons? Usually meant when asking 'Is there a difference?' Mean = expected (= 'true') value of the average Applies to other statistics as well Example: expected value of the variance Basis for marginal effects in regressions: how much does the outcome change if the input increases by 1 unit? Easy application
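A minimal sketch of such a mean comparison (a large-sample two-sample z-test on simulated data; the group means, spreads, and sizes are made up):

```python
import math
import random

def phi(z):
    """Standard normal CDF."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def mean(x):
    return sum(x) / len(x)

def var(x):
    m = mean(x)
    return sum((v - m) ** 2 for v in x) / (len(x) - 1)

random.seed(3)
a = [random.gauss(5.0, 2.0) for _ in range(200)]
b = [random.gauss(6.0, 2.0) for _ in range(200)]

# H0: the two groups have the same mean
z = (mean(a) - mean(b)) / math.sqrt(var(a) / len(a) + var(b) / len(b))
p = 2 * min(phi(z), 1 - phi(z))     # small p => 'there is a difference on average'
```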

  44. Dice roller online (example) • Roll 1, roll 2, roll 52 – What tendency does the average have? How likely do the extreme realizations (all 1s or all 6s) seem?

  45. Approaching the normal distribution Roll 1 die => uniform ('equal') distribution, often associated with 'fair' Roll 2 dice => the pips sum to unequally likely totals: symmetric, with a higher probability of realizations in the middle More dice (n) • The distribution gets wider • Support proportional to n (distance from minimum to maximum) (for finite distributions: no possible realization of +/- infinity) • Width of the bulge proportional to √n (= volatility)
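The tendency of dice averages toward the middle can be simulated (the number of dice and trials are arbitrary):

```python
import random

random.seed(4)

def average_roll(n_dice):
    """Average pips over one roll of n_dice fair dice."""
    return sum(random.randint(1, 6) for _ in range(n_dice)) / n_dice

trials = 5000
avgs = [average_roll(10) for _ in range(trials)]
mean_avg = sum(avgs) / trials                                # close to 3.5
share_extreme = sum(a in (1.0, 6.0) for a in avgs) / trials  # all 1s or all 6s: essentially never
```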

  46. Law of large numbers Message • Sample mean → µ for larger n • Imagine sampling the whole population N => the average (= sample mean) and µ coincide • The higher n, the fewer possibilities to 'drive the average away from µ' (not strictly true in a distribution with infinite realizations) Types • Strong Law of Large Numbers • Weak Law of Large Numbers (= Bernoulli's theorem) • Law of Truly Large Numbers (a consequence, not a mathematical law)
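The convergence can be sketched for a fair die (µ = 3.5), tracking how far the running sample mean sits from µ at a few arbitrary checkpoints:

```python
import random

random.seed(5)
mu = 3.5                          # true mean of a fair die
errors = {}
total = 0
for n in range(1, 100_001):
    total += random.randint(1, 6)
    if n in (10, 1_000, 100_000):
        errors[n] = abs(total / n - mu)   # distance of the running mean from mu
# errors[100_000] is typically far smaller than errors[10]
```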

  47. Law of large numbers online (example)

  48. The Central Limit Theorem (statement) Requires some mathematical expertise for a full appreciation Almost regardless of the underlying distribution, the distribution of the sample mean concentrates around µ (= the true mean) and approaches a normal shape for higher n Application: the average of large samples is approximately normally distributed

  49. Central Limit Theorem (example)

  50. CLT message We can approximate the distribution of the sample mean arbitrarily well by the normal distribution N(µ, σ/√n) • no matter what the criterion for 'well' is (a bold statement) • no matter how the distribution of X looks (also a bold statement) • Only restriction: finite variance (and hence also existence of a mean) Consequence • Complete distribution of the statistic (here: the sample mean) known • This knowledge despite limited information (only the sample) about the underlying distribution • More data solves any problem (here)
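The N(µ, σ/√n) approximation can be checked empirically for dice rolls, where µ = 3.5 and σ = √(35/12); the sample size and number of trials below are arbitrary:

```python
import math
import random

random.seed(6)
n, trials = 50, 4000
mu, sigma = 3.5, math.sqrt(35 / 12)     # mean and sd of a single fair die
# Distribution of the sample mean: many independent samples of size n
means = [sum(random.randint(1, 6) for _ in range(n)) / n for _ in range(trials)]
m = sum(means) / trials
sd = math.sqrt(sum((x - m) ** 2 for x in means) / (trials - 1))
ratio = sd / (sigma / math.sqrt(n))     # CLT predicts a ratio close to 1
```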
