
Hypothesis testing. Parametric tests

Explore the importance of biostatistics in quantifying differences in medical studies through hypothesis testing and statistical inference. Learn about parametric tests, including the Student t-test and ANOVA, and the concepts of Type I and Type II errors.

Presentation Transcript


  1. Hypothesis testing. Parametric tests Georgi Iskrov, MBA, MPH, PhD Department of Social Medicine

  2. Outline • Statistical inference • Hypothesis testing • Type I and type II errors • Student t-test • ANOVA • Parametric vs non-parametric tests

  3. Importance of biostatistics • Diabetes type 2 study • Experimental group: Mean blood sugar level: 103 mg/dl • Control group: Mean blood sugar level: 107 mg/dl • Pancreatic cancer study • Experimental group: 1-year survival rate: 23% • Control group: 1-year survival rate: 20% Is there a difference? Statistics are needed to quantify differences that are too small to recognize through clinical experience alone.

  4. Statistical inference • Diabetes type 2 study • Experimental group: Mean blood sugar level: 103 mg/dl • Control group: Mean blood sugar level: 107 mg/dl • Increased sample size: • Diabetes type 2 study • Experimental group: Mean blood sugar level: 99 mg/dl • Control group: Mean blood sugar level: 112 mg/dl

  5. Statistical inference • Compare the mean between 2 samples / conditions • If 2 means are statistically different, then the samples are likely to be drawn from 2 different populations, i.e. they really are different (Figure: sample means X̄1 and X̄2 drawn from two populations with means µ1 and µ2)

  6. Statistical inference • Diabetes type 2 study • Experimental group: Mean blood sugar level: 103 mg/dl • Control group: Mean blood sugar level: 107 mg/dl • Increased sample size: • Diabetes type 2 study • Experimental group: Mean blood sugar level: 105 mg/dl • Control group: Mean blood sugar level: 106 mg/dl

  7. Statistical inference • Compare the mean between 2 samples / conditions • If 2 samples are taken from the same population, then they should have fairly similar means (Figure: sample means X̄1 and X̄2 drawn from a single population with mean µ)

  8. Hypothesis testing • The general idea of hypothesis testing involves: • Making an initial assumption; • Collecting evidence (data); • Based on the available evidence (data), deciding whether to reject or not reject the initial assumption. • Every hypothesis test — regardless of the population parameter involved — requires the above three steps.

  9. Criminal trial • Criminal justice system assumes the defendant is innocent until proven guilty. That is, our initial assumption is that the defendant is innocent. • In the practice of statistics, we make our initial assumption when we state our two competing hypotheses – the null hypothesis (H0) and the alternative hypothesis (HA). Here, our hypotheses are: • H0: Defendant is not guilty (innocent) • HA: Defendant is guilty • In statistics, we always assume the null hypothesis is true. That is, the null hypothesis is always our initial assumption.

  10. Null hypothesis – H0 • This is the hypothesis under test, denoted as H0. • The null hypothesis is usually stated as the absence of a difference or an effect. • The null hypothesis says there is no effect. • The null hypothesis is rejected if the significance test shows the data are inconsistent with the null hypothesis.

  11. Alternative hypothesis – H1 • This is the alternative to the null hypothesis. It is denoted as H', H1, or HA. • It is usually the complement of the null hypothesis. • If, for example, the null hypothesis says two population means are equal, the alternative says the means are unequal.

  12. Criminal trial • The prosecution team then collects evidence with the hopes of finding sufficient evidence to make the assumption of innocence refutable. • In statistics, the data are the evidence. • The jury then makes a decision based on the available evidence: • If the jury finds sufficient evidence — beyond a reasonable doubt — to make the assumption of innocence refutable, the jury rejects H0 and deems the defendant guilty. We behave as if the defendant is guilty. • If there is insufficient evidence, then the jury does not reject H0. We behave as if the defendant is innocent.

  13. Making the decision • Recall that it is either likely or unlikely that we would observe the evidence we did given our initial assumption. • If it is likely, we do not reject the null hypothesis. • If it is unlikely, then we reject the null hypothesis in favor of the alternative hypothesis. • Effectively, then, making the decision reduces to determining likely or unlikely.

  14. Making the decision • In statistics, there are two ways to determine whether the evidence is likely or unlikely given the initial assumption: • We could take the critical value approach (favored in many of the older textbooks). • Or, we could take the P-value approach (what is used most often in research, journal articles, and statistical software).

  15. Making the decision • Suppose we find a difference between two groups in survival: • patients on a new drug have a survival of 15 months; • patients on the old drug have a survival of 18 months. • So, the difference is 3 months. • Do we accept or reject the hypothesis of no true difference between the groups (the two drugs)? • Is a difference of 3 a lot, statistically speaking – a huge difference that is rarely seen? • Or is it not much – the sort of thing that happens all the time?

  16. Probability • A measure of the likelihood that a particular event will happen. • It is expressed by a value between 0 and 1 (0.0 = cannot happen, 1.0 = sure to happen). • First, note that we talk about the probability of an event, but what we measure is the rate in a group. • If we observe that 5 babies in every 1000 have congenital heart disease, we say that the probability of a (single) baby being affected is 5 in 1000 or 0.005.

  17. Making the decision • A statistical test tells you how often you would get a difference of 3, simply by chance, if the null hypothesis is correct – no real difference between the two groups. • Suppose the test is done and its result is that P = 0.32. This means that you would get a difference of 3 quite often just by the play of chance – 32 times in 100 – even when there is in reality no true difference between the groups.

  18. Making the decision • A statistical test tells you how often you’d get a difference of 3, simply by chance, if the null hypothesis is correct – no real difference between the two groups. • On the other hand, if we did the statistical analysis and P = 0.0001, then we say that you’d only get a difference as big as 3 by the play of chance 1 time in 10 000. That is so rare that we want to reject our hypothesis of no difference: there is something different about the new therapy.
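
A minimal Python sketch can make this interpretation concrete. The 3-month difference comes from the slide, but the group size and standard deviation below are assumptions chosen only for illustration: the simulation counts how often two groups drawn from the same population differ by at least that much purely by chance.

```python
import numpy as np

rng = np.random.default_rng(42)

# The 3-month difference is from the slide; group size and SD are
# hypothetical values assumed only for this illustration.
n_per_group = 20
sigma = 10.0            # SD of survival in months (assumption)
observed_diff = 3.0     # difference reported on the slide
n_sim = 100_000

# Simulate studies in a world where the null hypothesis is true:
# both arms are drawn from the same population.
extreme = 0
for _ in range(n_sim):
    a = rng.normal(0.0, sigma, n_per_group)
    b = rng.normal(0.0, sigma, n_per_group)
    if abs(a.mean() - b.mean()) >= observed_diff:
        extreme += 1

# Fraction of null-hypothesis studies showing a difference at least as
# large as the observed one -- this is what a two-sided P value estimates.
print(f"Simulated P value: {extreme / n_sim:.3f}")
```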

  19. Hypothesis testing • Somewhere between 0.32 and 0.0001 we may not be sure whether to reject the null hypothesis or not. • Mostly we reject the null hypothesis when, if the null hypothesis were true, the result we got would have happened less than 5 times in 100 by chance. This is the conventional cutoff of 5% or P < 0.05. • This cutoff is commonly used, but it is arbitrary, i.e. there is no particular reason why we use 0.05 rather than 0.06 or 0.048.

  20. Hypothesis testing

  21. Type I and II errors A type I error is the incorrect rejection of a true null hypothesis (also known as a false positive finding). The probability of a type I error is denoted by the Greek letter α (alpha). A type II error is incorrectly retaining a false null hypothesis (also known as a false negative finding). The probability of a type II error is denoted by the Greek letter β (beta).

  22. Level of significance Level of significance (α) – the threshold for declaring if a result is significant. If the null hypothesis is true, α is the probability of rejecting the null hypothesis. α is decided as part of the research design, while P-value is computed from data. α = 0.05 is most commonly used. Small α value reduces the chance of Type I error, but increases the chance of Type II error. Trade-off based on the consequences of Type I (false-positive) and Type II (false-negative) errors.
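
To see what α means operationally, the sketch below simulates many experiments in which the null hypothesis is true and counts how often a two-sample t test rejects it at α = 0.05. The group sizes, mean and standard deviation are arbitrary assumptions; the rejection rate should come out close to α, which is exactly the Type I error rate.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

alpha = 0.05
n_experiments = 10_000
false_positives = 0

# Both groups come from the same population, so the null hypothesis is
# true in every simulated experiment (sizes, mean and SD are assumptions).
for _ in range(n_experiments):
    a = rng.normal(100, 15, size=30)
    b = rng.normal(100, 15, size=30)
    _, p = stats.ttest_ind(a, b)
    if p < alpha:
        false_positives += 1

# Rejecting a true null hypothesis is a Type I error; the observed rate
# should be close to alpha = 0.05.
print(f"Observed Type I error rate: {false_positives / n_experiments:.3f}")
```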

  23. Power Power – the probability of rejecting a false null hypothesis. Statistical power is inversely related to β or the probability of making a Type II error (power is equal to 1 – β). Power depends on the sample size, variability, significance level and hypothetical effect size. You need a larger sample when you are looking for a small effect and when the standard deviation is large.
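
A hedged sketch of how such a calculation might look in Python, using the TTestIndPower class from the statsmodels package; the effect sizes (expressed as Cohen's d) and the 80% power target are conventional illustrative values, not figures from the slides.

```python
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# Required sample size per group for a two-sample t test at alpha = 0.05
# and 80% power; the Cohen's d values are illustrative only.
for d in (0.2, 0.5, 0.8):   # small, medium, large effect
    n = analysis.solve_power(effect_size=d, alpha=0.05, power=0.80)
    print(f"Cohen's d = {d}: about {n:.0f} participants per group")

# Conversely, the power achieved with a fixed sample of 30 per group:
power = analysis.solve_power(effect_size=0.5, alpha=0.05, nobs1=30)
print(f"Power with n = 30 per group and d = 0.5: {power:.2f}")
```

Smaller effects and noisier data push the required sample size up sharply, since the required n scales roughly with 1/d², which is the slide's point about needing larger samples for small effects.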

  24. Common misconceptions • P-value is different from the level of significance α. P-value is computed from data, while α is decided as part of the experimental design. • P-value is not the probability of the null hypothesis being true. P-value answers the following question: If the null hypothesis is true, what is the chance that random sampling will lead to a difference as large as or larger than the one observed in the study? • A statistically significant result does not necessarily mean that the finding is clinically important. Look at the size of the effect and its precision. • Lack of difference may be a meaningful result too!

  25. Choosing a statistical test Choice of a statistical test depends on: Level of measurement for the dependent and independent variables; Number of groups or dependent measures; Number of units of observation; Type of distribution; The population parameter of interest (mean, variance, differences between means and/or variances).

  26. Choosing a statistical test • Multiple comparison – two or more data sets to be analyzed: • repeated measurements made on the same individuals; • entirely independent samples. • Degrees of freedom – the number of scores, items, or other units in the data set that are free to vary. • One- and two-tailed tests (see the sketch after this slide) • one-tailed test of significance used for a directional hypothesis; • two-tailed tests in all other situations. • Sample size – number of cases on which data have been obtained. • Which of the basic characteristics of a distribution are more sensitive to the sample size?
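
For the one- vs two-tailed distinction above, the sketch below contrasts the two P values on the same made-up data, using the alternative argument of scipy.stats.ttest_ind (available in recent SciPy versions).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Made-up samples purely to contrast one- and two-tailed P values.
control = rng.normal(72, 8, size=25)
treated = rng.normal(68, 8, size=25)

# Two-tailed: any difference, in either direction.
t, p_two = stats.ttest_ind(treated, control, alternative="two-sided")

# One-tailed: directional hypothesis that the treated mean is lower.
_, p_one = stats.ttest_ind(treated, control, alternative="less")

print(f"t = {t:.2f}, two-tailed P = {p_two:.4f}, one-tailed P = {p_one:.4f}")
```

For a result in the hypothesised direction, the one-tailed P value is half the two-tailed one, which is why a directional hypothesis must be justified before looking at the data.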

  27. Student t-test

  28. 1-sample t-test • Comparison of a sample mean with a population mean • It is known that the weight of young adult males has a mean value of 70.0 kg with a standard deviation of 4.0 kg. Thus the population mean µ = 70.0 and the population standard deviation σ = 4.0. • Data from a random sample of 28 males of similar ages but with a specific enzyme defect: mean body weight of 67.0 kg and sample standard deviation of 4.2 kg. • Question: Does the studied group have a significantly lower body weight than the general population?
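
Because the slide gives only summary statistics, a minimal sketch of this one-sample t test can be computed directly from them; the figures are those on the slide, and the code shows both the critical-value and the P-value approaches mentioned earlier.

```python
from math import sqrt
from scipy import stats

# Figures from the slide: hypothesised population mean, sample size,
# sample mean and sample standard deviation.
mu0 = 70.0
n = 28
xbar = 67.0
s = 4.2

# One-sample t statistic: (sample mean - hypothesised mean) / standard error.
t_stat = (xbar - mu0) / (s / sqrt(n))
df = n - 1

# One-tailed P value, since the question is whether the group is lighter.
p_one_tailed = stats.t.cdf(t_stat, df)

# Critical-value approach at alpha = 0.05 (lower tail), for comparison.
t_crit = stats.t.ppf(0.05, df)

print(f"t = {t_stat:.2f}, df = {df}, one-tailed P = {p_one_tailed:.4f}")
print(f"Reject H0 at alpha = 0.05? {t_stat < t_crit}")
```

With t ≈ -3.78 on 27 degrees of freedom, the P value is well below 0.05, so the enzyme-defect group does appear to weigh significantly less than the general population.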

  29. 2-sample t-test Aim: Compare two means Example: Comparing pulse rate in people taking two different drugs Assumption: Both data sets are sampled from Gaussian distributions with the same population standard deviation Effect size: Difference between two means Null hypothesis: The two population means are identical Meaning of P value: If the two population means are identical, what is the chance of observing such a difference (or a bigger one) between means by chance alone?
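
A minimal sketch of this test with scipy.stats.ttest_ind; the pulse-rate values are invented purely to illustrate the call.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)

# Hypothetical pulse rates (beats per minute) for two drug groups.
drug_a = rng.normal(72, 6, size=30)
drug_b = rng.normal(76, 6, size=30)

# Classic unpaired 2-sample t test: assumes both samples come from
# Gaussian populations with the same standard deviation.
t, p = stats.ttest_ind(drug_a, drug_b)
print(f"t = {t:.2f}, P = {p:.4f}")

# Welch's variant (equal_var=False) drops the equal-SD assumption.
t_w, p_w = stats.ttest_ind(drug_a, drug_b, equal_var=False)
print(f"Welch t = {t_w:.2f}, P = {p_w:.4f}")
```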

  30. Paired t-test Aim: Compare a continuous variable before and after an intervention Example: Comparing pulse rate before and after taking a drug Assumption: The population of paired differences is Gaussian Effect size: Mean of the paired differences Null hypothesis: The population mean of paired differences is zero Meaning of P value: If there is no difference in the population, what is the chance of observing such a difference (or a bigger one) between means by chance alone?
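
A minimal sketch of a paired t test with scipy.stats.ttest_rel; the before/after pulse-rate values are invented for illustration.

```python
import numpy as np
from scipy import stats

# Hypothetical pulse rates (bpm) for the same 10 people before and after
# taking a drug; the numbers are invented for illustration only.
before = np.array([74, 81, 69, 77, 85, 72, 79, 68, 76, 83])
after  = np.array([71, 78, 70, 73, 80, 70, 75, 66, 74, 79])

# The paired t test works on the within-person differences and assumes
# those differences are Gaussian, with population mean zero under H0.
t, p = stats.ttest_rel(before, after)
diffs = before - after

print(f"Mean paired difference = {diffs.mean():.1f} bpm")
print(f"t = {t:.2f}, P = {p:.4f}")
```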

  31. One-way ANOVA Aim: Compare three or more means Example: Comparing pulse rate in 3 groups of people, each group taking a different drug Assumption: All data sets are sampled from Gaussian distributions with the same population standard deviation Effect size: Fraction of the total variation explained by variation among group means Null hypothesis: All population means are identical Meaning of P value: If the population means are identical, what is the chance of observing such a difference (or a bigger one) between means by chance alone?
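
A minimal sketch of a one-way ANOVA with scipy.stats.f_oneway; the three groups' pulse-rate values are invented for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

# Hypothetical pulse rates (bpm) for three drug groups.
drug_a = rng.normal(70, 6, size=25)
drug_b = rng.normal(73, 6, size=25)
drug_c = rng.normal(78, 6, size=25)

# One-way ANOVA: H0 is that all three population means are identical.
f, p = stats.f_oneway(drug_a, drug_b, drug_c)
print(f"F = {f:.2f}, P = {p:.4f}")
```

A significant F only says that at least one mean differs; post-hoc pairwise comparisons (e.g. Tukey's HSD) are needed to locate which groups differ.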

  32. Parametric and non-parametric tests Parametric test – the variable we have measured in the sample is normally distributed in the population to which we plan to generalize our findings Non-parametric test – distribution free, no assumption about the distribution of the variable in the population

  33. Parametric and non-parametric tests
