Understanding Inferential Statistics: Learn About Making Inferences from Samples. Discover Sampling Distributions, S

Inferential statistics

In inferential statistics • Data from samples are used to make inferences about populations • Researchers can make generalizations about an entire population based on a smaller number of observations • However, the sample means will not all be the same when repeated random samples are taken from a population Evidence-based Chiropractic

Sampling distributions • If many different samples were taken from a population, the result would be a distribution of sample means • With enough repetitions and reasonably large samples, the distribution would take on an approximately normal shape • Even if the underlying population is not normal • If repeated an infinite number of times, it would be called a sampling distribution
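The behaviour described on this slide can be illustrated with a small simulation. The sketch below is only an illustration: it assumes a made-up, strongly skewed (exponential) population, 50 observations per sample, and 2,000 repeated samples. None of these numbers come from the slides.

```python
import random
import statistics

def sample_means(population, sample_size, n_samples, seed=0):
    """Draw repeated random samples and return the mean of each one."""
    rng = random.Random(seed)
    return [
        statistics.mean(rng.sample(population, sample_size))
        for _ in range(n_samples)
    ]

# Hypothetical skewed population (assumption, for illustration only).
pop_rng = random.Random(42)
population = [pop_rng.expovariate(1.0) for _ in range(10_000)]

means = sample_means(population, sample_size=50, n_samples=2_000)

pop_mean = statistics.mean(population)
mean_of_means = statistics.mean(means)      # close to the population mean
spread_of_means = statistics.stdev(means)   # much smaller than the population SD
```

Plotting a histogram of `means` would show a roughly bell-shaped curve centred near `pop_mean`, even though the population itself is skewed.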

Sampling distributions (cont.) • Which of the sample means is truly the population mean? • It would be useful to know, but an exact figure is not possible • The population mean can be inferred from the sample • The sample mean is an estimate • Referred to as the point estimate

Sampling distributions (cont.) • Because sampling distributions are approximately normal, the properties of the normal distribution can be used • e.g., about 68.3%, 95.4%, and 99.7% of the area under the curve lies within 1, 2, and 3 standard errors of the mean
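The areas under the normal curve can be checked directly. This is a minimal sketch using the standard normal distribution from Python's standard library; the helper name `area_within` is made up for this example.

```python
from statistics import NormalDist

def area_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    z = NormalDist()  # standard normal: mean 0, SD 1
    return z.cdf(k) - z.cdf(-k)

# Percentages for 1, 2 and 3 standard deviations, rounded to one decimal place.
areas = {k: round(area_within(k) * 100, 1) for k in (1, 2, 3)}
# areas -> {1: 68.3, 2: 95.4, 3: 99.7}
```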

Standard error of the mean (SEm) • The spread of sample means around the mean of a sampling distribution • Can be estimated from a single sample • SEm is calculated by dividing the SD of the sample by the square root of the number of units in the sample (SEm = SD / √n)

SEm (cont.) • SEm is higher when • The sample’s SD is large or • The sample size is small • Lower when • The SD is small or • The sample size is large • A small SEm is preferable because generalizations are more precise
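As a minimal sketch of the SEm calculation (sample SD divided by the square root of the sample size), the example below uses a small set of invented scores; the data and the function name `standard_error` are assumptions for illustration.

```python
import math
import statistics

def standard_error(sample):
    """SEm = sample SD / sqrt(number of units in the sample)."""
    return statistics.stdev(sample) / math.sqrt(len(sample))

# Hypothetical scores (not from the slides).
scores = [12, 15, 9, 14, 10, 13, 11, 16]
sem_small = standard_error(scores)

# Same spread of scores but four times as many observations:
# the SEm shrinks because the sample is larger.
sem_large = standard_error(scores * 4)
```

Here `sem_small` is about 0.87, and `sem_large` is smaller, which matches the point that larger samples give more precise estimates.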

Confidence Intervals (CIs) • A CI is a range of values that is likely to contain the population parameter that is being estimated (e.g., the mean) • The probability that this range of values contains the population parameter is typically 95% • Thus, the 95% confidence interval

Confidence Intervals (CIs) • [Figure: distribution curve with the horizontal axis marked from -3 to +3]

CIs (cont.) • One can have 95% confidence that the value of the true mean lies within the calculated interval (i.e., 95% CI)

Calculating a 95% CI • Find the z-score (using a z-table) that corresponds to the area under the distribution that includes 95% of all values (z = ±1.96 for a 95% CI) • Multiply the z-score by the SEm • Add the product to the sample mean to find the upper limit of the CI and subtract it to find the lower limit
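The three steps above can be sketched in a few lines. The sample mean, SD, and sample size below are invented for illustration, and the function name `confidence_interval` is an assumption; the z-score is looked up from the standard normal distribution rather than a printed z-table.

```python
import math
from statistics import NormalDist

def confidence_interval(mean, sd, n, level=0.95):
    """Return (lower, upper) limits of a z-based CI for the mean."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% CI
    sem = sd / math.sqrt(n)                    # standard error of the mean
    margin = z * sem
    return mean - margin, mean + margin

# Hypothetical sample: mean 10, SD 6, n = 36, so SEm = 1.
lower, upper = confidence_interval(mean=10.0, sd=6.0, n=36)

# Quadrupling the sample size halves the SEm and so halves the CI width.
lower_big, upper_big = confidence_interval(mean=10.0, sd=6.0, n=144)
```

With these numbers the 95% CI runs from about 8.04 to 11.96; with n = 144 it narrows to roughly 9.02 to 10.98.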

Size (width) of CIs • The width of the CI depends on the size of the sample and the amount of variation in the data • Small samples & large variation = wider CIs • Large samples & small variation = narrower CIs

Hypothesis testing • A hypothesis is an assumption that appears to explain certain events, which must be tested to see whether it is true • Research hypothesis • a.k.a., alternative hypothesis • Denoted H1 • The research hypothesis is not tested directly • Instead the null hypothesis (H0) is tested

Hypothesis tests • Depending on the outcome of the test of H0, the research hypothesis is either supported or not supported • Hypothesis testing involves the comparison of the means of groups in an experiment • The objective is to find out whether they are significantly different from each other

Hypothesis tests (cont.) • When comparing the means of an active treatment group and a control group, one looks for a difference • The treatment may produce a better outcome leading to a higher mean than the control group • The difference may appear real, but it may be due to chance • Statistical tests indicate how likely it is that a difference of this size arose by chance

The null hypothesis • H0 states that there is no difference between the group means • H1 is accepted only if the null hypothesis proves to be unlikely • Typically it must be at least 95% unlikely • If H0 is unlikely, it is rejected • Not unlike the innocent until proven guilty concept in our legal system

A hypothetical neck pain study • Patients are treated with chiropractic vs. usual medical care • Outcome measure is the Neck Disability Index (NDI) • H1 • Chiropractic patients will have lower mean NDI scores after treatment • H0 • There is no difference between mean NDI scores

Hypothetical study (cont.) • Results • Mean NDI scores of chiropractic patients • 28 before, 10 after treatment • Mean NDI scores of medical patients • 29 before, 15 after treatment • Chiropractic care appears to be better • But is there enough difference to rule out chance? • Must perform statistical tests to find out

Hypothetical study (cont.) • [Figure: NDI score (axis 0 to 30) at baseline and outcome for the chiropractic and medical groups] • Is this difference enough to be meaningful?

Statistical significance • The results of a study (i.e., the difference between groups) are unlikely to be due to chance • At a specified probability level, referred to as alpha (α) • α is the probability of incorrectly rejecting a null hypothesis • If the results are not due to chance, H0 is rejected and H1 is accepted

Statistical significance (cont.) • It must be at least 95% unlikely that H0 is true before it can be rejected • There is still a 5% chance that H0 would be rejected, when it was actually true • Accordingly, P values must be equal to or less than 5% in order for the results of a study to reach a level of statistical significance

Statistical significance (cont.) • The level of significance (alpha level) is not the same as the P value • The alpha level must be set before the study begins • The P value is calculated at the completion of the study and must be ≤ the alpha level in order to reach statistical significance

Statistical significance (cont.) • Even when there is no real difference between groups, about 1 in 20 studies would be expected to reach statistical significance by chance alone at α = 0.05 • Fishing • When researchers perform a lot of statistical tests on their data • Increases the chance that at least one of the tests will wrongly reach statistical significance
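The effect of fishing can be quantified with a short calculation. The sketch below assumes that every test is independent and that no real differences exist, which is a simplification; under that assumption the chance of at least one false positive among k tests is 1 - (1 - α)^k. The function name is made up for this example.

```python
def familywise_error_rate(alpha, n_tests):
    """Chance of at least one false positive across independent tests.

    Assumes all null hypotheses are true and the tests are independent.
    """
    return 1 - (1 - alpha) ** n_tests

rate_one_test = familywise_error_rate(0.05, 1)    # 0.05
rate_20_tests = familywise_error_rate(0.05, 20)   # about 0.64
```

Running 20 tests at α = 0.05 therefore gives roughly a 64% chance that at least one will reach significance by chance alone.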

Type I & II errors • Type I error (a.k.a., alpha error) • Rejecting a true null hypothesis • The probability of making a Type I error is equal to the value of α • Type II error (a.k.a., beta error) • Failure to reject a false null hypothesis • The probability of making a Type II error is equal to the value of beta (β)
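The link between α and the Type I error rate can be seen in a simulation. In this sketch both groups are drawn from the same population, so H0 is true by construction and every rejection is a Type I error. For simplicity it assumes a z-test with a known population SD of 1 rather than a t-test; the group size, number of simulated studies, and seed are arbitrary choices.

```python
import math
import random

def type_i_error_rate(n_per_group=30, n_studies=4000, z_crit=1.96, seed=1):
    """Fraction of simulated studies that reject a true H0 (two-tailed z-test)."""
    rng = random.Random(seed)
    se_diff = math.sqrt(2 / n_per_group)  # SE of the difference, population SD = 1
    rejections = 0
    for _ in range(n_studies):
        group_a = [rng.gauss(0, 1) for _ in range(n_per_group)]
        group_b = [rng.gauss(0, 1) for _ in range(n_per_group)]
        diff = sum(group_a) / n_per_group - sum(group_b) / n_per_group
        if abs(diff / se_diff) > z_crit:
            rejections += 1
    return rejections / n_studies

observed_rate = type_i_error_rate()  # expected to be close to 0.05
```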

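Type I & II errors (cont.) • Consequences of accepting or rejecting true and false null hypotheses • Rejecting a true H0 is a Type I error • Not rejecting a true H0 is a correct decision • Rejecting a false H0 is a correct decision • Not rejecting a false H0 is a Type II error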

Type I & II errors (cont.) • There is a trade-off between the likelihood of a study resulting in a Type I error versus a Type II error • As alpha becomes smaller, the chance of making a Type I error decreases • Whereas the chance of making a Type II error increases • Because it is more likely that a false H0 will not be rejected

Type I & II errors (cont.) • The 0.05 alpha level is a compromise between Type I and Type II errors

Power • The probability of correctly rejecting a false H0 • Related to β error • Power is equal to 1 - β • Power depends on sample size, the magnitude of the difference between group means, and the value of α

Power (cont.) • Power increases as • Sample size increases • Only to a certain extent, then it becomes a waste of resources • The difference between group means increases • α increases • A power value of 0.80 is often sought by researchers
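The three influences on power can be explored with a rough calculation. The sketch below uses a normal approximation for a two-tailed comparison of two equal-sized groups, which is an assumption and not necessarily the method used by any particular study. The `effect_size` parameter is the difference between group means divided by the SD (a standardized difference); that name and the example numbers are hypothetical.

```python
import math
from statistics import NormalDist

def approx_power(effect_size, n_per_group, alpha=0.05):
    """Approximate power of a two-tailed, two-group comparison of means."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    # Expected size of the test statistic when the true difference is effect_size.
    shift = effect_size * math.sqrt(n_per_group / 2)
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)

power_base = approx_power(0.5, 64)                      # about 0.81
power_more_subjects = approx_power(0.5, 100)            # larger sample
power_bigger_difference = approx_power(0.8, 64)         # larger difference
power_higher_alpha = approx_power(0.5, 64, alpha=0.10)  # larger alpha
```

Each of the last three values is higher than `power_base`, in line with the bullets above.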

Power (cont.) • Power may be calculated after a study has been completed (post hoc) • If low power is detected during post hoc power analysis and H0 was not rejected, it may be grounds to repeat the study using a larger sample

Confidence intervals and hypothesis testing • If the value specified as the difference between group means in the null hypothesis is included in the 95% CI, then H0 should not be rejected • The test is not statistically significant • H0 states there is no difference between group means, so the specified "no difference" value is always zero

CIs and hypothesis testing (cont.) • If zero is not included in the 95% CI, the null hypothesis should be rejected • The test is statistically significant • CIs are becoming more prevalent in the health care literature because they convey more information than P values alone

CIs and hypothesis testing (cont.) • Example study • Brinkhaus et al. • Acupuncture was more effective in improving pain on VAS* than no acupuncture in chronic low back pain patients • Difference, 21.7 mm (95% CI 13.9 to 30.0) • But no statistically significant difference between acupuncture and minimal acupuncture • Difference, 5.1 mm (95% CI -3.7 to 13.9) * Visual analog scale
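The rule of checking whether zero falls inside the 95% CI is easy to express in code. This minimal sketch applies it to the two intervals quoted from the example study; the function name `ci_excludes_zero` is made up for illustration.

```python
def ci_excludes_zero(lower, upper):
    """True if a CI for a difference between means does not contain zero."""
    return lower > 0 or upper < 0

# 95% CIs quoted on the slide (differences in mm on the VAS).
acupuncture_vs_none = ci_excludes_zero(13.9, 30.0)      # True: significant
acupuncture_vs_minimal = ci_excludes_zero(-3.7, 13.9)   # False: not significant
```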

Clinical significance (a.k.a., practical significance) • Do the findings of a study really matter in clinical situations? • Sometimes a study is statistically significant, but the findings are not important in clinical terms • Large studies with small differences between groups can generate statistically significant findings that are not meaningful to practitioners

Clinical significance (cont.) • For example • A study found a statistically significant difference of only 10 points between mean Headache Disability Inventory (HDI) scores • Yet at least a 29-point change must occur from test to retest before the changes can be attributed to a patient’s treatment • The HDI is not very responsive to change

Commonly encountered statistical tests • Statistical tests determine the probabilities associated with relationships in studies • Are the results real or merely due to chance? • t-test, ANOVA, and chi-square are common in journal articles • Familiarity with these tests is helpful in the appraisal of articles

t-test • Used to find out whether the means of two groups are statistically different • Results are not entirely black-and-white • Only indicates that the means are probably different • Or, that they are probably the same, if the study fails to find a difference • The t-test can be used for a single group by comparing the mean with known values

t-test (cont.) • The actual difference between means is considered • Also the amount of variability of the scores • A high degree of variability of group scores can obscure the difference between means

t-test (cont.) • The differences between means are the same in both examples, but the variability of group scores differs • The lower example would be much more likely to reach statistical significance because of the narrow spread

Assumptions of the t-test • The data should be normally distributed and involve interval or ratio measurement • Groups should be independent • The variances of groups should be equal • When the sample size is large enough (about 30 subjects), violations of these assumptions are less important

Alternatives to the t-test • The t-test for unequal variances • Non-parametric tests for use with skewed data • Mann-Whitney U test • Wilcoxon test

The t-score • The t-score (a.k.a., t-ratio) is similar to the z-score • However, the t-distribution and a t-table are used • This is because the SD of the population is estimated from the sample, whereas it is known in the z-distribution • P values are found using the calculated t-score and a t-table

The t-score (cont.) • t-tables consider the number of subjects in the groups • Referred to as degrees of freedom (df) • For a single group, df is the number of subjects minus 1 • For two groups, df is the total number of subjects minus 2 • Thus, a study that compares the means of 2 groups involving a total of 30 subjects has 28 df
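The df rule amounts to subtracting 1 for each group from the total number of subjects. The sketch below assumes the 30 subjects are split 15 and 15; any other split of 30 subjects into two groups gives the same 28 df. The function name is hypothetical.

```python
def degrees_of_freedom(*group_sizes):
    """df for a t-test: total subjects minus one per group (1 or 2 groups)."""
    return sum(group_sizes) - len(group_sizes)

df_two_groups = degrees_of_freedom(15, 15)  # 28
df_one_group = degrees_of_freedom(30)       # 29
```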

The t-table • t-distributions eventually become nearly normal when many subjects are included • As a result, t-tables usually only go to 100 df • Alpha levels are shown for • When α is all in one tail (α1 or one-tailed test) • When α is split between the tails (α2 or two-tailed test)

[Figure: t-table excerpt, with rows continuing to 100 df, highlighting the critical value for 10 df and α2 = 0.05]

One-tailed test vs. two-tailed test • One-tailed test (a.k.a., directional test) • Alpha is all in one tail • The researcher specifies the direction the test results will go before the data analysis • Either higher or lower • Two-tailed test (a.k.a., non-directional test) • Alpha is split between the tails • The study’s results could go either way

One-tailed test vs. two-tailed test (cont.) • In a non-directional test, the researcher wants to know if the means are different • For example, in a study comparing manipulation with acupuncture for tension headaches, the results could go either way • That is the case with almost all studies that compare treatments

One-tailed test vs. two-tailed test (cont.) • It is easier to reach statistical significance using a directional test • Consequently it is tempting for researchers to use directional hypotheses • The opposite direction must be of no interest to the researcher • But it is almost always possible for the test to go either way when comparing treatments

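Calculating the t-score • Is a ratio of the difference between group means and the variability of the data • Variability is represented by the standard error of the difference ($SE_{\bar{x}_1 - \bar{x}_2}$) rather than the SD • Thus $t = \dfrac{\text{difference between group means}}{\text{variability of the data}}$ or $t = \dfrac{\bar{x}_1 - \bar{x}_2}{SE_{\bar{x}_1 - \bar{x}_2}}$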

The t-score • For the t-test result to be statistically significant • The difference between the means must be large (the numerator) • And the variability of the data must be small (the denominator) • This results in a t-score that is larger than the critical value of t in the t-table
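The ratio described above can be worked through by hand or in code. The sketch below is one common way to compute it, assuming two independent groups with equal variances (a pooled-variance t-test); the slides do not specify the exact formula for the standard error of the difference, so that choice is an assumption. The scores are invented, and the critical value 2.228 is the standard t-table entry for 10 df with a two-tailed α of 0.05.

```python
import math
import statistics

def pooled_t_score(group_a, group_b):
    """Return (t, df) for two independent groups, assuming equal variances."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    mean_diff = statistics.mean(group_a) - statistics.mean(group_b)  # numerator
    pooled_var = (
        (n_a - 1) * statistics.variance(group_a)
        + (n_b - 1) * statistics.variance(group_b)
    ) / df
    se_diff = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))  # denominator
    return mean_diff / se_diff, df

# Hypothetical post-treatment scores (not data from the slides).
chiropractic = [8, 10, 12, 9, 11, 10]
medical = [14, 16, 15, 13, 17, 15]

t, df = pooled_t_score(chiropractic, medical)

CRITICAL_T_10DF_TWO_TAILED = 2.228  # t-table value for 10 df, two-tailed alpha 0.05
significant = abs(t) > CRITICAL_T_10DF_TWO_TAILED
```

With these made-up scores the difference between means is 5 points, the standard error of the difference is about 0.82, and t is about -6.12 with 10 df, which exceeds the critical value in absolute terms.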
