
Introduction to Statistical Considerations in Experimental Research



  1. Introduction to Statistical Considerations in Experimental Research Dr. Kim Pearce and Dr. Simon Kometa

  2. Introductions Dr Kim Pearce Dr Simon Kometa

  3. Today’s Session • Introductions to the people who can give you statistical guidance. • A taste of “popular” statistical techniques. • Statistical software for analysis. • Courses available to you.

  4. This is not a lecture!

  5. Who am I? • Why are we here today?

  6. Statistical tests – what’s the point? • To confirm that “patterns” we see in our data are not due to chance – i.e. the patterns would appear again if the study was repeated. • We collect data from a sample of the population and use these data to infer things about the whole population.

  7. The basics: What kind of data have you got?
Variables:
• Quantitative – counted or measured on a numerical scale
  • Continuous – measured on a scale, e.g. height
  • Discrete – counts, whole numbers, e.g. number of patients
• Qualitative – non-numerical, classification into categories
  • Nominal – categories, e.g. cause of death
  • Ordinal – ordered categories, e.g. level of pain

  8. Basic terminology
• Mean (more precisely, the arithmetic mean) is commonly called the average. The population mean is given the symbol µ; the sample mean is represented by x̄ ("x-bar").
• Median means middle: the median is the middle of a set of data that has been put into rank order.
• Variance measures the extent to which observations deviate from the arithmetic mean. The larger the deviations, the larger the variability. Its units are the square of the units of the original observations, e.g. kg². The population variance is represented by σ²; the sample variance by s².
• Standard deviation (SD) is the square root of the variance (expressed in the same units as the data). The sample standard deviation is represented by s.
• 95% Confidence Interval: "we are 95% confident that the true value of the statistic lies within this interval". For example, a 95% confidence interval for the mean of (4.7, 4.9) means that we are 95% confident that the population mean lies between 4.7 and 4.9.
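
As a supplement (my own illustration, not part of the slides), these summary quantities can be computed in a few lines of Python with numpy and scipy; the sample values below are made up.

    import numpy as np
    from scipy import stats

    x = np.array([4.6, 4.9, 4.7, 5.0, 4.8, 4.7, 4.9, 4.6])  # hypothetical sample

    mean = x.mean()                  # arithmetic mean (sample estimate of µ)
    median = np.median(x)            # middle value of the ranked data
    s2 = x.var(ddof=1)               # sample variance s² (squared units)
    s = x.std(ddof=1)                # sample standard deviation s

    # 95% confidence interval for the population mean, based on the t distribution
    sem = s / np.sqrt(len(x))
    ci_low, ci_high = stats.t.interval(0.95, len(x) - 1, loc=mean, scale=sem)
    print(mean, median, s2, s, (ci_low, ci_high))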

  9. Statistical Tests • Non-Parametric vs Parametric • What’s the Difference? • Parametric: Assumes Normality. • Non-Parametric: Doesn’t Assume Normality.

  10. The basics: The normal distribution • Let’s think about the heights of men living in Newcastle… How many men would you expect to see who were (i) very short, (ii) of medium height and (iii) very tall?

  11. A normal distribution [Figure: histogram of men’s heights in Newcastle, with count on the vertical axis – not many very short men, lots of medium height, not many very tall.]
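
A side note that is not in the slides: before relying on a parametric test, the normality assumption is often checked, for example with a Shapiro-Wilk test. A minimal Python sketch on synthetic "heights":

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    heights = rng.normal(loc=175, scale=7, size=40)  # synthetic heights in cm

    stat, p = stats.shapiro(heights)
    print(stat, p)  # a small p-value (< 0.05) suggests a departure from normality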

  12. Which parametric statistical tests are we going to discuss today? • Independent Samples: • Two-sample t-test • One-way ANOVA • Two-way ANOVA (“factorial” design) • Related Samples: • Paired t-test • One-way Repeated Measures ANOVA • We are also going to touch upon: • Power and sample size

  13. Two-sample t-test

  14. Two-sample t-test • There is no basis for pairing the subjects in the two samples. • Each group is made up of different subjects. [Diagram: Group 1 and Group 2 shown as two separate samples.]

  15. Two-sample t-test What problems could occur when we do an experiment using, say, 2 different samples of people? One group could be older, more ill, and so on. That’s why we need to randomise!

  16. The two-sample t-test (Parametric Test) • Subjects (units) are usually randomly assigned to two groups. One of the groups undergoes experimental manipulation (e.g. has a treatment applied), the other group is the control. • In many examples, however, two groups are compared where membership is ‘fixed’ e.g. males vs females, left vs right handed etc. • We are testing if the two population means are equal.
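
The random assignment itself can be done with any software. Purely as an illustration (the subject IDs and group sizes are hypothetical), a Python sketch:

    import numpy as np

    rng = np.random.default_rng(42)
    subjects = np.arange(1, 25)           # 24 hypothetical subject IDs
    shuffled = rng.permutation(subjects)  # random order

    treatment_group = shuffled[:12]       # first 12 receive the treatment
    control_group = shuffled[12:]         # remaining 12 form the control group
    print(treatment_group, control_group)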

  17. The two-sample t-test (Parametric Test) • The two-sample t-test statistic makes use of • the difference between the mean (average) value of group 1 and group 2, • the (pooled) standard deviation, and • the size of group 1 and group 2. (We do not have to have equal numbers in our groups.) • We compare the value of the statistic to a statistical distribution. The significance of the statistic is obtained and is expressed as a ‘p-value’. • When the p-value is < 0.05 we say that the statistic is statistically significant, i.e. in this case, there is evidence that group 1 is different from group 2 (in the population).
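
The slides carry out this test in Minitab; purely as an illustration, the same pooled two-sample t-test can be sketched in Python with scipy (the data below are synthetic, not Dr Money’s measurements):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    group_1 = rng.normal(50, 10, size=12)  # synthetic scores for group 1
    group_2 = rng.normal(55, 10, size=12)  # synthetic scores for group 2

    # equal_var=True gives the classic pooled-variance two-sample t-test
    t_stat, p_value = stats.ttest_ind(group_1, group_2, equal_var=True)
    print(t_stat, p_value)  # p < 0.05 would be evidence the population means differ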

  18. Two-sample t-test • Dr Owen Money is interested in determining if a new drug helps to combat acid reflux. His patients are prone to this condition (especially after eating a heavy meal) and have agreed to take part in his experiment. • Dr Money wants to determine if patients given the drug have a different level of acid reflux after eating a heavy meal compared to patients given a placebo. • The drug and placebo are in identical tablet form. • How do we do it? • We need two groups of patients. • One group is given the placebo and one group is given the drug. • In Dr Money’s experiment, acid reflux is measured for each patient after eating.

  19. Two-sample t-test: An Example • A placebo is given to one group of patients and a drug is given to a different group of patients. The administration of the drug and placebo is done randomly. The study uses 12 people having the drug and 12 people having the placebo. Level of acid reflux after eating the heavy meal is measured for each person. Here’s the data: • Do the drug and placebo groups differ? • Let’s look at the data on a plot.

  20. The Boxplot

  21. Boxplot for the experiment

  22. Two-sample t-test: An Example P-value > 0.05. We say that there is no significant difference between the two groups as regards level of acid reflux.

  23. One extra thing… • In reality, Dr Money’s experiment isn’t as simple as he thinks! • More complex analysis (analysis of covariance: ANCOVA) can be used which: • (i) “adjusts” for chance imbalance between the two groups of patients as regards baseline level of acid reflux, • (ii) reduces error variance, making the test on "treatment" more powerful. Vickers AJ, Altman DG. Analysing controlled trials with baseline and follow up measurements. BMJ 2001;323(7321):1123–1124.
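
To make the ANCOVA idea concrete, here is a rough Python sketch with statsmodels (my own illustration, not the analysis run in the slides); the variable names and data are made up:

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(0)
    treatment = np.repeat(["drug", "placebo"], 12)
    baseline = rng.normal(60, 10, 24)
    follow_up = (0.5 * baseline
                 + np.where(treatment == "drug", -5.0, 0.0)
                 + rng.normal(0, 5, 24))
    df = pd.DataFrame({"treatment": treatment,
                       "baseline": baseline,
                       "follow_up": follow_up})

    # Follow-up reflux modelled on treatment group, adjusting for baseline reflux
    model = smf.ols("follow_up ~ C(treatment) + baseline", data=df).fit()
    print(model.summary())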

  24. Two independent samples • Parametric test to examine if there is a difference between the two groups (continuous data): • Two-sample t-test

  25. Two independent samples • Non-Parametric test to examine if there is a difference between the two groups (ordinal or continuous data): • Mann-Whitney test
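
For illustration only (synthetic data again, not the slides’ example), the Mann-Whitney test in Python:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    group_1 = rng.normal(50, 10, size=12)
    group_2 = rng.normal(55, 10, size=12)

    u_stat, p_value = stats.mannwhitneyu(group_1, group_2, alternative="two-sided")
    print(u_stat, p_value)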

  26. Power and Sample Size

  27. What is the power of this test? • We would like our test to have high power which means that the test is likely to detect a difference we care about when it truly exists.

  28. What influences the power of a test? • As variation in the sample increases, power decreases. • As the difference we care about decreases, power decreases. • As sample size decreases, power decreases.

  29. Prospective Power Analysis Vs Retrospective Power Analysis

  30. Prospective Power Analysis (used before collecting data) • Finding a sample size to detect an effect size we care about at a specific power. • Usually need to specify: • Alpha level • Variance (from literature or pilot data) • Statistical power • Effect size we care about* *Effect size could be, for example, the difference between the means
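
As a sketch of such a calculation (the difference, SD, alpha and target power below are placeholder values, not recommendations), a prospective sample-size calculation for a two-sample t-test using statsmodels:

    from statsmodels.stats.power import TTestIndPower

    difference = 20.0  # smallest difference between the means we care about
    sd = 25.0          # anticipated standard deviation (e.g. from a pilot study)
    effect_size = difference / sd  # standardised effect size (Cohen's d)

    # Leaving the sample size unspecified asks solve_power to find it
    n_per_group = TTestIndPower().solve_power(effect_size=effect_size,
                                              alpha=0.05,
                                              power=0.8,
                                              alternative="two-sided")
    print(n_per_group)  # required sample size per group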

  31. Retrospective Power Analysis (after the test has been done on collected data and you obtain a non-significant result): controversial! • Finding the power of the test that you have performed to detect “an effect size”. • Usually need to specify: • Alpha level • Variance (from data) • Sample size • Effect size

  32. Retrospective Power Analysis • You could calculate power based on the effect size you observe in your data: not recommended. Power calculated in this way is related to the p-value of the test, and both depend on the observed effect size: • A non-significant test tends to have low power; • A significant test tends to have high power.

  33. Retrospective Power Analysis • Calculate power based on an effect size you care about: less controversial. For example, say we get a non-significant test; we can work out the power that the test has to detect an effect size we care about. If the test has low power to detect this effect size, you can do something about it (e.g. collect more data) to increase the power, then continue to evaluate the same problem; if the test has high power to detect this effect size, you may conclude that there is no meaningful difference (effect) and refrain from collecting additional data. It is suggested that you also report a 95% confidence interval for the power (as the variance is estimated from sample data). • Which effect size should I choose? Look at a range of effect sizes. • Can also use ‘reverse power analysis’: determine the effect size detectable with a certain power. The question could be ‘what effect size am I able to detect with my data at power 0.8?’

  34. Retrospective Power Analysis • Calculate confidence intervals about the effect size calculated from your data – recommended. For example, if dealing with the difference between two means, we can be 95% confident that the true difference between the means (in the population) lies within this interval. • We ask ourselves: does the ‘difference we care about’ lie in this interval? If it does not, we can be confident that there is no important difference between the two groups (in the population). • Confidence intervals ‘quantify our uncertainty’.
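
For illustration, such a confidence interval for the difference between two means can be computed from the pooled standard deviation; a Python sketch on synthetic data:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    group_1 = rng.normal(50, 10, size=12)
    group_2 = rng.normal(55, 10, size=12)
    n1, n2 = len(group_1), len(group_2)

    diff = group_1.mean() - group_2.mean()
    pooled_var = ((n1 - 1) * group_1.var(ddof=1)
                  + (n2 - 1) * group_2.var(ddof=1)) / (n1 + n2 - 2)
    se = np.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t_crit = stats.t.ppf(0.975, df=n1 + n2 - 2)

    ci = (diff - t_crit * se, diff + t_crit * se)
    print(ci)  # does the 'difference we care about' lie inside this interval?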

  35. Power of Dr Money’s Test (retrospective) • Dr Money’s test is non-significant. • Can we rely on the results of this analysis, or has he used too few patients? • Dr Money wants the test to be able to detect a difference of 20 between the average acid reflux of the two groups. • If the test’s power to detect this difference is low, he may want to modify the experiment by sampling more patients to increase the power and re-evaluate the formulations. • If the test’s power is high, he may conclude that the placebo and drug are not different, and refrain from collecting additional data.

  36. Power of Dr Money’s Test The test has high power (0.90) to detect the difference Dr Money cares about, so he may conclude that there is no meaningful difference between the placebo and the drug.

  37. Let’s see how I did a power calculation!

  38. Minitab

  39. Minitab: Pooled standard deviation. As this is a “retrospective” power analysis, this can be estimated from our experiment (if this had been a prospective study, an estimate could be obtained from related research, pilot studies or subject-matter knowledge).

  40. Power Calculations. Another Package: “SAS”
      proc power;
        twosamplemeans
          meandiff     = [value]
          stddev       = [value]
          groupweights = [value] | [value]
          ntotal       = [value]
          power        = . ;
      run;

  41. Power Calculations. Another Package: “SAS” [Screenshot: type the code into the SAS editor, then choose Run > Submit.]

  42. Power Calculations. Another Package: “SAS”

  43. Power Calculations. Another Package: “SAS”
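
For readers without Minitab or SAS, roughly the same retrospective power calculation can be sketched in Python with statsmodels. The difference of 20 and n = 12 per group follow Dr Money’s example, but the pooled SD below is a placeholder, not the value estimated from his data:

    from statsmodels.stats.power import TTestIndPower

    difference = 20.0  # the difference Dr Money cares about
    sd_pooled = 15.0   # pooled SD estimated from the experiment (placeholder value)
    effect_size = difference / sd_pooled

    power = TTestIndPower().power(effect_size=effect_size, nobs1=12,
                                  alpha=0.05, ratio=1.0,
                                  alternative="two-sided")
    print(power)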

  44. Retrospective Power Analysis • References • Hoenig, J.M. and Heisey, D.M. (2001). The abuse of power: The pervasive fallacy of power calculations for data analysis. The American Statistician, 55, 19–24. • Thomas, L. (1997). Retrospective power analysis. Conservation Biology, 11, 276–280. • Lenth, R.V. (2001). Some practical guidelines for effective sample size determination. The American Statistician, 55(3), 187–193.

  45. More than Two Independent Samples: Undeterred by the last result, Dr Owen Money now spends much time and energy creating 3 new drugs he hopes will combat acid reflux. Placebo, Drug A, Drug B, Drug C. • Each group is made up of different subjects. • He needs to employ randomisation to reduce the risk of any (unknown) variation influencing the experiment. • Acid reflux level is recorded for each subject after ingesting their assigned tablet and after eating a heavy meal.

  46. More than Two Independent Samples Test: one-way ANOVA: An Example P-value < 0.05. There is evidence that there is a difference between the groups as regards acid reflux.

  47. More than Two Independent Samples Test: one-way ANOVA. Which pairs of groups are different? Hurrah! • Placebo and drug A • Placebo and drug B • Drug B and drug C

  48. More than Two Independent Samples Test • Parametric test to examine if there is a difference between the groups (continuous data) • one-way ANOVA (also called a ‘completely randomised’ experiment).
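
As an illustration only (the group data below are synthetic stand-ins, not Dr Money’s results), a one-way ANOVA followed by Tukey pairwise comparisons in Python:

    import numpy as np
    from scipy import stats
    from statsmodels.stats.multicomp import pairwise_tukeyhsd

    rng = np.random.default_rng(0)
    placebo = rng.normal(60, 10, 12)
    drug_a = rng.normal(48, 10, 12)
    drug_b = rng.normal(45, 10, 12)
    drug_c = rng.normal(58, 10, 12)

    # Overall (omnibus) test: is there any difference between the group means?
    f_stat, p_value = stats.f_oneway(placebo, drug_a, drug_b, drug_c)
    print(f_stat, p_value)

    # Post-hoc: which pairs of groups differ?
    values = np.concatenate([placebo, drug_a, drug_b, drug_c])
    labels = np.repeat(["placebo", "drug_a", "drug_b", "drug_c"], 12)
    print(pairwise_tukeyhsd(values, labels))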

  49. More than Two Independent Samples Test • Non-Parametric test to examine if there is a difference between the groups (ordinal or continuous data): • Kruskal-Wallis test
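
Again for illustration on synthetic groups, the Kruskal-Wallis test in Python:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    groups = [rng.normal(m, 10, 12) for m in (60, 48, 45, 58)]  # synthetic groups

    h_stat, p_value = stats.kruskal(*groups)
    print(h_stat, p_value)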

  50. Adding a 2nd Factor: Two-way ANOVA • In a two-way ANOVA we have 2 factors. Experiments such as this, with two or more crossed factors, are called factorial experiments. • In this example we have n = 10 replicates per treatment combination, i.e. 10 different people per treatment combination. • The subjects (units) are considered homogeneous, and these units are randomly assigned to the 6 experimental conditions (combinations). • Driving performance is measured for each subject. • Here the 2 factors are ‘alertness’ and ‘drug’ type; by testing, we can establish if there are differences between (i) levels of alertness and (ii) levels of drug, and (iii) whether there is an alertness × drug interaction.
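
As a sketch of this kind of factorial analysis (factor levels and data are made up; this is not the slides’ worked example), a two-way ANOVA with an interaction term using statsmodels:

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf
    from statsmodels.stats.anova import anova_lm

    rng = np.random.default_rng(0)
    df = pd.DataFrame({
        "alertness": np.repeat(["low", "high"], 30),         # 2 levels
        "drug": np.tile(np.repeat(["A", "B", "C"], 10), 2),  # 3 levels, 10 replicates
        "performance": rng.normal(50, 5, 60),                # synthetic outcome
    })

    # Main effects of alertness and drug plus the alertness x drug interaction
    model = smf.ols("performance ~ C(alertness) * C(drug)", data=df).fit()
    print(anova_lm(model, typ=2))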
