
Can I Believe It? Understanding Statistics in Published Literature

Keira Robinson – MOH Biostatistics Trainee; David Schmidt – HETI Rural and Remote Portfolio. Agenda: welcome, understanding the context, data types, presenting data, common tests, tricks and hints, practice, wrap up.


Presentation Transcript


  1. Can I Believe It? Understanding Statistics in Published Literature • Keira Robinson – MOH Biostatistics Trainee • David Schmidt – HETI Rural and Remote Portfolio

  2. Agenda • Welcome • Understanding the context • Data types • Presenting data • Common tests • Tricks and hints • Practice • Wrap up

  3. Understanding statistics • Never consider statistics in isolation • Consider the rest of the article • Who was studied • What was measured • Why was that measure used • Where was the study completed • When was it done • It is the author’s role to convince you that their results can be believed!

  4. Types of Data

  5. Examples of data – Table 1, Diamond et al. 2006

  6. Types of data • Numeric • Continuous (height, cholesterol) • Discrete (number of floors in a building) • Categorical • Binary (yes/no, e.g. born in Australia?) • Nominal categorical (cancer type) • Ordinal categorical (cancer stage)

  7. Histograms • Represent continuous variables • Areas of the bars represent the frequency (count) or percent • Indicate the distribution of the data

  8. Measures of association

  9. Stem and leaf plot – heights
     6* 11
     6* 2
     6* 3333333
     6* 44444444444
     6* 555555555555
     6* 66666666666666666666666
     6* 777777777777777777777777777777
     6* 8888888888888888
     6* 99999999999999999999999999999999
     7* 0000000000000000000000000
     7* 1111111111111111111
     7* 222222222222
     7* 333333
     7* 44
     7* 55

  10. Skewed Data

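  11. Salient features – the mean • The average value: the sum of the observations divided by the number of observations, $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$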

  12. Salient features – the median • The observation in the middle • Example: newborn birth weights • 3100, 3100, 3200, 3300, 3400, 3500, 3600, 3650 g • Median = (3300 + 3400)/2 = 3350 g • Not affected by extreme values • Uses only the ordering of the data, so some information is discarded
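The median calculation on this slide can be checked with a short sketch using Python's standard `statistics` module. The birth weights are the ones from the slide; the mis-recorded value of 36500 g is a hypothetical data-entry error added only to illustrate "not affected by extreme values".

```python
from statistics import mean, median

# Newborn birth weights in grams, taken from the slide
weights = [3100, 3100, 3200, 3300, 3400, 3500, 3600, 3650]

# Even number of observations: the median is the mean of the two middle values
mid = median(weights)  # (3300 + 3400) / 2

# Hypothetical data-entry error: the largest value recorded as 36500 g
with_outlier = weights[:-1] + [36500]

print(mid, median(with_outlier))          # the median is unchanged by the extreme value
print(mean(weights), mean(with_outlier))  # the mean shifts substantially
```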

  13. Salient features- the mean and median

  14. Mean and Median • The mean is preferable when it represents the data well • Symmetric distributions: mean ≈ median, so present the mean • Skewed distributions: the mean is pulled toward the ‘tail’, so present the median

  15. Mean and Median

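  16. Variability – Standard deviation and variance • Roughly the average distance between the observations and the mean • Standard deviation: $s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}$ • In the original units, e.g. 0.3% • Variance: $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$ • In the original units squared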
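As a rough illustration of these two quantities, the sketch below computes the variance and standard deviation by hand and compares them with Python's `statistics` module. It assumes the usual sample definition (dividing by n − 1), which the slide does not state explicitly, and the measurements are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev, variance

# Hypothetical measurements (any numeric sample would do)
values = [4.8, 5.1, 5.3, 4.9, 5.4]

x_bar = mean(values)
n = len(values)

# Sample variance: sum of squared deviations from the mean, divided by n - 1
var_manual = sum((x - x_bar) ** 2 for x in values) / (n - 1)
sd_manual = sqrt(var_manual)  # back in the original units

# The standard library functions use the same n - 1 definition
print(round(var_manual, 4), round(variance(values), 4))
print(round(sd_manual, 4), round(stdev(values), 4))
```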

  17. Range • Example: infant birth weights • 3100, 3100, 3200, 3300, 3400, 3500, 3600, 3650, 3800 g • Range = 3100 to 3800 grams, or 700 grams • Interquartile range: the range between the first and third quartiles (Q1 and Q3) • 3100, 3100, 3200, 3300, 3400, 3500, 3600, 3650, 3800 g • IQR = 3200 to 3600 grams, or 400 grams
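The range and interquartile range from this slide can be reproduced with the sketch below. The quartile convention is an assumption: `method="inclusive"` is used here because it matches the slide's values, and other conventions can give slightly different quartiles for small samples.

```python
from statistics import quantiles

# Infant birth weights in grams, taken from the slide
weights = [3100, 3100, 3200, 3300, 3400, 3500, 3600, 3650, 3800]

data_range = max(weights) - min(weights)

# Quartiles split the sorted data into four parts; the "inclusive" method
# interpolates between the minimum and maximum of the observed data
q1, q2, q3 = quantiles(weights, n=4, method="inclusive")
iqr = q3 - q1

print(data_range)   # 700
print(q1, q3, iqr)  # 3200.0 3600.0 400.0
```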

  18. Presenting variability • Present standard deviation if the mean is used • Present Interquartile range if the median is used

  19. Graphics for Continuous Variables • Boxplot (figure labels): outlier • Maximum (end of upper whisker) • 75th percentile (Q3) • Median • 25th percentile (Q1) • Minimum (end of lower whisker) • The box spans the IQR

  20. Categorical Variables- table summaries

  21. Bar charts • Relative frequency for a categorical or discrete variable

  22. Bar chart vs Histogram • Histogram • For continuous variables • The area represents the frequency • Bars join together • Bar chart • For categorical variables • The height represents the frequency • The bars don’t join together

  23. Pie chart • Areas of “slices” represent the frequency

  24. Precision

  25. Presenting statistics • Tables should need no further explanation • Means • No more than one decimal place more than the original data • Standard deviations may need an extra decimal place • Percentages • Not more than one decimal place (sometimes no decimal place) • Sample size <100, decimal places are not necessary • If sample size <20, may need to report actual numbers

  26. Statistical Inference

  27. Sampling and inference (diagram: a sample is drawn from the population; inference runs from the sample back to the population)

  28. Sampling, cont’d • A sample statistic is used as an estimate of the population parameter • Example: average parity, where the sample mean estimates the population mean

  29. Confidence intervals • A range of values within which we are confident the true mean lies • 95% confidence interval: we are 95% confident that the true mean lies within the range • If a study were repeated many times, the 95% confidence intervals would contain the true mean 95% of the time • How does the confidence interval change as the sample size increases?
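One way to explore the closing question on this slide is the sketch below. It assumes the large-sample normal formula (mean ± 1.96 standard errors) rather than a t-based interval, and the summary statistics (mean 3500 g, SD 500 g) and the helper name `mean_ci` are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def mean_ci(sample_mean, sd, n, level=0.95):
    """Approximate confidence interval for a mean (large-sample normal formula)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    margin = z * sd / sqrt(n)                  # z times the standard error
    return sample_mean - margin, sample_mean + margin

# Hypothetical summary statistics: mean 3500 g, SD 500 g
for n in (25, 100, 400):
    low, high = mean_ci(3500, 500, n)
    print(n, round(low), round(high), round(high - low))
```

Under these assumptions the interval narrows as the sample size grows: quadrupling the sample size halves the width.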

  30. Confidence intervals cont’d

  31. Hypothesis testing • Is our sample of babies consistent with the Australian population, which has a known mean birth weight of 3500 grams? • Sample mean = 3800 grams, 95% CI of 3650 to 3950 grams • 3500 lies outside this confidence interval, indicating that the mean of our sample is higher than the Australian population mean

  32. Hypothesis testing • State a null hypothesis: • There is no difference between the mean of the population our sample came from and the known mean: H0: μ = 3500 • Calculate a test statistic from the data: t = 2.65 • Report the p-value: p = 0.012
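The test statistic step can be sketched as follows. The function name `one_sample_t` and the eight birth weights are hypothetical (chosen so that the sample mean is 3800 g); they are not the presenters' data, so the resulting value is not expected to match the slide's t = 2.65.

```python
from math import sqrt
from statistics import mean, stdev

def one_sample_t(sample, mu0):
    """t statistic for the null hypothesis that the population mean equals mu0."""
    n = len(sample)
    standard_error = stdev(sample) / sqrt(n)
    return (mean(sample) - mu0) / standard_error

# Hypothetical birth weights in grams (not the presenters' data)
sample = [3900, 3650, 4100, 3700, 3950, 3550, 3800, 3750]
t = one_sample_t(sample, mu0=3500)
print(round(t, 2), "on", len(sample) - 1, "degrees of freedom")
```

The p-value would then be read from a t distribution with n − 1 degrees of freedom, which statistical software does automatically.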

  33. What is a p-value? • The probability of obtaining data at least as extreme as those observed, e.g. a mean weight of 3800 grams or greater, if the null hypothesis is true • The smaller the p-value, the stronger the evidence against the null hypothesis • < 0.05 – evidence to reject the null hypothesis (statistically significant difference) • > 0.05 – insufficient evidence to reject the null hypothesis (not statistically significant); this is not evidence that the null hypothesis is true
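To make the definition concrete, the sketch below converts a standardised test statistic into a two-sided p-value. It assumes a normal approximation, which is reasonable only for large samples; a t distribution gives somewhat larger p-values for small samples. The two-sided form and the function name `two_sided_p` are assumptions for illustration.

```python
from statistics import NormalDist

def two_sided_p(z):
    """Two-sided p-value for a standardised test statistic (normal approximation)."""
    # Probability of a statistic at least as far from zero as |z| when H0 is true
    return 2 * (1 - NormalDist().cdf(abs(z)))

for z in (0.5, 1.96, 3.0):
    print(z, round(two_sided_p(z), 4))
```

Larger test statistics give smaller p-values, and a statistic of about 1.96 corresponds to the conventional 0.05 threshold.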

  34. Summary – Confidence intervals and p values • P-value: indicates the strength of evidence against the null hypothesis (statistical significance) • Confidence interval: the range of values within which we are 95% confident the true value lies • Recommended to present confidence intervals where possible

  35. Analysing Continuous Outcomes

  36. T tests • What are they used for? • Comparing means • Provide an estimate of the difference in means between the two groups and the 95% confidence interval of this difference • P-value – a measure of the evidence against the null hypothesis of no difference between the two groups

  37. T tests- paired vs independent • Paired: • Outcome is measured on the same individual • Eg: before and after, cross-over trial • Pairs may be two different individuals who are matched on factors like age, sex etc.

  38. Paired T-tests • Calculate the difference for each of the pairs • The mean weight at baseline was 93 kg and the mean weight at 3 months was 88 kg • On average, weight at 3 months was 5 kg lower than at baseline (95% CI: -3 to 12 kg)

  39. Paired T-tests • There was no evidence of a change in weight after 3 months (p value = 0.19) • Assumptions • The differences follow a bell-shaped curve with no outliers • Assess the shape by graphing the differences • Use a histogram or stem and leaf plot
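The paired procedure (one difference per pair, then a test on those differences) can be sketched as below. The weights are hypothetical, and only the mean difference and t statistic are computed; the p-value and confidence interval would come from a t distribution with n − 1 degrees of freedom.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical weights (kg) for the same six people at baseline and at 3 months
baseline = [95, 102, 88, 91, 97, 85]
month_3 = [90, 99, 89, 84, 92, 86]

# Step 1: one difference per pair
diffs = [b - m for b, m in zip(baseline, month_3)]

# Step 2: a one-sample t statistic on the differences (H0: mean difference is 0)
mean_diff = mean(diffs)
t = mean_diff / (stdev(diffs) / sqrt(len(diffs)))

print(diffs)
print(round(mean_diff, 2), round(t, 2))
```

The list `diffs` is also what would be graphed to check the bell-shape assumption.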

  40. Independent T tests • Two groups that are unrelated • Eg: weights of different groups of people

  41. Independent samples t-tests • Same assumptions as for paired t-tests, applied to the observations in each group, plus the assumptions that the two groups are independent and have equal variance

  42. Interpretation – independent t tests • The mean weight in NW Public was 62 kg and the mean weight in SW Public was 61 kg • The mean difference in weight between the two schools was 1 kg (95% CI: -22 to 24 kg) • There was no evidence of a difference in weight between the two schools (p=0.92)
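A minimal sketch of the equal-variance (pooled) t statistic for two independent groups follows. The function name `pooled_t` and the student weights are hypothetical (chosen to give group means of 62 kg and 61 kg); they are not the data behind the slide.

```python
from math import sqrt
from statistics import mean, variance

def pooled_t(group_a, group_b):
    """Difference in means and equal-variance (pooled) t statistic."""
    n_a, n_b = len(group_a), len(group_b)
    # Pooled variance: the two sample variances weighted by their degrees of freedom
    pooled_var = ((n_a - 1) * variance(group_a)
                  + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    standard_error = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    diff = mean(group_a) - mean(group_b)
    return diff, diff / standard_error

# Hypothetical student weights (kg) from two schools
school_a = [58, 64, 61, 70, 57]
school_b = [60, 55, 66, 59, 65]
diff, t = pooled_t(school_a, school_b)
print(diff, round(t, 2))
```

A small t statistic relative to the spread within the groups corresponds to a large p-value, as in the slide's interpretation.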

  43. One-way Analysis of Variance (ANOVA) • What happens when there are more than two groups to compare? • Null hypothesis: the means of all groups are equal • No single difference in means summarises more than two groups, so the variance between the groups is analysed instead • Can measure variance within a group as well as variance between groups
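The comparison of between-group and within-group variance can be sketched as an F statistic. The function name `one_way_f` and the three groups of weights are hypothetical; turning the F statistic into a p-value requires an F distribution, which statistical software provides.

```python
from statistics import mean

def one_way_f(groups):
    """F statistic for one-way ANOVA: between-group over within-group variance."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)

    # Between groups: how far each group mean sits from the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within groups: how far observations sit from their own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within

# Hypothetical weights (kg) from three schools
schools = [[58, 64, 61], [60, 55, 66], [70, 68, 73]]
print(round(one_way_f(schools), 2))
```

An F statistic well above 1 suggests the group means differ by more than the within-group variation would explain.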

  44. One-way ANOVA • Comparing multiple groups

  45. Interpretations – One-way ANOVA • If p<0.05: there was evidence of a difference in average student weight between the four schools • If p>0.05: there was no evidence of a difference in average student weight between the four schools • Not advised to compare all means against each other, because the more tests that are done, the greater the chance of finding at least one significant result by chance
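The warning about comparing all means against each other can be quantified with a short sketch. It assumes the tests are independent and each uses a 0.05 threshold, which is a simplification because pairwise comparisons on the same data are correlated; the function name `familywise_error` is hypothetical.

```python
from math import comb

def familywise_error(n_tests, alpha=0.05):
    """Chance of at least one false positive across independent tests."""
    return 1 - (1 - alpha) ** n_tests

n_schools = 4
n_pairs = comb(n_schools, 2)  # every school compared against every other school

print(n_pairs, round(familywise_error(n_pairs), 3))
```

With four schools there are six pairwise comparisons, and under these assumptions the chance of at least one spurious significant result rises from 5% to roughly 26%.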

  46. Assumptions – ANOVA • Normality – observations in all groups are normally distributed • Equal variance – the variance is the same in all groups • Independence – all groups are independent of each other

  47. Extensions of one-way ANOVA • Two-way ANOVA: • Multiple factors to be considered, e.g. school and type of school (public/private) • ANCOVA – Analysis of Covariance • Tests group differences while adjusting for continuous variables (e.g. age) and categorical variables

  48. Linear Regression • Measures the association between two continuous variables (weight and height) • Or between one continuous outcome variable and several continuous variables (multiple linear regression) • What is the relationship between height and weight?

  49. Scatter plot of weight and height • Correlation between height and weight = 0.75
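A minimal sketch of simple linear regression and Pearson's correlation for two continuous variables follows. The function name `fit_line` and the heights and weights are hypothetical, so the correlation printed here is not expected to equal the 0.75 reported on the slide.

```python
from math import sqrt
from statistics import mean

def fit_line(x, y):
    """Least-squares slope and intercept, plus Pearson's correlation r."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    slope = sxy / sxx                  # change in y per one-unit change in x
    intercept = y_bar - slope * x_bar  # predicted y when x is 0
    r = sxy / sqrt(sxx * syy)          # strength of the linear association, -1 to 1
    return slope, intercept, r

# Hypothetical heights (cm) and weights (kg)
heights = [150, 160, 165, 170, 180, 185]
weights = [52, 58, 63, 61, 75, 80]
slope, intercept, r = fit_line(heights, weights)
print(round(slope, 2), round(intercept, 1), round(r, 2))
```

The slope answers "how much heavier, on average, per extra centimetre of height", while r summarises how tightly the points cluster around the fitted line.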
