Can I Believe It? Understanding Statistics in Published Literature. Keira Robinson – MOH Biostatistics Trainee David Schmidt – HETI Rural and Remote Portfolio. Agenda. Welcome Understanding the context Data types Presenting data Common tests Tricks and hints Practice Wrap up.
Can I Believe It? Understanding Statistics in Published Literature • Keira Robinson – MOH Biostatistics Trainee • David Schmidt – HETI Rural and Remote Portfolio
Agenda • Welcome • Understanding the context • Data types • Presenting data • Common tests • Tricks and hints • Practice • Wrap up
Understanding statistics • Never consider statistics in isolation • Consider the rest of the article • Who was studied • What was measured • Why was that measure used • Where was the study completed • When was it done • It is the author’s role to convince you that their results can be believed!
Types of data • Numeric • Continuous (height, cholesterol) • Discrete (number of floors in a building) • Categorical • Binary (yes/no, e.g. born in Australia?) • Nominal (unordered categories, e.g. cancer type) • Ordinal (ordered categories, e.g. cancer stage)
Histograms • Represent continuous variables • Areas of the bars represent the frequency (count) or percent • Indicate the distribution of the data
Stem and leaf plot – heights
6* 11
6* 2
6* 3333333
6* 44444444444
6* 555555555555
6* 66666666666666666666666
6* 777777777777777777777777777777
6* 8888888888888888
6* 99999999999999999999999999999999
7* 0000000000000000000000000
7* 1111111111111111111
7* 222222222222
7* 333333
7* 44
7* 55
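Salient features – the mean • The average value: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$, the sum of the observations divided by the number of observations $n$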
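Salient features – the median • The middle observation when the data are sorted (the average of the two middle values when the number of observations is even) • Example: newborn birth weights • 3100, 3100, 3200, 3300, 3400, 3500, 3600, 3650 g • Median = (3300 + 3400)/2 = 3350 g • Not affected by extreme values • Uses only the order of the observations, so some information is lost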
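The claim that the median is not affected by extreme values can be checked with a minimal sketch. It uses the birth weights from the slide; the variant with a 5650 g baby is a hypothetical value added only for illustration.

```python
import statistics

# Birth weights (g) from the slide's example.
weights = [3100, 3100, 3200, 3300, 3400, 3500, 3600, 3650]

# Hypothetical variant: the heaviest baby is replaced by an extreme value.
with_outlier = weights[:-1] + [5650]

median_original = statistics.median(weights)
median_outlier = statistics.median(with_outlier)
mean_original = statistics.mean(weights)
mean_outlier = statistics.mean(with_outlier)

print(median_original, median_outlier)  # the median does not move
print(mean_original, mean_outlier)      # the mean is pulled upward
```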
Mean and Median • The mean is preferable where it is appropriate, because it uses every observation • Symmetric distributions • Mean ≈ median • Present the mean • Skewed distributions • Mean is pulled toward the ‘tail’ • Present the median
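Variability – Standard deviation and variance • Standard deviation: roughly the average distance between the observations and the mean • $s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}$, where $\bar{x}$ is the mean • Reported in the original units, e.g. 0.3% • Variance = $s^2$ • Reported in the original units squared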
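As a rough illustration, the sketch below computes both quantities step by step for the birth weights used in the median example. It assumes the usual sample formulas with an n − 1 denominator.

```python
import math
import statistics

weights = [3100, 3100, 3200, 3300, 3400, 3500, 3600, 3650]  # grams

n = len(weights)
mean = sum(weights) / n

# Sample variance: sum of squared deviations divided by n - 1 (units: grams squared).
variance = sum((x - mean) ** 2 for x in weights) / (n - 1)

# Standard deviation: square root of the variance (units: grams).
sd = math.sqrt(variance)

print(round(variance, 1), round(sd, 1))
```

The standard library functions `statistics.variance` and `statistics.stdev` use the same n − 1 convention and should agree with the hand calculation.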
Range • Example: infant birth weight • 3100, 3100, 3200, 3300, 3400, 3500, 3600, 3650, 3800 g • Range = 3100 to 3800 g, a spread of 700 g • Interquartile range (IQR): the range between the first and third quartiles (Q1 and Q3) • 3100, 3100, 3200, 3300, 3400, 3500, 3600, 3650, 3800 g • IQR = 3200 to 3600 g, a spread of 400 g
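The range and interquartile range for these nine birth weights can be reproduced with a short sketch. Quartiles have several competing definitions; the `method="inclusive"` setting used here is an assumption, chosen because it matches the slide's values.

```python
import statistics

weights = [3100, 3100, 3200, 3300, 3400, 3500, 3600, 3650, 3800]  # grams

data_range = max(weights) - min(weights)

# method="inclusive" interpolates between the observed values; other quartile
# conventions exist and can give slightly different answers.
q1, median, q3 = statistics.quantiles(weights, n=4, method="inclusive")
iqr = q3 - q1

print(data_range, q1, q3, iqr)
```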
Presenting variability • Present standard deviation if the mean is used • Present Interquartile range if the median is used
Graphics for Continuous Variables • Boxplot • [Figure: boxplot labelled with outlier, maximum, 75th percentile (Q3), median, 25th percentile (Q1), minimum, and IQR]
Bar charts • Relative frequency for a categorical or discrete variable
Bar chart vs Histogram • Histogram • For continuous variables • The area represents the frequency • Bars join together • Bar chart • For categorical variables • The height represents the frequency • The bars don’t join together
Pie chart • Areas of “slices” represent the frequency
Presenting statistics • Tables should need no further explanation • Means • No more than one decimal place more than the original data • Standard deviations may need an extra decimal place • Percentages • No more than one decimal place (sometimes none) • If sample size < 100, decimal places are not necessary • If sample size < 20, may need to report actual numbers
Sampling • [Diagram labelled “Sampling” and “Inference”]
Sampling, cont’d • A statistic calculated from the sample is used as an estimate of the population parameter • Example: average parity • The sample mean is used to estimate the population mean
Confidence intervals • A range of values within which we are confident the true mean lies • 95% confidence interval: we are 95% confident that the true mean lies within the range • If a study were repeated many times, the 95% confidence interval would contain the true mean in about 95% of the repetitions • How does the confidence interval change as the sample size increases?
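One way to explore the sample size question is a minimal sketch like the one below. The function name `ci_for_mean`, the mean of 3800 g and the standard deviation of 400 g are all hypothetical, and the interval uses a normal approximation rather than the t distribution that a small sample would call for.

```python
import math
from statistics import NormalDist

def ci_for_mean(sample_mean, sd, n, level=0.95):
    """Approximate confidence interval for a mean (normal approximation)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)   # about 1.96 for 95%
    half_width = z * sd / math.sqrt(n)          # z times the standard error
    return sample_mean - half_width, sample_mean + half_width

# Hypothetical values: mean 3800 g, standard deviation 400 g.
for n in (25, 100, 400):
    low, high = ci_for_mean(3800, 400, n)
    print(n, round(low), round(high))
```

Because the standard error divides by the square root of n, quadrupling the sample size halves the width of the interval.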
Hypothesis testing • Is our sample of babies consistent with the Australian population, which has a known mean birth weight of 3500 grams? • Sample mean = 3800 grams, 95% CI of 3650 to 3950 grams • The population mean of 3500 grams lies outside this confidence interval, indicating that our sample mean is higher than the Australian population mean
Hypothesis testing • State a null hypothesis • There is no difference between the mean of the population we sampled from and the known population mean: H0: μ = 3500 grams • Calculate a test statistic from the data: t = 2.65 • Report the p-value: p = 0.012
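The test statistic step can be sketched as follows. The three birth weights are hypothetical and are not the presenters' data, so the result will not match the t = 2.65 on the slide; `one_sample_t` is an illustrative helper, not a library function.

```python
import math
import statistics

def one_sample_t(sample, null_mean):
    """t statistic for testing whether the population mean equals null_mean."""
    n = len(sample)
    standard_error = statistics.stdev(sample) / math.sqrt(n)
    return (statistics.mean(sample) - null_mean) / standard_error

# Hypothetical sample of three birth weights (g); not the presenters' data.
sample = [3600, 3800, 4000]
t = one_sample_t(sample, null_mean=3500)
print(round(t, 2))
```

The p-value is then read from a t distribution with n − 1 degrees of freedom, which statistical software does automatically.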
What is a p-value? • The probability of obtaining data at least as extreme as those observed, i.e. a mean weight of 3800 grams or greater, if the null hypothesis is true • The smaller the p-value, the stronger the evidence against the null hypothesis • < 0.05 – evidence to reject the null hypothesis (statistically significant difference) • > 0.05 – insufficient evidence to reject the null hypothesis (not statistically significant); this does not show that the null hypothesis is true
Summary – Confidence intervals and p-values • P-value: indicates statistical significance • Confidence interval: range of values within which we are 95% confident the true value lies • Recommended to present confidence intervals where possible
T tests • What are they used for? • Analyse means • Provide estimate of the difference in means between the two groups and the 95% confidence interval of this difference • P-value – a measure of the evidence against the null hypothesis of no difference between the two groups
T tests – paired vs independent • Paired: • Outcome is measured twice on the same individual • E.g. before and after, cross-over trial • Pairs may also be two different individuals who are matched on factors like age, sex etc.
Paired T-tests • Calculate the difference for each of the pairs • The mean weight at baseline was 93 kg and the mean weight at 3 months was 88 kg • On average, weight at 3 months was 5 kg lower than at baseline (95% CI -3 to 12)
Paired T-tests • There was no evidence of a change in weight after 3 months (p-value = 0.19) • Assumptions • The differences follow a bell-shaped curve with no outliers • Assess the shape by graphing the differences • Use a histogram or stem and leaf plot
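The paired calculation reduces to a one-sample test on the within-pair differences. A minimal sketch, using hypothetical weights for five people rather than the data behind the slides:

```python
import math
import statistics

# Hypothetical weights (kg) for five people at baseline and at 3 months.
baseline = [95, 90, 102, 88, 91]
month_3 = [90, 91, 95, 85, 89]

# Step 1: one difference per pair (baseline minus follow-up).
differences = [b - m for b, m in zip(baseline, month_3)]

# Step 2: a one-sample t statistic on the differences, tested against zero.
mean_diff = statistics.mean(differences)
standard_error = statistics.stdev(differences) / math.sqrt(len(differences))
t = mean_diff / standard_error

print(differences, mean_diff, round(t, 2))
```

Graphing `differences`, rather than the raw weights, is what the normality check applies to.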
Independent T tests • Two groups that are unrelated • Eg: weights of different groups of people
Independent samples t-tests • Same assumptions as for paired t-tests (bell-shaped data with no outliers, here within each group), plus independence of the two groups and equal variance in the two groups
Interpretation – independent t tests • The mean weight in NW Public was 62 kg and the mean weight in SW Public was 61 kg • The mean difference in weight between the two schools was 1 kg (95% CI -22 to 24) • There was no evidence of a difference in weight between the two schools (p = 0.92)
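A sketch of the equal-variance (pooled) version of the independent t statistic is shown below. The school weights are hypothetical values chosen to give means of 62 kg and 61 kg, and `pooled_t` is an illustrative helper, not a library function.

```python
import math
import statistics

def pooled_t(group_a, group_b):
    """Two-sample t statistic assuming equal variances in both groups."""
    n_a, n_b = len(group_a), len(group_b)
    # Pooled variance: the two sample variances weighted by their degrees of freedom.
    pooled_var = (
        (n_a - 1) * statistics.variance(group_a)
        + (n_b - 1) * statistics.variance(group_b)
    ) / (n_a + n_b - 2)
    standard_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    difference = statistics.mean(group_a) - statistics.mean(group_b)
    return difference, difference / standard_error

# Hypothetical weights (kg) for students at two schools.
school_a = [58, 62, 66]
school_b = [57, 61, 65]
difference, t = pooled_t(school_a, school_b)
print(difference, round(t, 2))
```

A small t statistic like this one corresponds to a large p-value, which is the "no evidence of a difference" situation.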
One-way Analysis of Variance (ANOVA) • What happens when there are more than two groups to compare? • Null hypothesis: the means of all groups are equal • A single difference in means cannot summarise more than two groups, so the variance between the groups is analysed instead • The variance between groups is compared with the variance within groups
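The comparison of between-group and within-group variance can be sketched as an F statistic. The three groups of weights are hypothetical and `one_way_anova_f` is an illustrative helper, not a library function.

```python
import statistics

def one_way_anova_f(groups):
    """F statistic: between-group mean square divided by within-group mean square."""
    all_values = [x for g in groups for x in g]
    grand_mean = statistics.mean(all_values)
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - statistics.mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

# Hypothetical weights (kg) for three groups of students.
groups = [[60, 62, 64], [61, 63, 65], [70, 72, 74]]
f_stat = one_way_anova_f(groups)
print(round(f_stat, 2))
```

A large F means the group means differ by more than the spread within groups would explain; the p-value comes from an F distribution with the two degrees of freedom computed in the function.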
One-way ANOVA • Comparing multiple groups
Interpretations – One-way ANOVA • p < 0.05: there was evidence of a difference in average student weight between the four schools • p > 0.05: there was no evidence of a difference in average student weight between the four schools • Comparing every mean against every other is not advised, because the more tests are done, the greater the chance that at least one result is significant by chance alone
Assumptions ANOVA • Normality – observations in all groups are normally distributed • Equal variance – the variance is the same in all groups • Independence – all groups are independent of each other
Extensions of one-way ANOVA • Two-way ANOVA • Multiple factors are considered, e.g. school and type of school (public/private) • ANCOVA – Analysis of Covariance • Tests group differences while adjusting for continuous variables (e.g. age) and categorical variables
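The warning about multiple comparisons can be made concrete with a rough calculation. The sketch assumes independent tests at a 0.05 threshold, which pairwise comparisons of the same groups are not, so the figure is only a guide; `chance_of_false_positive` is an illustrative helper.

```python
import math

def chance_of_false_positive(n_tests, alpha=0.05):
    """Chance of at least one p < alpha when every null hypothesis is true.

    Assumes the tests are independent, which pairwise comparisons of the
    same groups are not, so treat the result as a rough guide.
    """
    return 1 - (1 - alpha) ** n_tests

n_schools = 4
n_pairs = math.comb(n_schools, 2)   # every school compared with every other
print(n_pairs, round(chance_of_false_positive(n_pairs), 2))
```

With four schools there are six pairwise comparisons, and under these assumptions the chance of at least one spurious "significant" result is roughly one in four rather than one in twenty.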
Linear Regression • Measures the association between two continuous variables (e.g. weight and height) • Or between one continuous outcome variable and several other continuous variables (multiple linear regression) • What is the relationship between height and weight?
Scatter plot of weight and height • Correlation between height and weight = 0.75
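A correlation coefficient and a fitted regression line can both be computed from the same sums of squares. The sketch below uses hypothetical heights and weights, not the data behind the scatter plot, so it will not reproduce the 0.75 on the slide; both function names are illustrative.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def least_squares_line(xs, ys):
    """Slope and intercept of the fitted line y = intercept + slope * x."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical heights (cm) and weights (kg); not the data behind the slide.
heights = [150, 160, 170, 180, 190]
weights = [52, 58, 69, 71, 85]
r = pearson_r(heights, weights)
slope, intercept = least_squares_line(heights, weights)
print(round(r, 2), round(slope, 2), round(intercept, 1))
```

The correlation describes how tightly the points follow a straight line, while the slope estimates how many kilograms of weight go with each extra centimetre of height.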