Comprehensive Guide to Statistics for Non-Statisticians: An Essential Overview for Beginners

Statistics for non-statisticians Marco Pavesi Lead Statistician Liver Unit – Hospital Clínic i Provincial Ferran Torres Statistics and Methodology Support Unit. Hospital Clínic Barcelona Biostatistics Unit. School of Medicine. Universitat Autònoma Barcelona (UAB)

Outline Why Statistics? Descriptive Statistics. Populations and Samples. Type of errors Inferential Statistics. Hypothesis testing Statistical errors p-value Confidence Intervals Multiplicity issues. Type of tests. Sample size Multivariate analysis. More on p-values Conclusion: “little shop of horrors”

Intro. Why should we learn statistics ?

Induction and Truth Bertrand Russell presents… The inductivist turkey

Troubles for the plain researchers: Induction and statistics ARE NOT a method to get a sort of mathematical demonstration of Truth The results observed for a population sample are not necessarily true for the whole population

Smart turkeys / researchers… • …are aware that the relevance (weight) of statistical inferences always depends on the sample size

Smart turkeys / researchers… • …do know that we can only model /estimate the real world with a specific approximation error.

Smart turkeys / researchers… • …understand that true hypotheses do not exist, and we can only reject or keep a hypothesis based on the available evidence

What is statistics? • “I know (I’m making the assumption) that these dice are fair: what is the probability of always getting a 1 in 15 runs?” ==> Probability mathematics • “I always got a 1 in 15 runs. Are these dice fair?” ==> Inferential STATISTICS
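
The probability side of this contrast can be worked out directly. The sketch below assumes a single six-sided die and independent rolls (the slide does not state either), and the function name `prob_all_ones` is only illustrative.

```python
from fractions import Fraction


def prob_all_ones(runs: int, sides: int = 6) -> Fraction:
    """Probability that a fair die shows a 1 on every one of `runs` independent rolls."""
    return Fraction(1, sides) ** runs


p = prob_all_ones(15)
print(p)         # exact value: 1 / 6**15
print(float(p))  # roughly 2e-12, i.e. practically impossible for a fair die
```

The inferential question runs the other way: having seen fifteen 1s in a row, such a tiny probability under the "fair die" assumption is the evidence against that assumption.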

So, why statistics? To account for chance & variability!

Why is Statistics needed? Statistics tells us whether events are likely to have happened simply by chance. Statistics is needed because we always work with sample observations (subject to variability) and never with whole populations. Statistics is the only means to predict what is more likely to happen in new situations, and it helps us to make decisions.

Introduction to descriptive statistics

Population and Samples Sample Study Population Target Population

Random vs Systematic error. Example: Systolic Blood Pressure (mm Hg). [Figure: two panels on an axis from 130 to 170 mm Hg showing points 01 to 05 around the true value; one panel illustrates systematic error (bias), the other random error.]

What Statistics? • Descriptive Statistics • Position statistics (central tendency measures): mean, median • Dispersion statistics: variance, standard deviation, standard error • Shape statistics: symmetry, skewness and kurtosis measures.

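The mean and the median. Arithmetic mean (average): the sum of the values divided by the number of values. Median: the middle value of the ordered sample (50% of sample individuals have a value higher than or equal to the median). Odd number of values: 1, 3, 3, 4, 6, 13, 14, 14, 18 → median = 6. Even number of values: 1, 3, 3, 4, 6, 13, 14, 14, 17, 18 → the two middle values are 6 and 13, so median = (6+13)/2 = 9.5. • Unlike the median, the mean is affected by outliers • Especially relevant for specific distributions (survival times) [Figure: Mean 1 and Median 1 before, and Mean 2 and Median 2 after, a new outlier is added.]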
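
As a quick check of the two medians above, and of how an outlier moves the mean, here is a minimal sketch using Python's standard `statistics` module. The two samples are the ones from the slide; the outlier value 180 is a hypothetical number chosen only for illustration.

```python
from statistics import mean, median

sample_1 = [1, 3, 3, 4, 6, 13, 14, 14, 18]        # 9 values (odd)
sample_2 = [1, 3, 3, 4, 6, 13, 14, 14, 17, 18]    # 10 values (even)

print(median(sample_1))  # middle value of the ordered sample
print(median(sample_2))  # average of the two middle values, (6 + 13) / 2

# Hypothetical outlier: replace the largest value of sample_1 by 180.
with_outlier = sample_1[:-1] + [180]

print(mean(sample_1), median(sample_1))
print(mean(with_outlier), median(with_outlier))  # the mean jumps, the median does not move
```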

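Dispersion measures • The Variance is the mean of squared differences from the distribution mean: $\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2$ • The Standard Deviation is the square root of the Variance: $\sigma = \sqrt{\sigma^2}$ • The Standard Error of the mean is the Standard Deviation divided by the square root of the sample size, i.e. the square root of the ratio between the Variance and the sample size: $SE = \sigma / \sqrt{N} = \sqrt{\sigma^2 / N}$ • It can be regarded as the SD of the sample mean (or of another estimated parameter) across repeated samples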
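
A minimal sketch of the three dispersion measures, reusing the first example sample from the mean/median slide as data. It uses the "divide by N" variance described above (`pvariance`); many packages instead divide by N−1 for sample data (`statistics.variance`), which is an alternative not discussed in the slides. The standard error of the mean is computed here as SD divided by the square root of N.

```python
from math import sqrt
from statistics import pstdev, pvariance

values = [1, 3, 3, 4, 6, 13, 14, 14, 18]
n = len(values)

variance = pvariance(values)  # mean of squared differences from the mean
sd = pstdev(values)           # square root of the variance
se = sd / sqrt(n)             # standard error of the mean

print(f"variance={variance:.2f}  SD={sd:.2f}  SE={se:.2f}")
```

The SE is always smaller than the SD for N > 1 and shrinks as the sample grows, which is why larger samples give more precise estimates of the mean.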

Inference & tests • Inferential Statistics • Draw conclusions (inferences) from incomplete (sample) data. • Allow us to make predictions about the target population based on the results observed in the sample • Are computed in hypothesis testing • Examples • 95% CI, t-test, chi-square test, ANOVA, regression

Basic pattern of statistical tests Based on the total number of observations and the size of the test statistic, one can determine the P value.

How many noise units? Test statistic & sample size (degrees of freedom) convert to a probability or P Value.

Overall hypothesis testing flow chart: Test statistic value → Corresponding P-value (from known distribution) → Comparison with significance level α (previously defined) → if P < α: reject null hypothesis; if P ≥ α: keep null hypothesis

Introduction to inferential statistics

The role of statistics “Thus statistical methods are no substitute for common sense and objectivity. They should never aim to confuse the reader, but instead should be a major contributor to the clarity of a scientific argument.” The role of statistics. Pocock SJ. Br J Psychiat 1980; 137:188-190

Extrapolation Study Results Sample Inferential analysis Statistical Tests Confidence Intervals Population “Conclusions”

Statistical Inference Statistical Tests => p-value Confidence Intervals

Valid samples? Population Likely to occur Invalid Sample and Conclusions Unlikely to occur

P-value The p-value is a “tool” to answer the question: Could the observed results have occurred by chance*? Remember: Decision given the observed results in a SAMPLE Extrapolating results to POPULATION *: accounts exclusively for the random error, not bias p < .05 “statistically significant”

An intuitive definition • The p-value is the probability of observing data at least as extreme as ours when the null hypothesis is true • Steps: • 1. Calculate the treatment differences in the sample (A-B) • 2. Assume that both treatments are equal (A=B) and then… • 3. …calculate the probability of obtaining a difference at least as large as the one observed, given the assumption in step 2 • We conclude according to the probability: • p<0.05: the differences are unlikely to be explained by chance, • we assume that the treatment explains the differences • p≥0.05: the differences could be explained by chance, • we assume that chance explains the differences
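
The steps above can be made concrete with an exact permutation test, which is one simple way (not necessarily the one the authors had in mind) to turn "assume A=B, then ask how often a difference this large arises" into a number. The outcome values and the function name are hypothetical; exhaustive enumeration is only practical for small samples.

```python
from itertools import combinations


def exact_permutation_p_value(a, b):
    """Two-sided p-value for the difference in means under H0: A = B.

    If both treatments are equal, group labels are arbitrary, so every way of
    splitting the pooled values into groups of the same sizes is equally likely.
    """
    pooled = list(a) + list(b)
    n_a, n_b = len(a), len(b)
    total = sum(pooled)
    observed = abs(sum(a) / n_a - sum(b) / n_b)  # step 1: observed difference

    splits = 0
    at_least_as_extreme = 0
    for idx in combinations(range(len(pooled)), n_a):  # step 2: relabel under A = B
        sum_a = sum(pooled[i] for i in idx)
        diff = abs(sum_a / n_a - (total - sum_a) / n_b)
        splits += 1
        if diff >= observed - 1e-12:  # tolerance for floating-point ties
            at_least_as_extreme += 1
    return at_least_as_extreme / splits  # step 3: probability under the assumption


treatment_a = [10, 11, 12, 13, 14]  # hypothetical outcomes
treatment_b = [1, 2, 3, 4, 5]
print(exact_permutation_p_value(treatment_a, treatment_b))  # 2 of 252 splits
```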

HYPOTHESIS TESTING • Testing two hypotheses • H0: A=B (Null hypothesis – no difference) • H1: A≠B (Alternative hypothesis) • Calculate test statistic based on the assumption that H0 is true (i.e. there is no real difference) • Test will give us a p-value: how likely are the collected data if H0 is true • If this is unlikely (small p-value), we reject H0

RCT from a statistical point of view Treatment A Randomisation Treatment B (control) 1 homogeneous population 2 distinct populations

RCT Sample Population

Statistical significance/Confidence? A>B p<0.05 means: “I can conclude that the higher values observed with treatment A vs treatment B are linked to the treatment rather than to chance, with a risk of error of less than 5%”

Factors influencing statistical significance • Difference • Variance (SD) • Quantity of data Signal Noise(background) Quantity

P-value • A “very low” p-value does NOT imply: • Clinical relevance (NO!!!) • Magnitude of the treatment effect (NO!!) • With larger n or lower variability, p gets smaller • Please never compare p-values!! (NO!!!)
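
To see why a small p-value says nothing about the size of an effect, the sketch below holds the observed difference fixed and changes only the sample size or the variability. It assumes a two-sided z-test for two equal-sized groups with a known common SD, which is a simplification chosen for illustration; all numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_sided_p(diff, sd, n_per_group):
    """Two-sided z-test p-value for a difference in means between two groups."""
    se = sd * sqrt(2 / n_per_group)  # standard error of the difference
    z = abs(diff) / se               # signal divided by noise
    return 2 * (1 - NormalDist().cdf(z))


# Same 2-unit difference, same SD, growing sample size: p shrinks.
for n in (20, 200, 2000):
    print(n, two_sided_p(diff=2, sd=10, n_per_group=n))

# Same difference and n, lower variability: p also shrinks.
print(two_sided_p(diff=2, sd=5, n_per_group=20))
```

The treatment effect is 2 units in every line of output; only the amount of data or noise changed.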

P-value STAT REPORT A “statistically significant” result (p<.05) tells us NOTHING about clinical or scientific importance. Only that the results were unlikely to be due to chance alone. A p-value does NOT account for bias, only for random error

THE BASIC IDEA Statistics can never PROVE anything beyond any doubt, just beyond reasonable doubt!! … because of working with samples and random error

Type I & II Error & Power

Type I & II Error & Power • Type I Error (α) • False positive • Rejecting the null hypothesis when in fact it is true • Standard: α=0.05 • In words, chance of finding statistical significance when in fact there truly was no effect • Type II Error (β) • False negative • Failing to reject the null hypothesis when in fact the alternative is true • Standard: β=0.20 or 0.10 • In words, chance of not finding statistical significance when in fact there was an effect

Type I & II Error & Power • Power • 1 - Type II Error (β) • Usually in percentage: 80% or 90% (for β = 0.2 or 0.1, respectively) • In words, chance of finding statistical significance when in fact there is an effect
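
The link between α, power and sample size can be sketched with a normal approximation. This assumes a two-sided z-test comparing two equal-sized groups with a known common SD; the effect size, SD and group sizes are hypothetical, and real sample-size calculations usually rely on dedicated software.

```python
from math import sqrt
from statistics import NormalDist


def power_two_sample(diff, sd, n_per_group, alpha=0.05):
    """Approximate power (1 - beta) of a two-sided two-sample z-test."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)             # rejection threshold for |z|
    shift = abs(diff) / (sd * sqrt(2 / n_per_group))
    # Probability that the test statistic lands in either rejection region.
    return (1 - z.cdf(z_crit - shift)) + z.cdf(-z_crit - shift)


print(power_two_sample(diff=0, sd=10, n_per_group=50))  # no true effect: equals alpha
print(power_two_sample(diff=5, sd=10, n_per_group=63))  # about 80% power
print(power_two_sample(diff=5, sd=10, n_per_group=85))  # about 90% power
```

With no true effect the "power" collapses to the Type I error rate, which is one way to see that α and β describe the same test under two different truths.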

95% CI • Better than p-values… • …use the data collected in the trial to give an estimate of the treatment effect size, together with a measure of how certain we are of our estimate • CI is a range of values within which the “true” treatment effect is believed to be found, with a given level of confidence. • If the study were repeated many times, 95% of the 95% CIs computed in this way would contain the “true” treatment effect • Generally, 95% CI is calculated as • Sample Estimate ± 1.96 × Standard Error
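
A minimal sketch of the "estimate ± 1.96 × standard error" rule applied to a sample mean. The blood pressure readings are hypothetical, and the factor 1.96 comes from the normal distribution; for small samples a slightly larger multiplier from the t distribution is normally used instead.

```python
from math import sqrt
from statistics import mean, stdev


def ci95(values):
    """Approximate 95% confidence interval for the mean of `values`."""
    estimate = mean(values)
    se = stdev(values) / sqrt(len(values))  # standard error of the mean
    return estimate - 1.96 * se, estimate + 1.96 * se


# Hypothetical systolic blood pressure readings (mm Hg).
readings = [142, 150, 138, 155, 147, 151, 149, 144, 153, 146]
lower, upper = ci95(readings)
print(f"mean={mean(readings):.1f}  95% CI=({lower:.1f}, {upper:.1f})")
```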

Interval Estimation A probability that the population parameter falls somewhere within the interval. Sample statistic (point estimate) Confidence interval Confidence limit (lower) Confidence limit (upper)

Superiority study [Figure: 95% CI of the difference d on an axis running from “Control better” to “Test better”; d < 0: negative effect, d = 0: no difference, d > 0: positive effect.]

Multiplicity

Lancet 2005; 365: 1591–95 • To say it colloquially, • torture the data until they speak...

Torturing data… Investigators examine additional endpoints, manipulate group comparisons, do many subgroup analyses, and undertake repeated interim analyses. Investigators should report all analytical comparisons implemented. Unfortunately, they sometimes hide the complete analysis, handicapping the reader’s understanding of the results. Lancet 2005; 365: 1591–95

Design Conduct Results

Multiplicity K independent hypotheses: $H_{01}, H_{02}, \dots, H_{0K}$ S significant results ($p < \alpha$) $$\Pr(S \ge 1 \mid H_{01} \cap H_{02} \cap \dots \cap H_{0K} = H_{0\cdot}) = 1 - \Pr(S = 0 \mid H_{0\cdot}) = 1 - (1 - \alpha)^K$$
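
The formula above is easy to tabulate. The sketch below assumes K independent tests, each run at the same significance level, exactly as in the formula; the function name is illustrative.

```python
def familywise_error_rate(k: int, alpha: float = 0.05) -> float:
    """Probability of at least one false positive among k independent true nulls."""
    return 1 - (1 - alpha) ** k


for k in (1, 5, 10, 20):
    print(k, round(familywise_error_rate(k), 3))
```

With 10 independent comparisons at α = 0.05 the chance of at least one spurious "significant" result is already about 40%, which is the quantitative content of "torturing the data".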

Sources of multiplicity in RCT Multiple assessment criteria (variables) Multiple times of assessment (repeated measurements) Multiple inspections (interim analyses) Multiple comparisons (more than two treatments) Multiple subsets and subgroups

Some examples
