Nuts and bolts of biostatistics A primer of biomedical statistics for cardiologists
What to expect today? • CORE MODULES • Principles of biostatistics - Descriptive statistics • Inferential statistics I - Basic concepts, α & β errors, sample size • Inferential statistics II - Parametric and non-parametric hypothesis tests • Linear, logistic, univariate and multivariate regression • Survival analysis
Science or fiction? There are lies, damned lies, and statistics (B. Disraeli) Knowledge is a process of piling up facts; wisdom lies in their simplification (M.H. Fischer)
What is statistics? • DEFINITIONS • A whole subject or discipline • A collection of methods • Collections of data • Specially calculated figures
Statistics is great Find stuff out • Finding stuff out is fun • Feel like you have done something • It’s small, but it’s something Understand stuff • When we are being deceived • Support, or illumination?
Methods of inquiry Statistical inquiry may be… Descriptive (to summarize or describe an observation) or Inferential (to use the observations to make estimates or predictions)
Inferential statistics If I become a scaffolder, how likely am I to eat well every day? Confidence Intervals P values
Samples and populations This is a sample
Samples and populations And this is its universal population
Samples and populations This is another sample
Samples and populations And this might be its universal population
Samples and populations But what if THIS is its universal population?
Samples and populations Any inference from a sample thus depends on our confidence in the likelihood of each candidate population
Precision and accuracy Precision expresses the extent of RANDOM ERROR Accuracy expresses the extent of SYSTEMATIC ERROR (ie bias)
Bias Bias is a systematic DEVIATION from the TRUTH -in itself it can never be recognized from within the study -an external gold standard and/or permanent surveillance is needed
Validity Internal validity entails both PRECISION and ACCURACY (ie does a study provide a truthful answer to the research question?) External validity expresses the extent to which the results can be applied to other contexts and settings (it corresponds to the distinction between SAMPLE and POPULATION)
Inferential statistics If I become a scaffolder, how likely am I to eat well every day? Confidence Intervals P values
Significance testing Significance testing estimates HOW LIKELY it is that a specified hypothesis EXPLAINS the data found in a specific study It is based on an explicit distinction between a NULL HYPOTHESIS and 1 or more ALTERNATIVE hypotheses
Significance testing eg H0 - NULL HYPOTHESIS Diabetics have the same risk of restenosis after coronary stenting in comparison to non-diabetics HA - ALTERNATIVE HYPOTHESIS The risk of restenosis is different between diabetics and non-diabetics
Significance testing Any significance test must provide me with a PROBABILITY (termed p) that measures the LIKELIHOOD that the data I have found can be explained by RANDOM SAMPLING from a population where the null hypothesis is true Eg if I compare 5 diabetics and 5 non-diabetics, random-sampling error may mask any underlying difference, leading me NOT TO REJECT H0
Alpha and type I error Whenever I perform a test, there is a risk of a FALSE POSITIVE result, ie REJECTING A TRUE null hypothesis This error is called type I and its probability is measured as alpha; the observed p value is compared against alpha The lower the p value, the lower the risk of falling into a type I error (ie the HIGHER the SPECIFICITY of the test)
Beta and type II error Whenever I perform a test, there is also a risk of a FALSE NEGATIVE result, ie NOT REJECTING A FALSE null hypothesis This error is called type II and its probability is measured as beta The complement of beta (1-beta) is called power The lower the beta, the lower the risk of missing a true difference (ie the HIGHER the SENSITIVITY of the test)
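The meaning of alpha can be checked by simulation (an illustrative sketch, not part of the slides): if we repeatedly draw two groups from the SAME population and test them, H0 should be wrongly rejected in roughly 5% of experiments when testing at the 5% level. The sample size, number of trials, and known-variance z-test are assumptions chosen for simplicity.

```python
import math
import random

random.seed(1)

def reject_null(n=50, z_crit=1.96):
    """Draw two groups of size n from the SAME normal population and run
    a two-sample z-test (sigma known = 1); True means H0 is wrongly rejected."""
    a = [random.gauss(0, 1) for _ in range(n)]
    b = [random.gauss(0, 1) for _ in range(n)]
    diff = sum(a) / n - sum(b) / n
    se = math.sqrt(1 / n + 1 / n)
    return abs(diff / se) > z_crit

trials = 2000
false_positives = sum(reject_null() for _ in range(trials))
print(false_positives / trials)  # close to alpha = 0.05
```

The observed false-positive rate hovers around 0.05, which is exactly what a 5% type I error level promises under a true null hypothesis.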
Ps and confidence intervals P values and confidence intervals are strictly connected Any hypothesis test providing a significant result (eg p<0.05) corresponds to a 95% confidence interval for the difference that excludes zero (ie the null hypothesis value)
Ps and confidence intervals [Figure: confidence intervals plotted against the H0 value; intervals excluding H0 are significant (p<0.05), intervals crossing H0 are not significant (p>0.05); their position distinguishes important from trivial differences]
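The duality between confidence intervals and p values can be sketched numerically (the mean difference, SD, and sample size below are hypothetical, and a known-variance z statistic is assumed for simplicity): the 95% CI excludes zero exactly when the two-sided p is below 0.05.

```python
import math
from statistics import NormalDist

def ci_and_p(mean_diff, sd, n):
    """95% CI for a mean difference and the matching two-sided
    z-based p value (illustrative known-variance sketch)."""
    se = sd / math.sqrt(n)
    z = mean_diff / se
    half_width = 1.959964 * se  # z quantile for a two-sided 95% CI
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return (mean_diff - half_width, mean_diff + half_width), p

(lo, hi), p = ci_and_p(mean_diff=2.0, sd=8.0, n=100)
print((lo, hi), p)  # CI excludes 0 exactly when p < 0.05
```

Here the interval is roughly (0.43, 3.57) and p is about 0.012: the CI excludes zero and the test is significant, two views of the same fact.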
Measures of central tendency: rationale • Need to describe the kind of scores that we have (eg ages of a singer's fans) • Raw enumeration of every value is unwieldy, hence summary measures
Measures of central tendency: the Mean Characteristics: -summarises information well -discards a lot of information Assumptions: -data are not skewed (skew distorts the mean; outliers make the mean very different) -data are on a measurement scale (cannot find the mean of a categorical measure: the ‘average’ shoe size is meaningless)
Measures of central tendency: the Median • The one in the middle • Place values in order; the median is the central one • Used for • Ordinal data • Skewed data / outliers
Measures of central tendency: comparisons • Mean is usually best • If it works • Useful properties (with SD)
Measures of central tendency: comparisons [Figure: histogram of number of books read vs number of students, with the mode, median and mean marked on a skewed distribution]
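The effect of an outlier on the mean versus the median can be shown in a few lines (the ages below are invented for illustration): a single extreme value drags the mean far from the bulk of the data while barely moving the median.

```python
from statistics import mean, median

ages = [19, 20, 20, 21, 22]      # hypothetical ages of a singer's fans
with_outlier = ages + [78]       # one much older fan joins the sample

print(mean(ages), median(ages))                  # 20.4 20
print(mean(with_outlier), median(with_outlier))  # 30.0 20.5
```

One 78-year-old fan pushes the mean from 20.4 to 30.0, while the median only moves from 20 to 20.5, which is why the median is preferred for skewed data.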
Measures of dispersion: rationale • Central tendency doesn’t tell us everything • We need to know about the spread, or dispersion of the scores • Big difference, or not?
Measures of dispersion: examples • Range • Top to bottom • Not very useful • Interquartile range • Used with median • ¼ way to ¾ way • Standard deviation • Used with mean • Very useful
Measures of dispersion: standard deviation Standard deviation: • approximates the population sigma as N increases Advantages: • with the mean enables powerful synthesis mean±1SD ~68% of data mean±2SD ~95% of data (exactly 1.96 SD) mean±3SD ~99.7% of data (99% within 2.58 SD) Disadvantages: • is based on normal assumptions
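The 68/95/99.7 rule above can be verified empirically (a sketch with an invented normal population, mean 100 and SD 15): draw many values and count how many land within 1, 2 and 3 SDs of the mean.

```python
import random

random.seed(0)

# Hypothetical normal population: mean 100, SD 15
n = 100_000
xs = [random.gauss(100, 15) for _ in range(n)]

for k, expected in [(1, 0.683), (2, 0.954), (3, 0.997)]:
    frac = sum(100 - k * 15 <= x <= 100 + k * 15 for x in xs) / n
    print(f"within {k} SD: {frac:.3f} (theory {expected})")
```

The simulated fractions match the theoretical 68%, 95% and 99.7% closely, but only because the data really are Gaussian; on skewed data the rule fails, which is the slide's caveat.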
Measures of dispersion and confidence intervals Standard error (SE or SEM) can be used for making a confidence interval around a mean SE = SD / √n 95% CI = mean ± 2 SE
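Those two formulas take a couple of lines to apply (the blood-pressure sample below is invented for illustration; the rough "± 2 SE" multiplier from the slide is used rather than 1.96):

```python
import math
from statistics import mean, stdev

# Hypothetical sample of systolic blood pressures (mmHg)
bp = [118, 124, 130, 121, 127, 135, 119, 126, 122, 128]

m = mean(bp)
se = stdev(bp) / math.sqrt(len(bp))  # SE = SD / sqrt(n)
ci = (m - 2 * se, m + 2 * se)        # 95% CI = mean +/- 2 SE
print(m, se, ci)
```

Note how the SE shrinks with the square root of n: quadrupling the sample size only halves the width of the confidence interval.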
Shapes of distribution [Figure: frequency vs value curves illustrating different distribution shapes]
Power and sample sizes Whenever designing a study or analyzing a dataset, it is important to estimate the sample size or the power of the comparison SAMPLE SIZE Setting a specific alpha and a specific beta, you calculate the necessary sample size given the average inter-group difference and its variation POWER Given a specific sample size and alpha, in light of the calculated average inter-group difference and its variation, you obtain an estimate of the power (ie 1-beta)
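For comparing two means, the standard normal-approximation sample-size formula is n per group = 2·((z₁₋α/₂ + z_power)·σ/δ)². A minimal sketch (the 5-unit difference and SD of 10 are hypothetical example values):

```python
import math
from statistics import NormalDist

def n_per_group(delta, sigma, alpha=0.05, power=0.80):
    """Sample size per group for a two-sided comparison of two means,
    normal approximation: n = 2 * ((z_{1-alpha/2} + z_power) * sigma / delta)^2."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)  # 1.96 for alpha = 0.05
    z_b = NormalDist().inv_cdf(power)          # 0.84 for 80% power
    return math.ceil(2 * ((z_a + z_b) * sigma / delta) ** 2)

# eg to detect a 5-unit inter-group difference when the SD is 10
print(n_per_group(delta=5, sigma=10))  # 63 per group
```

Halving the detectable difference quadruples the required sample size, which is why trials powered for small effects become so large.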
Diagnostic test accuracy SENSITIVITY measures, of all the people with a particular disease, the proportion who will test positive for it, ie true positives/(true positives+false negatives) SPECIFICITY measures, of all the people without the disease, the proportion who will test negative for it, ie true negatives/(true negatives+false positives) ACCURACY combines both, ie (true positives+true negatives)/(all tested subjects)
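These three proportions fall straight out of a 2x2 table (the counts below are invented for illustration):

```python
# Hypothetical 2x2 table for a diagnostic test
tp, fn = 90, 10    # diseased patients: test positive / test negative
tn, fp = 160, 40   # healthy patients:  test negative / test positive

sensitivity = tp / (tp + fn)                  # 90/100  = 0.90
specificity = tn / (tn + fp)                  # 160/200 = 0.80
accuracy = (tp + tn) / (tp + fn + tn + fp)    # 250/300 ~ 0.83
print(sensitivity, specificity, accuracy)
```

Note that accuracy blends the two: with many more healthy than diseased subjects, as here, it leans toward the specificity.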
(Non-) Parametric tests Whenever normal or Gaussian assumptions are valid, we can use PARAMETRIC tests, which are usually more sensitive and powerful (eg Student t) However, if an underlying normal distribution cannot be safely assumed (ie there is a NON-GAUSSIAN distribution), non-parametric alternatives should be employed Even non-parametric tests are nonetheless based on ASYMPTOTIC assumptions…
Exact tests Whenever asymptotic assumptions cannot be met, EXACT TESTS should be employed Exact tests are computationally burdensome (they involve PERMUTATIONS)*, but they do not rely on any underlying distributional assumption eg if in a 2x2 table a cell has an expected count≤5, the Pearson chi-square test is biased (ie ↑alpha error), and the Fisher exact test is warranted *eg the number of permutations of 6 items is 6! = 6x5x4x3x2x1 = 720
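The Fisher exact test for a 2x2 table can be sketched directly with the hypergeometric distribution (a minimal two-sided version, summing every table with the same margins that is no more likely than the observed one; production code would use a vetted statistics library instead):

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p value for the 2x2 table [[a, b], [c, d]]."""
    r1, r2, c1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric P(first cell = x) with all margins fixed
        return comb(r1, x) * comb(r2, c1 - x) / comb(n, c1)

    p_obs = prob(a)
    lo, hi = max(0, c1 - r2), min(r1, c1)
    return sum(prob(x) for x in range(lo, hi + 1)
               if prob(x) <= p_obs * (1 + 1e-9))

# Fisher's classic "lady tasting tea" table: [[3, 1], [1, 3]]
print(fisher_exact_two_sided(3, 1, 1, 3))  # 34/70 ~ 0.486
```

With only 8 observations no chi-square approximation is trustworthy, yet the exact enumeration over all feasible tables is immediate, which is precisely the trade-off the slide describes.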