
Nuts and bolts of biostatistics




  1. Nuts and bolts of biostatistics A primer of biomedical statistics for cardiologists

  2. What to expect today? • CORE MODULES • Principles of biostatistics - Descriptive statistics • Inferential statistics I - Basic concepts, α & β errors, sample size • Inferential statistics II - Parametric and non-parametric hypothesis tests • Linear, logistic, univariate and multivariate regression • Survival analysis

  3. Science or fiction? "There are lies, damned lies, and statistics" (B. Disraeli). "Knowledge is the process of piling up facts, wisdom lies in their simplification" (M. Fisher)

  4. What is statistics? • DEFINITIONS • A whole subject or discipline • A collection of methods • Collections of data • Specially calculated figures

  5. Statistics is great Find stuff out • Finding stuff out is fun • Feel like you have done something • It’s small, but it’s something Understand stuff • When we are being deceived • Support, or illumination?

  6. Methods of inquiry Statistical inquiry may be… Descriptive (to summarize or describe an observation) or Inferential (to use the observations to make estimates or predictions)

  7. Inferential statistics If I become a scaffolder, how likely am I to eat well every day? Confidence Intervals P values

  8. Samples and populations This is a sample

  9. Samples and populations And this is its universal population

  10. Samples and populations This is another sample

  11. Samples and populations And this might be its universal population

  12. Samples and populations But what if THIS is its universal population?

  13. Samples and populations Any inference thus depends on our confidence in its likelihood

  14. Precision and accuracy

  15. Precision and accuracy Thus Precision expresses the extent of RANDOM ERROR Accuracy expresses the extent of SYSTEMATIC ERROR (ie bias)
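
A minimal sketch in Python (all numbers hypothetical, numpy assumed available) contrasting the two error types: one measurement process is precise but biased, the other imprecise but accurate.

```python
import numpy as np

rng = np.random.default_rng(0)
true_value = 100.0  # hypothetical true blood pressure, mmHg

# Precise but inaccurate: little random error, large systematic error (bias)
precise_biased = rng.normal(loc=110.0, scale=1.0, size=1000)

# Imprecise but accurate: large random error, no systematic error
imprecise_unbiased = rng.normal(loc=100.0, scale=10.0, size=1000)

for name, x in [("precise/biased", precise_biased),
                ("imprecise/unbiased", imprecise_unbiased)]:
    bias = x.mean() - true_value   # systematic error -> accuracy
    spread = x.std(ddof=1)         # random error -> precision
    print(f"{name:20s} bias={bias:+.1f}  SD={spread:.1f}")
```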

  16. Bias Bias is a systematic DEVIATION from the TRUTH - in itself it can never be recognized - there is a need for an external gold standard and/or permanent surveillance

  17. Validity Internal validity entails both PRECISION and ACCURACY (ie does the study provide a truthful answer to the research question?). External validity expresses the extent to which the results can be applied to other contexts and settings (it corresponds to the distinction between SAMPLE and POPULATION)

  18. Inferential statistics If I become a scaffolder, how likely am I to eat well every day? Confidence Intervals P values

  19. Significance testing Significance hypothesis testing is based on estimating HOW LIKELY a specified hypothesis is to EXPLAIN the data found in any specific study It is based on an explicit distinction between a NULL HYPOTHESIS and one or more ALTERNATIVE hypotheses

  20. Significance testing eg H0 - NULL HYPOTHESIS Diabetics have the same risk of restenosis after coronary stenting as non-diabetics HA - ALTERNATIVE HYPOTHESIS The risk of restenosis is different between diabetics and non-diabetics

  21. Significance testing Any significance hypothesis test must provide me with a PROBABILITY (termed p) that measures the LIKELIHOOD that the data I have found can be explained by RANDOM SAMPLING from a population where the null hypothesis is true Eg if I compare 5 diabetics and 5 non-diabetics, random-sampling error may mask any underlying differences, leading me NOT TO REJECT H0
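
A rough simulation of the slide's 5-versus-5 scenario (the restenosis risks are hypothetical): even when diabetics truly have twice the risk, such tiny samples rarely reject H0.

```python
import numpy as np
from scipy.stats import fisher_exact

rng = np.random.default_rng(1)
n_per_group, n_sims = 5, 2000
p_diab, p_nondiab = 0.40, 0.20   # hypothetical true restenosis risks

rejections = 0
for _ in range(n_sims):
    restenosis_diab = rng.binomial(n_per_group, p_diab)
    restenosis_non = rng.binomial(n_per_group, p_nondiab)
    table = [[restenosis_diab, n_per_group - restenosis_diab],
             [restenosis_non, n_per_group - restenosis_non]]
    _, p = fisher_exact(table)
    rejections += p < 0.05

print(f"H0 rejected in {rejections / n_sims:.0%} of simulated 5-vs-5 studies")
```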

  22. Alpha and type I error Whenever I perform a test, there is a risk of a FALSE POSITIVE result, ie REJECTING A TRUE null hypothesis This error is called type I; its pre-specified probability is alpha, and the observed probability is reported as the p value The lower the p value, the lower the risk of falling into a type I error (ie the HIGHER the SPECIFICITY of the test)

  23. Beta and type II error Whenever I perform a test, there is also a risk of a FALSE NEGATIVE result, ie NOT REJECTING A FALSE null hypothesis This error is called type II and its probability is measured as beta The complement of beta (1 - beta) is called power The lower the beta, the lower the risk of missing a true difference (ie the HIGHER the SENSITIVITY of the test)
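
A hedged sketch on synthetic continuous data: with alpha = 0.05, roughly 5% of tests reject a true H0 (type I error), while the rejection rate under a real difference estimates power (1 - beta).

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(2)
n, n_sims, alpha = 30, 2000, 0.05

def rejection_rate(true_diff):
    hits = 0
    for _ in range(n_sims):
        a = rng.normal(0.0, 1.0, n)
        b = rng.normal(true_diff, 1.0, n)
        hits += ttest_ind(a, b).pvalue < alpha
    return hits / n_sims

print(f"Type I error (H0 true):  {rejection_rate(0.0):.3f}   (should be ~{alpha})")
print(f"Power (true diff = 0.8): {rejection_rate(0.8):.3f}   (beta = 1 - power)")
```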

  24. Errors

  25. Ps and confidence intervals P values and confidence intervals are strictly connected Any hypothesis test providing a significant result (eg p=0.045) means that the corresponding 95.5% confidence interval for the population average difference excludes zero (ie the null hypothesis value)
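
An illustrative check of that correspondence on made-up per-patient differences: the (1 - p) confidence interval just touches zero, so any narrower interval, eg the 95% CI when p < 0.05, excludes it.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
diff = rng.normal(0.5, 1.0, 25)          # hypothetical per-patient differences

t_res = stats.ttest_1samp(diff, popmean=0.0)
mean, sem, df = diff.mean(), stats.sem(diff), len(diff) - 1

ci95 = stats.t.interval(0.95, df, loc=mean, scale=sem)
ci_at_p = stats.t.interval(1 - t_res.pvalue, df, loc=mean, scale=sem)

print(f"p = {t_res.pvalue:.4f}")
print(f"95% CI:      {ci95}")      # excludes 0 whenever p < 0.05
print(f"(1 - p) CI:  {ci_at_p}")   # lower bound is essentially 0
```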

  26. Ps and confidence intervals

  27. Ps and confidence intervals [Figure: confidence intervals plotted relative to H0, distinguishing important from trivial differences and significant (p<0.05) from non-significant (p>0.05) results]

  28. Shapes of distribution

  29. Shapes of distribution [Figure: distribution with the MEAN marked]

  30. Shapes of distribution

  31. Shapes of distribution

  32. Departing from normality: Outliers

  33. Measures of central tendency: rationale • Need to describe the kind of scores that we have (eg the ages of a singer's fans) • Raw enumeration:

  34. Measures of central tendency: histograms

  35. Measures of central tendency: the Mean Characteristics: - summarises information well - discards a lot of information Assumptions: - data are not skewed (skew distorts the mean; outliers make the mean very different) - data are measured on a measurement scale (the mean of a categorical measure cannot be computed: 'average' shoe size is meaningless)

  36. Measures of central tendency: the Median • One in the middle • Place values in order • Median is central • Used for • Ordinal data • Skewed data / outliers • E.g. …………………

  37. Measures of central tendency: comparisons • Mean is usually best • If it works • Useful properties (with SD)

  38. Measures of central tendency: comparisons [Histogram: Number of Students (0-30) by Number of Books (0-9), with the Mode, Median, and Mean marked]
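
A small illustration with Python's statistics module on made-up, right-skewed counts (eg number of books read): the outlier drags the mean, while the median and mode stay put.

```python
import statistics

books = [0, 1, 1, 2, 2, 2, 3, 3, 4, 40]     # hypothetical counts; 40 is an outlier

print("mean  :", statistics.mean(books))    # pulled upward by the outlier
print("median:", statistics.median(books))  # robust to the outlier
print("mode  :", statistics.mode(books))    # most frequent value
```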

  39. Measures of dispersion: rationale • Central tendency doesn’t tell us everything • We need to know about the spread, or dispersion of the scores • Big difference, or not?

  40. Measures of dispersion: examples

  41. Measures of dispersion: examples

  42. Measures of dispersion: examples

  43. Measures of dispersion: examples • Range - top to bottom - not very useful • Interquartile range - used with median - ¼ way to ¾ way • Standard deviation - used with mean - very useful [Figure: range, interquartile range, and SD compared on the same distribution]

  44. Measures of dispersion: standard deviation Standard deviation: • approximates the population sigma as N increases Advantages: • with the mean it enables a powerful synthesis - mean±1SD ≈68% of data - mean±2SD ≈95% of data (exactly 1.96 SD) - mean±3SD ≈99.7% of data (99% within 2.58 SD) Disadvantages: • is based on normal assumptions
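
A numpy sketch on simulated normal data (hypothetical systolic blood pressures) showing the dispersion measures from the last two slides and the approximate 68%/95%/99.7% coverage of mean ± 1, 2, 3 SD.

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.normal(loc=120, scale=15, size=10_000)   # hypothetical systolic BP values

mean, sd = x.mean(), x.std(ddof=1)
q1, q3 = np.percentile(x, [25, 75])

print(f"range: {x.min():.0f} to {x.max():.0f}")
print(f"IQR  : {q1:.0f} to {q3:.0f}")
print(f"SD   : {sd:.1f}")
for k in (1, 2, 3):
    cover = np.mean(np.abs(x - mean) < k * sd)
    print(f"within mean±{k}SD: {cover:.1%}")     # ~68%, ~95%, ~99.7%
```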

  45. Measures of dispersion and confidence intervals Standard error (SE or SEM) can be used for making a confidence interval around a mean: SE = SD / √n and 95% CI = mean ± 2 SE (more exactly, ± 1.96 SE)
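
The slide's formula as code, on a hypothetical sample of 100 values: SE = SD / √n and 95% CI ≈ mean ± 1.96 × SE (the slide's "2" is a round-off of 1.96).

```python
import math
import numpy as np

rng = np.random.default_rng(5)
x = rng.normal(loc=120, scale=15, size=100)   # hypothetical sample, n = 100

n, mean, sd = len(x), x.mean(), x.std(ddof=1)
se = sd / math.sqrt(n)                        # standard error of the mean
ci = (mean - 1.96 * se, mean + 1.96 * se)     # approximate 95% CI

print(f"mean = {mean:.1f}, SD = {sd:.1f}, SE = {se:.2f}")
print(f"95% CI: {ci[0]:.1f} to {ci[1]:.1f}")
```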

  46. Shapes of distribution [Figure: frequency distribution, Frequency vs Value]

  47. Power and sample sizes Whenever designing a study or analyzing a dataset, it is important to estimate the sample size or the power of the comparison SAMPLE SIZE Setting a specific alpha and a specific beta, you calculate the necessary sample size given the expected average inter-group difference and its variation POWER Given a specific sample size and alpha, in light of the observed average inter-group difference and its variation, you obtain an estimate of the power (ie 1-beta)
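
A rough normal-approximation sample-size calculation for comparing two means (all inputs hypothetical): per-group n ≈ 2·((z₁₋α/₂ + z₁₋β)·σ / Δ)².

```python
import math
from scipy.stats import norm

alpha, power = 0.05, 0.80
delta, sigma = 5.0, 12.0        # hypothetical mean difference and SD

z_alpha = norm.ppf(1 - alpha / 2)   # 1.96 for two-sided alpha = 0.05
z_beta = norm.ppf(power)            # 0.84 for power = 0.80
n_per_group = 2 * ((z_alpha + z_beta) * sigma / delta) ** 2

print(f"~{math.ceil(n_per_group)} patients per group")
```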

  48. Diagnostic test accuracy SENSITIVITY measures, of all the people with a particular disease, the proportion who will test positive for it, ie true positives/(true positives+false negatives) SPECIFICITY measures, of all the people without the disease, the proportion who will test negative for it, ie true negatives/(true negatives+false positives) ACCURACY combines both, ie (true positives+true negatives)/(all tested subjects)
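
The definitions as code, using a hypothetical 2x2 table against a gold standard (the TP, FP, FN, TN counts are made up).

```python
# Hypothetical diagnostic-test results versus a gold standard
TP, FP, FN, TN = 80, 15, 20, 185

sensitivity = TP / (TP + FN)   # among the diseased, fraction testing positive
specificity = TN / (TN + FP)   # among the healthy, fraction testing negative
accuracy = (TP + TN) / (TP + FP + FN + TN)

print(f"sensitivity = {sensitivity:.2f}")
print(f"specificity = {specificity:.2f}")
print(f"accuracy    = {accuracy:.2f}")
```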

  49. (Non-) Parametric tests Whenever normal or Gaussian assumptions are valid, we can use PARAMETRIC tests, which are usually more sensitive and powerful (eg Student t) However, if an underlying normal distribution cannot be safely assumed (ie there is a NON-GAUSSIAN distribution), non-parametric alternatives should be employed Even non-parametric tests are nonetheless based on ASYMPTOTIC assumptions…
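
A hedged side-by-side on skewed synthetic data: the parametric Student t test next to a rank-based non-parametric counterpart (Mann-Whitney U), both available in scipy.

```python
import numpy as np
from scipy.stats import ttest_ind, mannwhitneyu

rng = np.random.default_rng(6)
# Hypothetical skewed outcome (eg hospital length of stay) in two groups
group_a = rng.exponential(scale=5.0, size=40)
group_b = rng.exponential(scale=7.0, size=40)

t_p = ttest_ind(group_a, group_b).pvalue        # parametric (Gaussian assumption)
u_p = mannwhitneyu(group_a, group_b).pvalue     # non-parametric (rank-based)

print(f"Student t:      p = {t_p:.3f}")
print(f"Mann-Whitney U: p = {u_p:.3f}")
```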

  50. Exact tests Whenever asymptotic assumptions cannot be met, EXACT TESTS should be employed Exact tests are computationally burdensome (they involve PERMUTATIONS)*, but they do not rely on any underlying distributional assumption eg if in a 2x2 table a cell has an expected count ≤5, the Pearson chi-square test is biased (ie ↑alpha error), and the Fisher exact test is warranted *6! (the number of permutations of 6 items) equals 6x5x4x3x2x1 = 720
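
A scipy sketch with a hypothetical sparse 2x2 table: the asymptotic chi-square test next to the Fisher exact test, plus math.factorial confirming 6! = 720.

```python
import math
from scipy.stats import chi2_contingency, fisher_exact

# Hypothetical 2x2 table with one sparse column (restenosis yes/no by group)
table = [[1, 19],
         [8, 12]]

chi2, chi2_p, dof, expected = chi2_contingency(table)
odds_ratio, fisher_p = fisher_exact(table)

print(f"smallest expected count: {expected.min():.1f}")  # <5 -> chi-square unreliable
print(f"chi-square p = {chi2_p:.3f}, Fisher exact p = {fisher_p:.3f}")
print(f"6! = {math.factorial(6)}")                       # 720 permutations
```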
