Statistical Methods for Health Intelligence Lecture 2: Perspectives, Data Types & Summaries

Presentation Transcript


  1. Statistical Methods for Health Intelligence. Lecture 2: Perspectives, Data Types & Summaries. Iain Buchan, University of Manchester, buchan@man.ac.uk

  2. Course Material 1: Basic Text • Medical Statistics, 4th Ed, Campbell, Machin & Walters, Wiley 2007 • Statistical knowledge level: Public health practitioner • How are you getting on? • Are you using any other learning materials?

  3. Your Participation • Today: questions about your reading • Take notes on my comments • Prepare to reproduce exercises in R

  4. Course Material 2: R • Statistics: An Introduction Using R, Crawley, Wiley 2005 • cran.r-project.org • Reproduce each example in course text • Prepare to submit R scripts for assessment

  5. Course Material: Optional • Probability and Random Variables: a beginner’s guide, Stirzaker, Cambridge University Press 1999 • Bad Science, Goldacre, Fourth Estate Ltd, 2008

  6. Define • statistics • quantitative information about a topic • Statistics • The measurement of uncertainty

  7. The Statistical Movement • Circa 1900: Galton, Pearson, Edgeworth and Yule establish Statistics as a discipline • Early/mid 1900s: Fisher consolidates statistical methods and experimental philosophy

  8. Think • Whose perspective is Chapter 1? • Medical Statistician • Why must the Informatician look wider? • May not have the luxury of study design • Data- vs. hypothesis-driven research • Maximise information validity & utility

  9. Health Statistics 1600-1860 Reasoning Summarisation Knowledge Observation

  10. Health Statistics 1860-≈2000/now Reasoning Summarisation & Statistical Modelling Knowledge Observation ± Experimentation

  11. Early/mid 1900s: Greenwood, Bradford-Hill & Doll push Statistics into medical research Evidence Based Medicine Causality Clinical Trials Mid-late 1900s: Cochrane pushes for the routine application of randomised clinical trials and leaves the evidence based medicine movement in his wake Effectiveness & Efficiency

  12. Hypothesis-driven Research

  13. Define • Epidemiology • the study of the distribution and determinants of disease and health-related states in populations (JM Last, 2000)

  14. Define • Confounding factor • A factor associated with both exposure and outcome but not on the causal pathway about which the inference is being made • What confounded the cancer vs. water fluoridation example in the book?

  15. Causal Inference Exposure Outcome Causal pathway Association Confounder

  16. Sieving Associations • C = caffeine, MI = myocardial infarction (heart attack) • Disciplined approach to causal inference, Bradford-Hill: Criteria (temporality, strength, dose-response, consistency, plausibility, consideration of alternatives, open to experiment, specificity, coherence)

  17. Hard to Make a Confident Causal Inference • Plausible pathway to link outcome to exposure • Same results if repeated in a different time, place, person • Exposure precedes outcome • Strong relationship ± dose effect • Causal factor relates only to the outcome in question • Outcome falls if risk factor removed...

  18. Think • What is the most important question a Statistician wants a medic to ask? • How might I be wrong? • In designing my study • In making an inference about an association • In generalising my inference beyond the study population • Statisticians are understandably conservative • Informaticians must be carefully informative

  19. Exhausted Epidemiology Platform • Problem 1: Dwindling hits from tools to detect independent “causes” • Problem 2: Knowledge can’t be managed by reading papers any more • The big public health problems, e.g. Type 2 Diabetes, have “complex webs of causes” • The “data-set” and structure extend beyond the study’s observations

  20. Evidence limits showing • Epidemiology has exhausted the big simple causes of ill health • Many trials have weak external validity • Public health interventions are largely unstudied • Many patterns of ill health in society remain unexplained via conventional studies

  21. Need Statistical Informatics Data Necessary Complexity of Models Human Resource

  22. Define • Statistical Data-types & Measurement Scales • Categorical → Qualitative measuring • Binary/Dichotomous • Nominal > 2 categories, without order • Ordinal (loose) • Nominal with order • Ordinal (ties = lack of measurement sensitivity) • Numerical → Quantitative measuring • Counts • Continuous (any value in a range) • Interval (fixed and defined, meaningful mean difference) • Ratio (zero means something)

  23. Caution • Don’t treat ordered nominal data as interval! • Why? • Give examples? • Relate these to software requirements

  24. Programming Note • Which has the greater information utility? • Sex = 1|2 • Sex = m|f • Gender = m|f • Male = 1|0 • Gender_Male = 1|0 • Maximum information, minimum ambiguity: Gender_Male = 1|0
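As an illustration of the programming note on slide 24, the sketch below recodes an ambiguous `Sex = 1|2` field into a self-describing `Gender_Male = 1|0` flag. It is written in Python purely for illustration. The mapping of 1 to male and 2 to female is an assumption made here (the slide does not say which code means what, which is exactly the ambiguity it warns about), and the names `ASSUMED_SEX_CODES` and `to_gender_male` are hypothetical.

```python
# Hypothetical codebook: the slide does not state what the codes 1 and 2 mean.
ASSUMED_SEX_CODES = {1: "male", 2: "female"}


def to_gender_male(sex_code):
    """Recode an ambiguous Sex = 1|2 value into a Gender_Male = 1|0 flag."""
    label = ASSUMED_SEX_CODES.get(sex_code)
    if label is None:
        # Unknown or missing code: keep it missing rather than guess.
        return None
    return 1 if label == "male" else 0


# Three illustrative records, the last with a code that is not in the codebook.
records = [{"Sex": 1}, {"Sex": 2}, {"Sex": 9}]
recoded = [{"Gender_Male": to_gender_male(r["Sex"])} for r in records]
print(recoded)
```

The variable name now carries both the attribute and the meaning of the value 1, so a reader does not need a separate codebook to interpret it.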

  25. Discuss • Why categorise continuous data? • Meaningful thresholds (e.g. Hypertensive) • Compact summary / easy presentation • Easier analysis (good / bad?) • Avoid regression to the mean (homework)
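To make the trade-off on slide 25 concrete, here is a minimal Python sketch that categorises a continuous measurement at a threshold. The cut-off of 140 mmHg systolic and the readings are assumptions chosen for illustration only; neither comes from the slides.

```python
# Illustrative cut-off only; the slides do not define a threshold.
ASSUMED_THRESHOLD_MMHG = 140


def is_hypertensive(systolic_mmhg, threshold=ASSUMED_THRESHOLD_MMHG):
    """Return 1 if the reading is at or above the threshold, else 0."""
    return 1 if systolic_mmhg >= threshold else 0


# Hypothetical systolic blood pressure readings in mmHg.
readings = [118, 139, 141, 180]
flags = [is_hypertensive(x) for x in readings]
print(list(zip(readings, flags)))
```

The readings 139 and 141 differ by 2 mmHg yet land in different categories, while 141 and 180 differ by 39 mmHg and land in the same one. That is the information lost in exchange for a compact summary.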

  26. Think • What is audit? • A quality improvement process that seeks to improve a service through systematic review against explicit criteria and implementing change • How does this differ from research? • Ethics • Constrained design • What is a natural experiment? • Homework...

  27. Summarise Binary Data: r/n • Describe a proportion • r = outcome or feature present (numerator) • n = number of subjects observed (denominator) • p = r/n; RR = p1/p2; (A)RD = |p2-p1| • Relative Risk (RR) abuse • Pill ↑ risk DVT by (RR =) 2: statistically significant, clinically insignificant, 2 women in 10,000 pill-years
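The summaries on slide 27 can be sketched in a few lines of Python. The counts below are hypothetical, chosen so that the relative risk is 2 while the absolute difference stays tiny; they are not a reconstruction of the pill and DVT figures, which the slide does not give in full.

```python
def proportion(r, n):
    """p = r/n: subjects with the outcome over subjects observed."""
    if n <= 0:
        raise ValueError("n must be positive")
    return r / n


def relative_risk(p1, p2):
    """RR = p1/p2."""
    return p1 / p2


def absolute_risk_difference(p1, p2):
    """(A)RD = |p2 - p1|."""
    return abs(p2 - p1)


# Hypothetical counts: 4 events per 10,000 exposed, 2 per 10,000 unexposed.
p_exposed = proportion(4, 10_000)
p_unexposed = proportion(2, 10_000)

rr = relative_risk(p_exposed, p_unexposed)
ard = absolute_risk_difference(p_exposed, p_unexposed)
print(f"RR = {rr:.1f}, ARD = {ard * 10_000:.1f} per 10,000")
```

Reporting the relative risk alone ("risk doubled") hides how small the absolute difference is, which is the abuse the slide refers to.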

  28. Summarise Binary Data: r/n~t • Describe a rate • r = outcome/success/failure (numerator) • n = number of subjects observed (denominator) • t = time over which subjects observed • n*t = person time – why important? • Some may drop out or be lost to follow-up • (incidence) rate IR = r/(n*t), IRR • IRR = IR1/IR2; IRD = |IR2-IR1|
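A short Python sketch of why person-time matters on slide 28, taking the incidence rate as events per unit of person-time. Both cohorts and their follow-up times are hypothetical.

```python
def incidence_rate(events, follow_up_times):
    """Events divided by total person-time actually observed."""
    person_time = sum(follow_up_times)
    if person_time <= 0:
        raise ValueError("total person-time must be positive")
    return events / person_time


# Hypothetical cohorts: years of follow-up per subject.
# Two subjects in group 1 drop out early (after 2 years and 1 year).
group1_years = [5, 5, 2, 1, 5]  # 18 person-years
group2_years = [5, 5, 5, 5, 4]  # 24 person-years

ir1 = incidence_rate(3, group1_years)
ir2 = incidence_rate(2, group2_years)
irr = ir1 / ir2        # incidence rate ratio
ird = abs(ir2 - ir1)   # incidence rate difference

# Ignoring follow-up time and dividing by head count gives a different answer.
naive_ratio = (3 / len(group1_years)) / (2 / len(group2_years))
print(f"IRR = {irr:.2f}, IRD = {ird:.3f} per person-year, naive ratio = {naive_ratio:.2f}")
```

With these made-up numbers the head-count ratio is 1.5 but the rate ratio is 2, because group 1 was observed for less time.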

  29. [Figure: Percentage excess deaths in North vs. South England, by year, for males and females. Source: John Hacking & Iain Buchan, pre-publication 2009]

  30. Summarise Binary Data: Crosstabs • Variables C1-Ck – what is a crosstab? • Cross-tabulate categorical variables, say disease registration by gender • 2 by 2 → r by c tables • Usually two way or two dimensional • Models may need higher dimensions, say disease registration by gender by speciality • Is a data cube the same? • Data Cube: A relational aggregation operator generalizing group-by, crosstab, and subtotals
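A minimal sketch of a cross-tabulation as described on slide 30, using only the Python standard library. The records are hypothetical, and `crosstab` and `margin` are names invented for this example.

```python
from collections import Counter

# Hypothetical records: (disease registration status, gender) per person.
records = [
    ("registered", "female"),
    ("registered", "male"),
    ("not registered", "female"),
    ("registered", "female"),
    ("not registered", "male"),
]


def crosstab(rows):
    """Count how often each combination of category values occurs."""
    return Counter(rows)


def margin(table, dimension):
    """Collapse a crosstab onto one dimension (0 = first variable, 1 = second)."""
    totals = Counter()
    for key, count in table.items():
        totals[key[dimension]] += count
    return totals


table = crosstab(records)
row_totals = margin(table, 0)
column_totals = margin(table, 1)
print(table[("registered", "female")], dict(row_totals), dict(column_totals))
```

Because each cell is keyed by a tuple, adding a third variable such as speciality only lengthens the key, and `margin` then plays the role of the subtotals that the data cube definition mentions.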

  31. Contingency Table
                                                Dimension 1: Exposure/Treatment/Category 1
                                                Present     Absent
      Dimension 2: Outcome/Status/Category 2
        Present                                 a           b
        Absent                                  c           d

  32. Summarise Binary Data: Odds • How do odds differ from risk/proportion/probability? • Ratio of occurrence to non-occurrence • Odds = p/(1-p) • OR = (a/c)/(b/d) = ad/bc • p = a/(a+c), so if a<<c then a/(a+c) ≈ a/c and OR ≈ RR • OR_success = 1/OR_failure, not so for RR • Tractable computation with log odds
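The odds summaries on slide 32 can be checked numerically with the Python sketch below. It takes odds as the ratio of occurrence to non-occurrence, p/(1-p), and reads the 2 by 2 cells as a and c for the exposed (with and without the outcome) and b and d for the unexposed, as implied by p = a/(a+c). The cell counts are hypothetical.

```python
import math


def odds(p):
    """Odds = p/(1-p): occurrence relative to non-occurrence."""
    return p / (1 - p)


def odds_ratio(a, b, c, d):
    """OR = (a/c)/(b/d) = ad/bc."""
    return (a * d) / (b * c)


# Hypothetical counts with a rare outcome (a << c and b << d).
a, b, c, d = 10, 5, 990, 995

or_outcome = odds_ratio(a, b, c, d)
rr_outcome = (a / (a + c)) / (b / (b + d))

# Swapping the outcome rows gives the odds ratio for NOT having the outcome.
or_no_outcome = odds_ratio(c, d, a, b)
rr_no_outcome = (c / (a + c)) / (d / (b + d))

log_or = math.log(or_outcome)
print(f"OR = {or_outcome:.3f}, RR = {rr_outcome:.3f}, log OR = {log_or:.3f}")
```

Here `or_outcome * or_no_outcome` equals 1, so the two log odds ratios differ only in sign, whereas `rr_outcome * rr_no_outcome` does not equal 1. With a rare outcome the odds ratio and relative risk are close.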

  33. Caution • If the odds ratio is interpreted as a relative risk it will always overstate any effect size: the odds ratio is smaller than the relative risk for odds ratios of less than one, and bigger than the relative risk for odds ratios of greater than one • The extent of overstatement increases as both the initial risk increases and the odds ratio departs from unity • However, serious divergence between the odds ratio and the relative risk occurs only with large effects on groups at high initial risk. Therefore qualitative judgments based on interpreting odds ratios as though they were relative risks are unlikely to be seriously in error • In studies which show reductions in risk (odds ratios of less than one), the odds ratio will never underestimate the relative risk by a greater percentage than the level of initial risk • In studies which show increases in risk (odds ratios of greater than one), the odds ratio will be no more than twice the relative risk so long as the odds ratio times the initial risk is less than 100%
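The caution on slide 33 can be explored with a small Python sketch. It uses the algebraic identity RR = OR / (1 - p0 + p0·OR), where p0 is the initial (baseline) risk. The identity follows from the definitions of odds and risk but is not stated on the slide, and the function name and the values tried are illustrative.

```python
def rr_from_or(odds_ratio, baseline_risk):
    """Relative risk implied by an odds ratio at a given baseline risk p0.

    Uses RR = OR / (1 - p0 + p0 * OR), derived from odds = p/(1-p).
    """
    if not 0 < baseline_risk < 1:
        raise ValueError("baseline_risk must be strictly between 0 and 1")
    return odds_ratio / (1 - baseline_risk + baseline_risk * odds_ratio)


# Same odds ratio, increasing baseline risk: the gap between OR and RR widens.
for p0 in (0.01, 0.10, 0.50):
    print(f"p0 = {p0:.2f}: OR = 2.0 -> RR = {rr_from_or(2.0, p0):.3f}")

# For a protective effect the odds ratio sits further below 1 than the RR.
print(f"p0 = 0.50: OR = 0.5 -> RR = {rr_from_or(0.5, 0.5):.3f}")
```

At a low baseline risk the two measures nearly coincide. At a baseline risk of 0.5 an odds ratio of 2 corresponds to a relative risk of about 1.33, so reading the odds ratio as a relative risk would overstate the effect.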

  34. Visualise Categorical Data • When is a pie chart useful? • Seldom: arguably only in metaphor • How do you add dimensions to a bar chart? • Cluster • When is a 3D effect useful? • Not in 2D concepts! • Showing additional dimensions e.g. 2nd level cluster

  35. What is arguably wrong with this visualisation?

  36. Preparation for 15 Feb • Read chapters 4,5,6 to understand natural distributions and sampling • Return to chapter 3, run the examples in R and generate some alternative examples • Prepare to show ideal visualisations and summaries with your R scripts
