Iain Buchan University of Manchester buchan@man.ac.uk

Statistical Methods for Health Intelligence. Lecture 3: Exploring quantitative data, distributions, sampling and basic probability.

Presentation Transcript


  1. Statistical Methods for Health Intelligence. Lecture 3: Exploring quantitative data, distributions, sampling and basic probability. Iain Buchan, University of Manchester, buchan@man.ac.uk

  2. Data: singular or plural? • Data are (plural) • One measurement produces a datum or data point • Neuter past participle of Latin dare (to give): “something given” • Data is (singular) • Sloppy grammar? • Evolution of the word to incorporate structure as well as content in a set of data points, thus referring to the whole in the singular

  3. Is this an accurate description of the location?

     Motorcyclist Death Ages

     Counts  Mid-points
        1      75|========
        1      60|========
        3      45|======================
        8      30|============================================================
        7      15|====================================================
                 /+--------------+--------------+--------------+--------------+
                  0              2              4              6              8

     Central location measured by?...

     sort x()
     percent = 0.5
     index = percent * (length + 1)
     If index < 1 then index = 1
     If index > length then index = length
     If index - fix(index) = 0 then
         quantile = x(index)
     else
         quantile = x(fix(index)) + (x(fix(index) + 1) - x(fix(index))) * (index - fix(index))
     end if

     Yes, the median is robust to outliers and asymmetry

  4. Median = 24, mode = 24, mean = 30.9 for the ages 15,16,16,18,20,20,21,24,24,24,24,28,31,32,33,41,44,52,64,71
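The three measures of central location for these ages can be checked with Python's standard library. This is a minimal sketch; the variable names are illustrative and not from the slides.

```python
import statistics

# Ages as listed on the slide (already sorted).
ages = [15, 16, 16, 18, 20, 20, 21, 24, 24, 24,
        24, 28, 31, 32, 33, 41, 44, 52, 64, 71]

mean_age = statistics.mean(ages)      # arithmetic mean, pulled upward by the long right tail
median_age = statistics.median(ages)  # average of the 10th and 11th sorted values
mode_age = statistics.mode(ages)      # most frequent value

print(mean_age, median_age, mode_age)
```

The mean sits well above the median and mode because a few old ages stretch the right tail.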

  5. Where is the upper quartile?

     Rank:  01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18 19 20
     Age:   15 16 16 18 20 20 21 24 24 24 24 28 31 32 33 41 44 52 64 71

     • sort x()
     • percent = 0.75
     • index = percent * (length + 1)
     • If index < 1 then index = 1
     • If index > length then index = length
     • If index - fix(index) = 0 then
     •     quantile = x(index)
     • else
     •     quantile = x(fix(index)) + (x(fix(index) + 1) - x(fix(index))) * (index - fix(index))
     • end if

     0.75 * (20+1) = 15.75, so the upper quartile is 75% of the way from x(15) = 33 to x(16) = 41: 33 + 0.75 * (41 - 33) = 39
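The quantile pseudocode on the slide translates fairly directly into Python. The sketch below assumes `fix` means truncation toward zero (equivalent to `floor` here because the index is at least 1) and shifts the slide's 1-based positions to Python's 0-based lists. The function name `quantile` is illustrative.

```python
import math

def quantile(values, percent):
    """Quantile by linear interpolation at position percent * (n + 1), 1-based."""
    x = sorted(values)
    n = len(x)
    index = percent * (n + 1)
    index = min(max(index, 1), n)   # clamp the position to the range 1..n
    lower = math.floor(index)       # the slide's fix(index)
    fraction = index - lower
    if fraction == 0:
        return float(x[lower - 1])
    # Move 'fraction' of the way from the lower neighbour to the upper one.
    return x[lower - 1] + (x[lower] - x[lower - 1]) * fraction

ages = [15, 16, 16, 18, 20, 20, 21, 24, 24, 24,
        24, 28, 31, 32, 33, 41, 44, 52, 64, 71]

median = quantile(ages, 0.5)
upper_quartile = quantile(ages, 0.75)
print(median, upper_quartile)
```

Other software may use a different interpolation rule, so quartiles from other packages can differ slightly from this definition.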

  6. What is the direction of skew? Skewness = 1.28 Positive or right skew The tail wags the dog
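A skewness coefficient can be computed from the second and third central moments. The sketch below assumes the slide's figure refers to the 20 ages listed on the earlier slides and uses the simple moment coefficient m3 / m2^1.5; statistical packages may apply a small-sample correction that gives a slightly different value.

```python
def skewness(values):
    """Moment coefficient of skewness: third central moment over m2 ** 1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n   # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n   # third central moment
    return m3 / m2 ** 1.5

ages = [15, 16, 16, 18, 20, 20, 21, 24, 24, 24,
        24, 28, 31, 32, 33, 41, 44, 52, 64, 71]

age_skew = skewness(ages)
print(round(age_skew, 2))  # positive value: right skew
```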

  7. BIRTHWEIGHT
     -------------------------------------------------------------
           Percentiles      Smallest
      1%        1810             143
      5%        2460             313
     10%        2720             336       Obs               27676
     25%        3100             369       Sum of Wgt.       27676
     50%        3440                       Mean           3410.968
                             Largest       Std. Dev.       560.272
     75%        3770            5580
     90%        4100            5700       Variance       313904.8
     95%        4280            5720       Skewness      -.4359693
     99%        4640            5900       Kurtosis       4.144782

  8.                            BIRTHWEIGHT    Normal (mean = 0, sd = 1)
     Valid data                 27676          27000
     Missing data               67             0
     Sum                        94401963       221.85
     Mean                       3410.968       0.008
     Variance                   313904.761     1.003
     Standard deviation         560.272        1.002
     Variance coefficient       0.164          121.909
     Standard error of mean     3.368          0.006
     Upper 95% CL of mean       3417.57        0.02
     Lower 95% CL of mean       3404.367       -0.004
     Geometric mean             3359.111       *
     Skewness                   -0.436         -0.017
     Kurtosis                   4.145          3.044
     Maximum                    5900           4.001
     Upper quartile             3770           0.686
     Median                     3440           0.009
     Lower quartile             3100           -0.655
     Minimum                    143            -3.931
     Range                      5757           7.931
     Centile 95                 4280           1.644
     Centile 5                  2460           -1.65

  9. Which way around should the axes be? What is this plot? Would it be appropriate?

  10. Name some different types of probability • Frequency • Model-based • Subjective

  11. Disease or test?

                    Disease +ve   Disease -ve
      Test +ve           a             b
      Test -ve           c             d

      Which statistics reflect the test?
      Sensitivity = a/(a+c)
      Specificity = d/(b+d)
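As a check on the 2 by 2 layout, sensitivity and specificity can be computed from the cell counts. The sketch below assumes a = true positives, b = false positives, c = false negatives and d = true negatives, and borrows the counts from the ROC cut-off table later in the deck (a = 30, b = 5, c = 12, d = 111). Function names are illustrative.

```python
def sensitivity(a, b, c, d):
    """Proportion of diseased subjects who test positive: a / (a + c)."""
    return a / (a + c)

def specificity(a, b, c, d):
    """Proportion of non-diseased subjects who test negative: d / (b + d)."""
    return d / (b + d)

# Cell counts from the ROC cut-off table on a later slide.
a, b, c, d = 30, 5, 12, 111

sens = sensitivity(a, b, c, d)
spec = specificity(a, b, c, d)
print(round(sens, 2), round(spec, 2))
```

Both statistics condition on true disease status, which is why they describe the test rather than the population it is applied to.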

  12. Predictive values reflect the disease • What is the multiplication rule? • p(A and B) = p(A|B) p(B) • p(A|B) p(B) = p(B|A) p(A) • So what did the Reverend Thomas Bayes say about prior assessment of chances? • p(B|A) = [p(A|B) p(B)] / p(A) • Apply this to the positive predictive value of a test • p(D+|T+) = [p(T+|D+) p(D+)] / p(T+) • PPV = [sensitivity * prevalence] / probability of a +ve test
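Bayes' rule for the positive predictive value can be sketched in a few lines. The probability of a positive test is expanded here with the law of total probability, p(T+) = sensitivity * prevalence + (1 - specificity) * (1 - prevalence). The test characteristics and prevalences below are hypothetical values chosen only to show how strongly PPV depends on prevalence.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """p(D+|T+) = p(T+|D+) p(D+) / p(T+), by Bayes' rule."""
    p_test_positive = (sensitivity * prevalence
                       + (1 - specificity) * (1 - prevalence))
    return sensitivity * prevalence / p_test_positive

# Hypothetical test: 90% sensitive and 90% specific.
ppv_rare = positive_predictive_value(0.9, 0.9, prevalence=0.01)
ppv_common = positive_predictive_value(0.9, 0.9, prevalence=0.5)
print(round(ppv_rare, 3), round(ppv_common, 3))
```

The same test gives a much lower PPV when the disease is rare, which is the sense in which predictive values reflect the disease.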

  13. Odds before × likelihood ratio = odds after
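The odds form of Bayes' rule can be illustrated with the counts from the ROC cut-off table later in the deck (a = 30, b = 5, c = 12, d = 111). This is a sketch under the assumption that "likelihood ratio" means the positive likelihood ratio, sensitivity / (1 - specificity); variable names are illustrative.

```python
# Cell counts: a = true +ve, b = false +ve, c = false -ve, d = true -ve.
a, b, c, d = 30, 5, 12, 111

sens = a / (a + c)
spec = d / (b + d)

odds_before = (a + c) / (b + d)        # prevalence odds: diseased vs not diseased
likelihood_ratio = sens / (1 - spec)   # positive likelihood ratio
odds_after = odds_before * likelihood_ratio

# Convert the post-test odds back to a probability (the PPV).
prob_after = odds_after / (1 + odds_after)
print(round(likelihood_ratio, 2), round(odds_after, 2), round(prob_after, 3))
```

The post-test odds equal a / b, so the resulting probability matches the PPV read directly from the table, a / (a + b).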

  14. SDI conceived (n = 42): 159, 136, 149, 156, 191, 169, 194, 182, 163, 152, 145, 176, 122, 141, 172, 162, 165, 184, 239, 178, 178, 164, 185, 154, 164, 140, 207, 214, 165, 183, 218, 142, 161, 168, 181, 162, 166, 150, 205, 163, 166, 176

      SDI not conceived (n = 116): 165, 140, 154, 139, 134, 154, 120, 133, 150, 146, 140, 114, 128, 131, 116, 128, 122, 129, 145, 117, 140, 149, 116, 147, 125, 149, 129, 157, 144, 123, 107, 129, 152, 164, 134, 120, 148, 151, 149, 138, 159, 169, 137, 151, 141, 145, 135, 135, 153, 125, 159, 148, 142, 130, 111, 140, 136, 142, 139, 137, 187, 154, 151, 149, 148, 157, 159, 143, 124, 141, 114, 136, 110, 129, 145, 132, 125, 149, 146, 138, 151, 147, 154, 147, 158, 156, 156, 128, 151, 138, 193, 131, 127, 129, 120, 159, 147, 159, 156, 143, 149, 160, 126, 136, 150, 136, 151, 140, 145, 140, 134, 140, 138, 144, 140, 140

      ROC Curve Analysis
      Data set: SDI conceived(+ve), SDI not conceived(-ve)
      Area under ROC curve by extended trapezoidal rule = 0.88
      Wilcoxon estimate of area under ROC curve = 0.88
      DeLong standard error = 0.03: 95% CI = 0.81 to 0.94
      Optimum cut-off point selected = 161
      Table at cut-off:
        a = 30    b = 5
        c = 12    d = 111
      sensitivity (95% CI) = 0.71 (0.55 to 0.84)
      specificity (95% CI) = 0.96 (0.9 to 0.99)
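The cut-off table and a Wilcoxon-type area under the ROC curve can be recomputed from the listed values. The sketch below rests on two assumptions: that the flattened listing pairs the two columns for the first 42 rows and continues with "not conceived" values only (which reproduces the group sizes 42 and 116 in the table), and that a value at or above the cut-off counts as test positive. The AUC is estimated as the proportion of (conceived, not conceived) pairs in which the conceived value is larger, counting ties as half.

```python
conceived = [
    159, 136, 149, 156, 191, 169, 194, 182, 163, 152, 145, 176, 122, 141,
    172, 162, 165, 184, 239, 178, 178, 164, 185, 154, 164, 140, 207, 214,
    165, 183, 218, 142, 161, 168, 181, 162, 166, 150, 205, 163, 166, 176,
]
not_conceived = [
    165, 140, 154, 139, 134, 154, 120, 133, 150, 146, 140, 114, 128, 131,
    116, 128, 122, 129, 145, 117, 140, 149, 116, 147, 125, 149, 129, 157,
    144, 123, 107, 129, 152, 164, 134, 120, 148, 151, 149, 138, 159, 169,
    137, 151, 141, 145, 135, 135, 153, 125, 159, 148, 142, 130, 111, 140,
    136, 142, 139, 137, 187, 154, 151, 149, 148, 157, 159, 143, 124, 141,
    114, 136, 110, 129, 145, 132, 125, 149, 146, 138, 151, 147, 154, 147,
    158, 156, 156, 128, 151, 138, 193, 131, 127, 129, 120, 159, 147, 159,
    156, 143, 149, 160, 126, 136, 150, 136, 151, 140, 145, 140, 134, 140,
    138, 144, 140, 140,
]

def table_at_cutoff(pos, neg, cutoff):
    """Return (a, b, c, d), treating values >= cutoff as test positive."""
    a = sum(v >= cutoff for v in pos)   # true positives
    b = sum(v >= cutoff for v in neg)   # false positives
    c = len(pos) - a                    # false negatives
    d = len(neg) - b                    # true negatives
    return a, b, c, d

def auc_wilcoxon(pos, neg):
    """Proportion of pos/neg pairs where pos > neg; ties count as half."""
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

a, b, c, d = table_at_cutoff(conceived, not_conceived, 161)
auc = auc_wilcoxon(conceived, not_conceived)
print((a, b, c, d), round(a / (a + c), 2), round(d / (b + d), 2), round(auc, 2))
```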

  15. Binomial • Describe the characteristics… • Random variable, r (successes) • Fixed variable, n (trials) (other way around for a negative binomial) • Population response rate, π (probability of success on a single trial) (does not change) • Variance = nπ(1-π) • Jakob Bernoulli (1713, posth.)
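The binomial variance formula can be checked numerically by summing over the probability mass function. In this sketch the number of trials and the success probability are hypothetical, and the symbol `pi` stands for the population response rate, not the constant 3.14159.

```python
from math import comb

def binomial_pmf(r, n, pi):
    """Probability of exactly r successes in n independent trials."""
    return comb(n, r) * pi ** r * (1 - pi) ** (n - r)

n, pi = 20, 0.3   # hypothetical: 20 trials, 30% response rate
probs = [binomial_pmf(r, n, pi) for r in range(n + 1)]

mean = sum(r * p for r, p in enumerate(probs))
variance = sum((r - mean) ** 2 * p for r, p in enumerate(probs))
print(round(mean, 6), round(variance, 6))   # compare with n*pi and n*pi*(1-pi)
```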

  16. Poisson • Siméon-Denis Poisson (1781 to 1840) • Describe the characteristics… • Random variable, r (events) • Population event rate, μ (mean rate of events per unit time) • Fixed period of time • Independent events • Random events in time and space • Variance = mean, μ too
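That the Poisson variance equals its mean can be verified numerically from the probability mass function. The event rate below is hypothetical, and the sum is truncated at 60 events, beyond which the probabilities are negligible for this rate.

```python
from math import exp, factorial

def poisson_pmf(r, mu):
    """Probability of exactly r events when the expected count is mu."""
    return exp(-mu) * mu ** r / factorial(r)

mu = 4.0   # hypothetical expected number of events in the fixed period
probs = [poisson_pmf(r, mu) for r in range(60)]

mean = sum(r * p for r, p in enumerate(probs))
variance = sum((r - mean) ** 2 * p for r, p in enumerate(probs))
print(round(mean, 6), round(variance, 6))   # both should be close to mu
```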

  17. Normal • Adrain 1808, Gauss 1809, Laplace 1778 (central limit) • Describe the characteristics… • Random variable, x (continuous) • Mean, μ • Standard deviation, σ • Standard Normal Distribution: μ = 0, σ = 1 • Standardized normal deviate, z = (x - μ)/σ
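A standardized normal deviate rescales a value to the standard Normal distribution. The sketch below uses the birthweight mean, standard deviation and observed 95th centile from the earlier summary slides; treating birthweight as Normal is an assumption made only for illustration.

```python
from statistics import NormalDist

def standardize(x, mu, sigma):
    """Standardized normal deviate: distance from the mean in SD units."""
    return (x - mu) / sigma

mu, sigma = 3410.968, 560.272   # birthweight mean and SD from the summary slides
z = standardize(4280, mu, sigma)   # 4280 is the observed 95th centile

# Proportion of a Normal distribution lying below this deviate.
p_below = NormalDist().cdf(z)
print(round(z, 3), round(p_below, 3))
```

Under an exactly Normal distribution the 95th centile would sit at about z = 1.645, so the smaller deviate here is one sign that birthweight is not perfectly Normal.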

  18. The distribution of the sample mean tends to Normal as the sample size grows, even when the distribution from which the mean is computed is non-Normal. Its mean is the same as the mean of the parent distribution. Its variance = parent variance / sample size
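A small simulation illustrates the behaviour of the sample mean. This sketch draws repeated samples from an exponential distribution (strongly right skewed, with mean 1 and variance 1); the sample size, number of replicates and seed are arbitrary choices.

```python
import random
import statistics

random.seed(1)      # fixed seed so the run is repeatable
sample_size = 30    # observations per sample
n_samples = 5000    # number of repeated samples

sample_means = []
for _ in range(n_samples):
    sample = [random.expovariate(1.0) for _ in range(sample_size)]
    sample_means.append(statistics.fmean(sample))

mean_of_means = statistics.fmean(sample_means)
var_of_means = statistics.pvariance(sample_means)

# Parent mean is 1; parent variance / sample size is 1 / 30.
print(round(mean_of_means, 3), round(var_of_means, 4), round(1 / sample_size, 4))
```

A histogram of `sample_means` would look close to Normal even though the parent distribution is far from it.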

  19. Where should 95% of the values lie if drawn from a Normal distribution? Define “confidence interval”
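For a Normal distribution, the central 95% of values lie within about 1.96 standard deviations of the mean. A minimal sketch using the standard library confirms the multiplier:

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

# The deviate that leaves 2.5% in the upper tail.
z = standard_normal.inv_cdf(0.975)

# Proportion of values between -z and +z.
coverage = standard_normal.cdf(z) - standard_normal.cdf(-z)
print(round(z, 3), round(coverage, 3))
```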

  20. SD or SE? • Describing the variability in a sample • SD • Describing the precision of the sample mean as an estimate of the population mean • SE • CI
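The distinction can be made concrete with the birthweight summary from the earlier slides: the SD describes spread among babies, while the SE and confidence interval describe how precisely the mean is estimated. The sketch below assumes a Normal approximation with the multiplier 1.96.

```python
import math

# Birthweight summary values from the earlier slides.
n = 27676
mean = 3410.968
sd = 560.272

se = sd / math.sqrt(n)      # standard error of the mean
lower = mean - 1.96 * se    # lower 95% confidence limit
upper = mean + 1.96 * se    # upper 95% confidence limit
print(round(se, 3), round(lower, 3), round(upper, 3))
```

With more than 27000 observations the SE is tiny compared with the SD, so the interval for the mean is narrow even though individual birthweights vary widely.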

  21. Proportion confidence intervals

      Total = 15, response = 1
      Proportion = 0.066667
      Exact (Clopper-Pearson) 95% confidence interval = 0.001686 to 0.319485
      Using null hypothesis that the population proportion equals 0.5
      Binomial one sided P = 0.0005
      Binomial two sided P = 0.001
      Approximate (Wilson) 95% mid-P confidence interval = 0.011867 to 0.298165
      Binomial one sided mid-P = 0.0003
      Binomial two sided mid-P = 0.0005

      Total = 21, response = 0
      Proportion = 0
      Exact (Clopper-Pearson) 95% confidence interval = 0 to 0.161098 [97.5% one-sided CI]
      Using null hypothesis that the population proportion equals 0.5
      Binomial one sided P < 0.0001
      Binomial two sided P < 0.0001
      Approximate (Wilson) 95% mid-P confidence interval = 0 to 0.154639
      Binomial one sided mid-P < 0.0001
      Binomial two sided mid-P < 0.0001

      See: Newcombe R. Two sided confidence intervals for the single proportion: a comparative evaluation of seven methods. Statistics in Medicine 1998;17:857-872.
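The exact (Clopper-Pearson) limits and the exact binomial P value can be reproduced with a short root-finding sketch. It assumes the usual definition, in which each limit is the proportion at which the relevant binomial tail probability equals 0.025, and solves for it by bisection. All function names are illustrative.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for a binomial(n, p) variable."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k + 1))

def bisect_root(f, lo, hi, tol=1e-10):
    """Root of f on [lo, hi] by bisection; f(lo) and f(hi) must differ in sign."""
    f_lo = f(lo)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return (lo + hi) / 2

def clopper_pearson(x, n, alpha=0.05):
    """Exact two-sided confidence limits for a proportion x / n."""
    if x == 0:
        lower = 0.0
    else:
        # Lower limit: P(X >= x | p) = alpha / 2
        lower = bisect_root(lambda p: 1 - binom_cdf(x - 1, n, p) - alpha / 2, 0.0, 1.0)
    if x == n:
        upper = 1.0
    else:
        # Upper limit: P(X <= x | p) = alpha / 2
        upper = bisect_root(lambda p: binom_cdf(x, n, p) - alpha / 2, 0.0, 1.0)
    return lower, upper

lower_1, upper_1 = clopper_pearson(1, 15)
lower_0, upper_0 = clopper_pearson(0, 21)

# Exact one-sided P for 1 response in 15 under a null proportion of 0.5.
p_one_sided = binom_cdf(1, 15, 0.5)
p_two_sided = min(1.0, 2 * p_one_sided)
print(round(lower_1, 6), round(upper_1, 6), round(upper_0, 6), round(p_one_sided, 4))
```

When no responses are observed the lower limit is fixed at 0, so the upper limit is effectively a one-sided 97.5% bound, as the output on the slide notes.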

  22. Tempted to play with algorithms? • Low-level: www.boost.org • High-level: cran.r-project.org
