Introduction to Medical Statistics
Jan Klaschka, 1st Faculty of Medicine, Charles University
Statistical approach
Statistics deals with phenomena that occur repeatedly. In medicine, statistics does not cure individual patients but, by summarizing medical experience, helps to find ways to cure people better. Statistics is an important tool of evidence-based medicine.
STATISTICS: Definitions
From status (Latin) – state of things, or state as a political entity
• The science of data
• The art of dealing with variation, uncertainty and imprecise information
• The study of populations
STATISTICS: Applications
• Biology, medicine, agriculture
• Education, psychology, sociology
• Business, marketing, economics
medical statistics, biostatistics, biometrics
BIOSTATISTICS
Bios (Greek) ... life
• Biological laboratory experiments
• Medical research
  • evaluation of therapeutic effects of new treatments (clinical research)
  • assessment of possible risk factors of diseases (epidemiological studies)
• Health services research
Uncertainty
Similar situations do not always result in the same outcome.
Smoking and lung cancer:
• Both a smoker and a nonsmoker may or may not get lung cancer (LC).
• Does it mean that smoking is not injurious to health?
• Incidence of LC in smokers is much higher (approx. 16% vs. 1%).
• Smokers have a higher risk of LC.
Uncertainty
Similar situations do not always result in the same outcome.
Smoking and lung cancer (cont.):
• Does higher risk of LC in smokers mean that smoking causes LC?
• Higher LC risk in smokers might result from a confounding factor that predisposes to both nicotine addiction and LC.
• Especially in non-experimental situations, causal interpretation of a statistical association requires care, and often additional evidence.
Reading the medical literature
Physicians depend heavily on the medical literature to stay up to date. Medical scientific journals and books are, however, full of statistical reasoning (mostly via statistical tests) that cannot be well understood without some statistical knowledge.
STATISTICS
• Descriptive statistics: summarising data
  • means, standard deviations, proportions...
  • graphs, plots, tables
• Statistical inference: general conclusions
  • hypothesis testing, p-values
  • estimates, confidence intervals
• Design of experiments
  • randomization, stratification, matching...
Descriptive Statistics
Purpose: to describe a given sample
• Mean, standard deviation
• Median, quartiles
• Minimum, maximum
• Proportion, percentage
• Histogram, bar chart, pie chart, scatter plot...
Arithmetic mean
x̄ = (165 + 176 + 152 + 194 + 171) / 5 = 171.6 cm
Heights: 194 cm, 176 cm, 171 cm, 165 cm, 152 cm
Median
Midpoint: half the data are greater than it, half are less. The items must be ordered!
Ordered heights: 194 cm, 176 cm, 171 cm, 165 cm, 152 cm; the median is 171 cm.
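Minimum
Heights: 194 cm, 176 cm, 171 cm, 165 cm, 152 cm; the minimum is 152 cm.
Maximum
Heights: 194 cm, 176 cm, 171 cm, 165 cm, 152 cm; the maximum is 194 cm.
Standard deviation
Heights: 194 cm, 176 cm, 171 cm, 165 cm, 152 cm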
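The descriptive measures for the five heights can be checked with a short Python sketch using only the standard library. It assumes the standard deviation meant here is the sample standard deviation (divisor n − 1), which the slide does not state; the variable names are illustrative.

```python
import statistics

# The five heights (cm) from the slides.
heights_cm = [165, 176, 152, 194, 171]

mean_cm = statistics.mean(heights_cm)      # arithmetic mean
median_cm = statistics.median(heights_cm)  # middle value of the ordered data
min_cm = min(heights_cm)
max_cm = max(heights_cm)
# Assumption: sample standard deviation (divides by n - 1).
sd_cm = statistics.stdev(heights_cm)

print(f"mean = {mean_cm:.1f} cm, median = {median_cm} cm")
print(f"min = {min_cm} cm, max = {max_cm} cm, SD = {sd_cm:.1f} cm")
```

Note that `statistics.median` sorts the data itself, so the list does not have to be ordered beforehand.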
Proportion of blood group A
Sample of blood groups: B, 0, A, AB, A, 0, B, 0, AB, A, A, 0
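A proportion is simply a count divided by the sample size. The sketch below assumes that the twelve letters on the slide are the whole sample; the variable names are illustrative.

```python
from collections import Counter

# Blood groups as listed on the slide (assumed to be the complete sample).
blood_groups = ["B", "0", "A", "AB", "A", "0", "B", "0", "AB", "A", "A", "0"]

counts = Counter(blood_groups)
proportion_a = counts["A"] / len(blood_groups)

print(f"group A: {counts['A']} of {len(blood_groups)} "
      f"= {proportion_a:.2f} ({proportion_a:.0%})")
```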
Statistical Inference
Purpose: to generalize from sample to population
• Population: a large set of items that have something in common
  • not accessible
  • contains too many items
• Sample: a subset of the population
  • must be representative
There must always be some uncertainty in any general conclusion.
Statistical Inference
Two approaches
• Confidence intervals
• Hypothesis testing
Proportion of blood group A in the Czech Republic?
• Population (n = 10,000,000): the proportion p is unknown
• Random sample (n = 100): gives a point estimate for the population
• How precise is the estimate?
• Another random sample (n = 100)
Proportion of blood group A in the Czech Republic?
Repeated random samples (n = 100) from the population (n = 10,000,000):
• 1st sample: p = 0.38
• 2nd sample: p = 0.34
• 3rd sample: p = 0.31
• 4th sample: p = 0.40
• ...
• k-th sample: p = 0.35
The estimates have variability. There is some uncertainty about the population value!
SE ... standard error
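The variability of the estimates can be illustrated by simulation. The sketch below is only an illustration: the "true" proportion of 0.35 is an assumed value (the slides do not give the population value), and the population of 10 million is approximated by independent draws.

```python
import random
import statistics

def sample_proportion(true_p, n, rng):
    """Draw one random sample of size n and return the observed proportion."""
    hits = sum(1 for _ in range(n) if rng.random() < true_p)
    return hits / n

TRUE_P = 0.35   # assumed population proportion, hypothetical
N = 100         # sample size, as on the slide
rng = random.Random(0)  # fixed seed so the run is repeatable

# Repeat the sampling many times and look at the spread of the estimates.
estimates = [sample_proportion(TRUE_P, N, rng) for _ in range(1000)]
empirical_se = statistics.stdev(estimates)
theoretical_se = (TRUE_P * (1 - TRUE_P) / N) ** 0.5

print("first five estimates:", estimates[:5])
print(f"spread of estimates (empirical SE): {empirical_se:.3f}")
print(f"sqrt(p(1-p)/n) (theoretical SE):    {theoretical_se:.3f}")
```

The standard deviation of the simulated estimates should come out close to the theoretical standard error, which is the quantity abbreviated SE on the slide.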
Proportion of blood group A in the Czech Republic?
• We take just 1 sample, n = 100
• Estimate: p = 0.38
• SE of the estimate
• Distribution
95% CONFIDENCE INTERVAL for p (proportion of blood group A in the population):
estimate ± critical value of the distribution * SE
0.38 ± 1.96 * 0.05
from 0.28 to 0.48
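The interval above can be recomputed as follows. This is a minimal sketch assuming the usual normal approximation for a proportion, with SE = sqrt(p(1 − p)/n); the function name `proportion_ci` is illustrative.

```python
import math
from statistics import NormalDist

def proportion_ci(p_hat, n, confidence=0.95):
    """Normal-approximation confidence interval for a proportion."""
    # Critical value of the standard normal distribution (about 1.96 for 95%).
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    # Standard error of the estimated proportion.
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z * se, p_hat + z * se, se

low, high, se = proportion_ci(0.38, 100)
print(f"SE = {se:.3f}, 95% CI from {low:.2f} to {high:.2f}")
```

The computed SE is about 0.049, which the slide rounds to 0.05; the interval agrees with the slide after rounding to two decimals.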
Confidence intervals
• Take a random sample from a population
• Calculate the estimate
• Calculate the standard error
• Make an assumption about the distribution
• Construct the confidence interval
• With the chosen confidence, the interval covers the "real" value
Test of statistical hypothesis
• Make a statement about a population parameter: the hypothesis
• Collect data (random sample from the population)
• Calculate the test statistic T (from the data)
• Is T in concordance with the hypothesis?
  • Yes: "accept" the hypothesis
  • No: reject the hypothesis
Test of statistical hypothesis
Scientific hypothesis: drug A lowers systolic BP in diabetic patients with hypertension aged 40-60 years
Statistical hypothesis H: μ_before = μ_after
• Take a sample of patients with hypertension (n = 100)
• Measure systolic BP (BPS) before treatment
• Give them drug A
• Measure BPS after treatment
• Variation of differences
• Test statistic T = 3.1 (distribution), p = 0.0025
• Provided H holds, this result is very rare, so we reject H
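The step from the test statistic to the p-value can be sketched in Python. The sketch assumes a paired t-test with n − 1 = 99 degrees of freedom and a two-sided p-value; the slide names neither the test nor the degrees of freedom, so both are assumptions. The helper functions are illustrative and use only the standard library.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def two_sided_p(t_stat, df, steps=2000):
    """Two-sided p-value: probability of |T| >= |t_stat| if H holds.

    The central area between 0 and |t_stat| is integrated with
    Simpson's rule (steps must be even), then doubled and subtracted from 1.
    """
    b = abs(t_stat)
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_half = total * h / 3
    return 1 - 2 * central_half

# Assumption: paired design with n = 100, hence df = 99.
p_value = two_sided_p(3.1, df=99)
print(f"p = {p_value:.4f}")
```

Under these assumptions the result is about 0.0025, in line with the slide.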
Statistical methods in medical research: some examples
• Data description
• T-test
• χ² test
• Sample size calculation
• Correlation and regression
• Analysis of variance
• Survival analysis
Example 1. Descriptive statistics
We had a sample of 902 patients with a lipid disorder. For each patient we measured cholesterol levels (total cholesterol, LDL-cholesterol and HDL-cholesterol) and determined the ApoE genotype. How can this data set be described?
Example 2. T-test
The investigation compared measurements of haemoglobin concentration in the blood of 80 healthy men and 100 healthy women. Is there a significant difference between the population means?
Descriptive statistics
• Men: 154.8 ± 24.9 g/l
• Women: 140.2 ± 28.1 g/l
95% confidence interval for the difference between the population means: from 6.3 g/l to 22.9 g/l
Test of hypothesis
• H: means are equal
• p < 0.05, so H is rejected
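The comparison of the two means can be roughly re-derived from the summary statistics. The sketch below assumes that the ± values are sample standard deviations and uses a large-sample normal approximation with unpooled variances; neither assumption is stated on the slide. Under these assumptions the interval comes out somewhat narrower than the slide's 6.3 to 22.9 g/l, so the slide's figures were evidently obtained by a method or from data not shown here. The conclusion is the same: the interval excludes 0 and p < 0.05.

```python
import math
from statistics import NormalDist

def mean_difference_ci(m1, s1, n1, m2, s2, n2, confidence=0.95):
    """Large-sample CI for the difference of two independent means."""
    diff = m1 - m2
    # Standard error of the difference (variances not pooled).
    se = math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return diff, se, diff - z * se, diff + z * se

# Assumption: 24.9 and 28.1 are standard deviations (g/l).
diff, se, low, high = mean_difference_ci(154.8, 24.9, 80, 140.2, 28.1, 100)

# Two-sided p-value for H: means are equal (normal approximation).
z_stat = diff / se
p_value = 2 * (1 - NormalDist().cdf(abs(z_stat)))

print(f"difference = {diff:.1f} g/l, SE = {se:.2f} g/l")
print(f"approx. 95% CI from {low:.1f} to {high:.1f} g/l, p = {p_value:.4f}")
```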
Example 3. χ² test
All births in 10 Ontario (Canada) hospitals during 1960-61. Is there an association between perinatal events and maternal smoking during pregnancy?
Relative risk
• RR = 1.27, i.e. the risk is 1.27 times as large
Test of hypothesis
• H: no association
• χ² = 17.76, p < 0.001, so H is rejected
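The p-value that belongs to the χ² statistic can be checked with a few lines of Python. The sketch assumes a 2 × 2 table (smoking yes/no by perinatal event yes/no) and therefore 1 degree of freedom; the table itself is not reproduced on the slide, so this is an assumption. For 1 degree of freedom the upper tail probability has a closed form in terms of the complementary error function.

```python
import math

def chi2_p_value_df1(chi2):
    """Upper-tail probability of the chi-square distribution with 1 df.

    For 1 df, P(X > x) = 2 * (1 - Phi(sqrt(x))) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(chi2 / 2))

# Assumption: 1 degree of freedom (2 x 2 table).
p_value = chi2_p_value_df1(17.76)
print(f"p = {p_value:.6f}")
```

Under this assumption the p-value is far below 0.001, in line with the slide.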
Example 4. Sample size calculation
Randomized double-blind trial comparing anturan and placebo in patients after a myocardial infarction. We know that 10% of patients on placebo die within a year. We are interested in whether anturan is able to halve the mortality (i.e. 5% die within 1 year). We consider a 5% level of statistical significance and we would like to be 90% sure that the result is detected as significant. How many patients do we need?
Formula: n = f(p1, p2, α, β)
p1 = 90%, p2 = 95% (one-year survival on placebo and on anturan), α = 5%, 1 − β = 90%
n = 10.5 * (90 * 10 + 95 * 5) / (95 − 90)² = 577.5, rounded up to 578
Conclusion: 578 patients are required on each treatment.
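The calculation can be sketched in Python. The sketch assumes that the constant 10.5 is the usual factor (z for two-sided α plus z for power)², which for α = 5% and power 90% is about 10.5; the slide gives only the number, so this reading is an assumption. The function name is illustrative.

```python
import math
from statistics import NormalDist

def sample_size_per_group(p1, p2, alpha=0.05, power=0.90):
    """Patients needed per group to compare two percentages p1 and p2."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance
    z_beta = NormalDist().inv_cdf(power)           # power = 1 - beta
    factor = (z_alpha + z_beta) ** 2               # about 10.5 here
    n = factor * (p1 * (100 - p1) + p2 * (100 - p2)) / (p2 - p1) ** 2
    return math.ceil(n)  # round up to whole patients

n = sample_size_per_group(90, 95)
print(f"{n} patients required on each treatment")
```

Changing `power` or `alpha` shows how quickly the required number of patients grows when stricter requirements are set.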
Example 5. Correlation and regression
We have a group of 14 patients for whom we measured two variables: insulin sensitivity (SI) and DHEA. Are these variables related?
Correlation coefficient: r = 0.658, p = 0.015
Regression equation: SI = 0.105 * DHEA − 0.534
Example 7. Survival analysis
We want to compare kidney retention times for patients who entered the study in 1978 with kidney retention times for patients who entered the study in 1984. Beginning in 1984 the drug cyclosporin was routinely used to reduce rejection of the kidney; prior to that time the drug azathioprine was used. We measured retention times for all patients. Did the use of cyclosporin result in fewer cases of kidney rejection?
Log-rank test: χ² = 2.15, p = 0.10
Conclusion: There is no statistically significant difference in the distributions of kidney retention times between patients treated with azathioprine (A) and those treated with cyclosporin (C). (Sample sizes: 18, 21)
Literature
• A. Petrie, C. Sabin: Medical Statistics at a Glance. Wiley 2005.
• G. R. Norman, D. L. Streiner: Biostatistics. The Bare Essentials. BC Decker 2000.
Software for statistics
• MS Excel offers some tools for descriptive statistics, graphs and a limited selection of mathematical statistics methods. (Beware of Czech Excel!)
• Specialized software:
  • SAS, SPSS, BMDP: rather for statistical professionals
  • Statistica, Statgraphics, Medcalc, SigmaStat/SigmaPlot: easy to use for non-experts in statistics and occasional users
• Yellow items are licensed to the Faculty and available to students