Introduction to statistics in medicine – Part 1. Arier Lee.
Introduction • Who am I • Who do I work with • What do I do
Why do we need statistics • Sample • Population
The important role of statistics in medicine • Statistics pervades every aspect of medical research • Medical practice and research generate lots of data • Research involves asking lots of questions with strong statistical aspects • The evaluation of new treatments, procedures and preventative measures relies on statistical concepts in both design and analysis • Statisticians are consulted at an early stage of a medical study
Research process • Research question • Primary and secondary endpoints • Study design • Sampling and/or randomisation scheme • Power and sample size calculation • Pre-define analysis methods • Analyse data • Interpret results • Disseminate
Bias • A form of systematic error that can affect scientific research • Selection bias – well defined inclusion / exclusion criteria, randomisation • Assessment bias – blinding • Response bias, lost-to-follow-up bias – maximise response • Questionnaire bias – careful wording and good interviewer training
Some common data types • Continuous: age, weight, height, blood pressure • Percentages: % of households owning a dog • Counts: number of pre-term babies • Binary: yes/no, male/female, sick/healthy • Ordinal: taste of biscuits (strongly dislike, dislike, neutral, like, strongly like) • Nominal categorical: ethnicity (European, Maori, Pacific Islander, Chinese etc.)
Descriptive statistics for continuous data – the average • Mean: (sum of values)/(number in group) • Median: the middle value, 50th percentile • Mode: the value that occurs most often • Example: 3 4 7 8 8 8 9 11 11 13 21 23 24 (median=9, mode=8, mean=11.54)
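The three averages can be checked on the 13 example values from the slide. This is a minimal sketch using Python's standard `statistics` module; the variable names are illustrative only.

```python
import statistics

# The 13 example values from the slide, already in ascending order.
values = [3, 4, 7, 8, 8, 8, 9, 11, 11, 13, 21, 23, 24]

mean = statistics.mean(values)      # (sum of values) / (number in group)
median = statistics.median(values)  # middle value of the ordered data
mode = statistics.mode(values)      # value that occurs most often

print(f"mean={mean:.2f} median={median} mode={mode}")
# prints: mean=11.54 median=9 mode=8
```

The mean (11.54) sits above the median (9) here because the few large values (21, 23, 24) pull it upwards.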
Descriptive statistics for continuous data – the spread • Range: minimum and maximum values • Interquartile range: quartiles (Q1, Q2, Q3) divide the ordered data into quarters; the interquartile range runs from Q1 to Q3 • Standard deviation: a statistic that tells us how far from the mean the data are spread, $s = \sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 / (n-1)}$ (for normally distributed data, about 95% of the values lie within 2 SD of the mean) • Example (18 numbers): 0, 1, 2, 5, 8, 8, 9, 10, 12, 14, 18, 20, 21, 23, 25, 27, 34, 43
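The measures of spread can be sketched on the 18 example numbers with Python's standard `statistics` module. Note that there are several conventions for calculating quartiles, so Q1 and Q3 may differ slightly between software packages; the sketch below simply uses the module's default ("exclusive") method, which is an assumption rather than the method used on the slide.

```python
import statistics

# The 18 example numbers from the slide.
data = [0, 1, 2, 5, 8, 8, 9, 10, 12, 14, 18, 20, 21, 23, 25, 27, 34, 43]

# Range: minimum and maximum values.
low, high = min(data), max(data)

# Quartiles: three cut points that divide the ordered data into quarters.
# statistics.quantiles uses the "exclusive" method by default.
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

# Sample standard deviation, with the (n - 1) denominator.
sd = statistics.stdev(data)

print(f"range: {low} to {high}")
print(f"Q1={q1} Q2={q2} Q3={q3} IQR={iqr}")
print(f"SD={sd:.2f}")
```

Q2 is the median, so it agrees with `statistics.median(data)` whichever quartile convention is chosen.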
Estimation • Estimation: determine the value of a variable and its likely range (i.e. 95% confidence intervals) • Statistical inference is the process of generalising results calculated from a sample to a population • We are interested in some numerical characteristic of a population (called a parameter), e.g. the mean height or the proportion of pregnant women with hypertension • We take a sample from the population and calculate an estimate of this parameter
Estimation – a simple example • We want to estimate the mean height of 10-year-old boys • Take a random sample of 100 ten-year-old boys and calculate the sample mean • The mean height of our random sample is 141cm • Based on our random sample, we estimate that the mean height of 10-year-old boys is 141cm
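A minimal sketch of this kind of estimate, using simulated rather than real heights. The population mean (141cm) and SD (7cm) used to generate the data are hypothetical values chosen for illustration, and the 95% confidence interval uses the common large-sample approximation of mean ± 1.96 standard errors.

```python
import random
import statistics

random.seed(1)  # fixed seed so the simulation is repeatable

# Hypothetical population values, assumed for illustration only.
TRUE_MEAN_CM = 141.0
TRUE_SD_CM = 7.0

# Simulate a random sample of 100 boys from a normal population.
sample = [random.gauss(TRUE_MEAN_CM, TRUE_SD_CM) for _ in range(100)]

# Point estimate of the population mean.
estimate = statistics.mean(sample)

# Standard error of the mean: sample SD / sqrt(n).
standard_error = statistics.stdev(sample) / len(sample) ** 0.5

# Approximate 95% confidence interval (large-sample normal approximation).
ci_low = estimate - 1.96 * standard_error
ci_high = estimate + 1.96 * standard_error

print(f"estimated mean height: {estimate:.1f}cm")
print(f"approximate 95% CI: {ci_low:.1f}cm to {ci_high:.1f}cm")
```

The sample mean is the estimate; the interval gives its likely range.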
Distribution of Data • It is essential to know the distribution of your data so you can choose the appropriate statistical method to analyse the data • Data can be distributed (spread out) in different ways • Continuous data: There are many cases when the data tends to be around a central value with no bias to the left or right – normal distribution
Distribution of data – Normal distribution • Many parametric methods assume the data are normally distributed • Bell curve • Peak at a central value • Symmetric about the centre • Mean=median=mode • The distribution can be described by two parameters – mean and standard deviation
Standard deviation • Standard deviation – shows how much variation or ‘dispersion’ exists in the data • For normally distributed data, about 95% of the values lie within 2 standard deviations of the mean
A simulated example – Birth weight • [Figure: histogram of birth weight] • Mean=3250g • SD=550g
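A simulation along these lines can be sketched as follows. It assumes the birth weights are drawn from a normal distribution with the mean and SD quoted on the slide; the sample size of 10,000 and the random seed are arbitrary choices for illustration, not values from the original example.

```python
import random

random.seed(0)  # fixed seed so the simulation is repeatable

MEAN_G = 3250  # mean birth weight in grams (from the slide)
SD_G = 550     # standard deviation in grams (from the slide)

# Simulate 10,000 birth weights (sample size assumed for illustration).
weights = [random.gauss(MEAN_G, SD_G) for _ in range(10_000)]

# Limits at 2 standard deviations either side of the mean.
lower = MEAN_G - 2 * SD_G  # 2150g
upper = MEAN_G + 2 * SD_G  # 4350g

# Proportion of simulated weights that fall within those limits.
within = sum(lower <= w <= upper for w in weights) / len(weights)

print(f"{within:.1%} of simulated weights lie between {lower}g and {upper}g")
```

The printed proportion should be close to 95%, in line with the 2 SD rule for normally distributed data.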
Some other common distributions • Binomial distribution – gestational diabetes (Yes/No) • Uniform distribution – throwing a die, equal (uniform) probability for each of the six sides • And many more…
Sampling variability • Because of random sampling, the estimated value will be just an estimate – not exactly the same as the true value • If repeated samples are taken from a population, each sample, and hence each sample mean and standard deviation, will be different. This is known as sampling variability
Sampling variability • In practice we do not repeat the sampling to measure sampling variability; instead we endeavour to obtain a random sample and use statistical theory to quantify the error • Fundamental principle that justifies our estimate as reasonable: if it were possible to repeat a study over and over again, in the long run the estimates from each study would be distributed around the true value • If we have a random sample, then the sampling variability depends on the size of the sample and the underlying variability of the variable being measured
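Although a real study is not repeated, a computer simulation can repeat the sampling to illustrate the principle. The sketch below assumes a hypothetical normal population (mean 141, SD 7) and compares the spread of the sample means for two sample sizes; all names and numbers are illustrative.

```python
import random
import statistics

random.seed(42)  # fixed seed so the simulation is repeatable

# Hypothetical population values, assumed for illustration only.
TRUE_MEAN = 141.0
TRUE_SD = 7.0


def sample_means(sample_size, repeats=2000):
    """Draw `repeats` random samples and return the mean of each one."""
    means = []
    for _ in range(repeats):
        sample = [random.gauss(TRUE_MEAN, TRUE_SD) for _ in range(sample_size)]
        means.append(statistics.mean(sample))
    return means


means_25 = sample_means(25)
means_100 = sample_means(100)

# The spread of the sample means measures the sampling variability.
spread_25 = statistics.stdev(means_25)
spread_100 = statistics.stdev(means_100)

print(f"n=25:  centre={statistics.mean(means_25):.2f} spread={spread_25:.2f}")
print(f"n=100: centre={statistics.mean(means_100):.2f} spread={spread_100:.2f}")
```

In both cases the sample means centre on the true value, and the larger samples give a smaller spread, which is the dependence on sample size described above.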