Session 1 Probability to confidence intervals. By : Allan Chang. Probability Mean The normal distribution Standard Deviation (of measurements) Confidence interval of measurements Population and sample Standard Error (of the mean) Confidence interval of the mean
Session 1: Probability to confidence intervals. By: Allan Chang
Probability • Mean • The normal distribution • Standard Deviation (of measurements) • Confidence interval of measurements • Population and sample • Standard Error (of the mean) • Confidence interval of the mean • Difference between two means (mean of differences) • Standard Error (of the difference) • Confidence interval of the difference
Probability • Philosophical nature of truth • Aristotle: things can be true or false, but cannot be both • Confucius: everything is in the middle • Science and empiricism • Truth cannot be known, only estimated from observations • Things repeatedly observed are more likely to be true • No theory can be accepted unless supported by repeated observations • Uncertainties in observations • Multiple influences on any observation • Precision of measurement • Very unlikely to get the same result every time • Probability provides an expression of confidence in the results • Statistics is the set of tools to summarise the observations
How probability is expressed • Convention: a number between 0 and 1 • 0 (0%) = absolutely impossible, 1 (100%) = absolutely certain • Subjective probability: an expression of opinion • e.g. the probability of my staying awake in this lecture is 10% • Logical probability: mathematical calculation based on assumptions; this is how most statistical theories are developed • e.g. if the probability of getting heads in tossing a coin is 0.5 • getting 2 heads in 2 tosses is 0.5 x 0.5 = 0.25, getting 2 tails is 0.25, and getting 1 head and 1 tail is 0.5 • getting at least 1 head in 2 tosses is 0.5 + (0.5 x 0.5) = 0.75 • Empirical probability: calculated from observations • e.g. if there are 10 women and 20 men in the class, the probability of being male is 20 / (10 + 20) = 0.67 (67%)
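The coin-toss arithmetic above can be checked by listing every equally likely outcome. This is a minimal sketch, assuming a fair coin and independent tosses as the slide does; the helper name `probability` is made up for illustration.

```python
from fractions import Fraction
from itertools import product

# Every equally likely outcome of two fair coin tosses: HH, HT, TH, TT.
outcomes = list(product("HT", repeat=2))


def probability(event):
    """Share of the equally likely outcomes for which event(outcome) is true."""
    favourable = sum(1 for outcome in outcomes if event(outcome))
    return Fraction(favourable, len(outcomes))


# Logical probabilities from the slide.
p_two_heads = probability(lambda o: o.count("H") == 2)
p_one_head_one_tail = probability(lambda o: o.count("H") == 1)
p_at_least_one_head = probability(lambda o: "H" in o)

# Empirical probability from the class example: 20 men among 10 + 20 people.
p_male = Fraction(20, 10 + 20)

print(float(p_two_heads), float(p_one_head_one_tail), float(p_at_least_one_head))
print(round(float(p_male), 2))
```

Counting outcomes gives the same 0.75 for "at least 1 head" as the slide's 0.5 + (0.5 x 0.5), which adds "heads on the first toss" to "tails first, then heads".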
Mean • Havara: a system of insurance amongst Phoenician traders • Havara -> average -> mean • The mean is the centre of all the measurements
[Figure: Gauss, De Moivre]
[Figure: De Moivre, Fisher]
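Fisher: standard deviate z and the probability p in one tail beyond z • z = 0; p = 0.5 • z = 1.28; p = 0.1 • z = 1.65; p = 0.05 • z = 1.96; p = 0.025 • z = 2.33; p = 0.01 • z = 2.58; p = 0.005 • z = 3.09; p = 0.001 • [Figure: probability p against standard deviate z]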
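The pairing of z and p can be reproduced from the standard normal distribution. The sketch below uses `statistics.NormalDist` from the Python standard library and assumes p is the probability in one tail beyond z; the function names are illustrative only.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0.0, sigma=1.0)


def upper_tail_probability(z):
    """Probability of a standard deviate larger than z (one tail)."""
    return 1.0 - standard_normal.cdf(z)


def z_for_upper_tail(p):
    """Standard deviate z that leaves probability p in the upper tail."""
    return standard_normal.inv_cdf(1.0 - p)


for p in (0.5, 0.1, 0.05, 0.025, 0.01, 0.005, 0.001):
    print(f"p = {p:<6} z = {z_for_upper_tail(p):.2f}")
```

The computed values may differ from a printed table in the last digit, depending on how the table was rounded.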
Confidence interval • Half of the excluded probability lies on each side • e.g. in a 95% CI, 5% is excluded, 2.5% on each side (z = 1.96) • 95% CI = mean +/- 1.96 SD
Confidence interval of measurements • The range any single measurement is likely to be within • CImeasurement = mean +/- z SD • 90% CI = mean +/- 1.65 SD • 95% CI = mean +/- 1.96 SD • 99% CI = mean +/- 2.58 SD
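As a rough illustration of CImeasurement = mean +/- z SD, the sketch below derives z from the confidence level by splitting the excluded probability equally between the two tails. The mean of 70 and SD of 10 are hypothetical figures, not taken from the slides, and the measurements are assumed to be normally distributed.

```python
from statistics import NormalDist


def measurement_interval(mean, sd, confidence=0.95):
    """Range expected to contain a single measurement, assuming normality."""
    # Half of the excluded probability lies in each tail.
    tail = (1.0 - confidence) / 2.0
    z = NormalDist().inv_cdf(1.0 - tail)
    return mean - z * sd, mean + z * sd


# Hypothetical measurements with mean 70 and SD 10.
for confidence in (0.90, 0.95, 0.99):
    low, high = measurement_interval(70.0, 10.0, confidence)
    print(f"{confidence:.0%} CI: {low:.1f} to {high:.1f}")
```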
Population • Measure everyone, the results are true of the population • Very expensive and inconvenient to do, in most cases not possible • Sampling • Measure a small but representative portion of the population, and assume that the results are true also of the population as a whole • The results are estimates of the truth, approximations • The degree of approximation (error) is important and needs to be estimated
Standard Error of the mean • Conceptually, the Standard Deviation of all the mean values, if the samples are taken repeatedly
Standard Error of the mean • SE = SD / sqrt(n) • Random numbers with mean = 0, SD = 1 • 10 samples of 20 each
n     mean    SD     SE
20   -0.14    1.01   0.23
20    0.10    1.30   0.29
20    0.11    1.20   0.27
20    0.44    0.90   0.20
20   -0.13    1.12   0.25
20    0.29    0.90   0.20
20    0.29    1.15   0.26
20    0.36    1.07   0.24
20    0.04    1.41   0.32
20   -0.02    0.90   0.20
Mean and SD of the 10 means: n = 10, mean = 0.13, SD = 0.20
• SE of each sample = 0.20 to 0.32 • SD of the 10 means = 0.20 • SE is an estimate of the SD of the means if repeated sampling of the same size occurs
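The experiment in this table can be repeated with a short simulation. This is a sketch under the same setup (10 samples of 20 random numbers with mean 0 and SD 1); the seed is an arbitrary choice, so the numbers produced will not match the slide's values.

```python
import math
import random
import statistics

random.seed(1)  # arbitrary seed, only so that reruns give the same numbers

n_samples, n = 10, 20
means = []

print("  n   mean     SD     SE")
for _ in range(n_samples):
    # One sample of n random numbers from a normal distribution (mean 0, SD 1).
    sample = [random.gauss(0.0, 1.0) for _ in range(n)]
    mean = statistics.mean(sample)
    sd = statistics.stdev(sample)   # sample standard deviation
    se = sd / math.sqrt(n)          # SE = SD / sqrt(n)
    means.append(mean)
    print(f"{n:>3} {mean:>6.2f} {sd:>6.2f} {se:>6.2f}")

# The SD of the sample means is what each SE is trying to estimate.
sd_of_means = statistics.stdev(means)
print(f"SD of the {n_samples} means = {sd_of_means:.2f}")
print(f"Theoretical SE = {1.0 / math.sqrt(n):.2f}")
```

With only 10 means the observed SD of the means fluctuates from run to run, but it should sit near the theoretical value of 1 / sqrt(20).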
Standard Error of the mean (SE) • A function of the Standard Deviation and the sample size • If the sample size is infinite, or is the whole population, then SE = 0, as the mean value has no error • If the sample size = 1, then the SE is the same as the Standard Deviation • SEmean = SD / sqrt(n)
[Figure: SE as % of SD against sample size, where SE = SD / sqrt(n)]
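The way SE shrinks with sample size can be tabulated directly from SE = SD / sqrt(n). The sample sizes below are arbitrary illustrative choices, and `standard_error` is a hypothetical helper name.

```python
import math


def standard_error(sd, n):
    """Standard Error of the mean for a sample of size n."""
    if n < 1:
        raise ValueError("sample size must be at least 1")
    return sd / math.sqrt(n)


# With SD = 1, the SE expressed as a percentage of the SD.
for n in (1, 4, 16, 25, 100, 400, 10000):
    percent = 100.0 * standard_error(1.0, n)
    print(f"n = {n:>5}  SE = {percent:5.1f}% of SD")
```

Because of the square root, quadrupling the sample size only halves the SE, so gains in precision become progressively more expensive.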
Confidence interval of the mean • The range within which the true mean value can be expected • CImean = mean +/- z SE • 90% CI = mean +/- 1.65 SE • 95% CI = mean +/- 1.96 SE • 99% CI = mean +/- 2.58 SE • Half-width of CImean = half-width of CImeasurement / sqrt(n)
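A minimal sketch of CImean = mean +/- z SE, applied to the first simulated sample from the earlier table (n = 20, mean = -0.14, SD = 1.01). It follows the slides in using the normal deviate z; the function name is illustrative.

```python
import math
from statistics import NormalDist


def mean_interval(mean, sd, n, confidence=0.95):
    """Confidence interval of the mean, using z and SE = SD / sqrt(n)."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    se = sd / math.sqrt(n)
    return mean - z * se, mean + z * se


# First sample from the slide's table: n = 20, mean = -0.14, SD = 1.01.
low, high = mean_interval(-0.14, 1.01, 20, confidence=0.95)
print(f"95% CI of the mean: {low:.2f} to {high:.2f}")
```

The interval includes 0, the true mean the random numbers were drawn from.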
Theory of sampling • The difference between the means of samples from two groups can be taken as an estimate of the mean difference between these two groups in the population • e.g. the difference in mean weight between 200 men and 200 women represents the mean difference in weight between men and women in the population • The problem is how to obtain the Standard Error and the confidence interval of this difference
R.A. Fisher devised the Analysis of Variance between groups • Total variation in the measurements is partitioned • That attributable to group differences • That attributable to individual differences (residual, error)
From the Analysis of Variance • Individual or random variation is partitioned • What is left are group variance and difference • From this • Further partition to Standard Errors for each group • What is left then is the variation or error of the difference • From this, the Standard Error of the difference is derived • SEdifference is a function of the sample sizes and SDs of the two groups
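The slides do not give a formula for SEdifference. One common form, shown here only as an assumption, is the pooled-variance Standard Error, which is what a two-group Analysis of Variance yields when both groups are assumed to share the same underlying variance. The SDs of 12 and 10 are hypothetical; the group sizes of 200 follow the men and women example.

```python
import math


def pooled_se_difference(sd1, n1, sd2, n2):
    """Standard Error of the difference between two means (equal-variance form).

    Assumption: both groups share a common variance, estimated by pooling.
    """
    pooled_variance = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    return math.sqrt(pooled_variance * (1.0 / n1 + 1.0 / n2))


# Hypothetical SDs (12 and 10) for two groups of 200 each.
se_difference = pooled_se_difference(12.0, 200, 10.0, 200)
print(f"SE of the difference = {se_difference:.2f}")
```

In this form the result depends only on the two SDs and the two sample sizes.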
Confidence interval of the difference • The range within which the true difference can be expected • CIdifference = difference +/- z SEdifference • 90% CI = difference +/- 1.65 SEdifference • 95% CI = difference +/- 1.96 SEdifference • 99% CI = difference +/- 2.58 SEdifference
Interpretation of CIdifference • The CI represents the range within which the "true value" is likely to be • A CI that does not traverse the null value means that the difference is not null, or that the difference is statistically significant • A CI that lies entirely within the tolerance (a difference too small to be meaningful) means that the difference is not of practical consequence, and the two groups can be considered equivalent
An example of CI • We do two studies • Study 1, the difference in weight between women and men • Study 2, the difference in weight between all those named Jim and all those named Bill • We think that a difference of less than 1 kg is probably not big enough to worry about
[Figure: confidence intervals of the difference in weight (kg), shown against a tolerance of +/- 1 kg, for men minus women (n = 20 and n = 100) and Jim minus Bill (n = 200 and n = 1000)]
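The interpretation rules can be sketched as a small function that computes CIdifference and checks it against the null value and the tolerance. Only the 1 kg tolerance comes from the example; the differences and Standard Errors below are hypothetical numbers chosen to show the possible outcomes, and the function name is illustrative.

```python
from statistics import NormalDist


def interpret_difference(difference, se_difference, tolerance,
                         confidence=0.95, null_value=0.0):
    """Return (low, high, significant, equivalent) for a difference between means."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    low = difference - z * se_difference
    high = difference + z * se_difference
    # Significant: the interval does not traverse the null value.
    significant = not (low <= null_value <= high)
    # Equivalent: the whole interval lies inside the tolerance band.
    equivalent = -tolerance < low and high < tolerance
    return low, high, significant, equivalent


# Hypothetical results (difference in kg, SE of the difference), tolerance 1 kg.
studies = {
    "men minus women, small study": (12.0, 3.0),
    "Jim minus Bill, small study": (0.2, 0.9),
    "Jim minus Bill, large study": (0.2, 0.3),
}
for name, (difference, se) in studies.items():
    low, high, significant, equivalent = interpret_difference(difference, se, 1.0)
    print(f"{name}: {low:.2f} to {high:.2f}, "
          f"significant={significant}, equivalent={equivalent}")
```

A study whose interval is neither clear of the null value nor inside the tolerance band is inconclusive: the sample is too small to show either a difference or equivalence.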
Terms you need to know • Measurements • Mean, Standard Deviation, confidence interval • Means • Mean, Standard Error of the mean, confidence interval • Difference • Difference between means, mean difference, Standard Error of the difference, confidence interval, significant difference, not significantly different, equivalence