
Biostats I – Review Lecture 4

Biostats I – Review Lecture 4. October 11, 2012. Last Time. Probabilities Binomial Distribution Normal Distribution. This Lecture. Sampling a population Central Limit Theorem t Distribution Sampling distribution and CI for proportions. Sampling the Population.


Presentation Transcript


  1. Biostats I – Review Lecture 4 October 11, 2012

  2. Last Time • Probabilities • Binomial Distribution • Normal Distribution

  3. This Lecture • Sampling a population • Central Limit Theorem • t Distribution • Sampling distribution and CI for proportions

  4. Sampling the Population • μ = true population mean • x̄ (x bar) = mean of the sample drawn from the population

  5. Sampling the Population • To make inferences about the population mean, we must have a random sample from the population • The larger the sample, the more reliable our estimates of the population parameters • Because we do not know μ (the true mean) and only know x̄ (the sample mean), we use confidence intervals to quantify our uncertainty

  6. Sampling Distribution If we draw many samples from the population and compute the mean of each, the distribution of those sample means (the sampling distribution) approaches a normal distribution as the sample size increases

  7. Central Limit Theorem • For a random variable in a population with • mean = μ • standard deviation = σ • the sampling distribution of the sample means (x̄) from samples of size n has these properties: • If n is large enough, the sampling distribution is approximately normal • The mean of the sampling distribution = μ • The standard deviation of the sampling distribution is σ/√n

  8. Note • Population standard deviation • σ is the standard deviation of the original distribution • Standard deviation of the sample mean • σ/√n is called the standard error, or more precisely the standard error of the mean; it is the standard deviation of the distribution of the sample mean

  9. Central Limit Theorem – uniform random variables

  10. Central Limit Theorem – Chi-square distribution

  11. Central Limit Theorem – Bimodal distribution
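
The behaviour illustrated on the three slides above can be checked with a small simulation. The sketch below is not part of the lecture (which uses Stata); it assumes Python's standard library, a Uniform(0, 1) population, and illustrative choices of n = 50 and 5000 repetitions. It compares the mean and standard deviation of the simulated sample means with μ and σ/√n.

```python
import random
import statistics

def simulate_sample_means(n, reps, seed=0):
    """Draw `reps` samples of size n from Uniform(0, 1) and return their means."""
    rng = random.Random(seed)
    return [statistics.fmean([rng.random() for _ in range(n)]) for _ in range(reps)]

means = simulate_sample_means(n=50, reps=5000)

mu = 0.5                  # mean of Uniform(0, 1)
sigma = (1 / 12) ** 0.5   # standard deviation of Uniform(0, 1)

# The two numbers on each line should be close to each other.
print("mean of sample means:", statistics.fmean(means), "vs mu:", mu)
print("sd of sample means:", statistics.stdev(means), "vs sigma/sqrt(n):", sigma / 50 ** 0.5)
```

A histogram of `means` would look roughly bell-shaped even though the underlying uniform distribution is flat.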

  12. Using the Central Limit Theorem • HIV population CD4 count • μ = 250 • σ = 200 • Repeated samples of size n = 50 • Mean of the sample means = μ = 250 • Standard error = σ/√n = 200/√50 = 28.3

  13. Using the Central Limit Theorem • What proportion of samples will have a mean value (X) < 100 cells/mm³? • Convert to the standard normal distribution, using the standard error σ/√n in place of σ: • Z = (X – μ)/(σ/√n) • Z = (100 – 250)/(200/√50) • Z = -5.3 • P(Z < -5.3) < 0.0001
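
The calculation on this slide can be reproduced outside Stata. The following is a minimal sketch using Python's `statistics.NormalDist`, which is an assumption of this example rather than a tool used in the lecture; the numbers are the CD4 values from the slides.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 250, 200, 50   # CD4 example values from the slides
se = sigma / sqrt(n)          # standard error of the mean

# Sample means are approximately N(mu, se), so P(mean < 100) is a normal CDF value.
z = (100 - mu) / se
p_below_100 = NormalDist(mu, se).cdf(100)

print(round(se, 1))           # 28.3
print(round(z, 1))            # -5.3
print(p_below_100 < 0.0001)   # True
```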

  14. Using the Central Limit Theorem • What CD4 count is the 10th percentile of the sample means, i.e. the value with P(Z ≤ z) = 0.10? • Calculate the Z value for this proportion: di invnormal(.10) returns -1.2815516 • Transform back to a CD4 count value • -1.28 = (X – μ)/(σ/√n) = (X – 250)/(200/√50) • di -1.28155*200/sqrt(50) + 250 returns 213.75229

  15. Using the Central Limit Theorem • What CD4 count is the upper 2.5th percentile of the sample means, i.e. the value with P(Z > z) = 0.025? • ** remember that Stata's invnormal works with P(Z ≤ z), so use 1 – 0.025 = 0.975 • Calculate the Z value for this proportion: di invnormal(.975) returns 1.959964 • Transform back to a CD4 count value • 1.96 = (X – μ)/(σ/√n) = (X – 250)/(200/√50) • di 1.959964*200/sqrt(50) + 250 returns 305.43717

  16. Using the Central Limit Theorem • Now we have the lower and upper 2.5th percentiles of the distribution of the sample means (by symmetry, the lower one is 250 – 1.96*200/√50 = 194.6). • The interior area contains 95% of the sample means. • 95% of the means from samples of size 50 drawn from the underlying distribution with μ = 250 and σ = 200 will lie within the interval (194.6, 305.4)
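
The percentile calculations on the last three slides can be sketched in a few lines. This example assumes Python's `statistics.NormalDist` as a stand-in for Stata's `invnormal`; `inv_cdf`, like `invnormal`, takes a lower-tail probability.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 250, 200, 50
sampling_dist = NormalDist(mu, sigma / sqrt(n))   # distribution of the sample mean

p10 = sampling_dist.inv_cdf(0.10)      # 10th percentile of the sample means
lower = sampling_dist.inv_cdf(0.025)   # cuts off the lowest 2.5%
upper = sampling_dist.inv_cdf(0.975)   # cuts off the highest 2.5%

print(round(p10, 1))                      # 213.8
print(round(lower, 1), round(upper, 1))   # 194.6 305.4
```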

  17. Using the Central Limit Theorem • The interval for the means depends on the sample size n • As n increases, the width of the interval decreases
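
To see how quickly the interval narrows, the sketch below computes the width of the central 95% interval for several sample sizes. It assumes Python's standard library; the helper name `interval_width` and the sample sizes are illustrative choices, not taken from the lecture.

```python
from math import sqrt
from statistics import NormalDist

def interval_width(sigma, n, level=0.95):
    """Width of the central `level` interval for means of samples of size n."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return 2 * z * sigma / sqrt(n)

for n in (10, 50, 200, 1000):   # sample sizes chosen only for illustration
    print(n, round(interval_width(200, n), 1))
```

Because the width is proportional to 1/√n, quadrupling the sample size halves the width.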

  18. Confidence intervals for means • Interval estimation provides a range of reasonable values that contain the population parameter (in this case μ) with a certain degree of confidence

  19. Confidence intervals for means • We know from examining the standard normal distribution that P(-1.96 ≤ Z ≤ 1.96) = 0.95 • [Figure: standard normal curve with 95% of the area between -1.96 and 1.96 and 2.5% in each tail]

  20. Calculating CI for means when we know the standard deviation • Thus the lower 95% confidence limit for µ is x̄ – 1.96*σ/√n • And the upper 95% confidence limit for µ is x̄ + 1.96*σ/√n • We say we are 95% confident that the interval we calculate using the above formulae includes µ

  21. Calculating CI for means when we know the standard deviation • 90% confidence interval • Replace 1.96 in the formula with 1.645 • 99% confidence interval • Replace 1.96 in the formula with 2.58
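
The multipliers for different confidence levels come from the standard normal quantiles. As a sketch under stated assumptions: the function name `ci_mean_known_sigma` and the sample mean of 240 are hypothetical, σ = 200 and n = 50 are borrowed from the CD4 example, and Python's `statistics.NormalDist` replaces the Stata lookup.

```python
from math import sqrt
from statistics import NormalDist

def ci_mean_known_sigma(xbar, sigma, n, level=0.95):
    """Confidence interval for mu when the population sigma is known."""
    z = NormalDist().inv_cdf(0.5 + level / 2)   # about 1.645, 1.96, 2.576 for 90/95/99%
    half_width = z * sigma / sqrt(n)
    return xbar - half_width, xbar + half_width

# xbar = 240 is a hypothetical sample mean used only for illustration.
for level in (0.90, 0.95, 0.99):
    lo, hi = ci_mean_known_sigma(240, 200, 50, level)
    print(f"{level:.0%} CI: ({lo:.1f}, {hi:.1f})")
```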

  22. Interpreting confidence intervals for means • Before the sample is drawn, the probability that the interval we will calculate contains the true population mean is 95% • If we were to select 100 random samples from the population and calculate a 95% confidence interval for each, we would expect approximately 95 of them to include the true population mean µ (and 5 not to)
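
The repeated-sampling interpretation can be illustrated by simulation. The sketch below is an assumption-laden illustration, not part of the lecture: it draws samples from a normal population with μ = 250 and σ = 200 (normality is assumed here for convenience), builds a 95% interval from each sample using the known σ, and counts how often the interval contains μ.

```python
import random
from math import sqrt
from statistics import fmean

def coverage(mu=250, sigma=200, n=50, reps=2000, seed=1):
    """Fraction of 95% intervals (known sigma) that contain the true mean."""
    rng = random.Random(seed)
    half_width = 1.96 * sigma / sqrt(n)
    hits = 0
    for _ in range(reps):
        xbar = fmean([rng.gauss(mu, sigma) for _ in range(n)])
        if xbar - half_width <= mu <= xbar + half_width:
            hits += 1
    return hits / reps

print(coverage())   # close to 0.95; the exact value depends on the seed
```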

  23. Confidence intervals for means • How to get a tighter interval? • Decrease the confidence level • Increase n

  24. What do we do if we don’t know the standard deviation (σ)? • Estimate σ with the sample standard deviation s and use the Student t distribution • If X is normally distributed and a sample of size n is chosen, then t = (x̄ – μ)/(s/√n) follows a Student’s t distribution with n-1 degrees of freedom

  25. The t-distribution (t not z)

  26. 1) Calculating probability from t value • Use Table A.4: gives P(T>t) at selected degrees of freedom 2) Using Stata • Stata is trying to confuse us even more! • Normal distribution: Stata's normal() gives the lower tail, P(Z<z) • Student's t distribution: Stata's ttail() gives the upper tail, P(T>t)

  27. 1) Calculating probability from t value • Stata code for P(T>t) • ttail(df,t) • where df = n-1 • E.g., P(T>1.95) with n=20: display ttail(19,1.95) returns .03304428

  28. 2) Calculating t value from probability • For example, for what t is P(T>t)=.05 for a sample of size 20? • Stata code • invttail(df,p) • Example: display invttail(19,.05) returns 1.7291328
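
For readers without Stata, the two t-distribution lookups can be approximated numerically. The functions below are hand-written stand-ins that only borrow the names `ttail` and `invttail`; they are not Stata's implementation. The sketch integrates the t density with Simpson's rule and inverts the tail probability by bisection, using only Python's standard library.

```python
from math import gamma, pi, sqrt

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return c * (1 + t * t / df) ** (-(df + 1) / 2)

def ttail(df, t, steps=2000):
    """P(T > t), by Simpson's rule on [0, |t|] plus symmetry about 0."""
    h = abs(t) / steps
    total = t_pdf(0, df) + t_pdf(abs(t), df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3   # approximates P(0 < T < |t|)
    return 0.5 - area if t >= 0 else 0.5 + area

def invttail(df, p, lo=-50.0, hi=50.0):
    """The t with P(T > t) = p, found by bisection (ttail decreases in t)."""
    for _ in range(60):
        mid = (lo + hi) / 2
        if ttail(df, mid) > p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

print(round(ttail(19, 1.95), 4))      # 0.033, in line with the Stata output above
print(round(invttail(19, 0.05), 4))   # 1.7291
```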

  29. Normal approximation of the binomial distribution • Parameters • n = number of trials • p = proportion of success • np = mean • np(1-p) = variance • √(np(1-p)) = standard deviation • As n, the number of “trials”, increases, the binomial distribution more closely resembles the normal distribution

  30. Normal approximation to the binomial distribution • Considered valid when np≥5 and n(1-p)≥5 • Why use it? • It is easier to use the normal distribution than to use Table A.1. For example, if n=50, p=.45, and you wanted to know P(X≥30), then using Table A.1, which gives you P(X=x), you would need to find P(X=30) + P(X=31) + ... + P(X=50) • In Stata, however, the binomialtail function gives you P(X≥x) directly
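
The example on this slide can be worked both ways. The sketch below, which assumes Python's standard library rather than Table A.1 or Stata, sums the exact binomial probabilities P(X=30) + ... + P(X=50) and compares the result with the normal approximation built from the mean np and standard deviation √(np(1-p)).

```python
from math import comb, sqrt
from statistics import NormalDist

n, p, x = 50, 0.45, 30   # example values from the slide

# Exact: P(X >= x) is the sum of P(X = k) for k = x, ..., n
exact = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(x, n + 1))

# Normal approximation with mean np and standard deviation sqrt(np(1-p))
approx = 1 - NormalDist(n * p, sqrt(n * p * (1 - p))).cdf(x)

print(round(exact, 4), round(approx, 4))
```

The two values are of the same order but need not agree closely this far into the tail.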

  31. Sampling distribution of a proportion • We are often more interested in the proportion of successes than in the number of successes • The true population proportion p is estimated by p̂ = x/n, where x = the number of successes or events and n = the number of trials, people, or observations

  32. Sampling distribution of a proportion • If we take: • repeated samples of size n from a variable that follows the Bernoulli distribution (i.e. the outcome is 0 or 1) • calculate p̂=x/n for each of the samples (x=total count of successes) • if n is large enough, then p̂ will approximately follow a normal distribution (by the central limit theorem) • The mean of this distribution is p • The standard deviation is √(p(1-p)/n), which is also called the standard error

  33. Sampling distribution of proportions • So if p̂ follows a normal distribution with mean p and standard deviation √(p(1-p)/n) • Then Z = (p̂ – p)/√(p(1-p)/n) ~ N(0,1) • Considered valid when np≥5 and n(1-p)≥5

  34. Sampling distribution of proportions • What proportion of samples of size n = 50 from a population with p = .10 will have a p̂ of .20 or higher? • Calculate the Z value for P(p̂ ≥ 0.20): • Z = (0.2 – 0.1)/√(0.1(1-0.1)/50) = 2.36 • Now we want P(Z ≥ 2.36) ** remember that Stata's normal() gives P(Z<z), so subtract from 1 ** • display 1-normal(2.36) returns .00913747
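
The same calculation can be sketched with Python's `statistics.NormalDist`, an assumption of this example and not something used in the lecture. Note that the code keeps the unrounded Z value, so the tail probability differs slightly from the .00913747 obtained with Z rounded to 2.36.

```python
from math import sqrt
from statistics import NormalDist

n, p = 50, 0.10
se = sqrt(p * (1 - p) / n)          # standard error of p-hat
z = (0.20 - p) / se
p_upper = 1 - NormalDist().cdf(z)   # P(Z >= z)

print(round(z, 2))         # 2.36
print(round(p_upper, 4))   # 0.0092
```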

  35. Confidence intervals for proportions • Lower 95% confidence limit: p̂ – 1.96*√(p(1-p)/n) • Upper 95% confidence limit: p̂ + 1.96*√(p(1-p)/n) • However, we don’t know p (if we did we wouldn’t be calculating these intervals), so we substitute p̂ into the formula for the standard error. • Lower 95% confidence limit: p̂ – 1.96*√(p̂(1-p̂)/n) • Upper 95% confidence limit: p̂ + 1.96*√(p̂(1-p̂)/n)

  36. Confidence intervals for proportions • HIV prevalence in those testing at Mulago Hospital • Sample size n = 3389 • Number HIV+ (x) = 1003 • Prevalence p̂ = 1003/3389 = 0.296 • 95% CI: (.296 – 1.96*√[.296*(1-.296)/3389], .296 + 1.96*√[.296*(1-.296)/3389]) • = (.281, .311) • Interpretation: we are 95% confident that the interval (0.281, 0.311) includes the true HIV prevalence in the population
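
The Mulago Hospital interval can be reproduced with a short function. This is a sketch assuming Python's standard library; the helper name `ci_proportion` is hypothetical, and it implements the normal-approximation interval with p̂ substituted into the standard error, as described on the previous slide.

```python
from math import sqrt
from statistics import NormalDist

def ci_proportion(x, n, level=0.95):
    """Normal-approximation confidence interval for a proportion."""
    p_hat = x / n
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width

lower, upper = ci_proportion(1003, 3389)
print(round(lower, 3), round(upper, 3))   # 0.281 0.311
```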
