MBP1010 - Lecture 2: January 14, 2009

Presentation Transcript


  1. MBP1010 - Lecture 2: January 14, 2009 • 1. Density curves and standard normal distribution • 2. Sampling distribution of the mean • 4. Confidence interval for the mean • Hypothesis testing (1-sample t test). Reading: Introduction to the Practice of Statistics: 1.3, 3.4, 5.2, 6.1-6.4 and 7.1

  2. Standard deviation vs. standard error for describing data. Table 1. Characteristics of study subjects (n = 35)

  3. Importance of the Normal Distribution* 1. Distributions of real data are often close to normal. 2. It is mathematically easy to work with, so many statistical tests are designed for normal (or close to normal) distributions. 3. If the mean and SD of a normal distribution are known, you can make quantitative predictions about the population. * also called the Gaussian curve

  4. Histogram (red bars = scores ≤ 6): proportion = 0.303

  5. Density curve (red area under the curve = scores ≤ 6): proportion = 0.293

  6. The cumulative proportion for a value x is the proportion of all observations that are ≤ x; this is the area under the curve to the left of x.
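
For a normal density curve, the cumulative proportion is the value of its cumulative distribution function. The sketch below uses the Python standard library's `statistics.NormalDist` (Python 3.8+); the helper names `cumulative_proportion` and `proportion_between` are hypothetical, and the standard normal is used purely as an illustration.

```python
from statistics import NormalDist

# Illustrative distribution: the standard normal (mean 0, SD 1)
std_normal = NormalDist(mu=0.0, sigma=1.0)

def cumulative_proportion(dist: NormalDist, x: float) -> float:
    """Proportion of observations <= x, i.e. the area under the curve left of x."""
    return dist.cdf(x)

def proportion_between(dist: NormalDist, a: float, b: float) -> float:
    """Area under the curve between a and b (a < b): a difference of two cumulative proportions."""
    return dist.cdf(b) - dist.cdf(a)

print(round(cumulative_proportion(std_normal, 0.0), 3))       # 0.5: half the area lies left of the mean
print(round(cumulative_proportion(std_normal, 1.96), 3))      # 0.975
print(round(proportion_between(std_normal, -1.96, 1.96), 3))  # 0.95
```

Areas between two values are always obtained as the difference of two cumulative proportions, which is why tables of the standard normal only need to list areas to the left of z.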

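  7. “The 68-95-99.7 Rule”: in a normal distribution, about 68% of observations fall within 1 SD of the mean, about 95% within 2 SD, and about 99.7% within 3 SD. Example: mean = 64.5 inches, SD = 2.5 inches.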
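
A short sketch that applies the rule to the slide's example (mean 64.5 inches, SD 2.5 inches), assuming the heights follow a normal distribution, and checks the three percentages against the exact normal areas from `statistics.NormalDist`. The dictionary name `rule` is just an illustrative choice.

```python
from statistics import NormalDist

MEAN, SD = 64.5, 2.5  # inches, from the slide; normality is assumed
heights = NormalDist(mu=MEAN, sigma=SD)

rule = {}
for k in (1, 2, 3):
    low, high = MEAN - k * SD, MEAN + k * SD
    share = heights.cdf(high) - heights.cdf(low)  # exact normal area within k SDs
    rule[k] = (low, high, share)
    print(f"within {k} SD: {low:.1f} to {high:.1f} inches, proportion {share:.3f}")
```

This prints the intervals 62.0 to 67.0, 59.5 to 69.5 and 57.0 to 72.0 inches, with proportions 0.683, 0.954 and 0.997, matching the 68-95-99.7 rule up to rounding.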

  8. The standard normal distribution is a normal distribution with a mean of 0 and an SD of 1. Any normal distribution can be transformed to the standard normal distribution by the formula $z = \frac{X - \mu}{\sigma}$, where X is a score from the original normal distribution, μ is the mean of the original normal distribution, and σ is the standard deviation of the original normal distribution. The standard normal distribution is sometimes called the z distribution.

  9. Standardized Normal Distribution

  10. Z-score: A z score gives the number of standard deviations that a particular score lies above or below the mean. Example: if a person scored 70 on a test with a mean of 50 and an SD of 10, they scored 2 standard deviations above the mean. Converting the test score to a z score, an X of 70 gives $z = \frac{70 - 50}{10} = 2$. So a z score of 2 means the original score was 2 SD above the mean.
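
The z-score formula as a small Python function, using the slide's test example. The second score (35) is hypothetical, and the last lines assume the test scores are normally distributed in order to show that standardizing leaves the cumulative proportion unchanged.

```python
from statistics import NormalDist

def z_score(x: float, mean: float, sd: float) -> float:
    """Number of SDs that x lies above (positive) or below (negative) the mean."""
    return (x - mean) / sd

print(z_score(70, 50, 10))  # 2.0, the slide's example
print(z_score(35, 50, 10))  # -1.5, a hypothetical score below the mean

# Assuming scores ~ N(50, 10): P(X <= 70) equals P(Z <= 2) on the standard normal
p_original = NormalDist(mu=50, sigma=10).cdf(70)
p_standard = NormalDist().cdf(z_score(70, 50, 10))
print(round(p_original, 4), round(p_standard, 4))  # 0.9772 0.9772
```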

  11. Z Scores • Provide a meaningful way to compare individuals from different normal distributions on the same scale, i.e., how many SDs above or below the mean? • E.g.: - bone density measures - growth charts (height of children at different ages) - “normalized” data

  12. Quantile-Quantile (Q-Q) Plot: A Q-Q plot shows the theoretical quantiles versus the empirical quantiles. If the distribution is normal, the points should fall close to a straight line.
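
A minimal sketch of how the points of a normal Q-Q plot can be computed without a plotting library. The plotting positions (i - 0.5)/n are one common convention (an assumption here; software packages differ slightly), and summarizing straightness with a correlation coefficient is an illustrative choice, not part of the lecture. The two samples are simulated, hypothetical data.

```python
import random
from statistics import NormalDist, mean

def qq_points(data):
    """Pair standard normal quantiles (theoretical) with sorted observations (empirical)."""
    n = len(data)
    ordered = sorted(data)
    # Plotting positions (i - 0.5) / n: one common convention, assumed here
    theoretical = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, ordered))

def straightness(points):
    """Pearson correlation of the Q-Q points; values near 1 mean a nearly straight line."""
    xs, ys = zip(*points)
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

rng = random.Random(2009)
normal_sample = [rng.gauss(10, 2) for _ in range(200)]     # hypothetical normal data
skewed_sample = [rng.expovariate(1.0) for _ in range(200)]  # hypothetical right-skewed data

r_normal = straightness(qq_points(normal_sample))
r_skewed = straightness(qq_points(skewed_sample))
print(round(r_normal, 3))  # close to 1: points lie near a straight line
print(round(r_skewed, 3))  # lower: the points bend away from a line
```

Plotting the pairs returned by `qq_points` (theoretical on the x axis, empirical on the y axis) gives the Q-Q plot itself.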

  13. Rice Virtual Lab in Statistics: http://onlinestatbook.com/rvls/ (HyperStat Online, Section 5: Normal Distribution, theory)

  14. Sampling and Estimation

  15. Populations and Samples. Population: the entire group of individuals that we want information about. Sample: the part of the population that we actually examine in order to gather information. Goal: to draw conclusions about the population from the sample.

  16. Figure: whole population (mean = μ, SD = σ) → sample (sample mean = x̄, SD = s); inference runs from the sample back to the population.

  17. Parameter: - a number that describes the population - the number is fixed, but in practice we do not know its value (e.g., μ). Statistic: - a number that describes a sample (e.g., x̄) - its value is known when we take a sample, but it can change from sample to sample - often used to estimate an unknown parameter.

  18. Statistical inference is the process by which we draw conclusions about the population from the results observed in a sample. The two main methods used in inferential statistics are estimation and hypothesis testing. In estimation, the sample is used to estimate a parameter, and a confidence interval about the estimate is constructed.

  19. Random Sampling is Key! - every individual in the population sampled must have a chance of being included in the sample - the choice of one subject does not influence the chance of other subjects being chosen - use a method of sampling in which chance alone operates (e.g., toss of a coin, draw from a hat, random number generators) - random assignment in clinical trials results in randomly selected groups

  20. Simple Random Sampling (SRS): - each individual in the population has an equal chance of being selected - every possible sample of the same size has an equal chance of being chosen. Stratified Sampling: - divide the population into strata - choose an SRS in each stratum - combine these SRSs to form the full sample; e.g., strata for cancer patients: prognostic factors, male/female, age - consult a statistician for more complex sampling
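
Both sampling schemes can be sketched with Python's `random` module, where `random.Random.sample` draws without replacement so every subset of a given size is equally likely. The patient list, the stratifying function and the helper names are hypothetical; a fixed seed is used only so the example is reproducible.

```python
import random

def simple_random_sample(population, n, rng):
    """SRS: every subset of size n is equally likely (drawn without replacement)."""
    return rng.sample(population, n)

def stratified_sample(population, stratum_of, n_per_stratum, rng):
    """Divide the population into strata, take an SRS within each, then combine."""
    strata = {}
    for unit in population:
        strata.setdefault(stratum_of(unit), []).append(unit)
    combined = []
    for key in sorted(strata):
        combined.extend(rng.sample(strata[key], n_per_stratum))
    return combined

rng = random.Random(14)
# Hypothetical population: 100 patients as (id, sex) pairs
patients = [(i, "F" if i % 2 else "M") for i in range(1, 101)]

srs = simple_random_sample(patients, 10, rng)
strat = stratified_sample(patients, lambda p: p[1], 5, rng)
print(len(srs), len(strat))          # 10 10
print(sorted(p[1] for p in strat))   # exactly five "F" and five "M"
```

The SRS may by chance contain, say, seven women and three men; stratifying by sex fixes the composition of the sample in advance.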

  21. Sample mean (x̄) as an estimator of the population mean (μ): what would happen if we repeated the sample several times? Sampling variability: - repeated samples from the same population will not have the same mean - the amount of variability depends partly on how variable the underlying population is and on the size of the sample selected

  22. Sampling Distribution of x̄: the distribution of values taken by the mean (x̄) in all possible samples of the same size from the same population

  23. 1. The mean of the sampling distribution of x̄ = μ. 2. The SD of the sampling distribution = $\sigma/\sqrt{n}$, called the standard error of the mean. 3. The shape of the sampling distribution is approximately a normal curve, regardless of the shape of the population distribution, provided n is large enough (Central Limit Theorem).
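
The three properties can be checked by simulation. This sketch assumes a hypothetical, strongly right-skewed population (exponential with mean 1 and SD 1), draws 2000 samples of size 30, and compares the mean and SD of the sample means with μ and σ/√n. The helper name `sample_means` is illustrative.

```python
import random
from math import sqrt
from statistics import mean, stdev

def sample_means(draw, n, repeats, rng):
    """Means of `repeats` independent samples of size n, each value drawn with draw(rng)."""
    return [mean([draw(rng) for _ in range(n)]) for _ in range(repeats)]

# Hypothetical skewed population: exponential with mean 1 and SD 1
mu, sigma, n = 1.0, 1.0, 30
rng = random.Random(2009)
means = sample_means(lambda r: r.expovariate(1.0), n, repeats=2000, rng=rng)

print(round(mean(means), 3))      # close to mu = 1.0
print(round(stdev(means), 3))     # close to sigma / sqrt(n)
print(round(sigma / sqrt(n), 3))  # 0.183
```

A histogram of `means` looks close to a normal curve even though the population itself is far from normal, which is the Central Limit Theorem at work.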

  24. Simulation of Sampling Distribution: Central Limit Theorem. Rice Virtual Lab in Statistics http://onlinestatbook.com/rvls/

  25. Population: all MBP1010 students (n = 37); μ = 1.00 cup, σ = 1.07 cups

  26. Figure: population (n = 37; μ = 1.00, σ = 1.07) and one randomly selected sample (n = 12; x̄ = 0.875, s = 0.78)

  27. Figure: population (n = 37; μ = 1.00, σ = 1.07) and the sampling distribution of x̄ from 1000 repeated samples of n = 12 (mean = 1.00, SD = 0.26)

  28. Figure: population (n = 37; μ = 1.00, σ = 1.07); sampling distribution from 1000 repeats of n = 12 (mean = 1.00, SD = 0.26); one sample of n = 12 (x̄ = 0.875, s = 0.78, SEM = $s/\sqrt{n}$ = 0.23)
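
The SEM on the slide comes from a single sample, computed as s/√n. A one-line check of that arithmetic (the function name `standard_error` is illustrative):

```python
from math import sqrt

def standard_error(s: float, n: int) -> float:
    """Standard error of the mean estimated from one sample: s / sqrt(n)."""
    return s / sqrt(n)

sem = standard_error(0.78, 12)  # s = 0.78 and n = 12 from the slide
print(round(sem, 2))            # 0.23
```

From one sample of 12, the SEM (0.23) estimates the spread of sample means that the 1000 repeated samples show directly (SD = 0.26).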

  29. Confidence Interval of the Mean

  30. Standard Normal Distribution

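  31. 95% Confidence Interval (figure): the central area of 0.95 under the standard normal curve lies between z = −1.96 (2.5th percentile) and z = 1.96 (97.5th percentile), leaving an area of 0.025 in each tail.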

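  32. 95% Confidence Interval for a population mean if population σ is known (not realistic). Express x̄ in standardized form (the z statistic): $z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$. Then $\Pr(-1.96 \le z \le 1.96) = 0.95$, so $\Pr\left(-1.96 \le \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \le 1.96\right) = 0.95$, which rearranges to $\Pr\left(\bar{x} - 1.96\,\sigma/\sqrt{n} \le \mu \le \bar{x} + 1.96\,\sigma/\sqrt{n}\right) = 0.95$. Thus $\bar{x} - 1.96(\sigma/\sqrt{n})$ and $\bar{x} + 1.96(\sigma/\sqrt{n})$ are the 95 percent confidence limits for the population mean μ.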
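
The derivation translates directly into a function. This sketch assumes σ is known and takes the critical value from `statistics.NormalDist().inv_cdf`; the inputs x̄ = 50, σ = 10, n = 25 are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(xbar, sigma, n, confidence=0.95):
    """Confidence interval for mu when the population SD sigma is known."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    half_width = z * sigma / sqrt(n)
    return xbar - half_width, xbar + half_width

lo, hi = z_interval(xbar=50.0, sigma=10.0, n=25)  # hypothetical values
print(round(lo, 2), round(hi, 2))                 # 46.08 53.92
```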

  33. Figure: 24 out of 25 sample intervals included μ (96%). In the long run, 95% of all samples will give an interval that includes μ.
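
The long-run interpretation can be checked by simulation. This sketch assumes a hypothetical normal population (μ = 10, σ = 2, σ known), builds a z interval from each of 2000 samples of size 25, and reports the fraction that contain μ.

```python
import random
from math import sqrt

def coverage(mu, sigma, n, repeats, rng, z=1.96):
    """Fraction of z intervals (sigma known) that contain the true mean mu."""
    hits = 0
    half_width = z * sigma / sqrt(n)
    for _ in range(repeats):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = sum(sample) / n
        if xbar - half_width <= mu <= xbar + half_width:
            hits += 1
    return hits / repeats

rng = random.Random(33)
cov = coverage(mu=10.0, sigma=2.0, n=25, repeats=2000, rng=rng)  # hypothetical population
print(cov)  # close to 0.95
```

With only 25 intervals, as in the figure, the observed share can easily be 24/25 = 0.96 or 23/25 = 0.92; only over many repetitions does it settle near 0.95.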

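  34. 90% Confidence Interval (figure): the central area of 0.90 under the standard normal curve lies between z = −1.645 (5th percentile) and z = 1.645 (95th percentile), leaving an area of 0.05 in each tail.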
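
The critical values for any confidence level come from the inverse of the standard normal CDF. A small sketch with `statistics.NormalDist` (the function name `z_critical` is illustrative):

```python
from statistics import NormalDist

def z_critical(confidence: float) -> float:
    """z* that leaves (1 - confidence) / 2 of the area in each tail of the standard normal."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)

for level in (0.90, 0.95, 0.99):
    print(level, round(z_critical(level), 3))  # 1.645, 1.96 and 2.576
```

Higher confidence pushes z* outward, which is why a 99% interval is wider than a 90% interval from the same data.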

  35. Confidence Interval for a population mean, population σ NOT known (usual): - use the sample standard deviation (s) as an estimate of σ - therefore $\sigma/\sqrt{n}$ is estimated from the sample using $s/\sqrt{n}$ (the standard error of the mean, SE) - the SE of the sample is the estimate of the SD that would be obtained from the means of a large number of samples drawn from that population

  36. Problem: the critical ratio $\frac{\bar{x} - \mu}{s/\sqrt{n}}$ is not normally distributed - we need to consider the reliability of both x̄ and s as estimators of μ and σ, respectively - the shape of the distribution depends on the sample size n. Therefore $\frac{\bar{x} - \mu}{s/\sqrt{n}}$ follows the t distribution.
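
A simulation makes the problem concrete. Assuming hypothetical normal data with μ = 0 and σ = 1, this sketch computes the critical ratio (x̄ − μ)/(s/√n) for many samples and counts how often it falls beyond ±1.96. If the ratio were standard normal, that share would be about 0.05; for small n it is clearly larger because s itself varies from sample to sample.

```python
import random
from math import sqrt

def critical_ratio(sample, mu):
    """(xbar - mu) / (s / sqrt(n)), with s the sample SD (divisor n - 1)."""
    n = len(sample)
    xbar = sum(sample) / n
    s = sqrt(sum((x - xbar) ** 2 for x in sample) / (n - 1))
    return (xbar - mu) / (s / sqrt(n))

def share_beyond(n, cutoff, repeats, rng, mu=0.0, sigma=1.0):
    """Fraction of samples whose critical ratio lies outside +/- cutoff."""
    outside = 0
    for _ in range(repeats):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        if abs(critical_ratio(sample, mu)) > cutoff:
            outside += 1
    return outside / repeats

rng = random.Random(36)
small_n = share_beyond(n=5, cutoff=1.96, repeats=10000, rng=rng)
large_n = share_beyond(n=100, cutoff=1.96, repeats=2000, rng=rng)
print(small_n)  # well above 0.05 (heavier tails: t with 4 DF)
print(large_n)  # close to 0.05 (t with 99 DF is nearly normal)
```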

  37. t distribution: - a family of distributions indexed by the degrees of freedom (n − 1) - degrees of freedom refer to the number of independent quantities among a series of numerical quantities

  38. Degrees of Freedom for the SD: - there are n deviations around the mean - there is one restriction: the sum of the deviations = 0 - therefore, once we have calculated n − 1 deviations around the mean, the last one is already determined, because the sum must be 0 (i.e., it is not independent) - for n deviations around the mean there are n − 1 degrees of freedom (DF)
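
A tiny worked example of the restriction, using four hypothetical observations. The last deviation is fully determined by the other three, and the sample variance (as computed by `statistics.variance`) divides the sum of squared deviations by n − 1.

```python
from statistics import variance

def deviations(data):
    """Deviation of each observation from the sample mean."""
    m = sum(data) / len(data)
    return [x - m for x in data]

data = [4.0, 7.0, 1.0, 8.0]  # hypothetical observations, n = 4, mean = 5
devs = deviations(data)
print(devs)        # [-1.0, 2.0, -4.0, 3.0]
print(sum(devs))   # 0.0, the one restriction

# Once n - 1 deviations are known, the last one is forced by the restriction
print(-sum(devs[:-1]) == devs[-1])  # True

# Sample variance: sum of squared deviations divided by n - 1 = 3 degrees of freedom
print(sum(d * d for d in devs) / (len(data) - 1), variance(data))  # 10.0 10.0
```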

  39. 95% Confidence Interval for a population mean, population σ NOT known (usual). A sample consists of 25 mice with a mean tumor size of 2.1 cm and SD = 1.9 cm. The interval runs from $\bar{x} - t_{24,0.975} \times s/\sqrt{n}$ to $\bar{x} + t_{24,0.975} \times s/\sqrt{n}$, where $t_{24,0.975} = 2.064$ (from tables of the t distribution): $2.1 - 2.064 \times 1.9/\sqrt{25}$ to $2.1 + 2.064 \times 1.9/\sqrt{25}$, i.e., 1.32 to 2.88 cm.
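
The tumor-size calculation as a function. The Python standard library has no t distribution, so the critical value 2.064 is passed in from the table, as on the slide; if SciPy is available, `scipy.stats.t.ppf(0.975, 24)` returns the same value (about 2.0639). The function name `t_interval` is illustrative.

```python
from math import sqrt

def t_interval(xbar, s, n, t_crit):
    """CI for mu with sigma unknown: xbar +/- t_crit * s / sqrt(n)."""
    half_width = t_crit * s / sqrt(n)
    return xbar - half_width, xbar + half_width

# Tumor example: n = 25 mice, xbar = 2.1 cm, s = 1.9 cm, t(24, 0.975) = 2.064 from tables
low, high = t_interval(2.1, 1.9, 25, 2.064)
print(f"95% CI: {low:.2f} to {high:.2f} cm")  # 95% CI: 1.32 to 2.88 cm
```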

  40. Confidence Interval for a Mean: estimate of mean tumor size = 2.1 cm; n = 25; 95% CI = 1.32 to 2.88 cm. Interpretation: - 95% of the intervals that could be constructed from repeated random samples of size 25 contain the true population mean μ - we are 95% confident that the population mean tumor size is between 1.32 and 2.88 cm.

  41. Factors affecting the length of the confidence interval $\bar{x} \pm t_{n-1,0.975} \times s/\sqrt{n}$, where $s/\sqrt{n}$ = SE. Sample size: as n increases, the length of the CI decreases. Variation: as s, which reflects the variability of the distribution of observations, increases, the length of the CI increases. Level of confidence: as the confidence desired increases (i.e., 90%, 95%, 99% CI), the length of the CI increases.
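
The three factors can be seen numerically by computing the CI length 2 × t × s/√n while varying one input at a time. The t critical values below are taken from standard t tables (assumed, not computed here), the base case reuses the tumor example (s = 1.9 cm, n = 25), and the other values of n and s are hypothetical.

```python
from math import sqrt

# Two-sided t critical values from standard t tables (assumed values)
T_975_BY_DF = {9: 2.262, 24: 2.064, 99: 1.984}           # 95% confidence
T_DF24_BY_LEVEL = {0.90: 1.711, 0.95: 2.064, 0.99: 2.797}  # df = 24

def ci_length(s, n, t_crit):
    """Total length of the interval xbar +/- t_crit * s / sqrt(n)."""
    return 2 * t_crit * s / sqrt(n)

by_n = [ci_length(1.9, n, T_975_BY_DF[n - 1]) for n in (10, 25, 100)]
by_s = [ci_length(s, 25, T_975_BY_DF[24]) for s in (1.0, 1.9, 3.0)]
by_level = [ci_length(1.9, 25, T_DF24_BY_LEVEL[c]) for c in (0.90, 0.95, 0.99)]

print([round(w, 2) for w in by_n])      # shrinks as n grows
print([round(w, 2) for w in by_s])      # grows with s
print([round(w, 2) for w in by_level])  # grows with the confidence level
```

Note that increasing n shortens the interval twice over: through √n in the SE and through a smaller t critical value as the degrees of freedom grow.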

  42. Standard deviation vs. standard error for describing data. Table 1. Characteristics of study subjects (n = 35)

  43. Standard deviation vs. standard error for describing data: If the purpose is to describe the data (e.g., to see whether subjects are typical), use the standard deviation, which describes the variability of the observations. If the purpose is to describe the results (outcome) of the study, use the standard error or a confidence interval, which describe the precision of the estimate of a population parameter. • Note: • one can be calculated from the other • indicate clearly whether you are reporting the SD or the SE
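
Because SE = SD/√n, either value can be recovered from the other when n is reported. A sketch with n = 35 as in Table 1 and a hypothetical SD of 12.0; the helper names are illustrative.

```python
from math import sqrt

def se_from_sd(sd: float, n: int) -> float:
    """Standard error of the mean from the SD: SD / sqrt(n)."""
    return sd / sqrt(n)

def sd_from_se(se: float, n: int) -> float:
    """SD recovered from a reported standard error: SE * sqrt(n)."""
    return se * sqrt(n)

n = 35     # number of subjects, as in Table 1
sd = 12.0  # hypothetical SD of some characteristic
se = se_from_sd(sd, n)
print(round(se, 2))                  # 2.03
print(round(sd_from_se(se, n), 6))   # 12.0
```

Since the SE is smaller than the SD by a factor of √35 (about 5.9) here, reporting the SE without labeling it can make data look far less variable than they are.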

  44. What Formal Statistical Inference Cannot Do: • tell you what population you should be interested in • ensure that you sampled properly from the population • determine whether the measurements made are biased (systematically wrong). What it DOES: • give a quantitative indication of how much random variation may have affected your results

  45. What/who are we trying to study?
      Target population: patients with rheumatoid arthritis | all voters
      Population sampled: patients admitted to a particular hospital | telephone listings
      Sample studied: sample of records of the above patients | sample of the above listings
