Module 7: Comparing Datasets and Comparing a Dataset with a Standard

trula

Presentation Transcript


  1. Module 7: Comparing Datasets and Comparing a Dataset with a Standard How different is enough?

  2. Concepts • Independence of each data point • Test statistics • Central Limit Theorem • Standard error of the mean • Confidence interval for a mean • Significance levels • How to apply in Excel module 7

  3. Independent Measurements • Each measurement must be independent (shake up the basket of tickets before each draw) • Examples of non-independent measurements: • Public responses to questions (one person’s answer affects the next person’s answer) • Samplers placed too close together, so each affects the air flow at the other

  4. Test Statistics • A test statistic is a number calculated from the data • In Student’s t test, for example, the statistic is t • If t >= 1.96 (and the population is normally distributed and the sample is large), you are in the right-hand tail of the curve • The inner portion of the curve, symmetrically between t = -1.96 on the left and t = 1.96 on the right, holds 95% of the data
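The link between the cutoff 1.96 and the 95% figure can be checked numerically. The sketch below is illustrative and not part of the original slides; it assumes a standard normal curve (the large-sample case) and uses Python's standard-library `statistics.NormalDist`. The variable names are made up for this example.

```python
from statistics import NormalDist

# Standard normal curve: mean 0, standard deviation 1 (assumed large-sample case).
std_normal = NormalDist()

# Fraction of the curve lying between -1.96 and +1.96.
central_fraction = std_normal.cdf(1.96) - std_normal.cdf(-1.96)

# Going the other way: the cutoff that leaves 2.5% in the right-hand tail.
upper_cutoff = std_normal.inv_cdf(0.975)

print(f"fraction between -1.96 and 1.96: {central_fraction:.4f}")
print(f"cutoff for the inner 95%: {upper_cutoff:.3f}")
```

For small samples the cutoff comes from Student's t distribution instead and is larger than 1.96.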

  5. Test statistics correspond to significance levels • “P” stands for percentile • The Pth percentile is the value below which a fraction p of the data falls, and above which 1-p falls

  6. Two Major Types of Questions • Comparing mean against a standard • Does air quality here meet NAAQS? • Comparing two datasets • Is air quality different in 2006 than 2005? • Better? • Worse?

  7. Comparing Mean to a Standard • Did air quality meet the CARB annual standard of 12 µg/m3?

  8. Central Limit Theorem (magic!) • Even if the underlying population is not normally distributed • If we repeatedly take datasets (samples) from it • The means of these datasets cluster around the true mean • The distribution of these means is approximately normal, and more closely so as the datasets get larger!
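A small simulation can make the Central Limit Theorem concrete. This is a sketch under stated assumptions, not something from the slides: the population is taken to be exponential (strongly skewed, with true mean 1 and standard deviation 1), and the dataset size of 50 and the count of 2000 datasets are arbitrary illustrative choices.

```python
import random
import statistics

random.seed(7)  # fixed seed so the sketch is repeatable

SAMPLE_SIZE = 50      # points per dataset (illustrative choice)
NUM_DATASETS = 2000   # number of datasets drawn (illustrative choice)
TRUE_MEAN = 1.0       # mean of an exponential population with rate 1

# Draw many datasets from a skewed population and keep each dataset's mean.
sample_means = []
for _ in range(NUM_DATASETS):
    dataset = [random.expovariate(1.0) for _ in range(SAMPLE_SIZE)]
    sample_means.append(statistics.fmean(dataset))

mean_of_means = statistics.fmean(sample_means)
spread_of_means = statistics.stdev(sample_means)
expected_spread = 1.0 / SAMPLE_SIZE ** 0.5  # sigma / sqrt(N), with sigma = 1

# If the means are roughly normal, about 95% fall within 1.96 spreads of the true mean.
within = sum(abs(m - TRUE_MEAN) <= 1.96 * expected_spread for m in sample_means)
fraction_within = within / NUM_DATASETS

print(f"mean of dataset means: {mean_of_means:.3f}")
print(f"spread of dataset means: {spread_of_means:.3f} (expected {expected_spread:.3f})")
print(f"fraction within 1.96 spreads of true mean: {fraction_within:.3f}")
```

Even though individual exponential values are far from normal, the dataset means should cluster near 1 with a spread close to sigma / square root of N.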

  9. Magic Concept #2: Standard Error of the Mean • Represents uncertainty around the mean • As sample size N gets bigger, the error gets smaller! • The bigger the N, the more tightly you can estimate the mean • LIKE a standard deviation, but it describes the spread of YOUR sample’s mean rather than of individual data points: standard error = sigma / square root of N
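How quickly the standard error shrinks with N can be seen with a few lines of arithmetic. The sketch below is illustrative: it borrows the standard deviation of 8.66 and N = 77 that appear later in this module, and the other sample sizes are made up to show that quadrupling N halves the error. The function name `standard_error` is an assumption of this example.

```python
from math import sqrt

def standard_error(sigma, n):
    """Standard error of the mean: standard deviation over square root of N."""
    return sigma / sqrt(n)

SIGMA = 8.66  # standard deviation used later in this module

# Each step quadruples N, which should halve the standard error.
for n in (77, 308, 1232):
    print(f"N={n:5d}  standard error={standard_error(SIGMA, n):.3f}")
```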

  10. For a “large” sample (N > 60), or when very close to a normal distribution… • Confidence interval for the population mean is: mean +- z x (sigma / square root of N) • Choice of z determines 90%, 95%, etc.

  11. For a “Small” Sample • Replace the Z value with a t value to get: mean +- t x (sigma / square root of N) • …where “t” comes from Student’s t distribution, and depends on sample size

  12. Student’s t Distribution vs. Normal Z Distribution

  13. Compare t and Z Values

  14. What happens as sample gets larger?

  15. What happens to CI as sample gets larger? For large samples Z and t values become almost identical, so CIs are almost identical
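The convergence of t and Z can be illustrated with a short comparison. This is a sketch, not the slides' own figures: the t values are standard published two-sided 95% table values hard-coded by degrees of freedom (N - 1), the standard deviation 8.66 is borrowed from later in this module, and the helper name `half_width` is made up for the example.

```python
from math import sqrt
from statistics import NormalDist

# Two-sided 95% critical values of Student's t, keyed by degrees of freedom
# (N - 1). Standard published table values, hard-coded rather than computed.
T_CRITICAL_95 = {5: 2.571, 10: 2.228, 30: 2.042, 60: 2.000, 120: 1.980}

Z_CRITICAL_95 = NormalDist().inv_cdf(0.975)  # about 1.960
SIGMA = 8.66  # illustrative: the standard deviation used later in this module

def half_width(critical_value, sigma, n):
    """Half-width of a confidence interval: critical value x standard error."""
    return critical_value * sigma / sqrt(n)

# Gap between the t-based and z-based half-widths at each sample size.
gaps = []
for dof, t_value in T_CRITICAL_95.items():
    n = dof + 1
    t_half = half_width(t_value, SIGMA, n)
    z_half = half_width(Z_CRITICAL_95, SIGMA, n)
    gaps.append(t_half - z_half)
    print(f"N={n:4d}  t-based +-{t_half:.2f}  z-based +-{z_half:.2f}")
```

The t-based interval is always the wider of the two, but the gap shrinks steadily as N grows.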

  16. First, graph and review data • Use box plot add-in • Evaluate spread • Evaluate how far apart mean and median are • (assume sampling design and QC are good)

  17. Excel Summary Stats

  18. Use the box-plot add-in • Calculate summary stats N=77

  19. Our Question • How confident can we be (90%? 95%?) that this mean of 14.78 is really greater than the standard of 12? • We saw that N = 77, and that the mean and median are not too different • So use z (normal) rather than t

  20. The mean is 14.8 +- what? • We know the equation for the CI is mean +- z x (sigma / square root of N) • Width of the confidence interval represents how sure we want to be that this CI includes the true mean • Now, decide how confident we want to be

  21. CI Calculation • For 95%, z = 1.96 (often rounded to 2) • Standard error (sigma / square root of N) = 8.66 / square root of 77 = 0.98 • Half-width of CI = 2 x 0.98, or about 2 • We can be 95% sure that the true mean is included in (mean +- 2): from 14.8 - 2 = 12.8 at the low end to 14.8 + 2 = 16.8 at the high end • This interval does NOT include 12!

  22. Excel can also calculate a confidence interval around the mean • Mean, plus and minus 1.93, is a 95% confidence interval that does NOT include 12!
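The confidence-interval arithmetic can be reproduced outside Excel. The sketch below is a minimal illustration using the module's figures (mean 14.78, sigma 8.66, N = 77); the function name `mean_confidence_interval` is hypothetical, and the z-based formula assumes the large-sample case chosen above.

```python
from math import sqrt
from statistics import NormalDist

def mean_confidence_interval(mean, sigma, n, confidence=0.95):
    """Return (low, high, half_width) of a z-based CI for the mean."""
    # Two-sided interval: split the leftover probability between both tails.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    half = z * sigma / sqrt(n)
    return mean - half, mean + half, half

low, high, half = mean_confidence_interval(mean=14.78, sigma=8.66, n=77)
print(f"95% CI: {low:.2f} to {high:.2f} (+-{half:.2f})")
print("interval excludes the standard of 12:", low > 12)
```

With z left unrounded at 1.96, the half-width comes out near the 1.93 that Excel reports, slightly tighter than the hand calculation that rounds z up to 2.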

  23. We know we are more than 95% confident, but how confident can we be that Ft Smith mean > 12? • Calculate where on curve our mean of 14.8 is, in terms of z (normal) score… • …or if N small, use t score

  24. To find where we are on the curve, calculate the test statistic… • Ft Smith mean = 14.8, sigma = 8.66, N = 77 • Calculate the test statistic, in this case the z factor (we decided we can use the z rather than the t distribution): z = (data’s mean - standard of 12) / (sigma / square root of N) • If N were < 60, the test statistic would be t, but calculated the same way

  25. Calculate z Easily • Our mean of 14.78 minus the standard of 12 (treat the true mean, mu, as equal to the standard) is the numerator (= 2.78, or about 2.8) • Standard error is sigma / square root of N = 0.98 (same as for CI) • So z = 2.78 / 0.98 = 2.84 • So where is this z on the curve? • Remember, at z = 3 we are to the right of more than 99% of the curve
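The z calculation can be written as a small function. This is an illustrative sketch with a hypothetical function name, using the module's figures. Note that it keeps the standard error unrounded (about 0.987), so it gives z of about 2.82 rather than the 2.84 obtained above with the standard error taken as 0.98.

```python
from math import sqrt

def z_statistic(sample_mean, standard, sigma, n):
    """Distance of the sample mean from the standard, in standard errors."""
    standard_error = sigma / sqrt(n)
    return (sample_mean - standard) / standard_error

z = z_statistic(sample_mean=14.78, standard=12, sigma=8.66, n=77)
print(f"z = {z:.2f}")
```

The small difference between 2.82 and 2.84 does not change the conclusion: either value sits between z = 2 and z = 3 on the curve.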

  26. Where on the curve? • Our z of 2.84 lies between Z = 2 and Z = 3 • So it is between about 97.7% (Z = 2) and 99.9% (Z = 3) probable that the true mean is greater than 12

  27. You can calculate exactly where on the curve, using Excel • Use the NORMSDIST function, with z • If z (or t) = 2.84, NORMSDIST in Excel yields 0.998, that is, a 99.8% probability that the true mean is greater than 12
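The same lookup can be done without Excel. The sketch below assumes that NORMSDIST returns the cumulative probability of the standard normal curve up to z, and mirrors it with Python's standard-library `statistics.NormalDist`; the wrapper name `normsdist` is chosen here only to echo the Excel function.

```python
from statistics import NormalDist

def normsdist(z):
    """Cumulative standard normal probability up to z."""
    return NormalDist(mu=0.0, sigma=1.0).cdf(z)

probability = normsdist(2.84)
print(f"{probability:.1%}")
```

This is a one-sided probability: the fraction of the curve lying to the left of z = 2.84.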
