
Introductory Statistics for Laboratorians dealing with High Throughput Data sets


uttara

Presentation Transcript


  1. Introductory Statistics for Laboratorians dealing with High Throughput Data sets Centers for Disease Control

  2. Problem 7: Dispersion • Prepare 2 line graphs, one for males and one for females using the data presented below. • Put both line graphs on the same axes.

  3. Problem 7: Dispersion

  4. Problem 7: Dispersion

  5. Problem 7: Dispersion

  6. Problem 7: Dispersion • How can we quantify the difference between the men and the women in this problem? • Compute the mean (average) for the men. • Compute the mean (average) for the women.

  7. Problem 7: Dispersion • What are the highest and lowest scores for the men? • What are the highest and lowest scores for the women? • Count the number of scores from lowest to highest. This number is called the Range of the scores. • In this case the Range doesn’t help us describe the difference between the males and the females. We need better measures of dispersion.
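The point of Problem 7 can be sketched in a few lines of Python. The slide data did not survive in this transcript, so the scores below are hypothetical values invented purely for illustration. The helper `inclusive_range` is also an assumed name; it reads "count the number of scores from lowest to highest" as highest minus lowest plus one, which only makes sense for whole-number scores.

```python
from statistics import mean

# Hypothetical scores, invented for illustration only (NOT the slide data).
men = [1, 5, 5, 5, 5, 9]
women = [1, 2, 4, 6, 8, 9]


def inclusive_range(scores):
    """Range as worded on the slide: how many whole-number score values
    lie between the lowest and the highest, counting both ends."""
    return max(scores) - min(scores) + 1


for label, scores in (("men", men), ("women", women)):
    print(label, "mean =", mean(scores), "range =", inclusive_range(scores))
```

With these made-up numbers both groups have the same mean and the same range, although the men cluster tightly around the mean and the women are spread out. That is the situation in which a better measure of dispersion is needed.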

  8. Problem 8: Dispersion • For the following data: • What are the highest and lowest scores? • What is the Range? (Count the number of scores from the lowest to the highest.) • What is the Mean (average)? • How far is each person from the Mean? (Fill in the column. Always subtract the mean from the score.)

  9. Problem 8: Dispersion Data Table

  10. Problem 8: Dispersion • Compute the “Sum of Squared Deviations from the Mean” (SS) for this data set (or sample or whatever you call it). • Compute the variance of the sample. • Compute the standard deviation of the sample.

  11. Dispersion Definitions • The range is the number of scores from the smallest to the largest. • Deviation Score = Score – Mean • Always subtract the mean from the score • Always preserve the sign (positive or negative) • The total of the deviation scores is always zero • Sum of Squares (SS) = Total of the squared deviation scores • Variance = SS/N • Standard Deviation = square root of variance
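The definitions above translate almost word for word into code. The following is a minimal sketch, not a reference implementation: the function names are chosen here for readability and the three scores are hypothetical. It follows the slide in dividing by N; many textbooks and software packages divide by N - 1 when estimating a population variance from a sample.

```python
import math


def deviation_scores(scores):
    """Score minus mean for every score, keeping the sign."""
    m = sum(scores) / len(scores)
    return [x - m for x in scores]


def sum_of_squares(scores):
    """SS: total of the squared deviation scores."""
    return sum(d * d for d in deviation_scores(scores))


def variance(scores):
    """Variance as defined on the slide: SS / N."""
    return sum_of_squares(scores) / len(scores)


def standard_deviation(scores):
    """Square root of the variance."""
    return math.sqrt(variance(scores))


scores = [4, 7, 10]  # hypothetical scores, for illustration only
print("deviations:", deviation_scores(scores))
print("sum of deviations:", sum(deviation_scores(scores)))
print("SS:", sum_of_squares(scores))
print("variance:", variance(scores))
print("standard deviation:", standard_deviation(scores))
```

Squaring is what keeps the positive and negative deviations from cancelling: the plain sum of the deviation scores is zero, while SS is not.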

  12. Standard Deviation • Surely there is an easier way to measure dispersion than using all this squaring and square rooting. • It turns out that the standard deviation is the distance from the mean to the exact point on a normal curve where the second derivative is zero (the inflection point). • If you were skiing down the slope, it would get steeper and steeper, then it would start to flatten out. That point lies one standard deviation from the mean. • That’s why it is the preferred measure of dispersion.
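The inflection-point claim can be checked numerically. The sketch below assumes a standard Normal curve (mean 0, standard deviation 1), a choice made here only for convenience, and uses the closed-form second derivative of the Normal density, f''(x) = f(x) * ((x - mu)^2 - sigma^2) / sigma^4. The function names are illustrative.

```python
import math


def normal_pdf(x, mu, sigma):
    """Height of the Normal density curve at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


def normal_pdf_second_derivative(x, mu, sigma):
    """Analytic second derivative of the Normal density at x."""
    return normal_pdf(x, mu, sigma) * ((x - mu) ** 2 - sigma ** 2) / sigma ** 4


mu, sigma = 0.0, 1.0  # assumed values: the standard Normal curve

for x in (mu, mu + 0.5 * sigma, mu + sigma, mu + 2 * sigma):
    print(f"x = {x:4.1f}  f''(x) = {normal_pdf_second_derivative(x, mu, sigma):+.4f}")
```

The second derivative is negative near the mean (the slope keeps getting steeper), zero one standard deviation out, and positive beyond that (the slope flattens). By symmetry the same happens on the other side of the mean.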

  13. Standard Deviation

  14. Problem 9 • Given the following collection of scores: 2, 3, 5, 6, 6, 8 • Calculate the range of the scores • Calculate the sum of squares • Calculate the variance • Calculate the standard deviation

  15. Problem 9 Data Table
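For readers who want to check their hand calculation, here is a short sketch that works Problem 9 in Python. Only the six scores come from the slide; the variable names are illustrative. Two conventions are shown side by side where they differ: the range as a count of score values versus the more common highest-minus-lowest, and the SS/N variance used in this deck versus the SS/(N - 1) sample variance returned by `statistics.variance`.

```python
import statistics

scores = [2, 3, 5, 6, 6, 8]  # the scores given in Problem 9

n = len(scores)
mean = sum(scores) / n
deviations = [x - mean for x in scores]   # score minus mean, sign preserved
ss = sum(d ** 2 for d in deviations)      # sum of squared deviations
variance_n = ss / n                       # variance as defined in this deck
sd_n = variance_n ** 0.5                  # standard deviation

inclusive_range = max(scores) - min(scores) + 1  # count of score values
simple_range = max(scores) - min(scores)         # highest minus lowest

# For comparison only: the N - 1 version found in many textbooks.
variance_n_minus_1 = statistics.variance(scores)

print("mean:", mean)
print("range (count):", inclusive_range, " range (high - low):", simple_range)
print("SS:", ss)
print("variance (SS/N):", variance_n)
print("standard deviation:", sd_n)
print("variance (SS/(N-1)):", variance_n_minus_1)
```

The mean is 5, so the deviation scores are -3, -2, 0, 1, 1 and 3, giving SS = 24, a variance of 4 and a standard deviation of 2 under the SS/N definition.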

  16. Normal distributions Normal (or Gaussian) distributions are a family of symmetrical, bell-shaped density curves defined by a mean μ (mu) and a standard deviation σ (sigma): N(μ, σ). The density curve is $f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}}$, where e = 2.71828… is the base of the natural logarithm and π = pi = 3.14159…

  17. A family of density curves Here the means are the same (μ = 15) while the standard deviations are different (σ = 2, 4, and 6). Here the means are different (μ = 10, 15, and 20) while the standard deviations are the same (σ = 3).
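Since the plots themselves are not part of this transcript, a rough numerical sketch can stand in for them. It evaluates the standard Normal density formula at the peak of each curve, using the means and standard deviations quoted above; the function name `normal_pdf` is an illustrative choice.

```python
import math


def normal_pdf(x, mu, sigma):
    """Height of the Normal density curve N(mu, sigma) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


# Same mean, different standard deviations: the peak gets lower as the
# curve gets wider, because the area under every curve must stay 1.
same_mean_peaks = [normal_pdf(15, 15, sigma) for sigma in (2, 4, 6)]

# Different means, same standard deviation: the curve only slides along
# the axis, so every peak has the same height.
same_sd_peaks = [normal_pdf(mu, mu, 3) for mu in (10, 15, 20)]

print("peaks for sd = 2, 4, 6:", [round(p, 4) for p in same_mean_peaks])
print("peaks for mean = 10, 15, 20:", [round(p, 4) for p in same_sd_peaks])
```

Changing the mean moves the curve; changing the standard deviation reshapes it.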

  18. All Normal curves N(μ, σ) share the same properties • About 68% of all observations are within 1 standard deviation (σ) of the mean (μ). • About 95% of all observations are within 2σ of the mean μ. • Almost all (99.7%) observations are within 3σ of the mean. Figure: a Normal curve N(μ, σ) = N(64.5, 2.5), with mean μ = 64.5, standard deviation σ = 2.5, and the inflection point marked. Reminder: μ (mu) is the mean of the idealized curve, while x̄ is the mean of a sample. σ (sigma) is the standard deviation of the idealized curve, while s is the s.d. of a sample.
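These percentages can be reproduced with the error function in Python's standard library, since for any Normal curve the probability of falling within k standard deviations of the mean is erf(k / sqrt(2)). The sketch below applies this to the N(64.5, 2.5) curve from the slide; the function name `prob_within` is an illustrative choice.

```python
import math


def prob_within(k):
    """Probability that a Normal observation lies within k standard
    deviations of the mean (the same for every Normal curve)."""
    return math.erf(k / math.sqrt(2))


mu, sigma = 64.5, 2.5  # the curve shown on the slide

for k in (1, 2, 3):
    low, high = mu - k * sigma, mu + k * sigma
    print(f"within {k} sd: {low} to {high}  ->  {prob_within(k):.1%}")
```

The exact figures are slightly different from the rounded rule of thumb (about 68.3%, 95.4% and 99.7%), which is why the slide says "about".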

  19. Definitions: Statistical Symbols • In an actual sample • Scores are represented by x • Mean = x̄ • Deviation Score = x − x̄ • Standard Deviation = s • Variance = s² • In a theoretical distribution (density curve) • Mean = μ • Standard Deviation = σ • Variance = σ²
