
Understanding Statistics: Mean, Variance, and Error Minimization

Learn how to calculate SS, variance, and standard deviation, and predict scores using the population mean. Explore sources of error and theoretical histograms in statistics.

Presentation Transcript


  1. Chapter 1-6 Review

  2. Chapter 1 • The mean, variance and minimizing error

  3. Example: Scores on a Psychology quiz

     To calculate SS, the variance, and the standard deviation: find the deviations from μ, square and sum them (SS), divide by N (σ²) and take a square root (σ).

     Student     X    X - μ    (X - μ)²
     John        7    +1.00    1.00
     Jennifer    8    +2.00    4.00
     Arthur      3    -3.00    9.00
     Patrick     5    -1.00    1.00
     Marie       7    +1.00    1.00

     ΣX = 30, N = 5, μ = 6.00
     Σ(X - μ) = 0.00
     Σ(X - μ)² = SS = 16.00
     σ² = SS/N = 3.20
     σ = √σ² = 1.79
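The calculation on this slide can be sketched in a few lines of Python. This is only an illustration: the variable names are chosen here and are not part of the presentation.

```python
import math

# Quiz scores from the example (John, Jennifer, Arthur, Patrick, Marie)
scores = [7, 8, 3, 5, 7]

n = len(scores)
mu = sum(scores) / n                     # population mean
deviations = [x - mu for x in scores]    # X - mu for each student
ss = sum(d ** 2 for d in deviations)     # SS: sum of squared deviations
variance = ss / n                        # sigma squared = SS / N
sigma = math.sqrt(variance)              # standard deviation

print(mu, sum(deviations), ss, variance, round(sigma, 2))
```

Dividing by N (rather than n - 1) treats the five scores as the whole population, as the slide does.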

  4. If you must make a prediction of someone’s score, say everyone will score precisely at the population mean, mu. • Without any other information, the mean is the best prediction. • The mean is an unbiased predictor or estimate, because the deviations around the mean sum to zero [Σ(X - μ) = 0.00]. • The mean has the smallest average squared distance from the other numbers in the distribution, so it is called a least squares predictor.

  5. Error is the squared amount you are wrong • When you predict that everyone will score at the mean, you are wrong. • The amount you are wrong is the difference between each score and the mean (X - μ). • But in statistics, we square the amount that we are wrong when we measure error.

  6. σ² is precisely how much error we make, on the average, when we predict that everyone will score right at the mean. • Another name for the variance (σ²) is the “mean square for error”.
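A quick check of the least squares idea, using the five quiz scores from the earlier example: predicting the mean should give a smaller average squared error than predicting any other value. The candidate predictions below are arbitrary and the function name is made up for this sketch.

```python
scores = [7, 8, 3, 5, 7]
mu = sum(scores) / len(scores)

def mean_squared_error(prediction, data):
    """Average squared amount we are wrong if we predict `prediction` for everyone."""
    return sum((x - prediction) ** 2 for x in data) / len(data)

# Try several candidate predictions, including the mean itself (6.0)
candidates = [4.0, 5.0, 6.0, 7.0, 8.0]
errors = {c: mean_squared_error(c, scores) for c in candidates}
best = min(errors, key=errors.get)

for c in candidates:
    print(c, errors[c])
print("best prediction:", best)
```

The error at the mean is the variance itself, which is why the variance is also called the mean square for error.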

  7. Why doesn’t everyone score precisely at the mean? • Two sources of error: • Random individual differences • Random measurement problems • Because people will always be different from each other and there are always random measurement problems, there will always be some error inherent in our predictions.

  8. Theoretical histograms

  9. Rolling a die: Rectangular distribution • The mean provides no information. • 120 rolls: how many of each number do you expect? [Figure: histogram with frequency (0 to 100) on the vertical axis and die faces 1 to 6 on the horizontal axis]

  10. Normal Curve

  11. J Curve • Occurs when socially normative behaviors are measured. Most people follow the norm, but there are always a few outliers.

  12. Principles of Theoretical Curves • Expected frequency = theoretical relative frequency × N • Expected frequencies are your best estimates because they are closer, on the average, than any other estimate when we square the error. • Law of Large Numbers: the more observations that we have, the closer the relative frequencies should come to the theoretical distribution.
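These principles can be tried out with a simulated fair die. The sketch below assumes a fair six-sided die, a fixed random seed, and roll counts picked only for illustration; none of these come from the slides except the 120 rolls.

```python
import random

FACES = 6
rng = random.Random(0)  # fixed seed so the run is repeatable

def expected_frequency(relative_frequency, n):
    """Expected frequency = theoretical relative frequency x N."""
    return relative_frequency * n

def observed_relative_frequencies(n_rolls):
    """Roll the simulated die n_rolls times and return each face's relative frequency."""
    counts = [0] * FACES
    for _ in range(n_rolls):
        counts[rng.randint(1, FACES) - 1] += 1
    return [count / n_rolls for count in counts]

# 120 rolls of a fair die: each face has theoretical relative frequency 1/6
expected_per_face = expected_frequency(1 / FACES, 120)
print("expected per face in 120 rolls:", expected_per_face)

# Largest gap between observed and theoretical relative frequency
largest_gap = {}
for n_rolls in (120, 12_000, 120_000):
    observed = observed_relative_frequencies(n_rolls)
    largest_gap[n_rolls] = max(abs(p - 1 / FACES) for p in observed)
    print(n_rolls, round(largest_gap[n_rolls], 4))
```

With more rolls the gap should usually shrink, which is the Law of Large Numbers at work, although any single run can deviate.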

  13. The Normal Curve

  14. The Z table and the curve • The Z table shows a cumulative relative frequency distribution. • That is, the Z table lists the proportion of the area under a normal curve between the mean and points further and further from the mean. • Because the two sides of the normal curve are exactly the same, the Z table shows only the cumulative proportion in one half of the curve. The highest proportion possible on the Z table is therefore .5000.

  15. KEY CONCEPT: The proportion of the curve between any two points on the curve represents the relative frequency of scores between those points.

  16. [Figure: Normal Curve. Vertical axis: frequency. Horizontal axis: the measure, marked in standard deviations (3 2 1 0 1 2 3) and in Z scores (-3.00 to +3.00), with the mean at Z = 0.00. Percentages: 34.13% of scores lie between the mean and 1 standard deviation on each side; 47.72% lie between the mean and 2 standard deviations on each side. Percentiles: 97.72% of scores lie below Z = +2.00.]

  17. Z scores • A Z score indicates the position of a raw score in terms of standard deviations from the mean on the normal curve. • In effect, Z scores convert any measure (inches, miles, milliseconds) to a standard measure of standard deviations. • Z scores have a mean of 0 and a standard deviation of 1.

  18. Calculating Z scores
      What is the Z score for someone 6’ tall, if the mean is 5’8” and the standard deviation is 3 inches?

      Z = (score - mean) / standard deviation
        = (6’ - 5’8”) / 3”
        = (72 - 68) / 3
        = 4 / 3
        = 1.33

  19. What is the Z score for a daily production of 2100, given a mean of 2180 units and a standard deviation of 50 units?

      Z score = (2100 - 2180) / 50 = -80 / 50 = -1.60

      [Figure: normal curve of production. Vertical axis: frequency. Horizontal axis: 2030, 2080, 2130, 2180, 2230, 2280, 2330 units, corresponding to -3 to +3 standard deviations, with 2100 marked.]
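Both Z score examples (height and daily production) follow the same formula, sketched here as a small Python function. The function and variable names are illustrative only.

```python
def z_score(score, mean, sd):
    """Position of a raw score in standard deviations from the mean."""
    return (score - mean) / sd

# Height example: 6'0" = 72 inches, mean 5'8" = 68 inches, SD = 3 inches
height_z = z_score(72, 68, 3)

# Production example: 2100 units, mean 2180 units, SD = 50 units
production_z = z_score(2100, 2180, 50)

print(round(height_z, 2), round(production_z, 2))
```

Converting feet and inches to a single unit before subtracting is the only step that needs care.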

  20. Common Z table scores. We have already seen these!

      Z score    Proportion mu to Z
      0.00       .0000
      1.00       .3413
      2.00       .4772
      3.00       .4987
      1.960      .4750  (× 2 = 95%)
      2.576      .4950  (× 2 = 99%)

  21. CPE - 3.4 - Calculate percentiles
      Add the area to .5000 if Z > 0; subtract it from .5000 if Z < 0.

      Z score   Area mu to Z   Add/Sub           Proportion   Percentile
      -2.22     .4868          .5000 - .4868     .0132        1st
      -0.68     .2517          .5000 - .2517     .2483        25th
      +2.10     .4821          .5000 + .4821     .9821        98th
      +0.33     .1293          .5000 + .1293     .6293        63rd
      +0.00     .0000          .5000 + .0000     .5000        50th
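The table lookups can be approximated in code. The sketch below replaces the printed Z table with the standard library's error function (`math.erf`), which is a substitution made here, not the method used in the presentation; the helper names are invented for the example.

```python
import math

def area_mean_to_z(z):
    """Proportion of the normal curve between the mean and z (what the Z table lists)."""
    return 0.5 * math.erf(abs(z) / math.sqrt(2))

def cumulative_proportion(z):
    """Add the area to .5000 if z is above the mean, subtract it if z is below."""
    area = area_mean_to_z(z)
    return 0.5 + area if z >= 0 else 0.5 - area

for z in (-2.22, -0.68, 2.10, 0.33, 0.00):
    print(z, round(area_mean_to_z(z), 4), round(cumulative_proportion(z), 4))
```

Multiplying the cumulative proportion by 100 and rounding gives the percentile.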

  22. Proportion of scores between two points on opposite sides of the mean
      Proportion mu to Z for -1.06 = .3554
      Proportion mu to Z for +0.37 = .1443

      Z1      Z2      Area mu to Z1   Area mu to Z2   Add/Sub Z1 to Z2   Total area   Percent
      -1.06   +0.37   .3554           .1443           Add                .4997        49.97%

      [Figure: normal curve, frequency vs. Z scores from -3.00 to +3.00, showing the percent between the two scores.]

  23. Proportion of scores between two points on the same side of the mean
      Proportion mu to Z for 1.50 = .4332
      Proportion mu to Z for 1.12 = .3686

      Z1      Z2      Area mu to Z1   Area mu to Z2   Add/Sub Z1 to Z2   Total area   Percent
      +1.50   +1.12   .4332           .3686           Sub                .0646        6.46%

      [Figure: normal curve, frequency vs. Z scores from -3.00 to +3.00, showing the percent between the two scores.]
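The add-or-subtract rule from the last two slides can be written as one function. This is a sketch that again uses `math.erf` in place of the printed Z table, so results may differ from the table arithmetic in the fourth decimal place because the table values are rounded before they are combined.

```python
import math

def area_mean_to_z(z):
    """Proportion of the normal curve between the mean and z."""
    return 0.5 * math.erf(abs(z) / math.sqrt(2))

def proportion_between(z1, z2):
    """Add the two areas if the Z scores straddle the mean, otherwise subtract."""
    a1, a2 = area_mean_to_z(z1), area_mean_to_z(z2)
    if z1 * z2 < 0:          # opposite sides of the mean
        return a1 + a2
    return abs(a1 - a2)      # same side of the mean

opposite_sides = proportion_between(-1.06, 0.37)
same_side = proportion_between(1.50, 1.12)
print(round(opposite_sides, 4), round(same_side, 4))
```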

  24. Translating to and from Z scores, the standard error of the mean and confidence intervals

  25. Definition • If we know mu and sigma, any score can be translated into a Z score: Z = (X - μ) / σ = (score - mean) / standard deviation

  26. Definition • Conversely, as long as you know mu and sigma, a Z score can be translated into any other type of score: Score = μ + (Z * σ)

  27. Scale scores • Z scores have been standardized so that they always have a mean of 0.00 and a standard deviation of 1.00. • Other scales use other means and standard deviations. Examples: IQ: μ = 100, σ = 15 • SAT/GRE: μ = 500, σ = 100 • Normal scores: μ = 50, σ = 10

  28. Convert Z scores to IQ scores

      Z        σ     (Z * σ)   μ      μ + (Z * σ)
      +2.67    15    40.05     100    140
      -0.60    15    -9.00     100    91
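The conversion Score = mu + (Z * sigma) can be sketched as a one-line function. The names are illustrative, and the second Z score is taken as -0.60, the value consistent with a product of -9.00.

```python
def scale_score(z, scale_mean, scale_sd):
    """Translate a Z score into a score on a scale with the given mean and SD."""
    return scale_mean + z * scale_sd

# IQ scale: mean 100, standard deviation 15
high_iq = scale_score(2.67, 100, 15)
low_iq = scale_score(-0.60, 100, 15)

print(round(high_iq), round(low_iq))
```

The same function works for SAT/GRE scores by passing a mean of 500 and a standard deviation of 100.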

  29. Translate to a Z score first, then to any other type of score
      Convert IQ scores of 120 & 80 to percentiles.

      X      μ      (X - μ)   σ     (X - μ)/σ
      120    100    20.0      15    1.33

      Proportion mu to Z = .4082; .5000 + .4082 = .9082 = 91st percentile.
      Similarly, 80 gives Z = -1.33: .5000 - .4082 = .0918 = 9th percentile.
      Convert an IQ score of 100 to a percentile. An IQ of 100 is right at the mean and that’s the 50th percentile.

  30. SAT / GRE scores - Examples
      How many people out of 400 can be expected to score between 550 and 650 on the SAT?

      SAT    μ      (X - μ)   σ      (X - μ)/σ
      550    500    50        100    0.50
      650    500    150       100    1.50

      Proportion mu to Z 0.50 = .1915
      Proportion mu to Z 1.50 = .4332
      Proportion difference = .4332 - .1915 = .2417
      Expected people = .2417 * 400 = 96.68
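The SAT example chains three steps: convert both scores to Z scores, find the proportion between them, and multiply by the number of people. A sketch follows, with `math.erf` standing in for the Z table and function names invented for illustration; the result differs slightly from 96.68 because the table values are rounded to four places.

```python
import math

def cumulative(z):
    """Proportion of a normal curve that lies below z."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def expected_count(low, high, mean, sd, n_people):
    """Expected number of people scoring between low and high."""
    z_low = (low - mean) / sd
    z_high = (high - mean) / sd
    proportion = cumulative(z_high) - cumulative(z_low)
    return proportion * n_people

expected = expected_count(550, 650, mean=500, sd=100, n_people=400)
print(round(expected, 1))
```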

  31. Midterm type problems: Double translations
      On the verbal portion of the Wechsler IQ test, John scores 35 correct responses. The mean on this part of the IQ test is 25.00 and the standard deviation is 6.00. What is John’s verbal IQ score?

      Raw score   μ (raw)   (X - μ)   σ (raw)   Z      Scale μ   Scale σ   Scale score
      35          25.00     10.00     6.00      1.67   100       15        125

      Z score = 10.00 / 6.00 = 1.67
      Scale score = 100 + (1.67 * 15) = 125
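A double translation is just the two earlier formulas applied in sequence: raw score to Z score, then Z score to scale score. A minimal sketch, with a hypothetical function name:

```python
def double_translation(raw, raw_mean, raw_sd, scale_mean, scale_sd):
    """Translate a raw score to a Z score, then to a score on another scale."""
    z = (raw - raw_mean) / raw_sd
    return z, scale_mean + z * scale_sd

# John's 35 correct responses (raw mean 25.00, raw SD 6.00) on the IQ scale
z, verbal_iq = double_translation(35, 25.00, 6.00, 100, 15)
print(round(z, 2), round(verbal_iq))
```

Keeping Z unrounded until the end avoids the small drift that comes from rounding Z to 1.67 first.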

  32. The standard error = the standard deviation divided by the square root of n, the sample size

  33. Let’s see how it works • We know that the mean of SAT/GRE scores = 500 and sigma = 100 • So 68.26% of individuals will score between 400 and 600 and 95.44% will score between 300 and 700 • But if we take random samples of SAT scores, with 4 people in each sample, the standard error of the mean is sigma divided by the square root of the sample size = 100/2=50. • 68.26% of the sample means will be within 1.00 standard error of the mean from mu and 95.44% will be within 2.00 standard errors of the mean from mu • So, 68.26% of the sample means (n=4) will be between 450 and 550 and 95.44% will fall between 400 and 600

  34. What happens as n increases? • The sample means get closer to each other and to mu. • Their average squared distance from mu equals the variance divided by the size of the sample (sigma squared / n), so the standard error of the mean is sigma divided by the square root of n. • The law of large numbers operates: the pattern of actual means approaches the theoretical frequency distribution. In this case, the sample means fall into a more and more perfect normal curve. • These facts are called “The Central Limit Theorem” and can be proven mathematically.
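A small simulation can show how the spread of sample means shrinks as n grows. The sketch assumes a normally distributed population with the SAT/GRE parameters (mu = 500, sigma = 100), a fixed random seed, and sample sizes and counts chosen only for illustration.

```python
import random
import statistics

MU, SIGMA = 500, 100        # SAT/GRE parameters used in the slides
rng = random.Random(42)     # fixed seed so the run is repeatable

def sd_of_sample_means(n, n_samples=5000):
    """Draw n_samples random samples of size n; return the SD of their means."""
    means = [
        sum(rng.gauss(MU, SIGMA) for _ in range(n)) / n
        for _ in range(n_samples)
    ]
    return statistics.pstdev(means)

results = {n: sd_of_sample_means(n) for n in (4, 25, 100)}
for n, observed in results.items():
    print(n, round(observed, 2), "sigma / sqrt(n) =", SIGMA / n ** 0.5)
```

The observed spread should land close to sigma divided by the square root of n, though not exactly on it, since each run uses a finite number of samples.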

  35. Let’s make the samples larger • Take random samples of SAT scores, with 400 people in each sample. The standard error of the mean is sigma divided by the square root of 400 = 100/20 = 5.00. • 68.26% of the sample means will be within 1.00 standard error of the mean from mu and 95.44% will be within 2.00 standard errors of the mean from mu. • So, 68.26% of the sample means (n=400) will be between 495 and 505 and 95.44% will fall between 490 and 510. • Take random samples of SAT scores, with 2500 people in each sample. The standard error of the mean is sigma divided by the square root of 2500 = 100/50 = 2.00. • 68.26% of the sample means will be within 1.00 standard error of the mean from mu and 95.44% will be within 2.00 standard errors of the mean from mu. • So, 68.26% of the sample means (n=2500) will be between 498 and 502 and 95.44% will fall between 496 and 504.
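The standard error arithmetic for the three sample sizes discussed (4, 400, and 2500) can be sketched as follows. The function names are made up for the example.

```python
import math

def standard_error(sigma, n):
    """Standard error of the mean: sigma divided by the square root of n."""
    return sigma / math.sqrt(n)

def interval_around_mu(mu, sigma, n, n_standard_errors):
    """Interval from mu minus to mu plus the given number of standard errors."""
    margin = n_standard_errors * standard_error(sigma, n)
    return mu - margin, mu + margin

for n in (4, 400, 2500):
    se = standard_error(100, n)
    one_se = interval_around_mu(500, 100, n, 1.00)   # holds 68.26% of sample means
    two_se = interval_around_mu(500, 100, n, 2.00)   # holds 95.44% of sample means
    print(n, se, one_se, two_se)
```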

  36. CONFIDENCE INTERVALS

  37. We want to define two intervals around mu: • One interval into which 95% of the sample means will fall. • Another interval into which 99% of the sample means will fall.

  38. 95% of sample means will fall in a symmetrical interval around mu that goes from 1.960 standard errors below mu to 1.960 standard errors above mu • A way to write that fact in statistical language is: CI.95: mu ± 1.960 sigmaX-bar or CI.95: mu - 1.960 sigmaX-bar < X-bar < mu + 1.960 sigmaX-bar

  39. As I said, 95% of sample means will fall in a symmetrical interval around mu that goes from 1.960 standard errors below mu to 1.960 standard errors above mu • Take samples of SAT/GRE scores (n=400) • Standard error of the mean is sigma divided by the square root of n = 100/√400 = 100/20.00 = 5.00 • 1.960 standard errors of the mean with such samples = 1.960 (5.00) = 9.80 • So 95% of the sample means can be expected to fall in the interval 500 ± 9.80 • 500 - 9.80 = 490.20 and 500 + 9.80 = 509.80 • CI.95: mu ± 1.960 sigmaX-bar = 500 ± 9.80 or CI.95: 490.20 < X-bar < 509.80

  40. 99% of sample means will fall within 2.576 standard errors from mu • Take the same samples of SAT/GRE scores (n=400) • The standard error of the mean is sigma divided by the square root of n = 100/20.00 = 5.00 • 2.576 standard errors of the mean with such samples = 2.576 (5.00) = 12.88 • So 99% of the sample means can be expected to fall in the interval 500 ± 12.88 • 500 - 12.88 = 487.12 and 500 + 12.88 = 512.88 • CI.99: mu ± 2.576 sigmaX-bar = 500 ± 12.88 or CI.99: 487.12 < X-bar < 512.88
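The 95% and 99% intervals differ only in the critical Z value, so one function covers both. This is a sketch under the slides' assumptions (known mu and sigma); the function name and dictionary are illustrative.

```python
import math

# Critical Z values from the Z table
Z_CRITICAL = {0.95: 1.960, 0.99: 2.576}

def confidence_interval(mu, sigma, n, level):
    """Interval around mu expected to contain `level` of the sample means."""
    standard_error = sigma / math.sqrt(n)
    margin = Z_CRITICAL[level] * standard_error
    return mu - margin, mu + margin

ci_95 = confidence_interval(500, 100, 400, 0.95)
ci_99 = confidence_interval(500, 100, 400, 0.99)
print([round(x, 2) for x in ci_95], [round(x, 2) for x in ci_99])
```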

  41. Chapter 5-Samples

  42. REPRESENTATIVE ON EVERY MEASURE • The mean of the random sample will be similar to the mean of the population. • The same holds for weight, IQ, ability to remember faces or numbers, the size of their livers, self-confidence, etc., etc., etc. ON EVERY MEASURE THAT EVER WAS OR CAN BE AND ON EVERY STATISTIC WE COMPUTE, SAMPLE STATISTICS ARE LEAST SQUARED, UNBIASED, CONSISTENT ESTIMATES OF THEIR POPULATION PARAMETERS.

  43. The sample mean • The sample mean is called X-bar and is represented by X̄. • X̄ = ΣX / n • X̄ is the best estimate of μ, because it is a least squares, unbiased, consistent estimate.

  44. Consistent estimation • Population is 1320 students taking a test. μ is 72.00, σ = 12. • Let’s randomly sample one student at a time and see what happens.

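  45. [Figure: normal curve of test scores. Vertical axis: frequency. Horizontal axis: scores 36, 48, 60, 72, 84, 96, 108, corresponding to -3 to +3 standard deviations around the mean of 72.]
      Sample scores: 102, 72, 66, 76, 66, 78, 69, 63
      Means (after 2, 3, ... 8 scores): 87, 80, 79, 76.4, 76.7, 75.6, 74.0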

  46. More scores that are free to vary = better estimates • Each time you add a score to your sample, it is most likely to pull the sample mean closer to mu, the population mean. • Any particular score may pull it further from mu. But, on the average, as you add more and more scores, the odds are that you will be getting closer to mu. • Remember, if your sample was everybody in the population, then the sample mean must be exactly mu.
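The running means in the sampling example can be reproduced by recomputing the sample mean each time a score is added. The sketch uses the eight sampled scores from the example and a population mean of 72; the variable names are chosen here.

```python
MU = 72.0
sample_scores = [102, 72, 66, 76, 66, 78, 69, 63]

running_means = []
total = 0
for count, score in enumerate(sample_scores, start=1):
    total += score
    running_means.append(total / count)   # sample mean after `count` scores

for count, mean in enumerate(running_means, start=1):
    print(count, round(mean, 1), "distance from mu:", round(abs(mean - MU), 1))
```

In this particular sample the mean drifts from 102 toward 72 as scores are added, although individual scores can briefly pull it away.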

  47. Consistent estimators We call estimates that improve when you add scores to the sample consistent estimators. Recall that the statistics that we will learn are: consistent, least squares, and unbiased.

  48. Estimated variance • Our best estimate of σ² is called the mean square for error and is represented by MSW. • SSW = Σ(X - X̄)² • MSW = Σ(X - X̄)² / (n - k) • MSW is a least squares, unbiased, consistent estimate.

  49. Estimated standard deviation • The least squares, unbiased, consistent estimate of σ is called s. • s = √MSW

  50. Estimating mu and sigma – single sample

      S#    X    X̄       (X - X̄)   (X - X̄)²
      A     6    6.00    0.00      0.00
      B     8    6.00    2.00      4.00
      C     4    6.00    -2.00     4.00

      ΣX = 18, N = 3, X̄ = 6.00
      Σ(X - X̄) = 0.00
      Σ(X - X̄)² = 8.00 = SSW
      MSW = SSW/(n - k) = 8.00/2 = 4.00
      s = √MSW = 2.00
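The single-sample estimate can be sketched in Python. Here k is taken to be the number of groups (1 for a single sample), which is how the divisor n - k = 2 arises in the example; the variable names are illustrative.

```python
import math

scores = [6, 8, 4]   # subjects A, B, C
k = 1                # number of groups: a single sample

n = len(scores)
x_bar = sum(scores) / n                        # sample mean, estimate of mu
ssw = sum((x - x_bar) ** 2 for x in scores)    # sum of squares within
msw = ssw / (n - k)                            # estimated variance
s = math.sqrt(msw)                             # estimated standard deviation

print(x_bar, ssw, msw, s)
```

Dividing by n - k rather than n is what distinguishes this sample estimate from the population variance computed earlier with SS / N.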
