Statistics Review II
Empirical Rule
Many naturally occurring variables have bell-shaped distributions. That is, their histograms take a symmetrical and unimodal shape (statistics is open-minded as always). E.g., shoe size, adult waistline, self-esteem, annual temperature.
Empirical rule: If the variable's histogram of values is approximately bell-shaped, then (with Ȳ as the mean):
• About 68% of the values are between Ȳ – 1 s.d. and Ȳ + 1 s.d.
• About 95% of the values are between Ȳ – 2 s.d. and Ȳ + 2 s.d.
• All or nearly all values (about 99.7%) are between Ȳ – 3 s.d. and Ȳ + 3 s.d.
Empirical Rule (Body Pile: 100% of Cases)
[Figure: bell curve with M = 100 and s.d. = 15. Values 85 to 115 span ±1 s.d., 70 to 130 span ±2 s.d., and 55 to 145 span ±3 s.d.]
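To check the rule against the slide's curve (M = 100, s.d. = 15), here is a small sketch using Python's standard-library `statistics.NormalDist` (Python 3.8+). The variable names are illustrative only. It lists the ±1, ±2, and ±3 s.d. intervals and the exact share of a normal curve inside each.

```python
from statistics import NormalDist

# Curve from the slide: M = 100, s.d. = 15
mean, sd = 100, 15
curve = NormalDist(mu=mean, sigma=sd)

intervals = {}
shares = {}
for k in (1, 2, 3):
    low, high = mean - k * sd, mean + k * sd
    intervals[k] = (low, high)
    # Exact share of a normal curve lying between low and high
    shares[k] = curve.cdf(high) - curve.cdf(low)
    print(f"±{k} s.d.: {low} to {high} holds {shares[k]:.1%} of values")
```

The printed shares come out near 68.3%, 95.4%, and 99.7%, which is where the rounded "68%, 95%, nearly all" figures come from.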
Normal Probability Distribution
The normal probability distribution: a continuous probability distribution in which the horizontal axis represents all possible values of a variable and the vertical axis represents the probability density of those values; the area under the curve between two values is the probability of a value falling in that range. Values are clustered around the mean in a symmetrical, unimodal pattern known as the bell-shaped curve or normal curve.
Note: All five distributions in the graph are normal probability distributions, because the height represents probability (density) for values rather than counts of cases.
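A rough illustration of "height versus area" (not part of the original slides): the sketch codes the usual normal density formula f(y) = e^(−(y − μ)² / (2σ²)) / (σ√(2π)) and approximates areas with a simple trapezoid rule. The helpers `normal_pdf` and `area_under_curve` are hypothetical names chosen for this example. A very narrow curve has a height above 1, so height alone cannot be a probability, while the area within one s.d. matches the 68% from the empirical rule.

```python
import math

def normal_pdf(y, mu, sigma):
    """Height of the normal curve at y: a probability density, not a probability."""
    return math.exp(-((y - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def area_under_curve(a, b, mu, sigma, steps=10_000):
    """Approximate P(a <= Y <= b) with the trapezoid rule."""
    width = (b - a) / steps
    total = 0.5 * (normal_pdf(a, mu, sigma) + normal_pdf(b, mu, sigma))
    for i in range(1, steps):
        total += normal_pdf(a + i * width, mu, sigma)
    return total * width

# A narrow curve (s.d. = 0.1) peaks at about 3.99, well above 1
print(normal_pdf(0, 0, 0.1))
# Area within one s.d. of the mean is about 0.6827
print(area_under_curve(-1, 1, 0, 1))
```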
Equivalence of ALL Normal Probability Distributions
No matter what the actual s.d. (σ) value is, the proportion of cases under the curve between the mean (μ) – 1 s.d. and μ + 1 s.d. is the same (68%). The same is true of μ ± 2 s.d. (95%) and μ ± 3 s.d. (almost all cases).
Because all normal distributions are equivalent in this way, they are often described in terms of the Standard Normal Curve, where mean = 0 and s.d. = 1, and distance from the mean is measured in standard deviations, referred to as "z".
[Figure: two normal curves, one with s.d. = 6 and one with s.d. = 12; on each, z = 1 marks one s.d. above the mean.]
Z
Write this down. Memorize it. Let it be your dying words . . .
"'Z' is the number of standard deviations away from the mean of a normal curve."
Dream about it. zzzzzzzz….
[Figure: standard normal curves with z marked from -3 to 3; 68% of cases lie between z = -1 and z = 1.]
Z-score Conversion
$Z = \frac{Y - \mu}{\sigma} = \frac{\text{deviation}}{\text{standard deviation}}$
It is like having two rulers beneath the normal curve: one for data values, the second for z-scores. I call this the "flippy ruler!"
IQ: μ = 100, σ = 15. What is the z for 100? 145? 70? And 105?
• $Z = \frac{100 - 100}{15} = 0$
• $Z = \frac{145 - 100}{15} = \frac{45}{15} = 3$
• $Z = \frac{70 - 100}{15} = \frac{-30}{15} = -2$
• $Z = \frac{105 - 100}{15} = \frac{5}{15} \approx 0.33$
Values: 55 70 85 100 115 130 145
Z-scores: -3 -2 -1 0 1 2 3
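The conversion formula fits in one line of code. The sketch below is a minimal version; `z_score` is a hypothetical helper name, and it simply computes (Y – μ) / σ for the IQ values on the slide.

```python
def z_score(y, mean, sd):
    """Number of standard deviations y lies above (+) or below (-) the mean."""
    return (y - mean) / sd

# IQ scale from the slide: mean 100, s.d. 15
for iq in (100, 145, 70, 105):
    print(iq, round(z_score(iq, 100, 15), 2))
```

This prints z-scores of 0.0, 3.0, -2.0, and 0.33, matching the hand calculations.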
Practice with the Normal Kurve
Kitten Example: Assume this: the mean cost of a kitten is $500, and the standard deviation is $100. Is that too low?
$Z = \frac{Y - \mu}{\sigma}$
How many z's up or down is a $500 kitten? $600? $200? $550?
• $Z = \frac{500 - 500}{100} = 0$
• $Z = \frac{600 - 500}{100} = \frac{100}{100} = 1$
• $Z = \frac{200 - 500}{100} = \frac{-300}{100} = -3$
• $Z = \frac{550 - 500}{100} = \frac{50}{100} = 0.5$
Kittens: μ = $500, σ = $100
Values ($): 200 300 400 500 600 700 800
Z-scores: -3 -2 -1 0 1 2 3
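The "flippy ruler" works in both directions: value to z, and z back to value (μ + z·σ). Here is a sketch for the kitten prices, assuming the slide's μ = $500 and σ = $100; `z_score` and `value_from_z` are hypothetical helper names.

```python
def z_score(y, mean, sd):
    """Value -> position in standard deviations."""
    return (y - mean) / sd

def value_from_z(z, mean, sd):
    """Flip the ruler: z-score -> value on the original scale."""
    return mean + z * sd

KITTEN_MEAN, KITTEN_SD = 500, 100  # dollars, as assumed on the slide

# Build the two rulers: dollar values keyed by z-score
ruler = {z: value_from_z(z, KITTEN_MEAN, KITTEN_SD) for z in range(-3, 4)}
print(ruler)  # {-3: 200, -2: 300, -1: 400, 0: 500, 1: 600, 2: 700, 3: 800}

for price in (500, 600, 200, 550):
    print(price, z_score(price, KITTEN_MEAN, KITTEN_SD))
```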
Applied Empirical Rule
The curve can be used with the empirical rule to determine relative position in a group. The mean cost of kittens is $500, and the standard deviation is $100. Now, use the empirical rule…
What percentage of kitties will cost more or less than my preferred kitten price of $300? Use the flippy ruler and apply the empirical rule. Z = ?
Kitties: μ = $500, σ = $100
[Figure: normal curve of kitten prices with 68% of cases between $400 and $600 and 2.5% in each tail beyond $300 and $700.]
Values ($): 200 300 400 500 600 700 800
Z-scores: -3 -2 -1 0 1 2 3
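One way to work the $300 question, sketched under the slide's assumptions (μ = $500, σ = $100): convert to z, apply the empirical rule, and compare with the exact normal-curve value from `statistics.NormalDist`. The exact comparison is an addition for illustration, not something the slide asks for.

```python
from statistics import NormalDist

mean, sd = 500, 100
price = 300
z = (price - mean) / sd  # -2.0

# Empirical rule: 95% lie within ±2 s.d., so the other 5% splits evenly between the two tails
below_rule = (1 - 0.95) / 2   # about 0.025 -> roughly 2.5% of kittens cost less
above_rule = 1 - below_rule   # about 0.975 -> roughly 97.5% cost more

# Exact share of a normal curve below $300, for comparison
below_exact = NormalDist(mean, sd).cdf(price)
print(z, round(below_rule, 3), round(above_rule, 3), round(below_exact, 4))
```

The exact value (about 0.0228) is close to the empirical rule's 2.5%, which is why the rule works well as a quick mental shortcut.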
Z-score Conversion
$Z = \frac{Y - \mu}{\sigma}$
Another use: to compare different normal curves, it helps to convert data values into z-scores. It is like having a flippy ruler beneath each normal curve: z marks equivalent locations on each scale. Like comparing cats and dogs.
[Figure: a dog-price curve (scale unknown: "???") above a cat-price curve with values $200 to $800 and z-scores -3 to 3, the "flippy ruler!"]
Comparing two distributions by using Z-scores
Imagine that your friend prefers dogs over cats. You want to know who is more willing to purchase an expensive breed. You could convert her dog price into a cat price by using z-scores.
The average cat costs $500 with a standard deviation of $100. You paid $600.
Your friend's dog cost $2,000. The average dog costs $1,500 with a s.d. of $250.
What is the equivalent cat price? Was she more willing to spend a lot than you?
Dogs (values $): 750 1000 1250 1500 1750 2000 2250
Cats (values $): 200 300 400 500 600 700 800
Z-scores: -3 -2 -1 0 1 2 3
Comparing two distributions by using Z-scores
• $\frac{Y_1 - M_1}{s.d._1} = z = \frac{Y_2 - M_2}{s.d._2}$
• $\frac{Y_1 - M_1}{s.d._1} = z = \frac{\$2000 - \$1500}{\$250}$
• $\frac{Y_1 - M_1}{s.d._1} = z = 2$
• $\frac{Y_1 - \$500}{\$100} = 2$
• $Y_1 - \$500 = \$200$
• $Y_1 = \$700$
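The same algebra as a short sketch: find the dog's z-score, then read that z off the cat ruler. The helper names are hypothetical; the means and s.d.s are the ones given on the slide.

```python
def z_score(y, mean, sd):
    return (y - mean) / sd

def value_from_z(z, mean, sd):
    return mean + z * sd

DOG_MEAN, DOG_SD = 1500, 250
CAT_MEAN, CAT_SD = 500, 100

z_dog = z_score(2000, DOG_MEAN, DOG_SD)                  # 2.0
cat_equivalent = value_from_z(z_dog, CAT_MEAN, CAT_SD)   # 700.0
your_z = z_score(600, CAT_MEAN, CAT_SD)                  # 1.0
print(cat_equivalent, z_dog > your_z)
```

This prints `700.0 True`: the friend's dog sits 2 s.d. above its mean, while a $600 cat sits only 1 s.d. above its mean, so she was the bigger spender relative to her market.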
Normal Curve: Empirical Rule Again
Your friend's dog cost $2,000. The average dog costs $1,500 with a s.d. of $250. What is the equivalent cat price? What percentage of people get dogs that cost less than your friend's?
Dogs (values $): 750 1000 1250 1500 1750 2000 2250
Cats (values $): 200 300 400 500 600 700 800
Z-scores: -3 -2 -1 0 1 2 3
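A sketch of the percentage question, assuming the slide's dog prices (mean $1,500, s.d. $250). With z = 2, the empirical rule puts 50% of cases below the mean plus half of the 95% band (47.5%) between the mean and z = 2. The exact `NormalDist` value is shown only for comparison.

```python
from statistics import NormalDist

z = (2000 - 1500) / 250  # 2.0

# Empirical rule: 50% below the mean, plus half of the 95% band
below_rule = 0.50 + 0.95 / 2   # about 0.975 -> roughly 97.5% of dogs cost less

# Exact share of a normal curve below $2,000
below_exact = NormalDist(1500, 250).cdf(2000)
print(z, round(below_rule, 3), round(below_exact, 4))
```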
Comparing two distributions by using Z-scores
What about ACT versus SAT scores? 32 ACT ≈ ___ SAT; 800 SAT ≈ ___ ACT
NOTE: This is a helpful process, but it can be illogical at times. Remember that you are comparing scores on a "population base," that is, by the percent of people above or below each score. Is it logical to compare an SAT score to self-esteem this way? No.
ACT: 15 18 21 24 27 30 33
SAT: 400 600 800 1000 1200 1400 1600
Z-scores: -3 -2 -1 0 1 2 3
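To check your blanks, here is a sketch that maps a score from one curve to the same z position on another. It assumes the means and s.d.s read off the slide's rulers (ACT mean 24, s.d. 3; SAT mean 1000, s.d. 200). These are illustrative values, not official test norms, and `convert` is a hypothetical helper.

```python
def convert(score, from_mean, from_sd, to_mean, to_sd):
    """Map a score to the same z-score position on another normal curve."""
    z = (score - from_mean) / from_sd
    return to_mean + z * to_sd

# Means and s.d.s read off the slide's rulers (illustrative, not official norms)
ACT_MEAN, ACT_SD = 24, 3
SAT_MEAN, SAT_SD = 1000, 200

print(round(convert(32, ACT_MEAN, ACT_SD, SAT_MEAN, SAT_SD)))  # 1533
print(convert(800, SAT_MEAN, SAT_SD, ACT_MEAN, ACT_SD))        # 21.0
```

Under these ruler values, 32 ACT lands at z ≈ 2.67 and maps to roughly 1533 SAT, while 800 SAT (z = -1) maps to 21 ACT.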
Empirical Probability Distribution
So far, we have been discussing empirical data: things that were actually measured or recorded.
Empirical probability distribution: all the outcomes in a distribution of research results and each of their probabilities. The probability distribution of a variable lists the possible outcomes together with their probabilities.
E.g., age of new JS students last year.
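Building an empirical probability distribution amounts to counting each observed outcome and dividing by the total. The ages below are hypothetical, made up for illustration only (not real JS enrollment data).

```python
from collections import Counter

# Hypothetical ages, for illustration only (not real JS enrollment data)
ages = [18, 18, 19, 18, 20, 19, 22, 18, 19, 25]

counts = Counter(ages)
n = len(ages)
# Each observed outcome paired with its proportion (its empirical probability)
distribution = {age: count / n for age, count in sorted(counts.items())}
print(distribution)  # {18: 0.4, 19: 0.3, 20: 0.1, 22: 0.1, 25: 0.1}
```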
Theoretical Probability Distribution
Theoretical probability distribution: the proportion of times we would expect to get a particular outcome in a large number of trials of generating outcomes.
1. For example: the average age of a random sample of JS students. This is the expected distribution of average age across all possible samples. The proportions for each age add to 1 (100% of samples).
[Figure: distribution of sample average age, horizontal axis AGE from 18 to 27.]
2. For example: the coin toss at the beginning of a football game. This is the expected proportion of "heads" thrown for 50 Super Bowl kickoffs. The proportions for each possible number of heads add to 1 (100% of samples of 50 coin tosses). What are the odds of getting 50 consecutive "heads"?
[Figure: distribution of the number of heads in 50 tosses, horizontal axis from 1 to 50 with 25 at the center.]
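For the coin-toss example, the theoretical distribution of the number of heads in 50 fair tosses is the binomial distribution. The slide does not name it; it is the standard model for this setup. This sketch uses `math.comb` (Python 3.8+), and `prob_heads` is a hypothetical helper name.

```python
from math import comb

def prob_heads(k, n=50, p=0.5):
    """Theoretical probability of exactly k heads in n fair tosses (binomial)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

distribution = [prob_heads(k) for k in range(51)]
print(sum(distribution))  # proportions over all outcomes add to 1
print(prob_heads(25))     # the single most likely outcome, about 0.112
print(prob_heads(50))     # 50 heads in a row: 0.5**50, about 8.9e-16
```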
Empirical versus Theoretical Probability Distributions
Empirical: distribution of real data values obtained in research. (Our observations of how something is distributed in our universe.)
Theoretical: expected distribution of values for research and statistical use. (Our understanding of how something is distributed in our universe.)
Theoretical Probability Distribution
Q: Why are theoretical PDs important?
A: Researchers usually get only one chance to take a sample from a population, so they get data from only one of many possible samples. When we see (theoretically, not actually) how much our measurements could have varied had we collected data from other possible samples, we are able to judge the likelihood that the numbers produced by our selected sample reflect a reality about the population from which they came.
Probability Distributions
Theoretical probability distribution: the number of times we would expect to get a particular outcome in a large number of trials.
For example: let's say the mean GPA at SJSU is 2.5. Randomly take 100 SJSU students' GPAs and record their mean. Now take 100 more SJSU students' GPAs and record that mean. Repeat and record again. Now, lather, rinse, repeat. Again. Again. And on and on. What might you see?
Probability Distributions
50% of samples would have a mean GPA greater than 2.5.
[Figure: distribution of sample mean GPAs on an axis from 1.3 to 3.9; each point is one sample's mean; the mean of the sample means (2.5) equals the population mean (2.5).]
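The "lather, rinse, repeat" procedure can be simulated. This sketch builds a hypothetical population of 20,000 GPAs centered near 2.5 (the spread of 0.6 and the 0.0 to 4.0 clipping are assumptions, not SJSU data). It then draws 2,000 samples of 100 students and records each sample's mean.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical population of 20,000 GPAs near 2.5, kept on a 0.0-4.0 scale
population = [min(4.0, max(0.0, rng.gauss(2.5, 0.6))) for _ in range(20_000)]
pop_mean = statistics.fmean(population)

# Draw 100 students, record the sample mean, and repeat
sample_means = [statistics.fmean(rng.sample(population, 100)) for _ in range(2_000)]

mean_of_means = statistics.fmean(sample_means)
share_above = sum(m > pop_mean for m in sample_means) / len(sample_means)
spread = statistics.stdev(sample_means)

print(round(pop_mean, 3), round(mean_of_means, 3), round(share_above, 2), round(spread, 3))
```

In a run like this, the mean of the sample means should land very close to the population mean, and roughly half of the sample means should fall above it. The sample means should also cluster far more tightly (s.d. near 0.06) than individual GPAs do.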
We are able to judge the likelihood that numbers produced by our selected sample reflect a reality about the population from which they came.
• The probability distribution tells us the likelihood of obtaining our sample's value, since it is one of a variety of possible samples.
Population mean is known. Sampling variability is known…
[Figure: distribution of sample mean GPAs on an axis from 1.3 to 3.9, with our sample's mean marked.]
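When the population mean and the sampling variability are both known, we can ask how unusual our sample mean is. All the numbers here are hypothetical: population mean 2.5, population s.d. 0.6, n = 100, and a sample mean of 2.62. The spread of sample means is taken as σ/√n (the standard error).

```python
import math
from statistics import NormalDist

# Hypothetical numbers for illustration: known population mean and s.d., one sample of 100
pop_mean, pop_sd, n = 2.5, 0.6, 100
our_sample_mean = 2.62

standard_error = pop_sd / math.sqrt(n)             # spread of sample means: 0.06
z = (our_sample_mean - pop_mean) / standard_error  # about 2.0

# Share of possible samples whose mean lies at least this far from 2.5 (either direction)
p_as_extreme = 2 * (1 - NormalDist().cdf(abs(z)))
print(round(z, 2), round(p_as_extreme, 4))  # 2.0 0.0455
```

Only about 4.6% of possible samples would land this far from 2.5, so under these assumptions a sample mean of 2.62 is fairly unusual.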
We are able to judge the likelihood that numbers produced by our selected sample reflect a reality about the population from which they came.
• The probability distribution tells us how plausible each value on the scale is as the population parameter, given our sample's value and knowing that there is a variety of possible samples.
Population mean is unknown. Sampling variability is known…
[Figure: distribution of sample mean GPAs on an axis from 1.3 to 3.9, with our sample's mean marked.]
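When the population mean is unknown but sampling variability is known, one standard way to express plausible values for the population mean is a 95% confidence interval. The slide does not name this technique; it is swapped in here as an illustration. The numbers are hypothetical: sample mean 2.62, population s.d. 0.6, n = 100.

```python
import math
from statistics import NormalDist

# Hypothetical numbers: population mean unknown, sampling variability assumed known
our_sample_mean, pop_sd, n = 2.62, 0.6, 100
standard_error = pop_sd / math.sqrt(n)

z_crit = NormalDist().inv_cdf(0.975)  # about 1.96 for a 95% interval
low = our_sample_mean - z_crit * standard_error
high = our_sample_mean + z_crit * standard_error
print(round(low, 2), round(high, 2))  # 2.5 2.74
```

Values of the population mean between about 2.50 and 2.74 are consistent with this sample at the 95% level. Values far outside that range would make our sample mean an unusual draw.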