
Psychology 10

Psychology 10. Analysis of Psychological Data March 3, 2014. The Plan for Today. The binomial distribution. Introducing the idea of the sampling distribution. The central limit theorem. Introduction to hypothesis testing. The probability distribution.

cathal

Presentation Transcript


  1. Psychology 10 Analysis of Psychological Data March 3, 2014

  2. The Plan for Today • The binomial distribution. • Introducing the idea of the sampling distribution. • The central limit theorem. • Introduction to hypothesis testing.

  3. The probability distribution • Each person tossed the coin 16 times and observed the number of heads. • What are the possible values of that random variable? • Now, what is the probability that (assuming the coin is fair) someone would get zero heads? • Multiplication rule: (½)^16 = .000015258. • So only about 1½ out of every 100,000 people who tried this would get zero heads.

  4. The probability distribution (cont.) • The same holds true for 16 heads. • For 1 head in 16 tosses, things are a little more complex. • A person might get HTTTTTTTTTTTTTTT; the probability of that outcome would be (½)^16 = .000015258. • But you can also get THTTTTTTTTTTTTTT (with the same probability).

  5. The probability distribution (cont.) • In all, there are 16 mutually exclusive ways to get one head in 16 tosses, each of which occurs with probability (½)^16 = .000015258. • So the probability of getting one head by the first way OR the second way OR the third way… (etc.) is the sum of the probabilities: 16 × .000015258 = .00024414.

  6. The probability distribution (cont.) • Things get pretty complicated pretty fast. For example, there are 120 ways to get exactly 2 heads, so the probability of 2 heads in 16 tosses is 120 × (½)^16 = .001831054. • By the time we get to 8 heads, there are 12870 ways, so the probability is 12870 × (½)^16 = .196380615.
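The counts of "ways" quoted above (16, 120, 12870) are binomial coefficients, so the probabilities can be generated rather than worked out by hand. Here is a minimal Python sketch, assuming a fair coin and 16 independent tosses; the function name `binomial_pmf` is made up for this illustration and does not come from the slides.

```python
from math import comb

def binomial_pmf(k, n=16, p=0.5):
    """Probability of exactly k heads in n independent tosses with P(heads) = p."""
    # comb(n, k) counts the mutually exclusive orderings that contain k heads;
    # each such ordering has probability p**k * (1 - p)**(n - k).
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Number of ways and probability for the cases discussed on the slides.
for k in (0, 1, 2, 8):
    print(k, comb(16, k), round(binomial_pmf(k), 9))
```

Looping over `range(17)` instead of `(0, 1, 2, 8)` would produce the full table of 17 probabilities.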

  7. The probability distribution (cont.) • We’ll spare ourselves the work of figuring out all those probabilities; the table now on the document camera shows the result. • Now I want you to imagine that you actually had a purpose when you tossed the coin. • Your purpose was to estimate the actual probability of heads for your coin.

  8. The sampling distribution • If, for example, you got 11 heads, then your best guess would be that the probability of heads for your coin was 11/16 = .6875. • On the other hand, if you got 8 heads, your best guess would be .5000. • The table on the document camera now has replaced the left column with the corresponding estimated probabilities of success.

  9. The sampling distribution • This is exactly the same table we saw before, but now the left column represents a statistic. • That is, it’s still a random variable, but now it has a special purpose: to estimate something about a population. • We call the probability distribution of a statistic a sampling distribution.

  10. The sampling distribution (cont.) • The sampling distribution we are looking at right now is telling us what estimates of the probability of heads people would be likely (or unlikely) to make from an experiment with 16 tosses… • …if the true probability of heads is actually .5.

  11. Why is the sampling distribution important? • It is useful in decision making. • Imagine you are about to toss your coin 16 times again. • This time, though, I tell you that your purpose is to estimate the probability of heads (based on the outcome of your 16 tosses).

  12. Hypothesis testing. • Furthermore, I tell you that you are to decide on the basis of your outcome whether the coin is fair. • You will decide your coin is unfair only if your estimated probability of heads is so far from .5 that it would be really rare to get an estimate this far (or further) from .5 if the real probability of heads is .5.

  13. Hypothesis testing • Let’s agree to define “really rare” as “occurring no more than 5% of the time.” • Then the estimates that would be “really rare” would be .0000, .0625, .1250, .1875, .8125, .8750, .9375, and 1.000. • The probabilities of those outcomes sum to about .021. One step less extreme (adding .2500 and .7500), and the probabilities sum to about .077, which no longer is “really rare.”

  14. Hypothesis testing • So that means that if you got 0, 1, 2, 3, 13, 14, 15, or 16 heads, you would reject the idea that the real probability of success is .5. • How many people in the class would have concluded that they had an unfair coin?
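The rejection region can be found mechanically by widening the two tails one outcome at a time until their combined probability would exceed 5%. The following is a rough Python sketch of that search, assuming a fair coin under the null hypothesis and a symmetric two-tailed region; the names `two_tailed_prob`, `cutoff`, and `reject_counts` are invented for this example.

```python
from math import comb

N_TOSSES = 16
ALPHA = 0.05

# Probability of each possible number of heads (0..16) if the coin is fair.
pmf = [comb(N_TOSSES, k) / 2**N_TOSSES for k in range(N_TOSSES + 1)]

def two_tailed_prob(c):
    """P(heads <= c or heads >= 16 - c) for a fair coin, with c < 8."""
    return sum(pmf[: c + 1]) + sum(pmf[N_TOSSES - c :])

# Widen both tails symmetrically while their total probability stays within alpha.
cutoff = -1
while cutoff + 1 < N_TOSSES // 2 and two_tailed_prob(cutoff + 1) <= ALPHA:
    cutoff += 1

reject_counts = [k for k in range(N_TOSSES + 1)
                 if k <= cutoff or k >= N_TOSSES - cutoff]
print(reject_counts, round(two_tailed_prob(cutoff), 4))
```

Calling `two_tailed_prob(cutoff + 1)` afterwards shows how much probability the next, less extreme pair of outcomes would add.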

  15. Hypothesis testing • Notice that we have just used probability as a tool to make an inference about what is true in the world. • Knowing the sampling distribution of the statistic was an important step in that process.

  16. Sampling distributions • When a random variable is a statistic, its uncertainty derives from the process of sampling. • The probability distribution of a statistic is therefore called a sampling distribution. • In our example, we used our coin toss experiment to estimate the probability that the coin comes up heads.

  17. Hypothesis testing • We used the sampling distribution to conduct a hypothesis test. • Our conclusion was that we did not have evidence that the probability of “heads” is different from ½.

  18. The sampling distribution of the mean • Often, in psychological research, we are going to be interested in what values means have… • …or in whether means differ because of some other variable. • We’ll start simply, by focusing on inference about a single mean.

  19. The Central Limit Theorem • The Central Limit Theorem informs us about the sampling distribution of the sample mean. • The theorem has two parts: • Part One is concerned with the mean and standard deviation of the sampling distribution; • Part Two is concerned with other aspects of shape of the sampling distribution.

  20. The CLT, Part One • Part One of the Central Limit Theorem states that: • The mean of the distribution of sample means is the population mean; • The variance of the distribution of sample means is the population variance divided by the sample size; • Equivalently, the standard deviation of the distribution of sample means is the population standard deviation divided by the square root of the sample size.

  21. CLT Part One (cont.) • Some of that information is comforting. The first bullet says that on average, the sample mean equals the population mean. • Another way of expressing that is to say that the sample mean is an unbiased estimate of the population mean. • The second bullet point implies that the larger the sample size, the closer on average the estimated mean will be to the population mean.

  22. Some technical vocabulary • It is easy to become confused about exactly what standard deviation we are talking about here. • To avoid confusion, the term standard error is used to describe the standard deviation of a sampling distribution. • We say that the standard error of the mean is population standard deviation divided by the square root of sample size.

  23. Formal notation
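Part One of the theorem and the definition of the standard error can be written compactly. This is a sketch of one common notation, assuming M denotes the sample mean, μ and σ the population mean and standard deviation, N the sample size, and σ_M the standard error of the mean; the symbols on the original slide may have differed.

```latex
\mu_M = \mu, \qquad
\sigma_M^2 = \frac{\sigma^2}{N}, \qquad
\sigma_M = \frac{\sigma}{\sqrt{N}}
```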

  24. The CLT, Part Two • Part Two of the Central Limit Theorem states that: • If the population of interest is normally distributed, then the sampling distribution of the means is also normal; • If the population is not normally distributed, then the sampling distribution of the means becomes normal as N becomes large.
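Both parts of the theorem can be checked informally by simulation. The sketch below is only an illustration under assumed settings: the population is exponential with rate 1 (skewed, with mean 1 and standard deviation 1), each sample has 36 observations, and 5000 samples are drawn. None of those choices come from the slides.

```python
import random
import statistics

random.seed(2014)  # fixed seed so the sketch is reproducible

N = 36               # size of each sample
REPLICATIONS = 5000  # number of samples drawn

def sample_mean():
    # Exponential population with rate 1: skewed, mean 1, standard deviation 1.
    return statistics.fmean(random.expovariate(1.0) for _ in range(N))

means = [sample_mean() for _ in range(REPLICATIONS)]

mean_of_means = statistics.fmean(means)   # Part One predicts 1
sd_of_means = statistics.stdev(means)     # Part One predicts 1 / sqrt(36)
standard_error = 1 / N ** 0.5

# Part Two: if the means are roughly normal, about 95% of them
# should fall within 1.96 standard errors of the population mean.
share_within = sum(abs(m - 1) <= 1.96 * standard_error for m in means) / REPLICATIONS

print(round(mean_of_means, 3), round(sd_of_means, 3), round(share_within, 3))
```

Shrinking `N` to something like 2 or 3 should make the share within 1.96 standard errors drift away from 95%, because the sampling distribution is then still visibly skewed.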

  25. Central Limit Theorem Applet • Here is an applet that illustrates these principles: http://www.chem.uoa.gr/applets/appletcentrallimit/appl_centrallimit2.html

  26. Examples • Suppose a population of interest to us is normally distributed with a mean of 1000 and a standard deviation of 200. • What will the sampling distribution of the mean be if we consider taking a sample of size 25?

  27. μ = 1000, σ = 200, N = 25 • Part One: • The mean of the sampling distribution is 1000. • The standard error of the mean is 200/5 = 40. • Part Two: • The population is normally distributed, so the sampling distribution of the mean is also normal.

  28. μ = 1000, σ = 200, N = 25

  29. μ = 1000, σ = 200, N = 25 • What is the probability that a draw from the population is > 1050? • Z = (1050 – 1000) / 200 = 0.25. • From the table, the area above 0.25 is .4013. • What is the probability that a sample mean is > 1050? • Z = (1050 – 1000) / 40 = 1.25. • From the table, the area above 1.25 is .1056.
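The two table lookups above can be reproduced with Python's standard library, which offers a normal distribution object in place of a printed Z table. This is a minimal sketch using the numbers from the example; the variable names are chosen for illustration only.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 1000, 200, 25
standard_error = sigma / sqrt(n)                 # 200 / 5 = 40

population = NormalDist(mu, sigma)               # distribution of single draws
sampling_dist = NormalDist(mu, standard_error)   # distribution of sample means

# Upper-tail areas: 1 minus the cumulative probability at 1050.
p_single = 1 - population.cdf(1050)
p_mean = 1 - sampling_dist.cdf(1050)
print(round(p_single, 4), round(p_mean, 4))
```

The same cutoff of 1050 is only a quarter of a standard deviation above the mean for a single draw, but one and a quarter standard errors above it for a mean of 25 draws, which is why the second probability is so much smaller.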

  30. Another example • Suppose a population of interest has a skewed distribution with a mean of 8 and a standard deviation of 4. • What will the sampling distribution of the mean be if we consider taking a sample of size 64?

  31. μ = 8, σ = 4, N = 64 • Part One: • The mean of the sampling distribution is 8. • The standard error of the mean is 4/8 = 0.5. • Part Two: • The population is skewed, but the sample size is relatively large, so the sampling distribution of the mean is approximately normal.

  32. μ = 8, σ = 4, N = 64

  33. μ = 8, σ = 4, N = 64 • What is the probability that a draw from the population is < 9? • Who knows? We don’t know anything about the distribution except that it is skewed. • What is the probability that a sample mean is < 9? • Z = (9 – 8) / .5 = 2.0. • From the table, the area below 2.0 is .9772.
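To see how good the normal approximation is here, we can compare it with a simulation from one particular skewed population. The slides do not say which skewed distribution is involved, so the gamma distribution below (shape 4, scale 2, which has mean 8 and standard deviation 4) is purely an assumption for illustration.

```python
import random
from statistics import NormalDist, fmean

random.seed(64)  # fixed seed so the sketch is reproducible

mu, sigma, n = 8, 4, 64
standard_error = sigma / n ** 0.5            # 4 / 8 = 0.5

# Normal approximation from the Central Limit Theorem.
p_normal = NormalDist(mu, standard_error).cdf(9)

def sample_mean():
    # Assumed skewed population: gamma with shape 4 and scale 2 (mean 8, sd 4).
    return fmean(random.gammavariate(4, 2) for _ in range(n))

# Share of simulated sample means that fall below 9.
reps = 10_000
p_simulated = sum(sample_mean() < 9 for _ in range(reps)) / reps
print(round(p_normal, 4), round(p_simulated, 4))
```

The simulated share should land close to the table value without matching it exactly, since with N = 64 the sampling distribution is only approximately normal.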

  34. Using the sampling distribution of the mean • Suppose we have an interest in a statistics pill that will magically improve our comprehension of statistics. • We want to demonstrate the effectiveness of the pill, so we give one to everyone in the class. • Then we administer a test of statistics knowledge that ordinarily has a mean of 50 and a standard deviation of 10.

  35. Hypothesis test about a mean. • We think about this situation, and realize that if the pill is not effective, then the sampling distribution of the mean is normal with mean = 50 and standard error = 10 / √119 = 0.9166985 (the sample size here is N = 119). • Why do we know that? • We decide that if we get a sample mean that would be an unusual draw from that distribution, we can take it as evidence that the pill has an effect.

  36. Hypothesis testing (cont.) • Just to keep ourselves honest, let’s decide in advance what we’ll mean by “unusual.” • First, note that we would be interested if the pill helps, but we would also be interested if it hurts statistical knowledge. • We decide to define “unusual” as any mean that’s so far away from 50 that it would occur only 5% of the time if the true mean is 50.

  37. What is unusual? • If we divide that 5% between unexpectedly large means and unexpectedly small means, that’s 2½% in each tail of the sampling distribution. • What value separates off the bottom or top 2½% of a normal distribution with mean = 50 and standard deviation = 0.9166985? • We don’t know. But we do know that the value for a standard normal distribution is 1.96.
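The standard normal cutoff of 1.96 can also be converted back to the scale of the test scores, which answers the question about where the top and bottom 2½% begin. This is a small sketch, assuming N = 119 (read off from the √119 in the standard error) and using Python's `statistics.NormalDist` in place of a printed table.

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma, n = 50, 10, 119
standard_error = sigma / sqrt(n)         # about 0.9167

# Value that cuts off the top 2.5% of a standard normal distribution.
z_crit = NormalDist().inv_cdf(0.975)     # about 1.96

# The same cutoffs expressed as sample means.
lower = mu0 - z_crit * standard_error
upper = mu0 + z_crit * standard_error
print(round(z_crit, 2), round(lower, 2), round(upper, 2))
```

A sample mean below `lower` or above `upper` corresponds to a Z beyond ±1.96, so the two ways of stating the rule agree.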

  38. The test statistic • So if we standardize our result, we can observe whether it is larger than 1.96 or smaller than -1.96 to decide if it’s “unusual.” • Suppose we actually give the pill, administer the test and get a mean of 52.1. • Z = (52.1 – 50) / 0.9166985 = 2.29083.

  39. Evaluating the test statistic • We said that if we got a Z bigger than 1.96 or smaller than -1.96 we would consider the result to be an unusual draw from the theoretical sampling distribution. • We got a Z of 2.29, which is unusual. • Accordingly, we conclude that we really weren’t sampling from the sampling distribution centered at 50.

  40. Interpreting the test • We agree that because our sample mean is unexpectedly large, this is evidence that our pill is effective. • (Comments on terrible experimental design.)

  41. Formalizing the process • What did we just do? • We stated what interested us. (Research hypothesis: H1: μ ≠ 50.) • We figured out the negation of that hypothesis. (Null hypothesis: H0: μ = 50.) • We identified a statistic that would have a known distribution if the null hypothesis is true. Z = (M – μ0) / σM has a standard normal distribution if the null is true.

  42. Formalizing the process (cont.) • We stated in advance of any observation of data how extreme a result we would consider convincing. (Setting the alpha level: α (two-tailed) = .05.) • Finally, we observed data, calculated the statistic, and reached a conclusion.
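The whole procedure fits in a few lines of code. The function below is a hypothetical helper written for this illustration (`two_tailed_z_test` is not a library function), and it assumes the population standard deviation is known. It decides by comparing a two-tailed p-value with alpha, which is equivalent to comparing Z with ±1.96 when alpha is .05.

```python
from math import sqrt
from statistics import NormalDist

def two_tailed_z_test(sample_mean, mu0, sigma, n, alpha=0.05):
    """Return (z, p_value, reject_null) for H0: mu = mu0 with known sigma."""
    standard_error = sigma / sqrt(n)
    z = (sample_mean - mu0) / standard_error
    # Two-tailed p-value: probability of a |Z| at least this large if H0 is true.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value, p_value <= alpha

# The statistics-pill example: mean 52.1 from 119 people, null mean 50, sigma 10.
z, p_value, reject = two_tailed_z_test(52.1, mu0=50, sigma=10, n=119)
print(round(z, 5), round(p_value, 4), reject)
```

Note that alpha is an argument with a default, which mirrors the point that it is fixed before any data are observed.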

  43. In-class exercises • An elite private school claims that on average it raises children’s IQs. You decide to test that claim. • You identify an IQ test that ordinarily has a mean of 100 and σ = 15. • A sample of 36 students from the school has a mean of 104. • Identify the null hypothesis, and test it at an alpha level of .05.
