Psychology 10. Analysis of Psychological Data April 2, 2014. The plan for today. Another example of the two-sample t test. Effect sizes related to the two-sample test. Introducing the idea of confidence intervals. Another two-sample t test example. Stereogram fusion experiment.
Psychology 10 Analysis of Psychological Data April 2, 2014
The plan for today • Another example of the two-sample t test. • Effect sizes related to the two-sample test. • Introducing the idea of confidence intervals.
Another two-sample t test example • Stereogram fusion experiment. • Source: the Data and Story Library (http://lib.stat.cmu.edu/DASL/) • Members of one group were given no information about the embedded image, or were told about it in words. • Members of the other group were shown a picture of the embedded image. • Interest was in how long it takes to fuse the stereogram.
Stereogram fusion (cont.) • Let’s call the groups Group 1 (not shown object) and Group 2 (shown object). • What are our null and research hypotheses? • Null: μ1 – μ2 = 0. • Research: μ1 – μ2 ≠ 0. • What should we do before we look at results? • Set α = .05.
Stereogram fusion (cont.) • For Group 1, n = 43, ΣX = 368.09998, and ΣX² = 5896.80986. • For Group 2, n = 35, ΣX = 194.30001, and ΣX² = 1862.57036.
The pooled variance estimate • SS = ΣX² – (ΣX)² / N. • For Group 1, this is 5896.80986 – 368.09998² / 43 = 2745.702993. • For Group 2, we have 1862.57036 – 194.30001² / 35 = 783.9276775. • The pooled variance is (SS1 + SS2) / (n1 + n2 – 2). • Here, that is (2745.702993 + 783.9276775) / 76 = 46.44250882.
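The arithmetic on this slide can be checked with a short Python sketch. The function and variable names are illustrative choices, not part of the lecture; only the summary statistics come from the slides.

```python
def sum_of_squares(sum_x, sum_x2, n):
    """Computational formula: SS = sum(X^2) - (sum(X))^2 / n."""
    return sum_x2 - sum_x ** 2 / n

# Summary statistics quoted on the slides.
n1, sum_x1, sum_x2_1 = 43, 368.09998, 5896.80986   # Group 1 (not shown object)
n2, sum_x_2, sum_x2_2 = 35, 194.30001, 1862.57036  # Group 2 (shown object)

ss1 = sum_of_squares(sum_x1, sum_x2_1, n1)
ss2 = sum_of_squares(sum_x_2, sum_x2_2, n2)

# Pooled variance: total SS divided by total degrees of freedom.
pooled_variance = (ss1 + ss2) / (n1 + n2 - 2)

print(f"SS1 = {ss1:.4f}, SS2 = {ss2:.4f}, pooled variance = {pooled_variance:.4f}")
```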
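Standard error • The standard error of the difference between means is √(s²p / n1 + s²p / n2), where s²p is the pooled variance. • Here, that is √(46.44250882 / 43 + 46.44250882 / 35) = 1.551446798.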
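The t statistic • The means of the two groups are 368.09998 / 43 = 8.560464651, and 194.30001 / 35 = 5.551428857. • Recall that t = (M1 – M2) / (standard error of the difference), because the null hypothesis says μ1 – μ2 = 0. • Here, that is (8.560464651 – 5.551428857) / 1.551446798 = 1.939.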
Making a decision • We have 43 + 35 – 2 = 76 degrees of freedom, so we’ll have to use 60 df in the table. • From the table, the critical value of t is 2.000. • (The real critical value for 76 df is 1.992.) • Our t had the value 1.939, so we fail to reject the null hypothesis.
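As a rough check on the standard error, the t statistic, and the decision, here is a minimal Python sketch. It takes the pooled variance and the 76-df critical value (1.992) as given on the slides rather than computing them; the variable names are assumptions made for illustration.

```python
import math

n1, n2 = 43, 35
mean1 = 368.09998 / n1          # Group 1 mean
mean2 = 194.30001 / n2          # Group 2 mean
pooled_variance = 46.44250882   # value from the pooled-variance slide

# Standard error of the difference between two independent means.
se_diff = math.sqrt(pooled_variance / n1 + pooled_variance / n2)

# t statistic under the null hypothesis mu1 - mu2 = 0.
t_stat = (mean1 - mean2) / se_diff
df = n1 + n2 - 2

# Two-tailed critical value for alpha = .05 and 76 df, as quoted on the slide.
t_crit = 1.992
reject_null = abs(t_stat) > t_crit

print(f"SE = {se_diff:.4f}, t({df}) = {t_stat:.3f}, reject null: {reject_null}")
```

The same decision follows with the table value of 2.000 for 60 df, since the observed t is below both critical values.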
Interpreting the results • Because we failed to reject the null hypothesis, we have been unable to show that there is a difference in mean fusion times between the population that was shown an image of the object and the population that was not shown the object. • Note that a full interpretation includes discussion of the ideas that were being tested.
Assessing the assumptions • Independence between groups: • We were not told there was random assignment. In fact we know nothing about how the groups were created. So we really can’t evaluate this. • Independence within groups: • We are told that in the first group, people either received no information or they received only verbal information. • This could create clusters of scores that are similar to each other. We should be concerned.
Assumptions (cont.) • Equal variances in the two populations: • We should compare the standard deviations. • SS1 / (n1-1) = 2745.702993 / 42 = 65.37388079. • The square root is 8.085. • SS2 / (n2-1) = 783.9276775 / 34 = 23.0566964. • The square root is 4.802.
Assumptions (cont.) • Those look pretty different. The first (8.1) is about 1.7 times the size of the second (4.8). • On the other hand, it’s not so hard to believe that they could both be estimating a population standard deviation around 6.5. • The evidence is right on the border of where we might worry. • (Brief rant about the F-max test.)
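The comparison of standard deviations can be reproduced in a few lines of Python. This is only a sketch of the arithmetic on these two slides; the variable names are illustrative.

```python
import math

ss1, n1 = 2745.702993, 43   # Group 1 sum of squares and sample size
ss2, n2 = 783.9276775, 35   # Group 2 sum of squares and sample size

# Sample standard deviation: square root of SS / (n - 1).
sd1 = math.sqrt(ss1 / (n1 - 1))
sd2 = math.sqrt(ss2 / (n2 - 1))

# How many times larger is the first standard deviation than the second?
sd_ratio = sd1 / sd2

print(f"sd1 = {sd1:.3f}, sd2 = {sd2:.3f}, ratio = {sd_ratio:.2f}")
```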
Assumptions (cont.) • What about normality?
       No Image | Stem | Image
444333322222222 |  0   | 1111222222233333444
  9998888766655 |  0   | 55666667999
       33220000 |  1   | 0
             75 |  1   | 566
            200 |  2   | 0
                |  2   |
                |  3   |
                |  3   |
                |  4   |
              7 |  4   |
Assumptions (cont.) • There appears to be a problem with the assumption that both populations are normal. • Moreover, we have one observation that appears to be an extreme value. • We have at least three concerns about our assumptions, then. We really shouldn’t put much credence in this test (even if we had rejected the null hypothesis).
Effect sizes and the two-sample test • The equivalent of Cohen’s d for the two-sample situation (sometimes called Hedges’ g) is: g = (M1 – M2) / √(s²p), the difference between the sample means divided by the square root of the pooled variance. • In our example, this is (8.560464651 – 5.551428857) / 6.81487409 = 0.44. • Note, however, that the time is measured in seconds. We understand seconds. • A better effect size, then, would be 8.560 – 5.551 = 3.01 seconds.
Effect sizes (cont.) • Your book also mentions r², defined as t² / (t² + df). • In our example, this would be 1.939² / (1.939² + 76) = .047. • Note, however, that this measure of effect size is rarely used in practice.
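The three effect sizes discussed on these slides (the standardized difference, the raw difference in seconds, and r²) can be sketched in Python as follows. The inputs are the values reported earlier in the lecture; the variable names are illustrative.

```python
import math

mean1, mean2 = 8.560464651, 5.551428857  # group means from the slides
pooled_variance = 46.44250882            # pooled variance from the slides
t_stat, df = 1.939, 76                   # t statistic and df from the slides

# Standardized mean difference: difference in means over the pooled SD.
g = (mean1 - mean2) / math.sqrt(pooled_variance)

# Raw difference in seconds, which is directly interpretable here.
raw_difference = mean1 - mean2

# Proportion-of-variance measure: r^2 = t^2 / (t^2 + df).
r_squared = t_stat ** 2 / (t_stat ** 2 + df)

print(f"g = {g:.2f}, difference = {raw_difference:.2f} s, r^2 = {r_squared:.3f}")
```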
Confidence Intervals • Sometimes, instead of testing a hypothesis about a parameter, we are interested in identifying a range of reasonable values for the parameter. • A logical way to do that is to figure out what values of the parameter would not lead to rejection if we used them in a null hypothesis.
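Confidence Intervals • Example: the sample mean. • If we were conducting a one-sample t test, we would fail to reject the null whenever |M – μ0| / sM ≤ tcrit, where M is the sample mean, μ0 is the null value, and sM is the standard error of the mean. • That will happen for null values between M – tcrit × sM and M + tcrit × sM.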
Confidence Intervals • We call such a range of values a 100(1 – α)% confidence interval. • For example, if α = .05, the corresponding confidence interval is a 95% confidence interval. • For α = .01, the related confidence interval is a 99% confidence interval.
Example • In the stereogram fusion example, the first group had a mean of 8.5605, N = 43, and SS = 2745.702993. • If we were conducting a one-sample t test using that group, the variance would be 2745.702993 / 42 = 65.37388079, s = 8.085411603, and the standard error would be that value over √43, or 1.233.
Example (cont.) • If we were conducting a t test, our df would be 42, so the critical value of t would be approximately 2.021 (using 40 df). • A 95% confidence interval would be given by 8.5605 ± 2.021(1.233) = (6.07, 11.05). • (The actual critical value is 2.018, but the CI would round to the same bounds.)
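The one-sample interval for Group 1 can be reproduced with a short Python sketch. The helper name `confidence_interval` is hypothetical, and the critical values are the ones quoted on the slide rather than computed.

```python
import math

def confidence_interval(mean, se, t_crit):
    """Return (lower, upper) for mean +/- t_crit * se."""
    margin = t_crit * se
    return mean - margin, mean + margin

mean, n, ss = 8.5605, 43, 2745.702993   # Group 1 values from the slides

sd = math.sqrt(ss / (n - 1))            # sample standard deviation
se = sd / math.sqrt(n)                  # standard error of the mean

# Table value for 40 df and the more exact value for 42 df, both from the slide.
ci_table = confidence_interval(mean, se, 2.021)
ci_exact = confidence_interval(mean, se, 2.018)

print(f"SE = {se:.3f}")
print(f"95% CI (table value): ({ci_table[0]:.2f}, {ci_table[1]:.2f})")
print(f"95% CI (exact value): ({ci_exact[0]:.2f}, {ci_exact[1]:.2f})")
```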
Interpreting Confidence Intervals • A very common error is to say that there is a .95 probability that μ is within the bounds of a particular interval. • That’s wrong. Once an interval has been calculated, μ is either in it or not in it (with probability 1.0). We just don’t know which is true.
Interpretation (cont.) • What we can say is that the interval was calculated in such a way that, if we were to repeat the process of sampling and calculating an interval, 95% of the time the resulting interval would contain μ. • Therefore it makes sense to behave as though this particular interval contains μ. • More on interpretation next time.
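The repeated-sampling interpretation can be illustrated with a small simulation. This is a sketch under stated assumptions: the population is normal, and its mean and standard deviation (8.0 and 8.0) are hypothetical values chosen only for illustration. The sample size (43) and critical value (2.018) match the earlier example.

```python
import math
import random

def coverage_rate(mu, sigma, n, t_crit, reps, seed=0):
    """Fraction of simulated confidence intervals that contain mu."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        mean = sum(sample) / n
        ss = sum((x - mean) ** 2 for x in sample)
        se = math.sqrt(ss / (n - 1)) / math.sqrt(n)
        # Does this particular interval capture the true mean?
        if mean - t_crit * se <= mu <= mean + t_crit * se:
            hits += 1
    return hits / reps

# mu and sigma are hypothetical population values, not estimates from the data.
rate = coverage_rate(mu=8.0, sigma=8.0, n=43, t_crit=2.018, reps=4000)
print(f"Proportion of intervals containing mu: {rate:.3f}")
```

Each simulated interval either contains μ or does not; the 95% figure describes the long-run proportion across repetitions, which is what the printed rate approximates.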
Another confidence interval • We can also do a confidence interval for the difference between means. • In the stereogram fusion example, the estimated difference between means was 8.560464651 – 5.551428857 = 3.009036. • The standard error of the difference was 1.551446798. • The critical value of t was 1.992.
The 95% CI • The 95% confidence interval, then, is given by 3.009036 ± 1.992 × 1.551446798 = (-0.08, 6.10). • Interpretation?
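The interval for the difference between means can be checked the same way. The sketch below uses the values quoted on the slides; the variable names are illustrative.

```python
diff = 8.560464651 - 5.551428857  # estimated difference between means
se_diff = 1.551446798             # standard error of the difference
t_crit = 1.992                    # two-tailed critical value, 76 df, alpha = .05

lower = diff - t_crit * se_diff
upper = diff + t_crit * se_diff

# Zero is the value of mu1 - mu2 under the null hypothesis.
contains_zero = lower <= 0 <= upper

print(f"95% CI for the difference: ({lower:.2f}, {upper:.2f})")
print(f"Interval contains 0: {contains_zero}")
```

Because the interval includes 0, a difference of zero is among the values that would not be rejected at α = .05, which agrees with the earlier failure to reject the null hypothesis.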
Next time • Confidence intervals, continued. • The t test for repeated measures (also called the t test for related samples).
Exercise • Confidence interval using male runner data from last class.