Psych 5500/6500 The t Test for a Single Group Mean (Part 2): p Values One-Tail Tests Assumptions Fall, 2008
Reporting Results When a t test analysis is reported in the literature, the tcritical values and a curve with the rejection regions are rarely given. Usually there is not even an explicit statement about whether or not H0 was rejected. Instead, the first example from the previous lecture would commonly be reported like this: t(5) = 3.82, p=.012. The general format is: t(d.f.) = tobtained, p=...
p values Let’s take a closer look at what is reported: t(5) = 3.82, p=.012, or in generic terms, t(d.f.) = tobtained, p=... While the d.f. and the value of tobtained are of interest, the most important piece of information is the value of ‘p’, because it tells you whether or not H0 was rejected.
1) Pragmatic Understanding of ‘p values’ The single most important thing to understand about a ‘p value’ is this: if p is less than or equal to your significance level then you reject H0; if it is greater than your significance level then you do not reject H0. Take t(5) = 3.82, p=.012: if our significance level was the usual .05, then we know that H0 was rejected. Even if you are unfamiliar with the statistical procedure (e.g. χ²(5) = 6.43, p=.06), you can still easily tell whether or not H0 was rejected simply by looking at the p value (here H0 is not rejected).
Examples In the following analyses H0 was rejected (assuming the significance level was set at .05): p=0.04 p<.05 p<.001 p=.05 In the following H0 was not rejected: p=0.07 p>.05 p=0.051 Giving a range for the p value (e.g. ‘p<.05’) used to be quite common because computing the exact value of p was difficult without a computer. Now an exact value of p is usually provided.
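The decision rule above can be written as a one-line comparison. This is only an illustrative sketch; the function name `rejects_h0` is made up for this example and is not part of any statistics package.

```python
def rejects_h0(p_value, alpha=0.05):
    """Return True when H0 is rejected, i.e. when p <= the significance level."""
    return p_value <= alpha

# Exact p values taken from the examples on this slide.
for p in (0.04, 0.05, 0.051, 0.07):
    decision = "reject H0" if rejects_h0(p) else "do not reject H0"
    print(f"p = {p}: {decision}")
```

Note that p=.05 leads to rejection because the rule is "less than or equal to", while p=.051 does not.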
2) Conceptual Understanding of ‘p values’ The ‘p value’ is a probability; more specifically, it is a conditional probability: p(getting a result that far or further from what H0 predicted | H0 is true) Let’s go back and take a look at the t test examples from the previous lecture.
First Example (Sample Mean = 106) Note that the obtained value of t falls within the ‘reject H0’ region.
H0 stated that the mean of the population was 100, the sample mean we obtained was 106. The ‘p value’ in this case would be the probability of obtaining a sample mean that is 6 or more away from 100 (in either direction).
The p value is the probability of obtaining a sample mean of 106 (t=3.82) or greater, or a sample mean of 94 (t=-3.82) or less, if this curve (based upon H0) is correct. In this case p = .012 (.006 on each tail). How to compute that is covered next. For now, note that because the reject H0 regions add up to .05 and our result falls inside one of them, the p value of our result must be less than .05.
p=0.012 (.006 on each tail). The t table is not very useful for computing the p value here as the table covers only a small number of possible p values (you can’t look up t=3.82 and see what value of p goes with that). We have two tools we can use: 1) in the t tool I have written you can input the value of t and the df and the tool will give you the value of p; and 2) if you have SPSS analyze the data using the ‘One Sample t Test’, it will give you both the value of t and the value of p.
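For readers without either tool, the same p value can be approximated with nothing but the Python standard library. The sketch below is not the t tool or SPSS; it simply integrates the t curve numerically (Simpson's rule), and the helper names `t_pdf`, `upper_tail_area` and `two_tailed_p` are invented for this illustration.

```python
import math

def t_pdf(x, df):
    """Height of the t curve with df degrees of freedom at x."""
    norm = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return norm * (1 + x * x / df) ** (-(df + 1) / 2)

def upper_tail_area(t, df, steps=10_000):
    """Area under the t curve from |t| out to the end of one tail.

    Integrates the curve from 0 to |t| with Simpson's rule (steps must be
    even) and subtracts that from 0.5, the area of half the symmetric curve.
    """
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

def two_tailed_p(t, df):
    """p value for a two-tailed test: the area in both tails beyond |t|."""
    return 2 * upper_tail_area(t, df)

p = two_tailed_p(3.82, 5)
print(f"t(5) = 3.82, p = {p:.3f}")  # prints: t(5) = 3.82, p = 0.012
```

Each tail holds half of that area, which is the .006 per tail noted above.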
Second Example (Sample Mean = 102) Note the obtained value of t falls within the ‘do not reject H0’ region.
H0 stated that the mean of the population was 100, the sample mean we obtained was 102. The ‘p value’ in this case would be the probability of obtaining a sample mean that is 2 or more away from 100 (in either direction).
The p value is the probability of obtaining a sample mean of 102 (t=1.27) or greater, or a sample mean of 98 (t=-1.27) or less, if this curve (based upon H0) is correct. Note that p must be greater than .05 here; in this case p=.25 (.125 on each tail).
Important! Note again that the ‘p value’ is the probability of obtaining a sample mean that far or further from what H0 predicted, given that H0 is true; it is not the probability that H0 is true given the sample mean we obtained. Confusing the two is a common mistake.
‘A’ = getting a sample mean that far from what H0 predicted. ‘B’ = H0 being true. p value = p(A|B). It would be nice if it were p(B|A), or in other words if it were the probability that H0 is true given our sample mean, because that is what we really want to know. To find p(B|A), however, we would have to turn to Bayes’ Theorem (we will take a look at that again later in the semester).
3) Mechanical Understanding of ‘p values’ Looking back at the graphs with the rejection regions and the p values, we can see that another way to understand a p value is that it is what our significance level would have to be in order to just barely reject H0. Thus p=.04 means that we could have set the significance level at .04 (rather than .05) and still have rejected H0, while p=.06 means we would have had to set our significance level at .06 (which is not allowed) in order to reject H0.
So What is Our Significance Level Really? We have to decide upon a significance level before we run an analysis (we have to set up our decision making criteria before we look at the results) and so we set our significance level to .05. Say we then analyze the data and find that p=.03 (this is often reported as ‘being significant at the .03 level’). Is our significance level .05 or .03? Authors often write as if it were .03.
My Answer The significance level is .05 and α=.05. We would have rejected H0 if p=.05 or p=.049; that is our criterion. After the fact we see that we could have set the significance level at .03 and still have rejected H0, but our decision-making criterion was .05, and thus that was the actual probability of making a type 1 error if H0 were true.
On ‘Significance’ Dictionary definition of significance: ‘Full of meaning, important, momentous’. Statistical Significance: ‘We were able to reject H0 (i.e. the results were unlikely if H0 were true)’. When p is less than or equal to .05 we say the results were ‘statistically significant’. This does not necessarily mean the results are very meaningful or important or momentous. It does mean that we can conclude that H0 is probably false, which may or may not be important.
Example Let’s go back to: H0: μElbonia = 100 (same as USA) HA: μElbonia ≠ 100 (different than the USA) Let’s say that with a sample N=10,000 we obtain a sample mean of 101. As we will see in the lecture on power, with such a large N even such a small difference is likely to lead to rejection of H0.
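A rough numerical sketch of why so small a difference gets rejected with N=10,000. The lecture does not give a standard deviation for this example, so the value of 15 below is purely an assumption (a conventional IQ standard deviation), and the normal curve is used as an approximation to the t curve, which is reasonable with 9,999 degrees of freedom.

```python
import math
from statistics import NormalDist

mu_h0 = 100        # population mean according to H0
sample_mean = 101
n = 10_000
assumed_sd = 15    # ASSUMPTION: not stated in the lecture, chosen for illustration

# The standard error shrinks as N grows, which inflates t for a fixed difference.
standard_error = assumed_sd / math.sqrt(n)
t_obtained = (sample_mean - mu_h0) / standard_error

# With df = 9,999 the t curve is practically the standard normal curve,
# so the normal distribution stands in for it here.
p_two_tailed = 2 * (1 - NormalDist().cdf(abs(t_obtained)))

print(f"standard error = {standard_error:.2f}, t = {t_obtained:.2f}")
print("reject H0" if p_two_tailed <= 0.05 else "do not reject H0")
```

Under that assumed standard deviation the one-point difference is more than six standard errors from 100, far beyond any conventional rejection region.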
The results are ‘statistically significant’ because we rejected H0. Whether the results are ‘theoretically significant’ would depend upon whether rejecting H0 sheds important light on the theory that predicted Elbonians would have an IQ other than 100. Whether the results are ‘socially significant’ would depend upon whether it significantly adds to our world to know that Elbonians are just a little tiny bit smarter on average than people in the USA.
Bottom Line Statistical significance is important because it is a prerequisite to making something out of the analysis. Whether that something is important or trivial is beyond the scope of statistics. People sometimes lose track of that, and begin to believe that getting statistically significant results is an end in itself.
Final Point about Significance (Really...I Promise) One view on statistical significance is that results are either statistically significant or they are not. To say something ‘neared significance’ or was ‘almost significant’ is meaningless (like saying someone is almost pregnant) at best and violates the logic of null hypothesis testing at worst. Others argue that .05 is arbitrary, and so a result of .06 is just about as good and probably reflects that the null hypothesis is wrong. More on this later in the semester when we tackle the controversy surrounding null hypothesis testing.
However, it seems like both sides agree that if the results are statistically significant (i.e. p ≤ .05) then it makes sense to say that a p=.001 is more convincing than a p=.049 (following along with the pregnancy metaphor: if you are pregnant you can either be at the beginning of the pregnancy or well advanced).
Segue Recall that when we run an experiment we are testing a theory that makes some prediction, and that prediction becomes our alternative hypothesis (HA). So far our examples have involved examining a theory which predicts a difference (i.e. μElbonia ≠ 100) but does not predict in which direction that difference will go (whether μElbonia will be less than or greater than 100). Such ‘nondirectional’ hypotheses are examined by putting a reject H0 region on both tails of the sampling distribution, and are thus called ‘two-tailed tests’.
One-tailed Tests When the theory we are testing specifically predicts in which direction the difference should go (i.e. we are testing a ‘directional’ hypothesis), then we perform a ‘one-tailed test’. First we will look at how to perform the t test when the theory predicts that Elbonians should have higher IQs than people in the USA.
Writing the Hypotheses If the theory predicts that the mean score of Elbonians is greater than 100: H0: μElbonia ≤ 100 HA: μElbonia > 100 Notes: • HA always states what the theory predicts. • Conceptually H0 is still the hypothesis of ‘no difference’, but it needs to be written the way it is to ensure that the two hypotheses are mutually exclusive and exhaustive.
Setting up the Rejection Region Note that the full .05 is in one tail now, making it easier to reject H0 if the results go in that direction and impossible to reject H0 if the results go in the other direction. The new tc value is 2.015, for the two-tail test it was 2.571.
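The two tc values quoted above can be checked without a t table. The sketch below uses only the standard library: it integrates the t curve numerically and then searches (by bisection) for the t value that leaves the desired area in the tail. The helper names are invented for this illustration; the t table or any statistics package would give the same values more directly.

```python
import math

def t_pdf(x, df):
    """Height of the t curve with df degrees of freedom at x."""
    norm = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return norm * (1 + x * x / df) ** (-(df + 1) / 2)

def upper_tail_area(t, df, steps=2_000):
    """Area under the t curve to the right of t (t >= 0), via Simpson's rule."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

def critical_t(tail_area, df):
    """Find the t >= 0 that leaves tail_area in the upper tail, by bisection."""
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if upper_tail_area(mid, df) > tail_area:
            lo = mid  # too much area still in the tail: move right
        else:
            hi = mid
    return (lo + hi) / 2

alpha, df = 0.05, 5
one_tail_tc = critical_t(alpha, df)       # all of .05 in one tail
two_tail_tc = critical_t(alpha / 2, df)   # .025 in each of two tails
print(f"one-tail tc = {one_tail_tc:.3f}, two-tail tc = {two_tail_tc:.3f}")
# prints: one-tail tc = 2.015, two-tail tc = 2.571
```

Putting the whole .05 in one tail moves the cutoff closer to the center of the curve, which is why rejection becomes easier in the predicted direction.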
Determining Which Tail Gets the Rejection Region H0: μElbonia ≤ 100 HA: μElbonia > 100 • The conceptual approach: If HA is correct we will want to ‘reject H0’, so put the reject H0 region where HA predicts the results will fall (in this case above 100). • The idiot-proof approach: pretend that the symbol in HA is an arrow; it points to the tail with the rejection region.
p Value With a one-tail test the p value is the area of the curve from the tobt value to the end of the tail that has the rejection region.
p Value The p value is the probability of getting a result that is that far from what H0 predicted if H0 is true. We can see that the p value must be less than .05; it is actually p=.006.
Another Example Now let’s see how to set things up if we are testing a theory which specifically predicts that the mean IQ of Elbonians is less than 100.
Writing Hypotheses If the theory predicts that the mean score of Elbonians is less than 100: H0: μElbonia ≥ 100 HA: μElbonia < 100 Again, HA expresses the prediction made by the theory, while H0 is every other possibility.
Which Tail? H0: μElbonia ≥ 100 HA: μElbonia < 100 • The conceptual approach: If HA is correct we will want to ‘reject H0’, so put the reject H0 region where HA predicts the results will fall (in this case below 100). • The idiot-proof approach: pretend that the symbol in HA is an arrow; it points to the tail with the rejection region.
p Value With a one-tail test the p value is the area of the curve from the tobt value to the end of the tail that has the rejection region.
p Value The p value is the probability of getting a result that is that far from what H0 predicted if H0 is true. We can see that the p value must be greater than .05; it is actually p=.994.
The Trade-Off Doing a one-tail test is a bit of a gamble. If the theory is correct in its prediction then it is easier to reject H0 in favor of HA (i.e. the theory) because all .05 is put in the tail the theory predicts. If the results go in the opposite direction from that predicted by the theory, however, then it is impossible to reject H0 no matter how large the difference is.
Justifications for Making a Test One-Tailed Bottom line: you have to decide that the test is one-tailed before you obtain your data. • Refer to results from prior, similar experiments. • You are testing a theory which specifically predicts which way the results should go. • Only one tail is of interest.
SPSS Analysis The t test for a single group mean can be found in SPSS under the ‘Analyze>>Compare Means>>One-Sample T Test...’ menu. For the value of the ‘Test Value’ input the population mean according to H0.
SPSS Analysis (cont.) When SPSS does a ‘One Sample T Test’ analysis it will give a value of ‘t’ and a value of ‘p’. The value of t is for tobtained. The value of p is for a two-tailed test. If you want the p value for a one-tailed test do the following: First, determine whether the difference between the sample mean and the population mean proposed by H0 was in the direction predicted by HA. Second, if the direction was the one predicted by HA then the actual p value = (SPSS p value)/2; if the direction was the opposite of that predicted by HA then the actual p value = 1 - ((SPSS p value)/2). In the former case (HA made the correct prediction) the p value goes down by half; in the latter case (HA made the wrong prediction) the p value is greater than .50.
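The two-step conversion just described can be sketched as a small function. The name `one_tailed_p` and its arguments are made up for this illustration; SPSS itself only supplies the two-tailed value.

```python
def one_tailed_p(spss_two_tailed_p, direction_predicted_by_ha):
    """Convert a two-tailed p value into the p value for a one-tailed test.

    direction_predicted_by_ha is True when the sample mean differs from the
    H0 value in the direction HA predicted, False when it went the other way.
    """
    half = spss_two_tailed_p / 2
    return half if direction_predicted_by_ha else 1 - half

# First example from the lecture: two-tailed p = .012.
print(round(one_tailed_p(0.012, True), 3))   # HA: mean > 100, sample mean above 100
print(round(one_tailed_p(0.012, False), 3))  # HA: mean < 100, sample mean above 100
```

These reproduce the p=.006 and p=.994 reported for the two one-tailed examples.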
SPSS Analysis (cont.) The confidence interval given by SPSS in this analysis is the interval that has the stated probability (95% unless you indicate otherwise) of containing the true value of the difference between the mean of the population from which the sample was drawn, and the mean of the population as stated by H0, i.e. the true value of μ − μH0 (here, μElbonia − 100).
SPSS Analysis (cont.) For the example when the sample mean was 106, the SPSS output states that the 95% confidence interval of the difference between the mean IQ of Elbonians and the value proposed by H0 is: 1.96 ≤ Difference ≤ 10.04. As that interval does not contain ‘0’ we reject H0.
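As a check on those limits, the interval can be rebuilt from numbers already given in the lecture. The raw scores are not repeated here, so this sketch recovers the standard error from the reported (rounded) t value, on the assumption that t = difference / standard error; SPSS computes it from the data directly.

```python
mean_difference = 106 - 100   # sample mean minus the value proposed by H0
t_obtained = 3.82             # reported t for this example
t_critical = 2.571            # two-tailed critical value for df = 5, alpha = .05

# Since t = difference / standard error, the standard error can be recovered.
standard_error = mean_difference / t_obtained

margin = t_critical * standard_error
lower = mean_difference - margin
upper = mean_difference + margin

print(f"{lower:.2f} <= Difference <= {upper:.2f}")
# prints: 1.96 <= Difference <= 10.04
```

The interval excludes 0 exactly when the obtained t exceeds the two-tailed critical value, so the confidence interval and the two-tailed test always agree.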
Assumptions In all of the statistical tests we look at we will be assuming that we have a valid measurement procedure, and that there is no systematic bias in our samples.
Assumptions Underlying This t Test The t test for a single group mean has these additional assumptions: • That the scores in the population are normally distributed. • That the scores are independent.
Assumption of Normality The t test is said to be ‘robust’ in terms of this assumption, which means that the population can be fairly non-normal without it having much of an effect on the validity of the analysis, particularly when the N of our sample is large (which will influence the shape of the SDM towards normality thanks to the Central Limit Theorem). But some deviations from normality can create problems. We will take a closer look at this assumption in an upcoming lecture.
Assumption of Independence The assumption of independence of scores means that any one person’s score is unaffected by (can’t be predicted by) anyone else’s score. This would be violated if we measured the same person twice, or if we let two Elbonians work together on the IQ test.