The χ² test • Sections 19.1 and 19.2 of Howell • This section actually includes 2 totally separate tests • goodness-of-fit test • contingency table analysis • Each has its own point, and requires different things • Only thing in common - same formula • Keep them separate in your mind!
Return to hypothesis testing • We can test statistical significance, no prob • need p and alpha (and a computer) • Sometimes, no computer available • can use tables to test statistical significance • Little more work, but works just as well • This method uses the same logic as the p value method
Testing Ho without a PC • The strategy (new stuff is underlined) • Step 1: Set up Ho, Ha and decide on alpha • Step 2: Calculate the statistic and df • Step 3: Get the critical value from the table • Step 4: Compare critical value to statistic
Step 1 • Set up Ho and alpha - already know • Ha - the alternative hypothesis • If Ho is false, what do we believe then? (Ha) • Ha represents the opposite of Ho • eg. if Ho: r = 0 then Ha: r ≠ 0 • If we reject Ho (because it's false), then we must accept Ha as being true.
Step 2 • Nothing different • use appropriate formulas for stat and df!
Step 3 • Get the critical value • from the table (back of Howell) • Use alpha and df to look it up • Critical value: the value of your statistic at which p = alpha (the edge of the rejection region)
Step 4 • Compare your stat to the crit value: • Ignore any minuses (look only at the absolute value) • If your calculated stat is more than the crit value, then p < alpha (ie. significance!) • The test is significant if the calculated value is greater than the crit value • Reject the Ho, and accept the Ha. • Pretty easy!
Example • Let's use an r value: • We get r = 0.61 with df = 10, alpha = 0.05 • Is this significant? • Critical value: use df and alpha on table D2 in Howell (significant values of the correlation coefficient) • for alpha = 0.05 and df = 10, crit value = 0.576
Example • Now we have the calculated value and crit value • Calculated = 0.61 • Critical = 0.576 • Check: • if calculated > critical, reject Ho • 0.61 > 0.576, so we reject Ho • The result is statistically significant!
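The comparison in Step 4 can be sketched in a few lines of Python. This is only an illustration: the function name `is_significant` is made up here, and the critical value still has to be looked up by hand in the table.

```python
def is_significant(calculated, critical):
    """Return True when the calculated statistic exceeds the critical value.

    The sign of the statistic is ignored, as Step 4 instructs.
    """
    return abs(calculated) > critical


# Values from the example: r = 0.61, critical value 0.576 (df = 10, alpha = 0.05)
r = 0.61
crit = 0.576
print(is_significant(r, crit))  # True, so we reject Ho
```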
Return to χ² • Note: χ² only works with discrete data • What is the point of χ²? • Goodness-of-fit: Used to see if data matches a hypothetical distribution • Are there the same number of men as women? • Are about 25% of South Africans unemployed? • Contingency table analysis (independence test): used as a correlation for discrete data (are the variables related?)
Goodness-of-fit χ² • Used to test a model distribution of data • Have an idea of how data should be distributed • eg. There should be 60% brunettes, 40% blondes • Collect data, check to see if our idea (model) is supported by the data • Does the data fit the model? • Before starting a goodness-of-fit test, always be sure of what the model is
Creating a model • We put our expectations as percentages on a table • One cell of the table for each possible value of the variable • Each cell has the percentage of observations we expect
Example model • We expect 60% brunettes, 40% blondes, so:

| Blondes | Brunettes |
|---|---|
| 40% | 60% |
Observed scores and Expected scores • Strategy: Want to see if our observation matches our model • We collect some data (Observed scores) • We work out what the data would look like if our model were correct (Expected scores) • Compare the two: do the observed scores show the same pattern as the expected scores?
Converting the model to expected scores • We have our model as percentages • We must now convert % to actual values (frequencies) - use n (number of observations) • If we collected 134 observations, then:

| | Blondes | Brunettes |
|---|---|---|
| Model | 40% | 60% |
| Expected | (40/100) x 134 = 53.6 | (60/100) x 134 = 80.4 |
Converting % to frequency • To do this: • (percentage / 100) x n • Keep the decimals! • You cannot work with % for χ² - you must have frequencies (number of observations)
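As a rough sketch, the conversion can be written as a small Python helper. The name `expected_frequencies` and the dictionary layout are assumptions made for this illustration; the numbers are the blondes/brunettes example with n = 134.

```python
def expected_frequencies(model_percent, n):
    """Convert a model given in percentages into expected frequencies.

    Applies (percentage / 100) x n to every category and keeps the decimals.
    """
    return {category: (pct / 100) * n for category, pct in model_percent.items()}


# Model: 40% blondes, 60% brunettes; 134 observations collected
model = {"Blondes": 40, "Brunettes": 60}
expected = expected_frequencies(model, 134)
for category, freq in expected.items():
    print(category, round(freq, 1))
```

The expected frequencies come out at about 53.6 and 80.4, and they always add up to n.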
Beginning the χ² analysis • To begin, need Ho • For χ², it is always “observed data = expected data” • Need to state the model (in %) • Collect the data • Create an expected freq table (using your model and n) • Calculate χ² to see if the observed = expected
χ² Formula • $\chi^2 = \sum \frac{(O - E)^2}{E}$ • O = observed score • E = expected score
χ² formula, step by step • Step 1: for each category (each row of your table), that category’s O minus that category’s E • Step 2: for each category, square the step 1 result • Step 3: for each category, take the step 2 result and divide it by that category’s E • Step 4: sum all the step 3 results
Table method for χ² • Use the following columns, and add up the last column:

| O | E | O-E | (O-E)² | (O-E)²/E |
|---|---|---|---|---|
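The four steps map directly onto a short loop. The sketch below is one possible way to write it in Python; the function name `chi_square` is chosen for this illustration, and the numbers in the usage lines are invented purely to exercise the function.

```python
def chi_square(observed, expected):
    """Sum (O - E)^2 / E over matching observed and expected frequencies."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    total = 0.0
    for o, e in zip(observed, expected):
        diff = o - e            # Step 1: O minus E
        squared = diff ** 2     # Step 2: square the difference
        total += squared / e    # Steps 3 and 4: divide by E, add to the sum
    return total


# Invented numbers, for illustration only
print(round(chi_square([10, 20], [15, 15]), 3))  # 3.333
```

When observed and expected frequencies are identical, every row contributes 0 and the statistic is 0.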
Degrees of freedom (df) • The df for goodness-of-fit tests is easy to calculate: • df = k-1 • k is the number of possible values for your variable (categories) • using males and females k = 2 • using coke, pepsi, sprite k = 3 • using easy, moderate, hard, awesome k = 4
Worked example 1 • We suspect that there is a 50%/50% gender distribution at UCT. We observed 147 people, 68 male, 79 female. Do we really have a 50%/50% distribution? • Set up (step 1) • Ho: Distribution is 50%/50% • Ha: Distribution is not 50%/50% • alpha = 0.05
Example: work out expected scores • (What would we have seen if Ho were true?) • Model: • Males 50% • Females 50% • Convert to scores • n = 147 • Males expected: (50/100) x 147 = 73.5 • Females expected: (50 / 100) x 147 = 73.5
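Example: O and E values • Now we have our values

| Value | O | E | O-E | (O-E)² | (O-E)²/E |
|---|---|---|---|---|---|
| Male | 68 | 73.5 | | | |
| Female | 79 | 73.5 | | | |

Example - Work out the columns

| Value | O | E | O-E | (O-E)² | (O-E)²/E |
|---|---|---|---|---|---|
| Male | 68 | 73.5 | -5.5 | 30.25 | 0.411 |
| Female | 79 | 73.5 | 5.5 | 30.25 | 0.411 |

Example - Add up the values in the last column

| Value | O | E | O-E | (O-E)² | (O-E)²/E |
|---|---|---|---|---|---|
| Male | 68 | 73.5 | -5.5 | 30.25 | 0.411 |
| Female | 79 | 73.5 | 5.5 | 30.25 | 0.411 |
| Total | | | | | 0.823 |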
Example - df • Now we have our χ² value: 0.823 • Is it statistically significant? (does the model explain the population?) • Need the critical value for this! • Degrees of freedom: k-1 • 2 categories (male, female) • so df = 1
Example: critical value • What is the critical value for our male/female example? • df: k = 2 (male and female), so df = 1 • For df = 1 and alpha = 0.05, the table says: • crit = 3.84 • To be significant, our value must be more than 3.84
Example: conclusions • Calculated < critical • (0.823 < 3.84), so we cannot reject Ho • (this means: the data are consistent with “distribution is 50%/50%”) • Conclusion: it seems that at UCT there are as many males as there are females.
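Worked example 1 can be reproduced end to end with a short script. This is a sketch, not a general-purpose routine: the variable names are chosen for this illustration, and the critical value 3.84 is typed in from the table (df = 1, alpha = 0.05) rather than computed.

```python
# Observed counts and the 50%/50% model from worked example 1
observed = {"Male": 68, "Female": 79}
model_percent = {"Male": 50, "Female": 50}

n = sum(observed.values())  # 147 observations
expected = {k: (pct / 100) * n for k, pct in model_percent.items()}

# Chi-square: sum of (O - E)^2 / E over the categories
chi2 = sum((observed[k] - expected[k]) ** 2 / expected[k] for k in observed)

df = len(observed) - 1   # k - 1
critical = 3.84          # looked up in the table, not computed here
significant = chi2 > critical

print(round(chi2, 3), df, significant)  # 0.823 1 False
```

The result is not significant, so the data give no reason to reject the 50%/50% model.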
Interpreting χ² findings • χ² findings are interpreted a little differently • Rejecting Ho (significance) means we cannot accept the model (the model is wrong for this population) • Retaining Ho (non-significance) means we have no evidence against the model, so we go on assuming that the model applies to this population • This is the case for goodness-of-fit tests
Contingency table analysis with χ² • Pearson’s product-moment correlation allowed us to establish a relationship between 2 continuous variables • doesn’t work for discrete data (categories) • Eg. “is there a relationship between gender and owning a dog or cat?” (2 discrete variables) • Contingency table analysis is used for this • can work with nominal variables
Something old, something new • Quite similar to goodness-of-fit tests • Work out the expected values • Use the chi square formula • Work out df • get a critical value from the table • Differences: • Slightly different O table • New way of working out expected values • New way of working out df
Observed values • For each person, we ask 2 questions (2 vars) • “are you male/female” and “do you have a dog or a cat” (let’s assume we sample only pet owners) • We end up with:

| Subject | Gender | Pet |
|---|---|---|
| 1 | M | D |
| 2 | M | C |
| 3 | F | D |
| etc. | | |
O table • We need to convert those data into a frequency table that looks like:

| PET \ GENDER | Male | Female |
|---|---|---|
| Dog | | |
| Cat | | |

Filling in the O table • Each cell has only one number in it • number of people fitting that condition • In cell 1: number of people who are Male AND have a dog • In cell 2: number of people who are Female AND have a dog • In cell 3: number of people who are Male AND have a cat • etc

| PET \ GENDER | Male | Female |
|---|---|---|
| Dog | 1 | 2 |
| Cat | 3 | 4 |

The finished O table • An O table usually looks like the one below • We had 7 males with cats • We had 34 females with dogs • This table is a 2x2 table - 2 rows (pet) and 2 columns (gender)

| PET \ GENDER | Male | Female |
|---|---|---|
| Dog | 36 | 34 |
| Cat | 7 | 32 |
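Tallying the raw answers into an O table is a counting job. The sketch below uses Python's `collections.Counter`; since the slides list only the first three subjects, the `records` list is rebuilt here from the finished cell counts purely for illustration, and the variable names are assumptions.

```python
from collections import Counter

# One (gender, pet) pair per subject. Rebuilt from the finished O table
# for illustration; the real data would be the individual answers.
records = (
    [("M", "D")] * 36   # males with dogs
    + [("F", "D")] * 34 # females with dogs
    + [("M", "C")] * 7  # males with cats
    + [("F", "C")] * 32 # females with cats
)

counts = Counter(records)

genders = ["M", "F"]  # columns
pets = ["D", "C"]     # rows

# Rows are pets, columns are genders, matching the layout of the O table
o_table = [[counts[(g, p)] for g in genders] for p in pets]
print(o_table)  # [[36, 34], [7, 32]]
```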
Notes about O tables • The numbers inside the cells are frequencies (just like goodness-of-fit) • You can have as many levels of a variable as you like • eg. dog, cat, parakeet, moose, hamster, other (6 levels) • BUT you can only have 2 variables • eg. not gender, pet AND car type
E values • Expected values are a bit more tricky • We want to finish with an E table, of the same form as the O table • Need to calculate a value for each cell • we will use the O values to do this

| Expected | Male | Female |
|---|---|---|
| Dog | | |
| Cat | | |
E values, step by step • Step 1: work out the grand total from the O table (N) • Step 2: work out the marginal totals from the O table • Step 3: use a formula (RiCj/N) to get a value for each cell of the E table
Step 1: Grand total (N) • How many people did we use? • Same idea as the usual n • called capital N (for some reason) • To calculate: Add up all the numbers in each of the cells • So in the gender/pet example: N is • 36+34+7+32 = 109 • N = 109
Step 2: Marginal totals • We can work out the total of the margins of the O table • The marginal totals are written on the edges of the O table

| O | Male | Female | Row total |
|---|---|---|---|
| Dog | 36 | 34 | 70 |
| Cat | 7 | 32 | 39 |
| Column total | 43 | 66 | |

Step 2: Calculating marginals • For each marginal, add up the numbers in that line • Do the rows AND the columns!

| O | Male | Female | Row total |
|---|---|---|---|
| Dog | 36 | 34 | 36 + 34 = 70 |
| Cat | 7 | 32 | 7 + 32 = 39 |
| Column total | 36 + 7 = 43 | 34 + 32 = 66 | |
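Steps 1 and 2 amount to summing the O table in three ways. A minimal Python sketch, assuming the table is stored as a list of rows (Dog, Cat) with columns in the order Male, Female:

```python
# O table from the gender/pet example: rows are Dog, Cat; columns are Male, Female
o_table = [
    [36, 34],  # Dog
    [7, 32],   # Cat
]

row_totals = [sum(row) for row in o_table]        # one total per pet
col_totals = [sum(col) for col in zip(*o_table)]  # one total per gender
grand_total = sum(row_totals)                     # N

print(row_totals)   # [70, 39]
print(col_totals)   # [43, 66]
print(grand_total)  # 109
```

A quick check on the arithmetic: the row totals and the column totals must both add up to N.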
Step 3: Work out E table • Write your marginals around your blank E table - in the right places! • We will now use the marginals to compute one E value for each cell • The formula for E: $E = \frac{R_i \times C_j}{N}$

| E | Male | Female | Row total |
|---|---|---|---|
| Dog | | | 70 |
| Cat | | | 39 |
| Column total | 43 | 66 | |

Step 3: Work out a single cell • For each cell, look at the cell’s row and column marginal (Ri and Cj) • For Male/Dog: Ri = 70, Cj = 43 • The formula for E: $E = \frac{70 \times 43}{109} = 27.614$ • Do the same for each cell

| E | Male | Female | Row total |
|---|---|---|---|
| Dog | 27.614 | | 70 |
| Cat | | | 39 |
| Column total | 43 | 66 | |
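Repeating the formula for every cell is easy to get wrong by hand, so here is a hedged Python sketch of the whole E table. The function name `expected_table` is made up for this illustration; it applies Ri x Cj / N to each cell of a table stored as a list of rows.

```python
def expected_table(o_table):
    """Build the E table from an O table using E = (row total x column total) / N."""
    row_totals = [sum(row) for row in o_table]
    col_totals = [sum(col) for col in zip(*o_table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


# O table from the gender/pet example: rows are Dog, Cat; columns are Male, Female
e_table = expected_table([[36, 34], [7, 32]])
for row in e_table:
    print([round(value, 3) for value in row])
```

Rounded to three decimals this gives 27.615 and 42.385 for the Dog row and 15.385 and 23.615 for the Cat row. The slides show 27.614 and 23.614 because they cut off the extra decimals instead of rounding.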
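Ready to calculate χ² • Now we have O and E, ready to calculate χ² (using the same formula as before)

| | Male | Female |
|---|---|---|
| Dog | O = 36, E = 27.614 | O = 34, E = 42.385 |
| Cat | O = 7, E = 15.385 | O = 32, E = 23.614 |

Calculate χ² • This is almost the same as for goodness-of-fit, but be careful in building your table (the O and the E columns)

| Cell | O | E | O-E | (O-E)² | (O-E)²/E |
|---|---|---|---|---|---|
| Male/Dog | 36 | 27.614 | | | |
| Female/Dog | 34 | 42.385 | | | |
| Male/Cat | 7 | 15.385 | | | |
| Female/Cat | 32 | 23.614 | | | |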
Matching up the O and E columns • Be careful!! Each type of response has an O and an E - match up the correct ones! • Male/Cat has O = 7 and E = 15.385 • Female/Dog has O=34 and E = 42.385 • If you get the wrong E for an O, all your results are wrong!! Do it slowly.
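Working out the table • Step 1: O-E (go row by row, slowly)

| Cell | O | E | O-E | (O-E)² | (O-E)²/E |
|---|---|---|---|---|---|
| Male/Dog | 36 | 27.614 | 8.385 | | |
| Female/Dog | 34 | 42.385 | -8.385 | | |
| Male/Cat | 7 | 15.385 | -8.385 | | |
| Female/Cat | 32 | 23.614 | 8.385 | | |

Working out the table • Step 2: square the differences

| Cell | O | E | O-E | (O-E)² | (O-E)²/E |
|---|---|---|---|---|---|
| Male/Dog | 36 | 27.614 | 8.385 | 70.3136 | |
| Female/Dog | 34 | 42.385 | -8.385 | 70.3136 | |
| Male/Cat | 7 | 15.385 | -8.385 | 70.3136 | |
| Female/Cat | 32 | 23.614 | 8.385 | 70.3136 | |

Working out the table • Step 3: divide the squares by E

| Cell | O | E | O-E | (O-E)² | (O-E)²/E |
|---|---|---|---|---|---|
| Male/Dog | 36 | 27.614 | 8.385 | 70.3136 | 2.546 |
| Female/Dog | 34 | 42.385 | -8.385 | 70.3136 | 1.658 |
| Male/Cat | 7 | 15.385 | -8.385 | 70.3136 | 4.57 |
| Female/Cat | 32 | 23.614 | 8.385 | 70.3136 | 2.977 |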
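The three table steps can be checked with a short script. This is a sketch under stated assumptions: the cell labels and variable names are chosen for this illustration, and E is recomputed from the marginals (70, 39, 43, 66, N = 109) at full precision, so the last decimal can differ slightly from the hand-worked table.

```python
# (O, E) pairs for each cell; E = row total x column total / N
cells = {
    "Male/Dog": (36, 70 * 43 / 109),
    "Female/Dog": (34, 70 * 66 / 109),
    "Male/Cat": (7, 39 * 43 / 109),
    "Female/Cat": (32, 39 * 66 / 109),
}

contributions = {}
for label, (o, e) in cells.items():
    diff = o - e                       # Step 1: O - E
    squared = diff ** 2                # Step 2: square the difference
    contributions[label] = squared / e # Step 3: divide by E
    print(label, round(diff, 3), round(squared, 4), round(contributions[label], 3))

# Final step of the chi-square formula: add up the last column
chi2 = sum(contributions.values())
print(round(chi2, 3))
```

Note that in a 2x2 table every O-E has the same size and only the sign changes, which is a handy check that each O has been matched with the correct E.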