Chi Square & Correlation

Nonparametric Test of χ²
• Used when too many assumptions of the t-test are violated:
  • The sample is too small to reflect the population
  • The data are not continuous and thus not appropriate for parametric tests based on normal distributions
• χ² is another way of showing that some pattern in the data was not created randomly by chance.
• χ² can be one- or two-dimensional.
• χ² deals with the question of whether what we observed is different from what is expected.

Calculating χ²
What would a contingency table look like if no relationship exists between gender and voting for Bush (i.e., statistical independence)?
                    Male    Female    Total
Voted for Bush                          50
Voted for Kerry                         50
Total                50       50       100
NOTE: INDEPENDENT VARIABLES ON COLUMNS AND DEPENDENT VARIABLES ON ROWS

Calculating χ²
What would a contingency table look like if a perfect relationship exists between gender and voting for Bush?
                    Male    Female
Voted for Bush
Voted for Kerry

Calculating the expected value
The expected frequency of the cell in the ith row and jth column:
$$E_{ij} = \frac{F_i \, F_j}{N}$$
F_i = the total in the ith row marginal
F_j = the total in the jth column marginal
N = the grand total, or sample size for the entire table
Expected Voted for Bush = (50 × 50) / 100 = 25
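The expected-frequency rule above (row marginal times column marginal, divided by the grand total) can be sketched in a few lines of Python. The function name `expected_counts` is made up for this illustration; the marginals are the 50/50 totals from the gender-by-vote example.

```python
def expected_counts(row_totals, col_totals):
    """Expected cell frequencies under independence: E_ij = F_i * F_j / N."""
    n = sum(row_totals)
    if n != sum(col_totals):
        raise ValueError("row and column marginals must share the same grand total")
    # One expected value per (row, column) pair.
    return [[r * c / n for c in col_totals] for r in row_totals]


# Marginals from the example: 50 Bush / 50 Kerry, 50 male / 50 female, N = 100.
expected = expected_counts([50, 50], [50, 50])
print(expected)  # [[25.0, 25.0], [25.0, 25.0]]
```

Every cell comes out at 25, matching the hand calculation 50 × 50 / 100.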

Nonparametric Test of χ²
• Again, the basic question is: was what you observe in the data created by chance or through some systematic process?
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
O = observed frequency
E = expected frequency

Nonparametric Test of χ²
• The null hypothesis we are testing here is that the proportions of occurrences in each category are equal to each other (Ho: B = K).
• Our research hypothesis is that they are not equal (Ha: B ≠ K).
• Given the sample size, how many cases could we expect in each category (n / number of categories)?
• Comparing the obtained value with the critical value yields a coefficient (the χ² statistic) and a probability (Pr.) of getting such results by chance alone.
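As a rough illustration of the one-dimensional case, the sketch below tests Ho: B = K by comparing each observed count with n / number of categories. The 60/40 split is a hypothetical sample invented for this example, not data from the slides, and `goodness_of_fit` is an illustrative name.

```python
def goodness_of_fit(observed):
    """One-dimensional chi-square against equal expected counts (n / #categories)."""
    n = sum(observed)
    expected = n / len(observed)  # same expected count in every category
    return sum((o - expected) ** 2 / expected for o in observed)


# Hypothetical counts, invented for illustration: 60 Bush voters, 40 Kerry voters.
votes = {"Bush": 60, "Kerry": 40}
stat = goodness_of_fit(list(votes.values()))
print(stat)  # 4.0
```

With 100 cases and two categories, 50 are expected in each, so each category contributes (10)² / 50 = 2.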

Let’s do a χ²
                    Male                    Female
Voted for Bush      (50 - 25)² / 25 = 25    (0 - 25)² / 25 = 25
Voted for Kerry     (0 - 25)² / 25 = 25     (50 - 25)² / 25 = 25
χ² = 25 + 25 + 25 + 25 = 100
What would χ² be when there is statistical independence?
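The cell-by-cell arithmetic above can be checked with a short Python sketch. It assumes the perfect-relationship table has observed counts 50, 0, 0, 50 (as the hand calculation implies) and that the independence table has 25 in every cell; `chi_square` is an illustrative name.

```python
def chi_square(observed):
    """Chi-square statistic for a two-way table of observed counts.

    Assumes every row and column total is positive, so no expected count is zero.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n  # expected count for this cell
            stat += (o - e) ** 2 / e
    return stat


perfect = [[50, 0], [0, 50]]        # rows: Bush, Kerry; columns: Male, Female
independent = [[25, 25], [25, 25]]  # every cell equals its expected value

print(chi_square(perfect))      # 100.0
print(chi_square(independent))  # 0.0
```

Under statistical independence every observed count equals its expected count, so each term is zero and the statistic is 0.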

Let’s corroborate with SPSS

Testing for significance
                    Male    Female
Voted for Bush
Voted for Kerry
χ² = 4
How do we know if the relationship is statistically significant?
• We need to know the degrees of freedom: df = (R - 1)(C - 1) = (2 - 1)(2 - 1) = 1
• We go to the χ² distribution to look for the critical value (CV = 3.84 for df = 1 at the .05 level)
• Because 4 > 3.84, we conclude that the relationship between gender and voting is statistically significant.
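The decision rule can be sketched as follows. The critical value 3.84 is the tabled value for df = 1, assumed here to be at the conventional .05 level. As an extra check, the sketch computes the tail probability using the fact that a χ² variable with 1 degree of freedom is a squared standard normal; `p_value_df1` is an illustrative helper and only valid for df = 1.

```python
import math


def p_value_df1(chi2_stat):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    # P(Z^2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)) for standard normal Z.
    return math.erfc(math.sqrt(chi2_stat / 2))


rows, cols = 2, 2
df = (rows - 1) * (cols - 1)
stat = 4.0
critical_value = 3.84  # tabled value for df = 1, alpha = .05 (assumed level)

print(df)                             # 1
print(stat > critical_value)          # True
print(round(p_value_df1(stat), 4))    # 0.0455
```

A statistic of 4 exceeds 3.84, and the tail probability is just under .05, so the two ways of reading the result agree.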

When is χ² appropriate to use?
• χ² is perhaps the most widely used statistical technique to analyze nominal and ordinal data
• Nominal X nominal (gender and voting preferences)
• Nominal and ordinal (gender and opinion for W)

χ² can also be used with larger tables
(cell contributions to χ² in parentheses)
                              Total
        (19.4)    (15.8)        45
        (.88)     (.72)         30
        (8.6)     (6.9)         70
Total     65        80         145
χ² = 52.3
Do we reject the null hypothesis?
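One way to answer the question is sketched below. It reads the six parenthesized values as per-cell contributions to χ² and assumes the table has 3 rows and 2 columns, which is an inference from the totals (45 + 30 + 70 = 65 + 80 = 145), not something the slide states. For df = 2 the χ² tail probability has a simple closed form, so no statistical library is needed.

```python
import math

# Parenthesized values from the slide, read as per-cell contributions (O - E)^2 / E.
contributions = [19.4, 15.8, 0.88, 0.72, 8.6, 6.9]
chi2_stat = sum(contributions)

rows, cols = 3, 2  # assumed table shape
df = (rows - 1) * (cols - 1)

# For df = 2 the upper-tail probability is exp(-x / 2), so the
# .05 critical value has the closed form -2 * ln(.05).
critical_value = -2 * math.log(0.05)
p_value = math.exp(-chi2_stat / 2)

print(round(chi2_stat, 1), df, round(critical_value, 2))  # 52.3 2 5.99
print(chi2_stat > critical_value)                         # True
```

Under these assumptions the statistic is far above the critical value, so the null hypothesis would be rejected.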

Correlation (Does not mean causation)
• We want to know how two variables are related to each other
  • Does eating doughnuts affect weight?
  • Does spending more hours studying increase test scores?
• Correlation means how much two variables overlap with each other

Types of Correlations

Conceptualizing Correlation
Measuring Development
[Figure: strong vs. weak correlation; variable labels GDP, POP, WEIGHT, GDP, EDUCATION]
Correlation will be associated with what type of validity?

Correlation Coefficient

Home Value & Square footage

Correlation Coefficient

Rules of Thumb

Multiple Correlation Coefficients

Limitation of correlation coefficients
• They tell us how strongly two variables are related
• However, r coefficients are limited because they cannot tell us anything about:
  • Causation between X and Y
  • Marginal impact of X on Y
  • What percentage of the variation of Y is explained by X
  • Forecasting
• Because of the above, Ordinary Least Squares (OLS) is most useful
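The contrast between r and the marginal impact of X on Y can be illustrated with a small sketch. The square-footage and home-value figures are hypothetical, invented for this example, and the helper names `pearson_r` and `ols_line` are illustrative. The sketch uses the standard Pearson formula and the simple one-predictor OLS formulas.

```python
import math


def _sums(x, y):
    """Centered sums of squares and cross-products used by both helpers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return mean_x, mean_y, sxy, sxx, syy


def pearson_r(x, y):
    """Pearson correlation coefficient: strength of the linear relationship."""
    _, _, sxy, sxx, syy = _sums(x, y)
    return sxy / math.sqrt(sxx * syy)


def ols_line(x, y):
    """Intercept and slope of the least-squares line y = a + b * x."""
    mean_x, mean_y, sxy, sxx, _ = _sums(x, y)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical data, invented for illustration.
sqft = [1000, 1500, 2000, 2500, 3000]
value_thousands = [110, 160, 190, 260, 290]        # home value in $1,000s
value_dollars = [v * 1000 for v in value_thousands]

r = pearson_r(sqft, value_thousands)
intercept, slope = ols_line(sqft, value_thousands)
_, slope_dollars = ols_line(sqft, value_dollars)

print(round(slope, 3))                # 0.092 (thousand dollars per extra sq. ft.)
print(round(slope_dollars, 1))        # 92.0 (dollars per extra sq. ft.)
print(abs(pearson_r(sqft, value_dollars) - r) < 1e-9)  # True: r ignores units
```

Changing the units of home value leaves r untouched but rescales the slope, which is why r alone cannot say how much Y changes per unit of X, while the OLS slope can.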

Do you have the BLUES?
• B for Best (minimum error)
• L for Linear (the form of the relationship)
• U for Unbiased (does the parameter truly reflect the effect?)
• E for Estimator

Home value and sq. feet
Does the above line meet the BLUE criteria?
