POLS 570 week 7. Introduction to Correlation, crosstabs and causation
The test statistic and its distribution • Recall the thought experiment in which you estimated the average height of the people in ESU through random sampling • Your result was a frequency distribution, which had a mean and standard deviation • Test statistics, similarly, have probability distributions, which tell us the statistical likelihood of the observed relationship between the variables
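The thought experiment can be imitated with a short simulation. This is only a sketch: the population of 10,000 heights, its mean of 170 cm and standard deviation of 10 cm, the sample size of 25, and the function name `sample_means` are all assumptions made for illustration, not figures from the course.

```python
import random
import statistics

rng = random.Random(570)  # fixed seed so the sketch is reproducible

# Hypothetical population: 10,000 heights in cm (mean 170 and sd 10 are assumptions).
population = [rng.gauss(170, 10) for _ in range(10_000)]

def sample_means(pop, sample_size, repetitions, rng):
    """Draw many random samples (without replacement) and return each sample's mean."""
    return [statistics.mean(rng.sample(pop, sample_size)) for _ in range(repetitions)]

means = sample_means(population, sample_size=25, repetitions=2_000, rng=rng)

center = statistics.mean(means)   # lands close to the population mean
spread = statistics.stdev(means)  # lands close to population sd / sqrt(sample size)
print(round(center, 1), round(spread, 1))
```

The list `means` is the frequency distribution the slide refers to: it has its own mean and standard deviation, and the standard deviation shrinks as the sample size grows.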
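Cross tab Chi-square statistic • In the case of cross-tabulation, the test statistic is called “Chi-squared”; when the null hypothesis is true, it follows the Chi-squared distribution • The statistic is defined as $\chi^2 = \sum_i (O_i - E_i)^2 / E_i$, where $O_i$ is the observed count in cell $i$ of the table and $E_i$ is the count expected in that cell if the two variables were unrelated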
Chi-squared, continued • In other words, the Chi-squared statistic is a sum over the cells of the table: for each cell, take the squared deviation of the observed count from the count that would be expected if no differences existed between men and women (for the variable of interest), and divide it by that expected count
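The calculation described above can be sketched in a few lines. The function name `chi_squared` and the counts in `table` are hypothetical, chosen so the arithmetic is easy to follow; they are not the survey data used in class.

```python
def chi_squared(observed):
    """Pearson chi-squared statistic for a table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, count in enumerate(row):
            # Count expected in this cell if the row and column variables were unrelated.
            expected = row_totals[i] * col_totals[j] / n
            statistic += (count - expected) ** 2 / expected
    return statistic

# Hypothetical counts: rows are men and women, columns are "owns a gun" / "does not".
table = [[60, 40],
         [40, 60]]
print(chi_squared(table))  # 8.0
```

Here every expected count is 50, each cell deviates by 10, and each cell contributes 100 / 50 = 2 to the total.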
Hypothesis testing using cross-tabulation • Since the Chi-squared distribution is known, we can calculate the probability with which the statistic would have been generated if the null hypothesis were true • When a statistic at least as large as the one observed would occur with very small probability (usually, <.05) if the null hypothesis (no relationship between the variables) were true, we say that there is a significant relationship • and thus we reject the null hypothesis
Significance • A statistical finding is significant when it rejects the null hypothesis at a pre-specified significance level (probability). THIS IS NOT the “significance” referred to in qualitative case studies. • The null hypothesis, in the case of cross-tabulation, is that there is no relationship between the row variable and the column variable; i.e., men and women are equally likely to own guns
Statistical Significance • The “significance” of a statistic is the probability with which a value at least as extreme could have been generated by chance (given that no real relationship exists between the variables) • When this “significance” is very low (usually less than .05, unless otherwise specified), we reject the null hypothesis and interpret this to mean that a real relationship exists
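As a rough illustration of where the “significance” figure comes from, the sketch below computes the upper-tail probability of the Chi-squared distribution using only Python's standard library. The function name `chi2_p_value` and the example statistic of 8.0 with 1 degree of freedom are assumptions for illustration; a statistics package computes the same quantity with its own routines.

```python
import math

def chi2_p_value(statistic, df):
    """P(chi-squared with `df` degrees of freedom >= statistic); df is a positive integer."""
    z = statistic / 2.0
    if df % 2 == 1:
        a, q = 0.5, math.erfc(math.sqrt(z))  # exact upper tail for df = 1
    else:
        a, q = 1.0, math.exp(-z)             # exact upper tail for df = 2
    # Step up to df / 2 using Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a + 1).
    while a < df / 2.0:
        q += z ** a * math.exp(-z) / math.gamma(a + 1.0)
        a += 1.0
    return q

p = chi2_p_value(8.0, df=1)  # hypothetical statistic from a 2 x 2 table
print(p < 0.05)              # True, so the null hypothesis would be rejected at .05
```

For this hypothetical statistic the probability is about .0047, well below the conventional .05 cutoff.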
Interpreting the output • SPSS gives us several statistics related to the ordinary Chi-squared statistic, which is the one we are interested in • It tells us • the value of the statistic, • the number of “degrees of freedom,” • the “significance” of the statistic
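The “degrees of freedom” reported for a cross-tabulation depend only on the shape of the table: (rows - 1) × (columns - 1). A minimal sketch, with a hypothetical function name and made-up tables:

```python
def degrees_of_freedom(table):
    """(rows - 1) * (columns - 1) for a cross-tabulation given as a list of rows."""
    return (len(table) - 1) * (len(table[0]) - 1)

# Hypothetical tables; only their shapes matter here.
print(degrees_of_freedom([[60, 40], [40, 60]]))          # 1  (2 rows x 2 columns)
print(degrees_of_freedom([[30, 20, 50], [25, 25, 50]]))  # 2  (2 rows x 3 columns)
```

The same value of the statistic is less surprising in a larger table, which is why the significance has to be read together with the degrees of freedom.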
Our results • Our results show that there is a strong and significant relationship between race and gun-ownership. Whites are more likely to own guns than Blacks, and this holds true even when controlling for gender • The relationship is significant, rejecting the null hypothesis at the .001 level
Other examples: Let’s experiment with some other variables • Gender and attitude toward prayer in school • Gender and presidential voting in 1992 • Race and attitude toward prayer in school • Race and presidential voting in 1992 • Gender and attitude toward homosexuality • Race and attitude toward homosexuality
Our findings • We find a significant relationship between gender and: attitude toward prayer in school; presidential voting in 1992 • We find a significant relationship between race and: attitude toward prayer in school; presidential voting in 1992; attitude toward homosexuality
Our findings, continued • How about the relationship between gender and attitude toward homosexuality? Are men more likely to have a negative attitude toward it than women? • Is there a significant relationship between race and attitude toward homosexuality? Are non-whites more likely than whites to have a negative attitude toward it? • PRE: proportional reduction in error. There are different indicators; you need not memorize these
Correlation: a measure of association for interval- and ratio-level data • Correlation (like cross-tabulation) can be used as a descriptive or an inferential technique • Hypothesis: students enrolled as social science majors (e.g. economics, history, sociology, political science) have a greater interest in politics than those enrolled in professional programs (e.g. business, engineering, medicine). If supported, then there is a correlation between majoring in the social sciences and being interested in politics.
What does correlation tell us? • Correlation answers the question, “is there a linear association between two variables?” • Are high values of one variable associated with high values of another? • Are low values of one variable associated with high values of another? • Is the relationship significant?
Correlation • The result of a correlation analysis is the correlation coefficient, called r (the “Pearson” correlation coefficient) • It varies from -1 to 1 • r = 1 indicates perfect positive correlation; r = -1 indicates perfect negative correlation • r = 0 indicates no linear correlation • Appropriate for quantitative variables with a linear relationship
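A minimal sketch of how r is computed, using made-up data. The function name `pearson_r` and the three small data sets are assumptions for illustration. The last example shows why r is only suited to linear relationships: y is completely determined by x, yet r is 0.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

x = [-2, -1, 0, 1, 2]
print(pearson_r(x, [2 * v + 1 for v in x]))  # 1.0   perfect positive correlation
print(pearson_r(x, [-v for v in x]))         # -1.0  perfect negative correlation
print(pearson_r(x, [v ** 2 for v in x]))     # 0.0   curved relationship, no linear one
```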
Correlation in SPSS • To find the correlation between Respondent’s Highest Degree (degree) and Respondent’s Income (rincome): • Go to the “Analyze” menu and choose “Correlate,” then “Bivariate” • The correlation coefficient is the correlation, in this sample, between degree and rincome • The significance level tells us: • if there were zero correlation..., • with what probability would we observe such a large (positive or negative) correlation coefficient in our sample?
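The question the significance level answers can be made concrete with a permutation test, which is a different technique from the one a statistics package uses (packages typically derive the significance from a t distribution). The idea: if there were zero correlation, any pairing of the y values with the x values would be equally plausible, so we count how often a reshuffled pairing gives a coefficient as large as the observed one. The function names and the six data points below are hypothetical, and listing every reordering is only practical for very small samples.

```python
import itertools
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def permutation_p_value(x, y):
    """Share of all reorderings of y whose |r| is at least the observed |r|."""
    observed = abs(pearson_r(x, y))
    hits = total = 0
    for shuffled in itertools.permutations(y):
        total += 1
        # Small tolerance so rounding error does not exclude exact ties.
        if abs(pearson_r(x, shuffled)) >= observed - 1e-12:
            hits += 1
    return hits / total

# Hypothetical data: an education code and an income category for six respondents.
degree = [0, 1, 1, 2, 3, 4]
rincome = [3, 5, 4, 8, 9, 12]
print(pearson_r(degree, rincome), permutation_p_value(degree, rincome))
```

A small result means that few reshuffled pairings match the observed coefficient, which is the same logic as rejecting the null hypothesis of zero correlation.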
Correlation output • The correlation coefficient • The significance of the coefficient
Scatterplots • Scatterplots show the relationship between two variables and reveal the type of relationship (for example, linear or curved) • For each observation (e.g., each year or each individual), the values of two variables are recorded as a point on a two-dimensional graph • A scatterplot by itself does not give a numerical measure of the strength of the relationship • Example: the relationship between unemployment and gross domestic product
To reiterate and preview: recall the normal distribution • The normal distribution is important because we can use the assumption of normality for the sample mean of certain variables with unknown distributions • There are also important distributions that can be derived as functions of the normal distribution, such as • the t distribution • the Chi-squared distribution • the F distribution • These are important in statistical inference
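One way to see what “derived as functions of the normal distribution” means: a Chi-squared variable with k degrees of freedom is the sum of k squared independent standard normal variables. The simulation below is a sketch under that definition; the seed, the choice of 3 degrees of freedom, and the function name `chi_squared_draw` are arbitrary assumptions.

```python
import random
import statistics

rng = random.Random(7)  # fixed seed so the sketch is reproducible

def chi_squared_draw(df, rng):
    """One draw from a chi-squared distribution: a sum of `df` squared standard normals."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(df))

draws = [chi_squared_draw(3, rng) for _ in range(20_000)]

print(round(statistics.mean(draws), 1))      # near 3, the degrees of freedom
print(round(statistics.variance(draws), 1))  # near 6, twice the degrees of freedom
```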