The Chi Square statistic tests: • Whether the difference between what you observe and what chance would predict is due to sampling error. • The greater the deviation of what we observe from what we would expect by chance, the less plausible it is that the difference is due to chance alone.
Hypothesis Testing • Step 1: State Hypothesis • What is the Null? • What is the Alternative (Research) Hypothesis?
FIRST STEP: COMPUTE PERCENTAGE TABLE • Row margins: • 335/908 = 36.9% Disapprove • 573/908 = 63.1% Approve • NOTE: Most Citizens Approve of Clinton • BUT: We Are Testing for a Gender Effect: Are Women More Supportive Than Men?
INTERPRETATION • Row Marginal: Most Citizens (63%) Approved of Clinton. • Column Marginal: There Are More Men in the Sample (54%). • Based on the percent table, it looks like women are more supportive of Clinton than men are: 69% vs. 58%.
Step #1: Hypotheses: • Null Hypothesis: H0: Women – Men = 0 + Chance • Cell Percents Show Women More Supportive: Null Hypothesis Challenged.
STEP # 2: WHAT IS THE DISTRIBUTION? Categorizing the same individuals in two ways: Approval and Gender. Looking at the Effect of an Independent Variable (Gender) on a Dependent Variable (Approval). This is a classic application for the χ² test. • 1. We have nominal data in both variables: men vs. women, approve vs. disapprove. • 2. The data are in the form of frequencies. • 3. We are looking to see if there is a relationship between the two variables.
STEP 3: DETERMINE LEVEL OF SIGNIFICANCE We set alpha, our risk level, at 5% (α = .05): we reject the null only if a difference as large as the one we observe would occur by chance in fewer than 5% of samples. This is often described as a 95% confidence standard.
STEP 4: DETERMINE CRITICAL VALUE OF χ² • Degrees of Freedom: • (# Rows – 1) * (# Columns – 1) • Here: (2 – 1) * (2 – 1) = 1 * 1 = 1 • Look up the Critical Value χ²* at the 5% level with 1 df and find: χ²* = 3.84
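The table lookup in Step 4 can be checked with a short sketch. It relies on one fact that holds only for 1 degree of freedom: a chi-square variable with 1 df is the square of a standard normal variable, so the critical value is the square of the two-sided z critical value. The function names are illustrative and not from the slides; for more than 1 df you would still use a printed table or a statistics library.

```python
from statistics import NormalDist


def degrees_of_freedom(n_rows, n_cols):
    """df for a contingency table: (rows - 1) * (columns - 1)."""
    return (n_rows - 1) * (n_cols - 1)


def chi_square_critical_1df(alpha):
    """Upper-tail chi-square critical value, valid for 1 df ONLY.

    With 1 df, chi-square is the square of a standard normal z,
    so we square the two-sided z critical value.
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return z * z


df = degrees_of_freedom(2, 2)             # 2 x 2 approval-by-gender table
critical = chi_square_critical_1df(0.05)  # alpha = .05
print(df, round(critical, 2))
```

Rounded to two decimals this reproduces the 3.84 found in the table.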
STEP 5: Calculate Test Statistic and Make Decision • Question: Do the proportions of men and women approving of Clinton differ by more than sampling error alone would produce in 5% of all samples? • What we need now are the expected values against which to compare our observed values: what would you expect to see, assuming the null is true?
CALCULATING EXPECTED VALUES • Look at the Marginal Values. Note: 63% of all respondents approve. • Therefore, assuming the null is true (that there is no gender difference), what would you expect the cell percentages to look like? • 37% of women as well as 37% of men should disapprove, + sampling error • 63% of women as well as 63% of men should approve, + sampling error • The proportions of men and women approving and disapproving should be the same, + sampling error
TWO METHODS FOR COMPUTING EXPECTED VALUES
1. Marginal Method (Easiest): (Row Margin * Column Margin) / Total N
• Cell a: 335 * 418 / 908 = 140,030 / 908 = 154
• Cell b: 335 * 490 / 908 = 164,150 / 908 = 181
• Cell c: 573 * 418 / 908 = 239,514 / 908 = 264
• Cell d: 573 * 490 / 908 = 280,770 / 908 = 309
2. Null Percentage Method: If the null is true, the percentages for men and women should be the same. Apply the overall row percentage to each column total to get the expected frequency.
• Cell a: .369 * 418 = 154
• Cell b: .369 * 490 = 181
• Cell c: .631 * 418 = 264
• Cell d: .631 * 490 = 309
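The marginal method can be sketched in a few lines of Python. The margins (335, 573, 418, 490) and N = 908 come from the slides. The labels "women" for the 418 column and "men" for the 490 column are an inference from the statement that men make up 54% of the sample, and the variable names are illustrative.

```python
# Row and column margins from the slides.
row_margins = {"disapprove": 335, "approve": 573}
# Column labels are inferred: 490 / 908 is about 54%, the reported share of men.
col_margins = {"women": 418, "men": 490}
total_n = 908

# Expected frequency for each cell = row margin * column margin / N.
expected = {
    (row, col): r_total * c_total / total_n
    for row, r_total in row_margins.items()
    for col, c_total in col_margins.items()
}

for cell, value in expected.items():
    print(cell, round(value))
```

The rounded results match cells a through d above (154, 181, 264, 309). The null percentage method gives the same numbers, because .369 is simply 335/908.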
KEY QUESTIONS • How closely do fo values match fe values? • Do the squared fo – fe differences fit the null hypothesis? • Or, are the differences between observations and chance expectations so different as to justify rejecting the null?
Compare Chi-Square Values: • Critical value: 3.84 • Chi-square computed from data: 10.07 • Decision: Reject Null.
STEP 6: STATE CONCLUSION • The computed value of chi-square is greater than the critical value; therefore, reject the null hypothesis. • Substantive interpretation: Approval differs between men and women by more than sampling error alone would predict; the gender difference is statistically significant at the .05 level.
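The computation behind the reported value is the usual sum of (fo – fe)² / fe over the cells. The sketch below is hedged: the slides do not list the observed cell counts, so the ones used here (131, 204, 287, 286) are a reconstruction that fits the reported margins and percentages (69% and 58% approval). Treat them as an assumption, not as the original data.

```python
def chi_square(observed, expected):
    """Sum of (fo - fe)^2 / fe over all cells."""
    return sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))


# ASSUMPTION: observed counts reconstructed from the reported margins
# and percentages; they are not printed in the slides.
# Order: cell a, b, c, d.
observed = [131, 204, 287, 286]

# Expected values as rounded on the slides, and unrounded for comparison.
expected_rounded = [154, 181, 264, 309]
expected_exact = [335 * 418 / 908, 335 * 490 / 908,
                  573 * 418 / 908, 573 * 490 / 908]

chi2_rounded = chi_square(observed, expected_rounded)
chi2_exact = chi_square(observed, expected_exact)
critical_value = 3.84  # 1 df, alpha = .05

print(round(chi2_rounded, 2), round(chi2_exact, 2))
print("Reject null:", chi2_exact > critical_value)
```

Under this reconstruction, the rounded expected values reproduce the reported 10.07; unrounded expected values give a slightly larger statistic. Either way the value exceeds 3.84, so the decision is the same.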
GAMMA • Γ = (AP – DP) / (AP + DP), where AP is the number of agreeing (concordant) pairs and DP is the number of disagreeing (discordant) pairs. • Will give us a value between -1 and 1. • Tells us strength and direction.
How to Calculate Gamma Thinking about cases as pairs: Concordant and Discordant Pairs. • Concordant: A pair where case A scores higher than case B on BOTH variables, or lower on both. • Discordant: A pair where case A scores higher than case B on ONE variable and lower on the other. • Tied: Cases A and B tie on at least one of the variables. • We add up the number of Concordant and Discordant pairs; Gamma leaves the tied pairs out.
Calculating Gamma • Concordant Pairs: Down and Right • CP = 30(30 + 40) + 20(40) = 2900
Calculating Gamma • Discordant Pairs: Down and Left • DP = 20(20) + 10(20 + 30) = 900
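The "down and right" and "down and left" counting can be sketched as a small function. The slides do not show the table itself; the 2 x 3 table below (rows 30, 20, 10 and 20, 30, 40) is reconstructed from the arithmetic in the two calculations above, so it is an assumption, as are the function and variable names.

```python
def count_pairs(table):
    """Count concordant and discordant pairs in an ordered table."""
    concordant = discordant = 0
    n_rows = len(table)
    for i, row in enumerate(table):
        for j, cell in enumerate(row):
            for k in range(i + 1, n_rows):
                # Concordant: cells below and to the right of this cell.
                concordant += cell * sum(table[k][j + 1:])
                # Discordant: cells below and to the left of this cell.
                discordant += cell * sum(table[k][:j])
    return concordant, discordant


# ASSUMPTION: table reconstructed from the CP and DP arithmetic.
table = [
    [30, 20, 10],
    [20, 30, 40],
]

cp, dp = count_pairs(table)
gamma = (cp - dp) / (cp + dp)
print(cp, dp, round(gamma, 3))
```

With CP = 2900 and DP = 900, Gamma comes out at about 0.53: a moderately strong positive association.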
Τa and Τc
• Tau-a: (AP – DP) / [N(N – 1)/2]. Same numerator as Gamma, but out of the total number of pairs.
• Tau-c: (AP – DP) / [½N²((m – 1)/m)]. Adjusts for the size of the table; m = number of rows or columns, whichever is less.
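As a rough illustration, both tau formulas can be applied to the pair counts from the Gamma calculation (2900 concordant, 900 discordant). The sample size N = 150 and m = 2 are assumptions: they come from a 2 x 3 table reconstructed from that arithmetic, not from anything the slides state directly.

```python
ap, dp = 2900, 900  # agreeing and disagreeing pairs from the Gamma example
n = 150             # ASSUMPTION: total cases in the reconstructed table
m = 2               # ASSUMPTION: rows or columns, whichever is less (2 x 3)

total_pairs = n * (n - 1) / 2  # every possible pair of cases
tau_a = (ap - dp) / total_pairs
tau_c = (ap - dp) / (0.5 * n ** 2 * ((m - 1) / m))

print(round(tau_a, 3), round(tau_c, 3))
```

Both values are smaller than Gamma for the same data, because their denominators also include tied pairs.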
Quantifying Linear Relationships • Introduction to Regression Analysis
Two Interests All science is concerned with the relationships between variables: the effect of one variable on another. This is what hypothesis testing is all about. We hypothesize that X is related to Y. The two most powerful techniques for analyzing the relationship between interval level variables are: 1. Regression: the magnitude of the relationship between the independent variable and the dependent variable (how much change in one yields how much change in the other). 2. Correlation: the predictive power of one variable on another (direction and strength of association).
Regression Consider the relationship between education and income. First we could look at the strength of the relationship, for example, the impact of education on income, asking how much of a change in income is associated with one's number of years of education. E.g., how many more dollars of income would someone earn, on average, if he or she finished college rather than dropping out after 2 years? We are asking: as education increases, how much does income change? Given a positive relationship between education and income (as X goes up, Y goes up), how do dollars of yearly income vary with years of education? Is the effect big or small?
Correlation Correlation analysis asks: how good a predictor is the "independent variable" of the dependent variable? Here, how good a predictor of income is education? Is education a good indicator of income or not? How accurate is our prediction of the effect of education on income? It tells us how strongly related, how predictive, one variable is of another, say, education of income. • Both types of analyses go together and both concepts can be pictured on scatterplots. • Whereas regression effects are depicted by the slope of the line, correlation can be seen as the spread of points around the regression line.
Types of Correlation • Positive Correlation: An increase in one variable is associated with an increase in the other. • Negative Correlation: An increase in one variable is associated with a decrease in the other. The greater the spread of points around the regression line, the less predictive X is of Y and, consequently, the weaker the correlation.
Scatter Plot • Is a pictorial depiction of the relationship between variables. • Is a two-dimensional surface on which all the X and Y scores of all the objects in your study are represented, with each object's value on X and value on Y appearing as a single point. Draw a single straight line through the cloud of points, rather than connecting them dot to dot. That line is called the "regression line". The regression line is the "best-fitting line" drawn through the points on an X-Y scatterplot.
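[Figure: scatterplot of a perfect linear relationship. Correlation = 1 in magnitude (r = -1 when the slope is negative); Slope = -2]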
Imperfect Correlation and Relationships • We rarely see perfect correlation. • However, even with imperfect correlation, we can have some expectation of what will happen on average. • While correlation is rarely perfect, we can draw a line to summarize the trend in the data points. This is the Regression Line.
Establishing Relationships • 10 Years of Education Means about $12,000 Income • Now Add 5 Years of Education: It Adds an Additional $4,000 of Income!
Slope of the regression line, called "beta", written b. Note that some of the points are above the line, some below. The regression line, the best-fitting line, is the one line that can be drawn through the plot of points that produces the minimum amount of deviation of points from the line. If you drew the line properly, no other line would yield a smaller overall summary measure of distance from each point to the line. Beta is the change in the dependent variable associated with one unit of change on the predictor variable. Deviation is the sum of the squared distances of points to the regression line.
Plan 1: Minimize the sum of the distances between the points and the line. Distances: -.25, +2, +2, -3.5, -.25. Problem: They all add up to zero! Solution: Square the Distances
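A quick check of the five distances on this slide shows why the plain sum is useless and the squared sum is not. The variable names are illustrative.

```python
# Signed distances of the five points from the line, as given on the slide.
distances = [-0.25, 2, 2, -3.5, -0.25]

plain_sum = sum(distances)                    # positives and negatives cancel
squared_sum = sum(d ** 2 for d in distances)  # every term is now positive

print(plain_sum, squared_sum)
```

The plain sum is exactly zero, while the sum of squares is 20.375. Squaring keeps points above and below the line from cancelling, which is why it can serve as the quantity to minimize.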
Yhat is the point where X meets the regression line; it is the estimated value of Y for each value of X. In other words, Yhat is that point on the regression line predicted for each value of X. Fitting a regression line to data points by this method is called the "least squares method": the regression line is that line which minimizes the squared difference between the observed points and the points predicted by the line. The best-fitting line is that line which, compared to any other line you could plot through the points, produces the lowest sum of squared deviations. So what we do in a regression analysis is compute that line which minimizes the squared deviation of points from the "best-fitting line".
The regression line represents the average amount of change on Y due to changes in X. Hopefully, these pictures will help you visualize relationships. What regression and correlation analyses each do is produce a summary number to represent a relationship. Regression tells you the magnitude of the relation (shown by the slope of the line); correlation tells you the predictive power of the relationship (summarized by the correlation coefficient, written r), which gives you a summary measure of errors in prediction.
The point where the regression line intersects the Y axis is called the "intercept" or "constant", written a. It is the value of Y when X, the independent variable, is zero.
• Y = a + bX
• a is the intercept
• b (beta) is the slope of the regression line
• X is the value of the independent variable
Interpretation: a one unit change on X relates to a change of b on Y; the predicted value of Y is the intercept plus b times X.
E.g., Income = $4100 (intercept) + $800 * X (Years of Education)
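The income equation can be evaluated directly. This is a minimal sketch using the intercept ($4100) and slope ($800) from the example; the function name is illustrative.

```python
def predicted_income(years_of_education, intercept=4100, slope=800):
    """Predicted income: Y = a + bX."""
    return intercept + slope * years_of_education


at_10_years = predicted_income(10)
at_15_years = predicted_income(15)

print(at_10_years, at_15_years, at_15_years - at_10_years)
```

Ten years of education predicts $12,100, and five more years add 5 * $800 = $4,000, in line with the "about $12,000" and "additional $4,000" figures from the earlier scatterplot slide.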
The dependence of Y on X can be of two types: "deterministic" or "probabilistic". The classic case of a deterministic relationship is that between the Fahrenheit and Centigrade measures of temperature: F° = 32 + (9/5)C, where a, the intercept, is 32° (so when C = 0 degrees, F = 32), b (beta) is the slope of the line, here (9/5) or 1.8, and C is X, degrees Centigrade. So for every one degree of change in degrees C, Fahrenheit goes up by 1.8 degrees, starting at 32 degrees:
• when C = 0°, F = 32° + (9/5)0 = 32°
• when C = 100°, F = 32° + (9/5)100 = 212°
Probabilistic Regression • Not perfectly predictive. • On average, we expect a certain amount of change in Y for a certain change in X. • The formula for beta, the slope of the regression line: b = Σ(X – Xbar)(Y – Ybar) / Σ(X – Xbar)², where the numerator is the covariance of X and Y and the denominator is the variance of X (the common divisor cancels out of the ratio).
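The slope formula can be tried on a small data set. The five (education, income) points below are hypothetical and chosen only for illustration: they scatter around the example line Income = 4100 + 800 * X rather than falling exactly on it. The intercept is computed as mean(Y) – b * mean(X), a standard least squares result that the slides do not state.

```python
def least_squares(xs, ys):
    """Return (intercept, slope) of the least squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Numerator: co-variation of X and Y around their means.
    co_variation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Denominator: variation of X around its mean.
    variation_x = sum((x - mean_x) ** 2 for x in xs)
    # Dividing both by n (or n - 1) would not change the ratio.
    slope = co_variation / variation_x
    intercept = mean_y - slope * mean_x
    return intercept, slope


# HYPOTHETICAL data: years of education and yearly income in dollars.
years = [8, 10, 12, 14, 16]
income = [10800, 11500, 14300, 14700, 17200]

a, b = least_squares(years, income)
print(a, b)
```

For these made-up points the fitted line recovers an intercept of 4100 and a slope of 800 even though no single point lies on the line, which is what "not perfectly predictive" means in practice.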