Statistics and Research Methods: Mathematics for HMI (Wiskunde voor HMI), Session 2
Correlation • Association between scores on two variables • e.g., age and coordination skills in children, or price and quality
Scatter Diagram • A Scatter Diagram (or scatterplot) is a visual display of the relationship between two variables • Example: A company is interested in whether there is a relationship between the number of employees supervised by a manager and the amount of stress reported by that manager
Cause and Effect • An important type of relationship between two variables: cause and effect • Independent variable = cause • Dependent variable = effect
Correlation and Causality • Three possible directions of causality: 1. X → Y (X causes Y) 2. Y → X (Y causes X) 3. Z → X and Z → Y (a third variable causes both)
Correlation and Causality • In situations where variables cannot be manipulated experimentally, it is difficult to know whether one is actually causing the other • Example in newspaper: “drinking coffee causes cancer” • However, a third variable may cause both high coffee consumption and cancer • Such third variables are called ‘confounds’
However, we can still try to predict one variable on the basis of a second variable, even if the causal relationship has not been determined • Predictor variable • Criterion variable
Scatter Diagrams • The independent (or predictor) variable goes on the horizontal (x) axis; the dependent (or criterion) variable on the vertical (y) axis.
Patterns of Correlation • Linear correlation • Curvilinear correlation • No correlation • Positive correlation • Negative correlation
Degree of Linear Correlation: The Correlation Coefficient • Figuring the correlation using Z scores • Cross-product of Z scores • Multiply the Z score on one variable by the Z score on the other variable • Correlation coefficient • Average of the cross-products of Z scores
Degree of Linear Correlation: The Correlation Coefficient • Formula for the correlation coefficient: r = Σ(ZX ZY) / N • Perfect positive correlation: r = +1 • No correlation: r = 0 • Perfect negative correlation: r = –1
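The slide's formula (r as the average of the cross-products of Z scores) can be sketched in a few lines of Python. This is a minimal illustration, not a library implementation; the function name is my own, and it uses population standard deviations (dividing by N), matching the Σ(ZX ZY) / N formula above.

```python
from math import sqrt

def correlation(xs, ys):
    """Pearson r as the average of the cross-products of Z scores
    (population SDs, dividing by N, as in the slide's formula)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sdx = sqrt(sum((x - mx) ** 2 for x in xs) / n)
    sdy = sqrt(sum((y - my) ** 2 for y in ys) / n)
    zx = [(x - mx) / sdx for x in xs]
    zy = [(y - my) / sdy for y in ys]
    return sum(a * b for a, b in zip(zx, zy)) / n

# A perfect positive linear relationship gives r = +1 (up to float rounding)
print(correlation([1, 2, 3, 4], [2, 4, 6, 8]))
```

Because each Z score standardizes its variable, r never depends on the units of X or Y.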
Correlation and Causality • Correlational research design • Correlation as a statistical procedure • Correlation as a kind of research design
Issues in Interpreting the Correlation Coefficient • Statistical significance, e.g. p &lt; .05 • Proportionate reduction in error = proportion of variance accounted for • r² • Used to compare correlations
Issues in Interpreting the Correlation Coefficient (continued) • Restriction in range • Unreliability of measurement
Correlation in Research Articles • Scatter diagrams occasionally shown • Correlation matrix
Regression • Making predictions • does knowing a person’s score on one variable allow us to say what their score on a second variable is likely to be? • The method we use to make predictions is called regression • When scores on one variable are used to predict scores on another variable, it is called bivariate regression (two variables) • When scores on two or more variables are used to predict scores on another variable, it is called multiple regression
Example (scatter diagram of coffee consumption and happiness; figure not shown) • These two variables correlate positively • People who drink a lot of coffee tend to be happy, and people who do not tend to be unhappy • Preview: the line through the points is called a regression line, and represents the estimated linear relationship between the two variables. Notice that the slope of the line is positive in this example.
The Regression Line • Relation between the predictor variable and predicted values of the criterion variable • Formula: Ŷ = a + (b)(X) • Slope of the regression line • Equals b, the raw-score regression coefficient • Intercept of the regression line • Equals a, the regression constant • Method of least squares to derive a and b
Method of least squares • a and b are derived by the least-squares method: the line that minimizes the sum of squared prediction errors • The resulting line passes through the point (MX, MY), the means of the two variables
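The least-squares derivation of a and b can be sketched directly from the definitions above. This is a hedged illustration (function name is my own): b is the ratio of the sum of cross-products of deviation scores to the sum of squared deviations of X, and a is chosen so the line passes through (MX, MY).

```python
def least_squares(xs, ys):
    """Raw-score regression coefficient b and regression constant a.
    b = sum((X - MX)(Y - MY)) / sum((X - MX)^2);  a = MY - b * MX,
    which forces the fitted line Y-hat = a + b*X through (MX, MY)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    a = my - b * mx
    return a, b

# Data generated by Y = 1 + 2X is recovered exactly
a, b = least_squares([1, 2, 3, 4], [3, 5, 7, 9])
print(a, b)
```

For data with noise, the same code returns the a and b that minimize the sum of squared errors, which is what "method of least squares" means on the slide.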
The Regression Line: Ŷ = a + (b)(X)
Bivariate Raw-Score Prediction • Direct raw-score prediction model • Predicted raw score (on the criterion variable) = regression constant plus the result of multiplying the raw-score regression coefficient by the raw score on the predictor variable • Formula: Ŷ = a + (b)(X) • The “hat” over Y means “predicted”
Bivariate prediction with Z scores • Given the Z score for X, what is the Z score for Y? • We use the prediction model: ẐY = (β)(ZX) • where β (beta) is the “standardized regression coefficient” • It is also called the “beta weight”, because it tells us how much “weight” to give to ZX when making a prediction for ZY • The “hat” over ZY means “predicted”
What is β? • It turns out that the best value to use for β in the prediction model is r, the (Pearson) correlation coefficient • Thus, the bivariate regression model is ẐY = (r)(ZX) • When r = 1, ẐY = ZX; when r = –1, ẐY = –ZX • When r = 0, there is no linear relation, and the “best guess” for Y is the mean score (ẐY = 0)
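The standardized prediction model is a one-liner; the three special cases on the slide fall out immediately. A minimal sketch (the function name is my own):

```python
def predict_zy(r, z_x):
    """Standardized bivariate prediction: Z-hat_Y = (beta)(Z_X), with beta = r."""
    return r * z_x

print(predict_zy(1.0, 1.5))   # r = +1: predicted Z_Y equals Z_X
print(predict_zy(-1.0, 1.5))  # r = -1: predicted Z_Y is -Z_X
print(predict_zy(0.0, 1.5))   # r = 0: best guess is the mean, i.e. Z = 0
```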
Proportionate Reduction in Error • We want a measure of how accurately our regression model (the raw-score prediction formula) predicts the data • We can compare the error we make when predicting with our regression model, SSError, to the error we would make if we didn’t have the model, SSTotal
Proportionate Reduction in Error • Error • Actual score minus the predicted score (Y – Ŷ) • SSError = sum of squared errors using the prediction model = Σ(Y – Ŷ)² • SSTotal = sum of squared errors when predicting from the mean = Σ(Y – MY)²
Error and Proportionate Reduction in Error • Formula for proportionate reduction in error: (SSTotal – SSError) / SSTotal • Proportionate reduction in error = r² • Proportion of variance accounted for
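The identity "proportionate reduction in error = r²" can be checked numerically by computing both sides from raw data. A self-contained sketch under the slide's definitions (function name is my own; the least-squares a and b are recomputed inline):

```python
def proportionate_reduction_in_error(xs, ys):
    """PRE = (SS_Total - SS_Error) / SS_Total, where SS_Error uses the
    least-squares line Y-hat = a + b*X and SS_Total predicts from the mean."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    a = my - b * mx
    ss_error = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_total = sum((y - my) ** 2 for y in ys)
    return (ss_total - ss_error) / ss_total

# For this data r = 0.8, so the PRE should come out to r^2 = 0.64
print(proportionate_reduction_in_error([1, 2, 3, 4], [1, 3, 2, 4]))
```

Because PRE equals r², it is a unit-free measure, which is why r² (rather than r) is used to compare the strength of different correlations.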