Psych 230: Psychological Measurement and Statistics • Pedro Wolf • September 30, 2009
Homework Question 2 a.
Homework 2b. r = 0.9661737 2c. r² = .93. Ninety-three percent of the variance in test Y can be predicted from test X. 2d. Yes, the large correlation indicates that one can predict scores on test Y from scores on test X.
Homework • 6a. • This statement implies a cause-and-effect relationship. • The correlation by itself does not imply this. • It may be that people with more education have more access to mental healthcare and therefore have the opportunity to use it.
Homework • 6b. • Again, this statement implies a cause-and-effect relationship. • There could be other factors. • Children who play an instrument may have parents who stress the value of practice and education. • 6c. • This statement is false. • A negative correlation means there is a relationship.
Homework • 8a. Test 4, r = .633: r² = .4007 • b. .4007 x 2 = .8014; r = √.8014 = .8952 • c. They ask the same type of question.
Last Time…. • Correlation • r value indicates • Direction of relationship between two variables • Strength of relationship between two variables
Today…. Regression
Regression • Correlation tells us about the strength of the relationship between 2 variables • It does not let us predict • We can use linear regression to do this
Correlation • When you run a correlation you convert everything to z scores • r = (ΣZxZy) / N
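The z-score formula above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function names and the five-point dataset are hypothetical, and the standard deviation divides by N so that it matches r = (ΣZxZy) / N.

```python
from math import sqrt

def z_scores(values):
    """Convert raw scores to z scores using the population SD (divide by N)."""
    n = len(values)
    mean = sum(values) / n
    sd = sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sd for v in values]

def pearson_r(x, y):
    """r = (sum of Zx * Zy) / N, as on the slide."""
    zx, zy = z_scores(x), z_scores(y)
    return sum(a * b for a, b in zip(zx, zy)) / len(x)

# Hypothetical scores, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
print(round(r, 4))
```

Because both variables are standardized first, the result does not depend on the original units of X or Y.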
Regression • We build on correlation by adding a “line of best fit” to the data • The previous plot was on a standardized scale • Any known X score lets us predict the Y score
Line of best fit • Remember this from high school? • Y = mX + b • We use: Y = a + by(X) + error • Where by is the slope of the line • a is the Y intercept (where the line hits the Y axis) • error is the part of Y that the line does not explain
Slope • Slope (by) is the steepness of the line • Change in Y / Change in X • The more Y changes for every unit change of X, the steeper the slope
Y-intercept • This is where the line crosses the Y axis • When X = 0, the value of Y is the intercept
Line of best fit • The resulting line comes as close as possible to the existing data points
Determining the Regression Line • The following is the formula for determining the slope • For the intercept
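As a minimal sketch, the slope and intercept can be computed directly from raw scores. The version below assumes the slope formula by = r(Sy/Sx) that appears later in these slides and the intercept a = Ybar – by(Xbar); it may not be the exact computational form shown in class, and the function name and data are hypothetical.

```python
from math import sqrt

def regression_line(x, y):
    """Return (slope, intercept) for predicting Y from X."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Population standard deviations (divide by N).
    s_x = sqrt(sum((v - mean_x) ** 2 for v in x) / n)
    s_y = sqrt(sum((v - mean_y) ** 2 for v in y) / n)
    # Pearson correlation from deviation scores.
    r = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n * s_x * s_y)
    slope = r * (s_y / s_x)               # by = r(Sy/Sx)
    intercept = mean_y - slope * mean_x   # a = Ybar - by(Xbar)
    return slope, intercept

# Hypothetical data, for illustration only.
hours = [1, 2, 3, 4, 5]
scores = [2, 4, 5, 4, 5]
slope, intercept = regression_line(hours, scores)
print(round(slope, 3), round(intercept, 3))
```

With the intercept defined this way, the line always passes through the point (Xbar, Ybar).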
Y prime • The line formula gives us the value of Y we would predict if given X • We write this as Y’ • We have to differentiate from the actual Y, because our estimate Y’ is not totally accurate
Why predict Y? • We already have Y scores • Y’ isn’t as good as Y • But, the regression lets you predict new data • Use SAT scores to predict college performance • Use morbidity data to predict longevity of smokers • Use past status of markets to predict their future status
Making predictions • You can rewrite the line formula as: Y’ = Ybar + r(Sy / Sx)(X – Xbar) • The slope is the middle term by = r(Sy/Sx) • Get the intercept by rearranging: a = Ybar – by(Xbar)
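To see that the rewritten formula and the slope-intercept form are the same line, here is a small sketch that computes a prediction both ways. The function names and the summary statistics are hypothetical and chosen only for illustration.

```python
def predict_from_means(x, mean_x, mean_y, s_x, s_y, r):
    """Y' = Ybar + r(Sy/Sx)(X - Xbar)."""
    return mean_y + r * (s_y / s_x) * (x - mean_x)

def predict_from_line(x, mean_x, mean_y, s_x, s_y, r):
    """Y' = a + by(X), with by = r(Sy/Sx) and a = Ybar - by(Xbar)."""
    slope = r * (s_y / s_x)
    intercept = mean_y - slope * mean_x
    return intercept + slope * x

# Hypothetical summary statistics, for illustration only.
stats = dict(mean_x=50, mean_y=100, s_x=10, s_y=15, r=0.5)
via_means = predict_from_means(60, **stats)
via_line = predict_from_line(60, **stats)
print(via_means, via_line)
```

Both functions return the same value for any X; the first form is simply more convenient when you are given means and standard deviations rather than a fitted line.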
Example • Jessica wants to predict her final exam grade from the midterm • She earned a 74 on the midterm • The mean grade on the midterm was 70 and s = 4 • In previous years, the mean on the final was a 75 and s = 4. The correlation between the two tests was r = .60 • What score can Jessica predict? • Y’ = 75 + .6(4/4)(74 – 70)
Example • Y’ = 75 + .6(4/4)(74 – 70) • Y’ = 75 + (.6)(1)(4) • Y’ = 75 + 2.4 • Y’ = 77.4 • What if the correlation between the midterm and final was 1?
Example • Y’ = Ybar + r(Sy / Sx) (X – Xbar) • Y’ = 75 + (1)(4/4)(74 – 70) • Y’ = 75 + 4 = 79 • The correlation is perfect here • A difference in score values reflects a difference in scale • The distance from the mean is identical
Example • Y’ = Ybar + r(Sy / Sx) (X – Xbar) • What if the correlation between the midterm and final was 0?
Example • Y’ = Ybar + r(Sy / Sx) (X – Xbar) • Y’ = 75 + (0)(4/4)(74 – 70) • Y’ = 75 • The best prediction is the mean when the variables are uncorrelated, or the correlation is unknown. • Regression allows us to beat the mean
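Jessica's three cases can be reproduced with a short sketch. The function name predict_y is hypothetical; the numbers are the ones given in the example (midterm mean 70, final mean 75, both s = 4, X = 74).

```python
def predict_y(x, mean_x, s_x, mean_y, s_y, r):
    """Y' = Ybar + r(Sy/Sx)(X - Xbar)."""
    return mean_y + r * (s_y / s_x) * (x - mean_x)

# Same midterm score under the three correlations discussed above.
for r in (0.6, 1.0, 0.0):
    y_prime = predict_y(74, mean_x=70, s_x=4, mean_y=75, s_y=4, r=r)
    print(f"r = {r:.1f}: Y' = {y_prime:.1f}")
```

As r shrinks toward 0, the prediction is pulled from 79 (the same distance above the mean as on the midterm) back toward the final exam mean of 75.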
Variation • If r = ±1, all variation is explained; if r = 0, all variation is unexplained • The closer the points fall to the regression line, the greater the variation explained
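One way to make "variation explained" concrete is to compare the squared prediction errors with the total squared deviations of Y from its mean. The sketch below is an illustration under assumptions: the function name and data are hypothetical, and the slope is computed from deviation scores, which is algebraically the same as r(Sy/Sx).

```python
def explained_variation(x, y):
    """Proportion of the variation in Y accounted for by the regression line."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    ss_x = sum((a - mean_x) ** 2 for a in x)
    sp = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sp / ss_x                      # equivalent to r(Sy/Sx)
    intercept = mean_y - slope * mean_x
    ss_total = sum((b - mean_y) ** 2 for b in y)
    # Squared distances between each actual Y and its predicted Y'.
    ss_error = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    return 1 - ss_error / ss_total

# Hypothetical data, for illustration only.
scattered = explained_variation([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
perfect = explained_variation([1, 2, 3], [2, 4, 6])
print(round(scattered, 3), round(perfect, 3))
```

For a straight-line fit this proportion equals r², so points that lie exactly on the line give 1 and the scattered example gives a smaller value.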
Causation • As with correlation, we can’t infer causation with regression • We’re observing variables that correlate, not running experiments • Beware of lurking variables. Another explanation may fit the data better
Midterm • For the midterm you are going to have to integrate what you have learned. • You are going to be given one or more research problems with small datasets. • Because all you know how to do right now is descriptive statistics and correlation/regression analyses, they will be correlational designs. • You are going to have to run all the descriptive statistics you know (e.g. the mean, standard deviation, range, mode, etc. for the two variables). • Draw a scatterplot. • You will then calculate the correlation and report whether or not it is significant. • You will then do a regression, calculate the slope and intercept, and draw the line of best fit through the scatterplot. • I may give you a value for X and ask you to predict a corresponding Y value given your regression line.
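The computational steps listed above can be rehearsed with a short sketch. Everything here is a hypothetical illustration: the dataset, the function names, and the choice of population standard deviations are assumptions, and the scatterplot and the significance check (which requires a table of critical values) are left out.

```python
from statistics import mean, mode, pstdev

def describe(values):
    """Descriptive statistics for one variable."""
    return {
        "mean": mean(values),
        "sd": pstdev(values),              # population SD (divide by N)
        "range": max(values) - min(values),
        "mode": mode(values),
    }

def correlate_and_regress(x, y):
    """Return (r, slope, intercept) for predicting Y from X."""
    n = len(x)
    mean_x, mean_y = mean(x), mean(y)
    s_x, s_y = pstdev(x), pstdev(y)
    r = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n * s_x * s_y)
    slope = r * (s_y / s_x)
    intercept = mean_y - slope * mean_x
    return r, slope, intercept

# Hypothetical small dataset, for illustration only.
x = [2, 4, 4, 6, 9]
y = [3, 5, 6, 6, 10]

print("X:", describe(x))
print("Y:", describe(y))
r, slope, intercept = correlate_and_regress(x, y)
print("r =", round(r, 3), "slope =", round(slope, 3), "intercept =", round(intercept, 3))

# Predict Y for a new X value using the fitted line.
new_x = 7
print("Y' for X = 7:", round(intercept + slope * new_x, 3))
```

Working the same numbers by hand and checking them against a script like this is one way to catch arithmetic slips before the exam.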
Homework • 2 a-d • 10 • 15