Evaluating Theoretical Models • R-squared (R²) represents the proportion of the variance in Y that is accounted for by the model. • When the model does no better than guessing the mean (e.g., if we assume X has no effect on Y), R² will equal zero. When the model is perfect (i.e., it reproduces the data exactly), R² will equal 1.00.
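Written out as a formula, the definition above can be sketched as follows. The notation is an assumption made for illustration and is not taken from the slides: N is the number of cases, Y_i the observed score, \hat{Y}_i the model's prediction, and M_Y the mean of Y.

```latex
R^2 = 1 - \frac{\frac{1}{N}\sum_{i=1}^{N} (Y_i - \hat{Y}_i)^2}{\frac{1}{N}\sum_{i=1}^{N} (Y_i - M_Y)^2}
```

The numerator is the average squared error of the model and the denominator is the variance of Y. If every prediction equals M_Y, the two are equal and R² is 0; if every prediction is exact, the numerator is 0 and R² is 1.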
Why is R² useful? • R² is useful because it is a standard metric for interpreting model fit. • It doesn’t matter how large the variance of Y is, because everything is evaluated relative to the variance of Y. • It has set end-points: 1 is perfect and 0 is no better than guessing the mean.
Why is R² useful? • Finally, and importantly, we can begin to compare the relative fit of alternative models. • Why is this useful? • When we began our discussion of modeling, we noted that there are ways to estimate parameter values, assuming the basic model is correct. • Now we can begin to address the question of whether the basic model is correct (or, more specifically, how good it is) by studying the model’s R² and comparing it to the R² of competing models.
Example • Data

Person    x       y
1        -2   -11.6
2        -1    -4.4
3         0     1.0
4         1     0.4
5         2    -3.6
Model with no x • The most basic model we can study is one in which Y-hat = My (the mean of Y). • Recall that the predicted values form a horizontal line at the mean of Y (about -4 in this example).
Model with no x • The variance of Y is 18 (rounded). • The dotted lines here represent the errors in prediction. • If we square these errors and average them, we find the average squared error to be approximately 18; it matches the variance of Y because each error is simply a deviation from the mean. • Thus, R² for this model is 1 – (18/18), or 0.
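The no-x calculation can be reproduced with a minimal Python sketch, assuming the five y values from the example data as listed. The variable names are illustrative only. The mean and variance computed from the table as listed may differ somewhat from the rounded figures quoted in the slides, but R² for this model is 0 regardless of the exact variance.

```python
from statistics import fmean

# y values from the example data table
y = [-11.6, -4.4, 1.0, 0.4, -3.6]

# The no-x model predicts the mean of Y for every person.
mean_y = fmean(y)
predictions = [mean_y] * len(y)

# Average squared error of the model (divide by N, not N - 1).
mse = fmean([(obs - pred) ** 2 for obs, pred in zip(y, predictions)])

# Variance of Y, using the same divide-by-N convention.
var_y = fmean([(obs - mean_y) ** 2 for obs in y])

r_squared = 1 - mse / var_y
print(f"mean = {mean_y:.2f}, variance = {var_y:.2f}, R^2 = {r_squared:.2f}")
```

Because the errors of this model are exactly the deviations from the mean, the average squared error and the variance are the same quantity, which is why the ratio is 1 and R² is 0.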
Model with linear term • Next, let’s see what happens if we study a linear model of the form Y-hat = a + bX. • The average squared error in this example is 10.07. • R² is .44 (1 – (10/18)). The linear model accounts for 44% of the variance in Y.
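As a rough illustration of the linear fit, the sketch below estimates a and b by ordinary least squares using the standard closed-form expressions and then computes R². It assumes the x and y values from the example data as listed; the variable names are hypothetical, and the result may differ slightly from the rounded figures on the slide.

```python
from statistics import fmean

# Example data table
x = [-2, -1, 0, 1, 2]
y = [-11.6, -4.4, 1.0, 0.4, -3.6]

mean_x, mean_y = fmean(x), fmean(y)

# Least-squares slope and intercept for Y-hat = a + bX.
b = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
     / sum((xi - mean_x) ** 2 for xi in x))
a = mean_y - b * mean_x

# Predicted values and average squared error.
y_hat = [a + b * xi for xi in x]
mse = fmean([(yi - hi) ** 2 for yi, hi in zip(y, y_hat)])

# Variance of Y (divide by N) and R-squared.
var_y = fmean([(yi - mean_y) ** 2 for yi in y])
r_squared = 1 - mse / var_y
print(f"a = {a:.2f}, b = {b:.2f}, MSE = {mse:.2f}, R^2 = {r_squared:.2f}")
```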
Model with a quadratic term • Next, let’s see what happens if we study a model of the form Y-hat = a + bX². • The average squared error in this example is approximately 8. • R² is .55 (1 – (8/18)). The quadratic model accounts for 55% of the variance in Y (11 percentage points more than the linear model).
Model with linear and quadratic terms • Next, let’s see what happens if we study a linear + quadratic model of the form Y-hat = a + bX + cX². • The average squared error in this case is about .10. • R² is .99 (1 – (.10/18)). The linear + quadratic model accounts for 99% of the variance in Y (44 percentage points more than the quadratic model alone).
Summary of model comparisons • Summary of the fit statistics for the various models

Model            R²
No X            .00
Linear          .44
Quadratic       .55
Linear + Quad   .99
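The whole comparison can be sketched in a few lines of Python. This is one possible way to do it, not necessarily how the original figures were produced: it fits each model by least squares through the normal equations, using only the standard library. The helper names (`solve`, `r_squared`, `r2_by_model`) are hypothetical, and the data are taken from the example table as listed.

```python
from statistics import fmean

def solve(A, b):
    """Solve the linear system A w = b by Gauss-Jordan elimination."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        # Partial pivoting: move the largest entry in this column to the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [v - factor * p for v, p in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def r_squared(columns, y):
    """R^2 of the least-squares fit of y on an intercept plus the given columns."""
    design = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(design[0])
    # Normal equations: (X'X) w = X'y
    xtx = [[sum(row[p] * row[q] for row in design) for q in range(k)] for p in range(k)]
    xty = [sum(row[p] * yi for row, yi in zip(design, y)) for p in range(k)]
    w = solve(xtx, xty)
    y_hat = [sum(wi * v for wi, v in zip(w, row)) for row in design]
    mse = fmean([(yi - hi) ** 2 for yi, hi in zip(y, y_hat)])
    mean_y = fmean(y)
    var_y = fmean([(yi - mean_y) ** 2 for yi in y])
    return 1 - mse / var_y

x = [-2, -1, 0, 1, 2]
y = [-11.6, -4.4, 1.0, 0.4, -3.6]
x_sq = [xi ** 2 for xi in x]

models = {
    "No X": [],
    "Linear": [x],
    "Quadratic": [x_sq],
    "Linear + Quad": [x, x_sq],
}
r2_by_model = {name: r_squared(cols, y) for name, cols in models.items()}
for name, r2 in r2_by_model.items():
    print(f"{name:<15}{r2:.2f}")
```

With the table as listed, the R² values come out at roughly 0, .43, .57, and .99. These are close to, though not identical with, the rounded figures in the summary, and the ordering of the four models is the same.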
Summary • So, it looks like the model that combines the linear and the quadratic terms is the best of the four models that we studied. It accounts for the data almost perfectly (99% of the variance in Y was explained by the model). • Note: Even if the model does a decent job of explaining the variation in Y, it isn’t proper to conclude that it is correct. • It might be the best model of those that were articulated, even if it is not literally correct.
Residual term • The part of Y that is unexplained by the model is called residual or error variance, and is often represented as an explicit variable in the model. • This variable is often called the residual or error term, and is typically denoted by the Greek letter epsilon or the Roman letter E. The variance of the residual scores, divided by the variance of Y, is the proportion of variance in Y that is unexplained by the model (1 – R²). If the model is good, the residual variance will be very small relative to the variance of Y.
Residual Term • DATA = MODEL + RESIDUAL
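The decomposition can be checked numerically. The sketch below, which assumes the example data as listed and a least-squares linear model (chosen only for illustration), splits each observed score into a model part and a residual part. For a least-squares fit with an intercept, the variance of Y also splits into the variance of the predictions plus the variance of the residuals. All names are hypothetical.

```python
from statistics import fmean, pvariance

x = [-2, -1, 0, 1, 2]
y = [-11.6, -4.4, 1.0, 0.4, -3.6]

# Least-squares linear model: Y-hat = a + bX
mean_x, mean_y = fmean(x), fmean(y)
b = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
     / sum((xi - mean_x) ** 2 for xi in x))
a = mean_y - b * mean_x

model = [a + b * xi for xi in x]                   # MODEL part of each score
residual = [yi - mi for yi, mi in zip(y, model)]   # RESIDUAL part of each score

# DATA = MODEL + RESIDUAL holds for every person by construction.
rebuilt = [mi + ei for mi, ei in zip(model, residual)]

# Variances (divide by N): total, explained, and unexplained.
var_y = pvariance(y)
var_model = pvariance(model)
var_residual = pvariance(residual)

proportion_unexplained = var_residual / var_y      # equals 1 - R^2
print(f"{var_y:.2f} = {var_model:.2f} + {var_residual:.2f}")
print(f"proportion unexplained = {proportion_unexplained:.2f}")
```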
In the next class we will discuss three reasons why the error variance is greater than zero. • errors of measurement • sampling error • incorrect model