Geometric Representation of Regression
‘Multipurpose’ Dataset from class website
• Attitude towards job
• Higher scores indicate a more unfavorable attitude toward the company
• Number of years worked
• Days absent
• 12 cases

EMP  DAYSABS  ATTRATE  YEARS
a       1        1       1
b       0        2       1
c       1        2       2
d       4        3       2
e       3        5       4
f       2        5       6
g       5        6       5
h       6        7       4
i       9       10       8
j      13       11       7
k      15       11       9
l      16       12      10
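The dataset is small enough to enter directly. A minimal sketch in R, using the variable names from the slide above:

# Enter the 12-case dataset as a data frame
dat <- data.frame(
  EMP     = letters[1:12],
  DAYSABS = c(1, 0, 1, 4, 3, 2, 5, 6, 9, 13, 15, 16),
  ATTRATE = c(1, 2, 2, 3, 5, 5, 6, 7, 10, 11, 11, 12),
  YEARS   = c(1, 1, 2, 2, 4, 6, 5, 4, 8, 7, 9, 10)
)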
Typical representation with response surface
• Correlations .89 and up*
• R2 for the model = .903

Correlations:
          DAYSABS   ATTRATE     YEARS
DAYSABS 1.0000000 0.9497803 0.8902164
ATTRATE 0.9497803 1.0000000 0.9505853
YEARS   0.8902164 0.9505853 1.0000000

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)   -2.2630     1.0959  -2.065   0.0689 .
ATTRATE        1.5497     0.4805   3.225   0.0104 *
YEARS         -0.2385     0.6064  -0.393   0.7032
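Assuming the data frame dat from the sketch above, both the correlation matrix and the fitted model can be reproduced with two calls:

# Correlation matrix of the three variables
cor(dat[, c("DAYSABS", "ATTRATE", "YEARS")])

# Least-squares fit of days absent on attitude and years worked
fit <- lm(DAYSABS ~ ATTRATE + YEARS, data = dat)
summary(fit)   # coefficients and R-squared (about .903) as reported above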
Typical representation with response surface
• The point where the response surface crosses the y axis (DAYSABS) gives the intercept in our formula
• Holding a variable ‘constant’ is like slicing the surface with a plane perpendicular to that variable’s axis
• The process as a whole minimizes the sum of squared vertical distances, measured along the y axis, between the observed data points and the fitted surface (the residual sum of squares); a quick check of this property appears below
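A quick numerical check, assuming dat and fit from the sketches above: the residual sum of squares at the fitted coefficients is smaller than at any perturbed set of coefficients.

# Residual sum of squares as a function of the coefficient vector (intercept, b1, b2)
rss <- function(b) sum((dat$DAYSABS - (b[1] + b[2] * dat$ATTRATE + b[3] * dat$YEARS))^2)

b_hat <- coef(fit)
rss(b_hat)                    # RSS at the least-squares solution
rss(b_hat + c(0.1, 0, 0))     # nudging the intercept increases the RSS
rss(b_hat + c(0, 0.1, -0.1))  # so does nudging the slopes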
Alternative
• Given a variable, we can instead view it as a vector emanating from the origin of an n-dimensional space
• Here the space has one dimension for each individual (for these data, 12 dimensions), and the vector, whose components are the cases’ values on some variable, spans only a single dimension within that space
Assume now two standardized variables of equal N
• Now we have two vectors (of N components) emanating from the origin*
• The cosine of the angle between them is the simple correlation of the two variables, as the sketch below demonstrates
• If they were perfectly correlated they would occupy the same dimension (i.e. lie right on top of one another)

[Figure: vectors X1 and X2 emanating from the origin]
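A sketch of this in R with the two predictors from the example data (assuming dat from above): standardizing centers each variable, and the cosine of the angle between the resulting 12-component vectors matches the simple correlation.

# Each variable is a vector with one component per case ('person space')
x1 <- scale(dat$ATTRATE)[, 1]   # standardized ATTRATE
x2 <- scale(dat$YEARS)[, 1]     # standardized YEARS

# cos(angle) = (x1 . x2) / (|x1| |x2|)
cos_angle <- sum(x1 * x2) / (sqrt(sum(x1^2)) * sqrt(sum(x2^2)))
cos_angle                       # 0.9505853
cor(dat$ATTRATE, dat$YEARS)     # the same value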
Adding a third variable, we can again read the simple correlations as the cosines of the respective angles the vectors create
• Given the plane spanned by X1 and X2, might we find a way to project Y onto it? (A sketch of this projection follows.)

[Figure: vector Y above the plane spanned by X1 and X2]
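A sketch of that projection, using centered variables so the plane spanned by X1 and X2 passes through the origin (assuming dat from above):

# Center the response and the predictors
y <- scale(dat$DAYSABS, scale = FALSE)[, 1]
X <- cbind(scale(dat$ATTRATE, scale = FALSE)[, 1],
           scale(dat$YEARS,   scale = FALSE)[, 1])

# Projection ('hat') matrix: H = X (X'X)^-1 X'
H <- X %*% solve(t(X) %*% X) %*% t(X)
yhat <- H %*% y   # Y-hat: the orthogonal projection of Y onto the plane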
That is in fact what multiple regression does: this projection is the linear combination* that yields our predicted values
• The cosine of the angle between Y and Y-hat is the multiple R, which when squared gives the proportion of variance in Y accounted for by the model containing X1 and X2
• Regression minimizes that angle, or equivalently maximizes its cosine
• Partial correlations may be represented too, by creating a plane perpendicular** to one variable and projecting the others onto that plane
• The cosine of the angle the projected vectors create is their partial correlation

[Figure: Y, its projection Y-hat, and X1 and X2 in the plane]
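Continuing the sketch above, both claims can be checked numerically: the cosine of the angle between y and yhat recovers the multiple R, and the partial correlation equals the correlation of the residuals after regressing each variable on the one held constant.

# Multiple R as the cosine of the angle between y and yhat
R <- sum(y * yhat) / (sqrt(sum(y^2)) * sqrt(sum(yhat^2)))
R^2   # about .903, the model R-squared reported earlier

# Partial correlation of DAYSABS and ATTRATE holding YEARS constant:
# residualize both with respect to YEARS, then correlate the residuals
e_y  <- resid(lm(DAYSABS ~ YEARS, data = dat))
e_x1 <- resid(lm(ATTRATE ~ YEARS, data = dat))
cor(e_y, e_x1)   # the partial correlation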