Discussion of “Least Angle Regression” by Weisberg Mike Salwan November 2, 2006 Stat 882
Introduction • “Notorious” problem of automatic model building algorithms for linear regression • Implicit Assumption • Replacing Y by something without loss of info • Selecting variables • Summary
Implicit Assumption • We have an n x m matrix X and an n-vector Y • P is the projection onto the column space of X • LARS assumes we can replace Y with Ŷ = PY; in large samples, F(y|x) = F(y|x'β) • We estimate the residual variance by σ̂² = ‖Y − Ŷ‖² / (n − m) • If this assumption does not hold, then LARS is unlikely to produce useful results
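A minimal numpy sketch of this setup, on synthetic data (not the paper's data): it forms the projection P onto the columns of X, the fitted values Ŷ = PY, and the residual variance estimate σ̂² = ‖Y − Ŷ‖²/(n − m).

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 100, 5
X = rng.normal(size=(n, m))
Y = X @ rng.normal(size=m) + rng.normal(size=n)

# Projection onto the column space of X: P = X (X'X)^{-1} X'
P = X @ np.linalg.solve(X.T @ X, X.T)
Y_hat = P @ Y                      # Y_hat = PY, what LARS effectively works with
resid = Y - Y_hat

# Residual variance estimate from the projection fit
sigma2_hat = resid @ resid / (n - m)
print(sigma2_hat)
```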
Implicit Assumption (cont) • Alternative: let F(y|x) = F(y|x'B), where B is an m x d matrix of rank d. The smallest such d is called the structural dimension of the regression problem • The R package dr can be used to estimate d, using methods such as sliced inverse regression • One can then study y through a smooth function of the resulting small set of projections x'B • The paper expands the variables from 10 to 65 so that F(y|x) = F(y|x'β) plausibly holds
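The slide points to the R package dr; as an illustration only, here is a from-scratch sketch of sliced inverse regression in Python (not the dr API) on synthetic data with structural dimension d = 1. The dr package adds formal tests for d; this sketch just inspects the eigenvalues.

```python
import numpy as np

def sir_eigenvalues(X, y, n_slices=10):
    """Minimal sliced inverse regression: eigenvalues of the between-slice
    covariance of the whitened predictors. Large leading eigenvalues
    suggest the structural dimension d."""
    n, m = X.shape
    # Whiten the predictors
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / n
    evals, evecs = np.linalg.eigh(cov)
    cov_inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T
    Z = Xc @ cov_inv_sqrt

    # Slice the observations by the ordered response
    order = np.argsort(y)
    slices = np.array_split(order, n_slices)

    # Weighted covariance of the slice means of Z
    M = np.zeros((m, m))
    for idx in slices:
        zbar = Z[idx].mean(axis=0)
        M += (len(idx) / n) * np.outer(zbar, zbar)
    return np.linalg.eigvalsh(M)[::-1]   # descending order

# Example: y depends on x only through one linear combination (d = 1)
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 6))
beta = np.array([1.0, 0.5, 0.0, 0.0, 0.0, 0.0])
y = np.tanh(X @ beta) + 0.1 * rng.normal(size=500)
print(sir_eigenvalues(X, y))   # one eigenvalue stands out (d = 1)
```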
Implicit Assumption (cont) • LARS relies too much on correlations • Correlation measures the degree of linear association (obviously) • Using it sensibly requires linearity in the conditional distributions of y, and of a'x given b'x, for all a and b; otherwise bizarre results can arise • Any method that replaces Y by PY cannot be sensitive to nonlinearity
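A small illustration of this point with hypothetical data: y is a deterministic function of x1, but only through x1², so the correlation between y and x1 is essentially zero and a correlation-driven procedure such as LARS (here via scikit-learn's lars_path) gives x1 no preference over pure-noise predictors.

```python
import numpy as np
from sklearn.linear_model import lars_path

rng = np.random.default_rng(0)
n = 500
X = rng.normal(size=(n, 5))
# y depends strongly on x1, but only through x1**2, so corr(y, x1) ~ 0
y = X[:, 0] ** 2 + 0.1 * rng.normal(size=n)

print(np.corrcoef(X[:, 0], y)[0, 1])          # near zero
alphas, active, coefs = lars_path(X, y, method="lar")
print(active)   # entry order: x1 (index 0) gets no preference over the noise columns
```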
Implicit Assumption (cont) • Methods based on PY alone can be strongly influenced by outliers and high-leverage cases • Consider the Cp-type criterion used in the paper for a candidate fit μ̂: Cp(μ̂) = ‖Y − μ̂‖²/σ² − n + 2 Σ cov(μ̂i, yi)/σ² • Estimate σ² by the full-model residual variance σ̂² = ‖Y − Ŷ‖²/(n − m) • Thus the ith term is given by (yi − μ̂i)²/σ̂² − 1 + 2 cov(μ̂i, yi)/σ̂² • ŷi is the ith element of Ŷ = PY and hi is the ith leverage, a diagonal element of P
Implicit Assumption (cont) • From the simulation in the article, we can approximate the covariance term by cov(μ̂i, yi) ≈ σ̂²ui, where ui is the ith diagonal of the projection matrix on the columns of (1, X) at the current step of the algorithm • Thus the ith term becomes approximately (yi − μ̂i)²/σ̂² − 1 + 2ui • This is the same formula as in another paper by Weisberg, except that there the fit is computed from a projection rather than from LARS
Implicit Assumption (cont) • Writing yi − μ̂i = (yi − ŷi) + (ŷi − μ̂i) and ignoring the cross term, the ith term behaves roughly like (ŷi − μ̂i)²/σ̂² + ui + (ui − hi) • Its value therefore depends on the agreement between μ̂i and ŷi, on the leverage in the subset model, and on the difference in leverage between the full and subset models • Neither of the latter two terms has much to do with the problem of interest (the study of the conditional distribution of y given x); they are determined by the predictors only
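A numerical sketch of the ideas above, on synthetic data and with an ordinary projection fit standing in for the LARS fit μ̂ (an assumption for illustration): it computes the full-model leverages hi, the subset leverages ui, the residual variance estimate, and the approximate per-observation Cp contributions. Note that hi and ui are functions of the predictors alone.

```python
import numpy as np

def hat_diag(X):
    """Diagonal of the projection (hat) matrix onto the columns of X."""
    return np.einsum("ij,ji->i", X, np.linalg.solve(X.T @ X, X.T))

rng = np.random.default_rng(0)
n, m = 100, 6
X = rng.normal(size=(n, m))
y = X[:, :2] @ np.array([3.0, -2.0]) + rng.normal(size=n)

ones = np.ones((n, 1))
X_full = np.hstack([ones, X])          # full model (1, X)
X_sub = np.hstack([ones, X[:, :2]])    # active predictors at the current step

h = hat_diag(X_full)                   # full-model leverages h_i
u = hat_diag(X_sub)                    # subset-model leverages u_i

# Fitted values from each projection
yhat_full = X_full @ np.linalg.lstsq(X_full, y, rcond=None)[0]
yhat_sub = X_sub @ np.linalg.lstsq(X_sub, y, rcond=None)[0]

sigma2 = np.sum((y - yhat_full) ** 2) / (n - X_full.shape[1])

# Per-observation Cp contribution, using cov(mu_hat_i, y_i) ~ sigma2 * u_i
cp_i = (y - yhat_sub) ** 2 / sigma2 - 1 + 2 * u
print(cp_i.sum())                      # approximate Cp for the subset fit
print(np.column_stack([h, u])[:5])     # leverages: determined by X regardless of y
```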
Selecting Variables • We want to decompose x into two parts, xu and xa, where xa represents the active predictors • We want the smallest xa such that F(y|x) = F(y|xa), typically chosen by optimizing some selection criterion • Standard methods are too greedy • LARS permits highly correlated predictors to be used
Selecting Variables (cont) • Example illustrating a weakness of LARS • Nine new variables were added by multiplying original variables by 2.2 and rounding to the nearest integer • The LARS method was applied to both predictor sets • LARS selects two of the rounded variables, including both one original variable (BP) and its rounded copy
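The slide's example uses the diabetes data from the paper; the sketch below reproduces only the idea, on synthetic data (an assumption for illustration): nine scaled-and-rounded near-copies of the predictors are appended, and the LARS entry order is compared with and without them.

```python
import numpy as np
from sklearn.linear_model import lars_path

def standardize(Z):
    """Center and scale columns, as the paper does before running LARS."""
    return (Z - Z.mean(axis=0)) / Z.std(axis=0)

rng = np.random.default_rng(1)
n, m = 200, 10
X = rng.normal(scale=3.0, size=(n, m))
beta = np.zeros(m)
beta[:4] = [2.0, -1.5, 1.0, 0.5]
y = X @ beta + rng.normal(size=n)

# Nine extra columns: scaled and rounded near-copies of the first nine predictors
X_round = np.round(2.2 * X[:, :9])
X_aug = np.hstack([X, X_round])

_, active_orig, _ = lars_path(standardize(X), y, method="lar", max_iter=5)
_, active_aug, _ = lars_path(standardize(X_aug), y, method="lar", max_iter=5)
print(active_orig)   # entry order with the original 10 predictors
print(active_aug)    # with 19 predictors; rounded copies (indices >= 10) may displace originals
```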
Selecting Variables (cont) • Inclusion or exclusion depends on the marginal distribution of x as much as on the conditional distribution of y|x • Ex: two variables have a high correlation • LARS selects one of them for its active set • Modify the other so that the two are now uncorrelated • This doesn't change y|x, only the marginal distribution of x • Yet it could change the set of active predictors selected by LARS, or by any method that uses correlation
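A hedged sketch of this scenario with synthetic data: the response is generated from the same coefficients and error law in both designs, but in one design x1 and x2 are highly correlated and in the other they are not, and the LARS entry order can change.

```python
import numpy as np
from sklearn.linear_model import lars_path

rng = np.random.default_rng(2)
n = 300
beta = np.array([1.0, 1.0, 0.0])

def make_data(correlated):
    x1 = rng.normal(size=n)
    if correlated:
        x2 = x1 + 0.1 * rng.normal(size=n)    # x1 and x2 highly correlated
    else:
        x2 = rng.normal(size=n)               # similar scale, now uncorrelated
    x3 = rng.normal(size=n)
    X = np.column_stack([x1, x2, x3])
    y = X @ beta + 0.5 * rng.normal(size=n)   # same y | x mechanism in both cases
    return X, y

for corr in (True, False):
    X, y = make_data(corr)
    _, active, _ = lars_path(X, y, method="lar")
    print(corr, active)    # entry order can differ between the two designs
```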
Selecting Variables (cont) • LARS results are invariant under rescaling of the predictors, but not under reparameterization of related predictors • Scaling the predictors first and then adding all cross-products and quadratics gives a different model than adding the terms first and scaling afterward • This could be addressed by considering the related terms simultaneously, but that is self-defeating for subset selection
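A short sketch of the reparameterization point, using scikit-learn's StandardScaler and PolynomialFeatures on synthetic data: standardizing first and then forming quadratics and cross-products yields different columns than forming the terms first and standardizing afterward, so a column-selection method like LARS can choose different terms.

```python
import numpy as np
from sklearn.linear_model import lars_path
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

rng = np.random.default_rng(3)
n = 200
X = rng.normal(loc=5.0, scale=2.0, size=(n, 3))
y = X[:, 0] * X[:, 1] + X[:, 2] + rng.normal(size=n)

poly = PolynomialFeatures(degree=2, include_bias=False)
scaler = StandardScaler()

# Order 1: standardize the predictors, then form quadratics and cross-products
A = poly.fit_transform(scaler.fit_transform(X))
# Order 2: form quadratics and cross-products first, then standardize everything
B = scaler.fit_transform(poly.fit_transform(X))

# The two matrices carry essentially the same information, but the individual
# columns differ, so LARS can select different terms in each parameterization
for Z in (A, B):
    _, active, _ = lars_path(Z, y, method="lar", max_iter=4)
    print(active)
```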
Summary • Problems gain notoriety because their solution is elusive yet of wide interest • Neither LARS nor any other automatic model selection method considers the context of the problem • There seems to be no foreseeable solution to this problem