This presentation highlights Inverse Regression Methods, such as Principal Components Regression (PCR), Sliced Inverse Regression (SIR), Constrained Inverse Regression (CIR), and Partial Inverse Regression (PIR). It discusses applying these methods to estimate high-dimensional models where the link function is unknown, overcoming the curse of dimensionality, and addressing the p > N problem. Examples and simulation results are presented to showcase the effectiveness of these methods in reducing variables and capturing meaningful factors. The advantages of closed-form estimators, computational efficiency, and guaranteed convergence are emphasized for broad applications in high-dimensional data analysis.
Inverse Regression Methods Prasad Naik 7th Triennial Choice Symposium Wharton, June 16, 2007
Outline • Motivation • Principal Components Regression (PCR) • Sliced Inverse Regression (SIR) • Application • Constrained Inverse Regression (CIR) • Partial Inverse Regression (PIR) • p > N problem • Simulation results
Motivation • Estimate the high-dimensional model: • y = g(x1, x2, ..., xp) • Link function g(.) is unknown • Small p (≤ 6 variables): • apply multivariate local (linear) polynomial regression (a sketch follows below) • Large p (> 10 variables): • Curse of dimensionality => Empty space phenomenon
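For the small-p case, here is a minimal NumPy sketch of a local linear fit at a single evaluation point. The Gaussian product kernel and the bandwidth h are illustrative assumptions, not choices from the slides.

```python
import numpy as np

def local_linear_fit(X, y, x0, h=0.5):
    """Local linear regression estimate of E[y | x = x0].

    X : (n, p) predictors, y : (n,) response,
    x0 : (p,) evaluation point, h : kernel bandwidth (assumed, not tuned here).
    """
    # Gaussian product-kernel weights centered at x0
    w = np.exp(-0.5 * np.sum(((X - x0) / h) ** 2, axis=1))
    # Design matrix with intercept and predictors centered at x0
    D = np.column_stack([np.ones(len(y)), X - x0])
    W = np.diag(w)
    # Weighted least squares; the intercept is the fitted value at x0
    coef = np.linalg.solve(D.T @ W @ D, D.T @ W @ y)
    return coef[0]

# Example usage with simulated data
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=200)
print(local_linear_fit(X, y, x0=np.zeros(3)))
```

With p beyond a handful of variables, the kernel weights concentrate on very few observations, which is the empty-space phenomenon the slide refers to.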
Principal Components (PCR, Massy 1965, JASA) • PCR • High-dimensional data Xn x p • Eigenvalue decomposition • Σx e = λ e • (λ1, e1), (λ2, e2), ..., (λp, ep) • Retain K components, (e1, e2, ..., eK) • where K < p • Low-dimensional data, Z = (z1, z2, ..., zK) • where zi = Xei are the “new” variables (or factors) • Low-dimensional subspace, K = ?? • Not the most predictive variables • Because y information is ignored
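A minimal NumPy sketch of the PCR reduction step described above, assuming centered columns of X and a user-chosen K (the slides leave K open):

```python
import numpy as np

def pcr_factors(X, K):
    """Reduce X (n x p) to Z (n x K) via principal components."""
    Xc = X - X.mean(axis=0)                  # center the columns
    Sigma_x = np.cov(Xc, rowvar=False)       # sample covariance of X
    eigvals, eigvecs = np.linalg.eigh(Sigma_x)
    order = np.argsort(eigvals)[::-1]        # sort eigenvalues descending
    E_K = eigvecs[:, order[:K]]              # retain K eigenvectors e1..eK
    Z = Xc @ E_K                             # new variables z_i = X e_i
    return Z, E_K

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 10))
Z, E_K = pcr_factors(X, K=3)
print(Z.shape)   # (100, 3)
```

Regressing y on Z is then an ordinary low-dimensional fit; as the slide notes, the components are chosen without using y, so they need not be the most predictive directions.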
Sliced Inverse Regression (SIR, Li 1991, JASA) • Similar idea: Xn x p → Zn x K • Generalized Eigen-decomposition • Γ e = λ Σx e • where Γ = Cov(E[X|y]) • Retain K* components, (e1, ..., eK*) • Create new variables Z = (z1, ..., zK*), where zi = Xei • K* is the smallest integer q (= 0, 1, 2, ...) such that the remaining p − q eigenvalues are not significantly different from zero (Li's sequential test) • Most predictive variables across • any set of unit-norm vectors e’s and • any transformation T(y)
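A minimal sketch of SIR's slicing step, assuming equal-frequency slices of y; the number of slices is an illustrative choice, and the sequential test for choosing K* is not shown.

```python
import numpy as np
from scipy.linalg import eigh

def sir_directions(X, y, n_slices=10, K=2):
    """Sliced inverse regression: estimate K directions e solving
    Gamma e = lambda Sigma_x e, with Gamma = Cov(E[X | y])."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    Sigma_x = np.cov(Xc, rowvar=False)
    # Slice observations by the order of y (equal-frequency slices)
    order = np.argsort(y)
    slices = np.array_split(order, n_slices)
    # Gamma = weighted covariance of the within-slice means of X
    Gamma = np.zeros((p, p))
    for idx in slices:
        m = Xc[idx].mean(axis=0)
        Gamma += (len(idx) / n) * np.outer(m, m)
    # Generalized eigenproblem: Gamma e = lambda Sigma_x e
    eigvals, eigvecs = eigh(Gamma, Sigma_x)
    order_desc = np.argsort(eigvals)[::-1]
    return eigvecs[:, order_desc[:K]], eigvals[order_desc]

# Single-index example: y depends on x only through one direction
rng = np.random.default_rng(2)
X = rng.normal(size=(500, 8))
beta = np.zeros(8); beta[0] = 1.0
y = np.exp(X @ beta) + 0.1 * rng.normal(size=500)
E, lam = sir_directions(X, y, n_slices=10, K=1)
print(E[:, 0])   # close to +/- beta up to scale
```

Because Γ is built from E[X|y], the retained directions use the information in y that PCR ignores.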
SIR Applications (Naik, Hagerty, Tsai 2000, JMR) • Model • p variables reduced to K factors • New Product Development context • 28 variables → 1 factor • Direct Marketing context • 73 variables → 2 factors
Constrained Inverse Regression (CIR, Naik and Tsai 2005, JASA) • Can we extract meaningful factors? • Yes • First capture this information in a set of constraints • Then apply our proposed method, CIR
Example 4.1 from Naik and Tsai (2005, JASA) • Consider 2-Factor Model • p = 5 variables • Factor 1 includes variables (4,5) • Factor 2 includes variables (1,2,3) • Constraint sets: factor 1 has zero loadings on variables (1,2,3); factor 2 has zero loadings on variables (4,5)
CIR (contd.) • CIR approach • Solve the eigenvalue decomposition: • (I − Pc) Γ e = λ Σx e • where Pc is the projection matrix onto the column space of the constraint matrix • When Pc = 0, we get SIR (i.e., nested) • Shrinkage (e.g., Lasso) • set insignificant effects to zero by formulating an appropriate constraint • improves t-values for the other effects (i.e., efficiency)
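A minimal sketch of the constrained eigen-step from this slide. The constraint matrices C1 and C2 below are my illustrative reading of the constraint sets in Example 4.1 (each column is a coordinate direction the factor must not load on); the paper gives the exact formulation.

```python
import numpy as np
from scipy.linalg import eig

def cir_directions(Gamma, Sigma_x, C, K=1):
    """Constrained inverse regression: solve (I - Pc) Gamma e = lambda Sigma_x e,
    where Pc projects onto the column space of the constraint matrix C.
    With an empty C (Pc = 0), this reduces to SIR."""
    p = Sigma_x.shape[0]
    Pc = C @ np.linalg.solve(C.T @ C, C.T) if C.size else np.zeros((p, p))
    eigvals, eigvecs = eig((np.eye(p) - Pc) @ Gamma, Sigma_x)
    order = np.argsort(eigvals.real)[::-1]
    return eigvecs[:, order[:K]].real

# Hypothetical constraint matrices for Example 4.1 (p = 5), for illustration only:
# factor 1 should load only on variables 4-5, factor 2 only on variables 1-3.
C1 = np.eye(5)[:, :3]   # constrains factor 1: no loadings on variables 1-3
C2 = np.eye(5)[:, 3:]   # constrains factor 2: no loadings on variables 4-5

# Gamma and Sigma_x would be estimated exactly as in the SIR sketch above;
# passing C1 (or C2) then constrains the corresponding extracted direction.
```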
p > N Problem • OLS, MLE, SIR, CIR break down when p > N • Partial Inverse Regression (Li, Cook, Tsai, Biometrika, forthcoming) • Combines ideas from PLS and SIR • Works well even when • p > 3N • Variables are highly correlated • Single-index Model • g(.) unknown
p > N Solution • To estimate β, first construct the matrix R (the explicit construction is given in the paper) • where e1 is the principal eigenvector of Γ = Cov(E[X|y]) • Then the PIR estimator of β follows in closed form from R
Conclusions • Inverse Regression Methods offer estimators that are applicable for • a remarkably broad class of models • high-dimensional data • including p > N (which is conceptually the limiting case) • Estimators are closed-form, so • Easy to code (just a few lines) • Computationally inexpensive • No iterations or re-sampling or draws (hence no do or for loops) • Guaranteed convergence • Standard errors for inference are derived in the cited papers