Dimension reduction (3)

Understand dimension reduction techniques like Sliced Inverse Regression and Multi-dimensional LDA to predict Y accurately based on X data, without losing essential information. Explore models and strategies for optimal regression analysis.

wlyon

Presentation Transcript


  1. Dimension reduction (3). EDR space; sliced inverse regression; multi-dimensional LDA; partial least squares; network component analysis.

  2. EDR space. We now turn to regression. The data are {x_i, y_i}. Is dimension reduction on the X matrix alone helpful here? Possibly, if the reduction preserves the essential structure of Y|X, but that cannot be taken for granted. Effective dimension reduction (EDR): reduce the dimension of X without losing information that is essential for predicting Y.

  3. EDR space. The model: $y = g(\beta_1^T x, \beta_2^T x, \ldots, \beta_K^T x, \epsilon)$, that is, Y is predicted by a set of linear combinations of X. If g() is known, this is not very different from a generalized linear model. For the purpose of dimension reduction, is there a scheme that works for almost any g(), without knowledge of its actual form?

  4. EDR space. The general model encompasses many models as special cases:

  5. EDR space. Under this general model, the space B generated by $\beta_1, \beta_2, \ldots, \beta_K$ is called the e.d.r. space. Reducing to this subspace causes no loss of information for predicting Y. As in factor analysis, the subspace B is identifiable, but the individual vectors are not. Any non-zero vector in the e.d.r. space is called an e.d.r. direction.

  6. EDR space. This model takes almost the weakest possible form, reflecting the hope that a low-dimensional projection of a high-dimensional regressor variable contains most of the information that can be gathered from a sample of modest size. It imposes no structure on how the projected regressor variables affect the output variable. Most regression models assume K=1, plus additional structure on g().

  7. EDR space. The philosophical point of sliced inverse regression: estimating the projection directions can be a more important statistical issue than estimating the structure of g() itself. After finding a good EDR space, we can project the data onto this smaller space. We are then in a better position to decide what to pursue further: model building, response surface estimation, cluster analysis, heteroscedasticity analysis, variable selection, and so on.

  8. SIR. Sliced inverse regression. In regular regression, our interest is the conditional density h(Y|X); most important are E(Y|x) and var(Y|x). SIR instead treats Y as the independent variable and X as the dependent variable: given Y=y, what values will X take? This takes us from a p-dimensional problem (subject to the curse of dimensionality) back to one-dimensional curve-fitting problems: $E(x_i \mid y)$, $i = 1, \ldots, p$.

  9. SIR

  10. SIR

  11. SIR. Compute the covariance matrix of the slice means of x, weighted by the slice sizes, and the sample covariance of the x_i's. Find the SIR directions by conducting the eigenvalue decomposition of the former with respect to the latter.
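
The slicing and eigen-decomposition steps can be sketched in NumPy as follows. This is a minimal illustration, not the presenter's implementation: the function name `sir_directions`, the equal-size slicing on the order of y, and the Cholesky whitening used to solve the generalized eigenproblem are all assumptions made for the example.

```python
import numpy as np

def sir_directions(X, y, n_slices=10, n_dirs=2):
    """Sketch of sliced inverse regression (names and defaults are illustrative).

    Returns the leading eigenvalues and the matching directions (as columns).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    sigma_x = Xc.T @ Xc / n                    # sample covariance of the x_i's

    # Slice the observations on the order of y, with roughly equal sizes.
    slices = np.array_split(np.argsort(y), n_slices)

    # Covariance of the slice means, weighted by the slice proportions.
    sigma_eta = np.zeros((p, p))
    for idx in slices:
        m = Xc[idx].mean(axis=0)
        sigma_eta += (len(idx) / n) * np.outer(m, m)

    # Solve sigma_eta v = lambda * sigma_x v by whitening with the
    # Cholesky factor of sigma_x (assumes sigma_x is positive definite).
    L_inv = np.linalg.inv(np.linalg.cholesky(sigma_x))
    evals, evecs = np.linalg.eigh(L_inv @ sigma_eta @ L_inv.T)
    order = np.argsort(evals)[::-1][:n_dirs]
    directions = L_inv.T @ evecs[:, order]     # map back to the x scale
    return evals[order], directions
```

Projecting the centered data onto the returned columns gives the reduced regressors. The number of slices is a tuning choice in this sketch, and the whitening step requires more observations than variables.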

  12. SIR. An example response surface found by SIR.

  13. SIR and LDA. Reminder: Fisher's linear discriminant analysis seeks a projection direction that maximizes class separation. When the underlying distributions are Gaussian, it agrees with the Bayes decision rule. It seeks to maximize the ratio of the between-group variance to the within-group variance of the projected data.

  14. SIR and LDA. The solution is the first eigenvector in this eigenvalue decomposition: If we let , the LDA agrees with SIR up to a scaling.
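
A small numerical check of the LDA/SIR connection, under one assumed reading: the SIR slices are taken to be the classes, so the weighted covariance of slice means equals the between-group covariance B. Because total covariance is T = B + W, the problems B v = λ W v (Fisher) and B v = μ T v (SIR) share eigenvectors, with μ = λ / (1 + λ). The helper names `between_within` and `top_generalized_eigvec` are hypothetical.

```python
import numpy as np

def between_within(X, labels):
    """Between-group (B) and within-group (W) covariance, both divided by n."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    n, p = X.shape
    grand_mean = X.mean(axis=0)
    B = np.zeros((p, p))
    W = np.zeros((p, p))
    for g in np.unique(labels):
        Xg = X[labels == g]
        d = Xg.mean(axis=0) - grand_mean
        B += len(Xg) * np.outer(d, d)          # weighted by group size
        R = Xg - Xg.mean(axis=0)
        W += R.T @ R
    return B / n, W / n

def top_generalized_eigvec(A, S):
    """Unit-length leading v solving A v = lambda * S v (S positive definite)."""
    evals, evecs = np.linalg.eig(np.linalg.solve(S, A))
    v = evecs[:, np.argmax(evals.real)].real
    return v / np.linalg.norm(v)

# Illustrative data: three Gaussian classes in four dimensions.
rng = np.random.default_rng(1)
means = np.array([[0, 0, 0, 0], [3, 1, 0, 0], [1, 4, 0, 0]], dtype=float)
labels = np.repeat(np.arange(3), 100)
X = means[labels] + rng.standard_normal((300, 4))

B, W = between_within(X, labels)
v_lda = top_generalized_eigvec(B, W)           # Fisher: B relative to W
v_sir = top_generalized_eigvec(B, B + W)       # SIR with classes as slices
cosine = abs(v_lda @ v_sir)                    # close to 1: same direction
```

The two leading directions coincide up to sign and scale, which is the sense in which the two methods agree in this setting.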

  15. PLS. • Finding latent factors in X that can predict Y. • X is multi-dimensional; Y can be either a random variable or a random vector. • The model will look like: • where $T_j$ is a linear combination of X. • PLS is well suited to the p >> N situation.

  16. PLS. Data: Goal:

  17. PLS. Solution: $a_{k+1}$ is the (k+1)th eigenvector of Alternatively, the PLS components minimize Can be solved by iterative regression.
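
One common way to carry out the iterative-regression route is the NIPALS-style PLS1 algorithm for a single response. The sketch below uses that standard algorithm as a stand-in. It is not necessarily the formulation on the slide, and the name `pls1` and its return values are assumptions for illustration.

```python
import numpy as np

def pls1(X, y, n_components):
    """NIPALS-style PLS1 sketch for a univariate response.

    Returns regression coefficients for the centered data and the weight
    vectors (as columns) that define each latent component.
    """
    Xk = np.asarray(X, dtype=float)
    Xk = Xk - Xk.mean(axis=0)
    yk = np.asarray(y, dtype=float)
    yk = yk - yk.mean()
    Ws, Ps, qs = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk                  # direction of maximal covariance with y
        norm = np.linalg.norm(w)
        if norm < 1e-12:               # nothing left in X that covaries with y
            break
        w = w / norm
        t = Xk @ w                     # latent score: a linear combination of X
        tt = t @ t
        p = Xk.T @ t / tt              # regress the columns of X on t
        q = yk @ t / tt                # regress y on t
        Xk = Xk - np.outer(t, p)       # deflate X
        yk = yk - q * t                # deflate y
        Ws.append(w)
        Ps.append(p)
        qs.append(q)
    W = np.column_stack(Ws)
    P = np.column_stack(Ps)
    coef = W @ np.linalg.solve(P.T @ W, np.array(qs))
    return coef, W
```

With as many components as predictors (and a full-rank X) this reproduces ordinary least squares; the interest for p >> N lies in stopping after a few components.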

  18. PLS. Example: PLS vs. PCA in regression: Y is related to X1

  19. Network component analysis. Besides dimension reduction and hidden factor models, there is another way to understand a model like this: it can be understood as explaining the data by a bipartite network, with a control layer and an output layer. Unlike PCA and ICA, NCA does not assume a fully linked loading matrix. Rather, the matrix is sparse, and the non-zero locations are predetermined by prior knowledge about regulatory networks. For example,

  20. Network component analysis. Motivation: instead of blindly searching for a lower-dimensional space, a priori information is incorporated into the loading matrix.

  21. NCA. $X_{N \times P} = A_{N \times K} P_{K \times P} + E_{N \times P}$. Conditions for the solution to be unique: (1) A has full column rank; (2) when a column of A is removed, together with all rows corresponding to non-zero values in that column, the remaining matrix still has full column rank; (3) P has full row rank.

  22. NCA. Fig. 2. A completely identifiable network (a) and an unidentifiable network (b). Although the two initial [A] matrices describing the network matrices have an identical number of constraints (zero entries), the network in b does not satisfy the identifiability conditions because of the connectivity pattern of R3. The edges in red are the differences between the two networks.

  23. NCA. Notice that both A and P are to be estimated, so the identifiability criteria are in fact untestable. To compute NCA, minimize the squared loss function $\min_{A,P} \|X - AP\|_F^2$ subject to $A \in Z_0$. Z0 is the topology constraint matrix, i.e. which positions of A are non-zero. It is based on prior knowledge. It is the network connectivity matrix.

  24. NCA. Solving NCA: this is a linear decomposition system with the bi-convex property. It is solved by alternately solving for A and P while holding the other fixed. Both steps use least squares. Convergence is judged by the total least-squares error, which is non-increasing in each step. Optimality is guaranteed if the three conditions for identifiability are satisfied. Otherwise a local optimum may be found.
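
The alternating scheme can be sketched as below for the model X = AP + E with a fixed sparsity pattern for A. This is an illustrative sketch only: the function name `nca_als`, the random initialization, and the fixed iteration count are assumptions, and no normalization of A or P is applied.

```python
import numpy as np

def nca_als(X, Z0, n_iter=50, seed=0):
    """Alternating least squares for X ~ A @ P with a fixed sparsity pattern.

    Z0 is a boolean-like (N x K) array marking which entries of A may be
    non-zero. Returns A, P and the squared error after each iteration.
    """
    X = np.asarray(X, dtype=float)
    mask = np.asarray(Z0, dtype=bool)
    n, k = mask.shape
    rng = np.random.default_rng(seed)
    A = np.where(mask, rng.standard_normal((n, k)), 0.0)
    errors = []
    for _ in range(n_iter):
        # Step 1: fix A, solve for P by ordinary least squares.
        P = np.linalg.lstsq(A, X, rcond=None)[0]
        # Step 2: fix P, update only the allowed entries of each row of A.
        for i in range(n):
            idx = np.flatnonzero(mask[i])
            if idx.size:
                A[i, idx] = np.linalg.lstsq(P[idx].T, X[i], rcond=None)[0]
        errors.append(float(np.sum((X - A @ P) ** 2)))
    return A, P, errors
```

Each step is an exact least-squares minimization over one block of unknowns, which is why the recorded error never increases. A and P are only determined up to a rescaling of each component unless a normalization is added.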
