
Ordinary least squares regression (OLS)



Presentation Transcript


1. Ordinary least squares regression (OLS)
• Minimize $\sum_{i=1}^{n} (y_i - \mathbf{x}_i'\boldsymbol{\beta})^2$
• Solve the normal equations, which can be written $X'X\boldsymbol{\beta} = X'\mathbf{y}$, where $X$ is a matrix of observed x-variables
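A minimal NumPy sketch of this computation (the data X and y below are invented for illustration):

```python
import numpy as np

# Invented data: 100 observations of 3 x-variables
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=100)

# Solve the normal equations X'X beta = X'y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_hat)
```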

2. Ridge regression
• Minimize $\sum_{i=1}^{n} (y_i - \mathbf{x}_i'\boldsymbol{\beta})^2 + \lambda \sum_{j=1}^{p} \beta_j^2$, where $\lambda$ is a given shrinkage factor
• Solve the equation system, which can be written $(X'X + \lambda I)\boldsymbol{\beta} = X'\mathbf{y}$, where $X$ is a matrix of standardized x-variables
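A sketch of the corresponding ridge computation, assuming the standard penalized criterion above (the function name and standardization step are ours):

```python
import numpy as np

def ridge(X, y, lam):
    """Solve (X'X + lam*I) beta = X'y after standardizing the x-variables."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)   # standardized x-variables
    p = Z.shape[1]
    return np.linalg.solve(Z.T @ Z + lam * np.eye(p), Z.T @ y)
```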

3. Smoothing a time series of observations
• Minimize $\sum_{t=1}^{n} (y_t - \mu_t)^2$ plus $\lambda$ times a roughness penalty on the fitted values $\mu_1, \ldots, \mu_n$, where $\lambda$ is a given smoothing factor
• Solve the resulting equation system, i.e. an equation system with n unknowns and n equations

4. Smoothing a time series of observations
• Solve the equation system $A\boldsymbol{\mu} = \mathbf{y}$

5. Smoothing a time series of observations
• Solve the equation system $A\boldsymbol{\mu} = \mathbf{y}$, where $A$ is a symmetric, positive definite band matrix

6. Gauss elimination
• Consider the equation system $A\mathbf{x} = \mathbf{b}$, where $A$ is a square nonsingular matrix
• Set $A^{(1)} = A$ and $\mathbf{b}^{(1)} = \mathbf{b}$
• Form $a^{(2)}_{ij} = a^{(1)}_{ij} - \frac{a^{(1)}_{i1}}{a^{(1)}_{11}}\, a^{(1)}_{1j}$ for $i, j \ge 2$, and continue to eliminate variables one by one
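A sketch of the elimination scheme in Python; partial pivoting (row interchanges, cf. slide 8) is included to avoid dividing by small pivots:

```python
import numpy as np

def gauss_solve(A, b):
    """Solve Ax = b by Gaussian elimination with partial pivoting."""
    A = np.array(A, dtype=float)
    b = np.array(b, dtype=float)
    n = len(b)
    for k in range(n - 1):
        # pivot: bring the largest entry in column k (below row k) to row k
        piv = k + np.argmax(np.abs(A[k:, k]))
        A[[k, piv]] = A[[piv, k]]
        b[[k, piv]] = b[[piv, k]]
        # eliminate x_k from the rows below
        for i in range(k + 1, n):
            m = A[i, k] / A[k, k]
            A[i, k:] -= m * A[k, k:]
            b[i] -= m * b[k]
    # back substitution on the resulting triangular system
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        x[i] = (b[i] - A[i, i + 1:] @ x[i + 1:]) / A[i, i]
    return x
```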

7. LU-decomposition
• Any nonsingular matrix can (possibly after row interchanges) be decomposed into a product $A = LU$ of an upper triangular matrix $U$ and a lower triangular matrix $L$ with ones on the diagonal
• We can then solve the equation system $A\mathbf{x} = \mathbf{b}$ by first solving $L\mathbf{y} = \mathbf{b}$ and then $U\mathbf{x} = \mathbf{y}$
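In practice one uses a library factorization; a sketch with SciPy's LU routines (the matrix and right-hand side are toy data):

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

A = np.array([[4.0, 3.0], [6.0, 3.0]])
b = np.array([10.0, 12.0])

lu, piv = lu_factor(A)       # one O(p^3) factorization, with row pivoting
x = lu_solve((lu, piv), b)   # each new right-hand side costs only O(p^2)
```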

8. LU-decomposition
• The number of additions/multiplications needed for an LU-decomposition is approximately $p^3$
• The numerical stability of LU-decomposition can be increased by pivoting the rows of the coefficient matrix

9. Cholesky decomposition
• Any positive definite symmetric matrix $A$ can be uniquely decomposed into a product $A = U'U$, where $U$ is an upper triangular matrix with positive diagonal elements
• We can then solve the equation system $A\mathbf{x} = \mathbf{b}$ by first solving $U'\mathbf{y} = \mathbf{b}$ and then $U\mathbf{x} = \mathbf{y}$
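A sketch using SciPy's Cholesky routines (toy data, assuming A is symmetric positive definite):

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

A = np.array([[4.0, 2.0], [2.0, 3.0]])
b = np.array([1.0, 2.0])

c, low = cho_factor(A)       # triangular Cholesky factor of A
x = cho_solve((c, low), b)   # forward then back substitution
```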

10. Fitting a single regression model to data
• The matrix $X'X$ is always symmetric, and it is positive definite provided that rank(X) = p
• Use Cholesky decomposition for fitting a single regression model to data
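Putting the two previous slides together, a sketch of an OLS fit via the Cholesky decomposition of X'X:

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def ols_chol(X, y):
    """OLS via Cholesky decomposition of X'X (requires rank(X) = p)."""
    return cho_solve(cho_factor(X.T @ X), X.T @ y)
```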

11. Stepwise regression and the sweep operator
• Consider the augmented matrix $\begin{pmatrix} X'X & X'\mathbf{y} \\ \mathbf{y}'X & \mathbf{y}'\mathbf{y} \end{pmatrix}$
• Sequentially apply the sweep operator to this matrix
• This yields the least squares estimates and residual sums of squares corresponding to models with just the first k covariates included, k = 1, …, p
• It is easy to update the fit when adding or deleting a covariate (see the sketch below)
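A sketch of the sweep operator (our implementation; the toy X and y are invented). Sweeping the first k pivots of the augmented matrix yields the fit with the first k covariates, and sweeping a pivot a second time reverses it, which is what makes adding or deleting a covariate cheap:

```python
import numpy as np

def sweep(A, k):
    """Sweep operator on pivot k of a symmetric matrix (returns a swept copy)."""
    A = A.copy()
    d = A[k, k]
    A[k, :] /= d
    for i in range(A.shape[0]):
        if i != k:
            c = A[i, k]
            A[i, :] -= c * A[k, :]
            A[i, k] = -c / d
    A[k, k] = 1.0 / d
    return A

# Toy data; sweep in the covariates one at a time
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=50)
p = X.shape[1]

M = np.block([[X.T @ X, (X.T @ y)[:, None]],
              [(X.T @ y)[None, :], np.array([[y @ y]])]])
for k in range(p):
    M = sweep(M, k)                 # after k+1 sweeps: fit with first k+1 covariates

beta_hat, rss = M[:p, p], M[p, p]   # estimates and residual sum of squares
```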

12. Fitting a ridge regression model to data
• The introduction of a shrinkage factor has two effects:
• The numerical stability of the equation system is higher than that of the OLS system
• The variance of the obtained predictor is reduced
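A small demonstration of the first effect, using an invented, nearly collinear design:

```python
import numpy as np

rng = np.random.default_rng(0)
x1 = rng.normal(size=100)
X = np.column_stack([x1, x1 + 1e-4 * rng.normal(size=100)])  # nearly collinear
lam = 0.1

print(np.linalg.cond(X.T @ X))                    # huge: ill-conditioned system
print(np.linalg.cond(X.T @ X + lam * np.eye(2)))  # modest: stabilized by shrinkage
```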

13. Smoothing a time series of observations
• The equation system set up to minimize the smoothing criterion has a coefficient matrix that is a symmetric, positive definite band matrix
• The upper triangular matrix in the Cholesky decomposition is also a band matrix
• The number of operations needed is O(n)
• The smoothing conditions can be tailored to the application
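A sketch of such a smoother, assuming a second-difference roughness penalty (the slide's own penalty formula did not survive extraction). SciPy's sparse solver exploits the band structure; scipy.linalg.solveh_banded (banded Cholesky) would serve equally well:

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import spsolve

def smooth(y, lam):
    """Fit mu minimizing sum (y_t - mu_t)^2 + lam * sum of squared second
    differences of mu (assumed penalty form). The coefficient matrix
    I + lam*D'D is a symmetric, positive definite band matrix, so the
    solve costs O(n)."""
    n = len(y)
    D = sp.diags([1.0, -2.0, 1.0], [0, 1, 2], shape=(n - 2, n))  # 2nd differences
    A = sp.eye(n) + lam * (D.T @ D)                              # pentadiagonal
    return spsolve(A.tocsc(), y)
```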

14. Smoothing of time series of data collected over several seasons
• Sequential smoothing over seasons
• Smoothing over years

15. Smoothing of time series of data representing several sectors
• Circular smoothing over sectors
• Temporal smoothing over years

16. Regression using QR-decomposition of the X matrix
• Assume that the X-matrix has full rank p
• For any n × n orthogonal matrix $Q$, $\|\mathbf{y} - X\boldsymbol{\beta}\| = \|Q'\mathbf{y} - Q'X\boldsymbol{\beta}\|$
• Find a $Q$ so that $Q'X = \begin{pmatrix} R \\ 0 \end{pmatrix}$, where $R$ is an upper triangular $p \times p$ matrix
• The least squares solution is then given by $R\boldsymbol{\beta} = Q_1'\mathbf{y}$, where $Q_1$ contains the first p columns of $Q$
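A sketch of this algorithm with NumPy's thin QR decomposition:

```python
import numpy as np
from scipy.linalg import solve_triangular

def ols_qr(X, y):
    """OLS via the thin QR decomposition X = Q1 R."""
    Q1, R = np.linalg.qr(X)                # reduced mode: Q1 is n x p, R is p x p
    return solve_triangular(R, Q1.T @ y)   # back-substitution for R beta = Q1'y
```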

17. Calculation of sample variances
• Algorithm 1 (two passes through the data): $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$
• Formula 2 (one pass): $s^2 = \frac{1}{n-1}\left(\sum_{i=1}^{n} x_i^2 - n\bar{x}^2\right)$
• Are the two algorithms numerically equivalent?
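The two are algebraically identical, but Formula 2 can suffer catastrophic cancellation. A small experiment with invented data having a large mean and small spread makes the difference visible:

```python
import numpy as np

x = 1e8 + np.random.default_rng(0).normal(size=1000)  # large mean, small spread
n = len(x)

two_pass = np.sum((x - x.mean())**2) / (n - 1)         # Algorithm 1
one_pass = (np.sum(x**2) - n * x.mean()**2) / (n - 1)  # Formula 2

# Formula 2 subtracts two nearly equal huge numbers; the result is dominated
# by rounding error and can even come out negative
print(two_pass, one_pass)
```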

18. Least squares regression with constants
• Reparameterize the regression model to $y_i = \alpha + \boldsymbol{\beta}'(\mathbf{x}_i - \bar{\mathbf{x}}) + \varepsilon_i$
• This often gives a much better conditioned problem
• If the first column of the X-matrix is constant, QR-factorization automatically transforms the problem in this manner
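A small demonstration of the conditioning effect, with an invented covariate lying far from zero:

```python
import numpy as np

x = np.linspace(1000.0, 1001.0, 50)              # covariate far from zero
X_raw = np.column_stack([np.ones(50), x])
X_ctr = np.column_stack([np.ones(50), x - x.mean()])

print(np.linalg.cond(X_raw))   # large: covariate nearly collinear with intercept
print(np.linalg.cond(X_ctr))   # orders of magnitude smaller after centering
```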

19. Singular value decomposition
• The singular value decomposition (SVD) of an n × p matrix $X$ with n ≥ p is of the form $X = UDV'$, where $U$ is an orthonormal n × p matrix ($U'U = I_p$), $D$ is a diagonal matrix with elements $d_1 \ge d_2 \ge \cdots \ge d_p \ge 0$, and $V$ is a p × p orthonormal matrix ($V'V = I_p$)
• Normally, SVD provides stable solutions of linear regression problems
• In addition, the columns of $UD$ and the singular values $d_1, d_2, \ldots, d_p$ have interesting statistical interpretations
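A sketch of least squares via the thin SVD (valid when X has full rank p):

```python
import numpy as np

def ols_svd(X, y):
    """OLS via the thin singular value decomposition X = U D V'."""
    U, d, Vt = np.linalg.svd(X, full_matrices=False)
    return Vt.T @ ((U.T @ y) / d)   # beta = V D^{-1} U'y
```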

20. Statistical interpretation of SVD components
• The principal components of a set of data in $R^p$ provide a sequence of best linear approximations to that data, of all ranks q ≤ p
• The directions of the extracted vectors are given by $v_1, \ldots, v_p$
• The coordinates of the data points in the new coordinate system are given by the columns of $UD$
• In addition:
• The linear combination $Xv_1$ has the highest variance of all linear combinations of the features for which $v_1$ has length 1
• The linear combination $Xv_2$ has the highest variance of all linear combinations of the features for which $v_2$ has length 1 and is orthogonal to $v_1$
• …
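A sketch of these interpretations on invented data (centering is assumed, since principal components are defined for centered data):

```python
import numpy as np

rng = np.random.default_rng(0)
Z = rng.normal(size=(200, 3)) @ rng.normal(size=(3, 3))  # correlated toy data
Xc = Z - Z.mean(axis=0)                                  # center before PCA

U, d, Vt = np.linalg.svd(Xc, full_matrices=False)
directions = Vt      # rows v1', ..., vp': the principal directions
scores = U * d       # columns of UD: coordinates in the new system

# Xc @ Vt[0] has the largest variance among unit-length linear combinations,
# Xc @ Vt[1] the largest among those orthogonal to v1, and so on
print(np.var(Xc @ Vt[0], ddof=1), np.var(Xc @ Vt[1], ddof=1))
```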
