SVD and LS
M.A. Miceli, University of Rome I
Stats in the Château, Jouy-en-Josas, August 31 - September 4 2009
Motivations
• Problems of high dimensionality in estimation:
  • rank < actual dimension of the data sets (inverse problems);
  • thresholds for accepting variables ease on every dimension as the number of variables/dimensions increases (e.g. the Wald test).
• How the SVD helps extract robust correlations between dependent and independent variables: an automatic choice of “model”.
• Why
• Some evidence from predicting US CPI indexes
• Some issues about normalizations
Motivations
Given a simultaneous linear system of equations relating Y to X:
• collapse the dimensionality of the system to its minimum rank, min[rank(Y), rank(X)];
• advantages of SVD w.r.t. principal components:
  • PC requires a square matrix, e.g. the autocorrelation matrix, and ranks the dimensions within that single matrix;
  • SVD ranks the correlations between the X and Y dimensions;
• discretionary possibility of discarding some dimensions believed negligible: we want to discard those dimensions that could be generated by a totally random system of the same dimensions (Marchenko-Pastur conditions adapted to a rectangular matrix).
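Definition of the SVD of a matrix product
• SVD definition: any real matrix $A$ can be written as $A = U S V'$, with $U'U = I$, $V'V = I$ and $S$ diagonal with non-negative entries (the singular values).
• Given two matrices X (T × M) and Y (T × N), one can write $X'Y = U_{xy} S_{xy} V_{xy}'$ and therefore $U_{xy}' X'Y V_{xy} = S_{xy}$.
• What if T << max(M, N)? No problem: the SVD still exists; X'Y simply has rank at most min(T, M, N), hence at most that many non-zero singular values.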
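A quick numerical check of these identities, as a minimal sketch on simulated data. The sizes T = 5, M = 77, N = 9 are assumptions chosen so that T << max(M, N); the variable names simply mirror the slide notation.

```python
import numpy as np

rng = np.random.default_rng(0)
T, M, N = 5, 77, 9          # assumed sizes: far fewer observations than variables
X = rng.standard_normal((T, M))
Y = rng.standard_normal((T, N))

C = X.T @ Y                                          # M x N product X'Y
U_xy, s_xy, Vt_xy = np.linalg.svd(C, full_matrices=False)

# X'Y = U_xy S_xy V_xy'
recon_ok = np.allclose(U_xy @ np.diag(s_xy) @ Vt_xy, C)
# U_xy' X'Y V_xy = S_xy: the singular vectors diagonalize the product
diag_ok = np.allclose(U_xy.T @ C @ Vt_xy.T, np.diag(s_xy))
# With T < min(M, N), X'Y has rank at most T: only T singular values are non-zero
rank = np.linalg.matrix_rank(C)

print(recon_ok, diag_ok, rank)
```

The decomposition itself is unaffected by T << max(M, N); what changes is that the trailing singular values are numerically zero.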
Diagonalizing the LS estimator
• Consider regressing every column y of Y on the set of explanatory variables X: we write $Y = XB + E$, with LS estimator $\hat B = (X'X)^{-1} X'Y$.
• We diagonalize both matrices, X'X and X'Y:
  • X'X (square, symmetric): $X'X = U_{xx} D_{xx} U_{xx}'$;
  • X'Y (rectangular): $X'Y = U_{xy} S_{xy} V_{xy}'$.
• NB. For a symmetric positive semi-definite matrix such as X'X, the SVD is the same as the diagonalisation. We will write $\hat B = U_{xx} D_{xx}^{-1} U_{xx}' U_{xy} S_{xy} V_{xy}'$.
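A minimal sketch checking that the diagonalized form reproduces ordinary least squares, on simulated data (sizes and coefficients are assumptions for illustration only). `np.linalg.eigh` is used for the symmetric X'X and `np.linalg.svd` for the rectangular X'Y.

```python
import numpy as np

rng = np.random.default_rng(1)
T, M, N = 100, 10, 3                     # assumed sizes
X = rng.standard_normal((T, M))
Y = X @ rng.standard_normal((M, N)) + 0.5 * rng.standard_normal((T, N))

# Plain least squares: one regression per column of Y
B_ls = np.linalg.solve(X.T @ X, X.T @ Y)

# X'X is symmetric PSD: its eigendecomposition U_xx D_xx U_xx' is also its SVD
d_xx, U_xx = np.linalg.eigh(X.T @ X)
# X'Y is rectangular: SVD U_xy S_xy V_xy'
U_xy, s_xy, Vt_xy = np.linalg.svd(X.T @ Y, full_matrices=False)

# B = U_xx D_xx^{-1} U_xx' U_xy S_xy V_xy'
B_diag = U_xx @ np.diag(1.0 / d_xx) @ U_xx.T @ U_xy @ np.diag(s_xy) @ Vt_xy

print(np.allclose(B_ls, B_diag))
```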
SVD of the covariance matrix
[Figure: SVD of $X'Y$ into $U_{xy}$, $S_{xy}$, $V_{xy}$]
SVD mapping from column basis to row basis
[Figure: $X'Y$ mapping via $V_{xy}$, $S_{xy}$, $U_{xy}$]
SVD: splitting the product X'Y
[Figure: X combined through $U_{xy}$ and Y through $V_{xy}$ into linear combinations of X and of Y, linked by $S_{xy}$]
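The splitting can be checked directly: the linear combinations X U_xy and Y V_xy have a diagonal cross-product equal to S_xy. A minimal sketch on simulated data (the data-generating process is an assumption for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
T, M, N = 200, 6, 4
X = rng.standard_normal((T, M))
Y = 0.3 * X[:, :N] + rng.standard_normal((T, N))   # assumed data with some X-Y correlation

U_xy, s_xy, Vt_xy = np.linalg.svd(X.T @ Y, full_matrices=False)

X_comb = X @ U_xy          # linear combinations of the columns of X
Y_comb = Y @ Vt_xy.T       # linear combinations of the columns of Y

# The k-th X combination co-moves only with the k-th Y combination,
# with cross-product s_xy[k]; all other cross-products vanish
cross = X_comb.T @ Y_comb
print(np.allclose(cross, np.diag(s_xy)))
```

Note that only the cross-products are diagonal; the X combinations are not, in general, uncorrelated with each other in sample.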
Adding diagonalisation of both X and Y matrices
Returning to the original variables
Replacing the old “B”: any advantage?
[Figure: Y linked to X through $U_{xx}$, $D_{xx}^{-1}$, $U_{xy}$, $S_{xy}$, $V_{xy}'$ and $V_{yy}'$]
We may cancel factors: is there any criterion?
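A minimal sketch of the round trip, under stated assumptions: factors are taken as $F_x = X U_{xx}$ and $F_y = Y U_{yy}$ (for the symmetric Y'Y, $U_{yy} = V_{yy}$), and “cancelling factors” is read as keeping only the k largest singular values of $F_x'F_y$. `reduced_rank_B` is a hypothetical helper, not the author's code. Keeping every singular value must give back plain LS.

```python
import numpy as np

def reduced_rank_B(X, Y, k):
    """Hypothetical helper: LS coefficients rebuilt from the SVD of the factor
    cross-product, keeping only the k largest singular values."""
    d_xx, U_xx = np.linalg.eigh(X.T @ X)   # X'X = U_xx D_xx U_xx'
    _, U_yy = np.linalg.eigh(Y.T @ Y)      # Y'Y = U_yy D_yy U_yy'
    Fx, Fy = X @ U_xx, Y @ U_yy            # non-normalized factors
    U, s, Vt = np.linalg.svd(Fx.T @ Fy, full_matrices=False)
    # Factor-space coefficients D_xx^{-1} U S V', truncated to k terms
    B_f = np.diag(1.0 / d_xx) @ U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
    # Rotate back to the original variables: Y_hat = X U_xx B_f U_yy'
    return U_xx @ B_f @ U_yy.T

rng = np.random.default_rng(3)
T, M, N = 150, 8, 4                        # assumed sizes
X = rng.standard_normal((T, M))
Y = X @ rng.standard_normal((M, N)) + rng.standard_normal((T, N))

B_ols = np.linalg.lstsq(X, Y, rcond=None)[0]
B_full = reduced_rank_B(X, Y, k=N)   # keep every singular value
B_two = reduced_rank_B(X, Y, k=2)    # keep two, as in the "2 svd factors" runs

print(np.allclose(B_full, B_ols), np.linalg.matrix_rank(B_two))
```

The design choice is where to truncate: cancelling in factor space and rotating back yields a reduced-rank coefficient matrix in the original variables.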
RMT
• The Marchenko-Pastur conditions give the singular value density and its interval limits for square matrices; Bouchaud, Miceli et al. (2005) derive them for rectangular matrices.
• We run exactly the same experiment with purely randomly generated matrices many times: the limits and densities reproduce the theory.
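A minimal Monte Carlo sketch of that experiment, assuming independent standard Gaussian X and Y, standardized column by column, and singular values of $X'Y/T$ (the exact normalization used in the talk is not shown here). `random_singular_values` is a hypothetical helper, and the 95% quantile cut-off is an illustrative choice.

```python
import numpy as np

def random_singular_values(T, M, N, n_sim=200, seed=0):
    """Singular values of X'Y/T for independent standardized Gaussian X and Y,
    repeated n_sim times (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    out = np.empty((n_sim, min(M, N)))
    for i in range(n_sim):
        X = rng.standard_normal((T, M))
        Y = rng.standard_normal((T, N))
        X = (X - X.mean(0)) / X.std(0)          # standardize columns
        Y = (Y - Y.mean(0)) / Y.std(0)
        out[i] = np.linalg.svd(X.T @ Y / T, compute_uv=False)  # sorted, largest first
    return out

null = random_singular_values(T=100, M=77, N=9)
upper = np.quantile(null[:, 0], 0.95)   # 95% quantile of the largest random singular value
print(round(float(upper), 3))
```

A singular value from the real data that lies above `upper` is a candidate to keep; those inside the random band are candidates for cancellation.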
Marchenko-Pastur limits and density
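For reference, the square (covariance-matrix) case has a closed form: for a T × N matrix of i.i.d. noise with variance $\sigma^2$ and $q = N/T \le 1$, the eigenvalues of the sample covariance lie in $[\sigma^2(1-\sqrt q)^2, \sigma^2(1+\sqrt q)^2]$ with density $\sqrt{(\lambda_+-\lambda)(\lambda-\lambda_-)} / (2\pi\sigma^2 q \lambda)$. A minimal sketch of those formulas (the rectangular case of Bouchaud, Miceli et al. is not implemented here); the choice q = 9/100, i.e. n = N/W of the CPI example, is only illustrative.

```python
import numpy as np

def mp_limits(q, sigma2=1.0):
    """Support [lam_minus, lam_plus] of the Marchenko-Pastur law, q = N/T <= 1."""
    root = np.sqrt(q)
    return sigma2 * (1 - root) ** 2, sigma2 * (1 + root) ** 2

def mp_density(lam, q, sigma2=1.0):
    """Marchenko-Pastur eigenvalue density, zero outside the support."""
    lo, hi = mp_limits(q, sigma2)
    lam = np.asarray(lam, dtype=float)
    dens = np.zeros_like(lam)
    inside = (lam > lo) & (lam < hi)
    li = lam[inside]
    dens[inside] = np.sqrt((hi - li) * (li - lo)) / (2 * np.pi * sigma2 * q * li)
    return dens

q = 9 / 100
lo, hi = mp_limits(q)
grid = np.linspace(lo, hi, 20001)
dens = mp_density(grid, q)
mass = np.sum((dens[1:] + dens[:-1]) / 2 * np.diff(grid))   # trapezoid rule, should be ~1
print(round(lo, 2), round(hi, 2), round(mass, 3))
```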
RMT
• Density and limits do change depending on whether we use raw or already diagonalized data.
• Is this “double diagonalization” worthwhile?
• Singular values are HD0 (homogeneous of degree zero) under standardization; eigenvectors are NOT.
Diagonalized “LS estimator”
Very disturbing: we may approach the same problem in different ways:
1. raw data
2. normalized factors
3. non-normalized factors
“Unfortunately”, 3. works best. Why? Is it because factor normalization changes the ranking of the SVD singular values, and this eventually affects the factor selection? NO! Answer at the end.
Example: Forecasting US CPI Indexes
Time series are month-on-month (mom) % changes:
• Y := 9 CPI indexes, Aug 1983 - Apr 2007;
• X := 77 macroeconomic series, Nov 1983 - Apr 2007, including 3 lags of the Ys.
T = 282 (the months from Nov 1983 to Apr 2007; the Ys start three months earlier to supply the lags), N = 9, M = 77, rolling window W = 100 or other widths; n = N/W, m = M/W (n = 0.09, m = 0.77 for W = 100).
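A minimal sketch of the sample alignment, with placeholder numbers instead of CPI data: 285 monthly observations of Y (Aug 1983 - Apr 2007) yield 282 rows once 3 lags are stacked. `lagged_block` is a hypothetical helper; how the talk actually assembled X is not shown.

```python
import numpy as np

def lagged_block(y, n_lags):
    """Columns [y(t-1), ..., y(t-n_lags)] for t = n_lags, ..., len(y)-1 (hypothetical helper)."""
    T_full = y.shape[0]
    return np.hstack([y[n_lags - lag:T_full - lag] for lag in range(1, n_lags + 1)])

T_full, N, M, W = 285, 9, 77, 100     # Y monthly from Aug 1983 to Apr 2007: 285 months
y = np.arange(T_full * N, dtype=float).reshape(T_full, N)   # placeholder values, not CPI data
lags = lagged_block(y, n_lags=3)      # aligned with y[3:], i.e. Nov 1983 onwards
T = lags.shape[0]                     # usable months
n, m = N / W, M / W                   # dimension ratios of the rolling window
print(T, lags.shape[1], n, m)
```

The lagged block has 3 × 9 = 27 columns, which sit inside the M = 77 explanatory series.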
CPIs
Xs
Estimation by Model III
Singular values: Model I – Randomly generated DATA
Singular values for SVD on raw and random DATA
Estimation by Model II
Factors are divided by their own eigenvalue.
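A minimal sketch of the inputs fed to the SVD under the three approaches, following the slide labels (Model I = raw data, Model II = factors divided by their own eigenvalue, Model III = non-normalized factors). It assumes factors are the full orthogonal rotations $F_x = X U_{xx}$, $F_y = Y U_{yy}$; `model_inputs` is a hypothetical helper.

```python
import numpy as np

def model_inputs(X, Y, model):
    """Hypothetical helper returning the two matrices whose cross-product is
    decomposed by the SVD: "I" raw data, "II" factors divided by their own
    eigenvalue, "III" non-normalized factors."""
    if model == "I":
        return X, Y
    d_xx, U_xx = np.linalg.eigh(X.T @ X)     # X'X = U_xx D_xx U_xx'
    d_yy, U_yy = np.linalg.eigh(Y.T @ Y)     # Y'Y = U_yy D_yy U_yy'
    Fx, Fy = X @ U_xx, Y @ U_yy              # factors (principal-component scores)
    if model == "III":
        return Fx, Fy
    if model == "II":
        return Fx / d_xx, Fy / d_yy          # column j divided by eigenvalue j (broadcasting)
    raise ValueError(f"unknown model {model!r}")

rng = np.random.default_rng(4)
X = rng.standard_normal((100, 12))
Y = 0.2 * X[:, :5] + rng.standard_normal((100, 5))

sv = {}
for model in ("I", "II", "III"):
    A, B = model_inputs(X, Y, model)
    sv[model] = np.linalg.svd(A.T @ B, compute_uv=False)

# Orthogonal rotations leave singular values unchanged: I and III coincide
print(np.allclose(sv["I"], sv["III"]), np.allclose(sv["I"], sv["II"]))
```

Under these assumptions, Models I and III share their singular values exactly, because $F_x'F_y = U_{xx}' X'Y U_{yy}$ is an orthogonal rotation of X'Y; only the division in Model II changes them. The sketch follows the slide literally in dividing by the eigenvalue; dividing by its square root would instead give unit-norm factors.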
Singular values: Model II – Data, NORMALIZED FACTORS
lambda max = 0.934, lambda min = 0.608
Singular values: Model II – Randomly generated NORMALIZED FACTORS
lambda max = 0.934, lambda min = 0.608
Randomly generated singular values don't look very different…
Singular values for SVD on raw and random FACTORS
Let’s look at the estimates from Model III
P&L Model III - Factors on raw data
P&L Model III - CPI indexes (model of non-normalized factors), in sample
[Panels: with ALL SVD factors; with 2 SVD factors]
Let’s look at the estimates from Model II (normalized factors)
P&L Model II (Normalized factors) - Factors
P&L Model II (Normalized factors) – CPIs
Example of CPI_comdty estimation
[Panels: non-normalized factors; normalized factors]
OUT OF SAMPLE
• Estimation on t = 1, …, 120
• Forecast with fixed coefficients for t = 121, …, 282
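A minimal sketch of this protocol on simulated data: coefficients are estimated once on the first 120 observations with the diagonalized LS estimator, then held fixed to forecast t = 121, …, 282. The optional truncation to k singular values of X'Y is a hypothetical rule added for illustration, and `out_of_sample` is a hypothetical helper; sizes other than T = 282 are assumptions.

```python
import numpy as np

def out_of_sample(X, Y, n_est=120, k=None):
    """Fit on rows 0..n_est-1 (t = 1..n_est) and forecast the remaining rows with
    the coefficients held fixed. If k is given, keep only the k largest singular
    values of X'Y (hypothetical truncation rule)."""
    Xe, Ye = X[:n_est], Y[:n_est]
    d_xx, U_xx = np.linalg.eigh(Xe.T @ Xe)            # X'X = U_xx D_xx U_xx'
    U_xy, s_xy, Vt_xy = np.linalg.svd(Xe.T @ Ye, full_matrices=False)
    r = len(s_xy) if k is None else k
    B = (U_xx @ np.diag(1.0 / d_xx) @ U_xx.T
         @ U_xy[:, :r] @ np.diag(s_xy[:r]) @ Vt_xy[:r, :])
    return X[n_est:] @ B                              # forecasts for t = n_est+1, ..., T

rng = np.random.default_rng(5)
T, M, N = 282, 10, 3
X = rng.standard_normal((T, M))
Y = X @ rng.standard_normal((M, N)) + rng.standard_normal((T, N))

fc_all = out_of_sample(X, Y)          # all SVD factors
fc_two = out_of_sample(X, Y, k=2)     # 2 SVD factors only
rmse_all = np.sqrt(np.mean((Y[120:] - fc_all) ** 2, axis=0))
rmse_two = np.sqrt(np.mean((Y[120:] - fc_two) ** 2, axis=0))
print(rmse_all.round(3), rmse_two.round(3))
```

With all singular values kept, the forecasts coincide with those of plain LS estimated on the same window.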
P&L: Factors (Model II)
Forecasts on CPIs
[Panels: all factors; 2 factors only]
Easier to predict:
1. medical care (since it is stable),
2. commodities (oil),
3. transportation.
Forecasts on CPI_comdty
Conclusions 1
Conclusions on the example