Statistical Arbitrage • Ying Chen, Leonardo Bachega, Yandong Guo, Xing Liu • March 2010
Outline • Overview of the project • Implementation issues • Data adjustment mistakes • Stock classification • Future work
Framework • Data pre-processing (python scripts): raw historical data from WRDS → adjusted stock price series + indices • Market model: PCA eigenportfolios estimated from 252-day returns, or ETFs for industry sectors • Residual process model: regression on 60-day returns, with the residuals treated as increments of an AR process • Back-testing simulations (matlab scripts): compute s-scores from current stock prices → signal trade orders
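The residual-model and s-score steps of the framework can be sketched as follows. This is a minimal illustration, not the project's Matlab code. It assumes that the stock's 60-day returns are regressed on the factor returns with an intercept, that the cumulated residuals X_k follow an AR(1) model X_{k+1} = a + b X_k + ζ, and that the s-score is (X_n − m)/σ_eq with m = a/(1 − b) and σ_eq = sqrt(Var(ζ)/(1 − b²)). The function names and the trading thresholds (1.25 to open, 0.75 / 0.50 to close) are hypothetical parameters chosen for illustration.

```python
import numpy as np

def s_score(stock_ret, factor_ret, dt=1.0 / 252):
    """Hypothetical helper: s-score of one stock over a 60-day window.

    stock_ret:  (n,) daily returns of the stock
    factor_ret: (n, m) daily returns of the m factors (eigenportfolios or ETFs)
    Returns (s, kappa), or None if the fitted AR(1) is not mean-reverting.
    """
    n = stock_ret.shape[0]
    # Regress stock returns on factor returns (with intercept) to get residuals
    A = np.column_stack([np.ones(n), factor_ret])
    coef, *_ = np.linalg.lstsq(A, stock_ret, rcond=None)
    resid = stock_ret - A @ coef
    # Residuals are treated as increments of the auxiliary process X_k
    X = np.cumsum(resid)
    # Fit the AR(1) model X_{k+1} = a + b * X_k + zeta_{k+1} by least squares
    B = np.column_stack([np.ones(n - 1), X[:-1]])
    (a, b), *_ = np.linalg.lstsq(B, X[1:], rcond=None)
    if not 0.0 < b < 1.0:
        return None  # no mean reversion detected: do not trade this stock
    zeta = X[1:] - (a + b * X[:-1])
    kappa = -np.log(b) / dt                       # annualized mean-reversion speed
    m = a / (1.0 - b)                              # equilibrium level of X
    sigma_eq = np.sqrt(zeta.var() / (1.0 - b * b))  # equilibrium std deviation
    return float((X[-1] - m) / sigma_eq), float(kappa)

def signal(s, position, s_open=1.25, s_close_short=0.75, s_close_long=0.50):
    """Map an s-score to a new position (+1 long, -1 short, 0 flat).

    Thresholds are assumed values, not taken from the slides.
    """
    if position == 0:
        if s < -s_open:
            return 1    # residual unusually low: buy to open
        if s > s_open:
            return -1   # residual unusually high: sell short to open
    elif position == 1 and s > -s_close_long:
        return 0        # residual has reverted: close the long position
    elif position == -1 and s < s_close_short:
        return 0        # residual has reverted: close the short position
    return position
```

A stock is opened only when its residual is far from equilibrium and closed once the residual drifts back. Using separate open and close thresholds avoids flipping a position on small fluctuations of s.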
Implementation Issues • Stock delisted tomorrow • Criterion: detected from tomorrow's shares outstanding • In the portfolio: close the position • Not in the portfolio: do not trade it, but still include it in the PCA calculation • Today's price == 0 in the middle of the series • Exclude it from the PCA calculation and from trading • In the portfolio: keep the position
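The per-stock rules on this slide can be written as a small decision function. This is a sketch under two assumptions. First, "delisted tomorrow" is signalled by tomorrow's shares outstanding being missing (None) or zero. Second, a zero price today takes precedence over the delisting check. The function `daily_action` and its return labels are hypothetical.

```python
def daily_action(in_portfolio, price_today, shares_out_tomorrow):
    """Hypothetical helper encoding the slide's rules for one stock on one day.

    Returns (use_in_pca, action) with action in {"trade", "close", "hold", "none"}.
    """
    delists_tomorrow = shares_out_tomorrow is None or shares_out_tomorrow == 0
    if price_today == 0:
        # Price gap in the middle of the series: excluded from PCA and trading,
        # but an existing position is simply kept.
        return False, "hold" if in_portfolio else "none"
    if delists_tomorrow:
        # Close a held position today; a stock we do not hold is not traded,
        # but it still contributes to the PCA estimate.
        return True, "close" if in_portfolio else "none"
    return True, "trade"
```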
Implementation Issues (Cont'd) • Market cap < $1B • Already in the portfolio: keep it and consider it for trading • Not in the portfolio: exclude it from the PCA calculation and from trading • Stocks picked to calculate the eigenportfolios • Today's price != 0 • The previous 252 days have nonzero prices • Market cap > $1B, or already in the portfolio
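The three selection criteria for the eigenportfolio universe can be sketched as a filter. This assumes that "previous 252 days" means the 252 trading days before today, excluding today, and that market cap is given in dollars. The function name `pca_universe` and its arguments are hypothetical.

```python
def pca_universe(prices, market_cap, held, window=252, min_cap=1e9):
    """Hypothetical filter for the stocks used to compute the eigenportfolios.

    prices:     dict ticker -> list of daily prices, last element = today
    market_cap: dict ticker -> market capitalization in dollars
    held:       set of tickers currently in the portfolio
    """
    selected = []
    for ticker, series in prices.items():
        history = series[-(window + 1):-1]  # the `window` days before today
        if series[-1] == 0:
            continue  # today's price is zero
        if len(history) < window or any(p == 0 for p in history):
            continue  # not enough history, or a zero price inside the window
        if market_cap[ticker] > min_cap or ticker in held:
            selected.append(ticker)
    return selected
```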
Data Adjustment Mistakes • Dividend adjustment
Data Adjustment Plan • Dividend adjustment • Split detection and adjustment using the CRSP cumulative adjustment factors CFACPR (price) and CFACSHR (shares)
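The planned adjustments can be sketched using the standard CRSP conventions. Adjusted price = |PRC| / CFACPR, where a negative PRC marks a bid/ask average. Adjusted shares = SHROUT × CFACSHR. A split shows up as a day-to-day change in CFACPR. Because CFACPR does not account for ordinary cash dividends, this sketch assumes that dividend adjustment is done with a total-return index built from CRSP's RET, which includes distributions. The function names are hypothetical.

```python
def adjust_crsp(prc, cfacpr, shrout, cfacshr):
    """Hypothetical helper: split-adjust one stock's daily CRSP series.

    Returns (adjusted prices, adjusted shares, [(day index, split ratio), ...]).
    """
    # abs() because CRSP stores a bid/ask midpoint as a negative PRC
    adj_price = [abs(p) / f for p, f in zip(prc, cfacpr)]
    adj_shares = [s * f for s, f in zip(shrout, cfacshr)]
    # A change in the cumulative price factor marks a split (or similar event);
    # the ratio old/new factor is the split ratio, e.g. 2.0 for a 2-for-1 split.
    splits = [(t, cfacpr[t - 1] / cfacpr[t])
              for t in range(1, len(cfacpr)) if cfacpr[t] != cfacpr[t - 1]]
    return adj_price, adj_shares, splits

def total_return_index(ret, start=1.0):
    """Dividend-inclusive price index compounded from daily CRSP RET values."""
    index, level = [], start
    for r in ret:
        level *= 1.0 + r
        index.append(level)
    return index
```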
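Stock Classification • Using GICS (Global Industry Classification Standard) codes in CRSP • 10 sectors, 24 industry groups, 67 industries and 147 sub-industries • Each code has 8 digits (XXXXXXXX): the first 2 digits give the sector, the first 4 the industry group, the first 6 the industry, and all 8 the sub-industry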
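Because the GICS levels are nested prefixes of the 8-digit code, grouping stocks at any level reduces to string slicing. A minimal sketch follows. The helper names and the example tickers are hypothetical.

```python
from collections import defaultdict

def gics_levels(code):
    """Split an 8-digit GICS code into its four nested levels."""
    code = str(code)
    if len(code) != 8 or not code.isdigit():
        raise ValueError(f"not an 8-digit GICS code: {code!r}")
    return {
        "sector": code[:2],
        "industry_group": code[:4],
        "industry": code[:6],
        "sub_industry": code,
    }

def group_by(ticker_to_gics, level):
    """Group tickers by one GICS level, e.g. level="sector"."""
    groups = defaultdict(list)
    for ticker, code in ticker_to_gics.items():
        groups[gics_levels(code)[level]].append(ticker)
    return dict(groups)
```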
PCA Eigenportfolio Weights Normalization • Basic principle • Find the most important eigenvectors (15 in the paper) and divide each component by the standard deviation of the corresponding stock's returns: $Q_i^{(j)} = v_i^{(j)} / \sigma_i$
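A short sketch of the normalization follows, assuming eigenvectors v^(j) are stored as the rows of V and the eigenportfolio return is F_j(t) = Σ_i Q_i^(j) R_i(t). The function name is hypothetical. The test shows one consequence of dividing by σ_i. Applied to demeaned raw returns, the weights v/σ give exactly the same factor returns as applying v to standardized returns. This is why the division pairs naturally with eigenvectors of the correlation matrix.

```python
import numpy as np

def eigenportfolio_returns(returns, V, divide_by_sigma=True):
    """Hypothetical helper.

    returns: (n_days, p) stock returns
    V:       (m, p) eigenvectors as rows
    Returns (m, n_days) eigenportfolio returns F_j(t) = sum_i Q[j, i] * R_i(t).
    """
    sigma = returns.std(axis=0, ddof=1)          # per-stock sample std deviation
    Q = V / sigma if divide_by_sigma else V      # Q[j, i] = v_i^(j) / sigma_i
    return Q @ returns.T
```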
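PCA Algorithm by the Author • Suppose X is an n×p matrix with n samples (days) and p features (stocks) • Original algorithm: • Standardize each column, $Y_{ki} = (X_{ki} - \bar{X}_i)/\sigma_i$, and form the correlation matrix $\rho = \frac{1}{n-1} Y^{T} Y$ • Calculate the eigendecomposition of the correlation matrix: $\rho = Q \Lambda Q^{T}$ • The matrix Q consists of the eigenvectors of the correlation matrix, and $\Lambda$ is the diagonal matrix of the corresponding eigenvalues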
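The algorithm above can be sketched in NumPy as follows. `np.linalg.eigh` is used because the correlation matrix is symmetric. It returns eigenvalues in ascending order, so they are re-sorted to put the most important eigenvectors first. The function name is hypothetical. The method also works when p > n, for example 1590 stocks over 252 days, although the correlation matrix is then rank-deficient.

```python
import numpy as np

def correlation_pca(X):
    """Hypothetical helper: PCA of the correlation matrix of X.

    X: (n, p) matrix, n samples (days) by p features (stocks).
    Returns eigenvalues in descending order and Q whose columns are the
    matching eigenvectors, so that rho = Q diag(lam) Q^T.
    """
    n = X.shape[0]
    Y = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # standardize each column
    rho = (Y.T @ Y) / (n - 1)                           # p x p correlation matrix
    eigval, Q = np.linalg.eigh(rho)                     # ascending eigenvalues
    order = np.argsort(eigval)[::-1]                    # most important first
    return eigval[order], Q[:, order]
```

Since the trace of a correlation matrix equals p, `lam[:15].sum() / p` gives the fraction of total variance captured by the first 15 eigenvectors.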
PCA Discussion • Question • Should each eigenvector component be divided by sigma, the sample standard deviation of the corresponding stock's returns? • Answer • No (this differs from the paper)
PCA Discussion • The meaning of the "risk factor" F • F should represent the overall performance of the market • F should behave like the "market return" • What can PCA do? • PCA is mathematically defined as an orthogonal linear transformation that maps the data to a new coordinate system such that the greatest variance of any projection of the data lies on the first coordinate (the first principal component), the second greatest variance on the second coordinate, and so on. • PCA is theoretically the optimal linear transform for given data in the least-squares sense (minimum reconstruction error for a given number of components).
PCA Discussion • Derivation • Notation: F = EX • F: m×n matrix of eigenportfolio returns • E: m×p matrix whose rows are the m most important eigenvectors • X: p×n matrix of stock returns • m: number of eigenportfolios (15 in the paper) • n: number of days (samples) • p: number of stocks
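With this notation, the factor returns are a single matrix product. The sketch below builds E from the top-m eigenvectors of the sample covariance of X (rows = stocks, columns = days). If X holds standardized returns, this covariance is the correlation matrix. The function name is hypothetical. The rows of E are orthonormal, and the resulting factors are mutually uncorrelated, with variances equal to the top eigenvalues in decreasing order.

```python
import numpy as np

def factor_returns(X, m=15):
    """Hypothetical helper computing F = E X.

    X: (p, n) stock returns, rows = stocks, columns = days.
    Returns E (m, p), the top-m covariance eigenvectors as rows, and F (m, n).
    """
    Xc = X - X.mean(axis=1, keepdims=True)          # remove each stock's mean
    cov = (Xc @ Xc.T) / (X.shape[1] - 1)            # p x p sample covariance
    eigval, eigvec = np.linalg.eigh(cov)             # ascending eigenvalues
    E = eigvec[:, np.argsort(eigval)[::-1][:m]].T   # top-m eigenvectors as rows
    return E, E @ X
```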
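PCA Discussion • Derivation • The i-th row of the eigenportfolio is $F_i = e_i^{T} X$, where $e_i^{T}$ is the i-th row of E • Its variance, $\mathrm{Var}(F_i) = e_i^{T} \Sigma e_i$ with $\Sigma$ the sample covariance of the rows of X, should be maximized under the constraint $e_i^{T} e_i = 1$ • Setting the gradient of the Lagrangian $e_i^{T} \Sigma e_i - \lambda_i (e_i^{T} e_i - 1)$ to zero gives $\Sigma e_i = \lambda_i e_i$, so $\mathrm{Var}(F_i) = \lambda_i$; for this to be maximized, $e_i$ must be the eigenvector with the i-th largest eigenvalue • That is to say, the weighting factors should be the eigenvectors themselves rather than the eigenvectors divided by the standard deviation. (In our experiments, the results are the same without the division.)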
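The derivation can be checked numerically. Under its premise, E comes from the covariance of X and each row has unit length. Then no unit-length weight vector gives a portfolio e^T X with more variance than the top eigenvector, and that variance equals the largest eigenvalue. The sketch uses synthetic returns with one common factor and unequal per-stock noise. The data and the helper `proj_var` are illustrative assumptions, not the project's experiment. The σ-scaled weights, renormalized to unit length, give a smaller variance.

```python
import numpy as np

rng = np.random.default_rng(0)
p, n = 6, 500
# Synthetic returns: one common factor plus noise of unequal size per stock
common = rng.normal(size=n)
X = (np.outer(rng.uniform(0.5, 1.5, p), common)
     + rng.normal(size=(p, n)) * rng.uniform(0.2, 2.0, (p, 1)))
cov = np.cov(X)                       # p x p, rows of X are the variables
eigval, eigvec = np.linalg.eigh(cov)
e1 = eigvec[:, -1]                    # eigenvector with the largest eigenvalue

def proj_var(w):
    """Variance of the portfolio w^T X after rescaling w to unit length."""
    w = w / np.linalg.norm(w)
    return w @ cov @ w

sigma = np.sqrt(np.diag(cov))
print(proj_var(e1), eigval[-1])       # should match: the maximum is lambda_max
print(proj_var(e1 / sigma))           # sigma-scaled weights: no larger than lambda_max
```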
Experiment Result • [Figure: top 50 eigenvalues of the correlation matrix of market returns, computed on May 1, 2007, estimated using a 1-year window and a universe of 1590 stocks]
Future Work • Data adjustment • Experiments with ETFs • Compare the ETF approach with PCA • Take into account: • Transaction fees, interest, dividends • Volume