Imre Kondor, Collegium Budapest and Eötvös University, Budapest. COST, Palermo, September 21-23, 2007. Divergent Estimation Error in Portfolio Optimization and in Linear Regression
Coworkers • Szilárd Pafka (Paycom.net, California) • Gábor Nagy (CIB Bank, Budapest) • Nándor Gulyás (Collegium Budapest) • István Varga-Haszonits (Morgan Stanley Fixed Income, Budapest) • Andrea Ciliberti (Science et Finance, Paris) • Marc Mézard (Orsay University) • Stefan Thurner (Vienna University)
Summary When sufficient data are lacking, - portfolio selection is highly unstable: the estimation error diverges at a critical value of the ratio of the portfolio size N and the length of the time series T, and the weights are unstable even when the overall estimation error is tamed by filtering or constraints, - this divergence is an algorithmic phase transition characterized by universal scaling laws, - multivariate linear regression is equivalent to quadratic optimization, so concepts, methods, and results can be carried over to the regression problem, - when applied to complex phenomena, the classical problems with regression (hidden variables, correlations, non-Gaussian noise) are compounded by the large number of explanatory variables and the scarcity of data, - so modeling is often attempted in the vicinity of, or even below, the critical point.
Consider the simplest portfolio optimization problem: minimize the portfolio variance $\sigma_P^2 = \sum_{i,j} w_i \sigma_{ij} w_j$ subject to the budget constraint $\sum_i w_i = 1$.
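A minimal Python sketch of this problem (my own illustration, not from the slides), assuming the covariance matrix is known; with only the budget constraint the optimum has the closed form w* = Σ⁻¹1 / (1ᵀΣ⁻¹1):

```python
import numpy as np

def min_variance_weights(sigma):
    """Global minimum-variance weights under the budget constraint sum(w) = 1.

    With only the budget constraint, the Lagrangian solution is
    w* = sigma^{-1} 1 / (1' sigma^{-1} 1).
    """
    ones = np.ones(sigma.shape[0])
    x = np.linalg.solve(sigma, ones)   # sigma^{-1} 1 (requires sigma to be non-singular)
    return x / (ones @ x)              # normalize so that the weights sum to one
```

Evaluated on the true covariance of iid variables this simply returns the equal-weight portfolio; the interesting question below is what happens when `sigma` is replaced by a noisy estimate.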
Remarks • Fat tails may make it necessary to consider other risk measures • The return constraint has been dropped for simplicity – index tracking leads to precisely this form • Unlimited short selling is allowed – linear constraints will be introduced later • The problem considered here is regarded as a representative of a larger class of quadratic optimization problems
How do we know the returns and the covariances? • In principle, from observations on the market • If the portfolio contains N assets, we need O(N²) data • The input data come from T observations for N assets, and T is always limited • The estimation error is negligible as long as NT >> N², i.e. N << T • This condition is often violated in practice (there is no natural cutoff in company sizes or capitalizations, so N is large)
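Written out, the counting behind this remark (a back-of-the-envelope estimate, not a formula from the slides) is:

```latex
\underbrace{\tfrac{N(N+1)}{2}}_{\text{covariances to estimate}} = O(N^2),
\qquad
\underbrace{N\,T}_{\text{data points}}
\quad\Longrightarrow\quad
\text{reliable estimates require } N T \gg N^2,\ \text{i.e. } T \gg N.
```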
Information deficit • Thus the Markowitz problem suffers from the "curse of dimensionality", or from an information deficit • The estimates will contain error and the resulting portfolios will be suboptimal
Fighting the curse of dimensionality • Economists have been struggling with this problem for ages. Since the root of the problem is the lack of sufficient information, the remedy is to inject external information into the estimate. This means imposing some structure on σ. This introduces bias, but the beneficial effect of noise reduction may compensate for it. • Examples: single-factor models (β's), multi-factor models, grouping by sectors, principal component analysis, Bayesian shrinkage estimators, random matrix theory, etc. • All these help to various degrees. Most studies are based on empirical data. (A sketch of one RMT-based filter follows below.)
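As an illustration of the last item in the list, here is a hedged sketch of one common random-matrix-theory filter, eigenvalue clipping against the Marchenko–Pastur edge; the function name and the details of the recipe are my own choices, and many variants exist in the literature:

```python
import numpy as np

def clip_eigenvalues(corr, T):
    """Replace eigenvalues below the Marchenko-Pastur upper edge by their mean.

    corr : sample correlation matrix (N x N), T : length of the time series.
    This is one simple variant of RMT filtering; many refinements exist.
    """
    N = corr.shape[0]
    lam_max = (1.0 + np.sqrt(N / T)) ** 2          # MP upper edge for iid data
    vals, vecs = np.linalg.eigh(corr)
    noise = vals < lam_max                         # "bulk" (noise) eigenvalues
    if noise.any():
        vals[noise] = vals[noise].mean()           # keep the trace unchanged
    filtered = vecs @ np.diag(vals) @ vecs.T
    d = np.sqrt(np.diag(filtered))
    return filtered / np.outer(d, d)               # restore the unit diagonal
```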
Our approach: • Analytical: applying the methods of statistical physics (random matrix theory, phase transition theory, replicas, etc.) • Numerical: to test the noise sensitivity of various risk measures we use simulated data, so as to have full control over the underlying stochastic process. For simplicity, we mostly use iid normal variables in the following.
For such simple underlying processes the exact risk measure can be calculated. • To construct the empirical risk measure we generate long time series and cut out segments of length T from them, as if making observations on the market. • From these "observations" we construct the empirical risk measure and optimize our portfolio under it.
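A minimal sketch of this "observation" step under the iid normal assumption (the function name and defaults are mine, not from the talk); it reuses `min_variance_weights` from the earlier sketch:

```python
import numpy as np

def empirical_optimum(N, T, rng=None):
    """Simulate T observations of N iid standard normal returns, estimate the
    covariance matrix from them, and optimize the portfolio on the estimate.

    For T <= N the sample covariance matrix is singular and the optimization
    breaks down; this is exactly the regime discussed below.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    returns = rng.standard_normal((T, N))       # one "observed" segment
    sigma_hat = np.cov(returns, rowvar=False)   # empirical covariance (N x N)
    return min_variance_weights(sigma_hat)      # weights of the empirical optimum
```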
The ratio q₀ of the empirical and the exact risk measure is a measure of the estimation error due to noise.
The relative error of the optimal portfolio is a random variable, fluctuating from sample to sample. • The weights of the optimal portfolio also fluctuate.
Critical behaviour for N, T large, with N/T fixed The average of q₀ as a function of N/T can be calculated from random matrix theory: it diverges at the critical point N/T = 1
The standard deviation of the estimation error diverges even more strongly than the average as the ratio r = N/T approaches the critical point.
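A hedged numerical illustration of this divergence (my own sketch): with the true covariance equal to the identity the exact minimal variance is 1/N, so q₀ can be measured by evaluating the empirically optimal weights under the true risk measure; the comparison value 1/√(1−r) is, as far as I understand, the random-matrix-theory result referred to on the previous slide.

```python
import numpy as np

def q0_samples(N, T, n_samples=200, rng=None):
    """Monte Carlo estimate of q0 for iid standard normal returns.

    True covariance = identity, so the exact optimum is w_i = 1/N with
    variance 1/N; q0 is taken as the ratio of the true (out-of-sample)
    standard deviation of the empirically optimal portfolio to the exact
    minimal standard deviation.
    """
    if rng is None:
        rng = np.random.default_rng(1)
    q0 = np.empty(n_samples)
    for k in range(n_samples):
        w = empirical_optimum(N, T, rng)          # weights fitted on one sample
        q0[k] = np.sqrt((w @ w) / (1.0 / N))      # out-of-sample / optimal risk
    return q0

for T in (400, 200, 150, 120, 110):               # N/T creeping toward 1
    r = 100 / T
    q0 = q0_samples(N=100, T=T)
    print(f"r = {r:.2f}   <q0> = {q0.mean():.2f}"
          f"   (1/sqrt(1-r) = {1 / np.sqrt(1 - r):.2f})   std = {q0.std():.2f}")
```

Both the mean and the sample-to-sample scatter of q₀ blow up as r approaches 1, in line with the statements above.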
Instability of the weights The weights of a portfolio of N = 100 iid normal variables for a given sample, T = 500
The distribution of weights in a given sample • The optimization hardly determines the weights even far from the critical point! • The standard deviation of the weights relative to their exact average value also diverges at the critical point
If short selling is banned If the weights are constrained to be positive, the instability manifests itself in more and more weights becoming zero – the portfolio spontaneously reduces its size! Explanation: the solution would like to run away, but the constraints prevent it from doing so, so it sticks to the walls. Similar effects are observed if we impose any other linear constraints, such as limits on sectors, etc. It is clear that in these cases the solution is determined more by the constraints (and the experts who impose them) than by the objective function.
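A hedged sketch of the long-only version (my own, using scipy's general-purpose SLSQP solver rather than a dedicated quadratic-programming code); counting the weights that end up at zero illustrates the spontaneous shrinkage of the portfolio described above:

```python
import numpy as np
from scipy.optimize import minimize

def long_only_min_variance(sigma_hat):
    """Minimum-variance weights with sum(w) = 1 and w_i >= 0 (no short selling)."""
    N = sigma_hat.shape[0]
    result = minimize(
        lambda w: w @ sigma_hat @ w,                      # portfolio variance
        x0=np.full(N, 1.0 / N),                           # start from equal weights
        bounds=[(0.0, None)] * N,                         # ban short positions
        constraints=[{"type": "eq", "fun": lambda w: w.sum() - 1.0}],
        method="SLSQP",
    )
    return result.x

rng = np.random.default_rng(2)
returns = rng.standard_normal((150, 100))                 # T = 150, N = 100
w = long_only_min_variance(np.cov(returns, rowvar=False))
print("weights effectively at zero:", int(np.sum(w < 1e-6)), "out of", len(w))
```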
If the variables are not iid Experimenting with various market models (one-factor, market plus sectors, positive and negative covariances, etc.) shows that the main conclusion does not change – a manifestation of universality. Overwhelmingly positive correlations tend to enhance the instability and negative ones decrease it, but they do not change the power of the divergence, only its prefactor.
After filtering, the noise is much reduced, and we can even penetrate into the region below the critical point, T < N. BUT: the weights remain extremely unstable even after filtering.
Similar studies under alternative risk measures: mean absolute deviation, expected shortfall and maximal loss • Lead to similar conclusions, except that the effect of estimation error is even more serious • In the case of ES and ML the existence of a solution becomes a probabilistic issue, depending on the sample • Calculation of this probability leads to some intriguing problems in random geometry that can be solved by the replica method.
A wider context • The critical phenomena we observe in portfolio selection are analogous to the phase transitions discovered recently in some hard computational problems; they represent a new "random Gaussian" universality class within this family, where a number of modes go soft in rapid succession as one approaches the critical point. • Filtering corresponds to discarding these soft modes.
The appearance of powerful tools borrowed from statistical physics (random matrices, phase transition concepts, scaling, universality, replicas) is an important development that enriches finance theory.
The sampling error catastrophe, due to the lack of sufficient information, appears in a much wider set of problems than just the problem of investment decisions (multivariate regression, stochastic linear programming and all their applications). • Whenever a phenomenon is influenced by a large number of factors, but we have a limited amount of information about this dependence, we have to expect that the estimation error will diverge and fluctuations across the samples will be huge.
Optimization and statistical mechanics • Any convex optimization problem can be transformed into a problem in statistical mechanics by promoting the cost (objective, target) function into a Hamiltonian and introducing a fictitious temperature. The original problem is recovered in the limit of zero temperature. • Averaging over the time-series segments (samples) is similar to what is called quenched averaging in the statistical physics of random systems: one has to average the logarithm of the partition function (i.e. the cumulant generating function). • This averaging can then be performed by the replica trick (see the schematic formulas below).
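Schematically, in standard notation (my paraphrase of the mapping described above, not the formulas from the original slides):

```latex
Z(\beta) = \int \! dw \; e^{-\beta H(w)}, \qquad
\min_w H(w) = -\lim_{\beta\to\infty} \frac{1}{\beta} \ln Z(\beta), \qquad
\overline{\ln Z} = \lim_{n\to 0} \frac{\overline{Z^{\,n}} - 1}{n},
```

where the overbar denotes the quenched average over the random samples (time-series segments), and the last identity is the replica trick.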
Portfolio optimization and linear regression Portfolios: the variance-minimization problem with budget constraint defined above (the mapping onto a regression problem is sketched after the next slide).
Minimizing the residual error for an infinitely large sample
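The formulas on these two slides did not survive extraction. As a hedged reconstruction of the standard mapping (my own notation): eliminating one weight via the budget constraint, w_N = 1 − Σ_{i<N} w_i, and assuming zero-mean returns x_i, the variance minimization becomes an ordinary least-squares problem,

```latex
\min_{w}\ \mathbb{E}\Big[\big(\textstyle\sum_{i=1}^{N} w_i x_i\big)^2\Big]
\ \ \text{s.t.}\ \ \sum_{i=1}^{N} w_i = 1
\quad\Longleftrightarrow\quad
\min_{w_1,\dots,w_{N-1}}\ \mathbb{E}\Big[\big(x_N - \textstyle\sum_{i<N} w_i\,(x_N - x_i)\big)^2\Big],
```

i.e. a linear regression of x_N on the return differences x_N − x_i, with the portfolio weights playing the role of regression coefficients; on a finite sample the expectation is replaced by the empirical average over the T observations.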
Linear regression is ubiquitous (microarrays, medical sciences, brain research, sociology, macroeconomics, etc.) • The link between statistical mechanics, optimization and regression allows us to apply methods borrowed from one field to another. • Powerful methods and concepts from statistical physics (random matrices, phase transition concepts, replicas) are particularly promising in this respect.
What we have learned • If we do not have sufficient information we cannot make an intelligent decision, nor can we build a good model – so far this is a triviality • The important message here is that there is a critical point in both the optimization problem and in the regression problem where the error diverges, and its behaviour is subject to universal scaling laws
Truth is always concrete. What is general is either empty, or wrong, or pathological. Lev Landau
Now I would like to make a few general remarks on complex systems. At the end you may decide whether they are empty, wrong, or pathological.
Normally, one is supposed to work in the N << T limit, i.e. with low-dimensional problems and plenty of data. • Complex systems are very high dimensional and irreducible (incompressible); they require a large number of explanatory variables for their faithful representation. • The dimensionality of the minimal model providing an acceptable representation of a system can be regarded as a measure of the complexity of the system. (Cf. the Kolmogorov–Chaitin measure of the complexity of a string; also Jorge Luis Borges' map.)
Therefore, also in the regression problem, we have to face the unconventional situation that N ~ T or N > T, and then the error in the regression coefficients will be large. • If the number of explanatory variables is very large and they all carry weights of the same order of magnitude, then there is no structure in the system, it is just noise (like a completely random string). So we have to assume that some of the variables have a larger weight than others, but we do not have a natural cutoff beyond which it would be safe to forget about the higher-order variables. This leads us to the assumption that the regression coefficients must have a scale-free, power-law-like distribution for complex systems.
The regression coefficients are proportional to the covariances of the dependent and independent variables. A power-law-like distribution of the regression coefficients implies the same for the covariances. • In a physical system this translates into a power-law-like distribution of the correlations. • The usual behaviour of correlations in simple systems is not like this: correlations typically fall off exponentially.
Exceptions: systems at a critical point, or systems with a broken continuous symmetry. Both of these are very special cases, however. • Correlations in a spin glass decay like a power, without any continuous symmetry! • The power-law-like behaviour of correlations is typical in the spin glass phase, not only on average but for each sample. • A related phenomenon is what is called chaos in spin glasses. • The long-range correlations and the multiplicity of ground states explain the extreme sensitivity of the ground states: the system reacts to any slight external disturbance, but the statistical properties of the new ground state are the same as before: this is a kind of adaptation or learning process.
Other complex systems? Adaptation, learning, evolution, self-reflexivity cannot be expected to appear in systems with translationally invariant, all-ferromagnetic couplings. Some of the characteristic features of spin glasses (competition and cooperation, the existence of many metastable equilibria, sensitivity, long-range correlations) seem to be necessary minimal properties of any complex system. • This also means that we will always face the information deficit catastrophe when we try to build a model for a complex system.
How can we understand that people (in the social sciences, medical sciences, etc.) get away with lousy statistics, even with N > T? • They are projecting external information into their statistical assessments. (I can draw a well-determined straight line through even a single point, if I know that it must be parallel to another line.) • Humans do not optimize, but use quick-and-dirty heuristics. This has an evolutionary meaning: if something looks vaguely like a leopard, one jumps, rather than trying to find the optimal fit of the observed fragments of the picture to a leopard.
Prior knowledge, the "larger picture", deliberate or unconscious bias, etc. are essential features of model building. • When we have a chance to check this prior knowledge millions of times in carefully designed laboratory experiments, this is a well-justified procedure. • In several applications (macroeconomics, medical sciences, epidemiology, etc.) there is no way to perform these laboratory checks, and errors may build up as one uncertain piece of knowledge serves as a prior for another uncertain statistical model. This is how we construct myths, ideologies and social theories.
It is conceivable that theory building, in the sense of constructing a low-dimensional model, will prove to be impossible for social phenomena, and the best we will be able to do is to build a life-size computer model of the system, a kind of gigantic SimCity. • It remains to be seen what we will mean by understanding under those circumstances.