The Sensitivity of Portfolio Selection to Estimation Error
Imre Kondor, Collegium Budapest and Eötvös University, Budapest, Hungary
Seminar talk given at the Shidler College of Business, University of Hawaii at Manoa, January 29, 2010
This work has been supported by the Teller Program of the National Office for Research and Technology, Contract No. KCKHA005
Contents
I. Preliminaries: the problem of estimation error, risk measures, noisy covariance matrices
II. Noise sensitivity of various alternative risk measures (parametric VaR, mean absolute deviation, parametric and historical expected shortfall, worst loss)
III. The feasibility problem (the instability of coherent and downside risk measures)
IV. The mirage of apparent arbitrage
V. Regularization: the promise of machine learning methods
Coworkers • Sz. Pafka and G. Nagy (CIB Bank, Budapest, a member of the Intesa Group), • István Varga-Haszonits (Eötvös University, Budapest, now at Morgan Stanley Fixed Income, Budapest) • S. Still (University of Hawaii)
Preliminary considerations • Portfolio selection: a tradeoff between risk and reward • There is more or less general agreement on what we mean by reward in a finance context, but the status of risk measures is controversial • For optimal portfolio selection we have to know what we want to optimize, and under what kind of constraints • The chosen risk measure should satisfy some obvious mathematical requirements (such as convexity), be stable against estimation error, and be easy to implement in practice.
The problem of estimation error • Even if returns formed a clean, stationary stochastic process, we could only observe finite-length time segments (samples), therefore we never have sufficient information to completely reconstruct the underlying process. Our estimates will always be noisy. • Mean returns are particularly hard to measure on the market with any precision • Even if we disregard returns and go for the minimal risk portfolio, lack of sufficient information will introduce "noise", i.e. error, into correlations and covariances, hence into our decision. • The problem of noise is especially severe for large portfolios (size N) and relatively short time series (length T) of observations, and different risk measures are sensitive to noise to different degrees. • We have to know how the decision error depends on N and T for a given risk measure
Some elementary criteria on risk measures • A risk measure is a quantitative characterization of our intuitive risk concept (fear of uncertainty and loss). • Risk is related to the stochastic nature of returns. It is (or should be) a functional of the pdf of returns. • Any reasonable risk measure must satisfy: - convexity (diversification!) - invariance under addition of a risk-free asset - monotonicity and assigning zero risk to a zero position. • Convexity is extremely important. A non-convex risk measure penalizes diversification, does not allow risk to be correctly aggregated, cannot provide a basis for rational pricing of risk (the efficient set may not be convex), and cannot serve as a basis for a consistent limit system. In short, a non-convex risk measure is really not a risk measure at all.
Variance • The classic risk measure: the average quadratic deviation • Assumes a tight underlying distribution, such as the Gaussian, but financial time series are fat tailed (25 sigma events may occur) • Minimizing the variance of a multivariate fat tailed pdf may actually increase the risk at the tails
Value at Risk (VaR) • A high quantile: the threshold below which a given percentage (say 1%) of the weight of the profit-loss distribution resides = the minimal loss we incur on the worst day out of a hundred = the best outcome of the worst 1% of cases. • Introduced in the wake of the US savings and loan crisis by J.P. Morgan at the beginning of the '90s. • Spread over the industry and into regulation like wildfire. • It came under criticism from academics for its lack of convexity already back in the '90s.
Coherent risk measures Proposed by P. Artzner, F. Delbaen, J.-M. Eber, D. Heath, Risk 10, 33-49 (1997); Mathematical Finance 9, 203-228 (1999), they represent an axiomatic approach to risk measures, and are meant to safeguard against inconsistencies like those of VaR. A risk measure is said to be coherent if it is monotonic, subadditive, positive homogeneous, and translationally invariant.
Alternative risk measures • Mean Absolute Deviation (MAD) • Expected Shortfall (ES): the conditional average over a high quantile • Maximal Loss (ML): the extreme case of ES, the optimal combination of the worst outcomes. • ES and ML are coherent. VaR, ES, and ML are downside risk measures.
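To make these definitions concrete, the following minimal sketch (not part of the talk) shows historical estimators of the measures listed above on a single return series; the function names, the 95% level, and the fat-tailed toy sample are illustrative choices.

```python
# Hedged sketch: historical estimators of the risk measures listed above.
# Everything here (function names, the 95% level, the toy data) is illustrative.
import numpy as np

def historical_var(losses, alpha=0.95):
    """VaR: the alpha-quantile of the loss distribution."""
    return np.quantile(losses, alpha)

def expected_shortfall(losses, alpha=0.95):
    """ES: average loss conditional on being at or beyond the VaR threshold."""
    var = historical_var(losses, alpha)
    return losses[losses >= var].mean()

def mean_absolute_deviation(returns):
    """MAD: average absolute deviation of returns from their mean."""
    return np.abs(returns - returns.mean()).mean()

def maximal_loss(losses):
    """ML: the single worst observed loss."""
    return losses.max()

rng = np.random.default_rng(0)
returns = 0.01 * rng.standard_t(df=3, size=1000)   # fat-tailed toy returns
losses = -returns
print(historical_var(losses), expected_shortfall(losses),
      mean_absolute_deviation(returns), maximal_loss(losses))
```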
Portfolios Consider a linear combination of returns r_i with weights w_i: r_P = Σ_i w_i r_i, with Σ_i w_i = 1. The portfolio's expectation value is μ_P = Σ_i w_i μ_i, with variance σ_P² = Σ_ij w_i σ_ij w_j, where σ_ij is the covariance matrix and σ_i the standard deviation of return i.
The Markowitz problem According to Markowitz' classical theory the tradeoff between risk and reward can be realized by minimizing the variance σ_P² = Σ_ij w_i σ_ij w_j over the weights, for a given expected return Σ_i w_i μ_i = μ and budget Σ_i w_i = 1.
The minimal risk portfolio • Expected returns are hardly possible to determine with any precision (on efficient markets, impossible). • For the sake of simplicity, we confine ourselves to the global minimal risk portfolio only, that is, we drop the return constraint. • Minimizing the variance of a portfolio without considering return does not, in general, make much sense. In some cases (index tracking, benchmarking), however, this is precisely what one has to do.
The weights of the minimal risk portfolio • Analytically, the minimal variance portfolio corresponds to the weights for which σ_P² = Σ_ij w_i σ_ij w_j is minimal, given Σ_i w_i = 1. The solution is: w_i* = Σ_j (σ⁻¹)_ij / Σ_jk (σ⁻¹)_jk. • Note that the weights are not assumed to be necessarily positive, that is, we allow unlimited short selling.
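A minimal numerical sketch of this closed-form solution, assuming the covariance matrix is known and invertible; the two-asset numbers are made up for illustration.

```python
import numpy as np

def min_variance_weights(cov):
    """Global minimum-variance weights w* = C^{-1} 1 / (1' C^{-1} 1)."""
    ones = np.ones(cov.shape[0])
    x = np.linalg.solve(cov, ones)   # C^{-1} 1 without forming the inverse explicitly
    return x / (ones @ x)            # normalize so the weights sum to 1

cov = np.array([[0.04, 0.01],
                [0.01, 0.09]])       # illustrative 2x2 covariance matrix
print(min_variance_weights(cov))     # -> roughly [0.73, 0.27]
```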
Empirical covariance matrices • The covariance matrix has to be determined from measurements on the market. From the returns x_it observed at time t we get the estimator σ_ij = (1/T) Σ_t x_it x_jt (for returns measured from their means). • For a portfolio of N assets the covariance matrix has O(N²) elements, while the time series of length T for N assets contain only NT data. In order for the measurement to be precise, we need N << T. Bank portfolios may contain hundreds of assets, and it is hardly meaningful to use time series older than 4 years, while the sampling frequency cannot be too high for portfolio optimization. Therefore, N/T << 1 rarely holds in practice. As a result, there will be a lot of noise in the estimate, and the error grows fast with N/T.
Fighting the curse of dimensions • Economists have been struggling with this problem for ages. Since the root of the problem is lack of sufficient information, the remedy is to inject external info into the estimate. This means imposing some structure on σ. This introduces bias, but the beneficial effect of noise reduction may compensate for it. • Examples: single-index models (β's), multi-index models, grouping by sectors, principal component analysis, Bayesian shrinkage estimators, etc. • All these help to various degrees. Most studies are based on empirical data.
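As an illustration of the last item on the list, here is a minimal sketch of shrinking the sample covariance toward a structured target (a multiple of the identity). The shrinkage intensity alpha is a hypothetical fixed number; in practice it would be chosen by cross-validation or a Ledoit-Wolf-type formula.

```python
import numpy as np

def shrink_covariance(sample_cov, alpha=0.3):
    """Blend the noisy sample covariance with a structured target.

    The target is (average variance) x identity; alpha is an illustrative
    shrinkage intensity, not an estimated one.
    """
    n = sample_cov.shape[0]
    target = (np.trace(sample_cov) / n) * np.eye(n)
    return (1.0 - alpha) * sample_cov + alpha * target
```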
A measure of the estimation error Assume we know the true covariance matrix σ⁰_ij and the noisy one σ_ij (estimated from a finite sample). Then a natural measure of the estimation error is q₀² = Σ_ij w*_i σ⁰_ij w*_j / Σ_ij w⁰_i σ⁰_ij w⁰_j, where w* and w⁰ are the optimal weights corresponding to σ and σ⁰, respectively.
For iid normal variables one can easily prove that for large N and T the sample average of q₀ is ⟨q₀⟩ = 1/√(1 − N/T). • This shows that for iid normal variables the critical value of the ratio N/T is 1. • The character of the divergence is largely independent of the risk measure, or of the properties of the time series (universality).
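A Monte Carlo sketch of this formula, assuming iid standard normal returns (true covariance = identity, so the true optimal weights are uniform); the values of N, T and the number of samples are arbitrary illustrative choices.

```python
import numpy as np

def average_q0(N, T, n_samples=200, seed=0):
    """Estimate <q0> for iid standard normal returns (true covariance = I)."""
    rng = np.random.default_rng(seed)
    q0 = []
    for _ in range(n_samples):
        x = rng.standard_normal((T, N))
        sample_cov = x.T @ x / T
        w = np.linalg.solve(sample_cov, np.ones(N))
        w /= w.sum()                        # noisy minimum-variance weights
        # q0 = sqrt(w' I w) / sqrt(w0' I w0), with the true optimum w0_i = 1/N
        q0.append(np.sqrt(N * (w @ w)))
    return np.mean(q0)

for T in (100, 200, 400):
    N = 50
    print(T, average_q0(N, T), 1.0 / np.sqrt(1.0 - N / T))  # simulation vs formula
```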
The next slides show • plots of w_i (portfolio weights) as a function of i • plots of q₀ (the ratio of the risk of the optimal portfolio determined from time-series information vs. full information) • results show that the effect of estimation noise is very significant, and that MAD (ES) requires more (much more) data, i.e. smaller N/T, than the variance.
The estimation error (q₀) as a function of T/N (for large N and T):
The feasibility problem • In addition to the large fluctuations, for finite N and T the portfolio optimization problem for ES and ML does not always have a solution, even below the critical N/T ratio! (These risk measures may become unbounded.) • For finite N and T, the existence of the optimum is a probabilistic issue: it depends on the sample. • As N and T → ∞ with N/T fixed, this probability goes to 1 or 0, according to whether N/T is below or above (N/T)crit.
Illustration: the case of Maximal Loss Definition of the problem (for simplicity, we are looking for the global minimum and allow unlimited short selling): min over w of max over t of ( − Σ_i w_i x_it ), subject to Σ_i w_i = 1, where the w's are the portfolio weights and the x's the returns.
Probability of finding a solution for the minimax problem (for elliptic underlying distributions): In the limit N,T → ∞, with N/T fixed, the transition becomes sharp at N/T = ½.
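A hedged numerical sketch of this transition: for Gaussian samples one can check by linear programming whether the minimax problem above is bounded, and estimate the probability of finding a solution as a function of N/T. The sketch uses scipy.optimize.linprog; all parameter values are illustrative.

```python
import numpy as np
from scipy.optimize import linprog

def minimax_has_solution(x):
    """Is min_w max_t(-sum_i w_i x_it) bounded, subject to sum_i w_i = 1?"""
    T, N = x.shape
    c = np.r_[np.zeros(N), 1.0]                   # minimize the auxiliary variable u
    A_ub = np.c_[-x, -np.ones(T)]                 # -sum_i w_i x_it - u <= 0 for every t
    A_eq = np.r_[np.ones(N), 0.0].reshape(1, -1)  # budget constraint sum_i w_i = 1
    res = linprog(c, A_ub=A_ub, b_ub=np.zeros(T), A_eq=A_eq, b_eq=[1.0],
                  bounds=[(None, None)] * (N + 1))
    return res.status == 0                        # 0: optimum found; 3: unbounded

def feasibility_probability(N, T, n_samples=200, seed=0):
    rng = np.random.default_rng(seed)
    return np.mean([minimax_has_solution(rng.standard_normal((T, N)))
                    for _ in range(n_samples)])

# The probability should drop from ~1 to ~0 as N/T crosses 1/2 (T = 60 is illustrative).
for N in (20, 30, 40):
    print(N / 60, feasibility_probability(N, 60))
```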
The phase boundary for ES has been calculated analytically in A. Ciliberti, I. K., and M. Mézard: On the Feasibility of Portfolio Optimization under Expected Shortfall, Quantitative Finance 7, 389-396 (2007)
The mirage of apparent arbitrage • The intuitive explanation for the instability of ES and ML is that for a given finite sample there may exist a dominant item (or a dominant combination of items) that produces a larger return at each time point than any of the others, even if no such dominance relationship exists between them on very large samples. This leads the investor to believe that if she goes extremely long in the dominant item and extremely short in the rest, she can produce an arbitrarily large return on the portfolio, with a risk that goes to minus infinity (i.e. no risk). • These considerations generalize to all coherent measures.
The formal statements corresponding to the above intuition Theorem 1. If there exist two portfolios u and v such that Σ_i u_i x_it ≥ Σ_i v_i x_it for every observation t in the sample, with strict inequality for at least one t, then the portfolio optimisation task has no solution under any coherent measure. Theorem 2. Optimisation under ML has no solution if and only if there exists a pair of portfolios such that one of them strictly dominates the other. Neither of these theorems assumes anything about the underlying distribution.
For elliptically distributed underlyings we can say more: Corollary 1: For elliptically distributed items the probability of the existence of a pair of portfolios such that one of them dominates the other on a given sample X is 1 − p(N,T). (Think of the minimax.) Corollary 2: The probability of the unfeasibility of the portfolio optimisation problem under any coherent measure on the sample X is at least 1 − p(N,T) if the underlying assets are elliptically distributed. Corollary 3: If there is a sharp transition in the limit N, T → ∞, with N/T fixed, also for coherent risk measures other than ML or ES, then their critical N/T ratio is smaller than or equal to ½, again for elliptical distributions.
Further generalization • As a matter of fact, this type of instability appears even beyond the set of coherent risk measures, and may appear in downside risk measures in general. • By far the most widely used risk measure today is Value at Risk (VaR). It is a downside measure. It is not convex, therefore the stability problem of its historical estimator is ill-posed. • Parametric VaR, however, is convex, and this allows us to study the stability problem. Along with VaR, we also look into the closely related parametric estimate for ES. • Parametric estimates are expected to be more stable than historical ones. We will then be able to compare the phase diagrams for the historical and parametric ES.
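As a small sketch of the difference between the two kinds of estimate, the following contrasts the historical ES estimator with a Gaussian parametric one fitted to the same sample (the Gaussian fit is one possible parametric choice; the sample and the confidence level are illustrative).

```python
import numpy as np
from scipy.stats import norm

def historical_es(losses, alpha=0.975):
    """Average of the losses at or beyond the empirical alpha-quantile."""
    var = np.quantile(losses, alpha)
    return losses[losses >= var].mean()

def parametric_es_gaussian(losses, alpha=0.975):
    """ES of a normal distribution fitted to the sample losses."""
    m, s = losses.mean(), losses.std(ddof=1)
    z = norm.ppf(alpha)
    return m + s * norm.pdf(z) / (1.0 - alpha)

rng = np.random.default_rng(1)
losses = -0.01 * rng.standard_t(df=4, size=500)   # fat-tailed toy losses
print(historical_es(losses), parametric_es_gaussian(losses))
```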
Phase diagram for parametric VaR, parametric and historical ES
In the region above the respective phase boundaries the optimization problem does not have a solution. • In the region below the phase boundary there is a solution, but for it to be a good approximation to the true risk we must go deep into the feasible region. If we go to the phase boundary from below, the estimation error diverges. • The phase boundary for ES runs above that of VaR, so for a given confidence level α the critical ratio for ES is larger than for VaR (we need less data in order to have a solution). For practically important values of α (95-99%) the difference is not significant.
Parametric vs. historical estimates • The parametric ES curve runs above the historical one: we need less data to have a solution when the risk is estimated parametrically than when we use raw historical data. It seems as if we had some additional information in the parametric approach. • Where does this information come from? • It is injected into the calculation by fitting the data to an independently chosen probability distribution.
Adding linear constraints In practice, portfolio optimization is always subject to some constraints on the allowed range of the weights, such as a ban on short selling and/or limits on various assets, industrial sectors, regions, etc. These constraints restrict the region over which the optimum is sought to a finite volume where no infinite fluctuations can appear. One might then think that under such constraints the instability discussed above disappears completely.
This is not so. If we work in the vicinity of the phase boundary, sample-to-sample fluctuations in the weights will still be large, but the constraints will prevent the solution from running away to infinity. Instead, it will stick to the "walls" of the allowed region. • For example, under a ban on short selling (w_i ≥ 0) these walls will be the coordinate planes, and as N/T increases more and more of the weights will become zero. This phenomenon is well known in portfolio optimization. (B. Scherer, R. D. Martin, Introduction to Modern Portfolio Optimization with NUOPT and S-PLUS, Springer, New York (2005))
This spontaneous reduction of diversification is entirely due to estimation error and does not reflect any real structure of the objective function. • In addition, for the next sample a completely different set of weights will become zero – the solution keeps jumping about on the walls of the allowed region. • Clearly, in this situation the solution reflects the structure of the limit system, rather than that of the market. Therefore, whenever we are working in or close to the unstable region (which is almost always), the constraints only mask rather than cure the instability.
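A hedged sketch of this effect: minimizing the sample variance of pure-noise data under a no-short-selling constraint pins a large fraction of the weights to (essentially) zero when N/T is close to 1. The solver choice (scipy's SLSQP) and all parameter values are illustrative.

```python
import numpy as np
from scipy.optimize import minimize

def long_only_min_variance(sample_cov):
    """Minimize w' C w subject to sum(w) = 1 and w >= 0 (ban on short selling)."""
    n = sample_cov.shape[0]
    res = minimize(lambda w: w @ sample_cov @ w,
                   x0=np.full(n, 1.0 / n),
                   method="SLSQP",
                   bounds=[(0.0, None)] * n,
                   constraints=[{"type": "eq", "fun": lambda w: w.sum() - 1.0}])
    return res.x

rng = np.random.default_rng(2)
N, T = 50, 60                              # N/T close to 1: very noisy estimate
x = rng.standard_normal((T, N))            # pure noise, no real structure
w = long_only_min_variance(x.T @ x / T)
print("weights pinned to (near) zero:", int(np.sum(w < 1e-6)), "out of", N)
```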
The promise of machine learning • Complex optimization, classification, etc. problems suffer from high dimensionality and scarcity of data in the same way as portfolio selection does. • Modern statistics (statistical learning theory) has developed powerful regularization methods to deal with this difficulty. • Application of these methods in a portfolio context offers a well-established, systematic approach: • S. Still and I. Kondor: Regularizing Portfolio Optimization, http://lanl.arxiv.org/abs/0911.1694, to appear in New Journal of Physics
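The cited paper regularizes expected shortfall optimization; as a minimal illustration of the general idea (not the method of the paper), the sketch below adds an l2 (ridge) penalty to the minimum-variance problem, which keeps the solution bounded even when the sample covariance is singular (T < N). The penalty strength eta is a hypothetical hyperparameter, to be chosen e.g. by cross-validation.

```python
import numpy as np

def regularized_min_variance(sample_cov, eta=0.1):
    """Minimum-variance weights with an added l2 penalty eta * ||w||^2.

    Equivalent to replacing the (possibly singular) sample covariance by
    sample_cov + eta * I, so a unique solution exists even when T < N.
    """
    n = sample_cov.shape[0]
    ones = np.ones(n)
    x = np.linalg.solve(sample_cov + eta * np.eye(n), ones)
    return x / (ones @ x)
```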
Closing remarks Given the nature of the portfolio optimization task, one will typically work in that region of parameter space where sample fluctuations are large. Since the critical point where these fluctuations diverge depends on the risk measure, the confidence level, and on the method of estimation, one must be aware of how close one’s working point is to the critical boundary, otherwise one will be grossly misled by the unstable algorithm.
The merit of downside risk measures is that they ignore positive fluctuations that are not supposed to scare investors. We find it most remarkable that the very same downside risk measures display the instability described here which is basically due to a false arbitrage alert and may induce an investor to take very large positions on the basis of fragile information stemming from finite samples. In a way, the global financial crisis was a macroscopic example of such a folly.