Learn how to optimize portfolio selection using scenario generation for the asset allocation problem. Covers mean-risk models, the quality of scenario generators, hidden Markov models, and more.
SP XI, Vienna, August 28, 2007
Scenario Generation for the Asset Allocation Problem
Diana Roman, Gautam Mitra

The asset allocation problem
• An amount of money to invest
• N stocks with known current prices $S_1^0,\dots,S_N^0$
• Decision to take: how much to invest in each asset
• Goal: to get a profit as high as possible after a certain time T
• The stock prices (returns) at time T are not known: random variables (stochastic processes)
• $x_i$ = fraction of wealth invested in asset i -> portfolio $(x_1,\dots,x_N)$
• $R_i$ = the return of asset i at time T
• The portfolio return at time T: $R_x = x_1 R_1 + \dots + x_N R_N$ (also a random variable!)
How to choose between portfolios? A modelling issue!

Mean-risk models for portfolio selection
• Mean-risk models: maximise expected value, minimise risk
• Risk: Conditional Value-at-Risk (CVaR) = the expected value of losses in a prespecified percentage of worst cases. Confidence level $\alpha = 0.01$ -> consider the worst 1% of cases
The optimisation problem:
(1) $\max\ \big(E(R_x),\ -\mathrm{CVaR}_\alpha(R_x)\big)$ over $x_1,\dots,x_N$
$\min\ \mathrm{CVaR}_\alpha(R_x)$ over $x_1,\dots,x_N$, subject to: $E(R_x) \ge d$, …

Scenario Generation
• The (continuous) distribution of stock returns is approximated by a discrete multivariate distribution with a limited number of outcomes, so that (1) can be solved numerically: this is scenario generation.
• The result is a scenario set (single-period case) or a scenario tree (multi-period case).

Scenario Generation
• S scenarios:
• $p_i$ = probability of scenario i occurring;
• $r_{ij}$ = the return of asset j under scenario i;
• The (continuous) distribution of $(R_1,\dots,R_N)$ is replaced with a discrete one
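
Once the returns are discretised, the portfolio return, its expected value and its CVaR become finite sums over the S scenarios. The sketch below illustrates this in plain Python. The function names, the toy scenario data and the treatment of the scenario that straddles the $\alpha$ cut-off are assumptions made for illustration, not taken from the slides.

```python
def portfolio_returns(scenarios, x):
    """Return of portfolio x under each scenario i: sum_j r_ij * x_j."""
    return [sum(r * w for r, w in zip(row, x)) for row in scenarios]


def expected_return(probs, returns):
    """Probability-weighted mean of the scenario returns."""
    return sum(p * r for p, r in zip(probs, returns))


def cvar(probs, returns, alpha):
    """Expected loss over the worst alpha share of the probability mass."""
    # Sort outcomes from worst (lowest return) to best.
    ordered = sorted(zip(returns, probs))
    remaining = alpha
    tail_sum = 0.0
    for r, p in ordered:
        take = min(p, remaining)  # split the scenario that straddles the cut
        tail_sum += take * r
        remaining -= take
        if remaining <= 1e-12:
            break
    return -tail_sum / alpha      # losses are negative returns


# Hypothetical toy data: 4 equiprobable scenarios, 2 assets.
scenarios = [(-0.10, 0.02), (0.04, 0.00), (0.01, 0.03), (0.06, -0.02)]
probs = [0.25] * 4
port = portfolio_returns(scenarios, (0.5, 0.5))
print(expected_return(probs, port), cvar(probs, port, alpha=0.25))
```

With $\alpha = 0.25$ and four equiprobable scenarios, the CVaR here is simply the loss in the single worst scenario.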

The mean-CVaR model
• Scenarios -> an LP (Rockafellar and Uryasev 2000): Min … subject to: …
• $r_{ij}$ = the scenarios for the assets’ returns
• We only solve an approximation of the original problem.
• The quality of the solution obtained is directly linked to the quality of the scenario generator (“garbage in, garbage out”).
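
For reference, the Rockafellar-Uryasev linearisation is usually written as below, in the notation of the scenario slides. The auxiliary variables $v$ and $y_i$, the budget constraint and the no-short-selling bounds are assumptions added here for illustration; they are a common choice, not necessarily the exact constraint set used in the talk.

```latex
\begin{aligned}
\min_{x,\,v,\,y}\quad & -v + \frac{1}{\alpha}\sum_{i=1}^{S} p_i\, y_i \\
\text{s.t.}\quad & y_i \ge v - \sum_{j=1}^{N} r_{ij}\, x_j, \qquad y_i \ge 0, \qquad i = 1,\dots,S, \\
& \sum_{j=1}^{N} \Big(\sum_{i=1}^{S} p_i\, r_{ij}\Big)\, x_j \ge d, \\
& \sum_{j=1}^{N} x_j = 1, \qquad x_j \ge 0, \qquad j = 1,\dots,N.
\end{aligned}
```

At an optimum, $v$ is an $\alpha$-quantile of the portfolio return and the objective value equals $\mathrm{CVaR}_\alpha(R_x)$ under the discrete distribution.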

The quality of scenario generators
• The goal of scenario models is to get a good approximation of the “true” optimal value and of the “true” optimal solutions of the original problem (NOT necessarily a good approximation of the distributions involved, NOT good point predictions).
• Difficult to test
• There are several conditions required of an SG

The quality of scenario generators
• In-sample stability: different runs of a scenario generator should give about the same results.
• If we generate several scenario sets (or scenario trees) with the same number of scenarios and solve the approximation LP with these discretisations, we should get about the same optimal value.
• Not necessarily the same optimal solutions: the objective function in an SP can be “flat”, i.e. different solutions give similar objective values.

The quality of scenario generators
• Out-of-sample stability:
• Generate scenario sets of the same size
• Solve the optimisation problem on each -> different optimal solutions
• These solutions are evaluated on the “true” distribution -> “true” objective values
• The true objective values should be similar
• In practice: use a very large scenario set generated with an exogenous SG method as the “true” distribution
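
The out-of-sample test can be sketched as follows: each candidate portfolio is evaluated on one large reference scenario set, and the spread of the resulting CVaR values is inspected. Everything concrete in this sketch is hypothetical: the two-asset reference set drawn from independent normals, the three candidate portfolios and the helper names. The reference set is treated as equiprobable.

```python
import random
import statistics


def sample_cvar(returns, alpha):
    """Average loss over the worst alpha share of an equiprobable sample."""
    k = max(1, int(alpha * len(returns)))
    worst = sorted(returns)[:k]
    return -sum(worst) / k


def out_of_sample_values(solutions, true_scenarios, alpha):
    """CVaR of each candidate portfolio on the reference scenario set."""
    values = []
    for x in solutions:
        port = [sum(r * w for r, w in zip(row, x)) for row in true_scenarios]
        values.append(sample_cvar(port, alpha))
    return values


def spread(values):
    """Summary statistics used to judge stability."""
    return {
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),
        "range": max(values) - min(values),
    }


rng = random.Random(0)
# Hypothetical "true" distribution: 30,000 draws for 2 assets (illustrative).
true_scenarios = [(rng.gauss(0.01, 0.05), rng.gauss(0.005, 0.02))
                  for _ in range(30000)]
# Hypothetical optimal portfolios from three runs of a scenario generator.
solutions = [(0.30, 0.70), (0.32, 0.68), (0.28, 0.72)]
values = out_of_sample_values(solutions, true_scenarios, alpha=0.01)
print(spread(values))
```

A small standard deviation and range across the runs is what the slides call out-of-sample stability.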

The quality of scenario generators
• Out-of-sample stability: the important one
• No (simple) relation between in-sample and out-of-sample stability

Hidden Markov Models
• Applied in various fields, e.g. speech recognition
• Still experimental for financial scenario generation
• Motivation: financial time series are not stationary; unexpected jumps, changing behaviour

Hidden Markov Models
Real-world processes produce observable outputs: a sequence of historical prices, returns…
• A set of N distinct states: $S_1,\dots,S_N$
• The system changes state at equally spaced discrete times: t=1,2,…
• Each state produces outputs according to its “output distribution” (different states -> different parameters)
• The “true” state of the system at a certain time point is “hidden”: we only observe the output.

Hidden Markov Models
Assumptions:
• First-order Markov chain: at any time point, the system’s state depends only on the previous state and not on the whole history: $P(q_t=S_i \mid q_{t-1}=S_j,\ q_{t-2}=S_k, \dots) = P(q_t=S_i \mid q_{t-1}=S_j)$, with $q_t$ = the system’s state at time t
• Time independence: $a_{ij}$ = probability of changing from state i to state j, the same at any time t.
• Output independence: the output generated at time t depends solely on the system’s state at time t (not on the previous outputs)
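
Hidden Markov Models
The output distributions: mixtures of normal distributions. With M mixture components, the output density of state j is
$b_j(O) = \sum_{m=1}^{M} c_{jm}\, \mathcal{N}(O;\ \mu_{jm}, \Sigma_{jm})$
where $\mathcal{N}(\cdot\,;\ \mu_{jm}, \Sigma_{jm})$ = the normal density function with mean vector $\mu_{jm}$ and covariance matrix $\Sigma_{jm}$.
Mixtures of normal density functions can approximate any finite continuous density function.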
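
Hidden Markov Models
[Figure: state-transition diagram of an HMM with N=3 states and M mixtures. Arrows between the states carry the transition probabilities $a_{ij}$; each state i is annotated with its mixture coefficients $c_{i1},\dots,c_{iM}$, mean vectors $\mu_{i1},\dots,\mu_{iM}$ and covariance matrices $\Sigma_{i1},\dots,\Sigma_{iM}$.]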

Hidden Markov Models
The parameters of an HMM:
• Number of states N
• Number of mixtures M
• Initial probabilities of states: $\pi_1,\dots,\pi_N$
• Transition probabilities: $A=(a_{ij})$, $i,j=1,\dots,N$
• For each state i, parameters of the output distributions:
• Mixture coefficients $c_{i1},\dots,c_{iM}$
• Mean vectors $\mu_{i1},\dots,\mu_{iM}$
• Covariance matrices $\Sigma_{i1},\dots,\Sigma_{iM}$

Training Hidden Markov Models
• Historical data $O=(O_1,\dots,O_T)=(r_{tj},\ t=1,\dots,T,\ j=1,\dots,N)$ is used to “train” the HMM.
• Meaning: find the parameters $\lambda=(\pi, A, C, \mu, \Sigma)$ such that $P(O\mid\lambda)$ is maximised
• This cannot be solved analytically, and there is no best way to find $\lambda$
• Iterative procedures (e.g. EM, Baum-Welch) can be used to find a local maximum.
• Parameters N and M are supposed to be known!

Training HMMs
• Start with some initial parameters $\lambda_0$; compute $P(O\mid\lambda_0)$
• Re-estimate the parameters -> $\lambda_1$; compute $P(O\mid\lambda_1) \ge P(O\mid\lambda_0)$
• Obtain a sequence $\lambda_0, \lambda_1, \lambda_2, \dots$ with $P(O\mid\lambda_i) \ge P(O\mid\lambda_{i-1})$
• $(P(O\mid\lambda_i))_i$ converges towards a local maximum
• Limited knowledge about the convergence speed
• Observed: a sharp increase in the first few iterations, then relatively little improvement
• In practice: stop when $P(O\mid\lambda_i) - P(O\mid\lambda_{i-1})$ is small enough
• Use the final $\lambda_i$ for the generation of scenarios

Training HMMs: initial parameters
• How to choose $\lambda_0$?
• Not important for $\pi_i$ and $a_{ij}$ (could be 1/N or random)
• Very important for C, $\mu$ and $\Sigma$, but there is no “best” way to estimate them
• k-means clustering algorithm: separate the historical data into M clusters
• Starting parameters: based on the mean vectors and covariance matrices of the clusters
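
The k-means initialisation can be sketched for a single asset (scalar returns), where mean vectors and covariance matrices reduce to means and standard deviations. This is a simplified illustration under stated assumptions: the function names, the deterministic choice of starting centres and the use of cluster proportions as initial mixture weights are all choices made here, not details given in the slides.

```python
import statistics


def kmeans_1d(data, m, iterations=50):
    """Plain k-means on scalars; centres start at evenly spaced order statistics."""
    ordered = sorted(data)
    centres = [ordered[(2 * k + 1) * len(ordered) // (2 * m)] for k in range(m)]
    clusters = [[] for _ in range(m)]
    for _ in range(iterations):
        clusters = [[] for _ in range(m)]
        for x in data:
            nearest = min(range(m), key=lambda k: abs(x - centres[k]))
            clusters[nearest].append(x)
        # An empty cluster keeps its previous centre.
        new_centres = [statistics.mean(c) if c else centres[k]
                       for k, c in enumerate(clusters)]
        if new_centres == centres:
            break
        centres = new_centres
    return clusters


def initial_mixture(data, m):
    """Starting (weight, mean, std) per mixture component, one per non-empty cluster."""
    params = []
    for c in kmeans_1d(data, m):
        if not c:
            continue
        std = statistics.pstdev(c) if len(c) > 1 else 0.0
        # Assumption: the initial weight is the share of points in the cluster.
        params.append((len(c) / len(data), statistics.mean(c), std))
    return params


# Hypothetical monthly returns with a "crash" cluster and a "normal" cluster.
history = [-0.12, -0.10, -0.11, 0.01, 0.02, 0.03, 0.015, 0.025]
print(initial_mixture(history, m=2))
```

For several assets the same idea applies with vector-valued points, Euclidean distance, and a covariance matrix per cluster.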

Training HMMs: parameter re-estimation
Use the Baum-Welch algorithm (EM). Need to calculate additional quantities:
• Forward probabilities (time t, state i): calculated recursively over time
• Backward probabilities (time t, state i)
• In the calculations: the multivariate normal density

Training HMMs: parameter re-estimation
Additional quantities: …
Probability of the historical observations being generated by the current model: $P(O\mid\lambda)$
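
For reference, the forward and backward probabilities are usually defined and computed as below. This is the standard Baum-Welch notation, assumed here rather than taken from the slides; $b_j$ denotes the output density of state j, and $\alpha_t(i)$ is unrelated to the CVaR confidence level.

```latex
\begin{aligned}
\alpha_t(i) &= P(O_1,\dots,O_t,\ q_t = S_i \mid \lambda), \\
\alpha_1(i) &= \pi_i\, b_i(O_1), \qquad
\alpha_{t+1}(j) = \Big[\sum_{i=1}^{N} \alpha_t(i)\, a_{ij}\Big]\, b_j(O_{t+1}), \\
\beta_t(i) &= P(O_{t+1},\dots,O_T \mid q_t = S_i,\ \lambda), \\
\beta_T(i) &= 1, \qquad
\beta_t(i) = \sum_{j=1}^{N} a_{ij}\, b_j(O_{t+1})\, \beta_{t+1}(j), \\
P(O \mid \lambda) &= \sum_{i=1}^{N} \alpha_T(i).
\end{aligned}
```

The last line gives the likelihood that the training iterations try to increase.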

HMM: estimation of the current state
• The state of the system at the current time?
• Via the Viterbi algorithm
• Given an observation sequence $O=(O_1,\dots,O_T)$ and a model $\lambda$, find an “optimal” state sequence $Q=(q_1,\dots,q_T)$
• i.e., one that best “explains” the observations: it maximises $P(Q\mid O,\lambda)$
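
A minimal Viterbi sketch in log space is shown below. To keep it short it assumes one asset and a single normal output distribution per state (M=1), with strictly positive initial and transition probabilities; the two-state "calm/turbulent" parameters are hypothetical. Maximising $P(Q, O\mid\lambda)$, as done here, yields the same path as maximising $P(Q\mid O,\lambda)$.

```python
import math


def log_normal_pdf(x, mean, std):
    """Log of the univariate normal density."""
    return (-0.5 * math.log(2 * math.pi) - math.log(std)
            - 0.5 * ((x - mean) / std) ** 2)


def viterbi(obs, pi, A, means, stds):
    """Most likely state path for obs; all pi and A entries must be > 0."""
    n = len(pi)
    # delta[i]: best log-probability of any state path ending in state i.
    delta = [math.log(pi[i]) + log_normal_pdf(obs[0], means[i], stds[i])
             for i in range(n)]
    back = []  # back[t][j]: best predecessor of state j at step t + 1
    for o in obs[1:]:
        step, new_delta = [], []
        for j in range(n):
            best_i = max(range(n), key=lambda i: delta[i] + math.log(A[i][j]))
            step.append(best_i)
            new_delta.append(delta[best_i] + math.log(A[best_i][j])
                             + log_normal_pdf(o, means[j], stds[j]))
        back.append(step)
        delta = new_delta
    # Trace the best path backwards from the best final state.
    state = max(range(n), key=lambda i: delta[i])
    path = [state]
    for step in reversed(back):
        state = step[state]
        path.append(state)
    path.reverse()
    return path


# Hypothetical model: state 0 = calm, state 1 = turbulent.
pi = [0.5, 0.5]
A = [[0.9, 0.1], [0.2, 0.8]]
means, stds = [0.01, -0.03], [0.02, 0.08]
obs = [0.01, 0.02, 0.00, -0.15, 0.12, -0.10]
print(viterbi(obs, pi, A, means, stds))
```

The last entry of the returned path is the estimate of the current state, which is what scenario generation starts from.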

HMM for scenario generation
[Figure: timeline. Historical data at t=1, 2, 3, …, T: estimation of $\lambda$. Estimation of the system’s state at time T. Generation of scenarios at t=T+1, T+2, …, T+TP.]
• A scenario: a path of returns for times T+1,…,T+TP
• Estimate the current state (time T); say, $q_T=S_i$
• Repeat for each time step T+1,…,T+TP:
• Transit to a next state $S_j$ according to the transition probabilities $a_{ij}$
• Generate a return according to the output distribution of state j
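
The generation loop can be sketched as follows for a single asset with univariate normal mixtures. The function name, the `(weight, mean, std)` layout of the mixture components and the example parameters are hypothetical; a multi-asset version would draw from multivariate normals instead.

```python
import random


def generate_scenarios(current_state, A, components, horizon, n_scenarios, seed=0):
    """Simulate return paths from a trained HMM.

    components[j] is a list of (weight, mean, std) tuples describing the
    normal mixture of state j (univariate, i.e. one asset).
    """
    rng = random.Random(seed)
    states = list(range(len(A)))
    scenarios = []
    for _ in range(n_scenarios):
        state, path = current_state, []
        for _ in range(horizon):
            # Transit to the next state according to row `state` of A.
            state = rng.choices(states, weights=A[state])[0]
            # Pick a mixture component, then draw the return from it.
            weights = [c[0] for c in components[state]]
            _, mean, std = rng.choices(components[state], weights=weights)[0]
            path.append(rng.gauss(mean, std))
        scenarios.append(path)
    return scenarios


# Hypothetical two-state model; the current state is estimated as state 0.
A = [[0.9, 0.1], [0.2, 0.8]]
components = [
    [(1.0, 0.01, 0.02)],                       # calm state, one component
    [(0.5, -0.03, 0.08), (0.5, 0.02, 0.05)],   # turbulent state, two components
]
paths = generate_scenarios(0, A, components, horizon=3, n_scenarios=5)
print(paths[0])
```

Each simulated path is one scenario; with independent draws, all scenarios are naturally given equal probability 1/S.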

HMM: implementation issues
• Number of states? Still very much an unsolved problem.
• The observation distributions for each state?
• The initial estimates of the model’s parameters
• Computational issues: lots!
• For a large number of assets: large covariance matrices (at every step of re-estimation: determinants, inverses)
• The quantities calculated recursively get smaller and smaller (numerical underflow)
• Or the opposite: they get larger and larger (overflow)
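
One common remedy for recursively computed quantities that shrink or blow up is to rescale the forward probabilities at every step and accumulate the logarithms of the scaling factors. The sketch below shows this for a univariate model with one normal per state; the simplification and all names are assumptions for illustration, and the slides do not say which remedy was actually used.

```python
import math


def normal_pdf(x, mean, std):
    """Univariate normal density."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2 * math.pi))


def log_likelihood(obs, pi, A, means, stds):
    """log P(O | lambda) via the forward recursion with per-step rescaling."""
    n = len(pi)
    log_lik = 0.0
    alpha = None
    for t, o in enumerate(obs):
        b = [normal_pdf(o, means[j], stds[j]) for j in range(n)]
        if t == 0:
            alpha = [pi[j] * b[j] for j in range(n)]
        else:
            alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * b[j]
                     for j in range(n)]
        # Normalise so the alphas sum to 1; the product of all scale factors
        # equals P(O | lambda), so their logs are accumulated instead.
        scale = sum(alpha)
        log_lik += math.log(scale)
        alpha = [a / scale for a in alpha]
    return log_lik


# Hypothetical two-state model and a long observation sequence.
pi = [0.5, 0.5]
A = [[0.9, 0.1], [0.2, 0.8]]
means, stds = [0.01, -0.03], [0.02, 0.08]
print(log_likelihood([0.01] * 5000, pi, A, means, stds))
```

Working with the log-likelihood also suits the stopping rule, since differences of logs stay well behaved where raw probabilities would not.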

Computational results
• Historical dataset:
• 5 stocks from the FTSE 100
• 132 monthly returns: Jan 1993-Dec 2003
• Generate scenario returns for 1 month ahead
• 500, 700, 1000, 2000, 3000 scenarios

Computational results
• For each scenario size:
• Run 30 times -> generate 30 different discretisations for the assets’ returns $(R_1,\dots,R_N)$
• Solve the mean-CVaR model with these discretisations -> get 30 solutions $x^1,\dots,x^{30}$
• Similar solutions as the scenario size increases ($x_2=x_3=0$, $x_5 \ge 50\%$)
• Evaluate these solutions on the “true” distribution?

Computational results
• Out-of-sample stability:
• The “true” distribution: generated with geometric Brownian motion, 30,000 scenarios
• Each of the 30 solutions was evaluated on this distribution -> 30 “true” objective values (= portfolio CVaRs)

Computational results
• Geometric Brownian motion (GBM)
• The standard in finance for modelling stock prices
• Stock prices are approximated by continuous-time stochastic processes (accepted by practitioners…)
• $S_0$: the current price
• $\mu$: the expected rate of return
• $\sigma$: the standard deviation of the rate of return
• $\{W_t\}$: Wiener process, the “noise” in the asset’s price.
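
Under GBM the price at horizon T has the closed form $S_T = S_0 \exp\big((\mu - \sigma^2/2)\,T + \sigma W_T\big)$ with $W_T \sim N(0, T)$, so return scenarios can be sampled directly. The sketch below does this for one asset; the function name and the parameter values are hypothetical, and correlations between assets, which a multi-asset reference distribution would need, are deliberately left out.

```python
import math
import random


def gbm_returns(mu, sigma, horizon, n_scenarios, seed=0):
    """Simple returns S_T / S_0 - 1 of one asset under GBM.

    mu and sigma are annualised; horizon is in years (1/12 for one month).
    """
    rng = random.Random(seed)
    drift = (mu - 0.5 * sigma ** 2) * horizon
    vol = sigma * math.sqrt(horizon)
    return [math.exp(drift + vol * rng.gauss(0.0, 1.0)) - 1.0
            for _ in range(n_scenarios)]


# Hypothetical parameters: 10% expected annual return, 20% annual volatility.
scenarios = gbm_returns(mu=0.10, sigma=0.20, horizon=1 / 12, n_scenarios=30000)
print(sum(scenarios) / len(scenarios))
```

The sample mean should lie close to $e^{\mu T} - 1$, which gives a quick sanity check on the generator.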

Computational results
Statistics for the series of “true” objective function values
• The quality of solutions improves with larger scenario sets (as expected!)
• Reasonably small spread; pretty similar objective values

Conclusions and final remarks
• For the mean-CVaR model: an SG is needed that can capture extreme price movements
• Stability is a necessary condition for a “good” SG
• HMM is a discrete-time model; experimental for financial SG
• Motivated by the non-stationarity of financial time series
• Two stochastic processes: one of them describes the “state of the system”
• Implementation problems, especially when the number of assets is large
• A “good” initial estimate of the HMM parameters is essential
• The number of states is supposed to be known in advance
• Good results regarding out-of-sample stability
• The “true” distribution when testing out-of-sample stability was generated with GBM, the standard in finance.