Gaussian process emulation of multiple outputs
Tony O’Hagan, MUCM, Sheffield
Outline
• Gaussian process emulators
  • Simulators and emulators
  • GP modelling
• Multiple outputs
  • Covariance functions
  • Independent emulators
    • Transformations to independence
    • Convolution
  • Outputs as extra dimension(s)
  • The multi-output (separable) emulator
  • The dynamic emulator
• Which works best?
• An example
Simulators and emulators
• A simulator is a model of a real process
  • Typically implemented as a computer code
  • Think of it as a function taking inputs x and giving outputs y: y = f(x)
• An emulator is a statistical representation of the function
  • Expressing knowledge/beliefs about what the output will be at any given input(s)
  • Built using prior information and a training set of model runs
• The GP emulator expresses f as a GP
  • Conditional on hyperparameters
GP modelling
• Mean function
  • Regression form h(x)ᵀβ
  • Used to model the broad shape of the response
  • Analogous to universal kriging
• Covariance function
  • Stationary
  • Often use the Gaussian form σ² exp{−(x − x′)ᵀ D⁻²(x − x′)} (see the sketch below)
  • D is diagonal, with the correlation lengths on the diagonal
• Hyperparameters β, σ² and D
  • Uninformative priors
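As a concrete illustration, here is a minimal NumPy sketch of the Gaussian covariance form above; the function name and arguments are my own, not from the MUCM toolkit.

```python
import numpy as np

def gaussian_cov(X1, X2, sigma2, lengths):
    """sigma^2 * exp{-(x - x')^T D^-2 (x - x')} between rows of X1 and X2.

    `lengths` holds the correlation lengths, i.e. the diagonal of D.
    """
    scaled = (X1[:, None, :] - X2[None, :, :]) / np.asarray(lengths)
    return sigma2 * np.exp(-np.sum(scaled**2, axis=-1))

X = np.random.rand(5, 2)                              # 5 points in 2 inputs
K = gaussian_cov(X, X, sigma2=1.0, lengths=[0.3, 0.5])
```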
The emulator
• The emulator is then the posterior distribution of f
  • After integrating out β and σ², we have a t process conditional on D
  • Mean function made up of the fitted regression h(x)ᵀβ* plus a smooth interpolator of the residuals (see the sketch below)
  • Covariance function conditioned on the training data
  • Reproduces the training data exactly
• Important to validate
  • Using a validation sample of additional runs
  • Check that the emulator predicts these runs to within its stated accuracy, no more and no less
  • Bastos and O’Hagan paper on the MUCM website
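A minimal sketch of the emulator's mean function, conditioning on σ² and D for simplicity rather than integrating them out (so this is a plain GP mean, not the full t process); all names are illustrative, and the small nugget is numerical jitter only.

```python
import numpy as np

def gaussian_cov(X1, X2, sigma2, lengths):
    scaled = (X1[:, None, :] - X2[None, :, :]) / np.asarray(lengths)
    return sigma2 * np.exp(-np.sum(scaled**2, axis=-1))

def h(X):
    """Constant-plus-linear regression basis h(x)."""
    return np.hstack([np.ones((X.shape[0], 1)), X])

def emulator_mean(X, y, Xnew, sigma2, lengths, nugget=1e-8):
    K = gaussian_cov(X, X, sigma2, lengths) + nugget * np.eye(len(X))
    Kinv = np.linalg.inv(K)
    H, Hnew = h(X), h(Xnew)
    # Generalised least squares estimate beta* of the regression coefficients
    beta = np.linalg.solve(H.T @ Kinv @ H, H.T @ Kinv @ y)
    # Fitted regression plus smooth interpolation of the residuals
    return Hnew @ beta + gaussian_cov(Xnew, X, sigma2, lengths) @ Kinv @ (y - H @ beta)
```

At the training inputs this mean reproduces the observed runs (up to the nugget), matching the interpolation property noted above.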
Multiple outputs
• Now y is a vector and f is a vector function
• Training sample
  • A single training sample for all outputs
  • A design for one output probably works for many
• Mean function
  • Modelling essentially as before: hi(x)ᵀβi for output i
  • Probably more important now
• Covariance function
  • Much more complex because of correlations between outputs
  • Ignoring these can lead to poor emulation of derived outputs
Covariance function
• Let fi(x) be the i-th output
• Covariance function: c((i,x), (j,x′)) = cov[fi(x), fj(x′)]
  • Must be positive definite
  • The space of possible functions does not seem to be well explored
• Two special cases
  • Independence: c((i,x), (j,x′)) = 0 if i ≠ j
    • No correlation between outputs
  • Separability: c((i,x), (j,x′)) = σij cx(x, x′) (see the sketch below)
    • Covariance matrix Σ between outputs, correlation function cx between inputs
    • Same correlation function cx for all outputs
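A sketch of how the separable case is assembled in practice: over all (output, input-point) pairs, the full covariance matrix is the Kronecker product of Σ with the input correlation matrix. The numbers are illustrative.

```python
import numpy as np

# Between-output covariance Sigma and between-input correlation matrix Cx
Sigma = np.array([[1.0, 0.6],
                  [0.6, 2.0]])
Cx = np.array([[1.0, 0.5],
               [0.5, 1.0]])

# Separable covariance c((i,x),(j,x')) = sigma_ij * c_x(x, x'):
C = np.kron(Sigma, Cx)   # positive definite whenever Sigma and Cx are
```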
Independence
• A strong assumption, but ...
  • If the posterior variances are all small, correlations may not matter
• How to achieve this?
  • Good mean functions and/or
  • A large training sample
• May not be possible in practice, but ...
  • Consider a transformation to achieve independence
  • Only linear transformations considered, as far as I’m aware
    • z(x) = A y(x)
    • y(x) = B z(x)
  • c((i,x), (j,x′)) is then a linear mixture of the functions for each z
Transformations to independence
• Principal components
  • Fit and subtract mean functions (using the same h) for each y
  • Construct the sample covariance matrix of the residuals
  • Find principal components A (or another diagonalising transform)
  • Transform, and fit separate emulators to each z (see the sketch below)
• Dimension reduction
  • Don’t emulate all z
  • Treat unemulated components as noise
• Linear model of coregionalisation (LMC)
  • Fit B (which need not be square) and the hyperparameters of each z simultaneously
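A sketch of the principal-components route, under the assumptions above (the same basis h for every output); variable names are mine.

```python
import numpy as np

def pc_transform(Y, H):
    """Y: n x q matrix of outputs, H: n x p regression basis at the design."""
    Beta, *_ = np.linalg.lstsq(H, Y, rcond=None)  # fit mean functions, same h
    R = Y - H @ Beta                              # subtract fitted means
    S = np.cov(R, rowvar=False)                   # sample covariance of residuals
    eigvals, A = np.linalg.eigh(S)                # diagonalising transform
    Z = R @ A                                     # components to emulate separately
    return Z, A, Beta
```

For dimension reduction, emulate only the columns of Z with the largest eigenvalues (the last columns, since eigh sorts them in ascending order) and treat the rest as noise.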
Convolution
• Instead of transforming the outputs at each x separately, consider
  • y(x) = ∫ k(x, x*) z(x*) dx* (see the sketch below)
• Kernel k
  • Homogeneous case: k(x − x*)
  • The general case can model non-stationary y
    • But it is much more complex
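A discretised sketch of this construction, replacing the integral by a sum over a grid of knots and using a homogeneous Gaussian kernel; the knots, bandwidth and names are illustrative.

```python
import numpy as np

def convolve(x_eval, knots, z, bandwidth):
    """y(x) ~ sum over knots x* of k(x - x*) z(x*)."""
    w = np.exp(-0.5 * ((x_eval[:, None] - knots[None, :]) / bandwidth) ** 2)
    return w @ z

knots = np.linspace(0.0, 1.0, 20)
z = np.random.randn(20)                 # latent independent process at the knots
y = convolve(np.linspace(0.0, 1.0, 100), knots, z, bandwidth=0.1)
```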
Outputs as extra dimension(s)
• Outputs often correspond to points in some space
  • Time series outputs
  • Outputs on a spatial or spatio-temporal grid
• Add the coordinates of the output space as inputs
  • If output i has coordinates t, then write fi(x) = f*(x, t)
  • Emulate f* as a single-output simulator (see the sketch below)
• In principle, this places no restriction on the covariance function
• In practice, for a single emulator we use restrictive covariance functions
  • Almost always assume separability → separable y
  • Standard functions like the Gaussian correlation may not be sensible in t space
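A sketch of the bookkeeping involved, using the dimensions of the example later in the talk (10 inputs, 120 monthly outputs; the design size of 30 runs is illustrative). This just rearranges the training data; the emulation itself then proceeds as for a single output.

```python
import numpy as np

X = np.random.rand(30, 10)     # design: 30 runs, 10 inputs
T = np.arange(120.0)           # output coordinates: 120 monthly times

# One row per (run, output time) pair, with t appended as an 11th input
X_star = np.column_stack([np.repeat(X, len(T), axis=0),
                          np.tile(T, len(X))])
# X_star has shape (3600, 11); emulate f*(x, t) as a single-output function
```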
The multi-output emulator
• Assume separability
  • Allow a general Σ
  • Use the same regression basis h(x) for all outputs
• Computationally simple (see the sketch below)
  • The joint distribution of points on the multivariate GP has matrix normal form
  • Can integrate out β and Σ analytically
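A sketch of why separability makes this computationally simple: with a common correlation function cx and basis h, a single n × n inverse serves all q outputs, and all q posterior means come from one matrix computation. This conditions on hyperparameters rather than integrating them out, and the names are mine.

```python
import numpy as np

def multi_output_mean(Cx, Cx_new, H, H_new, Y):
    """Cx: n x n input correlations, Cx_new: n* x n, Y: n x q outputs."""
    Cinv = np.linalg.inv(Cx)
    # One GLS solve gives the p x q matrix of coefficients for all q outputs
    Beta = np.linalg.solve(H.T @ Cinv @ H, H.T @ Cinv @ Y)
    return H_new @ Beta + Cx_new @ Cinv @ (Y - H @ Beta)   # n* x q means
```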
The dynamic emulator
• Many simulators produce time series output by iterating
  • Output yt is a function of the state vector st at time t
  • Exogenous forcing inputs ut, fixed inputs (parameters) p
• Single time-step simulator f*
  • st+1 = f*(st, ut+1, p)
• Emulate f*
  • The correlation structure in time is faithfully modelled
• Need to emulate accurately
  • Not much happens in a single time step, but we need to capture the fine detail
• Iteration of the emulator is not straightforward! (see the sketch below)
  • The state vector may be very high-dimensional
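A naive sketch of iterating a single-step emulator by plugging its predictive mean back in as the next state; this ignores the emulator's uncertainty, which is exactly why iteration is not straightforward. Here onestep_mean stands in for any fitted emulator's mean function.

```python
import numpy as np

def iterate(onestep_mean, s0, forcing, p, n_steps):
    """Roll the one-step emulator forward: s_{t+1} = f*(s_t, u_{t+1}, p)."""
    s = np.asarray(s0, dtype=float)
    path = [s]
    for t in range(n_steps):
        s = onestep_mean(s, forcing[t], p)   # mean plug-in, no uncertainty
        path.append(s)
    return np.array(path)
```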
Which to use?
• A big open question!
  • This workshop will hopefully give us lots of food for thought
  • MUCM toolkit v3 is scheduled to cover these issues
• All methods impose restrictions on the covariance function
  • In practice, if not in theory
  • Which restrictions can we get away with in practice?
• Dimension reduction is often important
  • Outputs on grids can be very high-dimensional
  • Principal components-type transformations
  • Outputs as extra input(s)
• Dynamic emulation
  • Dynamics often driven by forcing
Example
• Conti and O’Hagan paper
  • On my website: http://tonyohagan.co.uk/pub.html
• Time series output from the Sheffield Dynamic Global Vegetation Model (SDGVM)
  • Dynamic model on a monthly timestep
  • Large state vector, forced by rainfall, temperature and sunlight
• 10 inputs
  • All others, including the forcing, fixed
• 120 outputs
  • Monthly values of NBP for ten years
• Multi-output emulator on the left, outputs-as-input on the right
• For fixed forcing, both seem to capture the dynamics well
• Outputs-as-input performs less well, due to its more restrictive/unrealistic time series structure
Conclusions
• Draw your own!