Engineering subprogramme, 7 November 2006 Tony O’Hagan
Outline Three parts: • Turbofan engine vibration model • Reification • Predictors and validation
Turbofan vibration model • Rolls-Royce, Derby, UK • Maker of civil aeroplane engines • Simulator of a fan assembly • Our example has 24 blades • Primary concern is with vibration • If amplitude is too high on any one blade it may break • In effect this will destroy the engine [Image: Rolls-Royce Trent 500 engine]
Model details • 24 inputs are vibration resonant frequency of each blade • 24 outputs are amplitude of vibration for each blade • Other factors • Amount of damping – more results in more complex behaviour and longer model run times • Model resolution – it’s possible to run the solver on higher or lower resolution grids • Could also vary e.g. number of blades, operating rpm and temperature
Parameter uncertainty • It’s not possible to manufacture and assemble blades to be all identical and perfectly oriented • Variation in resonant frequencies of blades creates complex variations in their vibration amplitude • Uncertainty distribution on each model input is the distribution achieved within manufacturing tolerances • Question: Given an assembly of blades sampled from this distribution, what is the risk of high-amplitude vibrations resulting?
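The risk question has a natural Monte Carlo formulation. Below is a minimal sketch; `toy_simulator` is an invented stand-in (the real fan-assembly solver is not public), and the tolerance and amplitude-limit figures are illustrative assumptions only.

```python
import numpy as np

rng = np.random.default_rng(0)

N_BLADES = 24
FREQ_MEAN = 1.0        # nominal resonant frequency (illustrative units)
FREQ_SD = 0.01         # spread within manufacturing tolerance (assumed)
AMP_LIMIT = 4.0        # amplitude above which a blade is at risk (assumed)

def toy_simulator(freqs):
    """Invented stand-in for the vibration code: each blade's amplitude
    grows as its frequency approaches the mean of its neighbours',
    crudely mimicking mistuning effects. Not the real physics."""
    neighbours = 0.5 * (np.roll(freqs, 1) + np.roll(freqs, -1))
    detune = np.abs(freqs - neighbours) / FREQ_SD
    return 1.0 / (0.2 + detune)

def risk_of_high_amplitude(n_assemblies=10_000):
    """Estimate P(max blade amplitude exceeds the limit) for assemblies
    of blades sampled from the manufacturing-tolerance distribution."""
    exceed = 0
    for _ in range(n_assemblies):
        freqs = rng.normal(FREQ_MEAN, FREQ_SD, size=N_BLADES)
        amps = toy_simulator(freqs)
        exceed += amps.max() > AMP_LIMIT
    return exceed / n_assemblies

print(risk_of_high_amplitude())
```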
Emulation • Strategy: • Emulate single output = blade 1 amplitude • 24 inputs = frequencies of blades 1 to 24 • Because of rotational symmetry, each model run gives up to 24 design points • Simulate random blade assemblies • Results • Output depends most strongly on blade 1 input • Also on neighbouring inputs, 2 and 24, etc • But high-order dependencies on all inputs • So far we’ve failed to emulate accurately even with very many design points
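The “up to 24 design points per run” observation follows from rotational symmetry: relabelling so that blade k plays the role of blade 1, and rotating the input vector to match, yields another valid training pair for the blade-1 emulator. A sketch, assuming hypothetical arrays X (n runs × 24 frequencies) and Y (n runs × 24 amplitudes):

```python
import numpy as np

def expand_by_symmetry(X, Y):
    """Turn n simulator runs into up to 24n training points for a
    single-output emulator of blade 1's amplitude.

    X: (n, 24) resonant frequencies; Y: (n, 24) amplitudes.
    Rotating the inputs by k places makes blade k+1 play the role
    of blade 1 (0-indexed below).
    """
    inputs, outputs = [], []
    for x, y in zip(X, Y):
        for k in range(24):
            inputs.append(np.roll(x, -k))   # blade k moves to position 0
            outputs.append(y[k])            # its amplitude is the target
    return np.array(inputs), np.array(outputs)

# Usage: X_train, y_train = expand_by_symmetry(X, Y)
```

The caveat “up to” matters: rotated points derived from the same run are highly structured, so they do not carry 24 independent pieces of information.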
Challenges • What’s going on here? • Can we find a way to achieve the original strategy? • Should we try instead to emulate max amplitude? • This may also be badly behaved!
Reification – background • Kennedy & O’Hagan (2001), “Bayesian calibration of computer models” • KO’H henceforth • Goldstein & Rougier (2006), “Reified Bayesian modelling and inference for physical systems” • GR henceforth • GR discuss two problems with KO’H • Meaning of calibration parameters is unclear • Assuming stationary model discrepancy, independent of code, is inconsistent if better models are possible • Reification is their solution
Meaning of calibration parameters • The model is wrong • We need prior distributions for calibration parameters • Some may just be tuning parameters with no physical meaning • How can we assign priors to these? • Even for those that have physical meanings, the model may fit observational data better with wrong values • What does a prior mean for a parameter in a wrong model?
Example: some kind of machine • Simulator says output is proportional to input • Energy in gives work out • Proportionality parameter has physical meaning • Observations with error • Without model discrepancy, this is a simple linear model • LS estimate of slope is 0.568 • But the true parameter value is 0.65
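A toy reconstruction of this example, assuming a simple quadratic loss term (the slide’s exact data-generating process isn’t given, so the numbers below are illustrative rather than a reproduction of 0.568):

```python
import numpy as np

rng = np.random.default_rng(1)

THETA_TRUE = 0.65                    # physical proportionality parameter

def reality(x):
    # Reality loses some energy; here a crude assumed loss term, so
    # true output falls below the simulator's theta * x.
    return THETA_TRUE * x - 0.1 * x**2

x = np.linspace(0.1, 2.0, 20)
y = reality(x) + rng.normal(0, 0.02, size=x.size)   # observations with error

# Least-squares slope for the no-intercept simulator y = theta * x:
theta_ls = np.sum(x * y) / np.sum(x * x)
print(theta_ls)   # below 0.65: the wrong model absorbs the losses into theta
```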
Model discrepancy • Red line is LS fit • Black line is simulator with true parameter 0.65 • Model is wrong • In reality there are energy losses
Case 1 • Suppose we have • No model discrepancy term • Weak prior on slope • Then we’ll get • Calibration close to LS value, 0.568 • Quite good predictive performance in [0, 2+] • Poor estimation of physical parameter
Case 2 • Suppose we have • No model discrepancy term • Informative prior on slope based on knowledge of physical parameter • Centred around 0.65 • Then we’ll get • Calibration between LS and prior values • Not so good predictive performance • Poor estimation of physical parameter
Without model discrepancy • Calibration is just nonlinear regression • y = f(x, θ) + e • Where f is the computer code • Quite good predictive performance can be achieved if there is a θ for which the model gets close to reality • Prior information based on physical meaning of θ can be misleading • Poor calibration • Poor prediction
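A sketch of calibration-as-nonlinear-regression using SciPy; the simulator f and its two parameters here are hypothetical, chosen only to illustrate the structure y = f(x, θ) + e:

```python
import numpy as np
from scipy.optimize import curve_fit

def f(x, a, b):
    # Hypothetical simulator with two calibration parameters.
    return a * (1.0 - np.exp(-b * x))

rng = np.random.default_rng(2)
x_obs = np.linspace(0.1, 3.0, 30)
# Reality = simulator at (1.2, 0.9) plus an unmodelled trend, so no
# parameter value makes f exactly right.
y_obs = f(x_obs, 1.2, 0.9) + 0.05 * x_obs + rng.normal(0, 0.02, 30)

theta_hat, _ = curve_fit(f, x_obs, y_obs, p0=[1.0, 1.0])
# theta_hat gives a decent fit over the observed range, but it is the
# best-fitting value for a wrong model, not the physical parameter.
print(theta_hat)
```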
Case 3 • Suppose we have • GP discrepancy term, as in KO’H, with constant mean • Weak prior on mean • Weak prior on slope • Then we’ll get • Calibration close to the LS value for a regression with non-zero intercept • The GP takes the intercept • The slope estimate, now 0.518, is even further from the true physical parameter value of 0.65, albeit more uncertain • Discrepancy estimate ‘corrects’ generally upwards
Case 4 • Suppose we have • GP discrepancy term, as in KO’H, with constant mean • Weak prior on mean • Informative prior on slope based on knowledge of the physical parameter • Centred around 0.65 • Then we’ll get • Something like linear regression with an informative prior on the slope • Slope estimate is a compromise and loses physical meaning • Predictive accuracy weakened
Adding simple discrepancy • Although the GP discrepancy of KO’H is in principle flexible and nonparametric, its fit is driven primarily by its mean function • Prediction looks like the result of fitting the regression model with the nonlinear f plus the discrepancy mean • This process does not give physical meaning to the calibrated parameters • Even with informative priors • The augmented regression model is also wrong
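For the linear machine, the practical effect of a constant-mean GP discrepancy can be mimicked by ordinary regression with an intercept; a toy comparison (same assumed loss model as earlier, so the values only echo the slides’ 0.568 and 0.518 qualitatively):

```python
import numpy as np

rng = np.random.default_rng(3)
x = np.linspace(0.1, 2.0, 20)
y = 0.65 * x - 0.1 * x**2 + rng.normal(0, 0.02, 20)

# No discrepancy: no-intercept least squares (Cases 1 and 2).
theta_no_disc = np.sum(x * y) / np.sum(x * x)

# Constant-mean discrepancy: the GP mean acts like an intercept (Case 3).
A = np.column_stack([np.ones_like(x), x])
intercept, theta_with_disc = np.linalg.lstsq(A, y, rcond=None)[0]

print(theta_no_disc, theta_with_disc)
# theta_with_disc is typically even further below the true 0.65, and the
# fitted intercept is positive: the discrepancy 'corrects' upwards.
```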
Reification • GR introduce a new entity, the ‘reified’ model • To reify is to attribute the status of reality • Thus, a reified simulator is one that we can treat as real, and in which the calibration parameters should take their physical values • Hence prior distributions on them can be meaningfully specified and should not distort the analysis • GR’s reified model is a kind of thought experiment • It is conceptually a model that corrects such (scientific and computational) deficiencies as we can identify in f
The GR reified model is not regarded as perfect • It still has simple additive model discrepancy as in KO’H • The discrepancy in the model is now made up of two parts • Difference between f and the reified model • For which there is substantive prior information • Discrepancy of the reified model • Independent of both models
Reification doubts • Can the reified model’s parameters be regarded as having physical meaning? • Allowing for model discrepancy between the reified model and reality makes this questionable • Do we need the reified model? • Broadly speaking, the decomposition of the original model’s discrepancy is sensible • But it amounts to no more than thinking carefully about model discrepancy and modelling it as informatively as possible
Case 5 • Suppose we have • GP model discrepancy term with mean function that reflects the acknowledged deficiency of the model in ignoring losses to friction • Informative prior on slope based on knowledge of physical parameter • Then we’ll get • Something more like the original intention of bringing in the model discrepancy! • Slope parameter not too distorted, model correction having physical meaning, good predictive performance
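A least-squares stand-in for Case 5 (a full Bayesian treatment would put priors on both parameters): give the discrepancy mean a shape that encodes the acknowledged deficiency, here an assumed non-negative quadratic loss term:

```python
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(4)
x = np.linspace(0.1, 2.0, 20)
y = 0.65 * x - 0.1 * x**2 + rng.normal(0, 0.02, 20)

def simulator_plus_discrepancy(x, theta, c):
    # Simulator theta*x plus a discrepancy mean encoding the known
    # deficiency: losses that grow with input (assumed form).
    return theta * x - c * x**2

(theta_hat, c_hat), _ = curve_fit(
    simulator_plus_discrepancy, x, y,
    p0=[0.65, 0.05],                 # start near the informative prior
    bounds=([0.0, 0.0], [2.0, 1.0])  # losses constrained non-negative
)
print(theta_hat, c_hat)   # theta_hat now close to the physical 0.65
```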
Moral • There is no substitute for thinking • Model discrepancy should be modelled as informatively as possible • Inevitably, though, the discrepancy function will to a greater or lesser extent correct for unpredicted deficiencies • Then the physical interpretations of calibration parameters can be compromised • If this is not recognised in their priors, those priors can distort the analysis
Final comments • There is much more in GR than I have dealt with here • Definitely repays careful reading • E.g. relationships between different simulators of the same reality • Their paper will appear in JSPI with discussion • This presentation is a pilot for my discussion!
Simulators, emulators, predictors • A simulator is a model, representing some real world process • An emulator is a statistical description of a simulator • Not just a fast surrogate • Full probabilistic specification of beliefs • A predictor is a statistical description of reality • Full probabilistic specification of beliefs • Emulator + representation of relationship between simulator and reality
Validation • What can be meaningfully called validation? • Validation should have the sense of demonstrating that something is right • The simulator is inevitably wrong • There is no meaningful sense in which we can validate it • What about the emulator? • It makes statements like, “We give probability 0.9 to the output f(x) lying in the range [a, b] if the model is run with inputs x.” • This can be right in the sense that (at least) 90% of such intervals turn out to contain the true output
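That 90% statement is directly checkable: run the simulator at fresh inputs and count how often the truth lands inside the stated interval. A minimal sketch, with hypothetical arrays for the emulator’s interval endpoints and the true outputs:

```python
import numpy as np

def empirical_coverage(lower, upper, truth):
    """Fraction of held-out simulator runs whose true output falls
    inside the emulator's probability interval.

    lower, upper: the emulator's 90% interval endpoints at test inputs.
    truth: the simulator's actual outputs at those inputs.
    """
    inside = (truth >= lower) & (truth <= upper)
    return inside.mean()

# For a valid emulator we expect roughly (at least) 0.9:
# empirical_coverage(lo, hi, f_true)
```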
Validating the emulator • Strictly, we can’t demonstrate that the emulator actually is valid in that sense • The best we can do is to check that the truth on a number of new runs lies appropriately within probability bounds • And apply as many such checks as we feel we need to give reasonable confidence in the emulator’s validity • In practice, check it against as many (well-chosen) new runs as possible • Do Q-Q plots of standardised residuals and other diagnostic checks
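A sketch of the standardised-residual Q-Q diagnostic, assuming the emulator reports a predictive mean and standard deviation at each held-out input (a Gaussian-process emulator typically does):

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

def qq_standardised_residuals(mean, sd, truth):
    """Q-Q plot of (truth - predictive mean) / predictive sd
    against N(0, 1).

    For a valid (Gaussian) emulator these standardised residuals
    should lie close to the 45-degree line.
    """
    z = (truth - mean) / sd
    stats.probplot(z, dist="norm", plot=plt)
    plt.title("Emulator standardised residuals")
    plt.show()
```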
Validating a predictor • The predictor is also a stochastic entity • We can validate it in the same way • Although getting enough observations of reality may be difficult • We may have to settle for the predictor not yet having been shown to be invalid!
Validity, quality, adequacy • So, a predictor/emulator is valid if the truth lies appropriately within probability bounds • Could be conservative • Need severe testing tools for verification • The quality of a predictor is determined by how tight those bounds are • Refinement versus calibration • A predictor is adequate for purpose if the bounds are tight enough • If we are satisfied the predictor is valid over the relevant range we can determine adequacy
Conclusion – terminology • I would like to introduce the word ‘predictor’, alongside the already accepted ‘emulator’ and ‘simulator’ • I would like the word ‘validate’ to be used in the sense I have done above • Not in the sense that Bayarri, Berger, et al. have applied it, which has more to do with fitness for purpose • And hence involves not just validity but quality • Models can have many purposes, but validity can be assessed independently of purpose