How to Emulate: Recipes without Patronising. The MUCM Toolkit. Dan Cornford, Aston University
Overview • What and why is the toolkit? • How is it delivered? • Current toolkit contents • A (slightly contrived) tour through parts of the toolkit • What is the future of the toolkit? • What would you like to see in the toolkit?
What is the toolkit? A series of linked (web) pages: • Threads follow the derivation of a major idea as a series of linked pages; core threads cover the main areas, variants cover specialisations • Procedures describe an operation or algorithm, with sufficient information to allow it to be implemented • Discussions cover issues that may arise during the implementation of a method, or other optional details • Alternatives present the available options when building a specific part of an emulator (e.g. choosing a covariance function) and provide some guidance for making the selection • Examples show how to use the techniques in practice • Definitions explain a term or a concept • Meta covers any page that does not fall into one of the above categories, usually pages about the Toolkit itself
What are the main threads? • ThreadCoreGP - the core model, dealt with by fully Bayesian, Gaussian process, emulation • ThreadCoreBL - the core model, dealt with by Bayes Linear emulation And to come … • ThreadVariantMultipleOutputs - variant of the core model in which we emulate more than one output of a simulator • ThreadGenericMultipleEmulators - dealing with multiple outputs from more than one emulator • ThreadVariantMultipleSimulators - variant of the core model: emulating outputs from more than one related simulator • ThreadVariantDynamic - a special case of multiple outputs as time series • ThreadVariantStochastic - variant of the core model in which the simulator output is random • ThreadVariantDerivatives - variant of the core model in which we also model derivatives of outputs
Do I have to read it linearly? • Pages can be accessed individually or as part of a thread. • We will add cross-cutting threads, e.g. on design for computer models
How are we creating it? • The toolkit is built using a wiki • All the MUCM team contributes • Tony O’Hagan is the editor in chief, Yiannis Andrianakis is managing the overall technology • We release sections of the toolkit as they become mature to a web site • This allows us control over the quality of the content • We plan further enhancement to the presentation • More graphical presentation of the structure • Ability for users to add comments to pages
How to use the toolkit • I’ll use a scenario to motivate this. • A chemical engineer is working on an azoisopropane chemical process simulation. • The process involves two key chemicals, which react to produce 39 main chemicals, with 42 reactions possible. • Thus the simulator has 39+2*42 = 123 inputs. • For now the chemist is mainly interested in a single output, the main target azoisopropane concentration, 1 output! • I want to show how the toolkit can help here!
What does the chemist want to know? • There are many chemical reactions, but which are the most important for determining the output variation? • This is in essence a sensitivity analysis. • Not all the reaction rates and activation energies are perfectly known – many are not directly observable • Initial concentrations can be controlled • ThreadCoreGP is relevant here.
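As a rough illustration of what "which inputs drive the output variation" means in practice, the sketch below estimates first-order (main-effect) sensitivity indices by Monte Carlo. It is not the toolkit's procedure. A cheap hypothetical function `toy_model` stands in for an emulator mean, the inputs are assumed independent and uniform on [0, 1], and the pick-freeze estimator is a generic technique swapped in for illustration.

```python
import numpy as np

def toy_model(x):
    """Hypothetical stand-in for an emulator mean: three inputs, the third inert."""
    return 4.0 * x[:, 0] + 2.0 * x[:, 1] + 0.0 * x[:, 2]

def first_order_indices(model, n_inputs, n_samples, rng):
    """Estimate first-order sensitivity indices with a pick-freeze scheme."""
    a = rng.random((n_samples, n_inputs))   # first independent input sample
    b = rng.random((n_samples, n_inputs))   # second independent input sample
    f_a, f_b = model(a), model(b)
    pooled = np.concatenate([f_a, f_b])
    centre, total_var = pooled.mean(), pooled.var()
    indices = np.empty(n_inputs)
    for i in range(n_inputs):
        ab = a.copy()
        ab[:, i] = b[:, i]                  # input i from B, all others from A
        f_ab = model(ab)
        # Covariance between f(B) and the change caused by swapping input i.
        indices[i] = np.mean((f_b - centre) * (f_ab - f_a)) / total_var
    return indices

rng = np.random.default_rng(1)
sensitivity = first_order_indices(toy_model, n_inputs=3, n_samples=50_000, rng=rng)
```

For this linear toy function the exact indices are 0.8, 0.2 and 0, so the estimates should land close to those values, with the inert third input contributing nothing.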
Exploratory analysis, prior judgements • The chemist expects only a few reactions to be important, and wants to know which these are • At present they use local estimates based on simulator Jacobians • The model is not too complex – typical evaluation takes a few tens of seconds, depending on target time • It is likely that reaction rate parameters within the model could lie in the range 0.5x to 2.0x where x is the specified value
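The 0.5x to 2.0x range can be turned into a set of simulator runs. The sketch below is one possible way to do it, not the toolkit's design procedure: it draws a simple Latin hypercube and maps it to multipliers between 0.5 and 2.0. Spreading the multipliers uniformly on a log scale is an assumption made here, the `nominal` values are hypothetical, and only three inputs are used for brevity rather than the full 123.

```python
import numpy as np

def latin_hypercube(n_runs, n_inputs, rng):
    """Return an (n_runs, n_inputs) Latin hypercube sample on [0, 1)."""
    # One point per equal-width stratum in every column.
    u = (np.arange(n_runs)[:, None] + rng.random((n_runs, n_inputs))) / n_runs
    # Shuffle each column independently so strata are paired at random.
    for j in range(n_inputs):
        u[:, j] = rng.permutation(u[:, j])
    return u

def scale_to_multipliers(u, low=0.5, high=2.0):
    """Map unit-cube points to multipliers in [low, high], uniform on a log scale."""
    return low * (high / low) ** u

rng = np.random.default_rng(0)
unit_design = latin_hypercube(n_runs=50, n_inputs=3, rng=rng)
multipliers = scale_to_multipliers(unit_design)

nominal = np.array([1.0e3, 2.5e-2, 4.0])   # hypothetical specified values x
design = multipliers * nominal             # one row per simulator run
```

The log scale treats halving and doubling a rate symmetrically; a uniform spread on the original scale would be an equally simple alternative.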
ThreadCoreGP: how to emulate • ThreadCoreGP discusses all the issues that need to be tackled when undertaking emulation in the situation where: • We are only concerned with one simulator • The simulator only produces one output • The output is deterministic • We do not have observations of the real world • We don’t make statements about the real world process • We cannot directly observe derivatives of the simulator • We’ll explore how we can use ThreadCoreGP
What is in ThreadCoreGP? • Definition of what a Gaussian process is • Discussion of the implications of using a Gaussian process • Alternatives to the ‘full Bayesian’ approach – Bayes Linear methods • Provides technical information and discusses alternatives for: • determining active inputs • mean functions and covariance functions • choice of prior distributions • experimental design of simulator runs • fitting the emulator • using the emulator • prediction, uncertainty analysis and sensitivity analysis
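To make "fitting the emulator" and "prediction" concrete, here is a minimal sketch of a Gaussian process emulator for a one-input toy problem. It is a simplification under stated assumptions, not the ThreadCoreGP method: the hyperparameters (`length_scale`, `variance`) are fixed by hand rather than given priors, the mean is a constant estimated from the data, the squared-exponential parameterisation is one common convention, and `toy_simulator` is a hypothetical stand-in for a real simulator.

```python
import numpy as np

def gaussian_cov(x1, x2, length_scale, variance):
    """Squared-exponential covariance between two sets of input points."""
    diff = (x1[:, None, :] - x2[None, :, :]) / length_scale
    return variance * np.exp(-0.5 * np.sum(diff ** 2, axis=2))

def fit_emulator(x, y, length_scale, variance, nugget=1e-8):
    """Precompute the quantities needed for prediction."""
    offset = y.mean()                        # constant mean, assumed for simplicity
    k = gaussian_cov(x, x, length_scale, variance) + nugget * np.eye(len(x))
    chol = np.linalg.cholesky(k)
    weights = np.linalg.solve(chol.T, np.linalg.solve(chol, y - offset))
    return {"x": x, "chol": chol, "weights": weights, "offset": offset,
            "length_scale": length_scale, "variance": variance}

def predict(emulator, x_new):
    """Return the emulator mean and variance at new input points."""
    k_star = gaussian_cov(x_new, emulator["x"],
                          emulator["length_scale"], emulator["variance"])
    mean = emulator["offset"] + k_star @ emulator["weights"]
    v = np.linalg.solve(emulator["chol"], k_star.T)
    var = emulator["variance"] - np.sum(v ** 2, axis=0)
    return mean, np.maximum(var, 0.0)        # clip tiny negative round-off

def toy_simulator(x):
    """Hypothetical deterministic simulator with a single input and output."""
    return np.sin(3.0 * x[:, 0]) + x[:, 0]

x_train = np.linspace(0.0, 1.0, 7)[:, None]  # seven training runs
y_train = toy_simulator(x_train)
emulator = fit_emulator(x_train, y_train, length_scale=0.25, variance=1.0)

x_test = np.array([[0.25], [0.6], [3.0]])    # last point lies far from the design
mean, var = predict(emulator, x_test)
```

The predictive variance is close to zero at the training runs and returns to the prior variance far from them, which is the behaviour that makes the emulator useful for uncertainty analysis.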
DiscGaussianAssumption – what is in there? • This discusses issues to do with representing beliefs about the simulator in terms of a Gaussian process • Why we use a Gaussian process • computation and simplicity; other approaches could be entertained • When a Gaussian process might be inappropriate • outputs constrained to a range (but not practically important if we have a good emulator) • What to do if a Gaussian process is not appropriate • the main solution is to use transformations, e.g. log • Also mentions Bayes Linear methods
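If a positive output is emulated on the log scale, predictions have to be mapped back. The sketch below shows that step using standard lognormal results, assuming the emulator gives a Gaussian prediction for the log output. The function name and the example numbers are hypothetical.

```python
import math

def back_transform_log(mean_log, sd_log, z=1.96):
    """Summarise a Gaussian prediction of log(output) on the original scale."""
    return {
        "median": math.exp(mean_log),
        # The lognormal mean exceeds the median by a factor exp(sd^2 / 2).
        "mean": math.exp(mean_log + 0.5 * sd_log ** 2),
        # Interval endpoints transform directly because exp is monotone.
        "lower": math.exp(mean_log - z * sd_log),
        "upper": math.exp(mean_log + z * sd_log),
    }

summary = back_transform_log(mean_log=-2.0, sd_log=0.3)
```

The back-transformed interval is always positive and is not symmetric about the median, which matches a concentration-like output better than a Gaussian interval on the original scale.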
AltMeanFunction – what is in there? • Discussion of the alternatives for the mean function: • mean function should be chosen to represent ‘the general shape of how the analyst expects the simulator output to respond to changes in the inputs’ • Typically a linear-in-parameters regression, with a prior over the parameters – AltGPPriors • Other forms are possible, but there is a price
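A linear-in-parameters mean function has the form h(x)ᵀβ for a fixed vector of basis functions h(x). The sketch below uses the simplest choice, a constant plus one linear term per input, and estimates β by ordinary least squares. The least-squares step is a simplification assumed here; the toolkit instead places a prior on the parameters. The data are hypothetical.

```python
import numpy as np

def linear_basis(x):
    """h(x) = (1, x_1, ..., x_p): a constant plus one linear term per input."""
    return np.hstack([np.ones((x.shape[0], 1)), x])

def fit_mean_coefficients(x, y):
    """Least-squares estimate of beta in the mean function h(x)^T beta."""
    beta, *_ = np.linalg.lstsq(linear_basis(x), y, rcond=None)
    return beta

rng = np.random.default_rng(2)
x = rng.random((30, 2))
y = 1.0 + 3.0 * x[:, 0] - 2.0 * x[:, 1]   # hypothetical, exactly linear outputs
beta = fit_mean_coefficients(x, y)
trend = linear_basis(x) @ beta            # fitted mean at the design points
```

In an emulator, the Gaussian process then models the residual between the simulator output and this trend.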
AltCorrelationFunction – what is in there? • Discussion of the alternatives for choosing the covariance function • Gaussian (squared exponential), generalised Gaussian, Matérn • Role of nuggets • Implications of choices • Other possible choices
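For comparison, the sketch below writes out the Gaussian correlation function and two common closed-form Matérn cases (smoothness 3/2 and 5/2), and shows where a nugget enters the covariance matrix. The parameterisations are standard textbook forms and may differ from the toolkit's conventions. The generalised Gaussian form is left out, and the function names are illustrative.

```python
import numpy as np

def gaussian_corr(r, length_scale):
    """Gaussian (squared-exponential) correlation at distance r."""
    return np.exp(-0.5 * (r / length_scale) ** 2)

def matern32_corr(r, length_scale):
    """Matern correlation with smoothness 3/2."""
    s = np.sqrt(3.0) * r / length_scale
    return (1.0 + s) * np.exp(-s)

def matern52_corr(r, length_scale):
    """Matern correlation with smoothness 5/2."""
    s = np.sqrt(5.0) * r / length_scale
    return (1.0 + s + s ** 2 / 3.0) * np.exp(-s)

def covariance_matrix(x, corr, length_scale, variance, nugget=0.0):
    """Covariance for one-dimensional inputs; the nugget is added on the diagonal."""
    r = np.abs(x[:, None] - x[None, :])
    return variance * (corr(r, length_scale) + nugget * np.eye(len(x)))

x = np.linspace(0.0, 1.0, 5)
k_plain = covariance_matrix(x, matern32_corr, length_scale=0.5, variance=2.0)
k_nugget = covariance_matrix(x, matern32_corr, length_scale=0.5, variance=2.0,
                             nugget=0.01)
```

The Gaussian form gives very smooth sample paths, the Matérn forms give rougher ones, and a small nugget improves the conditioning of the matrix at the cost of no longer interpolating the runs exactly.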
OK, time for you to take over • Rather than presenting this, I now want to get you to do some work • I’d like volunteers to try to use the toolkit – let’s talk about your simulation problems and see if the toolkit has the answers • What problems made you sign up for today? • I’ll try to find the answers in the toolkit or from the experts
Toolkit development – the future • The toolkit is continually developing • By the end of MUCM there will be a complete description of most aspects of building and using emulators • MUCM2 will add more content, particularly accessible introductions and more examples • Have we missed something? • Please tell us! • Future releases should allow easy commenting
Summary • The toolkit will distil the combined knowledge of the MUCM team (and beyond) • We intend it to become the ‘emulation Wikipedia’: • An accessible, free community resource which will outlive the project • We are releasing it in parts, and will continue to improve it within MUCM2