Realistic Uncertainty Bounds for Complex Dynamic Models Andrew Packard, Michael Frenklach CTS-0113985 April 2005.
• Developed a formalism in which assertions are expressed as polynomial inequalities on a parameter space.
• Used global optimization methods, developed in control systems analysis, with origins in algebraic geometry.
• Novel re-analysis of the GRI DataSet.
• Reasoning on collections of assertions
  • test for consistency
    • inconsistency falsifies at least one assertion
    • sensitivity of consistency to data: which assertions are likely false?
  • infer additional implications from assertions
    • sensitivity of inferred conclusions to data: which assertions have the most impact?

Our research focuses on the benefits of treating model/data pairs as assertions that can be shared and reasoned with using automated algorithms.

Message: Use collaboration through model/data sharing and automated reasoning to extract the totality of information in the community data sets.

Frenklach, Packard, Seiler and Feeley, “Collaborative data processing in developing predictive models of complex reaction systems,” International Journal of Chemical Kinetics, vol. 36, issue 1, pp. 57-66, 2004.
Frenklach, Packard and Seiler, “Prediction uncertainty from models and data,” 2002 American Control Conference, pp. 4135-4140, Anchorage, Alaska, May 8-10, 2002.
Seiler, Frenklach, Packard and Feeley, “Numerical approaches for collaborative data processing,” to appear, Optimization and Engineering, Kluwer, 2005.
Feeley, Seiler, Packard and Frenklach, “Consistency of a reaction data set,” Journal of Physical Chemistry A, vol. 108, pp. 9573-9583, 2004.

Project website: http://jagger.me.berkeley.edu/~pack/nsfuncertainty
Chemical Kinetics Modeling

Chemical kinetics modeling is a form of system identification that is
• high dimensional (mechanisms are complex),
• distributed (efforts of many).

The effort of researchers yields complex, intertwined, factual assertions about the possible values of the model parameters.
• The handbook style of {parameter, nominal, range, reference} will not work.
• Each individual assertion is usually not illuminating in the problem’s natural coordinates. Concise individual conclusions are rare.
• Information-rich, “anonymous” collaboration is necessary.
• Machines must do the heavy lifting:
  • managing lists of assertions
  • reasoning and inference
Separate asserted facts from analysis

• Two types of assertions: models and observed behavior
  • (Web-based) assertion of models of physical processes (e.g., “if we knew the parameter values, this parametrized mathematics would accurately model the process”)
  • (Web-based) assertion of measured outcomes of physical processes (e.g., “I performed an experiment, and the process behaved as follows…”)

Together, these form constraints in the "world" parameter space of physical constants. Parameter values that satisfy all of the constraints are feasible (or unfalsified).

• Analysis (global optimization) on the assertions
  • Check consistency of a collection of assertions
    • Sensitivity of consistency to changes in a single assertion
    • Discover highly informative (or highly suspect) assertions
  • Explore the information implied by the assertions
    • Determine the possible range of different scalar functions on the feasible set.
    • (old standby) Generate parameter samples from the feasible set.

We’ve taken this perspective and re-analyzed the GRI-Mech data set. The results are very encouraging.
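The split between assertions and analysis can be sketched in a few lines. This is only an illustration under loud assumptions: the toy models, data values, and function names are hypothetical, each assertion is taken to have the form |model(ρ) − d| ≤ u, and a brute-force grid search stands in for the global optimization methods the slides refer to.

```python
from itertools import product

TOL = 1e-9  # guards against floating-point noise on constraint boundaries

def grid(n_params, steps=41):
    """All points of a regular grid on the prior cube [-1, 1]^n."""
    axis = [-1 + 2 * i / (steps - 1) for i in range(steps)]
    return product(axis, repeat=n_params)

def is_feasible(rho, assertions):
    """True if rho satisfies every assertion |model(rho) - d| <= u."""
    return all(abs(model(rho) - d) <= u + TOL for model, d, u in assertions)

def feasible_points(assertions, n_params=2):
    """Grid points that no assertion falsifies (an inner sample of the feasible set)."""
    return [rho for rho in grid(n_params) if is_feasible(rho, assertions)]

def predicted_range(f, points):
    """Range of a scalar function over the sampled feasible set."""
    values = [f(p) for p in points]
    return min(values), max(values)

# Hypothetical toy assertions: (model, measured value d, uncertainty u).
consistent = [
    (lambda r: r[0] + r[1], 0.5, 0.2),
    (lambda r: r[0] - r[1], 0.0, 0.1),
]
# Adding a third, conflicting assertion empties the feasible set.
inconsistent = consistent + [(lambda r: r[0], -0.5, 0.1)]

feasible = feasible_points(consistent)
print("consistent collection feasible:", len(feasible) > 0)
print("conflicting collection feasible:", len(feasible_points(inconsistent)) > 0)
print("range of rho_1 on feasible set:", predicted_range(lambda r: r[0], feasible))
```

A grid only samples the feasible set from the inside, so an empty result here is not a proof of inconsistency; that gap is what the emptiness proofs and outer bounds mentioned later in the slides are meant to close.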
GRI DataSet

• The GRI-Mech (www.me.berkeley.edu/gri_mech) DataSet is a collection of 77 experimental reports, consisting of models and "raw" measurement data, compiled and arranged towards obtaining a complete mechanism for CH4 + 2O2 → 2H2O + CO2 capable of accurately predicting pollutant formation.

The DataSet consists of:
• Reaction model: 53 chemical species, 325 reactions (nonlinear).
• Unknown parameters (ρ): 102 parameters, essentially the various rate constants.
• Prior Information: Each normalized parameter is known to lie between -1 and 1.
• Processes (Pi): 77 widely trusted, high-quality laboratory experiments, all involving methane combustion, but under different physical manifestations and different conditions.
• Process Models (Mi): 77 1-d and 2-d numerical PDE models, coupled with the common reaction model.
• Measured Data (di, ui): data and measurement uncertainty from 77 peer-reviewed papers reporting the above experiments.

[Figure: each process Pi (i = 1, …, 77) is paired with a model Mi(ρ), which couples the common chemistry (300+ reactions, 50+ species, CH4 + 2O2 → 2H2O + CO2, 100+ unknown parameters, each with -1 ≤ ρk ≤ 1) to its own transport model, and with measured data (di, ui).]

The prior information, models and measured data constitute assertions about possible parameter values.
• kth assertion associated with prior info: -1 ≤ ρk ≤ 1
• Assertions associated with the ith dataset unit: [formula missing from the extracted slide]
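For reference, the two kinds of assertion can be written out as inequalities. The prior-information bound is stated on the slide; the dataset-unit inequality and the symbol \mathcal{F} for the feasible set are assumptions, chosen as the simplest reading of "data and measurement uncertainty (di, ui)", and the original slide may have used a different form.

```latex
% Prior information, k = 1, ..., 102 (stated on the slide)
-1 \le \rho_k \le 1

% Dataset unit i = 1, ..., 77 (assumed form: model within measurement uncertainty of data)
\lvert M_i(\rho) - d_i \rvert \le u_i

% Feasible (unfalsified) set implied by all assertions together
\mathcal{F} = \bigl\{ \rho \in [-1,1]^{102} \;:\; \lvert M_i(\rho) - d_i \rvert \le u_i,\ i = 1,\dots,77 \bigr\}
```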
Manual management of uncertainty propagation

Manual (paper/email) mode would require an efficient uncertainty description (linear in the number of model parameters, say). E.g., use a “handbook” type description:
• parameter values
• plus/minus uncertainty

This is equivalent to requiring a coordinate-aligned cube to contain the feasible set.
• Very ineffective in extracting the predictive capability of the GRI data, i.e., in using assertions to predict the outcome (a range) of another model:
  • (M1) Use 76 assertions to reduce the parameter uncertainty to a cube (as above), then predict the 77th model’s outcome on this cube
  • (M2) Use 76 assertions directly to predict the range of the 77th model’s outcome
• “Loss” value: L=1 means M1 is no better than just using the prior info; L=0 means M1 is as effective as M2
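The comparison between M1 and M2 can be tried on a two-parameter toy problem. Everything specific here is an assumption: the slide does not give the formula for L, so the sketch uses L = (w_cube − w_direct) / (w_prior − w_direct), where w is the width of the predicted range, because that expression equals 1 when the cube does no better than the prior and 0 when it matches the direct prediction. The assertions, the target model, and the grid search are hypothetical stand-ins.

```python
from itertools import product

TOL = 1e-9  # tolerance for floating-point noise on constraint boundaries

def grid(n_params, steps=41):
    """All points of a regular grid on the prior cube [-1, 1]^n."""
    axis = [-1 + 2 * i / (steps - 1) for i in range(steps)]
    return product(axis, repeat=n_params)

def width(f, points):
    """Width of the range of f over a set of points."""
    values = [f(p) for p in points]
    return max(values) - min(values)

# Hypothetical assertions (model, d, u): a thin diagonal strip in parameter space.
assertions = [
    (lambda r: r[0] - r[1], 0.0, 0.1),
    (lambda r: r[0] + r[1], 0.0, 1.0),
]
# Hypothetical model whose outcome is to be predicted.
target = lambda r: 3 * r[0] - 2 * r[1]

prior = list(grid(2))
feasible = [r for r in prior
            if all(abs(m(r) - d) <= u + TOL for m, d, u in assertions)]

# (M1) "handbook" description: smallest coordinate-aligned cube holding the feasible set.
lo = [min(r[k] for r in feasible) for k in range(2)]
hi = [max(r[k] for r in feasible) for k in range(2)]
cube = [r for r in prior if all(lo[k] <= r[k] <= hi[k] for k in range(2))]

w_prior = width(target, prior)      # prediction from prior info alone
w_cube = width(target, cube)        # (M1) prediction on the bounding cube
w_direct = width(target, feasible)  # (M2) prediction on the feasible set itself

loss = (w_cube - w_direct) / (w_prior - w_direct)  # assumed definition of L
print(f"widths: prior={w_prior:.2f} cube={w_cube:.2f} direct={w_direct:.2f} L={loss:.3f}")
```

On this toy problem the loss comes out near 0.47: the bounding cube discards the correlation between the two parameters that the strip-shaped feasible set encodes. With many more parameters, the cube loses correspondingly more of that structure.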
Consistency results for GRI-DataSet assertions

[Figure: sensitivity plots with axes labeled "Parameter #" and "Experiment #".]

The collection of 77 assertions is consistent. Nevertheless, a quantitative consistency measure was found to be very sensitive (using multipliers from the dual form) to 2 particular experimental assertions, but not to the prior info.

The scientists involved rechecked their calculations and concluded that reporting errors had been made. Both reports were updated (one measurement value increased, one decreased), exactly as the consistency analysis had suggested. After the updates, the sensitivity of the consistency measure to individual assertions is greatly reduced and is spread more evenly across the data set.
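A small numerical sketch can illustrate what "sensitivity of a consistency measure to individual assertions" means. The slide does not define the measure, so this assumes a margin form: the largest γ for which some ρ satisfies |M_i(ρ) − d_i| ≤ u_i − γ for every i, with γ ≥ 0 meaning consistent. Finite differences in each u_i stand in for the dual multipliers used in the actual analysis, and the one-parameter assertions are hypothetical.

```python
from itertools import product

def grid(n_params, steps=41):
    """All points of a regular grid on the prior cube [-1, 1]^n."""
    axis = [-1 + 2 * i / (steps - 1) for i in range(steps)]
    return product(axis, repeat=n_params)

def consistency(assertions, n_params=1):
    """Assumed measure: max over rho of the smallest slack u_i - |M_i(rho) - d_i|."""
    return max(
        min(u - abs(model(rho) - d) for model, d, u in assertions)
        for rho in grid(n_params)
    )

def sensitivities(assertions, h=0.1):
    """Finite-difference change in the measure per unit increase of each u_i."""
    base = consistency(assertions)
    result = []
    for i, (model, d, u) in enumerate(assertions):
        bumped = list(assertions)
        bumped[i] = (model, d, u + h)  # loosen only assertion i
        result.append((consistency(bumped) - base) / h)
    return result

# Hypothetical assertions (model, d, u) on a single parameter.
assertions = [
    (lambda r: r[0], 0.0, 0.3),  # pulls rho toward 0.0
    (lambda r: r[0], 0.4, 0.3),  # pulls rho toward 0.4, in tension with the first
    (lambda r: r[0], 0.2, 0.5),  # loose, never binding
]

print("consistency measure:", consistency(assertions))
print("sensitivities:", sensitivities(assertions))
```

Here the two assertions in tension share the sensitivity equally and the loose one has none, which mirrors the pattern on the slide: a few assertions dominating the sensitivity flags them for rechecking. The step h is kept at a multiple of the grid spacing so the finite difference is not distorted by grid resolution.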
How are we computing?

• Transforming real models to polynomial models
  • Large-scale computer “experimentation” on M(ρ)
  • Random sampling and sensitivity calculations to determine active parameters
  • Factorial design-of-experiments on the active parameter cube
  • Linear, quadratic or polynomial (stay in the sum-of-squares hierarchy) fit
  • Assess the residuals, account for fit error in the assertion
• Assertions become polynomial inequality constraints
• Most analysis is optimization subject to these constraints
  • S-procedure, sum-of-squares (scalable emptiness proofs, outer bounds)
    • Outer bounds are also interpreted as solutions to the original problem when the cost is an expected value, constraints are only satisfied on average, and the decision variable is a random variable.
  • Branch & bound (or increase order) to eliminate ambiguity due to fit errors
  • Off-the-shelf constrained nonlinear optimization for inner bounds
    • Use the stochastic interpretation of outer bounds to aid the search
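The first group of steps (factorial design, quadratic fit, residual assessment, fit error folded into the assertion) can be sketched with the standard library alone. The model `toy_model`, the data values d and u, and all function names are hypothetical; a real M(ρ) would be an expensive simulation, and the active parameters would first be selected by a sensitivity screen.

```python
import math
from itertools import product

def toy_model(x1, x2):
    """Hypothetical stand-in for an expensive simulation M(rho) in two active parameters."""
    return math.exp(0.3 * x1 - 0.2 * x2)

def basis(x1, x2):
    """Quadratic monomials in the two active parameters."""
    return [1.0, x1, x2, x1 * x1, x1 * x2, x2 * x2]

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x

def fit_quadratic(model, levels=(-1.0, 0.0, 1.0)):
    """Least-squares quadratic fit on a full three-level factorial design."""
    design = list(product(levels, repeat=2))
    rows = [basis(*p) for p in design]
    y = [model(*p) for p in design]
    k = len(rows[0])
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    aty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(ata, aty)  # normal equations

def surrogate(coef, x1, x2):
    return sum(c * b for c, b in zip(coef, basis(x1, x2)))

def max_fit_error(model, coef, steps=21):
    """Largest surrogate error on a validation grid finer than the design."""
    axis = [-1 + 2 * i / (steps - 1) for i in range(steps)]
    return max(abs(model(a, b) - surrogate(coef, a, b))
               for a, b in product(axis, repeat=2))

coef = fit_quadratic(toy_model)
fit_err = max_fit_error(toy_model, coef)

# Hypothetical measurement; the assertion is widened by the fit error so that
# any rho the real model would accept is also accepted by the polynomial version.
d, u = 1.0, 0.1

def assertion_holds(x1, x2):
    return abs(surrogate(coef, x1, x2) - d) <= u + fit_err

print("coefficients:", [round(c, 4) for c in coef])
print("max fit error on validation grid:", fit_err)
```

Widening the bound by the observed fit error keeps the polynomial constraint conservative on the validation grid; it is not a guarantee between grid points, which is one reason the slides mention branch and bound or a higher-order fit to remove the remaining ambiguity.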