Explore grand challenges in Uncertainty Quantification (UQ) related to AI, machine learning, and physics. Identify gaps and downselect to key themes such as interpretability, causality, information/data, and optimization. Discuss the impact of surrogate models, ensemble methods, Bayesian approaches, and model selection. Address complexities in UQ for extreme events, climate modeling, and deterministic models. Highlight the need for better AI models, probabilistic methods, and propagating error optimally.
Next Steps
• Group by theme
• Identify grand challenges where progress in a theme is needed
• Identify gaps to achieving them
• Downselect to ~4 topic areas, then break into groups to polish
  • (DL) UQ related to other topics (DOE, V&V, information, decision), many domains (~6)
  • (JN) Interpretability and explainability (better quantification; methods/algorithms to optimize metrics), plus causality and V&V (~8)
  • (BC) Information/data (value of information over time for UQ and decisions) and DOE (relates to RL/adaptive design/extrapolation) (~6)
  • Optimization
• Prepare outbrief slides
UQ
• Precision and accuracy; understand tradeoffs for decisions with high consequences; multiple methods/approaches when methods have different benefits/precision (global/local/multimodal) and speed (fast/slow)
• What does UQ mean in modeling? One way: a Bayesian posterior; another way: the five most plausible interpretations
• Two broad flavors of DL: DL approximates a function; it is also used, for example, in variational approaches (learning a distribution). Kernel methods? Are we making inference in Euclidean space or distribution space?
• UQ on machine learning, or machine learning for UQ (GPs)? Affects interpretability; climate example; discussion of density estimates (accuracy of prediction intervals? "Too aggressive")
• Bayesian approach to UQ: get the effects on prediction from the prior (advantages/disadvantages?)
• What do we lose when we choose the wrong class of models at the outset?
• Fundamental physics: look at many facilities and quantify resolution functions. Using ML for UQ, what else do we get by bringing in resolution functions? Are we failing to use uncertainty in transport codes appropriately, resulting in bias? Does uncertainty propagate? How strong a prior should your surrogate model be? [BNNs are one approach to help here]
• "Surrogate model(s)": with an ensemble/complex system, each model has a different distribution; how do we do model selection among families of models?
• Ensemble models: interpolation/extrapolation drives UQ; disagreement among the ensemble implies uncertainty (see the sketch after this list)
• Gap between the optimal model from DL and what a domain expert wants
• When do we want to reduce vs. understand uncertainty?
• Extrapolation: how can we bound it? What priors/constraints do we have when extrapolating? Adaptive design/RL? Related to UQ/GP approaches (climate breakout)
• In DNNs we use optimization algorithms that usually find local optima; how can we quantify sub-optimality and its UQ implications?
• Insufficient data: how do we do UQ? A "weak" posterior? (prior > data?) Overfitting.
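One concrete reading of "disagreement among the ensemble implies uncertainty" is a bootstrap ensemble: refit the same model class on resampled data and treat prediction spread as an uncertainty signal. A minimal sketch under assumed toy data and an assumed polynomial model class, not the breakout's prescribed method:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: noisy observations of an unknown 1-D function.
x = rng.uniform(-3, 3, size=60)
y = np.sin(x) + 0.2 * rng.standard_normal(x.size)

# Bootstrap ensemble: refit the same model class on resampled data.
grid = np.linspace(-4, 4, 200)              # includes an extrapolation region
preds = []
for _ in range(50):
    idx = rng.integers(0, x.size, x.size)   # sample with replacement
    coeffs = np.polyfit(x[idx], y[idx], deg=5)
    preds.append(np.polyval(coeffs, grid))
preds = np.array(preds)

# Ensemble mean is the prediction; spread is a (crude) uncertainty signal.
mean = preds.mean(axis=0)
std = preds.std(axis=0)   # grows outside [-3, 3], flagging extrapolation
print(f"std at x=0: {std[np.argmin(np.abs(grid - 0))]:.3f}")
print(f"std at x=4: {std[-1]:.3f}")
```

Note how the spread grows outside the training range; this is one way the interpolation/extrapolation distinction above shows up in practice.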
UQ: Challenges and Gaps
• ML for UQ vs. UQ for ML
• Highly domain dependent
• Why did we do ML first? The physical model was too expensive, hence a surrogate model (where is it applicable/how local is it? Discontinuities are problems both computationally and for uncertainty)
• Use of surrogate models
  • Replace expensive models with "cheap" surrogates
  • Physics-based models are hard for different reasons
  • Surrogates are cheap and local. Can we quantify this better?
  • Need a better class of AI models to create surrogates
  • Probabilistic methods for non-local approximations
• "Model of a model of a model…"
  • How do we propagate error optimally, and what is the loss function? (see the sketch after this list)
• UQ for extreme events with high loss or quantiles
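For the "model of a model of a model" chain, the simplest baseline for propagating error is Monte Carlo: push input-uncertainty samples through every stage and inspect the output spread. Both stage functions and the input distribution below are assumptions made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

# Two chained "surrogates" standing in for a model-of-a-model pipeline
# (both functions are assumptions made up for this illustration).
def stage_one(x):
    return np.sin(x) + 0.5 * x

def stage_two(u):
    return np.exp(0.3 * u)

# Assumed input uncertainty: x ~ Normal(1.0, 0.2).
x_samples = rng.normal(loc=1.0, scale=0.2, size=100_000)

# Monte Carlo propagation: every sample goes through the whole chain,
# so nonlinearities and stage interactions are handled automatically.
y_samples = stage_two(stage_one(x_samples))

print(f"output mean: {y_samples.mean():.4f}")
print(f"output std:  {y_samples.std():.4f}")
print(f"90% interval: [{np.quantile(y_samples, 0.05):.4f}, "
      f"{np.quantile(y_samples, 0.95):.4f}]")
```

Optimal propagation (cheaper than brute-force sampling, matched to the downstream loss) is exactly the open question the bullet raises; Monte Carlo is only the reference point.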
UQ: Challenges and Gaps
• Climate challenges: needs "better UQ"
  • Lots of deterministic models across scales propagate up/down/between grid cells
  • Want to improve predictions/intervals; V&V is not fast, and the sample size is one Earth
  • Want to quantify what we don't know
  • Computationally expensive
  • Use of surrogate models (biased, but fast)
  • Uncertainty comes from data and model?
• GPs historically used for UQ on deterministic models (see the sketch after this list)
  • Deterministic models are expensive to evaluate
  • GPs estimate the uncertainty in interpolation
  • Different from using a surrogate model
  • Source of uncertainty in the model?
  • How do we do better with DL?
• Fundamental physics
  • Propagating uncertainty
• Transportation side
  • Possible to validate/verify; sample size: many cities/areas
• Systems biology
  • Models do not predict enough to be useful in the real world (e.g., transition to successful clinical trials). Why? Model? Data? V&V doesn't work out
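A minimal sketch of the classic GP-emulator pattern named above: fit a GP to a handful of runs of an expensive deterministic code and read off interpolation uncertainty. The simulator function and design points are assumptions for illustration; scikit-learn is used for convenience:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel

rng = np.random.default_rng(2)

# Stand-in for an expensive deterministic simulator (an assumption for
# illustration; a real code would be far costlier per run).
def simulator(x):
    return np.sin(3 * x) * x

# A small design of runs -- all we can afford from the "expensive" code.
X_train = rng.uniform(0, 2, size=(8, 1))
y_train = simulator(X_train).ravel()

# GP emulator: interpolates the runs and quantifies interpolation error.
kernel = ConstantKernel(1.0) * RBF(length_scale=0.5)
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True)
gp.fit(X_train, y_train)

X_test = np.linspace(0, 2, 5).reshape(-1, 1)
mean, std = gp.predict(X_test, return_std=True)
for xi, m, s in zip(X_test.ravel(), mean, std):
    print(f"x={xi:.2f}  prediction={m:+.3f}  +/- {1.96 * s:.3f}")
```

The predictive std here is uncertainty about the interpolation itself, which is the sense in which this differs from simply using a surrogate model.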
2) Interpretability and Explainability
• Tensor factorization for explainability: N-dimensional data; unsupervised learning (climate/cancer/chemistry); like NNMF/latent features; chemistry/combustion: multiscale modeling in transportation (see the sketch after this list)
• Definitions?
• Why are we looking at the posterior? To accept/reject predictions, a point estimate is fine; in practice we use softmaxes; develop new methods focused on "trust"; the PPD makes us feel good, but is it true?
• We need data/model parameters for different reasons; this helps differentiate explainability and interpretability. Explaining an unforeseen outcome vs. interpreting model output vs. explaining/interpreting parameters
• Aspects where one matters and the other doesn't; multiscale: interpretability at the subgrid scale is desired, another scale for control decisions; application specific
• Model robustness (modeling assumptions; do we use priors that are too strong?)
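As a two-way special case of the tensor factorization mentioned above, nonnegative matrix factorization recovers additive latent features that are comparatively easy to interpret. A minimal sketch on assumed synthetic data:

```python
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(3)

# Assumed toy data: 100 samples x 20 nonnegative features, built from
# 3 hidden parts so the factorization has something real to recover.
true_parts = rng.uniform(0, 1, size=(3, 20))
weights = rng.uniform(0, 1, size=(100, 3))
X = weights @ true_parts + 0.01 * rng.uniform(size=(100, 20))

# NMF: X ~ W @ H with W, H >= 0. Nonnegativity keeps the latent
# features additive, which is what makes them easy to read.
model = NMF(n_components=3, init="nndsvd", max_iter=500, random_state=0)
W = model.fit_transform(X)   # per-sample loadings on each latent part
H = model.components_        # the latent parts themselves

print("reconstruction error:", round(model.reconstruction_err_, 3))
print("sample 0 loadings:", np.round(W[0], 3))
```

The N-dimensional case the notes point to would replace this with a tensor decomposition (e.g., CP-style factors per mode), but the additivity-for-interpretability argument is the same.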
Interpretability/Explainability: Challenges and Gaps
• Connection between UQ and interpretability/explainability
• In climate, we have a big inverse problem
  • If I believe the output, how do I understand how we got there?
• How do we quantify interpretability across domains?
• What is the uncertainty in how we arrive at a solution from a complex model?
• Bounds
3) Validation & Verification
• Big in the power industry; want V&V for everything, leading to TRUST for decision makers; nuclear security (quantile bounds; extreme value theory); deterministic models; asymmetric loss
• For statistical models, validation: how is validation different from model selection? Probabilistic calibration (see the coverage-check sketch after this list); asymmetric loss/rare events/bounds
• Verification: does the model do what it is intended to do? In agent-based simulation (ABS), we follow/subselect some agents to verify
• Do we need new V&V paradigms for models built on new hardware?
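One concrete probabilistic-calibration check for validation: compare nominal interval coverage with empirical coverage on held-out outcomes. The simulated "model" below is an assumption, rigged to be overconfident ("too aggressive" in the language of the UQ slide):

```python
import numpy as np

rng = np.random.default_rng(4)

# Assumed setup: a model emits a mean and a std for each held-out point.
# Here the "model" is simulated; the truth has a larger noise scale than
# the model claims, so its intervals are too aggressive.
n = 5000
claimed_std = 1.0
true_std = 1.4
y_pred = rng.normal(size=n)                      # predicted means
y_true = y_pred + true_std * rng.normal(size=n)  # actual outcomes

# Probabilistic calibration check: a nominal 90% interval should cover
# about 90% of outcomes. Undercoverage = overconfident intervals.
z = 1.645  # two-sided 90% normal quantile
lo = y_pred - z * claimed_std
hi = y_pred + z * claimed_std
coverage = np.mean((y_true >= lo) & (y_true <= hi))
print(f"nominal 90% interval, empirical coverage: {coverage:.1%}")
```

The empirical coverage lands near 76%, flagging the miscalibration; asymmetric-loss and rare-event settings would need tail-specific checks beyond this two-sided one.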
V&V Challenges and Gaps
• Domains such as power and national security (nuclear)
• Key prerequisite for the trust of decision makers
• Dovetails with UQ
  • Does our UQ lead to results that can be reproduced?
• An issue in systems biology: not translating into many new successful FDA trials
• Can we do better with V&V in areas like climate?
4) Optimization
• We need to solve big optimization problems (MINLPs) for energy grid generation and distribution; algorithms and scalability
• Precision/accuracy: tradeoffs in the loss; Newton's method vs. quasi-Newton vs. SGD (see the sketch after this slide); can we select model fitting under different loss functions?
Challenges
• Sometimes a tool (SGD for AI), sometimes a goal (energy production)
• Many times there is no single, one-dimensional objective function
• When is optimization the right answer?
• Big, nasty problems mathematically (MINLPs)
• Are local optima good enough in AI?
• What do we need to develop in optimization for AI?
• Robust methods (what do we mean specifically?); lots of AI at many scales with high variance
• Do we want resilience or rigidity? Defining performance metrics is difficult
Gaps
• Separating a larger problem into smaller problems (decomposition)
• SGD can be improved on; better classes of models could be enabled
Infrastructure/Investment
• HPC
Needs from Domain
• How can we relax the problem? (physics constrained; a discretized PDE)
• More problems; benchmarks for power systems
Uniqueness to DOE
• Domain specific, electric; DOE has computational power; DOE has big data (big DL, energy grid, climate); culture of collaboration between modeling and domain
Capabilities
• 3-5 years: finding optimal solutions of DL; adding in constraints
• 10-15 years: using quantum to speed up; automating the energy grid; 3D functional design with AI-driven topology optimization
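A small illustration of the Newton-family vs. SGD tradeoff flagged above, using quasi-Newton BFGS against fixed-step gradient descent on the Rosenbrock function (an assumed toy surface, not a grid-scale MINLP):

```python
import numpy as np
from scipy.optimize import minimize

# Rosenbrock: a classic ill-conditioned test problem, standing in for
# the loss surfaces discussed above.
def f(x):
    return (1 - x[0]) ** 2 + 100 * (x[1] - x[0] ** 2) ** 2

def grad(x):
    return np.array([
        -2 * (1 - x[0]) - 400 * x[0] * (x[1] - x[0] ** 2),
        200 * (x[1] - x[0] ** 2),
    ])

x0 = np.array([-1.2, 1.0])

# Quasi-Newton (BFGS): builds a curvature estimate from gradients.
res = minimize(f, x0, jac=grad, method="BFGS")
print(f"BFGS: {res.nfev} evaluations, f = {res.fun:.2e}")

# Plain fixed-step gradient descent: the SGD-style baseline.
x = x0.copy()
for _ in range(5000):
    x -= 1e-3 * grad(x)
print(f"GD:   5000 steps,  f = {f(x):.2e}")
```

On ill-conditioned problems the curvature information buys orders of magnitude; at DL scale that curvature is too expensive to form exactly, which is the precision/speed tradeoff the bullet names.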
5) Decision
• Precision and accuracy; understand tradeoffs for decisions with high consequences; multiple methods/approaches when methods have different benefits/precision (global/local/multimodal) and speed (fast/slow) (see the sketch below)
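Where loss is asymmetric, as several slides here note for nuclear security and the energy grid, the best point decision from a predictive distribution is a quantile rather than the mean. A minimal sketch with assumed costs and an assumed predictive distribution:

```python
import numpy as np

rng = np.random.default_rng(5)

# Assumed predictive distribution for some quantity (e.g., peak demand),
# represented by posterior/ensemble samples.
samples = rng.lognormal(mean=0.0, sigma=0.5, size=20_000)

# Asymmetric loss: undershooting is 9x worse than overshooting (assumed).
c_under, c_over = 9.0, 1.0

def expected_loss(decision):
    err = samples - decision
    return np.mean(np.where(err > 0, c_under * err, -c_over * err))

# Theory: the minimizer is the c_under / (c_under + c_over) quantile (0.9).
grid = np.linspace(samples.min(), samples.max(), 500)
best = grid[np.argmin([expected_loss(d) for d in grid])]
print(f"grid-search decision:    {best:.3f}")
print(f"0.9 quantile of samples: {np.quantile(samples, 0.9):.3f}")
print(f"mean (symmetric answer): {samples.mean():.3f}")
```

The gap between the mean and the loss-optimal quantile is why "good UQ" and "good decisions" are not the same question for high-consequence settings.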
6) Information Theory/Data
• What do we lose when we choose the wrong class of models at the outset?
• Time scale: what data do we keep, and for how long? Colocation of data and compute? On the sensor side, "real time" decisions; years later we may need post hoc analysis (an error/anomaly later in the production life). What should we keep? How?
• Data "sufficiency" is different (what do we need?); sufficient dimension reduction
• Thoughts on models better/worse at interpolation vs. extrapolation? (sequential analyses of hypotheses?) Adaptive designs
• How do we do model selection in adaptive design?
• Data curation; information; we build models for a specific dataset, not necessarily the population it came from; do we have enough data to answer other hypotheses? What data do I need to answer hypothesis X?
Information Theory/Data: Challenges and Gaps
• Characterizing the model and the data go hand in hand
• Which data have strong influence on model predictions/uses?
• Some hypotheses we know a priori, some we know post hoc
  • We don't know what analyses will be done in the future
  • We don't know what data to keep
• How valuable is data from surrogate models? When is it valuable?
• How do we estimate the value of data over time? (e.g., a user facility like SNS)
• Do we maintain just statistics of the data distribution instead of the data? (see the sketch after this list)
• What mix of high-resolution/low-resolution data allows the original data to be reconstructed accurately?
• Additive manufacturing (post hoc error analysis)
• User facility: how do you monitor the stream of data over time?
• What objective functions are used when keeping data around?
• Energy systems: intrusion detection/security
• Healthcare: what health covariates would predict future diagnoses?
• Office of Science/NNSA have the same problem
• How do we value data at different scales?
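One literal reading of "maintain just statistics of the data distribution" is a streaming summary: Welford's online mean/variance update keeps a fixed-size state while the raw stream is discarded. A minimal sketch (the Gaussian toy stream is an assumption; real retention decisions would need far richer summaries than two moments):

```python
import numpy as np

# Welford's online algorithm: maintain count, mean, and M2 (sum of
# squared deviations) instead of retaining the raw stream.
class RunningStats:
    def __init__(self):
        self.n, self.mean, self.m2 = 0, 0.0, 0.0

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self):
        return self.m2 / (self.n - 1) if self.n > 1 else float("nan")

rng = np.random.default_rng(6)
stats = RunningStats()
stream = rng.normal(loc=3.0, scale=2.0, size=100_000)
for x in stream:                  # each point is summarized, then dropped
    stats.update(x)

print(f"streaming mean/var: {stats.mean:.4f} / {stats.variance:.4f}")
print(f"batch mean/var:     {stream.mean():.4f} / {stream.var(ddof=1):.4f}")
```

The gap the slides identify is exactly what this sketch cannot answer: which summaries preserve enough information for analyses we have not thought of yet.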
Information Theory/Data Gaps
• Don't know the value of data over time
  • Used for different purposes over time
  • Maintain and keep data with limited resources
  • New data (sensors) have different information
• Information is in the literature that isn't easily extractable (materials, standards, etc.)
• Value of data from surrogate models is not always obvious
• No DOE database exists
Infrastructure/Investment
• Encoder/decoder
• Literature mining
Needs from Domain
• How can a domain quantify changes in data measurements as sensors change? (climate: units, measurement change)
• What data is important; what has been done in the literature; what is the expiration date for their data; need metadata
Uniqueness to DOE
• Big generator of data; user facilities; ESnet; unique data with little immediate economic value
Capabilities
• 3-5 years: better data management plans; 3D ability to identify defects post hoc; applications to AI
• 10-15 years: ability to reconstruct data
7) Design of Experiments
• Precision/accuracy; asymmetric loss; iterative decision making
• Iterative sets of analyses, propagating uncertainty; how good do we need to be at estimating error to get a good prediction interval?
• In finance, there is a value of information with respect to financial risk; we (DOE) don't necessarily always have a very clear metric of risk a priori
• In manufacturing, machines are under-instrumented; we need a formal way to measure what we need to know
• AI suite: look up front at what hypotheses might be solved and what the value of different measurements is with respect to those hypotheses; a decision-theoretic approach (what are the loss functions?). "What we know now vs. what we want to know in the future"
• Thoughts on models better/worse at interpolation vs. extrapolation? (sequential analyses of hypotheses?) Adaptive designs
• How do we do model selection in adaptive design?
Design of Experiments: Challenges and Gaps
• DOE and UQ are also related in real-world, iterative science
• Need better adaptive design when we make decisions iteratively
• ML for DOE vs. DOE for ML
  • An approach to the curse of dimensionality?
• DOE and reducing UQ in an optimal way, in complex settings like climate
• Model characterization
  • Design of experiments to bound an uncertainty/variance estimate?
• Many labs are interested in finding rare events
  • How does DOE need to adjust for rare/extreme events like particle events?
  • Can UQ estimates help determine the best places to look for rare events?
• UQ/DOE is changing fast; can we provide better education/automated tools?
• For a domain like fundamental physics, how can we create better metadata/tools for domain needs?
Design of Experiments Gaps
• Experiments (multiscale, etc.) are complex; specific experiments are not known a priori
• Want to maximize the value of information in a user facility experiment when resources can go offline
• Design for rare events, for example particle detectors
• Need to clarify the use: static vs. adaptive/iterative
• Codesign: you don't know the specific experiments when you build a user facility (environmental science: inoculate soil, and you don't know what you'll want to measure in the future)
• Value of data changes over time. Under-instrumented: how to maximize value with uncertain value of data over time (features are expensive to add)
Infrastructure/Investment
• Mathematical side: optimization with non-traditional constraints; DOE with asymmetric loss functions; balance between flexibility and improvement (bandit-like; see the sketch after this slide)
Needs from Domain
• ML for DOE: in climate, analyze time series to decide which processes are most important to include and what new data to collect
• Collaboration earlier; define loss functions more clearly
• Getting prior information
Uniqueness to DOE
• Energy design, generation, and distribution; rare particle event detection; user facilities; asymmetric loss functions (nuclear security, energy grid)
Capabilities
• 3-5 years: understanding from domain scientists (BioEPIC at LBNL, instrumenting soil)
• 10-15 years: can AI automate the DOE process for a given experimental process (automated lab)?
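A minimal sketch of the adaptive-design loop discussed in this block: fit a GP to the runs so far and place the next experiment where predictive uncertainty is largest. The experiment function and the uncertainty-sampling acquisition rule are assumptions; real designs would also weigh cost, asymmetric loss, and the exploration/exploitation balance (the bandit flavor noted above):

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(7)

# Assumed stand-in for an expensive experiment.
def experiment(x):
    x = x.ravel()
    return np.sin(5 * x) + 0.1 * rng.standard_normal(x.size)

# Start with a tiny initial design, then add runs adaptively.
X = rng.uniform(0, 1, size=(3, 1))
y = experiment(X)
candidates = np.linspace(0, 1, 201).reshape(-1, 1)

for step in range(10):
    gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.1),
                                  alpha=0.01, normalize_y=True)
    gp.fit(X, y)
    _, std = gp.predict(candidates, return_std=True)
    # Uncertainty sampling: run the experiment where the model is least
    # sure (one simple acquisition rule among many).
    x_next = candidates[[np.argmax(std)]]
    X = np.vstack([X, x_next])
    y = np.concatenate([y, experiment(x_next)])
    print(f"step {step}: sampled x={x_next[0, 0]:.3f}, max std={std.max():.3f}")
```

Swapping the acquisition rule (expected improvement, information gain, an asymmetric-loss criterion) is where most of the open questions above live; the loop structure stays the same.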
8) Causality
• The bio idea of a "digital twin" to create counterfactuals for treatment options
• Modeling physical phenomena misses nuances of human/social phenomena; many causal pathways, and interpretability/explainability can be very culturally dependent (the environment is not consistent); domain knowledge is "culturally dependent"
• How are interpretability/explainability different in a causal framework vs. association/correlation models? (how do decision makers view these models?)
• Can we create a causal/explainable model?
Challenges
• How do we go from correlation to causation in next-gen AI? (see the sketch below)
• What are the gaps in the deep learning world for (a) cause/effect and (b) counterfactual settings?
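A minimal sketch of the correlation-vs.-causation gap using an assumed linear structural causal model with a confounder: the observational regression slope is biased, while simulating the intervention do(x) recovers the true effect:

```python
import numpy as np

rng = np.random.default_rng(8)
n = 200_000

# Assumed toy structural causal model with a confounder z:
#   z -> x, z -> y, and x -> y with true causal effect 1.0.
z = rng.standard_normal(n)
x = 2.0 * z + rng.standard_normal(n)
y = 1.0 * x + 3.0 * z + rng.standard_normal(n)

# Association: regress y on x alone -- biased by the confounder.
slope_obs = np.cov(x, y)[0, 1] / np.var(x)

# Intervention do(x): break the z -> x edge by setting x ourselves,
# then re-simulate y from the same structural equations.
x_do = rng.standard_normal(n)          # x no longer depends on z
y_do = 1.0 * x_do + 3.0 * z + rng.standard_normal(n)
slope_do = np.cov(x_do, y_do)[0, 1] / np.var(x_do)

print(f"observational slope:  {slope_obs:.2f}  (confounded; ~2.2)")
print(f"interventional slope: {slope_do:.2f}  (true effect = 1.0)")
```

This is the mechanism behind the "digital twin" idea above: a model you can intervene on gives counterfactual answers that correlational fits cannot.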
Model Applicability and Characterization
Co-leads: Blair Christian (ORNL), Dan Lu (ORNL), Justin Newcomer (Sandia)
List of breakout participants (44 people): Jiafu Mao, Yury Maximov, Hugh Medal, Matt Menickelly, Konstantin Mischaikow, Susan Mniszewski, Ambarish Nag, Kyle Neal, Michelle Newcomer, Jim Ostrowski, Ozgur Ozmen, Tara Pandya, Pavel Lougovski, Jiaxin Zhang, Dawn Levy, Armenak Petrosyan, Hong Qin, Daniel Ricciuto, Derek Rose, Peter Schultz, Satyabrata Sen, Stuart Slattery, Michael Smith, Sibendu Som, Suhas Somnath, David Stracuzzi, Hai Xiao, Zechun Yang, Steven Young, Blair Christian, Dan Lu, Justin Newcomer, Mathieu Doucet, David Fobes, Nancy Hayden, Jacob Hinkle, Travis Johnston, Brian Kaul, Ryan King, James Kress, Jitendra Kumar, Frank Felder, Robert Link, Lexie Yang
Breakout Agenda
• 9:00 Introduction (10 min)
• 9:10 Review charge questions (10 min)
• 9:20 Identify potential topics (20 min)
• 9:40 Merge and reduce topics (20 min)
• 10:00 Topic 1 (30 min)
• 10:30 Topic 2 (30 min)
• 11:00 Topic 3 (30 min)
• 12:50 Wrap up
Mapping application challenges to the crosscuts
• Data: experimental design; data curation and validation; compressed sensing; facilities operation and control
• Learning: physics informed; reinforcement learning; adversarial networks; representation learning and multi-modal data; "foundational math" of learning; algorithms, complexity, and convergence
• Scalability: levels of parallelization; mixed precision arithmetic; communication; implementations on accelerated-node hardware
• Assurance: uncertainty quantification; explainability and interpretability; validation and verification; causal inference
• Workflow: computing at the edge; compression; online learning; federated learning; infrastructure; augmented intelligence; human-computer interface
[Diagram: reinforcement learning loop in which an agent and environment exchange states, actions, and rewards under partial information, with model-based approximations; labeled "Model applicability and characterization"]
Review Charge Questions (10 minutes)
• Identify 3-5 open questions that need to be addressed to maximally contribute to AI impact in the science domains and/or the enabling technologies.
For each challenge:
• To what extent is DOE uniquely positioned to address this challenge?
• What contributions can DOE make to the broader AI community (3-5 years, 10+ years)?
• How well is the broader AI community suited to addressing this challenge?
• What capabilities are imagined in the 3-5 year and 10-15 year timeframes?
• What classes of AI problems will the technical area contribute to?
• What level of infrastructure and investment is needed to realize the impact?
• What do you need from the domain sciences to push your areas forward?
Topic 1: UQ
• Uniqueness to DOE / Contributions from DOE
  • Math/statistics theory
  • HPC architectures
• Broader community contributions
  • Confidence in ML predictions
  • Knowing what we do not know
• Capabilities
  • 3-5 years
    • Learn probability distributions from data in high-dimensional spaces
    • Mathematical and computational approaches that can be widely applied
    • Physics-informed ML methods to constrain the uncertainty
  • 10-15 years
    • UQ for combined ML and physical models
    • Scalable UQ methods
    • Generalized ML models for extrapolation (rare events) and UQ to quantify the extrapolation uncertainty
    • Life-cycle UQ framework
Topic 1: UQ (continued)
• Classes of AI problems contributed to
  • Probabilistic deep learning
• Infrastructure and investment needed
  • DOE-leading computing
• Needs from the domain sciences
  • Physics information to constrain the uncertainty and for V&V
  • To understand the uncertainty propagation
• Contributions to the broader AI community
  • 3-5 years
    • Advance ML model prediction (accuracy and precision)
    • Explainability and interpretability
  • 10+ years
    • Support decision making
    • Improve predictive understanding
Topic 2: Interpretability and Explainability
• Uniqueness to DOE / Contributions from DOE
  • More relevant to high-consequence scientific applications
  • Methods that are inherently explainable/interpretable enable more rapid/automated scientific discovery
• Broader community contributions
  • Overlap in interests from pharma and transportation/mobility, both in terms of verifiability and in terms of improving understanding of outcomes
• Capabilities
  • 3-5 years
    • Ability to define/quantify interpretability and explainability: qualitative vs. quantitative measures of explainability; how do we encode explainability?
    • Ability to build in explainability/interpretability as a feature of AI/ML methods
  • 10-15 years
    • Development of a new class of algorithms that directly optimize interpretability and explainability
    • Ability to represent/encode and embed domain knowledge and causality into AI algorithms
    • Ability to interrogate the model to understand why it is making decisions, drawing the explainability out of the model
Topic 2: Interpretability and Explainability (continued)
• Classes of AI problems contributed to
  • All
• Infrastructure and investment needed
  • Dedicated people/funding focused specifically on explainability, rather than having it be an offshoot of other projects
  • Embedding explainability into models and algorithms may (or may not) increase computational complexity; the degree of explainability may come at a cost. We would like to understand what those costs are and how to make those tradeoffs
• Needs from the domain sciences
  • Understanding of the tradeoffs between interpretability and explainability: which is needed, to what degree, and when
  • Understanding of what the domain scientist needs to trust a model (metrics?): what is a satisfactory explanation/interpretation?
  • Representation of domain knowledge / known causal relationships
• Contributions to the broader AI community
  • 3-5 years
  • 10+ years
    • Ability to provide domain scientists, decision makers, and the general public the trust to adopt AI algorithms for critical decisions
Topic 3: Optimization, Design of Experiments, Information Theory/Data
• Uniqueness to DOE
  • Large user facilities
  • Energy/national security
  • Climate
  • Asymmetric loss
  • Experiments not clearly defined
• Contributions from DOE
  • HPC
  • User facility data
  • Dynamic user groups
  • Large use of surrogate models
• Capabilities
  • 3-5 years
    • XXX
    • XXX
  • 10-15 years
    • XXX
    • XXX
Topic 3: Optimization, Design of Experiments, Information Theory/Data (continued)
• Classes of AI problems contributed to
  • DOE dovetails with XXX
• Infrastructure and investment needed
  • XXX
  • XXX
• Needs from the domain sciences
  • XXX
  • XXX
• Contributions to the broader AI community
  • 3-5 years
    • XXX
    • XXX
  • 10+ years
    • XXX
    • XXX