Explore robust machine learning and artificial intelligence approaches to enhance accuracy and stability. Focus on unique challenges in science applications and leverage computational capabilities to improve robustness. Develop new capabilities and advance deep network architecture design. Characterize loss landscapes to inform model design and improve interpretability. Utilize variable precision for improved performance.
Numerical aspects of learning (improving robustness and stability)
Co-leads: Sandeep Madireddy (Argonne), Clayton Webster (ORNL), Stefan Wild (Argonne)
Writer: Rachel Harken (ORNL)
4 Crosscut Opportunities
• Robust ML and AI approaches to increase trust
• Advanced deep network architecture design
• Characterizing the loss landscape of science-informed ML
• Exploiting variable precision for performance

What scientific grand challenges could these address?
• Robustness is an essential ingredient for all AI models that augment/model science
• Particularly important for mission-critical and high-risk scenarios
• Important theoretical guarantees could increase and justify adoption in complex scientific applications
Robust ML and AI Approaches to Increase Trust

What is this?
• Build AI models that are accurate and stable, where stability is with respect to variability/perturbations in the data
• Develop strategies for achieving this robustness through the choice of architecture, initialization, and optimization

What is unique to/important for science?
• The data, information content, and noise distributions are unique compared to industry applications
• Consequences and risks can be much higher when deployed in practice
• Computational and numerical capabilities in the DOE ecosystem could be used to improve robustness
• The diversity of the application space provides a richer testbed for developing and testing robustness approaches

New capabilities imagined
• 3-5 years
- A suite of databases and challenge problems, common and transferable between science applications, that can be used to test and improve the robustness of AI models
- Novel architecture formulations (e.g., implicit networks)
- Domain-informed representation learning (input-space representations)
- Robustness metrics becoming commonplace in architecture choice
• 10-15 years
- Rigorous theoretical basis involving approximation theory for domain-informed architecture choice/design
- Probabilistic approaches to deep learning and uncertainty quantification
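A minimal sketch of what a data-perturbation robustness metric could look like in practice; the metric, toy model, and noise scale here are illustrative assumptions, not prescriptions from the slides:

```python
# Minimal sketch (illustrative assumptions throughout): estimate a
# perturbation-robustness metric for a trained model by measuring how
# much its outputs drift under Gaussian perturbations of the inputs.
import torch

def perturbation_sensitivity(model, x, sigma=0.01, n_samples=32):
    """Mean relative output change under input noise; smaller = more stable."""
    model.eval()
    with torch.no_grad():
        y = model(x)
        drift = 0.0
        for _ in range(n_samples):
            y_pert = model(x + sigma * torch.randn_like(x))
            drift += (torch.norm(y_pert - y) / torch.norm(y)).item()
    return drift / n_samples

# Hypothetical stand-in for a science surrogate model.
model = torch.nn.Sequential(torch.nn.Linear(8, 64), torch.nn.Tanh(),
                            torch.nn.Linear(64, 1))
x = torch.randn(128, 8)
print(perturbation_sensitivity(model, x))
```

A metric like this could be reported alongside accuracy when comparing candidate architectures, in the spirit of making robustness metrics commonplace in architecture choice.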
Advanced Deep Network Architecture Design

What is this?
• Design innovative deep networks to resolve physical and engineering systems comprising multiple complex phenomena
• This requires capabilities beyond the black-box tools developed by industry, which lack important properties such as stability and robustness

What math needs to be done to enable future advances in science?
• How to design continuous ML models given the desired physical and/or analytical properties?
• How to design proper discretization schemes to enforce the desired numerical properties?
• How to integrate ML models into well-established simulators and accelerate the solvers?

New capabilities imagined
• 3-5 years
- Advance closure models in CFD
- New preconditioners for systems derived in physics and engineering applications
• 10-15 years
- New deep network architectures that can accommodate unstructured temporal/spatial meshes
- Transfer of the advanced architectures into state-of-the-art capabilities in applications

What scientific grand challenges could this address?
• Noisy/adversarial data from experiments/simulation
• High-dimensional input parameters
• Irregular data geometry
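To make the continuous-model/discretization point concrete, here is a minimal sketch of the widely noted correspondence between a residual block and a forward (explicit) Euler step of an ODE; the vector field f and step size h are illustrative choices:

```python
# Minimal sketch, assuming the ODE view of deep networks: a ResNet block
# is one explicit Euler step x_{k+1} = x_k + h * f(x_k; theta) of the
# continuous model dx/dt = f(x; theta). Stability then follows from the
# step size h and the properties of f, as in classical numerical analysis.
import torch
import torch.nn as nn

class EulerResidualBlock(nn.Module):
    """One explicit-Euler step of a learned vector field f."""
    def __init__(self, dim, h=0.1):
        super().__init__()
        self.h = h  # step size: the discretization parameter
        self.f = nn.Sequential(nn.Linear(dim, dim), nn.Tanh(),
                               nn.Linear(dim, dim))

    def forward(self, x):
        return x + self.h * self.f(x)

# Stacking K blocks integrates the ODE over time T = K * h.
net = nn.Sequential(*[EulerResidualBlock(16, h=0.1) for _ in range(10)])
x = torch.randn(4, 16)
print(net(x).shape)
```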
Characterizing the Loss Landscape of Science-informed ML

What is this?
• Understanding properties of critical points and the landscape around them using visualization and analysis
• Understanding how the loss landscape affects the performance of training algorithms
• Characterizing the effects of regularization on the loss landscape

What is unique to/important for science?
• Characterizing the loss landscape in order to expedite training of ML models by informing ML model design choices (hyperparameter optimization, initializations)
• Understanding how regularization encourages certain properties of critical points and landscapes will provide new guides for improving the interpretability of neural networks via regularization

New capabilities imagined (3-5 years)
• Tackle short-horizon bias: mathematical theory that can bound the number of critical points for a given loss function
• Improved understanding of the loss landscape for more complicated ML models of practical interest, e.g., GANs and physics-informed ML
• Training parameterized models on near-term quantum devices
• Multiplicity, bifurcation, and other critical behavior regularly encountered in chemically reacting systems
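A minimal sketch of one common way to probe the landscape around a trained point: evaluating the loss along a random direction in weight space to get a 1-D slice. The normalization scheme and toy model are illustrative assumptions:

```python
# Minimal sketch of random-direction loss-landscape probing: evaluate the
# loss along the line theta + alpha * d through the current parameters.
import torch

def loss_slice(model, loss_fn, x, y, n_points=21, radius=1.0):
    params = list(model.parameters())
    base = [p.detach().clone() for p in params]
    # Random direction in weight space, normalized to unit overall length.
    direction = [torch.randn_like(p) for p in params]
    scale = torch.sqrt(sum((d ** 2).sum() for d in direction))
    direction = [d / scale for d in direction]
    alphas = torch.linspace(-radius, radius, n_points)
    losses = []
    with torch.no_grad():
        for a in alphas:
            for p, b, d in zip(params, base, direction):
                p.copy_(b + a * d)
            losses.append(loss_fn(model(x), y).item())
        for p, b in zip(params, base):  # restore the original weights
            p.copy_(b)
    return alphas, losses

model = torch.nn.Sequential(torch.nn.Linear(4, 16), torch.nn.Tanh(),
                            torch.nn.Linear(16, 1))
x, y = torch.randn(64, 4), torch.randn(64, 1)
alphas, losses = loss_slice(model, torch.nn.MSELoss(), x, y)
print(list(zip(alphas.tolist(), losses))[:3])
```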
Exploiting Variable Precision for Performance

What is this?
• Performance from faster arithmetic, data movement, and data storage
• Challenges:
- Numerical implications: reevaluate the numerical analysis
- Can it be automated?
- Issues with different vendor "standards" for half precision
- Can we verify we have meaningful results? Redefining what accuracy means

What is unique to/important for science?
• Bigger problems, faster
• Maintaining accuracy while enhancing throughput for scientific experiments

New capabilities imagined
• LA library for half and mixed precision
• Fast initial guesses and preconditioners
• Precompute the minimum precision needed to train models
• Precision-informed neural architecture search

What scientific grand challenges could this address?
• Make edge computing a reality for science
• Higher-fidelity models for the same cost

Note: Google TPU bfloat16 has a largest representable floating-point number of O(10^38), comparable to IEEE single precision; the largest representable IEEE float16 number is 65,504.
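A minimal sketch of the classic mixed-precision pattern such a linear-algebra library would build on: solve cheaply in low precision, then iteratively refine with residuals computed in high precision. NumPy has no half-precision solver, so the low-precision factor is emulated here by rounding to float16:

```python
# Minimal sketch of mixed-precision iterative refinement: the cheap
# low-precision solve does the heavy lifting; high-precision residuals
# recover accuracy.
import numpy as np

def mixed_precision_solve(A, b, iters=5):
    A_lo = A.astype(np.float16).astype(np.float64)  # emulated low-precision factor
    x = np.linalg.solve(A_lo, b)                    # cheap low-precision solve
    for _ in range(iters):
        r = b - A @ x                               # residual in high precision
        x += np.linalg.solve(A_lo, r)               # refine with the cheap solver
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((100, 100)) + 100 * np.eye(100)  # well conditioned
b = rng.standard_normal(100)
x = mixed_precision_solve(A, b)
print(np.linalg.norm(A @ x - b))  # residual shrinks toward float64 accuracy
```

Convergence of this refinement loop depends on the conditioning of A, which is exactly the "ways to express accuracy and problem conditioning" need raised below.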
Advanced Deep Network Architecture Design

What is this?
• Design innovative deep networks to resolve physical and engineering systems comprising multiple complex phenomena
• This requires capabilities beyond the black-box tools developed by industry, which lack important properties such as stability and robustness

What math needs to be done to enable future advances in science?
• How to design continuous ML models given the desired physical and/or analytical properties?
• How to design proper discretization schemes to enforce the desired numerical properties?
• How to integrate ML models into well-established simulators and accelerate the solvers?

New capabilities imagined
• 3-5 years
- Advanced NNs to infer and replace, e.g., closure models in CFD
- New preconditioners for systems derived in physics and engineering applications
• 10-15 years
- New deep network architectures that can accommodate unstructured temporal/spatial meshes
- Transfer of the advanced architectures into state-of-the-art capabilities in applications
Advanced Deep Network Architecture Design

To maximize impact, need ...
• … from domain sciences
• … from other AI4Sci crosscuts/enabling techs: <->
• … from broader AI community: <->

What scientific grand challenges could this address?
• Noisy/adversarial data from experiments/simulation
• High-dimensional input parameters
• Irregular data geometry

State of the art
• Poor stability for deep networks based on explicit architectures
• Use existing deterministic adjoint methods to compute gradients to train standard ResNets
Characterizing the Loss Landscape of Science-informed ML

What is this?
• Understanding the properties of critical points and the landscape around them using analytical and visualization techniques
• Understanding how the loss landscape affects the performance of training algorithms
• Characterizing the effects of regularization on the loss landscape

What is unique to/important for science?
• Characterizing the loss landscape in order to expedite training of ML models by informing ML model design choices (hyperparameter optimization, initializations)
• Understanding how regularization encourages certain properties of critical points and landscapes will provide new guides for improving the interpretability of neural networks via regularization

New capabilities imagined (3-5 years)
• Tackle short-horizon bias: mathematical theory that can bound the number of critical points for a given loss function
• Improved understanding of the loss landscape for more complicated ML models of practical interest, e.g., GANs and physics-informed ML
• Scalable stability analysis of critical points for larger networks
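A minimal sketch of stability analysis at a critical point, for a model small enough to form the full Hessian: classify the point by the signs of the Hessian eigenvalues (all positive indicates a local minimum; mixed signs indicate a saddle). The toy least-squares loss is an illustrative stand-in; the "scalable" version above would need Hessian-free spectral estimates:

```python
# Minimal sketch: find a critical point, then classify it via the
# eigenvalues of the loss Hessian.
import torch
from torch.autograd.functional import hessian

X = torch.randn(32, 3)
y = torch.randn(32)

def loss(w):  # tiny least-squares "network" in one flat parameter vector
    return ((X @ w - y) ** 2).mean()

w = torch.zeros(3, requires_grad=True)
opt = torch.optim.LBFGS([w], max_iter=100)

def closure():
    opt.zero_grad()
    l = loss(w)
    l.backward()
    return l

opt.step(closure)
H = hessian(loss, w.detach())        # full 3x3 Hessian at the solution
eigs = torch.linalg.eigvalsh(H)
print("Hessian eigenvalues:", eigs)  # all positive here -> local minimum
```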
Characterizing the Loss Landscape of Science-informed ML

What scientific grand challenges could this address?
• Training parameterized models on near-term quantum devices
- Currently, circuit design is driven by the quantum system of interest, and training is done with brute-force gradient descent or gradient-free optimization
- Understanding how circuit design affects the loss landscape can lead to noise-resilient circuits, or can reduce the computational cost of training by enabling more informed choices of initialization
- Understand how incorporating information about the physical system affects the loss landscape and, ultimately, training
• Multiplicity, bifurcation, and other critical behavior regularly encountered in chemically reacting systems (e.g., systems with coupled bulk-phase and surface reactions); examples include catalytic chemical conversion and chemical vapor deposition of materials. Machine learning algorithms with a rigorous numerical foundation are required to build surrogate models for such systems
- Allow treatment of physically valid multiplicity vs. experimental repeatability
- Enable training from a blend of experimental measurements and numerical solutions
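A minimal sketch of the gradient-free training loop mentioned above, here in the style of SPSA (simultaneous perturbation stochastic approximation); cost() is a hypothetical stand-in for a measured expectation value of a parameterized circuit, and the gains a, c are illustrative:

```python
# Minimal sketch of SPSA-style gradient-free training: two cost
# evaluations per step estimate a descent direction in all parameters
# simultaneously.
import numpy as np

rng = np.random.default_rng(0)

def cost(theta):  # hypothetical stand-in for a measured circuit observable
    return np.sum(np.sin(theta) ** 2)

theta = rng.uniform(-np.pi, np.pi, size=6)
a, c = 0.2, 0.1  # step size and perturbation magnitude (illustrative)
for k in range(200):
    delta = rng.choice([-1.0, 1.0], size=theta.shape)  # random +/-1 perturbation
    g = (cost(theta + c * delta) - cost(theta - c * delta)) / (2 * c) * delta
    theta -= a * g                                      # SPSA update
print("final cost:", cost(theta))
```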
Exploiting Variable Precision for Performance

What is this?
• Performance from faster arithmetic, data movement, and data storage
• Challenges:
- Numerical implications
- Can it be automated?
- Issues with different "standards" for half precision
- Can we verify we have meaningful results?
- Reevaluate the numerical analysis

What is unique to/important for science?
• Bigger problems, faster
• Maintaining accuracy
• Redefining what accuracy means
• Enhanced throughput for scientific experiments

New capabilities imagined
• 3-5 years
- LA library for half and mixed precision
- Fast initial guesses and preconditioners
- Precompute the minimum precision needed to train a model
• 10-15 years
- Fully automated
- Precision-informed neural architecture search

Note: Google TPU bfloat16 has a largest representable floating-point number of O(10^38), comparable to IEEE single precision; the largest representable IEEE float16 number is 65,504.
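A short demonstration of why differing half-precision "standards" matter numerically: IEEE float16 overflows just above 65,504, while bfloat16 keeps the float32 exponent range (shown here via float32, since NumPy has no bfloat16 type):

```python
# Minimal sketch of the half-precision range gap noted above.
import numpy as np

x = np.float16(60000.0)
print(x * np.float16(2.0))        # inf: the product exceeds the float16 maximum
print(np.finfo(np.float16).max)   # largest finite float16 (65504)
print(np.finfo(np.float32).max)   # ~3.4e38; bfloat16 shares this exponent range
```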
Exploiting Variable Precision for Performance

To maximize impact, need ...
• … from domain sciences:
- Benchmarks
- Experiments and data with different precision levels
• … from other AI4Sci crosscuts/enabling techs:
- Benchmarks
- Hardware: variable precision adjustable by the user
- Software: performance and debugging tools
- Software: ways to express accuracy and problem conditioning
• … from broader AI community:
- Need for standards
- Performance engineering tools sensitive to mixed precision

What scientific grand challenges could this address?
• Make edge computing a reality for science
• Higher-fidelity models for the same cost
• Can affect all scientific computations

State of the art
• Vendors are implementing different standards
• Deep learning frameworks (e.g., TensorFlow), the SLATE library, and a few applications use mixed precision
Robust ML and AI Approaches to Increase Trust

To maximize impact, need ...
• … from domain sciences:
- Challenge problems and datasets
- Test suites
• … from other AI4Sci crosscuts/enabling techs:
- Uncertainty quantification in addition to robustness
• … from broader AI community:
- Robustness metrics

What scientific grand challenges could this address?
• Robustness is an essential ingredient for all AI models that augment/model science
• Particularly important for mission-critical and high-risk scenarios
• Important theoretical guarantees could increase and justify adoption
Rough Plan
• 9:00 Intro & overview
• 9:30 Break into subgroups
• 11:30 Pick up lunch in other building
• 11:45 Meet back to critique generated slides and eat lunch
• 12:30 Head back / co-leads prep slides
Primary topics and subgroups

1. Inspiration from the continuum limit
• New insights into the design of better (novel) architectures and faster learning algorithms (increased convergence)
• A natural setting for very deep/wide networks, with potentially finer control over parallel scalability and required computing resources
• A solid theoretical foundation to achieve well-understood and predictable behavior (e.g., dynamical systems, PDEs, integral representations, splines)
• Complexity reduction and stability guarantees (the total variation of the numerical solution at a fixed time remains bounded as the step size goes to zero; e.g., ResNets (forward Euler) vs. ImplicitNets (backward Euler))
• An optimal control framework for forward propagation (solution of the state equations) and backpropagation (solution of the adjoint equations)
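A minimal sketch contrasting the two discretizations named above: an explicit (forward Euler / ResNet-style) step versus an implicit (backward Euler / ImplicitNet-style) step, the latter solved here by simple fixed-point iteration. The vector field, step size, and iteration count are illustrative assumptions:

```python
# Minimal sketch: explicit vs. implicit Euler steps of a learned field.
# The implicit step x_{k+1} = x_k + h * f(x_{k+1}) inherits the favorable
# stability of backward Euler when f is contractive.
import torch
import torch.nn as nn

class ImplicitEulerBlock(nn.Module):
    def __init__(self, dim, h=0.5, n_fixed_point=20):
        super().__init__()
        self.h, self.n = h, n_fixed_point
        self.f = nn.Sequential(nn.Linear(dim, dim), nn.Tanh())

    def forward(self, x):
        z = x  # solve z = x + h * f(z) by fixed-point iteration
        for _ in range(self.n):
            z = x + self.h * self.f(z)
        return z

x = torch.randn(4, 16)
explicit = x + 0.5 * torch.tanh(x)    # one forward-Euler step of a tanh field
implicit = ImplicitEulerBlock(16)(x)  # one backward-Euler (implicit) step
print(explicit.shape, implicit.shape)
```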
Primary topics and subgroups

2. Robust ML and AI approaches to increase trust
• Specification testing: techniques to test that ML systems are consistent with properties (such as invariance, robustness, physics) desired of the underlying system
• Robust training: training algorithms that produce models that not only fit training data well but are also consistent with a list of specifications, e.g., robustness (see the sketch below)
• Formal verification: efficient approaches to setting geometric bounds based on a given specification or the underlying physics
• Computational cost of enforcing robustness, e.g., adversarial training
• Robustness considerations for reinforcement learning
• Connection between representation learning and robustness
• Reduced and variable precision (for training, inference, and beyond)
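A minimal sketch of robust training via a single-step adversarial (FGSM-style) inner loop, one way to see the "computational cost of enforcing robustness" noted above (two passes per step instead of one); the model, data, and epsilon are illustrative placeholders:

```python
# Minimal sketch of adversarial (robust) training: craft a worst-case
# perturbation within an eps-ball, then fit the model on the perturbed data.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 2))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()
x, y = torch.randn(64, 8), torch.randint(0, 2, (64,))

for step in range(100):
    # Inner step: one-shot FGSM perturbation with eps = 0.1 (illustrative).
    x_adv = x.clone().requires_grad_(True)
    loss_fn(model(x_adv), y).backward()
    with torch.no_grad():
        x_adv = x + 0.1 * x_adv.grad.sign()
    # Outer step: minimize the loss on the perturbed data (robust objective).
    opt.zero_grad()
    loss = loss_fn(model(x_adv), y)
    loss.backward()
    opt.step()
print("final robust loss:", loss.item())
```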
Primary topics and subgroups

3. Understanding critical points of neural networks (global minima, local minima, saddle points)
• Understanding the properties of critical points and the landscape around them from an analytical perspective, and characterizing the effects of regularization on the loss landscape
• There are intensive efforts in this direction; however, most current results focus on simple models (e.g., linear, shallow networks) under idealized conditions
• Extensions to more general, complex models will further facilitate our understanding of these aspects
• Connection between adversarial robustness and the landscape of the objective function leading to the local minimum