
Calibrating Uncertainties in Deep Learning Models at Scale


Presentation Transcript


  1. Calibrating Uncertainties in Deep Learning Models at Scale: Research Challenges and Opportunities at the Interface of Machine Learning and Uncertainty Quantification. July 25, 2019. Gemma J. Anderson, Jim A. Gaffney, Sam A. Jacobs, Brian Van Essen and Brian K. Spears.

  2. Using deep learning for science has its own challenges... • We need meaningful uncertainties from our deep learning models: can we trust our predictions? • We are developing new methods to generate data-driven, calibrated uncertainties from arbitrary deep learning models • These developments will be essential in making deep learning a viable tool for scientific applications • Other challenges • Physical systems obey strict rules: are these captured by the network? • How to do Bayesian inference in the context of transfer learning? • …

  3. Big picture of uncertainty quantification. Goal: prediction uncertainty on the quantities of interest (QoI). Given: • the uncertainty in the experiment • the bias of simulations versus experiment • the uncertainty in the input physics parameters • the uncertainty in the surrogate model • … … what is the uncertainty in the prediction of the QoI?

  4. We strive to advance our predictive capability by challenging simulation with experiment. Traditional pillar: high-performance computing (HYDRA simulation). Traditional pillar: large-scale experiments (NIF X-ray image). • Goal is to achieve fusion ignition, where the energy generated outstrips the energy lost, demonstrating the potential of fusion as a viable energy source • Experiments are few (~10 shots) and very costly • HYDRA simulates the NIF implosions and helps us understand the experiments • High-dimensional input parameter space, including physics parameters and laser input parameters • Outputs include scalars, images and time histories • Computationally expensive

  5. We strive to advance our predictive capability by challenging simulation with experiment. Traditional pillar: high-performance computing (HYDRA simulation). Traditional pillar: large-scale experiments (NIF X-ray image). We compare a few key scalars (Y_sim, T_ion,sim, P_2,sim versus Y_expt, T_ion,expt, P_2,expt), leaving our models less constrained.

  6. Deep learning allows for a much richer comparison. New pillar: machine learning to compare simulation and experiment, alongside the traditional pillars of high-performance computing (HYDRA simulation) and large-scale experiments (NIF X-ray image). Deep learning will allow us to use our full data sets to make our models more predictive.

  7. Train a deep neural network (DNN) surrogate model that maps inputs to outputs. The very expensive simulation maps input parameters (laser inputs and physics parameters) to multi-modal outputs (scalars, images and time histories); the DNN reproduces those outputs. Deep neural networks are accurate, learn highly non-linear mappings, and are scalable. We want a model that is equipped with calibrated uncertainties.

  8. We developed a cyclic system of sub-networks to engineer required performance features: a cyclically-consistent generative adversarial network with an autoencoder. Performance features: 1. Uses all the data and engineers the latent space 2. Enforces physical consistency: predictions look like training examples 3. Enforces self-consistency: regularizes the ill-posed inverse. Components: forward model (input parameters X → predicted output Y), inverse model (predicted parameters X), discriminator model, physical consistency loss. For details see Anirudh et al. (2019), Cyclical Regularization for Robust Surrogates in Inertial Confinement Fusion (to appear).
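As a rough illustration of those three features, the sketch below composes a prediction loss, an adversarial physical-consistency loss, and a cycle (self-consistency) loss. The handles forward_model, inverse_model and discriminator, the specific loss choices, and the equal weighting are illustrative assumptions; the actual formulation is given in Anirudh et al. (2019).

```python
import torch
import torch.nn.functional as F

def cyclic_surrogate_losses(forward_model, inverse_model, discriminator, x, y):
    """Sketch of the loss terms implied by the cyclic architecture.
    x: simulation input parameters, y: simulation outputs."""
    y_pred = forward_model(x)                 # surrogate prediction of the outputs
    x_cycle = inverse_model(y_pred)           # map the prediction back to parameters

    prediction_loss = F.mse_loss(y_pred, y)   # fit the simulation data
    physical_consistency_loss = F.binary_cross_entropy(
        discriminator(y_pred),                # predictions should look like training examples
        torch.ones(x.shape[0], 1))
    cycle_loss = F.mse_loss(x_cycle, x)       # self-consistency regularizes the ill-posed inverse

    return prediction_loss + physical_consistency_loss + cycle_loss
```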

  9. The model captures important interdependencies between different observables. Clear physics correlation in the simulated data* (work performed by Rushil Anirudh). *Obtained by comparing images in different spectral ranges.

  10. The model also captures important interdependencies between different observables. Clear physics correlation in the simulated data, but standard ML on images (convolutional network + images only) does not identify this feature* (work performed by Rushil Anirudh). *Obtained by comparing images in different spectral ranges.

  11. The model also captures important interdependencies between different observables. Clear physics correlation in the simulated data, and the improved architecture (improved network + images + scalars) discovers the physics trends* (work performed by Rushil Anirudh). *Obtained by comparing images in different spectral ranges.

  12. Types of Uncertainty in Neural Networks • DNNs typically consist of deterministic weights and biases and consequently give deterministic outputs, which provides no measure of how certain they are in their predictions • To equip the DNN with uncertainties, we allow the weights and biases to have probability distributions • Epistemic: surrogate model uncertainty (which model generated the data?); can be reduced with more data • Aleatoric: randomness in the system, governed by random physical phenomena; not relevant for our application • The prediction marginalizes over all possible weights, p(y | x, D) = ∫ p(y | x, θ) p(θ | D) dθ, where p(θ | D) is the posterior of the network weights θ

  13. Dropout as a Bayesian Approximation • Dropout randomly sets units in the neural network to zero; the probability of retaining a unit is the dropout keep rate, a variational parameter • Originally a method to protect against overfitting • Gal and Ghahramani (2016) showed that performing dropout both during training and when making predictions is equivalent to approximate Bayesian inference in deep Gaussian processes • Dropout gives us a measure of the epistemic uncertainty in the deep neural network • No additional complexity or compute time (animation by Michael Pearce)
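A minimal Monte Carlo dropout sketch in PyTorch, assuming a small fully connected regressor (the layer sizes and keep rate are illustrative, not the actual surrogate): keeping dropout stochastic at prediction time and repeating the forward pass gives a distribution of outputs whose spread estimates the epistemic uncertainty.

```python
import torch
import torch.nn as nn

class DropoutSurrogate(nn.Module):
    """Small fully connected network with dropout after each hidden layer."""
    def __init__(self, n_in, n_out, keep_rate=0.9, width=128):
        super().__init__()
        p_drop = 1.0 - keep_rate        # PyTorch's Dropout takes the drop probability
        self.net = nn.Sequential(
            nn.Linear(n_in, width), nn.ReLU(), nn.Dropout(p_drop),
            nn.Linear(width, width), nn.ReLU(), nn.Dropout(p_drop),
            nn.Linear(width, n_out),
        )

    def forward(self, x):
        return self.net(x)

def mc_dropout_predict(model, x, n_samples=100):
    """Repeat the forward pass with dropout left on; the mean is the
    prediction and the standard deviation is the epistemic uncertainty."""
    model.train()                       # train() keeps the dropout masks stochastic
    with torch.no_grad():
        samples = torch.stack([model(x) for _ in range(n_samples)])
    return samples.mean(dim=0), samples.std(dim=0)
```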

  14. Apply dropout in the DNN to obtain prediction uncertainties for the QoIs. Inputs: 5 dimensions (2 independent variables, 3 physics parameters). Outputs: 15 scalars and x-ray emission images, from a rapid semi-analytic simulation code [Gaffney]. Workflow (autoencoder + C-GAN surrogate with dropout): 1. Sample the input space with Latin hypercube sampling (see the sketch below) 2. Run 100,000 simulations 3. Train the DNN with dropout 4. Feed an input sample to the DNN many times to make a prediction 5. Repeat steps 3 and 4 with different dropout keep rates
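Step 1 can be sketched with SciPy's Latin hypercube sampler; the five-dimensional bounds below are placeholders, not the real parameter ranges of the semi-analytic code.

```python
import numpy as np
from scipy.stats import qmc

# Latin hypercube sample of the 5D input space
# (2 independent variables + 3 physics parameters).
sampler = qmc.LatinHypercube(d=5, seed=0)
unit_samples = sampler.random(n=100_000)          # points in [0, 1]^5
lower = np.array([0.0, 0.0, -1.0, -1.0, -1.0])    # hypothetical lower bounds
upper = np.array([1.0, 1.0, 1.0, 1.0, 1.0])       # hypothetical upper bounds
inputs = qmc.scale(unit_samples, lower, upper)    # each row is one simulation input
```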

  15. Prediction uncertainties for a key QoI. Coefficient of variation as a function of the two independent variables (physics parameters set to zero) for different dropout keep rates • Coefficient of variation = (standard deviation / mean) * 100 • The model is less confident in regions where it hasn't seen data: for a keep rate of 0.8, uncertainties grow in the extrapolation region outside the training box • Uncertainties decrease as the dropout keep rate increases, so which uncertainty estimate is correct?
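A small NumPy helper for the coefficient-of-variation map, assuming the MC-dropout predictions for one QoI are stacked as (n_mc, n_points):

```python
import numpy as np

def coefficient_of_variation(samples):
    """samples: (n_mc, n_points) MC-dropout predictions for one QoI.
    Returns (standard deviation / mean) * 100 at each point."""
    return 100.0 * samples.std(axis=0) / samples.mean(axis=0)
```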

  16. The uncertainties generated by a dropout DNN are not calibrated to the data; they are user-defined. Does the 50th-percentile interval contain the correct answer 50% of the time? • Prediction uncertainties stem from the choice of neuron activation, loss function, and training hyperparameters • They do not describe the probability that the prediction is correct • When comparing with experimental data, incorrect uncertainties significantly degrade the information content of the experiment and can skew results

  17. Coverage probabilities give a data-driven metric for the quality of our uncertainty models • Train a model with dropout switched on • Make predictions of all QoIs for test data • Compute percentiles for each prediction • Compute coverage probabilities Coverage probability: the fraction of test samples where the true value lies within the percentile for which it was calculated • Want coverage probabilities to be equal to the percentile for which they were calculated, to give calibrated uncertainties
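A sketch of the coverage-probability computation, assuming the MC-dropout samples for a scalar QoI are stacked as (n_mc, n_test) and compared against the true test values:

```python
import numpy as np

def coverage_probability(samples, y_true, percentile=68.3):
    """Fraction of test points whose true value falls inside the central
    `percentile`% interval of the predictive samples.
    samples: (n_mc, n_test), y_true: (n_test,)."""
    tail = (100.0 - percentile) / 2.0
    lo = np.percentile(samples, tail, axis=0)
    hi = np.percentile(samples, 100.0 - tail, axis=0)
    return np.mean((y_true >= lo) & (y_true <= hi))

# Calibrated uncertainties: coverage_probability(...) should be close to
# percentile / 100 for every percentile at which it is evaluated.
```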

  18. Coverage probabilities for models with different dropout keep rates • Varying a single dropout keep rate can tune a single coverage probability • No model gives perfectly calibrated uncertainties (plot: coverage at the 50%, 68.3%, 95.5% and 99.7% percentiles, highlighting the best-calibrated model)

  19. A “good” model does not always result in well-calibrated uncertainties...

  20. Want a model with good MSE and well-calibrated uncertainties

  21. Current methods do not provide us with calibrated uncertainty estimates • Concrete dropout is a variant of dropout that allows the keep rate to be optimized during training, and is claimed to provide well-calibrated uncertainties • For us, concrete dropout always trains to a solution with no dropout • It therefore does not give us calibrated uncertainties

  22. We can use large-scale learning to achieve optimally calibrated uncertainties • We showed only several dropout keep rates (e.g. 0.72, 0.85, 0.89, 0.94, 0.99) and computed coverage probabilities for only a handful of percentiles • The model takes 16 hours to train on one GPU, so it is difficult to explore the full range of dropout keep rates • Use large-scale learning to set the keep rate by layer!

  23. Livermore Big Artificial Neural Networks (LBANN) • Large-scale deep learning toolkit • Accelerates (scientific) deep learning training as data and models grow • Optimized for world-class supercomputers (with thousands of GPUs) but still runs on a laptop • Built on a GPU-accelerated distributed linear algebra library and a high-performance GPU-aware asynchronous communication library • Provides multiple levels of parallelism • Open source (github.com/llnl/lbann)

  24. LBANN Multiple Levels of Parallelism • Single model (network) training • Train a single model fast either in a data parallel (distributed minibatch) or model parallel (distributed linear algebra or convolution) fashion • Multiple model (network) training • Train multiple models simultaneously (in parallel) • Relevant to tournament voting (population-based training), uncertainty quantification, large-scale data augmentation, hyperparameter exploration

  25. Multiple Models Training in LBANN for UQ • Train multiple models simultaneously, in parallel • A model is a set of neural network layers with a loss function, metrics, optimizer(s), etc. • Each model can differ in initialization, hyperparameters, training data, etc. • For calibrated UQ, the models differ only in the dropout keep rate; the training data and other hyperparameters are fixed (see the sketch below)
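LBANN schedules the model instances across GPUs; as a generic stand-in (not the LBANN API), the sketch below trains several keep-rate variants in parallel on a synthetic 1D problem, holding everything except the keep rate fixed.

```python
from concurrent.futures import ProcessPoolExecutor

import torch
import torch.nn as nn

def train_one_model(keep_rate, n_epochs=200):
    """Train one small dropout network; only the keep rate varies between models."""
    torch.manual_seed(0)                                   # identical data and init for every model
    x = torch.linspace(-1.0, 1.0, 256).unsqueeze(1)
    y = torch.sin(3.0 * x) + 0.05 * torch.randn_like(x)
    model = nn.Sequential(
        nn.Linear(1, 64), nn.ReLU(), nn.Dropout(1.0 - keep_rate),
        nn.Linear(64, 1),
    )
    opt = torch.optim.Adam(model.parameters(), lr=1e-2)
    for _ in range(n_epochs):
        opt.zero_grad()
        loss = nn.functional.mse_loss(model(x), y)
        loss.backward()
        opt.step()
    return keep_rate, loss.item()

if __name__ == "__main__":
    keep_rates = [0.7, 0.8, 0.9, 0.95, 0.99]
    with ProcessPoolExecutor() as pool:                    # one worker per model instance
        for kr, mse in pool.map(train_one_model, keep_rates):
            print(f"keep rate {kr:.2f}: final training MSE {mse:.4f}")
```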

  26. Multiple projects are using LBANN • Multi-modal DL for nonproliferation: accelerate self-supervised learning, co-training multiple data spokes • ECP CANcer Distributed Learning Environment (CANDLE): high-dimensional multiple drug response prediction, feature extraction for MD simulation • Cognitive simulation for ICF: training on massive, billion-sample data sets, combining multiple image views with time series samples • And more!

  27. We have shown how we will use large scale learning to equip our predictive models with calibrated uncertainty estimates

  28. Our goal is to have an end-to-end UQ pipeline for our predictive deep learning models • What does it mean to have calibrated uncertainties for images? • How do we simultaneously estimate physics parameters and bias in the context of an “elevated” (transfer-learned) deep neural network? (speak to Jim Gaffney) • How do we link UQ with transfer learning? (Bogdan’s talk from yesterday)

  29. Contact: Gemma J. Anderson - anderson276@llnl.gov References: • Anirudh et al. (2019), Cyclical Regularization for Robust Surrogates in Inertial Confinement Fusion (to appear) • LBANN: github - https://github.com/LLNL/lbann, publications - https://lbann.readthedocs.io/en/latest/publications.html • Gal and Ghahramani (2016), Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning - https://arxiv.org/pdf/1506.02142.pdf This work was performed as part of a Lab Directed Research and Development Strategic Initiative grant award to Brian Spears (18-SI-002). Disclaimer: This document was prepared as an account of work sponsored by an agency of the United States government. Neither the United States government nor Lawrence Livermore National Security, LLC, nor any of their employees makes any warranty, expressed or implied, or assumes any legal liability or responsibility for the accuracy, completeness, or usefulness of any information, apparatus, product, or process disclosed, or represents that its use would not infringe privately owned rights. Reference herein to any specific commercial product, process, or service by trade name, trademark, manufacturer, or otherwise does not necessarily constitute or imply its endorsement, recommendation, or favoring by the United States government or Lawrence Livermore National Security, LLC. The views and opinions of authors expressed herein do not necessarily state or reflect those of the United States government or Lawrence Livermore National Security, LLC, and shall not be used for advertising or product endorsement purposes.
