Deep learning for estimating land surface response with uncertainty: soil moisture and other opportunities Kuai Fang¹, Chaopeng Shen¹, & Daniel Kifer² ¹Civil and Environmental Engineering, Penn State University ²Computer Science and Engineering, Penn State University cshen@engr.psu.edu https://github.com/mhpi @ChaopengShen
Content • Deep learning models for soil moisture • Separating uncertainties with LSTM models • Perspectives on incubating DL-powered research
Background • Land surface provides important lower boundary conditions (vapor, heat, momentum, CO2) for the atmosphere • Assimilating soil moisture observations improves weather forecasts • Land surface models often produce biases • Is this idea “oversold”? We need to see...
Background • Soil Moisture Active Passive (SMAP) • Launched recently (2015/04) • 2–3 day revisit time • Senses moisture in the top layer (~5 cm) of soil • Use cases • Weather forecasting (land feedback) • Runoff prediction • Vegetation water stress • Agriculture • How does soil moisture really work? • Inferring precipitation?
Use DL for soil moisture projection • Time-series deep learning (DL): long short-term memory (LSTM), a self-learned memory system • Use DL as a dynamical model! • (hopefully) less bias for atmospheric models • Free from structural assumptions • May be used to correct feedbacks to the atmosphere (y)
Soil moisture prolongation • Train an LSTM model to map inputs (x) to surface soil moisture (y), with SMAP L3_P as the training target • Inputs (x): atmospheric forcing (NLDAS); static attributes (soil texture, slope, land cover, irrigation, depth to water table, etc.); optionally, Land Surface Model (Noah) solutions
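The "self-learned memory system" at the heart of this setup can be sketched with a toy, pure-Python LSTM cell. This is a hypothetical scalar-state illustration of the gating mechanism only, not the actual model, which uses vector states and a deep learning framework:

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

class LSTMCell:
    """Toy single-unit LSTM cell (scalar input, scalar hidden state).

    A sketch of the memory mechanism in the soil-moisture LSTM; all
    weights here are random stand-ins, not trained values.
    """

    def __init__(self, seed=0):
        rng = random.Random(seed)
        # One (input weight, recurrent weight, bias) triple per gate.
        self.w = {g: (rng.uniform(-0.5, 0.5), rng.uniform(-0.5, 0.5), 0.0)
                  for g in ("i", "f", "o", "g")}

    def step(self, x, h, c):
        pre = {g: wx * x + wh * h + b for g, (wx, wh, b) in self.w.items()}
        i = sigmoid(pre["i"])      # input gate: how much new info to write
        f = sigmoid(pre["f"])      # forget gate: how much memory to keep
        o = sigmoid(pre["o"])      # output gate
        g = math.tanh(pre["g"])    # candidate cell update
        c = f * c + i * g          # cell state = the "self-learned memory"
        h = o * math.tanh(c)       # hidden state, read out as the prediction
        return h, c

cell = LSTMCell()
h, c = 0.0, 0.0
# Feed a short synthetic forcing series (e.g., rainfall pulses).
for x in [0.0, 1.0, 0.0, 0.0, 0.5]:
    h, c = cell.step(x, h, c)
```

The forget gate is what lets the model carry soil-moisture "memory" across dry spells instead of reacting only to the current forcing.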
Using LSTM, we prolonged SMAP to … • Spatio-temporally seamless coverage
Using LSTM, we prolonged SMAP to … • Comparison with in-situ data • Long-term projections • DL models are powerful: combined DL-SMAP could help improve soil moisture estimates • But can we estimate the uncertainty of those DL models? Important for data assimilation (DA)
Deep learning models for soil moisture • Separating uncertainties with LSTM models • Perspectives on incubating DL-powered research
How to estimate the uncertainty? (MCD+A) • Input-dependent aleatoric uncertainty (σx): cannot be reduced by more data; its magnitude may be predictable; predicted by the LSTM along with yi • Epistemic uncertainty: comes from uncertain model parameters (weights W); captured by Monte Carlo dropout (MCD), which replaces W with dropout weights Ŵ in the deep network f(x) • (Shen 2018 WRR; Kendall and Gal 2017)
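As a sketch of how the aleatoric term can be learned alongside yi, here is the heteroscedastic Gaussian negative log-likelihood in the style of Kendall and Gal (2017), in plain Python (the function name is ours):

```python
import math

def heteroscedastic_nll(y, y_hat, log_var):
    """Per-sample negative log-likelihood with input-dependent
    (aleatoric) variance, up to an additive constant.

    The network predicts log(sigma_x^2) alongside y_hat; the loss
    trades residual size against the predicted noise level, so
    sigma_x is learned without any noise labels.
    """
    return 0.5 * math.exp(-log_var) * (y - y_hat) ** 2 + 0.5 * log_var

# A large predicted variance dampens the penalty on a large residual...
big_err_big_var = heteroscedastic_nll(1.0, 0.0, math.log(4.0))
# ...but inflating the variance on an accurate prediction costs too,
# via the 0.5 * log_var term.
small_err_big_var = heteroscedastic_nll(0.1, 0.0, math.log(4.0))
small_err_small_var = heteroscedastic_nll(0.1, 0.0, math.log(0.04))
```

Predicting log-variance rather than variance keeps the loss numerically stable and sigma_x positive by construction.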
Uncertainty estimation with deep networks? • Gal 2016 argued that an NN with Monte Carlo dropout approximates a deep Gaussian process (cited 807 times!) http://mlg.eng.cam.ac.uk/yarin/blog_3d801aa532c1ce.html • Issues: • (Very) approximate correspondence between the neural network and the Gaussian process (prior & variational distribution). Does this really work? • Requires a hyper-parameter (dropout rate), conditioned by training data • Does it really measure similarity? • Does it accurately reflect uncertainty?
Probabilistic DL Model • The DL model maps forcing inputs to a soil moisture prediction (y) and an aleatoric uncertainty (σx) • Monte Carlo dropout: n forward passes with different dropout masks (dropout models 1…n) give an ensemble ŷ1, ŷ2, …, ŷn • Combined aleatoric and epistemic uncertainty: σcomb² = σmc² + σx², where σmc is the standard deviation of the MCD ensemble (its magnitude depends on the dropout rate)
Can the predicted uncertainty match the error magnitude? • Training: 2015/04–2016/04 • Validation: 2016/04–2017/04, used to tune the hyper-parameter (σmc is a function of the dropout rate) • Temporal test: 2017/04–2018/04, same pixels as the training set
Do those two uncertainties behave as intended? • Does the aleatoric uncertainty (σx) respond to aleatoric noise? Gaussian noise with standard deviation σnoise was added to the SMAP observations
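The noise-injection test design can be sketched as follows. `add_noise` and the sample values are hypothetical; the point is that the injected σnoise is recoverable from the perturbed observations, so a well-behaved σx should grow to track it after retraining:

```python
import random
import statistics

def add_noise(obs, sigma_noise, seed=0):
    """Perturb SMAP-like observations with Gaussian noise of known
    standard deviation sigma_noise. Retraining on these perturbed
    targets should inflate the learned aleatoric sigma_x accordingly.
    """
    rng = random.Random(seed)
    return [o + rng.gauss(0.0, sigma_noise) for o in obs]

# Hypothetical flat soil-moisture record (volumetric, m3/m3).
obs = [0.25] * 10_000
noisy = add_noise(obs, sigma_noise=0.03)
est = statistics.pstdev(noisy)  # recovers roughly the injected level
```

Repeating this over a range of σnoise values and plotting the learned σx against σnoise is the check the slide refers to.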
Do those two uncertainties behave as claimed? • Does the epistemic uncertainty behave like a Gaussian process? (figure: results in and around the training basin) • First time it was reported that MCD responds to similarity!
What if training data is not representative? (figure: representative training set vs. biased training set)
Conclusions • An automatically estimated error model! (disruptive to UQ?) • We showed that σx & σmc mostly work as intended. Through spatial autocorrelation (in input attributes), σmc does measure similarity. • When training data is representative, MCD+A is good at predicting error magnitude, but requires a hyper-parameter. If training data is biased, σmc will be misled. Often this evaluation was not thorough. • A better prior error model helps
Deep learning models for soil moisture • Separating uncertainties with LSTM models • Perspectives on incubating DL-powered research
Process-based modeling vs machine learning ===== BDML strength ===== • Built from the top-down, directly from observations → accurate • Less biased • Identify things we don’t know? • Highly efficient in computation ===== Limitations ===== • Can’t observe everything! • May be difficult to interpret • May not fully respect physical laws • Does not understand causal relationships ===== PBM strength ===== • Built from the bottom-up to observe emergent patterns • We know what we put in • We can do experiments & identify causal relationships ===== Limitations ===== • Human biases • Parameter calibration • What we don’t know? • Errors compound? • Synergy between top-down and bottom-up?
Potential hydrologic DL applications New DL-based method • Challenging-to-model problems • Prepare inputs • Dynamical predictions • Scaling • Parameterization • Measure information content of predictors • …… • Extract knowledge?
How can we incubate progress as a community? • Open competitions as an organizing event • Community organization, shared resources (data, models, infrastructure…)
Interpretive/interrogative • Visualization • Relevance backpropagation • Trained explanation network
Thank you! Email: Chaopeng Shen cshen@engr.psu.edu @ChaopengShen https://github.com/mhpi
How does the prior error model impact uncertainty estimation? • Apply an Inverse-Gamma distribution as the prior on σx • Temporal generalization test result
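A hedged reconstruction of how such a prior could enter training, assuming MAP estimation with hypothetical hyper-parameters α and β (the slide does not give the exact form used):

```latex
% Inverse-Gamma prior on the aleatoric variance:
%   p(\sigma_x^2) \propto (\sigma_x^2)^{-\alpha-1}\, e^{-\beta/\sigma_x^2}
% Taking the negative log adds a regularizer to the heteroscedastic loss:
\mathcal{L}
  = \underbrace{\frac{(y-\hat{y})^2}{2\sigma_x^2}
      + \frac{1}{2}\log \sigma_x^2}_{\text{Gaussian NLL}}
  + \underbrace{(\alpha + 1)\log \sigma_x^2
      + \frac{\beta}{\sigma_x^2}}_{\text{Inverse-Gamma prior}}
```

The β/σx² term keeps σx away from zero, which is one way a prior error model can stabilize the uncertainty estimate when training data are limited or biased.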