Quantifying Monte Carlo Uncertainty in the Ensemble Kalman Filter

Kristian Thulin* (CIPR), Geir Nævdal (IRIS), Hans Julius Skaug (UiB, Dept. of Mathematics) and Sigurd Ivar Aanonsen (CIPR). EnKF Workshop, 18-20 June 2008, Park Hotel, Voss, Norway.


Presentation Transcript


  1. Quantifying Monte Carlo Uncertainty in the Ensemble Kalman Filter • Kristian Thulin* (CIPR), Geir Nævdal (IRIS), Hans Julius Skaug (UiB, Dept. of Mathematics) and Sigurd Ivar Aanonsen (CIPR) • EnKF Workshop, 18-20 June 2008, Park Hotel, Voss, Norway

  2. Example: Synthetic 2D case • Reservoir model is simplified to a parabolic equation for single-phase flow • p – dynamic variable • κ – static unknown parameter • g – five-spot sink/source term

  3. Example: Synthetic 2D case • Square triangulated grid • 665 nodes (p) • 1248 triangles (κ) • Red crosses – sources • Black crosses – sinks

  4. Ensemble Kalman filter (EnKF) • Estimates the posterior probability distribution (PDF/CDF) • Statistics are estimated from a finite ensemble of realizations • Solution depends on the sampling of the initial ensemble (and on the data perturbations)
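The dependence on the initial sample and on the data perturbations can be seen even in a toy setting. The sketch below is not the reservoir example from the slides: it assumes a scalar state observed directly (observation operator equal to 1), a standard normal prior, and a perturbed-observation (stochastic) analysis step. The function names and all numerical values are hypothetical and chosen for illustration only.

```python
import random
import statistics

def enkf_update_scalar(prior, obs, obs_var, rng):
    """Stochastic EnKF analysis step for a scalar state observed directly (H = 1)."""
    # The sample variance of the forecast ensemble stands in for the forecast covariance.
    p = statistics.variance(prior)
    gain = p / (p + obs_var)  # Kalman gain for H = 1
    updated = []
    for x in prior:
        # Each member assimilates its own perturbed copy of the observation.
        d = obs + rng.gauss(0.0, obs_var ** 0.5)
        updated.append(x + gain * (d - x))
    return updated

def posterior_mean(seed, n=100):
    """Run one toy EnKF update: same prior distribution, different seed."""
    rng = random.Random(seed)
    prior = [rng.gauss(0.0, 1.0) for _ in range(n)]
    updated = enkf_update_scalar(prior, obs=1.0, obs_var=0.25, rng=rng)
    return statistics.fmean(updated)

# Ten repeated runs with 100 members each give ten different posterior means.
means = [posterior_mean(seed) for seed in range(10)]
print([round(m, 3) for m in means])
```

The spread of the printed means is the Monte Carlo error that the rest of the presentation tries to quantify.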

  5. Motivation: Inconsistent posterior CDFs • Noticed that different initial ensembles resulted in visually very different CDFs • 10 ensembles with 100 members each • Figure panels: initial, updated

  6. Motivation • The posterior CDF is only an estimate for non-linear problems • Would like to sample from the same distribution in each repeated EnKF run • Each run uses a different initial ensemble (e.g. same prior, different seed)

  7. Kolmogorov-Smirnov test • K-S test for two samples • Null hypothesis: the two samples are from the same underlying distribution • Test the runs pair-wise • Bonferroni correction • Null hypothesis: all samples are from the same underlying distribution
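A minimal sketch of the pair-wise testing idea, using only the Python standard library. It assumes the asymptotic Kolmogorov distribution for the p-value (an approximation, not necessarily what the authors used), and the runs are stand-in Gaussian samples rather than EnKF output. All function names are hypothetical.

```python
import bisect
import itertools
import math
import random

def ks_statistic(a, b):
    """Two-sample K-S statistic: largest gap between the two empirical CDFs."""
    a, b = sorted(a), sorted(b)
    d = 0.0
    for x in a + b:
        fa = bisect.bisect_right(a, x) / len(a)
        fb = bisect.bisect_right(b, x) / len(b)
        d = max(d, abs(fa - fb))
    return d

def ks_pvalue(d, n1, n2, terms=100):
    """Approximate p-value from the asymptotic Kolmogorov distribution."""
    ne = n1 * n2 / (n1 + n2)  # effective sample size
    lam = (math.sqrt(ne) + 0.12 + 0.11 / math.sqrt(ne)) * d
    if lam < 0.2:
        return 1.0  # the series converges slowly here and the p-value is ~1
    s = 2.0 * sum((-1) ** (j - 1) * math.exp(-2.0 * j * j * lam * lam)
                  for j in range(1, terms + 1))
    return min(1.0, max(0.0, s))

def reject_same_distribution(runs, alpha=0.05):
    """Test all pairs of runs; Bonferroni splits alpha across the pair-wise tests."""
    pairs = list(itertools.combinations(range(len(runs)), 2))
    threshold = alpha / len(pairs)
    pvals = {(i, j): ks_pvalue(ks_statistic(runs[i], runs[j]),
                               len(runs[i]), len(runs[j]))
             for i, j in pairs}
    return any(p < threshold for p in pvals.values()), pvals

# Stand-in data: 10 runs of 100 members; in the second set the last run is shifted.
rng = random.Random(0)
same = [[rng.gauss(0.0, 1.0) for _ in range(100)] for _ in range(10)]
shifted = same[:-1] + [[x + 1.5 for x in same[-1]]]
print(reject_same_distribution(same)[0], reject_same_distribution(shifted)[0])
```

With 10 runs there are 45 pairs, so each pair-wise test is compared against 0.05 / 45 rather than 0.05.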

  8. Kolmogorov-Smirnov test • Figure: static variable • K-S test for all variables • After the second assimilation step • Points with a p-value below the critical value are blank

  9. Kolmogorov-Smirnov test • This confirms what we have seen visually • Ensemble members become positively correlated during the update • Lorentzen et al. (2005, SPE ATCE) pointed out that forecasts from the PUNQ-S3 study did not pass the K-S test for 100 members

  10. Motivation – ensemble size • Typically 100 ensemble members are used • Good history matches and mean estimates have been reported using 30-40 ensemble members • Good estimates of the uncertainty require a much larger ensemble size

  11. New proposed methodology: Multiple runs • Propose running multiple EnKF runs (m), each with fewer ensemble members (n) • Keeping n x m fixed • Gain more independent information • Members from different runs will be independent samples from the distribution we are seeking • Can construct a confidence interval on the estimated CDF

  12. Confidence interval • Pink background: span of CDFs from the m runs • Solid blue line: mean over the runs • Dashed blue lines: confidence interval on the estimated CDF • Black line: "infinite" ensemble run (10,000 members)
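One plausible way to build such a point-wise interval is sketched below: evaluate the empirical CDF of each run on a common grid, average over the m runs, and attach a Student-t interval based on the between-run standard error. This is an assumption about the construction, not a confirmed reproduction of the authors' method. The runs are stand-in Gaussian samples, the names are hypothetical, and the hard-coded table holds the standard two-sided 95% t critical values for 1 to 10 degrees of freedom (with a normal fallback of 1.96 beyond that).

```python
import bisect
import random
import statistics

# Two-sided 95% Student-t critical values for 1..10 degrees of freedom.
T_975 = [12.706, 4.303, 3.182, 2.776, 2.571, 2.447, 2.365, 2.306, 2.262, 2.228]

def ecdf_on_grid(sample, grid):
    """Empirical CDF of one run, evaluated at every grid point."""
    s = sorted(sample)
    return [bisect.bisect_right(s, x) / len(s) for x in grid]

def cdf_with_interval(runs, grid):
    """Mean CDF over the runs with a point-wise 95% interval (needs m >= 2)."""
    m = len(runs)
    curves = [ecdf_on_grid(run, grid) for run in runs]
    # m - 1 degrees of freedom; fall back to the normal value for large m.
    t = T_975[m - 2] if m - 1 <= len(T_975) else 1.96
    mean, lower, upper = [], [], []
    for values in zip(*curves):  # values of all m curves at one grid point
        mu = statistics.fmean(values)
        half = t * statistics.stdev(values) / m ** 0.5
        mean.append(mu)
        lower.append(max(0.0, mu - half))  # a CDF stays within [0, 1]
        upper.append(min(1.0, mu + half))
    return mean, lower, upper

# Stand-in data: m = 5 runs with n = 40 members each.
rng = random.Random(1)
runs = [[rng.gauss(0.0, 1.0) for _ in range(40)] for _ in range(5)]
grid = [-3.0 + 0.5 * i for i in range(13)]
mean, lower, upper = cdf_with_interval(runs, grid)
for x, lo, mu, hi in zip(grid, lower, mean, upper):
    print(f"{x:5.1f}  {lo:.3f}  {mu:.3f}  {hi:.3f}")
```

The t critical value grows quickly as m shrinks (12.706 for m = 2), which is why very few runs give a wide interval.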

  13. Optimal combination of ensemble size (n) and number of EnKF runs (m) • Keeping their product (n*m) fixed, what is the optimum combination of ensemble size and number of runs? • Too few members will give a biased result (no history match) • Too few runs will give a very large uncertainty in the estimated CDF

  14. Mean Square Error • Find the optimum combination by minimizing the Mean Square Error: MSE = Variance + Bias² • Variance and bias are calculated for each value on the x-axis, and then integrated • A t-distribution standard error is used for the variance to account for the large uncertainty with very few runs
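Written out in symbols, the criterion might look as follows. The notation is an assumption introduced here, not taken from the slides: \hat F_{n,m}(x) denotes the CDF estimate obtained as the mean over m runs with n members each, and F(x) the reference CDF from the "infinite" ensemble run.

```latex
\mathrm{MSE}(x)
  = \operatorname{Var}\!\big[\hat F_{n,m}(x)\big]
  + \Big(\mathbb{E}\big[\hat F_{n,m}(x)\big] - F(x)\Big)^{2},
\qquad
\mathrm{IMSE}(n,m) = \int \mathrm{MSE}(x)\,\mathrm{d}x .
```

The optimum combination is then the pair (n, m) with n*m fixed that minimizes IMSE(n, m).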

  15. Example • Focus on one selected point in space (0,0) • One dynamic and one static parameter • Look at the posterior CDF after two data assimilation steps at this point

  16. Example 1 – 1000 members • Given a total of 1000 ensemble members • Different combinations of m and n • Mean of the m runs gives the final estimate • Compare with an “infinite” ensemble run

  17. Example (static variable) • Figure panels: 200 x 5 members, 50 x 20 members, 10 x 100 members, 3 x 333 members • Black line: infinite run • Blue lines: 95% confidence interval

  18. Example - Bias • Means over a large number of runs • The trend of the bias is clear • Runs with 5 and 10 members do not give satisfactory results • Similar for the dynamic variable

  19. Example • Similar behaviour for both variables • Too few ensemble members give a biased estimate • Too few runs give a very large variance • Want to optimize n (or m) given n x m

  20. Example (MSE) • Calculate the bias and variance along the x-axis • Integrate over x to obtain a single number for each combination of m and n • Want to minimize the integrated MSE • Figure: static variable

  21. Example • IMSE has a clear minimum • Interval or point • As long as n > 50-100 and m > 3-4 we have a satisfactory combination in this example • Difficult to conclude on a specific minimum without more information
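The bias, variance and integration steps can be sketched as below. This is a simplified illustration under stated assumptions: the variance of the final estimate is taken as the between-run sample variance divided by m, the bias is measured against a given reference curve, and the integral uses the trapezoidal rule. The slides mention a t-distribution standard error and a bias averaged over many repeated runs, which this sketch does not reproduce. The function names and the tiny hand-made curves are hypothetical.

```python
import statistics

def trapezoid(y, x):
    """Trapezoidal rule for values y sampled at grid points x."""
    return sum(0.5 * (y[i] + y[i + 1]) * (x[i + 1] - x[i])
               for i in range(len(x) - 1))

def integrated_mse(curves, reference, grid):
    """Integrate variance + bias^2 of the mean-over-runs CDF estimate over x."""
    m = len(curves)
    mse = []
    for values, ref in zip(zip(*curves), reference):
        bias = statistics.fmean(values) - ref             # point-wise bias
        var_of_mean = statistics.variance(values) / m     # variance of the mean over m runs
        mse.append(var_of_mean + bias ** 2)
    return trapezoid(mse, grid)

# Tiny hand-made curves on a three-point grid (not EnKF output).
grid = [0.0, 1.0, 2.0]
reference = [0.0, 0.5, 1.0]
unbiased_but_noisy = [[0.0, 0.4, 1.0], [0.0, 0.6, 1.0]]
biased_but_stable = [[0.0, 0.7, 1.0], [0.0, 0.7, 1.0]]
print(integrated_mse(unbiased_but_noisy, reference, grid))
print(integrated_mse(biased_but_stable, reference, grid))
```

The two toy cases mirror the trade-off on the slides: the first has only a variance contribution, the second only a bias contribution.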

  22. Example 2 – 200 members • Given a total of 200 ensemble members • Different combinations of m and n

  23. Example (dynamic variable) • Figure panels: 40 x 5 members, 20 x 10 members, 5 x 40 members, 3 x 66 members • Black line: infinite run • Blue lines: 95% confidence interval

  24. Example (MSE) • Calculate the bias and variance along the x-axis • Integrate over x to obtain a single number for each combination of m and n • Want to minimize the integrated MSE • Figure: static variable

  25. Example • Results similar to the 1000-member case • As long as n > approx. 20 and m > approx. 3 we have a satisfactory result for this example

  26. MSE plots • Figure panels: 200 members and 1000 members, for the dynamic and static variables

  27. Summary and Conclusions • Observed inconsistencies in estimated posterior CDFs • When running with a different seed in the initial sampling • Used the Kolmogorov-Smirnov test to verify that most updated variables become positively correlated

  28. Summary and Conclusions • Suggest running multiple EnKF runs, each with fewer ensemble members • Obtain more independent information • Gain information about the size of the Monte Carlo error in the final estimate • Allows for a point-wise confidence interval on the CDF • Present a methodology for finding an optimal combination of the number of runs (m) and ensemble members (n) • Keeping n*m (resources) fixed

  29. Summary and Conclusions • Methodology tested with a simplified single-phase flow example in 2D • The final estimated CDF from multiple runs gives equally good or better results than a single run • For our examples • Found an interval minimum • Difficult to conclude any further on a minimum • More than 3-4 runs are required (in general?) • MSE for 1000 members was ≈ 1/10 of the MSE for 200 members

  30. Summary and Conclusions • It might be better to perform multiple runs with fewer ensemble members • Should be able to run at least about 4 runs with a "large enough" ensemble size to have an effect • General guidelines for the optimum can be made from experience with synthetic problems
