Project Plan Task 8 and VERSUS2 Installation problems

Anatoly Myravyev and Anastasia Bundel, Hydrometcenter of Russia, March 2010. Task 8: Statistical features like confidence intervals and the Bootstrap method.

Presentation Transcript


  1. Project Plan Task 8 and VERSUS2 Installation problems • Anatoly Myravyev and Anastasia Bundel, Hydrometcenter of Russia • March 2010

  2. Task 8: Statistical features like confidence intervals and the Bootstrap method

  3. Formal definition of confidence intervals (CIs): • Estimation of an unknown value $\theta$, which defines the distribution $P_\theta$ of a random sample $X$ from the population $\mathcal{P}=\{P_\theta\}$. • If for a given $\alpha>0$ there exist random variables $\theta^{\pm}=\theta^{\pm}(\alpha, X)$ such that $P(\theta^{-} < \theta < \theta^{+}) \ge 1-\alpha$, then the interval $(\theta^{-}, \theta^{+})$ is called the confidence interval for $\theta$ of level $1-\alpha$. • The random interval contains the unknown value $\theta$, which is not random.
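
The definition says the interval is random while the estimated value is fixed. A minimal Python sketch of that idea follows: it draws many samples around a fixed value, builds an interval from each, and counts how often the interval contains the value. The normal-approximation interval for a mean, the Gaussian test data, and the names `normal_ci` and `coverage` are assumptions chosen for illustration, not part of the presentation; with this approximation the coverage is only close to the nominal level, not guaranteed to reach it.

```python
import random
from statistics import NormalDist, mean, stdev


def normal_ci(sample, alpha):
    """Normal-approximation interval for the mean at nominal level 1 - alpha."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * stdev(sample) / len(sample) ** 0.5
    centre = mean(sample)
    return centre - half_width, centre + half_width


def coverage(theta, alpha, n=100, trials=2000, seed=1):
    """Fraction of simulated samples whose interval contains the fixed theta."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # theta never changes; only the sample (and so the interval) is random.
        sample = [rng.gauss(theta, 1.0) for _ in range(n)]
        lower, upper = normal_ci(sample, alpha)
        hits += lower < theta < upper
    return hits / trials


if __name__ == "__main__":
    print("empirical coverage:", coverage(theta=2.0, alpha=0.05))
```

The printed fraction should be close to 1 – α, which is what "confidence interval of level 1 – α" refers to.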

  4. The statistical problem lies in the construction of CIs • Cases with known probability distribution function of the population: parametric CIs • Cases where the pdf is not known: non-parametric CIs

  5. Parametric CIs • Normal distribution assumption is most frequent. The underlying sample must be an iid-sample (independent and identically distributed). • Pluses: • Easy and not computer-intensive • Minuses: • Scores with non-normal distributions (proportions, odds ratio, correlation coefficients, …) cannot be used without some normalization, or else require complicated calculation formulas
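
As an illustration of the "normalization" mentioned for non-normal scores, the sketch below builds a parametric CI for a correlation coefficient with the Fisher z-transform. The slide does not name a specific transform, so this choice is an assumption, as are the function names `pearson_r` and `fisher_ci`. It presumes iid, roughly bivariate-normal (forecast, observation) pairs.

```python
from math import atanh, sqrt, tanh
from statistics import NormalDist, mean


def pearson_r(x, y):
    """Sample Pearson correlation between two equally long sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def fisher_ci(r, n, alpha=0.05):
    """CI for a correlation r from n pairs, built on the z = atanh(r) scale."""
    z = atanh(r)                  # transformed score is approximately normal
    se = 1.0 / sqrt(n - 3)        # approximate standard error on the z scale
    q = NormalDist().inv_cdf(1 - alpha / 2)
    # Transform the symmetric z-interval back to the correlation scale.
    return tanh(z - q * se), tanh(z + q * se)


if __name__ == "__main__":
    print(fisher_ci(r=0.8, n=50))
```

The resulting interval is asymmetric around r and stays inside (–1, 1), which a plain normal interval on the raw correlation would not guarantee.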

  6. Non-parametric CIs • Construction of artificial datasets from a given collection of real data by resampling the observations. • Pluses: • Highly adaptable to different testing situations because no assumptions regarding an underlying theoretical distribution of data are required • Computational ease • Minuses: • The assumptions for sample statistics must not be overlooked: representativeness, iid

  7. Bootstrapping • Operates by constructing the artificial data using sampling with replacement from the original data (Efron 1979, Wasserman 2006) • Well-developed computational technique (implemented in the R project) • The most common and popular resampling method in verification (Wilks 1995)
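
Sampling with replacement can be sketched in a few lines of plain Python. The example below resamples (forecast, observation) pairs, recomputes a score on each artificial dataset, and reads a percentile interval off the replicates. The RMSE score, the synthetic data, and all function names are assumptions made for illustration; pairs are resampled together so that each forecast stays matched to its observation, and the iid assumption is taken for granted.

```python
import random


def rmse(pairs):
    """Root-mean-square error over (forecast, observation) pairs."""
    return (sum((f - o) ** 2 for f, o in pairs) / len(pairs)) ** 0.5


def bootstrap_replicates(pairs, score, n_boot=1000, seed=0):
    """Recompute `score` on n_boot samples drawn with replacement from `pairs`."""
    rng = random.Random(seed)
    n = len(pairs)
    replicates = []
    for _ in range(n_boot):
        resample = [pairs[rng.randrange(n)] for _ in range(n)]
        replicates.append(score(resample))
    return replicates


def percentile_ci(replicates, conf=0.95):
    """Percentile interval: empirical quantiles of the bootstrap replicates."""
    ordered = sorted(replicates)
    tail = (1 - conf) / 2
    last = len(ordered) - 1
    return ordered[round(tail * last)], ordered[round((1 - tail) * last)]


def demo_pairs(n=200, seed=42):
    """Synthetic (forecast, observation) pairs, invented for illustration only."""
    rng = random.Random(seed)
    obs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return [(o + rng.gauss(0.0, 0.5), o) for o in obs]


if __name__ == "__main__":
    pairs = demo_pairs()
    reps = bootstrap_replicates(pairs, rmse)
    print("RMSE:", rmse(pairs))
    for conf in (0.95, 0.99):
        print(conf, percentile_ci(reps, conf))
```

Any other score that takes a list of pairs can be passed in place of `rmse`.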

  8. Different bootstrap methods – how to construct CIs from the samples obtained • Percentile CIs • Bias-corrected and accelerated CIs (BCa) • Normal approximation CIs • Basic bootstrap CIs • Bootstrap-t CIs • Approximate bootstrap CIs (ABC) • etc. A compromise between their accuracy and computational burden must be made. Some of these methods are used at present in the MET package.
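
To show how the constructions differ, the sketch below derives three of the simpler intervals (percentile, basic, and normal approximation) from one and the same set of bootstrap replicates. It is an illustrative Python sketch with assumed function names, not the code of MET or of any R package; the BCa, bootstrap-t and ABC variants need extra quantities and are left out. The skewed demo sample is invented.

```python
import random
from statistics import NormalDist, mean, stdev


def quantile(values, q):
    """Nearest-rank empirical quantile of a list of numbers."""
    ordered = sorted(values)
    return ordered[round(q * (len(ordered) - 1))]


def percentile_ci(reps, conf):
    """Quantiles of the replicates themselves."""
    tail = (1 - conf) / 2
    return quantile(reps, tail), quantile(reps, 1 - tail)


def basic_ci(theta_hat, reps, conf):
    """Basic bootstrap: replicate quantiles reflected around the estimate."""
    q_lo, q_hi = percentile_ci(reps, conf)
    return 2 * theta_hat - q_hi, 2 * theta_hat - q_lo


def normal_ci(theta_hat, reps, conf):
    """Normal approximation using the bootstrap bias and standard error."""
    bias = mean(reps) - theta_hat
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    centre = theta_hat - bias
    return centre - z * stdev(reps), centre + z * stdev(reps)


if __name__ == "__main__":
    rng = random.Random(0)
    sample = [rng.expovariate(1.0) for _ in range(60)]  # skewed, invented data
    theta_hat = mean(sample)
    reps = [mean(rng.choices(sample, k=len(sample))) for _ in range(2000)]
    print("percentile:", percentile_ci(reps, 0.95))
    print("basic:     ", basic_ci(theta_hat, reps, 0.95))
    print("normal:    ", normal_ci(theta_hat, reps, 0.95))
```

For a skewed statistic the three intervals generally disagree, which is one reason to try several procedures on real data.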

  9. Implementation of CIs using the R package boot • boot is one of the packages required by the R verification package • The intention is to introduce commands analogous to the MySQL v_index table in a form like • index_booted <- boot(data.frame(fcs, obs), index, R = 1000), where index(d, i) computes the score on the resampled rows d[i, ] • index_ci <- boot.ci(index_booted, conf = c(0.95, 0.99), type = c("perc", "bca"))

  10. Conclusions • The accuracy of statistical scores depends among other things on the following: • Sampling uncertainty • Validity of assumptions about representativeness and iid of the sample • Observational uncertainty • Uncertainty in the physical processes (Gilleland, 2008) • Different α can be used (e.g. CIs of level 1 – α = 0.95, 0.99, even 0.70, etc.) depending on the scope of analysis • Bayesian prediction intervals?

  11. Conclusions (2) • In view of ambiguities about a “most precise” method for the CI construction, we should try several procedures on the real frc and obs data available. Both parametric and non-parametric statistics are legitimate (MET experience!) • The decision making (what is good, what is bad) should be performed on a multi-criteria basis

  12. Problems with VERSUS2 functioning in the Hydrometcenter of Russia

  13. Problems with VERSUS2 functioning • Installation is done in the RedHat environment without errors • The new data leave traces in the MySQL tables and the test (Pirmin-) files are acquired • However, the data information gets lost in the vicinity of the Data Availability tab (Model? Date Intervals?...) • A tutorial version of the package with valid obs and frc data is urgently needed

  14. Thank you for your attention!
