Project Plan Task 8 and VERSUS2 Installation Problems. Anatoly Myravyev and Anastasia Bundel, Hydrometcenter of Russia. March 2010.
Task 8: Statistical features like confidence intervals and the Bootstrap method
Formal definition of confidence intervals (CIs): • The task is to estimate an unknown value $\theta = \theta(P)$ determined by the distribution $P$ of a random sample $X$ from the population $\mathcal{P} = \{P\}$. • If for a given $\alpha > 0$ there exist random variables $\theta^{\pm} = \theta^{\pm}(\alpha, X)$ such that $P(\theta^{-} < \theta < \theta^{+}) \ge 1 - \alpha$, then the interval $(\theta^{-}, \theta^{+})$ is called the confidence interval for $\theta$ of level $1 - \alpha$. • It is the interval that is random; the unknown value $\theta$ that it is meant to contain is not random.
The statistical problem lies in the construction of CIs • Cases with known probability distribution function of the population: parametric CIs • Cases where the pdf is not known: non-parametric CIs
Parametric CIs • The normal distribution assumption is the most frequent one. The underlying sample must be an iid sample (independent and identically distributed). • Pluses: • Easy and not computer-intensive • Minuses: • Cannot be used for scores with non-normal distributions (proportions, odds ratio, correlation coefficients, …) without some normalization, or else require complicated calculation formulas
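As an illustration of the parametric case, here is a minimal R sketch of a Student-t interval for the mean forecast error. The data are synthetic and the names `fcs`, `obs`, `err` and `ci_param` are assumptions chosen for this example; they are not taken from VERSUS2 or from the verification package.

```r
# Synthetic forecast/observation pairs (illustrative values only)
set.seed(1)
obs <- rnorm(50, mean = 10, sd = 3)
fcs <- obs + rnorm(50, mean = 0.5, sd = 1.5)

err   <- fcs - obs      # forecast errors, assumed iid and roughly normal
n     <- length(err)
alpha <- 0.05           # gives a CI of level 1 - alpha = 0.95

# Student-t interval for the mean error: mean +/- t * s / sqrt(n)
half_width <- qt(1 - alpha / 2, df = n - 1) * sd(err) / sqrt(n)
ci_param   <- c(lower = mean(err) - half_width,
                upper = mean(err) + half_width)
print(ci_param)
```

The same interval is returned by `t.test(err)$conf.int`. The sketch is only valid to the extent that the iid and normality assumptions hold for the errors.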
Non-parametric CIs • Construction of artificial datasets from a given collection of real data by resampling the observations. • Pluses: • Highly adaptable to different testing situations because no assumptions regarding an underlying theoretical distribution of data are required • Computational ease • Minuses: • The assumptions for sample statistics must not be overlooked: representativeness, iid
Bootstrapping • Operates by constructing the artificial data using sampling with replacement from the original data (Efron 1979, Wasserman 2006) • A well-developed computational technique with ready-made implementations (R project) • The most common and popular resampling method in verification (Wilks 1995)
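Sampling with replacement can be sketched in a few lines of base R. The example below resamples forecast/observation pairs and builds a percentile interval for the RMSE. The data are synthetic, and `rmse`, `boot_rmse` and `ci_perc` are hypothetical names used only for illustration.

```r
# Synthetic forecast/observation pairs (illustrative values only)
set.seed(42)
obs <- rnorm(100, mean = 10, sd = 3)
fcs <- obs + rnorm(100, mean = 0.5, sd = 1.5)

# Score to be bootstrapped (hypothetical helper)
rmse <- function(f, o) sqrt(mean((f - o)^2))

n <- length(obs)
B <- 1000                      # number of bootstrap replicates
boot_rmse <- numeric(B)

for (b in seq_len(B)) {
  # Draw n indices with replacement; forecast and observation stay paired
  idx <- sample.int(n, size = n, replace = TRUE)
  boot_rmse[b] <- rmse(fcs[idx], obs[idx])
}

# Percentile CI of level 1 - alpha from the empirical quantiles
alpha   <- 0.05
ci_perc <- quantile(boot_rmse, probs = c(alpha / 2, 1 - alpha / 2))
print(ci_perc)
```

Resampling whole pairs keeps each forecast matched to its observation. It does not repair a sample that violates the iid assumption, for example strongly autocorrelated daily data.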
Different bootstrap methods: how to construct CIs from the samples obtained • Percentile CIs • Bias-corrected and accelerated CIs (BCa) • Normal approximation CIs • Basic bootstrap CIs • Bootstrap-t CIs • Approximate bootstrap CIs (ABC) • etc. A compromise between their accuracy and computational burden must be made. Some of these methods are used at present in the MET package.
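Implementation of CIs using the R package boot • boot is one of the packages required by the R verification package • The intention is to introduce commands analogous to the MySQL v_index table in a form like • index_booted <- boot(data = cbind(fcs, obs), statistic = index, R = 1000) • index_ci <- boot.ci(index_booted, conf = c(0.95, 0.99), type = c("perc", "bca")) • Here index(data, i) must compute the score from the resampled rows data[i, ], as boot() requires of its statistic argument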
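A runnable version of this idea, under stated assumptions, might look as follows. RMSE stands in for the generic index, the data are synthetic, and `rmse_stat`, `rmse_booted` and `rmse_ci` are hypothetical names. Only `boot()` and `boot.ci()` come from the boot package.

```r
library(boot)

# Synthetic forecast/observation pairs (illustrative values only)
set.seed(2010)
dat <- data.frame(obs = rnorm(200, mean = 10, sd = 3))
dat$fcs <- dat$obs + rnorm(200, mean = 0.5, sd = 1.5)

# boot() calls the statistic with the data and a vector of resampled row indices
rmse_stat <- function(data, idx) {
  d <- data[idx, ]
  sqrt(mean((d$fcs - d$obs)^2))
}

# R = 2000 replicates; BCa intervals generally need R at least as large as n
rmse_booted <- boot(data = dat, statistic = rmse_stat, R = 2000)

# Percentile and BCa intervals at levels 0.95 and 0.99
rmse_ci <- boot.ci(rmse_booted, conf = c(0.95, 0.99),
                   type = c("perc", "bca"))
print(rmse_ci)
```

The interval endpoints are in the last two columns of `rmse_ci$percent` and `rmse_ci$bca`, one row per confidence level. Comparing the two types on the same replicates is one inexpensive way to see how much the choice of method matters for a given score.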
Conclusions • The accuracy of statistical scores depends among other things on the following: • Sampling uncertainty • Validity of assumptions about representativeness and iid of the sample • Observational uncertainty • Uncertainty in the physical processes (Gilleland, 2008) • Different α can be used (e.g. CIs of level 0.95, 0.99, even 0.70, etc.) depending on the scope of analysis • Open question: Bayesian prediction intervals?
Conclusions (2) • Since it is unclear which CI construction method is the "most precise", we should try several procedures on the real forecast (frc) and observation (obs) data available. Both parametric and non-parametric statistics are legitimate (MET experience!) • Decisions about what is good and what is bad should be made on a multi-criteria basis
Problems with VERSUS2 functioning at the Hydrometcenter of Russia
Problems with VERSUS2 functioning • Installation in the RedHat environment completes without errors • The new data leave traces in the MySQL tables, and the test (Pirmin-) files are acquired • However, the data information is lost somewhere around the Data Availability tab (Model? Date Intervals?...) • A tutorial version of the package with valid obs and frc data is urgently needed