Analysis of establishing credible probability with limited data, using bootstrap procedure for reliability assessment.
Confidence in the Range of Variability H.J. Pradlwarter and G.I. Schuëller mechanik@uibk.ac.at Institute of Engineering Mechanics Leopold-Franzens University Innsbruck, Austria, EU
Problem definition Suppose only a few measured values of an uncertain quantity are available: Is it possible under such circumstances to establish a credible probability distribution for the reliability assessment? - Without strong assumptions, certainly not! - There is an infinite number of options! - Some physical background information is needed to proceed any further.
Problem definition (cont.) Among the infinite set of options, the choice should reflect the needs of the analysis. Some options: The uncertainty due to the insufficient number of data points should be considered. For estimating the performance (safety assessment), confidence in the estimates will be crucial.
Problem definition (cont.) Only a few measured values of an uncertain quantity are available: Is it possible under such circumstances to establish a credible probability distribution for the reliability assessment? Yes, if: - Some physical background information can be safely assumed - We are not looking for the best estimate (e.g. a Bayesian approach) but for a conservative PDF estimate for a required confidence level.
Overview • Bootstrap procedure • Statistical results, e.g. confidence intervals • Probability of observation • Probability of lying outside the observed domain • Probability density estimation • Extended bootstrap procedure • Marginal distributions for calibration data • Joint distributions • Results • Conclusions
Bootstrap procedure • Bootstrap procedure: • Modern, computer-intensive, general purpose approach to statistical inference. • Approach to compute properties of an estimator (e.g. variance, confidence intervals, correlations). • Advantage: Straightforward also for complex estimators and complex distributions. • Disadvantage: Tendency to be too optimistic for small sample sizes.
Bootstrap procedure • Bootstrap procedure (cont.): • Given the data set • Resampling: Artificially generate a large number N of sets from the data by random sampling • Determine the estimator (e.g. mean, variance, etc.) for each of the N sets, establish the histogram and derive confidence intervals. [Figure: histogram of the estimator (e.g. variance)]
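The resampling steps on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes resampling with replacement and percentile confidence intervals, and the function name `bootstrap_estimator`, the values in `sample`, and the defaults are made up for the example.

```python
import numpy as np

def bootstrap_estimator(data, estimator, n_resamples=5000, alpha=0.05, seed=0):
    """Percentile bootstrap: resample with replacement and evaluate the estimator."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    n = data.size
    # Each row of idx selects one artificial data set of the original size.
    idx = rng.integers(0, n, size=(n_resamples, n))
    estimates = np.array([estimator(data[row]) for row in idx])
    # The empirical quantiles of the estimates give the confidence interval.
    lower, upper = np.quantile(estimates, [alpha / 2, 1 - alpha / 2])
    return estimates, (lower, upper)

# Hypothetical measurements, invented for illustration only.
sample = np.array([9.8, 10.4, 11.1, 9.5, 10.9, 10.2])
estimates, ci = bootstrap_estimator(sample, lambda s: np.var(s, ddof=1))
print("95% interval for the variance:", ci)
```

A histogram of `estimates` corresponds to the estimator histogram mentioned on the slide.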
Bootstrap procedure • Bootstrap procedure (cont.): • Resampling corresponds to sampling from the discrete probability distribution • The inference is only justified if the sample represents the underlying unknown distribution well. • The method is not reliable if only very few data are available, i.e. if n is small. • The case n < 30 will be investigated in the following:
Probability mass outside the observation range N > 1 data points specify the observed range • Define interval [a,b] • Assume independent data points • Suggestion: Interpret qN as the level of significance α; confidence level = 1 - α
Probability mass outside the observation range (cont.) [Figure; axis label: Probability]
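The formula on this slide did not survive, so the following is only one plausible reading, stated as an assumption: if the interval [a,b] holds probability mass q, then N independent data points all fall inside it with probability q^N. Setting q^N equal to a significance level α gives q = α^(1/N), so the mass outside the observed range is 1 - α^(1/N). The function name `mass_outside` is hypothetical.

```python
def mass_outside(n_points, alpha):
    """Mass outside [a, b] such that N i.i.d. points all land inside with probability alpha.

    Assumed reading of the slide: q**N = alpha, hence the outside mass is 1 - alpha**(1/N).
    """
    if n_points < 2:
        raise ValueError("at least two data points are needed to define a range")
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    q = alpha ** (1.0 / n_points)  # mass inside the observed range
    return 1.0 - q

# The outside mass shrinks as more data points become available.
for n in (5, 10, 30):
    print(n, mass_outside(n, alpha=0.05))
```

Under this reading, a lower significance level (higher confidence) or fewer data points both push more probability mass outside the observed range.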
Probability density • Density outside the observed domain Until now we just have an estimate for the probability, not the density! Almost everything is possible without any physical background information • Reasonable (physical) assumptions: • The density is high in the neighbourhood of any observations • The density decreases with its distance from observations • The density has a single domain with PDF(x)>0
Proposed PDF • Extended bootstrap distribution: • Replace the underlying discrete bootstrap probability distribution • by continuous Gaussian kernel density functions
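Replacing the discrete bootstrap distribution by Gaussian kernels amounts to what is commonly called a smoothed bootstrap: pick a data point at random, then perturb it with Gaussian noise. The sketch below assumes a single kernel standard deviation `sigma` supplied by the caller; the function name and the sample values are invented for illustration.

```python
import numpy as np

def extended_bootstrap_sample(data, sigma, size, seed=0):
    """Draw from an equal-weight mixture of Gaussians centered at the data points."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    # Step 1: ordinary bootstrap draw, i.e. choose kernel centers with replacement.
    centers = rng.choice(data, size=size, replace=True)
    # Step 2: add Gaussian noise with the kernel standard deviation.
    return centers + sigma * rng.standard_normal(size)

# Hypothetical measurements, invented for illustration only.
sample = np.array([9.8, 10.4, 11.1, 9.5, 10.9, 10.2])
draws = extended_bootstrap_sample(sample, sigma=0.5, size=10000)
print("fraction outside the observed range:",
      np.mean((draws < sample.min()) | (draws > sample.max())))
```

With `sigma=0` the draws reduce to the plain bootstrap, which never leaves the observed values.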
Proposed PDF Kernel densities: N Gaussian densities centered at the data points Justification: + each data point has equal weight and provides identical information + each data point has the same variability + the probability of occurrence decreases with the distance is used to specify the standard deviation
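The quantity that specifies the standard deviation is missing from the slide text. As an illustration only, the sketch below assumes that the common kernel standard deviation is tuned so that the probability mass outside the observed range [a,b] matches a prescribed target; this is an assumption, not a confirmed description of the authors' procedure. All function names are hypothetical. Note that for N distinct points the outside mass cannot drop below 1/N, because half of each kernel at the two extreme points always lies outside.

```python
import math
import numpy as np

def kernel_pdf(x, data, sigma):
    """Equal-weight mixture of N Gaussian densities centered at the data points."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    data = np.asarray(data, dtype=float)
    z = (x[:, None] - data[None, :]) / sigma
    return np.exp(-0.5 * z**2).sum(axis=1) / (data.size * sigma * math.sqrt(2.0 * math.pi))

def norm_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def mass_outside_range(data, sigma):
    """Probability mass of the kernel mixture outside [min(data), max(data)]."""
    a, b = min(data), max(data)
    inside = sum(norm_cdf((b - xi) / sigma) - norm_cdf((a - xi) / sigma) for xi in data)
    return 1.0 - inside / len(data)

def calibrate_sigma(data, target_outside, iters=100):
    """Bisection on sigma; the outside mass grows monotonically with sigma."""
    span = max(data) - min(data)
    lo, hi = 1e-6 * span, 100.0 * span
    if not mass_outside_range(data, lo) < target_outside < mass_outside_range(data, hi):
        raise ValueError("target outside mass is not reachable for this data set")
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mass_outside_range(data, mid) < target_outside:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Hypothetical measurements and target mass, invented for illustration only.
sample = [9.8, 10.4, 11.1, 9.5, 10.9, 10.2]
sigma = calibrate_sigma(sample, target_outside=0.3)
print("calibrated sigma:", sigma)
```

The resulting density satisfies the three physical assumptions listed earlier only approximately: it is high near observations and decays with distance, but it is positive everywhere on the real line.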
Application to data • Calibration experiments: • Young's modulus • elongation • Three data sets Notation: inverse Young's modulus average inverse Young's modulus over the length
Application to calibration data Inverse Young's modulus Exceptionally large dispersion for Nc=5 when compared with Nc=30. The distribution is a function of the number of data points Nc and the required confidence level 1-α.
Application to calibration data Average inverse Young's modulus Exceptionally small dispersion for Nc=5 when compared with Nc=30. The distribution is a function of the number of data points Nc and the required confidence level 1-α.
Application to calibration data Joint distribution as function of Nc and significance level
Random field calibration • Random field model • simple piecewise linear • i.i.d inverse Young's moduli • distribution of (maximum entropy principle) derives from mechanics
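For a bar under axial force F with cross-section A, the elongation is (F/A) times the integral of 1/E(x) over the length, which is why the average inverse Young's modulus over the length is the mechanically relevant quantity. The sketch below builds a piecewise linear field of 1/E from i.i.d. nodal values and averages it over the bar. It rests on several assumptions not stated on the slide: nodes are spaced by the correlation length D, and the nodal values follow a lognormal distribution chosen purely for illustration. All names and numbers are hypothetical.

```python
import numpy as np

def average_inverse_modulus(length, corr_length, draw_values,
                            n_fields=2000, n_grid=201, seed=0):
    """Length-average of a piecewise linear random field of 1/E, per realization."""
    rng = np.random.default_rng(seed)
    # Nodes spaced by the correlation length, covering [0, length].
    n_nodes = int(np.ceil(length / corr_length)) + 1
    x_nodes = np.arange(n_nodes) * corr_length
    x_grid = np.linspace(0.0, length, n_grid)
    averages = np.empty(n_fields)
    for k in range(n_fields):
        node_values = draw_values(rng, n_nodes)  # i.i.d. values of 1/E at the nodes
        field = np.interp(x_grid, x_nodes, node_values)  # piecewise linear in between
        # Trapezoidal rule on a uniform grid, divided by the length.
        averages[k] = (field[:-1] + field[1:]).sum() / (2.0 * (n_grid - 1))
    return averages

# Assumed nodal distribution (lognormal), for illustration only.
draw = lambda rng, n: rng.lognormal(mean=0.0, sigma=0.1, size=n)
averages = average_inverse_modulus(length=1.0, corr_length=0.25, draw_values=draw)
print("std of nodal values ~0.1, std of length average:", averages.std())
```

A shorter correlation length means more independent nodes along the bar and therefore a smaller dispersion of the length average, which is the lever used when fitting D.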
Random field calibration • Fitting of The average correlation length D is selected such that it fits the joint distribution best. • Simple Monte Carlo search
Application: Static Challenge problem • Prediction of Exceedance probability • Young's modulus in all four bars modelled as random field • Challenge: Estimation of exceedance probability
Application: Static Challenge problem • Prediction of Exceedance probability • Consistent results • Severe underestimation without introducing a low level of
Summary and Conclusion • The spread of the assumed probability distribution is a function of the number of data points and the required confidence level. • Introducing a confidence level provides a suitable safeguard against a severe underestimation of the variability of the parameters derived from a small data set. • Consistent results can be obtained even though the small data set might be misleading.
Acknowledgment This research is partially supported by the European Commission under contract# RTN505164 (MADUSE)