Probability Theory and Sampling Theory (Kansrekening en steekproeftheorie) Pieter van Gelder, TU Delft, IVW course, 16 September 2003
The basics of probability theory as the foundation for the course; estimation of distribution parameters; sampling theory, working both with and without prior information (Bayesian versus classical sampling); dependencies between variables and risks.
Outline • What is a stochastic variable? • Probability distributions • Fast characteristics • Distribution types • Two stochastic variables • Closure
Stochastic variable • Quantity that cannot be predicted exactly (uncertainty): • Natural variation • Shortage of statistical data • Schematizations • Examples: • Strength of concrete • Water level above a tunnel • Lifetime of a chisel • Throw of a die
Relation to events • Express uncertainty in terms of probability • Probability theory is related to events • Connect the value of the variable to an event • E.g. the probability that stochastic variable X • is less than x • is greater than x • is equal to x • is in the interval [x, x+Δx] • etc.
Probability distribution • Probability distribution function = probability P(X ≤ x): • F_X(x) = P(X ≤ x) [Figure: distribution function F_X(x) of a stochastic variable, rising from 0 to 1]
Probability density • The familiar form of a probability 'distribution' is the probability density function
Probability density • Differentiating F with respect to x: • f_X(x) = dF_X(x) / dx • f = probability density function • f_X(x) dx = P(x < X ≤ x+dx)
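A minimal MATLAB sketch (the standard normal density is only an assumed example) checking numerically that the density is the derivative of the distribution function:

x  = linspace(-4, 4, 801);
f  = exp(-x.^2/2) / sqrt(2*pi);    % assumed example: standard normal density f_X(x)
F  = cumtrapz(x, f);               % numerical distribution function F_X(x)
dF = gradient(F, x);               % numerical derivative dF_X(x)/dx
max(abs(dF - f))                   % close to 0: f_X is the derivative of F_X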
[Figure: distribution function F_X(x), showing P(X ≤ x), and probability density f_X(x), showing P(x < X ≤ x+dx) as the area of a narrow strip between x and x+dx]
[Figure: the same P(X ≤ x) read off as the value of F_X(x) and as the area under f_X(x) to the left of x]
Discrete and continuous [Figure: probability function and (cumulative) probability distribution of a discrete variable (left), probability density and (cumulative) probability distribution of a continuous variable (right)]
Fast characteristics [Figure: probability density f_X(x) with the mean μ_X and the standard deviation σ_X indicated] • μ_X: mean, indication of location • σ_X: standard deviation, indication of spread
Fast characteristics [Figure: skewed probability density f_X(x) with μ_X and σ_X indicated] • The mean does not necessarily coincide with the location of the maximum (the mode)
Fast characteristics • Mean (centre of gravity): μ_X = ∫ x f_X(x) dx • Variance: σ_X² = ∫ (x − μ_X)² f_X(x) dx • Standard deviation: σ_X = sqrt(σ_X²) • Coefficient of variation: V_X = σ_X / μ_X
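A minimal MATLAB sketch (the simulated sample is an assumed example) computing the sample versions of these fast characteristics:

x  = 2 + 0.5*randn(10000, 1);   % assumed example sample, roughly mean 2, std 0.5
mu = mean(x);                   % mean (centre of gravity)
s2 = var(x);                    % variance
s  = std(x);                    % standard deviation
V  = s / mu;                    % coefficient of variation
fprintf('mean %.3f  variance %.3f  std %.3f  V %.3f\n', mu, s2, s, V);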
Normal distribution [Figure: normal probability densities f_X(x) with the same mean μ_X and different standard deviations σ_X] Completely determined by mean and standard deviation
Normal distribution • Probability density function: f_X(x) = 1/(σ_X sqrt(2π)) · exp[−(x − μ_X)² / (2σ_X²)] • Standard normally distributed variable (often denoted by u): u = (X − μ_X) / σ_X
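A minimal MATLAB sketch (the values of μ_X and σ_X are assumed for illustration) evaluating this density and the standardised variable u:

mu = 3; sigma = 0.8;                                        % assumed example values
x  = linspace(mu - 4*sigma, mu + 4*sigma, 200);
f  = exp(-(x - mu).^2 / (2*sigma^2)) / (sigma*sqrt(2*pi));  % normal density f_X(x)
u  = (x - mu) / sigma;                                      % standardised variable
plot(x, f)                                                  % bell curve centred at mu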
Normal distribution • Why so popular? • Central limit theorem: • Sum of many variables with arbitrary distributions is (almost) normally distributed. • Convenient in structural reliability calculations
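A minimal MATLAB sketch of the central limit theorem (the number of terms and the number of realisations are assumptions): the standardised sum of uniform variables is already close to standard normal:

n = 20;                            % assumed number of terms in the sum
S = sum(rand(n, 10000), 1);        % 10000 realisations of the sum of n uniform variables
hist((S - n/2) / sqrt(n/12), 50)   % standardised sums: approximately N(0,1)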
Two stochastic variables • Joint probability density function: • f_XY(x, y) dx dy = P(x < X ≤ x+dx and y < Y ≤ y+dy)
Contour map of the probability density [Figure: contour lines of a joint probability density f_XY(x, y) in the (x, y) plane]
Two stochastic variables • Relation to events [Figure: joint probability density with a small area element of sides dx and dh]
Example • Health survey • Measurements of: • Length • Weight [Figures: probability density of length, in 1/m, against length (m); probability density of weight, in 1/kg, against weight (kg)]
Logical contour map? [Figure: contour map of weight (kg) against length (m)]
Dependency [Figure: contour map of weight (kg) against length (m), illustrating the dependency between the two variables]
Fast characteristics • Location: μ_X, μ_Y (means) • Spread: σ_X, σ_Y (standard deviations) • Dependency: cov_XY (covariance); ρ_XY = cov_XY / (σ_X σ_Y) (correlation, between -1 and 1)
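A minimal MATLAB sketch (the linear length-weight relation and the noise levels are assumptions) illustrating covariance and correlation of two dependent variables:

L = 1.75 + 0.10*randn(1000, 1);              % assumed 'length' sample (m)
W = 75 + 80*(L - 1.75) + 5*randn(1000, 1);   % assumed 'weight' (kg), dependent on L
C = cov(L, W);                               % 2x2 covariance matrix
R = corrcoef(L, W);                          % correlation matrix, R(1,2) in [-1, 1]
fprintf('cov %.3f  rho %.3f\n', C(1,2), R(1,2));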
Closure of the short Introduction to Stochastics • What is a stochastic variable? • Probability distributions • Fast characteristics • Distribution types • Two stochastic variables
Parameter estimation methods • Given a dataset x_1, x_2, …, x_n • Given a distribution type F(x|A, B, …) • How can the unknown parameters A, B, … be fitted to the data?
List of estimation methods • MoM (method of moments) • ML (maximum likelihood) • LS (least squares) • Bayes (Bayesian estimation)
MoM • Distribution moments = sample moments: ∫ x^n f(x) dx = (1/N) Σ x_i^n • Example: F(x) = 1 − exp[−(x − A)/B], with mean A + B and standard deviation B, so B_MoM = std(x) and A_MoM = mean(x) − std(x)
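A minimal MATLAB sketch of this MoM fit (the true values A = 2 and B = 0.5 are assumptions used to generate the data):

A = 2; B = 0.5;                  % assumed true parameters
x = A - B*log(rand(5000, 1));    % sample from F(x) = 1 - exp(-(x-A)/B)
B_mom = std(x);                  % scale: the standard deviation equals B
A_mom = mean(x) - std(x);        % location: the mean equals A + B
fprintf('A_mom %.3f  B_mom %.3f\n', A_mom, B_mom);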
Binomial distribution • X ~ Bin(N, p) • The binomial distribution gives the discrete probability distribution of obtaining exactly n successes out of N Bernoulli trials (where the result of each Bernoulli trial is true with probability p and false with probability q = 1 − p). The binomial distribution is therefore given by • f_X(n) = (N choose n) · p^n · (1 − p)^(N−n)
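A minimal MATLAB sketch (N and p are assumed example values) evaluating this binomial probability function:

N = 10; p = 0.41;                                                   % assumed example values
n = 0:N;
f = arrayfun(@(k) nchoosek(N, k), n) .* p.^n .* (1 - p).^(N - n);   % f_X(n) for n = 0..N
sum(f)                                                              % the probabilities sum to 1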
MoM-estimator of p • p_MoM = Σ x_i / N • Simulation of the estimator in MATLAB (the values of p, N and M must still be chosen; the ones below are only assumed examples):

p = 0.41; N = 100; M = 1000;     % assumed example values
pMOM = zeros(1, M);
for j = 1:M
    x = zeros(1, N);             % one sample of N Bernoulli trials
    for i = 1:N
        if rand(1) < p, x(i) = 1; end
    end
    pMOM(j) = sum(x) / N;        % MoM estimate from this sample
end
hist(pMOM)                       % spread of the estimator around the true p
Case study • Web traffic statistics • The number of page views on websites
Statistics on usage of screen sizes • Is it necessary to record the screen size of every user? • Or is it sufficient to inspect the screen size of just N users and still obtain a reliable percentage for the screen sizes in use?
Assume 41% of the complete population uses screen size 1024x768 • Take inspection sample sizes N = 100, 1000, … and simulate the results by generating the usage from a binomial distribution • Theoretical analysis: CoV = sqrt(1/p − 1) · N^(−1/2)
Coefficient of variation (as a function of p and N):
p       N = 100   N = 1000   N = 10 000   N = 10^6
41.4%   11.75%    3.7%       1.2%         0.1%
39.8%   12.3%     3.9%       1.3%         0.1%
6.2%    38.9%     12.3%      3.9%         0.4%
5.4%    41.8%     13.2%      4.2%         0.4%
3.2%    55.0%     17.4%      5.5%         0.55%
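A minimal MATLAB sketch (the number of repetitions M is an assumption) reproducing the first cell of the table by simulation and comparing it with the theoretical CoV:

p = 0.414; N = 100; M = 20000;              % first table cell; M is an assumed value
phat = sum(rand(N, M) < p, 1) / N;          % M simulated sample fractions
cov_sim    = std(phat) / mean(phat);        % simulated coefficient of variation
cov_theory = sqrt(1/p - 1) / sqrt(N);       % theoretical value, about 0.12
fprintf('simulated %.3f  theoretical %.3f\n', cov_sim, cov_theory);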
Optimisation of the inspection sample size • Assume the cost of obtaining the screen-size information of one user is A • Assume the cost attached to a larger CoV value is B • TC(N) = A·N + B·sqrt(1/p − 1)·N^(−1/2) • The optimal sample size follows from TC'(N) = 0, giving N* = [B/(2A) · sqrt(1/p − 1)]^(2/3) • For this choice of N, the CoV = [2A/B · (1/p − 1)]^(1/3)
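A minimal MATLAB sketch (the cost values A and B are assumptions) checking the closed-form optimum against a numerical minimisation of TC(N):

A = 0.01; B = 100; p = 0.414;                    % assumed cost values and usage share
TC = @(N) A*N + B*sqrt(1/p - 1)*N.^(-1/2);       % total cost as a function of N
N_num  = fminbnd(TC, 1, 1e6);                    % numerical minimum of TC
N_star = (B/(2*A) * sqrt(1/p - 1))^(2/3);        % closed-form optimum
fprintf('numerical %.1f  closed form %.1f\n', N_num, N_star);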
Case study: container inspection • Acceptable 'slip-through probability' p = 1/1,000 containers • The population consists of 100,000 containers • The inspection consists of checking 1,000 containers • Suppose 1 container in this sample is rejected • Then p_MoM = 0.001, but std(p_MoM) = 0.0316 • If std(p_MoM) < 0.001 is required, the complete population must be inspected (after all, std(p_MoM) = sqrt(pq)·sqrt(1/N))
Inspection of the complete population (for small p values) • The inspection costs must be recovered from the fines collected • Inspection costs: 100,000 × K/C • Revenue without inspection: NI (Negative Impact) • Revenue with inspection: p × 100,000 × fine − 100,000 × K/C • Inspection is worthwhile when p × 100,000 × fine − 100,000 × K/C > NI
Bayesian statistics • P(A|B)=P(A and B)/P(B) • P(A|B)=P(B|A)P(A)/P(B) • A = parameters • B = data
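A minimal MATLAB sketch of such a Bayesian update (the uniform prior and the data, 1 rejection in 1,000 inspected containers, are assumptions), computing the posterior of the parameter p on a grid:

N = 1000; k = 1;                                % assumed data B: k rejections in N inspections
p = linspace(0, 0.02, 2001);                    % grid for the parameter A (here: p)
prior      = ones(size(p));                     % assumed uniform prior P(A)
likelihood = p.^k .* (1 - p).^(N - k);          % P(B|A), binomial kernel
posterior  = prior .* likelihood;
posterior  = posterior / trapz(p, posterior);   % normalise by P(B)
[~, i] = max(posterior);
fprintf('posterior mode %.4f\n', p(i));         % close to k/N = 0.001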