
Probability theory and sampling theory (Kansrekening en steekproeftheorie)

Probability theory and sampling theory. Pieter van Gelder, TU Delft, IVW course, 16 September 2003.


Presentation Transcript


  1. Probability theory and sampling theory. Pieter van Gelder, TU Delft, IVW course, 16 September 2003

  2. The basics of probability theory as the foundation for the course; estimation of distribution parameters; sampling theory, working both with and without prior information (Bayesian versus classical sampling); dependencies between variables and risks.

  3. Inspection in Civil Engineering

  4. Stochastic variables

  5. Outline • What is a stochastic variable? • Probability distributions • Fast characteristics • Distribution types • Two stochastic variables • Closure

  6. Stochastic variable • Quantity that cannot be predicted exactly (uncertainty): • Natural variation • Shortage of statistical data • Schematizations • Examples: • Strength of concrete • Water level above a tunnel • Lifetime of a chisel • Throw of a die

  7. Relation to events • Express uncertainty in terms of probability • Probability theory related to events • Connect value of variable to event • E.g. probability that stochastic variable $X$ • is less than $x$ • is greater than $x$ • is equal to $x$ • is in the interval $[x, x+\Delta x]$ • etc.

  8. Probability distribution • Probability distribution function = probability $P(X \le x)$: • $F_X(x) = P(X \le x)$ [Figure: $F_X(x)$ plotted against $x$, vertical axis from 0 to 1]

  9. Probability density • Familiar form of a probability 'distribution': • This is the probability density function

  10. Probability density • Differentiation of $F$ with respect to $x$: • $f_X(x) = dF_X(x)/dx$ • $f$ = probability density function • $f_X(x)\,dx = P(x < X \le x+dx)$
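
The relation between density and distribution function can be checked numerically. The sketch below assumes a standard normal variable (an illustrative choice, not taken from the slide) and compares a central finite difference of its distribution function with the closed-form density; all function names are hypothetical.

```python
import math

def normal_cdf(x):
    """Distribution function F_X(x) = P(X <= x) of a standard normal variable."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def normal_pdf(x):
    """Closed-form probability density f_X(x) of a standard normal variable."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def density_from_cdf(cdf, x, dx=1e-5):
    """Approximate f_X(x) = dF_X(x)/dx with a central finite difference."""
    return (cdf(x + dx) - cdf(x - dx)) / (2.0 * dx)

x = 0.7  # arbitrary evaluation point (assumed)
approx = density_from_cdf(normal_cdf, x)
exact = normal_pdf(x)

# For a small interval, f_X(x) * dx approximates P(x < X <= x + dx).
interval_width = 0.01
interval_prob = normal_cdf(x + interval_width) - normal_cdf(x)

print(approx, exact, interval_prob, exact * interval_width)
```

The finite difference and the closed-form density should agree closely, and the interval probability should be close to the density times the interval width.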

  11. [Figure: upper panel, $F_X(x)$ with $P(X \le x)$ marked at $x$; lower panel, $f_X(x)$ with $P(x < X \le x+dx)$ marked between $x$ and $x+dx$]

  12. [Figure: upper panel, $F_X(x)$ with $P(X \le x)$ marked at $x$; lower panel, $f_X(x)$ against $x$]

  13. Discrete and continuous [Figure: four panels. Discrete variable: probability density $p_X(x)$ and (cumulative) probability distribution $F_X(x)$, $x$ from 0 to 6. Continuous variable: probability density $f_X(x)$ and (cumulative) probability distribution $F_X(x)$, $x$ from -4 to 6]

  14. Fast characteristics • $\mu_X$: mean, indication of location • $\sigma_X$: standard deviation, indication of spread [Figure: $f_X(x)$ against $x$ with $\mu_X$ and $\sigma_X$ marked]

  15. Fast characteristics • Mean • Location of the maximum (mode) [Figure: $f_X(x)$ against $x$ with $\mu_X$ and $\sigma_X$ marked]

  16. Fast characteristics • Mean • (centre of gravity) • Variance • Standard deviation • Coefficient of variation
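
For reference, the standard textbook definitions of these four quantities for a continuous variable with density $f_X$ are sketched below. The symbol $V_X$ for the coefficient of variation is an assumed notation, not taken from the slides.

```latex
\begin{align*}
\mu_X      &= \int_{-\infty}^{\infty} x \, f_X(x) \, dx
           && \text{mean (centre of gravity)} \\
\sigma_X^2 &= \int_{-\infty}^{\infty} (x - \mu_X)^2 \, f_X(x) \, dx
           && \text{variance} \\
\sigma_X   &= \sqrt{\sigma_X^2}
           && \text{standard deviation} \\
V_X        &= \frac{\sigma_X}{\mu_X}
           && \text{coefficient of variation}
\end{align*}
```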

  17. Normal distribution • Completely determined by mean and standard deviation [Figure: normal distributions, $f_X(x)$ against $x$ from -4 to 6, with $\mu_X$ and $\sigma_X$ marked]

  18. Normal distribution • Probability density function • Standard normally distributed variable • (often denoted by u):
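
The two expressions referred to on this slide are, in their standard textbook form, the normal density with mean $\mu_X$ and standard deviation $\sigma_X$, and the standardised variable $u$. They are given here as a reference under that assumption.

```latex
\begin{align*}
f_X(x) &= \frac{1}{\sigma_X \sqrt{2\pi}}
          \exp\!\left[ -\frac{(x - \mu_X)^2}{2 \sigma_X^2} \right] \\
u      &= \frac{X - \mu_X}{\sigma_X},
\qquad
f_U(u) = \frac{1}{\sqrt{2\pi}} \exp\!\left( -\frac{u^2}{2} \right)
\end{align*}
```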

  19. Normal distribution • Why so popular? • Central limit theorem: • The sum of many independent variables with arbitrary distributions (with finite variance) is (almost) normally distributed. • Convenient in structural reliability calculations
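
The central limit theorem can be illustrated with a small simulation. The sketch below sums 30 independent uniform variables (both the number of terms and the uniform distribution are assumptions for illustration) and checks that the sums behave like a normal variable: about 68% of them should fall within one standard deviation of the mean.

```python
import random
import statistics

random.seed(0)
N_TERMS = 30        # number of variables in each sum (assumed)
N_SAMPLES = 20_000  # number of simulated sums (assumed)

# Each sum adds N_TERMS independent uniform(0, 1) variables.
sums = [sum(random.random() for _ in range(N_TERMS)) for _ in range(N_SAMPLES)]

# Theory: a uniform(0, 1) variable has mean 1/2 and variance 1/12.
mean_theory = N_TERMS * 0.5
std_theory = (N_TERMS / 12.0) ** 0.5

mean_sim = statistics.fmean(sums)
std_sim = statistics.pstdev(sums)

# A normal variable has about 68.3% of its mass within one standard deviation.
within_one_std = sum(abs(s - mean_theory) <= std_theory for s in sums) / N_SAMPLES

print(mean_sim, std_sim, within_one_std)
```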

  20. Two stochastic variables • Joint probability density function

  21. [Figure: contour map of the joint probability density, $x$ and $y$ from -2 to 2]

  22. Two stochastic variables • Relation to events [Figure: labels $d\eta$ and $d\xi$; the remaining content is not recoverable]

  23. Example • Health survey. • Measurements of: • Length • Weight [Figures: probability density (1/m) against length (m), from 1.2 to 2.6 m; probability density (1/kg) against weight (kg), from 50 to 110 kg]

  24. Logical contour map? [Figure: contour map, weight (kg) from 50 to 110 against length (m) from 1.4 to 2.2]

  25. Dependency [Figure: contour map, weight (kg) from 50 to 110 against length (m) from 1.4 to 2.2]

  26. Fast characteristics • Location: $\mu_X$, $\mu_Y$ (means) • Spread: $\sigma_X$, $\sigma_Y$ (standard deviations) • Dependency: • $\mathrm{cov}_{XY}$ (covariance) • $\rho_{XY} = \mathrm{cov}_{XY} / (\sigma_X \sigma_Y)$ (correlation, between -1 and 1)
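
As an illustration of covariance and correlation, the sketch below simulates a length and weight data set in the spirit of the health survey example. The means, spreads, and the linear relation between weight and length are invented for the example and are not the survey's data.

```python
import random
import statistics

random.seed(2)
n = 20_000

# Hypothetical model: length in m, weight in kg depending linearly on length plus noise.
length = [random.gauss(1.8, 0.1) for _ in range(n)]
weight = [80.0 + 100.0 * (l - 1.8) + random.gauss(0.0, 8.0) for l in length]

def covariance(xs, ys):
    """Population covariance: mean of (x - mean_x) * (y - mean_y)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

def correlation(xs, ys):
    """Correlation: covariance divided by the product of standard deviations."""
    return covariance(xs, ys) / (statistics.pstdev(xs) * statistics.pstdev(ys))

cov_xy = covariance(length, weight)
rho_xy = correlation(length, weight)
print(cov_xy, rho_xy)
```

Under these assumed parameters the theoretical covariance is 100 × 0.1² = 1.0 m·kg and the correlation is about 0.78, so the simulated values should land close to those.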

  27. Independent variables
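
In the standard definition, two variables are independent when their joint density factorises into the marginal densities. A consequence is that covariance and correlation vanish; the converse does not hold in general.

```latex
\begin{align*}
f_{XY}(x, y) &= f_X(x)\, f_Y(y) \quad \text{for all } x, y \\
\Rightarrow \quad \mathrm{cov}_{XY} &= 0, \qquad \rho_{XY} = 0
\end{align*}
```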

  28. Closure of the short Introduction to Stochastics • What is a stochastic variable? • Probability distributions • Fast characteristics • Distribution types • Two stochastic variables

  29. Parameter estimation methods • Given a dataset $x_1, x_2, \ldots, x_n$ • Given a distribution type $F(x \mid A, B, \ldots)$ • How to estimate the unknown parameters $A, B, \ldots$ from the data?

  30. List of estimation methods • MoM (method of moments) • ML (maximum likelihood) • LS (least squares) • Bayes

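  31. MoM • Distribution moments = sample moments: $\int x^k f(x)\,dx = \frac{1}{n}\sum_{i=1}^{n} x_i^k$ • Example: $F(x) = 1 - \exp[-(x - A)/B]$ has mean $A + B$ and standard deviation $B$, so $B_{MOM} = \mathrm{std}(x)$ and $A_{MOM} = \mathrm{mean}(x) - \mathrm{std}(x)$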
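
For the shifted exponential distribution $F(x) = 1 - \exp[-(x - A)/B]$ the mean is $A + B$ and the standard deviation is $B$, so matching the first two moments gives $B = \mathrm{std}(x)$ and $A = \mathrm{mean}(x) - \mathrm{std}(x)$. The sketch below tries this on simulated data; the true parameter values and the function name are assumptions for illustration.

```python
import random
import statistics

def mom_shifted_exponential(data):
    """Method-of-moments estimates (A, B) for F(x) = 1 - exp(-(x - A) / B)."""
    b_hat = statistics.pstdev(data)            # standard deviation equals B
    a_hat = statistics.fmean(data) - b_hat     # mean equals A + B
    return a_hat, b_hat

random.seed(1)
A_TRUE, B_TRUE = 2.0, 3.0  # hypothetical true parameters

# expovariate takes the rate 1/B; adding A shifts the distribution.
sample = [A_TRUE + random.expovariate(1.0 / B_TRUE) for _ in range(50_000)]

a_hat, b_hat = mom_shifted_exponential(sample)
print(f"A_MOM = {a_hat:.3f}, B_MOM = {b_hat:.3f}")
```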

  32. Binomial distribution • $X \sim \mathrm{Bin}(N, p)$ • The binomial distribution gives the discrete probability distribution of obtaining exactly $n$ successes out of $N$ Bernoulli trials (where the result of each Bernoulli trial is true with probability $p$ and false with probability $q = 1 - p$). The binomial distribution is therefore given by • $f_X(n) = \binom{N}{n} p^n q^{N-n}$

  33. E(X) = Np; var(X)=Npq
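
The mean and variance of the binomial distribution can be verified directly from its probability function. The sketch below uses the standard binomial formula with example values N = 20 and p = 0.3 (chosen for illustration) and compares the results with Np and Npq.

```python
import math

def binomial_pmf(n, N, p):
    """Probability of exactly n successes in N Bernoulli trials."""
    return math.comb(N, n) * p**n * (1.0 - p) ** (N - n)

N, p = 20, 0.3  # example values (assumed)
q = 1.0 - p

pmf = [binomial_pmf(n, N, p) for n in range(N + 1)]

total = sum(pmf)                                               # should be 1
mean = sum(n * f for n, f in enumerate(pmf))                   # should be N p
variance = sum((n - mean) ** 2 * f for n, f in enumerate(pmf)) # should be N p q

print(total, mean, N * p, variance, N * p * q)
```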

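  34. MoM-estimator of p • $p_{MOM} = \sum x_i / N$, with $x_i = 1$ for a success and 0 otherwise • Simulation of the estimator (MATLAB):

      % p, N (trials per experiment) and M (number of experiments) must be set beforehand
      for j = 1:M
        x = zeros(1, N);                 % reset the trial outcomes for each experiment
        for i = 1:N
          if rand(1) < p, x(i) = 1; end  % success with probability p
        end
        y(j) = sum(x);                   % number of successes in experiment j
      end
      pMOM = y / N;                      % MoM estimate of p for each experiment
      hist(pMOM)                         % spread of the estimates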

  35. Case Study • Web traffic statistics • The number of pageviews on websites

  36. Statistics on Usage of Screen sizes • Is it necessary to download from every user his/her screen size? • Is it sufficient to inspect the screen size of just N users, and still have a reliable percentage of the used screen sizes?

  37. Assume 41% of the complete population uses size 1024x768 • Inspection sample size N = 100, 1000, … and simulate the results by generating the usage from a binomial distribution • Theoretical analysis: $\mathrm{CoV} = \sqrt{1/p - 1}\; N^{-1/2}$
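
The theoretical coefficient of variation of the estimated fraction, $\sqrt{1/p - 1}\,N^{-1/2}$, can be compared with a simulation. The sketch below is written in Python and uses p = 0.41 from the slide; the number of repeated experiments (2000) and the function names are assumptions.

```python
import math
import random
import statistics

def simulated_cov(p, n_users, n_experiments, rng):
    """Coefficient of variation of the estimated fraction over repeated samples."""
    estimates = []
    for _ in range(n_experiments):
        hits = sum(rng.random() < p for _ in range(n_users))  # users with this size
        estimates.append(hits / n_users)
    return statistics.pstdev(estimates) / statistics.fmean(estimates)

def theoretical_cov(p, n_users):
    """CoV = sqrt(1/p - 1) * N^(-1/2)."""
    return math.sqrt(1.0 / p - 1.0) / math.sqrt(n_users)

rng = random.Random(3)
p = 0.41
results = {
    n: (simulated_cov(p, n, 2000, rng), theoretical_cov(p, n))
    for n in (100, 1000)
}
for n, (sim, theo) in results.items():
    print(f"N = {n}: simulated CoV = {sim:.4f}, theoretical CoV = {theo:.4f}")
```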

  38. Coefficient of variation (as a function of p and N)

      p        N = 100   N = 1000   N = 10 000   N = 10^6
      41.4%    11.75%    3.7%       1.2%         0.1%
      39.8%    12.3%     3.9%       1.3%         0.1%
      6.2%     38.9%     12.3%      3.9%         0.4%
      5.4%     41.8%     13.2%      4.2%         0.4%
      3.2%     55.0%     17.4%      5.5%         0.55%

  39. Optimisation of the inspection sample size • Assume the cost of getting screen size information from one user is $A$ • Assume the cost of a larger CoV value is $B$ per unit of CoV • $TC(N) = A N + B \sqrt{1/p - 1}\; N^{-1/2}$ • The optimal sample size follows from $TC'(N) = 0$, giving $N^* = \left(\frac{B}{2A}\right)^{2/3} (1/p - 1)^{1/3}$ • For this choice of $N$, the CoV is $\left(\frac{2A}{B}\,(1/p - 1)\right)^{1/3}$
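
Setting the derivative of $TC(N) = A N + B \sqrt{1/p - 1}\,N^{-1/2}$ to zero gives $N^{3/2} = \frac{B}{2A}\sqrt{1/p - 1}$, hence $N^* = (B/2A)^{2/3} (1/p - 1)^{1/3}$. The sketch below checks this closed form against a brute-force search over integer sample sizes; the cost values A and B are hypothetical.

```python
def total_cost(n, a, b, p):
    """TC(N) = A * N + B * sqrt(1/p - 1) * N^(-1/2)."""
    return a * n + b * (1.0 / p - 1.0) ** 0.5 * n ** -0.5

def optimal_sample_size(a, b, p):
    """Closed-form minimiser of TC(N), from TC'(N) = 0."""
    return (b / (2.0 * a)) ** (2.0 / 3.0) * (1.0 / p - 1.0) ** (1.0 / 3.0)

A_COST, B_COST, P = 1.0, 1.0e5, 0.41  # hypothetical costs; p as in the case study

n_star = optimal_sample_size(A_COST, B_COST, P)

# Brute-force check over integer sample sizes.
best_n = min(range(1, 20_001), key=lambda n: total_cost(n, A_COST, B_COST, P))

# Coefficient of variation at the optimum, computed two ways.
cov_at_optimum = (1.0 / P - 1.0) ** 0.5 * n_star ** -0.5
cov_closed_form = (2.0 * A_COST / B_COST * (1.0 / P - 1.0)) ** (1.0 / 3.0)

print(n_star, best_n, cov_at_optimum, cov_closed_form)
```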

  40. Case study: container inspection • Admissible 'escape probability' p = 1/1,000 containers • Population consists of 100,000 containers • Inspection consists of checking 1,000 containers • Suppose 1 container from this sample is rejected • Then $p_{MOM} = 0.001$, while $\sqrt{pq} = 0.0316$, so $\mathrm{std}(p_{MOM}) = \sqrt{pq}\,\sqrt{1/N} \approx 0.001$, as large as the estimate itself • If $\mathrm{std}(p_{MOM})$ has to be well below 0.001, this leads to inspection of the full population ($\mathrm{std}(p_{MOM}) = 0.0001$ already requires $N \approx 100{,}000$)
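
The relation $\mathrm{std}(p_{MOM}) = \sqrt{pq}\,\sqrt{1/N}$ can be turned around to ask how many containers must be inspected for a given precision. The sketch below does this for p = 0.001, ignoring any finite-population correction (an assumption); the target precisions and function names are illustrative.

```python
import math

def std_of_estimate(p, n):
    """Standard deviation of the estimated fraction: sqrt(p q) * sqrt(1 / N)."""
    return math.sqrt(p * (1.0 - p) / n)

def required_sample_size(p, target_std):
    """Smallest N with sqrt(p q / N) <= target_std (no finite-population correction)."""
    return math.ceil(p * (1.0 - p) / target_std**2)

p = 0.001
std_sample = std_of_estimate(p, 1_000)        # precision with 1,000 inspections
n_for_tenth = required_sample_size(p, 0.0001) # N for a std ten times smaller than p

print(f"std for N = 1000: {std_sample:.6f}")
print(f"N needed for std = 0.0001: {n_for_tenth}")
```

With 1,000 inspections the standard deviation is about as large as p itself, and a ten times smaller standard deviation needs roughly the whole population of 100,000 containers.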

  41. Inspection of the full population (for small p-values) • Inspection costs must be recovered from the revenue from fines • Inspection costs: 100,000 x K/C • Revenue without inspection: NI (Negative Impact) • Revenue with inspection: p x 100,000 x fine – 100,000 x K/C • Condition: p x 100,000 x fine – 100,000 x K/C > NI

  42. Bayesian statistics • P(A|B)=P(A and B)/P(B) • P(A|B)=P(B|A)P(A)/P(B) • A = parameters • B = data
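
Bayes' rule with A = parameters and B = data can be sketched on a grid of candidate parameter values. The example below reuses the container numbers (1 rejection in 1,000 inspections) and assumes a uniform prior on p between 0.00001 and 0.02; the prior, the grid, and the function name are illustrative assumptions, not the course's method.

```python
import math

def grid_posterior(k, n, grid):
    """Posterior P(p | data) on a grid, for k rejections in n inspections.

    Bayes: P(A|B) = P(B|A) P(A) / P(B), with A = parameter p and B = data.
    """
    prior = [1.0 / len(grid)] * len(grid)                      # P(A): uniform (assumed)
    likelihood = [math.comb(n, k) * p**k * (1.0 - p) ** (n - k)  # P(B|A): binomial
                  for p in grid]
    joint = [l * pr for l, pr in zip(likelihood, prior)]       # P(A and B)
    evidence = sum(joint)                                      # P(B)
    return [j / evidence for j in joint]

grid = [i / 100_000 for i in range(1, 2001)]  # candidate p values 0.00001 .. 0.02
posterior = grid_posterior(k=1, n=1000, grid=grid)

posterior_mean = sum(p * w for p, w in zip(grid, posterior))
posterior_mode = grid[posterior.index(max(posterior))]

print(posterior_mode, posterior_mean)
```

With a uniform prior the posterior mode coincides with the MoM estimate k/n, while the posterior mean is pulled towards larger p because the posterior is skewed.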
