
Modelling uncertainty



  1. Modelling uncertainty Dr hab. inż. Joanna Józefowska, prof. PP

  2. Probability of an event • Classical method: if an experiment has n possible outcomes, assign a probability of 1/n to each experimental outcome. • Relative frequency method: probability is the relative frequency of the number of events satisfying the constraints. • Subjective method: probability is a number characterising the likelihood of an event – a degree of belief.

  3. Axioms of the probability theory Axiom I: The probability value assigned to each experimental outcome must be between 0 and 1. Axiom II: The sum of all the experimental outcome probabilities must be 1.

  4. Conditional probability denoted by P(A|B) expresses the belief that event A is true assuming that event B is true (events A and B are dependent). Definition: Let the probability of event B be positive. The conditional probability of event A under condition B is calculated as follows: P(A|B) = P(A ∩ B) / P(B).
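The definition above can be checked with a small sketch; the fair-die experiment below is a hypothetical illustration, not from the slides.

```python
# Conditional probability P(A|B) = P(A and B) / P(B),
# illustrated on a hypothetical roll of a fair die.
from fractions import Fraction

outcomes = range(1, 7)                   # sample space of one die roll
A = {x for x in outcomes if x % 2 == 0}  # event A: even result
B = {x for x in outcomes if x > 3}       # event B: result greater than 3

p_B = Fraction(len(B), 6)                # classical method: each outcome 1/6
p_A_and_B = Fraction(len(A & B), 6)
p_A_given_B = p_A_and_B / p_B            # P(A|B) = P(A ∩ B) / P(B)

print(p_A_given_B)  # 2 of the 3 outcomes in B are even -> 2/3
```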

  5. Joint probability If events A1, A2, ... are mutually exclusive and cover the sample space Ω, and P(Ai) > 0 for i = 1, 2, ..., then for any event B the following equality holds: P(B) = Σi P(Ai) P(B|Ai).

  6. Bayes’ Theorem Thomas Bayes (1701-1761) If the events A1, A2, ... fulfil the assumptions of the joint probability theorem, and P(B) > 0, then for i = 1, 2, ... the following equality holds: P(Ai|B) = P(Ai) P(B|Ai) / Σj P(Aj) P(B|Aj).

  7. Bayes’ Theorem Prior probabilities + new information → (Bayes’ theorem) → posterior probabilities. Let us denote: H – hypothesis, E – evidence. Bayes’ rule then has the form: P(H|E) = P(E|H) P(H) / P(E).
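A minimal numeric sketch of the rule; the prior and likelihood values below are illustrative assumptions, not from the slides.

```python
# Bayes' rule P(H|E) = P(E|H)P(H) / P(E), with P(E) expanded over H and not-H
# by the total probability formula. All numbers are made up for illustration.
p_H = 0.01            # prior probability of the hypothesis
p_E_given_H = 0.9     # likelihood of the evidence if H is true
p_E_given_notH = 0.05 # likelihood of the evidence if H is false

p_E = p_E_given_H * p_H + p_E_given_notH * (1 - p_H)  # total probability
p_H_given_E = p_E_given_H * p_H / p_E                 # posterior

print(round(p_H_given_E, 4))  # 0.1538
```

Even strong evidence (0.9 vs. 0.05) lifts a 1% prior only to about 15%, which is the usual lesson of the rule.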

  8. Difficulties with joint probability distribution (tabular approach) • the joint probability distribution has to be defined and stored in memory • high computational effort required to calculate marginal and conditional probabilities

  9. With n binary variables the joint distribution has 2^n sample points (probabilities), e.g. the table of P(B,M).

  10. Certainty factor • Buchanan, Shortliffe 1975 • Model developed for the rule-based expert system MYCIN If E then H, where E is the evidence (observation) and H is the hypothesis.

  11. Belief • MB[H, E] – measure of the increase of belief that H is true based on observation E: MB[H, E] = 1 if P(H) = 1, otherwise MB[H, E] = (max(P(H|E), P(H)) – P(H)) / (1 – P(H)).

  12. Disbelief • MD[H, E] – measure of the increase of disbelief that H is true based on observation E: MD[H, E] = 1 if P(H) = 0, otherwise MD[H, E] = (P(H) – min(P(H|E), P(H))) / P(H).

  13. Certainty factor CF[H, E] = MB[H, E] – MD[H, E], CF ∈ [–1, 1]

  14. Interpretation of the certainty factor A certainty factor is associated with a rule: if evidence then hypothesis (E → H with factor CF(H, E)), and denotes the change in belief that H is true after observing E.

  15. Uncertainty propagation – parallel rules Two rules E1 → H (with CF(H, E1)) and E2 → H (with CF(H, E2)) supporting the same hypothesis combine into a single factor CF(H, E1&E2).

  16. Uncertainty propagation – serial rules A chain E1 → E2 (with CF(E2, E1)) and E2 → H (with CF(H, E2)) reduces to E1 → H with CF(H, E1) = CF(H, E2) · max(0, CF(E2, E1)). If CF(H, E2) is not defined, it is assumed to be 0.
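Both propagation schemes can be sketched in Python. The formula images are not in the transcript, so the combination functions below are the standard MYCIN ones from the literature, not copied from the slides.

```python
# Certainty-factor propagation (MYCIN-style combination functions).

def cf_serial(cf_h_e2, cf_e2_e1):
    """Serial rules E1 -> E2 -> H: attenuate CF(H,E2) by the
    certainty of E2; a disbelieved premise contributes nothing."""
    return cf_h_e2 * max(0.0, cf_e2_e1)

def cf_parallel(cf1, cf2):
    """Two parallel rules supporting the same H with factors cf1, cf2."""
    if cf1 >= 0 and cf2 >= 0:
        return cf1 + cf2 - cf1 * cf2
    if cf1 < 0 and cf2 < 0:
        return cf1 + cf2 + cf1 * cf2
    return (cf1 + cf2) / (1 - min(abs(cf1), abs(cf2)))

print(round(cf_parallel(0.6, 0.4), 2))  # two confirming rules -> 0.76
print(round(cf_serial(0.8, 0.5), 2))    # chained rule -> 0.4
```

Note that parallel combination is commutative and never leaves [–1, 1], which is what makes incremental evidence accumulation well defined.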

  17. Certainty factor – probabilistic definition Heckerman 1986: CF(H, E) = (P(H|E) – P(H)) / (1 – P(H)) if P(H|E) ≥ P(H), and CF(H, E) = (P(H|E) – P(H)) / P(H) otherwise.

  18. Certainty measure Grzymała-Busse 1991 A rule E → H with factor CF(H, E) combines the certainty C(E) of the evidence with the current certainty C(H) of the hypothesis.

  19. Example 1 C(s1) = 0.2, C(s2) = –0.1, C(h) = 0.3, CF(h, s1 ∧ s2) = 0.4 C(s1 ∧ s2) = min(0.2; –0.1) = –0.1 Since the certainty of the premise is negative, it is truncated to 0: CF’(h, s1 ∧ s2) = 0.4 · 0 = 0 C’(h) = 0.3 + (1 – 0.3) · 0 = 0.3

  20. Example 2 C(s1) = 0.2, C(s2) = 0.8, C(h) = 0.3, CF(h, s1 ∧ s2) = 0.4 C(s1 ∧ s2) = min(0.2; 0.8) = 0.2 CF’(h, s1 ∧ s2) = 0.4 · 0.2 = 0.08 C’(h) = 0.3 + (1 – 0.3) · 0.08 = 0.3 + 0.7 · 0.08 = 0.356
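Both examples follow the same update steps, which can be reproduced with a short sketch: the premise certainty is the minimum over the conjunction, a negative premise certainty is truncated to 0, and the conclusion is updated with C’(h) = C(h) + (1 – C(h)) · CF’.

```python
# Certainty-measure update for a rule  s1 AND s2 -> h  with factor cf_rule.

def update(c_h, cf_rule, *premise_certainties):
    c_premise = min(premise_certainties)        # conjunction -> minimum
    cf_effective = cf_rule * max(0.0, c_premise)  # negative premise -> 0
    return c_h + (1 - c_h) * cf_effective       # revised certainty of h

print(update(0.3, 0.4, 0.2, -0.1))  # Example 1: rule does not fire -> 0.3
print(round(update(0.3, 0.4, 0.2, 0.8), 3))  # Example 2 -> 0.356
```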

  21. Dempster-Shafer theory Each hypothesis is characterised by two values: belief and plausibility. The theory models not only belief, but also the amount of acquired information.

  22. Basic probability assignment (mass function) m: 2^Ω → [0, 1], with m(∅) = 0 and ΣA⊆Ω m(A) = 1.

  23. Belief Belief Bel ∈ [0, 1] measures the amount of acquired information supporting the belief that the considered set of hypotheses is true: Bel(A) = ΣB⊆A m(B).

  24. Plausibility Plausibility Pl ∈ [0, 1] measures how much the belief that A is true is limited by the evidence supporting the complement of A: Pl(A) = 1 – Bel(Ā) = ΣB∩A≠∅ m(B).

  25. Combining various sources of evidence Assume two sources of evidence: X and Y, represented by respective subsets of Ω: X1, ..., Xm and Y1, ..., Yn. Mass functions m1 and m2 are defined on X and Y respectively. Combining observations from the two sources, a new value m3(Z) is calculated for each subset Z of Ω as follows: m3(Z) = ΣXi∩Yj=Z m1(Xi) m2(Yj) / (1 – ΣXi∩Yj=∅ m1(Xi) m2(Yj)).

  26. Example A – allergy, F – flu, C – cold, P – pneumonia; Ω = {A, F, C, P} Initially m1(Ω) = 1. Observation 1: m2({A, F, C}) = 0.6, m2(Ω) = 0.4. Combination: m3({A, F, C}) = 0.6, m3(Ω) = 0.4.

  27. Example Observation 2: m4({F, C, P}) = 0.8, m4(Ω) = 0.2. Combining with m3({A, F, C}) = 0.6, m3(Ω) = 0.4: m5({F, C}) = 0.48, m5({A, F, C}) = 0.12, m5({F, C, P}) = 0.32, m5(Ω) = 0.08.

  28. Example Observation 3: m6({A}) = 0.75, m6(Ω) = 0.25. Combining with m5 gives the raw products: m7({A}) = 0.09 + 0.06 = 0.15, m7({F, C}) = 0.12, m7({A, F, C}) = 0.03, m7({F, C, P}) = 0.08, m7(Ω) = 0.02, and a conflict mass m7(∅) = 0.36 + 0.24 = 0.6 on the empty set.


  30. Example After normalisation by 1 – m7(∅) = 0.4: m7({A}) = 0.15 / 0.4 = 0.375, m7({F, C}) = 0.3, m7({A, F, C}) = 0.075, m7({F, C, P}) = 0.2, m7(Ω) = 0.05. The resulting [Bel, Pl] intervals: {A}: [0.375, 0.500] (Pl = 1 – 0.3 – 0.2), {F}: [0, 0.625] (Pl = 1 – 0.375), {C}: [0, 0.625], {P}: [0, 0.250] (Pl = 1 – 0.375 – 0.3 – 0.075).
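The whole sequence of combinations in the allergy/flu/cold/pneumonia example can be reproduced with a straightforward implementation of Dempster's rule, representing focal sets as frozensets over Ω = {A, F, C, P}:

```python
# Dempster's rule of combination: intersect focal sets pairwise,
# accumulate products of masses, then renormalise away the conflict
# (the mass that fell on the empty set).

def combine(m1, m2):
    raw = {}
    for x, px in m1.items():
        for y, py in m2.items():
            z = x & y
            raw[z] = raw.get(z, 0.0) + px * py
    conflict = raw.pop(frozenset(), 0.0)        # mass on the empty set
    return {z: p / (1 - conflict) for z, p in raw.items()}

Omega = frozenset("AFCP")
m = {Omega: 1.0}                                     # no information yet
m = combine(m, {frozenset("AFC"): 0.6, Omega: 0.4})  # observation 1
m = combine(m, {frozenset("FCP"): 0.8, Omega: 0.2})  # observation 2
m = combine(m, {frozenset("A"): 0.75, Omega: 0.25})  # observation 3

print(round(m[frozenset("A")], 3))   # 0.375
print(round(m[frozenset("FC")], 3))  # 0.3

# Plausibility of {A}: mass of every focal set intersecting {A}.
pl_A = sum(p for z, p in m.items() if z & frozenset("A"))
print(round(pl_A, 3))                # 0.5, i.e. the interval [0.375, 0.500]
```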

  31. Fuzzy sets (Zadeh) Rough sets (Pawlak)

  32. Probabilistic reasoning Network nodes: burglary, earthquake, alarm, John, Mary.

  33. Probabilistic reasoning B – burglary, E – earthquake, A – alarm, J – John calls, M – Mary calls. Joint probability distribution: P(B,E,A,J,M) = ?

  34. Joint probability distribution

  35. Probabilistic reasoning What is the probability of a burglary if Mary called? P(B=y|M=y) = ? Marginal probability: P(M=y) = ΣB,E,A,J P(B,E,A,J,M=y). Conditional probability: P(B=y|M=y) = P(B=y, M=y) / P(M=y).

  36. Advantages of probabilistic reasoning • Sound mathematical theory. • On the basis of the joint probability distribution one can reason about: • causes on the basis of the observed consequences, • consequences on the basis of given evidence, • any combination of the above. • Clear semantics based on the interpretation of probability. • The model can be learned from statistical data.

  37. Complexity of probabilistic reasoning • in the „alarm” example • 2^5 – 1 = 31 values, • direct access to unimportant information, e.g. P(B=1,E=1,A=1,J=1,M=1) • calculating any practical value, e.g. P(B=1|M=1), requires 2^9 elementary operations. • in general • P(X1, ..., Xn) requires storing 2^n – 1 values • difficult knowledge acquisition (not natural) • exponential complexity

  38. Bayes’ theorem P(A|B) = P(B|A) P(A) / P(B)

  39. Bayes’ theorem A → B: B depends on A, quantified by P(B|A).

  40. The chain rule P(X1,X2) = P(X1) P(X2|X1) P(X1,X2,X3) = P(X1) P(X2|X1) P(X3|X1,X2) P(X1,X2,...,Xn) = P(X1) P(X2|X1) ... P(Xn|X1,...,Xn–1)
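The factorisation can be sanity-checked numerically; the joint distribution over three binary variables below is made up purely for the check.

```python
# Verify the chain rule P(x1,x2,x3) = P(x1) P(x2|x1) P(x3|x1,x2)
# on an arbitrary joint distribution over three binary variables.
import itertools
import random

random.seed(1)
raw = {xs: random.random() for xs in itertools.product((0, 1), repeat=3)}
total = sum(raw.values())
joint = {xs: w / total for xs, w in raw.items()}   # a valid distribution

def prob(prefix):
    """Marginal P(X1..Xk = prefix), summing the joint over the rest."""
    return sum(p for xs, p in joint.items() if xs[:len(prefix)] == prefix)

ok = True
for x1, x2, x3 in joint:
    p1 = prob((x1,))                            # P(x1)
    p2 = prob((x1, x2)) / p1                    # P(x2|x1)
    p3 = joint[(x1, x2, x3)] / prob((x1, x2))   # P(x3|x1,x2)
    ok = ok and abs(p1 * p2 * p3 - joint[(x1, x2, x3)]) < 1e-12
print(ok)  # True
```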

  41. Conditional independence of variables in a domain In any domain one can define a set of variables pa(Xi) ⊆ {X1, ..., Xi–1} such that Xi is independent of the variables from the set {X1, ..., Xi–1} \ pa(Xi). Thus P(Xi|X1, ..., Xi–1) = P(Xi|pa(Xi)) and P(X1, ..., Xn) = Πi=1..n P(Xi|pa(Xi)).

  42. Bayesian network Nodes B1, ..., Bn are the parents of A (each Bi directly influences A), with the conditional probability table P(A|B1, ..., Bn); A in turn is a parent of C1, ..., Cm.

  43. Example Nodes: burglary, earthquake → alarm → John calls, Mary calls.
burglary earthquake | P(alarm = true) P(alarm = false)
true true | 0.950 0.050
true false | 0.940 0.060
false true | 0.290 0.710
false false | 0.001 0.999

  44. Example Network: B, E → A → J, M. Prior probabilities: P(B) = 0.001, P(E) = 0.002.
B E | P(A)
T T | 0.950
T F | 0.940
F T | 0.290
F F | 0.001
A | P(J)
T | 0.90
F | 0.05
A | P(M)
T | 0.70
F | 0.01
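With these tables, the query from slide 35, P(B=y|M=y), can be answered by brute-force enumeration over the joint distribution that the network defines:

```python
# Inference by enumeration in the alarm network, using the CPTs
# from the slide, to compute P(Burglary = true | Mary = true).

P_B = {True: 0.001, False: 0.999}
P_E = {True: 0.002, False: 0.998}
P_A = {(True, True): 0.950, (True, False): 0.940,
       (False, True): 0.290, (False, False): 0.001}  # P(A=true | B, E)
P_J = {True: 0.90, False: 0.05}                      # P(J=true | A)
P_M = {True: 0.70, False: 0.01}                      # P(M=true | A)

def joint(b, e, a, j, m):
    """P(B,E,A,J,M) = P(B) P(E) P(A|B,E) P(J|A) P(M|A)."""
    pa = P_A[(b, e)] if a else 1 - P_A[(b, e)]
    pj = P_J[a] if j else 1 - P_J[a]
    pm = P_M[a] if m else 1 - P_M[a]
    return P_B[b] * P_E[e] * pa * pj * pm

tf = (True, False)
num = sum(joint(True, e, a, j, True) for e in tf for a in tf for j in tf)
den = sum(joint(b, e, a, j, True)
          for b in tf for e in tf for a in tf for j in tf)
print(round(num / den, 4))  # 0.0561
```

So a call from Mary alone raises the probability of a burglary from 0.001 to about 0.056.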

  45. Complexity of the representation • Instead of 31 values it is enough to store 10. • Easy construction of the model: • fewer parameters, • more intuitive parameters. • Easy reasoning.

  46. Bayesian networks • A Bayesian network is an acyclic directed graph in which • nodes represent formulas or variables in the considered domain, • arcs represent the dependence relation between variables, with related probability distributions.

  47. Bayesian networks A variable A with parent nodes pa(A) = {B1, ..., Bn} has the conditional probability table P(A|B1, ..., Bn), i.e. P(A|pa(A)); if pa(A) = ∅, the a priori probability equals P(A).

  48. Bayesian networks A node A with parents pa(A) = {B1, B2, ..., Bn} stores the table P(A|B1, B2, ..., Bn):
B1 ... Bn | P(A|B1, ..., Bn)
T ... T | 0.18
T ... F | 0.12
.................................
F ... F | 0.28
An event Bi with no predecessors (pa(Bi) = ∅) stores the a priori probability P(Bi).

  49. Local semantics of Bayesian network • Only direct dependence relations between variables. • Local conditional probability distributions. • Assumption about conditional independence of variables not linked in the graph.

  50. Global semantics of Bayesian network The joint probability distribution is given implicitly. It can be calculated using the following rule: P(X1, ..., Xn) = Πi=1..n P(Xi|pa(Xi)).
