Possibility Theory and its applications: a retrospective and prospective view D. Dubois, H. Prade IRIT-CNRS, Université Paul Sabatier 31062 TOULOUSE FRANCE
Outline • Basic definitions • Pioneers • Qualitative possibility theory • Quantitative possibility theory
Possibility theory is an uncertainty theory devoted to the handling of incomplete information. • similar to probability theory because it is based on set-functions. • differs by the use of a pair of dual set functions (possibility and necessity measures) instead of only one. • it is not additive and makes sense on ordinal structures. The name "Theory of Possibility" was coined by Zadeh in 1978
The concept of possibility • Feasibility: it is possible to do something (physical) • Plausibility: it is possible that something occurs (epistemic) • Consistency: compatible with what is known (logical) • Permission: it is allowed to do something (deontic)
POSSIBILITY DISTRIBUTIONS (uncertainty) • S: frame of discernment (set of "states of the world") • x: ill-known description of the current state of affairs taking its value on S • L: plausibility scale: totally ordered set of plausibility levels ([0,1], finite chain, integers, ...) • A possibility distribution πx attached to x is a mapping from S to L: ∀s, πx(s) ∈ L, such that ∃s, πx(s) = 1 (normalization) • Conventions: πx(s) = 0 iff x = s is impossible, totally excluded; πx(s) = 1 iff x = s is normal, fully plausible, unsurprizing
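As an illustration (not from the slides), here is a minimal sketch of a possibility distribution over a small hypothetical frame, taking L = [0, 1]; all state names and levels are made up:

```python
# Minimal sketch: a possibility distribution as a dict from states to [0, 1].
# States and levels are hypothetical.
S = ["s1", "s2", "s3", "s4"]
pi = {"s1": 1.0, "s2": 0.7, "s3": 0.2, "s4": 0.0}

# Normalization: at least one state must be fully plausible.
assert max(pi.values()) == 1.0, "pi is not normalized"

# Conventions: level 0 = impossible, level 1 = normal / unsurprizing.
print([s for s in S if pi[s] == 0.0])  # ['s4']: totally excluded
print([s for s in S if pi[s] == 1.0])  # ['s1']: fully plausible
```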
EXAMPLE: x = AGE OF PRESIDENT • If I do not know the age of the president, I may have statistics on presidents' ages… but generally not, or they may be irrelevant. • partial ignorance: 70 ≤ x ≤ 80 (sets, intervals): a uniform possibility distribution π(x) = 1 if x ∈ [70, 80], = 0 otherwise • partial ignorance with preferences: may have reasons to believe that 72 > 71, 73 > 70, 74 > 75 > 76 > 77 (in decreasing order of plausibility)
EXAMPLE: x = AGE OF PRESIDENT • Linguistic information described by fuzzy sets: "he is old": π = µOLD • If I bet on the president's age, I may come up with a subjective probability! But this result is enforced by the setting of exchangeable bets (Dutch book argument). Actual information is often poorer.
A possibility distribution is the representation of a state of knowledge: a description of how we think the state of affairs is. • π' is more specific than π in the wide sense if and only if π' ≤ π. In other words: any value possible for π' should be at least as possible for π, that is, π' is more informative than π • COMPLETE KNOWLEDGE: the most specific ones: π(s₀) = 1; π(s) = 0 otherwise • IGNORANCE: π(s) = 1, ∀s ∈ S
POSSIBILITY AND NECESSITY OF AN EVENT • A possibility distribution π on S (the normal values of x) • an event A: how confident are we that x ∈ A ⊆ S? • Π(A) = max{π(s) : s ∈ A}: the degree of possibility that x ∈ A • N(A) = 1 – Π(Aᶜ) = min{1 – π(s) : s ∉ A}: the degree of certainty (necessity) that x ∈ A
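A sketch of the two dual set functions on a small hypothetical frame; it also lets one check the maxitivity and minitivity properties of the "Basic properties" slide below:

```python
def poss(A, pi):
    """Degree of possibility of event A: max of pi over A (0 if A is empty)."""
    return max((pi[s] for s in A), default=0.0)

def nec(A, pi, S):
    """Degree of necessity of A: 1 - possibility of the complement of A."""
    return 1.0 - poss(S - A, pi)

S = {"s1", "s2", "s3", "s4"}
pi = {"s1": 1.0, "s2": 0.7, "s3": 0.2, "s4": 0.0}  # hypothetical values
A, B = {"s1", "s2"}, {"s2", "s3"}

print(poss(A, pi), nec(A, pi, S))  # 1.0 0.8
# Maxitivity of Poss on unions, minitivity of N on intersections:
assert poss(A | B, pi) == max(poss(A, pi), poss(B, pi))
assert nec(A & B, pi, S) == min(nec(A, pi, S), nec(B, pi, S))
# The corollary N(A) > 0 => Poss(A) = 1 holds here: nec(A) = 0.8, poss(A) = 1.
```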
Comparing the value of a quantity x to a threshold θ when the value of x is only known to belong to an interval [a, b]. • In this example, the available knowledge is modeled by π(x) = 1 if x ∈ [a, b], 0 otherwise. • Proposition p = "x > θ" to be checked • i) a > θ: then x > θ is certainly true: N(x > θ) = Π(x > θ) = 1. • ii) b < θ: then x > θ is certainly false: N(x > θ) = Π(x > θ) = 0. • iii) a ≤ θ ≤ b: then x > θ is possibly true or false: N(x > θ) = 0; Π(x > θ) = 1.
Basic properties Π(A) = to what extent at least one element in A is consistent with π (= possible) N(A) = 1 – Π(Aᶜ) = to what extent no element outside A is possible = to what extent π implies A Π(A ∪ B) = max(Π(A), Π(B)); N(A ∩ B) = min(N(A), N(B)). Mind that most of the time: Π(A ∩ B) < min(Π(A), Π(B)); N(A ∪ B) > max(N(A), N(B)) Corollary: N(A) > 0 ⇒ Π(A) = 1
Pioneers of possibility theory • In the 1950's, G.L.S. Shackle called "degree of potential surprize" of an event its degree of impossibility. • Potential surprize is valued on a disbelief scale, namely a positive interval of the form [0, y*], where y* denotes the absolute rejection of the event to which it is assigned. • The degree of surprize of an event is the degree of surprize of its least surprizing realization. • He also introduced a notion of conditional possibility.
Pioneers of possibility theory • In his 1973 book, the philosopher David Lewis considers a relation between possible worlds he calls "comparative possibility". • He relates this concept of possibility to a notion of similarity between possible worlds for defining the truth conditions of counterfactual statements. • for events A, B, C: A ≥ B implies C ∪ A ≥ C ∪ B. • These relations are the unique ordinal counterparts to possibility measures.
Pioneers of possibility theory • The philosopher L. J. Cohen considered the problem of legal reasoning (1977). • "Baconian probabilities" understood as degrees of provability. • It is hard to prove someone guilty in a court of law by means of pure statistical arguments. • A hypothesis and its negation cannot both have positive "provability" • Such degrees of provability coincide with necessity measures.
Pioneers of possibility theory • Zadeh (1978) proposed an interpretation of membership functions of fuzzy sets as possibility distributions encoding flexible constraints induced by natural language statements. • relationship between possibility and probability: what is probable must preliminarily be possible. • refers to the idea of graded feasibility ("degrees of ease") rather than to the epistemic notion of plausibility. • the key axiom of "maxitivity" for possibility measures is highlighted (also for fuzzy events).
Qualitative vs. quantitative possibility theories • Qualitative: • comparative: a complete pre-ordering ≥π on U; a well-ordered partition of U: E1 > E2 > … > En • absolute: πx(s) ∈ L = finite chain, complete lattice... • Quantitative: πx(s) ∈ [0, 1], integers... One must indicate where the numbers come from. All theories agree on the fundamental maxitivity axiom Π(A ∪ B) = max(Π(A), Π(B)) Theories diverge on the conditioning operation
Ordinal possibilistic conditioning • A Bayesian-like equation: Π(B ∩ A) = min(Π(B | A), Π(A)) Π(B | A) is the maximal solution to this equation: Π(B | A) = 1 if Π(A ∩ B) = Π(A) > 0 (A, B ≠ Ø) = Π(A ∩ B) if Π(A) > Π(A ∩ B) N(B | A) = 1 – Π(Bᶜ | A) • Independence: Π(B | A) = Π(B) implies Π(B ∩ A) = min(Π(B), Π(A)). Not the converse!
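A sketch of this min-based conditioning on a small hypothetical frame:

```python
def poss(A, pi):
    return max((pi[s] for s in A), default=0.0)

def cond_poss(B, A, pi):
    """Maximal solution of Poss(A & B) = min(Poss(B | A), Poss(A))."""
    pab, pa = poss(A & B, pi), poss(A, pi)
    if pab == pa and pa > 0:
        return 1.0      # B holds in the most plausible worlds of A
    return pab          # otherwise Poss(B | A) = Poss(A & B)

pi = {"s1": 1.0, "s2": 0.7, "s3": 0.2}  # hypothetical values
A = {"s1", "s2"}
print(cond_poss({"s1"}, A, pi))         # 1.0: Poss(A & B) = Poss(A) > 0
print(cond_poss({"s2", "s3"}, A, pi))   # 0.7: Poss(A & B) < Poss(A)
```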
QUALITATIVE POSSIBILISTIC REASONING • The set of states of affairs is partitioned via π into a totally ordered set of clusters of equally plausible states E1 (normal worlds) > E2 > … > En+1 (impossible worlds) • ASSUMPTION: the current situation is normal. By default the state of affairs is in E1 • N(A) > 0 iff Π(A) > Π(Aᶜ) iff A is true in all the normal situations. Then A is accepted as an expected truth • Accepted events are closed under deduction
A CALCULUS OF PLAUSIBLE INFERENCE Π(B) ≥ Π(C) means "comparing propositions on the basis of their most normal models" • ASSUMPTION for computing Π(B): the current situation is the most normal one where B is true. • PLAUSIBLE REASONING = "reasoning as if the current situation were normal" and jumping to accepted conclusions obtained from the normality assumption. • DIFFERENT FROM PROBABILISTIC REASONING BASED ON AVERAGING
ACCEPTANCE IS DEFEASIBLE • If B is learned to be true, then the normal situations become the most plausible ones in B, and the accepted beliefs are revised accordingly • Accepting A in the context where B is true: Π(A ∩ B) > Π(Aᶜ ∩ B) iff N(A | B) > 0 (conditioning) • One may have N(A) > 0 and N(Aᶜ | B) > 0: nonmonotonicity
PLAUSIBLE INFERENCE WITH A POSSIBILITY DISTRIBUTION Given a non-dogmatic possibility distribution π on S (π(s) > 0, ∀s) and propositions A, B: • A |=π B iff Π(A ∩ B) > Π(A ∩ Bᶜ) It means that B is true in the most plausible worlds where A is true • This is a form of inference first proposed by Shoham in nonmonotonic reasoning
Example (continued) • Pieces of knowledge like ∆ = {b → f, p → b, p → ¬f} can be expressed by constraints Π(b ∧ f) > Π(b ∧ ¬f), Π(p ∧ b) > Π(p ∧ ¬b), Π(p ∧ ¬f) > Π(p ∧ f) • the minimally specific π* ranks normal situations first: ¬p ∧ b ∧ f, ¬p ∧ ¬b • then abnormal situations: b ∧ ¬f • last, totally absurd situations: f ∧ p, ¬b ∧ p
Example (back to possibilistic logic) • → = material implication • Ranking of rules: b → f has less priority than the others according to π*: N*(p → b) = N*(p → ¬f) > N*(b → f) • Possibilistic base: K = {(b → f, α), (p → b, β), (p → ¬f, β)}, with α < β
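The following sketch encodes a numeric stand-in for π* (the three levels 1 > 0.5 > 0.2 are arbitrary placeholders for the ordinal ranking above) and checks the nonmonotonic inferences: birds fly, yet penguins, although birds, do not:

```python
from itertools import product

def pi_star(p, b, f):
    """Hypothetical numeric encoding of the ranking induced by Delta."""
    if not p and (not b or f):  # no penguin, and any bird flies: normal
        return 1.0
    if b and not f:             # non-flying bird (e.g. a penguin): abnormal
        return 0.5
    return 0.2                  # flying penguin, bird-less penguin: absurd

worlds = list(product([False, True], repeat=3))  # truth values of (p, b, f)

def poss(event):
    return max((pi_star(*w) for w in worlds if event(*w)), default=0.0)

def entails(A, B):
    """A |=pi B  iff  Poss(A and B) > Poss(A and not B)."""
    return (poss(lambda p, b, f: A(p, b, f) and B(p, b, f)) >
            poss(lambda p, b, f: A(p, b, f) and not B(p, b, f)))

print(entails(lambda p, b, f: b, lambda p, b, f: f))      # True: birds fly
print(entails(lambda p, b, f: p, lambda p, b, f: not f))  # True: penguins do not
```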
Applications of qualitative possibility theory • Exception-tolerant Reasoning in rule bases • Belief revision and inconsistency handling in deductive knowledge bases • Handling priority in constraint-based reasoning • Decision-making under uncertainty with qualitative criteria (scheduling) • Abductive reasoning for diagnosis under poor causal knowledge (satellite faults, car engine test-benches)
ABSOLUTE APPROACH TO QUALITATIVE DECISION • A set of states S • A set of consequences X • A decision = a mapping f from S to X • f(s) is the consequence of decision f when the state is known to be s. • Problem: rank-order the set of decisions in X^S when the state is ill-known and there is a utility function on X. • This is Savage's framework.
ABSOLUTE APPROACH TO QUALITATIVE DECISION • Uncertainty on states is possibilistic: a function π: S → L, where L is a totally ordered plausibility scale • Preference on consequences: a qualitative utility function µ: X → U • µ(x) = 0: totally rejected consequence • µ(y) > µ(x): y preferred to x • µ(x) = 1: preferred consequence
Possibilistic decision criteria • Qualitative pessimistic utility (Whalen): UPES(f) = min_{s∈S} max(n(π(s)), µ(f(s))), where n is the order-reversing map of V • Low utility: plausible state with bad consequences • Qualitative optimistic utility (Yager): UOPT(f) = max_{s∈S} min(π(s), µ(f(s))) • High utility: plausible states with good consequences
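A sketch of both criteria with n(a) = 1 – a on V = [0, 1]; the states, act and values are hypothetical:

```python
pi = {"s1": 1.0, "s2": 0.6}                  # plausibility of each state
mu = {"good": 1.0, "fair": 0.5, "bad": 0.0}  # utility of each consequence
f = {"s1": "good", "s2": "bad"}              # an act: state -> consequence

def u_pes(f, pi, mu):
    # n(a) = 1 - a is the order-reversing map on V = [0, 1]
    return min(max(1 - pi[s], mu[f[s]]) for s in pi)

def u_opt(f, pi, mu):
    return max(min(pi[s], mu[f[s]]) for s in pi)

print(u_pes(f, pi, mu))  # 0.4: a fairly plausible state (s2) has a bad outcome
print(u_opt(f, pi, mu))  # 1.0: the most plausible state has a good outcome
```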
The pessimistic and optimistic utilities are well-known fuzzy pattern-matching indices • in fuzzy expert systems: • µ = membership function of rule condition • π = imprecision of input fact • in fuzzy databases • µ = membership function of query • π = distribution of stored imprecise data • in pattern recognition • µ = membership function of attribute template • π = distribution of an ill-known object attribute
Assumption: plausibility and preference scales L and U are commensurate • There exists a common scale V that contains both L and U, so that confidence and uncertainty levels can be compared (certainty equivalent of a lottery) • If only a subset E of plausible states is known: π = µE • UPES(f) = min_{s∈E} µ(f(s)) (utility of the worst consequence in E): criterion of Wald under ignorance • UOPT(f) = max_{s∈E} µ(f(s))
Pessimistic qualitative utility of binary acts xAy, with µ(x) > µ(y): • xAy(s) = x if A occurs, = y if its complement Aᶜ occurs UPES(xAy) = median{µ(x), N(A), µ(y)} • Interpretation: if the agent is sure enough of A, it is as if the consequence is x: UPES(xAy) = µ(x). If he is not sure enough about A, it is as if the consequence is y: UPES(xAy) = µ(y). Otherwise, utility reflects certainty: UPES(xAy) = N(A) • WITH UOPT: replace N(A) by Π(A)
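A numeric check of the median formula, assuming the same conventions as the previous sketch (all values hypothetical):

```python
import statistics

S = {"s1", "s2", "s3"}
pi = {"s1": 1.0, "s2": 0.6, "s3": 0.3}
A = {"s1", "s2"}
mu_x, mu_y = 0.9, 0.2                      # mu(x) > mu(y)

N_A = 1 - max(pi[s] for s in S - A)        # N(A) = 1 - Poss(complement of A)
u_pes = min(max(1 - pi[s], mu_x if s in A else mu_y) for s in S)

assert u_pes == statistics.median([mu_x, N_A, mu_y])
print(u_pes, N_A)  # 0.7 0.7: here the utility reflects the certainty of A
```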
Representation theorem for pessimistic possibilistic criteria • Suppose the preference relation ≥a on acts obeys the following properties: • (X^S, ≥a) is a complete preorder • there are two acts f, g such that f >a g • ∀A, ∀f, ∀ constant acts x, y: x ≥a y implies xAf ≥a yAf • f >a h and g >a h imply f ∧ g >a h • if x is constant, h >a x and h >a g imply h >a x ∨ g then there exists a finite chain L, an L-valued necessity measure on S and an L-valued utility function u, such that ≥a is representable by the pessimistic possibilistic criterion UPES(f).
Merits and limitations of qualitative decision theory • Provides a foundation for possibility theory • Possibility theory is justified by observing how a decision-maker ranks acts • Applies to one-shot decisions (no compensations/ accumulation effects in repeated decision steps) • Presupposes that consecutive qualitative value levels are distant from each other (negligibility effects)
Quantitative possibility theory • Membership functions of fuzzy sets • Natural language descriptions pertaining to numerical universes (fuzzy numbers) • Results of fuzzy clustering Semantics: metrics, proximity to prototypes • Upper probability bound • Random experiments with imprecise outcomes • Consonant approximations of convex probability sets Semantics: frequentist, subjectivist (gambles)...
Quantitative possibility theory • Orders of magnitude of very small probabilities: degrees of impossibility k(A) ranging on integers, k(A) = n iff P(A) is of the order of εⁿ, for small ε • Likelihood functions (P(A | x), where x varies) behave like possibility distributions: P(A | B) ≤ max_{x∈B} P(A | x)
POSSIBILITY AS UPPER PROBABILITY • Given a numerical possibility distribution π, define P(π) = {probabilities P | P(A) ≤ Π(A) for all A} • Then it holds that Π(A) = sup{P(A) | P ∈ P(π)} and N(A) = inf{P(A) | P ∈ P(π)} • So π is a faithful representation of a family of probability measures.
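A brute-force sketch checking whether a given probability lies in P(π), by testing P(A) ≤ Π(A) over the whole power set of a small hypothetical frame:

```python
from itertools import chain, combinations

S = ["s1", "s2", "s3"]
pi = {"s1": 1.0, "s2": 0.5, "s3": 0.2}
P = {"s1": 0.6, "s2": 0.3, "s3": 0.1}   # hypothetical probability

def events(S):
    """All subsets of S."""
    return chain.from_iterable(combinations(S, r) for r in range(len(S) + 1))

dominated = all(sum(P[s] for s in A) <= max((pi[s] for s in A), default=0.0)
                for A in events(S))
print(dominated)  # True: P is one of the probabilities encoded by pi
```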
From confidence sets to possibility distributions Consider a nested family of sets E1 ⊆ E2 ⊆ … ⊆ En, a set of positive numbers a1, …, an in [0, 1] and the family of probability functions P = {P | P(Ei) ≥ ai for all i}. P is always representable by means of a possibility measure. Its possibility distribution is precisely π(x) = min_i max(µEi(x), 1 – ai)
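A sketch of this formula with two nested hypothetical confidence sets:

```python
universe = range(10)
E = [set(range(4, 6)), set(range(2, 8))]  # nested: E1 is a subset of E2
a = [0.5, 0.9]                            # P(E1) >= 0.5, P(E2) >= 0.9

# pi(x) = min_i max(membership of x in Ei, 1 - a_i)
pi = {x: min(max(1.0 if x in Ei else 0.0, 1 - ai) for Ei, ai in zip(E, a))
      for x in universe}
print(pi)  # 1.0 on E1, 0.5 on E2 \ E1, about 0.1 outside E2
```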
Random set view • Let mi = πi – πi+1 (with π1 = 1 ≥ … ≥ πn, πn+1 = 0); then m1 + … + mn = 1: a basic probability assignment (SHAFER) • π(s) = ∑_{i: s∈Ai} mi (one-point coverage function) • Only in the consonant case (nested Ai) can m be recalculated from π
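A sketch going from π to the consonant basic probability assignment and back via the one-point coverage function (hypothetical levels):

```python
pi = {"s1": 1.0, "s2": 0.5, "s3": 0.2}

levels = sorted(set(pi.values()), reverse=True) + [0.0]  # 1.0, 0.5, 0.2, 0.0
focal = []
for hi, lo in zip(levels, levels[1:]):
    A = frozenset(s for s in pi if pi[s] >= hi)  # cut of pi at level hi
    focal.append((A, hi - lo))                   # mass m_i = pi_i - pi_{i+1}

assert abs(sum(m for _, m in focal) - 1.0) < 1e-9  # basic probability assignment
# One-point coverage: pi(s) = sum of the masses of the focal sets containing s.
recovered = {s: sum(m for A, m in focal if s in A) for s in pi}
print(all(abs(recovered[s] - pi[s]) < 1e-9 for s in pi))  # True: consonant case
```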
CONDITIONAL POSSIBILITY MEASURES • A Coxian axiom: Π(A ∩ C) = Π(A | C) * Π(C), with * = product Then: Π(A | C) = Π(A ∩ C)/Π(C) N(A | C) = 1 – Π(Aᶜ | C) Dempster rule of conditioning (preserves s-maxitivity) For the revision of possibility distributions: minimal change of π when N(C) = 1. It improves the state of information (reduction of focal elements)
Bayesian possibilistic conditioning Π(A |b C) = sup{P(A | C), P ≤ Π, P(C) > 0} N(A |b C) = inf{P(A | C), P ≤ Π, P(C) > 0} It is still a possibility measure: π(s |b C) = π(s)·max(1, 1/(π(s) + N(C))) It can be shown that: Π(A |b C) = Π(A ∩ C)/(Π(A ∩ C) + N(Aᶜ ∩ C)) N(A |b C) = N(A ∩ C)/(N(A ∩ C) + Π(Aᶜ ∩ C)) = 1 – Π(Aᶜ |b C) For inference from generic knowledge based on observations
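A sketch comparing the two quantitative conditioning rules on a small hypothetical frame, using the closed forms above:

```python
def poss(A, pi):
    return max((pi[s] for s in A), default=0.0)

def nec(A, pi, S):
    return 1.0 - poss(S - A, pi)

pi = {"s1": 1.0, "s2": 0.5, "s3": 0.2}  # hypothetical values
S = set(pi)
C = {"s1", "s3"}
A = {"s3"}

# Dempster (product) rule: Poss(A | C) = Poss(A & C) / Poss(C)
dempster = poss(A & C, pi) / poss(C, pi)
# Bayesian rule: Poss(A |b C) = Poss(A & C) / (Poss(A & C) + N(Ac & C))
pac = poss(A & C, pi)
bayes = pac / (pac + nec((S - A) & C, pi, S))

print(dempster, round(bayes, 3))  # 0.2 0.286: the two rules differ in general
```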
Possibility-Probability transformations • Why? • fusion of heterogeneous data • decision-making: betting according to a possibility distribution leads to probability • extraction of a representative value • simplified non-parametric imprecise probabilistic models
Elementary forms of probability-possibility transformations have existed for a long time • POSS → PROB: Laplace indifference principle "all that is equipossible is equiprobable" = changing a uniform possibility distribution into a uniform probability distribution • PROB → POSS: confidence intervals: replacing a probability distribution by an interval A with a confidence level c. • It defines a possibility distribution π(x) = 1 if x ∈ A, = 1 – c if x ∉ A
Possibility-Probability transformations: BASIC PRINCIPLES • Possibility-probability consistency: P ≤ Π • Preserving the ordering of events: P(A) ≥ P(B) implies Π(A) ≥ Π(B), or for elementary events only: π(x) > π(x') if and only if p(x) > p(x') (order preservation) • Informational criteria: from Π to P: preservation of symmetries (Shapley value rather than maximal entropy); from P to Π: optimize information content (maximization or minimization of specificity)
From OBJECTIVE probability to possibility: • Rationale: given a probability p, try and preserve as much information as possible • Select a most specific element of the set PI(P) = {Π : Π ≥ P} of possibility measures dominating P such that π(x) > π(x') iff p(x) > p(x') • may be weakened into: p(x) > p(x') implies π(x) > π(x') • The result is πi = ∑_{j=i,…,n} pj (case of no ties, with p1 > p2 > … > pn)
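A sketch of this transformation in the no-tie discrete case (hypothetical probabilities):

```python
p = {"a": 0.5, "b": 0.3, "c": 0.2}            # hypothetical, no ties

pi, tail = {}, 1.0
for x in sorted(p, key=p.get, reverse=True):  # rank by decreasing probability
    pi[x] = tail                              # pi_i = p_i + p_{i+1} + ... + p_n
    tail -= p[x]
print(pi)  # {'a': 1.0, 'b': 0.5, 'c': 0.2}: dominates p, preserves the order
```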
From probability to possibility: continuous case • The possibility distribution π obtained by transforming p encodes the family of confidence intervals around the mode of p. • The α-cut of π is the (1 – α)-confidence interval of p • The optimal symmetric transform of the uniform probability distribution is the triangular fuzzy number • The symmetric triangular fuzzy number (STFN) is a covering approximation of any probability with unimodal symmetric density p with the same mode. • In other words the α-cut of a STFN contains the (1 – α)-confidence interval of any such p.
From probability to possibility: continuous case • IL = {x : p(x) ≥ β} = [aL, aL + L] is the interval of length L with maximal probability • The most specific possibility distribution dominating p is π such that ∀L > 0, π(aL) = π(aL + L) = 1 – P(IL).
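A sketch for the standard normal density, chosen here only as a convenient symmetric unimodal example: IL = [–L/2, L/2], so the most specific dominating distribution reduces to π(x) = 2(1 – Φ(|x|)):

```python
from math import erf, sqrt

def Phi(x):
    """Standard normal CDF."""
    return 0.5 * (1 + erf(x / sqrt(2)))

def pi(x):
    """pi(x) = 1 - P([-|x|, |x|]): most specific dominating distribution."""
    return 2 * (1 - Phi(abs(x)))

for x in (0.0, 1.0, 1.96):
    print(x, round(pi(x), 3))  # 1.0, 0.317, 0.05: the 0.05-cut of pi is the
                               # 95% confidence interval [-1.96, 1.96]
```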
Possibilistic view of probabilistic inequalities • Chebyshev inequality defines a possibility distribution that dominates any density with given mean and variance. • The symmetric triangular fuzzy number (STFN) defines a possibility distribution that optimally dominates any symmetric density with given mode and bounded support.
From possibility to probability • Idea (Kaufmann, Yager, Chanas): • pick a number α in [0, 1] at random • pick an element at random in the α-cut of π: a generalized Laplacean indifference principle: change α-cuts into uniform probability distributions. • Rationale: minimise arbitrariness by preserving the symmetry properties of the representation. • The resulting probability distribution is: • the centre of gravity of the polyhedron P(π) • the pignistic transformation of belief functions (Smets) • the Shapley value of the unanimity game N in game theory.
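A sketch of this transformation on a finite frame (hypothetical levels): each α-cut's share of mass is spread uniformly among its elements:

```python
pi = {"s1": 1.0, "s2": 0.5, "s3": 0.2}

levels = sorted(set(pi.values()), reverse=True) + [0.0]
p = {s: 0.0 for s in pi}
for hi, lo in zip(levels, levels[1:]):
    cut = [s for s in pi if pi[s] >= hi]  # alpha-cut at level hi
    for s in cut:
        p[s] += (hi - lo) / len(cut)      # uniform share of this cut's mass
print(p)  # {'s1': 0.716..., 's2': 0.216..., 's3': 0.066...}; sums to 1
```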
SUBJECTIVE POSSIBILITY DISTRIBUTIONS • Starting point: exploit the betting approach to subjective probability • A critique: the agent is forced to be additive by the rules of exchangeable bets. • For instance, the agent provides a uniform probability distribution on a finite set whether (s)he knows nothing about the phenomenon or knows it to be purely random. • Idea: a subjective probability supplied by an agent is only a trace of the agent's beliefs.