Legacy of Ed Jaynes -- approaches to uncertainty management. Stefan Arnborg, KTH
Applications of Uncertainty • Medical Imaging/Research (Schizophrenia) • Land Use Planning • Environmental Surveillance and Prediction • Finance and Stock Marketing into Google • Robot Navigation and Tracking • Security and Military • Performance Tuning
Project Aims • Support transformation of tasks and solutions in a generic fashion • Integrate different command levels and services in a dynamic organization • Facilitate consistent situation awareness
WIRED on Total Information Awareness • WIRED (Dec 2, 2002) article "Total Info System Totally Touchy" discusses the Total Information Awareness system. The Total Information Awareness System and related efforts received ~~~ • Quote: "People have to move and plan before committing a terrorist act. Our hypothesis is their planning process has a signature." (Jan Walker, Pentagon spokeswoman, in Wired, Dec 2, 2002) • Quote: "What's alarming is the danger of false positives based on incorrect data." (Herb Edelstein, in Wired, Dec 2, 2002) • Endsley: Inference -> Situation awareness • Information picture • Understanding effects of actions • Understanding the situation implies understanding the best response
Sun Zi • If he establishes a camp in an easily accessible place, it is to gain other advantages. If there is movement in the forest, he is on the move. Many obstacles set up on open ground mean that the enemy wants to mislead. When birds take flight, the enemy lies in ambush. Startled animals mean that the enemy is on the move. When the dust rises in high, distinct columns, chariots are approaching. When the dust lies low and even, it is foot soldiers. When the dust is scattered in thin strands, the enemy is gathering firewood. When the dust is thin and swirls back and forth, the enemy is making camp.
Sun Zi • He who knows himself and his opponent goes through a hundred battles without danger. He who knows himself but not his opponent loses one battle for every victory. He who knows neither himself nor his opponent is doomed to lose every battle.
Methods for Inference • Visualisation: Florence Nightingale; expert-based, CSCW • Probability-based methods: Bayes, hypothesis testing, fiducial, distribution-independent methods, … • Game theory: Harsanyi Bayesian games • Ad hoc: typically bio-inspired (how does the brain or DNA work?)
Methods for Inference • All inference methods are based on assumptions • The most common method to cope with uncertainty is to make assumptions, and then to forget that they were made (Arnborg, Brynielsson, 2004), (Thunholm 1999) • Death by Assumption: Why Great Planning Strategies Fail (latest Management Fad)
Visualization • Visualize data in such a way that the important aspects are obvious. A good visualization strikes you like a punch between the eyes (Tukey, 1970) • Pioneered by Florence Nightingale, first female member of the Royal Statistical Society, pioneer of the polar area diagram (a pie chart variant) and of performance metrics
Probabilistic approaches • Bayes: Probability conditioned by observation • Cournot: An event with very small probability will not happen. • Kolmogorov: A sequence is random if it cannot be compressed
Foundations for Bayesian Inference • Bayes' method, the first documented method based on probability: the plausibility of an event depends on observation. Bayes' rule: $f(\lambda \mid D) \propto f(D \mid \lambda)\, f(\lambda)$ • Parameter and observation spaces can be extremely complex, and so can priors and likelihoods. • MCMC is the current approach: often but not always applicable (difficult when the posterior has many local maxima separated by low-density regions). Better than numerics??
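A minimal sketch of Bayes' rule on a finite parameter grid may help make the slide concrete. The coin-tossing model, the grid of candidate biases, and the function name `grid_posterior` are illustrative assumptions, not part of the talk; for complex parameter spaces one would turn to MCMC instead of exhaustive enumeration.

```python
from math import comb

def grid_posterior(grid, prior, heads, tosses):
    """Bayes' rule on a finite grid: posterior is proportional to likelihood * prior."""
    # Binomial likelihood of the observed data for every candidate bias (assumed model).
    likelihood = [comb(tosses, heads) * p**heads * (1 - p)**(tosses - heads)
                  for p in grid]
    unnormalized = [l * pr for l, pr in zip(likelihood, prior)]
    evidence = sum(unnormalized)  # normalizing constant f(D)
    return [u / evidence for u in unnormalized]

# Hypothetical example: 11 candidate biases, uniform prior, 7 heads in 10 tosses.
grid = [i / 10 for i in range(11)]
prior = [1 / len(grid)] * len(grid)
posterior = grid_posterior(grid, prior, heads=7, tosses=10)
best = grid[posterior.index(max(posterior))]
print(best)
```

With a uniform prior the posterior mode coincides with the maximum-likelihood value on the grid; a non-uniform prior would shift it.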
Spectacular application: PET camera (and any other camera or radar device) [Figure labels: scene; film; camera geometry & noise; scene regularity]
Thomas Bayes, amateur mathematician. If we have a probability model of the world, we know how to compute probabilities of events. But is it possible to learn about the world from events we see? Bayes' proposal was forgotten but rediscovered by Laplace.
Antoine Augustin Cournot (1801--1877). Pioneer in stochastic processes, market theory and structural post-modernism. Predicted the demise of the academic system due to discourses of administration and excellence (cf. Readings). An alternative to Bayes' method, hypothesis testing, is based on 'Cournot's Bridge': an event with very small probability will not happen.
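Cournot's principle is what licenses a significance test: if the observed event would have very small probability under a hypothesis, the hypothesis is rejected. The sketch below uses an exact one-sided binomial test; the fair-coin null hypothesis, the threshold `alpha=0.05`, and the function names are assumptions chosen for illustration.

```python
from math import comb

def binomial_upper_tail(successes, trials, p_null=0.5):
    """Probability of at least `successes` successes if the null hypothesis holds."""
    return sum(comb(trials, k) * p_null**k * (1 - p_null)**(trials - k)
               for k in range(successes, trials + 1))

def reject_null(successes, trials, alpha=0.05, p_null=0.5):
    """Cournot-style decision: an event this improbable 'will not happen'."""
    return binomial_upper_tail(successes, trials, p_null) < alpha

# Hypothetical data: 9 heads in 10 tosses of a supposedly fair coin.
p_value = binomial_upper_tail(9, 10)
print(p_value, reject_null(9, 10), reject_null(6, 10))
```

Note that no prior over hypotheses appears anywhere, which is exactly the contrast with Bayes' method drawn on the slide.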
Fiducial Inference. R. A. Fisher (1890--1962). In his paper Inverse Probability, he rejected Bayesian analysis on grounds of its dependency on priors and scaling. He launched an alternative concept, 'fiducial analysis'. Although this concept was not developed after Fisher's time, the standard definition of confidence intervals has a similar flavor. The fiducial argument was apparently the starting point for Dempster in developing evidence theory.
Kolmogorov and randomness. Andrei Kolmogorov (1903--1987) is the mathematician best known for shaping probability theory into a modern axiomatized theory. His axioms of probability tell how probability measures are defined, also on infinite and infinite-dimensional event spaces and complex product spaces. Kolmogorov complexity characterizes a random string by the smallest size of a description of it. Used to explain the Vovk/Gammerman scheme of hedged prediction. Also used in MDL (Minimum Description Length) inference.
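Kolmogorov complexity itself is not computable, but the length of a compressed encoding gives a computable upper bound, which is enough to illustrate "random means incompressible". The sketch below uses zlib as a stand-in compressor; that choice, the sample strings, and the helper name `compression_ratio` are assumptions for illustration only.

```python
import random
import zlib

def compression_ratio(data: bytes) -> float:
    """Compressed size over original size: a crude upper-bound proxy for complexity."""
    return len(zlib.compress(data, 9)) / len(data)

rng = random.Random(0)                      # fixed seed so the run is repeatable
regular = b"ab" * 5000                      # highly regular 10000-byte string
noisy = bytes(rng.getrandbits(8) for _ in range(10000))  # pseudo-random bytes

ratio_regular = compression_ratio(regular)  # far below 1: a short description exists
ratio_noisy = compression_ratio(noisy)      # about 1: zlib finds no shorter description
print(ratio_regular, ratio_noisy)
```

The pseudo-random bytes do have a short description (the generator plus its seed), which zlib cannot find; this is why a compressor only bounds the complexity from above.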
Combining Bayesian and frequentist inference • Posterior for parameter • Generating testing set (Gelman et al, 2003)
Bayesian Decision Theory (Savage) • Outcome R depends on uncertain $\lambda$, with prior $f(\lambda)$, and on action $a$ • Utility of R is u(R) • Observe D with likelihood $f(D \mid \lambda)$ • Choose $a$ maximizing expected utility • Estimating probability: use Laplace's estimator
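As a small worked instance of the decision rule, the sketch below estimates a success probability with Laplace's estimator, (s + 1) / (n + 2), and then picks the action with the highest expected utility. The two actions, their utilities, and the observation counts are hypothetical values chosen only to make the arithmetic visible.

```python
def laplace_estimate(successes, trials):
    """Laplace's rule of succession: (s + 1) / (n + 2)."""
    return (successes + 1) / (trials + 2)

def best_action(p_success, utilities):
    """Pick the action maximizing expected utility.

    utilities[a] = (utility if success, utility if failure); hypothetical layout.
    """
    expected = {a: p_success * u_s + (1 - p_success) * u_f
                for a, (u_s, u_f) in utilities.items()}
    return max(expected, key=expected.get), expected

# Hypothetical data: 3 successes observed in 8 trials.
p = laplace_estimate(3, 8)                       # (3 + 1) / (8 + 2) = 0.4
utilities = {"bet": (10.0, -5.0), "abstain": (0.0, 0.0)}
action, expected = best_action(p, utilities)
print(p, action, expected)
```

Here betting has expected utility 0.4 * 10 - 0.6 * 5 = 1, so it beats abstaining; with fewer observed successes the choice would flip.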
Generalisation of Bayes/Kalman: What if • You have no prior? • Likelihood infeasible to compute (imprecision)? • Parameter space vague, i.e., not the same for all likelihoods (fuzziness, vagueness)? • Parameter space has complex structure (a simple structure is, e.g., a Cartesian product of reals, R, and some finite sets)?
Some approaches... • Robust Bayes: replace distributions by convex sets of distributions (Berger et al.) • Dempster/Shafer/TBM: describe imprecision with random sets • DSm: transform parameter space to capture vagueness (Dezert/Smarandache, controversial) • FISST (FInite Set STatistics): generalises observation and parameter space to a product of spaces described as random sets (Goodman, Mahler, Nguyen)
Ellsberg's Paradox: Ambiguity Avoidance • Urn A contains 4 white and 4 black balls, and 4 of unknown colour (black or white) • Urn B contains 6 white and 6 black balls • You get one krona if you draw a black ball. From which urn do you want to draw it? • A precise Bayesian should first make an assumption about how the ?-balls are coloured and then answer. But a majority prefer urn B even if black is swapped for white
How are imprecise probabilities used? • Expected utility for decision alternatives becomes intervals instead of points: maximax, maximin, maximize the mean? [Figure: utility u against alternative a, with the Bayesian, optimist and pessimist choices marked]
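The Ellsberg urns give a compact illustration of interval-valued expected utility. In the sketch below, the payoff is 1 for a black ball, so expected utility equals the probability of black; the function name and the three decision criteria as coded are an illustrative reading of the slide, not the author's implementation.

```python
def black_probability_interval(white, black, unknown):
    """Lower and upper probability of drawing black when `unknown` balls
    may each be black or white."""
    total = white + black + unknown
    return black / total, (black + unknown) / total

# Payoff is 1 for black, so these intervals are also expected-utility intervals.
urns = {
    "A": black_probability_interval(4, 4, 4),   # interval [1/3, 2/3]
    "B": black_probability_interval(6, 6, 0),   # the single point 1/2
}

maximin_choice = max(urns, key=lambda u: urns[u][0])   # pessimist: best worst case
maximax_choice = max(urns, key=lambda u: urns[u][1])   # optimist: best best case
midpoints = {u: (lo + hi) / 2 for u, (lo, hi) in urns.items()}
print(maximin_choice, maximax_choice, midpoints)
```

The pessimist picks urn B and the optimist urn A, while the midpoint criterion is indifferent; the observed majority preference for B matches the pessimistic (ambiguity-averse) rule.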
Ed Jaynes devoted a large part of his career to promoting Bayesian inference. He also championed the use of Maximum Entropy in physics. Outside physics, he met resistance from people who had already invented other methods. Why should statistical mechanics say anything about our daily human world??
Cox approach to Bayesianism • Let A|C be the real-valued plausibility of A, given that we know C to be true. • AB|C = F(A|BC, B|C): the plausibility of a conjunction depends only on the plausibilities of its constituents. F is strictly monotone. • Introduce S(A|B), the plausibility of not-A given B. • The Cox/Jaynes argument has the flavour of (somewhat imprecise) theoretical physics • Using several unstated assumptions, it is shown that plausibility can be rescaled to probability: w(F(x,y)) = w(x)w(y), w(S(x)) = 1 - w(x)
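To see what "rescaling plausibility to probability" means, consider a hypothetical agent that measures plausibility on a log scale. Its conjunction rule F is then addition, and the rescaling function w is the exponential. The scale, and the concrete F, S and w below, are assumptions chosen to satisfy the two identities on the slide; Cox's theorem is about showing that some such w exists for any admissible F and S.

```python
import math

def F(x, y):
    """Conjunction rule on the assumed log-plausibility scale."""
    return x + y

def S(x):
    """Negation rule on the same scale; requires x < 0."""
    return math.log1p(-math.exp(x))

def w(x):
    """Rescaling from log-plausibility to ordinary probability."""
    return math.exp(x)

# Hypothetical plausibilities corresponding to probabilities 0.5 and 0.25.
x, y = math.log(0.5), math.log(0.25)
product_rule_holds = math.isclose(w(F(x, y)), w(x) * w(y))
sum_rule_holds = math.isclose(w(S(y)), 1 - w(y))
print(w(F(x, y)), product_rule_holds, sum_rule_holds)
```

Any strictly monotone relabelling of the scale gives another valid plausibility measure with a different F and S, but the same probabilities after rescaling.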
Related Work • Michael Hardy: Scaled Boolean Algebras, Advances in Applied Mathematics, 2002 • C.H. Kraft, J.H. Pratt and A. Seidenberg: Intuitive Probability on Finite Sets, Ann Math Stat, 1959 (similar outlook, heavier math, but not the same conclusions)
Halpern's Example: 4 Worlds • Constraints on the events A, ..., M shown in the figure: B|C = L|M; H|J≈K|M; D|G = K|LM; A|C = I|J; E|G = A|B; D|E = H|J
Example: F(F(x,y),z)≈F(x,F(y,z)) (Halpern 2000) • With x = D|E = H|J, y = E|G = A|B, z = B|C = L|M • Remaining constraints: H|J≈K|M; D|G = K|LM; A|C = I|J
Refine: A'|A = D|E: INCONSISTENCY • Add a new event A' inside A with A'|A = D|E = x, keeping y = E|G = A|B and z = B|C = L|M, and the constraints H|J≈K|M; D|G = K|LM; A|C = I|J • Then H|J = A'AB|C = K|M !!
Proof structure: Rescalability = Consistent Refinability • (i)->(ii): a rescaling on a discrete set can be interpolated smoothly over (0,1). • (ii)->(i) is trickier: assume that rescaling is impossible and show that the existence of an inconsistent refinement follows. Find L such that ML=0 and DL>0
Duality explained • If every L with ML=0 fails DL>0, then DF, where F = {L: ML=0}, has a non-negative normal d! [Figure labels: DF, d, F] • d1L1+…+d(n-1)L(n-1) = d1L2+…+d(n-1)Ln translates to F(a1,..,ak,c1,…,cm) = F(b1,…,bk,c1,…,cm) with ai<bi, and can be interpreted as an inconsistent refinement!!
Inconsistency of Example • The linear system turns out to be non-solvable; from the dual solution we obtain c (coefficient in brackets): F(x4,x4)=F(x3,x5)=a [+1]; F(x2,x4)=F(x1,x5)=b [-1]; F(x4,x6)=F(x3,x7)=c [-1]; F(x2,x6)=F(x1,x8)=d [+1] • Composing the equations as indicated by c yields an inconsistency: F(x7,q)=F(x8,q), where q=F(x1,F(x2,F(x3,F(x4,F(x4,F(x5,x6)))))) • This corresponds to an inconsistent refinement consisting of 9 information-independent new cases with plausibilities x1, x2, x3, x4, x4,…,x8 relative to an existing event
INFINITE CASE: NON-SEPARABILITY [Figure: probability model versus counterexample; axes: i, log probability]
Finite model (finite number of events): every consistent real ordered plausibility measure can be rescaled to probability, using duality 'like' Purdom-Freedman (Arnborg, Sjödin, ECCAI 2000). However, this was difficult to extend to infinite models. After several failed approaches, the reason was found: it is not possible, because the needed theorem is not true. However, for any model (finite, enumerable, or continuous family), its plausibility measure can be embedded in an ordered field, where conjunction and disjunction correspond to * and + (Arnborg, Sjödin, MaxEnt 2000)
Arnborg, Sjödin ca 2001 • Introduce: AB|C = F(A|C, B|AC); A+B|C = G(A|C, B-A|C); ~A|C = S(A|C) • The properties of propositional logic entail that F and G satisfy the axioms for * and + of a ring! And truth and falsity (T and ⊥) are the 1 and 0 of an integral domain • Assuming the domain is ordered and * and + are (strictly) increasing gives us an ordered field, because inversion of * and + is possible (unless one operand of * is 0). Standard quotient constructions apply (the first defines negative numbers and multiplication by an integer, the second defines rationals), but be careful since + is a partial function! • By MacLane-Birkhoff, an ordered ring can be embedded in an ordered field, and there is a minimal such embedding field (a superset of Q). If the embedding field is a subset of R, we have standard probability. If it is a superset of R, we have extended probability. • Conway, in "On Numbers and Games", showed that there is also a maximal ordered field, No. This field contains all infinitesimals and infinite numbers.
Infinitesimal probability (Adams) • If Obama wins the election, McCain will retire • If McCain dies before the election, Obama will win • Syllogism: If McCain dies, Obama wins and McCain retires? • Solution: 'McCain dies' has infinitesimal probability • Non-monotonic logic in AI (McCarthy) is just infinitesimal probability!!
Cox approach to Bayesianism • Let A|C be the real-valued plausibility of A, given that we know C to be true. • AB|C = F(A|BC, B|C): the plausibility of a conjunction depends only on the plausibilities of its constituents. F is strictly monotone. Similar rule G for disjunction. • The Cox/Jaynes argument has the flavour of (somewhat imprecise) theoretical physics • With some assumptions, F and G can be shown to inherit the algebraic laws of a ring from the logical 'and' and 'or', and the monotonicity assumptions imply that F and G are the * and + of a monotone field (Körper, kropp). • These assumptions entail Bayesianism, possibly with infinitesimal probability (Arnborg, Sjödin, 2000; Cox 1946) • This argument does not exclude partially ordered plausibility measures like intervals of probabilities.
Robust Bayes • Priors and likelihoods are convex sets of probability distributions (Berger, de Finetti, Walley, ...): imprecise probability • Every member of the posterior is a 'parallel combination' of one member of the likelihood and one member of the prior. • For decision making: Jaynes recommends using the member of the posterior with maximum entropy (Maxent estimate).
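A minimal sketch of robust Bayesian updating, under strong simplifying assumptions: a binary hypothesis H, a precise likelihood, and a prior known only to lie in an interval. In that special case the posterior is monotone in the prior, so updating the two extreme priors gives the bounds of the posterior set. The numbers and function names are hypothetical.

```python
def posterior(prior_h, lik_h, lik_not_h):
    """Precise Bayes update for a binary hypothesis H."""
    numerator = prior_h * lik_h
    return numerator / (numerator + (1 - prior_h) * lik_not_h)

def posterior_interval(prior_lo, prior_hi, lik_h, lik_not_h):
    """Bounds of the posterior set when P(H) ranges over [prior_lo, prior_hi].

    Valid because the binary posterior is increasing in the prior
    (assumes positive likelihoods).
    """
    return (posterior(prior_lo, lik_h, lik_not_h),
            posterior(prior_hi, lik_h, lik_not_h))

# Hypothetical inputs: prior P(H) in [0.2, 0.6], f(D|H) = 0.9, f(D|not H) = 0.3.
lo, hi = posterior_interval(0.2, 0.6, lik_h=0.9, lik_not_h=0.3)

# Maximum-entropy member of the posterior set: for a binary variable the
# entropy peaks at 0.5, so take the point of [lo, hi] closest to 0.5.
maxent_member = min(max(0.5, lo), hi)
print(lo, hi, maxent_member)
```

With imprecise likelihoods or more than two hypotheses the extremes are no longer found by this shortcut, and each pair of prior and likelihood members has to be combined.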
How are imprecise probabilities used? • Expected utility for decision alternatives becomes intervals instead of points: maximax, maximin, maximize the mean? [Figure: utility u against alternative a, with the Bayesian, optimist and pessimist choices marked]
Dempster/Shafer/Smets • Evidence is a random set over $\Lambda$. • I.e., a probability distribution over $2^\Lambda$. • Probability of a singleton: 'belief' allocated to an alternative, i.e., probability. • Probability of a non-singleton: 'belief' allocated to a set of alternatives, but not to any part of it. • Evidences are combined by random intersection conditioned to be non-empty (Dempster's rule).
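Dempster's rule as described above can be sketched in a few lines: intersect every pair of focal sets, multiply their masses, discard empty intersections, and renormalize. The representation of a mass assignment as a dict from `frozenset` to mass, and the two-element example, are assumptions made for illustration.

```python
from itertools import product

def dempster_combine(m1, m2):
    """Combine two mass assignments by random set intersection,
    conditioned on the intersection being non-empty."""
    combined = {}
    conflict = 0.0
    for (s1, w1), (s2, w2) in product(m1.items(), m2.items()):
        intersection = s1 & s2
        if intersection:
            combined[intersection] = combined.get(intersection, 0.0) + w1 * w2
        else:
            conflict += w1 * w2            # mass falling on the empty set
    if not combined:
        raise ValueError("total conflict: the evidences cannot be combined")
    return {s: w / (1.0 - conflict) for s, w in combined.items()}

# Hypothetical evidences over the two-element frame {A, B}.
A, B, AB = frozenset("A"), frozenset("B"), frozenset("AB")
m1 = {A: 0.6, AB: 0.4}
m2 = {B: 0.5, AB: 0.5}
fused = dempster_combine(m1, m2)
print(fused)
```

Here the conflicting mass is 0.6 * 0.5 = 0.3, so the surviving masses 0.3, 0.2 and 0.2 are each divided by 0.7.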
Correspondence DS-structure -- set of probability distributions • For a pdf (bba) m over $2^\Lambda$, consider all ways of reallocating the probability mass of non-singletons to their member atoms. This gives a convex set of probability distributions over $\Lambda$. • Example: $\Lambda$ = {A,B,C}. bba: A: 0.1, B: 0.3, C: 0.1, AB: 0.5. Set of pdfs: A: 0.1+0.5*x, B: 0.3+0.5*(1-x), C: 0.1, for all $x \in [0,1]$ • Can we regard any set of pdfs as a bba? Answer is NO!! There are more convex sets of pdfs than DS-structures
Representing probability set as bba: 3-element universe • Rounding up: use lower envelope. • Rounding down: linear programming. • Rounding is not unique!! [Figure legend: black: convex set; blue: rounded up; red: rounded down]
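The slide's example can be checked numerically. The sketch below computes, for each atom, the lower bound (mass on the singleton) and upper bound (mass on every focal set containing the atom), and verifies that a member of the family indexed by x stays inside those bounds. The dict-of-`frozenset` representation and the helper names are illustrative assumptions.

```python
def singleton_bounds(bba, universe):
    """Lower and upper probability of each atom implied by a bba."""
    bounds = {}
    for atom in universe:
        lower = bba.get(frozenset([atom]), 0.0)                 # belief
        upper = sum(mass for focal, mass in bba.items() if atom in focal)  # plausibility
        bounds[atom] = (lower, upper)
    return bounds

def member(x):
    """One pdf in the convex set: share x of the AB mass goes to A, the rest to B."""
    return {"A": 0.1 + 0.5 * x, "B": 0.3 + 0.5 * (1 - x), "C": 0.1}

# The bba from the slide's example.
bba = {frozenset("A"): 0.1, frozenset("B"): 0.3,
       frozenset("C"): 0.1, frozenset("AB"): 0.5}
bounds = singleton_bounds(bba, "ABC")
sample = member(0.25)
print(bounds, sample)
```

The bounds come out as A in [0.1, 0.6], B in [0.3, 0.8] and C fixed at 0.1, matching the endpoints x = 0 and x = 1 of the family on the slide.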
Another appealing conjecture • A precise pdf can be regarded as a (singleton) random set. • Bayesian combination of precise pdfs corresponds to random set intersection (conditioned on non-emptiness) • A DS-structure corresponds to a Choquet capacity (set of pdfs) • Is it reasonable to combine Choquet capacities by (nonempty) random set intersection (Dempster's rule)?? • Answer is NO!! • Counterexample: Dempster's combination cannot be obtained by combining members of prior and likelihood (Arnborg, JAIF vol 1, no 1, 2006)
Consistency of fusion operators • Axes are the probabilities of A and B in a 3-element universe: P(A), P(B), with P(C) = 1-P(A)-P(B) [Figure legend: Operands (evidence); Robust Fusion; Dempster's rule; Modified Dempster's rule; Rounded robust; DS rule; MDS rule]
Zadeh's Paradoxical Example • Patient has headache; possible explanations are M -- Meningitis; C -- Concussion; T -- Tumor. • Expert 1: P(M)=0; P(C)=0.9; P(T)=0.1 • Expert 2: P(M)=0.9; P(C)=0; P(T)=0.1 • Parallel combination: (0, 0, 0.01) • What is the combined conclusion? Parallel normalized: (0, 0, 1)? • Is there a paradox??
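The arithmetic of Zadeh's example is easy to reproduce: multiply the two experts' probabilities pointwise (the parallel combination) and renormalize. The function name and dict layout below are illustrative assumptions; the numbers are those on the slide.

```python
def parallel_combination(p1, p2):
    """Pointwise product of two distributions, plus its normalized version."""
    raw = {h: p1[h] * p2[h] for h in p1}
    total = sum(raw.values())
    if total == 0:
        raise ValueError("the experts rule out every hypothesis between them")
    return raw, {h: v / total for h, v in raw.items()}

expert1 = {"M": 0.0, "C": 0.9, "T": 0.1}
expert2 = {"M": 0.9, "C": 0.0, "T": 0.1}
raw, normalized = parallel_combination(expert1, expert2)
print(raw, normalized)
```

All the surviving mass sits on T, the hypothesis both experts considered unlikely, because each expert assigned exactly zero to the other's favourite.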
Zadeh's Paradox (ctd) • One expert (at least) made an error • Experts do not know what probability zero means • Experts made correct inferences based on different observation sets, and T is indeed the correct answer: $f(\lambda \mid o_1, o_2) = c\, f(o_1 \mid \lambda)\, f(o_2 \mid \lambda)\, f(\lambda)$, but this assumes $f(o_1, o_2 \mid \lambda) = f(o_1 \mid \lambda)\, f(o_2 \mid \lambda)$, which need not be true if the granularity of $\Lambda$ is too coarse (not taking variability of $f(o_i \mid \lambda)$ into account). • One reason (among several) to look at Robust Bayes.