Axioms and Algorithms for Inferences Involving Probabilistic Independence. Dan Geiger, Azaria Paz, and Judea Pearl, Information and Computation 91(1), March 1991, 128-141. Presentation by Guy Moses & Omer Weissbrod
Axioms and Algorithms for Inferences Involving Probabilistic Independence. Dan Geiger, Azaria Paz, and Judea Pearl, Information and Computation 91(1), March 1991, 128-141. Presentation by Guy Moses & Omer Weissbrod for the course 236372 - Bayesian Networks, Computer Science Faculty, Technion – winter 2009. Partially based on the presentation by Ilan Gronau.
What’s ahead?
• Introduction: some definitions, notations and reminders.
• Proof of Completeness: “if it’s true, it can be proved”.
• Preparations for the Membership Algorithm: more definitions, and some theoretical groundwork.
• The Membership Algorithm: description, proof of correctness, complexity analysis.
Definitions
• U (Universe): a set of random variables with probability distribution P.
• X, Y: finite sets of random variables: X = {x1,…,xn}, Y = {y1,…,ym}.
• P(X,Y) = P(X)·P(Y): a short-hand notation for the equality Pr{x1=a1,…,xn=an, y1=b1,…,ym=bm} = Pr{x1=a1,…,xn=an} · Pr{y1=b1,…,ym=bm} for every choice of a1,…,an, b1,…,bm.
• (X,Y): short-hand for P(X,Y) = P(X)·P(Y). This is called an independence statement.
* Note that X and Y are disjoint sets of variables (X ∩ Y = ∅).
Notations
• σ: a specific independence statement of the form (X,Y).
• Σ: a set of independence statements of the form (X,Y): Σ = {σ1,…,σk}.
• XY: short-hand notation for the union X ∪ Y.
• “P satisfies σ = (X,Y)” means: P(X,Y) = P(X)·P(Y) for that specific P.
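As an illustration of what “P satisfies (X,Y)” means, here is a minimal sketch that checks the product equality for every choice of values. The representation (a joint distribution stored as a dict from value tuples to probabilities) and the names `marginal` and `satisfies` are assumptions made for this example, not part of the presentation.

```python
from itertools import product

def marginal(joint, names, subset, values):
    """Pr{subset = values}: sum the joint over all assignments that agree with values."""
    idx = [names.index(v) for v in subset]
    return sum(p for assignment, p in joint.items()
               if all(assignment[i] == val for i, val in zip(idx, values)))

def satisfies(joint, names, X, Y, tol=1e-12):
    """True iff P(X,Y) = P(X)*P(Y) for every choice of values of X and Y."""
    # Domain of each variable, read off the assignments that appear in the joint.
    domains = {v: sorted({a[names.index(v)] for a in joint}) for v in names}
    for xs in product(*(domains[v] for v in X)):
        for ys in product(*(domains[v] for v in Y)):
            pxy = marginal(joint, names, list(X) + list(Y), xs + ys)
            px = marginal(joint, names, X, xs)
            py = marginal(joint, names, Y, ys)
            if abs(pxy - px * py) > tol:
                return False
    return True

# Hypothetical example: a and b are fair coins, c = a XOR b.
names = ["a", "b", "c"]
joint = {(a, b, a ^ b): 0.25 for a in (0, 1) for b in (0, 1)}
print(satisfies(joint, names, ["a"], ["b"]))
print(satisfies(joint, names, ["a", "b"], ["c"]))
```

In this toy distribution every pair of variables is independent, yet (ab, c) fails because c is determined by a and b.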
Soundness and Completeness
Definitions:
• Σ ⊨ σ iff every distribution that satisfies Σ also satisfies σ.
• Σ ⊢ σ iff σ ∈ cl(Σ), i.e. there exists a derivation chain σ1,…,σn = σ s.t. for each j, either σj ∈ Σ or σj is derived by an axiom from the previous statements.
For a set of axioms A:
• Soundness: A is sound iff for every Σ and σ: Σ ⊢ σ ⇒ Σ ⊨ σ.
• Completeness: A is complete iff for every Σ and σ: Σ ⊨ σ ⇒ Σ ⊢ σ.
• Completeness, alternative definition: A is complete iff for every Σ and every σ ∉ cl(Σ) there exists a distribution P that satisfies cl(Σ) and does not satisfy σ.
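Independence Axioms
• 1a. Trivial independence: (X, ∅)
• 1b. Symmetry: (X,Y) ⇒ (Y,X)
• 1c. Decomposition: (X,YZ) ⇒ (X,Y)
• 1d. Mixing: (X,Y) and (XY,Z) ⇒ (X,YZ)
We saw (in the 1st lecture) that axioms 1a-1d are sound (they always infer correctly). Today we’ll show that they are also complete (they can derive every true statement).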
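To make cl(Σ) concrete, the sketch below computes it by brute force for a tiny universe. It assumes the four axioms take the forms written in the code comments (trivial independence, symmetry, decomposition, mixing). The function names are hypothetical, and the procedure is exponential, so it is meant only for experimenting with a handful of variables.

```python
from itertools import combinations

def subsets(s):
    """All subsets of s, as frozensets."""
    s = sorted(s)
    return [frozenset(c) for r in range(len(s) + 1) for c in combinations(s, r)]

def closure(sigma, universe):
    """Brute-force cl(sigma): apply the axioms until nothing new is derived."""
    cl = {(frozenset(x), frozenset(y)) for x, y in sigma}
    cl |= {(x, frozenset()) for x in subsets(universe)}   # trivial independence: (X, {})
    changed = True
    while changed:
        new = set()
        for x, y in cl:
            new.add((y, x))                                # symmetry: (X,Y) => (Y,X)
            new.update((x, y2) for y2 in subsets(y))       # decomposition: (X,YZ) => (X,Y)
            for xy, z in cl:                               # mixing: (X,Y),(XY,Z) => (X,YZ)
                if xy == x | y:
                    new.add((x, y | z))
        changed = not new <= cl
        cl |= new
    return cl

# Hypothetical example over U = {a, b, c}.
cl = closure([("a", "b"), ("ab", "c")], "abc")
print((frozenset("a"), frozenset("bc")) in cl)
```

Here (a, bc) is obtained from (a, b) and (ab, c) by a single application of mixing.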
Minimal Statement
• Definition: σ=(X,Y) ∉ cl(Σ) is minimal if for every non-empty X', Y' s.t. X' ⊆ X, Y' ⊆ Y, X'Y' ≠ XY we have (X',Y') ∈ cl(Σ).
• For every σ=(X,Y) ∉ cl(Σ) we can find an appropriate minimal σ'=(X',Y') ∉ cl(Σ) through iterative decomposition.
• Observation: P satisfies σ ⇒ P satisfies σ' (decomposition soundness). Therefore: P doesn’t satisfy σ' ⇒ P doesn’t satisfy σ.
• Our plan: given an arbitrary σ ∉ cl(Σ), we will find a distribution P that satisfies cl(Σ) but doesn’t satisfy σ'. This will prove completeness (using the alternative completeness definition and the observation above).
• To simplify notation, we will assume WLOG that σ=(X,Y) is already minimal.
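The “iterative decomposition” step can be sketched as follows, assuming some membership oracle for cl(Σ) is available. The oracle `toy_in_closure` below is purely hypothetical and exists only to make the example runnable. The function drops one variable at a time while the shrunken statement is still outside the closure.

```python
def minimal_statement(stmt, in_closure):
    """Shrink a statement outside the closure until all its proper sub-statements are inside."""
    x, y = frozenset(stmt[0]), frozenset(stmt[1])
    shrunk = True
    while shrunk:
        shrunk = False
        candidates = [("x", v) for v in sorted(x)] + [("y", v) for v in sorted(y)]
        for side, v in candidates:
            nx = x - {v} if side == "x" else x
            ny = y - {v} if side == "y" else y
            # Keep both sides non-empty: a trivial statement is always in the closure.
            if nx and ny and not in_closure((nx, ny)):
                x, y, shrunk = nx, ny, True
                break
    return x, y

# Hypothetical oracle: a statement is underivable exactly when it separates x1 from y1.
def toy_in_closure(stmt):
    a, b = stmt
    return not (("x1" in a and "y1" in b) or ("y1" in a and "x1" in b))

print(minimal_statement(({"x1", "x2"}, {"y1", "y2"}), toy_in_closure))
```

When the loop stops, removing any single variable yields a statement in the closure, and every smaller non-empty sub-statement then follows by decomposition and symmetry.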
Completeness Proof
Let σ=(X,Y) ∉ cl(Σ) be a minimal statement where X={x1,…,xn}, Y={y1,…,ym}, and Z={z1,z2,…,zk} stands for the rest of the variables in U.
We will construct P as follows:
• All variables except x1 are independent fair coins (probability ½ for each of their two values).
• x1 is the sum modulo 2 of all the other variables in XY: x1 = x2 + … + xn + y1 + … + ym (mod 2).
Part 1: P does not satisfy σ
We will inspect the following scenario: x1=1, all other variables are 0.
• P(x1=1, x2=0,…,xn=0, y1=0,…,ym=0) = 0
• P(x1=1, x2=0,…,xn=0) = 0.5^n
• P(y1=0,…,ym=0) = 0.5^m
Hence P(x1,…,xn, y1,…,ym) ≠ P(x1,…,xn)·P(y1,…,ym). Therefore, P does not satisfy σ, as required.
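The construction can be checked numerically on a small instance. The sketch below assumes that x1 is defined as the parity (XOR) of the other variables in XY, and uses a hypothetical layout with X = {x1, x2}, Y = {y1}, Z = {z1}. The helper names are illustrative only.

```python
from itertools import product

def parity_distribution(n_xy, n_z):
    """Joint over (x1, other XY variables..., z's): fair coins, except x1 = XOR of the other XY variables."""
    free = n_xy - 1 + n_z
    joint = {}
    for bits in product((0, 1), repeat=free):
        x1 = sum(bits[:n_xy - 1]) % 2
        joint[(x1,) + bits] = 0.5 ** free
    return joint

def prob(joint, fixed):
    """Probability that the variables at the given positions take the given values."""
    return sum(p for a, p in joint.items() if all(a[i] == v for i, v in fixed.items()))

def independent(joint, X, Y):
    """Check P(X,Y) = P(X)*P(Y) for all binary values; X and Y are lists of positions."""
    for vals in product((0, 1), repeat=len(X) + len(Y)):
        fx = dict(zip(X, vals[:len(X)]))
        fy = dict(zip(Y, vals[len(X):]))
        if abs(prob(joint, {**fx, **fy}) - prob(joint, fx) * prob(joint, fy)) > 1e-12:
            return False
    return True

# Positions: 0 = x1, 1 = x2, 2 = y1, 3 = z1.
P = parity_distribution(n_xy=3, n_z=1)
print(independent(P, [0, 1], [2]))   # sigma = (X, Y)
print(independent(P, [0], [2]))      # a proper sub-statement of sigma
```

In this instance σ = (X,Y) fails while its proper sub-statements and every statement involving only z1 on one side hold, which is the behavior the proof relies on.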
Completeness Proof – cont’d
Part 2: P satisfies cl(Σ)
Let (V,W) ∈ cl(Σ). We will show that P(V,W) = P(V)·P(W). This is done by inspecting different scenarios:
Scenario 1: either V or W contains only elements of Z. We will assume WLOG that W contains only elements of Z. All variables in Z are independent under P, and therefore P(V,W) = P(V)·P(W).
[Figure: the variables of U, labeled X, Y or Z, with the subsets V and W marked]
Completeness Proof – cont’d
Part 2: P satisfies cl(Σ)
Let (V,W) ∈ cl(Σ). We will show that P(V,W) = P(V)·P(W). This is done by inspecting different scenarios:
Scenario 2: both V and W contain elements of X ∪ Y, but V ∪ W doesn’t contain all elements of X ∪ Y. Without full information about the assignments of the variables in X ∪ Y, x1 could turn out to be 0 or 1 with probability ½, and therefore P(V,W) = P(V)·P(W).
[Figure: the variables of U, labeled X, Y or Z, with the subsets V and W marked]
Completeness Proof – cont’d
Part 2: P satisfies cl(Σ) - continued
Scenario 3: both V and W contain elements of X ∪ Y, and (X ∪ Y) ⊆ (V ∪ W). We will show a derivation chain for σ=(X,Y), contradicting our original assumption that σ ∉ cl(Σ):
• Mark: (V,W) = (XV YV ZV, XW YW ZW) ∈ cl(Σ), where Y = YV ∪ YW, X = XV ∪ XW, ZV ∪ ZW ⊆ Z, V = XV YV ZV, W = XW YW ZW.
• Remove all z’s by decomposition: (XV YV, XW YW) ∈ cl(Σ).
• Due to minimality of σ=(X,Y): (XV,YV) ∈ cl(Σ) and (XW,Y) ∈ cl(Σ).
• (XV,YV), (XV YV, XW YW) ⊢ (XV, YV XW YW) = (XV, XW Y) [mixing]
• (XW,Y), (XW Y, XV) ⊢ (Y, XV XW) = (Y,X) = σ [mixing, up to symmetry]
[Figure: the variables of U, labeled X, Y or Z, with the subsets V and W marked]
Completeness Proof – Summary
Reminder: Completeness, alternative definition: A is complete iff for every Σ and every σ ∉ cl(Σ) there exists a distribution P that satisfies cl(Σ) and does not satisfy σ.
We’ve shown: given a minimal σ ∉ cl(Σ), there exists a distribution P that obeys:
• P does not satisfy σ.
• P satisfies cl(Σ).
Given a non-minimal σ ∉ cl(Σ), we derive its minimal statement σ' and devise a distribution P' that satisfies cl(Σ) but does not satisfy σ'. Due to soundness of decomposition, P' cannot satisfy σ either.
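Scope of Completeness
[Figure: classes of distributions: all p.d.’s over U, discrete p.d.’s, normal p.d.’s, binary p.d.’s]
The proof uses P, a binary p.d. (probability distribution function), therefore:
• The axioms are complete for binary p.d.’s, and hence also for the larger classes that contain them (discrete p.d.’s and all p.d.’s over U).
• However, for normal p.d.’s, the axiom set 1a-1d is not complete: a stronger axiom is required, replacing one of the axioms above.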
Some more Definitions and Tools
Definition: Span
• span(σ): the set of elements represented in statement σ. Example: span((x1x2, x3x4)) = {x1,x2,x3,x4}
• span(Σ): the set of elements represented in all statements of Σ. Example: span({(x1,x2),(x1,x3)}) = {x1,x2,x3}
Some more Definitions and Tools
Definition: Projection
• The projection of σ on X, denoted σ(X), is the statement derived from σ by removing all elements not in X. Example: if σ=(x1x2x3, x4x5) and X={x2,x3,x4} then σ(X)=(x2x3, x4).
• The projection of Σ on X, denoted Σ(X), is {σ(X) | σ ∈ Σ}.
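Span and projection translate directly into set operations. The sketch below assumes a statement is represented as a pair of sets of variable names. The function names are chosen for this example only.

```python
def span(statement):
    """Set of all variables that appear in a statement (X, Y)."""
    x, y = statement
    return frozenset(x) | frozenset(y)

def span_of_set(sigma):
    """Set of all variables that appear in any statement of sigma."""
    return frozenset().union(*(span(s) for s in sigma))

def project(statement, keep):
    """Projection of a statement: drop every variable that is not in keep."""
    x, y = statement
    keep = frozenset(keep)
    return (frozenset(x) & keep, frozenset(y) & keep)

def project_set(sigma, keep):
    """Projection of a set of statements: project each statement."""
    return {project(s, keep) for s in sigma}

# The example from the slide: sigma = (x1x2x3, x4x5), X = {x2, x3, x4}.
sigma = ({"x1", "x2", "x3"}, {"x4", "x5"})
print(project(sigma, {"x2", "x3", "x4"}))
```

Note that a projection may produce a trivial statement (one empty side), or map two different statements to the same one.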
Some more Definitions and Tools
Projection Lemma: Σ ⊢ σ iff Σ' ⊢ σ, where Σ' = Σ(span(σ)).
(⇐) If Σ' ⊢ σ then clearly Σ ⊢ σ, because all the statements in Σ' can be derived from the statements in Σ by decomposition.
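Some more Definitions and Tools
Projection Lemma: Σ ⊢ σ iff Σ' ⊢ σ, where Σ' = Σ(span(σ)), s = span(σ).
(⇒) If Σ ⊢ σ then there is a derivation chain for σ: σ1, σ2, …, σm = σ. For each j: if σj is derived from σi, i<j, by symmetry or decomposition, then σj(s) is derived from σi(s) by symmetry or decomposition respectively. Similarly, if σj is derived from σi and σl by mixing, then σj(s) is derived from σi(s), σl(s) by mixing. Statements of Σ in the chain project to statements of Σ', so σ1(s), …, σm(s) = σ is a derivation chain for σ from Σ'.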
Some more Definitions and Tools
Projection Lemma: Σ ⊢ σ iff Σ' ⊢ σ, where Σ' = Σ(span(σ)), s = span(σ).
Observations from the projection lemma:
• Variables not in σ are unnecessary for determining whether Σ ⊢ σ.
• The problem of verifying whether Σ ⊢ σ can be simplified to the problem of verifying whether Σ' ⊢ σ, where Σ' = Σ(span(σ)).
• This problem can be solved with a possibly reduced time and space complexity.
Conditions for Inference of Independence
Main claim: for a given σ, we have Σ' ⊢ σ iff:
1. σ is trivial: σ=(X,∅) (up to symmetry), OR
2. σ is in Σ': σ ∈ Σ' (up to symmetry), OR
3. σ is derivable from Σ': there exists σ' ∈ Σ' s.t. span(σ) = span(σ'), and for σ'=(AP,BQ), σ=(AQ,BP) (A, B, Q, P may be empty): Σ' ⊢ (A,P), Σ' ⊢ (B,Q) (up to symmetry).
Proof of Main Claim
(⇐) If 1. σ is trivial* OR 2. σ is in* Σ', then the proof is immediate. Otherwise, 3. there exists σ' ∈ Σ' s.t. span(σ) = span(σ'), and for σ'=(AP,BQ), σ=(AQ,BP): Σ' ⊢ (A,P), Σ' ⊢ (B,Q). We will show a constructive proof under these conditions.
* up to symmetry
Proof of Main Claim
(⇐) (contd.) Given that σ'=(AP,BQ) ∈ Σ', Σ' ⊢ (A,P), Σ' ⊢ (B,Q), each step below holds up to symmetry:
• (A,P), (AP,BQ) ⊢ (A,PBQ) [mixing]
• (B,Q), (AP,BQ) ⊢ (APB,Q) [mixing] ⊢ (PB,Q) [decomposition]
• (PB,Q), (A,PBQ) ⊢ (AQ,PB) [mixing] = (AQ,BP) = σ
We’ve proven this direction.
Proof of Main Claim
(⇒) Given Σ' ⊢ σ, if 1. σ is trivial* OR 2. σ is in* Σ', then the proof is immediate. Otherwise, since no axiom can add new variables to a statement, there must exist σ' ∈ Σ' s.t. span(σ) = span(σ') in the derivation chain of σ. Also:
• σ = (AQ,BP) ⊢ (A,P) [decomposition]
• σ = (AQ,BP) ⊢ (Q,B) [decomposition]
Hence Σ' ⊢ (A,P) and Σ' ⊢ (B,Q).
* up to symmetry
Conclusions from Claim
• We’ve seen that, after discarding unneeded variables, it is possible to tell whether Σ' ⊢ σ (when it’s not immediately obvious) by:
1. Finding another statement σ' ∈ Σ' for which span(σ) = span(σ'),
2. Verifying that Σ' ⊢ (A,P), Σ' ⊢ (B,Q) when σ'=(AP,BQ), σ=(AQ,BP).
• This suggests using a recursive “divide and conquer” approach.
The Membership Algorithm
Procedure Find(Σ,σ):
1. Set Σ' := Σ(span(σ)).
2. If σ is trivial, or σ ∈ Σ' (up to symmetry), then Find(Σ,σ) := TRUE.
3. Else if for all non-trivial σ' ∈ Σ': span(σ) ≠ span(σ'), then Find(Σ,σ) := FALSE.
4. Else there exists σ' ∈ Σ' s.t. span(σ) = span(σ'), σ'=(AP,BQ), σ=(AQ,BP). Set σ1 := (A,P), σ2 := (B,Q). Find(Σ,σ) := Find(Σ',σ1) ∧ Find(Σ',σ2).
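The four steps of the procedure can be sketched in Python as follows. This is an illustrative rendering under assumed conventions: a statement is a pair of sets of variable names, the name `find` is a stand-in for Find, and the first statement of matching span is picked arbitrarily in step 4. With σ=(X,Y) and σ'=(L,R), the four parts are computed as A = X ∩ L, Q = X ∩ R, B = Y ∩ R, P = Y ∩ L.

```python
def find(sigma, stmt):
    """Membership test: is stmt in cl(sigma)? Statements are (left, right) pairs of sets."""
    x, y = frozenset(stmt[0]), frozenset(stmt[1])
    s = x | y
    # Step 1: project every statement of sigma onto span(stmt).
    proj = {(frozenset(l) & s, frozenset(r) & s) for l, r in sigma}
    # Step 2: trivial statements and members of the projection (up to symmetry) are derivable.
    if not x or not y or (x, y) in proj or (y, x) in proj:
        return True
    # Step 3: with no non-trivial statement of the same span, stmt is not derivable.
    same_span = [(l, r) for l, r in proj if l and r and l | r == s]
    if not same_span:
        return False
    # Step 4: write sigma' = (AP, BQ), stmt = (AQ, BP) and recurse on (A, P) and (B, Q).
    l, r = same_span[0]
    a, q = x & l, x & r
    b, p = y & r, y & l
    return find(proj, (a, p)) and find(proj, (b, q))

# Hypothetical input: Sigma = {(a, b), (ab, c)}.
sigma = [({"a"}, {"b"}), ({"a", "b"}, {"c"})]
print(find(sigma, ({"a"}, {"b", "c"})))
```

Each recursive call works on a strictly smaller span, because both sides of the chosen σ' are non-empty, so the recursion terminates.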
Algorithm Correctness Proof
We will prove that Find(Σ,σ) = TRUE ⇔ σ ∈ cl(Σ) by induction on k = |σ| (the number of variables in σ).
Induction base: if k=1 then σ is trivial, therefore the algorithm will return TRUE in step 2, and indeed σ ∈ cl(Σ).
Algorithm Correctness Proof
Induction assumption: Find(Σ,σ') = TRUE ⇔ σ' ∈ cl(Σ) for each σ' with |σ'| < k.
Induction step: Find(Σ,σ) = TRUE iff either:
1. Step 2 returns TRUE ⇔ σ is trivial or σ ∈ Σ' (up to symmetry); in both cases σ ∈ cl(Σ).
2. Step 4 returns TRUE
iff Find(Σ',σ1) = TRUE ∧ Find(Σ',σ2) = TRUE (according to the algorithm’s definition)
iff σ1 ∈ cl(Σ') ∧ σ2 ∈ cl(Σ') (according to the induction assumption)
iff σ ∈ cl(Σ') (according to the main claim)
iff σ ∈ cl(Σ) (according to the projection lemma).
Complexity Analysis
Definitions: n = the number of distinct variables in Σ ∪ {σ}. k = the number of distinct variables in {σ}.
• First projection cost: O(|Σ|·n), which happens only once.
• Recursive step: T(k) ≤ |Σ|·k + T(k1) + T(k2), where k1+k2=k, k1=|σ1|, k2=|σ2|.
• Can be shown by induction: T(k) ≤ |Σ|·k·(depth of recursion).
• Worst case analysis: T(k) ≤ |Σ|·k·k = |Σ|·k².
• Total run time is bounded by O(|Σ|·n + |Σ|·k²), which is also O(|Σ|·n²) since k ≤ n.
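The induction behind the bound T(k) ≤ |Σ|·k·(depth of recursion) can be spelled out as below. The symbols d, d1, d2 are introduced here for illustration: d is the depth of the recursion tree rooted at the current call, and d1, d2 are the depths of its two subtrees, so that d = 1 + max(d1, d2).

```latex
\begin{align*}
T(k) &\le |\Sigma|\,k + T(k_1) + T(k_2), \qquad k_1 + k_2 = k \\
     &\le |\Sigma|\,k + |\Sigma|\,k_1 d_1 + |\Sigma|\,k_2 d_2
        && \text{(induction assumption)} \\
     &\le |\Sigma|\,k + |\Sigma|\,(k_1 + k_2)\max(d_1, d_2) \\
     &= |\Sigma|\,k\,\bigl(1 + \max(d_1, d_2)\bigr) = |\Sigma|\,k\,d .
\end{align*}
```

Since every recursive call handles a statement with fewer variables than its parent, d ≤ k, which gives the worst-case bound T(k) ≤ |Σ|·k².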
Improvements and Variations
• Instead of arbitrarily choosing σ', find one whose sub-statements {A,B,P,Q} have balanced size (can improve run-time complexity).
• Using the derivation chain presented in the constructive proof, the algorithm can also return a derivation chain for σ with a length of O(k).
Variations (contd.)
The algorithm can be expanded into a polynomial algorithm for the following problems:
• Given two sets Σ1 and Σ2: is cl(Σ1) ⊆ cl(Σ2)? Is cl(Σ1) = cl(Σ2)?
• Minimize the size of Σ while preserving cl(Σ): start with a maximal-size statement and remove from Σ all statements derivable from it. Repeat with the next largest statement, etc.
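The closure-comparison problem reduces to membership tests: cl(Σ1) ⊆ cl(Σ2) holds exactly when every statement of Σ1 is in cl(Σ2). The sketch below assumes that reduction and repeats a compact, illustrative version of the membership procedure so that it runs on its own. All names are hypothetical.

```python
def find(sigma, stmt):
    """Compact membership test for cl(sigma); statements are (left, right) pairs of sets."""
    x, y = frozenset(stmt[0]), frozenset(stmt[1])
    s = x | y
    proj = {(frozenset(l) & s, frozenset(r) & s) for l, r in sigma}
    if not x or not y or (x, y) in proj or (y, x) in proj:
        return True
    same_span = [(l, r) for l, r in proj if l and r and l | r == s]
    if not same_span:
        return False
    l, r = same_span[0]
    return find(proj, (x & l, y & l)) and find(proj, (y & r, x & r))

def closure_included(sigma1, sigma2):
    """cl(sigma1) is a subset of cl(sigma2) iff every statement of sigma1 is derivable from sigma2."""
    return all(find(sigma2, st) for st in sigma1)

def closure_equal(sigma1, sigma2):
    """Two sets have the same closure iff each is included in the other's closure."""
    return closure_included(sigma1, sigma2) and closure_included(sigma2, sigma1)

# Hypothetical inputs.
s1 = [({"a"}, {"b", "c"})]
s2 = [({"a"}, {"b"}), ({"a", "b"}, {"c"})]
print(closure_included(s1, s2), closure_included(s2, s1))
```

In this example (a, bc) follows from Σ2 by mixing, but (ab, c) does not follow from (a, bc) alone, so the inclusion holds in one direction only.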