Overview
• Introduction to PLL
• Foundations of PLL: Logic Programming, Bayesian Networks, Hidden Markov Models, Stochastic Grammars
• Frameworks of PLL: Independent Choice Logic, Stochastic Logic Programs, PRISM, Probabilistic Logic Programs, Probabilistic Relational Models, Bayesian Logic Programs
• Relational Hidden Markov Models
• Learning
• Applications
Probabilistic Logic Programs (PLPs) [Haddawy, Ngo]

[Figure: the alarm Bayesian network (Earthquake, Burglary -> Alarm -> JohnCalls, MaryCalls), each node an RV with states and a CPT such as P(A | B,E).]

• Atoms = sets of similar RVs; the first arguments identify the RV, the last argument its state
• Clause = CPD entry
• Probability distribution over Herbrand interpretations

0.1 : burglary(true).     0.9 : burglary(false).
0.01 : earthquake(true).  0.99 : earthquake(false).
0.9 : alarm(true) :- burglary(true), earthquake(true).
...

Can burglary(true) and burglary(false) be true in the same interpretation? Integrity constraints rule this out:

false :- burglary(true), burglary(false).
burglary(true) ; burglary(false) :- true.
false :- earthquake(true), earthquake(false).
...
Probabilistic Logic Programs (PLPs) [Haddawy, Ngo]

Context:
father(rex,fred). mother(ann,fred). father(brian,doro). mother(utta,doro). father(fred,henry). mother(doro,henry).

Probabilities:
1.0 : mc(P,a) :- mother(M,P), pc(M,a), mc(M,a).
0.0 : mc(P,b) :- mother(M,P), pc(M,a), mc(M,a).
...
0.5 : pc(P,a) :- father(F,P), pc(F,0), mc(F,a).
0.5 : pc(P,0) :- father(F,P), pc(F,0), mc(F,a).
...
1.0 : bt(P,a) :- mc(P,aa), pc(P,aa).

Constraints:
false :- pc(P,a), pc(P,b), pc(P,0).
pc(P,a) ; pc(P,b) ; pc(P,0) :- person(P).
...
Probabilistic Logic Programs (PLPs) [Haddawy, Ngo]

(The same program, annotated.) Each clause expresses a dependency between random variables: the head and body atoms name the RVs and their states (the qualitative part), shared logical variables such as P, M, and F bind the RVs of head and body together, and the attached probability labels supply the CPD entries (the quantitative part).
Probabilistic Logic Programs (PLPs) [Haddawy, Ngo]

Grounding the program on the pedigree
father(rex,fred). mother(ann,fred). father(brian,doro). mother(utta,doro). father(fred,henry). mother(doro,henry).
yields a Bayesian network over the ground atoms.

[Figure: ground Bayesian network with mc, pc, and bt nodes for ann, rex, utta, brian, fred, doro, and henry; mc(fred) and pc(fred) depend on the chromosome RVs of ann and rex, bt(fred) depends on mc(fred) and pc(fred), and so on down to bt(henry).]
Probabilistic Logic Programs (PLPs) [Haddawy, Ngo]
• Unique probability distribution over Herbrand interpretations (finite branching factor, finite proofs, no self-dependency)
• Atoms = states; integrity constraints encode mutually exclusive states
• Functors; Turing-complete programming language
• BN used to do inference
• Can represent BNs, HMMs, DBNs, SCFGs, ...
• No learning
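The distribution such a program defines over the pedigree can be sampled by forward chaining. A minimal Python sketch, assuming hypothetical founder chromosome values; the 0.5 choices mirror the 0.5-labelled pc/mc clauses:

```python
import random

# Toy forward sampling of the blood-type program. Founder chromosome
# values are assumptions for illustration; a child inherits one of each
# parent's two chromosomes with probability 0.5.
father = {"fred": "rex", "doro": "brian", "henry": "fred"}
mother = {"fred": "ann", "doro": "utta", "henry": "doro"}
founders = {"rex": ("a", "0"), "ann": ("a", "b"),
            "brian": ("b", "0"), "utta": ("a", "a")}

def sample_chromosomes(person, rng):
    """Sample (mc, pc) of a person: fixed for founders, inherited otherwise."""
    if person in founders:
        return founders[person]
    mc = rng.choice(sample_chromosomes(mother[person], rng))  # from the mother
    pc = rng.choice(sample_chromosomes(father[person], rng))  # from the father
    return (mc, pc)

mc, pc = sample_chromosomes("henry", random.Random(0))
print(mc, pc)
```

Repeating the call with fresh random seeds approximates the marginal over mc(henry), pc(henry), i.e. the distribution the ground Bayesian network assigns to them.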
Probabilistic Relational Models (PRMs) [Getoor, Koller, Pfeffer]

PRMs come from database theory and entity-relationship models: attributes = RVs.

[Figure: the alarm Bayesian network recast as a database: one table "alarm system" whose attributes are Earthquake, Burglary, Alarm, MaryCalls, JohnCalls, with CPTs such as P(A | B,E).]
Probabilistic Relational Models (PRMs) [Getoor, Koller, Pfeffer]

[Figure: the Person table with attributes M-chromosome, P-chromosome, and Bloodtype, related to itself via the binary relations Father and Mother.]
Probabilistic Relational Models (PRMs) [Getoor, Koller, Pfeffer]

Schema: father(Father,Person); attributes bt(Person,BT), pc(Person,PC), mc(Person,MC).

Dependencies:
bt(Person,BT) :- pc(Person,PC), mc(Person,MC).
pc(Person,PC) :- pc_father(Father,PCf), mc_father(Father,MCf).

View:
pc_father(Person,PCf) | father(Father,Person), pc(Father,PC).
Probabilistic Relational Models (PRMs) [Getoor, Koller, Pfeffer]

father(rex,fred). mother(ann,fred). father(brian,doro). mother(utta,doro). father(fred,henry). mother(doro,henry).

pc_father(Person,PCf) | father(Father,Person), pc(Father,PC).
...
mc(Person,MC) | pc_mother(Person,PCm), pc_mother(Person,MCm).
pc(Person,PC) | pc_father(Person,PCf), mc_father(Person,MCf).
bt(Person,BT) | pc(Person,PC), mc(Person,MC).

[Figure: the induced ground Bayesian network over the mc, pc, and bt RVs (with their states) of ann, rex, utta, brian, fred, doro, and henry.]
Probabilistic Relational Models (PRMs) [Getoor, Koller, Pfeffer]
• Datalog
• Unique probability distribution over finite Herbrand interpretations (no self-dependency)
• Discrete and continuous RVs
• BN used to do inference
• Highlight: graphical representation
• Can represent BNs
• Learning
Bayesian Logic Programs (BLPs) [Kersting, De Raedt]

A Bayesian network is a propositional BLP: each node with its parents becomes a local BN fragment, e.g.
alarm :- earthquake, burglary.
with the CPT P(A | B,E) attached; the rule graph over earthquake/0, burglary/0, alarm/0, maryCalls/0, johnCalls/0 mirrors the network structure.
Bayesian Logic Programs (BLPs) [Kersting, De Raedt]

A Bayesian clause such as
bt(Person) :- pc(Person), mc(Person).
is built from predicates (bt, pc, mc), atoms, and logical variables (Person) as arguments. Each clause carries a CPD, e.g. P(bt(Person) | pc(Person)=a, mc(Person)=a) = (1.0,0.0,0.0,0.0) and P(bt(Person) | pc(Person)=a, mc(Person)=b) = (0.0,0.0,1.0,0.0); likewise the mc clause over mother(Mother,Person), pc(Mother), mc(Mother) has entries such as (1.0,0.0,0.0) and (0.5,0.5,0.0). The rule graph links pc/1 and mc/1 to bt/1.
Bayesian Logic Programs (BLPs) [Kersting, De Raedt]

mc(Person) | mother(Mother,Person), pc(Mother), mc(Mother).
pc(Person) | father(Father,Person), pc(Father), mc(Father).
bt(Person) | pc(Person), mc(Person).

[Figure: rule graph over pc/1, mc/1, bt/1, with the CPD attached to the mc clause, e.g. (1.0,0.0,0.0) for pc(Mother)=a, mc(Mother)=a and (0.5,0.5,0.0) for pc(Mother)=a, mc(Mother)=b.]
Bayesian Logic Programs (BLPs) [Kersting, De Raedt]

father(rex,fred). mother(ann,fred). father(brian,doro). mother(utta,doro). father(fred,henry). mother(doro,henry).

mc(Person) | mother(Mother,Person), pc(Mother), mc(Mother).
pc(Person) | father(Father,Person), pc(Father), mc(Father).
bt(Person) | pc(Person), mc(Person).

[Figure: the induced ground Bayesian network over the mc, pc, and bt RVs of ann, rex, utta, brian, fred, doro, and henry.]
Bayesian Logic Programs (BLPs) [Kersting, De Raedt]
• Unique probability distribution over Herbrand interpretations (finite branching factor, finite proofs, no self-dependency)
• Highlight: separation of qualitative and quantitative parts; graphical representation
• Functors; discrete and continuous RVs
• Can represent BNs, DBNs, HMMs, SCFGs, Prolog, ...
• Turing-complete programming language
• Learning
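Grounding a BLP is mechanical: combine the Bayesian clauses with the logical facts and read off each ground atom's parents. A minimal sketch, assuming a simple string encoding of ground atoms:

```python
# Ground the three Bayesian clauses of the blood-type BLP with the
# pedigree facts and collect the parent sets of the induced Bayesian
# network (structure only; CPDs would be attached per clause).
father = {"fred": "rex", "doro": "brian", "henry": "fred"}
mother = {"fred": "ann", "doro": "utta", "henry": "doro"}
persons = {"rex", "ann", "brian", "utta", "fred", "doro", "henry"}

parents = {}
for p in sorted(persons):
    if p in mother:   # mc(Person) | mother(Mother,Person), pc(Mother), mc(Mother).
        parents[f"mc({p})"] = [f"pc({mother[p]})", f"mc({mother[p]})"]
    if p in father:   # pc(Person) | father(Father,Person), pc(Father), mc(Father).
        parents[f"pc({p})"] = [f"pc({father[p]})", f"mc({father[p]})"]
    parents[f"bt({p})"] = [f"mc({p})", f"pc({p})"]  # bt(Person) | mc(Person), pc(Person).

print(parents["bt(henry)"])  # ['mc(henry)', 'pc(henry)']
print(parents["mc(henry)"])  # ['pc(doro)', 'mc(doro)']
```

Founders (rex, ann, brian, utta) get no mc/pc parents, so their chromosome RVs become roots of the network, exactly as in the figure above.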
Combining Partial Knowledge

prepared(Student,Topic) | read(Student,Book), discusses(Book,Topic).
passes(Student) | prepared(Student,bn), prepared(Student,logic).

[Figure: dependency graphs: read/2 and discusses/2 feed prepared/2; prepared(Student,bn) and prepared(Student,logic) feed passes/1.]
Combining Partial Knowledge
• Whether a student is prepared for a topic depends on the books she read
• Therefore prepared/2 has a variable number of parents, one pair per read/2 fact: e.g. both discusses(b1,bn) and discusses(b2,bn) are parents of prepared(s1,bn)
• But the CPD of
prepared(Student,Topic) | read(Student,Book), discusses(Book,Topic).
is specified for a single book-topic pair only
Combining Rules

prepared(Student,Topic) | read(Student,Book), discusses(Book,Topic).

A combining rule (CR) merges the CPDs of the individual ground instances, e.g. P(A|B) and P(A|C), into a single combined CPD P(A|B,C). Any algorithm qualifies that
• has an empty output if and only if the input is empty, and
• combines a set of CPDs into a single (combined) CPD.
E.g. noisy-or, regression, ...
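Noisy-or is the standard example. A minimal sketch (the 0.8/0.6 success probabilities are invented for illustration); note it satisfies both CR conditions above:

```python
from functools import reduce

def noisy_or(cpd_entries):
    """Combine the success probabilities of several ground clause
    instances via noisy-or: the effect is absent only if every
    individual cause fails to produce it."""
    if not cpd_entries:        # empty input <-> empty output, as a CR requires
        return None
    return 1.0 - reduce(lambda acc, p: acc * (1.0 - p), cpd_entries, 1.0)

# Two books discuss 'bn'; each alone prepares the student with prob. 0.8 / 0.6.
print(noisy_or([0.8, 0.6]))   # 0.92 = 1 - 0.2 * 0.4
print(noisy_or([]))           # None
```

With more read/2 facts the list simply grows, which is exactly how the variable number of parents is absorbed into one CPD entry.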
Aggregates
Map multisets of values to summary values (e.g., sum, average, max, cardinality), linking e.g. registration_grade/2 and registered/2 to student_ranking/1.
Aggregates
Map multisets of values to summary values (e.g., sum, average, max, cardinality). The aggregate grade_avg/1 is a deterministic functional dependency: it averages the registration_grade values over all courses the student is registered/2 for. student_ranking/1 then depends on grade_avg/1 through an ordinary probabilistic dependency (CPD).
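The two-layer pattern can be sketched in a few lines. The grades and the CPD below are invented for illustration:

```python
from statistics import mean

# Hypothetical registration_grade/2 facts for one student.
registration_grade = {("s1", "dm"): 1.3, ("s1", "stats"): 2.0, ("s1", "rob"): 1.7}

def grade_avg(student):
    """Deterministic aggregate node: the average over the multiset of the
    student's registration grades (a functional dependency)."""
    return mean(g for (s, _), g in registration_grade.items() if s == student)

def p_ranking_good(avg):
    """Probabilistic dependency: a toy CPD for P(student_ranking = good | grade_avg)."""
    return 0.9 if avg <= 2.0 else 0.3

print(grade_avg("s1"), p_ranking_good(grade_avg("s1")))
```

The aggregate collapses an arbitrarily large multiset into one value, so the CPD of student_ranking/1 stays fixed-size no matter how many registrations exist.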
Stochastic Relational Models (SRMs)
• Capture Type I probabilities, i.e., frequencies in databases
• Probability that a select-join query succeeds: independently sample tuples ri from the relations Ri and select as values for the attributes Ai the values ri.Ai
Stochastic Relational Models (SRMs)

Example: the WHO Mortality Database, with relations country (name), death (cause), and pers (sex, jCountry, dyear, jDeath, dage), e.g.

query(pers:dage='75-79y', death:cause=k) = 0.012
query(pers:dage='85-89y', death:cause=k) = 0.0012
query(pers:dage='75-79y', death:cause=r) = 0.02
query(pers:dage='85-89y', death:cause=r) = 0.114
query(pers:dage='1-4y') = 0.00201
query(pers:dage='25-29y') = 7.1*10^-5
query(pers:dage='75-79y') = 0.12
query(pers:dage='85-89y') = 0.176
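The Type-I semantics can be illustrated by sampling. The tuples and proportions below are invented and do not reproduce the WHO figures; the point is only that a query probability is a tuple frequency:

```python
import random

# A hypothetical pers relation: (age_group, cause_of_death) tuples.
pers = [("75-79y", "k")] * 12 + [("75-79y", "r")] * 20 + [("85-89y", "r")] * 68

def query_prob(rel, cond, n=100_000, seed=0):
    """Estimate the probability that a select query succeeds: sample
    tuples independently and uniformly and count how often the
    selection condition holds."""
    rng = random.Random(seed)
    return sum(cond(rng.choice(rel)) for _ in range(n)) / n

p = query_prob(pers, lambda t: t[0] == "75-79y")
print(round(p, 2))  # close to the exact frequency 32/100 = 0.32
```

For a select-join query one would sample a tuple from each relation and additionally require the join condition to hold.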
Overview
• Introduction to PLL
• Foundations of PLL: Logic Programming, Bayesian Networks, Hidden Markov Models, Stochastic Grammars
• Frameworks of PLL: Independent Choice Logic, Stochastic Logic Programs, PRISM, Probabilistic Logic Programs, Probabilistic Relational Models, Bayesian Logic Programs
• Relational Hidden Markov Models
• Learning
• Applications
Relational (Hidden) Markov Models

Example domain: a CS department with lecturers Wolfram and Luc and CS courses on data mining, statistics, and robotics.
• In a propositional (H)MM, each state is trained independently
• No sharing of experience, large state space
Relational (Hidden) Markov Models

Abstract states such as dept(D), course(D,C), and lecturer(D,L) each cover many ground states, e.g. dept(cs), course(cs,dm), course(cs,stats), course(cs,rob), lecturer(cs,luc), lecturer(cs,wolfram). Transitions are defined between abstract states:

(0.7) dept(D) -> course(D,C).
(0.2) dept(D) -> lecturer(D,L).
...
(0.3) course(D,C) -> lecturer(D,L).
(0.3) course(D,C) -> dept(D).
(0.3) course(D,C) -> course(D',C').
...
(0.1) lecturer(D,L) -> course(D,C).
...
Relational (Hidden) Markov Models
• So far, only transitions between abstract states
• Needed: the possible transitions and their probabilities for any ground state
• E.g. for lecturer(D,L), the possible instantiations of the arguments are {cs, math, bio, ...} x {luc, wolfram, ...}; we need the chance of each instantiation, such as P(lecturer(cs,luc))
Relational (Hidden) Markov Models

Two ways to distribute the probability of an abstract state such as lecturer(D,L) over its instantiations:
• RMMs [Anderson et al. 03]: probability estimation trees
• LOHMMs [Kersting et al. 03]: naive Bayes, i.e. P(lecturer(D,L)) = P(D) * P(L)
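The LOHMM factorization is easy to make concrete. The argument distributions below are invented for illustration:

```python
# Under the naive Bayes assumption the instantiation probability of the
# abstract state lecturer(D,L) factorizes over its arguments:
# P(lecturer(d,l)) = P(D=d) * P(L=l).
P_D = {"cs": 0.6, "math": 0.3, "bio": 0.1}   # hypothetical department dist.
P_L = {"luc": 0.5, "wolfram": 0.5}           # hypothetical lecturer dist.

def p_instantiation(d, l):
    return P_D[d] * P_L[l]

probs = {(d, l): p_instantiation(d, l) for d in P_D for l in P_L}
print(probs[("cs", "luc")])              # 0.3
print(round(sum(probs.values()), 10))    # 1.0: a proper distribution over ground states
```

The price of the factorization is that correlations between arguments (e.g. a lecturer teaching only in one department) cannot be expressed; probability estimation trees, as in RMMs, can capture them.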
Learning Tasks
• Parameter estimation: a numerical optimization problem
• Model selection: a combinatorial search
Both take a database as input and produce a model: Database -> Learning Algorithm -> Model.
Differences between SL and PLL?
• Representation (cf. above)
• The structure of the search space becomes more complex: operators for traversing the space, ...
• The algorithms remain essentially the same
What is the data about? – Model Theoretic

[Figure: the alarm Bayesian network (Earthquake, Burglary, Alarm, MaryCalls, JohnCalls)]

Model(1): earthquake=yes, burglary=no, alarm=?, marycalls=yes, johncalls=no
Model(2): earthquake=no, burglary=no, alarm=no, marycalls=no, johncalls=no
Model(3): earthquake=?, burglary=?, alarm=yes, marycalls=yes, johncalls=yes
What is the data about? – Model Theoretic
• Data case: a set of RVs with (partially) specified states = a (partial) Herbrand interpretation
• Akin to "learning from interpretations" (ILP)

Blood type example:
Background: m(ann,dorothy), f(brian,dorothy), m(cecily,fred), f(henry,fred), f(fred,bob), m(kim,bob), ...
Model(1): pc(brian)=b, bt(ann)=a, bt(brian)=?, bt(dorothy)=a
Model(2): bt(cecily)=ab, bt(henry)=a, bt(fred)=?, bt(kim)=a, bt(bob)=b
Model(3): pc(rex)=b, bt(doro)=a, bt(brian)=?
Parameter Estimation – Model Theoretic

Given a database D and the underlying logic program L, the learning algorithm outputs the parameters θ of L.
Parameter Estimation – Model Theoretic
• Estimate the CPD entries θ that best fit the data
• "Best fit": ML parameters θ*
θ* = argmax_θ P(data | logic program, θ) = argmax_θ log P(data | logic program, θ)
• Reduces to a problem within Bayesian networks: given the structure, partially observed random variables
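For fully observed data the ML estimate is just counting in the induced network. A minimal sketch with invented (mc, pc, bt) data cases pooled over all ground instances of the bt clause:

```python
from collections import Counter

# Hypothetical fully observed cases (mc, pc, bt) for the clause head bt(P)
# given its parents; with complete data, ML estimation reduces to counting.
cases = [("a", "a", "a"), ("a", "a", "a"), ("a", "b", "ab"),
         ("a", "b", "ab"), ("a", "b", "a")]

def ml_cpd(cases):
    """theta*[bt | mc, pc] = N(mc, pc, bt) / N(mc, pc)."""
    joint = Counter(cases)
    marg = Counter((m, p) for m, p, _ in cases)
    return {k: n / marg[k[:2]] for k, n in joint.items()}

cpd = ml_cpd(cases)
print(cpd[("a", "b", "ab")])  # 2/3
```

With partially observed cases (the '?' entries above) the counts are no longer available directly, which is what motivates EM below.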
Excursus: Decomposable CRs
• The parameters belong to the clauses, not to the support network
• Multiple ground instances of the same clause share one CPD; a single ground instance uses it directly, and a deterministic CPD node implements the combining rule on top of the instances
Parameter Estimation – Model Theoretic
As a consequence, the parameters of all ground instances of the same clause are tied together (parameter tying).
Inference: EM – Model Theoretic

EM algorithm: given the logic program L and initial parameters θ0, iterate until convergence:
• Expectation: compute the expected counts under the current model (M, θk)
• Maximization: update the parameters from the expected counts (ML, MAP)
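A minimal EM sketch, assuming the simplest possible model (a single Bernoulli parameter, not the lecture's full clause CPDs): estimate θ = P(burglary=true) from data cases in which some interpretations leave burglary unobserved:

```python
# '?' marks an unobserved value in a data case.
data = [True, False, False, "?", True, "?", False, False]

theta = 0.9  # theta_0: arbitrary initial parameter
for _ in range(50):  # iterate until convergence
    # E-step: expected count of 'true'; each unobserved case contributes theta
    expected_true = sum(theta if x == "?" else float(x) for x in data)
    # M-step: ML update from the expected counts
    theta = expected_true / len(data)

print(round(theta, 4))  # 0.3333 = fraction of 'true' among the observed cases
```

In the model-theoretic setting the E-step is BN inference in the support network (filling in expected counts for all unobserved ground atoms), but the alternation is exactly this loop.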
Model Selection – Model Theoretic

Given a database, the learning algorithm searches for a model over
• the Bayesian part of the language: bt/1, pc/1, mc/1
• logical background knowledge: mother/2, father/2
Model Selection – Model Theoretic
• Combination of ILP and BN learning
• Combinatorial search for a hypothesis M* s.t.
• M* logically covers the data D
• M* is optimal w.r.t. some scoring function score, i.e., M* = argmax_M score(M,D)
• Highlights: refinement operators, background knowledge, language bias, search bias
Refinement Operators
Add a fact, delete a fact, or refine an existing clause:
• Specialization:
• add an atom
• apply a substitution { X / Y } where X, Y already appear in the clause
• apply a substitution { X / f(Y1, ..., Yn) } where the Yi are new variables
• apply a substitution { X / c } where c is a constant
• Generalization:
• delete an atom
• turn a term into a variable: p(a,f(b)) becomes p(X,f(b)) or p(a,f(X)); p(a,a) becomes p(X,X), p(a,X), or p(X,a)
• replace two occurrences of a variable X by X1 and X2: p(X,X) becomes p(X1,X2)
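The substitution-based operators can be sketched concretely. This is a toy representation (variables as uppercase strings, compound terms as tuples), not any particular system's term library:

```python
def apply_subst(term, subst):
    """Apply a substitution (a dict Variable -> term) to a term,
    implementing the specialization step 'apply a substitution {X / t}'."""
    if isinstance(term, tuple):                  # compound term f(t1, ..., tn)
        functor, *args = term
        return (functor, *(apply_subst(a, subst) for a in args))
    return subst.get(term, term)                 # variable or constant

clause = [("p", "X", ("f", "Y"))]                # p(X, f(Y))

# Specialize with {X / Y}: both variables already appear in the clause.
specialized = [apply_subst(a, {"X": "Y"}) for a in clause]
print(specialized)  # [('p', 'Y', ('f', 'Y'))]

# Specialize with {Y / c}: bind a variable to a constant.
ground = [apply_subst(a, {"Y": "c"}) for a in specialized]
print(ground)       # [('p', 'c', ('f', 'c'))]
```

Generalization operators run the other way (replacing subterms by fresh variables); together the two directions let the search move through the clause lattice.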
Example

Original program:
mc(X) | m(M,X), mc(M), pc(M).
pc(X) | f(F,X), mc(F), pc(F).
bt(X) | mc(X), pc(X).

Data cases:
{m(ann,john)=true, pc(ann)=a, mc(ann)=?, f(eric,john)=true, pc(eric)=b, mc(eric)=a, mc(john)=ab, pc(john)=a, bt(john)=?}
...

[Figure: induced ground network over mc(ann), pc(ann), mc(eric), pc(eric), mc(john), pc(john), m(ann,john), f(eric,john), bt(john).]
Example

Original program:
mc(X) | m(M,X), mc(M), pc(M).
pc(X) | f(F,X), mc(F), pc(F).
bt(X) | mc(X), pc(X).

Initial hypothesis:
mc(X) | m(M,X).
pc(X) | f(F,X).
bt(X) | mc(X).
Example

Refinement of the initial hypothesis (add the atom pc(X) to the bt/1 clause):
mc(X) | m(M,X).
pc(X) | f(F,X).
bt(X) | mc(X), pc(X).
Example

One candidate refinement of the current hypothesis (add the atom mc(X) to the mc/1 clause):
mc(X) | m(M,X), mc(X).
pc(X) | f(F,X).
bt(X) | mc(X), pc(X).
Example

An alternative refinement (add the atom pc(X) to the mc/1 clause):
mc(X) | m(M,X), pc(X).
pc(X) | f(F,X).
bt(X) | mc(X), pc(X).