
Overview



Presentation Transcript


  1. Overview
  • Introduction to PLL
  • Foundations of PLL: Logic Programming, Bayesian Networks, Hidden Markov Models, Stochastic Grammars
  • Frameworks of PLL: Independent Choice Logic, Stochastic Logic Programs, PRISM, Bayesian Logic Programs, Probabilistic Logic Programs, Probabilistic Relational Models
  • Logical Hidden Markov Models
  • Applications

  2. Probabilistic Logic Programs (PLPs) [Haddawy, Ngo]
  [Figure: the alarm Bayesian network; Burglary and Earthquake are parents of Alarm, Alarm is the parent of JohnCalls and MaryCalls, and CPTs such as P(A | B,E) give the probability of each state.]
  • Atoms = set of similar RVs
  • First arguments = RV, last argument = state
  • Clause = CPD entry
  • Probability distribution over Herbrand interpretations
  0.1 : burglary(true).     0.9 : burglary(false).
  0.01 : earthquake(true).  0.99 : earthquake(false).
  0.9 : alarm(true) :- burglary(true), earthquake(true).
  ...
  Can burglary(true) and burglary(false) be true in the same interpretation? Integrity constraints rule this out:
  false :- burglary(true), burglary(false).
  burglary(true); burglary(false) :- true.
  false :- earthquake(true), earthquake(false).
  ...
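The distribution over Herbrand interpretations can be illustrated with a few lines of Python. This is a sketch with made-up values: only the 0.9 entry of the alarm CPT appears on the slide, so the remaining entries below are assumptions.

```python
# Sketch of the PLP reading of the alarm program: each clause probability is a
# CPD entry, and the program defines a distribution over Herbrand interpretations.
p_burglary = {True: 0.1, False: 0.9}
p_earthquake = {True: 0.01, False: 0.99}
# Assumed full CPT P(alarm=true | burglary, earthquake); only the (True, True)
# entry (0.9) appears on the slide, the rest are illustrative.
p_alarm_true = {(True, True): 0.9, (True, False): 0.8,
                (False, True): 0.1, (False, False): 0.01}

def p_interpretation(b, e, a):
    """Probability of the interpretation {burglary=b, earthquake=e, alarm=a}."""
    pa = p_alarm_true[(b, e)]
    return p_burglary[b] * p_earthquake[e] * (pa if a else 1.0 - pa)

# The probabilities of all interpretations sum to one.
total = sum(p_interpretation(b, e, a)
            for b in (True, False) for e in (True, False) for a in (True, False))
```

The integrity constraints correspond to the fact that exactly one state per RV appears in each interpretation, which is what makes the values sum to one.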

  3. Probabilistic Logic Programs (PLPs) [Haddawy, Ngo]
  RV, state, and dependency are expressed through variable bindings.
  Qualitative part (facts):
  father(rex,fred). mother(ann,fred).
  father(brian,doro). mother(utta,doro).
  father(fred,henry). mother(doro,henry).
  Quantitative part (clauses = CPD entries):
  1.0 : mc(P,a) :- mother(M,P), pc(M,a), mc(M,a).
  0.0 : mc(P,b) :- mother(M,P), pc(M,a), mc(M,a).
  ...
  0.5 : pc(P,a) :- father(F,P), pc(F,0), mc(F,a).
  0.5 : pc(P,0) :- father(F,P), pc(F,0), mc(F,a).
  ...
  1.0 : bt(P,a) :- mc(P,a), pc(P,a).
  Integrity constraints:
  false :- pc(P,a), pc(P,b), pc(P,0).
  pc(P,a); pc(P,b); pc(P,0) :- person(P).
  ...

  4. Probabilistic Logic Programs (PLPs) [Haddawy, Ngo]
  father(rex,fred). mother(ann,fred).
  father(brian,doro). mother(utta,doro).
  father(fred,henry). mother(doro,henry).
  1.0 : mc(P,a) :- mother(M,P), pc(M,a), mc(M,a).
  0.0 : mc(P,b) :- mother(M,P), pc(M,a), mc(M,a).
  ...
  0.5 : pc(P,a) :- father(F,P), pc(F,0), mc(F,a).
  0.5 : pc(P,0) :- father(F,P), pc(F,0), mc(F,a).
  ...
  1.0 : bt(P,a) :- mc(P,a), pc(P,a).
  false :- pc(P,a), pc(P,b), pc(P,0).
  pc(P,a); pc(P,b); pc(P,0) :- person(P).
  ...
  [Figure: the Bayesian network induced over the family, with mc/pc nodes for ann, rex, utta, brian, fred, doro, and henry feeding the corresponding bt nodes.]

  5. Probabilistic Relational Models (PRMs) [Getoor, Koller, Pfeffer]
  Rooted in database theory: Entity-Relationship models, attributes = RVs.
  [Figure: the alarm Bayesian network with its CPT P(A | B,E) mapped onto a database; the alarm system is a table, and Earthquake, Burglary, Alarm, MaryCalls, and JohnCalls are its attributes.]

  6. Probabilistic Relational Models (PRMs) [Getoor, Koller, Pfeffer]
  [Figure: the Person table with attributes M-chromosome, P-chromosome, and Bloodtype, related to itself twice via the binary relations Father and Mother.]

  7. Probabilistic Relational Models (PRMs) [Getoor, Koller, Pfeffer]
  Relations: father(Father,Person). mother(Mother,Person).
  Attributes as RVs: bt(Person,BT). pc(Person,PC). mc(Person,MC).
  Dependencies (CPDs associated with):
  bt(Person,BT) :- pc(Person,PC), mc(Person,MC).
  pc(Person,PC) :- pc_father(Father,PCf), mc_father(Father,MCf).
  View:
  pc_father(Person,PCf) | father(Father,Person), pc(Father,PC).
  ...

  8. Probabilistic Relational Models (PRMs) [Getoor, Koller, Pfeffer]
  father(rex,fred). mother(ann,fred).
  father(brian,doro). mother(utta,doro).
  father(fred,henry). mother(doro,henry).
  pc_father(Person,PCf) | father(Father,Person), pc(Father,PC).
  ...
  mc(Person,MC) | pc_mother(Person,PCm), mc_mother(Person,MCm).
  pc(Person,PC) | pc_father(Person,PCf), mc_father(Person,MCf).
  bt(Person,BT) | pc(Person,PC), mc(Person,MC).
  First arguments = RV, last argument = state.
  [Figure: the induced Bayesian network over the mc/pc/bt nodes for ann, rex, utta, brian, fred, doro, and henry.]

  9. Bayesian Logic Programs (BLPs) [Kersting, De Raedt]
  [Figure: the alarm Bayesian network decomposed into local BN fragments; e.g. the fragment with parents earthquake and burglary and child alarm, annotated with the CPT P(A | B,E).]
  Rule Graph: earthquake/0, burglary/0, alarm/0, maryCalls/0, johnCalls/0.
  alarm :- earthquake, burglary.

  10. Bayesian Logic Programs (BLPs) [Kersting, De Raedt]
  Rule Graph over the predicates pc/1, mc/1, bt/1.
  A Bayesian clause is an atom over a predicate with variables as arguments, together with an associated CPD, e.g.:
  bt(Person) :- pc(Person), mc(Person).
  [Figure: the local fragment for bt(Person) with parents pc(Person) and mc(Person) and CPD rows such as P(bt | pc=a, mc=a) = (1.0,0.0,0.0,0.0); similarly, mc(Person) has parents pc(Mother) and mc(Mother) via mother, with rows such as (1.0,0.0,0.0) and (0.5,0.5,0.0).]

  11. Bayesian Logic Programs (BLPs) [Kersting, De Raedt]
  mc(Person) | mother(Mother,Person), pc(Mother), mc(Mother).
  pc(Person) | father(Father,Person), pc(Father), mc(Father).
  bt(Person) | pc(Person), mc(Person).
  [Figure: the rule graph over pc/1, mc/1, bt/1 and the local fragment for mc(Person) with its CPD rows, e.g. (1.0,0.0,0.0) for pc(Mother)=a, mc(Mother)=a.]

  12. Bayesian Logic Programs (BLPs) [Kersting, De Raedt]
  father(rex,fred). mother(ann,fred).
  father(brian,doro). mother(utta,doro).
  father(fred,henry). mother(doro,henry).
  mc(Person) | mother(Mother,Person), pc(Mother), mc(Mother).
  pc(Person) | father(Father,Person), pc(Father), mc(Father).
  bt(Person) | pc(Person), mc(Person).
  Bayesian network induced over the least Herbrand model.
  [Figure: that induced network, with mc/pc nodes for ann, rex, utta, brian, fred, doro, and henry feeding the corresponding bt nodes.]

  13. Bayesian Logic Programs (BLPs) [Kersting, De Raedt]
  • Unique probability distribution over Herbrand interpretations (given a finite branching factor, finite proofs, and no self-dependency)
  • Highlights:
  • Separation of qualitative and quantitative parts
  • Functors
  • Graphical representation
  • Discrete and continuous RVs
  • Subsumes BNs, DBNs, HMMs, SCFGs, Prolog, ...
  • Turing-complete programming language
  • Learning

  14. Declarative Semantics
  • Dependency graph = (possibly infinite) Bayesian network
  • Consequence operator: if the body of a clause C holds, then the head holds, too
  • e.g. mc(fred) is true because mother(ann,fred), mc(ann), and pc(ann) are true
  [Figure: the induced family Bayesian network, with the parents of mc(fred) highlighted.]

  15. Procedural Semantics
  P(bt(ann)) ?
  [Figure: the induced family Bayesian network, with the query node bt(ann) highlighted.]

  16. Procedural Semantics
  P(bt(ann), bt(fred)) ?
  By Bayes' rule:
  P(bt(ann) | bt(fred)) = P(bt(ann), bt(fred)) / P(bt(fred))
  [Figure: the induced family Bayesian network restricted to the nodes relevant for the query.]
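The Bayes' rule step above is a single division once the joint and the evidence probability have been computed by inference; the numbers below are illustrative placeholders, not values from the example.

```python
# Sketch of answering P(bt(ann) | bt(fred)) via Bayes' rule.
p_joint = 0.03    # assumed P(bt(ann)=a, bt(fred)=a), obtained by inference
p_evidence = 0.1  # assumed P(bt(fred)=a), obtained by inference
p_cond = p_joint / p_evidence  # conditional query answered by two joint queries
```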

  17. Queries using And/Or trees
  P(bt(fred)) ?
  An Or node is proven if at least one of its successors is provable; an And node is proven if all of its successors are provable.
  bt(fred)
    pc(fred), mc(fred)
      mc(fred): mother(ann,fred), mc(ann), pc(ann)
      pc(fred): father(rex,fred), mc(rex), pc(rex)
  ...
  [Figure: the And/Or proof tree for bt(fred) alongside the support network over the mc/pc nodes of ann, rex, and fred.]
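The Or/And condition above can be sketched as a tiny recursive procedure over ground goals. The facts and rules mirror the family example; unification and probabilities are omitted, so this is only an illustration of the tree traversal.

```python
# Toy And/Or proof for the query bt(fred): an Or node succeeds if at least one
# successor succeeds, an And node if all successors succeed.
facts = {"father(rex,fred)", "mother(ann,fred)",
         "mc(rex)", "pc(rex)", "mc(ann)", "pc(ann)"}
# goal -> list of alternative bodies (Or over bodies, And within a body)
rules = {
    "bt(fred)": [["mc(fred)", "pc(fred)"]],
    "mc(fred)": [["mother(ann,fred)", "mc(ann)", "pc(ann)"]],
    "pc(fred)": [["father(rex,fred)", "mc(rex)", "pc(rex)"]],
}

def prove(goal):
    if goal in facts:                  # leaf: a known fact
        return True
    bodies = rules.get(goal, [])
    # Or node over the bodies, And node within each body
    return any(all(prove(g) for g in body) for body in bodies)
```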

  18. Combining Partial Knowledge
  prepared(Student,Topic) | read(Student,Book), discusses(Book,Topic).
  passes(Student) | prepared(Student,bn), prepared(Student,logic).
  Rule graph: read/2, discusses/2, prepared/2, passes/1.
  [Figure: local fragments; prepared connects the entities Student, Book, and Topic via read and discusses, and passes(Student) depends on prepared(Student,bn) and prepared(Student,logic).]

  19. Combining Partial Knowledge
  prepared(Student,Topic) | read(Student,Book), discusses(Book,Topic).
  • Variable number of parents for prepared/2 due to read/2
  • Whether a student prepared a topic depends on the books she read
  • But the CPD is given only for one book-topic pair
  [Figure: prepared(s1,bn) and prepared(s2,bn), each with parents discusses(b1,bn) and discusses(b2,bn).]

  20. Combining Rules
  prepared(Student,Topic) | read(Student,Book), discusses(Book,Topic).
  A combining rule (CR) turns the per-clause CPDs P(A|B) and P(A|C) into a single CPD P(A|B,C).
  • Any algorithm which:
  • has an empty output if and only if the input is empty
  • combines a set of CPDs into a single (combined) CPD
  • E.g. noisy-or, regression, ...
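As a concrete instance, noisy-or, one of the combining rules named above, maps the per-parent CPD entries P(A=true | B_i=true) to one combined probability; a minimal sketch:

```python
# Noisy-or combining rule: the head is false only if every active parent
# independently fails to activate it.
def noisy_or(probs):
    """Combine activation probabilities P(A=true | B_i=true) into one value."""
    p_all_fail = 1.0
    for p in probs:
        p_all_fail *= (1.0 - p)
    return 1.0 - p_all_fail
```

For example, two ground instances with activation probabilities 0.8 and 0.5 combine to 1 - 0.2 * 0.5 = 0.9.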

  21. Aggregates
  Map multisets of values to summary values (e.g., sum, average, max, cardinality).
  Rule graph: registered/2, registration_grade/2, student_ranking/1, ...

  22. Aggregates
  Map multisets of values to summary values (e.g., sum, average, max, cardinality).
  • grade_avg/1 is a deterministic, functional dependency (average) on registration_grade via registered/2
  • student_ranking/1 then depends probabilistically (CPD) on grade_avg/1
  [Figure: Student and Course entities; registration_grade feeds grade_avg (deterministic average), which feeds student_ranking (CPD).]
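A deterministic aggregate such as grade_avg is simply a function from a multiset of values to one summary value; the data below is illustrative, not from the slides.

```python
# Sketch of the grade_avg/1 aggregate: map the multiset of a student's
# registration grades to their average (illustrative data).
grades = {"s1": [1.0, 2.0, 3.0], "s2": [2.0]}

def grade_avg(student):
    gs = grades[student]          # the multiset of grades for this student
    return sum(gs) / len(gs)      # deterministic summary value
```

The downstream student_ranking node then needs only a CPD over this single value instead of a variable number of grade parents.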

  23. Summary: Model-Theoretic
  mc(Person) | mother(Mother,Person), pc(Mother), mc(Mother).
  pc(Person) | father(Father,Person), pc(Father), mc(Father).
  bt(Person) | pc(Person), mc(Person).
  Underlying logic program (consequence operator: if the body holds, then the head holds, too)
  + conditional independencies encoded in the induced BN structure
  + local probability models ((macro) CPDs, combined by CRs such as noisy-or, ...)
  = joint probability distribution over the least Herbrand interpretation

  24. Learning Tasks
  • Parameter Estimation: a numerical optimization problem
  • Model Selection: a combinatorial search
  A learning algorithm maps a database to a model.

  25. Differences between SL and PLL?
  • Representation (cf. above)
  • Structure on the search space becomes more complex
  • operators for traversing the space
  • ...
  • Algorithms remain essentially the same

  26. What is the data about? – Model Theoretic
  [Figure: the alarm Bayesian network.]
  Model(1): earthquake=yes, burglary=no, alarm=?, marycalls=yes, johncalls=no
  Model(2): earthquake=no, burglary=no, alarm=no, marycalls=no, johncalls=no
  Model(3): earthquake=?, burglary=?, alarm=yes, marycalls=yes, johncalls=yes

  27. What is the data about? – Model Theoretic
  • Data case: random variables + states = (partial) Herbrand interpretation
  • Akin to "learning from interpretations" in ILP
  Bloodtype example
  Background: m(ann,dorothy), f(brian,dorothy), m(cecily,fred), f(henry,fred), f(fred,bob), m(kim,bob), ...
  Model(1): pc(brian)=b, bt(ann)=a, bt(brian)=?, bt(dorothy)=a
  Model(2): bt(cecily)=ab, pc(henry)=a, mc(fred)=?, bt(kim)=a, pc(bob)=b
  Model(3): pc(rex)=b, bt(doro)=a, bt(brian)=?

  28. Parameter Estimation – Model Theoretic
  Database D + underlying logic program L → learning algorithm → parameters θ

  29. Parameter Estimation – Model Theoretic
  • Estimate the CPD entries θ that best fit the data
  • "Best fit": ML parameters θ*
  θ* = argmax_θ P(data | logic program, θ)
     = argmax_θ log P(data | logic program, θ)
  • Reduces to the problem of estimating the parameters of a Bayesian network: given structure, partially observed random variables
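With fully observed interpretations, the ML estimate of a CPD entry reduces to a ratio of counts over ground instances of the clause. A minimal sketch with illustrative data:

```python
# Sketch of ML parameter estimation for a single clause parameter
# theta = P(head | body) from fully observed data.
# Each tuple records (body_holds, head_holds) for one ground instance.
cases = [(True, True), (True, True), (True, False), (False, True)]

n_body = sum(1 for body, _ in cases if body)
n_head_and_body = sum(1 for body, head in cases if body and head)
theta_ml = n_head_and_body / n_body   # ML estimate of P(head | body)
```

Here the body holds in three instances and the head holds in two of those, so the estimate is 2/3; with partially observed interpretations the hard counts are replaced by expected counts (EM).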

  30. Parameter Estimation – Model Theoretic
  [Figure: the data cases are combined with the logic program to form the support network.]

  31. Excursus: Decomposable CRs
  • Parameters belong to the clauses, not to the support network
  • Multiple ground instances of the same clause share their parameters
  • Deterministic CPD for the combining rule
  [Figure: ground instances of a clause feeding a deterministic combining node in the support network.]

  32. Parameter Estimation – Model Theoretic
  [Figure: the support network built from the data cases and the program.]

  33. Parameter Estimation – Model Theoretic
  Parameter tying: all ground instances of a clause share one set of parameters.

  34. EM – Model Theoretic
  EM algorithm: iterate until convergence, starting from the logic program L and initial parameters θ0.
  • Expectation: for the current model (M, θk), compute by inference the expected counts of each clause, summing P(head(GI), body(GI) | DC) and P(body(GI) | DC) over all data cases DC and ground instances GI.
  • Maximization: update the parameters (ML, MAP) from the expected counts.
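A minimal sketch of one EM iteration for a single clause parameter, under the simplifying assumption that the body is always observed and only the head may be missing (the general case replaces both counts by inference over the support network, as in the formula above):

```python
# One EM update for theta = P(head | body): expected counts replace hard counts
# wherever the head state is unobserved.
def em_step(theta, cases):
    """cases: list of (body_holds, head_obs), head_obs in {True, False, None}."""
    exp_head_and_body, exp_body = 0.0, 0.0
    for body, head in cases:
        if not body:
            continue
        exp_body += 1.0
        if head is None:                  # E-step: expected count under theta
            exp_head_and_body += theta
        elif head:
            exp_head_and_body += 1.0
    return exp_head_and_body / exp_body   # M-step: ML update

# One observed-true head, one unobserved head, one observed-false head.
cases = [(True, True), (True, None), (True, False)]
```

Iterating em_step from any starting point converges here to 0.5, the fixed point of theta = (1 + theta) / 3.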

  35. Model Selection – Model Theoretic
  Database + language (Bayesian predicates bt/1, pc/1, mc/1) + background knowledge (logical predicates mother/2, father/2) → learning algorithm → model

  36. Model Selection – Model Theoretic
  • Combination of ILP and BN learning
  • Combinatorial search for a hypothesis M* s.t.
  • M* logically covers the data D
  • M* is optimal w.r.t. some scoring function score, i.e., M* = argmax_M score(M,D)
  • Highlights:
  • Refinement operators
  • Background knowledge
  • Language bias
  • Search bias

  37. Refinement Operators
  Add a fact, delete a fact, or refine an existing clause:
  • Specialization:
  • add an atom
  • apply a substitution {X/Y} where X, Y already appear in the clause
  • apply a substitution {X/f(Y1, ..., Yn)} where the Yi are new variables
  • apply a substitution {X/c} where c is a constant
  • Generalization:
  • delete an atom
  • turn a term into a variable:
  p(a,f(b)) becomes p(X,f(b)) or p(a,f(X));
  p(a,a) becomes p(X,X), p(a,X), or p(X,a)
  • replace two occurrences of a variable X by X1 and X2:
  p(X,X) becomes p(X1,X2)
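The substitution-based specialization operators can be sketched with a small term representation; the encoding (tuples for compound terms, strings for variables and constants) is illustrative, not from the slides.

```python
# Sketch of the specialization operator "apply a substitution".
# A term is either a string (variable or constant) or (functor, args).
def subst(term, theta):
    if isinstance(term, str):                    # variable or constant
        return theta.get(term, term)             # replace if bound in theta
    functor, args = term
    return (functor, [subst(a, theta) for a in args])

# p(X, f(Y)) specialized with {X/a, Y/b} becomes p(a, f(b)).
atom = ("p", ["X", ("f", ["Y"])])
specialized = subst(atom, {"X": "a", "Y": "b"})
```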

  38. Example
  Original program:
  mc(X) | m(M,X), mc(M), pc(M).
  pc(X) | f(F,X), mc(F), pc(F).
  bt(X) | mc(X), pc(X).
  Data cases:
  {m(ann,john)=true, pc(ann)=a, mc(ann)=?, f(eric,john)=true, pc(eric)=b, mc(eric)=a, mc(john)=ab, pc(john)=a, bt(john)=?}
  ...
  [Figure: the induced network; m(ann,john) and f(eric,john) with the mc/pc nodes of ann and eric feed mc(john) and pc(john), which feed bt(john).]

  39. Example
  Original program:
  mc(X) | m(M,X), mc(M), pc(M).
  pc(X) | f(F,X), mc(F), pc(F).
  bt(X) | mc(X), pc(X).
  Initial hypothesis:
  mc(X) | m(M,X).
  pc(X) | f(F,X).
  bt(X) | mc(X).
  [Figure: the network induced by the initial hypothesis over the data case.]

  40. Example
  Original program:
  mc(X) | m(M,X), mc(M), pc(M).
  pc(X) | f(F,X), mc(F), pc(F).
  bt(X) | mc(X), pc(X).
  Initial hypothesis:
  mc(X) | m(M,X).
  pc(X) | f(F,X).
  bt(X) | mc(X).
  [Figure: the induced network again, with the clause under refinement highlighted.]

  41. Example
  Original program:
  mc(X) | m(M,X), mc(M), pc(M).
  pc(X) | f(F,X), mc(F), pc(F).
  bt(X) | mc(X), pc(X).
  Initial hypothesis:
  mc(X) | m(M,X).
  pc(X) | f(F,X).
  bt(X) | mc(X).
  Refinement:
  mc(X) | m(M,X).
  pc(X) | f(F,X).
  bt(X) | mc(X), pc(X).
  [Figure: the network induced after the refinement.]

  42. Example
  Original program:
  mc(X) | m(M,X), mc(M), pc(M).
  pc(X) | f(F,X), mc(F), pc(F).
  bt(X) | mc(X), pc(X).
  Initial hypothesis:
  mc(X) | m(M,X).
  pc(X) | f(F,X).
  bt(X) | mc(X).
  Refinements:
  mc(X) | m(M,X), mc(X).
  pc(X) | f(F,X).
  bt(X) | mc(X), pc(X).
  and
  mc(X) | m(M,X).
  pc(X) | f(F,X).
  bt(X) | mc(X), pc(X).
  [Figure: the networks induced by the candidate refinements.]

  43. Example
  Original program:
  mc(X) | m(M,X), mc(M), pc(M).
  pc(X) | f(F,X), mc(F), pc(F).
  bt(X) | mc(X), pc(X).
  Initial hypothesis:
  mc(X) | m(M,X).
  pc(X) | f(F,X).
  bt(X) | mc(X).
  Refinements:
  mc(X) | m(M,X), pc(X).
  pc(X) | f(F,X).
  bt(X) | mc(X), pc(X).
  and
  mc(X) | m(M,X).
  pc(X) | f(F,X).
  bt(X) | mc(X), pc(X).
  [Figure: the networks induced by the candidate refinements.]

  44. Example
  Original program:
  mc(X) | m(M,X), mc(M), pc(M).
  pc(X) | f(F,X), mc(F), pc(F).
  bt(X) | mc(X), pc(X).
  Initial hypothesis:
  mc(X) | m(M,X).
  pc(X) | f(F,X).
  bt(X) | mc(X).
  Refinements:
  mc(X) | m(M,X), pc(X).
  pc(X) | f(F,X).
  bt(X) | mc(X), pc(X).
  and
  mc(X) | m(M,X).
  pc(X) | f(F,X).
  bt(X) | mc(X), pc(X).
  ...
  [Figure: the search continues over further candidate refinements, scored against the data.]

  45. Bias
  • Many clauses can be eliminated a priori due to the type structure of clauses
  • e.g. types atom(compound,atom,charge), bond(compound,atom,atom,bondtype), active(compound)
  • eliminate e.g. active(C) :- atom(X,C,5)
  • it does not conform to the type structure

  46. Bias - continued
  • Modes of predicates determine the calling pattern in queries
  • + : input; - : output
  • mode(atom(+,-,-))
  • mode(bond(+,+,-,-))
  • all variables in the head are + (input)
  • active(C) :- bond(C,A1,A2,T) is not mode-conform, because A1 does not occur earlier in the clause although its argument position is declared +
  • active(C) :- atom(C,A,P), bond(C,A,A2,double) is mode-conform
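The mode check described above can be sketched as a single scan over the body literals; the tuple encoding of literals used here is an illustrative assumption.

```python
# Sketch of mode conformance: an argument declared '+' must already be bound
# by the head or by an earlier body literal.
modes = {"atom": "+--", "bond": "++--"}

def mode_conform(head_vars, body):
    bound = set(head_vars)               # head variables count as input ('+')
    for pred, args in body:
        for mode, arg in zip(modes[pred], args):
            if mode == "+" and arg not in bound:
                return False             # '+' position used with an unbound term
        bound.update(args)               # afterwards all arguments are available
    return True

# active(C) :- bond(C,A1,A2,T): A1 fills a '+' position but is unbound.
bad = mode_conform({"C"}, [("bond", ["C", "A1", "A2", "T"])])
# active(C) :- atom(C,A,P), bond(C,A,A2,double): every '+' position is bound.
good = mode_conform({"C"}, [("atom", ["C", "A", "P"]),
                            ("bond", ["C", "A", "A2", "double"])])
```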

  47. Conclusions on Learning
  • Algorithms remain essentially the same
  • Not single edges but bunches of edges are modified
  • Structure on the search space becomes more complex
  From Statistical Learning: scores, independency, priors.
  From Inductive Logic Programming / Multi-relational Data Mining: refinement operators, bias, background knowledge.

  48. Overview
  • Introduction to PLL
  • Foundations of PLL: Logic Programming, Bayesian Networks, Hidden Markov Models, Stochastic Grammars
  • Frameworks of PLL: Independent Choice Logic, Stochastic Logic Programs, PRISM, Bayesian Logic Programs, Probabilistic Logic Programs, Probabilistic Relational Models
  • Logical Hidden Markov Models
  • Applications

  49. Logical (Hidden) Markov Models
  Example: a CS department with lecturers Wolfram and Luc and CS courses on data mining, statistics, and robotics.
  • In a propositional (H)MM, each state is trained independently
  • No sharing of experience, large state space

  50. Logical (Hidden) Markov Models
  Ground states: dept(cs), course(cs,dm), course(cs,stats), course(cs,rob), lecturer(cs,luc), lecturer(cs,wolfram).
  Abstract states with transition rules:
  (0.7) dept(D) -> course(D,C).
  (0.2) dept(D) -> lecturer(D,L).
  ...
  (0.3) course(D,C) -> lecturer(D,L).
  (0.3) course(D,C) -> dept(D).
  (0.3) course(D,C) -> course(D',C').
  ...
  (0.1) lecturer(D,L) -> course(D,C).
  ...
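Sampling a walk over the abstract states can be sketched as follows. Where the slide elides probabilities with "...", the values below are filled in as assumptions so that each distribution normalizes; grounding of the shared variables is omitted.

```python
# Sketch of a walk over the abstract states of the LOHMM. Probabilities marked
# on the slide are kept; the remainder (0.1 for dept->dept, 0.4 for
# course->course) are assumed fillers so each row sums to one.
import random

transitions = {
    "dept":     [(0.7, "course"), (0.2, "lecturer"), (0.1, "dept")],
    "course":   [(0.3, "lecturer"), (0.3, "dept"), (0.4, "course")],
    "lecturer": [(1.0, "course")],
}

def step(state, rng):
    r, acc = rng.random(), 0.0
    for p, nxt in transitions[state]:
        acc += p
        if r < acc:
            return nxt
    return transitions[state][-1][1]     # guard against rounding

rng = random.Random(0)                   # seeded for reproducibility
walk = ["dept"]
for _ in range(5):
    walk.append(step(walk[-1], rng))
```

Because the rules are over abstract states, all ground states such as course(cs,dm) and course(cs,stats) share the same transition parameters, which is exactly the sharing of experience that the propositional model on the previous slide lacks.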
