Overview
• Introduction to PLL
• Foundations of PLL
  • Logic Programming, Bayesian Networks, Hidden Markov Models, Stochastic Grammars
• Frameworks of PLL
  • Independent Choice Logic, Stochastic Logic Programs, PRISM
  • Bayesian Logic Programs, Probabilistic Logic Programs, Probabilistic Relational Models
  • Logical Hidden Markov Models
• Applications
[Haddawy, Ngo] Probabilistic Logic Programs (PLPs)
(Figure: alarm Bayesian network with nodes Earthquake, Burglary, Alarm, MaryCalls, JohnCalls and the CPD P(A | B,E). Annotations: probability of being true, state, RV.)
• Atoms = set of similar RVs
• First arguments = RV
• Last argument = state
• Clause = CPD entry
• Probability distribution over Herbrand interpretations
    0.1 : burglary(true).
    0.9 : burglary(false).
    0.01 : earthquake(true).
    0.99 : earthquake(false).
    0.9 : alarm(true) :- burglary(true), earthquake(true).
    ...
Can burglary(true) and burglary(false) be true in the same interpretation?
+ Integrity constraints
    false :- burglary(true), burglary(false).
    burglary(true); burglary(false) :- true.
    false :- earthquake(true), earthquake(false).
    ...
[Haddawy, Ngo] Probabilistic Logic Programs (PLPs)
    father(rex,fred). mother(ann,fred).
    father(brian,doro). mother(utta,doro).
    father(fred,henry). mother(doro,henry).
    1.0 : mc(P,a) :- mother(M,P), pc(M,a), mc(M,a).
    0.0 : mc(P,b) :- mother(M,P), pc(M,a), mc(M,a).
    ...
    0.5 : pc(P,a) :- father(F,P), pc(F,0), mc(F,a).
    0.5 : pc(P,0) :- father(F,P), pc(F,0), mc(F,a).
    ...
    1.0 : bt(P,a) :- mc(P,a), pc(P,a).
    false :- pc(P,a), pc(P,b), pc(P,0).
    pc(P,a); pc(P,b); pc(P,0) :- person(P).
    ...
(Annotations on the slide: RV, State, Dependency, Qualitative Part, Quantitative Part, Variable Binding.)
[Haddawy, Ngo] Probabilistic Logic Programs (PLPs)
    father(rex,fred). mother(ann,fred).
    father(brian,doro). mother(utta,doro).
    father(fred,henry). mother(doro,henry).
    1.0 : mc(P,a) :- mother(M,P), pc(M,a), mc(M,a).
    0.0 : mc(P,b) :- mother(M,P), pc(M,a), mc(M,a).
    ...
    0.5 : pc(P,a) :- father(F,P), pc(F,0), mc(F,a).
    0.5 : pc(P,0) :- father(F,P), pc(F,0), mc(F,a).
    ...
    1.0 : bt(P,a) :- mc(P,a), pc(P,a).
    false :- pc(P,a), pc(P,b), pc(P,0).
    pc(P,a); pc(P,b); pc(P,0) :- person(P).
    ...
(Figure: Bayesian network over mc, pc, and bt of ann, rex, utta, brian, fred, doro, and henry.)
[Getoor, Koller, Pfeffer] Probabilistic Relational Models (PRMs)
• Database theory
• Entity-Relationship Models
• Attributes = RV
(Figure: alarm Bayesian network with nodes Earthquake, Burglary, Alarm, MaryCalls, JohnCalls and the CPD P(A | B,E), next to a database "alarm system" with one table whose attributes are Earthquake, Burglary, Alarm, MaryCalls, JohnCalls. Labels: Database, Table, Attribute.)
[Getoor, Koller, Pfeffer] Probabilistic Relational Models (PRMs)
(Figure: table Person with attributes M-chromosome, P-chromosome, Bloodtype, connected through the binary relations (Father) and (Mother) to two further Person tables with the same attributes. Labels: Binary Relation, Table.)
[Getoor, Koller, Pfeffer] Probabilistic Relational Models (PRMs)
(Figure: the Person tables with attributes M-chromosome, P-chromosome, Bloodtype and the relations (Father) and (Mother), annotated with the declarations below.)
    father(Father,Person).
    mother(Mother,Person).
    bt(Person,BT).
    pc(Person,PC).
    mc(Person,MC).
Dependencies (CPDs associated with):
    bt(Person,BT) :- pc(Person,PC), mc(Person,MC).
    pc(Person,PC) :- pc_father(Father,PCf), mc_father(Father,MCf).
View:
    pc_father(Person,PCf) | father(Father,Person), pc(Father,PC).
    ...
[Getoor, Koller, Pfeffer] Probabilistic Relational Models (PRMs)
    father(rex,fred). mother(ann,fred).
    father(brian,doro). mother(utta,doro).
    father(fred,henry). mother(doro,henry).
    pc_father(Person,PCf) | father(Father,Person), pc(Father,PC).
    ...
    mc(Person,MC) | pc_mother(Person,PCm), mc_mother(Person,MCm).
    pc(Person,PC) | pc_father(Person,PCf), mc_father(Person,MCf).
    bt(Person,BT) | pc(Person,PC), mc(Person,MC).
(Figure: Bayesian network over mc, pc, and bt of ann, rex, utta, brian, fred, doro, and henry. Annotations: State, RV.)
[Kersting, De Raedt] Bayesian Logic Programs (BLPs)
(Figure: alarm Bayesian network with nodes Earthquake, Burglary, Alarm, MaryCalls, JohnCalls and the CPD P(A | B,E); beside it the local BN fragment earthquake, burglary -> alarm with the same CPD.)
Rule graph: earthquake/0, burglary/0, alarm/0, maryCalls/0, johnCalls/0
    alarm :- earthquake, burglary.
[Kersting, De Raedt] Bayesian Logic Programs (BLPs)
Rule graph: pc/1, mc/1, bt/1
    bt(Person) :- pc(Person), mc(Person).
(Annotations: predicate, atom, argument, variable.)
CPD of mc(Person) (rule graph fragment: mother, pc, mc -> mc):
    pc(Mother)   mc(Mother)   mc(Person)
    a            a            (1.0,0.0,0.0)
    a            b            (0.5,0.5,0.0)
    ...          ...          ...
CPD of bt(Person) (rule graph fragment: pc, mc -> bt):
    pc(Person)   mc(Person)   bt(Person)
    a            a            (1.0,0.0,0.0,0.0)
    a            b            (0.0,0.0,1.0,0.0)
    ...          ...          ...
[Kersting, De Raedt] Bayesian Logic Programs (BLPs)
Rule graph: pc/1, mc/1, bt/1
    mc(Person) | mother(Mother,Person), pc(Mother), mc(Mother).
    pc(Person) | father(Father,Person), pc(Father), mc(Father).
    bt(Person) | pc(Person), mc(Person).
CPD of mc(Person):
    pc(Mother)   mc(Mother)   mc(Person)
    a            a            (1.0,0.0,0.0)
    a            b            (0.5,0.5,0.0)
    ...          ...          ...
[Kersting, De Raedt] Bayesian Logic Programs (BLPs)
    father(rex,fred). mother(ann,fred).
    father(brian,doro). mother(utta,doro).
    father(fred,henry). mother(doro,henry).
    mc(Person) | mother(Mother,Person), pc(Mother), mc(Mother).
    pc(Person) | father(Father,Person), pc(Father), mc(Father).
    bt(Person) | pc(Person), mc(Person).
Bayesian network induced over the least Herbrand model
(Figure: Bayesian network over mc, pc, and bt of ann, rex, utta, brian, fred, doro, and henry.)
[Kersting, De Raedt] Bayesian Logic Programs (BLPs)
• Unique probability distribution over Herbrand interpretations
  • Finite branching factor, finite proofs, no self-dependency
• Highlights
  • Separation of qualitative and quantitative parts
  • Functors
  • Graphical representation
  • Discrete and continuous RVs
  • BNs, DBNs, HMMs, SCFGs, Prolog ...
  • Turing-complete programming language
  • Learning
Declarative Semantics
• Dependency graph = (possibly infinite) Bayesian network
• Consequence operator: if the body of a clause C holds, then the head holds, too. For example, mc(fred) is true because mother(ann,fred), mc(ann), and pc(ann) are true.
(Figure: Bayesian network over mc, pc, and bt of ann, rex, utta, brian, fred, doro, and henry.)
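The consequence operator can be made concrete with a small sketch. The encoding below is an assumption made for illustration, not the authors' implementation: atoms are tuples, uppercase strings are variables, and the founders' mc and pc atoms are assumed to be given as facts. Iterating the operator from the empty interpretation until nothing changes yields the least Herbrand model.

```python
from itertools import product

PERSONS = ["ann", "rex", "utta", "brian", "fred", "doro", "henry"]

# Logical facts from the slides, plus founder atoms (assumed to be given as facts).
FACTS = {
    ("father", "rex", "fred"), ("mother", "ann", "fred"),
    ("father", "brian", "doro"), ("mother", "utta", "doro"),
    ("father", "fred", "henry"), ("mother", "doro", "henry"),
}
for founder in ["ann", "rex", "utta", "brian"]:
    FACTS |= {("mc", founder), ("pc", founder)}

# Clauses as (head, body); uppercase strings are variables.
CLAUSES = [
    (("mc", "P"), [("mother", "M", "P"), ("pc", "M"), ("mc", "M")]),
    (("pc", "P"), [("father", "F", "P"), ("pc", "F"), ("mc", "F")]),
    (("bt", "P"), [("pc", "P"), ("mc", "P")]),
]

def substitute(atom, binding):
    """Replace variables in an atom by their bound constants."""
    return (atom[0],) + tuple(binding.get(t, t) for t in atom[1:])

def variables(head, body):
    """All variables of a clause, in a fixed order."""
    return sorted({t for a in [head] + body for t in a[1:] if t[0].isupper()})

def consequence(interpretation):
    """One application of the consequence operator."""
    derived = set(FACTS)
    for head, body in CLAUSES:
        names = variables(head, body)
        for values in product(PERSONS, repeat=len(names)):
            binding = dict(zip(names, values))
            # If the body holds in the interpretation, the head holds, too.
            if all(substitute(a, binding) in interpretation for a in body):
                derived.add(substitute(head, binding))
    return derived

def least_herbrand_model():
    """Iterate the operator from the empty set up to its fixpoint."""
    current = set()
    while True:
        following = consequence(current)
        if following == current:
            return current
        current = following

model = least_herbrand_model()
random_variables = {a for a in model if a[0] in {"mc", "pc", "bt"}}
```

Under these assumptions, `random_variables` holds the nodes of the induced Bayesian network; the edges would follow from the ground clause bodies.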
Procedural Semantics
P(bt(ann)) ?
(Figure: Bayesian network over mc, pc, and bt of ann, rex, utta, brian, fred, doro, and henry.)
Procedural Semantics
P(bt(ann) | bt(fred)) ?
Bayes' rule: P(bt(ann) | bt(fred)) = P(bt(ann), bt(fred)) / P(bt(fred))
(Figure: Bayesian network over mc, pc, and bt of ann, rex, utta, brian, fred, doro, and henry.)
Queries using And/Or trees
P(bt(fred)) ?
• An Or node is proven if at least one of its successors is provable.
• An And node is proven if all of its successors are provable.
(Figure: And/Or tree for bt(fred). The Or node bt(fred) has the And node pc(fred), mc(fred); pc(fred) expands to father(rex,fred), mc(rex), pc(rex); mc(fred) expands to mother(ann,fred), mc(ann), pc(ann). The resulting network contains father(rex,fred), mother(ann,fred), mc(rex), pc(rex), mc(ann), pc(ann), bt(ann), pc(fred), mc(fred), bt(fred), ...)
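The And/Or proof search for P(bt(fred)) can be sketched as follows. The dictionary of ground clauses is a hypothetical hand-written encoding of the tree on the slide, and founder atoms and logical facts are assumed to have empty bodies. The function collects every atom that takes part in a successful proof, which is the part of the network needed to answer the query.

```python
# Ground clauses: head -> list of alternative bodies.
# Alternative bodies of one head form an Or node; each body is an And node.
GROUND = {
    "bt(fred)": [["pc(fred)", "mc(fred)"]],
    "pc(fred)": [["father(rex,fred)", "pc(rex)", "mc(rex)"]],
    "mc(fred)": [["mother(ann,fred)", "pc(ann)", "mc(ann)"]],
    "pc(rex)": [[]], "mc(rex)": [[]],
    "pc(ann)": [[]], "mc(ann)": [[]],
    "father(rex,fred)": [[]], "mother(ann,fred)": [[]],
}

def prove(goal, support):
    """Return True if goal is provable; add all atoms used to support."""
    proven = False
    for body in GROUND.get(goal, []):           # Or node: any body suffices
        local = set()
        if all(prove(sub, local) for sub in body):  # And node: all subgoals
            proven = True
            support |= local
    if proven:
        support.add(goal)
    return proven

support = set()
provable = prove("bt(fred)", support)
# Keep only the random variables, dropping the logical atoms.
relevant = {a for a in support if a.startswith(("bt", "pc", "mc"))}
```

Atoms without a ground clause, such as bt(henry) in this reduced encoding, are simply not provable.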
Combining Partial Knowledge
    prepared(Student,Topic) | read(Student,Book), discusses(Book,Topic).
    passes(Student) | prepared(Student,bn), prepared(Student,logic).
Rule graph: discusses/2, read/2, prepared/2, passes/1, ...
(Figure: rule graph fragments discusses, read -> prepared with variables Topic, Book, Student, and prepared(bn), prepared(logic) -> passes with variable Student.)
Combining Partial Knowledge
    prepared(Student,Topic) | read(Student,Book), discusses(Book,Topic).
• Variable number of parents for prepared/2 due to read/2
• Whether a student prepared a topic depends on the books she read
• CPD only for one book-topic pair
(Figure: ground atoms discusses(b1,bn), discusses(b2,bn), prepared(s1,bn), prepared(s2,bn).)
Combining Rules
    prepared(Student,Topic) | read(Student,Book), discusses(Book,Topic).
• A combining rule (CR) maps P(A|B) and P(A|C) to P(A|B,C)
• Any algorithm which
  • has an empty output if and only if the input is empty
  • combines a set of CPDs into a single (combined) CPD
• E.g. noisy-or, regression, ...
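A minimal sketch of noisy-or as a combining rule, assuming Boolean random variables and assuming each input is the probability that one active parent alone makes the head true (for example, one book that a student read). The function name and the numbers in the usage are illustrative only.

```python
def noisy_or(cause_probabilities):
    """Combine P(A=true | cause_i) for all active causes into one probability.

    Returns None for empty input, so the output is empty
    if and only if the input is empty.
    """
    if not cause_probabilities:
        return None
    all_fail = 1.0
    for p in cause_probabilities:
        all_fail *= 1.0 - p          # cause i fails to make A true
    return 1.0 - all_fail            # at least one cause succeeds

# Hypothetical values: two books, each sufficient with probability 0.8 and 0.5.
combined = noisy_or([0.8, 0.5])
```

With a single input the rule returns that input unchanged, which matches the case of a student who read only one relevant book.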
Aggregates
Map multisets of values to summary values (e.g., sum, average, max, cardinality)
(Figure: rule graph with registration_grade/2, registered/2, student_ranking/1, ...)
Aggregates
Map multisets of values to summary values (e.g., sum, average, max, cardinality)
• Functional dependency (average), deterministic: registration_grade -> grade_avg, for the registrations of a Student in a Course
• Probabilistic dependency (CPD): grade_avg -> student_ranking of the Student
(Figure: rule graph with registration_grade/2, registered/2, grade_avg/1, student_ranking/1, ...)
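The deterministic aggregate step can be sketched as below. The registration data and grade values are hypothetical, and the function name mirrors grade_avg/1 only for readability.

```python
from statistics import mean

# Hypothetical ground facts: registered(Student, Course) with a grade each.
REGISTRATION_GRADE = {
    ("s1", "c1"): 4.0,
    ("s1", "c2"): 3.0,
    ("s2", "c1"): 2.0,
}

def grade_avg(student):
    """Average over the multiset of a student's registration grades.

    Returns None if the student has no registrations.
    """
    grades = [g for (s, _course), g in REGISTRATION_GRADE.items() if s == student]
    return mean(grades) if grades else None
```

The aggregate collapses a variable number of parents into one summary value, so the CPD of student_ranking needs only grade_avg as its parent.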
Summary: Model-Theoretic
    mc(Person) | mother(Mother,Person), pc(Mother), mc(Mother).
    pc(Person) | father(Father,Person), pc(Father), mc(Father).
    bt(Person) | pc(Person), mc(Person).
CPD of mc(Person):
    pc(Mother)   mc(Mother)   mc(Person)
    a            a            (1.0,0.0,0.0)
    a            b            (0.5,0.5,0.0)
    ...          ...          ...
Underlying logic program (consequence operator: if the body holds then the head holds, too) = conditional independencies encoded in the induced BN structure
+ (macro) CPDs + CRs (noisy-or, ...) = local probability models
= joint probability distribution over the least Herbrand interpretation
Learning Tasks
• Parameter Estimation
  • Numerical optimization problem
• Model Selection
  • Combinatorial search
(Figure: Database -> Learning Algorithm -> Model)
Differences between SL and PLL ?
• Representation (cf. above)
• Structure on the search space becomes more complex
  • operators for traversing the space
  • …
• Algorithms remain essentially the same
What is the data about? – Model Theoretic
(Figure: alarm Bayesian network with nodes Earthquake, Burglary, Alarm, MaryCalls, JohnCalls.)
Model(1): earthquake=yes, burglary=no, alarm=?, marycalls=yes, johncalls=no
Model(2): earthquake=no, burglary=no, alarm=no, marycalls=no, johncalls=no
Model(3): earthquake=?, burglary=?, alarm=yes, marycalls=yes, johncalls=yes
What is the data about? – Model Theoretic
• Data case:
  • Random variable + states = (partial) Herbrand interpretation
  • Akin to "learning from interpretations" in ILP
Bloodtype example
Background: m(ann,dorothy), f(brian,dorothy), m(cecily,fred), f(henry,fred), f(fred,bob), m(kim,bob), ...
Model(1): pc(brian)=b, bt(ann)=a, bt(brian)=?, bt(dorothy)=a
Model(2): bt(cecily)=ab, pc(henry)=a, mc(fred)=?, bt(kim)=a, pc(bob)=b
Model(3): pc(rex)=b, bt(doro)=a, bt(brian)=?
Parameter Estimation – Model Theoretic
(Figure: Database D + underlying logic program L -> Learning Algorithm -> parameters Θ)
Parameter Estimation – Model Theoretic
• Estimate the CPD entries θ that best fit the data
• "Best fit": ML parameters θ*
    θ* = argmax_θ P(data | logic program, θ)
       = argmax_θ log P(data | logic program, θ)
• Reduces to the problem of estimating the parameters of a Bayesian network: given structure, partially observed random variables
Excursus: Decomposable CRs
• Parameters of the clauses and not of the support network
• Multiple ground instances of the same clause
• Deterministic CPD for the combining rule
Parameter Estimation – Model Theoretic
+ Parameter tying
EM – Model Theoretic
EM algorithm: iterate until convergence
• Input: logic program L, initial parameters θ0, data cases DC
• Expectation: run inference in the current model (M, θk) to obtain the expected counts of a clause, summed over all data cases DC and all ground instances GI of the clause:
    Σ_DC Σ_GI P(head(GI), body(GI) | DC)   and   Σ_DC Σ_GI P(body(GI) | DC)
• Maximization: update parameters (ML, MAP); the ML update is the ratio of the two expected counts
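The EM loop can be illustrated on the smallest possible case. This sketch is not the BLP learner: it assumes a single clause y | x over Boolean variables, a data set in which x is sometimes unobserved (None), and hypothetical data and starting values. It also runs a fixed number of iterations rather than testing for convergence.

```python
import math

def p_y(y, theta):
    """P(y | x) for a Boolean y, given theta = P(y=1 | x)."""
    return theta if y == 1 else 1.0 - theta

def log_likelihood(data, pi, th1, th0):
    """Log-likelihood of the observed parts of all data cases."""
    total = 0.0
    for x, y in data:
        joint1 = pi * p_y(y, th1)            # P(x=1, y)
        joint0 = (1.0 - pi) * p_y(y, th0)    # P(x=0, y)
        if x is None:
            total += math.log(joint1 + joint0)
        else:
            total += math.log(joint1 if x == 1 else joint0)
    return total

def em_step(data, pi, th1, th0):
    """One E-step (expected counts) followed by one M-step (ML update)."""
    n1 = n1y = n0 = n0y = 0.0
    for x, y in data:
        if x is None:
            joint1 = pi * p_y(y, th1)
            joint0 = (1.0 - pi) * p_y(y, th0)
            w = joint1 / (joint1 + joint0)   # posterior P(x=1 | y)
        else:
            w = float(x)
        n1 += w
        n1y += w * y
        n0 += 1.0 - w
        n0y += (1.0 - w) * y
    return n1 / len(data), n1y / n1, n0y / n0

def em(data, pi=0.5, th1=0.6, th0=0.4, iterations=20):
    history = [log_likelihood(data, pi, th1, th0)]
    for _ in range(iterations):
        pi, th1, th0 = em_step(data, pi, th1, th0)
        history.append(log_likelihood(data, pi, th1, th0))
    return (pi, th1, th0), history

# Hypothetical data cases (x, y); None marks an unobserved x.
DATA = [(1, 1), (1, 1), (1, 0), (0, 0), (0, 0), (0, 1),
        (None, 1), (None, 0), (None, 1)]
params, history = em(DATA)
```

The log-likelihood in `history` never decreases from one iteration to the next, which is the property that makes "iterate until convergence" a sensible stopping rule.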
Model Selection – Model Theoretic
(Figure: Database -> Learning Algorithm)
+ Language: Bayesian bt/1, pc/1, mc/1
+ Background knowledge: logical mother/2, father/2
Model Selection – Model Theoretic
• Combination of ILP and BN learning
• Combinatorial search for a hypothesis M* such that
  • M* logically covers the data D
  • M* is optimal w.r.t. some scoring function score, i.e., M* = argmax_M score(M,D)
• Highlights
  • Refinement operators
  • Background knowledge
  • Language bias
  • Search bias
Refinement Operators
• Add a fact, delete a fact, or refine an existing clause:
• Specialization:
  • add an atom
  • apply a substitution { X / Y } where X, Y already appear in the atom
  • apply a substitution { X / f(Y1, … , Yn) } where the Yi are new variables
  • apply a substitution { X / c } where c is a constant
• Generalization:
  • delete an atom
  • turn a term into a variable
    • p(a,f(b)) becomes p(X,f(b)) or p(a,f(X))
    • p(a,a) becomes p(X,X) or p(a,X) or p(X,a)
  • replace two occurrences of a variable X by X1 and X2
    • p(X,X) becomes p(X1,X2)
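Three of the operators above can be sketched on a simple clause representation. The tuple encoding and the function names are assumptions made for this illustration; function terms such as f(Y1, …, Yn) are left out.

```python
# A clause is (head, body): head is an atom, body a tuple of atoms.
# An atom is a tuple (predicate, arg1, ..., argn); uppercase strings are variables.

def add_atom(clause, atom):
    """Specialization: append an atom to the body.

    Returns None if the atom is already present or equals the head
    (which would create a direct self-dependency).
    """
    head, body = clause
    if atom in body or atom == head:
        return None
    return (head, body + (atom,))

def delete_atom(clause, atom):
    """Generalization: remove one atom from the body."""
    head, body = clause
    if atom not in body:
        return None
    return (head, tuple(a for a in body if a != atom))

def apply_substitution(clause, variable, term):
    """Specialization: replace a variable by another variable or a constant."""
    def replace(atom):
        return (atom[0],) + tuple(term if t == variable else t for t in atom[1:])
    head, body = clause
    return (replace(head), tuple(replace(a) for a in body))

initial = (("bt", "X"), (("mc", "X"),))        # bt(X) | mc(X).
refined = add_atom(initial, ("pc", "X"))       # bt(X) | mc(X), pc(X).
```

A search procedure would generate such neighbours of the current hypothesis and keep those that score best on the data.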
Example
Original program:
    mc(X) | m(M,X), mc(M), pc(M).
    pc(X) | f(F,X), mc(F), pc(F).
    bt(X) | mc(X), pc(X).
Data cases:
    {m(ann,john)=true, pc(ann)=a, mc(ann)=?, f(eric,john)=true, pc(eric)=b, mc(eric)=a, mc(john)=ab, pc(john)=a, bt(john)=?}
    ...
(Figure: network over mc(ann), pc(ann), mc(eric), pc(eric), m(ann,john), f(eric,john), mc(john), pc(john), bt(john).)
Example
Original program:
    mc(X) | m(M,X), mc(M), pc(M).
    pc(X) | f(F,X), mc(F), pc(F).
    bt(X) | mc(X), pc(X).
Initial hypothesis:
    mc(X) | m(M,X).
    pc(X) | f(F,X).
    bt(X) | mc(X).
(Figure: network over mc(ann), pc(ann), mc(eric), pc(eric), m(ann,john), f(eric,john), mc(john), pc(john), bt(john).)
Example
Original program:
    mc(X) | m(M,X), mc(M), pc(M).
    pc(X) | f(F,X), mc(F), pc(F).
    bt(X) | mc(X), pc(X).
Initial hypothesis:
    mc(X) | m(M,X).
    pc(X) | f(F,X).
    bt(X) | mc(X).
Refinement:
    mc(X) | m(M,X).
    pc(X) | f(F,X).
    bt(X) | mc(X), pc(X).
(Figure: network over mc(ann), pc(ann), mc(eric), pc(eric), m(ann,john), f(eric,john), mc(john), pc(john), bt(john).)
Example
Original program:
    mc(X) | m(M,X), mc(M), pc(M).
    pc(X) | f(F,X), mc(F), pc(F).
    bt(X) | mc(X), pc(X).
Initial hypothesis:
    mc(X) | m(M,X).
    pc(X) | f(F,X).
    bt(X) | mc(X).
Refinement:
    mc(X) | m(M,X).
    pc(X) | f(F,X).
    bt(X) | mc(X), pc(X).
Refinement:
    mc(X) | m(M,X), mc(X).
    pc(X) | f(F,X).
    bt(X) | mc(X), pc(X).
(Figure: network over mc(ann), pc(ann), mc(eric), pc(eric), m(ann,john), f(eric,john), mc(john), pc(john), bt(john).)
Example
Original program:
    mc(X) | m(M,X), mc(M), pc(M).
    pc(X) | f(F,X), mc(F), pc(F).
    bt(X) | mc(X), pc(X).
Initial hypothesis:
    mc(X) | m(M,X).
    pc(X) | f(F,X).
    bt(X) | mc(X).
Refinement:
    mc(X) | m(M,X).
    pc(X) | f(F,X).
    bt(X) | mc(X), pc(X).
Refinement:
    mc(X) | m(M,X), pc(X).
    pc(X) | f(F,X).
    bt(X) | mc(X), pc(X).
...
(Figure: network over mc(ann), pc(ann), mc(eric), pc(eric), m(ann,john), f(eric,john), mc(john), pc(john), bt(john).)
Bias
• Many clauses can be eliminated a priori
• Due to the type structure of clauses
  • e.g. atom(compound,atom,charge), bond(compound,atom,atom,bondtype), active(compound)
  • eliminate e.g. active(C) :- atom(X,C,5)
  • it does not conform to the type structure
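A type check of this kind can be sketched in a few lines. The encoding is an assumption for illustration: atoms are tuples, uppercase strings are variables, and constants are not checked because the slide declares no types for them.

```python
# Type declarations taken from the slide.
TYPES = {
    "atom": ("compound", "atom", "charge"),
    "bond": ("compound", "atom", "atom", "bondtype"),
    "active": ("compound",),
}

def is_variable(term):
    return term[:1].isupper()

def type_conform(head, body):
    """True if every variable is used at positions of one single type."""
    assigned = {}
    for atom in [head] + list(body):
        for term, typ in zip(atom[1:], TYPES[atom[0]]):
            if not is_variable(term):
                continue  # constants are not checked in this sketch
            if assigned.setdefault(term, typ) != typ:
                return False
    return True

# active(C) :- atom(X,C,5): C is a compound in the head but an atom in the body.
rejected = type_conform(("active", "C"), [("atom", "X", "C", "5")])
accepted = type_conform(("active", "C"), [("atom", "C", "A", "P")])
```

Clauses that fail the check never need to be scored, which prunes the search space before any data is consulted.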
Bias - continued
• or due to the modes of predicates: a mode determines the calling pattern in queries
  • + : input; - : output
  • mode(atom(+,-,-))
  • mode(bond(+,+,-,-))
  • all variables in the head are + (input)
• active(C) :- bond(C,A1,A2,T) is not mode conform
  • because A1 does not occur in the part of the clause to its left, although its argument is declared +
• active(C) :- atom(C,A,P), bond(C,A,A2,double) is mode conform
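The mode check can be sketched as a left-to-right scan of the clause. The encoding is an assumption: atoms are tuples, uppercase strings are variables, and constants are accepted at any position.

```python
# Mode declarations taken from the slide: "+" is input, "-" is output.
MODES = {"atom": "+--", "bond": "++--"}

def is_variable(term):
    return term[:1].isupper()

def mode_conform(head, body):
    """True if every input variable is bound to the left of its use."""
    # All variables in the head count as inputs and are known from the start.
    known = {t for t in head[1:] if is_variable(t)}
    for atom in body:
        for term, mode in zip(atom[1:], MODES[atom[0]]):
            if mode == "+" and is_variable(term) and term not in known:
                return False
        # After the call, the atom's output variables are bound as well.
        known |= {t for t in atom[1:] if is_variable(t)}
    return True

# active(C) :- bond(C,A1,A2,T): A1 is an input that nothing has bound yet.
not_conform = mode_conform(("active", "C"), [("bond", "C", "A1", "A2", "T")])
# active(C) :- atom(C,A,P), bond(C,A,A2,double): A is bound by atom/3 first.
conform = mode_conform(
    ("active", "C"),
    [("atom", "C", "A", "P"), ("bond", "C", "A", "A2", "double")],
)
```

Because the scan is order-sensitive, swapping the two body atoms of the second clause would make it fail the check.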
Conclusions on Learning
• Algorithms remain essentially the same
• Not single edges but bunches of edges are modified
• Structure on the search space becomes more complex
(Figure: Statistical Learning and Inductive Logic Programming / Multi-relational Data Mining, with the labels Refinement Operators, Scores, Independency, Bias, Priors, Background Knowledge.)
Logical (Hidden) Markov Models
(Figure: states "CS department", "CS course on data mining", "CS course on statistics", "CS course on Robotics", "Lecturer Wolfram", "Lecturer Luc".)
• Each state is trained independently
• No sharing of experience, large state space
Logical (Hidden) Markov Models
Abstract states: dept(D), course(D,C), lecturer(D,L)
(Figure: ground states dept(cs), course(cs,dm), course(cs,stats), course(cs,rob), lecturer(cs,luc), lecturer(cs,wolfram).)
    (0.7) dept(D) -> course(D,C).
    (0.2) dept(D) -> lecturer(D,L).
    ...
    (0.3) course(D,C) -> lecturer(D,L).
    (0.3) course(D,C) -> dept(D).
    (0.3) course(D,C) -> course(D',C').
    ...
    (0.1) lecturer(D,L) -> course(D,C).
    ...