
Bayesian Logic Programs


Presentation Transcript


  1. Bayesian Logic Programs. Summer School on Relational Data Mining, 17 and 18 August 2002, Helsinki, Finland. Kristian Kersting, Luc De Raedt, Machine Learning Lab, Albert-Ludwigs-University, Freiburg, Germany

  2. Context: real-world applications involve both uncertainty (handled by probability theory, over discrete and continuous domains) and complex, structured domains (handled by logic: objects, relations, functors). Bayesian networks + Logic Programming (Prolog) = Bayesian logic programs.

  3. Outline • Bayesian Logic Programs • Examples and Language • Semantics and Support Networks • Learning Bayesian Logic Programs • Data Cases • Parameter Estimation • Structural Learning

  4. Bayesian Logic Programs • Probabilistic models structured using logic • Extend Bayesian networks with notions of objects and relations • Probability density over (countably) infinitely many random variables • Flexible discrete-time stochastic processes • Generalize pure Prolog, Bayesian networks, dynamic Bayesian networks, dynamic Bayesian multinets, hidden Markov models, ...

  5. Bayesian Networks • One of the successes of AI • State-of-the-art for modelling uncertainty, in particular degrees of belief • Advantage [Russell, Norvig 96]: „strict separation of qualitative and quantitative aspects of the world“ • Disadvantage [Breese, Ngo, Haddawy, Koller, ...]: propositional character, no notion of objects and relations among them

  6. Stud farm (Jensen ´96) • The colt John has been born recently on a stud farm. • John suffers from a life-threatening hereditary disease carried by a recessive gene. The disease is so serious that John is displaced instantly, and because the stud farm wants the gene out of production, his parents are taken out of breeding. • What are the probabilities for the remaining horses to be carriers of the unwanted gene?

  7. Bayesian networks [Pearl ´88], based on the stud farm example [Jensen ´96]. Figure: Bayesian network over the blood-type variables bt_unknown1, bt_ann, bt_brian, bt_cecily, bt_unknown2, bt_fred, bt_dorothy, bt_eric, bt_gwenn, bt_henry, bt_irene, bt_john.

  8. Bayesian networks [Pearl ´88], based on the stud farm example [Jensen ´96]. Figure: the same network annotated with (conditional) probability distributions, e.g. P(bt_cecily=aA|bt_john=aA)=0.1499, P(bt_john=AA|bt_ann=aA)=0.6906, P(bt_john=AA)=0.9909.

  9. Bayesian networks (contd.) • directed acyclic graphs • probability distribution over a finite set of random variables, factorizing according to the graph structure (see the formula below)
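  The factorization behind this semantics is standard and worth restating: with Pa(X_i) denoting the parents of X_i in the acyclic graph,

      P(X_1, \dots, X_n) = \prod_{i=1}^{n} P(X_i \mid Pa(X_i))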

  10. From Bayesian Networks to Bayesian Logic Programs E bt_unknown1 bt_ann bt_brian bt_cecily bt_unknown2 bt_fred bt_dorothy bt_eric bt_gwenn bt_henry bt_irene bt_john

  11. From Bayesian Networks to Bayesian Logic Programs E bt_unknown1. bt_ann. bt_brian. bt_cecily. bt_unknown2. bt_fred bt_dorothy bt_eric bt_gwenn bt_henry bt_irene bt_john

  12. From Bayesian Networks to Bayesian Logic Programs bt_unknown1. bt_ann. bt_brian. bt_cecily. bt_unknown2. bt_fred | bt_unknown1, bt_ann. bt_dorothy | bt_ann, bt_brian. bt_eric | bt_brian, bt_cecily. bt_gwenn | bt_ann, bt_unknown2. bt_henry bt_irene bt_john

  13. From Bayesian Networks to Bayesian Logic Programs bt_unknown1. bt_ann. bt_brian. bt_cecily. bt_unknown2. bt_fred | bt_unknown1, bt_ann. bt_dorothy | bt_ann, bt_brian. bt_eric | bt_brian, bt_cecily. bt_gwenn | bt_ann, bt_unknown2. bt_henry | bt_fred, bt_dorothy. bt_irene | bt_eric, bt_gwenn. bt_john

  14. From Bayesian Networks to Bayesian Logic Programs bt_unknown1. bt_ann. bt_brian. bt_cecily. bt_unknown2. bt_fred | bt_unknown1, bt_ann. bt_dorothy | bt_ann, bt_brian. bt_eric | bt_brian, bt_cecily. bt_gwenn | bt_ann, bt_unknown2. bt_henry | bt_fred, bt_dorothy. bt_irene | bt_eric, bt_gwenn. bt_john | bt_henry, bt_irene.

  15. From Bayesian Networks to Bayesian Logic Programs
  • Domain: e.g. finite, discrete, continuous
  • (conditional) probability distribution for bt_john given bt_henry and bt_irene:
      bt_henry  bt_irene  P(bt_john)
      aa        aa        (1.0, 0.0, 0.0)
      aa        aA        (0.5, 0.5, 0.0)
      aa        AA        (0.0, 1.0, 0.0)
      ...
      AA        AA        (0.33, 0.33, 0.33)
  % apriori nodes
  bt_ann. bt_brian. bt_cecily. bt_unknown1. bt_unknown2.
  % aposteriori nodes
  bt_henry | bt_fred, bt_dorothy. bt_irene | bt_eric, bt_gwenn. bt_fred | bt_unknown1, bt_ann. bt_dorothy | bt_brian, bt_ann. bt_eric | bt_brian, bt_cecily. bt_gwenn | bt_unknown2, bt_ann. bt_john | bt_henry, bt_irene.

  16. From Bayesian Networks to Bayesian Logic Programs
  • (conditional) probability distribution for bt(john) given bt(henry) and bt(irene):
      bt(henry)  bt(irene)  P(bt(john))
      aa         aa         (1.0, 0.0, 0.0)
      aa         aA         (0.5, 0.5, 0.0)
      aa         AA         (0.0, 1.0, 0.0)
      ...
      AA         AA         (0.33, 0.33, 0.33)
  % apriori nodes
  bt(ann). bt(brian). bt(cecily). bt(unknown1). bt(unknown2).
  % aposteriori nodes
  bt(henry) | bt(fred), bt(dorothy). bt(irene) | bt(eric), bt(gwenn). bt(fred) | bt(unknown1), bt(ann). bt(dorothy) | bt(brian), bt(ann). bt(eric) | bt(brian), bt(cecily). bt(gwenn) | bt(unknown2), bt(ann). bt(john) | bt(henry), bt(irene).

  17. From Bayesian Networks to Bayesian Logic Programs
  • (conditional) probability distribution, given generically for the rule:
      father(F,X)  bt(F)  mother(M,X)  bt(M)  P(bt(X))
      true         Aa     true         aa     (1.0, 0.0, 0.0)
      ...
      false        AA     false        AA     (0.33, 0.33, 0.33)
  % ground facts / apriori
  bt(ann). bt(brian). bt(cecily). bt(unknown1). bt(unknown2). father(unknown1,fred). mother(ann,fred). father(brian,dorothy). mother(ann,dorothy). father(brian,eric). mother(cecily,eric). father(unknown2,gwenn). mother(ann,gwenn). father(fred,henry). mother(dorothy,henry). father(eric,irene). mother(gwenn,irene). father(henry,john). mother(irene,john).
  % rules / aposteriori
  bt(X) | father(F,X), bt(F), mother(M,X), bt(M).
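  To make the lifting concrete, here is a minimal sketch (Python, with hypothetical helper names; not part of the tutorial) of how the single generic clause above, grounded over the father/mother facts, yields exactly the parent sets of the hand-built propositional network:

      # Grounding  bt(X) | father(F,X), bt(F), mother(M,X), bt(M).  over the stud-farm facts.
      father = {"fred": "unknown1", "dorothy": "brian", "eric": "brian", "gwenn": "unknown2",
                "henry": "fred", "irene": "eric", "john": "henry"}
      mother = {"fred": "ann", "dorothy": "ann", "eric": "cecily", "gwenn": "ann",
                "henry": "dorothy", "irene": "gwenn", "john": "irene"}

      def parents_of_bt(x):
          """Parents of the random variable bt(x) induced by the rule's ground instances."""
          if x not in father:                    # apriori node: covered by a ground fact bt(x).
              return []
          f, m = father[x], mother[x]
          return [f"father({f},{x})", f"bt({f})", f"mother({m},{x})", f"bt({m})"]

      print(parents_of_bt("john"))
      # ['father(henry,john)', 'bt(henry)', 'mother(irene,john)', 'bt(irene)']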

  18. Dependency graph = Bayesian network. Figure: the induced dependency graph over the ground atoms bt(...), father(...) and mother(...); its structure mirrors the hand-built stud-farm network.

  19. Dependency graph = Bayesian network. Figure: the original propositional network over bt_unknown1, bt_ann, ..., bt_john, shown for comparison.

  20. Bayesian Logic Programs - a first definition. A BLP consists of • a finite set of Bayesian clauses • a conditional probability distribution associated with each clause. Correspondence: proper random variables ~ LH(B), graphical structure ~ dependency graph, quantitative information ~ CPDs

  21. Bayesian Logic Programs - Examples (pure Prolog, MC, HMM, DBN)
  % apriori nodes nat(0). % aposteriori nodes nat(s(X)) | nat(X).
  % apriori nodes state(0). % aposteriori nodes state(s(Time)) | state(Time). output(Time) | state(Time).
  % apriori nodes n1(0). % aposteriori nodes n1(s(TimeSlice)) | n2(TimeSlice). n2(TimeSlice) | n1(TimeSlice). n3(TimeSlice) | n1(TimeSlice), n2(TimeSlice).
  Figure: the induced dependency graphs, unrolling over nat(0), nat(s(0)), nat(s(s(0))), ...; over state(0), state(s(0)), ... with output(0), output(s(0)), ...; and over n1, n2, n3 per time slice.

  22. Associated CPDs
  • represent generically the CPD for each ground instance of the corresponding Bayesian clause, e.g. for bt(X) | father(F,X), bt(F), mother(M,X), bt(M):
      father(F,X)  bt(F)  mother(M,X)  bt(M)  P(bt(X))
      true         Aa     true         aa     (1.0, 0.0, 0.0)
      ...
      false        AA     false        AA     (0.33, 0.33, 0.33)
  • What if there are multiple ground instances of clauses having the same head atom?

  23. Combining Rules • Multiple ground instances of clauses having the same head atom? % ground facts as before % rules bt(X) | father(F,X), bt(F). bt(X) | mother(M,X), bt(M). • These yield cpd(bt(john) | father(henry,john), bt(henry)) and cpd(bt(john) | mother(irene,john), bt(irene)) • But we need: cpd(bt(john) | father(henry,john), bt(henry), mother(irene,john), bt(irene))

  24. Combining Rules (contd.) • A combining rule is any algorithm which • combines a finite set of CPDs into a single (combined) CPD, and • has an empty output if and only if the input is empty • E.g. noisy-or, regression, ... • Example: P(A|B) and P(A|C) are combined by the CR into P(A|B,C)
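  For concreteness, a hedged sketch of noisy-or as a combining rule over Boolean variables (illustrative only; the tutorial does not fix a particular combining rule):

      # Noisy-or combining rule for Boolean random variables: each ground clause
      # instance i contributes p_i = P(head = true | parent_i active); the combined
      # CPD entry is P(head = true | parents) = 1 - prod(1 - p_i) over the active parents.
      def noisy_or(p_per_clause, parent_active):
          fail = 1.0
          for p, active in zip(p_per_clause, parent_active):
              if active:
                  fail *= (1.0 - p)
          return 1.0 - fail

      # Combining P(A|B) and P(A|C) into one entry of P(A | B=true, C=true):
      print(noisy_or([0.8, 0.6], [True, True]))   # 0.92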

  25. Bayesian Logic Programs - a definition. A BLP consists of • a finite set of Bayesian clauses • a conditional probability distribution associated with each clause • a combining rule associated with each Bayesian predicate p, to combine the CPDs of multiple ground instances of clauses having the same head. Correspondence: proper random variables ~ LH(B), graphical structure ~ dependency graph, quantitative information ~ CPDs and CRs

  26. Outline • Bayesian Logic Programs • Examples and Language • Semantics and Support Networks • Learning Bayesian Logic Programs • Data Cases • Parameter Estimation • Structural Learning

  27. Discrete-Time Stochastic Process • A family of random variables over a domain X, one variable per ground atom in LH(B) • For each linearization of the partial order induced by the dependency graph, a Bayesian logic program specifies a discrete-time stochastic process

  28. Existence and uniqueness of the probability measure (Theorem of Kolmogorov) • X: a Polish space • E(J): the set of all non-empty, finite subsets of the index set J • P_I: the probability measure over X^I, for each I in E(J) • If the projective family (P_I) exists (i.e. is consistent), then there exists a unique probability measure over X^J whose marginals are the P_I (restated below)
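  In symbols (a standard restatement of the extension theorem; the notation X, J, E(J), P_I and the projections \pi follows the reconstruction above):

      \text{If } P_{I'} = P_I \circ \pi_{I,I'}^{-1} \text{ for all } I', I \in E(J) \text{ with } I' \subseteq I,
      \text{ then there is exactly one probability measure } P \text{ on } X^{J}
      \text{ with } P_I = P \circ \pi_{J,I}^{-1} \text{ for all } I \in E(J),

  where \pi_{I,I'} denotes the coordinate projection from X^I onto X^{I'}.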

  29. Consistency Conditions • Each finite-dimensional probability measure P_I is represented by a finite Bayesian network which is a subnetwork of the dependency graph over LH(B): the support network • (Elimination order): all stochastic processes represented by a Bayesian logic program B specify the same probability measure over LH(B)

  30. Support network. Figure: the stud-farm dependency graph over the bt, father and mother ground atoms.

  31. Support network. Figure: the same dependency graph as on the previous slide.

  32. Support network. Figure: the same dependency graph as on the previous slide.

  33. Support network • The support network of a random variable x in LH(B) is the induced subnetwork over x and all random variables that influence x • The support network of a finite set of random variables is defined as the union of their support networks • Computation utilizes And/Or trees

  34. Queries using And/Or trees • A probabilistic query ?- Q1,...,Qn | E1=e1,...,Em=em. asks for the distribution P(Q1,...,Qn | E1=e1,...,Em=em). • Example: ?- bt(eric). • An Or node is proven if at least one of its successors is provable; an And node is proven if all of its successors are provable. • Figure: the And/Or tree and combined cpd for the query ?- bt(eric), with parents father(brian,eric), bt(brian), mother(cecily,eric), bt(cecily).
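  A minimal sketch (assumed representation, not the tutorial's implementation) of collecting a support network by backward chaining over the ground dependency graph, which is what the And/Or tree construction achieves for a query:

      # Support network of a query atom: the atom plus everything that influences it.
      def support_network(query, parents):
          """parents maps each ground atom to the list of atoms it directly depends on."""
          seen, stack = set(), [query]
          while stack:
              atom = stack.pop()
              if atom in seen:
                  continue
              seen.add(atom)
              stack.extend(parents.get(atom, []))   # apriori atoms have no parents
          return seen

      # Tiny excerpt of the stud-farm dependency graph (propositional view):
      graph = {"bt_john": ["bt_henry", "bt_irene"],
               "bt_henry": ["bt_fred", "bt_dorothy"],
               "bt_irene": ["bt_eric", "bt_gwenn"]}
      print(sorted(support_network("bt_john", graph)))
      # ['bt_dorothy', 'bt_eric', 'bt_fred', 'bt_gwenn', 'bt_henry', 'bt_irene', 'bt_john']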

  35. Consistency Condition (contd.) • The projective family exists if the Bayesian logic program is well-defined: • the dependency graph is acyclic, and • every random variable is influenced by a finite set of random variables only

  36. Relational Character • Different sets of ground facts, same rules. For example: % ground facts bt(petra). bt(silvester). bt(anne). bt(wilhelm). bt(beate). father(silvester,claudien). mother(beate,claudien). father(wilhelm,marta). mother(anne,marta). ... % ground facts bt(susanne). bt(ralf). bt(peter). bt(uta). father(ralf,luca). mother(susanne,luca). ... % ground facts bt(ann). bt(brian). bt(cecily). bt(unknown1). bt(unknown2). father(unknown1,fred). mother(ann,fred). father(brian,dorothy). mother(ann,dorothy). father(brian,eric). mother(cecily,eric). father(unknown2,gwenn). mother(ann,gwenn). father(fred,henry). mother(dorothy,henry). father(eric,irene). mother(gwenn,irene). father(henry,john). mother(irene,john). % rules bt(X) | father(F,X), bt(F), mother(M,X), bt(M).

  37. Bayesian Logic Programs - Summary • First-order logic extension of Bayesian networks • constants, relations, functors • discrete and continuous random variables • ground atoms = random variables • CPDs associated to clauses • Dependency graph = (possibly) infinite Bayesian network • Generalize dynamic Bayesian networks and definite clause logic (range-restricted)

  38. Applications • Probabilistic, logical • Description and prediction • Regression • Classification • Clustering • Computational Biology • APrIL IST-2001-33053 • Web Mining • Query approximation • Planning, ...

  39. Other frameworks • Probabilistic Horn Abduction [Poole 93] • Distributional Semantics (PRISM) [Sato 95] • Stochastic Logic Programs [Muggleton 96; Cussens 99] • Relational Bayesian Nets [Jaeger 97] • Probabilistic Logic Programs [Ngo, Haddawy 97] • Object-Oriented Bayesian Nets [Koller, Pfeffer 97] • Probabilistic Frame-Based Systems [Koller, Pfeffer 98] • Probabilistic Relational Models [Koller 99]

  40. Outline • Bayesian Logic Programs • Examples and Language • Semantics and Support Networks • Learning Bayesian Logic Programs • Data Cases • Parameter Estimation • Structural Learning

  41. Learning Bayesian Logic Programs. Figure: Data + Background Knowledge are fed into a learning algorithm, which outputs a Bayesian clause A | E, B. together with its conditional probability table P(A | E, B).

  42. Why Learning Bayesian Logic Programs? • Learning within Bayesian Logic Programs combines learning within Bayesian networks with Inductive Logic Programming • Of interest to both communities: scoring functions, pruning techniques, theoretical insights, ...

  43. What is the data about? • A data case is a partially observed joint state of a finite, nonempty subset of the random variables LH(B)
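  For example, in the stud-farm program one data case could be the partially observed joint state {bt(ann) = aA, bt(fred) = aa, bt(john) = aa} together with the observed pedigree atoms, while bt(unknown1) and bt(unknown2) remain unobserved. (The concrete values here are illustrative, not taken from the tutorial.)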

  44. Learning Task • Given: a set of data cases D = {D1, ..., DN} and a Bayesian logic program B • Goal: for each Bayesian clause of B, the parameters of its associated CPD that best fit the given data

  45. Parameter Estimation (contd.) • „best fit“ ~ maximum-likelihood estimation, where the hypothesis space is spanned by the product space over the possible values of the CPD parameters

  46. Parameter Estimation (contd.) • Assumption: D1, ..., DN are independently sampled from identical distributions (e.g. totally separated families), so the likelihood factorizes over the data cases
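  Written out (the standard maximum-likelihood form under the i.i.d. assumption above, with \lambda the collected CPD parameters):

      \lambda^{*} = \arg\max_{\lambda} P(D \mid \lambda)
                  = \arg\max_{\lambda} \prod_{i=1}^{N} P(D_i \mid \lambda)
                  = \arg\max_{\lambda} \sum_{i=1}^{N} \log P(D_i \mid \lambda)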

  47. Parameter Estimation (contd.) • Each P(Di | λ) is a marginal of the distribution defined by the support network of the variables in Di, and that support network is an ordinary Bayesian network

  48. Parameter Estimation (contd.) • Reduced to a problem within Bayesian networks: given structure, partially observed random variables • EM [Dempster, Laird, Rubin ´77], [Lauritzen ´91] • Gradient Ascent [Binder, Koller, Russell, Kanazawa ´97], [Jensen ´99]
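  As a hedged illustration of the EM route (only the re-estimation step shown; the expected counts come from ordinary Bayesian-network inference over each data case's support network, which is assumed here as given):

      # One EM re-estimation of a single CPD entry theta = P(X = x | Pa = pa).
      def em_update(expected_joint, expected_parent):
          """expected_joint  = sum_i P(X = x, Pa = pa | D_i)   (E-step output)
          expected_parent = sum_i P(Pa = pa | D_i)
          Returns the M-step estimate of P(X = x | Pa = pa)."""
          return expected_joint / expected_parent if expected_parent > 0 else 0.0

      print(em_update(3.2, 4.0))   # 0.8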

  49. Decomposable CRs • Parameters belong to the clauses, not to the support network. • Figure: a single ground instance of a Bayesian clause uses the clause's CPD; multiple ground instances of the same Bayesian clause share that CPD and are combined by the combining rule.

  50. Gradient Ascent • Figure: the gradient with respect to the CPD of a Bayesian clause is assembled from the contributions of its Bayesian ground clauses (restated below).
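  Sketched as a formula (a consequence of the fact that, with decomposable combining rules, all ground instances c\theta of a Bayesian clause c share the parameters of cpd(c); this restates the slide's picture and is not copied from it):

      \frac{\partial \log P(D \mid \lambda)}{\partial \, cpd(c)_{jk}}
        = \sum_{\text{ground instances } c\theta \text{ of } c \text{ in the support network}}
          \frac{\partial \log P(D \mid \lambda)}{\partial \, cpd(c\theta)_{jk}}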
