
Learning First-Order Probabilistic Models with Combining Rules





  1. Learning First-Order Probabilistic Models with Combining Rules
  Sriraam Natarajan, Prasad Tadepalli, Eric Altendorf, Thomas G. Dietterich, Alan Fern, Angelo Restificar
  School of EECS, Oregon State University

  2. First-order Probabilistic Models
  • Combine the expressiveness of first-order logic with the uncertainty modeling of graphical models
  • Several formalisms already exist:
    • Probabilistic Relational Models (PRMs)
    • Bayesian Logic Programs (BLPs)
    • Stochastic Logic Programs (SLPs)
    • Relational Bayesian Networks (RBNs)
    • Probabilistic Logic Programs (PLPs), …
  • Parameter sharing and quantification allow compact representation

  3. Multiple Parents Problem
  • Often multiple objects are related to an object by the same relationship
    • One's friend's drinking habits influence one's own
    • A student's GPA depends on the grades in the courses he or she takes
    • The size of a mosquito population depends on the temperature and the rainfall each day since the last freeze
  • The target variable in each of these statements has multiple influencing variables ("parents" in Bayes net jargon)

  4. Multiple Parents for Population
  [Figure: Bayes net in which Temp1, Rain1, Temp2, Rain2, Temp3, Rain3 are all parents of Population]
  • Variable number of parents
  • Large number of parents
  • Need for compact parameterization

  5. Solution 1: Aggregators
  [Figure: Temp1, Temp2, Temp3 feed a deterministic AverageTemp node and Rain1, Rain2, Rain3 feed a deterministic AverageRain node; the two averages are stochastic parents of Population]
  • Problem: does not take into account the interaction between related parents Rain and Temp

  6. Solution 2: Combining Rules
  [Figure: each (Tempi, Raini) pair produces its own Populationi distribution; Population1, Population2, Population3 are combined into a single Population distribution]
  • The top 3 distributions share parameters
  • The 3 distributions are combined into one final distribution
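  To make the contrast with aggregators concrete, here is a minimal Python sketch (not the authors' code) of a mean combining rule for the mosquito example: the same shared CPD is applied to every (temperature, rainfall) pair, so the temperature/rain interaction is preserved within each day, and only the resulting distributions are averaged. The discretized levels and probability values are made up for illustration.

```python
import numpy as np

def combine_mean(parent_pairs, cpd):
    """Mean combining rule: apply one shared CPD to every (temp, rain)
    parent pair and average the resulting distributions over the target.

    parent_pairs : list of (temp_level, rain_level) tuples, one per day
    cpd          : dict mapping (temp_level, rain_level) to a probability
                   vector over the target variable (shared parameters)
    """
    per_instance = np.stack([cpd[pair] for pair in parent_pairs])
    return per_instance.mean(axis=0)   # one distribution, whatever the number of parents

# Hypothetical discretized CPD over a binary population level
cpd = {("hot", "wet"):  np.array([0.2, 0.8]),
       ("hot", "dry"):  np.array([0.6, 0.4]),
       ("cold", "dry"): np.array([0.9, 0.1])}
print(combine_mean([("hot", "wet"), ("hot", "dry"), ("cold", "dry")], cpd))
# -> [0.5667, 0.4333]
```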

  7. First-order Conditional Influence Language (FOCIL)
  • The task and role of a document influence its folder:
    if {task(t), doc(d), role(d,r,t)} then r.id, t.id Qinf d.folder
  • The folder of the source of a document influences the folder of the document:
    if {doc(d1), doc(d2), source(d1,d2)} then d1.folder Qinf d2.folder
  • The difficulty of a course and the intelligence of a student influence his/her GPA:
    if {student(s), course(c), takes(s,c)} then s.IQ, c.difficulty Qinf s.gpa

  8. Combining Multiple Instances of a Single Statement
  If {task(t), doc(d), role(d,r,t)} then t.id, r.id Qinf (Mean) d.folder
  [Figure: two instances, (t1.id, r1.id) and (t2.id, r2.id), each produce a distribution over d.folder; the two are combined by Mean into a single d.folder distribution]

  9. A Different FOCIL Statement for the Same Target Variable
  If {doc(s), doc(d), source(s,d)} then s.folder Qinf (Mean) d.folder
  [Figure: s1.folder and s2.folder each produce a distribution over d.folder; the two are combined by Mean into a single d.folder distribution]

  10. Combining Multiple Statements
  Weighted Mean {
    If {task(t), doc(d), role(d,r,t)} then t.id, r.id Qinf (Mean) d.folder
    If {doc(s), doc(d), source(s,d)} then s.folder Qinf (Mean) d.folder
  }

  11. "Unrolled" Network for Folder Prediction
  [Figure: instances (t1.id, r1.id) and (t2.id, r2.id) feed one Mean node, and s1.folder, s2.folder feed another; the two Mean outputs are combined by a Weighted Mean into the final d.folder distribution]

  12. General Unrolled Network
  [Figure: rule 1 has m1 instances with inputs X1_{1,1} … X1_{m1,k} and rule 2 has m2 instances with inputs X2_{1,1} … X2_{m2,k}; each rule's instance distributions are combined by Mean, and the two rule-level distributions are combined by a Weighted Mean to give Y]
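  In symbols, the model the unrolled network depicts can be written as follows (a reconstruction from the slides, writing m_r for the number of instances of rule r, x_{r,i,1}, …, x_{r,i,k} for the inputs of its i-th instance, and w_r for the rule weights):

```latex
P(Y \mid \mathbf{x}) \;=\; \sum_{r} w_r \cdot \frac{1}{m_r} \sum_{i=1}^{m_r}
  P_r\!\bigl(Y \mid x_{r,i,1}, \dots, x_{r,i,k}\bigr),
\qquad \sum_{r} w_r = 1
```

  The inner average implements the Mean combining rule within a statement (shared CPD P_r across its instances), and the outer sum implements the Weighted Mean across statements.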

  13. Gradient Descent for Squared Error
  • Squared error over the training examples, minimized by gradient descent (equation reconstructed below)
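  The slide's equation did not survive transcription; a standard squared-error objective consistent with the model above, together with its gradient with respect to a generic parameter θ_j (a shared CPD entry or a rule weight), would be:

```latex
E(\theta) \;=\; \frac{1}{2} \sum_{k} \sum_{y}
  \bigl( I(y_k = y) - P(y \mid \mathbf{x}_k; \theta) \bigr)^2,
\qquad
\frac{\partial E}{\partial \theta_j} \;=\;
  -\sum_{k} \sum_{y} \bigl( I(y_k = y) - P(y \mid \mathbf{x}_k; \theta) \bigr)\,
  \frac{\partial P(y \mid \mathbf{x}_k; \theta)}{\partial \theta_j}
```

  where k ranges over the training examples and I(·) is the indicator function.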

  14. Gradient Descent for Loglikelihood
  • Loglikelihood of the training examples, maximized by gradient ascent (equation reconstructed below)
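  Again the equation is missing from the transcript; the usual form, with its gradient, would be:

```latex
LL(\theta) \;=\; \sum_{k} \log P(y_k \mid \mathbf{x}_k; \theta),
\qquad
\frac{\partial LL}{\partial \theta_j} \;=\;
  \sum_{k} \frac{1}{P(y_k \mid \mathbf{x}_k; \theta)}\,
  \frac{\partial P(y_k \mid \mathbf{x}_k; \theta)}{\partial \theta_j}
```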

  15. Learning the Weights
  • Weight updates are derived for both objectives: mean squared error and loglikelihood (a sketch of the loglikelihood case follows)
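  As one concrete way to carry out the loglikelihood case, here is a minimal Python sketch that learns the rule weights by gradient ascent. The softmax parameterization, the function name, the input format, and the fixed learning rate are choices of this sketch, not taken from the slides.

```python
import numpy as np

def learn_rule_weights(rule_probs, lr=0.1, iters=500):
    """Gradient ascent on the loglikelihood with respect to the rule weights.

    rule_probs : array of shape (n_examples, n_rules); entry [k, r] is the
                 mean-combined probability that rule r assigns to the true
                 label of example k (hypothetical input format).
    Returns weights on the simplex; a softmax parameterization keeps them
    positive and summing to one throughout the ascent.
    """
    theta = np.zeros(rule_probs.shape[1])                # unconstrained parameters
    for _ in range(iters):
        w = np.exp(theta) / np.exp(theta).sum()          # softmax -> weights on the simplex
        p = rule_probs @ w                               # P(y_k | x_k) for every example
        grad_w = (rule_probs / p[:, None]).sum(axis=0)   # dLL/dw_r
        grad_theta = w * (grad_w - w @ grad_w)           # chain rule through the softmax
        theta += lr * grad_theta
    return np.exp(theta) / np.exp(theta).sum()
```

  In line with the EM slide later on, only examples in which two or more rules are instantiated carry information about the weights, so a routine like this would typically be run on that subset.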

  16. Expectation-Maximization
  [Figure: the general unrolled network annotated for EM; each of the m1 instances of rule 1 enters its Mean node with weight 1/m1, each of the m2 instances of rule 2 with weight 1/m2, and the two rule-level Mean outputs are combined by a Weighted Mean with weights w1 and w2 to produce Y]

  17. EM Learning
  • Expectation step: compute the responsibilities of each instance of each rule
  • Maximization step: compute the maximum-likelihood parameters using the responsibilities as counts, where n is the number of examples with 2 or more rules instantiated (equations reconstructed below)
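  The slide's update equations did not transcribe; under the weighted-mean model, the rule-weight updates have the familiar mixture-model form. This is a reconstruction, shown at the rule level only, where γ_{r,k} is the responsibility of rule r for example k and m_{r,k} is the number of instances of rule r in example k:

```latex
\text{E-step:}\quad
\gamma_{r,k} \;=\;
  \frac{ w_r \,\frac{1}{m_{r,k}} \sum_{i=1}^{m_{r,k}} P_r(y_k \mid x_{r,i}) }
       { \sum_{r'} w_{r'} \,\frac{1}{m_{r',k}} \sum_{i=1}^{m_{r',k}} P_{r'}(y_k \mid x_{r',i}) }
\qquad
\text{M-step:}\quad
w_r \;=\; \frac{1}{n} \sum_{k} \gamma_{r,k}
```

  with n the number of examples in which 2 or more rules are instantiated, as stated on the slide.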

  18. Experimental Setup
  • 500 documents, 6 tasks, 2 roles, 11 folders
  • Each document typically has 1-2 task-role pairs
  • 25% of documents have a source folder
  • 10-fold cross validation
  Weighted Mean {
    If {task(t), doc(d), role(d,r,t)} then t.id, r.id Qinf (Mean) d.folder
    If {doc(s), doc(d), source(s,d)} then s.folder Qinf (Mean) d.folder
  }

  19. Folder Prediction Task
  • Metric: mean reciprocal rank, where ni is the number of times the true folder was ranked i (formula reconstructed below)
  • Propositional classifiers: decision trees and Naïve Bayes
  • Features are the number of occurrences of each task-role pair and the source document's folder
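  The formula itself is missing from the transcript; given the definition of n_i, the metric is presumably

```latex
\mathrm{MRR} \;=\; \frac{1}{\sum_i n_i} \sum_{i} \frac{n_i}{i}
```

  i.e., the average of 1/rank of the true folder over the test documents.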

  20. Lessons from Real-world Data
  • The propositional learners are almost as good as the first-order learners in this domain!
    • The number of parents is 1-2 in this domain
    • About ¾ of the time only one rule is applicable
    • Ranking of probabilities is easy in this case
  • Accurate modeling of the probabilities is still needed for:
    • Making predictions that combine with other predictions
    • Cost-sensitive decision making

  21. Synthetic Data Set
  • 2 rules with 2 inputs each: Wrule1 = 0.1, Wrule2 = 0.9
  • Probability that an example matches a rule = 0.5
  • If an example matches a rule, the number of instances is 3-10
  • Performance metric: average absolute error in the predicted probability (a generator sketch follows)
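  A minimal Python sketch of a data generator matching this description; the binary inputs, the particular shared CPD, and the renormalization over matched rules are illustrative assumptions (the slides do not specify them).

```python
import numpy as np

rng = np.random.default_rng(0)
W_RULES = np.array([0.1, 0.9])   # Wrule1, Wrule2 from the slide

def sample_example():
    """Draw one synthetic example: each rule matches with probability 0.5 and,
    when matched, contributes 3-10 instances combined by the mean rule."""
    matched_weights, rule_probs = [], []
    for w in W_RULES:
        if rng.random() < 0.5:                        # rule matches with prob 0.5
            m = rng.integers(3, 11)                   # 3-10 instances
            inputs = rng.integers(0, 2, size=(m, 2))  # 2 binary inputs per instance (assumed)
            p_inst = 0.2 + 0.6 * inputs.mean(axis=1)  # assumed shared CPD: P(y=1 | inputs)
            rule_probs.append(p_inst.mean())          # mean combining rule within the rule
            matched_weights.append(w)
    if not rule_probs:
        return None                                   # no rule matched; discard example
    w = np.array(matched_weights) / sum(matched_weights)   # renormalize over matched rules
    p_y = float(np.dot(w, rule_probs))                # weighted mean across rules
    return p_y, int(rng.random() < p_y)               # true probability and sampled label

# e.g. collect a training set of 1000 examples
data = [ex for ex in (sample_example() for _ in range(1000)) if ex is not None]
```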

  22. Synthetic Data Set - Results

  23. Synthetic Data Set - GDMS (gradient descent, mean squared error)

  24. Synthetic Data Set - GDLL (gradient descent, loglikelihood)

  25. Synthetic Data Set EM

  26. Conclusions
  • Introduced a general instance of the multiple parents problem in first-order probabilistic languages
  • Gradient descent and EM successfully learn the parameters of the conditional distributions as well as the parameters of the combining rules (the weights)
  • First-order methods significantly outperform propositional methods in modeling the distributions when the number of parents is ≥ 3

  27. Future Work
  • We plan to extend these results to more general classes of combining rules
  • Develop efficient inference algorithms with combining rules
  • Develop compelling applications
  • Combining rules and aggregators: can they both be understood as instances of causal independence?
