Learning First-Order Probabilistic Models with Combining Rules
Sriraam Natarajan, Prasad Tadepalli, Eric Altendorf, Thomas G. Dietterich, Alan Fern, Angelo Restificar
School of EECS, Oregon State University
First-Order Probabilistic Models
• Combine the expressiveness of first-order logic with the uncertainty modeling of graphical models
• Several formalisms already exist:
  • Probabilistic Relational Models (PRMs)
  • Bayesian Logic Programs (BLPs)
  • Stochastic Logic Programs (SLPs)
  • Relational Bayesian Networks (RBNs)
  • Probabilistic Logic Programs (PLPs), …
• Parameter sharing and quantification allow compact representation
Multiple Parents Problem
• Often multiple objects are related to an object by the same relationship:
  • One's friend's drinking habits influence one's own
  • A student's GPA depends on the grades in the courses he takes
  • The size of a mosquito population depends on the temperature and the rainfall each day since the last freeze
• The target variable in each of these statements has multiple influents (“parents” in Bayes net jargon)
Multiple Parents for Population
[Diagram: Temp1, Rain1, Temp2, Rain2, Temp3, Rain3 each point to a single Population node]
• Variable number of parents
• Large number of parents
• Need for compact parameterization
Solution 1: Aggregators
[Diagram: Temp1-Temp3 feed a deterministic AverageTemp node and Rain1-Rain3 feed a deterministic AverageRain node; the two averages stochastically determine Population]
• Problem: does not take into account the interaction between the related parents Temp and Rain
Solution 2: Combining Rules
[Diagram: (Temp1, Rain1) → Population1, (Temp2, Rain2) → Population2, (Temp3, Rain3) → Population3; the three distributions are combined into Population]
• The top 3 distributions share parameters
• The 3 distributions are combined into one final distribution
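To make this concrete, here is a minimal Python sketch (ours, not code from the talk) of a mean combining rule: each instantiation of a rule yields its own conditional distribution over the target, and the final distribution is their average. The numbers are invented for illustration.

import numpy as np

def mean_combine(instance_dists):
    # Average the per-instance distributions P_i(y | parents_i).
    # instance_dists: (num_instances, num_values); each row is a
    # distribution over the target variable.
    return np.asarray(instance_dists, dtype=float).mean(axis=0)

# Three (Temp, Rain) days, each producing a distribution over Population:
per_day = [[0.7, 0.2, 0.1],
           [0.5, 0.3, 0.2],
           [0.1, 0.3, 0.6]]
print(mean_combine(per_day))  # -> [0.433 0.267 0.3]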
First-Order Conditional Influence Language (FOCIL)
• Task and role of a document influence its folder:
if {task(t), doc(d), role(d,r,t)} then t.id, r.id Qinf d.folder
• The folder of the source of a document influences the folder of the document:
if {doc(d1), doc(d2), source(d1,d2)} then d1.folder Qinf d2.folder
• The difficulty of the course and the intelligence of the student influence his/her GPA:
if {student(s), course(c), takes(s,c)} then s.IQ, c.difficulty Qinf s.gpa
Combining Multiple Instances of a Single Statement
If {task(t), doc(d), role(d,r,t)} then t.id, r.id Qinf (Mean) d.folder
[Diagram: (t1.id, r1.id) and (t2.id, r2.id) each produce a distribution over d.folder; the two are combined by Mean into a single d.folder distribution]
A Different FOCIL Statement for the Same Target Variable
If {doc(s), doc(d), source(s,d)} then s.folder Qinf (Mean) d.folder
[Diagram: s1.folder and s2.folder each produce a distribution over d.folder; the two are combined by Mean into a single d.folder distribution]
Combining Multiple Statements
Weighted Mean {
  If {task(t), doc(d), role(d,r,t)} then t.id, r.id Qinf (Mean) d.folder
  If {doc(s), doc(d), source(s,d)} then s.folder Qinf (Mean) d.folder
}
“Unrolled” Network for Folder Prediction
[Diagram: (t1.id, r1.id) and (t2.id, r2.id) feed two d.folder distributions combined by one Mean; s1.folder and s2.folder feed two more d.folder distributions combined by a second Mean; the two Mean outputs are combined by Weighted Mean into the final d.folder]
General Unrolled Network
[Diagram: rule 1 has m1 instances with inputs X1_{1,1} … X1_{1,k} through X1_{m1,1} … X1_{m1,k}; rule 2 has m2 instances with inputs X2_{1,1} … X2_{m2,k}; each rule's per-instance distributions are combined by Mean, and the two rule-level distributions are combined by Weighted Mean into the target Y]
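A minimal sketch (ours, not the authors' code) of the prediction this unrolled network computes, assuming a mean within each rule and a weighted mean across rules; all names and shapes are illustrative.

import numpy as np

def predict(rule_instance_dists, weights):
    # rule_instance_dists: list over rules; element r is an (m_r, V)
    # array of per-instance distributions P_r(y | x_{r,i}).
    # weights: rule weights w_r, assumed to sum to 1.
    # Returns P(y | x) = sum_r w_r * (1/m_r) * sum_i P_r(y | x_{r,i}).
    per_rule = np.stack([np.asarray(d, dtype=float).mean(axis=0)
                         for d in rule_instance_dists])
    return np.asarray(weights) @ per_rule

rule1 = [[0.6, 0.4], [0.8, 0.2]]            # m_1 = 2 instances
rule2 = [[0.1, 0.9]]                        # m_2 = 1 instance
print(predict([rule1, rule2], [0.5, 0.5]))  # -> [0.4 0.6]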
Gradient Descent for Squared Error
• Squared error $E = \frac{1}{2}\sum_{e}\sum_{y}\big(I(y, y_e) - P(y \mid \mathbf{x}_e)\big)^2$, where $I(y, y_e) = 1$ if $y = y_e$ and 0 otherwise, and $P(y \mid \mathbf{x}_e) = \sum_{r} w_r \frac{1}{m_r}\sum_{i=1}^{m_r} P_r(y \mid \mathbf{x}^e_{r,i})$ is the weighted mean of the per-rule means
Gradient Descent for Loglikelihood
• Loglikelihood $LL = \sum_{e} \log P(y_e \mid \mathbf{x}_e)$, where $P(y_e \mid \mathbf{x}_e) = \sum_{r} w_r \frac{1}{m_r}\sum_{i=1}^{m_r} P_r(y_e \mid \mathbf{x}^e_{r,i})$ as above
Learning the Weights
• Mean squared error: $\frac{\partial E}{\partial w_r} = -\sum_{e}\sum_{y}\big(I(y, y_e) - P(y \mid \mathbf{x}_e)\big)\,\frac{1}{m_r}\sum_{i=1}^{m_r} P_r(y \mid \mathbf{x}^e_{r,i})$
• Loglikelihood: $\frac{\partial LL}{\partial w_r} = \sum_{e} \frac{1}{P(y_e \mid \mathbf{x}_e)}\,\frac{1}{m_r}\sum_{i=1}^{m_r} P_r(y_e \mid \mathbf{x}^e_{r,i})$
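For illustration, a hedged sketch of one gradient step on the rule weights under the loglikelihood objective; the clip-and-renormalize projection back onto the simplex is our simplification, not necessarily the scheme used in the talk.

import numpy as np

def weight_gradient_step(examples, weights, lr=0.05):
    # examples: list of (q, y) pairs, where q is an (R, V) array whose
    # row r is rule r's mean distribution over target values for this
    # example, and y is the index of the observed target value.
    w = np.asarray(weights, dtype=float)
    grad = np.zeros_like(w)
    for q, y in examples:
        q = np.asarray(q, dtype=float)
        p = float(w @ q[:, y])   # P(y|x) = sum_r w_r q_r(y)
        grad += q[:, y] / p      # d log P / d w_r = q_r(y) / P(y|x)
    w = np.clip(w + lr * grad, 1e-9, None)
    return w / w.sum()           # keep the weights on the simplex

examples = [(np.array([[0.7, 0.3], [0.1, 0.9]]), 0)]
print(weight_gradient_step(examples, [0.5, 0.5]))  # rule 1 gains weight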
Expectation-Maximization
[Diagram: the unrolled network viewed as a mixture; within rule 1 the instances are mixed with uniform weights 1/m1, within rule 2 with weights 1/m2, and the two rules are mixed with weights w1 and w2 to produce Y]
EM Learning
• Expectation step: compute the responsibility of each instance of each rule, $\gamma^e_{r,i} \propto w_r \frac{1}{m_r} P_r(y_e \mid \mathbf{x}^e_{r,i})$, normalized to sum to 1 over all $(r, i)$
• Maximization step: compute the maximum-likelihood parameters using the responsibilities as the counts, $w_r = \frac{1}{n}\sum_{e}\sum_{i}\gamma^e_{r,i}$, where n is the # of examples with 2 or more rules instantiated
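The sketch below (ours) implements one EM iteration for the rule weights, treating the choice of rule and instance as a latent mixture component; for brevity it averages over all supplied examples rather than only those with 2 or more rules instantiated, as the slide specifies.

import numpy as np

def em_step(examples, weights):
    # examples: list of (dists, y); dists is a list over rules, where
    # dists[r] is an (m_r, V) array of per-instance distributions and
    # y is the index of the observed target value.
    w = np.asarray(weights, dtype=float)
    resp_sum = np.zeros_like(w)
    for dists, y in examples:
        # E-step: rule r's responsibility is proportional to
        # w_r * (1/m_r) * sum_i P_r(y | x_{r,i})
        scores = np.array([w[r] * np.asarray(d, dtype=float)[:, y].mean()
                           for r, d in enumerate(dists)])
        resp_sum += scores / scores.sum()
    # M-step: new weights are the average responsibilities
    return resp_sum / len(examples)

dists = [[[0.6, 0.4], [0.8, 0.2]], [[0.1, 0.9]]]
print(em_step([(dists, 0)], [0.5, 0.5]))  # -> [0.875 0.125]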
Experimental Setup
• 500 documents, 6 tasks, 2 roles, 11 folders
• Each document typically has 1-2 task-role pairs
• 25% of documents have a source folder
• 10-fold cross-validation
Weighted Mean {
  If {task(t), doc(d), role(d,r,t)} then t.id, r.id Qinf (Mean) d.folder
  If {doc(s), doc(d), source(s,d)} then s.folder Qinf (Mean) d.folder
}
Folder Prediction Task
• Mean reciprocal rank $= \frac{1}{\sum_i n_i}\sum_{i} \frac{n_i}{i}$, where $n_i$ is the number of times the true folder was ranked as i
• Propositional classifiers: decision trees and naïve Bayes
• Features are the number of occurrences of each task-role pair and the folder of the source document
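A small sketch of the mean reciprocal rank as defined above; the counts in the example are invented, not results from the paper.

def mean_reciprocal_rank(rank_counts):
    # rank_counts: dict mapping rank i (1-based) to n_i, the number of
    # times the true folder was ranked at position i.
    total = sum(rank_counts.values())
    return sum(n_i / i for i, n_i in rank_counts.items()) / total

# e.g. true folder ranked 1st 400 times, 2nd 75 times, 3rd 25 times:
print(mean_reciprocal_rank({1: 400, 2: 75, 3: 25}))  # -> 0.8917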
Lessons from Real-World Data
• The propositional learners are almost as good as the first-order learners in this domain!
• The number of parents is 1-2 in this domain
• About ¾ of the time only one rule is applicable
• Ranking of probabilities is easy in this case
• Accurate modeling of the probabilities is still needed for:
  • Making predictions that combine with other predictions
  • Cost-sensitive decision making
Synthetic Data Set
• 2 rules with 2 inputs each: w_rule1 = 0.1, w_rule2 = 0.9
• Probability that an example matches a rule = 0.5
• If an example matches a rule, the number of instances is 3-10
• Performance metric: average absolute error in predicted probability
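A sketch (ours) of a generator matching this setup; only the 0.5 match probability, the two inputs per instance, and the 3-10 instance range come from the slide, while binary input values and all names are our assumptions.

import random

def sample_example():
    # Each rule independently matches the example with probability 0.5;
    # a matched rule contributes 3-10 instances of two inputs each.
    instances = {}
    for rule in ("rule1", "rule2"):
        if random.random() < 0.5:
            m = random.randint(3, 10)
            instances[rule] = [(random.randint(0, 1), random.randint(0, 1))
                               for _ in range(m)]  # binary inputs assumed
    return instances

print(sample_example())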
Conclusions
• Introduced a general instance of the multiple parents problem in first-order probabilistic languages
• Gradient descent and EM successfully learn the parameters of the conditional distributions as well as the parameters of the combining rules (weights)
• First-order methods significantly outperform propositional methods in modeling the distributions when the number of parents is ≥ 3
Future Work
• We plan to extend these results to more general classes of combining rules
• Develop efficient inference algorithms with combining rules
• Develop compelling applications
• Combining rules and aggregators: can they both be understood as instances of causal independence?