Learning First-Order Probabilistic Models with Combining Rules
Sriraam Natarajan, Prasad Tadepalli, Eric Altendorf, Thomas G. Dietterich, Alan Fern, Angelo Restificar
School of EECS, Oregon State University
First-Order Probabilistic Models
• Combine the expressiveness of first-order logic with the uncertainty modeling of graphical models
• Several formalisms already exist:
  • Probabilistic Relational Models (PRMs)
  • Bayesian Logic Programs (BLPs)
  • Stochastic Logic Programs (SLPs)
  • Relational Bayesian Networks (RBNs)
  • Probabilistic Logic Programs (PLPs), …
• Parameter sharing and quantification allow compact representation
Multiple Parents Problem
• Often multiple objects are related to an object by the same relationship:
  • One's friend's drinking habits influence one's own
  • A student's GPA depends on the grades in the courses he takes
  • The size of a mosquito population depends on the temperature and the rainfall each day since the last freeze
• The target variable in each of these statements has multiple influents (“parents” in Bayes net jargon)
Multiple Parents for Population
[Diagram: Temp1, Rain1, Temp2, Rain2, Temp3, Rain3 each point to a single Population node]
• Variable number of parents
• Large number of parents
• Need for compact parameterization
Solution 1: Aggregators
[Diagram: Temp1-Temp3 feed a deterministic AverageTemp node and Rain1-Rain3 feed a deterministic AverageRain node; the two averages stochastically determine Population]
• Problem: does not take into account the interaction between the related parents Temp and Rain
Solution 2: Combining Rules
[Diagram: (Temp1, Rain1) → Population1, (Temp2, Rain2) → Population2, (Temp3, Rain3) → Population3; the three distributions are combined into Population]
• The top 3 distributions share parameters
• The 3 distributions are combined into one final distribution
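To make this concrete, here is a minimal Python sketch (ours, not code from the talk) of a mean combining rule: each instantiation of a rule yields its own conditional distribution over the target, and the final distribution is their average. The numbers are invented for illustration.

import numpy as np

def mean_combine(instance_dists):
    # Average the per-instance distributions P_i(y | parents_i).
    # instance_dists: (num_instances, num_values); each row is a
    # distribution over the target variable.
    return np.asarray(instance_dists, dtype=float).mean(axis=0)

# Three (Temp, Rain) days, each producing a distribution over Population:
per_day = [[0.7, 0.2, 0.1],
           [0.5, 0.3, 0.2],
           [0.1, 0.3, 0.6]]
print(mean_combine(per_day))  # -> [0.433 0.267 0.3]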
First-Order Conditional Influence Language (FOCIL)
• Task and role of a document influence its folder:
if {task(t), doc(d), role(d,r,t)} then t.id, r.id Qinf d.folder
• The folder of the source of a document influences the folder of the document:
if {doc(d1), doc(d2), source(d1,d2)} then d1.folder Qinf d2.folder
• The difficulty of the course and the intelligence of the student influence his/her GPA:
if {student(s), course(c), takes(s,c)} then s.IQ, c.difficulty Qinf s.gpa
Combining Multiple Instances of a Single Statement
If {task(t), doc(d), role(d,r,t)} then t.id, r.id Qinf (Mean) d.folder
[Diagram: (t1.id, r1.id) and (t2.id, r2.id) each produce a distribution over d.folder; the two are combined by Mean into a single d.folder distribution]
A Different FOCIL Statement for the Same Target Variable
If {doc(s), doc(d), source(s,d)} then s.folder Qinf (Mean) d.folder
[Diagram: s1.folder and s2.folder each produce a distribution over d.folder; the two are combined by Mean into a single d.folder distribution]
Combining Multiple Statements
Weighted Mean {
  If {task(t), doc(d), role(d,r,t)} then t.id, r.id Qinf (Mean) d.folder
  If {doc(s), doc(d), source(s,d)} then s.folder Qinf (Mean) d.folder
}
“Unrolled” Network for Folder Prediction
[Diagram: (t1.id, r1.id) and (t2.id, r2.id) feed two d.folder distributions combined by one Mean; s1.folder and s2.folder feed two more d.folder distributions combined by a second Mean; the two Mean outputs are combined by Weighted Mean into the final d.folder]
General Unrolled Network
[Diagram: rule 1 has m1 instances with inputs X1_{1,1} … X1_{1,k} through X1_{m1,1} … X1_{m1,k}; rule 2 has m2 instances with inputs X2_{1,1} … X2_{m2,k}; each rule's per-instance distributions are combined by Mean, and the two rule-level distributions are combined by Weighted Mean into the target Y]
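A minimal sketch (ours, not the authors' code) of the prediction this unrolled network computes, assuming a mean within each rule and a weighted mean across rules; all names and shapes are illustrative.

import numpy as np

def predict(rule_instance_dists, weights):
    # rule_instance_dists: list over rules; element r is an (m_r, V)
    # array of per-instance distributions P_r(y | x_{r,i}).
    # weights: rule weights w_r, assumed to sum to 1.
    # Returns P(y | x) = sum_r w_r * (1/m_r) * sum_i P_r(y | x_{r,i}).
    per_rule = np.stack([np.asarray(d, dtype=float).mean(axis=0)
                         for d in rule_instance_dists])
    return np.asarray(weights) @ per_rule

rule1 = [[0.6, 0.4], [0.8, 0.2]]            # m_1 = 2 instances
rule2 = [[0.1, 0.9]]                        # m_2 = 1 instance
print(predict([rule1, rule2], [0.5, 0.5]))  # -> [0.4 0.6]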
Gradient Descent for Squared Error
• Squared error $E = \frac{1}{2}\sum_{e}\sum_{y}\big(I(y, y_e) - P(y \mid \mathbf{x}_e)\big)^2$, where $I(y, y_e) = 1$ if $y = y_e$ and 0 otherwise, and $P(y \mid \mathbf{x}_e) = \sum_{r} w_r \frac{1}{m_r}\sum_{i=1}^{m_r} P_r(y \mid \mathbf{x}^e_{r,i})$ is the weighted mean of the per-rule means
Gradient Descent for Loglikelihood
• Loglikelihood $LL = \sum_{e} \log P(y_e \mid \mathbf{x}_e)$, where $P(y_e \mid \mathbf{x}_e) = \sum_{r} w_r \frac{1}{m_r}\sum_{i=1}^{m_r} P_r(y_e \mid \mathbf{x}^e_{r,i})$ as above
Learning the Weights
• Mean squared error: $\frac{\partial E}{\partial w_r} = -\sum_{e}\sum_{y}\big(I(y, y_e) - P(y \mid \mathbf{x}_e)\big)\,\frac{1}{m_r}\sum_{i=1}^{m_r} P_r(y \mid \mathbf{x}^e_{r,i})$
• Loglikelihood: $\frac{\partial LL}{\partial w_r} = \sum_{e} \frac{1}{P(y_e \mid \mathbf{x}_e)}\,\frac{1}{m_r}\sum_{i=1}^{m_r} P_r(y_e \mid \mathbf{x}^e_{r,i})$
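For illustration, a hedged sketch of one gradient step on the rule weights under the loglikelihood objective; the clip-and-renormalize projection back onto the simplex is our simplification, not necessarily the scheme used in the talk.

import numpy as np

def weight_gradient_step(examples, weights, lr=0.05):
    # examples: list of (q, y) pairs, where q is an (R, V) array whose
    # row r is rule r's mean distribution over target values for this
    # example, and y is the index of the observed target value.
    w = np.asarray(weights, dtype=float)
    grad = np.zeros_like(w)
    for q, y in examples:
        q = np.asarray(q, dtype=float)
        p = float(w @ q[:, y])   # P(y|x) = sum_r w_r q_r(y)
        grad += q[:, y] / p      # d log P / d w_r = q_r(y) / P(y|x)
    w = np.clip(w + lr * grad, 1e-9, None)
    return w / w.sum()           # keep the weights on the simplex

examples = [(np.array([[0.7, 0.3], [0.1, 0.9]]), 0)]
print(weight_gradient_step(examples, [0.5, 0.5]))  # rule 1 gains weight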
Expectation-Maximization
[Diagram: the unrolled network viewed as a mixture; within rule 1 the instances are mixed with uniform weights 1/m1, within rule 2 with weights 1/m2, and the two rules are mixed with weights w1 and w2 to produce Y]
EM Learning
• Expectation step: compute the responsibility of each instance of each rule, $\gamma^e_{r,i} \propto w_r \frac{1}{m_r} P_r(y_e \mid \mathbf{x}^e_{r,i})$, normalized to sum to 1 over all $(r, i)$
• Maximization step: compute the maximum-likelihood parameters using the responsibilities as the counts, $w_r = \frac{1}{n}\sum_{e}\sum_{i}\gamma^e_{r,i}$, where n is the # of examples with 2 or more rules instantiated
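The sketch below (ours) implements one EM iteration for the rule weights, treating the choice of rule and instance as a latent mixture component; for brevity it averages over all supplied examples rather than only those with 2 or more rules instantiated, as the slide specifies.

import numpy as np

def em_step(examples, weights):
    # examples: list of (dists, y); dists is a list over rules, where
    # dists[r] is an (m_r, V) array of per-instance distributions and
    # y is the index of the observed target value.
    w = np.asarray(weights, dtype=float)
    resp_sum = np.zeros_like(w)
    for dists, y in examples:
        # E-step: rule r's responsibility is proportional to
        # w_r * (1/m_r) * sum_i P_r(y | x_{r,i})
        scores = np.array([w[r] * np.asarray(d, dtype=float)[:, y].mean()
                           for r, d in enumerate(dists)])
        resp_sum += scores / scores.sum()
    # M-step: new weights are the average responsibilities
    return resp_sum / len(examples)

dists = [[[0.6, 0.4], [0.8, 0.2]], [[0.1, 0.9]]]
print(em_step([(dists, 0)], [0.5, 0.5]))  # -> [0.875 0.125]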
Experimental Setup
• 500 documents, 6 tasks, 2 roles, 11 folders
• Each document typically has 1-2 task-role pairs
• 25% of documents have a source folder
• 10-fold cross-validation
Weighted Mean {
  If {task(t), doc(d), role(d,r,t)} then t.id, r.id Qinf (Mean) d.folder
  If {doc(s), doc(d), source(s,d)} then s.folder Qinf (Mean) d.folder
}
Folder Prediction Task
• Mean reciprocal rank $= \frac{1}{\sum_i n_i}\sum_{i} \frac{n_i}{i}$, where $n_i$ is the number of times the true folder was ranked as i
• Propositional classifiers: decision trees and naïve Bayes
• Features are the number of occurrences of each task-role pair and the folder of the source document
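A small sketch of the mean reciprocal rank as defined above; the counts in the example are invented, not results from the paper.

def mean_reciprocal_rank(rank_counts):
    # rank_counts: dict mapping rank i (1-based) to n_i, the number of
    # times the true folder was ranked at position i.
    total = sum(rank_counts.values())
    return sum(n_i / i for i, n_i in rank_counts.items()) / total

# e.g. true folder ranked 1st 400 times, 2nd 75 times, 3rd 25 times:
print(mean_reciprocal_rank({1: 400, 2: 75, 3: 25}))  # -> 0.8917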
Lessons from Real-World Data
• The propositional learners are almost as good as the first-order learners in this domain!
• The number of parents is 1-2 in this domain
• About ¾ of the time only one rule is applicable
• Ranking of probabilities is easy in this case
• Accurate modeling of the probabilities is still needed for:
  • Making predictions that combine with other predictions
  • Cost-sensitive decision making
Synthetic Data Set
• 2 rules with 2 inputs each: w_rule1 = 0.1, w_rule2 = 0.9
• Probability that an example matches a rule = 0.5
• If an example matches a rule, the number of instances is 3-10
• Performance metric: average absolute error in predicted probability
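A sketch (ours) of a generator matching this setup; only the 0.5 match probability, the two inputs per instance, and the 3-10 instance range come from the slide, while binary input values and all names are our assumptions.

import random

def sample_example():
    # Each rule independently matches the example with probability 0.5;
    # a matched rule contributes 3-10 instances of two inputs each.
    instances = {}
    for rule in ("rule1", "rule2"):
        if random.random() < 0.5:
            m = random.randint(3, 10)
            instances[rule] = [(random.randint(0, 1), random.randint(0, 1))
                               for _ in range(m)]  # binary inputs assumed
    return instances

print(sample_example())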
Conclusions
• Introduced a general instance of the multiple parents problem in first-order probabilistic languages
• Gradient descent and EM successfully learn the parameters of the conditional distributions as well as the parameters of the combining rules (weights)
• First-order methods significantly outperform propositional methods in modeling the distributions when the number of parents is ≥ 3
Future Work
• We plan to extend these results to more general classes of combining rules
• Develop efficient inference algorithms with combining rules
• Develop compelling applications
• Combining rules and aggregators: can they both be understood as instances of causal independence?