Practical Probabilistic Relational Learning
Sriraam Natarajan
Take-Away Message
Learn from rich, highly structured data!
Traditional Learning
Data is i.i.d.
[Figure: the classic alarm Bayesian network. Attribute-value data over the features Burglary, Earthquake, Alarm, JohnCalls, and MaryCalls, plus a learning algorithm, yields the network Burglary -> Alarm <- Earthquake, with Alarm -> JohnCalls and Alarm -> MaryCalls.]
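As a concrete illustration of the i.i.d. setting, the sketch below estimates one conditional probability table of the alarm network, P(Alarm | Burglary, Earthquake), by simple counting over fixed-length attribute rows. The data and counting scheme are invented for illustration, not taken from the talk.

```python
# Each example is one fixed-length row of attribute values; a CPT entry
# P(Alarm = 1 | Burglary, Earthquake) is the fraction of matching rows
# in which the alarm went off. Data is hypothetical.
from collections import Counter

rows = [  # (burglary, earthquake, alarm)
    (0, 0, 0), (0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 1), (0, 0, 0),
]

counts, totals = Counter(), Counter()
for b, e, a in rows:
    totals[(b, e)] += 1   # how often this parent configuration occurs
    counts[(b, e)] += a   # how often the alarm fires given it

for parents in sorted(totals):
    print(parents, "P(alarm=1) =", counts[parents] / totals[parents])
```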
Real-World Problem: Predicting Adverse Drug Reactions

Patient Table
PatientID | Gender | Birthdate
P1 | M | 3/22/63

Visit Table
PatientID | Date | Physician | Symptoms | Diagnosis
P1 | 1/1/01 | Smith | palpitations | hypoglycemic
P1 | 2/1/03 | Jones | fever, aches | influenza

Lab Tests
PatientID | Date | Lab Test | Result
P1 | 1/1/01 | blood glucose | 42
P1 | 1/9/01 | blood glucose | 45

SNP Table
PatientID | SNP1 | SNP2 | … | SNP500K
P1 | AA | AB | … | BB
P2 | AB | BB | … | AA

Prescriptions
PatientID | Date Prescribed | Date Filled | Physician | Medication | Dose | Duration
P1 | 5/17/98 | 5/18/98 | Jones | prilosec | 10mg | 3 months
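The point of this example is that the data does not fit into a single feature vector: a patient has any number of visits, lab tests, and prescriptions. A minimal sketch of the relational view, with hypothetical relation names and only a few of the rows above: every table row becomes a ground fact relation(arg1, arg2, ...).

```python
# Relational view of the multi-table patient data: each row is one ground
# fact, so variable-length histories pose no representational problem.
# Relation names are hypothetical; rows are abridged from the tables above.
facts = [
    ("patient",      ("P1", "M", "3/22/63")),
    ("visit",        ("P1", "1/1/01", "Smith", "palpitations", "hypoglycemic")),
    ("visit",        ("P1", "2/1/03", "Jones", "fever, aches", "influenza")),
    ("lab_test",     ("P1", "1/1/01", "blood glucose", 42)),
    ("lab_test",     ("P1", "1/9/01", "blood glucose", 45)),
    ("prescription", ("P1", "5/17/98", "5/18/98", "Jones", "prilosec", "10mg", "3 months")),
]

def query(relation, facts):
    """All ground tuples of a relation, e.g. every lab test on record."""
    return [args for rel, args in facts if rel == relation]

print(query("lab_test", facts))   # a patient can have any number of lab tests
```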
Logic + Probability = Probabilistic Logic aka Statistical Relational Learning Models
Statistical Relational Learning (SRL)
[Figure: two routes to SRL: start from Logic and add probabilities, or start from Probabilities and add relations.]
• Several SRL workshops in the past decade
• This year: StaRAI @ AAAI 2013
[Figure: positioning along three axes: propositional vs. first-order representation, deterministic vs. stochastic, and no learning vs. learning.]

 | Propositional | First-Order
Deterministic, no learning | Propositional Logic | First-Order Logic
Stochastic, no learning | Probability Theory | Probabilistic Logic
Deterministic, learning | Propositional Rule Learning | Inductive Logic Programming
Stochastic, learning | Classical Machine Learning | Statistical Relational Learning
Costs and Benefits of the SRL soup
• Benefits
  • Rich pool of different languages
  • Very likely that there is a language that fits your task at hand well
  • A lot of research remains to be done ;-)
• Costs
  • "Learning" SRL is much harder
  • Not all frameworks support all kinds of inference and learning settings
How do we actually learn relational models from data?
Why is this problem hard?
• Non-convex problem
• Repeated search over parameters for every step in the induction of the model
• First-order logic allows for different levels of generalization
• Repeated inference for every step of parameter learning
• Inference is #P-complete
• How can we scale this?
Relational Probability Trees [Blockeel & De Raedt '98]
To predict heartAttack(X):
• Each conditional probability distribution can be learned as a tree
• Leaves are probabilities
• The final model is the set of relational regression trees (RRTs)
[Figure: a tree for heartAttack(X) with internal tests male(X); chol(X,Y,L), Y>40, L>200; diag(X,Hypertension,Z), Z>55; and bmi(X,W,55), W>30, and with leaf probabilities 0.8, 0.77, 0.3, and 0.05.]
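The sketch below renders such a tree as ordinary nested conditionals, with the first-order tests evaluated against one individual's ground facts. The tests and leaf probabilities are taken from the figure, but the exact yes/no wiring of the slide's tree is not fully recoverable, so the routing here is illustrative.

```python
# One relational probability tree for heartAttack(X), written as nested
# conditionals. Internal nodes test first-order conditions against X's
# ground facts; leaves return probabilities. Routing is illustrative.
def p_heart_attack(patient):
    """Walk the tree for one individual X; `patient` holds X's ground facts."""
    if patient.get("male", False):
        # chol(X, Age, Level) with Age > 40 and Level > 200
        if any(age > 40 and level > 200 for age, level in patient.get("chol", [])):
            # diag(X, Hypertension, Age) with Age > 55
            if any(age > 55 for age in patient.get("hypertension_diag_ages", [])):
                return 0.8
            return 0.77
        return 0.3
    # bmi(X, Weight, ...) with Weight > 30
    if any(w > 30 for w in patient.get("bmi_weights", [])):
        return 0.3
    return 0.05

example = {"male": True, "chol": [(45, 230)], "hypertension_diag_ages": [60]}
print(p_heart_attack(example))   # 0.8
```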
Gradient (Tree) Boosting [Friedman, Annals of Statistics 29(5):1189-1232, 2001]
• Model = weighted combination of a large number of small trees (models)
• Intuition: generate an additive model by sequentially fitting small trees to the pseudo-residuals of a regression at each iteration
[Figure: the boosting loop: start from an initial model, compute residuals (data minus predictions under the loss function), induce a tree on the residuals, and iterate; the final model is the initial model plus the sum of all fitted trees.]
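A minimal sketch of this loop for binary classification, using plain scikit-learn regression trees as stand-ins for the relational trees of the talk; the function names and data are hypothetical. Each round fits a small tree to the pseudo-residuals y - p and adds its predictions to the running model.

```python
# Friedman-style gradient boosting: sequentially fit small regression
# trees to pseudo-residuals and accumulate them into an additive model.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def boost(X, y, rounds=10, lr=1.0):
    f = np.zeros(len(y))                     # initial model: all-zero scores
    trees = []
    for _ in range(rounds):
        p = 1.0 / (1.0 + np.exp(-f))         # current probability estimates
        residuals = y - p                    # pseudo-residuals (gradient of log-loss)
        tree = DecisionTreeRegressor(max_depth=2).fit(X, residuals)
        f += lr * tree.predict(X)            # additive update
        trees.append(tree)
    return trees

def predict(trees, X, lr=1.0):
    """Final model = sum of the fitted trees' scores, squashed to a probability."""
    f = sum(lr * t.predict(X) for t in trees)
    return 1.0 / (1.0 + np.exp(-f))

X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0, 0, 1, 1])
print(predict(boost(X, y), X))               # probabilities approach y
```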
Boosting Results (MLJ '11)
[Figure: results on four tasks: predicting the advisor of a student, movie recommendation, citation analysis, and machine reading.]
Other Applications
Similar results in several other problems:
• Imitation learning: learning how to act from demonstrations (Natarajan et al., IJCAI '11) on RoboCup, a grid-world domain, a traffic-signal domain, and blocks world
• Prediction of CAC levels: predicting cardiovascular risk in young adults (Natarajan et al., IAAI '13)
• Prediction of heart attacks (Weiss et al., IAAI '12; AI Magazine '12)
• Prediction of the onset of Alzheimer's (Natarajan et al., ICMLA '12; Natarajan et al., IJMLC 2013)
Stochastic ML + Statistical Relational + Parallel
[Figure: stochastic ML scales well (stochastic gradients, online learning, ...); statistical relational offers symmetries, compact models, and lifted inference; the goal is to combine both with parallel computation.]
[Figure: growing a tree of clauses over ground atoms P(Anna), P(Bob), HI(Anna), HI(Bob): starting from a root clause, neighboring clauses such as P(Anna), !P(Bob) => HI(Bob) and P(Bob) => !HI(Anna) are added to form a tree (a set of clauses); the variabilized tree replaces constants with variables, giving clauses such as P(X), !P(Y) => HI(Y) and P(Y) => !HI(X).]
Lifted Training
Generate initial tree pieces and variabilize their arguments.
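A minimal sketch (clause encoding hypothetical) of the variabilization step: replace each distinct constant in a ground clause with a fresh logical variable, so that P(Anna), !P(Bob) => HI(Bob) becomes P(X), !P(Y) => HI(Y).

```python
# Variabilize a ground clause: map each distinct constant to a fresh
# variable, applied consistently across all literals of the clause.
def variabilize(clause):
    """`clause` is a list of (predicate, args) literals over constants."""
    mapping = {}
    variables = iter("XYZUVW")   # fresh variable names, one per constant
    out = []
    for pred, args in clause:
        new_args = []
        for a in args:
            if a not in mapping:
                mapping[a] = next(variables)
            new_args.append(mapping[a])
        out.append((pred, tuple(new_args)))
    return out

ground = [("P", ("Anna",)), ("!P", ("Bob",)), ("HI", ("Bob",))]
print(variabilize(ground))
# [('P', ('X',)), ('!P', ('Y',)), ('HI', ('Y',))]
```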
Challenges
• Message schedules
• Iterative MapReduce?
• How do we take this idea to learning the models?
• How can we more efficiently parallelize symmetry identification?
• What are the compelling problems? Vision, NLP, …
Conclusion
• The world is inherently relational and uncertain
• SRL has developed into an exciting field in the past decade
  • Several previous SRL workshops
• Boosting relational models has shown promising initial results
  • Applied to several different problems
  • First scalable relational learning algorithm
• How can we parallelize/scale this algorithm?
• Can this benefit from an inference algorithm like belief propagation that can be parallelized easily?