Practical Probabilistic Relational Learning
Sriraam Natarajan
Take-Away Message
Learn from rich, highly structured data!
Traditional Learning
Data is i.i.d.
[Figure: the classic alarm Bayesian network. Attribute-value data over the features Burglary, Earthquake, Alarm, JohnCalls, and MaryCalls, plus a learning algorithm, yields the network Burglary -> Alarm <- Earthquake, with Alarm -> JohnCalls and Alarm -> MaryCalls.]
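As a concrete illustration of the i.i.d. setting, the sketch below estimates one conditional probability table of the alarm network, P(Alarm | Burglary, Earthquake), by simple counting over fixed-length attribute rows. The data and counting scheme are invented for illustration, not taken from the talk.

```python
# Each example is one fixed-length row of attribute values; a CPT entry
# P(Alarm = 1 | Burglary, Earthquake) is the fraction of matching rows
# in which the alarm went off. Data is hypothetical.
from collections import Counter

rows = [  # (burglary, earthquake, alarm)
    (0, 0, 0), (0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 1), (0, 0, 0),
]

counts, totals = Counter(), Counter()
for b, e, a in rows:
    totals[(b, e)] += 1   # how often this parent configuration occurs
    counts[(b, e)] += a   # how often the alarm fires given it

for parents in sorted(totals):
    print(parents, "P(alarm=1) =", counts[parents] / totals[parents])
```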
Real-World Problem: Predicting Adverse Drug Reactions

Patient Table
PatientID | Gender | Birthdate
P1 | M | 3/22/63

Visit Table
PatientID | Date | Physician | Symptoms | Diagnosis
P1 | 1/1/01 | Smith | palpitations | hypoglycemic
P1 | 2/1/03 | Jones | fever, aches | influenza

Lab Tests
PatientID | Date | Lab Test | Result
P1 | 1/1/01 | blood glucose | 42
P1 | 1/9/01 | blood glucose | 45

SNP Table
PatientID | SNP1 | SNP2 | … | SNP500K
P1 | AA | AB | … | BB
P2 | AB | BB | … | AA

Prescriptions
PatientID | Date Prescribed | Date Filled | Physician | Medication | Dose | Duration
P1 | 5/17/98 | 5/18/98 | Jones | prilosec | 10mg | 3 months
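The point of this example is that the data does not fit into a single feature vector: a patient has any number of visits, lab tests, and prescriptions. A minimal sketch of the relational view, with hypothetical relation names and only a few of the rows above: every table row becomes a ground fact relation(arg1, arg2, ...).

```python
# Relational view of the multi-table patient data: each row is one ground
# fact, so variable-length histories pose no representational problem.
# Relation names are hypothetical; rows are abridged from the tables above.
facts = [
    ("patient",      ("P1", "M", "3/22/63")),
    ("visit",        ("P1", "1/1/01", "Smith", "palpitations", "hypoglycemic")),
    ("visit",        ("P1", "2/1/03", "Jones", "fever, aches", "influenza")),
    ("lab_test",     ("P1", "1/1/01", "blood glucose", 42)),
    ("lab_test",     ("P1", "1/9/01", "blood glucose", 45)),
    ("prescription", ("P1", "5/17/98", "5/18/98", "Jones", "prilosec", "10mg", "3 months")),
]

def query(relation, facts):
    """All ground tuples of a relation, e.g. every lab test on record."""
    return [args for rel, args in facts if rel == relation]

print(query("lab_test", facts))   # a patient can have any number of lab tests
```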
Logic + Probability = Probabilistic Logic aka Statistical Relational Learning Models
Statistical Relational Learning (SRL)
[Figure: two routes to SRL: start from Logic and add probabilities, or start from Probabilities and add relations.]
• Several SRL workshops in the past decade
• This year: StaRAI @ AAAI 2013
[Figure: positioning along three axes: propositional vs. first-order representation, deterministic vs. stochastic, and no learning vs. learning.]

 | Propositional | First-Order
Deterministic, no learning | Propositional Logic | First-Order Logic
Stochastic, no learning | Probability Theory | Probabilistic Logic
Deterministic, learning | Propositional Rule Learning | Inductive Logic Programming
Stochastic, learning | Classical Machine Learning | Statistical Relational Learning
Costs and Benefits of the SRL soup
• Benefits
  • Rich pool of different languages
  • Very likely that there is a language that fits your task at hand well
  • A lot of research remains to be done ;-)
• Costs
  • "Learning" SRL is much harder
  • Not all frameworks support all kinds of inference and learning settings
How do we actually learn relational models from data?
Why is this problem hard?
• Non-convex problem
• Repeated search over parameters for every step in the induction of the model
• First-order logic allows for different levels of generalization
• Repeated inference for every step of parameter learning
• Inference is #P-complete
• How can we scale this?
Relational Probability Trees [Blockeel & De Raedt '98]
To predict heartAttack(X):
• Each conditional probability distribution can be learned as a tree
• Leaves are probabilities
• The final model is the set of relational regression trees (RRTs)
[Figure: a tree for heartAttack(X) with internal tests male(X); chol(X,Y,L), Y>40, L>200; diag(X,Hypertension,Z), Z>55; and bmi(X,W,55), W>30, and with leaf probabilities 0.8, 0.77, 0.3, and 0.05.]
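The sketch below renders such a tree as ordinary nested conditionals, with the first-order tests evaluated against one individual's ground facts. The tests and leaf probabilities are taken from the figure, but the exact yes/no wiring of the slide's tree is not fully recoverable, so the routing here is illustrative.

```python
# One relational probability tree for heartAttack(X), written as nested
# conditionals. Internal nodes test first-order conditions against X's
# ground facts; leaves return probabilities. Routing is illustrative.
def p_heart_attack(patient):
    """Walk the tree for one individual X; `patient` holds X's ground facts."""
    if patient.get("male", False):
        # chol(X, Age, Level) with Age > 40 and Level > 200
        if any(age > 40 and level > 200 for age, level in patient.get("chol", [])):
            # diag(X, Hypertension, Age) with Age > 55
            if any(age > 55 for age in patient.get("hypertension_diag_ages", [])):
                return 0.8
            return 0.77
        return 0.3
    # bmi(X, Weight, ...) with Weight > 30
    if any(w > 30 for w in patient.get("bmi_weights", [])):
        return 0.3
    return 0.05

example = {"male": True, "chol": [(45, 230)], "hypertension_diag_ages": [60]}
print(p_heart_attack(example))   # 0.8
```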
Gradient (Tree) Boosting [Friedman, Annals of Statistics 29(5):1189-1232, 2001]
• Model = weighted combination of a large number of small trees (models)
• Intuition: generate an additive model by sequentially fitting small trees to the pseudo-residuals of a regression at each iteration
[Figure: the boosting loop: start from an initial model, compute residuals (data minus predictions under the loss function), induce a tree on the residuals, and iterate; the final model is the initial model plus the sum of all fitted trees.]
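A minimal sketch of this loop for binary classification, using plain scikit-learn regression trees as stand-ins for the relational trees of the talk; the function names and data are hypothetical. Each round fits a small tree to the pseudo-residuals y - p and adds its predictions to the running model.

```python
# Friedman-style gradient boosting: sequentially fit small regression
# trees to pseudo-residuals and accumulate them into an additive model.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def boost(X, y, rounds=10, lr=1.0):
    f = np.zeros(len(y))                     # initial model: all-zero scores
    trees = []
    for _ in range(rounds):
        p = 1.0 / (1.0 + np.exp(-f))         # current probability estimates
        residuals = y - p                    # pseudo-residuals (gradient of log-loss)
        tree = DecisionTreeRegressor(max_depth=2).fit(X, residuals)
        f += lr * tree.predict(X)            # additive update
        trees.append(tree)
    return trees

def predict(trees, X, lr=1.0):
    """Final model = sum of the fitted trees' scores, squashed to a probability."""
    f = sum(lr * t.predict(X) for t in trees)
    return 1.0 / (1.0 + np.exp(-f))

X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0, 0, 1, 1])
print(predict(boost(X, y), X))               # probabilities approach y
```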
Boosting Results (MLJ '11)
[Figure: results on four tasks: predicting the advisor of a student, movie recommendation, citation analysis, and machine reading.]
Other Applications
Similar results in several other problems:
• Imitation learning: learning how to act from demonstrations (Natarajan et al., IJCAI '11) on RoboCup, a grid-world domain, a traffic-signal domain, and blocks world
• Prediction of CAC levels: predicting cardiovascular risk in young adults (Natarajan et al., IAAI '13)
• Prediction of heart attacks (Weiss et al., IAAI '12; AI Magazine '12)
• Prediction of the onset of Alzheimer's (Natarajan et al., ICMLA '12; Natarajan et al., IJMLC 2013)
Stochastic ML + Statistical Relational + Parallel
[Figure: stochastic ML scales well (stochastic gradients, online learning, ...); statistical relational offers symmetries, compact models, and lifted inference; the goal is to combine both with parallel computation.]
[Figure: growing a tree of clauses over ground atoms P(Anna), P(Bob), HI(Anna), HI(Bob): starting from a root clause, neighboring clauses such as P(Anna), !P(Bob) => HI(Bob) and P(Bob) => !HI(Anna) are added to form a tree (a set of clauses); the variabilized tree replaces constants with variables, giving clauses such as P(X), !P(Y) => HI(Y) and P(Y) => !HI(X).]
Lifted Training
Generate initial tree pieces and variabilize their arguments.
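A minimal sketch (clause encoding hypothetical) of the variabilization step: replace each distinct constant in a ground clause with a fresh logical variable, so that P(Anna), !P(Bob) => HI(Bob) becomes P(X), !P(Y) => HI(Y).

```python
# Variabilize a ground clause: map each distinct constant to a fresh
# variable, applied consistently across all literals of the clause.
def variabilize(clause):
    """`clause` is a list of (predicate, args) literals over constants."""
    mapping = {}
    variables = iter("XYZUVW")   # fresh variable names, one per constant
    out = []
    for pred, args in clause:
        new_args = []
        for a in args:
            if a not in mapping:
                mapping[a] = next(variables)
            new_args.append(mapping[a])
        out.append((pred, tuple(new_args)))
    return out

ground = [("P", ("Anna",)), ("!P", ("Bob",)), ("HI", ("Bob",))]
print(variabilize(ground))
# [('P', ('X',)), ('!P', ('Y',)), ('HI', ('Y',))]
```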
Challenges
• Message schedules
• Iterative MapReduce?
• How do we take this idea to learning the models?
• How can we more efficiently parallelize symmetry identification?
• What are the compelling problems? Vision, NLP, …
Conclusion
• The world is inherently relational and uncertain
• SRL has developed into an exciting field in the past decade
  • Several previous SRL workshops
• Boosting relational models has shown promising initial results
  • Applied to several different problems
  • First scalable relational learning algorithm
• How can we parallelize/scale this algorithm?
• Can this benefit from an inference algorithm like belief propagation that can be parallelized easily?