Statistical Relational Learning: An Introduction Lise Getoor University of Maryland, College Park September 5, 2007 Progic 2007
Acknowledgements • Statistical Relational Learning (SRL) is a synthesis of ideas from many individuals who have participated in various SRL events, workshops and classes: • Hendrik Blockeel, Mark Craven, James Cussens, Bruce D’Ambrosio, Luc De Raedt, Tom Dietterich, Pedro Domingos, Saso Dzeroski, Peter Flach, Rob Holte, Manfred Jaeger, David Jensen, Kristian Kersting, Daphne Koller, Heikki Mannila, Andrew McCallum, Tom Mitchell, Ray Mooney, Stephen Muggleton, Kevin Murphy, Jen Neville, David Page, Avi Pfeffer, Claudia Perlich, David Poole, Foster Provost, Dan Roth, Stuart Russell, Taisuke Sato, Jude Shavlik, Ben Taskar, Lyle Ungar and many others…
Why SRL? • Traditional statistical machine learning approaches assume: • A random sample of homogeneous objects from a single relation • Traditional relational learning approaches assume: • No noise or uncertainty in the data • Real-world data sets are: • Multi-relational and heterogeneous • Noisy and uncertain • Statistical Relational Learning (SRL): • A newly emerging research area at the intersection of statistical models and relational learning/inductive logic programming • Sample Domains: • web data, social networks, biological data, communication data, customer networks, sensor networks, natural language, vision, …
SRL Theory • Methods that combine expressive knowledge representation formalisms such as relational and first-order logic with principled probabilistic and statistical approaches to inference and learning • Directed Approaches • Semantics based on Bayesian Networks • Frame-based Directed Models • Rule-based Directed Models • Undirected Approaches • Semantics based on Markov Networks • Frame-based Undirected Models • Rule-based Undirected Models • Process-based Approaches
Directed Frame-based Approaches • Probabilistic Relational Models (PRMs) • Representation & Inference [Koller & Pfeffer 98, Pfeffer, Koller, Milch & Takusagawa 99, Pfeffer 00] • Learning [Friedman et al. 99, Getoor, Friedman, Koller & Taskar 01 & 02, Getoor 01] • Probabilistic Entity-Relationship Models (PERs) • Representation [Heckerman, Meek & Koller 04] • Logical syntax for PRMs (PRL) [Getoor & Grant 06]
Probabilistic Relational Models • BN Tutorial • PRMs w/ Attribute Uncertainty • Inference in PRMs • Learning in PRMs • PRMs w/ Structural Uncertainty • PRMs w/ Class Hierarchies
Bayesian Networks • nodes = domain variables (Smart, Good Writer, Reviewer Mood, Quality, Review Length, Accepted) • edges = direct causal influence • Each node carries a conditional probability table (CPT); e.g. P(Q | W, S) has one row per joint assignment to W and S, with entries 0.6/0.4, 0.3/0.7, 0.4/0.6, 0.1/0.9 • Network structure encodes conditional independencies: I(Review-Length, Good-Writer | Reviewer-Mood)
BN Semantics • conditional independencies in BN structure + local CPTs = full joint distribution over the domain • Compact & natural representation: • if every node has at most k parents, O(2^k · n) parameters suffice vs. O(2^n) for an unfactored joint • natural parameters
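For the running review example, the factorization can be written out explicitly; the edge set used here is my reading of the CPTs shown on the surrounding slides (Writer and Smart into Quality, Writer into Mood, Mood into Length, Quality and Mood into Accepted), so treat it as an assumed illustration:

\[
P(S, W, M, Q, L, A) \;=\; P(S)\,P(W)\,P(M \mid W)\,P(Q \mid W, S)\,P(L \mid M)\,P(A \mid Q, M)
\]

With six binary variables this factorization needs 1 + 1 + 2 + 4 + 2 + 4 = 14 independent parameters, versus 2^6 − 1 = 63 for the unfactored joint.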
Reasoning in BNs • Full joint distribution answers any query: P(event | evidence) • Allows combination of different types of reasoning: • Causal: P(Reviewer-Mood | Good-Writer) • Evidential: P(Reviewer-Mood | not Accepted) • Intercausal: P(Reviewer-Mood | not Accepted, Quality)
Variable Elimination • To compute a query probability, variable elimination manipulates factors • A factor is a function from values of variables to positive real numbers, e.g. a factor over (mood, good writer): (pissy, false) → 0.9, (pissy, true) → 0.1, (good, false) → 0.7, (good, true) → 0.3
Variable Elimination (continued) • To compute the query, factors are combined and variables are eliminated one at a time: • sum out l, producing a new factor • multiply the relevant factors together, then sum out w, producing a new factor • repeat until only the query variables remain
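As a minimal, self-contained sketch of those factor operations (the variable names, numbers, and the tiny query below are my own illustrative choices, not the slide's):

```python
# A minimal variable-elimination sketch over factors stored as dicts.
# Factor values and variable names are illustrative assumptions.
from itertools import product

def multiply(f1, vars1, f2, vars2):
    """Multiply two factors; returns (factor, variable order)."""
    out_vars = vars1 + [v for v in vars2 if v not in vars1]
    out = {}
    for assign in product([False, True], repeat=len(out_vars)):
        a = dict(zip(out_vars, assign))
        k1 = tuple(a[v] for v in vars1)
        k2 = tuple(a[v] for v in vars2)
        out[assign] = f1[k1] * f2[k2]
    return out, out_vars

def sum_out(f, vars_, var):
    """Sum a variable out of a factor, producing a new, smaller factor."""
    i = vars_.index(var)
    out = {}
    for assign, val in f.items():
        key = assign[:i] + assign[i + 1:]
        out[key] = out.get(key, 0.0) + val
    return out, vars_[:i] + vars_[i + 1:]

# Tiny example: compute the marginal over M from P(W) and P(M | W)
p_w = {(True,): 0.8, (False,): 0.2}
p_m_given_w = {(True, True): 0.7, (True, False): 0.3,     # keyed by (W, M)
               (False, True): 0.1, (False, False): 0.9}
joint, jvars = multiply(p_w, ["W"], p_m_given_w, ["W", "M"])
p_m, _ = sum_out(joint, jvars, "W")
print(p_m)   # marginal over M: P(M=t) ≈ 0.58, P(M=f) ≈ 0.42
```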
Other Inference Algorithms • Exact • Junction Tree [Lauritzen & Spiegelhalter 88] • Cutset Conditioning [Pearl 87] • Approximate • Loopy Belief Propagation [McEliece et al 98] • Likelihood Weighting [Shwe & Cooper 91] • Markov Chain Monte Carlo [e.g. MacKay 98] • Gibbs Sampling [Geman & Geman 84] • Metropolis-Hastings [Metropolis et al 53, Hastings 70] • Variational Methods [Jordan et al 98]
Learning BNs • The learning task splits along two dimensions: whether only the parameters or both structure and parameters must be learned, and whether the data is complete or incomplete • See [Heckerman 98] for a general introduction
BN Parameter Estimation • Assume a known dependency structure G • Goal: estimate the BN parameters θ (the entries in the local probability models) • θ is good if it is likely to generate the observed data • MLE Principle: choose θ* so as to maximize the likelihood of the data • Alternative: incorporate a prior over θ
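Written out (in my notation, not the slide's), the MLE principle reads:

\[
\theta^{*} \;=\; \arg\max_{\theta}\; \ell(\theta : D) \;=\; \arg\max_{\theta} \sum_{d \in D} \log P(d \mid G, \theta)
\]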
Learning With Complete Data • Fully observed data: the data consists of a set of instances, each with a value for every BN variable • With fully observed data we can compute the counts N(x, u) = number of instances with X = x and parents Pa(X) = u, and similarly for the other counts N(u) • We then estimate θ̂(x | u) = N(x, u) / N(u)
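A minimal sketch of this count-and-divide estimation; the toy structure and data records below are illustrative assumptions:

```python
# Maximum-likelihood CPT estimation from fully observed data.
from collections import Counter

# Structure: each variable maps to its list of parents (a toy fragment only)
parents = {"smart": [], "quality": ["smart"], "accepted": ["quality"]}

# Fully observed instances: every variable has a value in every record
data = [
    {"smart": True,  "quality": True,  "accepted": True},
    {"smart": True,  "quality": False, "accepted": False},
    {"smart": False, "quality": False, "accepted": False},
    {"smart": True,  "quality": True,  "accepted": True},
]

def mle_cpts(parents, data):
    cpts = {}
    for var, pa in parents.items():
        joint, marg = Counter(), Counter()      # N(x, u) and N(u)
        for d in data:
            u = tuple(d[p] for p in pa)
            joint[(d[var], u)] += 1
            marg[u] += 1
        # theta_hat(x | u) = N(x, u) / N(u)
        cpts[var] = {(x, u): n / marg[u] for (x, u), n in joint.items()}
    return cpts

print(mle_cpts(parents, data))
```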
Dealing w/ missing values • Can't compute the counts directly when some values are unobserved • But can use Expectation Maximization (EM): • Given parameter values, compute expected counts (this requires BN inference) • Given expected counts, estimate parameters as if the expected counts were observed counts • Begin with arbitrary parameter values • Iterate these two steps • Converges to a local maximum of the likelihood
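A minimal EM sketch for a two-node network X -> Y in which X is sometimes unobserved; the data and starting parameters are illustrative assumptions, and the E-step's inference reduces to Bayes' rule because the network is so small:

```python
# EM for the two-node network X -> Y with X occasionally missing (None).
data = [
    (True, True), (True, False), (None, True),
    (False, False), (None, False), (True, True),
]

# Parameters: p_x = P(X=t); p_y[x] = P(Y=t | X=x)
p_x, p_y = 0.5, {True: 0.5, False: 0.5}

for _ in range(50):
    # E-step: expected counts, using the posterior P(X | Y) for missing X
    n_x, n = 0.0, 0.0
    n_xy = {True: 0.0, False: 0.0}     # expected count of (X=x, Y=t)
    n_xt = {True: 0.0, False: 0.0}     # expected count of X=x
    for x, y in data:
        if x is None:
            num = p_x * (p_y[True] if y else 1 - p_y[True])
            den = num + (1 - p_x) * (p_y[False] if y else 1 - p_y[False])
            w = num / den              # P(X=t | Y=y) under current parameters
        else:
            w = 1.0 if x else 0.0
        n += 1
        n_x += w
        for xv, wx in ((True, w), (False, 1 - w)):
            n_xt[xv] += wx
            if y:
                n_xy[xv] += wx
    # M-step: re-estimate the parameters from the expected counts
    p_x = n_x / n
    p_y = {xv: n_xy[xv] / n_xt[xv] for xv in (True, False)}

print(p_x, p_y)
```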
Structure search • Begin with an empty network • Consider all acyclic neighbors reached by a search operator: • add an edge • remove an edge • reverse an edge • For each neighbor: • compute ML parameter values • compute the neighbor's score (e.g. the likelihood of the data under those parameters, typically with a complexity penalty) • Choose the neighbor with the highest score • Continue until a local maximum is reached
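A greedy hill-climbing sketch of this loop. The score used below is plain log-likelihood under ML parameters (no complexity penalty) and the reverse-edge operator is omitted, so treat it as an assumed simplification rather than the slide's exact procedure:

```python
# Greedy structure search over a tiny set of binary variables.
import math
from itertools import permutations
from collections import Counter

variables = ["smart", "quality", "accepted"]
data = [
    {"smart": True,  "quality": True,  "accepted": True},
    {"smart": True,  "quality": False, "accepted": False},
    {"smart": False, "quality": False, "accepted": False},
    {"smart": False, "quality": True,  "accepted": True},
]

def log_likelihood(parents, data):
    """Log-likelihood of the data under ML parameters for this structure."""
    ll = 0.0
    for var, pa in parents.items():
        joint, marg = Counter(), Counter()
        for d in data:
            u = tuple(d[p] for p in pa)
            joint[(d[var], u)] += 1
            marg[u] += 1
        for (x, u), count in joint.items():
            ll += count * math.log(count / marg[u])
    return ll

def is_acyclic(parents):
    seen, done = set(), set()
    def visit(v):
        if v in done:
            return True
        if v in seen:
            return False           # back edge -> cycle
        seen.add(v)
        ok = all(visit(p) for p in parents[v])
        done.add(v)
        return ok
    return all(visit(v) for v in parents)

def neighbors(parents):
    """All acyclic structures reached by adding or removing one edge."""
    for a, b in permutations(variables, 2):
        cand = {v: list(ps) for v, ps in parents.items()}
        if a in cand[b]:
            cand[b].remove(a)
        else:
            cand[b].append(a)
        if is_acyclic(cand):
            yield cand

parents = {v: [] for v in variables}        # begin with an empty network
best = log_likelihood(parents, data)
improved = True
while improved:
    improved = False
    for cand in list(neighbors(parents)):
        score = log_likelihood(cand, data)
        if score > best:
            parents, best, improved = cand, score, True

print(parents, best)
```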
Mini-BN Tutorial Summary • Representation – probability distribution factored according to the BN DAG • Inference – exact + approximate • Learning – parameters + structure
Probabilistic Relational Models • BN Tutorial • PRMs w/ Attribute Uncertainty • Inference in PRMs • Learning in PRMs • PRMs w/ Structural Uncertainty • PRMs w/ Class Hierarchies
Relational Schema • Describes the types of objects and relations in the world • Classes and descriptive attributes: Author (Good Writer, Smart), Review (Mood, Length), Paper (Quality, Accepted) • Relations: Author-of, Has-Review
Probabilistic Relational Model • Class-level dependency structure over the attributes of the schema: Author.Smart, Author.Good Writer, Review.Mood, Review.Length, Paper.Quality, Paper.Accepted
Probabilistic Relational Model • Dependencies can follow the relational structure across classes, e.g. P(Paper.Accepted | Paper.Quality, Paper.Review.Mood)
Probabilistic Relational Model • The dependency is quantified by a class-level CPT P(A | Q, M), shared by every Paper:
Q, M = f, f : 0.1, 0.9
Q, M = f, t : 0.2, 0.8
Q, M = t, f : 0.6, 0.4
Q, M = t, t : 0.7, 0.3
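As a tiny sketch, such a class-level CPT can be stored once and consulted for any Paper object; the dictionary encoding, and my reading of the two columns as P(A = t) followed by P(A = f), are assumptions:

```python
# Class-level CPT for P(Paper.Accepted | Paper.Quality, Paper.Review.Mood),
# shared by every Paper object. Column interpretation is an assumption.
cpt_accepted = {
    (False, False): (0.1, 0.9),
    (False, True):  (0.2, 0.8),
    (True,  False): (0.6, 0.4),
    (True,  True):  (0.7, 0.3),
}

def p_accepted(quality, mood_good):
    """P(Accepted = t) for one paper, given its Quality and its review's Mood."""
    return cpt_accepted[(quality, mood_good)][0]

print(p_accepted(True, False))   # good paper, grumpy reviewer -> 0.6
```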
Relational Skeleton • Fixed relational skeleton: the set of objects in each class and the relations between them (recorded via primary and foreign keys) • E.g. Authors A1 and A2, Papers P1, P2, P3, and their Reviews, linked by author-of and has-review relations
PRM w/ Attribute Uncertainty • Unrolling the class-level model over the skeleton gives one random variable per descriptive attribute of each object (Smart and Good Writer for each Author; Mood and Length for each Review; Quality and Accepted for each Paper) • The PRM defines a distribution over instantiations of these attributes
A Portion of the BN • Ground variables such as r2.Mood, r3.Mood, P2.Quality, P3.Quality, P2.Accepted, P3.Accepted, with example evidence on the quality and mood variables (e.g. Low, High, Pissy) • Every Accepted variable shares the same class-level CPT P(A | Q, M) given above
PRM: Aggregate Dependencies • A Paper may have many Reviews (e.g. R1, R2, R3), so Paper.Accepted depends on its reviews' Moods through an aggregate such as the mode; other common aggregates: sum, min, max, avg, count • The class-level CPT P(A | Q, M) then conditions on the aggregated value:
Q, M = f, f : 0.1, 0.9
Q, M = f, t : 0.2, 0.8
Q, M = t, f : 0.6, 0.4
Q, M = t, t : 0.7, 0.3
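A short sketch of such an aggregate dependency; the CPT numbers follow the slide under my reading of the columns, while the boolean mood encoding and the choice of mode are illustrative assumptions:

```python
# Paper.Accepted depends on the mode of its reviews' Moods (could equally be
# min, max, avg, sum, or count).
from statistics import mode

cpt_accepted = {            # P(Accepted = t | Quality, aggregated Mood)
    (False, False): 0.1, (False, True): 0.2,
    (True,  False): 0.6, (True,  True): 0.7,
}

def p_accepted(quality, review_moods):
    """review_moods: one boolean per review (True = good mood)."""
    agg = mode(review_moods)             # aggregate the multiset of parent values
    return cpt_accepted[(quality, agg)]

# A paper with three reviews: two grumpy reviewers, one in a good mood
print(p_accepted(True, [False, False, True]))   # -> 0.6
```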
PRM with AU Semantics • PRM + relational skeleton = probability distribution over completions I (assignments to every attribute of every object: the Authors, Papers and Reviews in the skeleton)
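In the usual PRM notation (the symbols here are mine, following the standard presentation), the distribution over completions factors as a product over the attributes of every object in the skeleton σ:

\[
P(I \mid \sigma, S, \theta) \;=\; \prod_{x \in \sigma} \; \prod_{A \in \mathcal{A}(x)} P\bigl(I_{x.A} \mid I_{\mathrm{Pa}(x.A)}\bigr)
\]

where S is the class-level dependency structure, θ its CPTs, and Pa(x.A) the parents of attribute A of object x.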
Probabilistic Relational Models • BN Tutorial • PRMs w/ Attribute Uncertainty • Inference in PRMs • Learning in PRMs • PRMs w/ Structural Uncertainty • PRMs w/ Class Hierarchies
PRM Inference • Simple idea: enumerate all attributes of all objects • Construct a Bayesian network over all the attributes
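A compact sketch of that construction: enumerate every attribute of every object in the skeleton and wire up its parents by following the class-level slot chains. The parent lists and the tiny skeleton below are illustrative assumptions modeled on the running example:

```python
# Ground a class-level PRM into the variables and parent sets of a BN.

# Class-level model: each (class, attribute) maps to parents given as
# (slot, attribute) pairs, where the slot "self" means the object itself.
class_parents = {
    ("Author", "Smart"): [],
    ("Author", "GoodWriter"): [],
    ("Review", "Mood"): [("author", "GoodWriter")],
    ("Review", "Length"): [("self", "Mood")],
    ("Paper", "Quality"): [("author", "Smart"), ("author", "GoodWriter")],
    ("Paper", "Accepted"): [("self", "Quality"), ("review", "Mood")],
}

# Relational skeleton: objects, their classes, and the slots linking them
objects = {
    "A1": ("Author", {}),
    "R1": ("Review", {"author": "A1"}),
    "P1": ("Paper", {"author": "A1", "review": "R1"}),
}

def ground_bn(class_parents, objects):
    """Return {ground variable: [ground parent variables]}."""
    bn = {}
    for obj, (cls, slots) in objects.items():
        for (c, attr), pa in class_parents.items():
            if c != cls:
                continue
            ground_pa = [f"{obj if slot == 'self' else slots[slot]}.{p_attr}"
                         for slot, p_attr in pa]
            bn[f"{obj}.{attr}"] = ground_pa
    return bn

for var, pa in ground_bn(class_parents, objects).items():
    print(var, "<-", pa)
```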
Inference Example • Skeleton: Author A1 wrote Papers P1 and P2; P1 has Reviews R1 and R2, P2 has Reviews R3 and R4 • Query is P(A1.good-writer) • Evidence is P1.accepted = T, P2.accepted = T
PRM Inference: Constructed BN • Ground network over A1.Smart, A1.Good Writer, P1.Quality, P2.Quality, the Mood and Length of each of R1, R2, R3, R4, and P1.Accepted, P2.Accepted
PRM Inference • Problems with this approach: • constructed BN may be very large • doesn’t exploit object structure • Better approach: • reason about objects themselves • reason about whole classes of objects • In particular, exploit: • reuse of inference • encapsulation of objects
PRM Inference: Interfaces • Variables pertaining to R2: its inputs and its internal attributes (R2.Mood, R2.Length)
PRM Inference: Interfaces • An object's interface: its imported and exported attributes
PRM Inference: Encapsulation • R1 and R2 are encapsulated inside P1
PRM Inference: Reuse • Structurally identical pieces of the network (P1 with R1, R2 and P2 with R3, R4) allow the same elimination computation to be reused
Structured Variable Elimination • Inference follows the object structure: the Author-1 object contains A1.Smart and A1.Good Writer together with the nested Paper-1 and Paper-2 objects; each Paper object (e.g. Paper 1, with P1.Quality and P1.Accepted) in turn contains its nested Review-1 and Review-2 objects • Each object is eliminated down to a factor over its interface, and that factor is reused for structurally identical objects
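A small sketch of the reuse idea: eliminate the variables inside one Paper object (here simplified to its Quality and a single review's Mood, with the acceptance evidence) down to a factor over the author interface, then reuse that factor for every structurally identical paper. All CPT numbers and the dependency structure below are illustrative assumptions:

```python
# Structured elimination sketch: one interface factor per Paper, reused twice.
from itertools import product

p_smart = {True: 0.5, False: 0.5}
p_writer = {True: 0.5, False: 0.5}
p_quality = {(s, w): (0.8 if s and w else 0.3)            # P(Q=t | Smart, Writer)
             for s, w in product([True, False], repeat=2)}
p_mood = {w: (0.7 if w else 0.3) for w in (True, False)}  # P(M=good | Writer)
p_accept = {(False, False): 0.1, (False, True): 0.2,      # P(A=t | Q, M)
            (True, False): 0.6, (True, True): 0.7}

def paper_interface_factor(accepted=True):
    """Sum out Quality and Mood for one paper with evidence on Accepted,
    leaving a factor over the author interface (Smart, GoodWriter)."""
    f = {}
    for s, w in product([True, False], repeat=2):
        total = 0.0
        for q, m in product([True, False], repeat=2):
            pq = p_quality[(s, w)] if q else 1 - p_quality[(s, w)]
            pm = p_mood[w] if m else 1 - p_mood[w]
            pa = p_accept[(q, m)] if accepted else 1 - p_accept[(q, m)]
            total += pq * pm * pa
        f[(s, w)] = total
    return f

# Encapsulation + reuse: compute the paper factor once, apply it to both papers
paper_factor = paper_interface_factor(accepted=True)
posterior = {(s, w): p_smart[s] * p_writer[w] * paper_factor[(s, w)] ** 2
             for s, w in product([True, False], repeat=2)}
z = sum(posterior.values())
p_good_writer = sum(v for (s, w), v in posterior.items() if w) / z
print(p_good_writer)    # P(A1.GoodWriter = t | both papers accepted)
```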