Statistical Relational Learning for Knowledge Extraction from the Web
Hoifung Poon
Dept. of Computer Science & Eng., University of Washington

"Drowning in Information, Starved for Knowledge"
Great Vision: Knowledge Extraction from the Web
Craven et al., "Learning to Construct Knowledge Bases from the World Wide Web," Artificial Intelligence, 1999.
• Also need:
  • Knowledge representation and reasoning
  • Close the loop: Apply knowledge to extraction
• Machine reading [Etzioni et al., 2007]
Machine Reading: Text → Knowledge
Rapidly Growing Interest
• AAAI-07 Spring Symposium on Machine Reading
• DARPA Machine Reading Program (2009-2014)
• NAACL-10 Workshop on Learning By Reading
• Etc.
Great Impact
• Scientific inquiry and commercial applications
  • Literature-based discovery, robot scientists
  • Question answering, semantic search
  • Drug design, medical diagnosis
• Break the knowledge acquisition bottleneck for AI and natural language understanding
• Automatically semantify the Web
• Etc.
This Talk
• Statistical relational learning offers promising solutions to machine reading
• Markov logic is a leading unifying framework
• A success story: USP
  • Unsupervised, end-to-end machine reading
  • Extracts five times as many correct answers as the state of the art, with the highest accuracy (91%)
USP: Question-Answer Example
Interestingly, the DEX-mediated IkappaBalpha induction was completely inhibited by IL-2, but not IL-4, in Th1 cells, while the reverse profile was seen in Th2 cells.
Q: What does IL-2 control?
A: The DEX-mediated IkappaBalpha induction
Overview
• Machine reading: Challenges
• Statistical relational learning
• Markov logic
• USP: Unsupervised Semantic Parsing
• Research directions
Key Challenges
• Complexity
• Uncertainty
• Pipeline accumulates errors
• Supervision is scarce
Languages Are Structural
• governments  lm$pxtm (Hebrew: according to their families)
• IL-4 induces CD11B
• Involvement of p70(S6)-kinase activation in IL-10 up-regulation in human monocytes by gp41 ……
• George Walker Bush was the 43rd President of the United States. …… Bush was the eldest son of President G. H. W. Bush and Barbara Bush. …… In November 1977, he met Laura Welch at a barbecue.
Languages Are Structural (the same examples, analyzed)
• govern-ment-s  l-m$px-t-m (morphological segmentation)
• [Figure: syntactic parse tree for "IL-4 induces CD11B": S → NP VP; VP → V NP]
• [Figure: nested event structure for the gp41 sentence: involvement(Theme: up-regulation, Cause: activation); up-regulation(Theme: IL-10, Cause: gp41, Site: human monocyte); activation(Theme: p70(S6)-kinase)]
• [Figure: coreference links among the mentions in the Bush passage]
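To make the nested structure concrete, here is a toy rendering of the gp41 event as plain data; the role names come from the slide, while the dictionary encoding itself is only an illustration, not USP's internal representation:

```python
# Toy encoding of the slide's nested event structure (illustrative only).
event = {
    "predicate": "involvement",
    "Theme": {
        "predicate": "up-regulation",
        "Theme": "IL-10",
        "Cause": "gp41",
        "Site": "human monocyte",
    },
    "Cause": {
        "predicate": "activation",
        "Theme": "p70(S6)-kinase",
    },
}
```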
Knowledge Is Heterogeneous
• Individuals. E.g.: Socrates is a man
• Types. E.g.: Man is mortal
• Inference rules. E.g.: Syllogism
• Ontological relations
• Etc.
[Figure: ontology fragment: HUMAN ISA MAMMAL; EYE ISPART FACE]
Complexity
• Can handle using first-order logic
  • Trees, graphs, dependencies, hierarchies, etc. easily expressed
  • Inference algorithms (satisfiability testing, theorem proving, etc.)
• But … logic is brittle with uncertainty
Languages Are Ambiguous
• Paraphrase: Microsoft buys Powerset / Microsoft acquires Powerset / Powerset is acquired by Microsoft Corporation / The Redmond software giant buys Powerset / Microsoft's purchase of Powerset, …
• Attachment: I saw the man with the telescope. Does [with the telescope] attach to "the man" (NP) or to "saw" (ADVP)?
• Entity type: Here in London, Frances Deek is a retired teacher … In the Israeli town …, Karen London says … Now London says … London: PERSON or LOCATION?
• Coreference: G. W. Bush …… Laura Bush …… Mrs. Bush …… Which one?
Knowledge Has Uncertainty
• We need to model correlations
• Our information is always incomplete
• Our predictions are uncertain
Uncertainty
• Statistics provides the tools to handle this
  • Mixture models
  • Hidden Markov models
  • Bayesian networks
  • Markov random fields
  • Maximum entropy models
  • Conditional random fields
  • Etc.
• But … statistical models assume i.i.d. (independent and identically distributed) data: objects are reduced to feature vectors
Pipeline is Suboptimal
• E.g., NLP pipeline: Tokenization → Morphology → Chunking → Syntax → …
• Accumulates and propagates errors
• Wanted: Joint inference
  • Across all processing stages
  • Among all interdependent objects
Supervision is Scarce
• Tons of text … but most is not annotated
• Labeling is expensive (cf. the Penn Treebank)
• Need to leverage indirect supervision
Redundancy
• Key source of indirect supervision
• State-of-the-art systems depend on this. E.g., TextRunner [Banko et al., 2007]
• But … the Web is heterogeneous: Long tail
• Redundancy is only present in the head regime
Overview
• Machine reading: Challenges
• Statistical relational learning
• Markov logic
• USP: Unsupervised Semantic Parsing
• Research directions
Statistical Relational Learning
• Burgeoning field in machine learning
• Offers promising solutions for machine reading
• Unifies statistical and logical approaches
• Replaces the pipeline with joint inference
• Principled framework to leverage both direct and indirect supervision
Machine Reading: A Vision
Challenge: Long tail
Challenges in Applying Statistical Relational Learning
• Learning is much harder
• Inference becomes a crucial issue
• Greater complexity for the user
Progress to Date
• Probabilistic logic [Nilsson, 1986]
• Statistics and beliefs [Halpern, 1990]
• Knowledge-based model construction [Wellman et al., 1992]
• Stochastic logic programs [Muggleton, 1996]
• Probabilistic relational models [Friedman et al., 1999]
• Relational Markov networks [Taskar et al., 2002]
• Markov logic [Domingos & Lowd, 2009]: the leading unifying framework
• Etc.
Overview
• Machine reading
• Statistical relational learning
• Markov logic
• USP: Unsupervised Semantic Parsing
• Research directions
Markov Networks
• Undirected graphical models
[Figure: example network over Smoking, Cancer, Asthma, Cough]
• Log-linear model: P(x) = (1/Z) exp( Σ_i w_i f_i(x) ), where w_i is the weight of feature i and f_i(x) is feature i
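As a concrete illustration of the log-linear form, here is a minimal sketch over the slide's four variables; the two features and their weights are invented for the example, and the brute-force normalizer is only tractable for toy models:

```python
import itertools
import math

VARS = ["Smoking", "Cancer", "Asthma", "Cough"]

# Hypothetical features, chosen only for illustration.
def features(x):
    return [
        1.0 if x["Smoking"] and x["Cancer"] else 0.0,  # f_1: smoking with cancer
        1.0 if x["Asthma"] and x["Cough"] else 0.0,    # f_2: asthma with cough
    ]

weights = [1.5, 1.1]

def unnormalized(x):
    # exp(sum_i w_i f_i(x))
    return math.exp(sum(w * f for w, f in zip(weights, features(x))))

# Z sums over all 2^4 joint assignments (tractable only for tiny models).
worlds = [dict(zip(VARS, vals))
          for vals in itertools.product([False, True], repeat=4)]
Z = sum(unnormalized(x) for x in worlds)

x = {"Smoking": True, "Cancer": True, "Asthma": False, "Cough": False}
print(unnormalized(x) / Z)  # P(x) = exp(sum_i w_i f_i(x)) / Z
```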
First-Order Logic
• Constants, variables, functions, predicates. E.g.: Anna, x, MotherOf(x), Friends(x,y)
• Grounding: Replace all variables by constants. E.g.: Friends(Anna, Bob)
• World (model, interpretation): Assignment of truth values to all ground predicates
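A minimal sketch of grounding, assuming a toy domain of two constants (the predicate set here is invented for illustration):

```python
from itertools import product

constants = ["Anna", "Bob"]
predicates = {"Smokes": 1, "Friends": 2}  # predicate name -> arity

# Replace every variable with every combination of constants.
ground_atoms = [
    f"{pred}({','.join(args)})"
    for pred, arity in predicates.items()
    for args in product(constants, repeat=arity)
]
print(ground_atoms)
# ['Smokes(Anna)', 'Smokes(Bob)', 'Friends(Anna,Anna)', 'Friends(Anna,Bob)', ...]

# A "world" assigns a truth value to every ground atom.
world = {atom: False for atom in ground_atoms}
world["Friends(Anna,Bob)"] = True
```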
Markov Logic
• Intuition: Soften logical constraints
• Syntax: Weighted first-order formulas
• Semantics: Feature templates for Markov networks
• A Markov Logic Network (MLN) is a set of pairs (F_i, w_i), where
  • F_i is a formula in first-order logic
  • w_i is a real number
• P(x) = (1/Z) exp( Σ_i w_i n_i(x) ), where n_i(x) is the number of true groundings of F_i in world x
Example: Friends & Smokers Probabilistic graphical models andfirst-order logic are special cases Two constants: Anna (A) and Bob (B) Friends(A,B) Friends(A,A) Smokes(A) Smokes(B) Friends(B,B) Cancer(A) Cancer(B) Friends(B,A) 35
Efficient Inference
• Logical or statistical inference alone is already hard
• But … approximate inference suffices to perform well in most cases
• Combine ideas from both camps. E.g., MC-SAT = MCMC + SAT solver
• Can also leverage sparsity in relational domains
More: Poon & Domingos, "Sound and Efficient Inference with Probabilistic and Deterministic Dependencies", in Proc. AAAI-2006.
More: Poon, Domingos & Sumner, "A General Method for Reducing the Complexity of Relational Inference and its Application to MCMC", in Proc. AAAI-2008.
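A much-simplified sketch of the MC-SAT loop; for clarity, the near-uniform SAT-sampling step is replaced with naive rejection sampling, which only works for tiny models (the real algorithm uses a SAT sampler such as SampleSAT):

```python
import math
import random

def mc_sat(atoms, clauses, n_steps):
    """atoms: list of names; clauses: list of (weight, test) where test
    maps a truth assignment to True/False. Returns sampled worlds."""
    state = {a: random.random() < 0.5 for a in atoms}
    samples = []
    for _ in range(n_steps):
        # Keep each currently satisfied clause with probability 1 - exp(-w);
        # the next state must satisfy every kept clause.
        kept = [test for w, test in clauses
                if test(state) and random.random() < 1.0 - math.exp(-w)]
        while True:  # rejection sampling stands in for SampleSAT
            state = {a: random.random() < 0.5 for a in atoms}
            if all(test(state) for test in kept):
                break
        samples.append(dict(state))
    return samples
```

Averaging clause counts over the returned samples gives the expectations needed for weight learning below.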
Weight Learning
• Probability model P(X)
• X: Observable in training data
• Maximize likelihood of observed data
• Regularization to prevent overfitting
Weight Learning
• Gradient descent: ∂/∂w_i log P(x) = n_i(x) − E[n_i(x)]
  • n_i(x): number of times clause i is true in the data
  • E[n_i(x)]: expected number of times clause i is true according to the MLN
• Requires inference: use MC-SAT to compute the expectation
• Can also leverage second-order information [Lowd & Domingos, 2007]
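A minimal sketch of this update rule; `n_data` and `expected_counts` are illustrative names, with the latter standing in for MC-SAT-based estimation of E[n_i(x)]:

```python
def learn_weights(n_data, expected_counts, n_iters=100, lr=0.01):
    """n_data[i]: count of true groundings of clause i in the training data.
    expected_counts(weights): E[n_i] under the current MLN, e.g. averaged
    clause counts over MC-SAT samples."""
    weights = [0.0] * len(n_data)
    for _ in range(n_iters):
        expected = expected_counts(weights)   # requires inference
        weights = [w + lr * (nd - e)          # d log P(x) / d w_i
                   for w, nd, e in zip(weights, n_data, expected)]
    return weights
```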
Unsupervised Learning: How?
• I.I.D. learning: A more sophisticated model requires more labeled data
• Statistical relational learning: A more sophisticated model may require less labeled data
  • Ambiguity varies among objects
  • Joint inference propagates information from unambiguous objects to ambiguous ones
• One formula is worth a thousand labels
• Small amount of domain knowledge → large-scale joint inference
Unsupervised Weight Learning
• Probability model P(X, Z)
  • X: Observed in training data
  • Z: Hidden variables
• E.g., clustering with mixture models
  • Z: Cluster assignment
  • X: Observed features
• Maximize likelihood of observed data by summing out the hidden variables Z
Unsupervised Weight Learning
• Gradient descent: ∂/∂w_i log P(x) = E_{Z|x}[n_i] − E_{X,Z}[n_i]
  • E_{Z|x}[n_i]: sums over z, conditioned on the observed x
  • E_{X,Z}[n_i]: sums over both x and z
• Use MC-SAT to compute both expectations
• May also combine with contrastive estimation
More: Poon, Cherry, & Toutanova, "Unsupervised Morphological Segmentation with Log-Linear Models", in Proc. NAACL-2009. (Best Paper Award)
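The supervised sketch above adapts directly to this hidden-variable gradient; both helpers here are illustrative stand-ins for MC-SAT runs, one with the observed x clamped and one over the full joint:

```python
def unsup_gradient(weights, cond_counts, full_counts):
    """cond_counts(w): E_{Z|x}[n_i], with x clamped to the observed data.
    full_counts(w): E_{X,Z}[n_i], sampling both x and z."""
    c = cond_counts(weights)
    f = full_counts(weights)
    return [ci - fi for ci, fi in zip(c, f)]
```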
Markov Logic
• Unified inference and learning algorithms
• Can handle millions of variables, billions of features, tens of thousands of parameters
• Easy-to-use software: Alchemy
• Many successful applications. E.g.: information extraction, coreference resolution, semantic parsing, ontology induction
Pipeline → Joint Inference
• Combine segmentation and entity resolution for information extraction
• Extract complex and nested bio-events from PubMed abstracts
More: Poon & Domingos, "Joint Inference for Information Extraction", in Proc. AAAI-2007.
More: Poon & Vanderwende, "Joint Inference for Knowledge Extraction from Biomedical Literature", in Proc. NAACL-2010.
Unsupervised Learning: Example
• Coreference resolution: Accuracy comparable to the previous supervised state of the art
More: Poon & Domingos, "Joint Unsupervised Coreference Resolution with Markov Logic", in Proc. EMNLP-2008.
Overview
• Machine reading: Challenges
• Statistical relational learning
• Markov logic
• USP: Unsupervised Semantic Parsing
• Research directions
Unsupervised Semantic Parsing
• USP [Poon & Domingos, EMNLP-09] (Best Paper Award)
  • First unsupervised approach for semantic parsing
  • End-to-end machine reading system: Read text, answer questions
• OntoUSP = USP + Ontology Induction [Poon & Domingos, ACL-10]
• Encoded in a few Markov logic formulas
Semantic Parsing
• Goal: Map sentences to logical form. E.g.: Microsoft buys Powerset ⇒ BUY(MICROSOFT, POWERSET)
• Challenge: Many surface forms with the same meaning
  • Microsoft buys Powerset
  • Microsoft acquires semantic search engine Powerset
  • Powerset is acquired by Microsoft Corporation
  • The Redmond software giant buys Powerset
  • Microsoft's purchase of Powerset, …
Limitations of Existing Approaches
• Manual grammar or supervised learning: Applicable to restricted domains only
• For general text:
  • Not clear what predicates and objects to use
  • Hard to produce consistent meaning annotations
• Also, often learn both syntax and semantics:
  • Fail to leverage advanced syntactic parsers
  • Make semantic parsing harder
USP: Key Idea #1
• Target predicates and objects can be learned
• Viewed as clusters of syntactic or lexical variations of the same meaning
  • BUY(-,-): buys, acquires, 's purchase of, … (cluster of various expressions for acquisition)
  • MICROSOFT: Microsoft, the Redmond software giant, … (cluster of various mentions of Microsoft)
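A toy sketch of the idea: once expressions are clustered, every paraphrase maps to the same predicate or object symbol. The clusters below are hand-written for illustration; USP learns them from text without supervision:

```python
# Hand-written clusters standing in for what USP learns (illustrative only).
clusters = {
    "BUY": {"buys", "acquires", "'s purchase of", "is acquired by"},
    "MICROSOFT": {"Microsoft", "Microsoft Corporation",
                  "the Redmond software giant"},
    "POWERSET": {"Powerset", "semantic search engine Powerset"},
}

def canonicalize(phrase):
    """Map a surface phrase to its cluster label, if it has one."""
    for label, variants in clusters.items():
        if phrase in variants:
            return label
    return phrase

# All of the slide's paraphrases collapse to the same logical form.
subj, verb, obj = "the Redmond software giant", "acquires", "Powerset"
print(f"{canonicalize(verb)}({canonicalize(subj)}, {canonicalize(obj)})")
# -> BUY(MICROSOFT, POWERSET)
```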