LEILA – Learning to Extract Information by Linguistic Analysis
Presented at the 2nd Workshop on Ontology Learning and Population (OLP2)
Fabian M. Suchanek, Georgiana Ifrim, Gerhard Weikum (Max-Planck Institute for Computer Science, Saarbrücken, Germany)
Overview
• Motivation
• The LEILA System
• Plan of Attack
• System Architecture
• Experiments
• Conclusion
Motivation
[Figure: a Google-style search box with the query "Meat dish" and the buttons "Google Search" and "I'm feeling hungry", followed by a question mark and a Web page that reads: "This page has been created to enlighten the public about the Wiener Schnitzel. [...]"]
Motivation
To know that a Schnitzel is a meat dish, we need an ontology.
• Use hand-crafted ontologies (like WordNet) (but: low coverage, high cost, fast aging)
• Or: gather ontological data from Web documents
Goal
Given
• a binary target relation (e.g. subclassOf)
• a set of Web documents
extract all pairs of entities that are in the target relation.
Related Work
Learn text patterns (e.g. Soderland, Chakrabarti, KnowItAll)
Text pattern: X is a Y
• Matches: A Schnitzel is a meat dish from Austria.
• Does not match: A Schnitzel, also called Wiener Schnitzel, is a meat dish.
  Linguistically, the subject is "A Schnitzel, also called Wiener Schnitzel," and the object is "meat dish".
Idea: Learn linguistic patterns!
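The limitation of surface text patterns can be illustrated with a small sketch. It is not code from LEILA or from the cited systems; the regular expression and the function name `extract_surface` are assumptions made for illustration. It encodes "X is a Y" literally and is applied to the two example sentences from the slides.

```python
import re

# Surface pattern "X is a Y" as a regular expression.
# Assumption: X is one capitalized word, Y is one or two lowercase words.
SURFACE_PATTERN = re.compile(r"\bAn? ([A-Z]\w+) is an? ([a-z]+(?: [a-z]+)?)")


def extract_surface(sentence):
    """Return (X, Y) if the surface pattern matches the sentence, else None."""
    match = SURFACE_PATTERN.search(sentence)
    return match.groups() if match else None


simple = "A Schnitzel is a meat dish from Austria."
apposition = "A Schnitzel, also called Wiener Schnitzel, is a meat dish."

print(extract_surface(simple))      # the literal pattern applies
print(extract_surface(apposition))  # the inserted apposition breaks it
```

The second sentence expresses the same relation, but the words between "Schnitzel" and "is" defeat the literal pattern. A pattern over the parsed subject and object would not depend on those intervening words.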
Plan of Attack
[Diagram: (Web documents) and (Target relation: subclassOf) as input, (Output pairs) as output]
Preprocessing
[Diagram: (Web documents) and (Target relation: subclassOf) as input, (Output pairs) as output]
The example sentence is transformed step by step:
• Input: The Schnitzel (0.0314946089 stones) is best enjoyed with Ösibräu.
• Units normalized: The Schnitzel (200g) is best enjoyed with Ösibräu.
• Special characters transliterated: The Schnitzel (200g) is best enjoyed with Oesibraeu.
• Parenthetical split off: "The Schnitzel is best enjoyed with Oesibraeu." and "The Schnitzel ( 200 g )"
• Parsed: [Figure: linkages of both units; link labels on the slide are participle, mod, comp, det, subj, adv for the sentence and adj for the parenthetical unit]
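The textual preprocessing steps shown on these slides can be sketched as follows. This is only an illustration under assumptions: the function names, the transliteration table, and the restriction to the unit "stones" are hypothetical and not taken from LEILA. The parsing step is left out because it requires the Link Parser.

```python
import re

GRAMS_PER_STONE = 6350.29318  # 1 stone = 14 lb = 6.35029318 kg

# Assumption: a small hand-written transliteration table for German characters.
TRANSLITERATION = {"Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
                   "ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"}


def normalize_units(text):
    """Rewrite '<number> stones' as a rounded number of grams."""
    def to_grams(match):
        return f"{round(float(match.group(1)) * GRAMS_PER_STONE)}g"
    return re.sub(r"(\d+(?:\.\d+)?) stones", to_grams, text)


def transliterate(text):
    """Replace umlauts and sharp s by ASCII letter sequences."""
    return "".join(TRANSLITERATION.get(ch, ch) for ch in text)


def split_parenthetical(text):
    """Return (sentence without the parenthetical, separate parenthetical unit)."""
    match = re.search(r"\s*\(([^)]*)\)", text)
    if match is None:
        return text, None
    main = text[:match.start()] + text[match.end():]
    # Separate a number from the unit that follows it ("200g" becomes "200 g").
    inner = re.sub(r"(\d)([A-Za-z])", r"\1 \2", match.group(1))
    side = f"{text[:match.start()]} ( {inner} )"
    return main, side


raw = "The Schnitzel (0.0314946089 stones) is best enjoyed with Ösibräu."
main, side = split_parenthetical(transliterate(normalize_units(raw)))
print(main)
print(side)
```

Each function covers one transformation, so the intermediate strings correspond to the successive versions of the sentence on the slides.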
Algorithm
[Diagram: (Seed pairs: + examples, - counterexamples) and (Web documents) as input, (Output pairs) as output]
• A sentence containing an example pair yields a positive pattern:
  "A dog is a mammal." gives "A X is a Y." (Positive patterns)
• A sentence containing a counterexample pair yields a negative pattern:
  "This dog is a nag." gives "This X is a Y." (Negative patterns)
• Generalized positive patterns are matched against new sentences to produce output pairs:
  "A Schnitzel is a meat dish." matches "A X is a Y." (Generalized positive patterns)
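The flow on the Algorithm slides can be mimicked with a toy sketch. It is a deliberate simplification and not the LEILA implementation: plain strings with the placeholders X and Y stand in for the linguistic patterns, exact matching stands in for the learned generalization, and all function names are hypothetical.

```python
import re


def to_pattern(sentence, x, y):
    """Replace the two entities of a pair by the placeholders X and Y."""
    return sentence.replace(x, "X").replace(y, "Y")


def learn_patterns(sentences, examples, counterexamples):
    """Collect positive patterns from examples, negative ones from counterexamples."""
    positive, negative = set(), set()
    for sentence in sentences:
        for x, y in examples:
            if x in sentence and y in sentence:
                positive.add(to_pattern(sentence, x, y))
        for x, y in counterexamples:
            if x in sentence and y in sentence:
                negative.add(to_pattern(sentence, x, y))
    # A pattern that also occurs with a counterexample is not trusted.
    return positive - negative, negative


def pattern_to_regex(pattern):
    """Turn 'A X is a Y.' into a regex with one capture group per placeholder."""
    escaped = re.escape(pattern)
    escaped = escaped.replace("X", "(.+?)", 1).replace("Y", "(.+?)", 1)
    return re.compile("^" + escaped + "$")


def extract_pairs(sentences, positive_patterns):
    """Apply every positive pattern to every sentence and collect the pairs."""
    pairs = set()
    for sentence in sentences:
        for pattern in positive_patterns:
            match = pattern_to_regex(pattern).match(sentence)
            if match:
                pairs.add(match.groups())
    return pairs


training = ["A dog is a mammal.", "This dog is a nag."]
positive, negative = learn_patterns(
    training, examples={("dog", "mammal")}, counterexamples={("dog", "nag")})

unseen = ["A Schnitzel is a meat dish.", "This Schnitzel is a disaster."]
print(extract_pairs(unseen, positive))
```

Only the sentence that matches the positive pattern contributes a pair. The sentence starting with "This" matches no positive pattern and is ignored.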
LEILA: System Architecture
[Diagram: (Seed pairs) and (Web documents) as input, (Output pairs) as output]
Components named in the diagram:
• Seed pair data sets
• Preprocessing, stemming
• LinkParser (Sleator, CMU)
• LEILA
• kNN Learner
• SVMLight (Joachims, Cornell U)
Gold Standard for Evaluation
[Diagram: (Web documents) and (Target relation) as input, (Ideal pairs) as output]
Example sentence: A Schnitzel is practically vitamin-free and thus the meat dish is extremely popular in Europe.
Ideal pair: (Schnitzel, meat dish)
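Given a gold standard of ideal pairs, precision and recall of the output pairs follow from simple set operations. The sketch below uses the standard definitions; the function name and the sample pairs are hypothetical and serve only as illustration, they are not experimental data.

```python
def precision_recall(output_pairs, ideal_pairs):
    """Compare extracted pairs with the gold standard of ideal pairs."""
    output_pairs, ideal_pairs = set(output_pairs), set(ideal_pairs)
    correct = output_pairs & ideal_pairs
    precision = len(correct) / len(output_pairs) if output_pairs else 0.0
    recall = len(correct) / len(ideal_pairs) if ideal_pairs else 0.0
    return precision, recall


# Hypothetical toy data, not taken from the experiments.
ideal = {("Schnitzel", "meat dish"), ("dog", "mammal"),
         ("cat", "mammal"), ("oak", "tree")}
output = {("Schnitzel", "meat dish"), ("dog", "nag")}

precision, recall = precision_recall(output, ideal)
print(precision, recall)
```

Precision is the share of output pairs that are ideal pairs, and recall is the share of ideal pairs that were found.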
Results with different relations
Target relation: birthDate
Seed pairs are given by a function that decides whether a word pair is
• an example (here: list of birth dates from www.famousbirthdays.com)
• a counterexample (here: can be deduced from examples)
• a candidate (here: all pairs of a name and a date)
Results with different relations

Target Relation   Corpus             Precision   Recall
birthDate         Wikip composers    79% ± 8%    70% ± 9%
synonymy          Wikip geography    73% ± 7%    64% ± 7%
instanceOf        Wikip composers    58% ± 3%    41% ± 3%
                  Wikip random       33% ± 3%    33% ± 3%
                  Google composers   28% ± 3%    17% ± 2%

birthDate
• Patterns: X (born in Y), X was born in Y, ...
synonymy
• Examples: all WordNet synsets
• Counterexamples: all words that are not in a synset
• Candidates: all pairs of proper names
• Patterns: X or Y, X (or Y), ...
instanceOf
• Examples: all direct WordNet hyponyms
• Counterexamples: all words that are not hyponyms of each other
• Candidates: all pairs of a proper name and a WordNet concept
• Patterns: an X is a Y, X is unusual among the Y, ...
(see paper for details on the experiments)
Results with different competitors
[Bar chart: precision and recall in %, LEILA in red, for four comparisons: Snowball (headquarters, Snowball's corpus); TextToOnto and Text2Onto (instanceOf, Wikip composers); CV-System (instanceOf, CV's corpus); CV-System (instanceOf, Wikip composers)]
(see paper for explanations, conditions and details!)
Conclusion
Our system LEILA
• can learn arbitrary binary relations from Web documents
• uses a deep linguistic analysis
• compares favorably with other systems
See http://www.mpi-inf.de/~suchanek
Results with different competitors

System       Relation       Corpus            Precision   Recall
Snowball     headquarters   Snowball's        34% ± 8%    30% ± 7%
LEILA        headquarters   Snowball's        90% ± 6%    50% ± 7%
TextToOnto   instanceOf     Wikip composers   39% ± 9%    4% ± 1%
Text2Onto    instanceOf     Wikip composers   50%         2% ± 1%
LEILA        instanceOf     Wikip composers   58% ± 3%    41% ± 3%
CV-System    instanceOf     CV's              32% ± 5%    32% ± 5%
LEILA        instanceOf     CV's              26% ± 7%    15% ± 4%
CV-System    instanceOf     Wikip composers   22%         4% ± 2%
LEILA        instanceOf     Wikip composers   58% ± 3%    41% ± 3%

(see paper for explanations, conditions and details!)
Pattern Generalization – kNN
Labeled patterns:
+ A X is a Y.
- This X is a Y.
+ X such as Y
New pattern to classify: A X is a big Y
(See our paper at KDD for details)
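The idea of labeling a new pattern by its nearest labeled neighbors can be sketched as follows. The similarity measure here, Jaccard overlap of the tokens, is an assumption chosen for brevity; the slide does not state which features or distance LEILA uses, and the function names are hypothetical.

```python
from collections import Counter


def jaccard(a, b):
    """Token overlap of two patterns (assumed similarity, not LEILA's measure)."""
    a, b = set(a.split()), set(b.split())
    return len(a & b) / len(a | b)


def knn_label(pattern, labeled_patterns, k=1):
    """Label a pattern by majority vote among its k most similar neighbors."""
    neighbors = sorted(labeled_patterns,
                       key=lambda item: jaccard(pattern, item[0]),
                       reverse=True)[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# The labeled patterns from the slide, written without the final period.
labeled = [("A X is a Y", "+"),
           ("This X is a Y", "-"),
           ("X such as Y", "+")]

print(knn_label("A X is a big Y", labeled))
print(knn_label("This X is a big Y", labeled))
```

With this measure, "A X is a big Y" lies closest to the positive pattern "A X is a Y", while the variant starting with "This" lies closest to the negative pattern.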
Pattern Generalization – SVM
[Figure: the labeled patterns shown with + and - regions]
Labeled patterns:
+ A X is a Y.
- This X is a Y.
+ X such as Y
New pattern to classify: A X is a big Y
(See our paper at KDD for details)