A Graph-based Algorithm for Inducing Lexical Taxonomies from Scratch
Roberto Navigli, Paola Velardi and Stefano Faralli
{navigli,velardi,faralli}@di.uniroma1.it
http://lcl.uniroma1.it
ERC StG: Multilingual Joint Word Sense Disambiguation (MultiJEDI), Roberto Navigli
Motivations
We present a graph-based approach that automatically learns a lexical taxonomy starting from a domain corpus and the Web. Unlike other approaches, we learn both concepts and relations entirely from scratch, in three steps:
1) term extraction;
2) definition and hypernym extraction;
3) graph pruning.
Taxonomy Learning Workflow
[Figure: workflow diagram with the components Domain Corpus, Terminology extraction, Domain terms, Upper terms, Web glossaries & documents, Definition & hypernym extraction, Domain filtering, Hypernym graph, Graph pruning, Induced taxonomy]
Terminology Extraction
Domain Corpus → Domain terms, using the tool at http://lcl.uniroma1.it/termextractor. Example terms: maximum likelihood, flow network, mesh generation, hash function, pattern recognition, information processing.
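The slide does not say how candidate terms are ranked. Purely as an illustration, the sketch below scores candidates by contrasting their relative frequency in the domain corpus with their relative frequency in a general (contrastive) corpus, loosely in the spirit of a domain-relevance measure. `rank_terms`, `min_score` and the toy counts are hypothetical; this is not the scoring used by the tool linked above.

```python
from collections import Counter


def rank_terms(domain_counts: Counter, contrast_counts: Counter, min_score: float = 0.5):
    """Rank candidate terms by a simple domain-relevance ratio (hypothetical).

    score(t) = P(t | domain) / max(P(t | domain), P(t | contrast)),
    so a term that is relatively at least as frequent in the domain corpus
    as in the contrastive corpus gets the maximum score of 1.0.
    """
    n_dom = sum(domain_counts.values()) or 1
    n_con = sum(contrast_counts.values()) or 1
    ranked = []
    for term, freq in domain_counts.items():
        if freq <= 0:
            continue
        p_dom = freq / n_dom
        p_con = contrast_counts.get(term, 0) / n_con
        score = p_dom / max(p_dom, p_con)
        if score >= min_score:
            ranked.append((term, score))
    # Highest score first; ties broken alphabetically for a stable order.
    ranked.sort(key=lambda kv: (-kv[1], kv[0]))
    return ranked


domain = Counter({"flow network": 8, "hash function": 5, "next year": 6})
general = Counter({"next year": 50, "hash function": 1})
print(rank_terms(domain, general))  # [('flow network', 1.0), ('hash function', 1.0)]
```

A generic phrase such as "next year" is far more frequent in the general corpus, so its score (about 0.32) falls below the threshold and it is dropped.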
Definition & Hypernym Extraction + Domain Filtering
Inputs: Web glossaries & documents, the Domain Corpus and the domain terms. Example: definition extraction (WCL) for the term flow network returns
• "In graph theory, a flow network is a directed graph." (domain)
• "Global Cash Flow Network is a business opportunity to make money online." (non-domain)
• "A flow network is a network with two distinguished vertices." (domain)
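The slides do not spell out the domain filter. A minimal sketch, assuming a definition counts as in-domain when it mentions at least one other domain term; the defined term itself is blanked out first so that it cannot vouch for its own definition. `domain_weight`, `filter_definitions`, `min_hits` and the vocabulary are hypothetical, and the actual filter may weight terms differently.

```python
import re


def domain_weight(definition: str, target: str, domain_terms: set) -> int:
    """Count distinct domain terms (other than the target) occurring in a definition."""
    # Blank out the defined term so it cannot vouch for its own definition.
    text = definition.lower().replace(target.lower(), " ")
    hits = 0
    for term in domain_terms:
        if term.lower() == target.lower():
            continue
        # Whole-word match, so "network" does not fire inside "networking".
        if re.search(r"\b" + re.escape(term.lower()) + r"\b", text):
            hits += 1
    return hits


def filter_definitions(definitions, target, domain_terms, min_hits=1):
    """Keep only the candidate definitions that mention enough domain terms."""
    return [d for d in definitions if domain_weight(d, target, domain_terms) >= min_hits]


candidates = [
    "In graph theory, a flow network is a directed graph.",
    "Global Cash Flow Network is a business opportunity to make money online.",
    "A flow network is a network with two distinguished vertices.",
]
domain_vocab = {"graph theory", "directed graph", "network", "vertices", "flow network"}
for d in filter_definitions(candidates, "flow network", domain_vocab):
    print(d)
```

The business-opportunity sentence mentions no domain term once "flow network" is blanked out, so it is the only one discarded.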
Domain filtering discards the non-domain definition, leaving
• "In graph theory, a flow network is a directed graph."
• "A flow network is a network with two distinguished vertices."
Hypernym extraction then yields directed graph and network as hypernyms of flow network.
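The slides extract hypernyms with WCLs learned from annotated data. As a much simpler and clearly different stand-in, the sketch below uses a single lexico-syntactic pattern, "TERM is a/an/the Y", cuts Y at the first word from a hand-picked stop list, and records each hit as an edge in an adjacency map. `naive_hypernym`, `add_hypernyms` and `STOP_WORDS` are hypothetical names; this is the kind of pattern-based method the evaluation slides compare against, not the authors' extractor.

```python
import re

# Words that usually close the hypernym phrase in a short "X is a Y ..." gloss
# (a hand-picked, illustrative list).
STOP_WORDS = {"with", "that", "which", "where", "who", "whose", "in", "for", "used"}


def naive_hypernym(definition: str, target: str):
    """Return the phrase after 'TARGET is a/an/the', cut at the first stop word."""
    pattern = re.escape(target) + r"\s+is\s+(?:an?|the)\s+([^.;:,]+)"
    match = re.search(pattern, definition, flags=re.IGNORECASE)
    if match is None:
        return None
    words = []
    for word in match.group(1).split():
        if word.lower() in STOP_WORDS:
            break
        words.append(word.lower())
    return " ".join(words) or None


def add_hypernyms(graph: dict, target: str, definitions) -> dict:
    """Add edges target -> hypernym to an adjacency map {term: set(hypernyms)}."""
    for definition in definitions:
        hypernym = naive_hypernym(definition, target)
        if hypernym:
            graph.setdefault(target, set()).add(hypernym)
    return graph


graph = add_hypernyms({}, "flow network", [
    "In graph theory, a flow network is a directed graph.",
    "A flow network is a network with two distinguished vertices.",
])
print(graph)
```

Applied to the unfiltered "Global Cash Flow Network" sentence, the same pattern would return "business opportunity to make money online", which illustrates why filtering happens before hypernym extraction.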
The extracted hypernyms become the terms of the next iteration. For directed graph, definition extraction (WCL) finds "A directed graph is a graph where ..." and "A directed graph is a data structure ...", and hypernym extraction adds graph and data structure, so the hypernym graph now contains flow network → directed graph, flow network → network, directed graph → graph and directed graph → data structure.
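The iterative growth can be sketched as a breadth-first expansion in which each iteration looks up hypernyms of the terms found in the previous one. The sketch assumes a `get_hypernyms` callback standing in for definition and hypernym extraction, stops expanding at a given set of upper terms (an assumption suggested by the workflow's "Upper terms" input, not stated on this slide), and defaults to 5 iterations. The function name and the lookup table are hypothetical.

```python
def grow_hypernym_graph(seed_terms, get_hypernyms, upper_terms=frozenset(), iterations=5):
    """Breadth-first growth of a hypernym graph (illustrative sketch).

    Each iteration looks up hypernyms for the terms discovered in the previous
    one. Terms in `upper_terms` are added as nodes but not expanded further
    (an assumption of this sketch).
    """
    edges = set()
    seen = set(seed_terms)
    frontier = set(seed_terms)
    for _ in range(iterations):
        next_frontier = set()
        for term in frontier:
            for hypernym in get_hypernyms(term):
                edges.add((term, hypernym))
                if hypernym not in seen:
                    seen.add(hypernym)
                    if hypernym not in upper_terms:
                        next_frontier.add(hypernym)
        if not next_frontier:
            break  # nothing new to look up
        frontier = next_frontier
    return edges


# Hypothetical lookup table standing in for Web definition + hypernym extraction.
LOOKUP = {
    "flow network": ["directed graph", "network"],
    "directed graph": ["graph", "data structure"],
    "data structure": ["abstraction"],
    "abstraction": ["entity"],
}

edges = grow_hypernym_graph(
    ["flow network"], lambda t: LOOKUP.get(t, []), upper_terms={"abstraction"}
)
for edge in sorted(edges):
    print(edge)
```

Because "abstraction" is treated as an upper term here, its own hypernym "entity" is never looked up.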
Hypernym Extraction Algorithm (1)
• Large training set with many uncommon patterns, e.g. "X is a ADJ term that refers to a kind of Y".
• Each sentence is annotated with 4 fields: the definiendum (D), the definitor (V) containing the verbal pattern, the definiens (H) containing the hypernym, and the rest of the sentence (R). Example, with fields separated by /:
An <acid> (often represented by the generic formula HA) / is traditionally considered / any chemical compound / that, when dissolved in water, gives a solution with a hydrogen ion activity greater than in pure water
• The algorithm builds a set of word lattices from the training set. Independent lattices are created for each of the 3 basic fields (D, V and H).
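To make the four-field annotation concrete, the sketch below parses a sentence written in the same slash-separated layout as the example above, with the defined term marked in angle brackets. The `AnnotatedDefinition` class and the input format accepted by `parse_annotation` are assumptions for illustration, not the format of the actual training set.

```python
import re
from dataclasses import dataclass


@dataclass
class AnnotatedDefinition:
    definiendum: str  # D: the defined term, marked as <term>, plus its context
    definitor: str    # V: the verbal pattern, e.g. "is"
    definiens: str    # H: the phrase containing the hypernym
    rest: str         # R: the rest of the sentence
    target: str       # the term inside <...>


def parse_annotation(line: str) -> AnnotatedDefinition:
    """Parse a 'D / V / H / R' annotated sentence (hypothetical input format)."""
    # maxsplit=3 lets the free-text rest R contain further slashes.
    fields = [f.strip() for f in line.split("/", 3)]
    if len(fields) != 4:
        raise ValueError(f"expected 4 '/'-separated fields, got {len(fields)}")
    d, v, h, r = fields
    marked = re.search(r"<([^>]+)>", d)
    if marked is None:
        raise ValueError("the definiendum must mark the defined term as <term>")
    return AnnotatedDefinition(d, v, h, r, marked.group(1))


example = parse_annotation(
    "In computer science, a <pixel> / is / a dot / that is part of a computer image."
)
print(example.target, "|", example.definitor, "|", example.definiens)  # pixel | is | a dot
```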
Hypernym Extraction Algorithm (2)
Lattice learning consists of three steps:
1. Star patterns: each sentence in the training set is pre-processed and each field is generalized to a star pattern. For example, "[In arts, a chiaroscuro]D [is]V [a monochrome picture]H." becomes D = "In *, a <TARGET>", V = "is", H = "a * <HYPER>".
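Star-pattern generalization can be sketched as follows, assuming a fixed set of frequent words that are kept verbatim while every other word becomes `*`; the target and hypernym spans become `<TARGET>` and `<HYPER>`. The frequent-word set, the merging of consecutive stars and the function names are assumptions of this sketch.

```python
import re


def tokenize(text: str):
    """Split into words and punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text)


def star_pattern(tokens, frequent, target=(), hyper=()):
    """Generalize a field to a star pattern.

    Tokens of the target term become <TARGET>, tokens of the hypernym become
    <HYPER>, words in `frequent` are kept, and any other word becomes '*'.
    Consecutive stars are merged into one (an assumption of this sketch).
    """
    target, hyper = list(target), list(hyper)
    out, i = [], 0
    while i < len(tokens):
        if target and tokens[i:i + len(target)] == target:
            out.append("<TARGET>")
            i += len(target)
            continue
        if hyper and tokens[i:i + len(hyper)] == hyper:
            out.append("<HYPER>")
            i += len(hyper)
            continue
        symbol = tokens[i] if tokens[i].lower() in frequent else "*"
        if not (symbol == "*" and out and out[-1] == "*"):
            out.append(symbol)
        i += 1
    return " ".join(out)


FREQUENT = {"in", ",", "a", "an", "the", "is", "of"}  # hypothetical frequent-word set
d = star_pattern(tokenize("In arts, a chiaroscuro"), FREQUENT, target=["chiaroscuro"])
h = star_pattern(tokenize("a monochrome picture"), FREQUENT, hyper=["picture"])
print(d)  # In * , a <TARGET>
print(h)  # a * <HYPER>
```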
Hypernym Extraction Algorithm (3)
2. Clustering: for each field, the training sentences are clustered according to the star pattern they match. For example, the sentences
• "In arts, a chiaroscuro is a monochrome picture."
• "In mathematics, a graph is a data structure that consists of ..."
• "In computer science, a pixel is a dot that is part of a computer image."
are grouped under D: "In * , a <TARGET>", V: "is", H: "a * <HYPER>".
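Clustering by star pattern amounts to grouping sentences with identical patterns, separately for each field. The sketch below uses hypothetical pattern triples for the three example sentences, as a star-pattern step that merges consecutive stars might produce them; `cluster_by_field` is an illustrative name.

```python
from collections import defaultdict


def cluster_by_field(examples):
    """Group sentence ids by (field, star pattern).

    `examples` is an iterable of (sentence_id, (d_pattern, v_pattern, h_pattern)).
    Each field is clustered independently, so one sentence appears in exactly
    one D cluster, one V cluster and one H cluster.
    """
    clusters = defaultdict(list)
    for sentence_id, patterns in examples:
        for field, pattern in zip(("D", "V", "H"), patterns):
            clusters[(field, pattern)].append(sentence_id)
    return dict(clusters)


examples = [
    ("chiaroscuro", ("In * , a <TARGET>", "is", "a * <HYPER>")),
    ("graph", ("In * , a <TARGET>", "is", "a * <HYPER>")),
    ("pixel", ("In * , a <TARGET>", "is", "a <HYPER>")),
]
for key, members in cluster_by_field(examples).items():
    print(key, members)
```

With these inputs the D and V clusters contain all three sentences, while H splits into two clusters because "a dot" has no modifier before the hypernym.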
Hypernym Extraction Algorithm (4)
3. Word-Class Lattice construction: for each sentence cluster, a Word-Class Lattice (WCL) is built by means of a greedy alignment algorithm.
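The slides do not detail the greedy alignment. The sketch below is a simplified stand-in, assuming each new sequence of a cluster is aligned only against the cluster's first sequence (the backbone) using the matching blocks of Python's `difflib.SequenceMatcher`: matched tokens reuse backbone nodes, unmatched tokens get fresh nodes, and every training sequence becomes a start-to-end path. The `Lattice` class and its methods are hypothetical, and the actual WCL alignment differs in detail.

```python
from difflib import SequenceMatcher


class Lattice:
    """A word lattice: a DAG whose start-to-end paths spell accepted sequences."""

    START, END = 0, 1

    def __init__(self):
        self.labels = {self.START: "<s>", self.END: "</s>"}
        self.edges = set()
        self.backbone = []  # node ids of the first sequence, in order

    def _new_node(self, label):
        node = len(self.labels)
        self.labels[node] = label
        return node

    def _add_path(self, nodes):
        path = [self.START, *nodes, self.END]
        self.edges.update(zip(path, path[1:]))

    def add(self, tokens):
        """Greedily align `tokens` to the backbone and add the resulting path."""
        if not self.backbone:
            self.backbone = [self._new_node(t) for t in tokens]
            self._add_path(self.backbone)
            return
        reference = [self.labels[n] for n in self.backbone]
        matcher = SequenceMatcher(None, reference, tokens, autojunk=False)
        path, j = [], 0  # j: first token of `tokens` not yet placed on the path
        for block in matcher.get_matching_blocks():
            # Tokens before this matching block have no counterpart: new nodes.
            path.extend(self._new_node(t) for t in tokens[j:block.b])
            # Matched tokens reuse the backbone's nodes.
            path.extend(self.backbone[block.a:block.a + block.size])
            j = block.b + block.size
        self._add_path(path)

    def accepts(self, tokens):
        """True if some start-to-end path spells exactly `tokens`."""
        current = {self.START}
        for tok in tokens:
            current = {v for (u, v) in self.edges
                       if u in current and self.labels[v] == tok}
            if not current:
                return False
        return any((u, self.END) in self.edges for u in current)


lattice = Lattice()
for pattern in ["In * , a <TARGET>", "In * , the <TARGET>", "Within * , a <TARGET>"]:
    lattice.add(pattern.split())
print(lattice.accepts("Within * , the <TARGET>".split()))  # True
```

Because paths share nodes, the lattice accepts "Within * , the <TARGET>", which matches none of the three training patterns exactly: this is the kind of generalization a lattice offers over matching the raw patterns one by one.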
Performance in definition extraction
Evaluated on Wikipedia and on the UKWac corpus: WCLs outperform existing methods for definition extraction.
Precision in hypernym extraction
Evaluated on Wikipedia and UKWac. Pattern-based methods achieve much lower recall: they extract only 62 hypernyms from UKWac, versus 383.
The iterative growth of the hypernym graph
Graph Pruning
Given the hypernym graph:
1) We disconnect false roots and false leaves.
2) We weight edges and nodes with a novel weighting algorithm.
3) We apply the Chu-Liu/Edmonds algorithm to obtain an optimal branching.
As a result, we obtain a tree-like taxonomy.
From the Noisy Hypernym Graph...
Application to ACL Taxonomy
• ACL Anthology from 1979 to 2010 (4,176 papers).
• 29 upper terms from WordNet's abstraction hierarchy.
• 10,000 terms extracted; the first 2,000 inspected and 1,006 selected (eliminated terms include, e.g., word pair, input sentence, human judgement).
• 5 iterations: 1,329 definitions, yielding a hypernym graph of 1,031 nodes and 1,274 edges.
• After pruning: 936 nodes and 935 edges, i.e. a single tree, since a tree on n nodes has n − 1 edges.
Evaluation (5 annotators)
A second application of the algorithm, starting from the IJCAI collection, gave similar results.
Evaluation: WordNet reconstruction
• Same evaluation strategy as in Kozareva & Hovy (EMNLP 2010).
• Only nodes that appear both in WordNet and in the acquired taxonomy are considered in the evaluation (as in K&H).
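A minimal sketch of such a restricted comparison, assuming both taxonomies are given as {child: parent} maps and that is-a pairs are compared after transitive closure, so that a node missing from one taxonomy does not hide an ancestor relation between shared nodes. This is an assumption; Kozareva & Hovy's exact scoring may differ. `restricted_scores` and the toy taxonomies are hypothetical, not WordNet data.

```python
def ancestor_pairs(parent_of):
    """All (descendant, ancestor) pairs of a taxonomy given as {child: parent}."""
    pairs = set()
    for child in parent_of:
        node, seen = child, set()
        while node in parent_of and node not in seen:  # `seen` guards against cycles
            seen.add(node)
            node = parent_of[node]
            pairs.add((child, node))
    return pairs


def restricted_scores(learned, gold):
    """Precision/recall of is-a pairs, counting only nodes present in both taxonomies."""
    shared = (set(learned) | set(learned.values())) & (set(gold) | set(gold.values()))

    def keep(pairs):
        return {(a, b) for a, b in pairs if a in shared and b in shared}

    found, expected = keep(ancestor_pairs(learned)), keep(ancestor_pairs(gold))
    correct = len(found & expected)
    precision = correct / len(found) if found else 0.0
    recall = correct / len(expected) if expected else 0.0
    return precision, recall


learned = {"flow network": "graph", "directed graph": "graph", "graph": "data structure"}
gold = {"flow network": "directed graph", "directed graph": "graph", "graph": "abstraction"}
print(restricted_scores(learned, gold))  # (1.0, 0.6666666666666666)
```

"data structure" and "abstraction" each occur in only one taxonomy, so pairs involving them are ignored; the learned taxonomy misses flow network is-a directed graph, which lowers recall but not precision.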
Future work
• From a "strict" taxonomy to a lattice.
• An in-house implementation of Google "define" to overcome search limitations (there is no API for Google define).
• Extension to other languages.
http://lcl.uniroma1.it April 2011
Figure legend (iterative growth of the hypernym graph): initial terminology; upper terms; hypernyms from iterations I, II, III, IV and V.