
Graph-Based Methods for “Open Domain” Information Extraction



Presentation Transcript


  1. Graph-Based Methods for “Open Domain” Information Extraction William W. Cohen Machine Learning Dept. and Language Technologies Institute School of Computer Science Carnegie Mellon University

  2. Traditional IE vs. Open-Domain IE • Traditional IE goal: recognize people, places, companies, times, dates, … in NL text • Supervised learning from a corpus completely annotated with the target entity class (e.g. “people”) • Linear-chain CRFs • Language- and genre-specific extractors • Open-domain IE goal: recognize arbitrary entity sets in text, given minimal info about the entity class • Example 1: “ICML, NIPS” • Example 2: “Machine learning conferences” • Semi-supervised learning from very large corpora (WWW) • Graph-based learning methods • Techniques are largely language-independent (!) • The graph abstraction fits many languages

  3. Examples with three seeds

  4. Outline • History • Open-domain IE by pattern-matching • The bootstrapping-with-noise problem • Bootstrapping as a graph walk • Open-domain IE as finding nodes “near” seeds on a graph • Approach 1: A “natural” graph derived from a smaller corpus + learned similarity • Approach 2: A carefully-engineered graph derived from a huge corpus (examples above)

  5. History: Open-domain IE by pattern-matching (Hearst, 92) • Start with seeds: “NIPS”, “ICML” • Look through a corpus for certain patterns: • “… at NIPS, AISTATS, KDD and other learning conferences…” • “on PC of KDD, SIGIR, … and…” • Expand from seeds to new instances • Repeat… until ___
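A minimal sketch of this bootstrapping loop, assuming a toy corpus of sentences and a crude regular expression for coordinate lists (both hypothetical; real systems use richer patterns and stopping criteria):

```python
import re

def bootstrap(corpus, seeds, rounds=3):
    """Hearst-style bootstrapping: find coordinate lists that contain
    known instances, then harvest the remaining list members."""
    known = set(seeds)
    for _ in range(rounds):
        new = set()
        for sentence in corpus:
            # Crude pattern for lists like "at NIPS, AISTATS, KDD and ..."
            m = re.search(r'((?:[A-Z][\w-]*, )+[A-Z][\w-]*(?: and [A-Z][\w-]*)?)',
                          sentence)
            if not m:
                continue
            items = {s.strip() for s in re.split(r',| and ', m.group(1)) if s.strip()}
            # Only trust a list that already contains a known instance
            if items & known:
                new |= items - known
        if not new:          # stop when no new instances are found
            break
        known |= new
    return known

corpus = ["... at NIPS, AISTATS, KDD and other learning conferences ...",
          "... on PC of KDD, SIGIR, and others ..."]
print(bootstrap(corpus, {"NIPS", "ICML"}))  # adds AISTATS, KDD, then SIGIR
```

Note that SIGIR is only reached in the second round, once KDD has been accepted: later iterations correspond to longer chains of evidence.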

  6. Bootstrapping as graph proximity [Figure: a graph connecting instances (NIPS, SNOWBIRD, AISTATS, SIGIR, KDD, …) to the contexts that mention them: “…at NIPS, AISTATS, KDD and other learning conferences…”, “For skiers, NIPS, SNOWBIRD, … and…”, “on PC of KDD, SIGIR, … and…”] • Shorter paths ~ earlier iterations • Many paths ~ additional evidence

  7. Outline • Open-domain IE as finding nodes “near” seeds on a graph • Approach 1: A “natural” graph derived from a smaller corpus + learned similarity (with Einat Minkov, CMU → Nokia) • Approach 2: A carefully-engineered graph derived from a huge corpus (with Richard Wang, CMU → ?)

  8. Learning Similarity Measures for Parsed Text (Minkov & Cohen, EMNLP 2008) [Figure: dependency parse of “Boys like playing with all kinds of cars”, with edges nsubj, partmod, prep.with, det, prep.of and POS tags NN, VB, DT] • A dependency-parsed sentence is naturally represented as a tree

  9. Learning Similarity Measures for Parsed Text (Minkov & Cohen, EMNLP 2008) • A dependency-parsed corpus is “naturally” represented as a graph

  10. Learning Similarity Measures for Parsed Text (Minkov & Cohen, EMNLP 2008) • Open IE goal: find “coordinate terms” (e.g., girl/boy, dolls/cars) in the graph, or equivalently, find a similarity measure S so that S(girl, boy) is high • What about off-the-shelf similarity measures? • Random Walk with Restart (RWR) • Hitting time • Commute time • …

  11. Personalized PR/RWR • A query language: Q = { … } • The graph: nodes have a node type; edges have a label and a weight • Returns a list of nodes (of the requested type) ranked by the graph-walk probabilities • Graph-walk parameters: edge weights Θ, walk length K, and reset probability γ • M[x,y] = prob. of reaching y from x in one step: the edge weight from x to y, out of the total outgoing weight from x • “Personalized PageRank”: reset probability biased towards the initial distribution • Approximate with power iteration, cut off after a fixed number of iterations K
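A minimal power-iteration sketch of the walk just described, with a hypothetical dense adjacency matrix (a real implementation would use sparse matrices and per-edge-type weights Θ):

```python
import numpy as np

def personalized_pagerank(adj, seed_idx, gamma=0.5, K=6):
    """Power iteration for Personalized PageRank / RWR.
    adj[x, y] = weight of edge x -> y; rows are normalized so that
    M[x, y] is the probability of stepping from x to y.
    gamma     = reset probability toward the seed distribution.
    K         = fixed number of iterations (walk-length cutoff)."""
    M = adj / adj.sum(axis=1, keepdims=True)   # row-stochastic transition matrix
    p0 = np.zeros(adj.shape[0])
    p0[seed_idx] = 1.0 / len(seed_idx)         # uniform over the seed nodes
    p = p0.copy()
    for _ in range(K):
        p = gamma * p0 + (1 - gamma) * (p @ M)
    return p                                   # node scores: proximity to seeds

# Toy 4-node graph with symmetric (inverse) edges: 0-1, 1-2, 1-3
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 1],
                [0, 1, 0, 0],
                [0, 1, 0, 0]], dtype=float)
print(personalized_pagerank(adj, seed_idx=[0]))
```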

  12. [Graph-walk example: girls -mention-> girls1 -nsubj-> like1 -mention-1-> like -mention-> like2 -nsubj-1-> boys2 -mention-1-> boys]

  13. [Graph-walk example, second path: girls -mention-> girls1 -nsubj-> like1 -partmod-> playing1 -mention-1-> playing -mention-> … -mention-1-> boys]

  14. [Graph-walk example: girls -mention-> girls1 -nsubj-> like1 -partmod-> playing1 -prep.with-> dolls1 -mention-1-> dolls] • Useful, but not our goal here…

  15. Learning a better similarity metric • Task T (query class): seed words (“girl”, “boy”, …) define queries a, b, …, q, each paired with its relevant answers • [Diagram: each query is run through a GRAPH WALK, producing a ranked list of nodes (rank 1, 2, 3, …, 50)] • Highly-ranked nodes are potential new instances of the target concept (“doll”, “child”, “toddler”, …)

  16. Learning methods • Weight tuning: weights learned per edge type [Diligenti et al., 2005] • Reranking: re-order the retrieved list using global features of all paths from source (boys) to destination (dolls) [Minkov et al., 2006] • Features: edge-label sequences (nsubj.nsubj-1; nsubj -> partmod -> prep.in; nsubj -> partmod -> partmod-1 -> nsubj-1) • Lexical unigrams (“like”, “playing”)
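A sketch of how such reranking features might be computed, assuming each path is given as a list of (edge label, reached node) pairs; the path set and the crude word-node test are hypothetical simplifications:

```python
from collections import Counter

def path_features(paths):
    """Global features of all paths from a source to a candidate node,
    in the spirit of the reranking step: full edge-label sequences plus
    lexical unigrams of intermediate word nodes."""
    feats = Counter()
    for path in paths:
        labels = [lab for lab, _ in path]
        feats["seq:" + ".".join(labels)] += 1     # edge-label sequence feature
        for _, node in path[:-1]:                 # intermediate nodes only
            if node.isalpha():                    # crude test for word nodes
                feats["lex:" + node] += 1         # lexical unigram feature
    return feats

# Two hypothetical paths from "girls" to "boys"
paths = [
    [("nsubj", "like"), ("nsubj-1", "boys")],
    [("nsubj", "like"), ("partmod", "playing"), ("partmod-1", "like"),
     ("nsubj-1", "boys")],
]
print(path_features(paths))
```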

  17. Learning methods: Path-Constrained Graph Walk • PCW (summary): for each node x, learn P(x -> z : relevant(z) | history(Vq, x)) • history(Vq, x) = sequence of edge labels leading from Vq to x, with all histories stored in a tree • [Diagram: a path tree rooted at Vq = “girls”, with branches nsubj -> nsubj-1, nsubj -> partmod -> prep.in, and nsubj -> partmod -> partmod-1 -> nsubj-1, leading to candidate answers boys and dolls]
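A minimal sketch of such a history tree, assuming relevance-labeled training paths (hypothetical); real PCW estimates these probabilities during training walks and prunes low-probability branches, e.g. with the 0.5 threshold mentioned on the next slide:

```python
class PathTree:
    """Stores edge-label histories from the query nodes Vq in a tree;
    each tree node counts how often paths with that history reached a
    relevant vs. irrelevant answer, giving P(relevant | history)."""
    def __init__(self):
        self.children = {}
        self.pos = 0
        self.neg = 0

    def add(self, history, relevant):
        node = self
        for label in history:
            node = node.children.setdefault(label, PathTree())
            if relevant:
                node.pos += 1
            else:
                node.neg += 1

    def prob(self, history):
        node = self
        for label in history:
            if label not in node.children:
                return 0.0
            node = node.children[label]
        return node.pos / max(node.pos + node.neg, 1)

tree = PathTree()
tree.add(["nsubj", "nsubj-1"], relevant=True)                 # reaches boys
tree.add(["nsubj", "partmod", "prep.with"], relevant=False)   # reaches dolls
print(tree.prob(["nsubj", "nsubj-1"]))   # 1.0
```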

  18. City and person name extraction • City names: Vq = {sydney, stamford, greenville, los_angeles} • Person names: Vq = {carter, dave_kingman, pedro_ramos, florio} • Labeling: complete or partial/noisy • 10 (×4) queries for each task • Train on queries q1–q5 / test on queries q6–q10 • Extract nodes of type NE • GW: 6 steps, uniform/learned weights • Reranking: top 200 nodes (using learned weights) • Path trees: 20 correct / 20 incorrect; threshold 0.5

  19. [Figure: precision vs. rank on MUC, for city names and person names]

  20. [Same MUC precision-vs.-rank figure, annotated with the most useful edge types: city names: conj-and, prep-in, nn, appos; person names: subj, obj, poss, nn]

  21. [Same MUC figure, annotated with high-weight edge sequences: city names: prep-in-1 -> conj-and, nn-1 -> nn; person names: nsubj -> nsubj-1, appos -> nn-1]

  22. [Same MUC figure, annotated with lexical reranking features: city names: LEX.”based”, LEX.”downtown”; person names: LEX.”mr”, LEX.”president”]

  23. Vector-space models • Co-occurrence vectors (counts; window: +/- 2) • Dependency vectors [Padó & Lapata, Comp Ling 07] • A path value function: • Length-based value: 1 / length(path) • Relation based value: subj-5, obj-4, obl-3, gen-2, else-1 • Context selection function: • Minimal: verbal predicate-argument (length 1) • Medium: coordination, genitive construction, noun compounds (<=3) • Maximal: combinations of the above (<=4) • Similarity function: • Cosine • Lin • Only score the top nodes retrieved with reranking (~1000 overall)
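A sketch of the co-occurrence baseline with cosine similarity, on toy sentences (hypothetical data; the dependency-vector variants would additionally weight contexts by the path-value and context-selection functions listed above):

```python
from collections import Counter, defaultdict
import math

def cooccurrence_vectors(sentences, window=2):
    """Count co-occurrences within +/- `window` tokens, as in the
    co-occurrence-vector baseline on the slide."""
    vecs = defaultdict(Counter)
    for toks in sentences:
        for i, w in enumerate(toks):
            for j in range(max(0, i - window), min(len(toks), i + window + 1)):
                if j != i:
                    vecs[w][toks[j]] += 1
    return vecs

def cosine(u, v):
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = (math.sqrt(sum(x * x for x in u.values()))
            * math.sqrt(sum(x * x for x in v.values())))
    return dot / norm if norm else 0.0

sents = [["boys", "like", "playing", "with", "cars"],
         ["girls", "like", "playing", "with", "dolls"]]
vecs = cooccurrence_vectors(sents)
print(cosine(vecs["boys"], vecs["girls"]))   # 1.0: identical contexts
```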

  24. Graph walks vs. vector models [Figure: precision vs. rank on MUC, city names and person names] • The graph-based methods are best (syntactic + learning)

  25. Graph walks vs. vector models, MUC + AP [Figure: precision vs. rank, city names and person names] • The advantage of the graph-based models diminishes with the amount of data • This is hard to evaluate at high ranks

  26. Outline • Open-domain IE as finding nodes “near” seeds on a graph • Approach 1: A “natural” graph derived from a smaller corpus + learned similarity (with Einat Minkov, CMU → Nokia) • Approach 2: A carefully-engineered graph derived from a huge corpus (with Richard Wang, CMU → ?)

  27. Set Expansion for Any Language (SEAL) – (Wang & Cohen, ICDM 07) • Basic ideas • Dynamically build the graph using queries to the web • Constrain the graph to be as useful as possible • Be smart about queries • Be smart about “patterns”: use clever methods for finding meaningful structure on web pages

  28. System Architecture • Fetcher: download web pages from the Web that contain all the seeds • Extractor: learn wrappers from web pages • Ranker: rank entities extracted by wrappers • [Example: seeds Canon, Nikon, Olympus expand to Pentax, Sony, Kodak, Minolta, Panasonic, Casio, Leica, Fuji, Samsung, …]

  29. The Extractor • Learn wrappers from web documents and seeds on the fly • Utilize semi-structured documents • Wrappers defined at character level • Very fast • No tokenization required; thus language independent • Wrappers derived from doc d applied to d only • See ICDM 2007 paper for details
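A character-level sketch of the idea, assuming the wrapper is simply the longest common left/right context around one occurrence of each seed in a page (the actual construction in the ICDM 2007 paper is more involved):

```python
import os

def common_contexts(page, seeds):
    """Character-level wrapper sketch: find the longest left and right
    contexts shared by one occurrence of each seed in this page. The
    wrapper is applied only to the page it was derived from."""
    spans = []
    for s in seeds:
        i = page.find(s)
        if i < 0:
            return None                      # the page must contain all seeds
        spans.append((page[:i], page[i + len(s):]))
    # Longest common suffix of the left contexts (via reversed prefixes)
    left = os.path.commonprefix([l[::-1] for l, _ in spans])[::-1]
    # Longest common prefix of the right contexts
    right = os.path.commonprefix([r for _, r in spans])
    return left, right

page = "<li>Canon</li><li>Nikon</li><li>Olympus</li><li>Pentax</li>"
print(common_contexts(page, ["Canon", "Nikon"]))   # ('<li>', '</li><li>')
```

Anything bracketed by the learned contexts on the same page (here, Olympus) becomes a candidate mention; no tokenization is involved, which is what makes the method language-independent.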

  30. [Screenshot: a list page with two planted noisy items, labeled “I am noise” and “Me too!”]

  31. The Ranker • Rank candidate entity mentions based on “similarity” to seeds • Noisy mentions should be ranked lower • Random Walk with Restart (GW) • As before… • What’s the graph?

  32. Building a Graph • A graph consists of a fixed set of… • Node types: {seeds, document, wrapper, mention} • Labeled directed edges: {find, derive, extract} • Each edge asserts that a binary relation r holds • Each edge has an inverse relation r-1 (graph is cyclic) • Intuition: good extractions are extracted by many good wrappers, and good wrappers extract many good extractions • [Diagram: seeds “ford”, “nissan”, “toyota” -find-> northpointcars.com, curryauto.com -derive-> Wrappers #1–#4 -extract-> “honda” 26.1%, “acura” 34.6%, “chevrolet” 22.5%, “volvo chicago” 8.4%, “bmw pittsburgh” 8.4%]
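A sketch of the graph assembly with inverse edges, using hypothetical node names taken from the diagram; the resulting adjacency structure is what the random walk with restart runs over:

```python
from collections import defaultdict

def build_graph(triples):
    """Assemble the SEAL graph: every asserted relation r(x, y) also adds
    the inverse edge r-1(y, x), so the graph is cyclic and can be walked
    in both directions."""
    edges = defaultdict(list)
    for r, x, y in triples:
        edges[x].append((r, y))
        edges[y].append((r + "-1", x))
    return edges

graph = build_graph([
    ("find",    "seeds",              "northpointcars.com"),
    ("find",    "seeds",              "curryauto.com"),
    ("derive",  "northpointcars.com", "wrapper1"),
    ("extract", "wrapper1",           "honda"),
    ("extract", "wrapper1",           "chevrolet"),
])
print(graph["honda"])   # [('extract-1', 'wrapper1')]
```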

  33. Evaluation Datasets: closed sets

  34. Evaluation Method • Mean Average Precision (MAP): commonly used for evaluating ranked lists in IR; contains recall- and precision-oriented aspects; sensitive to the entire ranking; the mean of the average precisions of the ranked lists • AvgPrec(L) = (1 / #true entities) × Σ_r Prec(r) · NewEntity(r), where L = ranked list of extracted mentions, Prec(r) = precision at rank r, NewEntity(r) = 1 iff (a) the extracted mention at rank r matches a true mention and (b) no extracted mention at rank less than r refers to the same entity, and #true entities = total number of true entities in this dataset • Evaluation procedure (per dataset): (1) randomly select three true entities and use their first listed mentions as seeds; (2) expand the three seeds; (3) repeat steps 1 and 2 five times; (4) compute MAP over the five ranked lists
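One reading of this definition as code, assuming a gold mapping from true mentions to entity IDs (names and data hypothetical):

```python
def average_precision(ranked_mentions, mention_to_entity, num_true_entities):
    """Average precision as defined on the slide: a rank r contributes
    only if the mention there matches a true entity not already seen at
    an earlier rank (conditions (a) and (b))."""
    seen, hits, ap = set(), 0, 0.0
    for r, mention in enumerate(ranked_mentions, start=1):
        entity = mention_to_entity.get(mention)     # None = noise
        if entity is not None and entity not in seen:
            seen.add(entity)
            hits += 1
            ap += hits / r                          # Prec(r) at each new entity
    return ap / num_true_entities

gold = {"nyc": "new_york", "new york": "new_york", "boston": "boston"}
ranked = ["nyc", "new york", "boston"]              # second mention is a repeat
print(average_precision(ranked, gold, num_true_entities=3))   # ~0.556
```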

  35. Experimental Results: 3 seeds • Vary: [Extractor] + [Ranker] + [Top N URLs] • Extractor: • E1: Baseline Extractor (longest common context for all seed occurrences) • E2: Smarter Extractor (longest common context for 1 occurrence of each seed) • Ranker: { EF: Baseline (Most Frequent), GW: Graph Walk } • N URLs: { 100, 200, 300 }

  36. Side by side comparisons • Talukdar, Brants, Liberman, Pereira, CoNLL 06

  37. Side by side comparisons • EachMovie vs. WWW • NIPS vs. WWW • Ghahramani & Heller, NIPS 2005

  38. A limitation of the original SEAL

  39. Proposed Solution: Iterative SEAL (iSEAL) (Wang & Cohen, ICDM 2008) • Makes several calls to SEAL; each call… • Expands a couple of seeds • Aggregates statistics • Evaluate iSEAL using… • Two iterative processes: supervised vs. unsupervised (bootstrapping) • Two seeding strategies: fixed seed size vs. increasing seed size • Five ranking methods

  40. iSEAL (Fixed Seed Size, Supervised) • Start from the initial seeds; each call to SEAL expands a few of them • Finally rank nodes by proximity to seeds in the full graph • Refinement (ISS): increase the size of the seed set for each expansion over time: 2, 3, 4, 4, … • Variant (Bootstrap): use high-confidence extractions when seeds run out
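A sketch of the iterative loop, where `expand` stands in for one SEAL call (a hypothetical signature returning scored mentions) and statistics are aggregated across calls; a fixed seed size corresponds to a constant `seed_sizes`:

```python
def iseal(initial_seeds, expand, iterations=4, seed_sizes=(2, 3, 4, 4)):
    """iSEAL sketch (supervised, ISS variant): each iteration expands a
    few seeds via one SEAL call, aggregates the returned scores, and the
    final ranking reflects proximity to seeds across the full graph."""
    pool = list(initial_seeds)
    scores = {}
    for it in range(iterations):
        k = seed_sizes[min(it, len(seed_sizes) - 1)]
        seeds, pool = pool[:k], pool[k:] + pool[:k]   # rotate through seeds
        for mention, score in expand(seeds).items():
            scores[mention] = scores.get(mention, 0.0) + score  # aggregate
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical stand-in for a SEAL call, for illustration only
demo = lambda seeds: {s + "_new": 1.0 for s in seeds}
print(iseal(["canon", "nikon", "olympus", "pentax"], demo)[:3])
```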

  41. Ranking Methods Random Graph Walk with Restart • H. Tong, C. Faloutsos, and J.-Y. Pan. Fast random walk with restart and its application. In ICDM, 2006. PageRank • L. Page, S. Brin, R. Motwani, and T. Winograd. The PageRank citation ranking: Bringing order to the web. 1998. Bayesian Sets (over flattened graph) • Z. Ghahramani and K. A. Heller. Bayesian sets. In NIPS, 2005. Wrapper Length • Weights each item based on the length of common contextual string of that item and the seeds Wrapper Frequency • Weights each item based on the number of wrappers that extract the item
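A sketch of the two baseline rankers from this slide, assuming each wrapper records the length of its common contextual string and the items it extracted (summing the context lengths per item is an assumption; the slide does not specify the aggregation):

```python
from collections import defaultdict

def wrapper_rankers(extractions):
    """Baseline rankers: `extractions` maps each wrapper to a pair
    (context_length, extracted_items)."""
    freq, length = defaultdict(int), defaultdict(int)
    for ctx_len, items in extractions.values():
        for item in items:
            freq[item] += 1            # Wrapper Frequency: # wrappers per item
            length[item] += ctx_len    # Wrapper Length: summed context length
    return freq, length

freq, length = wrapper_rankers({
    "w1": (9,  ["honda", "acura"]),
    "w2": (14, ["honda", "bmw pittsburgh"]),
})
print(sorted(freq, key=freq.get, reverse=True))   # 'honda' ranked first
```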

  42. • Little difference between ranking methods in the supervised case (all seeds correct); large differences when bootstrapping • Increasing seed size {2, 3, 4, 4, …} makes all ranking methods improve steadily in the bootstrapping case

  43. Current work • Start with name of concept (e.g., “NFL teams”) • Look for (language-dependent) patterns: • “… for successful NFL teams (e.g., Pittsburgh Steelers, New York Giants, …)” • Take most frequent answers as seeds • Run bootstrapping iSEAL with seed sizes 2,3,4,4….

  44. Datasets with concept names

  45. Experimental results: direct use of text patterns
