
Information Extraction, Conditional Random Fields, and Social Network Analysis

Andrew McCallum, Computer Science Department, University of Massachusetts Amherst. Joint work with Aron Culotta, Charles Sutton, Ben Wellner, Khashayar Rohanimanesh, Wei Li, Andres Corrada, Xuerui Wang.



Presentation Transcript


  1. Information Extraction, Conditional Random Fields, and Social Network Analysis. Andrew McCallum, Computer Science Department, University of Massachusetts Amherst. Joint work with Aron Culotta, Charles Sutton, Ben Wellner, Khashayar Rohanimanesh, Wei Li, Andres Corrada, Xuerui Wang.

  2. Goal: Mine actionable knowledge from unstructured text.

  3. Extracting Job Openings from the Web. Extracted record foodscience.com-Job2: JobTitle: Ice Cream Guru; Employer: foodscience.com; JobCategory: Travel/Hospitality; JobFunction: Food Services; JobLocation: Upper Midwest; Contact Phone: 800-488-2611; DateExtracted: January 8, 2001; Source: www.foodscience.com/jobs_midwest.html; OtherCompanyJobs: foodscience.com-Job1

  4. A Portal for Job Openings

  5. Job Openings: Category = High Tech, Keyword = Java, Location = U.S.

  6. Data Mining the Extracted Job Information

  7. IE from Chinese Documents regarding Weather. Department of Terrestrial System, Chinese Academy of Sciences. 200k+ documents several millennia old: Qing Dynasty Archives; memos; newspaper articles; diaries.

  8. IE from Research Papers [McCallum et al ‘99]

  9. IE from Research Papers

  10. Mining Research Papers [Rosen-Zvi, Griffiths, Steyvers, Smyth, 2004] [Giles et al]

  11. What is "Information Extraction"? As a family of techniques: Information Extraction = segmentation + classification + association + clustering. Example article: October 14, 2002, 4:00 a.m. PT. For years, Microsoft Corporation CEO Bill Gates railed against the economic philosophy of open-source software with Orwellian fervor, denouncing its communal licensing as a "cancer" that stifled technological innovation. Today, Microsoft claims to "love" the open-source concept, by which software code is made public to encourage improvement and development by outside programmers. Gates himself says Microsoft will gladly disclose its crown jewels--the coveted code behind the Windows operating system--to select customers. "We can be open source. We love the concept of shared source," said Bill Veghte, a Microsoft VP. "That's a super-important shift for us in terms of code access." Richard Stallman, founder of the Free Software Foundation, countered saying… Extracted segments: Microsoft Corporation; CEO; Bill Gates; Microsoft; Gates; Microsoft; Bill Veghte; Microsoft; VP; Richard Stallman; founder; Free Software Foundation

  12. What is "Information Extraction"? As a family of techniques: Information Extraction = segmentation + classification + association + clustering. Extracted segments: Microsoft Corporation; CEO; Bill Gates; Microsoft; Gates; Microsoft; Bill Veghte; Microsoft; VP; Richard Stallman; founder; Free Software Foundation

  13. What is "Information Extraction"? As a family of techniques: Information Extraction = segmentation + classification + association + clustering. Extracted segments: Microsoft Corporation; CEO; Bill Gates; Microsoft; Gates; Microsoft; Bill Veghte; Microsoft; VP; Richard Stallman; founder; Free Software Foundation

  14. What is "Information Extraction"? As a family of techniques: Information Extraction = segmentation + classification + association + clustering. Extracted segments: Microsoft Corporation; CEO; Bill Gates; Microsoft; Gates; Microsoft; Bill Veghte; Microsoft; VP; Richard Stallman; founder; Free Software Foundation. Resulting table:
      NAME             | TITLE   | ORGANIZATION
      Bill Gates       | CEO     | Microsoft
      Bill Veghte      | VP      | Microsoft
      Richard Stallman | founder | Free Software Foundation

  15. Larger Context: Document collection -> Spider -> Filter -> IE (Segment, Classify, Associate, Cluster) -> Database -> Data Mining (Discover patterns: entity types, links/relations, events) -> Actionable knowledge (Prediction, Outlier detection, Decision support)

  16. Outline
      • Examples of IE and Data Mining
      • Brief review of Conditional Random Fields
      • Joint inference: Motivation and examples
          • Joint Labeling of Cascaded Sequences (Belief Propagation)
          • Joint Labeling of Distant Entities (BP by Tree Reparameterization)
          • Joint Co-reference Resolution (Graph Partitioning)
          • Joint Segmentation and Co-ref (Iterated Conditional Samples)
      • Interactive IE
      • Two example projects
          • Email, contact management, and Social Network Analysis
          • Research Paper search and analysis

  17. Hidden Markov Models. HMMs are the standard sequence modeling tool in genomics, music, speech, NLP, … [Figure: graphical model and finite-state model; hidden states S_{t-1}, S_t, S_{t+1} linked by transitions, each emitting an observation O_{t-1}, O_t, O_{t+1}.] An HMM generates a state sequence and an observation sequence o1 o2 … o8. Parameters, for all states S = {s1, s2, …}: start state probabilities P(s_1); transition probabilities P(s_t | s_{t-1}); observation (emission) probabilities P(o_t | s_t), usually a multinomial over an atomic, fixed alphabet. Training: maximize the probability of the training observations (with a prior).
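The parameters above define a joint probability P(s, o) = P(s_1) P(o_1 | s_1) × Π_{t>1} P(s_t | s_{t-1}) P(o_t | s_t). The minimal sketch below assumes a hypothetical two-state model (`name` for person names, `bg` for background) with made-up probabilities. It computes that joint probability and the observation likelihood P(o), the quantity maximum-likelihood training maximizes, using the forward algorithm, and it checks the result against brute-force enumeration.

```python
from itertools import product

# Hypothetical toy HMM: "name" = person-name state, "bg" = background state.
STATES = ("name", "bg")
START = {"name": 0.2, "bg": 0.8}                      # P(s_1)
TRANS = {"name": {"name": 0.6, "bg": 0.4},            # P(s_t | s_{t-1})
         "bg": {"name": 0.1, "bg": 0.9}}
EMIT = {"name": {"Yesterday": 0.05, "Rich": 0.45, "Caruana": 0.45, "spoke": 0.05},
        "bg": {"Yesterday": 0.46, "Rich": 0.02, "Caruana": 0.02, "spoke": 0.5}}  # P(o_t | s_t)


def joint_probability(states, obs):
    """P(s, o) for one state sequence and one observation sequence."""
    p = START[states[0]] * EMIT[states[0]][obs[0]]
    for t in range(1, len(obs)):
        p *= TRANS[states[t - 1]][states[t]] * EMIT[states[t]][obs[t]]
    return p


def forward_likelihood(obs):
    """P(o) = sum over all state sequences of P(s, o), via the forward algorithm."""
    alpha = {s: START[s] * EMIT[s][obs[0]] for s in STATES}
    for o in obs[1:]:
        alpha = {s: sum(alpha[r] * TRANS[r][s] for r in STATES) * EMIT[s][o]
                 for s in STATES}
    return sum(alpha.values())


def brute_force_likelihood(obs):
    """Same quantity by explicit enumeration (exponential; for checking only)."""
    return sum(joint_probability(seq, obs) for seq in product(STATES, repeat=len(obs)))
```

Practical implementations work in log space (or rescale `alpha` at each step) so that long sequences do not underflow.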

  18. IE with Hidden Markov Models. Given a sequence of observations, "Yesterday Rich Caruana spoke this example sentence.", and a trained HMM with states person name, location name, and background, find the most likely state sequence (Viterbi): Yesterday [Rich Caruana] spoke this example sentence. Any words said to be generated by the designated "person name" state are extracted as a person name. Person name: Rich Caruana
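The Viterbi step can be sketched as a dynamic program that keeps, for each state, the probability of the best path ending there, plus a back-pointer. This sketch assumes a hypothetical two-state model (`name`, `bg`) with made-up probabilities rather than the three-state HMM on the slide. The helper `extract_names` is also hypothetical; it collects each run of words labeled with the person-name state.

```python
# Hypothetical toy HMM (not the slide's trained model).
STATES = ("name", "bg")
START = {"name": 0.2, "bg": 0.8}
TRANS = {"name": {"name": 0.6, "bg": 0.4}, "bg": {"name": 0.1, "bg": 0.9}}
EMIT = {"name": {"Yesterday": 0.05, "Rich": 0.45, "Caruana": 0.45, "spoke": 0.05},
        "bg": {"Yesterday": 0.46, "Rich": 0.02, "Caruana": 0.02, "spoke": 0.5}}


def viterbi(obs):
    """Return the most likely state sequence and its joint probability."""
    delta = {s: START[s] * EMIT[s][obs[0]] for s in STATES}  # best path prob ending in s
    back = []                                               # back-pointers per time step
    for o in obs[1:]:
        ptr, new = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda r: delta[r] * TRANS[r][s])
            ptr[s] = prev
            new[s] = delta[prev] * TRANS[prev][s] * EMIT[s][o]
        back.append(ptr)
        delta = new
    last = max(STATES, key=lambda s: delta[s])
    path = [last]
    for ptr in reversed(back):                              # follow back-pointers
        path.append(ptr[path[-1]])
    path.reverse()
    return path, delta[last]


def extract_names(obs, path, target="name"):
    """Group consecutive words labeled `target` into extracted spans."""
    spans, current = [], []
    for word, state in zip(obs, path):
        if state == target:
            current.append(word)
        elif current:
            spans.append(" ".join(current))
            current = []
    if current:
        spans.append(" ".join(current))
    return spans
```

With `obs = ["Yesterday", "Rich", "Caruana", "spoke"]`, the decoded path under these toy parameters is `bg, name, name, bg`, and `extract_names` yields `["Rich Caruana"]`.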

  19. We want More than an Atomic View of Words. We would like a richer representation of text: many arbitrary, overlapping features of the words, such as identity of word; ends in "-ski"; is capitalized; is part of a noun phrase; is in a list of city names; is under node X in WordNet; is in bold font; is indented; is in hyperlink anchor; last person name was female; next two words are "and Associates". [Figure: state S_t linked to observation O_t = "Wisniewski", which carries overlapping features such as is "Wisniewski", part of noun phrase, ends in "-ski".]
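The features listed above can be thought of as a set of binary predicates over a token and its context, many of which overlap. In this sketch the feature names and the `city_names` argument are hypothetical, and it covers only the purely textual features (not WordNet, fonts, or layout). Note that `word=wisniewski` and `ends_in_ski` always fire together, which is exactly the kind of dependency the next slide is concerned with.

```python
def token_features(tokens, i, city_names=frozenset()):
    """Overlapping binary features of token i (hypothetical feature names)."""
    w = tokens[i]
    feats = {"word=" + w.lower()}                 # identity of word
    if w.lower().endswith("ski"):
        feats.add("ends_in_ski")
    if w[:1].isupper():
        feats.add("is_capitalized")
    if w in city_names:
        feats.add("in_city_list")
    if i > 0:
        feats.add("prev_word=" + tokens[i - 1].lower())
    nxt = tokens[i + 1:i + 3]                     # next two words, if present
    if len(nxt) == 2:
        feats.add("next_two=" + "_".join(t.lower() for t in nxt))
    return feats
```

For example, `token_features(["Mr.", "Wisniewski", "and", "Associates"], 1)` fires `ends_in_ski`, `is_capitalized`, `prev_word=mr.` and `next_two=and_associates` at once.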

  20. Problems with Richer Representation and a Joint Model. These arbitrary features are not independent: • Multiple levels of granularity (chars, words, phrases) • Multiple dependent modalities (words, formatting, layout) • Past and future. Two choices: (1) Ignore the dependencies. This causes "over-counting" of evidence (à la naïve Bayes), a big problem when combining evidence, as in Viterbi! (2) Model the dependencies. Each state would have its own Bayes net, but we are already starved for training data! [Figure: two chains of states S_{t-1}, S_t, S_{t+1} emitting observations O_{t-1}, O_t, O_{t+1}, one for each choice.]
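Over-counting can be shown with a small numeric example. Under the naive Bayes independence assumption, each active feature adds its log likelihood ratio to the posterior log-odds. If a second feature is merely a redundant copy of the first (perfectly correlated with it), its evidence is counted twice. The prior and likelihood ratio below are hypothetical numbers chosen for illustration.

```python
import math


def naive_bayes_posterior(prior, likelihood_ratios):
    """P(class | features) assuming features are conditionally independent.

    prior: P(class); likelihood_ratios: P(f | class) / P(f | not class) per active feature.
    """
    log_odds = math.log(prior / (1.0 - prior)) + sum(math.log(r) for r in likelihood_ratios)
    return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical: "ends in -ski" is 4x more likely for person-name tokens.
one_copy = naive_bayes_posterior(0.5, [4.0])          # evidence counted once: 0.8
two_copies = naive_bayes_posterior(0.5, [4.0, 4.0])   # redundant copy: 16/17, about 0.94
```

The second feature adds no new information, yet the posterior rises from 0.8 to about 0.94. Inside Viterbi, the same inflation distorts every comparison between competing paths.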

  21. Conditional Sequence Models • We prefer a model that is trained to maximize a conditional probability rather than a joint probability: P(s|o) instead of P(s,o). • It can examine features, but is not responsible for generating them. • It does not have to explicitly model their dependencies. • It does not "waste modeling effort" trying to generate what we are given at test time anyway.
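The difference between the two training objectives can be written out with standard probability identities. The notation, including training pairs (s^(i), o^(i)), is introduced here and is not taken from the slides. The joint objective contains a term log P(o) for generating the observations, and the conditional objective drops it.

```latex
P(\mathbf{s} \mid \mathbf{o})
  = \frac{P(\mathbf{s}, \mathbf{o})}{P(\mathbf{o})}
  = \frac{P(\mathbf{s}, \mathbf{o})}{\sum_{\mathbf{s}'} P(\mathbf{s}', \mathbf{o})}

\mathcal{L}_{\mathrm{joint}}
  = \sum_i \log P(\mathbf{s}^{(i)}, \mathbf{o}^{(i)})
  = \sum_i \Big[ \log P(\mathbf{s}^{(i)} \mid \mathbf{o}^{(i)}) + \log P(\mathbf{o}^{(i)}) \Big]

\mathcal{L}_{\mathrm{cond}}
  = \sum_i \log P(\mathbf{s}^{(i)} \mid \mathbf{o}^{(i)})
```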

  22. From HMMs to Conditional Random Fields [Lafferty, McCallum, Pereira 2001]. [Figure: a joint model (HMM), in which states S_{t-1}, S_t, S_{t+1} generate observations O_{t-1}, O_t, O_{t+1}, next to a conditional model over the same states given the observations.] (A super-special case of Conditional Random Fields.) Set parameters by maximum likelihood, using an optimization method on L.
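In the standard linear-chain form from Lafferty, McCallum, and Pereira, the CRF scores a whole state sequence with weighted feature functions and normalizes once, globally: P(s | o) = exp(Σ_t Σ_k λ_k f_k(s_{t-1}, s_t, o, t)) / Z(o), where Z(o) sums the same exponentiated score over all state sequences. The sketch below is a toy with hypothetical hand-set weights and a single capitalization feature. It computes log Z(o) with the forward algorithm in log space and checks the result by brute force. Training, which would adjust the weights to maximize the conditional log-likelihood L, is not shown.

```python
import itertools
import math

STATES = ("name", "bg")

# Hypothetical hand-set weights (lambda_k); a real CRF learns these by
# maximizing the conditional log-likelihood of labeled training sequences.
TRANS_W = {("name", "name"): 1.0, ("name", "bg"): -0.5,
           ("bg", "name"): -0.5, ("bg", "bg"): 0.5}
OBS_W = {("capitalized", "name"): 1.5, ("capitalized", "bg"): -0.5,
         ("lowercase", "bg"): 1.0}


def obs_features(word):
    """Observation features; the model conditions on them and never generates them."""
    return ["capitalized"] if word[:1].isupper() else ["lowercase"]


def local_score(prev, cur, obs, t):
    """sum_k lambda_k f_k(s_{t-1}, s_t, o, t); prev is None at the first position."""
    score = sum(OBS_W.get((f, cur), 0.0) for f in obs_features(obs[t]))
    if prev is not None:
        score += TRANS_W[(prev, cur)]
    return score


def sequence_score(states, obs):
    """Unnormalized log-score of a full state sequence."""
    return sum(local_score(states[t - 1] if t > 0 else None, states[t], obs, t)
               for t in range(len(obs)))


def logsumexp(xs):
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def log_partition(obs):
    """log Z(o) via the forward algorithm in log space: O(T * |S|^2)."""
    alpha = {s: local_score(None, s, obs, 0) for s in STATES}
    for t in range(1, len(obs)):
        alpha = {s: logsumexp([alpha[r] + local_score(r, s, obs, t) for r in STATES])
                 for s in STATES}
    return logsumexp(list(alpha.values()))


def brute_force_log_partition(obs):
    """log Z(o) by enumerating every state sequence (exponential; for checking only)."""
    return logsumexp([sequence_score(seq, obs)
                      for seq in itertools.product(STATES, repeat=len(obs))])


def log_conditional(states, obs):
    """log P(s | o) = score(s, o) - log Z(o); summed over training data this gives L."""
    return sequence_score(states, obs) - log_partition(obs)
```

Unlike the HMM, nothing here is a locally normalized probability. Only the global normalizer Z(o) turns the scores into a distribution over state sequences, which is why overlapping features can be added freely.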

  23. Table Extraction from Government Reports. Cash receipts from marketings of milk during 1995 at $19.9 billion dollars, was slightly below 1994. Producer returns averaged $12.93 per hundredweight, $0.19 per hundredweight below 1994. Marketings totaled 154 billion pounds, 1 percent above 1994. Marketings include whole milk sold to plants and dealers as well as milk sold directly to consumers. An estimated 1.56 billion pounds of milk were used on farms where produced, 8 percent less than 1994. Calves were fed 78 percent of this milk with the remainder consumed in producer households.

Milk Cows and Production of Milk and Milkfat: United States, 1993-95
--------------------------------------------------------------------------------
     :            :           Production of Milk and Milkfat 2/
     :   Number   :-------------------------------------------------------
Year :     of     :   Per Milk Cow    :  Percentage   :       Total
     :Milk Cows 1/:-------------------: of Fat in All :------------------
     :            :  Milk   : Milkfat : Milk Produced :  Milk   : Milkfat
--------------------------------------------------------------------------------
     : 1,000 Head    --- Pounds ---       Percent       Million Pounds
     :
1993 :   9,589      15,704     575         3.66         150,582   5,514.4
1994 :   9,500      16,175     592         3.66         153,664   5,623.7
1995 :   9,461      16,451     602         3.66         155,644   5,694.3
--------------------------------------------------------------------------------
1/ Average number during year, excluding heifers not yet fresh.
2/ Excludes milk sucked by calves.

  24. Table Extraction from Government Reports [Pinto, McCallum, Wei, Croft, 2003 SIGIR]. 100+ documents from www.fedstats.gov, labeled line by line with a CRF. Labels: • Non-Table • Table Title • Table Header • Table Data Row • Table Section Data Row • Table Footnote • ... (12 in all). Example document: the milk production report from slide 23. Features: • Percentage of digit chars • Percentage of alpha chars • Indented • Contains 5+ consecutive spaces • Whitespace in this line aligns with prev. • ... • Conjunctions of all previous features, time offset: {0,0}, {-1,0}, {0,1}, {1,2}.
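The line-level features listed above can be sketched as a function from one line of text (and the line before it) to a small feature dictionary, plus a conjunction step over the slide's time offsets. Several details are assumptions and are not taken from the paper: percentages are computed over non-whitespace characters, "aligns with prev." is given a hypothetical definition (the share of this line's space positions that are also spaces in the previous line), and conjunctions of real-valued features are taken as products.

```python
import re

OFFSETS = [(0, 0), (-1, 0), (0, 1), (1, 2)]  # time offsets listed on the slide


def alignment(line, prev_line):
    """Hypothetical definition: share of this line's space positions that are
    also spaces in the previous line (table column gaps tend to line up)."""
    spaces = [i for i, c in enumerate(line) if c == " "]
    if not spaces or not prev_line:
        return 0.0
    hits = sum(1 for i in spaces if i < len(prev_line) and prev_line[i] == " ")
    return hits / len(spaces)


def line_features(line, prev_line=""):
    """Per-line features; percentages are over non-whitespace characters (an assumption)."""
    chars = [c for c in line if not c.isspace()]
    n = max(len(chars), 1)
    return {
        "pct_digit": sum(c.isdigit() for c in chars) / n,
        "pct_alpha": sum(c.isalpha() for c in chars) / n,
        "indented": 1.0 if line[:1] in (" ", "\t") else 0.0,
        "has_5_spaces": 1.0 if re.search(r" {5,}", line) else 0.0,
        "aligned_with_prev": alignment(line, prev_line),
    }


def conjunctions(feature_seq, t):
    """Products of features of line t+a and line t+b for each (a, b) in OFFSETS."""
    out = {}
    for a, b in OFFSETS:
        if not (0 <= t + a < len(feature_seq) and 0 <= t + b < len(feature_seq)):
            continue  # offset falls outside the document
        for f, x in feature_seq[t + a].items():
            for g, y in feature_seq[t + b].items():
                if a == b and f >= g:
                    continue  # same line: skip self-pairs and mirrored duplicates
                out[f"{f}[{a:+d}]&{g}[{b:+d}]"] = x * y
    return out
```

A data row such as `1993 :   9,589      15,704` scores high on `pct_digit` and fires `has_5_spaces`, while a prose line scores high on `pct_alpha`. The CRF then weighs these cues jointly with the labels of neighbouring lines.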

  25. Table Extraction Experimental Results [Pinto, McCallum, Wei, Croft, 2003 SIGIR]
      Method           | Line labels, percent correct | Table segments, F1
      HMM              | 65%                          | 64%
      Stateless MaxEnt | 85%                          | -
      CRF              | 95%                          | 92%

  26. IE from Research Papers [McCallum et al ‘99]

  27. IE from Research Papers
      Method                           | Field-level F1 | Reference
      Hidden Markov Models (HMMs)      | 75.6           | [Seymore, McCallum, Rosenfeld, 1999]
      Support Vector Machines (SVMs)   | 89.7           | [Han, Giles, et al, 2003]
      Conditional Random Fields (CRFs) | 93.9           | [Peng, McCallum, 2004]
      CRFs reduce error by about 40% relative to SVMs.
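The "about 40%" figure is a relative error reduction: the share of the remaining error (100 minus F1) that the new method eliminates. A small helper (the name is hypothetical) reproduces the arithmetic from the table: SVM error 10.3 falls to CRF error 6.1, a reduction of 4.2 / 10.3, or roughly 41%. The same calculation against the HMM (24.4 falling to 6.1) gives 75%.

```python
def relative_error_reduction(f1_old, f1_new):
    """Fraction of the old method's error (100 - F1) removed by the new method."""
    old_err, new_err = 100.0 - f1_old, 100.0 - f1_new
    return (old_err - new_err) / old_err


svm_to_crf = relative_error_reduction(89.7, 93.9)   # about 0.41
hmm_to_crf = relative_error_reduction(75.6, 93.9)   # 0.75
```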

  28. Named Entity Recognition. Example text: CRICKET - MILLNS SIGNS FOR BOLAND CAPE TOWN 1996-08-22 South African provincial side Boland said on Thursday they had signed Leicestershire fast bowler David Millns on a one year contract. Millns, who toured Australia with England A in 1992, replaces former England all-rounder Phillip DeFreitas as Boland's overseas professional.
      Label | Examples
      PER   | Yayuk Basuki; Innocent Butare
      ORG   | 3M; KDP; Cleveland
      LOC   | Cleveland; Nirmal Hriday; The Oval
      MISC  | Java; Basque; 1,000 Lakes Rally

  29. Named Entity Extraction Results [McCallum & Li, 2003, CoNLL]
      Method                                               | F1
      HMMs (BBN's Identifinder)                            | 73%
      CRFs without feature induction                       | 83%
      CRFs with feature induction based on likelihood gain | 90%

  30. Outline
      • Examples of IE and Data Mining
      • Brief review of Conditional Random Fields
      • Joint inference: Motivation and examples
          • Joint Labeling of Cascaded Sequences (Belief Propagation)
          • Joint Labeling of Distant Entities (BP by Tree Reparameterization)
          • Joint Co-reference Resolution (Graph Partitioning)
          • Joint Segmentation and Co-ref (Iterated Conditional Samples)
      • Interactive IE
      • Two example projects
          • Email, contact management, and Social Network Analysis
          • Research Paper search and analysis

  31. Larger Context: Document collection -> Spider -> Filter -> IE (Segment, Classify, Associate, Cluster) -> Database -> Data Mining (Discover patterns: entity types, links/relations, events) -> Actionable knowledge (Prediction, Outlier detection, Decision support)

  32. Problem: Combined in serial juxtaposition, IE and DM are unaware of each other's weaknesses and opportunities. DM begins from a populated DB, unaware of where the data came from or its inherent uncertainties. IE is unaware of emerging patterns and regularities in the DB. The accuracy of both suffers, and significant mining of complex text sources is beyond reach.

  33. Solution: Uncertainty Info. Document collection -> Spider -> Filter -> IE (Segment, Classify, Associate, Cluster) -> Database -> Data Mining (Discover patterns: entity types, links/relations, events) -> Actionable knowledge (Prediction, Outlier detection, Decision support). IE passes uncertainty info forward to data mining, and data mining feeds emerging patterns back to IE.

  34. Solution: Unified Model. Document collection -> Spider -> Filter -> a single Probabilistic Model covering both IE (Segment, Classify, Associate, Cluster) and Data Mining (Discover patterns: entity types, links/relations, events) -> Actionable knowledge (Prediction, Outlier detection, Decision support). Discriminatively-trained undirected graphical models: Conditional Random Fields [Lafferty, McCallum, Pereira]; Conditional PRMs [Koller…], [Jensen…], [Getoor…], [Domingos…]. Complex inference and learning: just what we researchers like to sink our teeth into!

  35. Larger-scale Joint Inference for IE • What model structures will capture salient dependencies? • Will joint inference improve accuracy? • How to do inference in these large graphical models? • How to efficiently train these models, which are built from multiple large components?

  36. 1. Jointly labeling cascaded sequences: Factorial CRFs [Sutton, Rohanimanesh, McCallum, ICML 2004]. Layers: named-entity tag; noun-phrase boundaries; part-of-speech; English words.

  37. 1. Jointly labeling cascaded sequences: Factorial CRFs [Sutton, Rohanimanesh, McCallum, ICML 2004]. Layers: named-entity tag; noun-phrase boundaries; part-of-speech; English words.

  38. 1. Jointly labeling cascaded sequences: Factorial CRFs [Sutton, Rohanimanesh, McCallum, ICML 2004]. Layers: named-entity tag; noun-phrase boundaries; part-of-speech; English words. But errors cascade: the pipeline must be perfect at every stage to do well.

  39. 1. Jointly labeling cascaded sequences: Factorial CRFs [Sutton, Rohanimanesh, McCallum, ICML 2004]. Layers: named-entity tag; noun-phrase boundaries; part-of-speech; English words. Joint prediction of part-of-speech and noun-phrase in newswire matches accuracy with only 50% of the training data. Inference: Tree reparameterization BP [Wainwright et al, 2002]

  40. 2. Jointly labeling distant mentions: Skip-chain CRFs [Sutton, McCallum, SRL 2004]. … Senator Joe Green said today … . Green ran for … Dependency among similar, distant mentions ignored.

  41. 2. Jointly labeling distant mentions: Skip-chain CRFs [Sutton, McCallum, SRL 2004]. … Senator Joe Green said today … . Green ran for … 14% reduction in error on the most repeated field in email seminar announcements. Inference: Tree reparameterization BP [Wainwright et al, 2002]
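What distinguishes a skip-chain CRF from a linear chain is its graph structure: ordinary edges between neighbouring tokens, plus long-distance "skip" edges that let repeated mentions such as the two "Green"s influence each other's labels. This sketch assumes, as one simple choice, that skip edges connect identical capitalized tokens. The helper name `skip_chain_edges` is hypothetical, and inference over the resulting loopy graph (tree reparameterization BP on the slide) is not shown.

```python
from collections import defaultdict
from itertools import combinations


def skip_chain_edges(tokens):
    """Return (chain_edges, skip_edges) as lists of (i, j) index pairs with i < j."""
    chain = [(i, i + 1) for i in range(len(tokens) - 1)]
    positions = defaultdict(list)
    for i, tok in enumerate(tokens):
        if tok[:1].isupper():                 # assumption: only capitalized tokens
            positions[tok].append(i)
    skip = sorted(pair
                  for idxs in positions.values()
                  for pair in combinations(idxs, 2)
                  if pair[1] - pair[0] > 1)   # adjacent pairs already have a chain edge
    return chain, skip
```

For `["Senator", "Joe", "Green", "said", "today", ".", "Green", "ran", "for"]` this yields eight chain edges and one skip edge, `(2, 6)`, joining the two mentions of Green.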

  42. 3. Joint co-reference among all pairs: Affinity Matrix CRF ("Entity resolution", "Object correspondence"). [Figure: mentions "Mr Powell", "Powell", and "she", each pair joined by a Y/N edge with affinity 45 (Mr Powell, Powell), -99 (Powell, she), and 11 (Mr Powell, she).] ~25% reduction in error on co-reference of proper nouns in newswire. Inference: correlation clustering graph partitioning [McCallum, Wellner, IJCAI WS 2003, NIPS 2004] [Bansal, Blum, Chawla, 2002]

  43. Coreference Resolution, AKA "record linkage", "database record deduplication", "entity resolution", "object correspondence", "identity uncertainty". Input: news article, with named-entity "mentions" tagged: "Today Secretary of State Colin Powell met with … he … Condoleezza Rice … Mr Powell … she … Powell … President Bush … Rice … Bush …". Output: number of entities, N = 3: #1 Secretary of State Colin Powell, he, Mr. Powell, Powell; #2 Condoleezza Rice, she, Rice; #3 President Bush, Bush.

  44. Inside the Traditional Solution: Pair-wise Affinity Metric. Mention (3) ". . . Powell . . ." vs. Mention (4) ". . . Mr Powell . . .": Y/N?
      Fires (Y/N) | Feature                                        | Weight
      N           | Two words in common                            | 29
      Y           | One word in common                             | 13
      Y           | "Normalized" mentions are string identical     | 39
      Y           | Capitalized word in common                     | 17
      Y           | > 50% character tri-gram overlap               | 19
      N           | < 25% character tri-gram overlap               | -34
      Y           | In same sentence                               | 9
      Y           | Within two sentences                           | 8
      N           | Further than 3 sentences apart                 | -1
      Y           | "Hobbs Distance" < 3                           | 11
      N           | Number of entities in between two mentions = 0 | 12
      N           | Number of entities in between two mentions > 4 | -3
      Y           | Font matches                                   | 1
      Y           | Default                                        | -19
      OVERALL SCORE = 98 > threshold = 0
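The overall score above is a weighted sum of the features that fire, compared against a threshold. The sketch below reproduces the slide's arithmetic. The weights and Y/N values come from the slide; the shortened feature identifiers and the helper names are hypothetical.

```python
# Weights from the slide; identifiers are shortened, hypothetical names.
WEIGHTS = {
    "two_words_in_common": 29,
    "one_word_in_common": 13,
    "normalized_string_identical": 39,
    "capitalized_word_in_common": 17,
    "trigram_overlap_gt_50pct": 19,
    "trigram_overlap_lt_25pct": -34,
    "same_sentence": 9,
    "within_two_sentences": 8,
    "more_than_3_sentences_apart": -1,
    "hobbs_distance_lt_3": 11,
    "zero_entities_between": 12,
    "more_than_4_entities_between": -3,
    "font_matches": 1,
    "default": -19,
}

# Features marked "Y" on the slide for the pair ("Powell", "Mr Powell").
ACTIVE = {"one_word_in_common", "normalized_string_identical",
          "capitalized_word_in_common", "trigram_overlap_gt_50pct",
          "same_sentence", "within_two_sentences", "hobbs_distance_lt_3",
          "font_matches", "default"}


def affinity(active_features):
    """Sum of the weights of the features that fire for a mention pair."""
    return sum(WEIGHTS[f] for f in active_features)


def same_entity(active_features, threshold=0):
    """Traditional pairwise decision: merge if the affinity exceeds the threshold."""
    return affinity(active_features) > threshold
```

Here `affinity(ACTIVE)` is 98, so the pair is merged. The decision depends only on this one pair, which is the weakness the next slide points out.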

  45. The Problem: Pair-wise merging decisions are being made independently of each other: (Mr Powell, Powell) affinity = 98, Y; (Powell, she) affinity = -104, N; (Mr Powell, she) affinity = 11, Y. They should be made in relational dependence with each other. Affinity measures are noisy and imperfect.
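The affinities on this slide make the danger of independent decisions concrete. This minimal sketch assumes the traditional pipeline thresholds each pair at 0 and then takes the transitive closure of the resulting "Y" links, a common choice implemented here with union-find. The function name is hypothetical. The two positive links chain "she" to "Powell" even though that pair's own affinity is -104.

```python
from collections import defaultdict

# Affinities from the slide; each pair is thresholded on its own.
AFFINITY = {("Mr Powell", "Powell"): 98, ("Powell", "she"): -104, ("Mr Powell", "she"): 11}


def independent_merge(mentions, affinity, threshold=0):
    """Link every pair whose affinity exceeds the threshold, then take the
    transitive closure of the links with union-find (hypothetical helper)."""
    parent = {m: m for m in mentions}

    def find(m):
        while parent[m] != m:
            parent[m] = parent[parent[m]]  # path halving
            m = parent[m]
        return m

    for (a, b), w in affinity.items():
        if w > threshold:
            parent[find(a)] = find(b)

    clusters = defaultdict(set)
    for m in mentions:
        clusters[find(m)].add(m)
    return list(clusters.values())


clusters = independent_merge(["Mr Powell", "Powell", "she"], AFFINITY)
```

All three mentions end up in a single cluster, so the strongly negative evidence between "Powell" and "she" is simply ignored.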

  46. A Markov Random Field for Co-reference (MRF) [McCallum & Wellner, 2003, ICML]. Make pair-wise merging decisions in dependent relation to each other by: calculating a joint prob.; including all edge weights; adding dependence on consistent triangles. [Figure: mentions "Mr Powell", "Powell", and "she", each pair joined by a Y/N edge with weight 45 (Mr Powell, Powell), -30 (Powell, she), and 11 (Mr Powell, she).]

  47. A Markov Random Field for Co-reference (MRF) [McCallum & Wellner, 2003]. Make pair-wise merging decisions in dependent relation to each other by: calculating a joint prob.; including all edge weights; adding dependence on consistent triangles. [Figure: mentions "Mr Powell", "Powell", and "she", each pair joined by a Y/N edge with weight 45 (Mr Powell, Powell), -30 (Powell, she), and 11 (Mr Powell, she).]

  48. A Markov Random Field for Co-reference (MRF) [McCallum & Wellner, 2003]. Configuration: (Mr Powell, Powell) = N: -(45); (Powell, she) = N: -(-30); (Mr Powell, she) = Y: +(11). Total: -4.

  49. A Markov Random Field for Co-reference (MRF) [McCallum & Wellner, 2003]. Configuration: (Mr Powell, Powell) = Y: +(45); (Powell, she) = N: -(-30); (Mr Powell, she) = Y: +(11). Total: -infinity (inconsistent triangle).

  50. A Markov Random Field for Co-reference (MRF) [McCallum & Wellner, 2003]. Configuration: (Mr Powell, Powell) = Y: +(45); (Powell, she) = N: -(-30); (Mr Powell, she) = N: -(11). Total: 64.
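Slides 48 to 50 score a configuration by adding an edge's weight when it is labeled Y and subtracting it when labeled N. Any inconsistent triangle, meaning exactly two of its three edges say Y (which violates transitivity), receives -infinity. The brute-force sketch below assumes this total is the unnormalized log-probability of a configuration and enumerates every labeling of the three-mention example. Enumeration grows as 2^(number of pairs) and is feasible only for toy inputs, which is why the slides use graph partitioning for inference.

```python
import itertools
import math

MENTIONS = ("Mr Powell", "Powell", "she")
AFFINITY = {("Mr Powell", "Powell"): 45, ("Powell", "she"): -30, ("Mr Powell", "she"): 11}


def edge(a, b):
    """Look up a pair in AFFINITY regardless of order."""
    return (a, b) if (a, b) in AFFINITY else (b, a)


def score(config):
    """config maps each edge to True (Y: same entity) or False (N)."""
    for a, b, c in itertools.combinations(MENTIONS, 3):
        ys = config[edge(a, b)] + config[edge(b, c)] + config[edge(a, c)]
        if ys == 2:                       # two Y links but a N link: not transitive
            return -math.inf
    return sum(w if config[e] else -w for e, w in AFFINITY.items())


def best_configuration():
    """Exhaustively search all Y/N labelings of the edges."""
    edges = list(AFFINITY)
    candidates = (dict(zip(edges, labels))
                  for labels in itertools.product((True, False), repeat=len(edges)))
    best = max(candidates, key=score)
    return best, score(best)
```

The three configurations on the slides score -4, -infinity and 64. The search confirms that 64 (Mr Powell and Powell together, "she" separate) is the best consistent labeling.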
