1 / 24

Low-Pass Semantics

Low-Pass Semantics. Fernando Pereira with Rahul Gupta, Ni Lao, Amar Subramanya Rest of the Google language understanding team William Cohen, CMU. Let’s go: Mount Shasta, California. Knowledge Graph: linked data. How do I climb Shasta?. . .. way down. I’d like to ski from the summit.

oya
Download Presentation

Low-Pass Semantics

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Low-Pass Semantics • Fernando Pereira • with • Rahul Gupta, Ni Lao, Amar Subramanya • Rest of the Google language understanding team • William Cohen, CMU

  2. Let’s go: Mount Shasta, California

  3. Knowledge Graph: linked data

  4. How do I climb Shasta? ... way down ...

  5. I’d like to ski from the summit ... way down ...

  6. Lessons so far • “Things, not strings” • Semantic index • Join keys for composing relations • But most information is still in text • Link “things” and text

  7. Low-Pass Semantics • Learn to translate and link • Natural-language text • Knowledge bases (key spaces) • (Semi-)structured data • Accept lack of finer semantic details • Links ⇒ inference ⇒ meaning

  8. Layers of meaning • Grammatical structure (parsing) • Referring expression classification • Within-document coreference • Entity resolution • Entity co-occurrences ⇒relations

  9. Putting it all together Erik Wesner will talk about his research tomorrow. Though Wesner's parents are from Poland, he was born and raised in Raleigh, NC.

  10. Linguistic analysis matters Relation extraction quality

  11. Detour: Research vs Product • Research explores what is possible • Product is about what is effective • Enough impact to matter to users • Quality does not disappoint users • Dominates incumbent technology • Cost-benefit tradeoff: impact/resources • Most research results fail these tests

  12. Case Study:Entities ⇒ relationsDistributed large-scale learning N. Lao, A. Subramanya, F. Pereira, W. Cohen EMNLP 2012

  13. Joint KB+text inference • Path types: relation chains • Random walks on graph • Learn type weights to favor desired walks

  14. rm ...... r1 ...... rn rk Path Ranking Algorithm (PRA) (I) • Given a directed, edge-labeled graph source s target t r Is the sources connected to the target t by an edge labeledr? Path ranking algorithm (PRA): Lao and Cohen, Machine Learning, 2010

  15. weight feature value r1 r2 rl feature selection s t ...... PRA (II) Path type: Train by regularized logistic regression

  16. Case study: extend Freebase • Freebase: 21M concepts, 70M edges • Study relations: profession, nationality, parent • Profession stats: • 2M people in Freebase • 0.3M have a recorded profession • Biased statistics (0.24M politicians, actors)

  17. Training data for relation r • Positive examples: sample r edges with frequent targets Negative examples: sample pairs of concepts of types compatible with r but not in r in the knowledge base

  18. Male Gender Concept Resolution Noun-phrase Extraction Coreference Resolution (Haghighi & Klein 2009) Album Kind of Blue most likely concept for named mentions in a coreference cluster Miles Davis Mention Mention Text Processing POS Tagging & Dependency Parsing

  19. Text graph construction • 60M Web pages with relevant concepts • Dominated by frequent concepts • Barack Obama is mentioned about 8.9M times in our corpus • Sample their sentences

  20. Male Gender Album Kind of Blue Miles Davis Freebase and text graph 21M nodes 70M edges Cheju, South Korea Mention Mention Mention Text Graph Kind of Blue Jeju Island Davis nsubj 2B nodes 5B edges dobj South Korea

  21. Scaling up PRA • Path type pruning • Distribute the queries and explore in parallel in the Map step • Aggregate path types in the Reduce step • Index text graph by entity mentions

  22. Musician Musician Male Profession Gender Miles Davis Mention Mention Miles Davis John Coltrane A learned path for profession -1

  23. Human evaluation Known triples Coverage Relation extraction results

  24. A glimpse of meaning • Capable language analysis components • enable interpreting the Web as a huge, noisy knowledge base • that combines weak sources • from text and structured data • to support useful inferences

More Related