
The Future of Semantic Web Search




Presentation Transcript


  1. The Future of Semantic Web Search Oren Etzioni

  2. Information Overload

  3. Well, we have Google… Source: Carlos Guestrin

  4. Explosive growth of Mobile Devices

  5. Paradigm Shift (Etzioni, Nature ’11) Imagine search over a semantic space • Key words, docs → entities, predicates • TF-IDF, PageRank → relation graph • “10 blue links” → question answering • Google Knowledge Graph (2012) • Facebook Graph Search (2013)

  6. Google’s split personality!

  7. Outline IE(text) = knowledge • Information Extraction (IE): what, how? • IE in action: reviews, news, tweets • Open IE = IE at Web scale (demo) • Future Work. Technical Ideas: IE = ML; Scalability of IE

  8. IE: sentence → (tuple, prob) “By all accounts, Edison is credited as the inventor of the light bulb.” → invented(Edison, light bulb), 0.99
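A minimal sketch of this representation (the names here are illustrative, not from any particular system): each extraction pairs a relational triple with a confidence score.

```python
from dataclasses import dataclass

@dataclass
class Extraction:
    """One relational tuple extracted from a sentence, with a confidence."""
    arg1: str
    relation: str
    arg2: str
    prob: float

# The slide's example, expressed as an extraction:
e = Extraction("Edison", "invented", "light bulb", 0.99)
print(f"{e.relation}({e.arg1}, {e.arg2}), {e.prob}")
# -> invented(Edison, light bulb), 0.99
```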

  9. Context → clues to meaning “You shall know a word by the company it keeps” (Firth, 1957) • Nerja??? • …Nerja mayor… • …Downtown Nerja… • Spanish towns such as …Nerja… • Country singers such as …Garth Brooks… Where do clues come from?

  10. Clues Learned from Labeled Text Target Relation = Cities Training examples = labeled sentences • …Madrid mayor… • …Downtown Madrid… • Spanish cities such as …Madrid…
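A minimal sketch of how such clues might be collected from labeled sentences, assuming whitespace tokenization and a fixed-size context window (both simplifications):

```python
from collections import Counter

def learn_clues(sentences, seeds, window=3):
    """Collect the word windows surrounding known examples of the target
    relation; frequent windows become clues for spotting new examples."""
    clues = Counter()
    for sentence in sentences:
        tokens = sentence.split()
        for i, token in enumerate(tokens):
            if token in seeds:
                left = tuple(tokens[max(0, i - window):i])
                right = tuple(tokens[i + 1:i + 1 + window])
                clues[(left, right)] += 1
    return clues

sentences = [
    "Spanish cities such as Madrid attract many visitors",
    "Spanish cities such as Barcelona attract many visitors",
    "Downtown Madrid is crowded",
]
print(learn_clues(sentences, {"Madrid", "Barcelona"}).most_common(1))
# [((('cities', 'such', 'as'), ('attract', 'many', 'visitors')), 2)]
```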

  11. Basic Information Extraction Loop • Label sentences with examples of R • Extract frequent contexts of R as Clues • Use Clues to find more examples Options: • Mutual recursion (Snowball) • Negative examples (NELL) • Parser
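A minimal sketch of this loop in the Snowball style, under strong simplifying assumptions: contexts are exact word windows, a clue is trusted once it fires on `min_support` known examples, and there are no negative examples and no parser.

```python
from collections import Counter

def context(toks, i, window):
    """The exact left/right word windows around position i."""
    return (tuple(toks[max(0, i - window):i]), tuple(toks[i + 1:i + 1 + window]))

def bootstrap(sentences, seeds, rounds=3, window=2, min_support=2):
    """Alternate between labeling sentences with known examples of R,
    extracting their frequent contexts as clues, and using those clues
    to harvest new examples."""
    examples = set(seeds)
    tokenized = [s.split() for s in sentences]
    for _ in range(rounds):
        # Steps 1-2: label sentences with current examples, count contexts.
        clues = Counter()
        for toks in tokenized:
            for i, t in enumerate(toks):
                if t in examples:
                    clues[context(toks, i, window)] += 1
        trusted = {c for c, n in clues.items() if n >= min_support}
        # Step 3: any token occurring in a trusted context is a new example.
        for toks in tokenized:
            for i, t in enumerate(toks):
                if context(toks, i, window) in trusted:
                    examples.add(t)
    return examples

sents = ["towns such as Madrid are lovely", "towns such as Nerja are lovely"]
print(bootstrap(sents, {"Madrid"}, min_support=1))  # picks up 'Nerja'
```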

  12. II. IE in Action: Product Reviews • Hard to aggregate info across reviews • Hard to filter phony reviews & outliers • Hard to focus on key attributes • Impossible to sort on attributes • Difficult to read on a phone Pew Internet 2012 study: 24% used a phone to look up reviews in-store

  13. RevMiner (Huang, Etzioni, UIST ‘12) Input: Seattle restaurant reviews + metadata • Extract key attribute = value pairs • Dim sum = delicious • Server = rude • Cluster attributes into “food”, “service”… Query: “good margaritas” Answer: sentiment-sorted results
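RevMiner's actual pipeline is considerably richer; the sketch below, with an invented toy sentiment lexicon, only illustrates the attribute = value idea and sentiment-sorted retrieval.

```python
import re

SENTIMENT = {"delicious": 1.0, "great": 0.8, "good": 0.6, "rude": -0.8}  # toy lexicon

def attribute_values(sentence):
    """Pull naive 'attribute is/was value' pairs out of one review sentence."""
    return re.findall(r"([\w ]+?) (?:is|was|are|were) (\w+)", sentence.lower())

reviews = ["The dim sum is delicious", "Our server was rude",
           "The margaritas are great"]
pairs = [p for r in reviews for p in attribute_values(r)]

# Query "good margaritas": rank matching attributes by lexicon sentiment.
hits = sorted((p for p in pairs if "margarita" in p[0]),
              key=lambda p: SENTIMENT.get(p[1], 0.0), reverse=True)
print(hits)  # [('the margaritas', 'great')]
```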

  14. Extractive Interface on iPhone

  15. No Regrets.

  16. Decide’s Mission: Eliminate Buyer’s Remorse

  17. News in Consumer Electronics Thousands of CE “news & rumor” sites, blogs, etc.

  18. Some Current Examples • https://www.decide.com/p/laptops-toshiba-p870-late-2012-satellite-17-3-106509622 • https://www.decide.com/p/video-game-consoles-microsoft-xbox-360-s-xbox-360-slim-4gb-video-game-system-rkb00001-15819300 • Even vacuum cleaners: • https://www.decide.com/p/vacuums-dyson-dc41-ball-dc41-animal-upright-vacuum-dc41animal-29455276

  19. Critique of Conventional IE • Extracting towns tells you nothing about hotels, countries, medications, etc. • Hand-labeling training examples is difficult to scale!

  20. Big Data

  21. III. Open IE (U.S. Patent 7,877,343) “No sentence left behind” IE • Relations are not pre-specified • No hand-labeled examples • Single pass over the corpus • Independent of genre and language (e.g., Spanish) • Over 1,000 articles

  22. TextRunner First Web-scale, Open IE system (Banko, Cafarella, Etzioni et al., IJCAI ‘07) 1,000,000,000 distinct extractions Peak precision = 0.9 (limited recall)

  23. Open IE Demo Open source extractors: reverb.cs.washington.edu ollie.cs.washington.edu
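In the spirit of ReVerb, though much simplified (the real extractor uses a fuller POS pattern plus a lexical constraint), a sketch that finds noun-verb-noun spans in pre-tagged text:

```python
def extract(tagged):
    """Simplified ReVerb-style pattern: argument noun, verb-based relation
    phrase (optionally ending in a preposition), argument noun phrase."""
    words = [w for w, _ in tagged]
    tags = [t for _, t in tagged]
    n = len(tags)
    for i in range(n):
        if not tags[i].startswith("NN"):
            continue
        k = i + 1
        while k < n and tags[k].startswith("VB"):    # verb span
            k += 1
        if k == i + 1:
            continue                                 # no verb, no relation
        rel_end = k + 1 if k < n and tags[k] == "IN" else k
        a2 = rel_end
        while a2 < n and tags[a2] in ("DT", "JJ"):   # skip det./adjectives
            a2 += 1
        end = a2
        while end < n and tags[end].startswith("NN"):
            end += 1
        if end > a2:
            yield (words[i], " ".join(words[i + 1:rel_end]),
                   " ".join(words[a2:end]))

sent = [("Edison", "NNP"), ("invented", "VBD"), ("the", "DT"),
        ("light", "NN"), ("bulb", "NN")]
print(list(extract(sent)))  # [('Edison', 'invented', 'light bulb')]
```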

  24. Open IE Highlights • Scale: 10B extractions • No pre-determined vocabulary • Entity linking & type inference • Graceful degradation • Much work left… openie.cs.washington.edu

  25. Sample of Extracted Relations

  26. Number of Relations

  27. Event Calendar (2M tweets daily) Alan Ritter (KDD ‘12), source code on GitHub.

  28. IV. Future Work • Open Question Answering • NL QA over arbitrary text (Fader, ACL ‘13) • Robust dialog “…ok, let’s look for sushi outside of downtown, but I need convenient parking…” • Inference • Often information is implicit • Text meets Linked Open Data

  29. Comprehensive Knowledge • Open-ended Learning • Tractable Reasoning

  30. Conclusions • Keyword search → extraction, relations • IE is here: Decide.com, revminer.com • IE = ML • Where does labeled data come from? • Open IE: leverage the structure of language to achieve scale • Open Question Answering & reasoning • Thank you!

  31. Example Opinion Patterns • X is so loving • I love X so much • X makes me laugh • I am in love with X • X did a great job • It was a pleasure working with X • I’d be lost without X • I hate X with a passion • X is an idiot • Death to X • X is gross • X is stupid as hell • I don’t like X one bit • I loathe X • X is my mortal enemy
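Patterns like these compile directly into templates; a minimal sketch, where the polarity labels are an assumption added for illustration:

```python
import re

# A few of the slide's patterns as (regex, polarity) pairs; the polarity
# assignments are assumed here, not given on the slide.
PATTERNS = [
    (re.compile(r"I love (\w+) so much"), +1),
    (re.compile(r"(\w+) did a great job"), +1),
    (re.compile(r"(\w+) is an idiot"), -1),
    (re.compile(r"I loathe (\w+)"), -1),
]

def opinions(text):
    """Yield (target, polarity) for every pattern match in the text."""
    for rx, polarity in PATTERNS:
        for m in rx.finditer(text):
            yield m.group(1), polarity

print(list(opinions("I love Seattle so much, but Bob is an idiot")))
# [('Seattle', 1), ('Bob', -1)]
```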

  32. Robust QA (Example) What prevents X? = • What can people do to prevent X? • How can X be avoided? • What will help prevent X? • What are the steps to reduce X? • How can you prevent X? • What is the prevention of X? • Is there a way to prevent X?
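One way to make QA robust to such paraphrases is to normalize them all to a single canonical query; a minimal sketch (the template list comes from the slide, but the canonical query form is illustrative, not Fader's actual system):

```python
import re

# Paraphrase templates from the slide, all mapped to the same canonical
# query form: prevents(?, X).
TEMPLATES = [
    r"what can people do to prevent (.+)\?",
    r"how can (.+) be avoided\?",
    r"what will help prevent (.+)\?",
    r"what are the steps to reduce (.+)\?",
    r"how can you prevent (.+)\?",
    r"what is the prevention of (.+)\?",
    r"is there a way to prevent (.+)\?",
]

def canonicalize(question):
    """Map any paraphrase of 'What prevents X?' to one canonical query."""
    q = question.strip().lower()
    for t in TEMPLATES:
        m = re.fullmatch(t, q)
        if m:
            return ("prevents", "?", m.group(1))
    return None

print(canonicalize("How can scurvy be avoided?"))  # ('prevents', '?', 'scurvy')
```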

  33. IE Systems Discover Clues
