
Real-time population of Knowledge Bases: Opportunities and Challenges

Real-time population of Knowledge Bases: Opportunities and Challenges. Ndapa Nakashole Gerhard Weikum. AKBC Workshop at NAACL 2012. Real-time Data Sources. In news and social media, the implicit query is: What’s happening right now?


Presentation Transcript


  1. Real-time population of Knowledge Bases: Opportunities and Challenges • Ndapa Nakashole, Gerhard Weikum • AKBC Workshop at NAACL 2012

  2. Real-time Data Sources • In news and social media, the implicit query is: • What’s happening right now? • Batch-oriented KBP methods rely on Web snapshots (e.g., ClueWeb09, ca. 3 years old) • News aggregators present a timely big picture • But they display text snippets and headlines, not relational facts • [Google News, …]

  3. Goal: Real-time KBP • Goal: Timely transformation of text into relational facts • Enabling fine-grained exploration of the big picture as it emerges • The big picture is a series of stories and events • Stories and events are made of facts

  4. Stories and Events • Our focus is on capturing and producing the latest information in the form of relational facts • Breaking news: Tiger Woods indiscretions revealed; Francois Hollande elected as president of France; Kofi Annan warns about Syria; George Zimmerman arrested in Martin murder case • Stories: 2012 French Elections; Syria crisis; Trayvon Martin case

  5. Challenge (1): Relation Discovery • Open set of relations • Need to discover and maintain a large, dynamically evolving set of relations • Go beyond common relations such as “bornIn” • Examples of interesting relations: firedFrom, hadAffairWith, … • Capture only semantically meaningful relations • Discard noisy relations

  6. Challenge (2): Dynamic Entity Discovery • For semantic consistency in the facts we extract • Need to map noun phrases to entities in a KB • E.g., “Jeff Dean” can mean the Google engineer or the rock musician • But KBs are incomplete in the entities they contain • Jeff Dean the Google engineer doesn’t have a Wikipedia page • He is missing from Wikipedia-derived KBs • Open set of entities • Need to recognise and handle out-of-KB entities • But go above the level of noun phrases

  7. Challenge (3): Extraction under Time Constraints • Facts must be extracted in a timely manner • Need to produce results under time constraints • We would like to report facts soon after they become available, not a few weeks down the line

  8. Our Approach

  9. Approach – Relation Discovery: Semantically-typed patterns • To identify meaningful relations • We introduced Syntactic-Lexical-Ontological (SOL) patterns • Syntactic-Lexical – surface words and part-of-speech tags • Ontological – semantic classes as entity placeholders, e.g., <singer>, <scientist>, … • Example SOL patterns: • <comedian> parodied <person> • <musician> wrote hits for <musician> • <person> headliner at <event>
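
The typed-pattern idea on this slide can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `SolPattern` class, the `matches` function, and the exact-token comparison are assumptions made for illustration, and part-of-speech tags are left out for brevity.

```python
from dataclasses import dataclass
from typing import Iterable, Set, Tuple


@dataclass(frozen=True)
class SolPattern:
    """Hypothetical container for a typed pattern such as
    <musician> wrote hits for <musician>."""
    subject_type: str
    phrase: Tuple[str, ...]  # lexical part: the surface words between the entities
    object_type: str


def matches(pattern: SolPattern,
            subject_types: Set[str],
            middle_tokens: Iterable[str],
            object_types: Set[str]) -> bool:
    """True if both entities carry the required semantic types and the
    words between them equal the pattern's phrase (case-insensitive)."""
    tokens = tuple(token.lower() for token in middle_tokens)
    return (pattern.subject_type in subject_types
            and tokens == pattern.phrase
            and pattern.object_type in object_types)


wrote_hits_for = SolPattern("musician", ("wrote", "hits", "for"), "musician")
```

The type checks are what separate a typed pattern from a plain surface phrase: the same words between a politician and a musician would not match `wrote_hits_for`.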

  10. Approach – Relation Discovery: Semantically-typed patterns (2) • SOL patterns are arranged into sets of synonyms and a hierarchy of subsumptions • Example subsumptions: • wife of => spouse of • spouse of => knows • We produced ca. 350,000 SOL patterns • Available for download • For details see: Nakashole, Weikum and Suchanek at EMNLP 2012
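
A subsumption hierarchy of this kind can be sketched as a directed graph in which following edges upward yields every more general pattern. The dictionary layout and the `generalizations` helper below are illustrative assumptions, not the format of the released pattern collection.

```python
from typing import Dict, Set

# Hypothetical edge list: each pattern maps to the patterns that directly subsume it.
SUBSUMED_BY: Dict[str, Set[str]] = {
    "wife of": {"spouse of"},
    "spouse of": {"knows"},
}


def generalizations(pattern: str, edges: Dict[str, Set[str]]) -> Set[str]:
    """Collect every pattern reachable by following subsumption edges upward."""
    seen: Set[str] = set()
    stack = [pattern]
    while stack:
        current = stack.pop()
        for parent in edges.get(current, ()):
            if parent not in seen:  # guard against revisiting shared ancestors
                seen.add(parent)
                stack.append(parent)
    return seen
```

With the two example subsumptions from the slide, a fact expressed with "wife of" also supports "spouse of" and, transitively, "knows".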

  11. Approach – Dynamic Entity Discovery: Infer types for new entities • SOL patterns require that entities have types • Need to align new entities along the ontological dimension • Proposal: infer entity types from SOL patterns • SOL pattern: <singer> released <album> • Given: X released Y, is X of type singer? Not always! • Due to: polysemy in syntax • Due to: incorrect dependency paths between entity pairs • But we can approximate likely types
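
One simple way to "approximate likely types" is to count, over all pattern occurrences of a new entity, which type each pattern expects in the slot the entity filled. The sketch below uses plain relative frequencies; this scoring is an assumption chosen for illustration, since the slides do not say how the approximation is computed, and `likely_types` is a hypothetical name.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def likely_types(slot_types: Iterable[str]) -> List[Tuple[str, float]]:
    """Rank candidate types for an out-of-KB entity.

    slot_types holds one type label per observed occurrence, namely the
    type that the matching SOL pattern expects in the entity's slot.
    Returns (type, relative frequency) pairs, most frequent first.
    """
    counts = Counter(slot_types)
    total = sum(counts.values())
    return [(type_name, count / total) for type_name, count in counts.most_common()]


# X appeared three times in a <singer> slot and once in a <band> slot.
ranking = likely_types(["singer", "singer", "band", "singer"])
```

Because a single occurrence can mislead (polysemy, wrong dependency paths), the ranking is only a likelihood estimate that improves as more occurrences of the entity are seen.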

  12. Approach – Time Constraints: Continuous processing model • Continuously process the stream of incoming documents • Define a time slice (time window) for extraction • Within each time slice, define a target recall • Redundancy means we need not process all documents in a time slice • [Figure: timeline with time slices t2, t3; example facts flowing into the KB: G. W. Bush travels to Texas; Elton John performs at Royal Concert; …; M. Sharapova defeated by V. Azarenka; Demi Moore files for divorce from A. Kutcher; Martin Scorsese nominated for Oscar; …]
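
The redundancy argument can be made concrete with a small offline simulation: if the facts contained in each document of a time slice were known in advance, how many documents would have to be processed before a target recall is reached? The function name and the input representation are assumptions for illustration. In a live stream the full fact set is unknown, so a real system would need to estimate recall rather than compute it as done here.

```python
from typing import List, Set


def docs_needed_for_recall(doc_facts: List[Set[str]], target_recall: float) -> int:
    """Number of documents, in arrival order, that must be processed
    before the collected facts cover target_recall of all facts in the slice."""
    all_facts: Set[str] = set()
    for facts in doc_facts:
        all_facts |= facts
    if not all_facts:
        return 0
    seen: Set[str] = set()
    for processed, facts in enumerate(doc_facts, start=1):
        seen |= facts
        if len(seen) / len(all_facts) >= target_recall:
            return processed
    return len(doc_facts)


# Four documents in one time slice; several facts are reported more than once.
time_slice = [{"a", "b"}, {"b", "c"}, {"a", "c"}, {"d"}]
```

In this toy slice, a target recall of 0.75 is met after two of the four documents, because the third document only repeats facts already seen.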

  13. Thanks! • Poster 14
