
Presented by: Deepika Sethi

YAGO2: Exploring and Querying World Knowledge in Time, Space, Context, and Many Languages. Johannes Hoffart, Fabian M. Suchanek, Klaus Berberich, Edwin Lewis-Kelham, Gerard de Melo, Gerhard Weikum. Presented by: Deepika Sethi.


Presentation Transcript


  1. YAGO2: Exploring and Querying World Knowledge in Time, Space, Context, and Many Languages. Johannes Hoffart, Fabian M. Suchanek, Klaus Berberich, Edwin Lewis-Kelham, Gerard de Melo, Gerhard Weikum. Presented by: Deepika Sethi

  2. Introduction, Extraction Architecture, Temporal Dimension, Geo-Spatial Dimension, Textual Dimension, Demo, Conclusion

  3. Introduction • Extension of YAGO • Built from Wikipedia, WordNet and GeoNames • Gathers and integrates temporal, spatial and semantic information • 10 million entities and events • 80 million facts

  4. Extraction Architecture • Entity: each Wikipedia article becomes an entity. Ex: Leonard Cohen -> LeonardCohen • Categories: describe type information. Ex: Leonard Cohen -> Canadian poets • YAGO2 links each category to the WordNet taxonomy. Ex: Canadian poet -> poet

  5. Manually defined relations such as wasBornOnDate, locatedIn, etc. • Categories and infoboxes deliver instances of these relations • Manually defined patterns map categories and infobox attributes to fact templates. Ex: Leonard Cohen has the infobox attribute born=Montreal => wasBornIn(LeonardCohen, Montreal)
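
The mapping from infobox attributes to fact templates can be sketched as follows. This is a minimal illustration, not YAGO2's extractor: the dictionary layout and the helper names `entity_id` and `facts_from_infobox` are assumptions, and only the `born` -> `wasBornIn` pair comes from the slide.

```python
# Hypothetical attribute-to-relation table; only "born" -> "wasBornIn"
# is taken from the slide, the table layout itself is an assumption.
ATTRIBUTE_TO_RELATION = {
    "born": "wasBornIn",
}


def entity_id(title):
    """Turn a title into an identifier, e.g. 'Leonard Cohen' -> 'LeonardCohen'."""
    return "".join(title.split())


def facts_from_infobox(title, infobox):
    """Emit one (subject, relation, object) triple per mapped infobox attribute."""
    subject = entity_id(title)
    facts = []
    for attribute, value in infobox.items():
        relation = ATTRIBUTE_TO_RELATION.get(attribute)
        if relation is not None:  # unmapped attributes are skipped
            facts.append((subject, relation, entity_id(value)))
    return facts


# The "genre" attribute is made up to show that unmapped attributes are ignored.
facts = facts_from_infobox("Leonard Cohen", {"born": "Montreal", "genre": "Folk"})
print(facts)
```

Keeping the mapping in a table rather than in code mirrors the slide's point that the patterns are defined manually and applied mechanically.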

  6. Facts: represented as triples of subject, predicate, object (SPO) • Reification: each fact has an id, so other facts can refer to it. Ex: the fact with id #42 was extracted from Wikipedia. In YAGO2: wasFoundIn(#42, Wikipedia)
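
Reification can be illustrated with a small in-memory store in which a fact id may itself appear as the subject of another fact. This is a sketch under assumptions: the slide does not say which fact carries id #42, so the `wasBornIn` triple and the id `#43` are placeholders, and `add_fact` and `facts_about` are hypothetical helpers.

```python
# Fact store keyed by fact id; each value is a (subject, predicate, object) triple.
facts = {}


def add_fact(fact_id, subject, predicate, obj):
    """Store a triple under its id and return the id so it can be referenced."""
    facts[fact_id] = (subject, predicate, obj)
    return fact_id


def facts_about(fact_id):
    """Return all triples whose subject is the given fact id."""
    return [triple for triple in facts.values() if triple[0] == fact_id]


# Assumed content for fact #42; the slide only gives the id.
base_id = add_fact("#42", "LeonardCohen", "wasBornIn", "Montreal")
# The meta-fact from the slide: wasFoundIn(#42, Wikipedia). Its own id is made up.
add_fact("#43", base_id, "wasFoundIn", "Wikipedia")

print(facts_about("#42"))
```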

  7. Factual Rules • Factual rules: definitions of all relations, their domains and ranges, and the classes that make up the YAGO2 hierarchy of literal types • Add three new classes: yagoLegalActor, yagoLegalActorGeo, yagoGeoEntity • Help in the extraction of classes. Ex: the “AmericaRockNRollSingers” class in YAGO2 is linked to the WordNet class “singer”

  8. Implication Rules • Deduce new knowledge from existing knowledge • Expressed as a fact (a triple) • Subject -> premise of the implication • Object -> conclusion • Ex: "$1 $2 $3; $2 subpropertyOf $4;" implies "$1 $4 $3"
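
The example rule can be read as a small forward-chaining step: whenever a fact uses a relation that is a subproperty of another relation, the same subject and object are also asserted under the more general relation. The sketch below hard-codes that one rule and repeats it until nothing new is derived. The function name is an assumption, and the Bob Dylan facts are the ones used on the temporal-dimension slide.

```python
def apply_subproperty_rule(facts):
    """Apply "$1 $2 $3; $2 subpropertyOf $4" implies "$1 $4 $3" to a fixed point."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        # All (sub-relation, super-relation) pairs known so far.
        pairs = [(s, o) for (s, p, o) in derived if p == "subpropertyOf"]
        for (s, p, o) in list(derived):
            for sub, sup in pairs:
                if p == sub and (s, sup, o) not in derived:
                    derived.add((s, sup, o))
                    changed = True
    return derived


facts = {
    ("wasBornOnDate", "subpropertyOf", "startsExistingOnDate"),
    ("BobDylan", "wasBornOnDate", "1941-05-24"),
}
closure = apply_subproperty_rule(facts)
print(("BobDylan", "startsExistingOnDate", "1941-05-24") in closure)
```

Looping to a fixed point means chains of subproperties are followed as well, at the cost of rescanning the fact set on every pass.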

  9. Replacement Rules • If a part of the source text matches a specified regular expression, it is replaced by a given string • Used for cleaning HTML tags, normalizing numbers, and eliminating administrative Wikipedia categories and articles that should not be processed • Ex: "\{\{USA\}\}" replace "[[United States]]"
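
A replacement rule of this kind maps directly onto a regular-expression substitution. The sketch below uses Python's `re` module; the first rule is the one on the slide, while the `<br>` rule and the name `apply_replacements` are made up for illustration.

```python
import re

# (pattern, replacement) pairs applied in order.
REPLACEMENT_RULES = [
    (re.compile(r"\{\{USA\}\}"), "[[United States]]"),  # rule from the slide
    (re.compile(r"<br\s*/?>"), " "),                    # hypothetical HTML clean-up rule
]


def apply_replacements(text):
    """Run every replacement rule over the source text."""
    for pattern, replacement in REPLACEMENT_RULES:
        # A function replacement keeps the string literal (no escape processing).
        text = pattern.sub(lambda match, r=replacement: r, text)
    return text


cleaned = apply_replacements("born in {{USA}}<br/>raised in Montreal")
print(cleaned)
```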

  10. Extraction Rules • If a part of the source text matches a specified regular expression, a sequence of facts is generated • Apply to patterns found in Wikipedia infoboxes, categories, titles, headings, links, and references • Ex: "\[\[Category:(.+) births\]\]" pattern "$0 wasBornOnDate Date($1)"
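
The extraction rule above can be sketched as a regular expression plus a fact template. The sketch assumes that `$0` stands for the entity of the article being processed and that `$1` is the first capture group; the function name `extract_facts` and the sample wiki text are illustrative.

```python
import re

# Pattern and template copied from the slide.
RULE_PATTERN = re.compile(r"\[\[Category:(.+) births\]\]")
RULE_TEMPLATE = "$0 wasBornOnDate Date($1)"


def extract_facts(entity, source_text):
    """Instantiate the template once per match of the pattern."""
    facts = []
    for match in RULE_PATTERN.finditer(source_text):
        fact = RULE_TEMPLATE.replace("$0", entity)  # assumption: $0 is the article entity
        for index, group in enumerate(match.groups(), start=1):
            fact = fact.replace(f"${index}", group)
        facts.append(fact)
    return facts


wiki_text = "[[Category:1934 births]]\n[[Category:Canadian poets]]"
extracted = extract_facts("LeonardCohen", wiki_text)
print(extracted)
```

Only the first category line matches, so the `Canadian poets` category produces no date fact.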

  11. YAGO2: A Temporal Dimension • Derives the temporal properties of objects from the data already in the knowledge base • Datatype: yagoDate. Standard format: YYYY-MM-DD, or YYYY-##-## when month and day are unknown • Four major entity types: People, Groups, Artifacts, Events
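
A parser for the two date shapes on the slide might look like the sketch below. It treats `##` as "unknown" and returns `None` for that part. The function name is an assumption, and the sketch handles only the two formats shown, without checking that month and day are in range.

```python
import re

# Four-digit year, then month and day as two digits or the wildcard "##".
YAGO_DATE = re.compile(r"(\d{4})-(\d{2}|##)-(\d{2}|##)")


def parse_yago_date(text):
    """Return (year, month, day), with None for parts given as '##'."""
    match = YAGO_DATE.fullmatch(text)
    if match is None:
        raise ValueError(f"not a yagoDate in the expected format: {text!r}")
    year, month, day = match.groups()
    return (
        int(year),
        None if month == "##" else int(month),
        None if day == "##" else int(day),
    )


print(parse_yago_date("1941-05-24"))
print(parse_yago_date("1934-##-##"))
```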

  12. Entity-Time Relations • [Diagram: yagoDate, instanceOf, subpropertyOf; relations shown: wasBornOnDate, diedOnDate, happenedOnDate, startsExistingOnDate, endsExistingOnDate] • Ex: from the fact BobDylan wasBornOnDate 1941-05-24, an implication rule creates the second fact BobDylan startsExistingOnDate 1941-05-24

  13. Facts with an extracted time • Occurrence time: the time span during which the fact occurred • Two relations: occursSince, occursUntil • Ex: the fact BarackObama wasInauguratedAs PresidentOfTheUnitedStates, with fact id #2, gets its time written as #2 occursOnDate 2009-01-20

  14. YAGO2: A Geo-Spatial Dimension • Entities: cities, countries, mountains, rivers • Class: yagoGeoEntity, which groups land, location, body of water, track, way, real property, excavation, geological formation, structure, facility

  15. Datatype: yagoGeoCoordinates, a pair (latitude, longitude) • The hasGeoCoordinates relation links a geo-entity to its geographical coordinates, which are an instance of yagoGeoCoordinates

  16. Matching locations: two sources, Wikipedia & GeoNames • Matching classes: two sources, WordNet & GeoNames • An automated matching algorithm, based on the shallow noun phrase parsing of the original YAGO, detects and matches candidate class names

  17. Entities and Location • Entities are associated with location • Events: happenedIn relation • Groups: isLocatedIn relation • Artifacts: isLocatedIn relation

  18. Facts and Location • The relation occursIn holds between a fact and a geo-entity. Ex: for the fact #1: LeonardCohen wasBornOnDate 1934, the location is written as #1 occursIn Montreal

  19. Yago2: Textual Dimension • Contains contextual information • hasWikipediaAnchorText links an entity to a string that occurs as anchor text in the entity's article. • hasWikipediaCategory links an entity to the name of a category in which Wikipedia places the article. • hasCitationTitle links an entity to a title of a reference on the Wikipedia page.

  20. Demo • 6-tuple representation: SPOTLX (Subject, Predicate, Object, Time, Location, conteXt) • The SPOTLX view is built by joining in the occurrence spans of facts, the locations of facts, and the context of the involved entities
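
The join behind the 6-tuple view can be sketched in a few lines. This is an illustration under assumptions, not the demo's implementation: the data structures and the function name `spotlx_view` are hypothetical, the facts are the examples from the earlier slides, the split of the inauguration fact into subject, predicate and object is one reading of the slide, and "Canadian poets" stands in for the context of LeonardCohen.

```python
# Base facts keyed by fact id: (subject, predicate, object).
base_facts = {
    "#1": ("LeonardCohen", "wasBornOnDate", "1934"),
    # Assumed split of the identifier shown on the slide.
    "#2": ("BarackObama", "wasInauguratedAs", "PresidentOfTheUnitedStates"),
}
# Meta-facts about facts: (fact id, relation, value).
meta_facts = [
    ("#1", "occursIn", "Montreal"),
    ("#2", "occursOnDate", "2009-01-20"),
]
# Context strings per entity, e.g. Wikipedia category names.
entity_context = {"LeonardCohen": ["Canadian poets"]}


def lookup(fact_id, relation):
    """Return the first meta-fact value for this fact and relation, or None."""
    return next((v for f, r, v in meta_facts if f == fact_id and r == relation), None)


def spotlx_view():
    """Join each base fact with its time, location, and entity context."""
    rows = []
    for fact_id, (s, p, o) in base_facts.items():
        context = tuple(entity_context.get(s, ())) + tuple(entity_context.get(o, ()))
        rows.append((s, p, o, lookup(fact_id, "occursOnDate"), lookup(fact_id, "occursIn"), context))
    return rows


rows = spotlx_view()
for row in rows:
    print(row)
```

Facts without a recorded time or location simply carry `None` in that position, so every base fact still appears in the view.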

  21. Critique • Lack of examples • No results to prove that Yago2 is accurate • About 100 properties cannot cover everything like notBornOnDate

  22. Conclusion • Methodology for enriching large knowledge bases of entity-relationship-oriented facts along dimensions of time and space • YAGO2: 80 million facts • Valuable gazetteer for geographical, temporal and semantic data

  23. Questions ???
