YAGO2: Exploring and Querying World Knowledge in Time, Space, Context, and Many Languages. Johannes Hoffart, Fabian M. Suchanek, Klaus Berberich, Edwin Lewis-Kelham, Gerard de Melo, Gerhard Weikum. Presented by: Deepika Sethi.
Introduction • Extraction Architecture • Temporal Dimension • Geo-Spatial Dimension • Textual Dimension • Demo • Conclusion
Introduction • Extension of YAGO • Built from Wikipedia, WordNet, and GeoNames • Gathers and integrates temporal, spatial, and semantic information • 10 million entities and events • 80 million facts
Extraction Architecture • Entity: each Wikipedia article becomes an entity. Ex: Leonard Cohen -> LeonardCohen • Categories: describe type information. Ex: Leonard Cohen -> Canadian poets • YAGO2 links each category to the WordNet taxonomy. Ex: Canadian poet -> poet
Relations such as wasBornOnDate and locatedIn are manually defined • Categories and infoboxes deliver instances of these relations • Manually defined patterns map categories and infobox attributes to fact templates. Ex: Leonard Cohen's infobox has the attribute born=Montreal => wasBornIn(LeonardCohen, Montreal)
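The mapping from infobox attributes to fact templates can be sketched as follows. This is a minimal illustration, not the YAGO2 extractor: the names `ATTRIBUTE_PATTERNS`, `entity_id`, and `facts_from_infobox` are hypothetical, and only the born -> wasBornIn mapping comes from the slide.

```python
# Hypothetical pattern table: infobox attribute name -> relation name.
# Only the "born" -> "wasBornIn" entry is taken from the slide.
ATTRIBUTE_PATTERNS = {
    "born": "wasBornIn",
}


def entity_id(title):
    """Turn a title into an entity identifier, e.g. 'Leonard Cohen' -> 'LeonardCohen'."""
    return "".join(title.split())


def facts_from_infobox(title, infobox):
    """Return (subject, relation, object) triples for attributes that have a pattern."""
    subject = entity_id(title)
    facts = []
    for attribute, value in infobox.items():
        relation = ATTRIBUTE_PATTERNS.get(attribute)
        if relation is not None:
            facts.append((subject, relation, entity_id(value)))
    return facts


# The "genre" attribute has no pattern, so it produces no fact.
facts = facts_from_infobox("Leonard Cohen", {"born": "Montreal", "genre": "Folk"})
print(facts)  # [('LeonardCohen', 'wasBornIn', 'Montreal')]
```

Attributes without a pattern are silently skipped here; a real extractor would likely log them so that new patterns can be added.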
Facts: represented as subject-predicate-object (SPO) triples • Reification: each fact has an identifier, so a fact can itself be the subject of another fact. Ex: the fact with id #42 was extracted from Wikipedia; in YAGO2: wasFoundIn(#42, Wikipedia)
Factual Rules • Factual rules: definitions of all relations, their domains and ranges, and the classes that make up the YAGO2 hierarchy of literal types • Add three new classes: yagoLegalActor, yagoLegalActorGeo, yagoGeoEntity • Help in the extraction of classes. Ex: the class “AmericaRockNRollSingers” in YAGO2 is linked to the WordNet class “singer”
Implication Rules • Deduce new knowledge from existing knowledge • Each rule is expressed as a fact (a triple) • Subject: the premise of the implication • Object: the conclusion • Ex: "$1 $2 $3; $2 subpropertyOf $4;" implies "$1 $4 $3"
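The subpropertyOf rule above can be sketched as a small fixpoint computation over a set of triples. This is an illustration under stated assumptions, not the YAGO2 rule engine: the function name `apply_subproperty_rule` is hypothetical, and the rule is hard-coded rather than parsed from its textual form.

```python
def apply_subproperty_rule(facts):
    """Apply "$1 $2 $3; $2 subpropertyOf $4;" implies "$1 $4 $3" until nothing new appears."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        # Collect (relation, super_relation) pairs from the subpropertyOf facts.
        hierarchy = [(s, o) for (s, p, o) in known if p == "subpropertyOf"]
        for (s, p, o) in list(known):
            for (relation, super_relation) in hierarchy:
                if p == relation:
                    implied = (s, super_relation, o)
                    if implied not in known:
                        known.add(implied)
                        changed = True
    return known


facts = [
    ("BobDylan", "wasBornOnDate", "1941-05-24"),
    ("wasBornOnDate", "subpropertyOf", "startsExistingOnDate"),
]
closure = apply_subproperty_rule(facts)
print(("BobDylan", "startsExistingOnDate", "1941-05-24") in closure)  # True
```

Looping until no new fact is added lets chains of subproperties propagate; the quadratic scan is fine for a sketch but would need indexing at knowledge-base scale.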
Replacement Rules • If part of the source text matches a specified regular expression, it is replaced by a given string • Used for cleaning HTML tags, normalizing numbers, and eliminating administrative Wikipedia categories and articles that should not be processed • Ex: "\{\{USA\}\}" replace "[[United States]]"
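A replacement rule maps naturally onto a regular-expression substitution. The sketch below uses Python's `re.sub`; the rule list `REPLACEMENT_RULES` and the function `apply_replacements` are hypothetical names, and the HTML-tag rule is an assumed example of the "cleaning HTML tags" bullet, not a rule quoted from YAGO2.

```python
import re

# (pattern, replacement) pairs, applied in order.
REPLACEMENT_RULES = [
    (r"\{\{USA\}\}", "[[United States]]"),  # rule shown on the slide
    (r"<[^>]+>", ""),                       # assumed example: strip HTML tags
]


def apply_replacements(text):
    """Rewrite the source text with every replacement rule in turn."""
    for pattern, replacement in REPLACEMENT_RULES:
        text = re.sub(pattern, replacement, text)
    return text


cleaned = apply_replacements("Born in <b>Duluth</b>, {{USA}}")
print(cleaned)  # Born in Duluth, [[United States]]
```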
Extraction Rules • If part of the source text matches a specified regular expression, a sequence of facts is generated • Apply to patterns found in Wikipedia infoboxes, categories, titles, headings, links, and references • Ex: "\[\[Category:(.+) births\]\]" pattern "$0 wasBornOnDate Date($1)"
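The extraction rule above can be sketched with a compiled regular expression and a string template. The sketch assumes that `$0` stands for the entity of the article being processed and `$1` for the first capture group; the function name `extract_facts` is hypothetical, and `Date(...)` is kept as literal text rather than evaluated.

```python
import re

# Pattern and template copied from the slide's extraction rule.
PATTERN = re.compile(r"\[\[Category:(.+) births\]\]")
TEMPLATE = "$0 wasBornOnDate Date($1)"


def extract_facts(entity, source_text):
    """Generate one fact string per match of the extraction pattern."""
    facts = []
    for match in PATTERN.finditer(source_text):
        # Assumption: $0 is the article's entity, $1 the first capture group.
        fact = TEMPLATE.replace("$0", entity).replace("$1", match.group(1))
        facts.append(fact)
    return facts


wiki_source = "[[Category:1934 births]]\n[[Category:Canadian poets]]"
facts = extract_facts("LeonardCohen", wiki_source)
print(facts)  # ['LeonardCohen wasBornOnDate Date(1934)']
```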
YAGO2: A Temporal Dimension • Derives the temporal properties of objects from the data already in the knowledge base • Datatype: yagoDate, in the standard format YYYY-MM-DD; unknown parts are written as #, e.g. YYYY-##-## • Four major entity types: People, Groups, Artifacts, Events
Entity-Time relations • Diagram: the relations startsExistingOnDate and endsExistingOnDate point to instances of yagoDate; wasBornOnDate, diedOnDate, and happenedOnDate are connected to them by subpropertyOf • Ex: given the fact BobDylan wasBornOnDate 1941-05-24, an implication rule creates the second fact BobDylan startsExistingOnDate 1941-05-24
Facts with an extracted time • Occurrence time: the time span during which the fact held • Two relations: occursSince, occursUntil • A fact that occurs on a single day is written with occursOnDate. Ex: the fact BarackObama wasInauguratedAs PresidentOfTheUnitedStates, with fact id #2, is annotated as #2 occursOnDate 2009-01-20
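Because facts carry identifiers, their occurrence times can be stored as facts about facts. The sketch below shows one way to read a time span back from such meta-facts. The function `occurrence_span` is a hypothetical helper, and treating occursOnDate as setting both ends of the span is an assumption made for illustration.

```python
# Base fact, keyed by its identifier (example from the slide).
facts = {
    "#2": ("BarackObama", "wasInauguratedAs", "PresidentOfTheUnitedStates"),
}

# Meta-facts: the subject is a fact identifier.
meta_facts = [
    ("#2", "occursOnDate", "2009-01-20"),
]


def occurrence_span(fact_id, meta_facts):
    """Return (since, until) for a fact; either end is None when unknown."""
    since = until = None
    for subject, relation, value in meta_facts:
        if subject != fact_id:
            continue
        if relation == "occursOnDate":
            # Assumption: a single-day fact starts and ends on that day.
            since = until = value
        elif relation == "occursSince":
            since = value
        elif relation == "occursUntil":
            until = value
    return since, until


print(occurrence_span("#2", meta_facts))  # ('2009-01-20', '2009-01-20')
```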
YAGO2: A Geo-Spatial Dimension • Entities: cities, countries, mountains, rivers • Class yagoGeoEntity groups the classes land, location, body of water, track, way, real property, excavation, geological formation, structure, facility
Datatype: yagoGeoCoordinates, a pair (latitude, longitude) • The relation hasGeoCoordinates links a geo-entity to its geographical coordinates, an instance of yagoGeoCoordinates
Matching locations: two sources, Wikipedia and GeoNames • Matching classes: two sources, WordNet and GeoNames • An automated matching algorithm, based on the shallow noun-phrase parsing of the original YAGO, detects and matches candidate class names
Entities and Location • Entities are associated with a location • Events: happenedIn relation • Groups: isLocatedIn relation • Artifacts: isLocatedIn relation
Facts and Location • The relation occursIn holds between a fact and a geo-entity. Ex: for the fact #1: LeonardCohen wasBornOnDate 1934, the location can be written as #1 occursIn Montreal
YAGO2: Textual Dimension • Contains contextual information • hasWikipediaAnchorText links an entity to a string that occurs as anchor text in the entity's article. • hasWikipediaCategory links an entity to the name of a category in which Wikipedia places the article. • hasCitationTitle links an entity to a title of a reference on the Wikipedia page.
Demo • 6-tuple representation • SPOTLX: SPO triples extended with Time, Location, and conteXt, obtained by joining in the occurrence spans of facts, the locations of facts, and the context of the involved entities
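The join that produces a SPOTLX tuple can be sketched with plain dictionaries. This is not the demo's implementation: the `SPOTLX` type and the `join_spotlx` function are hypothetical, the occurrence span given for fact #1 is an illustrative assumption, and the context is simply the concatenation of the context strings of subject and object.

```python
from typing import NamedTuple, Optional, Tuple


class SPOTLX(NamedTuple):
    subject: str
    predicate: str
    object: str
    time: Optional[Tuple[str, str]]  # occurrence span (since, until)
    location: Optional[str]          # geo-entity from occursIn
    context: Tuple[str, ...]         # context strings of the involved entities


def join_spotlx(fact_id, facts, spans, locations, contexts):
    """Join a fact with its occurrence span, its location, and its entities' context."""
    s, p, o = facts[fact_id]
    context = contexts.get(s, ()) + contexts.get(o, ())
    return SPOTLX(s, p, o, spans.get(fact_id), locations.get(fact_id), context)


facts = {"#1": ("LeonardCohen", "wasBornOnDate", "1934")}
spans = {"#1": ("1934", "1934")}                   # assumed span, for illustration
locations = {"#1": "Montreal"}                     # from: #1 occursIn Montreal
contexts = {"LeonardCohen": ("Canadian poets",)}   # from hasWikipediaCategory

row = join_spotlx("#1", facts, spans, locations, contexts)
print(row.location, row.context)  # Montreal ('Canadian poets',)
```

Flattening everything into one tuple per fact lets a query filter on time, place, and keywords without following reified fact identifiers by hand.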
Critique • Lack of examples • No results are presented to show that YAGO2 is accurate • About 100 properties cannot cover everything, e.g. notBornOnDate
Conclusion • Methodology for enriching large knowledge bases of entity-relationship-oriented facts along dimensions of time and space • YAGO2: 80 million facts • Valuable gazetteer for geographical, temporal and semantic data