YAGO-NAGA Project
Presented By: Mohammad Dwaikat
To: Dr. Yuliya Lierler
CSCI 8986 – Fall 2012
Agenda
• What is YAGO-NAGA?
• Why YAGO-NAGA?
• How Does YAGO-NAGA Work?
• Demonstration
• YAGO-NAGA Sub-Projects
What is YAGO-NAGA?
• A project for harvesting, searching, and ranking knowledge from the Web.
• It aims to build a conveniently searchable, large-scale, highly accurate knowledge base of common facts in a machine-processable representation.
• It has harvested knowledge about millions of entities, and facts about their relationships, from Wikipedia and WordNet, carefully integrating the two sources.
What is YAGO-NAGA?
• Its vision is a confluence of Semantic Web (ontologies), Social Web (Web 2.0), and Statistical Web (information extraction) assets towards a comprehensive repository of human knowledge.
YAGO
• YAGO (Yet Another Great Ontology) is a huge semantic knowledge base derived from Wikipedia, WordNet, and GeoNames.
• It knows almost 10 million entities (e.g., persons, organizations, cities) and 120 million facts about these entities.
• Its accuracy has been manually confirmed at 95%.
• YAGO is anchored in time and space: it attaches a temporal dimension and a spatial dimension to many of its facts and entities.
YAGO
• It contains all the entities and ontological facts extracted from Wikipedia (dump of 2010-08-17), with categories mapped to the WordNet class hierarchy.
• It also contains multilingual data from the Universal WordNet (UWN).
YAGO
• It contains all the entities and facts from GeoNames (dump of August 2010).
• It also contains textual and structural data from Wikipedia:
  • All links and anchor texts between the YAGO entities.
  • All Wikipedia category names.
  • The titles of references.
YAGO
• It is particularly suited for disambiguation, as it contains a large number of names for entities. It also knows the gender of people.
• In the resulting knowledge base, facts are represented as RDF (Resource Description Framework) triples.
• Methods and prototype systems have been developed for querying, ranking, and exploring this knowledge.
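To make the triple representation concrete, here is a minimal Python sketch of facts stored as subject-predicate-object triples. The identifiers (Max_Planck, bornIn, and so on) and the helper objects() are illustrative assumptions, not actual YAGO identifiers or part of any YAGO API.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One fact in subject-predicate-object form, as in RDF."""
    subject: str
    predicate: str
    obj: str


# Hand-entered toy facts; the names are illustrative, not real YAGO identifiers.
FACTS = [
    Triple("Max_Planck", "type", "scientist"),
    Triple("Max_Planck", "bornIn", "Kiel"),
    Triple("Max_Planck", "hasWon", "Nobel_Prize"),
    Triple("Angela_Merkel", "type", "politician"),
]


def objects(facts, subject, predicate):
    """Return every object o such that (subject, predicate, o) is a known fact."""
    return [f.obj for f in facts if f.subject == subject and f.predicate == predicate]


print(objects(FACTS, "Max_Planck", "bornIn"))
```

Each fact is a single edge in a labeled graph, which is why the same data can later be searched as a relationship graph.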
NAGA
• Not Another Google Answer (NAGA) is a new semantic search engine that provides ranked answers to queries based on statistical models.
• It can operate on knowledge bases that are organized as graphs with labeled nodes and edges, so-called relationship graphs.
• As of now, NAGA uses a projection of YAGO as its knowledge base.
• The underlying query language supports keyword search for the casual user as well as graph-based queries with regular expressions for the expert user.
Consider These Questions
• Which German Nobel laureate survived both world wars and outlived all four of his children?
  • The answer is Max Planck.
• Which politicians are also accomplished scientists?
  • The German chancellor Angela Merkel and Benjamin Franklin.
• How are Max Planck, Angela Merkel, Jim Gray, and the Dalai Lama related?
  • All four have doctoral degrees from German universities.
Why YAGO-NAGA?
• Three major research directions:
  • Semantic-Web-style knowledge repositories, such as SUMO, OpenCyc, and WordNet.
  • Large-scale information extraction.
  • Social tagging and Web 2.0 communities that constitute the Social Web. Wikipedia is another example of the Social Web paradigm.
• The challenge is how to extract the important facts from the Web and organize them into an explicit knowledge base that captures entities and the semantic relationships among them.
How Does YAGO-NAGA Work?
• Querying YAGO adopts concepts from SPARQL (SPARQL Protocol and RDF Query Language), the standardized query language for RDF data, but extends them with more expressive pattern matching and ranking.
• The prototype system that implements these features is NAGA.
Structured Knowledge Queries
• A big US city with two airports, one named after a World War II hero and one named after a World War II battlefield:

    SELECT DISTINCT ?c WHERE {
      ?c type City .
      ?c locatedIn USA .
      ?a1 type Airport .
      ?a2 type Airport .
      ?a1 locatedIn ?c .
      ?a2 locatedIn ?c .
      ?a1 namedAfter ?p .
      ?p type WarHero .
      ?a2 namedAfter ?b .
      ?b type BattleField .
    }
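The query above is a conjunction of triple patterns that share variables. As a rough illustration of how such patterns can be evaluated, the following Python sketch matches them against a small in-memory set of triples by backtracking. The facts are hand-entered toy data (not taken from YAGO), and the functions unify() and match() are hypothetical helpers, not how NAGA is actually implemented.

```python
# Hand-entered toy facts for illustration only; not extracted from YAGO.
FACTS = [
    ("Chicago", "type", "City"),
    ("Chicago", "locatedIn", "USA"),
    ("OHare_Airport", "type", "Airport"),
    ("OHare_Airport", "locatedIn", "Chicago"),
    ("OHare_Airport", "namedAfter", "Edward_OHare"),
    ("Edward_OHare", "type", "WarHero"),
    ("Midway_Airport", "type", "Airport"),
    ("Midway_Airport", "locatedIn", "Chicago"),
    ("Midway_Airport", "namedAfter", "Battle_of_Midway"),
    ("Battle_of_Midway", "type", "BattleField"),
    ("Boston", "type", "City"),
    ("Boston", "locatedIn", "USA"),
    ("Logan_Airport", "type", "Airport"),
    ("Logan_Airport", "locatedIn", "Boston"),
]

# The triple patterns from the slide; strings starting with "?" are variables.
QUERY = [
    ("?c", "type", "City"),
    ("?c", "locatedIn", "USA"),
    ("?a1", "type", "Airport"),
    ("?a2", "type", "Airport"),
    ("?a1", "locatedIn", "?c"),
    ("?a2", "locatedIn", "?c"),
    ("?a1", "namedAfter", "?p"),
    ("?p", "type", "WarHero"),
    ("?a2", "namedAfter", "?b"),
    ("?b", "type", "BattleField"),
]


def unify(pattern, triple, binding):
    """Extend binding so that pattern equals triple, or return None on a clash."""
    new = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            if new.setdefault(term, value) != value:
                return None  # variable already bound to something else
        elif term != value:
            return None  # constant does not match
    return new


def match(patterns, triples, binding=None):
    """Yield every variable binding that satisfies all patterns (backtracking)."""
    binding = binding or {}
    if not patterns:
        yield dict(binding)
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        extended = unify(first, triple, binding)
        if extended is not None:
            yield from match(rest, triples, extended)


# SELECT DISTINCT ?c: project onto ?c and drop duplicates.
ANSWERS = sorted({b["?c"] for b in match(QUERY, FACTS)})
print(ANSWERS)  # ['Chicago']
```

Boston is rejected because its only airport has no namedAfter fact, so the seventh pattern cannot be satisfied. A real engine would also reorder patterns and use indexes instead of scanning every triple.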
Growing the Knowledge Base
[Figure: architecture diagram. Labeled components: WordNet + Wikipedia, YAGO Core Extractors, YAGO Core Checker, YAGO Core, Web sources, YAGO Gatherer, Hypotheses, YAGO Scrutinizer, Growing. Annotation: "YAGO knows all entities; focus on facts".]
YAGO Knowledge Base
• Combines knowledge from WordNet & Wikipedia.
• Additional gazetteers (geonames.org).
Searching & Ranking RDF Graphs in NAGA
• Ranking based on confidence, compactness, and relevance.
• Discovery queries [Figure: query graph with labels $x, $y, $a, $b, type, scientist, bornIn, Kiel, hasWon, Nobel prize, hasSon, diedOn, >]
• Connectedness queries [Figure: query graph with labels Thomas Mann, Goethe, *, type, German novelist]
• Queries with regular expressions [Figure: query graph with labels Ling, hasFirstName | hasLastName, $x, type, scientist, (coAuthor | advisor)*, Beng Chin Ooi, worksFor, $y, locatedIn*, Zhejiang]
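An edge expression such as (coAuthor | advisor)* asks for nodes connected by any chain of zero or more coAuthor or advisor edges. The Python sketch below shows one simple way to evaluate that kind of expression with a breadth-first search. The graph uses placeholder node names, edges are assumed to be directed, and reachable() is a hypothetical helper, not part of NAGA.

```python
from collections import deque

# Hypothetical toy relationship graph as (source, label, target) edges.
EDGES = [
    ("A", "coAuthor", "B"),
    ("B", "advisor", "C"),
    ("C", "worksFor", "U"),
    ("A", "worksFor", "V"),
]


def reachable(edges, start, labels):
    """Nodes reachable from start via zero or more edges whose label is in labels.

    This evaluates a Kleene-star expression over a set of alternative labels,
    e.g. labels={"coAuthor", "advisor"} for (coAuthor | advisor)*.
    """
    seen = {start}  # zero steps: the start node itself always matches
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for src, label, dst in edges:
            if src == node and label in labels and dst not in seen:
                seen.add(dst)
                queue.append(dst)
    return seen


print(sorted(reachable(EDGES, "A", {"coAuthor", "advisor"})))
```

Edges with other labels, such as worksFor, are never followed, so U and V are not part of the result.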
Demonstration
YAGO Server: UI & API
YAGO-UI
• Interactive online demo
• RDF with time, space & provenance annotations
• SPARQL + keywords
YAGO-API
• Two basic web services:
  • processQuery(String query)
  • getYagoEntitiesByNames(String[] names)
• www.mpi-inf.mpg.de/yago-naga/demo.html
YAGO
• Browse through the YAGO knowledge base.
  • https://d5gate.ag5.mpi-sb.mpg.de/webyagospotlx/Browser
• Ask queries on YAGO using SPOTLX patterns. View the results on a map and timeline.
  • https://d5gate.ag5.mpi-sb.mpg.de/webyagospotlx/WebInterface
YAGO-NAGA Sub-Projects
• YAGO-NAGA has more than 13 sub-projects.
• AIDA is a method, implemented in an online tool, for disambiguating mentions of named entities that occur in natural-language text or Web tables.
  • https://d5gate.ag5.mpi-sb.mpg.de/webaida/
Names, Surface Patterns & Paraphrases
Example question: Which chemist was born in London? (part-of-speech tags: NN VBD VBN IN NNP/LOC)
• (I) Named entity disambiguation
  • chemist → wordnet_chemist, wordnet_pharmacist
  • born → Bertran_de_Born, Born_Identity_(Movie), Born_(Album)
  • London → London_UK, London_Arkansas, Antonio_London
• (II) Mapping surface patterns onto semantic relations
  • <person> was_born_in <location> → bornIn(<person>, <location>)
  • <person> was_born_in <date> → bornOn(<person>, <date>)
• (III) Paraphrases of questions
  • <person> [was] born in <location> → bornIn(<person>, <location>)
  • <location>-born <person> → bornIn(<person>, <location>)
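Steps (II) and (III) can be sketched with regular expressions. The Python example below maps the two surface forms from the slide onto bornIn and bornOn facts. It is a simplified illustration under stated assumptions: the regular expressions, the four-digit-year test used to tell a date from a location, and the function extract() are all hypothetical and far cruder than a real extraction pipeline, which would type the arguments against the knowledge base.

```python
import re

# "<person> was born in <location or date>", optionally ending with a period.
BORN_IN = re.compile(r"^(?P<person>.+?) was born in (?P<obj>.+?)\.?$")
# Paraphrase: "<location>-born <person>".
X_BORN = re.compile(r"^(?P<location>\S+)-born (?P<person>.+)$")


def extract(text):
    """Map a surface pattern to a (relation, person, argument) fact, or None."""
    m = BORN_IN.match(text)
    if m:
        obj = m.group("obj")
        # Assumption: a bare four-digit number is a year, anything else a place.
        relation = "bornOn" if re.fullmatch(r"\d{4}", obj) else "bornIn"
        return (relation, m.group("person"), obj)
    m = X_BORN.match(text)
    if m:
        return ("bornIn", m.group("person"), m.group("location"))
    return None


print(extract("Max Planck was born in Kiel."))
print(extract("Max Planck was born in 1858"))
print(extract("Kiel-born Max Planck"))
```

The first and third inputs yield the same bornIn fact, which is the point of treating paraphrases as alternative surface forms of one relation.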
References
• YAGO-NAGA Project: http://www.mpi-inf.mpg.de/yago-naga/
• YAGO: http://yago-knowledge.org
• NAGA: http://www.mpi-inf.mpg.de/yago-naga/naga/demo.html