YAGO Reporter: Qi Liu
What is YAGO? • A semantic web • A knowledge base • A combination of WordNet and Wikipedia
Semantic web • Advocated by the W3C (World Wide Web Consortium) • Aims to restructure the WWW • A standard framework: RDF (Resource Description Framework)
Knowledge base • To be: • A special database for knowledge management • To do: • Provides a means for collecting, organising, searching and utilising information • Three types: • Machine-readable knowledge bases (DBpedia) • Human-readable knowledge bases (Wikipedia) • Knowledge base analysis and design
WordNet • To be: • A lexical database of English, developed since 1985 • To do: • Groups words into synsets • Provides short, general definitions • Records the semantic relations between these synsets • 25 basic noun groups & 15 verb groups
Key Concepts • Ontology vs Taxonomy • Lexicon: the bridge between a language and the knowledge expressed in that language • Syntactic (there vs their) • Semantic (sight vs site) • Pragmatic (infer vs imply)
Semantics of YAGO • Five relations: • Domain • Range • subRelationOf • Type • subClassOf • Entities: • Domain • Relation • Range • Literal • ......
Reasoning rules • correctness and completeness
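The five relations listed earlier can drive a small deductive closure. The following is a minimal sketch, assuming facts are plain (subject, relation, object) tuples and that the rules are transitivity of subClassOf and subRelationOf, type propagation along subClassOf, and type inference from domain and range. The function name `deductive_closure` and the sample facts are illustrative, not taken from the slides.

```python
# Relations treated as transitive in this sketch (an assumption).
TRANSITIVE = {"subClassOf", "subRelationOf"}


def deductive_closure(facts):
    """Return the input facts plus every fact derivable by the sketched rules."""
    closed = set(facts)
    while True:
        new = set()
        for (x, r, y) in closed:
            for (a, r2, b) in closed:
                # Transitivity: (x, r, y), (y, r, b) => (x, r, b)
                if r == r2 and r in TRANSITIVE and y == a:
                    new.add((x, r, b))
                # Type propagation: (x, type, y), (y, subClassOf, b) => (x, type, b)
                if r == "type" and r2 == "subClassOf" and y == a:
                    new.add((x, "type", b))
                # Sub-relations: (r, subRelationOf, b), (x, r, y) => (x, b, y)
                if r2 == "subRelationOf" and a == r:
                    new.add((x, b, y))
                # Domain: (r, domain, b), (x, r, y) => (x, type, b)
                if r2 == "domain" and a == r:
                    new.add((x, "type", b))
                # Range: (r, range, b), (x, r, y) => (y, type, b)
                if r2 == "range" and a == r:
                    new.add((y, "type", b))
        if new <= closed:
            # Fixed point reached: nothing further can be derived.
            return closed
        closed |= new


# Hypothetical sample facts for illustration only.
sample = {
    ("Albert_Einstein", "type", "physicist"),
    ("physicist", "subClassOf", "scientist"),
    ("scientist", "subClassOf", "person"),
}
closure = deductive_closure(sample)
```

The loop stops because no new entity or relation names are ever invented, so the set of possible facts is finite.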
The YAGO system • Knowledge extraction • YAGO storage • Enriching YAGO
Knowledge extraction • TYPE relation • SUBCLASSOF relation • MEANS relation • Other relations • Meta-relations
TYPE relation extraction • The Wikipedia category system • Category types: conceptual, administrative, relational, thematic • Identifying conceptual categories • Conceptual categories yield TYPE facts • Administrative and relational ones: excluded by hand • A shallow linguistic parse (Noun Group Parser) separates the remaining two types, conceptual and thematic • E.g. Naturalized citizens of the United States • Domain and range extracted at the same time
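The shallow parsing step can be illustrated as follows. This is a rough sketch, assuming a category name splits into pre-modifier, head and post-modifier at the first preposition, and that a plural head suggests a conceptual category. The preposition list, the plural test and all function names are simplifications introduced here, not the actual Noun Group Parser.

```python
# Small, hand-picked word lists; both are assumptions made for this sketch.
PREPOSITIONS = {"of", "in", "from", "by", "at", "for", "on", "with"}
IRREGULAR_PLURALS = {"people", "men", "women", "children"}


def split_category(name):
    """Split a category name into (pre-modifier, head, post-modifier)."""
    words = name.split()
    if not words:
        raise ValueError("empty category name")
    cut = len(words)
    for i, word in enumerate(words):
        # The first preposition (not in first position) starts the post-modifier.
        if i > 0 and word.lower() in PREPOSITIONS:
            cut = i
            break
    pre = " ".join(words[:cut - 1])
    head = words[cut - 1]
    post = " ".join(words[cut:])
    return pre, head, post


def looks_plural(word):
    """Crude plural test; a real system would use a proper stemmer."""
    w = word.lower()
    return w in IRREGULAR_PLURALS or (w.endswith("s") and not w.endswith("ss"))


def is_conceptual(name):
    """Guess that a category is conceptual when its head noun is plural."""
    _, head, _ = split_category(name)
    return looks_plural(head)


parts = split_category("Naturalized citizens of the United States")
```

The plural test is deliberately naive: singular words ending in "s" would be misclassified, which is one reason some category types still need to be excluded by hand.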
SUBCLASSOF relation extraction • Wikipedia categories • Form a DAG (directed acyclic graph) • Reflect merely the thematic structure • Use only the leaf categories of Wikipedia • Integrating WordNet synsets • Match to WordNet, or prefer WordNet on conflict • Establishing subClassOf • E.g. American people in Japan • Exceptions • Corrected manually
MEANS relation extraction • Exploiting WordNet synsets • A synset {urban center, metropolis, city} • Attach a class for the synset ‘city’ • Exploiting Wikipedia redirects • Search “Einstein, Albert”, redirected to “Albert Einstein” • Parsing person names • givenNameOf subRelationOf means • familyNameOf subRelationOf means
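The redirect and name-parsing steps can be sketched as below. It assumes entity identifiers use underscores in place of spaces and that only two-word names are split into given and family name. Both function names are hypothetical.

```python
def redirect_fact(redirect_title, target_entity):
    """A redirect title is treated as another name that means the target."""
    return (redirect_title, "means", target_entity)


def name_facts(person_entity):
    """Derive means, givenNameOf and familyNameOf facts for a person entity.

    Only the simple 'Given Family' pattern is handled; anything else
    yields just the means fact.
    """
    label = person_entity.replace("_", " ")
    facts = [(label, "means", person_entity)]
    parts = label.split()
    if len(parts) == 2:
        given, family = parts
        facts.append((given, "givenNameOf", person_entity))
        facts.append((family, "familyNameOf", person_entity))
    return facts


facts = name_facts("Albert_Einstein")
facts.append(redirect_fact("Einstein, Albert", "Albert_Einstein"))
```

Because givenNameOf and familyNameOf are sub-relations of means, a reasoner can later conclude that "Einstein" alone also means the entity.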
Other relations extraction • BornInYear & DiedInYear • EstablishedIn & LocatedIn • WrittenInYear • PoliticianOf • HasWonPrize • Filtering the results
Meta-relations extraction • Descriptions • Individual DESCRIBES URL • Witness • Fact FoundIn URL (of its witness page) • ExtractedBy • Context • Links between A and B: A Context B
YAGO storage • Model independent of storage • Storage: • Text files, XML, database tables, RDF
Enriching YAGO • Add the fact (x, r, y) • Map x, y to existing entities (word sense disambiguation) • If mapping fails, add a new entity • Map r to the YAGO ontology • If the mapped fact already exists, add a FoundIn relation (a new witness) • Otherwise, add a new fact
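The enrichment steps can be sketched as a small in-memory store. This is an illustration under stated assumptions: word sense disambiguation is replaced by a trivial name normalisation, a fact that is already stored only gains a new witness URL, and the class `TinyYago` with its methods is invented for this example.

```python
class TinyYago:
    """Toy fact store: each fact maps to the set of URLs it was found in."""

    def __init__(self, relations):
        self.relations = set(relations)   # relations known to the ontology
        self.entities = set()
        self.facts = {}                   # (x, r, y) -> set of witness URLs

    def resolve(self, name):
        """Stand-in for word sense disambiguation: normalise the name and
        register it as a new entity if it is not known yet."""
        key = name.strip().replace(" ", "_")
        self.entities.add(key)
        return key

    def add_fact(self, x, r, y, source_url):
        """Add (x, r, y); return True if the fact is new, False if only a
        new witness (FoundIn) was recorded for an existing fact."""
        if r not in self.relations:
            raise ValueError(f"relation {r!r} is not in the ontology")
        fact = (self.resolve(x), r, self.resolve(y))
        is_new = fact not in self.facts
        self.facts.setdefault(fact, set()).add(source_url)
        return is_new


kb = TinyYago(relations={"bornInYear", "hasWonPrize"})
kb.add_fact("Albert Einstein", "hasWonPrize", "Nobel Prize", "http://example.org/a")
kb.add_fact("Albert Einstein", "hasWonPrize", "Nobel Prize", "http://example.org/b")
```

Keeping witnesses in a set means that seeing the same fact on the same page twice does not inflate its support.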
Summary on YAGO1 • 1M entities & 5M facts • Accuracy around 95%
YAGO2: In Time, Space and Many Languages • YAGO: about 100 manually defined relations • The YAGO2 architecture is built on four kinds of rules: • Factual rules • E.g. exceptions, definitions of all relations, domains, ranges and classes • Implication rules • Infer new facts from the facts in the database • Replacement rules • Normalize numbers, tags and other formats • Extraction rules • Extract facts from a given source text
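A replacement rule can be pictured as a regular expression paired with a replacement string, applied to the source text before extraction. The two rules below are invented examples of that idea (an HTML entity and thousands separators), not rules taken from YAGO2, and `normalize` is a hypothetical helper.

```python
import re

# (pattern, replacement) pairs, applied in order. Both rules are
# illustrative assumptions, not part of the actual rule set.
REPLACEMENT_RULES = [
    # Turn the HTML non-breaking space entity into a plain space.
    (re.compile(r"&nbsp;"), " "),
    # Drop a comma that sits between a digit and a group of exactly
    # three digits, e.g. 1,234,567 becomes 1234567.
    (re.compile(r"(?<=\d),(?=\d{3}(?!\d))"), ""),
]


def normalize(text):
    """Apply every replacement rule to the text, in order."""
    for pattern, replacement in REPLACEMENT_RULES:
        text = pattern.sub(replacement, text)
    return text


cleaned = normalize("Population: 1,234,567&nbsp;people")
```

Lookarounds are used for the number rule so that every separator in a long number is removed in a single pass.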
Temporal Dimension • People: wasBornOnDate & diedOnDate • Groups: wasCreatedOnDate & wasDestroyedOnDate • Artifacts (buildings, songs, cities): [same as above] • Events: startedOnDate & endedOnDate => startExistingOnDate & endExistingOnDate • Facts • Entities in a fact => subjectStartRelation & objectStartRelation
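The mapping from entity-specific date relations to the two generic existence relations can be sketched as a lookup table. The relation names follow the slide; the table layout and the function `existence_facts` are assumptions made for illustration.

```python
# Each specific date relation implies one generic existence relation.
EXISTENCE_RELATION = {
    "wasBornOnDate": "startExistingOnDate",
    "diedOnDate": "endExistingOnDate",
    "wasCreatedOnDate": "startExistingOnDate",
    "wasDestroyedOnDate": "endExistingOnDate",
    "startedOnDate": "startExistingOnDate",
    "endedOnDate": "endExistingOnDate",
}


def existence_facts(facts):
    """Derive generic existence facts from (subject, relation, date) triples.

    Triples whose relation is not a known date relation are ignored.
    """
    return [
        (subject, EXISTENCE_RELATION[relation], date)
        for (subject, relation, date) in facts
        if relation in EXISTENCE_RELATION
    ]


derived = existence_facts([
    ("Albert_Einstein", "wasBornOnDate", "1879-03-14"),
    ("Albert_Einstein", "diedOnDate", "1955-04-18"),
])
```

With this in place, a query about when any entity existed only needs the two generic relations, whatever kind of entity it is.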
GEO-SPATIAL Dimension • All physical objects have a location in space! • Define it with geographical coordinates, i.e. latitude and longitude => yagoGeoCoordinates, => hasGeoCoordinates • Two sources: • Wikipedia • GeoNames • locatedIn & hasGeoCoordinates & <location, TYPE, class>
Textual Dimension • hasWikipediaAnchorText • hasWikipediaCategory • hasCitationTitle • subClassOf hasContext • Integrating UWN (Universal WordNet) to cover 200 languages