2012 INTERNATIONAL ASIAN SUMMER SCHOOL IN LINKED DATA (IASLOD 2012), August 13-17, 2012, KAIST, Daejeon, Korea • Identity and schema for Linked Data • Hideaki Takeda, National Institute of Informatics, takeda@nii.ac.jp
How to put data into a computer? • How to describe the data? • The way to describe individual data • Schema/Class/Concept • The way to describe relationships among schemas/classes/concepts • Ontology/Taxonomy/Thesaurus • How to refer to the data? • The way to identify individual data • Identifier • Relationships among identifiers
Architecture for the Semantic Web • The world of classes (Ontologies) • The world of instances (Linked Data) • Source: Tim Berners-Lee, http://www.w3.org/2002/Talks/09-lcs-sweb-tbl/
Layers of Semantic Web • Ontology • Descriptions of classes • RDFS, OWL • Challenges for ontology building • Ontology building is difficult by nature • Consistency, comprehensiveness, logicality • Alignment of ontologies is even more difficult • [Figure labels: Ontology = descriptions of classes; Linked Data = descriptions of instances] • Source: Tim Berners-Lee, http://www.w3.org/2002/Talks/09-lcs-sweb-tbl/
Layers of Semantic Web • Linked Data • Descriptions of instances (individuals) • RDF + (RDFS, OWL) • Pros of Linked Data • Easy to write (mainly fact description) • Easy to link (fact-to-fact links) • Cons of Linked Data • Difficult to describe complex structures • Class descriptions are still needed (-> ontology) • [Figure labels: Ontology = descriptions of classes; Linked Data = descriptions of instances] • Source: Tim Berners-Lee, http://www.w3.org/2002/Talks/09-lcs-sweb-tbl/
Importance of Identifiers for Entities • Everything should be identifiable! • Humans can identify things with vague identifiers, or even without identifiers, with help from the context around the things • On the web, the context is usually not available, and the computer can seldom understand the context even if it exists • So we need identifiers for all things
Identification System • Identification is one of the primary functions of human information processing • Naming: e.g., names for people, pets, and some everyday things • OK if the number of things is small • Systematic identification • e.g., phone number, postal code, passport number, product number, ISBN • Needed when the number of things is large • Requirements for systematic identification • Identifier is stable and sustainable • Uniqueness is guaranteed • Identifier publisher is reliable and sustainable
Identification system for the Web • Not so different from conventional identification systems • Differences • Cross-system use • Truly digitized • Requirements for systematic identification on the web • Identifier is stable and sustainable (even after an entity may disappear) • Uniqueness is guaranteed over all systems • Descriptions should be associated with identifiers • since entities may not be accessible • Identifier publisher is reliable and sustainable
Solutions for the Requirements by LOD • Requirements for systematic identification on the web • 1. Identifier is stable and sustainable (even after an entity may disappear) • (up to each identifier publisher) • 2. Uniqueness is guaranteed over all systems • URI (not URN) • 3. Descriptions should be associated with identifiers • Dereferenceable URI • If a URI is accessed, a description associated with it should be returned • 4. Identifier publisher is reliable and sustainable
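Requirement 3 (a dereferenceable URI returns a description) is usually met through ordinary HTTP. The sketch below only prepares such a request and does not send it. The `Accept` media types, the helper name `build_dereference_request`, and the sample DBpedia URI are assumptions chosen for illustration, not something the slides prescribe.

```python
from urllib.request import Request

# Media types a Linked Data client might ask for (an assumption of this sketch).
RDF_ACCEPT = "text/turtle, application/rdf+xml;q=0.9"


def build_dereference_request(uri: str) -> Request:
    """Prepare (but do not send) an HTTP GET asking for an RDF description of `uri`."""
    return Request(uri, headers={"Accept": RDF_ACCEPT}, method="GET")


# Illustrative URI; any dereferenceable identifier could be used here.
req = build_dereference_request("http://dbpedia.org/resource/Daejeon")
print(req.get_method(), req.full_url)
print("Accept:", req.get_header("Accept"))
```

Passing `req` to `urllib.request.urlopen` would perform the actual lookup; whether RDF comes back depends on the identifier publisher.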
Some examples: ISBN (International Standard Book Number) • Abstract • A unique numeric commercial book identifier • 13 digits • Prefix: 978 or 979 (for compatibility with EAN code) • Group (language-sharing country group): 1 to 5 digits • Publisher code: • Item number: • Check digit: 1 digit • Management: two layers • National ISBN Agency – Publisher • Requirement Satisfaction • 1. (Stable ID) Maybe (versioning often matters, and sometimes a publisher may re-use an ISBN) • 2. (Unique ID) Uniqueness is guaranteed, but it is not a URI • 3. (Dereferenceable) No mechanism (Amazon does this instead!) • 4. (Reliable publisher) Yes
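The one-digit check number can be computed from the first 12 digits. The slide does not spell out the rule; the sketch below uses the standard EAN-13 weighting (weights 1 and 3 alternating), and the function names are hypothetical.

```python
def isbn13_check_digit(first12: str) -> int:
    """Compute the ISBN-13 check digit from the first 12 digits (hyphens ignored)."""
    digits = [int(c) for c in first12 if c.isdigit()]
    if len(digits) != 12:
        raise ValueError("expected 12 digits before the check digit")
    # Positions 1, 3, 5, ... get weight 1; positions 2, 4, 6, ... get weight 3.
    total = sum(d * (1 if i % 2 == 0 else 3) for i, d in enumerate(digits))
    return (10 - total % 10) % 10


def is_valid_isbn13(isbn: str) -> bool:
    """Check that a 13-digit ISBN (with or without hyphens) has a consistent check digit."""
    digits = [c for c in isbn if c.isdigit()]
    if len(digits) != 13:
        return False
    return isbn13_check_digit("".join(digits[:12])) == int(digits[12])


# A commonly used textbook example number.
print(isbn13_check_digit("978030640615"))    # 7
print(is_valid_isbn13("978-0-306-40615-7"))  # True
```

A valid check digit only shows the number is well-formed; it says nothing about the stability or re-use issues listed under requirement 1.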
Some examples: DOI (Digital Object Identifier) • Abstract • An identifier for scientific digital objects (mostly scientific articles) • A variable-length string: "prefix/suffix" • Prefix: assigned to publishers • Suffix: assigned to each object • Management: three layers • IDF (International DOI Foundation) – Registration Agency – Publisher • Requirement Satisfaction • 1. (Stable ID) Yes (not re-usable) • 2. (Unique ID) Uniqueness is guaranteed and it is accessible as a URI (http://dx.doi.org/"DOI") • 3. (Dereferenceable) Maps to object pages, but no RDF • 4. (Reliable publisher) Maybe
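The "prefix/suffix" structure and the resolver URL pattern from the slide can be sketched as follows. The DOI used here is a made-up placeholder, and the helper names `split_doi` and `doi_to_uri` are hypothetical.

```python
from typing import Tuple
from urllib.parse import quote

DOI_RESOLVER = "http://dx.doi.org/"  # resolver pattern named on the slide


def split_doi(doi: str) -> Tuple[str, str]:
    """Split a DOI into (prefix, suffix) at the first slash."""
    prefix, sep, suffix = doi.partition("/")
    if not sep or not prefix or not suffix:
        raise ValueError(f"not a prefix/suffix DOI: {doi!r}")
    return prefix, suffix


def doi_to_uri(doi: str) -> str:
    """Turn a DOI into an HTTP URI by appending it to the resolver address."""
    split_doi(doi)  # validates the overall shape
    return DOI_RESOLVER + quote(doi, safe="/")


# Hypothetical DOI used only for illustration.
print(split_doi("10.1234/example.2012-001"))
print(doi_to_uri("10.1234/example.2012-001"))
```

This covers requirement 2 (a URI form exists); as the slide notes, dereferencing leads to an object page rather than RDF.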
Some examples: DBpedia (as identifier) • Abstract • A Wikipedia page • Name of the Wikipedia page • Maintained manually • Disambiguation page • Redirect page • Requirement Satisfaction • 1. (Stable ID) Maybe (pages sometimes disappear, sometimes change names, sometimes change contents) • 2. (Unique ID) Uniqueness is mostly guaranteed and it is accessible as a URI • 3. (Dereferenceable) RDF • 4. (Reliable publisher) Maybe
Identification of relationships between identifiers • Co-existence of multiple identification systems in a field • Difference of coverage • Difference of viewpoint • An entity can have multiple identifiers • Need for mapping between identifiers in different identification systems • Method: use special properties • owl:sameAs, (rdfs:seeAlso, skos:exactMatch) • http://sameas.org • Some problems • Logical inconsistency with owl:sameAs • Maintenance
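Because owl:sameAs is symmetric and transitive, a set of pairwise mappings collapses into groups of identifiers that all denote one entity. A minimal union-find sketch of that grouping follows; the `ex:` identifiers are hypothetical placeholders, and the function name is an assumption.

```python
from collections import defaultdict


def sameas_clusters(links):
    """Group identifiers connected by sameAs links into equivalence classes."""
    parent = {}

    def find(x):
        # Follow parent pointers to the representative, shortening the path as we go.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in links:
        parent[find(a)] = find(b)  # merge the two groups

    groups = defaultdict(set)
    for node in list(parent):
        groups[find(node)].add(node)
    return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])


# Hypothetical identifiers from three imaginary identification systems a, b, c.
links = [("ex:a1", "ex:b1"), ("ex:b1", "ex:c1"), ("ex:a2", "ex:b2")]
for cluster in sameas_clusters(links):
    print(cluster)
```

The same merging behaviour hints at the problem the slide mentions: one incorrect link fuses two whole groups, so errors propagate and are costly to maintain.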
Summary for ID • Identification is the crucial part of LOD • Data availability • Data inconsistency • Data interoperability • Establishing a good identification system leads to reliable and sustainable LOD.
Structuring Information • A wide range of ways to structure information • Keywords, tags • A freely chosen word or phrase just indicating some features • Controlled vocabulary • Mapping to a fixed set of words or phrases • e.g., the list of countries, name authorities • Classification • System for classifying entities. Often hierarchical. A class may not carry meaning. • Taxonomy • Hierarchical term system for classification. The upper/lower relation usually means a general/specific relation • e.g., the subject headings of LC • Thesaurus • System for semantics. More types of relations: (hypernym, hyponym), synonym, antonym, homonym, holonym, meronym • Ontology • System of concepts. Concepts rather than words. Even more varied relations, plus the definitions of concepts
Examples in Library Science • Many systems in the library community • Classification • Universal Decimal Classification (UDC) • Controlled Vocabulary • Authority files for person names, organizations, location names • Library of Congress: 8 million records, MADS & SKOS • British Library: 2.6 million records, FOAF & BIO (a vocabulary for biographical information) • National Diet Library (Japan): 1 million records, FOAF • Deutsche Nationalbibliothek (DNB, Germany): 1.8 & 1.3 million records (names & organizations) • Virtual International Authority File (VIAF): 4 million records • Taxonomy • Subject Headings: LC, NDL, • Library of Congress: MADS & SKOS • British Library: • National Diet Library (Japan): 0.1 million records, SKOS • Deutsche Nationalbibliothek (DNB, Germany): 0.16 million records
UDC as Linked Data • 69,000 records • 40 languages • http://udcdata.info/
<skos:Concept rdf:about="http://udcdata.info/025553">
  <skos:inScheme rdf:resource="http://udcdata.info/udc-schema"/>
  <skos:broader rdf:resource="http://udcdata.info/025461"/>
  <skos:notation rdf:datatype="http://udcdata.info/UDCnotation">510.6</skos:notation>
  <skos:prefLabel xml:lang="en">Mathematical logic</skos:prefLabel>
  <skos:prefLabel xml:lang="ja">記号論理学</skos:prefLabel>
  <skos:related rdf:resource="http://udcdata.info/000016"/>
</skos:Concept>
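A record like the UDC concept above can be read with a plain XML parser. The sketch below wraps a shortened copy of the concept in an `rdf:RDF` element with namespace declarations (the wrapper is an assumption, since the slide shows only the inner element) and pulls out the notation, the labels per language, and the broader concept.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SKOS = "http://www.w3.org/2004/02/skos/core#"
XML = "http://www.w3.org/XML/1998/namespace"

# The rdf:RDF wrapper is added here so the fragment parses on its own.
DOC = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:skos="{SKOS}">
  <skos:Concept rdf:about="http://udcdata.info/025553">
    <skos:broader rdf:resource="http://udcdata.info/025461"/>
    <skos:notation rdf:datatype="http://udcdata.info/UDCnotation">510.6</skos:notation>
    <skos:prefLabel xml:lang="en">Mathematical logic</skos:prefLabel>
    <skos:prefLabel xml:lang="ja">記号論理学</skos:prefLabel>
  </skos:Concept>
</rdf:RDF>"""

root = ET.fromstring(DOC)
concept = root.find(f"{{{SKOS}}}Concept")

# One preferred label per language tag.
labels = {
    el.get(f"{{{XML}}}lang"): el.text
    for el in concept.findall(f"{{{SKOS}}}prefLabel")
}
broader = concept.find(f"{{{SKOS}}}broader").get(f"{{{RDF}}}resource")
notation = concept.find(f"{{{SKOS}}}notation").text

print(notation, labels, broader)
```

For real data an RDF library would be the more robust choice, because RDF/XML allows several equivalent serializations of the same triples.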
Library of Congress name authority as Linked Data • http://id.loc.gov/authorities/names/n79084664.html
<http://id.loc.gov/authorities/names/n79084664> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.loc.gov/mads/rdf/v1#PersonalName> .
<http://id.loc.gov/authorities/names/n79084664> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.loc.gov/mads/rdf/v1#Authority> .
<http://id.loc.gov/authorities/names/n79084664> <http://www.loc.gov/mads/rdf/v1#authoritativeLabel> "Natsume, Sōseki, 1867-1916"@en .
<http://id.loc.gov/authorities/names/n79084664> <http://www.loc.gov/mads/rdf/v1#elementList> _:bnode7authoritiesnamesn79084664 .
_:bnode7authoritiesnamesn79084664 <http://www.w3.org/1999/02/22-rdf-syntax-ns#first> _:bnode8authoritiesnamesn79084664 .
_:bnode7authoritiesnamesn79084664 <http://www.w3.org/1999/02/22-rdf-syntax-ns#rest> _:bnode010 .
_:bnode8authoritiesnamesn79084664 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.loc.gov/mads/rdf/v1#FullNameElement> .
_:bnode8authoritiesnamesn79084664 <http://www.loc.gov/mads/rdf/v1#elementValue> "Natsume, Sōseki,"@en .
_:bnode010 <http://www.w3.org/1999/02/22-rdf-syntax-ns#first> _:bnode11authoritiesnamesn79084664 .
_:bnode010 <http://www.w3.org/1999/02/22-rdf-syntax-ns#rest> <http://www.w3.org/1999/02/22-rdf-syntax-ns#nil> .
_:bnode11authoritiesnamesn79084664 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.loc.gov/mads/rdf/v1#DateNameElement> .
_:bnode11authoritiesnamesn79084664 <http://www.loc.gov/mads/rdf/v1#elementValue> "1867-1916"@en .
<http://id.loc.gov/authorities/names/n79084664> <http://www.loc.gov/mads/rdf/v1#classification> "PL812.A8" .
<http://id.loc.gov/authorities/names/n79084664> <http://www.loc.gov/mads/rdf/v1#hasExactExternalAuthority> <http://viaf.org/viaf/sourceID/LC%7Cn+79084664#skos:Concept> .
<http://id.loc.gov/authorities/names/n79084664> <http://www.loc.gov/mads/rdf/v1#isMemberOfMADSCollection> <http://id.loc.gov/authorities/names/collection_NamesAuthorizedHeadings> .
<http://id.loc.gov/authorities/names/n79084664> <http://www.loc.gov/mads/rdf/v1#isMemberOfMADSScheme> <http://id.loc.gov/authorities/names> .
<http://id.loc.gov/authorities/names/n79084664> <http://www.loc.gov/mads/rdf/v1#isMemberOfMADSCollection> <http://id.loc.gov/authorities/names/collection_LCNAF> .
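In the triples above, the name is stored as an ordered RDF list: `elementList` points at a blank node, and each list cell carries `rdf:first` (the item) and `rdf:rest` (the next cell, ending in `rdf:nil`). The sketch below walks that list over a hand-copied subset of the triples; language tags are dropped and the helper names are assumptions.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
MADS = "http://www.loc.gov/mads/rdf/v1#"
SUBJECT = "http://id.loc.gov/authorities/names/n79084664"

# Subset of the triples above as (subject, predicate, object); "@en" tags omitted.
TRIPLES = [
    (SUBJECT, MADS + "elementList", "_:bnode7authoritiesnamesn79084664"),
    ("_:bnode7authoritiesnamesn79084664", RDF + "first", "_:bnode8authoritiesnamesn79084664"),
    ("_:bnode7authoritiesnamesn79084664", RDF + "rest", "_:bnode010"),
    ("_:bnode8authoritiesnamesn79084664", MADS + "elementValue", "Natsume, Sōseki,"),
    ("_:bnode010", RDF + "first", "_:bnode11authoritiesnamesn79084664"),
    ("_:bnode010", RDF + "rest", RDF + "nil"),
    ("_:bnode11authoritiesnamesn79084664", MADS + "elementValue", "1867-1916"),
]


def objects(subject, predicate):
    """Return all objects of triples matching the given subject and predicate."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


def walk_list(head):
    """Follow rdf:first / rdf:rest from `head` until rdf:nil, collecting the items."""
    items = []
    node = head
    while node != RDF + "nil":
        items.append(objects(node, RDF + "first")[0])
        node = objects(node, RDF + "rest")[0]
    return items


head = objects(SUBJECT, MADS + "elementList")[0]
values = [objects(item, MADS + "elementValue")[0] for item in walk_list(head)]
print(values)
```

Joining the values with a space reproduces the `authoritativeLabel` given for the same record.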
Some examples: Scientific Names for Species and Taxa • Abstract • Names for biological species and other taxa (kingdom, division, class, order, family, tribe, genus) • A string • Binomial name for species • Academic societies maintain taxon names individually • E.g., Papilio xuthus (Asian Swallowtail, ナミアゲハ, 호랑나비) • Requirement Satisfaction • 1. (Stable ID) Mostly yes (names sometimes disappear, change, or change contents) • 2. (Unique ID) Uniqueness is generally guaranteed but, strictly speaking, there is some ambiguity because of changes • 3. (Dereferenceable) No. Many systems exist but none covers all species • 4. (Reliable publisher) Maybe
Ontology An ontology is an explicit specification of a conceptualization [Gruber] • An ontology is an explicit specification of a conceptualization. The term is borrowed from philosophy, where an Ontology is a systematic account of Existence. For AI systems, what "exists" is that which can be represented. When the knowledge of a domain is represented in a declarative formalism, the set of objects that can be represented is called the universe of discourse. This set of objects, and the describable relationships among them, are reflected in the representational vocabulary with which a knowledge-based program represents knowledge. Thus, in the context of AI, we can describe the ontology of a program by defining a set of representational terms. In such an ontology, definitions associate the names of entities in the universe of discourse (e.g., classes, relations, functions, or other objects) with human-readable text describing what the names mean, and formal axioms that constrain the interpretation and well-formed use of these terms. Formally, an ontology is the statement of a logical theory.
Conceptualization • There are many possible ways to conceptualize the target world • Trade-off between generality and efficiency • [Figure: alternative conceptualizations of a world of boxes on a desk. Untyped predicates on_desk(A), on(A, B), put(A, B) over objects; typed predicates on(A/box, B/object), put(A/box, B/object); boxes modeled either as separate classes (red box, blue box, yellow box) or as one box class with an attribute color: {red, blue, yellow}]
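The point that one world admits several conceptualizations can be made concrete with the slide's boxes. In the sketch below, conceptualization A gives each colour its own class, while conceptualization B keeps a single box concept with a colour attribute. All class and function names are illustrative assumptions.

```python
from dataclasses import dataclass
from enum import Enum


# Conceptualization A: each colour of box is its own class.
class Box:
    pass


class RedBox(Box):
    pass


class BlueBox(Box):
    pass


# Conceptualization B: a single box concept with a colour attribute.
class Color(Enum):
    RED = "red"
    BLUE = "blue"
    YELLOW = "yellow"


@dataclass(frozen=True)
class ColoredBox:
    color: Color


def is_red_a(box: Box) -> bool:
    """Under A, 'red' is answered by class membership."""
    return isinstance(box, RedBox)


def is_red_b(box: ColoredBox) -> bool:
    """Under B, 'red' is answered by inspecting an attribute value."""
    return box.color is Color.RED


print(is_red_a(RedBox()), is_red_b(ColoredBox(Color.RED)))
```

Adding a new colour means a new class under A but only a new attribute value under B, which is one small instance of the generality versus efficiency trade-off.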
Types of Ontologies • Upper (top-level) ontology vs. Domain ontology • Upper Ontology: A common ontology throughout all domains • Domain Ontology: An ontology which is meaningful in a specific domain • Object ontology vs. Task ontology • Object Ontology: An ontology on “things” and “events” • Task Ontology: An ontology on “doing” • Heavy-weight ontology vs. light-weight ontology • Heavy-weight ontology: fully described ontology including concept definitions and relations, in particular in a logical way • Light-weight ontology: partially described ontology including typically only is-a relations
Top-level ontology • An ontology which covers all of the world! • Very difficult • e.g., how does a thing exist? • Is a thing a four-dimensional existence? • Or does a thing exist three-dimensionally over time? • Common requirements • A small number of concepts can cover the world • Concepts can be used in lower ontologies • Concepts should be general and abstract
Top-level ontology • Three approaches • Formal approach • Logical formalization • Fully Abstract • Pros: clean • Cons: hardly understandable • e.g., Sowa’s top-level ontology, DOLCE • Linguistic approach • Use and extension of linguistic concepts • Partially abstract and partially general • Pros: understandable • Cons: limitation to the linguistic world • e.g., Penman Upper Model, WordNet • Empirical Approach • Use and extension of everyday concepts • Mostly general • Pros: understandable and applicable to all the world • Cons: lack of solid foundation • e.g. SUMO, Cyc, EDR
Empirical top-level ontology • SUMO (Suggested Upper Merged Ontology) • Collection and organization of frequently used concepts • Simple relationships between concepts
Formal Ontology: DOLCE • DOLCE (a Descriptive Ontology for Linguistic and Cognitive Engineering) • Intended as a reference system for top-level ontologies • Logical definition • Particular (DOLCE) vs. Universal • Particular: ontology about things, phenomena, qualities… • Universal: ontology for describing particulars, such as categories and attributes
Formal Ontology: DOLCE • Concepts • Endurant / Perdurant / Quality / Abstract • Endurant • "Things" • An existence over time • May change its attributes • Perdurant • "Process" • No change over time • May switch a part to the other • Relations • Parthood (abstract or perdurant) • Temporary parthood (endurant) • Constitution (endurant or perdurant) • Participation between perdurant and endurant
Linguistic top-level ontology • WordNet • A lexical reference system • “Link-based electronic dictionary” • Concepts • synset • Noun 79,689 • Verb 13,508 • Relations • synonym • hypernym/hyponym (is-a) • holonym/meronym (a-part-of) http://www.cogsci.princeton.edu/cgi-bin/webwn
Linguistic top-level ontology • WordNet • Top-level • { entity, physical thing (that which is perceived or known or inferred to have its own physical existence (living or nonliving)) } • { psychological_feature, (a feature of the mental life of a living organism) } • { abstraction, (a general concept formed by extracting common features from specific examples) } • { state, (the way something is with respect to its main attributes; "the current state of knowledge"; "his state of health"; "in a weak financial state") } • { event, (something that happens at a given place and time) } • { act, human_action, human_activity, (something that people do or cause to happen) } • { group, grouping, (any number of entities (members) considered as a unit) } • { possession, (anything owned or possessed) } • { phenomenon, (any state or process known through the senses rather than by intuition or reasoning) }
Summary for structuring information • Keywords, tags / Controlled vocabulary / Classification / Taxonomy / Thesaurus / Ontology • The differences between them are not clear-cut, and not important • The trend is toward more structured systems • The same requirements apply as for identification systems
Summary • Requirements for Successful Structuring Systems • 1. Entity is stable and sustainable • 2. Uniqueness is guaranteed over all systems • 3. Descriptions should be associated with entities • 4. System publisher is reliable and sustainable • Learn from success in the library community • LOD technology can help
Schema/Vocabulary for LOD • Class/Concept description • Axiom of a concept in ontology • Database schema for a table in Relational database • Object definition in Object-Oriented Programming/DB • Class description in Semantic Web • RDFS/OWL description for a class • RDFS: Simple class system • OWL: Description Logic-based • Class description in Linked Data • Mostly RDFS-based (exception: owl:sameAs) • Simple Structure (mostly property-value pair)
Schema/Vocabulary for LOD • The importance of sharing schemata • Interoperability • Generic applications • Some famous and frequently used schemata • Dublin Core • FOAF (Friend-Of-A-Friend) • SKOS (Simple Knowledge Organization System)
Usage of Common Vocabularies LDOW2011 Presentation, Christian Bizer (Freie Universität Berlin), 2011
(Simple) Dublin Core • Started from the library community • Now maintained by DCMI (Dublin Core Metadata Initiative) • (Simple) Dublin Core • Just 15 elements • Simple is best • No range restriction • http://purl.org/dc/elements/1.1/ • 15 elements • Title • Creator • Subject • Description • Publisher • Contributor • Date • Type • Format • Identifier • Source • Language • Relation • Coverage • Rights
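Because Simple Dublin Core is just 15 properties with no range restriction, a record reduces to a handful of property-value pairs. The sketch below serializes such pairs as N-Triples under the namespace given on the slide; the subject URI is a hypothetical placeholder, the function names are assumptions, and the literal escaping is deliberately minimal.

```python
DC = "http://purl.org/dc/elements/1.1/"

DC_ELEMENTS = [
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
]


def escape_literal(value: str) -> str:
    """Escape backslashes and double quotes (a minimal subset of N-Triples escaping)."""
    return value.replace("\\", "\\\\").replace('"', '\\"')


def dc_record_to_ntriples(subject_uri: str, record: dict) -> list:
    """Serialize a dict of Simple Dublin Core elements as N-Triples lines."""
    lines = []
    for element, value in record.items():
        if element not in DC_ELEMENTS:
            raise KeyError(f"not a Simple Dublin Core element: {element}")
        lines.append(f'<{subject_uri}> <{DC}{element}> "{escape_literal(value)}" .')
    return lines


# Hypothetical subject URI; the values are taken from the title slide.
record = {"title": "Identity and schema for Linked Data", "creator": "Hideaki Takeda"}
for line in dc_record_to_ntriples("http://example.org/slides/iaslod2012", record):
    print(line)
```

Every value is a plain literal here, which matches the "no range restriction" design and also shows why the more precise DC Terms were later introduced.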
DC Terms (dcterms) • Qualified Dublin Core • Domain & range • More precise terms • Extension of Simple DC
http://dublincore.org/documents/dcmi-terms/ http://www.kanzaki.com/docs/sw/dc-domain-range.html
The Friend of a Friend (FOAF) • Metadata describing persons and their relationships • Voluntary project
Classes: Agent | Document | Group | Image | LabelProperty | OnlineAccount | OnlineChatAccount | OnlineEcommerceAccount | OnlineGamingAccount | Organization | Person | PersonalProfileDocument | Project
Properties: account | accountName | accountServiceHomepage | age | aimChatID | based_near | birthday | currentProject | depiction | depicts | dnaChecksum | familyName | family_name | firstName | focus | fundedBy | geekcode | gender | givenName | givenname | holdsAccount | homepage | icqChatID | img | interest | isPrimaryTopicOf | jabberID | knows | lastName | logo | made | maker | mbox | mbox_sha1sum | member | membershipClass | msnChatID | myersBriggs | name | nick | openid | page | pastProject | phone | plan | primaryTopic | publications | schoolHomepage | sha1 | skypeID | status | surname | theme | thumbnail | tipjar | title | topic | topic_interest | weblog | workInfoHomepage | workplaceHomepage | yahooChatID
Example (Turtle):
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

<#JW> a foaf:Person ;
    foaf:name "Jimmy Wales" ;
    foaf:mbox <mailto:jwales@bomis.com> ;
    foaf:homepage <http://www.jimmywales.com/> ;
    foaf:nick "Jimbo" ;
    foaf:depiction <http://www.jimmywales.com/aus_img_small.jpg> ;
    foaf:interest <http://www.wikimedia.org> ;
    foaf:knows [ a foaf:Person ; foaf:name "Angela Beesley" ] .

<http://www.wikimedia.org> rdfs:label "Wikipedia" .
SKOS (Simple Knowledge Organization System) • Metadata for taxonomies • Hierarchical structure of concepts • Invented to represent taxonomies such as subject headings • Not the same as the subclass relationship among classes • W3C Recommendation, 18 August 2009
SKOS (Simple Knowledge Organization System) • SKOS Core (hierarchical concept structure) • skos:semanticRelation • skos:broaderTransitive • skos:narrowerTransitive • skos:broader • skos:narrower • skos:related • skos:prefLabel • skos:altLabel • skos:hiddenLabel • [Figure: arrows labeled subPropertyOf connect the relation properties]
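In SKOS, skos:broader states only a direct parent and is not itself transitive; skos:broaderTransitive is its transitive super-property. A small sketch of deriving the transitive ancestors from direct broader links follows. The first link comes from the UDC example earlier in the deck; the second link and the function name are hypothetical.

```python
def broader_transitive(concept, broader):
    """Collect all ancestors reachable by following direct broader links."""
    seen = []
    stack = list(broader.get(concept, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:  # guards against cycles and repeated parents
            seen.append(parent)
            stack.extend(broader.get(parent, []))
    return seen


BROADER = {
    # Direct link taken from the UDC record shown earlier.
    "http://udcdata.info/025553": ["http://udcdata.info/025461"],
    # Hypothetical link, added only so the chain has a second step.
    "http://udcdata.info/025461": ["http://example.org/hypothetical-parent"],
}

print(broader_transitive("http://udcdata.info/025553", BROADER))
```

Keeping the asserted links (broader) separate from the derived ones (broaderTransitive) lets a thesaurus state its immediate hierarchy while applications still query whole branches.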
SKOS (Simple Knowledge Organization System) • SKOS Mapping • skos:mappingRelation • skos:closeMatch • skos:exactMatch • skos:broadMatch • skos:narrowMatch • skos:relatedMatch • [Figure: arrows labeled subPropertyOf connect the mapping properties]
Linked Open Vocabulary (LOV) • A technical platform for search and quality assessment across the vocabulary ecosystem • Register schemata • Search schemata • http://labs.mondeca.com/dataset/lov/
More Info. • http://www.w3.org/2005/Incubator/lld/wiki/Vocabulary_and_Dataset