Ontology-based approaches to providing a semantic infrastructure for Linked Data. Mark Bide Godfrey Rust Rightscom Limited. Agenda. Who are we? – a brief introduction What are the issues as we see them? “Joined-up semantics”: two use cases. Who are we?. Godfrey Rust
Ontology-based approaches to providing a semantic infrastructure for Linked Data. Mark Bide, Godfrey Rust. Rightscom Limited
Agenda • Who are we? – a brief introduction • What are the issues as we see them? • “Joined-up semantics”: two use cases
Who are we? • Godfrey Rust • Director/Chief Data Architect Rightscom • 30 years data modelling/management in the content industry • Builder of the National Discography (a long time ago) • Mark Bide • Director/Senior Consultant Rightscom • Executive Director of EDItEUR • A publisher (a long time ago) • Rightscom • A specialist London-based media consultancy • Specialists in the management of content online • Particular expertise in issues of identity and metadata management
Linked Data: the issues as we see them. Mark Bide
The “research challenges” of Linked Data. From: Bizer, Heath & Berners-Lee (2009), Linked Data - The Story So Far, International Journal on Semantic Web and Information Systems (Special Issue) • User interfaces – how to present data from heterogeneous sources to users • Application architectures – scalability of “on the fly” link traversal • Schema mapping and data fusion • Link maintenance – unmaintained URIs • Licensing – automating access and use management • Trust, quality and relevance – representation of provenance and trustworthiness • Privacy – integration of personal data from multiple sources
Schema mapping… • Data needs to be “integrated in a meaningful way before it is displayed to users” • Requires “mapping of terms from different vocabularies” • W3C recommendations…“define basic terminology…[but are] too coarse grained to properly transfer data between schemata” • Structural and semantic heterogeneity • Requirement: “languages for more fine-grained mappings” • Including capability to manage partial mappings to cover cases where data sources mix terminology
…and data fusion • “The process of integrating multiple data items representing the same real world object into a single consistent and clean representation” • “Main challenge”: the resolution of data conflicts • Different values for the same property • [although is this the “main challenge”?] • All of these issues are perhaps less acute for human mediated metadata driven discovery processes • But metadata isn’t only about human-mediated discovery
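The conflict problem quoted above can be sketched in a few lines. This is a minimal illustration only, not a fusion method from the cited paper: the records, the source names and the `fuse` helper are all hypothetical, and the sketch detects conflicts without trying to resolve them.

```python
from collections import defaultdict

def fuse(records):
    """Merge (source, property, value) statements about one real-world object.

    Returns (fused, conflicts): properties on which all sources agree,
    and properties with differing values, left for a policy or a person.
    """
    values = defaultdict(set)
    for _source, prop, value in records:
        values[prop].add(value)
    fused = {p: next(iter(v)) for p, v in values.items() if len(v) == 1}
    conflicts = {p: sorted(v) for p, v in values.items() if len(v) > 1}
    return fused, conflicts

# Hypothetical statements from two sources about the same recording.
records = [
    ("source_a", "title", "Symphony No. 5"),
    ("source_b", "title", "Symphony No. 5"),
    ("source_a", "creator", "Ludwig van Beethoven"),
    ("source_b", "creator", "Herbert von Karajan"),
]

fused, conflicts = fuse(records)
print(fused)
print(conflicts)
```

Note that the "conflict" here may not be an error at all: both sources could be right under different meanings of "creator", which is the semantic question the rest of the deck addresses.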
The challenge: data integration • “The ultimate goal of Linked Data is to be able to use the Web like a single global database.” [Bizer, Heath & Berners-Lee] • Challenges of integration are the same whether looking at silos of information within an enterprise or more widely at silos of information between enterprises • The key problem lies in the data… • …not in the software (and even less in the hardware) • The “main challenge” in system integration is invariably consistency of the semantics of different systems • You would not attempt to create a single enterprise database by simple amalgamation
Linked Data: a new label for an old idea? • All databases are Linked Data • Linking primary to foreign keys • The <indecs> definition of metadata [2000] • “An item of metadata is a relationship which someone claims to exist between two referents” [Rust & Bide] • Semantic Web tools (RDF, OWL etc) provide a level of sophistication for (meta)data management which has been missing in relational models • Make the links explicit: “first class objects” • Overcome restrictions of fixed predefined objects/entities
But these tools take us only so far • RDF, OWL deal only with infrastructure and logical relations • …not meaning • The equivalent of a database software platform • Oracle, SQL • Nothing to say about the data that populates the database • Existing web namespaces (eg dc: foaf:) of some value • …but there are challenges
A simple example: dc:creator • Beethoven’s Fifth Symphony • dc:creator: Ludwig van Beethoven • dc:publisher: Bärenreiter • A recording of Beethoven’s Fifth Symphony • dc:creator: Ludwig van Beethoven (or dc:contributor?) • dc:creator: Herbert von Karajan (or dc:contributor?) • dc:creator: Berlin Philharmonic (or dc:contributor?) • dc:publisher: Deutsche Grammophon (or…what?) • dc:publisher: Bärenreiter (or…what?) • Crowd sourced solutions? • Gracenote demonstrates both the potential and the challenges…. • Ultimately meaning is always contextual not universal • Managing semantics is at the root of successful integration whether at enterprise or network level
The semantic challenge in general • Different names (and codes/languages) for the same value (your "Author" = my "Creator") • Different specializations ("Editor", "Contributing Editor", "Managing Editor", "Copy Editor", "Sub Editor", "Guest Editor", "Editor-in-chief", "Film Editor", "Magazine Editor", "Series Editor" etc) • Different ways of expressing the same concept ("Edit", "Editor", "Edited By", "Has Editor", "Edition", "Editing") • Approximate matches (dc:creator or dc:contributor?) • Different structures and levels of indirection (your "ComposerName=Beethoven" = my "Name [Link] of Composer [Link] of Work [Link] in Recording")
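The last point, different structures and levels of indirection, can be made concrete with a small sketch. Everything here is illustrative: the identifiers (`rec:1`, `work:5`, `person:9`) and the relator names are hypothetical and not taken from any real scheme.

```python
# "Your" schema: the composer's name is a flat field on the recording.
flat_record = {"id": "rec:1", "ComposerName": "Beethoven"}

# "My" schema: the same fact is reached through a chain of links.
triples = [
    ("rec:1", "HasWork", "work:5"),
    ("work:5", "HasComposer", "person:9"),
    ("person:9", "HasName", "Beethoven"),
]

def follow(triples, start, path):
    """Follow a sequence of relators from `start`; return the final value or None."""
    current = start
    for relator in path:
        matches = [o for s, p, o in triples if s == current and p == relator]
        if not matches:
            return None
        current = matches[0]  # sketch: assumes at most one value per link
    return current

composer = follow(triples, "rec:1", ["HasWork", "HasComposer", "HasName"])
print(composer == flat_record["ComposerName"])
```

A mapping between the two schemas therefore has to relate one field to a path of three links, not one term to one term.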
The semantic challenge…continued • How do you deal with semantic gaps? • “Your schema has nothing that even vaguely matches something in mine” • …before we start on issues of • Authority – who said that this was so? • Time – is this still true? • Place – where is this statement valid? • …etc…etc • These challenges are the commonplace challenges of one-to-one schema mapping in an enterprise…. • …and they don’t disappear with Linked Data – they simply get more severe because of scale
The Web of Data as a single database? • An “enterprise data model” for the web is not a real solution • Could never be imposed • Would never satisfy everyone’s requirement • Must allow for the use of many existing standards – and for many new ones • Many different metadata standards already in use even within communities: MARC (and its many variations); DC; FRBR; RDA…. • ….and the whole point of Linked Data is that it provides mechanisms to integrate (link) between different domains
Towards some answers? • Demonstrate how “joined-up semantics” may be achieved within Linked Data: first by mapping different vocabularies into a contextual ontology • vmf (Vocabulary Mapping Framework) • and using semantic web tools within an enterprise • coati
Joined-up semantics: Two use cases. Godfrey Rust
Semantic standards in Linked Data • RDF: triple syntax • OWL/RDFS: logical semantics • ?: data model structure • ?: intended meaning • “Joined-up semantics”: identities can be linked without the meanings connecting properly. Joined-up semantics is when the meanings of concepts are well matched throughout the chain of links.
Missing semantics in Linked Data – data structure • Metadata is much more complicated than a group of simple triples like this: id:123 IsSomethingToDoWith id:456 • It needs a richer structure to make good sense of it • In other systems we use “objects”, “tables” or “schemas” to give us structure, so how do you find some common structure in a sea of linked triples (beyond the logical semantics of RDF/OWL)?
Missing semantics in Linked Data – intended meaning • The key to meaning in linked data lies in the intended meanings of the relators and allowed values, for example: id:123 HasFormat Film • Meaning is granular and contextual. • People will never all use the same metadata standards – in fact we are becoming more, not less, diverse in choice of schemes and vocabularies. • Lack of structure + multiplicity of meaning = semantic challenge (in a single enterprise or in a community).
Towards joined-up semantics – two use cases • Use cases show: • (1) meanings mapped through an ontology (vmf) • (2) data transformed into a common semantic structure (coati) • Together they demonstrate: • A rich semantic model for linked data, with a small number of core elements (structure) and an ontology (meaning). • A way to automatically transform data from any existing schema or model (triples or not) in or out of this common triple structure, which can then be used for any purpose. • The same model (COA) is used in both. • Both are in development – coati being implemented with the first client, vmf successfully completed “proof of concept” in Dec 2009.
Use case 1: the Vocabulary Mapping Framework Background: In 2005 BL and EDItEUR supported the “RDA/ONIX Framework for Resource Categorization”, a joint initiative across those two library and publishing metadata standards. It was published in 2006 and successfully used as a basis for developing RDA categories. A more ambitious follow-up was envisaged. This emerged as the Vocabulary Mapping Framework (vmf) in 2009, backed by representatives of many major content metadata schemes. The VMF is contained in the VMF matrix, an RDF/OWL ontology.
VMF goal • Automatically compute the “best fit” mappings between any two pre-defined vocabularies. [Diagram: vocab 1, vocab 2, vocab 3 and vocab 4, with vmf]
VMF to date • Created the matrix and mapped initial vocabularies • SPARQL queries to generate scheme-to-scheme mappings (successful proof of concept, Dec 2009) • Next task: establish VMF as an ongoing resource (working with International DOI Foundation and others) • VMF has an Advisory Board including representatives from RDA, DC, ONIX, DDEX, FRBR, MARC, DOI.
Initial schemes, partially mapped • RDA (libraries) • ONIX (book/serials publishing) • DDEX (recorded music) • Dublin Core (web metadata) • FRBR (libraries) • LOM SCORM (education) • MARC21 (libraries) • DOI (any content) • CIDOC CRM (museums and archives) • MPEG21 RDD (digital rights) • RDA ONIX Framework
Matrix stats. Approximately: • 10 schemes • 53 vocabularies mapped in whole or part • 1,000+ concept families • 20,000+ unique terms • 100,000+ RDF triples. This is quite a large ontology, because VMF automatically generates most of its own terms.
How does VMF work? • Terms are mapped into an ontology (the VMF matrix) built up from “families” of concepts based on verbs. • An ontology is a structured data dictionary, where carefully defined concepts are linked by logical relationships that allow meaning to pass from one to another within a computer system. An ontology is a way of making terms behave themselves. • The matrix can be queried to get the “best fit” match from one term or vocabulary to another. For example…
A Concept Family • A Concept Family starts with a verb • Context: Create (or Creating Event)
A Concept Family • Context: Create (or Creating Event) • Create: to Make something (as a human being) • The definition of the verb provides the core meaning of the concept
A Concept Family • Context: Create (or Creating Event) • Parent: Make (to bring something into existence) • Create: to Make something (as a human being) • Children: Conceive, Originate, Derive, Create Work, Create Perceivable Resource, Create with Tool, Create with Material, Direct, Contribute etc…
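One plausible reading of “best fit” over this parent/child hierarchy is a fallback to the nearest ancestor that the target vocabulary does cover. The sketch below works under that assumption only; it is not a description of how the VMF matrix is actually queried, and the `scheme_x` terms are hypothetical.

```python
# Part of the hierarchy from the slide: child -> parent.
PARENT = {
    "Create": "Make",
    "Conceive": "Create",
    "Originate": "Create",
    "Derive": "Create",
}

def best_fit(concept, target_terms, parent=PARENT):
    """Walk up from `concept` until a concept with a target term is found.

    Returns (term, steps): steps is 0 for an exact match and grows as the
    match becomes broader; term is None if no ancestor is covered.
    """
    current, steps = concept, 0
    while current is not None:
        if current in target_terms:
            return target_terms[current], steps
        current = parent.get(current)
        steps += 1
    return None, steps

# Hypothetical target vocabulary that only has a term for the broad concept.
scheme_x = {"Make": "scheme_x:make"}

print(best_fit("Conceive", scheme_x))
print(best_fit("Create", {"Create": "scheme_x:creator"}))
```

Returning the number of steps alongside the term keeps the loss of precision visible, so a broader match is not mistaken for an exact one.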
A Concept Family • A Concept Family provides a complete set of terms that describe a type of Event or State (“Context”), always based on a verb. • Context: Create • Agent: Creator • Resource: Creation • Relators: Creator_Creation, Creation_Creator; Create_Creation, Creation_Create; Create_Creator, Creator_Create; Creator_Creator; Creation_Creation • All relationships (and so most meanings) are based on Events, so this is a good place to start. • Every term in the VMF matrix is a member of a Concept Family.
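The regular shape of a Concept Family suggests how terms could be generated from a verb. The following is a sketch under an assumption: the naming rule (every ordered pair of the three core terms except Context with itself) is inferred from the relators listed on the slide, and the `concept_family` helper is hypothetical, not VMF's own generation procedure.

```python
from itertools import product

def concept_family(context, agent, resource):
    """Build the core terms and relator names for one verb-based family."""
    roles = {"Context": context, "Agent": agent, "Resource": resource}
    terms = [context, agent, resource]
    # All ordered pairs of core terms, except Context_Context,
    # which does not appear among the relators on the slide.
    relators = [
        f"{a}_{b}"
        for a, b in product(terms, repeat=2)
        if not (a == context and b == context)
    ]
    return roles, relators

roles, relators = concept_family("Create", "Creator", "Creation")
print(roles)
print(relators)
```

Three core terms yield eight relators, which is consistent with the deck's remark that the matrix is large because most of its terms are generated automatically.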
A Concept Family (Context: Create; Agent: Creator; Resource: Creation) • How do these terms relate to the vocabularies we are mapping? • Terms from the mapped schemes shown against the Create family: • onix:CodeList17 Createdby • marc21:Relationship Creator • lom:role_lifecycle author • Dc:dc15 Creator • crm:property was created by • crm:property has created • crm:class Man-made object • frbr: Endeavour • rdd:verbs Make • ddex: (nothing) • rda: (nothing) • frad: (nothing)
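Once scheme terms are attached to a shared family, a cross-scheme lookup becomes a matter of going through that family. The sketch below uses a plain dictionary as a stand-in for the RDF/OWL matrix and its SPARQL queries. It records only family membership, because the slide does not show which family term each scheme term maps to, so it returns candidates and not a computed “best fit”. The `candidates` helper is hypothetical.

```python
# Scheme terms the slide shows against the "Create" concept family.
# Only family membership is recorded here; the exact matrix term
# each one maps to is not given on the slide.
CREATE_FAMILY = {
    "onix": ["CodeList17 Createdby"],
    "marc21": ["Relationship Creator"],
    "lom": ["role_lifecycle author"],
    "dc": ["dc15 Creator"],
    "crm": ["property was created by", "property has created",
            "class Man-made object"],
    "frbr": ["Endeavour"],
    "rdd": ["verbs Make"],
    "ddex": [],
    "rda": [],
    "frad": [],
}

def candidates(family, source_scheme, target_scheme):
    """Return target-scheme terms in the same family as the source scheme.

    Returns [] when either scheme has no term in this family.
    """
    if not family.get(source_scheme):
        return []
    return list(family.get(target_scheme, []))

print(candidates(CREATE_FAMILY, "onix", "crm"))
print(candidates(CREATE_FAMILY, "onix", "ddex"))
```

The empty result for ddex mirrors the “(nothing)” entries on the slide: a family can expose gaps between schemes as well as matches.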