Explore the process of building a conceptual model for statistical metadata, starting from a glossary and progressing towards an ontology. Discover the importance of semantic relations, concept mapping, and the use of RDF and SKOS in this journey.
Metadata Common Vocabulary: a journey from a glossary to an ontology of statistical metadata, and back
Sérgio Bacelar (sergio.bacelar@ine.pt), Statistics Portugal
Joint UNECE/Eurostat/OECD Work Session on Statistical Metadata (METIS), Lisbon, 11–13 March 2009
Definitions
SDMX and the SDMX Content-Oriented Guidelines (COG)
Metadata Common Vocabulary (MCV): concepts and related definitions used in the structural and reference metadata of international organizations and national data-producing agencies.
Content-Oriented Guidelines = MCV + Cross-Domain Concepts (a subset of MCV) + Statistical Subject-Matter Domains.
Latest version (2009): 397 terms.
Goal: a uniform understanding of standard metadata concepts.
ESSnet on SDMX
• Objectives
  • Further development of SDMX
  • Further development and improvement of the SDMX Content-Oriented Guidelines
• Metadata Task Force on SDMX (Statistics Portugal)
  • WP proposal: MCV ontology
• Metadata Common Vocabulary (MCV)
  • Semantic univocity: designing a conceptual model of the domain
  • Detecting possible inconsistencies, redundancies or gaps in the glossary
  • Lack of structure: a flat list, with no hierarchical relations between terms
  • No semantic relations between terms
Conceptual system
Building a glossary usually presupposes a prior conceptual model of the domain.
• Proposal for a revision of the MCV:
  • start from the existing terms and definitions;
  • create semantic relations between terms, based on the definitions of the MCV terms (a bottom-up or middle-out strategy).
• Goal: reveal the latent conceptual system and detect possible structural inconsistencies or redundancies.
Conceptual system and concept map
• Main goals: find redundancies, inconsistencies, omissions, and terms that belong to domains other than statistical metadata (such terms appear because metadata is complex and interdisciplinary).
• Finding omitted terms that are important and relevant requires analyzing the definitions of the concepts.
• With this in mind, we built a concept map covering about 20% of the terms in the MCV (draft version).
• A concept map is a diagram showing the relationships among terms/concepts: concepts are connected by labeled arrows in a downward-branching hierarchical structure.
• Visualization: a graphical view is difficult because of the large number of terms and relations.
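Once a concept map is stored as data rather than only as a diagram, part of the omission check described above can be automated. A minimal Python sketch, assuming the map is a list of (term, relation label, term) edges and the glossary is a set of terms; the terms and edges below are hypothetical and are not taken from the MCV draft map:

```python
from typing import List, Set, Tuple

Edge = Tuple[str, str, str]  # (source term, relation label, target term)

# Hypothetical glossary and concept-map fragment, for illustration only.
GLOSSARY: Set[str] = {"data", "metadata", "reference metadata",
                      "structural metadata", "dissemination format"}
EDGES: List[Edge] = [
    ("data", "is-a", "information"),
    ("metadata", "describes", "data"),
    ("reference metadata", "is-a", "metadata"),
    ("structural metadata", "is-a", "metadata"),
]


def linked_terms(edges: List[Edge]) -> Set[str]:
    """All terms that appear at either end of some relation in the map."""
    return {s for s, _, _ in edges} | {t for _, _, t in edges}


def isolated_terms(glossary: Set[str], edges: List[Edge]) -> List[str]:
    """Glossary terms with no relation: candidates for review (wrong domain?)."""
    return sorted(glossary - linked_terms(edges))


def missing_terms(glossary: Set[str], edges: List[Edge]) -> List[str]:
    """Terms used by the map but absent from the glossary: possible omissions."""
    return sorted(linked_terms(edges) - glossary)


print(isolated_terms(GLOSSARY, EDGES))  # ['dissemination format']
print(missing_terms(GLOSSARY, EDGES))   # ['information']
```

The sketch only covers structural gaps; spotting redundant or inconsistent definitions still needs the manual analysis of definitions the slide describes.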
Using the Resource Description Framework (RDF)
RDF is a framework for representing information on the Web, and it is particularly concerned with meaning. An RDF graph is a set of triples, each consisting of a subject, a predicate and an object, e.g. “MetadataExchange is-a DataAndMetadataExchange”.
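To make the triple model concrete, here is a minimal Python sketch that stores a graph as a set of (subject, predicate, object) tuples and answers simple pattern queries. It is an illustration, not an RDF library: real RDF identifies subjects and predicates with IRIs (e.g. `http://www.semanticweb.org/ontologies/2008/8/MCV.owl#uses`), whereas bare names are used here for readability, and the `match` helper is hypothetical.

```python
from typing import List, Optional, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Toy graph built from the example above and the OWL fragment later in the talk.
graph: Set[Triple] = {
    ("MetadataExchange", "is-a", "DataAndMetadataExchange"),
    ("ComputerAssistedInterviewing", "is-a", "DataCollection"),
}


def match(g: Set[Triple], s: Optional[str] = None, p: Optional[str] = None,
          o: Optional[str] = None) -> List[Triple]:
    """Return the triples fitting a (subject, predicate, object) pattern.

    None acts as a wildcard, so match(g) returns every triple, sorted.
    """
    return sorted(
        t for t in g
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


print(match(graph, s="MetadataExchange"))
# [('MetadataExchange', 'is-a', 'DataAndMetadataExchange')]
```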
Middle-range solution
Using SKOS (Simple Knowledge Organization System), at the time being developed within the W3C.
SKOS is a bridging technology between “chaos” and the more rigorous logical formalism of ontology languages such as OWL. It is an application of RDF that provides a model for expressing the basic structure and content of concept schemes such as thesauri.
SKOS example: the concept “data”
    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:skos="http://www.w3.org/2004/02/skos/core#">
      <skos:Concept rdf:about="http://www.my.com/#data">
        <skos:definition>Characteristics or information, usually numerical, that are collected through observation</skos:definition>
        <skos:prefLabel>data</skos:prefLabel>
        <skos:altLabel></skos:altLabel>
        <skos:broader rdf:resource="http://www.my.com/#information"/>
        <skos:related rdf:resource="http://www.my.com/#Characteristic"/>
        <skos:scopeNote>Data is the physical representation of information in a manner suitable for communication, interpretation, or processing by human beings or by automatic means (Economic Commission for Europe of the United Nations (UNECE), "Terminology on Statistical Metadata", Conference of European Statisticians Statistical Standards and Studies, No. 53, Geneva, 2000).</skos:scopeNote>
      </skos:Concept>
    </rdf:RDF>
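Because SKOS is plain RDF/XML, the labels and semantic links of a concept can be read with ordinary XML tooling. A minimal sketch using Python's standard `xml.etree.ElementTree`, assuming the simple layout shown in the example (one `skos:Concept` element per concept, links given with `rdf:resource`); RDF/XML allows other equivalent serializations, so a dedicated RDF parser would be more robust in practice. The `concepts` helper is hypothetical.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SKOS = "http://www.w3.org/2004/02/skos/core#"

# Abridged copy of the SKOS example (definition and scope note omitted).
DOC = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:skos="http://www.w3.org/2004/02/skos/core#">
  <skos:Concept rdf:about="http://www.my.com/#data">
    <skos:prefLabel>data</skos:prefLabel>
    <skos:broader rdf:resource="http://www.my.com/#information"/>
    <skos:related rdf:resource="http://www.my.com/#Characteristic"/>
  </skos:Concept>
</rdf:RDF>"""


def concepts(xml_text: str) -> dict:
    """Map each skos:Concept URI to its prefLabel and broader/related URIs."""
    root = ET.fromstring(xml_text)
    result = {}
    for c in root.iter(f"{{{SKOS}}}Concept"):
        uri = c.get(f"{{{RDF}}}about")
        result[uri] = {
            "prefLabel": c.findtext(f"{{{SKOS}}}prefLabel"),
            "broader": [b.get(f"{{{RDF}}}resource")
                        for b in c.findall(f"{{{SKOS}}}broader")],
            "related": [r.get(f"{{{RDF}}}resource")
                        for r in c.findall(f"{{{SKOS}}}related")],
        }
    return result


info = concepts(DOC)["http://www.my.com/#data"]
print(info["prefLabel"], info["broader"])
# data ['http://www.my.com/#information']
```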
Ontologies
An ontology is an explicit, formal specification of the terms in a domain (here, statistical metadata) and of the relations among them. It is a model of reality, built through an iterative design process.
Tool: an ontology editing and modeling system such as Protégé (open-source software available at http://protege.stanford.edu).
Ontology reasoning
It is essential to provide tools and services (reasoners) that help users answer queries over ontologies, their classes and their instances, e.g.:
• find more general or more specific classes;
• retrieve the individuals matching a query, e.g. “Is there any survey with quarterly frequency that uses a classification system and is disseminated as an online database?”
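The example query only works if the system knows that, say, an activity classification is a kind of classification system. A minimal Python sketch of that idea, assuming a hand-written class hierarchy and a few individuals; every class, survey and individual name below is hypothetical, and the code performs only subclass inference, which is a small part of what an OWL reasoner does.

```python
# Hypothetical class hierarchy: subclass -> direct superclass.
SUBCLASS_OF = {
    "ActivityClassification": "ClassificationSystem",
    "OnlineDatabase": "DisseminationFormat",
}

# Hypothetical individuals and the class each one is asserted to belong to.
TYPES = {
    "NACE": "ActivityClassification",
    "StatDB": "OnlineDatabase",
    "PressRelease": "DisseminationFormat",
}

# Hypothetical surveys described by property values.
SURVEYS = {
    "LabourForceSurvey": {"frequency": "quarterly", "uses": ["NACE"],
                          "disseminationFormat": ["StatDB"]},
    "Census": {"frequency": "decennial", "uses": ["NACE"],
               "disseminationFormat": ["StatDB"]},
    "TourismSurvey": {"frequency": "quarterly", "uses": [],
                      "disseminationFormat": ["PressRelease"]},
}


def is_a(cls, target):
    """True if cls equals target or is a (transitive) subclass of it."""
    while cls is not None:
        if cls == target:
            return True
        cls = SUBCLASS_OF.get(cls)
    return False


def instance_of(individual, target):
    """Class membership, inferred through the subclass hierarchy."""
    return is_a(TYPES.get(individual), target)


def query():
    """Quarterly surveys using a classification system, disseminated online."""
    return sorted(
        name for name, s in SURVEYS.items()
        if s["frequency"] == "quarterly"
        and any(instance_of(u, "ClassificationSystem") for u in s["uses"])
        and any(instance_of(d, "OnlineDatabase")
                for d in s["disseminationFormat"])
    )


print(query())  # ['LabourForceSurvey']
```

NACE is never asserted to be a ClassificationSystem; it matches only because the hierarchy lets that membership be inferred.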
Ontologies - methodology
Developing an ontology:
1. defining classes;
2. arranging the classes in a taxonomic hierarchy (classes and subclasses);
3. defining slots (also called roles or properties);
4. describing the allowed values for these slots (facets, role restrictions);
5. filling in the slot values for instances (individuals).
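The five steps can be sketched as a small frame-based model in Python. This is an illustration of the methodology, not how Protégé stores ontologies: the `frequency` slot, its allowed values and the individual `LFS2009Collection` are hypothetical, while `DataCollection` and `ComputerAssistedInterviewing` come from the OWL fragment later in the talk.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class OntClass:
    name: str
    parent: Optional["OntClass"] = None  # step 2: taxonomic hierarchy


@dataclass
class Slot:
    name: str
    domain: OntClass                     # step 3: class the slot applies to
    allowed: Optional[Set[str]] = None   # step 4: facet on allowed values


@dataclass
class Individual:
    name: str
    cls: OntClass
    values: Dict[str, str] = field(default_factory=dict)


def is_subclass(c: Optional[OntClass], ancestor: OntClass) -> bool:
    """True if c is ancestor or one of its (transitive) subclasses."""
    while c is not None:
        if c is ancestor:
            return True
        c = c.parent
    return False


def set_value(ind: Individual, slot: Slot, value: str) -> None:
    """Step 5: fill a slot value after checking the domain and the facet."""
    if not is_subclass(ind.cls, slot.domain):
        raise ValueError(f"{slot.name} does not apply to {ind.cls.name}")
    if slot.allowed is not None and value not in slot.allowed:
        raise ValueError(f"{value!r} is not allowed for {slot.name}")
    ind.values[slot.name] = value


# Steps 1-2: classes and their hierarchy.
data_collection = OntClass("DataCollection")
cai = OntClass("ComputerAssistedInterviewing", parent=data_collection)
# Steps 3-4: a slot with a facet (hypothetical values).
frequency = Slot("frequency", domain=data_collection,
                 allowed={"monthly", "quarterly", "annual"})
# Step 5: an instance with a slot value; the slot is inherited from the superclass.
lfs_collection = Individual("LFS2009Collection", cai)
set_value(lfs_collection, frequency, "quarterly")
print(lfs_collection.values)  # {'frequency': 'quarterly'}
```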
Ontology - Classes
A first attempt at building an ontology of statistical metadata: the main classes were created from the MCV (following SDMX Content-Oriented Guidelines: Framework, draft, March 2006, p. 6):
1. General metadata (derived from ISO, UNECE and UN documents);
2. Metadata describing statistical methodologies;
3. Metadata describing quality assessment;
4. Terms referring to data and metadata exchange (SDMX information model, data structure definitions, etc.).
Classes and subclasses: Quality
[Figure: class and subclass hierarchy for Quality]
Properties (e.g. “according to Eurostat, Quality has a dimension called Relevance”)
[Figure: the class Relevance and the property linking it to Quality]
Codification - Web Ontology Language (OWL)
    ...
    <owl:Ontology rdf:about="">
      <rdfs:comment>Metadata Common Vocabulary (MCV) ontology.</rdfs:comment>
    </owl:Ontology>
    ...
    <!-- Object Properties -->
    <!-- http://www.semanticweb.org/ontologies/2008/8/MCV.owl#uses -->
    <owl:ObjectProperty rdf:about="#uses">
      <owl:inverseOf rdf:resource="#isUsedBy"/>
    </owl:ObjectProperty>
    ...
    <!-- Classes -->
    <!-- http://www.semanticweb.org/ontologies/2008/8/MCV.owl#ComputerAssistedInterviewing -->
    <owl:Class rdf:about="#ComputerAssistedInterviewing">
      <rdfs:subClassOf rdf:resource="#DataCollection"/>
    </owl:Class>
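The two OWL axioms in this fragment, `owl:inverseOf` and `rdfs:subClassOf`, let new facts be derived from asserted ones. A minimal forward-chaining sketch in Python, assuming triples are plain tuples and covering only these two rules (a real OWL reasoner handles far more); the individuals `Interview42` and `QuestionnaireQ1` are hypothetical.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]

# Axioms mirroring the OWL fragment: uses inverseOf isUsedBy;
# ComputerAssistedInterviewing subClassOf DataCollection.
INVERSE_OF = {"uses": "isUsedBy"}
SUBCLASS_OF = {"ComputerAssistedInterviewing": "DataCollection"}


def infer(triples: Set[Triple]) -> Set[Triple]:
    """Apply two rules until no new triple is derived:
    (a p b) and p inverseOf q  =>  (b q a)
    (x rdf:type C) and C subClassOf D  =>  (x rdf:type D)
    """
    inferred = set(triples)
    # inverseOf is symmetric, so index it in both directions.
    inverse = {**INVERSE_OF, **{v: k for k, v in INVERSE_OF.items()}}
    changed = True
    while changed:
        changed = False
        for s, p, o in list(inferred):
            new = set()
            if p in inverse:
                new.add((o, inverse[p], s))
            if p == "rdf:type" and o in SUBCLASS_OF:
                new.add((s, "rdf:type", SUBCLASS_OF[o]))
            if not new <= inferred:
                inferred |= new
                changed = True
    return inferred


facts = {
    ("Interview42", "rdf:type", "ComputerAssistedInterviewing"),
    ("Interview42", "uses", "QuestionnaireQ1"),
}
result = infer(facts)
print(("QuestionnaireQ1", "isUsedBy", "Interview42") in result)  # True
print(("Interview42", "rdf:type", "DataCollection") in result)   # True
```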
Conclusion
Because an ontology is a strict, rigorous and formal language for representing knowledge, mapping a glossary such as the Metadata Common Vocabulary into a statistical metadata ontology can help reduce possible inconsistencies, incompleteness and lack of structure. This may make it easier for SDMX users to harmonize the concepts that describe data (semantic univocity).