METADATA MANAGEMENT AT ISTAT: CONCEPTUAL FOUNDATIONS AND TOOLS. Istituto Nazionale di Statistica ITALY. Some remarks about metadata. SDOSIS and the OSI model. SDOSIS: the inquiry environment. Concepts about metadata.
METADATA MANAGEMENT AT ISTAT: CONCEPTUAL FOUNDATIONS AND TOOLS Istituto Nazionale di Statistica ITALY
Some remarks about metadata • SDOSIS and the OSI model • SDOSIS: the inquiry environment
Concepts about metadata
• Statistical Metadata: information required to retrieve, understand and properly use data for statistical analysis
• Data semantics + other information
• Statistical Metadata are context-dependent
• they support different statistical activities (data production, data dissemination…) but are required to be exchangeable
• they are exchanged among subjects performing different statistical activities (data producers, data users…)
Concepts about metadata
MODELS and TERMINOLOGIES for statistical metadata
• We have many MODELS for each relevant metadata class, in particular for specifying data semantics at a conceptual level
• We have TERMINOLOGIES which list statistical concepts (such as ‘Variable’, ‘Classification’) with their definitions. They are better described as META-TERMINOLOGIES. Such meta-terminologies are always based on an underlying model (or language), which should be made explicit
Concepts about metadata
MODELS and META-TERMINOLOGIES for statistical metadata
• The existing MODELS and META-TERMINOLOGIES are based on the implicit ONTOLOGIES of the various subjects that have defined them
• In order to allow such subjects to exchange metadata, A REFERENCE GENERAL ONTOLOGY SHOULD BE DEFINED
  • this is possible because all the statistical activities ultimately support the use of data for statistical analysis
• Metanet: a first important effort
Concepts about metadata
What is the relationship between MODELS and ONTOLOGIES?
• A MODEL is a set of related concepts used to produce a structured specification of some area of interest, in particular a metadata class (it is something similar to a language)
• MODELS and ONTOLOGIES are similar conceptual tools, but ONTOLOGIES generally result from an effort towards a more ‘general purpose’ specification of an area of interest (something similar to a specification of what we are speaking about)
Concepts about metadata
• In order to define such a REFERENCE GENERAL ONTOLOGY, we should move
  • FROM: Which are the relevant classes of statistical metadata for my activity (data production, data dissemination…)?
  • TO: Which classes of knowledge are basically required for understanding and properly using data for statistical analysis?
Concepts about metadata
Some remarks on STATISTICAL METADATA QUALITY
• DIMENSIONS of METADATA QUALITY:
  • CONCEPTUAL SOUNDNESS, in particular comparability and exchangeability
  • ACCURACY
• CONCEPTUAL SOUNDNESS is ensured by a reference general ontology
• ACCURACY: see Statistics Sweden’s paper!
The ISTAT strategy
• Two centralised systems for metadata management, SIDI and SDOSIS
• SIDI and SDOSIS are based on proper metadata models; SDOSIS is based on the OSI model
• They disseminate metadata to both data users and survey designers, and they work as metadata servers for data management systems and software tools
SDOSIS and the OSI model
• The first version of SDOSIS (2004) documents the information content of ISTAT surveys
  • The survey TERMINOLOGY
  • The survey INFORMATION FRAMES
  • the data semantics, specified by means of information frames, is defined in terms of observed real world objects, specified by the survey terminology
• The survey TERMINOLOGY and the survey INFORMATION FRAMES are specified according to the OSI model
• Future versions will document the information content of SIS and support the integration activities
SDOSIS and the OSI model
• The survey TERMINOLOGY is a network of connected terms, each one describing an observed real world object or a derived object
• Kinds of terms, according to the OSI model:
  • STATISTICAL UNIT, NUMERICAL VARIABLE, CLASSIFICATION VARIABLE, CLASSIFICATION, CLASSIFICATION SYSTEM, IDENTIFIER, IDENTIFIER_SET, ASSOCIATION
  • STATISTICAL TABLE
• Each term has a NAME and a DEFINITION involving other terms
• TERM BUILDING CONSTRUCTS are used for specifying the transformations by means of which new terms are derived
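The terminology described on this slide can be pictured as a small data structure. The following is only a minimal sketch, not the SDOSIS implementation: the class names `TermKind`, `Term` and `Terminology`, the `refers_to` field and the example terms ("Household", "Household income") are all assumptions introduced for illustration. Only the kinds of terms and the NAME/DEFINITION pair come from the slide.

```python
from dataclasses import dataclass, field
from enum import Enum


class TermKind(Enum):
    # Kinds of terms listed on the slide (OSI model).
    STATISTICAL_UNIT = "STATISTICAL UNIT"
    NUMERICAL_VARIABLE = "NUMERICAL VARIABLE"
    CLASSIFICATION_VARIABLE = "CLASSIFICATION VARIABLE"
    CLASSIFICATION = "CLASSIFICATION"
    CLASSIFICATION_SYSTEM = "CLASSIFICATION SYSTEM"
    IDENTIFIER = "IDENTIFIER"
    IDENTIFIER_SET = "IDENTIFIER_SET"
    ASSOCIATION = "ASSOCIATION"
    STATISTICAL_TABLE = "STATISTICAL TABLE"


@dataclass
class Term:
    name: str
    kind: TermKind
    definition: str
    # Hypothetical field: names of the other terms involved in the definition.
    refers_to: list[str] = field(default_factory=list)


class Terminology:
    """A survey terminology seen as a network of connected terms."""

    def __init__(self) -> None:
        self.terms: dict[str, Term] = {}

    def add(self, term: Term) -> None:
        self.terms[term.name] = term

    def connected(self, name: str) -> list[Term]:
        """Return the known terms directly involved in the definition of `name`."""
        return [self.terms[n] for n in self.terms[name].refers_to if n in self.terms]


# Illustrative terms only; they are not taken from any ISTAT survey.
terminology = Terminology()
terminology.add(Term("Household", TermKind.STATISTICAL_UNIT,
                     "A group of persons living together"))
terminology.add(Term("Household income", TermKind.NUMERICAL_VARIABLE,
                     "Total yearly income of a Household",
                     refers_to=["Household"]))

print([t.name for t in terminology.connected("Household income")])
```

Storing the links by name keeps the sketch short; a derived term would simply list the terms it was built from in `refers_to`.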
SDOSIS and the OSI model
• The survey INFORMATION FRAMES describe the issued data as structured collections of terms
• Kinds of information frames, according to the OSI model:
  • INDIVIDUAL DATUM
    • specifies collections of individual items (microdata)
  • SUMMARY DATUM
    • specifies totally aggregated data (macrodata) or semi-aggregated data
    • corresponds to a STATISTICAL TABLE
• Each information frame is referred to a TIMESET
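The two kinds of information frame can be sketched in the same spirit. This is an illustrative assumption about how such a record might look, not the OSI model's actual specification: `FrameKind`, `InformationFrame`, the string encoding of the TIMESET and the example values are hypothetical. The rule that only a SUMMARY DATUM corresponds to a STATISTICAL TABLE follows the slide.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class FrameKind(Enum):
    INDIVIDUAL_DATUM = "INDIVIDUAL DATUM"  # collections of individual items (microdata)
    SUMMARY_DATUM = "SUMMARY DATUM"        # aggregated or semi-aggregated data


@dataclass(frozen=True)
class InformationFrame:
    kind: FrameKind
    terms: tuple[str, ...]                    # names of terms from the survey terminology
    timeset: str                              # hypothetical encoding of the TIMESET
    statistical_table: Optional[str] = None   # name of the corresponding STATISTICAL TABLE term

    def __post_init__(self) -> None:
        # A SUMMARY DATUM corresponds to a STATISTICAL TABLE; an INDIVIDUAL DATUM does not.
        if self.kind is FrameKind.SUMMARY_DATUM and self.statistical_table is None:
            raise ValueError("a SUMMARY DATUM must name its STATISTICAL TABLE")
        if self.kind is FrameKind.INDIVIDUAL_DATUM and self.statistical_table is not None:
            raise ValueError("an INDIVIDUAL DATUM has no STATISTICAL TABLE")


# Illustrative frames only.
micro = InformationFrame(FrameKind.INDIVIDUAL_DATUM,
                         ("Household", "Household income"), timeset="2003")
macro = InformationFrame(FrameKind.SUMMARY_DATUM,
                         ("Region", "Household income"), timeset="2003",
                         statistical_table="Mean household income by region")
```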
Main features of SDOSIS
SDOSIS components:
• the Metadata specification environment
  • the survey manager specifies the survey terminology and the survey information frames
  • the survey manager may declare, for each survey term, a correspondence with a standard term
  • for this purpose, SDOSIS manages terminologies belonging to official standards as well as local area standards
• the Database
  • it is structured as a network of term repositories
• the Classifications repository
  • it stores the modalities of survey and standard classifications and their correspondences
• the Documents repository
• the Inquiry environment
The SDOSIS Inquiry environment
• Inquiry by name
  • the end user specifies the name of a survey, a standard, or a SIS
  • the end user views the terminology and information frames of the chosen survey (or standard, or SIS)
• Inquiry by terms
  • the end user specifies a term
  • when the term is a statistical unit, the end user may refine the choice by
    • selecting a subset of the specified term
    • selecting a set of connected terms (numerical and classification variables as well as classifications)
  • the end user views the surveys (or the standards or the SIS) which have the selected term or terms in their terminology
  • the end user chooses a survey (or a standard or a SIS)
  • the end user views the terminology and information frames of the chosen survey (or standard, or SIS)
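The core step of the inquiry by terms, finding the surveys, standards or SIS whose terminology contains the selected terms, can be sketched as a simple set-inclusion filter. This is an assumption-laden illustration: the function name `inquiry_by_terms`, the catalogue layout and its contents are hypothetical, and the sketch assumes that a source qualifies only when it contains all selected terms, which the slide does not state explicitly.

```python
def inquiry_by_terms(catalogue: dict[str, set[str]], selected: set[str]) -> list[str]:
    """Return the sources whose terminology contains every selected term.

    `catalogue` maps the name of a survey, standard or SIS to the set of
    term names in its terminology (hypothetical layout).
    """
    return sorted(name for name, terms in catalogue.items() if selected <= terms)


# Illustrative catalogue only; the names are not real ISTAT surveys or standards.
catalogue = {
    "Survey A": {"Household", "Household income", "Region"},
    "Survey B": {"Enterprise", "Turnover", "Region"},
    "Standard X": {"Household", "Region"},
}

# The user first specifies a statistical unit ...
print(inquiry_by_terms(catalogue, {"Household"}))
# ... and then refines the choice with a connected variable.
print(inquiry_by_terms(catalogue, {"Household", "Household income"}))
```

Refining the choice only ever shrinks the result list, which matches the narrowing behaviour described in the bullets above.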
The SDOSIS Inquiry environment
• the SDOSIS Inquiry environment exploits a distinguished network of thesauri
• such a network is built by the system manager, on the basis of the SDOSIS database, by means of proper functionalities which take into account:
  • the existence of alphabetical synonyms among terms
  • the declared correspondences between survey terms, as well as between survey terms and standard terms
• the SDOSIS Inquiry environment will be integrated inside the ISTAT system for data dissemination, so as to allow the end user to retrieve the data collections issued by the ISTAT surveys
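One way to picture how such a thesaurus could be derived is to group terms that are linked by either kind of evidence. The sketch below rests on several assumptions that are not in the source: "alphabetical synonyms" is read as names that are equal after case and whitespace normalisation, correspondences are treated as symmetric and transitive, and the grouping uses a union-find structure. The function names and example terms are hypothetical, and this is not the SDOSIS procedure.

```python
from collections import defaultdict

Term = tuple[str, str]  # (source name, term name); hypothetical identification of a term


def normalise(name: str) -> str:
    """Assumed notion of alphabetical synonymy: ignore case and extra spaces."""
    return " ".join(name.lower().split())


def build_thesaurus(terms: list[Term],
                    correspondences: list[tuple[Term, Term]]) -> list[set[Term]]:
    """Group terms linked by synonymy or by declared correspondences.

    Every term mentioned in `correspondences` must also appear in `terms`.
    """
    parent = {t: t for t in terms}

    def find(t: Term) -> Term:
        # Follow parent links to the group representative, compressing the path.
        while parent[t] != t:
            parent[t] = parent[parent[t]]
            t = parent[t]
        return t

    def union(a: Term, b: Term) -> None:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Link terms whose normalised names coincide.
    by_name: dict[str, list[Term]] = defaultdict(list)
    for t in terms:
        by_name[normalise(t[1])].append(t)
    for group in by_name.values():
        for other in group[1:]:
            union(group[0], other)

    # Link terms with a declared correspondence.
    for a, b in correspondences:
        union(a, b)

    groups: dict[Term, set[Term]] = defaultdict(set)
    for t in terms:
        groups[find(t)].add(t)
    return list(groups.values())


# Illustrative input only.
terms = [("Survey A", "Household"), ("Survey B", "household"),
         ("Standard X", "Private household"), ("Survey B", "Income")]
declared = [(("Survey A", "Household"), ("Standard X", "Private household"))]
thesaurus = build_thesaurus(terms, declared)
```

With this input, the three household terms end up in one group and "Income" stays on its own, so a query for any of the household terms could reach all three sources.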