CONCEPTUAL MODELLING OF STATISTICAL METADATA AND METADATA DATA MODEL IN CoSSI
Heikki Rouhuvirta, Statistical Methodology R&D
heikki.rouhuvirta@stat.fi
Geneva, 3-4 April 2006
Points of departure in conceptualising statistical information
• Nature of statistical data
  • statistical data are fully defined as they are created
  • being fully defined, statistical information describes itself (provided information is not lost at some stage of the process)
  => the point of departure is that statistical data are defined and describe themselves exhaustively
• Relationship with reality
  • we are not modelling reality or its processes, nor the process of statistics production, but statistical information
  • therefore, we need tools and methods suited to information analysis and modelling
  => the target of the modelling is not the real world but statistical information
Modelling of statistical information
• The basic problem: how to process, manage and present statistical information as a single entity, so that
  • the producer of statistics, while producing numerical statistical information, can check and verify its meaning and intended purpose of use
  • the user of statistics, while searching for statistical information and after receiving or seeing numerical statistical information, can check and verify its meaning and intended purpose of use
  => finding a solution to the problem has gained practical importance since the mid-1990s, as the Internet has made it easy to disseminate complete statistical data
• The solution: Common Structure of Statistical Information – CoSSI
  • the goal is the management of statistical information as an entity
  • the producers know what kind of data are being processed and analysed on any given occasion
  • the user can specify the data sought, knows what kind of data he or she is using, and can determine how to interpret or use them
  => the practical significance of the solution becomes concrete, for example, in that the data contents of statistical information can be fully and simultaneously included and exploited in searching for information on the Internet and in displaying search results
• CoSSI Definition Descriptions are available on the web at: http://www.stat.fi/cossi
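The idea of handling a figure and its meaning as one entity can be illustrated with a minimal Python sketch. It is not the CoSSI data model: the class, its field names, the sample definitions and the numbers are all hypothetical and chosen only to show how a search could run over the metadata rather than over the bare number.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StatisticalItem:
    """A figure bundled with the metadata needed to interpret it.

    All field names are hypothetical; they are not taken from the CoSSI DTDs.
    """
    value: float
    variable: str        # name of the variable the figure measures
    definition: str      # content-specific definition of the variable
    unit: str
    intended_use: str


def search(items, term):
    """Return items whose metadata text mentions the term (case-insensitive)."""
    needle = term.lower()
    return [
        item for item in items
        if needle in " ".join(
            (item.variable, item.definition, item.intended_use)
        ).lower()
    ]


items = [
    StatisticalItem(
        value=100.0,  # made-up number, for illustration only
        variable="disposable income",
        definition="Income remaining after taxes and transfers paid",
        unit="EUR per household",
        intended_use="comparison of household income levels",
    ),
    StatisticalItem(
        value=5.0,  # made-up number, for illustration only
        variable="household size",
        definition="Number of persons living in the household",
        unit="persons",
        intended_use="background classification",
    ),
]

# The search matches on the definition, so the figure is found by its meaning.
hits = search(items, "taxes")
```

Because the value never travels without its definition and intended use, both the producer and the user can check what a retrieved number means at the point where they see it.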
Common Structure of Statistical Information (CoSSI) – parts and entity
The typology of metadata in CoSSI
• (1) Statistical metadata that are content-specific and necessary for the interpretation of numerical statistical data.
• (2) Metadata relating to the identification and archiving of data files, which form document metadata.
• (3) Metadata concerning processing, of which some belong to statistical metadata as statistical and methodological process data and some belong to the process description as technical metadata required by the applications used.
• (4) Technical metadata concerning the process, which contain the technical data required by applications and the metadata used or created in the steering of the project.
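The four types above, and the way type (3) is divided between statistical metadata and the process description, can be sketched in Python as follows. The enum, the `methodological` flag and the group labels are assumptions made for this illustration, and filing type (4) under the process description is likewise an assumption; none of these are CoSSI definitions.

```python
from enum import Enum


class MetadataType(Enum):
    STATISTICAL = 1   # content-specific, needed to interpret the figures
    DOCUMENT = 2      # identification and archiving of data files
    PROCESSING = 3    # divided between statistical and technical metadata
    TECHNICAL = 4     # required by applications and project steering


def home_of(kind, methodological=False):
    """Return the group an item is filed under.

    The group labels and the `methodological` flag are assumptions made
    for this sketch, not CoSSI definitions.
    """
    if kind is MetadataType.STATISTICAL:
        return "statistical metadata"
    if kind is MetadataType.DOCUMENT:
        return "document metadata"
    if kind is MetadataType.PROCESSING:
        # Type (3): methodological process data count as statistical
        # metadata, the remainder as part of the process description.
        return "statistical metadata" if methodological else "process description"
    # Type (4): assumed here to belong to the process description.
    return "process description"
```

For example, `home_of(MetadataType.PROCESSING, methodological=True)` files an item under statistical metadata, while the same call without the flag files it under the process description.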
Statistical metadata – variable-centric concepts in CoSSI
Statistical Metadata – Logical Concept Model (I)
Statistical Metadata – Logical Concept Model (II)
Statistical Metadata – Logical Concept Model (III)
Metadata Modules in the CoSSI Model
• metadata on statistical information content (statmeta.dtd)
• quality evaluation (qualitydeclaration.dtd)
• file metadata (docmeta.dtd)
• metadata on inquiry (question.dtd)
• metadata on register information (e.g. Taxmeta.dtd)
• process metadata (e.g. procmeta.dtd)
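As a rough illustration of the modular structure, the Python sketch below maps each module listed above to its DTD file and builds a standard XML document type declaration that points at it. The DTD file names come from the list; the dictionary keys, the function name and any root element name passed in are hypothetical, since the slides do not give the root elements of the DTDs.

```python
# Module-to-DTD mapping, with file names as listed on the slide.
MODULE_DTDS = {
    "statistical information content": "statmeta.dtd",
    "quality evaluation": "qualitydeclaration.dtd",
    "file metadata": "docmeta.dtd",
    "inquiry": "question.dtd",
    "register information": "Taxmeta.dtd",  # given as an example on the slide
    "process metadata": "procmeta.dtd",     # given as an example on the slide
}


def doctype_for(module, root_element):
    """Build an XML DOCTYPE declaration referencing the module's DTD.

    `root_element` is supplied by the caller; the real root element
    names of the CoSSI DTDs are not stated in the slides.
    """
    dtd = MODULE_DTDS[module]
    return f'<!DOCTYPE {root_element} SYSTEM "{dtd}">'
```

Keeping one DTD per module means a document that carries only, say, file metadata needs to reference only the DTD for that module.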
… so what do the results look like for an individual item of statistical data?
Income distribution statistics – statistical metadata (I)
Income distribution statistics – statistical metadata (II)
SOME CONCLUSIONS
• Adequacy of the model
  • easily extendable
• Advantages
  • technological advantages (XML...)
  • production advantages (elimination of overlapping production of the same data)
  • standardisation of production and dissemination
• Present status
  • presentation on the subject is given below
  • PcAxis extensions
• Future
  • implementation of CoSSI definitions in various statistical software applications (SAS, SuperStar...)
  • elaboration of CoSSI version 2.0