Developing Statistical Information Systems and XML Information Technologies: Possibilities and Practicable Solutions. Heikki Rouhuvirta, Statistical Methodology R&D, heikki.rouhuvirta@stat.fi. Geneva, 8-10 May 2007.
Approaches to Statistics Production
• Sources to statistics – Data Processing
• Sources to statistics – Statistical Methodology
• Statistics as Information
IT in Statistics Production
[Figure: processing chain from the data sources (registers, inquiries, other statistical data) through compilation/combining of data and logical verifications into the Datum statistical data set (Finnish: tilastoaineisto); dirty data processing into statistical concepts, imputation etc.; quality control and approval of data for the purpose of statistics compilation; protection of unit-level data; reporting, analyses and further processing; release.]
Methodological Processing of Statistical Data in Statistics Production
Statistical Information
Challenge:
• create solutions that unite the foregoing points of view
• the solutions offer the services that statistics production needs
• the solutions are easily recognisable by a user and
• offer an adequate informative basis for each individual task
• the solutions make the whole set of tasks manageable for the statistician
Key to the Solution:
• exploitation of XML technology
Basics of XML
XML Specification for Statistical Information: Common Structure of Statistical Information (CoSSI)
… the result from a statistics standpoint …
Statistics Production and Statistical Information
[Figure: the stages of processing (0. Defining, Collecting, Editing, Producing public statistics, Using) set against the model of data organisation. Basic format: datamatrix and description, modelled by the matrix model including statmeta (matrix module). Condensed format: table and description, modelled by the table model including statmeta (table module). Descriptions in different documents: statistical metadata model (statmeta module). Condensing leads from the basic to the condensed format; interpreting is also shown as a processing step.]
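Condensing turns basic-format data (a datamatrix plus its description) into condensed-format data (a table plus its description). A minimal sketch in Python, assuming a simple frequency count as the condensing operation; the variables, codes and the helper name `condense` are hypothetical, and real tabulation would also compute sums, means and quality indicators.

```python
from collections import Counter

# Hypothetical basic-format data: a datamatrix (one row per unit) and its description.
DATAMATRIX = [
    {"region": "North", "sex": "F"},
    {"region": "North", "sex": "M"},
    {"region": "North", "sex": "F"},
    {"region": "South", "sex": "M"},
]
DESCRIPTION = {
    "region": "Region of residence (hypothetical classification)",
    "sex": "Sex of the person (F = female, M = male)",
}


def condense(rows, description, row_var, col_var):
    """Condense a datamatrix into a two-way frequency table and carry over
    the descriptions of the two classifying variables."""
    counts = Counter((row[row_var], row[col_var]) for row in rows)
    row_values = sorted({row[row_var] for row in rows})
    col_values = sorted({row[col_var] for row in rows})
    # Counter returns 0 for combinations that never occur, so empty cells are kept.
    table = {
        rv: {cv: counts[(rv, cv)] for cv in col_values}
        for rv in row_values
    }
    table_description = {var: description[var] for var in (row_var, col_var)}
    return table, table_description


if __name__ == "__main__":
    table, table_description = condense(DATAMATRIX, DESCRIPTION, "region", "sex")
    for region, cells in table.items():
        print(region, cells)
```

The point of returning the description together with the table is that the condensed format, like the basic one, stays self-describing.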
… case studies of XML in statistics production …
XML Database and Statistical Information
Retrieval of Statistical Metadata for a Variable - Simple User Interface
Browsing Documents in the XML Database
Saving Documents to the XML Database
Event Log of the XML Database
(Finnish names in the paths: Tilastot = Statistics, Arbortext-koulutus = Arbortext training, Julkaisut = Publications, Julkaisu = Publication.)

    /db
    /system        admin  dba
    /config        admin  dba
    users.xml      admin  dba  rwurwu---
    /Tilastot      admin  dba
    /logs          admin  dba
    contents.xml   admin  dba  rwurwur--

    /db/logs/contents.xml
    ...
    <event timestamp="2007-03-02T10:57:47.941+02:00">
      <type>STORE</type>
      <path>/db/Tilastot/Arbortext-koulutus/Julkaisut/Julkaisu4.xml</path>
    </event>
    <event timestamp="2007-03-02T10:57:48.235+02:00">
      <type>STORE</type>
      <path>/db/Tilastot/Arbortext-koulutus/Julkaisut/Julkaisu4_001.gif</path>
    </event>
    <event timestamp="2007-03-02T10:57:48.898+02:00">
      <type>STORE</type>
      <path>/db/Tilastot/Arbortext-koulutus/Julkaisut/Julkaisu4_002.gif</path>
    </event>
    <event timestamp="2007-03-02T10:57:49.89+02:00">
      <type>STORE</type>
      <path>/db/Tilastot/Arbortext-koulutus/Julkaisut/Julkaisu4_002.png</path>
    </event>
    <event timestamp="2007-03-02T10:58:35.741+02:00">
      <type>STORE</type>
      <path>/db/Tilastot/Arbortext-koulutus/Julkaisut/Julkaisu4_eq_00.gif</path>
    </event>
    <event timestamp="2007-03-02T11:26:28.432+02:00">
      <type>UPDATE</type>
      <path>/db/Tilastot/Arbortext-koulutus/Julkaisut/Julkaisu1.xml</path>
    </event>
    </events>
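Because the event log is itself XML, standard tools can process it. A minimal sketch using Python's standard library, assuming the log's root element is `<events>` (the excerpt elides the opening tag but shows `</events>`); it reads each event and lists the XML documents that were stored or updated. The helper names `read_events` and `xml_documents_changed` are hypothetical.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# Abridged copy of the log excerpt, with the elided opening <events> tag restored.
LOG = """<events>
  <event timestamp="2007-03-02T10:57:47.941+02:00">
    <type>STORE</type>
    <path>/db/Tilastot/Arbortext-koulutus/Julkaisut/Julkaisu4.xml</path>
  </event>
  <event timestamp="2007-03-02T10:57:48.235+02:00">
    <type>STORE</type>
    <path>/db/Tilastot/Arbortext-koulutus/Julkaisut/Julkaisu4_001.gif</path>
  </event>
  <event timestamp="2007-03-02T11:26:28.432+02:00">
    <type>UPDATE</type>
    <path>/db/Tilastot/Arbortext-koulutus/Julkaisut/Julkaisu1.xml</path>
  </event>
</events>"""


def read_events(xml_text):
    """Return (timestamp, type, path) tuples in document order."""
    root = ET.fromstring(xml_text)
    events = []
    for event in root.iter("event"):
        # ISO 8601 timestamps with a UTC offset become timezone-aware datetimes.
        timestamp = datetime.fromisoformat(event.get("timestamp"))
        events.append((timestamp, event.findtext("type"), event.findtext("path")))
    return events


def xml_documents_changed(events, kinds=("STORE", "UPDATE")):
    """Paths of .xml resources stored or updated; images are skipped."""
    return [path for _, kind, path in events
            if kind in kinds and path.endswith(".xml")]


if __name__ == "__main__":
    for path in xml_documents_changed(read_events(LOG)):
        print(path)
```

The sketch keeps only timestamps with three fractional digits: before Python 3.11, `datetime.fromisoformat` rejects a two-digit fraction such as the `10:57:49.89` entry in the full log.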
Tabulation Application Architecture in SAS
Tabulation Wizard User Interface in SAS Enterprise Guide (EG)
SAS Data Editing Process
Logical Schema of an XML File: Statistical Data
Archiving and Backing Up to XML
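Archiving to XML means writing the data and their descriptions into one self-describing file. A minimal round-trip sketch with Python's standard library; the element names (`datamatrix`, `description`, `variable`, `data`, `unit`, `value`) and helper names are hypothetical and are not the CoSSI schema, which the slides do not reproduce.

```python
import xml.etree.ElementTree as ET


def matrix_to_xml(description, rows):
    """Serialise a datamatrix and its variable descriptions into one XML element.
    Element names are hypothetical, not CoSSI."""
    root = ET.Element("datamatrix")
    desc = ET.SubElement(root, "description")
    for name, text in description.items():
        ET.SubElement(desc, "variable", name=name).text = text
    data = ET.SubElement(root, "data")
    for row in rows:
        unit = ET.SubElement(data, "unit")
        for name in description:
            ET.SubElement(unit, "value", variable=name).text = str(row[name])
    return root


def xml_to_matrix(root):
    """Restore descriptions and rows; all values come back as strings."""
    description = {v.get("name"): v.text for v in root.find("description")}
    rows = [{value.get("variable"): value.text for value in unit}
            for unit in root.find("data")]
    return description, rows


if __name__ == "__main__":
    backup = matrix_to_xml(
        {"region": "Region of residence", "income": "Annual income, EUR"},
        [{"region": "North", "income": 2100}],
    )
    print(ET.tostring(backup, encoding="unicode"))
```

Values are restored as strings; an archive format intended for reuse would also record each variable's data type in the description.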
Example of XQuery/SQL
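A rough analogue of querying the same statistical content with an XML query language and with SQL. Python's standard library has no XQuery processor, so ElementTree's limited XPath subset stands in for XQuery here, and SQLite stands in for the relational side; the table layout, element and attribute names are hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical statistical table in XML; names are illustrative only.
TABLE_XML = """<table id="t1">
  <cell region="North" year="2006">120</cell>
  <cell region="North" year="2007">135</cell>
  <cell region="South" year="2006">98</cell>
  <cell region="South" year="2007">101</cell>
</table>"""


def query_xml(xml_text, region):
    """XPath-style selection; assumes `region` contains no quote characters."""
    root = ET.fromstring(xml_text)
    return {cell.get("year"): int(cell.text)
            for cell in root.findall(f"./cell[@region='{region}']")}


def query_sql(xml_text, region):
    """Load the same cells into an in-memory SQLite table and query with SQL."""
    root = ET.fromstring(xml_text)
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE cell (region TEXT, year TEXT, value INTEGER)")
    con.executemany(
        "INSERT INTO cell VALUES (?, ?, ?)",
        [(c.get("region"), c.get("year"), int(c.text)) for c in root.iter("cell")],
    )
    rows = con.execute(
        "SELECT year, value FROM cell WHERE region = ? ORDER BY year", (region,)
    ).fetchall()
    con.close()
    return dict(rows)


if __name__ == "__main__":
    print(query_xml(TABLE_XML, "North"))
    print(query_sql(TABLE_XML, "North"))
```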
Content of an XML File
Production and Dissemination of Tables in the Publishing Process
XML Publication Editor - User Interface
Retrieval of Statistical Information
… and statistical information in tables
Table 1. Statistical Metadata in an informative statistical table (I)
[Figure: a table with Variables 1 to 3, Class values 1 and 2 and Statistical figures 1 to 8, annotated with the metadata elements attached to its parts.]
• Statistical metadata: title, subtitle, footnote, metadata reference (quality declaration)
• Document metadata elements: subject, keywords, content description, date, identifier
• Statistical metadata elements: name, specification, concept definition, concept definition description, operational definition, operational definition description, calculation name, calculation formula, calculation description, measurement unit, measurement description
• Statistical metadata elements: note
• Register metadata elements: name, concept definition, formation instruction, law, interpretation of law, law cases, etc.
• Statistical metadata elements: code, name, description
• Document metadata elements: classification id, type, author, date
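The metadata elements listed for Table 1 map naturally onto a small object model in which each part of the table carries its own metadata. A hypothetical Python sketch using only a subset of the listed elements; the class, field and method names are illustrative and not taken from CoSSI.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class ClassValue:
    code: str
    name: str
    description: str = ""


@dataclass
class Variable:
    name: str
    concept_definition: str = ""
    measurement_unit: str = ""
    class_values: List[ClassValue] = field(default_factory=list)


@dataclass
class StatisticalFigure:
    value: float
    note: Optional[str] = None


@dataclass
class InformativeTable:
    title: str
    subtitle: str = ""
    footnote: str = ""
    quality_declaration: Optional[str] = None  # metadata reference
    variables: List[Variable] = field(default_factory=list)
    # Key: one class-value code per variable, in the order of `variables`.
    figures: Dict[Tuple[str, ...], StatisticalFigure] = field(default_factory=dict)

    def cell_context(self, key):
        """Name the variable and class value behind each code of a cell key."""
        labels = []
        for variable, code in zip(self.variables, key):
            match = next((cv for cv in variable.class_values if cv.code == code), None)
            if match is None:
                raise KeyError(f"{code!r} is not a class value of {variable.name}")
            labels.append(f"{variable.name}: {match.name}")
        return labels
```

With this structure a figure is never separated from the metadata that explains it: the table, each variable, each class value and each figure hold their own elements.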
Table 1. Statistical Metadata in an informative statistical table (II)
[Figure: the same table (Variables 1 to 3, Class values 1 and 2, Statistical figures 1 to 8) linked to a quality declaration, with two quality indicators: Coefficient of Variation, Value=0.92, and Coefficient of Variation, Value=0.87.]
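The quality indicator shown is the coefficient of variation (CV). The slides do not state which definition was used, so this sketch gives two common ones: the sample CV (sample standard deviation divided by the mean) and the CV of an estimate (its standard error divided by the estimate, also called the relative standard error). Function names are hypothetical.

```python
import statistics


def coefficient_of_variation(values):
    """Sample CV: sample standard deviation divided by the mean."""
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return statistics.stdev(values) / mean


def relative_standard_error(estimate, standard_error):
    """CV of an estimate: standard error divided by the estimate."""
    if estimate == 0:
        raise ValueError("CV is undefined for a zero estimate")
    return standard_error / estimate
```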
Table 1. Statistical Metadata in an informative statistical table (III)
[Figure: the same table, quality declaration and quality indicators (Coefficient of Variation, Value=0.92 and Value=0.87) as in part (II).]
Conclusions: XML-Based Service Environment in Statistics Production
• The statistics production solution briefly described above indicates the kinds of services that a statistical information system could provide in future, both for statisticians and for the users of statistical data. Its foundation (for statistics production) is an XML-based information architecture and standard applications that exploit it.
• Basing the implementation of the information architecture on XML allows the use of standard and standard-like specifications, but the special characteristics of statistical information should be taken into account when applying and implementing them. If, for instance, the possibilities of a semantic structural specification are not exploited in the structural analysis and in the final structure of statistical data, the solutions become complicated from the point of view of information management and ineffective in practice. From the perspective of application development, it seems especially important that the information architecture itself contains no application-specific data specifications, because we are unlikely to see a situation where a single monolithic application serves both statistics production and information service provision.
• A semantically relevant structure helps both the statistician and the user of statistics to check the correctness of contents.
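As an illustration of the last point: when class values are declared once and table cells refer to them by code, a program can check the contents mechanically. A minimal sketch with a hypothetical table layout (not CoSSI) that reports codes used in cells but never declared; the helper name `undeclared_codes` is also hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical table: class values are declared once, cells refer to them by code.
TABLE = """<table>
  <classification variable="region">
    <classvalue code="01">North</classvalue>
    <classvalue code="02">South</classvalue>
  </classification>
  <cell region="01">120</cell>
  <cell region="02">98</cell>
  <cell region="03">17</cell>
</table>"""


def undeclared_codes(xml_text):
    """List (variable, code) pairs used by cells but not declared in a classification."""
    root = ET.fromstring(xml_text)
    declared = {
        (cls.get("variable"), cv.get("code"))
        for cls in root.iter("classification")
        for cv in cls.iter("classvalue")
    }
    variables = sorted({var for var, _ in declared})
    problems = []
    for cell in root.iter("cell"):
        for var in variables:
            code = cell.get(var)
            if code is not None and (var, code) not in declared:
                problems.append((var, code))
    return problems


if __name__ == "__main__":
    print(undeclared_codes(TABLE))
```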
Thank you for your attention!