Metadata projects and tasks at Statistics Finland

Learn about the evolution of metadata systems at Statistics Finland, including the CoSSI model, XML databases, and the Variable Editor project.

Presentation Transcript


  1. Metadata projects and tasks at Statistics Finland
     METIS 2010
     Saija Ylönen, saija.ylonen@stat.fi

  2. Organizational chart

  3. Co-operating parties in the metadata tasks: organizational units
     • IT Management
        – situated in the Secretariat of the Director General
        – co-ordinates the general information architecture, of which metadata tasks form one element
     • Classification and Metadata Services
        – situated in the IT and Statistical Methods department
        – operational unit
        – active role in developing metadata
     • Dissemination Services
        – situated in the IT and Statistical Methods department
        – develops the metadata connected with dissemination

  4. Metadata Co-ordination Group
     • Originally a co-operation group for people working on metadata issues in the support-function departments of Statistics Finland (SF)
     • Its present objective is to intensify co-operation between the statistics departments and the parties responsible for general metadata work
     • Comprises members working on metadata and permanent members from every statistics department
     • The goal is to widen knowledge of metadata and metadata systems and to give the statistics departments an opportunity to discuss their metadata needs with metadata specialists

  5. CoSSI Steering Group and CoSSI model
     • Foundation of the metadata system
     • Modular, XML-based model for describing statistical tables, classifications, concepts, variables, general information on statistical documents, quality, etc.
     • Expandable
     • The CoSSI Steering Group is in charge of managing and developing the model according to user needs without putting its main structure at risk

  6. Definition of metadata
     1) Statistical metadata
        • variable and data descriptions
        • classifications, concepts
     2) Statistical data quality
        • quality reports
        • descriptions of statistical methods
     3) Metadata of statistical documents or products
        • producers
        • publication information
        • field or subject area

  7. Definition of metadata II
     4) Process metadata
        a) Technical metadata
           • guides the workflow of data production, makes it possible to follow data production, and documents the working process
        b) Conceptual process metadata
           • technical information on the data and variables used in producing data, e.g. minimum or maximum values, various calculation rules, or the use of certain classification values
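
As an illustration of conceptual process metadata, the minimal Python sketch below shows how stored minimum/maximum values and permitted classification values could drive record-level checks. The `VariableRule` structure and `check_record` function are hypothetical and not part of the CoSSI model or any Statistics Finland system; calculation rules are left out.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class VariableRule:
    """Hypothetical conceptual process metadata for one variable."""
    name: str
    min_value: Optional[float] = None
    max_value: Optional[float] = None
    allowed_codes: Optional[Set[str]] = None  # valid values from a classification


def check_record(record: dict, rules: List[VariableRule]) -> List[str]:
    """Return the problems found when checking one record against the rules."""
    problems = []
    for rule in rules:
        value = record.get(rule.name)
        if value is None:
            problems.append(f"{rule.name}: missing")
            continue
        if rule.allowed_codes is not None:
            # Classified variable: the value must be one of the classification codes.
            if value not in rule.allowed_codes:
                problems.append(f"{rule.name}: code {value!r} not in classification")
            continue
        # Numeric variable: check the stored range.
        if rule.min_value is not None and value < rule.min_value:
            problems.append(f"{rule.name}: {value} below minimum {rule.min_value}")
        if rule.max_value is not None and value > rule.max_value:
            problems.append(f"{rule.name}: {value} above maximum {rule.max_value}")
    return problems


rules = [
    VariableRule("age", min_value=0, max_value=120),
    VariableRule("sex", allowed_codes={"1", "2"}),
]
print(check_record({"age": 130, "sex": "3"}, rules))
```

Keeping the rules as data rather than code is the point of the sketch: the same checking routine can serve any statistic whose variables are described in the metadata store.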

  8. Metadata systems at Statistics Finland

  9. Metadata systems: present situation
     • We are in a transitional phase from relational databases to an XML-based environment
     • Relational databases: classifications, concepts and definitions, archiving database
     • XML database eXist: publications, classifications, concepts, data descriptions

  10. Relational databases
     • Built in the 1990s
     • Used in statistics production, but not in all statistical processes or for all statistics
     • Classifications in the relational databases are used in SAS and Superstar
     • The archiving database is used in the archiving process
     • Classifications and concepts are generated from the relational databases for the web pages

  11. XML database
     • At the moment, the XML database is used mostly for creating publications with the Arbortext editor
     • Classifications and concepts are copied to the XML database from the relational databases and are ready for use
     • Tools for using metadata objects from the XML database are under construction
     • The first metadata tool linked to the XML database is the variable editor
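
Slides 10 and 11 describe classifications being copied from the relational databases into the XML database. The following is a minimal sketch of such a copy, assuming a hypothetical flat table of `(code, parent_code, label_fi, label_sv)` rows. An in-memory SQLite database stands in for Sybase, the codes are illustrative, and the XML element names are not the CoSSI schema.

```python
import sqlite3
import xml.etree.ElementTree as ET

# In-memory SQLite table standing in for the relational classification
# database. The table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE classification_item ("
    "code TEXT PRIMARY KEY, parent_code TEXT, label_fi TEXT, label_sv TEXT)"
)
conn.executemany(
    "INSERT INTO classification_item VALUES (?, ?, ?, ?)",
    [
        ("01", None, "Uusimaa", "Nyland"),         # top-level item (region)
        ("091", "01", "Helsinki", "Helsingfors"),  # child items (municipalities)
        ("049", "01", "Espoo", "Esbo"),
    ],
)


def classification_to_xml(conn, class_id):
    """Copy a flat parent/child code table into a nested XML tree."""
    rows = conn.execute(
        "SELECT code, parent_code, label_fi, label_sv "
        "FROM classification_item ORDER BY code"
    ).fetchall()
    root = ET.Element("classification", id=class_id)
    items = {}
    # First pass: one <item> per row, with one <label> per language.
    for code, parent, label_fi, label_sv in rows:
        item = ET.Element("item", code=code)
        for lang, text in (("fi", label_fi), ("sv", label_sv)):
            ET.SubElement(item, "label", lang=lang).text = text
        items[code] = (item, parent)
    # Second pass: attach each item under its parent, top-level items under the root.
    for item, parent in items.values():
        (items[parent][0] if parent else root).append(item)
    return root


root = classification_to_xml(conn, "region-example")
print(ET.tostring(root, encoding="unicode"))
```

The two passes mean that a child row does not have to appear after its parent in the query result.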

  12. Variable editor
     • For creating and maintaining descriptions of statistical data and variables
     • In the testing phase
     • Implementation begins in 2010
     • Descriptions are saved in the eXist XML database as XML documents conforming to the CoSSI model

  13. Content and functions of the variable editor
     • A data description comprises a general description of the data, a list of variables, and information about each individual variable
     • The general data description contains descriptive information on the entire data document
     • The variable list tab is used to manage the list of variables in the data description and to select the variable whose description is to be edited
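
The three-part structure of a data description (general description, variable list, individual variables) could be modelled roughly as follows. This is a sketch only: the class names, the `select` method and the XML element names are hypothetical and do not reproduce the actual CoSSI schema or the variable editor's implementation.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Variable:
    name: str
    label: str
    classification: Optional[str] = None  # reference to a classification, if any


@dataclass
class DataDescription:
    title: str
    description: str  # general description of the entire data
    variables: List[Variable] = field(default_factory=list)

    def add_variable(self, var: Variable) -> None:
        """Add a variable to the variable list, rejecting duplicate names."""
        if any(v.name == var.name for v in self.variables):
            raise ValueError(f"duplicate variable: {var.name}")
        self.variables.append(var)

    def select(self, name: str) -> Variable:
        """Pick the variable whose description is to be edited."""
        for v in self.variables:
            if v.name == name:
                return v
        raise KeyError(name)

    def to_xml(self) -> str:
        """Serialise to XML. Element names are illustrative, not the CoSSI schema."""
        root = ET.Element("dataDescription")
        general = ET.SubElement(root, "general")
        ET.SubElement(general, "title").text = self.title
        ET.SubElement(general, "description").text = self.description
        var_list = ET.SubElement(root, "variableList")
        for v in self.variables:
            el = ET.SubElement(var_list, "variable", name=v.name)
            ET.SubElement(el, "label").text = v.label
            if v.classification is not None:
                ET.SubElement(el, "classificationRef").text = v.classification
        return ET.tostring(root, encoding="unicode")


dd = DataDescription("Example survey", "Hypothetical data used for illustration")
dd.add_variable(Variable("age", "Age in years"))
dd.add_variable(Variable("region", "Region of residence", classification="region-example"))
dd.select("age").label = "Age at the end of the year"
print(dd.to_xml())
```

Storing a reference to a classification, rather than a copy of its codes, is what would let a variable description link to the centrally maintained classification.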

  14. Variable list tab

  15. Variable metadata

  16. Results of the variable editor project
     In addition to the variable editor application itself, the project also created the preconditions for:
     • the development of a consistent information architecture
     • the construction of production applications in which metadata need not be produced separately or added manually to data when statistics are published or archived
     • an information service in which excessive time need not be spent searching for metadata or reproducing metadata for special compilation assignments
     • a system from which tabulation applications can retrieve table column and row headings in multiple languages, using the same methods for all statistics
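
The last precondition, retrieving multilingual table headings from a central system in the same way for all statistics, can be sketched as a lookup keyed by metadata object and language. The label store, the identifiers such as `var:year`, and the Finnish fallback are assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical central label store: (metadata object id, language) -> label.
LABELS = {
    ("var:year", "fi"): "Vuosi",
    ("var:year", "sv"): "År",
    ("var:year", "en"): "Year",
    ("var:population", "fi"): "Väkiluku",
    ("var:population", "sv"): "Folkmängd",
    ("var:population", "en"): "Population",
    ("region:01", "fi"): "Uusimaa",
    ("region:01", "sv"): "Nyland",
    # No English label stored for region:01: lookups fall back to Finnish.
}


def heading(object_id: str, lang: str, fallback: str = "fi") -> str:
    """Return the label in the requested language, else in the fallback language."""
    for key in ((object_id, lang), (object_id, fallback)):
        if key in LABELS:
            return LABELS[key]
    raise KeyError(f"no label for {object_id!r}")


def table_headings(
    row_ids: List[str], column_ids: List[str], lang: str
) -> Tuple[List[str], List[str]]:
    """Resolve row and column headings for any table in the same way."""
    return [heading(r, lang) for r in row_ids], [heading(c, lang) for c in column_ids]


print(table_headings(["region:01"], ["var:year", "var:population"], "sv"))
```

Because every tabulation application calls the same lookup, a label corrected once in the central store would appear corrected in all tables.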

  17. Experiences gained during the variable editor project
     • Various standardisation questions had to be addressed in the project although they were not originally within its scope – they had to be dealt with, and they took a lot of time
     • Because the variable editor project was the first leg in the revision of the metadata system, it was subjected to a diversity of expectations
     • The project was a good test run for the CoSSI model – the data content of the model proved to be exhaustive

  18. The planning and building of a classification editor
     • Reasons for renewing the classification system:
        – the statistics departments have viewed the present way of maintaining classifications as inflexible
        – the Sybase relational databases are being abandoned
        – ICT strategy: in the next few years the agency will introduce a common statistical metadata system based on the CoSSI model
     • Classification editor project 2010
        1) definition stage
        2) construction stage

  19. Goals of the classification editor project
     • Analyse the service needs required of a centralised classification system
     • Create maintenance tools for classifications in connection with the CoSSI/eXist metadata store, so that the basic classification maintenance needs of individual statistics are met in a user-oriented manner that also allows further development of the classification system
     • Produce solutions that ensure interoperability between the Sybase classification database and the eXist metadata database
     • Compile user instructions for the editor
     • Pilot test the editor

  20. Benefits of the new classification system
     • A classification system that serves its users well will encourage centralised and structured maintenance of classifications
     • The documentation of classifications will improve, making them easy to find for in-house use and for information services
     • The new classification system will support smooth movement between data descriptions, variable descriptions and classification maintenance, and thus make the maintenance and use of classifications in statistics more efficient

  21. General benefits of the common classification system
     • A centralised classification system eases the maintenance workload because each classification is maintained in only one place
     • Reduces the possibility of errors, because classifications are documented consistently in the system, accessible to everybody and easy to find
     • Improves the efficiency of time use, because working hours need not be spent looking for classifications and their background information
     • Makes the classifications used in different statistics visible to everybody, which creates opportunities to harmonise them
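
The final benefit, visibility of which classifications each statistic uses, can be sketched as an index over usage records. The statistic names, classification ids and versions below are hypothetical; in a real system the records might be derived from the data descriptions in the central metadata store.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical usage records: (statistic, classification id, classification version).
# The classification itself would be stored once, in the central system.
USAGE: List[Tuple[str, str, str]] = [
    ("Statistic A", "industry", "2008"),
    ("Statistic B", "industry", "2002"),
    ("Statistic C", "industry", "2008"),
    ("Statistic D", "region", "2010"),
]


def usage_index(usage):
    """Map classification id -> version -> set of statistics using it."""
    index = defaultdict(lambda: defaultdict(set))
    for statistic, class_id, version in usage:
        index[class_id][version].add(statistic)
    return index


def harmonisation_candidates(usage) -> Dict[str, Dict[str, List[str]]]:
    """Classifications that different statistics use in more than one version."""
    return {
        class_id: {version: sorted(stats) for version, stats in versions.items()}
        for class_id, versions in usage_index(usage).items()
        if len(versions) > 1
    }


print(harmonisation_candidates(USAGE))
```

Here only `industry` is reported, because two statistics use one version and a third uses another; that mismatch is the kind of case harmonisation work would look at.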

  22. In conclusion: why do some statistics departments still have their own metadata systems instead of using the centralised system?
     • Centralised metadata work progresses too slowly from the perspective of individual statistics
        – We should rethink our construction and implementation strategy
     • A common attitude still regards the process of an individual set of statistics as unique, and therefore unable to exploit systems meant for all statistics
        – We have to get quick results to prove the benefits of the system
     • Commitment and support from management are crucial to the work
        – We have to convince them

  23. Thank you for your attention!
