The importance of Metadata Marta Melgar García mmelgar@ine.es
Presentation Index • Introduction • Statistical Metadata • Standards and Terminologies • Languages for Statistical Metadata • Statistical Metadata in Spain • Metadata in European Websites • References
Introduction Metadata Definition: In general: “data about data”. Functionally: “structured data about data”. Metadata includes data associated with either an information system or an information object for purposes of description, administration, legal requirements, technical functionality, use and usage, and preservation. Source: Dublin Core Metadata Initiative
Introduction Statistical Metadata is any information that is needed by people or systems to make proper and correct use of the real statistical data, in terms of capturing, reading, processing, interpreting, analysing and presenting the information (or any other use). In other words, statistical metadata is anything that might influence or control the way in which the core information is used by people or software.
Introduction Why are metadata important? • To get a complete picture of the subject matter. • To provide information that makes data understandable and shareable. • To be a repository of knowledge and expertise. • To structure the information and store expert knowledge from subject area specialists (which sometimes goes unstored). Source: WTO
Introduction Why are metadata important? • For assessing the quality and reliability of data. • To determine the effectiveness of any cross-country analysis. • To highlight differences between countries and deviations from international standards. • To help users select and interpret data. Source: WTO
Introduction What are the objectives of metadata? • Greater customer satisfaction. • Greater productivity. • Better public perception and cooperation.
Introduction Detailed list of metadata: • Definition • Description of dimensions • Coverage (geographical, reference period, exclusions) • Sources • Classification • Methodology (brief description) • Quality assessment
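The list above can be read as a record layout. A minimal Python sketch, assuming a hypothetical class name, hypothetical field names and invented example values (none of them come from the presentation):

```python
from dataclasses import dataclass


@dataclass
class DatasetMetadata:
    """One metadata record; fields mirror the slide's list (hypothetical layout)."""
    definition: str
    dimensions: dict[str, str]      # dimension name -> description
    coverage: dict[str, str]        # geographical, reference period, exclusions
    sources: list[str]
    classification: str
    methodology: str                # brief description
    quality_assessment: str = ""

    def missing_items(self) -> list[str]:
        """Return the names of the items that are still empty."""
        return [name for name, value in vars(self).items() if not value]


# Invented example values, for illustration only.
record = DatasetMetadata(
    definition="Monthly index of consumer prices",
    dimensions={"region": "Territorial unit", "month": "Reference month"},
    coverage={"geographical": "National", "reference_period": "Monthly"},
    sources=["Household price survey"],
    classification="COICOP",
    methodology="Chained Laspeyres index",
)
print(record.missing_items())
```

A completeness check of this kind is one way to act on the requirement that metadata be complete while staying short enough to keep updated.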
Introduction Problems related to Metadata • Knowledge of the main users is essential. • Metadata are effective when they meet the needs and expectations of users. • Elaborate and very detailed metadata are difficult to keep updated. It is important that the amount of information is kept to a minimum. • This requires judgement from the area specialist on what statistical and methodological aspects are important and which will have considerable impact on how data may be used.
Introduction Problems related to Metadata: • On the other hand, it is crucial that metadata are complete. • The effectiveness of metadata also depends on how easily the information can be obtained.
Statistical Metadata Purpose • Statistical metadata, or metadata for statistical data and processes, is used to enhance users’ search and understanding of statistical data, improve and automate survey processing within each office, and facilitate statistical data harmonization, among many other uses. • Many offices are using metadata-driven systems to automate parts of the survey process. Source: Statistics Canada
Statistical Metadata What is Statistical Metadata? Any information that is needed by people or systems to make proper and correct use of the real statistical data when: • Capturing • Reading • Processing • Presenting • Analysing • Interpreting • Exchanging • Searching • Browsing Source: Andrew Westlake
Statistical Metadata What does Statistical metadata include? • File description • Codebooks • Processing details • Sample designs • Fieldwork reports • Terminology
Statistical Metadata Statistical Metadata can be used: • informally by people who read it. • formally by software to guide the way information is processed.
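The formal use, where software reads the metadata to decide how to process the data, can be sketched as follows. The codebook layout, the variable names and the `process` function are assumptions made for illustration, not part of any system named in the presentation:

```python
# Hypothetical codebook: variable name -> type and allowed values.
CODEBOOK = {
    "sex": {"type": "coded", "codes": {"1": "Male", "2": "Female"}},
    "age": {"type": "integer", "min": 0, "max": 120},
}


def process(raw: dict[str, str], codebook: dict) -> dict:
    """Decode and validate one raw record using only the codebook."""
    result = {}
    for variable, rule in codebook.items():
        value = raw[variable]
        if rule["type"] == "coded":
            if value not in rule["codes"]:
                raise ValueError(f"{variable}: unknown code {value!r}")
            result[variable] = rule["codes"][value]
        elif rule["type"] == "integer":
            number = int(value)
            if not rule["min"] <= number <= rule["max"]:
                raise ValueError(f"{variable}: {number} out of range")
            result[variable] = number
    return result


print(process({"sex": "2", "age": "34"}, CODEBOOK))
```

Changing the codebook changes the processing without touching the code, which is the sense in which such a system is metadata-driven.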
Statistical Metadata What is Statistical Metadata important for? • Sharing data • Archiving (secondary users need good information) • Discovery (finding data that can help to solve a problem) • Automation (parametrisation of standardised processes) • Quality
Statistical Metadata • Metadata is not an absolute concept. • Data become metadata when they are put into a descriptive relationship with something else (Farance and Gillman, 2005).
Statistical Metadata What stage does the metadata apply to? • Design • Data collection • Data processing • Transformation and analysis • Dissemination • Exchange… Source: Andrew Westlake
Statistical Metadata [Diagram: statistical production process, archiving, secondary use of data]
Statistical Metadata A statistical metadata system is a data processing system that uses, stores and produces statistical metadata (UNECE 2000).
Statistical Metadata Quality and metadata: • Product quality for statistics is often described according to the Eurostat criteria (Eurostat 1998): • Relevance and completeness. • Accuracy. • Timeliness and punctuality. • Comparability and coherence. • Accessibility and clarity.
Statistical Metadata Systematic information about statistics, or statistical metadata, is necessary to: • Satisfy users’ needs. • Make statistics clear. • Improve accessibility. • Information about production processes is essential for users to understand the statistics.
Statistical Metadata Further developments: • Develop a system where metadata are directly linked with the data. • Also develop metadata by country or region, when required. • Dissemination of metadata: make the information available to users outside the division.
Standards & Terminologies • Dublin Core (DCMI) • SDMX • ISO 11179 • Neuchâtel Terminological Model
Standards (Dublin Core) What is the Dublin Core? • The Dublin Core metadata standard is a simple yet effective element set for describing a wide range of networked resources • The Dublin Core standard includes two levels: Simple and Qualified
Standards (Dublin Core) • The semantics of Dublin Core have been established by an international, cross-disciplinary group of professionals from librarianship, computer science… • Dublin Core has two classes of terms -- elements (nouns) and qualifiers (adjectives)
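As an illustration of the Simple level, the sketch below builds a record from four Dublin Core elements using Python's standard library. The wrapper element name `metadata` and the chosen values are assumptions; qualifiers are not shown:

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core element set.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# The wrapper element name is an assumption made for this sketch.
record = ET.Element("metadata")
for element, value in [
    ("title", "The importance of Metadata"),
    ("creator", "Marta Melgar García"),
    ("subject", "Statistical metadata"),
    ("language", "en"),
]:
    child = ET.SubElement(record, f"{{{DC_NS}}}{element}")
    child.text = value

# Serialise to a string; each element carries the dc: prefix.
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```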
Standards (Dublin Core) DCMI goals: • Simplicity of creation and maintenance • Commonly understood semantics • International scope • Extensibility
Standards (SDMX) • SDMX: Statistical Data and Metadata eXchange. • The name “Statistical Data and Metadata eXchange” refers to an international initiative aimed at developing and employing more efficient processes for exchange and sharing of statistical data and metadata among international organisations and their member countries. • SDMX is an initiative to foster standards for the exchange of statistical information. • The initiative, started in 2001, is sponsored by 7 international organisations: Bank for International Settlements (BIS), European Central Bank (ECB), Eurostat, International Monetary Fund (IMF), Organisation for Economic Co-operation and Development (OECD), United Nations (UN) and the World Bank (WB).
Standards (SDMX) • The SDMX metamodel is concerned with the structure of data and metadata and with the semantics required to understand the meaning of the data and metadata. • The SDMX message formats have two basic expressions: SDMX-ML (using XML syntax) and SDMX-EDI (using EDIFACT syntax and based on the GESMES/TS statistical message). • SDMX specifies registry interfaces based on the SDMX model. Source: http://www.sdmx.org
Standards (SDMX) What are the goals of SDMX? • Standardisation for statistical data and metadata access and exchange. • The objective is to establish a set of commonly recognised standards that give easy access to statistical data, wherever these data may be, and also to the metadata that make the data more meaningful and usable.
Standards (SDMX) What kinds of metadata can be exchanged with SDMX? SDMX metadata standards build on the distinction between “structural” and “reference” metadata: • Structural metadata are those metadata acting as identifiers and descriptors of the data, such as names of variables or dimensions of statistical cubes. Structural metadata must be associated with the data, otherwise it becomes impossible to identify, retrieve and browse the data. • Reference metadata are metadata that describe the contents and the quality of the statistical data (conceptual metadata, describing the concepts used and their practical implementation; methodological metadata, describing methods used for the generation of the data; and quality metadata, describing the different quality dimensions of the resulting statistics, e.g. timeliness, accuracy). Source: http://www.sdmx.org
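The distinction can be sketched in plain Python. This is not SDMX-ML and uses no SDMX library; the dimension names, texts and figures below are illustrative assumptions:

```python
# Structural metadata: identifies and describes the data (hypothetical cube layout).
structure = {
    "dimensions": ["FREQ", "REF_AREA", "TIME_PERIOD"],
    "measure": "OBS_VALUE",
}

# Reference metadata: describes content and quality (invented example text).
reference = {
    "concepts": "Resident population on 1 January",
    "methodology": "Register-based estimate",
    "quality": {"timeliness": "t + 6 months"},
}

# Observations are keyed by one value per dimension, in the order given above.
# The figures are invented.
observations = {
    ("A", "ES", "2005"): 100.0,
    ("A", "ES", "2006"): 101.5,
}


def select(obs: dict, structure: dict, **criteria: str) -> dict:
    """Retrieve observations by dimension name, which requires the structure."""
    positions = {name: i for i, name in enumerate(structure["dimensions"])}
    return {
        key: value
        for key, value in obs.items()
        if all(key[positions[name]] == wanted for name, wanted in criteria.items())
    }


print(select(observations, structure, TIME_PERIOD="2006"))
```

Without `structure` the keys are anonymous tuples and cannot be queried by name, whereas `reference` can be consulted separately to interpret the result.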
Standards (ISO 11179) ISO 11179 : INFORMATION TECHNOLOGY-METADATA REGISTRIES (MDR). • ISO 11179 has an explicit registry metamodel as part of its model. • Standardized data design procedures for supporting electronic data interchange. • It develops a set of principles, methods and procedures for specifying what is needed to document the association between the various types of administered items and one or more classification schemes. • It does not establish a particular classification scheme as preeminent.
Terminologies (Neuchâtel Terminological Model) • It defines the key concepts that are relevant for the structuring of metadata and provides the conceptual framework for the development of a database organising that metadata. • A Terminology lists statistical concepts. • A Model is a set of related concepts which is used for producing a structured specification of some area of interest.
Terminologies (Neuchâtel Terminological Model) • Purpose: to arrive at a common language and a common perception of the structure of classifications. • It is both a terminology and a conceptual model. • It has a two level structure: • First: level of the object types. • Second: the attributes associated with each object type.
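The two-level structure can be pictured as object types that each carry a list of attributes. In the sketch below the object type names and attributes are a simplified, illustrative subset and should be read as assumptions, not as the model's actual content:

```python
from dataclasses import dataclass, field


@dataclass
class ObjectType:
    """First level: an object type. Second level: its attributes."""
    name: str
    attributes: list[str] = field(default_factory=list)


# Illustrative subset only; names and attributes are simplified assumptions.
MODEL = [
    ObjectType("Classification", ["name", "description"]),
    ObjectType("Classification level", ["level_number", "level_name"]),
    ObjectType("Classification item", ["code", "official_title"]),
]


def attributes_of(model: list[ObjectType], name: str) -> list[str]:
    """Look up the attributes attached to one object type."""
    for object_type in model:
        if object_type.name == name:
            return object_type.attributes
    raise KeyError(name)


print(attributes_of(MODEL, "Classification item"))
```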
Languages for Statistical Metadata (XBRL) XBRL is a language for the electronic communication of business and financial data which is revolutionising business reporting around the world.
Languages for Statistical Metadata (XML) • SDMX makes use of the schema definition language known as W3C XML Schema (XSD). • The combination of statistical metadata and XML (Extensible Markup Language) leads to the creation of a framework for organizing and retrieving statistical information. • Statistical information takes heterogeneous forms, ranging from text and numbers to graphs, tables and even multimedia. This means different types of data.
Languages for Statistical Metadata (XML) • Such heterogeneity creates barriers to organising statistical data and making them accessible from a Web page. • An ideal solution for such heterogeneous data is to use an object-oriented database. • Another solution is to use statistical metadata and XML to construct a framework for organising and searching statistical data. Source: Bi and Murtagh
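A minimal sketch of the second solution, assuming a hypothetical XML catalogue whose element and attribute names (`catalogue`, `dataset`, `theme`, `form`) are invented for illustration. Heterogeneous items share one metadata layout and are searched with Python's standard library:

```python
import xml.etree.ElementTree as ET

# Hypothetical catalogue: items of different forms described in a uniform way.
CATALOGUE = """
<catalogue>
  <dataset id="cpi" theme="prices" form="table">
    <title>Consumer price index</title>
  </dataset>
  <dataset id="pop-map" theme="population" form="graph">
    <title>Population density map</title>
  </dataset>
  <dataset id="ppi" theme="prices" form="text">
    <title>Producer prices press note</title>
  </dataset>
</catalogue>
"""

root = ET.fromstring(CATALOGUE)


def find_by_theme(root: ET.Element, theme: str) -> list[str]:
    """Return the titles of all datasets tagged with the given theme."""
    matches = root.findall(f"./dataset[@theme='{theme}']")
    return [dataset.findtext("title") for dataset in matches]


print(find_by_theme(root, "prices"))
```

The search touches only the metadata attributes, so a table and a text note are found by the same query regardless of their form.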
Statistical metadata in Spain • We already have metadata in different fields (methodologies). • The objective is to build, in the medium term, a tool that facilitates the integration and co-ordination of all the information that INE requests from data providers. • Our aim is to produce more harmonised and more comparable information, and to give data users a tool covering every statistical operation performed by INE. Source: Blanco and Sánchez-Luengo
Statistical metadata in Spain Metadata: scope, source, frequency, IOE Code
Statistical metadata in Spain Survey Methodology
Statistical metadata in Spain Survey design
Metadata in European Websites: Eurostat Metadata icon
Metadata in European Websites: Romania Metadata icon
References • OECD, Metadata for short-term indicators: International comparisons and best practices, working paper. • OECD, The role of metadata in promoting international comparisons and adherence to international statistical standards, (http://www.oecd.org/std/metarole.htm) • Bureau of Census, United States, Transition plan for unified approach to metadata management at the bureau of the Census, working paper. • UN/ECE Secretariat, Standards for Statistical Metadata on Internet, working paper. • Statistics Canada, The evolution of metadata at Statistics Canada: an integrative approach, working paper. • Statistics New Zealand, examples of metadata in the Survey and Output Information Database and INFOS database at http://www.stats.govt.nz/statsweb.nsf. • Statistics Canada, examples of metadata in Information on Products and Services Catalogue at http://www.statcan.ca/english/search/ips.htm. • http://www.intracen.org/countries/metadata.htm