Workshop on Metadata Standards and Best Practices
November 19-20th, 2007
Session 2
Metadata specifications for socio-economic science and supporting initiatives
Pascal Heus
Open Data Foundation
pheus@opendatafoundation.org
http://www.opendatafoundation.org
Outline
• Metadata specifications
• Key players
• Ongoing initiatives
• Conclusions / Q&A
Open Data Foundation – IZA 2007/11
What is Metadata?
[Figure: "Labeled stuff" vs. "Unlabeled stuff"]
The bean example is taken from: A Manager’s Introduction to Adobe eXtensible Metadata Platform, http://www.adobe.com/products/xmp/pdfs/whitepaper.pdf
• Common definition: Data about Data
What are XML specifications? (1)
• XML is a language that facilitates the capture of descriptive elements and attributes
• Different objects carry different characteristics (book, car, weather)
• We need to agree on a common set of descriptive elements (semantics)
• Just as we do when designing a database, we have to describe the structure
• This modeling process creates a Document Type Definition (DTD) or an XML Schema
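As a minimal sketch of the idea above, the Python snippet below captures the characteristics of one kind of object (a book) as XML elements and attributes, then checks it against an agreed set of elements. The element names and the `missing_elements` helper are assumptions made for illustration; a real specification would express these rules in a DTD or XML Schema rather than in a Python set.

```python
import xml.etree.ElementTree as ET

# Hypothetical agreed element set. A real specification would state this
# in a DTD or an XML Schema; a plain set is used here as a stand-in.
AGREED_ELEMENTS = {"title", "author", "year"}


def describe_book(title, author, year, language):
    """Capture a book's characteristics as XML elements and one attribute."""
    book = ET.Element("book", attrib={"language": language})
    for name, value in (("title", title), ("author", author), ("year", str(year))):
        ET.SubElement(book, name).text = value
    return book


def missing_elements(element):
    """Return the agreed elements that the description does not contain."""
    present = {child.tag for child in element}
    return sorted(AGREED_ELEMENTS - present)


book = describe_book("Survey Methods", "A. Author", 2007, "en")
xml_text = ET.tostring(book, encoding="unicode")
print(xml_text)
print(missing_elements(book))
```

A car or a weather observation would carry a different element set, which is why each community has to agree on its own.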
What are XML specifications? (2)
• Specifications are made available to the general public on the web
  • Usually a URL
• Can be turned into a “standard” (ISO)
• Typically maintained by a consortium of agencies
• Independent model
  • OASIS, W3C
  • ISO
A suggested set for socio-economic data
• Statistical Data and Metadata Exchange (SDMX)
  • Macrodata, time series, indicators, registries
  • http://www.sdmx.org
• Data Documentation Initiative (DDI)
  • Microdata (surveys, studies)
  • http://www.ddialliance.org
• ISO 11179
  • Semantic modeling, concepts, registries
  • http://metadata-standards.org/11179/
• ISO 19115
  • Geography
  • http://www.isotc211.org/
• Dublin Core
  • Resources (documentation, images, multimedia)
  • http://www.dublincore.org
Statistical Data and Metadata Exchange (SDMX)
• Purpose: Exchange of statistical information (time series/indicators)
• Covers metadata capture as well as the implementation of registries
• Currently at version 2.0 and also an ISO standard (17369:2005)
• Sponsors: Bank for International Settlements (BIS), European Central Bank (ECB), EUROSTAT, International Monetary Fund (IMF), Organisation for Economic Co-operation and Development (OECD), United Nations (UN), World Bank
• Can actually be used for many other purposes. It’s a metadata metadata model.
• http://www.sdmx.org
Data Documentation Initiative 1/2.x
• Purpose: Archive and document survey microdata
• Effort to establish an international XML-based standard for the content, presentation, transport, and preservation of documentation for datasets in the social and behavioral sciences
• Sections: document, survey, files, variables, other material
• Used by data archives (producers) and librarians
• Sponsors: DDI Alliance
• http://www.ddialliance.org
Data Documentation Initiative 3.0
• Purpose: Document the survey life cycle
• Major shift from DDI 1/2.x
• Currently a candidate recommendation, with release expected in 2008
• Sponsors: DDI Alliance
• http://www.ddialliance.org/ddi3
DDI & SDMX
• Are complementary specifications
• DDI 3.0 and SDMX 2.0 have been designed to work with each other
• SDMX registries can wrap DDI documents
• Microdata: single point in time / geography, high level of detail (for statisticians, researchers)
• Macrodata: high-level indicators across time and geography (for economists, policy makers)
• Using DDI+SDMX allows linkages and drilling down from an indicator to its source
• See "DDI and SDMX: Complementary, Not Competing, Standards", A. Gregory, P. Heus, July 2007, available at http://www.opendatafoundation.org/?lvl1=resources&lvl2=papers
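The microdata/macrodata relationship above can be sketched in a few lines of Python. This is only an illustration of the drill-down idea, not DDI or SDMX themselves: the records, the `Indicator` class, the `aggregate` function and the study identifier `SURVEY-2007` are all invented for the example.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Indicator:
    """A macrodata point that keeps a link back to its microdata source."""
    name: str
    region: str
    year: int
    value: float
    source_study: str  # hypothetical identifier of the documented survey


def aggregate(records, study_id, year):
    """Turn respondent-level records into one indicator per region."""
    by_region = {}
    for record in records:
        by_region.setdefault(record["region"], []).append(record["income"])
    return [
        Indicator("mean_income", region, year, mean(values), study_id)
        for region, values in sorted(by_region.items())
    ]


# Made-up microdata: one row per respondent, single point in time.
MICRODATA = [
    {"region": "North", "income": 100.0},
    {"region": "North", "income": 300.0},
    {"region": "South", "income": 200.0},
]

indicators = aggregate(MICRODATA, study_id="SURVEY-2007", year=2007)
for indicator in indicators:
    print(indicator.region, indicator.value, "from", indicator.source_study)
```

Because each indicator carries `source_study`, a user looking at the aggregate can follow the reference back to the survey documentation it was computed from.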
ISO 11179
• Purpose: Manage registries / concepts
• International standard for representing an organization's metadata in a Metadata Registry (a central location in an organization where metadata definitions are stored and maintained in a controlled way)
• Compliance with this standard is important, and both DDI 3.0 and SDMX have mapping mechanisms
• Sponsors: ISO/IEC Joint Technical Committee on Metadata Standards
• http://metadata-standards.org/
ISO 19115
• Purpose: Capture geography
• It is a component of the ISO 191xx series of standards for geospatial metadata
• ISO 19115 defines how to describe geographical information and associated services, including contents, spatial and temporal extent, data quality, access and rights to use
• Compliance in DDI 3.0
• Sponsors: ISO/TC 211 Geographic information/Geomatics
• http://www.isotc211.org/
Dublin Core
• Purpose: Describe resources
• Standard for cross-domain information resource description
• Widely used to describe digital materials such as video, sound, image, text, and composite media
• Small core set of elements
• Used for survey documentation
• Sponsors: Dublin Core Metadata Initiative
• http://dublincore.org/
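To give a feel for how small the core element set is, here is a hedged Python sketch that describes this presentation with a few Dublin Core elements. The `dc` namespace URI and the element names `title`, `creator`, `date` and `type` come from the Dublin Core element set; the wrapping `record` element and the `dublin_core_record` helper are assumptions for the example.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def dublin_core_record(fields):
    """Build a record from a mapping of Dublin Core element name to value.

    The <record> wrapper is hypothetical; Dublin Core itself does not
    prescribe a container element.
    """
    record = ET.Element("record")
    for name, value in fields.items():
        ET.SubElement(record, f"{{{DC_NS}}}{name}").text = value
    return record


record = dublin_core_record({
    "title": "Metadata specifications for socio-economic science "
             "and supporting initiatives",
    "creator": "Pascal Heus",
    "date": "2007-11",
    "type": "Text",
})
record_xml = ET.tostring(record, encoding="unicode")
print(record_xml)
```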
Advantages of XML metadata
• Metadata is easy to transform
  • From one standard to another or into different formats
    • DDI to SDMX, Dublin Core, MARC
  • To other formats for presentation
    • HTML, PDF
• Metadata is easy to exchange
  • Web services (SOAP, REST, etc.)
• Metadata is searchable
  • XPath, XQuery
• All these are native features of XML
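The search and transformation points above can be sketched with Python's standard library. The codebook below is a made-up, simplified structure (not actual DDI markup), and `to_html` is a hypothetical helper. Note that `xml.etree.ElementTree` supports only a subset of XPath and no XSLT, so the HTML is built by hand here; a production pipeline would more likely use an XSLT stylesheet.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical, simplified codebook. Real DDI documents are richer.
CODEBOOK = """
<codebook>
  <variable name="age" type="numeric"><label>Age in years</label></variable>
  <variable name="sex" type="categorical"><label>Sex of respondent</label></variable>
  <variable name="income" type="numeric"><label>Monthly income</label></variable>
</codebook>
"""

root = ET.fromstring(CODEBOOK)

# Search: an XPath-style query for all numeric variables.
numeric = [v.get("name") for v in root.findall(".//variable[@type='numeric']")]
print(numeric)


def to_html(codebook):
    """Transform the codebook into an HTML table for presentation."""
    rows = "".join(
        f"<tr><td>{escape(v.get('name'))}</td>"
        f"<td>{escape(v.findtext('label'))}</td></tr>"
        for v in codebook.iter("variable")
    )
    return f"<table>{rows}</table>"


html_table = to_html(root)
print(html_table)
```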
DDI Alliance
• Membership-based organization
  • Agencies: ICPSR, World Bank, Open Data Foundation
  • National data archives: Danish, Finnish, Dutch, Norwegian, Swiss, UK
  • Germany: Centre for Survey Research and Methodology (ZUMA), German Socio-Economic Panel Study (SOEP), Zentralarchiv fuer Empirische Sozialforschung (University of Koeln)
  • Universities: Alberta, Berkeley, Guelph, Harvard/MIT, Minnesota, etc.
• Steering and Expert Committee
• Meets annually at IASSIST
• http://www.ddialliance.org
ICPSR
• The Inter-university Consortium for Political and Social Research
• The world's largest archive of digital social science data
• Acquires and preserves social science data
• Provides open and equitable access to these data
• Promotes effective data use
• Home of the DDI Alliance
• http://www.icpsr.umich.edu
International Household Survey Network
• Partnership of international organizations seeking to improve the availability, quality and use of survey data in developing countries
• United Kingdom Department for International Development (DfID), International Labour Organization (ILO), Partnership for Statistics in the 21st Century (PARIS21), United Nations Children's Fund (UNICEF), United Nations Statistics Division (UNSD), World Health Organization and the Health Metrics Network (WHO/HMN), World Bank
• Plays a major role in the adoption of DDI around the globe, active in many developing countries
• Developer of the Microdata Management Toolkit
• http://www.surveynetwork.org
Open Data Foundation
• US-based non-profit organization
• Promotes the adoption of global metadata standards and the development of open-source solutions that foster the use of statistical data
• Coordination of development efforts
• Board of directors, advisors and management group
• Open to individual membership; institutional association is through projects
• http://www.opendatafoundation.org
Metadata Technology
• UK-based private company
• Consulting services and development of tools based on open standards and open source
• Training services, registry services, metadata repositories, hosting
• Focus on SDMX, DDI and related standards
• http://www.metadatechnology.com
IASSIST
• International Association for Social Science Information Service & Technology
• IASSIST is an international organization of professionals working in and with information technology and data services to support research and teaching in the social sciences
• Individual-based membership
• Primary platform for the DDI community
• Annual conference
  • 2008: Stanford, CA; 2009: Tampere, Finland
  • DDI Alliance annual meeting
• http://www.iassistdata.org/
DDI Foundation Tools Program
• Initiative aiming at the development of a Foundation Framework and a Toolkit to support the implementation of DDI applications and utilities (open source)
• MOU established September 2007, 2-year program (renewable on an annual basis afterwards)
• Canada Research Data Centre Network, Danish Data Archive, DDI Alliance, GESIS-ZUMA, National Opinion Research Center (NORC), Open Data Foundation (ODaF), and the UK Data Archive (UKDA)
• Web site coming soon
UKDA Data Exchange Tools (DExT)
• Aims to develop, refine and test models for data exchange for both survey data and qualitative research data based on XML/RDF schema, and to develop tools for data import and export
• Researches the feasibility of developing automated conversion procedures for legacy formats
• ODaF currently involved in the data conversion tool and qualitative metadata (QuDExT)
• http://www.data-archive.ac.uk/dext/
NORC Data Enclave
• National Opinion Research Center
• Provides a secure environment within which authorized researchers can access sensitive microdata remotely from their offices or onsite
• Data from the National Institute of Standards and Technology’s (NIST) Technology Innovation Program (TIP), the Ewing Marion Kauffman Foundation, and the Economic Research Service at the US Department of Agriculture
• Possibly the first virtual data enclave
• http://dataenclave.norc.org
Canada RDC Project
• Consists of 14 Research Data Centres, 6 branch RDCs and the Federal Research Data Centre in Ottawa
• Data provided by Statistics Canada
• RDCs are now connected through a high-speed secure network
• Project to adopt a DDI 3.0 based metadata framework for survey documentation and research work, and to sponsor the development of tools
• ODaF providing technical assistance
• http://www.statcan.ca/english/rdc/index.htm
EU 7th Research Framework Program
• Under Socio-economic Sciences and Humanities – related specific 2007 objectives: to bring together existing research infrastructures to support the efficient provision of essential research services
  • INFRA-2008-1.1.2.27: promoting European-wide access to microdata sets of official statistics for research and leading to a European statistical system open to researchers
  • INFRA-2008-1.1.2.28 (through the development, harmonisation and optimal use of indicators and data for economic and innovation research)
  • INFRA-2008-1.1.2.29 (developing improved access to historical archives and cultural collections for research purposes)
• Call coming out this month (due mid-Feb)
• Proposal will be made for RDC networking/remote access, data disclosure and metadata (German contact is Stefan Bender at IAB Nurnberg RDC)
Conclusions
• Metadata specifications are available but need tools
• Lots of complementary ongoing initiatives and potential synergies
• Need coordination and partnerships (ODaF)