Data Documentation Initiative: A global phenomenon coming soon to ABS

Presentation Transcript


    1. Data Documentation Initiative: A global phenomenon – coming soon to ABS! Wendy Thomas Chair, DDI Technical Implementation Committee 1 September 2009

    2. Acknowledgments Slides provided for use by: Wendy Thomas Pascal Heus Mary Vardigan Peter Granda Nancy McGovern Jeremy Iverson Dan Smith

    3. What is Metadata? Common definition: Data about Data Provides descriptive information about an object or concept Properties, characteristics (in XML: elements and attributes) It does not alter the content or nature of the object It can be carried around without having to share the underlying object: catalogs, cars, libraries, etc. It is usually public domain (important for sensitive data)

    4. Managing data and metadata is challenging! Many actors & communities with different needs and perspectives Users: want open access to high quality and well documented data. Need discovery tools. Public sector, private sector, academics Producers: prepare the data and need to comply with privacy laws Data Archives: need to interface with both communities Policy Makers: need data to measure results and impact and to plan ahead Sponsors: want to support the most relevant data collection Public and Media: want access to simple, easy to understand statistics Solving Information management issues is what ICT & XML are for

    5. Summary of ABS Metadata Management Principles Life-cycle focus Data supported by accessible metadata Metadata available and useable in context of client’s need Registration authority for metadata element Clear identification, ownership, approval status of metadata elements Describe metadata flow Reuse metadata Capture at source Capture derivable metadata automatically Ensure cost/benefit of metadata Variations from standards tightly documented Make metadata active to the greatest possible extent

    6. NISO: A FRAMEWORK OF GUIDANCE FOR BUILDING GOOD DIGITAL COLLECTIONS Nice community document worth adopting

    7. Some major XML metadata specifications for data content management Statistical Data and Metadata Exchange (SDMX) Macrodata, time series, indicators, registries http://www.sdmx.org Data Documentation Initiative (DDI) Microdata (surveys, studies), aggregate, administrative data http://www.ddialliance.org ISO/IEC 11179 Semantic modeling, concepts, registries http://metadata-standards.org/11179/ ISO 19115 Geography http://www.isotc211.org/ Dublin Core General resources (documentation, images, multimedia) http://www.dublincore.org This is a set of specifications for socio-economic data When it comes to implementation, these are complemented with commonly used ICT specifications such as the XML family of recommendations, SOAP, OASIS WS-* security specifications, SVG, etc.

    8. Metadata provides support for: Survey and data collection preparation Data collection Data processing Analysis Data discovery and access Replication Repurposing (secondary data use or data products)

    9. Metadata Metadata is essential information for research and reuse of data The further data gets from its source, the greater the importance of the metadata Content is critical Structure is becoming increasingly important in a networked world

    10. Why Standards? Standards provide structure for: Accurate transfer of content between systems Increased automation of ingest, reducing costs Interoperability between systems and software Structural base for discovery and comparison

    11. Example: Dublin Core Print card catalogs Standalone databases WorldCat and Google Static stationary Proprietary structure Little cross-site searching Standardized content Cross-site searching

    12. Interacting Standards for Data Dublin Core ISO/IEC 11179 ISO 19115 – Geography Statistical Packages METS PREMIS SDMX DDI Citation structure Coverage Temporal Topical Spatial Location specific information

    13. Interacting Standards for Data Dublin Core ISO/IEC 11179 ISO 19115 – Geography Statistical Packages METS PREMIS SDMX DDI Structure and content of a data element as the building block of information Supports registry functions Provides Object Property Representation

    14. Interacting Standards for Data Dublin Core ISO/IEC 11179 ISO 19115 – Geography Statistical Packages METS PREMIS SDMX DDI i.e., ANZLIC and US FGDC Focus is on describing spatial objects and their attributes

    15. Interacting Standards for Data Dublin Core ISO/IEC 11179 ISO 19115 – Geography Statistical Packages METS PREMIS SDMX DDI Proprietary standards Content is generally limited to: Variable name Variable label Data type and structure Category labels Translation tools used to transport content

    16. Interacting Standards for Data Dublin Core ISO/IEC 11179 ISO 19115 – Geography Statistical Packages METS PREMIS SDMX DDI Digital Library Federation Consistent outer wrapper for digital objects of all type Contains a profile providing the structural information for the contained object

    17. Interacting Standards for Data Dublin Core ISO/IEC 11179 ISO 19115 – Geography Statistical Packages METS PREMIS SDMX DDI Preservation information for digital objects

    18. Interacting Standards for Data Dublin Core ISO/IEC 11179 ISO 19115 – Geography Statistical Packages METS PREMIS SDMX DDI Developed for statistical tables Supports well structured, well defined data, particularly time-series data Contains both metadata and data Supports transfer of data between systems

    19. Interacting Standards for Data Dublin Core ISO/IEC 11179 ISO 19115 – Geography Statistical Packages METS PREMIS SDMX DDI Version 3.0 covers life-cycle of data and metadata Data collection Processing Management Reuse or repurposing Support for registries Grouping & Comparison

    20. Metadata Coverage Dublin Core ISO/IEC 11179 ISO 19115 Statistical Packages METS PREMIS SDMX DDI [Packaging] Citation Geographic Coverage Temporal Coverage Topical Coverage Structure information Physical storage description Variable (name, label, categories, format) Source information Methodology Detailed description of data Processing Relationships Life-cycle events Management information

    21. The Data Documentation Initiative (DDI) International XML based specification Started in 1995, now driven by DDI Alliance (30+ members) Became XML specification in 2000 (v1.0) Current version is 2.1 with focus on archiving (codebook) New Version 3.0 (2008) Focus on entire survey “Life Cycle” Provide comprehensive metadata on the entire survey process and usage Aligned on other metadata standards (DC, MARC, ISO/IEC 11179, SDMX, …) Include machine actionable elements to facilitate processing, discovery and analysis

    22. Intent of DDI Design Facilitate point-of-origin capture of metadata Reuse of metadata to support: Consistency and accuracy of metadata content Provide internal and external implicit comparisons Support external registries of concepts, questions, variables, etc. Metadata driven processing Provide clear paths of interaction with other major standards

    23. Basic Structures DDI 3 uses a model similar to SDMX in terms of the following: Identifiable, Versionable, and Maintainable objects The use of multiple schemas to describe different process sub-sections in the life-cycle Use of schemes to facilitate reuse of common materials

    24. DDI: Full content coverage for survey and administrative data Conceptual coverage Methodology Data Collection Processing – cleaning, paradata Recoding and derivations Variable and tabular content Internal relationships Physical storage Data management

    25. Plus: Relationships between studies Comparison by design Study series can inherit from earlier metadata Capture changes only Data integration Mapping of codes between source and target Capture comparison information Comparison of abstract content models Publication of reusable materials (code schemes, concept schemes, geographic structure, etc.)
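
The "mapping of codes between source and target" item above can be illustrated with a minimal Python sketch. This is not DDI syntax: the code values, the `SOURCE_TO_TARGET` table, and both function names are hypothetical, chosen only to show how a recorded mapping can drive data integration and expose codes that still need a harmonization decision.

```python
from typing import Dict, List, Optional

# Hypothetical mapping from one study's codes for "sex" to a target code list.
SOURCE_TO_TARGET: Dict[str, str] = {"1": "M", "2": "F", "9": "U"}


def harmonize(values: List[str], mapping: Dict[str, str]) -> List[Optional[str]]:
    """Translate source codes to target codes; unmapped codes become None."""
    return [mapping.get(value) for value in values]


def unmapped_codes(values: List[str], mapping: Dict[str, str]) -> List[str]:
    """List the distinct source codes that the mapping does not cover."""
    return sorted({value for value in values if value not in mapping})
```

Keeping the mapping as data rather than burying it in processing scripts is what lets it be documented, versioned, and reused as comparison information.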

    26. Current Areas of DDI Development Controlled vocabularies to improve machine actionability Data collection methodology and process expansion for more depth and detail Qualitative data Increased comparison coverage Tools

    27. DDI 3.0 Metadata Life Cycle Data and metadata creation is not a static process: It evolves dynamically over time and involves many agencies/individuals DDI 2.x is about archiving, DDI 3.0 focuses on the entire “life cycle” 3.0 emphasizes metadata reuse to minimize redundancy and discrepancies, support comparison, and drive the data and metadata creation process Supports multilingual, grouping, geography, and registries 3.0 is extensible

    28. When to capture metadata? Metadata must be captured at the time the event occurs! (not after the fact) Documenting after the fact leads to considerable loss of information This is true for producers and researchers The first figure outlines the various stages of a survey production process The graph on the right illustrates the amount of metadata that is typically recovered if the knowledge capture occurs after the fact (blue line) versus the amount of metadata actually generated throughout the process (red line)

    29. Reuse DDI is designed around schemes (lists of items) for commonly reused information within a study such as categories, code schemes, concepts, universe, etc. Items are “used” in multiple locations in a DDI document by referencing the item in the list Enter once, use in multiple locations Items can be versioned for management over time without having to change content in multiple locations
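
The "enter once, use in multiple locations" idea can be sketched in Python as follows. The classes `Category`, `Reference`, and `CategoryScheme` are hypothetical stand-ins, not the DDI schema: the sketch only assumes that each item carries an identifier and a version, and that users of the item hold a reference rather than a copy.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Category:
    """A reusable item, entered once in a scheme (hypothetical structure)."""
    item_id: str
    version: str
    label: str


@dataclass(frozen=True)
class Reference:
    """A pointer to a specific version of an item, held instead of a copy."""
    item_id: str
    version: str


class CategoryScheme:
    """A list of items that can be referenced from many locations."""

    def __init__(self) -> None:
        self._items: Dict[Tuple[str, str], Category] = {}

    def add(self, category: Category) -> None:
        # A new version is stored next to the old one; nothing is overwritten.
        self._items[(category.item_id, category.version)] = category

    def resolve(self, ref: Reference) -> Category:
        return self._items[(ref.item_id, ref.version)]
```

Because a reference names a version, publishing version 2.0 of an item leaves every location that points at version 1.0 unchanged until it is deliberately updated.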

    30. Comparison and Registries Information in DDI schemes can be published in external registries and used by multiple studies Provides implicit comparison both within a study and between studies Supports organizational consistency through the use of agreed content managed in registries Referencing structured lists provides further context to individual items used in a study

    31. Metadata driven processing Capturing metadata upstream can provide over 90% of the building blocks needed to generate the remainder of the metadata DDI supports embedding command code to run data processing events driving data capture, data processing during and after collection, and to support post-collection recoding, derivations, and harmonization maps
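
As a rough illustration of metadata driving a processing step, the Python sketch below checks a data record against variable descriptions. The `VARIABLES` structure and the `check_record` function are assumptions made for this example; they are not DDI elements or part of any DDI tool.

```python
from typing import Dict, List, Set

# Hypothetical variable-level metadata: valid and missing-value codes.
VARIABLES: Dict[str, Dict[str, Set[str]]] = {
    "sex": {"valid": {"1", "2"}, "missing": {"9"}},
    "employed": {"valid": {"0", "1"}, "missing": {"8", "9"}},
}


def check_record(record: Dict[str, str],
                 variables: Dict[str, Dict[str, Set[str]]]) -> List[str]:
    """Return a list of problems found by comparing a record to the metadata."""
    problems: List[str] = []
    for name, spec in variables.items():
        if name not in record:
            problems.append(f"{name}: not present")
            continue
        value = record[name]
        if value not in spec["valid"] and value not in spec["missing"]:
            problems.append(f"{name}: unexpected code {value!r}")
    return problems
```

The check contains no variable-specific logic: adding a variable or changing a code list is a metadata edit, not a code change.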

    32. Questions to Variables

    33. Working with other standards There is no single standard that does it all DDI was specifically designed to support easy interaction with: Dublin Core – mapping of citation elements and embedding native Dublin Core ISO/IEC 11179 – working with an editor of the standard to reflect data element model and ISO/IEC 11179-5 naming conventions for registry intended items

    34. Standards continued SDMX – DDI NCubes were revised to incorporate the ability to attach attributes to any area of a cube and map cleanly into and out of SDMX cubes. SDMX has added means of attaching fragments of DDI which provide source and processing information that can be indexed and delivered through SDMX tools. ISO 19115 (ANZLIC) – Geographic elements in DDI are structured to reflect basic discovery elements used by geographic search engines and provide the detailed geographic structure information needed by GIS systems to incorporate the data accurately

    35. DDI does not replace good content DDI structures metadata to leverage content Collection and processing Discovery and access Analysis and repurposing Registries Comparison DDI is not a software application Supports and informs software applications DDI is a neutral archival structure Preserving content and relationships

    36. Value Supports consistent use of concepts, questions, variables, etc. throughout the organization Supports implicit comparison through reuse of content Supports explicit comparison by mapping content between studies and to standard content Retention of explicit relationships between data collection and the resulting data files Early capture of a broad range of metadata at point of creation

    37. Value - continued Interoperability Flexibility in data storage Reuse of element structures Strong data typing Improved data mining between and across systems Improved access to detailed metadata

    38. DDI User Base Archives and data libraries worldwide Catalogs Data delivery Documentation delivery from data systems Research Institutes/Services Data Centers Documentation for data Data search and analysis systems Data management systems International Organizations and National Statistical Agencies Data collection and management

    39. Archives and Data Libraries (examples) Catalogs ICPSR Data Catalog and Social Science Variable Database CESSDA Data Portal The Dataverse Network (former Virtual Data Collection) Data delivery California Digital Library “Counting California” National Historical Geographic Information System Documentation delivery Survey Documentation and Analysis (SDA) Data Liberation Initiative Metadata Collection

    40. Research Institutes/Service Data Centers (examples) Documentation for data German Microcensus (GESIS) Institute for the Study of Labor (IZA) US General Social Survey (NORC) Data search and analysis systems Nesstar Canadian Research Data Centres (RDC’s) Data management systems Questionnaire Development Documentation System (University of Konstanz/GESIS)

    41. Current DDI Products at ICPSR Most existing products currently in DDI 2.1 with new additions moving to DDI 3 DDI-XML variable-level codebooks output as PDF files for downloading by users DDI-XML metadata records created initially by data depositors and edited by ICPSR staff to augment content and include additional fields Increasing use of DDI for special projects: Social Science Variables Database, various harmonization and data processing tasks

    42. Potential Use of DDI 3 at ICPSR Information collected from data producers in pre-collection phase – Concept Metadata output from CAI applications – Data Collection Processor’s dashboard – Metadata Processing Metadata mining: New faceted search tool to facilitate discovery through more precise searching – Data Discovery Relational database for comparison and harmonization across studies – Repurposing

    43. Potential Use of DDI 3 at ICPSR - 2 Use of DDI in combination with other metadata standards, e.g., Dublin Core, MARC, PREMIS Beginning of FEDORA “object-centered” implementation concepts into data processing and data preservation strategies Processor’s dashboard – Data Processing Relationships of study object to file object DDI 3 as “wrapper” for all ICPSR metadata?

    46. SSVD – The Public Search First batch of variable-level description files uploaded into SSVD: Approx. 3,500 DDI files (one file per dataset), representing Approx. 1,300 ICPSR studies (approx. 18.5 percent of total ICPSR holdings, excluding US Census; approx. 30 percent of holdings with data and setups) Over 1,000,000 individual variable descriptions; 23,000,000 categories

    47. SSVD – The Public Search New database finalized Fall 2008 Built to match DDI 3.0 data model Both DDI 2.x and DDI 3.0 compliant Designed to accept both DDI 2.x and 3.0 input and produce output in both versions ICPSR version currently uploads DDI 2.1 and generates DDI 3.0 individual variable descriptions. DDI 3 AS EXPORT FORMAT

    48. SSVD – The Public Search Moving forward… Transition to automated DDI upload DDI uploaded at the time of study publication First quality check performed by study processing staff Acceptable DDI immediately released for public view Problematic DDI suppressed from public view for further review, and upgrade as appropriate

    51. IPUMS at MPC Did not use DDI because DDI 2 cannot handle translation tables Currently in the process of mapping DDI 3 codebook output from IPUMS database Importing DDI 2 files from Microdata Toolkit into processing, validation, and harmonization system

    52. NHGIS Contains historical aggregate data from population, housing, agricultural, and economic censuses as well as BEA data from 1790 to 2000 Runs from DDI 2 nCube descriptions Searches variables, identifies related nCube tables, determines geographic availability Generates data subsets with geographic links to objects in NHGIS shape files, and shape files

    53. Future Plans Funding has been obtained to improve search and extraction system Current limitations of the system reflect limitations of DDI 2 Moving to DDI 3 will support: broader cross survey searching identification of common dimensions between NCubes over time support harmonization instructions as well as common transformations such as calculation of medians

    54. International Organizations and National Statistical Agencies International Household Survey Network (IHSN) Major international organizations involved Coordination of activities Adopted DDI 1/2.x as standard Developed the Microdata Management Toolkit and related tools / guidelines http://www.surveynetwork.org Accelerated Data Program (ADP) World Bank / Paris 21 Implement IHSN activities in developing countries Task 1. Documentation and dissemination of existing survey microdata. Has introduced DDI in national statistical agencies in over 50 countries http://www.surveynetwork.org/adp

    55. INDEPTH/DSS Example 38 Demographic Surveillance Sites in 19 countries spanning Africa, South Asia, Central America and Oceania Diverse yet similar health research portfolios Data management goals: Standardize and harmonize data collection tools Cross-site comparability of information Sharing data effectively and efficiently

    56. Reasons for choosing DDI “It will be ideal to describe our data for the purposes of the Data Repository” “It has really powerful features that will enable us to standardise several facets of our work.” “I originally underestimated the usefulness DDI will have as a means to harmonised data collection between sites.” Ability to expand comparison and harmonization with additional groups such as AIDS research team

    57. Statistical Agencies BLS considering publication of category and coding standards supported by BLS such as NAICS, SOC etc Statistics Canada considering publishing concept schemes in DDI 3 for use by the research community DDI is becoming more widely used for survey and census collection in developing countries (primarily Africa)

    58. MQDS Version 1 Extracted metadata from Blaise data model as XML tagged data Provided user interface for selection of Blaise files Instrument questions and sections Types of metadata to extract Languages to display Style sheet for generation of instrument documentation or codebook

    59. Using MQDS V1 XML: Codebook in Five Languages

    60. MQDS Version 1 Limitations XML not DDI-compliant DDI Version 2 did not have XML tags for all metadata provided by Blaise Did not provide easy means of adding XML tags without becoming noncompliant XML files for complex surveys can be very large (text files) Entire files had to be processed in computer memory Limited ability to fully automate documentation

    61. DDI Version 3 Included extensions proposed by DDI working group on instrument design

    62. MQDS Version 3 Joint SRC and ICPSR venture Goals: Address version 2 limitations Process Blaise instrument of any size Exploit new elements and validate to the recently released DDI version 3 standard Move from processing XML metadata in memory to streaming metadata to a relational database
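
The move from in-memory XML processing to streaming into a relational database can be sketched in Python with the standard library. This is not the MQDS implementation: the `<variable name="..." label="..."/>` element layout, the `variable` table, and the function name are all hypothetical, and SQLite stands in for whatever database MQDS actually uses.

```python
import sqlite3
import xml.etree.ElementTree as ET
from typing import BinaryIO


def stream_variables_to_db(xml_source: BinaryIO, conn: sqlite3.Connection) -> int:
    """Insert one row per <variable> element without building the full tree."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS variable (name TEXT PRIMARY KEY, label TEXT)"
    )
    count = 0
    # iterparse yields each element as soon as its closing tag has been read.
    for _event, elem in ET.iterparse(xml_source, events=("end",)):
        if elem.tag == "variable":
            conn.execute(
                "INSERT INTO variable (name, label) VALUES (?, ?)",
                (elem.get("name"), elem.get("label")),
            )
            count += 1
            elem.clear()  # drop the element's content once it is stored
    conn.commit()
    return count
```

Memory use then depends mostly on the largest single element rather than on the size of the whole file, which is the property that matters for very large instruments.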

    63. MQDS Version 3 Relational Database: Import, Export, Transform

    64. MQDS Version 3 Relational database DDI compliant standardized tables Flexibility for SRC and ICPSR to add extensions that meet their specific organizational needs Allows Automated documentation of any Blaise survey instrument Importing and documenting data produced by other software Lower cost development of other tools that facilitate editing and disseminating data

    67. Colectica Feature Overview

    68. Current Focus: Data Collection

    69. Survey Design: Diagram Visually design survey instruments Drag items from the toolbox

    70. Survey Design: Item Editor Edit item details using friendly input forms

    71. Multilingual Support All text fields can be represented in multiple languages

    72. Concept Repository Use built-in or custom concept banks to describe survey items Useful for comparability

    73. Question Repository Share questions across studies Drag previously-used questions or sequences onto new instruments

    74. Import Existing CAI Code Import from: Blaise® CASES CSPro Support for additional languages can be added

    75. Generate CAI Source Code Currently support CASES Blaise® and CSPro coming soon Support for additional CAI systems can be added

    76. Generate Publishable Documentation Generate codebooks and diagrams Output to HTML and PDF

    77. Also: Study Concept & Design Basic support for Study Concept & Design documentation

    78. Generate DDI 3.1

    79. Additional Information Beta available now Web: http://www.colectica.com/ Email: contact@algenta.com

    80. Thank you DDI Alliance http://www.ddialliance.org Wendy Thomas wlt@pop.umn.edu
