Making the Case for Metadata at SRS-NSF. Jeri Mulrow, Geetha Srinivasarao, and John Gawalt. FedCASIC Workshops, BLS, March 17, 2010. National Science Foundation, Division of Science Resources Statistics.
Making the Case for Metadata at SRS-NSF
Jeri Mulrow, Geetha Srinivasarao, and John Gawalt
FedCASIC Workshops, BLS, March 17, 2010
National Science Foundation, Division of Science Resources Statistics
www.nsf.gov/statistics/
1984
1,984
1
1 9
1 9 8
1 9 8 4
Today's Talk
• A bit about SRS
• Historical perspective of data and metadata dissemination
• Metadata users and their metadata needs
• Standardization efforts
• Challenges and future vision
A bit about the Division of Science Resources Statistics (SRS)
• Federal statistical agency within NSF
• 11 periodic data collections on the U.S. science and engineering enterprise
• Data dating back to the 1950s
Historical Perspective of SRS data and metadata dissemination
• 1950s to early 1990s: paper only
• Detailed statistical tables with minimum metadata as footnotes
• Publications included
  • Highlights about the survey
  • Scope and method of survey
  • Questionnaire
  • Cover letters
Example: 1950s publication
1990s through 2000s
• 1992: electronic format
• Detailed statistical tables in spreadsheets with minimum metadata as footnotes
• Kept paper, added electronic text
  • Survey methodology, limitations to the data, definitions, historical revisions, list of tables
• PDF added questionnaire, cover letters, instructions
Example: 1993 PDF
Example: 1991 electronic spreadsheet
Example: 1991 text
Today
• Source data tables in Excel with footnotes
• HTML / PDF
  • Highlights of the survey
  • Links to references
  • Survey description
• PDF
  • Survey questionnaire
  • Instructions
  • Definitions
Example: 2007 Excel spreadsheet
Example: 2007 SIRD1
Example: 2007 HTML
Example: 2007 PDF
BUT THAT'S NOT ALL
• Electronic databases
  • Create and download your own customized aggregate tables
• Public use files
  • Access to some microdata series
Metadata in WebCASPAR …
Metadata in WebCASPAR
• Variable-specific metadata available under Infolink
• Metadata is not tightly integrated with the data itself; it does not get downloaded with the data
WebCASPAR Taxonomy
• Survey-specific taxonomies
• NCES IPEDS Classification of Instructional Programs (CIP) codes
• Integrated taxonomy for querying across surveys
• http://webcaspar.nsf.gov/
Metadata in SESTAT
• Metadata Explorer is separate from the data
• Individual variable information
  • Description
  • Question
  • Domain/Availability (history)
  • Valid response categories
  • Keywords
• Metadata is not tightly integrated with the data itself; it does not get downloaded with the data
https://sestat.nsf.gov/sestat/sestat.html
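The gap noted for both WebCASPAR and SESTAT is that metadata does not travel with a download. One possible remedy is to ship a metadata sidecar with every extract. The sketch below is an illustration only, not how either system works: the variable name, the question wording, the response codes, and the `bundle_extract` helper are all hypothetical. The metadata fields follow the ones listed for the Metadata Explorer (description, question, valid response categories, keywords).

```python
import csv
import io
import json
import zipfile

# Hypothetical variable-level metadata; names, wording, and codes are
# invented for illustration and are not taken from SESTAT or WebCASPAR.
VARIABLE_METADATA = {
    "degree_field": {
        "description": "Field of highest degree",
        "question": "What was the field of your highest degree?",
        "valid_responses": {"1": "Science", "2": "Engineering"},
        "keywords": ["degree", "field"],
    },
}


def bundle_extract(rows, metadata):
    """Return the bytes of a ZIP holding data.csv and metadata.json.

    Only metadata for columns present in the extract is included, so the
    documentation always matches the data that was downloaded.
    """
    columns = list(rows[0].keys())

    # Write the data rows as CSV into an in-memory buffer.
    csv_buf = io.StringIO()
    writer = csv.DictWriter(csv_buf, fieldnames=columns)
    writer.writeheader()
    writer.writerows(rows)

    # Keep only the metadata entries that describe columns in this extract.
    meta = {c: metadata[c] for c in columns if c in metadata}

    # Package data and metadata together so they cannot be separated.
    zip_buf = io.BytesIO()
    with zipfile.ZipFile(zip_buf, "w") as zf:
        zf.writestr("data.csv", csv_buf.getvalue())
        zf.writestr("metadata.json", json.dumps(meta, indent=2))
    return zip_buf.getvalue()
```

Calling `bundle_extract([{"degree_field": "1", "year": "2006"}], VARIABLE_METADATA)` yields a single archive in which `metadata.json` documents `degree_field`; `year` has no entry because the hypothetical catalog does not describe it.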
Example: Public use file
Summary: Where are we?
• Different surveys have evolved differently
• Varying levels of detail/metadata
• Not in a standardized structure
• Hodge-podge
Metadata Users & Their Metadata Needs
• Not a one-to-one relationship, but many-to-many
• They occur at all stages of the survey process
Survey Process
Stages: Define Scope; Develop Survey Instrument; Develop Sample Design; Collect Data; Process Data; Disseminate Data
Steps: Define research objectives; Choose mode of collection; Choose sampling frame; Construct and pretest questionnaire; Design and select sample; Recruit and measure sample; Code and edit data; Make postsurvey adjustments; Perform analysis
Source: Survey Methodology (2009), Groves, Fowler, Couper, Lepkowski, Singer & Tourangeau.
Define Scope
Users: Data User; Survey Manager; Subject Matter Expert; Statistician; Survey Methodologist; Respondent
Metadata:
• General: Topic; Population of interest; Other data sources
• Specific: Frame options; Sample design options; Historical info/data; User needs; Federal Register notices
Develop Survey Instrument
Users: Data User; Survey Manager; Subject Matter Expert; Statistician; Survey Methodologist; Respondent
Metadata: Questions; Answer choices; Definition of terms; Instructions; Logic flow of questions; Cognitive work; Validity assessments; Reliability assessments; Functionality testing; Alternative questions; Instrument design specs (paper, web, CATI)
Develop Sample Design
Users: Data User; Survey Manager; Subject Matter Expert; Statistician
Metadata: Population of interest; Sampling frame / Universe specs; Update schedule; Sample design specs; Desired criteria; Sample selection techniques; Historical information on performance of designs; Estimation methods
Collect Data
Users: Data User; Survey Manager; Subject Matter Expert; Statistician; Database Administrators; Software Developers
Metadata: Variable names and formats; Variable data types; Physical storage; Tables and relationships; Mapping of questions to variables and definitions; Logic flow of questions; Response rates over time; Paradata; Cover letter
Process Data
Users: Data User; Survey Manager; Subject Matter Expert; Statistician; Database Administrators; Software Developers
Metadata: Item response rates; Zero vs. null vs. missing; Edit specifications; Imputation specifications; Recode specifications; Data table specifications; Changes across survey cycles
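The "zero vs. null vs. missing" item is easy to lose once data leave the processing system, because all three often end up as an empty cell or a 0. A minimal sketch of keeping them apart follows. The status labels, the reading of "null" as "item did not apply", and the `classify` helper are assumptions made for illustration; they are not SRS edit specifications.

```python
from enum import Enum
from typing import Optional


class ValueStatus(Enum):
    """Hypothetical status codes for a numeric survey item."""
    REPORTED = "reported"            # respondent gave a nonzero value
    REPORTED_ZERO = "reported zero"  # respondent explicitly answered 0
    NULL = "null"                    # assumed meaning: item did not apply
    MISSING = "missing"              # item applied but was left unanswered


def classify(raw: Optional[str], applicable: bool) -> ValueStatus:
    """Classify one raw response for a numeric item.

    `applicable` says whether the questionnaire's logic flow routed the
    respondent to this item. Assumes non-blank responses are numeric.
    """
    if not applicable:
        return ValueStatus.NULL
    if raw is None or raw.strip() == "":
        return ValueStatus.MISSING
    if float(raw) == 0:
        return ValueStatus.REPORTED_ZERO
    return ValueStatus.REPORTED
```

Storing the status next to the value lets a later imputation or tabulation step treat a true zero differently from an unanswered item, which is the distinction the slide calls out.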
Data Dissemination and Publication
Users: Data User; Survey Manager; Subject Matter Expert; Statistician; Database Administrators; Software Developers; Archivist
Metadata: History of changes; Methodology report; Public use files with documentation; Author/contact source; Who can access what; Type of product; Content format; URL; Keywords; Relationships; Metadata schema
Who are the Metadata Users?
• Data users
• Basic & advanced analysts
• General public
• Respondent
• Survey Manager
• Survey Methodologist
• Statistician
• Subject Matter Expert
• Software Developer
• Database Administrator
• Archivist
Need for Standardization of Metadata is Apparent, is Critical
Standardization Efforts
• Dublin Core
• SDMX (aggregate level)
• DDI 3.0 (record level)
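To give a feel for the lightest of the three standards, the sketch below builds a simple Dublin Core description of this presentation. The element names (`title`, `creator`, `date`, `publisher`) and the namespace URI come from the Dublin Core element set; the wrapping `metadata` element and the `dublin_core_record` helper are assumptions chosen for illustration, and SDMX and DDI 3.0 records are considerably richer than this.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def dublin_core_record(fields):
    """Serialize {element name: [values]} as a small Dublin Core XML record.

    The <metadata> wrapper is a hypothetical container, not part of the
    Dublin Core element set itself.
    """
    root = ET.Element("metadata")
    for name, values in fields.items():
        for value in values:
            # Repeatable elements (e.g. several creators) get one tag each.
            element = ET.SubElement(root, f"{{{DC_NS}}}{name}")
            element.text = value
    return ET.tostring(root, encoding="unicode")


record = dublin_core_record({
    "title": ["Making the Case for Metadata at SRS-NSF"],
    "creator": ["Jeri Mulrow", "Geetha Srinivasarao", "John Gawalt"],
    "date": ["2010-03-17"],
    "publisher": ["National Science Foundation, Division of Science Resources Statistics"],
})
print(record)
```

A record like this describes a product as a whole (who, what, when), which is why it complements rather than replaces the aggregate-level and record-level standards on the slide.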
Recent SRS Efforts
• Data Repository (Oracle)
• Inclusion of some metadata
• SAS/ACCESS user interface for internal users
• Evaluating external user interfaces
SRS Efforts: Working with Commercial Contractors
• Requirements for data / metadata delivery
• Examples document
• Standard contracting language
• Checklist
SRS Adopted Basic Operating Procedures
• Using Oracle to store microdata and metadata
• Collecting metadata in whatever format
• Keeping it all organized
Challenges
• Getting all the players on the same page
• Many different users
• Many different uses
• Many different providers
• Many different products
• Many different formats
• Cost
• Keeping it all straight
Near Future Vision
• Paradata
• Data and Metadata
• Data & Metadata Dissemination
• SRS Data Repository
• DDI 3.0, SDMX…
• Taxonomy Efforts
• Analytic tools
1984
Thank you!