
CHARMCATS: Harmonisation demands for source metadata and output management

This presentation discusses the aims of the CHARMCATS project, including functional requirements for the Question Database and Harmonisation Platform. It also explores the harmonisation process and the core elements of a harmonisation project.


Presentation Transcript


1. CHARMCATS: Harmonisation demands for source metadata and output management. CESSDA Expert Seminar: Towards the CESSDA-ERIC common Metadata Model and DDI3. Ljubljana, 09.11.2009. Alex Agache

2. CHARM CATS • Cessda • HARMonization of • CATegories and • Scales • Markus Quandt (team leader) • Martin Friedrichs (R&D, programming) • and the CESSDA PPP WP9 team (last slide) • Current status: prototype/desktop • Future: online workbench

3. Aims of this presentation • Functional requirements: CHARMCATS • Demands for source metadata (Portal & QDB) • Scenarios for feeding back enhanced metadata on comparability

4. Functional requirements: CHARMCATS. Elements of the Metadata Model

5. Harmonisation: Basic Scenario • A researcher wants to create comparative statistics for employment across European countries, year 2008 • (Hypothetical) classification on employment (ex-post). Harmonisation: make data from different sources comparable

6. Harmonisation - Hydra • How to proceed? • Output = conversion syntax (e.g., SPSS, SAS) • What coding decisions were made? • Why were these decisions made? Targeted contributing users/research knowledge: • Experts in data issues • Experts in comparative measurement [+ question(naire) development] • Experts in conceptual issues of measurements
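A minimal sketch of what such a conversion routine amounts to, written in Python rather than the SPSS/SAS syntax the platform would actually publish; the country code lists and mapping tables are invented for illustration, only the target typology labels come from the scenario slides:

```python
import pandas as pd

# Hypothetical country-specific code lists; real mappings would come from
# the study documentation and the published harmonisation routine.
RECODE_DE = {1: 1, 2: 2, 3: 3, 4: 4}      # invented German source codes -> target
RECODE_NO = {10: 1, 11: 2, 20: 3, 30: 4}  # invented Norwegian source codes -> target

TARGET_LABELS = {1: "Employed full time", 2: "Employed half time",
                 3: "Self employed", 4: "Unemployed"}

def harmonise(source: pd.Series, mapping: dict) -> pd.Series:
    """Recode a country-specific employment variable into the target typology."""
    return source.map(mapping)

de_employment = pd.Series([1, 4, 2, 3])
print(harmonise(de_employment, RECODE_DE).map(TARGET_LABELS).tolist())
# ['Employed full time', 'Unemployed', 'Employed half time', 'Self employed']
```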

7. Core of Functional Requirements. Publishing (ex-post) harmonisation = metadata on 3 working steps => Harmonisation Project

8. Core Elements of a Harmonisation Project
A. Conceptual step: what to measure/harmonize? • 1. Concept = define employment • 2. For what universes? • 3. Dimensions: employment status, employment regulation, cross-country/time universal • 4. Define a (universal) typology of employment: 1 Employed full time, 2 Employed half time, 3 Self employed, 4 Unemployed
B. Operational step: how to measure/harmonize? • 5. Ideally = country-specific indicators/questions, functionally equivalent
C. Data coding step: how to find and recode data? • 6. Reality = country- and dataset-specific variables/questions • 7. Classification = harmonized variable

9. CHARMCATS: (3) Data-Coding Step

10. Summary: Metadata in CHARMCATS (1) Type of retrieved metadata: • HP components: harmonized classifications, scales, indexes • Study components: variables, questions, universes, etc. • Type of HP: depending on completeness. Functionality: • Ex-ante output & ex-post harmonisation • Support creation of harmonisation routines • Support data users in understanding datasets

11. Summary: Metadata in CHARMCATS (2) Standard format: • Sources (expected): DDI2/3 • CHARMCATS: DDI3. Location of sources: • CESSDA Portal • Question Database (QDB) • Users' studies in DDI2/3 XML

12. 2. Demands for source metadata • CESSDA Portal (studies incl. variables/questions) • Studies: 3,383 • Variables: 160,644 (incl. duplicates) • Variables with question text: ca. 85% • Variables with labels or frequencies: ca. 95%

13. Required Input Elements for CHARMCATS (required, but not necessarily available) • Questions and variables connected to concepts • Metadata from studies that are comparative by design • Identification of variables and questions measured ex-ante as part of harmonized measurement instruments within a study • Context information attached to variables • Elements linked/tagged via thesaurus (ELSST) • Contextual databases (aggregate level) • Bias: conceptual, methodological (and data) • Validity of specific source variables/questions (e.g., psychometric information; cognitive interviews)

14. Summary: Required QDB Metadata • Literal question text + answer categories • English translation • Multiple questions (question batteries) • Position in + link to original questionnaire • Study context. Nice to have: • Concept tagging • Methodological information • 'Proven standard' scales/questions (e.g., life satisfaction, post-materialism)

15. Vision: CHARMCATS_QDB Services • Search/online access • Questions used in both applications -> question(naire) development -> ex-ante harmonisation • CHARMCATS: users offer information on comparability of questions • QDB: supports comparability analysis • QDB: similarity matching -> commonality weights

16. Scenarios for feeding back enhanced metadata on comparability - Starting points for discussion -

17. First phase: CHARMCATS will 'read' metadata from CESSDA holdings but not write back • Subsequent stages: write material back to other servers, or expose it for searches through standardized interfaces, within the CESSDA infrastructure. What material?

18. Core metadata on comparability • Groups of harmonized variables • Harmonized variables in the form of partial datasets • Coding routines • Functionally equivalent questions/variables (universes/concepts-dimensions) • International standard classifications and scales • Degrees of comparability (CHARMCATS) + ? Commonality weights (QDB)

19. Proposal for group discussions (Tuesday) 1. Thought experiments: • Additional metadata created in CHARMCATS on the quality of harmonized measures • Use case ISCED-97: working steps • Re-use/impact of information on data coding (measurement error) in data analysis • Additional metadata = prior information in Bayesian analysis • DB on quality of measurements 2. Interim solution for using DDI2/3 via a linking shell <- CHARMCATS/QDB

20. Additional Information. Web: www.cessda.org; PPP documents: • Bourmpos, Michael; Linardis, Tolis (with Alexandru Agache, Martin Friedrichs, and Markus Quandt) (2009, September): D9.2 Functional and Technical Specifications of 3CDB. • Hoogerwerf, M. (2009): Evaluation of the WP9 QDB Tender Report. • Krejci, Jindrich; Orten, Hilde; and Quandt, Markus (2008): Strategy for collecting conversion keys for the infrastructure for data harmonisation, http://www.cessda.org/ppp/wp09/wp09_T93report.pdf • Quandt, M., Agache, A., & Friedrichs, M. (2009, June): How to make the unpublishable public. The approach of the CESSDA survey data harmonisation platform. Paper presented at the NCeSS 5th International Conference on e-Social Science, 24-26 June 2009, Cologne. Accessible at: http://www.ncess.ac.uk/resources/content/papers/Quandt.pdf Forthcoming: • Friedrichs, M., Quandt, M., & Agache, A.: The case of CHARMCATS: Use of DDI3 for publishing harmonisation routines. 1st Annual European DDI Users Group Meeting: DDI - The Basis of Managing the Data Life Cycle, 4 December 2009.

21. WP 9, CESSDA-PPP • Nanna Floor Clausen (DDA) • Maarten Hoogerwerf (DANS) • Annick Kieffer (Réseau Quetelet) • Jindrich Krejci (SDA) • Laurent Lesnard (CDSP) • Tolis Linardis (EKKE) • Hilde Orten (NSD) • http://www.cessda.org/project/index.html • http://www.cessda.org/project/doc/wp09_descr2.pdf

22. Proposal for group discussions (Tuesday) 1. Thought experiments: • Additional metadata created in CHARMCATS on the quality of harmonized measures • Use case ISCED-97: working steps • Re-use/impact of information on data coding (measurement error) in data analysis • Additional metadata = prior information in Bayesian analysis • DB on quality of measurements 2. Interim solution for using DDI2/3 via a linking shell <- CHARMCATS/QDB

23. Group Discussion: Harmonisation/Comparable Data. Thought experiments: • Harmonisation Platform: (additional) metadata on the quality of harmonized measures. Use case ISCED-97, working steps: • a. Additional metadata = measurement error; re-use/impact of information on data coding (measurement error) in data analysis • b. Additional metadata = priors in Bayesian analysis • c. DB on quality of measurements

24. Example of harmonisation on education: ISCED-97 with ESS Round 3 data. Scenario: • Data: ESS Round 3 (2006), 10 European country samples • Same source variables: country-specific education degrees • Two variants of reclassification into ISCED-97: • ESS team harmonized variable: EDULVL • WP9.2 harmonized variable. Other codings of the same data into ISCED (not considered here): Schneider, 2008

25. Classification on Education: ISCED-97. 1. Conceptual step • Concept of education: broad definition • Dimensions: level of education, orientation of the educational programme (general-vocational), position in the national degree structure • Universe: initially developed for OECD countries • New variant of ISCED: 1997 • Typology resulting in 7 classes of education: 0 Pre-primary education, 1 Primary education, 2 Lower secondary, 3 Upper secondary, 4 Intermediate level, 5 Tertiary education, 6 Advanced training. Source: OECD (2004)

26. 2. Operational step: ISCED-97 mapping (ISCED manual) • Guidelines on measurement in survey research? (proposals from 2000 onwards) • Problems in coding: coding for respondents whose educational certificates were obtained before 1997/before data collection; little information on coding procedures • The hydra is not visible here: how does a specific educational certificate measure the multiple and interrelated dimensions?

27. 3. Data Coding: Result of Mapping/Coding

28. Storing the two harmonized variables within a database • 2 different target variables, same classification • Both target variables use the same source variables and the same operational and conceptual steps • Calculated "agreement" between the two outputs of coding into the same classification: 2 different harmonized variables • Kappa = 0.67 • Other measures for quality of coding/reliability? (e.g., ICCs) • Ignore or consider when using one of the harmonized variables?
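As a concrete illustration, Cohen's kappa between two codings of the same cases can be computed as below. The two toy code vectors are invented; only the kappa = 0.67 reported on the slide comes from the real ESS vs. WP9.2 comparison:

```python
import numpy as np

def cohen_kappa(a, b):
    """Agreement between two codings of the same cases, corrected for chance."""
    a, b = np.asarray(a), np.asarray(b)
    cats = np.union1d(a, b)
    n = len(a)
    # Confusion matrix between the two harmonized variables.
    conf = np.array([[np.sum((a == i) & (b == j)) for j in cats] for i in cats])
    p_obs = np.trace(conf) / n                                  # observed agreement
    p_exp = np.sum(conf.sum(axis=1) * conf.sum(axis=0)) / n**2  # chance agreement
    return (p_obs - p_exp) / (1 - p_exp)

# Toy ISCED codings of eight respondents (hypothetical values).
ess  = [2, 3, 3, 5, 1, 4, 2, 3]
wp92 = [2, 3, 4, 5, 1, 4, 2, 2]
print(round(cohen_kappa(ess, wp92), 2))
```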

29. Next slides: Impact on data analysis: 'quality' of coding (reliability). 2 basic status attainment models • Without measurement error • With measurement error • Quality of coding: how does it relate to validity?

30. Harmonisation metadata: active re-use in data analysis. [Path diagram: R's ISCED and father's ISCED, age, gender -> household income; SEM notation; covariances and residuals not shown; unstandardized estimates.] ESS data 2006: respondents aged 25-64. Model specification: ISCED error = 0 / error = .33 (1 - reliability). Germany: without error .354 (.03), with error .581 (.04); without error 276.091 (28.387), with error 414.83 (41.93). Norway: without error .33 (.03), with error .44 (.04); without error 206.304 (29.08), with error 288.22 (39.42).
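A minimal sketch of the mechanics behind these two specifications, under the usual classical-test-theory assumption: the "with error" model fixes the indicator's error variance to (1 - reliability) times the observed variance, and in the single-predictor case this amounts to dividing the observed slope by the reliability. The full SEM on the slide is multivariate, so its corrected estimates need not match this bivariate approximation exactly:

```python
def fixed_error_variance(observed_var: float, reliability: float) -> float:
    """Error variance to fix in the SEM: (1 - reliability) * Var(observed)."""
    return (1.0 - reliability) * observed_var

def disattenuate(beta_obs: float, reliability: float) -> float:
    """Bivariate correction: the observed slope is attenuated by the
    predictor's reliability, so the corrected slope is beta_obs / reliability."""
    return beta_obs / reliability

# Slide values: reliability = 1 - .33 = .67; Germany's effect without error = .354.
print(round(disattenuate(0.354, 0.67), 3))  # ~0.528; the multivariate SEM gives .581
```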

31. Example: Bayesian SEM Analysis with ISCED. ESS data, 2006: Norway, respondents aged 24-65 • Bayesian approach • Test of hypothesis (probability of a hypothesis being true given the data) • Use of previously published/expert knowledge in the field for specifying informative priors on specific parameters of a model • Few but rising applications with cross-national data. [Model output: R's education -> income, mean = 286.966; posterior p = .50; MCMC samples = 82,501.]
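The core of "prior information from earlier findings" can be shown with the simplest conjugate case: a normal prior on a single coefficient combined, by precision weighting, with a normal estimate whose sampling variance is known. All numbers are invented except 286.966, which reuses the slide's figure purely as an illustrative input:

```python
def posterior_normal(prior_mean, prior_var, estimate, est_var):
    """Precision-weighted conjugate update for a single coefficient:
    posterior precision = prior precision + data precision."""
    precision = 1.0 / prior_var + 1.0 / est_var
    mean = (prior_mean / prior_var + estimate / est_var) / precision
    return mean, 1.0 / precision

# Hypothetical prior on the education -> income effect from earlier published
# studies (mean 250, sd 50), updated with one study's estimate (sd 40).
mean, var = posterior_normal(250.0, 50.0**2, 286.966, 40.0**2)
print(round(mean, 1), round(var**0.5, 1))  # posterior mean and sd
```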

32. DB: Quality of measurements. [Diagram: analysis results (reliability aggregated across similar harmonizations/different country datasets; validity of harmonized/latent variables; comparability/measurement equivalence) flow from the harmonisation DB into a new DB on the quality of comparative measurements, which in turn informs model specification (priors on specific parameters) and a DB of expert knowledge/guidelines.]

33. DB on quality of measurements: user acceptance • Currently: • low incentives for researchers to publish new findings on the validity of measurements in an 'open access' database (before and after publication in journals) • mostly likelihood-based statistical methods employed • Ioannidis (2005), Why most published research findings are false: "The hotter a scientific field (with more scientific teams involved), the less likely the research findings are to be true" • New regulations for registering data (e.g., International Standard Randomised Controlled Trial Number in the UK: www.isrctn.com) • Future potential: • self-contributions from research groups (e.g., ESS, EVS, ISSP) • meta-analysts (avoiding publication bias) • contributions of Bayesians

34. Questions? Thanks to: CESSDA, WP9 team. ISCED-97 coding: Annick Kieffer; Vanita Matta

35. Proposal for group discussions (Tuesday). Thought experiment: interim solution for using DDI2/3 via a linking shell <- CHARMCATS/QDB

36. Thought experiment 2: a DDI2/3 linking shell. [Diagram: Repositories A, B and C (the CESSDA holdings, in DDI2 & DDI3) hold variables V1, V2, V3, V4 ... V999; a T-Shell with a DDI2 registry sits between the holdings and the applications (CHARMCATS application, QDB application, other apps, CESSDA Portal); an application requests V1 + V4, e.g. use of V1 for harmonization purposes, and the shell resolves them; the CHARMCATS side speaks only DDI3.]
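A toy sketch of the registry idea behind the linking shell, with invented variable IDs and repository names matching the diagram's labels; nothing here reflects an actual CESSDA interface:

```python
# Minimal registry: map variable IDs to the repository and DDI flavour
# that holds them, so an application can ask for "V1 + V4" without
# knowing where each variable lives. All entries are hypothetical.
REGISTRY = {
    "V1": {"repository": "Repository A", "ddi": "DDI2"},
    "V4": {"repository": "Repository C", "ddi": "DDI3"},
}

def resolve(variable_ids):
    """Return, per requested variable, which repository and DDI version
    the linking shell should fetch it from."""
    return {v: REGISTRY[v] for v in variable_ids if v in REGISTRY}

print(resolve(["V1", "V4"]))
# {'V1': {'repository': 'Repository A', 'ddi': 'DDI2'},
#  'V4': {'repository': 'Repository C', 'ddi': 'DDI3'}}
```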

37. Commonality weights (c.w.) Scenario: • Example: c.w. = 0-100 (weight for similarity, or probability of belonging to ad hoc comparability group x) • Search by different criteria (XXX) • A similarity-matching algorithm provides the c.w. (example: Levenshtein algorithm) • Learning algorithm! • Bayesian prediction
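A minimal sketch of how a Levenshtein-based commonality weight on the 0-100 scale could look; the normalisation by the longer string's length is our choice for illustration, not something the slides specify:

```python
def levenshtein(a: str, b: str) -> int:
    """Standard dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def commonality_weight(q1: str, q2: str) -> float:
    """Map edit distance to a 0-100 similarity score between question texts."""
    longest = max(len(q1), len(q2)) or 1
    return 100.0 * (1 - levenshtein(q1, q2) / longest)

# Hypothetical question texts from two studies.
print(round(commonality_weight(
    "What is your current employment status?",
    "What is your present employment situation?"), 1))
```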

38. Contributors and source data are required for initial implementation -> QDB. Any conclusions?
