DDI 3 Comparison Test-Case at ICPSR
Sanda Ionescu, Documentation Specialist, ICPSR
DDI 3 Comparison: "Research Questions"
• How can we use DDI 3 to document comparability and support data harmonization projects?
• Explore the use of the Comparative module (information coverage, functionality).
• Compare the use of the Comparative module with the use of inheritance through grouping: are both methods equally effective in capturing the necessary information?
• Can we build a tool to assist in documenting comparability and data harmonization in DDI 3? What would such a tool look like?
DDI 3 Comparison test-case: Background
• DDI 3 markup was applied to the "Adult Demographics" variables of three nationally representative surveys on mental health, integrated in the Collaborative Psychiatric Epidemiology Surveys (CPES):
  • The National Comorbidity Survey Replication (NCS-R)
  • The National Latino and Asian American Study (NLAAS)
  • The National Survey of American Life (NSAL)
http://www.icpsr.umich.edu/CPES/
DDI 3 Comparison test-case: Background
• CPES studies:
  • Conducted individually but with comparison in mind.
  • May be analyzed independently.
  • NOT a longitudinal design (all collected 2001-2003).
• Comparison intended across populations, or subpopulations, of the USA:
  • NCS-R: US national probability sample
  • NLAAS: target populations are Latino and Asian American
  • NSAL: target populations are African American and Afro-Caribbean
• Comparability could be documented using either group and inheritance or the Comparative module.
DDI 3 Comparison test-case: Background
• Choosing between use of Group/Inheritance and the Comparison module:
  • Comparison by design vs. post-hoc comparison: the distinction is sometimes not clear-cut, suggesting that either method could be used.
  • It is important to know the practical implications of each method (advantages, disadvantages, issues in applying markup and/or processing): test by documenting the same example both ways.
DDI 3 Comparison test-case: Background
• A typical harmonization process workflow was outlined based on an ongoing ICPSR project seeking to produce a harmonized dataset of ten U.S. family and fertility surveys, belonging to three different, but related, series of longitudinal data:
  • Growth of American Families, 1955 and 1960
  • National Fertility Survey, 1965 and 1970
  • National Survey of Family Growth, Cycles I-VI (1973, 1976, 1982, 1988, 1995, and 2002)
(Integrated Fertility Survey Series – IFSS: http://www.icpsr.umich.edu/IFSS/)
DDI 3 Comparison test-case
• Harmonization procedure:
  • Datasets are searched (by keyword or concept, if available).
  • Potentially comparable variables are selected.
  • Complete variable descriptions are extracted from existing documentation (see the sketch below):
    • Variable name (and label)
    • Question text / textual description of the variable
    • Physical representation (values, value labels, etc.)
    • Universe
    • Question context (preceding questions)
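In DDI 3, all of these elements hang off (or are referenced from) the variable description in the Logical Product, so a search tool can assemble them from a single anchor point. A minimal sketch of such a description follows, assuming simplified nesting and DDI 3.0-style namespaces; the prefixes and identifiers are illustrative and the fragment is not schema-validated (DDI 3.1 is expected to add a dedicated variable name element). Question context (preceding questions) would come from the questionnaire flow in the Data Collection module.

<l:Variable id="var-marstat"
            xmlns:l="ddi:logicalproduct:3_0"
            xmlns:r="ddi:reusable:3_0">
  <r:Name>MARSTAT</r:Name>                          <!-- variable name -->
  <r:Label>Marital status of respondent</r:Label>   <!-- label -->
  <l:UniverseReference>
    <r:ID>univ-adults</r:ID>                        <!-- universe -->
  </l:UniverseReference>
  <l:QuestionReference>
    <r:ID>q-marstat</r:ID>    <!-- question text lives on the QuestionItem -->
  </l:QuestionReference>
  <l:Representation>
    <l:CodeRepresentation>
      <r:CodeSchemeReference>
        <r:ID>cs-marstat</r:ID>                     <!-- values, value labels -->
      </r:CodeSchemeReference>
    </l:CodeRepresentation>
  </l:Representation>
</l:Variable>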
DDI 3 Comparison test-case
• Harmonization procedure (continued):
  • Similarities and differences in the listed elements are examined.
  • A harmonized variable is projected based on the findings of the previous step (there are no fixed rules; this is done case by case).
  • A decision is made regarding the action on the component variables (recode, or simply add).
  • Statistical software commands are generated and applied to the data to create the new harmonized dataset.
DDI 3 Comparison test-case
• The harmonized dataset is documented. New variable descriptions include:
  • Information about the source variables.
  • Information about the aggregation procedure (recodes, etc.).
  • Information about similarities and differences between the source variables and the harmonized one (usually in the form of a note).
DDI 3 Comparison test-case: How does DDI 3 fit in the harmonization procedure?
• When a harmonized dataset is being produced, documenting pairwise comparisons between source variables in DDI as an intermediary (pre-harmonization) step appears superfluous:
  • It does not assist in the decision-making process, which takes a more holistic approach, assessing candidate variables as a group.
  • It would involve an expense of time and effort not justified by its limited, transitory utility (the harmonized variable captures the comparability among sources anyway).
DDI 3 Comparison test-case: How does DDI 3 fit in the harmonization procedure?
• When a harmonized dataset is being produced, there is greater benefit in using the Comparison module to document similarities and differences between the harmonized variable and each of its sources (post-harmonization):
  • This kind of documentation is required by harmonization best practices anyway.
  • Information about the comparability among source variables can also be recreated by parsing their pairwise comparisons with the harmonized one.
DDI 3 Comparison test-case
• How does DDI 3 fit in the harmonization procedure?
Post-harmonization DDI 3 documentation workflow: Individual studies → Search → Display → Examine → Harmonize data → Document harmonized dataset and source comparison in DDI 3 → Discover → Analyze → Display → Disseminate
DDI 3 Comparison test-case: How does DDI 3 fit in the harmonization procedure?
• If a harmonized dataset is NOT being produced, then it is useful to document the comparability of the "original" variables to assist data users in analysis.
No-harmonization DDI 3 documentation workflow: Individual studies → Search → Display → Examine → Document comparability in DDI 3 → Discover → Analyze → Display → Disseminate
DDI 3 Comparison test-case: How can a tool assist in documenting comparability in DDI 3?
• (Projected) Tool:
  • Searches DDI documentation of individual studies, with full variable descriptions.
  • Allows narrowing results down to a customized selection.
  • Provides same-page display of the selected variables' descriptions, ideally complete with concept and universe statements (see the XSLT sketch below).
  • Search results are saved and may be retrieved, to facilitate evaluating variables, deciding whether to harmonize them, and ultimately developing a translation table.
  • The steps above are available in the ICPSR SSVD internal search.
DDI 3 Comparison test-case: (Projected) Tool: example of a customized selection (screenshot)
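One way to realize the same-page display step over DDI 3 instances is a small XSLT pass that pulls each selected variable's name and label into a single HTML table. This is a minimal sketch assuming the simplified variable markup shown earlier; the parameter, paths, and namespaces are illustrative, and the lookups of question, universe, and code scheme via their references are omitted.

<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:l="ddi:logicalproduct:3_0"
    xmlns:r="ddi:reusable:3_0">

  <!-- Hypothetical parameter: pipe-delimited list of selected variable IDs. -->
  <xsl:param name="selected">var-marstat|var-age</xsl:param>

  <xsl:template match="/">
    <table>
      <tr><th>ID</th><th>Name</th><th>Label</th></tr>
      <!-- Crude substring membership test; adequate for a sketch. -->
      <xsl:apply-templates select="//l:Variable[contains($selected, @id)]"/>
    </table>
  </xsl:template>

  <!-- One table row per selected variable. -->
  <xsl:template match="l:Variable">
    <tr>
      <td><xsl:value-of select="@id"/></td>
      <td><xsl:value-of select="r:Name"/></td>
      <td><xsl:value-of select="r:Label"/></td>
    </tr>
  </xsl:template>
</xsl:stylesheet>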
DDI 3 Comparison test-case
• Potential/Projected Tool (continued):
  • On the selected search-results list, allows further pairwise selection and display of variables with full descriptions.
  • An interactive feature lets the user flag the elements of the variables' descriptions as similar or different.
  • Based on the information entered in the step above, a DDI 3 Comparison module instance is created:
    • Elements flagged as similar or different are listed in the <Correspondence><Commonality> or <Correspondence><Difference> fields.
    • The <CommonalityTypeCoded> element may be filled in automatically based on those flags (all common = "identical"; some different = "some"; is "none" ever used?).
DDI 3 Comparison test-case: Use of the Comparison Module
• The Comparison Module: Structure
  • Maps: Concepts, Variables, Questions, Categories, Codes, Universes.
  • Map:
    • SourceSchemeReference (M)
    • TargetSchemeReference (M)
    • Correspondence (M)
    • ItemMap:
      • SourceItem (M)
      • TargetItem (M)
      • Correspondence:
        • Commonality (M)
        • Difference (M)
        • CommonalityTypeCoded (O, NR)
        • CommonalityWeight (O, NR)
        • UserDefinedCorrespondenceProperty (O, R)
(M = mandatory; O = optional; R = repeatable; NR = non-repeatable)
DDI 3 Comparison test-case
• Used by ICPSR in the CPES markup example:
  • Commonality and Difference are mandatory. If the list of elements is structured and used consistently, it may become machine-actionable, eliminating the need for the UserDefinedCorrespondenceProperty. (Should we enable an optional controlled vocabulary to allow interoperability? Such a list would only apply to one type of map: variables, in our case.)
  • CommonalityTypeCoded, with the proposed controlled vocabulary:
    • Identical
    • Some
    • None
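Pulled together, one pairwise entry in a Variable Map would look roughly like the sketch below. The identifiers, namespace prefixes, and reference structure are illustrative, and the free-text content of Commonality/Difference is hypothetical; making that content a structured, consistently used list is what would render it machine-actionable, as noted above.

<cm:VariableMap xmlns:cm="ddi:comparative:3_0"
                xmlns:r="ddi:reusable:3_0">
  <cm:SourceSchemeReference>
    <r:ID>vs-ncsr</r:ID>               <!-- NCS-R variable scheme -->
  </cm:SourceSchemeReference>
  <cm:TargetSchemeReference>
    <r:ID>vs-nlaas</r:ID>              <!-- NLAAS variable scheme -->
  </cm:TargetSchemeReference>
  <cm:ItemMap>
    <cm:SourceItem><r:ID>var-ncsr-marstat</r:ID></cm:SourceItem>
    <cm:TargetItem><r:ID>var-nlaas-marstat</r:ID></cm:TargetItem>
    <cm:Correspondence>
      <cm:Commonality>question text; categories; universe</cm:Commonality>
      <cm:Difference>variable name; preceding questions</cm:Difference>
      <cm:CommonalityTypeCoded>Some</cm:CommonalityTypeCoded>
    </cm:Correspondence>
  </cm:ItemMap>
</cm:VariableMap>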
DDI 3 Comparison test-case: HTML view of a Variable Map in the DDI 3 Comparison Module (screenshot)
DDI 3 Comparison test-case
• Using XSLT to (re)create the variables crosswalk from the pairwise comparisons:
  • If we compare sources with a harmonized variable, the latter will always be the "target":
    • A → H
    • B → H
    • C → H
  • In this case the crosswalk is relatively easy to create (see the sketch below).
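A minimal XSLT 1.0 sketch of this easy case, assuming ItemMap markup along the lines shown earlier (paths, namespaces, and output format are illustrative): because every pair shares the same harmonized target, grouping ItemMaps by their TargetItem yields the crosswalk directly.

<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:cm="ddi:comparative:3_0">

  <!-- Index the pairwise ItemMaps by their (harmonized) target. -->
  <xsl:key name="by-target" match="cm:ItemMap" use="cm:TargetItem"/>

  <xsl:template match="/">
    <crosswalk>
      <!-- Muenchian grouping: visit each distinct target once... -->
      <xsl:for-each select="//cm:ItemMap[generate-id() =
          generate-id(key('by-target', cm:TargetItem)[1])]">
        <row target="{normalize-space(cm:TargetItem)}">
          <!-- ...then list every source variable mapped to it. -->
          <xsl:for-each select="key('by-target', cm:TargetItem)">
            <source><xsl:value-of select="normalize-space(cm:SourceItem)"/></source>
          </xsl:for-each>
        </row>
      </xsl:for-each>
    </crosswalk>
  </xsl:template>
</xsl:stylesheet>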
DDI 3 Comparison test-case
• Using XSLT to (re)create the variables crosswalk from the pairwise comparisons:
  • If we compare individual variables for analysis purposes, creating a crosswalk can become very difficult and labor-intensive:
    • A → B
    • B → C
    • A → C
    • A → D
    • B → D
    • C → D
  • Nothing in the discrete pairs indicates their relationship; parsing by multiple iterations produces duplications that need to be cleaned up; the "source" and "target" designations become irrelevant, yet give the relationship a directionality that makes it harder to process.
DDI 3 Comparison test-case
• Recreating the variables crosswalk from the pairwise comparisons:
  • The same structure is used for handling two different types of comparison (pre-harmonization and post-harmonization).
  • Do we need a different model/structure for comparing "original" (individual) variables?
  • Or some additional element that would provide a key for the pairs that need to be linked? Explore the possible use of ItemMap@alias (see the sketch below).
  • Use a solution other than XSLT to create the crosswalk? (More sophisticated programming may be needed to capture complex relationships.)
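If ItemMap@alias could carry a shared key (assuming such an attribute is usable for this purpose; the slide leaves it as an open question), every pair belonging to one conceptual cluster would be stamped with the same value, letting a processor link discrete pairs without inferring transitivity. A speculative sketch, with hypothetical identifiers:

<!-- Same alias on every pair of one cluster, so A-B and B-C can be
     grouped on @alias rather than by chasing source/target chains. -->
<cm:ItemMap alias="cluster-marstat">
  <cm:SourceItem><r:ID>var-A-marstat</r:ID></cm:SourceItem>
  <cm:TargetItem><r:ID>var-B-marstat</r:ID></cm:TargetItem>
</cm:ItemMap>
<cm:ItemMap alias="cluster-marstat">
  <cm:SourceItem><r:ID>var-B-marstat</r:ID></cm:SourceItem>
  <cm:TargetItem><r:ID>var-C-marstat</r:ID></cm:TargetItem>
</cm:ItemMap>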
DDI 3 Comparison test-case
• Use of the Comparison Module: Questions/Comments
  • We normally include items (i.e., variables, in our case) that have some degree of comparability, so "None" would not be routinely used.
  • Use of CommonalityWeight is optional; a scale of weights would have to be defined.
  • UserDefinedCorrespondenceProperty may replace CommonalityTypeCoded in user-specific cases.
  • The map structure is identical (except for codes), but the items compared are organically different: not all elements are relevant in all maps. (For variables we find it necessary to list the similar and different components of their descriptions, but for universes, questions, etc., comparison would be at a more conceptual level.)
DDI 3 Comparison test-case
• Use of the Comparison Module: Questions/Comments
• Comparing non-harmonized variables:
  • Is there a rationale for documenting comparability between their components as well (in addition to flagging them as similar or different)?
  • The Comparison module does not provide links between items included in different maps, and the same item (question, universe, code scheme) may be used by multiple variables that are part of different mappings.
  • The complete variable descriptions may be pulled from the Logical Product.
DDI 3 Comparison test-case
• Use of the Comparison Module: Questions/Comments
• Comparing harmonized variables with their sources:
  • The GenerationInstruction sequence in the Code Map allows referencing the source variable(s) and may document the recodes performed to harmonize them (sketched below).
  • This sequence mirrors the Coding:GenerationInstruction section in the Data Collection module.
  • Coding is identifiable (it may be referenced by the resulting variable); GenerationInstruction is not identifiable (it cannot be referenced).
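A rough sketch of such a GenerationInstruction, assuming simplified content: the reference and description elements inside it are assumptions rather than the published content model, the namespace prefixes are illustrative, and the recode text is hypothetical.

<cm:GenerationInstruction xmlns:cm="ddi:comparative:3_0"
                          xmlns:r="ddi:reusable:3_0">
  <!-- Source variable(s) the harmonized variable was built from
       (element name is an assumption). -->
  <r:SourceVariableReference>
    <r:ID>var-ncsr-marstat</r:ID>
  </r:SourceVariableReference>
  <!-- Hypothetical documentation of the recode performed. -->
  <r:Description>Collapse source codes 4 "Separated" and 5 "Divorced"
    into harmonized code 4 "Separated/Divorced".</r:Description>
</cm:GenerationInstruction>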
DDI 3 Comparison test-case
• Use of the Comparison Module: Questions/Comments
  • Documentation of comparability is "dissociated" from the individual variables' descriptions.
  • Could group + inheritance be a more effective way to capture both the variable descriptions and their comparability, while at the same time allowing a complete description of individual datasets, including variables that have no comparable counterparts?
  • Test by documenting the same data both ways once V3.1 is published, to allow identification of the variable Name (in some instances, the only element that changes).