
DDI The Movie 2: Applications of the Architecture (early draft)

By I-Lin Kuo.


Presentation Transcript


  1. DDI The Movie 2: Applications of the Architecture (early draft) By I-Lin Kuo

  2. Table of Contents • Modules and Instrument Documentation • The Variable • Ontologies and Tagging

  3. Modules and Instrument Documentation Chapter 1

  4. Suggested Approach to Instrument Documentation • METS has an extremely well-designed structure map which describes the logical structure of its objects of interest. See http://www.loc.gov/standards/mets/presentations/METSIntro2.ppt • Basically, a skeleton of a structure is created which then contains pointers to items. See next slide for the recipe for building METS.

  5. Building a METS Document: 5 key aspects • Expressing the Structure • Linking Structure with Content • Linking Structure with Descriptive Metadata • Linking Structure and Content Files with Administrative metadata • Not covered: Linking behaviors with structures.

  6. Suggested Approach to DDI Instrument Documentation • I recommend that the DDI adopt a similar approach • Create an instrument structure map for each instrument • Link the structure with content contained in <InstrumentItem> • Examples of <InstrumentItem> would be <SimpleQuestion>, <GridQuestion>, <QuestionGroup>, <Computation>, <FlowCheck>, <InterviewerInstr>, etc. • Link the structure and content with display behavior • This approach has the advantage of allowing questions etc. to be reused in different instrument structure maps. This would be useful in a study with separate male and female questionnaires, for example. • I also think (haven’t thought this through completely) that this allows a clean separation between question content and question display. Thus, a multi-mode survey would have identical structure maps linked to different display behavior.
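The reuse argument on slide 6 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names `InstrumentItem` and `StructureNode`, the `item_ref` pointer, and the two sample questions are hypothetical and not part of any DDI schema. The point is that the structure map holds pointers to items rather than the items themselves, so two maps can share one question.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class InstrumentItem:
    """Content only: what is asked, not where or how it is shown (hypothetical)."""
    item_id: str
    kind: str   # e.g. "SimpleQuestion", "FlowCheck"
    text: str


@dataclass
class StructureNode:
    """One node of an instrument structure map (hypothetical)."""
    label: str
    item_ref: Optional[str] = None            # pointer to an item id, not the item
    children: List["StructureNode"] = field(default_factory=list)


def resolve(node: StructureNode, items: Dict[str, InstrumentItem],
            depth: int = 0) -> List[Tuple[int, str, str]]:
    """Walk the map depth-first and return (depth, label, item text) rows."""
    text = items[node.item_ref].text if node.item_ref else ""
    rows = [(depth, node.label, text)]
    for child in node.children:
        rows.extend(resolve(child, items, depth + 1))
    return rows


# One pool of items, shared by both questionnaires.
items = {
    "q_age": InstrumentItem("q_age", "SimpleQuestion", "What is your age?"),
    "q_preg": InstrumentItem("q_preg", "SimpleQuestion", "Are you currently pregnant?"),
}

male_map = StructureNode("Male questionnaire", children=[
    StructureNode("Q1", "q_age"),
])
female_map = StructureNode("Female questionnaire", children=[
    StructureNode("Q1", "q_age"),
    StructureNode("Q2", "q_preg"),
])
```

Calling `resolve(male_map, items)` and `resolve(female_map, items)` yields two different question sequences from the same item pool. Display behavior could be attached to the nodes in the same pointer-based way.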

  7. The Variable Chapter 2

  8. DDI 2.0 variable
     4.3 var* (ATT == wgt, wgt-var, weight, qstn, files, vendor, dcml, intrvl, rectype, sdatrefs, methrefs, pubrefs, access, aggrMeth, measUnit, scale, origin, nature, additivity, temporal, geog, geoVocab, catQnty)
     4.3.1 location* (ATT == StartPos, EndPos, width, RecSegNo, fileid, locMap)
     4.3.2 labl* (ATT == level, vendor, country, sdatrefs)
     4.3.3 imputation?
     4.3.4 security? (ATT == date)
     4.3.5 embargo? (ATT == date, event, format)
     4.3.6 respUnit?
     4.3.7 anlysUnit?
     4.3.8 qstn*
     4.3.9 valrng*
     4.3.10 invalrng*
     4.3.11 undocCod*
     4.3.12 universe*
     4.3.13 TotlResp?
     4.3.14 sumStat* (ATT == wgtd, wgt-var, weight, type)
     4.3.15 txt* (ATT == level, sdatrefs)
     4.3.16 stdCatgry* (ATT == date, URI)
     4.3.17 catgryGrp*
     4.3.18 catgry*
     4.3.19 codInstr*
     4.3.20 verStmt*
     4.3.21 concept* (ATT == vocab, vocabURI)
     4.3.22 derivation?
     4.3.23 varFormat? (ATT == type, formatname, schema, category, URI)
     4.3.24 geoMap* (ATT == URI, mapformat, levelno)
     4.3.25 catLevel* (ATT == levelnm)
     4.3.26 notes*

  9. DDI 2.0 var major components • Variable type: @wgt, @intrvl • Reference: @qstn, @wgt-var, @files, @sdatrefs, @methrefs, @pubrefs • Descriptive: <notes>, <universe>, <txt>, <concept>, <derivation>, <qstn>, <geoMap> • Provenance: <verStmt> • Sampling/Measurement: <imputation>, <respUnit>, <anlysUnit> • Logical Encoding: <valrng>, <invalrng>, <undocCod>, <catgry>, <catgryGrp>, <stdCatgry>, <codInstr>, <catLevel>, <varFormat> • Statistics: <TotlResp>, <sumStat> • Security/Access: <security>, <embargo> • Physical description: @rectype, <location> • Other: @vendor • Note: some elements and attributes straddle several concerns. In those cases, I just picked one.

  10. Some problems of 2.0 • No recoding documentation • One variable, one question • Question contained within variable • No virtual recodes

  11. 3.0 restructuring goals for the variable • Standardize the usage of elements such as security, etc. so that they may be machine-actionable • Standardize the naming of elements and attributes • Reduce redundancy so there is only one way to markup • Compatibility with ISO11179 conception of the variable • Compatibility with statistical tools conception of variable • Compatibility with MetaDater concept of Question/variable • More sophisticated recode documentation • Better documentation of question flow in instrument documentation • More complete classification of variable types • Systematic handling of variable referencing • Support of virtual recodes

  12. 2.0 Classification of variable types • We’ll start with this, as it is relatively easy • 2.0 already has the attributes wgt, wgt-var, and qstn, but more are needed for a richer [machine-actionable] typology

  13. 3.0 Classification of variable types • "Types" is actually a misnomer. These should be treated as labels rather than types because they are not exclusive • Raw/question (codes come directly from questions) – this will probably be affected by the ongoing discussion on question typology at DDI-ID • Recodes • Weight • Attrition • Key • Imputation Flag • Time/geog? • Continuous/discrete • Aggregated • Nominal|ordinal|interval|ratio • Virtual recode – a “variable” for display purposes only without corresponding data, such as a continuous variable displayed as a discrete variable • Dropped • [Nonexistent] intermediate – an intermediate variable used only for calculation, without data or display • [Nonexistent] instrument – an artifact of the instrument, without data or display • … incomplete
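Because the labels on slide 13 are not exclusive, they behave more like a set of flags than like a single type. The sketch below illustrates that idea with Python's `enum.Flag`. The enum `VarLabel`, its member names, and the sample variables `AGE`, `AGEGRP`, and `WGT` are assumptions made for illustration, and only a subset of the slide's labels is shown.

```python
from enum import Flag, auto


class VarLabel(Flag):
    """Non-exclusive variable labels (hypothetical subset of slide 13)."""
    RAW = auto()
    RECODE = auto()
    WEIGHT = auto()
    KEY = auto()
    IMPUTATION_FLAG = auto()
    CONTINUOUS = auto()
    VIRTUAL_RECODE = auto()


# A variable may carry several labels at once.
labels = {
    "AGE": VarLabel.RAW | VarLabel.CONTINUOUS,
    "AGEGRP": VarLabel.RECODE | VarLabel.VIRTUAL_RECODE,
    "WGT": VarLabel.WEIGHT | VarLabel.CONTINUOUS,
}


def with_label(all_labels, wanted):
    """Return the names of all variables that carry the wanted label."""
    return sorted(name for name, lab in all_labels.items() if wanted in lab)
```

For example, `with_label(labels, VarLabel.CONTINUOUS)` selects both `AGE` and `WGT`, which a single exclusive "type" attribute could not express.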

  14. Referencing • Variables should reference their applicable weight variables, or vice versa • Imputation flags should reference their corresponding variables • Variables might need to reference attrition variable in some cumulative dataset • Recodes will need to reference questions, computations, and other variables in their recode descriptions • Directionality of the references remains to be decided

  15. Machine-actionable consequences • Identification of keys enables complex files functionality • Weight, imputation flag, and attrition references may allow statistics to be intelligently calculated on the fly
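As a rough illustration of "statistics intelligently calculated on the fly", the sketch below follows a variable's weight reference before computing a mean. The metadata layout, the `weight_ref` key, and the sample data are hypothetical. The direction of the reference (variable to weight) is just one of the options slide 14 leaves open.

```python
def weighted_mean(dataset, metadata, var_name):
    """Compute the mean of a variable, using its referenced weight if any."""
    values = dataset[var_name]
    weight_name = metadata[var_name].get("weight_ref")   # hypothetical key
    if weight_name is None:
        weights = [1.0] * len(values)                     # unweighted fallback
    else:
        weights = dataset[weight_name]
    total = sum(weights)
    return sum(v * w for v, w in zip(values, weights)) / total


# Hypothetical metadata: INCOME points at its applicable weight variable.
metadata = {
    "INCOME": {"weight_ref": "WGT"},
    "AGE": {},
    "WGT": {"labels": ["weight"]},
}
dataset = {
    "INCOME": [10.0, 20.0, 30.0],
    "AGE": [20.0, 30.0, 40.0],
    "WGT": [1.0, 1.0, 2.0],
}
```

Here `weighted_mean(dataset, metadata, "INCOME")` uses `WGT` automatically, while `weighted_mean(dataset, metadata, "AGE")` falls back to an unweighted mean because no weight is referenced.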

  16. General approach to compatibility • By compatibility with statistical tools (SPSS, SAS, Stata), we mean that we should be able to do a round-trip from a setup file → DDI → setup file with no loss of information. • It is not realistic to expect as a 3.0 deliverable 3 XSLT stylesheets which transform DDI → SPSS, SAS, or Stata setup files. • It may also be possible to have stylesheets which convert from SPSS and SAS proprietary XML formats to DDI, which perform the round-trip without loss of information. This is dependent on whether or not the DDI is rich enough to contain all the info. • By compatibility with ISO11179 and MetaDater, we will suggest a standard way in which <var> may be marked up.

  17. Compatibility with statistical tools: SPSS
      FILE HANDLE DATA / NAME="data-filename" LRECL=66.
      DATA LIST FILE=DATA /
         STANUM 8-9 QTYPE 13
      VARIABLE LABELS
         STANUM 'State ID' /
         QTYPE 'State or National precinct' /
      VALUE LABELS
         STANUM 2 'Alaska' /
         QTYPE 1 'State' 2 'National' /
      • The simple excerpt from an SPSS setup file above can be round-tripped even with DDI 2.0: • Data List column info goes in <location> • Variable labels go into var.txt • Value labels go into <catgry> • More analysis is needed to see what is necessary for round-tripping the SPSS XML format and/or more complicated setup files. Achim is familiar with the XML.
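The mapping on slide 17 can be sketched as a small round-trip test in Python. This is a minimal sketch, not a converter: it assumes the setup file has already been parsed into a plain dictionary (the dictionary layout is hypothetical), and it emits only the handful of DDI 2.0 elements the slide names (<location>, <txt>, <catgry> with <catValu> and <labl>), without namespaces or validation.

```python
import xml.etree.ElementTree as ET


def to_ddi(var):
    """Build a simplified DDI 2.0-style <var> element from parsed setup info."""
    el = ET.Element("var", name=var["name"])
    # DATA LIST column info goes in <location>
    ET.SubElement(el, "location",
                  StartPos=str(var["start"]), EndPos=str(var["end"]))
    # VARIABLE LABELS go into var.txt
    ET.SubElement(el, "txt").text = var["label"]
    # VALUE LABELS go into <catgry>
    for code, label in var["values"].items():
        cat = ET.SubElement(el, "catgry")
        ET.SubElement(cat, "catValu").text = str(code)
        ET.SubElement(cat, "labl").text = label
    return el


def from_ddi(el):
    """Recover the parsed setup info from the simplified <var> element."""
    loc = el.find("location")
    return {
        "name": el.get("name"),
        "start": int(loc.get("StartPos")),
        "end": int(loc.get("EndPos")),
        "label": el.find("txt").text,
        "values": {int(c.find("catValu").text): c.find("labl").text
                   for c in el.findall("catgry")},
    }


# Values taken from the setup-file excerpt on slide 17.
stanum = {"name": "STANUM", "start": 8, "end": 9,
          "label": "State ID", "values": {2: "Alaska"}}
qtype = {"name": "QTYPE", "start": 13, "end": 13,
         "label": "State or National precinct",
         "values": {1: "State", 2: "National"}}

xml_text = ET.tostring(to_ddi(qtype), encoding="unicode")
round_tripped = from_ddi(ET.fromstring(xml_text))
```

If `round_tripped == qtype` holds, nothing was lost for this simple case. More complicated setup files would need the further analysis the slide calls for.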

  18. Compatibility with statistical tools: Stata
      _column(8) int STANUM :STANUM %2f "State ID"
      _column(10) int PRECINCT %3f "Sample precinct number"
      _column(13) int QTYPE :QTYPE %1f "State or National precinct"
      _column(16) int BACKSIDE :BACKSIDE %1f "Backside completion flag"
      _column(17) float WGT %6.3f "Respondent weight"
      label define STANUM 2 "Alaska" ;
      label define QTYPE 1 "State" 2 "National" ;
      • Int/float map to DDI 2.0’s <varFormat>. Q: can all of Stata’s types be mapped into DDI types? • Does “%6.3f” map to DDI? If not, we need to add a place for it. • The notation :STANUM indicates that perhaps formats/categories may be shared by different variables. If this is true, then <catgry> would have to be moved out of <var> • More analysis needed. I’m not too familiar with Stata.

  19. Compatibility with statistical tools: SAS
      PROC FORMAT;
         VALUE STANUM 2='(2) Alaska' ;
         VALUE QTYPE 1='(1) State' 2='(2) National' ;
      INPUT
         STANUM 8-9 QTYPE 13
      LABEL
         STANUM = 'State ID'
         QTYPE = 'State or National precinct'
      FORMAT
         STANUM STANUM. QTYPE QTYPE.
      • PROC FORMAT maps to DDI <catgry> • INPUT maps to DDI <location> • LABEL maps to DDI var.txt • FORMAT associates each variable with a coding format. Multiple variables may be associated with the same format. This will not work with 2.0, for the same reason 2.0 cannot associate multiple variables with the same question. Thus, <catgry> needs to be taken out of <var> for 3.0

  20. Compatibility with MetaDater • Still a lot of reading yet to do on this one….

  21. Compatibility with ISO11179 • Harmonization steps based on Dan Gilman’s 2003 presentation http://www.iassistdata.org/conferences/2003/presentations/ • Goal: seek to harmonize with ISO11179 at the variable model level so that DDI may be used as a transport/exchange format for ISO11179.

  22. ISO/IEC 11179 Core Model and the corresponding DDI 2.0 tags/concepts [diagram] • Conceptual level: Data Element Concept and Conceptual Domain • Representational level: Data Element and Value Domain • Data Element Concept corresponds to <concept> and/or <universe>, with a pointer to an ISO11179 ontology or concept registry. Ontologies also do not exist in DDI 2.0 • Data Element (variables) corresponds to <var> • Value Domain (values) corresponds to <catgry> • Conceptual Domain would correspond to a <concept> within <catgry>. However, catgry.concept does not exist in DDI 2.0

  23. ISO11179 Harmonization Steps • 3.0 harmonization with the ISO11179 model on previous slide • Move <catgry> out of <var>, as different data elements may point to the same value domain. This is not possible if value domain is contained within data element. • Add a <concept> to <catgry> or some means of pointing to the reference domain. • Add a way of pointing to an ontology or registry from the <concept>. This will be explained in the section on “Ontologies”
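The case for moving <catgry> out of <var> can be illustrated with a small data-structure sketch in Python. The names `category_schemes`, `catgry_ref`, and the second variable `QTYPE_ORIG` are hypothetical. The idea is only that a value domain stored once can be pointed to by several data elements.

```python
# Value domains stored once, outside any variable (hypothetical layout).
category_schemes = {
    "QTYPE": {1: "State", 2: "National"},
    "STANUM": {2: "Alaska"},
}

# Variables point to a value domain instead of containing it.
variables = {
    "QTYPE": {"label": "State or National precinct", "catgry_ref": "QTYPE"},
    "QTYPE_ORIG": {"label": "Precinct type as first recorded", "catgry_ref": "QTYPE"},
    "STANUM": {"label": "State ID", "catgry_ref": "STANUM"},
}


def value_label(var_name, code):
    """Look up a code's label by following the variable's pointer."""
    scheme = category_schemes[variables[var_name]["catgry_ref"]]
    return scheme[code]


def sharing(scheme_name):
    """List the variables that point to the same value domain."""
    return sorted(name for name, v in variables.items()
                  if v["catgry_ref"] == scheme_name)
```

With this layout `sharing("QTYPE")` returns both variables, and a correction to a category label is made in one place. This is the same sharing that the SAS `FORMAT` statement and the Stata `:STANUM` notation appear to rely on.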

  24. Additional analysis needed • Changes in the structure for the variable have to be analyzed for its impact on other concerns: • Nested categories • N-Cubes

  25. Overall restructuring plan • Need to identify those components which are intrinsic to a variable and those which are extrinsic or may be shared between variables • Intrinsic: type(wgt, derivation, txt), <recode> • Extrinsic: <sumStat>, <TotlResp> • Shared: <qstn>, <catgry>, <security>, <embargo>, <verStmt> • Extrinsic and shared elements need to be moved out of <var> • Elements necessary for compatibility with other standards need to be added.

  26. Ontologies and Tagging Chapter ?

  27. Rel-tag microformat • Problem: How can we associate keywords to a web page? • Old solution: “meta” keywords in an html page • 2005 solution: rel-tag microformat, popularized by the technorati blog aggregator to allow blog authors to tag content to aid the technorati search engine. • This isn’t the same as the DDI problem but the solution is instructive.

  28. Rel-tag microformat details • Example: <a href="http://technorati.com/tag/tech" rel="tag">technology</a> • The last segment of the path – “tech” – is the tag • The preceding part – http://technorati.com/tag -- is the space which knows what to do with the tag • “technology” is the visible part of the tag • ‘rel=“tag”’ identifies this as a rel-tag rather than a normal anchor • See http://microformats.org/wiki/reltag or google for more details
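The decomposition on slide 28 can be sketched with Python's standard `urllib.parse` module. The function name `split_rel_tag` is made up for this illustration, and decoding `+` as a space is an assumption about how tag spaces treat multi-word tags.

```python
from urllib.parse import urlsplit, unquote_plus


def split_rel_tag(href):
    """Split a rel-tag href into (space, tag).

    The tag is the last path segment; everything before it is the
    space that knows what to do with the tag.
    """
    parts = urlsplit(href)
    space_path, _, tag = parts.path.rstrip("/").rpartition("/")
    space = f"{parts.scheme}://{parts.netloc}{space_path}"
    return space, unquote_plus(tag)


space, tag = split_rel_tag("http://technorati.com/tag/tech")
```

For the example on the slide this gives the space `http://technorati.com/tag` and the tag `tech`. The visible link text ("technology") is not needed to identify the tag.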

  29. DDI ontology problem • Problem: How can we associate words in DDI markup to controlled vocabularies or ontologies such as Madeira, ICPSR social science thesaurus, or ISO11179 concept registry? • Note that the rel-tag microformat already contains 75% of what we need: • The authority • The space • The tag = the keyword • So we can probably modify this to suit our needs

  30. Examples
      <var>
        <concept><a href="http://www.icpsr.umich.edu/socSciThes/crime" rel="ddi">crime</a></concept>
      </var>
      <catgry>
        <concept><a href="http://data-archive.ac.uk/ISO11179/marital+status" rel="ddi">marital status</a></concept>
        <catValu>3</catValu>
        <labl>never been <a href="http://data-archive.ac.uk/Madeira/marriage" rel="ddi">married</a></labl>
      </catgry>

  31. Rel-tag flexibility • Note that tags can occur anywhere and are not restricted to <concept> • The visible part does not have to match the keyword • Different ontologies may be used simultaneously
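To show how a tool might consume such markup, the sketch below collects every `rel="ddi"` anchor from a fragment, wherever it occurs, and reports its space, keyword, and visible text. It uses Python's standard `xml.etree.ElementTree`. The `rel="ddi"` convention and the URLs come from the examples slide, while the function name `ddi_tags` is hypothetical.

```python
import xml.etree.ElementTree as ET

FRAGMENT = """
<catgry>
  <concept><a href="http://data-archive.ac.uk/ISO11179/marital+status" rel="ddi">marital status</a></concept>
  <catValu>3</catValu>
  <labl>never been <a href="http://data-archive.ac.uk/Madeira/marriage" rel="ddi">married</a></labl>
</catgry>
"""


def ddi_tags(xml_text):
    """Return (space, keyword, visible text) for each rel="ddi" anchor."""
    root = ET.fromstring(xml_text.strip())
    found = []
    for a in root.iter("a"):            # anchors may occur anywhere
        if a.get("rel") == "ddi":
            space, _, keyword = a.get("href").rpartition("/")
            found.append((space, keyword, a.text))
    return found


tags = ddi_tags(FRAGMENT)
```

The two anchors in this fragment resolve to different spaces (an ISO11179 registry and Madeira), and in the second one the visible text "married" differs from the keyword "marriage".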

  32. Applications of rel-tags • ISO11179: Rel-tags plus the variable restructuring suggested in the previous chapter “The Variable” give the DDI variable a compatibility with the ISO11179 data element/variable model • Comparative data search: Rel-tags provide a way to implement the “upward-pointing” to a controlled vocabulary that Wendy and Jostein talked about last week. This implementation does not conflict with the variable-variable link mechanism needed for Reto. • Madeira: rel-tags allow Madeira to mark up individual words in <catgry>

  33. Shortcoming • As currently used, rel-tags do not allow for nested tags.

  34. Summary • DDI should look into rel-tags or some variant to be used with ontologies
