Documenting Research Project Process for Reproducibility

Presentation Transcript


  1. Documenting Research Project Process for Reproducibility
     Larry Hoyle, Institute for Policy & Social Research, University of Kansas
     Dagstuhl Presentation 2012 - Larry Hoyle

  2. The challenges
     • Large (or complex) multi-disciplinary projects
     • Multiple sites, data streams, standards, and practices
     • Complex data preparation procedures
     • Point and click software used
     • Documenting as overhead

  3. Example Project
     • Farmers' land use decisions related to climate change (e.g. biofuel-related crops)
     • One component of a larger NSF grant
     • Multiple teams, multiple universities
     • The two main sites are 135 km apart
     • Multi-disciplinary
     • Economists, geographers, agronomists, biologists, engineers, climate scientists, anthropologist, sociologist, political scientists, urban planner, GIS experts, photographer

  4. Example Project Data
     • Develop substantial geodatabase (ARC SDE)
     • Ground cover, soils, crop statistics, facility locations (e.g. purchaser, processing plant), weather, climate, watershed and aquifer models
     • Sub-(farmer’s) field geographic level
     • Climate models at different scales
     • Focus groups and multi-wave survey (geocoded)
     • Interviews coded in NVIVO (geocoded)
     • Photographs
     • Large proprietary dataset with time-limited use
     Challenge: put it all together and document how it was done and how everything relates.
     Other example: IASSIST posting

  5. Spatial Aspects
     • Reconciling different spatial schemes at multiple scales across time
     • Raster images
     • Model grids at different scales
     • Weather point sources, other point locations (e.g. biorefineries)
     • Political entity polygons (state, county)
     • Farm field and sub-field polygons
     • Attribute data at all these levels, imputed and aggregated data
     • Harmonizing data from different geographic schemes
     • Producing new spatial objects
     • E.g. field corners treated as separate from the circle under center-pivot irrigation

  6. New Polygons
     Polygons to be extracted from remote sensing imagery
     Subfield areas sometimes grow different crops (corners are 21% of the square)
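
The 21% figure can be checked with a little geometry. The sketch below assumes the center-pivot circle is exactly inscribed in a square field (an idealization, not something the slide states), so the corners cover 1 - π/4 of the square. The function name is illustrative only.

```python
import math


def corner_fraction() -> float:
    """Fraction of a square field that lies outside its inscribed circle.

    Assumes an idealized center-pivot layout: the irrigated circle touches
    all four sides of the square field.
    """
    side = 1.0            # side length of the square (any unit)
    radius = side / 2     # inscribed circle radius
    square_area = side ** 2
    circle_area = math.pi * radius ** 2
    return (square_area - circle_area) / square_area


fraction = corner_fraction()
print(f"Corners cover {fraction:.1%} of the square")
```

Because the result is a ratio, it does not depend on the size of the field; real fields will deviate from it to the extent that the pivot circle does not reach the field edges.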

  7. Need to Capture Process: Example 1
     • A project member with expertise volunteered to process data to produce a spatial dataset (soils data).
     • Users of the dataset discover anomalies.
     • The expert is no longer available, can’t remember quite what he did, and has no documentation (used point and click tools).
     • Ouch

  8. Process Example 2
     • Qualitative analysis
     • Transcription
     • Multiple coders, common coding scheme
     • Coding scheme evolves (capture this?)
     • Training
     • Paired coders code each interview
     • Testing of coder reliability
     • Integrate this after the fact with geodatabase
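
The slide does not say which reliability statistic the project used. As one common option for paired coders, the sketch below computes Cohen's kappa, which corrects raw agreement for the agreement expected by chance. The function name and the code labels are hypothetical, not taken from the project's coding scheme.

```python
from collections import Counter


def cohens_kappa(codes_a, codes_b):
    """Cohen's kappa for two coders who each assigned one code per segment."""
    if len(codes_a) != len(codes_b) or not codes_a:
        raise ValueError("Both coders must code the same non-empty set of segments")
    n = len(codes_a)
    # Observed agreement: share of segments where both coders chose the same code.
    observed = sum(a == b for a, b in zip(codes_a, codes_b)) / n
    # Chance agreement: product of each coder's marginal proportions, summed over codes.
    freq_a, freq_b = Counter(codes_a), Counter(codes_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a.keys() | freq_b.keys()) / n ** 2
    if expected == 1.0:
        return 1.0  # both coders used a single identical code throughout
    return (observed - expected) / (1 - expected)


# Hypothetical codes assigned by a pair of coders to six interview segments.
coder_1 = ["water", "price", "water", "policy", "price", "water"]
coder_2 = ["water", "price", "policy", "policy", "price", "water"]
kappa = cohens_kappa(coder_1, coder_2)
print(f"kappa = {kappa:.2f}")
```

Recording the kappa value together with the version of the coding scheme in force at the time is one way to capture how the scheme evolves.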

  9. Point and Click
     • Some tools are only point and click and don’t create a log.
     • E.g. some procedures in ArcGIS
     • How do you document the process?
     • Screen capture pasted into Word?
     • Action recording software
     • Discoverable? Machine actionable?

  10. An ArcGIS process (different project)
     NSFCHEMAnnualDataProcedure.docx
     AnnualLinksByTime4.avi

  11. Need Tools
     • There is a need for tools built on top of standards that make it easy to capture and annotate process

  12. Need Tools to Capture Process
     One example: SAS Enterprise Guide
     • Can modify nodes during development.
     • Can run the process from any point.
     • But the overall process may involve multiple tools, in this case also R and ArcGIS. In other cases, multiple people in different settings.
     • Scott Long - The Workflow of Data Analysis Using Stata
     • http://www.indiana.edu/~jslsoc/web_workflow/wf_home.htm
     Datasets: permanent and temporary

  13. Capturing Process as it is Being Developed
     • False starts and blind alleys
     • Does the whole process matter or only a process that reproduces the final result? (learn from my mistakes?)
     • Description of process gets edited as it evolves
     • Adding minimal overhead
     • If the tool requires a lot of attention it won’t get used.
     • Combining sub-processes
     • Filling in pieces of overall planned project
     • Parallel parts
     • Time as ordinal or interval (or ratio?)

  14. Tools – The Fantasy
     • Annotated screen capture – works on top of any software
     • Text (or audio/video?) annotation
     • Dealing with IP in captured images
     • Flow diagram with popups?
     • Editable
     • Time stamped
     Sub process edited separately
     Persistent identifiers allow (re-)linking
     Planned overall process
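
As a rough illustration of the wish list above, the sketch below models a process log in which every step is time stamped, carries an editable text annotation, and has a persistent identifier that later steps can link to. It is only one possible shape for such a tool: `ProcessStep`, `ProcessLog`, and all field names are hypothetical, and screen capture itself is out of scope here.

```python
import json
import uuid
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass
class ProcessStep:
    tool: str                       # software used for the step, e.g. "ArcGIS"
    annotation: str                 # free-text description, editable later
    follows: list = field(default_factory=list)  # ids of steps this one builds on
    step_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    recorded_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


class ProcessLog:
    def __init__(self):
        self.steps = {}  # step_id -> ProcessStep, in recording order

    def record(self, tool, annotation, follows=()):
        """Add a time-stamped step and return its persistent identifier."""
        missing = [s for s in follows if s not in self.steps]
        if missing:
            raise KeyError(f"Unknown step ids: {missing}")
        step = ProcessStep(tool, annotation, list(follows))
        self.steps[step.step_id] = step
        return step.step_id

    def annotate(self, step_id, annotation):
        """Edit a step's description; its id and links stay unchanged."""
        self.steps[step_id].annotation = annotation

    def to_json(self):
        return json.dumps([asdict(s) for s in self.steps.values()], indent=2)


log = ProcessLog()
clip_id = log.record("ArcGIS", "Clipped soils layer to the study area")
agg_id = log.record("R", "Aggregated soil attributes to field polygons",
                    follows=[clip_id])
log.annotate(clip_id, "Clipped soils layer to the study area (county boundary)")
print(log.to_json())
```

Because links are stored as identifiers rather than positions, a sub-process can be edited separately and re-linked into the planned overall process without renumbering anything.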

  15. Final thoughts
     • Metadata for the audience
     • Documentation for reproducibility
     • Documentation in cases of disputed results
     • Sometimes the researcher is the audience
     • One researcher commented that having documentation at this level would be very helpful in writing methods sections of papers.
     • Teaching tool: critique students' process
     • Assists in refining methods
     • Also useful in future similar projects
