ES3 Takes the Provenance Challenge

This presentation explores challenges in handling data lineage in environmental science research. It compares active lineage wrappers with passive probulators, discusses the probulator flavors (instrumentation, overriding, and passive monitoring), and covers the ES3 lineage architecture and probulating IDL. A link to an example shell script is included.

Presentation Transcript


  1. ES3 Takes the Provenance Challenge
     JAMES FREW
     Donald Bren School of Environmental Science and Management
     University of California, Santa Barbara
     frew@bren.ucsb.edu

  2. SCA fraction, Sierra Nevada, 07 Mar 2004

  3. Model structure: MODIS SCA / albedo

  4. From Wrappers to Probulators
     Wrappers: Active Lineage
     Pros:
       • Complete control over what gets recorded
       • Single language/API for all wrapped events
       • Not tied to execution
       • You can even lie about what happened
     Cons:
       • Must explicitly script everything
       • Scripts can drift from reality
       • You can even lie about what happened
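To make the active-lineage trade-off concrete, here is a minimal Python sketch of a wrapper. The helper name `run_wrapped` and the record fields are hypothetical, not part of ES3. The point it illustrates is that the wrapper records whatever inputs and outputs its author declares, so the record can drift from, or simply contradict, what the program actually did.

```python
import subprocess
import sys
from datetime import datetime, timezone


def run_wrapped(step, argv, inputs, outputs, log):
    """Run one processing step and append an explicitly scripted lineage record.

    Hypothetical helper: `inputs` and `outputs` are whatever the wrapper author
    declares. Nothing checks them against the files the program really touched,
    which is how a wrapper script can drift from reality.
    """
    record = {
        "step": step,
        "argv": list(argv),
        "inputs": list(inputs),    # declared, not observed
        "outputs": list(outputs),  # declared, not observed
        "start": datetime.now(timezone.utc).isoformat(),
    }
    result = subprocess.run(argv, capture_output=True, text=True)
    record["returncode"] = result.returncode
    log.append(record)
    return result


if __name__ == "__main__":
    lineage = []
    # The child process reads nothing, yet the record claims it read snow.pic.
    run_wrapped("cleanse", [sys.executable, "-c", "pass"],
                inputs=["snow.pic"], outputs=["snow_clean.pic"], log=lineage)
    print(lineage[0]["inputs"], lineage[0]["returncode"])
```

Because every event goes through one function, the record format is uniform and independent of the program's language, which is the "single API" advantage; the same property is what lets the declared lineage be wrong.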

  5. From Wrappers to Probulators
     Probulators: Passive Lineage
     Pros:
       • Record what actually happened
         • Not just what you think happened
         • Not what didn’t happen
       • Automatic: don’t have to write new scripts for everything
     Cons:
       • Different flavors for different environments
       • Can’t just do everything in {…insert favorite language here…}

  6. Probulator flavors
     • Instrumentation
       • Insert lineage capture instructions directly into science codes
       • e.g. “I just created file ‘foo’”
       • Typical implementation: preprocessor/precompiler
     • Overriding
       • Replace standard routines/libraries with lineage-capturing versions
       • e.g. open(…) → snoopy_open(…)
       • Typical implementation: modify execution environment
         • environment variables
         • configuration files
     • Passive monitoring
       • Trace program execution
       • e.g. “called open() with args = foo, bar, …”
       • Typical implementation: strace’d shell
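The overriding flavor can be sketched in Python by temporarily replacing `builtins.open` with a lineage-capturing version, in the spirit of open(…) → snoopy_open(…). This is an illustrative analog only; `snoop_opens` and the event fields are hypothetical names, and ES3's own overriding mechanism is not shown here. The science code keeps calling plain `open()`, so it needs no edits.

```python
import builtins
import contextlib
import os
import tempfile


@contextlib.contextmanager
def snoop_opens(events):
    """Temporarily replace builtins.open with a version that logs each call.

    Hypothetical helper. Only calls that resolve to builtins.open are seen;
    os.open, io.open, pathlib, and native libraries bypass this override.
    """
    real_open = builtins.open

    def snoopy_open(file, mode="r", *args, **kwargs):
        # Record the call as it actually happened, then pass it through.
        name = file if isinstance(file, int) else os.fspath(file)
        events.append({"file": name, "mode": mode})
        return real_open(file, mode, *args, **kwargs)

    builtins.open = snoopy_open
    try:
        yield events
    finally:
        builtins.open = real_open  # always restore the standard routine


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "snow.pic")
        seen = []
        with snoop_opens(seen):
            with open(path, "w") as fh:
                fh.write("0")
        print(seen)
```

A context manager keeps the override scoped and guarantees the original routine is restored even if the wrapped code raises, which mirrors "modify the execution environment" without permanently changing it.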

  7. ES3 Lineage Architecture

  8. Probulating IDL: Instrumenting the code
     ;edit
     pro modscag_cleanse,prefix=prefix,ns=ns,nl=nl
     HELP, NAMES="*", OUTPUT=ES3_ENVIROMENT & ES3_LOG, $
        ENTER="modscag_cleanse", ENVIROMENT=ES3_ENVIROMENT
     ; clean up {under,over}flow of MODSCAG run
     ;
     ; Input:  prefix = prefix for all of the MODSCAG output filenames
     ;         ns = number of samples
     ;         nl = number of lines
     ; Output: rewrite of the MODSCAG files
     ;
     ; t.h.painter / 1.19.2005
     ; open snow file
     ES3_openr,1,string(prefix,'snow.pic')
     snow=fltarr(ns,nl)
     readu,1,snow
     [ blah blah blah ]
     HELP, NAMES="*", OUTPUT=ES3_ENVIROMENT & ES3_LOG, LEAVE="modscag_cleanse", $
        ENVIROMENT=ES3_ENVIROMENT
     END ; modscag_cleanse
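The IDL listing brackets the routine body with ENTER and LEAVE logging calls. A rough Python analog of that instrumentation pattern is a decorator that snapshots the routine's arguments on entry and logs the exit. The decorator name `es3_region` and the tuple log format are hypothetical; unlike HELP, NAMES="*" in the IDL, this sketch captures only the bound arguments, not every variable in scope, and records nothing but the region name on exit.

```python
import functools
import inspect


def es3_region(log):
    """Hypothetical decorator: log entry (with argument values) and exit of a region."""
    def decorate(func):
        sig = inspect.signature(func)

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Snapshot the arguments, including defaults, much like an ENTER dump.
            bound = sig.bind(*args, **kwargs)
            bound.apply_defaults()
            snapshot = {name: repr(value) for name, value in bound.arguments.items()}
            log.append(("enter", func.__name__, snapshot))
            try:
                return func(*args, **kwargs)
            finally:
                # Logged even if the body raises, like a LEAVE at every exit.
                log.append(("leave", func.__name__, {}))
        return wrapper
    return decorate


if __name__ == "__main__":
    events = []

    @es3_region(events)
    def modscag_cleanse(prefix, ns=2, nl=2):
        return ns * nl

    modscag_cleanse("till")
    for event in events:
        print(event)
```

Putting the logging in a `finally` clause is a design choice the IDL version does not need to make explicitly: it guarantees that every ENTER has a matching LEAVE in the log.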

  9. Probulating IDL: Results
     <init time="20050522T234606Z" pid="31002" stime="20050522T234604Z" pstime="20050522T234256Z" ppid="30920" language="idl" user="haavar" hostname="spitting-duck.bren.ucsb.edu">
       <enviroment>
         <variable name="!PATH" value="/home/haavar/probulator//idl: /home/rsi/idl_6.1/lib/hook: […]
       </enviroment>
       <mount-points>
         <mount share="dab15:/ed15/rsi" type="nfs">/home/rsi</mount>
       </mount-points>
     </init>
     <enter region="modscag_cleanse">
       <enviroment>
         <variable type="INT" name="NL" value="2"/>
         <variable type="INT" name="NS" value="2"/>
         […]
       </enviroment>
     </enter>
     <exec time="20050522T234610Z" routine="OPENR">
       <io>
         <file read="true">/home/haavar/painter/data/tillsnow.pic</file>
       </io>
     </exec>
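Once lineage is logged as XML, it can be queried with standard tools. The sketch below uses Python's `xml.etree.ElementTree` on an abbreviated copy of the slide's output: the elided parts (the `init` element and the truncated `!PATH` value) are left out, and the `<es3-log>` root element is a hypothetical addition because the slide shows only a fragment. Element and attribute names, including the `enviroment` spelling, are copied from the slide; the helper function names are illustrative.

```python
import xml.etree.ElementTree as ET

# Abbreviated from the slide; <es3-log> is a hypothetical wrapper element.
SAMPLE = """<es3-log>
  <enter region="modscag_cleanse">
    <enviroment>
      <variable type="INT" name="NL" value="2"/>
      <variable type="INT" name="NS" value="2"/>
    </enviroment>
  </enter>
  <exec time="20050522T234610Z" routine="OPENR">
    <io>
      <file read="true">/home/haavar/painter/data/tillsnow.pic</file>
    </io>
  </exec>
</es3-log>"""


def files_read(xml_text):
    """Return (routine, time, path) for every file an <exec> event read."""
    root = ET.fromstring(xml_text)
    reads = []
    for ex in root.iter("exec"):
        for f in ex.iter("file"):
            if f.get("read") == "true":
                reads.append((ex.get("routine"), ex.get("time"), (f.text or "").strip()))
    return reads


def region_variables(xml_text, region):
    """Return {name: (type, value)} captured when the named region was entered."""
    root = ET.fromstring(xml_text)
    for enter in root.iter("enter"):
        if enter.get("region") == region:
            return {v.get("name"): (v.get("type"), v.get("value"))
                    for v in enter.iter("variable")}
    return {}


if __name__ == "__main__":
    print(files_read(SAMPLE))
    print(region_variables(SAMPLE, "modscag_cleanse"))
```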

  10. Probulating a shell script: Example
      http://twiki.ipaw.info/bin/view/Challenge/ES3#Example
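For the passive-monitoring flavor, a shell script can be run under strace and the trace mined afterwards. The sketch below assumes a trace captured with something like `strace -f -e trace=open,openat -o trace.log sh script.sh`; it is not the ES3 implementation, and the function name `file_accesses` is hypothetical. It handles the common single-line `open`/`openat` forms, the `[pid N]` and bare `N` prefixes strace may add when following forks, and skips failed opens; split `<unfinished ...>`/`resumed` lines are simply ignored.

```python
import re

# Matches e.g. 'openat(AT_FDCWD, "/x", O_RDONLY) = 3' or '[pid 7] open("y", O_WRONLY|O_CREAT, 0666) = 4'
OPEN_RE = re.compile(
    r'^(?:\[pid\s+(?P<pid1>\d+)\]\s+|(?P<pid2>\d+)\s+)?'
    r'(?P<call>openat|open)\((?:[^,]+,\s*)?"(?P<path>[^"]*)",\s*'
    r'(?P<flags>[A-Z_|]+)[^)]*\)\s*=\s*(?P<ret>-?\d+)'
)

WRITE_FLAGS = {"O_WRONLY", "O_RDWR", "O_CREAT", "O_TRUNC", "O_APPEND"}


def file_accesses(lines):
    """Return one dict per successful open recorded in strace output lines."""
    accesses = []
    for line in lines:
        m = OPEN_RE.match(line.strip())
        if not m or int(m.group("ret")) < 0:
            continue  # not an open call, or the open failed (e.g. ENOENT)
        flags = set(m.group("flags").split("|"))
        pid = m.group("pid1") or m.group("pid2")
        accesses.append({
            "pid": int(pid) if pid else None,
            "path": m.group("path"),
            "read": "O_RDONLY" in flags or "O_RDWR" in flags,
            "write": bool(flags & WRITE_FLAGS),
        })
    return accesses


if __name__ == "__main__":
    import sys
    with open(sys.argv[1]) as trace:  # path to a saved strace log
        for access in file_accesses(trace):
            print(access)
```

Because the records come from the kernel-level trace, they reflect what the script actually opened rather than what anyone declared, at the cost of depending on strace's output format.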