This presentation explores the challenges of recording data lineage in environmental science research. It compares active lineage wrappers with passive probulators and discusses probulator flavors (instrumentation, overriding, and passive monitoring). It also covers the ES3 Lineage Architecture and probulating IDL, and includes an example of probulating a shell script.
ES3 Takes the Provenance Challenge
JAMES FREW
Donald Bren School of Environmental Science and Management
University of California, Santa Barbara
frew@bren.ucsb.edu
From Wrappers to Probulators
Wrappers: Active Lineage
• +
  • Complete control over what gets recorded
  • Single language/API for all wrapped events
  • Not tied to execution
  • You can even lie about what happened
• -
  • Must explicitly script everything
  • Scripts can drift from reality
  • You can even lie about what happened
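The following minimal Python sketch shows an active lineage wrapper. It assumes a hypothetical `LineageLog` class that is not part of ES3, and the file names are illustrative. The wrapper records only what the caller declares, so a step that writes nothing can still be logged as producing an output. This is how wrapper scripts drift from reality, or lie.

```python
import json
import time
from typing import Callable, Sequence


class LineageLog:
    """Hypothetical in-memory lineage store for an active wrapper."""

    def __init__(self) -> None:
        self.records: list[dict] = []

    def run_step(self, name: str, func: Callable[[], None],
                 inputs: Sequence[str], outputs: Sequence[str]) -> None:
        """Run one wrapped step and record its *declared* lineage."""
        start = time.time()
        func()  # the step itself is not observed
        # Inputs/outputs come from the script author, not from execution,
        # so nothing checks that they match what func() really touched.
        self.records.append({
            "step": name,
            "inputs": list(inputs),
            "outputs": list(outputs),
            "start": start,
            "end": time.time(),
        })

    def to_json(self) -> str:
        return json.dumps(self.records, indent=2)


if __name__ == "__main__":
    log = LineageLog()
    # This step writes nothing, yet an output is recorded anyway.
    log.run_step("cleanse", lambda: None,
                 inputs=["snow.pic"], outputs=["snow_clean.pic"])
    print(log.to_json())
```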
From Wrappers to Probulators
Probulators: Passive Lineage
• +
  • Record what actually happened
    • Not just what you think happened
    • Not what didn’t happen
  • Automatic: don’t have to write new scripts for everything
• -
  • Different flavors for different environments
  • Can’t just do everything in {…insert favorite language here…}
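For contrast, here is a passive sketch in Python. It assumes `sys.addaudithook` (Python 3.8+), which observes the interpreter's `open` audit events while unmodified science code runs. The log therefore reflects files actually opened, not files someone declared. This is an illustrative analogue, not how ES3's probulators work, and `lineage_hook` and `science_step` are hypothetical names.

```python
import os
import sys
import tempfile

# (path, mode) pairs the interpreter actually opened while the hook is active.
observed: list[tuple[str, str]] = []


def lineage_hook(event: str, args: tuple) -> None:
    """Audit hook: record every "open" event (args begin with path, mode)."""
    if event == "open":
        path, mode = args[0], args[1]
        observed.append((str(path), str(mode)))


# Audit hooks stay installed for the life of the process.
sys.addaudithook(lineage_hook)


def science_step(src: str, dst: str) -> None:
    """Unmodified 'science code': it contains no lineage calls at all."""
    with open(src) as f_in, open(dst, "w") as f_out:
        f_out.write(f_in.read().upper())


if __name__ == "__main__":
    tmp = tempfile.mkdtemp()
    src = os.path.join(tmp, "in.txt")
    dst = os.path.join(tmp, "out.txt")
    with open(src, "w") as f:
        f.write("snow\n")
    observed.clear()
    science_step(src, dst)
    for path, mode in observed:
        print(mode, path)
```

The trade-off shown in the slide applies here. The mechanism is automatic, but it is specific to one environment (the Python interpreter).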
Probulator flavors
• Instrumentation
  • Insert lineage capture instructions directly into science codes
    • e.g. “I just created file ‘foo’”
  • Typical implementation: preprocessor/precompiler
• Overriding
  • Replace standard routines/libraries with lineage-capturing versions
    • e.g. open(…) → snoopy_open(…)
  • Typical implementation: modify execution environment
    • environment variables
    • configuration files
• Passive monitoring
  • Trace program execution
    • e.g. “called open() with args = foo, bar, …”
  • Typical implementation: strace’d shell
Probulating IDL: Instrumenting the code
;edit
pro modscag_cleanse,prefix=prefix,ns=ns,nl=nl
HELP, NAMES="*", OUTPUT=ES3_ENVIROMENT & ES3_LOG, $
    ENTER="modscag_cleanse", ENVIROMENT=ES3_ENVIROMENT
; clean up {under,over}flow of MODSCAG run
;
; Input:  prefix = prefix for all of the MODSCAG output filenames
;         ns = number of samples
;         nl = number of lines
; Output: rewrite of the MODSCAG files
;
; t.h.painter / 1.19.2005

; open snow file
ES3_openr,1,string(prefix,'snow.pic')
snow=fltarr(ns,nl)
readu,1,snow
[ blah blah blah ]
HELP, NAMES="*", OUTPUT=ES3_ENVIROMENT & ES3_LOG, LEAVE="modscag_cleanse", $
    ENVIROMENT=ES3_ENVIROMENT
END ; modscag_cleanse
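The same ENTER/LEAVE instrumentation pattern can be sketched in Python as a hypothetical `es3_region` decorator. It writes `<enter>`/`<leave>` records holding a snapshot of the routine's arguments. The element names, including the `enviroment` spelling, are copied from the ES3 log. Everything else is an assumption: the `log` root, deriving the type label from the Python type name, the empty `<leave>` snapshot, and the argument values.

```python
import functools
import inspect
import xml.etree.ElementTree as ET

# Hypothetical root element collecting lineage records for one run.
LOG = ET.Element("log")


def _record(tag: str, region: str, variables: dict) -> None:
    """Append an <enter>/<leave> record with an <enviroment> snapshot."""
    rec = ET.SubElement(LOG, tag, region=region)
    env = ET.SubElement(rec, "enviroment")  # spelling copied from the ES3 log
    for name, value in variables.items():
        ET.SubElement(env, "variable", type=type(value).__name__.upper(),
                      name=name.upper(), value=str(value))


def es3_region(func):
    """Instrument func: log its bound arguments on entry, and its exit."""
    sig = inspect.signature(func)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bound = sig.bind(*args, **kwargs)
        bound.apply_defaults()
        _record("enter", func.__name__, dict(bound.arguments))
        try:
            return func(*args, **kwargs)
        finally:
            # This sketch takes no variable snapshot on exit.
            _record("leave", func.__name__, {})
    return wrapper


@es3_region
def modscag_cleanse(prefix: str, ns: int, nl: int) -> None:
    """Stand-in for the science routine; its body is elided."""
    pass


if __name__ == "__main__":
    modscag_cleanse("till", ns=2, nl=2)
    print(ET.tostring(LOG, encoding="unicode"))
```

Unlike the IDL version, this sketch instruments the routine from outside, with a decorator, rather than editing its body. The effect is the same: the science code itself gains lineage calls.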
Probulating IDL: Results
<init time="20050522T234606Z" pid="31002" stime="20050522T234604Z" pstime="20050522T234256Z"
      ppid="30920" language="idl" user="haavar" hostname="spitting-duck.bren.ucsb.edu">
  <enviroment>
    <variable name="!PATH" value="/home/haavar/probulator//idl: /home/rsi/idl_6.1/lib/hook: […]
  </enviroment>
  <mount-points>
    <mount share="dab15:/ed15/rsi" type="nfs">/home/rsi</mount>
  </mount-points>
</init>
<enter region="modscag_cleanse">
  <enviroment>
    <variable type="INT" name="NL" value="2"/>
    <variable type="INT" name="NS" value="2"/>
    […]
  </enviroment>
</enter>
<exec time="20050522T234610Z" routine="OPENR">
  <io>
    <file read="true">/home/haavar/painter/data/tillsnow.pic</file>
  </io>
</exec>
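This sketch reads file-level lineage back out of such a log with Python's `xml.etree.ElementTree`. The slide's excerpt is elided and not well-formed, so `SAMPLE` is a hand-made fragment in the same shape, and its `<log>` root is an assumption. Only the `read="true"` attribute seen in the excerpt is handled; how writes are marked is not shown.

```python
import xml.etree.ElementTree as ET

# Hand-made, well-formed fragment in the shape of the log excerpt;
# the <log> root element is an assumption.
SAMPLE = """
<log>
  <enter region="modscag_cleanse"/>
  <exec time="20050522T234610Z" routine="OPENR">
    <io>
      <file read="true">/home/haavar/painter/data/tillsnow.pic</file>
    </io>
  </exec>
</log>
"""


def files_read(xml_text: str) -> list[tuple[str, str, str]]:
    """Return (time, routine, path) for each <file read="true"> in an <exec>."""
    root = ET.fromstring(xml_text.strip())
    events = []
    for ex in root.iter("exec"):
        for f in ex.iter("file"):
            if f.get("read") == "true":
                events.append((ex.get("time"), ex.get("routine"),
                               (f.text or "").strip()))
    return events


if __name__ == "__main__":
    for when, routine, path in files_read(SAMPLE):
        print(when, routine, path)
```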
Probulating a shell script: Example
• http://twiki.ipaw.info/bin/view/Challenge/ES3#Example
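Passive monitoring of a shell script can be sketched as an strace'd shell plus a trace parser. It assumes a capture such as `strace -f -e trace=open,openat -o trace.txt sh script.sh`, where `script.sh` is a placeholder. The Python below then recovers the files that were successfully opened for reading or writing. `file_lineage` is a hypothetical helper. It skips calls that `-f` splits across `<unfinished ...>`/`resumed` lines, and it does not handle escaped quotes in paths.

```python
import re
import sys
from typing import Iterable

# One open()/openat() record, optionally prefixed by the pid that -f adds.
OPEN_RE = re.compile(
    r'(?:(?P<pid>\d+)\s+)?'
    r'(?P<call>openat|open)\('
    r'(?:[^,"]+,\s*)?'               # openat's dirfd argument, e.g. AT_FDCWD
    r'"(?P<path>[^"]*)",\s*'         # the path (escaped quotes not handled)
    r'(?P<flags>[A-Z_|]+)'           # e.g. O_WRONLY|O_CREAT|O_TRUNC
    r'[^)]*\)\s*=\s*(?P<ret>-?\d+)'  # optional mode arg, then return value
)


def file_lineage(lines: Iterable[str]) -> tuple[set[str], set[str]]:
    """Return (inputs, outputs): paths successfully opened to read / write."""
    inputs: set[str] = set()
    outputs: set[str] = set()
    for line in lines:
        m = OPEN_RE.match(line.strip())
        if m is None or int(m["ret"]) < 0:
            continue  # not a complete open record, or the open failed
        flags = m["flags"].split("|")
        if "O_WRONLY" in flags or "O_RDWR" in flags:
            outputs.add(m["path"])
        if "O_WRONLY" not in flags:
            inputs.add(m["path"])  # O_RDONLY or O_RDWR
    return inputs, outputs


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python parse_trace.py trace.txt
    with open(sys.argv[1]) as trace:
        ins, outs = file_lineage(trace)
    print("inputs:", sorted(ins))
    print("outputs:", sorted(outs))
```

Failed opens (negative return values) are dropped, which matches the passive-lineage goal of recording only what actually happened.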