Instant Karma Collecting Provenance for AMSR-E. Beth Plale Director, Data to Insight Center Indiana University Helen Conover Information Technology and Systems Center, University of Alabama in Huntsville Joint AMSR-E Science Team Meeting June 2-3, 2010 Huntsville, AL.
Instant Karma: Applying a Proven Provenance Tool to NASA’s AMSR-E Data Production Stream
PI: Michael Goodman, NASA MSFC

• Improve the collection, preservation, utility and dissemination of provenance information within the NASA Earth Science community
• Customize and integrate Karma, a proven provenance tool, into NASA data production
• Collect and disseminate provenance of AMSR-E (Advanced Microwave Scanning Radiometer – Earth Observing System) standard data products, initially focusing on Sea Ice
• Engage the Sea Ice science team and user community
• Adhere to the Open Provenance Model (OPM)

• Evaluate current AMSR-E SIPS product generation (06/10)
• Extend Karma provenance collection tools for SIPS (09/10)
• Enhance Karma Provenance Browser interface (10/10)
• Instrument AMSR-E Sea Ice production in Testbed (12/10)
• Evaluate with Sea Ice science team (03/11)
• Introduce Provenance Browser to NSIDC DAAC (06/11)
• Instrument AMSR-E Sea Ice production in Ops (09/11)
• Evaluate with AMSR-E Sea Ice user community (02/12)
• Instrument other AMSR-E data streams (02/12)

• Apply Karma to Sea Ice data production workflows
• Customize Karma’s provenance dissemination user interface
• Evaluate usefulness of provenance collected
• Measure traffic to Karma Provenance Browser
• Collect user feedback
• Expand use of Karma to other AMSR-E data production streams

Thorsten Markus, NASA GSFC; Beth Plale, Indiana University; Rahul Ramachandran, Helen Conover, UAHuntsville
TRLcurrent = 7; TRLin = 7; 11/09
Types of Provenance Information
• Lots of information already available, but scattered across multiple locations
  • Processing system configuration
  • Dataset and file level metadata
  • Processing history information
  • Quality assurance information
  • Software documentation (e.g., algorithm theoretical basis documents, release notes)
  • Data documentation (e.g., guide documents, README files)
• Instant Karma project aims to collate and organize information from multiple sources
Sea Ice Processing Flow and Dependencies
[Flow diagram; recoverable labels: Daily Processing Script; Delivered Algorithm Package; Sea Ice Algorithm; Default Multi-year Ice Mask; One day’s worth of Level-2A Tbs (brightness temperatures); Mask files; Sea Ice 6.25 km; Sea Ice 12.5 km with Ice Mask and Snow Melt Mask; Sea Ice 25 km with Ice Mask and Snow Melt Mask]
The Ice Mask and Snow Melt Mask are running 5-day averages that are updated and replaced daily. Masks generated yesterday are used for today’s products.
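The mask dependency on this slide (a running 5-day average, with yesterday's mask feeding today's product) can be sketched in a few lines. This is only an illustration under stated assumptions, not the SIPS code: the `RunningMask` class, the `process_days` function, the `initial_mask` argument, and the representation of a mask as a short list of per-pixel numbers are all hypothetical.

```python
from collections import deque


class RunningMask:
    """Hypothetical per-pixel running average over a fixed window of days."""

    def __init__(self, window_days=5):
        # The oldest day drops out automatically once the window is full.
        self.days = deque(maxlen=window_days)

    def update(self, daily_values):
        """Add one day's values and return the new averaged mask."""
        self.days.append(list(daily_values))
        n = len(self.days)
        return [sum(day[i] for day in self.days) / n
                for i in range(len(daily_values))]


def process_days(daily_inputs, initial_mask):
    """Process days in order; each day uses the mask generated the day before.

    `initial_mask` is an assumed starting mask for the first day,
    when no earlier mask exists.
    """
    running = RunningMask()
    mask_in_use = initial_mask
    used = []
    for values in daily_inputs:
        used.append(mask_in_use)              # today's product uses yesterday's mask
        mask_in_use = running.update(values)  # replaces the mask for tomorrow
    return used


# Made-up two-pixel inputs for three consecutive days.
days = [[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
masks_used = process_days(days, initial_mask=[0.0, 0.0])
```

Because each day's mask depends on the previous days' inputs, a gap in the input data carries forward into later products, which is the kind of dependency provenance is meant to expose.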
[Diagram; recoverable labels: Karma provenance collection and representation; Karma analysis tool suite and portal. Legend (symbol not recoverable): optionally installed in future]
AMSR-E daily processing workflow
• Workflow executes once for each day of input files received
• Uses configuration files, data files, mask files
• Invokes processes, programs, algorithms
• Generates data files, images
Karma 3.0 architecture
[Architecture diagram; recoverable labels:
• Clients: Instrumented apps; Graph Viz client; Query client; Preserv client; Client Toolkit
• Messaging and web services: WS messenger Bus (future); WSM; other; Axis 2; Synchronous ingest Web service; Subscriber Interface (provenance listener); Query Service; RESTful Service
• Data formats: XML events; OPM 1.0 XML; OPM 1.0 RDF; Preservation object
• Core: Prov Track lib; Notification Ingester Interface; Ingester Implementer Interface; Knowledge discovery (inferencing, quality, completeness)
• Storage: Relational store; Database Setup script; Xregistry (optional); XMC Cat metadata catalog (optional)]
Karma Architecture
• Service Core
  • Bridge pattern for independent Ingester and IngesterImplementer implementations
  • Core components for ingesting notifications
  • Asynchronously shreds raw notifications to populate tables
• Axis2 Web Service Layer
  • API layer to ingest notifications pushed by clients
  • Also allows another layer to ingest notifications by pulling from the message bus
• Axis2 Handlers
  • Gather information by intercepting SOAP messages from host services
  • Minimal intrusiveness and lightweight instrumentation
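The Service Core bullets describe a bridge between an Ingester and an IngesterImplementer, with raw notifications shredded asynchronously. A minimal sketch of that arrangement follows. It is written in Python for brevity and is not Karma's code: the method names, the `"kind:subject"` notification format, the in-memory store standing in for the relational tables, and the file names are all assumptions made for illustration.

```python
import queue
import threading
from abc import ABC, abstractmethod


class IngesterImplementer(ABC):
    """Implementation side of the bridge: how a shredded notification is stored."""

    @abstractmethod
    def store(self, record):
        ...


class InMemoryImplementer(IngesterImplementer):
    """Stand-in for a relational store; keeps rows in a list."""

    def __init__(self):
        self.rows = []

    def store(self, record):
        self.rows.append(record)


class Ingester:
    """Abstraction side of the bridge: accepts raw notifications from clients."""

    def __init__(self, implementer):
        self.implementer = implementer
        self.raw = queue.Queue()
        self.worker = threading.Thread(target=self._shred_loop, daemon=True)
        self.worker.start()

    def ingest(self, raw_notification):
        # Returns immediately; shredding happens on the worker thread.
        self.raw.put(raw_notification)

    def _shred_loop(self):
        while True:
            raw = self.raw.get()
            try:
                if raw is None:  # sentinel: stop the worker
                    return
                # Assumed raw format: "kind:subject"
                kind, _, subject = raw.partition(":")
                self.implementer.store({"kind": kind, "subject": subject})
            finally:
                self.raw.task_done()

    def close(self):
        """Ask the worker to stop and wait until the queue is drained."""
        self.raw.put(None)
        self.raw.join()


store = InMemoryImplementer()
ingester = Ingester(store)
ingester.ingest("produces:seaice_12km.hdf")   # hypothetical file names
ingester.ingest("consumes:ice_mask.hdf")
ingester.close()
```

Swapping `InMemoryImplementer` for a database-backed class would not require any change to `Ingester`, which is the point of separating the two sides.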
Scavenging for Stand-alone Provenance Collection
• Collects provenance using scavenging
• Uses existing collection mechanisms
  • e.g., logging tool, auditing tool
• Low burden on both users and programmers
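As a rough illustration of scavenging, the sketch below turns existing log lines into provenance events without touching the program that wrote them. The log format, the `READ`/`WROTE` keywords, the program name, and the file names are invented for the example; a real processing log would need its own pattern.

```python
import re

# Hypothetical log format: "<timestamp> <program> READ|WROTE <path>"
LOG_LINE = re.compile(
    r"^(?P<time>\S+)\s+(?P<program>\S+)\s+(?P<action>READ|WROTE)\s+(?P<path>\S+)$"
)

# Map log actions onto the "consumes" / "produces" relations.
ACTION_TO_RELATION = {"READ": "consumes", "WROTE": "produces"}


def scavenge(log_lines):
    """Extract provenance events from log lines, skipping unrelated lines."""
    events = []
    for line in log_lines:
        match = LOG_LINE.match(line.strip())
        if match is None:
            continue  # not a provenance-relevant line
        events.append({
            "when": match["time"],
            "service": match["program"],
            "relation": ACTION_TO_RELATION[match["action"]],
            "file": match["path"],
        })
    return events


sample_log = [
    "2010-02-05T00:10:00Z seaice_daily READ tb_level2a.hdf",
    "2010-02-05T00:10:05Z seaice_daily starting algorithm",
    "2010-02-05T00:12:30Z seaice_daily WROTE seaice_12km.hdf",
]
events = scavenge(sample_log)
```

The second sample line does not match the pattern and is ignored, so only the read and the write become events.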
Open Provenance Model (OPM)
• Karma is generic and stand-alone
  • Not coupled to any particular system
• Karma 3.0 uses OPM v1.01 to represent provenance graphs
  • OPM is a standard: http://eprints.ecs.soton.ac.uk/16148/1/opm-v1.01.pdf
  • Enables provenance information exchange with other OPM-compliant tools
Types of Provenance Info (2)
• [1] launches
  • Whom: user ID or name
  • What: service, e.g., service URI
  • When: launch context, time
• [2] consumes and [3] produces
  • File (e.g., file URL, owner)
  • Service: program, algorithm, version
• [4] invokes
  • Invoking service
  • Invoked service
  • Parameters
  • Results/faults
Additional Types of Provenance Information Captured by Karma
• Execution status
  • Terminated or failed
• Transfer of data
  • Sending of results
  • Receiving of results
• Workflow and program lifecycles
• Unknown notifications
  • Stored as raw notifications
• Forthcoming: spatial and temporal information, simple and complex data values, quality information
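Simple provenance graph for Sea Ice
[Graph diagram; recoverable labels:
• Nodes: Brightness Files (several); Mask Files; SeaIce files
• Edges: Uses (role = input), shown for the Brightness Files; Uses (role = applyMask), shown for the Mask Files
• Annotations: hasQualityFlag; hasPGEVersion]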
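A graph like the one on this slide can be held in a very small data structure. The sketch below is an assumption-laden illustration and not Karma's or OPM's API: the `ProvenanceGraph` class, its methods, the node names, and the placeholder annotation values are hypothetical, and attaching both annotations to the sea ice file is a guess, since the slide does not say which node carries them.

```python
from dataclasses import dataclass, field


@dataclass
class ProvenanceGraph:
    """Hypothetical container for 'uses' edges with roles, plus node annotations."""

    edges: list = field(default_factory=list)        # (product, relation, role, source)
    annotations: dict = field(default_factory=dict)  # node -> {property: value}

    def uses(self, product, source, role):
        """Record that `product` uses `source` in the given role."""
        self.edges.append((product, "uses", role, source))

    def annotate(self, node, key, value):
        """Attach a named property to a node."""
        self.annotations.setdefault(node, {})[key] = value

    def sources(self, product, role=None):
        """Return what `product` uses, optionally filtered by role."""
        return [src for (prod, rel, r, src) in self.edges
                if prod == product and rel == "uses" and (role is None or r == role)]


graph = ProvenanceGraph()
graph.uses("seaice_file", "brightness_file_1", role="input")
graph.uses("seaice_file", "brightness_file_2", role="input")
graph.uses("seaice_file", "mask_file", role="applyMask")
# Placeholder values; real flags and versions would come from the processing system.
graph.annotate("seaice_file", "hasQualityFlag", "placeholder")
graph.annotate("seaice_file", "hasPGEVersion", "placeholder")
```

Filtering by role, for example `graph.sources("seaice_file", role="applyMask")`, is what lets a user ask which mask file went into a given product.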
Provenance graph for sea ice product
[Graph screenshot; recoverable labels: Sea ice file; Mask file; Input files]
Provenance can be used to explain the difference between the images for 1/28/2010 and 2/05/2010: the sea mask changed because of missing data.
Example
• This provenance visualization was obtained from a simulated Karma provenance database. In this use case, its aim is to help a scientist identify the mask file that was used and view provenance information about that mask file.
• The provenance graph gives the user annotated lineage for a sea ice data product: the inputs required to create it and the files created by processing it.
• Provenance visualization in this form allows for deeper examination.
  • For example, for a recurring error, the scientist can view all related provenance information to trace the source of the error.
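Tracing an error back to its source, as in this example, amounts to walking the lineage upstream from a product. The following sketch shows one way to do that with a breadth-first search. The `DERIVED_FROM` table and every name in it are hypothetical, as is the assumption that each day's mask is derived from the previous day's mask; the data are not taken from the Karma database.

```python
from collections import deque

# Hypothetical lineage: each key was derived from the listed items.
DERIVED_FROM = {
    "seaice_2010_02_05": ["tb_2010_02_05", "ice_mask_2010_02_04"],
    "ice_mask_2010_02_04": ["tb_2010_02_04", "ice_mask_2010_02_03"],
    "ice_mask_2010_02_03": ["tb_2010_02_03"],
}


def lineage(item, derived_from):
    """Return every upstream item reachable from `item`, nearest first."""
    seen = {item}
    order = []
    pending = deque([item])
    while pending:
        current = pending.popleft()
        for parent in derived_from.get(current, []):
            if parent not in seen:  # guard against shared ancestors and cycles
                seen.add(parent)
                order.append(parent)
                pending.append(parent)
    return order


upstream = lineage("seaice_2010_02_05", DERIVED_FROM)
# Narrow the lineage to the mask files that influenced the product.
masks = [name for name in upstream if name.startswith("ice_mask")]
```

With the full upstream list in hand, a scientist could check each contributing mask or input file for the day on which data went missing.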
Ongoing work
• Better graph layout, with detail for each data product and process used in generating a sea ice product.
• Give nodes different shapes and colors depending on whether they are input nodes, generated output nodes, etc.
• Users will be able to annotate edges by right-clicking on them, adding semantic annotations to the existing causal dependencies.
• Forthcoming: spatial and temporal information, simple and complex data values, quality information
• Provenance bundle archived with the data or embedded in the HDF file, in addition to the Karma database
AMSR-E Provenance Use Cases
• Browse provenance graphs: convey rich information about final data product details
  • Spatial location, time of observation, algorithms employed, quality propagation
• Answer the “Something isn’t right” question
  • Example illustrated earlier: data was not received for several days, so the mask can be inaccurate.
  • Provenance “bundle” includes relevant science papers
• New communication satellites interfere with NASA satellites on certain channels
  • Identify channels affected by radio-frequency interference (RFI) and the channels used to generate each product