Assembling Large, Multi-Sensor Climate Datasets Using CVO. Brian Wilson, Gerald Manipon, Tom Yunck, and Zhangfan Xing, Jet Propulsion Laboratory. Do multi-instrument science by authoring a dataflow document for a reusable operator tree. Access scientific data by naming it.
Large-Scale Data Fusion
• Find Level-2 datasets
  – Space/time granule query for multiple EOS (“A-Train”) instruments: AIRS, AMSR-E, AMSU, MODIS, CloudSat, GPS
• Co-locate retrievals using space/time metadata
  – Instantaneous “matchups” in space & time
• Read the data
  – Temperature, water vapor, quality flags, radiances (HDF)
• Understand the data
  – Units, quality control (non-trivial!), etc.
• Publish merged (matchup) products
  – Temperature & water vapor from GPS, AIRS, & AMSU
  – Determine instrument biases, understand by stratifying
• Publish multi-sensor “fused” products
  – Improve AIRS retrievals by introducing temperature information from highly accurate GPS refractivity
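The space/time co-location step above can be sketched in a few lines of Python. This is only an illustration: the function names, the tuple layout, and the 100 km / 1 hour windows are assumptions made for the example, not CVO's actual operators or thresholds.

```python
import math
from datetime import datetime, timedelta

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine great-circle distance between two lat/lon points, in km."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def find_matchups(points, footprints, max_km=100.0, max_dt=timedelta(hours=1)):
    """For each (lat, lon, time) point, find the nearest footprint that lies
    inside both the distance window and the time window.

    Returns a list of (point_index, footprint_index, distance_km).
    The window sizes are hypothetical defaults.
    """
    matches = []
    for i, (plat, plon, ptime) in enumerate(points):
        best = None
        for j, (flat, flon, ftime) in enumerate(footprints):
            if abs(ftime - ptime) > max_dt:
                continue  # outside the time window
            d = great_circle_km(plat, plon, flat, flon)
            if d <= max_km and (best is None or d < best[1]):
                best = (j, d)
        if best is not None:
            matches.append((i, best[0], best[1]))
    return matches


# Made-up coordinates: one profile location and three swath footprints.
t0 = datetime(2003, 1, 13, 17, 0)
gps_points = [(10.0, 20.0, t0)]
swath_footprints = [
    (10.1, 20.1, t0 + timedelta(minutes=10)),  # close in space and time
    (10.5, 20.5, t0),                          # farther away
    (10.0, 20.0, t0 + timedelta(hours=5)),     # same place, wrong time
]
matchups = find_matchups(gps_points, swath_footprints)
print(matchups)
```

The brute-force double loop is fine for a sketch; a production matchup over a full mission would first narrow candidates with a spatial lookup of granules.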
CVO Software Layers
• CVO web application (visual programming)
  – Guide user in configuring & executing a science analysis workflow
  – Client works in a browser, pre-configured for various users
• SciFlo workflow engine
  – Server side: executes “backing” workflow (XML) document
  – Metadata in XML, science data in HDF/netCDF (binary)
• Python & command-line glue
  – Wrap Fortran/C/C++ operators into workflow
  – Workflow execution available from Python or Unix command line
• Analysis & visualization operators
  – Fortran, C, IDL, Matlab, NCAR_Graphics
  – Algorithms now available as workflow steps & callable services!
• Merged (“matchup”) datasets
  – Merged data in netCDF (simpler than HDF)
CVO Software Layers (diagram): CVO Builder and underlying services & operators.
Components: VizFlow Authoring GUI; CVO Builder GUI; Matchup & Analysis Services; Plotting Services; Published Workflows; XML Workflows Executed on CVO SciFlo Node; SciFlo Web Services; Webify To Slice Data Arrays; GeoRegionQuery; Unix Command Line; Python Glue; Fortran, C/C++ Operators; Metadata Database; Python Operators; IDL, Matlab Operators; Libraries: python, graphics; Registered Science Data Files: Remote URLs or Local Cache.
Legend: remotely callable service; browser GUI client app.; local codes & data.
CVO Browser GUI Configured for GPS-AIRS Matchup
CVO / SciFlo Executing GPS-AIRS Matchup (panels: Flowchart; Workflow Results)
“SciFlo” Workflow Engine
• Automate large-scale, multi-instrument science processing by authoring a dataflow document that specifies a tree of executable operators or services.
• VizFlow Visual Authoring Tool (AJAX GUI in browser)
• Distributed Dataflow Execution Engine (in Python)
• Data Grid: Move data “granules” to the operators using FTP/HTTP, or slice variables using OpenDAP URLs.
• Compute Grid: Move operators (executables) to the data.
• Built-in reusable operators provided for many tasks such as subsetting, co-registration, regridding, data fusion, etc.
• Custom operators easily plugged in by scientists.
• Publish algorithms as remotely callable Web Services and then orchestrate services in an easily authored workflow.
• CVO web app: Guide user in selecting and providing inputs to matchup & analysis workflows.
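To make the idea of a dataflow document concrete, here is a toy executor in Python. SciFlo itself reads XML workflow documents and runs distributed operators; this sketch swaps in a plain dictionary and local functions, and every name in it (`run_dataflow`, the step names, the sample values) is hypothetical.

```python
def run_dataflow(flow, inputs):
    """Run each step once all of its named inputs are available.

    flow:   dict mapping step name -> (callable, [names of its inputs])
    inputs: dict of initial named values
    Returns a dict holding the inputs plus every step's output.
    """
    results = dict(inputs)
    pending = dict(flow)
    while pending:
        # A step is ready when every input it names has been produced.
        ready = [name for name, (_, deps) in pending.items()
                 if all(d in results for d in deps)]
        if not ready:
            raise ValueError("unresolvable inputs for: " + ", ".join(sorted(pending)))
        for name in ready:
            func, deps = pending.pop(name)
            results[name] = func(*[results[d] for d in deps])
    return results


# A two-step flow: difference two temperature profiles, then average.
flow = {
    "difference": (lambda a, b: [x - y for x, y in zip(a, b)],
                   ["airs_temp", "gps_temp"]),
    "mean_bias": (lambda d: sum(d) / len(d), ["difference"]),
}
inputs = {
    "airs_temp": [250.0, 240.0, 230.0],  # made-up values, in kelvin
    "gps_temp": [249.0, 241.0, 228.0],
}
results = run_dataflow(flow, inputs)
print(results["difference"], results["mean_bias"])
```

Because steps are wired together by name only, an operator can be replaced (for example, by a call to a remote service) without touching the rest of the flow.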
VizFlow Flowchart: GPS-AIRS Matchup & Temp. Profile Comparison
• Connect a series of services and operators into a dataflow
• Drag services/operators from menu, and drop onto the canvas
• Lay out the flowchart by moving nodes
• Connect the input/output ports by drawing lines
• User guided by matching up port names and types
Service/Operator Orchestration
Each SciFlo processing step is one of:
• Template for XML (or string) generation
• REST (HTTP GET) call: e.g. WMS/WCS, DAP URLs
• SOAP service call: “have WSDL, will call”
• XPath 2.0 transformation for XML mediation
• XQuery 1.0 query/transformation
• Command-line script or executable
• Python function call
• Scientist’s custom IDL or MATLAB script
• Other (What do you need?)
SciFlo As an Authoring Toolkit
• Assemble operators by writing an XML document
  – Connect SOAP/REST data query/access services to custom, executable algorithms written by scientists
  – IDL, MATLAB, or Python codes can become operators
• SciFlo Engine automatically:
  – Generates web (HTML) form to call the flow
  – Publishes custom flow as a new web service (if desired)
• Create many Web Services automatically
  – No glue or SOAP code to write
  – Only write science algorithms, in language of choice
• Publish Analysis Flows
  – Exchange SciFlo documents
  – Generated products have lineage & user annotations
CVO/SciFlo Network: Compute & Data grid. AIRS NCAR CVO Node Goddard DAAC AIRS UCAR GPS AIRS Science Team NOAA AMSU JPL CVO Nodes (2) JPL GPS
Full SciFlo Network (more universities coming): U. Michigan; Ohio State; [NASA Goddard DAAC]; UCLA; NASA Langley DAAC; JPL DAAC; JPL GPS, AIRS, & MISR Science Areas; U. Alabama Huntsville
Open Data Access Protocol (www.opendap.org)
• Use a one-line query URL to retrieve a slice of a variable grid from a netCDF or HDF file anywhere in the world
• Binary wire protocol for fine-grained data transfer
OpenDAP URL
• http://gen-dev.jpl.nasa.gov/genesis/cgi-bin/dods/nph-dods/genesis/data/airs/L2/20030113/airx2ret/AIRS.2003.01.13.171.L2.RetStd.hdf?TAirStd(1:3, 3:6, 4:17)
OpenDAP Servers
• netCDF, HDF, GRIB, FreeForm, JGOFS, other file formats
• Easy to implement another server
OpenDAP clients
• Matlab and IDL, any web browser
• Python (pydap or SciFlo)
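As a rough illustration of the one-line query URL, the helper below assembles a slice request in Python. It assumes the standard DAP2 constraint-expression form, in which each dimension gets a bracketed `[start:stop]` range and a response suffix such as `.ascii` or `.dods` follows the file name; the slide writes the same slice more loosely as `TAirStd(1:3, 3:6, 4:17)`. The function name is hypothetical, and no network request is made.

```python
def opendap_slice_url(dataset_url, variable, ranges, response="ascii"):
    """Build a DAP2-style URL that requests a hyperslab of one variable.

    ranges:   list of (start, stop) index pairs, one per dimension
    response: DAP2 response suffix, e.g. "ascii" or "dods"
    """
    hyperslab = "".join("[%d:%d]" % (start, stop) for start, stop in ranges)
    return "%s.%s?%s%s" % (dataset_url, response, variable, hyperslab)


# Dataset URL taken from the slide; the server may no longer be online.
dataset = ("http://gen-dev.jpl.nasa.gov/genesis/cgi-bin/dods/nph-dods/genesis/"
           "data/airs/L2/20030113/airx2ret/AIRS.2003.01.13.171.L2.RetStd.hdf")
url = opendap_slice_url(dataset, "TAirStd", [(1, 3), (3, 6), (4, 17)])
print(url)
```

A client library such as pydap would normally build and issue this request itself; the sketch only shows what travels over the wire.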
Webify (Zhangfan Xing)
Webification server: http://w10n.jpl.nasa.gov
• Drill down into a “deep web” of science data.
• Simple URLs to get metadata, slice variables, etc.
• Lighter weight than OpenDAP, mostly in Python
• HDF group supporting fast HDF5 server in C
• Supports multiple file formats: HDF, netCDF, FITS, GRIB, etc.
• Returns multiple formats: XML, HTML, JSON, netCDF
NetCDF Example
• http://w10n.jpl.nasa.gov/test/data/nc/coads_climatology.nc (download netCDF file)
• http://w10n.jpl.nasa.gov/test/data/nc/coads_climatology.nc/ (file metadata)
• http://w10n.jpl.nasa.gov/test/data/nc/coads_climatology.nc/SST/ (variable metadata)
• http://w10n.jpl.nasa.gov/test/data/nc/coads_climatology.nc/SST[0:2,45:55,85:95] (slice variable)
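The four URL shapes in the NetCDF example follow a simple pattern, sketched below in Python. The helper name and the dictionary keys are made up for this illustration, and only the URL forms shown on the slide are reproduced; how an output format (XML, HTML, JSON, netCDF) is selected is not shown on the slide and is left out.

```python
def w10n_urls(file_url, variable, ranges):
    """Return the four Webify-style URLs for one file and one variable.

    ranges: list of (start, stop) index pairs, one per dimension
    """
    base = file_url.rstrip("/")
    index = ",".join("%d:%d" % (start, stop) for start, stop in ranges)
    return {
        "download": base,                               # the file itself
        "file_metadata": base + "/",                    # trailing slash: metadata
        "variable_metadata": "%s/%s/" % (base, variable),
        "slice": "%s/%s[%s]" % (base, variable, index),
    }


urls = w10n_urls(
    "http://w10n.jpl.nasa.gov/test/data/nc/coads_climatology.nc",
    "SST",
    [(0, 2), (45, 55), (85, 95)],
)
for kind, url in urls.items():
    print(kind, url)
```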
Multi-Instrument Atmospheric Science. AIRS/GPS Co-Registration: Point to Swath Matchup. Carbon Cycle. Figure panels: AIRS Level2 Swaths over Pacific; GPS Level2 Profile Locations
AIRS / GPS Matchups AIRS/GPS Temperature & Water Vapor Comparison Plots
Space/Time Query in SciFlo
A SciFlo Dataset is:
• Specified as a space/time query over collections of data products (or retrieved physical variables)
  – GeoRegionQuery(DataProduct, TimeRange, LatLonRegion)
  – GeoRegionQuery(PhysicalVariable, TimeRange, LatLonRegion)
• Realized as a list of object IDs or URIs (permanent names)
  – GeoRegionQuery returns unique objectIds along with geolocation metadata
• Accessed using a list of URLs pointing to on-line replicas of the data objects (files)
  – FindDataById(objectIds) -> URLs (ftp, http, or OpenDAP)
  – Translate unique object IDs into list of on-line locations in DataPools or any SciFlo node
  – DataPools & SciFlo P2P network are “crawled” to update distributed translation tables
  – Or query ECHO metadata repository
• SciFlo network is a distributed cache for scientific datasets
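The two-stage lookup (space/time query to object IDs, then object IDs to replica URLs) can be mimicked with in-memory tables. The sketch below is not the SciFlo SOAP interface: the Python function names, the product label, the catalog records, the bounding boxes, and the example.org URLs are all invented for illustration, and longitude wrap-around at the date line is ignored.

```python
from datetime import datetime

# Hypothetical metadata catalog: bbox is (south, north, west, east).
CATALOG = [
    {"objectId": "AIRS.2003.01.13.171.L2.RetStd.hdf", "product": "AIRS.L2.RetStd",
     "start": datetime(2003, 1, 13, 17, 0), "end": datetime(2003, 1, 13, 17, 6),
     "bbox": (-5.0, 18.0, 150.0, 172.0)},
    {"objectId": "AIRS.2003.01.02.004.L2.RetStd.hdf", "product": "AIRS.L2.RetStd",
     "start": datetime(2003, 1, 2, 0, 18), "end": datetime(2003, 1, 2, 0, 24),
     "bbox": (30.0, 55.0, -140.0, -115.0)},
]

# Hypothetical translation table: object ID -> on-line replicas.
REPLICAS = {
    "AIRS.2003.01.13.171.L2.RetStd.hdf": [
        "http://example.org/cache/AIRS.2003.01.13.171.L2.RetStd.hdf",
        "ftp://example.org/pool/AIRS.2003.01.13.171.L2.RetStd.hdf",
    ],
}


def geo_region_query(product, time_range, region):
    """Return object IDs whose time span and bbox overlap the query."""
    t0, t1 = time_range
    south, north, west, east = region
    hits = []
    for rec in CATALOG:
        if rec["product"] != product:
            continue
        if rec["end"] < t0 or rec["start"] > t1:
            continue  # no overlap in time
        s, n, w, e = rec["bbox"]
        if n < south or s > north or e < west or w > east:
            continue  # no overlap in space
        hits.append(rec["objectId"])
    return hits


def find_data_by_id(object_ids):
    """Translate each object ID into its list of replica URLs (may be empty)."""
    return {oid: REPLICAS.get(oid, []) for oid in object_ids}


ids = geo_region_query(
    "AIRS.L2.RetStd",
    (datetime(2003, 1, 13), datetime(2003, 1, 14)),
    (-10.0, 10.0, 155.0, 165.0),
)
locations = find_data_by_id(ids)
print(ids, locations)
```

Keeping the permanent name separate from its current locations is what lets the same dataset definition resolve to a local cache on one node and a remote URL on another.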
“Smart” Data Grid
• Register data collections
  – Crawl GPS & AIRS/AMSU datasets & extract spatial bbox
  – Recognize AIRS granules: AIRS.2003.01.02.004.L2.RetStd.hdf
• Space/time matchups: GPS point to AIRS/AMSU swath
  – Perform matchups by spatial lookup of AIRS granules
  – Save matchup indices
• Move & cache data files
  – Using three AIRS products: L2.RetStd, L2.Support, L1b radiances
  – Workflow uses cached file or auto-caches remote file
• Generate merged products
  – Desired GPS & AIRS variables in netCDF files
  – Register merged products as new “recognized” dataset
• Run statistics workflows for monthly/seasonal/yearly statistics
• Publish merged products, aggregate statistics, plots, etc.
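"Recognizing" a granule can be as simple as parsing its file name. The Python sketch below handles only the short naming form shown above (`AIRS.YYYY.MM.DD.GGG.Level.Product.hdf`); the regular expression, the function name, and the returned field names are assumptions for this example, and file names carrying extra version or processing fields would need a broader pattern.

```python
import re
from datetime import date

# Matches names such as AIRS.2003.01.02.004.L2.RetStd.hdf
GRANULE_RE = re.compile(
    r"^AIRS\.(\d{4})\.(\d{2})\.(\d{2})\.(\d{3})\.(L\w+)\.(\w+)\.hdf$"
)


def parse_granule_name(name):
    """Split an AIRS granule file name into its fields.

    Returns a dict, or None if the name does not match the pattern.
    """
    match = GRANULE_RE.match(name)
    if match is None:
        return None
    year, month, day, granule, level, product = match.groups()
    return {
        "date": date(int(year), int(month), int(day)),
        "granule": int(granule),  # granule sequence number within the day
        "level": level,
        "product": product,
    }


info = parse_granule_name("AIRS.2003.01.02.004.L2.RetStd.hdf")
print(info)
```

A crawler could run this over a directory listing and store the parsed fields, together with each granule's bounding box, in the metadata database used for the spatial lookup.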
SciFlo is Multi-Purpose
Each SciFlo client/server node is multi-functional:
• Provides pre-configured SOAP services (e.g. GeoRegionQuery)
• Serves data via an OpenDAP server, ftp, and soon Webify
• Provides a redirection server: translate objectID -> file URLs
• Contains metadata in a relational database (mysql)
• Contains an XQuery-able XML document store (dbxml)
• Executes SciFlo documents (dataflow execution engine)
• Serves flow results on private & shared web pages (wiki)
SciFlo Software Bundle
• All Open Source, Push-Button Install on Linux & Windows
• Installable by each user, root/admin privileges not required
• One install provides pre-configured: SOAP services, OpenDAP server, redirection, ftp, mysql, dbxml, dataflow engine, & wiki.
Personal Data Center for each scientist
• Electronic scientific notebook (personal, configurable)
• Collaborate by sharing wiki pages & exchanging SciFlo docs.
Open Source in the SciFlo Bundle
• SOAPpy – SOAP client & server
• ElementTree – XML parsing, pseudo-XPath
• lxml – XML parsing, XPath 1.0
• Twisted, openssl – secure web server
• pyldap, openldap – authorization, roles
• mysql – relational database
• Sleepycat dbxml – XML database w/ XQuery 1.0 and XPath 2.0
• scipy, numpy, matplotlib, basemap
• Other scientific libraries with python bindings
• wiki
• dojo AJAX library – client dev.
• Google maps widget, Google Earth animations
• OpenDAP – fine-grained data access, “drill down” into files
• OpenID – simple user credentials
• Parts of Globus v4 – for Grid Virtualization
  – Globus Security Infrastructure (GSI)
  – GridFTP
CVO 2nd Year Plans
• More features for CVO browser GUI
  – Choose variables by generic names or by product names
• Populate more analysis & visualization operators
  – AIRS/AMSU forward model, seasonal-to-yearly trend plots, etc.
• Extend bias analysis
  – GPS-AIRS comparisons over entire AIRS mission
  – GPS-AMSU comparisons for several NOAA satellites
  – Stratify bias trends by lat/lon, season, day/night, scene
• Publish merged products for use by community
  – GPS-AIRS & GPS-AMSU variable matchups
  – GPS-AMSU comparisons for several NOAA satellites
• Documentation and User Guides
  – CVO user guide, installation, security setup (OpenID)
• Publish observed bias trends and operator algorithms
  – Dual publication: science papers refer to CVO tech. paper