2005 December 7. 2005 Fall AGU IN31B-1146. Building from Where We Are: Web Services and Java-Based Clients to Enable Virtual Observatories. Robert M. Candey (Code 612.4) (Robert.M.Candey@nasa.gov) Reine A. Chimiak (Code 583) David B. Han (Code 586) Bernard T. Harris (Code 583)
2005 December 7
2005 Fall AGU IN31B-1146
Building from Where We Are: Web Services and Java-Based Clients to Enable Virtual Observatories
Robert M. Candey (Code 612.4) (Robert.M.Candey@nasa.gov)
Reine A. Chimiak (Code 583)
David B. Han (Code 586)
Bernard T. Harris (Code 583)
Rita C. Johnson (QSS, Code 612.4)
Colin A. Klipsch (QSS, Code 612.4)
Tamara J. Kovalick (QSS, Code 612.4)
Nand Lal (Code 612.4)
Howard A. Leckner (QSS, Code 612.4)
Michael H. Liu (Raytheon ITSS, Code 612.4)
Robert E. McGuire (Code 612.4)
NASA Goddard Space Flight Center, Greenbelt MD 20771
<http://spdf.gsfc.nasa.gov/>
Abstract
The Space Physics Data Facility at NASA Goddard has developed a strong foundation in space science mission services and data for enhancing the scientific return of space physics research and enabling integration of these services into the emerging Virtual discipline Observatory (VxO) paradigm. We are deploying a critical set of foundation components, leveraging our data format expertise and our existing and very popular science and orbit data web-based services, such as Coordinated Data Analysis Web [CDAWeb] and Satellite Situation Center Web [SSCweb]. We have developed web services APIs for orbit location, data finding across FTP sites and in CDAWeb, data file format translation, and data visualizations that tie together existing data holdings, standardize and simplify their use, and enable much enhanced interoperability and data analysis. We describe the technologies we’ve applied, technology choices we continue to struggle with, and ideas for future enhancements and changes.
A Summary of Current SPDF Services
• CDAWeb (Web-based simultaneous multi-mission, multi-instrument selection and comparison of science data via graphics, digital listings, file retrieval and merged/subsetted CDF creation)
• CDAWeb comprises >600 datasets of current and past space missions and ground-based facilities (>1M files of science data)
• CDAWeb+ Java client
• SSCweb (spacecraft orbit locations used for spacecraft/instrument operations and analysis)
• OMNIweb (solar wind, magnetic field and plasma data, energetic proton fluxes, and geomagnetic and solar activity indices)
• COHOweb/Helioweb (deep space magnetic field, plasma, and spacecraft/planet ephemerides data)
• ATMOweb (ionospheric and atmospheric data)
• Modelweb (space physics models)
• MSQS (Magnetospheric State Query System)
• FTP Browser (display of subsets of NSSDC’s ftp-accessible ASCII datasets)
• NSSDC Master Catalog (Oracle catalog of information about most spacecraft and instruments)
• Search external FTP data sites (PWGdata, NSSDCftp, etc.)
Our Newest Service: Coordinated Data Analysis Web Java Client (CDAWeb+)
• CDAWeb+ and its underlying web services and CDAWeb database provide:
  • Selecting data by combinations of 5 keys (region, mission, instrument type, time span, keyword search)
  • Simultaneous multi-mission, multi-instrument selection and display
  • Comparison of science data via graphics, digital listings, file retrieval and merged/subsetted CDF creation
  • Results from detailed queries across many services at once
• Ties together a variety of service protocols: FTP, CGI and SOAP web services, and links
• Java client; uses Java WebStart for easy client install and IDL on the server for data compilation, listing and plotting
• Takes advantage of underlying standards (CDF, ISTP/IACG Guidelines)
• CDAWeb+ generalizes the very popular CDAWeb service, extended to call or point to many services (such as SSCweb, VSO, public FTP sites)

Technologies We’ve Applied
• Java, Java WebStart, JavaSound
• Apache, Tomcat
• SOAP, Axis
• IDL, PERL, Python, C, Fortran
• CDF (standard data format)
• FTP sites (both local and remote)
[Figure: architecture diagram. Components: WS Client, HTTP Server and FTP Server (each on a remote computer); CDAWeb Server running Apache HTTPD, Apache Tomcat with our WS Java code, and IDL; CDAWeb metadata database; non-CDAWeb FTP/HTTP site metadata (XML); output data file/plot/listing. Labeled flows: SOAP request/response, HTTP GET data/response, get dir, get data file, get file, read, create, execute.]

CDAWeb+ is Built on Our New Web Services
• Services Oriented Architecture for distributed software to software communication
• No HTML or human interaction required
• Cross-platform and language-independent
• Enables others to develop tools and services leveraging core logic and science data and orbit information
• Open to other systems and external applications by using standard distributed Application Program Interfaces (APIs)
• Based on XML and Simple Object Access Protocol [SOAP] standards and/or HTTP calling interface
• Form an integrated system with science utility greater than the sum of its parts
• Tie together existing data holdings
• Enable much enhanced interoperability and data analysis
The Role of Standards and Agreements
• Use of CDF with ISTP/IACG Guidelines is critical to the power/utility of CDAWeb
• Standard data file formats and metadata
  • The wide variety of science data file formats impedes science research across instruments and missions by greatly increasing learning and translating time for each dataset and decreasing interoperability and multi-mission data analysis
  • Using any of the standard formats is dramatically better than custom formats, since translations can be semi-automated
  • Still a large number of standard formats (CDF, netCDF, HDF, FITS, etc.): should these formats converge or some go away?
  • Should CDF evolve to use HDF5 as its basis?
• Need to define a cooperative identification process between major services and clients to aid in understanding usage and answering various reporting questions
  • One scheme is to set the user-agent string to describe the client
• Web services need to be flexible but stay stable for clients
  • how to update the API without breaking existing clients?
  • add new methods?
  • add a whole new API?
• Many agreements in place with data providers to facilitate ingest of data files (FTP push accounts, Rsync retrieval)
• Sharing metadata population (much is inherently manual and tedious)
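As a minimal sketch of the user-agent scheme, a client could describe itself in the HTTP User-Agent header of every request it makes. The client name, version, contact address and URL below are hypothetical placeholders, not values used by any SPDF service; the sketch only builds the request and does not send it.

```python
from urllib.request import Request


def build_identified_request(url, client_name, client_version, contact):
    """Build (but do not send) an HTTP request that identifies the calling client."""
    # Conventional "product/version (comment)" form of a User-Agent value.
    user_agent = f"{client_name}/{client_version} ({contact})"
    return Request(url, headers={"User-Agent": user_agent})


# Hypothetical client and endpoint, for illustration only.
request = build_identified_request(
    "https://example.org/service",
    client_name="ExampleVxOClient",
    client_version="1.0",
    contact="mailto:user@example.org",
)

# urllib stores header names in capitalized form, hence "User-agent" here.
print(request.get_header("User-agent"))
```

A service receiving such requests could tally them by this string in its access logs, which is one way to answer usage-reporting questions without requiring logins.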
A Possible Event List Service
• Lists of events when something happens or some condition is true:
  • Bow shock crossings and other observations of physical signatures
  • Coronal Mass Ejection (CME) events
  • Times when spacecraft are in certain geophysical regions (from SSCweb)
  • Conjunctions of spacecraft along field lines and to ground stations (SSCweb)
  • Day vs. night for a spacecraft
  • Periods of low/quiet solar activity
  • Times used in a particular published study
  • Any other time-ordered events
• Event list service functions (generate, manage and share event lists):
  • Store lists of events with associated metadata in standard (but expandable) XML format
  • Attach comments and other information
  • Combine lists via intersections and unions
  • Import and export lists as XML or provide URLs to lists in XML
  • Publish useful lists for others and allow others to add or comment
  • Possibly plot lists as line segments or highlighted areas on a time line
  • Possibly provide for automatic re-running of some original queries so lists are live in some sense
• Example usage:
  • Query SSCweb for spacecraft locations and send results to list server
  • Query OMNIweb for specific geophysical conditions and send results to list server
  • Intersect the two lists to make a new list
  • Send resulting URL of output list to CDAWeb for data retrieval or plotting
  • Append CDAWeb dataset and variable metadata to the list
  • Prune list based on analysis of plots
  • Send list of selected events back to SSCweb for geophysical region, ground track and orbit determination or to other tools for further analysis
  • Make list public, perhaps referenced in your paper
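Combining lists via intersections and unions, and exporting the result as XML, can be sketched as follows. This is only an illustration under stated assumptions: each event is taken to be a (start, end) time interval, and the XML element and attribute names (eventList, event, start, end) are hypothetical, since no schema has been defined for the proposed service.

```python
from datetime import datetime
import xml.etree.ElementTree as ET


def normalize(events):
    """Sort (start, end) intervals and merge any that overlap or touch."""
    merged = []
    for start, end in sorted(events):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def union(a, b):
    """Times covered by either list."""
    return normalize(list(a) + list(b))


def intersect(a, b):
    """Times covered by both lists (two-pointer sweep over sorted intervals)."""
    a, b = normalize(a), normalize(b)
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start < end:
            out.append((start, end))
        # Advance whichever interval finishes first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out


def to_xml(events, name):
    """Serialize an event list; element names are hypothetical placeholders."""
    root = ET.Element("eventList", {"name": name})
    for start, end in events:
        ET.SubElement(root, "event",
                      {"start": start.isoformat(), "end": end.isoformat()})
    return ET.tostring(root, encoding="unicode")


# Made-up times standing in for an SSCweb region query and an OMNIweb condition query.
in_region = [(datetime(2005, 1, 1, 0), datetime(2005, 1, 1, 6)),
             (datetime(2005, 1, 2, 0), datetime(2005, 1, 2, 6))]
quiet_sun = [(datetime(2005, 1, 1, 4), datetime(2005, 1, 1, 12)),
             (datetime(2005, 1, 2, 5), datetime(2005, 1, 3, 0))]

both = intersect(in_region, quiet_sun)
print(to_xml(both, "in-region-and-quiet"))
```

The resulting XML, or a URL pointing to it, is the kind of artifact that could then be handed to CDAWeb for data retrieval or plotting, as in the example usage.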
Future Ideas for Distributed Data Access in CDAWeb
• OpenDAP <http://opendap.org/>
  • IDL calls <http://dods.jpl.nasa.gov/idlLoaddods/idl-loaddods.html>
  • General info at <http://dods.jpl.nasa.gov/>
  • Would make a good server service for our data if we write a CDF handler (already has netCDF and HDF)
• DataSpace Transfer Protocol <http://www.dataspaceweb.net/>
  • Oriented more for data mining and grids but has fast transfers
• PDS Object-oriented Data Technology <http://oodt.jpl.nasa.gov/oodt-site/>
• IDL socket calls, as in <http://idlastro.gsfc.nasa.gov/ftp/pro/sockets/webget.pro> and OO routines from Dominic Zarro at <http://orpheus.nascom.nasa.gov/~zarro/idl/sockets.html>
Should We Change Our SOAP Standard?
• Currently using RPC-encoded data binding style SOAP
• Not compliant with the Web Services Interoperability (WS-I) Basic Profile Version 1.0 (which didn't exist when we first deployed our Web services)
• WS-I Basic Profile prohibits the use of the RPC-encoded style; rather it requires the use of a document-literal or RPC-literal style
• JavaScript and Perl have difficulty supporting document-literal
• Some client libraries have difficulty supporting nullable Date and multi-dimension array parameters
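To make the style difference concrete, the sketch below contrasts two SOAP 1.1 request bodies for the same call. The operation name, parameters and target namespace are hypothetical and are not taken from our WSDL. The RPC-encoded message carries an encodingStyle attribute and per-parameter xsi:type annotations, while the document-literal message is a plain XML element whose structure would be defined by an XML Schema.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"
SOAP_ENC = "http://schemas.xmlsoap.org/soap/encoding/"
XSI = "http://www.w3.org/2001/XMLSchema-instance"
XSD = "http://www.w3.org/2001/XMLSchema"
EXAMPLE_NS = "urn:example:hypothetical-service"  # placeholder namespace

# RPC-encoded: wrapper named after the method, typed parameters, SOAP encoding.
RPC_ENCODED = f"""<soapenv:Envelope xmlns:soapenv="{SOAP_ENV}"
    xmlns:xsi="{XSI}" xmlns:xsd="{XSD}">
  <soapenv:Body>
    <ns:getExampleData xmlns:ns="{EXAMPLE_NS}" soapenv:encodingStyle="{SOAP_ENC}">
      <dataset xsi:type="xsd:string">EXAMPLE_DATASET</dataset>
      <start xsi:type="xsd:dateTime">2005-01-01T00:00:00Z</start>
    </ns:getExampleData>
  </soapenv:Body>
</soapenv:Envelope>"""

# Document-literal: the body is one schema-defined element, no type annotations.
DOCUMENT_LITERAL = f"""<soapenv:Envelope xmlns:soapenv="{SOAP_ENV}">
  <soapenv:Body>
    <ns:getExampleDataRequest xmlns:ns="{EXAMPLE_NS}">
      <ns:dataset>EXAMPLE_DATASET</ns:dataset>
      <ns:start>2005-01-01T00:00:00Z</ns:start>
    </ns:getExampleDataRequest>
  </soapenv:Body>
</soapenv:Envelope>"""


def uses_soap_encoding(envelope_xml):
    """Return True if any element declares encodingStyle or an xsi:type."""
    root = ET.fromstring(envelope_xml)
    for element in root.iter():
        if (f"{{{SOAP_ENV}}}encodingStyle" in element.attrib
                or f"{{{XSI}}}type" in element.attrib):
            return True
    return False


print(uses_soap_encoding(RPC_ENCODED), uses_soap_encoding(DOCUMENT_LITERAL))
```

A check like this is only a rough heuristic for spotting encoded messages; actual WS-I conformance is determined from the WSDL binding, not from individual messages.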
Should We Add a REST-style Interface?
• Paul Prescod has been writing some convincing rebuttals to SOAP and recommending using straight HTTP (instead of hiding SOAP info behind HTTP), known as Representational State Transfer (REST)
  • <http://www.prescod.net/rest/rest_vs_soap_overview/>
  • <http://www.xfront.com/REST-Web-Services.html> (excellent review)
• HTTP/REST <http://en.wikipedia.org/wiki/Representational_State_Transfer>
  • stateless
  • self-descriptive (HTTP headers)
  • resource (object)-based (nouns)
  • delivers binary data easily
  • standard addressing for all resources (URIs)
  • output from one service can be easily used as the input to another
  • user friendly, easy to understand and to gin up a URL call in IDL or other scripting languages
• SOAP <http://en.wikipedia.org/wiki/SOAP>
  • operations-based (verbs)
  • tunnels through HTTP
  • RPC middleware
  • web service wrapper for legacy protocols
  • method call based (API)
• Berners-Lee says the first goal of the Web is to establish a shared information space, which can be defined by REST-style URIs. Every public resource should have a URI, preferably using noun-type names.
• Possible libraries for REST web services:
  • Tapestry <http://jakarta.apache.org/tapestry/>
  • Apache Jakarta Betwixt <http://jakarta.apache.org/commons/betwixt/>
  • Apache Labrador <http://sourceforge.net/projects/xml-labrador>
  • CognitiveWeb <http://sourceforge.net/projects/cweb>
  • AJAX-REST <http://sourceforge.net/projects/ajax-rest>
• Use of WebDAV’s resource versioning and locking, management of resource collections, and resource metadata management?
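The appeal of being able to gin up a URL call from a scripting language can be illustrated with a short sketch. The base URL, path layout (datasets/.../variables/...) and query parameter names below are hypothetical; no REST interface has been defined for our services. The point is only that a noun-style resource URI can be assembled with ordinary string handling.

```python
from urllib.parse import quote, urlencode


def resource_uri(base, dataset, variable, start, end, fmt="cdf"):
    """Build a noun-style URI for one variable of one dataset over a time span."""
    # Path segments name resources (nouns); each is percent-encoded separately.
    segments = ("datasets", dataset, "variables", variable)
    path = "/".join(quote(segment, safe="") for segment in segments)
    # The time span and output format select a representation of that resource.
    query = urlencode({"start": start, "end": end, "format": fmt})
    return f"{base.rstrip('/')}/{path}?{query}"


# Hypothetical server, dataset and variable names.
uri = resource_uri(
    "https://example.org/api",
    dataset="EXAMPLE_DATASET",
    variable="B_field",
    start="2005-01-01T00:00:00Z",
    end="2005-01-02T00:00:00Z",
)
print(uri)
```

Because the result is a plain URI, it can be bookmarked, passed to another service as input, or fetched with a single HTTP GET from IDL or any other language.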
Issues in Calling External Services
• When one web service calls another, it may lose some functionality in the called service and definitely changes user tracking; possible outcomes:
  • More comprehensive service calls simpler (subset of features): okay
  • Simpler service calls more powerful (superset of features): perhaps also provide a pointer to the more powerful service’s main interface to get to additional functionality (for instance, when a service selects one spacecraft in the called service and doesn’t show you could select multiple spacecraft)
  • Partially overlapping set of features between services: how to merge functionality?
• Social issues:
  • How to get effective credit (and usage statistics) for services when called by other services
  • How to give credit to other services that you are calling
  • How to assign responsibility to other services being called
  • How to call services requiring logins? pass passwords from one service to another?
  • How to handle incomplete capture of other services and datasets (appear more comprehensive than really are)?
  • How do you get a complete list of services in a given domain and maintain it? (distributed domains exponentially harder)
Our Newest Clients are Java-based: Is JavaScript (Especially AJAX) a Better Direction?
JavaScript application advantages
• Doesn’t require a separate Java install, which often requires administration privileges (although Java comes preinstalled on Mac OS X and Solaris)
• Loads quickly (Java is often slow to load the first time)
Java WebStart advantages
• Rich and responsive user interface, including sound and 3D
• Highly flexible with many libraries
Java3D vs Web3D or Flash
• Web3D <http://web3d.org/>: requires uncommon download
• Flash <http://www.macromedia.com/devnet/flash/>: commercial
• Java3D <https://java3d.dev.java.net/>: now open source, requires Java install
Should We Add More Analysis Functionality?
• What is the balance between ease of use and functionality (which adds complexity)?
• How much interactivity vs presenting the user with useful results up-front?
• Does CDAWeb+ sufficiently enhance interoperability and data analysis capabilities?
• Is the dataset-centered paradigm effective?
• Is it too difficult to use the large lists of results? How else can we shorten it?
• Will IT-challenged scientists understand how to use WebStart to start the Java client?
• What other search keys should we add? How else to identify data and services? Inventory level or variable level?
Next Steps for SPDF Services We’re Considering?
Web Services
• Add full SSCweb Locator and Query functions (complex multi-spacecraft queries)
• Add OMNIweb extended query (search activity indices) functions
• Add OpenDAP service (requires coding in CDF support) to facilitate full service of remote data
CDAWeb
• Add OMNI type functions (parameter search, inter-variable comparison, more plotting options)
• Create simple CDAWeb browser option (pre-selected variables and time ranges), especially for educational use
CDAWeb+
• Add sonification (for accessibility and as alternative mode for discovery) (see poster ED43B-0850)
• Add interactive plotting to the client (need good Java plot library, NOAA SGT?) instead of current server-side IDL plots as GIFs
• Does CDAWeb+ sufficiently enhance interoperability and data analysis capabilities? For example:
  • Is the dataset-centered paradigm effective?
  • Is it too difficult to use the large lists of results? How else can we shorten it?
  • Will IT-challenged scientists understand how to use WebStart to start the Java client?
  • What other search keys should we add? How else to identify data and services? Inventory level or variable level?
• Metadata population (much is inherently manual and tedious) [However, the use of separate metadata provides a powerful middle layer that enables an integrated user view]
• Explore calls to external web services (VSO, EGSO, CoSEC, VxOs)
Other
• Add data mining and statistics
• Data-model assimilation
• Reduce the substantial effort required in working with data providers to document data, collect metadata (perhaps web form filled out with defaults that would encourage providers to change to correct values?)