Registries and metadata-driven searches in a solar-terrestrial grid context
Rob Bentley (UCL/MSSL) and the EGSO Team
27-29 October 2004, Greenbelt MD
VOs in Space and Solar Physics Workshop
Outline
• Overview of EGSO
• Outline of the query requirements
• How metadata is used within the system
• The EGSO catalogues and Registries
• Example query
EGSO – European Grid of Solar Observations
• EGSO is a Grid test-bed related to a particular application
  • Designed to improve access to solar data for the solar physics and other communities
  • Addresses the generic problem of a distributed heterogeneous data set and a scattered user community
• Funded under the Information Society Technologies (IST) thematic priority of the EC’s Fifth Framework Program (FP5)
  • Started March 2002; duration of 36 months (or so)
• Involves 12+ groups in Europe and the US, led by UCL-MSSL
  • 4 in UK, 3 in France, 2+ in Italy, 1 in Switzerland, 2 in US
  • Several associate partners, mainly in the US
• Objectives include:
  • Building enhanced search capability for solar data
  • Supporting a user community scattered around the world
  • Providing access to data centres & observatories around the world
  • Where possible, providing the ability to process data at source
The Solar Virtual Observatory family
Partners and collaborators provide expertise in solar physics and IT
• UK: UCL-MSSL & UCL-CS, RAL, University of Bradford
• France: IAS (Orsay), Obs. de Paris-Meudon, International Space Univ. (Strasbourg)
• Italy: Istituto Nazionale di Astrofisica (INAF), Politecnico di Torino
  • INAF includes Obs. of Turin, Trieste, Florence and Naples
• Switzerland: University of Applied Sciences (Aargau)
• US: SDAC at NASA-GSFC, National Solar Observatory
  • SDAC and NSO are also part of the US VSO
• Belgium: Royal Observatory of Belgium
• Netherlands: ESA-ESTEC – Solar Group
• US VSO: Stanford University, Montana State University
• CoSEC: Lockheed-Martin
• VSPO: LEP at NASA-GSFC (Lab. Extraterrestrial Physics)
Accessible Providers
• Extracted TRACE data available to EGSO through CoSEC; need to improve links to VSO “sourced” datasets (e.g. for SHA, HAO/MLSO, etc.)
• Want to add access to space plasma data through VHO, VSPO, etc.
• Planning addition of BASS2000 (France), SolarNet (Italy), plus other optical & radio ground-based observatories in Europe and US
Use of solar observations
• The appearance of the Sun changes dramatically with wavelength
  • Emissions originate from different layers in the atmosphere and different physical phenomena
• For a complete picture we need to use as wide a range of observations as possible
  • Mixture of multi-wavelength observations from space- and ground-based platforms
• Identifying observations that match some User search criteria and then retrieving them are major problems
[Figure: solar images dated 16JAN03 and 31JAN03, labelled Heliosphere; Corona (2×10^6 K); Chromosphere-TR (8×10^4 K); Photosphere (6×10^3 K); Surface Magnetic Field]
Linking into the wider context
• Increasing desire to use solar data to study problems that span communities
  • Space weather: heliosphere, magnetosphere, ionosphere…
  • Climate physics
  • Planetary physics
  • Astrophysics
• Need to find ways of tying these data together
  • A single data model covering the whole solar system is not practical
  • Intersecting data models in a general pool should be possible
• EGSO is trying to achieve interoperability (at some level) with the space plasma community
  • Main purpose of bringing the VOs together…
Generic Query
• Identify suitable observations (many serendipitous)
  • Want to access as many different types of data as are available
  • Identification should be possible without accessing the data
  • Existing cataloguing differs in quality, contents, and dependencies
  • Data volumes are increasing rapidly - SDO will produce 2 TB/day
  • User only wants to know if data addressing a problem exist
• Locate the data
  • Data scattered, with differing means of access (some proprietary)
  • Large and small data providers, with varying resources
• Process the data
  • Involves extraction and calibration of a subset of raw data
  • Often only need a subset of each data set
  • Uses code defined by instrument teams (SolarSoft, C…)
• Return results to the User
  • Compare results from different instruments
  • SolarSoft (IDL) provides a standard platform for analysis
Note: the 3rd and 4th bullets exchange order in the Grid expression of the problem
The EGSO Search Engine
EGSO is improving the quality and availability of metadata
• Enhanced cataloguing describes the data more fully
  • Standardized versions of observing catalogues (UOC) tie together the heterogeneous data sets
  • Search Registry is an abstraction of entries in the UOC and allows narrowing of the search in initial stages
• New types of catalogue allow searches on events, features and phenomena, not just date & time, pointing, etc…
  • Solar Event Catalogue (SEC) - derived from published lists
  • Solar Feature Catalogue (SFC) - generated by feature recognition
• Ancillary data used to provide additional search criteria
  • QLK Server provides Phone book access to images, time-series, derived products, etc.; can also do limited processing
  • DSO Server gives Yellow Page information on instruments, etc.
Similar hierarchical cataloguing required in other data Grid projects
Catalogue relationships
[Diagram: boxes for SR, DSO, UOC, SEC/SFC, QLK and Data Archives, linked by arrows labelled “Built from” and “Manually Built”]
• SR: Time coverages
  • instrument
  • observing date start
  • observing date end
  • observing parameter name
  • observing parameter value
  • data source
• DSO: Database of solar observations
  • instrument
  • observatory
  • EGSO available?
  • observing location
  • observing interval
  • description ….
• UOC: Solar Observations
  • date start
  • date end
  • wavelengths
  • coordinates
  • ….many more relevant characteristics needed for searches
• SEC/SFC: Event/Feature Catalogs
  • catalog name
  • event name
  • observing date
  • description ….
The objective of the improved metadata, etc. is to be able to pose questions like:
Identify events when a filament eruption occurred within 30° of the north-west limb and there were good observations in H-alpha, EUV and soft X-rays
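A minimal sketch of how the example question might be evaluated once event and observing metadata sit in catalogues. The record layouts, the function names and the sample rows are hypothetical, not the EGSO schema. It also assumes that "within 30° of the north-west limb" means a position angle within 30° of 315° (measured from solar north through east), and it does not model what makes an observation "good".

```python
from dataclasses import dataclass
from datetime import datetime

# Assumption: position angle is measured from solar north through east,
# which puts the north-west limb at 315 degrees.
NW_LIMB_PA = 315.0


@dataclass(frozen=True)
class Event:
    """Hypothetical event-catalogue record."""
    name: str
    kind: str
    time: datetime
    position_angle: float  # degrees


@dataclass(frozen=True)
class Observation:
    """Hypothetical observing-catalogue record."""
    waveband: str
    start: datetime
    end: datetime


def angular_separation(a: float, b: float) -> float:
    """Smallest separation in degrees between two position angles."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def matching_events(events, observations, kind, wavebands, max_sep=30.0):
    """Names of events of the given kind near the NW limb that were
    observed in every requested waveband at the time of the event."""
    hits = []
    for ev in events:
        if ev.kind != kind:
            continue
        if angular_separation(ev.position_angle, NW_LIMB_PA) > max_sep:
            continue
        covered = {o.waveband for o in observations
                   if o.start <= ev.time <= o.end}
        if set(wavebands) <= covered:
            hits.append(ev.name)
    return hits


# Placeholder rows, invented purely for illustration.
day = (datetime(2003, 1, 16, 0, 0), datetime(2003, 1, 16, 23, 59))
events = [
    Event("E1", "filament eruption", datetime(2003, 1, 16, 12, 0), 330.0),
    Event("E2", "filament eruption", datetime(2003, 1, 16, 12, 0), 90.0),
]
observations = [Observation(band, *day)
                for band in ("H-alpha", "EUV", "soft X-ray")]

found = matching_events(events, observations, "filament eruption",
                        ["H-alpha", "EUV", "soft X-ray"])
```

The point of the sketch is that the whole question is answered from metadata alone; no data file is opened until the matching events are known.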
Simplified Architecture
• Architecture defined in terms of three roles: Consumer, Broker and Provider
• Consumer supports GUI and API access
• Brokers manage the metadata, and decide on and allocate resources
• Archive access can be by FTP, HTTP, Web Services, cgi-bin… through adaptor modules
• Special Providers: SEC, SFC, UOC, DSO, QLK, CoSEC
[Diagram after R. Linsolas, IAS; labels: Results, Archives, EGSO Grid, Cat., Provider, GUI, Consumer, Broker, API]
The UOC and Search Registry
• Unified Observing Catalogues (UOC)
  • Unified (metadata) form of observing catalogues used to tie together the heterogeneous data, leaving the data unchanged
  • Increase interoperability by expressing coordinates in “standard” formats
  • Self describing, quantized by time and instrument, with no dependencies on ancillary data or proprietary software (and with any errors corrected)
  • Standards defined for future data sets (e.g. STEREO, ILWS, Solar-B)
• Search Registry is an abstraction of entries in the UOC
  • Registry allows the Broker to identify instruments that:
    • have data properties matching the search - Static SR
    • probably have observations during search time interval - Dynamic SR
  • Reduces need to interact with Data Providers that are unlikely to have data matching the search
• Static Search Registry (sSR) is able to support access to different types of data from solar and heliospheric observations
  • Tries to describe instrument capabilities & observing objectives in a common way
  • Location of the observation platform is more important once space plasma data are included, and for some future solar missions
  • First step in search - later steps can be dealt with by other VOs
  • EGSO sSR includes instruments on Ulysses, ACE, Cassini, SDO, STEREO…
The reasoning behind the UOC is universal and allows observations to be described in a more interoperable way
Significant similarities in the way we use data mean that the static Search Registry can be used to tie solar and heliospheric data together
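The two-stage narrowing described above can be sketched as two set filters. This is an illustration under stated assumptions, not the EGSO implementation: the `StaticEntry` and `Coverage` records, the function names and the `INSTR_*` instruments are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class StaticEntry:
    """Hypothetical static SR record: what an instrument is able to observe."""
    instrument: str
    domain: str   # e.g. "corona", "heliosphere"
    entity: str   # e.g. "photons", "particles", "fields"


@dataclass(frozen=True)
class Coverage:
    """Hypothetical dynamic SR record: an interval with observations."""
    instrument: str
    start: datetime
    end: datetime


def static_match(entries, domain, entity):
    """Instruments whose described capabilities match the search."""
    return {e.instrument for e in entries
            if e.domain == domain and e.entity == entity}


def dynamic_match(coverages, instruments, start, end):
    """Of those instruments, the ones with coverage overlapping [start, end]."""
    return {c.instrument for c in coverages
            if c.instrument in instruments
            and c.start <= end and c.end >= start}


# Placeholder registry contents, invented for illustration only.
static_sr = [
    StaticEntry("INSTR_A", "corona", "photons"),
    StaticEntry("INSTR_B", "heliosphere", "particles"),
    StaticEntry("INSTR_C", "corona", "photons"),
]
dynamic_sr = [
    Coverage("INSTR_A", datetime(2003, 1, 1), datetime(2003, 1, 31)),
    Coverage("INSTR_C", datetime(2003, 3, 1), datetime(2003, 3, 31)),
]

candidates = static_match(static_sr, "corona", "photons")
observing = dynamic_match(dynamic_sr, candidates,
                          datetime(2003, 1, 16), datetime(2003, 1, 17))
```

Only the instruments left in `observing` would need a request to a Data Provider, which is the saving the Search Registry is meant to deliver.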
Contents of the Static Search Registry
• Instrument and Observatory
  • Includes space and ground based, solar and heliospheric
• Observing Domain
  • What the instrument is trying to observe
  • Solar disk, interior, corona, heliosphere, magnetosphere
• Observable entity
  • Photons, particles, fields with sub-divisions
• Common terms
  • Imager, spectrometer, polarimeter, coronagraph
  • Oscillations, waves, H-alpha, composition, irradiance…
• Information related to the location of observatories and the operating interval of the observatories & instruments is held in a separate table
Finding the data
• Data could be located anywhere in the world
  • User only needs to know observations exist, not where they are located
  • System should isolate the user from the intricacies of access
• System should be able to optimize use of sources
  • Handling of replicated data and aggregated sources
  • Choice of source - most capable, closest, least used, etc.
  • Must respect any data use policies (proprietary data) and ensure integrity of data providers
  • Burden on data providers minimized to encourage participation
• In EGSO, data sources are interfaced by the Provider Role
• Information about the data sources is held in the Data Registry - this is managed by the Broker role
  • Which instrument data sets are hosted by each data archive
  • Which data archives are interfaced to each Provider Role
• Provider Role uses adaptor modules to handle different access protocols
  • Standardizes the way a data source interface appears to the system and simplifies addition of new data sources
  • Also allows access to data “hosted” by other VOs
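As a rough sketch of the adaptor idea, each access protocol can sit behind one common interface, with a registry recording which archive hosts which instrument. The class names, the URL layouts, the `archive-*.example` hosts and the `INSTR_A` instrument are hypothetical; nothing here reflects the actual EGSO Provider code, and no network access is made.

```python
from abc import ABC, abstractmethod


class Adaptor(ABC):
    """Common interface the rest of the system sees, whatever the protocol."""

    @abstractmethod
    def locate(self, host: str, instrument: str, date: str) -> str:
        """Return an access location for one instrument and date."""


class FtpAdaptor(Adaptor):
    def locate(self, host, instrument, date):
        # Assumed directory layout, for illustration only.
        return f"ftp://{host}/{instrument}/{date}/"


class HttpAdaptor(Adaptor):
    def locate(self, host, instrument, date):
        # Assumed query-string layout, for illustration only.
        return f"http://{host}/data?instrument={instrument}&date={date}"


# One adaptor per protocol; a new protocol only needs a new entry here.
ADAPTORS = {"ftp": FtpAdaptor(), "http": HttpAdaptor()}

# Hypothetical Data Registry: instrument -> [(archive host, protocol), ...]
DATA_REGISTRY = {
    "INSTR_A": [("archive-one.example", "ftp"),
                ("archive-two.example", "http")],
}


def locate_data(instrument: str, date: str) -> list:
    """All known access locations for an instrument on a given date."""
    return [ADAPTORS[protocol].locate(host, instrument, date)
            for host, protocol in DATA_REGISTRY.get(instrument, [])]


locations = locate_data("INSTR_A", "2003-01-16")
```

Because callers only ever use `locate_data`, a replicated data set shows up as several entries in the returned list, and the choice between them (closest, least used, and so on) can be made without knowing the protocol.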
EGSO GUI
• GUI supports Event Driven and Date Driven queries
  • Others being added
• Series of portlets allow users to tailor their search depending on the thrust of their query
  • Wavelengths, Instruments…
  • Advanced static Search Registry currently being deployed
• API will access similar capabilities
EGSO GUI
• GUI supports Event Driven and Date Driven queries
  • Others being added
• Event Driven search returns a list of events the user can examine and then select
• A list of instruments that made observations is returned
• After instruments are selected, a list of files is returned
Query Work Flow
• User specifies query through the GUI or API
• Static Search Registry narrows the search, based on the criteria specified in the query
  • Identifies instruments that make the desired type of observations
  • Search can include solar, heliospheric, etc… instruments
• Dynamic Search Registry determines (at some granularity) which were actually observing
  • Includes pointing, observatory location…
• User is returned a list of the identified instruments to refine the selection
• Data Registry used to locate the archive holding the instrument data and make the data request
• List of files returned that can be retrieved, used to generate data products, etc.
  • Convert to processed products if required
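The work flow above can be read as a chain of narrowing steps. The sketch below strings them together with plain dictionaries standing in for the three registries. The function `run_query`, its arguments, the registry layouts and the `INSTR_*` and `archive-*.example` names are assumptions made for illustration, and the final file-listing request to the archive is left out.

```python
def run_query(query, static_sr, dynamic_sr, data_registry,
              refine=lambda names: names):
    """Narrow a query step by step and return {instrument: archive}.

    static_sr:     instrument -> set of observation types it can make
    dynamic_sr:    instrument -> set of dates on which it observed
    data_registry: instrument -> archive holding its data
    refine:        stand-in for the user's selection in the GUI or API
    """
    # 1. Static Search Registry: instruments making the desired observation.
    candidates = sorted(name for name, kinds in static_sr.items()
                        if query["observation"] in kinds)
    # 2. Dynamic Search Registry: which of those were actually observing.
    observing = [name for name in candidates
                 if query["date"] in dynamic_sr.get(name, ())]
    # 3. The user refines the list of identified instruments.
    selected = refine(observing)
    # 4. Data Registry: locate the archive for each selected instrument.
    return {name: data_registry[name]
            for name in selected if name in data_registry}


# Placeholder registry contents, invented for illustration only.
static_sr = {
    "INSTR_A": {"imager"},
    "INSTR_B": {"imager", "spectrometer"},
    "INSTR_C": {"coronagraph"},
}
dynamic_sr = {
    "INSTR_A": {"2003-01-16"},
    "INSTR_B": {"2003-01-16", "2003-01-31"},
}
data_registry = {
    "INSTR_A": "archive-one.example",
    "INSTR_B": "archive-two.example",
}

result = run_query({"observation": "imager", "date": "2003-01-31"},
                   static_sr, dynamic_sr, data_registry)
```

Passing a different `refine` callable, for example one that keeps only the first instrument, mimics the user narrowing the selection before any archive is contacted.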
Summary
• Although this talk mainly presented the EGSO approach, the structure allows cross-coupling with other VOs
  • Layered metadata provides several entry points
  • Much in common with other VOs; differences are in detail and need
• Information and resources are already exchanged between the solar VO projects
  • Approaches of the solar VOs are very complementary
• Need to link better into the space plasma, etc. VOs
  • Main objective of this meeting is to find common ground in the way we describe and handle data to facilitate interoperability
Useful URLs
• General information about EGSO can be found under URL: http://www.egso.org
• The different components of the EGSO system, including the main entry portal, can be accessed from URL: http://www.egso.org/demo
So, where are we?
• Release 4a of EGSO has recently become available
  • People welcome to try this…
• Flexible GUI using selectable portlets has been deployed
  • User able to conduct date driven and event driven searches
  • Event driven search accesses EGSO Solar Event Catalogue
  • System searches for datasets that match search criteria
  • User able to make selections at each stage of the search with aid of supporting data
• Search Registry (SR) is still being developed
  • Simplified version currently installed in GUI
  • More complex version in preparation (Release 4b)
  • Fully functional Search Registry will allow comprehensive selection of the types of instrument, data, region observed, etc.
  • Interoperable with the STP & heliospheric observations
• Support for refining selection of data for time and spatial coordinates using cursors will be added shortly
So, where are we? (cont…)
• Special Servers are becoming operational
  • SEC Server is fully integrated into EGSO
    • Web Service accessible relational database - SQL in, VOTable out
    • Several lists are already included: Flares, particle events, CMEs…
    • Hope to add many more in the future
  • DSO and SFC Servers functional, but integration not yet complete
    • Both available as Web interfaces; Web Service interfaces planned
    • Over 3 years of data processed for Solar Feature Catalogue
  • QLK Server is still under development
    • Processing using CoSEC partially integrated
    • GOES X-ray & energetic particle lightcurves generated for GUI
    • In process of adding ability to:
      • generate quicklook images & movies
      • extract and process certain datasets
      • create composite plots for use in searches and publications
• Several popular data sources have already been integrated
  • Space-based: Yohkoh, SOHO & RHESSI
  • Ground-based: NSO & Global H-alpha Network (GHAN)
  • Planning to add TRACE & numerous GBO sources in near future
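To illustrate what "SQL in, VOTable out" means for the SEC Server, here is a small self-contained sketch that runs a SQL query against an in-memory SQLite table and wraps the rows in a bare VOTable skeleton. It is not the SEC Server's interface: the `events` table, its columns, the sample rows and the function `sql_to_votable` are hypothetical, every column is declared as a string for simplicity, and the output is not validated against the VOTable schema.

```python
import sqlite3
import xml.etree.ElementTree as ET


def sql_to_votable(conn, sql):
    """Run a SQL query and return the rows as a minimal VOTable string."""
    cursor = conn.execute(sql)
    votable = ET.Element("VOTABLE", version="1.1")
    table = ET.SubElement(ET.SubElement(votable, "RESOURCE"), "TABLE")
    # One FIELD per result column; all treated as variable-length strings.
    for column in cursor.description:
        ET.SubElement(table, "FIELD", name=column[0],
                      datatype="char", arraysize="*")
    tabledata = ET.SubElement(ET.SubElement(table, "DATA"), "TABLEDATA")
    for row in cursor:
        tr = ET.SubElement(tabledata, "TR")
        for value in row:
            ET.SubElement(tr, "TD").text = str(value)
    return ET.tostring(votable, encoding="unicode")


# Hypothetical event table with placeholder rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (kind TEXT, start TEXT)")
conn.executemany("INSERT INTO events VALUES (?, ?)",
                 [("flare", "2003-01-16T00:00"),
                  ("CME", "2003-01-31T00:00")])

xml_text = sql_to_votable(
    conn, "SELECT kind, start FROM events WHERE kind = 'flare'")
```

The attraction of this pairing is that the client needs to know only SQL and one tabular XML format, so the same request and response shapes can serve flares, particle events, CMEs and any list added later.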