EGSO European Grid of Solar Observations. Simon Martin Rutherford Appleton Laboratory CDS Users Meeting, NAM 2003. Outline. Introduction Problem, EGSO, Grids EGSO details Enhanced solar catalogues Processing data sets CDS/Spectral data Summary Progress Conclusions
EGSO European Grid of Solar Observations Simon Martin Rutherford Appleton Laboratory CDS Users Meeting, NAM 2003
Outline • Introduction • Problem, EGSO, Grids • EGSO details • Enhanced solar catalogues • Processing data sets • CDS/Spectral data • Summary • Progress • Conclusions • Questions/suggestions
Generic Problem Of Solar Physics • Observations used to build up a picture of the plasma in multi-dimensional parameter space (incl. x, y, z, t, T & ) • Users need access to as many wavelengths as possible • Data centres and observatories located around the world • Difficult to find and access all relevant data • Very heterogeneous data sets/catalogues • Increasing data volumes (ILWS > 1.0 TB/day) • Large and small data centres (with varying resources) • Users scattered around the world • They should not need to know where the data are located • Capabilities/resources of users' computing vary greatly
Virtual Solar Observatories – A Solution • Make all archives ‘speak the same language’ • Consistent UI, search and analysis tools • Some objectives: • Users are aware of, and have access to, all available data • Searches for required data can be made using metadata alone – the data are only accessed later • The results of a search can be used to retrieve data from several sources simultaneously, preferably in a processed form • New data can be added with little or no impact on the users • Several related projects in the solar community: • The European Grid of Solar Observations (EGSO, EC funded) • US Virtual Solar Observatory (US-VSO, funded by NASA) • Collaborative Sun-Earth Connector (CoSEC, funded by NASA under ILWS) • Approach and emphasis of the projects differ • EGSO is the largest and is collaborating closely with the others
EGSO • EGSO is a grid ‘testbed’ which will federate solar data centres, forming a single ‘virtual archive’ • EGSO will lay the foundations for a ‘virtual solar observatory’ • EGSO will provide tools for searching and analysing these solar data • EGSO will improve access to solar data • Users do not need knowledge of individual archives • Ten partners in Europe and the US, led by UCL-MSSL • EC funded project • 3 in UK, 2 in France, 2 in Italy, 1 in Switzerland, 2 in US • Several associate partners, mainly in the US • EGSO and the US VSO are planning to collaborate closely
Grids • A grid is essentially a network of shared computers and devices (grid resources) • Grid resources are heterogeneous and distributed • Resources include supercomputers, laptops, desktops, handheld devices, instruments… • To a user, these resources are available in a simple, transparent and secure way • The grid deals with locations, security and heterogeneity on behalf of the user • Analogy with electricity grids
Grids (contd.) • Grids must be scalable, fault tolerant, secure • Types of Grids: • Computational e.g. SETI@home • Data e.g. EU DataGrid • Service – service not provided by a single machine e.g. MRI scanner • Collaborative e.g. Access Grid • EGSO is largely a data grid, but also a service grid
Obtaining Data From EGSO • Identify suitable observations • Search EGSO catalogues using GUI, search & visualisation tools • Refine search • Locate the data • Grid locates and accesses data • Process the data • Extraction and calibration of data • Custom processing • Retrieve the data • Data returned to user
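The four steps on this slide can be sketched as a small pipeline. This is only an illustration under stated assumptions: the catalogue entries, the archive names and the functions `identify`, `locate`, `process` and `retrieve` are hypothetical and are not part of any EGSO interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """One catalogue entry (hypothetical fields)."""
    instrument: str
    date: str        # ISO date of the observation
    provider: str    # archive that holds the file
    size_mb: float   # size of the raw file


# Toy catalogue standing in for the EGSO catalogues (made-up entries).
CATALOGUE = [
    Observation("CDS", "2003-03-01", "archive-A", 40.0),
    Observation("EIT", "2003-03-01", "archive-B", 8.0),
    Observation("CDS", "2003-03-02", "archive-A", 35.0),
]


def identify(catalogue, instrument, date):
    """Step 1: search the metadata only; no data files are touched."""
    return [o for o in catalogue
            if o.instrument == instrument and o.date == date]


def locate(observations):
    """Step 2: group the matches by the provider that holds them."""
    by_provider = {}
    for obs in observations:
        by_provider.setdefault(obs.provider, []).append(obs)
    return by_provider


def process(obs, keep_fraction=0.1):
    """Step 3: stand-in for extraction/calibration at source.

    It only models the size reduction of the returned product.
    """
    return {"instrument": obs.instrument, "date": obs.date,
            "size_mb": obs.size_mb * keep_fraction}


def retrieve(catalogue, instrument, date):
    """Step 4: return the processed products to the user."""
    located = locate(identify(catalogue, instrument, date))
    return [process(o) for group in located.values() for o in group]


products = retrieve(CATALOGUE, "CDS", "2003-03-01")
print(products)
```

The point of the ordering is that only step 3 touches the data themselves; steps 1 and 2 work on metadata alone.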
Identifying Suitable Observations (1) • In order to provide an enhanced search capability, EGSO will improve the quality and availability of metadata • Enhanced “cataloguing” describes the data more fully • Standardized metadata versions of observing catalogues tie together the heterogeneous data sets from different instruments • New types of catalogue allow searches on events, features and phenomena rather than just date & time, pointing, etc. • Ancillary data used to provide additional search criteria • Images, time series, derived products, etc. • Search Registry describes all metadata available for search, improving performance.
Identifying Suitable Observations (2) • EGSO’s enhanced catalogues, based on new metadata standards, allow detailed searches to be conducted for many instruments, or by event/feature: • Unified Observing Catalogues (UOC) • Metadata form of observing catalogues used to tie together the heterogeneous data, leaving the data unchanged • Self-describing (e.g. XML), quantised by time and instrument, with no dependencies on ancillary data or proprietary software, and with any errors corrected (pre-processing) • Standards defined for future data sets (e.g. STEREO, ILWS, Solar-B) • Solar Event Catalogues (SEC) • Built from information contained in published lists • Flare lists, CME lists, lists in SGD, etc. • Solar Feature Catalogue (SFC) • Lists of the occurrence of events, phenomena and features provide an alternative means of selecting data • Derived using image recognition software
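As an illustration of what a self-describing catalogue entry might look like, the sketch below parses one XML record using only the Python standard library. The element names, attribute names and values are invented for this example; they are not taken from the EGSO metadata standards.

```python
import xml.etree.ElementTree as ET

# Hypothetical self-describing entry. Units travel with the values, so a
# reader needs no ancillary files or proprietary software to interpret it.
UOC_ENTRY = """
<observation instrument="CDS" observatory="SOHO">
  <start>2003-03-01T10:00:00</start>
  <end>2003-03-01T10:30:00</end>
  <pointing unit="arcsec" x="120.0" y="-340.0"/>
  <wavelength unit="angstrom" min="308.0" max="381.0"/>
</observation>
"""


def parse_entry(xml_text):
    """Turn one XML catalogue entry into a plain dictionary."""
    root = ET.fromstring(xml_text.strip())
    pointing = root.find("pointing")
    wave = root.find("wavelength")
    return {
        "instrument": root.get("instrument"),
        "observatory": root.get("observatory"),
        "start": root.findtext("start"),
        "end": root.findtext("end"),
        "pointing": (float(pointing.get("x")), float(pointing.get("y"))),
        "pointing_unit": pointing.get("unit"),
        "wavelength_range": (float(wave.get("min")), float(wave.get("max"))),
        "wavelength_unit": wave.get("unit"),
    }


entry = parse_entry(UOC_ENTRY)
print(entry)
```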
Identifying Suitable Observations (3) • Pre-processing images to eliminate errors in UOC • Image (right) shows a poor quality Meudon H-alpha image with circle showing position according to FITS header information • Other problems include defects in data, weather (ground-based), image shape…
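One simple way to detect the kind of positional error described on this slide is to compare the disc centre recorded in the header with a centre measured from the image itself. The sketch below does this on a synthetic image with a brightness-threshold centroid; the header keys, the threshold and the one-pixel tolerance are assumptions for illustration, and this is not the method EGSO uses.

```python
def disc_centre(image, threshold):
    """Centroid (x, y) of all pixels brighter than `threshold`."""
    sum_x = sum_y = count = 0
    for y, row in enumerate(image):
        for x, value in enumerate(row):
            if value > threshold:
                sum_x += x
                sum_y += y
                count += 1
    if count == 0:
        raise ValueError("no pixels above threshold")
    return sum_x / count, sum_y / count


def make_disc(size, cx, cy, radius):
    """Synthetic test image: a uniform bright disc on a dark background."""
    return [[1.0 if (x - cx) ** 2 + (y - cy) ** 2 <= radius ** 2 else 0.0
             for x in range(size)]
            for y in range(size)]


# The disc really sits at (36, 30), but the (hypothetical) header says (32, 32).
image = make_disc(64, cx=36, cy=30, radius=12)
header = {"centre_x": 32.0, "centre_y": 32.0}

measured = disc_centre(image, threshold=0.5)
offset = (measured[0] - header["centre_x"], measured[1] - header["centre_y"])
needs_correction = max(abs(offset[0]), abs(offset[1])) > 1.0  # 1-pixel tolerance
print(measured, offset, needs_correction)
```

A plain centroid only works when the whole disc is visible and the background is clean; real ground-based images with clouds or a clipped disc would need a more robust limb fit.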
Identifying Suitable Observations (4) • GUI provided for defining queries: • Simple queries may be date/time, wavelength etc. • Synoptic images may be used to select pointing (next slide) • Results from the initial search will be accompanied by Quick-look images to help refine the query • More complex queries can be formulated since the catalogues are stored in a relational database (SQL) • Can also search by SEC, SFC
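Because the catalogues are held in a relational database, an event-based search can be expressed as a join between an observing catalogue and an event catalogue. The sketch below uses an in-memory SQLite database; the table names (`uoc`, `sec`), the column names and the rows are hypothetical and do not reflect the actual EGSO schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE uoc (
    id INTEGER PRIMARY KEY,
    instrument TEXT,
    t_start TEXT,   -- ISO 8601 timestamps sort correctly as text
    t_end TEXT
);
CREATE TABLE sec (
    id INTEGER PRIMARY KEY,
    event_type TEXT,
    t_peak TEXT
);
INSERT INTO uoc VALUES
    (1, 'CDS', '2003-03-01T10:00:00', '2003-03-01T10:30:00'),
    (2, 'EIT', '2003-03-01T12:00:00', '2003-03-01T12:05:00'),
    (3, 'CDS', '2003-03-02T09:00:00', '2003-03-02T09:45:00');
INSERT INTO sec VALUES
    (1, 'flare', '2003-03-01T10:12:00'),
    (2, 'CME',   '2003-03-02T15:00:00');
""")

# Find every observation whose time window contains the peak of a flare.
QUERY = """
SELECT u.instrument, u.t_start, s.event_type
FROM uoc AS u
JOIN sec AS s ON s.t_peak BETWEEN u.t_start AND u.t_end
WHERE s.event_type = ?
ORDER BY u.t_start
"""
rows = conn.execute(QUERY, ("flare",)).fetchall()
print(rows)
conn.close()
```

A simple date/time query is the same statement without the join; the event catalogue is what lets the user ask for "observations during flares" rather than supplying times by hand.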
EGSO – Query Resolving • [Architecture diagram. Components shown: (G)UI; Search Registry; Query Generator; Search Query Resolver; Catalogue Warehouse (cache) holding the UOC, SEC, SFC and ancillary catalogues; Search Info. Requestor; Data Requests; Providers (observing catalogues, etc.); Summary Images, etc.] • Nature of searches and User Interface will be derived from Use Cases • Exact nature of interface to providers under review (Grid and/or P2P)
Processing Data • Current archives have amassed large quantities of data, and future missions will generate huge amounts of data • Moving these data across networks is clearly undesirable • EGSO aims to increase access to solar data • If the user needs to process the data, the demands placed on the user's hardware become too great, and uptake may be poor • As far as possible, process the data at source • Involves extraction and calibration of a subset of the raw data • Software for processing defined by instrument team (IDL, C…) • Standard processing e.g. image cleaning, Quick-look • Processing reduces the volume of data moved around • Simplifies requirements on user’s own system • Special processing, including the ability to upload custom code, will also be possible. Users can perform investigations on large data sets on dedicated processing facilities. • e.g. Using image recognition techniques to search for bright points on CDS images • Data mining
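A rough worked example shows why extraction at source matters. The sketch below cuts a region of interest out of a full-frame image and compares the number of bytes that would have to be transferred. The image dimensions, the 2 bytes per pixel and the function names are assumptions chosen for the arithmetic, not properties of any particular instrument.

```python
def extract_region(image, x0, y0, width, height):
    """Return the sub-image starting at (x0, y0) with the given size."""
    return [row[x0:x0 + width] for row in image[y0:y0 + height]]


def size_bytes(image, bytes_per_pixel=2):
    """Uncompressed size of an image stored as a list of rows."""
    return sum(len(row) for row in image) * bytes_per_pixel


# Hypothetical 1024 x 1024 frame held at the data centre.
full_frame = [[0] * 1024 for _ in range(1024)]

# The user only needs a 128 x 128 region around the target.
cutout = extract_region(full_frame, x0=400, y0=300, width=128, height=128)

full_size = size_bytes(full_frame)    # 1024 * 1024 * 2 = 2,097,152 bytes
cutout_size = size_bytes(cutout)      # 128 * 128 * 2 = 32,768 bytes
reduction = full_size / cutout_size   # factor of 64
print(full_size, cutout_size, reduction)
```

In this toy case, cutting out the region before transfer reduces the traffic by a factor of 64, and the user's machine never has to hold the full frame.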
CDS/Spectral Data • Can search CDS archive in terms of time and position • Need to have a target in mind • Spectral data are hard to specify in metadata terms • Adding metadata to spectral data would allow more complex queries to be resolved • e.g. searching for data sets with particularly strong spectral lines • What metadata could be used? • Velocities (derived), emission line strengths, spectral type classification (derived) • Metadata could be applied to other spectral data, including ground-based data • Processing • Common processing requirements, others (e.g. CHIANTI?)
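To make the idea of derived spectral metadata concrete, the sketch below computes two candidate quantities from a single line profile: an integrated line strength and a Doppler velocity from the line centroid, using v = c (λ_obs − λ_0) / λ_0. The profile is synthetic, and the rest wavelength, background estimate and function name are assumptions for illustration; this is not CDS calibration software.

```python
import math

C_KM_S = 299792.458  # speed of light in km/s


def line_metadata(wavelengths, intensities, rest_wavelength):
    """Derive simple metadata from one emission line on a uniform grid."""
    background = min(intensities)              # crude flat-background estimate
    net = [i - background for i in intensities]
    total = sum(net)
    if total <= 0:
        raise ValueError("no emission above background")
    step = wavelengths[1] - wavelengths[0]
    centroid = sum(w * n for w, n in zip(wavelengths, net)) / total
    # Positive velocity means a redshift (motion away from the observer).
    velocity = C_KM_S * (centroid - rest_wavelength) / rest_wavelength
    return {"line_strength": total * step,
            "centroid": centroid,
            "velocity_km_s": velocity}


# Synthetic Gaussian line: amplitude 100, width 0.1, centred 0.05 redward
# of a hypothetical rest wavelength of 630.0 (same units as the grid).
grid = [629.0 + 0.01 * k for k in range(201)]
profile = [5.0 + 100.0 * math.exp(-0.5 * ((w - 630.05) / 0.1) ** 2)
           for w in grid]

meta = line_metadata(grid, profile, rest_wavelength=630.0)
print(meta)
```

Stored alongside the time and pointing of each raster, values like these would let a query ask for "data sets with strong lines" or "large redshifts" without opening the spectra.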
EGSO Progress • First-year review (EC funded) just completed successfully in Brussels. • System architecture complete; now working on: • Image recognition techniques to form new catalogues • Metadata for catalogues • Implementing aspects of architecture • Demonstration due Summer 2003.
Summary • EGSO will improve access to solar data • Scattered heterogeneous datasets can be found and accessed by anyone without detailed knowledge of multiple archives • EGSO can help with work on large data sets by: • Reducing the number of files downloaded by use of improved catalogues and search facilities • Reducing the amount of data to be transferred by processing/extraction/calibration ‘at source’ • Offering processing facilities & power (grid services) • Spectral data (e.g. CDS) are not so straightforward.
Questions/Suggestions www.egso.org simon.martin@rl.ac.uk