The need for Automatic Feature Recognition in the EGSO Project. Bob Bentley, Valentina Zharkova, Jean Aboudarham & the EGSO Team Feature Recognition Workshop 23-24 October 2003, ROB. Outline. Overview of EGSO and its relationship with other projects The problem being addressed by EGSO
The need for Automatic Feature Recognition in the EGSO Project Bob Bentley, Valentina Zharkova, Jean Aboudarham & the EGSO Team Feature Recognition Workshop 23-24 October 2003, ROB
Outline • Overview of EGSO and its relationship with other projects • The problem being addressed by EGSO • EGSO Search capability • Feature Recognition software • Current status of EGSO
EGSO – European Grid of Solar Observations • EGSO is a Grid test-bed related to a particular application • Designed to improve access to solar data for the solar physics and other communities • Addresses the generic problem of a distributed heterogeneous data set and a scattered user community • Funded under the Information Society Technologies (IST) thematic priority of the EC’s Fifth Framework Program (FP5) • Started March 2002; duration of 36 months • Involves 11 groups in Europe and the US, led by UCL-MSSL • 4 in UK, 2 in France, 2 in Italy, 1 in Switzerland, 2 in US • Several associate partners, mainly in the US • EGSO is interacting with many other projects • US-VSO CoSEC and EGSO working closely together • Also working with the ILWS/SDO Project • Collaborated with ESA’s study project SpaceGRID • Involved with other EC funded Grid projects through GRIDSTART
Objectives of EGSO • Support user community scattered around the world • Current and future projects are international collaborations • EGSO funded by EC, but has US partners • Provide access to solar data centres and observatories around the world • Data available in Europe (or US) not enough for many studies • Build enhanced search capability for solar data • Analysis of solar data is event driven • Search capability linked to this not currently available • Similar approach also required for STP, etc. data • Increasing data volumes, etc. require new methodology • Provide ability to process data at source • Both pipeline and more complex processing Need for these capabilities not unique to solar physics…
The extended EGSO family Partners provide expertise in solar physics and IT • UK • UCL-MSSL, UCL-CS, RAL, Univ. Bradford, Astrium • France • IAS (Orsay), Observatoire de Paris-Meudon • Italy • Istituto Nazionale di Astrofisica, Politecnico di Torino • INAF includes observatories of Turin, Florence, Naples and Trieste • Switzerland • Univ. Applied Sciences (Windisch) • Netherlands • ESA – Solar Group • US • SDAC (NASA-GSFC), National Solar Observatory (VSO) • Stanford University, Montana State University (VSO) • Lockheed-Martin (CoSEC)
Connections with related Projects • Virtual Solar Observatory (US-VSO) • SDAC (NASA-GSFC) and NSO are partners in EGSO • Joe Gurman (SDAC) is VSO Project Scientist • Frank Hill (NSO) is lead investigator • EGSO Coordinator is Chair of VSO Steering Committee • Differences in scale in objectives – big/small box… • Collaborative Sun-Earth Connector (CoSEC) • EGSO Coordinator is a CoI on the recently funded CoSEC continuation grant • Significant synergies between EGSO and CoSEC • The three projects are trying to collaborate closely • Joint sessions at AAS and AGU; regular telecons • First joint meeting held at MSSL in Oct. 2002 • Joint Technical Meeting in conjunction with AGU (Dec. 2003)
Potential expansion… EGSO architecture is designed to be flexible, and the system should be able to handle potential areas of expansion. On the horizon are: • International Living with a Star (ILWS) • Solar Dynamics Observatory (SDO) will produce 2TB/day • EGSO could enable immediate access to the data • And optimize creation of “duplicate copy” in Europe • Modelling and Simulations • Should be possible if we can support uploading of code • Space Weather • Immediacy of access to data – few minutes to few hours • Access to other types of data: STP, magnetospheric, etc. • Use of models with parameters derived from data as input
Generic Solar Physics Query • Identify suitable observations (many serendipitous) • As many different data sets as are available • Should be possible without accessing the data • Locate the data • Data scattered, with differing means of access (some proprietary) • Often only need a subset of each data set • Process the data • Involves extraction and calibration of a subset of raw data • Uses code defined by instrument teams (SolarSoft, C…) • Return results to the User • Compare results from different instruments • SolarSoft (IDL) provides a standard platform for analysis Note the exchange in order of the 3rd and 4th bullets in this Grid expression of the problem, as compared to current practice
Nature of solar observations • For a complete picture of what is happening, we need to use as wide a range of observations as possible • The appearance of the Sun changes dramatically with wavelength • Different layers of the solar atmosphere and material at different temperatures are best seen at different wavelengths • For technical and practical reasons: • UV, EUV, X-rays and γ-rays observed from space • Radio and optical wavelengths observed from the ground • Issues related to coverage by each observatory • Differences in approach to handling data have developed • The observations used to build up a picture of the plasma in multi-dimensional parameter space (incl. x, y, z, t, T & ) • How plasma contained in 3d structures evolves with time • Where and how energy released and how it affects the system • Etc…
Some generic issues We need to build on the existing situation • User community scattered around the world • Capabilities of users & their computing facilities vary greatly • Users want to know if data addressing a problem exists • Not really interested in where the data are located • Or, how the data are accessed, processed, etc. • Increasing desire for combined studies with other regimes • Astrophysics, Climate Physics, Space Weather, etc. • Data centres and observatories located around the world • Large and small data providers (with varying resources) • Need to make it as easy as possible to add new data sets • Planned data volumes much larger than for current instruments • Cataloguing differs in quality, contents, and dependencies • Must handle multiple copies of data and proprietary data • Must ensure integrity of data providers • Authentication an issue that needs serious consideration • Need to minimize how it affects the user, etc.
The EGSO Search Engine In order to provide an enhanced search capability, EGSO will improve the quality and availability of metadata • Enhanced cataloguing describes the data more fully • Standardized metadata versions of observing catalogues tie together the heterogeneous data sets • New types of catalogue allow searches on events, features and phenomena rather than just date & time, pointing, etc… • Ancillary data used to provide additional search criteria • Images, time series, derived products, etc. • Search Registry describes all metadata available for search • It will be possible to access EGSO through: • A flexible Graphical User Interface (GUI) – normal route • An Application Programming Interface (API) – this provides access for users from other applications, communities or Grids
The enhanced solar catalogues • Unified Observing Catalogues (UOC) • Metadata form of observing catalogues used to tie together the heterogeneous data, leaving the data unchanged • Self-describing (e.g. XML), quantised by time and instrument, with no dependencies on ancillary data or proprietary software and any errors corrected • Standards defined for future data sets (e.g. STEREO, ILWS, Solar-B) • Solar Event Catalogues (SEC) • Built from information contained in published lists • Flare lists, CME lists, lists in SGD, etc. • Solar Feature Catalogue (SFC) • Lists of the occurrence of events, phenomena and features provide an alternate means of selecting data • Derived using image recognition software developed in WP5 Similar hierarchical cataloguing required in other data Grid projects
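As a rough illustration of what a "self-describing" Unified Observing Catalogue entry might look like, the sketch below builds and re-reads one XML record with Python's standard library. The element names (`observation`, `instrument`, `start_time`, `end_time`, `wavelength`) and the instrument name are hypothetical; the slides do not define the actual UOC schema.

```python
import xml.etree.ElementTree as ET


def make_uoc_entry(instrument, start, end, wavelength_nm):
    """Build one catalogue entry, keyed by instrument and time interval.

    The tag names are illustrative assumptions, not the EGSO schema.
    """
    entry = ET.Element("observation")
    ET.SubElement(entry, "instrument").text = instrument
    ET.SubElement(entry, "start_time").text = start
    ET.SubElement(entry, "end_time").text = end
    # The unit travels with the value, so no ancillary file is needed to read it.
    wavelength = ET.SubElement(entry, "wavelength", unit="nm")
    wavelength.text = str(wavelength_nm)
    return entry


entry = make_uoc_entry(
    "EXAMPLE-IMAGER", "2002-04-04T07:23:00", "2002-04-04T07:24:00", 656.28
)
xml_text = ET.tostring(entry, encoding="unicode")

# Any consumer can parse the record without proprietary software.
parsed = ET.fromstring(xml_text)
print(parsed.find("instrument").text, parsed.find("wavelength").get("unit"))
```

Keeping units and time stamps inside each record is one way to meet the "no dependencies on ancillary data" requirement; a real schema would need agreement among the data providers.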
Objective of the improved metadata is to pose questions like: Identify events when a filament eruption occurred within 30° of the north-west limb and there were good observations in Hα, EUV and soft X-rays
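The example question above could be expressed against an event or feature catalogue roughly as follows. This is a minimal sketch under stated assumptions: the `Event` record, its field names and the waveband labels are hypothetical, and "within 30° of the north-west limb" is interpreted here as a great-circle distance from a reference point at latitude 45° N, longitude 90° W, which is only one possible reading.

```python
import math
from dataclasses import dataclass


@dataclass
class Event:
    kind: str               # e.g. "filament_eruption" (hypothetical label)
    lat_deg: float          # heliographic latitude, north positive
    lon_deg: float          # longitude from central meridian, west positive
    observed_in: frozenset  # wavebands with good coverage of the event


def angular_separation(lat1, lon1, lat2, lon2):
    """Great-circle separation in degrees between two points on a sphere."""
    p1, l1, p2, l2 = map(math.radians, (lat1, lon1, lat2, lon2))
    cos_sep = (math.sin(p1) * math.sin(p2)
               + math.cos(p1) * math.cos(p2) * math.cos(l1 - l2))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_sep))))


# Assumption: the "north-west limb" is represented by a single reference point.
NW_LIMB = (45.0, 90.0)


def find_events(events, kind, max_sep_deg, required_bands):
    """Select events of one kind near the NW limb with all required wavebands."""
    required = frozenset(required_bands)
    return [
        e for e in events
        if e.kind == kind
        and angular_separation(e.lat_deg, e.lon_deg, *NW_LIMB) <= max_sep_deg
        and required <= e.observed_in
    ]


ALL_BANDS = frozenset({"H-alpha", "EUV", "SXR"})
catalogue = [
    Event("filament_eruption", 40.0, 75.0, ALL_BANDS),   # near limb, full coverage
    Event("filament_eruption", -20.0, 10.0, ALL_BANDS),  # far from the NW limb
    Event("filament_eruption", 50.0, 80.0, frozenset({"H-alpha", "EUV"})),  # no SXR
    Event("flare", 45.0, 85.0, ALL_BANDS),               # wrong kind of event
]
matches = find_events(catalogue, "filament_eruption", 30.0, ALL_BANDS)
print(len(matches))
```

The point of the sketch is that the whole query runs on metadata alone: no image data need be fetched until the matching events are known.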
One User Interface implementation will not satisfy all user requirements and users will be able to tailor the interface to their needs
Feature Recognition in EGSO • The enhanced search capability of EGSO requires development of new types of metadata – the Solar Feature Catalog (SFC) is a major part of this • Key to developing alternate routes into the data • EGSO has a work package (WP5) dedicated to developing tools needed to detect common solar features and then employing them to derive the feature catalog • Major undertaking! Where possible we need help from others in the community to: • Help verify the results and extend capability to as wide a range of features as possible • Help refine ideas of how the results can be used
Outline of progress on WP5 • Software developed to prepare images for feature recognition codes • Removal of artifacts, regularize shape, etc. • Now working on codes to detect the features • Codes for sunspots, active regions and filaments developed and under test • Codes to recognize coronal holes and magnetic neutral lines under investigation • Document discussing techniques available shortly • Trying to define standard way of describing features for the feature catalogue • Preliminary version of SFC has been prepared • Document on format available for discussion • Now starting to experiment with the results to determine if objectives can be realized with the stored information
Image Preprocessing Toolkit Difficulties with Images: • Image shape (ellipse), centre and pole coordinates • Weather transparency (clouds) and different thickness of atmosphere • Centre-to-limb darkening • Defects in data (strips, lines intensity) • Errors in FITS header information First part is code to clean images prior to further processing
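One of the listed difficulties, centre-to-limb darkening, can be illustrated with a short sketch. The slides do not say which method the EGSO toolkit uses; the code below assumes a simple linear limb-darkening law, I(μ)/I(centre) = 1 − u(1 − μ) with μ = sqrt(1 − (r/R)²), and an illustrative coefficient `u = 0.6`. The function names and the nested-list image format are hypothetical.

```python
import math


def limb_darkening_factor(r_frac, u=0.6):
    """Relative intensity at fractional disk radius r_frac under a linear law.

    u is an illustrative coefficient, not a value taken from the EGSO toolkit.
    """
    mu = math.sqrt(max(0.0, 1.0 - r_frac ** 2))
    return 1.0 - u * (1.0 - mu)


def flatten_disk(image, cx, cy, radius, u=0.6):
    """Divide on-disk pixels by the darkening model; set off-disk pixels to 0."""
    flattened = []
    for y, row in enumerate(image):
        new_row = []
        for x, value in enumerate(row):
            r_frac = math.hypot(x - cx, y - cy) / radius
            if r_frac >= 1.0:
                new_row.append(0.0)
            else:
                new_row.append(value / limb_darkening_factor(r_frac, u))
        flattened.append(new_row)
    return flattened


# Synthetic 21x21 test image: a uniform disk of brightness 1000 with darkening applied.
SIZE, CX, CY, RADIUS = 21, 10, 10, 9.5
image = [
    [
        1000.0 * limb_darkening_factor(math.hypot(x - CX, y - CY) / RADIUS)
        if math.hypot(x - CX, y - CY) < RADIUS else 0.0
        for x in range(SIZE)
    ]
    for y in range(SIZE)
]
flat = flatten_disk(image, CX, CY, RADIUS)
print(round(flat[10][10], 3), round(flat[10][18], 3))
```

After flattening, a fixed intensity threshold means the same thing at disk centre and near the limb, which is why this kind of step usually precedes feature detection. A real pipeline would first need the disk centre and radius, which the slide notes may be wrong in the FITS header.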
Sunspot detection in white light Original image on the left and detected sunspots on the right
Filament Detection in Hα Original image on the left and detected filaments on the right
Active Regions detection in Hα Detected active regions on the right with corresponding result from Big Bear Solar Observatory on the left
AR detection in Ca K II, Hα and EUV images (04/04/2002: Hα 07:23, Ca K3 07:30, EUV 07:26)
Building the Solar Feature Catalog • For each feature we must first: • Fully test the feature recognition code using images from a wide time period and several sources • Finalize the format of how the information is described in the catalog • Then run code on a representative set of images • Summary & synoptic data gathered by SOHO one example of type, but not necessarily coherent • Ideally need image cadence that allows us to have a reasonable idea of when things change • Probably requires use of images from several GBOs • Raises issues of consistency related to image quality, etc
Use of the Solar Feature Catalog • The SFC can be used in at least three ways: • Outline features recognized in one wavelength on an image taken in another (at a different time) • Determine when events related to features have occurred – e.g. filament eruptions, flux emergence • Track relative motion of features – e.g. sunspots • The SFC will be deployed as a Server addressed through Web Services • Not clear whether the SFC Server will be combined with the already deployed SEC Server • Server will be accessible to other VO projects • Feature Recognition software will be released under the EGSO’s Open Source software policy • Requirement of EC on IST Projects
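Tracking the relative motion of features, the third use listed above, amounts to pairing entries from two catalogue snapshots. The sketch below uses a greedy nearest-neighbour match; the data layout (dictionaries of id to latitude and longitude in degrees), the function name and the small-angle distance approximation are all assumptions for illustration, not the SFC design.

```python
import math


def match_features(earlier, later, max_shift_deg):
    """Pair each earlier feature with the nearest unclaimed later feature.

    earlier, later: dicts mapping a feature id to (lat_deg, lon_deg).
    Returns a dict: earlier id -> (later id, change in lat, change in lon).
    """
    matches = {}
    unclaimed = dict(later)
    for fid, (lat, lon) in earlier.items():
        best_id, best_dist = None, max_shift_deg
        for cid, (clat, clon) in unclaimed.items():
            # Small-angle approximation: scale longitude difference by cos(latitude).
            dist = math.hypot(clat - lat,
                              (clon - lon) * math.cos(math.radians(lat)))
            if dist <= best_dist:
                best_id, best_dist = cid, dist
        if best_id is not None:
            clat, clon = unclaimed.pop(best_id)
            matches[fid] = (best_id, clat - lat, clon - lon)
    return matches


# Two hypothetical sunspot lists taken about a day apart.
day1 = {"s1": (10.0, -30.0), "s2": (-15.0, 20.0)}
day2 = {"a": (10.2, -17.0), "b": (-15.1, 33.5), "c": (40.0, 60.0)}
tracks = match_features(day1, day2, max_shift_deg=20.0)
for fid, (cid, dlat, dlon) in tracks.items():
    print(fid, "->", cid, round(dlat, 2), round(dlon, 2))
```

A greedy match is enough for well-separated features; crowded active regions would need a more careful assignment, and the image cadence issue raised on the previous slide determines how large `max_shift_deg` has to be.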
Current Status of EGSO • Extensive survey of requirements in 2002 • Working architecture defined and detailed during the first half of 2003 • Release 1 of EGSO was demonstrated at IST2003 in Milan (October 2-4) • Demonstration of how the three roles work together • Solve simple query based on time and wavelength • Access to data resources initially through SolarWeb • Working prototype of the Solar Event Catalog Server • New version of EGSO Data Model document was released recently • Describes both solar and heliospheric data
Activities in the near future • First components of Feature Recognition software and documentation available shortly • Image Cleaning software is first part of toolkit • Release 2 of EGSO due at the end of November • Development of interface to Data Providers • More complex query supported through SEC Server • Greater GUI capabilities • Release 3 of EGSO due mid-February 2004 • Building profiles of data (etc.) providers • Discussing file formats and metadata with producers • Trying to finalize designs of UOC and Search Registry, and description of providers in Resource Registry • Format of synoptic maps discussed with NOAA and NSO • Discussing concerns and interfaces with data sources
Conclusions • Of necessity the solar community needs to move towards a virtual environment to access solar and related data • EGSO is a Data/Computing GRID that will be a key part of the global virtual solar observatory and has already established close links with counterparts in the US • Feature recognition techniques are being used to detect solar features at a number of wavelengths • One of the ways that we are developing to provide innovative ways to search for solar observations • For more information on EGSO see: • http://www.egso.org • Or e-mail • bentley@egso.org
Possible requirements for Data Providers Need to register each dataset. Required info. could include: • Catalogue Information: All observations “should be summarised in observing catalogues” (prime source only) • Data Map: This defines in broad outline what time intervals of the data are actually held. It should be considered dynamic. • Type of storage: On-line, near on-line or off-line. • Means to retrieve data: The exact meaning of this information depends on whether the provider is active or passive. • Active source: address where processed data can be retrieved from and the means of retrieval (ftp, http, etc.). • Passive source: map of the physical location of data within the provider system and the means of retrieval. • Resource limits: The resource usage beyond which a provider switches from active to passive mode needs to be defined. • Details of access restrictions: If any part of a data set is proprietary (for some period) or otherwise restricted. • Frequency of updates: So that system does not have to constantly monitor all data sources
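The registration items listed above map naturally onto a simple record. The following is a minimal sketch, assuming hypothetical field names, a request-count threshold as the "resource limit", and a placeholder `example.org` address; the actual EGSO registry format is not given in the slides.

```python
from dataclasses import dataclass
from datetime import datetime

STORAGE_TYPES = ("on-line", "near on-line", "off-line")


@dataclass
class DatasetRegistration:
    name: str
    catalogue_url: str             # where the observing catalogue can be read
    data_map: list                 # (start, end) intervals actually held
    storage: str                   # one of STORAGE_TYPES
    retrieval: str                 # e.g. "ftp" or "http"
    max_active_requests: int       # assumed form of the resource limit
    proprietary_days: int = 0      # 0 means no proprietary period
    update_interval_hours: int = 24

    def __post_init__(self):
        if self.storage not in STORAGE_TYPES:
            raise ValueError(f"unknown storage type: {self.storage!r}")

    def holds(self, when):
        """True if the data map says data are held for this time."""
        return any(start <= when <= end for start, end in self.data_map)

    def mode(self, current_requests):
        """Provider acts as an active source until the resource limit is reached."""
        if current_requests < self.max_active_requests:
            return "active"
        return "passive"


reg = DatasetRegistration(
    name="example-halpha",
    catalogue_url="http://example.org/catalogue",  # placeholder address
    data_map=[(datetime(2002, 1, 1), datetime(2002, 12, 31))],
    storage="on-line",
    retrieval="ftp",
    max_active_requests=5,
)
print(reg.holds(datetime(2002, 4, 4)), reg.mode(3), reg.mode(5))
```

Because the data map is "dynamic", a real registry would refresh it at the stated update frequency rather than treat it as fixed.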
Access to Resources EGSO is a Grid and activities depend on access to resources • Resources described by entries in a Resource Registry and managed by a Broker. Types include: • Metadata – from prime data providers • Data – from data centres, observatories, etc • Processing – simple, multi-instance processors, HPC(?) • Storage – cache space, on-line mass storage, etc. • Services – support of complex (meta)data products Note: Some providers can support multiple capabilities • The Broker allocates resources and controls: • How much is being requested of a particular provider • Processing of data & staging of results • Processing may be at a different site to the data provider • Broker & Registries replicated to provide system resilience and permit load sharing
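The Broker's role in limiting how much is requested of any one provider can be sketched as a small allocation routine. This is an illustrative assumption, not the EGSO Broker design: the `Provider` and `Broker` classes, the capacity counts and the "lowest fractional load" rule are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Provider:
    name: str
    datasets: frozenset   # dataset identifiers this provider can serve
    capacity: int         # maximum concurrent requests (assumed > 0)
    load: int = 0         # requests currently allocated


class Broker:
    """Allocates each request to the least-loaded provider holding the data."""

    def __init__(self, providers):
        self.providers = list(providers)

    def allocate(self, dataset):
        candidates = [
            p for p in self.providers
            if dataset in p.datasets and p.load < p.capacity
        ]
        if not candidates:
            return None  # nothing available: caller must queue or fail
        # Compare fractional load so small providers are not swamped.
        chosen = min(candidates, key=lambda p: p.load / p.capacity)
        chosen.load += 1
        return chosen.name

    def release(self, name):
        """Mark one request on the named provider as finished."""
        for p in self.providers:
            if p.name == name and p.load > 0:
                p.load -= 1
                return


broker = Broker([
    Provider("A", frozenset({"halpha"}), capacity=2),
    Provider("B", frozenset({"halpha", "euv"}), capacity=4),
])
allocations = [broker.allocate("halpha") for _ in range(3)]
print(allocations, broker.allocate("xray"))
```

Replicating the Broker, as the slide proposes, would require the load counters to be shared or reconciled between replicas; the sketch ignores that.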
Ancillary Data This is a “catch-all” term for all non-catalogue items • Items used to set the context for a search • Images, time series, derived parameters, etc. • Processed products from data-intensive instruments • STP, etc. data could be incorporated in this way • Some data items will have to be derived on-the-fly • Not possible for everything to have been prepared already • Servers will provide these types of complex data product • Products include derived parameters, specialized actions, etc. e.g. GOES temperature, matched areas of images, etc. • SolarSoft packages, e.g. Chianti, could be brought up like this Objective of the improved metadata is to pose questions like: Identify events when a filament eruption occurred within 30° of the north-west limb and there were good observations in Hα, EUV and soft X-rays
Handling the data EGSO will dramatically enhance access to solar data • Data could be located anywhere in the world • User only needs to know observations exist, not where located • System able to optimize use of sources (closest, least used, etc) and handle replicated data and aggregated sources • Burden on provider minimized to encourage participation • As far as possible, process the data “at source” • Involves extraction and calibration of a subset of the raw data • Software for processing defined by instrument team (IDL, C…) • Processing reduces volumes of data moved around • Simplifies requirements on user’s own system • Standard (pipe-line) processing adequate for many users • More complex problems require ability to upload code • Used in analysis of extended data sets (helioseismology, etc) • System allocates resources; Security an issue • Models and simulations have similar requirements
EGSO Services & Products EGSO will establish a number of services which can also be used by other communities and Grids • API access to the EGSO Search Engine • Supports complex queries of all metadata as a service • Solar Event Catalogue servers • Sites that specialize in providing access to Solar Event data • Servers of complex data products • Servers able to process requested items on the fly • Products include derived parameters, specialized actions, etc. e.g. GOES temperature, matched areas of images, etc. • SolarSoft packages, e.g. Chianti, could be brought up like this • A lot of scope for processing a number of things in this way • Extracted & processed data products • Requested data can be provided in a number of formats • Single file type/format will not satisfy all requirements
Definition of Requirements • Solicitation of Requirements • User Survey - in collaboration with SpaceGRID, in March 2002, with over 100 responses • Use Cases - covering multiple usage modes and scientific goals • User Consultation - discussions at meetings and other discussions • Brainstorming - Systems Concepts Document, cross-fertilization from other GRID and Virtual Observatory projects • Total of 220 requirements defined • Requirements refined to address system capabilities, not implementation • Specified in formal manner applicable in system design, but annotated to allow for meaning to be readily understood by stakeholders • Currently soliciting feedback and finalizing priority of each requirement
Architecture – Roles and Relations • Architecture defined in terms of roles • Provider, Broker and Consumer [Diagram: three consumers, three information brokers and four providers]
Prototypes of Services • Solar Event Catalogue (SEC) Server • Server specializing in event catalogues • EGSO interested in things that could be added • Currently being tested with: • Active regions, CMEs, and Flares using data from NOAA, Hawaii and LASCO, etc. • Can be seen through URL: http://radiosun.ts.astro.it/sec/sec.php • Database for Solar Observatories (DSO) • Registry needed within EGSO to provide more detailed information on possible data sources • Currently being populated and will shortly be accessible through the Web