Meteo Activity of the SIMDAT project: Building components of the WIS
Baudouin Raoult, ECMWF
HALO meeting, 11.07.06
Data Grids for Process and Product Development using Numerical Simulation and Knowledge Discovery
• 4-year project funded by the EU
• Contract with the EU was signed on 1 September 2004
• SIMDAT focuses on 4 application areas:
  • product design in automotive and aerospace
  • process design in pharmacology
  • service provision in meteorology
• Budget of 11 M€
• Project phases (from the diagram): Phase 1: Connectivity; Phase 2: Interoperability; Phase 3: Knowledge
• Activities shown across the phases:
  • Virtual Data Repository
  • Introduction of Grid technologies research
  • Introduction of VO
  • Deployment of Grid infrastructure with particular attention to data transport and management
  • Distributed DB access
  • Integration of analysis services, workflows, discovery and data mining
SIMDAT Meteorology Partners
• 22 members in the consortium
• Deutscher Wetterdienst (DWD)
• ECMWF
• EUMETSAT
• Météo France
• UK Met Office
• Intel
• Ontoprise
• IBM
• IT Innovation
• NEC
Meteo activity
• To build an integrated and scalable framework for the collection and sharing of distributed data (WIS building blocks)
  • Instead of each National Met Service having a GISC, a “virtual” GISC
  • 2 DCPCs: ECMWF, EUMETSAT
• Service-oriented framework targeting meteorology, hydrology, climate and environment and offering transparent access to distributed resources
  • Grid-enabled software
  • Services to process the data, elaborate products, visualize those products
• Some key elements of the project are:
  • A single view of meteorological information which is distributed amongst the 5 partners
  • Improve visibility and access to meteorological data through a comprehensive discovery service
  • Offer a variety of reliable services for routine dissemination and for collection of data
  • Provide a global access control policy managed by the partners and integrated into their existing security infrastructure
• 320 person-months of effort, taking into account the technology contribution to the meteo application
Virtual Meteorological Centre - functional view
• Through the Distributed Portal, users search for and retrieve data and subscribe to services such as routine dissemination, subject to authentication and authorization
• The Virtual Database Service provides a single view of the partners' databases
Architectural Choices
• Catalogue duplicated and synchronized at each site
  • To have fast discovery (browse & search phase) and a reliable system (client redirection to another node)
• Build an open and flexible framework integrating technologies from different areas
  • Allows picking the best components of each Grid middleware (Globus, OGSA-DAI)
  • Associate J2EE and Grid/Web Services technologies to build solid components
• QoS and robustness are amongst the top priorities of the project
  • Framework based on J2EE components
  • Use pipelining, priority and queuing mechanisms to process user requests
Architecture
• 3 main components to build the virtual database: Data Repository, Catalogue Node and Portal
  • Installed on each partner site and interconnected through a dedicated secure connection channel
• Data Repository
  • Interface to the partners' databases
  • Offers metadata information to describe, search and locate data
  • Offers an interface to retrieve data from the associated local databases
• Catalogue Node
  • Maintains the registry and ensures synchronisation
  • Harvests metadata and requests data from the Data Repository
  • Ingests data and maintains the cache of the real-time data
  • Serves clients: Portal or other Nodes
  • Monitors the execution of the requests
• Distributed Portal
  • Offers an interface to search/browse the catalogue
WMO Core metadata standard
• WMO Core Profile 0.2, a profile of ISO 19115 on geo-referenced data
• Not scalable
  • Records are large and contain redundant information, slowing down the database hosting the catalogue
  • The same information is repeated in all metadata records
  • Unnecessary information is circulating over the network
  • Some documents are orders of magnitude larger than the data itself
  • Cannot represent very large archives with small granularity
• Cannot fulfil all requirements to build the Virtual Meteorological Centre; it lacks:
  • Information on how to retrieve data from local databases
  • Information to create a directory (taxonomy of documents)
  • Information to sub-select data from a dataset
Solutions
• Split XML documents into fragments to solve the scalability issue
  • WMO core metadata is structured
  • Some parts are shared amongst many documents
  • Example fragments (from the diagram): Core: WMO; Owner: UKMO; Data type: Synop; Location: Heathrow; Date: 2005-10-12
• Add specific extensions to define all relevant information needed to implement the system and not defined by the WMO core
  • Internal unique ID
  • Hierarchy relationship
  • Physical location (which node holds the data)
  • Information used to generate a valid request to retrieve data from the end system
  • Information used to create a web interface for the end user
• Work with WMO to integrate extensions in future releases of standards
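The fragment idea on this slide can be illustrated with a small sketch. It is not the SIMDAT implementation: the fragment kinds mirror the example labels on the slide (core, owner, data type, location, date), while the `FragmentStore` class, its methods and the record ids are hypothetical names introduced only for illustration.

```python
class FragmentStore:
    """Hypothetical store that keeps each distinct metadata fragment once."""

    def __init__(self):
        self.fragments = set()  # distinct (kind, value) fragments
        self.records = {}       # record id -> list of (kind, value) keys

    def add_record(self, record_id, record):
        keys = []
        for kind, value in record.items():
            key = (kind, value)
            self.fragments.add(key)  # a shared fragment is stored only once
            keys.append(key)
        self.records[record_id] = keys

    def get_record(self, record_id):
        # Reassemble the full record from its fragment keys.
        return dict(self.records[record_id])


store = FragmentStore()
store.add_record("r1", {"core": "WMO", "owner": "UKMO", "data_type": "Synop",
                        "location": "Heathrow", "date": "2005-10-12"})
# Assumed second record: same station and owner, next day.
store.add_record("r2", {"core": "WMO", "owner": "UKMO", "data_type": "Synop",
                        "location": "Heathrow", "date": "2005-10-13"})
```

In this toy case two records with five fields each need only six stored fragments, because everything except the date is shared.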
WMO Information System (WIS) Requirements
• Support a variety of data types (common to all WMO Programmes)
• Support archive and real-time datasets
• Build a catalogue of all the meteorological data for exchange to support WMO programmes
• Support ad-hoc requests for data and products: Pull model
• Support routine dissemination of all observed data and products, both real-time and non-real-time: Push model
• Support network security
• Support different user profiles and data policies
• Use different types of communication links (GTS, satellite, dedicated links)
WIS Requirements: Support a variety of data types
Data Repository Functions
• Interface to the existing meteorological databases
  • Provides access to any kind of database (RDBMS, bespoke, flat files)
• Metadata provider
  • Provides metadata information to discover, locate and describe data, in accordance with a defined XML metadata format
  • Answers Catalogue Node metadata harvesting messages
• Data provider
  • Provides an interface to asynchronously request data from the associated existing database (to support real-time & archive datasets)
  • Transforms the XML data request into the real database request
  • Offers a data channel (HTTP, FTP, …) to send the retrieved data to the Catalogue Node
Data Repository Implementation
• Implemented as a web service using a document-based interface
  • Protocol entirely described in an XML message
  • Independent of the network transport (HTTP, SOAP, etc.)
• Three transport methods are supported
  • OGSA-DAI WSRF
  • Web Services (WS-I, WSDL, SOAP)
  • REST (XML over HTTP)
• VMCMessage Protocol
  • A set of XML messages has been defined for metadata harvesting (Info, GetMetadataRecord)
  • A set of XML messages has been defined for data requesting (Submit, GetSubmitStatus, DeleteRequest)
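A document-based protocol of this kind can be sketched as follows. Only the five message names come from the slide; the `VMCMessage` envelope layout, the `type` attribute and the field names are assumptions made for illustration, since the actual XML schema is not shown here.

```python
import xml.etree.ElementTree as ET

# Message names listed on the slide.
MESSAGE_TYPES = {"Info", "GetMetadataRecord",
                 "Submit", "GetSubmitStatus", "DeleteRequest"}


def build_message(message_type, **fields):
    """Serialise a message as XML (hypothetical envelope layout)."""
    if message_type not in MESSAGE_TYPES:
        raise ValueError(f"unknown message type: {message_type}")
    root = ET.Element("VMCMessage", {"type": message_type})
    for name, value in fields.items():
        ET.SubElement(root, name).text = str(value)
    return ET.tostring(root, encoding="unicode")


def parse_message(xml_text):
    """Return (message type, fields) from a serialised message."""
    root = ET.fromstring(xml_text)
    return root.get("type"), {child.tag: child.text for child in root}


# Assumed field names, for illustration only.
submit = build_message("Submit", dataset="example-dataset", date="2005-10-12")
```

Because the whole exchange lives in the XML document, the same string could be carried unchanged over any of the three transports listed above.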
WIS Requirements: Support real-time data
• Datasets shown in the diagram:
  • UMARF: Satellite Data
  • Unidart: Climate Data
  • Era40: Reanalysis Data
  • IAA: NWP Outputs Data
  • JEDDS: Aeronautical Data
Real-time Data Repository
• A GTS Data Repository is being developed by Meteo-France
  • Interfaced with the GTS (through an MSS)
  • It publishes GTS collections
• For Phase II: one source providing GTS data
  • No data replication over the SIMDAT infrastructure
• For Phase III: several sources plugged into SIMDAT
  • Strategy to uniquely identify the datasets (using MD5 hash codes)
  • Real-time data replication using the metadata synchronization mechanism
• Generic solution which can be used by all the partners
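The MD5-based identification mentioned for Phase III could look roughly like this. The slide does not say what is hashed, so the choice of identifying fields and their normalisation below are assumptions, and `dataset_id` is a hypothetical helper.

```python
import hashlib


def dataset_id(*identifying_fields):
    """Derive a stable identifier from the fields that define a dataset.

    Assumption: fields are normalised (trimmed, lower-cased) so that two
    sources describing the same dataset produce the same identifier.
    """
    canonical = "\n".join(field.strip().lower() for field in identifying_fields)
    return hashlib.md5(canonical.encode("utf-8")).hexdigest()


# Two sources publishing the same collection (field values are made up).
id_from_source_a = dataset_id("Synop", "Heathrow", "2005-10-12")
id_from_source_b = dataset_id(" synop", "HEATHROW ", "2005-10-12")
```

With identical identifiers, a node receiving the same dataset from several sources can recognise the duplicate instead of storing it twice.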
WIS Requirements: Build a catalogue of all the available meteorological products
Catalogue Node
• The Catalogue is built using the metadata harvested from the Data Repositories
• The Catalogue is synchronized and replicated on each Catalogue Node
• The Catalogue offers discovery services accessible to the user through the Distributed Portal
• The Catalogue contains the necessary information to retrieve and sub-select the data
SIMDAT Infrastructure: Support ad-hoc requests for data & products: Pull model
Distributed Portal
• A Portal is deployed on each site and offers a unique view of all the datasets available
• The Portal offers discovery mechanisms to the users
  • Full-text, temporal and geographical search (Google-like)
  • Directory browsing (Yahoo-like browsing)
• The Portal provides request handling mechanisms to the users
  • Submitted requests can be asynchronous to manage long-lived requests
  • Users can manage their requests (check status, delete them, …)
  • Users retrieve the associated data when the request is complete
• The Portal uses the information contained in the metadata to create the data sub-selection forms
  • The metadata/data providers define how to access their datasets
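The asynchronous request handling described on this slide (submit, check status, delete, retrieve when complete) can be sketched as a small in-memory tracker. The `RequestManager` class, its method names and the status values are hypothetical; the real portal is J2EE-based and is not shown here.

```python
import itertools


class RequestManager:
    """Hypothetical in-memory tracker for asynchronous data requests."""

    def __init__(self):
        self._next_id = itertools.count(1)
        self._requests = {}

    def submit(self, user, query):
        # Returns immediately; the data is produced later by a back end.
        request_id = next(self._next_id)
        self._requests[request_id] = {
            "user": user, "query": query, "status": "queued", "data": None,
        }
        return request_id

    def complete(self, request_id, data):
        # Called by the (assumed) back end once the data is available.
        request = self._requests[request_id]
        request["status"] = "complete"
        request["data"] = data

    def status(self, request_id):
        return self._requests[request_id]["status"]

    def retrieve(self, request_id):
        request = self._requests[request_id]
        if request["status"] != "complete":
            raise RuntimeError("request is not complete yet")
        return request["data"]

    def delete(self, request_id):
        del self._requests[request_id]
```

A user session would call `submit`, poll `status` until it reports completion, then call `retrieve`; `delete` drops a request at any point.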
How to create the database requests?
• Keep the request language of the different databases
  • Non-intrusive solution
• Add information in the metadata <vgisc> extension to build the end system request:
  • <request>: holds specific information on how to generate a valid request to the data repository
  • <variables>: holds information on how to create a web interface to let the user select items from the dataset
• The web portal uses the <variables> element to present selection dialogues to the user
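As a rough illustration of how a portal might use these two elements, the sketch below reads a made-up `<vgisc>` fragment, derives the selectable values from `<variables>`, and fills a request template taken from `<request>`. Only the three element names come from the slide; the inner structure (`<variable>`, `<value>`, the `$name` placeholders) and the request syntax are assumptions.

```python
import xml.etree.ElementTree as ET
from string import Template

# Hypothetical extension content; the real schema is not shown on the slide.
VGISC_XML = """
<vgisc>
  <request>retrieve,param=$param,date=$date</request>
  <variables>
    <variable name="param"><value>temperature</value><value>pressure</value></variable>
    <variable name="date"><value>2005-10-12</value></variable>
  </variables>
</vgisc>
"""


def selection_form(vgisc):
    """Map each variable name to the values a user may select."""
    return {variable.get("name"): [v.text for v in variable.findall("value")]
            for variable in vgisc.find("variables")}


def build_request(vgisc, choices):
    """Validate the user's choices and fill the request template."""
    form = selection_form(vgisc)
    for name, value in choices.items():
        if value not in form.get(name, []):
            raise ValueError(f"invalid choice for {name}: {value}")
    return Template(vgisc.find("request").text).substitute(choices)


vgisc = ET.fromstring(VGISC_XML)
request = build_request(vgisc, {"param": "pressure", "date": "2005-10-12"})
```

The database's own request language stays untouched: the template is opaque text to the portal, which only substitutes the user's selections.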
WIS Requirements: Support routine dissemination of all observed data and products, both real-time and non-real-time: Push model
• Dissemination/Subscription: will be addressed in Phase III of the project
WIS Requirements: Support network security
• Inter-node communications secured using SSL
WIS Requirements: Support different user profiles & data policies
• Virtual Organization implementation: framework study and investigation in Phase II
• First stable version to be delivered in Nov 06
VO Domains
• (Diagram labels: VO Domain; A, B, C, D1, F, D2, E)
• Domain
  • Group of organisations that share a common policy (e.g. the RA-VI V-GISC)
  • The VO might contain a number of sub-domains
• Authentication (AuthN)
  • Users register with a node
  • Users are known to all the nodes in the same domain
  • Any node within the domain should be able to authenticate a user of the domain
• Authorisation (AuthZ)
  • AuthZ is performed at the node level to allow/deny access to the data
  • Data access policy is expressed within the metadata
Cross-domain issues
• (Diagram labels: VO Domain; A, B, C, D1, F, D2, E)
• Metadata is visible across all domains
  • But some metadata can be explicitly hidden
• Cross-domain authorisation involves user registration
  • A user from domain “D2” wanting to access data which is limited to domain “D1” will have to register with domain “D1”
• Cross-domain authentication will rely on a previously established trust relationship
  • Authenticated users coming from “D2” into “D1” will be checked against the trusted CA domains
• The concept of domain needs to be validated by the VO working group
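The domain rules on the VO slides (registration within a domain, node-level authorisation driven by a policy carried in the metadata) can be sketched as below. This is a toy model under stated assumptions: the `Domain` class, the `restricted_to_domains` metadata key and the rule that unrestricted records are open to everyone are all hypothetical, and certificate-based trust between domains is not modelled.

```python
class Domain:
    """Hypothetical VO domain: users registered here are known to all its nodes."""

    def __init__(self, name):
        self.name = name
        self.users = set()

    def register(self, user):
        self.users.add(user)

    def authenticate(self, user):
        return user in self.users


def authorise(user, metadata, domains):
    """Node-level check using the access policy stored in the metadata.

    Assumption: a record without a restriction is open; otherwise the user
    must be registered in at least one of the listed domains.
    """
    restricted_to = metadata.get("restricted_to_domains", [])
    if not restricted_to:
        return True
    return any(domains[name].authenticate(user) for name in restricted_to)


domains = {"D1": Domain("D1"), "D2": Domain("D2")}
domains["D2"].register("alice")
record = {"title": "example dataset", "restricted_to_domains": ["D1"]}

denied_before = authorise("alice", record, domains)   # registered in D2 only
domains["D1"].register("alice")                       # cross-domain registration
allowed_after = authorise("alice", record, domains)
```

This mirrors the slide's example: a D2 user gains access to D1-limited data only after registering with D1.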
WIS Requirements: Use different types of communication links
• Currently deployed on the Internet
• Phase II: study of a dual RMDCN/Internet deployment for production
• Phase III: RMDCN deployment and EUMETCast integration study
What do you need to publish data?
• Installation
  • Install a Catalogue
  • Install a Data Repository
• Develop a module to request data from the existing database
  • It can simply be a shell script calling the database client with the “zero development” Data Repository
• Define the metadata describing the datasets
  • Define the discovery information (keyword, geographical, temporal)
  • Define how to request the database
  • Static information necessary to access the database
  • Define how to sub-select data
• A metadata definition wizard is being developed
Milestones
• Synchronization Engine Enhancements - June 06
• Mesh Network Management Software - June 06
  • Led by Intel and fully compatible with the new synchronization engine
• WSRF interfaces implementation - Sep 06
• Metadata Manager migration toward ebXML
  • Led by UKMO, feasibility study by June 06
• Development of a Real-time Data Repository
  • To acquire GTS observations: led by Meteo-France, first implementation by Sep 06
• Implementation of the security services of the VO - Feb 07
• Ontology-based discovery service
  • First Thesaurus implementation Sep 06, discovery interface Mar 07
CBS conference demonstration
• Meshed network of GISCs and DCPCs
• Based on SIMDAT software and including the 5 European partners, JMA, CMA, BoM, NCAR, NODC
  • JMA, CMA, BoM fully integrated in the grid architecture
  • NCAR acting as DCPC and providing metadata information via OAI
  • NODC currently investigating the SIMDAT software
Results Achieved
• Five (+2.5) Meteorological Centres interconnected and exchanging data and metadata
• Users able to search, browse and retrieve data distributed among the partners
• Unified Catalogue based on WMO Core Profile v0.2
• First element of the security infrastructure
• Datasets shown in the diagram: UMARF Satellite Data, Era40 Data, UNIDART Data, IAA Data, JEDDS Data
Results Achieved (cont.)
• Flexible, non-intrusive architecture
  • Supports any kind of database (RDBMS, XML, flat file, object, bespoke)
  • Zero-development Data Repository
  • Supports asynchronous requests (archive, long requests)
• Interest shown by the meteorological community:
  • JMA (Japan) and CMA (China) fully integrated
  • BoM (Australia), KMA (Korea) and NODC (Russia) in progress
  • NCAR (US) catalogue is harvested using OAI; users are redirected to the NCAR portal
• SIMDAT work feeds back into WMO through expert teams:
  • ET-WISC: SIMDAT Meteo requirements are now used as the WIS requirements
  • IPET-MI: findings have been used for the definition of the WMO Core Profile 0.3
  • ET-CTS: SIMDAT infrastructure is seen as a major infrastructure for implementing the WIS