SeaDataNet JRA2: Technical development of the interoperable system components By Dick Schaap
Organisation
• Technical Task Team (TTT) consisting of 11 Data Centres (P1, P3, P4, P5, P6, P7, P8, P9, P11, P14, and P23) + P2 (chair) + P15 (ODV).
• TTT supports the Technical Coordinator in defining, coordinating, and tuning the technical developments in JRA1, JRA2 and JRA3 and related NA activities.
• TTT is responsible for developing the technical deliverables.
• TTT reports to the Coordination Group and the Steering Committee.
• TTT had a pre-meeting in April 06 in Liverpool + its 1st full meeting in May 06 in Paris.
Project Communication
• To support communication and document management between partners in the project:
  • List server for group mailings, based upon Sympa software
  • Project Working Environment for, inter alia, document archival, based upon BSCW software
• Final deliverables will be available via the SeaDataNet website.
• Action: IFREMER will activate both tools and inform partners when operational – a.s.a.p.
SeaDataNet System approach
• Discovery services = Metadata directories
• Security services = Authentication, Authorization & Accounting (AAA)
• Monitoring and statistics
• Delivery services = Data access & downloading of data sets
• Viewing services
Discovery services
Harmonizing and optimizing the metadatabases and controlled vocabularies, incl. maintenance & retrieval systems
Present set-up
[Diagram, not reproduced: metadata from research institutes, data holding centres and monitoring agencies all over Europe passes through national collation by NODCs (Java tools; Ascii, Access, XML and online input) to the directories EDMED, EDMERP, CSR, EDIOS, ORG and CDI, held at BODC, MARIS and BSH, each with its own outputs (web, map, print, XML, online data). Other labels in the diagram: ICES, cross matrix, master CMS.]
Directories objectives
• Continue the operation, population and updating
• Move towards a more continuous updating process
• More cohesion in schemas and use of Vocabularies
• Provide users with cross searching functionality
• Discard redundancy to avoid asking for the same info
• Thereby:
  • Safeguarding backward compatibility to legacy metadata (where required!)
  • Establishing governance arrangements
  • Ensuring continuity while developing
Considerations for maintenance of directories
• The volume of entries and the frequency of updating differ considerably between directories and between NODCs
• The maintenance of each directory is performed by different classes of staff from institutes
• The directories make more or less use of vocabularies, and these vocabularies differ for comparable topics
Conclusions for maintenance of directories
• The maintenance system must support a number of modalities to cope with all NODCs AND the different metadirectories in practice
• One integrated data model is not feasible in practice, but better cohesion between the different formats and schemas (with ISO 19115 as basis) is
• Each format has to be reconsidered for optimization, better use of controlled and common vocabularies, and use of EDMO for organisations
• For each metadirectory a central master database must be continued under governance of the object manager (next to and linked to local databases, if available) to safeguard momentum and quality
Proposed Set-up for maintenance of directories
• Derive a range of controlled vocabularies with technical and content governance
• Updating and maintenance of the metadirectories by:
  • online maintenance by NODCs via CMS (incl. master editing options)
  • XML export from the local system in a semi-automatic way, with exchange initiated by NODCs at regular intervals (push situation)
  • XML export from the local system in an automatic and continuous process (harvesting system)
• Note: per metadatabase NODCs can make a different choice! And it can shift in time.
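
The note above (one choice per NODC per metadatabase, which may shift in time) can be pictured as a small lookup table. This is only a sketch: the names `Modality`, `choices`, `modality_for` and `switch_modality`, the example NODC labels and the fallback to online CMS maintenance are all assumptions for illustration, not part of the SeaDataNet design.

```python
from enum import Enum


class Modality(Enum):
    """The three maintenance modalities listed on the slide."""
    ONLINE_CMS = "online maintenance via CMS"
    XML_PUSH = "semi-automatic XML export, initiated by the NODC"
    XML_HARVEST = "automatic, continuous XML harvesting"


# Hypothetical registry: (NODC, directory) -> chosen modality.
choices = {
    ("NODC-A", "EDMED"): Modality.ONLINE_CMS,
    ("NODC-A", "CDI"): Modality.XML_HARVEST,
    ("NODC-B", "CDI"): Modality.XML_PUSH,
}


def modality_for(nodc, directory, default=Modality.ONLINE_CMS):
    """Return the modality an NODC uses for one directory.

    The default (online CMS) is an assumption made for this sketch.
    """
    return choices.get((nodc, directory), default)


def switch_modality(nodc, directory, new_modality):
    """Record that an NODC changes modality for one directory."""
    choices[(nodc, directory)] = new_modality
```

Keying the table on the (NODC, directory) pair rather than on the NODC alone is what lets one centre harvest its CDI records while still editing EDMED entries online.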
Actions directories maintenance system
• Optimize and document each format (tune to use of common controlled vocabularies, ISO 19115 schema and GML, relations to CDI) – BODC, MARIS, BSH and IFREMER – 1st draft Sept 06 – final version Feb 07
• Define controlled vocabularies and organize technical and content governance (BODC with TTT Vocabularies subgroup) – 1st draft Oct 06 – operational release by Dec 06
• Directory coordinators (BODC, MARIS, BSH) modify their configuration (entry, import and management) to the new schemas, the use of controlled vocabularies and support of the 3 maintenance modalities, and convert the ‘old’ database to the ‘new’ one – pre-operational Sept 07 – fully operational Feb 08 (V1)
Actions directories maintenance system
• NODCs choose their preferred modality per metadatabase and adapt to the new configuration – between Sept 07 and Feb 08 (V1)
• Note: all NODCs continue updating the metadatabases with the present set-up
• Switch systems as soon as the new front-end is also ready – Sept 07
• Note: EDIOS needs an entry facility soon, and revision is already underway by BODC – Dec 2006
Considerations for retrieval & presentation of directories
• For each directory a dedicated end-user interface must continue to be available, also in the new situation
• NODCs must have ‘back-end’ access to the central directories and controlled vocabularies to enable them to develop their own end-user interfaces at local, national, regional and thematic levels
• A cross searching end-user interface over all directories must also become available
Proposed use of Web Services
• Install Web Services for the controlled vocabularies (BODC)
• Install a Web Service for each Directory by BODC, BSH and MARIS to support ‘intermediate’ users
• NODCs use these intermediate Web Services to develop their own local, regional, thematic end-user interfaces
• SeaDataNet uses these intermediate Web Services to develop its cross searching end-user interface at the SeaDataNet portal
Benefits of Web Services
• They will ease and stimulate the usage of the SeaDataNet common vocabularies.
• Data management tools can maintain up-to-date vocabularies and use these directly for metadata typing. This will improve the overall coherence.
• Queries return XML records, which can be transformed by XSLT into local HTML pages.
• Full copies or filtered subsets of the Directories can be retrieved, with synchronization, for building a local metadatabase configuration and end-user interface.
• They can support the check for duplicates.
• Web Services are supported by all platforms.
• OGC is working on compliance with W3C (SOAP).
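
To illustrate the step from XML query results to a local HTML page, here is a minimal sketch. It uses Python's standard `xml.etree.ElementTree` as a stand-in for XSLT, which the standard library does not provide. The record layout (`records`, `record`, `title`, `centre`) and the sample content are invented for the example; the real directory schemas are still to be defined.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical response from a directory Web Service.
SAMPLE_RESPONSE = """<records>
  <record><title>North Sea CTD casts</title><centre>Example NODC</centre></record>
  <record><title>Tide gauge series &amp; metadata</title><centre>Example Institute</centre></record>
</records>"""


def records_to_html(xml_text):
    """Render each <record> as one HTML list item."""
    root = ET.fromstring(xml_text)
    items = []
    for rec in root.findall("record"):
        # Escape the text so characters such as '&' stay valid in HTML.
        title = escape(rec.findtext("title", default=""))
        centre = escape(rec.findtext("centre", default=""))
        items.append(f"<li>{title} ({centre})</li>")
    return "<ul>\n" + "\n".join(items) + "\n</ul>"


if __name__ == "__main__":
    print(records_to_html(SAMPLE_RESPONSE))
```

A data centre that prefers XSLT would keep the same flow (query, receive XML, transform, publish) and replace only the transformation step with its own stylesheet.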
[Diagram, not reproduced: each directory (CSR, EDMED, EDMERP, CDI, EDIOS, EDMO) and the vocabularies (Vocab) are exposed through a Web Service. End users reach them through the dedicated end-user interface of each directory, through the SeaDataNet cross search end-user interface, and through local / national / regional end-user interfaces built by NODCs and institutes.]
Actions for retrieval and presentation of directories
• Develop Web Services for Controlled Vocabularies (BODC – NERC Data Grid) – beta version Oct 06
• Formulate specifications for the required functionality of the intermediate Web Services, end-user interfaces and cross searching interface – TTT Interface subgroup, chair MARIS – Feb 07
• Develop Web Services for each Directory (BODC, BSH, MARIS) in mutual coordination – Sept 07
• Upgrade existing end-user interfaces for each directory + cross search user interface, incl. OGC compliance (BODC, BSH, MARIS, IFREMER) – Sept 07
Vocabulary on platform names
• To optimize the consistency between Directories, and as a useful addition, consideration is given to a controlled vocabulary on platform names:
  • Covering research vessels (in service or historical) and ships of opportunity, but also other platforms (buoys, floats, satellites, …), worldwide
  • Platform description is limited to a small number of fields
  • ICES already maintains an international list for ships, while JCOMM/OPS maintains a list for other platforms (ARGO, VOOS)
Vocabulary on platform names
• Action required: IFREMER, BSH, BODC, ICES, and JCOMM/OPS to establish a cooperation and to prepare a plan for the development, content and technical governance of a controlled vocabulary on platform names, foreseen as a Web Service, hosted and managed by ICES, fitting in the SeaDataNet Directory system and the global IOC system – Nov 06
Research Vessels
• To optimize the consistency between Directories, and as a useful addition, consideration is also given to a European Research Vessel Directory:
  • Only research vessels in service in Europe (as part of the European marine research infrastructure)
  • With a full description of vessel characteristics, available equipment, pictures, …
  • An initiative is already underway by EurOcean
European Research Vessel Directory
• Action required: IFREMER, BSH, ICES and EurOcean to prepare a development and management plan for a European Research Vessel Directory – Nov 06
• Suggestion: combine the discussion and planning on both efforts.
• Furthermore: IOC to inform POGO of SeaDataNet CSR and research vessel activities and to propose providing the SeaDataNet solution for a global service, incl. Planned Cruises – a.s.a.p.
Duplicate tagging
• Duplicates will be in the system, because many centres manage the same data sets. That’s ok! But we must prevent users from being confronted with duplicates that cannot be identified as such.
• CDI can hold the key to banning new ‘unknown’ duplicates in the future: with each CDI update, a data centre should check whether the data set matches a data set already referred to in the central CDI directory. The comparison should use a number of key fields (4 – 6) that together provide sufficient evidence. Each CDI gets a unique tag, but duplicates get the same tag.
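
The matching rule above can be sketched as follows. Everything specific here is an assumption: the six key fields, the normalisation (trimmed, lower-cased text; coordinates rounded to three decimals), exact matching after normalisation, and deriving the tag from a hash of the key fields so that duplicates automatically share it. The generic tool that is still to be specified may well work differently.

```python
import hashlib

# Hypothetical choice of key fields; the slide only says "4 - 6".
KEY_FIELDS = ("platform", "start_date", "end_date",
              "latitude", "longitude", "data_type")


def match_key(cdi):
    """Build a normalised comparison string from the key fields."""
    parts = []
    for field in KEY_FIELDS:
        value = cdi[field]
        if isinstance(value, float):
            value = f"{value:.3f}"  # ignore tiny coordinate differences
        parts.append(str(value).strip().lower())
    return "|".join(parts)


def tag_for(cdi):
    """Derive the tag from the key fields, so duplicates share one tag."""
    return hashlib.sha1(match_key(cdi).encode("utf-8")).hexdigest()[:12]


def tag_records(central_index, new_records):
    """Tag new CDI records against the central index (tag -> record ids).

    Returns (record id, tag, is_duplicate) tuples and updates the index.
    """
    results = []
    for cdi in new_records:
        tag = tag_for(cdi)
        is_duplicate = tag in central_index
        central_index.setdefault(tag, []).append(cdi["id"])
        results.append((cdi["id"], tag, is_duplicate))
    return results
```

With this scheme a second centre submitting the same cruise data under its own record id receives the same tag, so an end-user interface could group the two entries instead of listing them as unrelated data sets.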
Duplicate tagging
• For the comparison a generic tool should be developed that enables each NODC to compare its CDIs with the central CDIs, and to tag them, in an efficient way.
• Action: IFREMER and RNODC will define a draft specification and development plan for such a generic software programme. Based upon TTT agreement on this plan, RNODC will later write the programme – plan by Dec 2006
Security Services
• Security must be incorporated in the overall approach, because it relates to the security configurations and data policies of individual Data Centres, which will differ.
• We also want to expand the SeaDataNet infrastructure for CDI and data access later on towards other marine data centres / institutes.
• Some partners have an open policy and have set up special servers outside their firewall for free downloading of data sets without any registration, but other (present and future) partners may want more control over data access and to regulate which users can have access.
Security Services and Monitoring
• We also want insight into the overall performance, the number of users and the uses of the infrastructure.
• The architecture must include a component for managing access and monitoring use of the infrastructure and its data resources, fit to deal with differences in data policies and security configurations between partners.
• The solution must have a low threshold and be service oriented so as not to scare off potential users, be based on ‘single sign-on’, and be feasible for all data centres.
Security Services and Monitoring
• TTT security subgroup, chaired by IFREMER, will formulate use cases (roles), and compare and select a preferred solution for the log-on and transaction tracking services – Dec 06
• Monitoring of usage will be done by web statistics. These are to be prepared at local level by each data centre and consolidated at monthly intervals by IFREMER. Action: IFREMER will define the required stats – Sept 06
• System monitoring will check availability of nodes etc. Action: IFREMER + HNODC will select appropriate software and procedure, also using Sea-Search WP6 experience
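
The monthly consolidation of local web statistics could look roughly like this. The report fields (`centre`, `month`, `visits`, `downloads`) are assumptions made for the sketch, since the required statistics are still to be defined by IFREMER.

```python
from collections import Counter


def consolidate(monthly_reports):
    """Sum the local reports of all data centres per month.

    Each report is a dict with the hypothetical keys 'centre',
    'month' (YYYY-MM), 'visits' and 'downloads'.
    """
    totals = {}
    for report in monthly_reports:
        month_totals = totals.setdefault(report["month"], Counter())
        month_totals["visits"] += report["visits"]
        month_totals["downloads"] += report["downloads"]
        month_totals["centres_reporting"] += 1
    return totals
```

Counting how many centres reported in a month makes it visible when a total is low only because some local reports are still missing.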
Delivery services V1
• The idea is that users will browse the CDI directory and identify data sets to which they would like to have access. After passing through the AAA module, their ‘shopping request’ should be answered.
• Version 1: Users will be able to get copies of selected data sets in files (or parts of original files) from the local systems.
  • Downloading services
  • Offline, by sending CD / DVD
• Note: For performance reasons this might be done in a delayed mode.
Delivery services V1
• Needed is a uniform format (syntactic harmonisation) for exchanging data set files:
  • NetCDF
  • ASCII format
• Action: define / choose standard file formats in ASCII and NetCDF, incorporating the common vocabularies – TTT Data access subgroup, chair BODC – Oct 06
Delivery services V1
• Implementation requires each local system to run an application that selects the requested data sets (as indicated in the CDI) from the local database / files and converts them to the uniform SeaDataNet exchange formats (ASCII and NetCDF). This could be solved by a general Java tool.
• Action: define, develop, implement and test a pilot for 4 data centres – TTT Data access subgroup – Feb 07
• Note: using, inter alia, experiences and elements from RNODC’s E2EDM pilot project.
• Roll out to other TTT data centres – Sept 07
• V1 fully operational by Feb 08
Delivery services V2
• Data virtualization
• Define features for:
  • Points
  • Profiles
  • Grid
  • Trajectories
• Develop web services for data delivery and viewing
• Also consider OpenDAP / THREDDS
• No further planning yet.
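
As a purely illustrative sketch of what "features" for points, profiles, grids and trajectories might mean as data structures: all class and field names below are assumptions, since the slide gives no definitions and no planning exists yet.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Point:
    """One value at one position, depth and time."""
    time: str
    lat: float
    lon: float
    depth: float
    value: float


@dataclass
class Profile:
    """Values at several depths below one position."""
    time: str
    lat: float
    lon: float
    depths: List[float]
    values: List[float]

    def __post_init__(self):
        if len(self.depths) != len(self.values):
            raise ValueError("depths and values must have the same length")


@dataclass
class Trajectory:
    """Values along a track of (time, lat, lon) positions."""
    track: List[Tuple[str, float, float]]
    values: List[float]

    def __post_init__(self):
        if len(self.track) != len(self.values):
            raise ValueError("track and values must have the same length")


@dataclass
class Grid:
    """values[i][j] belongs to lats[i] and lons[j]."""
    lats: List[float]
    lons: List[float]
    values: List[List[float]]

    def __post_init__(self):
        rows_ok = len(self.values) == len(self.lats)
        cols_ok = all(len(row) == len(self.lons) for row in self.values)
        if not (rows_ok and cols_ok):
            raise ValueError("values must be a len(lats) x len(lons) table")
```

The shape checks show why separate feature types are useful: each one carries its own consistency rule, which a delivery or viewing service could rely on.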
Key elements Infrastructure Versions
• Version 1:
  • upgraded and harmonised system of metadatabases involving all data centres
  • online data access / delivery system operational for 11 Data Centres
  • pre-operational by Sept 2007, fully operational by Feb 2008
• Version 2:
  • dynamic updating and synchronization of metadatabases
  • online data access / delivery system installed for all 40 Data Centres
  • data access upgraded to transparent and harmonised access with a common data model
  • operational by Feb 2010