Distributed Databases and metadata. G. Bégni, H. Makhmara - MEDIAS-France July 18, 2004 ENVIROMIS Tomsk. Aims of the presentation. Understanding the principles of metadata and databases. Making the scientific community aware of the efforts expected in terms of data documentation.
Distributed Databases and metadata G. Bégni, H. Makhmara - MEDIAS-France July 18, 2004 ENVIROMIS Tomsk
Aims of the presentation • Understanding the principles of metadata and databases. • Making the scientific community aware of the effort expected in terms of data documentation. • Highlighting the positive impact of such efforts. • Demonstrating the need for an easy way to access distributed databases.
Approach • Presentation of the AMMA context and its constraints: statement of the problem. • Reflection on a solution. • Abstract description of the elements that make up the solution. • Selection and justification of standards and techniques. • Assessment of the selections.
AMMA context • Scientific level • Multi-disciplinary • Multi-scale. • Technical level • Multi-format • Multi-volume • Multi-structure • Multi-location. • Cultural level • Multi-language • Multi-usage • Multi-possibilities.
Constraints involved • Providing the various communities with the best suited access to data (language, medium, cost, services…) • Guaranteeing the durability of data wherever they are produced. • Ensuring the durability of services as time goes by (technological developments).
Access services • Easy web interface for searching and locating data (geographical, temporal, thematic, keywords). • Transparent service to access heterogeneous distributed data (possibilities of compiling…). • Homogeneous documentation for heterogeneous data in order to optimise their exploitation.
Data durability • Multiple and systematic back-up procedure. • Data transparency in relation to technological changes (hardware, software). • Transparent data exploitation as time goes by.
A solution • Fully defined back-up process. • Data storage in standardised formats. • Clear data documentation for future exploitation.
Service durability • Services should not depend on any proprietary or « exotic » software. • The quality of a service should not deteriorate as technology changes.
A solution • Services based on standards. • Services based on open-source software.
To sum up: • Standardise storage. • Standardise services. • Standardise exploitation. • However, some data formats cannot be standardised (satellite imaging). • Neither can the related services.
Principles applied • Every item liable to be standardised should be standardised. • There should be a system gateway based on standards only. • Every item that cannot be standardised should be described in a standardised way.
A standard for each element • Data storage: ANSI/ISO SQL, XML. • Data description: FGDC-STD-001-1998 or ISO 19115. • Service description: W3C SOAP. • Catalogue: ISO 23950 (ANSI/NISO Z39.50).
Data description: metadata • Formed from a Greek root (« meta »). • What surpasses, encompasses a subject, a science (Le Robert Dictionary). • Denoting a nature of a higher order or more fundamental kind (Oxford Talking Dictionary). • English: metadata; French: métadonnées. • Literally speaking, metadata are data about data. • To be more precise, they are structured sets of information that describe resources.
Metadata standards • Metadata have always existed. • A worldwide standardisation effort has been under way for several years. • Several standards for georeferenced data: • Content Standard for Digital Geospatial Metadata: FGDC-STD-001-1998. • ISO 19115, since the end of 2002. • FGDC is a de facto standard.
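To make "structured sets of information that describe resources" concrete, here is a minimal sketch of a metadata record serialised as XML. The element names (`metadata`, `title`, `extent`, `period`…) and the sample values are assumptions chosen for illustration; they do not follow FGDC or ISO 19115.

```python
import xml.etree.ElementTree as ET


def build_metadata_record(title, abstract, keywords, bbox, period):
    """Return an XML element describing one dataset.

    All element and attribute names are hypothetical, chosen for this sketch.
    """
    record = ET.Element("metadata")
    ET.SubElement(record, "title").text = title
    ET.SubElement(record, "abstract").text = abstract

    # Thematic description: one <keyword> element per keyword.
    keyword_list = ET.SubElement(record, "keywords")
    for word in keywords:
        ET.SubElement(keyword_list, "keyword").text = word

    # Geographical description: bounding box in decimal degrees.
    west, south, east, north = bbox
    ET.SubElement(record, "extent", west=str(west), south=str(south),
                  east=str(east), north=str(north))

    # Temporal description: ISO 8601 dates given as strings.
    begin, end = period
    ET.SubElement(record, "period", begin=begin, end=end)
    return record


# Sample values are invented for the example.
record = build_metadata_record(
    title="Daily rainfall, example station network",
    abstract="Hypothetical dataset used to illustrate a metadata record.",
    keywords=["rainfall", "West Africa"],
    bbox=(-20.0, 0.0, 20.0, 25.0),
    period=("2002-01-01", "2002-12-31"),
)
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

The record carries the three search axes mentioned under "Access services" (geographical, temporal, thematic) next to the free-text description.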
Advantages • Homogeneous presentation. • Pooled developments. • Possibility to automate data processing. • Comparison of examples: • GeoConnections Portal, Canada: http://geodiscover.cgdi.ca • Portal on desertification monitoring (OSS/Medias/SCOT): http://geooss.oss.org.tn/geooss
Efforts asked from data providers • Be aware of standards. • Endeavour to describe data as completely as possible. • Use data exchange formats that are as simple and consistent as possible. -------------------- • Data providers do not have to care about the technical or formal aspects of standards. • Database managers will provide them with easy and user-friendly tools to describe their data.
AMMA information system architecture (diagram) • Components: MetaCatalog (portal to the AMMA I.S.); meta database (ISO 19115 and/or FGDC); data sources DB AMMASAT, DB SOP and DB LOP, each reached through an exchange protocol. • Workflow: 1. Search by criteria (user-friendly interface). 2. Query metadata. 3. Retrieve metadata. 4. Choose datasets and query data. 5. Locate and query datasets from the relevant data sources. 6. Retrieve datasets.
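The workflow on this slide can be sketched in a few lines: the metadata are queried first, and only then are the chosen datasets located in the databases that hold them. This is a minimal in-memory illustration, not the AMMA implementation. The source names come from the diagram; the dataset identifiers, keywords, extents and dates are invented.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class MetadataRecord:
    dataset_id: str       # hypothetical identifier
    source: str           # database that holds the data
    keywords: frozenset
    bbox: tuple           # (west, south, east, north) in degrees
    begin: date
    end: date


def overlaps(a, b):
    """True if two (west, south, east, north) boxes intersect."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def search(catalogue, keyword, bbox, begin, end):
    """Steps 1 to 3: query the metadata and return the matching records."""
    return [r for r in catalogue
            if keyword in r.keywords
            and overlaps(r.bbox, bbox)
            and r.begin <= end and begin <= r.end]


def locate(records):
    """Step 5: group the chosen datasets by the source that holds them."""
    by_source = {}
    for r in records:
        by_source.setdefault(r.source, []).append(r.dataset_id)
    return by_source


# Invented catalogue content; only the source names appear on the slide.
catalogue = [
    MetadataRecord("rain-2002", "DB LOP", frozenset({"rainfall"}),
                   (-20, 0, 20, 25), date(2002, 1, 1), date(2002, 12, 31)),
    MetadataRecord("ndvi-series", "DB AMMASAT", frozenset({"vegetation"}),
                   (-20, 0, 20, 25), date(2000, 1, 1), date(2004, 12, 31)),
    MetadataRecord("rain-campaign", "DB SOP", frozenset({"rainfall"}),
                   (0, 10, 5, 15), date(2003, 6, 1), date(2003, 9, 30)),
]

hits = search(catalogue, "rainfall", (-5, 5, 5, 20),
              date(2002, 6, 1), date(2002, 9, 30))
plan = locate(hits)
print(plan)
```

Step 6 (retrieving the data) would then issue one request per entry in `plan` through the exchange protocol of the corresponding source.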
Technical diagram • Services: catalogue service (any user); edition service (data provider). • Components shown: web forms, metadata creation and validation, XML records, XML import, Zebra indexer, Zebra server, Z39.50, ZAP client, ZOOM, YAZ, PHP. • Links to other catalogues (GCMD, FGDC Clearinghouse).
Characteristics • Management of multi-standard metadata • ISO 19115 • FGDC • DIF (if an XML schema is available). • Transparent to the data provider. • Transparent to the user.
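One way to manage metadata in several standards is a crosswalk: each standard maps its own element paths onto a small set of common fields, so the catalogue handles a single record shape. The sketch below assumes this approach; the slides do not say how the system does it. The FGDC paths use the CSDGM short element names, while the "iso" layout is a simplified, namespace-free structure assumed for this example (only the bounding-box element names are taken from ISO 19115).

```python
import xml.etree.ElementTree as ET

# Path of each common field, relative to the record's root element.
CROSSWALK = {
    "fgdc": {
        "title": "idinfo/citation/citeinfo/title",
        "abstract": "idinfo/descript/abstract",
        "west": "idinfo/spdom/bounding/westbc",
        "east": "idinfo/spdom/bounding/eastbc",
        "south": "idinfo/spdom/bounding/southbc",
        "north": "idinfo/spdom/bounding/northbc",
    },
    # Simplified layout assumed for this sketch, not an official encoding.
    "iso": {
        "title": "identificationInfo/citation/title",
        "abstract": "identificationInfo/abstract",
        "west": "identificationInfo/extent/westBoundLongitude",
        "east": "identificationInfo/extent/eastBoundLongitude",
        "south": "identificationInfo/extent/southBoundLatitude",
        "north": "identificationInfo/extent/northBoundLatitude",
    },
}


def to_common(xml_text, standard):
    """Extract the common fields from a record written in `standard`."""
    root = ET.fromstring(xml_text)
    return {field: root.findtext(path)
            for field, path in CROSSWALK[standard].items()}


# Two invented records describing the same hypothetical dataset.
fgdc_xml = (
    "<metadata><idinfo>"
    "<citation><citeinfo><title>Example rainfall</title></citeinfo></citation>"
    "<descript><abstract>Hypothetical record.</abstract></descript>"
    "<spdom><bounding><westbc>-20</westbc><eastbc>20</eastbc>"
    "<northbc>25</northbc><southbc>0</southbc></bounding></spdom>"
    "</idinfo></metadata>"
)
iso_xml = (
    "<MD_Metadata><identificationInfo>"
    "<citation><title>Example rainfall</title></citation>"
    "<abstract>Hypothetical record.</abstract>"
    "<extent><westBoundLongitude>-20</westBoundLongitude>"
    "<eastBoundLongitude>20</eastBoundLongitude>"
    "<southBoundLatitude>0</southBoundLatitude>"
    "<northBoundLatitude>25</northBoundLatitude></extent>"
    "</identificationInfo></MD_Metadata>"
)

from_fgdc = to_common(fgdc_xml, "fgdc")
from_iso = to_common(iso_xml, "iso")
print(from_fgdc == from_iso)
```

Adding a further standard such as DIF would then only require one more entry in `CROSSWALK`.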
Data access services • Médias-France is developing generic data access services. • These services have to be self-describing, registered and equipped with well-known interfaces. • For the moment, we are focusing our efforts on software that gives access to geographically distant databases (distributed databases).
Principle • Each service is registered within a directory server. • Each data source declares what data it serves. • A web portal is used by scientists to locate and request data from different sources. • Data is sent back to the user in a standardised format.
Implementation • Data sources are PostgreSQL databases, flat files or other RDBMSs. • Each data server is a DODS (Distributed Oceanographic Data System) servlet. • The servlet container is Apache Tomcat. • Metadata are stored in XML files.
Prospects • Develop Web services based on the W3C SOAP recommendation. • Implement a directory service for services. • Hope to share development efforts with other organisations, within the framework of international projects (funded by EC, INTAS…).
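The four points above can be sketched as a small in-memory directory plus a portal function. This is an illustration under stated assumptions, not the Médias-France implementation: the `Directory` class, the `example.org` endpoints, the variable names and the stub fetchers are all hypothetical, and CSV stands in for whatever standardised output format is actually used.

```python
import csv
import io


class Directory:
    """In-memory stand-in for the directory server (hypothetical API)."""

    def __init__(self):
        self._sources = {}

    def register(self, name, endpoint, variables):
        """A data source registers itself and declares what it serves."""
        self._sources[name] = {"endpoint": endpoint,
                               "variables": frozenset(variables)}

    def find(self, variable):
        """Names of the sources that declared `variable`, in sorted order."""
        return sorted(name for name, info in self._sources.items()
                      if variable in info["variables"])

    def endpoint(self, name):
        return self._sources[name]["endpoint"]


def request(directory, fetchers, variable):
    """Portal side: locate the sources, query each one, return a single CSV."""
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["source", "time", "value"])
    for name in directory.find(variable):
        for time, value in fetchers[name](variable):
            writer.writerow([name, time, value])
    return out.getvalue()


directory = Directory()
# Endpoints and declared variables are invented for the example.
directory.register("DB SOP", "http://sop.example.org/data", ["rainfall", "wind"])
directory.register("DB LOP", "http://lop.example.org/data", ["rainfall"])
directory.register("DB AMMASAT", "http://sat.example.org/data", ["ndvi"])

# Stub fetchers; a real one would call directory.endpoint(name) over the network.
fetchers = {
    "DB SOP": lambda variable: [("2004-07-18", 12.5)],
    "DB LOP": lambda variable: [("2004-07-18", 10.0)],
    "DB AMMASAT": lambda variable: [("2004-07-18", 0.4)],
}

result = request(directory, fetchers, "rainfall")
print(result)
```

Because the portal only talks to the directory, a new data source becomes visible to users as soon as it registers, without any change to the portal itself.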
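A DODS server is queried with plain URLs: a suffix selects the kind of response (`.das` for attributes, `.dds` for the data structure, `.dods` for the binary data), and an optional constraint expression after `?` selects variables and index ranges. The helper below is a minimal sketch of that URL convention. The server address, dataset path and variable name are hypothetical, and the function names are invented for this example.

```python
RESPONSES = {"das", "dds", "dods"}


def hyperslab(variable, *ranges):
    """Build a projection such as 'rain[0:9][0:2:10]'.

    Each range is (start, stop) or (start, stride, stop), one per dimension.
    """
    parts = [variable]
    for r in ranges:
        if len(r) not in (2, 3):
            raise ValueError("range must be (start, stop) or (start, stride, stop)")
        parts.append("[" + ":".join(str(int(v)) for v in r) + "]")
    return "".join(parts)


def dods_url(dataset_url, response="dods", projections=()):
    """Return the request URL for one dataset and one response type."""
    if response not in RESPONSES:
        raise ValueError(f"unknown response type: {response}")
    url = f"{dataset_url}.{response}"
    if projections:
        # Several projections are separated by commas.
        url += "?" + ",".join(projections)
    return url


# Hypothetical servlet address and dataset name.
base = "http://data.example.org/dods/rainfall_2002"
structure_url = dods_url(base, "dds")
subset_url = dods_url(base, "dods", [hyperslab("rain", (0, 9), (0, 2, 10))])
print(structure_url)
print(subset_url)
```

The brackets are left unescaped here, as is usual for hand-written DODS requests; a client that percent-encodes them would have to be checked against the server in use.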