
GALION Data Management Strategies: Centralized vs. Distributed Approach

Explore the debate on where GALION data should reside: centrally or distributed. Learn about user demands, current strategies, and proposed solutions, and weigh the pros and cons of both architectures.





Presentation Transcript


  1. Where Should the GALION Data Reside? Centrally or Distributed?
  Introduction to the Discussion
  Fiebig, M.; Fahre Vik, A.
  Norwegian Institute for Air Research

  2. User Demands for GALION Data Management
  • Data should be easy to find and accessible via one common location.
  • Data should be searchable by location, time window, parameter, …
  • A plotting and browsing tool for online comparison.
  • Data should be downloadable in a homogeneous format, with the option for users to choose between a few commonly used formats.
  • Data should be of homogeneously high quality, including detailed documentation of the processing steps so that comparability can be assessed.
  • Different applications require different proximity to the raw measurement.
  • Data should include a measure of uncertainty and variability.
  • Data should be available in near-real-time (crisis management, forecast, …) -> one location, one format!
  • Option for aggregating datasets into climatologies.
  • …
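The demand for aggregating datasets into climatologies can be illustrated with a minimal sketch. It assumes a dataset is available as (timestamp, value) pairs with None marking missing values; the function name and record layout are hypothetical, and a real climatology would also need rules such as a minimum data coverage per month.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def monthly_climatology(records):
    """Average values by calendar month across all years (hypothetical helper).

    records: iterable of (datetime, float or None) pairs; None values are skipped.
    Returns {month_number: mean_value} for the months that contain data.
    """
    by_month = defaultdict(list)
    for timestamp, value in records:
        if value is not None:
            by_month[timestamp.month].append(value)
    return {month: mean(values) for month, values in sorted(by_month.items())}


# Made-up example values, for illustration only.
records = [
    (datetime(2008, 1, 15), 10.0),
    (datetime(2009, 1, 15), 14.0),
    (datetime(2008, 7, 15), 30.0),
    (datetime(2009, 7, 15), None),  # missing value, ignored
]
print(monthly_climatology(records))  # {1: 12.0, 7: 30.0}
```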

  3. Current Strategy for Data Management in GALION
  • At least one common point of access to a common data pool.
  • Responsibility for QA and long-term availability remains with the contributing institutions / networks.
  • Features of the common access portal:
    • Holds access metadata from all contributing stations, i.e. dates, times, and type of measurements.
    • Allows searching by criteria such as network, date, location, …
    • Browsing / quicklook of data.
    • Link to download from the original location.
    • Tools for format conversion.
    • Control of access rights.
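A portal that holds only access metadata and links back to the original location can be sketched as a small catalogue plus a filter. This is a minimal illustration under stated assumptions: the AccessMetadata fields, the search() criteria, the network/station names, and the example.org URLs are all hypothetical and do not reflect an actual GALION or GAWSIS schema.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class AccessMetadata:
    """One hypothetical catalogue entry: what was measured where and when."""
    network: str
    station: str
    latitude: float
    longitude: float
    parameter: str
    start: date
    end: date
    url: str  # link to the dataset at its original location


def search(catalogue, network=None, parameter=None, time_window=None, bbox=None):
    """Return entries matching every criterion that is given (None = ignore).

    time_window: (start, end) dates; an entry matches if it overlaps the window.
    bbox: (lat_min, lat_max, lon_min, lon_max) in degrees.
    """
    hits = []
    for entry in catalogue:
        if network is not None and entry.network != network:
            continue
        if parameter is not None and entry.parameter != parameter:
            continue
        if time_window is not None:
            start, end = time_window
            if entry.end < start or entry.start > end:
                continue
        if bbox is not None:
            lat_min, lat_max, lon_min, lon_max = bbox
            if not (lat_min <= entry.latitude <= lat_max
                    and lon_min <= entry.longitude <= lon_max):
                continue
        hits.append(entry)
    return hits


# Made-up catalogue entries, for illustration only.
catalogue = [
    AccessMetadata("network-A", "Station A", 50.0, 10.0, "aerosol backscatter",
                   date(2008, 1, 1), date(2009, 12, 31), "https://example.org/a"),
    AccessMetadata("network-B", "Station B", -20.0, 130.0, "aerosol backscatter",
                   date(2010, 1, 1), date(2010, 12, 31), "https://example.org/b"),
]
hits = search(catalogue, parameter="aerosol backscatter",
              time_window=(date(2009, 6, 1), date(2010, 6, 1)),
              bbox=(30.0, 70.0, -15.0, 40.0))
print([entry.station for entry in hits])  # ['Station A']
```

Because the portal returns only the url field, the data itself stays at the contributing institution, in line with the strategy above.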

  4. Solution 1: GAWSIS as Data Discovery Portal

  5. GAWSIS Features
  • Data directory encompassing all GAW data centres; holds access metadata.
  • Search data availability by country, network, station name, station ID, station type, and parameter.
  • Map visualisation of availability.
  • Station page with station metadata and a list of available datasets.
  • Link to the original repository; direct link to the dataset if available.
  • Functionality similar to a Global Information System Centre (GISC) in the WMO Information System (WIS) concept.
  • GAWSIS plans include WIS compliance (once that is defined) and a plotting tool.

  6. Solution 2: EARLINET-ASOS Database and Portal

  7. EARLINET-ASOS Database Features
  • Search all EARLINET-ASOS data by date, daytime, season, station, event category, parameter.
  • Select and download data (NetCDF format).
  • Plotting, browsing, and comparison functions.
  • The EARLINET-ASOS database will be part of the ACTRIS distributed database, which is planned to be WIS compliant (when we know what that means).
  • ACTRIS: EU FP7 project that will network European ground-based in situ and lidar aerosol observations, cloud property observations, and reactive trace gas observations.
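Searching by season and event category can be sketched as a filter over profile records. This is a minimal sketch, not the EARLINET-ASOS implementation: it assumes meteorological seasons (DJF, MAM, JJA, SON) and a hypothetical record layout with a 'time' and a 'categories' key; the category labels and profile IDs are made up.

```python
from datetime import datetime

# Meteorological seasons named by their months (DJF = Dec-Jan-Feb, ...).
SEASON_OF_MONTH = {
    12: "DJF", 1: "DJF", 2: "DJF",
    3: "MAM", 4: "MAM", 5: "MAM",
    6: "JJA", 7: "JJA", 8: "JJA",
    9: "SON", 10: "SON", 11: "SON",
}


def select_profiles(profiles, season=None, category=None):
    """Keep profiles matching every criterion that is given (None = ignore).

    profiles: list of dicts with a 'time' (datetime) and a 'categories'
    (set of str) key; this layout is hypothetical.
    """
    selected = []
    for profile in profiles:
        if season is not None and SEASON_OF_MONTH[profile["time"].month] != season:
            continue
        if category is not None and category not in profile["categories"]:
            continue
        selected.append(profile)
    return selected


# Made-up profiles, for illustration only.
profiles = [
    {"id": "p1", "time": datetime(2010, 4, 20, 21, 0), "categories": {"volcanic eruption"}},
    {"id": "p2", "time": datetime(2010, 7, 1, 12, 0), "categories": {"Saharan dust"}},
    {"id": "p3", "time": datetime(2010, 12, 5, 20, 0), "categories": set()},
]
print([p["id"] for p in select_profiles(profiles, season="MAM")])  # ['p1']
```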

  8. Solution 3: GEOmon Distributed Database
  • Data discovery portal holding access metadata.
  • Data may be searched by parameter, station, home database, type (in situ, remote sensing, simulation), platform, matrix, geolocation, altitude, temporal availability.
  • Portal links to the individual dataset where possible, to the database homepage otherwise.
  • Will be developed into the entry portal of the ACTRIS distributed database.
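The linking rule (individual dataset where possible, database homepage otherwise) amounts to a simple fallback. The sketch below assumes a hypothetical PortalEntry record in which dataset_url is None when the home database offers no per-dataset link; the field names and example.org URLs are illustrative only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PortalEntry:
    """Hypothetical discovery-portal record; field names are illustrative only."""
    dataset_id: str
    home_database_url: str
    dataset_url: Optional[str] = None  # None if no per-dataset link exists


def resolve_link(entry: PortalEntry) -> str:
    """Link to the individual dataset where possible, to the database homepage otherwise."""
    return entry.dataset_url or entry.home_database_url


entries = [
    PortalEntry("ds-001", "https://db-one.example.org/",
                "https://db-one.example.org/data/ds-001"),
    PortalEntry("ds-002", "https://db-two.example.org/"),
]
for entry in entries:
    print(entry.dataset_id, resolve_link(entry))
```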

  9. Distributed Data Architecture: Pros & Cons
  • Pros:
    • Institutions / networks keep control over data access, data quality, and long-term availability, and maintain their visibility.
    • Know-how on measurement principle and data management is combined for tailored solutions.
  • Cons:
    • All institutions / networks have to maintain server infrastructure (file archive, metadata server, web service, WIS compliance, …).
    • Well-defined formats are essential for smooth interoperability. Implementing on-the-fly conversion between dozens of formats would be a drain on resources and a built-in vulnerability.
    • Near-real-time dissemination with uniform QA is almost impossible to implement.
    • Long-term availability is not ensured.
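Why on-the-fly conversion between dozens of formats becomes a resource drain can be shown with a back-of-the-envelope count. This illustration is an assumption, not a figure from the slide: it compares direct converters between every pair of formats with converting each format to and from one agreed common format.

```python
def pairwise_converters(n_formats: int) -> int:
    """Direct converters so that every format can be turned into every other one."""
    return n_formats * (n_formats - 1)


def common_format_converters(n_formats: int) -> int:
    """Converters if each format is only read into and written from one agreed
    common format (one reader plus one writer per format)."""
    return 2 * n_formats


for n in (5, 12, 24):
    print(f"{n} formats: {pairwise_converters(n)} pairwise vs "
          f"{common_format_converters(n)} via a common format")
```

The pairwise count grows quadratically and exceeds the common-format count from four formats upward (for 24 formats: 552 versus 48), which is one way to read the slide's call for well-defined common formats.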

  10. Centralised Data Architecture: Pros & Cons
  • Pros:
    • Server infrastructure needs to be maintained only once / at a few sites (economy of scale).
    • Long-term availability is ensured.
    • Easy to ensure homogeneous data formatting and quality; frequent reformatting is not necessary.
    • Almost the only option for implementing an NRT service with homogeneous automated QA.
  • Cons:
    • Somewhat less visibility of individual institutions / networks.
    • Institution(s) hosting the data centre(s) need to ensure access management.
    • Institution(s) hosting the data centre(s) also need experimental expertise.

  11. Well-Defined Common Data Formats Are Essential for Any Data Architecture
  • A data format is more than just selecting NASA-Ames, NetCDF, …
  • It needs to include an implementation profile for the format standard and a defined vocabulary, i.e. which parameters / metadata are included, in what unit, and how they are named; which processing steps were conducted; all self-explaining; and flags to indicate special conditions.
  • Example: EUSAAR data formats (all NASA-Ames 1001):
    • Level 0: annotated, instrument-specific raw data, "native" time resolution.
    • Level 1: processed to final physical variable, original time resolution.
    • Level 1.5: automatically aggregated to (hourly) averages, includes uncertainty for the averaging period.
    • Level 2: same as level 1.5, but manually quality assured.
  • Well-defined common processing steps between levels establish traceability.
  • Well-defined formats don't limit the usability of data, but make routine work more efficient.
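The automatic step from level 1 to level 1.5 (hourly averages with a measure for the averaging period) can be sketched as follows. This is a minimal sketch under assumptions: samples are (timestamp, value, valid) tuples, a False validity flag stands in for real flag handling, and the standard deviation is used as a simple spread measure; the slide does not state how EUSAAR actually defines the uncertainty.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean, stdev


def to_hourly(samples):
    """Aggregate level-1-like samples to hourly averages (level-1.5-like).

    samples: iterable of (datetime, value, valid) tuples; samples with
    valid=False are dropped (a stand-in for real flag handling).
    Returns {hour_start: (mean, std, n)}; std is None for fewer than 2 samples.
    """
    bins = defaultdict(list)
    for timestamp, value, valid in samples:
        if valid:
            hour_start = timestamp.replace(minute=0, second=0, microsecond=0)
            bins[hour_start].append(value)
    result = {}
    for hour_start, values in sorted(bins.items()):
        spread = stdev(values) if len(values) > 1 else None
        result[hour_start] = (mean(values), spread, len(values))
    return result


# Made-up samples, for illustration only.
samples = [
    (datetime(2009, 5, 1, 10, 5), 100.0, True),
    (datetime(2009, 5, 1, 10, 25), 110.0, True),
    (datetime(2009, 5, 1, 10, 45), 120.0, True),
    (datetime(2009, 5, 1, 10, 55), 999.0, False),  # flagged sample, excluded
    (datetime(2009, 5, 1, 11, 5), 90.0, True),
]
hourly = to_hourly(samples)
for hour_start, (avg, spread, n) in hourly.items():
    print(hour_start, avg, spread, n)
```

Keeping the sample count n next to the average makes it possible to judge later how representative each hourly value is.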

  12. Efficient Use of Project Resources: GAW Aerosol NRT
  [Data-flow diagram, reconstructed from the slide]
  • Station delivering raw data: collects raw data in a custom format; FTP transfer to the sub-network data centre, which auto-creates hourly data files (level 0) and initiates auto-upload to the NRT server.
  • Station producing level 0 itself: auto-creates hourly data files (level 0) and initiates auto-upload to the NRT server (FTP transfer to the data centre).
  • Data centre:
    • Checks for correct data format (level 0).
    • Checks whether data stay within specified boundaries (sanity check).
    • Automatic feedback to the data provider.
    • Processing to level 1 and to level 1.5, producing hourly level 1 and hourly level 1.5 data files.
    • EBAS database.
  • User access: restricted, via web interface (ebas.nilu.no); via machine-to-machine web service.
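The two automatic checks at the data centre (correct level 0 format, values within specified boundaries) and the resulting feedback can be sketched as one function. This is a minimal sketch under assumptions: the column layout, the bounds, and the check_hourly_file() name are hypothetical, since real level 0 templates and limits are instrument specific and are not given on the slide.

```python
# Hypothetical level-0 column layout and plausibility bounds; real templates
# and limits are instrument specific and are not given on the slide.
EXPECTED_COLUMNS = ["start_time", "end_time", "value"]
BOUNDS = {"value": (0.0, 1.0e5)}


def check_hourly_file(header, rows):
    """Run the format and sanity checks; return feedback messages (empty = passed)."""
    feedback = []
    # 1. Format check: the header must match the expected level-0 layout.
    if header != EXPECTED_COLUMNS:
        feedback.append(f"format error: expected {EXPECTED_COLUMNS}, got {header}")
        return feedback  # values of a malformed file are not checked further
    for row_no, row in enumerate(rows, start=1):
        if len(row) != len(header):
            feedback.append(f"row {row_no}: expected {len(header)} fields, got {len(row)}")
            continue
        record = dict(zip(header, row))
        # 2. Sanity check: each bounded column must lie within its range.
        for column, (low, high) in BOUNDS.items():
            try:
                value = float(record[column])
            except ValueError:
                feedback.append(f"row {row_no}: {column}={record[column]!r} is not a number")
                continue
            if not low <= value <= high:
                feedback.append(f"row {row_no}: {column}={value} outside [{low}, {high}]")
    return feedback


header = ["start_time", "end_time", "value"]
rows = [
    ["2009-05-01T10:00", "2009-05-01T11:00", "350"],
    ["2009-05-01T11:00", "2009-05-01T12:00", "-5"],
]
for message in check_hourly_file(header, rows):
    print(message)  # row 2: value=-5.0 outside [0.0, 100000.0]
```

Returning messages instead of raising on the first error lets a single automatic feedback report list every problem in the hourly file at once.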

  13. How Do You Access the Data?

  14. NRT-Example: Auto-Processed DMPS data
