“curator” DB design Curator meeting, GFDL, Sep 20
Why RDBMS
• A lot of information:
  • Model metadata
  • Experiment metadata
  • Institution/user metadata
  • Data metadata
• Most of it is in textual form.
• The information is tightly interlinked, which is easy to express in a relational database.
• Relational databases have well-developed means for searching and extracting data (the SQL query language and programming interfaces for most languages), for local as well as remote users.
• Very reliable, safe technology.
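As a rough illustration of the "tightly interlinked" point, the sketch below links experiments to institutions and scenarios and retrieves them with one SQL join. The table names echo those shown later in the deck, but every column name and sample value is an assumption made for the example, and an in-memory SQLite database stands in for the MySQL server so that the sketch runs without any setup.

```python
import sqlite3

# In-memory SQLite stands in for the MySQL server (assumption for runnability).
# Column names and sample rows are hypothetical, not the real "curator" schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Institutions (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE Scenarios    (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE Experiments (
    id             INTEGER PRIMARY KEY,
    name           TEXT NOT NULL,
    institution_id INTEGER NOT NULL REFERENCES Institutions(id),
    scenario_id    INTEGER NOT NULL REFERENCES Scenarios(id)
);
""")
conn.execute("INSERT INTO Institutions VALUES (1, 'GFDL')")
conn.executemany("INSERT INTO Scenarios VALUES (?, ?)",
                 [(1, "example_scenario_1"), (2, "example_scenario_2")])
conn.executemany("INSERT INTO Experiments VALUES (?, ?, ?, ?)",
                 [(1, "exp_a", 1, 1), (2, "exp_b", 1, 2)])


def experiments_for_scenario(conn, scenario_name):
    """Return (experiment, institution) pairs for one scenario, sorted by experiment."""
    return conn.execute(
        """
        SELECT e.name, i.name
        FROM Experiments AS e
        JOIN Institutions AS i ON i.id = e.institution_id
        JOIN Scenarios    AS s ON s.id = e.scenario_id
        WHERE s.name = ?
        ORDER BY e.name
        """,
        (scenario_name,),
    ).fetchall()


print(experiments_for_scenario(conn, "example_scenario_1"))
```

The same query text would work through a MySQL client library with only the connection call and the parameter placeholder style changed.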
Desirable Features of Model Data Factory
• Relational database storing metadata, containing descriptions of
  • model components and model configuration
  • scenarios
  • postprocessing (model output and CMOR) directives
  • experiments
  • variables
  • formalized rules of Quality Control
  • data locations
  • task scheduler
  • user and group accounts
• XML as data exchange format
  • for compliance with FRE
  • working format of existing third-party software
  • well suited to hierarchical metadata description
  • widely used, easy to exchange with other Data Portals
• Model Builder (FMS Runtime Environment at GFDL)
  • checks out available model components from DB
  • chooses model datasets from DB
  • sets postprocessing directives
  • checks compatibility of components and configurations
  • builds the executable application and runs it
  • writes metadata about the experiment into DB (model configuration, scenario, project, organization/user, postprocessing)
Desirable Features of Model Data Factory (continued)
• Climate Model Output Rewriter (CMOR) subsystem
  • prepares data consistently with specific project requirements
• Data Publisher
  • transfers data to Data Portal storage according to settings from DB
• Data Portal Software Package
  • Configuration Manager (configures Aggregation Server and Data Portal Interface)
  • Search Catalog Engine
  • Data Subsampling Engine
  • Data Computation Engine
  • Data Visualization
  • Data Delivery Manager
Standard scenario of a functioning Model Data Factory (ideal picture)
• A scientist builds a model in FRE using available model components, datasets and a forcing scenario.
• FRE puts metadata about the built model, scenario and experiment into the “curator” DB and runs the experiment.
• The postprocessing subsystem extracts the postprocessing plan from the “curator” DB, executes it, and on finishing puts metadata about the processed experiment back into the DB.
• Data Publisher (DP) regularly checks the “curator” DB for new experiments marked as “public” and, if it finds any, invokes CMOR.
• CMOR fetches metadata from the “curator” DB and processes the needed data following the metadata instructions.
• DP calls QAC and then transfers the data to Data Portal storage.
• Configuration Manager configures the Aggregation Server and Data Portal Interface and puts records about the new public data into the “curator” DB.
• End of process: the data is ready to go.
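The Data Publisher steps in the scenario above (poll the DB for new public experiments, invoke CMOR, call QAC, transfer the data) could be sketched roughly as follows. This is only an illustration: the `visibility` and `published` columns, the callback signatures, and the rule of skipping an experiment when QAC fails are all assumptions, and in the scenario itself it is the Configuration Manager, not the DP, that records the new public data. SQLite again stands in for MySQL.

```python
import sqlite3


def open_demo_db():
    """Create a throwaway DB with a hypothetical Experiments table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Experiments ("
        " id INTEGER PRIMARY KEY,"
        " name TEXT NOT NULL,"
        " visibility TEXT NOT NULL,"            # hypothetical: 'public' or 'private'
        " published INTEGER NOT NULL DEFAULT 0"  # hypothetical bookkeeping flag
        ")"
    )
    conn.executemany(
        "INSERT INTO Experiments (name, visibility) VALUES (?, ?)",
        [("exp_a", "public"), ("exp_b", "private"), ("exp_c", "public")],
    )
    return conn


def publish_pending(conn, run_cmor, run_qac, transfer):
    """One polling pass: process every public experiment not yet published."""
    pending = conn.execute(
        "SELECT id, name FROM Experiments"
        " WHERE visibility = 'public' AND published = 0 ORDER BY id"
    ).fetchall()
    done = []
    for exp_id, name in pending:
        run_cmor(name)             # rewrite output per project requirements
        if not run_qac(name):      # assumption: failed checks block the transfer
            continue
        transfer(name)             # copy data to Data Portal storage
        conn.execute("UPDATE Experiments SET published = 1 WHERE id = ?", (exp_id,))
        done.append(name)
    conn.commit()
    return done


conn = open_demo_db()
log = []
published = publish_pending(
    conn,
    run_cmor=lambda name: log.append(("cmor", name)),
    run_qac=lambda name: True,
    transfer=lambda name: log.append(("transfer", name)),
)
print(published)
```

Running `publish_pending` on a schedule gives the "regularly checks" behaviour; a second pass finds nothing new because the flag has been set.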
Common functionality schema of ‘Model Data Factory’
Database ‘curator’ design
Database compartments:
• Model Metadata Compartment: contains model descriptions; allows building a coupled model of the needed configuration
• Variables Compartment: list of all related physical variables
• Workflow Compartment: contains scenarios, experiments, institutions, projects and user info
• Postprocessing Compartment: defines the postprocessing plan for an experiment
• Data Portal Compartment: contains info about experiment data
MySQL DB CURATOR
Model Metadata Compartment (in development)
• Tables shown: Coupled_Models, Model_List, Component_Medias, Models
• Tables shown from other compartments: Variables (Variables Compartment), Experiments (Workflow Compartment)
Data Samples from Model Compartment
• Tables sampled: Components_Medias, Coupled_Models, Model_List, Models
Variables Compartment
• Tables shown: Variables, Variable_Bundles, Variable_Lists, Variable_List_Contents, Proj_Var_Names
• Table shown from another compartment: Projects (Workflow Compartment)
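The slide gives only table names, so the following is a guess at how Variables, Projects and Proj_Var_Names might fit together: judging by its name alone, Proj_Var_Names could map an internal variable to the name a given project uses for it. All columns and sample values below are hypothetical, and SQLite stands in for MySQL.

```python
import sqlite3

# Hypothetical columns; only the three table names come from the slide.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Variables (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE Projects  (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE Proj_Var_Names (
    project_id       INTEGER NOT NULL REFERENCES Projects(id),
    variable_id      INTEGER NOT NULL REFERENCES Variables(id),
    project_var_name TEXT    NOT NULL,
    PRIMARY KEY (project_id, variable_id)
);
""")
# Placeholder values, not real variable or project names.
conn.execute("INSERT INTO Variables VALUES (1, 'internal_var_1')")
conn.executemany("INSERT INTO Projects VALUES (?, ?)",
                 [(1, "PROJECT_A"), (2, "PROJECT_B")])
conn.executemany("INSERT INTO Proj_Var_Names VALUES (?, ?, ?)",
                 [(1, 1, "name_in_a"), (2, 1, "name_in_b")])


def project_name_for(conn, project, variable):
    """Look up the project-specific name of an internal variable, or None."""
    row = conn.execute(
        """
        SELECT pvn.project_var_name
        FROM Proj_Var_Names AS pvn
        JOIN Projects  AS p ON p.id = pvn.project_id
        JOIN Variables AS v ON v.id = pvn.variable_id
        WHERE p.name = ? AND v.name = ?
        """,
        (project, variable),
    ).fetchone()
    return row[0] if row else None


print(project_name_for(conn, "PROJECT_A", "internal_var_1"))
```

The composite primary key allows at most one project-specific name per (project, variable) pair, which is one plausible constraint rather than a documented one.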
Data Sample from Variables Compartment
• Tables sampled: Proj_Var_Names, Variables, Variable_List_Contents, Variable_Lists, Variable_Bundles
Workflow Compartment (in development)
• Tables shown: GFDL_USERS, Institutions, Experiment_Status, Realization, Projects, Experiments, Scenarios
Data Samples from Workflow Compartment
• Tables sampled: Scenarios, Experiments
Postprocessing Compartment
• Tables shown: Post_Proc, PP_Units, Coupled_Models, Projects, GFDL_USERS, PP_Content, Average_Periods, Variable_Lists
Data Samples from Postprocessing Compartment
• Tables sampled: PP_Content, PP_Units
Data Portal Compartment
• Tables shown: Data_Files, Data_Grids, Variables, MissedData_Descriptors, Experiments, Coupled_Models, Variable_Bundles
Data Samples from Data Portal Compartment
• Tables sampled: Data_Files, MissedData_Descriptors, Data_Grids
“curator” DB is in use now:
• CM2.0
• CM2.1
Future Development
• Bring DB terms in line with conventional terminology.
• Set up model metadata schema standards and create tables in the “curator” DB following this schema.
• Fill these tables with real metadata extracted from GFDL, CCSM and MIT models and from the ESMF Component Database.
• Implement tables for observational data metadata.
• Implement support for DODS aggregated data.
• Build an XML bridge for transcoding DB input/output to and from XML.
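One possible shape for the XML bridge mentioned in the last item is sketched below: a table is dumped to a simple XML document and parsed back into rows. The `<table>`/`<row>` layout, the function names and the sample Scenarios columns are assumptions for illustration, not an agreed exchange schema; SQLite stands in for MySQL.

```python
import sqlite3
import xml.etree.ElementTree as ET


def table_to_xml(conn, table):
    """Serialize every row of an existing table as <table><row>...</row></table>."""
    # The table name is interpolated into SQL, so accept only names that exist.
    known = {r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")}
    if table not in known:
        raise ValueError(f"unknown table: {table}")
    cur = conn.execute(f'SELECT * FROM "{table}"')
    columns = [d[0] for d in cur.description]
    root = ET.Element("table", name=table)
    for row in cur:
        record = ET.SubElement(root, "row")
        for col, val in zip(columns, row):
            # Column names become element tags, so they must be valid XML names.
            ET.SubElement(record, col).text = "" if val is None else str(val)
    return ET.tostring(root, encoding="unicode")


def xml_to_rows(xml_text):
    """Parse the XML produced above back into a list of dicts (all values as text)."""
    root = ET.fromstring(xml_text)
    return [{child.tag: child.text or "" for child in row}
            for row in root.findall("row")]


# Hypothetical sample table for the round trip.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Scenarios (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.executemany("INSERT INTO Scenarios VALUES (?, ?)",
                 [(1, "example_scenario_1"), (2, "example_scenario_2")])

xml_text = table_to_xml(conn, "Scenarios")
rows = xml_to_rows(xml_text)
print(xml_text)
```

Values come back as strings, so a real bridge would also need to carry type information, for instance as attributes or through a separate schema.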
END Questions? Suggestions? Objections? Thanks!