Sharing Variables Between Models in a Plug-and-Play Community Modeling System Like CSDMS: The Semantic Mediation Problem and the CSDMS Standard Names. Scott D. Peckham, University of Colorado, and Former Chief Software Architect for CSDMS February 23, 2015.
RDA Metadata and Semantics Workshop, Feb. 23-25, Indianapolis, IN
CSDMS: 1238 Members, 5 Working Groups, 6 Focus Research Groups
Three Major “Semantic Use Cases”
• Search & Discovery – Semantic Similarity
  What terms are similar?
  Thesauri and synonyms; casting a big net
• Semantic Mediation (matching) – Semantic Equivalence
  What terms mean the same thing?
  Connecting resources (e.g. model, data); automation
  Data is passed from a provider to a user
  Different labels for the same thing; synonyms
  Need unambiguous, human- and machine-readable labels
• Knowledge Representation – Semantic Meaning
  How are terms related to other terms?
  Classification, class hierarchies, assertions
  Needed for machine reasoning
Performance: Fast string comparison vs. “tree search”; supporting metadata
Semantic Matching for Model Variables
• Hydro Model A, output variables: streamflow, rainrate
• CSDMS Standard Names: channel_exit_water_x-section__volume_flow_rate, atmosphere_water__rainfall_volume_flux
• Hydro Model B, input variables: discharge, precip_rate
Goal: Remove ambiguity so that the framework can automatically match outputs to inputs.
Motivation for Standard Names
• Most models require input variables and produce output variables. In a component-based modeling framework like CSDMS, a set of components becomes a complete model when every component is able to obtain the input variables it needs from another component in the set.
• Ideally, we want a modeling framework to automatically:
  • Determine if a set of components provides a complete model.
  • Connect each component that requires a certain input variable to another component in the set that provides that variable as output.
• This kind of automation requires a matching mechanism for determining whether, and the degree to which, two variable names refer to the same quantity, and whether they use the same units and are defined or measured in the same way.
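The matching step described above can be sketched in a few lines of Python. This is only an illustration, not the CSDMS framework's code: the function name match_variables and the two dictionaries are hypothetical, and the variable names are borrowed from the Hydro Model A / Hydro Model B slide. It compares names by exact string equality and does not check units or measurement conventions.

```python
# Hypothetical component descriptions: internal variable name -> CSDMS Standard Name.
MODEL_A_OUTPUTS = {
    "streamflow": "channel_exit_water_x-section__volume_flow_rate",
    "rainrate": "atmosphere_water__rainfall_volume_flux",
}
MODEL_B_INPUTS = {
    "discharge": "channel_exit_water_x-section__volume_flow_rate",
    "precip_rate": "atmosphere_water__rainfall_volume_flux",
}


def match_variables(provider_outputs, user_inputs):
    """Pair each input of the user with the provider output that has the same standard name.

    Returns (connections, unmet): connections maps the user's internal input
    name to the provider's internal output name; unmet lists inputs that no
    output satisfies.
    """
    # Index the provider's outputs by standard name for fast string lookup.
    by_standard_name = {std: internal for internal, std in provider_outputs.items()}
    connections = {}
    unmet = []
    for internal, std in user_inputs.items():
        if std in by_standard_name:
            connections[internal] = by_standard_name[std]
        else:
            unmet.append(internal)
    return connections, unmet


connections, unmet = match_variables(MODEL_A_OUTPUTS, MODEL_B_INPUTS)
is_complete = not unmet  # complete when every input has a provider
print(connections)
print("complete model:", is_complete)
```

Because both models map their own labels to the same standard names, "discharge" is connected to "streamflow" without either model knowing the other's internal names.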
Types of Quantities We Need
• Associated with processes: snow__melt_volume_flux, atmosphere_water__rainfall_volume_flux
• Generated from mathematical operations: bedrock_surface__time_derivative_of_elevation, sea_water__north_component_of_velocity
• Dimensionless numbers: channel_water_flow__froude_number
• Mathematical and physical constants: earth__standard_gravity_constant (“little g”), physics__universal_gravitational_constant (“big G”)
• Empirical parameters: glacier_glen-law__exponent
• Flow rates and fluxes (incoming or outgoing): lake_water~incoming__volume_flow_rate
• Reference quantities: atmosphere_air_flow__reference-height_speed
Ambiguous Physical Quantities
• Albedo – black-sky, blue-sky, bond, geometric, visual-geometric, white-sky
• Compressibility – isentropic, isothermal
• Concentration – mass, mole, number, volume
• Density – bits-per-area, bulk_mass-per-volume, charge-per-area, energy-per-area, energy-per-volume, length-per-area, mass-per-area, mass-per-volume, number-per-area, number-per-volume, particle_mass-per-volume, torque-per-volume
• Flow rate – mass, momentum, energy, volume, mole
• Flux – mass, momentum, energy, volume, mole
• Hardness – indentation, rebound, scratch
• Heat capacity – mass-specific, volume-specific, isobaric, isochoric
• Latitude – authalic, conformal, geocentric, geodetic, isometric, rectifying, reduced
• Precipitation rate – mass flux, volume flux, liquid-equivalent, rainfall, snowfall, icefall
• Pressure – dynamic, osmotic, partial, radiation, stagnation, static, total, vapor
• Temperature – boiling-point, bubble-point, convective, dew-point, effective, equivalent, freezing-point, frost-point, melting-point, potential
• Viscosity – apparent, dynamic, eddy, extensional, kinematic, shear, volume
• Vorticity – absolute, ertel, planetary, potential, relative
Reconciling Controlled Vocabularies: Problem of Low vs. High Expressiveness
Low Expressiveness: CUAHSI VariableName CV (hydrologic point measurements)
• Abundance
• Acetic acid (amount in water is implied)
• Albedo
• Baseflow (volume flow rate or mass flow rate?)
• Carbon dioxide flux (what kind of flux?)
• Orientation (an azimuth angle, but how measured?)
• Streamflow
• Temperature
High Expressiveness: CSDMS Standard Names
Reconciling Differences with Standards
• Pairwise: If we reconcile differences between the N resources in a pairwise manner, the amount of work, etc. grows fast: Cost(N) = N (N-1) / 2 ~ N^2.
• Hub: If we instead introduce a new, generic or standard representation (the “hub”) and map each resource to and from it, the amount of work, maintenance, etc. drops to: Cost(N) = N.
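To make the two cost formulas concrete, here is a short Python sketch that evaluates both for a few values of N. The function names pairwise_cost and hub_cost are illustrative only; the cost is counted as the number of mappings to build and maintain, as in the formulas above.

```python
def pairwise_cost(n):
    """Mappings needed when every pair of resources is reconciled directly: N(N-1)/2."""
    return n * (n - 1) // 2


def hub_cost(n):
    """Mappings needed when each resource is mapped once to a shared standard: N."""
    return n


for n in (2, 5, 10, 100):
    print(f"N={n:>3}  pairwise={pairwise_cost(n):>5}  hub={hub_cost(n):>3}")
```

For 10 resources the pairwise approach needs 45 mappings against 10 for the hub, and the gap widens quadratically as resources are added.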
The CSDMS Standard Names
Data models like RDF and EAV use triples like:
• Subject + Predicate + Object
• Entity/Object + Attribute + Value (object-oriented)
CSDMS Standard Names use a similar pattern for creating unambiguous and easily understood standard variable names or “preferred labels” according to a set of rules. These are then used to retrieve values and metadata. The pattern is:
Object name + [Operation name] + Quantity name
Examples:
• atmosphere_carbon-dioxide__partial_pressure
• atmosphere_water__precipitation_leq-volume_flow_rate
• earth_ellipsoid__equatorial_radius
• soil__saturated_hydraulic_conductivity
We have also started building a set of standard Attribute and Process Names.
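The Object name + [Operation name] + Quantity name pattern can be taken apart mechanically. The Python sketch below is a simplified reading of the pattern, not an official CSDMS parser: the function name split_standard_name is hypothetical, and it assumes that any "_of_" after the double underscore belongs to an operation name, which would misparse a quantity name that itself contains "of".

```python
def split_standard_name(name):
    """Split a standard name into (object, operation, quantity).

    operation is None when the name has no operation part.
    """
    object_part, sep, rest = name.partition("__")
    if not sep:
        raise ValueError(f"no double underscore in {name!r}")
    # Assumption: everything up to the last "_of_" is the (possibly chained) operation.
    head, of_sep, quantity = rest.rpartition("_of_")
    operation = head + "_of" if of_sep else None
    return object_part, operation, quantity


for example in (
    "atmosphere_carbon-dioxide__partial_pressure",
    "earth_ellipsoid__equatorial_radius",
    "bedrock_surface__time_derivative_of_elevation",
):
    print(split_standard_name(example))
```

Names without an operation, such as earth_ellipsoid__equatorial_radius, simply return None in the middle position.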
Five Delimiters in CSDMS Standard Names
• Double underscore – separates the object and quantity parts
• Single underscore – separates distinct words
• Hyphen – binds words into a single object, e.g. carbon-dioxide
• Tilde – separates adjectives from the noun in object names
• The word “of” – at the end of every operation name
Examples:
• sea_water_phosphorous~dissolved~inorganic__time_derivative_of_mole_concentration
• atmosphere_air_flow__elevation_angle_of_gradient_of_potential_vorticity
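As a rough illustration of how the underscore, hyphen and tilde delimiters work together inside an object name, the Python sketch below breaks the object part into words and attaches tilde-separated adjectives to their noun. The helper name parse_object_name is hypothetical, and the sketch follows only the delimiter rules listed above, not the full CSDMS rule set.

```python
def parse_object_name(standard_name):
    """Return a list of (noun, adjectives) pairs for the object part of a name."""
    # The double underscore ends the object part.
    object_part = standard_name.split("__", 1)[0]
    parsed = []
    # Single underscores separate distinct words; hyphenated words stay intact.
    for word in object_part.split("_"):
        # Tildes attach adjectives to the noun that precedes them.
        noun, *adjectives = word.split("~")
        parsed.append((noun, adjectives))
    return parsed


name = "sea_water_phosphorous~dissolved~inorganic__time_derivative_of_mole_concentration"
for noun, adjectives in parse_object_name(name):
    print(noun, adjectives)
```

Here "phosphorous" carries the adjectives "dissolved" and "inorganic", while a hyphenated word such as carbon-dioxide would come back as one noun.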
CSDMS Standard Names for Projectiles Go to CSDMS Standard Names Examples page.
The CSDMS Standard Names
The CSDMS Standard Names can be viewed as a lingua franca that provides a bridge for mapping variable names between models. They play an important role in the Basic Model Interface (BMI). Model developers are asked to provide a BMI interface that includes a mapping of their model's internal variable names to CSDMS Standard Names and a Model Metadata File that provides model assumptions and other information.
IMPORTANT: Model developers continue to use whatever variable names they want to in their code, but then "map" each of their internal variable names to the appropriate CSDMS standard name in their BMI implementation.
• Main Page: csdms.colorado.edu/wiki/CSDMS_Standard_Names
• Basic Rules: csdms.colorado.edu/wiki/CSN_Basic_Rules
• Object Names: csdms.colorado.edu/wiki/CSN_Object_Templates
• Operation Names: csdms.colorado.edu/wiki/CSN_Operation_Templates
• Quantity Names: csdms.colorado.edu/wiki/CSN_Quantity_Templates
• Process Names: csdms.colorado.edu/wiki/CSN_Process_Names
• Assumption Names: csdms.colorado.edu/wiki/CSN_Assumption_Names
• Metadata Names: csdms.colorado.edu/wiki/CSN_Metadata_Names
• Model Metadata Files: csdms.colorado.edu/wiki/CSN_MMF_Example
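The idea that a model keeps its own variable names and exposes them under standard names through its interface might look roughly like the Python sketch below. This is not the BMI specification: the class HydroModelA, its methods output_standard_names and value_of, and the numeric values are all made up for illustration; only the two standard names come from the earlier matching slide.

```python
class HydroModelA:
    """Toy model that keeps its own internal variable names."""

    # Internal variable name -> CSDMS Standard Name.
    _STANDARD_NAMES = {
        "streamflow": "channel_exit_water_x-section__volume_flow_rate",
        "rainrate": "atmosphere_water__rainfall_volume_flux",
    }

    def __init__(self):
        # Internal state uses the developer's preferred names (made-up values).
        self.streamflow = 12.5
        self.rainrate = 0.002
        # Reverse lookup: standard name -> internal name.
        self._internal = {std: name for name, std in self._STANDARD_NAMES.items()}

    def output_standard_names(self):
        """Standard names of the variables this model can provide."""
        return sorted(self._STANDARD_NAMES.values())

    def value_of(self, standard_name):
        """Look up a value by standard name; callers never see internal names."""
        return getattr(self, self._internal[standard_name])


model = HydroModelA()
print(model.output_standard_names())
print(model.value_of("channel_exit_water_x-section__volume_flow_rate"))
```

The model's code still reads and writes self.streamflow; only the thin mapping layer knows the standard name, which is what lets a framework request values without model-specific knowledge.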
For More Information
• Peckham, S.D., E.W.H. Hutton and B. Norris (2013) A component-based approach to integrated modeling in the geosciences: The Design of CSDMS, Computers & Geosciences, special issue: Modeling for Environmental Change, 53, 3-12.
• Peckham, S.D. (2014) The CSDMS Standard Names: Cross-domain naming conventions for describing process models, data sets and their associated variables, Proceedings of the 7th Intl. Congress on Env. Modelling and Software, International Environmental Modelling and Software Society (iEMSs), San Diego, CA. (Eds. D.P. Ames, N.W.T. Quinn, A.E. Rizzoli).
• Peckham, S.D. (2014) EMELI 1.0: An experimental smart modeling framework for automatic coupling of self-describing models, Proceedings of HIC 2014, 11th International Conf. on Hydroinformatics, New York, NY.
• Basic Model Interface (BMI): http://csdms.colorado.edu/wiki/BMI_Description and https://github.com/csdms/bmi/blob/master/bmi.sidl
• CSDMS Standard Names: http://csdms.colorado.edu/wiki/CSDMS_Standard_Names