Harmonization of environmental data using the Climate Science Modelling Language
Jon Blower, Alastair Gemmell (Reading e-Science Centre)
Andrew Woolf, Dominic Lowe, Arif Shaon (STFC e-Science Centre)
Stephen Pascoe (British Atmospheric Data Centre)
Keiran Millard, Quillon Harphem (HR Wallingford)
We need to integrate and compare lots of different types of data…
…for validating numerical models…
[Figure panels: HadCM3 (low-res. climate GCM); SSM/I (satellite); ERA-40 (re-analysis product); HiGEM (hi-res climate GCM, new physics). Credit: Putt, Gurney and Haines]
…data assimilation…
[Figure legend: green stars: observations; red line: assimilation run; black line: control run. Horizontal axis: time]
... and making predictions
[Figure panels: search and rescue; climate prediction; flood prediction]
Where we are now (mostly)
Separate websites for each data provider
The need for harmonization
• Each community has evolved its own means for presenting data:
  • File formats
  • Metadata conventions
  • Coordinate systems
• These are not usually mutually compatible
• … and vital metadata can be missing
• No widely-accepted standards exist for certain types of data
• Hence scientists spend lots of time dealing with low-level technical issues
• Need a common view onto all these datasets
Open Geospatial standards
• Aim to describe all geographic data
• XML encoding
  • Geography Markup Language
• Web Services for data exchange
• Rooted in international standards
• Mandated by European INSPIRE directive
• But fiendishly complex
• Evolved from map-oriented systems
  • Vertical and temporal information not handled cleanly
Bridging the gap: CSML
• Climate Science Modelling Language
• Abstract data model defined using ISO/OGC approach
• XML encoding based upon GML
• Adapts open geospatial standards to environmental science data
  • “Best of both worlds”
• Wraps existing data
  • Doesn’t expect providers to convert data
• Data are seen as geographical “features”, not as a file system
Selected CSML Feature Types
Feature Types are classified by their geometry
Harmonizing 2 databases using CSML
• Different data providers, different internal representation
  • Met Office “MIDAS” dataset
  • “Environmental Change Network” dataset
• Modelled both databases as collections of CSML PointSeriesFeatures
• Allowed sharing of plotting and analysis tools
• CSML-XML documents converted to maps, plots and KML
  • Intermediate step via XML not necessary in ideal world
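The idea on this slide, two differently structured sources presented through one feature type so that tools can be shared, can be illustrated with a minimal Java sketch. Everything below is an assumption made for illustration: the `PointSeriesFeature` interface, its methods, the two storage layouts and the sample numbers are hypothetical and are not taken from CSML, Java-CSML, MIDAS or the Environmental Change Network database.

```java
public class PointSeriesDemo {

    /** Hypothetical common view: a series of values over time at one fixed location. */
    interface PointSeriesFeature {
        String stationName();
        double[] times();   // e.g. hours since some reference time
        double[] values();  // one measurement per entry in times()
    }

    /** Adapter for a hypothetical provider that stores rows of {time, value}. */
    static PointSeriesFeature fromRows(String name, double[][] rows) {
        double[] t = new double[rows.length];
        double[] v = new double[rows.length];
        for (int i = 0; i < rows.length; i++) {
            t[i] = rows[i][0];
            v[i] = rows[i][1];
        }
        return feature(name, t, v);
    }

    /** Adapter for a hypothetical provider that stores separate time and value columns. */
    static PointSeriesFeature fromColumns(String name, double[] times, double[] values) {
        if (times.length != values.length) {
            throw new IllegalArgumentException("times and values must have equal length");
        }
        return feature(name, times.clone(), values.clone());
    }

    // Builds an immutable feature; callers receive copies of the arrays.
    private static PointSeriesFeature feature(final String name, final double[] t, final double[] v) {
        return new PointSeriesFeature() {
            @Override public String stationName() { return name; }
            @Override public double[] times() { return t.clone(); }
            @Override public double[] values() { return v.clone(); }
        };
    }

    /** Shared analysis routine, written once against the common interface. */
    static double mean(PointSeriesFeature f) {
        double[] v = f.values();
        if (v.length == 0) {
            return Double.NaN;
        }
        double sum = 0.0;
        for (double x : v) {
            sum += x;
        }
        return sum / v.length;
    }

    public static void main(String[] args) {
        PointSeriesFeature a = fromRows("station-A", new double[][] {{0, 10.0}, {1, 12.0}});
        PointSeriesFeature b = fromColumns("station-B",
                new double[] {0, 1, 2}, new double[] {1.0, 2.0, 3.0});
        for (PointSeriesFeature f : new PointSeriesFeature[] {a, b}) {
            System.out.println(f.stationName() + " mean = " + mean(f));
        }
    }
}
```

The `mean` routine never sees how either provider stores its data, which is the property that lets plotting and analysis code be reused across datasets.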
Java-CSML
• Need reusable libraries to apply CSML more widely
• Aim is to reduce cost of developing data-driven applications
• Interoperates with other means of modelling data in Java:
  • GeoAPI, Common Data Model
• High-level analysis/visualization routines completely decoupled from low-level data access
Java-CSML: Design attempts
• Transform CSML’s XML schema to Java code using automated tool
  • Led to very deeply-nested code
• Based upon OGC-sponsored GeoAPI
  • Incomprehensible unless very familiar with ISO standards
  • GeoAPI is a moving target
• Based on well-known Java concepts
  • Accessible to “typical” Java programmer
  • Compatibility with other data models assured through wrappers
  • Insulated against inevitable changes to standards
  • More code needs to be written by Java-CSML designers
  • Less code needs to be written by users
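The wrapper idea in the last design attempt can be sketched as follows. This is an illustrative assumption, not Java-CSML code: `ExternalGrid` is a made-up stand-in for a third-party data model (it is not a real GeoAPI or Common Data Model class), and `Grid2D`, `ExternalGridWrapper` and `max` are hypothetical names.

```java
public class WrapperDemo {

    /** Stand-in for a third-party data model. NOT a real GeoAPI or CDM class. */
    static final class ExternalGrid {
        private final float[] flat;  // row-major values
        private final int width;     // number of columns

        ExternalGrid(float[] flat, int width) {
            if (width <= 0 || flat.length % width != 0) {
                throw new IllegalArgumentException("array length must be a multiple of width");
            }
            this.flat = flat.clone();
            this.width = width;
        }

        float read(int index) { return flat[index]; }
        int size() { return flat.length; }
        int width() { return width; }
    }

    /** Hypothetical simple interface that application code is written against. */
    interface Grid2D {
        int nx();
        int ny();
        double valueAt(int i, int j);  // i = column index, j = row index
    }

    /** Wrapper: the only class that knows about the external API. */
    static final class ExternalGridWrapper implements Grid2D {
        private final ExternalGrid grid;

        ExternalGridWrapper(ExternalGrid grid) { this.grid = grid; }

        @Override public int nx() { return grid.width(); }
        @Override public int ny() { return grid.size() / grid.width(); }
        @Override public double valueAt(int i, int j) {
            if (i < 0 || i >= nx() || j < 0 || j >= ny()) {
                throw new IndexOutOfBoundsException("(" + i + ", " + j + ")");
            }
            return grid.read(j * nx() + i);
        }
    }

    /** High-level routine that depends only on Grid2D. */
    static double max(Grid2D g) {
        double best = Double.NEGATIVE_INFINITY;
        for (int j = 0; j < g.ny(); j++) {
            for (int i = 0; i < g.nx(); i++) {
                best = Math.max(best, g.valueAt(i, j));
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Grid2D g = new ExternalGridWrapper(
                new ExternalGrid(new float[] {1f, 5f, 3f, 2f, 4f, 0f}, 3));
        System.out.println("nx=" + g.nx() + " ny=" + g.ny() + " max=" + max(g));
    }
}
```

If the external API changes, only the wrapper class needs updating; routines such as `max` are untouched. That is the trade-off the slide describes: more code for the library designers, less for users.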
Java-CSML Application 1: Coastal oceanography decision support system
[Figure legend: red line: Smartbuoy data; blue dots: model output]
Behind the scenes
[Diagram: Smartbuoys (via Web Feature Service), physical model (via NetCDF files) and biological model (via OPeNDAP server) feed through Java-CSML wrappers into Java-CSML, which feeds the plotting routines]
Java-CSML Application 2: Atmospheric ozone
[Figure panels: control run; assimilation run]
Specializing CSML Features
• A generic data model can’t encode all possible metadata without becoming extremely complex
• In CSML generic feature types can be specialized
  • cf. object-oriented inheritance
• Hence core data model retains simplicity
[Diagram: ArgoProfileFeature (with attribute int qualityFlag) specializes ProfileFeature]
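The specialization shown on this slide maps directly onto Java inheritance. In the minimal sketch below, only the names `ProfileFeature`, `ArgoProfileFeature` and `qualityFlag` come from the slide; the fields, constructors, the `deepestValue` routine and the sample numbers are assumptions made for illustration, and the slide does not say what particular flag values mean.

```java
public class SpecializationDemo {

    /** Generic feature: values along a vertical profile. Fields are hypothetical. */
    static class ProfileFeature {
        private final double[] depths;  // metres, positive downwards (assumed convention)
        private final double[] values;  // one value per depth

        ProfileFeature(double[] depths, double[] values) {
            if (depths.length == 0 || depths.length != values.length) {
                throw new IllegalArgumentException("need equal-length, non-empty arrays");
            }
            this.depths = depths.clone();
            this.values = values.clone();
        }

        int size() { return depths.length; }
        double depthAt(int i) { return depths[i]; }
        double valueAt(int i) { return values[i]; }
    }

    /** Specialized feature: adds Argo-specific metadata without touching the generic type. */
    static class ArgoProfileFeature extends ProfileFeature {
        private final int qualityFlag;  // meaning of flag values is not defined here

        ArgoProfileFeature(double[] depths, double[] values, int qualityFlag) {
            super(depths, values);
            this.qualityFlag = qualityFlag;
        }

        int getQualityFlag() { return qualityFlag; }
    }

    /** Generic routine: works for any ProfileFeature, including specializations. */
    static double deepestValue(ProfileFeature p) {
        int deepest = 0;
        for (int i = 1; i < p.size(); i++) {
            if (p.depthAt(i) > p.depthAt(deepest)) {
                deepest = i;
            }
        }
        return p.valueAt(deepest);
    }

    public static void main(String[] args) {
        ArgoProfileFeature argo = new ArgoProfileFeature(
                new double[] {5.0, 100.0, 50.0}, new double[] {20.0, 4.0, 10.0}, 1);
        System.out.println("deepest value = " + deepestValue(argo)
                + ", quality flag = " + argo.getQualityFlag());
    }
}
```

Code that only needs the generic profile keeps working unchanged, while Argo-aware code can read the extra attribute; the core type stays small.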
Java-CSML Application 3: Ocean data assimilation
[Figure legend: red lines: Argo data; blue lines: model output. Feature types: ProfileFeature, ArgoProfileFeature]
Summary
• CSML bridges gap between bottom-up (science) and top-down (GIS) approaches to modelling data
• Wraps existing data holdings
• Data modelled as Feature Types distinguished by geometry and “sensible plotting”
• Complexity managed through feature inheritance
• Doesn’t attempt to model everything!
  • Other technologies deal with discovery, provenance, security…
• Java-CSML framework allows data intercomparison applications to be built quickly
  • Automates tedious and error-prone tasks
Wider lessons
• “Interoperable” data formats not necessarily suitable for storage
  • Because no single data model can satisfy every application
  • Abstraction usually leads to data loss!
• Trade-offs between scope and complexity
  • Don’t attempt to put everything in one specification
• Symbiotic relationship between standards, tools and applications
  • Must be developed in parallel