How to join the environmental data revolution
Jon Blower
Reading e-Science Centre
Environmental Systems Science Centre
University of Reading, United Kingdom
Pre-revolutionary times
• Technical difficulties stand in the way of effective data sharing
  • plethora of file formats, coordinate systems, etc.
  • expensive GIS software, plus vendor lock-in
• This inhibits collaboration and effective publication of research results
• In day-to-day collaboration and publication, scientists often share static images or plots of data, not the data themselves
  • very inflexible, and hard to combine with new data
  • if data are provided, they are often in a home-grown file format
What is the environmental data revolution?
• Common standards and simple, cheap or free software are now available for data sharing
• A “revolution” in both senses:
  • it enables a big change in behaviour
  • “power to the people”: you don’t have to be a technical guru
• You can all do this!
eXtensible Markup Language (XML)
• A structure for holding data in plain-text files
• Everything is defined between tags, as in HTML:
    <name>Fred Bloggs</name>
• Tags can be nested and can have attributes:
    <Person gender="male">
      <name>Fred Bloggs</name>
    </Person>
• Can be generated and read by all languages on all platforms
  • and is (just about) human-readable
  • lots of software libraries available
• But it’s verbose: not suitable for large datasets
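To illustrate the point that XML can be written and read by standard libraries, here is a minimal sketch using Python's built-in xml.etree.ElementTree module to build the <Person> example above and parse it back. Nothing beyond the standard library is assumed.

```python
import xml.etree.ElementTree as ET

# Build the <Person> example: a nested element carrying an attribute.
person = ET.Element("Person", gender="male")
name = ET.SubElement(person, "name")
name.text = "Fred Bloggs"

# Serialise the tree to a plain-text XML string.
xml_text = ET.tostring(person, encoding="unicode")
print(xml_text)

# Read it back, as any other XML-aware program could.
parsed = ET.fromstring(xml_text)
print(parsed.get("gender"), parsed.findtext("name"))
```

The same few lines would look much the same in Java, C# or JavaScript, which is what makes XML a convenient exchange format for small amounts of data.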
NetCDF (Network Common Data Form)
• Binary format suitable for lots of different data types
• Stores data and metadata (a description of the data)
  • hence self-contained
• The Climate and Forecast (CF) conventions should be applied to environmental data where possible
  • for correct labelling of axes (latitude, longitude, time, pressure, etc.)
  • for standard naming of variables (sea_water_potential_temperature, etc.)
• Software libraries are available for most major languages
• Use it if you can!
  • Avoid inventing your own binary format
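To show what CF labelling looks like in practice, the sketch below produces CDL, the plain-text notation for NetCDF files that the standard ncgen utility compiles into a binary file. It uses only string formatting, so no NetCDF library is needed to run it. The dataset name "sst", the grid, the temperature values and the "CF-1.0" version string are made-up placeholders. The standard_name and units attributes are the part that CF prescribes.

```python
# Made-up example values: a 2 x 3 grid of sea temperatures at one time step.
lats = [50.0, 51.0]
lons = [-2.0, -1.0, 0.0]
temps = [[285.1, 285.3, 285.6], [284.9, 285.0, 285.2]]  # kelvin

def make_cdl(lats, lons, temps):
    """Return CDL text for a CF-labelled sea temperature field (illustrative only)."""
    lat_values = ", ".join(str(v) for v in lats)
    lon_values = ", ".join(str(v) for v in lons)
    temp_values = ", ".join(str(t) for row in temps for t in row)
    return f"""netcdf sst {{
dimensions:
    time = 1 ;
    lat = {len(lats)} ;
    lon = {len(lons)} ;
variables:
    double time(time) ;
        time:standard_name = "time" ;
        time:units = "days since 2007-01-01 00:00:00" ;
    float lat(lat) ;
        lat:standard_name = "latitude" ;
        lat:units = "degrees_north" ;
    float lon(lon) ;
        lon:standard_name = "longitude" ;
        lon:units = "degrees_east" ;
    float temp(time, lat, lon) ;
        temp:standard_name = "sea_water_potential_temperature" ;
        temp:units = "K" ;

// global attributes:
    :Conventions = "CF-1.0" ;
data:
    time = 0 ;
    lat = {lat_values} ;
    lon = {lon_values} ;
    temp = {temp_values} ;
}}
"""

cdl = make_cdl(lats, lons, temps)
print(cdl)
```

Saved as sst.cdl, the output could be turned into a NetCDF file with `ncgen -o sst.nc sst.cdl`. In a real workflow you would more likely write the file directly through a NetCDF library for your language.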
How easy can it be to share data?
• The simplest way is probably GeoRSS
  • RSS = “Really Simple Syndication”
  • GeoRSS adds geo-referencing information to RSS (georss:point gives latitude, then longitude):
      <rss version="2.0" xmlns:georss="http://www.georss.org/georss">
        <channel>
          <title>Some stuff</title>
          <item>
            <title>My House</title>
            <georss:point>51.7 -1.0</georss:point>
          </item>
        </channel>
      </rss>
• Not very sophisticated, but not a bad start
• Items can be time-stamped
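A GeoRSS feed like the one above can be generated from a simple list of places. The sketch below assumes the GeoRSS-Simple encoding and its namespace URI, http://www.georss.org/georss. The function name make_georss and its (title, lat, lon) input layout are hypothetical choices for illustration.

```python
import xml.etree.ElementTree as ET

GEORSS_NS = "http://www.georss.org/georss"
ET.register_namespace("georss", GEORSS_NS)  # serialise with the "georss:" prefix

def make_georss(channel_title, items):
    """Build an RSS 2.0 feed with GeoRSS points; items are (title, lat, lon) tuples."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = channel_title
    for title, lat, lon in items:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = title
        # GeoRSS-Simple point: latitude first, then longitude, separated by a space.
        ET.SubElement(item, f"{{{GEORSS_NS}}}point").text = f"{lat} {lon}"
    return ET.tostring(rss, encoding="unicode")

feed = make_georss("Some stuff", [("My House", 51.7, -1.0)])
print(feed)
```

Letting the XML library handle the namespace declaration avoids a common mistake: using the georss: prefix without declaring it, which produces a document that strict XML parsers reject.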
Displaying GeoRSS on your website
• Very easy (honest)!
  • Download WorldKit (www.worldkit.org)
  • Unzip it into your webspace
  • Change the config file to point at a GeoRSS file
Keyhole Markup Language (KML)
• The format of Google Earth
• Much richer than GeoRSS: encodes lots of types of geographic data
• Understood by many clients:
  • Google Maps
  • NASA World Wind
  • other GIS software
• Now on the “proper” standards track
• KMZ = zipped KML, for large datasets
  • can package up imagery and icons
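For comparison with GeoRSS, here is a minimal sketch of the same point written as a KML Placemark, assuming the KML 2.2 namespace (http://www.opengis.net/kml/2.2). The helper names k and make_placemark_kml are hypothetical. Note one easy-to-miss difference: KML coordinates are written longitude first.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"
ET.register_namespace("", KML_NS)  # emit KML as the default namespace

def k(tag):
    """Qualify a tag name with the KML namespace."""
    return f"{{{KML_NS}}}{tag}"

def make_placemark_kml(name, lat, lon):
    """Return a KML document containing a single named point."""
    kml = ET.Element(k("kml"))
    placemark = ET.SubElement(kml, k("Placemark"))
    ET.SubElement(placemark, k("name")).text = name
    point = ET.SubElement(placemark, k("Point"))
    # KML order is "lon,lat[,alt]", the reverse of GeoRSS's "lat lon".
    ET.SubElement(point, k("coordinates")).text = f"{lon},{lat}"
    return ET.tostring(kml, encoding="unicode")

kml_text = make_placemark_kml("My House", 51.7, -1.0)
print(kml_text)
```

Saved with a .kml extension, a file like this should open directly in Google Earth.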
KML storm track (Katrina)
• Points styled according to intensity
• Can be animated
• The KML contains just as much information as the original TRACK output file!
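Both features mentioned on this slide correspond to standard KML elements. A per-point Style colours each placemark, and a TimeStamp lets Google Earth's time slider animate the track. This is a sketch only. The track points below are made-up placeholders, not Katrina data, and the intensity thresholds and colours in colour_for are arbitrary choices. KML colours are written as aabbggrr hex (alpha, blue, green, red).

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"
ET.register_namespace("", KML_NS)

def k(tag):
    return f"{{{KML_NS}}}{tag}"

# Placeholder track points: (ISO 8601 time, lat, lon, intensity). NOT real data.
TRACK = [
    ("2000-01-01T00:00:00Z", 25.0, -80.0, 2.1),
    ("2000-01-01T06:00:00Z", 25.5, -82.0, 4.5),
    ("2000-01-01T12:00:00Z", 26.0, -84.0, 7.0),
]

def colour_for(intensity):
    """Map intensity to a KML colour (aabbggrr); thresholds are arbitrary."""
    if intensity >= 6:
        return "ff0000ff"  # opaque red
    if intensity >= 4:
        return "ff00a5ff"  # opaque orange
    return "ff00ffff"      # opaque yellow

def track_to_kml(track):
    kml = ET.Element(k("kml"))
    doc = ET.SubElement(kml, k("Document"))
    for when, lat, lon, intensity in track:
        pm = ET.SubElement(doc, k("Placemark"))
        ET.SubElement(pm, k("name")).text = f"{intensity:.1f}"
        # A TimeStamp lets the time slider show the storm moving.
        ts = ET.SubElement(pm, k("TimeStamp"))
        ET.SubElement(ts, k("when")).text = when
        # An inline Style colours this point by intensity.
        style = ET.SubElement(pm, k("Style"))
        icon_style = ET.SubElement(style, k("IconStyle"))
        ET.SubElement(icon_style, k("color")).text = colour_for(intensity)
        point = ET.SubElement(pm, k("Point"))
        ET.SubElement(point, k("coordinates")).text = f"{lon},{lat}"
    return ET.tostring(kml, encoding="unicode")

track_kml = track_to_kml(TRACK)
print(track_kml)
```

Shared styles referenced by styleUrl would be more compact for long tracks. Inline styles just keep the sketch short.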
3-D model output in KML
• Eruption of Cleveland volcano modelled by PUFF (Alaska Volcano Observatory)
• KML contains an “altitude” attribute, so it can display things in the atmosphere
  • (but it doesn’t contain a “depth” attribute for the ocean!)
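Placing something in the atmosphere comes down to a third coordinate plus an altitudeMode on the geometry. The minimal sketch below places a single point at a height above sea level. The function name particle_placemark and the example position and height are illustrative assumptions, not taken from the PUFF output.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"
ET.register_namespace("", KML_NS)

def k(tag):
    return f"{{{KML_NS}}}{tag}"

def particle_placemark(lat, lon, altitude_m):
    """Return a KML Placemark for a point at altitude_m metres above sea level."""
    pm = ET.Element(k("Placemark"))
    point = ET.SubElement(pm, k("Point"))
    # "absolute" measures altitude from sea level; the default mode,
    # clampToGround, would ignore the third coordinate.
    ET.SubElement(point, k("altitudeMode")).text = "absolute"
    ET.SubElement(point, k("coordinates")).text = f"{lon},{lat},{altitude_m}"
    return pm

# Illustrative values only.
pm = particle_placemark(52.8, -169.9, 6000)
print(ET.tostring(pm, encoding="unicode"))
```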
Web Map Service (WMS)
• Generates map images
• Advertises its capabilities through an XML document
• Clients can request data in a number of formats:
  • e.g. JPEG, GIF, PNG, KML!
  • … and in different map projections, styles, etc.
• The map request is encoded in one big long Web address
  • This address is a unique reference for the image
  • It can be passed around, emailed, etc.
• Designed for overlaying data from different sources
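The “one big long Web address” is a GetMap request whose parameters are encoded in the query string. This sketch builds one using the WMS 1.1.1 parameter names. The server address (example.org) and the layer name "sst" are placeholders. WMS 1.3.0 renames SRS to CRS and, for EPSG:4326, expects the BBOX in latitude/longitude order, a detail that implementations do not always get right. TIME and ELEVATION are the optional t and z dimensions.

```python
from urllib.parse import urlencode

# Placeholder server address, for illustration only.
WMS_BASE = "http://example.org/ncWMS/wms"

def getmap_url(base, layer, bbox, width=512, height=512,
               fmt="image/png", time=None, elevation=None):
    """Build a WMS 1.1.1 GetMap URL; bbox = (minlon, minlat, maxlon, maxlat)."""
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.1.1",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": "",            # empty string = the server's default style
        "SRS": "EPSG:4326",      # plain longitude/latitude
        "BBOX": ",".join(str(v) for v in bbox),
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": fmt,
        "TRANSPARENT": "TRUE",   # lets the image be overlaid on other maps
    }
    if time is not None:
        params["TIME"] = time            # optional t dimension
    if elevation is not None:
        params["ELEVATION"] = elevation  # optional z dimension
    return base + "?" + urlencode(params)

url = getmap_url(WMS_BASE, "sst", (-100, 10, -60, 40), time="2005-08-29T00:00:00Z")
print(url)
```

Because every setting lives in the URL, the same string can be pasted into a browser, emailed, or handed to another WMS client to reproduce exactly the same image.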
WMS in action
• Background imagery from NASA
• Ocean data from ESSC
• Lots of GIS software supports WMS, including WorldKit
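Godiva2
[Diagram: Godiva2, an HTML and JavaScript client delivered from a Web server, obtains metadata (XML) and map images (PNG) from a WMS that sits on top of the data. It could also use images from many other WMSs, each with its own data. WMS = OGC-compliant Web Map Service. The Web server and the WMS could be co-located.]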
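A client of the kind shown in the diagram starts by reading the WMS's XML metadata to discover which layers exist. The sketch below lists named layers from a capabilities document. It assumes the WMS 1.3.0 namespace (http://www.opengis.net/wms). The sample document is a heavily trimmed, hypothetical example, not real ncWMS output.

```python
import xml.etree.ElementTree as ET

WMS_NS = {"wms": "http://www.opengis.net/wms"}  # WMS 1.3.0 namespace

# Heavily trimmed, hypothetical capabilities document.
SAMPLE_CAPABILITIES = """<WMS_Capabilities version="1.3.0" xmlns="http://www.opengis.net/wms">
  <Capability>
    <Layer>
      <Title>Ocean data</Title>
      <Layer queryable="1">
        <Name>sst</Name>
        <Title>Sea surface temperature</Title>
        <Dimension name="time" units="ISO8601">2005-08-01/2005-08-31/P1D</Dimension>
      </Layer>
    </Layer>
  </Capability>
</WMS_Capabilities>"""

def list_layers(capabilities_xml):
    """Return (name, title) pairs for every layer that can be requested by name."""
    root = ET.fromstring(capabilities_xml)
    layers = []
    for layer in root.iter("{http://www.opengis.net/wms}Layer"):
        name = layer.findtext("wms:Name", namespaces=WMS_NS)
        if name:  # layers without a <Name> only group other layers
            layers.append((name, layer.findtext("wms:Title", namespaces=WMS_NS)))
    return layers

print(list_layers(SAMPLE_CAPABILITIES))
```

In practice the XML would come from a GetCapabilities request (REQUEST=GetCapabilities&SERVICE=WMS) to the server, and the layer names found here would go into the LAYERS parameter of subsequent GetMap requests.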
Exposing your data through WMS
• ReSC has developed a WMS for NetCDF data (imaginatively entitled “ncWMS”)
• We’re encouraging other groups to use the software to share their own data
  • See me if you want to share your data in this way
  • You can run your own server or add data to ours
Non-geographic data: SVG
• Time series, depth profiles, etc. do not plot on a map
• Scalable Vector Graphics (SVG)
  • describes the graphic in XML format
  • renders as a graphical plot
  • support is increasing in modern browsers (e.g. Firefox)
• Can be interactive
  • zoom in, pan around
  • show values on the graph
• The range of tools for creating SVG is growing
• The NERC e-Minerals project used this heavily
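Because SVG is just XML, a time series can be turned into a plot with plain string formatting. This sketch draws a bare line plot with no axes. It also attaches a <title> to each data point, which most browsers show as a tooltip: a simple way to “show values on the graph”. The function name, the default sizes and the sample values are illustrative assumptions.

```python
def timeseries_to_svg(values, width=300, height=100):
    """Render a list of numbers as a bare SVG line plot (no axes or labels)."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid division by zero for a flat series
    step = width / (len(values) - 1) if len(values) > 1 else 0.0
    # SVG's y axis points downwards, so larger values get smaller y coordinates.
    coords = [(i * step, height - (v - lo) / span * height)
              for i, v in enumerate(values)]
    points = " ".join(f"{x:.1f},{y:.1f}" for x, y in coords)
    # Each marker carries a <title>, shown as a tooltip in most browsers.
    markers = "".join(
        f'<circle cx="{x:.1f}" cy="{y:.1f}" r="3"><title>{v}</title></circle>'
        for (x, y), v in zip(coords, values)
    )
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}">'
        f'<polyline points="{points}" fill="none" stroke="blue"/>'
        f"{markers}</svg>"
    )

# Illustrative values only.
svg = timeseries_to_svg([0, 5, 10], width=200, height=100)
print(svg)
```

Saved as a .svg file, the output can be opened directly in an SVG-capable browser. The drawing scales without pixelation when zoomed because it is vector-based.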
The next level
• So far we’ve focussed on quick and easy ways to get data “out there” in a reusable way
• With more effort, we can do even better
• Geography Markup Language (GML)
  • complex, but can create complete descriptions of very complex features, e.g. weather fronts
• Climate Science Modelling Language (CSML)
  • produced by the NERC Data Grid project
  • describes features of interest in meteorology and oceanography
• Web Feature Service (WFS)
  • serves features (simple and complex) in GML
  • better representation of semantics
• Web Coverage Service (WCS)
  • serves data (usually gridded)
  • we have developed a WCS in the DEWS project
  • the standard is not yet quite adequate for our needs (it doesn’t cope well with large data)
• It is currently difficult for scientists to expose their data in these ways without technical help
  • but this will get easier with better software tools and wider adoption
Summary so far
• XML is a general data format
  • but verbose: not suitable for large datasets
• GeoRSS and KML are flavours of XML for geographic data
  • both are easy to create and visualize
  • KML is “easy GML”, designed for simple end-user use
• For larger datasets use NetCDF
  • CF-compliant if possible
• SVG for non-geographic data
• Web Map Service + Godiva2 are good for creating visualizations of gridded data
  • we have developed a WMS you can use
• All the above are easy to use, but it is important to remember their limitations
  • the “next level” requires more technical help
What is a mashup?
• Taking two or more “things” that were independently produced and putting them together so that the resulting whole is greater than the sum of the parts
• Requires adherence to common standards
• E.g. an online map of apartments for rent, merged with local crime statistics
• Also applies to music…
  • with “hilarious” consequences
Mashup 1: Ocean science
[Diagram: DAMOCLES (Arctic ice) data, ARGO float data and DRAKKAR model data (NEMO code) are brought together in Google Earth, via NetCDF files, a Java program, a WMS, and KML/KMZ files.]
• Useful for checking model results against assimilated observations to look for anomalies
Mashup 2: Hurricane Katrina
[Diagram: SST data from the Met Office FOAM model (NetCDF), vorticity data from the ECMWF reanalysis (NetCDF) and output from the TRACK program (plain text) are brought together in Google Earth as KMZ files, via WMSs and a Python script.]
• Can check the positioning of storm tracks and view the effect of storms on the ocean (e.g. cooling of the sea surface as Katrina passes)
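The “Python script” step in this mashup converts plain-text track output into KMZ. The script itself is not shown here, so the following is only a sketch of the idea. It assumes a HYPOTHETICAL input layout of one “time lon lat intensity” record per line, which is not the real TRACK file format. It then zips the KML into KMZ, where the main file is conventionally named doc.kml. The sample records are placeholders.

```python
import io
import zipfile

def track_text_to_kml(text):
    """Convert lines of 'time lon lat intensity' (HYPOTHETICAL layout) to KML."""
    placemarks = []
    for line in text.strip().splitlines():
        when, lon, lat, intensity = line.split()
        placemarks.append(
            f"<Placemark><name>{intensity}</name>"
            f"<TimeStamp><when>{when}</when></TimeStamp>"
            f"<Point><coordinates>{lon},{lat}</coordinates></Point></Placemark>"
        )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<kml xmlns="http://www.opengis.net/kml/2.2"><Document>'
        + "".join(placemarks)
        + "</Document></kml>"
    )

def make_kmz(kml_text):
    """Zip KML text into KMZ bytes, storing it as doc.kml at the archive root."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as kmz:
        kmz.writestr("doc.kml", kml_text)
    return buffer.getvalue()

# Placeholder records in the hypothetical layout; NOT real TRACK output.
sample = """2000-01-01T00:00:00Z -80.0 25.0 2.1
2000-01-01T06:00:00Z -81.0 25.3 3.4"""

kml_text = track_text_to_kml(sample)
kmz_bytes = make_kmz(kml_text)
print(len(kmz_bytes), "bytes of KMZ")
```

Writing kmz_bytes to a file ending in .kmz gives something Google Earth can open. Images or icons could be added to the same archive with further writestr calls.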
Summary of key technologies
• Easy-to-use, cheap or free client software: Google Earth, NASA World Wind, WorldKit
• Standard Web Services: Web Map Service (WMS), WCS, WFS
• XML languages for geographic data: GML, GeoRSS/Atom, KML
• General-purpose data formats, platform- and language-independent: NetCDF, XML
• The World Wide Web: a common communication protocol and a means to uniquely identify resources
• The Internet: common infrastructure
• Not an exhaustive list!
Limitations
• Mostly concerned with simple visualization
  • data analysis is a harder problem
• Specs are all very well, but not everyone implements them properly!
  • especially true for the WMS z (elevation) and t (time) dimensions
• Specs often focus on 2-D (the land surface)
  • Google Earth doesn’t show proper ocean bathymetry
• Chicken-and-egg situation
  • scientists won’t learn these new technologies until there is enough data out there, but getting data out there needs scientists to make it available
The Future
• Full semantic interoperability
  • i.e. the ability to search and compare data based on its meaning
  • data described fully, together with the operations that are possible on the data
• Ability to perform data analysis operations
  • e.g. calculate the difference of two remote datasets efficiently
• NERC Data Grid
  • discovery, visualization and download of NERC data
  • focus on atmosphere/ocean for the moment
  • strictly standards-based
• INSPIRE
  • European Spatial Data Infrastructure
Plug for event!
• 2-day workshop on Google Earth and other Internet mapping tools
• Focus on scientific applications for geobrowsers and geo-websites
• 2-3 April 2007, Cambridge, UK
• Registration closes Feb 25th (ish)
• Presentations and coding tutorials
• Please email resc@rdg.ac.uk or see me if you are interested
Conclusions
• Please expose your data and results (not just static plots): it’s easy!
  • GeoRSS/Atom
  • KML (recommended)
  • Web Map Service
  • Scalable Vector Graphics
  • (WCS and WFS for more advanced use)
• For larger datasets (even intermediate results):
  • CF-compliant NetCDF
• Experiment!
• We’re happy to help