
DTC Archive: data repositories in the fight against diffuse pollution

Mark Hedges, Richard Gartner (King’s College London); Mike Haft, Hardy Schwamm (Freshwater Biological Association). Open Repositories 2012, Edinburgh, Scotland/UK, 10th July 2012.


Presentation Transcript


  1. DTC Archive: data repositories in the fight against diffuse pollution
     Mark Hedges, Richard Gartner: King’s College London
     Mike Haft, Hardy Schwamm: Freshwater Biological Association
     Open Repositories 2012, Edinburgh, Scotland/UK, 10th July 2012

  2. A message from our sponsors
     • Collaboration between the Freshwater Biological Association and King’s College London (Centre for e-Research)
     • Funded by DEFRA (Department for Environment, Food and Rural Affairs)
       • A UK government ministry
     • Runs from Jan. 2011 – Dec. 2014

  3. Background: water quality and the DTC project

  4. Diffuse Pollution – what is it?
     • Pollution processes that:
       • Individually, have minimal effect
       • Cumulatively, have significant impact
     • Some examples:
       • Run-off of water/rain (e.g. from roads, commercial properties)
       • Farm fertilisers and waste
       • Seepage from developed landscapes

  5. Catchments – what are they?

  6. Water Framework Directive
     • What is an EU Directive?
       • A European Union legal instruction (secondary European legislation) that is binding on all Member States but must be implemented through national legislation within a prescribed time-scale.
     • The Water Framework Directive concerns water quality
     • Freshwater (rivers, lakes, groundwater) is adversely affected by diffuse pollution
     • Failure to comply means problems!

  7. DTC Project
     • DTC = Demonstration Test Catchment
     • Investigate measures for reducing the impact of diffuse water pollution on ecosystems
     • Evaluate the extent to which on-farm mitigation measures can reduce the impact of water pollution on river ecology
       • cost-effectively
       • while maintaining food production capacity

  8. Defra Demonstration Test Catchments (DTCs)
     3 catchment areas in England selected for tests

  9. How does the DTC project work?
     • The procedure is (roughly speaking):
       • Monitor various environmental markers
       • Try out mitigation measures
       • Analyse changes in baseline trends of markers in response to these measures
     • All this produces a great variety of data
     • The DTCs create data; the DTC Archive project has to make it usable and useful!
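
The "analyse changes in baseline trends" step can be illustrated with a minimal sketch. It assumes a single marker sampled at regular intervals and compares the least-squares slope before and after a mitigation measure is introduced. The function names, the readings, and the choice of a simple linear trend are all illustrative assumptions, not the DTC project's actual analysis method.

```python
from statistics import mean


def slope(values):
    """Least-squares slope of values against their index (units per time step)."""
    xs = range(len(values))
    x_bar, y_bar = mean(xs), mean(values)
    numerator = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, values))
    denominator = sum((x - x_bar) ** 2 for x in xs)
    return numerator / denominator


def trend_change(series, mitigation_index):
    """Slope after the mitigation starts minus the baseline slope before it.

    Hypothetical helper: each segment needs at least two readings.
    """
    baseline = series[:mitigation_index]
    after = series[mitigation_index:]
    return slope(after) - slope(baseline)


# Invented readings for illustration only (not DTC data):
# the marker rises during the baseline period, then falls after mitigation.
readings = [5.0, 5.2, 5.4, 5.6, 5.5, 5.4, 5.3, 5.2]
change = trend_change(readings, mitigation_index=4)
print(f"Change in trend: {change:+.2f} units per step")
```

A negative change suggests the marker's trend moved downward after the measure was introduced. A real analysis would also need to account for weather, seasonality, and sensor reliability.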

  10. Equipment for data capture
      [Images: bank-side water-quality monitoring station; drilling a borehole for monitoring groundwater. Images thanks to Wensum DTC]

  11. Bank-side water-quality monitoring station [Image from Wensum DTC]
      Labelled components (LHS and RHS views): mains power, nitrate probe, ammonium analyser, ISCO automatic water sampler, pump, flow cell, YSI multi-parameter sonde, Meteor telemetry unit, Total P and Total reactive P analyser

  12. DTC Archive

  13. Purpose of the archive
      • Curating data generated and captured by DTC projects
      • DTCs create data, we have to make it useful!
      • Data archive, but also querying, browsing, visualising, analysing, other interactions
      • Integrated views across diverse data
      • Must meet the needs of different users: researchers, but also land managers, civil servants, planners, ...

  14. The Data
      • Mostly numerical in some form: spreadsheets, databases, CSV files
        • Sensor data (automated, telemetry)
        • Manual samples/analyses
        • Species/ecological data
        • Geo-data
      • Also less highly structured information:
        • Time series images, video
        • Stakeholder surveys
        • Unstructured documents

  15. Example: water quality data
      61,752 data points per year for all stations

  16. Example: weather station data

  17. Example: Field Use Data

  18. Challenges of data
      • Not primarily an issue of scale
      • Datasets diverse in terms of structure
        • Different degrees of structuring:
          • Highly structured (e.g. sensor outputs)
          • Highly unstructured (e.g. surveys, interviews)
        • Different types of structure (tables of data, geospatial)
      • Some small, hand-crafted data sets
      • Idiosyncratic metadata, description, vocabularies
      • Varying provenance and reliability

  19. INSPIRE
      • Another EU directive
      • An Infrastructure for Spatial Information in the European Community
      • Create a European Spatial Data Infrastructure for improved sharing of spatial information
      • Includes standards for describing, representing, disseminating geo-spatial data, e.g.
        • Gemini2 for catalogue metadata
        • GML (Geography Markup Language)
      • Builds on ISO standards (ISO 19100 series)

  20. Generic Data Model
      ISO 19156: Observations and Measurements
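
As a rough illustration of what an observation looks like in the ISO 19156 model, the sketch below uses the standard's core concepts (feature of interest, observed property, procedure, phenomenon time, result). The class layout, the separate `unit` field, and all sample values are simplifying assumptions made here; the slide does not show how the DTC Archive actually encodes the model.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Observation:
    """Simplified stand-in for an ISO 19156 observation (not the full standard)."""
    feature_of_interest: str   # what was observed, e.g. a river reach or station
    observed_property: str     # which property was measured
    procedure: str             # how it was measured (sensor, lab method, ...)
    phenomenon_time: datetime  # when the result applies in the real world
    result: float              # the measured value
    unit: str                  # assumption: unit kept next to the result


# Invented example values, for illustration only.
obs = Observation(
    feature_of_interest="example-monitoring-station",
    observed_property="nitrate concentration",
    procedure="bank-side nitrate probe",
    phenomenon_time=datetime(2012, 7, 10, 9, 30),
    result=5.2,
    unit="mg/L",
)
print(obs.observed_property, obs.result, obs.unit)
```

Keeping the same five concepts for every dataset is what lets sensor readings, manual samples, and ecological counts share one generic model.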

  21. Multiple Data Representations
      Generic data model implemented in several ways for different purposes:
      • Archival representation
        • based on library/archive standards
      • Data representation for data integration
        • “Atomic” representation as triples
      • Various derived representations
        • Generated for input to specific tools/analysis

  22. Archival Data Representation

  23. Model for Integration
      • RDF triples
      • Atomic statements forming a network of nodes/relations
      • Discrete datasets mapped into a common format
      [Diagram: Subject, predicate, Object, with subjects, predicates and objects identified by URIs. Example: Species memberOf Genus; Species hasCommonName "Water flea" (a literal value)]
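
The triple model on this slide can be sketched with plain tuples. The example below encodes the slide's "Species memberOf Genus" and "hasCommonName Water flea" statements and adds a small pattern-matching helper. The `http://example.org/` URIs and the `match` function are hypothetical, chosen only for illustration; a production system would normally use an RDF library and a triple store.

```python
from typing import List, NamedTuple, Optional


class Triple(NamedTuple):
    subject: str    # URI of the thing being described
    predicate: str  # URI of the relation
    obj: str        # URI of another node, or a literal value


EX = "http://example.org/"  # hypothetical namespace, not the DTC Archive's

triples = [
    Triple(EX + "species/1", EX + "memberOf", EX + "genus/1"),
    Triple(EX + "species/1", EX + "hasCommonName", "Water flea"),
]


def match(store: List[Triple],
          subject: Optional[str] = None,
          predicate: Optional[str] = None,
          obj: Optional[str] = None) -> List[Triple]:
    """Return triples matching the given parts; None acts as a wildcard."""
    return [
        t for t in store
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    ]


# Everything known about the species, then just its common name.
for t in match(triples, subject=EX + "species/1"):
    print(t.predicate, "->", t.obj)
names = match(triples, predicate=EX + "hasCommonName")
```

Because every statement has the same three-part shape, triples from unrelated datasets can be pooled and queried together, which is the point of using them for integration.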

  24. Example dataset
      English Lake District rainfall dataset – from FISH.Link project
      [Diagram labels: Dataset; Site; Tarn; Name; Actor; CollectionMethod; Location (GridReference, Easting, Northing, Latitude, Longitude)]
      • ObservationSet (shown twice) – About: Rainfall; Type: Raw; Unit: Inch
      • ObservationSet (shown twice) – About: Rainfall; Type: Derived; Unit: mm; DependsOn: OS1, OS2; Duration: 1Day
      • Observation (shown four times) – StartDate; EndDate; Value
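
The relationship between raw and derived observation sets on this slide can be sketched as follows. The field names follow the diagram labels (About, Type, Unit, DependsOn, StartDate, EndDate, Value); everything else is an assumption made for illustration, in particular the rule that a derived set simply converts and concatenates the observations of the raw sets it depends on, and the invented dates and values. The only hard fact used is that one inch is exactly 25.4 mm.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List

MM_PER_INCH = 25.4  # exact by definition of the international inch


@dataclass
class Observation:
    start_date: date
    end_date: date
    value: float


@dataclass
class ObservationSet:
    set_id: str
    about: str
    kind: str  # "Raw" or "Derived" (the slide's "Type")
    unit: str
    observations: List[Observation]
    depends_on: List[str] = field(default_factory=list)


def derive_mm(set_id: str, sources: List[ObservationSet]) -> ObservationSet:
    """Build a derived set in mm from raw sets in inches (assumed rule)."""
    for source in sources:
        if source.unit != "Inch":
            raise ValueError(f"{source.set_id} is not in inches")
    converted = [
        Observation(o.start_date, o.end_date, o.value * MM_PER_INCH)
        for source in sources
        for o in source.observations
    ]
    return ObservationSet(set_id, sources[0].about, "Derived", "mm",
                          converted, depends_on=[s.set_id for s in sources])


# Invented daily rainfall values, for illustration only.
os1 = ObservationSet("OS1", "Rainfall", "Raw", "Inch",
                     [Observation(date(1950, 1, 1), date(1950, 1, 1), 0.5)])
os2 = ObservationSet("OS2", "Rainfall", "Raw", "Inch",
                     [Observation(date(1950, 1, 2), date(1950, 1, 2), 1.0)])
os3 = derive_mm("OS3", [os1, os2])
print(os3.depends_on, [round(o.value, 1) for o in os3.observations])
```

Recording `depends_on` alongside the converted values keeps the provenance of derived data visible, so a user can trace a millimetre figure back to the raw inch readings.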

  25. Dataset capture and mapping
      • Columns, concepts, entities mapped to formal vocabularies
      • Mappings defined in archive objects
      • Automated
        • e.g. sensor output files
      • Computer-assisted
        • e.g. some spreadsheets
      • Manual
        • by domain experts
        • e.g. mark up values in texts
      [Image: Spreadsheet transformation workflow – from FISH.Link project]
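
For the automated case, mapping columns to a formal vocabulary might look something like the sketch below. A dictionary plays the role of the mapping definition, and each mapped cell of a CSV file becomes one triple. The vocabulary URIs, column names, row-numbering scheme, and sample data are all hypothetical; the slide does not specify how the archive's mappings are expressed.

```python
import csv
import io

VOCAB = "http://example.org/vocab/"  # hypothetical vocabulary namespace

# Hypothetical mapping definition: source column name -> vocabulary term.
COLUMN_MAP = {
    "site": VOCAB + "siteName",
    "rain_in": VOCAB + "rainfallInches",
}


def rows_to_triples(csv_text, column_map, base_uri):
    """Turn each mapped, non-empty cell into a (subject, predicate, value) triple."""
    triples = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for row_number, row in enumerate(reader, start=1):
        subject = f"{base_uri}row/{row_number}"  # one subject per source row
        for column, predicate in column_map.items():
            value = row.get(column)
            if value not in (None, ""):
                triples.append((subject, predicate, value))
    return triples


# Invented sample; the unmapped "comment" column is ignored.
sample = "site,rain_in,comment\nExample Tarn,0.5,\n"
result = rows_to_triples(sample, COLUMN_MAP, "http://example.org/dataset/1/")
for triple in result:
    print(triple)
```

Keeping the mapping as data rather than code means a domain expert can correct or extend it for a hand-crafted spreadsheet without touching the conversion logic.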

  26. Architectural Overview
      [Diagram labels: Browsing, Visualisation, Search, Analysis; RDF triples; Archive Objects; Source datasets; Mappings (shown twice)]

  27. Current Status and Next Steps
      • Archive project started Jan. 2011, runs until the end of 2014
      • Datasets are already being generated in large quantities
      • Prototype functionality
        • Modelling and ingestion of data (incremental)
      • Next steps:
        • Extend types of dataset covered
        • User interactions (queries, visualisation etc.)

  28. Thank you
      mark.hedges@kcl.ac.uk
      MHaft@fba.org.uk
      http://dtcarchive.org/
