Organising data flows and modelling for the Essential Biodiversity Variables. Hannu Saarenmaa – University of Eastern Finland. GEO BON, WG8 – Data Integration and Interoperability. EU BON, WP2 – Data Integration and Interoperability. BioVeL, WP2 – Workflows for Scientific Research.
Organising data flows and modelling for the Essential Biodiversity Variables
Hannu Saarenmaa – University of Eastern Finland
GEO BON, WG8 – Data Integration and Interoperability
EU BON, WP2 – Data Integration and Interoperability
BioVeL, WP2 – Workflows for Scientific Research
GEO-X Plenary, Geneva, 14 January 2014
Essential Biodiversity Variables
• Conceived by GEO BON collaborators (Pereira et al. (2013) “Essential Biodiversity Variables”, Science, Vol. 339, 18 Jan 2013).
• EBVs facilitate data integration by providing an intermediate abstraction layer between primary observations and indicators.
• Computed from a large number of inputs (monitoring/incidental data).
• EBVs aim to help observation communities harmonise monitoring, by identifying how variables should be sampled and measured.
• EBVs standardise an ontology for biodiversity and harmonise measurements, observations, and protocols.
• Endorsed by the Convention on Biological Diversity (CBD) and in line with the 2020 Aichi Targets.
• Provide focus for GEO BON and hence for the interoperability thrust within GEO BON.
• A use case that GEO BON, EU BON and BioVeL focus on.
Where does the data come from?
• In Europe there are about 2000 biodiversity observation networks (only 643 listed by EUMON).
• GBIF has 10,000 data sets, openly accessible, conforming to GEOSS Data Sharing Principles.
• LTER/DataONE has thousands of biodiversity data sets.
• EU BON is carrying out a gap analysis:
  • There is a massive duplication of effort in data management, and a lack of data sharing.
  • There are very few data sets whose “quality” (coverage, accuracy, etc.) has been documented and guaranteed.
  • The so-called “data core” in biodiversity has not yet been defined.
Biodiversity Virtual e-Laboratory
BioVeL processing services and workflows
• “Workflows” (series of data analysis steps) make it possible to process vast amounts of data.
• Build your own workflow: select and apply successive “services” (data processing techniques).
• Import data from one’s own research and/or from existing libraries (e.g. GBIF, Catalogue of Life).
• Access a library of workflows and re-use existing workflows.
• Cut down research time and overhead expenses.
Figure: Part of a workflow to study the ecological niche of the horseshoe crab
Aim: Predictive modelling of biodiversity change
The analytical cycle
Cycle stages (figure): Data discovery; Data assembly, cleaning, and refinement; Ecological Niche Modelling; Statistical analysis
Available tools from a growing family of ENM workflows, released to the public at www.biovel.eu
• Data Refinement Workflow (DRW) for pre-processing
  • Taxonomic name resolution / occurrence retrieval
  • Geo-temporal data selection using ‘BioSTIF’
  • Data quality checks / filtering using ‘Google Refine’
• Ecological Niche Modelling Workflow (ENM)
  • Classic ENM with 15 algorithms
  • Separate BioClim workflow (requires special inputs)
• ENM Statistical Workflow (ESW) for post-processing
  • DIFF: Extent and intensity of change
  • STACK: Extent, intensity, and a cumulated potential
  • SHIFT: Shift of the centre of gravity (direction, and length in kilometres)
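The three ESW statistics can be illustrated on small suitability grids. The sketch below is a minimal pure-Python illustration and not the BioVeL implementation: the function names, the row/column grid layout (row 0 assumed to be the northern edge), and the cell size in kilometres are all assumptions made for the example.

```python
import math
from typing import List, Tuple

Grid = List[List[float]]


def diff_grid(present: Grid, future: Grid) -> Grid:
    """DIFF: cell-by-cell change in suitability (future minus present)."""
    return [[f - p for p, f in zip(prow, frow)]
            for prow, frow in zip(present, future)]


def stack_grids(grids: List[Grid]) -> Grid:
    """STACK: cell-by-cell sum over several grids (cumulated potential)."""
    rows, cols = len(grids[0]), len(grids[0][0])
    return [[sum(g[r][c] for g in grids) for c in range(cols)]
            for r in range(rows)]


def centre_of_gravity(grid: Grid) -> Tuple[float, float]:
    """Suitability-weighted mean (row, column) position of a grid."""
    total = sum(sum(row) for row in grid)
    if total == 0:
        raise ValueError("grid has no suitable cells")
    r = sum(i * v for i, row in enumerate(grid) for v in row) / total
    c = sum(j * v for row in grid for j, v in enumerate(row)) / total
    return r, c


def centroid_shift(present: Grid, future: Grid,
                   cell_km: float = 1.0) -> Tuple[float, float]:
    """SHIFT: length (km) and compass bearing (degrees) of the centroid move."""
    r0, c0 = centre_of_gravity(present)
    r1, c1 = centre_of_gravity(future)
    north = (r0 - r1) * cell_km  # a smaller row index is assumed to be further north
    east = (c1 - c0) * cell_km
    length = math.hypot(north, east)
    bearing = math.degrees(math.atan2(east, north)) % 360
    return length, bearing


# Toy 2x2 grids (made-up values): suitability moves from the southern row to the northern row.
present = [[0.0, 0.0], [1.0, 1.0]]
future = [[1.0, 1.0], [0.0, 0.0]]
change = diff_grid(present, future)
cumulated = stack_grids([present, future])
length_km, bearing_deg = centroid_shift(present, future, cell_km=10.0)
```

In this toy case the centre of gravity moves one cell northwards, so with 10 km cells the shift is 10 km at a bearing of 0 degrees.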
Seamless exchange of data layers
http://openmodeller.cria.org.br/
Use case: The spruce bark beetle, Ips typographus, disturbance of forest ecosystems
Map panels (figure): Pre 2002; Year 2050; Difference
• Statistical processing of the difference in Finland indicates that the susceptibility of spruce forests to Ips typographus damage will increase five-fold by 2050.
• Policy advice: Stricter forest hygiene through tougher legislation, so that Ips populations are kept at a minimum, because of the increased risk.
• Papers for Silva Fennica and for the INTECOL session proceedings in Journal of Ecology.
Outline of the use case
• Running the Ecological Niche Modelling (ENM) workflow for a large number of species
  • Process data points for hundreds of species (e.g. plants, butterflies, …)
  • Use data mostly from GBIF, but also from elsewhere
  • Each individual species may have on the order of 10^5 data points
  • Run openModeller-based ENM for all the data points
  • Choose predictive layers from WorldClim and GEOSS sources
• Generate summary statistics that can answer questions such as:
  • How many species are increasing? How many are decreasing? Does the flora/fauna move in any direction? Is the distribution fragmenting? Is the distribution shrinking? How many populations are becoming marginalised?
• Prototype automatic data processing for computing the Essential Biodiversity Variables (EBVs)
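The first two summary questions (how many species are increasing, how many are decreasing) can be sketched as a simple tally over per-species model outputs. This is a minimal illustration under stated assumptions, not the project's tool: it assumes each ENM run has already been reduced to a predicted suitable area for the present and for a future scenario, and the function names, the 5% tolerance, and the example values are hypothetical.

```python
from collections import Counter


def classify_trend(area_now, area_future, tolerance=0.05):
    """Label a species by the relative change in its predicted suitable area."""
    if area_now <= 0:
        # No present-day area: any future area counts as an increase.
        return "increasing" if area_future > 0 else "stable"
    change = (area_future - area_now) / area_now
    if change > tolerance:
        return "increasing"
    if change < -tolerance:
        return "decreasing"
    return "stable"


def summarise(areas, tolerance=0.05):
    """Count species per trend class; `areas` maps name -> (area_now, area_future)."""
    return Counter(classify_trend(now, future, tolerance)
                   for now, future in areas.values())


# Made-up areas (km^2) for three placeholder species, for illustration only.
example_areas = {
    "species A": (1000.0, 1500.0),
    "species B": (800.0, 400.0),
    "species C": (500.0, 510.0),
}
trend_counts = summarise(example_areas)
```

With these made-up numbers the tally is one increasing, one decreasing, and one stable species. Questions about direction of movement or fragmentation would need spatial statistics on the output maps rather than a single area figure.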
Status of the current BioVeL ENM workflow
• Current openModeller-based ENM workflows work at a smaller scale, with a focus on one or a few selected species.
• The current workflow requires frequent interaction with the user (many clicks if we simply multiply runs).
• We need a system that is scalable and automated to run ENM for hundreds of species.
• We need a system that can perform a summary analysis across all the species based on the individual ENM runs.
• The 2nd generation BioVeL portal will provide the required capabilities.
  • To be released publicly in January 2014 (currently in beta mode).
Envisaged application structure
ENM parameter sets for species
• Multiple species may use the same ENM parameter set (e.g. Mediterranean dryland plants).
• Parameter sets are generated and tested with another workflow (see next slide).
Selected species: EUMON query, LTER query, GBIF query, ...
• Some species may need other offline data, or private data (uploaded from the user side).
ENM workflow (one per species)
• One ENM workflow predicts the impact of environmental changes on the distribution of one species.
ENM output file (one per workflow)
• The portal offers the files for download.
Summary analysis
• Performed with an R-based custom tool outside the portal.
• EBV production by combining data from different models.
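The fan-out structure on this slide (shared parameter sets, one ENM run per species, one output file per run) can be sketched as follows. This is only an illustration of the structure, not the portal's implementation: `run_enm` is a hypothetical stand-in for a real workflow submission, and the group names, parameter keys, and file naming are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor


def run_enm(species, params):
    """Hypothetical stand-in for one ENM workflow run; returns an output file name."""
    # A real run would send occurrence data and environmental layers to a modelling service.
    return f"{species.replace(' ', '_')}_{params['algorithm']}.out"


def run_batch(species_groups, parameter_sets, max_workers=4):
    """Run one ENM job per species, reusing the parameter set of its group."""
    jobs = [(species, parameter_sets[group])
            for group, members in species_groups.items()
            for species in members]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map keeps results in the same order as the submitted jobs.
        outputs = list(pool.map(lambda job: run_enm(*job), jobs))
    return dict(zip((species for species, _ in jobs), outputs))


# Placeholder groups and parameter sets, for illustration only.
species_groups = {
    "group 1": ["species A", "species B"],
    "group 2": ["species C"],
}
parameter_sets = {
    "group 1": {"algorithm": "algorithm_x"},
    "group 2": {"algorithm": "algorithm_y"},
}
output_files = run_batch(species_groups, parameter_sets)
```

The resulting mapping from species to output file is what a downstream summary analysis would consume.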
ENM parameter optimisation workflow
Parameter matrix
• Possible parameter combinations.
Selected species
Parameter test and selection job (repeated, one box per job in the diagram)
ENM parameter sets for species
• The optimal parameter input for the large ENM workflow (see previous slide).
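A parameter matrix and a test-and-selection job can be sketched as an exhaustive sweep over all combinations. This is a minimal illustration under stated assumptions: the parameter names and the toy scoring function are hypothetical, and in practice the score would come from evaluating a fitted model (for example on held-out occurrence data).

```python
from itertools import product


def parameter_matrix(options):
    """Expand {name: [values]} into a list of every parameter combination."""
    names = sorted(options)
    return [dict(zip(names, values))
            for values in product(*(options[name] for name in names))]


def select_best(options, score):
    """Return the combination with the highest value of `score(params)`."""
    return max(parameter_matrix(options), key=score)


# Hypothetical parameter names and values, for illustration only.
options = {
    "regularisation": [0.5, 1.0, 2.0],
    "background_points": [1000, 10000],
}


def toy_score(params):
    """Toy score: prefers regularisation near 1.0 and more background points."""
    return -abs(params["regularisation"] - 1.0) + params["background_points"] / 1e6


all_combinations = parameter_matrix(options)
best = select_best(options, toy_score)
```

Here the matrix has six combinations, and the toy score selects a regularisation of 1.0 with 10000 background points. The selected set would then be reused for every species in the same group.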
Results of the data sweep, ready to be mapped and statistically analysed
Example product: Accumulated invasive potential for ecological groups
• 20 blacklisted species divided into 4 ecological regimes:
  • Zoobenthos
  • Phytobenthos
  • Zoopelagial
  • Phytopelagial
Example: Stack of combined macrozoobenthic invasion heatmaps
Slide by Matthias Obst, BioVeL
www.earthobservations.org/geobon.shtml
www.eubon.eu
www.biovel.eu
Questions?