Method Comparison on Graph-Based Models for Species Occurrence Prediction - Methods and First Results - Jörn Vorwald, BTU Cottbus
Overview • Motivation • Basics: Graph Theory, Model Classification • Methods: Field Ecology, GIS, Statistics • Results • Outlook
Motivation • Atlas project for grasshoppers and bush crickets in Brandenburg, started in 1996 • First 63 sampling sites in 1997: 61 in SPN (Spree-Neiße district) and Cottbus, 2 outside; damselflies and dragonflies added to the investigation • Next 60 sites in 1998, all in SPN and Cottbus • Completion in 1999 • Last 35 sites in 2000 in SPN, aimed at local aggregation • In 1999, first idea beyond atlases: an approach based on information theory, answering the question ‚How much information is enough when you cannot get complete information?‘ • In 2003, second idea: compare graph-based models for the prediction of species occurrences
From Atlas Project To Modelling • Brandenburg: 299 TK-25 sheets (topographic maps 1:25,000) • SPN/CB: 27 TK-25 sheets, 22 of them selected • These contain 88 TK-10 sheets (topographic maps 1:10,000) • 65 selected, 158 sites • 50 for buffering, 106 sites
Basics – Graph Theory • What is a graph? • What are special graphs? • What is adjacency in graphs? • What are weighted edges? • What kinds of graphs are common in ecological modelling? • What kinds of graphs are used in my approach?
Graph Theory - Graphs • A graph is a system of points and of lines connecting the points (Bodendiek & Lang 1995). • More precisely, a graph consists of a set of points and a set of lines, each connecting two points; the set of lines may be empty. The points are usually called vertices, and the lines edges.
Graph Theory – Special Graphs • Complete graphs • Wheels • Stars • Cycles • Trees • Platonic graphs • Petersen graph
Graph Theory - Adjacency • [Figure: example graph with vertices v1–v4, x1–x3, z1–z5 and edges e1–e8] • When an edge connects two vertices, the vertices are called ‚incident‘ to the edge, and the edge is incident to each of them. The two vertices are called ‚adjacent‘ to each other.
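A minimal Python sketch of how incidence and adjacency can be derived from an edge list. The edge endpoints below are hypothetical, because the example graph in the slide's figure cannot be recovered; the function name `build_relations` is illustrative only.

```python
from collections import defaultdict

# Hypothetical edge list: endpoints are illustrative, not taken from the figure.
EDGES = {"e1": ("z1", "z2"), "e2": ("z2", "z3"), "e3": ("z2", "z4")}

def build_relations(edges):
    """Return (adjacency, incidence) maps for an undirected graph.

    adjacency[v] -> set of vertices sharing an edge with v
    incidence[v] -> set of edge names that touch v
    """
    adjacency = defaultdict(set)
    incidence = defaultdict(set)
    for name, (u, v) in edges.items():
        adjacency[u].add(v)
        adjacency[v].add(u)
        incidence[u].add(name)
        incidence[v].add(name)
    return adjacency, incidence

adjacency, incidence = build_relations(EDGES)
print(sorted(adjacency["z2"]))  # ['z1', 'z3', 'z4']
print(sorted(incidence["z2"]))  # ['e1', 'e2', 'e3']
```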
Graph Theory – Edge Weighting • [Figure: example graph with vertices z1–z5 and weighted edges e1–e8] • Each edge can be weighted by adding a special attribute: e1 = 6, e2 = 4, e3 = 8, e4 = 3, e5 = 5, e6 = 5, e7 = 6, e8 = 7; Σe = 44 • Some important problems of graph theory and computer science involve weighted graphs (e.g. optimisation problems such as the travelling salesman problem).
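A trivial Python check of the weight sum given on the slide. Only the edge weights come from the slide; the edge endpoints are omitted because the figure is not recoverable.

```python
# Edge weights as listed on the slide (endpoints unknown, figure not recoverable)
weights = {"e1": 6, "e2": 4, "e3": 8, "e4": 3, "e5": 5, "e6": 5, "e7": 6, "e8": 7}

total = sum(weights.values())              # total weight of the graph
heaviest = max(weights, key=weights.get)   # edge with the largest weight
print(total, heaviest)  # 44 e3
```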
Graphs In Ecological Modelling • Graph-based models are rare in ecology. • The literature review found two kinds of graphs: • Voronoi (Dirichlet, Thiessen) tessellation (e.g. Byers 1992, Mercier & Baujard 1997, Okabe et al. 2000) • Gabriel graph (Gabriel & Sokal 1969) • In graphs, usually ‚only‘ adjacency can be used for modelling. • Polygon methods can introduce more realistic assumptions about the abiotic and biotic factors influencing sampling sites or target organisms.
Graphs In This Approach • Delaunay triangulation (dl) • Gabriel graph (ga) • Minimum spanning tree by Kruskal's algorithm (kr) • Nearest neighbours (nn) • Voronoi tessellation (vo)
Delaunay Triangulation • A Delaunay triangulation partitions the entire area of interest, i.e. the area between the vertices (their convex hull), into triangles, each formed by 3 vertices and 3 edges. • Algorithm (‚divide and conquer‘): • A triangle of edges is drawn between three points. • The Delaunay constraint is checked: • No fourth point lies within the circumcircle of the triangle. • (additional: The sum of two angles is greater than 30°.) • A second triangle is drawn. • The Delaunay constraint is checked again …
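The slide describes a divide-and-conquer construction. To illustrate only the Delaunay constraint itself, the Python sketch below uses a different, brute-force technique: it tests every triple of points against the empty-circumcircle criterion. It runs in O(n⁴), assumes points in general position (no four points on one circle) and is meant for small examples only; the coordinates are made up.

```python
from itertools import combinations

def orientation(a, b, c):
    """Twice the signed area of triangle abc; > 0 if a, b, c are counter-clockwise."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])

def in_circumcircle(a, b, c, d):
    """True if point d lies strictly inside the circumcircle of triangle abc."""
    rows = []
    for p in (a, b, c):
        dx, dy = p[0] - d[0], p[1] - d[1]
        rows.append((dx, dy, dx * dx + dy * dy))
    (a1, a2, a3), (b1, b2, b3), (c1, c2, c3) = rows
    det = (a1 * (b2 * c3 - b3 * c2)
           - a2 * (b1 * c3 - b3 * c1)
           + a3 * (b1 * c2 - b2 * c1))
    # The sign of the determinant flips with the orientation of abc.
    return det * orientation(a, b, c) > 0

def delaunay_brute_force(points):
    """Keep every triangle whose circumcircle contains no fourth point."""
    triangles = []
    for i, j, k in combinations(range(len(points)), 3):
        a, b, c = points[i], points[j], points[k]
        if orientation(a, b, c) == 0:
            continue  # collinear: no circumcircle, no triangle
        others = (points[m] for m in range(len(points)) if m not in (i, j, k))
        if not any(in_circumcircle(a, b, c, d) for d in others):
            triangles.append((i, j, k))
    return triangles

sites = [(0, 0), (4, 0), (0, 4), (5, 5)]
print(delaunay_brute_force(sites))  # [(0, 1, 2), (1, 2, 3)]
```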
Gabriel Graph • A Gabriel graph is constructed similarly to a Delaunay triangulation (it is a subgraph of the Delaunay triangulation). • In practice, edges may be rejected from the graph due to external conditions. • Algorithm: • Draw an edge between the two points with minimal distance (nearest neighbours). • Check the constraint: no third point may lie within the circle that has the edge as its diameter. • Draw an edge between one of the first points and a third point. • Check the constraint again. If the new edge violates the constraint, it is rejected from the graph.
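A brute-force Python sketch of the Gabriel constraint, assuming plain Euclidean coordinates and no external rejection conditions. Instead of the incremental procedure on the slide, it tests every pair of points directly; the point set is made up.

```python
from itertools import combinations

def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2

def gabriel_graph(points):
    """Edges (i, j) whose diametral circle contains no other point."""
    edges = []
    for i, j in combinations(range(len(points)), 2):
        p, q = points[i], points[j]
        d_pq = sq_dist(p, q)
        # By Thales' theorem, r lies strictly inside the circle with diameter pq
        # exactly when |pr|^2 + |qr|^2 < |pq|^2.
        violated = any(sq_dist(p, r) + sq_dist(q, r) < d_pq
                       for k, r in enumerate(points) if k not in (i, j))
        if not violated:
            edges.append((i, j))
    return edges

sites = [(0, 0), (4, 0), (0, 4), (5, 5)]
print(gabriel_graph(sites))  # [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3)]
```

Here the pair (0, 3) is rejected because point 1 lies inside the circle with that edge as diameter.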
Minimum Spanning Tree • A minimum spanning tree is a tree that connects all vertices and whose total edge length is minimal among all such trees. • Algorithm (Kruskal): • Choose an edge with minimal distance (nearest neighbours). If more than one exists, choose one at random. • Choose a second edge with the same or the next larger distance. • Choose a third edge under the same condition. • Check the constraint: the edges must not form a cycle. If they do, reject the last chosen edge. • Choose a new edge. Check the constraint again.
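A Python sketch of Kruskal's algorithm on the complete Euclidean graph of a point set, using a union-find structure for the cycle check. Unlike the slide, ties are broken by vertex index rather than at random, so the result is deterministic; the points are made up.

```python
import math
from itertools import combinations

def kruskal_mst(points):
    """Minimum spanning tree of the complete Euclidean graph (Kruskal's algorithm)."""
    parent = list(range(len(points)))  # union-find forest, one set per vertex

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # All candidate edges, shortest first; ties are broken by vertex index.
    candidates = sorted((math.dist(points[i], points[j]), i, j)
                        for i, j in combinations(range(len(points)), 2))
    tree = []
    for d, i, j in candidates:
        root_i, root_j = find(i), find(j)
        if root_i == root_j:
            continue  # edge would close a cycle: reject it
        parent[root_i] = root_j
        tree.append((i, j, d))
        if len(tree) == len(points) - 1:
            break  # a spanning tree on n vertices has n - 1 edges
    return tree

sites = [(0, 0), (1, 0), (3, 0), (3, 1)]
mst = kruskal_mst(sites)
print([(i, j) for i, j, _ in mst], sum(edge[2] for edge in mst))  # [(0, 1), (2, 3), (1, 2)] 4.0
```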
Nearest Neighbours • A nearest neighbour graph usually consists of several disconnected subgraphs, in which each vertex is connected to the vertex at minimum distance. (A vertex may nevertheless be connected to more than one vertex.) • Algorithm: • Calculate the distances within a complete graph and sort them in ascending order. • Start with the minimum distance and draw an edge. • Check the constraint: all vertices must be included. • Continue with the next larger distance and draw a new edge. • Check the constraint again.
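A Python sketch of a nearest neighbour graph as defined in the first bullet (each vertex joined to its closest other vertex); it does not reproduce the step-by-step procedure on the slide. The second example uses three collinear points whose nearest neighbour graph is connected, so such a graph is not always disconnected. Coordinates are made up.

```python
import math

def nearest_neighbour_graph(points):
    """Connect every vertex to its nearest other vertex (undirected, no duplicates)."""
    edges = set()
    for i, p in enumerate(points):
        j = min((k for k in range(len(points)) if k != i),
                key=lambda k: math.dist(p, points[k]))
        edges.add((min(i, j), max(i, j)))
    return sorted(edges)

print(nearest_neighbour_graph([(0, 0), (1, 0), (5, 0), (6.5, 0)]))  # [(0, 1), (2, 3)]
print(nearest_neighbour_graph([(0, 0), (1, 0), (3, 0)]))            # [(0, 1), (1, 2)]
```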
Voronoi Diagram • A Voronoi diagram is the dual graph of a Delaunay triangulation, i.e. each edge of the Voronoi diagram is perpendicular to an edge of the Delaunay triangulation. • Every point within a Voronoi cell is closer to the centre (generating site) of its cell than to any other cell centre. • Algorithm: • Select two points (e.g. the top-left point and its nearest neighbour) and temporarily draw a line between them. • Draw an edge perpendicular to this line through its midpoint, then remove the line. • Select a third point and temporarily draw lines between it and all its neighbours. Create edges perpendicular to each of these lines through their midpoints. Cut the edges at their intersection points. Remove the lines.
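A small Python sketch of the two geometric primitives behind this construction: the perpendicular bisector of two sites (on which their shared Voronoi edge lies) and the nearest-site rule that decides which cell a point belongs to. The function names are illustrative; cutting the bisectors into complete cells is left to a GIS or a geometry library.

```python
import math

def perpendicular_bisector(p, q):
    """Midpoint of segment pq and a direction vector perpendicular to pq.

    The Voronoi edge between sites p and q lies on this line.
    """
    midpoint = ((p[0] + q[0]) / 2, (p[1] + q[1]) / 2)
    direction = (-(q[1] - p[1]), q[0] - p[0])  # pq rotated by 90 degrees
    return midpoint, direction

def voronoi_cell(point, sites):
    """Index of the site whose Voronoi cell contains point (nearest site)."""
    return min(range(len(sites)), key=lambda k: math.dist(point, sites[k]))

print(perpendicular_bisector((0, 0), (2, 0)))       # ((1.0, 0.0), (0, 2))
print(voronoi_cell((0.9, 0.3), [(0, 0), (2, 0)]))  # 0
```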
Model Classification • Multidimensional vector of classes • Few classifications found in the literature review: • Levins (1966), Sharpe (1990), Refsgaard (1996), eWater Ltd. (2006) • Classifications are rarely reflected upon • An explicit classification is not possible for every model
Model Classifications • By type • mechanistic • statistical • By time complexity • static • dynamic • By species complexity • single species • multiple species • By data distribution • localised • gridded • By purpose • screening • research • planning, monitoring, assessment
Model Classifications • By extent • local (x=1) • regional (x=2) • continental (x=3) • By number • presence only (y=1) • presence and absence (y=2) • activity, abundance (y=3) • By background • empirical (z=1) • causal (z=2)
Model Classification • Byers (1992) • statistical • static • 3 bark beetle species, treated as single species • extent: bark of a single tree • presence only • localised data • for research • causal • Boyce et al. (2003) • statistical: logistic regression • dynamic: summer/winter • single species: elk • extent: Yellowstone National Park • relative abundance • localised data • for monitoring • empirical • Buckland & Elston (1993) • statistical: GLM • static • single species: green woodpecker, red deer • extent: north-east Scotland • relative abundance • gridded data • for screening • causal • Ferrier et al. (2002) • statistical • static • community level • extent: North East New South Wales • presence/absence • gridded data • for monitoring • causal • Vorwald (2006) • statistical • static • community level • extent: CB/SPN • relative activity • localised data • for screening • empirical • [Figure: the five models plotted in a three-dimensional class space with the axes extent, number and background]
Methods – Field Ecology • Selection of sampling sites • First site set: one site on each topographic map sheet 1:10,000 within SPN or CB • Second set: same procedure • Third set: one site on each previously unsampled 1:10,000 sheet within 4 selected topographic map sheets 1:25,000 • Criteria: • Preferably grassland with a wetness gradient • Preferably open water (creek, river, pond or lake) • Preferably old trees on or near the site
Methods – Field Ecology • Observation • Visual observation (grasshoppers, bush crickets, damselflies and dragonflies) • Net capturing (all groups) – specimen collection • Acoustic observation (grasshoppers and bush crickets) • By ear • With bat detector support • Documentation • Field forms • Database
Methods – GIS • Preparation: • Sets of sampling and buffer sites exported from the database to plain text files • Calculation of the graphs in an adopted Java program • Export of the results to plain text files • Import of the text file information into the GIS for visualisation and preparation of the intersection • Intersection of the Voronoi diagrams in the GIS; export of the relevant information on the intersected polygons to plain text files • Calculation of species vectors in the database
GIS - Preparation • [Figure: excerpts of the exchanged plain text files: sampling-site table (columns KNOWN, ID, SHORT, BUF_DS, START, SHAPE, SUBSET, PREDICT, X_COORD, Y_COORD), line coordinate lists, edge lists with distances, and node tables (NodeId, NodeX, NodeY)]
GIS - Preparation • 76 point sets as input (one file each): • 62 for all graph types (26 with ‚known‘ and ‚buffer‘ points, 36 with ‚known‘, ‚buffer‘ and ‚predict‘ points) • 14 for all graph types except Voronoi (6 with ‚known‘ points only, i.e. without ‚buffer‘ points; 8 with ‚known‘ and ‚predict‘ points) • 670 files as output: • 62 × 5 (graph types) for all lines • 62 × 4 for all neighbouring points, for all types except Voronoi • 14 × 4 for the lines of the graph types except Voronoi • 14 × 4 for the neighbouring points of the graph types except Voronoi
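As a quick arithmetic check, the counts listed on this slide add up as follows:

```latex
\begin{aligned}
\text{input point sets} &= (26 + 36) + (6 + 8) = 62 + 14 = 76 \\
\text{output files} &= 62 \cdot 5 + 62 \cdot 4 + 14 \cdot 4 + 14 \cdot 4 = 310 + 248 + 56 + 56 = 670
\end{aligned}
```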
GIS - Intersection • Each Voronoi diagram with ‚known‘ and ‚buffer‘ sites has to be intersected with the corresponding Voronoi diagram in which the sites to ‚predict‘ have been added. • Buffering to avoid ‚edge effects‘ (Kenkel et al. 1989) • 36 intersections in total • Split into small polygons with two parents: • One from a known or buffering site • One from a site to be predicted • Calculation of each polygon's area and of the ratio of this area to the area of its parent cell
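The area shares were computed by exact polygon intersection in the GIS. As a rough, hypothetical cross-check, the Python sketch below approximates them by grid sampling: each grid point is assigned to its nearest site both in the diagram with all sites and in the diagram with known and buffer sites only. Cells are clipped to an assumed bounding box, the resolution `n` trades accuracy for speed, and the site IDs and coordinates are made up.

```python
import math
from collections import Counter

def nearest_site(point, sites):
    """Key of the site (dict id -> (x, y)) nearest to point."""
    return min(sites, key=lambda s: math.dist(point, sites[s]))

def overlap_shares(base_sites, predict_sites, bbox, n=200):
    """Approximate, for each site to predict, the percentage of its Voronoi cell
    (diagram of base + predict sites) lying in each cell of the base diagram.

    Grid sampling over bbox = (xmin, ymin, xmax, ymax); cells are clipped to bbox.
    """
    all_sites = {**base_sites, **predict_sites}
    xmin, ymin, xmax, ymax = bbox
    dx, dy = (xmax - xmin) / n, (ymax - ymin) / n
    counts = {p: Counter() for p in predict_sites}
    for i in range(n):
        for j in range(n):
            pt = (xmin + (i + 0.5) * dx, ymin + (j + 0.5) * dy)  # grid-cell centre
            owner = nearest_site(pt, all_sites)
            if owner in predict_sites:
                counts[owner][nearest_site(pt, base_sites)] += 1
    shares = {}
    for p, c in counts.items():
        total = sum(c.values())
        shares[p] = {b: 100 * k / total for b, k in c.items()}
    return shares

base = {"A": (0, 0), "B": (2, 0)}  # known or buffer sites
to_predict = {"P": (1, 0)}         # site to be predicted
print(overlap_shares(base, to_predict, (-1, -1, 3, 1)))  # {'P': {'A': 50.0, 'B': 50.0}}
```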
GIS - Intersection
Voronoi cell vertices (site ID, X, Y), excerpt:
1 4649437.15685 5770571.55337
1 4648965.43159 5765840.89544
1 4649350.00000 5765310.16310
1 4653966.56933 5767238.15705
1 4654150.52601 5767459.05431
1 4653141.97435 5771837.51857
1 4653062.68257 5771885.73293
2 4675843.70664 5767525.42210
2 4677881.38377 5764961.49765
2 4681057.63544 5765466.76608
2 4678293.73528 5771173.28156
2 4677786.85026 5771184.85724
2 4677461.30816 5770960.24708
3 4655085.49844 5757481.47363
3 4660252.59948 5756432.55224
3 4662308.69461 5760767.06035
3 4661109.53508 5761496.04101
3 4658889.96932 5761673.50719
3 4655171.68288 5759479.60279
Intersection polygons (polygon ID, parent, parent, area share in %), excerpt:
113 211 211 100
114 175 175 100
115 176 176 100
116 177 177 100
117 64 2 74.4
118 64 10 3.57
119 64 167 3.63
120 64 169 1.48
121 64 170 16.92
122 65 3 75.11
123 65 15 19.25
124 65 18 3.65
125 65 177 1.98
126 66 3 1.96
127 66 4 77.17
128 66 164 19.56
129 66 165 1.32
Calculation Of Species Vectors • In this approach, a vector is an ordered set of attributes, one per species. • The relevant attributes are the ‚counts‘ of the species sampled at the sites. • A species count is the maximum detection class in which the species has been observed.
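A minimal Python sketch of the species count rule, assuming the field records are available as (site, species, detection class) tuples; the record layout and the species names are hypothetical and do not reflect the project's database schema.

```python
def species_counts(records):
    """Maximum detection class per (site, species) over all field records.

    records: iterable of (site, species, detection_class) tuples.
    """
    counts = {}
    for site, species, detection_class in records:
        key = (site, species)
        counts[key] = max(counts.get(key, 0), detection_class)
    return counts

# Hypothetical field records: site 14 visited several times
records = [(14, "species_a", 3), (14, "species_a", 5), (14, "species_b", 2)]
print(species_counts(records))  # {(14, 'species_a'): 5, (14, 'species_b'): 2}
```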
Calculation Of Species Vectors • Tables in database (filled by Visual Basic programs): • Neighbouring sample sites from Java output (incl. distances) • Voronoi cell intersection from GIS output (incl. areas) • Prediction table with sample sites and prediction subsets (incl. ‚found‘ as observed values) as rows, species as columns and species counts as table values • Filling prediction table
Calculation Of Species Vectors • Filling the prediction table • For each site to be predicted: iteration over its neighbours as defined by the graph type • Sum of all distances, used to calculate the relation (weight) of each neighbour • Calculation of the predicted value using a ‚real‘ number (i.e. the class converted to its class centre) • Sum of all weighted values, reconverted to a class • Similar procedure for Voronoi cells, using areas instead of distances
Calculation Example • Site 77 in the Gabriel graph with known sites ‚97‘ and buffer set ‚05‘ • Neighbours: 14, 15, 16, 17
Distances (site, neighbour, distance):
77 14 4132.0
77 15 4413.0
77 16 3500.0
77 17 3642.0
• Σdist = 15,687
Relations (site, neighbour, relation):
77 14 0.24
77 15 0.22
77 16 0.27
77 17 0.27
• Vector calculation
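The slide does not state the exact weighting formula. One scheme consistent with closer neighbours receiving larger relations is inverse-distance weighting, sketched below in Python as an assumption. It reproduces 0.24, 0.22 and 0.27 for sites 14, 15 and 17, but gives 0.28 (not 0.27) for site 16, so the original calculation may differ in detail or in rounding.

```python
def inverse_distance_weights(distances):
    """Weights proportional to 1 / distance, normalised to sum to 1 (assumed scheme)."""
    inverse = {site: 1.0 / d for site, d in distances.items()}
    total = sum(inverse.values())
    return {site: v / total for site, v in inverse.items()}

# Gabriel-graph neighbours of site 77 and their distances (from the slide)
distances = {14: 4132.0, 15: 4413.0, 16: 3500.0, 17: 3642.0}
print(sum(distances.values()))  # 15687.0
weights = inverse_distance_weights(distances)
print({site: round(w, 2) for site, w in weights.items()})
# {14: 0.24, 15: 0.22, 16: 0.28, 17: 0.27}
```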
Vector Calculation Example • Vectors for observation • o14 <- c(0,0,6,5,6,2,7,0,4,7,6,4, ... ,2,0,0,0,2,2,0,2) • o15 <- c(0,0,4,6,6,5,4,0,0,7,6,2, ... ,0,0,0,6,4,4,0,0) • o16 <- c(0,0,5,7,5,5,6,0,0,5,6,2, ... ,0,0,0,0,0,2,0,4) • o17 <- c(0,0,4,5,5,4,5,0,0,7,6,1, ... ,2,0,0,6,4,1,0,1) • Calculation of the prediction using an interim transformation to ‚abundances‘ and retransformation to observation classes • p77 <- o14 * 0.24 + o15 * 0.22 + o16 * 0.27 + o17 * 0.27 • p77 <- c(0,0,5,6,6,5,6,0,2,7,6,2, ... ,2,0,0,5,2,2,0,2) • Calculations for the sample sites to be predicted within the prediction subsets, for each graph: 7,238
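A Python sketch of the weighted prediction with an interim transformation to abundances. The class-centre conversion used in the project is not given on the slide; the doubling scheme in `class_to_abundance` and its inverse `abundance_to_class` are hypothetical, so the output is not expected to match the p77 vector shown. Only the first 12 entries of each observation vector are used, because the rest is elided on the slide.

```python
import math

def class_to_abundance(c):
    # HYPOTHETICAL class centre: class c > 0 stands for about 2 ** (c - 1) individuals
    return 0.0 if c == 0 else 2.0 ** (c - 1)

def abundance_to_class(a):
    # HYPOTHETICAL inverse of class_to_abundance; values below 0.5 count as "not found"
    return 0 if a < 0.5 else max(1, round(math.log2(a)) + 1)

def predict_vector(observations, weights):
    """Weighted prediction of a species vector from neighbouring sites' vectors.

    observations: site -> list of detection classes (one entry per species)
    weights:      site -> relation (weight) of that neighbour, summing to 1
    """
    n_species = len(next(iter(observations.values())))
    predicted = []
    for s in range(n_species):
        abundance = sum(weights[site] * class_to_abundance(vector[s])
                        for site, vector in observations.items())
        predicted.append(abundance_to_class(abundance))
    return predicted

# First 12 entries of the observation vectors shown on the slide
observations = {
    14: [0, 0, 6, 5, 6, 2, 7, 0, 4, 7, 6, 4],
    15: [0, 0, 4, 6, 6, 5, 4, 0, 0, 7, 6, 2],
    16: [0, 0, 5, 7, 5, 5, 6, 0, 0, 5, 6, 2],
    17: [0, 0, 4, 5, 5, 4, 5, 0, 0, 7, 6, 1],
}
weights = {14: 0.24, 15: 0.22, 16: 0.27, 17: 0.27}
print(predict_vector(observations, weights))
```

If all neighbours share the same class for a species, the round trip returns that class unchanged, which is a useful sanity property for any choice of class centres.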
Methods - Statistics • Preparation: • Export of the values to be analysed in the statistics environment from the database (prediction table) to plain text files • Export of statistics scripts from the database • Calculation of statistics in the statistics environment R • Export of results to plain text files • Import of statistics results into the database • Visualisation of results in R or in a spreadsheet program
Statistics Preparation • Export of the values to be analysed in the statistics environment from the database (prediction table) to plain text files • p77 <- c(0,0,5,6,6,5,6,0,2,7,6,2, ... ,2,0,0,5,2,2,0,2)
dl ga kr nn vo fo
0 0 0 0 0 0
0 0 0 0 0 0
5 5 5 5 5 4
6 6 6 7 6 6
6 6 6 5 6 6
5 5 4 5 4 5
6 6 6 6 6 0
0 0 0 0 0 4
2 2 2 0 2 0
7 7 6 5 7 6
6 6 6 6 6 6
2 2 2 2 2 2
…
• Export of the statistics script from the database (1,506 tests):
sink(file = "U:/diss/r/kruskal_wallis/output/kruskal_result.txt", append = FALSE)
…
# site: 77
kktst <- read.table("U:/diss/r/kruskal_wallis/input/77-97_05_98.dat", header = TRUE)
site_77 <- c(kktst$dl, kktst$ga, kktst$kr, kktst$nn, kktst$vo, kktst$fo)
ps_97_05_98 <- factor(rep(1:6, c(86, 86, 86, 86, 86, 86)))
# print() also writes the result to the sink when the script is run with source()
print(kruskal.test(site_77, ps_97_05_98))
…
sink(file = NULL)
Statistics Calculation In R • Kruskal-Wallis rank sum test for each site within each prediction set: models vs. observation - 1,506 operations • Correlation using the R method „kendall“, i.e. a rank-based measure of association, for each site within each prediction set: each model vs. each other (incl. observation) - 22,590 operations (15 pairs among the 6 series × 1,506) • Group building by model comparison, e.g. all Delaunay triangulations vs. all Gabriel graphs, or all Voronoi tessellations vs. all observations: Kruskal-Wallis rank sum test for the comparison of correlation coefficients – 106 operations
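The correlations were computed in R. For readers outside R, here is a Python sketch of Kendall's tau-b, the tie-corrected variant that R's `cor(x, y, method = "kendall")` reports; ties matter here because species counts are small integer classes. The example uses the `ga` and `fo` columns of the 12 rows shown in the prediction table excerpt, so its value is not the one for all 86 species.

```python
import math
from itertools import combinations

def kendall_tau_b(x, y):
    """Kendall's tau-b rank correlation (tie-corrected), computed over all pairs."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    concordant = discordant = 0
    untied_x = untied_y = 0  # pairs not tied in x, pairs not tied in y
    for i, j in combinations(range(len(x)), 2):
        dx = x[i] - x[j]
        dy = y[i] - y[j]
        if dx != 0:
            untied_x += 1
        if dy != 0:
            untied_y += 1
        if dx * dy > 0:
            concordant += 1
        elif dx * dy < 0:
            discordant += 1
    if untied_x == 0 or untied_y == 0:
        return float("nan")  # a constant series has no rank correlation
    return (concordant - discordant) / math.sqrt(untied_x * untied_y)

ga = [0, 0, 5, 6, 6, 5, 6, 0, 2, 7, 6, 2]  # Gabriel-graph prediction (excerpt)
fo = [0, 0, 4, 6, 6, 5, 0, 4, 0, 6, 6, 2]  # observation (excerpt)
print(round(kendall_tau_b(ga, fo), 3))
```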
Data And Result Handling • Calculation of statistics in R • Export of results to plain text files
…
Kruskal-Wallis rank sum test
data: site_77 and ps_97_05_98
Kruskal-Wallis chi-squared = 19.9643, df = 5, p-value = 0.001269
…
• Import of statistics results into the database by a text wrapping routine in Visual Basic • Visualisation of results in R or in a spreadsheet program
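To illustrate what `kruskal.test` reports, here is a Python sketch of the Kruskal-Wallis H statistic with tie correction and its chi-squared p-value. The p-value uses the closed-form chi-squared upper tail for integer degrees of freedom, so no SciPy is needed; applied to the statistic in the output above (19.9643, df = 5) it gives approximately 0.001269. The function names are illustrative.

```python
import math

def kruskal_wallis(groups):
    """Kruskal-Wallis H statistic (tie-corrected) and its degrees of freedom."""
    values = [v for g in groups for v in g]
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    tie_term = 0  # sum of t^3 - t over groups of tied values
    i = 0
    while i < n:
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # average 1-based rank of the ties
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    if tie_term == n ** 3 - n:
        return float("nan"), len(groups) - 1  # all values equal
    h, start = 0.0, 0
    for g in groups:
        r = sum(ranks[start:start + len(g)])  # rank sum of this group
        h += r * r / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * h - 3 * (n + 1)
    return h / (1 - tie_term / (n ** 3 - n)), len(groups) - 1

def chi2_sf(x, df):
    """Upper tail P(X > x) of the chi-squared distribution (closed forms by parity of df)."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        term = total = 1.0
        for i in range(1, df // 2):
            term *= (x / 2) / i
            total += term
        return math.exp(-x / 2) * total
    total, term = 0.0, math.sqrt(x)
    for r in range(1, (df + 1) // 2):
        total += term
        term *= x / (2 * r + 1)
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 / math.pi) * math.exp(-x / 2) * total

print(round(chi2_sf(19.9643, 5), 6))  # 0.001269
h, df = kruskal_wallis([[1, 2, 3], [4, 5, 6]])
print(round(h, 4), df, round(chi2_sf(h, df), 4))  # 3.8571 1 0.0495
```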
Results • Kruskal-Wallis rank sum test for each site within each prediction set: models vs. observation • Correlation using Kendall's τ for each site within each prediction set: each model vs. each other (incl. observation) • Kruskal-Wallis rank sum test for the correlation coefficients of model comparisons • Advantages and limits of methods
Models vs. Observations • Rows: site • Columns: prediction set • Cells: p-value of the Kruskal-Wallis rank sum test (models vs. observation) • Low correlation without statistical significance • Significance level depends little on the prediction set • Strong differences between groups of sites
Models vs. Observations • A pattern is recognisable • Not independent of the prediction set • Differences between groups of sites • Low correlation without statistical significance
Model Correlations • Independent of the prediction set • Model comparison forms groups: • Delaunay triangulations are similar to Gabriel graphs and to Voronoi tessellations • Minimum spanning trees are similar to nearest neighbour graphs • Observations are less similar to each model than dissimilar models are to each other
Final Kruskal-Wallis Test • Delaunay and Gabriel are very similar, as are MST and NN. • MST and NN differ from Delaunay as well as from Gabriel. • Observations differ from all models.
Advantages And Limits • The models are easy to implement. • The models are easy to understand. • The models are easy to extend. • The less connected the graph is, i.e. the smaller its set of edges or the fewer neighbours a single vertex has, the lower the probability of connections between known sites and sites to be predicted: fewer edges mean a higher error rate. • Border effects are an important limitation, not only for Voronoi cells (cf. Byers 1992) but for all graph types: spatially outlying sites are predicted poorly or not at all.
Limits: Graph Connections
116 177 177 100
117 64 2 74.4
118 64 10 3.57
119 64 167 3.63
120 64 169 1.48
121 64 170 16.92
122 65 3 75.11
123 65 15 19.25
-> 22.03% of the cell of site 64 comes from unknown (buffering) sites (167, 169, 170: 3.63 + 1.48 + 16.92 = 22.03%)