Introduction to Niche Modeling • A small bit of theory re: niches • How niche modeling works • G-space and E-space • How it came to be • Uses in ecology and evolution - present, past and future modeling of species distributions - predicting disease spread - predicting invasive species spread - niche conservation Note: some material on niche modeling pedagogy has been adapted from internet sources, so thanks to Arthur Chapman, Town Peterson, Enrique Martinez-Meyer and others.
Niche Distinctions Grinnellian • Spatially explicit • Focus on non-interactive requirements for populations to thrive • Measurable from distribution Eltonian • Focus on community impacts and biotic interactions, i.e. species' functional roles Hutchinsonian • Also focuses on non-interactive requirements • Defined the Fundamental Niche: mostly what we think of as environmental variables • Defined the Realized Niche: the subset of the Fundamental Niche that remains once biotic interactions are taken into account
Two barnacle species, Chthamalus and Balanus, live in the intertidal. Balanus cannot tolerate exposure to air, so its fundamental and realized niches are similar. Chthamalus cannot compete with Balanus, but if Balanus is removed, Chthamalus can survive lower in the intertidal, so its fundamental and realized niches differ. [Figure labels: Chthamalus, Balanus]
HOW CAN WE RECONSTRUCT THE FUNDAMENTAL NICHE? (we can start by looking at where a species occurs) Poecile gambeli (Mountain Chickadee): dots are occurrences across its range
How Can We Model the Fundamental Niche? [Figure: occurrence points on the current distribution (geographic space) are turned, through ecological niche modeling, into a model of the niche in ecological dimensions (ecological space; axes: precipitation, temperature)]
[Figure: occurrence points on the current distribution (geographic space) are turned, through ecological niche modeling, into a model of the niche in ecological dimensions (precipitation, temperature); the model yields a current range prediction and, projected back onto climate landscapes at the Last Glacial Maximum, a Last Glacial Maximum prediction] From Peterson and Soberon
SOME TERMINOLOGY Geographic Space • G = the geographic space, typically composed of 2-D pixels • Ga, Gp = the abiotically suitable area (potential distribution) • Gb = the biotically suitable area • Gm = the area accessible through dispersal • Gi = the invadable distributional area • Go = the occupied distributional area • Gdata = the set of observations (presences and, if they exist, true absences) Environmental Space • E = the environmental space of environmental variables • Ea = the scenopoetic fundamental niche • Ei = the invadable niche space • Eo = the occupied niche space • Ep = the biotically reduced niche
Example Mapping Between Geographic Space and Environmental Space [Figure labels: Ea, Eo; callout: "Why isn't this occupied?"] Note: this area is occupied but not sampled (we know this only because you are omniscient in this example; work with me). Go is shown as gray shading, and Ga is “white”.
General species’ distribution modeling approach Modified from NCEP module Species distribution modeling for conservation educators and practitioners.
Key factors determining the degree to which observed localities can be used to estimate the niche or distribution: • Equilibrium: a species is said to be at equilibrium with current environmental conditions if it occurs in all suitable areas whilst being absent from all unsuitable areas. What causes disequilibrium? • Sampling adequacy: the extent to which the observed occurrence records provide a representative sample of the environmental space. The importance of this cannot be overestimated. How could you possibly know? Modified from NCEP module Species distribution modeling for conservation educators and practitioners.
The Ideal Scenario: at equilibrium and good sampling Modified from NCEP module Species distribution modeling for conservation educators and practitioners.
Suppose high equilibrium but poor sampling (in both geographical and environmental space). [Figure callout: "New areas to survey!"] Modified from NCEP module Species distribution modeling for conservation educators and practitioners.
Suppose high equilibrium and poor sampling in geographical space, but good sampling in environmental space Modified from NCEP module Species distribution modeling for conservation educators and practitioners.
Suppose low equilibrium but good sampling. [Figure labels: Potential Distribution, Fundamental Niche] Modified from NCEP module Species distribution modeling for conservation educators and practitioners.
Circle A represents the area where abiotic conditions are right for a species to occur (Ga) • Circle B represents the area where a lack of competition and disease, and the presence of mutualists, allow populations to grow • Circle M is the area within which individuals and populations are capable of moving, because there are no dispersal barriers • Go is the occupied area (the intersection of A, B and M) • Gi is the invadable area (suitable in A and B, but outside M) Note: niche modeling draws its occurrences from that intersection, Go.
Circle A represents the area where abiotic conditions are right for a species to occur (= fundamental niche, Ea) • Circle B represents the area where a lack of competition and disease, and the presence of mutualists, allow populations to grow • Circle M is the area within which individuals and populations are capable of moving, because there are no dispersal barriers • The intersection of A and B is the biotically reduced niche (Ep) • The intersection of A, M and B is the occupied niche space (Eo). From Soberon and Peterson, 2005, Biodiversity Informatics
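The BAM relationships on the last two slides reduce to plain set operations. A minimal sketch, assuming each circle is represented as a set of grid-cell IDs; the cell numbers below are hypothetical and chosen only to make the intersections easy to check by eye.

```python
# Hypothetical grid cells (integers) falling inside each BAM circle.
A = {1, 2, 3, 4, 5, 6}   # abiotically suitable (Ga)
B = {3, 4, 5, 6, 7, 8}   # biotically suitable (Gb)
M = {1, 2, 3, 4, 9}      # accessible through dispersal (Gm)

def bam_areas(a, b, m):
    """Return the occupied (Go) and invadable (Gi) areas of a BAM diagram."""
    suitable = a & b          # abiotically AND biotically suitable
    occupied = suitable & m   # Go: suitable and reachable
    invadable = suitable - m  # Gi: suitable but beyond dispersal barriers
    return occupied, invadable

Go, Gi = bam_areas(A, B, M)
print("Go =", sorted(Go))  # Go = [3, 4]
print("Gi =", sorted(Gi))  # Gi = [5, 6]
```

Occurrence records can only come from Go, so cells 5 and 6 are suitable yet will never appear in the data; this is one reason a model fitted to occurrences can underestimate Ga.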
SOME POSSIBLE OUTCOMES • Best case: weak, diffuse biotic interactions and a lack of dispersal barriers create general overlap. • No dispersal barriers, but the area of “correct” biotic interactions differs from the area of correct abiotic conditions: estimates of the fundamental niche (FN) from occurrence data should be examined carefully. • Dispersal limitations: the FN (and potential distribution) will be much larger than the actual distribution. From Soberon and Peterson, 2005, Biodiversity Informatics
What abiotic factors determine the fundamental niche? • The answer is complicated (but important) • Species have physiological tolerances, migration limitations and evolutionary forces that limit adaptation • A starting point for physiology may be traits • A starting point for abiotic factors is often climate • Climate variables also often correlate with other variables (elevation, land cover)
“Easy” in Theory: But How Does It Work in Practice? • Spatial ecological modeling approaches were developed in the 1990s • but they have origins in ongoing innovations from the 1970s onward • A bit of history…
How do we, in practice, model the “scenopoetic” ecological niche? And how do we determine a species' distribution (actual and potential), and what is the difference?
Around 1990, three things happened: I. Large databases of species presences (mainly computerized scientific collections) began to become accessible in significant amounts
II. GIS… • Geographical Information Systems technology became widely accessible to ecologists and biogeographers
IV. Worldwide Environmental Data Layers • Remote sensing data • Land cover/land type • Vegetation • Terrain • Ocean SST, chlorophyll • Slope, aspect, flow rate, hydrology data • Climatology databases • WorldClim (what we'll use in this class) • Models of worldwide past and future climates (IPCC) • All other ancillary data layers (roads, human population density, etc.)
Which leads to an NCEAS Working Group. Title: Choosing (and making available) the right environmental layers for modeling how the environment controls the distribution and abundance of organisms. Aim: to generate co-registered environmental data layers at 1 km resolution representing climate, vegetation/land cover, hydrology/topography and marine conditions.
WORLDWIDE MEAN ANNUAL TEMPERATURES (GREEN = cold, RED = hot) [Figure panels: Now; LGM (based on General Circulation Models)]
WORLDWIDE MEAN ANNUAL TEMPERATURES (GREEN = cold, RED = hot) [Figure panels: Now, North America; Double CO2, 2100 CE, North America (CCM models)]
Inputs into a niche model: • a stack of environmental data layers • a set of occurrence records representing presences [Figure: an occurrence record overlaid on stacked layers for precipitation, temperature, elevation and soils]
NICHE AND DISTRIBUTION MODELING Inputs: species presences and environmental data layers. CAN WE PREDICT NICHE AND DISTRIBUTION FROM SUCH DATA? (answer: maybe!) From Maxent presentation by Pearson
The outcome of a niche model is: • a prediction of suitable habitats for that taxon (based on the input data) • Suitability output can be yes/no (binary) or a continuous value from 0 to 100 that is read as a probability. • Panel B: input data points in black and suitable habitat in the western US for Neotoma cinerea • Panel D: close-up of suitable/unsuitable areas in the Great Basin of western North America.
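Turning a continuous suitability surface into a yes/no map requires a threshold. A minimal sketch, assuming suitability on a 0-100 scale held in a small nested list and using the "minimum training presence" rule (threshold = lowest suitability at any training presence), which is only one of several common choices; the grid values and presence cells are hypothetical.

```python
def minimum_training_presence(grid, presences):
    """Lowest suitability value found at any training presence cell (row, col)."""
    return min(grid[r][c] for r, c in presences)

def to_binary(grid, threshold):
    """1 = suitable (value >= threshold), 0 = unsuitable."""
    return [[1 if v >= threshold else 0 for v in row] for row in grid]

# Hypothetical 3x3 suitability map (0-100)
suitability = [
    [5, 20, 40],
    [30, 70, 90],
    [10, 55, 80],
]
training = [(1, 1), (1, 2), (2, 1)]  # hypothetical presence cells

t = minimum_training_presence(suitability, training)  # 55
binary = to_binary(suitability, t)
for row in binary:
    print(row)
```

This rule guarantees every training presence is classed as suitable, at the cost of a generous map if one presence sits in marginal habitat.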
PART 1: Idealized workflow for building and validating a species distribution model: • Acquire species occurrence data (e.g. fieldwork, museum voucher specimens, observations, surveys) • Map/vet the species' distribution data, especially if coordinates come from third-party sources (e.g. remove geographic and environmental outliers) • Collate a GIS database of environmental layers (e.g. temperature, precipitation, soil type) • Process environmental layers to generate predictor variables important in defining species' distributions (e.g. maximum daily temperature, frost days, soil water balance) and convert them to appropriate formats • Apply a modeling algorithm (e.g. Bioclim, Maxent, artificial neural network, generalized linear model, boosted regression tree) • Model calibration (select suitable parameters, test the importance of alternative predictor variables)
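The vetting step (removing environmental outliers) can be partly automated as a screen before modeling. A minimal sketch, assuming one environmental value per occurrence record and a robust z-score based on the median absolute deviation (MAD); the 3.5 cutoff is a common rule of thumb rather than something this workflow prescribes, and the temperatures are hypothetical. Flagged records should be checked by hand, not deleted blindly.

```python
import statistics

def flag_env_outliers(values, cutoff=3.5):
    """Return indices of records whose robust z-score exceeds `cutoff`.

    Robust z = 0.6745 * (x - median) / MAD, where MAD is the median absolute
    deviation; the 0.6745 factor makes it comparable to an ordinary z-score
    for normally distributed data.
    """
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        return []  # no spread at all, so nothing can be flagged this way
    return [i for i, v in enumerate(values)
            if abs(0.6745 * (v - med) / mad) > cutoff]

# Hypothetical annual mean temperatures (deg C) at occurrence points;
# the last record looks like a georeferencing error.
temps = [8.1, 7.5, 9.0, 8.4, 7.9, 8.8, 25.3]
print(flag_env_outliers(temps))  # [6]
```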
PART 2: Idealized workflow for building and validating a species distribution model: • Create a map of the current modeled distribution • Test model performance through additional fieldwork or a statistical approach (e.g. AUC, Kappa or null-model comparisons) • Model the species' distribution in a different region (e.g. for an invasive species) or for a different time period (e.g. under a future climate scenario) • If possible, test the model against observed data, such as occurrence records in an invaded region or distribution shifts over recent decades. Modified from NCEP module Species distribution modeling for conservation educators and practitioners.
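The AUC named in the testing step has a simple rank interpretation: the probability that a randomly chosen test presence gets a higher predicted value than a randomly chosen background (or absence) cell, with ties counted as one half. A minimal sketch, assuming you already have model predictions at test presences and at background cells; the prediction values are hypothetical.

```python
def auc(presence_scores, background_scores):
    """Rank-based AUC: equivalent to the Mann-Whitney U statistic / (n1 * n2)."""
    wins = 0.0
    for p in presence_scores:
        for b in background_scores:
            if p > b:
                wins += 1.0
            elif p == b:
                wins += 0.5  # ties count half
    return wins / (len(presence_scores) * len(background_scores))

# Hypothetical model predictions
test_presences = [0.9, 0.8, 0.6, 0.4]
background = [0.7, 0.5, 0.3, 0.2, 0.1]
print(auc(test_presences, background))  # 0.85
```

An AUC of 0.5 means the model ranks presences no better than chance. The double loop is O(n1 x n2), which is fine for teaching-sized data; sorting-based versions scale better.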
SOME ISSUES WITH MODELING Adapted from a presentation by Enrique Martinez-Meyer and others. Determining species distributions given that: • Most occurrence data available for the vast majority of species are presence-only • Sampling effort across most species' distributional ranges is uneven and eco-geographically biased • We do not know which environmental variables are relevant for each species.
Modeling Niches • All niche modeling approaches estimate a function approximating the true relationship between the environment (i.e., the niche) and a species' geographic occurrences/distribution.
Modeling Niches P2 • All approaches want to estimate a function f = μ(Gdata, E): the result of applying an algorithm μ to the data, given an environmental space E, in order to estimate G (the distribution) • Different algorithms have different data requirements: • True presence-only • Presence-absence • Presence-background (the background can be any sample from within the environment) • Presence-pseudoabsence (a pseudoabsence cannot be where the species is known to occur)
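The difference between background and pseudoabsence samples comes down to whether known presence cells may be drawn. A minimal sketch, assuming the study area is a list of hypothetical cell IDs and using Python's `random.Random` with a fixed seed so the draw is reproducible.

```python
import random

def sample_background(cells, n, rng):
    """Background: any cell in the study area, known presences included."""
    return rng.sample(cells, n)

def sample_pseudoabsences(cells, presences, n, rng):
    """Pseudoabsences: only cells where the species is NOT known to occur."""
    candidates = [c for c in cells if c not in presences]
    return rng.sample(candidates, n)

study_area = list(range(100))       # hypothetical grid-cell IDs 0..99
presences = {3, 17, 42, 58, 91}     # hypothetical occupied cells
rng = random.Random(0)

bg = sample_background(study_area, 20, rng)
pa = sample_pseudoabsences(study_area, presences, 20, rng)
print(len(bg), len(pa))       # 20 20
print(set(pa) & presences)    # set(): pseudoabsences never hit known presences
```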
Algorithms Applied to the Problem From Richard Pearson et al. 2006
Niche Modeling Has Problems PT 2: Tradeoffs with Algorithms • Many algorithms do not handle asymmetric data (e.g. GLM, GAM) • Many don't handle interaction effects (e.g. BioClim) • Some do not handle nominal environmental variables such as soil classes (e.g. BioClim, ENFA) • Many stochastic algorithms give different solutions even under identical parameterization and input data (e.g. GARP) • We do not know the ‘real’ distribution of species, so we do not know when models are making mistakes and when they are filling knowledge gaps.
Modeling Approaches • Presence-only (bioclimatic envelopes or Mahalanobis distance): points inside the envelope are suitable, or suitability is the distance of points from the mean values (farther away = less suitable) • Presence-absence: GAMs, GLMs, MARS, CARTs. These use a link function or a set of logical statements to describe the multivariate relationship between the mean of the response variable and the predictor variables. Note: best for determining the occupied distribution (not the potential distribution) • Presence-background: Maxent finds the probability distribution that is most spread out, or closest to uniform, subject to constraints given by the observed occurrence records and the environmental conditions across the study area. All regression techniques work with background data as well. • Presence-pseudoabsence: GARP. Rule-set predictions.
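The presence-only distance approach scores a cell by how far its environment lies from the mean of the occurrence points. Mahalanobis distance scales that distance by the covariance of the occurrences, so correlated variables are not double-counted. A minimal two-variable sketch, assuming temperature and precipitation are the only predictors; the 2x2 covariance inverse is written out by hand to stay in the standard library, and all data are hypothetical.

```python
def mean(xs):
    return sum(xs) / len(xs)

def mahalanobis_2d(point, samples):
    """Mahalanobis distance of `point` from the centroid of 2-D `samples`."""
    xs = [s[0] for s in samples]
    ys = [s[1] for s in samples]
    mx, my = mean(xs), mean(ys)
    n = len(samples)
    # Sample covariance matrix [[a, b], [b, d]]
    a = sum((x - mx) ** 2 for x in xs) / (n - 1)
    d = sum((y - my) ** 2 for y in ys) / (n - 1)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    det = a * d - b * b
    # Inverse of a 2x2 matrix is (1/det) * [[d, -b], [-b, a]]
    dx, dy = point[0] - mx, point[1] - my
    d2 = (d * dx * dx - 2 * b * dx * dy + a * dy * dy) / det
    return d2 ** 0.5

# Hypothetical (temperature deg C, precipitation mm) at occurrence points
occ = [(10, 800), (12, 850), (11, 900), (13, 950), (9, 700)]
for cell in [(11, 840), (20, 400)]:
    # Larger distance = environment less like the occurrences = less suitable
    print(cell, round(mahalanobis_2d(cell, occ), 2))
```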
Example of a Presence-Only Envelope Approach: BioClim • Heuristic-based model • Works with presence-only data • Simple to use • 35-dimensional hypercube in climate space (19 dimensions in DIVA-GIS) • Tends to over-predict • Works with a small number of records • Will work in batch mode • Can't make quantitative predictions or provide confidence levels • Used for predicting potential distributions • Versions incorporated into DIVA-GIS
BioClim-Type Modeling • The dot-dash square is the BioClim fit of the data (in two dimensions) • It defines the range of values occupied by a species along every environmental axis • Anything inside this box might be considered “suitable”. From Peterson et al. ms. Ecological Niches and Geographic Distributions: A Modeling Perspective
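The BioClim box is the per-variable range of the occurrence data, and a cell counts as suitable only if it falls inside that range on every axis. A minimal sketch using the full min-max range on two hypothetical variables; many BioClim implementations also trim the envelope to a percentile range, which this sketch omits.

```python
def fit_envelope(samples):
    """Per-variable (min, max) across all occurrence samples."""
    n_vars = len(samples[0])
    return [(min(s[i] for s in samples), max(s[i] for s in samples))
            for i in range(n_vars)]

def in_envelope(point, envelope):
    """True only if the point lies inside the range on EVERY axis."""
    return all(lo <= v <= hi for v, (lo, hi) in zip(point, envelope))

# Hypothetical (temperature deg C, precipitation mm) at occurrence points
occ = [(10, 800), (12, 850), (11, 900), (13, 950), (9, 700)]
env = fit_envelope(occ)              # [(9, 13), (700, 950)]
print(in_envelope((12, 720), env))   # True
print(in_envelope((12, 1000), env))  # False: outside on the precipitation axis
```

Note that (12, 720), a warm-and-dry combination no occurrence comes close to, is still "suitable": the box fills in corners the species may never experience, which helps explain why BioClim tends to over-predict.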
Presence-Background Modeling • No known absences • How, then, do we distinguish false absences from true absences? • Solution (of sorts): compare presences against the background, i.e. the set of grid cells used in modeling • Note: these background points include the input true presences Question: What does this mean for model validation?
Modeling with Maxent • Assume presence records come from some unknown probability distribution, called π • How can we estimate π over a set of grid cells, G? • What is the probability that any one grid cell, g, is suitable for the species?
Modeling in Maxent We can join the presence records for a taxon to the underlying environmental variables and determine the means and SDs of the climate the taxon experiences. [Figure: temperature profiles for Acacia orites]
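Joining presence records to the environmental layers amounts to looking up each layer at each occurrence cell. A minimal sketch, assuming the layers are small in-memory grids sharing one extent and resolution, and that occurrence coordinates have already been converted to (row, col) cells; layer names and values are hypothetical.

```python
import statistics

# Hypothetical 3x3 layers on a shared grid
layers = {
    "temperature": [[5, 8, 11], [7, 10, 13], [9, 12, 15]],
    "precipitation": [[900, 850, 700], [880, 800, 650], [860, 760, 600]],
}
presence_cells = [(0, 1), (1, 1), (1, 2), (2, 1)]  # (row, col)

def extract(layers, cells):
    """Return {layer_name: [value at each cell]} for the given cells."""
    return {name: [grid[r][c] for r, c in cells] for name, grid in layers.items()}

profile = extract(layers, presence_cells)
for name, values in profile.items():
    print(f"{name}: mean={statistics.mean(values):.2f}, "
          f"sd={statistics.stdev(values):.2f}")
```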
Modeling with Maxent • Each grid cell has a set of “features” defined by the environment • Features can be the raw environmental variables or more complex functions of them (e.g. linear, quadratic, product, threshold, hinge) • Values at grid cells with presences are averaged to obtain the observed means (and SDs) of every feature, which are used to estimate π • Constraint: the means of the estimated probability distribution must match the observed means • Among distributions meeting that constraint, find the flattest one (the one that maximizes entropy)
Modeling with Maxent • Maxent is an iterative approach • It starts with a fully uniform distribution over all grid cells • It runs an optimization routine to maximize “gain” • Gain is a likelihood statistic: how much more probable the presences are under the model than under a uniform distribution over the background • Gain rises and eventually levels off (maximizing fit), yielding the final probability distribution • That distribution provides the fitted coefficients on the predictor variables • These coefficients are used to assess the probability of presence
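The iterative procedure can be sketched in a few lines. This is not the Maxent software's own algorithm (which uses sequential coordinate updates, several feature classes and regularization); it is a minimal gradient-ascent version, assuming linear features only, eight hypothetical grid cells with standardized values, and a fixed learning rate. It starts from the uniform distribution (all weights zero) and repeatedly nudges the weights so the distribution's feature means move toward the means at presence cells, which raises the gain.

```python
import math

def gibbs(weights, features):
    """Gibbs distribution q(g) = exp(w . f(g)) / Z over all grid cells g."""
    scores = [sum(w * f for w, f in zip(weights, fg)) for fg in features]
    top = max(scores)  # subtract the max before exp() for numerical stability
    expd = [math.exp(s - top) for s in scores]
    z = sum(expd)
    return [e / z for e in expd]

def gain(weights, features, presence_idx):
    """Mean log q at presence cells, shifted so the uniform distribution scores 0."""
    q = gibbs(weights, features)
    n = len(features)
    return sum(math.log(q[i] * n) for i in presence_idx) / len(presence_idx)

def fit_maxent(features, presence_idx, lr=0.2, iters=3000):
    """Gradient ascent on the (unregularized) log likelihood of the presences."""
    k = len(features[0])
    weights = [0.0] * k  # all-zero weights = uniform distribution
    # Observed feature means at presence cells: the constraint targets
    emp = [sum(features[i][j] for i in presence_idx) / len(presence_idx)
           for j in range(k)]
    for _ in range(iters):
        q = gibbs(weights, features)
        model = [sum(p * fg[j] for p, fg in zip(q, features)) for j in range(k)]
        # Gradient = observed mean - model mean; zero once they match
        weights = [w + lr * (e - m) for w, e, m in zip(weights, emp, model)]
    return weights

# Hypothetical standardized (temperature, precipitation) at 8 grid cells
features = [(-1.5, 0.5), (-1.0, -1.0), (-0.5, 1.0), (0.0, 0.0),
            (0.5, 0.8), (1.0, -0.5), (1.5, 1.2), (0.2, -1.5)]
presences = [2, 4, 6, 3]  # indices of cells with occurrence records

w = fit_maxent(features, presences)
print("weights:", [round(x, 3) for x in w])
print("gain at start:", gain([0.0, 0.0], features, presences))  # 0.0
print("gain after fitting:", round(gain(w, features, presences), 3))
```

At convergence the fitted distribution's feature means match the observed means at the presences, which is exactly the constraint described on the previous slide.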
Maxent • Maxent is run by first selecting a set of input environmental data layers in a common GIS format (gridded ASCII .asc files) • Next, select a set of species occurrence locations defined by lat/lon • It is important to split the data into training and testing sets: training data build the model, and testing data are used for validation
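Splitting occurrence records into training and testing sets is easy to script before loading them into a model. A minimal sketch, assuming records are (longitude, latitude) tuples and a random 75/25 split with a fixed seed; the 25% test fraction is a common choice, not a requirement, and the coordinates are hypothetical.

```python
import random

def split_records(records, test_fraction=0.25, seed=42):
    """Randomly partition records into (training, testing) lists."""
    shuffled = list(records)  # copy so the caller's list is left untouched
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical occurrence coordinates (longitude, latitude)
occurrences = [(-111.8, 40.7), (-112.1, 41.0), (-110.9, 39.5), (-113.2, 38.9),
               (-111.5, 40.2), (-112.7, 39.8), (-110.4, 41.3), (-114.0, 40.1)]
train, test = split_records(occurrences)
print(len(train), len(test))  # 6 2
```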
More on Maxent • Maximum spread = maximizing the log likelihood of the data at the presence sites minus a penalty term (think AIC) • The penalty term is basically a weighting based on how much information the environmental data add to the model • The best weights are found by a sequential updating algorithm run for a specified number of iterations (you can change this parameter)
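The penalized objective described on this slide can be written compactly. A sketch in one standard notation, assuming q_λ is the fitted distribution over the grid cells G, f_j are the features, x_1, …, x_m are the m presence cells and β_j ≥ 0 are the regularization weights; how Maxent scales each β_j from the regularization multiplier is not shown here and should be checked in its documentation.

```latex
\begin{align*}
q_\lambda(g) &= \frac{\exp\left(\sum_j \lambda_j f_j(g)\right)}{Z_\lambda},
\qquad Z_\lambda = \sum_{g' \in G} \exp\left(\sum_j \lambda_j f_j(g')\right) \\
\hat{\lambda} &= \arg\max_{\lambda}
\left[ \frac{1}{m}\sum_{i=1}^{m} \ln q_\lambda(x_i) \;-\; \sum_j \beta_j \lvert \lambda_j \rvert \right] \\
\mathrm{gain}(\lambda) &= \frac{1}{m}\sum_{i=1}^{m} \ln q_\lambda(x_i) + \ln \lvert G \rvert
\end{align*}
```

The uniform distribution assigns every cell 1/|G|, so its gain is exactly 0. Larger β_j penalize large weights more heavily and give smoother, broader predictions, matching the effect of the regularization parameter.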
More on Maxent • The Maxent regularization parameter sets the “penalty function”: smaller values tend to overfit models (typically leading to smaller geographic distributions), and larger values do the opposite • You can choose cumulative or logistic output. Logistic output is interpreted as probability of presence (usually what you want) • Definitely create response curves • What about features?
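Cumulative and logistic outputs are two transforms of the same raw fitted distribution. A minimal sketch, assuming the transforms as described in the Maxent documentation: cumulative = 100 × the total raw probability of all cells whose raw value is at most the cell's own; logistic = c·raw / (1 + c·raw) with c = e^H, where H is the entropy of the raw distribution. The raw values are hypothetical and sum to 1, as a raw Maxent output over the background does.

```python
import math

def cumulative_output(raw):
    """100 * total raw probability of cells whose raw value is <= each cell's."""
    return [100 * sum(r for r in raw if r <= x) for x in raw]

def logistic_output(raw):
    """c*raw / (1 + c*raw), with c = exp(entropy of the raw distribution)."""
    entropy = -sum(p * math.log(p) for p in raw if p > 0)
    c = math.exp(entropy)
    return [c * p / (1 + c * p) for p in raw]

# Hypothetical raw output over 5 grid cells (sums to 1)
raw = [0.05, 0.10, 0.15, 0.30, 0.40]
print([round(v, 1) for v in cumulative_output(raw)])  # [5.0, 15.0, 30.0, 60.0, 100.0]
print([round(v, 2) for v in logistic_output(raw)])
```

Both transforms preserve the ranking of cells, so a threshold-free statistic such as AUC is unaffected by which output you choose; only the scale and interpretation change. For a uniform raw distribution, every cell gets a logistic value of 0.5.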