Quantitative Methods in Palaeoecology and Palaeoclimatology PAGES Valdivia October 2010. John Birks. Introduction and Overview of Major Numerical Methods. Contents. What is palaeoecology? What are palaeoecological data? Why attempt quantification in palaeoecology?
Quantitative Methods in Palaeoecology and Palaeoclimatology. PAGES Valdivia, October 2010. John Birks. Introduction and Overview of Major Numerical Methods
Contents What is palaeoecology? What are palaeoecological data? Why attempt quantification in palaeoecology? What are the main approaches to quantification in palaeoecology? What are the aims of the course? What is the level of the course? What are the major numerical techniques in quantitative palaeoecology? How to transform palaeoecological data? What are the basics behind the major techniques used in quantitative palaeoecology?
WHAT IS PALAEOECOLOGY? Palaeoecology is, in theory, the ecology of the past and is a combination of biology and geology. In practice, it is largely concerned with the reconstruction of past communities, landscapes, environments, and ecosystems. It is difficult to study the ecology of organisms in the past and hence deduce organism – environment relationships in the past. Often the only record of the past environment is the fossil record. Cannot use the fossil record to reconstruct the past environment, and then use the past environment to explain changes in the fossil record!
There are several approaches to palaeoecology • Descriptive – basic description, common • Narrative - ‘story telling’, frequent • Analytical - rigorous hypothesis testing, rare • Qualitative - common • Quantitative – increasing • Descriptive - common • Deductive - rare, but increasing • Experimental – very rare
Why Study Palaeoecology? • 1. Present-day ecology benefits from historical perspective • "Palaeoecology can provide the only record of complete in situ successions. The framework of classical succession theory (probably the most well known and widely discussed notion of ecology) rests largely upon the inferences from separated areas in different stages of a single hypothetical process (much like inferring phylogeny from the comparative analogy of modern forms). Palaeoecology can provide direct evidence to supplement ecological theory." S.J. Gould, 1976 • "There is scarcely a feature in the countryside today which does not have its explanation in an evolution whose roots pass deep into the twilight of time. Human hands have played a leading role in this evolutionary process, and those who study vegetation cannot afford to neglect history." C.D. Pigott, 1978 • 2. Past analogue for future • 3. Intellectual challenge and desire to understand our past • 4. Reconstruction of past environment important to evaluate extent of natural variability • 5. 'Coaxing history to conduct experiments' (Deevey, 1969) • 6. Fun!
Mechanisms and modes of studying environmental change over different timescales (modified from Oldfield, 1983)
Philosophy of Palaeoecology Descriptive historical science, depends on inductive reasoning. Uniformitarianism “present is key to the past”. Method of multiple working hypotheses. Simplicity “Ockham’s razor”. Sound taxonomy essential. Language – largely biological and geological. Data frequently quantitative and multivariate.
WHAT ARE PALAEOECOLOGICAL DATA? Presence/absence or, more commonly, counts of fossil remains in sediments (lake muds, peats, marine sediments, etc). Fossils - pollen diatoms chironomids cladocera radiolaria testate amoebae mollusca ostracods plant macrofossils foraminifera chrysophyte cysts - biochemical markers (e.g. pigments, lipids, DNA) Sediments - geochemistry grain size physical properties composition magnetics stable isotopes (C,N,O)
Data are usually quantitative and multivariate (many variables (e.g. 30-300 taxa), many samples (50-300)). Quantitative data usually expressed as percentages of some sum (e.g. total pollen). Data may contain many zero values (taxa absent in many samples). Closed, compositional data, containing many zero values, strong inter-relationships between variables. If not percentages, data are presence/absence, categorical classes (e.g. <5, 5-10, 10-25, >25 individuals), or ‘absolute’ values (e.g. pollen grains cm⁻² year⁻¹). Samples usually in known stratigraphical order (time sequence). Some types of data may be modern ‘surface’ samples (e.g. top 1 cm of lake mud) and associated modern environmental data. Such data form ‘training sets’ or ‘calibration data-sets’.
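The closure and zero-value properties described above can be illustrated with a minimal Python sketch. The taxon names and counts are made up for illustration and are not from the course material; the point is only that converting counts to percentages of a sum forces every sample to total 100, and that absent taxa show up as zeros.

```python
# Hypothetical pollen counts (made-up numbers): rows are samples, columns are taxa.
taxa = ["Pinus", "Betula", "Poaceae", "Alnus"]
counts = [
    [120, 60, 20, 0],
    [80, 0, 100, 20],
    [0, 150, 50, 0],
]


def to_percentages(row):
    """Express one sample's counts as percentages of that sample's own sum."""
    total = sum(row)
    if total == 0:
        raise ValueError("sample contains no counted fossils")
    return [100.0 * c / total for c in row]


percentages = [to_percentages(row) for row in counts]

# Closure: every sample now sums to 100, so the taxa cannot vary independently.
row_sums = [sum(p) for p in percentages]

# Share of zero entries (taxa absent from a sample) in the whole matrix.
n_cells = sum(len(row) for row in counts)
n_zero = sum(1 for row in counts for c in row if c == 0)
zero_fraction = n_zero / n_cells

for sample, pct in zip(counts, percentages):
    print(sample, "->", [round(p, 1) for p in pct])
print("fraction of zero values:", round(zero_fraction, 2))
```

Because each row is constrained to sum to 100, an increase in one taxon's percentage must be matched by decreases elsewhere, which is one source of the strong inter-relationships between variables.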
Palaeoecological data are thus usually • stratigraphical sequences at one point in space or samples from one point in time but geographically dispersed • percentage data • contain many zero values
Multivariate data matrix. Matrix X with n columns x m rows: an n x m matrix, of order (n x m). Subscripts: $X_{21}$ is the element in row two, column one; $X_{ik}$ is the element in row i, column k.
WHY ATTEMPT QUANTIFICATION IN PALAEOECOLOGY? • Data are very time consuming (and expensive) to collect. • Data are quantitative counts. Why spend time on counting if the quantitative aspect of the data is then ignored? • Data are complex, multivariate, and often stratigraphically ordered. Methods to help summarise, describe, characterise, and interpret data are needed (Lectures 2, 3 and 4). • Quantitative environmental reconstructions (e.g. lake-water pH, mean July temperature) important in much environmental science (e.g. to validate model hindcasts or back-predictions) (Lecture 5). • Often easier to test hypotheses using numerical methods (Lecture 6).
Reasons for Quantifying Palaeoecology 1: Data simplification and data reduction “signal from noise”. 2: Detect features that might otherwise escape attention. 3: Hypothesis generation, prediction, and testing. 4: Data exploration as aid to further data collection. 5: Communication of results of complex data. Ease of display of complex data. 6: Aids communication and forces us to be explicit. “The more orthodox amongst us should at least reflect that many of the same imperfections are implicit in our own cerebrations and welcome the exposure which numbers bring to the muddle which words may obscure”. D. Walker (1972) 7: Tackle problems not otherwise soluble. Hopefully better science. 8: Fun!
WHAT ARE THE MAIN APPROACHES TO QUANTIFICATION IN PALAEOECOLOGY? • Model building • explanatory • statistical • Hypothesis generation ‘exploratory data analysis’ (EDA) • detective work • Hypothesis testing ‘confirmatory data analysis’ (CDA) • CDA and EDA – different aims, philosophies, methods • “We need both exploratory and confirmatory” • J.W. Tukey (1980)
Model Building in Palaeoecology Model building approach Cause of sudden and dramatic extinction of large mammals in North America 10-12,000 years ago at end of Pleistocene. One hypothesis - arrival and expansion of humans into the previously uninhabited North American continent, resulting in overkill and extinction. Model - arrival of humans 12,000 years ago across Bering Land Bridge. Start model with 100 humans at Edmonton, Alberta. Population doubles every 30 years. Wave of 300,000 humans reaching Gulf of Mexico in 300 years, populated area of 780 x 10⁶ ha. Population could easily kill a biomass of 42 x 10⁹ kg corresponding to an animal density of modern African plains. Model predicts mammal extinction in 300 years, then human population crash to new, low population density.
A hypothetical model for the spread of man and the overkill of large mammals in North America. Upon arrival the population of hunters reached a critical density, and then moved southwards in a quarter-circle front. One thousand miles south of Edmonton, the front is beginning to sweep past radiocarbon-dated Palaeoindian mammoth kill sites, which will be overrun in less than 2000 years. By the time the front has moved nearly 2000 miles to the Gulf of Mexico, the herds of North America will have been hunted to extinction. (After Mosimann and Martin, 1975.)
Figure: paired flow diagrams for CONFIRMATORY DATA ANALYSIS and EXPLORATORY DATA ANALYSIS. Box labels: real world ‘facts’; hypotheses; observations and measurements; data; data analysis; statistical testing; patterns; ‘information’; hypothesis testing; hypotheses; decisions; theory.
Figure: relationship between exploratory and confirmatory data analysis. Labels: biological data Y; additional (e.g. environmental) data X; underlying statistical model (e.g. linear or unimodal response); exploratory data analysis; description; confirmatory data analysis; testable ‘null hypothesis’; rejected hypotheses.
Figure: The Popperian hypothetico-deductive method, after Underwood and others. H0 = null hypothesis; HA = alternative hypothesis. Diagram labels: theory/paradigm; observation; induction; deduction; scientific H0 and HA; prediction; statistical H0 and HA; conceptual design of study, choice of format (experimental, non-experimental) and classes of data; sampling or experimental design; data collection; analysis; evaluate statistical H0, HA; evaluate prediction; evaluate scientific H0, HA; evaluate theory/paradigm.
Data diving with cross-validation: an investigation of broad-scale gradients in Swedish weed communities. Erik Hallgren, Michael W. Palmer and Per Milberg. Journal of Ecology, 1999, 87, 1037-1051. Figure: Flow chart for the sequence of analyses. Solid lines represent the flow of data and dashed lines the flow of analysis. Box labels: full data set; remove observations with missing data; clean data set; random split; exploratory data set; confirmatory data set; ideas for more analysis; hypotheses; choice of variables; some previously removed data; hypothesis tests; combined data set; analyses for display; results.
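The ‘random split’ step of this flow chart can be sketched in a few lines of Python. This is only an illustration of the general idea (explore one half freely, test the resulting hypotheses on the untouched half); the function name, the 50:50 fraction, the seed, and the plot labels are assumptions, not details taken from the paper.

```python
import random


def split_exploratory_confirmatory(observations, fraction=0.5, seed=0):
    """Randomly split observations into an exploratory and a confirmatory set.

    `fraction` is the share assigned to the exploratory set (assumed 0.5 here).
    A fixed seed makes the split reproducible.
    """
    rng = random.Random(seed)
    shuffled = list(observations)
    rng.shuffle(shuffled)
    cut = int(round(len(shuffled) * fraction))
    return shuffled[:cut], shuffled[cut:]


# Hypothetical 'clean data set': 20 plots with no missing data.
clean = [f"plot_{i:03d}" for i in range(20)]
exploratory, confirmatory = split_exploratory_confirmatory(clean)

print(len(exploratory), "plots for hypothesis generation")
print(len(confirmatory), "plots held back for hypothesis tests")
```

Holding the confirmatory set back until the hypotheses are fixed is what keeps the later tests honest: patterns found by unrestricted exploration are not tested on the same data that suggested them.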
WHAT ARE THE AIMS OF THE COURSE? Provide introductory understanding to the most appropriate methods for the numerical analysis of complex palaeoecological data. Recent maturation of methods. Provide introduction to what these methods do and do not do. Provide some guidance as to when and when not to use particular methods. Provide an outline of major assumptions, limitations, strengths, and weaknesses of different methods. Indicate to you when to seek expert advice. Encourage numerical thinking (ideas, reasons, potentialities behind the techniques). Not so concerned here with numerical arithmetic (the numerical manipulations involved).
Teaching of statistics in ecology At its best, statistical analysis sharpens thinking about data, reveals new patterns, prompts creative thinking, and stimulates productive discussions in multi-disciplinary research groups. For many scientists, these positive possibilities of statistics are over-shadowed by negatives; abstruse assumptions, emphasis of things one can’t do, and convoluted logic based on hypothesis rejection. One colleague’s reaction to this Special Feature (on statistical analysis of ecosystem studies) was that “statistics is the scientific equivalent of a trip to the dentist.” This view is probably widespread. It leads to insufficient awareness of the fact that statistics, like ecology, is a vital, evolving discipline with ever-changing capabilities. At the end of the semester, could my students fully understand all of the statistical methods used in a typical issue of Ecology? Probably not, but they did have the foundation to consider the methods if authors clearly described their approach. Statistics can still mislead students, but students are less apt to see all statistics as lies and more apt to constructively criticise questionable methods. They can dissect any approach by applying the conceptual terms used throughout the semester. Students leave the course believing that statistics does, after all, have relevance, and that it is more accessible than they believed at the beginning of the semester. J.S. Clark 1994
A warning! “Truths which can be proved can also be known by faith. The proofs are difficult and can only be understood by the learned; but faith is necessary also to the young, and to those who, from practical preoccupations, have not the leisure to learn. For them, revelation suffices.” Bertrand Russell 1946 The History of Western Philosophy
“It cannot be too strongly emphasised that a long mathematical argument can be fully understood on first reading only when it is very elementary indeed, relative to the reader’s mathematical knowledge. If one wants only the gist of it, he may read such material once only, but otherwise he may expect to read it at least once again. Serious reading of mathematics is best done sitting bolt upright on a hard chair at a desk. Pencil and paper are indispensable.” L Savage 1972 The Foundations of Statistics. BUT: “A journey of a thousand miles begins with a single step” Lao Tsu
WHAT IS THE LEVEL OF THE COURSE? Approach from practical palaeoecological, biological, and geological viewpoint, not statistical theory viewpoint. Assume no detailed background in matrix algebra, eigenanalysis, or statistical theory. Emphasis on techniques that are palaeoecologically realistic and useful and that are computationally feasible.
WHAT ARE THE MAJOR NUMERICAL TECHNIQUES IN PALAEOECOLOGY? • Exploratory data analysis • 1a. Numerical summaries - means • medians • standard deviations • ranges • 1b. Graphical approaches - box-and-whisker plots • scatter plots • stratigraphical diagrams • 1c. Multivariate data analysis - classification • ordination (including discriminant analysis)
Numerical Techniques in Palaeoecology (cont.) • Confirmatory data analysis or hypothesis testing • Statistical modelling (regression analysis) • Quantitative environmental reconstruction (calibration = inverse regression) • Time-series analysis
MAJOR USES OF NUMERICAL METHODS IN PALAEOECOLOGY • Data collection and assessment • Identification • Error estimation • Data summarisation – summarise major patterns • Single data set • Two or more stratigraphical sequences • Two or more geographical data sets • Data analysis – estimate particular numerical characteristics • Sequence splitting • Rate-of-change analysis • Time-series analysis • Environmental reconstructions • Data interpretation • Vegetation reconstruction • Causative or ‘forcing’ factors
1. Exploratory Data Analysis 1a. Summary Statistics • Measures of location ‘typical value’ • (1) Arithmetic mean • (2) Weighted mean • (3) Mode ‘most frequent’ value • (4) Median ‘middle values’ Robust statistic • (5) Trimmed mean 1 or 2 extreme observations at both tails deleted • (6) Geometric mean R
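The six measures of location listed above can be computed with Python's standard library, as in the sketch below. The data values are invented, the helper names are hypothetical, and the trimmed mean drops exactly one observation from each tail, which is one of the two options the slide mentions.

```python
import statistics

# Made-up data with one extreme value, to show which measures are robust.
values = [2.0, 3.0, 3.0, 4.0, 5.0, 7.0, 25.0]


def weighted_mean(xs, ws):
    """Weighted mean: sum(w * x) / sum(w)."""
    return sum(w * x for x, w in zip(xs, ws)) / sum(ws)


def trimmed_mean(xs, k=1):
    """Mean after deleting the k smallest and k largest observations."""
    ordered = sorted(xs)
    if len(ordered) <= 2 * k:
        raise ValueError("not enough observations to trim")
    return statistics.fmean(ordered[k:len(ordered) - k])


arithmetic = statistics.fmean(values)        # pulled upwards by the value 25
median = statistics.median(values)           # 'middle value', robust
mode = statistics.mode(values)               # 'most frequent' value
trimmed = trimmed_mean(values, k=1)          # extremes at both tails deleted
geometric = statistics.geometric_mean(values)
weighted = weighted_mean([1.0, 3.0], [3.0, 1.0])

print("arithmetic mean:", arithmetic)
print("median:", median)
print("mode:", mode)
print("trimmed mean:", round(trimmed, 2))
print("geometric mean:", round(geometric, 2))
print("weighted mean of 1 and 3 with weights 3 and 1:", weighted)
```

In this example the single extreme observation moves the arithmetic mean well above the median and the trimmed mean, which is why the latter two are described as robust.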
(B) Measures of dispersion (1) Range (slide example: A = 0.37, B = 0.07) (2) Interquartile range, between the quartiles Q1 and Q3 (Q2 is the median) ‘percentiles’ (3) Mean absolute deviation: mean absolute difference from the mean, ignoring negative signs (slide example: 10/n = 2.5)
(B) Measures of dispersion (cont.) (4) Variance and standard deviation: variance = mean of squares of deviations from the mean; SD = square root of the variance (root mean square value) (5) Coefficient of variation: relative standard deviation = SD / mean; percentage relative SD = 100 x SD / mean (independent of units) (6) Standard error of mean = SD / √n (R)
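The measures of dispersion on these two slides can be sketched with Python's standard library as follows. The data are invented. Two choices here are assumptions rather than anything stated on the slides: the variance divides by n (the literal ‘mean of squares of deviations’), with the n - 1 sample version shown alongside, and the quartiles use the default method of `statistics.quantiles`, which is only one of several quartile definitions.

```python
import math
import statistics

values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # made-up data
n = len(values)
mean = statistics.fmean(values)

# (1) Range and (2) interquartile range.
value_range = max(values) - min(values)
q1, q2, q3 = statistics.quantiles(values, n=4)  # quartile definitions vary
iqr = q3 - q1

# (3) Mean absolute deviation: average deviation from the mean, signs ignored.
mean_abs_dev = sum(abs(x - mean) for x in values) / n

# (4) Variance and standard deviation.
pop_variance = statistics.pvariance(values)    # divides by n
pop_sd = statistics.pstdev(values)
sample_sd = statistics.stdev(values)           # divides by n - 1

# (5) Coefficient of variation as a percentage (independent of units).
cv_percent = 100.0 * pop_sd / mean

# (6) Standard error of the mean, here based on the sample SD (an assumption).
standard_error = sample_sd / math.sqrt(n)

print("range:", value_range, "IQR:", iqr)
print("mean absolute deviation:", mean_abs_dev)
print("variance:", pop_variance, "SD:", pop_sd)
print("CV (%):", round(cv_percent, 1))
print("standard error of mean:", round(standard_error, 3))
```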
1b. Graphical Approaches (A) Graphical display of univariate data Box-and-whisker plots – box plots. 95% CI around the median: $\text{median} \pm 1.58\,(Q_3 - Q_1)/\sqrt{n}$, where $Q_1$ and $Q_3$ are the lower and upper quartiles. (R)
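The approximate 95% interval around the median used for box-plot notches can be sketched as below. It assumes the usual reading of the slide's formula, median ± 1.58 × (Q3 − Q1) / √n; the function name and the measurements are hypothetical, and the quartiles again depend on the definition used by `statistics.quantiles`.

```python
import math
import statistics


def median_notch_interval(values):
    """Return (lower, median, upper) using median +/- 1.58 * IQR / sqrt(n)."""
    n = len(values)
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile definitions vary
    med = statistics.median(values)
    half_width = 1.58 * (q3 - q1) / math.sqrt(n)
    return med - half_width, med, med + half_width


# Hypothetical measurements (e.g. twelve wing lengths in mm).
measurements = [118, 121, 119, 125, 122, 120, 127, 123, 121, 124, 126, 122]
lower, med, upper = median_notch_interval(measurements)

print("median:", med)
print("approximate 95% interval:", round(lower, 2), "to", round(upper, 2))
```

When two box plots are drawn side by side, non-overlapping intervals of this kind are commonly taken as rough evidence that the two medians differ.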
Box plots for samples of more than ten wing lengths of adult male red-winged blackbirds taken in winter at 12 localities in the southern United States, and in order of generally increasing latitude. From James et al. (1984a). Box plots give the median, the range, and upper and lower quartiles of the data.
Three-dimensional perspective view for the first three variables of the iris data. Plants of the three species are coded A,B and C. Triangular arrangement of all pairwise scatter plots for four variables. Variables describe length and width of sepals and petals for 150 iris plants, comprising 3 species of 50 plants.
(C) Graphical display of multivariate data FOURIER PLOTS Andrews (1972) Plot multivariate data into a function $f_x(t) = x_1/\sqrt{2} + x_2 \sin t + x_3 \cos t + x_4 \sin 2t + x_5 \cos 2t + \dots$, where data are [x1, x2, x3, x4, x5 ... xm]. Plot over range -π ≤ t ≤ π. Each object is a curve. Function preserves distances between objects. Similar objects will be plotted close together. MULTPLOT
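A minimal sketch of a Fourier (Andrews) plot function is given below, assuming the standard form of Andrews' function, f(t) = x1/√2 + x2 sin t + x3 cos t + x4 sin 2t + x5 cos 2t + ... The two objects are invented four-variable measurements. The sketch also checks the distance-preserving property numerically: the integrated squared difference between two curves over -π ≤ t ≤ π equals π times the squared Euclidean distance between the objects.

```python
import math


def andrews_curve(x, t):
    """Evaluate Andrews' function for one object x at angle t."""
    total = x[0] / math.sqrt(2.0)
    for k, value in enumerate(x[1:], start=1):
        harmonic = (k + 1) // 2          # 1, 1, 2, 2, 3, 3, ...
        if k % 2 == 1:
            total += value * math.sin(harmonic * t)
        else:
            total += value * math.cos(harmonic * t)
    return total


def curve_distance_squared(x, y, steps=720):
    """Integrate (f_x(t) - f_y(t))^2 over [-pi, pi] with the midpoint rule."""
    width = 2.0 * math.pi / steps
    total = 0.0
    for j in range(steps):
        t = -math.pi + (j + 0.5) * width
        total += (andrews_curve(x, t) - andrews_curve(y, t)) ** 2
    return total * width


# Two hypothetical objects described by four variables each.
a = [5.1, 3.5, 1.4, 0.2]
b = [6.3, 3.3, 6.0, 2.5]

euclidean_squared = sum((p - q) ** 2 for p, q in zip(a, b))
integral = curve_distance_squared(a, b)

print("pi * squared Euclidean distance:", round(math.pi * euclidean_squared, 6))
print("integrated squared curve difference:", round(integral, 6))
```

To draw the plot itself, each object's curve would be evaluated on a grid of t values and the curves overlaid; objects with similar values then trace similar curves.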
Other types of graphical display of multivariate data involve some dimension reduction methods (e.g. ordination or clustering techniques), namely multivariate data analysis.
1c. Multivariate Data Analysis EUROPEAN FOOD (From A Survey of Europe Today, The Reader’s Digest Association Ltd.) Percentage of all households with various foods in house at time of questionnaire. Foods by countries. Country
Clustering Dendrogram showing the results of minimum variance agglomerative cluster analysis of the 16 European countries for the 20 food variables listed in the table. Key: Countries: A Austria, B Belgium, CH Switzerland, D West Germany, E Spain, F France, GB Great Britain, I Italy, IRL Ireland, L Luxembourg, N Norway, NL Holland, P Portugal, S Sweden, SF Finland
Ordination Key: Countries: A Austria, B Belgium, CH Switzerland, D West Germany, E Spain, F France, GB Great Britain, I Italy, IRL Ireland, L Luxembourg, N Norway, NL Holland, P Portugal, S Sweden, SF Finland Correspondence analysis of percentages of households in 16 European countries having each of 20 types of food.
Minimum spanning tree fitted to the full 15-dimensional correspondence analysis solution superimposed on a rotated plot of countries from previous figure.
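A minimum spanning tree of this kind can be sketched with Prim's algorithm, as below. The slide does not say which algorithm was used, so Prim's algorithm is an assumption, and the four points and their two-dimensional coordinates are invented; in practice the distances would be computed in the full-dimensional ordination space and the resulting links drawn on the two-dimensional plot.

```python
import math


def minimum_spanning_tree(points):
    """Prim's algorithm on Euclidean distances.

    `points` maps a label to a coordinate tuple.
    Returns a list of (distance, from_label, to_label) edges.
    """
    labels = list(points)
    in_tree = [labels[0]]
    edges = []
    while len(in_tree) < len(labels):
        best = None
        for a in in_tree:
            for b in labels:
                if b in in_tree:
                    continue
                d = math.dist(points[a], points[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        edges.append(best)
        in_tree.append(best[2])
    return edges


# Hypothetical ordination scores for four objects.
scores = {"A": (0.0, 0.0), "B": (1.0, 0.0), "C": (1.0, 2.0), "D": (5.0, 2.0)}
tree = minimum_spanning_tree(scores)

for distance, start, end in tree:
    print(f"{start} - {end}: {distance:.2f}")
print("total length:", sum(d for d, _, _ in tree))
```

Links in the tree that join points lying far apart on the plot flag places where the low-dimensional picture distorts the distances in the full solution.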
Geometric models Pollen data - 2 pollen types x 15 samples Depths are in centimetres, and the units for pollen frequencies may be either in grains counted or percentages. Adam (1970)
Alternate representations of the pollen data Palynological representation Geometrical representation In (a) the data are plotted as a standard diagram, and in (b) they are plotted using the geometric model. Units along the axes may be either pollen counts or percentages. Adam (1970)
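The geometric representation can be made concrete with a short sketch: each sample becomes a point whose coordinates are its values for the two pollen types, and the dissimilarity between two samples is the distance between their points. The depths and values below are invented and are not Adam's (1970) data.

```python
import math

# Hypothetical samples: depth in cm -> (pollen type 1, pollen type 2).
samples = {
    10: (30.0, 10.0),
    20: (33.0, 14.0),
    30: (45.0, 30.0),
    40: (44.0, 31.0),
}

depths = sorted(samples)

# Distance between each pair of stratigraphically adjacent samples.
adjacent_distances = [
    (upper, lower, math.dist(samples[upper], samples[lower]))
    for upper, lower in zip(depths, depths[1:])
]

for upper, lower, distance in adjacent_distances:
    print(f"{upper} cm to {lower} cm: distance {distance:.2f}")
```

A large jump between adjacent samples (here between 20 and 30 cm) marks a major change in pollen composition, and the same construction extends unchanged to many pollen types, where the points can no longer be drawn directly.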
Geometrical model of a vegetation space containing 52 records (stands). A: A cluster within the cloud of points (stands) occupying vegetation space. B: 3-dimensional abstract vegetation space: each dimension represents an element (e.g. proportion of a certain species) in the analysis (X Y Z axes). A, the results of a classification approach (here attempted after ordination) in which similar individuals are grouped and considered as a single cell or unit. B, the results of an ordination approach in which similar stands nevertheless retain their unique properties and thus no information is lost (X1 Y1 Z1 axes). N. B. Abstract space has no connection with real space from which the records were initially collected.