Cajo J. F. ter Braak Biometris, Wageningen University and Research Centre Cajo.terBraak@wur.nl www.biometris.wur.nl/UK/S

Cajo J. F. ter Braak Biometris, Wageningen University and Research Centre Cajo.terBraak@wur.nl www.biometris.wur.nl/UK/Staff/Cajo+ter+Braak History of canonical correspondence analysis in ecologycorrespondence analysis without chi-squareandthe barycentric principle as ecological niche model Biometris quantitative methods in life and earth sciences 25 year 1986-2011

History of canonical correspondence analysis in ecology • Introduction: • Papers 1986-88 & citation • Example data, eigen equation & diagram • History of CA in ecology • Theory, WA method • From WA to CA, algorithm • Transition formulas, but why optimal? • History of CCA in ecology • my prehistory: absences don’t count 1,2,3,4,5 • my history 1-3 • Theory of CCA • From WA to CA to CCA, algorithm • Eigen equation and its origin • My CCA derivation 1-2 • Duality diagram • CCA, triplets & biplots • Ordination diagram • Description, Data, & example • CCA in R • CCA, RC model & ideal point discriminant analysis • Later landmarks • Summary of CA-related methods in ecology • Open questions

Canonical Correspondence Analysis (CCA) 25 years 1986 • New multivariate technique to relate community composition to known variation in the environment • Extends Correspondence Analysis (CA) • Two steps (1. CA, 2. interpret axes) →one step (CCA) • Ordination diagram: the pattern of community variation that can be explained best by known environment CCA = (M)CA with external linear restrictions on the row points from the abstract:

1987 duality diagram of CCA (ACC)

barycentric principle

Citations, Canoco and usage 3000+ citations to ter Braak (86,87,95) on CCA

DATA: CtB, Ecology 1986 Spider abundance Yn×m and environment Zn×p • n = 28 sites (pitfalls), m = 12 spider species , p = 6 environmental variables sites species YT= ZT= env. var. discretized data row/column order from CCA

Eigenvector method to relate two data matrices Y and Z Y with non-negative values {yik}nm(e.g. abundances of of m species in n sites) Z quantitative and/or nominal predictor matrix {zij}np(on p environmental variables) Eigen equation: with Dc = diag(y+1 ,...,y+m) , Dr = diag(y1+ ,...,yn+) Looks like: canonical correlation analysis, but Dc/r ..?... Special case: Z = In×n or indicator matrix  correspondence analysis Canonical Correspondence Analysis (CCA)? CtB, Ecology 1986 Y~Z (ZTY (Dc-1) YTZ – λZTDrZ)b =0 eigen eq in appendix

CtB, Ecology 1986 CCA ordination diagram (factorial plane) Projection of species points on BARE SAND arrow shows approx. ranking of the species centroids along this variable ukspecies xi*sites c* env var joint plot { x*, u }of Y as in CA biplot {u, c*} of Dc-1YT Z={weighted average of species wrt environment variables} weighted least squares biplot

yik History of CA in ecology builds on (1) Theory: “a plant species does not grow when it is either too wet or too dry” Liebig’s law: a species requires a minimum amount of a resource (e.g. N): agriculture Shelford’s law of tolerance (1919): but also does not tolerate more than a certain maximum →resource Ecological niche: region where species actually grows Niches vary among species preference model

Independently: Roux&Roux, RSA,1967 History of CA in ecology builds on (2) Method of weighted averaging (WA) to obtain • preference or indicator value of a species wrt to a physical gradient, e.g. moisture • WA of moisture values at sites where species occurs • estimate of moisture value at a site • WA of indicator values of species that occur there Gause(1930), Ellenberg (1948), Whittaker (1948) Idea: iterate to replace a manifest gradient by the best latent one → Reciprocal averaging (Hill, 1973,74) == CA “le principe barycentric” both ways

From weighted averaging (WA) to CA (1) yik i  sites k  species yik ≥ 0 • WA: uk = i yik moisturei /i yik →moisture →moisture uk = centroid , weighted average wrt moisture, “optimum”

From weighted averaging (WA) to CA (1) yik i  sites k  species yik ≥ 0 • WA: uk = iyikmoisturei/iyik For presence/absence data yik is 1 or 0. Example: species k found at sites with moisture values 20, 30 and 40 and nowhere else, then uk = (1*20+1*30+1*40)/(1+1+1)=90/3 = 30 →moisture →moisture uk = centroid , weighted average wrt moisture, “optimum”

Reverse (calibration): weighted averaging of species optima From training data or literature, the ‘optima’ {uk} of 200 species wrt a moisture scale [0-100] are known. Then an estimate of the moisture value at a site is obtained from the species composition at the site by the weighed average: xi = kyikuk/ kyik-site score is (prop. to) WA of species scores where yikis abundances (amounts,yik≥ 0 ) of these species. Example: Let site i contain the species with optima 75, 80, 85 Then it moisture value is estimated as xi = (1*75+1*80+1*85)/(1+1+1)=80 (the average of the optima of species present) If the abundances (amounts) of these species are 4, 1,1 then xi = (4*75+1*80+1*85)/(4+1+1)= 465/6=77.5 (the weighted average of the optima of species present)

Start with arbitrary site scores { xi } with zero mean Calculate new species scores = WA of site scores: uk= iyikxi/iyik Calculate new site scores = WA of species scores: xi= kyikuk/ kyik Remove arbitrariness in scaling by standardizing the site scores Stop on convergence, i.e. when site scores stay the same after cycle WA = Weighted Averaging; RA = Reciprocal Averaging Algorithm CA (Hill 1973)

==Two way weighted summation 1-bk = iyikxi xi = kyikbk  eigenvalue >0(prop. to variance explained) Correspondence analysis == Two way Weighted Averaging 1-uk = iyikxi/ iyik-species score is (prop. to) WA of site scores xi = kyikuk/ kyik-site score is (prop. to) WA of species scores eigenvalue(between 0 and 1)  = scaling of the axis →Transition formulae (Hill 1973) But what does it optimize? For comparison Principal components (PCA) Reciprocal linear regression bk = iyikxi /i xi2 xi = kyikbk/k bk2

But what does it optimize? • Seriation: recovery of a Petrie matrix, consecutive 1’s • Relation with canonical correlation: max. correlation of row and column quantification • Finds block structures in two-way data → cluster analysis (TWINSPAN), spectral clustering • Also: bad features • sensitivity to rare species • arch effect (Guttman effect) DECORANA, 1979 Hill & Gauch 1980  truth CA Hill 1974, Heiser 1981

1979-1984 How was CCA derived? My prehistory • Benzécri’s 1973 L’analyse des donneés I and II with Hans • MSc Newcastle (UK, ’79-80), learned ML eqs from RL Plackett • contacts with: Colin Prentice (NMDS) and Mark Hill (CA/RA/DECORANA) • Mark did not like my PCA biplot and diversity paper (Ecology ’83). • Ecology is not linear... • Differences in niche location create diversity! See DECORANA • Second Gifi course Leiden (’81):J de Leeuw/W Heiser • From ’81, for ecologists, I studied properties of • Weighted Averaging vs ML in ecological niche models (Gaussian model) (ter Braak Vegetatio ’86, Math. Biosc. ’86) • Reciprocal Averaging (CA) vs ML (ter Braak Biometrics ’85) • When might WA be (nearly) as good as ML?

Absences don’t count in...weighted averaging Greig-Smith 1983 • “telephone poles have an optimum of pH of 5.5” h(x|y>0) = probability density function of x given species presence f(y>0|x) = occurrence probability given x g(x) = (prior) pdf of x Bayes’theorem h(x|y>0) g(x) f(y>0|x) WA/CA uses response function in logistic regression uses g(x) is same for all species, so perhaps distinction does not matter in MVA...

WA for Absences don’t count in... estimating optima a) Regression and WA: • response functions for species A and B: • x~ Uniform or even, g(x) ~ 1 • But for this g(x) reversal of optima So be warned, use GLM ... b) g(x) c) ter Braak, Vegetatio 1986

WA for Absences don’t count in... estimating site conditions Calibration and WA: Find a model such that the weighted average xi = kyikuk/ kyikis as efficient as maximum likelihood (ML): species packing model μik= ckexp{-½ ( xi-uk)2/tk};yik~Poi(μik) -ukuniform over large interval A -ckconstant or independent of uk -tk constant ter Braak, Math. Biosc. 1986

Absences don’t count: CA vs Gaussian ordination • Transition formulas approximate ML equations of equi-width Gaussian response model with latent predictor x μik= ckexp{-½ ( xi-uk)2 /tk}; yik~Poi(μik) -ukuniform over large interval A -ckconstant or independent of uk -xiuniform over large interval B in A -tk constant Unimodal model ter Braak Biometrics 1985

1984-5 How was CCA derived? My history (1) • Preparing with Colin Prentice in Uppsala (1984) a Theory of Gradient Analysis, search for “something like canonical correlation for niche models” Idea: • linear constraints on scores in Gaussian ordination and approximate the ML equations, as I did for CA (Oct 1984) 1985: • linear combination of Z that best separates species niches Niche: region in x-space which species occurs (yik>0)

1985 & Canoco How was CCA derived? My history (2) • Quick first draft → report (1985) sent around. • Jan de Leeuw commented quickly: • CCA is a relatively simple special case of HOMALS (or, if you like CANALS). [YES!] • this suggests directly which geometry Gifi would use. Nothing bilinear, nothing biplot, just centres of gravity [YES, for nominal predictors] • Extending DECORANA (Hill, 1979) to include CCA → CANOCO v1.0 ’85 and later PCA & redundancy analysis • Help from Onno van Tongerenand Petr Smilauer for CANOCO v2.0 1985; v3.0 1990 (permutation testing), v4.0 1998 (1st Windows version), v4.5 2002 (2nd Windows version) • Ambassadors: Colin Prentice, John Birks, Paul van de Brink

1986-France How was CCA derived? My history (3) • Partial CCA at 1st IFCS in Aachen (Germany): • Y. Escoufier stood up and ..... disliked CCA and ... invited me to Montpellier to work with JD Lebreton, D Chessel and P Sabatier, who independently invented CCA just a few months after I did. • I tried to bridge the gap: • lectured in English with only French references • learned duality diagrams • Later found : RH Green, Ecology 1971,1974: multi-group discriminant analysis - a precursor of CCA (lacking environmental arrows) Interest lostin same time that CA got popular via Mark Hill (1973-81)! Lets celebrate now 40 years of CCA!

From weighted averaging (WA) to CA (2) yik i  sites k  species yik ≥ 0 • WA: uk = iyikmoisturei/iyik How well does moisture explain the data? δ = dispersion of species scores δ = ky+kuk2/ y++ wrt standardized env. var. =B/T uk = centroid , weighted average wrtenv. var., “optimum”

From weighted averaging (WA) to CA (3) yik i  sites k  species yik ≥ 0 • WA: uk = iyikmoisturei/iyik • CA: uk = iyikxi /iyik {xi} = Site scores that best separate the species curves as measured by the dispersion of the {uk} uk = Species score, centroid, weighted average “optimum” Arch effect: folded first axis has also high δ2

From weighted averaging (WA) to CCA (4) yik i  sites k  species yik ≥ 0 • WA: uk = iyikmoisturei/iyik • CCA: xi = icjzijuk= iyikxi /iyik {xi} = Site scores that are the linear combination of environmental variables {zj} that best separate the species curves as measured by the dispersion of the {uk} In multigroupdisciminant: max B/T uk = Species score, centroid, weighted average “optimum”

Start with arbitrary site scores with zero mean Calculate new species scores = WA of site scores Calculate new site scores = WA of species scores Remove arbitrariness in scaling by standardizing the site score Stop on convergence, i.e. when site scores stay the same after cycle WA = Weighted averaging; (C)CA = (Canonical) Correspondence Analysis C and constrain the site scores by a weighted* multiple regression on the env. variables; take fitted values as new scores Algorithm CA *Site weights = { yi+} but why?

Eigenvector method to relate two data matrices Y and Z Y with non-negative values {yik}nm(e.g. abundances of of m species in n sites) Z quantitative and/or nominal predictor matrix {zij}np(on p environmental variables) Eigen equation: with Dc = diag(y+1 ,...,y+m) , Dr = diag(y1+ ,...,yn+) Looks like: canonical correlation analysis, but Dc/r ..?... Special case: Z = In×n or indicator matrix  correspondence analysis Canonical Correspondence Analysis (CCA)? CtB, Ecology 1986 Y~Z (ZTY (Dc-1) YTZ – λZTDrZ)b =0 eigen eq in appendix

Origin of CCA - Why is this equation appealing? • Transition formulas approximate ML equations of equi-width Gaussian response model with latent predictorx = Zb (b unknown) μik= ckexp{-½ ( xi-uk)2}; yik~Poi(μik) -ukuniform over large interval A -ckconstant or independent of uk -xiuniform over large interval B in A As in CA, but now with linear restrictions ter Braak 1986,1987

CtB, Ecology 1986 CCA derivation (1): ML eqs of equi-width Gaussian response model with latent predictor xi = jzijbj{bj}to be estimated uk = iyikxi/y+k – [i(xi - uk)μik/y+k ] (A.1) izij[kyik (xi - uk)] = i[k(xi - uk)μik]zij (A.2) Under the conditions k(xi - uk)μik]≈0 and i(xi - uk)μik≈-λ*uky+k we obtain λ uk = iyikxi/y+k(λ = 1- λ*) izij[kyik (xi - uk)]=0 (CtB,Biometris 1985)

CtB, Ecology 1986 CCA derivation (2) : Transition formulas of CCA From: xi = jzijbj λ uk = iyikxi/y+k izij[kyik (xi - uk)]=0 • λ uk = iyikxi/y+k • xi*= kyikuk/ yk+ • b = (ZTRZ)-1ZTRx* • x = Zb  In matrix notation:  u = Dc-1YT x x*= Dr-1Yu b = (ZTDrZ)-1ZTDrx* x = Zb b = (ZTDrZ )-1ZTDrx* = (ZTDrZ )-1ZTYu = (ZTDrZ )-1ZTY Dc-1YT x-1 =(ZTDrZ )-1ZTY Dc-1YT Zb-1 (ZTY (Dc-1) YTZ – λZTDrZ)b =0

Duality diagrams and transition formulas

CCA, PCA-triplets and biplots fittedcontingency ratios fittedtable of WA’s low rank regression coefs

CtB, Ecology 1986 Ordination diagram (factorial plane) • Two sets of sites (row) scores • x derived from Z (environmental data) • x* derived from Y (species data)[chosen to be closer to Y] • Plot x*, u and interset correlations, c* = cor (Z,x* ) • CA type of joint plot { x*, u }of Y • WLS-biplot {u, c*} of Dc-1YT Z ={weighted average of species wrt environment variables} • For nominal predictors: class point at centroid of x* CA type plot, but no example given except in Canoco manual (instead of b)

CtB, Vegetio,1987 Dune meadow data n = 20 sites (meadows), m = 30 plant species , p = 8 environmental variables Predictors mix of • quantitative (3) • nominal (1 with 4 classes BF,SF,NM,HF) → class centroids (squares) • pCCA uses all power of regression • factors, interactions between pred. YT= ZT= discretized data row/column order from CCA

CCA Dune meadow biplot of species’ centroids (WA) w.r.t. Manure u = WA(x) λx= WA(u) xi*sites Barycentric principle everywhere: projection points of species are centroids of the site projections on the same environmental variable c* env var squares: class centroids specuk,

CCA in R packages anacor, vegan, ade4 #e.g.: library(anacor) # Jan de Leeuw & Patrick Mair # CCA res = anacor(dune, row.covariates = dune.env) plot(res) library(vegan) cca(Y~ Z) # also with a formula interface # e.g factors A, B and C cca(Y~ A*B + Condition(C))

(C)CA & RC-model, ideal point discr. anal. bilinear model • Goodman’s RC model Eyik = rickexp(xi T uk ) is equivalent to • Eyik = rickexp(d2(xi ,uk )) with d2(xi ,uk )= (xi -uk )T(xi -uk ) which Ihm’ model B, and with x=Zbidentical to ideal point discriminant anal. (C)CA is approx. to this model (ter Braak 1988, Takane ...) Goodman showed this for small λ, ter Braak for large λ distance model See de Rooij, 2007 JCGS for plot scalings

and forward selection ... Later landmarks • Cointertia analysis(Dolédec& Chessel , 1994) • Linear restrictions, norm on bcoefficients • CCA-PLS(ter Braak & Verdonschot, 1995)Key: 1.Pre-transform: • 2. apply PLS 3. post-transform! • Variance partitioning (Borcard, Legendre, Drapeau, 1992) • RLQ (Dolédec et al. 1996) ; doubly constrained CA (Lavourel...) • All (?) CA methods obtainable from standard methods • by inflating the data matrices (super indicator matrices, in ecology: unit = the individual instead of the site • by pre and post transformation →Regularized generalized canonical correlation, Tenenhaus2

Method Abbr Response Predictors vars Correspondence analysis CA Community Canonical CA CCA Community Environmen data tal variables CCA Partial Least Squares CCA-PLS Community Many env. vars Weighted averaging WA Env.var. Community data WA-PLS WA-PLS Env.var(s) Community Co-correspondence CO-CA Community Community analysis Summary: CA-related methods in ecology - Coinertia anal., RLQ,...

Open questions (C)CA is a chameleon: sometimes shows up as • linearmethod:reconstitution formula & biplot • unimodal method: relation with CVA & r-c distance plot In the RC model and logratio analysis: clearly both With arch effect: 1-d reconstitution bad.... Can Greenacre 2009 power transformations shed light? Sparse CA? Sparse CCA? Apply principal response curves (PRC) in CCA context PRC: good for display of interactions

Max δ* = i,kyik(xi -uk)2/ y++ subject to iyi+ xi =0,i(yi+ /y++)xi 2=1 and x=Zb squared distance (xi -uk)2has meaning (unfolding) the data are a kind of weights and are not approximated in any formal way. See also Nishisato’s (1980) quantification of row and columns in one way ANOVA: his D2 = 1/δ*=B/W and ideal point model Unfolding criterion As in multigroupdisciminant: max W/B (cf. Heiser 1987, Takane et al 1991)

CA: Max !δ = ky+k uk2/ y++ with uk = iyikxi /iyik, ,weighted average of sites scores {xi } which are Dr-standardized iyi+ xi =0 and i(yi +/y++)xi 2=1 . CCA: extra contraintx=Zb δ= dispersion = weighted variance = inertia Best separation of species niches, max dispersion δ In multigroupdisciminant: B/Winstead of δ=B/T ter Braak 1987

Gaussian ordination: • log(Eyik) = ck - (xi - uk)2 / 2tk2 {xi} = Site scores that best explain the species data by Gaussian curves Gaussian regression → ordination yki yik abundance ↑ i  sites k  species yik ≥ 0 • Gaussian regression: log(Eyik) = ck - (moisture - uk)2 / 2tk2 →moisture {uk} = Species scores or optimum

Gaussian ordination? • Axes cannot be extracted one after the other; a joint fit is needed • difficult to fit more than 1 axis, but see R{VGAM} • Axes are not nested: • solution for axis 1 changes when fitting 2 axes jointly • this problem is common outside the eigenvalue techniques such as PCA - CCA • Ecologists invented a simpler approximate method: reciprocal averaging alias correspondence analysis (CA)

Cajo J. F. ter Braak Biometris, Wageningen University and Research Centre Cajo.terBraak@wur.nl www.biometris.wur.nl/UK/S