Correlating traits with phylogenies
Using BaTS
Phylogeny and trait values
• A phylogeny describes a hypothesis about the evolutionary relationship between individuals sampled from a population.
• Discrete character traits of interest can be mapped onto the phylogeny.
• A significant association between a particular trait value and its distribution on a phylogeny indicates a potential causative relationship.
Phylogeny and trait values
• Often the phylogeny-trait relationship is not unequivocal by eye, so an analytical framework may be needed.
[Figure panels: (clear association), (no association), ????]
Phylogeny and trait values
The null hypothesis
The null hypothesis under test is one of random phylogeny-trait association; that is: “No single tip bearing a given character trait is any more likely to share that trait with adjoining taxa than we would expect due to chance.”
An example
• Salemi et al. (2005)*: dataset of HIV sequences sampled from CNS tissues post mortem.
• Analysis by the Slatkin-Maddison (1989) method, reanalyzed in BaTS**.
• Compartmentalization by tissue type: circulating viral populations are defined by their location in the body.
*Salemi et al. (2005) J. Virol. 79(17): 11343-11352.
**Parker, Rambaut & Pybus (2008) MEEGID 8(3): 239-246.
Available methods
• Non-phylogenetic: ANOVA
  • Ignores shared ancestry
• Phylogenetic:
  • Single-tree mapping
  • Slatkin-Maddison & AI
  • BaTS
Methods: Single-tree mapping
• Method:
  • Map traits onto a tree
  • Look for correlation
• Pros:
  • Fast
  • Simple
• Cons:
  • No indication of significance
  • Statistically weak (high Type II error)
  • Conditional on a single topology
Methods: Slatkin-Maddison & AI
• Method:
  • Map traits onto a tree by parsimony and count migration events (Slatkin-Maddison), or measure the ‘association index’ within clades recursively (AI)
  • Compare the observed value with a null (expected) value obtained by bootstrapping
• Pros:
  • Still reasonably fast
  • Indication of significance
• Cons:
  • Still conditional on a single topology
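The parsimony-count idea can be sketched in a few lines of Python. This is an illustration only, not the Slatkin-Maddison or BaTS implementation: the nested-tuple tree, the tip names and the trait labels are hypothetical, and the null is built by shuffling trait labels among tips, which is an assumption about how the expected value is generated.

```python
import random

def fitch_changes(tree, states):
    """Fitch parsimony on a binary nested-tuple tree.

    Returns (possible ancestral states, minimum number of state changes).
    """
    if isinstance(tree, str):                      # a tip carries its own trait
        return {states[tree]}, 0
    left_set, left_changes = fitch_changes(tree[0], states)
    right_set, right_changes = fitch_changes(tree[1], states)
    changes = left_changes + right_changes
    common = left_set & right_set
    if common:                                     # children agree: no extra change
        return common, changes
    return left_set | right_set, changes + 1       # children disagree: one change

def tips(tree):
    """List tip names in left-to-right order."""
    if isinstance(tree, str):
        return [tree]
    return tips(tree[0]) + tips(tree[1])

def permutation_p_value(tree, states, reps, seed=0):
    """Share of label shuffles with as few (or fewer) changes than observed."""
    rng = random.Random(seed)
    observed = fitch_changes(tree, states)[1]
    names = tips(tree)
    values = [states[name] for name in names]
    hits = 0
    for _ in range(reps):
        rng.shuffle(values)                        # random phylogeny-trait association
        if fitch_changes(tree, dict(zip(names, values)))[1] <= observed:
            hits += 1
    return observed, hits / reps

# Hypothetical example: two tissue types, perfectly compartmentalized.
tree = ((("a", "b"), ("c", "d")), (("e", "f"), ("g", "h")))
states = {"a": "brain", "b": "brain", "c": "brain", "d": "brain",
          "e": "blood", "f": "blood", "g": "blood", "h": "blood"}
observed, p = permutation_p_value(tree, states, reps=200)
print(observed, p)
```

Fewer changes than expected by chance suggests that tips sharing a trait cluster together on the tree. The result is still conditional on the single topology supplied.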
Methods: BaTS
• Method:
  • See below(!)
• Pros:
  • Indication of significance
  • Statistically powerful, with a correct Type I error rate
  • Accounts for phylogenetic uncertainty
• Cons:
  • Requires Bayesian MCMC sequence analysis
  • Slower
BaTS: under the bonnet
• Uses a posterior distribution of phylogenies from a Bayesian MCMC analysis
• Calculates migrations, AI and a variety of other measures of association
• Samples the posterior distributions of both observed and expected (null) values
• Obtains significance by comparing observed vs. expected values
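The observed-versus-expected comparison can be sketched as follows, assuming one observed value per posterior tree and a pooled list of null values. The function names, the nearest-rank interval and the p-value rule are illustrative assumptions, not a description of how BaTS itself computes significance.

```python
import statistics

def percentile_interval(values, lower=0.025, upper=0.975):
    """Nearest-rank interval over a list of sampled values."""
    ordered = sorted(values)
    last = len(ordered) - 1
    return ordered[round(lower * last)], ordered[round(upper * last)]

def summarise(observed, null, smaller_is_stronger=True):
    """Summarise observed and null samples of one association statistic.

    observed: one value per posterior tree (real trait labels)
    null:     values computed with randomized trait labels
    """
    obs_mean = statistics.mean(observed)
    if smaller_is_stronger:
        extreme = sum(1 for value in null if value <= obs_mean)
    else:
        extreme = sum(1 for value in null if value >= obs_mean)
    return {
        "observed_mean": obs_mean,
        "observed_interval": percentile_interval(observed),
        "null_mean": statistics.mean(null),
        "null_interval": percentile_interval(null),
        # Assumed rule: share of null values at least as extreme as the observed mean.
        "p_value": extreme / len(null),
    }

# Hypothetical numbers for a statistic where smaller means stronger association.
summary = summarise(observed=[1, 2, 3], null=[10, 11, 12, 13])
print(summary)
```

Because the observed values come from many trees rather than one, the summary reflects phylogenetic uncertainty instead of a single topology.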
BaTS: analysis workflow
• Preparation:
  • Sequence alignment
  • Bayesian MCMC phylogeny reconstruction (BEAST, MrBayes) to obtain a posterior distribution of trees (PST)
  • Taxa in PST marked up with discrete traits
• BaTS analysis
• Interpretation
Workflow: Preparation (i)
• Sequence alignment:
  • CLUSTAL, BioEdit, Se-Al
• Bayesian MCMC analysis:
  • MrBayes, BEAST
• Taxa marked up with traits
Workflow: Preparation (ii)
• Taxa marked up with traits: typical NEXUS format
[Figure: typical NEXUS file]
Workflow: Preparation (iii)
• Taxa marked up with traits:
  a) Declare the ‘states’ block: begin states;
  b) Assign a trait to each taxon, in the order in which the taxa appear in the original #NEXUS file.
  c) Close the ‘states’ block.
  d) Omit the ‘translate’ and ‘taxa’ blocks.
Workflow: BaTS analysis
To use BaTS from the command line, type:
java -jar BaTS_beta_build2.jar [single|batch] <treefile_name> <reps> <states>
Where:
• single or batch asks BaTS to analyse either a single input file or a whole directory (batch analysis);
• <treefile_name> is the name and full location of the treefile or directory to be analysed;
• <reps> is the number (an integer > 1, typically at least 100) of state randomizations to perform to yield a null distribution;
• <states> is the number of different states seen.
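When many analyses are scripted, the argument list can be assembled and checked before Java is started. The helper below is a hypothetical convenience wrapper, not part of BaTS; it only mirrors the argument order given above.

```python
def bats_command(jar, mode, treefile, reps, states):
    """Build the argument list for one BaTS run (hypothetical helper).

    Argument order follows the usage line:
    java -jar <jar> [single|batch] <treefile_name> <reps> <states>
    """
    if mode not in ("single", "batch"):
        raise ValueError("mode must be 'single' or 'batch'")
    if reps <= 1:
        raise ValueError("reps must be an integer greater than 1")
    if states < 1:
        raise ValueError("states must be a positive integer")
    return ["java", "-jar", jar, mode, treefile, str(reps), str(states)]

command = bats_command("BaTS_beta_build2.jar", "single", "example.trees", 100, 7)
print(" ".join(command))
```

The resulting list can be passed to `subprocess.run(command, check=True)` from the directory that holds the jar file.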
The analysis
C:\joeWork\apps\BaTS\BaTS_beta_build2\BaTS_beta_build2>java -jar BaTS_beta_build2.jar single example.trees 100 7
Performing single analysis.
File: example.trees
Null replicates: 100
Maximum number of discrete character states: 7
analysing... 30 trees, with 7 states
analysing observed (using obs state data)
30 29
30 29
30 29
30 29
Statistic observed mean lower 95% CI upper 95% CI null mean lower 95% CI upper 95% CI significance
AI 1.5555052757263184 1.1128820180892944 2.160351037979126 12.03488540649414 11.475320040039 12.6391201928711 0.0
PS 18.5 17.0 20.0 80.7713394165039 77.86666870117188 83.56666564941406 0.0
MC (state 0) 12.633333206176758 9.0 16.0 1.7496669292449951 1.399999976158142 2.1666667461395264 0.009999990463256836
MC (state 1) 19.0 19.0 19.0 1.7480005025863647 1.3333333730697632 2.0999999046325684 0.009999990463256836
MC (state 2) 12.666666984558105 12.0 13.0 1.77991247559 1.33333697632 2.200000047683716 0.009999990463256836
MC (state 3) 8.566666603088379 3.0 11.0 1.66733866943 1.2333333492279053 2.133333444595337 0.009999990463256836
MC (state 4) 11.0 11.0 11.0 1.5526663064956665 1.1666666269302368 2.0999999046325684 0.009999990463256836
MC (state 5) 3.433333396911621 2.0 6.0 1.4840000867843628 1.100000023841858 2.0333333015441895 0.009999990463256836
MC (state 6) 5.066666603088379 5.0 6.0 1.2973339557647705 1.0333333015441895 1.600000023841858 0.009999990463256836
done
Done.
Notes on the output:
• 30 trees were detected in the input file.
• Output: statistics, one per line, tabulated.
• The remaining lines are housekeeping and debugging messages.
• The ‘MC…’ statistics are reported in the order in which they occur in the input file.
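For downstream work, the tabulated lines can be read into a dictionary. The parser below is a sketch that assumes the layout shown in the example run (a statistic name followed by seven numbers) and recognises only the AI, PS and MC rows; the column names are labels chosen here, not BaTS identifiers.

```python
import re

# Matches the statistic names seen in the example output only.
ROW = re.compile(r"^(AI|PS|MC \(state \d+\))\s+(.*)$")
COLUMNS = ("observed_mean", "observed_lower", "observed_upper",
           "null_mean", "null_lower", "null_upper", "significance")

def parse_rows(text):
    """Return {statistic: {column: value}} for each recognised table line."""
    rows = {}
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if not match:
            continue                      # housekeeping or header line
        numbers = match.group(2).split()
        if len(numbers) != len(COLUMNS):
            continue                      # unexpected layout: skip rather than guess
        rows[match.group(1)] = dict(zip(COLUMNS, (float(n) for n in numbers)))
    return rows

SAMPLE = """analysing... 30 trees, with 7 states
Statistic observed mean lower 95% CI upper 95% CI null mean lower 95% CI upper 95% CI significance
AI 1.5555052757263184 1.1128820180892944 2.160351037979126 12.03488540649414 11.475320040039 12.6391201928711 0.0
PS 18.5 17.0 20.0 80.7713394165039 77.86666870117188 83.56666564941406 0.0
MC (state 0) 12.633333206176758 9.0 16.0 1.7496669292449951 1.399999976158142 2.1666667461395264 0.009999990463256836
done
"""

rows = parse_rows(SAMPLE)
for name, values in rows.items():
    print(name, values["observed_mean"], values["null_mean"], values["significance"])
```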
Workflow: Interpretation
The null hypothesis
The null hypothesis under test is one of random phylogeny-trait association; that is: “No single tip bearing a given character trait is any more likely to share that trait with adjoining taxa than we would expect due to chance.”
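Workflow: Interpretation
The statistics:
• For MC, larger observed values indicate stronger phylogeny-trait association; for AI and PS, smaller observed values do (in the example output, observed AI and PS fall well below their null means, while observed MC values exceed theirs).
• Significance is indicated by the p-value.
• In addition, the observed posterior values are informative for some statistics:
  • PS: indicates the number of migration events between trait values
  • MC(trait value): indicates the number of taxa in the largest clade monophyletic for that trait value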
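To make the MC definition concrete, the sketch below finds the size of the largest clade in which every tip carries a given trait value. It is an illustration on a hypothetical nested-tuple tree with made-up tip names and traits, not the BaTS code.

```python
def max_monophyletic_clade(tree, states, target):
    """Number of tips in the largest clade whose tips all carry `target`."""
    def walk(node):
        # Returns (clade is pure, tips in clade if pure else 0, best size so far).
        if isinstance(node, str):
            pure = states[node] == target
            size = 1 if pure else 0
            return pure, size, size
        results = [walk(child) for child in node]
        best = max(result[2] for result in results)
        if all(result[0] for result in results):
            size = sum(result[1] for result in results)
            return True, size, max(best, size)
        return False, 0, best
    return walk(tree)[2]

# Hypothetical example tree and trait assignment.
tree = ((("a", "b"), "c"), (("d", "e"), ("f", "g")))
states = {"a": "brain", "b": "brain", "c": "brain", "d": "brain",
          "e": "blood", "f": "blood", "g": "blood"}
print(max_monophyletic_clade(tree, states, "brain"))
print(max_monophyletic_clade(tree, states, "blood"))
```

Here tips a, b and c form a pure ‘brain’ clade of three, whereas d sits next to a ‘blood’ tip and so counts only as a clade of one.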
FAQs / common pitfalls
• Java 1.5 or higher is required. See java.sun.com for more.
• Large datasets can be slow. Down-sample input tree files (uniformly, not randomly) where necessary, or when checking that BaTS input files are marked up correctly.
• A shortage of RAM (memory) can slow the analysis; use the -Xmx switch to raise the maximum memory available to Java*.
• Check input file mark-up carefully if in doubt.
*See more: http://edocs.bea.com/wls/docs70/perform/JVMTuning.html
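Uniform down-sampling means keeping evenly spaced trees from the posterior sample rather than a random subset. A minimal sketch, assuming the trees have already been read into a list (for example, one tree statement per entry); the function name is hypothetical.

```python
def thin_uniformly(items, keep):
    """Keep `keep` evenly spaced entries, preserving their original order."""
    if keep <= 0:
        raise ValueError("keep must be positive")
    if keep >= len(items):
        return list(items)                # nothing to thin
    step = len(items) / keep
    return [items[int(index * step)] for index in range(keep)]

# Hypothetical example: thin 30 sampled trees down to 10.
trees = [f"tree_{number}" for number in range(30)]
thinned = thin_uniformly(trees, 10)
print(thinned)
```

Remember to write the thinned trees back inside the same file structure, including the trait mark-up, before running BaTS.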
Author contact: Joe Parker Department of Zoology Oxford University, UK OX1 3PS joe@kitserve.org.uk http://evolve.zoo.ox.ac.uk