
Michel Veuille Ecole pratique des Hautes Etudes Director of the Systematics and Evolution dept






Presentation Transcript


  1. Michel Veuille, Ecole pratique des Hautes Etudes; Director of the Systematics and Evolution dept., Muséum National d’Histoire Naturelle, Paris; Scientific Advisory Board of the CBOL Data Analysis Working Group

  2. What is the molecular signature of speciation events? There is no molecular signature of speciation events. What are the other signatures of speciation events? There is no universal signature of speciation events. But there are local signatures of speciation events, and one kind of signature (e.g. morphological) can be present when another (e.g. genetic) is absent.

  3. Two examples, 1st of 2: a case of two mtDNA species with no morphological difference. [Figure: GC% at COII in hexapoda, earwigs (European earwig, Forficula auricularia) vs other hexapoda] In 1998, the common European earwig was shown to consist of two sympatric, reproductively isolated species that differ only in the number of annual broods (one or two per year). The two species differ strikingly in COII sequence, because GC% evolves at a very high rate in these species. But since they show no apparent morphological difference, the two species remain unnamed. Wirth, Le Guellec, Vancassel & Veuille. 1998. Evolution 52: 260-265. Wirth, Le Guellec & Veuille. 1999. MBE 16: 1645-1653.
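
GC% is simply the proportion of G and C among the bases of a sequence. A minimal Python sketch, assuming ambiguous symbols such as N are ignored; the function name and the toy fragments are hypothetical, not the earwig COII data.

```python
def gc_percent(seq: str) -> float:
    """Percentage of G and C among the unambiguous A/C/G/T bases (case-insensitive).
    Assumption: ambiguity codes such as N are simply ignored."""
    acgt = [b for b in seq.upper() if b in "ACGT"]
    if not acgt:
        raise ValueError("sequence contains no A/C/G/T bases")
    gc = sum(1 for b in acgt if b in "GC")
    return 100.0 * gc / len(acgt)


if __name__ == "__main__":
    # Hypothetical toy fragments, not real COII sequences.
    for fragment in ("ATGCGC", "ATTATA", "GCNNGC"):
        print(fragment, round(gc_percent(fragment), 2))
```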

  4. Two examples, 2nd of 2: a case of two morphological species with no mtDNA difference. [Figure: São Tome, Drosophila santomea and Drosophila yakuba] Drosophila santomea lives in the highlands of São Tome, above 1100 m; Drosophila yakuba lives in the lowlands, below 1100 m. They hybridize at 1100 m, and nevertheless remain genetically distinct. They share the same mitochondria, but can easily be identified by the colour pattern of the abdomen. After Lachaise et al. Proc. Roy. Soc. London, 2000.

  5. They belong to the Drosophila melanogaster ("black abdomen") subgroup. D. santomea and D. yakuba share the same mitochondrion through common descent. [Figure: phylogeny of the subgroup, with the year of description and distribution of each species] • D. orena: 1978, Cameroon • D. erecta: 1974, tropical Africa • D. teissieri: 1971, tropical Africa • D. yakuba: 1954, tropical Africa • D. santomea: 2000, São Tome island • D. melanogaster: 1830, tropical Africa + worldwide • D. simulans: 1919, tropical Africa + worldwide • D. mauritiana: 1974, Mauritius island • D. sechellia: 1981, Seychelles islands

  6. The barcoder’s position is challenging. The species concept is hotly debated, and there are many definitions of species. Yet « species » make sense to everybody: for example, 12% of the nouns in the French vocabulary* correspond to taxa that make sense to a taxonomist (species, families, varieties). A solution is to let people use whatever species concept they prefer, and to limit the barcoder’s activity to the domain where he/she can be helpful. *From Le Robert, a classic French dictionary.

  7. What data analysis is about. [Diagram: ?0,000,000 species; black box (barcoder); data & tools; (taxonomist); decisions: « This is species A or B », « This is a new species »] Data analysis consists of providing data to taxonomists, so that they can make decisions about the status of specimens and taxa. Barcoding and taxonomic decisions are logically distinct, even though they can be performed by the same person.

  8. What data analysis is about (contd.) [Figure: tree of life showing a query sequence, its local barcode and sister group, the closest COI-validated node, and the closest node validated using additional information] If we want to be 100% sure of the assignment of a taxon, then we must look at the nodes below the closest node that excludes a sister group with probability p < 0.01. Below this point, a series of statistical and classificatory approaches allows us to estimate the probability that the query sequence does or does not belong to an already described species, based on the available information. Alternatively, additional information from other genes, or an enlarged dataset, can improve our understanding of the taxonomic status of the query.

  9. The population genetics background behind data analysis

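  10. Principle: going back in time, two sequences from the same population find their last common ancestor in each generation with a constant probability p = 1/N. It is a « death process », whose distribution of coalescence times (geometric, approximately exponential) is very different from a normal distribution. [Figure: distribution of coalescence time; x-axis: past (generations)] The most probable coalescence time is t = 1 generation; the expectation is t = N generations; and the probability of not yet having coalesced falls to P = 0.05 at t = 3N.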
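
The waiting time described above is geometric. The Python sketch below checks numerically that its mode is t = 1, its mean is N, and that the probability of not yet having coalesced at t = 3N is close to exp(-3) ≈ 0.05. The function names and N = 1000 are hypothetical; N is taken as the number of genes, as in slide 12.

```python
import math


def coalescence_pmf(t: int, n_genes: int) -> float:
    """P(two sequences find their common ancestor exactly t generations back),
    given a constant per-generation coalescence probability p = 1/N."""
    p = 1.0 / n_genes
    return (1.0 - p) ** (t - 1) * p


def prob_not_coalesced(t: int, n_genes: int) -> float:
    """P(the two sequences have still not coalesced after t generations)."""
    return (1.0 - 1.0 / n_genes) ** t


def mean_coalescence_time(n_genes: int, horizon_factor: int = 50) -> float:
    """Expected coalescence time, summed numerically up to horizon_factor * N
    generations (the neglected tail is of order exp(-horizon_factor))."""
    return sum(t * coalescence_pmf(t, n_genes)
               for t in range(1, horizon_factor * n_genes + 1))


if __name__ == "__main__":
    N = 1000  # hypothetical effective number of genes
    # Geometric distribution: each generation further back is less likely than the previous one.
    print("mode is t = 1:", coalescence_pmf(1, N) > coalescence_pmf(2, N))
    print("mean ~ N:", round(mean_coalescence_time(N), 3))
    print("P(not coalesced by 3N):", round(prob_not_coalesced(3 * N, N), 4),
          "vs exp(-3) =", round(math.exp(-3), 4))
```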

  11. [Figure: p against sample size n, with tick values 2, 9, 19, 39; genealogy of sample n1 and its MRCA] Probability p that the MRCA of a sample of size n is also the MRCA of the species, assuming a standard Wright-Fisher model. In a very large population, p = (n-1)/(n+1). p increases very rapidly with n: p = 0.6667 for n = 5, and p = 0.8 for n = 9. Increasing the sample size much beyond this adds little.
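
The formula p = (n-1)/(n+1) can be tabulated directly; exact fractions avoid rounding. A minimal sketch with a hypothetical function name:

```python
from fractions import Fraction


def prob_sample_mrca_is_species_mrca(n: int) -> Fraction:
    """p = (n - 1)/(n + 1): probability that the MRCA of n sampled genes is also
    the MRCA of the whole (very large, Wright-Fisher) population."""
    if n < 2:
        raise ValueError("need at least two sampled genes")
    return Fraction(n - 1, n + 1)


if __name__ == "__main__":
    for n in (2, 5, 9, 19, 39):
        p = prob_sample_mrca_is_species_mrca(n)
        print(f"n = {n:2d}  p = {float(p):.4f}")
```

For n = 19 and n = 39 the formula gives p = 0.9 and p = 0.95, so raising p from 0.8 to 0.95 requires roughly quadrupling the sample.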

  12. Typically, under a standard equilibrium Wright-Fisher model (*), the expected time to the most recent common ancestor of the whole sample (MRCA), 2N(1 - 1/n) generations, is at most twice the expected time to the common ancestor of two randomly sampled sequences, N generations. [Figure: genealogy of sample n1, with the MRCA at 2N(1 - 1/n) generations and a pairwise ancestor at N generations] (*) Assuming: • neutrality • constant population size • no structuring • mutation-drift equilibrium • N = effective number of genes
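
The 2N(1 - 1/n) figure follows from summing the expected waiting times between successive coalescences in the standard coalescent approximation of the Wright-Fisher model: with k lineages the expected wait is N/(k(k-1)/2) = 2N/(k(k-1)) = 2N(1/(k-1) - 1/k), and the sum telescopes from k = n down to 2. The sketch below (hypothetical names) checks this with exact arithmetic.

```python
from fractions import Fraction


def expected_tmrca(n: int, n_genes: int) -> Fraction:
    """Expected time (in generations) to the MRCA of n sampled genes, using the
    coalescent approximation to the Wright-Fisher model with N genes: while k
    lineages remain, any of the k(k-1)/2 pairs may coalesce, so the expected
    wait for the next coalescence is N / (k(k-1)/2)."""
    if n < 2:
        raise ValueError("need at least two sampled genes")
    total = Fraction(0)
    for k in range(n, 1, -1):
        pairs = k * (k - 1) // 2
        total += Fraction(n_genes, pairs)
    return total


def closed_form(n: int, n_genes: int) -> Fraction:
    """The slide's formula 2N(1 - 1/n)."""
    return 2 * n_genes * (1 - Fraction(1, n))


if __name__ == "__main__":
    N = 1000  # hypothetical effective number of genes
    for n in (2, 10, 100):
        print(n, expected_tmrca(n, N), closed_form(n, N))
```

With N = 1000 this gives 1000, 1800 and 1980 generations for n = 2, 10 and 100: going from 10 to 100 sampled genes deepens the expected MRCA by only 10%, which is the point made about larger samples in slide 13.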

  13. [Figure: genealogies of sample n1 and of a larger sample n2 > n1, with the MRCA at 2N(1 - 1/n) generations and a pairwise ancestor at N generations] Using a larger dataset does not increase the information very much at this level. « The older nodes of a genealogy tend to be revealed in a small sample, whereas more recent portions are, on average, only revealed as the sample size per locus grows large. » Kliman et al. 2000.

  14. After A. G. Clark (1997): polymorphisms can persist very far back into the past of a species, into the ancestral population it shares with a sister species. Long after two species have split, they can still share some neutral polymorphisms.

  15. Exploring shallow nodes

  16. 1. Matz and Nielsen’s MCMC method. Derived from Nielsen and Hey’s (2001) IM method, based on MCMC (Markov chain Monte Carlo). This method estimates five parameters, and thus involves very long computation times.

  17. 1. Matz and Nielsen’s MCMC method • Derived from Nielsen and Hey’s (2001) IM method, based on MCMC (Markov chain Monte Carlo). • This method estimates five parameters, and thus involves very long computation times. • Matz and Nielsen (2005) reduce it to two parameters: the population size and the time to speciation. • They estimate the probability that the query sequence does or does not belong to the same species as the reference sample.

  18. 2. Evaluating classification and phylogenetic methods: Austerlitz et al. They compare two classification methods (CART and random forest) and two phylogenetic methods (neighbour-joining and PhyML). They simulate n + 1 individuals in each species: n individuals form the reference sample, and the last individual is the query. Repeated simulations allow them to record the rate of correct assignment of the query to its species. The classification methods partition the dataset using a few characters; the distance methods work well with a small dataset, provided there are enough mutations.
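
The evaluation protocol described above (n reference individuals per species, one query, repeated simulations, rate of correct assignment) can be sketched in Python. This is not Austerlitz et al.'s pipeline: it swaps in a crude substitution model instead of coalescent simulations, and a simple nearest-neighbour p-distance rule instead of CART, random forest, neighbour-joining or PhyML. All function names and parameter values are hypothetical.

```python
import random

BASES = "ACGT"


def mutate(seq: str, rate: float, rng: random.Random) -> str:
    """Copy seq, substituting each site by a different base with probability rate."""
    return "".join(
        rng.choice([x for x in BASES if x != b]) if rng.random() < rate else b
        for b in seq
    )


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def nearest_neighbour_species(query: str, reference: list[tuple[str, str]]) -> str:
    """Assign the query to the species of its closest reference sequence."""
    label, _ = min(reference, key=lambda item: p_distance(query, item[1]))
    return label


def correct_assignment_rate(n_ref: int = 10, length: int = 200,
                            divergence: float = 0.2, within: float = 0.02,
                            trials: int = 200, seed: int = 1) -> float:
    """Repeat: simulate two species with n_ref reference individuals each plus
    one query, assign the query, and return the fraction assigned correctly."""
    rng = random.Random(seed)
    correct = 0
    for _ in range(trials):
        anc_a = "".join(rng.choice(BASES) for _ in range(length))
        # Assumption: crude stand-in for speciation, not a coalescent simulation.
        anc_b = mutate(anc_a, divergence, rng)
        ancestors = {"A": anc_a, "B": anc_b}
        reference = [(sp, mutate(seq, within, rng))
                     for sp, seq in ancestors.items() for _ in range(n_ref)]
        true_sp = rng.choice(["A", "B"])
        query = mutate(ancestors[true_sp], within, rng)
        if nearest_neighbour_species(query, reference) == true_sp:
            correct += 1
    return correct / trials


if __name__ == "__main__":
    print("diverged species:", correct_assignment_rate(divergence=0.2))
    print("no divergence:   ", correct_assignment_rate(divergence=0.0))
```

Varying `within` and `divergence` in such a harness is one way to mimic the low-θ and high-θ comparisons of the next two slides, though with a much simpler model than the original study.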

  19. Comparison of the methods for a low θ (2 populations, reference sample size = 10, θ = 3): classification methods perform better when variation is low.

  20. Comparison of the methods for a high θ (2 populations, reference sample size = 10, θ = 30): phylogenetic methods perform better for a highly variable population.

  21. Conclusion: the appropriate method varies with the properties of the dataset.

  22. Comparing methods using realistic datasets

  23. 1. Litoria nannotis: 4 species, average sample size 43.7, average θ = 1.54. 2. Astraptes fulgerator: 12 species, average sample size 38.8, average θ = 23.5.
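
The slides do not say how the average θ values were estimated. One common estimator is Watterson's, θ_W = S / a_n, where S is the number of segregating sites among n aligned sequences and a_n = 1 + 1/2 + ... + 1/(n-1). A minimal sketch, assuming gap-free aligned sequences of equal length and giving θ per locus (not per site); the function name is hypothetical.

```python
def watterson_theta(sequences: list[str]) -> float:
    """Watterson's estimator theta_W = S / a_n, with S the number of segregating
    sites and a_n = sum_{i=1}^{n-1} 1/i. Assumes aligned, gap-free sequences."""
    n = len(sequences)
    if n < 2:
        raise ValueError("need at least two sequences")
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("sequences must be aligned to equal length")
    # A site is segregating if more than one base occurs at it.
    segregating = sum(1 for i in range(length)
                      if len({s[i] for s in sequences}) > 1)
    a_n = sum(1.0 / i for i in range(1, n))
    return segregating / a_n


if __name__ == "__main__":
    toy = ["AAAA", "AAAT", "ACAT", "GCAT"]  # hypothetical alignment, S = 3
    print(round(watterson_theta(toy), 3))
```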

  24. 3. Cowries

  25. Other solutions: can we replace COI? Can we complement it with other genes?

  26. Properties of bilaterian mtDNA (with other systems that share each property): • Large number of copies per cell (rDNA also has a high copy number) • High mutation rate (microsatellites also) • Low variation/divergence ratio (centromeres, telomeres; documented in Drosophila) • No recombination (centromeres, telomeres; documented in Drosophila) • Haploid (X chromosome, Y chromosome) • Maternally inherited; asexual (the Y is asexual; the other chromosomes recombine) • Variation in mtDNA is lowered by selective sweeps, according to Bazin et al. (2006); variation is also lowered in some nuclear regions by background selection. • The main disadvantage of maternal inheritance is that mitochondria can be transferred horizontally along with Wolbachia endosymbiotic bacteria (examples: Protocalliphora and Drosophila). • The main disadvantage of asexuality is that mitochondria do not follow Mendel’s second law: mtDNA carries no information on genetic barriers.

  27. Maternally transmitted endosymbiotic bacteria: hitchhiking by Wolbachia. [Figure panels: nuclear; mtDNA] Phylogeny of the fly Protocalliphora based on AFLP (nuclear markers), according to Whitworth et al. (2007); symbols represent different Wolbachia strains. Phylogeny of Protocalliphora based on COI+COII: the authors claim that the assignment of unknown individuals to species is impossible in 60% of the species. After Whitworth et al. Proc. Roy. Soc. B, in press.

  28. [Figure: phylogenetic tree of mtDNA, with its MRCA, vs phylogram of nuclear DNA] A phyletic tree of mtDNA represents true phyletic relationships: mutations are in linkage disequilibrium because they do not recombine. Having two divergent clades is trivial under a standard WF (Wright-Fisher) model. By contrast, the phylogram of a recombining gene represents distances between haplotypes, in which mutations can seem to « appear » repeatedly on several terminal branches. Such genes thus inform us about the existence of barriers to gene flow.
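
The contrast between the two trees can be made concrete with the classic four-gamete test: for two biallelic sites, a non-recombining sequence under the infinite-sites model can show at most three of the four possible two-site haplotypes, so seeing all four implies recombination (or, in mtDNA, recurrent mutation on several branches). A minimal Python sketch; the function name is hypothetical and haplotypes are assumed to be aligned strings of equal length.

```python
from itertools import combinations


def four_gamete_violations(haplotypes: list[str]) -> list[tuple[int, int]]:
    """Return index pairs (i, j) of biallelic sites at which all four two-site
    haplotypes occur. Under infinite sites without recombination this cannot
    happen, so each pair implies recombination or a recurrent mutation."""
    length = len(haplotypes[0])
    biallelic = [i for i in range(length)
                 if len({h[i] for h in haplotypes}) == 2]
    violations = []
    for i, j in combinations(biallelic, 2):
        gametes = {(h[i], h[j]) for h in haplotypes}
        if len(gametes) == 4:
            violations.append((i, j))
    return violations


if __name__ == "__main__":
    tree_like = ["00", "01", "11"]        # compatible with a single tree
    recombinant = ["00", "01", "10", "11"]  # all four gametes present
    print(four_gamete_violations(tree_like), four_gamete_violations(recombinant))
```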

  29. Conclusions • There is no mitochondrial signature of speciation. There is no room for a barcode species concept, nor for anything like a « barcodon ». • Even a moderate sample can provide a wealth of information on the history of a species. • Additional information can be obtained in difficult cases, either by increasing the population sample or by using additional markers.

  30. The END
