
The use of the concepts of evolutionary biology in genome annotation.

Comparative genomics, concept of orthology and paralogy. What is phylogenomics? Structural and functional annotation. Structural annotation (deciphering of gene structure).

hsu

Presentation Transcript


  1. The use of the concepts of evolutionary biology in genome annotation.

  2. • Comparative genomics, concept of orthology and paralogy.
     • What is phylogenomics?
     • Structural and functional annotation.
     • Structural annotation (deciphering of gene structure).
     • Functional annotation (especially the use of phylogeny to decipher protein function): Figenix.
     • Genome evolution: CASSIOPE

  3. [Figure: Metazoan phylogeny (from Adoutte et al. 2000). Clade labels: BILATERIA (with Urbilateria at its base), PROTOSTOMES, ECDYSOZOANS, LOPHOTROCHOZOANS, DEUTEROSTOMES. Taxa shown: arthropods, gastrotrichs, nematodes, onychophorans, tardigrades, kinorhynchs, priapulids, molluscs, rotifers, annelids, gnathostomulids, sipunculans, nemerteans, pogonophorans, platyhelminthes, entoprocts, bryozoans, brachiopods, phoronids, vertebrates, cephalochordates, urochordates, hemichordates, echinoderms, ctenophorans, cnidarians, poriferans. Sequenced metazoan species marked on the tree: Drosophila, Anopheles, C. elegans, human, mouse, zebrafish, Fugu, Ciona.]

  4. URBILATERIA: the hypothetical ancestor of bilaterian metazoans (a concept going back to Geoffroy Saint-Hilaire in the 19th century), living about 800 million years ago. The URBILATERIA genome evolved by the fixation of:
     • Gene mutation
     • Gene loss
     • Genic duplication
     • Gene duplication
     • Genome region duplication
     • Whole genome duplication

  5. Large-scale gene duplication in the vertebrate lineage. [Figure: dated tree (node ages in million years: 360, 450, 528, 564, 751, >751, <833-993, 833-993). Taxa shown: Protostomata (nematode, C. elegans; insects, Drosophila), Echinodermata, Urochordata (Ciona), Cephalochordata (amphioxus), Myxini (hagfish), Cephalaspidomorphi (lamprey), Pikaia, Chondrichthyes (shark), Actinopterygii (zebrafish), Lissamphibia, Amniota (human); clade labels Deuterostomata and Vertebrates. Annotations on the tree: T1, T2, AIS (Adaptive Immune System), 20 000 genes.]

  6. Orthologs under purifying selection. [Diagram: gene A in URBILATERIA; after speciation, HUMAN A and DROSOPHILA A both keep the ancestral function, each under purifying selection.]

  7. Ortholog functional switch. [Diagram: gene A in URBILATERIA; after speciation, DROSOPHILA A keeps the ancestral function, while HUMAN A2 acquires a new function under positive selection or relaxed purifying selection.]

  8. Co-ortholog subfunctionalization. [Diagram: gene A in URBILATERIA; after speciation, DROSOPHILA A keeps the ancestral function; in the human lineage a duplication gives A' and A'', each carrying a sub-function. Label on the diagram: purifying selection.]

  9. Co-ortholog neofunctionalization. [Diagram: gene A in URBILATERIA; after speciation, DROSOPHILA A keeps the ancestral function under purifying selection; in the human lineage a duplication gives A (ancestral function, purifying selection) and A2 (new function, positive or relaxed selection).]

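  10. Orthologs and paralogs. [Diagram: an ancestral gene A gives A1/2 and A3 by duplication; URBILATERIA carries the paralogs (labelled A1, A2, B on the slide); after speciation, the DROSOPHILA multigenic family holds A1, A2, A3 and the HUMAN multigenic family holds A1, A2, A3', A3''.]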

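  11. Orthology / Paralogy
      • Orthologs: two genes in different species that derive from a common ancestral gene and were separated by a speciation event.
      • Paralogs: two genes that result from a duplication event within a genome.
      [Diagram: gene tree rooted at A. A duplication gives A1/2 and A3; A1/2 duplicates into A1 and A2; speciation then separates A1 HUMAN from A1 DROSO, A2 HUMAN from A2 DROSO, and the human A3 lineage from A3 DROSO. A later duplication in the human lineage gives A3' HUMAN and A3'' HUMAN, which are co-orthologues of A3 DROSO.]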
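
The two definitions on slide 11 can be read as a rule on the gene tree: look at the last common ancestor of two genes and check whether that node is a speciation or a duplication. The sketch below is only an illustration of that rule. The `Node` class, the function names and the way the slide 11 tree is encoded are assumptions made for this example, not part of Figenix.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """A gene-tree node; internal nodes carry the event that created them."""
    name: str
    event: Optional[str] = None  # "speciation" or "duplication"; None for leaves
    children: List["Node"] = field(default_factory=list)


def leaf_names(node: Node) -> List[str]:
    """Collect the names of all leaves (present-day genes) below a node."""
    if not node.children:
        return [node.name]
    return [name for child in node.children for name in leaf_names(child)]


def relationship(root: Node, a: str, b: str) -> str:
    """Classify two distinct genes by the event at their last common ancestor."""
    names = leaf_names(root)
    if a == b or a not in names or b not in names:
        raise ValueError("need two distinct genes present in the tree")
    node = root
    while True:
        # Descend while a single child still contains both genes.
        deeper = [c for c in node.children
                  if a in leaf_names(c) and b in leaf_names(c)]
        if not deeper:
            break
        node = deeper[0]
    return "orthologs" if node.event == "speciation" else "paralogs"


# Hypothetical encoding of the slide 11 tree.
tree = Node("A", "duplication", [
    Node("A1/2", "duplication", [
        Node("A1", "speciation", [Node("A1 HUMAN"), Node("A1 DROSO")]),
        Node("A2", "speciation", [Node("A2 HUMAN"), Node("A2 DROSO")]),
    ]),
    Node("A3", "speciation", [
        Node("A3 human lineage", "duplication",
             [Node("A3' HUMAN"), Node("A3'' HUMAN")]),
        Node("A3 DROSO"),
    ]),
])

print(relationship(tree, "A1 HUMAN", "A1 DROSO"))
print(relationship(tree, "A1 HUMAN", "A2 HUMAN"))
print(relationship(tree, "A3' HUMAN", "A3 DROSO"))
```

With this encoding, A1 HUMAN / A1 DROSO come out as orthologs, A1 HUMAN / A2 HUMAN as paralogs, and both A3' HUMAN and A3'' HUMAN as orthologs (co-orthologs) of A3 DROSO.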

  12. How can an orthologous relationship be established? Many scientists use the best BLAST hit to look for orthologous relationships … BUT!
      • Several co-orthologs can be present.
      • Genomes that are not fully sequenced are a problem.
      • So are cases where gene loss occurred.
      … AND … even with phylogenetic analysis:
      • Bias must be corrected.
      • Different methods must be used to reconstruct phylogenetic trees.
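
The co-ortholog problem named on slide 12 can be made concrete with a small sketch of the reciprocal best hit heuristic. Everything here is an assumption for illustration: the function names are hypothetical, the scores are invented toy numbers (not BLAST output), and the gene names reuse the slide 11 tree.

```python
def best_hits(scores):
    """Map each query gene to its single highest-scoring subject gene."""
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return pairs (a, b) where a's best hit is b and b's best hit is a."""
    forward = best_hits(a_vs_b)
    backward = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if backward.get(b) == a)


# Invented similarity scores, for illustration only.
droso_vs_human = {
    ("A1 DROSO", "A1 HUMAN"): 250,
    ("A3 DROSO", "A3' HUMAN"): 200,
    ("A3 DROSO", "A3'' HUMAN"): 190,
}
human_vs_droso = {
    ("A1 HUMAN", "A1 DROSO"): 250,
    ("A3' HUMAN", "A3 DROSO"): 200,
    ("A3'' HUMAN", "A3 DROSO"): 190,
}

pairs = reciprocal_best_hits(droso_vs_human, human_vs_droso)
print(pairs)
```

In this toy case only A3' HUMAN is paired with A3 DROSO. A3'' HUMAN, which is equally a co-ortholog on the tree, is silently dropped because a best-hit rule keeps one partner per gene.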

  13. Co-ortholog neofunctionalization. [Diagram: gene A in URBILATERIA; after speciation, DROSOPHILA A keeps the ancestral function under purifying selection; in the human lineage a duplication gives A (ancestral function, purifying selection) and A2 (new function).]

  14. Paralogue replacement: constitutive proteasome β-subunits are replaced after interferon-γ stimulation. (Paralogue = duplicated gene.)
      Constitutive proteasome -> immunoproteasome: PSMB5 -> PSMB8 (LMP 7); PSMB6 -> PSMB9 (LMP 2); PSMB7 -> PSMB10 (LMP Z)
      Constitutive proteasome:
      • Ancestral function: protein degradation
      • Present in all metazoans, therefore present in Urbilateria (Metazoan ancestor).
      Immunoproteasome:
      • New function (specialization): degradation of proteins or peptides of specific size, used by the MHC system
      • Only found in vertebrates

  15. Large-scale gene duplication in the vertebrate lineage. [Figure: the dated tree of slide 5 (node ages in million years: 360, 450, 528, 564, 751, >751, <833-993, 833-993; taxa from Protostomata (nematode, C. elegans; insects, Drosophila) through Echinodermata, Urochordata (Ciona), Cephalochordata (amphioxus), Myxini (hagfish), Cephalaspidomorphi (lamprey), Pikaia, Chondrichthyes (shark), Actinopterygii (zebrafish), Lissamphibia to Amniota (human)), now annotated with PROTEASOME, Proteasome and Immuno Proteasome alongside T1, T2 and 20 000 genes.]

  16. [Figure: phylogenetic tree of PSMB7 and PSMB10 with branch support values and a scale bar of 0.1. A node labelled "Duplication" separates two vertebrate groups: PSMB7 (Mus, Ratt, Bos, Homo, Gall, Xeno, Zebra, Fugu) and PSMB10 (Zebra, Fugu, Bos, Mus, Homo). Unduplicated PSMB7/10 sequences: Bran, Ci-zeta (Ciona), Bombyx, Prosbeta2, CG18341 (Drosophila).]

  17. PHYLOGENOMICS = the STUDY of the history of genes and genomes. => It HELPS to find evidence for gene function.

  18. • Comparative genomics, concept of orthology and paralogy.
      • What is phylogenomics?
      • Structural and functional annotation.
      • Structural annotation (deciphering of gene structure).
      • Functional annotation (especially the use of phylogeny to decipher protein function): Figenix.
      • Genome evolution: CASSIOPE

  19. A correct structural prediction for a correct phylogenetic analysis.

  20. Structural annotation. Genome nucleotide-level annotation:
      • Mapping
      • Finding genomic landmarks
      • Gene finding and protein prediction
      • Non-coding RNAs and regulatory regions
      • Identifying repetitive elements
      • Mapping segmental duplications
      • Mapping variations (SNPs, microsatellites, …)

  21. Available tools: state of the art in structural annotation
      Ab initio: Genscan, Fgenesh, Genie, etc.
      • Based on statistical signals within the DNA: coding propensity (hexamer signals), splice site signals.
      • Strengths: easy and quick to run; only need DNA as input.
      • Weakness: high false positive rate.
      Similarity-assisted: GenomeScan, Twinscan
      • Extensions of ab initio programs that use sequence similarities to guide the predictions.
      • Strengths: should be better than pure ab initio.
      • Weakness: high false positive rate.
      Similarity-based: Genewise, Sim4, Est2genome
      • Alignment programs that know about gene structure; very accurate with strong sequence similarities.
      • Strengths: accurate.
      • Weakness: need strong similarities; slow to run.
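
To give a feel for the "coding propensity (hexamer signals)" mentioned for ab initio tools, here is a minimal log-odds sketch: hexamer frequencies are counted in known coding and non-coding sequences, and a candidate region is scored by summing log ratios. This is a simplified illustration under stated assumptions, not the model used by Genscan, Fgenesh or Genie; the function names, the pseudocount and the two toy training sequences are all invented for the example.

```python
import math
from collections import Counter


def hexamer_counts(seqs):
    """Count every overlapping 6-mer in a list of DNA sequences."""
    counts = Counter()
    for seq in seqs:
        for i in range(len(seq) - 5):
            counts[seq[i:i + 6]] += 1
    return counts


def log_odds_table(coding, noncoding, pseudo=1.0):
    """Log ratio of hexamer frequency in coding versus non-coding training data."""
    c, n = hexamer_counts(coding), hexamer_counts(noncoding)
    keys = set(c) | set(n)
    c_total = sum(c.values()) + pseudo * len(keys)
    n_total = sum(n.values()) + pseudo * len(keys)
    return {k: math.log(((c[k] + pseudo) / c_total) / ((n[k] + pseudo) / n_total))
            for k in keys}


def coding_score(seq, table):
    """Sum log-odds over the hexamers of seq; unseen hexamers contribute 0."""
    return sum(table.get(seq[i:i + 6], 0.0) for i in range(len(seq) - 5))


# Toy training data, invented for illustration only.
table = log_odds_table(coding=["ATGGCCGCCGCCGCC"], noncoding=["ATATATATATATATA"])
print(coding_score("GCCGCCGCC", table) > 0)   # resembles the coding training set
print(coding_score("ATATATATA", table) < 0)   # resembles the non-coding set
```

A score built only from such local statistics needs nothing but DNA as input, which matches the strength listed on the slide, and it also shows why false positives are common: any region with coding-like composition scores well.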

  22. Structural annotation: «FIGENIX SOFTWARE PLATFORM». Annotation method:
      • Structural annotation combines a statistical approach with a homology-based approach (similarities with known proteins). Automating the process resulted in an expert system based on biological inference rules, using gene history and ab initio programs.

  23. [Figure: a DNA segment with two regions. Region 1 carries HSPs A1, A2, A3 against protein A (best hit for region 1); region 2 carries HSPs B1, B2 against protein B (best hit for region 2).]

  24. [Figure: signals along the + strand: M D A A D A D A D A D D A D A D D A S A.] The "best" solution will be, for example

  25. Validation of structural annotation. [Diagram: gene (nucleotide sequence) -> transcription -> mRNA (nucleotide sequence) -> translation -> protein (amino acid sequence); the protein predicted from the genome sequence is compared with the protein known from sequence experimentation (result: 100%).] Genscan: 31%; HMMGene: 38%; Figenix: 87%. The platform's performance was validated on a standard dataset (HMR195); see Guigó et al., 2000; Rogic et al., 2001.

  26. Structural annotation: accuracy versus exon type and prediction
      Program   Initial (55)  Internal (186)  Terminal (55)  Over-prediction  Correct protein prediction
      Genscan   0.55          0.80            0.65           0.22             0.31
      HMMGene   0.75          0.81            0.78           0.15             0.38
      Figenix   0.91          0.92            0.95           0.05             0.87
      (The first three columns give accuracy by exon type.) The mouse and rat sequences from the HMR195 dataset were used against the human division of Swiss-Prot.

  27. Functional annotation. Biochemical and biological process:
      • Experimental approach:
        • RNA interference
        • Tandem affinity purification and mass spectrometry
      • In silico:
        • Similarity
        • …

  28. Functional annotation
      • Functional annotation based on phylogeny. Function is inferred exclusively from experimentally annotated genes…

  29. A small fraction corresponds to known, well-characterized proteins. If the function is unknown, phylogenetic analysis is used:
      • Case 1: an ortholog of experimentally known function is found. The function of the gene to annotate can be deduced from it.
      • Case 2: no ortholog of experimentally known function is found. The function of the gene to annotate is deduced from the function of the closest paralog.
      In both cases, protein molecular function prediction by Bayesian phylogenomics can be used (Engelhardt et al., PLoS Computational Biology, 2005).
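
The two cases on slide 29 amount to a simple decision rule, sketched below. The function name, its arguments and the gene and function labels are hypothetical; in particular, taking "closest paralog" to mean the paralog with the smallest tree distance among those with a known function is an assumption of this sketch.

```python
def transfer_function(orthologs, paralog_distances, known_functions):
    """Return (function, source gene, evidence) following the two cases.

    orthologs: genes orthologous to the gene being annotated.
    paralog_distances: paralog name -> tree distance to the gene being annotated.
    known_functions: gene name -> experimentally established function.
    """
    # Case 1: an ortholog with experimentally known function exists.
    for gene in orthologs:
        if gene in known_functions:
            return known_functions[gene], gene, "ortholog"
    # Case 2: fall back on the closest paralog with a known function.
    for gene, _ in sorted(paralog_distances.items(), key=lambda kv: kv[1]):
        if gene in known_functions:
            return known_functions[gene], gene, "closest paralog"
    return None, None, "no evidence"


# Hypothetical genes and functions, for illustration only.
case1 = transfer_function(["orth1"], {"par1": 0.4},
                          {"orth1": "function F1", "par1": "function F2"})
case2 = transfer_function(["orth1"], {"par1": 0.4, "par2": 0.1},
                          {"par1": "function F2", "par2": "function F3"})
print(case1)
print(case2)
```

Returning the source gene and the kind of evidence alongside the function keeps an ortholog-based annotation distinguishable from the weaker paralog-based one.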

  30. Functional annotation. Orthologs and paralogs with experimentally known function: how the information can be found. [Diagram labels: Gene Ontology, SwissProt, GenBank, MedLine; Textual Information Analysis; G.O. Standard.]

  31. Functional annotation: Gene Ontology classification. Functionality classification uses three GO categories:
      • Biological process: the biological process to which the gene or gene product contributes.
        • Cell growth and maintenance; pyrimidine metabolism; …
      • Molecular function: the biochemical activity of a gene product, including specific binding to ligands or structures.
        • Enzyme; transporter; Toll receptor ligand; …
      • Cellular component: the place in the cell where a gene product is active.
        • Cytoplasm; ribosome; …

  32. Tumor necrosis factor family. Phylogenetic tree: ortholog identification. [Figure: tree of TNFSF1, 2, 5, 6, 7, 8, 9, 10, 11, 13, 14 and 15 sequences from species abbreviated Hsa, Mmu, Rno, Gga, Xla, Dre, Pol and Bbo, together with EIGER (DmeTNF); branch support values; scale bar 0.2; three groups labelled DF1, DF2, DF3. Annotations: atherosclerotic plaque formation; ALPS - LPR/GLD lymphoproliferative syndrome.] Source: Trends in Immunology (July 2003)

  33. Human TNF family. Phylogenetic tree: search for the closest paralog. Functional annotation (columns: Molecular Function, Biological Process).
      [Figure: TNF ligands and receptors placed on the tree, each with its biological process.]
      Genes shown, in slide order: TNFRSF10B, TNFRSF10A, TNFRSF10C, TNFRSF10D; TNFSF3, TNFRSF3; TNFSF1, TNFRSF1A; TNFSF2, TNFRSF1B; TNFRSF12, TNFSF15; TNFRSF14, TNFSF14, TNFRSF6B; TNFSF6, TNFRSF6; TNFSF18, TNFRSF18; TNFSF4, TNFRSF4; TNFSF5, TNFRSF5; TNFSF10, TNFRSF11B; TNFSF11, TNFRSF11A; BR3, TNFSF13B, TNFRSF17, TNFSF13, TACI; TNFSF12?; TNFSF7, TNFRSF7; TNFSF9, TNFRSF9; TNFSF8, TNFRSF8; TNFRSF19 ?; EDA-A1, EDAR; EDA-A2, XEDAR; TNFRSF21 ?; RELT ?
      Biological processes listed: LN, PP, GC, tumoricidal activity; PP, GC, T cell homeostasis (death); T cell homeostasis (death); T cell costimulation, negative selection?; T cell homeostasis (survival?), CTL activation, peripheral tolerance?; T cell homeostasis (death), CTL function, peripheral tolerance, T cell costimulation, chemotaxis; T cell transmigration and homeostasis (survival)?; T cell homeostasis (survival), peripheral tolerance; GC, B cell function, peripheral tolerance, T cell priming; tumoricidal activity, T cell function?; LN, bone homeostasis, mammary gland development; B cell homeostasis; T cell activation?; T cell activation and survival, CTL activity, tumoricidal activity?; negative selection, autoimmunity; tooth, hair, sweat gland formation; tooth, hair, skin formation?
      Source: Trends in Immunology (July 2003)

  34. COMPUTERIZATION OF THE CONCEPTS

  35. FIGENIX. FIGENIX is a multi-user software platform dedicated to structural and functional annotation tasks:
      - Gene prediction for large DNA sequences
      - Construction of robust phylogenetic trees
      - Automatic detection of orthologs and paralogs
      - Automatic retrieval of functional data on genes from web databases
      - Filtering and construction of protein databases (EST contig assembly)
      - Chained processes (e.g. gene prediction followed by a phylogenetic study of each predicted gene)

  36. STEPS OF THE phylogeny PIPELINE (1). [Flowchart; boxes in reading order:]
      • Protein sequence encoded by a putative gene
      • Databases: Ensembl, NR…
      • BLAST + filtering
      • CLUSTAL W + purification + bias correction
      • Multiple alignment
      • Domain search with HmmPFAM (PFAM)
      • Keep monophyletic "repeats"
      • Enumerate domains
      • Build the Tree of Life -> reference tree
      • Alignment of merged "repeats"
      • Do "repeats" exist? (yes / no)
      • Composition test with TREE-PUZZLE to eliminate sequences that are too divergent
      • Create the "FIGENIX" domain (correctDomains)
      • Keep the complete alignment

  37. STEPS OF THE phylogeny PIPELINE (2). [Flowchart; boxes in reading order:]
      • Detect "paralogy groups" + eliminate sites that evolve too fast ("Gu test")
      • Eliminate sequences with >30% gaps
      • Build the Tree of Life -> reference tree
      • Eliminate the most incongruent domains, detected by HomPart in PAUP
      • Saturation test
      • NJ, parsimony and maximum likelihood: one tree each
      • Compare topologies with Templeton-Hasegawa tests
      • Topologies congruent? (yes / no) -> consensus tree or NJ tree
      • Ortholog detection / search for functions
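
One pipeline step is simple enough to show directly: discarding aligned sequences that contain more than 30% gaps. The sketch below assumes an alignment held as a dictionary of equal-length strings with "-" as the gap character; the function name and the toy sequences are hypothetical, and this is not FIGENIX code.

```python
def drop_gappy_sequences(alignment, max_gap_fraction=0.30):
    """Keep only aligned sequences whose gap fraction does not exceed the limit."""
    kept = {}
    for name, seq in alignment.items():
        if not seq:
            continue  # an empty row carries no information
        if seq.count("-") / len(seq) <= max_gap_fraction:
            kept[name] = seq
    return kept


# Toy alignment, invented for illustration only.
alignment = {
    "seq1": "MKTALLAVGA",   # 0% gaps
    "seq2": "MKT-LL-VGA",   # 20% gaps
    "seq3": "M---L--VGA",   # 50% gaps
}
print(sorted(drop_gappy_sequences(alignment)))
```

Here seq3 is removed and the other two rows are kept. Filtering before tree building keeps poorly aligned or fragmentary sequences from distorting the NJ, parsimony and maximum likelihood reconstructions that follow.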

  38. FIGENIX architecture. [Diagram components: EST Agent, MGI Agent, GO Agent, Functional Collector Agent, Archiver, RDBMS, Expert System, Annotation Engine, Persistence Layer, Repository, Load Balancing, Security, ..., Web Server, Request, Data exchange, EGEE, Genomic Data.]
      - Intranet/extranet platform
      - 3-tier architecture (web interface / business-logic servers / database)

  39. Results (1) EGEE

  40. Results (2) EGEE

  41. • Gouret P, Vitiello V, Balandraud N, Gilles A, Pontarotti P, Danchin EG. FIGENIX: intelligent automation of genomic annotation: expertise integration in a new software platform. BMC Bioinformatics. 2005 Aug 5;6:198.
      • Balandraud N, Gouret P, Danchin EGJ, Blanc M, Zinn D, Roudier J, Pontarotti P. A rigorous method for multigenic families' functional annotation: the peptidyl arginine deiminase (PADs) proteins family example. BMC Genomics. 2005;6:153. doi:10.1186/1471-2164-6-153

  42. Analysis using Figenix
      • Vienne et al. Evolution of the proto-MHC ancestral region: more evidence for the plesiomorphic organisation of human chromosome 9q34 region. Immunogenetics. 2003;55(7):429-36.
      • Danchin E, et al. The Major Histocompatibility Complex origin. Immunological Reviews. 2004 April;198(1):216-232.
      • Danchin EGJ, Gouret P, Pontarotti P. Universally conserved genes lost in mammals and vertebrates. BMC Evolutionary Biology, accepted.
      • Yu C, et al. Roles of co-option in the emergence of vertebrate adaptive immune system, insights from amphioxus. Submitted.
      Online users: INSERM U624*, TAGC, UPRESA CNRS 6032*, Marseille; INRA Nancy; Institute Mol. Genet., Acad. Sci. Czech Republic; Sun Yat-sen University, China; Uppsala University, Department of Neuroscience, Sweden. (* Draft papers)

  43. • Comparative genomics, concept of orthology and paralogy.
      • What is phylogenomics?
      • Structural and functional annotation.
      • Structural annotation (deciphering of gene structure).
      • Functional annotation (especially the use of phylogeny to decipher protein function): Figenix.
      • Genome evolution: CASSIOPE

  44. C.A.S.S.I.O.P.E
      • C.A.S.S.I.O.P.E: Clever Agent System for Synteny Inheritance and Other Phenomena in Evolution
      • Finds conserved regions between genomes
      • C.A.S.S.I.O.P.E reduces the working time 50-fold
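
The slide does not describe how C.A.S.S.I.O.P.E finds conserved regions, so the following is only a generic sketch of the idea of a conserved (syntenic) region: a run of neighbouring genes in genome A whose orthologs are also neighbours in genome B. The function name, the one-to-one ortholog map and the toy gene orders are assumptions for this example.

```python
def conserved_blocks(order_a, order_b, orthologs, min_len=2):
    """Find runs of adjacent genes in A whose orthologs are adjacent in B.

    order_a, order_b: gene names in chromosomal order.
    orthologs: gene in A -> its ortholog in B (assumed one-to-one here).
    """
    position_b = {gene: i for i, gene in enumerate(order_b)}
    blocks, current, previous = [], [], None
    for gene in order_a:
        pos = position_b.get(orthologs.get(gene))
        if pos is not None and previous is not None and abs(pos - previous) == 1:
            current.append(gene)          # extends the current conserved run
        else:
            if len(current) >= min_len:
                blocks.append(current)    # close the previous run
            current = [gene] if pos is not None else []
        previous = pos
    if len(current) >= min_len:
        blocks.append(current)
    return blocks


# Toy gene orders, invented for illustration only.
order_a = ["a1", "a2", "a3", "a4", "a5"]
order_b = ["b3", "b2", "b1", "bx", "b5", "b4"]
orthologs = {f"a{i}": f"b{i}" for i in range(1, 6)}
print(conserved_blocks(order_a, order_b, orthologs))
```

In the toy data, a1-a3 and a4-a5 form two conserved blocks, both inverted in genome B and separated there by the unrelated gene bx. Real synteny detection also has to tolerate gene loss, insertions and duplicated genes, which this sketch ignores.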

  45. C.A.S.S.I.O.P.E.

  46. Towards the reconstruction of ancestral genomes

  47. • Philippe Gouret • Vérane Vitiello • Nathalie Balandraud • Alexandre Vienne • Virginie Lopez • Magali Lienart • Pierre Pontarotti
      Collaboration: Etienne Danchin (AFMB), Etienne Pardoux, Simona Grusea
