Orthology Analysis. Erik Sonnhammer, Center for Genomics and Bioinformatics, Karolinska Institutet, Stockholm. Outline: basic concepts; BLAST-based approaches to orthology; tree-based approaches to orthology; domain-level orthology. Homologs = genes with a common origin.
Orthology Analysis
Erik Sonnhammer
Center for Genomics and Bioinformatics
Karolinska Institutet, Stockholm
Outline
• Basic concepts
• BLAST-based approaches to orthology
• Tree-based approaches to orthology
• Domain-level orthology
Homologs = genes with a common origin
• May be genes in the same or in different organisms
• Does not say that function is identical
• Can only be true or false, and not a percentage!
• Homologs have the same 3D-structure layout
[Figure: homologs divide into orthologs and paralogs]
[Figure: gene tree along a time axis. Gene X in an ancient animal duplicates (D) into gene X and gene Y in an ancient mammal. Speciation (S) then yields gene X in human and gene X in rat, and gene Y in human and gene Y in rat; gene Y in human later duplicates (D) into Y1 and Y2. Orthologs are separated by speciation. Labels: orthologs, paralogs, in-paralogs, out-paralogs.]
In/Out-paralog definition
• In-paralogs ~ co-orthologs: paralogs that were duplicated after the speciation and hence are orthologs to a cluster in the other species
• Out-paralogs = not co-orthologs: paralogs that were duplicated before the speciation. Not necessarily in the same species.
Sonnhammer & Koonin, Trends Genet. 18:619-620 (2002)
Orthologs for functional genomics
• Co-orthologs / in-paralogs are more likely than out-paralogs to have identical biochemical functions and biological roles.
• Co-orthologs can be used to discover human gene function via model organism experiments.
• Co-orthologs are key to exploiting functional genomics/proteomics data in model organisms.
Orthology and function conservation
• Orthology does not say anything about evolutionary distance.
• Close orthologs, e.g. human-mouse, are very likely to have the same biological role in the organism.
• Distant orthologs, e.g. human-worm, are less likely to have the same phenotypic role, but may have the same role in the corresponding pathway.
How to find orthologs?
1. Calculate a phylogenetic tree and look for orthologs in the tree (Orthostrapper, RIO).
2. Two-way best matches between two species can be used to find orthologs without trees. [However, in-paralogs are harder to find this way.]
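The two-way best match idea can be sketched in a few lines of Python. This is only an illustration, not the procedure of any particular tool: the gene names and similarity scores are made up, and the function names `best_hits` and `reciprocal_best_hits` are hypothetical.

```python
def best_hits(scores):
    """Map each query to its top-scoring subject.

    scores: dict of (query, subject) -> similarity score (e.g. a BLAST bit score).
    """
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (a, b) pairs that are each other's best hit."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Hypothetical scores: human genes searched against rat, and the reverse.
human_vs_rat = {
    ("humanX", "ratX"): 500, ("humanX", "ratY"): 120,
    ("humanY1", "ratY"): 450, ("humanY1", "ratX"): 110,
    ("humanY2", "ratY"): 430, ("humanY2", "ratX"): 100,
}
rat_vs_human = {
    ("ratX", "humanX"): 500, ("ratX", "humanY1"): 110,
    ("ratY", "humanY1"): 450, ("ratY", "humanY2"): 430,
    ("ratY", "humanX"): 120,
}

pairs = reciprocal_best_hits(human_vs_rat, rat_vs_human)
print(pairs)
```

In this toy data humanY2 is not reported, although its best rat hit is ratY: only one gene per species can be the two-way best match, which is why in-paralogs need an extra step.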
COGs
[Figure: cluster COG2813, containing orthologs and out-paralogs]
Inparanoid (Inpara-n-oid): Inparalog 'n ortholog identification
[Figure legend: blue = species 1, red = species 2]
Inparanoid
[Figure legend: blue = species 1, red = species 2]
Resolve overlapping clusters
• No overlap: no problems
• Partial overlap: separate
• Complete overlap: merge
Inparalog score
[Figure: scale from 0 (B) to 100% (A), with inparalog P placed in between]
Score for inparalog P = (score_AP - score_AB) / (score_AA - score_AB)
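As a worked example, the inparalog score formula can be written as a small Python function. The argument names and the numbers below are illustrative assumptions; reading A as the main ortholog, B as its ortholog in the other species and P as the candidate inparalog follows the formula on the slide.

```python
def inparalog_score(score_ap, score_ab, score_aa):
    """Score for inparalog P = (score_AP - score_AB) / (score_AA - score_AB).

    score_ap: similarity score between main ortholog A and candidate P
    score_ab: similarity score between A and its ortholog B in the other species
    score_aa: self-score of A
    """
    span = score_aa - score_ab
    if span <= 0:
        raise ValueError("score_AA must be larger than score_AB")
    return (score_ap - score_ab) / span


# Hypothetical scores: A scores 100 against itself, 50 against B, 80 against P.
print(inparalog_score(score_ap=80, score_ab=50, score_aa=100))
```

A candidate that scores as well against A as A does against itself gets 1.0 (100%), and one that is only as similar to A as B is gets 0.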
Confidence values for main orthologs from sampling

TVHIVDDEEPVR---KSLAFM---LTMNGFA
T+ ++DD   +R   K L  M   +T+ G A
TILLIDDHPMLRTGVKQLISMAPDITVVGEA

Sampling with replacement; insertions kept intact

GAFDEP---LVTHVR..........
GA +     ++T +R
GAEEHMAPDILTLLR..........

"Bootstrap alignment" -> "bootstrap score"
Confidence = (number of bootstrap alignments that remain best-best matches) / (number of bootstraps)
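A rough Python sketch of the sampling step follows. It assumes that "insertions kept intact" means a run of consecutive gapped columns is resampled as one unit; that reading, and the function names, are assumptions rather than a description of the actual Inparanoid code. Rescoring each bootstrap alignment is left out, so `confidence` simply takes one true/false flag per bootstrap.

```python
import random
from itertools import groupby


def alignment_units(seq_a, seq_b):
    """Split a pairwise alignment into resampling units.

    Each ungapped column is its own unit; a run of consecutive gapped
    columns is kept together as a single unit.
    """
    columns = list(zip(seq_a, seq_b))
    units = []
    for has_gap, group in groupby(columns, key=lambda col: "-" in col):
        group = list(group)
        if has_gap:
            units.append(group)
        else:
            units.extend([col] for col in group)
    return units


def bootstrap_alignment(seq_a, seq_b, rng):
    """Sample units with replacement and return the resampled sequence pair."""
    units = alignment_units(seq_a, seq_b)
    sampled = rng.choices(units, k=len(units))
    columns = [col for unit in sampled for col in unit]
    return "".join(c[0] for c in columns), "".join(c[1] for c in columns)


def confidence(still_best_flags):
    """Fraction of bootstraps in which the pair remained the best-best match."""
    return sum(still_best_flags) / len(still_best_flags)


seq_a = "TVHIVDDEEPVR---KSLAFM---LTMNGFA"
seq_b = "TILLIDDHPMLRTGVKQLISMAPDITVVGEA"
boot_a, boot_b = bootstrap_alignment(seq_a, seq_b, random.Random(1))
print(boot_a)
print(boot_b)
print(confidence([True, True, False, True]))
```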
inparanoid.cgb.ki.se
Homo sapiens vs. C. elegans
Remm et al., J. Mol. Biol. 314:1041-1052 (2001)
Drawbacks of BLAST-based orthology assignment
• No guarantee that the same segment is used in different sequences
• No evolutionary distance model
• Does not take multiple domains into account
Domain orthology
• Inparanoid Human-Fly ortholog pairs with domains in Pfam-A 13.0: 20335
• Different domain architectures: 5411
• Many of these are minor differences, e.g. 22 vs 21 Spectrin repeats
• Sometimes the difference is big:
[Figure: domain architectures of an ortholog pair, with domains labeled ef-hand, UCH, TBC, UCH]
Distance-based tree building
A1 MKFYSLPNFPEN
A2 MKYYKLPDLPDE
A3 MRFYTACENPRS
[Figure: distance matrix for A1, A2 and A3]
• Bootstrapping:
• Randomly pick columns to build a bootstrap alignment, calculate tree
• Repeat 1000 times; frequency of a node = its bootstrap support
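The column bootstrap can be illustrated on the three sequences from the slide. The sketch below makes two simplifying assumptions that are not from the slides: the distance is the plain fraction of differing positions (p-distance), and instead of building a full tree it only asks which pair is closest, which for three sequences plays the role of the single internal node of a rooted tree.

```python
import random

ALIGNMENT = {
    "A1": "MKFYSLPNFPEN",
    "A2": "MKYYKLPDLPDE",
    "A3": "MRFYTACENPRS",
}


def p_distance(seq_a, seq_b):
    """Fraction of aligned positions at which the two sequences differ."""
    return sum(x != y for x, y in zip(seq_a, seq_b)) / len(seq_a)


def bootstrap_columns(alignment, rng):
    """Build a bootstrap alignment by picking columns with replacement."""
    length = len(next(iter(alignment.values())))
    picked = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[i] for i in picked) for name, seq in alignment.items()}


def closest_pair(alignment):
    """Return the pair of sequence names with the smallest p-distance."""
    names = sorted(alignment)
    pairs = [(a, b) for i, a in enumerate(names) for b in names[i + 1:]]
    return min(pairs, key=lambda p: p_distance(alignment[p[0]], alignment[p[1]]))


def bootstrap_support(alignment, pair, replicates=1000, seed=0):
    """Fraction of bootstrap alignments in which `pair` is the closest pair."""
    rng = random.Random(seed)
    hits = sum(
        closest_pair(bootstrap_columns(alignment, rng)) == pair
        for _ in range(replicates)
    )
    return hits / replicates


print(closest_pair(ALIGNMENT))
print(bootstrap_support(ALIGNMENT, ("A1", "A2")))
```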
Orthology by tree reconciliation
[Figure: a species tree and a gene tree; reconciling them infers 2 duplications and 2 losses]
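One common way to reconcile a gene tree with a species tree is LCA mapping: each gene-tree node is mapped to the last common ancestor of its children's species, and a node that maps to the same place as one of its children is called a duplication. The sketch below is a minimal version under stated assumptions: trees are nested tuples, the genes and species reuse the human/rat example of genes X, Y, Y1 and Y2, and losses are not counted. It is not the slide's own example and not the implementation of any specific tool.

```python
def build_parents(tree, parent=None, parents=None):
    """Record the parent of every node (subtree or leaf) in a species tree."""
    if parents is None:
        parents = {}
    parents[tree] = parent
    if isinstance(tree, tuple):
        for child in tree:
            build_parents(child, tree, parents)
    return parents


def ancestors(node, parents):
    """Return the path from a node up to the root, starting with the node itself."""
    path = []
    while node is not None:
        path.append(node)
        node = parents[node]
    return path


def lca(a, b, parents):
    """Last common ancestor of two species-tree nodes."""
    seen = set(ancestors(a, parents))
    for node in ancestors(b, parents):
        if node in seen:
            return node
    raise ValueError("nodes are not in the same tree")


def reconcile(gene_tree, species_of, parents, events):
    """Map a gene-tree node into the species tree and label internal nodes."""
    if isinstance(gene_tree, str):
        return species_of[gene_tree]
    child_maps = [reconcile(c, species_of, parents, events) for c in gene_tree]
    mapped = child_maps[0]
    for other in child_maps[1:]:
        mapped = lca(mapped, other, parents)
    # A node mapping to the same species-tree node as a child is a duplication.
    events[gene_tree] = "duplication" if mapped in child_maps else "speciation"
    return mapped


species_tree = ("human", "rat")
species_of = {
    "X_human": "human", "X_rat": "rat",
    "Y1_human": "human", "Y2_human": "human", "Y_rat": "rat",
}
gene_tree = (("X_human", "X_rat"), (("Y1_human", "Y2_human"), "Y_rat"))

events = {}
reconcile(gene_tree, species_of, build_parents(species_tree), events)
for node, event in events.items():
    print(event, node)
```

Here the root (the X/Y split) and the Y1/Y2 node come out as duplications, and the two human/rat splits as speciations.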
Drawbacks of tree reconciliation for orthology assignment
• Assumption that the species tree is fully known
• Does not give confidence values
• Gene trees become unreliable when involving a lot of sequences (more data -> less certainty)
• Computationally expensive
Partial tree reconciliation • Find pairwise orthologs by computer parsing of tree.
Pairwise orthology confidence by 'orthostrapping'
[Figure: the original tree with bootstrap support values. Leaves: PIR-S67168, AAF52138.1, T04F8.1, C47D12.3, Y6E2A.9, F37H8.4, AH6.2, C14F5.4, AAF49194.1. Support values: 99, 45, 85, 100, 82, 99]
Pairwise orthology confidence by 'orthostrapping'
[Figure: tree with leaves PIR-S67168, AAF52138.1, T04F8.1, C47D12.3, Y6E2A.9, F37H8.4, AH6.2, C14F5.4, AAF49194.1]
Worm \ Fly   AAF49194.1   AAF52138.1
AH6.2        0            0
F37H8.4      0            0
Y6E2A.9      0            0
C47D12.3     0            0
T04F8.1      0            1
C14F5.4      1            0
Pairwise orthology confidence by 'orthostrapping'
[Figure: tree with leaves PIR-S67168, AAF52138.1, T04F8.1, C47D12.3, Y6E2A.9, F37H8.4, AH6.2, C14F5.4, AAF49194.1]
Worm \ Fly   AAF49194.1   AAF52138.1
AH6.2        0            0
F37H8.4      0            0
Y6E2A.9      0            0
C47D12.3     0            1
T04F8.1      0            2
C14F5.4      2            0
Pairwise orthology confidence by 'orthostrapping'
[Figure: tree with leaves PIR-S67168, AAF52138.1, T04F8.1, C47D12.3, Y6E2A.9, F37H8.4, AH6.2, C14F5.4, AAF49194.1 and bootstrap support values 99, 45, 85, 100, 82, 99]
Worm \ Fly   AAF49194.1   AAF52138.1
AH6.2        0            77
F37H8.4      0            77
Y6E2A.9      0            77
C47D12.3     0            81
T04F8.1      0            98
C14F5.4      99           0
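The tallying behind such a table can be sketched as follows. The sketch assumes that some other step has already parsed each bootstrap tree into a set of orthologous (worm, fly) pairs; the function name `orthostrap_support` and the two toy trees are illustrative, chosen to mirror the first two count tables, and the code is not the Orthostrapper implementation.

```python
from collections import Counter


def orthostrap_support(pairs_per_tree):
    """Percentage of bootstrap trees in which each pair was called orthologous.

    pairs_per_tree: one collection of (worm_gene, fly_gene) pairs per bootstrap tree.
    """
    counts = Counter(pair for pairs in pairs_per_tree for pair in set(pairs))
    total = len(pairs_per_tree)
    return {pair: 100.0 * count / total for pair, count in counts.items()}


# Two toy bootstrap trees, mirroring the counts after one and two trees.
tree_1 = {("T04F8.1", "AAF52138.1"), ("C14F5.4", "AAF49194.1")}
tree_2 = {("T04F8.1", "AAF52138.1"), ("C14F5.4", "AAF49194.1"),
          ("C47D12.3", "AAF52138.1")}

support = orthostrap_support([tree_1, tree_2])
# Keep only pairs above a chosen cut-off, here 75%.
confident = {pair for pair, value in support.items() if value > 75}
print(support)
print(sorted(confident))
```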
Orthology is not transitive!
Combining multiple species at different distances may give erroneous groups that include out-paralogs.
Orthology is not transitive!
[Figure: two gene trees, one with leaves Y, H1, D1, H2, D2 and one with leaves Y, H2, D1]
-> Orthology is strictly defined for only 2 species/clades
Combining species at different distances is very dangerous
But it is OK to combine multiple equidistant ones
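A small Python check makes the non-transitivity concrete. The tree shape used here is an assumption: Y is an outgroup gene, and a duplication before the H/D speciation produced the two lineages (H1, D1) and (H2, D2). Two genes are treated as orthologs when their last common ancestor in the gene tree is a speciation node.

```python
# Internal nodes are (event, left, right); leaves are gene names.
# The topology is an assumed reading of the leaf labels Y, H1, D1, H2, D2.
GENE_TREE = (
    "speciation",
    "Y",
    ("duplication",
     ("speciation", "H1", "D1"),
     ("speciation", "H2", "D2")),
)


def leaves(node):
    """Set of gene names below a node."""
    if isinstance(node, str):
        return {node}
    _event, left, right = node
    return leaves(left) | leaves(right)


def lca_event(node, gene_a, gene_b):
    """Event label of the deepest node that contains both (distinct) genes."""
    event, left, right = node
    for child in (left, right):
        if {gene_a, gene_b} <= leaves(child):
            return lca_event(child, gene_a, gene_b)
    return event


def are_orthologs(gene_a, gene_b, tree=GENE_TREE):
    return lca_event(tree, gene_a, gene_b) == "speciation"


print(are_orthologs("Y", "H2"))   # Y and H2 split at a speciation
print(are_orthologs("Y", "D1"))   # Y and D1 split at a speciation
print(are_orthologs("H2", "D1"))  # H2 and D1 split at the duplication
```

Y is orthologous to both H2 and D1, yet H2 and D1 are out-paralogs, so a group built by chaining pairwise orthology through Y would wrongly put them together.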
HOPS - Hierarchy of Orthologs and Paralogs
• All species in Pfam are bundled in groups according to the scheme:
[Figure: eukaryota splits into metazoa, viridiplantae and fungi; metazoa splits into chordata, arthropoda and nematoda]
• Apply Orthostrapper to groups at the same level in Pfam families
• Display results in NIFAS
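Comparing only groups at the same level can be sketched by enumerating sibling pairs in the grouping scheme. The dictionary below is an assumed reading of the scheme (eukaryota containing metazoa, viridiplantae and fungi; metazoa containing chordata, arthropoda and nematoda), and the function name is hypothetical.

```python
from itertools import combinations

# Assumed grouping scheme: parent group -> groups one level below it.
HIERARCHY = {
    "eukaryota": ["metazoa", "viridiplantae", "fungi"],
    "metazoa": ["chordata", "arthropoda", "nematoda"],
}


def same_level_comparisons(hierarchy):
    """List every pair of sibling groups, i.e. groups at the same level."""
    return [
        (parent, group_a, group_b)
        for parent, children in hierarchy.items()
        for group_a, group_b in combinations(children, 2)
    ]


comparisons = same_level_comparisons(HIERARCHY)
for parent, group_a, group_b in comparisons:
    print(f"{group_a} vs {group_b} (within {parent})")
```

Under this reading there are six pairwise comparisons in total, three within eukaryota and three within metazoa.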
Pfam in brief
[Figure: SEED alignment of representative members -> profile-HMM (HMMer-2.0) -> search database -> FULL alignment, plus a description file. Legend: manually curated vs. automatically made]
• Release 13.0 (April 2004):
• 7426 Pfam-A domain families
• Based on 1,160,000 sequences (Swiss-Prot & TrEMBL)
• 21980 unique Pfam-A domain architectures
• 73% of all proteins have >=1 Pfam-A domain
HOPS results
Pfam 10, 6190 families:
• 2450 families (40%) have HOPS orthologs
• 1319 families (21%) have HOPS orthologs in all 6 pairwise comparisons
• 286356 pairwise orthology assignments (orthostrap support > 75%)
Storm and Sonnhammer, Genome Research 13:2353-2362 (2003)
Ways to access HOPS
• NIFAS graphical browser
• By sequence ID at Pfam.cgb.ki.se/HOPS
• Flatfiles (Orthostrap tables of 2 clades)
ATP sulfurylase domain, metazoa vs fungi Orthologous shuffled domains?
Summary of ATP sulfurylases/APS kinases: shuffled non-orthologous domains
[Figure: domain arrangements in Metazoa and Fungi]