
Orthology Analysis

Erik Sonnhammer, Center for Genomics and Bioinformatics, Karolinska Institutet, Stockholm


Presentation Transcript


  1. Orthology Analysis. Erik Sonnhammer, Center for Genomics and Bioinformatics, Karolinska Institutet, Stockholm

  2. Outline • Basic concepts • BLAST-based approaches to orthology • Tree-based approaches to orthology • Domain-level orthology

  3. Homologs = genes with a common origin • May be genes in the same or in different organisms • Does not say that function is identical • Can only be true or false, and not a percentage! • Homologs have the same 3D-structure layout

  4. Homologs Orthologs Paralogs

  5. Orthologs: separated by speciation. [Figure: gene tree along a time axis, with nodes marked S (speciation) and D (duplication). Genes shown: Gene X in ancient animal; Gene X and Gene Y in ancient mammal; Gene X in human; Gene X in rat; Gene Y in rat; Gene Y1 and Gene Y2 in human. Relationships labeled: orthologs, paralogs, in-paralogs, out-paralogs.]

  6. In/Out-paralog definition • In-paralogs ~ co-orthologs: paralogs that were duplicated after the speciation and hence are orthologs to a cluster in the other species • Out-paralogs = not co-orthologs: paralogs that were duplicated before the speciation. Not necessarily in the same species. Sonnhammer & Koonin, Trends Genet. 18:619-620 (2002)

  7. Orthologs for functional genomics • Co-orthologs / inparalogs are more likely than outparalogs to have identical biochemical functions and biological roles. • Co-orthologs can be used to discover human gene function via model organism experiments • Co-orthologs are key to exploiting functional genomics/proteomics data in model organisms

  8. Orthology and function conservation • Orthology does not say anything about evolutionary distance. • Close orthologs, e.g. human-mouse are very likely to have the same biological role in the organism. • Distant orthologs, e.g. human-worm are less likely to have the same phenotypical role, but may have the same role in the corresponding pathway.

  9. Ortholog Databases

  10. How to find orthologs? 1. Calculate a phylogenetic tree and look for orthologs in the tree (Orthostrapper, RIO). 2. Two-way best matches between two species can be used to find orthologs without trees. [However, in-paralogs are harder to find this way]

  11. Two-way best match approach to finding orthologs
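
The two-way best match idea can be sketched in a few lines. This is only an illustration under stated assumptions: the score dictionaries stand in for parsed BLAST results, and the gene names and scores are hypothetical, not data from the presentation.

```python
def best_hits(scores):
    """Map each query to its highest-scoring target.

    `scores` maps (query, target) -> similarity score
    (assumed here to be something like a BLAST bit score).
    """
    best = {}
    for (query, target), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (target, score)
    return {query: target for query, (target, _) in best.items()}


def two_way_best_matches(scores_1_vs_2, scores_2_vs_1):
    """Return pairs (a, b) where a's best hit is b and b's best hit is a."""
    best_12 = best_hits(scores_1_vs_2)
    best_21 = best_hits(scores_2_vs_1)
    return sorted((a, b) for a, b in best_12.items() if best_21.get(b) == a)


# Hypothetical scores: hY1 and hY2 are recent duplicates in species 1.
human_vs_rat = {
    ("hX", "rX"): 300, ("hX", "rY"): 120,
    ("hY1", "rY"): 250, ("hY1", "rX"): 110,
    ("hY2", "rY"): 240, ("hY2", "rX"): 100,
}
rat_vs_human = {
    ("rX", "hX"): 300, ("rX", "hY1"): 110,
    ("rY", "hY1"): 250, ("rY", "hY2"): 240, ("rY", "hX"): 120,
}

pairs = two_way_best_matches(human_vs_rat, rat_vs_human)
print(pairs)
```

In this toy input hY2 is left out even though it is as closely related to rY as hY1 is, which is the sense in which in-paralogs are harder to find with best matches alone.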

  12. COGs. COG2813: [Figure: cluster with members labeled as orthologs and out-paralogs.]

  13. Inparanoid: Inparalog 'n ortholog identification (Inpara-n-oid). [Figure: blue = species 1, red = species 2.]

  14. Inparanoid. [Figure: blue = species 1, red = species 2.]

  15. Resolve overlapping clusters • No overlap: no problems • Partial overlap: separate • Complete overlap: merge

  16. Inparalog score. [Figure: scale from 0 to 100% with the positions of A, B and the inparalog P.] Score for inparalog P = (scoreAP - scoreAB) / (scoreAA - scoreAB)
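
As a worked example of the inparalog score formula, the sketch below evaluates it for made-up scores. The function name and the numbers are assumptions for illustration; only the formula itself comes from the slide.

```python
def inparalog_score(score_ap, score_ab, score_aa):
    """(scoreAP - scoreAB) / (scoreAA - scoreAB), as given on the slide.

    score_ap: similarity score between main ortholog A and candidate P
    score_ab: score between A and its ortholog B in the other species
    score_aa: self-score of A
    """
    if score_aa == score_ab:
        raise ValueError("scoreAA equals scoreAB; the score is undefined")
    return (score_ap - score_ab) / (score_aa - score_ab)


# Hypothetical scores: P lies halfway between B (0%) and A itself (100%).
example = inparalog_score(score_ap=400, score_ab=300, score_aa=500)
print(f"{example:.0%}")
```

With this definition a candidate that scores no better against A than B does gets 0, and a sequence identical to A gets 1.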

  17. Confidence values for main orthologs from sampling
      Original alignment:
        TVHIVDDEEPVR---KSLAFM---LTMNGFA
        T+ ++DD   +R   K L  M   +T+ G A
        TILLIDDHPMLRTGVKQLISMAPDITVVGEA
      Sampling with replacement; insertions kept intact:
        GAFDEP---LVTHVR..........
        GA +     ++T +R
        GAEEHMAPDILTLLR..........
      "Bootstrap alignment" -> "bootstrap score"
      Confidence = (bootstrap alignments with best-best matches / nr of bootstraps)
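
The sampling idea can be sketched as follows. This is a simplified illustration, not the Inparanoid implementation: it assumes each alignment is already reduced to a list of per-column scores, it resamples single columns rather than keeping insertions intact, and it treats "best-best match" as the main pair outscoring every competing alignment. All names and numbers are hypothetical.

```python
import random


def bootstrap_score(column_scores, rng):
    """Resample alignment columns with replacement and sum their scores."""
    return sum(rng.choice(column_scores) for _ in range(len(column_scores)))


def ortholog_confidence(main_columns, competitor_columns, n_bootstraps=1000, seed=0):
    """Fraction of bootstraps in which the main pair still scores best."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(n_bootstraps):
        main = bootstrap_score(main_columns, rng)
        if all(main > bootstrap_score(cols, rng) for cols in competitor_columns):
            wins += 1
    return wins / n_bootstraps


# Hypothetical per-column scores for the main pair and one close competitor.
main_pair = [5, 4, 6, -1, 5, 2, 4, 0, 6, 3]
competitor = [4, 4, 5, -1, 3, 2, 4, 0, 5, 1]
confidence = ortholog_confidence(main_pair, [competitor])
print(confidence)
```

A fixed seed is used only so that repeated runs give the same estimate.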

  18. http://inparanoid.cgb.ki.se

  19. inparanoid.cgb.ki.se: Homo sapiens vs. C. elegans. Remm et al., J. Mol. Biol. 314:1041-1052 (2001)

  20. Ortholog group sizes, human vs X

  21. Nr of inparalogs per ortholog group

  22. Drawbacks of Blast-based orthology assignment • No guarantee that the same segment is used in different sequences • No evolutionary distance model • Does not take multiple domains into account

  23. Domain orthology • Inparanoid Human-Fly ortholog pairs with domains in Pfam-A 13.0: 20335 • Different domain architectures: 5411 • Many of these are minor differences, e.g. 22 vs 21 Spectrin repeats • Sometimes the difference is big: [Figure: two domain architectures, labeled ef-hand, UCH and TBC, UCH.]

  24. Tree-based approaches

  25. Distance-based tree building
      A1 MKFYSLPNFPEN
      A2 MKYYKLPDLPDE
      A3 MRFYTACENPRS
      [Figure: distance matrix and tree for A1, A2, A3.]
      • Bootstrapping: • randomly pick columns to bootstrap alignment, calculate tree • Repeat 1000 times, frequency of node = bootstrap support
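
The bootstrapping step can be illustrated on the three sequences from the slide. This is a minimal sketch under assumptions: it uses the plain fraction of differing positions as the distance, and with only three sequences it stands in for tree building by recording which pair is closest in each replicate. The function names are hypothetical.

```python
import random
from collections import Counter
from itertools import combinations

alignment = {"A1": "MKFYSLPNFPEN", "A2": "MKYYKLPDLPDE", "A3": "MRFYTACENPRS"}


def p_distance(seq_a, seq_b):
    """Fraction of aligned positions that differ (assumed distance measure)."""
    return sum(a != b for a, b in zip(seq_a, seq_b)) / len(seq_a)


def bootstrap_alignment(aln, rng):
    """Randomly pick columns, with replacement, to build a new alignment."""
    length = len(next(iter(aln.values())))
    cols = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[c] for c in cols) for name, seq in aln.items()}


def closest_pair(aln):
    """The pair that a distance method would join first."""
    return min(combinations(sorted(aln), 2),
               key=lambda p: p_distance(aln[p[0]], aln[p[1]]))


def bootstrap_support(aln, n_replicates=1000, seed=0):
    """Frequency with which each pair is the closest one across replicates."""
    rng = random.Random(seed)
    counts = Counter(closest_pair(bootstrap_alignment(aln, rng))
                     for _ in range(n_replicates))
    return {pair: count / n_replicates for pair, count in counts.items()}


support = bootstrap_support(alignment)
print(closest_pair(alignment), support)
```

For larger families each replicate would go through a real tree-building method, and support would be counted per node rather than per pair.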

  26. Orthology by tree reconciliation. [Figure: species tree and gene tree.] Infer 2 duplications and 2 losses
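
A minimal sketch of reconciliation, assuming binary trees written as nested tuples and the common rule that a gene-tree node is a duplication when it maps to the same species-tree node as one of its children. Gene losses are not counted here. The toy trees and gene names are hypothetical, loosely modeled on a human/rat example, and are not the trees from the slide.

```python
def clades(species_tree):
    """List the leaf set of every species-tree node; the root's set comes last."""
    if isinstance(species_tree, str):
        return [frozenset([species_tree])]
    left, right = clades(species_tree[0]), clades(species_tree[1])
    return left + right + [left[-1] | right[-1]]


def smallest_clade(clade_list, species):
    """Smallest species-tree clade containing all given species (their LCA)."""
    return min((c for c in clade_list if species <= c), key=len)


def reconcile(gene_tree, gene_species, clade_list, events):
    """Label each internal gene-tree node as a speciation or a duplication."""
    if isinstance(gene_tree, str):
        species = frozenset([gene_species[gene_tree]])
        return smallest_clade(clade_list, species), [gene_tree]
    map_l, genes_l = reconcile(gene_tree[0], gene_species, clade_list, events)
    map_r, genes_r = reconcile(gene_tree[1], gene_species, clade_list, events)
    mapped = smallest_clade(clade_list, map_l | map_r)
    kind = "duplication" if mapped in (map_l, map_r) else "speciation"
    events.append((kind, genes_l, genes_r))
    return mapped, genes_l + genes_r


species_tree = ("human", "rat")
gene_species = {"hX": "human", "rX": "rat", "hY1": "human", "hY2": "human", "rY": "rat"}
gene_tree = (("hX", "rX"), (("hY1", "hY2"), "rY"))

events = []
reconcile(gene_tree, gene_species, clades(species_tree), events)

# Genes whose last common node is a speciation are orthologs.
orthologs = sorted((a, b) for kind, left, right in events
                   if kind == "speciation" for a in left for b in right)
print([kind for kind, _, _ in events], orthologs)
```

Reading ortholog pairs off the speciation nodes in this way is one form of finding pairwise orthologs by parsing the tree.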

  27. Drawbacks of tree reconciliation for orthology assignment • Assumption that the species tree is fully known • Does not give confidence values • Gene trees become unreliable when involving a lot of sequences (more data -> less certainty) • Computationally expensive

  28. Partial tree reconciliation • Find pairwise orthologs by computer parsing of tree.

  29. Pairwise orthology confidence by 'orthostrapping'. The original tree with bootstrap support values. [Figure: tree with leaves PIR-S67168, AAF52138.1, T04F8.1, C47D12.3, Y6E2A.9, F37H8.4, AH6.2, C14F5.4, AAF49194.1 and bootstrap values 99, 45, 85, 100, 82, 99.]

  30. Pairwise orthology confidence by 'orthostrapping'. [Figure: tree with the same leaves as above.] Table of ortholog counts, worm (rows) vs. fly (columns):
      Worm        AAF49194.1   AAF52138.1
      AH6.2            0            0
      F37H8.4          0            0
      Y6E2A.9          0            0
      C47D12.3         0            0
      T04F8.1          0            1
      C14F5.4          1            0

  31. Pairwise orthology confidence by 'orthostrapping'. [Figure: tree with the same leaves in a different arrangement.] Table of ortholog counts, worm (rows) vs. fly (columns):
      Worm        AAF49194.1   AAF52138.1
      AH6.2            0            0
      F37H8.4          0            0
      Y6E2A.9          0            0
      C47D12.3         0            1
      T04F8.1          0            2
      C14F5.4          2            0

  32. Pairwise orthology confidence by 'orthostrapping'. [Figure: the original tree with bootstrap support values 99, 45, 85, 100, 82, 99.] Table of orthology support values, worm (rows) vs. fly (columns):
      Worm        AAF49194.1   AAF52138.1
      AH6.2            0           77
      F37H8.4          0           77
      Y6E2A.9          0           77
      C47D12.3         0           81
      T04F8.1          0           98
      C14F5.4         99            0
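
The counting step behind these tables can be sketched as below. This is an illustration under assumptions, not the Orthostrapper code: it takes as input the ortholog pairs already read from each bootstrap tree and reports the fraction of trees supporting each pair. The gene names and the four toy trees are hypothetical.

```python
from collections import Counter


def orthostrap_support(orthologs_per_tree):
    """Fraction of bootstrap trees in which each pair was called orthologous."""
    counts = Counter()
    for pairs in orthologs_per_tree:
        counts.update(set(pairs))  # count each pair at most once per tree
    n_trees = len(orthologs_per_tree)
    return {pair: counts[pair] / n_trees for pair in counts}


# Hypothetical ortholog calls (worm gene, fly gene) from four bootstrap trees.
calls = [
    {("worm1", "fly1"), ("worm2", "fly2")},
    {("worm1", "fly1"), ("worm2", "fly2")},
    {("worm1", "fly1"), ("worm3", "fly2")},
    {("worm1", "fly1"), ("worm2", "fly2")},
]
support_table = orthostrap_support(calls)
print(support_table)
```

A threshold on this fraction (for example the 75% cutoff mentioned for HOPS) then decides which pairs are reported.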

  33. orthostrapper.cgb.ki.se

  34. Orthology is not transitive! Multiple species at different distances may give erroneous groups that include out-paralogs

  35. Orthology is not transitive! [Figure: trees with leaves labeled Y, H1, D1, H2, D2 and Y, H2, D1.] -> Orthology is strictly defined for only 2 species/clades • Combining species at different distances is very dangerous • But it is OK to combine multiple equidistant ones

  36. Domain-level orthology

  37. HOPS - Hierarchy of Orthologs and Paralogs • All species in Pfam are bundled in groups according to the scheme: [Figure: taxonomic grouping with labels eukaryota, metazoa, chordata, arthropoda, nematoda, fungi, viridiplantae.] • Apply Orthostrapper to groups at the same level in Pfam families • Display results in NIFAS

  38. Pfam

  39. Pfam in brief: [Figure: Pfam workflow. Labels: SEED alignment (representative members), Profile-HMM, HMMer-2.0, Search database, Description file, FULL alignment; parts marked as manually curated or automatically made.] • Release 13.0 (April 2004): • 7426 Pfam-A domain families • Based on 1160000 sequences (Swissprot & Trembl) • 21980 unique Pfam-A domain architectures • 73% of all proteins have >=1 Pfam-A domain

  40. HOPS results Pfam 10, 6190 families: • 2450 families (40%) have HOPS orthologs • 1319 families (21%) have HOPS orthologs in all 6 pairwise comparisons • 286356 pairwise orthology assignments (> 75% orthostrap) Storm and Sonnhammer, Genome Research 13:2353-2362 (2003)

  41. Ways to access HOPS • NIFAS graphical browser • By sequence ID at Pfam.cgb.ki.se/HOPS • Flatfiles (Orthostrap tables of 2 clades)

  42. Pfam.cgb.ki.se/HOPS

  43. Evolution of Domain Architectures NIFAS:

  44. ATP sulfurylase /APS kinase

  45. ATP sulfurylase domain, metazoa vs fungi. Orthologous shuffled domains?

  46. APS kinase domain

  47. HOPS orthologs of PPS1_HUMAN (ATP sulfurylase/APS kinase)

  48. Summary of ATP sulfurylases/APS kinases: Shuffled non-orthologous domains. [Figure: domain arrangements in Metazoa and Fungi.]
