
The Genome Gamble, Knowledge or Carnage?

The Genome Gamble, Knowledge or Carnage? Comparative Genomics Leading the Way @ Organon. Tim Hulsen, Oss, November 11, 2003. Summary: (1) An introduction to orthology and paralogy (2) Orthology determination within eukaryotes (3) Testing the advantages of our ortholog set


Presentation Transcript


  1. The Genome Gamble, Knowledge or Carnage? Comparative Genomics Leading the Way @ Organon Tim Hulsen, Oss, November 11, 2003

  2. Summary • (1) An introduction to orthology and paralogy • (2) Orthology determination within eukaryotes • (3) Testing the advantages of our ortholog set • (4) Using evolutionary conservation of co-expression for function prediction • (5) Evolutionary conservation of chromosomal distance and orientation

  3. (1) An introduction to orthology and paralogy • Homologous genes: genes that have a common ancestor • Orthologous genes: genes that evolved from a common ancestor through a speciation event (equivalents in different species) • Paralogous genes: genes that evolved from a common ancestor through a duplication event

  4. Orthology and paralogy explained graphically (from http://www.ncbi.nlm.nih.gov/Education/BLASTinfo/Orthology.html)

  5. The importance of orthology and paralogy • Orthology relationships especially important for function prediction: orthologous genes generally have the same function but in different species • Paralogy relationships can be used for function prediction too: paralogous genes are often involved in the same process, but have different molecular functions (e.g. globins)

  6. (2) Orthology determination within eukaryotes • Not much eukaryotic orthology data is available at the moment: • euKaryotic Orthologous Groups (KOG, NCBI) • Inparanoid • OrthoMCL • Existing databases are either too inclusive or too restrictive • Most methods rely on the best bidirectional hit (E-value), while orthology is an evolutionary principle and should be determined using phylogenetic trees!
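For readers unfamiliar with the best bidirectional hit criterion mentioned on slide 6, the idea can be sketched as follows. This is a minimal illustration, not the procedure of any of the listed databases: the function names, the gene identifiers, and the E-values are all hypothetical, and real pipelines add further filters.

```python
def best_hits(scores):
    """Map each query gene to its best hit, i.e. the hit with the lowest E-value."""
    best = {}
    for (query, hit), evalue in scores.items():
        if query not in best or evalue < best[query][1]:
            best[query] = (hit, evalue)
    return {query: hit for query, (hit, _) in best.items()}


def bidirectional_best_hits(a_vs_b, b_vs_a):
    """Return pairs (a, b) where b is the best hit of a AND a is the best hit of b."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Hypothetical E-values, invented for illustration only.
human_vs_mouse = {
    ("hsA", "mmA"): 1e-80,
    ("hsA", "mmB"): 1e-40,
    ("hsB", "mmA"): 1e-50,
}
mouse_vs_human = {
    ("mmA", "hsA"): 1e-78,
    ("mmB", "hsA"): 1e-42,
}

pairs = bidirectional_best_hits(human_vs_mouse, mouse_vs_human)
print(pairs)
```

In this toy input only hsA and mmA pick each other, so hsB and mmB stay unpaired. That shows how the criterion looks only at similarity scores and uses no tree information.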

  7. Our orthology determination within eukaryotes • Diagram stages: GENOME (Hs) and GENOMES (At, Ce, Dm, Ec, Gt, Hs, Mm, Sc, Sp) → SELECTION OF HOMOLOGS (Z>20, RH>0.5*QL) → LIST → ALIGNMENTS AND TREE → PHYLOME → TREE SCANNING • Numbers shown: 24,263 groups; Hs-Mm: 85,848 pairs; Hs-Dm: 55,934 pairs; etc.

  8. Our orthology determination: using phylogenetic trees • Example: BMP6 (Bone Morphogenetic Protein 6) → 5 orthologous relations are defined, all Hs-Mm

  9. The ortholog database: Eukaryortho http://t2.teras.sara.nl:4086 (only accessible from Organon, CMBI and SARA)

  10. (3) Testing the advantages of our ortholog set • Quality of orthology is difficult to test • Orthologs should have more or less the same function --> use conservation of function as an orthology benchmark • Gene Ontology (GO) database: hierarchical system of function and location descriptions • Orthologs are counted as being in the same functional category when they share a 4th-level GO Molecular Function class

  11. GO molecular function benchmark (figure: GO hierarchy, levels 0 to 4) • Molecular function: one of the three ‘subroots’ (together with biological process and cellular component) • ‘True’ orthologs should share a 4th-level molecular function (here: GO:0019912) • Our Hs-Mm ortholog set: 67 % • KOG Hs-Mm ortholog set: 51 %
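The benchmark on slides 10 and 11 boils down to counting the ortholog pairs whose annotations share the same class at level 4 of the hierarchy. A rough sketch of that count is given below. It rests on several assumptions that are not in the slides: each gene carries exactly one annotation, each term has a single parent (the real GO is a directed acyclic graph in which terms can have several parents), and all term and gene names are invented placeholders.

```python
def path_to_root(term, parent):
    """Return the list of terms from the root (level 0) down to `term`."""
    path = [term]
    while term in parent:
        term = parent[term]
        path.append(term)
    return path[::-1]


def term_at_level(term, parent, level):
    """Return the ancestor of `term` at the given level, or None if `term` is shallower."""
    path = path_to_root(term, parent)
    return path[level] if len(path) > level else None


def fraction_sharing_level(pairs, annotation, parent, level=4):
    """Fraction of gene pairs whose annotations share the same class at `level`.

    Pairs in which either gene is annotated above `level` are skipped.
    """
    shared = total = 0
    for a, b in pairs:
        ta = term_at_level(annotation[a], parent, level)
        tb = term_at_level(annotation[b], parent, level)
        if ta is None or tb is None:
            continue
        total += 1
        shared += ta == tb
    return shared / total if total else 0.0


# Toy hierarchy (child -> parent); all names are hypothetical.
parent = {
    "t1": "root", "t2": "t1", "t3": "t2",
    "t4a": "t3", "t4b": "t3",
    "t5a": "t4a", "t5b": "t4a", "t5c": "t4b",
}
annotation = {"hs1": "t5a", "mm1": "t5b", "hs2": "t5a", "mm2": "t5c",
              "hs3": "t3", "mm3": "t5a"}
pairs = [("hs1", "mm1"), ("hs2", "mm2"), ("hs3", "mm3")]

score = fraction_sharing_level(pairs, annotation, parent)
print(score)
```

Here the first pair shares the level-4 class, the second does not, and the third is skipped because one gene is annotated only at level 3.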

  12. Co-expression benchmark • Second method: comparing expression profiles of each orthologous gene pair • Using GeneLogic Expressor data set: • Human chips: 3269 samples, 44792 fragments, 115 tissue categories, 15 SNOMED tissue categories • Mouse chips: 859 samples, 36701 fragments, 25 tissue categories, 12 SNOMED tissue categories

  13. SNOMED tissue categories used for co-expression calculation

  14. Calculating the correlation (Pearson correlation coefficient): $$ r = \frac{N\sum xy - (\sum x)(\sum y)}{\sqrt{\left(N\sum x^2 - (\sum x)^2\right)\left(N\sum y^2 - (\sum y)^2\right)}} $$
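The correlation formula on slide 14 is the computational form of the Pearson correlation coefficient. A direct transcription might look like the sketch below, where x and y stand for the expression values of the two genes across the same N tissue categories. The function name and the sample profiles are illustrative assumptions, not taken from the presentation.

```python
import math


def pearson_r(x, y):
    """Pearson correlation using the sum-based form shown on the slide."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_xx = sum(a * a for a in x)
    sum_yy = sum(b * b for b in y)
    denominator = math.sqrt((n * sum_xx - sum_x ** 2) * (n * sum_yy - sum_y ** 2))
    if denominator == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return (n * sum_xy - sum_x * sum_y) / denominator


# Hypothetical expression profiles over four tissue categories.
gene_a = [1, 2, 3, 4]
gene_b = [2, 4, 6, 8]   # rises with gene_a
gene_c = [8, 6, 4, 2]   # falls as gene_a rises

r_ab = pearson_r(gene_a, gene_b)
r_ac = pearson_r(gene_a, gene_c)
print(r_ab, r_ac)
```

The sum-based form is convenient for a single pass over the data, although it can lose precision on large values; a mean-centred formulation is usually more stable numerically.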

  15. Co-expression comparison of our ortholog set to the KOG set

  16. (4) Using evolutionary conservation of co-expression for function prediction • Human: Gene A and Gene B, co-expression = Cab (-1 <= corr. <= 1) • Human/Mouse: Gene A’ and Gene B’, co-expression = Ca’b’ • Ca’b’ >= Cab → increases the probability that A and B are involved in the same process • (Co-expression calculated over 115 tissues in human, 25 in mouse)
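The condition on slide 16 can be read as a simple filter: keep a human gene pair when the corresponding pair A’, B’ is co-expressed at least as strongly. The sketch below illustrates that reading for the orthologous case, with precomputed correlations. The function name, the gene identifiers, the ortholog mapping, and the correlation values are all hypothetical.

```python
def conserved_coexpression_pairs(human_corr, mouse_corr, ortholog):
    """Return human pairs (A, B) whose mouse orthologs (A', B') satisfy Ca'b' >= Cab."""
    conserved = []
    for (a, b), c_ab in human_corr.items():
        a2, b2 = ortholog.get(a), ortholog.get(b)
        if a2 is None or b2 is None:
            continue  # no ortholog known for one of the genes
        # The mouse pair may be stored in either order.
        c_a2b2 = mouse_corr.get((a2, b2), mouse_corr.get((b2, a2)))
        if c_a2b2 is not None and c_a2b2 >= c_ab:
            conserved.append((a, b))
    return conserved


# Hypothetical correlations and ortholog mapping, for illustration only.
human_corr = {("hsA", "hsB"): 0.6, ("hsA", "hsC"): 0.7, ("hsB", "hsD"): 0.5}
mouse_corr = {("mmA", "mmB"): 0.8, ("mmC", "mmA"): 0.3}
ortholog = {"hsA": "mmA", "hsB": "mmB", "hsC": "mmC"}

result = conserved_coexpression_pairs(human_corr, mouse_corr, ortholog)
print(result)
```

The same filter would apply to paralogous conservation by replacing the ortholog mapping with a paralog mapping within one species.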

  17. GO biological process benchmark (figure: GO hierarchy, levels 0 to 4) • Biological process: one of the three ‘subroots’ (together with cellular component and molecular function) • Both orthologs and paralogs are often involved in the same process/pathway (= sharing a 4th-level biological process, here: GO:0007584)

  18. Conservation of co-expression used in function prediction

  19. The importance of (conserved) co-expression for function prediction • Co-expression without conservation can already be used for function prediction • Paralogous conservation gives a 2x higher accuracy • Orthologous conservation gives a 3x or 4x higher accuracy • Alternative to GO Biological Process: KEGG Pathway database → similar results

  20. (5) Evolutionary conservation of chromosomal distance and orientation • Human: Gene A and Gene B, with distance = Dab (# bp), orientation = Oab (relative orientation of the two genes), co-expression = Cab (-1 <= corr. <= 1) • Human/Mouse: Gene A’ and Gene B’, with Da’b’ <= Dab, Oa’b’ == Oab, Ca’b’ >= Cab → increases the probability that A and B are involved in the same process • (Co-expression calculated over 115 tissues in human, 25 in mouse)
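The three conditions on slide 20 can be combined into a single check, sketched below. Everything beyond the three inequalities is an assumption made for illustration: the `Gene` record, the coordinates, the definition of distance as the gap between the two genes, and the orientation labels "tandem", "convergent" and "divergent", which stand in for the symbols on the original slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    chrom: str
    start: int
    end: int
    strand: str  # "+" or "-"


def distance_bp(a, b):
    """Gap in bp between two genes on the same chromosome (0 if they overlap)."""
    if a.chrom != b.chrom:
        return None
    first, second = sorted((a, b), key=lambda g: g.start)
    return max(0, second.start - first.end)


def orientation(a, b):
    """Relative orientation of a gene pair, read in chromosomal order."""
    first, second = sorted((a, b), key=lambda g: g.start)
    if first.strand == second.strand:
        return "tandem"
    return "convergent" if first.strand == "+" else "divergent"


def is_conserved(a, b, a2, b2, c_ab, c_a2b2):
    """Check Da'b' <= Dab, Oa'b' == Oab and Ca'b' >= Cab for one gene pair."""
    d_ab, d_a2b2 = distance_bp(a, b), distance_bp(a2, b2)
    if d_ab is None or d_a2b2 is None:
        return False
    return (d_a2b2 <= d_ab
            and orientation(a2, b2) == orientation(a, b)
            and c_a2b2 >= c_ab)


# Hypothetical coordinates and correlations.
gene_a = Gene("chr1", 1000, 2000, "+")
gene_b = Gene("chr1", 5000, 6000, "-")
gene_a2 = Gene("chr4", 100, 900, "+")
gene_b2 = Gene("chr4", 2500, 3300, "-")

conserved = is_conserved(gene_a, gene_b, gene_a2, gene_b2, c_ab=0.5, c_a2b2=0.7)
print(conserved)
```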

  21. Function prediction using co-expression and chromosomal distance (without conservation)

  22. Conservation of chromosomal distance used in function prediction

  23. The importance of chromosomal distance and orientation for function prediction • Chromosomal distance is less important in eukaryotes than in prokaryotes (due to the absence of operons) • Only genes with distance < 1 Mbp seem to be coregulated • Conservation of relative orientation seems to be important only for very close gene pairs • Only a limited number of genes can be functionally annotated using the conservation of chromosomal distance and orientation

  24. Conclusions • Orthologous and paralogous relations can be used to improve function prediction • Our orthologous pairs of Protein World proteins perform better than KOG, in terms of co-expression and involvement in the same process • Chromosomal distance and relative orientation between genes can be used for function prediction too, in a limited number of cases • Future plans: find examples where the function of a protein can be predicted using these methods

  25. Credits • Martijn Huynen • Peter Groenen • Others at Comics • Others at Organon Bioinf.
