The Genome Gamble, Knowledge or Carnage? Comparative Genomics Leading the Way @ Organon Tim Hulsen, Oss, November 11, 2003
Summary • (1) An introduction to orthology and paralogy • (2) Orthology determination within eukaryotes • (3) Testing the advantages of our ortholog set • (4) Using evolutionary conservation of co-expression for function prediction • (5) Evolutionary conservation of chromosomal distance and orientation
(1) An introduction to orthology and paralogy • Homologous genes: genes that have a common ancestor • Orthologous genes: genes that evolved from a common ancestor through a speciation event (equivalents in different species) • Paralogous genes: genes that evolved from a common ancestor through a duplication event
Orthology and paralogy explained graphically (from http://www.ncbi.nlm.nih.gov/Education/BLASTinfo/Orthology.html)
The importance of orthology and paralogy • Orthology relationships especially important for function prediction: orthologous genes generally have the same function but in different species • Paralogy relationships can be used for function prediction too: paralogous genes are often involved in the same process, but have different molecular functions (e.g. globins)
(2) Orthology determination within eukaryotes • Few eukaryotic orthology resources are available at this moment: • euKaryotic Orthologous Groups (KOG, NCBI) • Inparanoid • OrthoMCL • Existing databases are either too inclusive or too restrictive • Most methods rely on the best bidirectional hit (E-value), while orthology is an evolutionary principle: it should be determined using phylogenetic trees!
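For reference, the best-bidirectional-hit criterion that the slide criticises can be sketched as follows. This is a minimal illustration, not the procedure of KOG, Inparanoid or OrthoMCL; the hit lists, gene names and E-values are invented toy data, and the function names are hypothetical.

```python
def best_hits(hits):
    """Map each query to its best subject, i.e. the hit with the lowest E-value."""
    best = {}
    for query, subject, evalue in hits:
        if query not in best or evalue < best[query][1]:
            best[query] = (subject, evalue)
    return {query: subject for query, (subject, _) in best.items()}


def bidirectional_best_hits(a_vs_b, b_vs_a):
    """Return pairs (a, b) where b is a's best hit and a is b's best hit."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


# Toy (query, subject, E-value) tuples; all names and values are made up.
human_vs_mouse = [
    ("hsA", "mmA", 1e-80),
    ("hsA", "mmB", 1e-20),
    ("hsB", "mmB", 1e-50),
    ("hsC", "mmB", 1e-60),
]
mouse_vs_human = [
    ("mmA", "hsA", 1e-78),
    ("mmB", "hsC", 1e-61),
    ("mmB", "hsB", 1e-49),
]

pairs = bidirectional_best_hits(human_vs_mouse, mouse_vs_human)
print(pairs)
```

Note that hsB is left without a partner even though it is clearly related to mmB, which illustrates why a pairwise best-hit rule can miss relations that a tree would show.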
Our orthology determination within eukaryotes (pipeline figure) • Genomes: At, Ce, Dm, Ec, Gt, Hs, Mm, Sc, Sp • Selection of homologs for each Hs gene: Z>20, RH>0.5*QL • List • Alignments and tree • Phylome: 24,263 groups • Tree scanning: Hs-Mm: 85,848 pairs, Hs-Dm: 55,934 pairs, etc.
Our orthology determination: using phylogenetic trees Example: BMP6 (Bone Morphogenetic Protein 6) 5 orthologous relations are defined, all Hs-Mm
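The slides do not spell out how the trees are scanned. One common way to read orthologs from a gene tree is the species-overlap rule: a node whose two subtrees share no species is treated as a speciation, and every cross-subtree leaf pair under it is called orthologous. The sketch below uses that swapped-in rule on an invented toy tree; the leaf naming scheme ("species|gene") and all function names are assumptions.

```python
def leaves(node):
    """Collect leaf labels of a binary tree given as nested 2-tuples."""
    if isinstance(node, str):
        return [node]
    left, right = node
    return leaves(left) + leaves(right)


def species(leaf):
    """Species code is the part before '|' (hypothetical naming scheme)."""
    return leaf.split("|")[0]


def orthologs(node, pairs=None):
    """Species-overlap rule: no shared species below a node => speciation."""
    if pairs is None:
        pairs = []
    if isinstance(node, str):
        return pairs
    left, right = node
    left_leaves, right_leaves = leaves(left), leaves(right)
    shared = {species(x) for x in left_leaves} & {species(x) for x in right_leaves}
    if not shared:
        # Speciation node: all cross pairs are orthologs.
        pairs.extend((a, b) for a in left_leaves for b in right_leaves)
    # Duplication nodes contribute no pairs, but their subtrees are still scanned.
    orthologs(left, pairs)
    orthologs(right, pairs)
    return pairs


# Toy tree: a duplication before the Hs/Mm split, with a fly outgroup.
tree = ((("Hs|g1", "Mm|g1"), ("Hs|g2", "Mm|g2")), "Dm|g")
pairs = orthologs(tree)
hs_mm = sorted(p for p in pairs if {species(p[0]), species(p[1])} == {"Hs", "Mm"})
print(hs_mm)
```

In this toy case only g1-g1 and g2-g2 come out as Hs-Mm orthologs; the g1-g2 combinations sit under the duplication node and are treated as paralogs.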
The ortholog database: Eukaryortho http://t2.teras.sara.nl:4086 (only accessible from Organon, CMBI and SARA)
(3) Testing the advantages of our ortholog set • Quality of orthology difficult to test • Orthologs should have more or less the same function --> use conservation of function as an orthology benchmark • Gene Ontology (GO) database: hierarchical system of function and location descriptions • Orthologs are in same functional category when they are in the same 4th level GO Molecular Function class
GO molecular function benchmark (figure: GO hierarchy, levels 0 to 4) • Molecular function: one of the three ‘subroots’ (together with biological process and cellular component) • ‘True’ orthologs should share a 4th level molecular function (here: GO:0019912) • Our Hs-Mm ortholog set: 67% • KOG Hs-Mm ortholog set: 51%
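The benchmark score is the fraction of ortholog pairs that share a term at level 4 of the hierarchy. A minimal sketch of that calculation, assuming a toy ontology stored as a child-to-parents map with level 0 at the root; the term names, annotations and function names are all hypothetical and do not come from the GO database.

```python
def terms_at_level(term, parents, level):
    """Return the terms found `level` steps below the root on any path to `term`."""
    open_paths = [[term]]
    root_paths = []
    while open_paths:
        path = open_paths.pop()
        ups = parents.get(path[-1], [])
        if not ups:
            root_paths.append(path[::-1])  # store root-first
        for parent in ups:
            open_paths.append(path + [parent])
    return {p[level] for p in root_paths if len(p) > level}


def share_level(gene_a, gene_b, annotations, parents, level=4):
    """True if the two genes have at least one common term at the given level."""
    def level_terms(gene):
        found = set()
        for term in annotations.get(gene, ()):
            found |= terms_at_level(term, parents, level)
        return found
    return bool(level_terms(gene_a) & level_terms(gene_b))


# Toy ontology (child -> parents); a real GO graph is a much larger DAG.
parents = {
    "L1": ["root"], "L2a": ["L1"], "L3a": ["L2a"],
    "L4a": ["L3a"], "L4b": ["L3a"],
    "L5a": ["L4a"], "L5b": ["L4a"], "L5c": ["L4b"],
}
annotations = {"hs1": {"L5a"}, "mm1": {"L5b"}, "hs2": {"L5a"}, "mm2": {"L5c"}}
ortholog_pairs = [("hs1", "mm1"), ("hs2", "mm2")]

hits = [share_level(a, b, annotations, parents) for a, b in ortholog_pairs]
fraction = sum(hits) / len(ortholog_pairs)
print(fraction)
```

Applied to a real ortholog set, `fraction` would correspond to the percentages quoted on the slide.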
Co-expression benchmark • Second method: comparing expression profiles of each orthologous gene pair • Using GeneLogic Expressor data set: • Human chips: 3269 samples, 44792 fragments, 115 tissue categories, 15 SNOMED tissue categories • Mouse chips: 859 samples, 36701 fragments, 25 tissue categories, 12 SNOMED tissue categories
Calculating the correlation

$$ r = \frac{N\sum xy - \left(\sum x\right)\left(\sum y\right)}{\sqrt{\left(N\sum x^2 - \left(\sum x\right)^2\right)\left(N\sum y^2 - \left(\sum y\right)^2\right)}} $$
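The correlation on this slide is the Pearson coefficient in its computational (sums) form. A small sketch that evaluates it term by term, with x and y standing for the expression values of two genes over the same N tissues; the function name and sample values are illustrative only.

```python
import math


def pearson_r(x, y):
    """Pearson correlation using the sums form: N*Sxy - Sx*Sy over the root term."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_xx = sum(a * a for a in x)
    sum_yy = sum(b * b for b in y)
    denominator = math.sqrt((n * sum_xx - sum_x ** 2) * (n * sum_yy - sum_y ** 2))
    if denominator == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return (n * sum_xy - sum_x * sum_y) / denominator


# Invented expression profiles over four tissues.
r_up = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])
r_down = pearson_r([1, 2, 3, 4], [8, 6, 4, 2])
print(r_up, r_down)
```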
(4) Using evolutionary conservation of co-expression for function prediction • Human: Gene A and Gene B, co-expression = Cab (-1 <= corr. <= 1) • Human/Mouse: Gene A’ and Gene B’, co-expression = Ca’b’ • Ca’b’ >= Cab increases the probability that A and B are involved in the same process • (Co-expression calculated over 115 tissues in human, 25 in mouse)
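A literal reading of the slide's condition can be sketched as: compute the co-expression of A and B in human, the co-expression of their counterparts A’ and B’, and check Ca’b’ >= Cab. The profiles below are invented, the function names are hypothetical, and the real data sets have 115 and 25 tissue categories rather than a handful.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation of two equally long expression profiles."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    return (n * sxy - sx * sy) / sqrt((n * sxx - sx * sx) * (n * syy - sy * sy))


def conserved_coexpression(gene_a, gene_b, gene_a2, gene_b2):
    """Return (Cab, Ca'b', conserved) where conserved means Ca'b' >= Cab."""
    c_ab = pearson_r(gene_a, gene_b)
    c_a2b2 = pearson_r(gene_a2, gene_b2)
    return c_ab, c_a2b2, c_a2b2 >= c_ab


# Toy profiles: four human tissues, three mouse tissues.
human_a, human_b = [1, 2, 3, 5], [2, 3, 5, 6]
mouse_a, mouse_b = [1, 2, 3], [2, 4, 6]

c_human, c_mouse, conserved = conserved_coexpression(human_a, human_b, mouse_a, mouse_b)
print(round(c_human, 3), round(c_mouse, 3), conserved)
```

The two species may be profiled over different tissue sets, as here; only the two correlation values are compared.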
GO biological process benchmark (figure: GO hierarchy, levels 0 to 4) • Biological process: one of the three ‘subroots’ (together with cellular component and molecular function) • Both orthologs and paralogs are often involved in the same process/pathway (= sharing a 4th level biological process, here: GO:0007584)
The importance of (conserved) co-expression for function prediction • Co-expression without conservation can already be used for function prediction • Paralogous conservation gives a 2x higher accuracy • Orthologous conservation gives a 3x or 4x higher accuracy • Alternative to GO Biological Process: the KEGG Pathway database gives similar results
(5) Evolutionary conservation of chromosomal distance and orientation • Human: Gene A and Gene B, distance = Dab (# bp), orientation = Oab (relative orientation of the two genes), co-expression = Cab (-1 <= corr. <= 1) • Human/Mouse: Gene A’ and Gene B’ • Da’b’ <= Dab, Oa’b’ == Oab, Ca’b’ >= Cab increases the probability that A and B are involved in the same process • (Co-expression calculated over 115 tissues in human, 25 in mouse)
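The three conditions on this slide (Da’b’ <= Dab, Oa’b’ == Oab, Ca’b’ >= Cab) can be checked pair by pair. A minimal sketch, assuming each gene pair is summarised by a distance in bp, an orientation label and a correlation; the class, the orientation labels and the numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GenePair:
    distance_bp: int      # chromosomal distance between the two genes
    orientation: str      # hypothetical label, e.g. "same", "convergent", "divergent"
    coexpression: float   # correlation between -1 and 1


def conservation_flags(human, other):
    """Evaluate the three conditions for a human pair and its counterpart pair."""
    return {
        "distance": other.distance_bp <= human.distance_bp,
        "orientation": other.orientation == human.orientation,
        "coexpression": other.coexpression >= human.coexpression,
    }


# Invented values for a human pair (A, B) and its mouse counterpart (A', B').
human_ab = GenePair(distance_bp=40_000, orientation="divergent", coexpression=0.62)
mouse_ab = GenePair(distance_bp=35_000, orientation="divergent", coexpression=0.70)

flags = conservation_flags(human_ab, mouse_ab)
all_conserved = all(flags.values())
print(flags, all_conserved)
```

Keeping the three flags separate makes it easy to test distance, orientation and co-expression conservation individually or in combination.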
Function prediction using co-expression and chromosomal distance (without conservation)
Conservation of chromosomal distance used in function prediction
The importance of chromosomal distance and orientation for function prediction • Chromosomal distance in eukaryotes is less important than in prokaryotes (due to the absence of operons) • Only genes with distance < 1 Mbp seem to be coregulated • Conservation of relative orientation seems to be important only for very close gene pairs • Only a limited number of genes can be functionally annotated using the conservation of chromosomal distance and orientation
Conclusions • Orthologous and paralogous relations can be used to improve function prediction • Our orthologous pairs of Protein World proteins perform better than KOG, in terms of co-expression and involvement in the same process • Chromosomal distance and relative orientation between genes can be used for function prediction too, in a limited number of cases • Future plans: find examples where the function of a protein can be predicted using these methods
Credits • Martijn Huynen • Peter Groenen • Others at Comics • Others at Organon Bioinf.