Gene3D, Orthology and Homology-Based Inheritance of Protein-Protein Interactions Corin Yeats yeats@biochem.ucl.ac.uk http://gene3d.biochem.ucl.ac.uk/
The Gene3D Protein Family and Annotation Resource:
• Identify sequence homologues of CATH domains:
  • HMMs and the DomainFinder hit-resolution protocol.
  • UniProt, RefSeq and Ensembl sequences (with generous help from SIMAP at MIPS).
• Integrate with sequence annotation resources:
  • Pfam, GO, KEGG, UniProt annotation, IntAct, STRING.
  • Flexible cross-resource comparisons, including CATH PDB domains.
• Import sequence families: in-house OrthoFams, HAMAP, SIMAP clusters.
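Resolving overlapping HMM hits into a single non-overlapping domain assignment is the job DomainFinder does in Gene3D. Its exact rules are not given on this slide, so the sketch below uses a common simplification as a stand-in: accept hits in order of increasing E-value and reject any hit that overlaps an already accepted one by more than a tolerance. The `DomainHit` type, the `max_overlap=10` residue tolerance and the superfamily labels are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DomainHit:
    """One HMM match on a protein sequence (hypothetical record type)."""
    superfamily: str  # label of the CATH superfamily whose HMM matched
    start: int        # first matched residue (1-based)
    end: int          # last matched residue (inclusive)
    evalue: float     # HMM E-value; smaller means more significant


def overlap(a: DomainHit, b: DomainHit) -> int:
    """Number of residues covered by both hits."""
    return max(0, min(a.end, b.end) - max(a.start, b.start) + 1)


def resolve_hits(hits: list[DomainHit], max_overlap: int = 10) -> list[DomainHit]:
    """Greedy stand-in for hit resolution: most significant hits first,
    dropping any hit that overlaps an accepted hit by more than max_overlap."""
    accepted: list[DomainHit] = []
    for hit in sorted(hits, key=lambda h: h.evalue):
        if all(overlap(hit, kept) <= max_overlap for kept in accepted):
            accepted.append(hit)
    # Report the final domain architecture in sequence order.
    return sorted(accepted, key=lambda h: h.start)


hits = [
    DomainHit("SF-1", 1, 120, 1e-30),
    DomainHit("SF-2", 100, 260, 1e-45),
    DomainHit("SF-3", 255, 340, 1e-12),
]
print([h.superfamily for h in resolve_hits(hits)])  # ['SF-2', 'SF-3']
```

Here SF-1 is discarded despite its good E-value because it shares 21 residues with the stronger SF-2 hit; a real protocol might instead trim domain boundaries or weigh total sequence coverage.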
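Defining Orthology
[Figure: the ancestral gene {A} in the last common ancestor gives rise, through speciation, to the orthologues A and A' in Species 1 and Species 2.]
W.M. Fitch (1970) Distinguishing homologous from analogous proteins. Syst. Zool. 19:99–113.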
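Defining Paralogy
[Figure: the last common ancestor already carries the duplicated genes {A} and {a}; after speciation, Species 1 and Species 2 each inherit a copy of both (A, A' and a, a'), so each A copy is a paralogue of each a copy.]
W.M. Fitch (1970) Distinguishing homologous from analogous proteins. Syst. Zool. 19:99–113.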
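Co-orthology
[Figure: the single ancestral gene {A} in the last common ancestor gives rise to A, A', A'' and A''' across Species 1 and Species 2; copies produced by duplication after the speciation are co-orthologues of their counterpart in the other species.]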
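Fitch's definitions can be applied mechanically to a rooted gene tree: two genes are orthologues if the node where their lineages meet is a speciation, and paralogues if it is a duplication. The sketch below assumes the species-overlap rule to label nodes (a node is a duplication when two of its child subtrees contain genes from the same species) and encodes the tree as nested tuples of hypothetical "gene@species" labels. The example tree covers all three cases: A and a were already duplicated in the last common ancestor, while A' and A'' arose by duplication in Species 2 after the speciation.

```python
def leaves(tree):
    """All leaf labels ("gene@species") below a nested-tuple gene tree."""
    if isinstance(tree, str):
        return [tree]
    return [leaf for child in tree for leaf in leaves(child)]


def species_of(tree):
    """Set of species represented below a node."""
    return {leaf.split("@")[1] for leaf in leaves(tree)}


def lca(tree, g1, g2):
    """Smallest subtree containing both genes (their last common ancestor)."""
    children = () if isinstance(tree, str) else tree
    for child in children:
        if {g1, g2} <= set(leaves(child)):
            return lca(child, g1, g2)
    return tree


def is_duplication(node):
    """Species-overlap rule: children that share a species imply a duplication."""
    seen = set()
    for child in node:
        spp = species_of(child)
        if seen & spp:
            return True
        seen |= spp
    return False


def relationship(tree, g1, g2):
    """Fitch (1970): speciation at the LCA -> orthologues, duplication -> paralogues."""
    return "paralogues" if is_duplication(lca(tree, g1, g2)) else "orthologues"


# {A} and {a} coexist in the last common ancestor; A' and A'' are a later
# duplication in Species 2 (S2), so both are co-orthologues of A in S1.
tree = (("A@S1", ("A'@S2", "A''@S2")), ("a@S1", "a'@S2"))

print(relationship(tree, "A@S1", "A'@S2"))    # orthologues
print(relationship(tree, "A@S1", "A''@S2"))   # orthologues (co-orthologue)
print(relationship(tree, "A'@S2", "A''@S2"))  # paralogues
print(relationship(tree, "A@S1", "a'@S2"))    # paralogues
```

The species-overlap rule is only as reliable as the tree and the taxon sampling: gene loss or a misplaced branch can make a speciation look like a duplication.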
Updating the Terminology:
E.L.L. Sonnhammer & E.V. Koonin (2002) Orthology, paralogy and proposed classification for paralog subtypes. TiG 18:619–620.
• InParalogues:
  • “paralogs in a given lineage that all evolved by gene duplications that happened after the radiation (speciation) event that separated the given lineage from the other lineage under consideration”
• OutParalogues:
  • “paralogs in the given lineage that evolved by gene duplications that happened before the radiation (speciation) event”
Defining “Ortholog Families”:
• Strict definition:
  • Families are split at every duplication event.
  • Many small families.
• Normal definition:
  • Set the root at the appropriate level of interest.
  • Accept inparalogues.
  • More useful for function prediction.
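The two family definitions differ only in which duplication nodes may sit inside a family. The sketch below illustrates the definitions on a gene tree; it is not how OrthoFams are built. It assumes gene trees encoded as nested tuples of hypothetical "gene@species" labels, labels duplications with the species-overlap rule, and treats a duplication as inparalogous when all its copies come from one species (inparalogues in Sonnhammer & Koonin's sense relative to every other species in the tree). The strict definition accepts no duplication inside a family; the normal one also accepts those inparalogues. Grouping species into coarser lineages would correspond to setting the root at a different level of interest; that parameter is left out.

```python
def leaves(tree):
    """All leaf labels ("gene@species") below a nested-tuple gene tree."""
    if isinstance(tree, str):
        return [tree]
    return [leaf for child in tree for leaf in leaves(child)]


def species_of(tree):
    return {leaf.split("@")[1] for leaf in leaves(tree)}


def is_duplication(node):
    """Species-overlap rule: children that share a species imply a duplication."""
    seen = set()
    for child in node:
        spp = species_of(child)
        if seen & spp:
            return True
        seen |= spp
    return False


def allowed(node, accept_inparalogues):
    """Can this whole subtree form one family under the chosen definition?"""
    if isinstance(node, str):
        return True
    if is_duplication(node):
        # Inparalogues here: a duplication whose copies all sit in one species.
        if not (accept_inparalogues and len(species_of(node)) == 1):
            return False
    return all(allowed(child, accept_inparalogues) for child in node)


def families(tree, accept_inparalogues):
    """Cut the tree into maximal subtrees that satisfy `allowed`."""
    if allowed(tree, accept_inparalogues):
        return [leaves(tree)]
    return [fam for child in tree for fam in families(child, accept_inparalogues)]


# Hypothetical tree: an old duplication (A vs B) predating the Hs/Mm/Ce split,
# plus a recent worm-specific duplication of A (A1 and A2 are inparalogues).
tree = ((("A@Hs", "A@Mm"), ("A1@Ce", "A2@Ce")),
        (("B@Hs", "B@Mm"), "B@Ce"))

strict = families(tree, accept_inparalogues=False)
normal = families(tree, accept_inparalogues=True)
print(len(strict), strict)  # 4 families
print(len(normal), normal)  # 2 families
```

Strict splitting leaves the two worm copies as singletons, the "many small families" effect; accepting inparalogues keeps them with their human and mouse orthologue, the grouping one would want for function prediction.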
Creating the OrthoFams:
[Figure: UniProt & RefSeq proteins (Prot A, Prot B, Prot C, Prot D, …) are compared through the SIMAP protein similarity matrix; CD-HIT handles near-identical sequences, and APC clustering then produces the OrthoFams.]
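The near-identical step uses CD-HIT, whose core idea is greedy incremental clustering: sequences are processed from longest to shortest, and each joins the first existing representative it matches above an identity threshold, otherwise it becomes a new representative. The sketch below reproduces only that idea on a precomputed identity table standing in for SIMAP similarities; it is not CD-HIT itself, which computes identities on the fly with a short-word filter. The 0.95 threshold, the helper names and the toy values are hypothetical, and the subsequent APC step is not shown.

```python
def greedy_cluster(lengths, identity, threshold=0.95):
    """Greedy incremental clustering in the style of CD-HIT (illustrative).

    lengths   -- dict: protein id -> sequence length
    identity  -- function (p, q) -> fractional sequence identity
    threshold -- minimum identity to join an existing cluster
    Returns a dict: representative -> list of members (representative first).
    """
    clusters = {}
    # Longest first, as CD-HIT does; ties broken by id so results are stable.
    for prot in sorted(lengths, key=lambda p: (-lengths[p], p)):
        for rep in clusters:  # representatives, in the order they were created
            if identity(prot, rep) >= threshold:
                clusters[rep].append(prot)
                break
        else:  # no representative was similar enough
            clusters[prot] = [prot]
    return clusters


# Toy inputs standing in for SIMAP-derived identities (hypothetical values).
lengths = {"ProtA": 350, "ProtB": 410, "ProtC": 348, "ProtD": 120}
pairs = {
    frozenset({"ProtA", "ProtC"}): 0.99,
    frozenset({"ProtA", "ProtB"}): 0.40,
    frozenset({"ProtB", "ProtC"}): 0.41,
}


def identity(p, q):
    return pairs.get(frozenset({p, q}), 0.0)


print(greedy_cluster(lengths, identity))
# {'ProtB': ['ProtB'], 'ProtA': ['ProtA', 'ProtC'], 'ProtD': ['ProtD']}
```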
A Simple Test of the OrthoFams:
• 99.9% of OrthoFams map to a single HAMAP family in bacteria.
• Each HAMAP family tends to map to several OrthoFams => are the OrthoFams too conservative?
• >80% of OrthoFams map to a single KEGG Orthology (KO) term.
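Both numbers in this test can be computed from a table of which HAMAP families each OrthoFam's members fall into. A minimal sketch, assuming that table is available as a dict of sets (the identifiers below are hypothetical toy data): it returns the fraction of mapped OrthoFams that hit exactly one HAMAP family, and the mean number of OrthoFams per HAMAP family. A mean well above 1 is the pattern the slide reads as a sign that OrthoFams may be too conservative.

```python
from collections import defaultdict


def mapping_stats(orthofam_to_hamap):
    """Summarise how OrthoFams map onto HAMAP families.

    orthofam_to_hamap -- dict: OrthoFam id -> set of HAMAP family ids that its
                         members were assigned to (empty set if none).
    Returns (fraction of mapped OrthoFams hitting exactly one HAMAP family,
             mean number of OrthoFams per HAMAP family).
    """
    mapped = {fam: hamaps for fam, hamaps in orthofam_to_hamap.items() if hamaps}
    if not mapped:
        return 0.0, 0.0
    one_to_one = sum(1 for hamaps in mapped.values() if len(hamaps) == 1)

    # Invert the mapping to see how many OrthoFams each HAMAP family is split into.
    per_hamap = defaultdict(set)
    for fam, hamaps in mapped.items():
        for hamap in hamaps:
            per_hamap[hamap].add(fam)
    mean_split = sum(len(fams) for fams in per_hamap.values()) / len(per_hamap)
    return one_to_one / len(mapped), mean_split


toy = {
    "OF1": {"H1"}, "OF2": {"H1"}, "OF3": {"H2"},
    "OF4": {"H2", "H3"}, "OF5": set(),
}
frac_single, mean_split = mapping_stats(toy)
print(frac_single, round(mean_split, 2))  # 0.75 1.67
```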
Inheriting Protein-Protein Interactions:
• Protein-protein interactions (including their mechanism) can be conserved after gene duplication and speciation events.
• Some interactions are ancient and well conserved; many are not.
• Interactions are better conserved between homologues within a species than between homologues in different species.
• Interactions are not binary: they depend on binding affinity.
• Not all detectable interactions are biologically relevant.
Refs: Mika & Rost 2006, Shoemaker & Panchenko 2007
Interaction Inheritance Approaches:
• Homology-based approaches have struggled (Mika & Rost, 2007).
• Problems:
  • Input data offer high coverage or high quality, not both.
  • Interaction networks re-arrange rapidly.
  • No simple, universally accurate sequence-identity threshold for transfer can be found.
• Need to separate interactions that can be inherited reliably from those that cannot.
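For contrast, the simplest homology-based transfer (often called "interolog" mapping) copies an interaction (p, q) to a pair (p', q') whenever p' and q' are homologous to p and q above a single sequence-identity cutoff. The sketch below shows that baseline with hypothetical input names; its one `min_identity` parameter is exactly the universal threshold the slide says cannot be found, and changing it changes which interactions are predicted.

```python
def transfer_interactions(known, homologues, min_identity=0.8):
    """Baseline homology-based interaction transfer with one identity cutoff.

    known        -- set of frozenset({p, q}) interactions seen in a source species
    homologues   -- dict: source protein -> list of (target protein, identity)
    min_identity -- both partners must reach this fractional identity
    Returns the set of predicted target-species interactions.
    """
    predicted = set()
    for pair in known:
        p, q = min(pair), max(pair)  # p == q for a self-interaction
        for p2, id_p in homologues.get(p, []):
            for q2, id_q in homologues.get(q, []):
                if min(id_p, id_q) >= min_identity:
                    predicted.add(frozenset({p2, q2}))
    return predicted


known = {frozenset({"P1", "P2"})}
homologues = {"P1": [("Q1", 0.92)], "P2": [("Q2", 0.85), ("Q3", 0.45)]}

strict_preds = transfer_interactions(known, homologues)
loose_preds = transfer_interactions(known, homologues, min_identity=0.4)
print(sorted(sorted(pair) for pair in strict_preds))  # [['Q1', 'Q2']]
print(sorted(sorted(pair) for pair in loose_preds))   # [['Q1', 'Q2'], ['Q1', 'Q3']]
```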
The hiPPI Idea: homology-inferred Protein-Protein Interactions
• Assume OrthoFams provide more reliable functional groupings than simple similarity measures.
• Assume high affinity ≈ high conservation ≈ low experimental false-positive rate.
• Require more than one piece of supporting evidence.
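One way to read the hiPPI idea as code: pool experimentally observed interactions at the OrthoFam level, keep a family pair only if more than one piece of evidence supports it, then propose interactions between members of supported family pairs in species where none was observed. This is a sketch under stated assumptions, not the hiPPI implementation: what counts as a separate piece of evidence (here, a distinct label such as species plus source database), the protein and family names, and the `min_support=2` default are all hypothetical, and the affinity assumption is not modelled.

```python
from collections import defaultdict


def supported_family_pairs(observations, family_of, min_support=2):
    """Pool interactions by OrthoFam and keep pairs with enough evidence.

    observations -- iterable of (protein_a, protein_b, evidence_label)
    family_of    -- dict: protein -> OrthoFam id
    Returns dict: frozenset of OrthoFam ids -> set of distinct evidence labels.
    """
    support = defaultdict(set)
    for a, b, evidence in observations:
        if a in family_of and b in family_of:
            support[frozenset({family_of[a], family_of[b]})].add(evidence)
    return {pair: ev for pair, ev in support.items() if len(ev) >= min_support}


def inherit(pairs, family_of, species_of, species):
    """Propose interactions in `species` between members of supported pairs."""
    members = defaultdict(list)
    for prot, fam in family_of.items():
        if species_of[prot] == species:
            members[fam].append(prot)
    predicted = set()
    for pair in pairs:
        f1, f2 = min(pair), max(pair)  # identical for a within-family pair
        for p in members[f1]:
            for q in members[f2]:
                if p != q:  # self-interactions skipped for simplicity
                    predicted.add(frozenset({p, q}))
    return predicted


family_of = {"hsA": "OfamA", "mmA": "OfamA", "ceA": "OfamA",
             "hsB": "OfamB", "mmB": "OfamB", "ceB": "OfamB"}
species_of = {prot: prot[:2] for prot in family_of}  # "hs", "mm", "ce"
observations = [("hsA", "hsB", "Hs/IntAct"), ("mmA", "mmB", "Mm/MINT")]

pairs = supported_family_pairs(observations, family_of)
print(sorted(sorted(p) for p in inherit(pairs, family_of, species_of, "ce")))
# [['ceA', 'ceB']]
```

With only the human observation, the family pair would have a single piece of evidence and nothing would be inherited.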
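[Figure: OrthoFams A and B (Ofam A, Ofam B), each subdivided into sequence-identity clusters S30 … S100 containing members from Hs, Mm and Ce; "?" marks candidate inherited interactions; labels iLevel and cLevel.]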
• Interactions derived from MIPS, IntAct and MINT.
• GO term semantic similarity calculated with the Lord method (Lord et al., 2003).
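Lord et al. (2003) scored GO annotations with information-content measures, where a term's information content is -ln p(t) and p(t) is the fraction of annotations made to t or any of its descendants. The sketch below assumes the Resnik-style form (term similarity = information content of the most informative shared ancestor) and averages over all term pairs to compare two proteins; the exact variant used for hiPPI is not stated on the slide, and the toy ontology and annotations are hypothetical.

```python
import math


def ancestors(term, parents):
    """The term itself plus every ancestor reachable through `parents`."""
    seen, stack = set(), [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(parents.get(t, ()))
    return seen


def information_content(annotations, parents):
    """IC(t) = -ln p(t) = ln(total / count); count includes descendants."""
    counts = {}
    for term in annotations:
        for anc in ancestors(term, parents):
            counts[anc] = counts.get(anc, 0) + 1
    total = len(annotations)  # equals the root's count when all terms share one root
    return {t: math.log(total / c) for t, c in counts.items()}


def term_similarity(t1, t2, parents, ic):
    """Resnik-style: IC of the most informative common ancestor."""
    shared = ancestors(t1, parents) & ancestors(t2, parents)
    return max((ic.get(t, 0.0) for t in shared), default=0.0)


def protein_similarity(terms1, terms2, parents, ic):
    """Average term similarity over all pairs of the two proteins' GO terms."""
    scores = [term_similarity(a, b, parents, ic) for a in terms1 for b in terms2]
    return sum(scores) / len(scores)


# Hypothetical mini-ontology and annotation corpus (not real GO data).
parents = {"binding": ["root"], "protein_binding": ["binding"],
           "dna_binding": ["binding"], "catalytic": ["root"]}
annotations = ["protein_binding", "protein_binding", "dna_binding", "catalytic"]
ic = information_content(annotations, parents)

print(round(term_similarity("protein_binding", "dna_binding", parents, ic), 4))  # 0.2877
print(term_similarity("protein_binding", "catalytic", parents, ic))  # 0.0 (only root shared)
print(round(protein_similarity(["protein_binding"], ["dna_binding", "catalytic"],
                               parents, ic), 4))  # 0.1438
```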
Links and References
http://gene3d.biochem.ucl.ac.uk/
“Gene3D: comprehensive structural and functional annotation of genomes”. Corin Yeats, Jonathan Lees, Adam Reid, Paul Kellam, Nigel Martin, Xinhui Liu, and Christine Orengo. NAR (2008) 36:D414–D418.