
The evolution of domain superfamilies from a structural and functional perspective

Oliver Redfern, CATH-GENE3D group, Dept. Structural and Molecular Biology, University College London, UK. The CATH and Gene3D Domain and protein family resources (sequence, structure, function).


Presentation Transcript


  1. The evolution of domain superfamilies from a structural and functional perspective Oliver Redfern CATH-GENE3D group Dept. Structural and Molecular Biology University College London UK

  2. The CATH and Gene3D Domain and protein family resources (sequence, structure, function) • Classifying domain superfamilies • The impact of structural divergence on function • Predicting protein function from structure (diagram labels: domain structures, domain structure predictions, homologous superfamily, function 1, function 2)


  4. Why domains? • Unit of evolution • ~2000 domain superfamilies (have we found them all?) • 10,000s of different domain combinations (37,000 already) • Domain-based function annotation can allow functional predictions for novel domain combinations

  5. Other domain databases • Domain structures grouped by superfamily; links to sequences through Gene3D • Domain structures grouped by superfamily; links to sequences through SUPERFAMILY • Domain sequences grouped into families • Integration of domain families from Pfam, SCOP, CATH etc. for sequence databases

  6. The domain structure database I • Class • Architecture • Topology • Homologous Superfamily, e.g. the 2.40.50.100 (toxin) superfamily
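
The four-level code on this slide (Class, Architecture, Topology, Homologous superfamily) can be handled as a simple dotted identifier. The following is a minimal sketch, not part of any CATH software; the names `CathId`, `parse_cath_id` and `shared_level` are hypothetical and chosen only for illustration.

```python
from typing import NamedTuple


class CathId(NamedTuple):
    """One field per level of the C.A.T.H hierarchy (illustrative only)."""
    class_: int
    architecture: int
    topology: int
    superfamily: int


def parse_cath_id(code: str) -> CathId:
    """Split a dotted code such as '2.40.50.100' into its four levels."""
    parts = code.strip().split(".")
    if len(parts) != 4 or not all(p.isdigit() for p in parts):
        raise ValueError(f"expected four dot-separated integers, got {code!r}")
    return CathId(*(int(p) for p in parts))


def shared_level(a: CathId, b: CathId) -> str:
    """Name the deepest level at which two codes still agree."""
    levels = ["none", "class", "architecture", "topology", "superfamily"]
    depth = 0
    for x, y in zip(a, b):
        if x != y:
            break
        depth += 1
    return levels[depth]
```

For example, `shared_level(parse_cath_id("2.40.50.100"), parse_cath_id("2.40.50.140"))` reports that two such codes agree down to the topology level but belong to different superfamilies.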

  7. The CATH pipeline: Flow Chart • PDB • Split PDB into chains • Split chain into CATH domains (DomChop) • Assign domain to superfamily (HomCheck)

  8. The domain structure database II ~114,000 domain structures ~2200 superfamilies

  9. How do we define a “domain”? • Unit of evolution • Hydrophobic core • Compact unit, with few contacts with other domains

  10. Multi-domain proteins • ~40% of structures in the PDB comprise more than one domain (i.e. multi-domain) • ~60-80% of genes are thought to code for multi-domain proteins

  11. Algorithms for recognising domain boundaries • DETECTIVE (Swindells, 1995): each domain should have a recognisable hydrophobic core. • DOMAK (Siddiqui & Barton, 1995): residues comprising a domain make more internal contacts than external ones. • PUU (Holm & Sander, 1994): parser for protein folding units; maximal interaction within domains and minimal interaction between domains. • CATHEDRAL (Redfern and Orengo, 2007): structure comparison algorithm which uses alignment to known structural domains.
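
The contact-based idea behind methods such as DOMAK (residues in a domain make more internal contacts than external ones) can be illustrated with a toy contact count. This is a sketch of the criterion only, not the published algorithm: the function name `count_contacts`, the use of one coordinate per residue and the 8 Å cutoff are all assumptions made for illustration.

```python
import math


def count_contacts(coords, domain_of, cutoff=8.0):
    """Count residue pairs in contact, split by whether both share a domain.

    coords    -- one (x, y, z) point per residue (e.g. a C-alpha position)
    domain_of -- one domain label per residue, same order as coords
    cutoff    -- assumed contact distance in angstroms (illustrative value)
    """
    if len(coords) != len(domain_of):
        raise ValueError("coords and domain_of must have the same length")
    internal = external = 0
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            if math.dist(coords[i], coords[j]) <= cutoff:
                if domain_of[i] == domain_of[j]:
                    internal += 1
                else:
                    external += 1
    return internal, external
```

A proposed domain assignment that yields many more internal than external contacts is consistent with the "compact unit" view of a domain; comparing the two counts across alternative boundary positions is one simple way to rank candidate chops.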

  12. DomChop

  13. How do we define a “superfamily”? • Related through a common ancestor • Evidence from sequence, structural, and/or functional similarity • Example pair with <15% sequence identity: 1dnpA01 (Deoxyribo-dipyrimidine photo-lyases) and 1o97D01 (Electron transfer flavoprotein)
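
The "<15% sequence identity" figure on this slide refers to the fraction of aligned positions that carry the same residue. A minimal sketch of that calculation follows. It assumes two pre-aligned sequences with `-` as the gap character and uses the number of gap-free columns as the denominator; other denominators (for example, the shorter sequence length) are also in use, and the slide does not say which was applied.

```python
def percent_identity(aln_a: str, aln_b: str) -> float:
    """Percent identity over columns where neither sequence has a gap."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    aligned = matches = 0
    for a, b in zip(aln_a, aln_b):
        if a == "-" or b == "-":
            continue  # skip gapped columns
        aligned += 1
        if a.upper() == b.upper():
            matches += 1
    return 100.0 * matches / aligned if aligned else 0.0
```

With the toy alignment `percent_identity("MKV-LA", "MRVGLS")`, three of the five gap-free columns match.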

  14. Detecting homology using structure: CATHEDRAL (CATH's Existing Domain Recognition Algorithm) • Rapid graph theory secondary structure filter • Double dynamic programming for accurate residue alignment. Redfern et al. PLOS Comp. Biol. (2007)

  15. CATHEDRAL method for structural comparison: CATHEDRAL vs. other structure comparison methods (plot axes: coverage, error). Redfern et al. PLOS Comp. Biol. 2007
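
A coverage-versus-error comparison of this kind can be computed from a ranked list of scored structure pairs. The sketch below is an assumption about how such a curve is typically built, not the benchmark used in the cited paper: coverage is taken as the fraction of true homologous pairs recovered at a given score threshold, and error as the fraction of non-homologous pairs accepted at that threshold. The function name `coverage_error_curve` is hypothetical.

```python
def coverage_error_curve(hits):
    """Return (score, coverage, error) at each rank of a scored hit list.

    hits -- iterable of (score, is_homolog) pairs; higher score = more similar
    """
    ranked = sorted(hits, key=lambda h: h[0], reverse=True)
    total_true = sum(1 for _, is_true in ranked if is_true)
    total_false = len(ranked) - total_true
    curve = []
    true_pos = false_pos = 0
    for score, is_true in ranked:
        if is_true:
            true_pos += 1
        else:
            false_pos += 1
        coverage = true_pos / total_true if total_true else 0.0
        error = false_pos / total_false if total_false else 0.0
        curve.append((score, coverage, error))
    return curve
```

A method whose curve reaches high coverage while error is still low separates homologues from non-homologues better than one whose error rises early.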

  16. Advantages of other popular structure comparison methods • Combinatorial Extension (CE): fast; linked to PDB • Dali: accurate, “industry standard” • FatCat: allows for flexible alignment • Vast/MSDFold: secondary structure based; fast and linked directly to PDB/MSD

  17. Sequence-based homology recognition methods • PSI-BLAST • HMM scans (HMMer, SAM-T, PRC) • Needleman-Wunsch • Pfam scan
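
Of the methods listed, Needleman-Wunsch global alignment is compact enough to sketch in full. The version below uses a simple match/mismatch score and a linear gap penalty; these scoring values are illustrative assumptions, whereas real protein comparisons would normally use a substitution matrix and affine gap penalties.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment of a and b; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)

    def sub(i, j):
        # Substitution score for a[i-1] against b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    # Fill the dynamic programming matrix.
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # align two residues
                score[i - 1][j] + gap,            # gap in b
                score[i][j - 1] + gap,            # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))
```

Calling `needleman_wunsch("ACGT", "AGT")` with the default scores aligns the two strings with a single gap opposite the `C`.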

  18. Gene3D: Expanding CATH with sequence relatives from genomes

  19. Expanding CATH with sequence relatives from genomes • Library of HMMs built for representative sequences from each CATH domain superfamily • Protein sequences from genomes are scanned against the CATH HMM library to assign domains to CATH superfamilies • Up to 60% of sequences in completed genomes can be assigned to CATH domain superfamilies

  20. Are all superfamilies equally populated? (Plots: CATH domain structures in the PDB; CATH domain sequences in the genomes.) The largest 100 superfamilies account for more than half the sequences of known structure in the genomes
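
The statement that a small number of superfamilies account for most assigned sequences is a cumulative-share calculation. The sketch below shows the arithmetic only; the function name `top_n_fraction` and the idea of passing counts as a mapping from superfamily label to number of sequences are assumptions, and no real CATH or Gene3D counts are reproduced here.

```python
from typing import Mapping


def top_n_fraction(counts: Mapping[str, int], n: int) -> float:
    """Fraction of all sequences that fall in the n most populated groups."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    largest = sorted(counts.values(), reverse=True)[:n]
    return sum(largest) / total
```

Given per-superfamily sequence counts, `top_n_fraction(counts, 100)` would give the share held by the 100 largest superfamilies.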

  21. Why is the distribution of superfamilies uneven? • Functionality: certain families expand with genome size (e.g. metabolic genes, Ig domains) • Designability: stable folds are compatible with more sequences • Stochastic effects: large families just got bigger. Goldstein, Curr Op Structural Biology 2008

  22. The CATH and Gene3D Domain and protein family resources (sequence, structure, function) • Classifying domain superfamilies • The impact of structural divergence on function • Predicting protein function from structure (diagram labels: domain structures, domain structure predictions, homologous superfamily, function 1, function 2)

  23. Correlation of sequence and structural variability of CATH-Gene3D families with the number of different functional groups

  24. Domain structure embellishments in the P-loop Hydrolase Superfamily Fold spin plot

  25. Some superfamilies show great structural diversity • 2DSEC algorithm: multiple structural alignment allows identification of consensus secondary structures and secondary structure embellishments • In 117 superfamilies, relatives expanded more than 2-fold. Gabrielle Reeves, J. Mol. Biol. (2006)

  26. Correlation of sequence and structural variability of CATH-Gene3D families with the number of different functional groups

  27. Conservation of binding site region I: ligand binding sites in Arginyl-tRNA synthetase (1f7uA01) and Pantetheine-phosphate adenylyltransferase

  28. Conservation of binding site region II • NH3-dependent NAD+ synthetase: deamido-NAD, ATP • ATP sulfurylase: sulfate, ATP • Tyrosyl-tRNA synthetase: L-tyrosine, ATP

  29. The CATH and Gene3D Domain and protein family resources (sequence, structure, function) • Classifying domain superfamilies • The impact of structural divergence on function • Predicting protein function from structure (diagram labels: domain structures, domain structure predictions, homologous superfamily, function 1, function 2)

  30. Methods to predict function from structure • Which bit of function are you interested in? • Diversity of structural data (apo-, holo-, non-cognate ligands). • Different similarity cut-offs for different functions/families?

  31. Using pre-defined binding site templates • Ligand/Catalytic prediction: SiteEngines, PDBSiteScan, MSDSite, Catalytic Site Atlas, Evolutionary Trace

  32. Automatic binding site templates • e.g. GASP, DRESPAT, PINTS, FLORA, reverse templates • Reverse templates: first explode the structure into 3-residue fragments (templates), then match the reverse templates and assess the relevance of hits by looking at sequence conservation within the local environment (figure colours: green and purple, identical residues; orange and white, similar residues). Laskowski and Thornton (2005)
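
The first step of the reverse-template idea, breaking a structure into 3-residue fragments, can be sketched as an enumeration of residue triplets that lie close together in space. This is a simplified illustration and not the published procedure: the proximity rule (all three pairwise distances within a cutoff), the 5 Å default and the function name `three_residue_fragments` are assumptions.

```python
import math
from itertools import combinations


def three_residue_fragments(residues, max_dist=5.0):
    """List residue-label triplets whose members are all mutually close.

    residues -- sequence of (label, (x, y, z)) pairs, one point per residue
    max_dist -- assumed pairwise distance cutoff in angstroms (illustrative)
    """
    fragments = []
    for trio in combinations(residues, 3):
        # Keep the triplet only if every pair within it is inside the cutoff.
        close = all(
            math.dist(p[1], q[1]) <= max_dist for p, q in combinations(trio, 2)
        )
        if close:
            fragments.append(tuple(label for label, _ in trio))
    return fragments
```

Each fragment returned would then serve as a candidate template to search against other structures, with hits filtered afterwards by local sequence conservation as the slide describes.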

  33. Methods for analysing ligand binding • Surface comparison: SURF’S UP, pvSOAR, Consurf. • Mapping sequence conservation: Evolutionary trace, many other methods

  34. Can characterising structure-function families help with function prediction? • 1q77A00: unknown function • 1o97D01: Electron transfer flavoprotein • 1dnpA01: Deoxyribo-dipyrimidine photo-lyases • 1ej2A00: Nucleotidylyl-transferases • 1n3lA01: AA tRNA synthetases

  35. FLORA: Collate functional groups (functional group A, functional group B) • Align domains within FSG • Determine FSG-specific positions

  36. FLORA: Extracting enzyme family-specific vectors Comparing unclassified structures to templates - score similarity over all template vectors

  37. FLORA: Performance of FLORA compared to structure comparison (plot axes: coverage, error)

  38. Useful methods of function prediction from structure • PROFUNC: several methods (e.g. BLAST, MSDFold, template methods) • PROKNOW: annotation with Gene Ontology terms • FLORA: direct link to CATH

  39. Summary • CATH decomposes PDB structures into their component domains and classifies domains into superfamilies. • There are some very large superfamilies, which are structurally and functionally diverse and dominate the genomes. • Structural data can help us understand how different protein functions have evolved.

  40. Practical http://www.cathdb.info Click on Documentation at the top, then Tutorials, then “Combining structural and functional analysis”
