James Watson. Protein Function Prediction From Structure In Structural Genomics: Its Contribution To The Study Of Health And Disease. Erice 40th School of Crystallography.
UniProt Growth Structure To Function In Structural Genomics: Contribution To The Study Of Health And Disease
PDB Growth
From Genome To Proteome?
Estimate of 20,000 to 25,000 genes in the human genome
• Determine every structure?
- Too expensive and time consuming
- Ultimately might be technically impossible?
• Homology modelling?
- Find the closest sequence match in the PDB and use it as the starting point to model the structure
- Dependent on widespread fold coverage
Structural Genomics
Structural Genomics Aims: pathogens and disease; automation / high throughput; coverage of fold space; human proteins
Structural Genomics Collaborators
• MCSG – Midwest Center for Structural Genomics
• SPINE – Structural Proteomics in Europe
• SGC – Structural Genomics Consortium
MCSG pipeline (diagram labels): Genomic Sequences; Target Refinement; Domain Definition; Domain Parsing; New Tags and Expression Systems; Cleavage On Column; Reductive Methylation; Automated Crystal Mounting and Structure Refinement; Structural and Functional Analysis; Functional Studies; Web Site; Workshops; Publications
MCSG Collaboration
How Do We Define Function?
Question: What is the function of a cooker?
• Boil water? • Burn natural gas? • Grill fish? • Kitchen appliance? • Bake pie? • Central heating?
Gene Ontology; EC nomenclature
Function Prediction, Also Known As: Guess Who?
• Ask the right questions
• Ask enough questions
A Friendly Warning!
• Don't always believe what programs tell you: they're often misleading & sometimes wrong!
• Don't always believe what databases tell you: they're often misleading & sometimes wrong!
• Don't always believe what lecturers tell you: they're sometimes misleading & occasionally wrong!
• In short, you need to determine whether the information is reliable or not
• When computers are applied to biology, it is vital to understand the difference between mathematical & biological significance
• Computers don’t do biology
Methods: genome organisation; biological multimeric state; evolutionary relationships; electrostatics; metabolome; clefts and surfaces; catalytic clusters, mechanisms & motifs; ligands; MACiE
ProFunc
Sequence scans:
• Sequence search vs UniProt and PDB
• Sequence motifs (PROSITE, BLOCKS, SMART, Pfam, etc.)
• Gene neighbours
• Superfamily HMM library
• Residue conservation
Structure scans:
• Fold search (SSM and DALI)
• Surface clefts
• Nest analysis
• Templates: enzyme active sites, ligand binding sites, DNA binding sites
• Reverse templates
Laskowski RA, Watson JD & Thornton JM (2005). ProFunc: a server for predicting protein function from 3D structure. Nucleic Acids Res., 33, W89-W93.
Templates (examples shown): Cholesterol oxidase; IIAglc histidine kinase; GARTfase; Carbamoylsarcosine amidohydrolase; Ser-His-Asp catalytic triad; Dihydrofolate reductase
Reverse Templates: 3-residue templates (diagram of residues numbered 1 to 9, …)
Template Matching – False Positives (figure labels): Car Park, Marsala, Library, Church; Cambridge, Erice, Barcelona; No False Positives; High False Positive
Comparison Of Template Environments. Template structure – 1mbb; Query structure – 1hsk. Match to template: Ser, Arg, Glu
Comparison Of Template Environments. Template structure – 1mbb; Query structure – 1hsk. Identical residues in neighbourhood
Comparison Of Template Environments. Template structure – 1mbb; Query structure – 1hsk. Similar residues in neighbourhood (template residues: Ser, Arg, Glu)
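The environment comparison on these slides counts identical and similar residues around a template match. A minimal sketch of that kind of tally follows. The similarity groups, the function names and the input format (residue pairs already judged equivalent after superposition) are all assumptions made for illustration; the slides do not state how ProFunc groups or scores residues.

```python
# Hypothetical similarity groups (three-letter residue codes).
# The slides do not say which grouping is actually used.
SIMILAR_GROUPS = [
    {"ASP", "GLU"},
    {"ARG", "LYS", "HIS"},
    {"SER", "THR"},
    {"ASN", "GLN"},
    {"PHE", "TYR", "TRP"},
    {"ALA", "VAL", "LEU", "ILE", "MET"},
]


def residues_similar(a, b):
    """True if both residue types fall in the same (assumed) group."""
    return any(a in group and b in group for group in SIMILAR_GROUPS)


def compare_environments(pairs):
    """Tally identical, similar and different residue pairs.

    pairs: list of (template_residue, query_residue) tuples for positions
    assumed to be spatially equivalent in the two neighbourhoods.
    """
    identical = sum(1 for t, q in pairs if t == q)
    similar = sum(1 for t, q in pairs if t != q and residues_similar(t, q))
    return {
        "identical": identical,
        "similar": similar,
        "different": len(pairs) - identical - similar,
    }


if __name__ == "__main__":
    # Made-up neighbourhood, not taken from 1mbb or 1hsk.
    example = [("SER", "SER"), ("ARG", "LYS"), ("GLU", "ASP"), ("GLY", "TRP")]
    print(compare_environments(example))
```

A match whose surroundings share many identical or similar residues with the template's surroundings is more plausible than one where only the three template residues agree, which is the intuition the slides illustrate.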
Tempura Templates – User Refined Approach. Input: Single Structure or List of Structures. Example residue selection: … Arg 295, Asp 296, Gly 297, Ala 298, Gly 299, His 300, Tyr 301, Gly 302 … (selected residues marked X). Output: Templates. www.ebi.ac.uk/thornton-srv/databases/tempura/
Functional Annotation Transfer
• It can be hard to judge whether something “makes sense”.
• The lack of labeling on many web pages makes it hard to know the source.
• Calculations based on databases are even harder to deal with.
• Logical deductions may be worse:
“tacR gene regulates the human nervous system”
“tacQ gene is similar to tacR but is found in E. coli”
“so tacQ gene regulates the E. coli nervous system”
It’s not all depressing though… “Sigh!”
MCSG as a test dataset (1)
• 319 MCSG structures (Sep 2005)
• 282 non-redundant at 30% Seq ID
• Known Function: 33% (93)
• Putative Function: 20%
• Unknown Function: 47%
Reference: Watson et al. (2007), J. Mol. Biol. 367, 1511-1522
MCSG as a test dataset (1): Known Function. Results: Top Hit; Backdating; Manual Checking
Manual Assessment of Methods: SSM protein fold match; enzyme active site templates; ligand-binding templates; DNA-binding templates; reverse templates
Example 1: Predicted function confirmed
• BioH protein from Escherichia coli (PDB entry: 1m33)
• Contains Pfam domain PF00561: alpha/beta hydrolase fold common to a number of hydrolytic enzymes
• Search against the enzyme templates database provides a significant hit to Ser-His-Asp catalytic triad (rmsd = 0.28 Å)
R. Sanishvili et al. (2003). Integrating Structure, Bioinformatics, and Enzymology to Discover Function. J. Biol. Chem. 278(28): 26039-26045.
Example 1: Predicted function confirmed
• Catalytic triad = lipase, protease, or esterase activity?
• Serine nucleophile (Ser82) is located within one of the two Gly-Xaa-Ser-Xaa-Gly motifs present = acyltransferase or thioesterase activity?
• Experimentally demonstrated carboxylesterase activity (EC 3.1.1.1)
• A novel carboxylesterase with broad substrate specificity and a preference for short chain substrates
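The Gly-Xaa-Ser-Xaa-Gly motif mentioned here can be located in a one-letter sequence with a simple pattern search. The sketch below uses Python's standard `re` module; the function name and the example sequence are made up for illustration and are not the BioH sequence.

```python
import re

# Gly-Xaa-Ser-Xaa-Gly in one-letter code; "." stands for any residue (Xaa).
# The lookahead lets overlapping occurrences be reported as well.
GXSXG = re.compile(r"(?=(G.S.G))")


def find_gxsxg(sequence):
    """Return (1-based start position, matched pentapeptide) for each motif."""
    return [(m.start() + 1, m.group(1)) for m in GXSXG.finditer(sequence.upper())]


if __name__ == "__main__":
    # Made-up sequence for illustration only.
    for start, motif in find_gxsxg("MKGHSAGLLGWSMGGT"):
        # The candidate serine nucleophile is the third residue of the motif.
        print(f"{motif} at {start}, Ser at {start + 2}")
```

A motif hit on its own is weak evidence; on the slide it only becomes informative in combination with the catalytic triad template match and the experimental assay.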
Example 2: Identifying previously published function
• Tm0936 from Thermotoga maritima (PDB entry: 2plm)
• Contains Pfam domain PF01979: Amidohydrolase family, which contains a number of deaminases and is part of a wider Amidohydrolase superfamily clan
• Publication suggests function: adenosine deaminase (EC 3.5.4.4)
Method:
• Performed targeted docking of high-energy metabolic intermediates
• Results dominated by adenine analogues undergoing C6-deamination
• Structure determined with S-adenosylhomocysteine (SAH)
• Clone provided by the JCSG
J.C. Hermann et al. (2007). Structure-based activity prediction for an enzyme of unknown function. Nature, 448, 775-779.
Identifying previously published function: Template Searches (2plm vs 1a4l)
• Strong enzyme template match (E-value = 2.45E-04)
• Structure: Adenosine deaminase (EC 3.5.4.4)
• Sequence identity = 17.5%, local sequence identity = 27.7%
• Structural similarity = 95%
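Sequence identity figures such as those quoted above are computed from an alignment. The sketch below shows one common convention, counting identical residues over aligned (non-gap) columns of two pre-aligned sequences. The function name and the choice of denominator are assumptions; the slide does not say which definition ProFunc reports, and other conventions give different percentages.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over columns where neither sequence has a gap.

    Both inputs must come from the same pairwise alignment and therefore
    have equal length. Returns 0.0 if no column is aligned.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if a != gap and b != gap]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


if __name__ == "__main__":
    # Made-up alignment for illustration only.
    print(round(percent_identity("ACDE-FG", "ACNEQFG"), 1))
```

A local identity can be obtained the same way by passing only the slice of the alignment that covers the region of interest, for example the residues around a template match.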
Example 3: Putative Function
APC29563: Crystal structure of a hypothetical protein from Enterococcus faecalis V583.
Evidence:
• Sequence: Superfamily hit to metallo-hydrolase/oxidoreductase; BLAST hits to putative metallo-beta-lactamases
• Fold, Ligand Templates (Zn) and Reverse Templates: hits to metallo-beta-lactamase proteins and RNA degradation enzymes
• Some metallo-beta-lactamases have been shown to have phosphodiesterase activity
Function prediction: Metallo-beta-lactamase/nuclease or phosphodiesterase
First pass screening: Amino Acids Aldehyde Alcohol Sugar Acids Phosphatase NADPH Oxidase Phosphodiesterase Thioesterase Protease Lipase Dehydrogenase Oxidase
Phosphodiesterase Assay: 2’3’ cyclic mononucleotides are preferred substrates
Detailed assays
• Samples: Adenosine, 2’cAMP, 3’cAMP, 2’3’cAMP
- No preference for 2’ or 3’ position
• Saturation curve: 2’3’ cAMP saturation curve calculated
- Suggests kinetics: Km near 1.2 mM, Vmax near 2.9 mmol min⁻¹ mg⁻¹
• Preferred metal: Cobalt gives strongest activity
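The Km and Vmax quoted for 2’3’ cAMP can be read as parameters of a saturation curve. The sketch below assumes the standard Michaelis-Menten form v = Vmax·[S] / (Km + [S]); the slide does not state which model was fitted, so both the model and the function name are assumptions for illustration.

```python
# Values quoted on the slide for 2'3' cAMP, in the units given there.
KM_MM = 1.2   # Km, in mM
VMAX = 2.9    # Vmax, in mmol min^-1 mg^-1


def michaelis_menten(substrate_mm, vmax=VMAX, km=KM_MM):
    """Rate predicted by v = Vmax * [S] / (Km + [S]) (assumed model)."""
    if substrate_mm < 0:
        raise ValueError("substrate concentration must be non-negative")
    return vmax * substrate_mm / (km + substrate_mm)


if __name__ == "__main__":
    # Tabulate the predicted curve at a few substrate concentrations.
    for s in (0.3, 1.2, 4.8, 12.0):
        print(f"[S] = {s:5.1f} mM -> v = {michaelis_menten(s):.2f}")
```

Under this model the rate at [S] = Km is half of Vmax, and the curve flattens towards Vmax as substrate concentration rises, which is what a saturation curve shows.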
Putative Function
• Shown phosphodiesterase activity against 2’3’ cyclic mononucleotides
• Interestingly, is structurally similar to PDB entry 2dkf
- Identified by fold and reverse template matches
- Published as an RNA degradation protein of the metallo-beta-lactamase superfamily
• Possible RNA degradation protein? More to be done…
Problem Structures
Hypothetical protein from Bacillus subtilis (PDB entry 1Q8B)
1. Sequence Methods: No Motifs
2. Fold Comparison: Plant “Stable” Proteins
3. Templates: No Hits
4. Reverse templates: Lots of Hits
Figure labels: Hypothetical Proteins; NAMSANDKLTILW
Medically relevant structures
Studying the molecular basis for ligand selectivity in a family of transcriptional regulators from Pseudomonas aeruginosa
• Dimeric transcription factors which respond to small phenolic molecules and are responsible for antibiotic resistance
• 13 PA sequences
• 5 structures solved, 1 with ligand bound
• Implications for cystic fibrosis patients
The Future Of Structural Genomics? NIAID Category A, B, and C Priority Pathogens: viral hemorrhagic fevers; Toxoplasma; rabies
Future Methods: Using binding site 3D atomic similarities to predict ligand binding & protein function
• CleftXplorer (Abdullah Kahraman)
• IsoCleft Finder (Rafael Najmanovich)
CleftXplorer - Algorithm for shape comparison. abdullah@ebi.ac.uk
Kahraman, A., Morris, R. J., Laskowski, R. A. & Thornton, J. M. (2007). Shape variation in protein binding pockets and their ligands. J Mol Biol 368, 283-301.
IsoCleft Finder – query a database of binding sites
2pd0: Cryptosporidium parvum protein, unknown function
• No sequence- or structure-based hints from ProFunc
• Bound MES (blue) used to define binding site
• Top 2 hits (red and green respectively) are analogs of the product and substrate of the same reaction in humans and E. coli
Putative function: Purine nucleoside phosphorylase
rafael.najmanovich@ebi.ac.uk
Conclusions
- In ProFunc the fold and reverse templates are the most successful methods
- Successes, but still need new methods and assays
- Need to use as many techniques as possible: HTP ligand binding assays, HTP enzyme assays, IsoCleft, CleftXplorer, etc.
James Watson EBI Resources And Services Outreach and training
Interactive training for all levels of experience • Hands-on training in our purpose-built IT training suite at EMBL-EBI, Hinxton, Cambridge • EBI Roadshows bring expert trainers in our resources to your site with a variety of modules on offer • New e-learning platform currently in development • Full programme at www.ebi.ac.uk/training/
2008 2009 Coming up in our Hands-on Training Patterns, similarities and differences in biological data 9–11 June Programmatic access of Proteomics Resources 28–31 July Interactions and Pathways 26–27 August ENFIN Advanced Course on Protein Function Prediction 1–3 September 8–11 September Programmatic access in Perl: webservices and workflows 6–8 October A two-day dip into the EBI’s resources 24–27 November Programmatic access in Java: webservices and workflows Transcriptomics resources and data analysis 19–22 January 23–26 February Bioinformatics resources for protein structure Sequence to genes: genome informatics 16–18 March 27–29 April Programmatic access to biological databases 11–15 May A walk through EBI Bioinformatics Resources
The Bioinformatics Roadshow 06/07: Liverpool, Apr 07 (modules tbc); Cambridge, Nov 06 (MSD); Leuven, Oct 06 (MSD); Oxford, Dec 06 (all core services); Stanford and UCSD, Jun 06 (all core services); Trieste (ICGEB), Jun 07 (all core services); Portsmouth, Nov 06 (MSD); Basel, Sep 07 (modules tbc; BioSapiens); Exeter, Feb 07 (MSD); Harvard & MIT, Mar 07 (all core services); Valencia, Apr 07 (modules tbc; BioSapiens); Melbourne, Jan 07 (proteomics). Subscribe to the EBI-FELICS Roadshows calendar at http://www.google.com/calendar/
Roadshow modules
• Genomes: Ensembl, EMBL-Bank, Integr8
• Structures: MSD, PDBsum, ProFunc
• Transcriptomes: ArrayExpress, Expression Profiler, R/Bioconductor
• Proteomes: UniProt, InterPro, IntAct, PRIDE, OLS
• Pathways: Reactome, BioModels, BRENDA
• Mini modules: Web services; BioMart; SRS; Chemistry; GO/GOA; Alignments; Literature
eLearning pilot project
• Sequence searching: Introduction; BLAST for beginners; Intermediate BLAST; Patterns, profiles & HMMs; Other tools: SSAHA, FASTA, MPSrch
• For each module: video tutorial; print tutorial; key concepts quiz; reflective tasks
• More to come: basic and advanced courses on core data resources; web services; structural biology resources
Looking for beta-testers!
Acknowledgements
• Funding: MCSG, NIH/NIGMS PSI, BioSapiens
• Structures: MCSG + many others
• Enzyme Assays: Alexei Savchenko, Alexander Yakunin, Mike Proudfoot
• Thornton Group
• EBI Outreach and Training Team
• Organisers and, of course, you!