Computational Analysis of Proteins. Dr. K. Sivakumar, Department of Chemistry, SCSVMV University (chemshiva@gmail.com). National Workshop on Modern Techniques in Analytical Chemistry. Chemistry – Our Life, Our Future. www.kanchiuniv.ac.in/DrKSivakumar_chemistry.html
AMINO ACIDS: THE BUILDING BLOCKS OF PROTEINS
Three-letter and single-letter codes of amino acids
(Figure: general structure of an amino acid)

Amino acid       Three-letter code   Single-letter code
Alanine          Ala                 A
Cysteine         Cys                 C
Aspartic acid    Asp                 D
Glutamic acid    Glu                 E
Phenylalanine    Phe                 F
Glycine          Gly                 G
Histidine        His                 H
Isoleucine       Ile                 I
Lysine           Lys                 K
Leucine          Leu                 L
Methionine       Met                 M
Asparagine       Asn                 N
Proline          Pro                 P
Glutamine        Gln                 Q
Arginine         Arg                 R
Serine           Ser                 S
Threonine        Thr                 T
Valine           Val                 V
Tryptophan       Trp                 W
Tyrosine         Tyr                 Y
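The code table above maps directly onto a lookup dictionary. The following is a minimal Python sketch of that idea; the function name `to_one_letter` is illustrative and not part of the original slides.

```python
# Three-letter to single-letter amino acid codes, as listed in the table above.
THREE_TO_ONE = {
    "Ala": "A", "Cys": "C", "Asp": "D", "Glu": "E", "Phe": "F",
    "Gly": "G", "His": "H", "Ile": "I", "Lys": "K", "Leu": "L",
    "Met": "M", "Asn": "N", "Pro": "P", "Gln": "Q", "Arg": "R",
    "Ser": "S", "Thr": "T", "Val": "V", "Trp": "W", "Tyr": "Y",
}


def to_one_letter(residues):
    """Convert a list of three-letter codes into a single-letter sequence string."""
    # capitalize() turns "ALA" or "ala" into "Ala" so any casing is accepted.
    return "".join(THREE_TO_ONE[r.capitalize()] for r in residues)


print(to_one_letter(["Met", "Ala", "Leu", "Ser", "Phe"]))  # MALSF
```

An unknown code raises a `KeyError`, which is usually preferable to silently producing a wrong sequence.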
PROTEIN SEQUENCING (order of amino acids in proteins)
Protein sequence from a protein sequencer: MALSFTVGQLIFLFWTMRITEASPD (for example, M = Methionine, A = Alanine, L = Leucine, S = Serine, F = Phenylalanine)
• Protein sequencing determines the order of the amino acids in a protein
• Methods: mass spectrometry, Edman degradation, …
• The amino acid sequence of a protein determines its properties
• Proteins are sequenced by microbiologists and biotechnologists for various purposes
Refer to “GENOME” by Sujatha for a simple explanation of the sequencing process: www.writersujatha.com
Reading the letters: in methane, C stands for carbon, a single atom; in a protein, M stands for methionine, a group of atoms.
Levels of protein structure: primary structure, secondary structure, tertiary structure.
Protein sequences are continuously submitted by sequencing centers and updated in protein databases.
• To date, more than 1 million (10 lakh) proteins have been sequenced and made publicly available through protein databases.
For example, protein sequence databases hold the following numbers of sequences: 524,420; 1,365,912; 13,593,921
Sequence growth in protein sequence databases. Ref: SwissProt, Feb 2011; GenomeNet, Feb 2011
Protein sequence databases (number of sequences):
• 524,420 (~5 lakh, about 0.5 million)
• 1,365,912 (>10 lakh, more than 1 million)
• 13,593,921 (~1 crore, on the order of 10 million)
The ONLY protein structure database (number of structures): 70,947 as of 01 Feb 2011
Ref: K. Sivakumar, Advanced BioTech, V (9), 20-27 (2007)
PDB contains 70,947 structures determined by X-ray crystallography, NMR and electron microscopy (EM):
• X-ray: ~60,500
• NMR: ~8,700
• EM: ~350
Most sequenced proteins lack a descriptive, documented physico-chemical and STRUCTURAL characterization, because experimental methods (X-ray, NMR, EM) are:
• Trial-and-error based
• Time consuming
• Expensive
Computational methods:
• Minimize the number of experimental trials
• Reduce the cost of experimental investigation
• Help experimental analysis to be more focused
Ref: K. Sivakumar, S. Balaji, Ganga Radhakrishnan, Journal of Theoretical and Computational Chemistry, 6 (1), 127-140 (2007).
Need for computational analysis
• More than 1 million (10 lakh) sequences are available in public databases
• Sequences are highly valuable resources, because a huge amount of structural, functional and evolutionary information is locked up in them
• By contrast, the number of unique protein structures is far smaller; this represents a huge information deficit
• So we need to construct 3D models by COMPUTATIONAL METHODS
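The size of this information deficit can be estimated from the counts quoted on the earlier slides (70,947 PDB structures against 524,420, 1,365,912 and 13,593,921 sequences, as of Feb 2011). The Python sketch below is only a rough illustration: the database labels are placeholders, and the ratio is crude because several PDB entries can describe the same protein.

```python
# Counts quoted on the earlier slides (Feb 2011). The dictionary keys are
# placeholder labels, not the real database names.
STRUCTURES_IN_PDB = 70_947
SEQUENCE_COUNTS = {
    "sequence database 1": 524_420,
    "sequence database 2": 1_365_912,
    "sequence database 3": 13_593_921,
}


def coverage_percent(n_structures, n_sequences):
    """Structures available per 100 sequences (a crude upper-bound style ratio)."""
    return 100.0 * n_structures / n_sequences


for name, n_sequences in SEQUENCE_COUNTS.items():
    pct = coverage_percent(STRUCTURES_IN_PDB, n_sequences)
    print(f"{name}: {pct:.2f} structures per 100 sequences")
```

For the largest database this comes to well under one structure per 100 sequences, which is the gap that computational modeling aims to narrow.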
3D Structure can be modelled by… • Homology Modeling • Threading • Ab initio
Homology Modeling – Principle (workflow figure; the procedure is repeated with other suitable templates) Ref: K. Sivakumar, Advanced BioTech, IV (11), 18-23 (2006)
Predicting Protein Structure: Comparative Modeling (formerly, homology modeling)
Target sequence (1alc, structure marked “?”): KQFTKCELSQNLYDIDGYGRIALPELICTMFHTSGYDTQAIVENDESTEYGLFQISNALWCKSSQSPQSRNICDITCDKFLDDDITDDIMCAKKILDIKGIDYWIAHKALCTEKLEQWLCEKE
Template sequence (8lyz, template structure available): KVFGRCELAAAMKRHGLDNYRGYSLGNWVCAAKFESNFNTQATNRNTDGSTDYGILQINSRWWCNDGRTPGSRNLCNIPCSALLSSDITASVNCAKKIVSDGNGMNAWVAWRNRCKGTDVQAWIRGCRL
The two sequences are homologous and share similar sequence, so the 8lyz structure is used as the template to model 1alc.
What is Homology Modeling?
• It predicts the three-dimensional structure of a given protein sequence (TARGET) based on an alignment to one or more known protein structures (TEMPLATES)
• If similarity between the TARGET sequence and the TEMPLATE sequence is detected, structural similarity can be assumed
• In general, at least about 30% sequence identity is required to generate useful models
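The 30% rule of thumb presupposes a way to measure sequence identity between target and template. The Python sketch below is one minimal way to do it, assuming a plain Needleman-Wunsch global alignment with illustrative scores (match +1, mismatch -1, gap -2) and identity counted over aligned, non-gap columns. These choices are assumptions for illustration only: servers such as BLAST use substitution matrices and different gap models, and other identity conventions exist.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment with a linear gap penalty."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(aligned_a, aligned_b):
    """Identical residues as a percentage of aligned (non-gap) columns."""
    pairs = [(x, y) for x, y in zip(aligned_a, aligned_b)
             if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


target, template = global_align("ACDEF", "ACEF")
print(target)
print(template)
print(percent_identity(target, template))
```

The same two calls can be applied to a real target and template pair; a result well below 30% suggests that a different template, or a different modeling method, should be considered.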
Homology Modeling • Get protein sequence from sequence database http://expasy.org/sprot/
Protein sequence in FASTA format • Save it in a text editor such as Notepad for further use
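A FASTA record is a header line starting with ">" followed by one or more lines of sequence. The Python sketch below reads such text into a dictionary. The header in the example is made up for illustration, and the sequence is the short example from the sequencing slide rather than a real database entry.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into {header: sequence}."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:]          # drop the leading ">"
            records[header] = []
        elif header is not None:
            records[header].append(line)  # sequence may span several lines
    return {h: "".join(parts) for h, parts in records.items()}


# Hypothetical record: the header is invented, the sequence is the slide example.
example = """>example_protein hypothetical header
MALSFTVGQLIFL
FWTMRITEASPD
"""

for header, sequence in parse_fasta(example).items():
    print(header, len(sequence), sequence)
```

Saving the text to a file and reading it back with `open(path).read()` gives the same input string, so the parser works on files saved from Notepad as well.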
Using the Protein BLAST server to find similar STRUCTURES: http://blast.ncbi.nlm.nih.gov/Blast.cgi?PAGE=Proteins
• Paste the sequence in FASTA format
• Choose PDB as the database
• Click to search for similar structures in PDB
Graphical summary from the blastp suite: BLAST search of O70456 vs PDB
Method 1: EsyPred3D server (submit the sequence and the PDB ID)
• Paste the sequence only
• Type the PDB ID
• Click to submit
Method 2: SWISS-MODEL server
• Click for modeling
• Submit the sequence only, in FASTA format (without a PDB ID); the similarity search (BlastP) is done by the SWISS-MODEL server
• Paste the sequence and click to submit
• Follow the links in the e-mail and click to download the 3D structure
Structure retrieval from the protein 3D structure database, PDB
• Search example: 491 sequences in SwissProt for « Keratin »
• Click the PDB ID for protein details
• Click to download the structure
• Save the file and note its location
Open and visualize the *.pdb file in RasMol (example: structure of 3EUU)
Sequence data: MNRVDLSLFIPDSLTAETGDLKIKTYKVVLIARAASIFGVKRIVIYHDDADGEARFIRDILTYMDTPQYLRRKVFPIMRELKHVGILPPLRTPHHPTG
Structural data (as text, in Notepad)
Structural data (as a 3D view, in RasMol)
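The text view of a *.pdb file and the sequence are two views of the same molecule: each ATOM record carries the residue name in fixed columns. The Python sketch below recovers a sequence from the C-alpha atoms. The helper `atom_line` and its coordinates are invented purely to produce toy input; a real file would be read with `open(path)`. Alternate locations and non-standard residues are not handled.

```python
THREE_TO_ONE = {
    "ALA": "A", "CYS": "C", "ASP": "D", "GLU": "E", "PHE": "F",
    "GLY": "G", "HIS": "H", "ILE": "I", "LYS": "K", "LEU": "L",
    "MET": "M", "ASN": "N", "PRO": "P", "GLN": "Q", "ARG": "R",
    "SER": "S", "THR": "T", "VAL": "V", "TRP": "W", "TYR": "Y",
}


def atom_line(serial, name, res, chain, resseq, x, y, z):
    """Hypothetical helper: build a toy ATOM record in PDB fixed-column layout."""
    return (f"ATOM  {serial:5d} {name:<4s} {res:>3s} {chain}{resseq:4d}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}")


def sequence_from_pdb(pdb_lines, chain="A"):
    """Read one residue per C-alpha atom and return the single-letter sequence."""
    letters = []
    for line in pdb_lines:
        # Columns (1-based): atom name 13-16, residue name 18-20, chain ID 22.
        if line.startswith("ATOM") and line[12:16].strip() == "CA" and line[21] == chain:
            letters.append(THREE_TO_ONE.get(line[17:20], "X"))  # X = unknown residue
    return "".join(letters)


# Toy input with made-up coordinates (not taken from any real structure).
toy = [
    atom_line(1, " N  ", "MET", "A", 1, 1.0, 2.0, 3.0),
    atom_line(2, " CA ", "MET", "A", 1, 2.0, 2.5, 3.5),
    atom_line(3, " CA ", "ASN", "A", 2, 5.0, 3.0, 4.0),
    atom_line(4, " CA ", "ARG", "A", 3, 8.0, 3.5, 4.5),
]
print(sequence_from_pdb(toy))  # MNR
```

Comparing the recovered string with the sequence from the sequence database is a quick check that the downloaded structure is the intended protein.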
Built-model validation by the ProQ server
• Click to go to the structure upload
• Click and upload the structure
• Submit after uploading
Built-model validation by Ramachandran plot
• Click and upload the structure
• Submit after uploading
• RESULTS: the Ramachandran plot (named after G. N. Ramachandran)
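A Ramachandran plot charts the backbone dihedral angles φ and ψ of each residue: φ is defined by the atoms C(i-1), N, CA, C and ψ by N, CA, C, N(i+1). The Python sketch below shows how one such dihedral angle can be computed from four atom positions. It is a generic torsion-angle calculation, not the code used by any validation server, and the coordinates in the example are toy values.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees (-180 to 180) defined by four points."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1 = _cross(b1, b2)          # normal of the plane through p0, p1, p2
    n2 = _cross(b2, b3)          # normal of the plane through p1, p2, p3
    b2_len = math.sqrt(_dot(b2, b2))
    y = b2_len * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# Toy geometry: the central bond lies along the z axis.
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1)))   # cis arrangement
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))   # quarter turn
```

Passing the coordinates of C(i-1), N, CA, C gives φ for residue i, and N, CA, C, N(i+1) gives ψ; plotting the (φ, ψ) pairs for all residues yields the Ramachandran plot.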
3D structure modeling and validation. Ref: K. Sivakumar, S. Balaji, Ganga Radhakrishnan, Journal of Chemical Sciences, 119 (5), 571-579 (2007)
Disulphide bridges in the 3D structure of Q01758
• Backbone model of Q01758 (rainbow smelt)
• 10 cysteines shown as ball and stick
• The 10 sulphur atoms of the cysteines form 5 S–S bonds (dotted lines)
Disulphide bridges in the 3D structure of P05140
• Ribbon model of P05140 (sea raven)
• 10 cysteines shown as ball and stick
• The 10 sulphur atoms of the cysteines form 5 S–S bonds (dotted lines)
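Disulphide bridges like these can be located in a model by measuring distances between cysteine sulphur (SG) atoms. The Python sketch below assumes a distance cutoff of 2.5 Å, chosen here for illustration because a typical S–S bond is about 2.05 Å long. The residue numbers and coordinates are toy values, not taken from Q01758 or P05140.

```python
import math
from itertools import combinations


def find_disulphides(sg_atoms, cutoff=2.5):
    """Return (residue_a, residue_b, distance) for SG pairs within the cutoff.

    sg_atoms: list of (residue_number, (x, y, z)) for cysteine SG atoms.
    cutoff:   assumed maximum S-S distance in angstroms.
    """
    bonds = []
    for (res_a, xyz_a), (res_b, xyz_b) in combinations(sg_atoms, 2):
        distance = math.dist(xyz_a, xyz_b)  # Euclidean distance (Python 3.8+)
        if distance <= cutoff:
            bonds.append((res_a, res_b, round(distance, 2)))
    return bonds


# Toy SG positions (hypothetical residue numbers and coordinates).
toy_sg = [
    (5, (0.0, 0.0, 0.0)),
    (20, (2.05, 0.0, 0.0)),
    (33, (10.0, 0.0, 0.0)),
    (47, (10.0, 2.0, 0.0)),
]

for res_a, res_b, distance in find_disulphides(toy_sg):
    print(f"Cys{res_a} - Cys{res_b}: {distance} A")
```

On a model with 10 cysteines, finding 5 such pairs would be consistent with the 5 S–S bonds shown in the figures.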