
July 10, 2003

III Latin American Course on Bioinformatics for Tropical Disease Research Universidade de São Paulo, Brasil. Protein Architecture. Osmar NORBERTO DE SOUZA Laboratório de Bioinformática, Modelagem e Simulação de Biossistemas (LABIO) E-mail: osmarns@inf.pucrs.br

Presentation Transcript


  1. III Latin American Course on Bioinformatics for Tropical Disease Research Universidade de São Paulo, Brasil Protein Architecture Osmar NORBERTO DE SOUZA Laboratório de Bioinformática, Modelagem e Simulação de Biossistemas (LABIO) E-mail: osmarns@inf.pucrs.br PPGCC, PPGBCM e Instituto de Pesquisas Biomédicas Pontifícia Universidade Católica do Rio Grande do Sul – PUCRS Porto Alegre - RS July 10, 2003

  2. Why Study Proteins ? In the drama of life on a molecular scale, proteins are where the action is (Arthur M. Lesk, Introduction to Protein Architecture, OUP, 2001) Proteins are fascinating molecular devices that play a variety of roles in life processes: 12 Function Categories (Gene Ontology Project). • Cellular Processes • Metabolism • DNA Replication/Modification • Transcription/Translation • Intracellular Signaling • Cell-Cell Communication • Protein Folding/Degradation • Transport • Multifunctional Proteins • Cytoskeletal/Structural • Defense and Immunity • Miscellaneous Function

  3. Example of a Eukaryotic Cell

  4. Why Study Proteins ? Until recently, molecular biologists studied individual proteins, learning their secrets one by one. Proteins have an underlying chemical unity; they have the ability to organize themselves in three dimensions (3D); the system that produces them can create inherited structural variations, giving them the ability to evolve. Technical developments in the late 1980s and early to mid 1990s now permit the study of the entire spectrum of proteins of an organism. Genomics, or genome sequencing, has transformed molecular biology: for the first time we can have complete information about all the possible proteins in an organism.

  5. Central Dogma of Molecular Biology DNA CTCGAGGGGCCTAGACATTGCCCTCCAGAGAGAGCA ··· mRNA CUCGAGGGGCCUAGACAUUGCCCUCCAGAGAGAGCA ··· Protein L E G P R H C P P E R A ···
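The DNA, mRNA and protein strings on this slide can be reproduced with a few lines of Python. This is a minimal sketch, assuming the DNA shown is the coding strand (so transcription amounts to replacing T with U) and that the standard genetic code applies; the names `CODON_TABLE`, `transcribe` and `translate` are illustrative, not taken from the lecture.

```python
# Standard genetic code, built from the conventional T, C, A, G ordering
# of the first, second and third codon positions.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def transcribe(dna):
    """Coding-strand DNA -> mRNA (assumption: input is the coding strand)."""
    return dna.upper().replace("T", "U")


def translate(mrna):
    """Translate an mRNA in reading frame 1, stopping at the first stop codon."""
    dna = mrna.upper().replace("U", "T")
    protein = []
    for i in range(0, len(dna) - len(dna) % 3, 3):
        residue = CODON_TABLE[dna[i:i + 3]]
        if residue == "*":  # stop codon
            break
        protein.append(residue)
    return "".join(protein)


dna = "CTCGAGGGGCCTAGACATTGCCCTCCAGAGAGAGCA"  # fragment shown on the slide
mrna = transcribe(dna)
print(mrna)             # CUCGAGGGGCCUAGACAUUGCCCUCCAGAGAGAGCA
print(translate(mrna))  # LEGPRHCPPERA
```

Reading the slide's fragment in frame 1 gives the twelve residues shown on the slide, L E G P R H C P P E R A.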

  6. Homo sapiens, Arabidopsis thaliana, Caenorhabditis elegans, Drosophila melanogaster, Mus musculus, Vibrio cholerae, Plasmodium falciparum, Mycobacterium leprae, Mycobacterium tuberculosis, Neisseria meningitidis, Chlamydia pneumoniae, Pseudomonas aeruginosa, Helicobacter pylori, Xylella fastidiosa, Bacillus subtilis, Saccharomyces cerevisiae, Salmonella enterica, E. coli, Yersinia pestis, Schizosaccharomyces pombe. 1130 genome projects: 200 published; ongoing: 508 prokaryote, 422 eukaryote. Source: GOLD, Genomes OnLine Database (30/06/2004)

  7. Biological Sequence and Structure Explosion. Number of Deposited Sequences in GenBank (07/2004). Number of Protein 3D Structures Solved and Deposited in PDB – Protein Data Bank (RCSB) (06/07/2004). 32.5 million; Protein Sequences: ~1,921,851 nr (07/2004); 23,747 Protein 3D Structures (06/07/2004); 800 distinct folds (SCOP, 07/2004)

  8. Why Study Proteins ? This development has brought together several scientific disciplines, including biology, physics, chemistry, computer science, mathematics and statistics. Bioinformatics is the systematization of the data into structured and interlinked computer data banks, and the development of tools for access to these data. The results support research and development of methods to draw structural inferences from sequence data and vice versa (Lesk, 2001). The counterpart of Genomics is Structural Genomics: the solution of all protein structures in an organism. The major goal of this LECTURE is to introduce YOU to protein architecture and to the practice (WORKSHOP) of visualizing and manipulating protein structures using freely available web sites and computer programs.

  9. What is Structural Bioinformatics ? Structural (Molecular) Bioinformatics is the conceptualization of biology in terms of molecules (in the sense of Physical Chemistry) and the application of information techniques (derived from disciplines such as applied mathematics, computer science and statistics) to understand and organize the information associated with these molecules, on a large scale. In short, bioinformatics is a management information system for molecular biology and has many practical applications. Luscombe, N.M.; Greenbaum, D.; Gerstein, M. What is Bioinformatics? A Proposed Definition and Overview of the Field. Methods Inf Med, 40: 346-358, 2001.

  10. Why Study Proteins ? Protein 3D structures reveal the variety of spatial patterns that nature produces in this family of molecules. They, in turn, support a number of fascinating and useful scientific endeavours, including: • Interpretation of the mechanisms of function of individual proteins • Approaches to the “protein folding” problem • Prediction of the structure of closely-related proteins: homology modelling • Protein Engineering • Modifications to probe mechanism of function • Attempts to enhance thermostability • Clinical applications • Drug design

  11. Structure – Key to Dissect Function. Structure: • Location of Mutants • Conserved Residues • SNPs • Clefts (active sites) • Dynamics (breathing) • Surface Shape & Charge • Antigenic Sites, surface patches • Crystal Packing • Functional Oligomerization • Relative Juxtaposition • Fold • Interaction Interfaces • Catalytic Clusters • Motifs • Catalytic Mechanism • Evolutionary Relationships

  12. Names and Contents of Some Biological Databases. DNA: GenBank, EMBL, DDBJ. Proteins: GenPept, PIR (NRL_3D), SwissProt. 3D Structures: PDB (RCSB).

  13. Biological Sequence and Structure Databases. Diagram labels: DNA (sequences DB); PROTEINS (sequences DB and 3D structure DB). Example entries, truncated as on the slide:
  >Homo sapiens hemoglobin, alpha 2 (HBA2
  ACTCTTCTGGTCCCCACAGACTCA...
  >gi|191273|gb|K02233.1|GPIINS complete cds
  CTGCAGACCCAGCACCAGGGAAATG...
  >gi|387059|gb|AAA37041.1| insulin
  MALWMHLLTVLALLALWGPNTNQ...
  >gi|4504347|ref|NP_000549.1| hemoglobin, alpha 1 [Homo sapiens]
  MVLSPADKTNVKAAWGKVGAHA...
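The entries on this slide are in FASTA format: a header line starting with ">" followed by sequence lines. The sketch below shows one way to read such records. It is an illustration only: `parse_fasta` and `guess_alphabet` are hypothetical helper names, the sample text reuses two headers from the slide with their truncated fragments (not full sequences), and the alphabet guess is a crude heuristic that will mislabel a protein made only of A, C, G, T or N.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def guess_alphabet(seq):
    """Crude heuristic: only A/C/G/T/N letters -> 'DNA', otherwise 'protein'."""
    return "DNA" if set(seq.upper()) <= set("ACGTN") else "protein"


# Truncated fragments copied from the slide, for illustration only.
sample = """>gi|387059|gb|AAA37041.1| insulin
MALWMHLLTVLALLALWGPNTNQ
>gi|191273|gb|K02233.1|GPIINS complete cds
CTGCAGACCCAGCACCAGGGAAATG
"""

for header, seq in parse_fasta(sample):
    print(header.split("|")[3], guess_alphabet(seq), len(seq))
```

In the "gi|...|gb|...|" header style shown on the slide, the fourth "|"-separated field holds the accession with its version.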

  14. Some Addresses of Biological Sequence and Structure Databases 1. GenBank: Maintained by the National Center for Biotechnology Information (NCBI) of the National Institutes of Health (NIH), Bethesda, Maryland, United States. 2. EMBL (EBI): European Molecular Biology Laboratory. EMBL consists of five facilities: the main Laboratory in Heidelberg [Germany], Outstations in Hamburg [Germany], Grenoble [France], the EBI in Hinxton [U.K.], and an external Research Programme in Monterotondo [Italy]. 3. DDBJ: or DNA Data Bank of Japan, is based in Mishima, Japan. 4. PDB/RCSB: or Protein Data Bank/Research Collaboratory for Structural Bioinformatics, located at Rutgers University, New Jersey, United States.

  15. Protein Structure Proteins are polymers containing a backbone or main chain of repeating units (the peptide units) with a sidechain attached to each. The main chain atoms of each residue are denoted N, Cα, C and O. The side chain is attached to the Cα. Side chain atoms are identified by their chemical names and successive letters from the Greek alphabet, proceeding out from Cα.

  16. Amino Acids Structure Natural proteins contain a basic repertoire of 20 amino acids

  17. Amino Acids Structure Natural proteins contain a basic repertoire of 20 amino acids

  18. Amino Acids Structure Natural proteins contain a basic repertoire of 20 amino acids

  19. Amino Acids Structure Kyte J. and Doolittle R.F.: A simple method for displaying the hydropathic character of a protein. J Mol Biol 157:105, 1982. Hopp T.P. and Woods K.R.: Prediction of protein antigenic determinants from amino acid sequences. Proc. Natl. Acad. Sci. USA 78:3824, 1981.
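The Kyte and Doolittle reference above describes a sliding-window hydropathy profile. The following is a minimal sketch of that idea, not the authors' original program: the per-residue values are the scale as it is commonly tabulated (check them against the 1982 paper before relying on them), and the function name and the default window of 7 residues are choices made here for illustration.

```python
# Kyte-Doolittle hydropathy values as commonly tabulated (one-letter codes).
KD_SCALE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=7):
    """Mean hydropathy of each window of `window` consecutive residues."""
    seq = seq.upper()
    if window < 1 or window > len(seq):
        return []
    scores = [KD_SCALE[residue] for residue in seq]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


# First 20 residues of the example sequence from the primary-structure slide.
profile = hydropathy_profile("MTATATEGAKPPFVSRSVLV", window=7)
print([round(value, 2) for value in profile])
```

Positive window averages indicate hydrophobic stretches and negative ones hydrophilic stretches; wider windows smooth the profile.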

  20. Protein Conformation The condensation of amino acids produces a polypeptide chain, with the backbone atoms linked through the peptide bond.

  21. Main Chain Conformation The folding pattern of a polypeptide chain can be described in terms of angles of internal rotation around the bonds in the main chain. The bonds in the polypeptide backbone between N and Cα, and between Cα and C, are single bonds. Internal rotations around these bonds are not restricted by the electronic structure, but only by steric clashes in the conformations produced.
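An angle of internal rotation around a bond is a dihedral (torsion) angle defined by four consecutive atoms. The sketch below computes it from Cartesian coordinates using plain Python; the function name and the test coordinates are made up for illustration. For the backbone, φ would use the atoms C(i-1), N, Cα, C and ψ the atoms N, Cα, C, N(i+1).

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees for the rotation about the p1-p2 bond."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / norm for x in b1)
    # Components of b0 and b2 perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


# Planar test geometries (arbitrary coordinates, not real atoms).
print(dihedral((1, 0, 0), (0, 0, 0), (0, 1, 0), (1, 1, 0)))   # cis arrangement
print(dihedral((1, 0, 0), (0, 0, 0), (0, 1, 0), (-1, 1, 0)))  # trans arrangement
```

The cis arrangement gives 0° and the trans arrangement gives 180° in magnitude.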

  22. Main Chain Conformation The folding pattern of a polypeptide chain can be described in terms of angles of internal rotation around the bonds in the main chain. ω is restricted to be close to 180° (trans) or 0° (cis).

  23. Main Chain Conformation The Sasisekharan-Ramakrishnan-Ramachandran diagram. Because most residues in proteins have trans peptide bonds, the main chain conformation of each residue is determined by the two angles φ and ψ. There are two main allowed regions, one around φ = -57° and ψ = -47° (denoted by αR) and the other around φ = -125° and ψ = +125° (denoted by β).
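As a rough illustration of the two regions, a (φ, ψ) pair can be labelled by its angular distance to the two reference points quoted on the slide. This is a deliberately coarse sketch: real allowed regions are irregular areas of the diagram, not circles, and the 45° cutoff and the function names are assumptions made here.

```python
import math

# Reference points quoted on the slide.
REGION_CENTRES = {
    "alpha_R": (-57.0, -47.0),
    "beta": (-125.0, 125.0),
}


def angle_difference(a, b):
    """Smallest signed difference a - b in degrees, wrapped to [-180, 180)."""
    return (a - b + 180.0) % 360.0 - 180.0


def nearest_region(phi, psi, cutoff=45.0):
    """Label a (phi, psi) pair; `cutoff` is an arbitrary illustrative radius."""
    best_name, best_distance = "other", cutoff
    for name, (phi0, psi0) in REGION_CENTRES.items():
        distance = math.hypot(angle_difference(phi, phi0),
                              angle_difference(psi, psi0))
        if distance <= best_distance:
            best_name, best_distance = name, distance
    return best_name


print(nearest_region(-60.0, -45.0))   # alpha_R
print(nearest_region(-120.0, 130.0))  # beta
print(nearest_region(60.0, 60.0))     # other
```

Wrapping the differences matters because φ and ψ are periodic: +179° and -179° are only 2° apart.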

  24. Sidechain Conformation Sidechain conformations are also described by angles of internal rotation, denoted by χ1 up to χ5, working out along the sidechain. Different sidechains have different numbers of degrees of freedom. A phenylalanine side chain has two angles of internal rotation, whereas lysine has 4.

  25. Levels of Protein Structure Primary structure Secondary Structure Tertiary Structure Quaternary Structure

  26. Levels of Protein Structure Primary structure The linear sequence of amino acids MTATATEGAK PPFVSRSVLV TGGNRGIGLA IAQRLAADGH KVAVTHRGSG APKGLFGVEC DVTDSDAVDR AFTAVEEHQG PVEVLVSNAG LSADAFLMRM TEEKFEKVIN ANLTGAFRVA QRASRSMQRN KFGRMIFIGS VSGSWGIGNQ ANYAASKAGV IGMARSIARE LSKANVTANV VAPGYIDTDM TRALDERIQQ GALQFIPAKR VGTPAEVAGV VSFLASEDAS YISGAVIPVD GGMGMGH-247
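Because the primary structure is just a string over the 20-letter amino acid alphabet, simple properties such as length and composition follow directly. The sketch below uses the sequence from this slide with the spaces and the trailing "-247" length label removed; the variable names are illustrative.

```python
from collections import Counter

# Sequence from the slide, written in blocks of ten residues.
BLOCKS = """
MTATATEGAK PPFVSRSVLV TGGNRGIGLA IAQRLAADGH KVAVTHRGSG APKGLFGVEC
DVTDSDAVDR AFTAVEEHQG PVEVLVSNAG LSADAFLMRM TEEKFEKVIN ANLTGAFRVA
QRASRSMQRN KFGRMIFIGS VSGSWGIGNQ ANYAASKAGV IGMARSIARE LSKANVTANV
VAPGYIDTDM TRALDERIQQ GALQFIPAKR VGTPAEVAGV VSFLASEDAS YISGAVIPVD
GGMGMGH
"""
sequence = "".join(BLOCKS.split())

composition = Counter(sequence)
print(len(sequence))  # 247, matching the label on the slide
for residue, count in composition.most_common(5):
    print(residue, count, f"{100.0 * count / len(sequence):.1f}%")
```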

  27. Levels of Protein Structure Secondary structure

  28. Levels of Protein Structure Supersecondary structure

  29. Levels of Protein Structure Tertiary structure = Fold

  30. Levels of Protein Structure Quaternary structure

  31. A Small Album of Protein Structures

  32. Comparison of Protein Structures

  33. Protein Domains • Most proteins have modular, discrete structural units • 25 – 500 residues long • Most < 200 residues • Domains of fewer than 50 residues are usually stabilized by S–S bonds or metal ions • Domains (conserved sequences or structures) are identified by multiple sequence alignments

  34. Protein Sequence Homology (1) Protein Match with Known or Unknown Function Query Match (2) Domain Match with Known or Unknown Function Query Match

  35. Protein Domain Architecture (1) Single-domain Protein (2) Multi-domain Protein Domain A B C D • Prokaryotic Proteome: 2/3 proteins are > 2 domains • Eukaryotic Proteome: 4/5 proteins are multi-domain

  36. Protein Motifs

  37. CATH – Protein Structure Classification • Hierarchical classification of protein domain structures in the Brookhaven Protein Data Bank (PDB). • Domains are clustered at four major levels, with sequence families grouped below them: • Class • Architecture • Topology • Homologous superfamily • Sequence family

  38. CATH – Hierarchical classification • Class: secondary structure content (mainly α, mainly β, α–β, low secondary structure content). • Architecture: gross orientation of secondary structures, independent of connectivity. • Topology (= fold): clusters structures according to their topological connections.

  39. Structural Classification of Proteins Superfamily: Probable common evolutionary origin. Proteins that have low sequence identities, but whose structural and functional features suggest a common evolutionary origin. Family: Proteins clustered together into families are evolutionarily related, with pairwise residue identities between the proteins of 30% and greater. Some proteins with lower identity may still share similar folds and evolutionary origin. Fold: Major structural similarity. Similar secondary structure elements with the same topological connections. Proteins with the same fold may not have a common evolutionary origin.

  40. CATH – architectures

  41. CATH – architectures (cont.)

  42. SCOP: Structural Classification of Proteins. 1.65 release: 20619 PDB Entries (1 August 2003). 54745 Domains (excluding nucleic acids and theoretical models).

  Class                                No. of folds   Superfamilies   Families
  All α proteins                            179            299           480
  All β proteins                            126            248           462
  α and β proteins (α/β)                    121            199           542
  α and β proteins (α+β)                    234            349           567
  Multi-domain proteins                      38             38            53
  Membrane and cell surface proteins         36             66            73
  Small proteins                             66             95           150
  Total                                     800           1294          2327

  43. Structural Bioinformatics. Genome Sequence Analyses (Blundell & Mizuguchi, 2000; Rubin et al., 2000). Class 1 - 50% can be inferred by homology. Class 2 - 10% to 20% only have motifs or domains. Class 3 - 30% to 40% cannot be identified – unknown families.

  44. Structural Bioinformatics. Genome Sequence Analyses. Diagram labels: cloning, expression, purification, grow crystals, X-ray, NMR

  45. Structural Bioinformatics. Comparative Protein Modelling by Homology (Shortle, 2001; Martí-Renom et al., 2000; Blundell et al., 1987). Based on similarity between protein sequences (identity ≥ 30%). Small changes in protein sequence mean small changes in protein structure (Chothia & Lesk, 1986). The most reliable method, at present, for the prediction of protein 3D structure.

  46. Flowchart: from target sequence and template structure to target structure.
  Start: target sequence SRSVLVTGGNRGIGLAIAQRLAADGHKVAVTHRGSG ...
  (1) Identify similar structures (templates)
  (2) Select templates
  (3) Align target and template sequences
      Target   ...SRSVLVTGGNRGIGLAIAQRLAADGHKVAVTHRG...
      Template ...SPVVVVTGASRGIGKAIALSLGKAGCKVLVNYAR...
                  *  *:***..**** ***  *.  * ** *
  (4) Build target model using template structure(s)
  (5) Evaluate the quality of the model (target): Ramachandran plot, PROCHECK (Laskowski et al.)
  (6) Is the model OK? No: iterate. Yes: End.
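The alignment step can be checked numerically against the identity ≥ 30% criterion from slide 45. The sketch below computes percent identity over the two aligned fragments shown in the flowchart. The function name is illustrative, and the convention used here (identities divided by the number of aligned, non-gap columns) is only one of several in use.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of identical residues over aligned, non-gap columns."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [
        (a, b) for a, b in zip(aligned_a, aligned_b)
        if a != gap and b != gap
    ]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


# Aligned fragments as shown on the slide (no gaps in this stretch).
target = "SRSVLVTGGNRGIGLAIAQRLAADGHKVAVTHRG"
template = "SPVVVVTGASRGIGKAIALSLGKAGCKVLVNYAR"

identity = percent_identity(target, template)
print(f"{identity:.1f}% identity over {len(target)} columns")  # 50.0% over 34
print("usable as template" if identity >= 30.0 else "below the 30% guideline")
```

The 17 identical columns correspond to the 17 asterisks in the consensus line of the alignment.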
