Introduction to Protein Translation, Databases and Structural Alignment BMI 730. Victor Jin Department of Biomedical Informatics Ohio State University. Review of Protein Function and Translation Database and Software 3-D Alignment. Review of Protein Function and Translation
Introduction to Protein Translation, Databases and Structural Alignment BMI 730 Victor Jin Department of Biomedical Informatics Ohio State University
Review of Protein Function and Translation Database and Software 3-D Alignment
Protein function • Proteins are basic building blocks for every cellular structure from smallest membrane-bound receptor to largest organelle. • Proteins are involved in all processes inside a cell. • a) Gene regulation • b) Metabolism • c) Signalling • d) Development • e) Structure
Proteins serve crucial roles in a cell • Catalysis: Almost all chemical reactions in a living cell are catalyzed by protein enzymes. Example: alcohol dehydrogenase oxidizes alcohols to aldehydes or ketones. • Transport: Some proteins transport substances such as oxygen and ions. Example: haemoglobin carries oxygen. • Information transfer: For example, hormones. Example: insulin controls the amount of sugar in the blood.
Eukaryotic Translation Translation of mRNA is highly regulated in multi-cellular eukaryotic organisms, whereas in prokaryotes regulation occurs mainly at the level of transcription. There is global regulation of protein synthesis. • E.g., protein synthesis may be regulated in relation to the cell cycle or in response to cellular stresses such as starvation or accumulation of unfolded proteins in the endoplasmic reticulum. • Mechanisms include regulation by signal-activated phosphorylation or dephosphorylation of initiation and elongation factors.
microRNA Translation of particular mRNAs may be inhibited by small single-stranded microRNA molecules about 20-22 nucleotides long. MicroRNAs bind via base-pairing to 3' untranslated regions of mRNA along with a protein complex, RISC (RNA-induced silencing complex), inhibiting translation and in some cases promoting mRNA degradation. • Tissue-specific expression of particular genome-encoded microRNAs is an essential regulatory mechanism controlling embryonic development. • Some forms of cancer are associated with altered expression of microRNAs that regulate synthesis of proteins relevant to cell cycle progression or apoptosis.
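The base-pairing idea on this slide can be illustrated with a minimal Python sketch. It rests on assumptions that are not in the slide: it looks only for perfect Watson-Crick complementarity to nucleotides 2-8 of the microRNA (often called the seed), it ignores RISC and every other determinant of real target recognition, and both sequences are invented for illustration.

```python
# Toy search for microRNA target sites in a 3' UTR (illustrative only).
# Assumption: a "site" is a perfect reverse-complement match to miRNA
# nucleotides 2-8; real target prediction uses many more features.

COMPLEMENT = str.maketrans("AUCG", "UAGC")


def reverse_complement(rna):
    """Return the reverse complement of an RNA string (A, U, C, G only)."""
    return rna.translate(COMPLEMENT)[::-1]


def find_seed_sites(mirna, utr, seed_start=1, seed_end=8):
    """Return 0-based UTR positions that pair perfectly with the seed."""
    seed = mirna[seed_start:seed_end]      # nucleotides 2-8 (1-based)
    target = reverse_complement(seed)      # sequence the seed would pair with
    sites = []
    pos = utr.find(target)
    while pos != -1:
        sites.append(pos)
        pos = utr.find(target, pos + 1)
    return sites


# Hypothetical sequences, not taken from any database.
example_mirna = "ACGGAUCUUAGCCAUGGAUCAA"
example_utr = "UUUAGAUCCGUUU"
example_sites = find_seed_sites(example_mirna, example_utr)
```

Here `find_seed_sites`, `example_mirna` and `example_utr` are hypothetical names chosen for this sketch.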
Protein factors Protein factors that mediate & control translation are more numerous in eukaryotes than in prokaryotes. Eukaryotic factors are designated with the prefix "e". • Some factors are highly conserved across kingdoms. E.g., the eukaryotic elongation factor eEF1A is structurally and functionally similar to the prokaryotic EF-Tu (EF1A). • In contrast, eEF1B, the eukaryotic equivalent of the GEF EF-Ts, is relatively complex, having multiple subunits subject to regulatory phosphorylation.
Initiation • Initiation of protein synthesis is much more complex in eukaryotes, & requires a large number of protein factors. • Some eukaryotic initiation factors (e.g., eIF3 & eIF4G) serve as scaffolds, with multiple domains that bind other proteins during assembly of large initiation complexes.
Pre-initiation complex Usually a pre-initiation complex forms, including: • several initiation factors • the small ribosomal subunit • the loaded initiator tRNA, Met-tRNAiMet. This then binds to a separate complex that includes: • mRNA • initiation factors, including ones that interact with the 5' methylguanosine cap & the 3' poly-A tail, structures unique to eukaryotic mRNA. • Within this complex, mRNA is thought to circularize via interactions between factors that associate with the 5' cap & with a poly-A binding protein.
Translocation • After the initiation complex assembles, it translocates along the mRNA in a process called scanning, until the initiation codon is reached. • Scanning is facilitated by eukaryotic initiation factor eIF4A, which functions as an ATP-dependent helicase to unwind mRNA secondary structure while releasing bound proteins. • A short sequence of bases adjacent to the AUG initiation codon may aid in recognition of the start site. • After the initiation codon is recognized, there is hydrolysis of GTP and release of initiation factors, as the large ribosomal subunit joins the complex and elongation commences.
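As a rough illustration of scanning followed by elongation, the Python sketch below walks an mRNA from the 5' end, stops at the first AUG, and translates codon by codon until a stop codon. This is a deliberately simplified model: it assumes the first AUG is always used, and it ignores the flanking sequence context, initiation factors and mRNA secondary structure. The function name and the example sequence are hypothetical.

```python
# Toy model of scanning and translation (illustrative only).
# Assumption: the first AUG encountered from the 5' end is the start codon.

BASES = "UCAG"
# Standard genetic code, codons ordered with U, C, A, G at each position.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def scan_and_translate(mrna):
    """Return the peptide encoded from the first AUG to the first stop codon."""
    start = mrna.find("AUG")               # "scanning" for the initiation codon
    if start == -1:
        return ""
    peptide = []
    for i in range(start, len(mrna) - 2, 3):
        residue = CODON_TABLE[mrna[i:i + 3]]
        if residue == "*":                 # stop codon ends elongation
            break
        peptide.append(residue)
    return "".join(peptide)


# Hypothetical mRNA: 5' leader, AUG, one codon, stop codon, 3' trailer.
example_peptide = scan_and_translate("GGCACCAUGGCUUAAGG")
```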
Review of Protein Function and Translation Database and Software 3-D Alignment
Protein Databases UniProt is the universal protein database, a central repository of protein data created by combining Swiss-Prot, TrEMBL and PIR. This makes it the world's most comprehensive resource on protein information. The Protein Information Resource (PIR), located at Georgetown University Medical Center (GUMC), is an integrated public bioinformatics resource to support genomic and proteomic research, and scientific studies. Swiss-Prot is a curated biological database of protein sequences from different species created in 1986 by Amos Bairoch during his PhD and developed by the Swiss Institute of Bioinformatics and the European Bioinformatics Institute. Pfam is a large collection of multiple sequence alignments and hidden Markov models covering many common protein domains and families. PDB NCBI http://proteome.nih.gov/links.html
PubMed – Protein Databases The Protein database contains sequence data from the translated coding regions from DNA sequences in GenBank, EMBL, and DDBJ as well as protein sequences submitted to Protein Information Resource (PIR), SWISS-PROT, Protein Research Foundation (PRF), and Protein Data Bank (PDB) (sequences from solved structures). The Structure database or Molecular Modeling Database (MMDB) contains experimental data from crystallographic and NMR structure determinations. The data for MMDB are obtained from the Protein Data Bank (PDB). The NCBI has cross-linked structural data to bibliographic information, to the sequence databases, and to the NCBI taxonomy. Use Cn3D, the NCBI 3D structure viewer, for easy interactive visualization of molecular structures from Entrez. Tutorial: http://www.pdb.org/pdbstatic/tutorials/tutorial.html
Example – UniProt - Expasy • http://www.uniprot.org/ http://www.expasy.org/
Example – PDB • http://www.pdb.org • Only proteins with known structures are included.
Protein Visualization Software • Cn3D • RasMol • TOPS • Chime • DSSP • Molscript • Ribbons • MSMS • Surfnet • …
Review of Protein Function and Translation Database and Software 3-D Alignment
Why Align Structures • For homologous proteins (similar ancestry), this provides the “gold standard” for sequence alignment – elucidates the common ancestry of the proteins. • For nonhomologous proteins, allows us to identify common substructures of interest. • Allows us to classify proteins into clusters, based on structural similarity.
Example of Structural Homologs
Sequence alignment
SLSAAEADLAGKSWAPVFANKNANGLDFLVALFEKFPDSANFFADFK-GKSVADIKA-S
VLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSHGSAQVKGHG

PKLRDVSSRIFTRLNEFVNNAANAGKMSAMLSQFAKEHVGFGVGSAQFENVRSMFPGFVA
KKVADALTNAVAHVDDMPNALSALSDLHAHKLRVDPVNFKLLSHCLLVTLAAHLPAEFTP
Structural alignment
XSLSAAEADLAGKSW-APVFANKN-ANGLDFLVALFEKFPDSANFF-ADFKGKSVA--DIK
V-LSPADKTNVKAAWGK-VGAHA-GEYGAEALERMFLSFPTTKTYFPHF-------DLS-H

ASPKLRDVSSRIFTRLNEFVNNAANAGKMSA-MLSQ-FAKEHV-GFGVGSAQFENVRSM-F
GSAQVKGHGKKVADALTNAVAHV-D--DMPNAL--SALSDLHAHKLRVDPVNFKLLS-HCL

PGFVA
LVTLAAHLPAEFTP
Sequence/Structure Homology • The existence of large numbers of remote homologs shows us that true structural similarity is hard to see in the primary amino acid sequence • Structural conservation is stronger than sequence conservation
Remote Homology • Remote homologs sometimes conserve function (all SH3-like domains bind peptides), and often conserve active site locations (TIM-barrel active sites are at the ends of the barrels). • Remote homologs are probably evolutionarily related and fold using the same folding pathway.
Example of Structural Homologs 4DFR: Dihydrofolate reductase 1YAC: Octameric Hydrolase of Unknown Specificity 5.9% sequence identity (best alignment) 1YAC structure solved without knowing function. Alignment to 4DFR and others implies it is a hydrolase of some sort.
Example of Structural Homologs Sheets only Helices only DHFR: yellow & orange YAC: green & purple
Sander-Schneider Relationship • "Naturally occurring sequences with more than 25% sequence identity over 80 or more residues always adopt the same basic structure." • It is an empirical rule: it holds, with a few exceptions, for the naturally occurring proteins of known structure seen so far. • It is the basis of comparative modeling. The structural similarity implied by the relationship is a means to predict structure.
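The rule quoted on this slide can be written as a one-line check. The Python sketch below encodes only the single threshold stated here (more than 25% identity over 80 or more residues); the function name and default parameters are hypothetical, and passing the check is an empirical indication rather than a proof of shared structure.

```python
# Threshold check for the rule as quoted on the slide (illustrative only).

def meets_identity_threshold(percent_identity, aligned_length,
                             min_identity=25.0, min_length=80):
    """True if identity exceeds min_identity over at least min_length residues."""
    return percent_identity > min_identity and aligned_length >= min_length


# A pair at 5.9% identity, like the 4DFR / 1YAC example, falls below the
# threshold, so sequence identity alone would not predict a shared fold.
# The aligned length of 150 used here is a made-up value.
low_identity_pair = meets_identity_threshold(5.9, 150)
```

A pair that fails the check may still share a fold; the rule says nothing about pairs below the threshold.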
How to Align Structures • Visual inspection (by eye) • Computational approach • Point-based methods using point distances and other properties to establish correspondences • Secondary structure-based methods use vectors representing secondary structures to establish correspondences.
Global versus Local Global alignment
Local Alignment motif
Structural Alignment Algorithms Alignment algorithms create a one-to-one mapping of subset(s) of one sequence to subset(s) of another sequence. Structure-based alignment algorithms do this by minimizing a structure difference score, typically the root-mean-square deviation (RMSD) in alpha-carbon positions. The problem is that we do not know the alignment in advance. Structure-based alignment programs determine the alignment that minimizes the RMSD.
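For a fixed correspondence, the score itself is simple to compute. The following Python sketch calculates the RMSD between two equally long lists of alpha-carbon coordinates, assuming the two structures have already been superposed and that the i-th point in one list corresponds to the i-th point in the other. The function name and coordinates are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between paired 3-D points; assumes prior superposition."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(
        (ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
        for (ax, ay, az), (bx, by, bz) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


# Made-up coordinates (in angstroms) for two paired atoms.
example_rmsd = rmsd([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
                    [(0.0, 0.0, 0.0), (1.0, 0.0, 2.0)])
```

Finding the correspondence that makes this value small is the hard part; this function only scores a correspondence that is already given.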
Evaluating Structural Alignments • # of aligned residues • Percent identity in aligned residues • # of gaps • Size of two proteins • Conservation of known active site environments • RMSD (root mean square deviation) of corresponding residues • Dihedral angle difference … • No universal criterion • Application dependent
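Several of these criteria can be read directly off a pairwise alignment. The Python sketch below counts aligned residue pairs, gap columns and percent identity for two gapped rows of equal length. Two choices are assumptions of this sketch and not a standard fixed by the slide: gaps are counted per column, and identity is taken over aligned (non-gap) columns.

```python
# Simple alignment statistics for two gapped rows (illustrative only).

def alignment_stats(row_a, row_b, gap="-"):
    """Return aligned-pair count, gap-column count and percent identity."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    aligned = identical = gap_columns = 0
    for a, b in zip(row_a, row_b):
        if a == gap or b == gap:
            gap_columns += 1           # column with a gap in either row
        else:
            aligned += 1
            identical += (a == b)
    percent_identity = 100.0 * identical / aligned if aligned else 0.0
    return {
        "aligned": aligned,
        "gap_columns": gap_columns,
        "percent_identity": percent_identity,
    }


# Toy rows, not taken from the slides.
example_stats = alignment_stats("AC-GT", "ACTGA")
```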
Least Squares Superposition Problem: find the rotation matrix R and the translation vector v that minimize the following quantity: $E = \sum_{i=1}^{N} \lVert R x_i + v - y_i \rVert^2$ where $x_i$ are the coordinates from one molecule and $y_i$ are the equivalent* coordinates from another molecule. *Equivalent based on the alignment.
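One widely used closed-form solution to this problem is the SVD-based approach usually attributed to Kabsch. The slide does not say which method the course has in mind, so the derivation below is a sketch under that assumption, with the objective written as the sum of squared distances between transformed $x_i$ and $y_i$ over $N$ equivalent atoms.

```latex
\begin{align}
E(R, v) &= \sum_{i=1}^{N} \lVert R x_i + v - y_i \rVert^2 \\
\frac{\partial E}{\partial v} = 0 \;&\Rightarrow\; v = \bar{y} - R\,\bar{x},
  \qquad \bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i,\quad
         \bar{y} = \frac{1}{N}\sum_{i=1}^{N} y_i \\
E(R) &= \sum_{i=1}^{N} \lVert R x_i' - y_i' \rVert^2,
  \qquad x_i' = x_i - \bar{x},\quad y_i' = y_i - \bar{y} \\
H &= \sum_{i=1}^{N} x_i' \, {y_i'}^{\mathsf{T}} = U \Sigma V^{\mathsf{T}} \\
R &= V \,\mathrm{diag}(1, 1, d)\, U^{\mathsf{T}},
  \qquad d = \operatorname{sign}\!\left(\det(V U^{\mathsf{T}})\right) \\
\mathrm{RMSD} &= \sqrt{E_{\min} / N}
\end{align}
```

The translation simply matches the two centroids, so only the rotation needs the decomposition. The factor $d$ keeps $R$ a proper rotation and rules out a mirror image.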
Comparing dihedral angles Torsion angles (φ, ψ) are: - local by nature (error propagation) - invariant upon rotation and translation of the molecule - compact (O(n) angles for a protein of n residues) Figure: effect of adding 1 degree to all φ, ψ
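A torsion angle is defined by four consecutive atoms: C(i-1), N(i), CA(i), C(i) for φ and N(i), CA(i), C(i), N(i+1) for ψ. The Python sketch below computes a signed dihedral angle from four 3-D points using the common atan2 formulation. The helper names and test coordinates are hypothetical, and the final lines illustrate the translation invariance mentioned on the slide.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral_degrees(p0, p1, p2, p3):
    """Signed dihedral angle (degrees) about the p1-p2 bond."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)    # normals of the two planes
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# Made-up points: the central bond lies along the z axis.
points = [(1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (0.0, 1.0, 1.0)]
angle = dihedral_degrees(*points)

# Shifting every atom by the same vector leaves the angle unchanged.
shifted = [(x + 5.0, y - 2.0, z + 3.0) for x, y, z in points]
shifted_angle = dihedral_degrees(*shifted)
```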
Structural Alignment Methods • STRUCTAL [Levitt, Subbiah, Gerstein] • Using dynamic programming with a distance metric • DALI [Holm, Sander] • Analysis of distance maps • LOCK [Singh, Brutlag] • Analysis of secondary structure vectors, followed by refinement with distances • SSAP [Orengo and Taylor, 1989] • VAST [Gibrat et al., 1996] • CE [Shindyalov and Bourne, 1998] • SSM [Krissinel and Henrick, 2004] • …
Two Subproblems • Find correspondence set • Find alignment transform • (protein superposition problem) • Chicken-and-egg
DALI (Distance ALIgnment) • DALI has been used to do an ALL vs. ALL comparison of proteins in the PDB, and to create a hierarchical clustering of families. • http://www.ebi.ac.uk/dali/ • FSSP = fold classification based on structure-structure alignment of proteins • http://ekhidna.biocenter.helsinki.fi/dali/start
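DALI works from distance maps, that is, matrices of intramolecular distances between residues. The Python sketch below shows only that representation and one property that makes it useful: a distance map does not change when the molecule is rotated or translated, so two structures can be compared without superposing them first. It is not an implementation of DALI; the function names and coordinates are hypothetical.

```python
import math


def distance_map(coords):
    """All-against-all distance matrix for a list of 3-D points."""
    return [[math.dist(a, b) for b in coords] for a in coords]


def max_map_difference(map_a, map_b):
    """Largest absolute difference between two equally sized distance maps."""
    return max(
        abs(da - db)
        for row_a, row_b in zip(map_a, map_b)
        for da, db in zip(row_a, row_b)
    )


# Made-up alpha-carbon positions and a copy rotated 90 degrees about z.
original = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (3.0, 4.0, 12.0)]
rotated = [(-y, x, z) for x, y, z in original]

map_original = distance_map(original)
map_rotated = distance_map(rotated)
difference = max_map_difference(map_original, map_rotated)
```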
VAST (Vector Alignment Search Tool) • It places great emphasis on the definition of the threshold of significant structural similarity to avoid (many) similarities of small substructures that occur by chance in protein structure comparison. • At the heart of VAST's significance calculation is definition of the "unit" of tertiary structure similarity as pairs of secondary structure elements (SSE's) that have similar type, relative orientation, and connectivity. In comparing two protein domains the most surprising substructure similarity is that where the sum of superposition scores across these "units" is greatest. • http://www.ncbi.nlm.nih.gov/Structure/RESEARCH/iucrabs.html#Ref_6
Exercises • Look up Human Catalase in www.expasy.org. Find out: • How long is the protein chain? Where is its active site? • Is its 3D structure available? If so, how was it obtained? • How long is its longest helix chain and where is it located? • Look up PDB ID 1DGB in PDB. Find out: • What protein is it? • What is the resolution of its x-ray structure? • Visualize its structure using the tools provided on PDB website (try them all). • Look up PDB ID 1DGB in MMDB (PubMed Structure Database). Find out: • What is its MMDB ID? • Visualize its 3D structure using Cn3D. Export the images for different rendering effects (e.g., worm, spacefill). • Search its structure neighbors using VAST. How many neighbors are found for the entire chain? • Perform a VAST search for 2CZU chain A. • View its alignment (in sequence) with 1X8P chain A, 1GKA chain B, and 1BJ7. • Compare the structure alignment results with sequence alignment results (using ClustalW). • View its alignment with 1X8P chain A in Cn3D.