Basic bioinformatics tools for studying proteins

Dong Xu, Computer Science Department, C. S. Bond Life Sciences Center, University of Missouri, Columbia. http://digbio.missouri.edu

Presentation Transcript


  1. Basic bioinformatics tools for studying proteins. Dong Xu, Computer Science Department, C. S. Bond Life Sciences Center, University of Missouri, Columbia. http://digbio.missouri.edu

  2. Introduction • Broaden knowledge for undergraduate education • Many opportunities for biomedical- and agriculture-related jobs • Practice basic protein tools: • Useful for biological studies • Intellectually stimulating • Dong’s picks for beginners: • Not necessarily the most accurate tools • Easy to use and understand • Very popular

  3. Proteins – Some Basics • What Is a Protein? • Linear Sequence of Amino Acids... • What is an Amino Acid?

  4. 20 Amino Acids • Leucine (L), Isoleucine (I), Valine (V), Alanine (A), Glycine (G), Proline (P), Asparagine (N), Methionine (M), Tryptophan (W), Phenylalanine (F), Tyrosine (Y), Threonine (T), Serine (S), Cysteine (C), Glutamine (Q), Histidine (H), Glutamic acid (E), Arginine (R), Aspartic acid (D), Lysine (K) • Color key: White: hydrophobic, Green: hydrophilic, Red: acidic, Blue: basic

  5. Peptide Bond • Amino acids are connected by peptide bonds (figure: chain D F T A A S K G N S G)

  6. An Overview • A protein folds into a unique 3D structure under physiological conditions • Lysozyme sequence (129 amino acids): KVFGRCELAA AMKRHGLDNY RGYSLGNWVC AAKFESNFNT QATNRNTDGS TDYGILQINS RWWCNDGRTP GSRNLCNIPC SALLSSDITA SVNCAKKIVS DGNGMNAWVA WRNRCKGTDV QAWIRGCRL • Figure labels: protein backbone, side chain

  7. Primary, Secondary and Tertiary Structures of Proteins

  8. Protein Structure Representations • Lysozyme structure shown as: ball & stick, strand, surface

  9. Structure Visualization • RasMol: http://www.umass.edu/microbio/rasmol/getras.htm • MDL Chime (plug-in): http://www.mdl.com/products/framework/chime/ • Protein Explorer: http://molvis.sdsc.edu/protexpl/frntdoor.htm • Jmol: http://jmol.sourceforge.net/ • PyMOL: http://pymol.sourceforge.net/ • VMD: http://www.ks.uiuc.edu/Research/vmd/

  10. Sequence Homology Software • NCBI-BLAST: http://www.ncbi.nlm.nih.gov/BLAST/ • Comparing 2 (pairwise) or more (multiple) sequences • Searching for a series of identical or similar characters in the sequences • Example pairwise alignment:

      VLSPADKTNVKAAWAKVGAHAAGHG
      ||| |    |   |||| |  ||||
      VLSEAEWQLVLHVWAKVEADVAGHG
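
The idea of "searching for identical characters" can be illustrated with the two 25-residue sequences on this slide. The sketch below only counts identical residues in an alignment that is already given; it is not how BLAST searches or scores, and the function name `percent_identity` is made up for illustration.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent of aligned, gap-free columns in which both residues are identical."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue  # skip columns that contain a gap
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# The two aligned sequences shown on the slide.
seq1 = "VLSPADKTNVKAAWAKVGAHAAGHG"
seq2 = "VLSEAEWQLVLHVWAKVEADVAGHG"
print(f"{percent_identity(seq1, seq2):.1f}% identity over {len(seq1)} columns")
```

Each "|" in the slide's match line corresponds to one column counted as a match here.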

  11. Typical BLAST Output

  12. InterPro Scan: http://www.ebi.ac.uk/InterProScan/

  13. InterPro Scan PCNA: http://www.ebi.ac.uk/InterProScan/

  14. MyHits Local Motifs Search: http://myhits.isb-sib.ch/

  15. MyHits Local Motifs Summary: http://myhits.isb-sib.ch/

  16. MyHits Local Motif Hits: http://myhits.isb-sib.ch/

  17. Multiple Alignment

      VTISCTGSESNIGAG-NHVKWYQQLPG
      VTISCTGTESNIGS--ITVNWYQQLPG
      LRLSCSSSDFIFSS--YAMYWVRQAPG
      LSLTCTVSETSFDD--YYSTWVRQPPG
      PEVTCVVVDVSHEDPQVKFNWYVDG--
      ATLVCLISDFYPGA--VTVAWKADS--
      AALGCLVKDYFPEP--VTVSWNSG---
      VSLTCLVKEFYPSD--IAVEWWSNG--

  18. Phylogeny Tree • A multiple protein sequence alignment reveals conserved sites, and hence possibly functional sites • The same alignment is the starting point for a phylogenetic tree
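
A minimal sketch of how conserved sites can be read off a multiple alignment, using the eight aligned sequences from slide 17. "Conserved" here is assumed to mean that every sequence has the same residue in a column and no gaps; real tools typically use softer conservation scores. The function name `conserved_columns` is hypothetical.

```python
def conserved_columns(rows):
    """Return (1-based column, residue) for columns identical in every sequence."""
    if len({len(r) for r in rows}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    conserved = []
    for index, column in enumerate(zip(*rows), start=1):
        residues = set(column)
        # Strict criterion: one residue type only, and it is not a gap.
        if len(residues) == 1 and "-" not in residues:
            conserved.append((index, column[0]))
    return conserved


# The multiple alignment from slide 17.
alignment = [
    "VTISCTGSESNIGAG-NHVKWYQQLPG",
    "VTISCTGTESNIGS--ITVNWYQQLPG",
    "LRLSCSSSDFIFSS--YAMYWVRQAPG",
    "LSLTCTVSETSFDD--YYSTWVRQPPG",
    "PEVTCVVVDVSHEDPQVKFNWYVDG--",
    "ATLVCLISDFYPGA--VTVAWKADS--",
    "AALGCLVKDYFPEP--VTVSWNSG---",
    "VSLTCLVKEFYPSD--IAVEWWSNG--",
]
print(conserved_columns(alignment))
```

Under this strict criterion only two columns qualify: a cysteine at column 5 and a tryptophan at column 21.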

  19. MSA with ClustalW ClustalW: http://www.ebi.ac.uk/Tools/clustalw2/index.html

  20. Cell localization

  21. Typical Sorting Signals

  22. Localizations • Cell localization: • PSORT: http://psort.nibb.ac.jp/ • TargetP: http://www.cbs.dtu.dk/services/TargetP/ • Signal peptide: • SignalP: http://www.cbs.dtu.dk/services/SignalP/

  23. SignalP result

  24. Membrane Bilayer with Proteins

  25. Helix Bundle TM Proteins (figures: PDB = 1QHJ, PDB = 1RRC) • Single helix or helical bundles (> 90% of TM proteins) • Examples: human growth hormone receptor; insulin receptor; ATP-binding cassette family (CFTR); multidrug resistance proteins; 7TM receptors (G protein-linked receptors)

  26. Beta Barrel TM Proteins

  27. Transmembrane Prediction • Alpha: http://bp.nuap.nagoya-u.ac.jp/sosui/ • Beta: http://psfs.cbrc.jp/tmbeta-net/

  28. Secondary Structure Prediction • SSpro 4.1: http://sysbio.rnet.missouri.edu/multicom_toolbox/ • PSI-PRED: http://bioinf.cs.ucl.ac.uk/psipred/psiform.html • SAM: http://compbio.soe.ucsc.edu/SAM_T08/T08-query.html • PHD: http://www.predictprotein.org/

  29. Coiled Coil Prediction • http://npsa-pbil.ibcp.fr/cgi-bin/npsa_automat.pl?page=/NPSA/npsa_lupas.html

  30. Special Motif Prediction • Helix-turn-helix motif prediction: http://npsa-pbil.ibcp.fr/cgi-bin/npsa_automat.pl?page=/NPSA/npsa_hth.html • Kinase-related motifs: http://scansite.mit.edu/motifscan_seq.phtml • Leucine zippers: http://2zip.molgen.mpg.de/index.html

  31. Protein Disorder Prediction • PreDisorder: http://sysbio.rnet.missouri.edu/multicom_toolbox/ • A collection of disorder predictors: http://www.disprot.org/predictors.php

  32. 2D: Contact Map Prediction (figure: a 3D structure and its 2D contact map, an n × n matrix with rows i = 1 … n and columns j = 1 … n; distance threshold = 8 Å)
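
A minimal sketch of what a contact map is, assuming one representative atom per residue and treating two residues as in contact when their distance is at most the 8 Å threshold from the slide. The slide does not say which atom is used or whether the comparison is strict, so both are assumptions, and the coordinates below are toy values rather than a real structure.

```python
import math


def contact_map(coords, threshold=8.0):
    """Build an n x n matrix of 0/1 flags from one 3D point per residue.

    Entry [i][j] is 1 when residues i and j lie within `threshold` angstroms.
    The diagonal is trivially 1; analyses often ignore it and near neighbors.
    """
    n = len(coords)
    return [
        [1 if math.dist(coords[i], coords[j]) <= threshold else 0 for j in range(n)]
        for i in range(n)
    ]


# Hypothetical coordinates: four residues on a straight line, 3.8 angstroms apart.
toy_coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
for row in contact_map(toy_coords):
    print(row)
```

The matrix is symmetric, so predictors only need to estimate the entries above the diagonal.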

  33. Contact Prediction • SVMcon: http://casp.rnet.missouri.edu/svmcon.html • NNcon: http://casp.rnet.missouri.edu/nncon.html • SCRATCH: http://scratch.proteomics.ics.uci.edu/ • SAM: http://compbio.soe.ucsc.edu/HMM-apps/HMM-applications.html

  34. Structure Comparison Visualize structure alignment using VAST: http://www.ncbi.nlm.nih.gov/Structure/ Two ferredoxins, 1DOI and 1AWD, are aligned structurally, showing an insertion in 1DOI that contains potassium-ion binding sites. This may be the result of adaptations to the high salt environment of the Dead Sea.

  35. Structure Alignment Tools • CE: http://cl.sdsc.edu/ • DALI: http://www.ebi.ac.uk/dali/ • TM-align: http://zhang.bioinformatics.ku.edu/TM-align/

  36. Structure-Based Search • Comparing a query protein structure against all the structures in the PDB • The DALI server: http://www2.ebi.ac.uk/dali/ • When new structures are solved, researchers often submit them to the DALI server to find structural neighbors and their alignments.

  37. Swiss Model: Comparative Modeling Server: http://swissmodel.expasy.org/

  38. Protein Structure Homology Modeling: Modeller

  39. Analysis software • PROCHECK • WHATCHECK • Suite Biotech • PROSA

  40. Entrez Databases: http://www.ncbi.nlm.nih.gov/Entrez/

  41. Design Program • DEZYMER (Hellinga) • Given a ligand and a protein with known structure, suggest residues to be mutated so that the resulting protein binds the ligand. • ORBIT (Mayo) • Given a backbone structure, design a sequence such that it folds to that backbone. • Rosetta (Baker) • One program to treat diverse problems • Prediction and design

  42. DEZYMER 1. Define the expected binding geometry. 2. Find backbone positions where, if appropriate side chains are added, the predefined geometry is satisfied. 3. Place the side chains and ligand, and optimize their positions. 4. Repack residues at positions other than the binding residues; if necessary, change the residue type. Reference: Hellinga and Richards, JMB, 1991. Construction of new ligand binding sites in proteins of known structure.

  43. ORBIT 1. Divide the target structure into three parts: core, surface and boundary. 2. Allowed residues: Core: Ala, Val, Leu, Ile, Phe, Tyr, Trp. Surface: Ala, Ser, Thr, His, Asp, Asn, Glu, Gln, Lys, and Arg. Boundary: union of the above two. 3. 1.9 × 10^27 possible sequences. 4. Select the best sequence efficiently, using dead-end elimination (DEE). Figure: Solution structure of the designed protein. Stereoview showing the best-fit superposition of the 41 ... Figure: Comparison between the designed backbone (averaged NMR structure, blue) and the target backbone (red).
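
A rough sketch of where a sequence count of this size comes from: each designed position contributes a factor equal to the number of residue types allowed for its class. The residue sets are taken from the slide. The numbers of core, surface and boundary positions are not given on the slide; the values used below (1, 18 and 7) are an assumption chosen because they reproduce a figure of about 1.9 × 10^27.

```python
# Residue types allowed per position class, as listed on the slide.
CORE = {"Ala", "Val", "Leu", "Ile", "Phe", "Tyr", "Trp"}
SURFACE = {"Ala", "Ser", "Thr", "His", "Asp", "Asn", "Glu", "Gln", "Lys", "Arg"}
BOUNDARY = CORE | SURFACE  # union of the two sets; Ala is shared, so 16 types


def sequence_space_size(n_core: int, n_surface: int, n_boundary: int) -> int:
    """Number of distinct sequences when every position is chosen independently."""
    return (
        len(CORE) ** n_core
        * len(SURFACE) ** n_surface
        * len(BOUNDARY) ** n_boundary
    )


# ASSUMPTION: position counts are illustrative and not stated on the slide.
size = sequence_space_size(n_core=1, n_surface=18, n_boundary=7)
print(f"{size:.1e} possible sequences")
```

A search space this large rules out enumerating every sequence, which is why the slide points to dead-end elimination for pruning.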

  44. Calciomics • Calciomics is a specialized area of biochemistry focusing on the study of calcium-binding biological macromolecules and proteins to understand the factors that contribute to calcium-binding affinity and the selectivity of proteins and calcium-dependent conformational change. • http://lithium.gsu.edu/faculty/Yang/Calciomics.htm

  45. Flowchart: protein annotation pipeline, from original sequence to 3D model and function inference • Sequence analysis and processing: SignalP (remove signal region), ProDom (set of domain sequences), coiled coils, remove disorder regions, SOSUI (remove transmembrane regions), giving modified sequences • PSI-BLAST iterations: analysis of E-value, set of profile sequences; STOP if homolog found in PDB • Function annotation toolkit: SWISS-PROT annotation, PSORT (subcellular location), PFAM (family classification), motif, active sites, enzyme structure DB, Medline literature search • Structure prediction and evaluation: SSP (secondary structure prediction), PROSPECT, MODELLER / Jackle (evaluate & adjust alignments), WHATIF / PROCHECK

  46. Summary • Practice 10 selected tools • Help answer the question: what does this protein do? • Collaborate with experimentalists • Find more tools at • http://us.expasy.org/tools/ • http://infosuite.welch.jhmi.edu/BS/pt
