
Introduction to Bioinformatics - Tutorial no. 8

This tutorial introduces 3D protein structure resources, starting with the PDB, a database of molecular structures obtained by crystallography or NMR. It covers the features of the PDB, including searching and navigating entries, and the SCOP structural classification. The tutorial also explains the TOPITS method for predicting remote homologues from sequence information and discusses its accuracy.

robinettem
Presentation Transcript


  1. Introduction to Bioinformatics - Tutorial no. 8: 3D Protein Structure

  2. PDB
     • http://www.pdb.org
     • Database of molecular structures
     • Obtained by crystallography or NMR
     • Carefully curated and validated
     • Founded in 1971
     • 26,051 proteins
     • 2,597 other structures:
       • Carbohydrates
       • Nucleic acids
       • Protein/nucleic acid complexes
     • Additional protein information
       • Secondary structure
       • References, external links

  3. PDB: Summary Information [Figure labels: molecule in PDB entry; chains in molecule; experimental method; link to SCOP]

  4. PDB: 3D Structure
     • Each PDB entry contains the 3D coordinates of all of the protein's atoms.
     • Still images at a fixed orientation
       • Can be generated at any size
     • Interactive molecule explorer
       • Requires Java or the Chime plug-in
     • Download the structure file
       • Display in RasMol, Swiss-PDBViewer, etc.
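As a rough illustration of what "3D coordinates for all atoms" means in a downloaded structure file, the sketch below reads the fixed-column ATOM/HETATM records of the classic PDB text format. The function names `parse_atom_records` and `format_atom` are hypothetical helpers invented for this example, and the two sample atoms are made-up values, not taken from any real entry.

```python
def parse_atom_records(lines):
    """Extract name, residue, chain and x/y/z from ATOM/HETATM lines."""
    atoms = []
    for line in lines:
        if line.startswith(("ATOM  ", "HETATM")):
            atoms.append({
                "name": line[12:16].strip(),      # columns 13-16
                "res_name": line[17:20].strip(),  # columns 18-20
                "chain": line[21],                # column 22
                "res_seq": int(line[22:26]),      # columns 23-26
                "x": float(line[30:38]),          # columns 31-38
                "y": float(line[38:46]),          # columns 39-46
                "z": float(line[46:54]),          # columns 47-54
            })
    return atoms


def format_atom(serial, name, res_name, chain, res_seq, x, y, z):
    """Hypothetical helper: build a column-aligned ATOM line for the demo."""
    return (f"ATOM  {serial:>5} {name:<4} {res_name:>3} {chain}{res_seq:>4}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}")


# Made-up sample atoms, for illustration only.
sample = [
    format_atom(1, " N", "GLY", "A", 1, 11.104, 13.207, 2.100),
    format_atom(2, " CA", "GLY", "A", 1, 12.560, 13.100, 2.300),
]

for atom in parse_atom_records(sample):
    print(atom["name"], atom["chain"], atom["x"], atom["y"], atom["z"])
```

Viewers such as RasMol read the same records; the sketch only shows where the coordinates sit in each line.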

  5. PDB: Searching
     • By four-character PDB ID (for example, 9ins)
     • Text search:
       • Against all fields
       • Against author names
     • Words can be combined with boolean expressions
       • Example: protein kinase will find only the exact phrase "protein kinase"; protein and kinase will find all structures containing both the word protein and the word kinase
     • Wildcards (*) can be used
       • Example: h*moglobin will find both hemoglobin and haemoglobin
     • Parts of words (uncheck the option "match exact word")
       • Example: hemoglobin will also find hemoglobinase
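The matching rules on this slide can be imitated locally to see how they differ. The sketch below is not the PDB search engine; `matches`, `matches_all` and `matches_phrase` are hypothetical functions that mimic exact-word, wildcard, partial-word, AND and phrase matching on a plain text string.

```python
from fnmatch import fnmatchcase


def matches(term, text, exact_word=True):
    """Check one search term against the words of a text."""
    words = text.lower().split()
    term = term.lower()
    if "*" in term:
        # Wildcard: '*' stands for any run of characters inside a word.
        return any(fnmatchcase(word, term) for word in words)
    if exact_word:
        return term in words
    # "Match exact word" unchecked: the term may be part of a longer word.
    return any(term in word for word in words)


def matches_all(terms, text, exact_word=True):
    """Boolean AND: every term must match somewhere in the text."""
    return all(matches(term, text, exact_word) for term in terms)


def matches_phrase(phrase, text):
    """Phrase search: the words must appear together, in order."""
    return phrase.lower() in text.lower()


print(matches("h*moglobin", "human haemoglobin"))         # wildcard
print(matches("hemoglobin", "hemoglobinase", False))      # partial word
print(matches_all(["protein", "kinase"], "kinase domain of a protein"))
print(matches_phrase("protein kinase", "kinase domain of a protein"))
```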

  6. PDB: Searching (2) [Figure labels: PDB ID; text search (authors / full text); chain types in the molecule; experimental technique]

  7. PDB: Searching (3) [Figure labels: click on any field you would like to search; then click here]

  8. PDB: Searching (4) [Figure: the resulting search form]

  9. PDB: Searching
     • Only the Text Search field can use boolean searches.
     • The criteria from the different fields are automatically combined with AND.
     • Iterative searching: you can narrow down the search by performing another search only on the results.
     • You can filter the results manually before performing the next search.
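The way field criteria combine with AND, and the way a second search can run on the results of the first, can be sketched with a small filter. The records, field names and the `search` function below are all invented for illustration and do not reflect the PDB's actual schema or contents.

```python
def search(entries, **criteria):
    """Keep entries for which every given field equals the requested value (AND)."""
    return [
        entry for entry in entries
        if all(entry.get(field) == value for field, value in criteria.items())
    ]


# Toy records with made-up IDs; not real PDB entries.
entries = [
    {"id": "AAAA", "method": "X-ray", "chain_type": "protein"},
    {"id": "BBBB", "method": "NMR", "chain_type": "protein"},
    {"id": "CCCC", "method": "X-ray", "chain_type": "nucleic acid"},
]

# Two fields in one query are combined with AND.
both = search(entries, method="X-ray", chain_type="protein")

# Iterative searching: search first, then search again within the results.
first_pass = search(entries, method="X-ray")
second_pass = search(first_pass, chain_type="protein")

print([e["id"] for e in both], [e["id"] for e in second_pass])
```

Both routes end at the same set, which is why narrowing step by step is equivalent to entering all criteria at once.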

  10. SCOP
     • Structural Classification of Proteins
     • Based on known protein structures
     • Manually created by visual inspection
     • Hierarchical database structure
       • Class, fold, superfamily, family
       • Proteins/domains, species instances
     • Founded in 1995
     • 800 folds, 1295 superfamilies, 2327 families
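The hierarchy described above is a tree, and browsing it amounts to following a path from the root to a node and listing that node's children. A minimal sketch, assuming a hypothetical `ScopNode` class and using the globin branch only as an illustrative example:

```python
class ScopNode:
    """One node of a SCOP-like hierarchy (hypothetical class for illustration)."""

    def __init__(self, level, name, parent=None):
        self.level = level
        self.name = name
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)

    def path_from_root(self):
        """Return the nodes from the root down to this node."""
        node, path = self, []
        while node is not None:
            path.append(node)
            node = node.parent
        return list(reversed(path))


# Illustrative branch only; the real database holds many nodes per level.
root = ScopNode("root", "SCOP")
cls = ScopNode("class", "All alpha proteins", root)
fold = ScopNode("fold", "Globin-like", cls)
superfamily = ScopNode("superfamily", "Globin-like", fold)
family = ScopNode("family", "Globins", superfamily)

print(" > ".join(f"{n.level}: {n.name}" for n in family.path_from_root()))
print([child.name for child in fold.children])
```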

  11. SCOP: Navigation [Figure labels: path from root to node; node name; node description; children of node]

  12. TOPITS
     • 20% of the proteins in SwissProt are remote homologues of a protein in the PDB, i.e. the structures are homologous but the pairwise sequence identity is not significant.
     • Threading techniques attempt to predict such remote homologues from sequence information, and thus to increase the scope of homology modelling.
     • Principle:
       • Remote homologues (0-25% sequence identity) are detected by a prediction-based threading method. The principal idea is to detect similar motifs of secondary structure and accessibility between a sequence of unknown structure and a known fold.

  13. TOPITS
     • Strategy:
       • Project 3D structures onto 1D strings of secondary structure and relative solvent accessibility.
       • Predict secondary structure and solvent accessibility for a query sequence with neural network systems (PHD).
       • Align the predicted and observed 1D strings by dynamic programming.
       • Use the resulting alignment to detect remote 3D homologues.
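The dynamic-programming step in the strategy above can be sketched as a plain global (Needleman-Wunsch style) alignment of two secondary-structure strings. This is not TOPITS itself: the alphabet (H helix, E strand, L loop), the scoring values and the function name `align` are assumptions made for the example, and solvent accessibility is left out.

```python
def align(predicted, observed, match=1, mismatch=-1, gap=-2):
    """Globally align two 1D structure strings; return (score, row_a, row_b)."""
    n, m = len(predicted), len(observed)

    def pair(i, j):
        return match if predicted[i - 1] == observed[j - 1] else mismatch

    # score[i][j] = best score aligning predicted[:i] with observed[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair(i, j),  # align two states
                score[i - 1][j] + gap,             # gap in observed
                score[i][j - 1] + gap,             # gap in predicted
            )

    # Trace back from the bottom-right corner to recover one best alignment.
    row_a, row_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(i, j):
            row_a.append(predicted[i - 1])
            row_b.append(observed[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            row_a.append(predicted[i - 1])
            row_b.append("-")
            i -= 1
        else:
            row_a.append("-")
            row_b.append(observed[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(row_a)), "".join(reversed(row_b))


total, a, b = align("LHHHHLLEEEL", "LHHHLLEEEEL")
print(total)
print(a)
print(b)
```

In a threading search this alignment would be repeated against the 1D string of every known fold, and the folds ranked by score.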

  14. TOPITS
     • Accuracy: results should be treated with caution.
       • The first hit of the prediction-based threading is correct in about 30% of cases on average.
       • Hits with z-scores above 3.0 are more reliable (accuracy > 60%).
       • In exceptional cases the resulting alignments suffice for building correct homology-based models.
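The z-score threshold above can be illustrated with a short calculation. The slides do not say how TOPITS derives its z-scores, so the sketch assumes the common definition, (score - mean) / standard deviation over all alignment scores of one search. The function names and the scores are made up for the example.

```python
from statistics import mean, pstdev


def z_scores(scores):
    """Standardise raw alignment scores against the whole score list."""
    mu = mean(scores)
    sigma = pstdev(scores)
    if sigma == 0:
        return [0.0] * len(scores)
    return [(s - mu) / sigma for s in scores]


def reliable_hits(hits, threshold=3.0):
    """Return the names of hits whose z-score exceeds the threshold."""
    names = list(hits)
    zs = z_scores([hits[name] for name in names])
    return [name for name, z in zip(names, zs) if z > threshold]


# Made-up scores: many background folds plus one clear outlier.
hits = {f"background_{i}": 10 for i in range(19)}
hits["candidate"] = 100

print(reliable_hits(hits))
```

Under this definition a hit only clears the threshold when it stands well apart from the bulk of the other folds, which fits the slide's advice to trust high z-scores more than the top-ranked hit alone.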

  15. TOPITS Output (1) [Figure labels: alignment score; alignment length; % sequence identity; matched sequence; length of indels; number of indels; alignment significance; length of sequence]

  16. TOPITS Output (2) [Figure labels: query sequence; predicted structure; buried / outside; amino acid matches; database sequence; database known secondary structure]

  17. GenTHREADER Output [Figure labels: prediction confidence; energy measurements; sequence alignment score and length; score from neural network; expected errors; length of sequence; structure from PDB]
