This tutorial introduces the Protein Data Bank (PDB), a database of 3D molecular structures obtained by crystallography or NMR. It covers searching and navigating the PDB and the SCOP classification. It then explains the TOPITS method for predicting remote homologues from sequence information and discusses its accuracy.
Introduction to Bioinformatics - Tutorial no. 8 3D Protein Structure
PDB • http://www.pdb.org • Database of molecular structures • Obtained by crystallography or NMR • Carefully curated and validated • Founded in 1971 • 26,051 proteins • 2,597 other structures: • Carbohydrates • Nucleic acids • Protein/nucleic acid complexes • Additional protein information • Secondary structure • References, external links
PDB: Summary Information • Molecule in PDB entry • Chains in molecule • Experimental method • Link to SCOP
PDB: 3D Structure • Each PDB entry contains the 3D coordinates of all of the protein's atoms. • Still images at a fixed orientation • Can be generated at any size • Interactive molecule explorer • Requires Java or the Chime plug-in • Download the structure file • Display it in RasMol, Swiss-PDBViewer, etc.
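To make "3D coordinates for all atoms" concrete, here is a minimal sketch of reading atom records from a downloaded structure file. It assumes the classic fixed-column PDB text format, in which each `ATOM` line stores x, y and z in columns 31-54. The names `Atom`, `parse_atom_line`, `parse_atoms` and `SAMPLE_LINE` are hypothetical, and the coordinates in the sample line are made up for illustration.

```python
from typing import List, NamedTuple


class Atom(NamedTuple):
    serial: int
    name: str
    residue: str
    chain: str
    residue_number: int
    x: float
    y: float
    z: float


# Made-up sample record, assembled field by field so the columns are explicit.
SAMPLE_LINE = (
    "ATOM  "    # cols  1-6   record name
    "    1"     # cols  7-11  atom serial number
    " "         # col  12     blank
    " N  "      # cols 13-16  atom name
    " "         # col  17     alternate location indicator
    "GLY"       # cols 18-20  residue name
    " "         # col  21     blank
    "A"         # col  22     chain identifier
    "   1"      # cols 23-26  residue sequence number
    "    "      # cols 27-30  insertion code and blanks
    "  -8.863"  # cols 31-38  x coordinate (angstroms)
    "  16.944"  # cols 39-46  y coordinate
    "  14.289"  # cols 47-54  z coordinate
)


def parse_atom_line(line: str) -> Atom:
    """Slice one ATOM/HETATM record by its fixed columns (0-based slices)."""
    return Atom(
        serial=int(line[6:11]),
        name=line[12:16].strip(),
        residue=line[17:20].strip(),
        chain=line[21],
        residue_number=int(line[22:26]),
        x=float(line[30:38]),
        y=float(line[38:46]),
        z=float(line[46:54]),
    )


def parse_atoms(text: str) -> List[Atom]:
    """Collect every atom record in a structure file, skipping other records."""
    return [
        parse_atom_line(line)
        for line in text.splitlines()
        if line.startswith(("ATOM", "HETATM"))
    ]
```

Calling `parse_atoms` on the text of a downloaded file would give one `Atom` per coordinate record. Viewers such as RasMol read the same columns to draw the molecule.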
PDB: Searching • By four-character PDB ID (for example, 9ins) • Text search: • Against all fields • Against author names • Words can be combined with Boolean expressions • Example: protein kinase will find only the phrase protein kinase, while protein and kinase will find all structures containing both the word protein and the word kinase • Wildcards (*) can be used • Example: h*moglobin will find both hemoglobin and haemoglobin • Parts of words (uncheck the option “match exact word”) • Example: hemoglobin will also find hemoglobinase
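The matching rules above can be imitated locally with a few lines of Python. This is only a sketch of the semantics described on the slide, not the PDB search engine itself: the function names are hypothetical, and tokenising on letters and digits is an assumption.

```python
import fnmatch
import re


def words(text):
    """Lower-case a text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def term_matches(term, text, exact_word=True):
    """Match one search term, with optional '*' wildcards, against a text.

    With exact_word=False the term may also match part of a word.
    """
    pattern = term.lower() if exact_word else f"*{term.lower()}*"
    return any(fnmatch.fnmatchcase(word, pattern) for word in words(text))


def phrase_matches(phrase, text):
    """True if the words of the phrase occur consecutively in the text."""
    wanted, found = words(phrase), words(text)
    return any(
        found[i:i + len(wanted)] == wanted
        for i in range(len(found) - len(wanted) + 1)
    )


def all_terms_match(terms, text, exact_word=True):
    """Boolean AND: every term must match somewhere in the text."""
    return all(term_matches(term, text, exact_word) for term in terms)
```

For example, `term_matches("h*moglobin", "haemoglobin A")` is true, while `term_matches("hemoglobin", "hemoglobinase")` is true only with `exact_word=False`.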
PDB: Searching (2) • Text Search (Authors / Full Text) • PDB ID • Chain types in the molecule • Experimental Technique
PDB: Searching (3) • Click on any field you would like to search • Then click here
PDB Searching (4) And then we get:
PDB: Searching • Only the Text Search field accepts Boolean expressions. • Criteria entered in different fields are automatically combined with AND. • Iterative searching: you can narrow down a search by running another search on the results only. • You can filter the results manually before performing the next search.
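As a rough illustration of how field criteria combine with AND, and of why an iterative search gives the same result as entering all criteria at once, consider the sketch below. The entries, their IDs and the field names are invented for the example and do not describe real PDB entries.

```python
# Invented toy entries; the IDs and field values are placeholders.
ENTRIES = [
    {"id": "0aaa", "title": "protein kinase domain", "method": "X-ray"},
    {"id": "0bbb", "title": "protein kinase with DNA", "method": "NMR"},
    {"id": "0ccc", "title": "hemoglobin", "method": "X-ray"},
]


def search(entries, **criteria):
    """Keep entries for which every given field contains its search value."""
    return [
        entry
        for entry in entries
        if all(
            value.lower() in entry[field].lower()
            for field, value in criteria.items()
        )
    ]


# One search with two fields: the criteria are combined with AND.
combined = search(ENTRIES, title="kinase", method="X-ray")

# Iterative searching: the second search runs only on the first result set.
first_pass = search(ENTRIES, title="kinase")
second_pass = search(first_pass, method="X-ray")

# Manual filtering: drop an unwanted hit before the next search.
kept = [entry for entry in first_pass if entry["id"] != "0aaa"]
third_pass = search(kept, method="X-ray")
```

Here `combined` and `second_pass` contain the same single entry, while `third_pass` is empty because the only X-ray kinase hit was removed by hand.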
SCOP • Structural Classification of Proteins • Based on known protein structures • Manually created by visual inspection • Hierarchical database structure • Class, fold, superfamily, family • Proteins/domains, species instances • Founded in 1995 • 800 folds, 1295 superfamilies, 2327 families
SCOP: Navigation • Path from root to node • Node name • Node description • Children of node
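The SCOP hierarchy is a tree, so the "path from root to node" and "children of node" views correspond to two simple tree operations. The sketch below is one possible representation, not SCOP's actual data model. The class name `ScopNode` is hypothetical, and the globin lineage used as sample data is quoted from memory and should be checked against SCOP itself.

```python
class ScopNode:
    """One node of a SCOP-like hierarchy (class, fold, superfamily, family)."""

    def __init__(self, name, level, parent=None):
        self.name = name
        self.level = level
        self.parent = parent
        self.children = []

    def add_child(self, name, level):
        """Create a child node, attach it to this node, and return it."""
        child = ScopNode(name, level, parent=self)
        self.children.append(child)
        return child

    def path_from_root(self):
        """Names of all nodes from the root down to this node."""
        path, node = [], self
        while node is not None:
            path.append(node.name)
            node = node.parent
        return path[::-1]


# Sample lineage (assumed, see note above).
root = ScopNode("SCOP root", "root")
all_alpha = root.add_child("All alpha proteins", "class")
globin_fold = all_alpha.add_child("Globin-like", "fold")
globin_superfamily = globin_fold.add_child("Globin-like", "superfamily")
globins = globin_superfamily.add_child("Globins", "family")
```

`globins.path_from_root()` gives the breadcrumb shown at the top of a SCOP page, and `node.children` gives the list shown below the node description.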
TOPITS • 20% of the proteins in SwissProt are remote homologues of a protein in the PDB, i.e. the structures are homologous but the pairwise sequence identity is not significant. • Threading techniques attempt to predict such remote homologues from sequence information, and thus to increase the scope of homology modelling. • Principle: • Remote homologues (0-25% sequence identity) are detected by a prediction-based threading method. The main idea is to detect similar motifs of secondary structure and accessibility between a sequence of unknown structure and a known fold.
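The 0-25% range refers to pairwise sequence identity, which can be computed from two aligned sequences as in the sketch below. Tools differ in the denominator they use. This version divides by the number of columns in which neither sequence has a gap, which is an assumption rather than the definition TOPITS uses. The function name and the example sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percentage of identical residues over the gap-free aligned columns."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # columns containing a gap are not counted
        compared += 1
        if a == b:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


# Made-up aligned pair: 6 gap-free columns, 5 of them identical.
identity = percent_identity("ACDE-FG", "ACDQWFG")
```

A low value alone does not show that two proteins are homologous. It only indicates that sequence comparison by itself is inconclusive, which is the situation threading methods target.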
TOPITS • Strategy: • Project 3D structures onto 1D strings of secondary structure and relative solvent accessibility. • Predict secondary structure and solvent accessibility by neural network systems (PHD) for a query sequence. • Alignment of the predicted and observed 1D strings is done by dynamic programming. • The resulting alignment is used to detect remote 3D homologues.
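The alignment step of this strategy can be sketched with a textbook global alignment (Needleman-Wunsch) over secondary structure strings, using H for helix, E for strand and L for loop. This is a simplified stand-in, not the TOPITS implementation: the real method also scores solvent accessibility and uses its own scoring scheme. The match, mismatch and gap values are arbitrary assumptions.

```python
def align(predicted, observed, match=2, mismatch=-1, gap=-2):
    """Globally align two 1D structure strings by dynamic programming.

    Returns (score, aligned_predicted, aligned_observed).
    """
    n, m = len(predicted), len(observed)

    def pair_score(i, j):
        return match if predicted[i - 1] == observed[j - 1] else mismatch

    # score[i][j] = best score for predicted[:i] aligned with observed[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair_score(i, j),  # align both symbols
                score[i - 1][j] + gap,                   # gap in observed
                score[i][j - 1] + gap,                   # gap in predicted
            )

    # Trace back from the bottom-right corner to recover one best alignment.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair_score(i, j):
            top.append(predicted[i - 1])
            bottom.append(observed[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(predicted[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(observed[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(top)), "".join(reversed(bottom))
```

For instance, `align("HHHLEEE", "HHHEEE")` places a single gap opposite the extra loop residue. In a threading search, the predicted string of the query would be aligned in this way against the observed string of every known fold, and the folds ranked by score.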
TOPITS • Accuracy: results should be taken with caution. • On average, the first hit of the prediction-based threading is correct in about 30% of cases. • Hits with z-scores above 3.0 are more reliable (accuracy > 60%). • In exceptional cases, the resulting alignments suffice for building correct homology-based models.
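A z-score expresses how far one alignment score lies above the average score, in units of standard deviation. The sketch below assumes the common definition z = (score - mean) / standard deviation, taken over all scores of one database search. The slide does not state how TOPITS normalises its scores, so treat this as an illustration of the 3.0 cut-off only. The function names and the scores are hypothetical.

```python
from statistics import mean, pstdev


def z_scores(scores):
    """Map each hit to (score - mean) / standard deviation of all scores."""
    values = list(scores.values())
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return {hit: 0.0 for hit in scores}
    return {hit: (value - mu) / sigma for hit, value in scores.items()}


def reliable_hits(scores, threshold=3.0):
    """Hits whose z-score exceeds the threshold (3.0 as quoted above)."""
    return [hit for hit, z in z_scores(scores).items() if z > threshold]


# Made-up search result: nineteen background hits and one outstanding hit.
example_scores = {f"decoy_{i}": 10.0 for i in range(19)}
example_scores["candidate"] = 50.0
```

With these made-up numbers only `candidate` passes the cut-off. Note that a high z-score needs a sufficiently large background of scores, since a handful of hits cannot produce a z-score above 3.0 under this definition.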
TOPITS Output (1) • Alignment score • Alignment length • % sequence identity • Matched sequence • Length of indels • Number of indels • Alignment significance • Length of sequence
TOPITS Output (2) • Predicted structure • Query sequence • Buried / Outside • Amino acid matches • Database sequence • Database known secondary structure
GenTHREADER Output • Prediction confidence • Energy measurements • Sequence alignment score and length • Score from neural network • Expected errors • Length of sequence • Structure from PDB