Introduction to Bioinformatics

Introduction to Bioinformatics - Tutorial no. 8: 3D Protein Structure

PDB • http://www.pdb.org • Database of molecular structures • Obtained by crystallography or NMR • Carefully curated and validated • Founded in 1971 • 26,051 proteins • 2,597 other structures: • Carbohydrates • Nucleic acids • Protein/nucleic acid complexes • Additional protein information • Secondary structure • References, external links

PDB: Summary Information • Molecule in PDB entry • Chains in molecule • Experimental method • Link to SCOP

PDB: 3D Structure • Each PDB entry contains the 3D coordinates of all of the protein's atoms. • Still images at a fixed orientation • Can be generated at any size • Interactive molecule explorer • Requires Java or the Chime plug-in • Download structure file • Display in RasMol, Swiss-PDBViewer, etc.
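To make "3D coordinates for every atom" concrete, here is a minimal sketch of reading atom records from a downloaded structure file. It assumes the classic fixed-column PDB text format, in which x, y and z occupy columns 31-38, 39-46 and 47-54 of each ATOM/HETATM line. The names `Atom` and `parse_atoms` are illustrative, and the two sample records are made up rather than taken from a real entry.

```python
from typing import NamedTuple


class Atom(NamedTuple):
    name: str      # atom name, e.g. "CA"
    residue: str   # three-letter residue name
    chain: str     # chain identifier
    res_seq: int   # residue sequence number
    x: float
    y: float
    z: float


def parse_atoms(pdb_text: str) -> list[Atom]:
    """Extract atoms from the ATOM/HETATM lines of fixed-column PDB text."""
    atoms = []
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM  ", "HETATM")):
            atoms.append(Atom(
                name=line[12:16].strip(),    # columns 13-16
                residue=line[17:20].strip(), # columns 18-20
                chain=line[21],              # column 22
                res_seq=int(line[22:26]),    # columns 23-26
                x=float(line[30:38]),        # columns 31-38
                y=float(line[38:46]),        # columns 39-46
                z=float(line[46:54]),        # columns 47-54
            ))
    return atoms


# Made-up sample records, laid out in the fixed-column format.
SAMPLE = (
    "ATOM      1  N   GLY A   1      -8.863  16.944  14.289  1.00 21.88           N\n"
    "ATOM      2  CA  GLY A   1      -7.608  16.194  14.544  1.00 20.05           C\n"
    "END\n"
)

atoms = parse_atoms(SAMPLE)
for atom in atoms:
    print(atom.name, atom.residue, atom.chain, atom.res_seq, atom.x, atom.y, atom.z)
```

Viewers such as RasMol read the same records; the sketch only shows where the coordinates sit in the file.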

PDB: Searching • By four-character PDB ID (for example, 9ins) • Text search: • Against all fields • Against author names • Words can be combined with boolean expressions • Example: protein kinase will find only the exact phrase "protein kinase", whereas protein and kinase will find all structures containing both the word protein and the word kinase • Wildcards (*) can be used • Example: h*moglobin will find both the protein hemoglobin and the protein haemoglobin • Parts of words (uncheck the option "match exact word") • Example: hemoglobin will also find hemoglobinase
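The four text-search behaviours (exact phrase, boolean AND, wildcard, partial word) can be illustrated with a small sketch. This is not how the PDB search engine is implemented; the helper names are hypothetical, and the standard-library `fnmatch` module stands in for the site's wildcard handling.

```python
import fnmatch
import re


def words(text: str) -> list[str]:
    """Lower-case a text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def match_phrase(text: str, phrase: str) -> bool:
    """True if the words of `phrase` occur consecutively in `text`."""
    w, p = words(text), words(phrase)
    return any(w[i:i + len(p)] == p for i in range(len(w) - len(p) + 1))


def match_all(text: str, *terms: str) -> bool:
    """Boolean AND: every term must appear as a whole word."""
    w = set(words(text))
    return all(term.lower() in w for term in terms)


def match_wildcard(text: str, pattern: str) -> bool:
    """True if any whole word matches a pattern such as 'h*moglobin'."""
    return any(fnmatch.fnmatchcase(word, pattern.lower()) for word in words(text))


def match_partial(text: str, fragment: str) -> bool:
    """'Match exact word' unchecked: the fragment may be part of a word."""
    return any(fragment.lower() in word for word in words(text))


title = "Kinase domain of a membrane protein"
print(match_phrase(title, "protein kinase"))              # phrase search
print(match_all(title, "protein", "kinase"))              # protein and kinase
print(match_wildcard("Human haemoglobin", "h*moglobin"))  # wildcard
print(match_partial("Hemoglobinase complex", "hemoglobin"))  # partial word
```

In this sketch the phrase search rejects the sample title because the two words are not adjacent, while the AND search accepts it.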

PDB: Searching (2) • Text Search (Authors / Full Text) • PDB ID • Chain types in the molecule • Experimental Technique

PDB Searching (3) • Click on any field you would like to search • Then click here

PDB Searching (4) And then we get:

PDB Searching • Only the Text Search field can use boolean searches. • The different criteria from the different fields are automatically combined with AND. • Iterative Searching: you can narrow down the search by performing another search only on the results. • You can filter out the results manually before performing the next search.
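A rough sketch of how AND-combined criteria and iterative searching behave, assuming each entry is a plain dictionary. The records, field names and the `search` helper are hypothetical and exist only for illustration.

```python
def search(entries: list[dict], **criteria) -> list[dict]:
    """Keep the entries that satisfy every criterion (implicit AND)."""
    return [
        entry for entry in entries
        if all(entry.get(field) == value for field, value in criteria.items())
    ]


# Hypothetical records, not real PDB entries.
entries = [
    {"id": "entry-1", "method": "X-ray", "chain_type": "protein"},
    {"id": "entry-2", "method": "NMR", "chain_type": "protein"},
    {"id": "entry-3", "method": "X-ray", "chain_type": "nucleic acid"},
]

# Two criteria in one query are combined with AND.
both = search(entries, method="X-ray", chain_type="protein")

# Iterative searching: the second query runs only on the first result set.
first_pass = search(entries, method="X-ray")
second_pass = search(first_pass, chain_type="protein")

print([e["id"] for e in both])
print([e["id"] for e in second_pass])
```

Narrowing in two passes gives the same entries as one combined query, which is why the iterative approach is safe to use when the first result list is too long.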

SCOP • Structural Classification of Proteins • Based on known protein structures • Manually created by visual inspection • Hierarchical database structure • Class, fold, superfamily, family • Proteins/domains, species instances • Founded in 1995 • 800 folds, 1295 superfamilies, 2327 families
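The hierarchy can be pictured as a tree in which every node knows its parent and its children. The sketch below is an assumption-laden illustration, not SCOP's actual data model: the `Node` class and `LEVELS` tuple are hypothetical, and the globin lineage is used only as a familiar example.

```python
from dataclasses import dataclass, field
from typing import Optional

# Levels from the root down, following the hierarchy listed above.
LEVELS = ("root", "class", "fold", "superfamily", "family", "protein", "species")


@dataclass
class Node:
    name: str
    level: str
    parent: Optional["Node"] = field(default=None, repr=False, compare=False)
    children: list["Node"] = field(default_factory=list)

    def add_child(self, name: str) -> "Node":
        """Create a child one level below this node and return it."""
        child = Node(name, LEVELS[LEVELS.index(self.level) + 1], parent=self)
        self.children.append(child)
        return child

    def path_from_root(self) -> list[str]:
        """Names of all nodes from the root down to this node."""
        path, node = [], self
        while node is not None:
            path.append(node.name)
            node = node.parent
        return path[::-1]


root = Node("SCOP", "root")
alpha = root.add_child("All alpha proteins")     # class
fold = alpha.add_child("Globin-like")            # fold
superfamily = fold.add_child("Globin-like")      # superfamily
family = superfamily.add_child("Globins")        # family

print(" > ".join(family.path_from_root()))
print([child.name for child in root.children])
```

The two things a browser of such a tree needs, the path from the root and the children of the current node, fall out directly from the parent and children links.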

SCOP: Navigation • Path from root to node • Children of node • Node description • Node name

TOPITS • 20% of the proteins in SwissProt are remote homologues of a protein in the PDB database, i.e. the structures are homologous but the pairwise sequence identity is not significant. • Threading techniques attempt to predict such remote homologues from sequence information, and thus to extend the scope of homology modelling. • Principle: • Remote homologues (0-25% sequence identity) are detected by a prediction-based threading method. The principal idea is to detect similar motifs of secondary structure and accessibility between a sequence of unknown structure and a known fold.

TOPITS • Strategy: • Project 3D structures onto 1D strings of secondary structure and relative solvent accessibility. • Predict secondary structure and solvent accessibility by neural network systems (PHD) for a query sequence. • Alignment of the predicted and observed 1D strings is done by dynamic programming. • The resulting alignment is used to detect remote 3D homologues.
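The alignment step can be sketched as a standard global alignment (Needleman-Wunsch style) over 1D secondary-structure strings, where H is helix, E is strand and L is loop. The match, mismatch and gap values below are arbitrary assumptions chosen for illustration; they are not the scoring scheme TOPITS uses, which also takes solvent accessibility into account.

```python
def align_score(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """Best global alignment score of two 1D strings by dynamic programming."""
    # prev[j] holds the best score for aligning the processed prefix of a with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap]
        for j in range(1, len(b) + 1):
            diagonal = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur.append(max(diagonal, prev[j] + gap, cur[j - 1] + gap))
        prev = cur
    return prev[-1]


# Hypothetical strings: a predicted string for the query and observed
# strings for two known folds.
predicted = "LHHHHHLLEEEEL"
fold_a = "LHHHHHLLEEEEL"
fold_b = "LEEEELLEEEELL"

print(align_score(predicted, fold_a))
print(align_score(predicted, fold_b))
```

The fold whose observed string scores highest against the predicted string is the candidate remote homologue; only two rows of the table are kept in memory because the sketch returns a score rather than the alignment itself.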

TOPITS • Accuracy (results should be taken with caution): • On average, the first hit of the prediction-based threading is correct in 30% of cases. • Hits with z-scores above 3.0 are more reliable (accuracy > 60%). • In exceptional cases, the resulting alignments suffice for building correct homology-based models.
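As a sketch of how a z-score cut-off might be applied, the snippet below assumes the z-score of a hit is its alignment score minus the mean of all scores, divided by their standard deviation. The exact definition used by TOPITS may differ, and the scores are invented for the example.

```python
import statistics


def z_scores(scores: list[float]) -> list[float]:
    """z-score of each alignment score relative to the whole hit list."""
    mean = statistics.fmean(scores)
    std = statistics.pstdev(scores)  # population standard deviation
    return [(score - mean) / std for score in scores]


def reliable_hits(scores: list[float], threshold: float = 3.0) -> list[int]:
    """Indices of hits whose z-score exceeds the threshold."""
    return [i for i, z in enumerate(z_scores(scores)) if z > threshold]


# Invented alignment scores: sixteen background hits and one outlier.
scores = [10.0] * 16 + [44.0]

print(z_scores(scores)[-1])
print(reliable_hits(scores))
```

Note that the sketch does not guard against a hit list in which all scores are identical, where the standard deviation would be zero.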

TOPITS Output (1) • Alignment score • Alignment length • % sequence identity • Matched sequence • Length of indels • Number of indels • Alignment significance • Length of sequence

TOPITS Output (2) • Predicted structure • Query sequence • Buried / Outside • Amino acid matches • Database sequence • Database known secondary structure

GenTHREADER Output • Prediction confidence • Energy measurements • Sequence alignment score and length • Score from neural network • Expected errors • Length of sequence • Structure from PDB
