An Introduction to Protein Fold Recognition Protein Fold Recognition and Threading Algorithms
Folding vs. Prediction Folding • Determining how a polypeptide actually folds (biophysics approach) Prediction • Determining secondary, tertiary, or quaternary structure from a polypeptide sequence (computational approach)
Prediction Task • Primary structure (MPGAVEG.….GGTGTDS) → tertiary structure
Assumptions in Prediction Anfinsen’s Dogma • Tertiary structure is determined by amino acid interactions and the surrounding medium (i.e. polar effects from water and ions) • a.k.a. Thermodynamic Hypothesis
Types of Prediction Algorithms 1. Ab initio (thermodynamic optimization) 2. Homology Modelling (sequence similarity) 3. Fold Recognition (sequence-structure similarity)
Definition of Fold Recognition • Given • A database of known 3D structures mapped to a concise format (templates, folds) • A primary sequence of unknown tertiary structure • Find • The database structure with the best global sequence-structural alignment (threading)
Related Definitions • Inverse Folding • Given a 3D structure, find all primary sequences that are likely to fold into it. • Threading • Align one polypeptide to one structure optimally (globally optimal sequence-structure alignment) • The core of many fold recognition systems
Motivation for Fold Recognition • Proteins have about 1000 structural families (est.) • Over 16,000 primary sequences in PDB alone. • Fold recognition produces a first-order approximation of structure.
Fold Recognition Algorithm Mapping 3D structures to 1D profiles • We have a set of 3D structures • How do we represent them as 1D strings (for threading)? • Map each residue to an environmental class
Fold Recognition Algorithm Mapping 3D structures to 1D profiles • Environmental classes • {B1, B2, B3, P1, P2, E} x {α, β, other} • Assign each residue an environmental class by • Area of side chain buried by protein atoms • Area of side chain exposed to polar atoms • Local secondary structure (in α-helix, β-sheet?)
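The class assignment on this slide can be sketched as a small lookup. This is a minimal illustration, not the original method's code: the function name, the secondary-structure labels ("alpha", "beta", "other"), the use of a polar fraction rather than an absolute area, and every numeric cut-off are assumptions chosen for the example, since the slide gives no thresholds.

```python
# Illustrative cut-offs only; the slides do not state any thresholds.
BURIED_LOW = 40.0    # square angstroms: below this the side chain counts as exposed (E)
BURIED_HIGH = 114.0  # square angstroms: at or above this it counts as buried (B1..B3)

SECONDARY_STATES = ("alpha", "beta", "other")


def environment_class(buried_area, polar_fraction, secondary):
    """Map one residue to a label such as 'B1-alpha'.

    buried_area    -- side-chain area buried by protein atoms
    polar_fraction -- fraction of side-chain area covered by polar atoms (0..1)
    secondary      -- one of SECONDARY_STATES
    """
    if secondary not in SECONDARY_STATES:
        raise ValueError(f"unknown secondary structure state: {secondary!r}")

    if buried_area < BURIED_LOW:
        burial = "E"
    elif buried_area < BURIED_HIGH:
        # Partially buried: split once by polarity (hypothetical cut-off).
        burial = "P1" if polar_fraction < 0.67 else "P2"
    else:
        # Buried: split into three polarity bands (hypothetical cut-offs).
        if polar_fraction < 0.45:
            burial = "B1"
        elif polar_fraction < 0.58:
            burial = "B2"
        else:
            burial = "B3"
    return f"{burial}-{secondary}"


def structure_to_string(residues):
    """Turn a list of (buried_area, polar_fraction, secondary) into a 1D class list."""
    return [environment_class(*r) for r in residues]
```

Applying `structure_to_string` to every residue of a known structure yields the 1D string that the probe sequence is later threaded against. Six burial/polarity classes times three secondary-structure states gives 18 possible labels.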
Fold Recognition Algorithm Build 3D Structure Profile Matrix
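The slide does not spell out how the matrix entries are computed. One common choice, assumed here, is a log-odds score: how much more often an amino acid is seen in a given environmental class than in the database overall. The sketch below builds such a table from (class, residue) observations and then expands a template's class string into a position-by-residue matrix. The function names and the pseudocount are illustrative, not taken from the slides.

```python
import math
from collections import Counter, defaultdict

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def log_odds_table(observations, pseudocount=1.0):
    """Build score[env][aa] = ln(P(aa | env) / P(aa)).

    observations -- iterable of (environment_class, amino_acid) pairs
                    collected from structures of known fold.
    The pseudocount (an assumption of this sketch) avoids log(0)
    for residues never observed in a class.
    """
    per_env = defaultdict(Counter)
    background = Counter()
    for env, aa in observations:
        per_env[env][aa] += 1
        background[aa] += 1

    total = sum(background.values())
    k = len(AMINO_ACIDS)
    table = {}
    for env, counts in per_env.items():
        n_env = sum(counts.values())
        row = {}
        for aa in AMINO_ACIDS:
            p_aa_given_env = (counts[aa] + pseudocount) / (n_env + pseudocount * k)
            p_aa = (background[aa] + pseudocount) / (total + pseudocount * k)
            row[aa] = math.log(p_aa_given_env / p_aa)
        table[env] = row
    return table


def profile_matrix(template_classes, table):
    """One row of 20 scores per template position, looked up by its class."""
    return [table[env] for env in template_classes]
```

A positive entry means the residue is favoured in that environment, a negative entry that it is disfavoured, so summing entries along an alignment gives a compatibility score.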
Fold Recognition Algorithm • Compatibility Search • Align unknown polypeptide sequence (probe sequence) to each 1D profile • Thread probe onto each profile • Scoring function is the profile matrix • Use dynamic programming
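The compatibility search above can be sketched as a standard global alignment (Needleman-Wunsch style) in which the match score comes from the profile row at each template position. This is a minimal sketch under stated assumptions: a single linear gap penalty, profiles stored as lists of per-position score dictionaries, and the hypothetical names `thread_score` and `best_fold`. Real threaders typically use position-dependent or affine gap costs.

```python
def thread_score(probe, profile, gap=-2.0):
    """Best global alignment score of a probe sequence against one 1D profile.

    probe   -- amino-acid string
    profile -- list of dicts, profile[j][aa] = score of residue aa at position j
    gap     -- linear gap penalty (an assumption of this sketch)
    """
    n, m = len(probe), len(profile)
    # dp[i][j]: best score aligning probe[:i] with profile[:j]
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            match = dp[i - 1][j - 1] + profile[j - 1].get(probe[i - 1], 0.0)
            skip_probe = dp[i - 1][j] + gap     # probe residue left unaligned
            skip_profile = dp[i][j - 1] + gap   # template position left unaligned
            dp[i][j] = max(match, skip_probe, skip_profile)
    return dp[n][m]


def best_fold(probe, templates, gap=-2.0):
    """Thread the probe onto every profile and return the best-scoring template name.

    templates -- dict mapping template name to its profile
    """
    return max(templates, key=lambda name: thread_score(probe, templates[name], gap))
```

The table has (n+1) x (m+1) cells, so threading one probe of length n against one profile of length m costs O(nm) time; the search repeats this for every template in the database.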
Alternative Scoring Functions • Learn as a neural network • Input: Arguments to scoring function • Output: score • Knowledge-Based Potentials (Sippl) • include inter-residue effects
PROSPECT • Variant on earlier fold recognition algorithms • Divides each template in database into core and loop regions • Optimizes using energy function:
PROSPECT • Divide and Conquer • Subdivide templates such that each core has its own region • To merge cores A and B • Find subsequences a and b in probe and do ungapped alignment to A and B • Align loop between A and B to subsequence between a and b • Total alignment is the sum of the above scores
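The scoring rule on this slide (ungapped core scores plus a loop score, summed) can be illustrated with a deliberately simplified sketch. It is not PROSPECT's algorithm: it ignores pairwise contacts between cores, which is what makes a plain left-to-right dynamic program sufficient, and it replaces the loop alignment with a hypothetical length-difference penalty. All names and the penalty value are assumptions made for the example.

```python
def core_score(probe, start, core):
    """Ungapped score of one core (list of per-position score dicts) placed at `start`."""
    return sum(col.get(probe[start + k], 0.0) for k, col in enumerate(core))


def loop_score(probe_loop_len, template_loop_len, penalty=1.0):
    """Stand-in loop term: penalise the difference in loop length (an assumption)."""
    return -penalty * abs(probe_loop_len - template_loop_len)


def place_cores(probe, cores, loop_lens, penalty=1.0):
    """Place cores in order, without overlap, maximising core + loop scores.

    cores     -- list of core profiles, in template order
    loop_lens -- template loop length between consecutive cores (len(cores) - 1 items)
    Returns (best_score, list_of_core_start_positions).
    """
    if not cores:
        raise ValueError("need at least one core")
    n = len(probe)
    neg_inf = float("-inf")
    # best[c][s]: best total with cores 0..c placed and core c starting at s
    best = [[neg_inf] * n for _ in cores]
    back = [[None] * n for _ in cores]

    for c, core in enumerate(cores):
        for s in range(n - len(core) + 1):
            here = core_score(probe, s, core)
            if c == 0:
                best[c][s] = here
                continue
            prev_len = len(cores[c - 1])
            # Previous core must end at or before s.
            for p in range(s - prev_len + 1):
                if best[c - 1][p] == neg_inf:
                    continue
                gap_len = s - (p + prev_len)
                total = best[c - 1][p] + loop_score(gap_len, loop_lens[c - 1], penalty) + here
                if total > best[c][s]:
                    best[c][s] = total
                    back[c][s] = p

    last = len(cores) - 1
    end = max(range(n), key=lambda s: best[last][s]) if n else 0
    if n == 0 or best[last][end] == neg_inf:
        raise ValueError("probe too short to place all cores")

    starts = [end]
    for c in range(last, 0, -1):
        starts.append(back[c][starts[-1]])
    return best[last][end], starts[::-1]
```

For example, with the probe `"AAXXCC"`, one core that rewards `A` at two positions, one that rewards `C` at two positions, and a template loop of length 2, the cores land at positions 0 and 4. Once pairwise terms between cores are included, the placement of one core affects the score of others, and this simple chain recursion no longer suffices.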
PROSPECT • Optimal (!) • Reasonably fast • Can include domain knowledge in energy function • Does well with remote homologs
Alternative Threaders • Branch and Bound • Double Dynamic Programming • Monte Carlo • Heuristic Search
Results • More accurate and faster than ab initio • Can detect relationships that sequence-based homology search cannot • Structure is preserved more than sequence • Doolittle’s “twilight zone” • PROSPECT does well even in the “twilight zone”