Building 3D protein models: ‘fold recognition / threading’


Why make a structural model for your protein? • The structure can provide clues to function through structural similarity with other proteins • With a structure, it is easier to guess the location of active sites • With a structure, we can plan more precise experiments in the lab • We can apply docking algorithms to the structures (both with other proteins and with small molecules)

Protein Modeling Methods • Ab initio methods: solution of the protein folding problem, search in conformational space • Energy-based methods: energy minimization, molecular simulation • Knowledge-based methods: homology modeling, fold recognition / threading

Why do we need Ab Initio Methods? • New folds, and sequences with very little sequence homology (<15%) to proteins of known structure [Figure: data taken from PDB, http://www.rcsb.org/pdb/holdings.html]


Predicting Protein Structure: Threading / Fold Recognition Basis • It is estimated that there are only around 1,000 to 10,000 stable folds in nature • Select the best sequence-fold alignment using a fitness scoring function • Fold recognition is essentially finding the best fit of a sequence to a set of candidate folds

The Threading Problem • Find the best way to “mount” the residue sequence of one protein on a known structure taken from another protein

Why is it called threading? • A specific sequence is threaded through all known folds • For each fold, the probability that the sequence can adopt that fold is estimated

Threading: Basic Strategy [Figure: a query sequence, a library of folds (templates), spatial interactions, scoring & selection]

Protein Threading [Figure: conserved core segments (labelled I, J, K, L) shared by Protein A and Protein B]

[Figure: two structurally similar proteins; spatial adjacencies (interactions); a possible threading with a sequence]

Input/Output of Protein Threading [Figure: threading takes core segments C[1..m], an amino acid sequence a[1..n] and a pairwise amino acid scoring function g(…)]
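The inputs above can be made concrete with a brute-force sketch of the threading problem. Assumptions not taken from the slides: g is treated as a pairwise contact energy to be minimized, the contacts between core positions are given as an explicit list, core segments keep their order and may not overlap, and the loops between them may have any length. All names (enumerate_threadings, best_threading, g_hydrophobic) are hypothetical.

```python
from typing import Callable, Iterator, List, Optional, Tuple

# A contact links (segment index, offset within segment) to another such pair.
Contact = Tuple[Tuple[int, int], Tuple[int, int]]


def enumerate_threadings(n: int, seg_lengths: List[int]) -> Iterator[Tuple[int, ...]]:
    """Yield every ordered, non-overlapping choice of 0-based segment start positions."""
    def place(k: int, earliest: int, starts: List[int]) -> Iterator[Tuple[int, ...]]:
        if k == len(seg_lengths):
            yield tuple(starts)
            return
        # Leave room on the sequence for this segment and all remaining ones.
        latest = n - sum(seg_lengths[k:])
        for t in range(earliest, latest + 1):
            yield from place(k + 1, t + seg_lengths[k], starts + [t])
    yield from place(0, 0, [])


def threading_energy(seq: str, starts: Tuple[int, ...], contacts: List[Contact],
                     g: Callable[[str, str], float]) -> float:
    """Sum g over the residue pairs that land on contacting core positions."""
    total = 0.0
    for (k1, o1), (k2, o2) in contacts:
        total += g(seq[starts[k1] + o1], seq[starts[k2] + o2])
    return total


def best_threading(seq: str, seg_lengths: List[int], contacts: List[Contact],
                   g: Callable[[str, str], float]) -> Optional[Tuple[Tuple[int, ...], float]]:
    """Exhaustive search: return (start positions, energy) of the lowest-energy threading."""
    best = None
    for starts in enumerate_threadings(len(seq), seg_lengths):
        energy = threading_energy(seq, starts, contacts, g)
        if best is None or energy < best[1]:
            best = (starts, energy)
    return best


# Hypothetical pairwise term: -1 for a hydrophobic-hydrophobic contact, else 0.
HYDROPHOBIC = set("AVILMFWC")


def g_hydrophobic(x: str, y: str) -> float:
    return -1.0 if x in HYDROPHOBIC and y in HYDROPHOBIC else 0.0


# Two 2-residue core segments whose positions face each other pairwise.
print(best_threading("SILSSSVFS", [2, 2], [((0, 0), (1, 0)), ((0, 1), (1, 1))], g_hydrophobic))
```

Exhaustive enumeration only works for toy inputs: the number of placements grows combinatorially with the number of segments, and with pairwise interactions finding the optimal threading is known to be NP-hard in general, which is why practical threaders rely on heuristics.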

Fold recognition (Threading) [Figure: the sequence M A A G Y A V L S + known protein folds → structural model]

[Figure: input sequence and a library of folds of known proteins, with structure positions labelled H-bond donor, H-bond acceptor, glycine, hydrophobic]
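The environment labels on this slide (H-bond donor, H-bond acceptor, glycine, hydrophobic) suggest a simple way to score a sequence against each fold in the library. The sketch below is illustrative only: the one-letter environment codes, the residue sets and the score values are hypothetical toy choices, and the fold is slid along the sequence without gaps.

```python
from typing import Dict, List, Tuple

# Hypothetical environment classes with toy scores (illustration only):
# env code -> (favoured residues, score if favoured, score otherwise)
ENV_SCORES: Dict[str, Tuple[str, int, int]] = {
    "h": ("AVILMFWC", 2, -1),   # hydrophobic (buried) position
    "g": ("G", 3, -1),          # position that needs a glycine
    "d": ("STNQHKRWY", 1, 0),   # side-chain H-bond donor position
    "a": ("DENQSTHY", 1, 0),    # side-chain H-bond acceptor position
}


def env_score(aa: str, env: str) -> int:
    """Score one residue placed at a position with the given environment."""
    favoured, match, mismatch = ENV_SCORES[env]
    return match if aa in favoured else mismatch


def best_ungapped_score(seq: str, fold_env: str) -> float:
    """Slide the fold's environment string along the sequence (no gaps) and keep the best sum."""
    if len(fold_env) > len(seq):
        return float("-inf")
    return max(
        sum(env_score(seq[off + i], env) for i, env in enumerate(fold_env))
        for off in range(len(seq) - len(fold_env) + 1)
    )


def rank_folds(seq: str, library: Dict[str, str]) -> List[Tuple[str, float]]:
    """Score the sequence against every fold in the library, best first."""
    scored = [(name, best_ungapped_score(seq, env)) for name, env in library.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical three-fold library; the query is the sequence from the slides.
print(rank_folds("MAAGYAVLS", {"fold1": "hhgd", "fold2": "dadd", "fold3": "ghhh"}))
```

Real fold-recognition methods use statistically derived environment tables and allow gaps in the alignment; this sketch keeps only the idea of ranking folds by a sequence-to-environment fitness score.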

[Figure: the sequence threaded onto folds whose positions are labelled H-bond donor, H-bond acceptor, glycine, hydrophobic; raw scores S = -2, 5, 20 and Z = 5, -1, 1.5]
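The slide reports both a raw score S and a value Z for each fold. Assuming Z denotes a z-score (a common convention in threading, though the slide does not say so), one standard way to obtain it is to compare the native score with the scores of shuffled versions of the sequence, which keep the amino acid composition but destroy the order. The pattern, scoring function and parameter defaults below are hypothetical.

```python
import random
import statistics
from typing import Callable


def threading_zscore(seq: str, score_fn: Callable[[str], float],
                     n_shuffles: int = 200, seed: int = 0) -> float:
    """Z-score of the native threading score against scores of shuffled sequences."""
    rng = random.Random(seed)
    native = score_fn(seq)
    residues = list(seq)
    background = []
    for _ in range(n_shuffles):
        rng.shuffle(residues)                 # same composition, random order
        background.append(score_fn("".join(residues)))
    mean = statistics.mean(background)
    spread = statistics.pstdev(background)
    if spread == 0:
        raise ValueError("shuffled scores have zero spread; z-score is undefined")
    return (native - mean) / spread


# Hypothetical fold: four buried (h) positions followed by four exposed (p) ones.
HYDROPHOBIC = set("AVILMFWC")
PATTERN = "hhhhpppp"


def pattern_score(s: str) -> int:
    """Count positions where hydrophobicity matches the buried/exposed label."""
    return sum((aa in HYDROPHOBIC) == (env == "h") for aa, env in zip(s, PATTERN))


print(threading_zscore("LLLLKKKK", pattern_score))
```

A raw score depends on sequence length and composition, so it is hard to compare across folds; the z-score asks how unusual the native fit is for that particular fold, which is why a fold with a modest S can still stand out.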

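Scoring table (rows: position on sequence; columns: amino acid type, then Gop and Gext, the gap-opening and gap-extension penalties)
Pos  A    C    D    …  Y    Gop  Gext
1    10   -50  101  …  -80  100  10
2    -24  87   -99  …  167  100  10
:    :    :    :    :  :    :    :
N                           100  10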
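A table with one row per position, one column per amino acid type and per-row Gop/Gext values can be used as a position-specific profile, and a sequence can be aligned to it by dynamic programming with affine gaps (a Gotoh-style recurrence). This is a sketch under stated assumptions: higher scores are better, a gap of length L costs Gop + (L - 1)·Gext, and a single Gop/Gext pair is used because every row in the table shows 100 and 10. Only the A and C values of rows 1 and 2 are used in the example; the function name is hypothetical.

```python
from typing import Dict, List

NEG_INF = float("-inf")


def align_to_profile(seq: str, profile: List[Dict[str, float]], gop: float, gext: float) -> float:
    """Best global alignment score of seq against a position-specific profile (affine gaps)."""
    n, m = len(profile), len(seq)
    # M: profile position i aligned to residue j
    # X: profile position i left unmatched (gap in the sequence)
    # Y: residue j left unmatched (gap in the profile)
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    X = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Y = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0.0
    for i in range(1, n + 1):
        X[i][0] = -(gop + (i - 1) * gext)
    for j in range(1, m + 1):
        Y[0][j] = -(gop + (j - 1) * gext)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = profile[i - 1][seq[j - 1]]      # position-specific score for this residue
            M[i][j] = s + max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1])
            X[i][j] = max(M[i - 1][j] - gop, X[i - 1][j] - gext, Y[i - 1][j] - gop)
            Y[i][j] = max(M[i][j - 1] - gop, Y[i][j - 1] - gext, X[i][j - 1] - gop)
    return max(M[n][m], X[n][m], Y[n][m])


# Rows 1 and 2 of the table, A and C columns only.
PROFILE = [{"A": 10, "C": -50}, {"A": -24, "C": 87}]
print(align_to_profile("AC", PROFILE, 100, 10))
```

With the table's large gap penalties the ungapped alignment wins; with small penalties (for example Gop = 10, Gext = 1) the sequence "CA" scores better by skipping row 1 so that C can sit on row 2, where C scores 87.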

Fold recognition / Threading Disadvantages: • Threading methods seldom reach the alignment quality needed for homology modeling. • Fewer than 30% of the predicted first hits are true remote homologues (PredictProtein).

Threading resources • TOPITS: heuristic threader, part of a larger structure prediction system • 3DPSSM: integrated system; does its own MSA and secondary structure prediction, then threading • GenThreader: similar to 3DPSSM

Side chain construction In homology modelling, side chains are constructed from the template structures when there is high similarity between the modelled protein and the templates. Without such similarity, the construction can be done using rotamer libraries. The score of a rotamer is a compromise between its probability and its fitness at the specific position; comparing the scores of all the rotamers for a given amino acid determines the preferred rotamer. Despite the huge size of the problem (each side chain influences its neighbours), there are quite successful algorithms for it.
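The compromise between rotamer probability and positional fitness can be sketched as a single score per rotamer. Assumptions: the score is -ln(probability) plus a weighted fitness energy (lower is better), and the rotamer labels, probabilities and energies are hypothetical numbers, not values from any real rotamer library.

```python
import math
from typing import List, Tuple

# (rotamer label, library probability, fitness energy at this position; lower = better fit)
Rotamer = Tuple[str, float, float]


def rotamer_score(prob: float, energy: float, weight: float = 1.0) -> float:
    """Lower is better: -ln(probability) penalises rare rotamers, energy penalises poor fit."""
    return -math.log(prob) + weight * energy


def pick_rotamer(candidates: List[Rotamer], weight: float = 1.0) -> str:
    """Return the label of the best-scoring rotamer for one side chain."""
    best = min(candidates, key=lambda r: rotamer_score(r[1], r[2], weight))
    return best[0]


# Hypothetical candidates for one residue position.
CANDIDATES = [("rot1", 0.60, 3.0), ("rot2", 0.30, 0.0), ("rot3", 0.10, 1.0)]
print(pick_rotamer(CANDIDATES))              # fitness term included
print(pick_rotamer(CANDIDATES, weight=0.0))  # probability alone
```

This treats each side chain independently. The coupling between neighbouring side chains, which is what makes the real problem so large, is deliberately left out; practical packing algorithms have to choose all rotamers jointly.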

Ab initio [Figure: the sequence M A A G Y A V L S → structural model]

Ab initio methods for modelling This field is of great theoretical interest but, so far, has had very little practical application. Here there is no use of sequence alignments and no direct use of known structures. The basic idea is to build an empirical function that simulates real physical forces and the potentials of chemical contacts. If we had a perfect function and could scan all possible conformations, we would be able to detect the correct fold.

Predicting Protein Structure: Ab Initio Methods [Figure: workflow diagram with stages Sequence, Secondary structure, Tertiary structure, Prediction, Low energy structures, Validation, Predicted structure; methods: mean field potentials, energy minimization]

Ab initio Methods • Simplified models: simplified alphabet (HP), simplified representation (lattice) • Build-up techniques • Deterministic methods: quantum mechanics, diffusion equations • Stochastic searches: Monte Carlo, genetic algorithms
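The simplified models listed here can be combined into the classic HP lattice model: each residue is H (hydrophobic) or P (polar), the chain is a self-avoiding walk on a 2D square lattice, and the energy is -1 for every pair of H residues that touch on the lattice without being chain neighbours. For very short chains, every conformation can be scanned exhaustively, which is the "perfect function plus full scan" idea in miniature. The function names are hypothetical.

```python
from typing import List, Tuple

Coord = Tuple[int, int]
MOVES = [(1, 0), (-1, 0), (0, 1), (0, -1)]


def hp_energy(seq: str, coords: List[Coord]) -> int:
    """-1 for every pair of H residues adjacent on the lattice but not adjacent in the chain."""
    index = {c: i for i, c in enumerate(coords)}
    energy = 0
    for i, (x, y) in enumerate(coords):
        if seq[i] != "H":
            continue
        for dx, dy in MOVES:
            j = index.get((x + dx, y + dy))
            if j is not None and j > i + 1 and seq[j] == "H":  # j > i + 1 counts each pair once
                energy -= 1
    return energy


def fold_hp(seq: str) -> Tuple[int, List[Coord]]:
    """Enumerate all self-avoiding walks on the square lattice; return the minimum-energy one."""
    if len(seq) < 2:
        return 0, [(0, 0)][: len(seq)]
    path: List[Coord] = [(0, 0), (1, 0)]  # fixing the first step removes rotated duplicates
    occupied = set(path)
    best: list = [1, []]                  # energies are <= 0, so 1 is a safe sentinel

    def grow() -> None:
        if len(path) == len(seq):
            energy = hp_energy(seq, path)
            if energy < best[0]:
                best[0], best[1] = energy, list(path)
            return
        x, y = path[-1]
        for dx, dy in MOVES:
            nxt = (x + dx, y + dy)
            if nxt in occupied:
                continue
            path.append(nxt)
            occupied.add(nxt)
            grow()
            path.pop()
            occupied.remove(nxt)

    grow()
    return best[0], best[1]


print(fold_hp("HPPH"))
```

The number of self-avoiding walks grows exponentially with chain length, so exhaustive search breaks down quickly even on a lattice; this is the motivation for the stochastic searches (Monte Carlo, genetic algorithms) listed on the slide.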

Rosetta approach • Rosetta (David Baker): consistently outstanding performer in the last two CASPs • Integrated method • I-Sites: much finer-grained substructures than secondary structures; a library, taken from the PDB, of all the structures in which each amino acid 9-mer is found • Heuristic global energy function to estimate the quality of folds • Monte Carlo search through assignments of I-Sites to minimize the energy function • Also HMMSTR, an HMM-driven method for assigning I-Sites

Rosetta prediction method • Define a global scoring function that estimates the probability of a structure given a sequence • Generate a version of I-Sites with fixed-length subsequences (9 amino acids) • Calculate P(I-Site | sequence) for all sequences and I-Sites • Generate structures by Monte Carlo sampling of assignments of fixed-size I-Sites to subsequences • End up with an ensemble of plausible structures
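The steps above (fixed-length fragments, Monte Carlo sampling of fragment assignments, an ensemble of results) can be illustrated with a toy analogue. This is not Rosetta's energy function or representation. Assumptions: the chain lives on a 2D lattice with an HP contact energy as a stand-in scoring function, fragments are short runs of lattice moves (3 moves here rather than 9 residues), one hypothetical fragment library with sampling weights (standing in for P(I-Site | sequence)) is shared by every window, and moves are accepted with the Metropolis criterion. All names are hypothetical.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple

STEP = {"R": (1, 0), "L": (-1, 0), "U": (0, 1), "D": (0, -1)}


def lattice_energy(seq: str, moves: Sequence[str]) -> Optional[int]:
    """Toy HP contact energy of a lattice chain; None if the chain overlaps itself."""
    coords = [(0, 0)]
    for m in moves:
        dx, dy = STEP[m]
        x, y = coords[-1]
        coords.append((x + dx, y + dy))
    if len(set(coords)) != len(coords):
        return None
    index = {c: i for i, c in enumerate(coords)}
    energy = 0
    for i, (x, y) in enumerate(coords):
        if seq[i] != "H":
            continue
        for dx, dy in STEP.values():
            j = index.get((x + dx, y + dy))
            if j is not None and j > i + 1 and seq[j] == "H":
                energy -= 1
    return energy


def fragment_monte_carlo(seq: str, fragments, k: int, n_steps: int,
                         temperature: float, rng: random.Random) -> Tuple[int, List[str]]:
    """One Monte Carlo trajectory of fragment insertions; returns (best energy, moves)."""
    moves = ["R"] * (len(seq) - 1)             # start from a fully extended chain
    energy = lattice_energy(seq, moves)
    best = (energy, list(moves))
    frags, weights = zip(*fragments)
    for _ in range(n_steps):
        start = rng.randrange(len(moves) - k + 1)      # window to overwrite
        frag = rng.choices(frags, weights=weights)[0]  # weighted fragment choice
        trial = moves[:start] + list(frag) + moves[start + k:]
        trial_energy = lattice_energy(seq, trial)
        if trial_energy is None:
            continue                           # self-overlapping chain: reject
        # Metropolis criterion: always accept downhill, sometimes accept uphill.
        if trial_energy <= energy or rng.random() < math.exp((energy - trial_energy) / temperature):
            moves, energy = trial, trial_energy
            if energy < best[0]:
                best = (energy, list(moves))
    return best


def sample_ensemble(seq: str, fragments, k: int, n_runs: int = 5,
                    n_steps: int = 500, temperature: float = 1.0):
    """Independent seeded runs give an ensemble of low-energy models."""
    return [fragment_monte_carlo(seq, fragments, k, n_steps, temperature, random.Random(seed))
            for seed in range(n_runs)]


# Hypothetical 3-move fragment library with sampling weights.
FRAGMENTS = [(("R", "U", "L"), 0.5), (("R", "R", "R"), 0.5)]
print(sample_ensemble("HPPH", FRAGMENTS, 3, n_runs=3, n_steps=50))
```

In the real method each window gets its own candidate fragments derived from the query sequence, and the energy is evaluated on a 3D backbone; the shared library and lattice here only keep the sketch short.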

Rosetta is way ahead • CASP 4 results. • CASP 5 similar, but not as dramatic.

Fully automated predictions • CAFASP-2 • Meta-servers work best • Integrate predictions from several other servers • Significantly better predictions than any individual approach • Several public metaservers available: • http://bioinfo.pl/Meta/ is best all-around
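How a meta-server might integrate predictions from several servers can be sketched with a simple consensus vote. Assumption: each server returns a ranked list of fold/template identifiers and each appearance contributes 1/rank to that fold's total. This reciprocal-rank scheme is chosen for illustration and is not the method of any particular meta-server; the server names and fold IDs are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def consensus_ranking(predictions: Dict[str, List[str]]) -> List[Tuple[str, float]]:
    """Combine ranked fold predictions from several servers by summing 1/rank votes."""
    votes: Dict[str, float] = defaultdict(float)
    for ranked_folds in predictions.values():
        for rank, fold in enumerate(ranked_folds, start=1):
            votes[fold] += 1.0 / rank
    # Highest total first; ties broken alphabetically so the output is deterministic.
    return sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))


print(consensus_ranking({
    "serverA": ["f1", "f2", "f3"],
    "serverB": ["f2", "f1"],
    "serverC": ["f2", "f3"],
}))
```

Here f2 wins although only two of the three servers ranked it first: agreement across independent methods is what lets a consensus outperform any single server.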
