
Center for In Silico Protein Science Korea Research Institute of Standards and Science In-Ho Lee

Center for In Silico Protein Science Korea Research Institute of Standards and Science In-Ho Lee 10th Protein Folding Winter School February 7 – 11, 2011. Comparative Protein Modelling by Satisfaction of Spatial Restraints A. Šali and T. L. Blundell J. Mol. Biol. 234, 779-815 (1993).


Presentation Transcript


  1. Center for In Silico Protein Science, Korea Research Institute of Standards and Science. In-Ho Lee. 10th Protein Folding Winter School, February 7 – 11, 2011. Comparative Protein Modelling by Satisfaction of Spatial Restraints. A. Šali and T. L. Blundell, J. Mol. Biol. 234, 779-815 (1993).

  2. Constraints and Restraints. A restraint implies the use of an energy function without absolutely fixing the desired quantity. A constraint implies absolutely fixed values. Constraint algorithms: RATTLE, SHAKE, LINCS.
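
The distinction between a restraint and a constraint can be sketched in a few lines of Python. This is a minimal illustration, not code from the talk: the function names, the force constant `k`, and the symmetric equal-mass correction are assumptions, and the projection step only mimics the idea behind SHAKE-type algorithms for a single pair of points.

```python
import math

def harmonic_restraint_energy(d, d0, k):
    """Restraint: a deviation from d0 costs energy but is not forbidden."""
    return 0.5 * k * (d - d0) ** 2

def apply_distance_constraint(p1, p2, d0):
    """Constraint: move both points symmetrically so that |p1 - p2| == d0.

    Assumes equal masses and a non-zero starting distance (illustrative only).
    """
    delta = [b - a for a, b in zip(p1, p2)]
    d = math.sqrt(sum(c * c for c in delta))
    shift = 0.5 * (d - d0) / d
    new_p1 = [a + shift * c for a, c in zip(p1, delta)]
    new_p2 = [b - shift * c for b, c in zip(p2, delta)]
    return new_p1, new_p2
```

With a restraint, a distance of 2.0 against a target of 1.5 simply yields a positive energy; with the constraint, the pair is moved until the distance equals the target.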

  3. Conceptual scheme. Restraints: Cα-Cα distances, main-chain N-O distances, main-chain dihedral angles, side-chain dihedral angles. The model violates the input restraints as little as possible!

  4. A description of features... (p. 785)

  5. Proline: because of the ring connected to the β-carbon, the φ and ψ angles about the peptide bond have fewer allowable degrees of rotation. As a result, proline is often found in "turns" of proteins: its conformational entropy is lower than that of other amino acids, so the change in entropy (ΔS) between the unfolded and folded forms is smaller.

  6. Outline
     • Cα-Cα distances
     • Main-chain N-O distances
     • Main-chain dihedral angles
     • Side-chain dihedral angles
     • A smoothing procedure is used in the derivation of these relationships to minimize the problem of a sparse database.
     • pdf → 'entropy' of pdf
     • A combination of pdfs = molecular pdf
     • Optimization of the molecular pdf

  7. Stereochemical restraints. Stereochemistry, a subdiscipline of chemistry, involves the study of the relative spatial arrangement of atoms within molecules. An important branch of stereochemistry is the study of chiral molecules. The restraints cover bond lengths, bond angles, planarity of peptide groups and side-chain rings, chirality of Cα atoms and side-chains, van der Waals contact distances, and the bond lengths, bond angles and dihedral angles of cystine disulphide bridges.

  8. Mathematical model. No equivalent residue type in structures.


  10. Variable target function method (p. 804). The method starts with a few restraints that involve only the atoms from residues at most Δr residues apart (initially 2 residues apart in the sequence) and gradually incorporates all restraints: short-range restraints → intermediate-range restraints → long-range restraints.
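
The gradual inclusion of restraints described above can be sketched as a schedule over the sequence-separation cutoff. This is an illustrative outline only: the tuple layout `(i, j, target)`, the doubling factor, and the function names are assumptions, and the optimization that would run at each stage is deliberately left out.

```python
def active_restraints(restraints, delta_r):
    """Keep restraints whose two residues are at most delta_r apart in sequence."""
    return [r for r in restraints if abs(r[0] - r[1]) <= delta_r]

def variable_target_schedule(restraints, n_residues, start=2, factor=2):
    """Yield (delta_r, active subset) for a growing sequence-separation cutoff.

    The start value of 2 follows the slide; the doubling factor is an assumption.
    """
    delta_r = start
    while True:
        yield delta_r, active_restraints(restraints, delta_r)
        if delta_r >= n_residues - 1:
            break
        delta_r = min(delta_r * factor, n_residues - 1)

# Hypothetical restraints: (residue i, residue j, target distance in Å)
example = [(0, 1, 3.8), (0, 2, 5.5), (0, 5, 9.0), (2, 9, 12.0)]
stages = list(variable_target_schedule(example, n_residues=10))
```

In a full implementation, each stage would optimize the model against its active subset, starting from the result of the previous stage.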


  12. Peptide bond NCC-NCC

  13. Main-chain dihedral angles

  14. Side-chain dihedral angles. Lysine with the carbon atoms in the side chain labeled.

  15. Four structural classes of protein
     • All alpha (structure is essentially formed by α-helices)
     • All beta (structure is essentially formed by β-sheets)
     • Alpha / beta (with α-helices and β-strands that are largely interspersed)
     • Alpha + beta (with α-helices and β-strands that are largely segregated)


  17. What is the most probable structure for a certain sequence given its alignment with related structures? Spatial restraints are obtained from the statistical analysis of the relationships between various features of protein structure. A restraint is defined by a conditional probability density function (pdf), p(x/a,b,...,c), for the feature x that is restrained, with p(x) ≥ 0 and ∫p(x) dx = 1.

  18. An outline of the derivation of pdfs. p(x/a,b,...,c) gives a probability density for x when a,b,...,c are specified. For example, p(χ1 / residue type, Φ, Ψ) could be used to predict χ1. It is not possible to obtain the true p, only its approximations:
     • W : observed relative frequencies for x given a,b,...,c
     • f : analytic function fitted to W

  19. • W' : obtained directly by counting the number of occurrences of each combination of (x,a,b,...,c) values in the sample • q : obtained by a least-squares fit, which minimizes the r.m.s. deviation

  20. Local database. It contains members of 17 families of related proteins, with one homologous structure per file. Multiple sequence alignments for each of the families are obtained by using COMPARER and added to the local database. A number of features of the structures were calculated and stored in the database.

  21. Program MDT
     • It was written to explore the local database and to derive the best pdfs.
     • Inputs: names of the selected features; a list of discrete values for tabulating these features; the list of alignments.
     • The frequency table W'(x,a,b,...,c) is then calculated by counting the occurrences of all the required combinations of x,a,b,...,c in the local database.
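
The counting step that produces a frequency table W'(x,a,b,...,c) can be illustrated with a short Python sketch. This is not the MDT program or its interface; the function names and the toy observations (χ1 class per residue type) are assumptions made purely for illustration.

```python
from collections import Counter, defaultdict

def frequency_table(observations):
    """Count occurrences of each (x, condition) pair.

    Each observation is (x, condition), where condition is a tuple (a, b, ..., c).
    """
    return Counter(observations)

def conditional_frequencies(table):
    """Normalize counts into relative frequencies W(x / condition)."""
    totals = defaultdict(int)
    for (x, cond), n in table.items():
        totals[cond] += n
    return {(x, cond): n / totals[cond] for (x, cond), n in table.items()}

# Hypothetical sample: chi1 class observed for a given residue type
sample = [("g-", ("SER",)), ("g-", ("SER",)), ("t", ("SER",)), ("g+", ("CYS",))]
counts = frequency_table(sample)
w = conditional_frequencies(counts)
```

Sparsely populated conditions, such as the single CYS observation here, are exactly the cases that the smoothing procedure in the later slides is meant to handle.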

  22. Composition of the local database (p. 784) • Structural classes: α, β, α+β, α/β • Sequence identity: 6% to 98% • The scales of frequencies... • A representative sample of globular proteins, suitable for uncovering the general relationships between features.

  23. Tabulating associations between protein features. Features are associated with a single element or with relationships between two or more elements.
     • A distribution of residue types → the sample consists of all amino acid residues in the local database.
     • A distribution of protein-protein comparison scores → all homologous protein pairs.
     • Cα-Cα distances → all intra-molecular residue-residue pairs in the local database.
     • Correlation of Cα-Cα distances in one protein with Cα-Cα distances in another → all pairs of equivalent Cα-Cα distances in all homologous pairs.

  24. MDT automatically constructs the correct type of sample from the nature of the features to be tabulated (p. 784).

  25. A description of protein features used in MDT (1/5)
     • Amino acid residue type (r): 20 standard types; Asx is treated as Asn and Glx as Gln. All residues other than these 22 are ignored.
     • Main-chain dihedral angles (Φ and Ψ)
     • Secondary structure class of a residue (t): +Φ, helical, extended, coil

  26. A description of features... (2/5) • Main-chain conformation class of a residue (M) • Side-chain dihedral angles (χ1, χ2, χ3, χ4) • Classes of χi angles (ci)

  27. A description of features... (3/5)
     • Residue solvent accessibility (a): the contact area of a residue / the standard contact area of the residue in the extended tripeptide Gly-X-Gly
     • Difference between two equivalent residue neighborhoods in two proteins (s): find all neighbors of a residue in protein A → find the equivalent residues in B from the alignment → the sum of residue-residue dissimilarity scores for these pairs + gap penalty (usually 2) / the number of considered residues

  28. A description of features... (4/5)
     • Average residue neighbourhood difference between two proteins
     • Fractional sequence identity between two proteins (i): the number of identical pairs / the length of the shorter protein sequence
     • Difference in two Cα-Cα (main-chain N-O) distances in two proteins (Δd, Δh)
     • Distance of a residue from a gap (g): for each residue in the pair alignment, the number of positions between the residue and the closest gap. Structure varies more closer to a gap.
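
Two of the alignment-derived features above are simple enough to compute directly. The sketch below is a minimal illustration, assuming a pairwise alignment given as two equal-length strings with `-` as the gap character; the function names are hypothetical and the treatment of edge cases (for example alignments without gaps) is an assumption.

```python
def fractional_identity(aln_a, aln_b, gap="-"):
    """Number of identical aligned pairs / length of the shorter sequence."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned strings must have equal length")
    identical = sum(1 for a, b in zip(aln_a, aln_b) if a == b and a != gap)
    shorter = min(len(aln_a.replace(gap, "")), len(aln_b.replace(gap, "")))
    return identical / shorter

def distance_from_gap(aln_a, aln_b, gap="-"):
    """For each alignment column, the number of positions to the closest gap.

    Returns None for every column when the alignment has no gaps.
    """
    gap_cols = [k for k, (a, b) in enumerate(zip(aln_a, aln_b))
                if a == gap or b == gap]
    if not gap_cols:
        return [None] * len(aln_a)
    return [min(abs(k - g) for g in gap_cols) for k in range(len(aln_a))]

identity = fractional_identity("ACDE-GH", "ACNEFGH")
gap_dist = distance_from_gap("ACDE-GH", "ACNEFGH")
```

Here five of the aligned pairs are identical and the shorter sequence has six residues, so the fractional identity is 5/6.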

  29. A description of features... (5/5) (p. 785)

  30. Illustration of smoothing effects • A distribution of χ1 angles of 11 Cys residues from 8 proteins • After smoothing with sigma=5 • A distribution of those from 80 proteins • Right-hand side: serine residues

  31. Calculation of a probability distribution from a sparse data set
     • A : a priori distribution → make the measured event slightly more likely, iteratively.
     • Cases: N is large, N is small
     • Robust and unbiased
     • If sigma = the average number of data points per bin, then w1 = w2 = 0.5
     • N : number of points in W'
     • nx : number of bins, i = 1...nx
     • sigma : a parameter

  32. Smoothing by example • For p(χi / rj), where χi takes 3 values and rj takes 20 values: should A(χi / rj) be a uniform distribution? • The distribution of χi angles irrespective of the residue type is a better choice. • Now p(χi) from W'? → smoothing again • A(χi) = 1/nx. Note that ∫A(x)d(bins) = 1.

  33. Rigorous definition for smoothing • n=1: A1(xi) = 1/nx, a uniform distribution • p1(xi) = w1A1(xi) + w2W1(xi) • Smoothing W' spanned by one dependent and N-1 independent variables • Recursive: pN from pN-1, pN-2
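
The n=1 case above, p1 = w1·A1 + w2·W, can be sketched in Python. The weight formula used here, w1 = 1 / (1 + N / (sigma · nx)) with w2 = 1 − w1, is an assumption: it is chosen because it reproduces the statement on the earlier slide that w1 = w2 = 0.5 when sigma equals the average number of data points per bin. The function name and default values are illustrative.

```python
def smooth(counts, prior=None, sigma=5.0):
    """Blend an a priori distribution A with observed relative frequencies W.

    counts : raw counts per bin (the table W' before normalization)
    prior  : a priori distribution A; uniform 1/nx when omitted
    sigma  : smoothing parameter (assumed weight formula, see lead-in)
    """
    nx = len(counts)
    n_total = sum(counts)
    if prior is None:
        prior = [1.0 / nx] * nx
    if n_total == 0:
        return list(prior)  # no data at all: fall back on the prior
    w1 = 1.0 / (1.0 + n_total / (sigma * nx))
    w2 = 1.0 - w1
    observed = [c / n_total for c in counts]
    return [w1 * a + w2 * w for a, w in zip(prior, observed)]

# 9 points in 3 bins with sigma = 3 points per bin gives w1 = w2 = 0.5
p = smooth([6, 3, 0], sigma=3.0)
```

With few data points the result stays close to the prior, and with many it approaches the observed frequencies, which matches the two cases (N small, N large) listed in the slides.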

  34. Smoothing • c runs over N-1Cn-2 combinations of yj1,yk2,yln-1 values. Now we need A2.

  35. No information about x: a uniform distribution. A particular pn-1 that provides little (much) information on x receives a small (big) weight. A convenient measure of the amount of information is the entropy of pn-1.

  36. Strength of associations among features
     • The significance of an association: it is significant if it is based on a large amount of data.
     • It can still be weak if the values of the independent features do not provide strong restraints on the dependent feature.
     • Significance: measured by the χ² test
     • Strength: measured by the entropy of the conditional pdf
     • A prediction is precise if the differences between predictions are small, yet it may still be inaccurate.

  37. The most useful pdf for modeling is the one that predicts the unknown feature most accurately. Provided that the pdfs are derived from a representative dataset, the most precise pdf is the most accurate pdf. Therefore, the most accurate pdf is the pdf with the sharpest shape. A quantitative measure of sharpness is the entropy of the pdf.

  38. To find the best known features (a,b,...,c) for the prediction of the unknown x: • We search for the features that minimize the entropy S. • The uncertainty coefficient of x: • 0 : x is not associated with (a,b,...,c) • 1 : (a,b,...,c) completely determines x
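
The entropy and the uncertainty coefficient can be sketched for discretized pdfs as follows. The slide's own formulas were not recovered, so this uses the standard Shannon entropy S = −Σ p ln p and the standard uncertainty coefficient U = (S(x) − S(x / a,...,c)) / S(x); treating these as the definitions intended in the talk is an assumption, as are the function names.

```python
import math

def entropy(p):
    """Shannon entropy of a discrete distribution (natural logarithm)."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def uncertainty_coefficient(joint):
    """U = (S(x) - S(x / condition)) / S(x).

    joint[k][j] is the count (or probability) of condition k together with bin j of x.
    Assumes x is not constant, so that S(x) > 0.
    """
    total = sum(sum(row) for row in joint)
    nx = len(joint[0])
    p_x = [sum(row[j] for row in joint) / total for j in range(nx)]
    s_x = entropy(p_x)
    s_cond = 0.0
    for row in joint:
        row_sum = sum(row)
        if row_sum > 0:
            s_cond += (row_sum / total) * entropy([v / row_sum for v in row])
    return (s_x - s_cond) / s_x

u_independent = uncertainty_coefficient([[1, 1], [1, 1]])
u_determined = uncertainty_coefficient([[2, 0], [0, 2]])
```

The two toy tables reproduce the limiting cases on the slide: 0 when x is not associated with the condition and 1 when the condition completely determines x.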

  39. (e) Stereochemical restraints include bond lengths, bond angles, planarity of peptide groups and side-chain rings, chirality of Cα atoms and side-chains, vdW contact distances, and the bond lengths, angles and dihedral angles of Cys disulphide bridges. Means and standard deviations for lengths and angles are taken from GROMOS86 IFP37C4, which is based on X-ray and spectroscopic studies and theoretical calculations. vdW radii are taken from GROMOS.

  40. vdW: the only stereochemical feature not described by the harmonic model. • σω = 0.05 Å, d0 = sum of the vdW radii • c : constant chosen so that ∫p = 1 • dmax = linear dimension • Repulsion for d < d0 • The pdf follows from the energy E and statistical mechanics.
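
One functional form consistent with the bullet points above is a lower-bound pdf: a half-Gaussian of width 0.05 Å below d0 (repulsion) and a constant up to dmax, with c fixed by normalization. The slide's own equation was not recovered, so the exact form below is an assumption, and the function name and example values are hypothetical.

```python
import math

def vdw_pdf(d, d0, d_max, sigma=0.05):
    """Assumed lower-bound pdf for a non-bonded distance d (in Å).

    p(d) = c * exp(-(d - d0)^2 / (2 sigma^2))  for 0 <= d < d0   (repulsion)
    p(d) = c                                   for d0 <= d <= d_max
    c is chosen so that the integral of p over [0, d_max] equals 1.
    """
    # Area under the half-Gaussian between 0 and d0
    gauss_area = sigma * math.sqrt(math.pi / 2.0) * math.erf(d0 / (sigma * math.sqrt(2.0)))
    c = 1.0 / (gauss_area + (d_max - d0))
    if d < 0.0 or d > d_max:
        return 0.0
    if d < d0:
        return c * math.exp(-0.5 * ((d - d0) / sigma) ** 2)
    return c

# Numerical check of the normalization for illustrative values d0 = 3.4 Å, d_max = 10 Å
h = 0.001
area = sum(vdw_pdf(k * h, 3.4, 10.0) for k in range(10000)) * h
```

Under this form, all distances beyond the sum of the vdW radii are equally likely, while overlaps are penalized steeply.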

  41. Disulphide bonds • Disulphide-bonded pairs have to be specified. • The geometry is then restrained by Gaussian pdfs for distances and angles, with means and standard deviations from GROMOS IFP37C4. • For the Cβ-S-S-Cβ dihedral angle, the pdf derived from the database is used. • It is bimodal, with peaks at -87.1° and 93.9° and a standard deviation of 10°.
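
The bimodal dihedral pdf can be sketched as a mixture of two Gaussians centred on the peaks quoted above. The peak positions and the 10° standard deviation come from the slide; the equal mixture weights, the periodic wrapping of the angle difference, and the function names are assumptions.

```python
import math

def wrap_deg(angle):
    """Map an angle difference into the interval [-180, 180) degrees."""
    return (angle + 180.0) % 360.0 - 180.0

def gaussian(x, sigma):
    """Zero-mean Gaussian density."""
    return math.exp(-0.5 * (x / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def disulphide_dihedral_pdf(chi, means=(-87.1, 93.9), sigma=10.0, weights=(0.5, 0.5)):
    """Two-Gaussian pdf for the Cβ-S-S-Cβ dihedral angle (degrees).

    The equal weights are an assumption; the slide gives only peaks and sigma.
    """
    return sum(w * gaussian(wrap_deg(chi - m), sigma) for w, m in zip(weights, means))

p_peak = disulphide_dihedral_pdf(-87.1)
p_flat = disulphide_dihedral_pdf(0.0)
```

The density at either peak is far higher than at 0°, so a model with a planar disulphide dihedral would be strongly disfavoured by this restraint.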

  42. (f) Restraining a distance between two Cα atoms
     • Unknown feature: d-d', where d' comes from the known templates and d from the target.
     • MDT finds the distribution of d-d' as a function of 4 independent variables:
     • The corresponding Cα-Cα distance in the known structure (d')
     • The fractional sequence identity of the two aligned sequences (i)
     • The average solvent accessibility of the two residues in the known structure (a_bar')
     • The average distance from a gap (g_bar)


  44. Strength of associations (p. 791). Large decrease in S when p(d) → p(d-d'). Small but significant decreases for i, g, a, d'.

  45. The unknown d is closer to d' when the distance between the two residues spanning d' is short, the residues are buried, and they are distant from the gaps (p. 792).

  46. This suggests that p can be approximated by a Gaussian with a mean of 0 and a standard deviation that depends on the values of the independent variables.

  47. Resulting parameters • It provides an automatic and convenient way of including the observation mentioned above. • Small distance, buried, short distance from a gap, etc.

  48. (h) Restraining residue main-chain conformation (p. 793) • W'(fixed Φ, Ψ) : multi-Gaussian • If we know p(A), p(P), ..., p(E), then p(Φ) is a weighted sum of six Gaussians. sigma : Table 4

  49. A small sample of the total 7249 pdfs is shown. p(M/M',r,s) was selected.

  50. Features used in the derivation of the distributions for main-chain and side-chain conformations
