Comparative Protein Modelling by Satisfaction of Spatial Restraints
A. Šali and T. L. Blundell, J. Mol. Biol. 234, 779-815 (1993)
In-Ho Lee, Center for In Silico Protein Science, Korea Research Institute of Standards and Science
10th Protein Folding Winter School, February 7-11, 2011
Constraints and Restraints
• A restraint uses an energy function to bias a quantity toward a desired value without fixing it absolutely.
• A constraint fixes the quantity at an exact value.
• Constraint algorithms: SHAKE, RATTLE, LINCS
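The difference between the two ideas can be shown with a toy pair of points. The sketch below is only an illustration: the force constant, the coordinates and the function names are hypothetical, and the constraint step is a single-pair projection, not the SHAKE, RATTLE or LINCS algorithm itself.

```python
import math

def restraint_energy(d, d0, k):
    """Harmonic restraint: a penalty that grows as d moves away from d0."""
    return 0.5 * k * (d - d0) ** 2

def apply_distance_constraint(p1, p2, d0):
    """Move two points symmetrically along their connecting line so that
    their separation is exactly d0 (single-pair projection, not full SHAKE)."""
    delta = [b - a for a, b in zip(p1, p2)]
    d = math.sqrt(sum(c * c for c in delta))
    if d == 0.0:
        raise ValueError("coincident points: direction undefined")
    shift = 0.5 * (d - d0) / d
    new_p1 = [a + shift * c for a, c in zip(p1, delta)]
    new_p2 = [b - shift * c for b, c in zip(p2, delta)]
    return new_p1, new_p2

# Hypothetical values: two points 4.0 apart, desired distance 3.8
p1, p2 = [0.0, 0.0, 0.0], [4.0, 0.0, 0.0]
energy = restraint_energy(4.0, 3.8, k=10.0)          # penalty only, geometry unchanged
q1, q2 = apply_distance_constraint(p1, p2, d0=3.8)   # distance forced to exactly 3.8
```

The restraint leaves the geometry free and merely scores it, whereas the constraint changes the coordinates so that the target value holds exactly.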
Conceptual scheme
• Cα-Cα distances
• Main-chain N-O distances
• Main-chain dihedral angles
• Side-chain dihedral angles
The model violates the input restraints as little as possible.
A description of features... (p. 785)
Proline: because its side chain forms a ring that closes back onto the backbone nitrogen, the backbone dihedral angle φ has far fewer allowed values than in other residues. As a result proline is often found in turns: its conformational entropy in the unfolded state is smaller than that of other amino acids, so the change in entropy (ΔS) between the unfolded and folded forms is smaller.
Outline
• Restrained features: Cα-Cα distances, main-chain N-O distances, main-chain dihedral angles, side-chain dihedral angles
• A smoothing procedure is used in deriving these relationships to reduce the problem of a sparse database.
• pdf
• ‘Entropy’ of a pdf
• A combination of pdfs = molecular pdf
• Optimization of the molecular pdf
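To make "a combination of pdfs = molecular pdf" concrete, here is a minimal sketch that assumes the feature pdfs are combined as a product and that optimization minimizes the negative logarithm of that product. The two Gaussian restraints, their parameters and the grid search are hypothetical choices for illustration only.

```python
import math

def gaussian_pdf(x, mean, sd):
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

def objective(x, restraints):
    """-ln of the product of the feature pdfs evaluated at x."""
    return -sum(math.log(gaussian_pdf(x, m, s)) for m, s in restraints)

# Two hypothetical Gaussian restraints (mean, sd in Å) on the same distance
restraints = [(5.0, 0.5), (6.0, 1.0)]

# Crude optimization: scan a grid from 4.0 Å to 7.0 Å and keep the best value
grid = [4.0 + 0.001 * k for k in range(3001)]
best = min(grid, key=lambda x: objective(x, restraints))
```

For Gaussian restraints the optimum is the precision-weighted mean of the restraint centres, so the sharper restraint (sd 0.5 Å) pulls the result closer to its own mean.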
Stereochemical restraints
• Stereochemistry, a subdiscipline of chemistry, is the study of the relative spatial arrangement of atoms within molecules. An important branch of stereochemistry is the study of chiral molecules.
• Restrained features: bond lengths, bond angles, planarity of peptide groups and side-chain rings, chirality of Cα atoms and side-chains, van der Waals contact distances, and the bond lengths, bond angles and dihedral angles of cystine disulphide bridges.
Mathematical model (figure label: No equivalent residue type in structures)
Variable target function method (p. 804)
• The method starts with a few restraints that involve only atoms from residues at most Δr residues apart in the sequence (e.g. 2 residues apart) and gradually incorporates all restraints.
• Short-range restraints → intermediate-range restraints → long-range restraints
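The staging idea can be sketched as a filter on sequence separation. The restraint list, the cutoff schedule and the function name below are hypothetical, and the optimization step itself is left as a comment.

```python
def active_restraints(restraints, delta_r):
    """Keep restraints whose two residues are at most delta_r apart in sequence."""
    return [r for r in restraints if abs(r[0] - r[1]) <= delta_r]

# Hypothetical restraints: (residue i, residue j, target distance in Å)
restraints = [(1, 2, 3.8), (1, 3, 5.5), (2, 10, 12.0), (1, 40, 9.0)]

# Hypothetical schedule of increasing sequence-separation cutoffs
schedule = [2, 10, 40]

stage_sizes = []
for delta_r in schedule:
    subset = active_restraints(restraints, delta_r)
    # A real implementation would optimize the model against `subset` here,
    # starting from the coordinates produced by the previous stage.
    stage_sizes.append(len(subset))
```

Each stage sees a superset of the previous stage's restraints, so local geometry is settled before long-range restraints are switched on.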
Peptide bond NCC-NCC
Main-chain dihedral angles
Side-chain dihedral angles (figure: lysine with the carbon atoms in the side chain labeled)
Four structural classes of protein
• All alpha (structure is essentially formed by α-helices)
• All beta (structure is essentially formed by β-sheets)
• Alpha / beta (with α-helices and β-strands that are largely interspersed)
• Alpha + beta (with α-helices and β-strands that are largely segregated)
What is the most probable structure for a certain sequence given its alignment with related structures?
• Spatial restraints are derived from a statistical analysis of the relationships between various features of protein structure.
• A restraint is defined by a conditional probability density function (pdf), p(x/a,b,...,c), for the restrained feature x.
• p(x) ≥ 0 and ∫p(x) dx = 1
An outline of the derivation of pdfs
• p(x/a,b,...,c) gives a probability density for x when a,b,...,c are specified. For example, p(χ1 / residue type, Φ, Ψ) could be used to predict χ1.
• It is not possible to obtain the true p, only approximations to it:
• W : observed relative frequencies for x given a,b,...,c
• f : analytic function fitted to W
• W' : obtained directly by counting the number of occurrences of each combination of (x,a,b,...,c) values in the sample
• q : obtained by a least-squares fit, which minimizes the r.m.s. deviation
Local database
• Members of 17 families of related proteins
• One homologous structure is prepared per file.
• Multiple sequence alignments for each family are obtained with COMPARER and added to the local database.
• A number of structural features are calculated and stored in the database.
Program MDT
• It was written to explore the local database and to derive the best pdfs.
• Inputs:
• Names of the selected features
• A list of discrete values for tabulating these features
• The list of alignments
• The frequency table W'(x,a,b,...,c) is then calculated by counting the occurrences of all the required combinations of x,a,b,...,c in the local database.
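The counting step can be illustrated in a few lines. This is not the MDT program or its interface: the sample, the class labels and the function name are hypothetical, and only one independent feature is used.

```python
from collections import Counter, defaultdict

# Hypothetical sample of (chi1 class, residue type) observations
sample = [("g+", "SER"), ("g-", "SER"), ("g+", "SER"), ("t", "CYS"), ("g-", "CYS")]

# Frequency table W'(x, a): raw counts of each (x, a) combination
counts = Counter(sample)

def conditional_frequencies(counts):
    """Turn raw counts W'(x, a) into relative frequencies W(x / a)."""
    totals = defaultdict(int)
    for (x, a), n in counts.items():
        totals[a] += n
    return {(x, a): n / totals[a] for (x, a), n in counts.items()}

W = conditional_frequencies(counts)
```

Dividing each count by the total for its value of the independent feature gives one normalized distribution of x per value of a.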
Composition of local database (p. 784)
• Structural classes: α, β, α+β, α/β
• Sequence identity: 6% to 98%
• The scales of frequencies...
• A representative sample of globular proteins, suitable for uncovering the general relationships between features
Tabulating associations between protein features
• Features are associated with a single element or with relationships between two or more elements; the sample depends on the feature:
• Distribution of residue types → all amino acid residues in the local database
• Distribution of protein-protein comparison scores → all homologous protein pairs
• Cα-Cα distances → all intramolecular residue-residue pairs in the local database
• Correlation of Cα-Cα distances in one protein with Cα-Cα distances in another → all pairs of equivalent Cα-Cα distances in all homologous pairs
MDT automatically constructs the correct type of sample from the nature of the features to be tabulated (p. 784).
A description of protein features used in MDT (1/5)
• Amino acid residue type (r): 20 types, plus Asx → Asn and Glx → Gln. All residues other than these 22 are ignored.
• Main-chain dihedral angles (Φ and Ψ)
• Secondary structure class of a residue (t): +Φ, helical, extended, coil
A description of features... (2/5)
• Main-chain conformation class of a residue (M)
• Side-chain dihedral angles (χ1, χ2, χ3, χ4)
• Classes of χi angles (ci)
A description of features... (3/5)
• Residue solvent accessibility (a): the contact area of a residue divided by the standard contact area of that residue in the extended tripeptide Gly-X-Gly
• Difference between two equivalent residue neighbourhoods in two proteins (s): find all neighbours of a residue in protein A, take their equivalent residues in protein B from the alignment, then divide the sum of the residue-residue dissimilarity scores for these pairs plus gap penalties (usually 2) by the number of residues considered.
A description of features... (4/5)
• Average residue neighbourhood difference between two proteins
• Fractional sequence identity between two proteins (i): the number of identical pairs divided by the length of the shorter protein sequence
• Difference between two equivalent Cα-Cα (main-chain N-O) distances in two proteins (Δd, Δh)
• Distance of a residue from a gap (g): for each residue in the pairwise alignment, the number of positions between the residue and the closest gap. Structure varies more closer to a gap.
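The fractional sequence identity defined above is simple to compute from a pairwise alignment. The sketch below assumes the two sequences are given as equal-length aligned strings with "-" marking gaps; the function name and the example sequences are hypothetical.

```python
def fractional_identity(aligned_a, aligned_b, gap="-"):
    """Identical aligned pairs divided by the length of the shorter sequence."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    identical = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != gap)
    len_a = sum(1 for x in aligned_a if x != gap)
    len_b = sum(1 for y in aligned_b if y != gap)
    return identical / min(len_a, len_b)

# Hypothetical alignment: 4 identical pairs, both sequences 5 residues long
identity = fractional_identity("ACD-EF", "ACDGE-")
```

Gap characters are excluded both from the identity count and from the sequence lengths, so insertions do not inflate the denominator.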
A description of features... (5/5) (p. 785)
Illustration of smoothing effects
• A distribution of χ1 angles of 11 Cys residues from 8 proteins
• The same distribution after smoothing with σ = 5
• The distribution of χ1 angles of Cys residues from 80 proteins
• Right-hand side: serine residues
Calculation of a probability distribution from a sparse data set
• A : a priori distribution, adjusted iteratively to make the measured event slightly more likely
• Limiting cases: N is large, N is small
• Robust and unbiased
• If σ equals the average number of data points per bin, w1 = w2 = 0.5
• N : number of points in W'
• nx : number of bins, i = 1...nx
• σ : a parameter
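A minimal sketch of this kind of smoothing follows. It assumes the smoothed pdf is the weighted sum p = w1·A + w2·W with w1 = 1/(1 + N/(σ·nx)) and w2 = 1 - w1; this weight formula is an assumption chosen because it reproduces the property stated above (w1 = w2 = 0.5 when σ equals the average number of points per bin), and the function name and data are hypothetical.

```python
def smooth(counts, prior, sigma):
    """Blend observed relative frequencies with an a priori distribution.

    counts : raw counts per bin (W')
    prior  : a priori distribution A, one value per bin, summing to 1
    sigma  : smoothing parameter (data points per bin at which w1 == w2)
    """
    n = sum(counts)
    nx = len(counts)
    if n == 0:
        return list(prior)              # no data: fall back on the prior
    w1 = 1.0 / (1.0 + n / (sigma * nx))  # assumed form of the prior weight
    w2 = 1.0 - w1
    return [w1 * a + w2 * (c / n) for a, c in zip(prior, counts)]

# Hypothetical sparse data: 3 observations, all in the first of 3 bins
uniform_prior = [1.0 / 3.0] * 3
p = smooth([3, 0, 0], uniform_prior, sigma=1.0)
```

With few data points the result stays close to the prior, and with many it approaches the observed frequencies, which matches the two limiting cases listed above.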
Smoothing by example
• p(χi / rj) : χi takes 3 values, rj takes 20 values
• A(χi / rj) : a uniform distribution? The distribution of χi angles irrespective of the residue type, p(χi), is a better choice.
• p(χi) itself comes from W', so it is obtained by smoothing again, with the uniform prior A(χi) = 1/nx. Note that ∫A(x) d(bins) = 1.
Rigorous definition for smoothing
• Smoothing W' spanned by one dependent and N-1 independent variables
• n = 1 : A1(xi) = 1/nx, a uniform distribution
• p1(xi) = w1 A1(xi) + w2 W1(xi)
• pN is obtained recursively from pN-1, pN-2
Smoothing
• c runs over the (N-1 choose n-2) combinations of the yj1, yk2, yln-1 values
• Now we need A2.
• No information about x: a uniform distribution
• A particular pn-1 that provides little (much) information on x gets a small (big) weight.
• A convenient measure of the amount of information is the entropy of pn-1.
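As an illustration of entropy as a measure of information, the sketch below uses the standard Shannon form S = -Σ p ln p for a discrete pdf. This form is an assumption here; the normalization used in the paper may differ. The example distributions are hypothetical.

```python
import math

def entropy(probs):
    """Shannon entropy -sum(p ln p); bins with p == 0 contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

uniform = [0.25] * 4                 # carries no information about x
peaked = [0.97, 0.01, 0.01, 0.01]    # carries a lot of information about x

s_uniform = entropy(uniform)         # the maximum for 4 bins, ln 4
s_peaked = entropy(peaked)           # much smaller
```

A uniform distribution has the largest possible entropy, so a low entropy signals a sharp, informative pdf.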
Strength of associations among features
• Significance of an association: an association is significant if it is based on a large amount of data.
• A significant association can still be weak if the values of the independent features do not provide strong restraints on the dependent feature.
• Significance is measured by the χ2 test.
• Strength is measured by the entropy of the conditional pdf.
• A prediction is precise if the differences between predictions are small, yet it may still be inaccurate.
• The most useful pdf for modelling is the one that predicts the unknown feature most accurately.
• Provided that the pdfs are derived from a representative dataset, the most precise pdf is also the most accurate.
• Therefore, the most accurate pdf is the one with the sharpest shape.
• A quantitative measure of sharpness is the entropy of the pdf.
To find the best known features (a,b,...,c) for predicting the unknown x:
• We search for the features that minimize the entropy S.
• The uncertainty coefficient of x ranges from 0 to 1:
• 0 : x is not associated with (a,b,...,c)
• 1 : (a,b,...,c) completely determines x
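The two limiting values can be checked numerically. The sketch assumes the usual definition U = (S(x) - S(x/a)) / S(x), where S(x/a) is the entropy of x averaged over the values of the independent feature; the exact expression in the paper is not shown here, and the function names and data are hypothetical.

```python
import math
from collections import Counter

def entropy(probs):
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def uncertainty_coefficient(pairs):
    """pairs: list of (x, a) observations; returns (S(x) - S(x/a)) / S(x)."""
    n = len(pairs)
    count_x = Counter(x for x, _ in pairs)
    count_a = Counter(a for _, a in pairs)
    count_xa = Counter(pairs)
    s_x = entropy(c / n for c in count_x.values())
    # Conditional entropy: -sum over (x, a) of p(x, a) * ln p(x / a)
    s_x_given_a = -sum((c / n) * math.log(c / count_a[a])
                       for (x, a), c in count_xa.items())
    if s_x == 0.0:
        return 0.0  # x never varies, so there is nothing to explain
    return (s_x - s_x_given_a) / s_x

# Hypothetical samples
determined = [("g+", "A"), ("g+", "A"), ("t", "B"), ("t", "B")]    # a fixes x
unrelated = [("g+", "A"), ("t", "A"), ("g+", "B"), ("t", "B")]     # a says nothing

u_determined = uncertainty_coefficient(determined)
u_unrelated = uncertainty_coefficient(unrelated)
```

Minimizing the conditional entropy is the same as maximizing this coefficient, since S(x) does not depend on the choice of independent features.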
(e) Stereochemical restraints
• Include bond lengths, bond angles, planarity of peptide groups and side-chain rings, chirality of Cα atoms and side-chains, van der Waals (vdW) contact distances, and the bond lengths, bond angles and dihedral angles of Cys disulphide bridges.
• Means and standard deviations for lengths and angles are taken from GROMOS86 IFP37C4, which is based on X-ray and spectroscopic studies and theoretical calculations.
• vdW radii are taken from GROMOS.
vdW contacts: the only stereochemical feature not described by the harmonic model
• Repulsion acts only for d < d0.
• From the energy E and statistical mechanics, we get the pdf, with:
• σω = 0.05 Å, d0 = sum of the vdW radii
• c : normalization constant ensuring ∫p = 1
• dmax = linear dimension
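One way to picture such a lower-bound restraint is a pdf that is flat for d ≥ d0 and falls off as half a Gaussian below d0. The functional form, the integration range [0, dmax] and the example numbers in this sketch are assumptions made for illustration; only σ = 0.05 Å and the meaning of d0, c and dmax come from the slide.

```python
import math

def vdw_pdf(d, d0, d_max, sigma=0.05):
    """Assumed lower-bound pdf: half-Gaussian for d < d0, constant c above."""
    # Area under the half-Gaussian on [0, d0] for unit height
    half_gauss = sigma * math.sqrt(math.pi / 2.0) * math.erf(d0 / (sigma * math.sqrt(2.0)))
    c = 1.0 / (half_gauss + (d_max - d0))   # makes the integral over [0, d_max] equal 1
    if d < 0.0 or d > d_max:
        return 0.0
    if d < d0:
        return c * math.exp(-0.5 * ((d - d0) / sigma) ** 2)
    return c

# Hypothetical contact: d0 = 3.4 Å, d_max = 50 Å
p_clash = vdw_pdf(3.2, 3.4, 50.0)     # strongly penalized overlap
p_contact = vdw_pdf(3.4, 3.4, 50.0)   # at the contact distance
p_far = vdw_pdf(10.0, 3.4, 50.0)      # same density as at contact
```

Because the density is constant above d0, the restraint expresses no preference among non-overlapping distances and only discourages clashes.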
Disulphide bonds
• Disulphide-bonded pairs have to be specified.
• The geometry is then restrained by Gaussian pdfs for distances and angles, with means and standard deviations from GROMOS IFP37C4.
• For the Cβ-S-S-Cβ dihedral angle, the pdf derived from the database is used.
• It is bimodal, with peaks at -87.1° and 93.9° and a standard deviation of 10°.
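The bimodal dihedral pdf can be sketched as a sum of two Gaussians centred on the quoted peaks. The equal weights are an assumption (the relative peak heights are not given above), the periodicity of the angle is ignored because the peaks are narrow, and the function names are hypothetical.

```python
import math

def gaussian(x, mean, sd):
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

def disulphide_dihedral_pdf(chi_deg, weights=(0.5, 0.5)):
    """Two-Gaussian model of the Cβ-S-S-Cβ dihedral pdf (weights assumed equal)."""
    peaks = (-87.1, 93.9)   # peak positions in degrees, from the slide
    sd = 10.0               # standard deviation in degrees, from the slide
    return sum(w * gaussian(chi_deg, m, sd) for w, m in zip(weights, peaks))

def restraint_term(chi_deg):
    """Contribution to a -ln(pdf) objective for a given dihedral angle."""
    return -math.log(disulphide_dihedral_pdf(chi_deg))

at_peak = restraint_term(-87.1)   # low penalty at a preferred angle
off_peak = restraint_term(0.0)    # high penalty between the peaks
```

Taking the negative logarithm gives a double-well term, so either of the two preferred conformations satisfies the restraint.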
(f) Restraining a distance between two Cα atoms
• Unknown feature: d - d'
• d' : distance in the known template structure; d : distance in the target
• MDT finds the distribution of d - d' as a function of 4 independent variables:
• The corresponding Cα-Cα distance in the known structure (d')
• The fractional sequence identity of the two aligned sequences (i)
• The average solvent accessibility of the two residues in the known structure (a_bar')
• The average distance from a gap (g_bar)
Strength of associations (p. 791)
• Large decrease in S on going from p(d) to p(d-d')
• Small but significant decreases for i, g, a, d'
The unknown d is closer to d' when the distance between the two residues spanning d' is short, the residues are buried, and they are distant from gaps (p. 792).
This suggests that p be approximated by a Gaussian with a mean of 0 and a standard deviation that depends on the values of the independent variables.
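Written out, the Gaussian approximation takes the form below. The functional form of σ is not given here, so it is left as a general function of the four independent variables from section (f); the vertical bar is the same conditional that the slides write with "/".

```latex
p(d - d' \mid d', i, \bar{a}', \bar{g}) \approx
  \frac{1}{\sigma\sqrt{2\pi}}
  \exp\!\left[-\frac{(d - d')^{2}}{2\sigma^{2}}\right],
\qquad \sigma = \sigma(d', i, \bar{a}', \bar{g})

-\ln p = \frac{(d - d')^{2}}{2\sigma^{2}} + \ln\!\left(\sigma\sqrt{2\pi}\right)
```

The negative logarithm is a harmonic term centred on the template distance d', so a small σ (a reliable template distance) gives a stiff restraint and a large σ gives a loose one.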
Resulting parameters
• They provide an automatic and convenient way of including the observations mentioned above.
• Short Cα-Cα distance, buried residues, large distance from a gap, etc.
(h) Restraining residue main-chain conformation (p. 793)
• W'(fixed Φ, Ψ) : multi-Gaussian
• If p(A), p(P), ..., p(E) are known, p(Φ) is a weighted sum of six Gaussians; for σ see Table 4.
A small sample of the total 7249 pdfs is shown. p(M/M',r,s) was selected.
Features used in the derivation of the distributions for main-chain and side-chain conformations