MSD PISA: a web service for studying Protein Interfaces, Surfaces and Assemblies. Eugene Krissinel. http://www.ebi.ac.uk/msd-srv/prot_int/pistart.html
What PISA is about
More than 80% of protein structures are solved by X-ray diffraction on crystals. An X-ray diffraction experiment produces the atomic coordinates of the crystal's Asymmetric Unit (ASU), which is what the PDB file contains. The Unit Cell consists of all space-symmetry-group mates of the ASU, and the crystal is the translated Unit Cell. In general, neither the ASU nor the Unit Cell has any relation to the Biological Unit, i.e. the stable protein complex that acts as a unit in physiological processes. Is there a way to infer the Biological Unit from protein crystallography data?
In (very) simple words … [Figure: a protein complex in vivo and, after crystallisation, in crystal; outcomes 1, 2 and 3 are each marked "?", with the labels "no image or bad image" and "good image but no associations"]
At first glance … the solution is as simple as 1-2:
• Evaluate all protein contacts (interfaces) in the crystal
• Keep only the strongest ("biologically relevant") ones
What remains has a chance of being a stable protein complex. Small technical problem: how do we discriminate between "real" (biologically relevant) and "superficial" (inter-assembly, or crystal-packing) interfaces?
Real and superficial protein interfaces: interface area
The most often used discrimination criterion is interface area. A cut-off at 900 Å² gives about 80% success rate in discriminating between monomers and dimers. But if this criterion were true, big proteins would always be sticky …
Real and superficial protein interfaces: free energy gain
Free energy gain of interface formation. A cut-off at -8 kcal/mol gives about 82% success rate in discriminating between monomers and dimers. But can an energy measure be uniform across all molecular weights and shapes?
Real and superficial protein interfaces: hydrophobic P-value
P-value of hydrophobic patches: a measure of the probability of finding a patch more hydrophobic than the observed interface. A cut-off at 0.2 gives about 60% success rate in discriminating between monomers and dimers.
Real and superficial protein interfaces: packing edge factor
Packing edge factor: a measure of how closely the mass packing edge matches the actual interface. A cut-off at 0.3 gives about 60% success rate in discriminating between monomers and dimers. [Figure: packing edge vs. interface]
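Each of the four criteria above is a single-parameter threshold rule scored by its success rate on labelled monomers and dimers. The sketch below shows that scoring for two of them, using the 900 Å² and -8 kcal/mol cut-offs from the slides. The data are made-up toy values, and the function name `success_rate` and the "dimer if at or past the cut-off" direction are illustrative assumptions, not PISA code.

```python
from typing import Callable, Sequence


def success_rate(values: Sequence[float], is_dimer: Sequence[bool],
                 predict_dimer: Callable[[float], bool]) -> float:
    """Fraction of entries whose predicted state (dimer or monomer) matches the reference."""
    if len(values) != len(is_dimer) or not values:
        raise ValueError("need equal-length, non-empty inputs")
    hits = sum(predict_dimer(v) == ref for v, ref in zip(values, is_dimer))
    return hits / len(values)


# Made-up toy data for five interfaces (not taken from any benchmark).
areas = [450.0, 1200.0, 950.0, 700.0, 1600.0]   # interface area, Å²
dg_int = [-3.0, -12.5, -9.0, -5.5, -15.0]       # free energy gain, kcal/mol
labels = [False, True, False, False, True]      # reference state: True = dimer

area_rate = success_rate(areas, labels, lambda a: a >= 900.0)     # area cut-off
energy_rate = success_rate(dg_int, labels, lambda g: g <= -8.0)   # energy cut-off
print(area_rate, energy_rate)   # 0.8 0.8
```

Swapping in another criterion only means changing the predicate. This is also why a single cut-off struggles: one number has to separate every size and shape of protein.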
Real and superficial protein interfaces: no single criterion
• No single ultimate parameter for identifying biologically relevant protein interfaces can be proposed at present, even for dimeric complexes (Jones, S. & Thornton, J.M. (1996) Principles of protein-protein interactions. Proc. Natl. Acad. Sci. USA 93, 13-20).
• Formation of N>2-meric complexes is most probably a cooperative process involving a set of interfaces. Therefore the significance of an interface should not be detached from the context of the protein complex.
Making assemblies from significant interfaces
Despite the failure to find an ultimate measure of an interface's biological relevance, two approaches have been developed that use scoring of individual interfaces:
• PQS server @ MSD-EBI (Kim Henrick), Trends Biochem. Sci. (1998) 23, 358. Method: progressive build-up by adding monomeric chains that satisfy the selection criteria. The results are partly curated.
• PITA software @ Thornton group, EBI (Hannes Ponstingl), J. Appl. Cryst. (2003) 36, 1116. Method: recursive splitting of the largest complexes allowed by crystal symmetry. The termination criterion is derived from the individual statistical scores of crystal contacts. The results are not curated.
Chemical stability of protein complexes
• What really matters is not the properties of individual interfaces but the chemical stability of the protein complex as a whole
• Protein chains will most likely associate into the largest complexes that are still stable
• A protein complex is stable if its free energy of dissociation is positive: $\Delta G_{\mathrm{diss}} = -\Delta G_{\mathrm{int}} - T\Delta S > 0$
How do we calculate ΔG_diss?
Protein affinity ΔG_int is a function of the protein interfaces. It is built from the solvation energy of the protein complex, the solvation energies of the dissociated subunits, and the free energies of H-bond and salt-bridge formation, weighted by the numbers of H-bonds and salt bridges between the dissociated subunits. Choice of dissociation subunits: the complex is taken to dissociate into stable subunits with minimum ΔG_diss.
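To make the stability test concrete, here is a minimal sketch that evaluates ΔG_diss for alternative dissociation patterns and judges the complex by the easiest one (minimum ΔG_diss). It assumes a sign convention in which each broken H-bond and salt bridge lowers ΔG_int by the fitted magnitudes quoted on the benchmark slide (0.51 and 0.21 kcal/mol). That convention, the class `DissociationPattern` and all input numbers are hypothetical, and TΔS is simply passed in rather than computed.

```python
from dataclasses import dataclass

# Fitted magnitudes from the benchmark slide; the sign convention below is an assumption.
E_HB = 0.51  # kcal/mol per H-bond
E_SB = 0.21  # kcal/mol per salt bridge


@dataclass
class DissociationPattern:
    """One hypothetical way of splitting a complex into subunits."""
    g_solv_complex: float         # solvation energy of the complex, kcal/mol
    g_solv_subunits: list[float]  # solvation energies of the dissociated subunits
    n_hbonds: int                 # H-bonds between the subunits
    n_salt_bridges: int           # salt bridges between the subunits
    t_delta_s: float              # T*ΔS of dissociation, kcal/mol (supplied externally)


def delta_g_int(p: DissociationPattern) -> float:
    # Assumed form: solvation difference, lowered further by each H-bond and salt bridge.
    return (p.g_solv_complex - sum(p.g_solv_subunits)
            - p.n_hbonds * E_HB - p.n_salt_bridges * E_SB)


def delta_g_diss(p: DissociationPattern) -> float:
    # ΔG_diss = -ΔG_int - TΔS
    return -delta_g_int(p) - p.t_delta_s


def is_stable(patterns: list[DissociationPattern]) -> bool:
    # The complex is judged by its easiest dissociation, i.e. the minimum ΔG_diss.
    return min(delta_g_diss(p) for p in patterns) > 0


a = DissociationPattern(-20.0, [-2.0, -3.0], 4, 2, 10.0)
b = DissociationPattern(-20.0, [-12.0, -5.0], 1, 0, 10.0)
print(delta_g_diss(a), delta_g_diss(b), is_stable([a, b]))
```

Pattern `a` alone would look stable (about 7.46 kcal/mol), but pattern `b` dissociates more easily (about -6.49 kcal/mol), so the complex as a whole is reported unstable.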
Solvation free energy
Computed from each atom's accessible surface area, the same atom's accessible surface area in the reference (unfolded) state, and atomic solvation parameters (Eisenberg, D. & McLachlan, A.D. (1986) Nature 319, 199-203). [Figure: protein in solvent]
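The slide names the ingredients but not the exact expression. A minimal sketch, assuming the common atomic-solvation-parameter form G_s = Σ_i σ_i (A_i − A_i^ref), looks like this. The function name and the two σ values are illustrative only, not PISA's parameter set. The solvation part of ΔG_int would then be G_s(complex) minus the sum of G_s over the dissociated subunits.

```python
from typing import Sequence


def solvation_energy(asa: Sequence[float], asa_ref: Sequence[float],
                     sigma: Sequence[float]) -> float:
    """Assumed ASP form: G_s = sum_i sigma_i * (A_i - A_i_ref).

    asa     -- accessible surface area of each atom in the structure, Å²
    asa_ref -- the same atom's accessible area in the reference (unfolded) state, Å²
    sigma   -- atomic solvation parameter of each atom, kcal/(mol·Å²)
    """
    if not (len(asa) == len(asa_ref) == len(sigma)):
        raise ValueError("per-atom inputs must have equal length")
    return sum(s * (a - a0) for a, a0, s in zip(asa, asa_ref, sigma))


# Two made-up atoms (a carbon and an oxygen) with illustrative parameters.
g = solvation_energy(asa=[10.0, 5.0], asa_ref=[40.0, 20.0], sigma=[0.016, -0.006])
print(g)   # about -0.39 kcal/mol
```

Burying the carbon's surface contributes a negative (favourable) term, while burying the polar oxygen costs a little. This is the qualitative behaviour the solvation model is meant to capture.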
Entropy of macromolecules in solution
Translational entropy depends on the mass, rotational entropy on the tensor of inertia and the symmetry number, and side-chain entropy on the solvent-accessible surface area (Murray, C.W. & Verdonk, M.L. (2002) J. Comput.-Aided Mol. Design 16, 741-753). c_t, c_r and F are semiempirical parameters.
Entropy of dissociation
Expressed through the mass of the i-th subunit, the k-th principal moment of inertia of the i-th subunit, and two fitted parameters. ΔS is a function of the protein complex.
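The slide's full expression, with c_t, c_r, F and the fitted parameters, is not recoverable here. This sketch therefore keeps only the textbook rigid-body, ideal-gas dependence: translational entropy grows as (3/2)R ln M, and rotational entropy as (1/2)R ln(I₁I₂I₃) − R ln σ. TΔS_diss is the gain from the subunits minus the loss of the complex. The per-extra-body constant `const_per_extra_body` is a hypothetical stand-in for all unit-dependent and semiempirical constants. It is not the author's fitted term.

```python
from __future__ import annotations

import math
from typing import NamedTuple

R_KCAL = 1.987e-3  # gas constant, kcal/(mol*K)


class RigidBody(NamedTuple):
    mass: float                          # any consistent mass unit
    moments: tuple[float, float, float]  # principal moments of inertia, consistent units
    symmetry_number: int = 1


def log_terms(body: RigidBody) -> float:
    """Mass- and inertia-dependent part of (S_trans + S_rot) / R for one rigid body."""
    i1, i2, i3 = body.moments
    return (1.5 * math.log(body.mass)
            + 0.5 * math.log(i1 * i2 * i3)
            - math.log(body.symmetry_number))


def t_delta_s_diss(subunits: list[RigidBody], complex_body: RigidBody,
                   temperature: float = 298.15,
                   const_per_extra_body: float = 0.0) -> float:
    """T*ΔS of dissociation, kcal/mol: entropy gained by the subunits minus that of the complex."""
    gained = sum(log_terms(s) for s in subunits)
    lost = log_terms(complex_body)
    n_extra = len(subunits) - 1  # constants do not cancel: n bodies appear where there was one
    return temperature * R_KCAL * (gained - lost) + n_extra * const_per_extra_body
```

Changing the mass or inertia units shifts the result by a constant times (n − 1). That is exactly why a per-extra-body constant has to be fitted rather than derived.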
How do we identify an assembly in a crystal?
We now know (or think we know) how to evaluate the chemical stability of protein complexes. Given a 3D arrangement of protein chains, we can say whether it has a chance of being a stable assembly, or biological unit. But how do we get the potential assemblies in the first place?
Method of Desert Lion
How to catch a Desert Lion? Catch all lions and keep the one living in the desert.
Enumerating assemblies in crystal
• The crystal is represented as a periodic graph with monomeric chains as vertices and interfaces as edges
• Each set of assemblies is identified by its engaged interface types
• All assemblies may be enumerated by a backtracking scheme that engages all possible combinations of different interface types
Example: crystal with 3 interface types. [Figure: assembly sets 1-8, each labelled by its engaged interface types as a 3-bit code (000 to 111); the resulting sets include only monomers, dimers N1, N2 and N3, and the whole crystal]
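The enumeration idea can be sketched in a few lines. Backtracking switches each interface type on (1) or off (0), which gives the 2^n codes seen in the 3-type example. For a given code, the assemblies are the connected components of the chain graph restricted to engaged edges. As a simplifying assumption, the sketch works on a finite, hypothetical patch of chains rather than a true periodic crystal graph, so it cannot detect infinite assemblies. The function names are illustrative, not PISA's.

```python
from __future__ import annotations

from typing import Iterator


def engaged_sets(n_types: int, prefix: tuple[int, ...] = ()) -> Iterator[tuple[int, ...]]:
    """Backtracking over interface types: each type is either engaged (1) or not (0)."""
    if len(prefix) == n_types:
        yield prefix
        return
    for bit in (1, 0):
        yield from engaged_sets(n_types, prefix + (bit,))


def assemblies(n_chains: int, edges: list[tuple[int, int, int]],
               engaged: tuple[int, ...]) -> list[list[int]]:
    """Split chains into assemblies: connected components over engaged interfaces.

    edges -- (chain_a, chain_b, interface_type) triples of a finite crystal patch
    """
    parent = list(range(n_chains))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b, t in edges:
        if engaged[t]:
            parent[find(a)] = find(b)  # merge the assemblies of the two chains

    groups: dict[int, list[int]] = {}
    for chain in range(n_chains):
        groups.setdefault(find(chain), []).append(chain)
    return sorted(groups.values())


# Hypothetical patch: 4 chains connected by 3 interface types (0, 1, 2).
edges = [(0, 1, 0), (2, 3, 0), (1, 2, 1), (0, 3, 2)]
for code in engaged_sets(3):
    print("".join(map(str, code)), assemblies(4, edges, code))
```

Code 000 leaves only monomers, a single engaged type can produce dimers, and engaging everything merges the whole patch. This mirrors the monomer, dimer and "all crystal" outcomes in the example.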
Clever backtracking
The number of different interface types may reach a hundred. The algorithm cannot complete backtracking over 2^100 combinations unless it is clever enough to:
• check geometry and engage induced interfaces as soon as they emerge
• check geometry and terminate backtracking if the assembly contains two identical chains in parallel orientations; otherwise the assembly would be infinite due to translation symmetry in the crystal
• see the future and terminate backtracking if there are no stable assemblies down the current branch of the recursion tree, based on the observation that the entropy of dissociation of unstable assemblies only increases down the recursion tree
Only then does the algorithm complete, in 0.1 s to 1.5 hours depending on the structure. [Figure: engaged interfaces and an induced interface]
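The "see the future" cut works because the quantity being tested only worsens down a branch. The sketch below shows that pruning principle in isolation: a backtracking enumerator takes a monotone predicate (once false for a prefix, false for every extension) and skips the whole subtree as soon as the predicate fails. The toy rule "at most two engaged types" is a hypothetical stand-in for the stability bound. It does not model the geometric checks.

```python
from __future__ import annotations

from typing import Callable, Iterator, Optional


def pruned_backtrack(n_types: int,
                     viable: Callable[[tuple[int, ...]], bool],
                     prefix: tuple[int, ...] = (),
                     stats: Optional[dict] = None) -> Iterator[tuple[int, ...]]:
    """Enumerate engaged/not-engaged choices, skipping branches that cannot succeed.

    `viable` must be monotone: if it is False for a prefix, it must be False for
    every extension of that prefix, so the whole subtree can be skipped safely.
    """
    if stats is not None:
        stats["visited"] = stats.get("visited", 0) + 1
    if not viable(prefix):
        return  # prune: nothing below this node can qualify
    if len(prefix) == n_types:
        yield prefix  # a complete combination of interface types
        return
    for bit in (1, 0):  # 1 = engage the next interface type, 0 = leave it out
        yield from pruned_backtrack(n_types, viable, prefix + (bit,), stats)


# Toy monotone rule (hypothetical): at most two engaged interface types.
stats: dict = {}
found = list(pruned_backtrack(4, lambda p: sum(p) <= 2, stats=stats))
print(len(found), stats["visited"])   # 11 29 (the unpruned tree has 31 nodes)
```

With 4 types the saving is small, but each cut removes an entire subtree. With dozens of interface types, this is what turns 2^n work into something that can finish.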
Detection of Biological Units in Crystals: method summary
• Build the periodic graph of the crystal
• Enumerate all possibly stable assemblies
• Evaluate the assemblies for chemical stability
• Keep only sets of stable assemblies in the list and rank them by their chances of being a biological unit:
  - larger assemblies take preference
  - single-assembly solutions take preference
  - otherwise, assemblies with higher ΔG_diss take preference
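The final ranking step can be sketched as a filter followed by a sort key. The key applies the three preferences in the order listed. Three choices here are assumptions: the strict lexicographic priority, the use of the largest assembly's chain count for "larger", and the use of the weakest (minimum) ΔG_diss within a set. The `Assembly` record and the example sets are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Assembly:
    n_chains: int   # size of the assembly
    dg_diss: float  # ΔG_diss, kcal/mol


def rank_key(assembly_set: list[Assembly]) -> tuple[int, bool, float]:
    largest = max(a.n_chains for a in assembly_set)
    single = len(assembly_set) == 1
    weakest = min(a.dg_diss for a in assembly_set)
    # Assumed lexicographic priority: larger first, single-assembly first, higher ΔG_diss first.
    return (-largest, not single, -weakest)


def rank(candidate_sets: list[list[Assembly]]) -> list[list[Assembly]]:
    stable = [s for s in candidate_sets if all(a.dg_diss > 0 for a in s)]
    return sorted(stable, key=rank_key)


dimer = [Assembly(2, 5.0)]
tetramer = [Assembly(4, 1.0)]
two_dimers = [Assembly(2, 3.0), Assembly(2, 8.0)]
unstable_tetramer = [Assembly(4, -2.0)]
strong_dimer = [Assembly(2, 9.0)]
ranked = rank([dimer, tetramer, two_dimers, unstable_tetramer, strong_dimer])
```

Here the unstable tetramer is dropped, and the stable tetramer wins on size even though its ΔG_diss is small. Among the dimers, single-assembly solutions come first, ordered by ΔG_diss.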
Are we anywhere close?
Assembly classification on the benchmark set of 218 structures (198 homomers and 20 heteromers) published in Ponstingl, H., Kabir, T. & Thornton, J. (2003) Automatic inference of protein quaternary structures from crystals. J. Appl. Cryst. 36, 1116-1122. [Figure: classification results on the benchmark set; classification error in ΔG_diss: ±5 kcal/mol]
Fitted parameters:
• Free energy of an H-bond: 0.51 kcal/mol
• Free energy of a salt bridge: 0.21 kcal/mol
• Constant entropy term: 11.7 kcal/mol
• Surface entropy factor: 0.57·10⁻³ kcal/(mol·Å²)
A better method?
Percentage of successful classifications, measured on the same benchmark set of 218 PDB entries:
• PQS server: 78% (not optimised on the benchmark set, but manually curated)
• PITA software: 84% (optimised with 18 parameters; system possibly overfit)
• Present study: 90% (optimised with 4 parameters; system underfit)
What is beyond the benchmark set?
Classification results for 366 recent depositions into the PDB (321 homomers and 45 heteromers), compared against manual classification at MSD-EBI. [Figure: classification results; classification error in ΔG_diss: ±5 kcal/mol]
Is it ever going to be 100%?
Nobody should be that naive, because:
• theoretical models for protein affinity and for the entropy change upon protein complexation are primitive
• coordinate (experimental) data are of limited accuracy
• there is no feasible way to take conformations in the crystal into account
• experimental data on multimeric states are very limited and not always reliable, so calibrating the parameters is difficult
• protein assemblies may exist in some environments and dissociate in others, so a definite answer is simply not there
Web server PISA
A new MSD-EBI tool for working with Protein Interfaces, Surfaces and Assemblies: http://www.ebi.ac.uk/msd-srv/prot_int/pistart.html
Conclusions
• Stable protein complexes, which are likely to be biological units, can be calculated from protein crystallography data with an 80-90% success rate
• The biological relevance of a particular protein interface cannot be reliably inferred from the interface properties alone. Instead, the significance of an interface should be judged from an analysis of the relevant protein assemblies
Acknowledgement. This work has been supported by research grant No. 721/B19544 from the Biotechnology and Biological Sciences Research Council (BBSRC), UK.