Analyzing the Simplicial Decomposition of Spatial Protein Structures Rafael Ördög, Zoltán Szabadka, Vince Grolmusz
Aims of our research • Aims • An easy-to-use protein database containing relevant geometrical data on proteins (capable of treating thousands of PDB entries at once). • Drug discovery by data mining in the database.
Steps of our research • Steps • Cleaning and restructuring the PDB (RS-PDB) • Done by Zoltán Szabadka • Creating a database of geometrical and chemo-geometrical data • Under construction in our present research • Discovering rules and creating learning systems for ligand pre-docking • Mostly later work
Delaunay Decompositions • To compute the Delaunay decomposition of a point set, we used the Qhull program; its source is available at http://www.qhull.org/.
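A minimal sketch of computing a Delaunay decomposition in Python. It assumes SciPy is available; `scipy.spatial.Delaunay` calls Qhull internally, but this is an illustration and not necessarily how the authors invoked Qhull. The random coordinates are hypothetical stand-ins for heavy-atom positions.

```python
import numpy as np
from scipy.spatial import Delaunay

# Hypothetical coordinates standing in for heavy-atom positions (in Å).
rng = np.random.default_rng(0)
points = rng.random((30, 3)) * 10.0

# SciPy delegates the computation to Qhull.
tri = Delaunay(points)

# Each row of tri.simplices lists the indices of the 4 corner points
# of one tetrahedral region.
print(tri.simplices.shape)
```

Each row of `tri.simplices` can then be mapped back to the atoms that form the corners of the region.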
Important properties of Delaunay decompositions • A region is defined by its circumsphere being empty, i.e. containing no other point of the set (so the region itself is empty as well) • Regions are tetrahedra, except when more than 4 points lie on the same sphere.
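The empty-circumsphere property can be checked directly. The sketch below assumes NumPy; the function names `circumsphere` and `is_empty` and the tolerance `tol` are illustrative choices, not part of the original work.

```python
import numpy as np

def circumsphere(tet):
    """Return (center, radius) of the sphere through the 4 rows of `tet`."""
    p0 = tet[0]
    # |c - p_i|^2 = |c - p_0|^2  gives  2 (p_i - p_0) . c = |p_i|^2 - |p_0|^2
    A = 2.0 * (tet[1:] - p0)
    b = np.sum(tet[1:] ** 2, axis=1) - np.sum(p0 ** 2)
    center = np.linalg.solve(A, b)
    return center, float(np.linalg.norm(center - p0))

def is_empty(tet, others, tol=1e-9):
    """True if none of the points in `others` lies strictly inside the circumsphere."""
    center, radius = circumsphere(tet)
    dists = np.linalg.norm(others - center, axis=1)
    return bool(np.all(dists >= radius - tol))

# Hypothetical tetrahedron and test points.
tet = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
far_point = np.array([[2.0, 2.0, 2.0]])
inner_point = np.array([[0.5, 0.5, 0.5]])
print(is_empty(tet, far_point), is_empty(tet, inner_point))  # True False
```

When more than 4 points lie on one sphere, the linear system is still solvable for any 4 of them that are not coplanar, but the region is then no longer a single tetrahedron.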
Important properties of Delaunay decompositions • The regions partition the convex hull of the point set A. • The graph defined by the edges of the Delaunay regions is the Delaunay graph • It can be used for searching for closest neighbors
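A sketch of closest-neighbor search on the Delaunay graph, assuming SciPy and random, hypothetical points. It relies on the known fact that the nearest neighbor of every point is joined to it by a Delaunay edge, so only graph neighbors need to be compared. The helper `closest_neighbor` is an illustrative name.

```python
from itertools import combinations

import numpy as np
from scipy.spatial import Delaunay

rng = np.random.default_rng(1)
points = rng.random((40, 3))          # hypothetical point set
tri = Delaunay(points)

# Build the Delaunay graph: every pair of corners of a region is an edge.
neighbors = {i: set() for i in range(len(points))}
for simplex in tri.simplices:
    for a, b in combinations(simplex, 2):
        neighbors[int(a)].add(int(b))
        neighbors[int(b)].add(int(a))

def closest_neighbor(i):
    """Index of the point closest to point i, searching Delaunay neighbors only."""
    return min(neighbors[i], key=lambda j: np.linalg.norm(points[i] - points[j]))

print(closest_neighbor(0))
```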
Delaunay decomposition of heavy atoms of the protein in 1n9c with the ligand
Important properties of Delaunay decompositions • The “dual” structure (the Voronoi diagram) solves the Post Office problem. • Partitioning the city into service areas of given post offices, so that everyone belongs to the closest post office. • The duality is only theoretical; in practice it is the same structure.
Previous work • Singh, Tropsha and Vaisman • The point set was chosen to be the set of Cα atoms of the protein • Aim: to predict protein secondary structure • In contrast, we chose the point set to be the set of all heavy (non-hydrogen) atoms.
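The Post Office problem can be illustrated by brute force: each location is assigned to its nearest post office, which is exactly the Voronoi cell it falls in. The coordinates below are hypothetical, and NumPy is assumed; a real implementation would use the Delaunay/Voronoi structure instead of a full distance matrix.

```python
import numpy as np

# Hypothetical post offices and homes in the plane.
offices = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 3.0]])
homes = np.array([[1.0, 0.5], [3.5, 0.2], [0.2, 2.9], [2.1, 0.0]])

# Distance from every home (rows) to every office (columns).
dist = np.linalg.norm(homes[:, None, :] - offices[None, :, :], axis=2)

# The service area of a home is the index of its closest office.
service_area = dist.argmin(axis=1)
print(service_area.tolist())  # [0, 1, 2, 1]
```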
Volume and tetrahedrality • Tetrahedrality: $T = \dfrac{\sum_{i<j} (l_i - l_j)^2}{15 \left( \sum_i l_i / 6 \right)^2}$; $T = 0$ for regular tetrahedra, and $T < 1$
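A sketch of both quantities using only the Python standard library. It assumes that the $l_i$ in the tetrahedrality formula are the six edge lengths of the tetrahedron (suggested by the division by 6); the function names are illustrative.

```python
from itertools import combinations
import math

def volume(p):
    """Volume of the tetrahedron with corners p[0..3], each an (x, y, z) tuple."""
    a, b, c = ([p[i][k] - p[0][k] for k in range(3)] for i in (1, 2, 3))
    det = (a[0] * (b[1] * c[2] - b[2] * c[1])
           - a[1] * (b[0] * c[2] - b[2] * c[0])
           + a[2] * (b[0] * c[1] - b[1] * c[0]))
    return abs(det) / 6.0

def tetrahedrality(p):
    """Sum of squared edge-length differences over 15 times the squared mean edge."""
    edges = [math.dist(u, v) for u, v in combinations(p, 2)]  # 6 edge lengths
    mean = sum(edges) / 6.0
    spread = sum((li - lj) ** 2 for li, lj in combinations(edges, 2))
    return spread / (15.0 * mean ** 2)

regular = [(1, 1, 1), (1, -1, -1), (-1, 1, -1), (-1, -1, 1)]
corner = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)]
print(tetrahedrality(regular))   # 0.0
print(volume(corner), tetrahedrality(corner))
```

The regular tetrahedron gives a tetrahedrality of exactly 0, while the less symmetric corner tetrahedron gives a small positive value.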
Frequency • Two-dimensional temperature plots of the frequency of regions with given volume and tetrahedrality. • In all proteins (our whole database) • In a given protein
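The counts behind such a plot can be sketched as a two-dimensional histogram. The values, bin counts and ranges below are hypothetical, and NumPy is assumed; the resulting array could be rendered as a temperature plot with any plotting library.

```python
import numpy as np

# Hypothetical per-region values; in practice these would come from the database.
volumes = np.array([1.2, 1.3, 2.8, 2.9, 3.0, 5.1])
tetrahedralities = np.array([0.05, 0.06, 0.24, 0.22, 0.21, 0.35])

# counts[i, j] = number of regions in volume bin i and tetrahedrality bin j.
counts, v_edges, t_edges = np.histogram2d(
    volumes, tetrahedralities,
    bins=(3, 4), range=((0.0, 6.0), (0.0, 0.4)))

print(counts)
```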
Classifying by corner atoms • Question: are the different peaks in the earlier plots related to the function of the corner atoms? • Classification by the symbols of the corner atoms • Classification by the hetid of the residues the atoms are found in. • Question: how frequent are different corner atom sets?
Frequency of metals in different types of tetrahedra • Ca appears almost exclusively in the vicinity of four oxygen atoms • Zn prefers NOSS and NNNO type tetrahedra, but is also frequent in CNOO, NNOO and NOOO • Only Zn was found in NOSS
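Classification by corner-atom symbols can be sketched with a counter over order-independent labels such as NOSS. The region list is hypothetical and the helper name `corner_type` is illustrative; the sketch assumes one-letter element symbols, since two-letter symbols would need a separator to stay unambiguous.

```python
from collections import Counter

def corner_type(symbols):
    """Order-independent label of a region, e.g. ('S', 'N', 'O', 'S') -> 'NOSS'."""
    return "".join(sorted(symbols))

# Hypothetical corner-symbol sets of four regions.
regions = [
    ("C", "N", "O", "O"),
    ("O", "C", "O", "N"),
    ("S", "N", "O", "S"),
    ("O", "O", "O", "O"),
]

freq = Counter(corner_type(r) for r in regions)
print(freq.most_common())
```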
About the geometric extension • Presently we cannot handle: • Missing atoms • Precision errors, non-tetrahedral regions • The PDB is handled as a juggled input • The resulting database can only be used for quality statistical purposes. • Strongly restricted database. • No missing atoms, 2.2 Å resolution, includes protein • 5757 such PDB entries (June 23, 2006) • Our current research addresses the problems above.
Recent problems • For example, the atoms of an aromatic ring should lie on one circle, in one plane, and hence on one sphere, but they do not: • The distortion is minor, not recognizable by eye • Is it just measurement error? • Or is it due to the structure around the ring? • In contrast, some atoms not expected to fall on one sphere tend to do so.
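One way to quantify such a distortion is to fit a plane to the ring atoms and measure the largest out-of-plane distance. This is a sketch assuming NumPy, using a singular value decomposition for the plane fit; the hexagon radius and the 0.05 Å displacement are hypothetical values, not data from the study.

```python
import numpy as np

def max_out_of_plane(coords):
    """Largest distance of any point from the least-squares plane through `coords`."""
    centered = coords - coords.mean(axis=0)
    # The right singular vector of the smallest singular value is the plane normal.
    _, _, vt = np.linalg.svd(centered)
    normal = vt[-1]
    return float(np.max(np.abs(centered @ normal)))

# Ideal planar hexagon (hypothetical ring, radius 1.39 Å).
angles = np.arange(6) * np.pi / 3.0
ring = np.column_stack([1.39 * np.cos(angles), 1.39 * np.sin(angles), np.zeros(6)])

# The same ring with one atom lifted slightly out of the plane.
distorted = ring.copy()
distorted[0, 2] += 0.05

print(max_out_of_plane(ring), max_out_of_plane(distorted))
```

A threshold on this value could separate numerical noise from a genuine distortion, though choosing it would need the coordinate precision of the PDB entries.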
Structure of the geometric extension • Essential: • Corner • Reference to the atoms in the RS-PDB • Region • The radius and the coordinates of the center of the circumsphere • Volume and tetrahedrality of the tetrahedron • Three types of bond graph codes • Hetid, atom name, and symbol set assigned to the region's corner set, and more • Additional: Edge, Neighbor, (Ligand) Atom
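The essential records could be modeled roughly as follows. This is only a sketch: all field names and types are hypothetical, chosen to mirror the items listed above, and do not reflect the actual database schema.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class Corner:
    atom_id: int  # hypothetical key referencing an atom in the RS-PDB

@dataclass
class Region:
    corners: List[Corner]
    center: Tuple[float, float, float]      # center of the circumsphere
    radius: float                           # radius of the circumsphere
    volume: float
    tetrahedrality: float
    bond_graph_codes: Tuple[int, int, int]  # three types of bond graph codes
    hetids: List[str] = field(default_factory=list)
    atom_names: List[str] = field(default_factory=list)
    symbols: List[str] = field(default_factory=list)

# Hypothetical example record.
region = Region(
    corners=[Corner(i) for i in (11, 12, 15, 40)],
    center=(0.5, 0.5, 0.5), radius=0.87, volume=0.17, tetrahedrality=0.07,
    bond_graph_codes=(0, 0, 0),
    hetids=["HIS", "HIS", "CYS", "CYS"],
    atom_names=["NE2", "O", "SG", "SG"],
    symbols=["N", "O", "S", "S"],
)
print("".join(sorted(region.symbols)))  # NOSS
```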