Substructures and Patterns in 2-D Chemical Space. Danail Bonchev Department of Mathematics and Applied Mathematics and Center for the Study of Biological Complexity Virginia Commonwealth University.
Substructures and Patterns in 2-D Chemical Space Danail Bonchev Department of Mathematics and Applied Mathematics and Center for the Study of Biological Complexity Virginia Commonwealth University Workshop CCSWS2: Optimization, Search and Graph-Theoretical Algorithms for Chemical Compound Space, IPAM, UCLA, 11-15 April, 2011
Molecular Properties and Graph Theory
• H. Wiener, JACS 69 (1947) 17; JPC 52 (1948) 1082: empirical equations, the “path number”
• H. Hosoya, Bull. Chem. Soc. Japan 44 (1971) 2332: reformulation in graph-theory terms
• E. Smolenski, Zh. Fiz. Khim. 38 (1964) 700; M. Gordon and J. W. Kennedy, J. Chem. Soc. Faraday Trans. II 69 (1973) 484
H_i - the number of subgraphs of k nodes; TI_i - topological invariant
Molecular Connectivity Concept
Randić, 1975; Kier and Hall, 1976
where SG = p (path), c (cluster), pc (path-cluster), etc.
[Figure: examples of path, cluster, and path-cluster subgraphs]
Kier and Hall Valence Connectivity Indices
Definition:
$\delta_k^v$ - valence connectivity for the k-th atom in the molecular graph
$Z_k$ - the total number of electrons in the k-th atom
$Z_k^v$ - the number of valence electrons in the k-th atom
$H_k$ - the number of hydrogen atoms directly attached to the k-th non-hydrogen atom
m = 0 - atomic valence connectivity indices
m = 1 - one-bond path valence connectivity indices
m = 2 - two-bond fragment valence connectivity indices
m = 3 - three-contiguous-bond fragment valence connectivity indices, etc.
L. B. Kier, L. H. Hall, Eur. J. Med. Chem., 1977, 12, 307.
• The success of molecular connectivity indices
• Why do they work so well?
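The atomic valence term can be illustrated with a few atoms. The sketch below assumes the standard Kier-Hall form, delta_v = (Zv - H) / (Z - Zv - 1), built from the three quantities the slide defines; the function name `valence_delta` and the example atoms are illustrative choices, not part of the slides.

```python
from fractions import Fraction

def valence_delta(Z, Zv, H):
    """Kier-Hall valence delta for a non-hydrogen atom.

    Assumed formula (standard Kier-Hall form): (Zv - H) / (Z - Zv - 1)
    Z  : total number of electrons in the atom
    Zv : number of valence electrons
    H  : number of directly attached hydrogen atoms
    """
    return Fraction(Zv - H, Z - Zv - 1)

# Illustrative atoms (hypothetical examples):
methyl_carbon = valence_delta(Z=6, Zv=4, H=3)     # carbon in -CH3
hydroxyl_oxygen = valence_delta(Z=8, Zv=6, H=1)   # oxygen in -OH
chlorine = valence_delta(Z=17, Zv=7, H=0)         # chlorine substituent

print(methyl_carbon, hydroxyl_oxygen, chlorine)
```

For second-row atoms the denominator is 1, so the term reduces to the number of valence electrons minus the attached hydrogens; the denominator only matters for heavier atoms such as chlorine.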
From Molecular Connectivity to Overall Topological Indices
• 1986, Bertz & Herndon: the idea of using the total subgraph count as a similarity measure
• 1995-1997, Bonchev/Bertz: a subgraph-count-based measure of structural complexity
• 1995-2005, Bonchev: overall topological indices
References:
Bertz, S.; Herndon, W. C. In Artificial Intelligence Applications in Chemistry; ACS: Washington, D.C., 1986, pp. 169-175.
D. Bonchev, Bulg. Chem. Commun. 28, 567-582 (1995).
D. Bonchev, SAR QSAR Environ. Res. 7, 23-43 (1997).
Bertz, S. H. and Sommer, T. J., Chem. Commun. 2409-2410 (1997).
S. H. Bertz and W. F. Wright, Graph Theory Notes New York Acad. Sci. 32-48 (1998).
D. Bonchev, In: Topological Indices and Related Descriptors, J. Devillers and A. T. Balaban, Eds., Gordon and Breach, Reading, U.K., 1999, p. 361-401.
D. Bonchev, J. Chem. Inf. Comput. Sci. 40, 934-941 (2000).
D. Bonchev, J. Mol. Graphics Model., 5271 (2001) 1-11.
D. Bonchev, N. Trinajstić, SAR QSAR Environ. Res. 12 (2001) 213-235.
D. Bonchev, J. Chem. Inf. Comput. Sci. 41 (2001) 582-592.
D. Bonchev, Lect. Ser. Computer and Computational Sciences 4, 1554-1557 (2005).
The Subgraph Count
N = 5, E = 4
e = 0: $^{0}SC = 5$
e = 1: $^{1}SC = 4$
e = 2: $^{2}SC = 4$
e = 3: $^{3}SC = 3$
e = 4: $^{4}SC = 1$
SC = 17 (5, 4, 4, 3, 1)
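The counts on this slide can be reproduced by brute force. The sketch below assumes that a "subgraph" means a connected subgraph identified by its edge set (single vertices for e = 0), and that the N = 5, E = 4 example is the 2-methylbutane skeleton; the vertex numbering and function names are illustrative.

```python
from itertools import combinations

def is_connected(edges):
    """True if the given edges form a single connected piece."""
    edges = list(edges)
    seen = set(edges[0])
    grew = True
    while grew:
        grew = False
        for u, v in edges:
            if (u in seen) != (v in seen):   # edge touches the piece at one end
                seen.update((u, v))
                grew = True
    return all(u in seen for u, _ in edges)

def subgraph_counts(n_vertices, edges):
    """Return [0SC, 1SC, ..., ESC]: connected subgraphs grouped by edge count."""
    counts = [n_vertices]                    # e = 0: the single vertices
    for e in range(1, len(edges) + 1):
        counts.append(sum(1 for sub in combinations(edges, e) if is_connected(sub)))
    return counts

# Assumed example graph: 2-methylbutane skeleton (vertex 2 carries the branch).
isopentane = [(1, 2), (2, 3), (3, 4), (2, 5)]
counts = subgraph_counts(5, isopentane)
print(counts, sum(counts))   # [5, 4, 4, 3, 1] 17
```

Enumerating all edge subsets is exponential in the number of edges, which is fine for small molecules but not for large networks.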
From Subgraph Count to Overall Topological Indices
• The idea: Weight all subgraphs with graph-invariant values and sum them to characterize the structure as a whole. Sum the weighted subgraphs that have the same number of edges to capture different levels of graph complexity.
• Motivation: The more complete the representation of molecular structure, the better it captures the patterns of structural complexity, the more distinctive the topological descriptor, and the more accurate the structure-property relationship.
The Overall Topological Complexity Indices
Definition 1: The Overall Topological Index OTI(G) of any graph G is defined as the sum of the topological index values $TI_i(G_i)$ of all K subgraphs $G_i$ of G:
$$OTI(G) = \sum_{i=1}^{K} TI_i(G_i), \qquad G_i \subseteq G$$
Definition 2: The e-th order Overall Topological Index ${}^{e}OTI(G)$ of any graph G is defined as the sum of the topological index values $TI_j({}^{e}G_j)$ of all ${}^{e}K$ subgraphs ${}^{e}G_j$ of G that have e edges:
$${}^{e}OTI(G) = \sum_{j=1}^{{}^{e}K} TI_j({}^{e}G_j), \qquad {}^{e}G_j \subset G$$
Some More Definitions
Corollary 1: The Overall Topological Index OTI(G) of any graph G can be presented as a sum over all e-orders of this index, ${}^{e}OTI(G)$:
$$OTI(G) = \sum_{e=0}^{E} {}^{e}OTI(G)$$
Definition 3: The Overall Topological Index Vector OTIV(G) of any graph G is the ordered sequence of all ${}^{e}OTI$s:
$$OTIV(G) = OTI({}^{0}OTI, {}^{1}OTI, {}^{2}OTI, \ldots, {}^{E}OTI)$$
Corollary 2: The E-th order overall topological index, ${}^{E}OTI(G)$, is the index TI(G) itself:
$${}^{E}OTI(G) = TI(G)$$
Even More Definitions
Definition 4a: The average overall topological index $OTI_a(G)$ and its e-order term ${}^{e}OTI_a(G)$ are obtained by dividing OTI(G) or ${}^{e}OTI(G)$ by the number of vertices V:
$$OTI_a(G) = \frac{OTI(G)}{V}, \qquad {}^{e}OTI_a(G) = \frac{{}^{e}OTI(G)}{V}$$
Definition 4b: The normalized overall topological index $OTI_n(G)$ and its e-order term ${}^{e}OTI_n(G)$ are obtained by dividing by the value that the index has for the complete graph $K_V$ having the same number of vertices V:
$$OTI_n(G) = \frac{OTI(G)}{OTI(K_V)}$$
Aren’t You Tired of Definitions?
• The overall topological indices work well for molecules, but what about networks? Computational disaster!
The Solution: Use the first several orders of the OTIs!
Definition 5: The cumulative p-th order overall index ${}^{p}OTI(G)$ is defined as the sum of the orders ${}^{e}OTI(G)$ for e = 0, 1, 2, …, p:
$${}^{p}OTI(G) = \sum_{e=0}^{p} {}^{e}OTI(G)$$
How to Apply the Overall Topological Indices Approach to Molecules with Heteroatoms?
• For OTI(G) ≡ OC, OM1, OM2 (overall connectivity and the first and second Zagreb index), substitute the vertex degree $a_i$ with the Kier and Hall atomic valence term $a_i^v$.
Example for the overall connectivity index, OC:
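Topological Indices Used in Realizing the Overall Indices Program
Total Adjacency, A(G): $A(G) = \sum_{i=1}^{N} a_i$
($a_i$ - degree of vertex i; N - number of vertices in G)
First Zagreb Index, M1(G): $M_1(G) = \sum_{i=1}^{N} a_i^2$
Second Zagreb Index, M2(G): $M_2(G) = \sum_{\text{edges } ij} a_i a_j$
Wiener Number, W(G): $W(G) = \frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N} d_{ij} = \frac{1}{2}\sum_{i=1}^{N} d_i$
($d_{ij}$ - distance between vertices i and j; $d_i$ - distance of vertex i, the sum of its distances to all other vertices)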
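The four indices named on this slide can be computed directly from an edge list, using their standard definitions. The sketch below is a minimal illustration; the example graph (the 2-methylbutane skeleton) and all function names are assumptions made for the example.

```python
from collections import deque

def neighbor_lists(n, edges):
    """Adjacency lists for vertices 1..n."""
    nbrs = {v: [] for v in range(1, n + 1)}
    for u, v in edges:
        nbrs[u].append(v)
        nbrs[v].append(u)
    return nbrs

def distance_sum(nbrs, start):
    """d_i: sum of shortest-path distances from `start` to every other vertex (BFS)."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for w in nbrs[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return sum(dist.values())

def basic_indices(n, edges):
    nbrs = neighbor_lists(n, edges)
    deg = {v: len(nbrs[v]) for v in nbrs}
    return {
        "A": sum(deg.values()),                               # total adjacency
        "M1": sum(d * d for d in deg.values()),               # first Zagreb index
        "M2": sum(deg[u] * deg[v] for u, v in edges),         # second Zagreb index
        "W": sum(distance_sum(nbrs, v) for v in nbrs) // 2,   # Wiener number
    }

# Assumed example: 2-methylbutane skeleton.
isopentane = [(1, 2), (2, 3), (3, 4), (2, 5)]
indices = basic_indices(5, isopentane)
print(indices)   # {'A': 8, 'M1': 16, 'M2': 14, 'W': 18}
```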
The Overall Hosoya Index
The Hosoya index Z(G):
$$Z(G) = \sum_{k=0}^{\lfloor N/2 \rfloor} p(G,k)$$
p(G,k) is the number of ways to choose k mutually non-adjacent edges in G, with p(G,0) = 1 and p(G,1) equal to the number of edges.
• The Overall Hosoya Index OZ(G)
H. Hosoya, Bull. Chem. Soc. Japan 44, 2332-2339 (1971).
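Examples of Calculation of Overall Topological Indices
N = 5, E = 4; vertex degrees shown in the figure: 1, 1, 1, 3, 2
e = 0: $^{0}SC = 5$; $^{0}OZ = 5 \times 1 = 5$; $^{0}OC = 3 \times 1 + 2 + 3 = 8$
e = 1: $^{1}SC = 4$; $^{1}OZ = 4 \times 2 = 8$; $^{1}OC = 2 \times 4 + 5 + 3 = 16$
e = 2: $^{2}SC = 4$; $^{2}OZ = 4 \times 3 = 12$; $^{2}OC = 5 + 3 \times 6 = 23$
e = 3: $^{3}SC = 3$; $^{3}OZ = 1 \times 4 + 2 \times 5 = 14$; $^{3}OC = 3 \times 7 = 21$
e = 4: $^{4}SC = 1$; $^{4}OZ = 1 \times 7 = 7$; $^{4}OC = 3 \times 1 + 2 + 3 = 8$
SC = 17 (5, 4, 4, 3, 1); OZ = 46 (5, 8, 12, 14, 7); OC = 76 (8, 16, 23, 21, 8)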
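The OC and OZ vectors in this example can be checked by enumerating the connected subgraphs and weighting each one. The sketch below assumes, in line with the worked numbers, that the OC weight of a subgraph is the sum of its vertex degrees taken in the parent graph, and that the OZ weight is the subgraph's own Hosoya index; the graph encoding and function names are illustrative.

```python
from itertools import combinations

def is_connected(edges):
    """True if the given edges form a single connected piece."""
    edges = list(edges)
    seen = set(edges[0])
    grew = True
    while grew:
        grew = False
        for u, v in edges:
            if (u in seen) != (v in seen):
                seen.update((u, v))
                grew = True
    return all(u in seen for u, _ in edges)

def hosoya(edges):
    """Hosoya index: number of sets of mutually non-adjacent edges (empty set included)."""
    total = 0
    for k in range(len(edges) + 1):
        for chosen in combinations(edges, k):
            ends = [v for edge in chosen for v in edge]
            if len(ends) == len(set(ends)):      # no vertex is shared
                total += 1
    return total

def overall_vectors(n, edges):
    """Return (OC vector, OZ vector), one entry per subgraph edge count e = 0..E."""
    degree = {v: 0 for v in range(1, n + 1)}
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    oc = [sum(degree.values())]    # e = 0: each vertex contributes its degree
    oz = [n]                       # e = 0: a single vertex has Z = 1
    for e in range(1, len(edges) + 1):
        oc_e = oz_e = 0
        for sub in combinations(edges, e):
            if is_connected(sub):
                vertices = {v for edge in sub for v in edge}
                oc_e += sum(degree[v] for v in vertices)
                oz_e += hosoya(sub)
        oc.append(oc_e)
        oz.append(oz_e)
    return oc, oz

isopentane = [(1, 2), (2, 3), (3, 4), (2, 5)]   # assumed example graph
oc, oz = overall_vectors(5, isopentane)
print(sum(oc), oc)   # 76 [8, 16, 23, 21, 8]
print(sum(oz), oz)   # 46 [5, 8, 12, 14, 7]
```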
Formulae for the Overall Indices for Some Classes of Graphs
n - total number of vertices; q - total number of edges; e - number of edges in a subgraph; $^{e}SC$ - number of subgraphs having e edges each
Linear (Path) Graphs
$^{e}SC(P_n) = n - e$; $SC(P_n) = n(n+1)/2$
$^{e}OC(P_n) = 2[q(e+1) - e^2]$; $OC(P_n) = n(n-1)(n+4)/3$
$^{e}OW(P_n) = e(e+1)(e+2)(n-e)/6$; $OW(P_n) = (n+3)(n+2)(n+1)n(n-1)/120$
Monocyclic Graphs
$^{e}SC(C_n) = n$; $^{q}SC(C_n) = 1$; $SC(C_n) = n^2 + 1$
$^{e}OC(C_n) = 2n(e+1)$; $^{q}OC(C_n) = 2n$; $OC(C_n) = n(n^2+n+2)$
$^{e}OW(C_n) = e(e+1)(e+2)n/6$ (for e = 1, 2, …, n-1); $^{q}OW(C_n) = W(C_n)$
$OW(C_n) = (n^5+2n^4+2n^3-2n^2-an)/24$; a(even n) = 0; a(odd n) = 3
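The closed forms for SC and OC can be compared with a brute-force enumeration on small paths and cycles. The sketch below assumes the same conventions as the worked example (connected edge-set subgraphs, degrees taken in the parent graph); the helper names are illustrative.

```python
from itertools import combinations

def is_connected(edges):
    """True if the given edges form a single connected piece."""
    edges = list(edges)
    seen = set(edges[0])
    grew = True
    while grew:
        grew = False
        for u, v in edges:
            if (u in seen) != (v in seen):
                seen.update((u, v))
                grew = True
    return all(u in seen for u, _ in edges)

def sc_and_oc(n, edges):
    """Brute-force total subgraph count SC and overall connectivity OC."""
    degree = {v: 0 for v in range(1, n + 1)}
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    sc, oc = n, sum(degree.values())             # e = 0 terms
    for e in range(1, len(edges) + 1):
        for sub in combinations(edges, e):
            if is_connected(sub):
                sc += 1
                oc += sum(degree[v] for v in {x for edge in sub for x in edge})
    return sc, oc

def path(n):
    return [(i, i + 1) for i in range(1, n)]

def cycle(n):
    return path(n) + [(n, 1)]

def closed_forms_hold(n):
    """Compare the slide's formulas for P_n and C_n with the enumeration."""
    path_ok = sc_and_oc(n, path(n)) == (n * (n + 1) // 2, n * (n - 1) * (n + 4) // 3)
    cycle_ok = sc_and_oc(n, cycle(n)) == (n * n + 1, n * (n * n + n + 2))
    return path_ok and cycle_ok

print(all(closed_forms_hold(n) for n in range(3, 8)))
```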
Total Walk Count, twc
Rucker, G. & Rucker, C., J. Chem. Inf. Comput. Sci. (2000), 40, 99-106.
Rucker, G. & Rucker, C., J. Chem. Inf. Comput. Sci. (2001), 41, 1457-1462.
• The number of walks of length l that start in vertex i
• The total number of walks of length l
Example: WC = 106 (8, 16, 28, 54)
[Figure: the example graph with each vertex labeled by its walk counts for l = 1, l = 2, and l = 3]
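The example vector can be reproduced by repeatedly propagating walk counts along the edges. The sketch below assumes that the example graph is the 2-methylbutane skeleton and that walks of length 1 to N - 1 are summed, which matches the four entries shown; function and variable names are illustrative.

```python
def walk_counts(n, edges, max_length=None):
    """Total number of walks of each length l = 1..max_length (default N - 1)."""
    nbrs = {v: [] for v in range(1, n + 1)}
    for u, v in edges:
        nbrs[u].append(v)
        nbrs[v].append(u)
    if max_length is None:
        max_length = n - 1
    per_vertex = {v: 1 for v in nbrs}       # one walk of length 0 from each vertex
    totals = []
    for _ in range(max_length):
        # A walk of length l from v is one step to a neighbor w
        # followed by a walk of length l - 1 from w.
        per_vertex = {v: sum(per_vertex[w] for w in nbrs[v]) for v in nbrs}
        totals.append(sum(per_vertex.values()))
    return totals

isopentane = [(1, 2), (2, 3), (3, 4), (2, 5)]   # assumed example graph
totals = walk_counts(5, isopentane)
print(sum(totals), totals)   # 106 [8, 16, 28, 54]
```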
The Six Overall Topological Indices Order Structures According to Patterns of Increasing Complexity
Structure number (OC): 1 (4), 2 (14), 3 (32), 4 (39), 5 (60), 6 (76), 7 (100), 8 (100), 9 (127), 10 (136), 11 (164), 12 (181), 13 (154), 14 (194), 15 (214), 16 (234), 17 (246), 18 (276), 19 (284), 20 (314), 21 (369)
Table 1. Quantitative Comparison of the Six Overall Topological Indices in C2-C7 Alkanes
Table 2. Standard deviations of the best C3-C8 alkane properties models with five parameters produced by the six overall topological indices versus those obtained by the set of molecular connectivity indices
The Overall Topological Indices and Complexity of Structures Containing Cycle
Structures 3-6: SC = 11, 17, 20, 26; OC = 32, 76, 100, 160; TWC = 58, 106, 140, 150
Structures 7-10: SC = 29, 31, 54, 57; OC = 190, 212, 482, 522; TWC = 178, 214, 300, 350
Structures 11-15: SC = 61, 114, 119, 477, 973; OC = 566, 1316, 1396, 7806, 18180; TWC = 337, 538, 608, 1200, 1700
The Overall Complexity Measures Can Discriminate Very Subtle Complexity Features
          Structure 1               Structure 2
SC        28 (5, 8, 9, 5, 1)        30 (5, 9, 10, 5, 1)
OC (in)   111 (12, 28, 41, 25, 5)   135 (16, 40, 49, 25, 5)
TWC       15 (5, 5, 5)              21 (5, 7, 9)
The complexity of structure 2 is higher because it has a more complex cycle.
Cyclicity contributes more to complexity than branching.
Some Conclusions
• While the six topological indices used show degeneracy and order the isomeric molecules differently, the overall indices are non-degenerate and order the molecules similarly in series of increasing complexity.
• The sets of overall topological indices produce QSPR models with (sometimes considerably) smaller standard deviations than the corresponding models with molecular connectivity indices.
• The best model statistics are shown by overall connectivity, followed by the overall Wiener indices.
• The patterns of structural complexity deserve considerable attention because of their generality.
Molecular Branching
Wiener, 1947: First analyzed some aspects of the branching of the molecular skeleton by fitting experimental data for several properties of alkanes to the deviation of his “path number” W in branched alkanes from that of the linear isomer.
Wiener, H. Structural Determination of Paraffin Boiling Points. J. Am. Chem. Soc. 1947, 69, 17-20.
Wiener, H. Relation of the Physical Properties of the Isomeric Alkanes to Molecular Structure. J. Phys. Chem. 1948, 52, 1082-1089.
Rouvray, D.H. and King, R.B., Eds., Topology in Chemistry. Discrete Mathematics of Molecules. Horwood, Chichester, U.K. 2002.
Graph invariants tested early as “branching indices” of acyclic molecules, correlating with their properties:
• Graph non-adjacency number, Hosoya, 1971
• Graph largest eigenvalue, Lovasz and Pelikan, 1973
• First and second Zagreb indices, Gutman et al., 1975
• Molecular branching index, Randić, 1975
The Branching Patterns of Molecular Structures
The Goal: To go beyond inventing new graph invariants and fitting experimental data, and try to understand the topological basis of molecular properties.
The Hypothesis: The increase in branching complexity is associated with a decrease in the Wiener number W.
D. Bonchev and N. Trinajstic, Information Theory, Distance Matrix, and Molecular Branching, J. Chem. Phys. 67 (1977) 4517-4533.
D. Bonchev and N. Trinajstic, On Topological Characterization of Molecular Branching, Intern. J. Quantum Chem. Symp. 12 (1978) 293-303.
D. Bonchev, Topological Order in Molecules. 1. Molecular Branching Revisited, Theochem 336 (1995) 137-156.
The Rules of Branching
[Rules 1-8: the inequalities and structure diagrams are shown on the slides]
Symbols: N - number of vertices in the main chain and j - branch position (Rule 1); N1 - number of vertices in the branch (Rule 3)
Generalization of the Branching Rules
• 5 more general rules derived:
• three mechanisms of formation of new branches,
• one with branch transformations related to a vertex degree redistribution,
• one showing the topological identity of branch elongation and branch shifting toward a more central position.
Conclusions: The number of branches and the number of vertices of higher degree are considerably stronger complexity factors than the branch length and branch centrality. However, the role of centrality increases with the size of the system and becomes dominant in polymeric macromolecules.
D. Bonchev, Topological Order in Molecules. 1. Molecular Branching Revisited, Theochem 336 (1995) 137-156.
O. E. Polansky and D. Bonchev, Commun. Math. Comput. Chem. (MATCH) 1986, 21, 133-186; 1990, 25, 3-40.
Molecular Cyclicity
Bonchev, Mekenyan, Trinajstic, 1979-1983
Similar conjecture: All structural patterns that increase the cyclic complexity of molecules are associated with a decrease in the Wiener number.
Cyclic complexity increases by:
A) a stronger link between the cycles
B) a reduction in the cycle size, creating more cycles of smaller size
[Figure labels: 26; 2 x 14; 3 x 10; 4 x 8; 6 x 6]
C) transforming a linear chain of cycles into a zigzag-like one
D) increasing the number of cycles fused to a common edge (propellerity)
E) With a single exception, the 15 rules derived for benzenoid hydrocarbons identify structural transformations that increase their stability.
[Figure: HOMO and LUMO level diagrams. ΔW < 0, ΔE > 0: Rules 3, 5-7, 9, 10, 12-15. ΔW < 0, ΔE < 0: Rule 1.]
Papers for cyclic complexity: Intern. J. Quantum Chem. 1980, 17, 845-89; 1981, 19, 929-955. Math. Comput. Chem. (MATCH) 1979, 6, 93-115; 1981, 11, 145-168. Croat. Chem. Acta 1983, 56, 237-261.
Topology of Polymers (Bonchev, Mekenyan et al., 1980-1983)
Wiener “infinite” index: the limit of the Wiener number of a polymer having N non-H atoms, normalized per unit distance and unit bond:
(N - number of atoms, C - number of cycles)
For structure 9:
[Figure: polymer structures 1-10]
Improved Method for Calculating Wiener Infinite Index
T.-S. Balaban, A. T. Balaban, and D. Bonchev, J. Mol. Structure (Theochem) 2001, 535, 81-92
A simple equation incorporating only topological invariants of the monomer unit was derived 10 years later. These invariants are the numbers of atoms N1 and cycles C1 in the monomer unit, as well as the number of bonds D (the graph distance) between two neighboring monomer units.
Examples:
D = 2, N1 = 4, C1 = 1: index value 2/15
D = 4, N1 = 6, C1 = 1: index value 4/21
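The equation itself is not part of the extracted slide text. One closed form that reproduces both examples on this slide is D / [3 (N1 + C1)]; the sketch below uses it strictly as an inference from those two examples, not as a quotation of the published formula, and the function name is hypothetical.

```python
from fractions import Fraction

def wiener_infinite_index(D, N1, C1):
    """Inferred closed form (an assumption): D / (3 * (N1 + C1)).

    D  : number of bonds (graph distance) between neighboring monomer units
    N1 : number of atoms in the monomer unit
    C1 : number of cycles in the monomer unit
    """
    return Fraction(D, 3 * (N1 + C1))

first = wiener_infinite_index(D=2, N1=4, C1=1)
second = wiener_infinite_index(D=4, N1=6, C1=1)
print(first, second)   # 2/15 4/21
```

A form like this is plausible on scaling grounds: the mean distance in a long chain of n monomers grows as nD/3, while the bond count grows as n(N1 + C1), so their ratio tends to a constant.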
Equations linking the Wiener number to the radius of gyration and viscosity of polymer melts and solutions
Symbols: g is the Zimm-Stockmayer branching ratio of a branched macromolecule; the viscosity equation also uses the friction coefficient and c, the number of polymer chains in a unit volume.
$R_g^2$ and g are measured by laser light scattering.
Example: g (3-arm star) = (3x1 + 3x2) / (3x1 + 2x2 + 1x3) = 9/10 = 0.9
Kirchhoff-number-based generalization of the equations for polymers containing atomic rings
D. Bonchev, E. Markel, and A. Dekmezian, J. Chem. Inf. Comput. Sci. 2001, 41, 1274-1285.
D. Bonchev, E. Markel, and A. Dekmezian, Polymer 2002, 43, 203-222.
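The 3-arm star example reads as a ratio of two Wiener numbers: the numerator sums the distances in a star with three one-bond arms, and the denominator sums the distances in the linear chain with the same number of atoms. The sketch below assumes that reading, g = W(branched) / W(linear); the graphs and function names are illustrative.

```python
from collections import deque
from fractions import Fraction

def wiener(n, edges):
    """Wiener number: sum of shortest-path distances over all vertex pairs."""
    nbrs = {v: [] for v in range(1, n + 1)}
    for u, v in edges:
        nbrs[u].append(v)
        nbrs[v].append(u)
    total = 0
    for start in nbrs:
        dist = {start: 0}
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for w in nbrs[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total += sum(dist.values())
    return total // 2        # each pair was counted from both ends

star = [(1, 2), (1, 3), (1, 4)]      # 3-arm star, one bond per arm
linear = [(1, 2), (2, 3), (3, 4)]    # linear chain with the same 4 atoms

g = Fraction(wiener(4, star), wiener(4, linear))
print(wiener(4, star), wiener(4, linear), g)   # 9 10 9/10
```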
Topology of Crystals
Basic criterion used: Wiener number minimum
• D. Bonchev, O. Mekenyan, and H. Fritsche, An Approach to the Topological Modeling of Crystal Growth, J. Cryst. Growth 1980, 49, 90-96.
• D. Bonchev, O. Mekenyan, and H. Fritsche, A Topological Approach to Crystal Vacancy Studies. Model Crystallites with a Single Vacancy, Phys. Stat. Sol. (a) 1979, 55, 181-187.
• O. Mekenyan, D. Bonchev, and H. Fritsche, A Topological Approach to Crystal Vacancy Studies. II. Model Crystallites with Two and Three Vacancies, Phys. Stat. Sol. (a) 1979, 56, 607-614.
• O. Mekenyan, D. Bonchev, and H. Fritsche, A Topological Approach to Crystal Defect Studies, Z. Phys. Chem. (Leipzig) 1984, 265, 959-967.
• H. G. Fritsche, D. Bonchev, and O. Mekenyan, Deutung der Magischen Zahlen von Argonclustern als Extremwerte Topologischer Indizes, Z. Chem. 1987, 27, 234.
• H. G. Fritsche, D. Bonchev, and O. Mekenyan, On the Topologies of (M13)13 Superclusters of Ruthenium, Rhodium and Gold, J. Less-Common Metals 1988, 141, 137-143.
• H. G. Fritsche, D. Bonchev, and O. Mekenyan, Are Small Clusters of Inert-Gas Atoms Polyhedra of Minimum Surfaces? Phys. Stat. Sol. (b) 1988, 148K, 101-104.
• H. G. Fritsche, D. Bonchev, and O. Mekenyan, The Optimum Topology of Small Clusters, Z. Phys. Chem. (Leipzig) 1989, 270, 467-476.
• H. G. Fritsche, D. Bonchev, and O. Mekenyan, A Topological Approach to Studies of Ordered Structures of Absorbed Gases in Host Lattices (I). The Structure of ‑PdD0.5, Crystal Res. Technol. 1983, 18, 1075-1081.
Crystal Growth
The detailed sequences of crystal growth were constructed by adding one atom at each step and selecting, from a number of candidate structures, the one with the minimum Wiener number.
The reproduced shape is maximally close to the spherical shape typical of free nucleation in the vapor phase and of crystallization under zero-gravity conditions.
[Figure: growth stages labeled W = 1, W = 8, W = 48, W = 369, W = 972, W = 5536]
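The growth procedure described here can be sketched as a greedy search. The code below assumes a simple cubic lattice with bonds between nearest-neighbor sites and graph distances measured inside the cluster; ties between equally good candidates are broken arbitrarily, so intermediate shapes need not match the published growth sequences. All names are illustrative.

```python
from collections import deque

STEPS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]

def lattice_neighbors(site):
    x, y, z = site
    return [(x + dx, y + dy, z + dz) for dx, dy, dz in STEPS]

def wiener(cluster):
    """Wiener number of a connected cluster of lattice sites (BFS inside the cluster)."""
    cluster = set(cluster)
    total = 0
    for start in cluster:
        dist = {start: 0}
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for w in lattice_neighbors(u):
                if w in cluster and w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total += sum(dist.values())
    return total // 2

def grow(n_atoms):
    """Add one atom per step, keeping the candidate with the minimum Wiener number."""
    cluster = {(0, 0, 0)}
    while len(cluster) < n_atoms:
        candidates = sorted({w for s in cluster for w in lattice_neighbors(s)} - cluster)
        best = min(candidates, key=lambda site: wiener(cluster | {site}))
        cluster.add(best)
    return cluster

def cube(n):
    """Compact n x n x n block of lattice sites."""
    return {(x, y, z) for x in range(n) for y in range(n) for z in range(n)}

print(wiener(cube(2)), wiener(cube(3)))   # 48 972
print(wiener(grow(8)))
```

Under these assumptions the compact 2x2x2 and 3x3x3 blocks give 48 and 972, two of the values that label the growth stages on the slide.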
Crystallization on a substrate with a low surface energy. The crystallization on a substrate with a high surface energy also reproduced the experimentally observed monolayer shape.
Prediction of the most probable locations of crystal vacancies and defect atoms Criterion used: Equations derived for a series of two- and three-dimensional models of crystal lattice with variable vacancy locations. For a simple cubic crystallite having N = 3x3x3 atoms, the variation in the Wiener number is expressed as: where i, j, k are the lattice nodes along the x, y, and z coordinate axes, respectively. ΔW increases when going from volume to face to edge to corner in agreement with thermodynamic theory and quantum chemical calculations.
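The trend stated for the 3x3x3 crystallite can be checked numerically. The sketch below assumes nearest-neighbor bonds on a simple cubic lattice and takes ΔW as W(perfect) minus W(with vacancy); that sign convention is an assumption, since the slide's equation is not in the extracted text. Names are illustrative.

```python
from collections import deque

STEPS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]

def wiener(sites):
    """Wiener number of a connected set of lattice sites (BFS inside the set)."""
    sites = set(sites)
    total = 0
    for start in sites:
        dist = {start: 0}
        queue = deque([start])
        while queue:
            x, y, z = queue.popleft()
            for dx, dy, dz in STEPS:
                w = (x + dx, y + dy, z + dz)
                if w in sites and w not in dist:
                    dist[w] = dist[(x, y, z)] + 1
                    queue.append(w)
        total += sum(dist.values())
    return total // 2

perfect = {(x, y, z) for x in range(3) for y in range(3) for z in range(3)}
vacancy_sites = {
    "volume": (1, 1, 1),   # center of the crystallite
    "face": (1, 1, 0),     # center of a face
    "edge": (1, 0, 0),     # middle of an edge
    "corner": (0, 0, 0),
}

w_perfect = wiener(perfect)
delta_w = {name: w_perfect - wiener(perfect - {site})
           for name, site in vacancy_sites.items()}
print(w_perfect, delta_w)
# 972 {'volume': 48, 'face': 59, 'edge': 70, 'corner': 81}
```

With this convention ΔW grows from volume to face to edge to corner, so a corner vacancy leaves the crystallite with the smallest Wiener number.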
Modeling of Atomic Clusters
The Wiener number minimum was again used as the criterion.
• Adding one atom at a time over a certain crystal face, and connecting this atom to all face atoms, produced cluster genetic lines.
• Two of the genetic lines resulted in icosahedra, two others yielded a cubo-octahedron, and another generated an anticubo-octahedron, in agreement with the experimental data.
• The minimum of the Wiener number in the icosahedron cluster also explained the “magic” number 13, for which a maximum intensity of cluster mass spectra has been observed.
• The approach correctly predicted the doubly magic metal superclusters [(M13)13]n, where M = ruthenium, rhodium, or gold, as well as the stable argon clusters at the magic numbers 13, 19, 23, 26, 29, and 32.
“Small-World” Connectivity
• Complex network properties: high connectivity and small diameter
• They can be integrated into a single parameter: B1 = A/D = <a_i>/<d_i>
<a_i> - average vertex degree; <d_i> - average node distance
B1 - a quick estimate of network complexity
B2 - a much more precise complexity measure
b_i - a measure of node centrality
Complexity Patterns Analysis
Structure  A/D    B2     B3
1          0.200  1.105  2.385
2          0.222  1.294  2.554
3          0.250  1.571  2.628
4          0.333  1.667  3.871
5          0.313  1.677  3.641
6          0.313  1.783  3.650
7          0.429  2.200  3.387
8          0.400  2.211  4.972
9          0.429  2.410  4.957
10         0.538  2.867  6.298
11         0.538  2.943  6.311
12         0.818  4.200  9.580
13         1      5      11.61
A/D is equal for structures 5 and 6, and for structures 10 and 11.
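Several A/D and B2 values above can be reproduced from five-vertex graphs. The sketch below assumes A is the sum of vertex degrees, D is the sum of all vertex distance sums d_i, and B2 is the sum of b_i = a_i / d_i over the vertices; the last formula and the identification of the graphs (path, star, 5-cycle, complete graph) are inferences from the tabulated numbers, not statements taken from the slides.

```python
from collections import deque
from fractions import Fraction
from itertools import combinations

def degrees_and_distance_sums(n, edges):
    nbrs = {v: [] for v in range(1, n + 1)}
    for u, v in edges:
        nbrs[u].append(v)
        nbrs[v].append(u)
    deg, dsum = {}, {}
    for start in nbrs:
        dist = {start: 0}
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for w in nbrs[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        deg[start] = len(nbrs[start])
        dsum[start] = sum(dist.values())
    return deg, dsum

def b1(n, edges):
    """B1 = A / D: total adjacency over total distance."""
    deg, dsum = degrees_and_distance_sums(n, edges)
    return Fraction(sum(deg.values()), sum(dsum.values()))

def b2(n, edges):
    """B2 = sum of b_i, with b_i = a_i / d_i (assumed definition)."""
    deg, dsum = degrees_and_distance_sums(n, edges)
    return sum(Fraction(deg[v], dsum[v]) for v in deg)

path5 = [(1, 2), (2, 3), (3, 4), (4, 5)]
star5 = [(1, 2), (1, 3), (1, 4), (1, 5)]
cycle5 = path5 + [(5, 1)]
complete5 = list(combinations(range(1, 6), 2))

for name, graph in [("path", path5), ("star", star5),
                    ("cycle", cycle5), ("complete", complete5)]:
    print(name, float(b1(5, graph)), round(float(b2(5, graph)), 3))
```

Under these assumptions the path gives 0.200 and 1.105, the star 0.250 and 1.571, the 5-cycle 0.333 and 1.667, and the complete graph 1 and 5, matching rows 1, 3, 4, and 13.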