Daylight and Discovery

kdavid

Presentation Transcript


  1. Daylight and Discovery How do I impress the boss when I get back?

  2. A constant fight against the hedgehogs!! What is Discovery?

  3. What have I learned this week? • Above all you have learned new languages that allow you to communicate chemical concepts to, and between, machines. • These languages also allow you to communicate these concepts via machines to your colleagues. • You have also learned about other descriptions of a molecular structure, such as fingerprints.

  4. Language recap • SMILES • SMARTS • SMIRKS • (FINGERPRINTS)

  5. SMILES • SMILES contains the same information as might be found in an extended connection table. • The primary reason SMILES is more useful than a connection table is that it is a linguistic construct, rather than a computer data structure. • SMILES is a true language, albeit with a simple vocabulary (atom and bond symbols) and only a few grammar rules. • SMILES can be canonicalised, i.e. there is a unique, universal “name” for a structure. • SMILES representations of structure can in turn be used as “words” in the vocabulary of other languages designed for storage and retrieval of chemical information, e.g. HTML, XML or query languages such as SQL.

  6. SMILES syntax • [atom]bond[atom] etc. • atom : '[' <mass> symbol <chiral> <hcount> <sign<charge>> <':'class> ']' ; • bond : <empty> | '-' | '=' | '#' | ':' | '.' ; • Common elements in the organic subset (B, C, N, O, P, S, F, Cl, Br, I), in their lowest common valence state(s), can be written without brackets. • If bonds are omitted, they default to single or aromatic, as appropriate, for juxtaposed atoms.
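The grammar on this slide can be illustrated with a small tokenizer. The following is a minimal sketch in Python, not a full SMILES parser: the function name `tokenize` and the token labels are invented for illustration, ring-closure digits and branch parentheses (standard SMILES, though not shown on the slide) are included, and directional bonds are left out.

```python
import re

# One alternative per token type; two-letter symbols are tried before one-letter ones.
TOKEN_RE = re.compile(r"""
    (?P<bracket>\[[^\]]+\])          # bracket atom such as [13CH4] or [NH4+]
  | (?P<organic>Cl|Br|[BCNOPSFI])    # aliphatic organic-subset atom, no brackets needed
  | (?P<aromatic>[bcnops])           # aromatic organic-subset atom
  | (?P<bond>[-=\#:.])               # explicit bond symbols listed on the slide
  | (?P<ring>%\d{2}|\d)              # ring-closure label
  | (?P<branch>[()])                 # branch open/close
""", re.VERBOSE)


def tokenize(smiles):
    """Split a SMILES string into (token_type, text) pairs."""
    tokens = []
    pos = 0
    while pos < len(smiles):
        match = TOKEN_RE.match(smiles, pos)
        if match is None:
            raise ValueError(f"unexpected character {smiles[pos]!r} at position {pos}")
        tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


if __name__ == "__main__":
    for kind, text in tokenize("CC(=O)Cl"):
        print(kind, text)
```

Omitted bonds never appear as tokens; a later parsing stage would supply the single-or-aromatic default between juxtaposed atoms.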

  7. Example SMILES

  8. SMARTS • In the SMILES language, there are two fundamental types of symbols: atoms and bonds. Using these SMILES symbols, one can specify a molecule's graph (its "nodes" and "edges") and assign "labels" to the components of the graph (that is, say what type of atom each node represents, and what type of bond each edge represents). • The same is true in SMARTS: one uses atomic and bond symbols to specify a graph. However, in SMARTS the labels for the graph's nodes and edges (its "atoms" and "bonds") are extended to include "logical operators" and special atomic and bond symbols; these allow SMARTS atoms and bonds to be more general. For example, the SMARTS atomic symbol [C,N] is an atom that can be aliphatic C or aliphatic N; the SMARTS bond symbol "~" (tilde) matches any bond.

  9. Example SMARTS

  10. Useful SMARTS • Heavy atom: [!$([#6,#7,#8,#9,#15,#16,#17,#35,#53])] • Rotatable bonds: [!$(*#*)&!D1]-&!@[!$(*#*)&!D1] • Secondary amides: [N&H1&D2]-&!@[#6&X3] • H-donors: [!#6;!H0] • H-acceptors: [$([!#6;+0]);!$([F,Cl,Br,I]);!$([o,s,nX3]);!$([Nv5,Pv5,Sv4,Sv6])] • Isolating carbons: [#6;!$(C(F)(F)F);!$(c(:[!c]):[!c]);!$([#6]=,#[!#6]);!$([#6;!+0])] • Stereo atoms: [$([X4&!v6&!v5;H0,H1]),$([SX3]([#6])([#6])~O)] • Stereo bonds: [CX3;!H2]=[CX3;!H2] • Stereo allenes: [CX3;H0]=C=[CX3;H0,H1]

  11. Rotatable bonds: [!$(*#*)&!D1]-&!@[!$(*#*)&!D1] • An atom which is • NOT triply bonded to another atom • AND NOT 1-connected (i.e. not terminal) • Bonded by • A single bond • AND NOT a ring bond • to the same type of atom
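The reading of the rotatable-bond pattern above can be checked by hand on a small molecular graph. The sketch below is a plain-Python illustration under stated assumptions, not a SMARTS matcher: the molecule is supplied as a hypothetical list of `(atom_i, atom_j, bond_order)` tuples over heavy atoms only (so the degree plays the role of `D`), and a ring bond is detected by testing whether its two atoms stay connected once the bond itself is ignored.

```python
from collections import defaultdict


def is_ring_bond(adjacency, i, j):
    """True if j is still reachable from i without using the direct i-j bond."""
    seen = {i}
    stack = [i]
    while stack:
        atom = stack.pop()
        for neighbour in adjacency[atom]:
            if atom == i and neighbour == j:
                continue  # skip the bond under test
            if neighbour == j:
                return True
            if neighbour not in seen:
                seen.add(neighbour)
                stack.append(neighbour)
    return False


def count_rotatable_bonds(bonds):
    """Count bonds that satisfy the slide's description of a rotatable bond."""
    adjacency = defaultdict(set)
    triply_bonded = set()
    for i, j, order in bonds:
        adjacency[i].add(j)
        adjacency[j].add(i)
        if order == 3:
            triply_bonded.update((i, j))

    def end_atom_ok(atom):
        # NOT triply bonded to another atom AND NOT 1-connected (terminal)
        return atom not in triply_bonded and len(adjacency[atom]) > 1

    return sum(
        1
        for i, j, order in bonds
        if order == 1                              # a single bond
        and not is_ring_bond(adjacency, i, j)      # AND NOT a ring bond
        and end_atom_ok(i)
        and end_atom_ok(j)
    )


if __name__ == "__main__":
    butane = [(0, 1, 1), (1, 2, 1), (2, 3, 1)]
    print(count_rotatable_bonds(butane))
```

For n-butane only the central C-C bond qualifies, because both outer bonds end in a terminal atom.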

  12. Chemical Information Concepts in Discovery • Matching • Total • Partial • Similarity • Qualitative • Quantitative • Both matching and similarity are opinions as they depend on descriptors.

  13. Filtering • Quite often you may wish to eliminate compounds which are inappropriate for some activity or test. • E.g. delete from a list any molecule which contains a “heavy metal”, i.e. a non-common element • > $CONTRIB/smarts_filter -v '[!$([#6,#7,#8,#9,#15,#16,#17,#35,#53])]'
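The idea behind this filter can be approximated without a SMARTS engine. The sketch below is a rough Python stand-in, not a replacement for `smarts_filter`: it pulls element symbols out of SMILES strings with a regular expression and keeps only molecules whose atoms all fall in the set C, N, O, F, P, S, Cl, Br, I (atomic numbers 6, 7, 8, 9, 15, 16, 17, 35, 53). The names `atom_symbols`, `has_uncommon_element` and `keep_common_only` are invented for illustration.

```python
import re

# Elements named by atomic number in the slide's SMARTS pattern.
ALLOWED = {"C", "N", "O", "F", "P", "S", "Cl", "Br", "I"}

ATOM_RE = re.compile(
    r"\[\d*(?P<br>se|as|[A-Z][a-z]?|[bcnops])[^\]]*\]"  # bracket atom: optional mass, then symbol
    r"|(?P<org>Cl|Br|[BCNOPSFI]|[bcnops])"              # unbracketed organic-subset atom
)


def atom_symbols(smiles):
    """Return the element symbol of every atom written in the SMILES string."""
    symbols = []
    for match in ATOM_RE.finditer(smiles):
        symbol = match.group("br") or match.group("org")
        symbols.append(symbol.capitalize())  # aromatic 'c' counts as carbon
    return symbols


def has_uncommon_element(smiles):
    """True if any atom lies outside the allowed element set."""
    return any(symbol not in ALLOWED for symbol in atom_symbols(smiles))


def keep_common_only(smiles_list):
    """Drop every molecule that contains a non-common element."""
    return [s for s in smiles_list if not has_uncommon_element(s)]


if __name__ == "__main__":
    candidates = ["CCO", "C[Hg]C", "c1ccccc1Cl", "[Na+].[Cl-]", "CB(O)O"]
    print(keep_common_only(candidates))
```

Boron is in the SMILES organic subset but not in the pattern's element list, so a boronic acid is removed along with the metal-containing entries.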

  14. Counting things • Count matches to patterns defined in SMARTS • Molecular formula • H-donors • H-acceptors • Rotatable bonds • Chiral centres • Rings • Fragments

  15. Example • Molecular formula: C13H22N4O3S • H-donors: 2 • H-acceptors: 6 • Rotatable bonds: 8 • Chiral centres: 1 • Rings: 1 • Fragments: 6

  16. Estimating Measured Properties • Any property which is an additive constitutive property of a molecule can be calculated by • counting the matches of the constituent patterns • looking up the weight for each pattern • summing the products of the counts and the individual pattern weights • applying any correction factors
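The four steps above amount to a weighted sum plus corrections. A minimal sketch in Python follows; the function name `estimate_additive_property`, the pattern names and all numeric values are hypothetical and chosen only to show the arithmetic.

```python
def estimate_additive_property(counts, weights, corrections=()):
    """Sum count * weight over all patterns, then add any correction terms.

    counts      -- mapping of pattern name to number of matches in the molecule
    weights     -- mapping of pattern name to its per-match contribution
    corrections -- iterable of additive correction factors
    """
    missing = [pattern for pattern in counts if pattern not in weights]
    if missing:
        raise KeyError(f"no weight available for patterns: {missing}")
    total = sum(n * weights[pattern] for pattern, n in counts.items())
    return total + sum(corrections)


if __name__ == "__main__":
    # Hypothetical pattern counts and weights, for illustration only.
    counts = {"methyl": 2, "methylene": 2}
    weights = {"methyl": 0.5, "methylene": 0.25}
    print(estimate_additive_property(counts, weights, corrections=[-0.1]))
```

Raising an error for a pattern with no weight, rather than silently treating it as zero, keeps a missing table entry from passing unnoticed into the estimate.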

  17. Examples of properties to calculate • Molecular Weight • logP • Parachor • Molar Volume • Molar Refractivity • ……….

  18. Molecular weight: a simple example • Molecular weight • Molecular formula • MW = Σ_i count(atom(i)) × atomic_weight(atom(i)) • Accuracy depends on accuracy of atomic weights (IUPAC) • C13H22N4O3S • 314.45 (average molecular weight) • 314.141235 (accurate mass of commonest isotope)
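The sum over atoms can be sketched directly from the molecular formula. The Python below is an illustration under stated assumptions: the function names are invented, the formula parser handles only flat formulas without parentheses, and the two small tables hold abridged standard atomic weights and most-abundant-isotope masses for just the five elements in the example.

```python
import re

# Abridged standard atomic weights (average over natural isotopes).
AVERAGE_WEIGHT = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999, "S": 32.06}

# Masses of the most abundant isotope of each element.
MONOISOTOPIC_MASS = {
    "H": 1.00782503, "C": 12.0, "N": 14.0030740, "O": 15.9949146, "S": 31.9720707,
}


def parse_formula(formula):
    """Turn a flat formula such as 'C13H22N4O3S' into {element: count}."""
    counts = {}
    for symbol, digits in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] = counts.get(symbol, 0) + (int(digits) if digits else 1)
    return counts


def formula_mass(formula, table=AVERAGE_WEIGHT):
    """Sum count(atom) * weight(atom) over the elements of the formula."""
    return sum(n * table[element] for element, n in parse_formula(formula).items())


if __name__ == "__main__":
    print(round(formula_mass("C13H22N4O3S"), 3))
    print(round(formula_mass("C13H22N4O3S", MONOISOTOPIC_MASS), 5))
```

With these abridged weights the average value comes out near 314.40 rather than the 314.45 quoted on the slide, which may reflect a different weight table, and the monoisotopic value comes out near 314.1413.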

  19. CLOGP: A more complicated example • Algorithmic definition of fragment • Pattern = NOT an isolating carbon • Match the pattern to find all the fragments • Look up the fragment value(s) ( if it exists ) using the unique string(s) from the match. • Accumulate the values for fragments and non-fragments (isolating carbons). • Correct for proximity

  20. CLOGP example • 2 * Cl +1.880 • guanidyl –1.930 • 2 * C +0.390 • 6 * c +0.780 • 7 * H +1.589 • Proximity –0.984 • Total +1.727

  21. Estimating values for concepts • Flexibility • Ratio of number of rotatable bonds to total number of bonds • Rigidity • Molecular similarity between original molecule and molecules formed by breaking all rotatable bonds • Difficulty of synthesis • Ratio of number of potential chiral centres weighted for rings to total number of heavy atoms in a molecule

  22. Example • Flexibility: 0.38 • Rigidity: 0.3819 • Difficulty of synthesis: 0.05

  23. Example • Flexibility: 0.38 (0.00) • Rigidity: 0.3819 (1.00) • Difficulty of synthesis: 0.05 (0.85) • Figures in parentheses are for morphine

  24. Relationships between compounds • Compound sets • Molecular descriptors • Fingerprints etc • Similarity measures • Tanimoto etc • Clustering • Jarvis-Patrick etc
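The Tanimoto measure named above compares two fingerprints by the ratio of shared bits to bits set in either. A minimal sketch in Python, assuming each fingerprint is represented as a set of "on" bit positions; the example fingerprints are hypothetical, and returning 0.0 for two empty fingerprints is a convention chosen here, not one taken from the slides.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient: |A and B| / |A or B| over sets of 'on' bit positions."""
    a, b = set(bits_a), set(bits_b)
    union = len(a | b)
    if union == 0:
        return 0.0  # assumed convention when neither fingerprint has a bit set
    return len(a & b) / union


if __name__ == "__main__":
    # Hypothetical fingerprints, given as positions of set bits.
    fp_one = {1, 2, 3, 4}
    fp_two = {3, 4, 5, 6}
    print(tanimoto(fp_one, fp_two))
```

The value runs from 0 (no bits in common) to 1 (identical bit sets), so a clustering method can use `1 - tanimoto(a, b)` as a distance.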

  25. Relationships between compounds • Mixtures • Molecular descriptors • Modal Fingerprints etc • Similarity measures • Tanimoto etc • Prototypes • Family Resemblance

  26. Relationships between compounds • Reactions • Molecular descriptors • Fingerprints • Rôles • Schemes/pathways • Similarity and clustering

  27. Examples • Creating a spreadsheet of properties. • Non-standard fingerprinting and similarity.

  28. Don’t let the hedgehogs take over…..

  29. Don’t let the hedgehogs take over…..