Chemicals are structured as graphs. Vertices and edges correspond to atoms and bonds.

Kernel Functions for Chemical Classification
Aaron Smalter, Jun Huan, Gerald Lushington
{asmalter,jhuan,glushington}@ku.edu

Chemical and Graph Classification
• Chemicals are structured as graphs.
• Vertices and edges correspond to atoms and bonds.
• Chemical graphs are labeled and undirected.
• Graph classification is critical for drug development and screening.
• Sifting through large databases of compounds requires efficiency.
• The costs of chemical manufacture and assay experiments make accuracy essential.
• Traditional chemical classifiers use vector representations of chemicals, neglecting the rich structure of graph models.

Support Vector Machine
• SVM is a fast, accurate classifier designed for vector data.
• Crucially, SVM accesses the data only through inner products between pairs of input vectors.
• SVM can therefore separate non-linear data distributions linearly by applying the kernel trick: replacing the inner product <x,y> with a similarity function K(x,y) that acts as an inner product in an implicit feature space.
• The key is that the kernel function K can be defined on non-vector data, allowing SVM to operate directly on structured data such as graphs.

Figure 1. Using graphs to model chemicals.
Figure 2. A kernel function maps nonlinear data (left) into a linearly separable space (right).

Graph Kernel Functions
• The problem of chemical graph classification therefore shifts from finding vector representations of graphs to defining high-quality kernel functions that compare graphs.
• Previous kernel functions either:
  • decompose graphs into substructures such as paths, cycles, and trees, or
  • optimally assign vertices based on neighborhood similarity.
• Their respective limitations are:
  • dependency on a particular decomposition, and costly pattern enumeration;
  • inefficient recursive comparison, plus a flaw that keeps them from being true kernel functions (the optimal assignment score is not guaranteed to be positive semidefinite).

Our Work
• We improve graph kernels with several ideas:
  • Embed frequent patterns by using their occurrences as features in the graph. [1]
  • Use wavelet functions to compress neighborhood information. [2]
  • Avoid finding an optimal assignment by using set matching, summing the kernels between all vertex pairs.

Figure 3. Finding an optimal assignment using a bipartite graph.
Figure 4. Frequent patterns annotate graph vertices.
Figure 5. A wavelet function overlays a chemical graph.
Figure 6. Comparison of graph kernels; our GPM method performs best overall.

This work was supported by K-INBRE (NIH/NCRR award #P20 RR016475), the KU CMLD (NIH/NIGMS award #P50 GM069663), and NIH grant #R01 GM868665.

[1] A. Smalter, J. Huan, and G. Lushington. Chemical Compound Classification with Automatically Mined Structure Patterns. Proc. of the 6th Asia Pacific Bioinformatics Conference (APBC), 2008.
[2] A. Smalter, J. Huan, and G. Lushington. Graph Wavelet Alignment Kernels for Drug Virtual Screening. Proc. of the 7th Annual Int. Conf. on Computational Systems Bioinformatics, 2008.
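The kernel trick described under Support Vector Machine can be checked on a small case. This sketch uses a homogeneous degree-2 polynomial kernel, K(x,y) = <x,y>^2, on 2-D vectors; the kernel, the feature map phi, and the sample vectors are illustrative choices, not taken from the poster. Evaluating K directly gives the same number as an ordinary inner product in a 3-D feature space that K never builds.

```python
import math


def dot(x, y):
    """Ordinary inner product <x, y>."""
    return sum(a * b for a, b in zip(x, y))


def poly2_kernel(x, y):
    """Homogeneous degree-2 polynomial kernel K(x, y) = <x, y>^2."""
    return dot(x, y) ** 2


def phi(x):
    """Explicit feature map for 2-D input that matches poly2_kernel:
    (x1^2, sqrt(2) * x1 * x2, x2^2)."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)


x, y = (1.0, 2.0), (3.0, 4.0)
print(poly2_kernel(x, y))    # computed in the 2-D input space
print(dot(phi(x), phi(y)))   # same value, up to rounding, in the 3-D feature space
```

Graph kernels play the role of K in the same way: they return a similarity for two graphs without requiring an explicit vector for either one.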

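The first idea under Our Work, using frequent-pattern occurrences as vertex features [1], can be illustrated with a deliberately simplified miner. This sketch assumes patterns are restricted to labeled simple paths (the pattern class and mining procedure of [1] are not described on the poster), and the three hydrogen-suppressed molecules, the path length limit, and the support threshold are hypothetical toy inputs. A path pattern is frequent if it occurs in at least `min_support` graphs; each vertex is then annotated with one 0/1 flag per frequent pattern it lies on, in the spirit of Figure 4.

```python
from collections import defaultdict


def adjacency(n, edges):
    """Adjacency lists for an undirected graph with vertices 0..n-1."""
    adj = [[] for _ in range(n)]
    for i, j in edges:
        adj[i].append(j)
        adj[j].append(i)
    return adj


def path_occurrences(labels, edges, max_len):
    """Map each label path of 1..max_len vertices to the vertex sets of the simple
    paths that spell it. A path and its reverse count as the same pattern."""
    adj = adjacency(len(labels), edges)
    occ = defaultdict(set)

    def extend(path):
        key = tuple(labels[u] for u in path)
        key = min(key, key[::-1])  # canonical form for an undirected path
        occ[key].add(frozenset(path))
        if len(path) < max_len:
            for w in adj[path[-1]]:
                if w not in path:
                    extend(path + [w])

    for v in range(len(labels)):
        extend([v])
    return occ


def frequent_patterns(graphs, max_len, min_support):
    """Label paths occurring in at least `min_support` of the graphs, sorted."""
    support = defaultdict(int)
    for labels, edges in graphs:
        for key in path_occurrences(labels, edges, max_len):
            support[key] += 1
    return sorted(key for key, s in support.items() if s >= min_support)


def annotate_vertices(labels, edges, patterns, max_len):
    """One 0/1 feature per pattern for each vertex: 1 if the vertex lies on an occurrence."""
    occ = path_occurrences(labels, edges, max_len)
    feats = [[0] * len(patterns) for _ in labels]
    for i, key in enumerate(patterns):
        for vertex_set in occ.get(key, ()):
            for v in vertex_set:
                feats[v][i] = 1
    return feats


# Hypothetical toy dataset of hydrogen-suppressed molecules: (atom labels, bonds).
ETHANOL = (["C", "C", "O"], [(0, 1), (1, 2)])
METHANOL = (["C", "O"], [(0, 1)])
PROPANE = (["C", "C", "C"], [(0, 1), (1, 2)])

patterns = frequent_patterns([ETHANOL, METHANOL, PROPANE], max_len=3, min_support=2)
print(patterns)  # [('C',), ('C', 'C'), ('C', 'O'), ('O',)]
print(annotate_vertices(*ETHANOL, patterns, max_len=3))  # [[1, 1, 0, 0], [1, 1, 1, 0], [0, 0, 1, 1]]
```

The three-atom paths C-C-O and C-C-C each occur in only one molecule, so they are filtered out. A vertex kernel could then compare these flag vectors, for example by a dot product added to a label-match term; that combination is an assumption of this sketch.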

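The second idea, using a wavelet function to compress neighborhood information [2], is sketched below under loudly stated assumptions: the numeric vertex feature (atomic number), the hop radius h, and the Haar-like weights (the Haar function on [0, 1), +1 on the inner half and -1 on the outer half, sampled at each hop shell's midpoint) are choices made for illustration, not the construction used in [2]. For a vertex v, the feature is averaged over each hop shell (vertices at distance 0, 1, ..., h), and the shell averages are combined with the wavelet weights into a single number summarizing how v's surroundings differ from v itself.

```python
from collections import deque


def hop_distances(n, edges, source):
    """Breadth-first search: hop distance from `source` to every vertex (None if unreachable)."""
    adj = [[] for _ in range(n)]
    for i, j in edges:
        adj[i].append(j)
        adj[j].append(i)
    dist = [None] * n
    dist[source] = 0
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if dist[w] is None:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def haar_weights(h):
    """Hypothetical Haar-like weights for hop shells 0..h: the Haar function on [0, 1)
    (+1 on [0, 0.5), -1 on [0.5, 1)) sampled at each shell's midpoint."""
    return [1.0 if (j + 0.5) / (h + 1) < 0.5 else -1.0 for j in range(h + 1)]


def wavelet_feature(values, edges, v, h):
    """Compress vertex v's h-hop neighborhood into one number: the wavelet-weighted
    sum of the average feature value on each hop shell (distance 0, 1, ..., h)."""
    dist = hop_distances(len(values), edges, v)
    total = 0.0
    for j, weight in enumerate(haar_weights(h)):
        shell = [values[u] for u, d in enumerate(dist) if d == j]
        if shell:  # shells beyond the molecule's extent contribute nothing
            total += weight * sum(shell) / len(shell)
    return total


# Toy ethanol skeleton (C-C-O) with atomic numbers as the vertex feature.
ATOMIC_NUMBERS = [6, 6, 8]
BONDS = [(0, 1), (1, 2)]
print([wavelet_feature(ATOMIC_NUMBERS, BONDS, v, h=1) for v in range(3)])  # [0.0, -1.0, 2.0]
```

The middle carbon scores negative because its first shell contains the heavier oxygen atom, while the oxygen scores positive because it is heavier than its only neighbor. The point of the design is that a fixed-length summary per vertex replaces an explicit, recursive comparison of whole neighborhoods.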

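The third idea, set matching (summing a vertex kernel over all vertex pairs instead of searching for an optimal assignment), can be contrasted with the optimal-assignment approach of Figure 3 in a small sketch. The vertex kernel here (an atom-label match plus a dot product of neighbor-label and bond-order counts), the toy molecules, and all function names are hypothetical; the poster does not specify them. The optimal assignment is found by brute force over permutations, which is only practical for tiny graphs and is included solely for comparison.

```python
from collections import Counter
from itertools import permutations

# Toy hydrogen-suppressed molecules as labeled, undirected graphs:
# one atom label per vertex, bonds as (i, j, bond_order). Illustrative data only.
ETHANOL = {"atoms": ["C", "C", "O"], "bonds": [(0, 1, 1), (1, 2, 1)]}
ACETALDEHYDE = {"atoms": ["C", "C", "O"], "bonds": [(0, 1, 1), (1, 2, 2)]}
METHANOL = {"atoms": ["C", "O"], "bonds": [(0, 1, 1)]}


def neighborhoods(g):
    """For each vertex, count its (neighbor label, bond order) pairs."""
    hist = [Counter() for _ in g["atoms"]]
    for i, j, order in g["bonds"]:
        hist[i][(g["atoms"][j], order)] += 1
        hist[j][(g["atoms"][i], order)] += 1
    return hist


def vertex_kernel(label_u, hist_u, label_v, hist_v):
    """Hypothetical vertex kernel: label match plus neighbor-count dot product.
    Both terms are positive semidefinite, so their sum is as well."""
    same = 1.0 if label_u == label_v else 0.0
    shared = sum(count * hist_v[key] for key, count in hist_u.items())
    return same + shared


def pair_matrix(g1, g2):
    """Vertex-kernel values for every (vertex of g1, vertex of g2) pair."""
    h1, h2 = neighborhoods(g1), neighborhoods(g2)
    return [[vertex_kernel(a, ha, b, hb) for b, hb in zip(g2["atoms"], h2)]
            for a, ha in zip(g1["atoms"], h1)]


def all_pairs_kernel(g1, g2):
    """Set matching: sum the vertex kernel over all vertex pairs, no assignment step."""
    return sum(sum(row) for row in pair_matrix(g1, g2))


def optimal_assignment_score(g1, g2):
    """Best one-to-one vertex assignment, by brute force (tiny graphs only)."""
    m = pair_matrix(g1, g2)
    if len(m) > len(m[0]):  # map the smaller vertex set into the larger one
        m = [list(col) for col in zip(*m)]
    n_small, n_large = len(m), len(m[0])
    return max(sum(m[i][p[i]] for i in range(n_small))
               for p in permutations(range(n_large), n_small))


for name, other in [("ethanol", ETHANOL), ("acetaldehyde", ACETALDEHYDE),
                    ("methanol", METHANOL)]:
    print(name, all_pairs_kernel(ETHANOL, other),
          optimal_assignment_score(ETHANOL, other))
```

Because the all-pairs score equals the inner product of the two graphs' summed vertex feature vectors, it remains a valid (positive semidefinite) kernel whenever the vertex kernel is one. Taking a maximum over assignments gives no such guarantee, and it also requires solving an assignment problem for every pair of graphs.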