10 likes | 137 Views
Kernel Functions for Chemical Classification Aaron Smalter, Jun Huan, Gerald Lushington {asmalter,jhuan,glushington}@ku.edu. Chemical and Graph Classification. Support Vector Machine. Chemicals are structured as graphs. Vertices and edges correspond to atoms and bonds. Labeled, undirected.
E N D
Kernel Functions for Chemical ClassificationAaron Smalter, Jun Huan, Gerald Lushington{asmalter,jhuan,glushington}@ku.edu Chemical and Graph Classification Support Vector Machine • Chemicals are structured as graphs. • Vertices and edges correspond to atoms and bonds. • Labeled, undirected. • Graph classification is critical for drug development and screening. • Sifting through large databases of compounds requires efficiency. • Costs of chemical manufacture and assay experiments necessitate accuracy. • Traditional chemical classifiers use vector representations of chemicals, neglecting the rich structure of graph models. • SVM is a fast, accurate classifier designed for vector data. • Crucially, SVM internally represents data points as inner products between pair of input vectors. • SVM can then linearly classify non-linear data distributions by applying the kernel trick, and replacing the inner product <x,y> with some similarity measurement function, K(x,y) • The key is that this kernel function K can be defined on non-vector data, allowing direct operation on structured data such as graphs. Figure 1. Using graphs to model chemicals. Figure 2. A kernel function maps nonlinear data (left) into a linearly separable space (right). Graph Kernel Functions Our Work • Problem of chemical graph classification changes: • fromfinding vector representations of graphs, to defining high-quality kernel functions to compare graphs. • Previous kernel functions - • Decompose graphs into substructures such as paths, cycles, and trees. • Optimally assign vertices based on neighborhood similarity. • Respective limitations are: • Dependency on particular decompositions; pattern enumeration time. • Inefficient recursive comparison and a flaw rendering them not true kernel functions. • We can improve graph kernels with several ideas: • Embed frequent patterns by using their occurrences as features in the graph. [1] • Use wavelet functions to compress neighborhood information.[2] • Avoid finding an optimal assignment by using setmatching and summing the kernels between all vertex pairs. Figure 4. Frequent patterns annotate graph vertices. Figure 3. Finding an optimal assignment using a bipartite graph. Figure 5. A wavelet function overlays a chemical graph. Fig 6. Comparing graph kernels, our GPM method performs best overall. This work supported by K-INBRE (NIH/NCRR award #P20 RR016475), the KU CMLD (NIH/NIGM award #P50 GM069663), and NIH grant #R01 GM868665. [1] A. Smalter, J. Huan, G. Lushington. Chemical Compound Classification with Automatically Mined Structure Patterns. Proc. of the 6th Asia Pacific Bioinformatics Conference (APBC). 2008. [2] A. Smalter, J. Huan, G. Lushington. Graph Wavelet Alignment Kernels for Drug Virtual Screening. Proc. of the 7th Annual Int. Conf. On Computational Systems Bioinformatics. 2008.