Fast Jensen-Shannon Graph Kernel

Bai Lu and Edwin Hancock, Department of Computer Science, University of York. Supported by a Royal Society Wolfson Research Merit Award.

Structural Variations

Protein-Protein Interaction Networks

Manipulating graphs • Is structure similar (graph isomorphism, inexact match)? • Is complexity similar (are graphs from same class but different in detail)? • Is complexity (type of structure) uniform?

Goals • Can we determine the similarity of structures using measures that capture their intrinsic complexity? • Can graph entropies be used for this purpose? • If they can, they lead naturally to information-theoretic kernels and description length for learning over graph data.

Outline • Literature Review: State-of-the-Art Graph Kernels • Existing graph kernel methods: graph kernels based on a) walks, b) paths, or c) subgraph or subtree structures. • Prior Work: We recently developed an information-theoretic graph kernel based on the Jensen-Shannon divergence between probability distributions on graphs. • Fast Jensen-Shannon Graph Kernel: • Based on a depth-based subgraph representation of a graph • Built around the graph centroid • Experiments • Conclusion

Literature Review: Graph Kernels • Existing graph kernels (i.e. graph kernels from the R-convolution framework [Haussler, 1999]) fall into three classes: • Restricted subgraph or subtree kernels • Weisfeiler-Lehman subtree kernel [Shervashidze et al., 2009, NIPS] • Random walk kernels • Product graph kernels [Gartner et al., 2003, ICML] • Marginalized kernels on graphs [Kashima et al., 2003, ICML] • Path-based kernels • Shortest path kernel [Borgwardt, 2005, ICDM]

Motivation • Limitations of existing graph kernels • They cannot scale up to substructures of large size (e.g. (sub)graphs with hundreds or even thousands of vertices). They are restricted to substructures of limited size and only roughly capture the topological arrangement within a graph. • Even for relatively small subgraphs, most graph kernels still incur significant computational overheads. • Aim: develop a novel subgraph kernel that is efficient to compute, even when a pair of full-sized subgraphs is compared.

Approach • Investigate how to kernelize depth-based graph representations by measuring the similarity of K-layer subgraphs using the Jensen-Shannon divergence. • Commence by showing how to compute a fast Jensen-Shannon diffusion kernel for a pair of (sub)graphs. • Describe how to compute a fast depth-based graph representation, based on complexity of structure. • Combine these ideas to compute a fast Jensen-Shannon subgraph kernel.

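Notation • Consider a graph $G(V, E)$; its adjacency matrix $A$ has elements $A(i,j) = 1$ if $(v_i, v_j) \in E$ and $A(i,j) = 0$ otherwise. • The vertex degree matrix of $G$ is the diagonal matrix $D$ given by $D(i,i) = d(v_i) = \sum_{v_j \in V} A(i,j)$. • Normalised Laplacian and its spectrum: $\hat{L} = D^{-1/2}(D - A)D^{-1/2} = \hat{\Phi}\hat{\Lambda}\hat{\Phi}^{T}$, where $\hat{\Lambda} = \mathrm{diag}(\hat{\lambda}_1, \ldots, \hat{\lambda}_{|V|})$ holds the eigenvalues and the columns of $\hat{\Phi}$ are the eigenvectors.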
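
The notation above can be sketched numerically as follows. This is a minimal illustration assuming NumPy and a small undirected graph given as a symmetric 0/1 adjacency matrix; the function name `normalised_laplacian` and the treatment of isolated vertices are assumptions made for the example, not part of the slides.

```python
import numpy as np

def normalised_laplacian(A):
    """Return D^{-1/2} (D - A) D^{-1/2} for a symmetric 0/1 adjacency matrix."""
    A = np.asarray(A, dtype=float)
    d = A.sum(axis=1)                          # vertex degrees d(v_i)
    # Assumption: isolated vertices (degree 0) get a zero scaling factor.
    inv_sqrt = np.zeros_like(d)
    inv_sqrt[d > 0] = 1.0 / np.sqrt(d[d > 0])
    L = np.diag(d) - A                         # combinatorial Laplacian D - A
    return inv_sqrt[:, None] * L * inv_sqrt[None, :]

# Path graph on three vertices: 0 - 1 - 2
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]])
L_hat = normalised_laplacian(A)
eigenvalues = np.linalg.eigvalsh(L_hat)        # spectrum in ascending order
print(eigenvalues)
```

For this three-vertex path the spectrum is 0, 1 and 2 up to floating-point error.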

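The Jensen-Shannon Diffusion Kernel • Jensen-Shannon diffusion kernel for graphs: • For graphs $G_p$ and $G_q$, the Jensen-Shannon divergence is $D_{JS}(G_p, G_q) = H(G_p \oplus G_q) - \frac{H(G_p) + H(G_q)}{2}$, where $H(G_p \oplus G_q)$ is the entropy of the composite structure formed from the two (sub)graphs being compared (here we use the disjoint union). • The Jensen-Shannon diffusion kernel for $G_p$ and $G_q$ is $k_{JS}(G_p, G_q) = \exp(-\lambda D_{JS}(G_p, G_q))$, where $\lambda$ is a decay factor and the entropy $H(\cdot)$ is either the Shannon or the von Neumann entropy.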

Composite Structure • Composite entropy of the disjoint union • The disjoint union of a pair of graphs $G_p$ and $G_q$ is $G_{DU} = G_p \cup G_q = \{V_p \cup V_q, E_p \cup E_q\}$. Graphs $G_p$ and $G_q$ are the connected components of the disjoint union graph $G_{DU}$. • Let $p = |V_p|/|V_{DU}|$ and $q = |V_q|/|V_{DU}|$. • The entropy (i.e. the composite entropy) of $G_{DU}$ is $H(G_{DU}) = p\,H(G_p) + q\,H(G_q)$.
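
A small sketch of how the composite entropy feeds the divergence and the diffusion kernel. It assumes the forms H(G_DU) = p H(Gp) + q H(Gq), D_JS = H(G_DU) - (H(Gp) + H(Gq))/2 and k_JS = exp(-lambda D_JS). The function names and the decay parameter `lam` are hypothetical, and the entropies are passed in as precomputed numbers.

```python
import math

def composite_entropy(h_p, n_p, h_q, n_q):
    """Entropy of the disjoint union, weighted by vertex-count fractions."""
    n_du = n_p + n_q                           # |V_DU| = |V_p| + |V_q|
    return (n_p / n_du) * h_p + (n_q / n_du) * h_q

def js_divergence(h_p, n_p, h_q, n_q):
    """Composite entropy minus the mean of the two individual entropies."""
    return composite_entropy(h_p, n_p, h_q, n_q) - 0.5 * (h_p + h_q)

def js_diffusion_kernel(h_p, n_p, h_q, n_q, lam=1.0):
    """exp(-lam * D_JS); lam is an assumed decay factor."""
    return math.exp(-lam * js_divergence(h_p, n_p, h_q, n_q))

# Graph p: entropy 2.0 on 3 vertices; graph q: entropy 1.0 on 1 vertex.
d = js_divergence(2.0, 3, 1.0, 1)              # 1.75 - 1.5
k = js_diffusion_kernel(2.0, 3, 1.0, 1)
print(d, k)
```

Only the two entropies and the two vertex counts are needed at this stage, so once the entropies are known the kernel value itself costs constant time. Under this weighting, two graphs with the same number of vertices always give a divergence of zero.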

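Graph Entropy: Measures of complexity • Shannon entropy of a random walk: the probability of a steady-state random walk on $G$ visiting vertex $v_i$ is $P_G(i) = d(v_i) / \sum_{v_j \in V} d(v_j)$. The Shannon entropy of the steady-state random walk is $H_S(G) = -\sum_{i=1}^{|V|} P_G(i) \log P_G(i)$. • von Neumann entropy: the entropy associated with the normalised Laplacian eigenvalues $\hat{\lambda}_i$, $H_{VN}(G) = -\sum_{i=1}^{|V|} \frac{\hat{\lambda}_i}{|V|} \log \frac{\hat{\lambda}_i}{|V|}$. Approximated by (Han PRL12) $H_{VN}(G) \approx 1 - \frac{1}{|V|} - \sum_{(v_i, v_j) \in E} \frac{1}{|V|^2\, d(v_i)\, d(v_j)}$.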
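
Both entropies depend only on vertex degrees, which the following sketch makes concrete. It assumes an undirected graph given as a vertex count and an edge list, the natural logarithm, and the degree-based approximation of the von Neumann entropy; the function names are illustrative.

```python
import math

def vertex_degrees(n, edges):
    """Degree of each vertex 0..n-1 in an undirected edge list."""
    deg = [0] * n
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg

def shannon_entropy(n, edges):
    """Entropy of the steady-state random walk, P(i) = d_i / sum_j d_j."""
    deg = vertex_degrees(n, edges)
    total = sum(deg)
    return -sum((d / total) * math.log(d / total) for d in deg if d > 0)

def von_neumann_entropy_approx(n, edges):
    """Approximation 1 - 1/n - sum over edges of 1 / (n^2 d_u d_v)."""
    deg = vertex_degrees(n, edges)
    return 1.0 - 1.0 / n - sum(1.0 / (n * n * deg[u] * deg[v]) for u, v in edges)

# Path graph 0 - 1 - 2: degrees are [1, 2, 1].
edges = [(0, 1), (1, 2)]
h_s = shannon_entropy(3, edges)                # equals 1.5 * ln 2
h_vn = von_neumann_entropy_approx(3, edges)    # equals 5/9
print(h_s, h_vn)
```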

Properties • The Jensen-Shannon diffusion kernel for graphs: • The Jensen-Shannon diffusion kernel is positive definite (pd). This follows the definitions in [Kondor and Lafferty, 2002, ICML]: if a dissimilarity measure between a pair of graphs Gp and Gq satisfies symmetry, then a diffusion kernel associated with that measure is pd. • Time complexity: for a pair of graphs Gp and Gq, both having n vertices, computing the Jensen-Shannon diffusion kernel has time complexity O(n^2).

Idea • Decompose graph into layered subgraphs from centroid. • Use JSD to compare subgraphs. • Construct kernel over subgraphs.

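The Depth-Based Representation of a Graph • Subgraphs from the centroid vertex • For a graph $G(V,E)$, construct the shortest path matrix $S_G$ whose elements $S_G(i,j)$ are the shortest path lengths between vertices $v_i$ and $v_j$. The average-shortest-path vector $S_V$ for $G(V,E)$ is a vector with elements $S_V(i) = \frac{1}{|V|}\sum_{j} S_G(i,j)$, the average shortest path length from vertex $v_i$ to the remaining vertices. • The centroid vertex for $G(V,E)$ is $\hat{v}_C = v_{\hat{i}}$, where $\hat{i} = \arg\min_i \sum_j [S_G(i,j) - S_V(i)]^2$. • The K-layer centroid expansion subgraph is $G_K(V_K, E_K)$, where $V_K = \{u \in V \mid S_G(\hat{v}_C, u) \le K\}$ and $E_K = \{(u,v) \in E \mid u, v \in V_K\}$.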
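
The centroid construction can be sketched with breadth-first search. This example assumes an unweighted, connected graph stored as an adjacency dictionary with integer vertex labels, and it takes the centroid to be the vertex whose shortest path lengths have minimum variance. The function names are hypothetical, and ties are broken by the smallest label, which is a choice made only for this illustration.

```python
from collections import deque

def shortest_path_lengths(adj, source):
    """Breadth-first search distances (in edges) from source."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def centroid_vertex(adj):
    """Vertex with minimum variance of shortest path lengths (assumed rule)."""
    best, best_var = None, float("inf")
    for v in sorted(adj):
        dist = shortest_path_lengths(adj, v)
        lengths = [dist[u] for u in adj]       # assumes the graph is connected
        mean = sum(lengths) / len(lengths)
        var = sum((x - mean) ** 2 for x in lengths)
        if var < best_var:
            best, best_var = v, var
    return best

def k_layer_subgraph(adj, centre, k):
    """Vertices within distance k of the centre, plus the edges among them."""
    dist = shortest_path_lengths(adj, centre)
    vertices = {u for u, d in dist.items() if d <= k}
    edges = {(u, v) for u in vertices for v in adj[u] if v in vertices and u < v}
    return vertices, edges

# Path graph 0 - 1 - 2 - 3 - 4
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
centre = centroid_vertex(path)
layer_1 = k_layer_subgraph(path, centre, 1)
print(centre, layer_1)
```

On this path the middle vertex 2 is selected, and the 1-layer subgraph contains vertices 1, 2, 3 with the two edges between them.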

Depth-Based Representation • For a graph $G$, we obtain a family of centroid expansion subgraphs $\{G_1, \ldots, G_K, \ldots, G_L\}$; the depth-based representation of $G$ is defined as $DB(G) = [H(G_1), \ldots, H(G_K), \ldots, H(G_L)]^T$, where $H(\cdot)$ is either the Shannon entropy or the von Neumann entropy. • Measures complexity via the variation of entropy with depth.
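
As a rough sketch, the depth-based representation is a list of entropies, one per expansion layer. The example below assumes the Shannon entropy of the steady-state random walk, a centre vertex supplied by the caller, and degrees counted inside each induced subgraph; the function names and the `max_depth` parameter are illustrative.

```python
import math
from collections import deque

def bfs_distances(adj, source):
    """Shortest path lengths (in edges) from source to every reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def shannon_entropy(degrees):
    """Entropy of the steady-state random walk, P(i) = d_i / sum_j d_j."""
    total = sum(degrees)
    return -sum((d / total) * math.log(d / total) for d in degrees if d > 0)

def depth_based_representation(adj, centre, max_depth):
    """Entropies of the K-layer expansion subgraphs for K = 1..max_depth."""
    dist = bfs_distances(adj, centre)
    representation = []
    for k in range(1, max_depth + 1):
        layer = {u for u, d in dist.items() if d <= k}
        # Degrees are counted inside the induced subgraph only.
        sub_degrees = [sum(1 for v in adj[u] if v in layer) for u in layer]
        representation.append(shannon_entropy(sub_degrees))
    return representation

# Path graph 0 - 1 - 2 - 3 - 4, expanded from its middle vertex.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
db = depth_based_representation(path, centre=2, max_depth=2)
print(db)
```

For this path the two entries are 1.5 ln 2 and 2.25 ln 2, so the entropy grows as the subgraph expands away from the centre.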

The Depth-Based Representation • An example of the depth-based representation for a graph from the centroid vertex

Fast Jensen-Shannon Subgraph Kernel • For a pair of graphs $G_p(V_p, E_p)$ and $G_q(V_q, E_q)$, the similarity measure is $k_S(G_p, G_q) = \sum_{K=1}^{L} k_{JS}(G_{p;K}, G_{q;K})$, where $G_{p;K}$ and $G_{q;K}$ are the K-layer subgraphs of $G_p$ and $G_q$ and $k_{JS}$ is the Jensen-Shannon diffusion kernel; that is, it is summed over an entropy-based similarity measure for the K-layer subgraphs. • The Jensen-Shannon subgraph kernel is thus the sum of the diffusion kernel measures for all the pairs of K-layer subgraphs. • The Jensen-Shannon subgraph kernel is pd, because it is a sum of positive definite Jensen-Shannon diffusion kernels.
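
The layer-wise sum can be sketched as below. Each graph is assumed to be summarised by a list of (entropy, vertex count) pairs, one per K-layer subgraph, and the diffusion kernel is assumed to take the form exp(-lambda D_JS) with the composite entropy weighted by vertex-count fractions. Truncating the sum to the layers both graphs share is an assumption of this example, as are the function names and the `lam` parameter.

```python
import math

def js_diffusion_kernel(h_p, n_p, h_q, n_q, lam=1.0):
    """exp(-lam * D_JS) from two entropies and two vertex counts."""
    n_du = n_p + n_q
    composite = (n_p / n_du) * h_p + (n_q / n_du) * h_q
    divergence = composite - 0.5 * (h_p + h_q)
    return math.exp(-lam * divergence)

def js_subgraph_kernel(layers_p, layers_q, lam=1.0):
    """Sum of diffusion kernels over K-layer subgraph pairs with equal K.

    layers_x[K-1] holds (entropy, vertex count) for the K-layer subgraph.
    zip() stops at the shallower graph, an assumption made for this sketch.
    """
    return sum(js_diffusion_kernel(h_p, n_p, h_q, n_q, lam)
               for (h_p, n_p), (h_q, n_q) in zip(layers_p, layers_q))

# Hypothetical layer summaries for two graphs with two layers each.
layers_p = [(1.0, 3), (2.0, 5)]
layers_q = [(0.5, 4), (1.5, 6)]
k_pq = js_subgraph_kernel(layers_p, layers_q)
k_pp = js_subgraph_kernel(layers_p, layers_p)
print(k_pq, k_pp)
```

Only subgraphs with the same K are paired, so the number of kernel evaluations grows linearly with the number of layers rather than with the number of subgraph pairs.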

Time Complexity • The subgraph kernel, for graphs with n vertices and m edges, has time complexity O(n^2 L + mn), where L is the size of the largest layer of the expansion subgraph. • Computing the depth-based representation is O(n^2 L + mn). • Computing the Jensen-Shannon diffusion kernel is O(n^2).

Observations • Advantages • a) The von Neumann entropy is associated with the degree variance of connected vertices, so the subgraph kernel is sensitive to interconnections between vertex clusters. • b) For the Shannon entropy, vertices with large degrees dominate the entropy, so the subgraph kernel is suited to characterizing a group of highly interconnected vertices, i.e. a dominant cluster. • c) The depth-based representation captures inhomogeneities of complexity with depth. This enables it to gauge structure more finely than straightforwardly applying the Jensen-Shannon diffusion kernel to the original graphs. • d) The proposed subgraph kernel only compares pairs of subgraphs with the same layer size K. This avoids enumerating all pairs of subgraphs and makes the computation efficient. • e) It overcomes the subgraph size restriction that arises in existing graph kernels.

Experiments (New, not in the paper) • We evaluate the classification performance of our kernel using 10-fold cross-validation with a C-Support Vector Machine. (Intel i5 3210M 2.5GHz) • Classification of graphs abstracted from bioinformatics and computer vision databases. The datasets include: GatorBait (3D shapes), DD, COIL5 (images), CATH1, CATH2. • Graph kernels for comparison include: a) our kernel: 1) using the Shannon entropy (JSSS), 2) using the von Neumann entropy (JSSV), b) the Weisfeiler-Lehman subtree kernel (WL), c) the shortest path graph kernel (SPGK), d) the graphlet count kernel (GCGK)

Experiments • Details of the datasets

Experiments Classification Timing

Conclusion and Further Work • Conclusion: Presented a fast version of our Jensen-Shannon kernel. Compares well to alternatives on standard ML datasets. • Further Work: Hypergraphs, alternative entropies and divergences.
