Graph Mining Applications in Machine Learning Problems Max Planck Institute for Biological Cybernetics Koji Tsuda
Motivations for graph analysis • Existing methods assume "tables":

Serial Num | Name | Age | Sex | Address | …
0001 | ○○ | 40 | Male | Tokyo | …
0002 | ×× | 31 | Female | Osaka | …

• Structured data lie beyond this framework → New methods for analysis
Graph Structures in Biology • Compounds (figure: a molecular diagram of C, H, O atoms) • DNA sequences • RNA secondary structures (figure: a base-paired stem with CG pairs and a U-rich loop) • Texts in literature (e.g. "Amitriptyline inhibits adenosine uptake")
Overview • Path representation • Graph Kernel … Its disadvantages • Substructure representation • Graph Mining • EM-based Graph Clustering (Tsuda and Kudo, ICML 2006)
Marginalized Graph Kernels (Kashima, Tsuda, Inokuchi, ICML 2003) • We now define the kernel function • Both vertices and edges are labeled
Label path • Sequence of alternating vertex and edge labels • Generated by a random walk on the graph • Uniform initial, transition, and terminal probabilities
Kernel definition (figure: example label paths such as A-c-D-b-E and B-c-D-a-A) • Kernels for paths • Take the expectation over all possible paths! • Marginalized kernels for graphs (written out below)
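Spelled out (our reconstruction, consistent with the marginalized-kernel definition of Kashima et al., 2003: h and h′ range over label paths, p(h|G) is the probability that the random walk on G generates h, and k_z is a kernel between paths):

```latex
K(G, G') = \sum_{h}\sum_{h'} p(h \mid G)\, p(h' \mid G')\, k_z(h, h')
```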
Computation • Transition probability p_t; initial and terminal probabilities omitted here for brevity • A(v): set of paths ending at v (figure: vertex pair (v, v′) with path sets A(v), A(v′)) • K_V(v, v′): kernel computed from the paths ending at (v, v′) • K_V is written recursively • The kernel is obtained by solving a system of linear equations (polynomial time; see the sketch below)
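As a concrete illustration, here is a minimal Python sketch of this computation under simplifying assumptions that are ours, not the slides': uniform initial and transition probabilities, a constant termination probability `gamma`, a delta kernel on vertex labels, and unlabeled edges. It uses fixed-point iteration rather than an explicit linear solve; both converge to the same K_V.

```python
import numpy as np

def marginalized_graph_kernel(adj1, labels1, adj2, labels2, gamma=0.1, iters=50):
    """Minimal sketch of the marginalized graph kernel (Kashima et al., 2003).

    Simplifying assumptions (ours, not the slides'): uniform initial and
    transition probabilities, a constant termination probability `gamma`,
    a delta kernel on vertex labels, and unlabeled edges. R[v, w] plays
    the role of K_V: the expected path-kernel value over walks ending at
    (v, w), computed by fixed-point iteration instead of a linear solve.
    """
    n1, n2 = len(labels1), len(labels2)
    deg1, deg2 = adj1.sum(axis=1), adj2.sum(axis=1)
    # match[v, w] = 1 iff the vertex labels agree (delta kernel)
    match = (np.array(labels1)[:, None] == np.array(labels2)[None, :]).astype(float)
    R = np.zeros((n1, n2))
    for _ in range(iters):
        T = np.zeros((n1, n2))
        for v in range(n1):
            for w in range(n2):
                if deg1[v] > 0 and deg2[w] > 0:
                    # simultaneous transition to all neighbor pairs, uniform probs
                    nb = (np.outer(adj1[v], adj2[w]) * R).sum()
                    T[v, w] = (1 - gamma) ** 2 * nb / (deg1[v] * deg2[w])
        R = match * (gamma ** 2 + T)  # either both walks stop, or both continue
    return R.sum() / (n1 * n2)        # uniform initial probabilities

# Toy usage: two labeled triangles
A = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]])
print(marginalized_graph_kernel(A, ['C', 'C', 'O'], A, ['C', 'O', 'O']))
```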
Graph Kernel Applications • Chemical Compounds (Mahe et al., 2005) • Protein 3D structures (Borgwardt et al, 2005) • RNA graphs (Karklin et al., 2005) • Pedestrian detection • Signal Processing
Strong points of MGK • Polynomial time computation O(n^3) • Positive definite kernel • Usable with Support Vector Machines, Kernel PCA, Kernel CCA, and so on (see the example below)
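Because the result is a valid positive definite kernel matrix, it plugs directly into any kernel method. For instance, with scikit-learn's precomputed-kernel interface (illustrative toy numbers, not from the slides):

```python
import numpy as np
from sklearn.svm import SVC

# Toy stand-in for a precomputed marginalized-kernel matrix on 4 training graphs
K_train = np.array([[1.0, 0.8, 0.1, 0.2],
                    [0.8, 1.0, 0.2, 0.1],
                    [0.1, 0.2, 1.0, 0.9],
                    [0.2, 0.1, 0.9, 1.0]])
y = [0, 0, 1, 1]

clf = SVC(kernel='precomputed')  # SVM on top of any positive definite kernel
clf.fit(K_train, y)

# At test time, pass kernel values between test graphs and training graphs
K_test = np.array([[0.9, 0.7, 0.2, 0.1]])  # one test graph vs. 4 training graphs
print(clf.predict(K_test))
```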
Drawbacks of graph kernels • Global similarity measure • Fails to capture subtle differences • Long paths are suppressed • Results are not interpretable • Structural features are ignored (e.g. loops) • Without labels, the kernel degenerates (always 1)
Substructure Representation • 0/1 vector of pattern indicators • Huge dimensionality! • Graph mining is needed to select feature patterns
Graph Mining • Subfield of Data Mining • Common at KDD, ICDM, PKDD; less so at ICML, NIPS • Analysis of graph databases • Frequent Substructure Mining • Combinatorial algorithms, developed relatively recently • AGM (Inokuchi et al., 2000), gSpan (Yan and Han, 2002), Gaston (Nijssen and Kok, 2004)
Graph Mining • Frequent Substructure Mining • Enumerate all patterns occurring in at least m graphs • x_{ik} ∈ {0,1}: indicator of pattern k occurring in graph i • support(k) = Σ_i x_{ik}: number of graphs containing pattern k
Enumeration on a Tree-shaped Search Space • Each node of the search tree holds a pattern • Nodes are generated from the root by adding one edge at each step
Tree Pruning • support(g): number of graphs containing pattern g • Anti-monotonicity: a super-pattern's support never exceeds support(g) • If support(g) < m, stop exploring: the entire subtree is never generated (see the sketch below)
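In code, the whole enumeration-with-pruning scheme fits in a few lines. A minimal sketch with hypothetical `extend`/`support` oracles (demonstrated on itemsets, since a real subgraph-isomorphism test would dwarf the example):

```python
def mine_frequent(root, extend, support, min_support):
    """Generic pattern-growth search over the tree-shaped space (a sketch;
    `extend` and `support` are hypothetical oracles, not gSpan's API).

    extend(pattern)  -> child patterns, each with one added element/edge
    support(pattern) -> number of database graphs containing the pattern
    """
    frequent, stack = [], [root]
    while stack:
        p = stack.pop()
        if support(p) < min_support:
            continue  # anti-monotonicity: no descendant of p can be frequent
        frequent.append(p)
        stack.extend(extend(p))
    return frequent

# Toy usage with itemset patterns standing in for subgraph patterns
db = [{1, 2, 3}, {1, 2}, {2, 3}]
ext = lambda p: [p | {i} for i in range(1, 4) if i > max(p or {0})]
sup = lambda p: sum(p <= g for g in db)
print(mine_frequent(frozenset(), ext, sup, min_support=2))
```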
gSpan (Yan and Han, 2002) • Efficient frequent substructure mining method • DFS Code: efficient detection of isomorphic patterns • We extend gSpan for our work
Depth First Search (DFS) Code (figure: a labeled graph G and its DFS code tree) • A pattern is encoded as a sequence of edge tuples [i, j, ℓ_i, ℓ_e, ℓ_j] in DFS order, e.g. [0,1,A,a,B], [1,2,B,c,A], [0,3,A,b,C] • Isomorphic patterns can produce several DFS codes (e.g. [0,2,A,b,A] vs. [2,0,A,b,A] in the figure) • Only the minimum (canonical) DFS code is kept; non-minimum codes are pruned (see the sketch below)
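A minimal sketch of the canonicality test, with hypothetical edge tuples in the slide's [i, j, label_i, edge, label_j] format:

```python
def is_canonical(code, alternative_codes):
    """gSpan keeps a pattern only if its DFS code is the lexicographic
    minimum over all DFS codes of the same (isomorphic) pattern."""
    return all(code <= other for other in alternative_codes)

# Hypothetical: two DFS codes describing the same labeled pattern
code_a = [(0, 1, 'A', 'a', 'B'), (1, 2, 'B', 'c', 'A'), (2, 0, 'A', 'b', 'A')]
code_b = [(0, 1, 'A', 'b', 'A'), (1, 2, 'A', 'a', 'B'), (2, 0, 'B', 'c', 'A')]
print(is_canonical(code_a, [code_b]))  # True: code_b is non-minimum and pruned
```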
Discriminative patterns • Assign weights: w_i > 0 for the positive class, w_i < 0 for the negative class • Weighted Substructure Mining: find patterns with a large frequency difference between classes, i.e. large |Σ_i w_i x_{ik}| • This gain is not anti-monotonic, so a pruning bound is used instead (see the sketch below)
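A sketch of one standard bound of this kind (in the style of the gain bound used in boosting-based substructure mining, e.g. Kudo et al., 2005; the slides do not spell out their exact bound):

```python
def gain(weights, occurs):
    """Gain of a pattern: |sum of w_i over the graphs containing it|."""
    return abs(sum(w for w, x in zip(weights, occurs) if x))

def gain_bound(weights, occurs):
    """Upper bound on the gain of ANY super-pattern of the current pattern.

    A super-pattern occurs only in a subset of the graphs the current
    pattern occurs in, so its gain is at most the larger of the total
    positive and total (absolute) negative weight over those graphs.
    If this bound falls below the best gain found so far, the search
    tree below the current pattern can be pruned.
    """
    pos = sum(w for w, x in zip(weights, occurs) if x and w > 0)
    neg = -sum(w for w, x in zip(weights, occurs) if x and w < 0)
    return max(pos, neg)

# Toy check: 4 graphs with signed class weights; pattern occurs in graphs 0, 1, 3
w = [0.5, -0.3, 0.2, 0.4]
occ = [1, 1, 0, 1]
print(gain(w, occ), gain_bound(w, occ))  # 0.6 and max(0.9, 0.3) = 0.9
```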
Multiclass version • Multiple weight vectors, one per class: w_i^(ℓ) > 0 if graph i belongs to class ℓ, w_i^(ℓ) < 0 otherwise • Search for patterns overrepresented in a class
Summary of Graph Mining • Efficient way of searching for patterns satisfying predetermined conditions • Worst-case NP-hard, but actual speed depends on the data • Faster for sparse graphs and diverse kinds of labels
EM-based graph clustering • Motivation: learn a mixture model in the feature space of patterns; a basis for more complex probabilistic inference • L1 regularization & graph mining • E-step → Mining → M-step
Probabilistic Model • x_i ∈ {0,1}^d: pattern-indicator feature vector of graph i • Binomial mixture; each component is a product of Bernoullis • π_ℓ: mixing weight for cluster ℓ • θ_ℓ: parameter vector for cluster ℓ
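In formulas (our reconstruction from the quantities named on the slide, with x, π_ℓ, θ_ℓ as above):

```latex
p(x) = \sum_{\ell} \pi_\ell \, p_\ell(x), \qquad
p_\ell(x) = \prod_{k} \theta_{\ell k}^{\,x_k}\,(1-\theta_{\ell k})^{1-x_k}
```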
Ordinary EM algorithm • Maximize the log likelihood • E-step: compute the posterior over clusters • M-step: re-estimate parameters using the posterior probabilities • Both steps are computationally prohibitive (!) because the feature space ranges over all patterns
Regularization • L1-regularized log likelihood: Σ_i log p(x_i) − λ Σ_{ℓ,k} |θ_{ℓk} − θ̄_k| • θ̄_k: baseline constant, the ML estimate of pattern k's probability under a single binomial distribution (no clusters) • In the solution, most parameters are exactly equal to the baseline constants
E-step • Active pattern: a pattern k with θ_{ℓk} ≠ θ̄_k in some cluster ℓ • Inactive patterns contribute the same factor to every cluster, so they cancel in the posterior • Hence the E-step is computed only with active patterns (computable!)
M-step • Fix the putative cluster assignment (posteriors) from the E-step • The objective decomposes, so each parameter is solved separately • Naïve way: solve it for all parameters and identify the active ones (infeasible over all patterns) • Instead, use graph mining to find the active patterns directly (see the sketch below)
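For intuition, here is a runnable sketch of the naïve version over an explicit 0/1 pattern matrix. It is our simplification: the regularized M-step is approximated by a soft-threshold shrinkage toward the baseline, and no mining is involved; the paper replaces the explicit matrix with mining for active patterns.

```python
import numpy as np

def em_bernoulli_mixture(X, n_clusters, lam=0.1, n_iters=20, seed=0):
    """Naive EM for a binomial (Bernoulli) mixture over an explicit 0/1
    pattern matrix X (n graphs x d patterns). Illustrative only: the paper
    never materializes X; graph mining supplies the few active columns.
    The pull toward the baseline is applied here as a crude soft-threshold
    shrinkage, not the paper's exact M-step.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    baseline = X.mean(axis=0)            # single-binomial ML estimate per pattern
    pi = np.full(n_clusters, 1.0 / n_clusters)
    theta = np.clip(baseline + 0.1 * rng.standard_normal((n_clusters, d)), 0.01, 0.99)
    for _ in range(n_iters):
        # E-step: posterior over clusters for each graph
        log_p = X @ np.log(theta).T + (1 - X) @ np.log(1 - theta).T + np.log(pi)
        log_p -= log_p.max(axis=1, keepdims=True)
        post = np.exp(log_p)
        post /= post.sum(axis=1, keepdims=True)
        # M-step: per-parameter update, shrunk toward the baseline
        W = post.sum(axis=0)                       # soft cluster sizes
        freq = (post.T @ X) / W[:, None]           # occurrence prob. per cluster
        shift = lam / W[:, None]
        theta = np.where(freq > baseline + shift, freq - shift,
                         np.where(freq < baseline - shift, freq + shift, baseline))
        theta = np.clip(theta, 1e-3, 1 - 1e-3)
        pi = W / n
    return pi, theta, post

# Toy usage: two clearly separated clusters of pattern vectors
X = np.array([[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1], [0, 0, 1, 1]], float)
print(em_bernoulli_mixture(X, n_clusters=2)[2].round(2))  # posteriors
```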
Solution • f_{ℓk}: posterior-weighted occurrence probability of pattern k in cluster ℓ • θ̄_k: overall occurrence probability of pattern k (the baseline)
Important Observation • For an active pattern k, the occurrence probability in a graph cluster differs significantly from the overall average
Mining for Active Patterns • Active pattern: the cluster-wise occurrence probability deviates from the baseline by more than a λ-dependent threshold • Equivalently written as a weighted-support condition over the graphs • Such patterns can be found by graph mining! (the multiclass weighted mining above; see the sketch below)
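A sketch of the resulting activity test for one cluster, derived from the subgradient condition of the penalized Bernoulli likelihood above (our reconstruction; the paper's exact threshold may differ in scaling):

```python
def is_active(posteriors, x_k, baseline, lam):
    """Our reconstruction of the active-pattern test for one cluster.

    With penalty lam * |theta - baseline| on a Bernoulli likelihood, the
    subgradient condition keeps theta exactly at the baseline unless the
    posterior-weighted evidence exceeds the regularization pull.
    """
    W = sum(posteriors)                                   # soft cluster size
    f = sum(g * x for g, x in zip(posteriors, x_k)) / W   # cluster-wise occurrence prob.
    return W * abs(f - baseline) > lam * baseline * (1 - baseline)

# Toy check: a pattern concentrated in the graphs this cluster owns
post = [0.9, 0.8, 0.1, 0.2]   # posterior of this cluster for 4 graphs
x_k = [1, 1, 0, 0]            # occurrences of pattern k
print(is_active(post, x_k, baseline=0.5, lam=0.5))  # True: pattern is active
```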
Experiments: RNA graphs • Each stem becomes a node • Secondary structure predicted by RNAfold • 0/1 vertex labels (self-loop or not)
Clustering RNA graphs • Three Rfam families • Intron GP I (Int, 30 graphs) • SSU rRNA 5 (SSU, 50 graphs) • RNase bact a (RNase, 50 graphs) • Three bipartition problems • Results evaluated by ROC scores (Area under the ROC curve)
Conclusion • Substructure representation is better than paths • Probabilistic inference helped by graph mining • Many possible extensions • Naïve Bayes • Graph PCA, LFD, CCA • Semi-supervised learning • Applications in Biology?