Graph Mining Applications in Machine Learning Problems Max Planck Institute for Biological Cybernetics Koji Tsuda
Motivations for graph analysis • Existing methods assume "tables":

Serial Num | Name | Age | Sex | Address | …
0001 | ○○ | 40 | Male | Tokyo | …
0002 | ×× | 31 | Female | Osaka | …

• Structured data lie beyond this framework → New methods for analysis
Graph Structures in Biology • Compounds (figure: a molecular diagram of C, H, O atoms) • DNA sequences • RNA secondary structures (figure: a base-paired stem with CG pairs and a U-rich loop) • Texts in literature (e.g. "Amitriptyline inhibits adenosine uptake")
Overview • Path representation • Graph Kernel … Its disadvantages • Substructure representation • Graph Mining • EM-based Graph Clustering (Tsuda and Kudo, ICML 2006)
Marginalized Graph Kernels (Kashima, Tsuda, Inokuchi, ICML 2003) • We now define the kernel function • Both vertices and edges are labeled
Label path • Sequence of alternating vertex and edge labels • Generated by a random walk on the graph • Uniform initial, transition, and terminal probabilities
Kernel definition (figure: example label paths such as A-c-D-b-E and B-c-D-a-A) • Kernels for paths • Take the expectation over all possible paths! • Marginalized kernels for graphs (written out below)
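Spelled out (our reconstruction, consistent with the marginalized-kernel definition of Kashima et al., 2003: h and h′ range over label paths, p(h|G) is the probability that the random walk on G generates h, and k_z is a kernel between paths):

```latex
K(G, G') = \sum_{h}\sum_{h'} p(h \mid G)\, p(h' \mid G')\, k_z(h, h')
```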
Computation • Transition probability p_t; initial and terminal probabilities omitted here for brevity • A(v): set of paths ending at v (figure: vertex pair (v, v′) with path sets A(v), A(v′)) • K_V(v, v′): kernel computed from the paths ending at (v, v′) • K_V is written recursively • The kernel is obtained by solving a system of linear equations (polynomial time; see the sketch below)
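As a concrete illustration, here is a minimal Python sketch of this computation under simplifying assumptions that are ours, not the slides': uniform initial and transition probabilities, a constant termination probability `gamma`, a delta kernel on vertex labels, and unlabeled edges. It uses fixed-point iteration rather than an explicit linear solve; both converge to the same K_V.

```python
import numpy as np

def marginalized_graph_kernel(adj1, labels1, adj2, labels2, gamma=0.1, iters=50):
    """Minimal sketch of the marginalized graph kernel (Kashima et al., 2003).

    Simplifying assumptions (ours, not the slides'): uniform initial and
    transition probabilities, a constant termination probability `gamma`,
    a delta kernel on vertex labels, and unlabeled edges. R[v, w] plays
    the role of K_V: the expected path-kernel value over walks ending at
    (v, w), computed by fixed-point iteration instead of a linear solve.
    """
    n1, n2 = len(labels1), len(labels2)
    deg1, deg2 = adj1.sum(axis=1), adj2.sum(axis=1)
    # match[v, w] = 1 iff the vertex labels agree (delta kernel)
    match = (np.array(labels1)[:, None] == np.array(labels2)[None, :]).astype(float)
    R = np.zeros((n1, n2))
    for _ in range(iters):
        T = np.zeros((n1, n2))
        for v in range(n1):
            for w in range(n2):
                if deg1[v] > 0 and deg2[w] > 0:
                    # simultaneous transition to all neighbor pairs, uniform probs
                    nb = (np.outer(adj1[v], adj2[w]) * R).sum()
                    T[v, w] = (1 - gamma) ** 2 * nb / (deg1[v] * deg2[w])
        R = match * (gamma ** 2 + T)  # either both walks stop, or both continue
    return R.sum() / (n1 * n2)        # uniform initial probabilities

# Toy usage: two labeled triangles
A = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]])
print(marginalized_graph_kernel(A, ['C', 'C', 'O'], A, ['C', 'O', 'O']))
```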
Graph Kernel Applications • Chemical Compounds (Mahe et al., 2005) • Protein 3D structures (Borgwardt et al, 2005) • RNA graphs (Karklin et al., 2005) • Pedestrian detection • Signal Processing
Strong points of MGK • Polynomial time computation O(n^3) • Positive definite kernel • Usable with Support Vector Machines, Kernel PCA, Kernel CCA, and so on (see the example below)
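Because the result is a valid positive definite kernel matrix, it plugs directly into any kernel method. For instance, with scikit-learn's precomputed-kernel interface (illustrative toy numbers, not from the slides):

```python
import numpy as np
from sklearn.svm import SVC

# Toy stand-in for a precomputed marginalized-kernel matrix on 4 training graphs
K_train = np.array([[1.0, 0.8, 0.1, 0.2],
                    [0.8, 1.0, 0.2, 0.1],
                    [0.1, 0.2, 1.0, 0.9],
                    [0.2, 0.1, 0.9, 1.0]])
y = [0, 0, 1, 1]

clf = SVC(kernel='precomputed')  # SVM on top of any positive definite kernel
clf.fit(K_train, y)

# At test time, pass kernel values between test graphs and training graphs
K_test = np.array([[0.9, 0.7, 0.2, 0.1]])  # one test graph vs. 4 training graphs
print(clf.predict(K_test))
```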
Drawbacks of graph kernels • Global similarity measure • Fails to capture subtle differences • Long paths are suppressed • Results are not interpretable • Structural features are ignored (e.g. loops) • Without labels, the kernel degenerates (always 1)
Substructure Representation • 0/1 vector of pattern indicators • Huge dimensionality! • Graph mining is needed to select feature patterns
Graph Mining • Subfield of Data Mining • Common at KDD, ICDM, PKDD; less so at ICML, NIPS • Analysis of graph databases • Frequent Substructure Mining • Combinatorial algorithms, developed relatively recently • AGM (Inokuchi et al., 2000), gSpan (Yan and Han, 2002), Gaston (Nijssen and Kok, 2004)
Graph Mining • Frequent Substructure Mining • Enumerate all patterns occurring in at least m graphs • x_{ik} ∈ {0,1}: indicator of pattern k occurring in graph i • support(k) = Σ_i x_{ik}: number of graphs containing pattern k
Enumeration on a Tree-shaped Search Space • Each node of the search tree holds a pattern • Nodes are generated from the root by adding one edge at each step
Tree Pruning • support(g): number of graphs containing pattern g • Anti-monotonicity: a super-pattern's support never exceeds support(g) • If support(g) < m, stop exploring: the entire subtree is never generated (see the sketch below)
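In code, the whole enumeration-with-pruning scheme fits in a few lines. A minimal sketch with hypothetical `extend`/`support` oracles (demonstrated on itemsets, since a real subgraph-isomorphism test would dwarf the example):

```python
def mine_frequent(root, extend, support, min_support):
    """Generic pattern-growth search over the tree-shaped space (a sketch;
    `extend` and `support` are hypothetical oracles, not gSpan's API).

    extend(pattern)  -> child patterns, each with one added element/edge
    support(pattern) -> number of database graphs containing the pattern
    """
    frequent, stack = [], [root]
    while stack:
        p = stack.pop()
        if support(p) < min_support:
            continue  # anti-monotonicity: no descendant of p can be frequent
        frequent.append(p)
        stack.extend(extend(p))
    return frequent

# Toy usage with itemset patterns standing in for subgraph patterns
db = [{1, 2, 3}, {1, 2}, {2, 3}]
ext = lambda p: [p | {i} for i in range(1, 4) if i > max(p or {0})]
sup = lambda p: sum(p <= g for g in db)
print(mine_frequent(frozenset(), ext, sup, min_support=2))
```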
gSpan (Yan and Han, 2002) • Efficient frequent substructure mining method • DFS Code: efficient detection of isomorphic patterns • We extend gSpan for our work
Depth First Search (DFS) Code (figure: a labeled graph G and its DFS code tree) • A pattern is encoded as a sequence of edge tuples [i, j, ℓ_i, ℓ_e, ℓ_j] in DFS order, e.g. [0,1,A,a,B], [1,2,B,c,A], [0,3,A,b,C] • Isomorphic patterns can produce several DFS codes (e.g. [0,2,A,b,A] vs. [2,0,A,b,A] in the figure) • Only the minimum (canonical) DFS code is kept; non-minimum codes are pruned (see the sketch below)
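A minimal sketch of the canonicality test, with hypothetical edge tuples in the slide's [i, j, label_i, edge, label_j] format:

```python
def is_canonical(code, alternative_codes):
    """gSpan keeps a pattern only if its DFS code is the lexicographic
    minimum over all DFS codes of the same (isomorphic) pattern."""
    return all(code <= other for other in alternative_codes)

# Hypothetical: two DFS codes describing the same labeled pattern
code_a = [(0, 1, 'A', 'a', 'B'), (1, 2, 'B', 'c', 'A'), (2, 0, 'A', 'b', 'A')]
code_b = [(0, 1, 'A', 'b', 'A'), (1, 2, 'A', 'a', 'B'), (2, 0, 'B', 'c', 'A')]
print(is_canonical(code_a, [code_b]))  # True: code_b is non-minimum and pruned
```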
Discriminative patterns • Assign weights: w_i > 0 for the positive class, w_i < 0 for the negative class • Weighted Substructure Mining: find patterns with a large frequency difference between classes, i.e. large |Σ_i w_i x_{ik}| • This gain is not anti-monotonic, so a pruning bound is used instead (see the sketch below)
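A sketch of one standard bound of this kind (in the style of the gain bound used in boosting-based substructure mining, e.g. Kudo et al., 2005; the slides do not spell out their exact bound):

```python
def gain(weights, occurs):
    """Gain of a pattern: |sum of w_i over the graphs containing it|."""
    return abs(sum(w for w, x in zip(weights, occurs) if x))

def gain_bound(weights, occurs):
    """Upper bound on the gain of ANY super-pattern of the current pattern.

    A super-pattern occurs only in a subset of the graphs the current
    pattern occurs in, so its gain is at most the larger of the total
    positive and total (absolute) negative weight over those graphs.
    If this bound falls below the best gain found so far, the search
    tree below the current pattern can be pruned.
    """
    pos = sum(w for w, x in zip(weights, occurs) if x and w > 0)
    neg = -sum(w for w, x in zip(weights, occurs) if x and w < 0)
    return max(pos, neg)

# Toy check: 4 graphs with signed class weights; pattern occurs in graphs 0, 1, 3
w = [0.5, -0.3, 0.2, 0.4]
occ = [1, 1, 0, 1]
print(gain(w, occ), gain_bound(w, occ))  # 0.6 and max(0.9, 0.3) = 0.9
```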
Multiclass version • Multiple weight vectors, one per class: w_i^(ℓ) > 0 if graph i belongs to class ℓ, w_i^(ℓ) < 0 otherwise • Search for patterns overrepresented in a class
Summary of Graph Mining • Efficient way of searching for patterns satisfying predetermined conditions • Worst-case NP-hard, but actual speed depends on the data • Faster for sparse graphs and diverse kinds of labels
EM-based graph clustering • Motivation: learn a mixture model in the feature space of patterns; a basis for more complex probabilistic inference • L1 regularization & graph mining • E-step → Mining → M-step
Probabilistic Model • x_i ∈ {0,1}^d: pattern-indicator feature vector of graph i • Binomial mixture; each component is a product of Bernoullis • π_ℓ: mixing weight for cluster ℓ • θ_ℓ: parameter vector for cluster ℓ
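In formulas (our reconstruction from the quantities named on the slide, with x, π_ℓ, θ_ℓ as above):

```latex
p(x) = \sum_{\ell} \pi_\ell \, p_\ell(x), \qquad
p_\ell(x) = \prod_{k} \theta_{\ell k}^{\,x_k}\,(1-\theta_{\ell k})^{1-x_k}
```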
Ordinary EM algorithm • Maximize the log likelihood • E-step: compute the posterior over clusters • M-step: re-estimate parameters using the posterior probabilities • Both steps are computationally prohibitive (!) because the feature space ranges over all patterns
Regularization • L1-regularized log likelihood: Σ_i log p(x_i) − λ Σ_{ℓ,k} |θ_{ℓk} − θ̄_k| • θ̄_k: baseline constant, the ML estimate of pattern k's probability under a single binomial distribution (no clusters) • In the solution, most parameters are exactly equal to the baseline constants
E-step • Active pattern: a pattern k with θ_{ℓk} ≠ θ̄_k in some cluster ℓ • Inactive patterns contribute the same factor to every cluster, so they cancel in the posterior • Hence the E-step is computed only with active patterns (computable!)
M-step • Fix the putative cluster assignment (posteriors) from the E-step • The objective decomposes, so each parameter is solved separately • Naïve way: solve it for all parameters and identify the active ones (infeasible over all patterns) • Instead, use graph mining to find the active patterns directly (see the sketch below)
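For intuition, here is a runnable sketch of the naïve version over an explicit 0/1 pattern matrix. It is our simplification: the regularized M-step is approximated by a soft-threshold shrinkage toward the baseline, and no mining is involved; the paper replaces the explicit matrix with mining for active patterns.

```python
import numpy as np

def em_bernoulli_mixture(X, n_clusters, lam=0.1, n_iters=20, seed=0):
    """Naive EM for a binomial (Bernoulli) mixture over an explicit 0/1
    pattern matrix X (n graphs x d patterns). Illustrative only: the paper
    never materializes X; graph mining supplies the few active columns.
    The pull toward the baseline is applied here as a crude soft-threshold
    shrinkage, not the paper's exact M-step.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    baseline = X.mean(axis=0)            # single-binomial ML estimate per pattern
    pi = np.full(n_clusters, 1.0 / n_clusters)
    theta = np.clip(baseline + 0.1 * rng.standard_normal((n_clusters, d)), 0.01, 0.99)
    for _ in range(n_iters):
        # E-step: posterior over clusters for each graph
        log_p = X @ np.log(theta).T + (1 - X) @ np.log(1 - theta).T + np.log(pi)
        log_p -= log_p.max(axis=1, keepdims=True)
        post = np.exp(log_p)
        post /= post.sum(axis=1, keepdims=True)
        # M-step: per-parameter update, shrunk toward the baseline
        W = post.sum(axis=0)                       # soft cluster sizes
        freq = (post.T @ X) / W[:, None]           # occurrence prob. per cluster
        shift = lam / W[:, None]
        theta = np.where(freq > baseline + shift, freq - shift,
                         np.where(freq < baseline - shift, freq + shift, baseline))
        theta = np.clip(theta, 1e-3, 1 - 1e-3)
        pi = W / n
    return pi, theta, post

# Toy usage: two clearly separated clusters of pattern vectors
X = np.array([[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1], [0, 0, 1, 1]], float)
print(em_bernoulli_mixture(X, n_clusters=2)[2].round(2))  # posteriors
```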
Solution • f_{ℓk}: posterior-weighted occurrence probability of pattern k in cluster ℓ • θ̄_k: overall occurrence probability of pattern k (the baseline)
Important Observation • For an active pattern k, the occurrence probability in a graph cluster differs significantly from the overall average
Mining for Active Patterns • Active pattern: the cluster-wise occurrence probability deviates from the baseline by more than a λ-dependent threshold • Equivalently written as a weighted-support condition over the graphs • Such patterns can be found by graph mining! (the multiclass weighted mining above; see the sketch below)
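A sketch of the resulting activity test for one cluster, derived from the subgradient condition of the penalized Bernoulli likelihood above (our reconstruction; the paper's exact threshold may differ in scaling):

```python
def is_active(posteriors, x_k, baseline, lam):
    """Our reconstruction of the active-pattern test for one cluster.

    With penalty lam * |theta - baseline| on a Bernoulli likelihood, the
    subgradient condition keeps theta exactly at the baseline unless the
    posterior-weighted evidence exceeds the regularization pull.
    """
    W = sum(posteriors)                                   # soft cluster size
    f = sum(g * x for g, x in zip(posteriors, x_k)) / W   # cluster-wise occurrence prob.
    return W * abs(f - baseline) > lam * baseline * (1 - baseline)

# Toy check: a pattern concentrated in the graphs this cluster owns
post = [0.9, 0.8, 0.1, 0.2]   # posterior of this cluster for 4 graphs
x_k = [1, 1, 0, 0]            # occurrences of pattern k
print(is_active(post, x_k, baseline=0.5, lam=0.5))  # True: pattern is active
```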
Experiments: RNA graphs • Each stem becomes a node • Secondary structure predicted by RNAfold • 0/1 vertex labels (self-loop or not)
Clustering RNA graphs • Three Rfam families • Intron GP I (Int, 30 graphs) • SSU rRNA 5 (SSU, 50 graphs) • RNase bact a (RNase, 50 graphs) • Three bipartition problems • Results evaluated by ROC scores (Area under the ROC curve)
Conclusion • Substructure representation is better than paths • Probabilistic inference helped by graph mining • Many possible extensions • Naïve Bayes • Graph PCA, LFD, CCA • Semi-supervised learning • Applications in Biology?