
Markov Cluster Algorithm


ajustice

Presentation Transcript


  1. Markov Cluster Algorithm

  2. Outline • Introduction • Important Concepts in MCL Algorithm • MCL Algorithm • The Features of MCL Algorithm • Summary

  3. Graph Clustering Intuition: Highly connected nodes could be in one cluster; weakly connected nodes could be in different clusters. Model: A random walk may start at any node. Starting at node r, if a random walk reaches node t with high probability, then r and t should be clustered together.

  4. Markov Clustering (MCL) Markov process: The probability that a random walk will take an edge at node u depends only on u and the given edge. It does not depend on the walk's previous route. This assumption simplifies the computation.

  5. MCL A flow network is used to approximate the partition. An initial amount of flow is injected into each node. At each step, a percentage of the flow goes from a node to its neighbors via the outgoing edges.

  6. MCL Edge Weight: Similarity between two nodes, considered as the bandwidth or connectivity. If an edge has a higher weight than another, more flow will pass over it; the amount of flow is proportional to the edge weight. If there are no edge weights, we can assign the same weight to all edges.
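The flow rule on slides 5 and 6 can be sketched in a few lines. This is a minimal illustration, assuming NumPy and a small hypothetical weighted graph `W`; for simplicity it moves all of a node's flow in each step (no self-loops) and splits it among the neighbours in proportion to the edge weights.

```python
import numpy as np

# Hypothetical weighted, undirected graph: W[i, j] is the weight of edge i-j.
W = np.array([
    [0.0, 2.0, 1.0],
    [2.0, 0.0, 1.0],
    [1.0, 1.0, 0.0],
])

def flow_step(flow, W):
    """Move the flow one step: node j sends to neighbour i the fraction
    W[i, j] / (total edge weight at j) of its current flow."""
    fractions = W / W.sum(axis=0, keepdims=True)  # each column sums to 1
    return fractions @ flow

flow0 = np.ones(3)            # inject one unit of flow into every node
flow1 = flow_step(flow0, W)   # flow after one step
```

The total of 3 units is conserved; node 2, whose edges are the lightest, receives less flow (2/3) than nodes 0 and 1 (7/6 each). With unweighted graphs, setting every nonzero `W[i, j]` to 1 gives the equal-weight case.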

  7. Intuition of MCL Two natural clusters, A and B (figure). When the flow reaches the border points, it is more likely to return than to cross the border.

  8. MCL When the flow reaches A, it has four possible outcomes: three lead back into the cluster, one leaks out. So ¾ of the flow will return and only ¼ leaks. Flow will accumulate in the center of a cluster (island), and the border nodes will starve.

  9. Introduction-MCL in General • Simulation of random flow in a graph • Two operations: Expansion and Inflation • Intrinsic relationship between the MCL process result and the cluster structure

  10. Introduction-Cluster • Popular description: partition the graph so that • intra-partition similarity is the highest • inter-partition similarity is the lowest

  11. Introduction-Cluster • Observation 1: • The number of higher-length paths in G is large for pairs of vertices lying in the same dense cluster • and small for pairs of vertices belonging to different clusters

  12. Introduction-Cluster • Observation 2: • A random walk in G that visits a dense cluster will likely not leave the cluster until many of its vertices have been visited

  13. Definitions • $n \times n$ adjacency matrix A. • $A(i,j)$ = weight on the edge from i to j • If the graph is undirected, $A(i,j) = A(j,i)$, i.e. A is symmetric • $n \times n$ transition matrix P. • P is row stochastic • $P(i,j)$ = probability of stepping to node j from node i $= A(i,j) / \sum_k A(i,k)$ • $n \times n$ Laplacian matrix L. • $L(i,j) = \delta_{ij} \sum_k A(i,k) - A(i,j)$, i.e. $L = D - A$ with D the diagonal degree matrix • Symmetric positive semi-definite for undirected graphs • Singular
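The three matrices can be built directly from these definitions. A minimal NumPy sketch, assuming a hypothetical undirected graph made of two triangles {0,1,2} and {3,4,5} joined by the edge 2-3:

```python
import numpy as np

# Hypothetical example graph: two triangles joined by the edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
n = 6
A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0      # undirected graph, so A is symmetric

deg = A.sum(axis=1)              # deg[i] = sum_k A(i, k)
P = A / deg[:, None]             # P(i, j) = A(i, j) / sum_k A(i, k): row stochastic
L = np.diag(deg) - A             # Laplacian L = D - A

eigenvalues = np.linalg.eigvalsh(L)   # real eigenvalues, since L is symmetric
```

Every row of `P` sums to 1, `L` is symmetric with no negative eigenvalues, and `L @ ones` is zero, which is why `L` is singular.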

  14. Definitions (Figure: an example graph with its adjacency matrix A and transition matrix P.)

  15. What is a random walk (Figure: the walk at t=0.)

  16. What is a random walk (Figure: the walk at t=0 and t=1.)

  17. What is a random walk (Figure: the walk at t=0 through t=2.)

  18. What is a random walk (Figure: the walk at t=0 through t=3.)

  19. Probability Distributions • $x_t(i)$ = probability that the surfer is at node i at time t • $x_{t+1}(i) = \sum_j \Pr(\text{at } j) \cdot \Pr(j \to i) = \sum_j x_t(j)\,P(j,i)$ • $x_{t+1} = x_t P = x_{t-1} P^2 = x_{t-2} P^3 = \dots = x_0 P^{t+1}$ • What happens when the surfer keeps walking for a long time?
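The recurrence can be checked numerically. A minimal NumPy sketch, assuming the same hypothetical two-triangle graph (triangles {0,1,2} and {3,4,5} joined by edge 2-3) and a surfer who starts at node 0; it iterates $x_{t+1} = x_t P$, compares the result with $x_0 P^t$, and looks at the long-run behaviour.

```python
import numpy as np

# Hypothetical example graph: two triangles joined by the edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
n = 6
A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0
P = A / A.sum(axis=1, keepdims=True)   # row-stochastic transition matrix

x0 = np.zeros(n)
x0[0] = 1.0                            # the surfer starts at node 0
x = x0.copy()
steps = 100
for _ in range(steps):
    x = x @ P                          # x_{t+1} = x_t P

closed_form = x0 @ np.linalg.matrix_power(P, steps)   # x_0 P^t with t = steps
degrees = A.sum(axis=1)
stationary = degrees / degrees.sum()   # degree / (2 * number of edges)
```

For a connected, non-bipartite undirected graph such as this one, the distribution forgets the start node and approaches the stationary vector in which each node's probability is proportional to its degree.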

  20. Flow Formulation • Flow: transition probability from one node to another. • Flow matrix: matrix of the flows among all nodes; the ith column holds the flows out of the ith node, so each column sums to 1. This is the transpose of the row-stochastic P defined earlier. (Figure: a three-node example graph and its flow matrix.)

  21. Motivation behind MCL • Measure or sample any of these quantities (higher-length paths, random walks) and deduce the cluster structure from the behavior of the sampled quantities. • Cluster structure will show itself as a peaked distribution of the quantities. • A lack of cluster structure will result in a flat distribution.

  22. Important Concepts about MCL • Markov Chain • Random Walk on Graph • Some Definitions in MCL

  23. Markov Chain • A random process with the Markov property • Markov property: given the present state, future states are independent of the past states • At each step the process may change from the current state to another state, or remain in the same state, according to a certain probability distribution.

  24. Markov Chain Example

  25. Random Walk on Graph • A walker starts at some arbitrary vertex • He successively visits new vertices by selecting one of the outgoing edges at random • A random walk on a graph is essentially a finite Markov chain whose states are the vertices.
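The walker described above can be simulated directly, which also illustrates Observation 2. A minimal sketch using only Python's standard library, assuming the hypothetical two-triangle graph (triangles {0,1,2} and {3,4,5} joined by edge 2-3) and uniform choice among outgoing edges; the function `random_walk` and the helper `side` are illustrative names, not part of any MCL package.

```python
import random

# Hypothetical graph as adjacency lists: two triangles joined by the edge 2-3.
graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}

def random_walk(graph, start, steps, rng):
    """Return the visited vertices. Each step picks one outgoing edge of the
    current vertex uniformly at random, ignoring earlier history (Markov property)."""
    path = [start]
    for _ in range(steps):
        path.append(rng.choice(graph[path[-1]]))
    return path

def side(v):
    return 0 if v < 3 else 1           # which triangle vertex v belongs to

rng = random.Random(42)                # fixed seed for a reproducible walk
walk = random_walk(graph, start=0, steps=100_000, rng=rng)
crossings = sum(1 for a, b in zip(walk, walk[1:]) if side(a) != side(b))
crossing_rate = crossings / (len(walk) - 1)
```

In the long run only about 1/7 of the steps use the bridge edge 2-3 (each of the 14 directed edges is traversed equally often); the other steps stay inside a triangle.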

  26. Some Definitions in MCL • Simple Graph • A simple graph is an undirected graph in which every nonzero weight equals 1.

  27. Some Definitions in MCL • Associated Matrix • The associated matrix of G, denoted $M_G$, is defined by setting the entry $(M_G)_{pq}$ equal to $w(v_p, v_q)$

  28. Some Definitions in MCL • Markov Matrix • The Markov matrix associated with a graph G is denoted by $T_G$ and is formally defined by letting its qth column be the qth column of $M_G$ normalized to sum to 1

  29. Example

  30. Explanation of the Previous Example • The associated matrix and Markov matrix are actually computed for the matrix M + I • I denotes the identity matrix (diagonal entries equal to 1) • This adds a loop to every vertex of the graph, because it is possible for a walker to stay in the same place in his next step
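The construction on slides 27, 28 and 30 (associated matrix, a loop added to every vertex via I, columns normalized) can be sketched with NumPy. The two-triangle graph is a hypothetical example, not the graph from the slides.

```python
import numpy as np

# Hypothetical simple graph: two triangles {0,1,2} and {3,4,5} joined by edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
n = 6
M = np.zeros((n, n))
for p, q in edges:
    M[p, q] = M[q, p] = 1.0            # simple graph: every nonzero weight equals 1

M_loop = M + np.eye(n)                 # add a loop to every vertex
T = M_loop / M_loop.sum(axis=0, keepdims=True)   # normalize each column to sum to 1
```

Node 0 has two neighbours plus its loop, so every nonzero entry of column 0 is 1/3; node 2 has three neighbours plus its loop, so column 2 sends 1/4 to each of nodes 0, 1, 2 and 3.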

  31. Example

  32. Markov Cluster Algorithm • Find higher-length paths • Starting point: in the associated matrix of a simple graph, the quantity $(M^k)_{pq}$ has a straightforward interpretation as the number of paths of length k between $v_p$ and $v_q$ (vertices may repeat along such a path)
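The path-counting interpretation of $(M^k)_{pq}$ can be verified by brute force on a small graph. A minimal sketch, assuming NumPy and the hypothetical two-triangle graph; `count_walks` is an illustrative helper that enumerates every vertex sequence of length k.

```python
import numpy as np
from itertools import product

# Hypothetical simple graph: two triangles {0,1,2} and {3,4,5} joined by edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
n = 6
M = np.zeros((n, n), dtype=int)
for p, q in edges:
    M[p, q] = M[q, p] = 1

def count_walks(M, p, q, k):
    """Count vertex sequences p = u_0, u_1, ..., u_k = q in which every
    consecutive pair is joined by an edge (vertices may repeat)."""
    count = 0
    for middle in product(range(len(M)), repeat=k - 1):
        seq = (p, *middle, q)
        if all(M[a, b] for a, b in zip(seq, seq[1:])):
            count += 1
    return count

k = 3
Mk = np.linalg.matrix_power(M, k)      # (M^k)[p, q] counts length-k paths
```

Pairs in the same triangle get more length-3 paths than pairs across the bridge: `Mk[0, 1]` is 3, while `Mk[0, 4]` is 1 (only 0-2-3-4).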

  33. Example: Associated Matrix $M_G$ and $(M_G + I)^2$

  34. Example: Markov Matrix $M_G$

  35. Example-Markov Matrix

  36. Conclusion • Flow moves more easily within dense regions than across sparse boundaries. • However, in the long run, this effect disappears. • Powers of the matrix can be used to find higher-length paths, but the effect diminishes as the flow goes on.
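The fading effect can be measured on a small example. A minimal NumPy sketch, assuming the hypothetical two-triangle graph with loops added and columns normalized; it tracks how much of the flow started at node 0 is still inside its own triangle after k steps of expansion alone (no inflation).

```python
import numpy as np

# Hypothetical example graph: two triangles {0,1,2} and {3,4,5} joined by edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
n = 6
A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0
T = A + np.eye(n)
T = T / T.sum(axis=0, keepdims=True)   # column-stochastic Markov matrix of A + I

def flow_kept_in_cluster(k, start=0, cluster=(0, 1, 2)):
    """Fraction of the flow started at `start` that lies inside `cluster` after k steps."""
    Tk = np.linalg.matrix_power(T, k)
    return Tk[list(cluster), start].sum()

short_run = flow_kept_in_cluster(2)    # 33/36, about 0.92
long_run = flow_kept_in_cluster(128)   # close to 0.5
T_long = np.linalg.matrix_power(T, 128)
column_spread = np.abs(T_long - T_long[:, [0]]).max()   # columns nearly identical
```

After two steps most of the flow is still in the start cluster, but after many steps every column approaches the same stationary vector, so the cluster signal is gone. This is the motivation for interleaving inflation with expansion.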

  37. Inflation Operation • Idea: How can we change the distribution of transition probabilities so that preferred neighbours are further favoured and less popular neighbours are demoted? • MCL solution: raise all the entries in a given column to a certain power greater than 1 (e.g. squaring) and rescale the column so that it sums to 1 again.
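The inflation step just described is a one-liner in NumPy. A minimal sketch, assuming a column-stochastic matrix and a hypothetical column of transition probabilities; `inflate` is an illustrative name.

```python
import numpy as np

def inflate(M, r=2.0):
    """Raise every entry to the power r (> 1), then rescale each column to sum to 1."""
    powered = np.power(M, r)
    return powered / powered.sum(axis=0, keepdims=True)

# Hypothetical single column of transition probabilities.
column = np.array([[0.5], [0.3], [0.2]])
inflated = inflate(column, r=2)
```

Squaring gives 0.25, 0.09 and 0.04, which sum to 0.38; after rescaling, the preferred neighbour rises from 0.5 to about 0.66 while the least popular one falls from 0.2 to about 0.11.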

  38. Example for Inflation Operation

  39. Definition for Inflation Operation

  40. Apply Inflation Operation to the previous Markov Matrix

  41. Inflation Effects

  42. MCL Operations • Expansion operation: matrix power, expansion of dense regions • Inflation operation: described above, elimination of unfavoured regions

  43. The MCL algorithm • Input: A, the adjacency matrix • Initialize M to $M_G$, the canonical transition matrix: $M := M_G := (A+I)D^{-1}$, where D is the diagonal matrix of the column sums of A + I • Expand: M := M*M. This enhances flow to well-connected nodes as well as to new nodes. • Inflate: M := M.^r (entrywise power, r usually 2), then renormalize the columns. This increases the inequality in each column: "Rich get richer, poor get poorer." • Prune: remove entries close to zero, which saves memory. • Converged? No: return to Expand. Yes: output clusters.
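The loop above (initialize, expand, inflate, prune, test for convergence) can be sketched in NumPy. This is a minimal dense-matrix illustration, not the reference MCL implementation: the parameter names and defaults (`r`, `prune_tol`, `max_iter`, `conv_tol`) are assumptions, renormalizing after pruning is our choice, and the cluster read-out is a simplified version of the attractor rule on slide 49.

```python
import numpy as np

def normalize_columns(M):
    return M / M.sum(axis=0, keepdims=True)

def mcl(A, r=2.0, prune_tol=1e-6, max_iter=100, conv_tol=1e-9):
    """Sketch of the MCL loop on a symmetric adjacency matrix A."""
    n = A.shape[0]
    M = normalize_columns(A + np.eye(n))   # M := (A + I) D^{-1}, column stochastic
    for _ in range(max_iter):
        prev = M
        M = M @ M                          # Expand
        M = normalize_columns(M ** r)      # Inflate: entrywise power, renormalize
        M[M < prune_tol] = 0.0             # Prune entries close to zero
        M = normalize_columns(M)           # keep columns summing to 1 after pruning
        if np.abs(M - prev).max() < conv_tol:   # Converged?
            break
    return M

def attractor_rows(M):
    """Simplified read-out: each attractor a (M[a, a] > 0) yields the set of
    nodes whose flow ends up at a."""
    return {frozenset(np.flatnonzero(M[a]).tolist())
            for a in range(M.shape[0]) if M[a, a] > 0}

# Hypothetical example: two triangles {0,1,2} and {3,4,5} joined by edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
A = np.zeros((6, 6))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0

M_final = mcl(A)
clusters = attractor_rows(M_final)
```

On this graph the flow of each triangle collects at its bridge node (2 or 3), the limit matrix is idempotent, and the read-out returns the two triangles as clusters.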

  44. Multi-level Regularized MCL • Coarsen the input graph repeatedly: input graph, then intermediate graph(s), then the coarsest graph. Coarsening captures the global topology of the graph, and it is faster to run on smaller graphs first. • Starting at the coarsest graph, run curtailed R-MCL and project the flow onto the next finer graph; the projected flow initializes the flow matrix of the refined graph. • On the input graph, run R-MCL to convergence and output clusters.

  45. Markov Cluster Algorithm

  46. MCL Result for the Graph

  47. A Striking Example

  48. Striking Animation • http://www.micans.org/mcl/ani/mcl-animation.html

  49. Mapping nonnegative idempotent matrices onto clusters • Find attractors: node a is an attractor if $M_{aa}$ is nonzero • Find attractor systems: if a is an attractor, then the set of its neighbours is called an attractor system. • If a node has an arc to any node of an attractor system, the node belongs to the same cluster as that attractor system.
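One reading of these rules can be sketched in NumPy. The sketch assumes the column convention used earlier (`M[a, j] > 0` means node j sends flow to node a), treats an attractor system as a group of attractors linked to one another in either direction, and lets a node join every system it has an arc into, so clusters may overlap. The matrix is a hypothetical idempotent example, not the one from the next slide.

```python
import numpy as np

def attractor_clusters(M, tol=0.0):
    """Interpret a nonnegative, idempotent, column-stochastic matrix M."""
    n = M.shape[0]
    attractors = [a for a in range(n) if M[a, a] > tol]
    # Group attractors that are linked (in either direction) into systems.
    systems, seen = [], set()
    for a in attractors:
        if a in seen:
            continue
        system, stack = set(), [a]
        while stack:
            u = stack.pop()
            if u in system:
                continue
            system.add(u)
            stack.extend(b for b in attractors
                         if b not in system and (M[u, b] > tol or M[b, u] > tol))
        seen |= system
        systems.append(system)
    # A node joins every system it has an arc into; clusters may overlap.
    clusters = [{j for j in range(n) if any(M[a, j] > tol for a in system)}
                for system in systems]
    return systems, clusters

# Hypothetical idempotent matrix: node 2 splits its flow between two systems.
M = np.array([
    [0.5, 0.5, 0.25, 0.0, 0.0],
    [0.5, 0.5, 0.25, 0.0, 0.0],
    [0.0, 0.0, 0.0,  0.0, 0.0],
    [0.0, 0.0, 0.5,  1.0, 1.0],
    [0.0, 0.0, 0.0,  0.0, 0.0],
])
systems, clusters = attractor_clusters(M)
```

Here the attractors are 0, 1 and 3; they form the systems {0, 1} and {3}, and node 2 ends up in both clusters, {0, 1, 2} and {2, 3, 4}, which is the kind of overlap shown in the example that follows.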

  50. Example • Attractor set = {1,2,3,4,5,6,7,8,9,10} • The attractor systems are {1,2,3}, {4,5,6,7}, {8,9}, {10} • The overlapping clusters are {1,2,3,11,12,15}, {4,5,6,7,13}, {8,9,12,13,14,15}, {10,12,13}
