
Clustering And Community Formation



Presentation Transcript


  1. Clustering And Community Formation By: Pinakpani Shah

  2. Many systems can be described as networks, in which the units that make up the system are linked by many connections. • What is a cluster/community? - A group of nodes that are more densely connected to each other than to the rest of the network is called a cluster or community. • Small complete subgraphs are used as motifs, and their distribution and clustering properties are used to identify such communities. • An essential feature of a community is that each member should be reachable through well-connected subsets of nodes.

  3. Uncovering the Overlapping community structure of complex networks in nature and society

  4. Communities can consist of friends, relatives, people who share a hobby or a game, etc. • This social structure is built by picking a random individual and creating a network of his or her friends. • This network is called a random network. • In this kind of network, a single node can be part of multiple communities. • To identify such communities, one of the most widely used methods is to divide the network into small groups.

  5. Such a method forces each node to belong to only one community. • Overlap is a crucial feature of community structure. • To overcome this problem, another method called clique percolation is used. • Erdős–Rényi uncorrelated random graphs are used as a prototype. • In this graph the percolation threshold is p = pc = 1/N.

  6. p = probability that two vertices are connected to each other. • N = number of nodes in the network. • pc = percolation threshold, or critical point. • A k-clique is a complete subgraph of k vertices. • It is used to find the overlapping communities. • Basic requirements for a method that finds overlapping communities: it should allow overlaps, should not be too restrictive, should be based on the density of links, and should not yield communities that contain cut nodes or cut edges.

  7. Clique percolation in Random Networks

  8. Both graphs contain a giant connected component, because the edge probability is much larger than the edge percolation threshold (0.05). • In the left graph p is 0.13, which is less than the k-clique percolation threshold (0.16 from the threshold equation), while in the right graph p is above the threshold.
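
The two thresholds quoted on this slide can be reproduced with a short calculation. This is a sketch under assumptions: it uses the k-clique percolation threshold pc(k) = [(k-1)N]^(-1/(k-1)) from the clique percolation paper listed in the references, and it assumes N = 20 and k = 3, values inferred from the quoted 0.05 and 0.16 rather than stated in the transcript. The function names are illustrative only.

```python
def edge_percolation_threshold(n: int) -> float:
    """Edge percolation threshold of an Erdos-Renyi graph: pc = 1/N."""
    return 1.0 / n


def k_clique_percolation_threshold(n: int, k: int) -> float:
    """k-clique percolation threshold: pc(k) = [(k-1) N]^(-1/(k-1)).

    For k = 2 this reduces to the edge percolation threshold 1/N.
    """
    return ((k - 1) * n) ** (-1.0 / (k - 1))


# Assumed values (not given in the transcript): N = 20 nodes, k = 3.
N = 20
print(round(edge_percolation_threshold(N), 2))         # edge threshold
print(round(k_clique_percolation_threshold(N, 3), 2))  # 3-clique threshold
```

With these assumed values, p = 0.13 lies above the edge threshold but below the 3-clique threshold, which matches the description of the left graph.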

  9. Two k-cliques are adjacent if they share k-1 nodes. • A k-clique chain is a subgraph that is the union of a sequence of adjacent k-cliques. • Two k-cliques are k-clique-connected if they are part of a k-clique chain. • The union of all k-cliques that are k-clique-connected to a particular k-clique is called a k-clique percolation cluster. • The graph shown above is a 3-clique percolation cluster.

  10. A k-clique percolation cluster is like an edge percolation cluster in the k-clique adjacency graph. • In that graph, k-cliques are represented as nodes, and there is an edge between two of them if they are adjacent. • Community structure depends on the value of k: as k increases, communities become smaller and more disintegrated. • A k-clique template is an object isomorphic to a complete graph of k vertices that can be placed onto any k-clique of the graph. • Relocating one particle of the template to another vertex along an edge is called rolling the k-clique template.

  11. A k-clique template can be placed onto any k-clique and rolled to an adjacent k-clique by relocating one of its particles while keeping the other k-1 particles fixed. • The k-clique percolation clusters of a graph are all the subgraphs that can be fully explored by rolling a k-clique template. • A k-clique percolation cluster can be considered a community. • Different values of k give communities of different strength.
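
The definitions on slides 9 to 11 can be sketched in a few lines of code. This is a brute-force illustration for small graphs, not the implementation used in the cited papers: it enumerates every k-clique, merges cliques that share k-1 nodes with a union-find structure, and returns the union of nodes in each merged group. The function name and the example graph are hypothetical.

```python
from itertools import combinations


def k_clique_communities(edges, k):
    """Return k-clique percolation clusters as sorted lists of nodes."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    # Enumerate all k-cliques: k nodes that are pairwise connected.
    cliques = [
        frozenset(c)
        for c in combinations(sorted(adj), k)
        if all(b in adj[a] for a, b in combinations(c, 2))
    ]

    # Union-find over cliques; adjacent cliques end up in the same group.
    parent = list(range(len(cliques)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(cliques)), 2):
        if len(cliques[i] & cliques[j]) == k - 1:  # adjacency: k-1 shared nodes
            parent[find(i)] = find(j)

    # A community is the union of all cliques in one connected group.
    groups = {}
    for i, clique in enumerate(cliques):
        groups.setdefault(find(i), set()).update(clique)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical example: a 4-clique {1,2,3,4} and a triangle {4,5,6} sharing node 4.
edges = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4), (4, 5), (4, 6), (5, 6)]
print(k_clique_communities(edges, 3))
print(k_clique_communities(edges, 4))
```

In this example node 4 belongs to both 3-clique communities, which is the overlap that partition-based methods cannot express, and raising k to 4 leaves only the denser group.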

  12. Properties of the community structure are: • Every member can be reached through well-connected subsets of nodes. • Communities can share nodes with each other, which means they overlap.

  13. Size and age are two basic quantities that characterize a dynamically changing community. • Size s and age t are correlated. • To quantify the relative overlap of two states of the same community, the auto-correlation function C(t) is used. • The intersection gives the number of nodes common to the two time steps. • The union gives the total number of nodes across the two time steps.
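
A minimal sketch of the auto-correlation described on this slide. The equation itself did not survive in the transcript, so the form used here is inferred from the bullets: C(t) = |A(t0) ∩ A(t0+t)| / |A(t0) ∪ A(t0+t)|, where A(t) is the set of members at time t. The function name and the example member sets are hypothetical.

```python
def autocorrelation(members_then, members_now):
    """Relative overlap of two states of one community: |A & B| / |A | B|."""
    a, b = set(members_then), set(members_now)
    union = a | b
    if not union:
        return 0.0  # assumption: two empty states have no overlap
    return len(a & b) / len(union)


# Hypothetical states of the same community at two time steps.
state_t0 = {"ann", "bob", "cat", "dan"}
state_t1 = {"cat", "dan", "eve"}
print(autocorrelation(state_t0, state_t1))  # 2 common nodes out of 5 in total
```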

  14. The stationarity of a community is calculated with an equation that gives the average correlation between subsequent states. • t0 denotes the birth and tmax the extinction of the community.
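
Stationarity as described here can be sketched as the mean overlap between consecutive states of a community. This is an assumption-laden illustration: the exact equation is missing from the transcript, so the code simply averages the intersection-over-union overlap across all consecutive pairs in a list of states ordered from birth to extinction. The normalisation used in the cited paper may differ, and the function names are hypothetical.

```python
def overlap(a, b):
    """Intersection over union of two member sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def stationarity(states):
    """Average overlap between subsequent states, ordered from t0 to tmax."""
    if len(states) < 2:
        raise ValueError("need at least two states to compare")
    pairs = zip(states, states[1:])
    return sum(overlap(a, b) for a, b in pairs) / (len(states) - 1)


# Hypothetical community observed at three consecutive time steps.
history = [{1, 2, 3}, {1, 2, 3, 4}, {2, 3, 4}]
print(stationarity(history))
```

A value close to 1 means the membership barely changes from one step to the next, while a value close to 0 means the community is replaced almost entirely at every step.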

  15. Nowadays the web has become a major aid to information access. • Content on the web is difficult to analyze because it is decentralized and unorganized. • Web community identification is needed for focused search engines, content filtering, and text-based searching. • In web communities, web pages are treated as nodes and the hyperlinks on the pages as the edges between nodes. • A web community is a collection of web pages such that each page has more hyperlinks to pages within the community than to pages outside of it.
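
The definition of a web community in the last bullet can be checked directly for a candidate set of pages. The sketch below assumes hyperlinks are treated as undirected and that "more" means strictly more; both choices, along with the function name and the page labels, are assumptions made for illustration.

```python
def is_web_community(links, members):
    """True if every member has more links inside the set than outside it."""
    members = set(members)

    # Build an undirected neighbour map from (source, target) hyperlinks.
    neighbours = {}
    for src, dst in links:
        neighbours.setdefault(src, set()).add(dst)
        neighbours.setdefault(dst, set()).add(src)

    for page in members:
        linked = neighbours.get(page, set()) - {page}  # ignore self-links
        inside = len(linked & members)
        outside = len(linked - members)
        if inside <= outside:
            return False
    return True


# Hypothetical pages: a, b, c link to each other; c also links out to x.
links = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "x"), ("x", "y")]
print(is_web_community(links, {"a", "b", "c"}))
print(is_web_community(links, {"c", "x"}))
```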

  16. These communities are collectively self-organized by independent authors. • Community identification is compared with the maximum flow problem: given a graph whose edges have capacities, find the maximum flow from a source vertex to a sink vertex. • Seed vertices act as the source vertices.

  17. Self Organization and Identification of Web Communities

  18. Approximate-Flow-Community • Takes the seed vertices as input and crawls to a fixed depth, including inbound and outbound hyperlinks. • Applies the Exact-Flow-Community method, ranks the sites, and adds the top-ranked non-seed members to the seed set. • Initially only a small community may be identified, but as new seeds are added, larger communities can be found.

  19. Self Organization and Identification of Web Communities

  20. Exact-Flow-Community • An artificial source s is added, with infinite-capacity edges routed to all vertices in the seed set S. • All existing edges are made bidirectional, with capacity set to a constant k. • All vertices except the source, sink, and seeds are routed to an artificial sink vertex with unit capacity.
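
The graph construction on this slide can be sketched with a small maximum-flow routine. This is an illustration under assumptions, not the authors' code: it uses a plain Edmonds-Karp search, assumes an integer capacity k, and takes the community to be the set of vertices still reachable from the artificial source in the residual graph once the flow is maximal. The labels "__source__" and "__sink__" and the example graph are hypothetical.

```python
from collections import defaultdict, deque


def exact_flow_community(edges, seeds, k):
    """Return the source side of a minimum cut in the augmented web graph."""
    source, sink = "__source__", "__sink__"  # assumed not to clash with page names
    res = defaultdict(lambda: defaultdict(int))  # residual capacities
    vertices = set()

    # All existing edges become bidirectional with capacity k.
    for u, v in edges:
        vertices.update((u, v))
        res[u][v] = k
        res[v][u] = k

    # Artificial source feeds every seed with infinite capacity.
    seeds = set(seeds)
    for s in seeds:
        res[source][s] = float("inf")

    # Every non-seed vertex drains into the artificial sink with unit capacity.
    for v in vertices - seeds:
        res[v][sink] = 1

    def reachable_from_source():
        parent = {source: None}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v, cap in res[u].items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        return parent

    while True:
        parent = reachable_from_source()
        if sink not in parent:
            break  # no augmenting path left: the flow is maximal
        # Walk back from the sink to collect the augmenting path.
        path, v = [], sink
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(res[u][v] for u, v in path)
        for u, v in path:
            res[u][v] -= bottleneck
            res[v][u] += bottleneck

    return set(parent) - {source}


# Hypothetical web graph: two 4-cliques joined by the single link d-e.
group_a, group_b = "abcd", "efgh"
edges = [(u, v) for g in (group_a, group_b) for i, u in enumerate(g) for v in g[i + 1:]]
edges.append(("d", "e"))
print(sorted(exact_flow_community(edges, {"a"}, 2)))
```

In this example the cheapest cut severs only the bridge between the two groups, so the community found from the seed "a" is its own densely linked group. The result depends on k: with k = 1 in the same graph, cutting the seed off from its neighbours becomes cheaper and only the seed is returned.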

  21. In biology, proteins form their own networks. • Cellular tasks are performed not by individual proteins but by groups of functionally associated proteins. • These modules are densely connected internally and overlap with each other. • CFinder is a stand-alone application used to view the overlapping modules of gene networks. • Its input is a file containing two columns of strings and a third column with the weight of the link.
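
The input format described in the last bullet is simple enough to read with a few lines of code. This sketch is not part of CFinder: it assumes the columns are separated by whitespace and, as a further assumption, gives a link a weight of 1.0 when the third column is absent. The function name and the protein labels are hypothetical.

```python
def read_links(lines):
    """Parse 'node_a node_b weight' rows into (node_a, node_b, weight) tuples."""
    links = []
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or incomplete rows
        weight = float(parts[2]) if len(parts) > 2 else 1.0  # assumed default
        links.append((parts[0], parts[1], weight))
    return links


# Hypothetical rows of a link list for a protein interaction network.
rows = ["P1 P2 0.8", "P2 P3 0.5", "", "P3 P4"]
print(read_links(rows))
```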

  22. References: • CFinder: Locating cliques and overlapping modules in biological networks • Clique percolation in random networks • Uncovering the overlapping community structure of complex networks in nature and society • Quantifying social group evolution • The critical point of k-clique percolation
