Analysis and Modeling of Social Networks Foudalis Ilias
Introduction • Online social networks have become a ubiquitous part of everyday life • Opportunity to study social interactions in a large-scale worldwide environment • Why model such networks? • Understand their evolution and formation • Improve current systems and build better applications • Advance the state of the art in closely related fields (such as diffusion of information)
Social and Information Networks • Social Networks • Mainly undirected graphs • Connect people • Node degrees are more homogeneous (people have limited capacity for social ties) • Information Networks • Tend to be directed graphs • Connect web pages or other units of information • A few nodes have an extremely large number of incoming links
Statistical characteristics of social networks • Small diameter and small average path length • Also known as the “small world phenomenon” • Clustering coefficients tend to be high • Degree distributions tend to exhibit fat tails • High-degree nodes tend to be connected with other high-degree nodes • Neighbors of a high-degree node are less likely to be connected with each other
Related work • Internet • Watts and Strogatz (1998), simple model that exhibits small-world characteristics • Barabási and Albert (1999), preferential attachment models, power-law distributions • Kumar et al. (2000), link copying model, power-law distributions • Klemm and Eguíluz (2002), preferential attachment with fertile nodes, small-world properties • Social Networks • Jackson and Rogers (2006), random meetings and local search • Kumar et al. (2006), preferential attachment, different types of nodes
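Several of the models above build on the preferential attachment rule of Barabási and Albert. The following minimal Python sketch illustrates only the growth rule; the seed clique on the first m + 1 nodes and the function name `preferential_attachment` are our own illustrative choices, not details taken from the cited papers.

```python
import random

def preferential_attachment(n, m, seed=None):
    """Grow an undirected graph with n nodes in which each new node links to
    m distinct existing nodes, chosen with probability proportional to degree."""
    if not 1 <= m < n:
        raise ValueError("need 1 <= m < n")
    rng = random.Random(seed)
    edges = set()
    # Every node appears in this list once per incident edge, so a uniform
    # pick from it is a degree-proportional pick.
    endpoints = []
    # Assumed seed graph: a clique on the first m + 1 nodes.
    for u in range(m + 1):
        for v in range(u + 1, m + 1):
            edges.add((u, v))
            endpoints.extend((u, v))
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(endpoints))
        for old in targets:
            edges.add((old, new))
            endpoints.extend((old, new))
    return edges
```

Sampling from the list of edge endpoints avoids recomputing degree weights at every step.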
Our algorithm, General Description • By default, people are members of certain groups • A person has a high chance to connect to people in the same group • People also connect to people they meet at random • To capture this effect we introduce random walks • During a random walk, a person has a higher chance to connect with social or famous people • As time passes, “older” people perform fewer random walks
Our algorithm, Group Formation • First Pass Clique Formation [Figure: nodes 1 to 6 grouped into cliques, first pass]
Our algorithm, Group Formation • Second Pass Clique Formation [Figure: nodes 1 to 6 in shuffled orders grouped into cliques, second pass]
Our Algorithm, Group Formation • Clique generation (imaginary graph) • Repeat FIRST_PASS times: • While the total number of nodes in cliques is less than N • Take m nodes and put them in a clique • m is drawn from a power-law distribution with exponent γ • Let M be the number of cliques generated in the first pass • Repeat M times: • Take m nodes and put them in a clique • m is drawn from a power-law distribution with exponent γ
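The clique-generation steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: `first_pass`, `n` and `gamma` stand for FIRST_PASS, N and γ; the size bounds `m_min`/`m_max` are hypothetical; first-pass members are taken as consecutive ids (suggested by the first-pass figure); second-pass members are drawn uniformly at random (our reading of the shuffled orders in the second-pass figure); and treating the imaginary graph as the union of all cliques is our interpretation.

```python
import random

def power_law_size(rng, gamma, m_min, m_max):
    """Draw an integer m in [m_min, m_max] with P(m) proportional to m**(-gamma)."""
    sizes = list(range(m_min, m_max + 1))
    weights = [s ** -gamma for s in sizes]
    return rng.choices(sizes, weights=weights, k=1)[0]

def generate_cliques(n, first_pass, gamma, m_min=2, m_max=20, seed=None):
    """Return a list of cliques, each a sorted list of node ids in 0..n-1."""
    rng = random.Random(seed)
    cliques = []
    # First pass, repeated first_pass times. Assumption: members are taken
    # as consecutive ids until every node belongs to a clique.
    for _ in range(first_pass):
        start = 0
        while start < n:
            m = power_law_size(rng, gamma, m_min, m_max)
            cliques.append(list(range(start, min(start + m, n))))
            start += m
    # Second pass: M more cliques, where M is the first-pass count.
    # Assumption: members are drawn uniformly at random without replacement.
    big_m = len(cliques)
    for _ in range(big_m):
        m = min(power_law_size(rng, gamma, m_min, m_max), n)
        cliques.append(sorted(rng.sample(range(n), m)))
    return cliques

def imaginary_graph(cliques):
    """Union of all cliques: every pair of nodes sharing a clique becomes an edge."""
    edges = set()
    for clique in cliques:
        for a in range(len(clique)):
            for b in range(a + 1, len(clique)):
                edges.add((min(clique[a], clique[b]), max(clique[a], clique[b])))
    return edges
```

With first_pass = 2, every node ends up in at least two cliques, so the group structure overlaps.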
Our Algorithm, Graph Generation • Connection to groups • At each time step t a node will enter the graph • The node will try to connect to all nodes with id < t with probability:
Our Algorithm, Graph Generation • Random walks • Each node with id ≤ t makes RW_TIMES attempts to start a random walk, each succeeding with probability 1/(t - id + 1) • During the random walk, node i tries to connect with node j with probability $\mathrm{social}_i \cdot \mathrm{quality}_j$ • At each step the probability to stop is (1 - 1/DEPTH)
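A minimal Python sketch of one random-walk phase, under our own assumptions: walks move to a uniformly chosen neighbor of the current node in the graph built so far, node ids start at 0, and `social` and `quality` are given per-node values in [0, 1]. The function name `random_walk_phase` is hypothetical. The sketch follows the slide literally and stops after each step with probability 1 - 1/DEPTH, which gives an expected walk length of DEPTH/(DEPTH - 1) steps; if the intent is walks of about DEPTH steps on average, the stop probability would be 1/DEPTH instead.

```python
import random

def random_walk_phase(adj, t, social, quality, rw_times, depth, rng):
    """Apply the random-walk step at time t to an undirected graph.

    adj: dict mapping node id (0..t) to a set of neighbour ids; modified in place.
    social, quality: per-node values in [0, 1] (assumed given).
    """
    if depth <= 1:
        raise ValueError("depth must exceed 1, or walks would never stop")
    stop_prob = 1 - 1 / depth  # literal reading of the slide
    for i in range(t + 1):
        for _ in range(rw_times):
            # A walk starts with probability 1/(t - i + 1): newer nodes walk more.
            if rng.random() >= 1 / (t - i + 1):
                continue
            current = i
            while adj[current]:
                current = rng.choice(sorted(adj[current]))  # one uniform step
                if (current != i and current not in adj[i]
                        and rng.random() < social[i] * quality[current]):
                    adj[i].add(current)
                    adj[current].add(i)
                if rng.random() < stop_prob:
                    break
    return adj
```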
Metrics 1/3 • Degree distribution • The relative frequencies of nodes with different degrees • Diameter and average path length • The diameter is the largest distance between any pair of nodes in the network • Distance is defined as the length of the shortest path between two nodes • The average path length is the average shortest-path length over all pairs of nodes • Betweenness centrality • Indicates how important a node is for connecting other nodes • Computed as: $B_i = \sum_{k \neq i \neq j} \frac{P_i(k,j)}{P(k,j)}$ • Where $P_i(k,j)$ denotes the number of shortest paths between k and j that pass through i, and $P(k,j)$ the total number of shortest paths between k and j
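The distance-based metrics on this slide can be computed with breadth-first search. The Python sketch below assumes an undirected, unweighted graph stored as a dict of neighbor sets, and averages distances over connected pairs only. For betweenness it uses Brandes' algorithm, a standard efficient method we substitute here, counting each unordered pair {k, j} once (a normalization convention of our choosing).

```python
from collections import deque

def bfs_distances(adj, source):
    """Hop distances from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def diameter_and_apl(adj):
    """Diameter and average path length over all connected pairs of nodes."""
    longest, total, pairs = 0, 0, 0
    for s in adj:
        for t, d in bfs_distances(adj, s).items():
            if t != s:
                longest = max(longest, d)
                total += d
                pairs += 1
    return longest, total / pairs

def betweenness(adj):
    """Brandes' algorithm for B_i = sum over pairs of P_i(k,j) / P(k,j)."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        stack, preds = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)  # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            stack.append(u)
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
                if dist[v] == dist[u] + 1:
                    sigma[v] += sigma[u]
                    preds[v].append(u)
        delta = dict.fromkeys(adj, 0.0)
        while stack:  # nodes in order of decreasing distance from s
            w = stack.pop()
            for u in preds[w]:
                delta[u] += sigma[u] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair was counted once from each endpoint.
    return {v: b / 2 for v, b in bc.items()}
```

On the path 0-1-2-3, the two inner nodes each lie on two shortest paths, so both get betweenness 2.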
Metrics 2/3 • Clustering • Indicates whether two neighbors of the same node are also connected to each other • Clustering coefficient for each node i: $C_i = \frac{2 e_i}{k_i (k_i - 1)}$, where $k_i$ is the degree of i and $e_i$ the number of edges among its neighbors • Assortativity coefficient • In real networks, the degrees at the two endpoints of an edge tend not to be independent • This dependence can be captured by computing the assortativity coefficient: • Where m is the average degree of the graph
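A Python sketch of both metrics for an undirected graph stored as a dict of neighbor sets. The local clustering follows the standard definition $C_i = 2 e_i / (k_i (k_i - 1))$, with the common convention (our assumption) that $C_i = 0$ when $k_i < 2$. For assortativity we use Newman's standard formulation, the Pearson correlation of the degrees at the two ends of each edge; its notation may differ from the formula originally shown on this slide.

```python
def clustering(adj, i):
    """Local clustering C_i = 2 e_i / (k_i (k_i - 1)); defined as 0 when k_i < 2."""
    neigh = list(adj[i])
    k = len(neigh)
    if k < 2:
        return 0.0
    # e_i: number of edges among the neighbours of i
    e = sum(1 for a in range(k) for b in range(a + 1, k)
            if neigh[b] in adj[neigh[a]])
    return 2 * e / (k * (k - 1))

def degree_assortativity(adj):
    """Pearson correlation of the degrees at the two ends of each edge.

    Each undirected edge is counted in both directions, which makes the
    result symmetric (Newman's coefficient for undirected graphs)."""
    deg = {v: len(adj[v]) for v in adj}
    xs, ys = [], []
    for u in adj:
        for v in adj[u]:
            xs.append(deg[u])
            ys.append(deg[v])
    n = len(xs)
    mean = sum(xs) / n  # xs and ys contain the same multiset of values
    cov = sum((x - mean) * (y - mean) for x, y in zip(xs, ys)) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    return cov / var if var > 0 else float("nan")
```

A star graph is perfectly disassortative (coefficient -1), since every edge joins the hub to a degree-1 leaf.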
Metrics 3/3 • Neighbor degree distribution • Average degree of the nearest neighbors of a vertex with degree k: $k_{nn}(k) = \sum_{k'} k' P(k'|k)$ • Where $P(k'|k)$ is the conditional probability that a node with degree k is connected to a node with degree k' • Positive assortativity translates into an increasing $k_{nn}(k)$ function
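A Python sketch of estimating $k_{nn}(k)$ from an undirected graph (dict of neighbor sets, our assumed representation). If $P(k'|k)$ is estimated from edge counts, $k_{nn}(k)$ reduces to the mean degree over all neighbors of all nodes with degree k; the function name is hypothetical.

```python
from collections import defaultdict

def knn_by_degree(adj):
    """Estimate k_nn(k) = sum_{k'} k' P(k'|k) for an undirected graph.

    P(k'|k) is estimated from edge endpoints, so k_nn(k) is the mean degree
    over all neighbours of all nodes of degree k."""
    deg = {v: len(adj[v]) for v in adj}
    neighbour_degree_sum = defaultdict(int)
    endpoint_count = defaultdict(int)
    for v in adj:
        for u in adj[v]:
            neighbour_degree_sum[deg[v]] += deg[u]
            endpoint_count[deg[v]] += 1
    return {k: neighbour_degree_sum[k] / endpoint_count[k]
            for k in sorted(endpoint_count)}
```

For a star with three leaves this gives {1: 3.0, 3: 1.0}, a decreasing function, as expected for a disassortative graph.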
Data Description • Facebook data from 4 large U.S. universities • The number of nodes is small compared to the full Facebook graph • Nodes represent a closed society • A much better setting in which to analyze a social network • A large sample has disadvantages • Difficult to analyze • How good is the sampling?
Results and Comparisons 1/5 • Average degree does not depend on the size of the network • All networks present positive assortativity • High-degree nodes tend to connect with other high-degree nodes • High clustering coefficients • Small diameter and average path length
Results and Comparisons 2/5 • Increasing knn(k) functions • As expected due to positive assortativity • Nodes with high degree tend to be connected to each other
Results and Comparisons 3/5 • Small betweenness values • Almost independent of node degree • No central authorities • Information flows are distributed
Results and Comparisons 4/5 • No clear power-law behavior • On a log scale, fat tails appear as expected
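Fat tails are often easier to see in the complementary cumulative distribution P(K ≥ k) than in a raw histogram, because the CCDF needs no binning. The sketch below is our own illustration, not necessarily how the plots in these slides were produced; on log-log axes its output shows a power law as a straight line.

```python
from collections import Counter

def degree_ccdf(degrees):
    """Return sorted (k, P(K >= k)) pairs for a list of node degrees."""
    n = len(degrees)
    counts = Counter(degrees)
    ccdf, remaining = [], n  # remaining = number of nodes with degree >= k
    for k in sorted(counts):
        ccdf.append((k, remaining / n))
        remaining -= counts[k]
    return ccdf
```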
Results and Comparisons 5/5 • Overall clustering is only a simple summary statistic • A clear clustering pattern emerges • High-degree nodes have small clustering coefficients • Neighbors of high-degree nodes are less likely to be connected to each other
Current Work • Analysis of information networks • Very large datasets from LiveJournal, YouTube, Flickr • As expected, their structure is different • Clear power-law distributions • Introduction of a new metric: • How close is PageRank to in-degree?
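The slides do not define the new metric. As one hypothetical way to quantify how close PageRank is to in-degree, the Python sketch below computes PageRank by power iteration (damping 0.85 and uniform redistribution of rank from dangling nodes are our assumptions) and compares the two node rankings with Spearman's rank correlation; a value near 1 means the orderings largely agree. All function names are illustrative.

```python
def pagerank(out_links, damping=0.85, iters=100):
    """Power iteration; out_links maps every node to a list of nodes it links to."""
    nodes = list(out_links)
    n = len(nodes)
    pr = dict.fromkeys(nodes, 1.0 / n)
    for _ in range(iters):
        # Rank held by dangling nodes (no out-links) is spread uniformly.
        dangling = sum(pr[v] for v in nodes if not out_links[v])
        new = dict.fromkeys(nodes, (1 - damping + damping * dangling) / n)
        for v in nodes:
            if out_links[v]:
                share = damping * pr[v] / len(out_links[v])
                for w in out_links[v]:
                    new[w] += share
        pr = new
    return pr

def in_degrees(out_links):
    """Number of incoming links for every node."""
    deg = dict.fromkeys(out_links, 0)
    for targets in out_links.values():
        for w in targets:
            deg[w] += 1
    return deg

def average_ranks(values):
    """Ranks 1..n, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for p in range(i, j + 1):
            ranks[order[p]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)
```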
Future Work • Make our model mathematically tractable • Graph evolution over time • Densification laws • Shrinking diameters • Community detection and formation • New focus on coevolutionary models
Thank you! aiw.cs.aueb.gr/projects.html