MIS 644 Social Network Analysis 2017/2018 Spring: Chapter 6 Network Models
Outline • Introduction • Random Graphs • Small World Model • Network Growth Models
Introduction • how to measure the structure of networks • making sense of network data • mathematical, statistical and computational tools • e.g., degree distribution, centrality or community structure • Next question • given a network property, e.g., the degree distribution • what effect does it have on the system? • Mathematical models of networks
Introduction (cont.) • models of structure of networks • mimic patterns of connections in real networks • understand implications of these patterns • models of processes taking place on networks • epidemics on social networks • search engines on the Web • build on top of network structure models • combining the two to shed light on • interplay between structure and dynamics • real networked systems
Random Networks • e.g., degree distribution – power law • how do the structure and behavior of scale-free (SF) networks differ from those of other networks? • on a computer, create many SF and non-SF networks artificially and examine their statistical properties • Random graph models • create networks with particular properties, e.g., a given degree distribution • but random otherwise • study structure and dynamic processes on such networks
Other Network Models • Generative models • model how the network grows – growth rules • explain how network structure arises in the first place • compare results with the real world • to see which growth processes are plausible • Small-world models • model the phenomenon of clustering • Exponential random graphs • match properties of observed networks as closely as possible
Outline • Introduction • Random Graphs • Small World Model • Network Growth Models
Random Graphs • In general, a random graph has • some parameters that take fixed values • but is random in other respects • e.g., • G(n,m): fix n, the # of vertices, and m, the # of edges • place the m edges at random • simple graph – no self-edges or multiedges • i.e., choose uniformly at random among all simple graphs with n vertices and m edges
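G(n,m) defines a probability distribution over all graphs G • $P(G) = 1/\Omega$ for each simple graph with n vertices and m edges • where $\Omega = \binom{\binom{n}{2}}{m}$ is the total # of such graphs • we study average properties of the ensemble • e.g., average path length • 1 – such averages can often be calculated exactly for large n • 2 – averages are good guides to typical properties • 3 – because for large n, many properties are sharply concentrated around their average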
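A minimal Python sketch of the G(n,m) ensemble, assuming vertices labelled 0..n−1: it draws one graph uniformly at random and counts Ω. The function names `sample_gnm` and `num_gnm_graphs` are hypothetical, chosen for illustration only.

```python
import itertools
import math
import random


def sample_gnm(n, m, rng=random):
    """Draw one graph from G(n, m): m distinct edges chosen uniformly at random
    from the C(n, 2) possible vertex pairs (so no self-edges or multiedges)."""
    all_pairs = list(itertools.combinations(range(n), 2))
    if m > len(all_pairs):
        raise ValueError("m cannot exceed C(n, 2)")
    # sampling without replacement makes every m-subset of pairs equally likely
    return rng.sample(all_pairs, m)


def num_gnm_graphs(n, m):
    """Omega: the number of simple graphs with n labelled vertices and m edges."""
    return math.comb(math.comb(n, 2), m)


rng = random.Random(42)
edges = sample_gnm(10, 15, rng)
print(len(edges))            # 15
print(num_gnm_graphs(4, 3))  # 20, so each such graph has P(G) = 1/20
```

Listing all C(n,2) pairs is fine for small n; for large sparse graphs one would sample pairs by rejection instead.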
Some properties of G(n,m) • can be calculated analytically for large n • average # of edges: m (fixed by construction) • average degree $\langle k\rangle = 2m/n$ • other properties • not so easy to calculate analytically
G(n,p): fix the probability p of an edge between each pair of vertices • # of edges is not fixed • each simple graph G with m edges appears with probability $P(G) = p^m (1-p)^{\binom{n}{2}-m}$ • Erdős–Rényi model
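The G(n,p) definition translates directly into code: visit every vertex pair once and flip a biased coin. This is a sketch assuming vertices 0..n−1; `sample_gnp` is a hypothetical name.

```python
import itertools
import random


def sample_gnp(n, p, rng=random):
    """Draw one graph from G(n, p): each of the C(n, 2) vertex pairs is joined
    independently with probability p (no self-edges or multiedges)."""
    return [(i, j) for i, j in itertools.combinations(range(n), 2)
            if rng.random() < p]


rng = random.Random(1)
edges = sample_gnp(200, 0.05, rng)
print(len(edges))  # varies from draw to draw; the mean is 0.05 * C(200, 2) = 995
```

Unlike G(n,m), the edge count here is itself random.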
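Mean Number of Edges and Mean Degree • expected # of edges: • # of graphs with n vertices and m edges: $\binom{\binom{n}{2}}{m}$ • the # of ways of picking m edges from the $\binom{n}{2}$ possible ones • each of these graphs appears with the same probability P(G) • total prob of drawing a graph with m edges: $P(m) = \binom{\binom{n}{2}}{m} p^m (1-p)^{\binom{n}{2}-m}$ • a binomial distribution • using the mean of the binomial: $\langle m\rangle = \binom{n}{2}p$ • expected total # of edges = p × possible # of edges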
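Mean degree of G(n,p) • the mean degree of G(n,m) is $\langle k\rangle = 2m/n$ • averaging over m, the mean degree of G(n,p) is $\langle k\rangle = \frac{2\langle m\rangle}{n} = \frac{2}{n}\binom{n}{2}p = (n-1)p$ • denoted c • $c = (n-1)p$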
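The two results above can be checked exactly on a small case with rational arithmetic: the edge-count distribution is binomial, its mean is C(n,2)p, and the mean degree is (n−1)p. The values n = 6 and p = 1/3 are arbitrary choices for illustration, and `edge_count_distribution` is a hypothetical helper.

```python
from fractions import Fraction
from math import comb


def edge_count_distribution(n, p):
    """P(m) for G(n, p): binomial over the C(n, 2) possible edges."""
    pairs = comb(n, 2)
    return [comb(pairs, m) * p**m * (1 - p)**(pairs - m) for m in range(pairs + 1)]


n, p = 6, Fraction(1, 3)            # small example values, exact arithmetic
dist = edge_count_distribution(n, p)
mean_edges = sum(m * prob for m, prob in enumerate(dist))
mean_degree = 2 * mean_edges / n

print(sum(dist))                    # 1
print(mean_edges, comb(n, 2) * p)   # 5 5
print(mean_degree, (n - 1) * p)     # 5/3 5/3
```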
Degree Distribution • a given vertex is connected with independent probability p to each of the n−1 other vertices • prob of connecting to k particular vertices and not connecting to the other n−1−k: $p^k(1-p)^{n-1-k}$ • there are $\binom{n-1}{k}$ ways of choosing these k vertices • binomial distribution: $p_k = \binom{n-1}{k} p^k (1-p)^{n-1-k}$
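For large n • the mean degree c is held constant as n becomes large • p = c/(n−1) → 0 as n → ∞ • so for large n, p ≈ c/n • recall $e = \lim_{n\to\infty}(1+1/n)^n$ and $\lim_{n\to\infty}(1-1/n)^n = e^{-1}$ • $(1-p)^{n-1-k} \simeq (1-p)^n = (1-c/n)^{(n/c)c} \to e^{-c}$ • also $\binom{n-1}{k} \simeq \frac{(n-1)^k}{k!}$ for large n, so $p_k \simeq \frac{(n-1)^k}{k!}p^k e^{-c} = e^{-c}\frac{c^k}{k!}$ • Poisson distribution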
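To see the Poisson limit numerically, the sketch below holds c fixed, sets p = c/(n−1), and measures the largest pointwise gap between the exact binomial p_k and the Poisson p_k over k = 0..9. The choice c = 4 and the sizes n are arbitrary; the helper names are hypothetical.

```python
from math import comb, exp, factorial


def binomial_pk(n, p, k):
    """Exact G(n, p) degree distribution: C(n-1, k) p^k (1-p)^(n-1-k)."""
    return comb(n - 1, k) * p**k * (1 - p)**(n - 1 - k)


def poisson_pk(c, k):
    """Large-n limit: e^{-c} c^k / k!."""
    return exp(-c) * c**k / factorial(k)


c = 4.0  # mean degree held fixed while n grows
gaps = []
for n in (10, 100, 10_000):
    p = c / (n - 1)
    gap = max(abs(binomial_pk(n, p, k) - poisson_pk(c, k)) for k in range(10))
    gaps.append(gap)
    print(n, f"{gap:.1e}")
```

The gap should shrink steadily as n grows, which is the sense in which the Poisson form holds "for large n".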
Clustering Coefficient • C – transitivity in a network • probability that two network neighbors of a vertex are also neighbors of each other • in G(n,p) every such probability is p, so C = p = c/(n−1) • most real networks have high C • in this respect the random graph is different from real ones • as n → ∞, C → 0
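A rough empirical check, assuming the transitivity form of C (closed paths of length two divided by all paths of length two): build one G(n,p) with mean degree c and compare its measured C with c/(n−1). The sizes n = 300 and c = 10 are illustrative, and the function names are hypothetical.

```python
import itertools
import random


def sample_gnp(n, p, rng):
    return [(i, j) for i, j in itertools.combinations(range(n), 2)
            if rng.random() < p]


def transitivity(n, edges):
    """C = (# closed paths of length two) / (# paths of length two)."""
    nbrs = [set() for _ in range(n)]
    for i, j in edges:
        nbrs[i].add(j)
        nbrs[j].add(i)
    closed = connected = 0
    for v in range(n):
        for a, b in itertools.combinations(nbrs[v], 2):  # pairs of neighbors of v
            connected += 1
            if b in nbrs[a]:
                closed += 1
    return closed / connected if connected else 0.0


n, c = 300, 10.0
p = c / (n - 1)
C = transitivity(n, sample_gnp(n, p, random.Random(3)))
print(f"measured C = {C:.4f}, predicted c/(n-1) = {p:.4f}")
```

Each triangle is counted once at each of its three corners, so this ratio equals 3 × (# triangles) / (# connected triples).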
Giant Component • G(n,p) for p = 0 • no edges at all • completely disconnected • n separate components of one vertex each • p = 1 • an n-vertex clique • a single component • size of the largest component • p = 0: size 1, independent of n • p = 1: size n, proportional to n
Giant Component (cont.) • most real networks have a large component that fills most of the network • e.g., the Internet • the random graph is a simple model • not an accurate representation of real networks such as the Internet • but worth studying to understand them • How does the size of the largest component (LC) vary as p goes from 0 to 1? • a sudden change, or phase transition • from constant size to size proportional to n at a particular value of p
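A small simulation makes the phase transition visible. As an assumption for speed, it uses G(n,m) with m = cn/2 edges (mean degree c) as a stand-in for G(n,p), and finds components with a union-find structure; n = 5000 and the c values are illustrative, and all names are hypothetical.

```python
import random


def sample_gnm_edges(n, m, rng):
    """m distinct edges chosen uniformly at random (repeats are rejected)."""
    edges = set()
    while len(edges) < m:
        i, j = rng.randrange(n), rng.randrange(n)
        if i != j:
            edges.add((min(i, j), max(i, j)))
    return edges


def largest_component_fraction(n, edges):
    parent = list(range(n))                 # union-find forest

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    for i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
    sizes = {}
    for v in range(n):
        r = find(v)
        sizes[r] = sizes.get(r, 0) + 1
    return max(sizes.values()) / n


n = 5000
rng = random.Random(12)
fractions = {}
for c in (0.5, 1.0, 1.5, 3.0):
    edges = sample_gnm_edges(n, round(c * n / 2), rng)
    fractions[c] = largest_component_fraction(n, edges)
    print(c, round(fractions[c], 3))
```

Below c = 1 the largest component should hold only a tiny fraction of the vertices; well above it, a large fraction.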
Giant Component (cont.) • a component whose size grows in proportion to n is called a giant component (GC) • calculate its size as n → ∞ • u: average fraction of vertices not belonging to the GC • equivalently, the probability that a randomly selected vertex does not belong to the GC • then vertex i is not connected to the GC via any of its edges, i.e., for every other vertex j, either • a) i is not connected to j by an edge, or • b) i is connected to j but j ∉ GC
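probability of • a) 1 − p • b) pu • total probability of not being connected to the GC via j: 1 − p + pu • total probability of not being connected to the GC via any of the n−1 other vertices: $u = (1-p+pu)^{n-1} = \left[1 - \frac{c}{n-1}(1-u)\right]^{n-1}$ • using c = p(n−1)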
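taking logs: $\ln u = (n-1)\ln\left[1 - \frac{c}{n-1}(1-u)\right] \simeq -(n-1)\frac{c}{n-1}(1-u) = -c(1-u)$ for large n (using $\ln(1-x) \simeq -x$ for small x) • taking exponentials: $u = e^{-c(1-u)}$ • with S = 1 − u, the fraction of vertices in the GC: $S = 1 - e^{-cS}$ • this equation has no closed-form solution in elementary functions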
S = 1 − u: fraction of vertices in the LC • $S = 1 - e^{-cS}$ gives the size of the giant component as a fraction of the size of the network, in the limit of large network size, for any given value of the mean degree c • no closed-form analytic solution
Figure 12.1 of N-N: Graphical solution for the size of the giant component. (a) The three curves in the left panel show $y = 1 - e^{-cS}$ for values of c as marked, the diagonal dashed line shows y = S, and the intersection gives the solution, $S = 1 - e^{-cS}$. For the bottom curve there is only one intersection, at S = 0, so there is no giant component, while for the top curve there is a solution at S = 0.583... (vertical dashed line). The middle curve is precisely at the threshold between the regime where a non-trivial solution for S exists and the regime where there is only the trivial solution S = 0. (b) The resulting solution for the size of the giant component as a function of c.
depending on the value of c there may be either one solution for S or two • if c is large enough (top curve), there are two solutions, one at S = 0 and one at S > 0; only in this regime can there be a giant component • if c is small (bottom curve), S = 0 is the only solution
The transition between the two regimes corresponds to the middle curve in the figure and falls at the point where the gradient of the curve and the gradient of the dashed line match at S = 0. That is, the transition takes place when • $\frac{d}{dS}\left(1 - e^{-cS}\right) = 1$ • or $c\,e^{-cS} = 1$
Setting S = 0, the transition takes place at c = 1 • the RG has a giant component only if c > 1 • for c ≤ 1, S = 0: no giant component • for c > 1 there are two solutions • one at S = 0 and the other at an S such that 0 < S < 1
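Since $S = 1 - e^{-cS}$ has no elementary closed-form solution, it is usually solved numerically. A minimal sketch using fixed-point iteration (one simple choice among several root-finding methods; `giant_component_size` is a hypothetical name):

```python
import math


def giant_component_size(c, tol=1e-12, max_iter=1_000_000):
    """Largest solution of S = 1 - exp(-c S), found by fixed-point iteration.

    Starting from S = 1, the iterates decrease monotonically onto the largest
    root, so for c > 1 we land on the nonzero solution rather than S = 0.
    """
    s = 1.0
    for _ in range(max_iter):
        s_new = 1.0 - math.exp(-c * s)
        if abs(s_new - s) < tol:
            return s_new
        s = s_new
    return s  # convergence is very slow exactly at c = 1


for c in (0.5, 1.5, 2.0, 4.0):
    print(c, round(giant_component_size(c), 4))
```

Starting at S = 1 matters: because $1 - e^{-cS}$ is increasing in S, the iterates never drop below the largest root, which is the physically relevant solution for c > 1. For c = 1.5 the result is S ≈ 0.583.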
Another Formulation • find a small set of connected vertices • divide the set into its core and periphery • core: vertices whose connections are only to others in the set • periphery: vertices with at least one neighbor outside the set • enlarge the set by adding all vertices connected to the periphery by one edge • the old periphery becomes part of the new core • the added vertices form the new periphery
Figure 12.2 of N-N: Growth of a vertex set in a random graph. (a) A set of vertices (inside the gray circles) consists of a core (dark gray) and a periphery (lighter). (b) If we grow the set by adding to it those vertices immediately adjacent to the periphery, then the periphery vertices become a part of the new core and a new periphery is added.
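Another Formulation (cont.) • as n → ∞ the argument becomes exact • the new periphery has size $s_{\text{new}} = c\,s_{\text{old}}$ • repeat the growth again and again • each time the periphery grows by a factor of c • if c > 1: it grows exponentially and reaches a giant component • if c < 1: it shrinks and the set remains small • so there is a GC only for c > 1, corresponding to the largest solution of $S = 1 - e^{-cS}$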
Path Lengths • small-world effect: • typical path lengths tend to be short • RG model: diameter • the longest geodesic distance in a component • the diameter grows as ln n
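Justification • the average # of vertices s steps away from a vertex is $c^s$ • this cannot exceed n, the total # of vertices • so at the largest distance $c^s \simeq n$, or equivalently • $s \simeq \ln n / \ln c$ • the diameter of the network is therefore approximately • $\ln n / \ln c$ • for real networks, this RG formula is only an approximation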
Example • acquaintance network of the world • n ≈ 7 billion • suppose each person has about c = 1000 acquaintances
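Plugging the example numbers into the diameter estimate ln n / ln c gives a worked value. This is a sketch; `rg_diameter_estimate` is a hypothetical name, and the inputs are the slide's rough figures.

```python
import math


def rg_diameter_estimate(n, c):
    """Approximate RG diameter ln n / ln c, from c**s reaching n."""
    return math.log(n) / math.log(c)


n, c = 7e9, 1000    # world acquaintance example from the slide
s = rg_diameter_estimate(n, c)
print(round(s, 2))          # 3.28
print(c ** 3 < n < c ** 4)  # True: three steps reach 1e9 people, four exceed n
```

So under the random-graph assumptions, any two people would be only about three to four acquaintance steps apart.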
Problems with the Random Graph • the Poisson RG is one of the best-studied network models • gives insight into network structures of all kinds • e.g., component sizes and diameter • simple and analytically tractable • a useful reference point for many network phenomena • but the RG has severe shortcomings • in some ways it is quite unlike real networks
1 – No transitivity or clustering • C = c/(n−1) • as n → ∞, C → 0 • even for the finite n of real networks • C predicted by the RG model is very small • e.g., the acquaintance network of the whole world population • n = 7 billion, c ≈ 1000 • the RG gives $C = 1000/(7\times 10^9) \approx 10^{-7}$ • real estimates lie between 0.01 and 0.5
2 – Degree Correlations • in RGs there is no correlation between the degrees of adjacent vertices • in real networks (RNs), degrees of adjacent vertices are correlated • Communities • RNs often show community structure • RGs have no such structure
3 – Degree Distribution • the shape of the degree distribution • RNs: right-skewed, with most vertices of low degree but a small number of high-degree hubs • RGs: Poisson degree distribution, not right-skewed to any significant extent • e.g., the degree distribution of the Internet in Fig. 12.7 of N-N • a Poisson DD cannot explain many characteristics of RNs • e.g., resilience, epidemic spreading, percolation • so generalize the RG model to non-Poisson DDs
Fig. 12.7 of N-N: Degree distribution of the Internet and a Poisson random graph. The dark bars in this plot show the fraction of vertices with the given degrees in the network representation of the Internet at the level of autonomous systems. The lighter bars represent the same measure for a random graph with the same average degree as the Internet. Even though the two distributions have the same averages, it is clear that they are entirely different in shape.
Random Graphs with Generalized Degree Distributions • in classic RGs, pairs of vertices are connected with uniform probability p • useful, but with shortcomings: • the DD is Poisson, which is not realistic • the clustering coefficient vanishes for large n • goal: create more sophisticated RG models with arbitrary DDs that remain solvable for many properties in the large-n limit
The Configuration Model • RGs with general DDs • vertices can have any DD • the Poisson DD is a restriction of the classic RG • there are different ways of defining such models • two of them are closely analogous to G(n,m) and G(n,p) • configuration model: uses a given degree sequence (DS) rather than a DD • the exact degree of each individual vertex is fixed, rather than drawn from a probability distribution • this also fixes the # of edges, $m = \frac{1}{2}\sum_i k_i$, as in G(n,m)
specify a degree $k_i$ for each vertex i = 1, ..., n • create a random network with these degrees: • give each vertex i a total of $k_i$ "stubs" (half-edges) • there are $2m = \sum_i k_i$ stubs in total • choose two of these stubs uniformly at random • create an edge connecting them • choose another pair from the remaining 2m − 2 stubs • connect them • and so on until no stubs remain • resulting network: each vertex i has exactly $k_i$ edges
Fig. 13.1 of N-N • The configuration model. Each vertex is given a number of "stubs" of edges equal to its desired degree. Then pairs of stubs are chosen at random and connected together to form edges (dotted line).
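A minimal stub-matching sketch. Instead of drawing pairs one at a time, it shuffles all 2m stubs once and pairs them off consecutively; every matching arises from the same number of orderings, so this gives the same uniform distribution over matchings. The name `configuration_model` and the degree sequence are hypothetical.

```python
import random
from collections import Counter


def configuration_model(degrees, rng=random):
    """Stub matching for a given degree sequence; returns an edge list that
    may contain self-edges and multiedges."""
    if sum(degrees) % 2:
        raise ValueError("the degrees must sum to an even number")
    stubs = [v for v, k in enumerate(degrees) for _ in range(k)]  # k_i stubs for vertex i
    rng.shuffle(stubs)  # a uniformly random ordering of all 2m stubs
    # pairing consecutive stubs gives a uniformly random matching
    return [(stubs[i], stubs[i + 1]) for i in range(0, len(stubs), 2)]


degrees = [3, 2, 2, 1, 1, 1]          # hypothetical degree sequence, sum = 10
edges = configuration_model(degrees, random.Random(7))
realized = Counter()
for u, v in edges:
    realized[u] += 1
    realized[v] += 1                  # a self-edge adds 2 to its vertex's degree
print(len(edges))                      # 5 = m = sum(degrees) / 2
print([realized[v] for v in range(len(degrees))])  # [3, 2, 2, 1, 1, 1]
```

Every vertex ends up with exactly its prescribed degree, whatever the random matching.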
any stub in the CM is equally likely to be connected to any other • minor catches: • there must be an even number of stubs, $2m = \sum_i k_i$ • i.e., the degrees must add up to an even number • self-edges and multiedges • nothing in the network generation process prevents • an edge connecting a vertex to itself • two vertices being connected by more than one edge • one could remove any such edges in the process
is the model then misleading? • some real networks have no self-edges or multiedges • but for large n the average number of these edges stays constant, so their density vanishes as n grows
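A quick experiment on the claim that the average number of self-edges and multiedges does not grow with n. As an assumption for illustration, every vertex gets degree 3. The sketch repeats the stub matching many times for two network sizes and averages the counts; here "multiedges" means extra parallel edges between distinct vertices, and all names are hypothetical.

```python
import random
from collections import Counter


def stub_matching(degrees, rng):
    stubs = [v for v, k in enumerate(degrees) for _ in range(k)]
    rng.shuffle(stubs)
    return list(zip(stubs[0::2], stubs[1::2]))


def count_self_and_multi(edges):
    """Return (# self-edges, # extra parallel edges between distinct vertices)."""
    self_edges = sum(1 for u, v in edges if u == v)
    pair_counts = Counter(tuple(sorted(e)) for e in edges if e[0] != e[1])
    multi = sum(k - 1 for k in pair_counts.values() if k > 1)
    return self_edges, multi


rng = random.Random(0)
trials = 200
results = {}
for n in (1000, 4000):
    degrees = [3] * n   # hypothetical: every vertex has degree 3
    counts = [count_self_and_multi(stub_matching(degrees, rng)) for _ in range(trials)]
    mean_self = sum(s for s, _ in counts) / trials
    mean_multi = sum(m for _, m in counts) / trials
    results[n] = (mean_self, mean_multi)
    print(n, round(mean_self, 2), round(mean_multi, 2))
```

Both averages should stay roughly the same when n is quadrupled, while the number of edges m = 3n/2 quadruples.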
A Further Issue with the CM • while all matchings of stubs appear with equal probability • this does not mean that all networks appear with the same probability • more than one matching can correspond to the same network
Figure 13.2: Eight stub matchings that all give the same network. This small network is composed of three vertices of degree two, and hence each has two stubs. The stubs are lettered to identify them and there are two distinct permutations of the stubs at each vertex for a total of eight permutations overall. Each permutation gives rise to a different matching of stub to stub but all matchings correspond to the same topological configuration of edges, and hence there are eight ways in which this particular configuration can be generated by the stub matching process.
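The figure's count can be verified by brute force: enumerate every perfect matching of the six stubs and group the matchings by the network they produce. This is an illustrative sketch; the stub labels and helper names are hypothetical.

```python
from collections import Counter


def all_matchings(stubs):
    """Yield every perfect matching of a list of stubs as a list of pairs."""
    if not stubs:
        yield []
        return
    first, rest = stubs[0], stubs[1:]
    for idx, partner in enumerate(rest):
        remaining = rest[:idx] + rest[idx + 1:]
        for matching in all_matchings(remaining):
            yield [(first, partner)] + matching


# three vertices a, b, c, each of degree 2 -> stubs a1, a2, b1, b2, c1, c2
stubs = [(v, s) for v in "abc" for s in (1, 2)]


def to_network(matching):
    """Forget stub labels: keep only the multiset of vertex-vertex edges."""
    return tuple(sorted(tuple(sorted((u[0], w[0]))) for u, w in matching))


counts = Counter(to_network(m) for m in all_matchings(stubs))
triangle = (("a", "b"), ("a", "c"), ("b", "c"))
print(sum(counts.values()))   # 15 matchings in total
print(counts[triangle])       # 8 of them give the triangle
```

The triangle arises from 8 of the 15 matchings, consistent with 2!·2!·2! = 8 stub permutations. The other 7 matchings produce networks with self-edges (and, in some cases, a double edge). Each of those networks arises from fewer matchings, so the CM does not generate all networks with equal probability.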