MIS 644 Social Network Analysis 2017/2018 Spring

Chapter 6-B: Models of Network Formation

Outline • Introduction • Preferential Attachment • Vertex Copying Models • Network Optimization Models

Introduction • Random network models (RNM) in Chapter 6 I – structural features • giant and small components, degree distribution (DD), average path length • modeling processes on networks • network resilience, spread of information or diseases on contact networks • parameters are fixed externally • n: # of vertices, m: # of edges, DD

Example • DD – power law • generate a network with a power-law DD • investigate its structural characteristics • analytically or computationally • But no explanation • why the network should have a power-law DD • A different kind of model: • offers such an explanation

Generative Network Models • Generative network models: • model the mechanisms by which networks are created • hypothesized generative mechanisms • what structure do they produce? • Compare the generated structure with the observed structure of real networks • a match is a suggestion, not a proof, • that similar mechanisms are at work in the real networks

Example Models • preferential attachment (PA) – generates a power-law DD • generalizations of PA • vertex copying models • models based on optimization

Outline • Introduction • Preferential Attachment • Vertex Copying Models • Network Optimization Models

Preferential Attachment • many real networks have a DD that is approximately power-law in the tail • E.g.: Internet, WWW, citation and some social networks • empirical data – power law – suggests interesting underlying processes • Price in the 1970s: a simple and elegant network formation model that gives rise to a power-law DD

Price's work • Citation networks of papers: Price had authored an important early paper observing a power law (PL) • H. Simon's work – economic data – wealth distributions • Explanation: people who already have money gain more at a rate proportional to how much they already have • rich get richer – power-law distribution

Assumptions of Price's model (PM) • new papers appear, citing existing papers • directed network – acyclic • no papers disappear

new papers appear, citing existing papers • c: average # of papers cited by a new paper • the average out-degree • a paper is cited with probability proportional to the # of citations it already has • the # of citations a paper gets increases with the citations it already has

at the beginning a paper has no citations • pure proportionality does not work • use citations + a constant (a) – free citations • each paper starts off with (a) citations • another interpretation: • a certain fraction of citations goes to papers chosen uniformly at random • without regard to how many citations they currently have

Initial conditions • Specify the starting state • how to initialize the model • for large n the results do not depend on the initial conditions • but start with a set of papers with no citations • acyclic – no loops • not suitable for the WWW • In-degree distribution – large n • parameters c, a • directed and undirected networks

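notation: in-degree of vertex i: $k_i^{\text{in}}$, written $q_i$ • $p_q(n)$: fraction of vertices with in-degree q when the network contains n vertices • what happens when one new vertex is added? • consider one of the citations made by that vertex • it goes to vertex i with probability $\propto q_i + a$ • probability: $\frac{q_i + a}{\sum_i (q_i + a)} = \frac{q_i + a}{n\langle q \rangle + na} = \frac{q_i + a}{n(c + a)}$ • average in-degree: $\langle q \rangle = n^{-1} \sum_i q_i$ • average out-degree: $c = \langle q \rangle$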
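The attachment probability on this slide can be sketched in Python. This is a minimal illustration, assuming the in-degrees are held in a plain list; the function name `attachment_probabilities` is hypothetical and does not come from the slides.

```python
def attachment_probabilities(in_degrees, a):
    """Return (q_i + a) / sum_j (q_j + a) for every vertex i."""
    n = len(in_degrees)
    # sum_j (q_j + a) = n<q> + n*a, which equals n(c + a) when c = <q>
    normalisation = sum(in_degrees) + n * a
    return [(q + a) / normalisation for q in in_degrees]


# Three papers with 0, 1 and 3 citations and an offset a = 1.
probs = attachment_probabilities([0, 1, 3], a=1.0)
print(probs)
```

With a = 1, the uncited paper still has probability 1/7 of being cited, which is the point of the offset: pure proportionality would give it probability zero.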

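expected # of new citations from a new paper to paper i: c × prob. of citing i $= c\,\frac{q_i + a}{n(c + a)}$ • there are $n p_q(n)$ vertices of in-degree q • expected # of citations to all vertices with in-degree q: $n p_q(n) \times c \times \frac{q + a}{n(c + a)} = \frac{c(q + a)}{c + a}\,p_q(n)$ • master equation – evolution of the in-degree distribution: • when a new vertex is added, expected # of vertices going from in-degree q−1 to q: $\frac{c(q - 1 + a)}{c + a}\,p_{q-1}(n)$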

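expected # of vertices going from in-degree q to q+1: $\frac{c(q + a)}{c + a}\,p_q(n)$ • # of vertices with in-degree q after adding the (n+1)th vertex: $(n+1)\,p_q(n+1) = n\,p_q(n) + \frac{c(q - 1 + a)}{c + a}\,p_{q-1}(n) - \frac{c(q + a)}{c + a}\,p_q(n)$ for $q \ge 1$ • first term on the RHS: # of vertices that previously had in-degree q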

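q = 0: $(n+1)\,p_0(n+1) = n\,p_0(n) + 1 - \frac{ca}{c + a}\,p_0(n)$ • the newly added vertex has in-degree 0 • no vertex has in-degree less than 0 • $n \to \infty$: asymptotic form of the in-degree distribution • notation: $p_q = p_q(\infty)$ • $p_q = \frac{c}{c + a}\left[(q - 1 + a)\,p_{q-1} - (q + a)\,p_q\right]$ for $q \ge 1$ • $p_0 = 1 - \frac{ca}{c + a}\,p_0$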
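The master equation can also be iterated numerically to watch $p_q(n)$ settle down as n grows. The sketch below assumes that, per added vertex, an expected $\frac{c(q+a)}{c+a}\,p_q(n)$ vertices move from in-degree q to q+1 and that the new vertex arrives with in-degree 0. The function name, the single-vertex starting state and the lumping of all in-degrees of at least `q_max` into the last bin are choices made for this illustration only.

```python
def iterate_price_master_equation(c, a, steps, q_max=50):
    """Iterate the in-degree master equation of Price's model.

    Returns [p_0, ..., p_{q_max}] after `steps` vertex additions, starting
    from one vertex with in-degree 0. The last entry collects every
    in-degree >= q_max (a truncation used only in this sketch).
    """
    counts = [0.0] * (q_max + 1)   # counts[q] = n * p_q(n)
    counts[0] = 1.0
    n = 1
    for _ in range(steps):
        p = [x / n for x in counts]
        # expected number of vertices moving from in-degree q to q + 1
        flow = [c * (q + a) / (c + a) * p[q] for q in range(q_max)]
        for q in range(q_max):
            counts[q] -= flow[q]
            counts[q + 1] += flow[q]
        counts[0] += 1.0           # the new vertex has in-degree 0
        n += 1
    return [x / n for x in counts]


c, a = 2.0, 1.0
p = iterate_price_master_equation(c, a, steps=2000)
# Fixed point of the q = 0 equation: p_0 = 1 - (c a / (c + a)) p_0
print(p[0], 1.0 / (1.0 + c * a / (c + a)))
```

The two printed numbers should agree closely, which illustrates the claim that the large-n limit does not depend on the starting state.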

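rearranging the q = 0 equation: $p_0 = \frac{1 + a/c}{a + 1 + a/c}$ • for $q \ge 1$: $p_q = \frac{q + a - 1}{q + a + 1 + a/c}\,p_{q-1}$ • calculate $p_q$ iteratively from $p_0$: • $p_1 = \frac{a}{a + 2 + a/c}\,\frac{1 + a/c}{a + 1 + a/c}$ • $p_2 = \frac{(a + 1)\,a}{(a + 3 + a/c)(a + 2 + a/c)}\,\frac{1 + a/c}{a + 1 + a/c}$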

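for general q: $p_q = \frac{(q + a - 1)(q + a - 2)\cdots a}{(q + a + 1 + a/c)\cdots(a + 2 + a/c)}\,\frac{1 + a/c}{a + 1 + a/c}$ • The Gamma function: $\Gamma(x) = \int_0^\infty t^{x-1} e^{-t}\,dt$ • with the property: $\Gamma(x + 1) = x\,\Gamma(x)$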

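for x > 0 • iterating this: $\frac{\Gamma(x + n)}{\Gamma(x)} = (x + n - 1)(x + n - 2)\cdots x$ • We can write: $p_q = (1 + a/c)\,\frac{\Gamma(q + a)\,\Gamma(a + 1 + a/c)}{\Gamma(a)\,\Gamma(q + a + 2 + a/c)}$ • Euler's Beta function: $B(x, y) = \frac{\Gamma(x)\,\Gamma(y)}{\Gamma(x + y)}$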

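multiplying and dividing by $\Gamma(2 + a/c) = (1 + a/c)\,\Gamma(1 + a/c)$: $p_q = \frac{\Gamma(q + a)\,\Gamma(2 + a/c)}{\Gamma(q + a + 2 + a/c)} \times \frac{\Gamma(a + 1 + a/c)}{\Gamma(a)\,\Gamma(1 + a/c)}$ • or $p_q = \frac{B(q + a,\ 2 + a/c)}{B(a,\ 1 + a/c)}$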
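As a numerical cross-check of this derivation, the sketch below evaluates the Beta-function form $p_q = B(q + a, 2 + a/c) / B(a, 1 + a/c)$ and compares it with the step-by-step recurrence $p_q = \frac{q + a - 1}{q + a + 1 + a/c}\,p_{q-1}$ started from $p_0 = \frac{1 + a/c}{a + 1 + a/c}$. The function names are hypothetical; `math.lgamma` is used so that large arguments do not overflow.

```python
from math import exp, lgamma


def log_beta(x, y):
    """Logarithm of Euler's Beta function B(x, y)."""
    return lgamma(x) + lgamma(y) - lgamma(x + y)


def price_pq_closed_form(q, c, a):
    """p_q = B(q + a, 2 + a/c) / B(a, 1 + a/c)."""
    return exp(log_beta(q + a, 2 + a / c) - log_beta(a, 1 + a / c))


def price_pq_recurrence(q_max, c, a):
    """p_0 .. p_{q_max} from the recurrence obtained from the master equation."""
    p = [(1 + a / c) / (a + 1 + a / c)]
    for q in range(1, q_max + 1):
        p.append((q + a - 1) / (q + a + 1 + a / c) * p[-1])
    return p


c, a = 3.0, 1.5
recurrence = price_pq_recurrence(10, c, a)
for q in range(11):
    print(q, recurrence[q], price_pq_closed_form(q, c, a))
```

The two columns should coincide up to floating-point rounding for any positive c and a.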

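q appears only in the first argument of the upper Beta function • Stirling's approximation for the gamma function: $\Gamma(x) \simeq \sqrt{2\pi}\,e^{-x} x^{x - 1/2}$ • so $B(x, y) \simeq x^{-y}\,\Gamma(y)$ for large x • for large q and fixed a and c: $p_q \sim (q + a)^{-\alpha} \sim q^{-\alpha}$ with $\alpha = 2 + a/c$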
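The power-law tail can be checked numerically without Stirling's formula. The sketch below estimates the local slope of $\ln p_q$ against $\ln q$ from the values at q and 2q, using only the q-dependent part $\Gamma(q + a) / \Gamma(q + a + 2 + a/c)$ of the distribution, and compares it with $\alpha = 2 + a/c$. The function name `local_exponent` is an assumption made for this illustration.

```python
from math import lgamma, log


def local_exponent(q, c, a):
    """Estimate -d(ln p_q)/d(ln q) from the values of p_q at q and 2q."""
    y = 2 + a / c

    def log_pq_up_to_constant(k):
        # ln p_k minus a k-independent constant, which cancels in the slope
        return lgamma(k + a) - lgamma(k + a + y)

    rise = log_pq_up_to_constant(2 * q) - log_pq_up_to_constant(q)
    return -rise / log(2)


for q in (10, 100, 10_000):
    print(q, local_exponent(q, c=3.0, a=1.5))   # approaches 2 + a/c = 2.5
```

The estimate is only approximately equal to the exponent at small q, which is why the slides say the distribution is a power law in the tail.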

exponent $\alpha = 2 + a/c$, always greater than 2 • if c = a, $\alpha = 3$ • relevant to empirical data • Price's model – simplifying assumptions • simplified and incomplete, ignoring • quality and relevance of papers • developments and fashions in the field • reputation of journal and author

Simulation of Price's Model • Simulating networks • implementing the rules • check analytical solutions • generate concrete examples of networks • metrics of real networks • E.g.: DD, clustering coefficient, path lengths • parameters of the simulated network model • which parameter values best reproduce the observed metrics?

Statistics • observed data • simple models – linear regression • estimate parameters • make inferences – form and test hypotheses • The same methodology applies with simulation

A simple simulation • out-degree of vertices fixed at c • select the vertices that receive edges • as a function of their in-degrees • randomly but not uniformly
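The simple approach can be sketched directly: each new vertex draws c targets with weights $q_i + a$. This is an illustrative sketch rather than the exact procedure from the lecture. It assumes an integer c, a single initial vertex (`n0=1`), targets drawn with replacement (so multi-edges can occur), and all c targets of one vertex drawn from the in-degrees as they stood before that vertex arrived. It rebuilds the weight list at every step, so the cost grows roughly as n².

```python
import random


def simulate_price_naive(n, c, a, n0=1, seed=0):
    """Grow a Price-model network and return the list of in-degrees."""
    rng = random.Random(seed)
    in_degree = [0] * n0
    while len(in_degree) < n:
        # weights q_i + a: random but not uniform
        weights = [q + a for q in in_degree]
        targets = rng.choices(range(len(in_degree)), weights=weights, k=c)
        for t in targets:
            in_degree[t] += 1
        in_degree.append(0)        # the new vertex starts with no citations
    return in_degree


degrees = simulate_price_naive(n=500, c=3, a=1.5)
print(max(degrees), sum(degrees) / len(degrees))
```

The mean in-degree is close to c, as expected from $c = \langle q \rangle$, while the largest in-degree is typically many times larger.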

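Fast way of simulating Price's Model • $\theta_i$: probability of receiving an edge for node i • with probability $\phi$, attach the edge to a vertex chosen in proportion to its in-degree • with probability $1 - \phi$, attach the edge to a uniformly selected vertex – probability 1/n • total probability: $\theta_i = \phi\,\frac{q_i}{nc} + (1 - \phi)\,\frac{1}{n} = \frac{q_i + a}{n(c + a)}$ • $\phi = c/(c + a)$ • Newman, Networks, pp. 497–8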

Figure 14.1: The vertex label list used in the simulation of Price’s model. The list (bottom) contains one entry for the target of each edge in the network (top). In this example, there are three edges that point to vertex 1 and hence there are three elements containing the number 1 in the list. Similarly there are two containing the number 2, because vertex 2 is the target of two edges. And so forth
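The vertex label list makes each edge placement a constant-time operation: picking a uniformly random entry of the list selects a vertex in proportion to its in-degree. The sketch below is one possible rendering of this idea, assuming the mixing probability c/(c + a) for the in-degree-proportional choice, an integer c, a single initial vertex, and a fallback to uniform selection while the list is still empty. The function and variable names are hypothetical.

```python
import random


def simulate_price_fast(n, c, a, n0=1, seed=0):
    """Grow a Price-model network using a list of edge targets."""
    rng = random.Random(seed)
    phi = c / (c + a)
    targets = []                   # one entry per edge: the vertex it points to
    in_degree = [0] * n
    for new_vertex in range(n0, n):
        chosen = []
        for _ in range(c):
            if targets and rng.random() < phi:
                # uniform entry of the list = vertex chosen in proportion to in-degree
                chosen.append(rng.choice(targets))
            else:
                # uniform choice among the vertices that already exist
                chosen.append(rng.randrange(new_vertex))
        for t in chosen:
            in_degree[t] += 1
        targets.extend(chosen)     # update the list after all c edges are placed
    return in_degree


degrees = simulate_price_fast(n=20_000, c=3, a=1.5)
print(degrees.count(0) / len(degrees))   # compare with p_0 = (1 + a/c) / (a + 1 + a/c)
```

For c = 3 and a = 1.5 the analytic value of $p_0$ is 0.5, so the printed fraction of uncited vertices should be close to one half.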

Figure 14.2: Degree distribution in Price’s model of a growing network. (a) A histogram of the in-degree distribution for a computer-generated network with c = 3 and a = 1.5 which was grown until it had $n = 10^8$ vertices. The simulation took about 80 seconds on the author’s computer using the fast algorithm described in the text. (b) The cumulative distribution function for the same network. The points are the results from the simulation and the solid line is the analytic solution, Eq. (14.34).

The Model of Barabasi and Albert • BA model – undirected network • vertices are added one by one • each connects to a suitably chosen set of existing vertices • connections – undirected • # of connections made by each new vertex – c (fixed) • c must be an integer • connections go to vertices with probability $\propto$ their degree $k_i$ • vertices are only added (not removed) • no vertices with k < c; the smallest degree is k = c

DD of the BA Model • can be solved with a master equation from scratch • but is equivalent to a special case of Price's model • imagine giving each added edge a direction • from the vertex just added to the existing vertex the edge connects to • this converts the network into a directed one in which each vertex has • out-degree: c • $k_i = q_i + c$, $q_i$: in-degree as before • prob. $\propto k_i = c + q_i$ • Price's model with a = c

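with a = c and $k = q + c$: $p_k = \frac{B(k, 3)}{B(c, 2)}$ for $k \ge c$ • in the limit of large q (equivalently large k): $p_k \sim k^{-3}$ • The degree distribution is given exactly by $p_k = \frac{2c(c + 1)}{k(k + 1)(k + 2)}$ • the BA model generates a degree distribution with a power-law tail that always has exponent $\alpha = 3$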
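Setting a = c in Price's result gives $p_k = B(k, 3) / B(c, 2) = \frac{2c(c + 1)}{k(k + 1)(k + 2)}$ for $k \ge c$, which needs no special functions. The sketch below evaluates this expression and checks two properties: it sums to one over $k \ge c$, and doubling k at large k divides $p_k$ by roughly $2^3$. The function name is an assumption made for this illustration.

```python
def ba_degree_distribution(k, c):
    """p_k = 2c(c+1) / (k(k+1)(k+2)) for k >= c, and 0 below the minimum degree c."""
    if k < c:
        return 0.0
    return 2 * c * (c + 1) / (k * (k + 1) * (k + 2))


c = 3
total = sum(ba_degree_distribution(k, c) for k in range(c, 100_001))
ratio = ba_degree_distribution(2000, c) / ba_degree_distribution(1000, c)
print(total)   # close to 1: the distribution is normalised
print(ratio)   # close to 2**-3 = 0.125: tail exponent 3
```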

The BA model can be simulated • by treating it as a directed network • a = c, so the uniform-attachment probability is ½ • BA does not require the offset parameter a • its DD can be written without gamma or beta functions • but it cannot match the range of real-world exponents • as $\alpha = 3$ is fixed

Extensions of PA Models • Extensions and generalizations of PA address • what happens when details of the model definition are varied • to be more faithful to how real networks behave • WWW: links are added and removed, and a link can be added at any time, not just when the vertex is created • entire web pages can disappear or appear • the PA process can be nonlinear in the degree • not all vertices are equal • some pages are more interesting or important

Addition of extra edges • Price's model – bibliographies • no edges are added after a paper is published • the WWW keeps changing • links are added and removed • yet it still has a power-law DD • Simple case: edges are added but not removed • a generalization of the BA model • vertices are added one by one • each starts with c undirected edges • attached to vertex i with probability proportional to its degree $k_i$

a second process is added to the model: • at each step some number w of extra edges is added • with both ends attaching to vertices in proportion to their degree • with n vertices – n(c + w) edges in total • w: average # of extra edges per step – can be non-integer • for every new vertex added: • c + 2w new ends of edges attach to old vertices • two ends for each of the w extra edges

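prob. that any one end of an edge attaches to vertex i: $k_i / \sum_i k_i$, with $\sum_i k_i = 2n(c + w)$ • $p_k(n)$: fraction of vertices with degree k when there are n vertices • # of vertices of degree k receiving a new edge when one vertex is added: $n\,p_k(n) \times (c + 2w) \times \frac{k}{2n(c + w)} = \frac{c + 2w}{2(c + w)}\,k\,p_k(n)$ • the master equation: • k > c: $(n+1)\,p_k(n+1) = n\,p_k(n) + \frac{c + 2w}{2(c + w)}\left[(k - 1)\,p_{k-1}(n) - k\,p_k(n)\right]$ • k = c: $(n+1)\,p_c(n+1) = n\,p_c(n) + 1 - \frac{c + 2w}{2(c + w)}\,c\,p_c(n)$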

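taking the limit of large n, with $p_k = p_k(\infty)$ • rearranging these equations: $p_k = \frac{B(k, \alpha)}{B(c, \alpha - 1)}$ • where B(x, y) is Euler's beta function and $\alpha = 2 + \frac{c}{c + 2w}$ • since B(x, y) goes as $x^{-y}$ for large x: $p_k \sim k^{-\alpha}$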

The DD is a power law with exponent $\alpha = 2 + c/(c + 2w)$ • for the special case w = 0 (no additional edges are added) • $\alpha = 3$: the BA model • w > 0: exponents in the range $2 < \alpha < 3$ • DD of the WWW – a directed network • requires generalizations of Price's model
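The effect of the extra edges on the exponent can be explored numerically. The sketch below assumes the solution has the form $p_k = B(k, \alpha) / B(c, \alpha - 1)$ with $\alpha = 2 + c/(c + 2w)$, and verifies that it satisfies the large-n master equation $p_k = A\left[(k - 1)\,p_{k-1} - k\,p_k\right]$ for k > c, where $A = \frac{c + 2w}{2(c + w)}$. All function names are hypothetical.

```python
from math import exp, lgamma


def extra_edges_exponent(c, w):
    """alpha = 2 + c / (c + 2w); equals 3 when w = 0 (the BA model)."""
    return 2 + c / (c + 2 * w)


def log_beta(x, y):
    """Logarithm of Euler's Beta function B(x, y)."""
    return lgamma(x) + lgamma(y) - lgamma(x + y)


def extra_edges_pk(k, c, w):
    """p_k = B(k, alpha) / B(c, alpha - 1) for k >= c, and 0 otherwise."""
    if k < c:
        return 0.0
    alpha = extra_edges_exponent(c, w)
    return exp(log_beta(k, alpha) - log_beta(c, alpha - 1))


c, w = 3, 1.5
A = (c + 2 * w) / (2 * (c + w))
k = 7
lhs = extra_edges_pk(k, c, w)
rhs = A * ((k - 1) * extra_edges_pk(k - 1, c, w) - k * extra_edges_pk(k, c, w))
print(extra_edges_exponent(c, w), lhs, rhs)   # lhs and rhs should agree
```

Increasing w moves the exponent from 3 down towards 2, which is the range the slide quotes.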

Removal of edges • Simple case: edges can be removed at any time but are added only at the initial creation of a vertex • General case: adding and removing at any time • removal – uniformly at random • probability that a vertex i loses an edge when a single edge is removed: • both ends of the edge vanish • prob. that one of these ends was attached to i $\propto$ total # of ends attached to i: $k_i$ • prob. of i losing an edge: $2k_i / \sum_i k_i$
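The loss probability can be verified on a small example. The sketch below assumes a simple undirected graph (no self-loops) stored as a list of vertex pairs, and computes for each vertex the probability that a uniformly chosen edge is one of its own. The function name and the example graph are made up for illustration.

```python
from collections import Counter


def loss_probabilities(edges):
    """Probability that each vertex loses an edge when one edge is removed uniformly at random."""
    m = len(edges)
    incident = Counter()
    for u, v in edges:
        incident[u] += 1
        incident[v] += 1
    # a vertex of degree k_i is touched by k_i of the m edges
    return {vertex: count / m for vertex, count in incident.items()}


edges = [(0, 1), (0, 2), (0, 3), (1, 2)]
degree_sum = 2 * len(edges)              # sum_i k_i = 2m
for vertex, prob in sorted(loss_probabilities(edges).items()):
    degree = sum(vertex in e for e in edges)
    print(vertex, prob, 2 * degree / degree_sum)
```

The last two columns agree: $k_i / m$ is the same quantity as $2k_i / \sum_i k_i$ because every edge contributes two ends.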

A vertex with degree c is added, then • an average of v edges are deleted at random • c − v > 0, so the # of edges grows • with n vertices, # of edges: n(c − v) • master eq.: • # of vertices with degree k increases: • whenever a vertex with degree k−1 gains an edge • decreases: • whenever a vertex with degree k gains a new edge

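# of vertices of degree k gaining an edge: $n\,p_k(n) \times c \times \frac{k}{2n(c - v)} = \frac{c}{2(c - v)}\,k\,p_k(n)$ • a new process: a vertex can lose an edge: • # of vertices with degree k increases: • whenever a vertex with degree k+1 loses an edge • decreases: • whenever a vertex with degree k loses an edge • # of vertices of degree k losing an edge: $n\,p_k(n) \times v \times \frac{2k}{2n(c - v)} = \frac{v}{c - v}\,k\,p_k(n)$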

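vertices can have any degree $k \ge 0$ • they can lose all of their edges • different from BA, where $k \ge c$ • master eq. for $k \neq c$: $(n+1)\,p_k(n+1) = n\,p_k(n) + \frac{c}{2(c - v)}(k - 1)\,p_{k-1}(n) + \frac{v}{c - v}(k + 1)\,p_{k+1}(n) - \frac{c + 2v}{2(c - v)}\,k\,p_k(n)$ • for k = c: $(n+1)\,p_c(n+1) = n\,p_c(n) + 1 + \frac{c}{2(c - v)}(c - 1)\,p_{c-1}(n) + \frac{v}{c - v}(c + 1)\,p_{c+1}(n) - \frac{c + 2v}{2(c - v)}\,c\,p_c(n)$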

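the two cases can be combined: $(n+1)\,p_k(n+1) = n\,p_k(n) + \delta_{kc} + \frac{c}{2(c - v)}(k - 1)\,p_{k-1}(n) + \frac{v}{c - v}(k + 1)\,p_{k+1}(n) - \frac{c + 2v}{2(c - v)}\,k\,p_k(n)$ • where $\delta_{kc}$ is the Kronecker delta: 1 if k = c, 0 otherwise • exception: for k = 0 the term proportional to $p_{k-1}$ vanishes • put $p_{-1}(n) = 0$ • then the equation applies for all $k \ge 0$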

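w extra edges per vertex addition • c + w − v edges are added, net, per new vertex • the master equation becomes: $(n+1)\,p_k(n+1) = n\,p_k(n) + \delta_{kc} + \frac{c + 2w}{2(c + w - v)}(k - 1)\,p_{k-1}(n) + \frac{v}{c + w - v}(k + 1)\,p_{k+1}(n) - \frac{c + 2w + 2v}{2(c + w - v)}\,k\,p_k(n)$ • the eq. with only edge removal is the special case w = 0 • assumption: net # of edges added > 0, i.e. v < c + w • taking the limit of large n, $p_k = p_k(\infty)$: $p_k = \delta_{kc} + \frac{c + 2w}{2(c + w - v)}(k - 1)\,p_{k-1} + \frac{v}{c + w - v}(k + 1)\,p_{k+1} - \frac{c + 2w + 2v}{2(c + w - v)}\,k\,p_k$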

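the right-hand side contains degrees k−1, k and k+1 • so we cannot simply solve for $p_k$ in terms of $p_{k-1}$ • Solution using moment generating functions • $p_k \sim k^{-\alpha}$, with $\alpha = 2 + \frac{c}{c + 2w - 2v}$ • the exponent can take values below or above 2 • at $v = \frac{1}{2}c + w$ the exponent becomes infinite • the DD is then not a power law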

for v < (1/2)c + w, close to this point • the DD is a power law with a very large exponent • for v > (1/2)c + w • nonsensical solution with negative $\alpha$ • Vertex removal rather than edge removal • solution very similar • with an exponent depending on the vertex loss rate • diverging as the rate of loss approaches the rate of vertex addition
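The regimes for edge removal can be tabulated with a short helper. The sketch assumes the exponent has the form $\alpha = 2 + \frac{c}{c + 2w - 2v}$, which is what a large-k expansion of the stationary master equation suggests; it reduces to $2 + c/(c + 2w)$ when v = 0 and diverges at $v = \frac{1}{2}c + w$. The function name and the choice to return infinity at the divergence are illustrative assumptions.

```python
def removal_exponent(c, w, v):
    """Assumed tail exponent alpha = 2 + c / (c + 2w - 2v) with edge addition and removal."""
    if v >= c + w:
        raise ValueError("the model assumes net edge growth: v < c + w")
    denominator = c + 2 * w - 2 * v
    if denominator == 0:
        return float("inf")        # v = c/2 + w: no power-law tail
    return 2 + c / denominator


print(removal_exponent(c=3, w=0, v=0))      # BA model
print(removal_exponent(c=3, w=1.5, v=0))    # extra edges only
print(removal_exponent(c=4, w=0, v=1.9))    # just below v = c/2 + w: large exponent
print(removal_exponent(c=4, w=0, v=2.5))    # above v = c/2 + w: negative value
```

A negative value cannot be a valid exponent for a normalisable distribution, which is why the slides call the solution in that regime nonsensical.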

Nonlinear Preferential Attachment • so far: the prob. that a new edge attaches to a vertex is linear in the degree of the vertex • a reasonable first assumption • but attachment processes might not be linear • Empirical evidence • Jeong et al. – growth of several real networks • growth rate depends on network size as well • so they restrict observations to relatively short periods of time • measured rates plotted as a function of vertex degree
