Exploring Small World Models in Social Networks

MIS 644 Social Network Analysis • 2015/2016 Spring • Chapter 6 III: The Small World Model

Outline • Introduction • Small world model

Introduction • Transitivity is one of the least well-understood properties of real networks • Transitivity: the propensity of two neighbors of a vertex to be neighbors of one another • Neither the random graph (RG) or configuration model (CM) nor network growth models • generate a significant level of transitivity • Measured by the clustering coefficient (CC) • E.g., in the RG the CC vanishes as n becomes large • Orders of magnitude smaller than observed for real networks

A simple triangular lattice has high transitivity • # of triangles = 2n • C(6,2) = 15 connected triples centered on each vertex, 15n in total • CC = 3 × 2n / 15n = 0.4, comparable with many real social networks • Does not depend on the size of the network

Fig. 15.1 of N-N • A triangular lattice. Any vertex in a triangular lattice, such as the one highlighted here, has six neighbors and hence C(6,2) = 15 pairs of neighbors, of which six are connected by edges, giving a clustering coefficient of 6/15 = 0.4 for the whole network, regardless of size
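The 0.4 value can be checked by direct counting. The following is a minimal sketch, assuming a lattice with periodic boundaries (so that every vertex has exactly six neighbors); the function names `triangular_lattice` and `clustering` are illustrative and not from the course material.

```python
from itertools import combinations

def triangular_lattice(side):
    """Adjacency sets for a side x side triangular lattice with periodic boundaries."""
    offsets = [(1, 0), (-1, 0), (0, 1), (0, -1), (1, -1), (-1, 1)]
    adj = {}
    for i in range(side):
        for j in range(side):
            adj[(i, j)] = {((i + di) % side, (j + dj) % side) for di, dj in offsets}
    return adj

def clustering(adj):
    """Global clustering coefficient: 3 * triangles / connected triples."""
    closed = 0   # neighbor pairs joined by an edge (each triangle is counted 3 times)
    triples = 0  # all neighbor pairs, i.e. connected triples
    for v, nbrs in adj.items():
        for a, b in combinations(nbrs, 2):
            triples += 1
            if b in adj[a]:
                closed += 1
    return closed / triples

small = clustering(triangular_lattice(6))
large = clustering(triangular_lattice(12))
print(small, large)  # both lattices should give the same value
```

The two lattice sizes are there only to illustrate that the result does not depend on n.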

Another model with high CC • Fig. 15.2a of N-N • Vertices are placed on a one-dimensional line • Each is connected to its c nearest vertices, c being even

A simple one-dimensional network model. • (a) Vertices are arranged on a line and each is connected to its c nearest neighbors, where c = 6 in this example. • (b) The same network with periodic boundary conditions applied, making the line into a circle

Traversing a “triangle” in our circle model means taking two steps forward around the circle and one step back. • The two forward steps must end at most c/2 spacings from the start, so that the step back is a single edge • # of ways of choosing the two forward steps: C(c/2,2) • = (1/2)(c/2)(c/2-1) = (1/4)c(c/2-1) triangles per vertex

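# of connected triples centered on each vertex: C(c,2) = c(c-1)/2 • Total # of connected triples: nc(c-1)/2 • Total # of triangles: nc(c/2-1)/4 • C = 3 × [nc(c/2-1)/4] / [nc(c-1)/2] = 3(c-2)/[4(c-1)] • As c varies from 2 to infinity • the CC varies from 0 to 3/4 • Does not depend on n • Still unrealistic: • a regular network in which every vertex has degree c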
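As a check on the circle-model result C = 3(c-2)/[4(c-1)], the sketch below builds the ring explicitly and counts triangles and triples. It assumes n is comfortably larger than c so that neighborhoods do not wrap around; `ring_lattice`, `clustering` and `circle_cc` are hypothetical helper names.

```python
from itertools import combinations

def ring_lattice(n, c):
    """Adjacency sets: each vertex joined to its c nearest neighbors on a ring (c even)."""
    adj = {v: set() for v in range(n)}
    for v in range(n):
        for step in range(1, c // 2 + 1):
            w = (v + step) % n
            adj[v].add(w)
            adj[w].add(v)
    return adj

def clustering(adj):
    """Global clustering coefficient: 3 * triangles / connected triples."""
    closed = 0
    triples = 0
    for v, nbrs in adj.items():
        for a, b in combinations(nbrs, 2):
            triples += 1
            if b in adj[a]:
                closed += 1
    return closed / triples

def circle_cc(c):
    """Closed-form clustering coefficient of the circle model."""
    return 3 * (c - 2) / (4 * (c - 1))

for c in (2, 4, 6, 10):
    print(c, clustering(ring_lattice(60, c)), circle_cc(c))
```

Changing the ring size from 60 to any larger n should leave the measured values unchanged, in line with the independence from n noted above.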

A more serious problem • These are large worlds: they do not display the small-world (SW) effect observed in most real networks • SW effect: the shortest path distance between most pairs of vertices is small (a few steps), even for networks of billions of nodes • E.g., the acquaintance network of the entire world population

The shortest distance between two vertices in the circle model: • The fastest move around the ring covers c/2 spacings per step • Two vertices m spacings apart are connected by a shortest path of about 2m/c steps • Averaging over all possible m from 0 to n/2 • gives ℓ = n/2c • E.g., for the acquaintance network of the world population: • n ~ O(10^9), c ~ O(10^3), so ℓ ~ O(10^6) steps • Measured ℓ: 6 to 10
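The estimate ℓ = n/2c can be compared with a breadth-first search on an actual ring. This is a minimal sketch: it searches from a single source only, which is enough here under the assumption that all vertices of the ring are equivalent by symmetry; the helper names are illustrative.

```python
from collections import deque

def ring_lattice(n, c):
    """Adjacency sets: each vertex joined to its c nearest neighbors on a ring (c even)."""
    adj = {v: set() for v in range(n)}
    for v in range(n):
        for step in range(1, c // 2 + 1):
            w = (v + step) % n
            adj[v].add(w)
            adj[w].add(v)
    return adj

def mean_distance_from(adj, source):
    """Mean shortest-path length from source to every other vertex, by BFS."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return sum(dist.values()) / (len(dist) - 1)

n, c = 600, 6
ell = mean_distance_from(ring_lattice(n, c), 0)
print(ell, n / (2 * c))  # measured value next to the n/2c estimate
```

The two numbers should agree to within about a percent; the small difference comes from rounding path lengths up to whole steps.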

By contrast, the RG captures the SW effect rather well • As do many other network models, e.g. the CM • Average shortest path ≈ ln n / ln c • For the acquaintance network: log(10^9)/log(10^3) = 9/3 = 3 • But the RG has an unrealistically low CC • The two models: RG and simple circle (SC) • Each captures one property of real networks • RG: SW effect; SC: high CC • The small-world model (SWM) of Watts and Strogatz combines the two
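The contrast between the two estimates is easy to reproduce numerically. The values of n and c below are the round orders of magnitude quoted in the slides, taken here as exact numbers purely for illustration.

```python
import math

n = 1e9  # assumed round value: order of magnitude of the world population
c = 1e3  # assumed round value: acquaintances per person

circle_estimate = n / (2 * c)                      # circle model: n/2c
random_graph_estimate = math.log(n) / math.log(c)  # random graph: ln n / ln c

print(circle_estimate, random_graph_estimate)
```

With these inputs the circle model predicts about half a million steps, while the random graph predicts about three.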

The SWM interpolates between the SC and the RG • By moving or rewiring edges from the circle to random positions • Start from a circle model (from here on abbreviated CM) of n vertices, each having c edges • Go through each of the edges in turn • With some probability p: • remove that edge and • replace it with one joining two vertices chosen uniformly at random, called a shortcut

The parameter p controls the interpolation between the circle model (CM) and the random graph model (RGM) • p=0: no edges are rewired, giving the original CM • p=1: all edges are rewired, giving the RGM • For intermediate values of p: • networks in between • For p=0, the SWM shows clustering (for c>2) but no SW effect • For p=1, the reverse: SW effect but no clustering, as in the RGM • For modest values of p: both high CC and the SW effect
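The rewiring procedure can be sketched in a few lines. This is an illustrative implementation, not necessarily the exact procedure used by Watts and Strogatz: it assumes that self-loops and duplicate edges are rejected and redrawn, which the slides do not specify, and the names `ring_edges` and `rewire` are hypothetical.

```python
import random

def ring_edges(n, c):
    """Edge set of the circle model: each vertex joined to its c nearest neighbors."""
    return {frozenset((v, (v + step) % n))
            for v in range(n) for step in range(1, c // 2 + 1)}

def rewire(n, c, p, rng):
    """Each circle edge is, with probability p, removed and replaced by an edge
    between two vertices chosen uniformly at random (a shortcut)."""
    edges = ring_edges(n, c)
    # Visit the original edges in a fixed order so that a seed gives reproducible output.
    for edge in sorted(edges, key=sorted):
        if rng.random() < p:
            edges.remove(edge)
            while True:
                a, b = rng.randrange(n), rng.randrange(n)
                shortcut = frozenset((a, b))
                # Assumption: reject self-loops and duplicates to keep the graph simple.
                if a != b and shortcut not in edges:
                    edges.add(shortcut)
                    break
    return edges

rng = random.Random(1)
network = rewire(24, 6, 0.07, rng)
print(len(network))  # rewiring moves edges, so the edge count stays at nc/2
```

Setting p = 0 returns the circle unchanged, and p = 1 replaces every circle edge with a random one.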

For analytical tractability • A variant of the original model: • Shortcut edges are added randomly • No edges are removed from the circle • p has the same meaning as in the original model • For every edge of the circle, with independent probability p, • add a shortcut between two vertices chosen uniformly at random

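Downside: • For p=1 the network is not an RG • It is an RG plus the original SC • But most interest is in small p • There the two models hardly differ • The only difference is a small number of edges around the circle • kept in the variant but absent in the original model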
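The shortcut-adding variant is even simpler to generate. The sketch below is one possible reading of the rule: it assumes that self-loops and repeated pairs are allowed in the random draw, since they are rare for large n; `add_shortcuts` is a hypothetical name.

```python
import random

def ring_edges(n, c):
    """Edge set of the circle model: each vertex joined to its c nearest neighbors."""
    return {frozenset((v, (v + step) % n))
            for v in range(n) for step in range(1, c // 2 + 1)}

def add_shortcuts(n, c, p, rng):
    """Keep every circle edge; for each one, with independent probability p,
    add a shortcut between two vertices chosen uniformly at random."""
    circle = ring_edges(n, c)
    shortcuts = []
    for _ in range(len(circle)):
        if rng.random() < p:
            # Assumption: self-loops and repeated pairs are not filtered out.
            shortcuts.append((rng.randrange(n), rng.randrange(n)))
    return circle, shortcuts

rng = random.Random(0)
circle, shortcuts = add_shortcuts(2000, 6, 0.1, rng)
print(len(circle), len(shortcuts))  # nc/2 circle edges, about ncp/2 shortcuts
```

With n = 24, c = 6 and p = 0.07 the same function would add about 72 × 0.07 ≈ 5 shortcuts on average.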

Two versions of the small-world model. • (a) In the original version of the small-world model, edges are removed from the circle with independent probability p and placed between two vertices chosen uniformly at random, creating shortcuts across the circle as shown. In this example n = 24, c = 6, and p = 0.07, so that 5 out of 72 edges are “rewired” in this fashion. • (b) In the second version of the model only the shortcuts are added and no edges are removed from the circle.

Fig. 15.3 of N-N

Degree Distribution • Degree of a vertex: c + # of shortcut edges attached to it • # of non-shortcut (circle) edges: nc/2 • For each of them, add a shortcut with probability p • There are ncp/2 shortcuts on average • Hence ncp ends of shortcuts • ncp/n = cp shortcut ends at any vertex on average • The number s of shortcuts attached to a vertex is Poisson distributed • with a mean of cp: p_s = e^(-cp) (cp)^s / s!

Total degree of a vertex: k = c + s • Putting s = k - c: p_k = e^(-cp) (cp)^(k-c) / (k-c)! for k ≥ c, and p_k = 0 for k < c • Fig 15.4 of N-N • c=6, p=1/2 • Does not mimic the DD of real networks • But the model is not intended to mimic that

Fig. 15.4 of N-N • The degree distribution of the small-world model. The frequency distribution of vertex degrees in a small-world model with parameters c = 6 and p = 1/2
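The shifted Poisson form of the degree distribution can be tabulated directly for the parameters of the figure. This is a minimal sketch; `degree_prob` is a hypothetical name, and the sum is truncated at k = 59 on the assumption that the remaining tail is negligible for cp = 3.

```python
import math

def degree_prob(k, c, p):
    """P(degree = k) in the shortcut-adding variant: Poisson with mean cp, shifted by c."""
    if k < c:
        return 0.0
    s = k - c
    return math.exp(-c * p) * (c * p) ** s / math.factorial(s)

c, p = 6, 0.5
probs = {k: degree_prob(k, c, p) for k in range(60)}
total = sum(probs.values())
mean = sum(k * q for k, q in probs.items())
print(total, mean)  # normalization and mean degree c + cp
```

No vertex has degree below c because circle edges are never removed in this variant, and the mean degree is c + cp.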

Clustering Coefficient • The CC is given by C = 3 × (# of triangles) / (# of connected triples) in the network • # of triangles: • The circle itself does not change • # of triangles in the circle: nc(c/2-1)/4 • Shortcuts create some new triangles: • Vertex pairs that are c/2+1 to c spacings apart are joined by one or more paths of length two around the circle; if such a pair is also joined by a shortcut, triangles are formed

# of such paths of length two ∝ n • Average # of shortcuts in the SWM: ncp/2 • C(n,2) = n(n-1)/2 places they can fall • Any particular pair of vertices is connected by a shortcut with probability (ncp/2) / [n(n-1)/2] = cp/(n-1) • which is just cp/n when n is large • # of paths of length two completed by shortcuts to form triangles ∝ n × cp/n = cp, a constant • For large network size these can be neglected compared to the O(n) triangles from the CM

Triangles can also be formed from two or three shortcuts • Those turn out to be negligible in number as well • To leading order in n: • # of triangles in the SWM ≈ # of triangles in the CM = nc(c/2-1)/4

# of connected triples: • All connected triples of the CM are present in the SWM: • nc(c-1)/2 • Triples from a shortcut combined with an edge of the circle: • # of shortcuts: ncp/2 • Each end of a shortcut meets c circle edges with which it can form a triple, and there are two ends • Total: (ncp/2) × c × 2 = nc^2 p connected triples

Triples formed by pairs of shortcuts: • If a vertex is attached to m shortcuts • # of triples of shortcut pairs centered on it: C(m,2) = m(m-1)/2 • Average over the Poisson distribution of m • with mean cp • Expected # of such connected triples centered at a vertex: c^2 p^2 / 2 • Total of nc^2 p^2 / 2 over all vertices • Expected # of triples of all kinds: • nc(c-1)/2 + nc^2 p + nc^2 p^2 / 2

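Substituting into the clustering equation: C = 3 × [nc(c/2-1)/4] / [nc(c-1)/2 + nc^2 p + nc^2 p^2 / 2] = 3(c-2) / [4(c-1) + 8cp + 4cp^2] • For p=0, same as for the CM: 3(c-2)/[4(c-1)] • As p increases, C becomes smaller • At p=1 it reaches its minimum value: • (3/4)(c-2)/(4c-1) • E.g., when c=6, CC = 3/23 = 0.130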
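The closed form C = 3(c-2)/[4(c-1) + 8cp + 4cp^2] can be compared with a direct count on a simulated network. The sketch below uses the shortcut-adding variant and assumes that self-loops are dropped and duplicate edges merged, which should matter little for large n; the parameter values n = 2000 and p = 0.2 are arbitrary illustrations, and all function names are hypothetical.

```python
import random
from itertools import combinations

def swm_cc(c, p):
    """Leading-order clustering coefficient of the shortcut-adding small-world model."""
    return 3 * (c - 2) / (4 * (c - 1) + 8 * c * p + 4 * c * p ** 2)

def small_world(n, c, p, rng):
    """Circle model plus one potential shortcut per circle edge, added with probability p."""
    adj = {v: set() for v in range(n)}
    for v in range(n):
        for step in range(1, c // 2 + 1):
            w = (v + step) % n
            adj[v].add(w)
            adj[w].add(v)
    for _ in range(n * c // 2):
        if rng.random() < p:
            a, b = rng.randrange(n), rng.randrange(n)
            if a != b:  # assumption: drop self-loops; duplicates merge in the sets
                adj[a].add(b)
                adj[b].add(a)
    return adj

def clustering(adj):
    """Global clustering coefficient: 3 * triangles / connected triples."""
    closed = 0
    triples = 0
    for v, nbrs in adj.items():
        for a, b in combinations(nbrs, 2):
            triples += 1
            if b in adj[a]:
                closed += 1
    return closed / triples

rng = random.Random(0)
measured = clustering(small_world(2000, 6, 0.2, rng))
print(measured, swm_cc(6, 0.2))      # simulation next to the formula
print(swm_cc(6, 0.0), swm_cc(6, 1.0))  # the two limits p = 0 and p = 1
```

The formula is a leading-order result in n, so the simulated value is expected to be close to it rather than identical.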

Contrast with the original WS-SWM • where edges are removed from the circle • CC → 0 as n → ∞ • when p=1 • since the network is then an RG • Fig 15.5 of N-N shows • the CC as a function of p for the SWM with c=6

Clustering coefficient and average path length in the small-world model. • The solid line shows the clustering coefficient, Eq. (15.7), for a small-world model with c = 6 and n = 600, as a fraction of its maximum value C_max = 0.6, plotted as a function of the parameter p. • The dashed line shows the average geodesic distance between vertices for the same model as a fraction of its maximum value ℓmax = n/2c = 50, calculated from the mean-field solution, Eq. (15.14). Note that the horizontal axis is logarithmic.

Fig. 15.5 of N-N

Average Path Lengths • Calculating the average path length, i.e. the mean geodesic distance between pairs of vertices in the SWM, • is harder than the DD or CC • No exact expression is known • Some approximate expressions exist • Simulations give reasonably accurate results

What is known about path lengths • How they scale with the model parameters • Consider the simple SWM with c=2: • each vertex is connected only to its immediate neighbors • Dimensional argument: • Let the distance covered by one edge around the circle be one unit of length (say, a meter) • Which other quantities in the model have dimensions of length? • The distance around the whole circle: n

The mean distance between the ends of shortcuts • Suppose there are s shortcuts, hence 2s ends • s = ncp/2 • Average distance ξ between ends around the circle: ξ = n/2s • Once n and ξ are specified, the entire model is specified • n and ξ give s, and s gives p, when c=2

Ratio of ℓ to n • ℓ: average shortest path • ℓ/n is a function of n and ξ, which specify the model • A ratio of two distances is dimensionless • so it can depend only on a dimensionless combination, and there is one such combination: n/ξ • ℓ/n = F(n/ξ) = F(2s) • F(x) does not depend on any of the parameters: a universal scaling function • Hence the mean geodesic path in the SWM with c=2: ℓ = nF(2s)

For larger c • Increasing c decreases the lengths of shortest paths (SPs) between vertices • Keep everything else the same: n, s • Increasing c from 2 to 4 approximately halves the SPs • Where an SP includes a shortcut, that part of the distance does not change • But if the density of shortcuts is low, most of each path runs around the circle • For general c the length of the path decreases by a factor of c/2 • For low density of shortcuts: ℓ = (2/c) n F(2s)

Alternatively • s = ncp/2 • ℓ = 2(n/c)F(ncp) • Absorbing the leading factor of 2 into the functional form • by defining f(x) = 2F(x) • ℓ = (n/c)f(ncp) • This shows how the average path length depends on the parameters • n, p, c for low values of shortcut density

We do not know the form of f(x) • Numerical simulation: • generate SW networks • measure the mean distance ℓ between vertices • Many runs with different parameters • The combination cℓ/n should be the same function of ncp • Fig 15.6 of N-N • Many networks, all following roughly the same function

Scaling function for the small-world model. • The points show numerical results for cℓ/n as a function of ncp for the small-world model with a range of parameter values n = 128 to 32768 and p = 1 × 10^-6 to 3 × 10^-2, and two different values of c as marked. • Each point is averaged over 1000 networks with the same parameter values. The points collapse, to a reasonable approximation, onto a single scaling function ƒ(ncp) in agreement with Eq. (15.13). • The dashed curve is the mean-field approximation to the scaling function given in Eq. (15.14)
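A scaled-down version of this numerical experiment can be sketched as follows. It uses the shortcut-adding variant, a single network per parameter set instead of an average over 1000, and much smaller n than in the figure, so it is only a rough illustration; the parameter choices and function names are assumptions made for this example.

```python
import random
from collections import deque

def small_world(n, c, p, rng):
    """Circle model plus one potential shortcut per circle edge, added with probability p."""
    adj = {v: set() for v in range(n)}
    for v in range(n):
        for step in range(1, c // 2 + 1):
            w = (v + step) % n
            adj[v].add(w)
            adj[w].add(v)
    for _ in range(n * c // 2):
        if rng.random() < p:
            a, b = rng.randrange(n), rng.randrange(n)
            if a != b:  # assumption: drop self-loops; duplicates merge in the sets
                adj[a].add(b)
                adj[b].add(a)
    return adj

def mean_distance(adj):
    """Mean shortest-path length over all vertex pairs, by BFS from every vertex."""
    total, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs

rng = random.Random(0)
for n, c, p in [(400, 2, 0.0), (400, 2, 0.01), (200, 4, 0.01), (400, 4, 0.05)]:
    ell = mean_distance(small_world(n, c, p, rng))
    print(n * c * p, c * ell / n)  # x = ncp and the scaled distance c*l/n
```

The second and third parameter sets share the same value ncp = 8, so their scaled distances cℓ/n should be roughly similar, up to the noise of single runs.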

Fig. 15.6 of N-N

Another approach • Calculate f(x) approximately • Series expansion, distributional, or mean-field methods • A mean-field approximation: • becomes exact when the # of shortcuts is very small or very large • In between, around x=1, it is only approximate • Shown as the dashed line in Fig 15.6 of N-N • agrees well with numerical results at the ends • less well in the middle • Enough to show that the SWM explains the SW effect

For large values of x: x >> 1, i.e. ncp >> 1 • ncp: 2 × # of shortcuts • The average ℓ increases only logarithmically with n, i.e. very slowly • for fixed p and c • Even when n is very large • the average ℓ remains small • This is the SW effect
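The slow growth of ℓ with n can be illustrated with an explicit scaling function. The slides cite the mean-field result only by equation number, so the form used below, f(x) = 2/sqrt(x^2 + 4x) × atanh(sqrt(x/(x+4))), is an assumption quoted from memory of the published mean-field solution. It is at least consistent with the limits stated here: f → 1/2 as x → 0, giving ℓmax = n/2c, and f ≈ ln(x)/x for x >> 1.

```python
import math

def f_mean_field(x):
    """Assumed mean-field scaling function: 2/sqrt(x^2+4x) * atanh(sqrt(x/(x+4)))."""
    return 2.0 / math.sqrt(x * x + 4.0 * x) * math.atanh(math.sqrt(x / (x + 4.0)))

def mean_path_length(n, c, p):
    """Estimate l = (n/c) f(ncp) under the assumed scaling function; p must be positive."""
    return (n / c) * f_mean_field(n * c * p)

print(f_mean_field(1e-9))                      # small-x limit, approaches 1/2
print(f_mean_field(1e6), math.log(1e6) / 1e6)  # large-x behaviour, close to ln(x)/x
for n in (10**3, 10**6, 10**9):
    print(n, mean_path_length(n, 6, 0.01))     # grows slowly while n grows by 10^6
```

The values c = 6 and p = 0.01 are arbitrary; the point is that a millionfold increase in n changes the estimated ℓ by only a small factor.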

It is the # of shortcuts, not the # of shortcuts per vertex, that must be large • Adding a small density of random shortcuts to a large network produces SW behavior • This explains why most real networks show the SW effect • They have long-range connections with some randomness • Very few are regular or have only short-range connections

The SWM shows not only the SW effect • but also clustering • # of shortcuts = ncp/2 • which → ∞ as n → ∞, holding p and c fixed • The CC is independent of n • so it remains non-zero as n → ∞ • In this limit, simultaneously: • non-zero clustering and the SW effect • The two are not at odds with one another

Fig 15.5 of N-N • A plot of approximate values of ℓ as a function of p • for a SWM with n=600, c=6 • along with the CC • For a substantial range of values of p • the value of ℓ is low • while the value of CC is high
