The impact of sampling and network topology on the estimation of social intercorrelations Kaibin He
Main idea • How network topology and sampling methods affect the estimation of social intercorrelations
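Spatial Model: how to estimate intercorrelation? • $y = \alpha + \beta x + \theta$, with $\theta = \rho W \theta + \varepsilon$ and $\varepsilon \sim N(0, \sigma^2 I)$ • $x$: independent variable • $y$: dependent variable • $\theta$: an N×1 vector • $\sigma^2$ is the variance parameter, $I$ is an N×N identity matrix, $\alpha$ is a constant, and $\rho$ is the social intercorrelation to be estimated • Equivalently, $y \sim N\left(\alpha + \beta x,\ \sigma^2[(I-\rho W)^\top (I-\rho W)]^{-1}\right)$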
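A minimal simulation sketch, assuming the model takes the autoregressive-error form y = α + βx + θ, θ = ρWθ + ε, ε ~ N(0, σ²I), and using a small ring network as a hypothetical row-normalized W. The names `ring_weights`, `solve_theta` and `simulate_y`, and all parameter values, are illustrative. Since θ solves (I − ρW)θ = ε, and a row-normalized W with |ρ| < 1 makes θ ← ε + ρWθ a contraction, no matrix inverse is needed.

```python
import random

def ring_weights(n):
    """Hypothetical row-normalized W for a ring: each member has 2 neighbours."""
    return [{(i - 1) % n: 0.5, (i + 1) % n: 0.5} for i in range(n)]

def solve_theta(W, rho, eps, tol=1e-12, max_iter=10_000):
    """Solve (I - rho*W) theta = eps by iterating theta <- eps + rho*W*theta."""
    theta = list(eps)
    for _ in range(max_iter):
        new = [eps[i] + rho * sum(w * theta[j] for j, w in W[i].items())
               for i in range(len(eps))]
        if max(abs(a - b) for a, b in zip(new, theta)) < tol:
            return new
        theta = new
    return theta

def simulate_y(W, x, alpha, beta, rho, sigma, rng):
    """Draw y = alpha + beta*x + theta, theta = rho*W*theta + eps, eps ~ N(0, sigma^2 I)."""
    eps = [rng.gauss(0.0, sigma) for _ in x]
    theta = solve_theta(W, rho, eps)
    return [alpha + beta * xi + ti for xi, ti in zip(x, theta)]

rng = random.Random(0)
n = 50
W = ring_weights(n)
x = [rng.uniform(0, 1) for _ in range(n)]
y = simulate_y(W, x, alpha=1.0, beta=2.0, rho=0.6, sigma=1.0, rng=rng)
```

Repeating the draw many times and re-estimating ρ on sampled subnetworks is one way such a study could be set up; the slides do not give the authors' exact simulation design.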
Spatial Model • W is an N×N matrix reflecting the network: $w_{ij} = 1/d_i$ if i and j are connected, and $w_{ij} = 0$ otherwise, where $d_i$ is the total number of connections that member i has in the network.
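A sketch of building W from an edge list, assuming undirected connections and row-normalized weights w_ij = 1/d_i for connected pairs (each row then sums to 1). The function name `build_weights` is hypothetical, and only non-zero entries are stored.

```python
from collections import defaultdict

def build_weights(edges):
    """Row-normalized W: w_ij = 1/d_i if i and j are connected, 0 otherwise.

    `edges` is an iterable of undirected (i, j) pairs; the result maps each
    member i to a dict {j: w_ij} holding only the non-zero entries.
    """
    neighbours = defaultdict(set)
    for i, j in edges:
        if i != j:                # ignore self-loops
            neighbours[i].add(j)
            neighbours[j].add(i)
    W = {}
    for i, nbrs in neighbours.items():
        d_i = len(nbrs)           # total number of connections of member i
        W[i] = {j: 1.0 / d_i for j in nbrs}
    return W

W = build_weights([(0, 1), (0, 2), (1, 2), (2, 3)])
```

Members with no connections would have an all-zero row; this sketch simply omits them. Note that W is generally not symmetric: here member 3 puts all its weight on member 2, while member 2 splits its weight three ways.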
Network Topology • Extensively studied in the literature: • Power-law network • WS (Watts-Strogatz) network • More realistic: • Power-cluster network • Real social network: Flickr.com
Scale-free: power-law distribution • A scale-free network is a network whose degree distribution follows a power law, at least asymptotically. • That is, the fraction P(k) of nodes in the network having k connections to other nodes goes, for large values of k, as $P(k) \sim k^{-\gamma}$, • where $\gamma$ is a parameter whose value is typically in the range $2 < \gamma < 3$, although occasionally it may lie outside these bounds. • Source: Wikipedia.
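One standard way to generate a power-law network is Barabási-Albert preferential attachment, whose degree exponent is γ = 3 asymptotically. The sketch below uses it for illustration only; the slides do not say which generator the study used, and the names `preferential_attachment` and `degree_distribution` are hypothetical.

```python
import random
from collections import Counter

def preferential_attachment(n, m, rng):
    """Barabasi-Albert-style growth: each new node links to m distinct existing
    nodes chosen with probability proportional to their current degree.
    Starts from a complete graph on m + 1 nodes. Returns a list of edges."""
    edges = [(i, j) for i in range(m + 1) for j in range(i + 1, m + 1)]
    # Each node appears in `ends` once per incident edge, so a uniform draw
    # from `ends` is a degree-proportional draw of a node.
    ends = [v for e in edges for v in e]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(ends))
        for t in targets:
            edges.append((new, t))
            ends.extend((new, t))
    return edges

def degree_distribution(edges):
    """Empirical P(k): fraction of nodes having exactly k connections."""
    deg = Counter(v for e in edges for v in e)
    counts = Counter(deg.values())
    total = len(deg)
    return {k: c / total for k, c in sorted(counts.items())}

rng = random.Random(1)
edges = preferential_attachment(n=2000, m=2, rng=rng)
pk = degree_distribution(edges)
```

Plotting `pk` on log-log axes should give an approximately straight line in the tail, with slope near −γ.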
Small world • A small-world network is a type of mathematical graph in which most nodes are not neighbors of one another, but most nodes can be reached from every other by a small number of hops or steps. • Specifically, a small-world network is defined to be a network where the typical distance L between two randomly chosen nodes (the number of steps required) grows proportionally to the logarithm of the number of nodes N in the network, that is: $L \propto \log N$. • Source: Wikipedia
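Checking the L ∝ log N property numerically requires the typical distance L itself. A minimal sketch, taking L as the mean shortest-path length over all reachable pairs and computing it by breadth-first search from every node; `average_path_length` is a hypothetical helper name.

```python
from collections import deque

def average_path_length(adj):
    """Mean shortest-path length L over all ordered pairs of distinct,
    mutually reachable nodes, using breadth-first search from every node."""
    total, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0
```

For a small-world network, computing L on graphs of increasing size N should show L growing roughly like log N rather than like N.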
WS network Source: Watts and Strogatz, 1998
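A simplified sketch of the Watts-Strogatz construction: start from a ring lattice where each node links to its k nearest neighbours, then rewire each edge with probability p to a random new endpoint. The function name `watts_strogatz` and the rule of keeping the smaller endpoint fixed are assumptions made for brevity, not details taken from the slides.

```python
import random

def watts_strogatz(n, k, p, rng):
    """Ring of n nodes, each joined to its k nearest neighbours (k even);
    then each edge is rewired with probability p to a uniformly chosen new
    endpoint, avoiding self-loops and duplicate edges."""
    edges = set()
    for i in range(n):
        for step in range(1, k // 2 + 1):
            edges.add(frozenset((i, (i + step) % n)))
    for e in list(edges):
        if rng.random() < p:
            i = min(e)  # keep one endpoint fixed, move the other
            candidates = [v for v in range(n)
                          if v != i and frozenset((i, v)) not in edges]
            if candidates:
                edges.remove(e)
                edges.add(frozenset((i, rng.choice(candidates))))
    return edges
```

Rewiring preserves the number of edges (n·k/2). A small p keeps most of the lattice's clustering while the few shortcuts sharply reduce the typical distance, which is the small-world regime.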
Power-cluster network • The power-cluster network has been shown, both analytically and numerically, to be a highly clustered network with a power-law degree distribution.
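"High clustering" is commonly measured by the average local clustering coefficient: for each node, the fraction of pairs of its neighbours that are themselves connected. A minimal sketch of that standard definition; the slides do not say which clustering measure the study reports, and `average_clustering` is a hypothetical name.

```python
from itertools import combinations

def average_clustering(adj):
    """Average local clustering coefficient over all nodes; nodes with fewer
    than two neighbours contribute 0. `adj` maps node -> set of neighbours."""
    total = 0.0
    for u, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            continue
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        total += links / (k * (k - 1) / 2)
    return total / len(adj)
```

A pure preferential-attachment network typically has low average clustering, whereas a power-cluster network combines a power-law degree distribution with high clustering.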
Flickr.com • Flickr.com allows a user to establish a friendship with any other user by adding that person’s name to his or her “Contact” list, which creates a link (connection) between the two users.
Sampling methods • Random sampling • Snowball sampling • Forest-fire sampling
Random sampling • Method: • Randomly select the nodes or links. • Property: A random sample does not preserve the topology of a network, especially when the sample is only a small proportion of the population.
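A sketch of random node sampling that keeps the induced subgraph (only edges whose two endpoints were both drawn), which shows why topology is lost. The helper name `random_node_sample` is hypothetical.

```python
import random

def random_node_sample(edges, nodes, n, rng):
    """Randomly select n nodes and keep only the edges between sampled nodes
    (the induced subgraph)."""
    sample = set(rng.sample(sorted(nodes), n))
    kept = [(i, j) for i, j in edges if i in sample and j in sample]
    return sample, kept

N = 1000
ring_edges = [(i, (i + 1) % N) for i in range(N)]
sample, kept = random_node_sample(ring_edges, range(N), 100, random.Random(3))
```

An edge survives only if both endpoints are drawn, with probability of roughly (n/N)², so the average degree in the sample shrinks by a factor of about n/N. A 10% sample keeps only about 1% of the edges, and most sampled members appear isolated.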
Snowball sampling • Method: • 1. Randomly choose a member, i; • 2. choose all the members that are connected to i and denote this set as {i}; • 3. choose all the members connected to {i}, excluding duplicated members; • 4. repeat until the total number reaches n.
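The snowball steps can be sketched as a wave-by-wave breadth-first expansion. Two rules here are assumptions not stated on the slide: the final wave is truncated so the sample has exactly n members, and if the seed's component is exhausted before n is reached, a new random seed restarts the snowball. `snowball_sample` is a hypothetical name.

```python
import random

def snowball_sample(adj, n, rng):
    """Snowball sampling: start from a random member and add its neighbours,
    then their neighbours, wave by wave (skipping duplicates) until n members."""
    if n > len(adj):
        raise ValueError("sample size exceeds network size")
    sample, seen = [], set()
    while len(sample) < n:
        # Step 1: a random member not yet reached starts (or restarts) the snowball.
        seed = rng.choice(sorted(v for v in adj if v not in seen))
        seen.add(seed)
        wave = [seed]
        while wave and len(sample) < n:
            take = wave[: n - len(sample)]  # truncate the final wave at n
            sample.extend(take)
            # Steps 2-3: everyone connected to the current wave, minus duplicates.
            next_wave = []
            for u in take:
                for v in sorted(adj[u]):
                    if v not in seen:
                        seen.add(v)
                        next_wave.append(v)
            wave = next_wave
    return sample
```

Because every wave includes all neighbours, a single high-degree member can fill most of the sample, which is one reason snowball samples over-represent dense neighbourhoods.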
Forest-fire sampling • Method: • 1. Randomly choose a member i; • 2. generate a random number r that is geometrically distributed with mean $p_f/(1-p_f)$, where $(1-p_f)$ is the parameter of the geometric distribution; • 3. if r = 0, add member i to the sample and randomly select another member; • 4. if r ≠ 0, select up to r of member i’s out-links, to members denoted $j_1, j_2, \dots, j_r$, and add these members to the sample; • 5. repeat until the sample size reaches n.
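A sketch of the forest-fire steps under two stated assumptions: each burned member j is itself processed in turn (the fire spreads recursively, as in the usual forest-fire sampler), and "randomly select another" means restarting from a new random member once the fire dies out. The geometric draw uses inverse-transform sampling, so P(r = k) = (1 − p_f)·p_fᵏ and E[r] = p_f/(1 − p_f). `geometric` and `forest_fire_sample` are hypothetical names.

```python
import math
import random
from collections import deque

def geometric(pf, rng):
    """Draw r in {0, 1, 2, ...} with P(r = k) = (1 - pf) * pf**k, so E[r] = pf / (1 - pf)."""
    u = 1.0 - rng.random()  # uniform on (0, 1], avoids log(0)
    return int(math.log(u) / math.log(pf))

def forest_fire_sample(out_links, n, pf, rng):
    """Forest-fire sampling; `out_links` maps every member to its out-links."""
    if not 0.0 < pf < 1.0:
        raise ValueError("pf must lie strictly between 0 and 1")
    if n > len(out_links):
        raise ValueError("sample size exceeds network size")
    sample, seen = [], set()
    while len(sample) < n:
        # Step 1 (and restart after the fire dies out): a random unvisited member.
        start = rng.choice(sorted(v for v in out_links if v not in seen))
        seen.add(start)
        burning = deque([start])
        while burning and len(sample) < n:
            i = burning.popleft()
            sample.append(i)
            r = geometric(pf, rng)  # Step 2
            # Steps 3-4: r = 0 burns nothing; otherwise burn up to r unvisited out-links.
            fresh = [j for j in sorted(out_links[i]) if j not in seen]
            burned = rng.sample(fresh, min(r, len(fresh)))
            seen.update(burned)
            burning.extend(burned)
    return sample
```

Compared with snowball sampling, only a random subset of each member's links is followed, so larger p_f makes the sample look more like a snowball sample and smaller p_f makes it closer to a set of short random walks.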
Findings • 1. ρ tends to be underestimated, with rare exceptions (Table 1C); this is the most important finding. • 2. There is a slight trade-off between the snowball and forest-fire methods. • 3. Network topology and sampling methods interact.
Why is ρ biased? • [Figure: side-by-side comparison (VS)]
Will the bias affect the inference of α and β? • Method: simulation instead of mathematical derivation • Results: the standard errors of α and β are close under the two specified variance-covariance matrices, and there is no clear pattern as to whether the standard errors are overestimated or underestimated.
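For intuition on how the assumed ρ enters the standard errors, here is a sketch of the analytical GLS formula Var(α̂, β̂) = (XᵀΣ⁻¹X)⁻¹ with X = [1, x]. It assumes the autoregressive-error model, for which Σ⁻¹ = (I − ρW)ᵀ(I − ρW)/σ², so no matrix inversion beyond 2×2 is needed. This is not the study's simulation procedure; the ring network, covariate, ρ values and the name `gls_standard_errors` are all hypothetical.

```python
import math

def gls_standard_errors(W, x, rho, sigma):
    """SEs of (alpha, beta) from Var = (X' Sigma^-1 X)^-1, where X = [1, x]
    and Sigma^-1 = (I - rho W)'(I - rho W) / sigma^2."""
    def apply_A(v):  # computes (I - rho W) v
        return [v[i] - rho * sum(w * v[j] for j, w in W[i].items())
                for i in range(len(v))]
    c1 = apply_A([1.0] * len(x))  # transformed intercept column
    c2 = apply_A(x)               # transformed covariate column
    a = sum(u * u for u in c1) / sigma**2
    b = sum(u * v for u, v in zip(c1, c2)) / sigma**2
    d = sum(v * v for v in c2) / sigma**2
    det = a * d - b * b
    # inverse of [[a, b], [b, d]] is [[d, -b], [-b, a]] / det
    return math.sqrt(d / det), math.sqrt(a / det)

n = 50
W = [{(i - 1) % n: 0.5, (i + 1) % n: 0.5} for i in range(n)]  # hypothetical ring network
x = [i / n for i in range(n)]                                  # hypothetical covariate
se_true = gls_standard_errors(W, x, rho=0.6, sigma=1.0)
se_under = gls_standard_errors(W, x, rho=0.4, sigma=1.0)       # a hypothetically underestimated rho
```

Comparing `se_true` with `se_under` mimics, in a single analytical case, the comparison of standard errors under two covariance specifications; with ρ = 0 the formula reduces to the ordinary least-squares standard errors.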
The estimation bias arises because the sampling methods cannot preserve the topology of the entire network. • How can this be demonstrated clearly?
Method to obtain an unbiased estimate • Step 1: recover • Step 2: construct new