
The impact of sampling and network topology on the estimation of social intercorrelations


Presentation Transcript


  1. The impact of sampling and network topology on the estimation of social intercorrelations Kaibin He

  2. Main idea • Network topology • Sampling methods • Estimation of social intercorrelations

  3. How to estimate intercorrelation?

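  4. Spatial Model • $y = \alpha + \rho W y + \beta x + \varepsilon$, with $\varepsilon \sim N(0, \sigma^2 I_N)$ • $x$: independent variable • $y$: dependent variable, an N×1 vector • $\varepsilon$: an N×1 error vector • $\sigma^2$: the variance parameter • $I_N$: the N×N identity matrix • $\alpha$: a constant • $\rho$: the social intercorrelation parameter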
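A minimal simulation sketch of the model, assuming it takes the standard network-autocorrelation form $y = \alpha + \rho W y + \beta x + \varepsilon$ with $\varepsilon \sim N(0, \sigma^2 I_N)$ and a single independent variable. The function name `simulate_spatial_model` and the fixed-point solver are illustrative choices, not the author's code. They avoid external libraries and converge whenever |ρ| < 1 and every row of W sums to at most 1.

```python
import random


def simulate_spatial_model(W, x, alpha, rho, beta, sigma, seed=0, tol=1e-12, max_iter=10_000):
    """Draw y from y = alpha + rho * W y + beta * x + eps, eps ~ N(0, sigma^2 I).

    W is a list of N rows (each summing to at most 1) and x a list of N values.
    The system is solved by fixed-point iteration, which converges because the
    map y -> b + rho * W y is a contraction whenever |rho| < 1.
    """
    if not abs(rho) < 1:
        raise ValueError("this sketch requires |rho| < 1")
    rng = random.Random(seed)
    n = len(x)
    # Exogenous part of the model: alpha + beta * x + eps
    b = [alpha + beta * x[i] + rng.gauss(0.0, sigma) for i in range(n)]
    y = b[:]
    for _ in range(max_iter):
        # One step of y <- b + rho * W y
        y_new = [b[i] + rho * sum(W[i][j] * y[j] for j in range(n)) for i in range(n)]
        if max(abs(u - v) for u, v in zip(y_new, y)) < tol:
            return y_new
        y = y_new
    raise RuntimeError("fixed-point iteration did not converge")
```

Setting `sigma=0.0` makes the draw deterministic, which is convenient for checking that the returned y satisfies the model equation. For large N, solving $(I - \rho W) y = \alpha + \beta x + \varepsilon$ with a linear-algebra library would be the more usual route.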

  5. Spatial Model • W is an N×N matrix reflecting the network: $W_{ij} = 1/n_i$ if members i and j are connected, and $W_{ij} = 0$ otherwise, where $n_i$ is the total number of connections that member i has in the network.
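The weight matrix can be built from a list of ties as sketched below, assuming $W_{ij} = 1/n_i$ for connected pairs and 0 otherwise (a row-normalized adjacency matrix). The sketch also assumes ties are undirected and gives isolated members a row of zeros; `row_normalized_weights` is a hypothetical helper name.

```python
def row_normalized_weights(n, edges):
    """Build the N x N matrix W with W[i][j] = 1/n_i if i and j are connected, else 0.

    n_i is the number of connections of member i; isolated members get a zero row.
    """
    neighbors = [set() for _ in range(n)]
    for i, j in edges:
        if i != j:  # ignore self-loops so the diagonal of W stays zero
            neighbors[i].add(j)
            neighbors[j].add(i)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        n_i = len(neighbors[i])
        for j in neighbors[i]:
            W[i][j] = 1.0 / n_i
    return W
```

Because each non-empty row sums to 1, $W y$ is simply the average of y over a member's contacts.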

  6. Example

  7. Network Topology • Extensively studied in the literature: • Power-law network • WS network. More realistic: • Power-cluster network • Real social network: Flickr.com

  8. Scale-free: power-law distribution • A scale-free network is a network whose degree distribution follows a power law, at least asymptotically. • That is, the fraction P(k) of nodes in the network having k connections to other nodes goes, for large values of k, as $P(k) \sim k^{-\gamma}$, • where $\gamma$ is a parameter whose value is typically in the range $2 < \gamma < 3$, although occasionally it may lie outside these bounds. • Source: Wikipedia.

  9. Source: Barabási & Albert, 1999
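A sketch of Barabási-Albert preferential attachment, the growth process commonly used to generate power-law networks, together with the empirical P(k). The seed graph (a complete graph on m + 1 nodes) and the function names are assumptions made for illustration; the slides do not say how their power-law networks were generated.

```python
import random
from collections import Counter


def barabasi_albert(n, m, seed=0):
    """Grow an undirected graph on n nodes by preferential attachment.

    Starts from a complete graph on m + 1 nodes; every later node links to m
    distinct existing nodes chosen with probability proportional to degree.
    """
    if not 1 <= m < n:
        raise ValueError("need 1 <= m < n")
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}
    for u in range(m + 1):
        for v in range(u + 1, m + 1):
            adj[u].add(v)
            adj[v].add(u)
    # Each node appears in `pool` once per tie end it owns, so a uniform draw
    # from `pool` picks a node with probability proportional to its degree.
    pool = [v for v in range(m + 1) for _ in range(m)]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(pool))
        for t in targets:
            adj[new].add(t)
            adj[t].add(new)
            pool.extend([new, t])
    return adj


def degree_distribution(adj):
    """Empirical P(k): the fraction of nodes with exactly k connections."""
    counts = Counter(len(nbrs) for nbrs in adj.values())
    return {k: c / len(adj) for k, c in sorted(counts.items())}
```

Plotting `degree_distribution(barabasi_albert(10_000, 2))` on log-log axes should give a roughly straight tail, the signature of a power law.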

  10. Small world • A small-world network is a type of mathematical graph in which most nodes are not neighbors of one another, but most nodes can be reached from every other node in a small number of hops or steps. • Specifically, a small-world network is defined to be a network where the typical distance L between two randomly chosen nodes (the number of steps required) grows proportionally to the logarithm of the number of nodes N in the network, that is: $L \propto \log N$. • Source: Wikipedia.
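The small-world property can be checked empirically by computing the average shortest-path length L on networks of increasing size N and comparing it with log N. A minimal BFS-based sketch for an unweighted, undirected graph stored as a dict of neighbour sets (a representation assumed here, not taken from the slides):

```python
from collections import deque


def average_path_length(adj):
    """Mean shortest-path length (in hops) over all connected ordered pairs, via BFS."""
    total, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1  # every reached node except the source itself
    return total / pairs if pairs else 0.0
```

Pairs in different components are skipped, so on a disconnected network the result describes reachable pairs only.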

  11. WS network Source: Watts and Strogatz, 1998
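A sketch of the Watts-Strogatz construction: start from a ring lattice and rewire each tie with probability p. The rewiring details (which endpoint moves, how self-loops and duplicate ties are avoided) follow one common variant and are assumptions, as is the function name.

```python
import random


def watts_strogatz(n, k, p, seed=0):
    """Watts-Strogatz graph: a ring of n nodes, each tied to its k nearest
    neighbours (k even), with each lattice tie rewired with probability p."""
    if k % 2 or not 0 < k < n:
        raise ValueError("k must be even with 0 < k < n")
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}
    for v in range(n):
        for step in range(1, k // 2 + 1):
            w = (v + step) % n
            adj[v].add(w)
            adj[w].add(v)
    # Rewiring pass: move the far end of tie (v, v + step) to a uniformly chosen
    # node, avoiding self-loops and duplicate ties, so the tie count is unchanged.
    for step in range(1, k // 2 + 1):
        for v in range(n):
            w = (v + step) % n
            if w in adj[v] and rng.random() < p:
                candidates = [u for u in range(n) if u != v and u not in adj[v]]
                if candidates:
                    new = rng.choice(candidates)
                    adj[v].discard(w)
                    adj[w].discard(v)
                    adj[v].add(new)
                    adj[new].add(v)
    return adj
```

With p = 0 the result is the regular lattice (high clustering, long paths); a small p such as 0.1 already shortens typical paths sharply while leaving most local clustering intact.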

  12. Power-cluster network • The power-cluster network has been shown, both analytically and numerically, to be a highly clustered network with a power-law degree distribution.
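The slides do not say which model produces the power-cluster network. One widely used construction of this kind is the Holme-Kim model, which adds a triad-formation step to preferential attachment (NetworkX exposes it as `powerlaw_cluster_graph`). The stdlib sketch below assumes that model; `pt` is the probability of a triad-formation step, and the function name is hypothetical.

```python
import random


def powerlaw_cluster(n, m, pt, seed=0):
    """Holme-Kim style growth: preferential attachment plus triad formation.

    Each new node makes m distinct ties. The first is preferential; each later
    tie is, with probability pt, a triad-formation tie to a random neighbour of
    the last preferential target (closing a triangle), else another
    preferential tie.
    """
    if not 1 <= m < n:
        raise ValueError("need 1 <= m < n")
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}
    pool = []  # one entry per tie end, so uniform draws are degree-proportional

    def link(a, b):
        adj[a].add(b)
        adj[b].add(a)
        pool.extend([a, b])

    for u in range(m + 1):  # seed: complete graph on m + 1 nodes
        for v in range(u + 1, m + 1):
            link(u, v)
    for new in range(m + 1, n):
        anchor = rng.choice(pool)
        link(new, anchor)
        while len(adj[new]) < m:
            if rng.random() < pt:
                options = [w for w in adj[anchor] if w != new and w not in adj[new]]
                if options:
                    link(new, rng.choice(options))  # triad formation
                    continue
            candidate = rng.choice(pool)  # preferential attachment
            if candidate != new and candidate not in adj[new]:
                anchor = candidate
                link(new, candidate)
    return adj
```

With pt = 0 this reduces to plain preferential attachment; raising pt adds triangles, and hence clustering, without changing how many ties each new node makes.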

  13. Flickr.com • Flickr.com allows a user to establish a friendship with any other user by adding that person’s name to his or her “Contact” list, which creates a link (connection) between the two users.

  14. Sampling methods • Random sampling • Snowball sampling • Forest-fire sampling

  15. Random sampling • Method: • Randomly select the nodes or links. • Property: A random sample does not preserve the topology of a network, especially when the sample is only a small proportion of the population.
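Both variants of random sampling named on the slide (nodes or links) can be sketched in a few lines, assuming the network is an undirected dict of neighbour sets. Node sampling keeps only ties whose two ends were both drawn, which is why small samples lose most of the structure. The function names are hypothetical.

```python
import random


def random_node_sample(adj, n_sample, seed=0):
    """Random node sampling: pick n_sample members uniformly without replacement
    and keep only the ties among them (the induced subgraph)."""
    rng = random.Random(seed)
    chosen = set(rng.sample(sorted(adj), n_sample))
    return {v: adj[v] & chosen for v in chosen}


def random_link_sample(adj, n_links, seed=0):
    """Random link sampling: pick n_links undirected ties uniformly and keep
    their endpoints."""
    rng = random.Random(seed)
    links = sorted({(min(u, v), max(u, v)) for u in adj for v in adj[u]})
    sample = {}
    for u, v in rng.sample(links, n_links):
        sample.setdefault(u, set()).add(v)
        sample.setdefault(v, set()).add(u)
    return sample
```

On a ring of 50 members every member has two ties, yet a 10-member node sample typically retains only a few of them, a direct illustration of the property stated on the slide.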

  16. Example

  17. Snowball sampling • Method: • 1) randomly choose a member, i; • 2) choose all the members connected to i, and denote this set as {i}; • 3) choose all the members connected to {i}, excluding duplicated members; • 4) repeat until the total number reaches n.
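The snowball steps above amount to a breadth-first expansion from a random member. A minimal sketch, assuming an undirected dict of neighbour sets; how the final wave is cut down to exactly n members is not specified on the slide, so the random truncation here is an assumption, as is the function name.

```python
import random


def snowball_sample(adj, n_sample, seed=0):
    """Snowball sample of up to n_sample members from an undirected graph.

    Starts at a random member i, then adds successive waves: the members linked
    to i, then those linked to that set, skipping anyone already reached. The
    last wave is cut at random so the sample stops at n_sample members.
    """
    rng = random.Random(seed)
    start = rng.choice(sorted(adj))
    sampled, reached = [start], {start}
    wave = [start]
    while wave and len(sampled) < n_sample:
        next_wave = []
        for u in wave:
            for v in sorted(adj[u]):
                if v not in reached:  # exclude duplicated members
                    reached.add(v)
                    next_wave.append(v)
        rng.shuffle(next_wave)
        sampled.extend(next_wave[: n_sample - len(sampled)])
        wave = next_wave
    members = set(sampled)
    # Keep only the ties among sampled members (induced subgraph)
    return {v: adj[v] & members for v in members}
```

If the starting member's component has fewer than n members the sketch returns the whole component; restarting from another random member would be one way to top the sample up.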

  18. Example

  19. Forest-fire sampling • Method: • 1) randomly choose a member i; • 2) generate a random number r that is geometrically distributed with mean $p_f/(1-p_f)$, where $1-p_f$ is the parameter of the geometric distribution; • 3) if r = 0, add member i to the sample, then randomly select another member and draw a new r; • 4) if r ≠ 0, select up to r of member i’s out-links, to members denoted $j_1, j_2, \ldots, j_r$, and add these members to the sample; • 5) repeat until the sample size reaches n.
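A sketch of the forest-fire steps, assuming `adj` maps each member to the set of members it links to. Drawing r as the number of failures before the first success, with success probability $1-p_f$, gives exactly the mean $p_f/(1-p_f)$ stated on the slide. Letting every newly added member spread the fire in turn follows the commonly used forest-fire sampler; the slide does not state this, so treat it, the default `pf=0.7`, and the function names as assumptions.

```python
import random
from collections import deque


def geometric_count(rng, pf):
    """Draw r with P(r = k) = pf**k * (1 - pf), so that E[r] = pf / (1 - pf)."""
    r = 0
    while rng.random() < pf:
        r += 1
    return r


def forest_fire_sample(adj, n_sample, pf=0.7, seed=0):
    """Forest-fire sample of n_sample members (capped at the network size)."""
    if not 0 <= pf < 1:
        raise ValueError("pf must lie in [0, 1)")
    rng = random.Random(seed)
    nodes = sorted(adj)
    n_sample = min(n_sample, len(nodes))
    sampled = set()
    while len(sampled) < n_sample:
        # Step 1: ignite a new fire at a random member not yet in the sample
        start = rng.choice([v for v in nodes if v not in sampled])
        sampled.add(start)
        burning = deque([start])
        while burning and len(sampled) < n_sample:
            i = burning.popleft()
            r = geometric_count(rng, pf)  # Step 2
            if r == 0:
                continue  # Step 3: the fire stops spreading from member i
            # Step 4: burn up to r links from i to members not yet sampled
            fresh = [j for j in sorted(adj[i]) if j not in sampled]
            rng.shuffle(fresh)
            for j in fresh[: min(r, n_sample - len(sampled))]:
                sampled.add(j)
                burning.append(j)  # assumption: each burned member spreads the fire too
    return {v: adj[v] & sampled for v in sampled}
```

When every burning branch dies out before n members are reached, the outer loop starts a fresh fire at another random member, which corresponds to "randomly select another member" in step 3.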

  20. Simulation

  21. Simulation

  22. Findings • 1) $\rho$ tends to be underestimated, with rare exceptions (see Table 1C); this is the most important finding. • 2) There is a slight trade-off between the snowball and forest-fire methods. • 3) Network topology and sampling method interact.

  23. Why is $\rho$ biased? • VS

  24. Why is $\rho$ most likely to be underestimated? • VS

  25. Will the bias affect inference on $\alpha$ and $\beta$? • Method: simulation rather than analytical derivation • Results: the standard errors of $\alpha$ and $\beta$ are close under the two specified variance-covariance matrices, and there is no clear pattern as to whether the standard errors are overestimated or underestimated.

  26. The estimation bias arises because the sampling methods cannot preserve the topology of the entire network. • How can this be demonstrated clearly?
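One hedged way to show that a sample fails to preserve the topology is to compare summary statistics of the sampled (induced) network with those of the full network, for example mean degree and average clustering coefficient. The sketch below illustrates that idea and is not necessarily how the presentation demonstrates it; all function names are hypothetical, and networks are assumed to be undirected dicts of neighbour sets.

```python
def mean_degree(adj):
    """Average number of ties per member."""
    return sum(len(s) for s in adj.values()) / len(adj)


def average_clustering(adj):
    """Mean local clustering coefficient; members with fewer than 2 ties count as 0."""
    total = 0.0
    for v, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            continue
        nb = list(nbrs)
        # Count ties among v's neighbours (each unordered pair checked once)
        links = sum(1 for a in range(k) for b in range(a + 1, k) if nb[b] in adj[nb[a]])
        total += 2.0 * links / (k * (k - 1))
    return total / len(adj)


def topology_gap(full, sample):
    """Sample-minus-population differences in mean degree and clustering."""
    return {
        "mean_degree": mean_degree(sample) - mean_degree(full),
        "clustering": average_clustering(sample) - average_clustering(full),
    }
```

Negative gaps indicate that the sample is sparser or less clustered than the network it came from; tabulating these gaps across sampling methods and topologies would show which combinations distort the structure most.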

  27. Regression analysis

  28. Method to obtain an unbiased estimate • Step 1: recover • Step 2: construct new
