
Multigraph Sampling of Online Social Networks

Minas Gjoka, Carter Butts, Maciej Kurant, Athina Markopoulou.


Presentation Transcript


  1. Multigraph Sampling of Online Social Networks. Minas Gjoka, Carter Butts, Maciej Kurant, Athina Markopoulou

  2. Outline
     • Multigraph sampling
     • Motivation
     • Sampling method
     • Internet Measurements
     • Conclusion

  3. Problem statement: Obtain a representative sample of OSN users by exploration of the social graph. [Figure: example social graph with nodes A to I]

  4. Motivation for multiple relations
     • Principled methods for graph sampling
       • Metropolis Hastings Random Walk
       • Re-weighted Random Walk
       "Walking in Facebook: A Case Study of Unbiased Sampling of OSNs," INFOCOM '10
     • But graph characteristics affect mixing and convergence
       • fragmented social graph
       • highly clustered areas
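
The two methods named on this slide can be sketched as follows. This is a minimal illustration on a hypothetical five-node graph (`ADJ`), not the authors' crawler: `simple_random_walk` plus `rwrw_mean` shows the re-weighted random walk idea (samples from a plain random walk are weighted by 1/degree), and `mhrw_step` shows one Metropolis-Hastings step targeting the uniform distribution. All function names and the graph are assumptions made for the example.

```python
import random

# Hypothetical toy graph (adjacency lists); connected and non-bipartite.
ADJ = {0: [1, 2, 3, 4], 1: [0, 2], 2: [0, 1], 3: [0], 4: [0]}


def simple_random_walk(adj, start, steps, rng):
    """Plain random walk: visits nodes with probability proportional to degree."""
    node, walk = start, []
    for _ in range(steps):
        node = rng.choice(adj[node])
        walk.append(node)
    return walk


def rwrw_mean(walk, adj, f):
    """Re-weighted estimate of the mean of f over nodes.

    Each sample is weighted by 1/degree to undo the degree bias of the walk.
    """
    weights = [1.0 / len(adj[v]) for v in walk]
    return sum(w * f(v) for w, v in zip(weights, walk)) / sum(weights)


def mhrw_step(adj, node, rng):
    """One Metropolis-Hastings step whose stationary distribution is uniform."""
    candidate = rng.choice(adj[node])
    # Accept with probability min(1, deg(node) / deg(candidate)).
    if rng.random() < len(adj[node]) / len(adj[candidate]):
        return candidate
    return node
```

On this toy graph the true mean degree is 2, while the unweighted average over a plain random walk drifts toward 2.6 because high-degree nodes are visited more often; `rwrw_mean(walk, ADJ, lambda v: len(ADJ[v]))` corrects for that.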

  5. Fragmented social graph [Figure: Largest Connected Component vs. Other Connected Components for Friendship, Event attendance, Group membership, and their Union]

  6. Highly clustered social graph [Figure: Friendship, Event attendance, and their Union]

  7. Proposal
     • Graph exploration using multiple user relations
       • perform random walk
       • re-weighting at the end of the walk
       • online convergence diagnostics applicable
     • Theoretical benefits
       • faster mixing
       • discovery of isolated components
     • Open questions
       • how to combine relations
       • implementation efficiency
       • evaluation of sampling benefits in a realistic scenario
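
As one illustration of an online convergence diagnostic, the sketch below computes a simplified Geweke-style z-score over a scalar trace collected along the walk (for example, the degree of each visited node). The slide does not say which diagnostic is used, so this choice is an assumption. The sketch also uses plain sample variances, whereas the full Geweke diagnostic uses spectral variance estimates; `geweke_z` and its default window fractions are assumptions for the example.

```python
import math
from statistics import mean, variance


def geweke_z(trace, first=0.1, last=0.5):
    """Compare the mean of an early window with the mean of a late window.

    A |z| well above roughly 2 suggests the walk has not yet converged.
    Simplification: uses sample variances instead of spectral estimates.
    """
    n = len(trace)
    early = trace[: int(first * n)]
    late = trace[n - int(last * n):]
    std_err = math.sqrt(variance(early) / len(early) + variance(late) / len(late))
    return (mean(early) - mean(late)) / std_err
```

Because the score only needs the trace collected so far, it can be recomputed while the crawl is still running and used as a stopping hint.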

  8. [Figure: the same node set (A to K) shown under three relations: Friends, Events, Groups]

  9. [Figure: the node set (A to K) shown in four panels; recovered panel labels: Friends, Events, Groups]

  10. Combination of multiple relations
     • G = Friends + Events + Groups (G is a union graph)
     • G* = Friends + Events + Groups (G* is a union multigraph)
     • Degrees of node F: deg(F, tot) = 8, deg(F, red) = 1, deg(F, blue) = 3, deg(F, green) = 4
     [Figure: nodes A to K drawn as the union graph G and as the union multigraph G*, with node F highlighted]
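
The difference between the union graph and the union multigraph can be made concrete with a small sketch. The neighbor lists below are hypothetical; they are chosen only so that the per-relation degrees of F match the slide (1, 3, and 4, summing to 8). The slide does not say which color maps to which relation, so the keys are left as colors.

```python
# Hypothetical neighbor lists of node F, one entry per relation (color).
RELATIONS = {
    "red": {"F": ["A"]},
    "blue": {"F": ["A", "B", "C"]},
    "green": {"F": ["B", "C", "D", "E"]},
}


def per_relation_degrees(relations, node):
    """Degree of the node within each individual relation."""
    return {name: len(adj.get(node, [])) for name, adj in relations.items()}


def multigraph_degree(relations, node):
    """Degree in the union multigraph G*: parallel edges are all counted."""
    return sum(per_relation_degrees(relations, node).values())


def union_graph_degree(relations, node):
    """Degree in the union graph G: each distinct neighbor counts once."""
    neighbors = set()
    for adj in relations.values():
        neighbors.update(adj.get(node, []))
    return len(neighbors)
```

With these lists, F has degree 8 in G* but only 5 in G, because A, B, and C are each reachable through two relations. The multigraph degree is a plain sum of per-relation degrees, so it needs no neighbor enumeration or de-duplication.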

  11. Multigraph sampling: Implementation efficiency
     • Degree information available without enumeration
     • Take advantage of pages functionality
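
One way to read these two points is that a random-walk step on the union multigraph needs only the per-relation degree counts plus the ability to fetch a single entry of a paginated neighbor list. The sketch below assumes exactly that; `fetch_neighbor(relation, index)` is a hypothetical callback standing in for a paged lookup and is not a real Last.fm API call.

```python
import random


def sample_neighbor(degrees, fetch_neighbor, rng):
    """Pick one edge of the union multigraph uniformly at random.

    degrees: dict mapping relation name -> degree of the current node.
    fetch_neighbor: hypothetical callback (relation, index) -> neighbor id.
    Equivalent to choosing a relation with probability proportional to its
    degree, then a uniform position within that relation's neighbor list.
    """
    total = sum(degrees.values())
    if total == 0:
        raise ValueError("node has no neighbors in any relation")
    r = rng.randrange(total)
    for relation, degree in degrees.items():
        if r < degree:
            return relation, fetch_neighbor(relation, r)
        r -= degree
    raise AssertionError("unreachable: r is always below the total degree")
```

Only one neighbor entry is fetched per step, so the cost of a step does not grow with the size of a large group or event.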

  12. Multigraph sampling: Internet Measurements
     • Last.fm, an Internet radio service
       • social networking features
       • multiple relations
       • fragmented graph components and highly clustered users expected
     • Last.fm relations used
       • Friends
       • Groups
       • Events
       • Neighbors

  13. Data Collection: Sampled node information
     • Crawling using Last.fm API and HTML scraping
     • Per-node fields: userID, country, age, registration time, …

  14. Summary of datasets: Last.fm, July 2010. Friends: 0.3%, Events: 5.4%, Groups: 94.2%, Neighbors: 0.02%

  15. Comparison to UNI [Figure: plots with axis label "% of Subscribers"]

  16. Last.fm Charts Estimation: Application of sampling

  17. Last.fm Charts Estimation: Artist Charts

  18. Related Work
     • Fastest mixing Markov Chain
       • Boyd et al., SIAM Review 2004
     • Sampling in fragmented graphs
       • Ribeiro et al., Frontier Sampling, IMC 2010
     • Last.fm studies
       • Konstas et al., SIGIR '09
       • Schifanella et al., WSDM '10

  19. Conclusion
     • Introduced multigraph sampling
       • simple and efficient
       • discovers isolated components
       • better approximation of distributions and means
       • multigraph dataset planned for public release
     • Future work on multigraph sampling
       • selection of relations
       • weighted relations

  20. Thank you. Questions?
