
Epidemics in Blogspace


Presentation Transcript


  1. Epidemics in Blogspace Hasan T Karaoglu

  2. Outline • Introduction • Blogs are different! • Methods are different! • Contents are different! • Some methods on Some Content of Some Blogs • Discussion

  3. Introduction • Blogs are a popular way to • share personal journals, • discuss matters of public opinion, • have collaborative conversations, • aggregate content on similar topics. • Blogs also disseminate • new content • novel ideas • How does content spread across blogs, what kinds of content spread, and at what rate?

  4. Introduction - Epidemics • Epidemics: one way of modeling these aspects • Physics of Information Diffusion • Disease Propagation Model • Susceptible • Infected • Recovered • Mutation? • Threshold Model for Social Networks
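
The Susceptible / Infected / Recovered states listed on this slide can be illustrated with a minimal discrete-time simulation on a graph. This is only a sketch under assumptions: the function name `simulate_sir`, the per-step probabilities `p_infect` and `p_recover`, and the adjacency-dict graph format are illustrative choices, not taken from the presentation.

```python
import random


def simulate_sir(neighbors, seeds, p_infect, p_recover, rng, max_steps=1000):
    """Run a discrete-time SIR process on a graph.

    neighbors: dict mapping each node to a list of adjacent nodes.
    seeds: nodes that start in the Infected state.
    p_infect: per-step chance that an infected node infects a susceptible neighbor.
    p_recover: per-step chance that an infected node recovers.
    Returns (final_state, infected_counts_per_step).
    """
    state = {node: "S" for node in neighbors}
    for seed in seeds:
        state[seed] = "I"

    infected_counts = []
    for _ in range(max_steps):
        infected = [n for n, s in state.items() if s == "I"]
        infected_counts.append(len(infected))
        if not infected:
            break  # the epidemic has died out

        # Collect new infections first so they only take effect next step.
        newly_infected = set()
        for node in infected:
            for other in neighbors[node]:
                if state[other] == "S" and rng.random() < p_infect:
                    newly_infected.add(other)

        # Currently infected nodes may recover and never get infected again.
        for node in infected:
            if rng.random() < p_recover:
                state[node] = "R"

        for node in newly_infected:
            state[node] = "I"

    return state, infected_counts


if __name__ == "__main__":
    # Hypothetical four-blog chain: 0 - 1 - 2 - 3
    chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
    final, counts = simulate_sir(chain, [0], 0.5, 0.3, random.Random(42))
    print(final, counts)
```

In the blog setting, "infected" would correspond to a blog having posted about a topic and "recovered" to a blog that no longer passes it on; that mapping is an interpretation, not something the slide spells out.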

  5. Blogs are different • YouTube, Flickr (Content Sharing) • Amazon • CNN, MSNBC (Web) • LinkedIn (Professional Networking) • Orkut, Facebook, Yonja (Social Networking) • Twitter (?) • Blogger, Blogspot, LiveJournal, Slashdot (Blogspace)

  6. Blogs are different • High level of reciprocity • Symmetric indegree – outdegree • In contrast to Web (high authority sites)
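
Reciprocity can be made concrete as the fraction of directed links whose reverse link also exists. The sketch below assumes that definition and a plain list of `(source, target)` pairs; the function name `reciprocity` is illustrative and the slide does not say which exact measure was used.

```python
def reciprocity(edges):
    """Fraction of directed edges (u, v) for which (v, u) is also present.

    edges: iterable of (source, target) pairs. Self-loops and duplicate
    edges are ignored.
    """
    edge_set = {(u, v) for u, v in edges if u != v}
    if not edge_set:
        return 0.0
    mutual = sum(1 for (u, v) in edge_set if (v, u) in edge_set)
    return mutual / len(edge_set)


if __name__ == "__main__":
    # Hypothetical blogroll links: a and b link to each other, a also links to c.
    links = [("a", "b"), ("b", "a"), ("a", "c")]
    print(reciprocity(links))
```

A value close to 1 means most links are returned, which is the pattern the slide contrasts with the Web, where a few high-authority sites receive many links without linking back.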

  7. Blogs are different

  8. Blogs are different Average Path Length is very short compared to the Web. (Directionality?)

  9. Blogs are different Joint Degree Distribution (High Degree Nodes Connect to Other High Degree Nodes) Epidemics on Network Core? YouTube Celebrities?

  10. Blogs are different • Strongly Connected Core Analysis • Slowly Increasing Shortest Path • High Clustering

  11. Blogs are different Strong Local Clustering (people tend to be introduced to other people via mutual friends)

  12. Methods are different • Epidemics • Gossip • Influence Map (Word of Mouth) • Recommendation Based • Web (Data) Mining • Mathematical Modeling (Markov Chains, Information Theory, …) • …

  13. Contents are different • Recommendation • News (Political, Fun, Paparazzi) • Gossip • Media (Music, News, Excerpts)

  14. Some methods on Some Content of Some Blogs • Infection Inference technique introduced by Adamic et al. • Link inference • Link classification • Classifier training • Problems and Challenges

  15. Some methods on Some Content of Some Blogs • Patterns Used for Classifier Training • The number of common blogs explicitly linked to by both blogs (indicating whether two blogs are in the same community) • The number of non-blog links (i.e. URLs) shared by the two • Text similarity • Order and frequency of repeated infections: specifically, the number of times one blog mentions a URL before the other and the number of times they both mention the URL on the same day • In-link and out-link counts for the two blogs

  16. Some methods on Some Content of Some Blogs • Text Similarity • $s(A,B) = \dfrac{n_{AB}}{\sqrt{n_A}\,\sqrt{n_B}}$
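
The similarity score on this slide can be computed directly once the counts are fixed. The slide does not define them, so the sketch below assumes that n_A and n_B are the numbers of distinct items (for example terms or URLs) in blogs A and B, and that n_AB is the number of items they share; the function name `similarity` is hypothetical.

```python
import math


def similarity(items_a, items_b):
    """Compute s(A, B) = n_AB / (sqrt(n_A) * sqrt(n_B)).

    Assumes n_A and n_B are counts of distinct items per blog and
    n_AB is the count of items common to both.
    """
    a, b = set(items_a), set(items_b)
    if not a or not b:
        return 0.0  # no basis for comparison
    n_ab = len(a & b)
    return n_ab / (math.sqrt(len(a)) * math.sqrt(len(b)))


if __name__ == "__main__":
    blog_a = {"election", "poll", "debate"}
    blog_b = {"debate", "poll", "music", "concert"}
    print(round(similarity(blog_a, blog_b), 3))
```

Under this set-based reading the score lies between 0 (nothing shared) and 1 (identical item sets), which makes it usable as a classifier feature regardless of how large each blog is.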

  17. Some methods on Some Content of Some Blogs • Timing of Infection

  18. Some methods on Some Content of Some Blogs • Link Inference • Blog URL and Text Similarity Patterns • Three-way Classifier (57%) • reciprocated links, • one-way links, • unlinked pairs • Two-way Classifier (SVM 91.2%, Logistic Regression 91.9%) • linked pairs • unlinked pairs • Infection Inference • $n_{A\text{-before-}B}/n_A$, $n_{A\text{-after-}B}/n_A$, $n_{A\text{-same-day-}B}/n_A$ • Timing Patterns (75%) • with all 6 timing patterns and text/blog similarity patterns (61 – 75%) • link-in / link-out counts
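
The three timing ratios for blog A can be sketched as follows. The input format (a dict from URL to the day on which the blog first mentioned it) and the name `timing_features` are assumptions made for illustration; n_A is taken here to be the number of URLs that A mentions.

```python
def timing_features(first_mention_a, first_mention_b):
    """Return the ratios n_A-before-B/n_A, n_A-after-B/n_A, n_A-same-day-B/n_A.

    first_mention_a, first_mention_b: dicts mapping URL -> day number of the
    blog's first mention of that URL (an assumed input format).
    """
    n_a = len(first_mention_a)
    if n_a == 0:
        return {"before": 0.0, "after": 0.0, "same_day": 0.0}

    before = after = same_day = 0
    for url, day_a in first_mention_a.items():
        day_b = first_mention_b.get(url)
        if day_b is None:
            continue  # B never mentioned this URL
        if day_a < day_b:
            before += 1
        elif day_a > day_b:
            after += 1
        else:
            same_day += 1

    return {
        "before": before / n_a,
        "after": after / n_a,
        "same_day": same_day / n_a,
    }


if __name__ == "__main__":
    a = {"u1": 1, "u2": 5, "u3": 3, "u4": 2}
    b = {"u1": 2, "u2": 4, "u3": 3}
    print(timing_features(a, b))
```

Calling the function with the arguments swapped gives the same three ratios normalized by n_B, which is presumably how the six timing patterns mentioned on the slide arise; the slide itself does not list them.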

  19. Some methods on Some Content of Some Blogs • Visualization • Heuristics using classifiers • Two types of graph • Directed Acyclic Graph • Most likely tree

  20. Some methods on Some Content of Some Blogs • Epidemic Propagation Model by Gruhl et al. • Topics • Individuals • Topics • Topic = Chatter + Spike + (Resonance)

  22. Some methods on Some Content of Some Blogs

  23. Some methods on Some Content of Some Blogs aoccdrnig to rscheearch at an elingsh uinervtisy it deosn’t mttaer in waht oredr the ltteers in a wrod are, the olny iprmoetnt tihng is taht the frist and lsat ltteer is at the rghit pclae

  24. Some methods on Some Content of Some Blogs • Different Posting Behaviors for Individuals • Power-law Characteristic for Individuals

  25. Some methods on Some Content of Some Blogs • Propagation Model • Cascading Model • Copy Probability κ(v,w) • Noticing Probability r(v,w) • For 7K topics, r mean 0.28 and std 0.22, • κ quite low, mean 0.04 and std 0.07, • Even bloggers who commonly read from another source are selective in the topics they choose to write about.
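
The cascading model can be sketched as a small event-driven simulation. This is a simplified illustration under assumptions, not the procedure of Gruhl et al.: after blog v posts, each reader w gets a single chance to copy the topic with probability κ(v,w), and the delay until w reads v is drawn day by day with reading probability r(v,w). The names `simulate_cascade` and `horizon` and the edge format are hypothetical.

```python
import heapq
import random


def simulate_cascade(edges, seeds, rng, horizon=30):
    """Simulate topic spread with per-edge reading and copy probabilities.

    edges: dict mapping blog v -> list of (w, r, kappa), where r is the daily
           chance that w reads v and kappa the chance that w then copies.
    seeds: blogs that post about the topic on day 0.
    Returns a dict mapping each blog that posted -> day of its first post.
    """
    post_day = {}
    queue = [(0, s) for s in seeds]
    heapq.heapify(queue)

    while queue:
        day, v = heapq.heappop(queue)
        if v in post_day or day > horizon:
            continue  # already posted earlier, or outside the time window
        post_day[v] = day

        for w, r, kappa in edges.get(v, []):
            if w in post_day or r <= 0:
                continue
            # Single chance for w to copy this topic from v.
            if rng.random() >= kappa:
                continue
            # Days until w reads v: one trial per day with success chance r.
            delay = 1
            while rng.random() >= r:
                delay += 1
                if day + delay > horizon:
                    break
            heapq.heappush(queue, (day + delay, w))

    return post_day


if __name__ == "__main__":
    # Hypothetical network using the mean values quoted on the slide.
    net = {"a": [("b", 0.28, 0.04), ("c", 0.28, 0.04)], "b": [("c", 0.28, 0.04)]}
    print(simulate_cascade(net, ["a"], random.Random(1)))
```

With κ as low as the reported mean of 0.04, most runs on a small network end with only the seed blog posting, which matches the slide's point that bloggers are selective even about sources they read regularly.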

  26. Discussion • Could we use these models to extract further patterns or characteristics? • Classification of hoaxes and fake news? • Prediction of popular songs and videos at their inception • …

  27. Q & A • Thanks!

  28. References • D. W. Drezner and H. Farrell, “Web of Influence,” Foreign Policy, vol. 145, pp. 32-40, Dec. 2004. • E. Adar and L. A. Adamic, “Tracking Information Epidemics in Blogspace,” Proceedings of the 2005 IEEE/WIC/ACM International Conference on Web Intelligence, pp. 207-214, 2005. • D. Gruhl, R. Guha, D. Liben-Nowell, and A. Tomkins, “Information Diffusion through Blogspace,” Proceedings of the 13th International Conference on World Wide Web, pp. 491-501, 2004. • A. Mislove, M. Marcon, K. P. Gummadi, P. Druschel, and B. Bhattacharjee, “Measurement and Analysis of Online Social Networks,” Proceedings of the 7th ACM SIGCOMM Conference on Internet Measurement, pp. 29-42, 2007. • M. Cha, J. A. N. Perez, and H. Haddadi, “Flash Floods and Ripples: The Spread of Media Content through the Blogosphere,” 3rd Int'l AAAI Conference on Weblogs and Social Media (ICWSM) Data Challenge Workshop, May 17-20, 2009, San Jose, California. • M. Young, The Technical Writer's Handbook. Mill Valley, CA: University Science, 1989. • Z. Fanzi, Q. Zhengding, L. Dongsheng, and Y. Jianhai, “Shape-based time series similarity measure and pattern discovery algorithm,” Journal of Electronics, vol. 22, pp. 142-148, Aug. 2007.
