Implicit Structure and Dynamics of BlogSpace Eytan Adar, Li Zhang, Lada Adamic, & Rajan Lukose HP Labs, Palo Alto, CA. list of read blogs. date and time stamps. URL that is being commented on. via link. Blogs (web logs) contain online stamped entries.
Implicit Structure and Dynamics of BlogSpace
Eytan Adar, Li Zhang, Lada Adamic, & Rajan Lukose
HP Labs, Palo Alto, CA
Blogs (web logs) contain online stamped entries
[Figure: annotated blog page showing the list of read blogs, date and time stamps, the URL that is being commented on, and a via link]
Blogs: structure and transmission
• Blog use:
  • Record real-world and virtual experiences
  • Note and discuss things “seen” on the net
• Blog structure: blog-to-blog linking
• Use + Structure
  • Great to track “memes” (catchy ideas)
  • Patterns of information flow
    • How does the popularity of a topic evolve over time?
    • Who is getting information from whom?
  • Ranking algorithms that take advantage of transmission patterns
Related Work
• Link prediction in social networks:
  • Butts, C., Network Inference, Error, and Information (In)Accuracy: A Bayesian Approach, Social Networks, 25(2):103-140.
  • Dombroski, M., P. Fischbeck, and K. Carley, An Empirically-Based Model for Network Estimation and Prediction, NAACSOS conference proceedings, Pittsburgh, PA, 2003.
  • O’Madadhain, J., P. Smyth, and L. Adamic, Learning Predictive Models for Link Formation, Sunbelt 2005 (hope you were there!)
  • Getoor, L., N. Friedman, D. Koller, and B. Taskar, Learning Probabilistic Models of Link Structure, Journal of Machine Learning Research, vol. 3 (2002), pp. 690-707.
  • Adamic, L., and E. Adar, Friends and neighbors on the Web, Social Networks, 2003.
  • Kleinberg, J., and D. Liben-Nowell, The Link Prediction Problem for Social Networks, in Proceedings of CIKM ’03 (New Orleans, LA, November 2003), ACM Press.
• Blog ranking: Technorati, BlogPulse, Daypop…
• Blog epidemic tracking: Blogdex at MIT Media Lab (Cameron Marlow, Sunbelt 2003), BlogPulse
Intelliseek’s BlogPulse Service for tracking trends in the blogosphere: popular URLs, phrases, people
BlogPulse Data
• Analyzed 37,153 blogs
• Differential daily crawls (to find new posts) for May 2003
• Full page crawl for May 18, 2003 to capture blogrolls
• 175,712 URLs occurring on > 2 blogs
Tracking popularity over time
[Figure: popularity versus time, with the Slashdot Effect and the BoingBoing Effect marked]
Blogdex, BlogPulse, etc. track the most popular links/phrases of the day
Election Map Cartograms
Michael Gastner, Cosma Shalizi, and Mark Newman, University of Michigan
http://www-personal.umich.edu/~mejn/election/
Tracking popularity over time
[Figure: popularity versus time]
Clustering information popularity profiles (May 2003)
Selection criteria:
• Total # of mentions substantial (40)
• URL mentioned for the first time in May
K-means clustering
• 259 URLs in the sample satisfy the criteria
• Take normalized cumulative profiles (all mentions, by day)
• K-means minimizes the sum of squared distances between each profile and its cluster centroid
• 4 clusters captured most of the differences
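The clustering step can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the function names, the seeded random initialization, and the toy data are all assumptions made for illustration. Each URL's daily mention counts are turned into a cumulative profile that ends at 1.0, and a basic Lloyd-style k-means then groups the profiles.

```python
import random


def normalized_cumulative(daily_mentions):
    """Turn per-day mention counts into a cumulative profile ending at 1.0."""
    total = sum(daily_mentions)
    if total == 0:
        raise ValueError("profile has no mentions")
    running, profile = 0, []
    for count in daily_mentions:
        running += count
        profile.append(running / total)
    return profile


def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length profiles."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(profiles, k, iterations=100, seed=0):
    """Basic Lloyd's algorithm; returns (centroids, labels).

    Initial centroids are k profiles drawn with a seeded RNG (an assumption;
    the slides do not say how the clustering was initialized).
    """
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(profiles, k)]
    labels = None
    for _ in range(iterations):
        # Assignment step: each profile goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in profiles
        ]
        if new_labels == labels:
            break  # converged
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


# Toy data (made up): two URLs that peak early and two that peak late.
daily = [[8, 1, 1, 0], [9, 1, 0, 0], [0, 1, 1, 8], [0, 0, 1, 9]]
profiles = [normalized_cumulative(d) for d in daily]
centroids, labels = kmeans(profiles, k=2)
print(labels)
```

With real data one would pass the 259 qualifying profiles and `k=4`, and would likely rerun with several seeds, since k-means only finds a local minimum.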
Different kinds of information have different popularity profiles
[Figure: four panels (1 to 4) of normalized cumulative popularity (0 to 1) versus day (5 to 15). Panel labels, in the order given: Major-news site (editorial content), back of the paper; Products, etc.; Slashdot postings; Front-page news]
Popularity profiles
[Figure: cumulative popularity (0 to 1) versus day (0 to 22) for clusters 1, 2, 3, and 4]
What do we need to track specific info ‘epidemics’?
• Timings
• Underlying network
[Figure: Microscale Dynamics. Blogs b1, b2, b3 placed by time of infection, from t0 to t1]
Challenges
• Root may be unknown
• Multiple possible paths
• Uncrawled space, alternate media (email, voice)
• No links
[Figure: Microscale Dynamics. Blogs b1, b2, b3, …, bn with uncertain (?) paths, placed by time of infection, from t0 to t1]
Microscale Dynamics: who is getting info from whom
• Via links (< 2% of links, 50% within sample): unambiguous
• Multiple explicit links: which link is more likely?
• No explicit links (70%): which implicit path is more likely?
Link Inference
• Use machine learning algorithms:
  • A) Support Vector Machine (SVM)
  • B) Logistic Regression
• What we can use
  • Full text
  • Blogs in common
  • Links in common
  • History of infection
[Figure labels: BoingBoing, WIRED]
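To make the "in common" features concrete, here is a hedged sketch of how a pair of blogs could be turned into similarity features for a classifier. The slides do not name the similarity measure, so Jaccard overlap is an assumption here, and the dictionary keys (`blog_links`, `other_links`, `text`) and the example data are hypothetical.

```python
def jaccard(a, b):
    """Jaccard overlap of two collections: |A ∩ B| / |A ∪ B| (0.0 if both empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pair_features(blog_a, blog_b):
    """Similarity features for one blog pair.

    Each blog is a dict with hypothetical keys:
      'blog_links'  - blogs it links to
      'other_links' - non-blog URLs it links to
      'text'        - its full text
    """
    return {
        "blogs_in_common": jaccard(blog_a["blog_links"], blog_b["blog_links"]),
        "links_in_common": jaccard(blog_a["other_links"], blog_b["other_links"]),
        "text_overlap": jaccard(
            blog_a["text"].lower().split(), blog_b["text"].lower().split()
        ),
    }


# Made-up example pair.
a = {"blog_links": ["x", "y", "z"], "other_links": ["u1", "u2"], "text": "giant microbes are cute"}
b = {"blog_links": ["y", "z", "w"], "other_links": ["u2", "u3"], "text": "Giant microbes plush toys"}
print(pair_features(a, b))
```

Feature dictionaries like these, one per blog pair, would then be fed to the SVM or logistic regression together with a label saying whether the pair is linked.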
Similarity in links between reciprocated, unreciprocated, and non-linked blog pairs
Training on positive and negative examples of ‘infection’
[Figure: a positive (+) and a negative (-) example, each a pair Blog A, Blog B, with T_infection(Blog B) > T_infection(Blog A); legend: infected, uninfected]
Prediction results
• Link inference:
  • SVM: 91% accuracy
  • Logistic regression: 92% accuracy (blog-blog links most predictive)
• Infection inference:
  • SVM: 71.5% accuracy, using blog and non-blog link similarity + timing features (AbeforeB)/nA, (BbeforeA)/nA, (A same day B)/nA, …
  • Logistic regression: 75% accuracy, using only timing features
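The timing features named above can be computed as in the following sketch. It rests on an assumption the slide does not spell out: nA is taken to be the number of URLs blog A mentions, and each blog is represented by a hypothetical mapping from URL to the day of its first mention (matching the 1-day resolution of the data).

```python
def timing_features(days_a, days_b):
    """Timing features for the ordered pair (A, B).

    days_a, days_b: dict mapping URL -> day number of the blog's first mention.
    n_a is assumed to be the number of URLs mentioned by A.
    """
    n_a = len(days_a)
    if n_a == 0:
        raise ValueError("blog A mentions no URLs")
    shared = days_a.keys() & days_b.keys()
    a_before_b = sum(1 for url in shared if days_a[url] < days_b[url])
    b_before_a = sum(1 for url in shared if days_b[url] < days_a[url])
    same_day = sum(1 for url in shared if days_a[url] == days_b[url])
    return {
        "a_before_b": a_before_b / n_a,
        "b_before_a": b_before_a / n_a,
        "a_same_day_b": same_day / n_a,
    }


# Made-up example: A mentions four URLs, three of them shared with B.
days_a = {"u1": 1, "u2": 3, "u3": 5, "u4": 7}
days_b = {"u1": 2, "u2": 3, "u3": 4}
print(timing_features(days_a, days_b))
# -> {'a_before_b': 0.25, 'b_before_a': 0.25, 'a_same_day_b': 0.25}
```

Note that the features are not symmetric: swapping A and B changes the denominator, so each ordered pair gets its own feature vector.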
Sources of error
• Incomplete crawls (uncrawled blog or media source)
• Coarseness and sparseness of timing data (1 day resolution)
• Mirror URLs (actually helps)
[Figure: blogs A and B and an uncrawled blog or media source C over time; labels: inferred, actual]
GUESS tool (build your own, see demo @ 5:30!)
• Using GraphViz (by AT&T) layouts
• Simple algorithm:
  • If a single, explicit link exists, draw it (add node if needed)
  • Otherwise use the ML algorithm:
    • Pick the most likely explicit link
    • Pick the most likely possible link
• Tool lets you zoom around space, control threshold, link types, etc.
Visualization by Eytan Adar: http://www-idl.hpl.hp.com/blogstuff
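One reading of the simple algorithm is sketched below; it is an interpretation, not the tool's code. It assumes that the most likely explicit link is chosen when several explicit links exist, and that the most likely possible (inferred) link is used only when there is no explicit link. The `choose_source` function and the `score` callable, which stands in for the trained classifier, are hypothetical.

```python
def choose_source(blog, explicit_sources, candidate_sources, score):
    """Pick the edge to draw into `blog` for one URL.

    explicit_sources:  earlier-infected blogs that `blog` links to explicitly
    candidate_sources: other earlier-infected blogs (possible implicit sources)
    score(src, dst):   hypothetical likelihood that dst got the URL from src
    Returns (source, kind), where kind is 'explicit', 'inferred', or 'none'.
    """
    if len(explicit_sources) == 1:
        # A single explicit link is unambiguous: draw it.
        return explicit_sources[0], "explicit"
    if explicit_sources:
        # Several explicit links: keep the most likely one.
        return max(explicit_sources, key=lambda s: score(s, blog)), "explicit"
    if candidate_sources:
        # No explicit link: fall back to the most likely possible link.
        return max(candidate_sources, key=lambda s: score(s, blog)), "inferred"
    return None, "none"


# Made-up scores standing in for classifier output.
scores = {("b1", "b4"): 0.9, ("b2", "b4"): 0.4, ("b3", "b4"): 0.7}
lookup = lambda src, dst: scores.get((src, dst), 0.0)

print(choose_source("b4", ["b2"], ["b1", "b3"], lookup))        # ('b2', 'explicit')
print(choose_source("b4", ["b2", "b3"], ["b1"], lookup))        # ('b3', 'explicit')
print(choose_source("b4", [], ["b1", "b2", "b3"], lookup))      # ('b1', 'inferred')
```

A threshold on `score`, like the one the tool exposes, could be added to the inferred branch to suppress low-confidence edges.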
Giant Microbes epidemic visualization
[Figure legend: blog, via link, explicit link, inferred link]
iRank
Find early sources of good information using inferred information paths or timing
[Figure: blogs b1, b2, b3, b4, b5, …, bn, with one marked as the true source and one as a popular site]
iRank Algorithm
• Draw a weighted edge for all pairs of blogs that cite the same URL
  • higher weight for mentions closer together
• run PageRank
• control for ‘spam’
[Figure: time of infection, from t0 to t1]
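The steps above can be sketched in pure Python as follows. Several details are assumptions rather than the authors' choices: edges point from the later blog to the earlier one (so rank flows toward early sources), the weight is 1 / (gap in days), same-day mentions are skipped because they give no direction, and the damping factor is the conventional 0.85. The spam control step is not modeled.

```python
from collections import defaultdict
from itertools import combinations


def build_graph(mentions):
    """mentions: dict url -> dict blog -> day of first mention.

    Returns weights[later_blog][earlier_blog], summed over shared URLs.
    Weight 1/gap is an assumed way to favor mentions that are close together.
    """
    weights = defaultdict(lambda: defaultdict(float))
    for by_blog in mentions.values():
        for (a, ta), (b, tb) in combinations(by_blog.items(), 2):
            if ta == tb:
                continue  # same day: no direction can be inferred
            later, earlier = (a, b) if ta > tb else (b, a)
            weights[later][earlier] += 1.0 / abs(ta - tb)
    return weights


def pagerank(weights, nodes, damping=0.85, iterations=50):
    """Weighted PageRank by power iteration; dangling nodes spread rank evenly."""
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        nxt = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            out = weights.get(v, {})
            total = sum(out.values())
            if total == 0:
                for u in nodes:
                    nxt[u] += damping * rank[v] / n
            else:
                for u, w in out.items():
                    nxt[u] += damping * rank[v] * w / total
        rank = nxt
    return rank


# Made-up data: b1 mentions both URLs first.
mentions = {
    "url1": {"b1": 1, "b2": 2, "b3": 3},
    "url2": {"b1": 4, "b2": 6, "b3": 5},
}
nodes = sorted({blog for by_blog in mentions.values() for blog in by_blog})
ranks = pagerank(build_graph(mentions), nodes)
for blog in sorted(ranks, key=ranks.get, reverse=True):
    print(blog, round(ranks[blog], 3))
```

In this toy example b1 ends up with the highest rank because both other blogs consistently mention URLs shortly after it does.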
Do Bloggers Kill Kittens?
• 02:00 AM Friday Mar. 05, 2004 PST: Wired publishes "Warning: Blogs Can Be Infectious."
• 7:25 AM Friday Mar. 05, 2004 PST: Slashdot posts "Bloggers' Plagiarism Scientifically Proven"
• 9:55 AM Friday Mar. 05, 2004 PST: Metafilter announces "A good amount of bloggers are outright thieves."
For more info
• Information Dynamics Lab @ HP: http://www.hpl.hp.com/research/idl
• Blog Epidemic Analyzer: http://www-idl.hpl.hp.com/blogstuff
• Eytan, Li, Lada & Rajan:
  • http://www.hpl.hp.com/research/idl/people/eytan/
  • http://www.hpl.hp.com/personal/Li_Zhang/
  • http://www.hpl.hp.com/personal/Lada_Adamic
  • http://www.hpl.hp.com/research/idl/people/lukose/