Information diffusion in online communities. Lada Adamic ICOS Sept. 29, 2006. Talk outline. discussions in the political blogosphere. online person-to-person product recommendations. The political blogosphere and the 2004 election: Divided they blog.
Information diffusion in online communities Lada Adamic ICOS Sept. 29, 2006
Talk outline • discussions in the political blogosphere • online person-to-person product recommendations
The political blogosphere and the 2004 election: Divided they blog joint work with Natalie Glance @ Nielsen/Buzzmetrics
Political blogs gaining in importance • Pew Internet & American Life Project Report, January 2005, reports: • 63 million U.S. citizens use the Internet to stay informed about politics (mid-2004, Pew Internet Study) • 9% of Internet users read political blogs preceding the 2004 U.S. Presidential Election • 2004 Presidential Campaign Firsts • Candidate blogs: e.g. Dean’s blogforamerica.com • Successful grassroots campaign conducted via websites & blogs • Bloggers credentialed as journalists & invited to nominating conventions
Related research on political blogs • 10 most popular political blogs account for half the blogs read by surveyed journalists (Drezner and Farrell 2004) • The most popular blogs also receive the majority of citation links (Shirky 2003) • Citation link structure reveals topical subcommunities: Catholicism, homeschooling, A-list bloggers (Herring et al. 2005) • Comparison of network neighborhoods of Atrios and Instapundit: no overlap in linking behavior (Welsch 2005) • Research question: Are we witnessing cyberbalkanization of the Internet?
Calling all political blogs • Collected self-identified liberal and conservative blogs from online directories (eTalkingHead, BlogCatalog, CampaignLine, Blogorama) • Crawled home page of each blog in February 2005: found 30 more well-cited political blogs (manually categorized) • biases toward sidebar/blogroll links • Did not include libertarian, independent or moderate blogs (fewer in number and lesser in popularity) • Identified: 676 liberal and 659 conservative blogs
The larger political blogosphere: results • 91% of links point to a blog of the same persuasion • Conservative blogs show a greater tendency to link • 82% of conservative blogs are linked to at least once; 84% link to at least one other blog • 67% of liberal blogs are linked to at least once; 74% link to at least one other blog • Average # of links per blog is similar: 13.6 for liberal; 15.1 for conservative • Higher proportion of liberal blogs that are not linked to at all
Methodology for detailed study of A-list blogs • Harvested posts for top 20 lists from BlogPulse • BlogPulse stores individual posts: date, permalink, and content • Date range: late August 2004 to mid-November 2004 • Collected: 12,470 liberal posts; 10,414 conservative posts • Identifying citation links (weblog post -> blog OR post) • For each post, extract all links (hrefs) • Exclude self-links • Blogroll/sidebar links not included • 1511 L-L citations; 2110 R-R citations; 247 L-R; 312 R-L • Result: Conservatives had 16% fewer posts but cited each other 40% more often
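The link-extraction steps on this slide (extract all hrefs from a post, drop self-links, keep only links to other blogs) can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the helper names, the host-name normalisation, and the `known_blogs` set are all assumptions made for the example.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class HrefCollector(HTMLParser):
    """Collect the href attribute of every <a> tag in a post body."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def host_of(url):
    """Lower-cased host name without a leading 'www.' (empty for relative URLs)."""
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host


def citation_links(post_html, post_url, known_blogs):
    """Return (citing_host, cited_host) pairs for one post.

    known_blogs is a hypothetical set of blog host names. Self-links and
    links to hosts outside that set are dropped.
    """
    collector = HrefCollector()
    collector.feed(post_html)
    source = host_of(post_url)
    pairs = []
    for href in collector.hrefs:
        target = host_of(href)
        if target and target != source and target in known_blogs:
            pairs.append((source, target))
    return pairs
```

Because only the post body is parsed, blogroll and sidebar links are excluded as long as the stored post content does not contain them, which is an assumption about how the posts were harvested.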
Citations between blogs in their posts (Aug 29th – Nov 15th, 2004) • all citations between A-list blogs in the 2 months preceding the 2004 election • citations between A-list blogs with at least 5 citations in both directions • edges further limited to those exceeding 25 combined citations • only 15% of the citations bridge communities
Liberal: 1 Digby’s Blog, 2 James Wolcott, 3 Pandagon, 4 blog.johnkerry.com, 5 Oliver Willis, 6 America Blog, 7 Crooked Timber, 8 Daily Kos, 9 American Prospect, 10 Eschaton, 11 Wonkette, 12 Talk Left, 13 Political Wire, 14 Talking Points Memo, 15 Matthew Yglesias, 16 Washington Monthly, 17 MyDD, 18 Juan Cole, 19 Left Coaster, 20 Bradford DeLong • Conservative: 21 JawaReport, 22 Vodka Pundit, 23 Roger L Simon, 24 Tim Blair, 25 Andrew Sullivan, 26 Instapundit, 27 Blogs for Bush, 28 LittleGreenFootballs, 29 Belmont Club, 30 Captain’s Quarters, 31 Powerline, 32 Hugh Hewitt, 33 INDC journal, 34 Real Clear Politics, 35 Winds of Change, 36 Allahpundit, 37 Michelle Malkin, 38 Wizbang, 39 Dean’s World, 40 Volokh
Notable examples of blogs breaking a story • Swiftvets.com anti-Kerry video • Bloggers linked to this in late July, keeping accusations alive • Kerry responded in late August, bringing mainstream media coverage • CBS memos alleging preferential treatment of Pres. Bush during the Vietnam War • Powerline broke the story on Sep. 9th, launching a flurry of discussion • Dan Rather apologized later in the month • “Was Bush Wired?” • Salon.com asked the question first on Oct. 8th, echoed by Wonkette & PoliticalWire.com • MSM follows up the next day
Liberals and conservatives differ in the topics they discuss • [Figure: discussion of “forged documents”]
Political blogs as echo chambers • Pairwise comparison of URLs and phrases posted by each blog • Each blog $A$ is represented by a vector of URL weights, $v_A = (w_{U_1}, w_{U_2}, \ldots, w_{U_N})$ • tf*idf weight: $w_U \sim (\text{number of times the blog mentions URL } U) \times \log\left[\frac{\text{total number of blogs monitored by BlogPulse}}{\text{number of those blogs citing } U}\right]$ • Similarity of two blogs is given by the cosine of their vectors: $\cos(A,B) = \frac{v_A \cdot v_B}{\lVert v_A \rVert \, \lVert v_B \rVert}$ • Similarity in URLs between blogs of the same persuasion was higher (0.08 for liberal blogs and 0.09 for conservative ones) than between mixed pairs (0.03) • Same trend for phrases. We can even invert the analysis, and see what phrases are similar…
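The tf*idf weighting and cosine similarity described on this slide can be illustrated with a short sketch. The function names and the toy inputs are assumptions for the example; only the weighting formula and the cosine definition come from the slide.

```python
import math


def tfidf_vector(url_counts, blogs_citing, total_blogs):
    """Weight each URL by (times this blog cites it) * log(total blogs / blogs citing it)."""
    return {
        url: count * math.log(total_blogs / blogs_citing[url])
        for url, count in url_counts.items()
    }


def cosine(vec_a, vec_b):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(weight * vec_b.get(url, 0.0) for url, weight in vec_a.items())
    norm_a = math.sqrt(sum(w * w for w in vec_a.values()))
    norm_b = math.sqrt(sum(w * w for w in vec_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a blog with no weighted URLs is treated as dissimilar to everything
    return dot / (norm_a * norm_b)


# Toy data (hypothetical): how many monitored blogs cite each URL.
blogs_citing = {"u1": 2, "u2": 5, "u3": 1}
total_blogs = 100

blog_a = tfidf_vector({"u1": 3, "u2": 1}, blogs_citing, total_blogs)
blog_b = tfidf_vector({"u1": 1, "u3": 1}, blogs_citing, total_blogs)
similarity = cosine(blog_a, blog_b)
```

Storing the vectors as dicts keeps them sparse, which matters when most blogs cite only a tiny fraction of all URLs. A URL cited by every monitored blog gets weight zero, since log(1) = 0.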
Political figures being discussed • 59% of the mentions of Kerry are by right-leaning blogs • 53% of the mentions of Bush are by left-leaning blogs
Mainstream media bias (links from 1,400 blog set)
Mainstream media cited about once every other post from the A-list bloggers (6,762 times from the left, 6,364 from the right)
Insights from the political blogosphere • Liberal and conservative blogs are balanced in numbers and tend to link primarily to their own communities • But is 10% cross-linking really too little? Or is it a sign of significant discussion? • Conservative blogs are more likely to include links to other blogs on their pages, and their A-list blogs reference one another more frequently • Liberal and conservative blogs tend to discuss different things, but one is not more ‘coherent’ than the other
Trying to bridge the divide: opposition to the bankruptcy bill (March 2005) • [Figure legend: conservative blog post; liberal blog post; uncategorized blog post; news article; government website; link between posts/pages; posts/pages belonging to same blog/site] • but the bill was passed nevertheless: Senate 74-25, House 302-126
The dynamics of viral marketing Jure Leskovec, Carnegie Mellon University Lada Adamic, University of Michigan Bernardo Huberman, HP Labs
Using online networks for viral marketing Burger King’s subservient chicken
The dynamics of Viral Marketing • Outline • prior work on viral marketing & information diffusion • incentivised viral marketing program • cascades and stars • network effects • product and social network characteristics
Information diffusion • Studies of innovation adoption • hybrid corn (Ryan and Gross, 1943) • prescription drugs (Coleman et al. 1957) • Models (very many) • Rogers, ‘Diffusion of Innovations’ • Watts, Information cascades (2003) • Kempe, Kleinberg, Tardos, Maximizing the spread of Influence, (2005)
Motivation for viral marketing • viral marketing successfully utilizes social networks for adoption of some services • Hotmail gains 18 million users in 12 months, spending only $50,000 on traditional advertising • Gmail rapidly gains users although referrals are the only way to sign up • customers becoming less susceptible to mass marketing • mass marketing impractical for unprecedented variety of products online
The web savvy consumer and personalized recommendations • > 50% of people do research online before purchasing electronics • personalized recommendations based on prior purchase patterns and ratings • Amazon, “people who bought x also bought y” • MovieLens, “based on ratings of users like you…” • Is there still room for viral marketing?
Is there still room for viral marketing next to personalized recommendations? • We are more influenced by our friends than strangers • 68% of consumers consult friends and family before purchasing home electronics (Burke 2003)
Incentivised viral marketing (our problem setting) • Senders and followers of recommendations receive discounts on products • Recommendations are made to any number of people at the time of purchase • Only the recipient who buys first gets a discount • [Figure labels: 10% credit; 10% off]
Product recommendation network • [Figure legend: purchase following a recommendation; customer recommending a product; customer not buying a recommended product]
the data • large anonymous online retailer (June 2001 to May 2003) • 15,646,121 recommendations • 3,943,084 distinct customers • 548,523 products recommended • Products belonging to 4 product groups: • books • DVDs • music • VHS
summary statistics by product group • [Table not recovered; column labels: people, recommendations; highlight labels: high, low]
viral marketing program not spreading virally • 94% of users make first recommendation without having received one previously • size of giant connected component increases from 1% to 2.5% of the network (100,420 users) – small! • some sub-communities are better connected • 24% out of 18,000 users for westerns on DVD • 26% of 25,000 for classics on DVD • 19% of 47,000 for anime (Japanese animated film) on DVD • others are just as disconnected • 3% of 180,000 home and gardening • 2-7% for children’s and fitness DVDs
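The share of users in the giant connected component can be computed from a list of sender-recipient pairs. The sketch below uses a union-find structure and treats recommendations as undirected edges. Both choices are assumptions for illustration, since the slides do not say how the component was measured.

```python
from collections import Counter


def largest_component_fraction(edges):
    """Fraction of nodes in the largest connected component.

    edges is an iterable of (sender, recipient) pairs, treated as undirected.
    Only users that appear in at least one pair are counted as nodes.
    """
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        root = node
        while parent[root] != root:
            root = parent[root]
        while parent[node] != root:  # path compression
            parent[node], node = root, parent[node]
        return root

    for sender, recipient in edges:
        root_a, root_b = find(sender), find(recipient)
        if root_a != root_b:
            parent[root_a] = root_b

    if not parent:
        return 0.0
    sizes = Counter(find(node) for node in list(parent))
    return max(sizes.values()) / len(parent)


# Hypothetical toy network: one chain of three users and one isolated pair.
fraction = largest_component_fraction([("a", "b"), ("b", "c"), ("d", "e")])
```

Running the same function on the edges restricted to one product category would give per-category figures comparable to the sub-community percentages listed on the slide.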
measuring cascade sizes • delete late recommendations • count how many people are in a single cascade • exclude nodes that did not buy • [Figure: cascade size distribution for books, log-log axes; fitted curve $1.8 \times 10^{6}\, x^{-4.98}$; annotations: steep drop-off, very few large cascades]
cascades for DVDs • DVD cascades can grow large • possibly as a result of websites where people sign up to exchange recommendations • [Figure: cascade size distribution for DVDs, log-log axes; fitted curve $\sim x^{-1.56}$; annotations: shallow drop-off (fat tail), a number of large cascades]
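The two cascade slides quote power-law exponents (-4.98 for books, -1.56 for DVDs) without saying how the fits were obtained. One simple way to estimate such an exponent is a least-squares line through the size-count points on log-log axes. The sketch below assumes that approach purely for illustration. Least squares on binned counts is known to be biased for heavy-tailed data, so it should not be read as the authors' method.

```python
import math


def fit_power_law(sizes, counts):
    """Least-squares fit of log(count) = log(c) + alpha * log(size).

    Returns (c, alpha) so that count is approximately c * size ** alpha.
    """
    xs = [math.log(s) for s in sizes]
    ys = [math.log(n) for n in counts]
    n_points = len(xs)
    mean_x = sum(xs) / n_points
    mean_y = sum(ys) / n_points
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    alpha = sxy / sxx
    c = math.exp(mean_y - alpha * mean_x)
    return c, alpha


# Synthetic points lying exactly on the curve quoted for books.
sizes = list(range(1, 11))
counts = [1.8e6 * s ** -4.98 for s in sizes]
c_hat, alpha_hat = fit_power_law(sizes, counts)
```

On real cascade counts the points scatter around the line, and zero-count sizes have to be dropped before taking logarithms.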
simple model of propagating recommendations (ignoring for the moment the specific mechanics of the recommendation program of the retailer) • Each individual will have $p_t$ successful recommendations. We model $p_t$ as a random variable. • At time $t+1$, the total number of people in the cascade is $N_{t+1} = N_t (1 + p_t)$ • Subtracting $N_t$ from both sides and dividing by $N_t$, we have $\frac{N_{t+1} - N_t}{N_t} = p_t$
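simple model of propagating recommendations (continued) • Summing over long time periods: $\sum_t \frac{N_{t+1} - N_t}{N_t} = \sum_t p_t$ • The right hand side is a sum of random variables and hence approximately normally distributed (central limit theorem) • Integrating both sides, $\int \frac{dN}{N} = \ln \frac{N_T}{N_0} = \sum_t p_t$, where $N_0$ and $N_T$ are the initial and final cascade sizes, so $N$ is lognormally distributed • if the variance $\sigma^2$ is large, the lognormal resembles a power law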
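A quick simulation can make the multiplicative growth model concrete. The sketch below iterates the recurrence N(t+1) = N(t) * (1 + p_t) from the previous slide, starting from a single person. The slides do not specify the distribution of p_t, so drawing it uniformly from [0, p_max] is an assumption made only for this example, as are the function name and parameter values.

```python
import math
import random
import statistics


def simulate_cascade_sizes(n_cascades, n_steps, p_max, seed=0):
    """Simulate final cascade sizes under N_{t+1} = N_t * (1 + p_t), with N_0 = 1.

    p_t is drawn uniformly from [0, p_max] at every step (an illustrative choice).
    """
    rng = random.Random(seed)
    sizes = []
    for _ in range(n_cascades):
        size = 1.0
        for _ in range(n_steps):
            size *= 1.0 + rng.uniform(0.0, p_max)
        sizes.append(size)
    return sizes


sizes = simulate_cascade_sizes(n_cascades=5000, n_steps=20, p_max=1.0)
log_sizes = [math.log(s) for s in sizes]

# For a lognormal, the logs are roughly symmetric while the raw sizes are right-skewed.
log_gap = abs(statistics.mean(log_sizes) - statistics.median(log_sizes))
raw_ratio = statistics.mean(sizes) / statistics.median(sizes)
```

Here the logarithm of the final size is an exact sum of independent terms log(1 + p_t), so the central limit theorem applies to it directly. The slide's version with the sum of p_t relies on the additional approximation log(1 + p_t) ≈ p_t for small p_t.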
participation level by individual • very high variance • The most active customer made 83,729 recommendations and purchased 4,416 different items!
does receiving more recommendations increase the likelihood of buying? • [Figure panels: DVDs, books]
does sending more recommendations influence more purchases? • [Figure panels: DVDs, books]
the probability that the sender gets a credit with increasing numbers of recommendations • consider whether sender has at least one successful recommendation • controls for sender getting credit for purchase that resulted from others recommending the same product to the same person • probability of receiving a credit levels off for DVDs
Multiple recommendations between two individuals weaken the impact of the bond on purchases • [Figure panels: DVDs, books]
product and social network characteristics influencing recommendation effectiveness
recommendation success by book category • consider successful recommendations in terms of: av. # senders of recommendations per book category; av. # of recommendations accepted • books overall have a 3% success rate (2% with discount, 1% without) • lower than average success rate (significant at p=0.01 level; success rates in % in parentheses): fiction: romance (1.78), horror (1.81), teen (1.94), children’s books (2.06), comics (2.30), sci-fi (2.34), mystery and thrillers (2.40); nonfiction: sports (2.26), home & garden (2.26), travel (2.39) • higher than average success rate (statistically significant): professional & technical: medicine (5.68), professional & technical (4.54), engineering (4.10), science (3.90), computers & internet (3.61), law (3.66), business & investing (3.62)
professional and organized contexts • In general, professional & technical book recommendations are more often accepted (probably in part due to book cost) • Some organized contexts other than professional also have a higher success rate, e.g. religion: overall success rate 3.13%; Christian themed books: Christian living and theology (4.7%), Bibles (4.8%); not-as-organized religion: new age (2.5%), occult spirituality (2.2%) • Well organized hobbies: books on orchids recommended successfully twice as often as books on tomato growing
examples of communities • use a community finding algorithm to identify groups of people sending recommendations to one another
# nodes  # senders  topics
735      74         books: American literature, poetry
710      179        sci-fi books, TV series DVDs, alternative rock music
667      181        music: dance, indie
653      121        discounted DVDs
541      112        books: art & photography, web development, graphical design, sci-fi
502      104        books: sci-fi and other
388      77         books: Christianity and Catholicism
309      81         books: business and investing, computers, Harry Potter
192      30         books: parenting, women’s health, pregnancy
163      48         books: comparative religion, Egypt’s history, new age, role playing games