Feeds That Matter: A Study of Bloglines Subscriptions
Akshay Java, Pranam Kolari, Tim Finin, Anupam Joshi, Tim Oates
Outline • Background and Motivation • Bloglines General Statistics • Grouping Related Topics • Applications • Conclusion
Bloglines Feed Reader: Folders • Use the folder label as an approximation of a topic • Group similar folders together • Rank feeds under a “topic”
Motivation • Study user-generated tags in feed reader subscriptions • Find relevant blogs about a topic • Obtain labeled training data for building text classifiers for different topics • Tag cloud generated from folder names after merging related folders
Bloglines General Statistics • 83K publicly listed subscribers • 2.8M feed subscriptions, of which 500K are unique feeds • 26K users (35%) use folders to organize their subscriptions • Data collected in May 2006. Although there may be 50M+ blogs, only a small fraction get continued user attention in the form of subscriptions. Users also subscribe to Web 2.0 content such as Flickr, delicious, Technorati and Google searches.
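A back-of-the-envelope check of these figures, assuming the 2.8M count refers to user-feed subscription pairs and taking the ~50M blog estimate at face value:

```latex
\frac{2.8\times 10^{6}\ \text{subscriptions}}{83\times 10^{3}\ \text{users}} \approx 34\ \text{subscriptions per user}
\qquad
\frac{2.8\times 10^{6}\ \text{subscriptions}}{500\times 10^{3}\ \text{unique feeds}} = 5.6\ \text{subscribers per feed}
\qquad
\frac{500\times 10^{3}\ \text{unique feeds}}{50\times 10^{6}\ \text{blogs}} = 1\%
```

These are averages only. Because the 500K unique feeds also include non-blog sources such as photo and search feeds, and 50M is a lower bound on the number of blogs, at most about 1% of blogs have even one public Bloglines subscriber in this sample.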
Bloglines General Statistics: feed subscriptions follow a power-law distribution
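The slides do not say how the power-law shape was checked. One common approach, swapped in here as an assumption rather than the authors' method, is the approximate discrete maximum-likelihood estimate of the exponent given by Clauset, Shalizi and Newman (2009). The sketch assumes subscriptions are available as hypothetical `(user_id, feed_url)` pairs:

```python
import math
from collections import Counter


def subscribers_per_feed(subscriptions):
    """Count distinct subscribers per feed.

    subscriptions: iterable of (user_id, feed_url) pairs (hypothetical format).
    Duplicate pairs are counted once.
    """
    return Counter(feed for _, feed in set(subscriptions))


def powerlaw_alpha(counts, x_min=1):
    """Approximate discrete MLE of the power-law exponent alpha.

    alpha ~= 1 + n / sum(ln(x / (x_min - 0.5))) over all counts x >= x_min.
    """
    if x_min < 1:
        raise ValueError("x_min must be at least 1")
    tail = [x for x in counts if x >= x_min]
    if not tail:
        raise ValueError("no counts at or above x_min")
    log_sum = sum(math.log(x / (x_min - 0.5)) for x in tail)
    return 1.0 + len(tail) / log_sum
```

Usage: `powerlaw_alpha(subscribers_per_feed(pairs).values(), x_min=5)`. In practice `x_min` should itself be fitted to the tail (for example by minimizing the Kolmogorov-Smirnov distance), which this sketch leaves to the caller.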
Bloglines General Statistics • Most users subscribe to a modest number of feeds • Most users have only a few folders • User attention is limited
Bloglines General Statistics As subscriptions increase, users tend to organize them into folders.
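One way to examine this trend is to bin users by subscription count and measure the share of each bin that uses folders. This is a minimal sketch, not the analysis in the study: the per-user `(num_subscriptions, num_folders)` input and the bin edges are hypothetical, and "uses folders" is assumed to mean at least one user-created folder.

```python
import bisect
from collections import defaultdict


def folder_usage_by_size(users, bin_edges=(1, 10, 50, 100, 500)):
    """Fraction of users who use folders, grouped by subscription count.

    users: user_id -> (num_subscriptions, num_folders)  (hypothetical format)
    bin_edges: hypothetical lower bounds; a user with n subscriptions falls in
    the bin whose lower bound is the largest edge <= n (last bin is open-ended).
    """
    totals = defaultdict(int)
    with_folders = defaultdict(int)
    for n_subs, n_folders in users.values():
        i = bisect.bisect_right(bin_edges, n_subs) - 1
        if i < 0:
            continue  # fewer subscriptions than the first edge, e.g. zero
        lower = bin_edges[i]
        totals[lower] += 1
        if n_folders > 0:
            with_folders[lower] += 1
    return {edge: with_folders[edge] / totals[edge] for edge in sorted(totals)}
```

If the slide's observation holds, the returned fractions should generally rise as the bin's lower bound grows.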
Bloglines General Statistics [Figure: tag cloud of folder names, including “technologica”, “Musica”, “Foreign Language”, “Weather” and “Email, Mailing List, Tracking”] A folksonomy emerges from the folder names: many users apply popular folder names to classify their feeds.
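Measuring how popular a folder name is requires counting users per name after light normalization. The sketch below is illustrative only: the input layout and the normalization rule (lowercase, trim, collapse whitespace) are assumptions, not the study's preprocessing.

```python
import re
from collections import Counter


def normalize_label(label):
    """Lowercase a folder name, trim it and collapse runs of whitespace."""
    return re.sub(r"\s+", " ", label.strip().lower())


def folder_popularity(user_folders):
    """Count how many distinct users use each normalized folder name.

    user_folders: user_id -> iterable of folder names (hypothetical format)
    """
    counts = Counter()
    for folders in user_folders.values():
        # A set, so a user who has both "News" and "news " is counted once.
        counts.update({normalize_label(f) for f in folders})
    return counts
```

This normalization deliberately does not conflate names such as “Music” and “Musica”; those cases are left to a similarity-based merge.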
Tag Cloud After Merge: Folder names are used as topics. A lower-ranked folder is merged into a higher-ranked folder if their feeds overlap and their cosine similarity is high.
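The merge rule can be sketched as a greedy pass in rank order. This is a minimal sketch under stated assumptions, not the authors' code: each folder name is a hypothetical sparse vector mapping feeds to the number of users who file them there, `rank` is any popularity score (such as the number of users with that folder), and the cosine threshold of 0.5 is an illustrative value the slides do not give. Merged folders add their counts to the surviving topic's vector, which is one of several reasonable choices.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * b.get(k, 0) for k, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def merge_folders(folder_vectors, rank, threshold=0.5):
    """Greedily merge lower-ranked folders into higher-ranked ones.

    folder_vectors: folder name -> {feed_url: users filing the feed there}
    rank: folder name -> popularity score (higher is better)
    threshold: hypothetical cosine cut-off
    Returns folder name -> topic it was merged into (itself if kept).
    """
    order = sorted(folder_vectors, key=lambda f: rank[f], reverse=True)
    topics = {}   # surviving topic name -> accumulated feed vector
    mapping = {}
    for folder in order:
        vec = folder_vectors[folder]
        best, best_sim = None, threshold
        for topic, tvec in topics.items():
            if not set(vec) & set(tvec):
                continue  # require at least one shared feed
            sim = cosine(vec, tvec)
            if sim >= best_sim:
                best, best_sim = topic, sim
        if best is None:
            topics[folder] = Counter(vec)  # becomes a new topic
            mapping[folder] = folder
        else:
            topics[best].update(vec)  # fold counts into the higher-ranked topic
            mapping[folder] = best
    return mapping
```

For example, a “political” folder that shares Daily Kos and Talking Points Memo with a more popular “politics” folder is mapped to “politics”, mirroring the merged folders listed for the Politics topic.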
Merging Tags: Interesting Cases • Music vs. Musica: English and Spanish music sites • Podcasting vs. Podcasts: one refers to tools for podcasting, the other to feeds containing podcasts • Regional interests: China, Japan, India, etc. • Foreign languages: Spanish, German
Feeds That Matter
Top Feeds for “Politics” (merged folders: “political”, “political blogs”) • Talking Points Memo: by Joshua Micah Marshall • Daily Kos: State of the Nation • Eschaton • The Washington Monthly • Wonkette, Politics for People with Dirty Minds • http://instapundit.com/ • Informed Comment • Power Line • AMERICAblog: Because a great nation deserves the truth • Crooks and Liars
Top Feeds for “Knitting” (merged folders: “knitting blogs”) • Yarn Harlot • Wendy Knits! • See Eunny Knit! • the blue blog • Grumperina goes to local yarn shops and Home Depot • You Knit What?? • Mason-Dixon Knitting • knit and tonic • Crazy Aunt Purl • http://www.lollygirl.com/blog/
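Lists like these can be produced by letting each user "vote" for the feeds they file under any folder mapped to a topic. This is a hedged sketch: the slides only say that subscription information plus simple techniques ranks blogs well, so the one-vote-per-user scoring, the input layout and the `topic_mapping` argument (folder name to merged topic, with names assumed already normalized) are illustrative assumptions.

```python
from collections import Counter


def top_feeds(user_folders, topic_mapping, topic, k=10):
    """Rank feeds for a topic by how many users file them under that topic.

    user_folders: user_id -> {folder name: iterable of feed URLs}
    topic_mapping: folder name -> merged topic name (hypothetical input)
    Folder names missing from topic_mapping are ignored.
    """
    votes = Counter()
    for folders in user_folders.values():
        feeds = set()
        for folder, folder_feeds in folders.items():
            if topic_mapping.get(folder) == topic:
                feeds.update(folder_feeds)
        votes.update(feeds)  # each user votes at most once per feed
    return votes.most_common(k)
```

Counting users rather than raw folder entries keeps a single user with several overlapping folders (for example “politics” and “political”) from inflating a feed's score.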
Most Subscribed Feeds, Top Folders
Top Feeds • Bloglines • Wired • Slashdot • BoingBoing • Dilbert • Gizmodo • Engadget • Official Google Blog • A List Apart • News: CNN, Reuters, Moreover
Top Folders • News • Blogs • Tech • Comics • Politics • Podcasts • Design • Sports • Science • Business
FTM! Site • Explore popular topics • Subscribe to interesting feeds • “If you like X, you will like…” • http://ftm.umbc.edu
Feed Recommender (Method 1) • Two feeds are similar if they are categorized under similar folders [Figure: feeds grouped by folders such as Technology, Politics, Business and knitting]
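Method 1 amounts to item-to-item similarity where each feed is described by the folders users file it under. A minimal sketch, assuming each feed is a hypothetical sparse vector of topic counts and using cosine similarity as the (assumed) similarity measure:

```python
import math


def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * b.get(k, 0) for k, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def similar_feeds(feed_topics, feed, k=5):
    """Recommend feeds whose folder-label profile is closest to `feed`.

    feed_topics: feed URL -> {topic: users filing the feed under it}
    Feeds with zero similarity are dropped.
    """
    target = feed_topics[feed]
    scores = [
        (other, cosine(target, vec))
        for other, vec in feed_topics.items()
        if other != feed
    ]
    scores = [s for s in scores if s[1] > 0]
    scores.sort(key=lambda s: s[1], reverse=True)
    return scores[:k]
```

This matches the “If you like X, you will like…” idea: a feed filed mostly under “politics” surfaces other politics-heavy feeds first.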
Feed Recommendation (Method 2) • Start with a seed set from FTM! • Using the blog graph from the WWE dataset, find nodes influenced by the seed set • Find other blogs frequently co-cited by these followers [Figure: blogs influenced by the seed set]
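The co-citation step can be sketched as two passes over a link graph. This is an illustrative sketch, not the authors' implementation: the graph is a hypothetical mapping from each blog to the set of blogs it links to, a "follower" is assumed to be any blog linking to at least one seed, and `min_count` is an illustrative cut-off.

```python
from collections import Counter


def cocitation_recommend(links, seeds, k=10, min_count=2):
    """Recommend blogs frequently co-cited with a seed set.

    links: blog -> set of blogs it links to (hypothetical adjacency format)
    seeds: seed blogs, e.g. the top feeds for an FTM! topic
    """
    seeds = set(seeds)
    # Followers: non-seed blogs that link to at least one seed blog.
    followers = [b for b, outs in links.items() if b not in seeds and outs & seeds]
    counts = Counter()
    for b in followers:
        # Each follower cites a given non-seed blog at most once (outs is a set).
        counts.update(t for t in links[b] if t not in seeds and t != b)
    return [(blog, n) for blog, n in counts.most_common() if n >= min_count][:k]
```

A blog cited by many of the seed set's followers is treated as topically related even if it never appears in a Bloglines folder, which is how this method can extend beyond the subscription data.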
Feed Recommendation Using Co-citation [Figures: results for “Politics” and “Knitting”]
Conclusions • Folder labels can be used to produce an intuitive set of topics for feeds or blogs • Subscription information combined with simple techniques can be quite effective in ranking blogs for a topic. • Many useful applications such as feed recommendation and meme trackers can benefit from this data.
University Study Reveals Rich Data on Bloglines Feeds
• “Feeds That Matter is a fascinating new analysis project out of UMBC and a terrific way to find new RSS feeds to subscribe to.” – Steve Rubel, Micropersuasion blog
• “Want to find a few good feeds? Try Feeds That Matter, an interesting grouping of publicly listed feeds at Bloglines” – delicious user skyamese
• “It brings you popular feeds from Bloglines in different categories and I found almost all the popular feeds in appropriate categories out there. Worth paying a visit” – netgautam blogger
• “Provides a "swarm" with keywords on subjects which will take you to a list of blogs/sites relating to that keyword. All are rss feeds” – delicious user damenjoe
• “Thanks! Find how to classify your feeds and find new feeds based on tags” – delicious user inf
• “Links to loads of good RSS feeds.” – hmspolio
• “…find information and resources that have already been filtered by like minded people” – Tryangulation blog
• “Nothing better to read online? Feeds that matters gives you loads of highly rated feeds in all category ….great source for some quality content for a blog or just for browsing.” – Blendedblog blogger
• “kind of a meta blog” – delicious user frontporsche
• “…it's a great example of a technique for extracting useful metadata from the world” – JD on EP blog
• “Easy way to find good blogs” – delicious user kc144
600+ bookmarks on delicious & more…
Feed Recommendation (Method 2) • Starting with a seed set from FTM!, find other influential feeds in the BlogPulse data using co-citation [Figure: seed blog www.dailykos.com and the blogs influenced by the seed set]