Jure Leskovec
Machine Learning Department, Carnegie Mellon University
Who are the most influential bloggers?
Question…
• I have 10 minutes. Which blogs should I read to be most up to date?
• Who are the most influential bloggers?
A single story propagates…
So, who is really influential here?
[Figure: cascade of an obscure technology story. Node labels: Small tech blog, Small tech blog, Slashdot, Wired]
What about multiple stories propagating?
[Figure labels: Blogs, New Scientist, New York Times, CNN, BBC]
Blogs: Detecting big stories early
Want to read things before others do.
• Data: all posts from the top 50,000 blogs for 2006
• 60 million posts, 120 million links
[Figure captions: "Detect blue & green soon but miss red." / "Detect all stories but late."]
Problem definition: Covering blogs
• Given a budget (e.g., 3 blogs)
• Select the blogs that cover the most of the blogosphere
• Bad news: solving this exactly is NP-hard
• Good news: Theorem: our algorithm can do it in linear time and with a factor 3 approximation
[Figure: Blogosphere]
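To make the covering problem concrete, here is a minimal sketch of a plain greedy selection on toy data. It is the textbook greedy heuristic for budgeted coverage, not necessarily the algorithm behind the theorem above, and it makes no claim about the running time or approximation factor stated on the slide. The blog names, story IDs, and the `greedy_cover` function are hypothetical.

```python
from typing import Dict, List, Set


def greedy_cover(blog_stories: Dict[str, Set[str]], budget: int) -> List[str]:
    """Pick up to `budget` blogs, each time adding the blog that covers
    the most stories not yet covered by the blogs chosen so far."""
    covered: Set[str] = set()
    chosen: List[str] = []
    for _ in range(budget):
        best, best_gain = None, 0
        # Sorted iteration keeps tie-breaking deterministic.
        for blog in sorted(blog_stories):
            if blog in chosen:
                continue
            gain = len(blog_stories[blog] - covered)
            if gain > best_gain:
                best, best_gain = blog, gain
        if best is None:  # no remaining blog adds a new story
            break
        chosen.append(best)
        covered |= blog_stories[best]
    return chosen


# Hypothetical toy data: which stories each blog mentions.
toy = {
    "blog_a": {"s1", "s2", "s3"},
    "blog_b": {"s3", "s4"},
    "blog_c": {"s4", "s5", "s6"},
    "blog_d": {"s1"},
}
selected = greedy_cover(toy, budget=2)
print(selected)
```

With a budget of 2, the sketch first takes a blog covering three stories, then the blog that adds the most stories not already covered, rather than the blog with the next-highest raw count.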
Experimental results
• Which blogs to read to be most up to date?
[Plot: % of stories detected (higher is better) vs. number of selected blogs. Curves: Our solution, In-links (used by Technorati), Out-links, # posts, Random]
www.blogcascades.org
So, who is influential? What should I read?
Jure Leskovec
www.blogcascades.org
Same problem: Water Network
• Given:
  • a real city water distribution network
  • data on how contaminants spread over time
• Place sensors (to save lives)
• Problem posed by the US Environmental Protection Agency
[Figure labels: c1, S, S, c2]
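The sensor placement task can be read as the same kind of selection problem with a "population saved" objective. The sketch below is one possible formulation under stated assumptions: each contamination scenario is credited with the best outcome among the placed sensors, and sensors are picked greedily with lazy re-evaluation of gains (a generic speedup that is valid when gains can only shrink as more sensors are added). The slides do not spell out the algorithm, so the objective, the `lazy_greedy` function, the junction names, and all numbers are hypothetical.

```python
import heapq
from typing import Dict, List

# saved[location][scenario] = people saved if a sensor at `location`
# detects contamination `scenario` (hypothetical numbers).
Saved = Dict[str, Dict[str, float]]


def population_saved(saved: Saved, sensors: List[str]) -> float:
    """For each scenario, count the best outcome among the placed sensors."""
    scenarios = {sc for per_loc in saved.values() for sc in per_loc}
    return sum(
        max((saved[s].get(sc, 0.0) for s in sensors), default=0.0)
        for sc in scenarios
    )


def lazy_greedy(saved: Saved, k: int) -> List[str]:
    """Greedy selection of up to k sensors with lazily refreshed gains."""
    chosen: List[str] = []
    current = 0.0
    # Max-heap via negated gains; the third field records how many sensors
    # were already chosen when that gain was computed.
    heap = [(-population_saved(saved, [loc]), loc, 0) for loc in sorted(saved)]
    heapq.heapify(heap)
    while heap and len(chosen) < k:
        neg_gain, loc, stamp = heapq.heappop(heap)
        if stamp == len(chosen):
            # The gain is up to date; stale gains below it can only shrink.
            if -neg_gain <= 0:
                break
            chosen.append(loc)
            current += -neg_gain
        else:
            # Stale entry: recompute its marginal gain and push it back.
            gain = population_saved(saved, chosen + [loc]) - current
            heapq.heappush(heap, (-gain, loc, len(chosen)))
    return chosen


toy_saved = {
    "junction_1": {"c1": 100.0, "c2": 10.0},
    "junction_2": {"c1": 85.0, "c2": 20.0},
    "junction_3": {"c2": 80.0},
}
placed = lazy_greedy(toy_saved, k=2)
print(placed, population_saved(toy_saved, placed))
```

In the toy data, the second sensor goes to the junction that helps most with the scenario the first sensor handles poorly, even though another junction has a higher stand-alone score.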
Water network: Results
• Our approach performed best at the Battle of Water Sensor Networks competition
[Plot: population saved (higher is better) vs. number of placed sensors. Curves: CELF, Degree, Random, Population, Flow]
[w/ Ostfeld et al., J. of Water Resource Planning]