A Connectivity-Based Popularity Prediction Approach for Social Networks

Huangmao Quan, Ana Milicic, Slobodan Vucetic, and Jie Wu
Department of Computer and Information Sciences
Temple University

Overview …
[Figure: a pool of server resources shared by Content 1, Content 2, …]
• Given a set of server resources:
  • How much should we allocate to Content 1? How much to Content 2?

Overview …
[Figure: Content 1 (popular) and Content 2 (less popular), …, each with its allocated share of server resources]
• Popular content has higher traffic, so allocating more resources to it will improve performance
• If we know that Content 1 will be more popular, we can allocate more resources to it accordingly
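
One simple way to turn a popularity prediction into an allocation is a proportional split. The sketch below is an illustration only: the function name `allocate`, the score values, and the proportional rule itself are assumptions, not something the slides specify.

```python
def allocate(total_units, predicted_popularity):
    """Split total_units across items in proportion to predicted popularity.

    predicted_popularity maps an item name to a non-negative score
    (hypothetical scores; any popularity predictor could supply them).
    """
    total = sum(predicted_popularity.values())
    if total <= 0:
        # No usable signal: fall back to an even split.
        share = total_units / len(predicted_popularity)
        return {name: share for name in predicted_popularity}
    return {name: total_units * score / total
            for name, score in predicted_popularity.items()}


# Toy scores chosen for illustration: Content 1 is predicted 3x as popular.
allocation = allocate(100, {"Content 1": 3.0, "Content 2": 1.0})
```

With these toy scores, Content 1 receives three quarters of the resources and Content 2 one quarter.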

Overview …
• Accurately determining popularity has other applications
  • Determining advertising rates
  • Developing marketing strategies
• Clearly, inaccurate prediction will result in worse performance
• Can we use social network information to help us predict popularity?
• Our solution can be applied to any application that has social connectivity information

Our approach …
• Predict popularity based on social network structure rather than on content
• E.g., content viewed by a user with greater connectivity will propagate faster than content viewed by a user with lower connectivity

Our approach …
• An ideal prediction technique is
  • Accurate
  • Computationally lightweight
  • Scalable to large-scale social media

Our approach …
• We incorporate the idea of a “community” into the prediction
• A community is a large group of users who are connected to each other via the social graph

Our approach …
• Use greedy optimization of modularity to determine communities [Clauset et al., 2004]
• Computed by repeatedly merging nodes into groups so as to maximize the modularity score of the graph
• Lightweight enough to scale
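
The merging step can be sketched as follows. This is a deliberately naive, pure-Python version of greedy modularity agglomeration, assuming a simple undirected graph given as a list of edges with no self-loops or duplicates. It rescans every candidate pair on each round, whereas efficient implementations of the Clauset et al. algorithm use heap-based bookkeeping. The function name and the toy graph are illustrative assumptions.

```python
from collections import defaultdict


def greedy_modularity_communities(edges):
    """Naive greedy agglomeration on a simple undirected graph.

    Assumes no self-loops and no duplicate edges. Returns a list of node sets.
    """
    m = len(edges)
    nodes = {n for edge in edges for n in edge}
    members = {n: {n} for n in nodes}   # community id -> member nodes
    degree = defaultdict(int)           # community id -> sum of member degrees
    between = defaultdict(int)          # {a, b} -> number of edges between a and b
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        between[frozenset((u, v))] += 1

    while True:
        best_gain, best_pair = 0.0, None
        for pair, e_ab in between.items():
            a, b = tuple(pair)
            # Change in modularity if communities a and b were merged.
            gain = e_ab / m - degree[a] * degree[b] / (2 * m * m)
            if gain > best_gain:
                best_gain, best_pair = gain, (a, b)
        if best_pair is None:
            break                       # no merge improves modularity
        a, b = best_pair
        members[a] |= members.pop(b)
        degree[a] += degree.pop(b)
        merged = defaultdict(int)
        for pair, e in between.items():
            relabeled = frozenset(a if c == b else c for c in pair)
            if len(relabeled) == 2:     # drop edges that are now internal to a
                merged[relabeled] += e
        between = merged
    return list(members.values())


# Toy graph: two triangles joined by a single bridge edge (3, 4).
toy_edges = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]
communities = greedy_modularity_communities(toy_edges)
```

On the toy graph the two triangles end up as separate communities, because merging across the single bridge edge would lower the modularity score.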

Our approach …
• Measure popularity using the connectivity of an individual user within a community
• The connection coefficient captures the connectivity of a user within a community
• The idea is that the higher the connection coefficient, the faster the content will spread to others within the community
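
The slides do not give a formula for the connection coefficient. Purely as an illustration, the sketch below assumes one plausible definition: the fraction of the other community members that a user is directly linked to. The function name, the `friends` mapping, and the toy users are all hypothetical.

```python
def connection_coefficient(user, community, friends):
    """Hypothetical measure (not the authors' definition): the share of the
    other members of `community` that `user` is directly linked to.

    community: set of user ids; friends: dict mapping a user id to a set of ids.
    """
    others = community - {user}
    if not others:
        return 0.0
    return len(friends.get(user, set()) & others) / len(others)


# Toy community of five users; "x" is a friend outside the community.
toy_community = {"a", "b", "c", "d", "e"}
toy_friends = {"a": {"b", "c", "x"}, "d": {"a"}}
coeff_a = connection_coefficient("a", toy_community, toy_friends)
coeff_d = connection_coefficient("d", toy_community, toy_friends)
```

Under this assumed definition, user "a" reaches two of the four other members directly and user "d" only one, so content seen by "a" would be expected to spread faster inside the community.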

Our approach …
• Also measure popularity using the connectivity of a community with respect to other communities
• Spreading within one community does not necessarily mean the content will spread to the rest of the network
  • E.g., content may appeal only to a very small niche
• So we also consider the connectivity of the community
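
The slides do not say how community-level connectivity is computed. As a hedged illustration, the sketch below uses one simple proxy: the fraction of edges touching a community that lead outside it. The function name and the toy graph are assumptions.

```python
def external_edge_fraction(community, edges):
    """Hypothetical proxy (not the authors' definition): among the edges with
    at least one endpoint in `community`, the fraction that cross to other
    communities."""
    internal = external = 0
    for u, v in edges:
        endpoints_inside = (u in community) + (v in community)
        if endpoints_inside == 2:
            internal += 1
        elif endpoints_inside == 1:
            external += 1
    touching = internal + external
    return external / touching if touching else 0.0


# Toy graph: two triangles joined by a single bridge edge (3, 4).
bridge_edges = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]
outward = external_edge_fraction({1, 2, 3}, bridge_edges)
```

A community with a value near zero is nearly isolated, which matches the niche-content case: a story can saturate such a community without reaching the rest of the network.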

The data …
• Evaluated using real-world datasets collected from Digg and MetaFilter
• The Digg dataset was collected by us. It has 5,000 stories from 2,684 authors. A total of 117,956 users, 1,164,613 edges, and 19,645 posts.
• The MetaFilter dataset was obtained from a database

The data …
• In Digg, users can view
  • Recently promoted stories (front page)
  • Recently submitted stories
  • Stories their friends recently submitted
  • Stories their friends recently voted for
• They then vote for stories that interest them
• Popular stories then become “top stories”. The actual algorithm is unknown

Some findings …
• Some top stories come from users with many friends
• But a considerable number of top stories come from users with few friends
• So having fewer friends does not mean a user's stories will be unpopular

Some findings …
• No strict linear correlation between the number of votes in the first hour and the final number of votes
• So initial popularity does not necessarily mean long-term popularity
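
One common way to check for a linear relationship of this kind is the Pearson correlation coefficient. The sketch below is a minimal stdlib implementation. The vote counts are made-up toy numbers, not values from the Digg or MetaFilter datasets, and the slides do not state which statistic was actually used.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Made-up toy vote counts, for illustration only.
first_hour_votes = [1, 2, 3, 4]
final_votes_linear = [2, 4, 6, 8]   # final votes exactly proportional to early votes
final_votes_mixed = [8, 2, 6, 4]    # early votes say little about the final count

r_linear = pearson(first_hour_votes, final_votes_linear)
r_mixed = pearson(first_hour_votes, final_votes_mixed)
```

A coefficient close to 1 would indicate that early votes predict final votes almost linearly, while values far from 1, as in the second toy series, indicate that they do not.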

Some findings …
• No “return-the-favor” behavior observed
• Users who vote favorably for others' stories do not necessarily get more favorable votes in return
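
The slides do not describe how this behavior was tested. One simple check, sketched here as an assumption, is the fraction of voter-to-author votes that are matched by a vote in the opposite direction. The function name and the toy votes are hypothetical.

```python
def reciprocity(votes):
    """Fraction of distinct (voter, author) votes that are returned, i.e. for
    which the author has also voted for a story by the voter.

    votes: iterable of (voter, author) pairs; self-votes are ignored.
    """
    pairs = {(voter, author) for voter, author in votes if voter != author}
    if not pairs:
        return 0.0
    returned = sum(1 for voter, author in pairs if (author, voter) in pairs)
    return returned / len(pairs)


# Toy votes: a and b vote for each other; the other two votes are not returned.
toy_votes = [("a", "b"), ("b", "a"), ("a", "c"), ("d", "c")]
rate = reciprocity(toy_votes)
```

A rate well above what random voting would produce would suggest return-the-favor behavior, while a low rate is consistent with the finding on the slide.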

Some findings …
• Almost all top stories spread to larger communities very quickly after they first appear
• Story popularity depends less on the characteristics of the author and the amount of initial support
• A more relevant factor is whether the author is a member of a large or a small community
• Also relevant is how many users from different communities offer initial support

Thank you
