This research explores predicting content popularity in social networks based on user connectivity. The approach incorporates community structure and a connection coefficient to determine how content spreads within and across communities. Real-world data from Digg and MetaFilter were used to analyze and validate the prediction model. Findings include the impact of user connections on content popularity and the role of community size in spreading stories. The approach aims to provide an accurate, lightweight, and scalable solution for predicting popularity in social media contexts.
A Connectivity-Based Popularity Prediction Approach for Social Networks
Huangmao Quan, Ana Milicic, Slobodan Vucetic, and Jie Wu
Department of Computer and Information Sciences, Temple University
Overview … Content 1 Content 2 … Server resources • Given a set of server resources: • How many server resources should we allocate to Content 1? How many to Content 2?
Overview … Popular Less popular Content 1 Content 2 … Allocated to Content 1 Allocated to Content 2 • Popular content has higher traffic, so allocating more resources to it will improve performance • If we know that Content 1 will be more popular, we can allocate more resources to it accordingly
Overview … • Accurately determining popularity has other applications • Determining advertising rates • Developing marketing strategies • Clearly, inaccurate prediction will result in worse performance • Can we use social network information to help us predict popularity? • Our solution applies to any application that has social connectivity information
Our approach … • Predict popularity based on social network structure, not on content • E.g., if content is viewed by a user with greater connectivity, it will propagate faster than if it is viewed by a user with lower connectivity
Our approach … • Ideal prediction technique • Accurate • Computationally lightweight • Scalable to large-scale social media
Our approach … • We incorporate the idea of a “community” into the prediction • A community represents a large group of users who are connected to each other via the social graph
Our approach … • Use greedy optimization of modularity to determine communities [Clauset et al., 2004] • Computed by repeatedly merging groups of nodes so as to maximize the modularity score of the graph • Lightweight enough to scale
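The merging step on this slide can be sketched as follows. This is a naive illustration of greedy modularity agglomeration, not the authors' implementation and not the optimized data structures of Clauset et al.; the function name `greedy_modularity` and the toy graph are assumptions made for the example. It assumes a simple undirected graph with no self-loops or duplicate edges. With m edges, merging two groups joined by e edges and having degree sums d_a and d_b changes modularity by e/m - d_a*d_b/(2*m^2).

```python
from collections import defaultdict

def greedy_modularity(edges):
    """Naive greedy modularity agglomeration (illustrative sketch only).

    edges: list of (u, v) pairs of a simple undirected graph.
    Returns a list of sets, one per detected community.
    """
    m = len(edges)
    if m == 0:
        return []
    degree = defaultdict(int)   # degree sum of each current group
    between = defaultdict(int)  # edge count between each pair of groups
    members = {}                # group label -> set of nodes
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        between[frozenset((u, v))] += 1
        members.setdefault(u, {u})
        members.setdefault(v, {v})

    while True:
        best_gain, best_pair = 0.0, None
        # Only connected pairs can have a positive modularity gain.
        for pair, e_ab in between.items():
            a, b = tuple(pair)
            gain = e_ab / m - degree[a] * degree[b] / (2 * m * m)
            if gain > best_gain:
                best_gain, best_pair = gain, (a, b)
        if best_pair is None:
            break  # no merge improves modularity any further
        a, b = best_pair
        # Merge group b into group a.
        members[a] |= members.pop(b)
        degree[a] += degree.pop(b)
        merged = defaultdict(int)
        for pair, e in between.items():
            renamed = frozenset(a if c == b else c for c in pair)
            if len(renamed) == 2:  # drop edges that became internal
                merged[renamed] += e
        between = merged
    return list(members.values())

# Toy graph (made up for illustration): two triangles joined by one edge.
toy_edges = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]
print(greedy_modularity(toy_edges))
```

Each pass rescans every connected pair, which is fine for a small example but far slower than what a graph of Digg's size would need; the sketch only shows the merge rule.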
Our approach … • Measure popularity using the connectivity of an individual user within a community • The connection coefficient captures the connectivity of a user within a community • The idea is that the higher the connection coefficient, the faster the content will spread to others within the community
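The slides do not give a formula for the connection coefficient. As a purely hypothetical stand-in, the sketch below scores a user by the fraction of the other community members they are directly linked to; the function name, the `friends` mapping, and the sample users are all assumptions for illustration and may differ from the authors' definition.

```python
def connection_coefficient(user, community, friends):
    """Fraction of the other community members that `user` is linked to.

    ASSUMPTION: this definition is illustrative only; the slides do not
    state how the connection coefficient is actually computed.

    user:      user identifier
    community: iterable of user identifiers in the same community
    friends:   dict mapping each user to the set of users they link to
    """
    others = set(community) - {user}
    if not others:
        return 0.0
    return len(friends.get(user, set()) & others) / len(others)

# Hypothetical users, made up for the example.
friends = {"a": {"b", "c", "x"}, "b": {"a"}}
community = {"a", "b", "c", "d"}
print(connection_coefficient("a", community, friends))
print(connection_coefficient("b", community, friends))
```

Under this assumed definition, links to users outside the community (such as "x" above) do not raise the score, which matches the slide's focus on spread within a community.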
Our approach … • Measure popularity using the connectivity of a community with respect to other communities • Spreading within one community does not necessarily mean the content will spread to the rest of the network • E.g., content may only appeal to a very small niche • So we also consider the connectivity of the community
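How the connectivity of a community is measured is not specified on the slide. One simple possibility, shown here only as an assumed illustration, is the share of edges touching a community that lead to users outside it. The function name `external_edge_fraction` and the toy edge list are hypothetical.

```python
def external_edge_fraction(community, edges):
    """Share of the community's edges that cross to other communities.

    ASSUMPTION: an illustrative measure, not the one used by the authors.

    community: set of user identifiers
    edges:     list of (u, v) pairs of an undirected social graph
    """
    community = set(community)
    touching = 0  # edges with at least one endpoint in the community
    crossing = 0  # edges with exactly one endpoint in the community
    for u, v in edges:
        inside = (u in community) + (v in community)
        if inside >= 1:
            touching += 1
        if inside == 1:
            crossing += 1
    return crossing / touching if touching else 0.0

# Toy graph (made up): two triangles joined by a single edge.
toy_edges = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]
print(external_edge_fraction({1, 2, 3}, toy_edges))
```

A value near zero would describe a niche community whose content is unlikely to leave it, which is the case the slide warns about.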
The data … • Evaluated using real-world datasets collected from Digg and MetaFilter • The Digg dataset was collected by us. It has 5,000 stories from 2,684 authors. In total: 117,956 users, 1,164,613 edges, and 19,645 posts. • The MetaFilter dataset was obtained from a database
The data … • In Digg, users can view • Recently promoted stories (front page) • Recently submitted stories • Stories their friends recently submitted • Stories their friends recently voted for • Then they vote for stories that interest them • Popular stories then become “top stories”; the actual promotion algorithm is unknown
Some findings … • Some top stories come from users with many friends • But a considerable number of top stories come from users with few friends • So having fewer friends does not mean a user's stories will be unpopular
Some findings … • No strict linear correlation between the number of votes in the first hour and the final number of votes • So initial popularity does not necessarily imply long-term popularity
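A linear relationship of this kind is commonly checked with the Pearson correlation coefficient. The sketch below shows that computation in plain Python; the slides do not say which statistic the authors used, and the vote counts are invented placeholders, not values from the Digg dataset.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)

# Placeholder numbers for illustration only (NOT from the Digg dataset).
first_hour_votes = [5, 12, 40, 8, 30]
final_votes = [900, 150, 600, 1200, 200]
print(pearson(first_hour_votes, final_votes))
```

A coefficient close to +1 would indicate that first-hour votes predict the final count almost linearly; the finding on this slide is that the real data show no such strict relationship.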
Some findings … • No “return-the-favor” behavior observed • Users who voted favorably for others' stories do not necessarily get more favorable votes in return
Some findings … • Almost all top stories spread to larger communities very quickly after they first appear • Story popularity depends less on the characteristics of the author and the amount of initial support • A more relevant factor is whether the author is a member of a large or small community • Also relevant is how many users from different communities offer initial support
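The two factors named on this slide can be turned into simple per-story features. The sketch below is a hypothetical illustration: the function name `early_spread_features`, the `community_of` mapping, and the sample users are assumptions, and the slides do not describe how the authors encoded these factors.

```python
from collections import Counter

def early_spread_features(author, early_voters, community_of):
    """Compute two illustrative features for one story.

    ASSUMPTION: feature names and encoding are hypothetical.

    author:       identifier of the story's author
    early_voters: users who supported the story early on
    community_of: dict mapping each user to a community label
    """
    sizes = Counter(community_of.values())  # community label -> member count
    author_community = community_of.get(author)
    # Distinct communities among early voters with a known community.
    voter_communities = {
        community_of[v] for v in early_voters if v in community_of
    }
    return {
        "author_community_size": sizes.get(author_community, 0),
        "supporting_communities": len(voter_communities),
    }

# Hypothetical users and community labels, made up for the example.
community_of = {"ann": 0, "bob": 0, "cy": 0, "dee": 1, "eli": 2}
print(early_spread_features("ann", ["bob", "dee", "eli", "zed"], community_of))
```

Voters without a known community (such as "zed" above) are ignored rather than counted as a community of their own, which is a design choice of this sketch.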