Explore how user similarities in interest impact social network connections, focusing on predictive link accuracy. Discover methodologies, similarity metrics, and analysis results from online platforms such as Flickr, Last.fm, and aNobii.
Friendship Prediction and Homophily in Social Media • By LUCA MARIA AIELLO, ALAIN BARRAT, ROSSANO SCHIFANELLA, CIRO CATTUTO, BENJAMIN MARKINES, and FILIPPO MENCZER • Presenter: Maltin Shkarpa
Introduction • Online social sites are very popular nowadays • Great place to share information, experience, ideas, all tailored to specific interests • Users sharing the same interests tend to be friends with each other
Introduction • This article: • Suggests that users with similar interests are more likely to be friends. • Confirms that social networks constructed from topical similarity capture actual friendship accurately.
Introduction • Homophily • Metrics to measure “similarity”? • Tagging • three-way relation (user, resource, tag) • Explicit representation of user activities by: • Exposing resources • Tagging items • Discussion groups • And other user relations
The article addresses these questions: • How does the similarity between user profiles relate to their proximity on the social network? • Amount of activity • Content of activity • And can we predict the existence of social links from knowledge of the similarity among user profiles?
Datasets • Flickr • Users upload, tag, and share their pictures • Directed links • flickr.com/api & crawling • Last.fm • Tag songs, artists or albums, create or join groups • Undirected links • last.fm/api and crawling • Tasteometer
Datasets • aNobii • The public aNobii book database • Each user has a digital book collection • Library • Wishlist • Two different types of social ties: • Friendship • Neighborhood • The study considers the union of friendship and neighborhood ties
Data Analysis – Heterogeneity and Correlations • Number of friendship relations is considered to be a measure of activity. • Analyzing: • The activity patterns of individual users. • The correlation between various activity indicators.
Data Analysis - Heterogeneity • Activity pattern of users = highly heterogeneous Fig 1. Flickr complementary cumulative distributions
Data Analysis - Heterogeneity • Activity pattern of users = highly heterogeneous Fig 2. Complementary cumulative distributions of the measures of activity of aNobii users
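A complementary cumulative distribution, as plotted in Figs 1 and 2, gives for each value x the fraction of users whose activity is at least x. The sketch below is a minimal illustration of that computation, assuming the P(X >= x) convention (some authors use P(X > x)); the function name `ccdf` and the sample counts are hypothetical and not taken from the datasets.

```python
from collections import Counter

def ccdf(values):
    """Return sorted (x, P(X >= x)) pairs for a list of activity counts."""
    n = len(values)
    counts = Counter(values)
    remaining = n  # number of observations with value >= current x
    result = []
    for x in sorted(counts):
        result.append((x, remaining / n))
        remaining -= counts[x]
    return result

# Hypothetical per-user tag counts (illustrative only).
tag_counts = [1, 1, 1, 2, 2, 3, 5, 8, 40, 250]
for x, p in ccdf(tag_counts):
    print(x, p)
```

On a log-log plot, a slowly decaying curve of this kind is what signals the heterogeneity noted on the slides: most users do little, a few do a great deal.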
Data Analysis - Correlations • Correlation between different types of activity? • Compute the average activity of one type over all users having a given value of another activity type.
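The conditional average described on this slide can be sketched as follows: group users by their value of one activity measure (for example the number of out-neighbors) and average a second measure within each group. The function name `average_by` and the sample numbers are hypothetical.

```python
from collections import defaultdict

def average_by(x_values, y_values):
    """Average y over all users that share the same x value."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for x, y in zip(x_values, y_values):
        sums[x] += y
        counts[x] += 1
    return {x: sums[x] / counts[x] for x in sorted(sums)}

# Hypothetical data: out-degree and number of distinct tags per user.
k_out = [1, 1, 2, 2, 2, 10]
n_tags = [3, 5, 4, 8, 6, 90]
print(average_by(k_out, n_tags))
```

An increasing trend in the resulting averages indicates a positive correlation between the two activity types, while the spread inside each group reflects the large fluctuations.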
Data Analysis - Correlations • Overall, the various activity metrics are all positively correlated with each other • Large fluctuations still present Fig 3. Left: Average number of distinct tags (nt), of groups (ng), and of tag assignments (a) of users having kout out-neighbors in the Flickr social network. Right: Correlations between the activity of aNobii users and their number of declared friends and neighbors: group memberships ng, library size nb, and wishlist size nw, averaged over users with kout out-links, vs. kout.
Data Analysis - Mixing Patterns • Mixing Patterns • The correlations between the activity metrics of connected users • “Assortative mixing” or “homophily” • Clear assortative trends
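One common way to quantify assortative mixing is the Pearson correlation between an activity value at the two ends of each social link. The sketch below assumes undirected links and is not necessarily the measure used in the article; `edge_assortativity` and the toy data are hypothetical.

```python
import math

def edge_assortativity(edges, activity):
    """Pearson correlation of an activity value across the ends of each link."""
    xs, ys = [], []
    for u, v in edges:
        # Count each undirected link in both directions so the result is symmetric.
        xs += [activity[u], activity[v]]
        ys += [activity[v], activity[u]]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # no variation, correlation undefined; report 0 by convention
    return cov / math.sqrt(var_x * var_y)

# Hypothetical toy network: active users linked to active users.
activity = {"a": 1, "b": 1, "c": 5, "d": 5}
print(edge_assortativity([("a", "b"), ("c", "d")], activity))
```

Values near +1 indicate that users tend to connect to others with similar activity levels (assortative), and values near -1 indicate the opposite.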
Data Analysis - Topic Similarity • Topic Similarity
Data Analysis - Topic Similarity • Homophily • Link selection • Social influence • How to measure it and how to relate it to the social network structure? • Number of shared items, tags, groups, books, songs
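The simplest similarity listed above is the number of items two users share. A raw count favors very active users, so a normalized variant is often used alongside it. The sketch below shows both, assuming profiles are plain sets of items; the cosine normalization is an illustrative choice, and the function names are hypothetical.

```python
import math

def shared_count(profile_a, profile_b):
    """Number of items (tags, groups, books, songs) two users have in common."""
    return len(set(profile_a) & set(profile_b))

def cosine_similarity(profile_a, profile_b):
    """Shared items normalized by the geometric mean of the profile sizes."""
    a, b = set(profile_a), set(profile_b)
    if not a or not b:
        return 0.0  # an empty profile is treated as similar to nothing
    return len(a & b) / math.sqrt(len(a) * len(b))

# Hypothetical tag vocabularies for two users.
alice = {"sunset", "beach", "travel", "italy"}
bob = {"travel", "italy", "food"}
print(shared_count(alice, bob), cosine_similarity(alice, bob))
```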
Data Analysis - Topic Similarity Fig 7. Average library and wishlist similarity as a function of the distance on the aNobii social network.
Data Analysis - Topic Similarity Fig 8. Average tag and group similarity as a function of the distance on the Flickr and Last.fm social networks.
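Curves like those in Figs 7 and 8 can be produced by measuring the shortest-path distance between pairs of users and averaging their profile similarity at each distance. The sketch below is a small-scale illustration using breadth-first search on an adjacency dictionary; the function names, the `similarity` callback, and the toy data are assumptions, and a real dataset would need sampling rather than all-pairs search.

```python
from collections import defaultdict, deque

def bfs_distances(adjacency, source):
    """Shortest-path distance (in hops) from source to every reachable user."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency.get(node, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist

def similarity_by_distance(adjacency, profiles, similarity):
    """Average profile similarity of user pairs, grouped by network distance."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for u in adjacency:
        for v, d in bfs_distances(adjacency, u).items():
            if d == 0:
                continue  # skip the user itself
            sums[d] += similarity(profiles[u], profiles[v])
            counts[d] += 1
    return {d: sums[d] / counts[d] for d in sorted(sums)}

# Hypothetical chain of three users with overlapping libraries.
adjacency = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
profiles = {"a": {1, 2}, "b": {2, 3}, "c": {3, 4}}
print(similarity_by_distance(adjacency, profiles, lambda p, q: len(p & q)))
```

A similarity that decreases with distance is the signature of homophily on the network.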
Social Link Prediction - Methodology • Hypothesis • Social ties can be predicted based only on topical similarity • Methodology • ROC curves to test the prediction performance • AUC (area under the ROC curve) to summarize performance in a single number • Sensitivity analysis • Is prediction accuracy affected by density?
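The AUC has a direct interpretation that is easy to compute on small samples: it is the probability that a randomly chosen linked pair receives a higher similarity score than a randomly chosen unlinked pair, with ties counted as one half. The sketch below implements that pairwise definition; the function name `auc` and the scores are hypothetical, and the quadratic loop is only suitable for small inputs.

```python
def auc(scores_linked, scores_unlinked):
    """Probability that a linked pair outscores an unlinked pair (ties count 0.5)."""
    wins = 0.0
    for p in scores_linked:
        for n in scores_unlinked:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_linked) * len(scores_unlinked))

# Hypothetical similarity scores for pairs that are / are not friends.
linked = [0.9, 0.7, 0.4]
unlinked = [0.5, 0.2, 0.1]
print(auc(linked, unlinked))
```

An AUC of 0.5 corresponds to random guessing and 1.0 to a perfect ranking of linked pairs above unlinked ones.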
Social Link Prediction – Similarity Metrics • Tripartite graph (three mode data) • Triple – a ternary relation • Similarity measures σ(u, v) • Two mode data • aggregation
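Aggregation reduces the three-mode (user, resource, tag) data to two-mode data by summing over one dimension. The sketch below aggregates over resources to obtain a per-user tag profile, keeping either the counts or only the presence of each tag. Treating the count-preserving form as the "distributional" aggregation mentioned later in the slides is an assumption, as are the function names and the triples.

```python
from collections import Counter, defaultdict

def aggregate_user_tags(triples):
    """Collapse (user, resource, tag) triples into per-user tag counts."""
    profiles = defaultdict(Counter)
    for user, _resource, tag in triples:
        profiles[user][tag] += 1  # keeps how often the user applied each tag
    return dict(profiles)

def binarize(profiles):
    """Keep only which tags each user has used, discarding the counts."""
    return {user: set(tags) for user, tags in profiles.items()}

# Hypothetical tagging activity.
triples = [
    ("u", "photo1", "cat"),
    ("u", "photo2", "cat"),
    ("u", "photo1", "pet"),
    ("v", "photo1", "cat"),
]
weighted = aggregate_user_tags(triples)
print(weighted)
print(binarize(weighted))
```

Any two-mode similarity measure σ(u, v) can then be applied to the resulting user profiles.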
Social Link Prediction with Single Feature • MIP (maximum information path) often outperforms the other measures. • Very good prediction on groups and libraries. Table III. AUC Values for Last.fm and aNobii Social Link Predictions
Social Link Prediction with Single Feature Fig 14. Summary comparison between the ROC curves of the best performing prediction measures
Social Link Prediction with Single Feature • Sensitivity Analysis Fig 15. Sensitivity analysis of link prediction based on the library feature in aNobii.
Social Link Prediction - Combining Features for Prediction • Predictive Power of Single and Combined Features (using a decision tree on a balanced set of 10,000 positive and negative samples extracted from the aNobii dataset)
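A balanced training set contains as many linked (positive) pairs as unlinked (negative) pairs, so a classifier cannot score well simply by always predicting "no link". The sketch below shows one way to draw such a sample; the slide does not say whether the 10,000 figure is per class or in total, so `n_per_class` is a hypothetical parameter, as are the function name and the toy pairs.

```python
import random

def balanced_sample(linked_pairs, unlinked_pairs, n_per_class, seed=0):
    """Draw equally many positive and negative user pairs, labeled 1 and 0."""
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    positives = rng.sample(list(linked_pairs), n_per_class)
    negatives = rng.sample(list(unlinked_pairs), n_per_class)
    labeled = [(pair, 1) for pair in positives] + [(pair, 0) for pair in negatives]
    rng.shuffle(labeled)
    return labeled

# Hypothetical user pairs; real negatives would be sampled from non-linked users.
linked = [(i, i + 1) for i in range(10)]
unlinked = [(i, i + 5) for i in range(20)]
for pair, label in balanced_sample(linked, unlinked, n_per_class=3):
    print(pair, label)
```

Each sampled pair would then be described by its similarity scores on the individual features before being passed to a classifier such as a decision tree.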
Social Link Prediction - Language Community Analysis • aNobii has two main groups • Italian community (60%) • Far East community (Hong Kong and Taiwan) (20%) • Tags in native language • Very little intersection between two different language clusters
Social Link Prediction - Language Community Analysis Fig 16. ROC curves comparing the link prediction within different language communities in aNobii and Last.fm. The user samples are composed of the top 500 taggers in the whole system (All) or in a single language community (Italian, Chinese, English, German). In all cases we used the MIP similarity metric with a distributional aggregation over tags.
Conclusions • Strong correlation between social connectivity and the intensity of user activities. • Link prediction • Can use any user profile feature • MIP similarity: best prediction results • Library feature: very accurate • Combining features: even better accuracy • Easier in social networks that are strongly clustered by language