Using Transactional Information to Predict Link Strength in Online Social Networks • Indika Kahanda and Jennifer Neville • Purdue University
Online social networks (OSNs) • Explosive growth of online communities enables the study of social processes and behavior at a larger scale than ever before • Facebook: 200 million active users • MySpace: 125 million active users • LinkedIn: 40 million users • User-contributed data is much more extensive than the hand-collected networks previously studied in social science
OSNs are larger and more heterogeneous than manually collected social networks • Purdue Facebook network: min degree=1, median degree=81, max degree=2173 • UNC National Longitudinal Study of Adolescent Health, In-School Survey: min degree=1, median degree=7, max degree=10
High median degree implies the presence of many weak, or spurious, friendship links. Conjecture: Strong relationships can be identified automatically from transactional link information
OSNs contain additional information about user interactions • Wall communications • Photo postings • Group membership
Purdue Facebook network • 56,061 public users in March 2008 • Undergrads, grad students, faculty, staff, alumni
Information about strong relationships • The Top Friends application allows users to nominate some of their friends as “best friends” • This provides positive and negative training examples of strong relationships • 4,900 Purdue users (9%) have the Top Friends application publicly visible • 17,393 Purdue users are nominated as a Top Friend • Max out-degree=40, max in-degree=14
Automatically identifying top friends • Formulate this as a link strength prediction task • For each friend pair (u,v), predict whether they are “top friends” given their attributes, interactions, and network information • Use supervised learning methods: logistic regression, naïve Bayes classifiers, and bagged decision trees • Consider features from four different categories: attribute similarity, topological connectivity, transactional connectivity, and network-transactional connectivity • Evaluate on data from the public Purdue Facebook network • Use basic attribute information from profiles, friendship links, wall postings, picture postings, group memberships, and “top friend” nominations
Related work • Link prediction • Focuses on predicting future links between any (u,v) pair in a network with a single edge type (i.e., friendship) • Previous methods primarily use attribute similarity features (e.g., Taskar et al. ’03) or topological features of the network (e.g., Liben-Nowell & Kleinberg ’04) • Adamic and Adar (’03) used ancillary network information for link prediction, but they focused on similarity-based features rather than transactions/interactions • Pruning spurious links • Singh et al. (’05) and Hill et al. (’07) sample nodes and edges based on structural properties, but they do not consider transactional information
Feature types • (1) Attribute-based features: assess attribute similarity between users U and V (e.g., number of matches). Example: U has Gender: Male, Religious: Christian, Political: Moderate; V has Gender: Male, Religious: Agnostic, Political: Conservative • (2) Topological features: assess the connectivity of U and V in the friendship network (e.g., number of common neighbors)
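The two example features on this slide, "number of matches" and "number of common neighbors", can be computed roughly as below. This is a minimal sketch, assuming profiles are plain dicts of attribute name to value and the friendship network is a dict of friend sets; the function names are hypothetical, not the authors' code.

```python
from typing import Dict, Set


def attribute_matches(u_attrs: Dict[str, str], v_attrs: Dict[str, str]) -> int:
    """Count attributes both users report with the same value."""
    shared = set(u_attrs) & set(v_attrs)
    return sum(1 for key in shared if u_attrs[key] == v_attrs[key])


def common_neighbors(friends: Dict[str, Set[str]], u: str, v: str) -> int:
    """Number of friends that u and v have in common."""
    return len(friends.get(u, set()) & friends.get(v, set()))


# The profiles from the slide agree only on gender.
u = {"gender": "Male", "religious": "Christian", "political": "Moderate"}
v = {"gender": "Male", "religious": "Agnostic", "political": "Conservative"}
print(attribute_matches(u, v))  # 1

# Hypothetical toy friendship network: u and v share friends a and b.
friends = {"u": {"v", "a", "b"}, "v": {"u", "a", "b", "c"}}
print(common_neighbors(friends, "u", "v"))  # 2
```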
Feature types (continued) • (3) Transactional features: assess transactional activity between a user pair (e.g., number of bidirectional posts); the figure links U and V by a wall post, a photo post, and a shared group • (4) Network-transactional features: assess the connectivity of U and V in the transaction networks (i.e., moderate the pair's transactional activity by each user's interactions with other users)
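The difference between a transactional and a network-transactional feature can be illustrated on wall posts. This is a hedged sketch: the slides do not define the exact formulas, so both definitions here are assumptions. `pair_posts` takes "bidirectional posts" to mean posts in either direction, and `normalized_pair_activity` is a hypothetical network-transactional feature that divides the pair's activity by each user's total wall activity and averages the two shares.

```python
from collections import Counter


def pair_posts(posts: Counter, u: str, v: str) -> int:
    # Assumed transactional feature: wall posts exchanged in either direction.
    return posts[(u, v)] + posts[(v, u)]


def user_activity(posts: Counter, user: str) -> int:
    # Total wall posts the user wrote or received, across all partners.
    return sum(n for (author, owner), n in posts.items() if user in (author, owner))


def normalized_pair_activity(posts: Counter, u: str, v: str) -> float:
    # Hypothetical network-transactional feature: the pair's posts as a share of
    # each user's overall activity, averaged over the two users.
    pair = pair_posts(posts, u, v)
    totals = (user_activity(posts, u), user_activity(posts, v))
    shares = [pair / total for total in totals if total > 0]
    return sum(shares) / len(shares) if shares else 0.0


# Keys are (author, wall owner); toy data, not from the Purdue network.
posts = Counter([("u", "v"), ("u", "v"), ("v", "u"), ("u", "w"),
                 ("x", "v"), ("v", "y"), ("v", "y")])
print(pair_posts(posts, "u", "v"))                # 3
print(normalized_pair_activity(posts, "u", "v"))  # 0.625 = (3/4 + 3/6) / 2
```

The normalization is one way to express the intuition on the slide: three posts mean more for a user who posts rarely than for one who posts to everyone.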
Methodology • Models • Bagged decision trees, naïve Bayes classifiers, and logistic regression • Experiments • Feature ranking • Feature type comparison • Link type comparison • Overall classification • Performance measure: area under the ROC curve (AUC) • Measures the quality of (probability) rankings produced by the model
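AUC as a ranking measure has a direct interpretation: it is the probability that a randomly chosen positive pair (a top friend) receives a higher score than a randomly chosen negative pair, with ties counting one half. A minimal sketch of that definition follows; it compares every positive with every negative, which is simple but quadratic, and a rank-sum (Mann-Whitney) formulation gives the same value faster on large samples.

```python
from typing import Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """P(score of a random positive > score of a random negative); ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


print(auc([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0]))  # 0.75
print(auc([0.5, 0.5, 0.5, 0.5], [1, 0, 1, 0]))  # 0.5, same as random ranking
```

An AUC of 50% therefore means the scores carry no ranking information, which is how the attribute-based result in Experiment 2 should be read.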
Facebook sample • Random sample of 500 users with the Top Friends application • Consider all friends of those 500 users • Top friends: positive training examples • Other friends: negative training examples • Restrict attention to pairs that have values for 4 common attributes • Final sample consisted of 8766 linked friends, with 896 (10.2%) positive examples
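The sampling and labeling steps above can be sketched as follows. This is an illustration under stated assumptions: the four attribute names in `REQUIRED_ATTRS` are hypothetical (the slide only says "4 common attributes"), and the keys of `top_friends` are assumed to be exactly the users whose Top Friends list is public.

```python
import random
from typing import Dict, List, Set, Tuple

# Hypothetical attribute names; the slide only says "4 common attributes".
REQUIRED_ATTRS = ("gender", "religious", "political", "relationship_status")


def has_required_attrs(profile: Dict[str, str]) -> bool:
    """True if the profile has a non-empty value for every required attribute."""
    return all(profile.get(attr) for attr in REQUIRED_ATTRS)


def build_examples(
    friends: Dict[str, Set[str]],
    top_friends: Dict[str, Set[str]],
    profiles: Dict[str, Dict[str, str]],
    n_users: int = 500,
    seed: int = 0,
) -> List[Tuple[str, str, int]]:
    """Label each (u, v) friend pair: 1 if u nominated v as a Top Friend, else 0."""
    rng = random.Random(seed)
    # Assumed: users whose Top Friends list is publicly visible.
    candidates = sorted(top_friends)
    sampled = rng.sample(candidates, min(n_users, len(candidates)))
    examples = []
    for u in sampled:
        if not has_required_attrs(profiles.get(u, {})):
            continue
        for v in sorted(friends.get(u, set())):
            # Keep only pairs where both users report all required attributes.
            if has_required_attrs(profiles.get(v, {})):
                examples.append((u, v, 1 if v in top_friends[u] else 0))
    return examples
```

The class imbalance this produces is substantial: in the reported sample, 896 / 8766 ≈ 10.2% of pairs are positive, which is one reason AUC is a more informative measure here than raw accuracy.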
Experiment 1: Feature rankings • Compare the relative importance of each of the 50 features • Measures: • Information gain • Chi-square statistic • Compute the average rank of each feature and examine the top 15: • 12 are network-transactional features, 3 are transactional • 12 use wall information, 3 use picture information
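Both ranking measures and the rank averaging can be sketched for discrete features as below. This is an assumption-laden illustration: it treats every feature as already discretized (continuous features would need binning first), and the feature names in the example are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def entropy(labels: Sequence[int]) -> float:
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature: Sequence, labels: Sequence[int]) -> float:
    """H(Y) - H(Y | X) for a discrete feature X."""
    n = len(labels)
    conditional = 0.0
    for value in set(feature):
        subset = [y for x, y in zip(feature, labels) if x == value]
        conditional += len(subset) / n * entropy(subset)
    return entropy(labels) - conditional


def chi_square(feature: Sequence, labels: Sequence[int]) -> float:
    """Pearson chi-square statistic of the feature-by-label contingency table."""
    n = len(labels)
    observed = Counter(zip(feature, labels))
    fx, fy = Counter(feature), Counter(labels)
    stat = 0.0
    for x in fx:
        for y in fy:
            expected = fx[x] * fy[y] / n
            stat += (observed[(x, y)] - expected) ** 2 / expected
    return stat


def average_ranks(columns: Dict[str, Sequence], labels: Sequence[int]) -> List[Tuple[float, str]]:
    """Rank features under each measure (1 = best), then sort by mean rank."""
    measures = [information_gain, chi_square]
    totals = {name: 0.0 for name in columns}
    for measure in measures:
        scores = {name: measure(col, labels) for name, col in columns.items()}
        for rank, name in enumerate(sorted(scores, key=scores.get, reverse=True), start=1):
            totals[name] += rank
    return sorted((totals[name] / len(measures), name) for name in columns)


labels = [1, 1, 0, 0]
columns = {"wall_share_bin": [1, 1, 0, 0], "same_gender": [1, 0, 1, 0]}
print(average_ranks(columns, labels))  # [(1.0, 'wall_share_bin'), (2.0, 'same_gender')]
```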
Experiment 2: Feature type comparison • Ablation study using features of each type separately • Attribute-based • Topological • Transactional • Network-transactional • Network-transactional features achieve the best performance: network-transactional AUC=84%, topological AUC=75%, transactional AUC=74%, attribute-based AUC=50%
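The ablation design, training and evaluating on one feature group at a time, can be expressed as a small harness that accepts any model. This is a sketch, not the study's code: the feature names and groups are hypothetical, and `centroid_model` is a deliberately simple stand-in (a linear score weighted by the difference in class means), not one of the three models on the slides.

```python
from typing import Callable, Dict, List, Sequence

Row = Dict[str, float]
Model = Callable[[List[Row], List[int]], Callable[[Row], float]]


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def centroid_model(rows: List[Row], labels: List[int]) -> Callable[[Row], float]:
    # Stand-in model: weight each feature by (mean on positives - mean on negatives).
    pos = [r for r, y in zip(rows, labels) if y == 1]
    neg = [r for r, y in zip(rows, labels) if y == 0]
    w = {c: sum(r[c] for r in pos) / len(pos) - sum(r[c] for r in neg) / len(neg)
         for c in rows[0]}
    return lambda row: sum(w[c] * row[c] for c in w)


def project(row: Row, cols: List[str]) -> Row:
    return {c: row[c] for c in cols}


def ablation(model: Model, train: List[Row], y_train: List[int], test: List[Row],
             y_test: List[int], groups: Dict[str, List[str]]) -> Dict[str, float]:
    """Test-set AUC of a model trained on each feature group separately."""
    results = {}
    for group, cols in groups.items():
        scorer = model([project(r, cols) for r in train], y_train)
        results[group] = auc([scorer(project(r, cols)) for r in test], y_test)
    return results


train = [{"wall_share": 0.9, "attr_matches": 1}, {"wall_share": 0.7, "attr_matches": 2},
         {"wall_share": 0.1, "attr_matches": 1}, {"wall_share": 0.2, "attr_matches": 2}]
test = [{"wall_share": 0.8, "attr_matches": 2}, {"wall_share": 0.3, "attr_matches": 1},
        {"wall_share": 0.6, "attr_matches": 1}, {"wall_share": 0.05, "attr_matches": 2}]
groups = {"network-transactional": ["wall_share"], "attribute-based": ["attr_matches"]}
print(ablation(centroid_model, train, [1, 1, 0, 0], test, [1, 0, 1, 0], groups))
# {'network-transactional': 1.0, 'attribute-based': 0.5}
```

In this toy data the attribute feature has identical class means, so its learned weight is zero and every test pair ties, giving the same 50% AUC pattern the slide reports for attribute-based features.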
Experiment 3: Link type comparison • Ablation study using data from each link type separately (all features) • Wall • Picture • Groups • Friendship • Wall information results in the best performance: wall AUC=82%, friendship AUC=77%, group AUC=63%, picture AUC=62% • Why doesn’t picture information improve performance? Sparsity: 28% of user pairs have 1 wall link, but only 4% of user pairs have 1 picture link
Experiment 4: Overall classification results • Uses all 50 features and compares the performance of three models • Bagged decision trees achieve the best performance: bagged decision trees AUC=87%, logistic regression AUC=82%, naïve Bayes AUC=81% • Network-transactional features alone account for 97% of the performance observed using all features
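The bagging idea behind the best model is to fit each tree on a bootstrap resample of the training pairs and average the trees' predicted probabilities. The sketch below substitutes depth-one trees (decision stumps split on Gini impurity) for full decision trees to stay short; a full tree learner would recurse on each leaf. It illustrates the technique under that simplification and is not the implementation used in the study.

```python
import random
from typing import List, Sequence, Tuple

# (feature index, threshold, P(positive | x <= t), P(positive | x > t))
Stump = Tuple[int, float, float, float]


def gini(part: Sequence[int]) -> float:
    p = sum(part) / len(part)
    return 2 * p * (1 - p)


def fit_stump(X: List[List[float]], y: List[int]) -> Stump:
    """Pick the single split that minimizes size-weighted Gini impurity."""
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, (j, t, sum(left) / len(left), sum(right) / len(right)))
    if best is None:  # every feature is constant in this bootstrap sample
        base = sum(y) / len(y)
        return (0, float("inf"), base, base)
    return best[1]


def fit_bagged_stumps(X: List[List[float]], y: List[int],
                      n_bags: int = 25, seed: int = 0) -> List[Stump]:
    rng = random.Random(seed)
    n = len(X)
    stumps = []
    for _ in range(n_bags):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap: sample with replacement
        stumps.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return stumps


def predict_proba(stumps: List[Stump], x: List[float]) -> float:
    """Average the per-stump probability of the positive class."""
    return sum(pl if x[j] <= t else pr for j, t, pl, pr in stumps) / len(stumps)


X = [[0.1], [0.2], [0.3], [0.7], [0.8], [0.9]]
y = [0, 0, 0, 1, 1, 1]
stumps = fit_bagged_stumps(X, y)
print(predict_proba(stumps, [0.95]), predict_proba(stumps, [0.05]))
```

Averaging over bootstrap replicates smooths the individual trees' hard splits into graded probabilities, which is what an AUC evaluation rewards.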
Conclusion • Formulated a link strength prediction task to automatically identify stronger relationships among existing friendships • Compared the utility of attribute-based, topological, transactional, and network-transactional features • Showed that, in addition to good overall accuracy, network-transactional features had the largest impact on model performance • Results indicate that transactional events are useful for predicting link strength • However, it is also necessary to consider transactional events in the context of user behavior within the larger social network
Future work • Exploit the temporal aspect of transactions to improve predictions • Address the more general link strength prediction task by formulating a latent variable model
Thank you! • Indika Kahanda: • ikahanda@purdue.edu • http://web.ics.purdue.edu/~ikahanda/ • Jennifer Neville: • neville@cs.purdue.edu • http://www.cs.purdue.edu/~neville/ Questions?