DISTINGUISHING TOPICAL AND SOCIAL GROUPS BASED ON COMMON IDENTITY AND BOND THEORY Przemyslaw A. Grabowicz, Luca M. Aiello, Víctor M. Eguíluz, Alejandro Jaimes
We built a classifier that distinguishes whether a given set of people is a social or a topical group.
What are social and topical groups?
Social groups Friends www.news.com.au
Characteristics of social groups • Direct reciprocity of interactions • Small talk (broad range of topics in conversations)
Topical groups A camera club http://www.flickr.com/photos/59571907@N03/5545401056/
Characteristics of topical groups • General reciprocity of interactions • Conversations on a narrow range of topics
Two types of metrics • Based on reciprocity of interactions • Based on diversity of topics (Shannon’s entropy)
Reciprocity metrics 1. Intra-group reciprocity 2. Ratio of intra-reciprocity to inter-reciprocity (intra-reciprocity / inter-reciprocity)
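The slide formulas did not survive extraction, so the following is only a minimal sketch of how reciprocity inside a group and across its boundary could be measured. It assumes interactions are directed (sender, receiver) pairs and defines reciprocity as the fraction of directed links whose reverse link also exists; that definition, the function names, and the handling of empty link sets are assumptions for illustration, not the authors' exact metrics.

```python
def reciprocity(edges, keep):
    """Fraction of kept directed links (u, v) whose reverse (v, u) is also kept.

    `keep` is a predicate selecting which links count (assumed helper).
    Returns 0.0 when no links are kept (an arbitrary convention).
    """
    kept = {(u, v) for u, v in edges if u != v and keep(u, v)}
    if not kept:
        return 0.0
    mutual = sum(1 for (u, v) in kept if (v, u) in kept)
    return mutual / len(kept)


def intra_inter_reciprocity(edges, members):
    """Return (intra, inter) reciprocity for one group.

    intra: links with both endpoints inside the group.
    inter: links with exactly one endpoint inside the group.
    """
    members = set(members)
    edges = list(edges)
    intra = reciprocity(edges, lambda u, v: u in members and v in members)
    inter = reciprocity(edges, lambda u, v: (u in members) != (v in members))
    return intra, inter


# Toy example: a, b, c form the group; x, y, z are outsiders.
toy_edges = [
    ("a", "b"), ("b", "a"), ("a", "c"),              # inside the group
    ("a", "x"), ("x", "a"), ("b", "y"), ("c", "z"),  # across the boundary
    ("y", "z"),                                      # outside, ignored
]
intra, inter = intra_inter_reciprocity(toy_edges, {"a", "b", "c"})
```

Under these assumptions, a group whose members mostly answer each other directly would show a high intra value relative to its inter value.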
Topic diversity metrics 1. H(g): Shannon’s entropy of terms/tags, normalized by the average entropy of all groups having the same number of terms 2.
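As a rough illustration of the normalized entropy H(g), the sketch below computes Shannon's entropy of a group's tags and divides it by the average entropy of all groups with the same number of terms. It assumes "number of terms" means the total count of tag occurrences in the group, uses base-2 logarithms, and returns 0.0 when the average is zero; these choices and the function names are assumptions, not details taken from the slides.

```python
import math
from collections import Counter, defaultdict


def shannon_entropy(tags):
    """Shannon entropy (in bits) of the tag frequency distribution."""
    counts = Counter(tags)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def normalized_entropies(groups):
    """Map group id -> entropy divided by the mean entropy of same-size groups.

    `groups` maps a group id to its list of tag occurrences.
    Group size is taken as len(tags), which is an assumption.
    """
    raw = {g: shannon_entropy(tags) for g, tags in groups.items()}
    by_size = defaultdict(list)
    for g, tags in groups.items():
        by_size[len(tags)].append(raw[g])
    result = {}
    for g, tags in groups.items():
        mean = sum(by_size[len(tags)]) / len(by_size[len(tags)])
        # Convention chosen here: a zero average yields 0.0 rather than an error.
        result[g] = raw[g] / mean if mean > 0 else 0.0
    return result


toy_groups = {
    "broad": ["x", "y", "z", "w"],   # four different tags, high diversity
    "narrow": ["x", "x", "x", "y"],  # one dominant tag, low diversity
    "single": ["p", "p"],            # only one tag, zero entropy
}
h = normalized_entropies(toy_groups)
```

In this toy data the "broad" group scores above 1 and the "narrow" group below 1, matching the expectation that small talk covers more topics than a topical conversation.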
Dataset – Flickr, 2008 Tags extracted from photos: • from a group pool • commented in a group • favorited in a group
Human labeling of groups Consists of exploring: • text of comments • group profiles • photos • tags • maps
Results [Figure panels: Reciprocity; Normalized entropy]
Classifier Features (numbered as on the slide): 1. hg for comments 2. tg for comments 3. ug for comments 4. hg for favorites 5. bg for comments Reported performance: • AUC 0.75, Accuracy 0.76 • AUC 0.88, Accuracy 0.80 • AUC 0.87, Accuracy 0.80
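For readers unfamiliar with the two reported measures, here is a small sketch of how AUC and accuracy can be computed for a binary social/topical classifier. It assumes labels of 1 for social and 0 for topical, a classifier that outputs one score per group, and a hypothetical decision threshold; the data are toy values, not results from the study.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a random positive scores higher than a
    random negative, counting ties as half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def accuracy(labels, scores, threshold):
    """Share of examples classified correctly when score >= threshold means 1."""
    predictions = [1 if s >= threshold else 0 for s in scores]
    correct = sum(1 for y, p in zip(labels, predictions) if y == p)
    return correct / len(labels)


# Toy data: 1 = social, 0 = topical (labels and scores are made up).
toy_labels = [1, 1, 0, 0]
toy_scores = [0.9, 0.4, 0.5, 0.1]
toy_auc = auc(toy_labels, toy_scores)
toy_acc = accuracy(toy_labels, toy_scores, threshold=0.45)
```

AUC is independent of the threshold, while accuracy depends on it, which is why the two numbers can differ for the same classifier.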
Conclusions Findings: • The metrics work as the theory predicts • Agreement and accuracy depend on the value of the score • Groups found with a community detection algorithm are more social than declared groups Future work: • Entropy is a simple measure and could be replaced with something that understands text (e.g., NLP) • A binary classifier has its limitations; multi-label classification is a possible extension