Evolution Dynamics in Social Networks. Ashwin Bahulkar. Advisor & Collaborators: Boleslaw K. Szymanski, Kevin Chan (1), Omar Lizardo (2). (1) US Army Research Laboratory; (2) University of Notre Dame, Notre Dame, IN, USA. Supported by Network Science CTA, ARL.
Overview • Link Formation and Dissolution in attribute-rich networks • Can we predict the state of a network from node attributes? • Which node attributes can predict formation and dissolution of edges in networks? • Coevolution of node-aligned multiple layers in networks • Multiple layers: several networks sharing the same node set, with different relations among nodes. • Coevolution: Do edges occur in one network before they do so in another? • Groups and Influence
Motivation • Find out which factors affect evolution of networks • Sociological interests: influence policy making in organizations, based on factors • Bring stability to networks in organizations through policies, if desired • Infer cause of instability in networks • Build strong, stable teams in organizations • Commercial interests: influence advertisement, marketing and reach-out strategies
Part 1 Link Formation and Dissolution in Attribute-rich Networks
Introduction • How much does knowledge of node attributes improve link formation and dissolution prediction? • How should these attributes be used to make predictions? • Find which attributes are correlated with formation of new links • We introduce the preference model • Find which attributes are correlated with dissolution and persistence of existing links. • Track network stability with link prediction
Link Prediction • Link Formation Prediction: • Given: a social network that evolves over time, with the evolution recorded as a sequence of network snapshots. • From one snapshot to the next, some new edges are created, some old edges dissolve, and some nodes are removed. • At any given snapshot, which edges will be created in the next snapshot? • Highly unbalanced classification: very few potential links are actually created. • [Figure: training set and test set, each with visible links and hidden new links] • Link Dissolution Prediction: similar; predict which existing links will dissolve.
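The task setup above can be sketched as labeled examples built from two consecutive snapshots. This is only an illustrative sketch, not the authors' pipeline: the function names and the toy snapshot are hypothetical, edges are treated as undirected, and node removal between snapshots is ignored.

```python
from itertools import combinations

def formation_examples(nodes, edges_now, edges_next):
    """Label every currently unlinked pair: 1 if it is linked in the next snapshot, else 0."""
    now = {frozenset(e) for e in edges_now}
    nxt = {frozenset(e) for e in edges_next}
    examples = {}
    for u, v in combinations(sorted(nodes), 2):
        pair = frozenset((u, v))
        if pair not in now:
            examples[(u, v)] = 1 if pair in nxt else 0
    return examples

def dissolution_examples(edges_now, edges_next):
    """Label every current edge: 1 if it is gone in the next snapshot, else 0."""
    nxt = {frozenset(e) for e in edges_next}
    return {tuple(sorted(e)): 0 if frozenset(e) in nxt else 1 for e in edges_now}

# Toy snapshots (hypothetical data).
nodes = ["a", "b", "c", "d"]
edges_now = [("a", "b"), ("b", "c")]
edges_next = [("a", "b"), ("a", "c")]
formation = formation_examples(nodes, edges_now, edges_next)
dissolution = dissolution_examples(edges_now, edges_next)
```

Even in this toy case only one of the four candidate pairs becomes a link, which illustrates why the classification problem is unbalanced.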
Related Work • Existing link prediction approaches: • Topology-based link predictors • Machine-learning-based • Markov-model or graphical-model-based • Little work on attribute-rich networks; attributes are used in a very simplistic manner • Little work on dissolution prediction • Attribute-rich data has recently become available to us, although the networks are relatively small
Attribute-rich data: NetSense • Nodes: about 200 students at the University of Notre Dame, followed for around 2 years, from freshman to junior year. • Data collected: • Call and message logs between students in the study. • Contact data based on Bluetooth-recorded proximity. • Nominations of significant peers, opinions on social & political issues, student background and university activities for every student. • Frequency: • Nominations and opinions were collected in the form of surveys at the beginning of every semester.
Evolving NetSense Networks • Network snapshots are taken for every semester of the year: Fall and Spring. • Behavioral Networks: Based on calls and texts made in the semester. An edge exists if there is a call or text exchange between two nodes. Typical network size ranges from 150 to 200 nodes and 200 to 350 edges. We have snapshots for 4 semesters. • Nominative Network: Based on survey answers by students to “Who are your top contacts?”
Node Attributes • Student background: • Major in the Notre Dame programs • Behavioral traits • Family income, race and religion • Opinions on: • Politics • Abortion and marijuana legalization • Homosexuality and gay marriage • Habits and Lifestyle: • Drinking habits • Time spent on weekly activities: studying, partying, etc.
Attributes for link prediction • We use machine learning for link prediction • The Homophily Model: • “Birds of a feather flock together” • Nodes n1, n2; attribute values a1 = a2; feature value = 1 • Nodes n1, n2; attribute values a1 ≠ a2; feature value = 0 • Does this work? Not so much. • Why? Nodes have different “preferences” for different attribute values • We introduce the “preference model”.
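The homophily feature described above is a per-attribute equality indicator. A minimal sketch, with hypothetical attribute names and values:

```python
def homophily_features(attrs_u, attrs_v):
    """Return 1 for each shared attribute where both nodes hold the same value, else 0."""
    return {a: int(attrs_u[a] == attrs_v[a]) for a in attrs_u if a in attrs_v}

# Hypothetical attribute records for two students.
n1 = {"politics": "liberal", "major": "physics"}
n2 = {"politics": "liberal", "major": "history"}
features = homophily_features(n1, n2)
```

Every pair with matching values gets the same feature value, regardless of how strongly either node actually favors that value, which is the limitation the preference model targets.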
A case for the preference model • Different groups of people have different attributes • Still, it is difficult to generalize preferences on a group basis • Different nodes have different preferences for attributes • Values > 1 indicate a preference for an attribute value; values < 1 indicate a preference against it.
Intuition of the Preference Model • Population: 60% liberals, 40% conservatives • Node 1: liberal; 90% contacts liberals, 10% conservatives • Strong bias towards liberals, strong bias against conservatives • Node 2: conservative; 50% liberals, 50% conservatives • Only slight bias towards conservatives • We capture the bias, or “anomaly” for every attribute value, for each node, with reference to the population.
Individual Preferences of Nodes • Features for machine learning: • Node Preferences -> Edge Preferences • Some network features: number of common neighbors • Node preference feature: • For an edge with nodes n1 and n2, for attribute a: • Feature-value(a) = n1->preference(n2.a) * n2->preference(n1.a) • Calculate preference of node n1 for attribute-value v: • n1 has n contacts with attribute-value v. • Calculate the Z-score of having n contacts • Z-score = (value - expected mean) / standard deviation • Obtain scores, which can vary from -3.4 to +3.4 • Convert to the range 0 to 1
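The preference computation above can be sketched as follows. The slides do not state how the expected mean and standard deviation are obtained; this sketch assumes a binomial null model in which each contact has the attribute value with probability equal to its population share. The clipping to ±3.4 and the linear rescaling to [0, 1] are likewise one plausible reading, and the contact counts (10 contacts per node) are hypothetical.

```python
import math

Z_MAX = 3.4  # score range reported on the slide

def preference(n_with_value, n_contacts, population_share):
    """Z-score of having n_with_value contacts with a given value, rescaled to [0, 1].

    Assumes a binomial null model: mean = k * p, std = sqrt(k * p * (1 - p)).
    """
    mean = n_contacts * population_share
    std = math.sqrt(n_contacts * population_share * (1 - population_share))
    if std == 0:
        return 0.5  # no variation possible: treat as neutral
    z = (n_with_value - mean) / std
    z = max(-Z_MAX, min(Z_MAX, z))
    return (z + Z_MAX) / (2 * Z_MAX)

def edge_feature(pref_n1_for_n2_value, pref_n2_for_n1_value):
    """Feature-value(a) = n1->preference(n2.a) * n2->preference(n1.a)."""
    return pref_n1_for_n2_value * pref_n2_for_n1_value

# Population: 60% liberals, 40% conservatives; each node has 10 contacts (assumed).
node1_for_liberals = preference(9, 10, 0.6)       # liberal node, 90% liberal contacts
node1_for_conservatives = preference(1, 10, 0.4)
node2_for_conservatives = preference(5, 10, 0.4)  # conservative node, 50/50 contacts
```

Under these assumptions a score above 0.5 indicates a bias towards the attribute value and a score below 0.5 a bias against it, so node 1 scores well above 0.5 for liberals while node 2 scores only slightly above 0.5 for conservatives.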
Results with the Preference method • Link Prediction: We get about 90% recall with good accuracy, using SVM, Linear and Logistic regression. • Link Dissolution Prediction: 80-90% accuracy • Below are the plots of recall vs. false positives for different thresholds in linear regression.
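A recall versus false-positive curve of the kind mentioned above is obtained by sweeping a decision threshold over the model's scores. The sketch below shows the computation for one threshold; the scores and labels are made-up illustration data, not NetSense results.

```python
def recall_and_fpr(scores, labels, threshold):
    """Recall and false-positive rate when predicting a link for score >= threshold."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    positives = sum(labels)
    negatives = len(labels) - positives
    recall = tp / positives if positives else 0.0
    fpr = fp / negatives if negatives else 0.0
    return recall, fpr

# Made-up regression outputs and true labels (1 = link formed).
scores = [0.9, 0.8, 0.4, 0.3, 0.1]
labels = [1, 0, 1, 0, 0]
curve = [recall_and_fpr(scores, labels, t) for t in (0.5, 0.35, 0.0)]
```

Lowering the threshold raises recall at the cost of more false positives, which is the trade-off such a plot displays.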
Results and Ranking of Attributes • Ranking of attributes: leave-feature-out, weight in linear regression • Link Creation, Nomination: Political Views, Parental Income, Common Neighbors, Time Volunteering, Time Exercising • Link Creation, Behavior: Gay Marriage Legalization, Political Views, Parental Income, Views on Homosexuality, Time Camping • Link Dissolution, Nomination: Views on Homosexuality, Political Views, Time Socializing, Time Partying, Marijuana Legalization • Link Dissolution, Behavior: Time Socializing, Time in Clubs, Marijuana Legalization, Time Exercising, Time Studying
Track Network Stability by Link Prediction • Networks evolve over time • Patterns of new link formation also change over time • We look at the network of researchers studying Leishmaniasis, a rare disease • The network spans several countries, including Brazil, India, the US, and European Union countries • From 1980 to 2015, the leaders of research changed, and the nature of link formation also changed • We use link prediction to track the change
Experiment • Perform link prediction over the period 1980 to 2015, divided into seven 5-year snapshots • Perform link prediction using older snapshots, see if the models still apply • Perform link prediction only on newly emerging nodes, and compare with older nodes • Features: • Network topology features, common areas of research, country of origin, recency and strength of collaboration • Network size: • Ranges from 700 to 5000 nodes, and 1200 to 34,000 edges.
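Dividing the collaboration record into snapshots can be sketched as below. The slides do not say where the window boundaries fall; this sketch assumes 5-year windows starting in 1980, with 2015 folded into the seventh window. The record format (author, author, year) and the names are hypothetical.

```python
START_YEAR, WINDOW_YEARS, NUM_SNAPSHOTS = 1980, 5, 7

def snapshot_index(year):
    """Map a publication year to a snapshot index 0..6 (assumed window boundaries)."""
    if not START_YEAR <= year <= START_YEAR + WINDOW_YEARS * NUM_SNAPSHOTS:
        raise ValueError(f"year {year} is outside the study period")
    return min((year - START_YEAR) // WINDOW_YEARS, NUM_SNAPSHOTS - 1)

def build_snapshots(collaborations):
    """Group (author_a, author_b, year) records into per-snapshot undirected edge sets."""
    snapshots = [set() for _ in range(NUM_SNAPSHOTS)]
    for a, b, year in collaborations:
        snapshots[snapshot_index(year)].add(frozenset((a, b)))
    return snapshots

# Hypothetical collaboration records.
records = [("x", "y", 1981), ("y", "z", 1984), ("x", "z", 1999), ("x", "y", 2015)]
snaps = build_snapshots(records)
```

A model trained on one snapshot can then be evaluated on a later one to see whether it still applies.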
Results • Using the most recent snapshot, we get recall values between 60% and 80% • Using older snapshots, recall and accuracy both drop, by about 8-10% • Edges between old nodes vs. edges with new nodes: • Until 2000, recall is comparable for edges between old nodes and edges with new nodes. • After 2000, recall of edges with new nodes is very poor; it increases a little by 2015 • Possible large-scale disruption in the network in 2000 • Leadership in research passes from the USA and Europe to India and Brazil, and focus shifts from fundamental research to more diagnostic and trials-based work
Part 2 Coevolution of a Multilayer Node-aligned Network whose Layers Represent Different Social Relations
Coevolution of Multiple Layers in Social Networks • Continuously evolving cognitive and behavioral layers. • Are behavioral edges formed before nominative edges are formed? • How likely is a behavioral edge to dissolve after the corresponding edge disappears in the nominative network? • Figure: nominative network (red edges) and behavioral network (green edges).
Questions • Are behavioral edges formed before nomination edges are formed? • How likely is a behavioral edge to dissolve after the corresponding edge disappears in the nomination network? • Are there any patterns of communication decay following link dissolution in the nomination network? • Do symmetric nominations differ from asymmetric nominations?
Dataset: NetSense • NetSense communication and nomination data is used. • Bluetooth interaction data is also used. • Behavioral Layers: one layer based on communication edges and one based on Bluetooth proximity measures. The Bluetooth proximity layer is much denser. We have snapshots for 4 semesters. • Nominative Layer: Based on survey answers by students to “Who are your top contacts?”
Behavior Before Nomination • We can predict future edges with a good accuracy and recall, based on number of calls and texts. • Higher communication corresponds to edge formation in the next semester. Prediction accuracy and recall: 70-80%
Behavior After Nomination • Does behavior change after nomination? Yes, it does. • Contacts have much higher communication than non-contacts who communicate. • Newly formed edges communicate less than older edges. Prediction accuracy and recall: 70-80%
Temporal Features of Nomination • Collocations of edges in nominative layer: • Significant collocations on weekends and weekday evenings • Collocations of edges not in the nominative layer • Most collocations on weekdays, largely during the working times of day
Slow Progression of Nomination • Collocation -> Communication -> Nomination • Coevolution of behavioral networks with each other • Higher collocation often leads to creation of an edge in the communication layer • Among the new communication edges, pairs with more talking and more collocations subsequently show nomination. • Pairs that drop contact don’t show nomination.
Behavior Decay After Nomination Dissolution • Typically, 50-55% of behavioral edges that are not connected in the nominative network stop communicating in the next semester. • However, of the edges whose nomination dissolves, 70-75% stop communicating in the next semester. • Implication: Dissolution of nomination is faster than formation of nomination.
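Percentages like those above come from comparing a set of edges in one semester against the communication layer of the next. A minimal sketch of that comparison, with hypothetical edge lists and function name:

```python
def stop_rate(edges, communication_next):
    """Fraction of the given edges that are absent from next semester's communication layer."""
    current = {frozenset(e) for e in edges}
    following = {frozenset(e) for e in communication_next}
    if not current:
        return 0.0
    return len(current - following) / len(current)

# Hypothetical data: four edges whose nomination dissolved, and the next
# semester's communication edges.
dissolved_nominations = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")]
communication_next = [("c", "a"), ("e", "f")]
rate = stop_rate(dissolved_nominations, communication_next)
```

Computing the same rate for behavioral edges that never had a nomination gives the baseline to compare against.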
Discussion and Conclusion • Good predictions using the “preference model” • Important attributes: income and political views, known in sociological literature, never tested on real-life datasets • Future: apply to bigger, more diverse datasets • We observe that the process and speed of formation and dissolution of links are different • Sociological theories with respect to coevolution of behavior and nomination have been verified • Limitations: small size of data, college campus, several other factors affect links
Overview • We study the formation of stable groups in a face-to-face interaction network based on Bluetooth proximity records. • We discuss a method of identifying small, stable groups which have face-to-face meetings. • We discuss how node attributes play a role in the formation of groups. • We look at the different purposes of groups, which can be identified from the temporal behavior of group meetings.
Which Groups do We Aim to Discover? • We discover groups with the following properties: • Stable over time, a semester for example • Repeated, face-to-face interactions • How is this different from community detection: • Based on non-aggregated F2F interactions • Bluetooth interactions network is very dense with several overlapping communities • Discover groups directly from interactions.
How are Groups Discovered? • Method: • Groups are discovered for every semester • Take multiple snapshots of the network over time, each covering 10 minutes • Identify connected components in each snapshot. Each component is a potential group. • However, • Members might be missing from certain meetings • Visiting members might be present at certain meetings • The group might evolve a little over time • Merge these components across time using certain rules.
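The first two steps above (10-minute snapshots, then connected components) can be sketched as follows. The record format (timestamp in seconds, node, node) and the function names are assumptions for illustration; the merging step is not included here.

```python
from collections import defaultdict

WINDOW_SECONDS = 600  # 10-minute snapshots

def connected_components(edges):
    """Return the connected components of an undirected edge list as frozensets."""
    adjacency = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    seen, components = set(), []
    for start in adjacency:
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], set()
        while stack:  # iterative depth-first search
            node = stack.pop()
            component.add(node)
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        components.append(frozenset(component))
    return components

def candidate_groups(records):
    """Count in how many 10-minute windows each connected component appears."""
    windows = defaultdict(set)
    for timestamp, u, v in records:
        windows[int(timestamp // WINDOW_SECONDS)].add((u, v))
    meetings = defaultdict(int)
    for edges in windows.values():
        for component in connected_components(edges):
            meetings[component] += 1
    return dict(meetings)

# Hypothetical proximity records: (seconds since start, node, node).
records = [(0, "A", "B"), (30, "B", "C"), (100, "D", "E"),
           (700, "A", "B"), (720, "B", "C")]
groups = candidate_groups(records)
```

Each key is a potential group and each value is its number of meetings, which is the input the merging step expects.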
Creation of Groups • Merging connected components into groups • Input: Connected components, with number of meetings • Output: Groups • Parameters: intersection threshold, membership threshold, member-intersection threshold. • Method: • Merge groups iteratively, using hierarchical clustering • Merge a pair of groups if: • Intersection of members > intersection threshold • Potential members: members with attendance % > membership threshold • Intersection of potential members > member-intersection threshold • Merge by adding intersecting and potential members of both groups. • Stop merging when no new merges can be made
Merging of components • intersection threshold = 0.6, membership threshold = 0.3, member-intersection threshold = 0.5 • The membership threshold decides which members are regular and which are not. • [Figure: three examples of merging Component 1 and Component 2 over members A to F] • Example 1: intersection = 4/6 = 0.67, # potential members = 4, intersection of potential members = 3/4 = 0.75: merge, new group A, B, C, D, E. • Example 2: intersection = 4/6 = 0.67, # potential members = 3, intersection of potential members = 1/3 = 0.33: no merge. • Example 3: intersection = 2/6 = 0.33: no merge.
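The pairwise merge test can be sketched as below, under one reading of the rules that reproduces the numbers in the examples: the intersection is the shared members divided by the union of both components, potential (regular) members are those whose attendance exceeds the membership threshold, and the merged group is the shared members plus the regular members. The attendance values and the function name are hypothetical, and the iterative hierarchical clustering around this test is omitted.

```python
def try_merge(component1, component2, attendance,
              intersection_threshold=0.6,
              membership_threshold=0.3,
              member_intersection_threshold=0.5):
    """Return the merged group, or None if the pair does not satisfy the merge rules."""
    component1, component2 = set(component1), set(component2)
    shared = component1 & component2
    union = component1 | component2
    if not union or len(shared) / len(union) <= intersection_threshold:
        return None
    # Regular ("potential") members: attendance above the membership threshold.
    regular = {m for m in union if attendance.get(m, 0.0) > membership_threshold}
    if not regular:
        return None
    if len(regular & shared) / len(regular) <= member_intersection_threshold:
        return None
    return shared | regular

# Hypothetical attendance fractions chosen to match the three cases.
merged = try_merge("ABCDE", "BCDEF",
                   {"A": 0.5, "B": 0.9, "C": 0.9, "D": 0.9, "E": 0.2, "F": 0.1})
few_shared_regulars = try_merge("ABCDE", "BCDEF",
                                {"A": 0.5, "B": 0.9, "F": 0.5, "C": 0.1, "D": 0.1, "E": 0.1})
low_overlap = try_merge("ABCD", "CDEF", {m: 0.9 for m in "ABCDEF"})
```

In the first call four of six members are shared and three of the four regular members are shared, so the pair merges into A, B, C, D, E; the other two calls fail the member-intersection and intersection tests respectively.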
Groups Discovered • With certain values on the thresholds: • intersection threshold = 0.6, membership threshold = 0.3, member-intersection threshold = 0.5 • Number of nodes: 200 • Number of groups: 256 • Average Group Size: 4.1 • Average member attendance: 91% • Higher levels of thresholds: smaller groups with higher attendance • Lower levels of thresholds: larger groups with lower attendance
Thresholds and Size of Groups • [Figure: merging Component 1 and Component 2 under three threshold levels] • Higher threshold levels, smaller groups are formed: new group B, C, D, E (all are either intersecting or regular members). • Normal threshold levels: new group A, B, C, D, E (all are either intersecting or regular members). • Lower threshold levels, larger group: new group A, B, C, D, E, F (all are either intersecting or regular members).
Group Relations and Path Length • In the communication layer of the network: • Group contacts are 2.5 hops away from each other • A random node pair is 3.9 hops away • In the contact nomination layer: • Group contacts are 3.6 hops away • Communication contacts are 1.2 hops away • Implication: Although group contacts are not as “close” as communication or nomination contacts, they are “closer” than a random pair of nodes.
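Hop distances like those above are average shortest-path lengths between selected node pairs in a given layer. A minimal breadth-first-search sketch follows; the toy layer and the choice to skip disconnected pairs are assumptions, since the slides do not say how unreachable pairs are handled.

```python
from collections import defaultdict, deque

def hop_distance(adjacency, source, target):
    """Shortest-path length in hops, or None if target is unreachable."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, distance = queue.popleft()
        for neighbor in adjacency.get(node, ()):
            if neighbor == target:
                return distance + 1
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, distance + 1))
    return None

def mean_hops(layer_edges, pairs):
    """Average hop distance over the reachable pairs (unreachable pairs are skipped)."""
    adjacency = defaultdict(set)
    for u, v in layer_edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    distances = []
    for u, v in pairs:
        d = hop_distance(adjacency, u, v)
        if d is not None:
            distances.append(d)
    return sum(distances) / len(distances) if distances else None

# Toy communication layer: a path A - B - C - D; "Z" is not in the layer.
layer = [("A", "B"), ("B", "C"), ("C", "D")]
average = mean_hops(layer, [("A", "C"), ("A", "D"), ("A", "Z")])
```

Passing group-contact pairs and random pairs to the same function on the same layer yields the two figures being compared.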
Group Contacts and Attributes • How many attributes do group contacts agree on? • Out of 14 attributes, group contacts agree on 7 attributes • Communication contacts agree on 8 attributes • Random contacts agree on only 5 attributes • Nodes have less bias for group contacts than for communication contacts on several attributes, such as parental income, drinking habits, and views on abortion and homosexuality. • However, biases exist to a certain extent. • Implication: groups are biased, but not as much as communication contacts.
Groups with Different Purposes • Social and work-related purposes • Groups are clustered based on % of time spent on weekends and weekday evenings • 2 well-separated clusters emerge. • Social groups have 40% of meetups on weekends and weekday evenings • Work groups have about 10% of meetups on weekends and weekday evenings • Social groups are more biased than work groups on several attributes.
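The quantity the groups are clustered on can be sketched as the share of a group's meetups that fall on weekends or weekday evenings. The slides do not define "evening"; the 18:00 cutoff below is an assumption, as are the function names and sample timestamps. The clustering itself is not shown.

```python
from datetime import datetime

EVENING_START_HOUR = 18  # assumed cutoff; not specified in the slides

def is_off_hours(moment):
    """True for Saturday/Sunday, or for a weekday at or after the evening cutoff."""
    return moment.weekday() >= 5 or moment.hour >= EVENING_START_HOUR

def off_hours_share(meeting_times):
    """Fraction of meetups held on weekends or weekday evenings."""
    if not meeting_times:
        return 0.0
    return sum(1 for t in meeting_times if is_off_hours(t)) / len(meeting_times)

# Hypothetical meetups of one group.
meetings = [
    datetime(2024, 1, 6, 10),  # Saturday morning
    datetime(2024, 1, 3, 20),  # Wednesday evening
    datetime(2024, 1, 3, 10),  # Wednesday morning
    datetime(2024, 1, 4, 14),  # Thursday afternoon
]
share = off_hours_share(meetings)
```

A one-dimensional clustering of this share across all groups is what would separate social groups from work groups.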
Clustering of Groups on Attributes • Groups are clustered based on attributes • Vector for group: % of members with a particular attribute value. All values form the vector. • Neat clusters of groups are seen. • With k=2: • Cluster one: Groups with a majority of members who are mostly conservative, rich, and against homosexuality and abortion. • Cluster two: Groups with a majority of members who are equally distributed on political and social views, with a liberal bias. • With k=3: • Cluster two splits into two more clusters: one has strongly liberal members, while the other contains less liberal members.
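The group vector described above can be sketched as one entry per (attribute, value) pair holding the share of members with that value. The attribute names, the fixed ordering of pairs, and the use of fractions rather than percentages are illustrative assumptions.

```python
def group_vector(members, attributes, layout):
    """Share of group members holding each (attribute, value) pair in layout order."""
    size = len(members)
    if size == 0:
        return [0.0 for _ in layout]
    return [
        sum(1 for m in members if attributes[m].get(attribute) == value) / size
        for attribute, value in layout
    ]

# Hypothetical attribute records and vector layout.
attributes = {
    "A": {"politics": "liberal"},
    "B": {"politics": "liberal"},
    "C": {"politics": "conservative"},
    "D": {"politics": "liberal"},
}
layout = [("politics", "liberal"), ("politics", "conservative")]
vector = group_vector(["A", "B", "C", "D"], attributes, layout)
```

Vectors built this way for every group can be passed to any standard clustering routine, for example k-means with k=2 or k=3.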
Conclusion • We demonstrate a methodology for identifying groups of stable and frequently interacting nodes. • Group relationships are stronger than random edges, but more fragile than communication relationships. • Groups are biased on several attributes. • Groups have different purposes: social and work-related. • Groups can be well clustered based on attributes, into conservative-leaning and liberal-leaning groups.
References • Analysis of Link Formation, Persistence and Dissolution in NetSense Data, Advances in Social Networks Analysis and Mining (ASONAM), IEEE/ACM International Conference, 2016 • Influence of Personal Preferences on Link Dynamics in Social Networks, Complexity, 2017 • Network Analysis to Support Public Health: Evolution of Collaboration among Leishmaniasis Researchers, Scientometrics, 2017 • Co-evolution of two networks representing different social relations in NetSense, International Workshop on Complex Networks and their Applications, Springer, 2016 • Coevolution of a multilayer node-aligned network whose layers represent different social relations, Computational Social Networks, Volume 4, 2017 • Impact of Attributes on Group Formation, Proc. IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM), Barcelona, Spain, August 28, 2018, pp. 1250-1257.
Thank you! Questions?