Different Approaches to Community Evolution Prediction in Blogosphere
Bogdan Gliwa, Piotr Bródka, Anna Zygmunt, Stanisław Saganowski, Przemysław Kazienko, Jarosław Kolak
Outline:
• Introduction and motivation
• Methods of events identification in group evolution:
  • SGCI
  • GED
• Predicting group evolution in the social network
• Dataset and experiment setup
• Classifiers – reminder
• For each method we will compare results between different classifiers
• Conclusion
General Idea
• Predicting the future direction of community evolution makes it possible to determine which characteristics of communities matter for their future behavior.
Motivation
• Supports decisions about investing in contact with members of a given community and about carrying out actions to achieve a key position in it.
• Makes it possible to determine effective ways of forming opinions.
• Makes it possible to protect group participants against such activities.
INTRODUCTION – prediction
• Link prediction (the best-investigated problem): predicting the existence of a link (relation) between two nodes (users) within a social network.
  • Liben-Nowell focused on paths and common neighbours between a pair of nodes.
  • Lichtenwalter considered degrees and mutual information between them.
INTRODUCTION – prediction
• Link sign prediction: the sign means that the predicted relation between users may be positive or negative.
  • Symeonidis looked at paths between the node pair and used the notion of similarity to predict the sign.
  • Leskovec used degree and mutual information between a pair of nodes for link prediction, and drew on the theories of balance and status to predict the link sign.
• Richter and Wai-Ho addressed the important task of churn prediction (churn: the number of individuals moving out of a collective over a specific period of time).
• Richter presented a new approach that predicts churn based on an analysis of group behavior. This approach touches another aspect, not yet well studied, in which the evolution of the whole group is predicted, i.e. which event will be next in the group's lifetime.
Prediction of the group evolution
• What is a group? A set of vertices that communicate with each other more frequently than with vertices outside the group.
• A new method for future event prediction has been developed, based on the stable group changes identification (SGCI) algorithm.
• Prediction in this method is based on:
  • previous events in the group lifetime extracted by SGCI
  • the group profile, described by group size, cohesion, leadership and density
METHODS OF EVENTS IDENTIFICATION IN GROUP EVOLUTION
SGCI algorithm: Stable Group Changes Identification
• Step 1. Identification of fugitive groups in the separate time frames.
  • The whole network is divided into time frames.
  • In each time frame, a method of finding communities in the network is applied.
• Step 2. Identification of group continuation: assigning transitions between groups in neighboring time frames. After extracting communities in the time frames:
  • The communities from neighboring time frames are matched, and the algorithm assigns transitions between them (from a group in time frame t to a group in time frame t+1).
SGCI algorithm
• For each pair of non-empty groups A, B from neighboring time frames we calculate:
  • MJ(A,B): the Modified Jaccard measure
  • ds(A,B): the difference in size
• If MJ(A,B) is above a defined threshold and ds(A,B) is no more than a specified limit, the algorithm creates a transition between these groups.
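The matching rule can be sketched in a few lines of Python. The formulas for MJ and ds did not survive on the slide, so the definitions below are assumptions: MJ is taken as the larger of the two overlap fractions, and ds as the larger of the two size ratios. The function names are hypothetical; the default thresholds (0.5 and 50) are the values quoted in the experiment setup.

```python
def modified_jaccard(a, b):
    """Assumed form of MJ: the larger share of common members, seen from either group."""
    if not a or not b:
        return 0.0
    common = len(a & b)
    return max(common / len(a), common / len(b))


def size_difference(a, b):
    """Assumed form of ds: ratio of the larger group size to the smaller one."""
    return max(len(a) / len(b), len(b) / len(a))


def find_transitions(groups_t, groups_t1, mj_threshold=0.5, ds_limit=50):
    """Return (name_in_t, name_in_t_plus_1) pairs that satisfy both conditions."""
    transitions = []
    for name_a, a in groups_t.items():
        for name_b, b in groups_t1.items():
            if not a or not b:
                continue  # the rule is defined for non-empty groups only
            if (modified_jaccard(a, b) > mj_threshold
                    and size_difference(a, b) <= ds_limit):
                transitions.append((name_a, name_b))
    return transitions


groups_t = {"A1": {1, 2, 3, 4}, "A2": {7, 8, 9}}
groups_t1 = {"B1": {3, 4, 5}, "B2": {7, 8, 9, 10}}
transitions = find_transitions(groups_t, groups_t1)
```

Here A1 and B1 share 2 members (2 of 3 members of B1), and A2 is fully contained in B2, so both pairs pass the MJ threshold.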
SGCI algorithm
• Step 3. Separation of the stable groups (those lasting for at least a required number of subsequent time frames). In this step, the stable groups are retrieved.
• Step 4. Identification of types of group changes: events describing the change of the group's state are assigned to each transition between stable groups from neighboring time frames.
• We can define several types of group changes (A and B are the groups from the first and the second time frame of the transition, respectively); sh and dh are thresholds.
SGCI algorithm
• addition: a small group attaches to a large one.
• deletion: a small group detaches from a large one.
• merge: many groups in one time frame form a new, larger group in the next time frame.
• split: a group divides into several smaller groups in the next time frame.
• split_merge: a group divides into at least 2 groups in the next time frame, and one of these groups is the result of merging with another group from the previous time frame.
• constancy: a simple transition without a significant change of the group size.
• change size: a simple transition with a change of the group size.
• decay: the group does not exist in the next time frame.
SGCI algorithm
• For a given group, more than one event can be assigned to its transitions to groups in the next time frame. Some events can coexist with others, but some cannot.
• The constancy event cannot coexist with a change size, merge or split event.
• The constancy event can coexist with addition or deletion events.
• The addition and deletion events can coexist with any event type except decay. The decay event is always the only event for the group.
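The coexistence rules can be written as a small validity check. This is only a sketch of the rules listed on this slide: the function name is hypothetical, and combinations the slide does not mention (for example split together with merge) are assumed to be allowed.

```python
EVENTS = {
    "constancy", "change size", "split", "merge",
    "addition", "deletion", "split_merge", "decay",
}

# Events that the slide says cannot accompany constancy.
CONFLICTS_WITH_CONSTANCY = {"change size", "merge", "split"}


def can_coexist(events):
    """Check a set of events assigned to one group against the stated rules."""
    events = set(events)
    unknown = events - EVENTS
    if unknown:
        raise ValueError(f"unknown events: {sorted(unknown)}")
    # Decay is always the single event for the group.
    if "decay" in events and len(events) > 1:
        return False
    # Constancy excludes change size, merge and split.
    if "constancy" in events and events & CONFLICTS_WITH_CONSTANCY:
        return False
    # Anything not ruled out above is assumed to be allowed.
    return True
```

For example, `can_coexist({"constancy", "addition"})` is allowed, while `can_coexist({"decay", "deletion"})` is not.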
GED: Group Evolution Discovery
• The GED method relies on an inclusion measure, which evaluates the inclusion of one group in another.
• The inclusion of group G1 in group G2 combines two factors: group quantity and group quality.
• NI_G1(x) is the importance of node x in group G1.
• The GED method takes into account both the quantity and the quality of the group members. Quality can be expressed by any user importance measure, e.g. degree centrality, betweenness centrality, PageRank or social position.
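A minimal sketch of the inclusion measure, assuming the form in which it is usually stated for GED: the quantity factor is the fraction of G1 members also present in G2, and the quality factor is the share of G1's total node importance carried by those common members. The formula image itself did not survive here, so treat this as an assumption; the function name and the importance values are illustrative.

```python
def inclusion(g1, g2, importance_in_g1):
    """Inclusion of group g1 in group g2.

    importance_in_g1 maps each member x of g1 to NI_G1(x), its importance in g1.
    """
    if not g1:
        return 0.0
    common = g1 & g2
    quantity = len(common) / len(g1)
    total_importance = sum(importance_in_g1[x] for x in g1)
    if total_importance == 0:
        return 0.0
    quality = sum(importance_in_g1[x] for x in common) / total_importance
    return quantity * quality


g1 = {"a", "b", "c", "d"}
g2 = {"a", "b", "e"}
ni = {"a": 0.4, "b": 0.3, "c": 0.2, "d": 0.1}  # made-up importance values
value = inclusion(g1, g2, ni)  # quantity 2/4, quality 0.7/1.0
```

Because the two most important members of G1 are in G2, the quality factor (0.7) is higher than the quantity factor (0.5).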
PREDICTING GROUP EVOLUTION IN THE SOCIAL NETWORK
Predicting Group Evolution Using SGCI Results
• This approach to predicting future group events employs a classifier.
• Structure: sequences of 3 group states (the present time frame and the two previous ones).
Predicting Group Evolution Using SGCI Results
• Measures for the state of each group:
  • leadership: a measure describing centralization in a graph or group (the largest value is for a star network), where d_max is the maximum degree in the group and n is the number of nodes in the group.
  • density: a measure expressing how many connections between nodes are present in the network in relation to all possible connections between them [16], where a(i,j) = 1 when there is a connection from node i to node j.
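The two formulas were images on the slide, so the sketch below assumes standard forms that fit the description: leadership as a degree-centralization score, the sum of (d_max - d_i) divided by (n-2)(n-1), which equals 1 for a star; and density as the number of directed links divided by n(n-1). Function names and the edge-list input format are hypothetical.

```python
def leadership(nodes, undirected_edges):
    """Degree centralization (assumed form); 1.0 for a star, 0.0 for a regular graph."""
    nodes = list(nodes)
    n = len(nodes)
    if n < 3:
        return 0.0  # the normalizing factor (n-2)(n-1) is zero for n < 3
    neighbours = {v: set() for v in nodes}
    for u, v in undirected_edges:
        if u != v and u in neighbours and v in neighbours:
            neighbours[u].add(v)
            neighbours[v].add(u)
    degrees = [len(neighbours[v]) for v in nodes]
    d_max = max(degrees)
    return sum(d_max - d for d in degrees) / ((n - 2) * (n - 1))


def density(nodes, directed_edges):
    """Share of existing directed links a(i,j)=1 among the n(n-1) possible ones."""
    nodes = set(nodes)
    n = len(nodes)
    if n < 2:
        return 0.0
    links = {(u, v) for u, v in directed_edges
             if u != v and u in nodes and v in nodes}
    return len(links) / (n * (n - 1))


star_nodes = [0, 1, 2, 3, 4]
star_edges = [(0, 1), (0, 2), (0, 3), (0, 4)]
star_leadership = leadership(star_nodes, star_edges)
star_density = density(star_nodes, star_edges)
```

For the 5-node star, the centre has degree 4 and each leaf degree 1, so the numerator is 4 x 3 = 12 and the denominator is 3 x 4 = 12.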
Predicting Group Evolution Using SGCI Results
• Measures for the state of each group – cont.:
  • cohesion: a measure characterizing the strength of connections inside the group in relation to connections going outside the group (from group members), where w is a function assigning weights to connections between nodes, G is the group, n is the number of nodes in the group and N is the number of nodes in the network.
  • group size: the number of nodes in the group.
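A rough sketch of cohesion as a ratio of average weights. The slide's formula is not recoverable, so the normalization used here is an assumption and the authors' exact normalization may differ: the average weight over the n(n-1) ordered pairs inside the group is divided by the average weight over the n(N-n) ordered pairs from a member to a non-member. The function name and input format are hypothetical.

```python
def cohesion(group, all_nodes, weights):
    """Average internal weight divided by average outgoing weight (assumed form).

    weights maps ordered pairs (i, j) to w(i, j); missing pairs count as 0.
    """
    group = set(group)
    n = len(group)
    total = len(set(all_nodes))
    if n < 2 or total <= n:
        raise ValueError("need at least 2 members and at least 1 outside node")
    inside = sum(w for (i, j), w in weights.items()
                 if i in group and j in group and i != j)
    outside = sum(w for (i, j), w in weights.items()
                  if i in group and j not in group)
    inside_avg = inside / (n * (n - 1))
    outside_avg = outside / (n * (total - n))
    if outside_avg == 0:
        return float("inf")  # no connections leave the group
    return inside_avg / outside_avg


network = {"a", "b", "c", "d"}
w = {("a", "b"): 2, ("b", "a"): 2, ("a", "c"): 1}
c = cohesion({"a", "b"}, network, w)  # (4 / 2) / (1 / 4)
```

Values above 1 indicate that members interact more strongly with each other than with the rest of the network.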
Predicting Group Evolution Using SGCI Results
• The described sequence of group states is the input for the classifier. The predicted variable is the dominating next event for the last group in the sequence.
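One possible way to turn a sequence of three group states into a classifier input is to flatten the four measures of each state into a single feature vector. The class and function names below are hypothetical, and the ordering of features is an arbitrary choice for illustration.

```python
from dataclasses import dataclass


@dataclass
class GroupState:
    size: int
    density: float
    cohesion: float
    leadership: float


def to_feature_vector(states):
    """Flatten 3 consecutive group states (oldest first) into 12 features."""
    if len(states) != 3:
        raise ValueError("expected exactly 3 group states")
    features = []
    for state in states:
        features.extend(
            [state.size, state.density, state.cohesion, state.leadership]
        )
    return features


sequence = [
    GroupState(size=8, density=0.40, cohesion=3.1, leadership=0.25),
    GroupState(size=9, density=0.35, cohesion=2.8, leadership=0.30),
    GroupState(size=12, density=0.28, cohesion=2.5, leadership=0.33),
]
row = to_feature_vector(sequence)  # paired with the dominating next event as label
```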
Predicting Group Evolution Using SGCI Results
• Dominating event: one of the events assigned to a given group. The event with the highest priority among the assigned events is chosen.
• We use the following order of events (from the highest priority to the lowest): constancy, change size, split, merge, addition, deletion, split_merge, decay.
Predicting Group Evolution Using SGCI Results
• Example: the group Gn,1 has two assigned events, change size and addition, so the dominating event for group Gn,1 is change size, because this event has the higher priority.
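Selecting the dominating event amounts to a lookup in the priority order given above. A minimal sketch (the function name is hypothetical):

```python
# Priority order from the slides, highest priority first.
PRIORITY = [
    "constancy", "change size", "split", "merge",
    "addition", "deletion", "split_merge", "decay",
]


def dominating_event(events):
    """Return the assigned event with the highest priority."""
    events = set(events)
    unknown = events - set(PRIORITY)
    if unknown:
        raise ValueError(f"unknown events: {sorted(unknown)}")
    for event in PRIORITY:
        if event in events:
            return event
    raise ValueError("no events assigned")


example = dominating_event({"change size", "addition"})
```

This reproduces the Gn,1 example: with change size and addition assigned, change size is selected.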
Predicting Group Evolution Using GED Results
• The idea is to use a simple sequence as input for the classifier: the preceding group profiles and events. The learnt model will be able to produce very good results even for simple classifiers.
• The sequences of group sizes and events between time frames can be extracted from the GED results.
• For each event, four group profiles from the four previous time frames, together with the three associated events, are identified as input for the classification model, separately for each group.
• A single group in a given time frame (Tn) is a case (instance) for classification, for which its event TnTn+1 is predicted.
Predicting Group Evolution Using GED Results
• The sequence presented in Figure 2 is used as input for classification.
• The first part of the sequence is used as input features (variables): the group profiles per time frame and the event types between them.
• The goal of classification is to predict (classify) the type of event TnTn+1 out of six possible classes: growing, continuing, shrinking, dissolving, merging and splitting. Forming was excluded since it can only start a sequence.
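The construction of classification instances from one group's history can be sketched as a sliding window: four consecutive profiles and the three events between them form the features, and the event that follows is the target. The function name, the feature keys and the use of group size as the only profile value are assumptions made for illustration.

```python
def build_instances(profiles, events, window=4):
    """profiles[i] is the group profile in frame T_i;
    events[i] is the event between T_i and T_(i+1)."""
    if len(events) != len(profiles) - 1:
        raise ValueError("expected one event between each pair of profiles")
    instances = []
    for n in range(window - 1, len(events)):
        start = n - window + 1
        features = {}
        for k in range(window):
            features[f"profile_{k}"] = profiles[start + k]
        for k in range(window - 1):
            features[f"event_{k}"] = events[start + k]
        target = events[n]  # the event Tn -> Tn+1 to be predicted
        instances.append((features, target))
    return instances


sizes = [5, 6, 8, 8, 7, 3]
history = ["growing", "growing", "continuing", "shrinking", "shrinking"]
instances = build_instances(sizes, history)
```

With six profiles and five events, the window yields two instances; the first uses sizes 5, 6, 8, 8 and predicts the fourth event.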
DATASET AND EXPERIMENT SETUP
• Dataset description:
  • Data from www.salon24.pl, which contains many blogs (mainly political)
  • For tests we use half of the data set: 04/04/2010 – 31/03/2012
  • 26,722 users
  • 285,532 posts
  • 4,173,457 comments
  • Each time frame lasts 7 days
  • Time frames overlap each other by 4 days
  • Yields a total of 182 time frames
DATASET AND EXPERIMENT SETUP
• Group extraction: after separation of the time frames, the groups were extracted in each time frame. This was done using the CPM method (CPMd version) from the CFinder tool (http://www.cfinder.org/) for k=5.
• CFinder is a tool for finding and visualizing overlapping dense groups of nodes in networks, based on the Clique Percolation Method (CPM).
DATASET AND EXPERIMENT SETUP
• Group sizes: as Figure 3 shows, there are many small groups, and groups of size 5 outnumber the others.
DATASET AND EXPERIMENT SETUP
• Experiment setup:
  • SGCI experiments were conducted with the following parameters: MJ=0.5, ds=50, sh=10 and dh=0.05.
  • The GED method was run on the dataset with all combinations of GED parameters from the set:
    • Quantity: {50%, 60%, 70%, 80%, 90%, 100%}
    • Quality (node importance): the social position measure was used (a measure similar to PageRank)
  • Reminder: the GED inclusion measure combines group quantity and group quality.
DATASET AND EXPERIMENT SETUP
• Experiment setup:
  • To describe the group profile, its size, density, cohesion and leadership were used.
  • Seven different classifiers were used with default settings.
  • All classifiers were used for both approaches: SGCI and GED.
DATASET AND EXPERIMENT SETUP
• Classifiers – reminder:
  • What is a classifier? An adaptive system that learns to perform the best action given its input: identifying to which of a set of categories (sub-populations) a new observation belongs.
  • What is multiclass classification? Each training point belongs to one of N different classes. The goal is to construct a function which, given a new data point, will correctly predict the class to which the new point belongs.
DATASET AND EXPERIMENT SETUP
• Multi-class classification, direct approaches:
  • Nearest Neighbor
  • Generative approach & Naïve Bayes
  • Linear classification
• Multi-label classification:
  • Nested / Hierarchical: Is it edible? Is it sweet? Is it a fruit? Is it a banana?
  • Exclusive / Multi-class: Is it a banana? Is it an apple? Is it an orange? Is it a pineapple?
  • General / Structured: Is it a banana? Is it yellow? Is it sweet? Is it round?
DATASET AND EXPERIMENT SETUP
• Multi-class classification, real-world examples: object recognition, automated protein classification, digit recognition, phoneme recognition.
• The number of classes in these tasks ranges from 10 to 300-600 (values shown: 10, 50, 100, 300-600).
DATASET AND EXPERIMENT SETUP
• A Simple Idea: One-vs-All Classification
• Pick a good technique for building binary classifiers. Build N different binary classifiers. For the i-th classifier, let the positive examples be all the points in class i, and let the negative examples be all the points not in class i. Let $f_i$ be the i-th classifier. Classify with $f(x) = \arg\max_i f_i(x)$.
• In other words, a single classifier is trained per class to distinguish that class from all other classes.
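A small illustration of one-vs-all, using a plain perceptron as the binary technique. The perceptron is chosen here only because it fits in a few lines; it is not one of the classifiers used in the experiments, and the toy points and labels are made up.

```python
def train_perceptron(points, targets, max_epochs=1000):
    """Binary perceptron; targets are +1 or -1. Returns (weights, bias)."""
    weights = [0.0] * len(points[0])
    bias = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for x, t in zip(points, targets):
            score = sum(w * xi for w, xi in zip(weights, x)) + bias
            if t * score <= 0:  # misclassified: move the boundary towards x
                weights = [w + t * xi for w, xi in zip(weights, x)]
                bias += t
                mistakes += 1
        if mistakes == 0:
            break
    return weights, bias


def train_one_vs_all(points, labels):
    """Train one binary classifier per class: class i versus everything else."""
    classifiers = {}
    for cls in sorted(set(labels)):
        targets = [1 if y == cls else -1 for y in labels]
        classifiers[cls] = train_perceptron(points, targets)
    return classifiers


def predict(classifiers, x):
    """Pick the class whose classifier gives the highest score (arg max)."""
    def score(cls):
        weights, bias = classifiers[cls]
        return sum(w * xi for w, xi in zip(weights, x)) + bias
    return max(classifiers, key=score)


points = [(0, 0), (1, 0), (0, 1), (10, 0), (9, 1), (10, 1), (5, 10), (4, 9), (6, 9)]
labels = ["A", "A", "A", "B", "B", "B", "C", "C", "C"]
model = train_one_vs_all(points, labels)
predictions = [predict(model, p) for p in points]
```

The three clusters are placed so that each one is linearly separable from the other two, which is what the perceptron needs in order to converge.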
DATASET AND EXPERIMENT SETUP
Decision tree example (built up step by step on the slides):
• LEADERSHIP > 0.7: class A
• LEADERSHIP < 0.7: test DENSITY
  • DENSITY < 0.2: class B
  • DENSITY > 0.2: test COHESION
    • COHESION > 0.8: class B
    • COHESION < 0.8: test GROUP SIZE
      • GROUP SIZE < 10: class C
      • GROUP SIZE > 10: class A
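The example decision tree (leadership, then density, then cohesion, then group size) reads naturally as nested conditions. The slides do not say which branch a value exactly equal to a threshold takes; the sketch below assumes it goes to the "less than" branch. The function name is hypothetical.

```python
def classify_group(leadership, density, cohesion, group_size):
    """Walk the example tree from the root (leadership) down to a class label."""
    if leadership > 0.7:
        return "A"
    if density <= 0.2:  # assumption: equality falls on the "<" side
        return "B"
    if cohesion > 0.8:
        return "B"
    if group_size > 10:
        return "A"
    return "C"
```

For example, a group with leadership 0.5, density 0.5, cohesion 0.5 and 5 members ends in class C, while the same group with 20 members ends in class A.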
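EXPERIMENTS
• Predicting Group Evolution Using SGCI Results
• The selected measure is the F-measure (also known as the F1-measure), which summarizes the accuracy of the results by combining precision and recall. With TP, FP and FN denoting true positives, false positives and false negatives for a class:
  • $\text{precision} = \frac{TP}{TP + FP}$
  • $\text{recall} = \frac{TP}{TP + FN}$
  • $F = \frac{2 \cdot \text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}$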
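For a multi-class task such as event prediction, precision, recall and F can be computed per class by treating that class as positive and all others as negative. A minimal sketch with made-up labels (the function name is hypothetical):

```python
def per_class_f1(y_true, y_pred):
    """Return {class: (precision, recall, f1)} from true and predicted labels."""
    scores = {}
    for cls in sorted(set(y_true) | set(y_pred)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == cls and p == cls)
        fp = sum(1 for t, p in pairs if t != cls and p == cls)
        fn = sum(1 for t, p in pairs if t == cls and p != cls)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            f1 = 2 * precision * recall / (precision + recall)
        else:
            f1 = 0.0
        scores[cls] = (precision, recall, f1)
    return scores


y_true = ["merge", "merge", "split", "split", "split"]
y_pred = ["merge", "split", "split", "split", "merge"]
scores = per_class_f1(y_true, y_pred)
```

Here merge has one true positive, one false positive and one false negative, so its precision, recall and F are all 0.5.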
EXPERIMENTS
• Predicting Group Evolution Using SGCI Results
• Results of event prediction for different classifiers: tree classifiers (J48, Random Forest and Simple CART) and Decision Table (a rule classifier) achieved the best results. Notably worse results were obtained for Naive Bayes and IBk.
EXPERIMENTS
• Predicting Group Evolution Using SGCI Results – cont.
• Results of classification for the 3 tree classifiers: the results for these 3 classifiers are very similar. The biggest difference is for the decay event, which seemed harder to classify. The other events are well classified.
Figure: Results of event classification for decision tree classifiers
EXPERIMENTS
• Predicting Group Evolution Using SGCI Results – cont.
• Results of prediction obtained by probabilistic classifiers: BayesNet achieved quite good results, but NaiveBayes performed much worse.
• Explanation: NaiveBayes assumes that the features used for the classification task are independent. This requirement is not met, because some values of one measure are correlated with values of another measure, e.g. density generally has higher values for smaller groups.
Figure: Results of event classification for probabilistic classifiers
EXPERIMENTS
• Predicting Group Evolution Using SGCI Results – cont.
• Results for the other tested classifiers:
  • The decay event is classified significantly worse than other events (as seen before).
  • The IBk classifier achieved worse prediction results than DecisionTable.
  • For the IBk classifier, the hardest event to classify seemed to be constancy.
Figure: Results of event classification for other classifiers