This seminar covers advanced topics in the analysis of on-line social networks, including node centrality, machine learning, and practical applications in fields such as advertising and healthcare.
Social Networks Analysis Seminar: Introductory Lecture #2
Danny Hendler and Yehonatan Cohen
Advanced Topics in On-line Social Networks Analysis
Seminar schedule
• 5/3/14: Introductory lecture #1
• 10/3/14: Papers list published; students send their 3 preferences
• 12/3/14: Introductory lecture #2
• 14/3/14: All students' preferences must be received
• 19/3/14: No seminar (Purim!)
• 26/3/14: Student talks start (11 weeks of student talks, until the semester ends)
Talk outline
• Node centrality
  • Degree
  • Closeness
  • Betweenness
• Machine learning
Node centrality
• Name the most central/significant node:
[Figure: example graph #1 with nodes 1–13]
Node centrality
• Name the most central/significant node:
[Figure: example graph #2 with nodes 1–13]
Node centrality
• What makes a node central?
  • A high number of connections
  • Removing it disconnects the graph
  • A high number of paths pass through it
  • Proximity to all other nodes
  • Its neighbors are themselves central
  • …
Node centrality: Applications
• Detection of the most popular actor in a network: spamming / advertising
• Network vulnerability: health care / epidemics
• Clustering similar structural positions: recommendation systems
• …
Node centrality: Degree
• In this lecture we will define the connectivity degree of a node v as the number of v's neighbors.
• Alternative definitions are possible that take into account:
  • The strength of v's connections
  • The direction of v's connections
  • Etc.
Node centrality: Degree
• Name the most central/significant node:
[Figure: example graph with nodes 1–9]
Node centrality: Degree
[Figure: example graph with nodes 1–13]
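Degree centrality is simple enough to compute directly. Below is a minimal Python sketch (the edge list is made up for illustration, not one of the example graphs above) that counts each node's neighbors in an undirected graph and reports the most central node:

```python
# Degree centrality: score each node by its number of neighbors.
# The edge list is illustrative only.

from collections import defaultdict

edges = [(1, 2), (1, 3), (2, 3), (3, 4), (4, 5)]

adj = defaultdict(set)
for u, v in edges:          # build an undirected adjacency structure
    adj[u].add(v)
    adj[v].add(u)

degree = {node: len(neighbors) for node, neighbors in adj.items()}
print(degree)                         # {1: 2, 2: 2, 3: 3, 4: 2, 5: 1}
print(max(degree, key=degree.get))    # 3, the highest-degree node
```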
Node centrality: Closeness (Reach)
• Vertices that are connected to v are directly reachable from v.
• Vertices connected to v's neighbors are still reachable, although it is harder to reach them.
• d(v, u): the distance in hops from v to u.
• λ: the reach attenuation factor.
• Reach(v) = Σ_{u ≠ v} λ^{d(v, u)}, summed over the vertices u reachable from v.
• λ = 1 means no attenuation: all vertices in v's connected component are equally reachable from v.
Node centrality: Closeness (Reach)
[Figure: example graph with nodes 1–13, reach scores computed for a given reach attenuation factor]
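A minimal sketch of the reach computation under the definition above (the adjacency list and the attenuation value are illustrative only): a breadth-first search yields hop distances from v, and each reachable vertex contributes λ raised to its distance:

```python
# Reach centrality: Reach(v) = sum over reachable u != v of lam ** d(v, u).
# With lam = 1 every vertex in v's component contributes 1.
# The toy graph and lam value are illustrative only.

from collections import deque

adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3, 5}, 5: {4}}

def reach(v, lam=0.5):
    dist = {v: 0}
    queue = deque([v])
    while queue:                      # BFS computes hop distances from v
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return sum(lam ** d for u, d in dist.items() if u != v)

print({v: round(reach(v), 3) for v in adj})   # node 3 scores highest
```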
Node centrality: Betweenness
• Measures the extent to which a node lies between all other nodes in a network.
• Betweenness is used to estimate the control a node may have over the communication flows in a network.
• σ_st: the number of shortest paths between s and t.
• σ_st(v): the number of shortest paths between s and t that pass through v.
• B(v) = Σ_{s ≠ v ≠ t} σ_st(v) / σ_st
Node centrality: Betweenness
[Figure: example graph with nodes 1–13, annotated with betweenness scores]
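The definition translates almost literally into code. A minimal sketch (the toy edge list is illustrative only) that enumerates all shortest paths per pair and counts the fraction passing through v:

```python
# Betweenness, following the definition above:
# B(v) = sum over pairs s != v != t of sigma_st(v) / sigma_st.
# The toy edge list is illustrative only.

from itertools import combinations

import networkx as nx

G = nx.Graph([(1, 2), (1, 3), (2, 3), (3, 4), (4, 5)])

def betweenness(G, v):
    total = 0.0
    for s, t in combinations(G.nodes, 2):
        if v in (s, t):
            continue
        paths = list(nx.all_shortest_paths(G, s, t))     # sigma_st paths
        through = sum(1 for p in paths if v in p[1:-1])  # sigma_st(v)
        total += through / len(paths)
    return total

print({v: betweenness(G, v) for v in G.nodes})   # node 3 scores highest
```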
Talk outline
• Node centrality
• Machine learning
  • The learning process
  • Classification
  • Evaluation
Machine Learning
• Herbert Alexander Simon: "Learning is any process by which a system improves performance from experience."
• "Machine Learning is concerned with computer programs that automatically improve their performance through experience."
[Photo: Herbert Simon, Turing Award 1975, Nobel Prize in Economics 1978]
Machine Learning
• Learning = improving with experience at some task:
  • improve over task T,
  • with respect to performance measure P,
  • based on experience E.
[Photo: Herbert Simon, Turing Award 1975, Nobel Prize in Economics 1978]
Machine Learning
• Example: spam filtering
  • T: identify spam emails
  • P:
    • % of spam emails that were filtered
    • % of ham (non-spam) emails that were incorrectly filtered out
  • E: a database of emails that were labelled by users, i.e., feedback on emails: "Move to Spam", "Move to Inbox"
Machine Learning Applications?
Machine Learning: The learning process
[Diagram: the learning process consists of model learning followed by model testing]
Machine Learning: The learning process
[Diagram: an email server feeds messages into model learning and model testing; features extracted from each email include:]
• Content of the email
• Number of recipients
• Size of message
• Number of attachments
• Number of "re's" in the subject line
• …
Machine Learning: The learning process • From e-mails to feature vectors: • Textual-Based Content Features: • Email is tokenized • Each token is a feature • Meta-Features: • Number of recipients • Size of message
Machine Learning: The learning process
[Table: each row is an instance, each vocabulary token is a binary input attribute, and the last column is the target attribute]
Machine Learning: The learning process
[Table: in general, input attributes may be nominal, ordinal, or numeric; each instance also has a target attribute]
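To make the instance table concrete, here is a minimal sketch (the two toy emails and their fields are made up) that turns each email into binary vocabulary features plus numeric meta-features:

```python
# From e-mails to feature vectors: binary token features + meta-features.
# The e-mails below are made up for illustration.

emails = [
    {"text": "cheap meds buy now", "recipients": 200, "label": "spam"},
    {"text": "meeting notes attached", "recipients": 3, "label": "ham"},
]

# The vocabulary: every token seen in the corpus is one binary feature.
vocabulary = sorted({tok for e in emails for tok in e["text"].split()})

def to_vector(email):
    tokens = set(email["text"].split())
    content = [1 if tok in tokens else 0 for tok in vocabulary]  # binary features
    meta = [email["recipients"], len(email["text"])]             # meta-features
    return content + meta

for e in emails:
    print(to_vector(e), e["label"])   # feature vector + target attribute
```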
Machine Learning: Model learning
[Diagram: a learner induces a classifier from the training data]
Machine Learning: Model testing
[Diagram: a database provides the training set that is fed to the learner]
Machine Learning: Decision trees
• Training data: each record has Refund (categorical), Marital Status (categorical), Taxable Income (continuous), and a class label.
• The model, a decision tree, is built top-down by picking one splitting attribute at a time:
  • Refund = Yes → NO
  • Refund = No → split on MarSt:
    • Married → NO
    • Single, Divorced → split on TaxInc:
      • < 80K → NO
      • > 80K → YES
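As a sketch of how such a tree could be induced automatically, the snippet below fits scikit-learn's DecisionTreeClassifier on a toy stand-in for the training data (the six rows and the class-column name "Cheat" are illustrative, not the slide's actual table):

```python
# Decision-tree induction on a toy Refund / MarSt / TaxInc dataset.
# The rows and the class name "Cheat" are illustrative stand-ins.

import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

data = pd.DataFrame({
    "Refund": ["Yes", "No", "No", "Yes", "No", "No"],
    "MarSt":  ["Single", "Married", "Single", "Married", "Divorced", "Single"],
    "TaxInc": [125, 100, 70, 120, 95, 85],
    "Cheat":  ["No", "No", "No", "No", "Yes", "Yes"],
})

# One-hot encode the categorical attributes; TaxInc stays continuous.
X = pd.get_dummies(data[["Refund", "MarSt"]]).join(data["TaxInc"])
clf = DecisionTreeClassifier(random_state=0).fit(X, data["Cheat"])

print(export_text(clf, feature_names=list(X.columns)))  # textual tree dump
```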
Machine Learning: Classification
• Binary classification
• (Instances, class labels): (x1, y1), (x2, y2), ..., (xn, yn)
• yi ∈ {1, -1}
• Classifier: provides a class prediction Ŷ for an instance
• Outcomes for a prediction (predicted class vs. true class):
  • Ŷ = 1, Y = 1: true positive
  • Ŷ = 1, Y = -1: false positive
  • Ŷ = -1, Y = 1: false negative
  • Ŷ = -1, Y = -1: true negative
Machine Learning: Classification
• P(Ŷ = Y): accuracy
• P(Ŷ = 1 | Y = 1): true positive rate
• P(Ŷ = 1 | Y = -1): false positive rate
• P(Y = 1 | Ŷ = 1): precision
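These quantities are straightforward to compute from counts of the four outcomes. A minimal sketch with made-up label vectors:

```python
# Accuracy, TPR, FPR and precision from counts of the four outcomes.
# The label vectors are made up for illustration.

y_true = [1, 1, 1, -1, -1, -1, -1, 1]
y_pred = [1, 1, -1, -1, 1, -1, -1, 1]

pairs = list(zip(y_true, y_pred))
tp = sum(1 for t, p in pairs if t == 1 and p == 1)
fp = sum(1 for t, p in pairs if t == -1 and p == 1)
tn = sum(1 for t, p in pairs if t == -1 and p == -1)
fn = sum(1 for t, p in pairs if t == 1 and p == -1)

print((tp + tn) / len(pairs))   # accuracy:        P(Y_hat = Y)         -> 0.75
print(tp / (tp + fn))           # true positive:   P(Y_hat = 1 | Y = 1) -> 0.75
print(fp / (fp + tn))           # false positive:  P(Y_hat = 1 | Y = -1) -> 0.25
print(tp / (tp + fp))           # precision:       P(Y = 1 | Y_hat = 1) -> 0.75
```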
Machine Learning: Classification
• Consider a diagnostic test for a disease
• The test has 2 possible outcomes:
  • 'positive' = suggesting presence of the disease
  • 'negative' = suggesting its absence
• An individual can test either positive or negative for the disease
Machine Learning: Classification
[Figure: two overlapping distributions of test results, for individuals without the disease and individuals with the disease; a threshold splits patients into those called "negative" (below it) and those called "positive" (above it)]
• With the disease and called positive: true positives
• Without the disease and called positive: false positives
• Without the disease and called negative: true negatives
• With the disease and called negative: false negatives
Machine Learning: Cross-Validation
• What if we don't have enough data to set aside a test dataset?
• Cross-validation: each data point is used both as training data and as test data.
• Basic idea:
  • Fit the model on 90% of the data; test on the other 10%.
  • Now do this on a different 90/10 split.
  • Cycle through all 10 cases.
• 10 "folds" is a common rule of thumb.
Machine Learning: Cross-Validation • Divide data into 10 equal pieces P1…P10. • Fit 10 models, each on 90% of the data. • Each data point is treated as an out-of-sample data point by exactly one of the models.
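A minimal sketch of this procedure with scikit-learn (the dataset and classifier are placeholders for illustration); cross_val_score handles the 10 splits and fits one model per fold:

```python
# 10-fold cross-validation: each data point is held out by exactly one
# of the 10 fitted models. Dataset and classifier are illustrative.

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=10)
print(scores)         # one accuracy score per held-out fold
print(scores.mean())  # average out-of-sample accuracy
```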