social networks analysis seminar introductory lecture #2

Presentation Transcript


  1. Social networks analysis seminar, introductory lecture #2 • Danny Hendler and Yehonatan Cohen • Advanced Topics in on-line Social Networks Analysis

  2. Seminar schedule • Introductory lecture #1 • 5/3/14 • 10/3/14 • Papers list published, students send their 3 preferences • 12/3/14 • Introductory lecture #2 • All students' preferences must be received • 14/3/14 • No seminar (Purim!) • 19/3/14 • 26/3/14 • Student talks start • 11 weeks of student talks • Semester ends

  3. Talk outline • Nodes centrality • Degree • Closeness • Betweenness • Machine-learning

  4. Nodes centrality • Name the most central/significant node: [Figure: example graph with nodes numbered 1 to 13]

  5. Nodes centrality • Name the most central/significant node: [Figure: example graph with nodes numbered 1 to 13]

  6. Nodes centrality • What makes a node central? • Number of connections • It is central if it disconnects the graph • High number of paths passing through the node • Proximity to all other nodes • Central node is the one whose neighbors are central • …

  7. Nodes centrality: Applications • Detection of the most popular actor in a network → Spamming / Advertising • Network vulnerability → Health care / Epidemics • Clustering similar structural positions → Recommendation systems • …

  8. Nodes centrality: Degree • In this lecture we will define the connectivity degree of a node as the number of its neighbors. • Alternative definitions are possible, where you take into account • The strength of the node's connections • The direction of the node's connections • Etc.
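As a minimal sketch of the degree definition, the count of neighbors can be read directly off an adjacency list. The graph below is a hypothetical example and is not the graph drawn on the slides; the names `graph`, `degree` and `most_central` are illustrative only.

```python
# Hypothetical undirected graph as an adjacency list (not the slide's graph).
graph = {
    "a": {"b", "c", "d"},
    "b": {"a"},
    "c": {"a", "d"},
    "d": {"a", "c"},
}

def degree(graph, node):
    """Connectivity degree: the number of neighbors of `node`."""
    return len(graph[node])

# Degree of every node, and the node with the highest degree.
degrees = {node: degree(graph, node) for node in graph}
most_central = max(degrees, key=degrees.get)
```

A weighted or directed variant, as mentioned in the alternatives, would replace `len(...)` with a sum of edge strengths or with separate in/out counts.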

  9. Nodes centrality: Degree • Name the most central/significant node: [Figure: example graph with nodes numbered 1 to 9]

  10. Nodes centrality: Degree • [Figure: example graph with nodes numbered 1 to 13]

  11. Nodes centrality: Closeness (Reach) • Vertices that are connected to a node are directly reachable from it. • Vertices connected to the node's neighbors are still reachable, although it is harder to reach them. • The measure uses the distance in hops from the node to each other vertex, • and a reach attenuation factor. • With no attenuation, all vertices in the node's connected component are equally reachable from it.
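The exact reach formula did not survive in this transcript, so the following is only a sketch under an assumed form: each reachable vertex at `d` hops contributes `alpha ** d`, so that `alpha = 1` means no attenuation. The function names, the parameter `alpha`, and the path graph are hypothetical.

```python
from collections import deque

def hop_distances(graph, source):
    """Breadth-first search: distance in hops from `source` to each reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        current = queue.popleft()
        for neighbor in graph[current]:
            if neighbor not in dist:
                dist[neighbor] = dist[current] + 1
                queue.append(neighbor)
    return dist

def reach(graph, source, alpha):
    """Assumed form (not taken from the slides): sum of alpha ** hops over other reachable vertices."""
    dist = hop_distances(graph, source)
    return sum(alpha ** d for vertex, d in dist.items() if vertex != source)

# Hypothetical path graph a - b - c - d.
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
reach_end = reach(path, "a", 0.5)      # 0.5 + 0.25 + 0.125
reach_middle = reach(path, "b", 0.5)   # 0.5 + 0.5 + 0.25
```

Under this assumed form the middle vertex scores higher than the endpoint, and with `alpha = 1.0` every vertex in the component contributes equally.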

  12. Nodes centrality: Closeness (Reach) • Reach attenuation factor • [Figure: example graph with nodes numbered 1 to 13]

  13. Nodes centrality: Betweenness • Measures the extent to which a node lies between all others in a network. • Betweenness is used to estimate the control a node may have over the communication flows in a network. • For each pair of other nodes, it uses the number of shortest paths between the two, • and the number of those shortest paths that pass through the node.
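The formula itself is missing from this transcript. The sketch below assumes the standard definition, in which a node's betweenness is the sum, over all pairs of other nodes, of the fraction of shortest paths between the pair that pass through the node. It is a brute-force illustration for small undirected graphs; the function names and the 4-cycle example are hypothetical.

```python
from collections import deque
from itertools import combinations

def shortest_path_counts(graph, source):
    """BFS from `source`: hop distance and number of shortest paths to each vertex."""
    dist = {source: 0}
    count = {source: 1}
    queue = deque([source])
    while queue:
        current = queue.popleft()
        for neighbor in graph[current]:
            if neighbor not in dist:
                dist[neighbor] = dist[current] + 1
                count[neighbor] = 0
                queue.append(neighbor)
            # Every predecessor one hop closer extends its shortest paths to `neighbor`.
            if dist[neighbor] == dist[current] + 1:
                count[neighbor] += count[current]
    return dist, count

def betweenness(graph, node):
    """Sum over unordered pairs (s, t) of (shortest s-t paths through node) / (shortest s-t paths)."""
    info = {v: shortest_path_counts(graph, v) for v in graph}
    dist_n, count_n = info[node]
    total = 0.0
    others = [v for v in graph if v != node]
    for s, t in combinations(others, 2):
        dist_s, count_s = info[s]
        if t not in dist_s or node not in dist_s:
            continue  # s and t (or the node) lie in different components
        # The node is on a shortest s-t path only if the distances add up exactly.
        if dist_s[node] + dist_n[t] == dist_s[t]:
            total += count_s[node] * count_n[t] / count_s[t]
    return total

# Hypothetical 4-cycle a - b - c - d - a.
cycle = {"a": {"b", "d"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c", "a"}}
b_score = betweenness(cycle, "b")
```

In the 4-cycle, only the pair (a, c) has shortest paths through b, and b carries one of the two, so its score is 0.5. For larger graphs a dedicated algorithm such as Brandes' would be the usual choice.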

  14. Nodes centrality: Betweenness • [Figure: example graph with nodes numbered 1 to 13]

  15. Talk outline • Nodes centrality • Machine Learning • The learning process • Classification • Evaluation

  16. Machine Learning • Herbert Alexander Simon: “Learning is any process by which a system improves performance from experience.” • “Machine Learning is concerned with computer programs that automatically improve their performance through experience.” • Herbert Simon: Turing Award 1975, Nobel Prize in Economics 1978

  17. Machine Learning • Learning = Improving with experience at some task • Improve over task T, • with respect to performance measure P, • based on experience E.

  18. Machine Learning • Example: Spam Filtering • T: Identify spam emails • P: • % of spam emails that were filtered • % of ham (non-spam) emails that were incorrectly filtered out • E: a database of emails that were labelled by users, i.e. feedback on emails: • “Move to Spam”, “Move to Inbox”

  19. Machine Learning Applications?

  20. Machine Learning: The learning process • Model Testing • Model Learning

  21. Machine Learning: The learning process • Model Testing • Model Learning • Email Server • Content of the email • Number of recipients • Size of message • Number of attachments • Number of "re's" in the subject line • …

  22. Machine Learning: The learning process • From e-mails to feature vectors: • Textual-Based Content Features: • Email is tokenized • Each token is a feature • Meta-Features: • Number of recipients • Size of message
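The step from e-mails to feature vectors can be sketched as follows. This is an illustration only: the tokenizer regex, the dictionary layout of an email, the example addresses, and the choice of body length in characters as "size of message" are all assumptions, not details from the lecture.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into word tokens (assumed tokenization rule)."""
    return re.findall(r"[a-z0-9']+", text.lower())

def build_vocabulary(emails):
    """Every distinct token seen in any email becomes one feature."""
    return sorted({token for email in emails for token in tokenize(email["body"])})

def to_feature_vector(email, vocabulary):
    """Binary token features followed by two meta-features."""
    tokens = set(tokenize(email["body"]))
    token_features = [1 if word in tokens else 0 for word in vocabulary]
    meta_features = [
        len(email["recipients"]),  # number of recipients
        len(email["body"]),        # size of message, assumed here to be characters
    ]
    return token_features + meta_features

# Hypothetical emails.
emails = [
    {"body": "Win money now", "recipients": ["a@example.com", "b@example.com"]},
    {"body": "Meeting now", "recipients": ["c@example.com"]},
]
vocabulary = build_vocabulary(emails)
vectors = [to_feature_vector(email, vocabulary) for email in emails]
```

Each vector has one position per vocabulary token plus the meta-features, so all emails map to vectors of the same length regardless of how long they are.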

  23. Machine Learning: The learning process • [Table labels: Instances, Vocabulary (Binary), Target Attribute]

  24. Machine Learning: The learning process • [Table labels: Instances, Input Attributes (Nominal, Ordinal, Numeric), Target Attribute]

  25. Machine Learning: Model learning • [Diagram labels: Learner, Classifier]

  26. Machine Learning: Model testing • [Diagram labels: Database, Training Set, Learner]

  27. Machine Learning: Decision trees • Training Data • Attribute types: categorical, categorical, continuous, class

  28. Machine Learning: Decision trees • Training Data (categorical, categorical, continuous, class) • Model: Decision Tree • Splitting Attribute: Refund (branch: Yes)

  29. Machine Learning: Decision trees • Training Data (categorical, categorical, continuous, class) • Model: Decision Tree • Splitting Attribute: Refund (Yes → NO)

  30. Machine Learning: Decision trees • Training Data (categorical, categorical, continuous, class) • Model: Decision Tree • Splitting Attributes: Refund (Yes → NO; No → MarSt) • MarSt (branch: Married)

  31. Machine Learning: Decision trees • Training Data (categorical, categorical, continuous, class) • Model: Decision Tree • Splitting Attributes: Refund (Yes → NO; No → MarSt) • MarSt (branch: Married)

  32. Machine Learning: Decision trees • Training Data (categorical, categorical, continuous, class) • Model: Decision Tree • Splitting Attributes: Refund (Yes → NO; No → MarSt) • MarSt (Married → NO)

  33. Machine Learning: Decision trees • Training Data (categorical, categorical, continuous, class) • Model: Decision Tree • Splitting Attributes: Refund (Yes → NO; No → MarSt) • MarSt (Married → NO; branch: Single, Divorced)

  34. Machine Learning: Decision trees • Training Data (categorical, categorical, continuous, class) • Model: Decision Tree • Splitting Attributes: Refund (Yes → NO; No → MarSt) • MarSt (Married → NO; Single, Divorced → TaxInc) • TaxInc (branch: > 80K)

  35. Machine Learning: Decision trees • Training Data (categorical, categorical, continuous, class) • Model: Decision Tree • Splitting Attributes: Refund (Yes → NO; No → MarSt) • MarSt (Married → NO; Single, Divorced → TaxInc) • TaxInc (> 80K → YES)

  36. Machine Learning: Decision trees • Training Data (categorical, categorical, continuous, class) • Model: Decision Tree • Splitting Attributes: Refund (Yes → NO; No → MarSt) • MarSt (Married → NO; Single, Divorced → TaxInc) • TaxInc (> 80K → YES)

  37. Machine Learning: Decision trees • Training Data (categorical, categorical, continuous, class) • Model: Decision Tree • Splitting Attributes: Refund (Yes → NO; No → MarSt) • MarSt (Married → NO; Single, Divorced → TaxInc) • TaxInc (> 80K → YES; branch: < 80K)

  38. Machine Learning: Decision trees • Training Data (categorical, categorical, continuous, class) • Model: Decision Tree • Splitting Attributes: Refund (Yes → NO; No → MarSt) • MarSt (Married → NO; Single, Divorced → TaxInc) • TaxInc (< 80K → NO; > 80K → YES)
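Read as a classifier, the finished tree on slide 38 amounts to a few nested tests. The sketch below is one reading of that tree; the function name, the argument names, and the use of thousands as the income unit are hypothetical. The slides only show "> 80K" and "< 80K", so sending an income of exactly 80K to the NO leaf is an assumption.

```python
def classify(refund, marital_status, taxable_income_k):
    """Walk the decision tree from the root (Refund) down to a class label."""
    if refund == "Yes":
        return "NO"
    # Refund == "No": split on marital status.
    if marital_status == "Married":
        return "NO"
    # Single or Divorced: split on taxable income, given in thousands.
    # Assumption: exactly 80K falls on the NO side; the slides leave this open.
    if taxable_income_k > 80:
        return "YES"
    return "NO"

# Hypothetical instances, not rows from the slide's training data.
predictions = [
    classify("Yes", "Single", 125),
    classify("No", "Married", 100),
    classify("No", "Single", 70),
    classify("No", "Divorced", 95),
]
```

Each instance follows exactly one root-to-leaf path, which is why the attributes tested along the way are called splitting attributes.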

  39. Machine Learning: Classification • Binary classification • (Instances, Class labels): (x1, y1), (x2, y2), ..., (xn, yn) • yi is {1, -1}-valued • Classifier: provides class prediction Ŷ for an instance • Outcomes for a prediction: [Table: true class vs. predicted class]

  40. Machine Learning: Classification • P(Ŷ = Y): accuracy • P(Ŷ = 1 | Y = 1): true positive rate • P(Ŷ = 1 | Y = -1): false positive rate • P(Y = 1 | Ŷ = 1): precision • [Table: true class vs. predicted class]
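The four probabilities on slide 40 can be estimated from counts of the prediction outcomes. This is a minimal sketch with hypothetical labels in {1, -1}; it assumes both classes occur and at least one instance is predicted positive, otherwise a denominator would be zero.

```python
def classification_rates(y_true, y_pred):
    """Estimate accuracy, true/false positive rate and precision from label pairs."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)    # true positives
    fp = sum(1 for y, p in pairs if y == -1 and p == 1)   # false positives
    tn = sum(1 for y, p in pairs if y == -1 and p == -1)  # true negatives
    fn = sum(1 for y, p in pairs if y == 1 and p == -1)   # false negatives
    return {
        "accuracy": (tp + tn) / len(pairs),     # P(Ŷ = Y)
        "true_positive_rate": tp / (tp + fn),   # P(Ŷ = 1 | Y = 1)
        "false_positive_rate": fp / (fp + tn),  # P(Ŷ = 1 | Y = -1)
        "precision": tp / (tp + fp),            # P(Y = 1 | Ŷ = 1)
    }

# Hypothetical true labels and predictions.
y_true = [1, 1, 1, -1, -1, -1, -1, -1]
y_pred = [1, 1, -1, 1, -1, -1, -1, -1]
rates = classification_rates(y_true, y_pred)
```

Here 2 of 3 positives are found, 1 of 5 negatives is wrongly flagged, and 2 of the 3 positive predictions are correct.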

  41. Machine Learning: Classification • Consider a diagnostic test for a disease • The test has 2 possible outcomes: • ‘positive’ = suggesting presence of disease • ‘negative’ • An individual can test either positive or negative for the disease

  42. Machine Learning: Classification • [Figure: test results for individuals without the disease and individuals with the disease]

  43. Machine Learning: Classification • Call these patients “negative” • Call these patients “positive” • [Figure: test result]

  44. Machine Learning: Classification • True Positives • Call these patients “negative” • Call these patients “positive” • [Figure: test result, without the disease vs. with the disease]

  45. Machine Learning: Classification • False Positives • Call these patients “negative” • Call these patients “positive” • [Figure: test result, without the disease vs. with the disease]

  46. Machine Learning: Classification • True Negatives • Call these patients “negative” • Call these patients “positive” • [Figure: test result, without the disease vs. with the disease]

  47. Machine Learning: Classification • False Negatives • Call these patients “negative” • Call these patients “positive” • [Figure: test result, without the disease vs. with the disease]

  48. Machine Learning: Cross-Validation • What if we don’t have enough data to set aside a test dataset? • Cross-Validation: • Each data point is used both as training and as test data. • Basic idea: • Fit the model on 90% of the data; test on the other 10%. • Now do this on a different 90/10 split. • Cycle through all 10 cases. • 10 “folds” is a common rule of thumb.

  49. Machine Learning: Cross-Validation • Divide data into 10 equal pieces P1…P10. • Fit 10 models, each on 90% of the data. • Each data point is treated as an out-of-sample data point by exactly one of the models.
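The 10-fold procedure can be sketched as an index split. The function name `k_fold_splits` is hypothetical, and the sketch splits the data in its given order; in practice the data would usually be shuffled first, and the model fitting and scoring steps are left out.

```python
def k_fold_splits(n_points, k=10):
    """Yield (train_indices, test_indices) once per fold."""
    indices = list(range(n_points))
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_points // k + (1 if i < n_points % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size

# 100 data points: ten models, each fit on 90 points and tested on the other 10.
splits = list(k_fold_splits(100, k=10))
held_out = sorted(i for _, test in splits for i in test)
```

Because the test pieces are disjoint and cover all indices, each data point is out-of-sample for exactly one of the ten models.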
