
The Link Prediction Problem for Social Networks
David Liben-Nowell, MIT; Jon Kleinberg, Cornell
Saswat Mishra sxm111131


Presentation Transcript


  1. The Link Prediction Problem for Social Networks • David Liben-Nowell, MIT • Jon Kleinberg, Cornell • Saswat Mishra sxm111131

  2. Summary • The “Link Prediction Problem” • Given a snapshot of a social network, can we infer which new interactions among its members are likely to occur in the near future? • Based on “proximity” of nodes in a network

  3. Introduction • Natural examples of social networks: nodes = people/entities; edges = interactions/collaborations

  4. Motivation • Understanding how social networks evolve • The link prediction problem: given a snapshot of a social network at time t, we seek to accurately predict the edges that will be added to the network during the interval (t, t′)

  5. Why? • To suggest interactions or collaborations that have not yet been utilized within an organization • To monitor terrorist networks: to deduce possible interactions between terrorists (without direct evidence) • Used by Facebook and LinkedIn to suggest friends • Open question: how does Facebook do it? (friends of friends, same school, manually…)

  6. Motivation • Co-authorship network for scientists • Scientists who are “close” in the network will have common colleagues and circles, so they are likely to collaborate • Caveat: scientists who have never collaborated might do so in the future, which is hard to predict • Goal: make that intuitive notion precise and understand which measures of “proximity” lead to accurate predictions

  7. Goals • Present measures of proximity • Understand the relative effectiveness of network proximity measures (adapted from graph theory, computer science, and the social sciences) • Show empirically that prediction by proximity outperforms random prediction by a factor of 40 to 50 • Show that subtle measures outperform more direct measures

  8. Data and Experimental Setup • Co-authorship network (G) built from the author lists of the physics e-Print arXiv (www.arxiv.org) • Took 5 such networks from 5 sections of the arXiv • Training interval [1994, 1996], $\kappa_{training} = 3$ • Test interval [1997, 1999], $\kappa_{test} = 3$ • Core: the set of authors who have at least 3 papers during both the training and the test interval • G[1994, 1996] = Gcollab = (A, Eold) • Enew = new collaborations (edges) that appear during the test interval
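The setup above can be sketched in a few lines. This is a minimal illustration, not the authors' pipeline. Several choices are assumptions: papers arrive as hypothetical `(year, author_list)` tuples, the function name `split_network` is made up, and Enew is restricted to pairs of Core authors (the slide defines Core but does not spell this out).

```python
from collections import Counter
from itertools import combinations


def split_network(papers, train=(1994, 1996), test=(1997, 1999),
                  k_train=3, k_test=3):
    """Split time-stamped papers into (A, E_old, Core, E_new).

    `papers` is an iterable of (year, author_names) tuples; this input
    format is an assumption made for illustration.
    """
    def within(year, interval):
        return interval[0] <= year <= interval[1]

    authors, e_old, e_test = set(), set(), set()
    train_counts, test_counts = Counter(), Counter()
    for year, names in papers:
        names = set(names)
        # Every pair of co-authors on a paper becomes an undirected edge.
        pairs = {frozenset(p) for p in combinations(sorted(names), 2)}
        if within(year, train):
            authors |= names
            train_counts.update(names)
            e_old |= pairs
        elif within(year, test):
            test_counts.update(names)
            e_test |= pairs

    # Core: authors with >= k_train training papers and >= k_test test papers.
    core = {a for a in authors
            if train_counts[a] >= k_train and test_counts[a] >= k_test}
    # E_new: test-interval collaborations absent from E_old; restricting them
    # to pairs of Core authors is an assumption of this sketch.
    e_new = {e for e in e_test if e <= core and e not in e_old}
    return authors, e_old, core, e_new
```

Edges are stored as `frozenset` pairs, so (x, y) and (y, x) are the same undirected collaboration.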

  9. Data

  10. Methods for Link Prediction • Take the input graph for the training period, Gcollab • Pick a pair of nodes (x, y) • Assign a connection weight score(x, y) • Make a list of pairs in descending order of score • The score is a measure of proximity • Any ideas for measures?
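Every predictor that follows plugs into the same ranking loop described above. A minimal sketch of that loop is below. It assumes a graph stored as a dict mapping each node to its set of neighbors, and it skips pairs that are already linked. The names `rank_pairs` and `predict` are hypothetical.

```python
from itertools import combinations


def rank_pairs(graph, score):
    """Rank all non-adjacent node pairs by descending score.

    `graph` maps each node to the set of its neighbours (an assumed
    representation); `score(graph, x, y)` is any proximity measure.
    """
    ranked = []
    for x, y in combinations(sorted(graph), 2):
        if y in graph[x]:
            continue  # already collaborators, so not a "new" link
        ranked.append(((x, y), score(graph, x, y)))
    # sort() is stable, so tied pairs keep their (sorted) enumeration order.
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked


def predict(graph, score, n):
    """Return the n highest-scoring pairs as the predicted new links."""
    return [pair for pair, _ in rank_pairs(graph, score)[:n]]
```

For example, on the path a-b-c plus an isolated node d, a common-neighbors score ranks (a, c) first.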

  11. Proximity Measures for Link Prediction

  12. Graph distance & Common Neighbors • Graph distance: (negated) length of the shortest path between x and y • Common neighbors: $\text{score}(x, y) := |\Gamma(x) \cap \Gamma(y)|$, where $\Gamma(x)$ is the set of neighbors of x; e.g., A and C have 2 common neighbors, so they are more likely to collaborate
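The two baseline scores can be written directly from their definitions. The sketch below uses a breadth-first search for the negated shortest-path length and a set intersection for common neighbors. The graph `TOY` is a hypothetical stand-in for the slide's figure, chosen so that A and C share the neighbors B and D. Returning `-inf` for unreachable pairs is an assumption of the sketch.

```python
from collections import deque

# Hypothetical toy graph (node -> set of neighbours) in which A and C
# share the two neighbours B and D.
TOY = {
    "A": {"B", "D"},
    "B": {"A", "C"},
    "C": {"B", "D"},
    "D": {"A", "C", "E"},
    "E": {"D"},
}


def graph_distance_score(graph, x, y):
    """Negated shortest-path length from x to y (BFS); -inf if unreachable."""
    dist = {x: 0}
    queue = deque([x])
    while queue:
        u = queue.popleft()
        if u == y:
            return -dist[u]
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return float("-inf")


def common_neighbors_score(graph, x, y):
    """|Γ(x) ∩ Γ(y)|: number of neighbours shared by x and y."""
    return len(graph[x] & graph[y])
```

Both functions follow the `score(graph, x, y)` calling convention, so they can be passed to a generic ranking loop.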

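  13. Jaccard’s coefficient and Adamic/Adar • Jaccard’s coefficient: like common neighbors, but normalized by the size of the combined neighborhood: $\text{score}(x, y) := |\Gamma(x) \cap \Gamma(y)| \, / \, |\Gamma(x) \cup \Gamma(y)|$ • Adamic/Adar: weights rarer (lower-degree) common neighbors more heavily: $\text{score}(x, y) := \sum_{z \in \Gamma(x) \cap \Gamma(y)} 1 / \log |\Gamma(z)|$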
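Both refinements of common neighbors are short set computations. The sketch below assumes the same dict-of-sets graph representation. Defining the Jaccard score as 0 for two isolated nodes is an assumption made to avoid dividing by zero.

```python
import math


def jaccard_score(graph, x, y):
    """|Γ(x) ∩ Γ(y)| / |Γ(x) ∪ Γ(y)|, taken as 0 when both are isolated."""
    union = graph[x] | graph[y]
    if not union:
        return 0.0
    return len(graph[x] & graph[y]) / len(union)


def adamic_adar_score(graph, x, y):
    """Sum of 1 / log|Γ(z)| over the common neighbours z of x and y.

    For x != y, a common neighbour has degree >= 2, so log|Γ(z)| > 0.
    """
    return sum(1.0 / math.log(len(graph[z])) for z in graph[x] & graph[y])
```

Consider a toy graph where A and C share B (degree 2) and D (degree 3). Under Adamic/Adar, B contributes 1/log 2 and D contributes only 1/log 3, so the rarer neighbor counts more.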

  14. Preferential Attachment • The probability that a new collaboration involves x is proportional to $|\Gamma(x)|$, the current number of neighbors of x • $\text{score}(x, y) := |\Gamma(x)| \cdot |\Gamma(y)|$
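Preferential attachment only needs the two degrees. The one-line sketch below assumes the same dict-of-sets graph representation.

```python
def preferential_attachment_score(graph, x, y):
    """|Γ(x)| * |Γ(y)|: the product of the current degrees of x and y."""
    return len(graph[x]) * len(graph[y])
```

Unlike the neighborhood-overlap scores, this one is nonzero even for pairs with no shared neighbor, including pairs in different connected components.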

  15. Considering all paths: Katz • Katz: a measure that sums over the collection of paths, exponentially damped by length (to count short paths heavily): $\text{score}(x, y) := \sum_{\ell=1}^{\infty} \beta^{\ell} \cdot |\text{paths}_{x,y}^{\langle \ell \rangle}|$ • β is chosen to be a very small value (for damping)
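For all pairs at once, the Katz sum is usually computed with matrix powers of the adjacency matrix A. Here (A^ℓ)[x, y] counts length-ℓ walks, and the series has the closed form (I - βA)^-1 - I. The sketch below is one possible NumPy implementation under that reading. The function name `katz_scores` is hypothetical. The convergence check (β below 1 / largest eigenvalue) is a standard condition for the geometric series.

```python
import numpy as np


def katz_scores(adj, beta):
    """All-pairs Katz scores: sum over l >= 1 of beta**l * A**l.

    (A**l)[x, y] counts the length-l walks from x to y, and the series has
    the closed form (I - beta*A)^-1 - I whenever beta < 1 / lambda_max(A).
    """
    A = np.asarray(adj, dtype=float)
    # Largest eigenvalue magnitude of the (symmetric) adjacency matrix.
    lam_max = np.max(np.abs(np.linalg.eigvalsh(A)))
    if lam_max > 0 and beta >= 1.0 / lam_max:
        raise ValueError("beta too large: the Katz series does not converge")
    I = np.eye(A.shape[0])
    return np.linalg.inv(I - beta * A) - I
```

On the path a-b-c, the pair (a, c) has no edge yet still gets a positive score, driven mostly by the β² term for its single length-2 walk.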

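  16. Hitting time, PageRank • Hitting time $H_{x,y}$: expected number of steps for a random walk starting at x to reach y; $\text{score}(x, y) := -H_{x,y}$ • Commute time: $C_{x,y} := H_{x,y} + H_{y,x}$ • If y has a large stationary probability $\pi_y$, $H_{x,y}$ is small. To counterbalance, we can normalize: $\text{score}(x, y) := -H_{x,y} \cdot \pi_y$ • Rooted PageRank: to cut down on long random walks, the walk returns to x with probability α at every step; score(x, y) is the stationary probability of y under this walk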
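Both random-walk quantities reduce to small linear systems. The sketch below is one way to compute them with NumPy; it is not the authors' code. Assumptions: nodes are integer indices into a symmetric adjacency matrix, the graph is connected with no isolated nodes, and the helper names are hypothetical. Hitting times solve H[u] = 1 + Σ_v P[u, v]·H[v] with H[y] = 0. Rooted PageRank solves π = α·e_x + (1 - α)·Pᵀπ.

```python
import numpy as np


def transition_matrix(adj):
    """Row-stochastic matrix P of the simple random walk on the graph."""
    A = np.asarray(adj, dtype=float)
    deg = A.sum(axis=1)
    if np.any(deg == 0):
        raise ValueError("every node needs at least one neighbour")
    return A / deg[:, None]


def hitting_time(adj, x, y):
    """H[x, y]: expected steps for a walk from x to first reach y.

    Solves H[u] = 1 + sum_v P[u, v] * H[v] for u != y, with H[y] = 0;
    assumes the graph is connected so that the system is non-singular.
    """
    if x == y:
        return 0.0
    P = transition_matrix(adj)
    others = [u for u in range(P.shape[0]) if u != y]
    Q = P[np.ix_(others, others)]
    h = np.linalg.solve(np.eye(len(others)) - Q, np.ones(len(others)))
    return float(h[others.index(x)])


def rooted_pagerank(adj, x, alpha):
    """Stationary distribution of a walk that jumps back to x with prob. alpha."""
    P = transition_matrix(adj)
    n = P.shape[0]
    e_x = np.zeros(n)
    e_x[x] = 1.0
    # pi = alpha * e_x + (1 - alpha) * P^T pi  =>  (I - (1 - alpha) P^T) pi = alpha * e_x
    return np.linalg.solve(np.eye(n) - (1 - alpha) * P.T, alpha * e_x)
```

The slide's scores follow from these helpers. The hitting-time score is `-hitting_time(adj, x, y)`. The commute-time score is `-(hitting_time(adj, x, y) + hitting_time(adj, y, x))`. The rooted PageRank score is `rooted_pagerank(adj, x, alpha)[y]`.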

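  17. SimRank • Defined recursively: two nodes are similar to the extent that they are joined to similar neighbors: $\text{similarity}(x, x) := 1$ and $\text{similarity}(x, y) := \gamma \cdot \frac{\sum_{a \in \Gamma(x)} \sum_{b \in \Gamma(y)} \text{similarity}(a, b)}{|\Gamma(x)| \cdot |\Gamma(y)|}$ for some $\gamma \in [0, 1]$; $\text{score}(x, y) := \text{similarity}(x, y)$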
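The recursive definition can be approximated by fixed-point iteration: start from the identity (each node similar only to itself) and apply the recursion repeatedly. The sketch below is written for clarity rather than speed, so it is quadratic in the number of nodes per iteration. The defaults γ = 0.8 and 10 iterations are arbitrary assumptions, not values from the slides.

```python
from itertools import product


def simrank(graph, gamma=0.8, iterations=10):
    """Iterate the SimRank recursion a fixed number of times.

    Returns a dict mapping (x, y) to similarity(x, y). Nodes without
    neighbours get similarity 0 to every other node.
    """
    nodes = list(graph)
    sim = {(a, b): 1.0 if a == b else 0.0 for a, b in product(nodes, nodes)}
    for _ in range(iterations):
        new = {}
        for x, y in product(nodes, nodes):
            if x == y:
                new[(x, y)] = 1.0
            elif graph[x] and graph[y]:
                # Average similarity over all neighbour pairs, damped by gamma.
                total = sum(sim[(a, b)] for a in graph[x] for b in graph[y])
                new[(x, y)] = gamma * total / (len(graph[x]) * len(graph[y]))
            else:
                new[(x, y)] = 0.0
        sim = new
    return sim
```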

  18. Low-rank approximation • Treat the graph as an adjacency matrix M • Compute the rank-k approximation $M_k$ of M (noise reduction) • x and y each index a row; score(x, y) = inner product of the rows r(x) and r(y) of $M_k$
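A rank-k approximation is commonly obtained by truncated SVD, which gives the best rank-k approximation in the Frobenius norm (Eckart-Young). The sketch below uses that approach and then takes row inner products for all pairs at once. The function names and the choice of SVD are assumptions; the slide does not say how M_k is computed.

```python
import numpy as np


def rank_k_approximation(adj, k):
    """Best rank-k approximation M_k of the adjacency matrix (truncated SVD)."""
    M = np.asarray(adj, dtype=float)
    U, s, Vt = np.linalg.svd(M)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]


def low_rank_inner_product_scores(adj, k):
    """score(x, y) = <r(x), r(y)> for rows r(.) of M_k, for all pairs at once."""
    M_k = rank_k_approximation(adj, k)
    return M_k @ M_k.T
```

With k equal to the full rank, M_k = M, and the scores reduce to M·Mᵀ, whose (x, y) entry is the number of common neighbors. Smaller k keeps only the dominant structure.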

  19. Unseen bigrams and Clustering • Unseen bigrams: derived from language modeling • Estimating the frequency of unseen bigrams: pairs of words (nodes here) that co-occur in a test corpus but not in the training corpus • Clustering: delete tenuous edges in Gcollab through a clustering procedure and run the predictors on the “cleaned-up” subgraph
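The slide does not give a formula for the unseen-bigram idea, so the following is only a hedged sketch of one plausible formulation. Take the δ nodes that a base predictor rates most related to x. Then score (x, y) by how many of those nodes are neighbors of y, or by the sum of their base scores. The function name, the default δ = 8, and this exact formulation are assumptions for illustration.

```python
import heapq


def unseen_bigram_score(graph, base_score, x, y, delta=8, weighted=True):
    """Smooth a base score using the nodes most similar to x (hypothetical sketch).

    S_x is taken to be the delta nodes that `base_score` rates most related
    to x; the score counts (or sums base scores over) those in Γ(y).
    """
    candidates = [z for z in graph if z != x]
    similar = heapq.nlargest(delta, candidates,
                             key=lambda z: base_score(graph, x, z))
    hits = [z for z in similar if z in graph[y]]
    if weighted:
        return sum(base_score(graph, x, z) for z in hits)
    return float(len(hits))
```

Any of the earlier measures can serve as `base_score`, as long as it follows the `score(graph, x, y)` convention.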

  20. Results • The results are presented as: • 1. Factor improvement of the proposed predictors over the random predictor, the graph-distance predictor, and the common-neighbors predictor • 2. Relative performance vs. the above predictors • 3. Common predictions
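"Factor improvement" is a ratio of precisions: the fraction of a predictor's top-ranked pairs that are true new edges, divided by the same fraction for a baseline. The sketch below assumes the random baseline picks uniformly among the candidate pairs, so its expected precision is |Enew| divided by the number of candidate pairs. The helper names are hypothetical, and no numbers from the paper are reproduced.

```python
def precision(predicted, new_edges):
    """Fraction of predicted pairs that appear among the new edges."""
    if not predicted:
        return 0.0
    hits = sum(1 for x, y in predicted if frozenset((x, y)) in new_edges)
    return hits / len(predicted)


def random_precision(num_candidate_pairs, num_new_edges):
    """Expected precision of guessing candidate pairs uniformly at random."""
    return num_new_edges / num_candidate_pairs


def factor_improvement(predicted, new_edges, baseline_precision):
    """How many times more precise the predictor is than a baseline."""
    return precision(predicted, new_edges) / baseline_precision
```

For instance, take 5 new edges among 100 candidate pairs. If a predictor gets 2 of its 5 guesses right, its precision is 0.4 against a random baseline of 0.05, a factor improvement of 8. Passing the precision of the graph-distance or common-neighbors predictor as `baseline_precision` gives the other two comparisons.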

  21. Factor Improvement of different measures

  22. Factor Improvement - meta approaches

  23. Relative performance vs. Random Predictions

  24. vs. graph distance predictor, vs. common neighbors predictor

  25. Common Predictions

  26. Conclusions • No single clear winner • Many predictors outperform the random predictor, so there is useful information in the network topology • Katz, clustering, and low-rank approximation perform particularly well • Some simple measures, e.g., common neighbors and Adamic/Adar, also perform well

  27. Critique • Even the best predictor (Katz on gr-qc) is correct on only 16% of its predictions • How good is that? • All collaborations are treated equally. Perhaps treating recent collaborations as more important than older ones would help?

  28. References • Lada A. Adamic and Eytan Adar. Friends and neighbors on the Web. Social Networks, 25(3):211–230, July 2003. • A. L. Barabási, H. Jeong, Z. Néda, E. Ravasz, A. Schubert, and T. Vicsek. Evolution of the social network of scientific collaborations. Physica A, 311(3–4):590–614, 2002. • Sergey Brin and Lawrence Page. The anatomy of a large-scale hypertextual Web search engine. Computer Networks and ISDN Systems, 30(1–7):107–117, 1998. • Rodrigo De Castro and Jerrold W. Grossman. Famous trails to Paul Erdős. Mathematical Intelligencer, 21(3):51–63, 1999.

  29. Questions?

  30. Thank You
