
Nonparametric Link Prediction in Dynamic Graphs




Presentation Transcript


  1. Nonparametric Link Prediction in Dynamic Graphs Purnamrita Sarkar (UC Berkeley) Deepayan Chakrabarti (Facebook) Michael Jordan (UC Berkeley)

  2. Link Prediction • Who is most likely to interact with a given node? • Should Facebook suggest Alice as a friend for Bob? [Figure: friend suggestion in Facebook, with users Alice and Bob]

  3. Link Prediction • Should Netflix suggest this movie to Alice? [Figure: movie recommendation in Netflix, with users Alice, Bob, and Charlie]

  4. Link Prediction • Prediction using simple features • degree of a node • number of common neighbors • last time a link appeared • What if the graph is dynamic?
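
As a concrete illustration of the simple features above, here is a minimal sketch assuming each graph snapshot is an adjacency dict of node → set of neighbors; the function names are mine, not from the talk.

```python
# Minimal sketch of the simple features above. Assumes each snapshot is an
# adjacency dict {node: set_of_neighbors}; names are illustrative.

def degree(G, i):
    """Degree of node i in snapshot G."""
    return len(G.get(i, set()))

def common_neighbors(G, i, j):
    """cn(i, j): number of common neighbors of i and j in snapshot G."""
    return len(G.get(i, set()) & G.get(j, set()))

def last_link(snapshots, i, j):
    """ll(i, j): most recent time t at which edge (i, j) existed, or None."""
    for t in range(len(snapshots) - 1, -1, -1):
        if j in snapshots[t].get(i, set()):
            return t
    return None
```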

  5. Related Work • Generative models • Exp. family random graph models [Hanneke+/’06] • Dynamics in latent space [Sarkar+/’05] • Extension of mixed membership block models [Fu+/’10] • Other approaches • Autoregressive models for links [Huang+/’09] • Extensions of static features [Tylenda+/’09]

  6. Goal • Link Prediction • incorporating graph dynamics, • requiring weak modeling assumptions, • allowing fast predictions, • and offering consistency guarantees.

  7. Outline • Model • Estimator • Consistency • Scalability • Experiments

  8. The Link Prediction Problem in Dynamic Graphs • Given snapshots G1, G2, …, GT with observed links Y1(i,j), Y2(i,j), …, predict whether (i,j) is an edge in GT+1 • Model: YT+1(i,j) | G1, G2, …, GT ~ Bernoulli( g_{G1,G2,…,GT}(i,j) ), where g depends on features of the previous graphs and of this pair of nodes

  9. Including graph-based features • Example set of features for pair (i,j): cn(i,j) (common neighbors), ℓℓ(i,j) (last time a link was formed), deg(j) • Represent dynamics using “datacubes” of these features ≈ a multi-dimensional histogram on binned feature values (cn × deg × ℓℓ) • For each cell, e.g. 1 ≤ cn ≤ 3, 3 ≤ deg ≤ 6, 1 ≤ ℓℓ ≤ 2: ηt = #pairs in Gt with these features; ηt+ = #pairs in Gt with these features which had an edge in Gt+1 • A high ηt+/ηt means this feature combination is more likely to create a new edge at time t+1
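
A minimal sketch of datacube construction under the assumptions above: features are binned per dimension, and each cell stores (ηt, ηt+). The bin edges and helper names are illustrative, not from the talk.

```python
from collections import defaultdict

def bin_index(value, edges):
    """Map a raw feature value to a bin index, given sorted upper edges."""
    for k, upper in enumerate(edges):
        if value <= upper:
            return k
    return len(edges)

def build_datacube(pairs, edges_next, features, bins):
    """Datacube for the transition G_t -> G_{t+1}.

    pairs      : node pairs (i, j) to count
    edges_next : set of pairs that are edges in G_{t+1}
    features   : maps (i, j) to a raw feature tuple, e.g. (cn, deg, ll)
    bins       : one list of bin edges per feature dimension
    Returns {cell: [eta, eta_plus]}: eta = #pairs in the cell in G_t,
    eta_plus = #those pairs that have an edge in G_{t+1}.
    """
    cube = defaultdict(lambda: [0, 0])
    for (i, j) in pairs:
        cell = tuple(bin_index(v, e) for v, e in zip(features(i, j), bins))
        cube[cell][0] += 1                     # eta_t
        if (i, j) in edges_next:
            cube[cell][1] += 1                 # eta_t+
    return dict(cube)
```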

  10. Including graph-based features • How do we form these datacubes? • Vanilla idea: one datacube for each transition Gt → Gt+1, aggregated over all pairs (i,j) • Does not allow for differently evolving communities [Figure: example datacube cell (1 ≤ cn(i,j) ≤ 3, 3 ≤ deg(i,j) ≤ 6, 1 ≤ ℓℓ(i,j) ≤ 2) over the graph sequence G1, G2, …, GT]

  11. Our Model • How do we form these datacubes? • Our model: one datacube for each neighborhood • Captures local evolution

  12. Our Model • Neighborhood: Nt(i) = nodes within 2 hops of i • Features extracted from (Nt-p(i), …, Nt(i)) • Datacube cell for feature value s holds: ηt(i, s) = number of node pairs with feature s in the neighborhood of i at time t, and ηt+(i, s) = number of those pairs which got connected at time t+1
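
A sketch of the 2-hop neighborhood used for the per-node datacubes, again assuming adjacency dicts; the per-node datacube dt(i) would then be built as in the earlier sketch, restricted to pairs inside Nt(i).

```python
def neighborhood(G, i):
    """N_t(i): all nodes within 2 hops of i in snapshot G (excluding i)."""
    one_hop = set(G.get(i, set()))
    two_hop = set()
    for u in one_hop:
        two_hop |= G.get(u, set())     # neighbors of neighbors
    return (one_hop | two_hop) - {i}
```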

  13. Our Model • Datacube dt(i) captures graph evolution in the local neighborhood of a node in the recent past • Model: YT+1(i,j) | G1, G2, …, GT ~ Bernoulli( g(dT-1(i), sT(i,j)) ), where the datacube holds the local evolution patterns and sT(i,j) the features of the pair • What is g(.)?

  14. Outline • Model • Estimator • Consistency • Scalability • Experiments

  15. Kernel Estimator for g • Query: the datacube at time T-1 and the feature vector at time T for the pair in question • Compute similarities between the query and every (datacube, feature) pair from earlier transitions t = 1, 2, 3, … across G1, G2, …, GT [Figure: query cube compared against historical (datacube, feature) pairs at each timestep]

  16. Kernel Estimator for g • Factorize the similarity function: similarity( (d, s), (d′, s′) ) = K(d, d′) · I{ s == s′ } • Allows computation of g(.) via simple lookups: ĝ(d, s) = Σt,i K(dt(i), d) ηt+(i, s) / Σt,i K(dt(i), d) ηt(i, s)

  17. Kernel Estimator for g • Compute similarities only between datacubes • Each historical datacube contributes its cell counts, weighted by its similarity to the query: w1 with (η1, η1+), w2 with (η2, η2+), w3 with (η3, η3+), w4 with (η4, η4+), from the datacubes at t = 1, 2, 3, … across G1, G2, …, GT

  18. Kernel Estimator for g • Factorize the similarity function: K(d, d′) · I{ s == s′ } • Allows computation of g(.) via simple lookups • What is K(·, ·)?
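
A sketch of how the factorized similarity turns estimation into table lookups, following the weighted-ratio form on the previous slides; `kernel` is any datacube similarity K in [0, 1], and the helper names are mine.

```python
def estimate_g(query_cube, query_cell, history, kernel):
    """Kernel estimate g(d, s) = sum_t w_t * eta_t+ / sum_t w_t * eta_t,
    where w_t = K(historical cube, query cube) and (eta_t, eta_t+) are
    looked up in the cell matching the query features."""
    num = den = 0.0
    for cube in history:               # all (node, time) datacubes
        w = kernel(cube, query_cube)   # similarity computed once per cube
        eta, eta_plus = cube.get(query_cell, (0, 0))
        num += w * eta_plus
        den += w * eta
    return num / den if den > 0 else 0.0
```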

  19. Similarity between two datacubes: Idea 1 • For each cell s, take (η1+/η1 - η2+/η2)^2 and sum over cells • Problem: the magnitude of η is ignored; 5/10 and 50/100 are treated equally • Instead, consider the distribution of the edge-creation probability

  20. Similarity between two datacubes: Idea 2 • For each cell s, compute the posterior distribution of the edge-creation probability from (η, η+) • dist(d1, d2) = total variation distance between the two posteriors, summed over all cells • Kernel: K(d1, d2) = b^dist(d1, d2), with 0 < b < 1 • As b → 0, K(d1, d2) → 0 unless dist(d1, d2) = 0
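
A sketch of this kernel under illustrative assumptions: each cell's posterior is discretized on a grid (a Beta-shaped posterior here; the appendix slides mention a normal approximation to η+/η, which could be swapped in), and the datacube distance is the summed total variation distance. The value of b is illustrative.

```python
import math

def posterior_on_grid(eta, eta_plus, n_grid=100):
    """Discretized posterior of a cell's edge-creation probability,
    proportional to p^eta_plus * (1-p)^(eta - eta_plus) on a grid.
    With more data (50/100 vs 5/10) the posterior is tighter, so the
    magnitude of eta is no longer ignored."""
    grid = [(k + 0.5) / n_grid for k in range(n_grid)]
    logw = [eta_plus * math.log(p) + (eta - eta_plus) * math.log(1.0 - p)
            for p in grid]
    m = max(logw)
    w = [math.exp(x - m) for x in logw]
    z = sum(w)
    return [x / z for x in w]

def tv_distance(p, q):
    """Total variation distance between two discretized distributions."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

def kernel(cube1, cube2, b=0.3):
    """K(d1, d2) = b ** dist(d1, d2), 0 < b < 1, with dist = TV distance
    summed over all cells occupied by either datacube."""
    dist = 0.0
    for cell in set(cube1) | set(cube2):
        e1, p1 = cube1.get(cell, (0, 0))
        e2, p2 = cube2.get(cell, (0, 0))
        dist += tv_distance(posterior_on_grid(e1, p1),
                            posterior_on_grid(e2, p2))
    return b ** dist
```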

  21. Kernel Estimator for g • Want to show: the estimate ĝ converges to the true g as T → ∞ (consistency)

  22. Outline • Model • Estimator • Consistency • Scalability • Experiments

  23. Consistency of Estimator • Lemma 1: As T → ∞, for some R > 0, … • Proof using: As T → ∞, …

  24. Consistency of Estimator • Lemma 2: As T → ∞, …

  25. Consistency of Estimator • Assumption: finite graph • Proof sketch: • Dynamics are Markovian with finite state space • the chain must eventually enter a closed, irreducible communicating class • geometric ergodicity if the class is aperiodic (if not, more complicated…) • strong mixing with exponential decay • variances decay as o(1/T)

  26. Consistency of Estimator • Theorem: … • Proof sketch: … for some R > 0 … so …

  27. Outline • Model • Estimator • Consistency • Scalability • Experiments

  28. Scalability • Full solution: summing over all n datacubes for all T timesteps • Infeasible • Approximate solution: sum over nearest neighbors of the query datacube • How do we find nearest neighbors? • Locality Sensitive Hashing (LSH) [Indyk+/’98, Broder+/’98]

  29. Using LSH • Devise a hashing function for datacubes such that “similar” datacubes tend to be hashed to the same bucket • “Similar” = small total variation distance between cells of the datacubes

  30. Using LSH • Step 1: Map datacubes to bit vectors • Use B1 buckets to discretize [0,1]; use B2 bits for each bucket • Total M·B1·B2 bits, where M = max number of occupied cells << total number of cells • For probability mass p in a bucket, the first ⌈p·B2⌉ of its bits are set to 1 (unary encoding)

  31. Using LSH • Step 1: Map datacubes to bit vectors, so that total variation distance ∝ L1 distance between distributions ≈ Hamming distance between bit vectors • Step 2: Hash function = sample k out of the M·B1·B2 bits
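
A sketch of both steps under the same illustrative assumptions as before (it reuses `posterior_on_grid` from the kernel sketch): each occupied cell's posterior is bucketed into B1 buckets, each bucket's mass is unary-encoded in B2 bits so Hamming distance tracks L1 distance, and the hash reads off k sampled bit positions.

```python
import math
import random

def cube_to_bits(cube, cells, B1=10, B2=8, n_grid=100):
    """Step 1: map a datacube to len(cells) * B1 * B2 bits. For each cell
    (in a fixed order shared by all cubes), discretize its posterior into
    B1 buckets and unary-encode each bucket's mass p as ceil(p * B2) ones."""
    bits = []
    for cell in cells:
        eta, eta_plus = cube.get(cell, (0, 0))
        post = posterior_on_grid(eta, eta_plus, n_grid)
        for k in range(B1):
            lo, hi = k * n_grid // B1, (k + 1) * n_grid // B1
            ones = min(B2, math.ceil(sum(post[lo:hi]) * B2))
            bits.extend([1] * ones + [0] * (B2 - ones))
    return bits

def make_hash(n_bits, k, seed=0):
    """Step 2: an LSH function that reads k randomly chosen bit positions;
    cubes at small Hamming distance tend to land in the same bucket."""
    positions = random.Random(seed).sample(range(n_bits), k)
    return lambda bits: tuple(bits[i] for i in positions)
```

In the usual LSH fashion, several such hash functions would be used in parallel, and a query datacube is compared only against the datacubes sharing a bucket with it.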

  32. Fast Search Using LSH [Figure: hash table with buckets keyed by k-bit codes (0000, 0001, …, 1111), each bucket holding the bit vectors of the datacubes hashed to it]

  33. Outline • Model • Estimator • Consistency • Scalability • Experiments

  34. Experiments • Baselines • LL: last link (time of last occurrence of a pair) • CN: rank by number of common neighbors in the most recent graph GT • AA (Adamic/Adar): more weight to low-degree common neighbors • Katz: accounts for longer paths • CN-all: apply CN to the union of G1, …, GT • AA-all, Katz-all: similar

  35. Setup • Training data: G1, …, GT; test data: GT+1 • Pick a random subset S from nodes with degree > 0 in GT+1 • For each s ∈ S, predict a ranked list of nodes likely to link to s • Report mean AUC (higher is better)
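
A sketch of this evaluation loop, assuming `scores[s]` maps each candidate node to a predicted link probability for query node s and `G_test` is the adjacency dict of GT+1; it leans on scikit-learn's roc_auc_score.

```python
from sklearn.metrics import roc_auc_score

def mean_auc(scores, G_test, S):
    """Mean AUC over query nodes: rank each s's candidates by predicted
    score and compare against the actual edges of s in G_{T+1}."""
    aucs = []
    for s in S:
        cands = list(scores[s])
        y_true = [1 if v in G_test.get(s, set()) else 0 for v in cands]
        y_pred = [scores[s][v] for v in cands]
        if 0 < sum(y_true) < len(y_true):      # AUC needs both classes
            aucs.append(roc_auc_score(y_true, y_pred))
    return sum(aucs) / len(aucs)
```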

  36. Simulations • Social network model of Hoff et al. • Each node has an independently drawn feature vector • Edge(i,j) depends on features of i and j • Seasonality effect • Feature importance varies with season • different communities in each season • Feature vectors evolve smoothly over time • evolving community structures

  37. Simulations • NonParam is much better than the others in the presence of seasonality • CN, AA, and Katz implicitly assume smooth evolution

  38. Sensor Network* (* www.select.cs.cmu.edu/data)

  39. Summary • Link formation is assumed to depend on • the neighborhood’s evolution • over a time window • Admits a kernel-based estimator • Consistency • Scalability via LSH • Works particularly well for • Seasonal effects • differently evolving communities

  40. Thanks!

  41. Problem statement • We are given {G1, G2, …, Gt}; we want to predict Gt+1 • Model 1: Yt+1(i,j) = f(Yt-p+1(i,j), …, Yt(i,j)) • Treats all edges as independent • Only looks at one feature • Model 2: Gt+1 = f(Gt-p+1, Gt-p+2, …, Gt) • Huge dimensionality • Probably intractable • Middle ground: learn a local prediction model for Yt+1(i,j) using a few features, and patch these together to predict the entire graph

  42. Our Model • Idea: Yt+1(i,j) depends on features of (i,j) and on the neighborhood of i in the p previous graphs • Features specific to (i,j) at time t: {deg(i), deg(j), cn(i,j), ℓℓ(i,j)} • Features of the neighborhood of i: should be amenable to fast algorithms, should reflect the evolution of the graph, but should also be similar to the features of (i,j)

  43. Estimation • Kernel estimator of g • Once you have computed the kernel similarities between two datacubes, everything boils down to table lookups

  44. Distance between two datacubes • We could just compare rates of link formation, i.e. η+/η, but this does not take the variance into account • Instead, make a normal approximation to η+/η and look at the total variation distance • As b → 0, K(dt(i), dt′(i′)) → 0 unless D(dt(i), dt′(i′)) = 0


  46. Consistency of Estimator • Define … • This quantity behaves like a bias term

  47. Consistency of Estimator • Show … • Assumption 1: b → 0 as nT → ∞ [similar to kernel density estimation] • Show that for bounded q, … • Assumption 2: introduce the strong mixing coefficient α(k); roughly, this bounds the degree of dependence between two neighborhoods at distance k • The total covariance between all neighborhoods is bounded • Assume …

  48. Including graph-based features • Idea 1: Make one datacube per (Gt, Gt+1) transition; learn how successful each feature combination (e.g. 1 ≤ cn(i,j) ≤ 3, 3 ≤ deg(i,j) ≤ 6, 1 ≤ ℓℓ(i,j) ≤ 2) has been in generating links over the past. Too global. • Idea 2: Make one datacube for each pair of nodes. Too local, not to mention expensive.

  49. Our Model • Datacube dt(i) captures the evolution of a small (2-hop) neighborhood around node i • Close nodes will have overlapping neighborhoods → similar datacubes • Prediction for YT+1(i,j) uses {dT-1(i), sT(i,j)}

  50. Building neighborhood features • Let S = range of s(i,j); assume S is finite • Datacube: ηit(s) = number of pairs with feature s in the neighborhood of i at time t; ηit+(s) = number of those pairs which got connected at time t+1 • Captures the evolution of the neighborhood from t to t+1 • We use the past evolution pattern of a neighborhood to predict its future evolution • But how do we estimate g efficiently? We will show that the inference of g boils down to table lookups in the datacubes dt(i)
