The Theory of Zeta Graphs with an Application to Random Networks. Christopher Ré, Stanford.
Social Network Data. Social network data is ubiquitous and high-value. Since 2000 there have been many studies of the dynamics of these graphs (Watts-Strogatz, Preferential Attachment, etc.), which design new random graph models to capture some new aspect of an observed network. That is not the goal of this work…
What’s the matter with Erdős-Rényi? G(N,p) does not match real-world graphs (degree distribution, diameter), but we have a beautiful theory of G(N,p) (zero-one laws, the “movie”, threshold phenomena, …). Much of this work is enabled by the simple, declarative definition of G(N,p). Can we find an ER-like model to replace generative models for DB-theory-style theorems? This may lead to rigorous hypothesis testing for these models (a key question in motifs).
Which model should we study? There are many models; for this study we pick one that is simple and popular: “At each time step, a new vertex is added. Then, with probability δ, two vertices are chosen uniformly at random and joined by an undirected edge.” (CHKNS: Callaway, Hopcroft, Kleinberg, Newman, Strogatz). CHKNS captures one salient aspect of many models: the arrival order of a node affects its properties. NB: it does not capture all phenomena of interest.
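To make the quoted process concrete, here is a minimal Python sketch of CHKNS growth. The names `chkns_edges` and `degrees` are hypothetical. Two details the quote leaves open are our assumptions: the two chosen vertices are distinct, and repeated pairs are kept as parallel edges. Vertex labels are arrival times, which is the ordering information Zeta graphs are meant to capture.

```python
import random
from collections import Counter
from typing import List, Optional, Tuple


def chkns_edges(num_steps: int, delta: float, seed: Optional[int] = None) -> List[Tuple[int, int]]:
    """Simulate the CHKNS growth process for num_steps steps (a sketch).

    Vertex n arrives at step n, so labels record arrival order. After each
    arrival, with probability delta, two distinct vertices among 1..n are
    drawn uniformly at random and joined by an undirected edge. Repeated
    pairs are kept (assumption), so the result is a multigraph edge list.
    """
    rng = random.Random(seed)
    edges: List[Tuple[int, int]] = []
    for n in range(1, num_steps + 1):
        # With a single vertex present there is no pair to join.
        if n >= 2 and rng.random() < delta:
            u, v = rng.sample(range(1, n + 1), 2)
            edges.append((min(u, v), max(u, v)))
    return edges


def degrees(edges: List[Tuple[int, int]]) -> Counter:
    """Degree of every vertex that has at least one edge."""
    deg: Counter = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg
```

For example, `degrees(chkns_edges(10_000, 0.5, seed=1))` lets you compare the degree of vertex 1 with the degrees of recent arrivals. Early vertices have been present for more steps in which they could be chosen. This is the arrival-order effect described above.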
Zeta Graphs. A simple model to capture “arrival order”. NB: we’ll use a directed variant, since it’s easier to describe; all queries are over binary relations.
Zeta Graphs. Zeta graphs are a family of sets of graphs indexed by N, with a fixed node set [N] = {1, …, N} (index ≈ arrival order) and a stochastic edge set (independent edges). This bare-bones model breaks symmetry: node 1 connects to many nodes (~log N), while node N connects to 1 node (in expectation). It is ER-like because edges are present independently.
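The slide does not give the edge probability. The sketch below assumes that each undirected edge {i, j} is present independently with probability 1/max(i, j). This is our choice, not the talk's, and the talk itself uses a directed variant. The assumption reproduces both behaviors above: node 1 has expected degree H_N − 1, which grows like ln N, and node N has expected degree (N − 1)/N ≈ 1. The names `edge_prob`, `sample_zeta_graph` and `expected_degree` are hypothetical.

```python
import random
from fractions import Fraction
from typing import List, Optional, Tuple


def edge_prob(i: int, j: int) -> Fraction:
    """Assumed (not from the talk): Pr[{i, j} in E] = 1 / max(i, j), i != j."""
    return Fraction(1, max(i, j))


def sample_zeta_graph(n: int, seed: Optional[int] = None) -> List[Tuple[int, int]]:
    """Include each undirected edge {i, j}, 1 <= i < j <= n, independently."""
    rng = random.Random(seed)
    return [(i, j) for j in range(2, n + 1) for i in range(1, j)
            if rng.random() < float(edge_prob(i, j))]


def expected_degree(k: int, n: int) -> Fraction:
    """E[deg(k)] = sum of 1/max(j, k) over j != k, i.e. (k-1)/k + H_n - H_k."""
    return sum((edge_prob(j, k) for j in range(1, n + 1) if j != k), Fraction(0))
```

Exact `Fraction` arithmetic keeps the expected degrees exact, so they can be compared directly with the harmonic-number closed form in the docstring.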
Informal Main Result. Conjunctive graph queries cannot distinguish between Zeta graphs and CHKNS as $N \to \infty$. 1. Determine the theory of Zeta graphs. 2. Show that the theory of CHKNS is sandwiched between two “slices” of Zeta graphs. Here, a theory is the set of CQs that hold with probability 1 in the limit.
Our goal for this section. Given (1) a language L of Boolean queries and (2) a family of probability models M(1), M(2), …, M(N), check whether $\lim_{N \to \infty} \Pr_{M(N)}[q] = 1$ for each q ∈ L. The “theory” is $\mathrm{Th}(L, M) = \{ q \in L : \lim_{N \to \infty} \Pr_{M(N)}[q] = 1 \}$. For the talk, L will be “graph patterns”: positive conjunctive queries over binary relations. The family of probability models M(N) =
Boolean Query Answering on ER Graphs. (1) Form the “full query” Q corresponding to q. (2) Compute the expected number of tuples, E[Q]. (3) Use Janson’s Inequality to relate E[Q] to Pr[q].
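Steps (1) and (2) can be illustrated by brute force on a tiny G(N, p). This sketch makes three assumptions. Edges are undirected and there are no self-loops. E[Q] counts every assignment of nodes to variables (the tuples of the full query), including non-injective ones. By linearity of expectation, each assignment contributes p raised to the number of distinct edges it needs. The function name and the encoding of subgoals as index pairs are hypothetical. The enumeration is exponential, so it is only meant for checking small cases.

```python
from fractions import Fraction
from itertools import product
from typing import List, Tuple


def expected_matches_er(num_vars: int, subgoals: List[Tuple[int, int]],
                        n: int, p: Fraction) -> Fraction:
    """E[Q] for the full query Q of a graph pattern on G(n, p), by brute force.

    Variables are 0..num_vars-1, and each subgoal (a, b) is an undirected edge
    atom R(x_a, x_b). Each assignment contributes Pr[all required edges
    present]. A repeated edge is needed only once, so that probability is
    p ** (#distinct edges). G(n, p) has no self-loops, so an assignment that
    needs one contributes 0.
    """
    total = Fraction(0)
    for assignment in product(range(n), repeat=num_vars):
        edges = {frozenset((assignment[a], assignment[b])) for a, b in subgoals}
        if any(len(e) == 1 for e in edges):  # self-loop: probability 0
            continue
        total += p ** len(edges)
    return total
```

For the triangle, only injective assignments survive, because any equality creates a self-loop. On four nodes with p = 1/2 this gives 24 · (1/2)³ = 3. For the 2-path, the assignments with x = z also contribute, because they need only one edge.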
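Recall: Classical Janson’s Inequality (see Alon & Spencer, Random Graphs). It gives a classical sufficient condition for $\Pr[q] \to 1$. Q(c) and Q(d) properly overlap if they are not identical but share at least one identical subgoal. A corollary of Janson’s inequality is: if $E[Q] = \omega(1)$ and $\Delta = o(E[Q]^2)$, where $\Delta$ sums $\Pr[Q(c) \wedge Q(d)]$ over properly overlapping pairs, then $\Pr[q] = 1 - o(1)$.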
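Step (3) can be sketched with the bound itself. Alon & Spencer's formulation takes Δ over ordered pairs of properly overlapping events; these events are monotone in independent edges, as Janson requires. It gives Pr[no instantiation holds] ≤ exp(−μ + Δ/2), and when Δ ≥ μ the extended form gives exp(−μ²/(2Δ)). The helper below (a hypothetical name) returns the smaller applicable bound. If μ → ∞ and Δ = o(μ²), either case drives the bound to 0, which is the corollary.

```python
import math


def janson_upper_bound(mu: float, delta: float) -> float:
    """Upper bound on Pr[no instantiation Q(c) holds], i.e. on Pr[not q].

    mu    = E[Q], the expected number of tuples.
    delta = sum over ordered pairs (c, d) of properly overlapping
            instantiations of Pr[Q(c) and Q(d)].
    Janson:          Pr[not q] <= exp(-mu + delta / 2)
    Extended Janson: if delta >= mu, Pr[not q] <= exp(-mu**2 / (2 * delta))
    """
    exponent = -mu + delta / 2
    if delta >= mu > 0:
        exponent = min(exponent, -mu ** 2 / (2 * delta))
    # Clamp at exponent 0: a probability bound above 1 carries no information.
    return math.exp(min(exponent, 0.0))
```

For instance, `janson_upper_bound(100.0, 1000.0)` is exp(−5). Here Δ ≥ μ, so the extended bound applies and is far stronger than exp(−100 + 500).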
Boolean Query Answering on ER Graphs. (1) Form the “full query” Q corresponding to q. (2) Compute the expected number of tuples, E[Q]. (3) Use Janson’s Inequality to relate E[Q] to Pr[q]. What changes for Zeta graphs?
Multiple Valued Zeta (MVZ) Functions. We only use integer $s_i$ in this talk. MVZs show up in some strange places…
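For reference, one common way to write a truncated MVZ with integer arguments is shown below. Conventions differ; many texts order the indices n_1 > n_2 > ⋯. Listing exponents in increasing order of the indices, as here, is our assumption. We chose it so that the exponent triples on the “Order Matters” slide read left to right from the smallest variable. Many sums that arise here, such as ζ_N(0, 1, 1), grow without bound as N → ∞, which is why asymptotics in N matter.

```latex
% Truncated MVZ; exponents listed in increasing order of the indices (assumed convention)
\zeta_N(s_1, \dots, s_k) \;=\; \sum_{1 \le n_1 < n_2 < \cdots < n_k \le N} \; \prod_{i=1}^{k} n_i^{-s_i},
\qquad s_1, \dots, s_k \text{ integers}.
```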
Order Matters: Paths of Length 2. If x < y < z: (0, 1, 1). If x < z < y: (0, 0, 2). So in our “atoms”, variables will be totally ordered.
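For illustration only, assume that the undirected edge {i, j} is present with probability 1/max(i, j); the talk does not state this. Under that assumption, the two triples can be derived directly. Write ζ_N(s_1, …, s_k) for the sum over 1 ≤ n_1 < ⋯ < n_k ≤ N of ∏ n_i^{−s_i}. Each edge of the pattern R(x, y), R(y, z) contributes the reciprocal of its larger endpoint. So each variable's exponent counts the pattern edges whose larger endpoint it is:

```latex
% Assumption: Pr[{i,j} in E] = 1/max(i,j); pattern R(x,y), R(y,z)
\begin{aligned}
x<y<z:&\quad \sum_{1 \le x < y < z \le N} \frac{1}{y}\cdot\frac{1}{z} \;=\; \zeta_N(0,1,1),\\
x<z<y:&\quad \sum_{1 \le x < z < y \le N} \frac{1}{y}\cdot\frac{1}{y} \;=\; \zeta_N(0,0,2).
\end{aligned}
```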
Why Multiple-Valued Zeta (MVZ)? It is a well-studied special function, and we get for free: • asymptotics [Costermans et al. 2005] • algebraic identities [Zudilin & Zudilin 2003] • a fancy-sounding function (not helpful).
Asymptotic Estimates for MVZs. This is a small variation of the Costermans et al. result. [Formulas for the expected # of edges, the expected # of triangles, and the expected # of K4.]
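As a worked example, assume for illustration that each undirected edge {i, j} is present with probability 1/max(i, j); the slide does not state the model's edge probability. Under that assumption, the expected number of edges is itself a truncated MVZ with a simple closed form. Here H_N is the N-th harmonic number and γ is Euler's constant. This is a sketch, not the slide's own estimate.

```latex
% Assumption: Pr[{i,j} in E] = 1/max(i,j)
\begin{aligned}
\mathbb{E}[\#\text{edges}]
  &= \sum_{1 \le i < j \le N} \frac{1}{j} \;=\; \zeta_N(0,1)
   = \sum_{j=1}^{N} \frac{j-1}{j} \\
  &= N - H_N \;=\; N - \ln N - \gamma + o(1).
\end{aligned}
```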
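Pr[2-Paths]. Start from the ordering x < y < z, with exponents (0, 1, 1), and consider pairs of properly overlapping 2-paths. [Figure: overlapping pairs of 2-paths, with shared identical subgoals marked, and the exponent vectors of their orderings.] These and all other overlapping pairs contribute $o(E[Q]^2)$, and since $E[Q] = \omega(1)$, $\Pr[q] = 1 - o(1)$.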
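The overlap computation can be checked numerically on small N. This sketch assumes undirected edges present independently with probability 1/max(i, j); the talk does not state this and uses a directed variant. It also works with copies of the 2-path as a subgraph rather than with tuples of the full query. The names `p`, `prob_all_present` and `two_path_mu_delta` are hypothetical. The function returns μ = E[#copies] and Δ over ordered pairs of distinct copies that share an edge, the quantities the Janson corollary compares.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import combinations
from typing import FrozenSet, List, Tuple

Edge = FrozenSet[int]


def p(i: int, j: int) -> Fraction:
    """Assumed edge probability (illustration only): 1 / max(i, j)."""
    return Fraction(1, max(i, j))


def prob_all_present(edges) -> Fraction:
    """Probability that every edge in `edges` is present (edges independent)."""
    result = Fraction(1)
    for e in edges:
        result *= p(*e)
    return result


def two_path_mu_delta(n: int) -> Tuple[Fraction, Fraction]:
    """Return (mu, delta) for copies of the 2-path on nodes 1..n.

    A copy is a middle node y plus an unordered pair {x, z} of other nodes,
    using edges {x, y} and {y, z}. mu = E[#copies]; delta sums
    Pr[both copies present] over ordered pairs of distinct copies that
    share an edge (two distinct copies share at most one edge).
    """
    copies: List[FrozenSet[Edge]] = []
    for y in range(1, n + 1):
        others = [v for v in range(1, n + 1) if v != y]
        for x, z in combinations(others, 2):
            copies.append(frozenset({frozenset({x, y}), frozenset({y, z})}))

    mu = sum((prob_all_present(c) for c in copies), Fraction(0))

    # Index copies by edge so that only pairs sharing an edge are examined.
    by_edge = defaultdict(set)
    for idx, c in enumerate(copies):
        for e in c:
            by_edge[e].add(idx)

    delta = Fraction(0)
    for idx, c in enumerate(copies):
        overlapping = set().union(*(by_edge[e] for e in c)) - {idx}
        for jdx in overlapping:
            # Both copies are present iff every edge in their union is present.
            delta += prob_all_present(c | copies[jdx])
    return mu, delta
```

Tracking `delta / mu**2` as N grows is one way to watch the $o(E[Q]^2)$ behavior emerge. The brute force enumerates about N³/2 copies, so it is only practical for small N.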
Two cycles, you’re out! [Figure: the bicycle B(r,s), an r-cycle and an s-cycle.] First result: (1) For all r, s ≥ 2, $\Pr_{M(N)}[B(r,s)] < 1 - \varepsilon$ for some fixed $\varepsilon > 0$ as $N \to \infty$, i.e., no bicycles. (2) Any connected graph q with at most one cycle appears with probability 1 in the limit. The proof has two parts: (A) for any individual pattern, check E[Q]; and (B) show that different “orderings” are non-negatively correlated.
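The condition in part (2) is a purely syntactic test on the pattern. In this minimal sketch, the pattern's distinct subgoals are read as edges of an undirected multigraph, so R(x, y), R(y, x) counts as a 2-cycle. That matches r, s ≥ 2 in part (1), but the reading is our assumption. The pattern then passes if the multigraph is connected and its cyclomatic number |E| − |V| + 1 is at most 1. The function name is hypothetical.

```python
from collections import defaultdict
from typing import Hashable, List, Tuple


def connected_with_at_most_one_cycle(subgoals: List[Tuple[Hashable, Hashable]]) -> bool:
    """Check part (2)'s condition on a graph pattern (a sketch).

    Each distinct subgoal R(u, v) is one edge of an undirected multigraph,
    so R(x, y) and R(y, x) together form a 2-cycle. The pattern qualifies if
    that multigraph is connected and |E| - |V| + 1 <= 1, i.e. |E| <= |V|.
    """
    edges = set(subgoals)  # a repeated identical subgoal adds nothing
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    if not adj:
        return True  # empty pattern: trivially satisfied
    start = next(iter(adj))
    seen = {start}
    stack = [start]
    while stack:  # depth-first search to test connectivity
        u = stack.pop()
        for w in adj[u] - seen:
            seen.add(w)
            stack.append(w)
    return len(seen) == len(adj) and len(edges) <= len(adj)
```

A triangle or a 2-cycle passes. Two triangles sharing a vertex (6 edges on 5 variables) fail, as expected for a bicycle.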
Central Message. How different is CHKNS from the family of Zeta graphs? Up to CQs, the answer is: not at all.
Key Technical Issues. Goal: establish that Th(“Graph Patterns”, CHKNS) = Th(“Graph Patterns”, Zeta Graphs). 1. CHKNS edge probabilities have a painful form. • But they can be sandwiched by “Zeta slices”. 2. CHKNS edges are correlated! • Develop bounds on the correlations. 3. Show that CHKNS can essentially be embedded in a part of Zeta graphs.
Other Related Work. Graph models: huge amounts, volumes! [Lynch 05]: conditions on a skewed degree distribution, but symmetrizes labels. • Proves a 0-1 law for all of FO! • Zeta graphs and CHKNS have no 0-1 law. • Inspired by this paper!
Future Work & Conclusion. A “conjunctive” theory of simple random graph models with order. • Does a simpler model capture CHKNS? • Could one capture Albert & Barabási’s preferential attachment model? • Richer languages?
Expectations for Ordered Graphs. Since edge probabilities are sensitive to order, consider graph patterns with an order among the variables. Then the expectation has a semi-closed form: this function has an MVZ form.
Computing Expectations of General CQs. If the variables in Q are totally ordered, then we can compute E[Q] using MVZs. Obvious algorithm: given a query, add equalities and inequalities among its variables in all possible ways. This takes time exponential in the size of Q (#P-hard).
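The obvious algorithm can be sketched end to end. Each assignment of nodes to variables induces exactly one weak order: which variables are equal, and how the distinct values are ordered. So E[Q] is a sum over weak orders, with one MVZ per weak order. The number of weak orders of k variables is the Fubini number (1, 3, 13 and 75 for k = 1 to 4), which is the exponential blowup mentioned above. The edge model is an assumption, not taken from the talk: undirected edges, no self-loops, and Pr[{i, j} ∈ E] = 1/max(i, j). All function names are hypothetical. `brute_force` exists only to cross-check the MVZ route on tiny N.

```python
from fractions import Fraction
from itertools import combinations, product
from typing import Iterator, List, Sequence, Set, Tuple

Subgoals = Sequence[Tuple[int, int]]  # (a, b) means edge atom R(x_a, x_b)


def weak_orders(variables: List[int]) -> Iterator[List[Set[int]]]:
    """Yield every weak order of `variables` as a list of blocks: variables
    in one block are equal, and earlier blocks take smaller values."""
    if not variables:
        yield []
        return
    for r in range(1, len(variables) + 1):
        for first in combinations(variables, r):
            rest = [v for v in variables if v not in first]
            for tail in weak_orders(rest):
                yield [set(first)] + tail


def truncated_mvz(exponents: Sequence[int], n: int) -> Fraction:
    """Sum over 1 <= n_1 < ... < n_k <= n of prod_i n_i ** -exponents[i]."""
    # partial[m]: sum over chains using the exponents so far whose last value is m
    partial = [Fraction(0)] + [Fraction(1, m ** exponents[0]) for m in range(1, n + 1)]
    for s in exponents[1:]:
        running = Fraction(0)  # sum of partial[1..m-1]
        nxt = [Fraction(0)] * (n + 1)
        for m in range(1, n + 1):
            nxt[m] = running * Fraction(1, m ** s)
            running += partial[m]
        partial = nxt
    return sum(partial, Fraction(0))


def expected_count(num_vars: int, subgoals: Subgoals, n: int) -> Fraction:
    """E[Q] as a sum, over weak orders of the variables, of one MVZ each."""
    total = Fraction(0)
    for blocks in weak_orders(list(range(num_vars))):
        rank = {v: i for i, block in enumerate(blocks) for v in block}
        edges = {frozenset((rank[a], rank[b])) for a, b in subgoals}
        if any(len(e) == 1 for e in edges):
            continue  # an equality created a self-loop: probability 0
        exponents = [0] * len(blocks)
        for e in edges:
            exponents[max(e)] += 1  # edge probability is 1 / (larger endpoint)
        total += truncated_mvz(exponents, n)
    return total


def brute_force(num_vars: int, subgoals: Subgoals, n: int) -> Fraction:
    """Cross-check: sum Pr[all needed edges present] over all assignments."""
    total = Fraction(0)
    for values in product(range(1, n + 1), repeat=num_vars):
        edges = {frozenset((values[a], values[b])) for a, b in subgoals}
        if any(len(e) == 1 for e in edges):
            continue
        prob = Fraction(1)
        for e in edges:
            prob *= Fraction(1, max(e))
        total += prob
    return total
```

The MVZ route does work proportional to the number of weak orders times k·N. The brute force does work proportional to N^k. The exponential factor in the query size remains either way.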