Explore counting triangles in graphs efficiently in sublinear time, with algorithms and lower bounds discussed. Discover novel approaches for approximating the number of triangles in graphs using vertex-pair queries.
Approximately counting triangles in sublinear time
Talya Eden, Tel Aviv University
Amit Levi, University of Waterloo
Dana Ron, Tel Aviv University
C. Seshadhri, UC Santa Cruz
Counting Triangles
Basic graph-theoretic algorithmic question that arises in various applications (e.g., bioinformatics and social networks). Has been studied quite extensively in the past:
• Algorithms for exact counting: $O(m^{3/2})$ [Itai & Rodeh], [Chiba & Nishizeki] ($m$ is the number of edges); $O(m^{1.41})$ [Alon, Yuster & Zwick] (based on matrix multiplication)
• Algorithms for approximate counting: many algorithms in a variety of models (including streaming), e.g., [Schank & Wagner], [Tsourakakis], [Avron], [Kolountzakis, Miller, Peng, Tsourakakis], [Chu & Cheng], [Suri & Vassilvitskii], [Arifuzzaman, Khan, Marathe], [Seshadhri, Kolda, Pinar], [Tangwongsan, Pavan, Tirthapura], …
All previous algorithms (exact/approximate) read the entire graph.
Counting Triangles in Sublinear Time
Problem considered by [Gonen, R, Shavit], whose main focus was on counting the number of $s$-stars.
They considered algorithms that have access to degree queries (what is $d(v)$ for vertex $v$) and neighbor queries (what is the $i$-th neighbor of vertex $v$).
Showed that in general there is no sublinear algorithm for approximately counting the number of triangles (in contrast to $s$-stars).
Simple LB construction (figure): one graph in which the number of triangles is linear in $n$ (and $m$), and one graph with no triangles.
Natural question: is there a sublinear algorithm if vertex-pair queries (is there an edge between $u$ and $v$?) are also allowed?
We answer the question affirmatively.
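The three query types can be pictured as a small oracle interface. The following is a minimal Python sketch, not taken from the talk: the class name `GraphOracle`, its method names, the 0-indexed neighbor convention, and the query counter are all assumptions made for illustration.

```python
class GraphOracle:
    """Query access to an undirected simple graph; counts the queries made."""

    def __init__(self, n, edges):
        self.adj = [[] for _ in range(n)]
        self.edge_set = set()
        for u, v in edges:
            self.adj[u].append(v)
            self.adj[v].append(u)
            self.edge_set.add((min(u, v), max(u, v)))
        self.queries = 0

    def degree(self, v):
        """Degree query: d(v)."""
        self.queries += 1
        return len(self.adj[v])

    def neighbor(self, v, i):
        """Neighbor query: i-th neighbor of v (0-indexed), or None if i >= d(v)."""
        self.queries += 1
        return self.adj[v][i] if i < len(self.adj[v]) else None

    def pair(self, u, v):
        """Vertex-pair query: is (u, v) an edge?"""
        self.queries += 1
        return (min(u, v), max(u, v)) in self.edge_set


# Hypothetical 4-vertex example: a triangle 0-1-2 with a pendant vertex 3.
oracle = GraphOracle(4, [(0, 1), (1, 2), (0, 2), (2, 3)])
```

In this picture, the cost of an algorithm is the final value of `oracle.queries`, and "sublinear" means that this value is much smaller than the size of the graph.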
Our Results
Given query access (degree, neighbor, vertex-pair) to a graph $G$ with $n$ vertices, $m$ edges, $t$ triangles, and a parameter $\epsilon \in (0,1]$, our algorithm returns $\hat{t}$ s.t. with high constant probability $(1-\epsilon)t \le \hat{t} \le (1+\epsilon)t$.
Expected query complexity: $O(n/t^{1/3} + m^{3/2}/t) \cdot \mathrm{poly}(\log n, 1/\epsilon)$
More precisely: $O(n/t^{1/3} + \min\{m, m^{3/2}/t\}) \cdot \mathrm{poly}(\log n, 1/\epsilon)$
Also give a matching lower bound (up to $\mathrm{polylog}(n)$ factors and for constant $\epsilon$).
Related Works (Sublinear algs)
• Approximating the average degree (number of edges): [Feige], [Goldreich, R]
• Approximating the number of stars: [Gonen, R, Shavit]
• Other sublinear algorithms for approximating graph parameters: MST [Chazelle, Rubinfeld, Trevisan], [Czumaj & Sohler], [Czumaj, Ergun, Fortnow, Magen, Newman, Rubinfeld, Sohler]; Min VC [Parnas & R], [Nguyen & Onak], [Marko & R], [Yoshida, Yamamoto, Ito], [Onak, R, Rosen, Rubinfeld]; Max Match [Nguyen & Onak], [Yoshida, Yamamoto, Ito]
• Testing triangle-freeness: [Alon, Fischer, Krivelevich, Szegedy], [Alon], [Alon, Kaufman, Krivelevich, R]
Towards an algorithm I
Start with the following assumptions (removed later):
• Can sample a uniform edge
• Can query $t(e)$: the number of triangles that edge $e$ participates in
• Also assume that we know $m$ (an estimate suffices; use [Feige]) and a constant-factor estimate of $t$ (can be removed by search)
Given these assumptions, can get a $(1\pm\epsilon)$-estimate of $t$:
Select $q$ edges uniformly at random. Denote the sample by $Y$
Query $t(e)$ for each $e$ in $Y$
Return $\frac{m}{3q}\sum_{e\in Y} t(e)$
Analysis
• Since $\sum_e t(e) = 3t$, $\mathrm{Exp}_e[t(e)] = 3t/m$,
• so $\mathrm{Exp}\big[\frac{1}{3q}\sum_{e\in Y} t(e)\big] = t/m$
• To get h.c.p.: suffices to take $q = O((m/t)\cdot\max_e\{t(e)\})$ (for constant $\epsilon$)
• Difficulty: $\max_e\{t(e)\}$ may be large
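The idealized estimator above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' code: the helper names `triangles_per_edge` and `estimate_with_oracle` are hypothetical, and the oracle for t(e) is simulated by precomputing it from the whole graph, which a sublinear algorithm could not afford.

```python
import random
from itertools import combinations


def triangles_per_edge(n, edges):
    """Simulated oracle: t(e) = number of common neighbors of e's endpoints."""
    adj = [set() for _ in range(n)]
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return {(u, v): len(adj[u] & adj[v]) for u, v in edges}


def estimate_with_oracle(edges, t_of, q, rng):
    """Sample q uniform edges and return (m / 3q) * sum of t(e) over the sample."""
    m = len(edges)
    sample = [rng.choice(edges) for _ in range(q)]
    return m / (3 * q) * sum(t_of[e] for e in sample)


# Complete graph K6: m = 15 edges, t = 20 triangles, and t(e) = 4 for every edge.
n = 6
edges = list(combinations(range(n), 2))
t_of = triangles_per_edge(n, edges)
estimate = estimate_with_oracle(edges, t_of, q=10, rng=random.Random(0))
```

On K6 every edge has the same t(e), so any sample gives the exact count. On graphs where a few edges carry most of the triangles the sample sum fluctuates, which is the difficulty noted in the last bullet.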
Towards an algorithm II: Bounding $t(e)$
Modify $t(e)$ so that $e = (u,v)$ is only assigned triangles $(u,v,w)$ s.t. $d(w) > d(u), d(v)$ (break ties by id).
Observe: each triangle is assigned to a single edge, so $\sum_e t(e) = t$.
Claim: $t(e) = O(m^{1/2})$.
Proof: If $d(u) \le m^{1/2}$, then immediate. Otherwise ($d(u) > m^{1/2}$), every assigned $w$ has $d(w) > d(u) > m^{1/2}$, and the number of neighbors $w$ of $u$ with degree at least $m^{1/2}$ is $O(m^{1/2})$ (or else get more than $m$ edges).
If have oracle access to (modified definition of) $t(e)$ and can sample edges uniformly, get an algorithm with query complexity $O((m/t)\cdot\max_e\{t(e)\}) = O(m^{3/2}/t)$.
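To see the effect of the modified assignment, the sketch below computes it by brute force on a small "book" graph, in which one edge lies in every triangle. The function name `assigned_counts` and the example graph are assumptions chosen for illustration. The code reads the whole graph, so it only demonstrates the definition and is not part of the sublinear algorithm.

```python
def assigned_counts(n, edges):
    """Modified t(e): count only triangles (u, v, w) whose third vertex w
    outranks both u and v, where rank = (degree, id)."""
    adj = [set() for _ in range(n)]
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    def rank(x):
        return (len(adj[x]), x)  # ties in degree are broken by id

    return {
        (u, v): sum(1 for w in adj[u] & adj[v] if rank(w) > max(rank(u), rank(v)))
        for u, v in edges
    }


# "Book" graph: spine edge (0, 1) plus k pages w = 2..k+1, each adjacent to 0 and 1.
# Unmodified, the spine lies in all k triangles.
k = 5
edges = [(0, 1)] + [(x, w) for w in range(2, k + 2) for x in (0, 1)]
counts = assigned_counts(k + 2, edges)
```

Here the spine edge is assigned no triangles, each edge (0, w) is assigned exactly one, and the counts still sum to t = k.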
Towards an algorithm III: Removing oracle assumption (for $t(e)$)
Procedure replacing the oracle for $t(e)$, given edge $e = (u,v)$:
Consider the lower-degree endpoint of $e = (u,v)$; wlog, it is $u$
• Select a neighbor $w$ of $u$ uniformly at random
• Query the pair $(w,v)$
• If $(w,v) \in E$ and $d(w) > d(u), d(v)$, set $\tilde{t}(e) = d(u)$; o.w., $\tilde{t}(e) = 0$
Analysis (for fixed $e$)
• $\mathrm{Exp}[\tilde{t}(e)] = \Pr[\text{hit triangle assigned to } e]\cdot d(u) = (t(e)/d(u))\cdot d(u) = t(e)$
• If $d(u) \le m^{1/2}$ then $\tilde{t}(e) \le m^{1/2}$
• Otherwise, to reduce variance “internal to procedure”, let $\tilde{t}(e)$ be the average value over $d(u)/m^{1/2}$ repetitions of the above.
Resulting algorithm for estimating $t$:
Select $q = O(m^{3/2}/t)$ edges uniformly at random. Denote the sample by $Y$
Run the procedure on each $e$ in $Y$ to get $\tilde{t}(e)$
Return $\frac{m}{q}\sum_{e\in Y}\tilde{t}(e)$
Expected query complexity: $O(m^{3/2}/t)$
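The per-edge procedure can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function names are hypothetical, set membership stands in for the vertex-pair query, `len(adj[x])` stands in for the degree query, and rounding the repetition count up with `ceil` is an assumption. The helper `exact_mean` enumerates every choice of w so that the expectation claim can be checked without randomness.

```python
import math
import random


def make_adj(n, edges):
    adj = [set() for _ in range(n)]
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def rank(adj, x):
    return (len(adj[x]), x)  # degree, ties broken by id


def trial_value(adj, u, v, w):
    """Value of one trial for edge (u, v) once neighbor w of u has been drawn."""
    hit = w in adj[v] and rank(adj, w) > max(rank(adj, u), rank(adj, v))
    return len(adj[u]) if hit else 0


def estimate_edge(adj, e, m, rng):
    """Randomized estimate whose expectation is the modified t(e)."""
    u, v = sorted(e, key=lambda x: rank(adj, x))  # u is the lower-degree endpoint
    reps = max(1, math.ceil(len(adj[u]) / math.sqrt(m)))
    nbrs = sorted(adj[u])
    return sum(trial_value(adj, u, v, rng.choice(nbrs)) for _ in range(reps)) / reps


def exact_mean(adj, e):
    """Exact expectation of a single trial, enumerating every choice of w."""
    u, v = sorted(e, key=lambda x: rank(adj, x))
    return sum(trial_value(adj, u, v, w) for w in adj[u]) / len(adj[u])


# Book graph: edge (0, 1) plus k pages adjacent to both endpoints; t = k triangles.
k = 5
edges = [(0, 1)] + [(x, w) for w in range(2, k + 2) for x in (0, 1)]
adj = make_adj(k + 2, edges)
total = sum(exact_mean(adj, e) for e in edges)  # sums to t over all edges
sample = estimate_edge(adj, (0, 2), len(edges), random.Random(1))
```

For the edge (0, 2) the lower-degree endpoint has degree 2, so a single run returns either 0 or 2 and its mean is 1, the number of triangles assigned to that edge.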
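Towards an algorithm IV: Removing assumption on unif edge selection
Idea: Select a subset $S$ of vertices uniformly at random, and consider the set of incident (“ordered”) edges $E(S) = \{(u,v) : u \in S, v \in \Gamma(u)\}$, where $\Gamma(u)$ is the set of neighbors of $u$
If we query the degrees of all vertices in $S$, can sample an edge uniformly in $E(S)$
Algorithm (almost..)
Select $s = O(n/t^{1/3})$ vertices uniformly at random. Denote the sample by $S$
Select $q = O(m^{3/2}/t)$ edges uniformly at random in $E(S)$. Denote the sample by $Y$
Run the procedure on each $e$ in $Y$ to get $\tilde{t}(e)$
Return $\frac{n}{2s}\cdot\frac{|E(S)|}{q}\sum_{e\in Y}\tilde{t}(e)$
$\mathrm{Exp}\big[\frac{n}{2s}\cdot\frac{|E(S)|}{q}\sum_{e\in Y}\tilde{t}(e)\big] = \frac{n}{2s}\cdot\frac{s\cdot d_{avg}}{q}\cdot q\cdot\frac{t}{m} = t$
Can show that by modifying $t(e)$ and the procedure that computes $\tilde{t}(e)$, get an algorithm that computes a $(1\pm\epsilon)$-estimate of $t$ by performing $O(n/t^{1/3} + \min\{m^{3/2}/t, m\})$ queries in expectation.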
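Sampling a uniform ordered edge from E(S) needs only the degrees of the sampled vertices and one neighbor query. The sketch below is an illustration under assumptions: the function names are hypothetical, adjacency lists stand in for the degree and neighbor queries, and S is drawn with replacement.

```python
import random


def sample_ordered_edge(adj, S, rng):
    """Uniform element of E(S) = {(u, v): u in S, v a neighbor of u}.
    Assumes at least one vertex of S has positive degree."""
    degs = [len(adj[u]) for u in S]           # one degree query per sampled vertex
    u = rng.choices(S, weights=degs, k=1)[0]  # pick u with probability d(u) / |E(S)|
    v = rng.choice(adj[u])                    # then a uniform neighbor of u
    return u, v


def scale_factor(n, adj, S, q):
    """The factor (n / 2s) * (|E(S)| / q) that multiplies the sum of per-edge estimates."""
    size_ES = sum(len(adj[u]) for u in S)
    return n / (2 * len(S)) * size_ES / q


# Hypothetical 4-vertex example: triangle 0-1-2 with pendant vertex 3, so m = 4.
adj = [[1, 2], [0, 2], [1, 0, 3], [2]]
n, m = 4, 4
rng = random.Random(0)
S = [rng.randrange(n) for _ in range(2)]
u, v = sample_ordered_edge(adj, S, rng)
```

As a sanity check, when S is the whole vertex set, |E(S)| = 2m and the factor reduces to m/q, the scaling used when edges could be sampled uniformly from the entire graph.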
Towards an algorithm IV: Removing assumption on unif edge selection
Algorithm (almost)
Select $s = O(n/t^{1/3})$ vertices uniformly at random. Denote the sample by $S$
Select $q = O(m^{3/2}/t)$ edges uniformly at random in $E(S)$. Denote the sample by $Y$
Run the procedure on each $e$ in $Y$ to get $\tilde{t}(e)$
Return $\frac{n}{2s}\cdot\frac{|E(S)|}{q}\sum_{e\in Y}\tilde{t}(e)$
What’s missing?
By slightly generalizing what we have already shown, whp, $\frac{|E(S)|}{q}\sum_{e\in Y}\tilde{t}(e)$ is a good approximation of $\sum_{e\in E(S)} t(e)$.
If we write $\sum_{e\in E(S)} t(e)$ as $\sum_{v\in S} t(v)$, where $t(v) = \sum_{e\in E(v)} t(e)$, we would like to show that $\frac{n}{s}\sum_{v\in S} t(v)$ is close to $\sum_{v\in V} t(v) = 2t$.
We show this for a variant of $t(v)$ (and of $t(e)$), which requires modifying the procedure for $\tilde{t}(e)$.
Lower bound idea(s)
Recall: $\Omega(n/t^{1/3} + \min\{m^{3/2}/t, m\})$
LB of $\Omega(n/t^{1/3})$ is a simple “hitting” lower bound: with fewer than $n/t^{1/3}$ queries cannot distinguish between:
• An empty graph: no triangles
• A graph containing a clique on $t^{1/3}$ vertices and an independent set on the remaining $n - t^{1/3}$ vertices: $\Theta(t)$ triangles
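A short calculation behind the hitting bound, as a sketch: the shorthand $c = t^{1/3}$ and the phrasing in terms of uniformly sampled vertices are assumptions made here for illustration, not notation from the talk.

```latex
% c = t^{1/3} is shorthand introduced here, not notation from the talk.
\[
  \binom{c}{3} = \frac{c(c-1)(c-2)}{6} = \Theta(c^{3}) = \Theta(t),
  \qquad
  \Pr[\text{a uniformly sampled vertex lies in the clique}] = \frac{c}{n} = \frac{t^{1/3}}{n}.
\]
% Expected number of uniform vertex samples until the first clique vertex is seen:
\[
  \frac{n}{c} = \frac{n}{t^{1/3}}.
\]
```

Until some query touches a clique vertex, every answer (degree 0, no neighbors, no edge) is the same in both graphs, which is why roughly $n/t^{1/3}$ queries are needed.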
Lower bound idea(s) continued
LB of $\Omega(m^{3/2}/t)$ (for $t \ge m^{1/2}$)
Basic structure: complete bipartite graph with both sides of size $m^{1/2}$ (remaining vertices form an independent set). No triangles.
Consider adding edges between vertices on the lhs of the bipartite graph. Each edge gives $m^{1/2}$ triangles. (For example: for $t = \Theta(m)$, add a (random) perfect matching.)
Small difficulty: degrees of lhs vertices “give it away”. Take care of this by removing bipartite edges and adding matching edges on the rhs.
Intuition for LB: Let $k$ be the number of added edges, so that $k = t/m^{1/2}$. The probability of “hitting” an added edge (or a removed edge) is $k/m = t/m^{3/2}$.
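The basic structure can be checked on a small instance. The sketch below builds only the first version of the construction (complete bipartite graph plus added lhs edges) and omits the degree-hiding fix. The function names, the side size `r` (standing in for $m^{1/2}$), and the choice of consecutive lhs pairs as the added matching are assumptions for illustration. `m` counts only the bipartite edges.

```python
def count_triangles(adj):
    """Exact count by brute force over triples u < v < w (reads the whole graph)."""
    return sum(
        1
        for u in adj
        for v in adj[u] if u < v
        for w in adj[u] & adj[v] if v < w
    )


def bipartite_with_added_edges(r, k):
    """K_{r,r} on lhs 0..r-1 and rhs r..2r-1, plus k matching edges inside the lhs."""
    assert 2 * k <= r, "a matching on the lhs has at most r/2 edges"
    adj = {x: set() for x in range(2 * r)}
    pairs = [(a, b) for a in range(r) for b in range(r, 2 * r)]
    pairs += [(2 * i, 2 * i + 1) for i in range(k)]  # added lhs edges
    for a, b in pairs:
        adj[a].add(b)
        adj[b].add(a)
    return adj


r, k = 4, 2
m = r * r                                   # bipartite edges only
plain = bipartite_with_added_edges(r, 0)    # triangle-free
planted = bipartite_with_added_edges(r, k)  # each added edge closes r triangles
t = count_triangles(planted)
hit_probability = k / m                     # chance a uniform bipartite-edge slot is "special"
```

With these values the planted graph has t = k·r triangles, and k/m agrees with t/m^{3/2}, matching the intuition on the slide.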
Summary
Present an algorithm computing $\hat{t}$ s.t. with high constant probability $(1-\epsilon)t \le \hat{t} \le (1+\epsilon)t$.
Expected query complexity: $O(n/t^{1/3} + \min\{m, m^{3/2}/t\}) \cdot \mathrm{poly}(\log n, 1/\epsilon)$
Main ideas:
• Assign triangles to edges so that each edge $e$ is assigned $t(e) = O(m^{1/2})$ triangles (if had oracle to $t(e)$ and could sample edges uniformly, would be done)
• Give a simple procedure for computing a r.v. $\tilde{t}(e)$ s.t. $\mathrm{Exp}[\tilde{t}(e)] = t(e)$ (if could sample edges uniformly, would be done)
• Replace uniform sampling of edges from the entire graph by uniformly sampling edges incident to a uniformly sampled subset of vertices.
Matching lower bound (up to $\mathrm{polylog}(n)$ factors and for constant $\epsilon$)