Towards Efficient Query Processing on Massive Evolving Graphs (C-Big 2012)
Arash Fard, Amir Abdolrashidi, Lakshmish Ramaswamy and John A. Miller (UGA)
Presentation by: Charith Wickramaarachchi
Time Evolving Graph
• A paradigm for modeling dynamic relationships in networks.
• TEG: a series of snapshots of a graph that evolves over time.
• Examples:
  • Web graph
  • Relationship structure of social networks
  • Communication flow networks
  • Evolution history of genome families
TEG and Scalability
• Additional dimension: time
• New queries
  • Historical
  • Inverse temporal
  • Continuous
• Data volume
• Indexing
Overview
• Data distribution strategies for TEGs
• Answering reachability queries
• Subgraph queries in large TEGs
TEG distribution
• Objectives
  • Improve node utilization
  • Minimize the communication cost
• Strategies
  • Random distribution
    • Improves node utilization
    • High communication
  • Connected sub-graph distribution
    • Low communication
    • Low node utilization
• Type of algorithm
  • High communication, low computation
    • PageRank, HCC: min-cut
  • Low communication
    • SSSP: random distribution
• Dynamic nature of the graph
  • Additions and deletions of nodes
• Repartitioning cost
  • Data transfer cost
  • Re-wiring cost
• Data node configuration
  • More partitions than compute nodes (Partition : CC)
  • Smaller partitions
  • Small stragglers
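The trade-off between the two distribution strategies can be made concrete with a small sketch. It is only an illustration under stated assumptions: the function names (`random_partition`, `component_partition`, `cut_edges`, `max_load`) and the greedy placement of components are hypothetical and are not taken from the paper. Cut edges stand in for communication cost, and the largest partition size stands in for node utilization.

```python
import random
from collections import Counter

def random_partition(vertices, k, seed=0):
    """Random distribution: each vertex goes to a uniformly chosen partition."""
    rng = random.Random(seed)
    return {v: rng.randrange(k) for v in vertices}

def connected_components(vertices, edges):
    """Group vertices into weakly connected components with union-find."""
    parent = {v: v for v in vertices}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in edges:
        parent[find(u)] = find(v)
    groups = {}
    for v in vertices:
        groups.setdefault(find(v), []).append(v)
    return list(groups.values())

def component_partition(vertices, edges, k):
    """Connected sub-graph distribution: keep each component on one partition,
    placing the largest components first on the least loaded partition."""
    load = [0] * k
    assignment = {}
    for comp in sorted(connected_components(vertices, edges), key=len, reverse=True):
        target = load.index(min(load))
        for v in comp:
            assignment[v] = target
        load[target] += len(comp)
    return assignment

def cut_edges(edges, assignment):
    """Edges whose endpoints live on different partitions (communication proxy)."""
    return sum(1 for u, v in edges if assignment[u] != assignment[v])

def max_load(assignment, k):
    """Size of the largest partition (imbalance proxy)."""
    counts = Counter(assignment.values())
    return max(counts.get(i, 0) for i in range(k))

vertices = list(range(8))
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (5, 6)]
by_component = component_partition(vertices, edges, k=2)
by_random = random_partition(vertices, k=2)
print("component:", cut_edges(edges, by_component), max_load(by_component, 2))
print("random:   ", cut_edges(edges, by_random), max_load(by_random, 2))
```

On this toy graph the component-based assignment cuts no edges but puts five of the eight vertices on one partition, which mirrors the low communication and low node utilization noted on the slide.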
Reachability queries in TEGs
• {G1, G2, ..., Gq, ..., Gr}: snapshots of the TEG G
• Diff(Gq, Gq-1): changes between snapshots Gq and Gq-1
  • Vertex addition
  • Edge addition
• Reach(v, w, q): TRUE/FALSE (is w reachable from v in snapshot Gq?)
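One way to picture these definitions is a base snapshot plus a list of diffs. The sketch below is hypothetical: the `TEG` class, the `("add" | "del", u, v)` operation tuples and the `diff` helper are illustrative names, and vertices are represented only implicitly through their edges. Edge deletion is included because later slides refer to delete operations.

```python
from dataclasses import dataclass, field

@dataclass
class TEG:
    """Hypothetical container: snapshot G1 plus one diff per later snapshot."""
    base_edges: set                             # directed edges (u, v) of G1
    diffs: list = field(default_factory=list)   # diffs[i] turns G(i+1) into G(i+2)

    def snapshot(self, q):
        """Materialise the edge set of Gq (1-indexed) by replaying diffs in order."""
        edges = set(self.base_edges)
        for ops in self.diffs[: q - 1]:
            for op, u, v in ops:
                if op == "add":
                    edges.add((u, v))
                elif op == "del":
                    edges.discard((u, v))
        return edges

def diff(older, newer):
    """Edge-level changes that turn the edge set `older` into `newer`."""
    ops = [("add", u, v) for (u, v) in sorted(newer - older)]
    ops += [("del", u, v) for (u, v) in sorted(older - newer)]
    return ops

teg = TEG({("A", "B")}, [[("add", "B", "C")], [("add", "C", "D"), ("del", "A", "B")]])
print(teg.snapshot(3))
print(diff(teg.snapshot(1), teg.snapshot(2)))
```

Storing diffs instead of full snapshots keeps the storage cost proportional to the number of changes, at the price of replaying them when a snapshot is needed.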
Reachability Queries in Static Graphs
• Pre-indexing
  • O(1) query time: pre-computed spanning tree
  • High indexing time
  • Index table
• On-demand traversal
  • O(M+N) per query
• Limitations for TEGs
  • High indexing cost: every snapshot must be indexed
  • High storage overhead
  • Low cost-benefit ratio
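The two static-graph options can be contrasted in a few lines. This is a sketch, not the index structure the slide refers to: a full transitive closure is used here as a simple stand-in for a pre-computed index, and all function names are hypothetical.

```python
from collections import deque

def reach_bfs(adj, u, v):
    """On-demand traversal: O(N + M) per query, nothing stored in advance."""
    seen, queue = {u}, deque([u])
    while queue:
        x = queue.popleft()
        if x == v:
            return True
        for y in adj.get(x, ()):
            if y not in seen:
                seen.add(y)
                queue.append(y)
    return False

def build_closure(adj):
    """Pre-indexing stand-in: store the reachable set of every vertex.
    Costs one traversal per vertex and up to O(N^2) storage."""
    vertices = set(adj) | {y for ys in adj.values() for y in ys}
    closure = {}
    for s in vertices:
        seen, queue = {s}, deque([s])
        while queue:
            x = queue.popleft()
            for y in adj.get(x, ()):
                if y not in seen:
                    seen.add(y)
                    queue.append(y)
        closure[s] = seen
    return closure

def reach_indexed(closure, u, v):
    """Expected O(1) lookup once the index exists."""
    return v in closure.get(u, ())

adj = {"A": ["B"], "B": ["C"], "D": ["A"]}
closure = build_closure(adj)
print(reach_bfs(adj, "A", "C"), reach_indexed(closure, "A", "C"))
print(reach_bfs(adj, "C", "A"), reach_indexed(closure, "C", "A"))
```

Rebuilding such an index for every snapshot of a TEG is what makes the indexing and storage costs listed above grow with the number of snapshots.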
Approach
• Interval-based indexing
Approach
• Steps (assume Reach(u,v,q), where q > p and Gp is indexed)
  • Evaluate Reach(u,v,p) on the indexed snapshot.
  • Does Diff(Gp,Gq) change that answer?
• Naïve approach: process Diff(Gp,Gq) in chronological order.
• A better approach: check whether the changes affect the reachability at all.
Approach
• Reach(A,H,3)
• Add(E,F)? Related?
• Add(B,E) & Add(F,G) & Add(E,F)
Observations
• If Reach(u,v,p) = true
  • The diffs need to be processed only if the diff stack contains at least one Delete(x,y) where (x,y) is an edge on a path from u to v in Gp.
• If Reach(u,v,p) = false
  • The diffs need to be processed only if the diff stack contains at least one Add(x,y) where:
    • x is reachable from u
    • v is reachable from y
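The observations above suggest a filter that decides whether the diffs need to be replayed at all. The sketch below is one possible reading, not the authors' algorithm: it uses a coarser but safe test that looks only at the set of vertices reachable from u in Gp. A deletion matters only if the tail of the deleted edge is in that set, and an addition matters only if the tail of the added edge is in that set. The function names, the `("add" | "del", x, y)` tuples and the example graph are all hypothetical.

```python
from collections import deque

def forward_set(adj, u):
    """All vertices reachable from u (including u) by breadth-first search."""
    seen, queue = {u}, deque([u])
    while queue:
        x = queue.popleft()
        for y in adj.get(x, ()):
            if y not in seen:
                seen.add(y)
                queue.append(y)
    return seen

def apply_ops(adj, ops):
    """Naive approach: replay the diff operations in chronological order."""
    new_adj = {x: set(ys) for x, ys in adj.items()}
    for op, x, y in ops:
        if op == "add":
            new_adj.setdefault(x, set()).add(y)
        else:
            new_adj.get(x, set()).discard(y)
    return new_adj

def reach_at_q(adj_p, ops, u, v):
    """Answer Reach(u, v, q) from snapshot Gp and the ops of Diff(Gp, Gq).
    Returns (answer, replayed); replayed is False when the filter showed that
    the diffs cannot change the answer obtained on Gp."""
    reach_p = forward_set(adj_p, u)
    if v in reach_p:
        # Only a deleted edge leaving the reachable region can break a path.
        relevant = any(op == "del" and x in reach_p for op, x, y in ops)
    else:
        # Any new path must leave the reachable region through an added edge.
        relevant = any(op == "add" and x in reach_p for op, x, y in ops)
    if not relevant:
        return (v in reach_p), False
    adj_q = apply_ops(adj_p, ops)
    return (v in forward_set(adj_q, u)), True

# Hypothetical snapshot Gp: A -> B and G -> H; H is not reachable from A.
adj_p = {"A": {"B"}, "G": {"H"}}
ops = [("add", "E", "F"), ("add", "B", "E"), ("add", "F", "G")]
print(reach_at_q(adj_p, ops, "A", "H"))                  # (True, True)
print(reach_at_q(adj_p, [("add", "X", "Y")], "A", "H"))  # (False, False)
```

In the first call Add(E,F) looks unrelated on its own, but together with Add(B,E) and Add(F,G) it completes a path from A to H, which is why the filter triggers a replay as soon as one added edge starts inside the reachable region.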
Graph Pattern Matching
• Subgraph isomorphism
  • A bijective mapping between the query graph Q(Vq,Eq) and a subgraph G'(V',E') of the target graph G.
  • There exists a bijection f : V' → Vq.
  • For all v', w' in V': (v',w') in E' ↔ (f(v'),f(w')) in Eq.
• Simulation
  • G(V,E) matches Q(Vq,Eq) if there exists a relation R ⊆ Vq × V such that:
    • (u,u') in R → u and u' have the same label.
    • For all u in Vq there is a u' in V with (u,u') in R.
    • For all (u,u') in R and all (u,v) in Eq there is a (u',v') in E with (v,v') in R.
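The simulation conditions can be checked with a simple fixpoint computation. The following is a minimal centralized sketch, assuming adjacency dictionaries and label dictionaries as inputs (the function and parameter names are hypothetical): start from all label-compatible pairs and repeatedly drop pairs that violate the edge condition.

```python
def graph_simulation(q_adj, q_label, g_adj, g_label):
    """Return the maximum simulation as {query vertex: set of data vertices},
    or {} if some query vertex ends up with no match."""
    # Start with every label-compatible pair (u, u').
    sim = {u: {v for v in g_label if g_label[v] == q_label[u]} for u in q_label}
    changed = True
    while changed:
        changed = False
        for u in q_label:
            for u_child in q_adj.get(u, ()):
                # Keep u' only if it has a child that still matches u_child.
                keep = {v for v in sim[u]
                        if any(w in sim[u_child] for w in g_adj.get(v, ()))}
                if keep != sim[u]:
                    sim[u] = keep
                    changed = True
    if any(not matches for matches in sim.values()):
        return {}
    return sim

# Query: a -> b.  Data: 1(A) -> 2(B); 3(A) has no children; 4(B) is isolated.
q_adj, q_label = {"qa": ["qb"]}, {"qa": "A", "qb": "B"}
g_adj = {1: [2]}
g_label = {1: "A", 2: "B", 3: "A", 4: "B"}
print(graph_simulation(q_adj, q_label, g_adj, g_label))
```

Vertex 3 is dropped because it has no child labelled B, while vertex 4 stays a match for `qb` because `qb` has no outgoing edges in the query. Unlike subgraph isomorphism, the result is a relation rather than a one-to-one mapping.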
Vertex Centric approach
• Graph G(V,E,l)
• Query Q(Vq,Eq,lq)
• Output M: a maximum match in G for Q
• Use GPS features
  • Master for global operations
Vertex Centric approach
• 1st: The master broadcasts the query.
• 2nd: Each vertex whose label also appears in Q is flagged.
  • S: the set of matched nodes (note that a vertex v in G can be matched to more than one vertex in Q).
  • Each vertex keeps a set of lists of labels for its possible children.
  • If the number of outgoing edges is smaller than the length of any list of children, remove that match.
  • Send the vertex id to the children.
• 3rd: Children reply with their id and label.
• 4th: If the received child labels are a superset of the labels of the matched children in Q, keep the match; otherwise remove it and pass a removal report to the parents.
• 5th: Remove the reported child from the list and check validity in S. If no longer valid, remove yourself from S and report to the parents.
• Next: Repeat the 5th step.
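The superstep structure can be imitated in plain Python. This sketch only simulates synchronized supersteps in a single process and does not use GPS or any other graph-processing framework; the function name, the message format and the cached `child_match` tables are assumptions made for illustration. Each vertex decides using only the information received in earlier supersteps, and removal reports travel to parents between supersteps.

```python
def vertex_centric_simulation(q_adj, q_label, g_adj, g_label):
    """Simulate the superstep-based matching; returns {query vertex: data vertices}."""
    # Parents (in-neighbours) are needed to route removal reports.
    parents = {v: set() for v in g_label}
    for v, children in g_adj.items():
        for w in children:
            parents[w].add(v)

    # Steps 1-2: the query is known everywhere; each vertex flags the query
    # vertices that carry its own label.
    match = {v: {u for u in q_label if q_label[u] == g_label[v]} for v in g_label}
    # Step 3: children reply; each vertex caches what its children match.
    child_match = {v: {w: set(match[w]) for w in g_adj.get(v, ())} for v in g_label}

    active = set(g_label)  # vertices that must (re)check their matches
    while active:
        outbox = []  # removal reports: (parent, child, removed query vertex)
        for v in active:
            for u in list(match[v]):
                # Every query child of u needs some data child that matches it.
                ok = all(any(uc in m for m in child_match[v].values())
                         for uc in q_adj.get(u, ()))
                if not ok:
                    match[v].discard(u)
                    outbox.extend((p, v, u) for p in parents[v])
        # Superstep barrier: deliver reports, wake up the affected parents.
        active = set()
        for p, child, u in outbox:
            child_match[p][child].discard(u)
            active.add(p)

    result = {u: {v for v in g_label if u in match[v]} for u in q_label}
    return result if all(result.values()) else {}

# Query: a -> b -> c.  Data: 1(A) -> 2(B) with no C child; 5(A) -> 6(B) -> 7(C).
q_adj = {"qa": ["qb"], "qb": ["qc"]}
q_label = {"qa": "A", "qb": "B", "qc": "C"}
g_adj = {1: [2], 5: [6], 6: [7]}
g_label = {1: "A", 2: "B", 5: "A", 6: "B", 7: "C"}
print(vertex_centric_simulation(q_adj, q_label, g_adj, g_label))
```

In the example, vertex 2 drops its match first because it has no child labelled C; its removal report then causes vertex 1 to drop its match in the following superstep, which is the cascading behaviour described in the 4th and 5th steps.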
Conclusion
• TEG processing: an emerging research area with many applications.
• There is a need for new partitioning techniques and graph query techniques.
• Do TEG processing applications benefit more from an EDA-based model than from a traditional query-processing model?