Graph Algebra with Pattern Matching and Aggregation Support
Graphs Nowadays • Variety of Sources • Scientific Studies • Business Activities • Social Needs • Internet • Data are often • Large Scale • Highly Linked • Schema-less
Managing Graph Data • Primary Role of a Database • Persistent store • Efficient query • RDBMS • Storage Model: vertices and edges as tuples • Query: links are followed by joins • Graph Database • Storage Model: graphs • Query: path traversal
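The contrast between the two query styles can be sketched as follows. This is a minimal illustration, not taken from the slides: the edge list, the function names and the "two-hop neighbours" query are all assumptions chosen for the example.

```python
# Relational view: every edge is a (src, dst) tuple in one table.
edge_table = [("a", "b"), ("b", "c"), ("b", "d"), ("c", "e")]


def two_hop_by_join(edges, start):
    """Self-join of the edge table on e1.dst == e2.src."""
    return sorted({e2[1] for e1 in edges for e2 in edges
                   if e1[0] == start and e1[1] == e2[0]})


def build_adjacency(edges):
    """Graph view: each vertex keeps a list of its out-neighbours."""
    adjacency = {}
    for src, dst in edges:
        adjacency.setdefault(src, []).append(dst)
    return adjacency


def two_hop_by_traversal(adjacency, start):
    """Follow links directly instead of scanning the whole edge table."""
    return sorted({w for v in adjacency.get(start, [])
                   for w in adjacency.get(v, [])})


print(two_hop_by_join(edge_table, "a"))
print(two_hop_by_traversal(build_adjacency(edge_table), "a"))
```

Both functions answer the same question; the join version compares every pair of edge tuples, while the traversal version only touches the neighbourhoods it walks through.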
Why not RDBMS? • Schema Issues • Each inserted data item may have a different schema (Web Graph) • Hard to represent semi-structured information • Scalability Issues • ACID properties vs. CAP theorem • Query Performance • Join-intensive queries are difficult to optimize
Graph Databases and Query Languages No Universal Languages !!!
No Universal Language Like SQL? • No commonly agreed algebra • Relational Algebra? • Expressive, and proven effective over time • NOT suitable for GRAPH • Graph Algebra? • Still at a preliminary stage
Issues with Relational Algebra (RA) • Defined on tuples or sets of tuples • Mismatch with the nature of graphs • Operators lose their semantics • What are Union, Intersection, Join on a GRAPH? • I/O type? • Tables, not GRAPHs • Domain-centric, not data-centric • Does not anticipate out-of-order data • Treats tuples as independent • Unaware of the links among tuples • Queries written using RA are verbose and complex
Advantages of Graph Algebra • An algebra is itself a query language • Easy to derive a language with strong theoretical support • Evaluate the expressiveness of given languages • Justify when to use which: Gremlin, Cypher, etc. • Query Optimization • Operator order EQUALS execution plan • Algebraic equivalence IMPLIES query optimization
Advantages of Graph Algebra • Separation of Query and System: • One can write a query on any system as long as a common algebra is supported. • Knowing RA, one can write SQL, PL/SQL, MS/SQL on MySQL, Oracle, SQL Server • Integrate new operators into the database: • Current graph database systems do not support newly developed queries: • Graph OLAP, Graph Cube, Graph Aggregation, etc. • A proper algebra can incorporate these operators
Existing Works on Graph Algebra • Graph QL [1] • A graph-based algebra whose operators work on graphs • Selection • Join (not properly defined) • Template • VAQL [2] • Focused on visualization • Selection • Aggregation (restricted) • Visualization • Selection is restricted to isomorphism • Aggregation is not defined over edges • No algebraic equivalence [1] He, Huahai, and Ambuj K. Singh. "Graphs-at-a-time: query language and access methods for graph databases." Proceedings of the 2008 ACM SIGMOD international conference on Management of data. ACM, 2008. [2] Shaverdian, Anna A., et al. "A graph algebra for scalable visual analytics." Computer Graphics and Applications, IEEE 32.4 (2012): 26-33.
What Do We Want from a Graph Algebra? • Universal • Independent of graph types: • Directed vs. undirected. Simple vs. hyper. Homogeneous vs. heterogeneous. • Expressive • Able to answer typical graph queries: • Pattern matching, reachability, path finding, etc. • Covers Relational Algebra (RA) • This ensures that a graph database can handle relational data as well • Scalable • Able to manage data at scale • Supports queries that summarize and aggregate data
Extended Algebra – Graph Model • The data model is an attributed graph with the following components: • A vertex set; each vertex has a unique ID • An edge set • Attributes for each vertex • Attributes for each edge • Edges carry identifiers as well • In a simple graph, an edge can be represented by its endpoints • Information (attributes) for the graph as a whole
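One way to picture the graph model is as a small container type. The sketch below is an assumption for illustration only: the class name `AttributedGraph`, its field names and its methods do not come from the slides, which define the model mathematically.

```python
from dataclasses import dataclass, field


@dataclass
class AttributedGraph:
    """Hypothetical container mirroring the components listed on the slide."""
    vertices: dict = field(default_factory=dict)     # vertex id -> attribute dict
    edges: dict = field(default_factory=dict)        # edge id -> (src id, dst id)
    edge_attrs: dict = field(default_factory=dict)   # edge id -> attribute dict
    graph_attrs: dict = field(default_factory=dict)  # attributes of the whole graph

    def add_vertex(self, vid, **attrs):
        self.vertices[vid] = dict(attrs)

    def add_edge(self, eid, src, dst, **attrs):
        # Edges have their own identifier, so parallel edges stay distinct.
        if src not in self.vertices or dst not in self.vertices:
            raise KeyError("both endpoints must already be vertices")
        self.edges[eid] = (src, dst)
        self.edge_attrs[eid] = dict(attrs)


g = AttributedGraph(graph_attrs={"name": "toy social network"})
g.add_vertex(1, name="Ann")
g.add_vertex(2, name="Bob")
g.add_edge("e1", 1, 2, label="friend")
g.add_edge("e2", 1, 2, label="colleague")  # a second edge between the same endpoints
```

Keeping edge identifiers separate from endpoints is what lets the same structure hold non-simple graphs; in a simple graph the `(src, dst)` pair alone would be enough.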
Extended Algebra – Operators • Projection • Restriction • Unification • Pattern Matching • Aggregation
Operators: Projection • Purpose: • Select the data the user is interested in from the base graph • Syntax: • The operator takes attribute lists for vertices, edges and the graph • The result is a new graph whose attributes are trimmed to those lists
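Projection as described above can be sketched in a few lines. The dictionary layout of the graph and the function name `project` are assumptions made for this example; the slide's own notation is not reproduced here.

```python
def project(graph, vertex_attrs, edge_attrs, graph_attrs):
    """Return a new graph that keeps only the listed attributes.

    The topology (vertex ids, edge ids, endpoints) is left unchanged.
    """
    def trim(attrs, keep):
        return {k: v for k, v in attrs.items() if k in keep}

    return {
        "vertices": {vid: trim(a, vertex_attrs)
                     for vid, a in graph["vertices"].items()},
        "edges": {eid: {"src": e["src"], "dst": e["dst"],
                        "attrs": trim(e["attrs"], edge_attrs)}
                  for eid, e in graph["edges"].items()},
        "attrs": trim(graph["attrs"], graph_attrs),
    }


base = {
    "vertices": {1: {"name": "Ann", "age": 30}, 2: {"name": "Bob", "age": 25}},
    "edges": {"e1": {"src": 1, "dst": 2,
                     "attrs": {"label": "friend", "since": 2010}}},
    "attrs": {"source": "toy data", "version": 1},
}

names_only = project(base, vertex_attrs={"name"}, edge_attrs={"label"},
                     graph_attrs=set())
```

The base graph is not modified, which matches the slide's statement that the result is a new graph.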
Operators: Restriction • Purpose: • Restrict the base graph by attribute values • Syntax: • Vertex restriction: select all vertices (and their induced edges) that match the vertex predicate • Edge restriction: select all edges (and their endpoints) that match the edge predicate • Graph restriction: select graphs in which every vertex matches the vertex predicate, every edge matches the edge predicate, and the graph itself matches the graph predicate
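The three restriction variants can be sketched as below. The graph layout, the function names and the example predicates are assumptions for illustration; predicates are modelled as plain Python functions over attribute dictionaries.

```python
def restrict_vertices(graph, pred):
    """Keep vertices whose attributes satisfy pred, plus the edges they induce."""
    kept = {vid: a for vid, a in graph["vertices"].items() if pred(a)}
    edges = {eid: e for eid, e in graph["edges"].items()
             if e["src"] in kept and e["dst"] in kept}
    return {"vertices": kept, "edges": edges, "attrs": dict(graph["attrs"])}


def restrict_edges(graph, pred):
    """Keep edges whose attributes satisfy pred, plus their endpoints."""
    edges = {eid: e for eid, e in graph["edges"].items() if pred(e["attrs"])}
    endpoints = {e[k] for e in edges.values() for k in ("src", "dst")}
    vertices = {vid: a for vid, a in graph["vertices"].items() if vid in endpoints}
    return {"vertices": vertices, "edges": edges, "attrs": dict(graph["attrs"])}


def restrict_graphs(graphs, v_pred, e_pred, g_pred):
    """Keep whole graphs in which every vertex, every edge and the graph match."""
    return [g for g in graphs
            if all(v_pred(a) for a in g["vertices"].values())
            and all(e_pred(e["attrs"]) for e in g["edges"].values())
            and g_pred(g["attrs"])]


base = {
    "vertices": {1: {"age": 30}, 2: {"age": 20}, 3: {"age": 40}},
    "edges": {
        "e1": {"src": 1, "dst": 2, "attrs": {"type": "friend"}},
        "e2": {"src": 1, "dst": 3, "attrs": {"type": "friend"}},
        "e3": {"src": 2, "dst": 3, "attrs": {"type": "comment"}},
    },
    "attrs": {"year": 2014},
}

adults = restrict_vertices(base, lambda a: a["age"] >= 30)
comments = restrict_edges(base, lambda a: a["type"] == "comment")
```

Note the asymmetry: vertex restriction drops edges that lose an endpoint, while edge restriction pulls in exactly the endpoints of the surviving edges.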
Operator: Unification • Purpose: • Concatenate graphs • Syntax: • Vertex unification: unify vertices with identical IDs • Edge unification: add edges between pairs of vertices that match a predicate • Attribute unification: create a virtual vertex for each distinct value of the unified attribute
Operator: Unification P(v1,v1) and P(v4,v5) are true
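Vertex and edge unification can be sketched as follows. Several details here are assumptions that the slides do not state: edges are plain `(src, dst)` pairs, attribute conflicts during vertex unification are resolved in favour of the second graph, and edge unification expects the two graphs to use disjoint vertex IDs.

```python
def unify_vertices(g1, g2):
    """Vertex unification: vertices with identical ids become one vertex."""
    vertices = {vid: dict(a) for vid, a in g1["vertices"].items()}
    for vid, attrs in g2["vertices"].items():
        # Assumption: on conflicting attribute values, g2 overrides g1.
        vertices.setdefault(vid, {}).update(attrs)
    return {"vertices": vertices, "edges": set(g1["edges"]) | set(g2["edges"])}


def unify_edges(g1, g2, pred):
    """Edge unification: add an edge (v1, v2) whenever pred(v1, v2) holds.

    Assumption: g1 and g2 use disjoint vertex ids.
    """
    vertices = {**g1["vertices"], **g2["vertices"]}
    new_edges = {(a, b)
                 for a, attrs_a in g1["vertices"].items()
                 for b, attrs_b in g2["vertices"].items()
                 if pred(attrs_a, attrs_b)}
    return {"vertices": vertices,
            "edges": set(g1["edges"]) | set(g2["edges"]) | new_edges}


g1 = {"vertices": {1: {"name": "Ann"}, 2: {"name": "Bob"}}, "edges": {(1, 2)}}
g2 = {"vertices": {2: {"city": "Oslo"}, 3: {"name": "Cy"}}, "edges": {(2, 3)}}
g3 = {"vertices": {10: {"name": "Ann"}, 11: {"name": "Dee"}}, "edges": set()}

merged = unify_vertices(g1, g2)
linked = unify_edges(g1, g3, lambda a, b: a["name"] == b["name"])
```

In `merged`, vertex 2 carries the attributes from both inputs; in `linked`, the only new edge connects the two vertices named "Ann".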
Operator: Pattern Matching • Purpose: • Find subgraphs of the base graph that match a given pattern • Syntax: • The pattern is itself a graph; its definition comes from [1] • One form returns all the matching graphs • The other form returns an abstract matching, in which only the vertices that appear in the pattern are returned [1] Fan, Wenfei, et al. "Adding regular expressions to graph reachability and pattern queries." Data Engineering (ICDE), 2011 IEEE 27th International Conference on. IEEE, 2011.
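To make the idea of matching a pattern graph concrete, here is a deliberately simplified sketch. It is not the regular-expression pattern semantics of [1]: it is a brute-force matcher that binds each pattern vertex to a distinct graph vertex, checks a per-vertex predicate, and requires every pattern edge to exist in the base graph. All names and the data layout are assumptions.

```python
from itertools import permutations


def match_pattern(graph_vertices, graph_edges, pattern_vertices, pattern_edges):
    """Return every binding of pattern vertices to distinct graph vertices.

    graph_vertices:   {vertex id: attribute dict}
    graph_edges:      set of (src id, dst id)
    pattern_vertices: {pattern name: predicate over an attribute dict}
    pattern_edges:    list of (pattern name, pattern name)
    """
    names = list(pattern_vertices)
    matches = []
    # Exhaustive search: fine for a toy graph, exponential in general.
    for combo in permutations(graph_vertices, len(names)):
        binding = dict(zip(names, combo))
        if not all(pattern_vertices[n](graph_vertices[binding[n]]) for n in names):
            continue
        if all((binding[a], binding[b]) in graph_edges for a, b in pattern_edges):
            matches.append(binding)
    return matches


vertices = {1: {"kind": "person"}, 2: {"kind": "person"}, 3: {"kind": "post"}}
edges = {(1, 2), (2, 1), (1, 3)}

is_person = lambda attrs: attrs["kind"] == "person"
mutual = match_pattern(vertices, edges,
                       {"x": is_person, "y": is_person},
                       [("x", "y"), ("y", "x")])
```

Each returned binding lists only the pattern's vertices, which is close in spirit to the abstract matching mentioned on the slide; returning the full matched subgraph would additionally collect the edges between the bound vertices.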
Operator: Aggregation • Purpose: • To summarize a given graph • Syntax: • Graph aggregation: every vertex is supplied to a vertex aggregate function and every edge set is supplied to an edge aggregate function • Vertex aggregation: given a set of vertices, group them by a grouping criterion • Edge aggregation: given a set of edges, group them by a grouping criterion
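A small sketch of what a graph summary can look like: vertices are grouped by one attribute and edges are grouped by the groups of their endpoints. Using counts as the aggregate, as well as the function name and data layout, are assumptions for this example; the slides leave the aggregate functions abstract.

```python
from collections import Counter


def aggregate_by(vertices, edges, key):
    """Group vertices by attribute `key` and count edges between the groups."""
    group_of = {vid: attrs[key] for vid, attrs in vertices.items()}
    # Vertex aggregation: one super-vertex per distinct value, with its size.
    super_vertices = Counter(group_of.values())
    # Edge aggregation: one super-edge per pair of groups, with its multiplicity.
    super_edges = Counter((group_of[s], group_of[d]) for s, d in edges)
    return dict(super_vertices), dict(super_edges)


vertices = {1: {"city": "A"}, 2: {"city": "A"}, 3: {"city": "B"}}
edges = [(1, 2), (1, 3), (2, 3)]

summary_vertices, summary_edges = aggregate_by(vertices, edges, "city")
```

Swapping `Counter` for a sum or an average over another attribute would give other summaries of the same shape.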
Expressiveness • This set of operators is more expressive than Relational Algebra and Graph QL • It can represent many graph queries • Reachability • Graph Cube computation • I-OLAP and T-OLAP
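For reference, this is what a reachability query computes, written as a plain breadth-first search. It is not an encoding in the proposed operators; the adjacency-list layout and the function name are assumptions for illustration.

```python
from collections import deque


def reachable(adjacency, source, target):
    """Return True if target can be reached from source along directed edges."""
    seen = {source}
    queue = deque([source])
    while queue:
        vertex = queue.popleft()
        if vertex == target:
            return True
        for neighbour in adjacency.get(vertex, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return False


adjacency = {"a": ["b"], "b": ["c", "d"], "c": ["a"], "e": ["a"]}
print(reachable(adjacency, "a", "d"))
print(reachable(adjacency, "a", "e"))
```

An algebra that claims to cover reachability has to return the same answers as this procedure for every pair of vertices.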
Algebra Equivalence • When operators are chained, they form a query execution plan • Example query: Find the network induced by the person whose friends comment on each other’s posts with birthday greater than 1989. Output those names as a graph • [Figure: execution plan for the example query; labels: Base Graph, friend, Comment, friend, Matched Result, Restriction, V-Unification, v.name]
Algebra Equivalence • To generate multiple execution plans for the same query, we need theoretical support: • Identity Equivalence: • An operator can be represented by other operators • // p is a common attribute predicate • D(P) decomposes a pattern P into edges • // • ...
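The role of an equivalence rule can be illustrated with a generic example that is not one of the slide's own rules: two vertex restrictions applied in either order, or fused into one restriction with the conjoined predicate, give the same graph. An optimizer that knows this is free to pick the cheapest of the three plans. The data layout, function name and predicates below are assumptions.

```python
def restrict_vertices(vertices, edges, pred):
    """Keep vertices satisfying pred together with the edges they induce."""
    kept = {vid: attrs for vid, attrs in vertices.items() if pred(attrs)}
    induced = {(s, d) for s, d in edges if s in kept and d in kept}
    return kept, induced


V = {
    1: {"age": 30, "city": "Oslo"},
    2: {"age": 20, "city": "Oslo"},
    3: {"age": 40, "city": "Rome"},
    4: {"age": 35, "city": "Oslo"},
}
E = {(1, 2), (1, 4), (3, 4), (4, 1)}

p = lambda attrs: attrs["age"] > 25
q = lambda attrs: attrs["city"] == "Oslo"

plan_a = restrict_vertices(*restrict_vertices(V, E, p), q)    # p first, then q
plan_b = restrict_vertices(*restrict_vertices(V, E, q), p)    # q first, then p
plan_c = restrict_vertices(V, E, lambda a: p(a) and q(a))     # fused predicate
```

All three plans keep vertices 1 and 4 and the edges between them; they differ only in how much intermediate data each step has to handle.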
Conclusion • Graph Algebra plays an important role in graph database development • We take one step forward by proposing a Graph Algebra which: • extends existing algebraic work with • Regular pattern matching • Aggregation • is expressive and well-defined • contains equivalence rules for further query optimization