510 likes | 528 Views
Explore how Graph Query Language (GQL) unifies different approaches, supports semantic search, traditional algorithms, metrics-oriented analysis, and more. Discover the computational complexity, future directions, and key features of GQL.
E N D
JTC1 SC32N1634 The Graph Query Language:Towards a Unification of Graph Query Approaches David Silberberg david.silberberg@jhuapl.edu 443-778-6231
Outline • Goals & Example Scenario • Key Features of GQL • Computational Complexity of Query Execution • Future Directions Heritage Style Viewgraphs
Goals of the Graph Query Language (GQL)Project • To unify disparate graph query approaches into a single, seamless, and declarative language • Supports semantic search over graph data structures represented by schemas • Supports traditional graph algorithms that systematically follow edges to discover interesting subgraphs (e.g., shortest path, minimal spanning tree, etc.) • Supports metrics-oriented graphs algorithms (e.g., social network analysis, etc.) • Supports special commands tailored to analysis of graphs • Supports ontology-assisted query • To quantify the scalability of this type of language Heritage Style Viewgraphs
Assumptions • Data model is a typed graph that adheres to a schema • Not XML – graphs tend to be more highly connected • Not a semantic model – inference cannot, in general, be performed on the schema • Data graphs can be large • Query languages are only an abstract representation of questions • The object is finding the right abstraction for the way people think about interacting with graphs • Other query languages onto other data models will work – but do those languages help facilitate or hinder the formulation of those requests or the interpretation of the results? • Algorithms are external to the graph management system • There are too many algorithms • New algorithms may be implemented or modified regularly • We are not the experts in writing efficient algorithms Heritage Style Viewgraphs
Graph Interaction Methods • Graph interactions take many forms • Browse • One-step-at-a-time exploration of a graph • Semantic Schema-Based Search • Several-steps-at-a-time graph query • Algorithms • Find subgraphs • Calculate graph metrics • Analysis • Hypothesis expressions, etc. • GQL is a declarative graph query language for integrating all these approaches! Heritage Style Viewgraphs
Jones Example Scenario • Farmer Jones' lettuce crop did well this year, but few other farmers did well. Why? • First, find Farmer Jones. (Browsing) Heritage Style Viewgraphs
Roman Prize Icy Harvey Bugs Jones Example Scenario • Rabbits usually eat lettuce. Let's find the rabbits that ate Farmer Jones' lettuce. (Semantic Schema-Based Search) Heritage Style Viewgraphs
Roman Crispy Green Leafy Tasty Prize Soft Icy Harvey Peter Bugs Jones Harris Smith Smalltown, USA Example Scenario • Let's look at all the farmers, and their locations, whose lettuce was eaten by fewer than 5 rabbits. (Semantic Schema-Based Search) Heritage Style Viewgraphs
Roman Crispy Green Tasty Leafy Prize Soft Icy Acme Rent-a-Fox Harvey Peter Bugs Jones Harris Smith Sly Red Smalltown, USA Example Scenario • What commonalities do the farmers have with each other and with the rabbits? (Semantic and Algorithmic Search) Heritage Style Viewgraphs
Roman Crispy Green Tasty Leafy Prize Soft Icy Fox Enterprises Acme Rent-a-Fox Harvey Peter Bugs Fred Jones Harris Smith Brer Red Sly Smalltown, USA Example Scenario • If Fred fox ate Prize lettuce, what else would we learn? (Analysis-specific Methods, Semantic Search, and Algorithmic Search) Heritage Style Viewgraphs
Outline • Goals & Example Scenario • Key Features of GQL • Computational Complexity of Query Execution • Future Directions Heritage Style Viewgraphs
Related Work • Four categories of graph query languages and examples • Knowledge base (subject-predicate-object) query languages • SPARQL, RQL, RAL, RDF Query Language • Graph reasoning query languages • OWL-QL, GraphLog, Query and Inference Service for RDF • Query languages with graph operators • GOQL • GRAM • Graphical user interface query language • QGRAPH Heritage Style Viewgraphs
NEXT Features of GQL that Support Analysis • Schema-based graph query • Returns a single graph or a set of graphs (not tables or XML files) • Aliasing • Graph exploration through wildcard search • Embedded queries (helps achieve first order logic expressiveness) • Creates new graph structures in query results • Query over defined patterns (of activity or behavior, for example) • Special commands tailored to analysis • Hypothesis expressions • Composite vertices (of vertices and edges) • External algorithms that return graphs (e.g., shortest path) • External algorithms that return metrics (e.g., social network analysis) • Ontology-assisted graph query Heritage Style Viewgraphs
Fox: fox1 Rabbit: rabbit1 Lettuce: lettuce1 Chases: chases1 Eats: eats3 time: 2pm name: George age: 3 name: Peter age: 2 time: 8am name: PrizeLettuce Eats: eats1 time: 3pm Chases: chases2 Rabbit: rabbit2 Carrot: carrot1 Eats: eats4 time: 5pm name: Bugs age: 4 time: 7pm name: CarrotTop time time Lettuce name Eats Chases Fox: fox2 Rabbit: rabbit3 Carrot: carrot2 Eats: eats2 Eats: eats5 Fox Rabbit Carrot name: Fred age: 2 time: 9am name: Jack age: 1 time: 7am name: BigCarrot name Eats Eats name age name age Lettuce: lettuce2 Eats: eats6 time time time: 8am name: Icy Example Graph Model Heritage Style Viewgraphs
GQL Operators - Overview • Basic Syntax • SUBGRAPH clause • Finds a subgraph in the source graph • CONSTRAINT clause • Filters the subgraph based on property constraints • RETURN clause • Describes the resulting graph or sets of graphs to return • Syntax for analysis • ASSUME clause • Supports hypothesis statements • PATTERN clause • Defines search patterns BACK Heritage Style Viewgraphs
Fox: fox1 Rabbit: rabbit1 Chases: chases1 time: 2pm name: George age: 3 name: Peter age: 2 Eats: eats1 time: 3pm Simple Query that Returns a Single Graph SUBGRAPH Fox Chases Rabbit AND Fox Eats Rabbit CONSTRAINT Chases.Time < Eats.Time RETURN Fox Chases Rabbit AND Fox Eats Rabbit • Type represents variable • Motivated by languages like SQL • In constrast to (Fox ?f1) Heritage Style Viewgraphs
Fox: fox1 Rabbit: rabbit1 Chases: chases1 result graph 1 time: 2pm name: George age: 3 name: Peter age: 2 Fox: fox1 Rabbit: rabbit2 Chases: chases2 result graph 2 time: 5pm name: George age: 3 name: Bugs age: 4 Returning a Set of Graphs • Can be done with edge expansion or joins in the RETURN clause • Can be seamlessly integrated with non-graph expansion expressions • Any query can be returned as a set of graphs if desired SUBGRAPH Fox Chases Rabbit RETURN Fox Chases# Rabbit BACK Heritage Style Viewgraphs
Aliasing SUBGRAPH Fox ALIAS ChasingFox Chases Rabbit AND Fox ALIAS EatingFox Eats Rabbit CONSTRAINT ChasingFox.name <> EatingFox.name RETURN ChasingFox Chases Rabbit AND EatingFox Eats Rabbit • If our graph had an additional edge in which George Fox chased Jack Rabbit at 8 a.m., the result would look like: Fox: fox1 name: George age: 3 Chases: chases3 time: 8am Fox: fox2 Rabbit: rabbit3 Eats: eats2 BACK name: Fred age: 2 time: 9am name: Jack age: 1 Heritage Style Viewgraphs
Embedded Queries • Significant component of first order logic expressiveness • To request the first fox that ate a rabbit, the following existential query is formulate: SUBGRAPH Fox Eats ALIAS E1 Rabbit CONSTRAINT NOT EXISTS (SUBGRAPH Fox Eats ALIAS E2 Rabbit CONSTRAINT E1.time > E2.time) RETURN Fox Eats Rabbit Fox: fox2 Rabbit: rabbit3 Eats: eats2 name: Fred age: 2 time: 9am name: Jack age: 1 BACK Heritage Style Viewgraphs
New Result Graph Structure Query SUBGRAPH Fox Eats Rabbit AND Rabbit Eats Lettuce RETURN Fox new(Ingests) Lettuce Fox: fox1 Lettuce: lettuce1 Ingests: ingests1 name: George age: 3 name: PrizeLettuce Fox: fox2 name: Fred age: 2 Lettuce: lettuce2 Ingests: ingests3 name: Icy BACK Heritage Style Viewgraphs
Hypothesis Expressions • Enables queries on hypothetical data SUBGRAPH Fox Chases Rabbit AND Fox Eats Rabbit AND Rabbit Eats Lettuce CONSTRAINT Chases.time < ‘8am’ RETURN Fox new(Ingests) Lettuce ASSUME EDGE Chases [NEW time = ‘7am’] FROM Fox[CONSTRAINT name= ‘Fred’] TO Rabbit[CONSTRAINT name= ‘Jack’] • Motivated by OWL-QL BACK Heritage Style Viewgraphs
Place name location OccuredAt HuntingEvent time time Lettuce name Eats Chases Fox Rabbit Carrot name Eats Eats name age name age time time time Composite Vertices • Composite vertices • Composed of vertices and edges • Contained vertices can be composite as well Heritage Style Viewgraphs
Composite Vertex Queries - continued SUBGRAPH HuntingEvent OccuredAt Place AND HuntingEvent DIRECTLY CONTAINS Rabbit AND Rabbit Eats Lettuce CONSTRAINT Place.name = ‘Smith Game Park’ RETURN Rabbit Eats Lettuce time Lettuce name Eats Rabbit name age • Addresses a subset of Harel's Higraphs • Multiple hops • CONTAINS or IS-CONTAINED-BY • Feasible because of the hierarchy BACK Heritage Style Viewgraphs
Wildcard Queries SUBGRAPH Fox * ALIAS InterestingEdge Rabbit RETURN Fox InterestingEdge Rabbit Fox: fox1 Rabbit: rabbit1 Chases: chases1 time: 2pm name: George age: 3 name: Peter age: 2 Eats: eats1 time: 3pm Chases: chases2 Rabbit: rabbit2 time: 5pm name: Bugs age: 4 Fox: fox2 Rabbit: rabbit3 Eats: eats2 name: Fred age: 2 time: 9am name: Jack age: 1 • One edge wildcard queries • Multiple hops • May be computationally expensive in a graph • Can be handled by an external AllPath() algorithm BACK Heritage Style Viewgraphs
Pattern Definition • Assigns names to interesting graph patterns • Can be reused in multiple queries PATTERN Predator (Fox new(PreysUpon) Rabbit) = SUBGRAPH Fox Chases Rabbit AND Fox Eats Rabbit CONSTRAINT Chases.time < Eats.time RETURN Fox new(PreysUpon) Rabbit Heritage Style Viewgraphs
Pattern Use • Query: SUBGRAPH Predator(Fox PreysUpon Rabbit) AND Rabbit Eats Lettuce RETURN Fox new(Ingests) Lettuce • Is evaluated as if it were: SUBGRAPH Fox Chases Rabbit AND Fox Eats Rabbit AND Rabbit Eats Lettuce CONSTRAINT Chases.time < Eats.time RETURN Fox new(Ingests) Lettuce BACK Heritage Style Viewgraphs
External Graph Algorithms that Return Subgraphs • Shortest Path SUBGRAPH GameWarden Chases Fox AND ShortestPath(Fox, Rabbit) ALIAS SP_alias AND Rabbit Eats Lettuce RETURN GameWarden Chases Fox AND SP_alias AND Rabbit Eats Lettuce • Adjacent Vertices SUBGRAPH AdjacentVertices(Rabbit) ALIAS AV_alias CONSTRAINT count_edges(Rabbit) > 10 RETURN AV_alias BACK Heritage Style Viewgraphs
External Graph Algorithms that Return Metrics • Centrality: Find the Foxes that eventually Eat the Rabbits, who play a central role in the garden activities SUBGRAPH Fox Eats Rabbit CONSTRAINT Centrality (Fox, Rabbit, Lettuce) > .8 RETURN Fox Eats Rabbit • Clustering Coefficient: Find the Foxes that are likely to work together when Chasing Rabbits SUBGRAPH Fox ALIAS Fox1 Chases Rabbit AND Fox ALIAS Fox2 Chases Rabbit CONSTRAINT ClusteringCoefficient (Fox1, Fox2) > .6 AND Fox1 <> Fox2 RETURN Fox Eats Rabbit Heritage Style Viewgraphs
Some Issues with External Algorithms • Algorithms do not filter results, they operate direction on the graph and tie into the rest of the results • Algorithms need to return a set of graphs (or a graph under some circumstances) in a standard format • Order of query execution • No current way to refer to the result vertices and edges of algorithms that are not specifically identified in the query SUBGRAPH AdjacentVertices(Rabbit) ALIAS AV_alias CONSTRAINT ClusteringCoefficient (<Vertex1 ?>, <Vertex2 ?>) > .6 RETURN <Vertex1 ?> <Edge1 ?> Rabbit AND <Vertex2 ?> <Edge2 ?> Rabbit BACK Heritage Style Viewgraphs
Ontology Assisted Query Organism isA isA Animal Ontology isA isA Chases Vegetable Carnivore Herbivore Eats Eats isA isA isA isA isA isA Wolf Fox Hare Sheep Lettuce Carrot Mappings time time Lettuce Graph Schema name Eats Chases Fox Rabbit Carrot name Eats Eats name age name age time time Heritage Style Viewgraphs
Ontology-Assisted Query Result SUBGRAPH Carnivore Eats Herbivore AND Herbivore Eats Vegetable RETURN Carnivore new(Ingests) Vegetable Fox: fox1 Lettuce: lettuce1 Ingests: ingests1 name: George age: 3 name: PrizeLettuce Fox: fox2 name: Fred age: 2 Lettuce: lettuce2 Ingests: ingests3 name: Icy Heritage Style Viewgraphs
Some Issues of Ontology-Assisted Query • Why not just have an ontology query language? • Performance issues? • Scaling issues? • Capitalize on features that semantics bring to bear on a graph query language • Semantic abstraction (e.g., subsumption, hierarchy) • Use inference to create semantically consistent models • Impose semantic on the graph model BACK Heritage Style Viewgraphs
Outline • Goals & Example Scenario • Key Features of GQL • Computational Complexity of Query Execution • Future Directions Heritage Style Viewgraphs
Query Optimization • Query execution time is the key to success for any query language – GQL is no exception • We apply relational database optimization techniques to graph queries • Optimization issues • Addressed query optimization on a per path-segment basis – yes • Address path-segment ordering – initial thoughts • Address the management of large amounts of intermediate results of a query – not yet • Address incorporating external algorithms – not yet • Address ontology elaboration performance – not yet Heritage Style Viewgraphs
Query Optimization • Query plan representations are used to define query execution plans • Query plan representations are constructed to optimize the query execution time • Via graph algebra • Via graph statistics to estimate query costs for each operation • Query optimizer determines • The best algorithm to execute each operation • The best operation ordering to optimize overall query execution time Heritage Style Viewgraphs
Graph Statistics • Estimating costs requires statistical knowledge of the graph • We estimate the cost of the path segment operator • One of the most common and costly operations • Statistics that we initially considered useful: • Vertex Cardinality: The number of vertices of type v is count(v) or just V. • Vertex Edge Set Cardinality: The total number of edges e that emanate from all vertices of type v is count(ev) or just EV. • Edge Cardinality: The number of edges of type e is count(e) or just E. • Edge Distribution: The number of different vertex type pairs that edges of type e connect of just ED. • Selectivity Factor: The percentage of vertices or edges that match a property constraint is sel(), where is the property constraint. • Uniformity assumption • Independence assumption Heritage Style Viewgraphs
Path Segment – Vertex Search, No Indices • Algorithm • Iterate through a set of vertices of type v in O(V) time • For each vertex, iterate through its edge list to find edges of type e in O(EV/V) time • Follow the edge to vertex w in constant time • Execution time is O(V*(EV/V)) = O(EV) Heritage Style Viewgraphs
Path Segment – Indices on Vertex Edge Set • Requires each edge set to be indexed through a logarithmic-time search tree (e.g., B+ tree) • Next values are (virtually) collocated with the matching value • Enables a constant time search for the next value(s) • Algorithm • Iterate through vertices of type v in time O(V) • Find matching edge(s) in logarithmic time O(log(EV/V) • Iterate through the matching edges in time O(E/EDV) • Execution time is O(V * (log(EV/V) + E/EDV) ) = O(V*log(EV/V) + E/ED) • If ED E (i.e., one edge of type e emanates from each v), then the algorithm tends to operate in time O(V*log(EV/V)) • If ED E and EVV, the algorithm tends operate in time O(V) • If ED E and EV>> V, the algorithm tends to operate in time O(V*log(EV)) • If ED >> E, then the algorithm tends to operate in time O(E/ED) Heritage Style Viewgraphs
Path Segment – Edge Indices, Constraint • Beneficial when the query includes a constraint v on an indexed property of vertices of type v • Vertex edge sets are indexed as well • Algorithm • Logarithmic-time search through the indexed properties v in time O(log(V)) • Iterate through vertices (collocated in the index) that satisfy the constraint in time O(sel(v)*V) • Performs a logarithmic-time search on the edges of each matching vertex in time O(log(EV/V)) • Iterate through the matching edges in time O(E/EDV) • Execution time is O(log(V) + (sel(v)*V*(log(EV/V) + E/EDV)) ) = O(log(V) + sel(v)*V*log(EV/V) + sel(v)*E/ED) • If sel(v) 0, the dominant factor is the search for vertices or O(log(V)) • If the selectivity factor is higher, the execution time approaches the times of the previous slide Heritage Style Viewgraphs
Path Segment – Edge Search, No Indices • Algorithm • Iterate over edge types e and select those that connect v to w in time O(E) • Find the corresponding vertices in constant time • Execution time is O(E) Heritage Style Viewgraphs
Path Segment – Edge Search, Constraint • Beneficial when the query statement includes a constraint e on an indexed property of edges of type e • Algorithm • Performs a logarithmic-time search through properties to find the first matching edge in time O(log(E)) • Performs a linear search through all subsequent matching edges in time O(sel(e)*E) • Find both vertices attached to each edge in constant time • Execution time is O(log(E) + sel(e)*E) • If sel(e) 0, the algorithm tends to an execution time of O(log(E)) • Otherwise, the algorithm tends to an execution time of O(E) Heritage Style Viewgraphs
Varying Number of Vertices per Vertex Type Heritage Style Viewgraphs
Varying Number of Edges per Vertex Heritage Style Viewgraphs
Varying Edge Types with Constraints Heritage Style Viewgraphs
Path Segment Ordering • Assume the following query SUBGRAPH Fox Chases Rabbit AND Rabbit Eats Lettuce CONSTRAINT Rabbit.age < 3 RETURN Fox new(Ingests) Lettuce • Query processing produces the following query execution plan pFox new (Ingests) Lettuce s Rabbit.age < 3 Eats Lettuce Fox Chases Rabbit Heritage Style Viewgraphs
s Rabbit.age < 3 Fox Chases Eats Lettuce Rabbit Path Segment Execution Order Choice • Which is more efficient? pFox new Ingests Lettuce pFox new Ingests Lettuce s Rabbit.age < 3 or Eats Lettuce Fox Chases Rabbit Heritage Style Viewgraphs
Execution Order Heuristics • In simple terms • Identify the path segment operation that promises to return the least number of results • Then identify the next operation that promises to return the next least number of results • It is actually more complicated than this • Need to search an exponential number of orderings to find the most efficient ordering • Heuristics can make this search tractable Heritage Style Viewgraphs
Path-Segment Ordering Metric • Order the path segment operators to return the fewest results • Rough heuristic: • If predicates v, e, and w are applied to V, E and W respectively • Start with V and use selectivity factors to estimate execution time • Execution time is: • V * sel(v) * (E/EDV) * sel(e) * (WED/E) * sel(w) • Or, sel(v) * sel(e) * sel(w) * W • Use this formula to determine whether Fox Chases Rabbit should precede or follow Rabbit Eats Lettuce Heritage Style Viewgraphs
Outline • Goals & Example Scenario • Key Features of GQL • Computational Complexity of Query Execution • Future Directions Heritage Style Viewgraphs
Prototype Implementation Schedule • Currently Implemented • Schema search returning a single graph • Pattern matching • Aliasing • Ontology assisted graph query • Next to be implemented – within approximately 6 months • Externally defined functions • Wildcard search • Hypothesis expressions • Future • Return a set of graphs (instead of a single graph) • Embedded queries • Return new graph structures in query results • Composite vertices (of vertices and edges) • Predefined patterns • Query Optimization Heritage Style Viewgraphs