1. The Graph Query Language
David Silberberg
The Johns Hopkins University
Applied Physics Laboratory
July 18, 2006
2. Heritage Style Viewgraphs 2 Team Members
Wayne Bethea
Jim Cavanaugh
Clay Fink
Paul Frank
John Gersh
Elisabeth Immer
Roger Remington
3. Outline
Goals & Example Scenario
Related Work and Key Features of GQL
Graph Model and Query Language
Computational Complexity of Query Execution
Future Directions
4. Goals of the Graph Query Language (GQL) Project
To introduce a new approach to graph query languages for graph analysis
Enable graph analysts to perform semantic search and iterative analysis over large graphs in a scalable fashion
Seamlessly integrate graph analysis functions into the graph query language
To quantify the scalability of this type of language
To use ontologies to enrich graph querying
5. Example Scenario
Farmer Jones' lettuce crop did well this year, but few other farmers did well. Why?
First, find Farmer Jones.
6. Example Scenario
Rabbits usually eat lettuce. Let's find the rabbits that ate Farmer Jones' lettuce.
7. Example Scenario
Let's look at all the farmers, and their locations, whose lettuce was eaten by fewer than 5 rabbits.
8. Example Scenario
What commonalities do the farmers have with each other and with the rabbits?
9. Graph Interaction Methods
Graph analysis is a process of both browsing and searching the elements of a graph
Browsing
One-step-at-a-time graph navigation
One-operation-at-a-time graph algorithms
Searching
Several-steps-at-a-time graph navigation
The steps can include one or more graph algorithms
GQL is a declarative graph query language for searching!
10. Outline
Goals & Example Scenario
Related Work and Key Features of GQL
Graph Model and Query Language
Computational Complexity of Query Execution
Future Directions
11. Related Work
Four categories of graph query languages
Knowledge base (subject-predicate-object) query languages
SPARQL, RQL, RAL, RDF Query Language
Graph reasoning query languages
OWL-QL, GraphLog, Query and Inference Service for RDF
Query languages with graph operators
GOQL
GRAM
Graphical user interface query languages
QGRAPH
12. Key Features of GQL
Graph Paradigm
Syntax, operators and results use the graph paradigm
Returns a single graph or a set of graphs (not tables or XML files) to support analysis of large graphs
Facilitates iterative graph querying
Semantic Graph Query
Schema-based
Can be extended to utilize ontology-based inference
Graph Exploration
Wildcard searches
Query over patterns
13. Key Features of GQL (continued)
Expressivity
Composite entities
New graph construction of results
Universal and existential quantification
Analysis support
Hypothesis expressions
Special graph functions (Shortest Path, Adjacent Vertices, etc.)
Aggregation functions (count, sum, average, min, max)
Set aggregation functions (union, intersection, difference)
14. Outline
Goals & Example Scenario
Related Work and Key Features of GQL
Graph Model and Query Language
Computational Complexity of Query Execution
Future Directions
15. Graph Data Models
Simple model
Vertices – usually represent concepts or objects
Edges – usually represent relationships between vertices
Properties – attributes of objects or relationships
Represent highly connected information, such as
Social networks
Knowledge bases
Disciplines that use graphs
Link mining analysis
Semantic Web
Bioinformatics
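The simple model above (typed vertices, typed edges, and properties on both) can be sketched as a small in-memory property graph. This is a hypothetical illustration only: the class names `Vertex`, `Edge` and `Graph`, the `out_edges` helper, and the sample Fox/Rabbit/Lettuce data are assumptions, not part of GQL.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List

@dataclass
class Vertex:
    vid: str
    vtype: str                                   # concept or object type, e.g. "Fox"
    props: Dict[str, Any] = field(default_factory=dict)

@dataclass
class Edge:
    etype: str                                   # relationship type, e.g. "Chases"
    src: str                                     # id of the source vertex
    dst: str                                     # id of the target vertex
    props: Dict[str, Any] = field(default_factory=dict)

@dataclass
class Graph:
    vertices: Dict[str, Vertex] = field(default_factory=dict)
    edges: List[Edge] = field(default_factory=list)

    def add_vertex(self, vid: str, vtype: str, **props: Any) -> None:
        self.vertices[vid] = Vertex(vid, vtype, props)

    def add_edge(self, etype: str, src: str, dst: str, **props: Any) -> None:
        self.edges.append(Edge(etype, src, dst, props))

    def out_edges(self, vid: str) -> List[Edge]:
        # Linear scan; a real store would keep a per-vertex edge list
        return [e for e in self.edges if e.src == vid]

# Hypothetical sample data in the spirit of the running example
g = Graph()
g.add_vertex("f1", "Fox", name="Fred")
g.add_vertex("r1", "Rabbit", name="Jack")
g.add_vertex("l1", "Lettuce", owner="Farmer Jones")
g.add_edge("Chases", "f1", "r1", time=7)
g.add_edge("Eats", "r1", "l1", time=9)
```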
16. Example Graph Model
[Figure: Graph Schema; Data Graph]
17. GQL Operators - Overview
Basic Syntax
SUBGRAPH clause
Finds a subgraph in the source graph
CONSTRAINT clause
Filters the subgraph based on property constraints
RETURN clause
Describes the resulting graph or sets of graphs to return
Syntax for analysis
ASSUME clause
Supports hypothesis statements
PATTERN clause
Defines search patterns
18. Basic GQL Operators
Subgraph Template Operators – SUBGRAPH clause
Conjunctions and disjunctions of path-segment operators
Hierarchy operators (for composite vertices)
Constraint Operators – CONSTRAINT clause
Standard first-order logic
Conjunctions, disjunctions and negations as well as universal and existential quantification of predicates.
Projection Operators – RETURN clause
Constructs the result graph(s)
Path segment operator
Hierarchy operator (for composite vertices)
Present results as a set of graphs
Edge expansion operator
Common join operator
19. Simple Query
SUBGRAPH Fox Chases Rabbit AND Fox Eats Rabbit
CONSTRAINT Chases.Time < Eats.Time
RETURN Fox Chases Rabbit AND Fox Eats Rabbit
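One way to read the simple query is as a join of two path segments followed by a filter. The sketch below evaluates it with a naive nested loop over hypothetical toy data; the helper names `path_segment` and `simple_query` are assumptions, and this is not presented as GQL's actual execution strategy.

```python
# Hypothetical toy data: vertex id -> type, and edges as
# (edge type, source id, target id, properties).
vtype = {"fox1": "Fox", "rab1": "Rabbit", "rab2": "Rabbit"}
edges = [
    ("Chases", "fox1", "rab1", {"time": 7}),
    ("Eats", "fox1", "rab1", {"time": 8}),
    ("Chases", "fox1", "rab2", {"time": 9}),
    ("Eats", "fox1", "rab2", {"time": 6}),   # eaten before the chase: filtered out
]

def path_segment(src_type, etype, dst_type):
    """All edges of type etype that run from a src_type vertex to a dst_type vertex."""
    return [e for e in edges
            if e[0] == etype and vtype[e[1]] == src_type and vtype[e[2]] == dst_type]

def simple_query():
    results = []
    for chases in path_segment("Fox", "Chases", "Rabbit"):
        for eats in path_segment("Fox", "Eats", "Rabbit"):
            # AND: both path segments must bind the same Fox and the same Rabbit
            same_binding = chases[1] == eats[1] and chases[2] == eats[2]
            # CONSTRAINT Chases.Time < Eats.Time
            if same_binding and chases[3]["time"] < eats[3]["time"]:
                # RETURN Fox Chases Rabbit AND Fox Eats Rabbit
                results.append((chases, eats))
    return results
```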
20. New Result Graph Structure
Query:
SUBGRAPH Fox Eats Rabbit AND Rabbit Eats Lettuce
RETURN Fox new(Ingests) Lettuce
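The `new(Ingests)` projection builds an edge that does not exist in the source graph. A minimal sketch, assuming hypothetical toy data and helper names (`segment`, `ingests_result_graph`), joins the two segments on the shared Rabbit and emits one new Fox-to-Lettuce edge per distinct pair.

```python
# Hypothetical toy data: vertex id -> type, and edges as (edge type, source, target).
vtype = {"fox1": "Fox", "rab1": "Rabbit", "let1": "Lettuce", "let2": "Lettuce"}
edges = [("Eats", "fox1", "rab1"), ("Eats", "rab1", "let1"), ("Eats", "rab1", "let2")]

def segment(src_type, etype, dst_type):
    return [(s, d) for (t, s, d) in edges
            if t == etype and vtype[s] == src_type and vtype[d] == dst_type]

def ingests_result_graph():
    new_edges = set()
    # SUBGRAPH Fox Eats Rabbit AND Rabbit Eats Lettuce: join on the shared Rabbit
    for fox, rabbit in segment("Fox", "Eats", "Rabbit"):
        for rabbit2, lettuce in segment("Rabbit", "Eats", "Lettuce"):
            if rabbit == rabbit2:
                # RETURN Fox new(Ingests) Lettuce: the Rabbit is projected away and a
                # new edge of type Ingests links the Fox directly to the Lettuce
                new_edges.add(("Ingests", fox, lettuce))
    return sorted(new_edges)
```

Using a set means that a Fox reaching the same Lettuce through several Rabbits yields a single `Ingests` edge; whether GQL deduplicates this way is an assumption of the sketch.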
21. Aliasing
SUBGRAPH Fox ALIAS ChasingFox Chases Rabbit AND
Fox ALIAS EatingFox Eats Rabbit
CONSTRAINT ChasingFox.name <> EatingFox.name
RETURN ChasingFox Chases Rabbit AND
EatingFox Eats Rabbit
If our graph had an additional edge in which George Fox chased Jack Rabbit at 8 a.m., the result would look like:
22. Wildcard Queries
SUBGRAPH Fox * ALIAS InterestingEdge Rabbit
RETURN Fox InterestingEdge Rabbit
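A wildcard in the edge position matches any edge type between the two vertex types. The sketch below, over hypothetical toy data with an assumed helper name `wildcard_segment`, keeps the matched edge type with each result, which is roughly the role the `InterestingEdge` alias plays in the RETURN clause.

```python
# Hypothetical toy data: vertex id -> type, and edges as (edge type, source, target).
vtype = {"fox1": "Fox", "rab1": "Rabbit", "let1": "Lettuce"}
edges = [("Chases", "fox1", "rab1"), ("Eats", "fox1", "rab1"), ("Eats", "rab1", "let1")]

def wildcard_segment(src_type, dst_type):
    """SUBGRAPH src_type * dst_type: match edges of any type between the vertex types.
    The edge type travels with each match so it can be returned under an alias."""
    return [(etype, s, d) for (etype, s, d) in edges
            if vtype[s] == src_type and vtype[d] == dst_type]

interesting_edges = wildcard_segment("Fox", "Rabbit")
```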
23. Composite Vertices
Composite vertices
Composed of vertices and edges
Contained vertices can be composite as well
24. Composite Vertex Queries (continued)
SUBGRAPH HuntingEvent OccuredAt Place AND
HuntingEvent DIRECTLY CONTAINS Rabbit AND
Rabbit Eats Lettuce
CONSTRAINT Place.name = 'Smith Game Park'
RETURN Rabbit Eats Lettuce
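The distinction between direct and nested containment is easy to miss. This sketch assumes a hypothetical `contains` table mapping each composite vertex to its immediate members (with a nested composite `herd`); `DIRECTLY CONTAINS` is modelled as immediate membership only, and a transitive version is shown for contrast. The edge name `OccuredAt` is kept exactly as spelled in the query.

```python
# Hypothetical toy data: vertex id -> type, a name property, and (type, src, dst) edges.
vtype = {"hunt1": "HuntingEvent", "park": "Place", "rab1": "Rabbit",
         "herd": "Herd", "rab2": "Rabbit", "let1": "Lettuce"}
name = {"park": "Smith Game Park"}
edges = [("OccuredAt", "hunt1", "park"),
         ("Eats", "rab1", "let1"), ("Eats", "rab2", "let1")]
# Composite membership: composite vertex -> vertices it directly contains.
# "herd" is itself composite and is nested inside "hunt1".
contains = {"hunt1": ["rab1", "herd"], "herd": ["rab2"]}

def directly_contains(composite):
    """Only the immediate members of a composite vertex."""
    return set(contains.get(composite, []))

def contains_transitively(composite):
    """Members at any depth of nesting (for contrast with DIRECTLY CONTAINS)."""
    found, stack = set(), list(contains.get(composite, []))
    while stack:
        v = stack.pop()
        if v not in found:
            found.add(v)
            stack.extend(contains.get(v, []))
    return found

def rabbits_eating_at_park():
    result = []
    for etype, event, place in edges:
        # HuntingEvent OccuredAt Place, CONSTRAINT Place.name = 'Smith Game Park'
        if (etype == "OccuredAt" and vtype[event] == "HuntingEvent"
                and vtype[place] == "Place" and name.get(place) == "Smith Game Park"):
            for member in sorted(directly_contains(event)):
                if vtype[member] == "Rabbit":      # HuntingEvent DIRECTLY CONTAINS Rabbit
                    # Rabbit Eats Lettuce
                    result += [e for e in edges if e[0] == "Eats" and e[1] == member
                               and vtype[e[2]] == "Lettuce"]
    return result
```

Here `rab2` is excluded because it sits inside the nested `herd`, not directly inside the hunting event.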
25. Patterns
Pattern Definition
Assigns names to interesting graph patterns
Can be used in multiple queries
PATTERN Predator (Fox new(PreysUpon) Rabbit) =
SUBGRAPH Fox Chases Rabbit AND
Fox Eats Rabbit
CONSTRAINT Chases.time < Eats.time
RETURN Fox new(PreysUpon) Rabbit
26. Pattern Use
Query:
SUBGRAPH Predator(Fox PreysUpon Rabbit) AND
Rabbit Eats Lettuce
RETURN Fox new(Ingests) Lettuce
Is evaluated as if it were:
SUBGRAPH Fox Chases Rabbit AND
Fox Eats Rabbit AND
Rabbit Eats Lettuce
CONSTRAINT Chases.time < Eats.time
RETURN Fox new(Ingests) Lettuce
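The "evaluated as if it were" rewrite amounts to inlining the pattern body, much like macro expansion. The sketch below assumes a hypothetical dictionary representation of queries (`subgraph`, `constraint`, `return` keys) and a `PATTERNS` registry; it reproduces the expanded query shown on the slide and, for simplicity, ignores the pattern's own RETURN and parameter list.

```python
# Hypothetical query representation: path segments are (source, edge, target) triples;
# a bare string in the SUBGRAPH list is a reference to a named pattern.
PATTERNS = {
    # PATTERN Predator (Fox new(PreysUpon) Rabbit) = ...
    "Predator": {
        "subgraph": [("Fox", "Chases", "Rabbit"), ("Fox", "Eats", "Rabbit")],
        "constraint": ["Chases.time < Eats.time"],
    }
}

def expand_patterns(query):
    """Inline every pattern reference as if its body had been written in the query."""
    subgraph, constraint = [], list(query.get("constraint", []))
    for item in query["subgraph"]:
        if isinstance(item, str) and item in PATTERNS:   # a pattern reference
            body = PATTERNS[item]
            subgraph.extend(body["subgraph"])
            constraint.extend(body["constraint"])
        else:                                            # an ordinary path segment
            subgraph.append(item)
    return {"subgraph": subgraph, "constraint": constraint, "return": query["return"]}

query = {"subgraph": ["Predator", ("Rabbit", "Eats", "Lettuce")],
         "return": "Fox new(Ingests) Lettuce"}
expanded = expand_patterns(query)
```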
27. Hypothesis Expressions
Enables queries on hypothetical data
SUBGRAPH Fox Chases Rabbit AND
Fox Eats Rabbit AND
Rabbit Eats Lettuce
CONSTRAINT Chases.time < '8am'
RETURN Fox new(Ingests) Lettuce
ASSUME EDGE Chases [NEW time = '7am']
FROM Fox[CONSTRAINT name = 'Fred']
TO Rabbit[CONSTRAINT name = 'Jack']
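One plausible reading of ASSUME is that the query runs against the stored graph plus a hypothetical edge, without modifying the stored data. The sketch below makes that assumption explicit; the toy data (times as integer hours, so '8am' becomes 8) and the helpers `find`, `assume_edge` and `run_query` are all hypothetical.

```python
# Hypothetical toy data; times are integer hours of the day.
vtype = {"f1": "Fox", "r1": "Rabbit", "l1": "Lettuce"}
names = {"f1": "Fred", "r1": "Jack", "l1": "Jones lettuce"}
base_edges = [
    ("Chases", "f1", "r1", {"time": 9}),    # the only observed chase is at 9am
    ("Eats", "f1", "r1", {"time": 10}),
    ("Eats", "r1", "l1", {"time": 6}),
]

def find(vt, name):
    """Vertex of type vt whose name matches, e.g. Fox[CONSTRAINT name = 'Fred']."""
    return next(v for v in vtype if vtype[v] == vt and names[v] == name)

def assume_edge(edges, etype, src, dst, props):
    """ASSUME EDGE: a new edge list with one hypothetical edge; the base list is untouched."""
    return edges + [(etype, src, dst, props)]

def run_query(edges):
    def seg(etype, st, dt):
        return [e for e in edges if e[0] == etype and vtype[e[1]] == st and vtype[e[2]] == dt]
    result = set()
    for _, fox, rabbit, props in seg("Chases", "Fox", "Rabbit"):
        if props["time"] >= 8:                           # CONSTRAINT Chases.time < '8am'
            continue
        if not any(e[1] == fox and e[2] == rabbit for e in seg("Eats", "Fox", "Rabbit")):
            continue                                     # Fox Eats (the same) Rabbit
        for _, r, lettuce, _ in seg("Eats", "Rabbit", "Lettuce"):
            if r == rabbit:                              # Rabbit Eats Lettuce
                result.add(("Ingests", fox, lettuce))    # RETURN Fox new(Ingests) Lettuce
    return result

hypothetical = assume_edge(base_edges, "Chases", find("Fox", "Fred"),
                           find("Rabbit", "Jack"), {"time": 7})
```

Against the base data the query returns nothing, because the only chase happens at 9am; under the assumed 7am chase it returns one `Ingests` edge.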
28. Special Graph Operator Queries
Shortest Path
SUBGRAPH GameWarden Chases Fox AND
ShortestPath(Fox, Rabbit) ALIAS SP_alias AND
Rabbit Eats Lettuce
RETURN GameWarden Chases Fox AND
SP_alias AND
Rabbit Eats Lettuce
Adjacent Vertices
SUBGRAPH AdjacentVertices(Rabbit) ALIAS AV_alias
CONSTRAINT count_edges(Rabbit) > 10
RETURN AV_alias
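The special graph functions can be illustrated on a plain adjacency list. This is a sketch under stated assumptions: edges are treated as directed and unweighted, so `ShortestPath` is modelled with breadth-first search (fewest hops), and `count_edges` counts outgoing edges only. The function names mirror the slide, but their signatures and the sample `adj` data are hypothetical.

```python
from collections import deque

def shortest_path(adj, start, goal):
    """Breadth-first search: a fewest-hop path from start to goal in a directed,
    unweighted graph given as {vertex: [neighbors]}. Returns None if unreachable."""
    parent = {start: None}
    queue = deque([start])
    while queue:
        v = queue.popleft()
        if v == goal:
            path = []
            while v is not None:          # walk parent links back to the start
                path.append(v)
                v = parent[v]
            return path[::-1]
        for w in adj.get(v, []):
            if w not in parent:           # first visit is along a shortest path
                parent[w] = v
                queue.append(w)
    return None

def adjacent_vertices(adj, v):
    return set(adj.get(v, []))

def count_edges(adj, v):
    return len(adj.get(v, []))            # outgoing edges only, in this sketch

# Hypothetical sample graph
adj = {"fox": ["hen", "den"], "hen": ["rabbit"], "den": ["burrow"], "burrow": ["rabbit"]}
```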
29. Returning a Set of Graphs
Can be done with edge expansion or joins in the RETURN clause
Can be seamlessly integrated with non-graph expansion expressions
Any query can be returned as a set of graphs if desired
SUBGRAPH Fox Chases Rabbit
RETURN Fox Chases# Rabbit
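The difference between returning one graph and a set of graphs can be sketched as follows. The assumption here (not confirmed by the slide) is that the `#` edge expansion on `Chases` yields one result graph per matching `Chases` edge; each graph is modelled as a hypothetical `(vertex set, edge list)` pair.

```python
# Hypothetical matches of "Fox Chases Rabbit", as (edge type, source, target).
edges = [("Chases", "fox1", "rab1"), ("Chases", "fox1", "rab2"), ("Chases", "fox2", "rab1")]

def single_graph(matches):
    """RETURN Fox Chases Rabbit: one result graph holding every match."""
    vertices = {v for (_, s, d) in matches for v in (s, d)}
    return (vertices, list(matches))

def expand_on_edge(matches):
    """RETURN Fox Chases# Rabbit, as read here: one result graph per Chases edge."""
    return [({s, d}, [(t, s, d)]) for (t, s, d) in matches]
```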
30. Outline
Goals & Example Scenario
Related Work and Key Features of GQL
Graph Model and Query Language
Computational Complexity of Query Execution
Future Directions
31. Query Optimization
Query execution time is key to the success of any query language, and GQL is no exception
Our approach
Address query optimization on a per path-segment basis
Address path-segment ordering
Address the management of large amounts of intermediate results of a query
Our efforts so far
Addressed per path-segment optimization
Started to address path-segment ordering
Have not yet addressed the management of large amounts of intermediate results
32. Query Optimization
Query plan representations are used to define query execution plans
Query plan representations are manipulated to optimize the query execution time
Via laws of graph algebra
Via graph statistics to estimate query costs for each operation
Query optimizer determines
The best algorithm to execute each operation
The best operation ordering to optimize overall query execution time
33. Query Planning and Optimization
Query planning process determines the operators required to solve a query
Query optimization process determines the most efficient way to:
Execute query operators
Order the execution of query operators
Heuristics have been identified to implement query planning and optimization based on statistical analysis
34. Graph Statistics
Estimating costs requires statistical knowledge of the graph
We estimate the cost of the path segment operator
One of the most common and costly operations
Statistics that we initially considered useful:
Vertex Cardinality: The number of vertices of type v is count(v) or just V.
Vertex Edge Set Cardinality: The total number of edges e that emanate from all vertices of type v is count(ev) or just EV.
Edge Cardinality: The number of edges of type e is count(e) or just E.
Edge Distribution: The number of different vertex type pairs that edges of type e connect, or just ED.
Selectivity Factor: The fraction of vertices or edges that match a property constraint is sel(θ), where θ is the property constraint.
Uniformity assumption
Independence assumption
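The statistics defined above can be computed directly on a toy graph. In this sketch the data and function names are hypothetical; `conjunction_selectivity` shows the usual consequence of the independence assumption, namely that the selectivity of a conjunction of predicates is estimated as the product of their selectivities.

```python
import math

# Hypothetical toy graph: vertex id -> (type, properties); edges as (type, src, dst).
vertices = {
    "r1": ("Rabbit", {"age": 1}), "r2": ("Rabbit", {"age": 4}),
    "r3": ("Rabbit", {"age": 2}), "l1": ("Lettuce", {}), "c1": ("Carrot", {}),
}
edges = [("Eats", "r1", "l1"), ("Eats", "r2", "l1"), ("Eats", "r3", "c1"),
         ("Eats", "r1", "c1"), ("Near", "r1", "r2")]

def vertex_cardinality(v):             # V = count(v)
    return sum(1 for t, _ in vertices.values() if t == v)

def vertex_edge_set_cardinality(v):    # EV = count(ev): all edges leaving type-v vertices
    return sum(1 for _, s, _ in edges if vertices[s][0] == v)

def edge_cardinality(e):               # E = count(e)
    return sum(1 for t, _, _ in edges if t == e)

def edge_distribution(e):              # ED: distinct (source type, target type) pairs
    return len({(vertices[s][0], vertices[d][0]) for t, s, d in edges if t == e})

def selectivity(v, predicate):         # sel(θ): fraction of type-v vertices satisfying θ
    props = [p for t, p in vertices.values() if t == v]
    return sum(1 for p in props if predicate(p)) / len(props)

def conjunction_selectivity(*sels):
    # Independence assumption: sel(θ1 AND θ2) ≈ sel(θ1) * sel(θ2)
    return math.prod(sels)
```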
35. Path Segment – Vertex Search, No Indices
Algorithm
Iterate through a set of vertices of type v in O(V) time
For each vertex, iterate through its edge list to find edges of type e in O(EV/V) time
Follow the edge to vertex w in constant time
Execution time is O(V*(EV/V)) = O(EV)
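The unindexed vertex search can be sketched over a hypothetical storage layout: vertices grouped by type, and an outgoing edge list per vertex. The function also counts how many edge-list entries it inspects, which comes to exactly EV for the source type and so illustrates the O(V*(EV/V)) = O(EV) bound. All names and data are assumptions.

```python
# Hypothetical storage: vertices grouped by type, and an outgoing edge list per vertex.
vertices_by_type = {"Fox": ["f1", "f2"], "Rabbit": ["r1", "r2"], "Lettuce": ["l1"]}
vtype = {v: t for t, vs in vertices_by_type.items() for v in vs}
out_edges = {
    "f1": [("Chases", "r1"), ("Eats", "r1"), ("Chases", "r2")],
    "f2": [("Eats", "r2")],
    "r1": [("Eats", "l1")], "r2": [], "l1": [],
}

def vertex_search_no_index(v_type, e_type, w_type):
    """Evaluate the path segment `v_type e_type w_type` by scanning, and count how
    many edge-list entries were inspected (this equals EV for v_type)."""
    matches, inspected = [], 0
    for v in vertices_by_type.get(v_type, []):   # O(V) vertices of type v
        for etype, w in out_edges[v]:             # O(EV/V) edges per vertex on average
            inspected += 1
            if etype == e_type and vtype[w] == w_type:   # follow the edge: O(1)
                matches.append((v, w))
    return matches, inspected
```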
36. Path Segment – Indices on Vertex Edge Set
37. Path Segment – Edge Indices, Constraint
Beneficial when the query includes a constraint θv on an indexed property of vertices of type v
Vertex edge sets are indexed as well
Algorithm
Perform a logarithmic-time search through the index on the property constrained by θv in time O(log(V))
Iterate through the vertices (collocated in the index) that satisfy the constraint in time O(sel(θv)*V)
Perform a logarithmic-time search on the edges of each matching vertex in time O(log(EV/V))
Iterate through the matching edges in time O(E/(ED*V))
Execution time is O(log(V) + sel(θv)*V*(log(EV/V) + E/(ED*V))) = O(log(V) + sel(θv)*V*log(EV/V) + sel(θv)*E/ED)
If sel(θv) → 0, the dominant factor is the search for vertices, or O(log(V))
If the selectivity factor is higher, the execution time approaches that of the previous slide
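The indexed algorithm above can be sketched with Python's `bisect` module standing in for the indices. The assumptions are loud ones: the property index on a hypothetical `Rabbit.age` is a pair of parallel lists sorted by age, and each vertex's edge index is its edge list sorted by edge type with the types kept in a parallel list for binary search. Names and data are illustrative only.

```python
from bisect import bisect_left, bisect_right

# Hypothetical property index on Rabbit.age: parallel lists sorted by age, so the
# vertices satisfying a range constraint sit next to each other in the index.
_by_age = sorted([(1, "r1"), (4, "r2"), (2, "r3"), (5, "r4")])
ages = [a for a, _ in _by_age]
rabbits = [r for _, r in _by_age]

def build_edge_index(edge_list):
    """Hypothetical per-vertex edge index: edges sorted by type, types kept for bisect."""
    edge_list = sorted(edge_list)
    return [t for t, _ in edge_list], [w for _, w in edge_list]

edge_index = {
    "r1": build_edge_index([("Near", "r2"), ("Eats", "l2"), ("Eats", "l1")]),
    "r2": build_edge_index([("Eats", "l1")]),
    "r3": build_edge_index([("Near", "r1")]),
    "r4": build_edge_index([("Eats", "l2")]),
}

def young_rabbit_segment(max_age_exclusive, e_type="Eats"):
    """Path segment Rabbit e_type w with CONSTRAINT Rabbit.age < max_age_exclusive."""
    end = bisect_left(ages, max_age_exclusive)    # O(log V) search in the property index
    result = []
    for rabbit in rabbits[:end]:                  # O(sel(θv) * V) matching vertices
        types, targets = edge_index[rabbit]
        lo = bisect_left(types, e_type)           # O(log(EV/V)) search in the edge index
        hi = bisect_right(types, e_type)
        for w in targets[lo:hi]:                  # the matching edges of this vertex
            result.append((rabbit, w))
    return result
```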
38. Path Segment – Edge Search, No Indices
Algorithm
Iterate over the edges of type e and select those that connect v to w in time O(E)
Find the corresponding vertices in constant time
Execution time is O(E)
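Searching by edge type instead scans the E edges of type e once and checks the endpoint types. The sketch assumes a hypothetical store that keeps one global edge list per edge type, with endpoint types available by direct lookup.

```python
# Hypothetical storage: vertex id -> type, and one global edge list per edge type.
vtype = {"f1": "Fox", "r1": "Rabbit", "r2": "Rabbit", "l1": "Lettuce", "h1": "Hen"}
edges_by_type = {
    "Eats": [("f1", "r1"), ("r1", "l1"), ("f1", "h1"), ("r2", "l1")],
    "Chases": [("f1", "r2")],
}

def path_segment_edge_scan(v_type, e_type, w_type):
    """Scan all E edges of type e_type once; keep those joining v_type to w_type."""
    return [(v, w) for v, w in edges_by_type.get(e_type, [])
            if vtype[v] == v_type and vtype[w] == w_type]   # endpoint lookups are O(1)
```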
39. Path Segment – Edge Search, Constraint
Beneficial when the query statement includes a constraint θe on an indexed property of edges of type e
Algorithm
Perform a logarithmic-time search through the property index to find the first matching edge in time O(log(E))
Perform a linear scan through all subsequent matching edges in time O(sel(θe)*E)
Find both vertices attached to each edge in constant time
Execution time is O(log(E) + sel(θe)*E)
If sel(θe) → 0, the algorithm tends to an execution time of O(log(E))
Otherwise, the algorithm tends to an execution time of O(E)
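The indexed edge search can be sketched for a range constraint such as "Chases.time in [lo, hi)". The index here is a hypothetical pair of parallel lists sorted by time: a binary search finds the first match in O(log(E)), and a linear scan collects the following matches until the constraint fails, which costs O(sel(θe)*E).

```python
from bisect import bisect_left

# Hypothetical index on the `time` property of Chases edges: parallel lists sorted by
# time, with both endpoints stored alongside each edge.
chases_times = [5, 7, 7, 9, 12]
chases_edges = [("f1", "r1"), ("f2", "r1"), ("f1", "r2"), ("f2", "r3"), ("f1", "r3")]

def chases_in_window(lo, hi):
    """Edges satisfying θe = (lo <= Chases.time < hi)."""
    i = bisect_left(chases_times, lo)         # O(log E): locate the first match
    result = []
    while i < len(chases_times) and chases_times[i] < hi:   # O(sel(θe) * E) scan
        result.append(chases_edges[i])        # both endpoints come with the edge: O(1)
        i += 1
    return result
```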
40. Varying Number of Vertices per Vertex Type
41. Varying Number of Edges per Vertex
42. Varying Edge Types with Constraints
43. Path Segment Ordering
Assume the following query:
SUBGRAPH Fox Chases Rabbit AND
Rabbit Eats Lettuce
CONSTRAINT Rabbit.age < 3
RETURN Fox new(Ingests) Lettuce
Query processing produces the following query execution plan
44. Path Segment Execution Order Choice
Which is more efficient?
45. Execution Order Heuristics
In simple terms:
Identify the path segment operation that promises to return the fewest results
Then identify the operation that promises to return the next-fewest results
It is actually more complicated than this
Need to search an exponential number of orderings to find the most efficient ordering
Heuristics can make this search tractable
46. Path-Segment Ordering Metric
Order the path segment operators to return the fewest results
Rough heuristic:
If predicates θv, θe, and θw are applied to V, E and W, respectively
Start with V and use selectivity factors to estimate execution time
Execution time is:
V * sel(θv) * (E/(ED*V)) * sel(θe) * ((W*ED)/E) * sel(θw)
Or, since the V, E and ED terms cancel: sel(θv) * sel(θe) * sel(θw) * W
Use this formula to determine whether Fox Chases Rabbit should precede or follow Rabbit Eats Lettuce
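The metric can be applied to the two path segments of the example query with a greedy sort. The statistics below (1,000 rabbits, 5,000 lettuces, sel(Rabbit.age < 3) = 0.2, and so on) are invented for illustration, and a plain sort ignores the interaction between segments that share a variable, which is one reason the slide calls the problem more complicated than this. `estimate_long` evaluates the unsimplified product to show that it agrees with the short form.

```python
from dataclasses import dataclass

def estimate_long(V, E, ED, W, sel_v, sel_e, sel_w):
    """The slide's product, term by term, before the V, E and ED terms cancel."""
    return V * sel_v * (E / (ED * V)) * sel_e * ((W * ED) / E) * sel_w

@dataclass
class Segment:
    label: str
    sel_v: float   # sel(θv) on the source vertex type (1.0 if unconstrained)
    sel_e: float   # sel(θe) on the edge type
    sel_w: float   # sel(θw) on the target vertex type
    W: int         # number of vertices of the target type

    def estimate(self) -> float:
        # Simplified form: sel(θv) * sel(θe) * sel(θw) * W
        return self.sel_v * self.sel_e * self.sel_w * self.W

def order_segments(segments):
    """Rough greedy heuristic: evaluate the segment expected to return the fewest results first."""
    return sorted(segments, key=lambda s: s.estimate())

# Hypothetical statistics for "CONSTRAINT Rabbit.age < 3"
plan = order_segments([
    Segment("Rabbit Eats Lettuce", sel_v=0.2, sel_e=1.0, sel_w=1.0, W=5000),
    Segment("Fox Chases Rabbit", sel_v=1.0, sel_e=1.0, sel_w=0.2, W=1000),
])
```

Under these made-up numbers, Fox Chases Rabbit (about 200 estimated results) would precede Rabbit Eats Lettuce (about 1,000).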
47. Outline
Goals & Example Scenario
Related Work and Key Features of GQL
Graph Model and Query Language
Computational Complexity of Query Execution
Future Directions
48. Future Work
Create an operational prototype of a Graph Query Language system
Continue to address query optimization issues
Use ontologies to enrich graph queries
Address language issues
Define the query execution process
Inferences
Ontology to graph mappings
Tie GQL to a graphical interface
Enables analysts to express queries through graphical means
Can leverage several technologies (QGraph, Conceptual Graphs, etc.)
Augment GQL to include Uncertainty, Geospatial and Temporal operators and data structures
49. Backups
50. Costs of Various Path Strategies
Search by Vertex Type
Plain: O(EV)
With indexed Edges: O(V*log(EV/V) + E/ED)
If ED ≈ E (i.e., one edge of type e emanates from each v), then the algorithm tends to operate in time O(V*log(EV/V))
If ED ≈ E and EV ≈ V, the algorithm tends to operate in time O(V)
If ED ≈ E and EV >> V, the algorithm tends to operate in time O(V*log(EV))
If ED << E, then the algorithm tends to operate in time O(E/ED)
With indexed Properties and Edges: O(log(V) + sel(θv)*V*log(EV/V) + sel(θv)*E/ED)
If sel(θv) → 0, the dominant factor is the search for vertices, or O(log(V))
Otherwise, the execution time approaches the times of the previous strategy
Search by Edge Type
Plain: O(E)
Since EVW ≤ EV, the execution time is at least as fast as that of the first algorithm
With indexed Properties: O(log(E) + sel(θe)*E)
If sel(θe) → 0, the algorithm tends to an execution time of O(log(E))
Otherwise, the algorithm tends to an execution time of O(E)
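The backup cost expressions can be compared numerically for a given set of statistics. This sketch evaluates each asymptotic expression as a rough operation count with constant factors ignored; the `lg` floor of one step and the sample statistics are assumptions, not figures from the study.

```python
import math

def lg(x: float) -> float:
    # Assumption: count a log term as at least one step so small ratios never go negative
    return max(1.0, math.log2(x))

def path_strategy_costs(V, EV, E, ED, sel_v, sel_e):
    """Cost expressions for each path strategy, evaluated as rough operation counts."""
    return {
        "vertex, plain": EV,
        "vertex, indexed edges": V * lg(EV / V) + E / ED,
        "vertex, indexed properties and edges":
            lg(V) + sel_v * V * lg(EV / V) + sel_v * E / ED,
        "edge, plain": E,
        "edge, indexed properties": lg(E) + sel_e * E,
    }

def cheapest(costs):
    return min(costs, key=costs.get)

# Hypothetical statistics: a highly selective vertex constraint, a weak edge constraint
costs = path_strategy_costs(V=1000, EV=20000, E=5000, ED=1, sel_v=0.001, sel_e=0.5)
```

With these inputs the vertex strategy with indexed properties and edges wins by a wide margin; raising `sel_v` toward 1 shifts the choice toward the edge-based strategies.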