Triangle Finding: How Graph Theory Can Help the Semantic Web
Edward Jimenez, Eric Goodman
Optimizing Queries with Graph Theory
Query9
SELECT ?X ?Y ?Z WHERE {
  ?X rdf:type ub:Student .
  ?Y rdf:type ub:Faculty .
  ?Z rdf:type ub:Course .
  ?X ub:advisor ?Y .
  ?Y ub:teacherOf ?Z .
  ?X ub:takesCourse ?Z }
Query2
SELECT ?X ?Y ?Z WHERE {
  ?X rdf:type ub:GraduateStudent .
  ?Y rdf:type ub:University .
  ?Z rdf:type ub:Department .
  ?X ub:memberOf ?Z .
  ?Z ub:subOrganizationOf ?Y .
  ?X ub:undergraduateDegreeFrom ?Y }
• Graph theory has a lot to offer the Semantic Web
• One example: triangle finding
  • O(|E|^1.5)
  • Much more efficient than what a typical database would do
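Both queries have the same shape once the rdf:type constraints are set aside: three variables, each pair joined by one triple pattern. The Python sketch below checks that shape. The helper name is_triangle_pattern and the tuple encoding of the patterns are illustrative assumptions, not part of any SPARQL engine.

```python
from itertools import combinations

# Variable-to-variable triple patterns from the two queries
# (the rdf:type constraints are left out on purpose).
QUERY9 = [
    ("?X", "ub:advisor", "?Y"),
    ("?Y", "ub:teacherOf", "?Z"),
    ("?X", "ub:takesCourse", "?Z"),
]
QUERY2 = [
    ("?X", "ub:memberOf", "?Z"),
    ("?Z", "ub:subOrganizationOf", "?Y"),
    ("?X", "ub:undergraduateDegreeFrom", "?Y"),
]


def is_triangle_pattern(patterns):
    """Return True if the patterns join exactly three variables pairwise."""
    variables = sorted({term for s, _, o in patterns for term in (s, o)})
    joined = {frozenset((s, o)) for s, _, o in patterns if s != o}
    wanted = {frozenset(pair) for pair in combinations(variables, 2)}
    return len(variables) == 3 and joined == wanted


if __name__ == "__main__":
    print(is_triangle_pattern(QUERY9), is_triangle_pattern(QUERY2))
```

A two-pattern path such as ?X → ?Y → ?Z fails the check, because the ?X/?Z pair is never joined.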
Experiment
• Compare three approaches to finding all triangles in a graph
  • Sesame
  • Jena
  • MultiThreaded Graph Library (MTGL)
• MTGL
  • Open-source library of graph algorithms, targeted at shared-memory supercomputers
  • Used MTGL’s implementation of J. Cohen’s triangle-finding algorithm
  • Had to modify it slightly to allow for multiple edges between vertices
Data
[Figure: adjacency matrix recursively divided into quadrants a, b, c, d]
• Data: a Recursive Matrix (R-MAT) graph
• Specify
  • |V|
  • Edge factor (average number of edges per vertex)
  • Probabilities a, b, c, d, where a + b + c + d = 1
• Has properties similar to real-world graphs, such as short diameters and small-world properties
• Used as the basis of the Graph500 benchmark
• Nodes are given a unique IRI and edges are given a random value
• |V|: 2^5 to 2^19
• Edge factor: {16, 32, 64}
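The R-MAT recipe above can be sketched in a few lines of Python: each edge is placed by descending through the adjacency matrix one quadrant at a time, choosing a quadrant with probabilities a, b, c, d at every level. The function name rmat_edges, the seed parameter, and the default probabilities (the values commonly quoted for Graph500) are assumptions for illustration; the slides do not say which probabilities the experiment used.

```python
import random


def rmat_edges(scale, edge_factor, a=0.57, b=0.19, c=0.19, d=0.05, seed=0):
    """Generate edge_factor * 2**scale directed edges over 2**scale vertices.

    The default a, b, c, d are an assumption (Graph500-style values),
    not the settings used in the experiment described above.
    """
    if abs(a + b + c + d - 1.0) > 1e-9:
        raise ValueError("a + b + c + d must equal 1")
    rng = random.Random(seed)
    num_vertices = 1 << scale
    edges = []
    for _ in range(edge_factor * num_vertices):
        src = dst = 0
        for _ in range(scale):
            # Each level picks one quadrant and contributes one bit per endpoint.
            src <<= 1
            dst <<= 1
            r = rng.random()
            if r < a:
                pass              # top-left quadrant
            elif r < a + b:
                dst |= 1          # top-right quadrant
            elif r < a + b + c:
                src |= 1          # bottom-left quadrant
            else:
                src |= 1          # bottom-right quadrant
                dst |= 1
        edges.append((src, dst))
    return edges


if __name__ == "__main__":
    sample = rmat_edges(scale=5, edge_factor=16)
    print(len(sample), "edges over", 1 << 5, "vertices")
```

Because edges are drawn independently, the same vertex pair can be produced more than once, which is consistent with the need to handle multiple edges between vertices.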
Trying to Find Triangles via SPARQL
SELECT ?X ?Y ?Z WHERE {
  { ?X ?a ?Y . ?Y ?b ?Z . ?Z ?c ?X }
  UNION { ?Y ?a ?X . ?Z ?b ?Y . ?X ?c ?Z }
  UNION { ?X ?a ?Y . ?Y ?b ?Z . ?X ?c ?Z }
  UNION { ?X ?a ?Y . ?Z ?b ?Y . ?X ?c ?Z }
  UNION { ?Y ?a ?X . ?Y ?b ?Z . ?X ?c ?Z }
  UNION { ?Y ?a ?X . ?Z ?b ?Y . ?Z ?c ?X }
  UNION { ?X ?a ?Y . ?Z ?b ?Y . ?Z ?c ?X }
  UNION { ?Y ?a ?X . ?Y ?b ?Z . ?Z ?c ?X } }
Redundant solutions: the eight branches cover every orientation of the three edges, so the same triangle can be returned under more than one binding.
The Problem: Graph Isomorphism
[Figure: query patterns iii and iv, each matched against a triangle on Alice, Bob, and Charlie]
• Bindings shown: ?X = Alice, ?Y = Charlie, ?Z = Bob
• Bindings shown: ?X = Alice, ?Y = Bob, ?Z = Charlie
The Other Problem: Automorphism
[Figure: query pattern i matched against a triangle on Alice, Bob, and Charlie]
• Bindings shown: ?X = Alice, ?Y = Bob, ?Z = Charlie
• Bindings shown: ?X = Charlie, ?Y = Alice, ?Z = Bob
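Both problems can be reproduced with a brute-force pattern matcher. The Python sketch below is illustrative only: the function name bindings and the pattern names are hypothetical, and the assumption that the two isomorphic patterns are the "transitive" and "fan-in" orientations is inferred from the bindings listed on the slides.

```python
from itertools import product


def bindings(pattern, edges):
    """Return every (?X, ?Y, ?Z) assignment for which all pattern edges exist."""
    nodes = sorted({node for edge in edges for node in edge})
    found = []
    for x, y, z in product(nodes, repeat=3):
        env = {"X": x, "Y": y, "Z": z}
        if all((env[s], env[o]) in edges for s, o in pattern):
            found.append((x, y, z))
    return found


# Edge orientations written as (subject variable, object variable).
CYCLIC = [("X", "Y"), ("Y", "Z"), ("Z", "X")]
TRANSITIVE = [("X", "Y"), ("Y", "Z"), ("X", "Z")]
FAN_IN = [("X", "Y"), ("Z", "Y"), ("X", "Z")]

cycle = {("Alice", "Bob"), ("Bob", "Charlie"), ("Charlie", "Alice")}
acyclic = {("Alice", "Bob"), ("Bob", "Charlie"), ("Alice", "Charlie")}

if __name__ == "__main__":
    # Automorphism: one cyclic triangle, three rotated bindings.
    print(bindings(CYCLIC, cycle))
    # Isomorphism: two differently written patterns hit the same triangle.
    print(bindings(TRANSITIVE, acyclic), bindings(FAN_IN, acyclic))
```

The cyclic pattern returns one triangle three times because rotating the variables maps the pattern onto itself. The two acyclic patterns are the same shape with the variables renamed, so a UNION of both reports the triangle twice.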
The SPARQL Query
SELECT ?X ?Y ?Z WHERE {
  { ?X ?a ?Y . ?Y ?b ?Z . ?Z ?c ?X
    FILTER (STR(?X) < STR(?Y))
    FILTER (STR(?Y) < STR(?Z)) }
  UNION
  { ?X ?a ?Y . ?Y ?b ?Z . ?Z ?c ?X
    FILTER (STR(?Y) > STR(?Z))
    FILTER (STR(?Z) > STR(?X)) }
  UNION
  { ?X ?a ?Y . ?Y ?b ?Z . ?X ?c ?Z } }
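The effect of the filters can be checked by mimicking the three UNION branches outside a SPARQL engine. The Python sketch below is a simulation under stated assumptions: no self-loops, at most one edge per vertex pair, and Python string ordering standing in for the comparison of STR() values. The function name query_triangles is hypothetical.

```python
from itertools import permutations


def query_triangles(edges):
    """Mimic the three UNION branches of the filtered query on a set of edges."""
    nodes = sorted({node for edge in edges for node in edge})
    results = []
    for x, y, z in permutations(nodes, 3):
        cyclic = (x, y) in edges and (y, z) in edges and (z, x) in edges
        transitive = (x, y) in edges and (y, z) in edges and (x, z) in edges
        # Branch 1: cyclic pattern, ?X smallest and ?Y < ?Z.
        if cyclic and x < y < z:
            results.append((x, y, z))
        # Branch 2: cyclic pattern, ?X smallest and ?Y > ?Z.
        if cyclic and y > z > x:
            results.append((x, y, z))
        # Branch 3: acyclic pattern, which has a unique source and sink.
        if transitive:
            results.append((x, y, z))
    return results


if __name__ == "__main__":
    graph = {
        ("Alice", "Bob"), ("Bob", "Charlie"), ("Charlie", "Alice"),  # cyclic
        ("Dave", "Erin"), ("Erin", "Frank"), ("Dave", "Frank"),      # acyclic
    }
    print(query_triangles(graph))
```

The two filtered branches keep only the rotation of a cyclic triangle in which ?X is the smallest name, so each cyclic triangle appears once. The acyclic branch needs no filter because only one assignment fits its source, middle, and sink.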
Cohen’s Triangle Algorithm
• Assumptions
  • Simplified graph
  • Completely connected
• Map 1: O(m)
  • Use v1 < v2 < ··· < vn for tie-breaking
Cohen’s Triangle Algorithm
<v1,v2>, <v1,v3>
<v1,v2>, <v1,v4>
…
<v1,v2>, <v1,vn>
• Reduce: O(m^(3/2))
Cohen’s Triangle Algorithm
[Figure: contents of the <v8, v20> bin, drawn with vertices v8, v20, v1, v2, v3, …]
• Map 2: O(m^(3/2))
  • Identity mapping of previous reduce step
  • Map edges
• Reduce 2: O(m^(3/2))
  • Emit triangles for the contents of each <vi, vj> bin when the edge exists between vi and vj
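The two map/reduce rounds above can be imitated in a single process with dictionaries standing in for the bins. The Python sketch below is a simplified illustration, not MTGL's implementation. It assumes the usual binning rule for this algorithm (each edge goes to its lower-degree endpoint, with vertex order breaking ties), which the slides do not spell out, and it collapses duplicate edges rather than handling them as the modified MTGL version did. The function name cohen_triangles is hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def cohen_triangles(edges):
    """Return each triangle of an undirected graph once, as a sorted tuple."""
    # Reduce the input to a simple undirected graph (assumption: drop
    # self-loops and collapse duplicate or reversed edges).
    simple = {tuple(sorted(edge)) for edge in edges if edge[0] != edge[1]}
    degree = defaultdict(int)
    for u, v in simple:
        degree[u] += 1
        degree[v] += 1

    def rank(vertex):
        # Lower degree first; vertex order breaks ties.
        return (degree[vertex], vertex)

    # Map 1: send each edge to the bin of its lower-ranked endpoint.
    bins = defaultdict(list)
    for u, v in simple:
        low, high = sorted((u, v), key=rank)
        bins[low].append(high)

    # Reduce 1: every pair of edges in a bin is an open triad, keyed by
    # the pair of endpoints that would close it.
    triads = defaultdict(list)
    for apex, others in bins.items():
        for p, q in combinations(sorted(others), 2):
            triads[(p, q)].append(apex)

    # Map 2 passes triads through and maps edges to the same keys.
    # Reduce 2: a bin yields triangles only if its closing edge exists.
    triangles = []
    for (p, q), apexes in triads.items():
        if (p, q) in simple:
            triangles.extend(tuple(sorted((apex, p, q))) for apex in apexes)
    return sorted(triangles)


if __name__ == "__main__":
    k4 = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]
    print(cohen_triangles(k4))
```

Binning by the lower-ranked endpoint means a triangle's open triad is generated only at its lowest-ranked vertex, so each triangle is emitted once, and high-degree vertices receive few edges, which is what keeps the number of triads near m^(3/2).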
Comparison at Larger Scales
• With 1 billion edges, assuming the same constant factor:
  • An O(x^1.39) implementation is about 50x faster than an O(x^1.58) one
  • An O(x^1.39) implementation is about 9000x faster than an O(x^1.83) one
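The two ratios follow directly from the exponents: with equal constants, the running-time ratio at x edges is x^(slow exponent − fast exponent). A short Python check, with the helper name speedup chosen here for illustration:

```python
def speedup(num_edges, slow_exponent, fast_exponent):
    """Ratio of running times num_edges**slow / num_edges**fast (equal constants)."""
    return num_edges ** (slow_exponent - fast_exponent)


EDGES = 10**9

vs_158 = speedup(EDGES, 1.58, 1.39)  # about 51
vs_183 = speedup(EDGES, 1.83, 1.39)  # about 9,100

if __name__ == "__main__":
    print(f"x^1.58 vs x^1.39: {vs_158:.0f}x")
    print(f"x^1.83 vs x^1.39: {vs_183:.0f}x")
```

The figures on the slide are these values rounded: 10^(9 × 0.19) ≈ 51 and 10^(9 × 0.44) ≈ 9,100.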
Conclusions
• The Semantic Web is a graph
• Graph theory can add a lot in terms of speeding up queries
• It also has other approaches for analyzing the data
• SPARQL has unexpected issues when graph isomorphisms or automorphisms arise