A Path-based Relational RDF Database. Authors: Akiyoshi Matono, Toshiyuki Amagasa, Masatoshi Yoshikawa, Shunsuke Uemura.
Semantic Web • As the World Wide Web grows ever larger and more complex, the Semantic Web has emerged as a vision of the next generation of the Web. Compared with the current Web, the Semantic Web aims to make human-to-machine and machine-to-machine interactions more intelligent by providing richer, higher-quality metadata about Web resources.
RDF • The Resource Description Framework (RDF), the core of the Semantic Web, describes metadata and semantics for Web resources. As the Semantic Web gains adoption, the storage and retrieval of RDF data have become important problems. • RDF is commonly used for large data sets, such as ontologies and dictionaries. Conventional RDF databases run into problems when processing data at that scale.
RDF • RDF Schema is a specification for defining schema information for RDF data. It lets developers define a particular vocabulary for RDF data and specify the kinds of objects to which it applies. • RDF data can be decomposed into statements, so it can also be modeled as a directed graph in which nodes represent resources and arcs represent relationships. The graph is composed of RDF meta-schema data, RDF schema data, and RDF instance data, and each group consists of instances of the one before it.
The conventional approach • Flat approach: statements are stored flatly • Problems? Queries that involve RDF schema information are not handled properly.
The conventional approach • Creates relational tables for classes and properties, and stores resources according to their classes. • Problems? It makes no distinction between schema and data, so schema queries (as opposed to queries over RDF instance data) cause problems.
The conventional approach • Stores the subject, predicate, and object as keys in three tables. Using these keys, we can retrieve the corresponding statements. • Problems? • Poor performance when processing path-based queries. • Join operations make the query string longer.
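To make the join problem concrete, here is a minimal sketch. It assumes a simplified single `statement` table rather than the three keyed tables described on the slide, and the table name, column names, and sample resources are hypothetical. The point is only that every additional step in a path costs one more self-join.

```python
import sqlite3

# Hypothetical, simplified statement store: one row per RDF statement.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE statement (subject TEXT, predicate TEXT, object TEXT)")
conn.executemany(
    "INSERT INTO statement VALUES (?, ?, ?)",
    [
        ("#picasso", "#paints", "#guernica"),
        ("#guernica", "#title", "Guernica"),
        ("#picasso", "#name", "Pablo Picasso"),
    ],
)

# A two-step path (#paints, then #title) needs one self-join.
# A path of n steps would need n - 1 self-joins.
titles = [
    row[0]
    for row in conn.execute(
        """
        SELECT s2.object
        FROM statement AS s1
        JOIN statement AS s2 ON s2.subject = s1.object
        WHERE s1.predicate = '#paints' AND s2.predicate = '#title'
        """
    )
]
print(titles)
```

With longer paths, both the query text and the number of joins grow linearly, which is the cost the path-based approach tries to avoid.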
Subgraphs • Graph CI: inheritance relationships between classes • Graph PI: inheritance relationships between properties • Graph T: a single-labeled directed acyclic graph • Graph DR: domain (rdfs:domain) or range (rdfs:range) of each property • Graph G: consists of all the remaining statements not included in the above subgraphs • Separates RDF schema information from RDF instance data • The simpler structure is easier to store
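The split into subgraphs can be sketched as a partition of statements by predicate. This is an illustration, not the authors' implementation: the function name and sample data are hypothetical, and routing `rdf:type` statements to graph T is an assumption, since the slide does not say which predicate labels T.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"

# Which predicate goes to which subgraph.
PREDICATE_TO_GRAPH = {
    RDFS + "subClassOf": "CI",
    RDFS + "subPropertyOf": "PI",
    RDF + "type": "T",  # assumption: T is labeled by rdf:type
    RDFS + "domain": "DR",
    RDFS + "range": "DR",
}


def split_into_subgraphs(statements):
    """Partition (subject, predicate, object) triples into CI, PI, T, DR, G."""
    graphs = {name: [] for name in ("CI", "PI", "T", "DR", "G")}
    for s, p, o in statements:
        # Anything not matched by a schema predicate falls into G.
        graphs[PREDICATE_TO_GRAPH.get(p, "G")].append((s, p, o))
    return graphs


statements = [
    ("#Painter", RDFS + "subClassOf", "#Artist"),
    ("#paints", RDFS + "domain", "#Painter"),
    ("#paints", RDFS + "range", "#Painting"),
    ("#picasso", RDF + "type", "#Painter"),
    ("#picasso", "#paints", "#guernica"),
]
subgraphs = split_into_subgraphs(statements)
```

Each resulting graph carries a single kind of relationship (or, for G, only instance data), which is what makes the later per-graph storage simpler.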
Path expression • The arc paths of the graphs are stored in a path table in the relational database
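A rough sketch of how path expressions could be derived from an acyclic set of statements follows. The function name is hypothetical. The string format (arc labels joined by `<`, most recent arc first) is inferred from the `'#title<#paints'` literal in the deck's query, and collecting only paths that start at a root node is a simplifying assumption.

```python
from collections import defaultdict


def path_expressions(statements):
    """Map each node to the set of arc paths that reach it from a root.

    Assumes the graph is acyclic; a cycle would make the walk recurse forever.
    """
    outgoing = defaultdict(list)
    nodes, has_incoming = set(), set()
    for s, p, o in statements:
        outgoing[s].append((p, o))
        nodes.update((s, o))
        has_incoming.add(o)

    paths = defaultdict(set)

    def walk(node, arcs):
        if arcs:
            # Most recent arc first, e.g. "#title<#paints".
            paths[node].add("<".join(reversed(arcs)))
        for p, o in outgoing[node]:
            walk(o, arcs + [p])

    # Roots are nodes with no incoming arc.
    for root in nodes - has_incoming:
        walk(root, [])
    return dict(paths)


statements = [
    ("#picasso", "#paints", "#guernica"),
    ("#guernica", "#title", "Guernica"),
]
paths = path_expressions(statements)
```

Each distinct string would become one row of the path table, and each resource would reference the row for the path that reaches it.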
Extended interval numbering scheme • Add a virtual root if the graph has more than one root node • Add new node(s) for any node that is reachable through multiple paths • Each node is assigned (preorder, postorder, depth) • v is an ancestor of u: pre(v) < pre(u) and post(v) > post(u), where v and u are nodes in the graph • v is a parent of u: v is an ancestor of u, and depth(u) - depth(v) = 1
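The numbering and the two tests can be sketched on a small tree. This sketch covers only the (preorder, postorder, depth) assignment and the ancestor/parent checks; it assumes the graph has already been turned into a tree (virtual root added, multiply reachable nodes duplicated). The function names and the sample class hierarchy are hypothetical.

```python
def number_tree(children, root):
    """Assign (preorder, postorder, depth) to every node of a tree."""
    labels = {}
    counters = {"pre": 0, "post": 0}

    def visit(node, depth):
        counters["pre"] += 1
        pre = counters["pre"]  # preorder rank: assigned on entry
        for child in children.get(node, []):
            visit(child, depth + 1)
        counters["post"] += 1  # postorder rank: assigned on exit
        labels[node] = (pre, counters["post"], depth)

    visit(root, 1)
    return labels


def is_ancestor(labels, v, u):
    """pre(v) < pre(u) and post(v) > post(u)."""
    pre_v, post_v, _ = labels[v]
    pre_u, post_u, _ = labels[u]
    return pre_v < pre_u and post_v > post_u


def is_parent(labels, v, u):
    """v is an ancestor of u and depth(u) - depth(v) = 1."""
    return is_ancestor(labels, v, u) and labels[u][2] - labels[v][2] == 1


children = {
    "Resource": ["Artist", "Artifact"],
    "Artist": ["Painter", "Sculptor"],
    "Artifact": ["Painting"],
}
labels = number_tree(children, "Resource")
```

Because both checks are plain comparisons on three integers, they translate directly into SQL predicates with no recursion or repeated joins.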
Query processing
• Path query - Find the title of something painted by someone:
  SELECT r.resourceName
  FROM path AS p, resource AS r
  WHERE p.pathID = r.pathID AND p.pathexp = '#title<#paints'
• Schema query - Find the names of the classes whose direct superclass is http://www.w3.org/2000/01/rdf-schema#Resource (that is, its direct subclasses):
  SELECT c1.className
  FROM class AS c, class AS c1
  WHERE c.pre < c1.pre AND c.post > c1.post AND c.depth = c1.depth - 1
    AND c.className = 'http://www.w3.org/2000/01/rdf-schema#Resource'
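Both queries can be tried against a toy SQLite database. The table and column names follow the queries on the slide, but the column types, the sample rows, and the `ORDER BY` added for deterministic output are assumptions made for this sketch.

```python
import sqlite3

RESOURCE = "http://www.w3.org/2000/01/rdf-schema#Resource"

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE path (pathID INTEGER PRIMARY KEY, pathexp TEXT);
    CREATE TABLE resource (resourceName TEXT, pathID INTEGER);
    CREATE TABLE class (className TEXT, pre INTEGER, post INTEGER, depth INTEGER);
    """
)
# Hypothetical sample data.
conn.executemany(
    "INSERT INTO path VALUES (?, ?)",
    [(1, "#paints"), (2, "#title<#paints")],
)
conn.executemany(
    "INSERT INTO resource VALUES (?, ?)",
    [("#guernica", 1), ("Guernica", 2)],
)
conn.executemany(
    "INSERT INTO class VALUES (?, ?, ?, ?)",
    [
        (RESOURCE, 1, 6, 1),
        ("#Artist", 2, 3, 2),
        ("#Painter", 3, 1, 3),
        ("#Sculptor", 4, 2, 3),
        ("#Artifact", 5, 5, 2),
        ("#Painting", 6, 4, 3),
    ],
)

# Path query: a single equality test on the stored path expression.
titles = [
    row[0]
    for row in conn.execute(
        """
        SELECT r.resourceName
        FROM path AS p, resource AS r
        WHERE p.pathID = r.pathID AND p.pathexp = '#title<#paints'
        """
    )
]

# Schema query: classes exactly one level below rdfs:Resource.
one_level_below = [
    row[0]
    for row in conn.execute(
        """
        SELECT c1.className
        FROM class AS c, class AS c1
        WHERE c.pre < c1.pre AND c.post > c1.post
          AND c.depth = c1.depth - 1 AND c.className = ?
        ORDER BY c1.pre
        """,
        (RESOURCE,),
    )
]
print(titles, one_level_below)
```

Note that the path query joins `path` and `resource` once regardless of how long the path expression is, in contrast to one self-join per step in a plain statement table.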
Summary & Conclusion • The main goal of the study is to improve performance when retrieving RDF data: path-based querying of relational RDF data is efficient because it reduces the number of joins. The approach works both for RDF data without a schema and for RDF data with a schema. The paper assumes that most RDF data is acyclic. The other thing to note is the extraction of five subgraphs.
Data is stored according to the five subgraphs. The extended interval numbering scheme is used to detect parent-child relationships, which allows fast retrieval of superclasses and subclasses. • The paper notes that most queries over RDF data either detect subgraphs that match a given graph or detect the set of nodes reachable via a given path expression. RDF data can therefore be handled more efficiently with path-based queries.
Why Relational RDF… • Because the flat and hash approaches do not make any distinction between schema information and resource descriptions. • The schema approach can process RDF-based queries, but what about schema-less RDF data? There is also a large overhead in maintaining the schema as it evolves. • Hence, use a relational database and store the RDF data and the schema in separate tables.
Conclusions: Because RDF schema and RDF instance data are stored in distinct relational tables, we • Can handle schema-less RDF data. • Can process schema-based queries (using the extended interval numbering scheme). • Can process path-based expressions, because the RDF data is stored in the relational database based on path expressions.
Also, the performance improvement becomes more dramatic as the length of the path expression increases. Refer to the graph on page 6. • Problems: • Subgraph extraction, the assumption of acyclic data, and no mention of ETL for converting from a conventional store. Not easy to query (compared with SQL).