RDF languages and storages part 2 - indexing semi-structure data

RDF languages and storagespart 2 - indexing semi-structure data Maciej JanikConrad IbanezCSCI 8350, Fall 2004

Outline • Jena storage • Indexing techniques

Jena • Implemented in Java • One of the most popularly used RDF storages and query engines • Supports RDF, RDFS and OWL • In memory and persistent storage (Oracle, MySQL, PostgreSQL) • RDQL • Reasoning/inference engine

Jena - storage schema • Previous version used normalized relational DB tables • statements • literals • resources • Taken approach to store triples as (Subject, Predicate, Object) in denormalized tables • Optimization for common statement patterns - grouping of properties

Normalized tables Denormalized Jena - storage „Efficient RDF Storage and Retrieval in Jena2” - Wilkinson et al.

Jena - storage • Do certain trade-off for space and search time • Cluster properties that are likely to be accessed together - optimize for common patterns • Special treatment of reified statements

Jena - graph abstraction • Graph interface is separated from (persistent) triple storage layer • Special support for different types of graphs - optimized for performance • Support operations like add, delete, find.

Jena - query processing • Converting multiple patterns in query into one query to DB • Use DB query optimizer instead of executing multiple queries from Jena level (as it was in Jena1) • Associate a table with pattern (best) or span pattern between tables (requires join operation) • Query may span between different graphs, but it can be optimized only if they are in the same database

What to index? How to index?

Indexing semistructured data • XML cannot be indexed directly as relational DB • Indexing may take advantage of tree structure • depth of node • common path from the root • convert each path to string expression • precalculate the path tree

Indexing semistructured data • Idea is based on Particia’s trie • Index should scale with the growth of data • Path together with leaf is encoded into string -> the Index Fabric „A Fast Index for Semistructured Data” - Brian F. Cooper et al.

A Layered Index „A Fast Index for Semistructured Data” - Brian F. Cooper et al.

Index Fabric • Index is used to accelerate path expressions - mainly for queries that ask for root-to-leaf path • Idea of prefix encoding • xml: <A>alpha<B>beta<C>gamma</C></B></A> • paths: <A>alpha ; <A><B>beta ; <A><B><C>gamma • encoded: A alpha ; A B beta ; A B C gamma • infix (not common): A alpha B beta C gamma • Convert path to string for fast searches • Replace tags with ‘non-terminal’ characters (like in automata)

Index Fabric - raw paths „A Fast Index for Semistructured Data” - Brian F. Cooper et al.

Graphs - how to index? http://www.aisee.com/ Backbone

Graphs - how to index? http://www.aisee.com/ Tree-type - prefixes - tries

2-index 1-index Graphs - how to index? T-index Path templates „Index Structure for Path Expressions” - Tova Milo, Dan Suciu

Graphs - how to index? http://www.aisee.com/ Landmarks

Indexing - summary • Indexing semistructure data • index fabric - encoding, multilayered • common prefixes - trie structure • backbone - highways between points • landmarks - county division • path templates - precalculated expressions • clustering - grouping by theme access • Indexing such data is NOT easy, solution depends how you want to search the graph

References • „Efficient RDF Storage and Retrieval in Jena2” - Kevin Wilkinson, Craig Sayers, Harumi Kuno, Dave Reynolds • „A Fast Index for Semistructured Data” - Brian F. Cooper, Neal Sample, Michael J. Franklin, Gisli Hjaltason, Moshe Shadmon • „Index Structures for Path Expressions” - Tova Milo, Dan Suciu

RDF languages and storages part 2 - indexing semi-structure data

RDF languages and storages part 2 - indexing semi-structure data

Presentation Transcript

XML Indexing Structure

Data Indexing

RDF storages and indexes

Capital Structure: Part 2

Data Structure (Part III)

Data Structure (Part I)

Semi-Indexing Semi-Structured Data (in tiny space)

Data Indexing

5. File Structure and Indexing

RDF and Linked Data

Data Structure and c Part-5

Data Structure and C Part-6

Magnetic Data Storages

Domain Specific Languages (Part 2)

Data Structure (Part II)

RDF Query Languages

Capital Structure: Part 2

Designing Indexing Structure for Discovering Relationships in RDF Graphs

Indexing Languages

Types of Indexing Languages

Subject languages part 2: