
RDF-3X : RISC-Style RDF Database Engine

RDF-3X: RISC-Style RDF Database Engine. Thomas Neumann, Gerhard Weikum, PVLDB 2008. 09 Jan 2014, SNU IDB Lab, Woo Hyun Lee.


Presentation Transcript


  1. RDF-3X: RISC-Style RDF Database Engine • Thomas Neumann, Gerhard Weikum, PVLDB 2008 • 09 Jan 2014, SNU IDB Lab, Woo Hyun Lee

  2. Introduction • RDF is schema-less and not normalized • RDF is expensive • RDF is for the Semantic Web • RDF needs an effective system • RDF-3X is effective and efficient

  3. RDF-3X*: Existing RDF Systems • Triples are stored in a relational database (RDB) • Type 1: Triples table • All triples in a single table with 3 columns (Subject, Predicate, Object) • Type 2: Property table • Triples grouped by predicate • Type 3: Clustered property table • Clustered by correlated predicates, entity class, or occurrence statistics • [Figure: RDF triples with a Capital predicate] • * Thomas Neumann, Gerhard Weikum, RDF-3X: a RISC-style Engine for RDF, PVLDB '08

  4. RDF-3X: RDF Storage • Huge Triples Table • All triples stored in a clustered B+ tree in lexicographical order • Fast Range scan • Mapping Dictionary • All literals are mapped to IDs • Compressed • Simple query processing
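
The storage idea on this slide (dictionary-encode every term, keep the ID triples sorted, answer lookups with a range scan) can be sketched in a few lines. This is only an illustration under stated assumptions: a sorted Python list stands in for the clustered B+ tree, the names Dictionary, build_spo and range_scan are hypothetical, and the sample triples other than the Malaysia example are made up.

```python
from bisect import bisect_left


class Dictionary:
    """Maps each distinct term (URI or literal) to a dense integer ID."""

    def __init__(self):
        self.to_id = {}
        self.to_str = []

    def encode(self, term):
        # Assign the next free ID the first time a term is seen.
        if term not in self.to_id:
            self.to_id[term] = len(self.to_str)
            self.to_str.append(term)
        return self.to_id[term]

    def decode(self, term_id):
        return self.to_str[term_id]


def build_spo(triples, dictionary):
    """Encode string triples as ID triples and sort them lexicographically."""
    return sorted(tuple(dictionary.encode(t) for t in triple) for triple in triples)


def range_scan(index, prefix):
    """Yield all ID triples in the sorted index that start with `prefix`."""
    start = bisect_left(index, prefix)  # first position >= prefix
    for triple in index[start:]:
        if triple[:len(prefix)] != prefix:
            break  # left the contiguous run of matches
        yield triple


d = Dictionary()
spo = build_spo(
    [
        ("Malaysia", "Capital", "Kuala Lumpur"),
        ("Korea", "Capital", "Seoul"),        # made-up sample data
        ("Malaysia", "Continent", "Asia"),    # made-up sample data
    ],
    d,
)
prefix = (d.encode("Malaysia"),)
matches = [tuple(d.decode(i) for i in t) for t in range_scan(spo, prefix)]
```

Because all triples with the same subject are adjacent in the sorted order, the scan touches one contiguous run and only compares integers; strings are looked up again only when results are decoded.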

  5. RDF-3X: RISC-Style RDF Storage • RISC (Reduced Instruction Set Computing) • Proposed by John Cocke (IBM) in 1974 • 20% of all instructions do 80% of the work • Use simple instructions • Simplification leads to more intuitive processing and less overhead • RISC-Style RDF Storage • Reduced complexity • Mapping dictionary • Convert literals to integer-based IDs • Compare IDs • Produce streams of ID tuples • Compressed triples

  6. RDF-3X: Compressed Index • Six separate indexes • (SPO, SOP, OSP, OPS, PSO, POS) • Stored in the leaf pages of the clustered B+ tree • [Figure: triple index; patterns ?var <P> <O>, <S> ?var <O>, <S> <P> ?var and ?var <P> ?var matched against the SPO, SOP, PSO, POS, OSP and OPS indexes]

  7. RDF-3X: Compressed Index • SPO: <Malaysia> <Capital> <Kuala Lumpur> • SOP: <Malaysia> <Kuala Lumpur> <Capital> • PSO: <Capital> <Malaysia> <Kuala Lumpur> • POS: <Capital> <Kuala Lumpur> <Malaysia> • OSP: <Kuala Lumpur> <Malaysia> <Capital> • OPS: <Kuala Lumpur> <Capital> <Malaysia> • SPARQL patterns: ?X <Capital> <Kuala Lumpur> and <Malaysia> ?X <Kuala Lumpur>
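
With all six orderings available, any triple pattern can be answered by an index whose sort order begins with the pattern's bound positions, so the bound values form a scan prefix. A minimal sketch of that choice follows; choose_index is a hypothetical helper, and picking POS over OPS (both would work for the first pattern) is an arbitrary simplification rather than how the engine decides.

```python
def choose_index(subject=None, predicate=None, obj=None):
    """Return an index name whose sort order starts with the bound positions."""
    slots = (("S", subject), ("P", predicate), ("O", obj))
    bound = [name for name, value in slots if value is not None]
    free = [name for name, value in slots if value is None]
    # Bound columns first (they become the scan prefix), variables last.
    return "".join(bound + free)


# ?X <Capital> <Kuala Lumpur>
first = choose_index(predicate="Capital", obj="Kuala Lumpur")
# <Malaysia> ?X <Kuala Lumpur>
second = choose_index(subject="Malaysia", obj="Kuala Lumpur")
```

Here the first pattern maps to POS and the second to SOP, so each becomes a single prefix scan instead of a filter over the whole triples table.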

  8. RDF-3X: Compressed Index • Triples are stored in collation order • Neighboring triples are very similar • Store only the change between triples • Compression using LZ77

  9. RDF-3X: Compressed Index • Compression • Stores only the change (δ) between triples • B+ tree index: <Malaysia, Capital, K-L>, <Malaysia, Capital, KL>, <Malaysia, Capital, Kuala Lumpur> • Compressed B+ tree index: <651, 954, 260>, <651, 954, 270>, <651, 954, 275>
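
The "store only the change" idea can be illustrated with the three ID triples from this slide. The sketch below is a simplification and not the engine's actual byte-level format: each triple is recorded as the number of leading components it shares with its predecessor, the gap in the first differing component, and the remaining components verbatim. The function names are hypothetical.

```python
def delta_encode(sorted_triples):
    """Encode each triple as (shared_prefix_len, gap, remaining_values)."""
    encoded = []
    prev = (0, 0, 0)  # implicit predecessor of the first triple
    for triple in sorted_triples:
        k = 0
        # Count leading components equal to the previous triple (at most 2).
        while k < 2 and triple[k] == prev[k]:
            k += 1
        encoded.append((k, triple[k] - prev[k], triple[k + 1:]))
        prev = triple
    return encoded


def delta_decode(encoded):
    """Rebuild the original triples from the delta representation."""
    triples = []
    prev = (0, 0, 0)
    for k, gap, rest in encoded:
        triple = prev[:k] + (prev[k] + gap,) + tuple(rest)
        triples.append(triple)
        prev = triple
    return triples


triples = [(651, 954, 260), (651, 954, 270), (651, 954, 275)]
encoded = delta_encode(triples)
```

For the slide's example, the second and third triples shrink to the small gaps 10 and 5, which is what makes a compact variable-length encoding worthwhile in a sorted index.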

  10. RDF-3X: Compressed Indexes • Leaf-level compression • Triples can be read directly (less decompression cost) • Easy updates • Better concurrency control and recovery • [Figure: <Previous Approach> vs. <RISC Style Approach>; leaf-level compressed B+ tree index vs. chunk-level compressed B+ tree index]
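
One way to picture why leaf-level compression keeps reads cheap: if every leaf page is compressed on its own and starts with a full triple, a lookup can jump to the right page and decode only that page. The following is a rough sketch under that assumption; PAGE_SIZE, the page layout, and the function names are hypothetical, the triples beyond the slide 9 example are made up, and plain component-wise differences stand in for the real encoding.

```python
from bisect import bisect_right

PAGE_SIZE = 2  # triples per leaf page; tiny on purpose (hypothetical value)


def build_pages(sorted_triples):
    """Split sorted triples into pages and delta-compress each page separately."""
    pages = []
    for i in range(0, len(sorted_triples), PAGE_SIZE):
        chunk = sorted_triples[i:i + PAGE_SIZE]
        first = chunk[0]  # stored in full, so the page is self-contained
        deltas = [
            tuple(c - p for c, p in zip(cur, prev))
            for prev, cur in zip(chunk, chunk[1:])
        ]
        pages.append((first, deltas))
    return pages


def read_page(page):
    """Decode a single page without looking at any other page."""
    first, deltas = page
    triples = [first]
    for delta in deltas:
        triples.append(tuple(p + d for p, d in zip(triples[-1], delta)))
    return triples


def lookup(pages, triple):
    """Check membership by decoding only the one page that could hold the triple."""
    first_keys = [first for first, _ in pages]
    page_no = bisect_right(first_keys, triple) - 1
    if page_no < 0:
        return False  # smaller than every stored triple
    return triple in read_page(pages[page_no])


pages = build_pages(
    [(651, 954, 260), (651, 954, 270), (651, 954, 275), (651, 960, 1), (700, 1, 1)]
)
```

Since no delta ever refers to a triple on another page, an update rewrites one page only, which is consistent with the slide's points about easy updates and simpler concurrency control.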

  11. RDF-3X: Aggregated Indexes [1/2] • For many SPARQL patterns • Indexing partial triples rather than full triples would be sufficient • SELECT ?a ?b WHERE { ?a ?b ?c } • Two-value indexes • Two of three columns of a triple • (value1, value2, count) • (SP, PS, SO, OS, PO, OP) • One-value indexes • One of three columns of a triple • (value, count) • (S, P, O)

  12. RDF-3X: Aggregated Indexes [2/2] • SELECT ?a ?c WHERE { ?a ?b ?c . } • [Figure: aggregated indexes]
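
The aggregated indexes described on the last two slides can be sketched as projections of the full triples with an occurrence count attached. This is an illustrative sketch only: aggregate is a hypothetical helper, the ID triples are made-up sample data, and sorted lists stand in for B+ trees.

```python
from collections import Counter


def aggregate(triples, positions):
    """Project triples onto `positions` and count how many triples share each key."""
    counts = Counter(tuple(t[p] for p in positions) for t in triples)
    # Each entry is the projected values followed by the count, in sorted order.
    return sorted((*key, count) for key, count in counts.items())


triples = [
    (651, 954, 260),
    (651, 954, 270),
    (651, 960, 260),
    (700, 954, 260),
]
sp_index = aggregate(triples, (0, 1))  # two-value index: (S, P, count)
so_index = aggregate(triples, (0, 2))  # two-value index: (S, O, count)
s_index = aggregate(triples, (0,))     # one-value index: (S, count)
```

A query such as SELECT ?a ?c WHERE { ?a ?b ?c . } can then read the (S, O, count) entries and repeat each pair count times, which yields the same number of result rows as scanning the full triples while reading less data.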

  13. Conclusion • Redundancy • From simple single index to multiple duplicated indices • Complex compression algorithm • High computation cost • Need for Index • Cheaper & Faster Hardware • Reliability • Not verified
