Optimized Index Structures for Querying RDF from the Web

Andreas Harth, Stefan Decker (andreas.harth@deri.org). 3rd Latin American Web Congress.

Presentation Transcript


  1. Andreas Harth, Stefan Decker andreas.harth@deri.org 3rd Latin American Web Congress Optimized Index Structures for Querying RDF from the Web

  2. Contents Scenario Index Structure Query Processing Implementation Evaluation Summary

  3. Scenario • Data collected from the Web • Much instance data, little ontology data • Large amounts of data with mostly unknown schemas must be stored; fast retrieval is essential

  4. Example RDF with Context

  5. Notation 3 • #s #p #o . syntax for RDF • E.g. @prefix foaf: <http://xmlns.com/foaf/0.1/> . <http://decker.cn/stefan/> foaf:name "Stefan Decker" . • N3 is an extension of the RDF data model with quotation of graphs and universally quantified variables • Able to express queries with ql:select and ql:where predicates

  6. Example Query • Get all triples where predicate is foaf:name and object is "Stefan Decker" @prefix foaf: <http://xmlns.com/foaf/0.1/> . @prefix yars: <http://sw.deri.org/2004/06/yars#> . @prefix ql: <http://www.w3.org/2004/12/ql#> . <> ql:where { ?s foaf:name "Stefan Decker" . } .

  7. Contents • Scenario • Index Structure • Query Processing • Implementation • Evaluation • Summary

  8. Indexes • Lexicon: store mappings from literal values and resources to object IDs and vice versa • Quad Index: store quads (s, p, o, c)

  9. Object Identifiers • OIDs help to save space (only a 64-bit OID needs to be stored instead of the whole resource/literal every time it occurs)
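
The lexicon and OID idea from the two slides above can be sketched roughly as follows. This is a minimal in-memory illustration, not the authors' implementation: the class name `Lexicon`, the helper `pack_quad`, and the context URI in the usage note are hypothetical, and a real store would persist both mappings on disk.

```python
import struct


class Lexicon:
    """Bidirectional mapping between RDF terms (strings) and integer OIDs."""

    def __init__(self):
        self._term_to_oid = {}   # term -> OID
        self._oid_to_term = []   # OID -> term (list position is the OID)

    def oid(self, term):
        """Return the OID for a term, assigning the next free OID if it is new."""
        existing = self._term_to_oid.get(term)
        if existing is not None:
            return existing
        new_oid = len(self._oid_to_term)
        self._term_to_oid[term] = new_oid
        self._oid_to_term.append(term)
        return new_oid

    def term(self, oid):
        """Return the term that was assigned the given OID."""
        return self._oid_to_term[oid]

    def encode(self, quad):
        """Translate a quad of terms into a quad of OIDs."""
        return tuple(self.oid(t) for t in quad)

    def decode(self, quad):
        """Translate a quad of OIDs back into a quad of terms."""
        return tuple(self.term(o) for o in quad)


def pack_quad(oid_quad):
    """Pack four OIDs as unsigned 64-bit big-endian integers (32 bytes total)."""
    return struct.pack(">4Q", *oid_quad)
```

For example, encoding the quad `("<http://decker.cn/stefan/>", "foaf:name", '"Stefan Decker"', "<http://example.org/ctx>")` yields four small integers, and `pack_quad` stores them in 32 bytes regardless of how long the original URIs and literals are.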

  10. Keyword Index • Simple search UIs require keyword search • Inverted index on literals
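
An inverted index on literals, as mentioned above, could look like the following sketch. The tokenization rule (lowercased word characters) and the AND semantics for multi-word queries are assumptions made for illustration; the slides do not specify either.

```python
import re
from collections import defaultdict


class KeywordIndex:
    """Inverted index: token -> set of OIDs of literals containing that token."""

    def __init__(self):
        self._postings = defaultdict(set)

    @staticmethod
    def tokenize(text):
        # Assumed tokenization: lowercase runs of word characters.
        return re.findall(r"\w+", text.lower())

    def add(self, literal_oid, text):
        """Index one literal under each of its tokens."""
        for token in self.tokenize(text):
            self._postings[token].add(literal_oid)

    def search(self, query):
        """Return OIDs of literals that contain every token of the query."""
        tokens = self.tokenize(query)
        if not tokens:
            return set()
        result = None
        for token in tokens:
            postings = self._postings.get(token, set())
            result = set(postings) if result is None else result & postings
        return result
```

The OIDs returned by `search` can then be used as bound object values in ordinary quad lookups.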

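  11. Quad Access Patterns • We want to be able to retrieve quads for every combination of bound and unbound positions in (s, p, o, c) without performing a join • Each of the four positions is either bound or a variable, so in total 2*2*2*2 = 16 access patterns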

  12. Recap: B+-Trees • Underlying storage technique based on B+-trees • One property of B+-trees: range lookups/prefix lookups • (key, value) pairs with fast retrieval on given (partial) key
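
The prefix-lookup property can be illustrated without a real B+-tree. The sketch below stands in a sorted Python list for the tree's ordered leaf level and uses binary search to find the start of the matching range; the function name `prefix_lookup` is hypothetical.

```python
from bisect import bisect_left


def prefix_lookup(sorted_keys, prefix):
    """Return all keys in sorted_keys (sorted tuples) that start with prefix."""
    # A shorter tuple sorts before any longer tuple it is a prefix of,
    # so bisect_left lands on the first candidate key.
    start = bisect_left(sorted_keys, prefix)
    results = []
    for i in range(start, len(sorted_keys)):
        key = sorted_keys[i]
        if key[:len(prefix)] != prefix:
            break  # keys are sorted, so the matching range has ended
        results.append(key)
    return results
```

Because matching keys are contiguous in sort order, the lookup costs one search plus a sequential scan over the results, which is the behavior the slide relies on.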

  13. Complete Index on Quads • Given prefix lookup capabilities, only 6 indexes are needed to cover all access patterns
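
The claim that six indexes suffice can be checked mechanically. The sketch below uses one possible set of six key orderings; only POCS is named in the slides, so the other five are an assumption chosen for illustration. An ordering serves an access pattern when the bound positions form a prefix of its key.

```python
from itertools import combinations

POSITIONS = "spoc"

# One possible choice of six orderings (assumed; only "pocs" appears in the slides).
INDEXES = ["spoc", "pocs", "ocsp", "cspo", "cpso", "ospc"]


def covering_index(bound, indexes=INDEXES):
    """Return the first ordering whose leading fields are exactly the bound
    positions, or None if no ordering can answer the pattern by prefix lookup."""
    bound = frozenset(bound)
    for order in indexes:
        if frozenset(order[:len(bound)]) == bound:
            return order
    return None


def all_patterns():
    """Yield all 16 subsets of bound positions, from none bound to all bound."""
    for size in range(len(POSITIONS) + 1):
        for combo in combinations(POSITIONS, size):
            yield frozenset(combo)
```

Six is also the minimum with this approach: there are six ways to bind exactly two of the four positions, and each ordering has only one two-field prefix, so fewer than six orderings must leave some two-position pattern uncovered.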

  14. Occurrence Counts • Queries for in-degree/out-degree of a node in a graph • Also: statistics for join reordering • Store occurrence counts for quad patterns directly in index
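
Occurrence counts per key prefix can be sketched as below. The slide says the counts are stored directly in the index; this illustration keeps them in a separate `Counter` for brevity, and the class name `CountingIndex` is hypothetical. It assumes each quad is added only once.

```python
from collections import Counter


class CountingIndex:
    """Keeps, for one key ordering, the number of quads under every key prefix."""

    def __init__(self, order):
        self.order = order        # key ordering, e.g. "pocs"
        self.counts = Counter()   # key prefix (tuple of OIDs) -> number of quads

    def add(self, quad):
        """Register a quad given as OIDs in (s, p, o, c) order."""
        key = tuple(quad["spoc".index(field)] for field in self.order)
        # Increment the count of every prefix, including the empty one (total).
        for n in range(len(key) + 1):
            self.counts[key[:n]] += 1

    def count(self, prefix):
        """Number of stored quads whose key starts with the given prefix."""
        return self.counts[tuple(prefix)]
```

With an ordering that starts with the subject, `count((s,))` gives the out-degree of node `s`; with one that starts with the object, `count((o,))` gives the in-degree. The same numbers can serve as selectivity estimates when reordering joins.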

  15. Contents • Scenario • Index Structure • Query Processing • Implementation • Evaluation • Summary

  16. Physical Access Plans • Access pattern (?s, foaf:name, "Stefan Decker", ?c) • Translate string values to OIDs (2, 11) • Determine index (POCS) • Construct key (2:11:*:*) • Perform prefix lookup on POCS with key 2:11:*:* • Translate result back to string values
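
The steps of this access plan can be traced end to end in a small sketch. The OIDs 2 and 11 follow the slide; every other OID, the example.org URIs, and the stored quads are made up for illustration, and a sorted list again stands in for the B+-tree. The function handles only this one pattern shape (predicate and object bound).

```python
from bisect import bisect_left

# Hypothetical lexicon contents; only OIDs 2 and 11 come from the slide.
TERM_TO_OID = {
    "foaf:name": 2,
    '"Stefan Decker"': 11,
    '"Andreas Harth"': 12,
    "<http://decker.cn/stefan/>": 20,
    "<http://example.org/andreas>": 21,
    "<http://example.org/context>": 30,
}
OID_TO_TERM = {oid: term for term, oid in TERM_TO_OID.items()}

# POCS index: keys ordered (predicate, object, context, subject).
POCS = sorted([
    (2, 11, 30, 20),
    (2, 12, 30, 21),
])


def names_lookup(predicate, obj):
    """Evaluate the pattern (?s, predicate, obj, ?c) on the POCS index."""
    # 1. Translate the bound string values to OIDs.
    prefix = (TERM_TO_OID[predicate], TERM_TO_OID[obj])
    # 2. Prefix lookup: seek to the first matching key, scan while it matches.
    start = bisect_left(POCS, prefix)
    results = []
    for i in range(start, len(POCS)):
        p_id, o_id, c_id, s_id = POCS[i]
        if (p_id, o_id) != prefix:
            break
        # 3. Translate the result back to strings, in (s, p, o, c) order.
        results.append(tuple(OID_TO_TERM[x] for x in (s_id, p_id, o_id, c_id)))
    return results
```

Calling `names_lookup("foaf:name", '"Stefan Decker"')` walks through the same sequence as the slide: lexicon translation, key construction, prefix lookup, and translation back.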

  17. Contents • Scenario • Index Structure • Query Processing • Implementation • Evaluation • Summary

  18. Prototype Implementation • Java, JDBM, Apache Tomcat • HTTP interface: GET/POST for querying, PUT for adding data, DELETE for removing data

  19. Contents • Scenario • Index Structure • Query Processing • Implementation • Evaluation • Summary

  20. Index Construction Performance - Lehigh Univ(20)

  21. Query Performance Evaluation • Queries: • 1: ?x rdf:type univ:UndergradStudent • 2: ?x ?p "UndergraduateStudent0" • 3: <http://www.Univ965.edu> ?p ?o • 4: ?x univ:worksFor ?y

  22. Contents • Scenario • Index Structure • Query Processing • Implementation • Evaluation • Summary

  23. Summary • Complete index on RDF quads to minimize joins • Keyword-based searches • Extensive statistical information • High-performance metadata repository • Core storage backend and query-processing technology for SWSE (Semantic Web Search Engine) • Used in projects at e.g. University of Karlsruhe (RDFReactor, RDF2Go)
