1 / 18

Storage Engine for Semantic Web

Storage Engine for Semantic Web. Assertion. Storage engine for semantic web has requirements similar to those for e-commerce aplications. Draw upon results and lessons from R. Agrawal, A. Somani, Y. Xu: Storage and Retrieval of E-Commerce Data. VLDB-2001.

Download Presentation

Storage Engine for Semantic Web

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Storage Engine for Semantic Web

  2. Assertion • Storage engine for semantic web has requirements similar to those for e-commerce aplications. • Draw upon results and lessons from • R. Agrawal, A. Somani, Y. Xu: Storage and Retrieval of E-Commerce Data. VLDB-2001.

  3. Typical E-Commerce Data Characteristics An Experimental E-marketplace for Computer components • Constantly evolving schema • Sparsely populated data (about 50-100 attributes/component) • Nearly 2 Million components • More than 2000 leaf-level categories • Large number of Attributes (5000)

  4. Alternative Physical Representations • Horizontal • One N-ary relation • Binary • N 2-ary relations • Vertical • One 3-ary relation

  5. Conventional horizontal representation (n-ary relation) • DB Catalogs do not support thousands of columns (DB2/Oracle limit: 1012 columns) • Storage overhead of NULL values Nulls increase the index size and they sort high in DB2 B+ tree index • Hard to load/update • Schema evolution is expensive • Querying is straightforward

  6. Monitor Name Val Height Output PAN DVD-L75 7 inch Name Val Name Val 3.75 KLH DVD221 Digital PAN DVD-L75 KLH DVD221 S-Video Binary Representation(N 2-ary relations) • Dense representation • Manageability is hard because of large number of tables • Schema evolution expensive • Decomposition Storage Model [Copeland et al SIGMOD 85], [Khoshafian et al ICDE 87] • Monet: Binary Attribute Tables [Boncz et al VLDB Journal 99] • Attribute Approach for storing XML Data [Florescu et al INRIA Tech Report 99]

  7. Oid (object identifier) Key (attribute name) Val (attribute value) Vertical representation(One 3-ary relation) • Objects can have large number of attributes • Handles sparseness well • Schema evolution is easy • Implementation of SchemaSQL [LSS 99] • Edge Approach for storing XML Data [FK 99]

  8. Querying over Vertical Representation is Complex • Simple query on a Horizontal scheme SELECT MONITOR FROM H WHERE OUTPUT=‘Digital’Becomes quite complex: SELECT v1.Val FROM vtable v1, vtable v2 • WHERE v1.Key = ‘Monitor’ • AND v2.Key = ‘Output’ • AND v2.Val = ‘Digital’ • AND v1.Oid = v2.Oid Writing applications becomes much harder. What can we do ?

  9. Horizontal view (H) Attr1 Attr2 … Attrk … Query Mapping Layer Vertical table (V) Oid Key Val Solution • Provide horizontal view of the vertical table • Translation layer automatically maps operations on H to operations on V

  10. Transformation Algebra • Defined an algebra for transforming expressions over horizontal views into expressions over the vertical representation. • Two key operators: • v2h () • h2v ()

  11. Sample Algebraic Transforms • v2h () Operation – Convert from vertical to horizontal k(V) = [Oid(V)]  [i=1,kOid,Val(Key=‘Ai’(V))] • h2V () Operation – Convert from horizontal to vertical k(H) = [i=1,kOid,’Ai’Ai(Ai  ‘’(V))]  [i=1,k Oid,’Ai’Ai(i=1,kAi=‘’(V)) • Similar operations such as Unfold/Fold and Gather/Scatter exist in SchemaSQL [LSS 99] and [STA 98] respectively • Complete transforms in VLDB-2001 Paper

  12. From the Algebra to SQL • Equivalent SQL transforms for algebraic transforms • Select, Project • Joins (self, two verticals, a horizontal and a vertical) • Cartesian Product • Union, Intersection, Set difference • Aggregation • Extend DDL to provide the Horizontal View CREATE HORIZONTAL VIEW hview ON VERTICAL TABLE vtable USING COLUMNS (Attr1, Attr2, … Attrk, …)

  13. Alternative Implementation Strategies • VerticalSQL • Uses only SQL-92 level capabilities • VerticalUDF • Exploits User Defined Functions and Table Functions to provide a direct implementation • Binary (hand-coded queries) • 2-ary representation with one relation per attribute (using only SQL-92 transforms)

  14. density = 10%, 1000 cols x 20K rows 25 20 VerticalSQL_oid 15 Execution time (seconds) VerticalSQL_key 10 5 0 0.1% 1% 5% Join selectivity Data Organization Matters: Clustering by Key significantly outperforms by Oid Join

  15. density = 10% 60 50 40 HorizontalSQL Execution time (seconds) 30 VerticalSQL 20 Binary 10 0 200x100K 400x50K 800x25K 1000x20K Table (#cols x #rows) • VerticalSQL comparable to Binary and outperforms Horizontal Projection of 10 columns

  16. 30 20 VerticalSQL Execution time (seconds) Binary 10 VerticalUDF 0 200x100K 400x50K 800x25K 1000x20K Table (#cols x #rows) • VerticalUDF is the best approach density = 10% Projection of 10 columns

  17. Summary Vertical (w/ Mapping) Binary (w/ Mapping) Horizontal Manageability + + - Flexibility - + - + Querying + + - Performance + +

  18. Remarks • Lessons of this study directly apply to building storage engine for semantics webs • Performance of vertical representation can be further improved by: • Enhanced table functions • First class treatment of table functions • Native support for v2h and h2v operations • Partial indices

More Related