SW-Store: a vertically partitioned DBMS for Semantic Web data management. Daniel J. Abadi, Adam Marcus, Samuel R. Madden, and Kate Hollenbach. The VLDB Journal, 2009. Group 4: Surabhi Mithal (4282643), Nipun Garg (4282567). http://www-users.cs.umn.edu/~smithal/
Outline • Introduction to Semantic Web • Motivation • Problem Statement • Challenges • Major Contributions • Related Work • Key Concepts • Assumptions • Validation Methodology • Results • Improvements
Introduction to Semantic Web: An example • A simplified bookstore dataset (dataset “A”) • Source: http://www.w3.org/People/Ivan/CorePresentations/SWTutorial/
EXAMPLE CONT: GRAPH REPRESENTATION (dataset “A”) [Figure: RDF graph. The book http://…isbn/000651409X has a:title “The Glass Palace” and a:year 2000. Its a:publisher node has a:p_name “Harper Collins” and a:city “London”. Its a:author node has a:name “Ghosh, Amitav” and a:homepage http://www.amitavghosh.com.]
EXAMPLE CONT: GRAPH REPRESENTATION (dataset “F”) [Figure: RDF graph. The book http://…isbn/2020386682 has f:titre “Le palais des miroirs” and f:original pointing to http://…isbn/000651409X. Its f:auteur node has f:nom “Ghosh, Amitav”. Its f:traducteur node has f:nom “Besse, Christianne”.]
DATA INTEGRATION ACROSS THE TWO DATASETS: SEMANTIC WEB [Figure: the graphs of datasets “A” and “F” shown together. Both contain the SAME URI, http://…isbn/000651409X (the book in “A” and the target of f:original in “F”), so the two graphs merge at that node.] A user of data “F” can now ask queries like: “give me the title of the original”.
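The merge on these slides can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the triples are copied by hand from the figures, the shortened identifiers `isbn:000651409X` and `isbn:2020386682` stand in for the elided `http://…isbn/…` URIs, and the blank-node label `_:author_a` and the helper `title_of_original` are hypothetical names.

```python
# Dataset "A" as (subject, property, object) triples (subset of the figure).
dataset_a = {
    ("isbn:000651409X", "a:title", "The Glass Palace"),
    ("isbn:000651409X", "a:year", "2000"),
    ("isbn:000651409X", "a:author", "_:author_a"),
    ("_:author_a", "a:name", "Ghosh, Amitav"),
}

# Dataset "F"; f:original points at the same URI that dataset "A" describes.
dataset_f = {
    ("isbn:2020386682", "f:titre", "Le palais des miroirs"),
    ("isbn:2020386682", "f:original", "isbn:000651409X"),
}

# For URI-identified nodes, merging two graphs is a set union of their triples:
# identical URIs collapse into one node.
merged = dataset_a | dataset_f


def title_of_original(graph, book):
    """Follow f:original from `book`, then read a:title of the target."""
    originals = {o for s, p, o in graph if s == book and p == "f:original"}
    return sorted(o for s, p, o in graph if s in originals and p == "a:title")


print(title_of_original(merged, "isbn:2020386682"))
```

Run against `dataset_f` alone, the same helper returns an empty list, because the `a:title` triple only becomes reachable once the two graphs are merged.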
Motivation • Integration and sharing of data across different applications and organizations. • The Semantic Web logical data model is called the “Resource Description Framework” (RDF). • The Semantic Web has scalability and performance issues due to the nature of its data: current data management solutions for RDF scale poorly.
Problem Statement • Input: RDF data in the form of triples <subject, property, object>, e.g. <The Glass Palace, hasAuthor, Amitav Ghosh>. • Output: An efficient storage system for RDF data. • Objective: Improve query performance for complex real-world queries.
Challenges • Example query: find all authors of books whose title contains the word “Transaction”. • Against a single triples table this needs a 5-way self-join!
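To see why such queries turn into self-joins, here is a small sketch using Python's built-in `sqlite3` module. The table layout (`subj`, `prop`, `obj`), the property names, and the sample rows are all assumptions made for illustration. The schema behind the slide's five-way join is not shown, so this example only needs four copies of the triples table and should not be read as the slide's actual query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (subj TEXT, prop TEXT, obj TEXT)")
conn.executemany(
    "INSERT INTO triples VALUES (?, ?, ?)",
    [
        # Hypothetical sample data, not taken from the paper.
        ("book1", "type", "Book"),
        ("book1", "title", "Transaction Basics"),
        ("book1", "author", "person1"),
        ("person1", "name", "Author One"),
        ("book2", "type", "Book"),
        ("book2", "title", "Cooking Basics"),
        ("book2", "author", "person2"),
        ("person2", "name", "Author Two"),
    ],
)

# Every property the query touches costs one more alias of the same table.
query = """
SELECT nm.obj
FROM triples AS ty
JOIN triples AS ti ON ti.subj = ty.subj
JOIN triples AS au ON au.subj = ty.subj
JOIN triples AS nm ON nm.subj = au.obj
WHERE ty.prop = 'type'   AND ty.obj = 'Book'
  AND ti.prop = 'title'  AND ti.obj LIKE '%Transaction%'
  AND au.prop = 'author'
  AND nm.prop = 'name'
"""
authors = [row[0] for row in conn.execute(query)]
print(authors)
```

Each additional property or hop in the query adds another join over the full triples table, which is the scaling problem the slide points at.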
Major Contributions and Novelty • Introduces the concept of vertically partitioning RDF data and storing it in a column-oriented database to improve performance and increase simplicity. • Evaluates the performance of the new and existing techniques on a real-world example. • Proposes SW-Store, a new column-oriented database based on the above approach.
Related Work: Property tables (HP Laboratories, Jena) • Two kinds: Property Clustered Tables and Property Class Tables. • Approach 1 (property clustered tables): a data clustering approach. • Approach 2 (property class tables): creates clusters based on the subject’s type. • Limitations: • Accuracy of the clustering algorithms. • NULLs in the data. • Multivalued attributes.
Sample database: too many NULLs. Source: SW-Store: a vertically partitioned DBMS for Semantic Web data management
Key Concepts: Vertical partitioning and Column Oriented Store • The data is vertically partitioned, and the partitions are then stored in a column-oriented database. • Vertical partitioning: one two-column (subject, object) table for each property. Advantages: • Effective handling of multivalued attributes. • Elimination of NULLs. • Fewer unions are needed. • Column-oriented storage. Advantages: • No wasted bandwidth, as projections on the data happen before it is pulled into main memory. • Record headers are stored in separate columns, which reduces tuple width, and a different compression technique can be chosen for each column.
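A minimal sketch of the vertical partitioning idea, in plain Python rather than a real column store. The sample triples and the helper names `vertically_partition` and `join_on_subject` are hypothetical. The point is only to show that each property gets its own (subject, object) table, that a multivalued attribute becomes several rows, and that a missing property means a missing row rather than a NULL.

```python
from collections import defaultdict

triples = [
    ("book1", "title", "Title A"),
    ("book1", "author", "Author One"),
    ("book1", "author", "Author Two"),  # multivalued attribute: two rows
    ("book2", "title", "Title B"),      # book2 has no author: no row, no NULL
]


def vertically_partition(triples):
    """Return {property: list of (subject, object) rows sorted by subject}."""
    tables = defaultdict(list)
    for subj, prop, obj in triples:
        tables[prop].append((subj, obj))
    return {prop: sorted(rows) for prop, rows in tables.items()}


def join_on_subject(left, right):
    """Join two (subject, object) tables on their subject column."""
    right_by_subj = defaultdict(list)
    for subj, obj in right:
        right_by_subj[subj].append(obj)
    return [
        (subj, left_obj, right_obj)
        for subj, left_obj in left
        for right_obj in right_by_subj.get(subj, [])
    ]


tables = vertically_partition(triples)
print(tables["author"])
print(join_on_subject(tables["title"], tables["author"]))
```

A query that mentions only `title` and `author` touches just those two tables. Because the rows are kept sorted by subject, a real system could use a merge join here; the dictionary-based join above is a simplification.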
Key Concepts: SW-Store • SW-Store is a column-oriented DBMS optimized for storing RDF. • Single-column table for subjects. • Representing sparse data. • Overflow tables.
Assumptions • Postgres is assumed to be the best available choice for a row-oriented RDBMS because of its effective handling of NULLs. • Queries that do not restrict on property values are very rare for RDF applications. • A moderate number of inserts/updates on the RDF store. • Critique of the limited insert/update assumption: • If the overflow tables fill rapidly, the batch operation that updates the column-oriented store will run more often, degrading performance as a whole.
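The critique above can be made concrete with a toy model. This is not SW-Store's implementation: the class `OverflowStore`, the `merge_threshold` parameter, and the size-based merge trigger are all assumptions chosen to illustrate how a higher insert rate leads to more frequent batch merges.

```python
import bisect
from collections import defaultdict


class OverflowStore:
    """Toy store: sorted per-property tables plus an unsorted overflow buffer."""

    def __init__(self, merge_threshold):
        self.tables = defaultdict(list)  # property -> sorted (subject, object) rows
        self.overflow = []               # recent inserts kept as plain triples
        self.merge_threshold = merge_threshold  # hypothetical tuning knob
        self.merge_count = 0

    def insert(self, subj, prop, obj):
        """Append to the overflow buffer; merge once it reaches the threshold."""
        self.overflow.append((subj, prop, obj))
        if len(self.overflow) >= self.merge_threshold:
            self.merge()

    def merge(self):
        """Batch-move overflow triples into the sorted per-property tables."""
        for subj, prop, obj in self.overflow:
            bisect.insort(self.tables[prop], (subj, obj))
        self.overflow.clear()
        self.merge_count += 1

    def lookup(self, prop):
        """Queries consult both the partitioned table and the overflow buffer."""
        pending = [(s, o) for s, p, o in self.overflow if p == prop]
        return sorted(self.tables.get(prop, []) + pending)


store = OverflowStore(merge_threshold=3)
for i in range(7):
    store.insert(f"book{i}", "title", f"Title {i}")
print(store.merge_count, len(store.overflow), len(store.lookup("title")))
```

In this model, seven inserts with a threshold of three trigger two merges and leave one triple pending. Raising the insert rate or lowering the threshold increases the number of merges, which is the cost the critique is concerned with.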
Validation methodology • Barton Libraries dataset provided by the Simile Project at MIT (http://simile.mit.edu/rdf-test-data/barton). • The benchmark is a set of 7 queries based on a browsing session of Longwell, a UI built by the Simile group for querying the library dataset. These queries are executed on: • Triple data store (a subject, property, object table on Postgres with no improvements). • Property tables (on Postgres). • Vertically partitioned data in a row-oriented store (Postgres). • Vertically partitioned data in a column-oriented store (C-Store).
Validation methodology • Strengths: • Real-world data and query scenarios. • Comparison of all the existing techniques with the proposed technique. • Weaknesses: • The benchmark avoids queries with unrestricted properties, which are a particular problem for vertically partitioned scenarios. • Results depend on the accuracy of clustering for property tables. • Performance may differ when using different underlying databases.
Results • The proposed storage scheme outperforms the existing methods in terms of query time.
Improvements – Spatial Perspective • Schema design: queries are fired on the vertically partitioned tables as well as the overflow tables. Because spatial data is heavy, a spatial index such as an R*-tree or a grid should be used to make these queries faster. • Restrictive nature: spatial queries are not restricted to specific “properties”, which is an important assumption on the authors’ part. • E.g. landmarks. • Tables should be partitioned in a better way than simply one property per table, e.g. by grouping similar properties together based on domain knowledge.