Virtuoso Product Family
Orri Erling - Program Manager, Virtuoso
© 2008 OpenLink Software, All rights reserved.

Virtuoso Product Categories
• Virtual Database Engine
• Native Data Management (multi-model covering: SQL, RDF, XML, and Free Text)
• Discussion Platform
• Mail Proxy Services
• Client Connectivity Kit
• Virtuoso Universal Server

Virtual Database Engine
RDF, XML, SQL Conceptual Views over:
• External ODBC or JDBC accessible SQL Data Sources
• External XML based Data Sources
• External SOAP or RESTful Web Services
• External RDF Data (e.g. Oracle)
• Custom Data Sources via Server Extensions API

Virtual Database Engine Contd.
• SQL Queries over Remote SQL, RDF, XML, and Web Services based Data Sources
• SPARQL Queries over Remote SQL, RDF, XML, and Web Services based Data Sources
• XQuery/XPath Queries over Remote RDF, SQL, and XML based Data Sources
• Web Services based access to Remote RDF, SQL, XML, and other Web Services based Data Sources

Virtual Database Engine Contd.
• Distributed Query Optimization
• Locality Sensitive Query Cost Optimization (Collocated Joins, Pass-Through Queries, and Array Parameters)
• Deductively Abstracts SQL Dialect Differences (via ODBC and JDBC metadata call exploitation)
• Message Latency Factored into Cost Model
• Hash Joins Used When Appropriate, Replacing Multiple Remote Lookups with Single Sequential Read
• 2-Phase Commit for Distributed Transactions
  • MS DTC for Windows
  • Tuxedo on Unix

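The hash join bullet above can be illustrated with a small sketch. This is not Virtuoso code; `lookup_join`, `hash_join`, and the sample data are hypothetical names used only to show why one sequential read of the remote table can replace many remote lookups.

```python
# Illustrative sketch only: compares per-row remote lookups with a hash join
# built from a single sequential read. All names here are hypothetical.

def lookup_join(local_rows, remote_lookup):
    """Join by calling the remote source once per local row."""
    calls = 0
    out = []
    for key, value in local_rows:
        calls += 1                      # one message round trip per row
        match = remote_lookup(key)
        if match is not None:
            out.append((key, value, match))
    return out, calls


def hash_join(local_rows, remote_scan):
    """Join by reading the remote table once and probing it in memory."""
    table = dict(remote_scan())         # the single sequential remote read
    out = [(k, v, table[k]) for k, v in local_rows if k in table]
    return out, 1


remote = {1: "a", 2: "b", 3: "c"}       # stand-in for a remote table
local = [(1, "x"), (3, "y"), (4, "z")]  # stand-in for local rows

joined_lookup, n_lookup = lookup_join(local, remote.get)
joined_hash, n_hash = hash_join(local, lambda: remote.items())
```

Both functions return the same joined rows; the lookup version makes one remote call per local row, while the hash version makes one call in total.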
Virtual Database Engine Contd.
• ATTACH TABLE Statement incorporates Remote Table, Indexes and Statistics into Local Virtuoso Schema
• Allows Incorporation of SQL Functions and Stored Procedures from Remote Relational Database Engines
• Support for Remote XML, Full Text Indexing for Oracle, Microsoft SQL Server

Native Data Management – Relational (RDBMS)
• Native SQL 92/2K Engine
• Rich Procedure Language (PSM-95 based)
• Database Engine Optimized for SMP Performance
• Native Full Text Indexing

RDBMS Features - Transactions
• Full ACID Properties
• Checkpoint + Roll Forward Log, Optional Archiving of Logs
• Read Uncommitted/Read Committed/Repeatable Read/Serializable Isolation Levels
• Non-blocking Read Committed Shows Latest Committed Versions of Uncommitted Updated Rows
• Can Work as XA/MS DTC Resource Manager

RDBMS Features - SQL
• Full SQL 92 with many 2K Features
• SQLX, XPath, XSLT, XQuery
• SQL 2K Objects, Implementation in SQL/Java/.NET
• Transparent Mixing of Local and Remote Tables

RDBMS Features – Query Optimization
• Cost Based Optimization
• On The Fly Sampling of Table/Column/Literal Key Cardinalities
• Fixed Statistics for Deterministic Query Plans
• Loop/Hash/Merge Join
• SQL Options for Explicitly Specifying Query Plan

RDBMS Features - Storage Engine
• Rows Stored At Leaves of Primary Key Index Tree
• Non PK Indexes Refer to Row By Value of PK
• Bitmap Index
• Full Text Index
• Striping Across Disks, No Separate Files Per Table/Key
• Incremental Online Backup

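The first two bullets describe a layout in which the row lives in the primary key index and every other index stores the primary key value rather than a physical row address. A minimal sketch of that idea, using plain dictionaries as stand-ins for index trees (the table, column names, and functions are hypothetical):

```python
# Illustrative sketch only: dictionaries stand in for index trees.
rows_by_pk = {}    # primary key index: pk -> full row ("rows at the leaves")
pks_by_city = {}   # non-PK index: city -> list of primary key values


def insert(pk, name, city):
    rows_by_pk[pk] = {"pk": pk, "name": name, "city": city}
    # The secondary index records the PK value, not a row location.
    pks_by_city.setdefault(city, []).append(pk)


def find_by_city(city):
    # Step 1: the secondary index yields PK values.
    # Step 2: each PK value is looked up in the primary key index.
    return [rows_by_pk[pk] for pk in pks_by_city.get(city, [])]


insert(1, "Ann", "Oslo")
insert(2, "Bob", "Rome")
insert(3, "Cy", "Oslo")
oslo_names = [row["name"] for row in find_by_city("Oslo")]
```

One consequence of this design is that a row can move within the primary key tree without any secondary index entry needing an update, since those entries hold only the key value.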
RDBMS Features - Run Time Hosting
• User Defined Types via Java or .NET Objects Hosted in Process
• User Defined Types Persisted in LOB Columns
• Java/.NET Methods Called Transparently From SQL
• "C" based Plugin Mechanism for adding SQL Functions

RDBMS Features - Security
• SQL Role Based Security, Column/Table/View/Procedure Level
• Row Level Security With Policy Functions
• A Policy Function Can Add Extra Conditions to Queries/Updates Depending on User, Time, Other Considerations

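The policy function idea can be sketched as follows. This does not use Virtuoso's actual policy syntax; `policy`, `select`, the `owner` column, and the office-hours rule are all assumptions made up for illustration. The point is only that the policy returns an extra condition that is combined with the user's own search condition.

```python
# Illustrative sketch only: a policy function that adds a row filter
# depending on the user and the time of day. Names and rules are hypothetical.
from datetime import time


def policy(user, now):
    """Return an extra predicate to be ANDed with the query's own condition."""
    if user == "admin":
        return lambda row: True                   # no extra restriction
    if time(9, 0) <= now <= time(17, 0):
        return lambda row: row["owner"] == user   # own rows, office hours only
    return lambda row: False                      # nothing outside office hours


def select(rows, where, user, now):
    extra = policy(user, now)
    return [r for r in rows if where(r) and extra(r)]


table = [{"id": 1, "owner": "ann"}, {"id": 2, "owner": "bob"}]
ann_daytime = select(table, lambda r: True, "ann", time(10, 0))
ann_evening = select(table, lambda r: True, "ann", time(20, 0))
admin_view = select(table, lambda r: True, "admin", time(20, 0))
```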
Data Center Features - Clustering
• Combine Multiple Servers for Massive Scale and Parallelism
• All Servers Show the Same SQL/RDF Data and Application Logic, A SQL or Web Client Can Connect to Any for the Same Service
• Data Partitioning Specifiable Index by Index
• Optional Replicated Storage of Partitions for More Load Balancing, Fault Tolerance
• Shared Nothing Architecture, Works With Commodity Hardware and Networks

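"Partitioning specifiable index by index" can be pictured with the sketch below, in which each index of a table names its own partitioning column. The hash function (CRC32), the node count, and the index and column names are assumptions chosen for the example, not a description of how Virtuoso actually places data.

```python
# Illustrative sketch only: each index declares its own partitioning column,
# so one logical row may have index entries on different nodes.
import zlib

NODES = 4  # hypothetical cluster size

# Hypothetical declaration: index name -> column used to pick the node.
INDEX_PARTITION_COLUMN = {
    "orders_pk": "order_id",
    "orders_by_customer": "customer_id",
}


def node_for(value, nodes=NODES):
    """Map a key value to a node number with a deterministic hash."""
    return zlib.crc32(str(value).encode("utf-8")) % nodes


def placements(row):
    """Return, per index, the node that holds this row's index entry."""
    return {
        index: node_for(row[column])
        for index, column in INDEX_PARTITION_COLUMN.items()
    }


order_a = {"order_id": 1001, "customer_id": 7}
order_b = {"order_id": 1002, "customer_id": 7}
place_a = placements(order_a)
place_b = placements(order_b)
```

Because both orders share a customer, their `orders_by_customer` entries land on the same node, so a lookup by customer touches one node only.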
Data Center Features - Query Parallelization
• Latency: One Message Round Trip Costs as Much as 20 Single Row Random Lookups
• Virtuoso Divides Queries into Collocated Fragments, Ships All Filtering, Aggregation, Joining to Where the Data Is
• Sends Arrays of Hundreds of Operations at a Time, Whenever Possible

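A rough worked example shows why batching matters when a round trip costs as much as 20 lookups. The cost model below is an assumption for illustration (one unit per single row lookup, 20 units per message round trip, nothing else counted); the function names are hypothetical.

```python
# Illustrative cost arithmetic only, in units of one single-row random lookup.
ROUND_TRIP_COST = 20  # figure taken from the slide


def cost_one_at_a_time(n_ops):
    """Every operation pays its own round trip plus the lookup itself."""
    return n_ops * (ROUND_TRIP_COST + 1)


def cost_batched(n_ops, batch_size):
    """Operations are shipped in arrays; each array pays one round trip."""
    batches = -(-n_ops // batch_size)  # ceiling division
    return batches * ROUND_TRIP_COST + n_ops


single = cost_one_at_a_time(1000)   # 1000 * 21 = 21000
batched = cost_batched(1000, 200)   # 5 * 20 + 1000 = 1100
```

Under these assumptions, sending 1,000 lookups in arrays of 200 costs about 1,100 units instead of 21,000, roughly a 19-fold reduction.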
Data Center Features - Transactions
• Full ACID Properties
• Two Phase Commit with Single Phase Optimization
• Detection of Distributed Deadlocks Without Timing Out
• Each Cluster Node Keeps Own Transaction Log
• No External Monitor, Virtuoso Handles Distributed Recovery Cycle By Itself
• Transactions/Logging Can Be Disabled for Bulk Load etc.

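Two phase commit with a single phase optimization can be sketched generically as below. This is a textbook-style illustration, not Virtuoso's implementation; the `Participant` class and `commit_transaction` function are hypothetical, and logging and recovery are left out.

```python
# Illustrative sketch only: generic two phase commit with a one phase shortcut.
class Participant:
    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "active"

    def prepare(self):
        """Phase 1: vote on whether this participant is able to commit."""
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def rollback(self):
        self.state = "aborted"


def commit_transaction(participants):
    if len(participants) == 1:
        # Single phase optimization: with one participant there is nobody
        # to coordinate with, so the prepare round is skipped.
        only = participants[0]
        if only.can_commit:
            only.commit()
            return "committed (one phase)"
        only.rollback()
        return "aborted"

    votes = [p.prepare() for p in participants]   # phase 1: collect votes
    if all(votes):
        for p in participants:                    # phase 2: commit everywhere
            p.commit()
        return "committed"
    for p in participants:                        # phase 2: roll back everywhere
        p.rollback()
    return "aborted"
```

A transaction that touched only one node therefore avoids the extra prepare message, which matters given the round trip cost discussed earlier in the deck.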
Data Center Features - Parallel SQL
• Transparent Map-Reduce Style Execution of Specified Partitioned SQL Functions/Procedures
• PL Extensions for Async Remote Execution of SQL Code, With and Without Transactional Semantics

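The map-reduce style of execution can be pictured as running the same function against every partition and then combining the partial results. The sketch below uses Python threads as stand-ins for cluster nodes; the partition data and function names are hypothetical.

```python
# Illustrative sketch only: threads stand in for cluster nodes.
from concurrent.futures import ThreadPoolExecutor


def partial_sum(partition):
    """Map step: runs where the partition lives, returns (sum, row count)."""
    return sum(row["amount"] for row in partition), len(partition)


def parallel_average(partitions):
    """Reduce step: combine the per-partition partial results."""
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(partial_sum, partitions))
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count if count else None


partitions = [
    [{"amount": 10}, {"amount": 20}],   # partition on hypothetical node 1
    [{"amount": 30}],                   # partition on hypothetical node 2
    [],                                 # empty partition on node 3
]
average = parallel_average(partitions)
```

Only the small `(sum, count)` pairs cross partition boundaries, not the rows themselves.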
Data Center Features - Futures
• Dynamic Deployment, Adding and Removing Cluster Nodes Without Interruption of Service
• Keeping Data in Small, Self-Contained, Easily Relocatable Mini-Partitions

SQL Client Connectivity - Data Access Drivers
• Cross Platform ODBC 3.0 Drivers
• JDBC 2.0 Drivers
• OLE-DB Provider
• ADO.NET Provider
• XMLA Provider

Native Data Management - XML
• Native XML Data Type
• SQLX + Oracle Compatible XML Functions in SQL
• Document Centric Persistence of XML with Special Support in Text Index
• XSLT
• XQuery
• XML Views – XML Mapping Schema based Views of SQL Data Sources

Native RDF Data Management
• Native RDF Quad Storage (Physical Quads)
• SQL Enhanced With RDF IRI and Typed/Language Tagged Data
• Bitmap Indices and Key Compression for Compact Storage
• Selectable Index Scheme, Optionally Allows Queries Against Union of All Graphs
• Optional Full Text Index of Literals
• Reuses SQL Cost Model and Execution Engine With RDF Tailored Statistics

RDF Data Services – Client Connectivity
• SPARQL Protocol
• Jena Storage Provider
• Sesame Storage Provider
• Redland Storage Provider
• Linq2Rdf Storage Provider
• SPASQL
• SPARQL execution within SQL Processor
• Plethora of Built-In Functions, Stored Procedures, Web Services

RDF Data Services – SPARQL
• Full SPARQL, Language and Protocol Support
• Jena Compatible SPARUL for Create Graph, Insert, Update, and Delete
• Extensions for Aggregates & Grouping
• Nested Queries, SQL-Like Existence and Value Subqueries
• Expressions in Result Sets
• Path Expressions for Compact Notation, Also in Expressions
• Full Text & XPath Magic Predicate Extensions

RDF Data Services – Inference
• Backward Chaining Inference Support, No Materialization of Entailed Triples needed for:
  • Subclass and Subproperty Hierarchies
  • OWL sameAs for Instances, Classes and Properties
  • OWL equivalentClass and equivalentProperty
• Inference Enabled at Query or Individual Triple Pattern Level

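Backward chaining without materialization means the stored triples stay as they are and the query is expanded at run time. A minimal sketch for subclass hierarchies follows; the class names, triples, and functions are hypothetical, and the hierarchy is kept to single inheritance for brevity.

```python
# Illustrative sketch only: subclass inference at query time, with no
# entailed triples written back to the store.
SUBCLASS_OF = {"Student": "Person", "Person": "Agent"}  # child -> parent

TRIPLES = {
    ("alice", "rdf:type", "Student"),
    ("bob", "rdf:type", "Person"),
}


def subclasses_of(cls):
    """Return cls together with every class that is transitively below it."""
    result = {cls}
    changed = True
    while changed:
        changed = False
        for child, parent in SUBCLASS_OF.items():
            if parent in result and child not in result:
                result.add(child)
                changed = True
    return result


def instances_of(cls):
    """Rewrite '?x rdf:type cls' into a match over cls and its subclasses."""
    wanted = subclasses_of(cls)
    return sorted(s for s, p, o in TRIPLES if p == "rdf:type" and o in wanted)


agents = instances_of("Agent")
students = instances_of("Student")
```

Here `alice` is reported as an `Agent` even though no such triple is stored; the store still holds only the two original triples.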
Linked Data Services - RDF-ization Middleware
• Declarative RDF Views (or Covers) over SQL Data
• In-Built RDF Middleware (Sponger) for RDF-ization of Harvested Web Content (bulk ingest or "on the fly")
• Extended SPARQL Against Mapped and Stored RDF
• RDF-ization Cartridges for 30+ non RDF data sources
• Used by SPARQL Processor
• Used by in-built Content Crawler
• Cache Invalidation based on HTTP Caching Rules
• Configurable URI dereferencing via pragmas for node selection and path traversal

Linked Data Services - Deployment
• URL Rewrite Rules combined with SPARQL for flexible association of URIs and RDF Data Sets
• Proxy (or wrapper) URI construction for materializing Linked Data "on the fly" from existing Web information resources
• REST or SOAP based Web Services that expose functionality to Web Clients such as OpenLink Data Explorer, Marbles, Zitgist Data Explorer, DISCO, Tabulator etc.

RDF Data Services – RDF Views over SQL Data Sources
• SPARQL Data Definition Statements for RDB Mapping
• Declare Correspondences Between Graph/Triple Patterns and SQL Objects
• Specify Mapping Between URIs and Keys, Supporting All Data Types, Multipart Keys
• Not Restricted to Table per Class and Column per Property
• Use Arbitrary Joins, SQL Functions and Search Conditions
• Automatically Generate Basic Class per Table, Property per Column Mapping of Given SQL Schema

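The mapping between URIs and multipart keys can be sketched as a pair of inverse functions: one builds an IRI from key columns, the other parses an IRI back into key values so that a query constant can be turned into a SQL condition. The IRI pattern, the `example.org` namespace, and the column names below are all hypothetical, and this is not Virtuoso's mapping language.

```python
# Illustrative sketch only: a two-part key (order_id, line_no) mapped to an
# IRI and back. Names and the IRI layout are hypothetical.
import re

IRI_TEMPLATE = "http://example.org/order/{order_id}/line/{line_no}"
IRI_PATTERN = re.compile(r"^http://example\.org/order/(\d+)/line/(\d+)$")
QUANTITY = "http://example.org/schema#quantity"


def key_to_iri(order_id, line_no):
    """IRI generation: used when rows are presented as RDF subjects."""
    return IRI_TEMPLATE.format(order_id=order_id, line_no=line_no)


def iri_to_key(iri):
    """IRI parsing: turns a constant IRI in a query back into key values."""
    match = IRI_PATTERN.match(iri)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def rows_to_triples(rows):
    """Present relational rows as (subject, predicate, object) triples."""
    for row in rows:
        subject = key_to_iri(row["order_id"], row["line_no"])
        yield (subject, QUANTITY, row["quantity"])


sample_rows = [{"order_id": 42, "line_no": 3, "quantity": 5}]
triples = list(rows_to_triples(sample_rows))
```

Because parsing recovers plain key values, a pattern with a constant subject IRI can be answered with an ordinary keyed lookup on the underlying table.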
RDF Data Services - RDF Views Contd.
• Evaluate Arbitrary SPARQL Against an RDF View
• In One Query, Some Graphs May Come from Views, Others From Stored RDF
• RDF Views Generate a Single SQL Statement, The IRI Generation and IRI Parsing is Only in Selection and Constant Expressions
• SQL Has Full Optimization Possibilities and the Generated SQL Does not Depend on Virtuoso Specifics
• Hence, RDF Views Are Efficient for Querying Remote, non-Virtuoso SQL Data

RDF Data Services - Clustering
• Cluster-Optimized RDF Loader and SPARUL
• RDF-Aware Data Partitioning
• Automatic Statistics Sampling Across Cluster for Best Query Plan

RDF Benchmarks
Bundled With:
• TPC-H With SPARQL Extensions and RDF Views
• LUBM
• Berlin SPARQL Benchmark with Triples and with RDF Views

Web Services Platform – HTTP Services
• HTTP/1.1 and HTTPS Server for Static and Dynamic Content
• Dynamic Web Pages in PHP, Virtuoso SQL Procedures, ASP.NET, Others
• SOAP and REST Web Services in Virtuoso PL, Java, .NET
• DAV

Web Services Platform - WebDAV
• Documents Stored in Virtuoso Database
• ACL Based plus Unix Style Security, SQL User Accounts and Roles Own Documents and Collections
• Automatic RDF Metadata Extraction
• Optional Full Text Indexing and Versioning
• Dynamic Collections for Alternate Views of Directory Hierarchy

Web Services Platform – SOAP & REST
• SOAP 1.1/1.2 End Points Exposing SQL Procedures in All SOAP Styles
• Automatic WSDL Generation
• SQL Extensions for Declaring Full XML Schema Signatures for End Points
• Exposing Java and .NET via SOAP
• Dynamic Web Pages and XML Functions for REST Services
• XMLA for SQL Access over SOAP

Web Services Platform - Dynamic Server Pages
• Configure a Virtual Directory as Executable
• Publish Dynamic Web Pages in PHP, Virtuoso PL, Ruby, Perl, ASP.NET Without Using External Web Server

Administration Services
• Web Interface for Setup of Web End Points, SQL, XML, RDF Functions
• SQL Functions for Full Programmatic Admin Access
• Simple Tuning, Only Specify File Layout, Number of Threads, and Amount of Memory to Use

Virtuoso RDF Applications
• DBpedia
• BIO2RDF
• Neurocommons
• Zitgist, PingTheSemanticWeb, MusicBrainz

Product
• Open Source and Closed Source Versions, Closed Source Adds Virtual Database and Clustering
• All Code, Applications, Samples, Docs in Single Download
• Minimal Installation Consists of Single Executable + Config File
• Web Admin Interface and Bundled ODS Collaborative Apps Suite
• Available for All Linux, Unix, Windows, 32 and 64 bit
• Available Preinstalled on Amazon EC2, With Optional Preloaded DBpedia, BIO2RDF, Other RDF Data Sets