Discover how Oracle Database integrates semantic technology to facilitate data integration, enhance queries, and support large-scale datasets, benefiting various industries such as biosurveillance, social networks, and utilities.
Semantic Technology in Oracle Database
Data Interoperability Challenges
• Data locked into schemas, formats, software systems
• Semantic technology seen as a possible solution
• Specialty RDF data management engines are isolated from the data to be integrated
• In addition, there are high training costs, systems admin costs, and management costs
• Tightly coupling semantics (RDF/OWL) functionality to the data storage infrastructure will facilitate data integration using semantics
[Diagram labels: RDF/OWL Triples, Business Data, RDF Data Server, Enterprise Data Server, Semantic Apps, Business Apps]
Adding advanced RDF services to Oracle Database
• Database features and queries can be enhanced using semantics
• Hybrid queries between enterprise data and semantic data possible
• Databases are part of infrastructure in several categories of applications that use semantics for data integration
  • Biosurveillance, Social Networks, Telcos, Utilities, Text, Life Sciences, GeoSpatial
• All database benefits become available for semantic applications
  • Scalability: Manage datasets 10X larger than specialized RDF/OWL stores (billions of triples), no scalability boundaries
  • Billions of nodes, large graphs, parallel loading, query, indexing
  • Security, transaction control, availability, backup and recovery, lifecycle management, etc.
  • Can combine multiple datatypes (geospatial, sensor, etc.) with semantic data
Oracle 10g RDF Approach
• Provide an open and persisted RDF data model and analysis platform for semantic applications
• RDF Data Model with inferencing (RDFS and user-defined rules)
• Inferencing based on forward-chaining
• Perform SQL-based access to triples and inferred data
• Combine SQL query of business data with RDF graphs and ontologies
• Support large graphs (billion+ triples)
• Easily extensible by 3rd party tools/apps
Use Case: Knowledge Mining Solutions
[Diagram labels: Ontology Engineering Modeling Process; Information Extraction (Categorization, Feature/term Extraction); RDF/OWL; OWL Ontologies; Processed Document Collection; Web Resources; News, Email, RSS; Content Mgmt. Systems; Domain Specific Knowledge Base; SQL/SPARQL Query]
• Knowledge Mining & Analysis
• Text Indexing using Oracle Text
• Non-Obvious Relationship Discovery
• Pattern Discovery
• Text Mining
• Faceted Search
Explore: Analyst Browsing, Presentation, Reporting, Visualization, Query
Geospatial Semantic Search
• GeoSemantic Processes
  • Text Extraction
  • Semantic Modeling
  • Rules/Policy Mgmt.
  • Geospatial Analysis
  • Map Visualization
  • Semantic Search
• Schemas:
  • Persisted RDF/OWL data
  • Persisted spatial data
  • Persisted business data
  • Persisted text data
[Diagram labels: Oracle 10g RDBMS, Spatial Data, RDF Models, Text Data, Business Data]
Semantic Solutions on the Web: Deploying on a SOA Infrastructure
Core Software Infrastructure: Simple Features, GeoRaster, Topology, Networks, Spatial Data Mining, Geocoding, Routing, Versioning, DBMS Rules, J2EE Container, SOAP Web services, Orchestration & Workflow, Security, Policy based resource mgmt, Workload scaling, Portal, Wireless & Sensor
Semantic-Enabled tools:
• Business Logic
• Entity Extraction
• Visualization
• Ontology Modeling
• Faceted Search
• Link/Graph Analysis
• Advanced Inference
• Metadata Repository
• Entity Categorization
• Relationship analysis
Applications & Services: National Security, Financial Risk Analysis, Regulatory Compliance, Life Sciences Drug Discovery, Health Science BioSurveillance, Manufacturing Configuration Management
Semantic Technology Stack Standards based
Based on Standards
• Our implementation is entirely based on W3C standards (RDF, RDFS, OWL)
• SPARQL support is planned
• We are members of:
  • W3C DAWG (WG responsible for SPARQL)
  • W3C SWEO Interest group
  • W3C HCLS Interest group
  • W3C Multimedia Semantics Incubator group
  • Soon to be formed W3C OWL 1.1 Working group
Technical Features
• Database storage model for data represented in RDF
• SQL-based query of RDF data
• Combining RDF queries with relational queries
• Native inferencing engine to infer new relationships from RDF data
Technical Overview
• QUERY: Query RDF/OWL data and ontologies; combining relational queries with RDF/OWL queries
• INFER: RDF/S, user-defined rules
• STORE: RDF/OWL data and ontologies, enterprise (relational) data; batch load, incremental load and DML
Storage: Highlights
[Figure: example triple, John :employeeOf Oracle]
• Stores <subject, predicate, object> triples
• A set of triples forms an RDF/OWL graph (model)
• Optimized storage structure: repeated values stored only once (uses normalization)
• Scales to very large datasets
  • No limits to amount of data that can be stored
  • Current users: 600 million+ triples (UTH)
• Can handle multiple lexical forms of the same value
  • Ex: “0010”^^xsd:decimal and “010”^^xsd:decimal
  • Maintains fidelity (user-specified lexical form)
• Supports long literal values
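The normalization and lexical-form bullets above can be illustrated with a small Python sketch. This is not Oracle's storage layout: the `TripleStore` class, its field names, and the `:itemA`/`:itemB`/`:qty` triples are hypothetical, chosen only to show each distinct value stored once while the user-supplied lexical form is kept.

```python
from decimal import Decimal


class TripleStore:
    """Toy normalized triple store (hypothetical, not Oracle's schema)."""

    def __init__(self):
        self.value_ids = {}  # canonical value -> integer id
        self.values = []     # integer id -> canonical value
        self.triples = []    # (s_id, p_id, o_id, lexical form of the object)

    def _intern(self, value):
        # Store each distinct value once and hand back its id.
        if value not in self.value_ids:
            self.value_ids[value] = len(self.values)
            self.values.append(value)
        return self.value_ids[value]

    def add(self, s, p, o, datatype=None):
        # Typed decimals are canonicalized so equal values share one id;
        # the lexical form the user supplied is kept next to the ids.
        canonical = (datatype, Decimal(o)) if datatype == "xsd:decimal" else o
        self.triples.append(
            (self._intern(s), self._intern(p), self._intern(canonical), o)
        )


store = TripleStore()
store.add(":John", ":employeeOf", ":Oracle")
store.add(":itemA", ":qty", "0010", datatype="xsd:decimal")  # assumed data
store.add(":itemB", ":qty", "010", datatype="xsd:decimal")   # assumed data
```

Both `:qty` triples point at the same object id because "0010" and "010" denote the same decimal, yet each row still carries its original lexical form.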
Semantic Data Storage
• Application table links to model in internal semantic store
• Optional columns for related enterprise data
[Diagram labels: Application table 1, Application table 2, Model, Model, Internal Semantic Store]
Query RDF Data
• SPARQL-like graph pattern embedded in SQL query
• Matches RDF/OWL graph patterns with patterns in stored data
• Returns a table of results
• Can use SQL operators/functions to process results
• Avoids staging when combined with queries on relational data
• Scales: millisecond query times for large data sets (10M+ triples)
SELECT … FROM …, TABLE ( SDO_RDF_MATCH invocation ) t, … WHERE …
SDO_RDF_MATCH(
  '(?x rdf:type :Person)',     -- pattern: all persons
  SDO_RDF_Models('family'),    -- RDF data models
  SDO_RDF_Rulebases('RDFS'),   -- rulebases
  SDO_RDF_Aliases(…),          -- aliases
  null                         -- no filter condition
)
Query Example: Family Data
select x, y, name from TABLE(SDO_RDF_MATCH(
  '(:Tom :hasParent ?x) (?x :hasFather ?y) (?y :name ?name)',
  SDO_RDF_Models('family'), .., .., ..));
Returns the name of Tom’s grandfather “John D”
[Figure: family graph with nodes :John, :Janice, :Matt, :Suzie, :Tom, :Jack]
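How a graph pattern with shared variables produces a table of bindings can be sketched in a few lines of Python. This is a conceptual illustration, not how SDO_RDF_MATCH is implemented. The `match` function and the edges in `family` are assumptions; the slide's figure shows only the node names, not which relations connect them.

```python
def match(patterns, triples, binding=None):
    """Yield variable bindings that satisfy every triple pattern in order."""
    binding = binding or {}
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        candidate = dict(binding)
        ok = True
        for term, value in zip(first, triple):
            if term.startswith("?"):
                # A variable must bind consistently across all patterns.
                if candidate.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:
                ok = False
                break
        if ok:
            yield from match(rest, triples, candidate)


# Assumed edges, chosen to agree with the result stated on the slide.
family = [
    (":Tom", ":hasParent", ":Matt"),
    (":Tom", ":hasParent", ":Suzie"),
    (":Matt", ":hasFather", ":John"),
    (":John", ":name", "John D"),
]

pattern = [
    (":Tom", ":hasParent", "?x"),
    ("?x", ":hasFather", "?y"),
    ("?y", ":name", "?name"),
]

rows = [(b["?x"], b["?y"], b["?name"]) for b in match(pattern, family)]
```

Each binding corresponds to one row of the result table, with one column per variable in the pattern.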
Combining RDF Queries with Relational Queries
• Find salary and hiredate of Tom’s grandfather(s)
SELECT emp.name, emp.salary, emp.hiredate
FROM emp, TABLE(SDO_RDF_MATCH(
  '(:Tom :hasParent ?y) (?y :hasFather ?x) (?x :name ?name)',
  SDO_RDF_Models('family'), …)) t
WHERE emp.name = t.name;
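The idea of joining graph-pattern results with a relational table in a single statement can be tried out with SQLite from Python. This is only an analogy: the `triples` table, its `s`/`p`/`o` columns, and the `emp` rows are hypothetical, and the pattern is written as plain self-joins rather than through SDO_RDF_MATCH.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE triples (s TEXT, p TEXT, o TEXT);
    CREATE TABLE emp (name TEXT, salary INTEGER, hiredate TEXT);

    -- Assumed family edges and employee rows, for illustration only.
    INSERT INTO triples VALUES
        (':Tom',  ':hasParent', ':Matt'),
        (':Matt', ':hasFather', ':John'),
        (':John', ':name',      'John D');
    INSERT INTO emp VALUES
        ('John D', 5000, '1990-01-15'),
        ('Jane R', 6000, '1995-06-01');
""")

# One self-join per triple pattern; shared variables become join conditions.
rows = conn.execute("""
    SELECT emp.name, emp.salary, emp.hiredate
    FROM emp
    JOIN triples t3 ON t3.p = ':name'      AND t3.o = emp.name
    JOIN triples t2 ON t2.p = ':hasFather' AND t2.o = t3.s
    JOIN triples t1 ON t1.p = ':hasParent' AND t1.o = t2.s
    WHERE t1.s = ':Tom'
""").fetchall()
```

Because the graph pattern and the `emp` lookup run in one query, no intermediate result has to be staged outside the database.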
Inference: Overview
• Native inferencing in the database for
  • RDF, RDFS
  • User-defined rules
• Rules are stored in rulebases in the database
• RDF graph is entailed (new triples are inferred) by applying rules in rulebase/s to model/s
• Inferencing is based on forward chaining: new triples are inferred and stored ahead of query time
  • Minimizes on-the-fly computation and results in fast query times
Inferencing
• RDFS Example:
  A rdf:type B, B rdfs:subClassOf C => A rdf:type C
  Ex: Matt rdf:type Father, Father rdfs:subClassOf Parent => Matt rdf:type Parent
• User-defined Rules Example:
  A :hasParent B, B :hasParent C => A :hasGrandParent C
  Ex: Tom :hasParent Matt, Matt :hasParent John => Tom :hasGrandParent John
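Forward chaining with the two example rules above can be sketched as a loop that runs to a fixed point. This is a minimal Python illustration, not Oracle's inference engine; the function names `forward_chain`, `rdfs_subclass`, and `grandparent` are made up for this sketch.

```python
def forward_chain(triples, rules):
    """Apply every rule repeatedly until no new triples are inferred."""
    known = set(triples)
    while True:
        new = set()
        for rule in rules:
            new |= rule(known) - known
        if not new:
            return known  # fixed point: the entailed graph
        known |= new


def rdfs_subclass(known):
    # A rdf:type B, B rdfs:subClassOf C => A rdf:type C
    return {
        (a, "rdf:type", c)
        for (a, p1, b) in known if p1 == "rdf:type"
        for (b2, p2, c) in known if p2 == "rdfs:subClassOf" and b2 == b
    }


def grandparent(known):
    # A :hasParent B, B :hasParent C => A :hasGrandParent C
    return {
        (a, ":hasGrandParent", c)
        for (a, p1, b) in known if p1 == ":hasParent"
        for (b2, p2, c) in known if p2 == ":hasParent" and b2 == b
    }


asserted = {
    (":Matt", "rdf:type", ":Father"),
    (":Father", "rdfs:subClassOf", ":Parent"),
    (":Tom", ":hasParent", ":Matt"),
    (":Matt", ":hasParent", ":John"),
}

entailed = forward_chain(asserted, [rdfs_subclass, grandparent])
inferred = entailed - asserted
```

The inferred triples are computed once and kept next to the asserted ones, so a later query for `:hasGrandParent` is a plain lookup with no rule evaluation at query time.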
Query Example: Family Data
select y, name from TABLE(SDO_RDF_MATCH(
  '(:Tom :hasGrandParent ?y) (?y :name ?name) (?y rdf:type :Male)',
  SDO_RDF_Models('family'),
  SDO_RDF_Rulebases('family_rb'), .., ..));
Returns the name of Tom’s grandfather “JohnD”
[Figure: family graph with nodes :John, :Janice, :Matt, :Suzie, :Tom, :Jack and labels “JohnD” and Male]
Data Integration in the Life Sciences
“Find all pieces of information associated with a specific target”
• Data integration of multiple datasets
  • Across multiple representation formats, granularity of representation, and access mechanisms
  • Across in-house and public sets (Gene Ontology, UniProt, NCI thesaurus, etc.)
• Standardized and machine-understandable data format with an open data access model is necessary to enable integration
  • Data-warehousing approach represents all data to be integrated in RDF/OWL
  • Semantic metadata layer approach links metadata from various sources and maps data access tool to relevant source
• Ability to combine RDF/OWL queries with relational queries is a big benefit
• Lilly and Pfizer are using semantic technology to solve data integration problems
Use Case: SenseLab Overview Courtesy, SenseLab, Yale University
Relational to Ontological Mapping
[Diagram: ontology whose node labels include Pathological Change, Agent, Pathological Agent, Neuronal Property, Neuron, Drug, Receptor, Channel, and Compartment, connected by the relations involves, inhibits, has, and is_located_in]
Courtesy, SenseLab, Yale University
Semantic Technology Plans for the Next Release
Safe Harbor Statement & Confidentiality
The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.
Plans for the Next Release
• Fast bulk-load of RDF/OWL data into the database
  • Several times faster than 10.2.0.2 batch load
• Infer new triples with native OWL inferencing
• Faster query of RDF/OWL data and ontologies
• Ontology-Assisted Query of relational data
Overview
• QUERY: Query RDF/OWL data and ontologies; Ontology-Assisted Query of Enterprise Data
• INFER: RDF/S, User-def., OWL subsets
• STORE: RDF/OWL data and ontologies, Enterprise (Relational) data; Bulk-Load, Batch-Load, Incr. DML
Technical Overview Summary
• Semantic Technology support in the database
  • Store RDF/OWL data and ontologies
  • Infer new RDF/OWL triples via native inferencing
  • Query RDF/OWL data and ontologies
  • Ontology-Assisted Query of relational data