Sigma EE: Reaping low-hanging fruits in RDF-based data integration • Richard Cyganiak • I-Semantics 2010, Graz
Intro • Semantic Technologies conferences • In-use Tracks • Applications session • D2RQ • Expose contents of relational databases as RDF/SPARQL • Just a format converter; what do people use it for?
The common theme … Integration of data across the organization/project
The RDF-based data integration project • Probably limited budget … • Otherwise would buy from SAP or Oracle
Sigma EE • Originally not built for enterprise data but for web data • Sindice, search engine for the Web of Data • Microformats, RDFa, Linked Data on the Web • For building apps on top of data search API • http://sindice.com/ • How to show the richness of all that data? • http://sig.ma/
Background • The problem: How to provide uniform access to heterogeneous data sources? • Value-added services: • Search • Browsing • Recommendations of related items • Reporting • Dashboarding • Notifications • …
Solutions? • Data Warehousing • Enterprise Information Integration • Enterprise Search • A middle ground in between?
Data Warehousing, EII • Integrate enterprise data sources into a new data source • Data Warehouse: materialized (new DB) • Enterprise Information Integration: virtual (distributed queries) • Focus on data • Tight integration • High up-front cost
Enterprise Search • Provides the most sought-after service (search) • Focus on documents • Full-text search • Lower up-front cost (no schema alignment) • Providing value-added services on top is difficult
A middle ground • Start by providing access to data on a per-business-object basis without prior schema alignment • Services: Browsing of the catalog of objects; search • Align, link and reconcile as required to enable more services, e.g., expressive queries
A middle ground • No accepted term yet • Data Spaces? • Pay-as-you-go Data Integration? • Linked Enterprise Data?
The RDF technology stack • A standards-based “data-first” approach • RDF, SPARQL, OWL – W3C standards • Off-the-shelf components • Integrates well with web data sources
The “RDF Bus” • Various implementation strategies • ETL + One Big Triple Store with SPARQL endpoint • Several SPARQL endpoints (SPARQL 1.1 SERVICE feature?) • Linked Data style (resolvable URIs) • Bus details determine what services can be provided • Can you do high-performance SPARQL? • Can you do full-text search? • Real-time up-to-date information or significant delay? • Where is alignment handled? • Who can hook in new data sources?
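To make the "several SPARQL endpoints" strategy concrete, here is a minimal sketch of how a SPARQL 1.1 federated query using SERVICE might be assembled. The function name, the endpoint URLs, and the choice of matching on rdfs:label are assumptions for illustration, not something the talk specifies.

```python
def federated_label_query(endpoints, term):
    """Build a SPARQL 1.1 query that asks each endpoint for matching labels.

    Hypothetical helper: one SERVICE block per endpoint, joined with UNION.
    """
    # Escape backslashes and double quotes so the term is a valid string literal.
    escaped = term.replace("\\", "\\\\").replace('"', '\\"')
    branches = [
        "  {\n"
        f"    SERVICE <{url}> {{\n"
        "      ?entity rdfs:label ?label .\n"
        f'      FILTER(CONTAINS(LCASE(STR(?label)), LCASE("{escaped}")))\n'
        "    }\n"
        "  }"
        for url in endpoints
    ]
    return (
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        "SELECT DISTINCT ?entity ?label WHERE {\n"
        + "\n  UNION\n".join(branches)
        + "\n}"
    )


if __name__ == "__main__":
    # Placeholder endpoints, assumed for the example only.
    print(federated_label_query(
        ["http://crm.example/sparql", "http://hr.example/sparql"], "acme"))
```

Whether such a query performs acceptably depends on the endpoints involved, which is one of the bus details the slide asks about.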
Sigma EE • Services: search, browsing • Strengths • Minimal requirements for the RDF bus • Strong support for provenance • Dynamic UI • Bus has to provide Search and Entity descriptions • E.g., SPARQL endpoint with full-text search • E.g., Solr • E.g., Sindice + (part of) the Web • E.g., custom Java classes • Or multiple of the above
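The two requirements on the bus (search and entity descriptions) can be pictured as a very small interface. The following is a sketch under assumptions: the names `Bus`, `Statement`, `InMemoryBus`, and `MultiBus` are hypothetical and do not reflect Sigma EE's actual Java API; the in-memory backend merely stands in for a triple store, a Solr index, or custom classes.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Statement:
    """One property/value pair about an entity, tagged with its source."""
    prop: str
    value: str
    source: str


class Bus(Protocol):
    """Hypothetical minimal contract: full-text search plus descriptions."""
    def search(self, text: str) -> list[str]: ...
    def describe(self, uri: str) -> list[Statement]: ...


class InMemoryBus:
    """Toy backend holding {entity URI: {property: value}} in a dict."""

    def __init__(self, name: str, data: dict[str, dict[str, str]]):
        self.name = name
        self.data = data

    def search(self, text: str) -> list[str]:
        # Case-insensitive substring match over all property values.
        needle = text.lower()
        return sorted(
            uri for uri, props in self.data.items()
            if any(needle in value.lower() for value in props.values())
        )

    def describe(self, uri: str) -> list[Statement]:
        props = self.data.get(uri, {})
        return [Statement(p, v, self.name) for p, v in sorted(props.items())]


class MultiBus:
    """Combines several backends behind the same two operations."""

    def __init__(self, backends):
        self.backends = list(backends)

    def search(self, text: str) -> list[str]:
        seen: list[str] = []
        for backend in self.backends:
            for uri in backend.search(text):
                if uri not in seen:
                    seen.append(uri)
        return seen

    def describe(self, uri: str) -> list[Statement]:
        return [s for b in self.backends for s in b.describe(uri)]
```

Keeping the source on every statement is what lets a front-end show provenance later, even when descriptions come from several backends at once.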
Sigma UI • Full-text search • On-the-fly fuzzy merge of data sources • Empower user to evaluate provenance, reject and accept data sources • Show/hide/rearrange properties and values • Browse to related entities • Permalinks, embeddable widgets
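One way to picture the on-the-fly fuzzy merge with provenance is the sketch below. It is not Sigma's actual algorithm: the use of `difflib.SequenceMatcher`, the 0.8 threshold, and the function names are assumptions chosen to keep the example short. Property names from different sources are grouped when they look alike, each value remembers which sources asserted it, and a user-rejected source is simply left out.

```python
from difflib import SequenceMatcher


def normalize(label):
    """Lower-case a property name and drop punctuation and whitespace."""
    return "".join(ch for ch in label.lower() if ch.isalnum())


def similar(a, b, threshold=0.8):
    """Assumed similarity test: string ratio on normalized names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio() >= threshold


def merge(statements, rejected=frozenset()):
    """Merge (property, value, source) triples from several sources.

    Returns a list of (canonical_property, {value: set_of_sources}).
    Sources listed in `rejected` are skipped entirely.
    """
    merged = []
    for prop, value, source in statements:
        if source in rejected:
            continue
        for canonical, values in merged:
            if similar(canonical, prop):
                values.setdefault(value, set()).add(source)
                break
        else:
            # No existing group is close enough, so start a new one.
            merged.append((prop, {value: {source}}))
    return merged


if __name__ == "__main__":
    data = [
        ("full name", "Alice Example", "src-a"),
        ("Full_Name", "Alice Example", "src-b"),
        ("homepage", "http://example.org/", "src-b"),
    ]
    print(merge(data))
    print(merge(data, rejected={"src-b"}))
```

Because the merge is recomputed from the raw statements, accepting or rejecting a source only means calling `merge` again with a different `rejected` set.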
Summary • Sigma EE: front-end for your RDF Bus • E.g., for your triple store • Off-the-shelf UI with minimum configuration • Available under GPL or other licenses on request • Running at http://sig.ma/