Breakout Group: Computation
RDF triple store vs. relational database vs. NoSQL databases vs. X
Issues
• Treat data from several systems or locations as one integrated data set (materially or virtually)
• Get a clear understanding of the semantic structure of the data
• Data mapping issues; a big part of this is understanding the semantic structure
• What data can be expressed
• What queries can be posed, especially with respect to inference, including inference using ontology structure; built-in query language features, e.g. transitive closure on subsumption
• Performance as related to the complexity of queries
• Scalability to very large datasets
• Federated query vs. centralized store
• Optimization of algorithms
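The point about transitive closure on subsumption can be made concrete. In SPARQL 1.1 a property path such as `?c rdfs:subClassOf* ex:EndocrineDisease` expresses it directly; in a relational database the same inference needs a recursive query. The sketch below is a minimal Python/SQLite illustration, assuming a hypothetical `subclass_of(sub, super)` table that holds ontology edges next to a hypothetical `diagnosis` table; the class names and the `patients_with` helper are invented for illustration and do not come from the breakout group.

```python
import sqlite3

# Hypothetical schema: ontology subsumption edges stored as a plain table,
# alongside instance data (patient diagnoses) in the same relational database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE subclass_of (sub TEXT NOT NULL, super TEXT NOT NULL);
CREATE TABLE diagnosis (patient_id INTEGER, code TEXT);
INSERT INTO subclass_of VALUES
  ('TypeIIDiabetes', 'DiabetesMellitus'),
  ('DiabetesMellitus', 'EndocrineDisease'),
  ('Hypothyroidism', 'EndocrineDisease'),
  ('EndocrineDisease', 'Disease'),
  ('Asthma', 'Disease');
INSERT INTO diagnosis VALUES (1, 'TypeIIDiabetes'), (2, 'Asthma'), (3, 'Hypothyroidism');
""")


def patients_with(conn, cls):
    """Return patients whose diagnosis is `cls` or any class transitively subsumed by it."""
    rows = conn.execute("""
        WITH RECURSIVE subclasses(cls) AS (
            SELECT ?                                  -- start from the class itself
            UNION
            SELECT s.sub FROM subclass_of AS s
            JOIN subclasses AS c ON s.super = c.cls   -- walk subClassOf edges downward
        )
        SELECT DISTINCT d.patient_id
        FROM diagnosis AS d JOIN subclasses AS c ON d.code = c.cls
        ORDER BY d.patient_id
    """, (cls,)).fetchall()
    return [r[0] for r in rows]


print(patients_with(conn, "EndocrineDisease"))  # [1, 3]
```

Using `UNION` rather than `UNION ALL` in the recursive part means rows already produced are not revisited, so the walk also terminates if the class hierarchy happens to contain a cycle.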
Practical experience
Mayo:
• SPARQL-to-SQL translation: performance not practical
• Materialized RDF also has performance problems: a triple store with data on 100K patients (3 billion triples) needs a Cray supercomputer to get reasonable performance
VIVO:
• Many people are looking at performance issues
• VIVO can work with different triple stores
• There is a performance issue with federated search
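One common, naive way to answer SPARQL over a relational database is to keep every triple in a single three-column table and turn each triple pattern into one more self-join of that table. The slide does not say which mapping was used at Mayo, so this is only an illustration of why such translations can get expensive: a query with n triple patterns becomes an n-way self-join over one very large table. The `triples(s, p, o)` schema, the `ex:` names, and the `bgp_to_sql` helper are hypothetical, and the sketch handles only basic graph patterns (no OPTIONAL, FILTER, or datatype handling; IRIs and literals are both stored as plain text).

```python
import sqlite3


def bgp_to_sql(patterns):
    """Translate a basic graph pattern into SQL over a triples(s, p, o) table.

    `patterns` is a list of (s, p, o) terms; strings starting with '?' are
    variables. Each triple pattern becomes another alias of the same table.
    """
    froms, where, params = [], [], []
    var_cols = {}  # variable name -> first column that binds it
    for i, pattern in enumerate(patterns):
        alias = f"t{i}"
        froms.append(f"triples AS {alias}")
        for col, term in zip(("s", "p", "o"), pattern):
            ref = f"{alias}.{col}"
            if term.startswith("?"):
                if term in var_cols:
                    where.append(f"{ref} = {var_cols[term]}")  # shared variable -> join condition
                else:
                    var_cols[term] = ref
            else:
                where.append(f"{ref} = ?")  # constant -> filter, bound as a parameter
                params.append(term)
    select = ", ".join(f"{ref} AS {var[1:]}" for var, ref in var_cols.items())
    sql = f"SELECT {select} FROM {', '.join(froms)}"
    if where:
        sql += " WHERE " + " AND ".join(where)
    return sql, params


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")
conn.executemany("INSERT INTO triples VALUES (?, ?, ?)", [
    ("ex:p1", "ex:hasDiagnosis", "ex:d1"),
    ("ex:d1", "ex:code", "TypeIIDiabetes"),
    ("ex:p2", "ex:hasDiagnosis", "ex:d2"),
    ("ex:d2", "ex:code", "Asthma"),
])

# SPARQL: SELECT ?patient ?d WHERE { ?patient ex:hasDiagnosis ?d . ?d ex:code "Asthma" }
sql, params = bgp_to_sql([
    ("?patient", "ex:hasDiagnosis", "?d"),
    ("?d", "ex:code", "Asthma"),
])
print(sql)
print(conn.execute(sql, params).fetchall())  # [('ex:p2', 'ex:d2')]
```

Production SPARQL-to-SQL systems typically map onto existing relational schemas or use smarter layouts (property tables, indexes on every s/p/o permutation), but the join count still grows with the number of triple patterns.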
Conclusions
• Use the right tool for the right job: which tool is good for which question, with what kind of data
• Understand the semantic structure of your data no matter what tool you use
• Ontologies can be used with any data store; they are not bound to the present linked data model