Why am I here? A Universal Storage Approach to Distributed Data Management

Kai-Uwe Sattler
TU Ilmenau, Germany
Database & Information Systems Group
www.tu-ilmenau.de/dbis

Universal Storage Approach
• Vision: Google for structured public data, but distributed!
• "Uniform" database, but extensible
• Decentralized management (scalability)
• Features for querying, combining, ... data
• Basis: a structured overlay network, P-Grid (EPFL)

From the Universal Relation to a TripleStore
• Storing tuples as a set of triples (OID, attribute, value)
  • Similar to RDF: subject, predicate, object
• Extensible schema
• No explicit null values
• Simple indexing
  • Index on (OID)
  • Index on (Attribute#Value)
  • Index on (Value)
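
The decomposition of a tuple into triples and the three indexes named on this slide can be sketched as follows. This is a minimal local illustration, not the prototype's code: `tuple_to_triples`, `TripleStore`, and the sample car data are hypothetical names, and plain dictionaries stand in for the keys that the real system would place in the P-Grid overlay.

```python
from collections import defaultdict


def tuple_to_triples(oid, row):
    """Decompose one tuple (given as a dict) into (OID, attribute, value) triples.

    Attributes whose value is None are skipped, so no explicit nulls are stored.
    """
    return [(oid, attr, val) for attr, val in row.items() if val is not None]


class TripleStore:
    """Local stand-in for the distributed store (assumption: plain dicts as indexes)."""

    def __init__(self):
        self.by_oid = defaultdict(list)         # index on (OID)
        self.by_attr_value = defaultdict(list)  # index on (Attribute#Value)
        self.by_value = defaultdict(list)       # index on (Value)

    def insert(self, triple):
        oid, attr, val = triple
        self.by_oid[oid].append(triple)
        self.by_attr_value[f"{attr}#{val}"].append(triple)
        self.by_value[val].append(triple)


store = TripleStore()
# "color" is unknown for this tuple, so no triple is created for it.
for t in tuple_to_triples("car1", {"make": "VW", "price": 900, "color": None}):
    store.insert(t)
```

Because every triple is indexed three times, a lookup can start from an object, from an attribute/value pair, or from a bare value, which is what makes schema-independent queries possible.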

Query Model
• Query language VQL
  • Similar to the RDF query language SPARQL
  • Conjunctive queries
  • With additional operators on both instance and schema level
    SELECT ?oid, ?val
    WHERE { (?oid, price, ?val), FILTER(?val > 500) }
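
To make the semantics of such a conjunctive query concrete, here is a small sketch that evaluates triple patterns with `?` variables plus filter predicates over an in-memory list of triples. It is an illustration under simplifying assumptions (centralized data, nested-loop matching); `match`, `evaluate`, and the sample data are hypothetical and do not reflect how the VQL engine is implemented.

```python
def match(pattern, triple, binding):
    """Extend `binding` so that `pattern` matches `triple`; return None if it cannot."""
    new = dict(binding)
    for term, value in zip(pattern, triple):
        if isinstance(term, str) and term.startswith("?"):
            # Variable: must agree with an earlier binding, otherwise bind it now.
            if term in new and new[term] != value:
                return None
            new[term] = value
        elif term != value:
            # Constant: must be equal to the triple's component.
            return None
    return new


def evaluate(patterns, filters, triples):
    """Evaluate a conjunction of triple patterns, then apply all FILTER predicates."""
    bindings = [{}]
    for pattern in patterns:
        extended = []
        for binding in bindings:
            for triple in triples:
                new = match(pattern, triple, binding)
                if new is not None:
                    extended.append(new)
        bindings = extended
    return [b for b in bindings if all(f(b) for f in filters)]


triples = [("car1", "price", 900), ("car2", "price", 300), ("car1", "make", "VW")]

# SELECT ?oid, ?val WHERE { (?oid, price, ?val), FILTER(?val > 500) }
result = evaluate([("?oid", "price", "?val")], [lambda b: b["?val"] > 500], triples)
rows = [(b["?oid"], b["?val"]) for b in result]
```

A variable that occurs in two patterns acts as a join condition, since `match` rejects any triple that disagrees with an existing binding.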

Queries in VQL
• Schema level queries
• Similarity queries
• Top-k, skyline, ...
    WHERE { (?oid1, ?attr1, ?val1), (?oid2, ?attr2, ?val2),
            FILTER(?attr1 = ?attr2), FILTER(?val1 = ?val2) }
    WHERE { (?o, attrib, ?value) FILTER (edist(?value, v) < 2) }
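
The similarity filter in the second query can be illustrated with a short sketch. It assumes that `edist` denotes the Levenshtein edit distance, which the slide does not state explicitly; the helper `similar` and the sample names are hypothetical.

```python
def edist(a, b):
    """Levenshtein edit distance between strings a and b (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or keep if equal
            ))
        prev = curr
    return prev[-1]


def similar(triples, attribute, target, max_dist=2):
    """Mimic: WHERE { (?o, attribute, ?value) FILTER (edist(?value, target) < max_dist) }"""
    return [
        (oid, val)
        for (oid, attr, val) in triples
        if attr == attribute and isinstance(val, str) and edist(val, target) < max_dist
    ]


names = [("p1", "name", "Sattler"), ("p2", "name", "Sattlr"), ("p3", "name", "Schmidt")]
hits = similar(names, "name", "Sattler")
```

This version scans all triples for the attribute; a distributed implementation would need an index that avoids contacting every peer, which this sketch does not attempt to show.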

Semantic Level: Representing Mappings
• Simple form of attribute correspondences
• Stored as triples: (A4, equiv, A5), (A3, subsumes, A6)
• "Mapping" queries:
    ... WHERE { (price, equiv, ?a), (?oid, ?a, ?v) FILTER (?v < 10.000) }
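
Since mappings are ordinary triples, a mapping query is just a join between a correspondence triple and the data triples. The sketch below shows that idea on in-memory data; the function names, the attribute `cost`, and the threshold of 10 are illustrative assumptions, not values taken from the talk.

```python
triples = [
    ("price", "equiv", "cost"),  # mapping triple: price is equivalent to cost
    ("item1", "cost", 7),
    ("item2", "cost", 25),
    ("item3", "price", 5),
]


def mapped_attributes(triples, attribute, relation="equiv"):
    """Attributes ?a such that (attribute, relation, ?a) is stored."""
    return {obj for (subj, pred, obj) in triples if subj == attribute and pred == relation}


def mapping_query(triples, attribute, limit):
    """Mimic: WHERE { (attribute, equiv, ?a), (?oid, ?a, ?v) FILTER (?v < limit) }"""
    attrs = mapped_attributes(triples, attribute)
    return [(oid, a, v) for (oid, a, v) in triples if a in attrs and v < limit]


cheap = mapping_query(triples, "price", 10)
```

Note that, as written on the slide, `?a` binds only to the attributes mapped from `price`, so `item3` (stored under `price` itself) is not returned; covering the original attribute as well would require an additional pattern.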

Query Execution
• Goal: "stateless" execution
  • No state information; no blocked waiting
  • Multiple instances of the plan "travel" through the network
  • Message: plan + intermediate results
• Peers processing the next plan operators are identified by hashing intermediate results
• Basic cost model for choosing between different operator implementations
  • Cost factors (#messages, #hops)
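
The following sketch simulates the idea of plan instances travelling through the network: each message carries the remaining plan and its intermediate results, and the peer for the next operator is chosen by hashing those results. Everything here is a simplified assumption for illustration (a fixed `NUM_PEERS`, SHA-1 modulo hashing instead of P-Grid routing, operators as pure functions without peer-local data); it is not the prototype's protocol.

```python
import hashlib
from collections import defaultdict, deque

NUM_PEERS = 4  # hypothetical network size


def peer_for(key):
    """Map a routing key to a peer id by hashing (stand-in for overlay routing)."""
    digest = hashlib.sha1(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % NUM_PEERS


def run(plan, start_rows):
    """Execute `plan`, a list of (operator, routing_key_fn) pairs.

    No peer keeps state between messages: everything needed to continue
    is inside the message (peer id, remaining plan, intermediate results).
    """
    queue = deque([(0, plan, start_rows)])
    finished, messages = [], 0
    while queue:
        _peer, remaining, rows = queue.popleft()
        if not remaining:
            finished.extend(rows)  # this plan instance is complete
            continue
        (operator, key_of), rest = remaining[0], remaining[1:]
        # Partition the operator's output by the hash of each intermediate result.
        groups = defaultdict(list)
        for row in operator(rows):
            groups[peer_for(key_of(row))].append(row)
        # One plan instance (message) per target peer.
        for target, part in groups.items():
            messages += 1
            queue.append((target, rest, part))
    return finished, messages


plan = [
    (lambda rows: [r for r in rows if r["price"] > 500], lambda r: r["oid"]),  # selection
    (lambda rows: [{"oid": r["oid"]} for r in rows], lambda r: r["oid"]),      # projection
]
data = [
    {"oid": "car1", "price": 900},
    {"oid": "car2", "price": 300},
    {"oid": "car3", "price": 700},
]
finished, messages = run(plan, data)
```

The returned message count corresponds to one of the cost factors named on the slide; a cost model could compare such counts across alternative operator implementations.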

Conclusions
• Running prototype in PlanetLab: ≈400 real nodes, not a simulation!
• Further issues
  • Guarantees in query execution (quality, response time, ...)
  • Exploiting heartbeat messages
  • Addressing the semantic level
    • More advanced mappings
    • Mapping discovery
• Acknowledgements: joint work with
  • Marcel Karnstedt, Manfred Hauswirth (DERI), Roman Schmidt (EPFL), and our students
• Further reading
  • P2P 2006 Conference, ICDE Workshop NetDB 2006, ICDE 2007 Demo
