
A Physical Query Algebra for DHT-based P2P Systems

Kai-Uwe Sattler¹, Philipp Rösch¹, Erik Buchmann², Klemens Böhm²
¹ Department of Computer Science and Automation, TU Ilmenau
² Department of Computer Science, University of Magdeburg


Presentation Transcript


  1. A Physical Query Algebra for DHT-based P2P Systems
     Kai-Uwe Sattler¹, Philipp Rösch¹, Erik Buchmann², Klemens Böhm²
     ¹ Department of Computer Science and Automation, TU Ilmenau
     ² Department of Computer Science, University of Magdeburg

  2. Distributed Hash Tables
     • Examples: CAN, Chord, Pastry, etc.
     • Advantages of P2P systems, e.g.:
       • No single point of failure (SPOF), shared infrastructure costs, censorship resistance
     • Manage huge sets of (key, value) pairs
     • Cope with large numbers of parallel transactions
     • Efficient query processing:
       • Greedy forward routing
       • But only simple exact-match queries on unstructured data sets

  3. Extended Queries in DHT
     • Some extensions:
       • Trigrams: text retrieval, e.g., beethoven: bee eet eth tho hov ove ven
       • Bloom filters: hash-based AND
       • Feature vectors: multimedia documents
     • But:
       • Extensions are application-specific
       • No universal query algebra
     • Idea:
       • Relational data sets, SQL-like queries
     Applications: management of genome data, semantic web, distributed indexes
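The trigram extension can be illustrated with a minimal sketch. The function name `trigrams` is an assumption made for this example; the slide only gives the decomposition of "beethoven". Each trigram would then serve as a separate DHT key pointing to the document.

```python
def trigrams(word: str) -> list[str]:
    """Return all overlapping 3-character substrings of word, in order."""
    return [word[i:i + 3] for i in range(len(word) - 2)]


print(trigrams("beethoven"))
# ['bee', 'eet', 'eth', 'tho', 'hov', 'ove', 'ven']
```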

  4. Relational Data in DHT?
     • Storing relational data in DHT
       • Fragmentation scheme?
       • Accessing secondary keys?
     • Support for SQL-like query processing
       • Distribution scheme for complex queries?
       • Join operations?
       • Full-table scan without flooding?
     • Exploiting the P2P nature
       • No central instance, no global knowledge
       • Parallel processing
       • Problems with availability and failures

  5. Outline of Our Approach
     • Use Content-Addressable Networks (CAN)
     • Locality-aware hash function
       • Preserving neighborhood of similar tuples
       • Space-filling curve
     • API extension
       • Multicast
       • Temporary re-hashing
     • Distributed query plan operators (POP)
       • Selection, join, grouping/aggregation
       • POP distribution scheme

  6. Content-Addressable Networks
     • Proposed by S. Ratnasamy (2001)
     • Keys: d-dimensional points
     • Key space is a torus in d dimensions
     • Example: d=2

  7. Zones and Neighbors in CAN
     • Each peer is responsible for one zone, i.e., stores all (key, value) pairs of the zone
     • Each peer knows the neighbors of its zone
     • Random assignment of peers to zones at startup
     • Overloading of zones, multiple realities, ...

  8. Greedy Forward Routing in CAN
     • get(k):
       • Forward request to that neighbor whose zone is closest to k
       • Repeat until the peer responsible for k is reached
     [Figure: a get(k) request routed to the peer holding (k,v)]
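The two routing steps can be sketched as follows. This is a simplified illustration, not the CAN protocol itself: it assumes a 2-dimensional torus divided into n x n equally sized zones, one peer per zone, whereas real CAN zones are irregular. The names `torus_dist`, `neighbors` and `greedy_route` are hypothetical.

```python
def torus_dist(a, b, n):
    """Manhattan distance between grid cells a and b on an n x n torus."""
    return sum(min(abs(x - y), n - abs(x - y)) for x, y in zip(a, b))


def neighbors(cell, n):
    """The four zones adjacent to cell, with wrap-around at the borders."""
    x, y = cell
    return [((x + 1) % n, y), ((x - 1) % n, y),
            (x, (y + 1) % n), (x, (y - 1) % n)]


def greedy_route(start, target, n):
    """Return the list of zones visited while forwarding a get() request."""
    path = [start]
    current = start
    while current != target:
        # Forward to the neighbor whose zone is closest to the target.
        current = min(neighbors(current, n),
                      key=lambda c: torus_dist(c, target, n))
        path.append(current)
    return path


# Route from zone (0, 0) to zone (3, 1) on a 4 x 4 torus.
print(greedy_route((0, 0), (3, 1), 4))
```

Because every hop reduces the torus distance by one, the number of hops in this simplified setting equals the torus distance between the start and target zones.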

  9. Managing Relational Data: Simple Approach
     • Relation r ∈ R, tuple t ∈ r, t = {a_k, a_1, ..., a_n}, key k' = h(a_k)
     • Problems:
       • Tuples are irregularly disseminated over the key space, i.e., only exact-match queries are supported
       • No search for attributes other than the primary key
     [Figure: tuples (x) scattered over the key space; open questions σ_{5<a_k<10}(r)? and σ_{a_b=20}(r)?]

  10. Fragmentation Scheme
      • Reverse bit interleaving (z-curve)
      • Tuple t ∈ r, t = {a_k, a_1, ..., a_n}
      • Two hash functions: key k' = h_r(r) ∘ h_k(a_k) (simple approach: key k' = h(a_k))
      [Figure: the bits of (RelationID, key value), produced by h_r and h_k, are distributed over dimension #1 and dimension #2; example point (1,2)]

  11. Two Hash Functions
      • Key k' = h_r(r) ∘ h_k(a_k)
      • h_r(r): RelationID determines the placement of the space-filling curve
      • h_k(a_k): primary key determines the position on the curve (locality-awareness), a_k = 0, 1, 2, 3, 4, ...
      [Figure: space-filling curves for the relations ra, rb, rc]
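One possible reading of the two-hash scheme is sketched below. Everything concrete here is an assumption made for illustration: the bit widths `REL_BITS` and `KEY_BITS`, the use of CRC32 as h_r, the identity function as an order-preserving h_k, a 2-dimensional key space, and the bit order used when de-interleaving. The slides do not specify any of these.

```python
import zlib

REL_BITS = 4   # assumed width of the relation prefix h_r(r)
KEY_BITS = 12  # assumed width of the key part h_k(a_k)


def h_r(relation: str) -> int:
    """Hypothetical relation hash: CRC32 truncated to REL_BITS bits."""
    return zlib.crc32(relation.encode()) & ((1 << REL_BITS) - 1)


def h_k(key: int) -> int:
    """Hypothetical order-preserving key hash: the key itself, range-checked."""
    if not 0 <= key < (1 << KEY_BITS):
        raise ValueError("key out of range")
    return key


def z_position(relation: str, key: int) -> int:
    """Concatenate both hashes: relation prefix in the high bits, key below."""
    return (h_r(relation) << KEY_BITS) | h_k(key)


def deinterleave(z: int, bits: int = REL_BITS + KEY_BITS) -> tuple[int, int]:
    """Split a z-curve position into (x, y): even bits go to x, odd bits to y."""
    x = y = 0
    for i in range(0, bits, 2):
        x |= ((z >> i) & 1) << (i // 2)
        y |= ((z >> (i + 1)) & 1) << (i // 2)
    return x, y


# Consecutive primary keys of one relation occupy consecutive z-positions.
for a_k in range(4):
    print(a_k, deinterleave(z_position("r", a_k)))
```

With this layout, tuples of the same relation share the high-order prefix, so a range of primary keys corresponds to a contiguous interval of the curve.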

  12. Additional API Primitives
      • Standard operations: put(k, v), v = get(k)
      • Only two additional operations needed for our query algebra: put_temp(), multicast()
      put_temp(k, v, t)
      • Re-hashing of a given relation
      • Temporary put-operation
      • Allows indexed access to attributes other than the primary key

  13. Additional API Primitives (Cont.)
      multicast(zmin, zmax, POP)
      • Sends a message to a group of peers
      • Peers are identified by an interval of the z-curve
      Example: σ_{3<a_k<6}(r)
      • multicast(3, 6, POP)
      • send(σ_{a_k=3}), send(σ_{4<a_k<6})
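How an interval of the z-curve identifies a group of peers can be sketched as follows. This is a centralized illustration under stated assumptions: each zone covers a contiguous stretch of the curve, and the sorted list `zone_starts` of zone boundaries is known in one place. In the actual system no peer has such global knowledge, and the function name `peers_in_interval` is hypothetical.

```python
from bisect import bisect_right


def peers_in_interval(zone_starts, zmin, zmax):
    """Return indexes of the zones that overlap the z-interval [zmin, zmax].

    zone_starts is a sorted list; zone i covers the positions from
    zone_starts[i] up to (but not including) zone_starts[i + 1].
    """
    first = bisect_right(zone_starts, zmin) - 1
    last = bisect_right(zone_starts, zmax) - 1
    return list(range(max(first, 0), last + 1))


# Four zones starting at z-positions 0, 4, 8 and 12.
zones = [0, 4, 8, 12]
print(peers_in_interval(zones, 3, 6))  # [0, 1]
```

With these assumed zone boundaries, the interval from the slide's example touches two zones, so the operator would be delivered to two peers, each responsible for its own part of the range.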

  14. Query Plan Operators (POP)
      • Hash-based implementation for selection, join, grouping, aggregation
      • Distributed query processing
      • Operator trees
      [Figure: operator tree over the relations R, S, T]

  15. Selection
      • Selection POP
      • On the primary key:
        • Example: σ_{3<a_k<6}(r)
        • Determine the interval on the z-curve
        • Send selection operator via multicast
      • On other attributes:
        • Example: σ_{3<a_5<6}(r)
        • Perform full-table scan, e.g., multicast(min(a_5), max(a_5), POP)

  16. Join
      • Nested Loop Join POP, Symmetric Hash Join POP
      • On the primary key:
        • Perform join immediately
      • On other attributes:
        • Re-hash the relation using put_temp first
        • Perform join as above

  17. Example: Symmetric Hash Join
      [Figure: the tuples of R and S are re-hashed via put_temp(h(tR), tR, x) and put_temp(h(tS), tS, x) into fragments R1, S1 and R2, S2; shjoin(R,S) on each fragment pair yields RS1 and RS2]
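The local part of a symmetric hash join can be sketched with the textbook algorithm: each arriving tuple is inserted into the hash table of its own input and immediately probed against the table of the other input. This is a single-process illustration, not the authors' distributed POP; the function name and the `("R", tuple)` stream encoding are assumptions.

```python
from collections import defaultdict


def symmetric_hash_join(stream, key_r, key_s):
    """Join tuples arriving in arbitrary order from two inputs.

    stream yields ("R", tuple) or ("S", tuple); key_r and key_s extract the
    join attribute. Results are emitted as soon as a match is possible.
    """
    table_r, table_s = defaultdict(list), defaultdict(list)
    for side, tup in stream:
        if side == "R":
            k = key_r(tup)
            table_r[k].append(tup)            # build on the own side
            for match in table_s.get(k, []):  # probe the other side
                yield tup, match
        else:
            k = key_s(tup)
            table_s[k].append(tup)
            for match in table_r.get(k, []):
                yield match, tup


arrivals = [("R", (1, "a")), ("S", (1, "x")), ("S", (2, "y")), ("R", (2, "b"))]
for pair in symmetric_hash_join(arrivals, lambda t: t[0], lambda t: t[0]):
    print(pair)
```

Since neither input has to be read completely before results appear, the operator suits a setting in which re-hashed tuples arrive at a peer in no particular order.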

  18. Sorting/Aggregation
      • Central grouping POP:
        • One peer iterates over the z-curve, performs central sorting/aggregation
      • Hash group POP:
        • Re-distribute the relation using a hash function on the attribute to be sorted/aggregated
        • "Aggregation peers" are responsible for sorting/aggregation of incoming attribute values
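The idea behind the hash group POP can be sketched in a single process: tuples are routed to an aggregation peer by hashing the grouping attribute, and each peer aggregates only the groups it receives. The function name, the choice of SUM as the aggregate, and the use of Python's built-in `hash` are assumptions for this example.

```python
from collections import defaultdict


def hash_group_sum(tuples, group_of, value_of, num_peers):
    """Route each tuple to an aggregation peer by hashing its group value,
    then let every peer sum the values of the groups it received."""
    peers = [defaultdict(int) for _ in range(num_peers)]
    for t in tuples:
        g = group_of(t)
        # Built-in hash() of strings differs between Python runs, so the
        # peer assignment may vary, but each group always lands on one peer.
        peers[hash(g) % num_peers][g] += value_of(t)
    return [dict(p) for p in peers]


rows = [("a", 1), ("b", 2), ("a", 3), ("c", 4), ("b", 5)]
for i, partial in enumerate(hash_group_sum(rows, lambda t: t[0], lambda t: t[1], 3)):
    print("peer", i, partial)
```

Because all tuples of one group meet at the same peer, the per-peer results are already final and only need to be collected, not merged.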

  19. Query Evaluation
      • Input
        • Left-handed POP trees
      • Design principles
        • Stateless evaluation
        • Blocking operations: delivery of intermediate data (early aggregation)
      [Figure: operator tree over the relations R, S, T]

  20. Query Evaluation: Example
      [Figure: example evaluation across the peers P0 to P5, with the labels r1, r2, r2a, r2b, rr, rra, rrb]

  21. Conclusion
      • Current state:
        • Prototype is fully implemented
        • Execution of plans like (shjoin a1=a2 (scan a3>42 REL1) (scan REL2))
        • First experiments in a small CAN (100 peers) are promising

  22. Conclusion (cont.)
      • Future topics:
        • Experiments with large data sets and many nodes (100,000 nodes, 10 million queries, test data from the TPC-H benchmark)
        • Optimization of the different POP implementations
        • Efficient range queries
        • Dynamic query operations
