PIER: Peer-to-Peer Information Exchange and Retrieval • Ryan Huebsch, Joe Hellerstein, Nick Lanham, Boon Thau Loo, Scott Shenker, Ion Stoica • p2p@db.cs.berkeley.edu • UC Berkeley, CS Division • Berkeley P2P, 2/24/03
Outline • Motivation • General Architecture • Brief look at the Algorithms • Potential Applications • Current Status • Future Research
P2P DHTs are Cool, but… • Lots of effort is put into making DHTs • Scalable (thousands → millions of nodes) • Reliable (every imaginable failure) • Secure (anonymity, encryption, etc.) • Efficient (fast access with minimal state) • Load balancing, and others • Still only a hash table interface: put and get • Hard (but not impossible) to build real applications using only these basic primitives
Databases are Cool, but… • Relational databases bring a declarative interface to the user/application • Ask for what you want, not how to get it • Database community is not new to parallel and distributed systems • Parallel: centralized, one administrator, one point of failure • Distributed: did not catch on; complicated, and never really scaled beyond hundreds of machines
Databases + P2P DHTs: Marriage Made in Heaven? • Well, databases carry a lot of other baggage • ACID transactions • Consistency above all else • So we just want to unite the query processor with DHTs • DHTs + Relational Query Processing = PIER • Bring complex queries to DHTs → a foundation for real applications
Outline • Motivation • General Architecture • Brief look at the Algorithms • Potential Applications • Current Status • Future Research
Architecture • The DHT is divided into 3 modules • We’ve chosen one way to do this, but it may change with time and experience • The goal is to make each module simple and replaceable • PIER has one primary module • Add-ons can make it look more database-like
Architecture: DHT: Routing • Very simple interface • Plug in any routing algorithm here: CAN, Chord, Pastry, Tapestry, etc. • lookup(key) → ipaddr • join(landmarkNode) • leave() • CALLBACK: locationMapChange()
Architecture: DHT: Storage • Currently we use a simple in-memory storage system; no reason a more complex one couldn’t be used • store(key, item) • retrieve(key) → item • remove(key)
Architecture: DHT: Provider • Connects the pieces, and provides the ‘DHT’ interface • get(ns, rid) → item • put(ns, rid, iid, item, lifetime) • renew(ns, rid, iid, lifetime) → success? • multicast(ns, item) • lscan(ns) → items • CALLBACK: newData(ns, item)
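To make the three DHT modules concrete, here is a minimal Java sketch of the interfaces listed on the slides above. The type names (Key, Item, Namespace) are placeholders invented for illustration; PIER’s actual class names and signatures are not shown here.

```java
import java.net.InetAddress;

class Key {}        // opaque DHT key (placeholder)
class Item {}       // opaque stored item (placeholder)
class Namespace {}  // logical table name (placeholder)

/** Routing layer: any overlay (CAN, Chord, Pastry, Tapestry, ...) plugs in here. */
interface Routing {
    InetAddress lookup(Key key);          // map a key to the responsible node
    void join(InetAddress landmarkNode);  // enter the overlay via a known node
    void leave();                         // depart gracefully
    void locationMapChange();             // CALLBACK: this node's key range changed
}

/** Storage layer: currently a simple in-memory store. */
interface Storage {
    void store(Key key, Item item);
    Item retrieve(Key key);
    void remove(Key key);
}

/** Provider: connects routing and storage, and exposes the 'DHT' interface. */
interface Provider {
    Item get(Namespace ns, Key rid);
    void put(Namespace ns, Key rid, Key iid, Item item, long lifetime);
    boolean renew(Namespace ns, Key rid, Key iid, long lifetime);
    void multicast(Namespace ns, Item item);
    Iterable<Item> lscan(Namespace ns);     // items stored locally in ns
    void newData(Namespace ns, Item item);  // CALLBACK: new data arrived in ns
}
```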
Architecture: PIER • Currently consists only of the relational execution engine • Executes a pre-optimized query plan • A query plan is a box-and-arrow description of how to connect basic operators together • Selection, projection, join, group-by/aggregation, and some DHT-specific operators such as rehash • Traditional DBs use an optimizer + catalog to turn SQL into a query plan; those are just add-ons to PIER
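As a rough illustration of what a box-and-arrow plan looks like in code, the sketch below shows a pull-based operator interface and a selection box; the names (Operator, Selection) are hypothetical stand-ins, not PIER’s actual classes.

```java
import java.util.List;
import java.util.function.Predicate;

/** A plan is a graph of boxes; each box pulls tuples from its children. */
interface Operator {
    boolean hasNext();
    List<Object> next();   // one tuple, as a list of field values
}

/** Selection box: passes through only the tuples satisfying a predicate. */
class Selection implements Operator {
    private final Operator child;
    private final Predicate<List<Object>> pred;
    private List<Object> buffered;

    Selection(Operator child, Predicate<List<Object>> pred) {
        this.child = child;
        this.pred = pred;
    }

    public boolean hasNext() {
        // Pull from the child until a tuple passes the predicate (or input ends).
        while (buffered == null && child.hasNext()) {
            List<Object> t = child.next();
            if (pred.test(t)) buffered = t;
        }
        return buffered != null;
    }

    public List<Object> next() {
        List<Object> t = buffered;
        buffered = null;
        return t;
    }
}
```

Wiring a scan box into a selection box into a rehash box is then just object composition, which is why the optimizer and catalog can stay optional add-ons.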
Outline • Motivation • General Architecture • Brief look at the Algorithms • Potential Applications • Current Status • Future Research
Joins: The Core of Query Processing • A relational join can be used to: • Compute the intersection of two sets • Correlate information • Find matching data • Goal: • Get tuples that have the same value for a particular attribute(s) (the join attribute(s)) to the same site, then append matching tuples together • Algorithms come from the existing database literature, with minor adaptations to use the DHT
Joins: Symmetric Hash Join (SHJ) • Algorithm at each site: • (Scan) Use two lscan calls to retrieve all the data stored at that site from the source tables • (Rehash) put a copy of each eligible tuple, with the hash key based on the value of the join attribute • (Listen) Use newData to see the rehashed tuples • (Compute) Run a standard one-site join algorithm on the tuples as they arrive • The Scan/Rehash steps must run on all sites that store source data • The Listen/Compute steps can run on fewer nodes by choosing the hash key differently
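A minimal single-site sketch of the Listen/Compute steps, assuming rehashed tuples are delivered through a newData-style callback; the tuple representation and class name here are invented for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** One site's view of a symmetric hash join between inputs "R" and "S". */
class ShjSite {
    // One in-memory hash table per input, keyed on the join-attribute value.
    private final Map<String, List<String[]>> tableR = new HashMap<>();
    private final Map<String, List<String[]>> tableS = new HashMap<>();

    /** (Listen + Compute): invoked for each rehashed tuple as it arrives. */
    void newData(String source, String[] tuple, int joinCol) {
        String key = tuple[joinCol];
        Map<String, List<String[]>> mine  = source.equals("R") ? tableR : tableS;
        Map<String, List<String[]>> other = source.equals("R") ? tableS : tableR;

        // Insert into this input's table ...
        mine.computeIfAbsent(key, k -> new ArrayList<>()).add(tuple);

        // ... then probe the other input's table and emit any matches.
        for (String[] match : other.getOrDefault(key, List.of())) {
            emit(tuple, match);
        }
    }

    private void emit(String[] a, String[] b) {
        System.out.println(Arrays.toString(a) + " JOIN " + Arrays.toString(b));
    }
}
```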
Joins: Fetch Matches (FM) • Algorithm at each site: • (Scan) Use lscan to retrieve all the data from ONE table • (Get) Based on the value of the join attribute, issue a get for the possible matching tuples from the other table • Note: one table (the one we issue the gets for) must already be hashed on the join attribute • Big picture: • SHJ is put-based • FM is get-based
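A sketch of FM at one site, written against a hypothetical cut-down DHT interface: scan the locally stored fragment of one table and get candidate matches from the other table, which must already be hashed on the join attribute.

```java
import java.util.Arrays;
import java.util.List;

/** Cut-down stand-in for the DHT calls FM needs (not PIER's actual API). */
interface SimpleDht {
    List<String[]> lscan(String ns);            // tuples stored at this site
    List<String[]> get(String ns, String key);  // tuples hashed under key
}

class FetchMatchesJoin {
    /** Join local "R" tuples against "S", which is hashed on its join attribute. */
    void run(SimpleDht dht, int rJoinCol) {
        // (Scan) iterate over the R tuples stored at this site ...
        for (String[] r : dht.lscan("R")) {
            // (Get) ... and fetch the S tuples that share the join-key value.
            for (String[] s : dht.get("S", r[rJoinCol])) {
                System.out.println(Arrays.toString(r) + " JOIN " + Arrays.toString(s));
            }
        }
    }
}
```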
Joins: Additional Strategies • Bloom Filters • Bloom filters can reduce the amount of data rehashed in the SHJ • Symmetric Semi-Join • Run an SHJ on the source data projected down to just the hash key and join attributes • Use the results of this mini-join as the source for two FM joins that retrieve the remaining attributes of tuples likely to be in the answer set • Big picture: • Trade off bandwidth (extra rehashing) for latency (time to exchange filters)
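As one concrete (toy) illustration of the Bloom-filter idea: each site summarizes the join keys it holds for one input, the filters are exchanged, and a tuple from the other input is rehashed only if the filter might contain its key. The filter size and hash probes below are arbitrary assumptions, not PIER’s parameters.

```java
import java.util.BitSet;

/** Toy Bloom filter over join-key strings (two probes, fixed 2^16 bits). */
class JoinKeyFilter {
    private static final int SIZE = 1 << 16;
    private final BitSet bits = new BitSet(SIZE);

    void add(String key) {
        bits.set(probe1(key));
        bits.set(probe2(key));
    }

    /** false = definitely absent (skip the rehash); true = possibly present. */
    boolean mightContain(String key) {
        return bits.get(probe1(key)) && bits.get(probe2(key));
    }

    private int probe1(String k) { return Math.floorMod(k.hashCode(), SIZE); }
    private int probe2(String k) { return Math.floorMod(k.hashCode() * 31 + 17, SIZE); }
}
```

During the SHJ Rehash step, a site would test each tuple’s join key with mightContain before issuing the put, suppressing tuples that cannot possibly join.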
Group-By/Aggregation • A group-by/aggregation can be used to: • Split data into groups based on a value • Compute Max, Min, Sum, Count, etc. • Goal: • Get tuples that have the same value for a particular attribute(s) (the group-by attribute(s)) to the same site, then summarize the data (aggregation)
Group-By/Aggregation • At each site: • (Scan) lscan the source table • Determine the group each tuple belongs in • Add the tuple’s data to that group’s partial summary • (Rehash) For each group represented at the site, rehash the summary tuple with a hash key based on the group-by attribute • (Combine) Use newData to receive partial summaries; combine them and produce the final result after a specified time, number of partial results, or rate of input • Multiple layers of rehash/combine can be added to reduce fan-in • Subdivide groups into subgroups by randomly appending a number to each group’s key
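A minimal sketch of the scan/rehash/combine pattern for a distributed COUNT; the names are hypothetical, and the timing and fan-in policies described above are omitted.

```java
import java.util.HashMap;
import java.util.Map;

class DistributedCount {
    /** (Scan) fold the locally stored tuples into per-group partial counts. */
    static Map<String, Long> partialCounts(Iterable<String[]> localTuples, int groupCol) {
        Map<String, Long> partial = new HashMap<>();
        for (String[] t : localTuples) {
            partial.merge(t[groupCol], 1L, Long::sum);
        }
        // (Rehash) each entry would then be put() under its group's key.
        return partial;
    }

    /** (Combine) merge a partial summary arriving via newData into the total. */
    static void combine(Map<String, Long> total, Map<String, Long> partial) {
        partial.forEach((group, count) -> total.merge(group, count, Long::sum));
    }
}
```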
Outline • Motivation • General Architecture • Brief look at the Algorithms • Potential Applications • Current Status • Future Research
Why Would a DHT Query Processor be Helpful? • Data is distributed → centralized processing is not efficient or not acceptable • Correlation, intersection → joins • Summarizing, aggregating, compressing → group-by/aggregation • Probably not as efficient as a custom-designed solution to any single particular problem • But a common infrastructure for fast application development/deployment
Network Monitoring • Lots of data, naturally distributed, almost always summarized → aggregation • Intrusion detection usually involves correlating information from multiple sites → join • Data comes from many sources • nmap, snort, ganglia, firewalls, web logs, etc. • PlanetLab is our natural test bed (Timothy, Brent, and Nick)
Enhanced File Searching • First step: take over Gnutella (Boon) • Well, actually just make PlanetLab look like an UltraPeer on the outside, but run PIER on the inside • Long term: value-added services • Better searching that exploits all of the MP3 ID3 tags • Reputations • Combine with network monitoring data to better estimate download times
i3 Style Services • Mobility and Multicast • Sender is a publisher • Receiver(s) issue a continuous query looking for new data • Service Composition • Services issue a continuous query for data looking to be processed • After processing data, they publish it back into the network
Outline • Motivation • General Architecture • Brief look at the Algorithms • Potential Applications • Current Status • Future Research
Codebase • Approximately 17,600 lines of NCSS Java code • The same code (overlay components/PIER) runs on the simulator or over a real network without changes • Runs simple simulations with up to 10k nodes • Limiting factor: 2 GB of addressable memory for the JVM (on Linux) • Runs on Millennium and PlanetLab with up to 64 nodes • Limiting factor: available/working nodes & setup time • Code includes: • Basic implementations of Chord and CAN • Selection, projection, joins (4 methods), and aggregation • Non-continuous queries
Outline • Motivation • General Architecture • Brief look at the Algorithms • Potential Applications • Current Status • Future Research
Future Research • Routing, Storage and Layering • Catalogs and Query Optimization • Hierarchical Aggregations • Range Predicates • Continuous Queries over Streams • Semi-structured Data • Applications, Applications, Applications…