PIER: Peer-to-Peer Information Exchange and Retrieval • Ryan Huebsch, Joe Hellerstein, Nick Lanham, Boon Thau Loo, Scott Shenker, Ion Stoica • p2p@db.cs.berkeley.edu • UC Berkeley, CS Division • Berkeley P2P, 2/24/03
Outline • Motivation • General Architecture • Brief look at the Algorithms • Potential Applications • Current Status • Future Research
P2P DHTs are Cool, but… • Lots of effort is put into making DHTs • Scalable (thousands → millions of nodes) • Reliable (every imaginable failure) • Secure (anonymity, encryption, etc.) • Efficient (fast access with minimal state) • Load balancing, and others • Still only a hash table interface: put and get • Hard (but not impossible) to build real applications using only these basic primitives
Databases are Cool, but… • Relational databases bring a declarative interface to the user/application • Ask for what you want, not how to get it • Database community is not new to parallel and distributed systems • Parallel: centralized, one administrator, one point of failure • Distributed: did not catch on; complicated, and never really scaled beyond hundreds of machines
Databases + P2P DHTs: Marriage Made in Heaven? • Well, databases carry a lot of other baggage • ACID transactions • Consistency above all else • So we just want to unite the query processor with DHTs • DHTs + Relational Query Processing = PIER • Bring complex queries to DHTs → a foundation for real applications
Outline • Motivation • General Architecture • Brief look at the Algorithms • Potential Applications • Current Status • Future Research
Architecture • The DHT is divided into 3 modules • We’ve chosen one way to do this, but it may change with time and experience • The goal is to make each module simple and replaceable • PIER has one primary module • Add-ons can make it look more database-like
Architecture: DHT: Routing • Very simple interface • Plug in any routing algorithm here: CAN, Chord, Pastry, Tapestry, etc. • lookup(key) → ipaddr • join(landmarkNode) • leave() • CALLBACK: locationMapChange()
Architecture: DHT: Storage • Currently we use a simple in-memory storage system; no reason a more complex one couldn’t be used • store(key, item) • retrieve(key) → item • remove(key)
Architecture: DHT: Provider • Connects the pieces, and provides the ‘DHT’ interface • get(ns, rid) → item • put(ns, rid, iid, item, lifetime) • renew(ns, rid, iid, lifetime) → success? • multicast(ns, item) • lscan(ns) → items • CALLBACK: newData(ns, item)
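To make the three DHT modules concrete, here is a minimal Java sketch of the interfaces listed on the slides above. The type names (Key, Item, Namespace) are placeholders invented for illustration; PIER’s actual class names and signatures are not shown here.

```java
import java.net.InetAddress;

class Key {}        // opaque DHT key (placeholder)
class Item {}       // opaque stored item (placeholder)
class Namespace {}  // logical table name (placeholder)

/** Routing layer: any overlay (CAN, Chord, Pastry, Tapestry, ...) plugs in here. */
interface Routing {
    InetAddress lookup(Key key);          // map a key to the responsible node
    void join(InetAddress landmarkNode);  // enter the overlay via a known node
    void leave();                         // depart gracefully
    void locationMapChange();             // CALLBACK: this node's key range changed
}

/** Storage layer: currently a simple in-memory store. */
interface Storage {
    void store(Key key, Item item);
    Item retrieve(Key key);
    void remove(Key key);
}

/** Provider: connects routing and storage, and exposes the 'DHT' interface. */
interface Provider {
    Item get(Namespace ns, Key rid);
    void put(Namespace ns, Key rid, Key iid, Item item, long lifetime);
    boolean renew(Namespace ns, Key rid, Key iid, long lifetime);
    void multicast(Namespace ns, Item item);
    Iterable<Item> lscan(Namespace ns);     // items stored locally in ns
    void newData(Namespace ns, Item item);  // CALLBACK: new data arrived in ns
}
```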
Architecture: PIER • Currently consists only of the relational execution engine • Executes a pre-optimized query plan • A query plan is a box-and-arrow description of how to connect basic operators together • Selection, projection, join, group-by/aggregation, and some DHT-specific operators such as rehash • Traditional DBs use an optimizer + catalog to turn SQL into a query plan; those are just add-ons to PIER
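As a rough illustration of what a box-and-arrow plan looks like in code, the sketch below shows a pull-based operator interface and a selection box; the names (Operator, Selection) are hypothetical stand-ins, not PIER’s actual classes.

```java
import java.util.List;
import java.util.function.Predicate;

/** A plan is a graph of boxes; each box pulls tuples from its children. */
interface Operator {
    boolean hasNext();
    List<Object> next();   // one tuple, as a list of field values
}

/** Selection box: passes through only the tuples satisfying a predicate. */
class Selection implements Operator {
    private final Operator child;
    private final Predicate<List<Object>> pred;
    private List<Object> buffered;

    Selection(Operator child, Predicate<List<Object>> pred) {
        this.child = child;
        this.pred = pred;
    }

    public boolean hasNext() {
        // Pull from the child until a tuple passes the predicate (or input ends).
        while (buffered == null && child.hasNext()) {
            List<Object> t = child.next();
            if (pred.test(t)) buffered = t;
        }
        return buffered != null;
    }

    public List<Object> next() {
        List<Object> t = buffered;
        buffered = null;
        return t;
    }
}
```

Wiring a scan box into a selection box into a rehash box is then just object composition, which is why the optimizer and catalog can stay optional add-ons.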
Outline • Motivation • General Architecture • Brief look at the Algorithms • Potential Applications • Current Status • Future Research
Joins: The Core of Query Processing • A relational join can be used to: • Compute the intersection of two sets • Correlate information • Find matching data • Goal: • Get tuples that have the same value for a particular attribute(s) (the join attribute(s)) to the same site, then append matching tuples together • Algorithms come from the existing database literature, with minor adaptations to use the DHT
Joins: Symmetric Hash Join (SHJ) • Algorithm at each site: • (Scan) Use two lscan calls to retrieve all the data stored at that site from the source tables • (Rehash) put a copy of each eligible tuple, with the hash key based on the value of the join attribute • (Listen) Use newData to see the rehashed tuples • (Compute) Run a standard one-site join algorithm on the tuples as they arrive • The Scan/Rehash steps must run on all sites that store source data • The Listen/Compute steps can run on fewer nodes by choosing the hash key differently
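A minimal single-site sketch of the Listen/Compute steps, assuming rehashed tuples are delivered through a newData-style callback; the tuple representation and class name here are invented for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** One site's view of a symmetric hash join between inputs "R" and "S". */
class ShjSite {
    // One in-memory hash table per input, keyed on the join-attribute value.
    private final Map<String, List<String[]>> tableR = new HashMap<>();
    private final Map<String, List<String[]>> tableS = new HashMap<>();

    /** (Listen + Compute): invoked for each rehashed tuple as it arrives. */
    void newData(String source, String[] tuple, int joinCol) {
        String key = tuple[joinCol];
        Map<String, List<String[]>> mine  = source.equals("R") ? tableR : tableS;
        Map<String, List<String[]>> other = source.equals("R") ? tableS : tableR;

        // Insert into this input's table ...
        mine.computeIfAbsent(key, k -> new ArrayList<>()).add(tuple);

        // ... then probe the other input's table and emit any matches.
        for (String[] match : other.getOrDefault(key, List.of())) {
            emit(tuple, match);
        }
    }

    private void emit(String[] a, String[] b) {
        System.out.println(Arrays.toString(a) + " JOIN " + Arrays.toString(b));
    }
}
```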
Joins: Fetch Matches (FM) • Algorithm at each site: • (Scan) Use lscan to retrieve all the data from ONE table • (Get) Based on the value of the join attribute, issue a get for the possible matching tuples from the other table • Note: one table (the one we issue the gets for) must already be hashed on the join attribute • Big picture: • SHJ is put-based • FM is get-based
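A sketch of FM at one site, written against a hypothetical cut-down DHT interface: scan the locally stored fragment of one table and get candidate matches from the other table, which must already be hashed on the join attribute.

```java
import java.util.Arrays;
import java.util.List;

/** Cut-down stand-in for the DHT calls FM needs (not PIER's actual API). */
interface SimpleDht {
    List<String[]> lscan(String ns);            // tuples stored at this site
    List<String[]> get(String ns, String key);  // tuples hashed under key
}

class FetchMatchesJoin {
    /** Join local "R" tuples against "S", which is hashed on its join attribute. */
    void run(SimpleDht dht, int rJoinCol) {
        // (Scan) iterate over the R tuples stored at this site ...
        for (String[] r : dht.lscan("R")) {
            // (Get) ... and fetch the S tuples that share the join-key value.
            for (String[] s : dht.get("S", r[rJoinCol])) {
                System.out.println(Arrays.toString(r) + " JOIN " + Arrays.toString(s));
            }
        }
    }
}
```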
Joins: Additional Strategies • Bloom Filters • Bloom filters can reduce the amount of data rehashed in the SHJ • Symmetric Semi-Join • Run an SHJ on the source data projected down to just the hash key and join attributes • Use the results of this mini-join as the source for two FM joins that retrieve the remaining attributes of tuples likely to be in the answer set • Big picture: • Trade off bandwidth (extra rehashing) for latency (time to exchange filters)
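As one concrete (toy) illustration of the Bloom-filter idea: each site summarizes the join keys it holds for one input, the filters are exchanged, and a tuple from the other input is rehashed only if the filter might contain its key. The filter size and hash probes below are arbitrary assumptions, not PIER’s parameters.

```java
import java.util.BitSet;

/** Toy Bloom filter over join-key strings (two probes, fixed 2^16 bits). */
class JoinKeyFilter {
    private static final int SIZE = 1 << 16;
    private final BitSet bits = new BitSet(SIZE);

    void add(String key) {
        bits.set(probe1(key));
        bits.set(probe2(key));
    }

    /** false = definitely absent (skip the rehash); true = possibly present. */
    boolean mightContain(String key) {
        return bits.get(probe1(key)) && bits.get(probe2(key));
    }

    private int probe1(String k) { return Math.floorMod(k.hashCode(), SIZE); }
    private int probe2(String k) { return Math.floorMod(k.hashCode() * 31 + 17, SIZE); }
}
```

During the SHJ Rehash step, a site would test each tuple’s join key with mightContain before issuing the put, suppressing tuples that cannot possibly join.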
Group-By/Aggregation • A group-by/aggregation can be used to: • Split data into groups based on a value • Compute Max, Min, Sum, Count, etc. • Goal: • Get tuples that have the same value for a particular attribute(s) (the group-by attribute(s)) to the same site, then summarize the data (aggregation)
Group-By/Aggregation • At each site: • (Scan) lscan the source table • Determine the group each tuple belongs in • Add the tuple’s data to that group’s partial summary • (Rehash) For each group represented at the site, rehash the summary tuple with a hash key based on the group-by attribute • (Combine) Use newData to receive partial summaries; combine them and produce the final result after a specified time, number of partial results, or rate of input • Multiple layers of rehash/combine can be added to reduce fan-in • Subdivide groups into subgroups by randomly appending a number to each group’s key
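A minimal sketch of the scan/rehash/combine pattern for a distributed COUNT; the names are hypothetical, and the timing and fan-in policies described above are omitted.

```java
import java.util.HashMap;
import java.util.Map;

class DistributedCount {
    /** (Scan) fold the locally stored tuples into per-group partial counts. */
    static Map<String, Long> partialCounts(Iterable<String[]> localTuples, int groupCol) {
        Map<String, Long> partial = new HashMap<>();
        for (String[] t : localTuples) {
            partial.merge(t[groupCol], 1L, Long::sum);
        }
        // (Rehash) each entry would then be put() under its group's key.
        return partial;
    }

    /** (Combine) merge a partial summary arriving via newData into the total. */
    static void combine(Map<String, Long> total, Map<String, Long> partial) {
        partial.forEach((group, count) -> total.merge(group, count, Long::sum));
    }
}
```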
Outline • Motivation • General Architecture • Brief look at the Algorithms • Potential Applications • Current Status • Future Research
Why Would a DHT Query Processor be Helpful? • Data is distributed → centralized processing is not efficient or not acceptable • Correlation, intersection → joins • Summarizing, aggregating, compressing → group-by/aggregation • Probably not as efficient as a custom-designed solution to any single particular problem • But a common infrastructure for fast application development/deployment
Network Monitoring • Lots of data, naturally distributed, almost always summarized → aggregation • Intrusion detection usually involves correlating information from multiple sites → join • Data comes from many sources • nmap, snort, ganglia, firewalls, web logs, etc. • PlanetLab is our natural test bed (Timothy, Brent, and Nick)
Enhanced File Searching • First step: take over Gnutella (Boon) • Well, actually just make PlanetLab look like an UltraPeer on the outside, but run PIER on the inside • Long term: value-added services • Better searching that exploits all of the MP3 ID3 tags • Reputations • Combine with network monitoring data to better estimate download times
i3 Style Services • Mobility and Multicast • Sender is a publisher • Receiver(s) issue a continuous query looking for new data • Service Composition • Services issue a continuous query for data looking to be processed • After processing data, they publish it back into the network
Outline • Motivation • General Architecture • Brief look at the Algorithms • Potential Applications • Current Status • Future Research
Codebase • Approximately 17,600 lines of NCSS Java code • The same code (overlay components/PIER) runs on the simulator or over a real network without changes • Runs simple simulations with up to 10k nodes • Limiting factor: 2 GB of addressable memory for the JVM (on Linux) • Runs on Millennium and PlanetLab with up to 64 nodes • Limiting factor: available/working nodes & setup time • Code includes: • Basic implementations of Chord and CAN • Selection, projection, joins (4 methods), and aggregation • Non-continuous queries
Outline • Motivation • General Architecture • Brief look at the Algorithms • Potential Applications • Current Status • Future Research
Future Research • Routing, Storage and Layering • Catalogs and Query Optimization • Hierarchical Aggregations • Range Predicates • Continuous Queries over Streams • Semi-structured Data • Applications, Applications, Applications…