Querying the Internet with PIER (PIER = Peer-to-peer Information Exchange and Retrieval)

Querying the Internet with PIER(PIER = Peer-to-peer Information Exchange and Retrieval)

What is PIER? • Peer-to-Peer Information Exchange and Retrieval • Query engine that runs on top of P2P network • step to the distributed query processing at a larger scale • way for massive distribution: querying heterogeneous data • Architecture meets traditional database query processing with recent peer-to-peer technologies

Key goal is scalableindexing system for large-scale decentralized storage applications on the Internet • P2P, Large scale storage management systems (OceanStore, Publius), wide-area name resolution services

Internet Scale 1000’s – Millions Single Site Clusters Distributed 10’s – 100’s What is Very Large?Depends on Who You Are Internet scale systems vs. hundred node systems • How to run DB style queries at Internet Scale! DatabaseCommunity Network Community

What are the Key Properties? • Lots of data that is: • Naturally distributed (where it’s generated) • Centralized collection undesirable • Homogeneous in schema • Data is more useful when viewed as a whole

Who Needs Internet Scale?Example 1: Filenames • Simple ubiquitous schemas: • Filenames, Sizes, ID3 tags • Born from early P2P systems such as Napster, Gnutella etc. • Content is shared by “normal” non-expert users… home users • Systems were built by a few individuals ‘in their garages’  Low barrier to entry

Example 2: Network Traces • Schemas are mostly standardized: • IP, SMTP, HTTP, SNMP log formats • Network administrators are looking for patterns within their site AND with other sites: • DoS attacks cross administrative boundaries • Tracking virus/worm infections • Timeliness is very helpful • Might surprise you how useful it is: • Network bandwidth on PlanetLab (world-wide distributed research test bed) is mostly filled with people monitoring the network status

Our Challenge • Our focus is on the challenge of scale: • Applications are homogeneous and distributed • Already have significant interest • Provide a flexible framework for a wide variety of applications

Four Design Principles (I) • Relaxed Consistency • ACID transactions severely limits the scalability and availability of distributed databases • We provide best-effort results • Organic Scaling • Applications may start small, withouta priori knowledge of size

Four Design Principles (II) • Natural habitat • No CREATE TABLE/INSERT • No “publish to web server” • Wrappers or gateways allow the information to be accessed where it is created • Standard Schemas via Grassroots software • Data is produced by widespread software providing a de-facto schema to utilize

Declarative Queries Query Plan Overlay Network Physical Network >>based on Can

Applications • P2P Databases Highly distributed and available data • Network Monitoring Intrusion detection Fingerprint queries

DHTs • Implemented with CAN (Content Addressable Network). • Node identified by hyper-rectangle in d-dimensional space • Key hashed to a point, stored in corresponding node. • Routing Table of neighbours is maintained. O(d)

Key = (15,14) Data Given a message with an ID, route the message to the computer currently responsible for that ID (16,16) (16,0) (0,0) (0,16)

DHT Design • Routing Layer Mapping for keys (-- dynamic as nodes leave and join) • Storage Manager DHT based data • Provider Storage access interface for higher levels

DHT – Routing Routing layer maps a key into the IP address of the node currently responsible for that key. Provides exact lookups, callbacks higher levels when the set of keys has changed Routing layer API lookup(key)  ipaddr (Asynchronous Fnc) join(landmarkNode) leave() locationMapChange()

DHT – Storage Storage Manager stores and retrieves records, which consist of key/value pairs. Keys are used to locate items and can be any data type or structure supported Storage Manager API store(key, item) retrieve(key) item remove(key)

DHT – Provider (1) Provider ties routing and storage manager layers and provides an interface • Each object in the DHT has a namespace, resourceID and instanceID • DHT key = hash(namespace,resourceID) • namespace - application or group of object, table or relation • resourceID – primary key or any attribute(Object) • instanceID– integer, to separate items with the samenamespace and resourceID • Lifetime - item storage duration CAN’s mapping of resourceID/Object is equivalent to an index

DHT – Provider (2) Provider API get(namespace, resourceID)  item put(namespace, resourceID, item, lifetime) renew(namespace, resourceID, instanceID, lifetime)  bool multicast(namespace, resourceID, item) lscan(namespace)  items newData(namespace, item)

Query Processor • How it works? • performs selection, projection, joins, grouping, aggregation ->Operators • Operators push and pull data • simultaneous execution of multiple operators pipelined together • results are produced and queued as quick as possible • How it modifies data? • insert, update and delete different items via DHT interface • How it selects data to process? • dilated-reachable snapshot – data, published by reachable nodes at the query arrival time

Join Algorithms • Limited Bandwidth • Symmetric Hash Join: - Rehashes both tables • Semi Joins: - Transfer only matching tuples • At 40% selectivity, bottleneck switches from computation nodes to query sites

Future Research • Routing, Storage and Layering • Catalogs and Query Optimization • Hierarchical Aggregations • Range Predicates • Continuous Queries over Streams • Sharing between Queries • Semi-structured Data

Distributed Hash Tables (DHTs) • What is a DHT? • Take an abstract ID space, and partition among a changing set of computers (nodes) • Given a message with an ID, route the message to the computer currently responsible for that ID • Can store messages at the nodes • This is like a “distributed hash table” • Provides a put()/get() API • Cheap maintenance when nodes come and go

Distributed Hash Tables (DHTs) • Lots of effort is put into making DHTs better: • Scalable (thousands  millions of nodes) • Resilient to failure • Secure (anonymity, encryption, etc.) • Efficient (fast access with minimal state) • Load balanced • etc.

PIER’s Three Uses for DHTs • Single elegant mechanism with many uses: • Search: Index • Like a hash index • Partitioning: Value (key)-based routing • Like Gamma/Volcano • Routing: Network routing for QP messages • Query dissemination • Bloom filters • Hierarchical QP operators (aggregation, join, etc) • Not clear there’s another substrate that supports all these uses

Metrics • We are primarily interested in 3 metrics: • Answer quality (recall and precision) • Bandwidth utilization • Latency • Different DHTs provide different properties: • Resilience to failures (recovery time)  answer quality • Path length  bandwidth & latency • Path convergence  bandwidth & latency • Different QP Join Strategies: • Symmetric Hash Join, Fetch Matches, Symmetric Semi-Join, Bloom Filters, etc. • Big Picture: Tradeoff bandwidth (extra rehashing) and latency

Symmetric Hash Join (SHJ)

Fetch Matches (FM)

Symmetric Semi Join (SSJ) • Both R and S are projected to save bandwidth • The complete R and S tuples are fetched in parallel to improve latency

Overview • CAN is a distributed system that maps keys onto values • Keys hashed into d dimensional space • Interface: • insert(key, value) • retrieve(key)

Overview y State of the system at time t Peer Resource Zone x In this 2 dimensional space a key is mapped to a point (x,y)

DESIGN • D-dimensional Cartesian coordinate space (d-torus) • Every Node owns a distinct Zone • Map Key k1 onto a point p1 using a Uniform Hash function • (k1,v1) is stored at the node Nx that owns the zone with p1

Node Maintains routing table with neighbors Ex: A Node holds{B,C,E,D} • Follow the straight line path through the Cartesian space

Q(x,y) key Routing y • d-dimensional space with n zones • 2 zones are neighbor if d-1 dim overlap • Routing path of length: • Algorithm: Choose the neighbor nearest to the destination (x,y) Peer Q(x,y) Query/ Resource

CAN: construction* Bootstrap node new node

I CAN: construction Bootstrap node new node 1) Discover some node “I” already in CAN

CAN: construction (x,y) I new node 2) Pick random point in space

CAN: construction (x,y) J I new node 3) I routes to (x,y), discovers node J

CAN: construction new J 4) split J’s zone in half… new owns one half

Maintenance • Use zone takeover in case of failure or leaving of a node • Send your neighbor table to neighbors to inform that you are alive at discrete time interval t • If your neighbor does not send alive in time t, takeover its zone • Zone reassignment is needed

Node Departure • Some one has to take over the Zone • Explicit hand over of the zone to one of its Neighbors • Merge to valid Zone if ”possible” • If not Possible ”then to Zones are temporary handled by the smallest neighbor

Zone reassignment 1 3 1 3 2 4 4 2 Partition tree Zoning

Zone reassignment 1 3 1 3 4 4 Partition tree Zoning

Design Improvements • Multi-Dimension • Multi-Coordinate Spaces • Overloading the Zones • Multiple Hash Functions • Topologically Sensitive Construction • Uniform Partitioning • Caching

Querying the Internet with PIER (PIER = Peer-to-peer Information Exchange and Retrieval)