Grid vs. Peer-to-Peer

Grid vs. Peer-to-Peer Yin Chen s0231189@sms.ed.ac.uk 25 June 2003

Content
• Grid vs. P2P
• What’s the request
• Why P2P architecture
• Issues of P2P
• P2P case study: Freenet
• Design

Grid vs. P2P
• Grid
  - Standards-based
  - Persistent
  - Addresses security issues
  - Resources are more powerful, more diverse, and better connected
  - Data intensive
  - Faces problems of autonomic configuration and management
  - Not very scalable

Grid vs. P2P
• P2P
  - Highly scalable
  - Fault tolerant
  - Self-configuring
  - Automatic problem determination
  - More variable behaviour
  - But lacks infrastructure
  - Security problems
  - Less concerned with quality of service

What’s the request
• A user requests a service (for example, a car service) and keeps a log recording whether each request succeeded or failed.
• The user may ask all other users for their request history records. Statistics over these records show how reliably a particular service responds.
• These statistics can also be used to predict the outcome of further requests.

Why P2P
• The information requested is not run-time information
• Better fault tolerance
• The pull model is efficient and generates less network traffic

Issues of P2P - Topology

Issues of P2P - Response Modes

Issues of P2P
… It turns out to be a problem of querying distributed data stores, which differs from querying a central database …

Issues of P2P - Query Processing
• Recursively Partitionable Query

Issues of P2P - Abort Timeout (1)
• Problems
  - The user may no longer be interested in the query results
  - Unless something stops it, the query roams the network forever
  - The query should fade away after some time
  - A static timeout remains unchanged across hops
• Solution -> Dynamic Abort Timeout
  - Nodes further away from the originator time out earlier than nodes closer to the originator
  - Decrease the timeout at each hop
  - Exponential decay with halving
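The halving rule can be illustrated with a minimal sketch. The function name `hop_timeouts` and the choice of seconds as the unit are assumptions made for illustration; the slides only state that the timeout is halved at each hop.

```python
def hop_timeouts(initial_timeout: float, max_hops: int) -> list[float]:
    """Return the abort timeout at hop 0 (the originator) through max_hops.

    Assumption: the timeout is divided by two at every hop, so nodes
    further from the originator give up earlier than nodes closer to it.
    """
    return [initial_timeout / (2 ** hop) for hop in range(max_hops + 1)]


# Originator waits 8 seconds; each hop further away waits half as long.
print(hop_timeouts(8.0, 3))  # [8.0, 4.0, 2.0, 1.0]
```

Because each node waits half as long as its predecessor, a node further from the originator always gives up before the node that forwarded the query to it.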

Issues of P2P - Abort Timeout (2)

Issues of P2P - Query Scope (1)
• Problems
  - It is not necessary to search the whole network
  - The broadcast model floods the network
• Solutions -> Select a neighbour subset
  - Search only a specific domain, host, or owner
  - Randomly select half of the neighbours
  - In a tree-like topology, select all children and ignore all parents
  - Find only a single result
  - Specify the maximum number of results (maxResults) and bytes (maxResultBytes) to be returned

Issues of P2P - Query Scope (2)
• A node maintains statistics about its neighbours and selects only neighbours that meet minimum requirements in terms of latency, bandwidth, or historic results (maxLatency, minBandwidth, minHistoricResult)
• Neighbour Selection Query
• Radius of a query
  - A measure of path length
  - Sets the maximum number of hops a query is allowed to travel
  - The radius is decreased by one at each hop
  - The roaming query and its response fade away when the radius falls below zero
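As a rough sketch of the two scoping mechanisms above, the following filters neighbours by the three thresholds and decrements the radius at each hop. The `NeighbourStats` fields, their units, and both function names are hypothetical; only the threshold names maxLatency, minBandwidth, and minHistoricResult come from the slide.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NeighbourStats:
    # Hypothetical per-neighbour statistics; units are an assumption.
    name: str
    latency_ms: float
    bandwidth_kbps: float
    historic_results: int


def select_neighbours(stats, max_latency, min_bandwidth, min_historic_result):
    """Keep only the neighbours that meet all three minimum requirements."""
    return [
        s.name
        for s in stats
        if s.latency_ms <= max_latency
        and s.bandwidth_kbps >= min_bandwidth
        and s.historic_results >= min_historic_result
    ]


def next_radius(radius: int) -> Optional[int]:
    """Decrease the radius by one; None means the query fades away."""
    radius -= 1
    return radius if radius >= 0 else None


neighbours = [
    NeighbourStats("fast", 20.0, 1000.0, 5),
    NeighbourStats("slow", 900.0, 1000.0, 5),
    NeighbourStats("silent", 20.0, 1000.0, 0),
]
print(select_neighbours(neighbours, 100.0, 500.0, 1))  # ['fast']
```

A node would forward the query only to the selected neighbours, and only while `next_radius` still returns a value.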

Issues of P2P - Routing
• Random forwarding (random walk)
• Learning: nodes record the requests answered by other nodes. A request is forwarded to a peer that previously answered similar requests, or to a random peer if there is none.
• Best neighbour: each node records the number of answers received from each peer. A request is forwarded to the peer that answered the largest number of requests.
• Learning + best neighbour: identical to learning, except that when no relevant experience exists the request is forwarded to the best neighbour.
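The combined "learning + best neighbour" strategy might look like the sketch below. The `Router` class and its method names are hypothetical, and "similar requests" is simplified to an exact match on a topic string, which is an assumption rather than something the slides specify.

```python
import random
from collections import Counter


class Router:
    """Hypothetical per-node router: learning first, then best neighbour."""

    def __init__(self, peers, rng=None):
        self.peers = list(peers)
        self.answered_by = {}           # topic -> peers that answered it
        self.answer_counts = Counter()  # peer -> number of answers received
        self.rng = rng or random.Random()

    def record_answer(self, topic, peer):
        self.answered_by.setdefault(topic, []).append(peer)
        self.answer_counts[peer] += 1

    def choose_peer(self, topic):
        # Learning: reuse the most recent peer that answered this topic.
        experienced = self.answered_by.get(topic)
        if experienced:
            return experienced[-1]
        # Best neighbour: the peer with the most answers overall.
        if self.answer_counts:
            return self.answer_counts.most_common(1)[0][0]
        # No experience at all: fall back to random forwarding.
        return self.rng.choice(self.peers)


router = Router(["a", "b", "c"])
router.record_answer("car", "b")
router.record_answer("bus", "c")
router.record_answer("train", "c")
print(router.choose_peer("car"))   # b (learned)
print(router.choose_peer("taxi"))  # c (best neighbour)
```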

P2P Case Study - Freenet
• Freenet provides a file-storage service
• The network is entirely decentralised
• Information publishers and consumers are anonymous
• Communications are encrypted
• Files in the data store are encrypted

Adding New File
• A user assigns the file a GUID key and sends an insert message containing the file identifier (GUID) and a time-to-live (TTL) value.
• A GUID is a location-independent, globally unique identifier, computed by hashing the contents of the file.
• On receiving an insert, a node checks whether the key already exists. If not, it stores it, creates a routing entry for it, looks up the closest key, and forwards the message to the corresponding node.
• When the TTL expires, the final node returns an “all clear” message. The user then sends the data along the path.
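Deriving a key from file contents can be sketched as follows. The slide says only that the GUID comes from hashing the contents; the use of SHA-1 and the function name `content_guid` are assumptions made for illustration.

```python
import hashlib


def content_guid(data: bytes) -> str:
    """Return a hex key derived only from the file contents.

    Assumption: SHA-1 is the hash function. The key does not depend
    on where the file is stored, so it is location-independent.
    """
    return hashlib.sha1(data).hexdigest()


key = content_guid(b"example file contents")
print(len(key))  # 40 hex characters for a 160-bit digest
```

Identical contents always map to the same key, which is how a node receiving an insert can tell whether the key already exists.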

Requesting File
• Every node maintains a routing table listing addresses of other nodes and GUID keys.
• On receiving a query, a node first checks its own store. If it finds the file, it announces itself as the holder. Otherwise, it forwards the query to the node with the closest key.
• If the file is found, each node passes the file back along the chain and creates a new entry in its routing table.
• Each node might also cache a copy locally.
• The query carries a TTL, which is decreased at each hop.
• If a node runs out of candidates, it reports failure back to its predecessor, which then tries its second choice.
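The request steps above can be approximated with a small in-memory simulation. This is a sketch under stated assumptions, not Freenet's implementation: keys are plain integers, "closest" means smallest absolute difference, and the `Node` class and its `request` method are hypothetical.

```python
class Node:
    """Hypothetical node with a local store and a key-based routing table."""

    def __init__(self, name):
        self.name = name
        self.store = {}    # key -> file data
        self.routing = {}  # key -> Node believed to be near that key

    def request(self, key, ttl, visited=None):
        """Return (data, holder) if found, else None to report failure."""
        visited = set() if visited is None else visited
        visited.add(self.name)
        if key in self.store:
            return self.store[key], self  # announce itself as the holder
        if ttl <= 0:
            return None
        # Try candidates in order of key closeness; a failed candidate
        # makes this node fall back to its next choice.
        candidates = sorted(self.routing.items(), key=lambda kv: abs(kv[0] - key))
        for _, peer in candidates:
            if peer.name in visited:
                continue
            found = peer.request(key, ttl - 1, visited)
            if found is not None:
                data, holder = found
                self.store[key] = data      # cache a copy locally
                self.routing[key] = holder  # new routing table entry
                return found
        return None


a, b, c = Node("a"), Node("b"), Node("c")
a.routing = {40: b, 60: c}  # b looks closest to key 45 but has no file
c.store[45] = b"file"
result = a.request(45, ttl=3)
print(result[1].name)  # c
```

In this run node `a` first tries `b`, receives a failure, then tries its second choice `c`, and finally caches the file and records `c` as the holder.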

Adding New Node
• A new node sends an announcement to an existing node, with a TTL.
• The receiving node forwards the announcement to another node chosen randomly from its routing table.
• The announcement continues to propagate until its TTL runs out.
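The announcement is effectively a random walk bounded by the TTL, which can be sketched as below. The `propagate_announcement` function and the representation of routing tables as a dictionary of name lists are assumptions for illustration.

```python
import random


def propagate_announcement(routing_tables, start, ttl, rng=None):
    """Return the list of nodes an announcement visits.

    routing_tables maps a node name to the names in its routing table
    (a hypothetical representation). At each hop the current node picks
    one entry at random, until the TTL runs out.
    """
    rng = rng or random.Random()
    path = [start]
    current = start
    while ttl > 0 and routing_tables.get(current):
        current = rng.choice(routing_tables[current])
        path.append(current)
        ttl -= 1
    return path


tables = {"a": ["b"], "b": ["c"], "c": ["a"]}
print(propagate_announcement(tables, "a", 4))  # ['a', 'b', 'c', 'a', 'b']
```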

Training Routes
• Nodes that reliably answer queries will be added to more routing tables.
• Well-known nodes tend to see more requests and become better connected.
• Similar keys tend to cluster in the nodes along the same path, because requests will be for similar files which have similar keys.

Managing Storage
• Given finite disk space, a node sometimes needs to decide which files to keep.
• Freenet decides by the frequency of requests per file and keeps the more popular files.
• Frequently requested files have more copies in the network. The tree grows in that direction.
• Unrequested files are subject to deletion. The tree shrinks in that direction.
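A popularity-based store of the kind described on this slide can be sketched as follows. The `DataStore` class, its capacity measured in number of files, and the eviction tie-breaking are assumptions; the sketch follows the slide's description of keeping the more frequently requested files.

```python
from collections import Counter


class DataStore:
    """Hypothetical fixed-capacity store that evicts the least requested file."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.files = {}
        self.requests = Counter()  # key -> number of successful requests

    def get(self, key):
        if key in self.files:
            self.requests[key] += 1
            return self.files[key]
        return None

    def put(self, key, data):
        if key not in self.files and len(self.files) >= self.capacity:
            # Evict the file with the fewest recorded requests.
            victim = min(self.files, key=lambda k: self.requests[k])
            del self.files[victim]
            del self.requests[victim]
        self.files[key] = data


store = DataStore(capacity=2)
store.put("a", b"1")
store.put("b", b"2")
store.get("a")
store.get("a")
store.put("c", b"3")        # "b" was never requested, so it is evicted
print(sorted(store.files))  # ['a', 'c']
```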

Design
• Tree Topology
• Each node maintains a Log File
• Each node also maintains a Local Data Store for storing query results

Design
• Adding New Node
  - When a new node joins the network, it connects itself to only one existing node
• Adding LogRecord
  - When a user accesses a service, a log record is created
  - Log records should provide the service name, the service access time, and a success/fail flag
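A log record with the three fields listed above could be modelled as in this sketch. The `LogRecord` field names and the `success_rate` helper are hypothetical; the slides name the information to record but not a concrete layout.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class LogRecord:
    # Field names are assumptions based on the three items on the slide.
    service_name: str
    accessed_at: datetime
    success: bool


def success_rate(records, service_name):
    """Fraction of successful requests for a service, or None if no records."""
    relevant = [r for r in records if r.service_name == service_name]
    if not relevant:
        return None
    return sum(r.success for r in relevant) / len(relevant)


log = [
    LogRecord("car", datetime(2003, 6, 1, 9, 0), True),
    LogRecord("car", datetime(2003, 6, 2, 9, 0), True),
    LogRecord("car", datetime(2003, 6, 3, 9, 0), False),
    LogRecord("car", datetime(2003, 6, 4, 9, 0), True),
]
print(success_rate(log, "car"))  # 0.75
```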

Design - Query
• Query
  - When a node sets up a query, it first looks in its local data store to see whether the same query already exists.
  - If it is a new query, the node multicasts a query message to all connected nodes. The query message contains Query Conditions, a Maximum Data Volume value, and a Dynamic Abort Timeout (DAT) value.
  - Query Conditions may contain the time period the user is concerned with, service names, etc.

Query
  - On receiving the query message, a node first looks in its own local data store. If the same query is not there, it multicasts the query to all connected nodes.
  - When the DAT expires, the final node begins to return data along the chain.
  - Responses use the Routed Response mode.

Design - Query
  - To reduce network traffic, the calculation is performed at each node, using a Recursively Partitionable Query plan. The calculation result propagates up along the chain.
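Per-node calculation with results propagating up the tree can be sketched as below. The `TreeNode` class and the choice of success and total counts as the partial result are assumptions; the slides do not name the statistic being computed.

```python
class TreeNode:
    """Hypothetical node holding local request outcomes and child nodes."""

    def __init__(self, outcomes, children=()):
        self.outcomes = list(outcomes)  # True = success, False = failure
        self.children = list(children)

    def partial_counts(self):
        """Return (successes, total) for this node's whole subtree.

        Each node combines its own counts with its children's partial
        results, so only two numbers travel up each link instead of
        the raw log records.
        """
        successes = sum(self.outcomes)
        total = len(self.outcomes)
        for child in self.children:
            child_successes, child_total = child.partial_counts()
            successes += child_successes
            total += child_total
        return successes, total


leaf1 = TreeNode([True, False])
leaf2 = TreeNode([True])
root = TreeNode([False], [leaf1, leaf2])
print(root.partial_counts())  # (2, 4)
```

Counts are used rather than ratios because counts from different subtrees can be added directly, whereas averaged ratios cannot.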

Design - Query
  - To avoid data flooding, only the necessary volume of data is calculated, as specified by the Maximum Data Volume.
  - Each chain returns zero or one result.
  - The Dynamic Abort Timeout (DAT) uses the exponential decay with halving model. The DAT decreases at each hop.

Design
• Calculation
  - By a particular statistical methodology
• Showing Result
  - The final result will be shown as a graph
  - The query result will also be saved in the Local Data Store
• Deleting log records
  - To save disk space, old log records should be deleted after a period of time

Grid vs. P2P
Thanks!
