Peer to Peer Information Retrieval: Going beyond Napster

What is P2P IR?
• No index on a central server
• Content is distributed across all users of the system
• Content is more than text
  • Binary files
  • Associated metadata

An example of a P2P system

Why go P2P?
• Spiraling costs of maintaining indexes
  • Look at Google’s server farm
• New content forces new thinking on IR
  • Large binary files are hard to index
• Freedom of speech
  • Society is striving to communicate data that is being legislated against

First P2P Systems
• Central hash of distributed content
• Only the central hash was used for queries
• Disadvantages
  • Scalability
  • Known location of content
  • Single point of failure
• Advantages
  • Quick searching
  • Deterministic search results
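The central hash of a first-generation system can be pictured as a dictionary from file name to the peers sharing that file. The sketch below is a minimal illustration assuming a Napster-like directory; the class `CentralIndex` and its methods are hypothetical and do not reproduce any real system's protocol.

```python
from collections import defaultdict


class CentralIndex:
    """Hypothetical central directory: file name -> peers that share it.

    Only the directory lives on the server; the files stay on the peers.
    """

    def __init__(self):
        self._locations = defaultdict(set)  # file name -> set of peer ids

    def register(self, peer, file_names):
        # A peer announces its shared files when it connects.
        for name in file_names:
            self._locations[name].add(peer)

    def unregister(self, peer):
        # A peer that disconnects is dropped from every entry.
        for peers in self._locations.values():
            peers.discard(peer)

    def query(self, name):
        # One lookup on one server: quick and deterministic,
        # but every search depends on this single machine being up.
        return sorted(self._locations.get(name, set()))


index = CentralIndex()
index.register("peer1", ["song.mp3", "talk.ogg"])
index.register("peer2", ["song.mp3"])
print(index.query("song.mp3"))  # ['peer1', 'peer2']
```

The same structure shows the listed disadvantages: the server knows exactly where every file is, and if it goes down, no query can be answered.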

Bumps that caused change
• Legal
  • Centralized services were easy targets
  • Owners of the index could not claim they had no knowledge of the content
• Growth
  • The cost of maintaining the service grew
  • Hardware requirements exploded

Decentralized P2P
• Content is spread among users without any explicit placement strategy
• The centralized server is replaced by a self-maintaining network
• Every user is also a server
• There is no index of content
• How do we search?

Searching Decentralized P2P Systems
• Many methods, none perfected yet
• Broadcast search
  • Advantages
    • Every node takes part in the query
  • Disadvantages
    • The number of query messages grows exponentially with search depth, so network bandwidth and query time rise steeply as the system grows
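Broadcast search can be sketched as a flood: each peer forwards a query once, to every neighbour except the one it came from, and drops copies it has already seen. This is a simplified sketch assuming a Gnutella-style flood with duplicate suppression; the overlay format and the names `broadcast_search` and `worst_case_messages` are hypothetical.

```python
from collections import deque


def broadcast_search(graph, files, start, query):
    """Flood `query` from `start` to every reachable peer.

    graph: dict peer -> list of neighbouring peers (hypothetical overlay)
    files: dict peer -> set of file names that peer shares
    Returns (sorted peers holding `query`, number of query messages sent).
    """
    seen = {start}
    queue = deque([(start, None)])  # (peer, peer it received the query from)
    hits, messages = [], 0
    while queue:
        peer, sender = queue.popleft()
        if query in files.get(peer, set()):
            hits.append(peer)
        for neighbour in graph[peer]:
            if neighbour == sender:
                continue  # don't send the query straight back
            messages += 1  # each forward costs bandwidth, even if the neighbour already saw it
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, peer))
    return sorted(hits), messages


def worst_case_messages(degree, hops):
    """Messages needed to reach `hops` hops on an idealized tree-shaped overlay
    where every peer has `degree` neighbours: degree * (degree - 1) ** (h - 1) at hop h."""
    return sum(degree * (degree - 1) ** (h - 1) for h in range(1, hops + 1))


triangle = {"A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B"]}
print(broadcast_search(triangle, {"B": {"song.mp3"}}, "A", "song.mp3"))  # (['B'], 4)
print(worst_case_messages(4, 3), worst_case_messages(4, 7))  # 52 4372
```

Even in the three-peer triangle, four messages are sent to find one file, and on the idealized four-neighbour overlay going from 3 to 7 hops multiplies the traffic by roughly 84.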

Intelligent P2P Crawls
• Ways to improve decentralized P2P queries
  • Intelligently place data (FreeNet)
    • By knowing the algorithm that distributes data, querying can be done more intelligently
  • Clustering (Fireworks model)
    • Clients with similar properties are logically grouped
    • Queries that don’t apply to a group are not sent to any client in that group
• Both change the paradigm of what kind of data is shared and the means of sharing
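The clustering idea can be illustrated with a toy grouping step and a query router. This is a simplified sketch, not the Fireworks model's actual protocol: it assumes each client advertises a single content category, and the names `build_groups` and `clustered_query` are hypothetical.

```python
from collections import defaultdict


def build_groups(clients):
    """Logically group clients that share the same kind of content.

    clients: dict client id -> content category (a hypothetical client property)
    """
    groups = defaultdict(list)
    for client, category in clients.items():
        groups[category].append(client)
    return dict(groups)


def clustered_query(groups, query_categories):
    """Return the clients a query is sent to.

    Groups whose category does not apply are skipped as a whole,
    so none of their members see the query.
    """
    contacted = []
    for category, members in groups.items():
        if category in query_categories:
            contacted.extend(members)
    return contacted


groups = build_groups({"a": "music", "b": "music", "c": "video", "d": "docs"})
print(clustered_query(groups, {"music"}))  # ['a', 'b']
```

A broadcast search would contact all four clients; the clustered query reaches only the two whose group applies.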

Other improvements
• Today, most networks still rely on brute-force search
• CRC/MD5 hashing
  • A checksum of each file is computed
  • Instead of searching metadata, search for the file hash
  • Files that are identical but mislabeled are still returned
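Searching by file hash can be sketched with Python's `hashlib`: index each shared file under the MD5 digest of its contents, then look files up by digest rather than by name. This is a minimal sketch assuming file contents are already in memory as bytes (a real client would hash files from disk in chunks); `zlib.crc32` could stand in for the CRC variant. The helper names are hypothetical.

```python
import hashlib


def file_digest(data):
    """MD5 checksum of a file's contents, as a hex string."""
    return hashlib.md5(data).hexdigest()


def build_hash_index(shared):
    """shared: dict file name -> file contents (bytes). Returns digest -> names."""
    index = {}
    for name, data in shared.items():
        index.setdefault(file_digest(data), []).append(name)
    return index


def search_by_hash(index, digest):
    # Matching on content, not on the name, so identical files come back
    # even when they were labelled differently.
    return sorted(index.get(digest, []))


shared = {
    "Artist - Song.mp3": b"same audio bytes",
    "track01.mp3": b"same audio bytes",  # identical file, different label
    "other.mp3": b"different audio bytes",
}
index = build_hash_index(shared)
print(search_by_hash(index, file_digest(b"same audio bytes")))
# ['Artist - Song.mp3', 'track01.mp3']
```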

Query time limiting
• To save inter-system bandwidth, searches terminate after X hops
• The client ends the query after 100 results
• Searches time out after Y seconds
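The three limits can be combined in one breadth-first query loop. This is a sketch under stated assumptions: the hop limit of 3 and the 5-second timeout are hypothetical defaults standing in for the unspecified values, the 100-result cap comes from the slide, and `limited_search` is an illustrative name.

```python
import time
from collections import deque


def limited_search(graph, files, start, query,
                   max_hops=3, max_results=100, timeout_s=5.0):
    """Breadth-first query that stops at a hop limit, a result cap, or a deadline.

    graph: dict peer -> list of neighbouring peers (hypothetical overlay)
    files: dict peer -> set of file names that peer shares
    """
    deadline = time.monotonic() + timeout_s
    seen = {start}
    queue = deque([(start, 0)])  # (peer, hops from the querying client)
    results = []
    while queue:
        if time.monotonic() > deadline:
            break  # time out after timeout_s seconds
        peer, hops = queue.popleft()
        if query in files.get(peer, set()):
            results.append(peer)
            if len(results) >= max_results:
                break  # the client has enough results
        if hops == max_hops:
            continue  # hop limit reached: do not forward any further
        for neighbour in graph[peer]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, hops + 1))
    return results


chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
everyone = {p: {"x"} for p in chain}
print(limited_search(chain, everyone, 0, "x", max_hops=2))  # [0, 1, 2]
```

Each limit trades completeness for bandwidth: peers beyond the hop limit, results past the cap, and answers arriving after the deadline are simply never seen.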

Distributed IR
• Traditional IR with the advantages of distributed systems
• A central server still stores the index
• Multiple brokers provide access to the data repository
• Multiple gatherers crawl the data near them
• The advantages show up at the data-acquisition end
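The gatherer/broker split can be sketched as three pieces: gatherers build partial inverted indexes of nearby documents, a central server merges them into one index, and brokers answer queries against it. This is a hypothetical illustration; the whitespace tokenizer, boolean-AND matching, and the names `gather`, `CentralServer`, and `Broker` are assumptions, not taken from a specific system.

```python
from collections import defaultdict


def gather(local_docs):
    """A gatherer indexes only the documents near it.

    local_docs: dict doc id -> text. Returns a partial index: word -> set of doc ids.
    """
    partial = defaultdict(set)
    for doc_id, text in local_docs.items():
        for word in text.lower().split():
            partial[word].add(doc_id)
    return partial


class CentralServer:
    """Holds the single merged inverted index."""

    def __init__(self):
        self.postings = defaultdict(set)

    def merge(self, partial):
        for word, doc_ids in partial.items():
            self.postings[word] |= doc_ids


class Broker:
    """One of several access points that answer queries against the central index."""

    def __init__(self, server):
        self.server = server

    def search(self, query):
        words = query.lower().split()
        if not words:
            return []
        # Documents containing every query word (boolean AND).
        matches = set.intersection(*(self.server.postings.get(w, set()) for w in words))
        return sorted(matches)


server = CentralServer()
server.merge(gather({"a1": "peer to peer search", "a2": "central index"}))
server.merge(gather({"b1": "peer search engine"}))
broker = Broker(server)
print(broker.search("peer search"))  # ['a1', 'b1']
```

Only the crawling is spread out; every query still ends up at the one merged index, which matches the point that the gains are on the data-acquisition end.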

Examples

Future Directions
• The next step will be a drastic rethinking of content placement, à la FreeNet
• Donate X amount of bandwidth and Y amount of hard-disk space
• Share Z directories of content
• Actual content files are distributed across the network intelligently
• The most requested files are blanketed across many nodes
• Unique files remain accessible
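One way to read "most requested files are blanketed, unique files are still accessible" is as a replica budget: every file keeps at least one copy, and the remaining donated storage goes to files in proportion to demand. This is an assumed policy for illustration, not FreeNet's actual caching mechanism; `plan_replicas` and the notion of storage "slots" are hypothetical.

```python
def plan_replicas(request_counts, total_slots):
    """Decide how many copies of each file the network keeps.

    request_counts: dict file name -> how often it was requested
    total_slots: storage slots available from donated disk space (hypothetical unit)
    Every file keeps at least one copy; the spare slots go to files
    in proportion to their popularity.
    """
    files = list(request_counts)
    if total_slots < len(files):
        raise ValueError("need at least one slot per file so unique files stay available")
    replicas = {name: 1 for name in files}
    spare = total_slots - len(files)
    total_requests = sum(request_counts.values())
    if total_requests == 0 or spare == 0:
        return replicas
    for name in files:
        # Integer division rounds down, so a few spare slots may go unused.
        replicas[name] += spare * request_counts[name] // total_requests
    return replicas


print(plan_replicas({"hit.mp3": 90, "mid.mp3": 10, "rare.mp3": 0}, total_slots=13))
# {'hit.mp3': 10, 'mid.mp3': 2, 'rare.mp3': 1}
```

The floor of one copy is what keeps an unrequested file like `rare.mp3` reachable, while the popular file takes most of the spare space.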

Future Directions for Traditional IR
• Large central repositories such as Google will fade
• The Internet will fragment into clusters of interest
• Similar-interest groups will have decentralized search facilities
• An index of these groups will replace the Googles of today
