Progressive Distributed Top-k Retrieval in Peer-to-Peer Networks

Wolf-Tilo Balke, Wolfgang Nejdl. Proceedings of the 21st International Conference on Data Engineering (ICDE 2005)

1. Introduction
2. Top-k Retrieval Model and Related Work
3. A Distributed Top-k Retrieval Algorithm for Peer-to-Peer Networks
4. Correctness and Optimality Results
5. Simulation and Results
6. Summary and Outlook

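Introduction
• Retrieval results are scored, and the score determines the best-matching results.
• Traditional querying is extended to peer-to-peer networks, with distributed processing designed to keep the amount of object data transferred to a minimum.
• A HyperCuP topology is used to organize the super-peer backbone.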

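Top-k Retrieval Model and Related Work
• Different peers have different capabilities.
• Peers usually differ greatly in available bandwidth and computing power; exploiting these differences leads to a more efficient network structure.
• Super-peer:
  • Takes charge of specific tasks, e.g. query routing.
  • Only a small percentage of the nodes in the network are super-peers.
  • Super-peers have higher computing power that can be put to use.
• Peers join: a joining peer connects directly to one of the super-peers.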

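Top-k Retrieval Model and Related Work (cont.)
• Super-peer networks build on the basic P2P architecture and usually use a two-phase routing structure to send queries.
• The query is first routed along the backbone between super-peers; each super-peer then distributes it to the peers connected to it.
[Figure: peer, SP, SP, peer. Phase 1: routing between super-peers; phase 2: distribution from a super-peer to its peers.]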

Top-k Retrieval Model and Related Work (cont.)
• How should the super-peers be arranged in the network so that search routes are optimal?
• The P2P network is organized as a hypercube topology.
• With N super-peers, the longest path in the hypercube topology is log2 N.

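Introduction to the hypercube topology
• Mathematically, a hypercube is a "symmetric" structure: its vertices (nodes) and edges (links) all "grow" out of a single initial vertex, and the dimension of the hypercube equals the number of vertices adjacent to each vertex.
• As the dimension increases, the vertices and edges grow symmetrically. The following figure illustrates this growth: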

[Figure: hypercubes of increasing dimension. Dimension = 0: 1 node; Dimension = 1: 2 nodes; Dimension = 2: 4 nodes; Dimension = 3: 2^3 = 8 nodes.]

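When the dimension is n, every node has n adjacent vertices, and in the hypercube:
• the number of nodes is 2^n
• the number of links is (number of links at dimension n-1) * 2 + 2^(n-1)
• Because every super-peer in a peer-to-peer network has equal status, the hypercube structure lets any super-peer act as the root of a spanning tree and send messages from the root to any other node in the network.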
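The node and link counts above can be checked with a few lines of Python. This is only an illustrative sketch: the function names are hypothetical, and labelling nodes with n-bit integers (so that neighbors differ in exactly one bit) is a common convention assumed here, not something stated on the slides.

```python
def hypercube_nodes(n: int) -> int:
    """Number of nodes in an n-dimensional hypercube: 2^n."""
    return 2 ** n


def hypercube_links(n: int) -> int:
    """Number of links, using the recurrence L(n) = 2 * L(n-1) + 2^(n-1), L(0) = 0."""
    links = 0
    for d in range(1, n + 1):
        # Two copies of the (d-1)-dimensional cube, plus one link per node pair joining them.
        links = 2 * links + 2 ** (d - 1)
    return links


def neighbors(node: int, n: int) -> list[int]:
    """Assumed labelling: neighbors differ from `node` in exactly one of its n bits."""
    return [node ^ (1 << d) for d in range(n)]
```

For example, `hypercube_links(3)` gives the 12 edges of an ordinary cube, and `neighbors(0, 3)` lists the three nodes adjacent to node 0. The recurrence agrees with the closed form n * 2^(n-1).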

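• The advantage is that during a broadcast no node receives a duplicate message: with N nodes in the network, only N-1 messages are needed to reach every node.
• With N super-peers, the longest path in the hypercube topology is log2 N.
• As in the figure below: in a 3-dimensional cube, at most 3 links are needed to reach the node farthest from the sender.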

[Figure: example hypercube with numbered nodes and numbered links.]
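The broadcast property (N-1 messages, no duplicates, at most log2 N hops) can be illustrated with a small simulation. The sketch below assumes n-bit node labels and a forwarding rule in which a node that received the message over dimension d forwards it only over higher dimensions; this is one common way to obtain a spanning tree in a hypercube and is meant as an illustration, not as the exact HyperCuP protocol. The function name `broadcast` is hypothetical.

```python
from collections import deque


def broadcast(root: int, n: int):
    """Simulate a broadcast in an n-dimensional hypercube rooted at `root`.

    Returns (received, messages, max_hops), where `received` maps every node
    to the number of copies of the message it received.
    """
    received = {node: 0 for node in range(2 ** n)}
    messages = 0
    max_hops = 0
    # Queue entries: (node, lowest dimension it may forward on, hops from root).
    queue = deque([(root, 0, 0)])
    while queue:
        node, first_dim, hops = queue.popleft()
        max_hops = max(max_hops, hops)
        for d in range(first_dim, n):
            target = node ^ (1 << d)  # neighbor across dimension d
            received[target] += 1
            messages += 1
            # The receiver forwards only on dimensions above d.
            queue.append((target, d + 1, hops + 1))
    return received, messages, max_hops
```

Calling `broadcast(5, 3)` on the 8-node cube sends 7 messages, delivers exactly one copy to each node other than the root, and needs at most 3 hops, in line with the figures quoted above.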

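A Distributed Top-k Retrieval Algorithm for Peer-to-Peer Networks
• Goal: answer top-k queries in a P2P network while transferring as little object data as possible.
• Following the distributed nature of retrieval in P2P networks, the algorithm is divided into three parts:
  • the super-peer that initially receives the query
  • the super-peers on the HyperCuP backbone
  • the local peers attached to their respective super-peers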

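• Only the super-peer that receives the query (i.e. the root node of our implicit HyperCuP spanning tree) needs the complete information.
• Each super-peer passes the query along the backbone to the appropriate adjacent super-peers, which in turn forward it to their own adjacent super-peers and to their connected local peers; none of these needs the complete information.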

• The local peers just execute the query over their local object collections or databases and retrieve their best-matching objects.
• Each super-peer SP manages an index I_SP that records which of its local peers and adjacent super-peers contributed results to recently posed queries.
• All index entries are time-stamped and expire after a certain time.

• The individual index entries can be kept "up-to-date enough" to allow for improved query processing even in volatile networks.
• Generally speaking, the number k of objects to be returned is an integral part of the query Q.
• If index I_SP does not contain query Q but does contain a query Q' with the same query predicates and a larger number of objects to return than k, the entry for Q' can be used instead. The resulting sets PT and SPT will in that case not be optimal for query Q, but usually still result in much better performance than simply flooding the query through the network.
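The index behavior described above (time-stamped entries that expire, and reuse of an entry for a query Q' with the same predicates but a larger k) could be sketched as follows. All names here (`SuperPeerIndex`, `record`, `lookup`, `ttl_seconds`) are hypothetical, and representing the query predicates as a plain string key is a simplifying assumption; the slides do not specify the index layout.

```python
import time
from dataclasses import dataclass


@dataclass
class IndexEntry:
    k: int              # number of objects the recorded query asked for
    peers: set          # local peers that contributed results
    super_peers: set    # adjacent super-peers that contributed results
    created: float      # timestamp used for expiry


class SuperPeerIndex:
    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds
        self.entries = {}  # predicates -> list of IndexEntry

    def record(self, predicates, k, peers, super_peers, now=None):
        """Remember who contributed to a query that was just answered."""
        now = time.time() if now is None else now
        entry = IndexEntry(k, set(peers), set(super_peers), now)
        self.entries.setdefault(predicates, []).append(entry)

    def lookup(self, predicates, k, now=None):
        """Return (peers, super_peers) to forward to, or None if nothing usable.

        An entry is usable if it has not expired and was recorded for the same
        predicates with at least k objects; the smallest such k is preferred.
        """
        now = time.time() if now is None else now
        live = [e for e in self.entries.get(predicates, [])
                if now - e.created <= self.ttl]
        if predicates in self.entries:
            self.entries[predicates] = live  # drop expired entries
        usable = [e for e in live if e.k >= k]
        if not usable:
            return None  # caller falls back to forwarding to all neighbors
        best = min(usable, key=lambda e: e.k)
        return best.peers, best.super_peers
```

In this sketch a `None` result stands for the case where the super-peer has no usable statistics and has to forward the query to all of its peers and backbone neighbors.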

Example:
• SPA is the root of the super-peer backbone spanning tree.
• Assume the query is a top-2 query that has recently been posed.
• SPA checks its local index and finds that only P1 and SPB have contributed to the result.
• The top-2 results came from SPD and P4.
• Assume that P1 offers an object o1 with score 0.8.
• Assume P7 delivers an object o2 with score 0.7.
• P4 delivers an object o3 with score 0.9.
• SPB passes on the maximum object o3 to SPA.
• SPA chooses the maximum, o3, and passes it on to PQ.
• The next object comes from P1 (o1 with score 0.8).

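[Figure: example network with super-peers SPA, SPB, SPC, SPD, local peers P1 to P7 and the querying peer PQ. PQ issues the query; the super-peer looks up its index; the local peers execute the query locally.]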
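The way results are delivered one at a time in the example can be sketched as a lazy merge: take the best object offered by each source, deliver the overall maximum, and ask only the source that supplied it for its next object. The sketch below is a simplification under stated assumptions: it flattens the super-peer hierarchy into a single merge over peers, the function name `progressive_top_k` is hypothetical, and each source is assumed to return its objects in descending score order. Only the object names and scores are taken from the example.

```python
import heapq


def progressive_top_k(sources, k):
    """Yield up to k (score, object, source) triples, best first.

    `sources` maps a source name to a list of (score, object) pairs that is
    already sorted by descending score.
    """
    iterators = {name: iter(items) for name, items in sources.items()}
    heap = []
    # Ask every source for its single best object.
    for name, it in iterators.items():
        first = next(it, None)
        if first is not None:
            score, obj = first
            heapq.heappush(heap, (-score, obj, name))  # negate for a max-heap
    delivered = 0
    while heap and delivered < k:
        neg_score, obj, name = heapq.heappop(heap)
        yield -neg_score, obj, name
        delivered += 1
        if delivered == k:
            return
        # Only the source whose object was just delivered is asked again.
        nxt = next(iterators[name], None)
        if nxt is not None:
            score, next_obj = nxt
            heapq.heappush(heap, (-score, next_obj, name))


# Scores from the example: o1 (0.8) at P1, o2 (0.7) at P7, o3 (0.9) at P4.
example = {"P1": [(0.8, "o1")], "P4": [(0.9, "o3")], "P7": [(0.7, "o2")]}
top2 = list(progressive_top_k(example, 2))
```

With the example scores, `top2` contains o3 first and o1 second, matching the order in which SPA passes objects on to PQ.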

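Correctness and Optimality Results
• In a P2P network it is difficult to obtain fully correct retrieval results, because the network is volatile.
• The correctness of a retrieval result in P2P search is therefore usually considered with respect to a "static snapshot" of the P2P network.
• While the query is evaluated and the results are delivered:
  • no peers drop out
  • no new peers join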

Simulation and Results
• Network size between 100 and 2000 peers
• Connected via 2 to 16 super-peers
• Each holding 50 documents on average

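Summary and Outlook
• To make retrieval feasible in very large peer-to-peer networks, the authors rely solely on dynamically collected query statistics.
• In addition, so that the correctness of the retrieval results can be proven, the index is not updated in detail and in real time as peers leave and join.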
