Supporting Ranked Search in Parallel Search Cluster Networks

Fang Xiong, Qiong Luo, Dyce Jing Zhao
{xfang, luo, zhaojing}@cs.ust.hk
Hong Kong University of Science and Technology

Introduction
• Environment: P2P
  • Unstructured, super-peer, Parallel Search Cluster Network (PSCN)
• Task: search
  • Data object ID
  • Filename
  • Content: ranked keyword search
• Previous work on ranked search in P2P
  • PlanetP: in the unstructured P2P network
  • Shen et al.: in the super-peer network

Both are unstructured P2P networks
• FSL: Forwarding Search Link
• NIL: Non-forwarding Index Link
• FIL: Forwarding Index Link

The Process of Ranked Search in a PSCN
• Indexing time
  • Build the local indexes
  • Transmit the local indexes across clusters through NILs
• Querying time
  • Forward the query within a cluster through FSLs
  • Collect the Local Aggregate Information (LAI) and merge it into the Global Aggregate Information (GAI)
  • The document-level index vs. the peer-level index
    • With the document-level index, the additional steps are: calculate the Local Ranking (LR) and merge it into the Global Ranking (GR)
    • With the peer-level index, the additional steps are: calculate the Local Peer Ranking (LPR), merge it into the Global Peer Ranking (GPR), calculate the Local Document Ranking (LDR), and merge it into the Global Document Ranking (GDR)
  • Merge the locally ranked query results into globally ranked ones and return all or the top-K of them to the user
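The query-time steps for the document-level index can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the slides do not say what the LAI contains or which scoring function is used, so the sketch assumes the LAI is a per-peer document count plus per-term document frequencies, and that documents are scored with a simple TF-IDF variant. All function names are hypothetical.

```python
import heapq
import math
from collections import Counter

def local_aggregate(docs, query_terms):
    """LAI (assumed form): local document count and per-term document frequency."""
    df = Counter()
    for text in docs.values():
        words = set(text.lower().split())
        for term in query_terms:
            if term in words:
                df[term] += 1
    return len(docs), df

def merge_aggregates(lais):
    """GAI: sum the document counts and document frequencies of all peers."""
    total_docs = 0
    global_df = Counter()
    for n_docs, df in lais:
        total_docs += n_docs
        global_df.update(df)
    return total_docs, global_df

def local_ranking(docs, query_terms, gai, k):
    """LR: score local documents with global statistics, keep the local top-k."""
    total_docs, global_df = gai
    scored = []
    for doc_id, text in docs.items():
        tf = Counter(text.lower().split())
        score = sum(tf[t] * math.log(1 + total_docs / global_df[t])
                    for t in query_terms if global_df[t] > 0)
        if score > 0:
            scored.append((score, doc_id))
    return heapq.nlargest(k, scored)

def global_ranking(local_rankings, k):
    """GR: merge the per-peer top-k lists into one global top-k list."""
    return heapq.nlargest(k, (hit for lr in local_rankings for hit in lr))

# Toy cluster of two peers (made-up data).
peers = {
    "p1": {"d1": "ranked search in p2p networks", "d2": "cluster networks"},
    "p2": {"d3": "ranked keyword search search", "d4": "storage cost"},
}
query = ["ranked", "search"]
gai = merge_aggregates(local_aggregate(d, query) for d in peers.values())
top = global_ranking((local_ranking(d, query, gai, 2) for d in peers.values()), 2)
print([doc_id for _, doc_id in top])
```

Because every peer scores its documents against the same global statistics, the local scores are directly comparable, which is what makes the final merge a plain top-K selection.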

Average processing time spent on each step, using the document-level index
Average processing time spent on each step, using the peer-level index
• The majority of processing time is spent on local processing
• This suggests that it is necessary to distribute the search workload evenly over multiple peers

Average processing time in three overlays, using the document-level index
Average processing time in three overlays, using the peer-level index
• The processing time in the unstructured network is much larger than in the other two
• The processing time in the super-peer network is about 30% larger than that in the PSCN
• The processing time is slightly higher with the document-level index than with the peer-level index

Figure annotations: 22~31% less; 41~47% less; 4.5~7.7 times higher; 22~25% lower

Summary
• The majority of processing time is spent on local processing. Therefore, it is beneficial to distribute the search workload over peers; otherwise, the bottleneck will be at the super-peers in a super-peer network or at the querying peer in an unstructured network.
• The processing time and the storage cost per peer in a PSCN are the lowest among the three overlays.
• The downside of a PSCN is the flooding communication within a cluster and the index replication cost across clusters. The super-peer network wins on the network bandwidth usage and the total storage cost.
• Compared with document-level indexes, peer-level indexes save 70% of the processing time, 30% of the network bandwidth usage, and 30% of the storage space, with a slight decrease in precision.
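The peer-level trade-off in the last point can be illustrated with a small sketch. It rests on assumptions the slides do not state: that a peer-level index maps each term only to the set of peers holding it, that peers are ranked by an IDF-like weight, and that only the top-ranked peers are then asked for their local document rankings. Under those assumptions, a relevant document on a peer that is not contacted is missed, which is one plausible source of a precision loss. All names and data are hypothetical.

```python
import heapq
import math

def rank_peers(peer_index, query_terms, n_peers):
    """Global peer ranking from a term -> {peer ids} index (assumed layout)."""
    scores = {}
    for term in query_terms:
        holders = peer_index.get(term, set())
        if not holders:
            continue
        # Terms held by fewer peers weigh more (IDF-like weight, an assumption).
        weight = math.log(1 + n_peers / len(holders))
        for peer in holders:
            scores[peer] = scores.get(peer, 0.0) + weight
    # Highest score first; ties broken by peer id for determinism.
    return sorted(scores, key=lambda p: (-scores[p], p))

def rank_documents(docs, query_terms, k):
    """Local document ranking by raw query-term occurrence count."""
    scored = []
    for doc_id, text in docs.items():
        words = text.lower().split()
        score = sum(words.count(t) for t in query_terms)
        if score > 0:
            scored.append((score, doc_id))
    return heapq.nlargest(k, scored)

# Made-up data: four peers, of which p4 holds no query term.
peer_index = {
    "ranked": {"p1", "p3"},
    "search": {"p1", "p2", "p3"},
    "cluster": {"p2"},
}
peer_docs = {
    "p1": {"d1": "ranked search", "d2": "cluster"},
    "p2": {"d3": "search search search"},
    "p3": {"d4": "ranked ranked search"},
    "p4": {},
}
query = ["ranked", "search"]

peer_order = rank_peers(peer_index, query, n_peers=4)
contacted = peer_order[:2]  # only the top-2 peers are queried
hits = heapq.nlargest(
    2, (h for p in contacted for h in rank_documents(peer_docs[p], query, 2))
)
print(contacted, hits)
```

Here `d3` on peer `p2` scores as high as `d4`, but `p2` is never contacted, so `d1` takes its place in the result. The index itself is much smaller than a document-level one, since it stores peer identifiers instead of per-document postings.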
