
Supporting Ranked Search in Parallel Search Cluster Networks

Supporting Ranked Search in Parallel Search Cluster Networks. Fang Xiong Qiong Luo Dyce Jing Zhao {xfang, luo, zhaojing}@cs.ust.hk Hong Kong University of Science and Technology. Introduction. Environment: P2P Unstructured, super-peer, Parallel Search Cluster Network (PSCN) Task: search


Presentation Transcript


  1. Supporting Ranked Search in Parallel Search Cluster Networks • Fang Xiong, Qiong Luo, Dyce Jing Zhao • {xfang, luo, zhaojing}@cs.ust.hk • Hong Kong University of Science and Technology

  2. Introduction • Environment: P2P • Unstructured, super-peer, Parallel Search Cluster Network (PSCN) • Task: search • Data object ID • filename • Content: ranked keyword search • Previous work on ranked search in P2P • PlanetP: in the unstructured P2P network • Shen et al.: in the super-peer network

  3. Both are unstructured P2P networks • FSL: Forwarding Search Link • NIL: Non-forwarding Index Link • FIL: Forwarding Index Link

  4. The Process of Ranked Search in a PSCN • Indexing time • Build the local indexes • Transmit the local indexes across clusters through NILs • Querying time • Forward the query within a cluster through FSLs • Collect the Local Aggregate Information (LAI) and merge it into the Global Aggregate Information (GAI) • The document-level index vs. the peer-level index • With the document-level index, the additional steps are: calculating the Local Ranking (LR) and merging it into the Global Ranking (GR) • With the peer-level index, the additional steps are: calculating the Local Peer Ranking (LPR), merging it into the Global Peer Ranking (GPR), calculating the Local Document Ranking (LDR), and merging it into the Global Document Ranking (GDR) • Merge the locally ranked query results into globally ranked ones and return all or the top-K of them to the user
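The query-time steps for the document-level index can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the slides do not say what the LAI contains or which ranking function is used, so the sketch assumes the LAI is a document count plus per-term document frequencies and that scores are a simple TF-IDF sum. All function names and the sample data are hypothetical.

```python
import heapq
import math
from collections import Counter


def local_aggregate(docs, query_terms):
    """Assumed LAI for one peer: (document count, document frequency per query term)."""
    df = Counter()
    for terms in docs.values():
        present = set(terms)
        for t in query_terms:
            if t in present:
                df[t] += 1
    return len(docs), df


def merge_aggregates(lais):
    """Assumed GAI: sum of document counts and document frequencies over all peers."""
    total = 0
    df = Counter()
    for n, local_df in lais:
        total += n
        df.update(local_df)
    return total, df


def local_ranking(docs, query_terms, gai, k):
    """LR: score one peer's documents with global statistics (TF-IDF is an assumption)."""
    total, df = gai
    scored = []
    for doc_id, terms in docs.items():
        tf = Counter(terms)
        score = sum(
            tf[t] * math.log(1 + total / df[t]) for t in query_terms if df[t] > 0
        )
        if score > 0:
            scored.append((score, doc_id))
    return heapq.nlargest(k, scored)


def global_ranking(local_rankings, k):
    """GR: merge the per-peer ranked lists and keep the top-K overall."""
    return heapq.nlargest(k, (item for lr in local_rankings for item in lr))


# Hypothetical cluster of two peers; each document is a list of terms.
peers = {
    "p1": {"d1": ["ranked", "search", "p2p"], "d2": ["cluster", "network"]},
    "p2": {"d3": ["ranked", "ranked", "search"], "d4": ["storage"]},
}
query = ["ranked", "search"]

gai = merge_aggregates(local_aggregate(docs, query) for docs in peers.values())
top = global_ranking(
    (local_ranking(docs, query, gai, k=2) for docs in peers.values()), k=2
)
print([doc_id for _, doc_id in top])
```

Because every peer scores its own documents against the same GAI, the local scores are directly comparable and the final merge reduces to a top-K selection over the per-peer lists. The peer-level variant would add a peer-ranking round (LPR, GPR) before documents are scored.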

  5. [Figure: Average processing time spent on each step, using the document-level index] • The majority of processing time is spent on local processing • This suggests that it is necessary to distribute the search workload evenly over multiple peers • [Figure: Average processing time spent on each step, using the peer-level index]

  6. [Figure: Average processing time in three overlays, using the document-level index] • The processing time in the unstructured network is much longer than in the other two • The processing time in the super-peer network is about 30% longer than in the PSCN • The processing time is slightly higher with the document-level index than with the peer-level index • [Figure: Average processing time in three overlays, using the peer-level index]

  7. [Figure annotations] 22~31% less • 41~47% less • 4.5~7.7 times higher • 22~25% lower

  8. Summary • The majority of processing time is spent on local processing. Therefore, it is beneficial to distribute the search workload over peers; otherwise, the bottleneck will be at the super-peers in a super-peer network or at the querying peer in an unstructured network. • The processing time and the storage cost per peer in a PSCN are the lowest among the three overlays. • The downside of a PSCN is the flooding communication within a cluster and the index replication cost across clusters. The super-peer network wins on network bandwidth usage and total storage cost. • Compared with document-level indexes, peer-level indexes save 70% of the processing time, 30% of the network bandwidth usage, and 30% of the storage space, with a slight decrease in precision.
