Supporting Ranked Search in Parallel Search Cluster Networks • Fang Xiong, Qiong Luo, Dyce Jing Zhao • {xfang, luo, zhaojing}@cs.ust.hk • Hong Kong University of Science and Technology
Introduction • Environment: P2P • Unstructured, super-peer, Parallel Search Cluster Network (PSCN) • Task: search • Data object ID • filename • Content: ranked keyword search • Previous work on ranked search in P2P • PlanetP: in the unstructured P2P network • Shen et al.: in the super-peer network
Both are unstructured P2P networks • FSL: Forwarding Search Link • NIL: Non-forwarding Index Link • FIL: Forwarding Index Link
The Process of Ranked Search in a PSCN • Indexing time • Build the local indexes • Transmit the local indexes across clusters through NILs • Querying time • Forward the query within a cluster through FSLs • Collect the Local Aggregate Information (LAI), and merge it into the Global Aggregate Information (GAI) • The document-level index vs. the peer-level index • With the document-level index, the additional steps are: calculating the Local Ranking (LR) and merging it into the Global Ranking (GR) • With the peer-level index, the additional steps are: calculating the Local Peer Ranking (LPR), merging it into the Global Peer Ranking (GPR), calculating the Local Document Ranking (LDR), and merging it into the Global Document Ranking (GDR) • Merge the locally ranked query results into globally ranked ones and return all or the top-K of them to the user
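The document-level query steps above (LAI, GAI, LR, GR) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the slides do not say what the LAI contains or which ranking function is used, so the sketch assumes the LAI is a document count plus per-term document frequencies and that scoring is a simple TF-IDF. All function names and the sample data are hypothetical.

```python
import heapq
import math
from collections import Counter
from itertools import islice


def local_aggregate(docs, query_terms):
    """LAI for one peer (assumed form): local doc count and per-term document frequency."""
    df = Counter()
    for text in docs.values():
        words = set(text.lower().split())
        df.update(t for t in query_terms if t in words)
    return len(docs), df


def merge_aggregates(lais):
    """GAI: sum the document counts and document frequencies of all peers."""
    total_docs = 0
    total_df = Counter()
    for n_docs, df in lais:
        total_docs += n_docs
        total_df.update(df)
    return total_docs, total_df


def local_ranking(docs, query_terms, gai):
    """LR: score local documents with TF-IDF (an assumption) using global statistics."""
    total_docs, total_df = gai
    ranked = []
    for doc_id, text in docs.items():
        tf = Counter(text.lower().split())
        score = sum(
            tf[t] * math.log(1 + total_docs / total_df[t])
            for t in query_terms
            if total_df[t] > 0
        )
        if score > 0:
            ranked.append((score, doc_id))
    # Best score first; ties broken by document ID so the order is deterministic.
    ranked.sort(key=lambda pair: (-pair[0], pair[1]))
    return ranked


def global_ranking(local_rankings, k=None):
    """GR: merge already-sorted local rankings; return all results or only the top k."""
    merged = heapq.merge(*local_rankings, key=lambda pair: (-pair[0], pair[1]))
    return list(merged if k is None else islice(merged, k))


# Hypothetical two-peer cluster.
peers = {
    "p1": {"d1": "ranked search in p2p", "d2": "cluster network overlay"},
    "p2": {"d3": "ranked keyword search search", "d4": "storage cost"},
}
query = ["ranked", "search"]

gai = merge_aggregates(local_aggregate(docs, query) for docs in peers.values())
top = global_ranking([local_ranking(docs, query, gai) for docs in peers.values()], k=2)
print([doc_id for _, doc_id in top])
```

Because each peer sorts its own results, the final merge only has to interleave sorted lists, which keeps most of the work local to the peers. With a peer-level index, a similar merge would first rank peers (LPR into GPR) before the selected peers rank their documents.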
Average processing time spent on each step, using the document-level index • The majority of processing time is spent on local processing • This suggests that it is necessary to distribute the search workload evenly over multiple peers • Average processing time spent on each step, using the peer-level index
Average processing time in three overlays, using the document-level index • The processing time in the unstructured network is much larger than in the other two • The processing time in the super-peer network is about 30% larger than that in the PSCN • The processing time is slightly longer with the document-level index than with the peer-level index • Average processing time in three overlays, using the peer-level index
22~31% less • 41~47% less • 4.5~7.7 times higher • 22~25% lower
Summary • The majority of processing time is spent on local processing. Therefore, it is beneficial to distribute the search workload over peers; otherwise, the bottleneck will be at the super-peers in a super-peer network or at the querying peer in an unstructured network. • The processing time and the storage cost per peer in a PSCN are the lowest among the three overlays. • The downside of a PSCN is the flooding communication within a cluster and the index replication cost across clusters. The super-peer network wins on the network bandwidth usage and the total storage cost. • Compared with document-level indexes, peer-level indexes save 70% of the processing time, 30% of the network bandwidth usage and 30% of the storage space, with a slight decrease in precision.