COSTA: Adaptive Indexing for Terms in a Large-scale Distributed System



Aoying Zhou #$, Rong Zhang #, Quang Hieu Vu *, Weining Qian $

# Department of Computer Science and Engineering, Fudan University, Shanghai, China; {ayzhou,rongzh}@fudan.edu.cn
* Singapore MIT Alliance, National University of Singapore, Singapore; hieuvq@nus.edu.sg
$ Software Engineering Institute, East China Normal University, Shanghai, China; {ayzhou,wnqian}@sei.ecnu.edu.cn

Motivation:
Existing methods that support content-based search in P2P systems use either servers or super-peers to maintain global statistics. These methods are not scalable and are prone to bottlenecks.

Our Method:
• COSTA (COntent-based Search using Term Aggregation) is based on an adaptive indexing structure that combines a Chord ring and a balanced tree.
• The tree, built on top of the Chord ring, is used to aggregate and classify terms adaptively.
• The Chord ring is used to index the terms of tree nodes.
• At each tree node, terms are classified into:
  • Important terms: terms that can distinguish a node from its neighbor nodes. These terms are indexed directly in the Chord ring.
  • Unimportant terms: terms that are either popular or rare. They are aggregated to higher-level nodes.

System Architecture
[Figure: system architecture; not preserved in the transcript]

Term Indexing
• Classification: standard IR method
  • Documents: term vectors in Cartesian space.
  • Weight of a term: [formula not preserved in the transcript]
• Aggregation:
  • Each leaf node summarizes its shared documents into a summary vector and sends this vector to its parent.
  • Each internal node calculates the weights of its children's terms and classifies the terms by those weights.
  • Important terms are indexed in the Chord ring.
  • The other terms are aggregated into a summary vector that is sent to the parent node.

Node Structure
[Figure: node structure; not preserved in the transcript]
* Term index: a term, its local frequency, its local inverse document frequency, and information about the publisher node.

Query Processing
Query format: Q = keyword1, …, keywordn
Algorithm:
• Look up the indices of the queried terms in the Chord ring.
• Based on these indices, compute the similarity score of each tree node relevant to Q.
• Rank these tree nodes by similarity score and select the top K of them for query processing.
* A query distribution quota is used to control the number of queries a node can send to its descendant leaf nodes.

Conclusion:
By adaptively aggregating and classifying terms along a hierarchical tree structure, COSTA supports content-based search without using servers or super-peers to maintain global statistics.

Reference: http://homepage.fudan.edu.cn/~wnqian/Costa.pdf
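
The query-processing steps in the transcript (look up term indices, score tree nodes, select the top K) can be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the Chord ring is replaced by a plain dictionary, the names `TermIndex`, `lookup`, and `rank_nodes` are hypothetical, and the similarity score is assumed to be a sum of frequency times inverse document frequency because the transcript does not give the actual formula. The query distribution quota is not modeled.

```python
from collections import defaultdict
from dataclasses import dataclass
import heapq


@dataclass(frozen=True)
class TermIndex:
    """One index entry, with the fields listed under Node Structure."""
    term: str
    tf: float        # local frequency of the term at the publisher
    idf: float       # local inverse document frequency
    publisher: str   # identifier of the tree node that published the entry


def lookup(ring, term):
    # Stand-in for a Chord lookup (assumption): a real system would hash
    # the term and route to the responsible peer instead of reading a dict.
    return ring.get(term, [])


def rank_nodes(ring, query, k):
    """Return the k tree nodes with the highest score for the query."""
    scores = defaultdict(float)
    for term in query:
        for entry in lookup(ring, term):
            # Assumed scoring rule: accumulate tf * idf per publisher node.
            scores[entry.publisher] += entry.tf * entry.idf
    return heapq.nlargest(k, scores.items(), key=lambda item: item[1])


# Toy index with made-up numbers, for illustration only.
ring = {
    "chord": [TermIndex("chord", 3, 2.0, "n1"), TermIndex("chord", 1, 1.0, "n2")],
    "index": [TermIndex("index", 2, 1.5, "n2"), TermIndex("index", 1, 0.5, "n3")],
}

top = rank_nodes(ring, ["chord", "index"], k=2)
print(top)
```

With the toy data, node n1 scores 6.0 and node n2 scores 4.0, so these two are selected and n3 (0.5) is dropped. In a deployment, each selected tree node would then forward the query toward its descendant leaf nodes.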
