1 / 1

Goal: build a decentralized, distributed search engine framework no single point of failure

news. sports. science. radio. TV. print. volleyball. comp. sci. biology. database. networks. theory. systems. Components of the IDG System:. Topical group: top-level. data source - represents an information provider (e.g., a website).

darin
Download Presentation

Goal: build a decentralized, distributed search engine framework no single point of failure

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. news sports science radio TV print volleyball comp. sci. biology database networks theory systems Components of the IDG System: Topical group: top-level data source - represents an information provider (e.g., a website) manager - assigned a topic from the taxonomy; holds listings of data sources with that topic client- searches for managers that have listings of interesting data sources Manager: science Manager: sports Manager: news Cache Client ...... ...... ...... ...... ...... ...... ...... ...... ...... ...... ...... ...... ...... ...... ...... ...... ...... ...... Topical group: science Topical group: news Topical group: sports topical group - groups together related managers cache- helps clients find popular managers quickly Manager: radio Manager: TV Manager: print Manager: vball Manager: comp. sci. Manager: biology .….….….… .….….….… .….….….… .….….….… .….….….… .….….….… How a query is answered: cache IDG Topical group: sports Topical group: top-level Topical group: top-level client issues query to its local cache managers periodically multicast heartbeat messages query: “TCP” Manager: science Manager: science cache searches internal memory, sending query to best matching IDG manager Manager: hockey Manager: golf Manager: tennis Client Topical group: science Topical group: science managers learn about other managers for failure recovery and to help forward queries Manager: biology Manager: comp. sci. Manager: biology Manager: comp. sci. IDG finds true best manager and responds to cache, who relays response to client Topical group: comp. sci. Manager: networks response: “networks” manager IDG directory adapts to changing topic popularity: science manager is overloaded with too many data sources new manager from pool of free managers is activated and assigned the topic physics Manager: science Manager: tennis Manager: science ...... ...... ...... ...... ...... ...... ...... ...... Topical group: science Topical group: science Other managers: hockey tennis Other managers: golf tennis Manager: comp. sci. Manager: biology Manager: physics Manager: comp. sci. Manager: biology data sources more specific to physics are moved to new manager .….… Topical group: comp. sci. Topical group: comp. sci. Manager: networks Manager: networks • Simulation configuration: • implemented using Parsec language • Excite search engine trace over 24 hours; approx. 2.5 million queries, 537,000 unique users (IP addresses) • queries hashed into a manually-built taxonomy based on Yahoo directory • to simulate data source registration, queries treated as data sources Multicast overhead% of total multicast with global scope Hierarchy stability# of data sources per manager Hierarchy overhead# of managers per group Query search time# of hops per query The Information Discovery Graph A Distributed Search Engine Framework • Goal: build a decentralized, distributed search engine framework • no single point of failure • not controlled by any one administration or corporation • Design challenge: • scalable, robust, and adaptable to changing topic popularity • Approach: • partition the search space by semantic topic using a hierarchical taxonomy • generic topics are higher up, more specific topics are lower I N T E R N E T R E S E A R C H L A B Maintaining the IDG directory: Associate topics with locations to reduce heartbeat bandwidth: managers use locally-scoped multicast to limit their heartbeats Los Angeles New York Topical group: sports faraway managers use a proxy to maintain a presence in the other scope Manager: tennis Manager: hockey Manager: golf Proxy: tennis Topical group: tennis a unicast channel connects the manager and proxy Manager: players Manager: lessons • Future work: • measure effects of enhancements: system-wide “Hot Topics” cache, cross-references, duplicate query detection • other trace data: UCLA traffic, more traces needed! • Summary: • IDG is framework for dencentralized, distributed search engine • semantic taxonomy provides intuitive browsing • design addresses scalability, adaptability, and robustness Nelson Tang (tang@cs.ucla.edu) and Lixia Zhang (lixia@cs.ucla.edu) http://irl.cs.ucla.edu/IDG/

More Related