10 likes | 76 Views
news. sports. science. radio. TV. print. volleyball. comp. sci. biology. database. networks. theory. systems. Components of the IDG System:. Topical group: top-level. data source - represents an information provider (e.g., a website).
E N D
news sports science radio TV print volleyball comp. sci. biology database networks theory systems Components of the IDG System: Topical group: top-level data source - represents an information provider (e.g., a website) manager - assigned a topic from the taxonomy; holds listings of data sources with that topic client- searches for managers that have listings of interesting data sources Manager: science Manager: sports Manager: news Cache Client ...... ...... ...... ...... ...... ...... ...... ...... ...... ...... ...... ...... ...... ...... ...... ...... ...... ...... Topical group: science Topical group: news Topical group: sports topical group - groups together related managers cache- helps clients find popular managers quickly Manager: radio Manager: TV Manager: print Manager: vball Manager: comp. sci. Manager: biology .….….….… .….….….… .….….….… .….….….… .….….….… .….….….… How a query is answered: cache IDG Topical group: sports Topical group: top-level Topical group: top-level client issues query to its local cache managers periodically multicast heartbeat messages query: “TCP” Manager: science Manager: science cache searches internal memory, sending query to best matching IDG manager Manager: hockey Manager: golf Manager: tennis Client Topical group: science Topical group: science managers learn about other managers for failure recovery and to help forward queries Manager: biology Manager: comp. sci. Manager: biology Manager: comp. sci. IDG finds true best manager and responds to cache, who relays response to client Topical group: comp. sci. Manager: networks response: “networks” manager IDG directory adapts to changing topic popularity: science manager is overloaded with too many data sources new manager from pool of free managers is activated and assigned the topic physics Manager: science Manager: tennis Manager: science ...... ...... ...... ...... ...... ...... ...... ...... Topical group: science Topical group: science Other managers: hockey tennis Other managers: golf tennis Manager: comp. sci. Manager: biology Manager: physics Manager: comp. sci. Manager: biology data sources more specific to physics are moved to new manager .….… Topical group: comp. sci. Topical group: comp. sci. Manager: networks Manager: networks • Simulation configuration: • implemented using Parsec language • Excite search engine trace over 24 hours; approx. 2.5 million queries, 537,000 unique users (IP addresses) • queries hashed into a manually-built taxonomy based on Yahoo directory • to simulate data source registration, queries treated as data sources Multicast overhead% of total multicast with global scope Hierarchy stability# of data sources per manager Hierarchy overhead# of managers per group Query search time# of hops per query The Information Discovery Graph A Distributed Search Engine Framework • Goal: build a decentralized, distributed search engine framework • no single point of failure • not controlled by any one administration or corporation • Design challenge: • scalable, robust, and adaptable to changing topic popularity • Approach: • partition the search space by semantic topic using a hierarchical taxonomy • generic topics are higher up, more specific topics are lower I N T E R N E T R E S E A R C H L A B Maintaining the IDG directory: Associate topics with locations to reduce heartbeat bandwidth: managers use locally-scoped multicast to limit their heartbeats Los Angeles New York Topical group: sports faraway managers use a proxy to maintain a presence in the other scope Manager: tennis Manager: hockey Manager: golf Proxy: tennis Topical group: tennis a unicast channel connects the manager and proxy Manager: players Manager: lessons • Future work: • measure effects of enhancements: system-wide “Hot Topics” cache, cross-references, duplicate query detection • other trace data: UCLA traffic, more traces needed! • Summary: • IDG is framework for dencentralized, distributed search engine • semantic taxonomy provides intuitive browsing • design addresses scalability, adaptability, and robustness Nelson Tang (tang@cs.ucla.edu) and Lixia Zhang (lixia@cs.ucla.edu) http://irl.cs.ucla.edu/IDG/