P2P networks for distributed queries
Sebastian Lisek
Infovide-Matrix S.A. (PL)

Context
• M-CAST is a multilingual query-answering system
• M-CAST indexes full-texts and metadata
• One M-CAST instance hosts many repositories
• A user can query one or more repositories
[Diagram: a user queries one M-CAST instance that holds Repository A, Repository B, ... Repository n, each with its own index and full-texts]

Multi M-CAST: Motivation
• User requirements
• Size of full-texts and indexes
  • Hundreds of GB
• Rights to resources
  • Each peer wants to control its resources
• Organizational
  • Existing „links” between libraries
  • Problem with one organization maintaining a central search server
• Performance objectives
  • Less than 3 seconds for a query

Overview: M-CASTs search network
[Diagram: end users query their own M-CAST instance, which is connected through the network to the peers M-CAST A, M-CAST B, M-CAST C, M-CAST D, ... M-CAST n and to Search engine X. Each peer hosts its own repositories: A.A, A.B; B.A, B.B, B.C; C.A; D.A, D.B; n.A, n.B; X.A, X.B]

Process
1. Discovering peers
2. Initializing connection
3. Querying
4. Getting results
5. Merging results

Discovering peers: Centralized P2P network
• One central server with all peers registered
• Advantages
  • Efficiency
  • Discovering all peers
• Disadvantages
  • Organizational – maintaining central server
  • Single point of failure
[Diagram: M-CAST A, M-CAST B, M-CAST C, ... M-CAST n each register with the central peers repository and get the list of peers from it]

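The register / get list exchange on this slide can be pictured with a minimal in-memory sketch. The class name `CentralPeerRegistry`, its methods, and the example.org endpoints are assumptions made for illustration; the slides do not describe an actual interface.

```python
class CentralPeerRegistry:
    """In-memory stand-in for the central peers repository (hypothetical)."""

    def __init__(self):
        self._peers = {}  # peer name -> endpoint URL

    def register(self, name, endpoint):
        """Called by each M-CAST instance when it joins the network."""
        self._peers[name] = endpoint

    def get_list(self, requester=None):
        """Return all registered peers, leaving out the requester itself."""
        return {n: e for n, e in self._peers.items() if n != requester}


registry = CentralPeerRegistry()
for name in ("M-CAST A", "M-CAST B", "M-CAST C"):
    # Placeholder endpoints; real peers would publish their own addresses.
    registry.register(name, f"https://{name[-1].lower()}.example.org/mcast")

peers_seen_by_a = registry.get_list(requester="M-CAST A")
```

Every peer learns about every other peer in one call, which is the efficiency advantage; the single `registry` object is also the single point of failure listed above.
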
Discovering peers: Decentralized P2P network
• Each peer knows several neighbors
• Peers are discovered by passing the request from peer to peer
• Advantages
  • No single point of failure
  • Decentralized responsibility
  • Uses existing „links” between libraries
• Disadvantages
  • Longer searching time
  • Possibility of not discovering all peers
• Implemented in M-CAST
[Diagram: M-CAST A sends "get list" to its neighbors, which pass the request on from peer to peer across M-CAST B, C, D, E, F, G, H, I, J, ... n]

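A minimal sketch of peer-to-peer discovery, assuming a breadth-first "get list" propagation with a hop limit. The neighbor table and the `max_hops` parameter are hypothetical; the slides do not say how M-CAST bounds the propagation.

```python
from collections import deque

# Hypothetical neighbor table: each peer knows only a few neighbors.
NEIGHBORS = {
    "A": ["B", "D"],
    "B": ["A", "C"],
    "C": ["B"],
    "D": ["A", "E"],
    "E": ["D", "F"],
    "F": ["E"],
}


def discover_peers(start, max_hops):
    """Breadth-first "get list" propagation limited to max_hops hops.

    Returns a dict mapping each discovered peer to its hop distance.
    """
    found = {}
    visited = {start}
    queue = deque([(start, 0)])
    while queue:
        peer, hops = queue.popleft()
        if hops == max_hops:
            continue  # hop limit reached: this peer is not asked for its list
        for neighbor in NEIGHBORS.get(peer, []):
            if neighbor not in visited:
                visited.add(neighbor)
                found[neighbor] = hops + 1
                queue.append((neighbor, hops + 1))
    return found


nearby = discover_peers("A", max_hops=2)
everyone = discover_peers("A", max_hops=4)
```

With a limit of two hops, peer F stays undiscovered, which illustrates the "possibility of not discovering all peers" disadvantage; raising the limit finds it at the cost of a longer search.
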
Initializing connection
• Negotiating connection parameters between two peers
  • Supported languages
  • Supported formats
  • QoS
• List of repositories
[Diagram: M-CAST A and M-CAST B negotiate connection parameters and exchange the list of repositories]

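One way to read the negotiation step is as an intersection of what both peers support. The sketch below assumes that reading; `PeerCapabilities`, `negotiate`, and the use of a response-time bound as the QoS parameter are illustrative choices, not the M-CAST protocol.

```python
from dataclasses import dataclass, field


@dataclass
class PeerCapabilities:
    name: str
    languages: set
    formats: set
    max_response_seconds: float  # stand-in for a QoS parameter (assumption)
    repositories: list = field(default_factory=list)


def negotiate(local, remote):
    """Return parameters both peers can honor, or None if they share none."""
    languages = local.languages & remote.languages
    formats = local.formats & remote.formats
    if not languages or not formats:
        return None
    return {
        "languages": sorted(languages),
        "formats": sorted(formats),
        # The slower peer bounds what the pair can promise.
        "max_response_seconds": max(
            local.max_response_seconds, remote.max_response_seconds
        ),
        "repositories": list(remote.repositories),
    }


peer_a = PeerCapabilities("M-CAST A", {"pl", "en"}, {"xml"}, 2.0)
peer_b = PeerCapabilities(
    "M-CAST B", {"cs", "en"}, {"xml", "html"}, 3.0,
    ["Repository B.A", "Repository B.B", "Repository B.C"],
)
session = negotiate(peer_a, peer_b)
```
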
Querying: Repository selection
• By the user
  • A user selects repositories before making a query
  • Advantages
    • Simplicity
  • Disadvantages
    • Poor scalability: how to handle hundreds of repositories?
  • Implemented in M-CAST
• Automatically
  • The system selects the most promising repositories based on metadata describing them
  • Advantages
    • Scalability
  • Disadvantages
    • Possibility of omitting valuable repositories

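Automatic selection is not implemented in M-CAST according to this slide, so the following is only a sketch of the idea, assuming each repository is described by a short free-text description and that term overlap with the query is an acceptable relevance signal. The descriptions and the `limit` parameter are invented for illustration.

```python
def select_repositories(query, repository_metadata, limit=2):
    """Rank repositories by how many query terms occur in their description."""
    query_terms = set(query.lower().split())
    scored = []
    for name, description in repository_metadata.items():
        overlap = len(query_terms & set(description.lower().split()))
        if overlap > 0:
            scored.append((overlap, name))
    # Highest overlap first; ties are broken alphabetically for determinism.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [name for _, name in scored[:limit]]


# Hypothetical metadata describing three repositories.
REPOSITORIES = {
    "Repository A.A": "medieval manuscripts and chronicles",
    "Repository B.A": "newspapers and periodicals 1900 1950",
    "Repository C.A": "digitized medieval maps",
}

selected = select_repositories("medieval chronicles", REPOSITORIES)
```

Repository B.A is skipped because its description shares no term with the query, which is also how a valuable repository with poor metadata could be omitted.
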
Querying: Topology
• Star
  • Advantages
    • Short response time for the user
    • Acyclic
    • Full control of the process of getting results
  • Disadvantages
    • Possibility of not discovering all answers
    • More CPU/memory/network consumption on the requester side
  • Implemented in M-CAST
[Diagram: the user sends a query to M-CAST A, which sends the query directly to M-CAST B, M-CAST C, M-CAST D, ... M-CAST n and receives the results from each of them]

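The star topology can be sketched as a concurrent fan-out from the requesting peer with a time budget, in the spirit of the "less than 3 seconds" objective. `query_peer` is a stand-in that only sleeps; the peer records and the `timeout_seconds` parameter are assumptions, and a real peer would be reached over the network.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait


def query_peer(peer, query):
    """Stand-in for a remote call; the delay simulates network and search time."""
    time.sleep(peer["delay"])
    return [f"{peer['name']}: hit for '{query}'"]


def star_query(peers, query, timeout_seconds):
    """Send the query to every peer directly and keep answers that arrive in time."""
    pool = ThreadPoolExecutor(max_workers=len(peers))
    futures = {pool.submit(query_peer, p, query): p["name"] for p in peers}
    done, not_done = wait(list(futures), timeout=timeout_seconds)
    results = {futures[f]: f.result() for f in done}
    missing = sorted(futures[f] for f in not_done)
    pool.shutdown(wait=False)  # do not block the user on slow peers
    return results, missing


PEERS = [
    {"name": "M-CAST B", "delay": 0.0},
    {"name": "M-CAST C", "delay": 0.0},
    {"name": "M-CAST D", "delay": 1.0},  # too slow for the time budget below
]
results, missing = star_query(PEERS, "copernicus", timeout_seconds=0.3)
```

The requester holds every connection and every partial result itself, which matches the resource cost listed above, and the peer that misses the deadline shows how some answers can go undiscovered.
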
Getting results
• Objectives
  • XML based
  • Based on open standards
• Format for a result list
  • Metadata: Qualified Dublin Core
  • Snippet/Exact answer(s): new definition supporting marked words

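A minimal sketch of what one entry of such a result list might look like. The Dublin Core namespaces are the standard ones; the wrapper elements `result`, `metadata`, `snippet`, `text` and `marked` are hypothetical, since the slides do not give the actual snippet definition.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"
DCTERMS = "http://purl.org/dc/terms/"
ET.register_namespace("dc", DC)
ET.register_namespace("dcterms", DCTERMS)


def build_result(title, creator, issued, snippet_parts):
    """Build one result entry; snippet_parts is a list of (text, is_marked) pairs."""
    result = ET.Element("result")  # hypothetical element name
    metadata = ET.SubElement(result, "metadata")
    ET.SubElement(metadata, f"{{{DC}}}title").text = title
    ET.SubElement(metadata, f"{{{DC}}}creator").text = creator
    ET.SubElement(metadata, f"{{{DCTERMS}}}issued").text = issued
    snippet = ET.SubElement(result, "snippet")
    for text, is_marked in snippet_parts:
        # Words matching the query are wrapped in <marked> (assumed name).
        part = ET.SubElement(snippet, "marked" if is_marked else "text")
        part.text = text
    return result


entry = build_result(
    "P2P networks for distributed queries",
    "Sebastian Lisek",
    "2008",  # illustrative date, not taken from the slides
    [("M-CAST is a multilingual ", False), ("query-answering", True), (" system", False)],
)
xml_text = ET.tostring(entry, encoding="unicode")
```
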
Merging results
• Automatically
  • The system orders results from different repositories into one list
  • Advantages
    • One ordered list presented to the user
  • Disadvantages
    • Ranks from different repositories are hard to compare
• No merging
  • The system presents all results to the user
  • Advantages
    • Simplicity
    • The user decides what information is valuable
  • Disadvantages
    • Poor scalability: problem with 100 repositories
  • Implemented in M-CAST

Combining results Example
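The original example for this slide is not preserved in the text. As an illustration of the rank-comparability problem, the sketch below combines two result lists by min-max score normalization. This technique is chosen here only for demonstration; the slides do not say how M-CAST would combine results, and the scores are invented.

```python
def normalize(results):
    """Rescale one repository's scores to [0, 1] (min-max normalization)."""
    scores = [score for _, score in results]
    low, high = min(scores), max(scores)
    if high == low:
        return [(doc, 1.0) for doc, _ in results]
    return [(doc, (score - low) / (high - low)) for doc, score in results]


def merge_results(results_by_repository):
    """Merge per-repository result lists into one list ordered by normalized score."""
    merged = []
    for repository, results in results_by_repository.items():
        if not results:
            continue
        for doc, score in normalize(results):
            merged.append((score, repository, doc))
    # Best score first; ties are broken by repository and document name.
    merged.sort(key=lambda item: (-item[0], item[1], item[2]))
    return [(repository, doc, round(score, 3)) for score, repository, doc in merged]


# Raw scores on different scales, as two unrelated search engines might return them.
raw = {
    "Repository A": [("a1", 12.0), ("a2", 6.0), ("a3", 3.0)],
    "Repository B": [("b1", 0.9), ("b2", 0.5)],
}
combined = merge_results(raw)
```

Sorting the raw scores directly would place every Repository A hit above every Repository B hit; after normalization the top hit of each repository ranks first.
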
Technology
• Objectives
  • Web service based
  • Based on open standards
  • Simplicity
• M-CAST implementation: web services
  • Finding peers
  • Initializing
  • Simple Query
  • Advanced Query
• Possible protocols
  • Z39.50
    • Client-server protocol
    • Pre-Web technology
  • OAI-PMH
    • Designed for metadata harvesting
  • SRU/SRW: Search/Retrieve via URL/Web Service
    • CQL: Common Query Language
    • New context definition for query-answering systems
  • Dienst
    • HTTP-based protocol
  • ...

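To make the SRU option concrete, here is a small sketch that builds an SRU `searchRetrieve` request URL carrying a CQL query. The request parameters are those of SRU 1.1; the base URL is a placeholder, and the slides list SRU only as a possible protocol, not as something M-CAST already speaks.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_sru_url(base_url, cql_query, start_record=1, maximum_records=10):
    """Build an SRU searchRetrieve URL for the given CQL query."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": cql_query,
        "startRecord": start_record,
        "maximumRecords": maximum_records,
    }
    return f"{base_url}?{urlencode(params)}"


# Placeholder endpoint; dc.title is an index from the Dublin Core context set.
url = build_sru_url(
    "https://peer.example.org/sru",
    'dc.title = "distributed queries"',
)
decoded = parse_qs(urlsplit(url).query)
```
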
Tested network
[Diagram: two connected peers, M-CAST Prague and M-CAST Toruń, serving five repositories (Aleph catalogue, PBI, Kramerius, Library catalogue, Memoria), each with its own index and full-texts]

Future work
• Automatically selecting repositories
• Combining results from different repositories in one list
• Support for protocols
  • SRU/SRW: Search/Retrieve via URL/Web Service
  • Z39.50
• New P2P backbone

Thank you!
Sebastian Lisek
Infovide-Matrix S.A. (PL)
Information: m-cast@infovide.pl