P2P networks for distributed queries

Sebastian Lisek, Infovide-Matrix S.A. (PL)

Presentation Transcript


  1. P2P networks for distributed queries. Sebastian Lisek, Infovide-Matrix S.A. (PL)

  2. Context
     • M-CAST is a multilingual query-answering system
     • M-CAST indexes full-texts and metadata
     • One M-CAST instance hosts many repositories
     • A user can query one or more repositories
     [Diagram: a user queries M-CAST, which holds repositories A, B, ... n, each with its own index and full-texts.]

  3. Multi M-CAST: Motivation
     • User requirements
     • Size of full-texts and indexes
        • Hundreds of GB
     • Rights to resources
        • Each peer wants to control its resources
     • Organizational
        • Existing "links" between libraries
        • Problem with one organization maintaining a central search server
     • Performance objectives
        • Less than 3 seconds for a query

  4. M-CASTs search network: Overview
     [Diagram: end users query the M-CAST network. The peers M-CAST A, B, C, D, ... n and Search engine X each expose their own repositories (A.A, A.B; B.A, B.B, B.C; C.A; D.A, D.B; n.A, n.B; X.A, X.B).]

  5. Process: Discovering peers → Initializing connection → Querying → Getting results → Merging results

  6. Discovering peers: Centralized P2P network
     • One central server with all peers registered
     • Advantages
        • Efficiency
        • Discovering all peers
     • Disadvantages
        • Organizational: maintaining a central server
        • Single point of failure
     [Diagram: M-CAST A, B, C, ... n each register with the central peers repository; a peer then gets the list of peers from it.]
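The register / get list exchange on this slide can be sketched in a few lines. This is only an illustration: the class name, method names, and example URLs are hypothetical and are not taken from M-CAST.

```python
class CentralPeerRegistry:
    """Hypothetical in-memory stand-in for the central peers repository."""

    def __init__(self):
        self._peers = {}  # peer name -> endpoint URL

    def register(self, name, endpoint):
        # A peer announces itself to the central server.
        self._peers[name] = endpoint

    def get_list(self, requester=None):
        # Return every registered peer except the one asking.
        return {n: e for n, e in self._peers.items() if n != requester}


registry = CentralPeerRegistry()
for name in ("M-CAST A", "M-CAST B", "M-CAST C"):
    # Placeholder endpoints, assumed for the example only.
    registry.register(name, f"http://example.org/{name[-1].lower()}")

peers_for_a = registry.get_list(requester="M-CAST A")
```

A single lookup returns every peer, which is the efficiency advantage listed above; the same object is also the single point of failure.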

  7. Discovering peers: Decentralized P2P network (implemented in M-CAST)
     • Each peer knows several neighbors
     • Peers are discovered by passing the request from peer to peer
     • Advantages
        • No single point of failure
        • Decentralized responsibility
        • Uses existing "links" between libraries
     • Disadvantages
        • Longer searching time
        • Possibility of not discovering all peers
     [Diagram: M-CAST A sends "get list" to its neighbors, which forward it to their own neighbors, across peers M-CAST A through J and n.]
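Passing a "get list" request from peer to peer can be sketched as a breadth-first walk over the neighbor links. The hop limit (`max_hops`) is an assumption added for the example, since the slides do not say how M-CAST bounds the propagation. The peer names and the `links` map are made up.

```python
from collections import deque


def discover_peers(start, neighbors, max_hops):
    """Breadth-first "get list" propagation starting at `start`.

    neighbors: dict mapping a peer to the peers it knows directly.
    max_hops:  assumed limit on how far the request is forwarded.
    """
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        peer, hops = queue.popleft()
        if hops == max_hops:
            continue  # this peer does not forward the request further
        for nxt in neighbors.get(peer, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, hops + 1))
    seen.discard(start)
    return seen


# Hypothetical neighbor links between peers.
links = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D", "E"],
    "D": ["F"],
    "E": [],
    "F": [],
}

near = discover_peers("A", links, max_hops=2)       # F is three hops away
everyone = discover_peers("A", links, max_hops=10)  # large enough to reach F
```

With a small hop limit the walk is shorter but misses peer F, which is the "possibility of not discovering all peers" trade-off named on the slide.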

  8. Initializing connection
     • Negotiating (mediating) connection parameters between two peers
        • Supported languages
        • Supported formats
        • QoS
     • List of repositories
     [Diagram: M-CAST A and M-CAST B negotiate connection parameters and exchange the list of repositories.]
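One way to picture this handshake is as an intersection of what both peers support. The sketch below is an assumption-heavy illustration: the `PeerCapabilities` fields, the format names, and the rule for the QoS value (take the slower of the two promised response times) are all hypothetical, not M-CAST's actual protocol.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PeerCapabilities:
    # All field names are hypothetical.
    languages: frozenset
    formats: frozenset
    response_seconds: float  # response time the peer can promise (QoS)
    repositories: tuple


def negotiate(local, remote):
    """Return the agreed parameters, or None if the peers share nothing."""
    languages = local.languages & remote.languages
    formats = local.formats & remote.formats
    if not languages or not formats:
        return None
    return {
        "languages": sorted(languages),
        "formats": sorted(formats),
        # Assumed rule: the pair can only promise the slower bound.
        "response_seconds": max(local.response_seconds, remote.response_seconds),
        # The remote peer's repositories become queryable by the local peer.
        "repositories": list(remote.repositories),
    }


peer_a = PeerCapabilities(frozenset({"en", "pl", "cs"}), frozenset({"dc-xml"}),
                          3.0, ("A.A", "A.B"))
peer_b = PeerCapabilities(frozenset({"en", "cs"}), frozenset({"dc-xml", "marc"}),
                          2.0, ("B.A", "B.B", "B.C"))

agreed = negotiate(peer_a, peer_b)
```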

  9. Querying: Repositories selection
     • By the user (implemented in M-CAST)
        • A user selects repositories before making a query
        • Advantages
           • Simplicity
        • Disadvantages
           • Does not scale: how to handle hundreds of repositories?
     • Automatically
        • The system selects the most promising repositories based on the metadata describing them
        • Advantages
           • Scalability
        • Disadvantages
           • Possibility of omitting valuable repositories
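The automatic option is listed as not yet implemented, so the following is only a sketch of what "selecting the most promising repositories based on metadata" could look like. The keyword-overlap scoring, the `top_k` cut-off, and the metadata keywords are all assumptions made for the example.

```python
def select_repositories(query, repo_metadata, top_k=2):
    """Rank repositories by how many query terms match their keywords.

    repo_metadata: dict mapping repository name -> list of descriptive keywords.
    """
    terms = set(query.lower().split())
    scored = []
    for name, keywords in repo_metadata.items():
        overlap = len(terms & {k.lower() for k in keywords})
        if overlap:
            scored.append((overlap, name))
    # Highest overlap first; ties broken by name for a stable order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored[:top_k]]


# Hypothetical metadata describing three repositories.
metadata = {
    "Repository A.A": ["history", "maps", "prague"],
    "Repository B.A": ["medicine", "history"],
    "Repository C.A": ["law"],
}

chosen = select_repositories("history of Prague maps", metadata)
missed = select_repositories("legal statutes", metadata)
```

The second call returns nothing even though the law repository is probably relevant, which illustrates the "omitting valuable repositories" disadvantage.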

  10. Querying: Topology
     • Star (implemented in M-CAST)
        • Advantages
           • Response time to the user
           • Acyclic
           • Full control of the process of getting results
        • Disadvantages
           • Possibility of not discovering all answers
           • More CPU/memory/network consumption on the requester side
     [Diagram: the user sends a query to M-CAST A, which sends the query directly to M-CAST B, C, D, ... n and collects their results.]
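In a star topology the requesting peer contacts every other peer directly and waits for the answers. A minimal sketch, assuming each remote peer is reachable through a plain Python callable and that the requester enforces a deadline; the peer functions and the `fan_out` helper are hypothetical, and `cancel_futures` needs Python 3.9 or newer.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait


def fan_out(query, peers, timeout_seconds):
    """Send `query` to all peers in parallel and keep what arrives in time.

    peers: dict mapping peer name -> callable(query) returning a list of results.
    """
    pool = ThreadPoolExecutor(max_workers=max(1, len(peers)))
    futures = {pool.submit(fn, query): name for name, fn in peers.items()}
    done, _not_done = wait(futures, timeout=timeout_seconds)
    results = {}
    for future in done:
        if future.exception() is None:  # skip peers that failed
            results[futures[future]] = future.result()
    # Do not block on slow peers; their late answers are dropped.
    pool.shutdown(wait=False, cancel_futures=True)
    return results


def fast_peer(query):
    return [f"B: hit for {query}"]


def slow_peer(query):
    time.sleep(1.0)  # simulates a peer that misses the deadline
    return ["C: late hit"]


def broken_peer(query):
    raise ConnectionError("peer unreachable")


answers = fan_out(
    "Toruń",
    {"M-CAST B": fast_peer, "M-CAST C": slow_peer, "M-CAST D": broken_peer},
    timeout_seconds=0.2,
)
```

The requester keeps full control of the deadline, but it also carries all the threads and connections itself, and answers from slow or unreachable peers are lost, matching the trade-offs listed on the slide.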

  11. Getting results
     • Objectives
        • XML based
        • Based on open standards
     • Format for a result list
        • Metadata: Qualified Dublin Core
        • Snippet/exact answer(s): new definition supporting marked words

  12. Merging results
     • Automatically
        • The system orders results from different resources
        • Advantages
           • One ordered list presented to the user
        • Disadvantages
           • Problem with ranks from different repositories
     • No merging (implemented in M-CAST)
        • The system presents all results to the user
        • Advantages
           • Simplicity
           • The user decides what information is valuable
        • Disadvantages
           • Does not scale: problem with 100 repositories
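The "problem with ranks from different repositories" arises because each search engine scores on its own scale. One common workaround, shown here purely as an illustration and not as M-CAST's method, is to rescale each repository's scores to the range 0..1 (min-max normalization) before sorting everything into one list. The repository names, document IDs, and scores are made up.

```python
def merge_by_normalized_score(result_lists):
    """Merge per-repository hits into one list ordered by rescaled score.

    result_lists: dict mapping repository name -> list of (doc_id, raw_score).
    """
    merged = []
    for repo, hits in result_lists.items():
        if not hits:
            continue
        scores = [score for _, score in hits]
        lowest, highest = min(scores), max(scores)
        span = highest - lowest
        for doc_id, score in hits:
            # A repository whose hits all share one score gets 1.0 for each.
            norm = (score - lowest) / span if span else 1.0
            merged.append((norm, repo, doc_id))
    # Best score first; ties broken by repository and document ID.
    merged.sort(key=lambda item: (-item[0], item[1], item[2]))
    return [(repo, doc_id, round(norm, 2)) for norm, repo, doc_id in merged]


lists = {
    "Repository A": [("a1", 12.0), ("a2", 6.0), ("a3", 3.0)],  # unbounded scores
    "Repository B": [("b1", 0.9), ("b2", 0.5)],                # scores in 0..1
}

merged = merge_by_normalized_score(lists)
```

Sorting on the raw scores would put every hit from Repository A above every hit from Repository B. Normalization removes that bias, but it also forces the weakest hit of each repository to 0, so it is a heuristic rather than a real solution to the rank problem.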

  13. Combining results: Example

  14. Technology
     • Objectives
        • Web service based
        • Based on open standards
        • Simplicity
     • M-CAST implementation
        • Web services
           • Finding peers
           • Initializing
           • Simple Query
           • Advanced Query
     • Possible protocols
        • Z39.50
           • Client-server protocol
           • Pre-Web technology
        • OAI-PMH
           • Designed for metadata harvesting
        • SRU/SRW: Search/Retrieve via URL/Web Service
           • CQL: Common Query Language
           • New context definition for query-answering systems
        • Dienst
           • HTTP based protocol
        • ...
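To make the SRU option concrete, here is a sketch of how a CQL query is carried in an SRU searchRetrieve URL. The parameter names (`operation`, `version`, `query`, `maximumRecords`) come from the SRU specification; the base URL and the `build_sru_url` helper are hypothetical, and nothing here reflects M-CAST's own web services.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_sru_url(base_url, cql_query, max_records=10, version="1.1"):
    """Build an SRU searchRetrieve request URL for a CQL query."""
    params = {
        "operation": "searchRetrieve",
        "version": version,
        "query": cql_query,               # CQL expression, percent-encoded below
        "maximumRecords": str(max_records),
    }
    return f"{base_url}?{urlencode(params)}"


# example.org is a placeholder endpoint, assumed for the example only.
url = build_sru_url("http://example.org/sru", 'dc.title = "history"')

# Decoding the URL again shows that the query survives the round trip.
decoded = parse_qs(urlsplit(url).query)
```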

  15. Tested network
     [Diagram: two peers, M-CAST Prague and M-CAST Toruń, serving five repositories (Aleph catalogue, PBI, Kramerius, Library catalogue, Memoria), each with its own index and full-texts.]

  16. Future work
     • Automatically selecting repositories
     • Combining results from different repositories in one list
     • Support for protocols
        • SRU/SRW: Search/Retrieve via URL/Web Service
        • Z39.50
     • New P2P backbone

  17. Thank you! Sebastian Lisek, Infovide-Matrix S.A. (PL). Information: m-cast@infovide.pl
