
ESTEEM: Trust-aware P2P data integration

Carola Aiello, Tiziana Catarci, Diego Milano, Monica Scannapieco. Dipartimento di Informatica e Sistemistica, Università di Roma “La Sapienza”. Outline: previous projects; ESTEEM objectives; research issues and directions of the research unit.


Presentation Transcript


  1. ESTEEM: Trust-aware P2P data integration • Carola Aiello, Tiziana Catarci, Diego Milano, Monica Scannapieco • Dipartimento di Informatica e Sistemistica, Università di Roma “La Sapienza”

  2. Outline • Previous projects • ESTEEM objectives • Research issues and directions of the research unit • Data quality: quality-aware query processing • Privacy: privacy-aware record matching • Trust: trust model for data sources

  3. DaQuinCIS project (2003) • MIUR – COFIN/PRIN • Main focus: data quality in cooperative information systems (CISs) • Data Quality Problems: • Record Matching • Quality-driven query processing

  4. Motivations • A real example: an e-Government project to integrate data about Italian companies [Diagram: the query "Company XYZ?" is sent to a data integration layer placed over three sources: Chambers of Commerce, Social Insurance Agency, Accident Insurance Agency]

  5. [Table: the records for the same company held by the Social Insurance Agency, the Accident Insurance Agency and the Chambers of Commerce; columns: Id, Name, Type of activity, Address, City]

  6. The Three Real Records • Which is the actual company XYZ to be returned to the client? • One of the 3? Which one? • A “merge” of the 3?
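
One way to read the "merge" option is a per-attribute vote across the three records. The following is only a minimal sketch under that assumption: the field names and values are invented for illustration, and the slides do not say that a majority vote is the merge actually adopted.

```python
from collections import Counter

# Hypothetical records for the same company as held by three sources.
# Field names and values are invented for illustration only.
records = [
    {"name": "XYZ S.p.A.", "city": "Roma", "activity": "Retail"},
    {"name": "XYZ SpA", "city": "Roma", "activity": "Retail"},
    {"name": "XYZ S.p.A.", "city": "Milano", "activity": "Retail"},
]


def merge_by_majority(recs):
    """Build one record by taking, per attribute, the most frequent value."""
    merged = {}
    for field in recs[0]:
        values = [r[field] for r in recs if r.get(field) is not None]
        # most_common(1) returns the value with the highest count;
        # on a tie, the value seen first wins.
        merged[field] = Counter(values).most_common(1)[0][0]
    return merged


merged = merge_by_majority(records)
print(merged)
```

A vote ignores how reliable each source is, which is why the later slides move towards quality-aware and trust-aware resolution.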

  7. Objectives of the Research • Given a set of distributed and heterogeneous data sources that are affected by data quality problems: • Improve the quality of each data source • Record matching across sources • Provide unified and transparent access to the data sources • Data integration & quality-driven query processing

  8. Improving quality of addresses in Italian PA (2004) • Collaboration agreement between AIPA (now CNIPA) and ISTAT, April 2002 to July 2004 • Proposal of standard formats for acquiring and exchanging addresses • Proposal to redesign the flows for updating addresses • Methodology for measuring the quality of addresses • Experimental measurement of address quality in three national archives: • Agenzia delle Entrate • Camere di Commercio • INPS

  9. Data Quality and Data Privacy (Current) • Joint activity with Purdue University, Indiana, USA • Publishing elementary data may violate privacy requirements, even when the data are anonymized • Anonymization removes principal identifiers such as SSN, Name+Surname+DOB, etc. • Privacy-aware record matching • Only the intersection A ∩ B of the data sets is shared, and nothing else (neither A − (A ∩ B) nor B − (A ∩ B))
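
To make the "only the intersection is shared" requirement concrete, here is a deliberately naive sketch based on keyed hashes. It is an assumption for illustration, not the protocol studied in the project: the shared key, the identifiers and the function names are hypothetical, it handles exact matches only, and a party holding the key could still brute-force a small identifier domain.

```python
import hashlib
import hmac


def blind(values, shared_key):
    """Map the HMAC-SHA256 digest of each value to the value itself."""
    return {
        hmac.new(shared_key, value.encode("utf-8"), hashlib.sha256).hexdigest(): value
        for value in values
    }


def matches(own_values, received_digests, shared_key):
    """Return the own values whose digest also appears among the received ones."""
    own = blind(own_values, shared_key)
    return {value for digest, value in own.items() if digest in received_digests}


# Hypothetical key agreed out of band, and hypothetical identifiers.
key = b"hypothetical-shared-secret"
set_a = {"ID-001", "ID-002", "ID-003"}
set_b = {"ID-002", "ID-003", "ID-004"}

# B sends only digests; A never sees B's identifiers in clear text.
digests_from_b = set(blind(set_b, key))
common_seen_by_a = matches(set_a, digests_from_b, key)
print(sorted(common_seen_by_a))
```

A learns which of its own identifiers B also holds, but the digests of B's remaining identifiers cannot be mapped back to values without guessing them.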

  10. ESTEEM Objectives • Study of trust and data quality issues in P2P systems • Specification of P2P data integration systems with trust requirements • Definition of quality- and trust-aware query processing algorithms

  11. P2P Systems • P2P systems • loosely coupled, dynamic, open • Data sharing in such systems • no centralized global schema • mappings between peers are built dynamically • new peers can make new data schemas available

  12. Data Quality [Figure: two tables, EmployeeS1 and EmployeeS2, annotated with an attribute conflict and a key conflict]

  13. Quality-aware query processing - 1 • Key conflicts require the application of record matching techniques • Attribute conflicts are solved by query-time conflict resolution techniques • The resolution of such conflicts in P2P systems is an open issue: • Definition of a quality-aware semantics for query answering in P2P systems • Need to develop techniques for solving such conflicts according to the defined semantics
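
As an illustration of why key conflicts call for record matching, the sketch below pairs records from two sources that use different keys by comparing names approximately. The data, the 0.85 threshold and the function names are assumptions made here, and string similarity via `difflib` merely stands in for whatever matching technique is actually used.

```python
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase and keep only letters and digits, so trivial variants compare equal."""
    return "".join(ch for ch in text.lower() if ch.isalnum())


def similarity(a, b):
    """Similarity in [0, 1] between two normalized strings."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def match_records(source_a, source_b, threshold=0.85):
    """Return (key_a, key_b) pairs whose names are similar enough."""
    pairs = []
    for key_a, name_a in source_a.items():
        for key_b, name_b in source_b.items():
            if similarity(name_a, name_b) >= threshold:
                pairs.append((key_a, key_b))
    return pairs


# Hypothetical content: the same people appear under different keys.
employee_s1 = {"E01": "Mario Rossi", "E02": "Anna Bianchi"}
employee_s2 = {"7731": "mario rossi", "8120": "Anna Bianci", "9000": "Luca Verdi"}

pairs = match_records(employee_s1, employee_s2)
print(pairs)
```

Once two records are matched, any attribute on which they disagree becomes an attribute conflict to be resolved at query time.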

  14. Quality-aware query processing - 2 • Query language supporting the specification of conflict resolution strategies • Important in P2P systems: search space pruning on the basis of the quality characterization of sources
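
The two ideas on this slide can be sketched together: the caller names a conflict resolution strategy, and sources below a quality threshold are pruned. Everything here is a hypothetical illustration (the peer names, the scores in [0, 1], the two strategies and the `query` signature); in a real P2P setting the pruning would happen before peers are contacted at all.

```python
# Hypothetical quality characterization of three peers (scores in [0, 1]).
SOURCE_QUALITY = {"peer_a": 0.9, "peer_b": 0.6, "peer_c": 0.5}


def prune(answers, min_quality):
    """Drop answers coming from sources below the quality threshold."""
    return {
        src: val
        for src, val in answers.items()
        if SOURCE_QUALITY.get(src, 0.0) >= min_quality
    }


def best_quality(answers):
    """Return the value provided by the highest-quality source."""
    best_source = max(answers, key=lambda src: SOURCE_QUALITY.get(src, 0.0))
    return answers[best_source]


def weighted_vote(answers):
    """Return the value whose supporting sources have the highest total quality."""
    weights = {}
    for src, val in answers.items():
        weights[val] = weights.get(val, 0.0) + SOURCE_QUALITY.get(src, 0.0)
    return max(weights, key=weights.get)


STRATEGIES = {"best_quality": best_quality, "weighted_vote": weighted_vote}


def query(answers, strategy, min_quality=0.0):
    """Resolve conflicting answers with the named strategy after pruning."""
    candidates = prune(answers, min_quality)
    if not candidates:
        return None
    return STRATEGIES[strategy](candidates)


# Conflicting values returned by the three peers for the same attribute.
answers = {"peer_a": "Via Roma 1", "peer_b": "Via Milano 2", "peer_c": "Via Milano 2"}
print(query(answers, "best_quality"))
print(query(answers, "weighted_vote"))
```

Raising `min_quality` changes which peers take part, so the same strategy can return a different value once low-quality sources are pruned.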

  15. Privacy • How to protect privacy when sharing data? • With sources S1 and S2 issuing queries Q1 and Q2 respectively, at the end of the interaction: • S1 must learn the result of Q1 and nothing else • S2 must learn the result of Q2 and nothing else [Diagram: S1 and S2 exchange Query Q1 / Result Q1 and Query Q2 / Result Q2]

  16. Privacy-aware Record Matching - 1 • Secure set intersection: (i) exact matching; (ii) not on records; (iii) expensive • Private data sharing: (i) exact matching; (ii) schema-unaware [Diagram: sets A and B and their intersection A ∩ B]

  17. Privacy-aware Query Processing - 2 • Algorithms that enable privacy-aware record matching in P2P settings • The third-party problem • First proposals: El Abbadi, ICDE 2006, but limited to exact matching

  18. Trust • Trust is typically associated with a source as a whole • Need for a finer-grained characterization • E.g.: the Ministero delle Finanze (Ministry of Finance) is trustworthy with respect to tax codes (Codici Fiscali)

  19. Trust Model for Data Sources - 1 • Previous proposals: trust associated with the whole organization (peer) • Our proposal: trust associated with the pair <Organization, Data Type> • Quantities involved: the number of <D, Org_k> complaints sent by Org_i, and the number of D-exchanges of Org_k
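
The slide names two counts per <Organization, Data Type> pair but the transcript does not preserve how they are combined. The sketch below assumes, purely for illustration, that trust is one minus the ratio of complaints to exchanges; the class, the method names and the example organizations are hypothetical.

```python
from collections import defaultdict


class TrustRegistry:
    """Tracks exchanges and complaints per (organization, data type) pair."""

    def __init__(self):
        # (provider, data_type) -> number of exchanges of that data type
        self.exchanges = defaultdict(int)
        # (complainer, provider, data_type) -> number of complaints sent
        self.complaints = defaultdict(int)

    def record_exchange(self, provider, data_type):
        self.exchanges[(provider, data_type)] += 1

    def record_complaint(self, complainer, provider, data_type):
        self.complaints[(complainer, provider, data_type)] += 1

    def trust(self, provider, data_type):
        """Assumed formula: 1 - complaints / exchanges, or None without exchanges."""
        total = self.exchanges.get((provider, data_type), 0)
        if total == 0:
            return None
        complaints = sum(
            count
            for (_, org, dtype), count in self.complaints.items()
            if org == provider and dtype == data_type
        )
        return 1.0 - min(complaints / total, 1.0)


registry = TrustRegistry()
for _ in range(4):
    registry.record_exchange("ministry", "tax_code")
    registry.record_exchange("ministry", "address")
registry.record_complaint("agency_x", "ministry", "address")

print(registry.trust("ministry", "tax_code"))
print(registry.trust("ministry", "address"))
```

Keeping the data type in the key is what lets the same organization be fully trusted for one kind of data and less so for another.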

  20. Trust Model for Data Sources - 2 • Drawback: centralized • Need for: • A decentralized model • A more flexible model (e.g. trust associated with views)

  21. Trust Model for Data Sources - 3 • More general trust characterization based on the evaluation of a peer’s assertions on some metadata: • Data quality-aware: trust computed on the basis of the declared quality of the provided data • Privacy-aware: trust computed on the basis of the declared privacy level • Different roles for providers and consumers: e.g. a provider can decide not to release data if a requester is not privacy-trusted (or can adopt a specific technique instead)
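
The provider-side decision in the last bullet could look like the sketch below. The two thresholds, the list of identifying fields and the choice of dropping identifiers as the "specific technique" are all assumptions made here for illustration; the slides do not specify them.

```python
def release(record, privacy_trust, full_threshold=0.8, partial_threshold=0.5,
            identifiers=("name", "tax_code")):
    """Decide what a provider returns, given the requester's privacy trust.

    Hypothetical policy: full record for highly trusted requesters, a copy
    without identifying fields for moderately trusted ones, nothing otherwise.
    """
    if privacy_trust >= full_threshold:
        return dict(record)
    if privacy_trust >= partial_threshold:
        return {k: v for k, v in record.items() if k not in identifiers}
    return None


# Hypothetical record; the tax code is a placeholder, not a real one.
record = {"name": "Mario Rossi", "tax_code": "XXXXXX00X00X000X", "city": "Roma"}

print(release(record, 0.9))
print(release(record, 0.6))
print(release(record, 0.1))
```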
