
iFuice – Information Fusion utilizing Instance Correspondences and Peer Mappings

Erhard Rahm, Andreas Thor, David Aumueller, Hong-Hai Do, Nick Golovin, Toralf Kirsten. University of Leipzig, Germany. http://dbs.uni-leipzig.de


Presentation Transcript


  1. iFuice – Information Fusion utilizing Instance Correspondences and Peer Mappings Erhard Rahm, Andreas Thor, David Aumueller, Hong-Hai Do, Nick Golovin, Toralf Kirsten University of Leipzig, Germany http://dbs.uni-leipzig.de

  2. Motivating scenario • Integrating ... ACM, Citeseer, DBLP • Additional relationships / attributes (Eventseer, Google Scholar) • Hand-picked private data (local file) • Sources from different domains (SwissProt, MIM) • Example questions and the sources they need • Who published at SIGMOD as a PC member? (Eventseer) • Who referenced publications of my favorite authors? (local file) • Who are the candidates for the SIGMOD test of time award? (Google Scholar) • What information system is used to support biological cancer analysis? (SwissProt, PubMed, MIM)

  3. Schema vs. instance based integration • Data integration using query mediator approach • Mediated (global) schema • Matching / views between global and local schemas • Problems • Construction/evolution of global schema • Sources without a schema or with only a semi-structured one • Heterogeneous/dirty data, mapping to artificial schema • Instance correspondences • Represent semantic relationships between instances • Allow integration of sources without schema • Can be inferred from web links

  4. iFuice approach • Information Fusion utilizing Instance Correspondences and Peer Mappings • Bottom-up integration • High-level operators • Generic approach to dynamic information fusion • Mediator • Controls mapping / operator execution • Utilizes a domain model • P2P-like infrastructure • Correspondences between autonomous data sources • Easy link-up of a new source "where it fits best"

  5. Agenda • Motivation & iFuice approach • Meta data model • Operators • iFuice scripts • Architecture • Summary & outlook

  6. Data sources • Physical data source (PDS) • Web data (DBLP), local data (files), ... • Split into logical data sources • Logical data source (LDS) • Refers to one object type • Contains object instances • Object instance • Refers to a real-world entity • Set of attributes • One attribute is the id • [Figure: PDS DBLP with the LDS Author, Conference and Publication; example Publication instance at DBLP with Name: Generic schema matching with Cupid, URL: http://vldb.org..., Conference: VLDB 2001, Authors: Jayant Madhavan, Philip A. Bernstein, Erhard Rahm]
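
The PDS / LDS / object instance hierarchy on this slide can be pictured with a few small data classes. This is only an illustrative sketch, not the iFuice implementation: the class names, the `add` helper and the id `pub-1` are assumptions; only the `ObjectType@PDS` naming and the attribute values come from the slides.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ObjectInstance:
    """One object instance: an id plus further attributes (hypothetical class)."""
    id: str
    attributes: tuple = ()  # (name, value) pairs; a tuple keeps instances hashable


@dataclass
class LogicalDataSource:
    """One LDS: exactly one object type inside one physical data source."""
    pds: str
    object_type: str
    instances: dict = field(default_factory=dict)  # id -> ObjectInstance

    @property
    def name(self):
        # Same notation the slides use for an LDS, e.g. Conf@DBLP
        return f"{self.object_type}@{self.pds}"

    def add(self, instance_id, **attributes):
        inst = ObjectInstance(instance_id, tuple(sorted(attributes.items())))
        self.instances[instance_id] = inst
        return inst


# "pub-1" is an invented id; the attribute values are taken from the slide.
dblp_pub = LogicalDataSource("DBLP", "Publication")
cupid = dblp_pub.add(
    "pub-1",
    Name="Generic schema matching with Cupid",
    Conference="VLDB 2001",
)
```

A physical data source would then simply be the set of `LogicalDataSource` objects sharing the same `pds` value.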

  7. Mappings • Directed relationship between LDS • Meta data: meaning of the mapping • Semantic mapping type • e.g., "publications of author" • Same mappings vs. association mappings • same = "equality" relationship between PDS • e.g., DBLP publication (id) → ACM publication (id) • Id mappings vs. query mappings • Instance data: instance correspondences • Materialized: mapping tables • On-the-fly: execution result (e.g., from web service)
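
A materialized mapping can be thought of as metadata plus a table of id pairs. The sketch below assumes that representation; the `Mapping` class, its field names and all ids are hypothetical and serve only to make the distinction between same mappings and association mappings concrete.

```python
from dataclasses import dataclass, field


@dataclass
class Mapping:
    """A directed mapping between two LDS (hypothetical representation)."""
    name: str
    domain_lds: str
    range_lds: str
    semantic_type: str  # e.g. "same" or "publications of author"
    correspondences: set = field(default_factory=set)  # (domain id, range id)

    @property
    def is_same_mapping(self):
        return self.semantic_type == "same"

    def lookup(self, domain_id):
        """Return the range ids that correspond to one domain id."""
        return sorted(r for d, r in self.correspondences if d == domain_id)


# Same mapping: "equality" of publications across two physical sources.
dblp_acm_same = Mapping(
    "DBLP.PubSameACM", "Publication@DBLP", "Publication@ACM", "same",
    {("dblp-p1", "acm-77")},  # invented ids
)

# Association mapping inside one source.
auth_pub = Mapping(
    "DBLP.AuthPub", "Author@DBLP", "Publication@DBLP", "publications of author",
    {("a1", "p1"), ("a1", "p2"), ("a2", "p2")},  # invented ids
)
```

An on-the-fly mapping would expose the same `lookup` behaviour but compute the result per call, for example by querying a web service, instead of reading a stored table.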

  8. Metadata model • Used by mediator for mapping/operator execution • Domain model indicates available object types and relationships • [Figure: source mapping model showing the PDS ACM, DBLP and Google Scholar with their LDS (Author, Publication, Conference), the mappings between them (AuthPub, PubAuth, CoAuthor, PubConf, ConfPub, extract) and same mappings; domain model with the object types Author, Publication and Conference; legend: LDS, PDS, mapping]

  9. Operators • Query language capabilities + scripting support • Set-oriented operators • Input: set of object or mapping instances + parameters / query specification • Output: set of object / mapping instances • Can be combined bottom-up within scripts

  10. Operators overview • Object instances (OI) • Query → OI: queryInstances, queryMatch, attrTransf • OI → OI: getInstances, traverse, traverseSame, map • Aggregated objects (AO) • OI → AO: agg, disagg, fuseAttributes • AO → AO: aggregateSame, aggregateTraverse, aggregateMap • Generic • union, diff, intersect • domain, range, compose
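
If a mapping result is modelled as a set of (domain id, range id) pairs, the generic operators reduce to ordinary set operations. The following is a sketch under that assumption; the function names mirror the slide, but their exact semantics in iFuice are not spelled out here, and all ids are invented.

```python
def domain(mapping):
    """All ids that occur on the left-hand side of a mapping result."""
    return {d for d, _ in mapping}


def range_(mapping):
    """All ids that occur on the right-hand side (trailing underscore avoids the built-in)."""
    return {r for _, r in mapping}


def compose(first, second):
    """Chain two mapping results through the ids they share in the middle."""
    return {(a, c) for a, b in first for b2, c in second if b == b2}


# Invented toy data: conference -> publication and publication -> author.
conf_pubs = {("c1", "p1"), ("c1", "p2")}
pub_auth = {("p1", "a1"), ("p2", "a1"), ("p2", "a2")}

conf_auth = compose(conf_pubs, pub_auth)  # conference -> author
```

`union`, `diff` and `intersect` map directly onto Python's `|`, `-` and `&` on such sets.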

  11. Operators for object instances • queryInstances executes a query on a peer • $S := queryInstances (Conf@DBLP, Series="SIGMOD") returns all SIGMOD conferences from DBLP • map executes a mapping • map ($S, DBLP.ConfPubs) returns all tuples (conference, publication) • traverse returns the range of a mapping • $P := traverse ($S, DBLP.ConfPubs) returns all publications • traverseSame "navigates" to corresponding objects of another physical source • traverseSame ($P, GoogleScholar) returns "equal" publications at GoogleScholar
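
The four operators on this slide can be mimicked on in-memory data to see how they chain. This is a toy sketch, not the mediator's API: the dictionaries stand in for real sources, every id is invented, and `traverse_same` takes an explicit source PDS, which the slide's `traverseSame` does not need.

```python
# Toy stand-ins for DBLP content; all ids and rows are invented.
INSTANCES = {
    "Conf@DBLP": {
        "c1": {"Series": "SIGMOD", "Name": "SIGMOD 1995"},
        "c2": {"Series": "VLDB", "Name": "VLDB 2001"},
    },
}
MAPPINGS = {
    "DBLP.ConfPubs": {("c1", "p1"), ("c1", "p2"), ("c2", "p3")},
    # same mapping from DBLP publications to GoogleScholar publications
    ("same", "DBLP", "GoogleScholar"): {("p1", "gs7"), ("p3", "gs9")},
}


def query_instances(lds, **conditions):
    """Return ids of all instances in an LDS whose attributes match."""
    return {
        oid for oid, attrs in INSTANCES[lds].items()
        if all(attrs.get(k) == v for k, v in conditions.items())
    }


def map_(objects, mapping_name):
    """Return the (domain, range) pairs of a mapping restricted to the input ids."""
    return {(d, r) for d, r in MAPPINGS[mapping_name] if d in objects}


def traverse(objects, mapping_name):
    """Return only the range side of map_."""
    return {r for _, r in map_(objects, mapping_name)}


def traverse_same(objects, source_pds, target_pds):
    """Follow the same mapping from one physical source to another."""
    return traverse(objects, ("same", source_pds, target_pds))


s = query_instances("Conf@DBLP", Series="SIGMOD")
p = traverse(s, "DBLP.ConfPubs")
gs = traverse_same(p, "DBLP", "GoogleScholar")
```

Publications without a counterpart in the same mapping (here `p2`) simply drop out of the `traverse_same` result.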

  12. Instance fusion • Object instances referring to the same real world object → Aggregated object • Auxiliary fusion operators • agg / disagg, fuseAttributes • [Figure: Publication at DBLP with Name: Generic schema matching with Cupid, URL: http://vldb.org..., Conference: VLDB 2001, Authors: Jayant Madhavan, Philip A. Bernstein, Erhard Rahm; Publication at GS with Name: Generic schema matching with Cupid, URL: http://data.cs.washington.edu..., NoOfCit: 243, Authors: J Madhavan, PA Bernstein, E Rahm; agg combines both instances into one aggregated object; fuseAttributes yields a single object with the attributes Name, URL, Authors, Conf. and NoOfCit, each value tagged with its source (DBLP or GS)]
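
The figure suggests that a fused object keeps every attribute value together with the source it came from. A minimal sketch of that idea follows; how iFuice actually resolves conflicting values is not stated on the slide, so keeping all values with their provenance is an assumption, as are the function signatures. The truncated URLs are left out of the example.

```python
def agg(*instances):
    """Group (source, attributes) instances that describe the same real-world object."""
    return list(instances)


def fuse_attributes(aggregated):
    """Merge attributes of an aggregated object: name -> {value: [sources]}."""
    fused = {}
    for source, attributes in aggregated:
        for name, value in attributes.items():
            fused.setdefault(name, {}).setdefault(value, []).append(source)
    return fused


# Attribute values as shown on the slide.
dblp = ("DBLP", {
    "Name": "Generic schema matching with Cupid",
    "Conference": "VLDB 2001",
    "Authors": "Jayant Madhavan, Philip A. Bernstein, Erhard Rahm",
})
gs = ("GS", {
    "Name": "Generic schema matching with Cupid",
    "NoOfCit": 243,
    "Authors": "J Madhavan, PA Bernstein, E Rahm",
})

fused = fuse_attributes(agg(dblp, gs))
```

In this sketch `Name` ends up with one value backed by both sources, while `Authors` keeps two differently spelled values side by side.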

  13. Operators for aggregated objects • aggregateSame • Identify corresponding objects in another source (traverseSame) • Aggregate resulting objects with input objects (agg) • aggregateSame ($P, GoogleScholar) returns AOs of (DBLP + GoogleScholar) publications • [Figure: traverseSame maps the DBLP publication "Generic schema matching with Cupid" to its GS counterpart; agg combines the DBLP and GS instances into one aggregated object]
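
The two steps listed for aggregateSame compose naturally. The sketch below assumes the same mapping is available as a set of id pairs and that input objects without a counterpart are kept as single-element aggregates; both points, and all ids, are assumptions made for illustration.

```python
def traverse_same(ids, same_mapping):
    """Step 1: find the corresponding ids in the other source."""
    return {r for d, r in same_mapping if d in ids}


def aggregate_same(ids, same_mapping):
    """Step 2: aggregate each input object with its counterparts."""
    aggregated = []
    for oid in sorted(ids):
        counterparts = sorted(traverse_same({oid}, same_mapping))
        aggregated.append((oid, *counterparts))
    return aggregated


# Invented ids: DBLP publications and their GoogleScholar counterparts.
same = {("dblp:p1", "gs:7"), ("dblp:p2", "gs:8")}
aos = aggregate_same({"dblp:p1", "dblp:p3"}, same)
```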

  14. iFuice scripts • Batch execution of operators • Store (intermediate) results in variables • Scripts can be interpreted as mappings • Other scripts can utilize iFuice "script mappings" • Example: SIGMOD test of time award
      $SIGMODPubs := queryTraverse (LDS=DBLP.Conf, {Name="SIGMOD 1995"}, DBLPConfPubs)
      $CombinedConfPub := aggregateSame ($SIGMODPubs, GoogleScholar)
      $CleanedPubs := fuseAttributes($CombinedConfPub)
      $Result := sort ($CleanedPubs, "NoOfCitings")
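
To make the data flow of the four-statement script concrete, here is a Python rendering on invented toy data. It is a sketch only: the helper signatures, the "first value wins" fusion rule, the descending sort order, and all paper names and citation counts are assumptions rather than anything taken from iFuice.

```python
# Invented toy data standing in for DBLP and Google Scholar.
CONFS = {"c95": {"Name": "SIGMOD 1995"}, "c96": {"Name": "SIGMOD 1996"}}
CONF_PUBS = {("c95", "p1"), ("c95", "p2"), ("c96", "p3")}
DBLP_PUBS = {"p1": {"Name": "Paper A"}, "p2": {"Name": "Paper B"}, "p3": {"Name": "Paper C"}}
GS_PUBS = {
    "g1": {"Name": "Paper A", "NoOfCitings": 120},
    "g2": {"Name": "Paper B", "NoOfCitings": 340},
}
SAME_GS = {("p1", "g1"), ("p2", "g2")}


def query_traverse(instances, condition, mapping):
    """Select instances by attribute values, then follow a mapping."""
    selected = {
        oid for oid, attrs in instances.items()
        if all(attrs.get(k) == v for k, v in condition.items())
    }
    return {r for d, r in mapping if d in selected}


def aggregate_same(ids, same, left, right):
    """Pair each input object with its counterparts from the other source."""
    return [
        [left[i]] + [right[r] for d, r in sorted(same) if d == i]
        for i in sorted(ids)
    ]


def fuse_attributes(aggregated):
    """Collapse each aggregate into one record; the first value seen wins."""
    fused = []
    for group in aggregated:
        merged = {}
        for attrs in group:
            for key, value in attrs.items():
                merged.setdefault(key, value)
        fused.append(merged)
    return fused


def sort_by(objects, attribute):
    """Sort descending so the most-cited publication comes first."""
    return sorted(objects, key=lambda o: o.get(attribute, 0), reverse=True)


sigmod_pubs = query_traverse(CONFS, {"Name": "SIGMOD 1995"}, CONF_PUBS)
combined = aggregate_same(sigmod_pubs, SAME_GS, DBLP_PUBS, GS_PUBS)
cleaned = fuse_attributes(combined)
result = sort_by(cleaned, "NoOfCitings")
```

Each variable corresponds to one script variable, which mirrors how intermediate results are stored and passed from operator to operator.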

  15. Example: SIGMOD test of time award

  16. Mediator architecture • Applications: Personal Information Manager, Bio navigator • Mediator interface • Web service or Java library • Script / batch or interactive (step by step) • Request / response • iFuice mediator • Fusion control unit • Cache (load / store) • Meta data model • Repository (mapping results) • Duplicate detection • Mapping handler • Mapping call / mapping result • Mapping execution service • Wraps different mapping implementations: web service, SQL query, Java class, iFuice script
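
One way to read the mapping handler and mapping execution service is as a dispatcher that hides how a mapping is implemented and caches its results. The class below is a hypothetical sketch of that reading; the method names, the cache key and the use of plain callables as executors are all assumptions.

```python
class MappingHandler:
    """Dispatches mapping calls to registered executors and caches the results."""

    def __init__(self):
        self._executors = {}  # mapping name -> callable(set of ids) -> iterable of ids
        self._cache = {}      # (mapping name, frozenset of ids) -> result set
        self.calls = 0        # number of real executions, for illustration

    def register(self, mapping_name, executor):
        # The executor could wrap a web service, an SQL query, a class or a script.
        self._executors[mapping_name] = executor

    def execute(self, mapping_name, object_ids):
        key = (mapping_name, frozenset(object_ids))
        if key not in self._cache:
            self.calls += 1
            self._cache[key] = set(self._executors[mapping_name](set(object_ids)))
        return self._cache[key]


# A table lookup stands in for a real mapping implementation (invented ids).
conf_pubs = {"c1": ["p1", "p2"]}
handler = MappingHandler()
handler.register(
    "DBLP.ConfPubs",
    lambda ids: [p for c in ids for p in conf_pubs.get(c, [])],
)

first = handler.execute("DBLP.ConfPubs", {"c1"})
second = handler.execute("DBLP.ConfPubs", {"c1"})  # served from the cache
```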

  17. Summary & outlook • iFuice: generic approach to dynamic information fusion • Based on instance correspondences of P2P sources • Mediator-controlled data fusion • Two working modes • Script mode: powerful operators for information fusion tasks (with source selection or transparent) • Explorative mode: navigation in information space • Future work • Finishing prototype implementation • Different domains, e.g., bioinformatics and e-commerce • Tool-supported (semi-)automatic integration of local / private data sources
