
iTrails: Pay-as-you-go Information Integration in Dataspaces

iTrails: Pay-as-you-go Information Integration in Dataspaces. Presented By Marcos Vaz Salles, Jens Dittrich, Shant Karakashian, Olivier Girard, Lukas Blunschi ETH Zurich 2008-02-22 Summarized By Sungchan Park. Problem: Querying Several Sources. Solution #1: Use a Search Engine.


Presentation Transcript


  1. iTrails: Pay-as-you-go Information Integration in Dataspaces Presented By Marcos Vaz Salles, Jens Dittrich, Shant Karakashian, Olivier Girard, Lukas Blunschi ETH Zurich 2008-02-22 Summarized By Sungchan Park

  2. Problem: Querying Several Sources Center for E-Business Technology

  3. Solution #1: Use a Search Engine

  4. Solution #2: Use an Information Integration System

  5. iTrail Core Idea • Is there an integration solution in-between these two extremes?

  6. iTrail Core Idea • Is there an integration solution in-between these two extremes? • Declaratively add lightweight ‘hints’ to a search engine, thus allowing gradual enrichment of loosely integrated data sources

  7. Example Scenario • Query • “pdf yesterday” • Hints (Trails) • The date attribute is mapped to the modified attribute • The date attribute is mapped to the received attribute • The yesterday keyword is mapped to a query for values of the date attribute equal to yesterday's date • The pdf keyword is mapped to a query for elements whose names end in pdf

  8. Where do hints come from? • Given by the user • Explicitly • Via Relevance Feedback • (Semi-)Automatically • Information extraction techniques • Automatic schema matching • Ontologies and thesauri (e.g., WordNet) • User communities (e.g., trails on gene data, bookmarks) • All these aspects are beyond the scope of this paper

  9. Data and Query Model • Data Model • Assume that all data is represented by a logical graph G • A query is also represented by a graph

  10. Query Syntax

  11. Query Example • “//Home/projects//*[“Mike”]”
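
A minimal sketch of the data and query model from slides 9 to 11, assuming a toy in-memory graph. The `Node` class, its fields, and the helper functions are hypothetical names chosen for illustration, not the iMeMex API; the sketch also assumes the graph is acyclic and that a keyword predicate matches node content only. The query is evaluated by hand-written code rather than by a real parser.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical node of the logical graph G (not the iMeMex data model)."""
    name: str
    content: str = ""
    attrs: dict = field(default_factory=dict)
    children: list = field(default_factory=list)


def descendants(node):
    """Yield every node reachable below `node` (assumes no cycles)."""
    for child in node.children:
        yield child
        yield from descendants(child)


def find_by_name(root, name):
    """All nodes called `name` anywhere below root, i.e. the `//name` step."""
    return [n for n in descendants(root) if n.name == name]


def query_home_projects(root, keyword):
    """Hand-coded evaluation of //Home/projects//*["keyword"]."""
    hits = []
    for home in find_by_name(root, "Home"):
        for projects in (c for c in home.children if c.name == "projects"):
            hits.extend(n for n in descendants(projects) if keyword in n.content)
    return hits


# Made-up example data.
root = Node("root", children=[
    Node("Home", children=[
        Node("projects", children=[
            Node("PIM", children=[Node("report.tex", content="Draft by Mike")]),
            Node("notes.txt", content="Call Alice"),
        ]),
    ]),
    Node("Mail", children=[Node("msg1", content="From Mike")]),
])

hits = query_home_projects(root, "Mike")
print([n.name for n in hits])  # ['report.tex']
```

The mail message also mentions Mike, but it lies outside `//Home/projects`, so the path part of the query filters it out.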

  12. Basic Form of a Trail • A unidirectional trail • A bidirectional trail

  13. Trail Example • Trails in an example scenario • Trails • Given query • “pdf yesterday” • Transformed query • “//*.pdf[modified=yesterday() OR received=yesterday()]”
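
The rewrite on slide 13 can be sketched as follows, assuming the four hints from the example scenario are encoded as simple left/right string pairs. The `Trail` class, the two trail lists, and the `rewrite` function are hypothetical and only reproduce the transformed query shown on the slide; they do not implement the full trail semantics of the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trail:
    left: str   # what the trail matches in the query
    right: str  # what the match is mapped to


# Hypothetical encoding of the hints from the example scenario.
KEYWORD_TRAILS = [
    Trail("pdf", "//*.pdf"),                 # names ending in pdf
    Trail("yesterday", "date=yesterday()"),  # date equal to yesterday
]
ATTRIBUTE_TRAILS = [
    Trail("date", "modified"),
    Trail("date", "received"),
]


def expand_attribute(predicate: str) -> str:
    """Rewrite `attr=value` into an OR over the attributes `attr` maps to."""
    attr, _, value = predicate.partition("=")
    targets = [t.right for t in ATTRIBUTE_TRAILS if t.left == attr]
    if not targets:
        return predicate
    return " OR ".join(f"{target}={value}" for target in targets)


def rewrite(keyword_query: str) -> str:
    """Turn a keyword query into a path query using the trails above."""
    path, predicates = "//*", []
    for keyword in keyword_query.split():
        match = next((t for t in KEYWORD_TRAILS if t.left == keyword), None)
        if match is None:
            predicates.append(f'"{keyword}"')   # plain keyword predicate
        elif match.right.startswith("//"):
            path = match.right                  # trail yields a path
        else:
            predicates.append(expand_attribute(match.right))
    return f"{path}[{' AND '.join(predicates)}]" if predicates else path


print(rewrite("pdf yesterday"))
# //*.pdf[modified=yesterday() OR received=yesterday()]
```

Keywords without a matching trail are kept as ordinary keyword predicates, so the sketch degrades to plain search when no hint applies.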

  14. iTrail Query Processing • Matching • Transforming • Merging

  15. iTrail Query Processing Example • Given Query Q1 = //home/projects//*[“Mike”] • Trail Ψ8 := //home/*.name -> //calendar//*.tuple.category • Resulting Query Q1{Ψ8} = //home/projects//*[“Mike”] ∪ //calendar//*[category=“project”]//*.[“Mike”] • Utilizing G. Miklau and D. Suciu. Containment and Equivalence for an XPath Fragment. In PODS, 2002.
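
The three steps named on slide 14 (matching, transforming, merging) can be sketched as below. This is a simplified illustration under stated assumptions: the `PathQuery` and `Trail` classes are hypothetical, the trail used is a simplified stand-in rather than Ψ8 as written on the slide, and matching is done by exact path equality instead of the query containment test the slide cites.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PathQuery:
    path: str
    keyword: str

    def __str__(self):
        return f'{self.path}["{self.keyword}"]'


@dataclass(frozen=True)
class Trail:
    left: str
    right: str


def matches(trail: Trail, query: PathQuery) -> bool:
    # Assumption: exact equality stands in for a containment check.
    return query.path == trail.left


def transform(trail: Trail, query: PathQuery) -> PathQuery:
    # Replace the matched path with the trail's right side, keep the keyword.
    return PathQuery(trail.right, query.keyword)


def merge(original: PathQuery, transformed: PathQuery) -> list:
    # The result is the union of the original and the transformed branch.
    return [original, transformed]


def apply_trail(trail: Trail, query: PathQuery) -> list:
    if not matches(trail, query):
        return [query]
    return merge(query, transform(trail, query))


q1 = PathQuery("//home/projects//*", "Mike")
# Simplified stand-in trail, not the slide's Ψ8.
trail = Trail("//home/projects//*", '//calendar//*[category="project"]//*')

branches = apply_trail(trail, q1)
print(" UNION ".join(str(b) for b in branches))
```

Because merging keeps the original branch, applying a trail can only add results to a query; it never removes the answers plain search would have returned.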

  16. Applying Multiple Trails • MMCA (Multiple Match Colouring Algorithm) • Without a safeguard, a trail can be applied infinitely often • To prevent infinite recursion, a trail should not be rematched to nodes in a logical plan generated by itself
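
One simplified reading of the recursion rule on slide 16 is sketched below; it is not the MMCA algorithm from the paper. The assumption is that each plan branch carries the set of trails that produced it (its "colours") and that a trail is skipped on any branch already coloured with its own name. All class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trail:
    name: str
    left: str
    right: str


@dataclass(frozen=True)
class Branch:
    path: str
    colours: frozenset = frozenset()  # names of trails that produced this branch


def expand(query_path, trails, max_levels=10):
    """Apply trails level by level; stop when no trail can be matched anymore."""
    plan = [Branch(query_path)]
    frontier = plan
    levels_used = 0
    for _ in range(max_levels):
        new = []
        for branch in frontier:
            for trail in trails:
                # A trail is not rematched on a branch it helped to generate.
                if trail.left == branch.path and trail.name not in branch.colours:
                    new.append(Branch(trail.right, branch.colours | {trail.name}))
        if not new:
            break
        plan = plan + new
        frontier = new
        levels_used += 1
    return plan, levels_used


# Two trails that would feed each other forever without colouring.
trails = [
    Trail("t1", "date", "modified"),
    Trail("t2", "modified", "date"),
]

plan, levels_used = expand("date", trails)
print([b.path for b in plan], levels_used)
```

In this example the expansion stops after two levels, well before `max_levels`, because the third candidate match would reuse `t1` on a branch that `t1` already coloured.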

  17. Other Issues • Trail Pruning • Problem: MMCA is exponential in the number of levels • Solution: Trail Pruning • Prune by number of levels • Prune by top-K trails matched in each level • Assign weights and probabilities to trails • Prune by both top-K trails and number of levels • Trail Indexing • Precompute trail expressions in order to speed up query processing • Trail materialization
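
The two pruning strategies on slide 17 can be combined as in the sketch below. It assumes each trail carries a single numeric `weight` (the slide mentions both weights and probabilities; how the paper combines them is not shown here), and the trail data is made up for illustration.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class Trail:
    left: str
    right: str
    weight: float  # assumption: one score per trail


def expand_pruned(start, trails, max_levels, top_k):
    """Expand `start` with trails, keeping only the top_k matches per level."""
    plan, frontier = [start], [start]
    for _ in range(max_levels):              # prune by number of levels
        matched = [t for t in trails if t.left in frontier]
        kept = heapq.nlargest(top_k, matched, key=lambda t: t.weight)  # prune by top-K
        frontier = [t.right for t in kept if t.right not in plan]
        if not frontier:
            break
        plan.extend(frontier)
    return plan


# Made-up trails with made-up weights.
trails = [
    Trail("doc", "paper", 0.9),
    Trail("doc", "file", 0.2),
    Trail("doc", "report", 0.6),
    Trail("paper", "article", 0.8),
    Trail("article", "essay", 0.7),
]

print(expand_pruned("doc", trails, max_levels=2, top_k=2))
# ['doc', 'paper', 'report', 'article']
```

With `top_k=2` the low-weight `doc -> file` trail is dropped at the first level, and with `max_levels=2` the expansion stops before `article -> essay` is considered.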

  18. Experiments • Setting • Configured iMeMex to act in three modes • Baseline: Graph / IR search engine • iTrails: Rewrite search queries with trails • Perfect Query: Semantics-aware query • Data

  19. Experiment, Quality • Compare with baseline

  20. Experiment, Overhead • Compare with perfect query • Overhead is not negligible • However, this can be fixed by exploiting trail materializations

  21. Experiment, Scalability #1 • Rewrite Time • Query-rewrite time can be controlled with pruning

  22. Experiment, Scalability #2 • Quality • Pruning improves precision

  23. Conclusion • Our Contributions • iTrails: generic method to model semantic relationships (e.g. implicit meaning, bookmarks, dictionaries, thesauri, attribute matches, ...) • We propose a framework and algorithms for Pay-as-you-go Information Integration • Smooth transition between search and data integration • Future Work • Trail Creation • Use collections (ontologies, thesauri, Wikipedia) • Work on automatic mining of trails from the dataspace • Other types of trails
