
Data Integration in Digital Libraries: Approaches and Challenges



  1. Data Integration in Digital Libraries: Approaches and Challenges
     Bringing Digital Libraries together
     Dr. Ismail Khalil Ibrahim
     ismail.khalil-ibrahim@scch.at, +43 7236 3343 852, www.scch.at
     I. Khalil Ibrahim

  2. Biography
     Dr. Ismail Khalil Ibrahim is a senior software developer and AgenCom project manager at the Software Competence Center Hagenberg, Austria. He worked at the University of Technology, Baghdad, Iraq, from 1985 to 1990 as a lecturer; at the Human Resources Training and Development Institute, Iraq, from 1990 to 1996 as head of the academic studies department; and at Gadjah Mada University from 1996 to 2000 as a teaching and research assistant.
     His main research interests lie in the fields of E-commerce & I-Commerce, Database Applications and Techniques for the Web, Practical Experience and Applications in Information Integration Systems, Logic Programming for Information Integration, Agents for Information Retrieval and Knowledge Discovery, XML and Semistructured Data Management, Information Systems Management and Development, and Information Technology: Impact, Economic Analysis.
     Ismail is a member of ACM, SIGMOD, SIGKDD, and SIGecom; General Secretary of the Indonesian Information Society Initiative (IISI); a member of the Iraqi Engineers Association (IEA); an overseas collaborator in the E-commerce Lab at the National University of Singapore; a member of the editorial board of the Colombian Journal of Computing “Revista Colombiana de Computación”; chairman of the organizing committee of the 1st and 2nd International Workshop on Information Integration and Web-based Applications & Services (IIWAS'99, IIWAS'00), Yogyakarta, Indonesia; and chairman of the organizing committee of the 3rd International Conference on Information Integration and Web-based Applications & Services (IIWAS'2001), Linz, Austria.
     Ismail holds a B.Sc. in Electrical Engineering from the University of Technology, Iraq (1985), and an M.Sc. and a Ph.D. in Computer Engineering and Information Systems from Gadjah Mada University (1998, 2001).

  3. Outline
     Data Integration
     • What is it?
     • What does a data integration system look like?
     • What are some data integration challenges?

  4. What Is Data Integration?
     Providing:
     • uniform: the sources are transparent to the user
     • access: query, and eventually updates
     • multiple: even two sources is a problem
     • autonomous: integration must not affect the behavior of the sources
     • heterogeneous: different data models and schemas
     • unstructured: at least semi-structured
     • information sources: not only databases

  5. Example Scenario
     • http://www.amazon.com: s1(Title, Author, Subject)
     • http://www.book-a-million.com: s2(ISBN, Title, Publisher)
     • http://……...

  6. Example Scenario (cont.)
     Retrieve the titles and subjects of all the technical reports written by Stephane Bressan and published by MIT Press.
     • q1 ← amazon(Title, "Stephane Bressan", Subject)
     • q2 ← book-a-million(ISBN, Title, "MIT Press")
     • Join the results (on Title, the only attribute the two sources share)
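The decomposition on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two lists stand in for the web sources s1 and s2, their rows are made-up placeholder data, and the function names `q1`, `q2`, and `answer` are hypothetical.

```python
# Made-up rows standing in for the two sources (illustrative data only).
S1_AMAZON = [  # s1(Title, Author, Subject)
    ("Report A", "Stephane Bressan", "Databases"),
    ("Report B", "Stephane Bressan", "Networks"),
    ("Report C", "Another Author", "Databases"),
]
S2_BOOK_A_MILLION = [  # s2(ISBN, Title, Publisher)
    ("isbn-1", "Report A", "MIT Press"),
    ("isbn-2", "Report B", "Other Press"),
    ("isbn-3", "Report C", "MIT Press"),
]


def q1(author):
    """Sub-query for s1: (title, subject) pairs for one author."""
    return [(title, subject) for (title, a, subject) in S1_AMAZON if a == author]


def q2(publisher):
    """Sub-query for s2: titles for one publisher."""
    return [title for (_isbn, title, p) in S2_BOOK_A_MILLION if p == publisher]


def answer(author, publisher):
    """Join the two partial results on Title, the only shared attribute."""
    published = set(q2(publisher))
    return [(title, subject) for (title, subject) in q1(author) if title in published]


result = answer("Stephane Bressan", "MIT Press")
print(result)
```

Neither source can answer the question alone: s1 knows nothing about publishers and s2 knows nothing about authors or subjects, so the join has to happen in the integration layer.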

  7. So What Is the Problem?
     • Virtual vs. materialized architectures
     • Access: query, or query & update?
        • the problem is similar to updating through views
        • updates need distributed transactional services
     • Mediated schema: yes or no?
        • without a mediated schema we lose its advantages
        • a mediated schema requires schema integration
        • schema integration needs query transformation
        • query transformation needs query optimization

  8. Additional Dimensions
     • How many sources are we accessing?
     • How autonomous are the sources?
     • How much knowledge do we have about the sources?
     • How structured are the data in the sources?
     • Requirements from responses:
        • accuracy
        • completeness
        • machine readable vs. human readable
        • handling inconsistencies
        • speed
     • Closed World Assumption vs. Open World Assumption

  9. Related Technologies / Issues
     • Distributed databases
        • sources are homogeneous
        • data is distributed a priori
        • sources are not autonomous
        • similarities at the optimization and execution level
     • Information retrieval
        • keyword search
        • no semantics
     • Data mining: discovering properties and patterns in data

  10. Current Applications
     • Intranets
        • enterprise data integration
        • web-site construction
     • World Wide Web
        • digital libraries
        • comparison shopping (Netbot, Junglee)
        • portals integrating data from multiple sources
        • XML integration
     • Science & Culture
        • medical genetics: integrating genomic data
        • astrophysics: monitoring events in the sky
        • environment: Puget Sound Regional Synthesis Model
        • culture: uniform access to all the cultural databases

  11. Paradigms of Data Integration
     [Diagram: taxonomy of integration paradigms. Labels: global defined from local vs. global "independent" of local; CWA vs. OWA; global-schema-as-view, global-as-view-of-local, local-as-view-of-global; Database Schema Integration, Data Warehousing, Mediation.]

  12. Paradigms of Data Integration II
     Data Warehousing (materialized architecture)
     • data of interest is collected in a central place and a web site is built on top of it
     • queries are applied to the data warehouse
     Advantage: easy to support queries and transactions.
     Drawbacks: hard to modify; the warehouse is not connected to the providers of information; etc.

  13. Data Warehousing Architecture
     [Diagram: an Application sits on top of a Data Warehouse; a Data Extraction layer fills the warehouse through three Wrappers, one per Data Source.]
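A minimal sketch of the materialized approach, assuming two made-up sources with different native formats and an in-memory SQLite database standing in for the warehouse. The wrapper names, the `book` table, and all rows are hypothetical and chosen only for illustration.

```python
import sqlite3

# Made-up sources in two different native formats (illustrative data only).
SOURCE_A = [{"title": "Report A", "author": "Author X"}]
SOURCE_B = ["Report B|Author Y"]


def wrap_a():
    """Wrapper for source A: dict records -> (title, author) tuples."""
    return [(rec["title"], rec["author"]) for rec in SOURCE_A]


def wrap_b():
    """Wrapper for source B: pipe-delimited lines -> (title, author) tuples."""
    return [tuple(line.split("|")) for line in SOURCE_B]


def build_warehouse(wrappers):
    """Data extraction step: copy every wrapped row into one central table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE book (title TEXT, author TEXT)")
    for extract in wrappers:
        conn.executemany("INSERT INTO book VALUES (?, ?)", extract())
    conn.commit()
    return conn


def titles(conn):
    """Application query: runs against the warehouse only, never the sources."""
    return [row[0] for row in conn.execute("SELECT title FROM book ORDER BY title")]


warehouse = build_warehouse([wrap_a, wrap_b])
before = titles(warehouse)

# A source changes after extraction; the warehouse does not see the new row.
SOURCE_B.append("Report C|Author Z")
after = titles(warehouse)

print(before)
print(after)
```

Both printed lists are identical, which illustrates the drawback that the warehouse is not connected to the providers of information: it stays stale until the extraction step is run again.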

  14. Paradigms of Data Integration III
     Information Mediation (virtual architecture)
     • data remains in the web sources
     • rules relate external data to the internal application
     Advantages: data is not replicated; data is guaranteed to be up to date.
     Drawback: query optimization and execution are more complex.

  15. Mediation Architecture
     [Diagram: an Application queries a Query Execution Engine, which consults a Catalog and reaches each Data Source through a Wrapper. The engine works on the Global Data Model; the wrappers and sources use the Local Data Model.]
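The virtual approach can be sketched as follows, assuming two made-up sources, a hypothetical `CATALOG` dictionary that maps a global relation to the wrappers able to serve it, and a deliberately naive `execute` function in place of a real query execution engine.

```python
# Made-up sources in two different local formats (illustrative data only).
SOURCE_A = [{"title": "Report A", "author": "Author X"}]
SOURCE_B = ["Report B|Author Y"]


def wrap_a():
    """Wrapper for source A: local dict records -> global (title, author) rows."""
    return [(rec["title"], rec["author"]) for rec in SOURCE_A]


def wrap_b():
    """Wrapper for source B: local pipe-delimited lines -> global rows."""
    return [tuple(line.split("|")) for line in SOURCE_B]


# Catalog: which wrappers can contribute rows to which global relation.
CATALOG = {"book": [wrap_a, wrap_b]}


def execute(relation, predicate=lambda row: True):
    """Naive query engine: contact every relevant source at query time."""
    rows = []
    for wrapper in CATALOG[relation]:
        rows.extend(row for row in wrapper() if predicate(row))
    return sorted(rows)


before = execute("book")

# A source changes; the very next query sees the new row, with no reload step.
SOURCE_B.append("Report C|Author Z")
after = execute("book")

print(before)
print(after)
```

Nothing is copied ahead of time, so answers are always current, but every query pays the cost of contacting the sources, which is why optimization and execution are harder in this architecture.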

  16. GAV / LAV Running Example
     World relations:
     • Book(title, year, author, subject)
     • BookYear(title, year)
     • BookRev(title, author, review)
     Source relations:
     • DB1(title, author, year)
     • DB2(title, author, year)
     • DB3(title, review)

  17. Global As View (GAV)
     • Define a global schema of objects and write down rules to collect these objects
     • For each relation R in the mediated schema, we write a query over the sources' relations specifying how to obtain R's tuples from the sources (query unfolding)
     Advantage: traditional query processing applies.
     Drawback: requires the right sources to be available and compliant.
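A small sketch of query unfolding over the running example. The view definitions are assumptions made for illustration and do not come from the slides: BookYear is taken to be the union of DB1 and DB2 projected on (title, year), and BookRev the join of DB1 and DB3 on title. All rows are made up.

```python
# Made-up source contents (illustrative data only).
DB1 = [("Report A", "Author X", 1999)]   # DB1(title, author, year)
DB2 = [("Report B", "Author Y", 2000)]   # DB2(title, author, year)
DB3 = [("Report A", "solid survey")]     # DB3(title, review)


def book_year():
    """Assumed GAV rule: BookYear(title, year) = DB1 union DB2, projected."""
    rows = {(t, y) for (t, _a, y) in DB1} | {(t, y) for (t, _a, y) in DB2}
    return sorted(rows)


def book_rev():
    """Assumed GAV rule: BookRev(title, author, review) = DB1 join DB3 on title."""
    return [(t, a, r) for (t, a, _y) in DB1 for (t3, r) in DB3 if t == t3]


def reviews_from_year(year):
    """Query over the global schema; calling the rules above unfolds it
    into plain operations on DB1, DB2, and DB3."""
    titles = {t for (t, y) in book_year() if y == year}
    return [(t, r) for (t, _a, r) in book_rev() if t in titles]


result = reviews_from_year(1999)
print(result)
```

Because each global relation is just a query over the sources, answering a global query is ordinary view expansion. The cost is that adding or removing a source means revisiting every rule that mentions it.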

  18. Local As View (LAV)
     • For every information source S, we write a query over the relations in the mediated schema that describes which tuples are found in S (query folding, or answering queries using views)
     Advantage: may be able to answer a query based on the available partial information.
     Drawbacks: in general, may not be able to answer the query; needs non-standard query processing techniques; potentially high complexity.
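The following sketch shows only the flavor of LAV and is not a full answering-queries-using-views algorithm. It assumes each source is described as a restricted view of the global relation Book(title, year, author, subject): a hypothetical DB1 is declared to hold only books on one subject, and a hypothetical DB2 only books from a minimum year onward. The class, field, and function names, as well as the rows, are made up.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class SourceDescription:
    """A source described as a view over Book(title, year, author, subject)."""
    name: str
    rows: List[Tuple[str, str, int]]   # exported columns: (title, author, year)
    subject: Optional[str] = None      # every row is guaranteed to have this subject
    min_year: Optional[int] = None     # every row is guaranteed to have year >= this


def certain_titles(sources, subject, year):
    """Titles that are certainly books on `subject` published in `year`."""
    titles = []
    for src in sources:
        if src.subject != subject:
            # The source does not export subject, so without a guarantee in
            # its description none of its rows is a certain answer.
            continue
        if src.min_year is not None and year < src.min_year:
            continue  # the description rules out the requested year
        titles += [t for (t, _a, y) in src.rows if y == year]
    return titles


db1 = SourceDescription(
    "DB1",
    [("Report A", "Author X", 1999), ("Report B", "Author Y", 2001)],
    subject="databases",
)
db2 = SourceDescription("DB2", [("Report C", "Author Z", 2003)], min_year=2000)

result = certain_titles([db1, db2], "databases", 1999)
print(result)
```

The answer is built from whatever the source descriptions can guarantee, so it may be incomplete. In exchange, adding a source only requires writing one new description, with no change to the others.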

  19. Challenges
     • Added complexity over traditional databases: heterogeneous, autonomous, network-bound sources
     • Query reformulation is now understood
        • map queries over mediated schemas to "wrapped" sources (heterogeneity)
     • Issues remain in query processing
        • few statistics (autonomous sources)
        • unanticipated delays and failures (network-bound sources)
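One common way to cope with unanticipated delays and failures is to query the sources concurrently, wait only a bounded time for each, and return partial results. The sketch below uses Python's standard `concurrent.futures` module; the three source functions, their timings, and the `query_all` helper are hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


def fast_source():
    return ["Report A"]


def slow_source():
    time.sleep(0.5)  # simulates an unanticipated network delay
    return ["Report B"]


def failing_source():
    raise ConnectionError("source unreachable")


def query_all(sources, timeout):
    """Ask every source in parallel; keep what arrives within `timeout`."""
    results, skipped = [], []
    # Note: leaving the `with` block still waits for slow workers to finish.
    with ThreadPoolExecutor(max_workers=len(sources)) as pool:
        futures = {name: pool.submit(fn) for name, fn in sources.items()}
        for name, future in futures.items():
            try:
                results.extend(future.result(timeout=timeout))
            except FutureTimeout:
                skipped.append((name, "timeout"))
            except Exception as exc:  # the source itself failed
                skipped.append((name, type(exc).__name__))
    return results, skipped


results, skipped = query_all(
    {"fast": fast_source, "slow": slow_source, "failing": failing_source},
    timeout=0.1,
)
print(results)
print(skipped)
```

The caller gets the rows that did arrive together with a record of which sources were skipped and why, so it can decide whether the partial answer is acceptable.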

  20. Conclusions
     Data integration handles many problems needed for embedded systems applications:
     • Many data sources
     • Easy addition and deletion of sources
     • Different source capabilities
     • Dealing with network delays
     • Easy for the user

  21. Publications
     • Semantic Query Transformation for the Integration of Autonomous Information Sources (INAP'99, Tokyo)
     • IKA: Unity in Heterogeneity (IIWAS'99, Yogyakarta)
     • Information Retrieval Agents for the Intelligent Integration of Information Sources (MulNet 2000, Bandung)
     • A Multilingual Natural Language Interface for Mediating E-Commerce Product Catalogs (INAP2000, Tokyo)
     • Semantic Query Transformation for the Intelligent Integration of Information Sources over the Web (WIIW2001, Rio de Janeiro)
     • Rewriting Rules for Semantic Query Transformation in E-Commerce Applications (DS9, Hong Kong)
     • Data Integration in Digital Libraries: Challenges and Approaches (IndonesiaDL, Bandung)

  22. Thank you for your attention!
