Information Integration, Lecture 1: Introduction
Michael Genesereth, Autumn 2001
Information Processors
Rule: x>y & y>z => x>z
Facts: a>b, b>c
Query: a>c?
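The inference on this slide can be sketched as naive forward chaining: apply the transitivity rule to the known facts until nothing new appears, then check whether the query is among the results. This is a minimal sketch that assumes each fact x>y is encoded as the pair (x, y); `forward_chain` is a hypothetical name, not part of any system named in these slides.

```python
def forward_chain(facts):
    """Saturate a set of (x, y) pairs, each meaning x > y, under the rule
    x>y & y>z => x>z, and return every fact that can be derived."""
    known = set(facts)
    while True:
        # Apply the rule to every pair of known facts that share a middle term.
        derived = {(x, z) for (x, y1) in known for (y2, z) in known if y1 == y2}
        if derived <= known:
            return known  # fixpoint reached: no new facts
        known |= derived


closure = forward_chain({("a", "b"), ("b", "c")})
print(("a", "c") in closure)  # True
```

The loop terminates because only finitely many pairs can be formed from the constants in the input.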
Universal Connectivity
[Diagram: three clients connected through an Information Broker to three sources]
Syntactic Search Engines
[Diagram: search words go to Google, which returns document references pointing to many documents]
Too Many Results
Query: Who is older, Jane or John?
Search words: John, Jane, older
Document fragments:
...John is older than Jane...
...Jill wants to know whether John is older than Jane...
...John is older than Jill...
...Jim is older than Jane...
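A purely syntactic engine sees only which words occur, not what the sentences assert. The sketch below assumes a simple any-word match ranked by the number of query words found; this is an illustrative stand-in, not how Google actually ranks. Every fragment on the slide is returned, yet only the first states the answer.

```python
import re


def keyword_search(query_words, fragments):
    """Return fragments containing at least one query word, ranked by how many
    distinct query words each contains (ties keep their original order)."""
    words = {w.lower() for w in query_words}
    scored = []
    for frag in fragments:
        tokens = set(re.findall(r"[a-z]+", frag.lower()))
        hits = len(words & tokens)
        if hits:
            scored.append((hits, frag))
    scored.sort(key=lambda pair: -pair[0])  # list.sort is stable
    return [frag for _, frag in scored]


fragments = [
    "John is older than Jane",
    "Jill wants to know whether John is older than Jane",
    "John is older than Jill",
    "Jim is older than Jane",
]
for result in keyword_search(["John", "Jane", "older"], fragments):
    print(result)
```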
Too Few Results
Query: Is it the case that John is older than Jane?
Document fragments:
...John is more advanced in years than Jane...
...Jane is younger than John...
...John is the father of Jane...
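Exact phrase matching finds none of these fragments, even though each one supports the answer. A minimal sketch of the contrast, assuming a hand-written, hypothetical table of paraphrase patterns (including the background assumption that a parent is older than a child) that maps each wording to an (older, younger) fact:

```python
import re

# Hypothetical paraphrase table: each regex recognizes one way of saying
# "A is older than B" and returns the (older, younger) pair it implies.
PATTERNS = [
    (re.compile(r"(\w+) is older than (\w+)"), lambda a, b: (a, b)),
    (re.compile(r"(\w+) is more advanced in years than (\w+)"), lambda a, b: (a, b)),
    (re.compile(r"(\w+) is younger than (\w+)"), lambda a, b: (b, a)),
    # Background assumption: a parent is older than a child.
    (re.compile(r"(\w+) is the (?:father|mother) of (\w+)"), lambda a, b: (a, b)),
]


def phrase_search(phrase, fragments):
    """Purely syntactic: keep fragments containing the exact phrase."""
    return [frag for frag in fragments if phrase in frag]


def extract_older(fragments):
    """Semantic: map every recognized paraphrase to an (older, younger) fact."""
    facts = set()
    for frag in fragments:
        for pattern, to_pair in PATTERNS:
            for a, b in pattern.findall(frag):
                facts.add(to_pair(a, b))
    return facts


fragments = [
    "John is more advanced in years than Jane",
    "Jane is younger than John",
    "John is the father of Jane",
]
print(phrase_search("John is older than Jane", fragments))  # []
print(("John", "Jane") in extract_older(fragments))          # True
```

A pattern table like this is brittle; it only illustrates why the answer depends on meaning rather than on surface form.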
No Integration
Query: Is it the case that John is older than Jane?
Documents:
...John is older than Jill...
...Jill is older than Jane...
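Neither document answers the query alone; the answer only follows once their facts are combined. A minimal sketch, assuming each document is reduced to a set of (older, younger) pairs and the query is answered by depth-first search over the "older than" links:

```python
from collections import defaultdict


def older_than(facts, x, y):
    """Decide whether x > y follows by chaining 'older than' links (depth-first search)."""
    graph = defaultdict(set)
    for a, b in facts:
        graph[a].add(b)
    stack, seen = [x], set()
    while stack:
        node = stack.pop()
        for nxt in graph[node]:
            if nxt == y:
                return True
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False


doc1 = {("John", "Jill")}  # "...John is older than Jill..."
doc2 = {("Jill", "Jane")}  # "...Jill is older than Jane..."

print(older_than(doc1, "John", "Jane"))         # False
print(older_than(doc2, "John", "Jane"))         # False
print(older_than(doc1 | doc2, "John", "Jane"))  # True
```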
Content versus Form
Syntactic view: Thosewhowillnotr easonPerishinthe act;Thosewhowill notactPerishfort hatreason.
Semantic view:
Those who will not reason
Perish in the act;
Those who will not act
Perish for that reason.
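The syntactic view on this slide appears to be the same verse with its whitespace removed and re-cut into 16-character blocks; the width of 16 is inferred from the slide, not stated. A minimal sketch reproducing it shows that the characters are identical and only the grouping, and with it the readable content, is lost:

```python
verse = ("Those who will not reason Perish in the act; "
         "Those who will not act Perish for that reason.")


def syntactic_view(text, width=16):
    """Drop all whitespace and cut the character stream into fixed-width blocks."""
    chars = "".join(text.split())
    return [chars[i:i + width] for i in range(0, len(chars), width)]


blocks = syntactic_view(verse)
print(" ".join(blocks))
# Thosewhowillnotr easonPerishinthe act;Thosewhowill notactPerishfort hatreason.
```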
Structured Data
Free-form text: easy to use but limited in capability; too few answers or too many answers; impossible to aggregate effectively.
Structured data: taxonomy, attributes, typed values; powerful search possible; aggregation possible.
Databases
name   manager  office   phone
John   Jill     MJH222   38086
Jane   Jerry    Cedar12  57493
Jill   -        MJH222   -
Jerry  -        420-032  56777
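Because the table has named attributes, questions that free text handles poorly become simple filters and counts. A minimal sketch, assuming the table is held as a list of Python dicts with `None` for empty cells:

```python
from collections import Counter

# The Databases table, one dict per row; None marks an empty cell.
employees = [
    {"name": "John",  "manager": "Jill",  "office": "MJH222",  "phone": "38086"},
    {"name": "Jane",  "manager": "Jerry", "office": "Cedar12", "phone": "57493"},
    {"name": "Jill",  "manager": None,    "office": "MJH222",  "phone": None},
    {"name": "Jerry", "manager": None,    "office": "420-032", "phone": "56777"},
]

# Attribute-based search: who works in office MJH222?
in_mjh222 = [row["name"] for row in employees if row["office"] == "MJH222"]

# Aggregation: how many people share each office?
per_office = Counter(row["office"] for row in employees)

print(in_mjh222)             # ['John', 'Jill']
print(per_office["MJH222"])  # 2
```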
Fragmentation
Horizontal fragmentation (the rows are split across two tables):
name   manager  office   phone
John   Jill     MJH222   38086
Jane   Jerry    Cedar12  57493

name   manager  office   phone
Jill   -        MJH222   -
Jerry  -        420-032  56777

Vertical fragmentation (the columns are split across two tables that share name):
name   manager
John   Jill
Jane   Jerry
Jill   -
Jerry  -

name   office   phone
John   MJH222   38086
Jane   Cedar12  57493
Jill   MJH222   -
Jerry  420-032  56777
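The two kinds of fragmentation are undone by different operations: horizontal fragments are recombined by taking the union of their rows, vertical fragments by joining on a shared key. A minimal sketch, assuming `name` uniquely identifies each person so it can serve as that key:

```python
FULL = [
    {"name": "John",  "manager": "Jill",  "office": "MJH222",  "phone": "38086"},
    {"name": "Jane",  "manager": "Jerry", "office": "Cedar12", "phone": "57493"},
    {"name": "Jill",  "manager": None,    "office": "MJH222",  "phone": None},
    {"name": "Jerry", "manager": None,    "office": "420-032", "phone": "56777"},
]


def union(*fragments):
    """Rebuild a horizontally fragmented table by concatenating its row sets."""
    return [row for fragment in fragments for row in fragment]


def join_on_name(left, right):
    """Rebuild a vertically fragmented table by joining fragments on name."""
    by_name = {row["name"]: row for row in right}
    return [{**row, **by_name[row["name"]]} for row in left if row["name"] in by_name]


horizontal = (FULL[:2], FULL[2:])
vertical = (
    [{"name": r["name"], "manager": r["manager"]} for r in FULL],
    [{"name": r["name"], "office": r["office"], "phone": r["phone"]} for r in FULL],
)

print(union(*horizontal) == FULL)       # True
print(join_on_name(*vertical) == FULL)  # True
```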
Replication
• Network issues
  • Latency
  • Bandwidth
  • Reliability
• Information source issues
  • Limited availability
  • Performance
  • Unscheduled failures
• Solution: replication
• Problems: cost and update
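The trade-off on this slide shows up directly in code: reads gain availability because any live copy can answer, while every update must reach every copy, and a copy that misses one is left stale. A minimal sketch with hypothetical `Replica` objects (a dict plus an up/down flag) standing in for real replicated sources:

```python
class SourceDown(Exception):
    """Signals that a replica cannot be reached (a stand-in for a real failure)."""


class Replica:
    """One hypothetical copy of a source: a dict plus an up/down flag."""

    def __init__(self, name, up=True):
        self.name, self.up, self.data = name, up, {}

    def read(self, key):
        if not self.up:
            raise SourceDown(self.name)
        return self.data.get(key)

    def write(self, key, value):
        if not self.up:
            raise SourceDown(self.name)
        self.data[key] = value


def replicated_read(replicas, key):
    """Availability: answer from the first copy that responds."""
    for replica in replicas:
        try:
            return replica.read(key)
        except SourceDown:
            continue  # unscheduled failure: try the next copy
    raise SourceDown("all replicas")


def replicated_write(replicas, key, value):
    """Update cost: every copy must receive the write. Returns the names of
    copies that were down and are now stale until they are repaired."""
    stale = []
    for replica in replicas:
        try:
            replica.write(key, value)
        except SourceDown:
            stale.append(replica.name)
    return stale


primary, backup = Replica("primary"), Replica("backup")
print(replicated_write([primary, backup], "John", "38086"))  # []
primary.up = False
print(replicated_read([primary, backup], "John"))            # 38086
print(replicated_write([primary, backup], "Jane", "57493"))  # ['primary']
```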
Heterogeneity
Source 1:
name   manager  office   phone
John   Jill     MJH222   38086
Jane   Jerry    Cedar12  57493
Jill   -        MJH222   -
Jerry  -        420-032  56777

Source 2:
name   employee  location  telephone
John   -         MJH222    7238086
Jane   -         Cedar12   7257493
Jill   John      MJH222    -
Jerry  Jane      420-032   7256777

“The biggest problem facing anyone who wants to search multiple structured databases... is that many organizations use different words to describe the same thing.” - Martin Marshall, Communications Week
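The two tables describe the same people with different vocabulary: location and telephone rename office and phone, and employee runs in the opposite direction from manager. A minimal sketch of mapping source 2 into source 1's schema, under two assumptions read off these tables rather than stated on the slide: a 5-digit extension is the last five digits of the 7-digit number, and each person lists at most one employee.

```python
def to_source1_schema(rows2):
    """Map (name, employee, location, telephone) rows into (name, manager, office, phone)."""
    # Invert the employee relationship: if X lists employee Y, then Y's manager is X.
    manager_of = {row["employee"]: row["name"] for row in rows2 if row["employee"]}
    mapped = []
    for row in rows2:
        tel = row["telephone"]
        mapped.append({
            "name": row["name"],
            "manager": manager_of.get(row["name"]),
            "office": row["location"],  # same concept, different attribute name
            # Assumption: the extension is the last five digits of the full number.
            "phone": tel[-5:] if tel else None,
        })
    return mapped


source2 = [
    {"name": "John",  "employee": None,   "location": "MJH222",  "telephone": "7238086"},
    {"name": "Jane",  "employee": None,   "location": "Cedar12", "telephone": "7257493"},
    {"name": "Jill",  "employee": "John", "location": "MJH222",  "telephone": None},
    {"name": "Jerry", "employee": "Jane", "location": "420-032", "telephone": "7256777"},
]
for row in to_source1_schema(source2):
    print(row)
```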
Automatic Information Integration
Integrated access to fragmented, heterogeneous, distributed data sources, giving the illusion of a homogeneous data management system.
[Diagram: three clients connected through an Information Broker to three sources]
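The "illusion of a homogeneous data management system" can be pictured as a broker that accepts one query, sends it to every source through a wrapper that already translates into a shared schema, and merges the answers, dropping copies that several sources hold. A minimal sketch; `Broker`, the wrapper functions, and their data are hypothetical and are not meant to represent how Infomaster or any other system works internally:

```python
from typing import Callable, Dict, Iterable, List, Optional

Row = Dict[str, Optional[str]]
Wrapper = Callable[[], Iterable[Row]]


class Broker:
    """A toy information broker: clients pose one query against a single virtual
    table; the broker sends it to every source wrapper and merges the answers."""

    def __init__(self, wrappers: List[Wrapper]):
        self.wrappers = wrappers

    def query(self, predicate: Callable[[Row], bool]) -> List[Row]:
        seen, answers = set(), []
        for wrapper in self.wrappers:
            for row in wrapper():  # each wrapper already emits the shared schema
                key = tuple(sorted(row.items()))
                if predicate(row) and key not in seen:  # drop copies held by several sources
                    seen.add(key)
                    answers.append(row)
        return answers


def source_a() -> List[Row]:
    return [{"name": "John", "office": "MJH222"}, {"name": "Jane", "office": "Cedar12"}]


def source_b() -> List[Row]:
    return [{"name": "John", "office": "MJH222"}, {"name": "Jill", "office": "MJH222"}]


broker = Broker([source_a, source_b])
print([row["name"] for row in broker.query(lambda row: row["office"] == "MJH222")])
# ['John', 'Jill']
```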
Potential Application Areas
Corporate Logistics: Enterprise Resource Directories (personnel, locations, organizations, equipment, orders)
Electronic Commerce: Integrated Product Catalogs (catalogs, inventories, product ratings, contracts)
Health Care: Consolidated Patient Records (doctors, nurses, lab technicians, administrators, patients)
Multidisciplinary Engineering: Concurrent Engineering (architects, engineers, construction planners)
Command and Control: Situation Assessment (commanders, intelligence, field officers, consultants)
Question
Give me a list of 15-inch aluminum skillets with a nonstick coating, rated at least 4 out of 5 by Consumer Reports, that sell for under $30 and are currently in stock.
Data Sources
Retailer product data, vendor catalogs, Consumer Reports ratings, currency conversion tables, price sheets, inventory data, demographic data, company data
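Answering the skillet question means joining several of these sources on a shared product key and converting prices to dollars before comparing them to $30. A minimal sketch, assuming every source is keyed by the same hypothetical SKU (in practice, establishing that shared key is itself an integration problem); all products, ratings, prices, stock levels, and the exchange rate are invented toy values:

```python
# All data below is hypothetical toy data, not real products, ratings, or rates.
catalog = [  # vendor catalog: product attributes
    {"sku": "S1", "size_in": 15, "material": "aluminum", "nonstick": True},
    {"sku": "S2", "size_in": 15, "material": "aluminum", "nonstick": True},
    {"sku": "S3", "size_in": 12, "material": "aluminum", "nonstick": True},
    {"sku": "S4", "size_in": 15, "material": "aluminum", "nonstick": True},
]
ratings = {"S1": 4, "S2": 5, "S3": 5, "S4": 5}  # ratings out of 5
prices = {"S1": ("EUR", 25.0), "S2": ("USD", 22.0), "S3": ("USD", 19.0), "S4": ("USD", 35.0)}
usd_per_unit = {"USD": 1.0, "EUR": 1.1}          # currency conversion table
inventory = {"S1": 12, "S2": 0, "S3": 5, "S4": 3}  # units in stock


def find_skillets(catalog, ratings, prices, usd_per_unit, inventory):
    """Join the sources on SKU and apply every condition in the question."""
    matches = []
    for item in catalog:
        sku = item["sku"]
        if (item["size_in"], item["material"], item["nonstick"]) != (15, "aluminum", True):
            continue
        if ratings.get(sku, 0) < 4:
            continue
        currency, amount = prices[sku]
        price_usd = amount * usd_per_unit[currency]
        if price_usd >= 30 or inventory.get(sku, 0) <= 0:
            continue
        matches.append((sku, round(price_usd, 2)))
    return matches


print(find_skillets(catalog, ratings, prices, usd_per_unit, inventory))  # [('S1', 27.5)]
```

Here S2 is excluded for being out of stock, S3 for its size, and S4 for its price, so each source contributes to the answer.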
Quotes
“The catalog … is what I believe is blocking the growth of Internet commerce.” - Geoffrey Moore, Red Herring
“Content catalogs are critical to enabling an electronic conversation between business partners.” - Goldman Sachs, November 2000
“You can’t buy it if you can’t find it.” - Amos Barzilay
Infomaster
Data integration system: integrated access to heterogeneous data sources, giving the illusion of a homogeneous data management system.
[Diagram: three clients connected through Infomaster to three sources]
“Infomaster creates an environment that makes it easier for information consumers to get the information they need to answer their questions, while making it easier for owners to publish and share their databases.” - Dennis Rayer, Manager, Data Warehouse, Stanford University
Demonstration Architecture
[Diagram components: users (Costco Buyer, GTW Catalog User, Payless Buyer); interfaces (Costco Interface, GTW Interface, Payless Interface); an Integrator with a Rule Library and an Internal Warehouse; agents (Corning Agent, Mirro Agent, Regal Agent); iMerge; data sources (Corning, Mirro, Regal)]
Course Schedule
1. MRG: Introduction
2. MRG: Data Model (Project Phase Ia)
3. MRG: Knowledge Model (Project Phase Ib)
4. MRG: Data Integration
5. MRG: Data Integration in Infomaster (Project Phase II)
6. MRG: Data Aggregation
7. MRG: Data Aggregation in Infomaster (Project Phase III)
8. xxx: View inversion, containment, and the bucket method
9. xxx: Qian, Duschka and Genesereth; Master Schema
Course Schedule (continued)
10. xxx: XML, RDF
11. xxx: xCBL, ebXML, cXML
12. xxx: XPath and XSL
13. xxx: Standards (e.g., D&B) and directories (e.g., UDDI)
14. yyy: iMerge
15. yyy: Cohera, requisite
16. yyy: a2i, goto
17. xxx: Student papers and projects
18. xxx: Student papers and projects
Grade Requirements
Participation (20%): attendance, good questions, good ideas
Project (20%): functionality, performance
Presentations (20%): familiarity with material, strengths and weaknesses, additional perspectives, clear exposition and good discussion
Paper (40%): correctness and completeness, appropriate incorporation of existing material, inherent interest, heft
Deadlines
October 4: Volunteer for topic presentation
October 9: Project Phase I complete
October 16: Project Phase II complete
October 23: Project Phase III complete
October 30: Paper proposal
November 20: Paper complete
December 4: Paper ready for presentation
Assignments
Find teammates
Register on the course website
Volunteer for a presentation topic or be assigned one
Read the introduction papers (logic and cghipuw)
Read the data model papers (RDF and graphs)
Think
Pre-Lecture Exercise
What is information integration? Multiple users and multiple sources; fragmentation, replication, heterogeneity; update and query.
What are some examples? Movies, catalogs, patient records, collaborative design.
What is not included? Observation, including parsing of images, etc.; action, including fancy graphics, etc.; planning and execution beyond information exchange.