MASSACHUSETTS INSTITUTE OF TECHNOLOGY SLOAN SCHOOL OF MANAGEMENT INFORMATION TECHNOLOGIES GROUP SEMANTIC INTEGRATION (COIN PROJECT) For Dr. Bob Popp, DARPA 8 April 2003 Stuart Madnick (smadnick@mit.edu) Michael Siegel (msiegel@mit.edu) Richard Wang (rwang@mit.edu).
COntext INterchange (COIN) Project
[Figure: Sources (databases, web pages) and Receivers (applications, browsers) connected through TRUSTED AGENTS]
INPUT PROCESSING
* Automatic web wrapping
- Semi-structured text
- Multi-source query plan and execution
CONTEXT MEDIATION
* Automatic conflict detection and conversion
- Derived data
- Source selection
- Source attribution
OUTPUT PROCESSING
- ODBC Driver
- Web Publishing
APPLICATIONS: Financial services, electronic commerce, asset visibility, in-transit visibility.
Background on DARPA Support for Context Mediation Research
• Initial efforts funded as part of DARPA Intelligent Integration of Information (I3) Program
  • Period: July 1993 - Sept 1998
  • Started under: Gio Wiederhold
  • then under: Dave Gunning & Bob Neches
• Other related activity:
  • MIT Total Data Quality Management (TDQM)
  • Since 1991 (web.mit.edu/tdqm)
Role Of Context
[Figure: the values 01-02-03, 02-01-03 and 03-02-01, plus "$ ?", each viewed through a different context]
• CONTEXT VARIATIONS:
  - GEOGRAPHIC (US vs. UK)
  - FUNCTIONAL (CASH MGMT vs. LOANS)
  - ORGANIZATIONAL (CITIBANK vs. CHASE)
Data: Databases, Web data, E-mail
Example: Context Differences (from multiple web sources)
Daimler Benz (DAI) Financial Data

Source        P/E Ratio
ABC           11.6
Bloomberg     5.57
DBC           19.19
MarketGuide   7.46
Complementary Aggregation Example
• Q: How did CO2 emissions (total, per GDP, per capita) change over time (between 1990 and 2000) in Yugoslavia?
• User 1: YUG as a geographic region bounded before the breakup
• User 2: YUG as a legal autonomous state
Related effort:
- Laboratory for Information Globalization and Harmonization Technologies (LIGHT)
Many sources needed: Meanings in sources & users might differ
Sources:
- Oak Ridge's CDIAC DB; WRI; GSSD; EPAs: CO2 in 1000 tons per year
- World Bank's World Dev. Indicator DB; UN Statistics Division; Statistics Bureaus: GDP in billions local currency; Population in millions
- Olsen (Web)
Users:
- Total CO2 in 1000 tons per year
- GDP in billions USD
- CO2/Capita in tons per person
- CO2/GDP in tons per million USD
- GDP/Capita in USD per person
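The derived indicators on this slide are unit arithmetic over the source figures. A minimal sketch of that arithmetic follows; the function names are hypothetical, the input numbers are placeholders rather than real data for any country, and the use of an exchange rate to get from local currency to USD is an assumption about how the sources are combined.

```python
# Sketch only: derives the user-side indicators from source-side units.
# All function names and the sample inputs are illustrative placeholders.

def gdp_billions_usd(gdp_billions_local, usd_per_local_unit):
    """GDP in billions of local currency -> GDP in billions of USD."""
    return gdp_billions_local * usd_per_local_unit

def co2_per_capita_tons(total_co2_kt, population_millions):
    """Total CO2 in 1000 tons and population in millions -> tons per person."""
    return (total_co2_kt * 1_000) / (population_millions * 1_000_000)

def co2_per_gdp_tons_per_million_usd(total_co2_kt, gdp_bn_usd):
    """Total CO2 in 1000 tons and GDP in billions USD -> tons per million USD."""
    return (total_co2_kt * 1_000) / (gdp_bn_usd * 1_000)

def gdp_per_capita_usd(gdp_bn_usd, population_millions):
    """GDP in billions USD and population in millions -> USD per person."""
    return (gdp_bn_usd * 1_000_000_000) / (population_millions * 1_000_000)

# Placeholder inputs (not real statistics).
co2_kt, pop_m, gdp_local_bn, rate = 50_000, 10, 20, 0.5

gdp_usd_bn = gdp_billions_usd(gdp_local_bn, rate)
per_capita = co2_per_capita_tons(co2_kt, pop_m)
per_gdp = co2_per_gdp_tons_per_million_usd(co2_kt, gdp_usd_bn)
gdp_pc = gdp_per_capita_usd(gdp_usd_bn, pop_m)
```

The two user views of Yugoslavia would differ in which source rows are summed before these functions are applied, not in the arithmetic itself.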
The 1999 Overture
Unit-of-measure mixup tied to loss of $125 million Mars Orbiter
“NASA’s Mars Climate Orbiter was lost because engineers did not make a simple conversion from English units to metric, an embarrassing lapse that sent the $125 million craft off course. . . . The navigators (JPL) assumed metric units of force per second, or newtons. In fact, the numbers were in pounds of force per second as supplied by Lockheed Martin (the contractor).”
Source: Kathy Sawyer, Boston Globe, October 1, 1999, page 1.
The Context Interchange Approach
[Figure: an Application queries a Source through the Context Mediator. Components shown: Shared Ontologies, Conversion Libraries, Context Management, Context Transformation, Source Context, Receiver Context.]
Concept: Length (Meters, Feet); f() converts between meters and feet
Query: Select partlength From catalog Where partno=“12AY”
Result: 17
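The length example can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the COIN implementation: the slide does not say which side uses feet, so the sketch assumes the source stores feet and the receiver expects meters, and every name below (`CONVERSIONS`, `mediate`, the context dictionaries) is hypothetical.

```python
# Sketch of context mediation for one concept; all names are illustrative.
FEET_TO_METERS = 0.3048  # exact, by definition of the international foot

# Shared conversion library: (concept, from_unit, to_unit) -> function f()
CONVERSIONS = {
    ("length", "feet", "meters"): lambda v: v * FEET_TO_METERS,
    ("length", "meters", "feet"): lambda v: v / FEET_TO_METERS,
}

# Context declarations: the unit each party assumes for "length".
SOURCE_CONTEXT = {"length": "feet"}      # assumption, not stated on the slide
RECEIVER_CONTEXT = {"length": "meters"}  # assumption, not stated on the slide

def mediate(value, concept, source_ctx, receiver_ctx):
    """Return value as the receiver expects it, converting only on conflict."""
    src_unit, dst_unit = source_ctx[concept], receiver_ctx[concept]
    if src_unit == dst_unit:
        return value  # no context conflict detected
    return CONVERSIONS[(concept, src_unit, dst_unit)](value)

# Stand-in for: Select partlength From catalog Where partno="12AY"
catalog = {"12AY": 17}
result = mediate(catalog["12AY"], "length", SOURCE_CONTEXT, RECEIVER_CONTEXT)
```

Keeping the conversion functions in a shared library, separate from the per-party context declarations, means a new source or receiver only has to declare its context.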
COIN Elevation Axioms (Ontology)
Another Context Example
[Figure: Context Mediation Services between applications and users and the sources below; Wrapper Services cover the OANDA web server]

Source       Company Name        Net Income     Sales
Datastream   DAIMLER-BENZ        614,995        97,736,992
WorldScope   DAIMLER-BENZ AG     346,577        56,268,168
Disclosure   DAIMLER BENZ CORP   615,000,000    97,737,000,000

OANDA Web Server (DEM-USD exchange rate): 1.00 German Mark = 0.58 US Dollar as of 12/31/93
Some Context Differences
Context Definitions

                      Disclosure                 Worldscope                 DataStream
Currency Used         Country of Incorporation   USD                        Country of Incorporation
Currency Conversion   Money Amount, As_Of_Date   Money Amount, As_Of_Date   Money Amount, As_Of_Date
Currency Symbols      3 Letters                  3 Letters                  2 Letters
Scale Factor          1                          1000                       1000
Company Names         Disclosure Names           Worldscope Names           DataStream Names
Date Style            American, ‘/’ separator    American, ‘/’ separator    European, ‘-’ separator

Olsen (OANDA) Web Source uses 3 Letter Currency Symbols and European Date Style with ‘/’ as a separator
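The scale-factor and currency rows of the context definitions can be written down as data and applied mechanically. The sketch below is a hedged illustration: the dictionary layout, the function names, and the country-to-currency lookup are assumptions, while the scale factors, currency rules, and the 0.58 DEM-USD rate come from the slides. It does not try to reproduce the figures each source actually reports.

```python
from decimal import Decimal

# Context definitions from the slide, encoded as data (layout is illustrative).
CONTEXTS = {
    "Disclosure": {"currency": "country_of_incorporation", "scale": 1},
    "Worldscope": {"currency": "USD", "scale": 1000},
    "DataStream": {"currency": "country_of_incorporation", "scale": 1000},
}

# Hypothetical lookup: official currency by country of incorporation.
OFFICIAL_CURRENCY = {"Germany": "DEM", "United States": "USD"}

# Exchange rate from the slide: 1.00 German Mark = 0.58 US Dollar (12/31/93).
RATES = {("DEM", "USD"): Decimal("0.58")}

def currency_of(context, country):
    """Currency a context uses for a company incorporated in `country`."""
    rule = CONTEXTS[context]["currency"]
    if rule == "country_of_incorporation":
        return OFFICIAL_CURRENCY[country]
    return rule

def convert_amount(amount, country, src, dst):
    """Re-express a money amount from context `src` in context `dst`."""
    value = Decimal(amount) * CONTEXTS[src]["scale"]  # undo source scaling
    src_cur, dst_cur = currency_of(src, country), currency_of(dst, country)
    if src_cur != dst_cur:
        value *= RATES[(src_cur, dst_cur)]            # currency conversion
    return value / CONTEXTS[dst]["scale"]             # apply receiver scaling

# Disclosure net income for DAIMLER BENZ CORP, viewed in two other contexts.
in_datastream = convert_amount(615_000_000, "Germany", "Disclosure", "DataStream")
in_worldscope = convert_amount(615_000_000, "Germany", "Disclosure", "Worldscope")
```

`Decimal` is used here only to keep the currency arithmetic exact; it is a design choice of the sketch, not something the slides specify.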
Domain Model
[Figure: semantic types linked by Inheritance, Attribute, and Modifier relationships. Types shown: number, string, exchangeRate, currencyType, countryName, companyName, companyFinancials, date. Attributes and modifiers shown: fromCur, toCur, txnDate, officialCurrency, curTypeSym, scaleFactor, currency, dateFmt, format, countryIncorp, fyEnding, company.]
• Some currency context possibilities:
  • Currency is stated explicitly as part of record
  • Currency not stated, but the same for all (e.g., US $)
  • Currency not stated or constant, but inferred by country
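The three currency possibilities can be sketched as one resolution function. This is an illustration under assumptions: the field names (`currency`, `country_incorp`), the `constant_currency` context key, and the lookup table are hypothetical, and treating the three cases as a fallback order is a choice of the sketch rather than something the slide states.

```python
# Hypothetical lookup: official currency by country of incorporation.
OFFICIAL_CURRENCY = {"Germany": "DEM", "United States": "USD"}

def resolve_currency(record, context):
    """Decide which currency a record's money amounts are expressed in."""
    # Case 1: currency is stated explicitly as part of the record.
    if record.get("currency"):
        return record["currency"]
    # Case 2: not stated, but the same for every record in this source.
    if context.get("constant_currency"):
        return context["constant_currency"]
    # Case 3: not stated or constant, so infer it from the country.
    return OFFICIAL_CURRENCY[record["country_incorp"]]

explicit = resolve_currency({"currency": "USD", "country_incorp": "Germany"}, {})
constant = resolve_currency({"country_incorp": "Germany"}, {"constant_currency": "USD"})
inferred = resolve_currency({"country_incorp": "Germany"}, {})
```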
COIN System Architecture
[Figure: three tiers, SERVER PROCESSES, MEDIATOR PROCESSES, and CLIENT PROCESSES, connected through HTTPD-Daemons]
Components shown: Web Client; ODBC-compliant Apps (e.g. Microsoft Excel); ODBC-Driver; SQL Compiler; cgi-scripts; COIN Repository; Context Mediator; Optimizer; Executioner; Data Store for Intermediate Results; WWW Gateway; Wrapper; Web-site
Flow labels shown: SQL Query; Datalog Query; Mediated Query; Optimized Query Plan; Results
System Demonstration
Single Source Queries with Mediation
Q6.
Scenario: Using Context Interchange, the financial analyst can look at the Disclosure data using Datastream Context.
Query: Find out from Disclosure what Net Income for DAIMLER-BENZ was. Use Datastream Context.
Capabilities Demonstrated: Ability to perform Scale Factor Conversion, Date Format Conversion, Company Name Conversion.
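Two of the conversions named in Q6, date format and company name, can be sketched as follows. This is a hedged illustration, not the demonstrated system: the two-digit-year format strings and the name-mapping table are assumptions based on the "Some Context Differences" and "Another Context Example" slides, and the function names are hypothetical.

```python
from datetime import datetime

# Date styles per context (assumed two-digit years, as in "12/31/93").
DATE_STYLES = {
    "Disclosure": "%m/%d/%y",  # American style with '/' as separator
    "DataStream": "%d-%m-%y",  # European style with '-' as separator
}

# Hypothetical name-mapping table between the two sources' naming schemes.
NAME_MAP = {
    ("Disclosure", "DataStream"): {"DAIMLER BENZ CORP": "DAIMLER-BENZ"},
}

def convert_date(text, src, dst):
    """Re-express a date string from context `src` in context `dst`."""
    parsed = datetime.strptime(text, DATE_STYLES[src])
    return parsed.strftime(DATE_STYLES[dst])

def convert_name(name, src, dst):
    """Map a company name between naming schemes; unknown names pass through."""
    return NAME_MAP.get((src, dst), {}).get(name, name)

date_out = convert_date("12/31/93", "Disclosure", "DataStream")
name_out = convert_name("DAIMLER BENZ CORP", "Disclosure", "DataStream")
```

Parsing into a real date before reformatting, rather than swapping substrings, is what catches values that are invalid in the source's own style.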
Demonstration – context2.mit.edu Source Context
Conflict Detection and Mediation
[Screenshot: Mediated Query in Datalog, with callouts marking the date conversion, the scale factor conversion, and the name conversion]
Mediated SQL Query & Result
[Screenshot: Mediated SQL Query, with callouts marking the scale factor adjustment, the date format conversion, and the name conversion]
Final results: from Disclosure, but in Datastream context
The 1805 Overture
In 1805, the Austrian and Russian Emperors agreed to join forces against Napoleon. The Russians promised that their forces would be in the field in Bavaria by Oct. 20. The Austrian staff planned its campaign based on that date in the Gregorian calendar. Russia, however, still used the ancient Julian calendar, which lagged 10 days behind. The calendar difference allowed Napoleon to surround Austrian General Mack's army at Ulm and force its surrender on Oct. 21, well before the Russian forces could reach him, ultimately setting the stage for Austerlitz.
Source: David Chandler, The Campaigns of Napoleon, New York: MacMillan 1966, pg. 390.
Summary
• Tremendous opportunity to gather and integrate information from many diverse sources
• But … need to overcome many context challenges
• Context-type “metadata” plays a critical role
• COIN technology can be an important aid