Bethesda, Maryland, April 6, 1999

Semantic Interoperability and Information Brokering in Global Information Systems. Amit Sheth, Large Scale Distributed Information Systems Lab, University of Georgia, http://lsdis.cs.uga.edu

Presentation Transcript


  1. Bethesda, Maryland, April 6, 1999. Semantic Interoperability and Information Brokering in Global Information Systems. Amit Sheth, Large Scale Distributed Information Systems Lab, University of Georgia, http://lsdis.cs.uga.edu

  2. Three perspectives to GlobIS
     • Information Integration Perspective
     • Information Brokering Perspective
     • “Vision” Perspective
     Diagram labels, in slide order: autonomy, distribution, (terminological, contextual) semantic heterogeneity, meta-data, data, knowledge, information, data, connectivity, computing

  3. Evolving targets and approaches in integrating data and information (a personal perspective)
     Infocosm: a society for ubiquitous exchange of (tradeable) information in all digital forms of representation; information anywhere, anytime, any forms
     Timeline labels, in slide order: Generation III 1997...; ADEPT, InfoQuilt; DL-II projects; InfoSleuth, KMed, DL-I projects; Infoscopes, HERMES, SIMS, Garlic, TSIMMIS, Harvest, RUFUS, ...; Generation II 1990s; VisualHarness; InfoHarness; Generation I 1980s; Mermaid; DDTS; Multibase, MRDSM, ADDS, IISS, Omnibase, ...

  4. Generation I
     • Data recognized as a corporate resource: leverage it!
     • Data predominantly in structured databases with different data models, transitioning from network and hierarchical to relational DBMSs
     • Heterogeneity (system, modeling and schematic) and the need to support autonomy posed the main challenges; the major issues were data access and connectivity
     • Information integration through a federated architecture
     • Support for corporate IS applications as the primary objective; updates often required, data integrity important

  5. Generation I (heterogeneity in FDBMSs)
     • Database System: Semantic Heterogeneity; Differences in DBMS: data models (abstractions, constraints, query languages), system level support (concurrency control, commit, recovery)
     • Operating System: file system; naming, file types, operation; transaction support; IPC
     • Hardware/System: instruction set; data representation/coding; configuration
     Other diagram labels, in slide order: 1980s, Communication, 1970s

  6. Generation I (Federated Database Systems: Schema Architecture)
     Schema levels, top to bottom: External Schema, External Schema; Federated Schema; [schema integration]; Export Schema, Export Schema, Export Schema; Component Schema, Component Schema; [schema translation]; Local Schema, Local Schema; Component DBS, Component DBS
     • Dimensions for interoperability and integration: distribution, autonomy and heterogeneity
     • Model Heterogeneity: Common/Canonical Data Model, Schema Translation
     • Information sharing while preserving autonomy
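The schema levels on this slide can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the two local schemas, their field names, and the helper functions `translate_a`, `translate_b`, `export` and `integrate` are hypothetical and do not come from the presentation.

```python
# Hypothetical local schemas of two component databases (invented for illustration).
LOCAL_A = [{"emp_no": 1, "emp_name": "Ada", "salary_usd": 100}]
LOCAL_B = [{"id": 7, "full_name": "Bob", "dept": "R&D"}]

def translate_a(row):
    """Schema translation: local schema A -> component schema in a common data model."""
    return {"id": row["emp_no"], "name": row["emp_name"], "salary": row["salary_usd"]}

def translate_b(row):
    """Schema translation: local schema B -> component schema in a common data model."""
    return {"id": row["id"], "name": row["full_name"], "dept": row["dept"]}

def export(rows, visible):
    """Export schema: a component shares only the attributes it chooses (autonomy)."""
    return [{key: row[key] for key in visible if key in row} for row in rows]

def integrate(*exports):
    """Schema integration: combine export schemas into one federated view."""
    federated = []
    for source, rows in exports:
        federated.extend({"source": source, **row} for row in rows)
    return federated

component_a = [translate_a(row) for row in LOCAL_A]
component_b = [translate_b(row) for row in LOCAL_B]

federated = integrate(
    ("A", export(component_a, ["id", "name"])),          # A keeps salary private
    ("B", export(component_b, ["id", "name", "dept"])),
)

# External schema: a view over the federated schema tailored to one class of users.
external_names = [row["name"] for row in federated]
```

Keeping the translation and export steps per component is what lets each component database change its local schema or its sharing policy without touching the others.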

  7. Generation I (characterization of schematic conflicts in multidatabase systems) Sheth & Kashyap, Kim & Seo
     Schematic Conflicts
     • Entity Definition Incompatibility: Naming Conflicts, Database Identifier Conflicts, Schema Isomorphism Conflicts, Missing Data Items Conflicts
     • Abstraction Level Incompatibility: Generalization Conflicts, Aggregation Conflicts
     • Schematic Discrepancies: Data Value Attribute Conflict, Entity Attribute Conflict, Data Value Entity Conflict
     • Data Value Incompatibility: Known Inconsistency, Temporal Inconsistency, Acceptable Inconsistency
     • Domain Definition Incompatibility: Naming Conflicts, Data Representation Conflicts, Data Scaling Conflicts, Data Precision Conflicts, Default Value Conflicts, Attribute Integrity Constraint Conflicts
     BUT these techniques for dealing with schematic heterogeneity do not directly map to dealing with the much larger variety of heterogeneous media

  8. Generation II
     • Significant improvements in computing and connectivity (standardization of protocols, public network, Internet/Web); remote data access taken as a given
     • Increasing diversity in data formats, with a focus on the variety of textual data and semi-structured documents
     • Many more data sources and heterogeneous information sources, but not necessarily a better understanding of the data
     • Use of data beyond traditional business applications: mining + warehousing, marketing, e-commerce
     • Web search engines for keyword-based querying against HTML pages; attribute-based querying available in a few search systems
     • Use of metadata for information access; early work on ontology support; distribution applied to metadata in some cases
     • Mediator architecture for information management

  9. Generation II (limited types of metadata, extractors, mappers, wrappers)
     Diagram labels: Global/Enterprise Web Repositories; Nexis, UPI, AP; Digital Videos; Data Stores; Documents; Digital Maps; Digital Images; Digital Audios; EXTRACTORS; METADATA
     Example request: "Find Marketing Manager positions in a company that is within 15 miles of San Francisco and whose stock price has been growing at a rate of at least 25% per year over the last three years" (Junglee, SIGMOD Record, Dec. 1997)
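The example request on this slide becomes an ordinary attribute-based filter once extractors have turned the sources into metadata records. The sketch below assumes that has already happened. The `Posting` record, its field names and the sample data are hypothetical, and reading "at least 25% per year" as "at least 25% in each of the three years" is one possible interpretation, not something the slide states.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass
class Posting:
    """Hypothetical metadata record produced by extractors over several sources."""
    title: str
    company: str
    miles_from_san_francisco: float
    annual_stock_growth: Tuple[float, float, float]  # last three years, as fractions

def matches(posting: Posting) -> bool:
    """Attribute-based version of the request quoted on the slide."""
    return (
        posting.title == "Marketing Manager"
        and posting.miles_from_san_francisco <= 15
        # Assumption: the growth threshold must hold in every one of the three years.
        and all(rate >= 0.25 for rate in posting.annual_stock_growth)
    )

# Invented sample data, for illustration only.
POSTINGS = [
    Posting("Marketing Manager", "ExampleCo", 10.0, (0.30, 0.26, 0.40)),
    Posting("Marketing Manager", "FarCorp", 42.0, (0.50, 0.50, 0.50)),
    Posting("Marketing Manager", "SlowInc", 5.0, (0.30, 0.10, 0.30)),
    Posting("Software Engineer", "ExampleCo", 10.0, (0.30, 0.26, 0.40)),
]

results = [posting.company for posting in POSTINGS if matches(posting)]
```

The filter itself is trivial. The hard part, and the point of the slide, is populating the three attributes from job listings, maps and stock data that live in different repositories.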

  10. Generation II (a metadata classification: the information pyramid)
     Pyramid levels, in slide order:
     • User
     • Ontologies: Classifications, Domain Models
     • Domain Specific Metadata: area, population (Census), land-cover, relief (GIS), metadata concept descriptions from ontologies
     • Domain Independent (structural) Metadata: C++ class-subclass relationships, HTML/SGML Document Type Definitions, C program structure...
     • Direct Content Based Metadata: inverted lists, document vectors, WAIS, Glimpse, LSI
     • Content Dependent Metadata: size, max colors, rows, columns...
     • Content Independent Metadata: creation-date, location, type-of-sensor...
     • Data (Heterogeneous Types/Media)
     Annotation on the pyramid: Move in this direction to tackle information overload!!
     METADATA STANDARDS. General Purpose: Dublin Core, MCF. Domain/industry specific: Geographic (FGDC, UDK, …), Library (MARC, …)

  11. VisualHarness – an example

  12. Query processing and information requests
     NOW
     • traditional queries based on keywords
     • attribute based queries
     • content-based queries
     NEXT: What’s next (after comprehensive use of metadata)?
     • ‘high level’ information requests involving ontology-based, iconic, mixed-media, and media-independent information requests
     • user selected ontology, use of profiles

  13. GIS Data Representation – Example (Kansas State)
     Multiple heterogeneous metadata models with different tag names for the same data in the same GIS domain
     FGDC Metadata Model
     • Theme keywords: digital line graph, hydrography, transportation...
     • Title: Dakota Aquifer
     • Online linkage: http://gisdasc.kgs.ukans.edu/dasc/
     • Direct Spatial Reference Method: Vector
     • Horizontal Coordinate System Definition: Universal Transverse Mercator…
     • … … ...
     UDK Metadata Model
     • Search terms: digital line graph, hydrography, transportation...
     • Topic: Dakota Aquifer
     • Adress Id: http://gisdasc.kgs.ukans.edu/dasc/
     • Measuring Techniques: Vector
     • Co-ordinate System: Universal Transverse Mercator…
     • … … ...
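A minimal sketch of how the two tag vocabularies on this slide could be reconciled: each model's tag names are mapped onto one shared vocabulary, after which the two descriptions of the same dataset compare equal. The tag names and values are taken from the slide (the keyword list is truncated there, so only the three visible keywords are used). The shared names such as `keywords` and `coordinate_system`, and the `normalize` helper, are hypothetical.

```python
# Tag names as shown on the slide; the right-hand "common" names are hypothetical.
FGDC_TO_COMMON = {
    "Theme keywords": "keywords",
    "Title": "title",
    "Online linkage": "location",
    "Direct Spatial Reference Method": "spatial_method",
    "Horizontal Coordinate System Definition": "coordinate_system",
}
UDK_TO_COMMON = {
    "Search terms": "keywords",
    "Topic": "title",
    "Adress Id": "location",  # spelling as it appears on the slide
    "Measuring Techniques": "spatial_method",
    "Co-ordinate System": "coordinate_system",
}

def normalize(record, tag_map):
    """Rename the tags of one metadata record to the common vocabulary."""
    return {tag_map[tag]: value for tag, value in record.items() if tag in tag_map}

KEYWORDS = ["digital line graph", "hydrography", "transportation"]
URL = "http://gisdasc.kgs.ukans.edu/dasc/"

fgdc_record = {
    "Theme keywords": KEYWORDS,
    "Title": "Dakota Aquifer",
    "Online linkage": URL,
    "Direct Spatial Reference Method": "Vector",
    "Horizontal Coordinate System Definition": "Universal Transverse Mercator",
}
udk_record = {
    "Search terms": KEYWORDS,
    "Topic": "Dakota Aquifer",
    "Adress Id": URL,
    "Measuring Techniques": "Vector",
    "Co-ordinate System": "Universal Transverse Mercator",
}

same_dataset = normalize(fgdc_record, FGDC_TO_COMMON) == normalize(udk_record, UDK_TO_COMMON)
```

A pairwise tag table like this works for two models but grows quadratically with the number of models, which is one motivation for mapping every model to a shared domain ontology instead.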

  14. Generation III
     • Increasing information overload and a broader variety of information content (video content, audio clips, etc.), with an increasing amount of visual information and scientific/engineering data
     • Continued Web-related standardization for representational and metadata issues (MCF, RDF, XML)
     • Changes in Web architecture; distributed computing (CORBA, Java)
     • Users demand simplicity, but complexities continue to rise
     • The Web is no longer just another information source, but a means of decision support through “data mining and information discovery, information fusion, information dissemination, knowledge creation and management”, “information management complemented by cooperation between the information system and humans”
     • Information Brokering Architecture proposed for information management

  15. Information Brokering: An Enabler for the Infocosm
     INFORMATION CONSUMERS: People, Corporations, Programs, Universities, Government (issuing Information Requests and User Queries)
     INFORMATION BROKERING
     • arbitration between information consumers and providers for resolving information impedance
     • dynamic reinterpretation of information requests for determination of relevant information services and products
     • dynamic creation and composition of information products
     INFORMATION PROVIDERS: Information Systems, Data Repositories, Newswires, Corporations, Research Labs, Universities
     INFORMATION/DATA OVERLOAD

  16. Information Brokering: Three Dimensions
     THREE DIMENSIONS. Diagram labels, in slide order: CONSUMERS, BROKERS, PROVIDERS; VOCABULARY, SEMANTICS, METADATA, STRUCTURE, DATA, SYNTAX, SYSTEM
     Objective: Reduce the problem of knowing the structure and semantics of data in the huge number of information sources on a global scale to understanding and navigating a significantly smaller number of domain ontologies
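The stated objective can be pictured with a small sketch in which a consumer asks for an ontology concept and a broker resolves it to provider-specific fields. This is only an illustration under stated assumptions: the provider names, the concept identifiers with a `geo:` prefix, the field names and the `sources_for` helper are all hypothetical.

```python
# Hypothetical registrations: each provider maps ontology concepts to its own field names.
REGISTRY = {
    "provider_a": {"geo:Aquifer": "aquifer_nm", "geo:CoordinateSystem": "coord_sys"},
    "provider_b": {"geo:Population": "pop_total"},
    "provider_c": {"geo:Aquifer": "AQ_NAME"},
}

def sources_for(concept):
    """Return (provider, local field) pairs that can answer a request for one concept."""
    return sorted(
        (provider, fields[concept])
        for provider, fields in REGISTRY.items()
        if concept in fields
    )

# The consumer only needs to know the ontology term, not any provider's structure.
aquifer_sources = sources_for("geo:Aquifer")
unknown_sources = sources_for("geo:LandCover")
```

The consumer side stays the same size as more providers register, which is the reduction the slide describes: users navigate a few ontologies rather than the structure of every source.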

  17. What else can Information Brokering do?
     WWW
     • a confusing heterogeneity of media, formats (Tower of Babel)
     • information correlation using physical (HREF) links at the extensional data level
     • location dependent browsing of information using physical (HREF) links
     • user has to keep track of information content !!
     WWW + Information Brokering
     • Domain Specific Ontologies as “semantic conceptual views”
     • Information correlation using concept mappings at the intensional concept level
     • Browsing of information using terminological relationships across ontologies
     • Higher level of abstraction, closer to user view of information !!

  18. Concepts, tools and techniques to support semantics
     • semantic proximity
     • context
     • inter-ontological relations
     • media-independent information correlations
     • ontologies (esp. domain-specific)
     • profiles
     • domain-specific metadata

  19. Tools to support semantics
     • Context, context, context
     • Media-independent information correlations
     • Multiple ontologies
     • Semantic Proximity (relationships between concepts within and across ontologies) using domain, context, modeling/abstraction/representation, state
     • Characterizing Loss of Information incurred due to differences in vocabulary
     BIG challenge: identifying relationship or similarity between objects of different media, developed and managed by different persons and systems

  20. Heterogeneity... is a Babel Tower!!
     SEMANTIC HETEROGENEITY; metadata, ontologies, contexts; SEMANTIC INTEROPERABILITY
