Research Issues & Challenges in Semantic Web
Jinsoo Park, Ph.D.
Assistant Professor
College of Business Administration
Korea University
jinsoopark@korea.ac.kr
http://ids.korea.ac.kr
Self-Introduction
• Short Bio
  • 1999, Ph.D. in Information Systems, The University of Arizona
  • 1999.9 – 2002.8, Assistant Professor, Carlson School of Management, University of Minnesota
  • 2002.9 – Present, Assistant Professor, Business School, Korea University
• Research Areas
  • Content and Metadata Management in Intra- and Inter-organizational Information Systems
  • Semantic Interoperability and Integration
  • Knowledge Sharing and Coordination
  • Ontology
• Teaching Areas
  • PhD – Research Methods
  • MBA/MS – IS Development Methodologies, AI, Databases, Data Structures & Algorithms, Java
  • Undergraduate – Systems Analysis and Design, MIS, IT Infrastructure
Jinsoo Park
e-Business – Technical Challenges
• Communication Security & Reliability
• System Heterogeneity
• Data/Information Heterogeneity
• Business Process Heterogeneity
• Dynamic Business and Technology Heterogeneity
Inter-organizational Interoperability
• Interaction with diverse, complex enterprises
[Figure: a supplier, a buyer, and other buyers/suppliers, each running enterprise applications (CRM, ERP, SFA, SCM), connected through interoperability. Source: Shim et al. (2000)]
Motivations
• “Enterprise data integration is the top item on every CIO’s wish list. So what are we doing about it?”
  • “Are We Working On The Right Problems?,” Plenary Panel led by Michael Stonebraker, 1998 ACM SIGMOD Conf.
• “The diversity of information content and formats is a salient factor in nearly all distributed systems, and the major challenge is to make diverse information systems interoperate at the semantic level while retaining their difference.”
  • March, Hevner & Ram, Information Systems Research, 2000
• 98% of companies recently interviewed say that integration is either “extremely important” or “very important” to their firm’s IT strategy
  • Forrester Research, 2001
• Vast and escalating amounts of data
• Highly heterogeneous – a plethora of semantic conflicts
  • Data types, data formats, structures, community, …
• Considerable amount of legacy data with no associated metadata
• The growth in existing data far exceeds our abilities to locate and analyze the relevant data
Sponsors: NSF, NASA, and NIH
Interoperability
• Semantic Level (Knowledge Level) – Linguistic, Social, and Philosophical Solution
  • Ontology, Metadata, Context, Agent
• Syntactic Level (Application Level) – Technology Solution
  • Interface, Message, Transport Protocol
Ram, Park and Lee (1999)
What is Semantics?
• The meaning and the use of data (Woods, 1975)
• “meaning or relationship of meanings, or relating to meaning” (Webster)
• Weak vs. Deep Semantics (Sheth, 1995)
  • Weak semantics - semantics that can be identified based on structural, syntactic, and value/extensional information in databases
  • Deep semantics - semantics that involve the issues of human cognition, perception, or interpretation
• Example
  • 47
  • Apples are expensive
  • A 100 ~ 91?
• Semantics bring information closer to human thinking and decision-making
Semantics-based Communication
• Theory of communication that links results from semiotics, linguistics and philosophy into actual information technology
• Meaning Triangle (Ogden and Richards, 1923)
[Figure: meaning triangle with corners Concept, Symbol, and Thing and edges labeled “evokes”, “refers to”, and “stands for”]
Semantic Interoperability Problems
• Contextual differences between source and target information systems
• Different vocabularies, taxonomies, schemas
• Implicit semantics – tacit knowledge
• Lack of separation between content, intent and process
• Embedded rules
• Consistency between different versions of the same schema
Research on Semantic Interoperability
Data-Level Analysis
• Analysis of the differences in data domains caused by the multiple representations and interpretations of similar data
• DeMichiel (1989), Yu et al. (1991), Ventrone & Heiler (1991), Sciore et al. (1994), Kahng & McLeod (1998), Goh et al. (1999)
Schema-Level Analysis
• Analysis of differences in logical structures and/or inconsistencies in metadata (i.e., schemas) of the same application domain
• Batini & Lenzerini (1984), Navathe et al. (1986), Geller et al. (1992), Garcia-Solaco et al. (1995), Lakshmanan et al. (1997)
Little research has addressed both levels at the same time
An Example – Data-Level Conflicts
DISCLOSURE                     DATALINE
Attribute      Value           Attribute                       Value
COMPNO         3842            CODE                            HOND
CF             19,860,228      PERIOD END                      28-02-86
NI             146,502         EARNED FOR ORDINARY             146,502
NS             2,909,574       TOTAL SALES                     2,909,574
NRCEX (ROE)    0.11            RETURN ON SHAREHOLDER EQUITY    19.57
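One of the conflicts in this example can be sketched in code. The following is a minimal Python illustration, assuming (the slide does not state this) that the DISCLOSURE value 19,860,228 under CF is a date written as yyyymmdd digits with thousands separators, and that the DATALINE value 28-02-86 under PERIOD END is the same date written as dd-mm-yy. The function names are hypothetical.

```python
from datetime import date, datetime

def parse_disclosure_date(value: str) -> date:
    """Assumed DISCLOSURE format: yyyymmdd digits split by commas, e.g. '19,860,228'."""
    return datetime.strptime(value.replace(",", ""), "%Y%m%d").date()

def parse_dataline_date(value: str) -> date:
    """Assumed DATALINE format: dd-mm-yy, e.g. '28-02-86'."""
    return datetime.strptime(value, "%d-%m-%y").date()

# Values transcribed from the slide.
disclosure_cf = parse_disclosure_date("19,860,228")
dataline_period_end = parse_dataline_date("28-02-86")

# Under the assumed formats, both sources describe the same period end.
same_date = disclosure_cf == dataline_period_end
```

The two sources agree only after each value is interpreted in its own context, which is the essence of a data-level conflict.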
An Example – Schema-Level Conflicts
DB 1: TAX
YEAR    TAX-TYPE    AMOUNT
1999    Property    250.34
1999    Water       38.99
2000    Property    234.98
2000    Water       59.05
DB 2: TAX-AMOUNT
YEAR    PROPERTY    WATER
1999    250.34      38.99
2000    234.98      59.05
DB 3: PROPERTY
YEAR    AMOUNT
1999    250.34
2000    234.98
DB 3: WATER
YEAR    AMOUNT
1999    38.99
2000    59.05
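The three databases hold the same facts but place the tax type at different schema levels: as a data value (DB 1), as an attribute name (DB 2), and as a relation name (DB 3). A minimal Python sketch of reconciling them into one canonical form follows; the function names and the canonical (year, tax type, amount) tuple are assumptions made for illustration.

```python
# Rows transcribed from the slide's three databases.
db1_tax = [  # DB 1: TAX(YEAR, TAX-TYPE, AMOUNT)
    (1999, "Property", 250.34),
    (1999, "Water", 38.99),
    (2000, "Property", 234.98),
    (2000, "Water", 59.05),
]
db2_tax_amount = [  # DB 2: TAX-AMOUNT(YEAR, PROPERTY, WATER)
    {"YEAR": 1999, "PROPERTY": 250.34, "WATER": 38.99},
    {"YEAR": 2000, "PROPERTY": 234.98, "WATER": 59.05},
]
db3 = {  # DB 3: one relation per tax type, each with (YEAR, AMOUNT)
    "PROPERTY": [(1999, 250.34), (2000, 234.98)],
    "WATER": [(1999, 38.99), (2000, 59.05)],
}

def from_db1(rows):
    # Tax type is stored as a data value.
    return {(year, tax_type.upper(), amount) for year, tax_type, amount in rows}

def from_db2(rows):
    # Tax type is stored as an attribute (column) name.
    return {(row["YEAR"], column, amount)
            for row in rows
            for column, amount in row.items() if column != "YEAR"}

def from_db3(relations):
    # Tax type is stored as a relation (table) name.
    return {(year, name, amount)
            for name, rows in relations.items()
            for year, amount in rows}

canonical = from_db1(db1_tax)
all_agree = canonical == from_db2(db2_tax_amount) == from_db3(db3)
```

Each converter encodes knowledge about where one schema hides the tax type; a mediator would need that knowledge as metadata rather than as hand-written code.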
The Revolution of the Web
• 1990 – Foundation of the Current Web
  • HyperText Markup Language (HTML)
  • HyperText Transfer Protocol (HTTP)
  • Formatted Documents
• 2000
  • Resource Description Framework (RDF)
  • eXtensible Markup Language (XML)
  • Self-Describing Documents
• 2010
  • Trusted Web Resources
  • Proof, Logic and Ontology Languages (e.g., DAML+OIL)
  • Shared terms/terminology
  • Machine-Machine communication
Berners-Lee and Hendler (2001), Nature
The Current Web
• Global information space for human consumption.
• Information and its presentation are mixed up.
• Accessible merely by keywords: high recall, low precision
  • A keyword search for “Rose” does not distinguish among these concepts: Rational Rose, Guns N’ Roses, Rose (flower), Rose (Titanic), England’s Rose.
• Difficult for machines to automatically comprehend, process, communicate and interoperate.
• Problems in information:
  • finding,
  • extracting,
  • representing,
  • interpreting,
  • maintaining.
The Semantic Web
• “The Semantic Web is the representation of data on the World Wide Web (based on the RDF standards and other standards to be defined).” (http://www.w3.org/2001/sw/)
• Envisioned by Tim Berners-Lee and researched by the DARPA team and others
• “A web of data that can be processed directly or indirectly by machines”
  • Tim Berners-Lee, Weaving the Web, HarperBusiness, 2000.
• The “Next Generation Web” with well-established infrastructure for expressing information in a
  • precise,
  • human-readable, and
  • machine-interpretable form.
The Vision
[Figure: Grid Computing, e-Science, Agents, Web Services, e-Business]
[Source: C. Goble, “Information Grids, the Semantic Web & Why Ontologies Matter”]
Current Research and Technologies
• Semantic Web technologies are still very much in their infancy
  • Little consensus about the likely direction of the Semantic Web
  • No widespread agreement on exactly what the Semantic Web is
• Infrastructure
  • XML(S), RDF(S)
• Ontology language
  • DAML+OIL, OWL, …
• Two paradigms in semantic interoperability
  • Data warehousing (eager) approach
  • On-demand driven (lazy) approach
Benefits of XML over HTML
(a) HTML
    <html>
    <body topmargin=20 leftmargin=10>
    <font size=3>
    <table width="389" border="1">
    <tr>
    <td height="82" valign="middle">
    <pre>
                   Regular    Our
                   Price      Price
    LaserJet1150   380,000    357,000    In stock
    </pre>
    </td>
    </tr>
    </table>
    ...
    </font>
    </body>
    </html>
(b) XML
    <?xml version="1.0"?>
    <document>
      <productInfo>
        <product>LaserJet1150</product>
        <regularPrice>380,000</regularPrice>
        <ourPrice>357,000 </ourPrice>
        <inStock>yes</inStock>
      </productInfo>
    </document>
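The benefit shows up as soon as a program has to read the data. A minimal Python sketch using the standard library's xml.etree.ElementTree: each value in the XML version is reachable by tag name, whereas the HTML version would require guessing column positions inside the preformatted text. The helper parse_price and the variable names are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

# The XML document from the slide, with straight quotes.
xml_doc = """<?xml version="1.0"?>
<document>
  <productInfo>
    <product>LaserJet1150</product>
    <regularPrice>380,000</regularPrice>
    <ourPrice>357,000 </ourPrice>
    <inStock>yes</inStock>
  </productInfo>
</document>"""

def parse_price(text: str) -> int:
    """Turn a price such as '357,000 ' into the integer 357000."""
    return int(text.strip().replace(",", ""))

root = ET.fromstring(xml_doc)
info = root.find("productInfo")

# Every field is addressed by its tag, not by its position on the page.
product = info.findtext("product")
regular_price = parse_price(info.findtext("regularPrice"))
our_price = parse_price(info.findtext("ourPrice"))
in_stock = info.findtext("inStock") == "yes"
discount = regular_price - our_price
```

The tags tell the program which number is which, but not yet what currency or tax treatment the number implies.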
But XML faces the following problems …
• Multiple Standards
  • Need for consistent and standardized tags
  • There are so many XML standards: “there are more than a dozen XML protocols - for Financial Trading applications alone” (Chairman of a Financial Services XML Working Group)
  • e.g., (price, cost), (subject, theme, title), (car, automobile) ...
• Implicit Semantics
  • Agreement upon the precise meaning of each tag
  • e.g., How precisely defined is the notion of “price”?
    • Is it in dollars ($) or won (₩)?
    • Even if it is “dollars”, is it US dollars, Canadian dollars, or Hong Kong dollars?
    • Does the “price” include sales tax? Does it include the value added tax (VAT)?
  • About the notion of “title”
    • Is it a movie title or a drama title?
  • About the notion of “bank”
    • Is it a financial institution or a river embankment?
• Modeling Conflicts
But XML faces the following problems
• Evolution of Semantics
  • Problem of evolution
  • e.g., Conversion from using local currency to using Euros in Europe
  • e.g., GM Daewoo, Renault Samsung
• Multiple Purposes
  • Different purposes necessitate different interpretations of the information
  • e.g., Student
    • Professor – Taking courses
    • Staff – Registration
  • e.g., Corporate household/family structure
    • Financial – Risk (credit - bankruptcy)
    • Accounting – Account consolidation
    • Legal – Liability (insurance)
  • and these are dynamic, changing over time ...
RDF & RDF Schema
• RDF (Resource Description Framework)
  • Represents metadata about Web resources
    • e.g., title, author, and modification date of a Web page …
  • Data model → resource, property, property value
    • rdf:Description, rdf:ID, rdf:type
  • Purports to provide interoperability between applications that exchange machine-understandable information on the Web
• RDF Schema
  • Provides semantics about RDF
    • a.k.a. RDF Vocabulary Description Language
    • XML schema: about syntax
  • Defines an appropriate RDF vocabulary (classes, properties and constraints) for each specific domain
  • Extension of data model → class and property hierarchy
    • rdfs:subClassOf, rdfs:subPropertyOf, rdfs:domain and rdfs:range
  • Logical connectives such as conjunction, disjunction, and negation are not provided
    • Not full-fledged ontological modeling and reasoning
RDF & RDF Schema
[Figure: class universityStudent with subclasses undergraduateStudent and graduateStudent; properties: degree]
RDF Schema
    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
             xml:base="http://ids.korea.ac.kr/student.rdfs">
      <rdfs:Class rdf:ID="universityStudent"/>
      <rdfs:Class rdf:ID="undergraduateStudent">
        <rdfs:subClassOf rdf:resource="#universityStudent"/>
      </rdfs:Class>
      <rdfs:Class rdf:ID="graduateStudent">
        <rdfs:subClassOf rdf:resource="#universityStudent"/>
      </rdfs:Class>
      <rdf:Property rdf:ID="degree">
        <rdfs:domain rdf:resource="#graduateStudent"/>
        <rdfs:range rdf:resource="http://www.w3.org/2000/01/rdf-schema#Literal"/>
      </rdf:Property>
    </rdf:RDF>
RDF
    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns="http://ids.korea.ac.kr/student.rdfs#">
      <rdf:Description rdf:about="http://ids.korea.ac.kr/graduateStudent.rdf#Honggildong">
        <rdf:type rdf:resource="http://ids.korea.ac.kr/student.rdfs#graduateStudent"/>
        <degree>MIS</degree>
      </rdf:Description>
    </rdf:RDF>
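What the class hierarchy buys can be shown with a small inference step. The sketch below is plain Python over a set of (subject, property, value) triples, not an RDF library; it assumes only the standard reading of rdfs:subClassOf, namely that an instance of a subclass is also an instance of its superclasses. The function names are hypothetical.

```python
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"
NS = "http://ids.korea.ac.kr/student.rdfs#"
HONG = "http://ids.korea.ac.kr/graduateStudent.rdf#Honggildong"

# Triples corresponding to the schema and instance documents on the slide.
triples = {
    (NS + "undergraduateStudent", SUBCLASS_OF, NS + "universityStudent"),
    (NS + "graduateStudent", SUBCLASS_OF, NS + "universityStudent"),
    (HONG, RDF_TYPE, NS + "graduateStudent"),
    (HONG, NS + "degree", "MIS"),
}

def superclasses(cls, graph):
    """Return cls plus every class reachable through rdfs:subClassOf."""
    seen, stack = set(), [cls]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(o for s, p, o in graph if s == current and p == SUBCLASS_OF)
    return seen

def types_of(resource, graph):
    """Asserted types of a resource, extended with inherited superclasses."""
    result = set()
    for s, p, o in graph:
        if s == resource and p == RDF_TYPE:
            result |= superclasses(o, graph)
    return result

hong_types = types_of(HONG, triples)
```

The instance document only states that Honggildong is a graduateStudent; membership in universityStudent follows from the schema.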
OWL
• Web Ontology Language
• RDF Schema is lacking in some desirable expressiveness
  • People use different words to represent the same thing
  • cardinality constraints, conjunction, disjunction …
• OWL extends RDF Schema
  • Uses all RDF Schema’s basic notions of Class, Property, domain, and range
  • Adds more vocabulary for describing properties and classes
    • relations between classes (e.g., disjointness)
    • cardinality (e.g., “exactly one”)
    • richer typing of properties
    • characteristics of properties (e.g., symmetry)
    • enumerated classes
• OWL can be used to explicitly represent the meaning of terms in vocabularies and the relationships between those terms
CREAM
• Conflict Resolution Environment for Autonomous Mediation
• An integrated and collaborative facility for achieving semantic interoperability among the participating heterogeneous information sources.
• Agent-based Mediation Architecture
• SCROL – Semantic Conflict Resolution OntoLogy
  • Domain independent
• Semantic Query Transformation
Park & Ram (2004), ACM Transactions on Information Systems
Research Questions
• What kinds of semantic conflicts are typically found in a heterogeneous environment?
• How do we recognize & resolve such conflicts?
• To what extent can we automate the process of conflict identification & resolution using mediators?
[Figure: architecture diagram with four layers (Data Exchange Layer, Semantic Mediation Layer, Metadata Layer, Data Access Layer). Components shown: Users, QBS Interface, Semantic Filter, Output Generators (XML Generator, XSL Generator, DTD Generator), Semantic Mediators, Information Integrator, Common Repository, SCROL, Schema Designer, Schema Mapper, Ontology Mapper, Wrapper Generator, RMI Wrappers, Content Wrappers, database sources, Web sources]
Semantic Integration
[Figure: Users, Federated Schema, Semantic Mediation Service Layer, SCROL, XML Schema, XML DTD, DB Schema, Business Doc/Schema]
Metadata – Semantic Model
Ram, Park and Ball (1999), IEEE Computer.
SCROL – Semantic Conflict Resolution OntoLogy
• SCROL = (OC, OI, RP, RS, RM, u)
  • OC - concepts
  • OI - instances
  • RP - parenthood relationship (subconcept-of/superconcept-of, instance-of)
  • RS - sibling relationship (disjoint, peer, part-of, is-a)
  • RM - (domain instance value) mapping relationship (one-one, one-many, many-many, total, partial, none)
  • u - root
Ram & Park (2004), IEEE Transactions on Knowledge and Data Engineering
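The six components can be pictured as a small data structure. The following Python sketch is an assumption-laden illustration, not the authors' implementation: the class name Scrol, its method names, and the choice of dictionaries are all hypothetical, and the example concept (Date) with two date-format instances and a one-one mapping is chosen only for illustration.

```python
from dataclasses import dataclass, field

# Relationship kinds as listed on the slide.
SIBLING_KINDS = {"disjoint", "peer", "part-of", "is-a"}
MAPPING_KINDS = {"one-one", "one-many", "many-many", "total", "partial", "none"}

@dataclass
class Scrol:
    root: str                                     # u: root concept
    concepts: set = field(default_factory=set)    # OC
    instances: set = field(default_factory=set)   # OI
    parent: dict = field(default_factory=dict)    # RP: node -> parent concept
    sibling: dict = field(default_factory=dict)   # RS: (a, b) -> sibling kind
    mapping: dict = field(default_factory=dict)   # RM: (inst_a, inst_b) -> mapping kind

    def __post_init__(self):
        self.concepts.add(self.root)

    def add_concept(self, name, parent):
        if parent not in self.concepts:
            raise ValueError(f"unknown parent concept: {parent}")
        self.concepts.add(name)
        self.parent[name] = parent      # subconcept-of

    def add_instance(self, name, concept):
        if concept not in self.concepts:
            raise ValueError(f"unknown concept: {concept}")
        self.instances.add(name)
        self.parent[name] = concept     # instance-of

    def relate_siblings(self, a, b, kind):
        if kind not in SIBLING_KINDS:
            raise ValueError(f"unknown sibling kind: {kind}")
        self.sibling[(a, b)] = kind

    def map_instances(self, a, b, kind):
        if kind not in MAPPING_KINDS:
            raise ValueError(f"unknown mapping kind: {kind}")
        self.mapping[(a, b)] = kind

    def ancestors(self, node):
        """Walk the parenthood relationship from node up to the root."""
        path = []
        while node != self.root:
            node = self.parent[node]
            path.append(node)
        return path

# Illustrative content only.
onto = Scrol(root="RootConcept")
onto.add_concept("Date", "RootConcept")
onto.add_instance("mm/dd/yy", "Date")
onto.add_instance("yyyy/mm/dd", "Date")
onto.map_instances("mm/dd/yy", "yyyy/mm/dd", "one-one")
date_path = onto.ancestors("mm/dd/yy")
```

Keeping the mapping relationship between instances of the same concept is what would let a mediator decide whether two differently formatted values can be converted into each other.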
SCROL – Graphical Illustration
[Figure: tree rooted at RootConcept, with Concept and Instance nodes linked by disjoint, peer, part-of, is-a, and mapping relationships]
SCROL Interface
Ontology-Schema Mapping Example
[Figure: SCROL concepts (including Temporal_Format, Date, Day, Week, Month, Duration, Julian Date, Graphical Location, Descriptive Location, Coordinate, UTM, Scale, Cardinal_Number, Area, Image, Map, Picture, Drawing, Vector, Raster) and instances (including mm/dd/yy, yyyy/mm/dd, “Month Day, Year”, Square Meter, Acre, Town, City, Region, State/Province, Country, JPEG, BMP, TIFF, GIF, DWG, VSD), linked by peer and part-of relationships and mapped to attributes of the County-Population and City-Population schemas (census-starting-date, census-ending-date, duration, area, area-size, size, name, location, map, image)]
Ontology-Schema Mapper
Semantic Mediators
Semantic Mediator Communication Protocol
• Theory of Speech Acts (Austin 1962, Searle 1969)
• Performatives
  • ASK-ALL(QID, Query) - asking the collection of local queries.
  • ASK-IF(Query) - asking if Query holds.
  • DELIVER(QueryResults) - reporting the query results.
  • DETECT(Query) - traversing the SCROL to check semantic conflicts.
  • GENERATE(Query) - requesting local query generation.
  • LOCATE(Query) - requesting directory service to retrieve directory information.
  • RECONCILE(QueryResults) - requesting semantic reconciliation for the query results.
  • REPLY-ALL(QID, QueryResults) - replying all the query results being asked.
  • REPLY-IF(Query, Answer) - replying the Answer upon being asked if Query.
  • REPORT(QueryResults) - reporting the query results.
  • RESOLVE-IF(Query, Answer) - reporting the Answer upon being asked if Query can be resolvable.
  • TELL(Query) - notifying and updating the query request.
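The performatives above can be sketched as typed messages. This is a minimal Python illustration under stated assumptions: the Message fields, the sender and receiver names, and the pairing of ASK-ALL with REPLY-ALL and ASK-IF with REPLY-IF (inferred from the descriptions above) are hypothetical and are not taken from the CREAM implementation.

```python
from dataclasses import dataclass
from enum import Enum

class Performative(Enum):
    ASK_ALL = "ASK-ALL"
    ASK_IF = "ASK-IF"
    DELIVER = "DELIVER"
    DETECT = "DETECT"
    GENERATE = "GENERATE"
    LOCATE = "LOCATE"
    RECONCILE = "RECONCILE"
    REPLY_ALL = "REPLY-ALL"
    REPLY_IF = "REPLY-IF"
    REPORT = "REPORT"
    RESOLVE_IF = "RESOLVE-IF"
    TELL = "TELL"

@dataclass(frozen=True)
class Message:
    performative: Performative
    sender: str        # hypothetical agent name
    receiver: str      # hypothetical agent name
    content: tuple     # arguments, e.g. (QID, Query)

    def render(self) -> str:
        args = ", ".join(str(c) for c in self.content)
        return f"{self.performative.value}({args})"

# Assumed request/reply pairing, read off the performative descriptions.
EXPECTED_REPLY = {
    Performative.ASK_ALL: Performative.REPLY_ALL,
    Performative.ASK_IF: Performative.REPLY_IF,
}

def reply_to(request: Message, *content) -> Message:
    """Build the reply for a request, swapping sender and receiver."""
    return Message(EXPECTED_REPLY[request.performative],
                   sender=request.receiver,
                   receiver=request.sender,
                   content=content)

ask = Message(Performative.ASK_ALL, "coordinator", "mediator-1", ("Q1", "local-query"))
answer = reply_to(ask, "Q1", "results")
```

Modeling the performative as an enumeration keeps the message vocabulary closed, so an agent can reject anything outside the agreed protocol.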
Key Issues and Potential Research Directions …
• Integration vs. Interoperability
  • Integration-based approach
    • attempts to build a monolithic view of the enterprise
    • integrates processes and applications at the event and message levels so multiple systems become one logical unit
  • Interoperability-based approach
    • focuses on the exchange of meaningful, context-driven information between autonomous systems
Key Issues and Potential Research Directions …
• Machine Understandable Semantics
  • How can a software agent learn something about the meaning of a term that it has never before encountered?
• Semantic Mediation and Semantic Query Processing
• Conflict Detection and Resolution
• Semantic Normalization
  • C (e1) = C (e2)
• Semantic Mapping and Translation
• Semantic Association
• Dynamic Evolution
Key Issues and Potential Research Directions
• Ontology Heterogeneity
  • Different knowledge representation formalisms
    • Language heterogeneity – when ontologies are expressed using different ontology languages
  • Naming conflicts
    • e.g., synonyms, homonyms, etc.
  • Modeling conflicts
    • e.g., differences in Total Number of Employees could be attributed to inclusion or exclusion of Temporary Employees
  • Temporal conflicts
    • Arise when entity values or definitions belong to different times, or time intervals
  • Conceptualization conflicts
    • e.g., time intervals vs. time points
• Ontology Learning
Q & A