The Role of XML in Mediated Data Integration Systems, with examples from Geological (Map) Data Interoperability
B. Brodaric (1), B. Ludaescher (2), K. Lin (2)
(1) Geological Survey of Canada
(2) San Diego Supercomputer Center, UCSD
GEON: Cyberinfrastructure for the Geosciences
• 5-year, multi-institution, large NSF ITR project
• Geo and IT collaboration on cyberinfrastructure for geoscience research:
  • grid-based computing environment
  • connect multi-disciplinary databases and software tools
  • accessible web portal for the community
  • stimulate research via analysis, modeling and visualization
• Objective:
  • Discuss the role of XML in connecting geoscience databases
• Issue:
  • XML is a meta-language for structuring data
  • XML allows specification of grammar (structure) and vocabulary (syntax)
  • Interoperability also requires:
    • shared semantics (concepts)
    • shared pragmatics (usage)
    • shared systems
• Approach:
  • GEON system that uses shared structure, syntax, semantics and pragmatics to connect geoscience (map) databases
Outline: Role of XML in Mediated Semantic Data Integration with Examples
• XML, RDF, OWL
• Data Integration
• Mediation
• Semantics
• Role of XML
• Example: GEON Geologic Map Interoperability
• Conclusions
XML: eXtensible Markup Language
• Purpose: a meta-language for structuring text-based data
• Components: tags (syntax) and schema (structure)
  • Tags: embedded in documents
      <rocktype> <texture> fine-grained </texture> sandstone </rocktype>
  • Schema: specifies tags and structure (sequence, nesting), e.g. DTD, XML Schema
      <!ELEMENT sandstone (#PCDATA | texture)*>
• Usage: archival, transport, exchange
• Issues: semantics, pragmatics, integration system
  • different terms as element content: <…>sandstone</…> <…>arenite</…> <…>grès</…>
  • different terms as tag names: <sandstone> <arenite> <grès>
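The "Issues" bullet can be illustrated with a minimal Python sketch. It assumes three hypothetical documents that share one structure but record the rock name under different terms; the element names and the documents are illustrative, not taken from any GEON schema. All three parse correctly, yet a plain string match finds only one of them, which is the semantic gap the slide points at.

```python
import xml.etree.ElementTree as ET

# Hypothetical documents: identical structure, three different rock terms.
DOCS = [
    "<rocktype><texture>fine-grained</texture>sandstone</rocktype>",
    "<rocktype><texture>fine-grained</texture>arenite</rocktype>",
    "<rocktype><texture>fine-grained</texture>grès</rocktype>",
]


def rock_name(xml_text):
    """Return the rock name stored as text after the <texture> child."""
    root = ET.fromstring(xml_text)
    texture = root.find("texture")
    return (texture.tail or "").strip()


# Structure is shared, so every document can be read the same way.
names = [rock_name(doc) for doc in DOCS]

# Meaning is not shared: a literal comparison misses the other terms.
matches = [name for name in names if name == "sandstone"]

print(names)
print(matches)
```

Shared tags and schema make the three documents readable by the same code; deciding whether the three terms should match the same query needs something beyond XML.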
RDF: Resource Description Framework
• Purpose: model for structuring metadata and linking it to data (resources)
  • semantics, pragmatics: theory for describing data, reasoning
• Components:
  • Triples (RDF): resource (data) → property → value
  • Schema (RDFS): metadata structure: class, property, domain, range, …
• Issues: encoding language (structure, syntax), domain ontology (semantics)
OWL: Web Ontology Language
• extends RDFS, has XML encoding
    <owl:Class rdf:ID="sandstone">
      <rdfs:subClassOf rdf:resource="#rocktype"/>
    </owl:Class>
    <sandstone rdf:ID="arenite"/>
[Figure: ontology graph with nodes rock type, texture, sandstone, arenite, grès and fine grained, linked by has-texture]
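As a rough sketch of what the triple model and the subclass statement allow, the Python below stores the slide's statements as (resource, property, value) tuples and infers that arenite is also a rock type. The property names `subClassOf` and `type` and both helper functions are simplifications assumed for illustration; this is not an RDF or OWL library.

```python
# Triples follow the slide's pattern: (resource, property, value).
TRIPLES = [
    ("sandstone", "subClassOf", "rocktype"),
    ("arenite", "type", "sandstone"),
    ("arenite", "has-texture", "fine grained"),
]


def superclasses(cls, triples):
    """Collect every class reachable from cls through subClassOf links."""
    found = set()
    frontier = [cls]
    while frontier:
        current = frontier.pop()
        for subject, prop, value in triples:
            if subject == current and prop == "subClassOf" and value not in found:
                found.add(value)
                frontier.append(value)
    return found


def types_of(resource, triples):
    """Return the asserted types of a resource plus their superclasses."""
    direct = {v for s, p, v in triples if s == resource and p == "type"}
    inferred = set(direct)
    for cls in direct:
        inferred |= superclasses(cls, triples)
    return inferred


print(sorted(types_of("arenite", TRIPLES)))
```

Only "arenite is a sandstone" is stated directly; the rock type membership follows from the subclass link, which is the kind of reasoning the slide attributes to RDF and OWL rather than to XML itself.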
Data Integration
• Integration of geological map data sources that have:
  • disparate locations
  • heterogeneous syntax, structure, semantics, systems
• e.g. find all map units containing paleozoic, sedimentary, felsic rocks
[Figure: two sources with different vocabularies: English (sandstone, arenite, slate) and French (grès, arénite)]
Mediation: data integration mechanism
• Mediator: distributor; receives and dispatches the query, integrates and dispatches the results
• Wrapper: translator; global → local query, local → global results
• e.g. find all map units containing paleozoic, sedimentary rocks
[Figure: a mediator passes the global query to two wrappers; each wrapper translates between global and local terms for its source: English (sandstone, arenite, slate) and French (grès, arénite)]
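The mediator and wrapper roles can be sketched in a few lines of Python. Everything concrete here is an assumption made for illustration: the class names, the term mappings, and the map-unit ids (`E1`, `F1`, …) are hypothetical and do not describe the GEON implementation.

```python
class Wrapper:
    """Translator: global term -> local query, local results -> global terms."""

    def __init__(self, name, global_to_local, map_units):
        self.name = name
        self.global_to_local = global_to_local
        self.local_to_global = {v: k for k, v in global_to_local.items()}
        self.map_units = map_units  # local records: (unit id, local rock term)

    def query(self, global_term):
        local_term = self.global_to_local.get(global_term)
        return [
            (self.name, unit_id, self.local_to_global[rock])
            for unit_id, rock in self.map_units
            if rock == local_term
        ]


class Mediator:
    """Distributor: dispatch the query to every wrapper and merge the results."""

    def __init__(self, wrappers):
        self.wrappers = wrappers

    def query(self, global_term):
        results = []
        for wrapper in self.wrappers:
            results.extend(wrapper.query(global_term))
        return results


# Hypothetical sources with English and French local vocabularies.
english = Wrapper(
    "english",
    {"sandstone": "sandstone", "slate": "slate"},
    [("E1", "sandstone"), ("E2", "slate")],
)
french = Wrapper(
    "french",
    {"sandstone": "grès", "slate": "ardoise"},
    [("F1", "grès"), ("F2", "ardoise")],
)
mediator = Mediator([english, french])

print(mediator.query("sandstone"))
```

The caller only ever sees global terms; each source keeps its own vocabulary, and the translation stays inside its wrapper.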
Semantics: ontologies for specifying shared meanings
[Figure: a user/query/tool addresses the global ontology (concepts: geol. concept, geol. unit, earth material; classifications: slate, sandstone; vocabulary: sandstone, slate, grès, ardoise, …); the mediator and wrappers connect it to the local databases: English (sandstone, arenite, slate) and French (grès, arénite)]
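One way to picture how a shared ontology supports such queries is the sketch below: a global concept is expanded to its narrower concepts, and each is then looked up in a per-language vocabulary. The concept hierarchy and the vocabulary tables are assumptions invented for this example; they are not the GEON or NADM ontology.

```python
# Hypothetical global concept hierarchy: broader concept -> narrower concepts.
SUBCONCEPTS = {
    "earth material": ["sedimentary rock", "metamorphic rock"],
    "sedimentary rock": ["sandstone"],
    "sandstone": ["arenite"],
    "metamorphic rock": ["slate"],
}

# Hypothetical vocabularies: global concept -> local term, per language.
VOCABULARY = {
    "english": {"sandstone": "sandstone", "arenite": "arenite", "slate": "slate"},
    "french": {"sandstone": "grès", "arenite": "arénite", "slate": "ardoise"},
}


def expand(concept):
    """Return the concept followed by all narrower concepts, depth first."""
    result = [concept]
    for child in SUBCONCEPTS.get(concept, []):
        result.extend(expand(child))
    return result


def local_terms(concept, language):
    """Translate a global concept and its narrower concepts to local terms."""
    vocabulary = VOCABULARY[language]
    return [vocabulary[c] for c in expand(concept) if c in vocabulary]


print(local_terms("sedimentary rock", "french"))
print(local_terms("earth material", "english"))
```

A query for "sedimentary rock" never appears literally in either source; the ontology supplies the link from that concept to the terms each database actually stores.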
Role of XML: data transport language; ontology transport and archival language
• e.g. find all map units containing paleozoic, sedimentary rocks
[Figure: mediator, two wrappers (global ↔ local), the ontology, and English and French sources (sandstone, arenite, slate; grès, arénite), with the links labelled XML, RDF, OWL and GML]
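The transport role can be sketched as a round trip: results are serialized to XML on one side and parsed back on the other. The `results` and `mapunit` element names and the attribute names are hypothetical placeholders chosen for this example; they are not GML or any GEON format.

```python
import xml.etree.ElementTree as ET


def to_xml(results):
    """Serialize (source, unit id, rock) tuples into an XML string."""
    root = ET.Element("results")
    for source, unit_id, rock in results:
        unit = ET.SubElement(root, "mapunit", source=source, id=unit_id)
        unit.text = rock
    return ET.tostring(root, encoding="unicode")


def from_xml(xml_text):
    """Parse the XML string back into (source, unit id, rock) tuples."""
    root = ET.fromstring(xml_text)
    return [
        (unit.get("source"), unit.get("id"), unit.text)
        for unit in root.findall("mapunit")
    ]


# Hypothetical results a wrapper might return to the mediator.
results = [("english", "E1", "sandstone"), ("french", "F1", "grès")]
payload = to_xml(results)
restored = from_xml(payload)

print(payload)
print(restored == results)
```

The round trip preserves structure and text, including non-ASCII terms, but nothing in the payload says that the two rock terms refer to related concepts; that knowledge stays in the ontology.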
Role of XML: transport and archival language
[Figure: interoperability between System 1 and System 2; both systems share the same stack, from meaning (top) to systems (bottom)]
• Semantics
  • Concepts: NADM (ontology)
• Structure
  • Schema: NADM (geological), OGC (spatial), Z39.50 (metadata)
• Syntax
  • Languages: SQL (database); RDF, OWL (ontology); XML (transport)
• Systems
  • File formats: SHP (ESRI), DXF (ACAD)
  • Services: WSDL (description), UDDI (discovery)
  • Transfer protocols: FTP, HTTP
  • Operating systems: MS, Linux
Example: GEON geological map data interoperability
• 4 state maps registered
• http://kbis.sdsc.edu/GEON/map-integration.html
• e.g. find all map units containing paleozoic, sedimentary rocks
[Figure: map query results for “paleozoic” and for “paleozoic, sedimentary”]
Example, role of XML: ontology language; future: data transport language
• e.g. find all map units containing paleozoic, sedimentary rocks
[Figure: mediator, two wrappers (global ↔ local) and the ontology, with English and French sources (sandstone, arenite, slate; grès, arénite); components labelled shape file, Java, JDBC and MapServer; ontology labelled XML, RDF, OWL]
Conclusions: XML is not a silver bullet for interoperability
• XML provides a framework to specify syntax and structure for transport and archival in semantic mediated systems
• XML is one component of a geoscience interoperability solution
• Other components include:
  • shared syntax (vocabulary): a geoscience lexicon
  • shared semantics (concepts, classifications): geoscience concepts
  • shared structure (schema): geoscience relations
  • shared pragmatics (usage): geoscience inference
  • shared systems: e.g. GEON Geology Ontology Workbench
• Ontology development is a priority (lexicon, concepts, relations)