Semantic Web and Retrieval of Scientific Data Semantics
Goran Soldar, University of Brighton, UK
Dan Smith, University of East Anglia, UK
Introduction
• Semantic Web
  • Introduced by Tim Berners-Lee
  • Data and resources are described, interchanged, and processed
  • Machine understanding of heterogeneous data
• Most search engines on the Web are oriented towards human use
• Finding and processing scientific data on the Web is a time-consuming process
Example
• Search: Web pages containing the word "temperature"
• Search engine: Google
• Search domain: www.cru.uea.ac.uk
• Results: 773 web pages
Introduction
• Inefficiency of traditional search
  • Humans have to browse through web pages
  • There is no guarantee that the wanted information will be found
• Preferred approach
  • Describe the semantics of data using the RDF/XML format
  • Store the data in a DBMS
  • Automatically retrieve the desired information based on users' requests
  • Enable client machines to learn the semantics of data described in RDF
Introduction
• Objectives of this work
  • To address the problem of extracting semantics from data files within the meteorology domain
  • To build the ontology for the meteorology domain
  • To create semantic cases with RDF Model/RDF Schema
  • To employ the DB2 DBMS as the data repository
  • To enhance a standard DBMS with an RDF Triple Engine
  • To manage the RDF graph structure with the RDF Triple Engine
RDF and Domain Ontology
• RDF is a framework for describing metadata
• It enables interoperability between machines by interchanging information about information resources
• It is represented as a directed labelled graph
[Figure: RDF structure. A resource (subject), File, is linked by a property (predicate), Name, to a value (object), ltgrid.dat.]
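As a minimal illustration (in Python, chosen here only for brevity), an RDF statement can be held as a (subject, predicate, object) tuple, and a set of statements becomes a directed labelled graph once each subject is mapped to its outgoing (property, value) edges. The name `to_graph` is hypothetical and not part of the authors' system.

```python
from collections import defaultdict

# A statement is a (subject, predicate, object) triple:
# (resource, property, value) in the terminology of the slides.
Triple = tuple[str, str, str]


def to_graph(triples: list[Triple]) -> dict[str, list[tuple[str, str]]]:
    """Build a directed labelled graph as an adjacency list.

    Each subject node maps to its outgoing edges, where an edge is a
    (property label, target node) pair.
    """
    graph: dict[str, list[tuple[str, str]]] = defaultdict(list)
    for subject, predicate, obj in triples:
        graph[subject].append((predicate, obj))
    return dict(graph)


# The statement from the "RDF structure" figure: File --Name--> ltgrid.dat
print(to_graph([("File", "Name", "ltgrid.dat")]))
# {'File': [('Name', 'ltgrid.dat')]}
```

Keeping the edge label alongside the target node is what distinguishes this from a plain adjacency list: the property is part of the meaning of each edge.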
RDF and Domain Ontology
• Specific domains can be represented with RDF
• Our focus: the Meteorology domain
  • The concepts, their semantics and the relations between them are defined with RDF Schema
• Ontology: an explicit specification of an information domain
• RDF Schema
  • Uses the syntax of the RDF Model
  • Corresponds to XML's DTD or XML Schema
  • Is the basis for RDF instances
Modelling RDF Model for Meteorology
• Three phases of modelling
  • Development of the vocabulary (ontology)
  • Design of semantic cases to capture resource descriptions
  • Creation of semantic case instances
• The vocabulary comprises the main concepts of the domain, represented by RDF classes and properties
• RDF Schema uses the RDF Model encoding syntax
  • rdf:type distinguishes RDF classes (rdfs:Class) from properties (rdf:Property)
  • rdfs:subClassOf expresses inheritance relationships between RDF classes
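The two schema mechanisms above can be sketched over plain triples: rdf:type separates classes from properties, and rdfs:subClassOf is transitive, so inherited superclasses are found by following it repeatedly. This is a minimal Python sketch; the class hierarchy in `SCHEMA` is hypothetical and is not taken from the authors' Class.rdf or Property.rdf.

```python
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"

# Hypothetical vocabulary fragment: these class names are illustrative,
# not taken from the actual Class.rdf / Property.rdf files.
SCHEMA = [
    ("cru:Meteorology", RDF_TYPE, "rdfs:Class"),
    ("cru:Pressure", RDF_TYPE, "rdfs:Class"),
    ("cru:Height", RDF_TYPE, "rdfs:Class"),
    ("cru:Pressure", SUBCLASS_OF, "cru:Meteorology"),
    ("cru:Height", SUBCLASS_OF, "cru:Pressure"),
    ("cru:FormatType", RDF_TYPE, "rdf:Property"),
]


def typed_as(triples, rdf_class):
    """Subjects declared with rdf:type rdf_class."""
    return {s for s, p, o in triples if p == RDF_TYPE and o == rdf_class}


def superclasses(cls, triples):
    """Direct and inherited superclasses; rdfs:subClassOf is transitive."""
    found, frontier = set(), [cls]
    while frontier:
        current = frontier.pop()
        for s, p, o in triples:
            if s == current and p == SUBCLASS_OF and o not in found:
                found.add(o)
                frontier.append(o)
    return found


print(sorted(typed_as(SCHEMA, "rdfs:Class")))
# ['cru:Height', 'cru:Meteorology', 'cru:Pressure']
print(sorted(typed_as(SCHEMA, "rdf:Property")))
# ['cru:FormatType']
print(sorted(superclasses("cru:Height", SCHEMA)))
# ['cru:Meteorology', 'cru:Pressure']
```

The `found` set doubles as a visited set, so the traversal also terminates on a (malformed) cyclic hierarchy.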
Modelling RDF Model for Meteorology
• The Meteorology domain at cru.sys.uea.ac.uk:
  • Contains about 1000 data files
  • Is made up of 9 meteorological topics (sub-domains)
  • All sub-domains are designed as RDF classes
  • All concepts and elements are defined in its namespace
• The ontology is defined in two RDF files:
  • Class.rdf
  • Property.rdf
• Semantic cases are based on the existing vocabulary
  • Simple semantic cases are designed first
  • Complex cases are combinations of simple ones
Modelling RDF Model for Meteorology
• Our prototype model:
  • Describes 100 data sets
  • Contains 4 semantic cases
The semantic cases
• HeaderCase: URL, FormatType, DataParameter, Comment, Domain
• ObservationCase: Frequency, TimePeriod, Value
• PeriodCase: TimeRange, TimePeriod, Value
• SizeCase: Compression, FileSize, Value
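HeaderCase properties attach directly to the data file, whereas the MetInstance table in this section stores the size, observation and period cases as ordered rdf:Seq nodes (for example Compressed, Kilobyte, 2593 under cru:Height#genid2). A minimal Python sketch of generating such an instance follows; `case_instance`, `CASE_PROPERTY` and the counter-based id scheme are assumptions made for illustration, not the authors' code.

```python
import itertools

# Linking properties as they appear in the MetInstance table; the mapping
# from case name to property is an assumption.
CASE_PROPERTY = {
    "SizeCase": "cru:size",
    "ObservationCase": "cru:observation",
    "PeriodCase": "cru:period",
}


def case_instance(resource, case, members, node_prefix, ids):
    """Return the triples for one semantic case instance.

    The members are stored as an ordered rdf:Seq node whose id follows the
    '<prefix>#genidN' pattern; N is drawn from the shared counter `ids`.
    """
    node = f"{node_prefix}#genid{next(ids)}"
    triples = [(node, "rdf:type", "rdf:Seq")]
    # rdf:_1, rdf:_2, ... record the position of each member
    triples += [(node, f"rdf:_{i}", m) for i, m in enumerate(members, start=1)]
    triples.append((resource, CASE_PROPERTY[case], node))
    return triples


ids = itertools.count(2)  # the example table starts at genid2
data_file = "hgt.1958.1000.6h.w1.53x21.dat.gz"
for triple in case_instance(data_file, "SizeCase",
                            ["Compressed", "Kilobyte", "2593"], "cru:Height", ids):
    print(triple)
```

Sharing one counter across cases keeps node ids unique within a file, which matches the genid2, genid3, genid4 sequence in the table.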
Modelling RDF Model for Meteorology
<rdf:Description about="hgt.1958.1000.6h.w1.53x21.dat.gz">
  <cru:URL>http://www.cru.uea.ac.uk/cru/pressure/hgt/hgt1000_6h</cru:URL>
  <cru:FormatType>ASCII</cru:FormatType>
  <cru:DataParameter>GeopotentialHeight_AtPressure</cru:DataParameter>
  <rdfs:comment>6-Hourly GeopotentialHeight at 1000mb</rdfs:comment>
  <rdfs:domain>cru:Height</rdfs:domain>
</rdf:Description>
RDF instance of HeaderCase for a data file
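To show how an instance like the one above reduces to triples, here is a minimal Python sketch that uses the standard `xml.etree.ElementTree` parser instead of SiRPAC. The slide omits namespace declarations, so the rdf and rdfs URIs below are the standard W3C ones, while the `cru` URI is a hypothetical placeholder; only three properties are kept for brevity.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS_NS = "http://www.w3.org/2000/01/rdf-schema#"
CRU_NS = "http://www.cru.uea.ac.uk/schema#"  # hypothetical: the slide omits it
PREFIX = {RDF_NS: "rdf", RDFS_NS: "rdfs", CRU_NS: "cru"}

DOC = f"""<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:rdfs="{RDFS_NS}" xmlns:cru="{CRU_NS}">
  <rdf:Description about="hgt.1958.1000.6h.w1.53x21.dat.gz">
    <cru:FormatType>ASCII</cru:FormatType>
    <cru:DataParameter>GeopotentialHeight_AtPressure</cru:DataParameter>
    <rdfs:domain>cru:Height</rdfs:domain>
  </rdf:Description>
</rdf:RDF>"""


def qname(tag):
    """Convert ElementTree's '{uri}local' tag back to 'prefix:local'."""
    uri, _, local = tag[1:].partition("}")
    return f"{PREFIX[uri]}:{local}"


def description_triples(xml_text):
    """Flatten each rdf:Description into (subject, predicate, object) triples."""
    root = ET.fromstring(xml_text)
    triples = []
    for desc in root.iter(f"{{{RDF_NS}}}Description"):
        # Accept both rdf:about and the older unqualified 'about' used on the slide
        subject = desc.get(f"{{{RDF_NS}}}about") or desc.get("about")
        for prop in desc:
            triples.append((subject, qname(prop.tag), (prop.text or "").strip()))
    return triples


for triple in description_triples(DOC):
    print(triple)
```

The XML parser enforces well-formedness, and the loop discards the tags, leaving only (resource, property, value) statements.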
From RDF to Relational Model
• Our prototype model comprises 12 RDF files:
  • One holds the semantic case descriptions
  • Two hold the RDF Schema descriptions
  • Nine contain RDF instances of semantic cases
• Management of RDF-described data
  • The W3C does not recommend any method for manipulating RDF triples
  • RDF structure is similar to XML, but while XML comes with APIs for data manipulation (SAX, DOM), RDF does not
Modelling RDF Model for Meteorology
• We use the RDF triple structure to manipulate the data
  • XML parsers check the syntax of RDF
  • RDF parsers convert it into triples
  • RDF tags are removed
  • Triples are converted into the relational model
  • Triples are stored in the DB2 DBMS
[Figure: Mapping the RDF model for Meteorology into an RDBMS. Components: CRU meteorological domain, ontology, semantic cases, SiRPAC, RDF triples model, RDF Triple Engine, DB2.]
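The last two steps, converting triples into the relational model and storing them, can be sketched with Python's built-in `sqlite3` module standing in for DB2 (an assumption made so the example runs anywhere). The table name `MetInstance` and its Property, Resource, Value columns follow the relational table shown later in this section; `store_triples` is a hypothetical helper.

```python
import sqlite3

# SQLite stands in for DB2 here; the table name and column order follow the
# MetInstance table shown in this section (Property, Resource, Value).
DDL = """CREATE TABLE IF NOT EXISTS MetInstance (
    Property TEXT NOT NULL,
    Resource TEXT NOT NULL,
    Value    TEXT NOT NULL
)"""


def store_triples(conn, triples):
    """Insert (subject, predicate, object) triples as relational rows."""
    conn.execute(DDL)
    conn.executemany(
        "INSERT INTO MetInstance (Property, Resource, Value) VALUES (?, ?, ?)",
        [(predicate, subject, obj) for subject, predicate, obj in triples],
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
store_triples(conn, [
    ("hgt.1958.1000.6h.w1.53x21.dat.gz", "cru:FormatType", "ASCII"),
    ("hgt.1958.1000.6h.w1.53x21.dat.gz", "cru:size", "cru:Height#genid2"),
])
print(conn.execute("SELECT Property, Resource, Value FROM MetInstance").fetchall())
```

A single three-column table accepts any vocabulary without schema changes, at the cost of needing extra processing to recover the graph, which is the role the RDF Triple Engine takes on.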
Modelling RDF Model for Meteorology
[Figure: RDF architecture for retrieving semantic information. Components: users (Web interface, HTTP requests) and applications (Semantics Retrieval Language (SRL) requests); Java servlet engine on a Web server; RDF Triple Engine server; semantics support server with SiRPAC; DB2 accessed via JDBC, holding the RDF Model and RDF Schema; distributed data and information sources for meteorology (HTML, ASCII files, DBMS); components communicate over TCP/IP.]
Retrieval of Semantic Information
• The RDF Triple Engine (RTE) is responsible for manipulating triples and executing semantic queries
  • It is based on a client/server architecture with specialised RDF servers
• Records in the DBMS have a graph structure
  • They are not semantically atomic
  • Additional query processing has therefore been added to the RTE
• The RTE is aware of the graph structure of the triples
  • It can produce results that reconstruct the graph structure and present them in the format specified by users
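Because the rows are not semantically atomic, a query on a data file alone returns only its top-level rows; values such as cru:Height#genid4 are themselves subjects whose rows must also be followed. The Python sketch below shows one way to rebuild the graph under that reading: a value is treated as non-atomic when it also occurs as a subject. The helpers `index_by_subject` and `reconstruct` are hypothetical, not the RTE's actual interface.

```python
from collections import defaultdict


def index_by_subject(triples):
    """Group (subject, predicate, object) triples by subject."""
    index = defaultdict(list)
    for subject, predicate, obj in triples:
        index[subject].append((predicate, obj))
    return index


def reconstruct(resource, index, path=frozenset()):
    """Rebuild the subgraph rooted at `resource` as nested dicts.

    A value counts as non-atomic when it is itself the subject of other
    triples; `path` holds the nodes on the current branch, so cycles stop.
    If a property occurs twice on one node, the last value wins.
    """
    node = {}
    path = path | {resource}
    for predicate, value in index.get(resource, []):
        if value in index and value not in path:
            node[predicate] = reconstruct(value, index, path)
        else:
            node[predicate] = value
    return node


triples = [
    ("hgt.1958.1000.6h.w1.53x21.dat.gz", "cru:FormatType", "ASCII"),
    ("hgt.1958.1000.6h.w1.53x21.dat.gz", "cru:period", "cru:Height#genid4"),
    ("cru:Height#genid4", "rdf:type", "rdf:Seq"),
    ("cru:Height#genid4", "rdf:_1", "TimeRange"),
    ("cru:Height#genid4", "rdf:_2", "Year"),
    ("cru:Height#genid4", "rdf:_3", "1958"),
]
print(reconstruct("hgt.1958.1000.6h.w1.53x21.dat.gz", index_by_subject(triples)))
# {'cru:FormatType': 'ASCII', 'cru:period': {'rdf:type': 'rdf:Seq', 'rdf:_1': 'TimeRange', 'rdf:_2': 'Year', 'rdf:_3': '1958'}}
```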
Retrieval of Semantic Information
[Figure: RDF graph for the Weather domain]
Relational structure of the RDF graph:
Property | Resource | Value
domain | weather | temperature
frequency | temperature | daily
recorded | temperature | file
name | file | ltgrid.dat
size | file | size_id
url | file | www.cru.uea.ac.uk
value | size_id | 40
unit | size_id | Kb
Retrieval of Semantic Information
RDF instance "MetInstance" converted into a relational table:
Property | Resource | Value
cru:URL | hgt.1958.1000.6h.w1.53x21.dat.gz | http://www.cru.uea.ac.uk/cru/data/ncep/window1/6hourly/pressure/hgt/hgt1000_6h
cru:FormatType | hgt.1958.1000.6h.w1.53x21.dat.gz | ASCII
cru:DataParameter | hgt.1958.1000.6h.w1.53x21.dat.gz | GeopotentialHeight_AtPressure
rdfs:comment | hgt.1958.1000.6h.w1.53x21.dat.gz | 6-Hourly GeopotentialHeight at 1000mb
rdfs:domain | hgt.1958.1000.6h.w1.53x21.dat.gz | cru:Height
rdf:type | cru:Height#genid2 | rdf:Seq
rdf:_1 | cru:Height#genid2 | Compressed
rdf:_2 | cru:Height#genid2 | Kilobyte
rdf:_3 | cru:Height#genid2 | 2593
cru:size | hgt.1958.1000.6h.w1.53x21.dat.gz | cru:Height#genid2
rdf:type | cru:Height#genid3 | rdf:Seq
rdf:_1 | cru:Height#genid3 | Frequency
rdf:_2 | cru:Height#genid3 | Hour
rdf:_3 | cru:Height#genid3 | 6
cru:observation | hgt.1958.1000.6h.w1.53x21.dat.gz | cru:Height#genid3
rdf:type | cru:Height#genid4 | rdf:Seq
rdf:_1 | cru:Height#genid4 | TimeRange
rdf:_2 | cru:Height#genid4 | Year
rdf:_3 | cru:Height#genid4 | 1958
cru:period | hgt.1958.1000.6h.w1.53x21.dat.gz | cru:Height#genid4
Retrieval of Semantic Information
• The RTE relies on the SQL query processor to extract the relevant triples
• A prototype Semantics Retrieval Language (SRL) has been developed
  • Its syntax is similar to SQL
Example
DESCRIBE RESOURCE "hgt.1958.1000.6h.w1.53x21.dat.gz";
• Processing of the above SRL query
  • Step 1: Transform the query into a standard SQL statement and submit it to DB2:
    SELECT * FROM MetInstance
    WHERE RESOURCE = 'hgt.1958.1000.6h.w1.53x21.dat.gz';
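Step 1 can be sketched as a small translator that recognises the DESCRIBE RESOURCE form and emits a parameterised SELECT. The full SRL grammar is not given here, so the regular expression covers only the form shown in the example and is an assumption; `sqlite3` again stands in for DB2, and `srl_to_sql` is a hypothetical name.

```python
import re
import sqlite3

# Only the DESCRIBE RESOURCE form shown on the slide is recognised; the full
# SRL grammar is not given, so this pattern is an assumption.
DESCRIBE = re.compile(r'\s*DESCRIBE\s+RESOURCE\s+"([^"]+)"\s*;\s*', re.IGNORECASE)


def srl_to_sql(srl):
    """Translate DESCRIBE RESOURCE "<name>"; into a parameterised SELECT."""
    match = DESCRIBE.fullmatch(srl)
    if match is None:
        raise ValueError(f"unsupported SRL statement: {srl!r}")
    # A bound parameter avoids quoting problems with the resource name.
    return "SELECT * FROM MetInstance WHERE Resource = ?", (match.group(1),)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MetInstance (Property TEXT, Resource TEXT, Value TEXT)")
conn.execute(
    "INSERT INTO MetInstance VALUES (?, ?, ?)",
    ("cru:FormatType", "hgt.1958.1000.6h.w1.53x21.dat.gz", "ASCII"),
)
sql, params = srl_to_sql('DESCRIBE RESOURCE "hgt.1958.1000.6h.w1.53x21.dat.gz";')
print(conn.execute(sql, params).fetchall())
# [('cru:FormatType', 'hgt.1958.1000.6h.w1.53x21.dat.gz', 'ASCII')]
```

Binding the resource name as a parameter, rather than splicing it into the SQL text, keeps names that contain quotes from breaking the generated statement.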
Retrieval of Semantic Information
• Step 2: the RTE applies the following rules to generate XML as the output:
  1. Extract the namespace prefixes and generate the XML namespace declarations.
  2. For every (truly) atomic value, create an XML element named after its Property, with the value as its content.
  3. For every non-atomic value, create XML nodes as sub-elements of the resource in which it appears as a value.
  4. If a node is an rdf:Seq container, ensure that its members are output in order.
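The four rules can be sketched in Python with `xml.etree.ElementTree`, which writes namespace declarations for the registered prefixes that are actually used (rule 1). Rows fetched from a database carry no guaranteed order, so Seq members are sorted by the numeric part of rdf:_N, placing rdf:_10 after rdf:_9 (rule 4). The `cru` namespace URI and all function names are hypothetical, and the sketch assumes an acyclic subgraph, as in the MetInstance example.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Rule 1: namespace prefixes (the cru URI is a hypothetical placeholder).
NAMESPACES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "cru": "http://www.cru.uea.ac.uk/schema#",
}
for prefix, uri in NAMESPACES.items():
    ET.register_namespace(prefix, uri)


def clark(qname):
    """'prefix:local' -> ElementTree's '{uri}local' form."""
    prefix, local = qname.split(":", 1)
    return f"{{{NAMESPACES[prefix]}}}{local}"


def add_value(element, value, index):
    if value in index:  # Rule 3: non-atomic value -> nested sub-elements
        build(element, value, index)
    else:               # Rule 2: atomic value -> element text
        element.text = value


def build(parent, subject, index):
    """Append the properties of `subject` under `parent` (assumes no cycles)."""
    pairs = index.get(subject, [])
    if ("rdf:type", "rdf:Seq") in pairs:
        # Rule 4: emit Seq members ordered by their numeric rdf:_N index
        members = sorted(
            (p for p in pairs if p[0].startswith("rdf:_")),
            key=lambda p: int(p[0][len("rdf:_"):]),
        )
        seq = ET.SubElement(parent, clark("rdf:Seq"))
        for _, value in members:
            add_value(ET.SubElement(seq, clark("rdf:li")), value, index)
        return
    for predicate, value in pairs:
        add_value(ET.SubElement(parent, clark(predicate)), value, index)


def describe(resource, triples):
    """Render the subgraph rooted at `resource` as RDF/XML text."""
    index = defaultdict(list)
    for subject, predicate, obj in triples:
        index[subject].append((predicate, obj))
    root = ET.Element(clark("rdf:RDF"))
    desc = ET.SubElement(root, clark("rdf:Description"), {clark("rdf:about"): resource})
    build(desc, resource, index)
    return ET.tostring(root, encoding="unicode")


triples = [
    ("hgt.1958.1000.6h.w1.53x21.dat.gz", "cru:FormatType", "ASCII"),
    ("hgt.1958.1000.6h.w1.53x21.dat.gz", "cru:observation", "cru:Height#genid3"),
    ("cru:Height#genid3", "rdf:type", "rdf:Seq"),
    ("cru:Height#genid3", "rdf:_2", "Hour"),  # deliberately out of order
    ("cru:Height#genid3", "rdf:_1", "Frequency"),
    ("cru:Height#genid3", "rdf:_3", "6"),
]
print(describe("hgt.1958.1000.6h.w1.53x21.dat.gz", triples))
```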
Conclusion
• The RTE-DBS approach enables querying and retrieval of semantic information from scientific data files available on the Web
• The retrieved information can be further processed by a machine or used by humans
• Future work will build a user interface into the RTE for maintaining individual triples, preventing the removal of triples that act as nodes of the graph
• A method for identifying the semantics of data sets, based on reasoning over semantic cases, will be developed