ONTOLOGY-DRIVEN DISCOVERY OF SCIENTIFIC COMPUTATIONAL ENTITIES
Pearl Brazier
Department of Computer Science
University of Texas-Pan American
November 2, 2010
Outline
• Motivation
• Research Goals and Objectives
• Significance of Contribution
• Background Information and Context
• Research Efforts
  • GEO-SEED Architecture
  • Scientific Computational Entity Discovery Ontology
  • RDF Repository
  • Usability and Performance Studies
• Conclusions and Future Work
Motivation: Geosciences Web Services
• Web contains many scientific resources
  • Scientific data (sharing datasets, experimental results)
  • Geosciences web services metadata
• Resources are currently shared via
  • publication
  • human contact
  • web portals
• Metadata annotations needed to
  • assist collaboration
  • allow machine processing
Research Goal
To investigate an ontology-driven discovery approach that can be distributed on the Web and that can support the elicitation, documentation, and registration of computational entities and other resources
Cyberinfrastructure/e-science
• Supports building new types of scientific and engineering knowledge environments and organizations
• Supports modern in-silico experiments that can lead to important scientific discoveries through scientific data repositories, semantic mediation services, and scientific workflows
• Describes computationally intensive science, which is carried out in highly distributed network environments, or science that uses immense data sets
Web Technologies-1
• Web 2.0 Technologies
  • Include social networks and wiki technologies
  • Used by humans
• Semantic Web
  • Allows machines to understand the meaning of information on the Web
  • Used by machines and automated agents
  • Supports core standards such as RDF, SPARQL, OWL
<?xml version="1.0" encoding="utf-8" ?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:s="http://www.schemaweb.info/schemas/meta/rdf/" xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <s:Schema rdf:about="http://www.schemaweb.info/schema/SchemaDetails.aspx?id=62">
    <s:id>62</s:id>
    <s:name>Wine Ontology</s:name>
    <s:description xml:lang="en-gb">Sample ontology used in the OWL specification documents.</s:description>
    <s:namespace>http://www.w3.org/TR/2003/CR-owl-guide-20030818/wine#</s:namespace>
  </s:Schema>
</rdf:RDF>
Web Technologies-2
• Ontologies
  • Capture concepts and the relationships among them
  • Provide a standard vocabulary and classifications
• Web Services Metadata
  • WSDL
  • WSDL-S
  • OWL-S and SWSO
Objective 1
Define an ontology for scientific computational entities that supports the development of a repository and a system that can retrieve computational entities.
Activities:
• Define use cases for the ontology.
• Determine the essential elements of an ontology that documents the features and relationships used to identify computational entities and distinguish one from another.
Objective 2
Define an architecture that supports an ontology-driven approach.
Activities:
• Investigate efficient approaches for storing information.
• Investigate the relationships of registration, annotation, and knowledge extraction.
Objective 3
Evaluate the usability of a system based on the ontology-driven discovery approach.
Activities:
• Design and implement a prototype system based on the ontology-driven discovery approach.
• Conduct a usability study of the prototype system with computer scientists and geoscientists (novices and experts).
Objective 4
Evaluate the performance of a system based on the ontology-driven approach.
Activities:
• Design the schema and implement a relational RDF repository that supports efficient storage and querying of documented scientific computational entities based on the ontology-driven discovery approach.
• Run a simulation to analyze the performance of the system.
Research Contributions and Significance
• Designed a new Scientific Computational Entity Discovery Ontology
  • More comprehensive and domain-specific than existing discovery ontologies, enabling scientists to share their computational entities more easily
• Created a novel design for organizing the RDF data
  • Uses SPARQL queries for the RDF representation
  • Supports more efficient query evaluation
• Developed the GEO-SEED wiki using Web 2.0 and Semantic Web technologies
  • Supports discovery and sharing of scientific computational entities
GEO-SEED Scientific Computational Entity Discovery Ontology
Schema Mapping Strategies
• Five approaches to generate database schemas:
  • Schema-Oblivious
  • Schema-Aware
  • Data-Driven
  • User-Customizable
  • Hybrid
Schema-Oblivious (Triple Table)
Extracted RDF triples:
<:WS1> <rdf:type> <:WebService> .
<:WS1> <:describedBy> <:GP1> .
<:WS1> <:describedBy> <:QoSP1> .
<:GP1> <rdf:type> <:GeneralProfile> .
<:QoSP1> <rdf:type> <:QoSProfile> .
<:GP1> <:subject> <:Gridding> .
<:GP1> <:author> "Pearl Brazier" .
<:QoSP1> <:trust> "5" .
<:QoSP1> <:availability> "0.9" .
<:QoSP1> <:overallRating> "4" .
Table: Triple (subject, predicate, object)
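The schema-oblivious layout can be illustrated with a minimal sketch that loads the ten sample triples into a single three-column table. SQLite, the helper names `build_triple_table` and `objects_of`, and the prefix-style spelling of the terms are assumptions made for this illustration; the slides do not say which relational engine or encoding GEO-SEED uses.

```python
import sqlite3

# Sample triples copied from the slide, written without angle brackets.
TRIPLES = [
    (":WS1", "rdf:type", ":WebService"),
    (":WS1", ":describedBy", ":GP1"),
    (":WS1", ":describedBy", ":QoSP1"),
    (":GP1", "rdf:type", ":GeneralProfile"),
    (":QoSP1", "rdf:type", ":QoSProfile"),
    (":GP1", ":subject", ":Gridding"),
    (":GP1", ":author", "Pearl Brazier"),
    (":QoSP1", ":trust", "5"),
    (":QoSP1", ":availability", "0.9"),
    (":QoSP1", ":overallRating", "4"),
]


def build_triple_table():
    """Create one table that holds every triple, whatever its predicate."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Triple (subject TEXT, predicate TEXT, object TEXT)")
    conn.executemany("INSERT INTO Triple VALUES (?, ?, ?)", TRIPLES)
    return conn


def objects_of(conn, subject, predicate):
    """Return the objects of all triples with the given subject and predicate."""
    rows = conn.execute(
        "SELECT object FROM Triple WHERE subject = ? AND predicate = ? ORDER BY object",
        (subject, predicate),
    )
    return [row[0] for row in rows]


conn = build_triple_table()
print(objects_of(conn, ":WS1", ":describedBy"))
```

Every lookup goes through the same table, which is why this layout needs no schema knowledge but grows large as triples accumulate.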
Schema-Aware (Property Table)
Extracted RDF triples:
<:WS1> <rdf:type> <:WebService> .
<:WS1> <:describedBy> <:GP1> .
<:WS1> <:describedBy> <:QoSP1> .
<:GP1> <rdf:type> <:GeneralProfile> .
<:QoSP1> <rdf:type> <:QoSProfile> .
<:GP1> <:subject> <:Gridding> .
<:GP1> <:author> "Pearl Brazier" .
<:QoSP1> <:trust> "5" .
<:QoSP1> <:availability> "0.9" .
<:QoSP1> <:overallRating> "4" .
Tables shown: Property_type, Property_author, Property_describedBy
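A minimal sketch of the schema-aware layout, assuming one two-column table per predicate. The names Property_type, Property_author, and Property_describedBy come from the slide; the remaining table names, the (subject, object) column pair, and the use of SQLite are assumptions made by analogy.

```python
import sqlite3
from collections import defaultdict

# Sample triples copied from the slide, written without angle brackets.
TRIPLES = [
    (":WS1", "rdf:type", ":WebService"),
    (":WS1", ":describedBy", ":GP1"),
    (":WS1", ":describedBy", ":QoSP1"),
    (":GP1", "rdf:type", ":GeneralProfile"),
    (":QoSP1", "rdf:type", ":QoSProfile"),
    (":GP1", ":subject", ":Gridding"),
    (":GP1", ":author", "Pearl Brazier"),
    (":QoSP1", ":trust", "5"),
    (":QoSP1", ":availability", "0.9"),
    (":QoSP1", ":overallRating", "4"),
]


def property_table_name(predicate):
    """Map 'rdf:type' to 'Property_type' and ':author' to 'Property_author'."""
    return "Property_" + predicate.split(":")[-1]


def build_property_tables(triples):
    """Create one (subject, object) table per distinct predicate."""
    conn = sqlite3.connect(":memory:")
    grouped = defaultdict(list)
    for subject, predicate, obj in triples:
        grouped[property_table_name(predicate)].append((subject, obj))
    for table, rows in grouped.items():
        # Table names come from the fixed sample data above, not from user input.
        conn.execute(f"CREATE TABLE {table} (subject TEXT, object TEXT)")
        conn.executemany(f"INSERT INTO {table} VALUES (?, ?)", rows)
    return conn


conn = build_property_tables(TRIPLES)
tables = sorted(
    row[0] for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
)
print(tables)
print(conn.execute("SELECT subject, object FROM Property_author").fetchall())
```

Each table stays small, but a query that asks for all properties of a resource has to visit every property table.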
User-Customizable (Profile Tables)
Extracted RDF triples:
<:WS1> <rdf:type> <:WebService> .
<:WS1> <:describedBy> <:GP1> .
<:WS1> <:describedBy> <:QoSP1> .
<:GP1> <rdf:type> <:GeneralProfile> .
<:QoSP1> <rdf:type> <:QoSProfile> .
<:GP1> <:subject> <:Gridding> .
<:GP1> <:author> "Pearl Brazier" .
<:QoSP1> <:trust> "5" .
<:QoSP1> <:availability> "0.9" .
<:QoSP1> <:overallRating> "4" .
Tables shown: GeneralProfile, QoSProfile
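A minimal sketch of the profile-table layout, assuming one wide table per profile class with a column for each property, plus a describedBy link table. The table names GeneralProfile and QoSProfile come from the slide; the column names, the describedBy table shape, the helper `qos_profile`, and the use of SQLite are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE describedBy (service TEXT, profile TEXT);
    CREATE TABLE GeneralProfile (id TEXT PRIMARY KEY, subject TEXT, author TEXT);
    CREATE TABLE QoSProfile (
        id TEXT PRIMARY KEY, trust INTEGER, availability REAL, overallRating INTEGER
    );
    """
)
# The sample triples from the slide, regrouped by profile.
conn.executemany(
    "INSERT INTO describedBy VALUES (?, ?)",
    [(":WS1", ":GP1"), (":WS1", ":QoSP1")],
)
conn.execute("INSERT INTO GeneralProfile VALUES (?, ?, ?)", (":GP1", ":Gridding", "Pearl Brazier"))
conn.execute("INSERT INTO QoSProfile VALUES (?, ?, ?, ?)", (":QoSP1", 5, 0.9, 4))


def qos_profile(conn, service):
    """Fetch a service's whole quality-of-service profile with a single join."""
    return conn.execute(
        """
        SELECT q.id, q.trust, q.availability, q.overallRating
        FROM describedBy AS d
        JOIN QoSProfile AS q ON q.id = d.profile
        WHERE d.service = ?
        """,
        (service,),
    ).fetchone()


print(qos_profile(conn, ":WS1"))
```

All descriptors of one profile sit in one row, so retrieving them does not require a union over per-property tables.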
SPARQL Query
Retrieves the quality-of-service descriptors of the Web service :WS1:
SELECT ?profile ?pre ?obj
WHERE {
  :WS1 :describedBy ?profile .
  ?profile rdf:type :QoSProfile .
  ?profile ?pre ?obj .
}
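To show what the three triple patterns of this query select, here is a minimal sketch of a pattern matcher over the sample triples. It is a toy evaluator written for this illustration (the functions `match` and `evaluate` are hypothetical), not a SPARQL engine and not the GEO-SEED implementation.

```python
# Sample triples from the earlier slides, written without angle brackets.
TRIPLES = [
    (":WS1", "rdf:type", ":WebService"),
    (":WS1", ":describedBy", ":GP1"),
    (":WS1", ":describedBy", ":QoSP1"),
    (":GP1", "rdf:type", ":GeneralProfile"),
    (":QoSP1", "rdf:type", ":QoSProfile"),
    (":GP1", ":subject", ":Gridding"),
    (":GP1", ":author", "Pearl Brazier"),
    (":QoSP1", ":trust", "5"),
    (":QoSP1", ":availability", "0.9"),
    (":QoSP1", ":overallRating", "4"),
]

# The WHERE clause of the query; terms starting with "?" are variables.
QUERY = [
    (":WS1", ":describedBy", "?profile"),
    ("?profile", "rdf:type", ":QoSProfile"),
    ("?profile", "?pre", "?obj"),
]


def match(pattern, triple, binding):
    """Extend `binding` so that `pattern` equals `triple`, or return None."""
    extended = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            if extended.setdefault(term, value) != value:
                return None  # variable already bound to a different value
        elif term != value:
            return None  # constant term does not match
    return extended


def evaluate(patterns, triples):
    """Join the patterns left to right, keeping only consistent bindings."""
    bindings = [{}]
    for pattern in patterns:
        next_bindings = []
        for binding in bindings:
            for triple in triples:
                extended = match(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings


rows = evaluate(QUERY, TRIPLES)
for row in rows:
    print(row["?profile"], row["?pre"], row["?obj"])
```

The first two patterns narrow ?profile to :QoSP1, and the third pattern then enumerates every property of that profile.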
Query Complexity Comparison
• Triple Table: two joins
  Triple ⋈ Triple ⋈ Triple
  Note: tables can get large
• Property Table: two joins
  describedBy ⋈ type ⋈ (trust ∪ reliability ∪ availability ∪ ⋯ ∪ userReview)
  Note: the union result is not indexed
  OR many joins:
  (describedBy ⋈ type ⋈ trust) ∪ (describedBy ⋈ type ⋈ reliability) ∪ (describedBy ⋈ type ⋈ availability) ∪ ⋯ ∪ (describedBy ⋈ type ⋈ userReview)
  Note: indexed, but recomputes (describedBy ⋈ type) many times
• Profile Table: one join
  describedBy ⋈ QoSProfile
Empirical Comparison of the Three Approaches
• Created a GEO-SEED dataset that describes 10,000 web services
• Defined six common queries using SPARQL
• Ran queries on PC with 3.00 GHz Intel Core 2 CPU, 4 GB RAM, 750 GB disk space running
• Evaluated the execution time
Performance Test Queries
• Find web services that implement a computational entity with the name "gridding"
• Find web services, along with their user reviews and overall quality-of-service ratings, that implement a computational entity "gridding"
• Find web services that implement a computational entity with the name "gridding" and that have trust ≥ 4 and availability ≥ 0.8 ratings
• Retrieve a general profile of a particular Web service
• Retrieve a quality-of-service profile of a particular Web service
• Retrieve quality-of-service profiles of two Web services
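As an illustration of the third test query, the sketch below runs it as SQL over a profile-table layout. Everything specific here is an assumption: the column names, the mapping of "implements a computational entity named gridding" onto a GeneralProfile.subject column, the four invented services, and the use of SQLite. It is not the benchmark code or dataset from the study.

```python
import sqlite3


def build_demo_db():
    """Build a tiny hypothetical repository with four invented services."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE describedBy (service TEXT, profile TEXT);
        CREATE TABLE GeneralProfile (id TEXT PRIMARY KEY, subject TEXT);
        CREATE TABLE QoSProfile (id TEXT PRIMARY KEY, trust INTEGER, availability REAL);
        """
    )
    services = [
        # (service, subject, trust, availability), all invented for this demo
        (":WS1", "gridding", 5, 0.9),
        (":WS2", "gridding", 3, 0.95),
        (":WS3", "gridding", 4, 0.7),
        (":WS4", "contouring", 5, 0.99),
    ]
    for i, (service, subject, trust, availability) in enumerate(services, start=1):
        general_id, qos_id = f":GP{i}", f":QoSP{i}"
        conn.executemany(
            "INSERT INTO describedBy VALUES (?, ?)",
            [(service, general_id), (service, qos_id)],
        )
        conn.execute("INSERT INTO GeneralProfile VALUES (?, ?)", (general_id, subject))
        conn.execute("INSERT INTO QoSProfile VALUES (?, ?, ?)", (qos_id, trust, availability))
    return conn


QUERY_3 = """
    SELECT dg.service
    FROM GeneralProfile AS g
    JOIN describedBy AS dg ON dg.profile = g.id
    JOIN describedBy AS dq ON dq.service = dg.service
    JOIN QoSProfile AS q ON q.id = dq.profile
    WHERE g.subject = ? AND q.trust >= ? AND q.availability >= ?
    ORDER BY dg.service
"""


def trusted_services(conn, subject, min_trust, min_availability):
    """Services with the given subject that meet both quality thresholds."""
    rows = conn.execute(QUERY_3, (subject, min_trust, min_availability))
    return [row[0] for row in rows]


print(trusted_services(build_demo_db(), "gridding", 4, 0.8))
```

Because trust and availability are typed columns of one row, both thresholds are checked with plain comparisons in a single WHERE clause.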
Overview
• GEO-SEED consists of two components: Wiki and RDF repository
• Wiki serves as a collaborative environment for knowledge sharing of geosciences web services
  • Provides interface for human interaction
• RDF repository serves as a metadata database readily accessible by machines and automated agents
Conclusions
• GEO-SEED architecture supports a new-generation Web portal
  • Metadata repository for scientific computational entities in geosciences for sharing and discovery
• Ontology-driven profiles approach supports usability for
  • Humans
  • Machines
• Unique user-customizable profile table design for storing the RDF data allows efficient queries of large metadata collections
Future Work
• Explore user-guided metadata extraction algorithms for the Wiki
• Explore coupling GEO-SEED with an existing scientific workflow management system (SWFMS)
• Extend the project to support annotation and discovery of scientific workflows and datasets in geosciences
• Refine the prototype to address user interface issues
Thank You! Questions?
[Photos: Spring 2010; Summer 2010; Cactus from El Paso, 2005]
UTEP Computer Science Dissertation Defense
Presentations
• Abraham, John, Brazier, Pearl, Chebotko, Artem, Navarro, Jaime, and Piazza, Anthony, "Distributed Storage and Querying Techniques for a Semantic Web of Scientific Workflow Provenance", in Proc. of the 7th IEEE International Conference on Services Computing (SCC'10), Miami, Florida, USA, July 5-10, 2010. Acceptance rate: 18%.
• Brazier, Pearl, Chebotko, Artem, Gonzalez, Eric, Kashlev, Andrey, and Piazza, Anthony, "Supporting Geosciences Web Services Metadata Management and Discovery", in Proc. of the 7th IEEE International Conference on Services Computing (SCC'10), Miami, Florida, USA, July 5-10, 2010.
• Brazier, Pearl, Chebotko, Artem, Gates, Ann Q., Piazza, Anthony, and Salayandia, Leonardo, "Web 2.0 and Semantic Web Portal for Annotation and Discovery of Web Services in Geosciences", presented at and published in the 2009 International Conference on Semantic Web and Web Services (SWWS 2009), Las Vegas, Nevada, USA, July 13-16, 2009, CSREA Press.
• Brazier, Pearl, Chebotko, Artem, Gates, Ann Q., and Salayandia, Leonardo, "GEO-SEED: A Metadata Repository for Geosciences Web Service Discovery", presented at and published in the IEEE 2009 Third International Workshop on Scientific Workflows (SWF 2009), Los Angeles, CA, USA, July 6-10, 2009.
Research Efforts: Usability Study
Usability Study Overview
• 31 invitations sent to Geology faculty and students and Computer Science faculty and students (17 responses)
• Steps in study:
  • Register
  • Log in
  • Submit a computational entity
  • Search for a computational entity
  • Add a user rating for an entity
  • Complete a survey rating the experience
Overall, GEO-SEED would be a useful tool for sharing
Other Usability Study Results
Descriptive Statistical Analysis
• BINOM-DIST
  • Grouped responses into two groups
    • Strongly Disagree + Disagree
    • Agree + Strongly Agree
  • Checked whether p-values were below 0.05
• t Test
  • Used 4 groups
    • Strongly Disagree, Disagree, Agree, Strongly Agree
  • Checked whether p-values were below 0.05
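The two-group comparison can be sketched as a binomial tail probability. The 14-to-3 split of the 17 responses below is hypothetical, and the choice of a one-sided test against a 50/50 null hypothesis is an assumption; the slides do not give the per-question counts or the exact test setup.

```python
from math import comb


def binomial_upper_tail(successes, trials, p=0.5):
    """Return P(X >= successes) for X ~ Binomial(trials, p)."""
    return sum(
        comb(trials, k) * p**k * (1 - p) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# Hypothetical grouping of the 17 survey responses for one question:
# "Agree + Strongly Agree" versus "Strongly Disagree + Disagree".
agree, disagree = 14, 3
p_value = binomial_upper_tail(agree, agree + disagree)
print(f"p = {p_value:.4f}, below 0.05: {p_value < 0.05}")
```

With these invented counts the tail probability is 834/131072, about 0.0064, so the agreement would count as significant at the 0.05 level.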