Semantic Web Technologies and their usages in BMI. Jing Liu. CSE5095: Biomedical Informatics, Spring 2011. jing.liu@engr.uconn.edu. Computer Science & Engineering Department, University of Connecticut, Storrs, CT 06269.
Outline • Semantic Web Overview • Semantic Web Technologies • RDF • RDFS • OWL • SPARQL • SWRL (RIF) • Biomedical Informatics with Semantic Web • Translational Research with Semantic Web • Semantic PHRs • Knowledge-Driven Querying of Biomedical Data
Semantic Web Overview • Example: why the Semantic Web? • Search for articles written by “Tim Berners-Lee”. • A keyword search returns millions of results, most of which merely cite or refer to him.
Semantic Web Overview • What is the Semantic Web? Essentially, the Semantic Web is a web of data. • It is about two things: • Common formats for integrating and combining data drawn from diverse sources. • A language for recording how the data relates to real-world objects. • It is a set of technologies that supports identifying, representing, and reasoning across a wide range of data.
Semantic Web Overview Impact on areas: • Information management and discovery tools • Digital Libraries • Support for interaction between virtual communities and collaborations • E-learning methods and tools
Semantic Web Stack (layers, top to bottom) • User Interface and Applications • Trust • Proof • Unifying Logic • Rules: SWRL/RIF • Querying: SPARQL • Ontologies: OWL • Taxonomies: RDFS • Data Exchange: RDF • Syntax: XML • Identifiers: URI • Character Set: UNICODE • Cryptography (alongside the layers)
Semantic Web technologies - RDF • RDF stands for Resource Description Framework. • RDF is a language for representing: • information about resources in the World Wide Web. • metadata about Web resources. • Resources are things that can be identified on the Web, even when they cannot be directly retrieved on the Web. • RDF can be processed and exchanged by applications. • The W3C Recommendations for RDF are at: http://www.w3.org/RDF/
Identification and description in RDF • RDF identifies resources using URIs • A URI may be a URL, but not always • Anything that can be named via a URI is a resource • Resources are described in terms of simple properties and property values. • A property is a resource that has a name. - e.g. Author, Title, Mailbox • A property value is the value of the property. - e.g. mailto:em@w3.org is the value of the mailbox property - A property value can be another resource
RDF statements • RDF is intended to provide a simple way to make statements about Web resources • Each statement, also called a “triple”, consists of three parts: • Subject: the thing the statement describes. • Predicate: a specific property of the thing the statement describes. • Object: the thing the statement says is the value of this property. • Statements can be represented in • RDF Graph: illustrates RDF’s conceptual model • RDF/XML: an XML syntax for writing down and exchanging RDF graphs • Notation 3 (N3): a compact, human-readable text syntax
RDF Example (1) “there is a Person identified by http://www.w3.org/People/EM/contact#me, whose name is Eric Miller, whose email address is em@w3.org, and whose title is Dr."
RDF Example (2) • RDF/XML describing Eric Miller:
<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:contact="http://www.w3.org/2000/10/swap/pim/contact#">
  <contact:Person rdf:about="http://www.w3.org/People/EM/contact#me">
    <contact:fullName>Eric Miller</contact:fullName>
    <contact:mailbox rdf:resource="mailto:em@w3.org"/>
    <contact:personalTitle>Dr.</contact:personalTitle>
  </contact:Person>
</rdf:RDF>
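To make the link between the RDF/XML and the subject-predicate-object model concrete, here is a minimal sketch that reads the Eric Miller description and lists its triples. It uses only the Python standard library and handles just the constructs in this example (typed node elements, rdf:about, rdf:resource, literal text); it is not a general RDF/XML parser. The function names `uri` and `triples` are illustrative, and literals and URIs are both returned as plain strings.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# The RDF/XML from the slide, unchanged.
RDF_XML = """<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:contact="http://www.w3.org/2000/10/swap/pim/contact#">
  <contact:Person rdf:about="http://www.w3.org/People/EM/contact#me">
    <contact:fullName>Eric Miller</contact:fullName>
    <contact:mailbox rdf:resource="mailto:em@w3.org"/>
    <contact:personalTitle>Dr.</contact:personalTitle>
  </contact:Person>
</rdf:RDF>"""


def uri(tag):
    """Turn ElementTree's '{namespace}local' form into a full URI."""
    return tag[1:].replace("}", "", 1)


def triples(xml_text):
    """Return (subject, predicate, object) tuples for simple RDF/XML."""
    root = ET.fromstring(xml_text)
    result = []
    for node in root:  # each child of rdf:RDF is a typed node element
        subject = node.get(f"{{{RDF_NS}}}about")
        # The element name itself asserts the rdf:type of the subject.
        result.append((subject, RDF_NS + "type", uri(node.tag)))
        for prop in node:
            resource = prop.get(f"{{{RDF_NS}}}resource")
            # rdf:resource means the object is another resource;
            # otherwise the element text is a literal value.
            obj = resource if resource is not None else (prop.text or "")
            result.append((subject, uri(prop.tag), obj))
    return result


for s, p, o in triples(RDF_XML):
    print(s, p, o)
```

The four printed statements correspond to the type of the resource plus the three properties (full name, mailbox, personal title) described in the sentence of RDF Example (1).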
Semantic Web technologies - RDFS • RDFS stands for RDF Schema. • An extension of RDF. • RDF Schema provides a higher level of abstraction than RDF. • Describes classes • Describes properties • Describes relationships between classes and properties • It allows resources to be defined as instances of one or more classes. • Classes can be organized in a hierarchical fashion. • RDFS provides important semantic capabilities that are used by enhanced semantic languages like DAML, OIL and OWL.
RDFS: Describing Classes • To say that ex:MotorVehicle is a class, write: ex:MotorVehicle rdf:type rdfs:Class . • To create an instance of ex:MotorVehicle, write: exthings:companyCar rdf:type ex:MotorVehicle . • Convention: • class names start with an uppercase letter • property and instance names start with a lowercase letter • A resource may be an instance of more than one class. • Figure: a MotorVehicle class
RDFS: Defining Subclasses We might want to represent various specialized kinds of motor vehicle: ex:Van rdf:type rdfs:Class . ex:Truck rdf:type rdfs:Class . These statements only describe the individual classes. If we want to indicate their special relationship to class ex:MotorVehicle: ex:Van rdfs:subClassOf ex:MotorVehicle . ex:Truck rdfs:subClassOf ex:MotorVehicle .
Vehicle Hierarchy Example • This schema could also be described by the triples:
ex:MotorVehicle rdf:type rdfs:Class .
ex:PassengerVehicle rdf:type rdfs:Class .
ex:Van rdf:type rdfs:Class .
ex:Truck rdf:type rdfs:Class .
ex:MiniVan rdf:type rdfs:Class .
ex:PassengerVehicle rdfs:subClassOf ex:MotorVehicle .
ex:Van rdfs:subClassOf ex:MotorVehicle .
ex:Truck rdfs:subClassOf ex:MotorVehicle .
ex:MiniVan rdfs:subClassOf ex:Van .
ex:MiniVan rdfs:subClassOf ex:PassengerVehicle .
Vehicle Hierarchy in RDF/XML
<?xml version="1.0"?>
<!DOCTYPE rdf:RDF [<!ENTITY xsd "http://www.w3.org/2001/XMLSchema#">]>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xml:base="http://example.org/schemas/vehicles">
  <rdf:Description rdf:ID="MotorVehicle">
    <rdf:type rdf:resource="http://www.w3.org/2000/01/rdf-schema#Class"/>
  </rdf:Description>
  <rdf:Description rdf:ID="PassengerVehicle">
    <rdf:type rdf:resource="http://www.w3.org/2000/01/rdf-schema#Class"/>
    <rdfs:subClassOf rdf:resource="#MotorVehicle"/>
  </rdf:Description>
  ……
  <rdf:Description rdf:ID="MiniVan">
    <rdf:type rdf:resource="http://www.w3.org/2000/01/rdf-schema#Class"/>
    <rdfs:subClassOf rdf:resource="#Van"/>
    <rdfs:subClassOf rdf:resource="#PassengerVehicle"/>
  </rdf:Description>
</rdf:RDF>
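Because rdfs:subClassOf is transitive, the vehicle hierarchy implies more than it states: ex:MiniVan is a subclass of ex:MotorVehicle even though no triple says so directly. The following is a minimal sketch of that inference in plain Python, assuming the subclass triples are held as (subclass, superclass) pairs; the names `SUBCLASS_OF`, `superclasses`, and `inferred_types` are illustrative and not part of any RDF library.

```python
# rdfs:subClassOf edges from the vehicle hierarchy, as (subclass, superclass).
SUBCLASS_OF = {
    ("ex:PassengerVehicle", "ex:MotorVehicle"),
    ("ex:Van", "ex:MotorVehicle"),
    ("ex:Truck", "ex:MotorVehicle"),
    ("ex:MiniVan", "ex:Van"),
    ("ex:MiniVan", "ex:PassengerVehicle"),
}


def superclasses(cls, edges=SUBCLASS_OF):
    """Return every class reachable from cls by following rdfs:subClassOf."""
    found = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        for sub, sup in edges:
            if sub == current and sup not in found:
                found.add(sup)
                stack.append(sup)
    return found


def inferred_types(asserted_type):
    """An instance of a class is also an instance of all its superclasses."""
    return {asserted_type} | superclasses(asserted_type)


print(sorted(superclasses("ex:MiniVan")))
print(sorted(inferred_types("ex:Truck")))
```

A resource typed as ex:MiniVan would therefore also count as an ex:Van, an ex:PassengerVehicle, and an ex:MotorVehicle under RDFS semantics.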
An Instance of ex:MotorVehicle • Two ways to create an instance of ex:MotorVehicle: an explicit rdf:type property (companyCar) or a typed node element (anotherCar):
<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:ex="http://example.org/schemas/vehicles#">
  <rdf:Description rdf:ID="companyCar">
    <rdf:type rdf:resource="http://example.org/schemas/vehicles#MotorVehicle"/>
  </rdf:Description>
  <ex:MotorVehicle rdf:ID="anotherCar"> … </ex:MotorVehicle>
</rdf:RDF>
RDFS: Describing Properties • All properties in RDF are described as instances of class rdf:Property. e.g. exterms:weightInKg rdf:type rdf:Property . • rdfs:range is used to indicate that the values of a particular property are instances of a designated class. e.g. ex:Person rdf:type rdfs:Class . ex:author rdf:type rdf:Property . ex:author rdfs:range ex:Person . • rdfs:domain is used to indicate that a particular property applies to a designated class. e.g. ex:Book rdf:type rdfs:Class . ex:author rdf:type rdf:Property . ex:author rdfs:domain ex:Book . • rdfs:subPropertyOf is used to define a property hierarchy.
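In RDFS, rdfs:domain and rdfs:range are not validation constraints; they license inferences about the types of the subject and object of any statement that uses the property. Here is a minimal sketch of that inference, assuming triples are plain tuples of prefixed names. The data triple (exthings:primer and exthings:eric) is hypothetical and only serves the illustration.

```python
# Schema triples taken from the slide.
SCHEMA = [
    ("ex:author", "rdfs:domain", "ex:Book"),
    ("ex:author", "rdfs:range", "ex:Person"),
]

# Hypothetical instance data: some resource has some author.
DATA = [("exthings:primer", "ex:author", "exthings:eric")]


def infer_types(schema, data):
    """Derive rdf:type triples implied by rdfs:domain and rdfs:range."""
    inferred = set()
    for s, p, o in data:
        for prop, relation, cls in schema:
            if prop != p:
                continue
            if relation == "rdfs:domain":
                inferred.add((s, "rdf:type", cls))  # subject is in the domain
            elif relation == "rdfs:range":
                inferred.add((o, "rdf:type", cls))  # object is in the range
    return inferred


for triple in sorted(infer_types(SCHEMA, DATA)):
    print(triple)
```

Nothing in the data says that exthings:primer is a book or that exthings:eric is a person; both facts follow from the schema alone.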
Limitations of RDFS • RDFS defines no built-in primitive data types such as integer; typed values rely on externally defined datatypes (e.g. XML Schema datatypes), and untyped literals are plain strings. • No standard for expressing relations of properties (unique, transitive, inverse etc.) • No standard for expressing whether enumerations are closed. • No standard to express equivalence, disjointness etc. among properties.
Ontologies • RDFS is useful, but does not meet every requirement. • Complex applications may need more: • characterization of properties • identification of objects with different URIs • disjointness or equivalence of classes • constructing classes, not only naming them • letting a program reason about terms • and more…
Semantic Web technologies - OWL • OWL stands for Web Ontology Language. • The current version of OWL, also referred to as “OWL 2”, was published in 2009. • OWL is used to represent rich and complex knowledge about things, groups of things, and relations between things. • The three sublanguages of OWL (defined in the original OWL 1 specification): • OWL Lite • OWL DL • OWL Full • In addition to the RDF/RDFS vocabulary, it also allows us to express equivalence, identity, difference, inverse, and transitivity.
3 Dialects in OWL: Lite, DL, Full • OWL Full: • an extension of RDF • allows for classes as instances, modification of RDF and OWL vocabularies • OWL DL: • the part of OWL Full that fits in the Description Logic framework • known to have decidable reasoning • OWL Lite: • a subset of OWL DL • easier for frame-based tools to transition to • easier reasoning
Two Syntaxes for OWL • RDF/XML documents • OWL is part of the Semantic Web • OWL can be an extension of RDF • RDF applications can parse OWL • Abstract syntax • easier to read and write manually • corresponds more closely to Description Logics and Frames
How is OWL Used • Build an ontology • Create the ontology • Name classes and provide information about them • Name properties and provide information about them • State facts about a domain • Provide information about individuals • Reason about ontologies and facts • Determine consequences of what was built and stated
Creating Ontologies • Information in OWL is generally in an ontology • Ontology (dictionary sense): “a branch of metaphysics concerned with the nature and relations of being” • An ontology determines what is of interest in a domain and how information about it is structured • An OWL ontology is just a collection of information, generally mostly information about classes and properties • Abstract syntax: Ontology([name] ...) • Ontologies can include (import) information from other ontologies
OWL Components • What is an Instance? • An instance is an object. It corresponds to a description logic individual. • What is a Class? • e.g., person, pet, car • a collection of individuals (objects, things, ...) • a way of describing part of the world • an object in the world (OWL Full) • What is a Property? • e.g., has father, has pet • a collection of relationships between individuals (and data) • a way of describing relationships between individuals
Example Ontology
Class(pp:old+lady complete
  intersectionOf(pp:elderly pp:female pp:person))
Class(pp:old+lady partial
  intersectionOf(
    restriction(pp:has_pet allValuesFrom(pp:cat))
    restriction(pp:has_pet someValuesFrom(pp:animal))))
This ontology states: “Every old lady must have a pet cat.”
Semantic Web technologies - SPARQL • SPARQL stands for SPARQL Protocol and RDF Query Language (a recursive acronym). • A protocol: • A way of communication between parties that run SPARQL queries. • Defines a way of invoking the service. • Provides bindings to a transport protocol for that goal. • A standard RDF Query Language (QL): • A standard, expressive query language over the RDF data model. • A data access language. • Based on graph patterns. • More powerful than XML queries in some respects.
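The core idea behind SPARQL graph patterns can be sketched without a SPARQL engine: each triple pattern may contain variables (prefixed with "?"), and a solution is a set of variable bindings that makes every pattern match a triple in the graph. The toy matcher below is only an illustration of that idea, not an implementation of SPARQL; the graph data, the prefixes (ex:, dc:, foaf:), and the function names are assumptions made for the example. It revisits the opening question of finding articles written by Tim Berners-Lee rather than articles that merely mention him.

```python
# A small hypothetical graph of (subject, predicate, object) triples.
GRAPH = [
    ("ex:article1", "dc:creator", "ex:TimBL"),
    ("ex:article2", "dc:creator", "ex:SomeoneElse"),
    ("ex:article2", "ex:cites", "ex:article1"),
    ("ex:TimBL", "foaf:name", "Tim Berners-Lee"),
    ("ex:SomeoneElse", "foaf:name", "Jane Doe"),
]

# Roughly the pattern a SPARQL query would express as:
#   SELECT ?article WHERE {
#     ?article dc:creator ?who .
#     ?who foaf:name "Tim Berners-Lee" .
#   }
PATTERNS = [
    ("?article", "dc:creator", "?who"),
    ("?who", "foaf:name", "Tim Berners-Lee"),
]


def match(pattern, triple, binding):
    """Extend binding so pattern matches triple, or return None."""
    new = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable must keep the same value everywhere it appears.
            if new.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return new


def query(patterns, graph):
    """Return all bindings that satisfy every triple pattern."""
    bindings = [{}]
    for pattern in patterns:
        extended = []
        for binding in bindings:
            for triple in graph:
                result = match(pattern, triple, binding)
                if result is not None:
                    extended.append(result)
        bindings = extended
    return bindings


for solution in query(PATTERNS, GRAPH):
    print(solution["?article"])
```

Only ex:article1 is returned: ex:article2 cites it but has a different creator, which is exactly the distinction a keyword search cannot make.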
Semantic Web technologies - SWRL • SWRL stands for Semantic Web Rule Language. • SWRL is intended to be the rule language of the Semantic Web. • SWRL includes a high-level abstract syntax for Horn-like rules. • All rules are expressed in terms of OWL concepts (classes, properties, individuals). • A proposal to combine ontologies and rules: • Ontologies: OWL-DL • Rules: RuleML • Can work with reasoners.
SWRL Human Readable Syntax • In the SWRL syntax, a rule has the form: antecedent ⇒ consequent • Both antecedent and consequent are conjunctions of atoms written a1 ∧ ... ∧ an. • Variables are indicated using the standard convention of prefixing them with a question mark. • Example rule: parent(?x,?y) ∧ brother(?y,?z) ⇒ uncle(?x,?z) • Built-in relations that are functional can be written in functional notation, for example: ?x = op:numeric-add(3,?z)
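To show what a rule engine does with the uncle rule, here is a minimal forward-chaining sketch in Python. It hard-codes that single rule rather than parsing SWRL, and it reads parent(?x,?y) as "x has parent y" and brother(?y,?z) as "y has brother z". The individuals (alice, bob, carl, dave) and the function name are hypothetical.

```python
def apply_uncle_rule(facts):
    """Apply parent(?x,?y) AND brother(?y,?z) => uncle(?x,?z).

    facts is a set of (predicate, subject, object) atoms; the result
    contains the original facts plus every inferred uncle atom.
    """
    inferred = set(facts)
    for pred1, x, y in facts:
        if pred1 != "parent":
            continue
        for pred2, y2, z in facts:
            # The shared variable ?y must bind to the same individual.
            if pred2 == "brother" and y2 == y:
                inferred.add(("uncle", x, z))
    return inferred


FACTS = {
    ("parent", "alice", "bob"),    # alice has parent bob
    ("brother", "bob", "carl"),    # bob has brother carl
    ("brother", "dave", "carl"),   # unrelated to alice's parents
}

for atom in sorted(apply_uncle_rule(FACTS) - FACTS):
    print(atom)
```

One pass is enough here because the rule's consequent never feeds its own antecedent; a general engine would repeat rule application until no new facts appear.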
SWRLTab • A development environment for working with SWRL rules in Protégé-OWL. • It supports the editing and execution of SWRL rules. • Extension mechanisms to work with third-party rule engines. • Mechanisms for users to define built-in method libraries. • Supports querying of ontologies.
Translational Research with Semantic Web • Biomedical researchers and health care practitioners work together to exchange ideas, information, and knowledge across organizational, governance, socio-cultural, political, and national boundaries. • A significant barrier to translational research is the lack of uniformly structured data across related biomedical domains. • In applying research to cure and prevent diseases, an integrated understanding across subspecialties becomes essential.
How can the Semantic Web help BMI? (1) • The global scope of identifiers decreases the complexities caused by the proliferation of local identifiers. • Semantic Web technologies simplify the management and comprehension of relationships among the data. • RDFS and OWL offer some relief from the burden of understanding data schemas. • In a well-designed ontology, the structure itself can help guide users toward its correct use.
How can the Semantic Web help BMI? (2) • RDFS and OWL are flexible, extensible, and decentralized. They support hierarchical relationships. • Data built upon ontologies will be easier to link together than data that relies on ad-hoc solutions. • The ability to perform inference, classification, and consistency checking helps avoid inappropriate diagnosis and treatment.
Health Care and Life Sciences Interest Group (HCLSIG) • HCLSIG was set up within the framework of the World Wide Web Consortium (W3C). http://www.w3.org/wiki/HCLSIG • The mission of the Semantic Web for Health Care and Life Sciences Interest Group (HCLSIG) is to develop, advocate for, and support the use of Semantic Web technologies for biological science, translational medicine and health care. • It documents use cases to help individuals understand the business and technical benefits of using Semantic Web technologies.
Task Forces and their goals • BioRDF: Converting a number of life sciences data sources into RDF and OWL. • Ontologies: Facilitating creation, evaluation, and maintenance of core vocabularies and ontologies. • Drug safety and efficacy: • Identifying and addressing challenges. • Detecting, examining, and classifying signals of potential drug side effects and adverse reactions. • Data security and integrity. • Facilitating electronic submissions. • Adaptable clinical pathways and protocols (ACPP): Representing guidelines and protocols and reasoning over them. • Scientific publishing: Collecting publications, applying natural language processing to scientific text, developing tools.
Semantic PHRs • Current standard structures for personal health records (PHRs): • the XML schema of ASTM's CCR (Continuity of Care Record) • HL7’s CCD (Continuity of Care Document) • Disadvantages: • XML-based PHRs are document-centric, whereas health care data usage is often data-centric. • Computation capabilities are not provided. • Semantic PHRs were developed to address these disadvantages.
Semantic PHRs • Develop a personal health record ontology to describe the concepts of the domain in which PHRs take place. • To transform the XML schema into an OWL ontology: • Complex elements are transformed to OWL classes. • Simple elements are transformed to OWL data properties. • Element-attribute relationships are transformed to OWL data properties. • The relationships are transformed to class-to-class relationships.
Semantic PHRs • Figure: a PHR ontology
Semantic PHRs • In data storage, instance ontologies are represented by RDF elements. • XSLT (Extensible Stylesheet Language Transformations) is used to transform an XML document into RDF. • The PHRs can then be queried with query languages developed for RDF, e.g. SPARQL.
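The slides use XSLT for the XML-to-RDF step. As a rough stand-in that shows the same idea in Python, the sketch below turns a small, entirely hypothetical PHR fragment into triples and then selects values from them. The element names (phr, patient, name, medication), the example.org namespace, and the function name are assumptions for illustration; real CCR or CCD documents are far richer.

```python
import xml.etree.ElementTree as ET

# Hypothetical PHR fragment; not an actual CCR/CCD document.
PHR_XML = """<phr>
  <patient id="p1">
    <name>Jane Doe</name>
    <medication>Aspirin</medication>
    <medication>Metformin</medication>
  </patient>
</phr>"""

BASE = "http://example.org/phr#"  # hypothetical namespace


def phr_to_triples(xml_text):
    """Map each patient element to a resource and each child to a property."""
    triples = []
    for patient in ET.fromstring(xml_text).iter("patient"):
        subject = BASE + patient.get("id")
        triples.append((subject, "rdf:type", BASE + "Patient"))
        for child in patient:
            # Simple elements become data-valued properties of the patient.
            triples.append((subject, BASE + child.tag, child.text))
    return triples


TRIPLES = phr_to_triples(PHR_XML)

# Once the data is in triple form, selection is a matter of pattern matching.
medications = [o for s, p, o in TRIPLES if p == BASE + "medication"]
print(medications)
```

The mapping mirrors the schema-level rules: the complex element becomes a typed resource and its simple child elements become data properties.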
Semantic PHRs • A PHR instance ontology organizes PHR instances according to the ontology.
Semantic PHRs Summary • Transforming an XML document into RDF/XML elements.
Knowledge-Driven Querying of Biomedical Data • O'Connor et al. presented an end-to-end knowledge-based system built on Semantic Web technologies. Reference: Martin J. O'Connor, Ravi D. Shankar, Samson W. Tu, Csongor Nyulas, Dave Parrish, Mark A. Musen, Amar K. Das: “Using Semantic Web Technologies for Knowledge-Driven Querying of Biomedical Data”, AIME 2007: 267-276.
Knowledge-Driven Querying of Biomedical Data Background: • Biomedical applications have significant knowledge and information management requirements. • Very few current systems emphasize the knowledge requirements of day-to-day activities. • The schema design of these systems often reflects operational requirements. • Inconsistencies between knowledge-level concepts in system design and the corresponding operational data collected in a deployed system need to be overcome.
Knowledge-Driven Querying of Biomedical Data Limitations in Technologies: • OWL provides limited deductive reasoning capabilities. • Using RDF to store data at the back end of biomedical systems is still not practical. • Knowledge and data are kept separate, which creates a semantic gap.
Knowledge-Driven Querying of Biomedical Data Solution: • Specify the mapping of rows in relational tables to triples in an RDF model, which are then mapped to OWL classes, properties, and individuals. A tool built with Protégé-OWL accomplished this task. • Develop mapping software that works with a query engine so that queries written in SWRL can use data retrieved from a relational database.
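The row-to-triple mapping can be sketched as follows, assuming one common convention: each row becomes a resource identified by its primary key, the table supplies the class, and each remaining column becomes a property. This is not the mapping tool described in the paper; the table `visit`, its columns, the example.org base URI, and the function name are all hypothetical, and the sketch uses an in-memory SQLite database from the Python standard library.

```python
import sqlite3


def rows_to_triples(conn, table, key, base="http://example.org/db/"):
    """Map every row of a table to (subject, predicate, object) triples."""
    # The table name is interpolated directly; acceptable only for a demo
    # with trusted input.
    cursor = conn.execute(f"SELECT * FROM {table}")
    columns = [description[0] for description in cursor.description]
    triples = []
    for row in cursor:
        record = dict(zip(columns, row))
        subject = f"{base}{table}/{record[key]}"  # one resource per row
        triples.append((subject, "rdf:type", f"{base}{table}"))
        for column, value in record.items():
            if column != key and value is not None:
                triples.append((subject, f"{base}{table}#{column}", value))
    return triples


# Hypothetical operational data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE visit (id INTEGER PRIMARY KEY, patient TEXT, systolic INTEGER)"
)
conn.executemany(
    "INSERT INTO visit VALUES (?, ?, ?)",
    [(1, "p1", 150), (2, "p2", 118)],
)

TRIPLES = rows_to_triples(conn, "visit", "id")

# A knowledge-level question ("which visits show elevated systolic pressure?")
# can now be asked of the triples instead of the table; 140 is an arbitrary
# threshold chosen for the example.
elevated = [s for s, p, o in TRIPLES if p.endswith("#systolic") and o > 140]
print(elevated)
```

Keeping the data in the relational database and mapping it on demand, as the slides describe, avoids storing everything natively as RDF while still exposing it to ontology-level queries.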
Knowledge-Driven Querying of Biomedical Data Optimization techniques to improve the performance: • Adding built-in annotation ontology. • Re-writing SWRL queries. • Rule base level optimizations. • Standard database optimization techniques.