810 likes | 1.02k Views
Ontology Engineering: Tools and Methodologies. Ian Horrocks <horrocks@cs.man.ac.uk> Information Management Group School of Computer Science University of Manchester. Tutorial Resources. http://www.cs.man.ac.uk/~horrocks/nsd07/. Ontologies. Ontology: Origins and History.
E N D
Ontology Engineering: Tools and Methodologies Ian Horrocks <horrocks@cs.man.ac.uk> Information Management Group School of Computer Science University of Manchester
Tutorial Resources http://www.cs.man.ac.uk/~horrocks/nsd07/
Ontology: Origins and History • In Philosophy, fundamental branch of metaphysics • Studies “being” or “existence” and their basic categories • Aims to find out what entities and types of entities exist
Ontology in Information Science • An ontology is an engineering artefact consisting of: • A vocabulary used to describe (a particular view of) some domain • An explicit specification of the intended meaning of the vocabulary. • Often includes classification based information • Constraints capturing background knowledge about the domain • Ideally, an ontology should: • Capture a shared understanding of a domain of interest • Provide a formal and machine manipulable model
OWL History • Semantic Web led to requirement for a “web ontology language” • set up Web-Ontology (WebOnt) Working Group • WebOnt developed OWL language • OWL based on earlier languages RDF, OIL and DAML+OIL • OWL now a W3C recommendation(i.e., a standard) • OWL is a family of 3 languages: OWL Lite, OWL DL and OWL Full • OIL, DAML+OIL and OWL (DL & Lite) based on Description Logics • Many OWL DL/Lite tools & ontologies • Relatively few OWL Full tools or ontologies
HappyParent´Parent u8hasChild.(Intelligent t Athletic) HappyParent´Parent u8hasChild.(Intelligent t Athletic) HappyParent´Parent u8hasChild.(Intelligent tAthletic) HappyParent´Parentu8hasChild.(Intelligentt Athletic) What Are Description Logics? • A family of logic based Knowledge Representation formalisms • Descendants of semantic networks and KL-ONE • Describe domain in terms of concepts (classes), roles (properties, relationships) and individuals • Operators allow for composition of complex concepts • Names can be given to complex concepts, e.g.: HappyParent´Parent u8hasChild.(Intelligent t Athletic)
Animal Mat Black IS-A Felix has-color Cat IS-A IS-A sits-on Why (Description) Logic? • OWL exploits results of 15+ years of DL research • Well defined (model theoretic) semantics [Quillian, 1967]
Why (Description) Logic? • OWL exploits results of 15+ years of DL research • Well defined (model theoretic) semantics • Formal properties well understood (complexity, decidability) I can’t find an efficient algorithm, but neither can all these famous people. [Garey & Johnson. Computers and Intractability: A Guide to the Theory of NP-Completeness. Freeman, 1979.]
Why (Description) Logic? • OWL exploits results of 15+ years of DL research • Well defined (model theoretic) semantics • Formal properties well understood (complexity, decidability) • Known reasoning algorithms
Pellet CEL Why (Description) Logic? • OWL exploits results of 15+ years of DL research • Well defined (model theoretic) semantics • Formal properties well understood (complexity, decidability) • Known reasoning algorithms • Implemented systems (highly optimised) KAON2
Why the Strange Names? • Description Logics are a family of KR formalisms • Mainly distinguished by available operators • Available operators indicated by letters in name, e.g., S : basic DL (ALC) plus transitive roles (e.g., ancestor R+) H : role hierarchy (e.g., hasDaughter v hasChild) O : nominals/singleton classes (e.g., {Italy}) I : inverse roles (e.g., isChildOf ´ hasChild–) N : number restrictions (e.g., >2hasChild, 63hasChild) • Basic DL + role hierarchy + nominals + inverse + NR = SHOIN • The basis for OWL-DL • SHOIN is very expressive, but still decidable (just) • Decidable we can build reliable tools and reasoners
Why (Description) Logic? • Foundational research was crucial to design of OWL • Informed Working Group decisions at every stage, e.g.: • “Why not extend the language with feature x, which is clearly harmless?” • “Adding x would lead to undecidability - see proof in […]”
Class/Concept Constructors • C is a concept (class); P is a role (property); x is an individual name • XMLS datatypes as well as classes in 8P.C and 9P.C • Restricted form of DL concrete domains
Knowledge Base / Ontology • A TBox is a set of “schema” axioms (sentences), e.g.: {Parent v Person u>1hasChild, HappyParent´Parent u8hasChild.(Intelligent t Athletic)} • An ABox is a set of “data” axioms (ground facts), e.g.: {John:HappyParent, John hasChild Mary} • An OWL ontology is just a SHOIN KB
OWL RDF/XML Exchange Syntax <owl:Class> <owl:intersectionOf rdf:parseType=" collection"> <owl:Class rdf:about="#Parent"/> <owl:Restriction> <owl:onProperty rdf:resource="#hasChild"/> <owl:allValuesFrom> <owl:unionOf rdf:parseType=" collection"> <owl:Class rdf:about="#Intelligent"/> <owl:Class rdf:about="#Athletic"/> </owl:unionOf> </owl:allValuesFrom> </owl:Restriction> </owl:intersectionOf> </owl:Class> E.g., Parent u8hasChild.(Intelligent t Athletic):
Why Ontology Reasoning? • Given key role of ontologies in many applications, it is essential to provide tools and services to help users: • Design and maintain high quality ontologies, e.g.: • Meaningful— all named classes can have instances
Why Ontology Reasoning? • Given key role of ontologies in many applications, it is essential to provide tools and services to help users: • Design and maintain high quality ontologies, e.g.: • Meaningful— all named classes can have instances • Correct— captures intuitions of domain experts
Why Ontology Reasoning? • Given key role of ontologies in many applications, it is essential to provide tools and services to help users: • Design and maintain high quality ontologies, e.g.: • Meaningful— all named classes can have instances • Correct— captures intuitions of domain experts • Minimally redundant— no unintended synonyms Banana split Banana sundae
Why Ontology Reasoning? • Given key role of ontologies in many applications, it is essential to provide tools and services to help users: • Design and maintain high quality ontologies, e.g.: • Meaningful— all named classes can have instances • Correct— captures intuitions of domain experts • Minimally redundant— no unintended synonyms • Answer queries, e.g.: • Find more general/specific classes • Retrieve individuals/tuples matching a given query
e-Science • E.g., Open Biomedical Ontologies Consortium (GO, MGED) • Used, e.g., for “in silico” investigations relating theory and data • E.g., relating data on phosphatases to (model of) biological knowledge
CentralSulcus Frontal Lobe Parietal Lobe Occipital Lobe Temporal Lobe Lateral Sulcus Medicine • Building/maintaining terminologies such as Snomed, NCI, Galen and FMA • Used, e.g., for semi-automated annotation of MRI images
Organising Complex Information • E.g., UN-FAO, NASA, Ordnance Survey, General Motors, Lockheed Martin, …
Organising Complex Information • E.g., UN-FAO, NASA, Ordnance Survey, General Motors, Lockheed Martin, …
OWL Experiences and Directions • Workshop at ESWC’07 (Innsbruck, Austria) • Brings together users, implementors and researchers • Submissions include: • Enterprise Integration (Mitre) • Product development (Lockheed Martin) • Role based access control (NASA) • Healthcare (SNOMED) • Agriculture and fisheries (UN Food & Agriculture Organization) • Oral Medicine (Chalmers) • …
Ontology Engineering Tasks • Typical tasks in Ontology Engineering: • author concept descriptions • refine the ontology • manage errors • integrate different ontologies • (partially) reuse ontologies • These tasks are highly challenging; need for: • tool & infrastructure support • design methodologies
Tools and Infrastructure • Editors/environments • Protégé, Swoop, TopBraid Composer, Construct, Ontotrack, …
CEL Tools and Infrastructure • Editors/environments • Oiled, Protégé, Swoop, Construct, Ontotrack, … • Reasoning systems • Cerebra, FaCT++, Kaon2, Pellet, Racer, … Pellet KAON2
Entity Endurant Perdurant Quality Substantial Event Stative Achievement Accomplishment Tools and Infrastructure • Editors/environments • Oiled, Protégé, Swoop, Construct, Ontotrack, … • Reasoning systems • Cerebra, FaCT++, Kaon2, Pellet, Racer, … • Design methodologies • Modularity, foundational ontologies, etc.
Development Environments • Most widely used free to download toos are • Protégé (Stanford / Manchester) -- be sure to get v4.x • Swoop (UMD / Clark & Parsia) • Commercial tools include • TopBraid, RacerPro, … • Facilities typically include • Range of display modes and editing features • Visualisation • Consistency and subsumption checking • Useful extras may include • Debugging and explanation • Repair • Integration and/or partitioning http://code.google.com/p/swoop/ http://protege.stanford.edu/
Demo Ontologies • GALEN • http://www.cs.man.ac.uk/~horrocks/OWL/Ontologies/galen.owl • NCI • http://www.mindswap.org/2003/CancerOntology • Tambis • http://www.cs.man.ac.uk/~horrocks/OWL/Ontologies/tambis.owl
GALEN • Ontology about medical terms and surgical procedures. • Work started in the 90s within the OpenGALEN project. • Main applications: • Integration of clinical records, and • decision support. • GALEN: • is very large (~35,000 concepts), • is fairly expressive (SHIF description logic), • has not been classified yet by any DL reasoner • We will look at a smaller version, which: • is still large (~3,000 concepts), • is similarly expressive as full GALEN, • was first classified by the FaCT system.
GALEN: The Ontology at a Glance • Size: • ~ 3,000 classes • ~ 500 object properties • no individuals or datatypes • Expressivity • ~350 General Concept Inclusion Axioms (GCIs). • Concept constructors: • Conjunction (intersectionOf) • Existential restrictions (someValuesFrom) • 150 functional properties • 26 transitive properties
GALEN: The (Unclassified) Hierarchies • The class hierarchy: • Number of subsumption relations: 1,978 • Maximum depth of the tree: 13 • No multiple inheritance • The property hierarchy: • 4 properties with multiple inheritance
GALEN: Concept definitions and GCIs Concept definition • Axiom of the form A ´ C with: • A a concept name • C a (possibly complex) concept • A definition assigns a nameAto a complex concept C Some examples: LungPathology ´ Pathology u9 locativeAttribute.Lung RenalTransplant ´ Transplanting u9 actsOn.Kindney
GALEN: Concept definitions and GCIs Inclusion axioms: • Axioms of the form A v C: • A is a concept name • C is a possibly complex concept • Represent an incomplete (``partial’’) definition • Examples: XRayMachine v ImagingDevice Candida v Fungus u9 hasFunction.AerobicMetabolicProcess • In GALEN, some of these can be very complex: • check out the definitions of Knee Joint and Kidney!
GALEN: Concept definitions and GCIs General Concept Inclusion Axioms (GCIs) • Axioms of the form C ´ D • C,D can be complex • May describe general (background) knowledge about the ontology Examples: Secretion u9 actsSpecificallyOn.Leucocidin v 9 isFunctionOf.StraphilococcusAureus Transport u9 actsOn.Glucose u9 carriesFrom.Blood v 9 carriesTo.Cell
Classifying GALEN Ontology statistics (revisited): • Number of class subsumption relations: 6729 • 1978 of which are “told” and the rest inferred • Maximum depth of the class tree: 15 • As opposed to 13 in the case of the unclassified tree • Classes with multiple inheritance: 408 • All multiple inheritance relations have been inferred! • This was intended in the design of GALEN • Maximum depth of the property tree: 9 • No change with respect to the “told” tree • Properties with multiple inheritance: 4 • Again, no change with respect to the ``told’’ tree Reasoning is mostly performed on classes and not on properties
Modeling Choices • The “upper” part: • Composed of the domain-independent concepts and roles. • Examples: • TopCategory, DomainCategory, GeneralisedStructure… • Shallowly defined (mostly a taxonomy) • The “domain specific” part: • Examples: • Plant, LungPathology, … • Richly defined • Much more than just a taxonomy!
Inferred Knowledge A trivial subsumption: • Why is PathologicalCondition a subclass of DomainCategory? • Simply look at the definition of Pathological Condition! Another example: • Why is PathologicalBehavior a subclass of PathologicalCondition? • Look at the definition of both classes • Notice that Behavior is a subclass of DomainCategory A non-trivial subsumption: • Why is AchalasiaProcesses a PathologicalBodyProcesses?
Classifying GALEN • Simple and multiple inheritance • Focus, for example, on PathologicalBodyProcess • Navigate to its super-classes • Visualisation can be useful • In Swoop we can “Fly the mother ship”!
The NCI Ontology • Huge bio-medical ontology describing the Cancer domain • Maintained by dozens of domain experts • Contains information about: • genes, • diseases, • drugs, • research institutions, … All with a cancer-centric focus
NCI: The Ontology at a Glance • Size: • ~ 30.000 classes • ~ 70 object properties • no individuals or datatypes • Expressivity • Concept constructors: • Conjunction (intersectionOf) • Existential restrictions (someValuesFrom) • Axioms: • Definitions (no GCIs) • Domain and range of properties