Ontology Best Practices: A Software Developer’s View
Software Development Today…
• Many different software languages and programs
• Wide variety of domains and architectures
• Each software language has a single syntax
• Children are writing software programs
• Software tools range from simplistic to very powerful
• Successful software programs are designed by a lead architect
• Support tools for software development are plentiful
• Software development has existed for a few decades
Ontologies in the Semantic Web Today…
• Many different ontologies, no two alike
• Different syntaxes (Turtle, RDF/XML, etc.)
• Ontology nuances are poorly understood, even by “experts”
• Very limited inferencing
  • Inferencing mostly limited to syllogisms
• Usually require an ontology specialist and a domain expert
• Frequently designed by committee
• The concept of ontologies has been around for hundreds of years
Why can’t we get it right?
Similarities between Ontologies and Software
• Ontologies define classes of things, e.g., Person
• Ontologies define properties associated with the classes of things
• Ontologies allow for Individuals: members of classes with particular property values
• Ontology classes inherit information from parent classes
• (Object-oriented) programming defines classes of things
• Programming defines properties and methods associated with each class
• Programming uses instances of classes with particular property values
• Program classes inherit information (data and methods) from super-classes
Ontology Example (RDF/XML syntax)
<owl:Class rdf:about="#Opera">
  <rdfs:subClassOf rdf:resource="#MusicDrama"/>
</owl:Class>
<owl:ObjectProperty rdf:ID="hasComposer">
  <rdfs:domain rdf:resource="#MusicDrama"/>
  <rdfs:range rdf:resource="#Composer"/>
</owl:ObjectProperty>
<owl:DatatypeProperty rdf:ID="numberOfActs">
  <rdf:type rdf:resource="&owl;FunctionalProperty"/>
  <rdfs:domain rdf:resource="#MusicDrama"/>
  <rdfs:range rdf:resource="&xsd;positiveInteger"/>
</owl:DatatypeProperty>
…
<Opera rdf:ID="Tosca">
  <hasComposer rdf:resource="#Giacomo_Puccini"/>
  <hasLibrettist rdf:resource="#Victorien_Sardou"/>
  <hasLibrettist rdf:resource="#Giuseppe_Giacosa"/>
  <hasLibrettist rdf:resource="#Luigi_Illica"/>
  <premiereDate rdf:datatype="&xsd;date">1900-01-14</premiereDate>
  <premierePlace rdf:resource="#Roma"/>
  <numberOfActs rdf:datatype="&xsd;positiveInteger">3</numberOfActs>
</Opera>
Software Example (Java)
import java.util.Date;
import java.util.HashSet;
import java.util.Set;

public class Opera extends MusicDrama {
    private Composer composer;
    // Multi-valued property: an opera may have several librettists.
    private Set<Librettist> librettistSet = new HashSet<Librettist>();
    private Date premiereDate;
    private City premierePlace;
    private int numberOfActs;

    public void setComposer(Composer theComposer) {
        composer = theComposer;
    }

    public void addLibrettist(Librettist librettist) {
        librettistSet.add(librettist);
    }

    // Mirrors the xsd:positiveInteger range of numberOfActs in the ontology.
    public void setNumberOfActs(int numActs) {
        if (numActs < 1) {
            throw new IllegalArgumentException("Number of Acts must be positive");
        }
        numberOfActs = numActs;
    }
    …
}
Challenge 1: Ontology Syntax & Language
• As of OWL2, there are 5 different syntaxes for OWL
• 3 different OWL levels (Lite, DL, Full), plus OWL2
• Ontologies can be specified in any level, any syntax
  • But tools are expected to handle them all
• Conversely, there is one Java language and syntax, one JavaScript syntax, etc.
  • Some earlier languages, e.g., Fortran, had different versions, but normalized to a single one
  • A Java compiler or toolchain is not expected to handle C or JavaScript programs
Reduce the number of languages, levels
Normalize on a common language
Language standardization leads to wider adoption
Challenge 2: Ontology Inclusion
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:owl="http://www.w3.org/2002/07/owl#"
    xmlns:dc="http://dublincore.org/documents/dcmi-namespace/"
    xmlns:foaf="http://xmlns.com/foaf/0.1/">
• A namespace reference is NOT the same as importing
  • To import another ontology, use <owl:imports …>
• Further confusion: namespace URLs do not necessarily indicate the location of the ontology definition
  • DublinCore (dc) links to a Namespace Policy Document
  • FOAF links to a Vocabulary Specification
• No well-developed system or suite of upper-level ontologies
  • Cannot easily find a basic ontology in a particular domain
  • Lots of reinventing the wheel…
  • Increases cost of implementation
  • Barrier to entry
  • Limits rate of adoption
Challenge 2: Ontology Inclusion, continued
• Conversely, importing references and definitions is simple in programming languages:
    package ontology.example;
    import java.text.SimpleDateFormat;
    import java.util.*;
• The package declaration defines the location of the defined class; imports define the path to imported class definitions
• Repositories for common libraries enable reuse (Apache, SourceForge, etc.)
  • Could be better organized
Make importing consistent, easier
Use path to ontology definition as namespace URL
Create repositories for upper-level ontologies
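The difference between declaring a namespace and importing an ontology can be sketched in Java. This is a minimal illustration, assuming a hypothetical `PrefixExpander` helper that is not part of any OWL or RDF library: a prefix declaration only rewrites names into full IRIs, and nothing is fetched or loaded.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper for illustration; not part of any OWL or RDF library.
class PrefixExpander {
    private final Map<String, String> prefixes = new HashMap<>();

    // Registers a prefix, as an xmlns:prefix="..." attribute does in RDF/XML.
    void declare(String prefix, String namespaceIri) {
        prefixes.put(prefix, namespaceIri);
    }

    // Expands "prefix:local" into a full IRI by plain string concatenation.
    // No network access or ontology loading takes place here.
    String expand(String prefixedName) {
        int colon = prefixedName.indexOf(':');
        if (colon < 0) {
            throw new IllegalArgumentException("No prefix in: " + prefixedName);
        }
        String namespaceIri = prefixes.get(prefixedName.substring(0, colon));
        if (namespaceIri == null) {
            throw new IllegalArgumentException("Undeclared prefix in: " + prefixedName);
        }
        return namespaceIri + prefixedName.substring(colon + 1);
    }

    public static void main(String[] args) {
        PrefixExpander expander = new PrefixExpander();
        expander.declare("foaf", "http://xmlns.com/foaf/0.1/");
        System.out.println(expander.expand("foaf:Person"));
    }
}
```

Loading the definitions behind that IRI is a separate step, which is the role an import statement plays.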
Challenge 3: Poor Documentation
• OWL standard stretched across many documents
  • Written by and for the standards committee
  • W3C documents more focused on syntactic correctness than usefulness
  • Confusing and intimidating to novices
• No consistent examples
  • Difficult to find OWL2 examples of Property Chain Inclusion in RDF/XML syntax
• Multiple techniques for defining the same thing, even within a particular syntax
  • Leaves developers wondering which one is best
• Steepens the learning curve
• Affects adoption rate
Challenge 3: Poor Documentation, continued
• Conversely, software development has huge support
  • Java community: many online tutorials (particularly at the official Java site), examples, etc.
  • Finding help is easy
• Well-defined documentation standards
  • Consistent style, recommendations
• Documentation at multiple levels:
  • Online tutorials
  • Code and library documentation (e.g., Javadoc)
  • Freeware as examples (or solutions)
Documentation and examples are sorely needed
Challenge 4: Unfriendly Naming Conventions & Style
• Some ontologies use incomprehensible class names, property names, or individuals
  • mesh:A01.378.800.667.430.705: what is it?
  • mesh:Thumb
  • Hinders reuse by related ontologies
  • Prevents adoption of ontology
• Many different ways to define/declare information about a Class
• Many different ways to organize contents of ontology
  • Classes then Properties
  • Properties related to classes near class definitions
Challenge 4: Unfriendly Naming Conventions & Style, continued
• Conversely, software developers usually follow style guides
  • Define naming conventions for classes (nouns, camel case with initial capital) and methods (verb phrases, camel case, initial lowercase)
  • Classes, methods, and variables should be named to indicate what they are
• Established style elements
  • e.g., the preferred way to use an iterator
  • Possibly reinforced by code sharing
  • Posting code publicly for help debugging it
• Common code organizational style
Class & Property names should indicate purpose
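A naming convention like this can be checked mechanically. The sketch below assumes a hypothetical `NamingStyle` checker whose patterns are modeled on the Java conventions just described; they are not an OWL standard, and a pattern can only test the form of a name, not whether it is meaningful.

```java
import java.util.regex.Pattern;

// Hypothetical style checker; the patterns are assumptions modeled on
// Java naming conventions, not a rule from any OWL specification.
class NamingStyle {
    // Upper camel case: initial capital, then letters or digits only.
    private static final Pattern CLASS_NAME = Pattern.compile("[A-Z][A-Za-z0-9]*");
    // Lower camel case: initial lowercase, then letters or digits only.
    private static final Pattern PROPERTY_NAME = Pattern.compile("[a-z][A-Za-z0-9]*");

    static boolean isReadableClassName(String localName) {
        return CLASS_NAME.matcher(localName).matches();
    }

    static boolean isReadablePropertyName(String localName) {
        return PROPERTY_NAME.matcher(localName).matches();
    }

    public static void main(String[] args) {
        System.out.println(isReadableClassName("Thumb"));
        System.out.println(isReadableClassName("A01.378.800.667.430.705"));
        System.out.println(isReadablePropertyName("hasComposer"));
    }
}
```

The MeSH tree number is rejected because it contains dots, while `Thumb` and `hasComposer` pass.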
Challenge 5: Awkward/Limited Properties
• Enumerated Datatypes
<owl:DatatypeProperty rdf:ID="tennisGameScore">
  <rdfs:range>
    <owl:DataRange>
      <owl:oneOf>
        <rdf:List>
          <rdf:first rdf:datatype="&xsd;integer">0</rdf:first>
          <rdf:rest>
            <rdf:List>
              <rdf:first rdf:datatype="&xsd;integer">15</rdf:first>
              <rdf:rest>
                <rdf:List>
                  <rdf:first rdf:datatype="&xsd;integer">30</rdf:first>
• Single Value Properties
  • Functional Property… or…
<owl:Class rdf:ID="Operetta">
  <rdfs:subClassOf rdf:resource="#MusicalWork"/>
  <rdfs:subClassOf>
    <owl:Restriction>
      <owl:onProperty rdf:resource="#hasLibrettist"/>
      <owl:minCardinality rdf:datatype="&xsd;nonNegativeInteger">1</owl:minCardinality>
    </owl:Restriction>
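For comparison, the enumerated range can be sketched as a Java enum. The names below are hypothetical, and since the OWL excerpt stops at 30, the value 40 is an assumption added to complete the illustration.

```java
// Sketch of the enumerated tennisGameScore range as a Java enum.
// The OWL excerpt stops at 30; FORTY is an assumption added here.
enum TennisGameScore {
    LOVE(0), FIFTEEN(15), THIRTY(30), FORTY(40);

    private final int points;

    TennisGameScore(int points) {
        this.points = points;
    }

    int points() {
        return points;
    }

    // Returns the constant for a numeric score, rejecting any other value.
    static TennisGameScore of(int points) {
        for (TennisGameScore score : values()) {
            if (score.points == points) {
                return score;
            }
        }
        throw new IllegalArgumentException("Not a valid game score: " + points);
    }
}
```

A field of type `TennisGameScore` is also single-valued by construction, so no separate functional-property or cardinality declaration is needed.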
Challenge 5: Awkward/Limited Properties
• No way to specify range of values on properties
• Defining quantities with units
<Measurement>
  <observedSubject rdf:resource="#JaneDoe"/>
  <observedPhenomenon rdf:resource="#Weight"/>
  <observedValue>
    <Quantity>
      <quantityValue rdf:datatype="&xsd;float">59.5</quantityValue>
      <quantityUnit rdf:resource="#Kilogram"/>
    </Quantity>
  </observedValue>
  <timeStamp rdf:datatype="&xsd;dateTime">2003-01-24T09:00:08+01:00</timeStamp>
</Measurement>
• No way to have values that vary over time
  • Person hasLocation ???
• Properties of properties would solve this
  • But it could create as many messes…
Make Properties more useful
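A value that varies over time is straightforward to model in software. The following is a minimal sketch, assuming a hypothetical `PersonLocationHistory` class and made-up location data; it keeps each location keyed by the instant it became valid.

```java
import java.time.Instant;
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

// Hypothetical class for illustration: a person whose location changes over time.
class PersonLocationHistory {
    // Maps the instant a location became valid to the location name.
    private final TreeMap<Instant, String> locations = new TreeMap<>();

    void recordLocation(Instant since, String location) {
        locations.put(since, location);
    }

    // Returns the most recent location recorded at or before the given instant,
    // or an empty Optional if nothing was recorded that early.
    Optional<String> locationAt(Instant when) {
        Map.Entry<Instant, String> entry = locations.floorEntry(when);
        return entry == null ? Optional.empty() : Optional.of(entry.getValue());
    }

    public static void main(String[] args) {
        PersonLocationHistory history = new PersonLocationHistory();
        // The dates and places are invented sample data.
        history.recordLocation(Instant.parse("2003-01-01T00:00:00Z"), "Roma");
        history.recordLocation(Instant.parse("2004-01-01T00:00:00Z"), "Cambridge");
        System.out.println(history.locationAt(Instant.parse("2003-06-01T00:00:00Z")));
    }
}
```

The timestamp here is effectively a property of the hasLocation property, which is the construct the slide says OWL lacks.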
Challenge 6: Open-World Assumption
“A curious thing about the ontological problem is its simplicity. It can be put in three Anglo-Saxon monosyllables: ‘What is there?’ It can be answered, moreover, in a word: ‘Everything.’” - Willard Van Orman Quine
• OWL uses the Open World Assumption (OWA): if a fact cannot be determined, it is undefined
• Individuals can potentially belong to multiple classes, even those which should be distinct
  • Contrary to normal human thought processes
  • “A Person cannot be a Car too!”
• Implementing distinct classes or property restrictions can be computationally expensive
  • Declaring N classes mutually distinct is an O(N²) operation
  • Adding a property restriction on a class creates more classes
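The quadratic growth comes from pairing every class with every other class: N classes yield N(N-1)/2 pairs. A small sketch, assuming a hypothetical `DisjointPairs` helper and class names borrowed from the earlier examples, enumerates the pairs that a pairwise declaration has to cover.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: lists the class pairs a pairwise-disjointness
// declaration has to cover. Not part of any OWL library.
class DisjointPairs {
    static List<String> pairs(List<String> classNames) {
        List<String> result = new ArrayList<>();
        for (int i = 0; i < classNames.size(); i++) {
            // Start at i + 1 so each unordered pair appears exactly once.
            for (int j = i + 1; j < classNames.size(); j++) {
                result.add(classNames.get(i) + " disjointWith " + classNames.get(j));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> classes = Arrays.asList("Person", "Car", "Opera", "Composer");
        for (String pair : pairs(classes)) {
            System.out.println(pair);
        }
    }
}
```

Four classes already produce six pairs, and one hundred classes would produce 4,950.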
Challenge 6: Open-World Assumption, continued
• OWA is reasonable for some domains, but not others
  • Causes severe challenges in inferencing
• Some software products allow turning off OWA
  • But this requires the developer to implicitly know ontology assumptions (OWA or not)
• Conversely, software programs define individuals explicitly as members of particular classes
  • Cannot be a member of a different class (except superclasses)
• Properties (class variables) have default values
  • Specifically either single or multi-valued
Support for exclusive classes, default values
Ontology could specify OWA/CWA handling
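The practical difference between the two assumptions shows up when a fact is missing. The sketch below assumes a hypothetical `FactStore` that holds plain strings; it is deliberately simplified and ignores that a real open-world reasoner can also derive that a statement is false.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical fact store for illustration; real reasoners are far more involved.
class FactStore {
    enum Answer { TRUE, FALSE, UNKNOWN }

    private final Set<String> facts = new HashSet<>();

    void assertFact(String fact) {
        facts.add(fact);
    }

    // Closed world: anything not asserted is taken to be false.
    Answer askClosedWorld(String fact) {
        return facts.contains(fact) ? Answer.TRUE : Answer.FALSE;
    }

    // Open world: anything not asserted is simply unknown.
    Answer askOpenWorld(String fact) {
        return facts.contains(fact) ? Answer.TRUE : Answer.UNKNOWN;
    }

    public static void main(String[] args) {
        FactStore store = new FactStore();
        store.assertFact("Tosca hasComposer Giacomo_Puccini");
        String missing = "Tosca hasLibrettist Luigi_Illica";
        System.out.println(store.askClosedWorld(missing));
        System.out.println(store.askOpenWorld(missing));
    }
}
```

A developer who expects the closed-world answer but gets the open-world one will see queries that appear to fail silently, which is why the slide asks for ontologies to state which handling they assume.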
Challenge 7: The Data Challenge
• Ontologies allow for Classes, Properties, Things
  • Confusing what is an Individual vs. a Class
  • OWL-Full specifically promotes this confusion!
• Some ontologies include a large number of Individuals in their OWL files
  • But we’re building for the Semantic WEB
  • No one wants to wait 10 minutes to download/access an ontology!
• Many ontologies separate the Assertions (ABox) from the Terminology (TBox)
  • ABox = individuals, TBox = classes, properties
  • But this isn’t part of the OWL standard!
Challenge 7: The Data Challenge, continued
• Software programs define classes, but create instances of those classes at run-time
• Data is typically stored outside of software programs and accessed at run-time
• The Model-View-Controller software pattern separates the Model (the data) from the Control (the logic manipulating the data) and the View (the presentation of the information)
• Databases: separate data (tables) from data model (schema)
Separate Individuals from Classes and Properties
Make Class/Property definitions web-accessible
Don’t put Individuals in the web-accessible OWL file
Access Individuals through SPARQL
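The recommended separation can be sketched as a filter over triples. This is a rough illustration, assuming a hypothetical `TboxAboxSplitter` and a deliberately short list of schema-level terms; a real ontology uses many more constructs than the few recognized here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical splitter; the vocabulary lists below are a simplifying
// assumption and cover only a few common schema-level terms.
class TboxAboxSplitter {
    static final class Triple {
        final String subject;
        final String predicate;
        final String object;

        Triple(String subject, String predicate, String object) {
            this.subject = subject;
            this.predicate = predicate;
            this.object = object;
        }
    }

    private static final Set<String> SCHEMA_PREDICATES = new HashSet<>(Arrays.asList(
            "rdfs:subClassOf", "rdfs:subPropertyOf", "rdfs:domain", "rdfs:range"));
    private static final Set<String> SCHEMA_TYPES = new HashSet<>(Arrays.asList(
            "owl:Class", "owl:ObjectProperty", "owl:DatatypeProperty"));

    // A triple is terminology if it describes a class or property.
    static boolean isTerminology(Triple t) {
        if (SCHEMA_PREDICATES.contains(t.predicate)) {
            return true;
        }
        return t.predicate.equals("rdf:type") && SCHEMA_TYPES.contains(t.object);
    }

    // Returns only the TBox triples (terminology == true) or only the ABox triples.
    static List<Triple> select(List<Triple> all, boolean terminology) {
        List<Triple> result = new ArrayList<>();
        for (Triple t : all) {
            if (isTerminology(t) == terminology) {
                result.add(t);
            }
        }
        return result;
    }
}
```

With the opera example, `Opera rdfs:subClassOf MusicDrama` would land in the small web-accessible TBox file, while `Tosca rdf:type Opera` would go to the triple store.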
Challenge 8: Inconsistent Reasoners
• Different reasoner implementations yield different results on the same ontology
  • OWA/CWA
  • Different rule implementations
  • Performance optimization
• Causes ontologies to be dependent on choice of reasoner
  • If you do not use the same reasoner as the ontology developer, you may not get expected results
  • Help forums filled with “try this reasoner instead…”
• No way for an ontology to specify its reasoner requirements
Challenge 8: Inconsistent Reasoners, continued
• In software development, when two different compilers or software versions give different answers, this is a BUG!
• Software testers develop test suites to verify proper functionality
• Testing typically evaluates as many aspects of the software program as possible
Need consistent reasoners
  • Validation suites
  • Document expected behavior
Need a mechanism that allows ontologies to define their reasoner requirements
Again, lack of standardization will limit adoption
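A validation suite for reasoners could follow the same shape as a software test suite: fixed input, documented expected conclusions, and a report of any difference. The sketch below assumes a hypothetical `Reasoner` interface that works on plain strings; it does not reflect the API of any existing reasoner.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical conformance check; the Reasoner interface is an assumption,
// not the API of any existing reasoner.
class ReasonerConformance {
    interface Reasoner {
        // Returns every statement the reasoner holds true for the given input.
        Set<String> infer(Set<String> statements);
    }

    // Expected conclusions the reasoner failed to derive.
    static Set<String> missing(Reasoner reasoner, Set<String> input, Set<String> expected) {
        Set<String> gap = new HashSet<>(expected);
        gap.removeAll(reasoner.infer(input));
        return gap;
    }

    // Conclusions the reasoner derived that the suite does not expect.
    static Set<String> unexpected(Reasoner reasoner, Set<String> input, Set<String> expected) {
        Set<String> extra = new HashSet<>(reasoner.infer(input));
        extra.removeAll(expected);
        return extra;
    }

    // A reasoner conforms on a test case when both difference sets are empty.
    static boolean conforms(Reasoner reasoner, Set<String> input, Set<String> expected) {
        return missing(reasoner, input, expected).isEmpty()
                && unexpected(reasoner, input, expected).isEmpty();
    }
}
```

Running every candidate reasoner against the same cases would turn "try this reasoner instead" into a reproducible list of which conclusions each one misses.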
Challenge 9: Ontology Development Tools
• Many different ontology development tools
  • Different tools support different syntaxes, ontology levels
• Ontology editors mostly come from commercial vendors
  • Protégé offers an open-source ontology editor
• Tools are mostly stand-alone applications
  • NeOn and TopBraid Composer use Eclipse, but as standalone applications
  • Tool installation frequently challenging
• Validation tools lacking
  • Need to validate whether an ontology is well-constructed
Ontology Tools: Protégé 4, TopBraid Composer, Xturtle
Challenge 9: Ontology Development Tools, continued
• Software developers have largely migrated to development platforms, particularly Eclipse
  • Easy to install, with automated updates
  • Automatically compiles and validates code as you edit
  • Freeware
• Eclipse can also include plugins for editing XML, HTML, connecting to databases, configuring web servers, and much more
• Why do OWL editors stand apart?
Integrate ontology editing with other editors
Would link ontology to other development tasks
Challenge 10: Inferencing and Reasoning
• Ontology inferencing very limited
  • OWL implies class-superclass inference: “if X is a truck then X is a vehicle”
• OWL2 supports object property chains
  • Limited: only allows particular kinds of chains, inference
  • Can do: Bob hasSister Jane, therefore Bob hasSibling Jane (hasSister is a subproperty of hasSibling)
  • Cannot do: Bob hasSibling Jane, AND Jane hasGender Female, therefore Bob hasSister Jane (property intersection); hasGender is a DatatypeProperty
  • Cannot do: Jane is Bob’s sister, therefore Jane hasGender Female (mixing Datatype, Object property)
• No boolean operations, comparators
  • Cannot do: Jane hasAge 12; if X hasAge < 18, then X isA Child; therefore Jane isA Child
Challenge 10: Inferencing and Reasoning, continued
• Performance challenges with more complex ontologies, rule sets
  • Challenges of forward & backward reasoning
  • OWL-Full nearly impossible to bound
• Conversely, software programs have an unlimited set of ways to enhance information
  • Only bound by algorithm complexity, designer’s creativity
  • Software programs automatically support superclass relations
Support wider range of inference rules
Provide scope for inferencing (no OWL-Full)
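The rules listed as "cannot do" are one-line methods in a program. The sketch below assumes a hypothetical `Person` class; Jane's age and gender come from the examples, while Bob's age and gender are invented to make the example complete.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical Person class expressing the comparator and intersection rules as code.
class Person {
    private final String name;
    private final int age;
    private final String gender;
    private final Set<Person> siblings = new HashSet<>();

    Person(String name, int age, String gender) {
        this.name = name;
        this.age = age;
        this.gender = gender;
    }

    String getName() {
        return name;
    }

    void addSibling(Person sibling) {
        siblings.add(sibling);
    }

    // Comparator rule: if X hasAge < 18 then X isA Child.
    boolean isChild() {
        return age < 18;
    }

    // Intersection rule: X hasSibling Y and Y hasGender Female implies X hasSister Y.
    boolean hasSister(Person other) {
        return siblings.contains(other) && "Female".equals(other.gender);
    }

    public static void main(String[] args) {
        Person jane = new Person("Jane", 12, "Female");
        Person bob = new Person("Bob", 15, "Male"); // Bob's age and gender are assumed
        bob.addSibling(jane);
        System.out.println(jane.getName() + " is a child: " + jane.isChild());
        System.out.println("Bob has sister Jane: " + bob.hasSister(jane));
    }
}
```

The trade-off is that these rules live in code rather than in the shared model, so other consumers of the data cannot see or reuse them.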
Challenge 11: Slow Enhancements to Standards
• OWL in Feb 2004; OWL2 in Oct 2009
  • Small number of changes
• Long cycle for enhancements to the standard
  • Impossible to keep current due to standardization process, formalisms used
• Conversely, software languages update frequently
  • Java: major release every 18 months (originally)
  • Open-source libraries release even faster
• Newer versions of software incorporate desired changes quickly
  • If it doesn’t get out quickly, developers will find alternatives
OWL needs a more efficient update/release process
Challenge 12: No Clear Role for Ontology
• Software systems don’t use ontologies to access information
  • RDF triple stores don’t need ontologies to hold data
  • Databases use schemas to describe how they store information
• Many sites & systems claiming to use ontologies only use them for metadata, not content
  • Sites: Author, Title, Publish Date, etc.
  • Systems: artifacts of systems (data, functionality of system)
• Migration to Microformats
Understand HOW, WHERE and WHY to use ontologies
Create working systems where ontologies work with software, services, data schemas
Challenge 13: The Eclectic Ontologist
• Crafting ontologies is seen as a specialized task
• Ontologists rarely appear in Project Team diagrams
  • When they do, they are frequently isolated from developers
  • Sometimes isolated from the Subject Matter Expert (biggest mistake of all)
• Most software developers do not understand ontologies
  • To be honest, ontologists do not make it easy…
• But software developers deal with highly complex systems all the time
  • Certainly capable of understanding ontologies
Break down the barriers to using ontologies
Make ontologies easier to use and integrate
Examples of “Challenged” Ontologies: DBpedia
• Captures information in RDF form effectively
• Ontology, however, is huge
• Duplicate, redundant, confusing, or useless properties:
  • Cambridge has a property for “imagesize”
  • Cambridge has two values for yearPrecipitationMm
  • What does the “location” property indicate?
  • World of Warcraft has a “length” property with over 25 values!
  • wikiPageUsesTemplate property: who cares?
• Demonstrates value, but also danger, of crowd-sourcing information repositories
  • No central control or curation
Examples of “Challenged” Ontologies: MeSH
<owl:Class rdf:about="http://bioonto.de/mesh.owl#A01.047.025">
  <rdfs:label rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Abdominal Cavity</rdfs:label>
  <rdfs:subClassOf rdf:resource="http://bioonto.de/mesh.owl#A01.047"/>
</owl:Class>
<owl:Class rdf:about="http://bioonto.de/mesh.owl#A01.047.025.600">
  <rdfs:label rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Peritoneum</rdfs:label>
  <rdfs:subClassOf rdf:resource="http://bioonto.de/mesh.owl#A01.047.025"/>
</owl:Class>
• Unfriendly class naming
• Also illustrates confusion between data, classes
• Strictly a hierarchical categorization of medical terms, not a true ontology
• Class/subclass relationships not correct
  • Thumb is a subclass of hand, therefore thumbs are hands(!?)
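The thumb-and-hand problem is the familiar confusion between "is a" and "part of". A minimal Java sketch, with hypothetical `Thumb` and `Hand` classes, models the relationship as composition instead of inheritance.

```java
// Sketch only: the class names mirror the MeSH example; the modeling is an assumption.
class Thumb {
}

class Hand {
    // Part-of: a hand has a thumb, expressed as composition rather than inheritance.
    private final Thumb thumb = new Thumb();

    Thumb getThumb() {
        return thumb;
    }

    public static void main(String[] args) {
        Object part = new Hand().getThumb();
        // A thumb belongs to a hand but is not itself a hand.
        System.out.println(part instanceof Thumb);
        System.out.println(part instanceof Hand);
    }
}
```

Had `Thumb` been declared as `extends Hand`, every thumb would also count as a hand, which is the conclusion the subclass hierarchy in the excerpt forces on a reasoner.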
Examples of “Challenged” Ontologies: Cyc
<owl:Class rdf:about="Mx8NhB4rcacO5KgcQdidVfCUDYB6bg-SZ292ZXJubWVudCBtZWV0aW5nHiu9ZdiAnCkRsZ2tw3ljb3JwD6Q3NGVhZTkxOC0xN2IxLTQxZDktODUyNy05MGI1NGRlOTBmYzM">
  <rdfs:label xml:lang="en">government meeting</rdfs:label>
  <Mx4rwLSVCpwpEbGdrcN5Y29ycA xml:lang="en">a existing object type named "governmentmeeting"</Mx4rwLSVCpwpEbGdrcN5Y29ycA>
  <rdf:type rdf:resource="Mx4rpPHhAOB1EdqAAAACs6hRXg"/>
  <rdfs:subClassOf rdf:resource="Mx4rvVj27ZwpEbGdrcN5Y29ycA"/>
  <owl:sameAs rdf:resource="http://sw.cyc.com/concept/Mx8NhB4rcacO5KgcQdidVfCUDYB6bg-SZ292ZXJubWVudCBtZWV0aW5nHiu9ZdiAnCkRsZ2tw3ljb3JwD6Q3NGVhZTkxOC0xN2IxLTQxZDktODUyNy05MGI1NGRlOTBmYzM"/>
</owl:Class>
• Unfriendly class naming
• Tries to represent everything
• Bizarre class representations
Summary: Good Ontology Design
• Semantic Web is about enabling automated processes to comprehend and process information
  • Design ontologies for use in tools, not just standalone
• Separate Individuals from Class & Property definitions (T-Box)
  • Store Individuals in a SPARQL-accessible Triple Store
  • Make T-Box OWL available as a web document (small)
• Avoid OWL-Full
• Pay attention to Properties, not just classes
• Design for reuse by others
  • Understandable class, property names
  • Follow conventions for ontology style
  • Namespace URI should be the URL of the actual ontology definition
• Ontology should be independent of tools used to access, edit, or reason over it
Summary: OWL Improvements
• More Inference & Reasoning options
• Establish expected behavior for reasoners
  • Conformance testing suites
• Faster cycles for updates
• Establish Style Guides/standards
• W3C documents should focus on usability, not formalisms
  • Formalism is necessary, but shouldn’t be the first/only thing found by a search
Summary: OWL Improvements, continued
• Make Open-World an option
  • Exclusive classes
  • Default values for properties
• Simplify ontology definition for common constructs (single value, enumerated datatypes, etc.)
• Improve Property Specifications
  • Temporal constructs
  • Cleaner enumerated datatypes
  • Property ranges
  • Datatypes with units
  • Simplified datatypes: no need for 16 different numeric datatypes
Summary: Community Improvements
• Better examples, tutorials, etc.
  • Multiple examples for every ontology construct, with thorough explanations
  • All supported languages, levels
• Community forums for collaborating, developing ontology-based solutions
  • Wiki? Forums? Ontology.org?
• Site for “open-source” ontologies
  • Upper-level ontologies for general domains
  • Medical, Financial, Social Media, etc.
• Design for reuse, and demonstrate it
  • Assume your ontology will be accessed by others
Last Thoughts…
• Ontology challenges directly hinder the adoption of ontology-based semantic technologies
  • Slows acceptance by community
  • No large-scale adoption = Less $$$
• Large-scale Java adoption occurred in just a few years
  • Same with other software languages
  • Language adoption, standardization fuels job growth
• Learn what Java did right: other languages and technologies followed Java’s pattern for success
Until the Ontology community addresses these challenges, ontologies will continue to be a marginal player in the semantic web