OntoCheck – Verifying Naming Conventions in Protégé 4
OR: Fulfilling the 4th commandment: “You shall not make wrongful use of names”
Daniel Schober, Ilinca Tudose, Vojtech Svatek, Martin Boeker
Diversity in Naming Conventions
Benefits of Consistent Naming
• Increase consistency, accuracy & clarity of labels
• Normalize appearance
• Reduce the diversity that meta-tools have to cope with
• Ease text mining
• Ease ontology mapping & alignment
  • NCBO BioPortal
  • Lexical OWL Ontology Matcher (LOOM)
The OntoCheck Protégé 4.1 Plugin
• Clean-up checks on representational units (RUs)
• Checks on naming conventions
  • Lexical harmonization and labeling enforcement
• Checks on metadata
  • Completeness and cardinality checks on mandatory and obligatory annotation properties
• Checks can be stored, shared & reused
  • e.g. before each ontology release
• Quantifies found violations
Typographical Checks
• Word case
  • CamelCase, camelHump, lower case start, Upper case start, all lower case, ALL UPPER CASE
• Word separator
  • none, space, hyphen, underscore, dot
• Digits
  • Check for numerics in labels
  • Look for cardinality and order indicators
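The word-case, separator, and digit checks listed above can be sketched in a few lines of Python. This is only an illustration, not the plugin's implementation: the regular expressions, the function names (`word_case`, `separators`, `has_digits`), and the fallback category `other` are assumptions made for this example.

```python
import re

# Hypothetical patterns for illustration; the plugin's own definitions
# are not given in the slides.
CASE_PATTERNS = {
    "CamelCase": re.compile(r"^(?:[A-Z][a-z0-9]+)+$"),
    "camelHump": re.compile(r"^[a-z][a-z0-9]*(?:[A-Z][a-z0-9]+)+$"),
    "all lower case": re.compile(r"^[^A-Z]*[a-z][^A-Z]*$"),
    "ALL UPPER CASE": re.compile(r"^[^a-z]*[A-Z][^a-z]*$"),
}

# Separator names as listed on the slide, mapped to the character they stand for.
SEPARATORS = {"space": " ", "hyphen": "-", "underscore": "_", "dot": "."}


def word_case(name):
    """Return the first case convention the name matches, else 'other'."""
    for convention, pattern in CASE_PATTERNS.items():
        if pattern.match(name):
            return convention
    return "other"


def separators(name):
    """Return the separators found in the name, or ['none']."""
    found = [label for label, char in SEPARATORS.items() if char in name]
    return found or ["none"]


def has_digits(name):
    """Report whether the name contains any numeric character."""
    return any(char.isdigit() for char in name)
```

A name that reports more than one separator, or whose case convention differs from that of its sibling classes, would be a candidate for manual review.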
Lexical Checks
• Regular expressions
  • Check on a specified affix pattern
    • e.g. Role subclasses have a ‘role’ postfix
• Avoid Boolean operators in labels
  • Check on ‘and’, ‘or’, ‘non’, ‘anti’, ‘dis’
• Check on metalevel postfixes
  • e.g. ‘class’, ‘type’, ‘concept’, ‘relation’
• Check for punctuation
  • e.g. dots hint at abbreviations
• Character & word count
  • Check for potentially unclear names
    • Alert on labels <4 characters
    • Alert on unreadable names >50 characters
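A rough Python sketch of these lexical checks follows. It rests on several assumptions that the slides do not confirm: ‘and’ and ‘or’ are matched as whole words, ‘non’, ‘anti’ and ‘dis’ as word prefixes, and names shorter than 4 or longer than 50 characters trigger an alert. The function name `lexical_violations` and the issue codes are hypothetical.

```python
import re

BOOLEAN_WORDS = ("and", "or")                 # assumed: matched as whole words
NEGATION_PREFIXES = ("non", "anti", "dis")    # assumed: matched as word prefixes
META_POSTFIXES = ("class", "type", "concept", "relation")


def split_words(label):
    """Split a label into lowercase words, also breaking CamelCase."""
    tokens = re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])", label)
    return [token.lower() for token in tokens]


def lexical_violations(label, required_postfix=None, min_len=4, max_len=50):
    """Return a list of issue codes for one label (empty if none found)."""
    issues = []
    words = split_words(label)
    for word in words:
        if word in BOOLEAN_WORDS:
            issues.append(f"boolean:{word}")
            continue
        for prefix in NEGATION_PREFIXES:
            if word.startswith(prefix):
                issues.append(f"prefix:{prefix}")
                break
    if words and words[-1] in META_POSTFIXES:
        issues.append(f"meta-postfix:{words[-1]}")
    if required_postfix and not label.lower().endswith(required_postfix.lower()):
        issues.append(f"missing-postfix:{required_postfix}")
    if "." in label:
        issues.append("punctuation:dot")  # a dot may hint at an abbreviation
    if len(label) < min_len:
        issues.append("too-short")
    if len(label) > max_len:
        issues.append("too-long")
    return issues
```

Prefix matching is deliberately naive: a label such as ‘disease’ would also be reported under `prefix:dis`, so the output is best read as a list of candidates for review rather than confirmed violations.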
CheckTab: MixedCaseConvention Test
For all Thing subclasses, check whether they are CamelCase (for the OWLClassName RU).
CheckTab: PostfixInclusion Test
For all QualityRegion subclasses, check whether they contain a ‘Region’ postfix.
CheckTab: Cardinality Enforcement
For all Thing subclasses, check for the presence of labels (Min Card = 1).
Save & Load Checks
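The cardinality check described above amounts to counting annotation values per class. A minimal sketch, assuming the annotations are already available as a plain dictionary (the data layout, the function name `cardinality_violations`, and the `max_card` parameter are hypothetical and not taken from the plugin):

```python
def cardinality_violations(annotations, prop="rdfs:label", min_card=1, max_card=None):
    """Map each violating class to the number of values it has for `prop`.

    `annotations` is assumed to look like
    {"ClassName": {"rdfs:label": ["label 1", ...], ...}, ...}.
    """
    violations = {}
    for class_name, properties in annotations.items():
        count = len(properties.get(prop, []))
        too_few = count < min_card
        too_many = max_card is not None and count > max_card
        if too_few or too_many:
            violations[class_name] = count
    return violations


# Illustrative data only; these classes are made up for the example.
EXAMPLE = {
    "QualityRegion": {"rdfs:label": ["quality region"]},
    "ColorRegion": {},
}
```

With `min_card=1`, only `ColorRegion` is reported, because it carries no `rdfs:label` value.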
CompareTab: Label Equals ClassName Test
For all Thing subclasses, check whether the ClassName equals the rdfs:label (ignoring separator & case).
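Comparing a class name with its label while ignoring separator and case can be done by normalizing both strings first. The sketch below assumes that the separators to ignore are the ones from the typographical checks (space, hyphen, underscore, dot); the function names are hypothetical.

```python
import re


def normalize(name):
    """Drop separators (whitespace, hyphen, underscore, dot) and lowercase."""
    return re.sub(r"[\s\-_.]+", "", name).lower()


def label_matches_class_name(class_name, label):
    """True if class name and label are equal up to separator and case."""
    return normalize(class_name) == normalize(label)
```

Under these assumptions, `QualityRegion` and `quality region` compare as equal, while `QualityRegion` and `quality area` do not.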
OntoCheck Current Storage Format
Ugly but simple txt file written in the Protégé 4 installation directory.

check-name:: QualityRegionContainsRegion
panel:: checkPanel
checkCombo:: 0
checkRegexText:: Region
checkRB:: 2

check-name:: ThingDoesntContainNon
panel:: checkPanel
checkCombo:: 0
checkRegexText:: Non
checkRB:: 1

check-name:: ThingHaveRdfs:labelValueForAllClasses
panel:: checkPanel
checkCombo:: 6
checkRegexText:: your regex here
checkRB:: 0

check-name:: ThingCamelCaseOWLClassNameNC
panel:: checkPanel
checkCombo:: 0
checkRegexText:: your regex here
checkRB:: 4
comboNamingType:: 1
cbWithDigits::
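To illustrate how simple this format is to process, here is a small parser sketch. It assumes that the file holds one `key:: value` pair per line and that every stored check starts with a `check-name` entry; the slides show the keys and values but not the exact line layout, so both points are assumptions, as is the function name `parse_checks`.

```python
def parse_checks(text):
    """Parse 'key:: value' lines into one dictionary per stored check."""
    checks = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if "::" not in line:
            continue  # skip blank or unrecognized lines
        key, _, value = line.partition("::")
        key, value = key.strip(), value.strip()
        if key == "check-name":
            checks.append({})  # a new check record begins here
        if checks:
            checks[-1][key] = value
    return checks


# Two of the records shown on the slide, laid out one pair per line.
SAMPLE = """\
check-name:: QualityRegionContainsRegion
panel:: checkPanel
checkCombo:: 0
checkRegexText:: Region
checkRB:: 2
check-name:: ThingHaveRdfs:labelValueForAllClasses
panel:: checkPanel
checkCombo:: 6
checkRegexText:: your regex here
checkRB:: 0
"""

checks = parse_checks(SAMPLE)
```

Splitting on the first `::` only keeps values that contain a single colon intact, such as the check name `ThingHaveRdfs:labelValueForAllClasses`.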
CheckTab - Future Extensions
• Check for naming clashes & redundancies
  • Classes with different IDs but equal labels
• Check for plural word forms
• Check on non-ASCII characters
  • e.g. α vs. alpha
• Check on redundant restrictions
  • Between own and inherited axiomatic class definitions
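Two of these planned checks, equal labels under different IDs and non-ASCII characters, are easy to prototype. The following sketch is not part of OntoCheck; it assumes the labels are available as a dictionary from class ID to label, compares labels case-insensitively, and uses made-up IDs with an `EX:` prefix.

```python
from collections import defaultdict


def duplicate_labels(labels_by_id):
    """Return labels (lowercased, trimmed) that are shared by several class IDs."""
    ids_by_label = defaultdict(list)
    for class_id, label in labels_by_id.items():
        ids_by_label[label.strip().lower()].append(class_id)
    return {label: ids for label, ids in ids_by_label.items() if len(ids) > 1}


def non_ascii_chars(label):
    """Return the characters of the label that lie outside the ASCII range."""
    return [char for char in label if ord(char) > 127]


# Hypothetical IDs and labels, for illustration only.
EXAMPLE_LABELS = {"EX:1": "Cell", "EX:2": "cell ", "EX:3": "Nucleus"}
```

Here `duplicate_labels(EXAMPLE_LABELS)` groups `EX:1` and `EX:2` under the label `cell`, and `non_ascii_chars` flags the `α` in a label such as `α-tubulin`.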
Next Steps
• Engage in collaboration with OntologyDesignPatterns.org
  • Formalize a ‘Naming ODP’ pattern
  • Correlate the OntoCheck storage format with the ‘Naming ODP’
  • Enable loading checks from the ODP
• Pre-formalize sets of consistent naming conventions
  • e.g. for OBO Foundry compliance, Manchester style, ISO …
• Analyse and reuse the LiLA framework for linguistic label analysis
  • Dominique Ritze, Johanna Völker, Christian Meilicke and Ondrej Svab-Zamazal. Linguistic Analysis for Complex Ontology Matching. Proceedings of the ISWC workshop on Ontology Matching (OM), 2010, http://code.google.com/p/lila-project/
  • Apply it to analyze naming structures & recommend fitting naming conventions (NCs)
Conclusions
• Enforces syntactic and lexical normalization
  • Renders labels clearer to users & machines
  • Eases ontology cross-referencing and import
  • Eases ontology mapping & alignment by reducing string variability
• Avoids redundancy and inconsistencies
  • e.g. ‘biphenyl’ (CHEBI:17097) under an IUPAC-required ‘biphenyls’ (CHEBI:22888)
• Helps enforce metadata completeness
• Helps in quality assurance and quantification
• The OntoCheck plugin is at an early stage
  • CheckTab already proves useful in multiple in-house efforts
Resources & Acknowledgements
Resources
• OntoCheck plugin download
  • http://www.imbi.uni-freiburg.de/ontology/OntoCheck/
• OBO Foundry Naming Conventions & Questionnaire
  • http://obofoundry.org/wiki/index.php/Naming
• Schober D. et al. (2009) Survey-based naming conventions for use in OBO Foundry ontology development. BMC Bioinformatics, Vol. 10, Issue 1.
Acknowledgements
• This work was initiated and supervised by Daniel Schober, and implemented and improved by Ilinca Tudose under additional guidance from Martin Boeker. Timothy Redmond helped solve Protégé API problems.
• DS was supported by the Deutsche Forschungsgemeinschaft (DFG) grant JA 1904/2-1, SCHU 2515/1-1 GoodOD (Good Ontology Design)
• IT was supported by the DebugIT EU Grant ICT-2007.5.2-217139