The challenge of making ontologies useful and usable. Alan Rector School of Computer Science / Northwest Institute of Bio-Health Informatics rector@cs.man.ac.uk www.co-ode.org www.clinical-escience.org www.opengalen.org. Users face a complex landscape.
Users face a complex landscape
[Figure courtesy of Hodgson, TopQuadrant, via Carole Goble]
Inhabited by many tribes, each tribe in its own teepee
• Description logic
• Complexity theory
• Fuzziness
• Default logics
• KR Logics
• Belief revision
• Logic programming
• Natural language
• Bayesian analysis
• Argumentation
The chain of theorem envy and the chain of value
Application Builder, Knowledge Engineer, KR Researcher, Pure Logician/Mathematician
• Said in one direction along the chain, at every link: “Too neat! Too academic! Doesn’t understand!”
• Said in the other direction, at every link: “Too scruffy! Too ad hoc! Doesn’t understand!”
• No one person can understand it all - must manage the chain
… and seem to insist on solving harder problems than the user actually has
… but logicians are often seen as policemen
• “Incomplete! Undecidable! Higher order! No semantics!”
• … often without examples of why not
“And don’t come back until you have the semantics clear”
… or insist users understand the solution space
• The semantics is your job!
• Meet users where they are
So what is an ontology? [Deborah McGuinness, Stanford]
A spectrum, from least to most formal:
• Catalog / ID
• Terms / glossary
• Thesauri
• Informal is-a
• Formal is-a
• Formal instance
• Frames (properties)
• Value restrictions
• Disjointness, inverse, part-of
• General logical constraints
Examples shown on the spectrum: Arom, Gene Ontology, TAMBIS, EcoCyc, Mouse Anatomy, PharmGKB
My definition of an ontology
• Short version: “a representation of the shared background knowledge for a community”
• Long version: “an implementable model of the entities that need to be understood in common in order for some group of software systems and their users to function and communicate at the level required for a set of tasks”
• ... and “it doesn’t make the coffee”: just one of at least three components of a complete system
“Ontologies” in Information Systems
• What information systems can say and how - “Models of Meaning”
• Mathematical theories - although usually weak ones
• evolved at the same time as Entity Relation and UML style modelling
• Managing scalability / complexity - “Knowledge driven systems”
• Housekeeping tools for expert systems
• Organising complex collections of rules, forms, guidelines, ...
• Interoperability
• The common grounding information needed to achieve communication
• Standards and terminology
• Communication with users
• Document design decisions
• Testing and quality assurance
• sufficient constraints to know when it breaks
• Empower users to make changes safely
• ... but “They don’t make the coffee”
• just one component of the system / theory
The scaling problem: The combinatorial explosion
[Graph labels: Predicted; Actual]
• It keeps happening!
• “Simple” brute force solutions do not scale up!
• Conditions x sites x modifiers x activity x context →
  • Huge number of terms to author
  • Software CHAOS
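
The arithmetic behind this slide can be sketched in a few lines. The axis names come from the slide; the sizes are hypothetical figures chosen only for illustration, and the comparison with the number of reusable parts is an assumption about how the cost would be counted, not a result from the talk.

```python
from math import prod

# Hypothetical axis sizes, chosen only for illustration (not from the talk).
axis_sizes = {
    "conditions": 200,
    "sites": 50,
    "modifiers": 10,
    "activity": 5,
    "context": 4,
}


def enumerated_terms(sizes):
    """Terms to author if every combination is written out by hand."""
    return prod(sizes.values())


def reusable_parts(sizes):
    """Parts to author if each axis value is written once and then combined."""
    return sum(sizes.values())


print(enumerated_terms(axis_sizes))  # 200 * 50 * 10 * 5 * 4 = 2000000
print(reusable_parts(axis_sizes))    # 200 + 50 + 10 + 5 + 4 = 269
```

Adding one more axis multiplies the first number but only adds to the second, which is why brute-force enumeration fails to scale.
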
Combination of things to be done & time to do each thing
[Graph labels: Effort per term; Things to build; What we would like; What we might accept]
• Terms and forms needed
  • Increases exponentially
• Effort per term or form
  • Must decrease to compensate
  • To give the effectiveness we want
  • Or might accept
The means: Logic as the clips for “Conceptual Lego”
[Lego bricks shown: hand, abnormal, normal, extremity, body, gene, protein, polysaccharide, cell, expression, chronic, lung, acute, infection, inflammation, bacterium, deletion, polymorphism, ischaemic, virus, mucus]
Logic as the clips for “Conceptual Lego”
• “SNPolymorphism of CFTRGene causing Defect in MembraneTransport of Chloride Ion causing Increase in Viscosity of Mucus in CysticFibrosis…”
• “Hand which is anatomically normal”
Build complex representations from modularised primitives
Modules: Species, Genes, Protein, Function, Disease
• Gene in humans
• Protein coded by gene in humans
• Function of Protein coded by gene in humans
• Disease caused by abnormality in Function of Protein coded by gene in humans
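
A minimal sketch of the idea, assuming a toy representation: the `Primitive` and `Composite` classes and the `primitives` helper below are hypothetical names invented for this illustration. It only shows nesting and reuse; it is not OWL or a description logic, and it does no classification or reasoning.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Primitive:
    """A single reusable building block, e.g. 'Gene' (hypothetical class)."""
    name: str

    def __str__(self):
        return self.name


@dataclass(frozen=True)
class Composite:
    """Two expressions clipped together by a named link (hypothetical class)."""
    head: "Expr"
    link: str
    filler: "Expr"

    def __str__(self):
        return f"{self.head} {self.link} {self.filler}"


Expr = Union[Primitive, Composite]


def primitives(expr):
    """Collect the names of the primitives used anywhere in an expression."""
    if isinstance(expr, Primitive):
        return {expr.name}
    return primitives(expr.head) | primitives(expr.filler)


# Each step reuses the previous expression instead of restating it.
gene = Composite(Primitive("Gene"), "in", Primitive("humans"))
protein = Composite(Primitive("Protein"), "coded by", gene)
function = Composite(Primitive("Function"), "of", protein)
disease = Composite(Primitive("Disease"), "caused by abnormality in", function)

print(disease)
print(sorted(primitives(disease)))
```

Five primitives yield four progressively more specific expressions here; a real system would hand such expressions to a reasoner rather than render them as strings.
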
A conceptual coat rack
Fractal tailoring of reusable resources: example of data collection forms for trials
• Hypertension
• Renin Dependent Hypertension
• In St Stevens Hospital
• National Hypertension Survey
• Renin dependent Hypertension at St Stevens Hospitals for the National Hypertension Survey
Solution space
• Ontologies
• Information Models
• Logics
• Rules
• Frames
• Planners
• Logic programming
• Bayes nets
• Decision theory
• Fuzzy sets
• Open / closed world
• …
Problem space
• Answer questions
• Advising on actions
• Hazard monitoring
• Creating forms
• Discovering resources
• Constrain actions
• Assess risk
• …
Problem space & solution space
[Diagram: Problem space and Solution space, with Guidelines, Patterns, Tools between them]
Matching problems and solutions is worthwhile science & craft
• Patterns, guidelines and tools
• Reformulations of users’ “solutions”
• Collaborations with behavioural scientists
• Challenges and demonstrations
Some observations…
Inter-rater variability
ART & ARCHITECTURE THESAURUS (AAT)
Domain: art, architecture, decorative arts, material culture
Content: 125,000 terms
Structure: 7 facets, 33 polyhierarchies
• Associated concepts (beauty, freedom, socialism)
• Physical attributes (red, round, waterlogged)
• Style/Period (French, impressionist, surrealist)
• Agents (printmaker, architect, jockey)
• Activities (analysing, running, painting)
• Materials (iron, clay, emulsifier)
• Objects (gun, house, painting, statue, arm)
Synonyms
Links to ‘associated’ terms
Access: lexical string match; hierarchical view
And to real world problems
The Coding of Chocolate: an international conversion guide
CTV3 term     CTV3 code   SNOMED-CT code
Bounty bar    UbOVv       C-F0811
Crème egg     UbOW2       C-F0816
Kit Kat       UbOW3       C-F0817
Mars Bar      UbOW4       C-F0819
Milky Way     UbOW5       C-F081A
Smarties      UbOW6       C-F081B
Twix          UbOW7       C-F081C
Snicker       Ub1pT       C-F0058
Technology is improving
• Understanding of the Web stack is improving
• OWL is improving
  • OWL 1.1
  • but we are just beginning to learn how to use it
• Tools are improving
  • Protege4, NEON, ...
• Applications are happening
  • In Bioinformatics
  • In Health Informatics
• Moore’s law is coming to the rescue
• We are crossing a critical threshold
... but for human issues we are just starting, almost ready to ask the important questions
Challenges
• Understanding problems
  • From users’ perspective
  • From value perspective
• Matching solutions to problems
  • Solutions exist to solve problems
  • ... solution designers exist to make better solutions
• Understanding misunderstandings
  • Not trying to do the impossible
  • Chocolate bars on the two sides of the Atlantic are different
• Improving the technology
  • The dog just barely walks on its hind legs
• So
  • What can we do?
  • What’s it good for?
  • Is it useful and usable?