Automating Generation of Textual Class Definitions from OWL to English Robert Stevens, James Malone, Sandra Williams, Richard Power
Summary
• Motivation
• Use Case
• Methods and Description Generator
• Results
• Evaluation
• Open Questions (still)
Motivation
• Textual definitions are
  • a cornerstone of good practice in ontology delivery
  • a requirement of the OBO process
  • hard work to produce
• Logical definitions
  • make meaning explicit to the computer
  • help maintenance of the ontology’s structure, querying, and so on
  • are also hard to produce, and more difficult to understand
• The information in one form should reflect the information in the other
• Need to keep textual and logical definitions synchronised
• Aim to produce fluent textual definitions from logical definitions/descriptions in OWL
OWL Smackdown: Computer vs Human
Our Hypotheses
• Text = humans
• Logical = computers (and future human-computer hybrids)
• Textual definitions ≈ Logical definitions
• Textual definitions tend to be more lossy than logical ones (cardinalities are often dropped, specific roles not mentioned, etc.)
• Logical definitions are often more explicit than natural language and therefore should contain sufficient content to produce a textual definition
EFO Use Case
www.ebi.ac.uk/efo
• Experimental Factor Ontology (EFO) is an application ontology which consumes domain ontologies to satisfy specific application-focused use cases
• Primarily Gene Expression data from ArrayExpress @ EBI
EFO @ Gene Expression Atlas
www.ebi.ac.uk/gxa
Related Work
• Generating descriptions from ontologies is often called ‘ontology verbalisation’
• A number of approaches are concerned only with ABox verbalisation (Hielkema, 2009; Galanis and Androutsopoulos, 2007)
• Others produce only separate sentences, one for each OWL axiom (Kaljurand, 2007)
• Our approach has much in common with these but differs in that:
  • only a subset of OWL is considered (the simple description logic EL++)
  • instead of realising axioms in isolation, we apply some rules for organisation and aggregation to give a more natural feel
Method Overview
• An OWL ontology is just a “pile of axioms”
• We can produce individual sentences based on a grammar that guides transformation from OWL to English (or other natural language)
• Need to group sentences (group axioms with the same subject together)
• Need to aggregate axioms (collapse axioms with the same relationship together)
• Once grouped and aggregated, a paragraph of text can be produced sentence by sentence
Example:
hasPart some leg
hasPart some body
hasPart some head
=> Has parts leg, body and head
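The grouping and aggregation steps can be sketched in a few lines of Python. This is an illustration only, not the authors' implementation (which the slides describe as Prolog-based). The triple representation, the `PHRASES` table, and the subject name `spider` are assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical phrase table mapping a property name to its aggregated wording.
PHRASES = {"hasPart": "has parts"}


def join_list(items):
    """Join items as 'a', 'a and b', or 'a, b and c'."""
    if len(items) == 1:
        return items[0]
    return ", ".join(items[:-1]) + " and " + items[-1]


def verbalise(axioms):
    """Group (subject, property, filler) triples by subject, then
    collapse triples sharing a property into one phrase."""
    grouped = defaultdict(lambda: defaultdict(list))
    for subject, prop, filler in axioms:
        grouped[subject][prop].append(filler)
    return {
        subject: [
            f"{PHRASES.get(prop, prop)} {join_list(fillers)}"
            for prop, fillers in props.items()
        ]
        for subject, props in grouped.items()
    }


axioms = [
    ("spider", "hasPart", "leg"),
    ("spider", "hasPart", "body"),
    ("spider", "hasPart", "head"),
]
print(verbalise(axioms))  # {'spider': ['has parts leg, body and head']}
```

Grouping by subject first and by property second mirrors the order of the two bullets above: each subject yields one paragraph, and each property within it yields one aggregated phrase.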
Processing stages
• Transcode OWL/XML to Prolog
• Construct a lexicon for atomic entities – (next slide)
• Group axioms by atomic entity
• Aggregate axioms with similar structure
• Generate sentences from aggregated axioms
Example:
class(animal).
subClassOf(class(cat), class(animal)).
subClassOf(class(dog), class(animal)).
=>
class(animal).
subClassOf([class(cat), class(dog)], class(animal)).
=>
ANIMAL. A cat and a dog are both kinds of animals.
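The aggregation and sentence-generation stages for subclass axioms might look roughly like the following Python sketch. It is not the authors' Prolog code. The tuple encoding of axioms, the function names, and the sentence templates for one child or for three or more children are assumptions; only the two-child wording is taken from the example above.

```python
from collections import defaultdict


def group_subclasses(axioms):
    """Collapse ('subClassOf', child, parent) tuples that share a parent."""
    by_parent = defaultdict(list)
    for kind, child, parent in axioms:
        if kind == "subClassOf":
            by_parent[parent].append(child)
    return dict(by_parent)


def noun_phrase(noun):
    """Prefix a singular noun with 'a' or 'an'."""
    article = "an" if noun[0].lower() in "aeiou" else "a"
    return f"{article} {noun}"


def realise(parent, children, parent_plural):
    """Produce one sentence for an aggregated subclass axiom."""
    nps = [noun_phrase(c) for c in children]
    first = nps[0][0].upper() + nps[0][1:]
    if len(nps) == 1:
        return f"{first} is a kind of {parent}."
    if len(nps) == 2:
        return f"{first} and {nps[1]} are both kinds of {parent_plural}."
    listed = ", ".join(nps[:-1]) + f", and {nps[-1]}"
    return f"The following are kinds of {parent_plural}: {listed}."


axioms = [
    ("subClassOf", "cat", "animal"),
    ("subClassOf", "dog", "animal"),
]
grouped = group_subclasses(axioms)
print(realise("animal", grouped["animal"], "animals"))
# A cat and a dog are both kinds of animals.
```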
Description Generator
• Input: OWL/XML ontology
• Output: Text describing atomic entities
• generation from label/URL
• It is assumed that the syntax of each phrase will be severely constrained as follows:
  • individuals are expressed by proper names
  • classes by common nouns (with singular and plural forms)
  • properties by transitive verbs (simple or compound) with slots for a subject and an object
Example output:
ANIMAL. The following are kinds of animals: a cat, a duck, a giraffe, a person, a sheep, and a tiger. An animal eats a thing. If X has as pet Y then necessarily Y is an animal.
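To make the syntactic constraints concrete, here is a minimal Python sketch of a lexicon in which classes are common nouns and properties are transitive verb phrases. The `Noun` type, the `LEXICON` and `VERBS` tables, and the mapping of the two example sentences to an existential restriction and a property range axiom are assumptions for illustration; the slides do not state which axioms produced them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Noun:
    singular: str
    plural: str


# Hypothetical lexicon entries: classes as common nouns.
LEXICON = {
    "animal": Noun("animal", "animals"),
    "thing": Noun("thing", "things"),
}

# Hypothetical lexicon entries: properties as transitive verb phrases.
VERBS = {"eats": "eats", "hasPet": "has as pet"}


def indefinite(noun):
    """Return the singular form with 'a' or 'an'."""
    article = "an" if noun.singular[0].lower() in "aeiou" else "a"
    return f"{article} {noun.singular}"


def existential(subject, prop, filler):
    """Verbalise 'subject SubClassOf prop some filler' (assumed axiom type)."""
    subj = indefinite(LEXICON[subject])
    obj = indefinite(LEXICON[filler])
    return f"{subj[0].upper()}{subj[1:]} {VERBS[prop]} {obj}."


def property_range(prop, cls):
    """Verbalise 'prop Range cls' (assumed axiom type)."""
    return f"If X {VERBS[prop]} Y then necessarily Y is {indefinite(LEXICON[cls])}."


print(existential("animal", "eats", "thing"))  # An animal eats a thing.
print(property_range("hasPet", "animal"))
# If X has as pet Y then necessarily Y is an animal.
```

Because each property is stored as a verb phrase with an implicit subject slot before it and an object slot after it, the same entry serves both sentence patterns.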
Results
*axioms placed on subclasses
Results
• Online survey of ontology users at EBI
• 10 of the 50 verbalisations were evaluated, chosen to cover the widest range of axioms
[Chart: Total Judgement]
Findings
• Finding of a dodgy class:
  • the definition for Ara-C-resistant murine leukemia indicated that subclasses b117h and b140h were types of this, implying that they were diseases rather than cell lines
• Desire amongst this user group for simplicity of language – avoid ontological formality
  • e.g. bearer of
• Especially property names for qualities
  • e.g. has as quality male
• Initial verbalisation making semantics clear was not liked
• Plural forms are occasionally an issue:
lex(class(EFO_0000322), noun, 'cell line', 'cell lines').
lex(class(EFO_0002095), noun, '22rv1', '22rv1s').
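The plural-form issue is easy to reproduce with a rule-based pluraliser. The Python sketch below assumes, without confirmation from the slides, that plurals are derived from class labels by simple suffix rules. `naive_plural` and `lex_entry` are hypothetical helpers, and the output only imitates the `lex/4` entries shown above.

```python
def naive_plural(label):
    """Pluralise an English label with simple suffix rules (an assumption,
    not the rule set used in the presented system)."""
    if label.endswith(("s", "x", "z", "ch", "sh")):
        return label + "es"
    if label.endswith("y") and len(label) > 1 and label[-2] not in "aeiou":
        return label[:-1] + "ies"
    return label + "s"


def lex_entry(class_id, label):
    """Format a noun lexicon entry in the style of the lex facts above."""
    return f"lex(class({class_id}), noun, '{label}', '{naive_plural(label)}')."


print(lex_entry("EFO_0000322", "cell line"))
# lex(class(EFO_0000322), noun, 'cell line', 'cell lines').
print(lex_entry("EFO_0002095", "22rv1"))
# lex(class(EFO_0002095), noun, '22rv1', '22rv1s').
```

The rules work for ordinary nouns such as 'cell line' but give an awkward form for a cell line identifier such as '22rv1', which is the kind of label where a hand-curated plural or a plural-free wording may be preferable.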
Conclusion
• Initial results were largely well received and considered useful in most cases
• Discovery of an incorrect class definition demonstrates potential as a tool for class validation
• Preference for text definitions was for ‘clear and simple’ over ‘precise and complex’
• Dependent entities could become adjectival forms of the independent entities in which they inhere (cell has quality female becomes female cell)
• Formal relations/class labels reduce understanding and should be brought closer to domain language
• Many ontologies are not amenable to text mining – this is an important use case neglected by most
• Definitions now being imported into EFO
Next Steps
• Systematic study of acceptable wordings
• Different wording styles for different users
• Adjectival forms for qualities etc.; the role of an upper level ontology
• Moving beyond EL++
• Parsing for OBO
Next Steps: Round Tripping
Open Questions
Should textual descriptions ≡ logical descriptions?
Are discrepancies acceptable?
Acknowledgements
• Sandra Williams, Richard Power and Robert Stevens are funded by the SWAT project (EPSRC grants EP/G033579/1 and EP/G032459/1)
• James Malone is funded by EMBL and EMERALD (project number LSHG-CT-2006-037686)
• We would like to thank the members of the EBI’s ontology interest group, functional genomics group and Dr Helen Parkinson for comments and survey participation