Building a rich ontology from AGROVOC
Dagobert Soergel
College of Information Studies, University of Maryland
dsoergel@umd.edu, www.dsoergel.com
FAO Agricultural Ontology Server Workshop, Beijing, April 27 - 29, 2004
The problem
• AI and Semantic Web applications need full-fledged ontologies that support reasoning
• Constructing such ontologies is expensive
• Existing KOS (Knowledge Organization Systems), both large and small, do not provide the full set of precise concept relationships needed for reasoning, but they represent much intellectual capital
• How can this intellectual capital be put to use in constructing full-fledged ontologies?
• Specifically: from AGROVOC to a full-fledged Food and Agriculture Ontology
Some applications of a Food and Agriculture Ontology
• Advice on crops and crop management (fertilization, irrigation)
• Advice on pest management
• Tracking contaminants through the food chain
• Advice on safe food processing
• Computing nutrition labels
• Advice on healthy eating
• Improved searching
AGROVOC relationships compared with more differentiated relationships of a Food and Agriculture Ontology
From AGROVOC to FA Ontology
• Define the FA Ontology structure
• Fill in values from AGROVOC to the extent possible
• Edit manually with computer assistance, using the rules-as-you-go approach and an ontology editor:
  • make existing information more precise
  • add new information
Note
[Diagram: levels of the FA Ontology structure]
• Concept: relationships between concepts; relationship annotation; relationships between relationships
• Concept <designated by> Term (lexicalization): relationships between terms; other information: language/culture, subvocabulary/scope, audience type, etc.
• Term <manifested as> String: relationships between strings
Fill in values from AGROVOC
• Fill in values from AGROVOC to the extent possible
• Arrange in structured sequence (to the extent possible based on the information in AGROVOC) to facilitate editing (the editor can deal with similar problems at the same time)
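One way to read "structured sequence" is a sort order that puts similar relationship instances next to each other. The slides do not say how the sequence is built, so the sort key below (thesaurus relation, then the last word of the second term) is an assumption chosen for illustration; the triples are the milk examples used elsewhere in this talk.

```python
from itertools import groupby

# AGROVOC-style statements as (term1, relation, term2) triples.
triples = [
    ("milk", "NT", "cow milk"),
    ("cows", "RT", "cow milk"),
    ("milk", "NT", "milk fat"),
    ("milk", "NT", "goat milk"),
    ("milk", "RT", "milk protein"),
]

def group_key(triple):
    """Hypothetical similarity key: relation plus head word of the second term."""
    _, relation, term2 = triple
    return (relation, term2.split()[-1])

def sort_key(triple):
    # Sort by the group key first, then by the terms for a stable, readable order.
    return group_key(triple) + (triple[0], triple[2])

ordered = sorted(triples, key=sort_key)

# Each group is a batch the editor can handle in one go.
batches = {key: list(items) for key, items in groupby(ordered, key=group_key)}

for key, items in batches.items():
    print(key, items)
```

With this ordering, the two "milk NT * milk" statements end up in the same batch, which is where a pattern is easiest to spot.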
Edit manually with computer assistance
• Use the rules-as-you-go approach and good ontology editing software that handles large ontologies efficiently
  • make existing information more precise
  • add new information
Assumption: Entity types of concepts are known from AGROVOC or other sources (Langual, UMLS, WordNet); for example
  milk fat is a Substance
  Asteraceae is a Taxon
The editor may need to determine the entity type
The rules-as-you-go approach
Exploit patterns to automate the conversion process
Example 1.
• An editor has determined that
    milk NT cow milk  should become  milk <includesSpecific> cow milk
• She recognizes that this is an example of the general pattern
    milk NT * milk  →  milk <includesSpecific> * milk
  (where * is the wildcard character)
• Given this pattern, the system can derive automatically
    milk NT goat milk  should become  milk <includesSpecific> goat milk
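The wildcard pattern in Example 1 can be sketched as a small string-rewriting rule. This is a minimal illustration, not the system described in the talk: the function names `compile_pattern` and `apply_rule` are hypothetical, and the wildcard is assumed to match one or more characters.

```python
import re

def compile_pattern(pattern):
    """Turn a pattern such as 'milk NT * milk' into a regular expression.

    Assumption: '*' matches one or more characters; everything else is literal.
    """
    literal_parts = [re.escape(part) for part in pattern.split("*")]
    return re.compile("^" + "(.+)".join(literal_parts) + "$")

def apply_rule(pattern, template, statement):
    """Rewrite a thesaurus statement if it matches the pattern, else return None."""
    match = compile_pattern(pattern).match(statement)
    if match is None:
        return None
    result = template
    # Fill each '*' in the template with the text captured by the pattern.
    for captured in match.groups():
        result = result.replace("*", captured, 1)
    return result

RULE = ("milk NT * milk", "milk <includesSpecific> * milk")

statements = ["milk NT cow milk", "milk NT goat milk", "milk NT milk fat"]
converted = [apply_rule(RULE[0], RULE[1], s) for s in statements]
print(converted)
```

The third statement, milk NT milk fat, does not fit the pattern and is left for another rule or for the editor.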
The rules-as-you-go approach
Exploit patterns to automate the conversion process
• Editor: milk NT milk fat  →  milk <containsSubstance> milk fat
• Pattern: Substance NT/RT Substance  →  Substance <containsSubstance> Substance
• Therefore: milk RT milk protein  →  milk <containsSubstance> milk protein
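Patterns of this second kind are keyed on entity types rather than on the wording of the terms. A minimal sketch follows, assuming a lookup table of entity types is available (the slides name AGROVOC, Langual, UMLS and WordNet as possible sources); the table contents and the names `ENTITY_TYPE`, `TYPE_RULES` and `convert` are illustrative assumptions.

```python
# Illustrative entity-type table; in practice this would come from
# AGROVOC or another source, or be filled in by the editor.
ENTITY_TYPE = {
    "milk": "Substance",
    "milk fat": "Substance",
    "milk protein": "Substance",
    "cows": "Animal",
}

# Each rule: (type of first term, allowed thesaurus relations,
#             type of second term, ontology relationship type)
TYPE_RULES = [
    ("Substance", {"NT", "RT"}, "Substance", "containsSubstance"),
]

def convert(term1, relation, term2):
    """Return an ontology triple if a type rule applies, else None."""
    type1 = ENTITY_TYPE.get(term1)
    type2 = ENTITY_TYPE.get(term2)
    for rule_type1, relations, rule_type2, ontology_relation in TYPE_RULES:
        if type1 == rule_type1 and relation in relations and type2 == rule_type2:
            return (term1, ontology_relation, term2)
    return None

print(convert("milk", "NT", "milk fat"))
print(convert("milk", "RT", "milk protein"))
print(convert("cows", "RT", "milk fat"))  # no rule for Animal RT Substance here
```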
The rules-as-you-go approach
Exploit patterns to automate the conversion process
• Editor: cows RT cow milk  →  cows <hasComponent> cow milk
• Pattern: Animal RT BodyPart  →  Animal <hasComponent> BodyPart
• Therefore: goats NT goat milk  →  goats <hasComponent> goat milk
The rules-as-you-go approach
Exploit patterns to automate the conversion process
• Editor: acid soils BT chemical soil types  →  acid soils <isa> chemical soil types
• Pattern: X BT * type*  →  X <isa> * type*
• Therefore: acrisols BT genetic soil types  →  acrisols <isa> genetic soil types
The rules-as-you-go approach
Exploit patterns to automate the conversion process
• Editor: Cichorium BT Asteraceae  →  Cichorium <isa> Asteraceae
• Pattern: Taxon BT Taxon  →  Taxon <isa> Taxon
• Therefore: Cichorium endivia BT Cichorium  →  Cichorium endivia <isa> Cichorium
The rules-as-you-go approach
Exploit patterns to automate the conversion process
• Editor: Cichorium intybus RT coffee substitutes  →  Cichorium intybus <usedToMake> coffee substitutes
• Pattern: Taxon RT FoodProduct  →  Taxon <usedToMake> FoodProduct
• Therefore: Cichorium intybus RT root vegetables  →  Cichorium intybus <usedToMake> root vegetables
The rules-as-you-go approach: Discussion
Main idea: Formulate constraints to assist the editor
• The ontology may have many relationship types, perhaps more than 100
• Constraints limit the relationship types that are possible in a specific case; show the editor only these
• If the constraints limit the possible relationship types to one, conversion is automatic
• Constraints may depend on the thesaurus to be converted
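The constraint idea can be sketched as a filter over the relationship types. The relationship names below are taken from the slides, but which thesaurus relation each may replace is an assumption made for this illustration, as are the names `RelType`, `candidates` and `decide`.

```python
from typing import NamedTuple

class RelType(NamedTuple):
    name: str
    source_relations: frozenset  # thesaurus relations this type may replace (assumed)
    domain_type: str             # entity type of the first concept
    range_type: str              # entity type of the second concept

REL_TYPES = [
    RelType("isa", frozenset({"BT"}), "Taxon", "Taxon"),
    RelType("usedToMake", frozenset({"RT"}), "Taxon", "FoodProduct"),
    RelType("hasPest", frozenset({"RT"}), "Organism", "Organism"),
    RelType("actsAgainst", frozenset({"RT"}), "Organism", "Organism"),
]

def candidates(source_relation, type1, type2):
    """Relationship types that the constraints allow for this case."""
    return [
        rel.name
        for rel in REL_TYPES
        if source_relation in rel.source_relations
        and rel.domain_type == type1
        and rel.range_type == type2
    ]

def decide(source_relation, type1, type2):
    """Classify a case: automatic conversion, a short menu, or no constraint applies."""
    options = candidates(source_relation, type1, type2)
    if len(options) == 1:
        return "automatic", options
    if options:
        return "menu", options
    return "unconstrained", options

print(decide("BT", "Taxon", "Taxon"))
print(decide("RT", "Organism", "Organism"))
print(decide("NT", "Taxon", "FoodProduct"))
```

In the "unconstrained" case the editor would have to pick from the full list of relationship types or define a new one.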
Checking by editor
• Relationship instances created by the editor by selecting from a constraint-generated menu are final
• Relationship instances created automatically must be presented to the editor
• If the editor determines that the relationship instances are almost always correct, she checks a box "accept without checking"
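The checking rules above amount to a small review queue. The sketch below assumes that the "accept without checking" box is set per pattern, which the slides do not state explicitly; the class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Pattern:
    description: str
    accept_without_checking: bool = False  # the editor's checkbox (assumed per pattern)

@dataclass
class RelationshipInstance:
    statement: str
    pattern: Optional[Pattern] = None  # None: chosen by the editor from a menu

def needs_review(instance):
    """Menu choices are final; automatic results are shown unless the box is checked."""
    if instance.pattern is None:
        return False
    return not instance.pattern.accept_without_checking

taxon_rule = Pattern("Taxon BT Taxon -> isa")
food_rule = Pattern("Taxon RT FoodProduct -> usedToMake")

instances = [
    RelationshipInstance("acid soils <isa> chemical soil types"),  # menu choice
    RelationshipInstance("Cichorium endivia <isa> Cichorium", taxon_rule),
    RelationshipInstance("Cichorium intybus <usedToMake> root vegetables", food_rule),
]

# The editor finds the taxon rule almost always correct and checks the box.
taxon_rule.accept_without_checking = True

review_queue = [i.statement for i in instances if needs_review(i)]
print(review_queue)
```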
Overall conversion process
• One master editor must go through the file from start to finish, processing the relationship instances and creating patterns, creating new relationship types as needed
• Assistant editors can apply the patterns
• In the first pass, the master editor should deal with the easy cases
• Deal with the remaining cases later. Groups of similar relationship instances can be seen more easily in a smaller set
Adding new relationship types and new relationship instances
• AGROVOC does not contain all relationship types or relationship instances needed for AI applications
• Need to add data. For example:
    Organism X <hasPest> Organism Y
    ChemSubstance X <actsAgainst> Organism Y
    Organism X <actsAgainst> Organism Y
    Plant X <growsIn> Environment Y
    FoodProduct X <suitableFor> Diet Y
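When new data is added by hand, the type signatures listed above can serve as a simple check. The following sketch assumes exact type matches (no type hierarchy, so a Plant is not treated as an Organism); `SIGNATURES` and `add_instance` are hypothetical names, and the entities are the placeholders from the slide.

```python
# Allowed (first type, second type) pairs per relationship type, from the slide.
SIGNATURES = {
    "hasPest": [("Organism", "Organism")],
    "actsAgainst": [("ChemSubstance", "Organism"), ("Organism", "Organism")],
    "growsIn": [("Plant", "Environment")],
    "suitableFor": [("FoodProduct", "Diet")],
}

def add_instance(store, entity_types, subject, relationship, obj):
    """Append a new relationship instance if its entity types fit a signature."""
    pair = (entity_types[subject], entity_types[obj])
    if pair not in SIGNATURES.get(relationship, []):
        raise ValueError(
            f"{relationship} does not accept {pair[0]} -> {pair[1]}"
        )
    store.append((subject, relationship, obj))

# Placeholder entities, mirroring the X / Y notation of the slide.
entity_types = {
    "Plant X": "Plant",
    "Environment Y": "Environment",
    "Diet Y": "Diet",
}

store = []
add_instance(store, entity_types, "Plant X", "growsIn", "Environment Y")

try:
    add_instance(store, entity_types, "Plant X", "suitableFor", "Diet Y")
except ValueError as error:
    print("rejected:", error)

print(store)
```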
Conclusion
The rules-as-you-go approach is a realistic method for developing a rich ontology from an existing thesaurus.