Sophia Katrenko HCSL, Informatics Institute Universiteit van Amsterdam katrenko@science.uva.nl. Semantic learning with specific application to ontologies. Outline. Learning: why semantic? Learning ontologies: approaches Some experiments Dictionary Relation schemata
Sophia Katrenko
HCSL, Informatics Institute, Universiteit van Amsterdam
katrenko@science.uva.nl
Semantic learning with specific application to ontologies
Outline
• Learning: why semantic?
• Learning ontologies: approaches
• Some experiments
  – Dictionary
  – Relation schemata
  – Text
• Future work
Learning ontologies: citations
"Ontology learning aims at developing methods and tools that reduce the manual effort for engineering and managing ontologies." (R. Studer)
"…we can define ontology learning as the set of methods and techniques used for building an ontology from scratch, enriching, or adapting an existing ontology in a semi-automatic fashion using several sources." (Gómez-Pérez & Manzano-Macho)
Research questions
• What type of information is learnable?
• Is it possible to learn in terms of existing semantic formalisms such as OWL?
• How to evaluate the information learned?
• What types of ontologies should be learned (domain ontologies, general (?) ontologies)?
Approaches to learning ontologies: sources (1)
According to Gómez-Pérez & Manzano-Macho (OntoWeb, deliverable 1.5, 2003), ontologies can be learned from
• Text
• Dictionary
• Knowledge base
• Semi-structured data
• Relation schemata
Approaches to learning ontologies: sources (2)
Dictionaries/thesauri
• Reliable information about the domain, presented explicitly in the form of definitions
• Relatively easy to learn concepts/is-a relations using definitions
• May function as the general knowledge source for a given domain
Text
• Domain- and case-specific
• Serves to learn an ontology for a particular case
Approaches: way of learning (1)
• Weakly supervised and supervised approaches: PoS taggers -> parsing -> …
• Unsupervised: grammar inference systems
Approaches to learning ontologies (2)
• Pattern-based: use part-of-speech patterns defined by an expert, e.g. NN is DET ADJ* NN ->
  alitame is a sweetener
  asthma is a chronic medical condition
• Statistical: require a large amount of data
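The pattern-based approach can be illustrated with a minimal sketch. It assumes the definitions are already PoS-tagged with Penn Treebank style tags (NN, DT, JJ); the function name `extract_is_a` and the choice to return only the head noun are illustrative assumptions, not the method used in the experiments.

```python
def extract_is_a(tagged):
    """Match the prefix pattern NN is DET ADJ* NN on (token, tag) pairs.

    Returns (term, head_noun) or None. The tag names are an assumption
    about the tagger's output (Penn Treebank style).
    """
    tokens = [tok for tok, _ in tagged]
    tags = [tag for _, tag in tagged]
    if len(tagged) < 4:
        return None
    # NN "is" DET
    if tags[0] != "NN" or tokens[1].lower() != "is" or tags[2] != "DT":
        return None
    # Skip zero or more adjectives (ADJ*)
    i = 3
    while i < len(tags) and tags[i] == "JJ":
        i += 1
    # The next token must be the noun that closes the pattern
    if i < len(tags) and tags[i] == "NN":
        return tokens[0], tokens[i]
    return None


alitame = [("alitame", "NN"), ("is", "VBZ"), ("a", "DT"), ("sweetener", "NN")]
asthma = [("asthma", "NN"), ("is", "VBZ"), ("a", "DT"),
          ("chronic", "JJ"), ("medical", "JJ"), ("condition", "NN")]

print(extract_is_a(alitame))
print(extract_is_a(asthma))
```

Returning only the head noun drops the modifiers ("chronic medical"); whether those belong in the concept name is a modeling decision left open here.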
Data in food domain
• Glossary of food-related terms (IFIC, International Food Information Council): http://www.ific.org/glossary/index.cfm
• USDA National Nutrient Database for Standard Reference, Release 16: http://www.nal.usda.gov/fnic/foodcomp/Data/SR16/sr16.html
• European Food Safety Authority (reports, summaries)
Extracting taxonomies from a dictionary
Aim
• Use the structure of a glossary/dictionary to find concepts and is-a relations
Approach
• Linguistic analysis (PoS tagging)
• Pattern-based extraction of the concepts
• Building a taxonomy
• Representing the results using some formalism
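The "building a taxonomy" step can be sketched as follows, assuming the earlier steps have already produced (child, parent) is-a pairs. The helper names `build_taxonomy` and `render` are hypothetical, and the sketch does no cycle detection, so it assumes the extracted pairs form a forest.

```python
from collections import defaultdict


def build_taxonomy(pairs):
    """Group (child, parent) is-a pairs into a parent -> children map.

    Returns the map and the sorted list of roots (concepts with no parent).
    """
    children = defaultdict(set)
    has_parent = set()
    for child, parent in pairs:
        children[parent].add(child)
        has_parent.add(child)
    roots = sorted(p for p in children if p not in has_parent)
    return dict(children), roots


def render(children, node, depth=0):
    """Return the subtree under `node` as indented lines."""
    lines = ["  " * depth + node]
    for child in sorted(children.get(node, ())):
        lines.extend(render(children, child, depth + 1))
    return lines


pairs = [
    ("fructose", "monosaccharide"),
    ("galactose", "monosaccharide"),
    ("alitame", "sweetener"),
]
children, roots = build_taxonomy(pairs)
for root in roots:
    print("\n".join(render(children, root)))
```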
Example (1)
• fructose – Fructose is a monosaccharide found naturally in fruits, as an added sugar in a crystalline form and as a component of high-fructose corn syrup (HFCS).
Resulting taxonomy:
  fructose is-a monosaccharide
  galactose is-a monosaccharide
Example (2)
<owl:Class rdf:ID="monosaccharide"></owl:Class>
<owl:Class rdf:ID="fructose">
  <rdfs:subClassOf rdf:resource="#monosaccharide" />
</owl:Class>
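A minimal sketch of how such OWL class declarations could be generated from extracted is-a pairs. The function name `owl_classes` is hypothetical, the output is a fragment without the enclosing rdf:RDF element and namespace declarations, and concept names are assumed to be single tokens that are valid as rdf:ID values.

```python
from xml.sax.saxutils import quoteattr


def owl_classes(pairs):
    """Serialize (child, parent) is-a pairs as owl:Class fragments.

    Returns a list of lines; parents are emitted before their children
    in order of first appearance.
    """
    names = []
    supers = {}
    for child, parent in pairs:
        for name in (parent, child):
            if name not in names:
                names.append(name)
        supers.setdefault(child, []).append(parent)

    lines = []
    for name in names:
        if name not in supers:
            # Top-level class: no subClassOf statement
            lines.append(f"<owl:Class rdf:ID={quoteattr(name)}></owl:Class>")
            continue
        lines.append(f"<owl:Class rdf:ID={quoteattr(name)}>")
        for parent in supers[name]:
            ref = quoteattr("#" + parent)
            lines.append(f"  <rdfs:subClassOf rdf:resource={ref} />")
        lines.append("</owl:Class>")
    return lines


print("\n".join(owl_classes([("fructose", "monosaccharide")])))
```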
Problems
• Linguistic processing is difficult to perform in highly specific domains such as the food domain
  anthocyanidins – A type (NN) of flavonoid (VBD) found in various fruits (NNS) which …
  anthocyanidins – A type (NN) of flavonoid (NN) found in various fruits (NNS) which …
• Lack of an explicit definition scheme
  antibiotics – Antibiotics are used in animal agriculture for two reasons…
USDA National Nutrient Database SR16
• … is the major source of food composition data in the US
• … contains nutrient data for 6,661 food items listed for up to 125 food components such as vitamins, minerals and fatty acids
• … is often updated
USDA National Nutrient Database SR16
The unstructured information provided by SR16 allows for extraction of
• is-a relations
• properties
Grammar inference systems (1)
• Learning syntactic constituents, which correspond to terminals and non-terminals in grammar parlance
• Existing approaches: Adios (2004), Emile
• Assumption 1: there is a relation between semantics and syntax
• Assumption 2: it is possible to learn concepts, attributes and relations
Grammar inference systems (2)
• Data: European Food Safety Authority; Medline
• E.g.,
  E1127 (cereal, fish, nut)
  E1142 (demonstrate, predict, exclude)
  P1140 (coeliac disease)
  P6726 (a, E6727, margin, of, safety)
  E6727 (small, satisfactory, preliminary, large, possible)
• Problems:
  – data sparseness
  – evaluation methods for GI are not appropriate (?)
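To make the induced output concrete, the sketch below expands a pattern that refers to a word class into the phrases it covers. Reading the E-prefixed items as equivalence classes (interchangeable words) and the P-prefixed items as patterns (word sequences) is an assumption based on the usual naming in grammar induction output; the function `expand` is hypothetical and does not handle patterns nested inside other patterns.

```python
from itertools import product

# Items copied from the example above; the dict layout is an assumption.
equivalence_classes = {
    "E6727": ["small", "satisfactory", "preliminary", "large", "possible"],
}
patterns = {
    "P6726": ["a", "E6727", "margin", "of", "safety"],
}


def expand(pattern_id, patterns, classes):
    """List every phrase a pattern covers by substituting class members."""
    # A symbol that names a class contributes all its members;
    # any other symbol is a literal word.
    slots = [classes.get(symbol, [symbol]) for symbol in patterns[pattern_id]]
    return [" ".join(words) for words in product(*slots)]


for phrase in expand("P6726", patterns, equivalence_classes):
    print(phrase)
```

Under this reading, a class such as E6727 is a candidate set of attribute values, and the enclosing pattern suggests the concept they modify.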
Conclusions (1)
• Existing linguistic software must be trained on data from the specific domain (e.g., the food domain)
• Need for cooperative modeling (Morik’93), where the data is first processed automatically and the results are then presented to a domain expert
Conclusions (2)
• Need to integrate learning from various sources in one framework: co-training, bootstrapping …
• Using feedback from an expert to retrain the system
• How to evaluate?
  – Looks-good-to-me approach
  – Comparison to the gold standard ontology
  – Delete-it-and-place-it-back (Faatz)
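Comparison to a gold standard ontology can be sketched as precision and recall over is-a pairs. This is a simplification that counts only exact matches of (child, parent) pairs; the function name and the toy data below are illustrative assumptions, not results from the experiments.

```python
def precision_recall(learned, gold):
    """Compare learned is-a pairs against gold standard pairs.

    Precision: share of learned pairs that are in the gold standard.
    Recall: share of gold standard pairs that were learned.
    """
    learned, gold = set(learned), set(gold)
    correct = len(learned & gold)
    precision = correct / len(learned) if learned else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


# Toy data for illustration only
gold = [("fructose", "monosaccharide"), ("galactose", "monosaccharide")]
learned = [("fructose", "monosaccharide"), ("alitame", "monosaccharide")]

precision, recall = precision_recall(learned, gold)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Exact matching penalizes a learned pair that attaches a concept one level too high or too low in the hierarchy as heavily as a completely wrong one, which is one reason a plain gold standard comparison may be too strict for taxonomies.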
What it may look like …
[Diagram; labels: Clues; ML methods (statistical, grammatical inference); Concepts/relations learned; yes / no]
Future work
• Using deep linguistic processing
• Work on papers: using grammar induction tools
• Cooperative modeling: interactive demo
• So far: learning concepts & relations … future: learning constraints & axioms