Text Mining & NLP based Algorithm to populate ontology with A-Box individuals and object properties
Alexandre Kouznetsov and Christopher J. O. Baker, University of New Brunswick, Saint John, joint work with Innovatia Inc
April 13th, 2010

Motivation
Ontologies can play a very important role in information systems, particularly in facilitating information retrieval and data integration. In this contribution we present a semi-automatic method for extracting information, specifically named entities and their relations, from texts and populating a domain ontology. While previous work has proposed solutions to extract named entities and populate them into the classes of an ontology, we focus on the problem of accurately extracting and populating multiple relations between the same named entities and presenting them as distinct object properties between A-box individuals in an OWL-DL ontology.

Multiple Relations Problem
[Figure. T-Box level: object properties hasMother and hasSister, domain class Man, range class Woman. A-Box level: domain instance Samuel, range instance Mary; hasMother? hasSister?]

Methodology
Ontology-based information retrieval applies Natural Language Processing (NLP) to link text segments, named entities and relations between named entities to existing ontologies. In our algorithm we leverage customized gazetteer lists, including lists specific to object property synonyms, and score A-box property candidates using functions of the distance between co-occurring terms. Using ontology reasoning, we build confidence thresholds on A-box property candidate scores. A-box property prediction and population are based on these scores and thresholds.

Algorithm for multi relation detection
[Figure. Diagram labels: Confidence Level Processor, Current T-box, Factors, Axioms-to-Thresholds converter, NLP Processor, Thresholds, Current A-box, Terms representation, Co-occurrence Based Scores generator, Scores, All related content, Should current property be populated?, Decision Rule, Yes/No answer]

Algorithm main modules
1) NLP processor: to extract term(s) to represent each object property
2) Confidence Level processor: to convert settings to Threshold Factors
3) Axioms-to-Thresholds converter: to extract A-box related axioms and convert them into decision boundary Thresholds on property candidate scores
4) Co-occurrence Based Scores generator: to calculate scores based on normalized distances between domain, range and property terms
5) Decision Rule: to populate properties that obtained scores over Thresholds

Semi-Automatic Ontology populating pipeline
[Figure. Diagram labels: Source Documents, XML, Term List (Excel), Ontology unpopulated (OWL), Pre processing, Text Segments Separation, Text Segments, Sentences, Lists, Tables, Bullet Lists, Synonyms, Processing, Named Entities, Connecting Resources, Single Relations, Multi Relations, Ontology Population, Populated Ontology, Using Ontology, Reasoning, Visualizing, Visual Queries]

Implementation tools
Java, OWL API, GATE/JAPE, Pellet

Delivery
A prototype of the semi-automatic ontology populating pipeline is under testing on BioMed (Lipids) and Telecom (Innovatia/Nortel) ontologies.

Acknowledgment
We would like to thank Bradley Shoebottom for his help with Telecom knowledge engineering.
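
Illustrative sketch of score generation and the decision rule
The poster names a co-occurrence based score built from normalized distances between domain, range and property terms, and a decision rule that populates a property when its score exceeds a threshold. It gives neither the scoring function nor the threshold values. The Python sketch below is therefore only one possible reading and is not the authors' Java/OWL API/GATE/Pellet implementation. The function names, the span-based distance, the single-token matching, and the numeric threshold and factor are all assumptions made for illustration.

```python
from itertools import product


def term_positions(tokens, term):
    """Token indices where a single-token term occurs (case-insensitive)."""
    term = term.lower()
    return [i for i, tok in enumerate(tokens) if tok.lower() == term]


def cooccurrence_score(tokens, domain_term, range_term, property_terms):
    """Hypothetical score in [0, 1].

    Takes the smallest token span that covers the domain term, the range
    term and any one property synonym, normalizes it by the segment length,
    and returns 1 minus that value. Returns 0.0 if any of the three kinds
    of term is missing from the segment.
    """
    dom = term_positions(tokens, domain_term)
    rng = term_positions(tokens, range_term)
    prop = [i for t in property_terms for i in term_positions(tokens, t)]
    if not (dom and rng and prop) or len(tokens) < 2:
        return 0.0
    best_span = min(
        max(d, r, p) - min(d, r, p) for d, r, p in product(dom, rng, prop)
    )
    return 1.0 - best_span / (len(tokens) - 1)


def should_populate(score, base_threshold, factor=1.0):
    """Decision rule: populate when the score is over the scaled threshold.

    `base_threshold` stands in for a threshold derived from ontology axioms
    and `factor` for a confidence-level setting; both are assumed inputs.
    """
    return score > base_threshold * factor


# One text segment, two candidate properties between the same individuals.
tokens = "Mary is the mother of Samuel and lives in Saint John".split()
candidates = {
    "hasMother": ["mother", "mom"],  # assumed synonym list
    "hasSister": ["sister"],         # assumed synonym list
}
for prop_name, synonyms in candidates.items():
    score = cooccurrence_score(tokens, "Samuel", "Mary", synonyms)
    print(prop_name, score, should_populate(score, base_threshold=0.4))
```

In this toy segment only the hasMother candidate has a property synonym near both individuals, so it is the only one that clears the assumed threshold. That is the behaviour the multiple relations problem calls for: distinct object properties between the same pair of A-box individuals are scored and decided separately.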