
Ontology Learning and Semantic Annotation: a necessary symbiosis

Emiliano Giovannetti, Simone Marchi, Simonetta Montemagni, Roberto Bartolini. ILC-CNR, Pisa, Italy.


Presentation Transcript


  1. Ontology Learning and Semantic Annotation: a necessary symbiosis
     Emiliano Giovannetti, Simone Marchi, Simonetta Montemagni, Roberto Bartolini
     ILC-CNR, Pisa, Italy

  2. The knowledge acquisition paradox
     • Technologies in the area of knowledge management and information access are confronted with a typical acquisition paradox:
       • access to content requires understanding the linguistic structures representing it in text at a level of considerable detail
       • processing linguistic structures at the depth needed for content understanding presupposes that a considerable amount of domain knowledge is already in place

  3. The knowledge acquisition paradox
     • corpus
     • ontology as a formal representation of domain knowledge
     • ontology learning needs a linguistically annotated corpus
     • advanced linguistic annotation needs an ontology
     • semantically annotated text

  4. Dynamic Content Structuring
     Turning a vicious circle into a “virtuous circle”
     • Text (implicit knowledge)
     • Linguistic annotation
     • Structured content (explicit knowledge)
     • Knowledge extraction

  5. Dynamic Content Structuring
     Turning a vicious circle into a “virtuous circle”: a first step
     • Text (implicit knowledge)
     • Syntactic parsing
     • Structured content (explicit knowledge)
     • Terminology extraction and structuring
     • ontology

  6. Dynamic Content Structuring
     Turning a vicious circle into a “virtuous circle”: a first step
     • Text (implicit knowledge)
     • Syntactic parsing
     • Structured content (explicit knowledge)
     • Domain entity and relation extraction
     • ontology-driven semantic annotation

  7. A case study: semantic annotation of product catalogues
     The challenge
     • product descriptions appear as semi-structured texts, also including portions of running text
     • product catalogues do not contain continuous and linguistically sound text (typically, nominal descriptions)
     • this task requires the combination of different types of evidence and techniques

  8. The system for ontology-based semantic annotation of product catalogues
     • ontology learning component: input catalogue, Product catalogues Terminology Processor, ontology
     • NLP modules: Tokenizer, Morpho Analyzer, Chunker, Dependency Parser
     • semantic annotation component: input catalogue, ontology, Product catalogues Italian Semantic Annotator
     • output: semantic annotation of product descriptions, for example:
       <entity data_id="26"> <name>SANELA</name> </entity>
       <entity data_id="33"> <part>fodera</part> </entity>
       <entity data_id="34"> <material>cotone</material> </entity>
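The entity markup shown on this slide can be reproduced with a few lines of Python. This is only a sketch of the output format: the helper name `make_entity` is hypothetical, and nothing here reflects how the annotator itself is implemented.

```python
import xml.etree.ElementTree as ET


def make_entity(data_id: int, tag: str, text: str) -> ET.Element:
    """Build one <entity> element wrapping a single typed child (hypothetical helper)."""
    entity = ET.Element("entity", {"data_id": str(data_id)})
    child = ET.SubElement(entity, tag)
    child.text = text
    return entity


# The three annotations visible on the slide.
annotations = [
    make_entity(26, "name", "SANELA"),
    make_entity(33, "part", "fodera"),
    make_entity(34, "material", "cotone"),
]

# Serialise each element to a string without an XML declaration.
lines = [ET.tostring(e, encoding="unicode") for e in annotations]
for line in lines:
    print(line)
```

Serialising each entity separately mirrors the slide, where the entities appear as a flat sequence rather than under a single root element.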

  9. The Product catalogues Terminology Processor (PTP) for Ontology Learning
     • Customised version of T2K (Text-to-Knowledge), a hybrid system combining linguistic technologies and statistical techniques
     • Steps: Term Extraction, Semantic Structuring
     • domain terminology: Leg, Shelf, Drawer, Sliding door, Door, Frame, Element, Part, Cover, Support, Top

  10. PTP: semantic structuring – identification of relations
      • Vertical relations identified on the basis of head-sharing
      • Horizontal relations identified on the basis of dynamic distributionally-based similarity measures
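Head-sharing can be illustrated with a small sketch: a multi-word term is linked by is_a to another term that coincides with its head. The code below is not the T2K implementation. It assumes a single-word head, taken as the last word for English terms and (as a simplification) the first word for Italian terms, and the function names are hypothetical.

```python
def head_of(term: str, head_first: bool = False) -> str:
    """Return the assumed head word: last word by default, first word if head_first."""
    words = term.lower().split()
    return words[0] if head_first else words[-1]


def vertical_relations(terms, head_first=False):
    """Link each multi-word term to a known term that equals its head."""
    known = {t.lower() for t in terms}
    relations = []
    for term in terms:
        t = term.lower()
        head = head_of(t, head_first)
        if head != t and head in known:
            relations.append((t, "is_a", head))
    return relations


# Terms taken from the ontology slides of this presentation.
english = ["door", "sliding door", "wood", "solid wood", "steel", "stainless steel", "base"]
italian = ["vetro", "vetro temprato", "schienale", "schienale regolabile"]

en_relations = vertical_relations(english)
it_relations = vertical_relations(italian, head_first=True)
print(en_relations)
print(it_relations)
```

A real system would take the head from the syntactic analysis rather than from word position, which is why the position rule is flagged as an assumption.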

  11. First step: semantic structuring - clustering
      • definition of root concepts: colour, material
      • definition of sub-concepts (is_a):
        • colour: bianco (white), blu scuro (dark blue), beige, rosso (red), grigio (grey)
        • material: acciaio (steel), pino (pine), betulla (birch), alluminio (aluminium), rovere (oak), plastica (plastic), vetro (glass), faggio (beech)
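The slides do not say which similarity measure drives the clustering. As a stand-in, the sketch below uses plain cosine similarity over context-word counts, a common distributional measure. The counts are toy values invented for illustration and are not data from the presentation.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse context-count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical context counts: how often each term co-occurs with a context word.
contexts = {
    "bianco": Counter({"colore": 3, "laccato": 2}),
    "rosso": Counter({"colore": 2, "laccato": 1}),
    "faggio": Counter({"in": 3, "massiccio": 2}),
    "rovere": Counter({"in": 2, "massiccio": 2}),
}


def most_similar(term: str) -> str:
    """Return the other term whose context vector is closest to that of `term`."""
    others = (t for t in contexts if t != term)
    return max(others, key=lambda t: cosine(contexts[term], contexts[t]))


print(most_similar("bianco"))
print(most_similar("faggio"))
```

Terms that share contexts end up close to each other, which is the intuition behind grouping colour terms under one root concept and material terms under another.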

  12. PTP: the final ontology
      • properties: hasPartMaterial, hasPartColour
      • root concepts: material, part, colour
      • is_a (first level): blue, steel, wood, base, door
      • is_a (second level): stainless steel, solid wood, sliding door, light blue

  13. Semantic annotation: the approach
      pattern matching + NLP
      • pattern matching: used to isolate individual product descriptions within the textual flow and to identify their basic building blocks
      • ontology-driven NLP: for each identified product, the natural-language description is processed by a battery of NLP tools in charge of identifying relevant entities (e.g. color, material, parts of a given product) and the relations holding between them (e.g. part_of, color_of)

  14. Product catalogues Italian Semantic Annotator (PISA): ontology-driven semantic annotation
      • components: input catalogue, RegExp Manager, NLP Manager, NLP tools, ontology
      • domain entities: product, part, name, id, type, category, material, color, price, height, width, depth, weight, diameter
      • relations between identified entities:
        • part_of ( product → part )
        • name_of ( (product | series) → name )
        • id_of ( product → id )
        • type_of ( product → type )
        • category_of ( (product | series | part) → category )
        • made_of ( (product | series | part) → material )
        • color_of ( (product | series | part) → color )
        • price_of ( product → price )
        • height_of ( product → height )
        • width_of ( product → width )
        • depth_of ( product → depth )
        • weight_of ( product → weight )
        • diameter_of ( product → diameter )
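The relation signatures listed on this slide lend themselves to a simple lookup table. The sketch below is one possible encoding, not PISA's internal representation. The names `RELATION_SIGNATURES` and `is_admissible` are hypothetical, and only a subset of the relations is spelled out.

```python
# Each relation maps to (allowed source entity types, target entity type),
# following the signatures listed on the slide.
RELATION_SIGNATURES = {
    "part_of": ({"product"}, "part"),
    "name_of": ({"product", "series"}, "name"),
    "id_of": ({"product"}, "id"),
    "type_of": ({"product"}, "type"),
    "category_of": ({"product", "series", "part"}, "category"),
    "made_of": ({"product", "series", "part"}, "material"),
    "color_of": ({"product", "series", "part"}, "color"),
    "price_of": ({"product"}, "price"),
}


def is_admissible(relation: str, source_type: str, target_type: str) -> bool:
    """Check a candidate relation instance against its signature."""
    signature = RELATION_SIGNATURES.get(relation)
    if signature is None:
        return False
    allowed_sources, expected_target = signature
    return source_type in allowed_sources and target_type == expected_target


print(is_admissible("made_of", "part", "material"))
print(is_admissible("price_of", "part", "price"))
```

Keeping the signatures as data means a candidate relation can be rejected before any further processing when its argument types do not fit.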

  15. PISA: semantic annotation - pattern matching
      • fields isolated by the pattern: name, type, price, description, dimensions, product id
      • the description is to be processed by the NLP manager to extract entities and relations about parts, materials, colours, etc.
      • pattern:
        ([A-Z]{3,}\s)+(.+)?(€[\d,\/\spz]+\.)([\w|\s|\.]+)(Cm\s\d{1,3}.\d{1,3}\.)(\d{3}\.\d{3}\.\d{2})
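To see how the pattern on this slide splits a catalogue entry, it can be run directly with Python's `re` module. The sample line below is invented for illustration, and the mapping of capture groups to field names follows the order of the labels on the slide, which is an assumption.

```python
import re

# The regular expression from the slide, split across lines for readability.
PATTERN = re.compile(
    r"([A-Z]{3,}\s)+"             # name: one or more upper-case words
    r"(.+)?"                      # type: free text up to the price
    r"(€[\d,\/\spz]+\.)"          # price
    r"([\w|\s|\.]+)"              # description (handed on to the NLP manager)
    r"(Cm\s\d{1,3}.\d{1,3}\.)"    # dimensions
    r"(\d{3}\.\d{3}\.\d{2})"      # product id
)

FIELD_NAMES = ["name", "type", "price", "description", "dimensions", "product_id"]

# Hypothetical catalogue line, not taken from a real catalogue.
sample = "SANELA fodera per cuscino €7,95/pz. Fodera in cotone Cm 50x50.123.456.78"

match = PATTERN.search(sample)
fields = {}
if match:
    # A repeated group keeps only its last repetition, so for a multi-word
    # name group 1 would hold just the final word.
    fields = {name: (value or "").strip()
              for name, value in zip(FIELD_NAMES, match.groups())}

for name, value in fields.items():
    print(f"{name}: {value}")
```

Only the description field then needs linguistic analysis. The other fields are already typed by their position in the pattern.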

  16. PISA: ontology for semantic annotation (entity recognition)
      • properties: hasPart, hasPartMaterial
      • root concepts: product, part, material
      • is_a (first level): table, glass, wood, base, door
      • is_a (second level): tempered glass, solid wood, sliding door
      • linguistic analysis of the input:
        [ [ CC: N_C] [ AGR: @FP] [ POTGOV: ANTA#S@FP]]
        [ [ CC: P_C] [ AGR: @MS] [ PREP: IN#E] [ POTGOV: VETRO__TEMPRARE|TEMPRATO#S@MS]]
        [ [ CC: PUNC_C] [ PUNCTYPE: .#@]] {. }

  17. PISA: ontology for semantic annotation (relation extraction)
      • property: hasPart
      • root concepts: prodotto (product), materiale (material), parte (part)
      • is_a (first level): sedia, vetro, plastica, base, schienale
      • is_a (second level): vetro temprato, bevel edged plate, schienale regolabile
      • example: “Sedia in plastica con schienale regolabile” (plastic chair with adjustable back)
        • “sedia” is a kind of “prodotto”
        • “plastica” is a kind of “materiale”
        • “schienale regolabile” is a kind of “parte”
      • Where to attach “schienale regolabile”: to “sedia” or to “plastica”? There is no property linking a Material to a Part, but there is one linking a Product to a Part, so the correct interpretation is that “schienale regolabile” is a part of “sedia”.
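The attachment decision described on this slide can be sketched as a lookup over the ontology: walk each term up to its root concept, then keep the candidate head for which a property to the modifier's concept exists. The dictionaries and function names below are hypothetical and cover only the fragment needed for the example.

```python
# Fragment of the is_a hierarchy from the slide (child -> parent).
ISA = {
    "sedia": "prodotto",
    "plastica": "materiale",
    "vetro": "materiale",
    "schienale": "parte",
    "base": "parte",
    "schienale regolabile": "schienale",
    "vetro temprato": "vetro",
}

# Properties as (domain concept, range concept) -> property name.
PROPERTIES = {("prodotto", "parte"): "hasPart"}


def root_concept(term: str) -> str:
    """Follow is_a links until a term with no parent is reached."""
    while term in ISA:
        term = ISA[term]
    return term


def attach(modifier: str, candidates: list):
    """Return (head, property) for the first candidate the ontology licenses."""
    target = root_concept(modifier)
    for head in candidates:
        prop = PROPERTIES.get((root_concept(head), target))
        if prop is not None:
            return head, prop
    return None


# Candidates are listed nearest first: "plastica" is closer to the modifier in the text.
result = attach("schienale regolabile", ["plastica", "sedia"])
print(result)
```

The nearer candidate, “plastica”, is rejected because no property links materiale to parte, so the attachment falls through to “sedia”.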

  18. An example of semantic annotation

  19. An example of semantic annotation: entities annotation

  20. An example of semantic annotation: relations annotation

  21. Evaluation of acquired results
      • Preliminary evaluation was carried out:
        • “task based” evaluation of the ontology learning component: provided in terms of correctness in supporting semantic annotation
        • evaluation of the semantic annotation component: a “gold-standard” reference corpus was created by randomly extracting and manually annotating about 100 IKEA products
      • Measures: precision, recall and F-measure, computed from:
        • number of correct annotations
        • number of partially correct annotations
        • total number of annotations in the gold standard (correct + partially correct + missing)
        • total number of annotations produced (correct + incorrect + partially correct)
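The slide names the quantities involved but does not show how partially correct annotations are weighted, nor which F-measure is meant. The sketch below therefore rests on two assumptions: partial matches receive half credit (the `partial_weight` parameter, hypothetical) and the F-measure is the balanced harmonic mean. The counts in the usage line are made up.

```python
def evaluate(correct: int, partial: int, incorrect: int, missing: int,
             partial_weight: float = 0.5):
    """Return (precision, recall, f_measure) from annotation counts.

    partial_weight is an assumption: the credit given to a partially
    correct annotation is not stated on the slide.
    """
    produced = correct + incorrect + partial   # total number of annotations produced
    gold = correct + partial + missing         # total annotations in the gold standard
    credit = correct + partial_weight * partial
    precision = credit / produced if produced else 0.0
    recall = credit / gold if gold else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


# Hypothetical counts, for illustration only.
precision, recall, f_measure = evaluate(correct=6, partial=4, incorrect=0, missing=6)
print(f"precision={precision:.3f} recall={recall:.3f} F={f_measure:.3f}")
```

Setting `partial_weight` to 0 or 1 gives the strict and lenient variants of the same scores.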

  22. Further directions of research
      • system portability to other product catalogues:
        • “Zanotta” furniture catalogue: subset of 30 product descriptions extracted as a “gold-standard” of reference and manually annotated
        • product catalogues in other domains
      • application of the methodology to other domains and to non-structured (free) corpora
      • more steps towards the triggering of the “virtuous circle”:
        • next step: exploiting the results obtained from the semantic annotation to enrich the ontology

  23. THANK YOU!
