
Evaluating Semantic Metadata without the Presence of a Gold Standard


Presentation Transcript


  1. Evaluating Semantic Metadata without the Presence of a Gold Standard Yuangui Lei, Andriy Nikolov, Victoria Uren, Enrico Motta Knowledge Media Institute, The Open University {y.lei,a.nikolov,v.s.uren,e.motta}@open.ac.uk

  2. Focuses
  • A quality model which characterizes quality problems in semantic metadata
  • An automatic detection algorithm
  • Experiments

  3. [Figure: an ontology, the semantic metadata expressed as sets of RDF triples, and the underlying data.]

  4. [Figure: semantic metadata acquisition — semantic metadata generation feeding semantic metadata repositories.]

  5. [Figure: the same acquisition pipeline as slide 4.] A number of problems can occur that decrease the quality of the metadata.

  6. Quality Evaluation
  • Metadata providers: ensuring high quality
  • Users: assessing the trustworthiness of the data
  • Applications: filtering out poor-quality data

  7. Our Quality Evaluation Framework
  • A quality model
  • Assessment metrics
  • An automatic evaluation algorithm

  8. The Quality Model [Figure: the real world, ontologies, data sources, and semantic metadata, connected by modelling, describing/representing, instantiating, and annotating relations.]

  9.–14. Quality Problems [Figure, built up over slides 9–14: data objects mapped to semantic entities — classes C1–C3, instances I1–I4, relations R1 and R2 in the semantic metadata]
  (a) Incomplete Annotation
  (b) Duplicate Annotation
  (c) Ambiguous Annotation
  (d) Spurious Annotation
  (e) Inaccurate Annotation
  (f) Inconsistent Annotation
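
For readers who prefer code to diagrams, the six problem types can be written down as a small data structure. This is only an illustrative sketch; the one-line glosses paraphrase the examples on the following slides rather than the paper's formal definitions.

```python
from dataclasses import dataclass
from enum import Enum

class AnnotationProblem(Enum):
    """The six quality problems of the quality model (paraphrased glosses)."""
    INCOMPLETE = "incomplete"      # a data object has no corresponding semantic entity
    DUPLICATE = "duplicate"        # several equivalent entities annotate the same data object
    AMBIGUOUS = "ambiguous"        # one data object maps to several candidate entities
    SPURIOUS = "spurious"          # a semantic entity corresponds to no real data object
    INACCURATE = "inaccurate"      # the entity exists but is wrongly classified
    INCONSISTENT = "inconsistent"  # the annotation violates ontology constraints

@dataclass
class DetectedProblem:
    """One detected problem for a single metadata entry."""
    entity: str                    # e.g. an instance URI or label
    problem: AnnotationProblem
    evidence: str                  # human-readable justification
```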

  15. Current Support for Evaluation
  • Gold standard based:
    • Examples: GATE [1], LA [2], BDM [3]
    • Feature: assessing the performance of the information extraction techniques used
  • Not suitable for evaluating semantic metadata:
    • Gold standard annotations are often not available

  16. The Semantic Metadata Acquisition Scenario
  [Figure: KMi news stories are processed by an information extraction engine (ESpotter) and departmental databases by a semantic data transformation engine; the resulting raw metadata is evaluated to yield high-quality metadata.]
  • Evaluation needs to take place dynamically, whenever a new entry is generated.
  • In this context, a gold standard is NOT available.

  17. Our Approach
  • Using available knowledge instead of requiring gold standard annotations
  • Knowledge sources specific to the domain:
    • Domain ontologies, data repositories, domain-specific lexicons
  • Background knowledge:
    • The Semantic Web, the Web, and general lexical resources
  • Advantages:
    • Makes automatic operation possible
    • Makes large-scale data evaluation possible

  18. Using Domain Knowledge
  1. Domain ontologies: constraints and restrictions reveal inconsistent annotations.
  Example: one person is classified as both KMi-Member and None-KMi-Member although these are disjoint classes.
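
A minimal sketch of this kind of consistency check, assuming the ontology and metadata are available as rdflib graphs; the framework itself delegates consistency checking to a reasoner (Pellet + Reiter, see slide 24).

```python
from rdflib import Graph
from rdflib.namespace import OWL, RDF

def find_disjointness_violations(ontology: Graph, metadata: Graph):
    """Flag instances typed with two classes that the ontology declares disjoint.

    Simplified illustration only; a full reasoner catches many more violations.
    """
    violations = []
    for c1, _, c2 in ontology.triples((None, OWL.disjointWith, None)):
        members_c1 = set(metadata.subjects(RDF.type, c1))
        members_c2 = set(metadata.subjects(RDF.type, c2))
        for instance in members_c1 & members_c2:
            violations.append((instance, c1, c2))
    return violations
```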

  19. Using Domain Knowledge
  1. Domain ontologies: constraints and restrictions reveal inconsistent annotations.
  2. Domain lexicons: lexicon–instance mappings reveal duplicate annotations.
  Example: OU and Open-University both appear as values of the same property of the same instance.
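
A rough illustration of how lexicon–instance mappings expose such duplicates; the lexicon entries, the input format, and the property name are made up for the example.

```python
# Hypothetical lexicon-instance mappings: surface form -> canonical instance label.
DOMAIN_LEXICON = {
    "OU": "Open-University",
    "Open University": "Open-University",
    "Open-University": "Open-University",
}

def find_duplicate_annotations(property_values):
    """property_values: dict mapping (instance, property) -> list of values.

    Two values of the same property on the same instance that the lexicon maps
    to the same canonical entry are reported as a duplicate annotation.
    """
    duplicates = []
    for (instance, prop), values in property_values.items():
        seen = {}  # canonical label -> first surface form encountered
        for value in values:
            canonical = DOMAIN_LEXICON.get(value, value)
            if canonical in seen and seen[canonical] != value:
                duplicates.append((instance, prop, seen[canonical], value))
            else:
                seen.setdefault(canonical, value)
    return duplicates

# Example: OU and Open-University as values of the same property of the same instance.
print(find_duplicate_annotations({("kmi:news-42", "mentions-organization"): ["OU", "Open-University"]}))
```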

  20. Using Domain Knowledge
  1. Domain ontologies: constraints and restrictions reveal inconsistent annotations.
  2. Domain lexicons: lexicon–instance mappings reveal duplicate annotations.
  3. Domain data repositories: reveal ambiguous and inaccurate annotations.
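
One way to picture the repository look-up; the dictionary-based repository and the string verdicts are illustrative assumptions, not the paper's actual interface.

```python
def check_against_repository(entity_label, asserted_class, repository):
    """repository: dict mapping an entity label -> set of classes it is known under.

    A domain data repository can reveal ambiguous annotations (one label matching
    entities of several classes) and inaccurate ones (the asserted class
    contradicting what the repository records).
    """
    known_classes = repository.get(entity_label)
    if known_classes is None:
        return "unknown"        # nothing found: hand over to background knowledge (slide 21)
    if len(known_classes) > 1:
        return "ambiguous"      # one label, several candidate entities/classes
    if asserted_class in known_classes:
        return "consistent"
    return "inaccurate"
```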

  21. When nothing can be found in the domain knowledge, the data can be:
  • Correct but outside the domain (e.g., IBM in the KMi domain)
  • An inaccurate annotation, i.e., a mis-classification (e.g., Sun Microsystems as a person)
  • Spurious (e.g., a workshop chair as an organization)
  Background knowledge is then used to investigate these cases further.

  22. Investigating the Semantic Web
  [Flowchart: the entity is looked up on the Semantic Web via Watson. If no matches are found, the Web is examined (next slide). If matches are found, WordNet is used to check whether the matched classes are similar to the asserted class: if yes, the data is added to the repositories; if no, the entry is flagged as an inaccurate annotation.]
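
The same decision flow, sketched in Python; `watson_lookup` and `wordnet_similar` are assumed stand-ins for a query to the Watson gateway and a WordNet-based class-similarity test.

```python
def investigate_semantic_web(entity_label, asserted_class,
                             watson_lookup, wordnet_similar, repository):
    """Decision flow of slide 22 (illustrative sketch)."""
    matched_classes = watson_lookup(entity_label)   # classes found on the Semantic Web
    if not matched_classes:
        return "examine_web"                        # no matches: fall through to slide 23
    if any(wordnet_similar(asserted_class, c) for c in matched_classes):
        repository.setdefault(entity_label, set()).add(asserted_class)
        return "accepted"                           # similar classes: add data to the repositories
    return "inaccurate"                             # found, but classified differently
```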

  23. Examining the Web
  [Flowchart: PANKOW is used to classify the entity from the Web. If no classification is found, the entry is flagged as a spurious annotation. If a classification is found, WordNet is used to check whether it is similar to the asserted class; if not, the entry is flagged as an inaccurate annotation.]
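
The corresponding Web-side check, again as a hedged sketch; `pankow_classify` stands in for PANKOW-style classification of the entity from the Web.

```python
def examine_web(entity_label, asserted_class, pankow_classify, wordnet_similar):
    """Decision flow of slide 23 (illustrative sketch)."""
    web_classes = pankow_classify(entity_label)     # classifications mined from the Web
    if not web_classes:
        return "spurious"                           # no classification at all: spurious annotation
    if any(wordnet_similar(asserted_class, c) for c in web_classes):
        return "accepted"
    return "inaccurate"                             # classified, but not as the asserted class
```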

  24. The Overall Picture
  [Architecture: the evaluation engine takes the metadata and produces evaluation results in two steps. Step 1 uses domain knowledge — ontologies and data accessed via SemSearch, with Pellet + Reiter for consistency checking. Step 2 uses background knowledge — the Semantic Web (via WATSON), the Web (via PANKOW), and lexical resources (WordNet).]
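
Glue code composing the sketches from slides 18–23 into this two-step flow; `domain` and `background` are assumed containers for the repository and the helper functions introduced earlier, so this is illustrative only.

```python
def evaluate_entry(entity_label, asserted_class, domain, background):
    """Two-step flow of slide 24, composed from the earlier sketches."""
    # Step 1: use domain knowledge (ontologies, lexicons, data repositories).
    verdict = check_against_repository(entity_label, asserted_class, domain.repository)
    if verdict != "unknown":
        return verdict
    # Step 2: use background knowledge (the Semantic Web, then the Web).
    verdict = investigate_semantic_web(entity_label, asserted_class,
                                       background.watson_lookup,
                                       background.wordnet_similar,
                                       domain.repository)
    if verdict == "examine_web":
        verdict = examine_web(entity_label, asserted_class,
                              background.pankow_classify,
                              background.wordnet_similar)
    return verdict
```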

  25. Addressed Quality Problems
  [Figure: the quality-problem diagram from slides 9–14, revisited to indicate which of the problems (a)–(f) the approach addresses.]

  26. Experiments
  • Data settings: gathered in our previous work [4] on the KMi semantic web portal
    • Randomly chose 36 news stories from the KMi news archive
    • Collected a metadata set using ASDI
    • Constructed a gold standard annotation
  • Method:
    • A gold-standard-based evaluation as the comparison baseline
    • Evaluating the data set using domain knowledge only
    • Evaluating the data set using both domain knowledge and background knowledge

  27. A number of entities are not contained in the problem domain

  28. Background knowledge is useful in data evaluation

  29. Discussion
  • The performance of the approach largely depends on:
    • A good domain-specific knowledge source
    • The entities in the data set being well known to the background knowledge sources; otherwise there will be many false alarms

  30. References
  [1] H. Cunningham, D. Maynard, K. Bontcheva, and V. Tablan. GATE: A Framework and Graphical Development Environment for Robust NLP Tools and Applications. In Proceedings of the 40th Anniversary Meeting of the Association for Computational Linguistics (ACL'02), 2002.
  [2] P. Cimiano, S. Staab, and J. Tane. Acquisition of Taxonomies from Text: FCA meets NLP. In Proceedings of the ECML/PKDD Workshop on Adaptive Text Extraction and Mining, pages 10-17, 2003.
  [3] D. Maynard, W. Peters, and Y. Li. Metrics for Evaluation of Ontology-based Information Extraction. In Proceedings of the 4th International Workshop on Evaluation of Ontologies on the Web, Edinburgh, UK, May 2006.
  [4] Y. Lei, M. Sabou, V. Lopez, J. Zhu, V. S. Uren, and E. Motta. An Infrastructure for Acquiring High Quality Semantic Metadata. In Proceedings of the 3rd European Semantic Web Conference, 2006.
