
Semantically Conceptualizing and Annotating Tables

This research paper discusses the implementation and experimentation of TANGO and MOGO, two tools for semantically enriching and integrating raw tables into a growing ontology. It highlights the challenges, opportunities, and enhancements in table interpretation, concept/value recognition, relationship discovery, and constraint discovery.


Presentation Transcript


  1. Semantically Conceptualizing and Annotating Tables Stephen Lynn & David W. Embley Data Extraction Research Group Department of Computer Science Brigham Young University Supported by the

  2. Overview • Context • WoK: Web of Knowledge • TANGO: Table ANalysis for Generating Ontologies • MOGO: Mini-Ontology GeneratOr • Semantic Enrichment via MOGO • Implementation • Experimentation • Enhancements • Challenges & Opportunities

  3. WoK: a Web of Knowledge

  4. TANGO TANGO repeatedly turns raw tables into conceptual mini-ontologies and integrates them into a growing ontology. Growing Ontology

  5. MOGO MOGO generates mini-ontologies from interpreted tables.

  6. MOGO Overview • Table • Interpretation • Yields a canonical table • Canonical Table • Concept/Value Recognition • Relationship Discovery • Constraint Discovery • Yields a semantically enriched conceptual model • Mini-ontology • Integration into a growing ontology MOGO

  7. Sample Input Sample Output

  8. Concept/Value Recognition • Lexical Clues • Labels as data values • Data value assignment • Data Frame Clues • Labels as data values • Data value assignment • Default • Recognize concepts and values by syntax and layout

  9. Concept/Value Recognition • Concepts and Value Assignments • Location • Region: Northeast, Northwest • State: Delaware, Maine, Oregon, Washington

  10. Concept/Value Recognition • Concepts and Value Assignments • Year: 2002, 2003 • Location • Region: Northeast, Northwest • State: Delaware, Maine, Oregon, Washington • Population: 2,122,869; 817,376; 1,305,493; 9,690,665; 3,559,547; 6,131,118 • Latitude: 45, 44, 45, 43 • Longitude: -90, -93, -120, -120
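The recognition step on slides 8 to 10 can be illustrated with a minimal sketch. It assumes a small word list stands in for lexical clues and a pair of regular expressions stands in for data frames. The names `LEXICON`, `PATTERNS`, `recognize`, and `assign_values` are hypothetical and are not taken from MOGO; the sample values come from the slide.

```python
import re

# Hypothetical lexicons standing in for lexical clues (not MOGO's own resources).
LEXICON = {
    "Region": {"Northeast", "Northwest"},
    "State": {"Delaware", "Maine", "Oregon", "Washington"},
}

# Hypothetical regex recognizers standing in for data frames.
PATTERNS = {
    "Year": re.compile(r"(19|20)\d{2}"),
    "Population": re.compile(r"\d{1,3}(,\d{3})+"),
}


def recognize(cell):
    """Return the concept a cell value is assigned to, or None if unrecognized."""
    text = cell.strip()
    for concept, words in LEXICON.items():
        if text in words:
            return concept
    for concept, pattern in PATTERNS.items():
        if pattern.fullmatch(text):
            return concept
    return None


def assign_values(cells):
    """Group recognized cell values under their concept names."""
    assignments = {}
    for cell in cells:
        concept = recognize(cell)
        if concept is not None:
            assignments.setdefault(concept, []).append(cell.strip())
    return assignments


sample = assign_values(["Maine", "Northeast", "2002", "817,376", "n/a"])
```

Cells that match neither a lexicon nor a pattern are left unassigned here; on the slides, such cases fall to the default of recognizing concepts and values by syntax and layout, which this sketch does not model.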

  11. Relationship Discovery • Dimension Tree Mappings • Lexical Clues • Generalization/Specialization • Aggregation • Data Frames • Ontology Fragment Merge

  12. Relationship Discovery • Dimension Tree Mappings • Lexical Clues • Generalization/Specialization • Aggregation • Data Frames • Ontology Fragment Merge
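One way to picture a dimension tree mapping is to walk the nested table labels and emit a candidate relationship for each parent and child pair. This is a sketch under assumptions: the nested-dict tree encoding and the name `candidate_edges` are hypothetical, and deciding whether an edge is a generalization/specialization, an aggregation, or a plain relationship set would need the lexical and data-frame clues listed on the slide, which are not modeled here.

```python
def candidate_edges(tree, parent=None):
    """Yield (parent, child) label pairs from a nested-dict dimension tree."""
    for label, subtree in tree.items():
        if parent is not None:
            yield (parent, label)
        # Recurse so that deeper label nesting also produces candidate edges.
        yield from candidate_edges(subtree, label)


# Label nesting taken from the sample on the Concept/Value Recognition slides.
tree = {"Location": {"Region": {}, "State": {}}}
edges = list(candidate_edges(tree))
```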

  13. Constraint Discovery • Generalization/Specialization • Computed Values • Functional Relationships • Optional Participation
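Three of these constraint checks can be sketched as simple tests over table rows. The function names and the row encoding are hypothetical, not MOGO's. The population figures come from the sample slide, read as a region total followed by its two state values, which is an assumption about the flattened table. The latitude rows are invented for illustration.

```python
def is_functional(rows, src, dst):
    """True if every src value maps to exactly one dst value across the rows."""
    seen = {}
    for row in rows:
        key, value = row[src], row[dst]
        if key in seen and seen[key] != value:
            return False
        seen[key] = value
    return True


def is_computed_total(total, parts):
    """True if a value equals the sum of the values assumed to compose it."""
    return total == sum(parts)


def is_optional(rows, column):
    """True if at least one row has no value for the column."""
    return any(row.get(column) in (None, "") for row in rows)


rows = [
    {"State": "Delaware", "Region": "Northeast"},
    {"State": "Maine", "Region": "Northeast"},
    {"State": "Oregon", "Region": "Northwest"},
    {"State": "Washington", "Region": "Northwest"},
]
state_to_region = is_functional(rows, "State", "Region")   # each state has one region
region_to_state = is_functional(rows, "Region", "State")   # a region has several states
northeast_is_sum = is_computed_total(2122869, [817376, 1305493])
```

A real table would need tolerance for rounding and for partially filled rows; the exact-equality checks here only show the shape of each test.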

  14. Validation • Concept/Value Recognition • Correctly identified concepts • Missed concepts • False positives • Data values assignment • Relationship Discovery • Valid relationship sets • Invalid relationship sets • Missed relationship sets • Constraint Discovery • Valid constraints • Invalid constraints • Missed constraints

  15. Concept Recognition • Counted: • Correct/Incorrect/Missing Concepts • Correct/Incorrect/Missing Labels • Data value assignments

  16. Relationship Discovery • Counted: • Correct/incorrect/missing relationship sets • Correct/incorrect/missing aggregations and generalization/specializations

  17. Constraint Discovery • Counted: • Correct/Incorrect/Missing: • Generalization/Specialization constraints • Computed value constraints • Functional constraints • Optional constraints
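Counts of correct, incorrect, and missing items like those on slides 15 to 17 are typically turned into precision, recall, and F-measure. The slides name recall and F-measure but do not give the formulas, so the sketch below assumes the standard definitions; the function name `scores` and the example counts are invented for illustration.

```python
def scores(correct, incorrect, missing):
    """Return (precision, recall, f_measure) from correct/incorrect/missing counts."""
    found = correct + incorrect      # everything the tool produced
    expected = correct + missing     # everything it should have produced
    precision = correct / found if found else 0.0
    recall = correct / expected if expected else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


# Invented counts: 8 correct, 2 incorrect, 0 missing.
precision, recall, f_measure = scores(8, 2, 0)
```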

  18. Concept Recognition • Successes • 98% of concepts identified • Missing label identification • 97% of values assigned to correct concept • Common problems • Finding an appropriate label • Duplicate concepts

  19. Relationship Discovery • Recall of 92% for relationship sets • Missing aggregations and gen./spec.’s (only found in label nesting) • Unnecessary rel. sets generated (are computable)

  20. Constraint Discovery • F-measure of 98% for functional relationship sets • Computed value discovery • Functional/non-functional lists in cells

  21. MOGO Contributions • Tool to generate mini-ontologies • Accuracy encouraging

  22. Opportunities & Challenges: MOGO • Enhancements • Check for inter-label relationships • Check for more complex computations • Check for lists in cells • … • Wish List • Data-frame library • Atomic knowledge components • Instance recognizers • Library of molecular components • Semi-automatic construction of a WordNet-like resource for knowledge components

  23. Summary • MOGO • Semantic Enrichment • Encouraging Results • But More Possible • Broader Implications ~ Vision & Challenges • TANGO • WoK • Web of Data • Semantic Annotation • User-friendly Query Answering www.deg.byu.edu embley@cs.byu.edu

  24. Opportunities & Challenges: TANGO • Table Interpretation • Transforming tables to F-logic [Pivk07] • Layout-independent table representation [Jha08] • Table interpretation by sibling tables [Tao07] • Semantic Enhancement / Ontology Generation • Naming unnamed table concepts [Pivk07] • MOGO [Lynn09] • Semi-automatic Ontology Integration • Ontology Matching [Euzenat07] • Ontology-mapping tools [Falconer07] • Direct and indirect schema mappings for TANGO [Xu06]

  25. Opportunities & Challenges: WoK • Web of Data • “The Semantic Web is a web of data.” [W3C] • Upcoming special issue of Journal of Web Semantics • “Enabling a Web of Knowledge” [Tao09] • Information Extraction • Domain-independent IE from web tables [Gatterbauer07] • Open IE [Banko07] • …

  26. Opportunities & Challenges: WoK • … • Semantic Annotation wrt Ontologies • Linking Data to Ontologies [Poggi08] • TISP [Tao07] • FOCIH [Tao09] • Reasoning & Query Answering • Description Logics [Baader03] • NLIDB Community • AskOntos [Ding06] • SerFR [Al-Muhammed07]

  27. References • [Al-Muhammed07] Al-Muhammed and Embley, “Ontology-Based Constraint Recognition for Free-Form Service Requests”, Proceedings of the 23rd International Conference on Data Engineering, 2007. • [Baader03] Baader, Calvanese, McGuinness, Nardi and Patel-Schneider, The Description Logic Handbook, Cambridge University Press, 2003. • [Banko07] Banko, Cafarella, Soderland, Broadhead and Etzioni, “Open Information Extraction from the Web”, Proceedings of the International Joint Conference on Artificial Intelligence, 2007. • [Ding06] Ding, Embley and Liddle, “Automatic Creation and Simplified Querying of Semantic Web Content: An Approach Based on Information-Extraction Ontologies”, Proceedings of the First Asian Semantic Web Conference, 2006. • [Euzenat07] Euzenat and Shvaiko, Ontology Matching, Springer Verlag, 2007. • [Falconer07] Falconer, Noy and Storey, “Ontology Mapping - A User Survey”, Proceedings of the Second International Workshop on Ontology Mapping, 2007. • [Gatterbauer07] Gatterbauer, Bohunsky, Herzog and Pollak, “Towards Domain-Independent Information Extraction from Web Tables”, Proceedings of the Sixteenth International World Wide Web Conference, 2007. • [Jha08] Jha and Nagy, “Wang Notation Tool: Layout Independent Representation of Tables”, Proceedings of the 19th International Conference on Pattern Recognition, 2008. • [Pivk07] Pivk, Sure, Cimiano, Gams, Rajkovič and Studer, “Transforming Arbitrary Tables into Logical Form with TARTAR”, Data & Knowledge Engineering, 2007. • [Poggi08] Poggi, Lembo, Calvanese, DeGiacomo, Lenzerini and Rosati, “Linking Data to Ontologies”, Journal on Data Semantics, 2008. • [Tao07] Tao and Embley, “Automatic Hidden-Web Table Interpretation by Sibling Page Comparison”, Proceedings of the 26th International Conference on Conceptual Modeling, 2007. • [Tao09] Tao, Embley and Liddle, “Enabling a Web of Knowledge”, Technical Report: tango.byu.edu/papers, 2009. • [Xu06] Xu and Embley, “A Composite Approach to Automating Direct and Indirect Schema Mappings”, Information Systems, 2006.
