
Transforming Arbitrary Tables into F-Logic Frames with TARTAR

Aleksander Pivk, York Sure, Philipp Cimiano, Matjaz Gams, Vladislav Rajkovic, Rudi Studer. Presented by Stephen Lynn.

Presentation Transcript


  1. Transforming Arbitrary Tables into F-Logic Frames with TARTAR • Aleksander Pivk, York Sure, Philipp Cimiano, Matjaz Gams, Vladislav Rajkovic, Rudi Studer • Presented by Stephen Lynn

  2. Information Extraction • Free-form Text • Linguistic/NLP approaches • Tabular Structures • Table comprehension task • HTML, Excel, PDF, plain text, etc. • Semantic interpretation task • Requires more effort?

  3. TARTAR Architecture

  4. Semantic Representation • Frame Logic (F-Logic) • Model-theoretic semantics • Complete resolution-based proof theory • Expressive power of logic • Availability of efficient reasoning tools

  5. F-Logic Frame
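The F-Logic frame slide is an image in the original. To give a feel for the target representation, here is a minimal sketch that renders a frame as an F-Logic-style signature string. The `Frame` and `Attribute` classes and the `Tour` example are hypothetical, and the concrete syntax (`name(ParamType) => RangeType`) is only one of several F-Logic notations; it is not claimed to be TARTAR's exact output format.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Attribute:
    """One method signature in a frame, e.g. Price(PERSON) => MONEY (hypothetical)."""
    name: str
    range_type: str
    param_types: list[str] = field(default_factory=list)  # empty for plain attributes


@dataclass
class Frame:
    concept: str
    attributes: list[Attribute]

    def to_flogic(self) -> str:
        """Render the frame as an F-Logic-style signature string."""
        sigs = []
        for a in self.attributes:
            if a.param_types:
                args = ", ".join(a.param_types)
                sigs.append(f"{a.name}({args}) => {a.range_type}")
            else:
                sigs.append(f"{a.name} => {a.range_type}")
        return f"{self.concept}[" + "; ".join(sigs) + "]"


frame = Frame("Tour", [Attribute("Code", "STRING"),
                       Attribute("Price", "MONEY", ["PERSON"])])
print(frame.to_flogic())
# Tour[Code => STRING; Price(PERSON) => MONEY]
```

A parameterized attribute such as `Price(PERSON)` is how a table whose values depend on a second header dimension (for example, price per person type) can be expressed in a single frame.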

  6. Table Comprehension • Dimensions – a grouping of cells representing similar entities

  7. Table Comprehension • Stub – dimension with headers used to index elements in the body

  8. Table Comprehension • Box head – column headers (often nested)

  9. Table Comprehension • Body – data values
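The table-comprehension vocabulary above (stub, box head, body) can be made concrete with a small sketch that splits a rectangular grid into those three regions. It assumes the number of header rows and stub columns is already known; in TARTAR these are discovered by structure detection, so `split_regions` and its parameters are hypothetical illustrations only.

```python
from typing import List, NamedTuple

Grid = List[List[str]]


class TableRegions(NamedTuple):
    box_head: Grid  # column headers above the body
    stub: Grid      # row headers to the left of the body
    body: Grid      # data values


def split_regions(grid: Grid, head_rows: int, stub_cols: int) -> TableRegions:
    """Split a rectangular grid, given how many leading rows/columns hold headers.

    The top-left corner cells (above the stub, left of the box head) are dropped.
    """
    box_head = [row[stub_cols:] for row in grid[:head_rows]]
    stub = [row[:stub_cols] for row in grid[head_rows:]]
    body = [row[stub_cols:] for row in grid[head_rows:]]
    return TableRegions(box_head, stub, body)


grid = [
    ["Tour", "Price", "Days"],
    ["Alps", "900", "7"],
    ["Coast", "650", "5"],
]
regions = split_regions(grid, head_rows=1, stub_cols=1)
print(regions.box_head)  # [['Price', 'Days']]
print(regions.stub)      # [['Alps'], ['Coast']]
print(regions.body)      # [['900', '7'], ['650', '5']]
```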

  10. Table Classes • 1D, 2D, Complex
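As a rough intuition for the three table classes, the sketch below classifies a table from its header counts alone. This is a hypothetical heuristic, not TARTAR's classification procedure: headers along one axis suggest a 1D table, one header row plus one stub column suggests a 2D table, and multi-level (nested) headers suggest a complex table.

```python
def classify_table(head_rows: int, stub_cols: int) -> str:
    """Hypothetical heuristic: guess the table class from header counts."""
    if head_rows > 1 or stub_cols > 1:
        return "complex"  # nested or multi-level headers
    if head_rows == 1 and stub_cols == 1:
        return "2D"       # a header row indexes columns, a stub column indexes rows
    return "1D"           # headers along at most one axis


print(classify_table(1, 0))  # 1D
print(classify_table(1, 1))  # 2D
print(classify_table(2, 1))  # complex
```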

  11. Methodology

  12. Cleaning & Canonicalization • Clean DOM tree • CyberNeko HTML Parser • Rowspan/Colspan expansion
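Rowspan/colspan expansion turns an HTML table into a regular grid so that later steps can address every cell by (row, column). The sketch below assumes cells have already been parsed (for example from the cleaned DOM tree) into a hypothetical `Cell` record and copies each spanning cell's text into every position it covers; duplicating the text is one common canonicalization choice and is an assumption here.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    text: str
    rowspan: int = 1
    colspan: int = 1


def expand_spans(rows):
    """Return a rectangular grid where each spanning cell's text fills all covered slots."""
    grid = {}  # (row, col) -> text
    for r, row in enumerate(rows):
        c = 0
        for cell in row:
            while (r, c) in grid:  # skip slots already filled by a rowspan from above
                c += 1
            for dr in range(cell.rowspan):
                for dc in range(cell.colspan):
                    grid[(r + dr, c + dc)] = cell.text
            c += cell.colspan
    if not grid:
        return []
    n_rows = max(r for r, _ in grid) + 1
    n_cols = max(c for _, c in grid) + 1
    return [[grid.get((r, c), "") for c in range(n_cols)] for r in range(n_rows)]


rows = [
    [Cell("Tour", rowspan=2), Cell("Price", colspan=2)],
    [Cell("Adult"), Cell("Child")],
    [Cell("Alps"), Cell("900"), Cell("450")],
]
for row in expand_spans(rows):
    print(row)
# ['Tour', 'Price', 'Price']
# ['Tour', 'Adult', 'Child']
# ['Alps', '900', '450']
```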

  13. Structure Detection • Token Type Hierarchy • Assign Functional Types and Probabilities
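The token types and functional types can be illustrated with a simplified sketch. The regex-based type list below is a hypothetical, flattened stand-in for TARTAR's token type hierarchy, and the rule "the share of numeric tokens is the probability of being a data cell" is a crude assumption used only to show how a functional type (A = access/header, D = data) and a probability can be attached to each cell.

```python
import re

# Hypothetical, flattened token type list (most specific patterns first).
TOKEN_PATTERNS = [
    ("INT", re.compile(r"[+-]?\d+")),
    ("FLOAT", re.compile(r"[+-]?\d+[.,]\d+")),
    ("CAPITALIZED", re.compile(r"[A-Z][a-z]+")),
    ("ALPHA", re.compile(r"[A-Za-z]+")),
    ("ALPHANUM", re.compile(r"[A-Za-z0-9]+")),
]


def token_type(token: str) -> str:
    for name, pattern in TOKEN_PATTERNS:
        if pattern.fullmatch(token):
            return name
    return "OTHER"


def functional_type(cell_text: str):
    """Return ('D', p) or ('A', p): a guessed functional type and its probability.

    Assumption: the share of numeric tokens is used as P(data cell).
    """
    tokens = cell_text.split()
    p_data = (sum(token_type(t) in ("INT", "FLOAT") for t in tokens) / len(tokens)
              if tokens else 0.0)
    return ("D", p_data) if p_data >= 0.5 else ("A", 1 - p_data)


print(token_type("9.5"), token_type("Alps"), token_type("A4"))  # FLOAT CAPITALIZED ALPHANUM
print(functional_type("900"))    # ('D', 1.0)
print(functional_type("Price"))  # ('A', 1.0)
```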

  14. Structure Detection • Detect Logical Table Orientation
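One way to picture orientation detection: in a column-oriented table, values down each column tend to share a type, while rows mix types. The sketch below compares average type homogeneity of rows and columns over the non-header cells; the two-way NUM/TEXT typing and the `orientation` function are hypothetical simplifications, not TARTAR's actual procedure. It assumes a non-empty, rectangular input.

```python
import re
from collections import Counter

NUMBER = re.compile(r"[+-]?\d+(?:[.,]\d+)?")


def coarse_type(cell: str) -> str:
    return "NUM" if NUMBER.fullmatch(cell.strip()) else "TEXT"


def homogeneity(cells) -> float:
    """Share of the most common coarse type among the given cells."""
    counts = Counter(coarse_type(c) for c in cells)
    return counts.most_common(1)[0][1] / len(cells)


def orientation(cells) -> str:
    """'column' if types are more uniform down columns than across rows, else 'row'."""
    rows_h = sum(homogeneity(r) for r in cells) / len(cells)
    cols = list(zip(*cells))
    cols_h = sum(homogeneity(c) for c in cols) / len(cols)
    return "column" if cols_h >= rows_h else "row"


print(orientation([["Alps", "900"], ["Coast", "650"]]))  # column
print(orientation([["Alps", "Coast"], ["900", "650"]]))  # row
```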

  15. Structure Detection • Discover and Level Regions • Logical Units
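Region discovery can be sketched as grouping cells with the same functional type into rectangles. The version below is a hypothetical simplification: it finds maximal runs of equal type within each row and merges identical runs in consecutive rows into rectangular regions, returned as `(type, top, bottom, left, right)`. TARTAR's own region discovery and levelling are more involved.

```python
from itertools import groupby


def row_runs(types_row):
    """Maximal runs of equal functional type in one row: (type, left_col, right_col)."""
    runs, col = [], 0
    for t, group in groupby(types_row):
        n = len(list(group))
        runs.append((t, col, col + n - 1))
        col += n
    return runs


def find_regions(type_grid):
    """Merge identical runs in consecutive rows into (type, top, bottom, left, right)."""
    open_regions = {}  # (type, left, right) -> [top, bottom]
    result = []
    for r, row in enumerate(type_grid):
        current = set(row_runs(row))
        for key in list(open_regions):
            if key not in current:  # the region ended in the previous row
                top, bottom = open_regions.pop(key)
                result.append((key[0], top, bottom, key[1], key[2]))
        for key in current:
            if key in open_regions:
                open_regions[key][1] = r  # extend downwards
            else:
                open_regions[key] = [r, r]
    for key, (top, bottom) in open_regions.items():
        result.append((key[0], top, bottom, key[1], key[2]))
    return sorted(result, key=lambda reg: (reg[1], reg[3]))


type_grid = [
    ["A", "A", "A"],  # box head
    ["A", "D", "D"],  # stub cell + data
    ["A", "D", "D"],
]
print(find_regions(type_grid))
# [('A', 0, 0, 0, 2), ('A', 1, 2, 0, 0), ('D', 1, 2, 1, 2)]
```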

  16. FTM Building • Functional Table Model (FTM) • Arrange regions into a tree • Leaf nodes are data
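To illustrate "arrange regions into a tree, leaf nodes are data", the sketch below builds a nested dictionary from an expanded box head: each column's header path becomes a path in the tree and the column's data values form the leaf. It covers only the box-head side (the stub is ignored) and collapses repeated labels produced by rowspan expansion; `build_ftm` is a hypothetical simplification of the Functional Table Model, not its actual construction.

```python
def build_ftm(box_head, body):
    """Nested dict: header label -> ... -> list of data values (the leaves)."""
    root = {}
    for col in range(len(body[0])):
        # Header path for this column, with duplicates from rowspan expansion collapsed.
        path = []
        for row in box_head:
            if not path or row[col] != path[-1]:
                path.append(row[col])
        node = root
        for label in path[:-1]:
            node = node.setdefault(label, {})
        node[path[-1]] = [row[col] for row in body]
    return root


box_head = [["Price", "Price", "Days"],
            ["Adult", "Child", "Days"]]
body = [["900", "450", "7"],
        ["650", "300", "5"]]
print(build_ftm(box_head, body))
# {'Price': {'Adult': ['900', '650'], 'Child': ['450', '300']}, 'Days': ['7', '5']}
```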

  17. Semantic Enriching of FTM • Labeling • WordNet and GoogleSets • Map FTM to a frame
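Labeling and frame mapping can be sketched as: find a common label for a group of sibling headers, then turn the tree into a frame signature. In the sketch below a small hand-written `HYPERNYMS` dictionary is a hypothetical stand-in for WordNet lookups (GoogleSets is not modeled), and the mapping rule (nested header groups become parameterized attributes, numeric leaves get type `NUMBER`) is an assumption chosen for illustration.

```python
# Hypothetical, hand-written stand-in for WordNet hypernym lookups.
HYPERNYMS = {
    "adult": ["person"],
    "child": ["person"],
    "alps": ["mountain range", "location"],
    "coast": ["shore", "location"],
}


def common_label(terms, hypernyms=HYPERNYMS):
    """Return the first hypernym shared by all terms, or None."""
    candidates = [hypernyms.get(t.lower(), []) for t in terms]
    if not candidates or not candidates[0]:
        return None
    for h in candidates[0]:
        if all(h in c for c in candidates[1:]):
            return h
    return None


def range_type(values):
    """Assumption: all-numeric leaves are typed NUMBER, anything else STRING."""
    return "NUMBER" if all(v.replace(".", "", 1).isdigit() for v in values) else "STRING"


def ftm_to_frame(concept, ftm):
    """Map a one-level-nested tree to an F-Logic-style frame string."""
    sigs = []
    for attr, child in ftm.items():
        if isinstance(child, dict):  # nested header group -> parameterized attribute
            label = (common_label(list(child)) or "string").upper()
            values = [v for vals in child.values() for v in vals]
            sigs.append(f"{attr}({label}) => {range_type(values)}")
        else:
            sigs.append(f"{attr} => {range_type(child)}")
    return f"{concept}[" + "; ".join(sigs) + "]"


ftm = {"Price": {"Adult": ["900", "650"], "Child": ["450", "300"]}, "Days": ["7", "5"]}
print(common_label(["Alps", "Coast"]))  # location
print(ftm_to_frame("Tour", ftm))        # Tour[Price(PERSON) => NUMBER; Days => NUMBER]
```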

  18. Evaluation • Crawl, extract, and filter web tables • 135 tables • 85.4% success rate • Most failures involved complex tables • Compare auto-generated frames with human-generated frames • 14 people transformed 3 tables each • 21 tables in total (each done twice) • Syntactic/semantic correctness (strict and soft)
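The comparison of auto-generated and human-generated frames can be illustrated with a simple sketch. The measures below are hypothetical and are not the evaluation's actual strict/soft definitions: frames are represented as dictionaries from attribute name to range type, "strict" means identical frames, and "soft" means the Jaccard overlap of (attribute, type) pairs reaches an assumed threshold of 0.5.

```python
def jaccard(a, b) -> float:
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def compare_frames(system: dict, human: dict, soft_threshold: float = 0.5) -> dict:
    """Hypothetical strict/soft agreement between two frames (attribute -> type)."""
    overlap = jaccard(system.items(), human.items())
    return {"strict": system == human, "overlap": overlap, "soft": overlap >= soft_threshold}


system = {"Code": "STRING", "Price": "NUMBER", "Days": "NUMBER"}
human = {"Code": "STRING", "Price": "MONEY", "Days": "NUMBER"}
print(compare_frames(system, human))
# {'strict': False, 'overlap': 0.5, 'soft': True}
```

Comparing (attribute, type) pairs rather than attribute names alone means a frame with the right attributes but a different type for one of them still counts as a partial match.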

  19. Results • Inter-annotator agreement • System-annotator agreement

  20. Benefits • Fully automated knowledge formalization • Arbitrary tables • Independent of domain knowledge • Independent of document type • Explicit semantics of generated frames • Query answering over heterogeneous tables
