
Data-Extraction Ontology Generation by Example

This project extracts semi-structured Web data with an ontology-based technique that is resilient and accessible to ordinary users, with a focus on how the extraction ontology is generated. A generated ontology consists of object and relationship sets with constraints, extraction patterns, and keywords and context expressions. The project also compares the precision and recall of data extracted with system-generated ontologies against data extracted with expert-generated ones. Its contributions are a by-example approach to semi-automatic ontology generation and a Web-based tool for generating data-extraction ontologies.

csandler



Presentation Transcript


  1. Data-Extraction Ontology Generation by Example. Yuanqiu (Joe) Zhou, Data Extraction Group, Brigham Young University. Sponsored by NSF.

  2. Motivation • Semi-structured Web data must be extracted before it can be manipulated further. • In contrast to other wrapper-generation techniques, the BYU ontology-based data-extraction technique is resilient. • A by-example approach lets ordinary users generate ontologies easily.

  3. Web-based System GUI (screenshot; sample form values: Canon PowerShot S40, 4.0, 1600 x 1200, 1024 x 768, 640 x 480)

  4. Extraction Ontology Architecture (diagram components: Data Frame Library, Sample Pages, Ontology Generator, User-Defined Form, System GUI, Populated Database, Extraction Engine, Test Pages)

  5. Extraction Ontology • Object and Relationship Sets and Constraints • Extraction Patterns • Keywords and Context Expressions

  6. Ontology Generation: Object and Relationship Sets and Constraints • Base [0:1] A [1:*] • Base [0:2] B [1:*] • Base [0:*] C [1:*] • Base [0:2] D1 [1:*] D2 [1:*] • Base [0:*] E1 [1:*] E2 [1:*]

  7. Ontology Generation: Object and Relationship Sets and Constraints • A [0:1] F [1:*] • B1 [0:1] G [1:*] • B1, B2 : B • … … • B2 [0:1] H [1:*] I [1:*]

  8. Ontology Generation: Extraction Patterns • Data Frame Library • Lexicons • Synonym dictionaries or thesauri • Regular expressions • Matching extraction patterns: only one; more than one (use extraction pattern filters); none (create one)
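
The three matching outcomes listed on this slide can be sketched as follows. This is only an illustration under stated assumptions: the tiny `DATA_FRAME_LIBRARY`, its pattern names, and the rule that a pattern must match every sample value in full are hypothetical stand-ins, not the tool's actual library or matching policy.

```python
import re

# Hypothetical, tiny stand-in for a data frame library: name -> regular expression.
DATA_FRAME_LIBRARY = {
    "Decimal": r"\d+(\.\d+)?",
    "Resolution": r"\d{3,4} x \d{3,4}",
    "Integer": r"\d+",
}


def matching_patterns(samples, library=DATA_FRAME_LIBRARY):
    """Return names of library patterns that match every sample value in full."""
    return [
        name
        for name, pattern in library.items()
        if all(re.fullmatch(pattern, sample) for sample in samples)
    ]


def choose_pattern(samples, library=DATA_FRAME_LIBRARY):
    """Classify the outcome into the three cases named on the slide."""
    names = matching_patterns(samples, library)
    if len(names) == 1:
        return ("only one", names)
    if len(names) > 1:
        # Assumption: a separate filter step would pick among these candidates.
        return ("more than one", names)
    # Assumption: a new pattern would have to be created for these samples.
    return ("none", [])


if __name__ == "__main__":
    print(choose_pattern(["1600 x 1200", "640 x 480"]))
    print(choose_pattern(["3", "2"]))
    print(choose_pattern(["Canon"]))
```

Requiring a full match on all samples is a deliberately strict choice for the sketch; a looser policy would produce more "more than one" outcomes and lean harder on the filter step.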

  9. Ontology Generation: Keywords and Context Expressions • 3.5x optical zoom (2.5x digital) • a superior 4x Optical Zoom Nikkor lens, plus 4x stepless digital zoom • optical 3X/digital 6X zoom
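
The zoom phrases above show why a bare number is not enough: each phrase contains two zoom values, and only the surrounding text says which one is optical. A minimal sketch of the idea, assuming a context expression that recognizes values such as "3.5x" and a nearest-keyword heuristic for choosing among candidates. The heuristic and the names `CONTEXT`, `KEYWORD`, `gap`, and `optical_zoom` are illustrative assumptions, not the extraction engine's actual disambiguation rule.

```python
import re

# Assumed patterns, loosely modeled on the zoom examples; not the tool's actual rules.
CONTEXT = re.compile(r"\b(\d(?:\.\d)?)x\b", re.IGNORECASE)  # e.g. "3.5x", "6X"
KEYWORD = re.compile(r"\boptical\b", re.IGNORECASE)


def gap(a, b):
    """Number of characters between two (start, end) spans; 0 if they overlap."""
    if a[0] >= b[1]:
        return a[0] - b[1]
    if b[0] >= a[1]:
        return b[0] - a[1]
    return 0


def optical_zoom(text):
    """Return the zoom value closest to the keyword 'optical', or None."""
    keywords = [m.span() for m in KEYWORD.finditer(text)]
    candidates = list(CONTEXT.finditer(text))
    if not keywords or not candidates:
        return None
    best = min(candidates, key=lambda m: min(gap(m.span(), k) for k in keywords))
    return best.group(1)


if __name__ == "__main__":
    for phrase in (
        "3.5x optical zoom (2.5x digital)",
        "a superior 4x Optical Zoom Nikkor lens, plus 4x stepless digital zoom",
        "optical 3X/digital 6X zoom",
    ):
        print(phrase, "->", optical_zoom(phrase))
```

The value is returned as the matched string rather than a number, so that "3.5" and "3" keep the form in which they appeared on the page.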

  10. Sample Web Page and User-Defined Forms • Sample values: Canon PowerShot G2, 4.0, 2272 x 1074, 3, 2 • Object and Relationship Sets and Constraints: DigitalCamera [-> object]; DigitalCamera [0:1] Brand [1:*]; DigitalCamera [0:1] Model [1:*]; DigitalCamera [0:1] CCDResolution [1:*]; DigitalCamera [0:1] ImageResolution [1:*]; DigitalCamera [0:1] Zoom [1:*]; Zoom [0:1] DigitalZoom [1:*]; Zoom [0:1] OpticalZoom [1:*]
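
To make the declaration notation concrete, the sketch below reads binary declarations such as `DigitalCamera [0:1] Brand [1:*]` into a small record with the participation bounds of each side. The `Relationship` type and `parse_relationships` function are hypothetical names introduced for illustration; the sketch only captures the syntax shown on the slide, skips the `[-> object]` declaration, and makes no claim about how the generator stores constraints internally.

```python
import re
from typing import NamedTuple, Optional


class Relationship(NamedTuple):
    source: str
    source_min: int
    source_max: Optional[int]  # None stands for "*" (unbounded)
    target: str
    target_min: int
    target_max: Optional[int]


# Matches "Name [min:max] Name [min:max]" where max is a number or "*".
DECLARATION = re.compile(
    r"(\w+)\s*\[(\d+):(\d+|\*)\]\s*(\w+)\s*\[(\d+):(\d+|\*)\]"
)


def _bound(text):
    """Convert an upper bound to int, mapping '*' to None."""
    return None if text == "*" else int(text)


def parse_relationships(text):
    """Extract every binary relationship declaration found in the text."""
    return [
        Relationship(m[1], int(m[2]), _bound(m[3]), m[4], int(m[5]), _bound(m[6]))
        for m in DECLARATION.finditer(text)
    ]


if __name__ == "__main__":
    declarations = (
        "DigitalCamera [-> object]; DigitalCamera [0:1] Brand [1:*]; "
        "Zoom [0:1] OpticalZoom [1:*]"
    )
    for relationship in parse_relationships(declarations):
        print(relationship)
```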

  11. Extraction Ontology
      DigitalCamera [-> object];
      DigitalCamera [0:1] Brand [1:*];
      DigitalCamera [0:1] ImageResolution [1:*];
      DigitalCamera [0:1] Zoom [1:*];
      DigitalCamera [0:1] CCDResolution [1:*];
      Zoom [0:1] OpticalZoom [1:*];
      Brand matches [10]
        constant { extract "\bNikon\b"; },
                 { extract "\bCanon\b"; },
                 { extract "\bOlympus\b"; },
                 { extract "\bMinolta\b"; },
                 { extract "\bSony\b"; };
      end;
      CCDResolution matches [20]
        constant { extract "\b\d(\.\d{1,2})?\b"; };
        keyword "\bMegapixel\b", "\bCCD\b", "\bCCD Resolution\b";
      end;
      OpticalZoom matches [10]
        constant { extract "\b\d(\.\d)"; context "\b\d(\.\d)?(x)\b"; };
        keyword "\boptical\b";
      end;

  12. Measurements • How much of the ontology was generated with respect to how much could have been generated? • How many components generated should not have been generated? • What comparisons can we make about the precision and recall ratios of extraction data between a system-generated ontology and an expert-generated ontology? • How many sample pages are necessary for acceptable system performance?
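
The precision and recall ratios mentioned above follow the standard definitions: precision is the share of extracted values that are correct, and recall is the share of expected values that were extracted. A minimal sketch, assuming extracted data is compared as a set of (object set, value) pairs; the function name and the sample data are made up for illustration and are not results from the project.

```python
def precision_recall(extracted, expected):
    """Compute (precision, recall) for extracted items against expected items."""
    extracted, expected = set(extracted), set(expected)
    correct = extracted & expected
    precision = len(correct) / len(extracted) if extracted else 0.0
    recall = len(correct) / len(expected) if expected else 0.0
    return precision, recall


if __name__ == "__main__":
    # Made-up example data: one wrong zoom value and one missed resolution.
    extracted = {("Brand", "Canon"), ("Model", "PowerShot G2"), ("OpticalZoom", "2")}
    expected = {
        ("Brand", "Canon"),
        ("Model", "PowerShot G2"),
        ("OpticalZoom", "3"),
        ("CCDResolution", "4.0"),
    }
    print(precision_recall(extracted, expected))
```

Running the same comparison once for a system-generated ontology and once for an expert-generated one, over the same test pages, gives the two pairs of ratios that the third question asks about.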

  13. Contributions • Proposes a by-example approach to semi-automatically generate data-extraction ontologies • Constructs a Web-based tool to generate data-extraction ontologies
