
Query Rewriting for Extracting Data Behind HTML Forms

This presentation explores query rewriting as a way to extract data that sits behind HTML forms. The system it describes gives users a query form built from an application-specific ontology, then fills in a site's own form to retrieve and process the data.


Presentation Transcript


  1. Query Rewriting for Extracting Data Behind HTML Forms Xueqi Chen Department of Computer Science Brigham Young University March, 2003 Funded by the National Science Foundation

  2. Motivation • Web information is stored in databases • Databases are accessed through forms • Automated agents are of great value • The process is difficult because of the nature of forms

  3. System Flowchart (figure; components: Input Analyzer, Extracted Information, Application Ontology, Site Form, User Query, Retrieved Page(s), Output Analyzer)

  4. User Query Acquisition • Our system provides a query form created from an application-specific ontology

  5. Site Form Analysis • Determine the type, name, and/or values of each field
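A minimal sketch of the site-form analysis step, assuming Python's standard html.parser module: it walks a form's HTML and records each named field's type and the values it offers (radio/checkbox values and select options). The class name FormFieldParser, the SKIPPED_INPUT_TYPES set, and the output layout are illustrative assumptions, not the original system's design.

```python
from html.parser import HTMLParser

# Hypothetical choice: inputs that carry no user-supplied data are ignored.
SKIPPED_INPUT_TYPES = {"submit", "reset", "button", "image"}


class FormFieldParser(HTMLParser):
    """Record the type, name, and offered values of each named field in an HTML form."""

    def __init__(self):
        super().__init__()
        self.fields = {}           # field name -> {"type": str, "values": [str, ...]}
        self._select_name = None   # name of the <select> currently open, if any
        self._in_option = False    # True while inside an <option> of that <select>
        self._option_value = None  # value attribute of the current <option>, if present
        self._option_text = []     # text chunks seen inside the current <option>

    def _field(self, name, field_type):
        # setdefault lets radio buttons or checkboxes sharing a name collapse into one field.
        return self.fields.setdefault(name, {"type": field_type, "values": []})

    def _flush_option(self):
        # </option> is optional in HTML, so an option is also closed by the
        # next <option> or by </select>.
        if self._in_option:
            text = "".join(self._option_text).strip()
            # An <option> without a value attribute submits its text instead.
            value = self._option_value if self._option_value is not None else text
            self.fields[self._select_name]["values"].append(value)
            self._in_option = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "input":
            field_type = (attrs.get("type") or "text").lower()
            name = attrs.get("name")
            if not name or field_type in SKIPPED_INPUT_TYPES:
                return
            field = self._field(name, field_type)
            if field_type in ("radio", "checkbox") and attrs.get("value") is not None:
                field["values"].append(attrs["value"])
        elif tag == "textarea" and attrs.get("name"):
            self._field(attrs["name"], "textarea")
        elif tag == "select" and attrs.get("name"):
            self._select_name = attrs["name"]
            self._field(self._select_name, "select")
        elif tag == "option" and self._select_name is not None:
            self._flush_option()
            self._in_option = True
            self._option_value = attrs.get("value")
            self._option_text = []

    def handle_endtag(self, tag):
        if tag == "option":
            self._flush_option()
        elif tag == "select":
            self._flush_option()
            self._select_name = None

    def handle_data(self, data):
        if self._in_option:
            self._option_text.append(data)


if __name__ == "__main__":
    parser = FormFieldParser()
    parser.feed("""
    <form action="/search">
      <input type="text" name="make">
      <select name="color"><option value="">Any</option><option>Red<option>Blue</select>
      <input type="radio" name="cond" value="new"> New
      <input type="radio" name="cond" value="used"> Used
      <input type="submit" value="Go">
    </form>
    """)
    for name, info in parser.fields.items():
        print(name, info)
```

Handling omitted </option> tags matters in practice because many hand-written forms leave them out; without the flush on the next <option>, those options would be merged or lost.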

  6. Form Filling • Name matching • Regular Expressions – for fields with values provided • Stemming • Levenshtein Edit Distance • Longest Common Subsequences • Soundex • WordNet • Value matching
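Three of the string measures listed for name matching can be sketched in plain Python. These are textbook versions of Levenshtein edit distance, longest-common-subsequence length, and American Soundex; the slide does not say how the original system weights or combines them, so the function names here are hypothetical, and stemming and WordNet lookups are left out because they need external libraries.

```python
def levenshtein(a: str, b: str) -> int:
    """Fewest single-character insertions, deletions, and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = curr
    return prev[-1]


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence (not necessarily contiguous) of a and b."""
    prev = [0] * (len(b) + 1)
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if ca == cb else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


# American Soundex digit classes; vowels, y, h, and w have no digit.
SOUNDEX_CODES = {
    **dict.fromkeys("bfpv", "1"), **dict.fromkeys("cgjkqsxz", "2"),
    **dict.fromkeys("dt", "3"), "l": "4", **dict.fromkeys("mn", "5"), "r": "6",
}


def soundex(word: str) -> str:
    """Soundex code: first letter plus three digits, padded with zeros."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    last = SOUNDEX_CODES.get(first, "")
    digits = []
    for c in letters[1:]:
        code = SOUNDEX_CODES.get(c, "")
        # Skip letters with no digit and repeats of the previous digit.
        if code and code != last:
            digits.append(code)
        # Vowels separate equal digits; 'h' and 'w' do not.
        if c not in "hw":
            last = code
    return (first.upper() + "".join(digits) + "000")[:4]


if __name__ == "__main__":
    print(levenshtein("Color", "Colour"), lcs_length("price", "price_max"))
    print(soundex("Color"), soundex("Colour"))
```

As a usage example, levenshtein("Color", "Colour") is 1 and both spellings share the Soundex code C460, so a site field labelled one way could still be tied to an ontology concept spelled the other way.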

  7. Value Matching: Case 1

  8. Value Matching: Case 2

  9. Value Matching: Case 3 (figure label: "Color?")

  10. Value Matching: Case 4

  11. Value Matching: Case 5

  12. Value Matching: Case 6

  13. Value Matching: Case 7
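The individual value-matching cases were shown only as figures, so their details are not recoverable here. As a generic illustration, the hypothetical helper below picks a drop-down option for a user-supplied value: an exact case-insensitive match first, then the closest option by the similarity ratio of Python's difflib (Ratcliff/Obershelp, a different measure from the edit distance named on the form-filling slide). The name match_option and the 0.6 cutoff are assumptions.

```python
import difflib
from typing import Optional, Sequence


def match_option(user_value: str, options: Sequence[str], cutoff: float = 0.6) -> Optional[str]:
    """Return the option that best matches user_value, or None if none is close enough."""
    # Map normalized text back to the original option (a later duplicate wins).
    normalized = {opt.strip().lower(): opt for opt in options}
    key = user_value.strip().lower()
    # 1. Exact match after trimming whitespace and ignoring case.
    if key in normalized:
        return normalized[key]
    # 2. Otherwise the single closest option whose similarity ratio reaches the cutoff.
    close = difflib.get_close_matches(key, list(normalized), n=1, cutoff=cutoff)
    return normalized[close[0]] if close else None


if __name__ == "__main__":
    print(match_option("Toyta", ["Chevrolet", "Ford", "Toyota"]))
    print(match_option("Green", ["Any", "Red", "Blue"]))
```

Returning None rather than forcing a choice leaves room for other handling, such as leaving the field at its default and filtering the retrieved pages afterwards.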

  14. Measurements • Matching Efficiency • Submission Efficiency • Post-processing Efficiency

  15. Measurements (cont.) • Matching Efficiency

  16. Measurements (cont.) • Matching Efficiency • Submission Efficiency

  17. Measurements (cont.) • Matching Efficiency • Submission Efficiency • Post-processing Efficiency

  18. Contributions • The approach improves the effectiveness of the data-extraction process • It offers another technique, in addition to [RGa01], for accessing data behind HTML forms.
