Semi-Automatic Semantic Annotation for Hidden-Web Tables

Cui Tao & David W. Embley, Data Extraction Research Group, Department of Computer Science, Brigham Young University. Supported by NSF.

Presentation Transcript


  1. Semi-Automatic Semantic Annotation for Hidden-Web Tables • Cui Tao & David W. Embley • Data Extraction Research Group, Department of Computer Science, Brigham Young University • Supported by NSF

  2. Semantic Annotation • The Hidden Web: • Hidden behind forms • Hard to query • Example search term: “cdk-4” www.deg.byu.edu

  3. Semantic Annotation • The Hidden Web: • Hidden behind forms • Hard to query • Example: find the protein and amino-acid information for gene “cdk-4”

  4. Semantic Annotation • The Hidden Web: • Hidden behind forms • Hard to query • Semantic annotation • Machine-“understandable” • Publicly accessible

  5. System Overview • Initial semantic annotation • Manually annotate a sample page • With respect to a selected ontology • Table interpretation • Automatic • Tables from hidden web pages • Final semantic annotation • Automatic • Annotate interpreted tables

  6. Initial Semantic Annotation • SMORE: Semantic Markup, Ontology and RDF Editor [Maryland Information and Network Dynamics Lab]

  7. [Figure-only slide]

  8. Table Interpretation • Table interpretation • Locate labels and values • Pair each label with its value • Remember the path • TISP: Table Interpretation by Sibling Pages
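The three steps on this slide (locate, pair, remember the path) can be sketched as follows. This is a minimal illustration, not the TISP implementation: it assumes a well-formed two-column table whose first column holds labels, and the sample rows and the function name `pair_labels_with_values` are invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample table; the rows are invented for illustration.
SAMPLE_TABLE = (
    "<table>"
    "<tr><td>Gene</td><td>cdk-4</td></tr>"
    "<tr><td>Status</td><td>example-value</td></tr>"
    "</table>"
)

def pair_labels_with_values(table_xml):
    """Return (label, value, path-to-value) for each two-cell row.

    Assumption: column 1 is the label and column 2 is the value.
    """
    table = ET.fromstring(table_xml)
    pairs = []
    for row_index, row in enumerate(table.findall("tr"), start=1):
        cells = row.findall("td")
        if len(cells) != 2:
            continue  # skip rows that do not fit the simple layout
        label = (cells[0].text or "").strip()
        value = (cells[1].text or "").strip()
        # Remember where the value was found so it can be located again.
        path = f"table[1]/tr[{row_index}]/td[2]"
        pairs.append((label, value, path))
    return pairs

PAIRS = pair_labels_with_values(SAMPLE_TABLE)
```

Keeping the path with each pair is what lets the same location be read on other pages that share the layout.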

  9. TISP

  10. Interpretation Technique: Sibling Page Comparison • Same

  11. Interpretation Technique: Sibling Page Comparison • Almost Same

  12. Interpretation Technique: Sibling Page Comparison • Different • Same
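One way to read the "Same" and "Different" markers on these slides: text that stays the same across sibling pages is likely a label, while text that changes is likely a value. The sketch below illustrates that idea only. The page dictionaries, cell paths, and the name `classify_cells` are assumptions made for the example, and it ignores the "Almost Same" case.

```python
def classify_cells(page_a, page_b):
    """Compare two sibling pages given as {cell_path: text} mappings.

    Cells with identical text on both pages are treated as labels,
    cells whose text differs are treated as values.
    """
    labels, values = [], []
    for path in sorted(page_a.keys() & page_b.keys()):
        if page_a[path].strip() == page_b[path].strip():
            labels.append(path)
        else:
            values.append(path)
    return labels, values

# Hypothetical sibling pages: same layout, different record.
PAGE_A = {
    "tr[1]/td[1]": "Gene",
    "tr[1]/td[2]": "cdk-4",
    "tr[2]/td[1]": "Protein Name",
    "tr[2]/td[2]": "example-protein-A",
}
PAGE_B = {
    "tr[1]/td[1]": "Gene",
    "tr[1]/td[2]": "other-gene",
    "tr[2]/td[1]": "Protein Name",
    "tr[2]/td[2]": "example-protein-B",
}

LABEL_PATHS, VALUE_PATHS = classify_cells(PAGE_A, PAGE_B)
```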

  13. Interpretation Technique: Sibling Page Comparison • Structure Pattern of a Table • Label Path = Identification.Gene model(s).Gene Model • XPath = html[1]/…/table[3]/tr[1]/td[2]/table[1]/tr[6]/td[2]/table[1]/tr[2]/td[1]
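A positional path of the kind shown on this slide (each step indexed among same-named siblings) can be computed by walking the element tree. The sketch below is illustrative and assumes well-formed markup; the sample document and the helper names `positional_paths` and `find_path_by_text` are invented here and are not taken from TISP.

```python
import xml.etree.ElementTree as ET

def positional_paths(root):
    """Yield (path, element) pairs using 1-based, per-tag sibling indexes."""
    def walk(element, path):
        yield path, element
        counts = {}
        for child in element:
            counts[child.tag] = counts.get(child.tag, 0) + 1
            yield from walk(child, f"{path}/{child.tag}[{counts[child.tag]}]")
    yield from walk(root, f"{root.tag}[1]")

def find_path_by_text(root, text):
    """Return the path of the first element whose text equals `text`."""
    for path, element in positional_paths(root):
        if (element.text or "").strip() == text:
            return path
    return None

# Hypothetical page fragment for illustration.
SAMPLE_PAGE = ET.fromstring(
    "<html><body><table>"
    "<tr><td>Gene Model</td><td>cdk-4</td></tr>"
    "</table></body></html>"
)

VALUE_PATH = find_path_by_text(SAMPLE_PAGE, "cdk-4")
```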

  14. Annotation • Protein Name

  15. Annotation – Split • Nucleotide Size

  16. Annotation – Merge • Protein Information

  17. Annotation – Union • Name

  18. Annotation – Selection • Molecular Function
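The slides name four annotation operations (split, merge, union, selection) without defining them. The sketch below shows one plausible reading as plain string and list operations. The semantics, function names, and sample values are all assumptions made for illustration and may differ from what the authors' system does.

```python
def split_value(cell, separator):
    """Split (assumed meaning): one table cell yields several values."""
    return [part.strip() for part in cell.split(separator) if part.strip()]

def merge_values(cells, joiner=" "):
    """Merge (assumed meaning): several cells combine into one value."""
    return joiner.join(cell.strip() for cell in cells if cell.strip())

def union_values(*value_lists):
    """Union (assumed meaning): values from several places, de-duplicated in order."""
    seen, result = set(), []
    for values in value_lists:
        for value in values:
            if value not in seen:
                seen.add(value)
                result.append(value)
    return result

def select_values(values, keep):
    """Selection (assumed meaning): keep only values that pass a test."""
    return [value for value in values if keep(value)]

# Hypothetical sample values.
SPLIT = split_value("1200 bp; 400 aa", ";")
MERGED = merge_values(["kinase", " family member "])
UNION = union_values(["cdk-4"], ["cdk-4", "alt-name"])
SELECTED = select_values(["binding", "unknown"], lambda v: v != "unknown")
```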

  19. Generated RDF Annotation
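The slide shows the generated RDF only as a figure. As a rough illustration of what turning label-value pairs into RDF can look like, the sketch below writes N-Triples lines using only the standard library. The URIs under `http://example.org/`, the property naming scheme, and the sample value are hypothetical, not the ontology used in the presentation.

```python
from urllib.parse import quote

def escape_literal(text):
    """Escape characters that may not appear raw in an N-Triples literal."""
    return (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )

def to_ntriples(subject_uri, pairs, vocabulary_uri):
    """Emit one triple per (label, value) pair.

    Assumption: each label becomes a property named after the label,
    with spaces turned into underscores and other characters percent-encoded.
    """
    lines = []
    for label, value in pairs:
        predicate = vocabulary_uri + quote(label.replace(" ", "_"))
        lines.append(f'<{subject_uri}> <{predicate}> "{escape_literal(value)}" .')
    return "\n".join(lines)

# Hypothetical subject, vocabulary, and value.
RDF_TEXT = to_ntriples(
    "http://example.org/gene/cdk-4",
    [("Protein Name", "example-protein")],
    "http://example.org/onto#",
)
```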

  20. Querying Annotated Data • Example: find the protein and amino-acid information for gene “cdk-4”
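Once the data is annotated, a query like the one on this slide reduces to matching triple patterns. The sketch below uses an in-memory list of triples instead of an RDF store or SPARQL engine. The identifiers, property names, and values are placeholders invented for the example.

```python
def match(triples, s=None, p=None, o=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]

def lookup(triples, subject, predicates):
    """Collect the objects of the given predicates for one subject."""
    return {
        predicate: [obj for _, _, obj in match(triples, s=subject, p=predicate)]
        for predicate in predicates
    }

# Hypothetical annotated data; the values are placeholders, not real records.
TRIPLES = [
    ("gene:cdk-4", "hasProtein", "protein:EXAMPLE-1"),
    ("gene:cdk-4", "hasAminoAcids", "placeholder amino-acid info"),
    ("gene:other", "hasProtein", "protein:EXAMPLE-2"),
]

ANSWER = lookup(TRIPLES, "gene:cdk-4", ["hasProtein", "hasAminoAcids"])
```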

  21. Summary • Semi-automatic semantic annotation for hidden-web tables • Facilitates large-scale annotation of the web
