ODE: Ontology-Assisted Data Extraction

Weifeng Su, Jiying Wang, Frederick H. Lochovsky. Summarized by Joseph Park.


Presentation Transcript


  1. ODE: Ontology-Assisted Data Extraction
     • Weifeng Su, Jiying Wang, Frederick H. Lochovsky
     • Summarized by Joseph Park

  2. Overview
     • “Web databases…compose what is referred to as the deep Web”
     • The goal of data extraction:
       (1) Query result section identification - decides what section in a dynamically generated query result page contains the data that need to be extracted.
       (2) Record segmentation - segments the query result section into records and extracts them.
       (3) Data value alignment - aligns the data values from multiple records that belong to the same attribute so that they can be arranged into a table.
       (4) Label assignment - assigns a suitable, meaningful label (i.e., an attribute name) to each column in an aligned table.
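Steps (3) and (4) can be pictured with a toy example. The sketch below is not the ODE algorithm: the mini-ontology, its labels (`Price`, `Year`, `Title`), the regular expressions, and the sample records are all assumptions made up for illustration. It only shows what "align values by attribute, then label the columns" produces.

```python
import re

# Hypothetical mini-ontology: attribute label -> pattern its values tend to match.
# Labels and patterns are illustrative assumptions, not taken from the paper.
ONTOLOGY = {
    "Price": re.compile(r"^\$\d+(\.\d{2})?$"),
    "Year": re.compile(r"^(19|20)\d{2}$"),
}
DEFAULT_LABEL = "Title"  # fallback for values that no pattern recognizes


def label_of(value):
    """Return the first ontology label whose pattern matches, else the fallback."""
    for label, pattern in ONTOLOGY.items():
        if pattern.match(value):
            return label
    return DEFAULT_LABEL


def align_and_label(records):
    """Steps (3) and (4): put each value in the column of its attribute."""
    columns = [DEFAULT_LABEL] + list(ONTOLOGY)
    table = []
    for record in records:
        row = dict.fromkeys(columns, "")  # a missing attribute stays empty
        for value in record:
            row[label_of(value)] = value
        table.append(row)
    return columns, table


# Assumed output of step (2): each record is a list of data values.
records = [
    ["Database Systems", "$59.99", "2008"],
    ["Web Data Mining", "2007"],  # no price in this record
]
columns, table = align_and_label(records)
for row in table:
    print([row[c] for c in columns])
```

Because each value is placed by what it looks like rather than by its position, the second record's year still lands in the `Year` column even though its price is missing.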

  3. Problems
     • Automatically extract data from query results
     • Limitations of other systems:
       • Incapable of processing either zero or few query results.
       • Vulnerable to optional and disjunctive attributes.
       • Incapable of processing nested data structures.
       • No label assignment.
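The "optional attributes" limitation is easy to reproduce. The sketch below uses made-up records and a deliberately naive, purely positional alignment (it does not represent any specific system named in the slides) to show how one missing value shifts everything after it into the wrong column.

```python
from itertools import zip_longest

# Toy records (assumed data): the second result omits the optional price.
records = [
    ["Canon A95", "$199", "In stock"],
    ["Nikon 4100", "In stock"],
]

# Naive alignment: the i-th value of every record goes into column i.
columns = [list(col) for col in zip_longest(*records, fillvalue="")]

print(columns[1])  # ['$199', 'In stock']  (a price mixed with an availability value)
print(columns[2])  # ['In stock', '']      (the real availability column has a hole)
```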

  4. Approach
     • ODE – Ontology-assisted data extraction
     • PADE wrapper
     • Query result annotation
     • Attribute matching
     • Ontology construction

  5. Approach continued
     • Query result section identification
     • Record segmentation
     • Data value alignment and label assignment
     • MaxEnt model is used
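The slides do not say which features or classes the MaxEnt model uses. For orientation only, the general form of a conditional maximum entropy model is given below, where $x$ is an observed item (for example, a data value), $y$ a candidate class (for example, an attribute label), $f_i$ are feature functions, and $\lambda_i$ their learned weights. Reading $x$ and $y$ this way is an assumption, not something stated in the presentation.

```latex
p_\lambda(y \mid x) =
  \frac{\exp\!\Big(\sum_i \lambda_i f_i(x, y)\Big)}
       {\sum_{y'} \exp\!\Big(\sum_i \lambda_i f_i(x, y')\Big)}
```

The denominator sums over all candidate classes $y'$, so the probabilities for a given $x$ add up to 1.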

  6. Experimental Results
     • Extraction performed using DeLa

  7. Conclusion
     • Can only label attributes that appear in query result pages
     • References a few DEG papers
       • DKE99, Tisp, TANGO
     • Could take advantage of MaxEnt for pre-labeling data
     • Need to look into DeLa for data extraction
