The DIADEM Ontology

  1. The DIADEM Ontology
     Yiyang Bao [2], Xiaonan Guo [2], Giorgio Orsi [1,2], Christian Schallhart [2], Cheng Wang [2]
     [1] Institute for the Future of Computing, University of Oxford
     [2] Department of Computer Science, University of Oxford
     DIADEM 1.0

  2. The languages of the web
     • HTML objects provide the data model of a web page, e.g. <html> <head> <title> </title> </head> <body> <div> … </div> </body> </html>.
     • CSS boxes and properties provide the layout.
     • JavaScript provides the web dynamics, e.g. this.value.toLowerCase();
     • RDF annotations provide the conceptualization of the domain, e.g. ox:address, ox:Property, xsd:string.
     [Figure: the layers linking the Web to the Real World]

  3. Why ontology?
     • Ontologies provide a conceptualization of a domain of interest (Gruber '93).
     • But we do not only want to model the application domain.
     • We must also model the domain of its web representations, i.e., its phenomenology.
     • In the end, this model is also an ontology.
     [Figure: example terms ox:partOf, ox:priceSegment, ox:minPrice, ox:address, ox:Property, xsd:string]

  4. Why ontology?
     • It can be used to complete an incomplete model.
     • It can be used to verify a model.
     • It must tolerate uncertainty and inconsistency.

  5. A logical model for web extraction
     • Logical model for web entities:
        • input and refinement forms
        • result pages
        • page blocks (e.g., ads)
        • …
     • Phenomenological model:
        • how logical entities are concretely represented

  6. The building blocks
     • HTML entities:
        • labels
        • fields (including links)
        • text nodes and text attributes
     • Logical entities:
        • the constructs of our data model
     • Rules:
        • describe the phenomenology
     Example of labels and fields:
        <form>
          <label for="male">Male</label> <input type="radio" name="sex" id="male" />
          <label for="female">Female</label> <input type="radio" name="sex" id="female" />
        </form>
     Example of text nodes: <div> <span> Price: </span> <span> £ 250 </span> </div>, rendered as "Price: £ 250".
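To make the link between the HTML entities concrete, here is a minimal sketch, in Python and not DIADEM's own code, that pairs each <label for="..."> with the <input> whose id it names, using only the standard library's html.parser. The names LabelFieldParser and label_field_pairs are hypothetical. It covers only the explicit for/id association; labels that wrap their field, or that are associated purely by visual proximity, would need other heuristics.

```python
from html.parser import HTMLParser


class LabelFieldParser(HTMLParser):
    """Collects <label for="..."> texts and <input> fields, keyed by id."""

    def __init__(self):
        super().__init__()
        self.labels = {}          # target id -> list of text chunks of the label
        self.fields = {}          # field id -> attributes of the <input>
        self._open_label = None   # 'for' attribute of the label being read

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "label":
            self._open_label = attrs.get("for")
        elif tag == "input" and "id" in attrs:
            self.fields[attrs["id"]] = attrs

    def handle_endtag(self, tag):
        if tag == "label":
            self._open_label = None

    def handle_data(self, data):
        text = data.strip()
        if self._open_label is not None and text:
            self.labels.setdefault(self._open_label, []).append(text)


def label_field_pairs(html):
    """Return (label text, field id, field type) for every labelled input."""
    parser = LabelFieldParser()
    parser.feed(html)
    parser.close()
    return [(" ".join(parser.labels[fid]), fid, attrs.get("type"))
            for fid, attrs in parser.fields.items() if fid in parser.labels]


FORM = ('<form> <label for="male">Male</label> <input type="radio" name="sex" id="male" />'
        ' <label for="female">Female</label> <input type="radio" name="sex" id="female" /></form>')

print(label_field_pairs(FORM))
# [('Male', 'male', 'radio'), ('Female', 'female', 'radio')]
```

Self-closing tags such as <input ... /> reach handle_starttag through HTMLParser's default handle_startendtag, so no extra override is needed.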

  7. The form model • Goal: model web form phenomenology

  8. The form model
     • Areas: button, location, price, room, type, buy/rent, order-by, display
     • Root entity: RealEstateForm
     • Properties: partOf, which builds hierarchical structures
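One way to read the partOf property is as a tree hanging from the root entity RealEstateForm. The Python sketch below checks that reading. The facts in PART_OF are hypothetical (minPrice and maxPrice are borrowed from the ox:minPrice term of slide 3), and the sketch assumes each entity is part of at most one other entity. path_to_root and is_well_rooted are illustrative names, not part of DIADEM.

```python
# Hypothetical partOf facts (part -> whole); assumes one whole per entity.
PART_OF = {
    "minPrice": "price",
    "maxPrice": "price",
    "price": "RealEstateForm",
    "location": "RealEstateForm",
    "room": "RealEstateForm",
    "button": "RealEstateForm",
}


def path_to_root(entity, part_of):
    """Follow partOf links upward from entity; raise ValueError on a cycle."""
    path = [entity]
    while path[-1] in part_of:
        whole = part_of[path[-1]]
        if whole in path:
            raise ValueError(f"partOf cycle through {whole!r}")
        path.append(whole)
    return path


def is_well_rooted(part_of, root):
    """True if every entity's partOf chain ends at the given root entity."""
    return all(path_to_root(e, part_of)[-1] == root for e in part_of)


print(path_to_root("minPrice", PART_OF))          # ['minPrice', 'price', 'RealEstateForm']
print(is_well_rooted(PART_OF, "RealEstateForm"))  # True
```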

  9. The form model: elements
     • price
        • type = {min, max}
        • purpose = {buy, rent}
        • currency
     • geographic
        • location
        • area/branch
           • granularity = {area, branch}
           • area/branch input
           • area/branch select
        • address PO
        • radius
     • room
        • category = {bathroom, bedroom, …}
        • type = {min, max}
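The attribute domains on this slide, such as type = {min, max} and purpose = {buy, rent}, map naturally onto enumerations. The Python sketch below shows one possible encoding for the price and room elements. The class names, the field_id attribute and the example values are all hypothetical. Enum lookup by value, e.g. Bound("min"), rejects anything outside the declared set with a ValueError.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Bound(Enum):        # type = {min, max}
    MIN = "min"
    MAX = "max"


class Purpose(Enum):      # purpose = {buy, rent}
    BUY = "buy"
    RENT = "rent"


@dataclass(frozen=True)
class PriceElement:
    field_id: str                       # hypothetical: id of the underlying HTML field
    bound: Bound
    purpose: Optional[Purpose] = None   # None if the form does not say
    currency: Optional[str] = None


@dataclass(frozen=True)
class RoomElement:
    field_id: str                       # hypothetical: id of the underlying HTML field
    category: str                       # e.g. "bedroom", "bathroom"
    bound: Bound


# Construct elements from raw strings; invalid values raise ValueError.
min_rent = PriceElement("minprice", Bound("min"), Purpose("rent"), "GBP")
max_beds = RoomElement("maxbeds", "bedroom", Bound.MAX)
```

The category of a room is left as a plain string because the slide's list of categories ends in "…" and is therefore open.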

  10. The form model: elements (continued)
     • property type
     • order-by
     • button
        • submit
        • reset
        • map search
        • advanced submit
        • link button
     • display
     • per page
     • add-in-time
     • new/resale
     • SSTC (sold subject to contract)
     • buy
     • rent
     • buy/rent
     • other

  11. The form model: phenomenology
     • Based on linguistic annotations and (visual) heuristics, e.g.:

        buyElement(X,F) :- visibleField(X),
            hasAnnotationFeature(X,"majorType","reform.label"),
            hasAnnotationFeature(X,"minorType","buy"),
            not hasAnnotationFeature(X,"minorType","rent"),
            not hasAnnotationFeature(X,"minorType","includeSSTC"),
            group(Ns,_,_,F), #member(X,Ns).

        radiusElement(X,F) :- visibleField(X),
            hasAnnotationFeature(X,"majorType","reform.label"),
            hasAnnotationFeature(X,"minorType","radius"),
            group(Ns,_,_,F), #member(X,Ns).
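For readers less used to datalog, the buyElement rule above can be transliterated into Python as follows. This is a sketch of one possible reading, not how DIADEM evaluates its rules: facts are plain Python sets, "not" is read as negation as failure (a missing fact counts as false), and the two anonymous arguments of group/4 are dropped because the rule does not use them. All fact values are hypothetical.

```python
def buy_elements(visible, annotations, groups):
    """One possible Python reading of the buyElement rule.

    visible:     set of field ids                       (visibleField/1)
    annotations: set of (field, feature, value) triples (hasAnnotationFeature/3)
    groups:      list of (members, form) pairs; the two '_' positions of
                 group/4 are not needed by this rule and are omitted
    """
    def has(x, feature, value):
        return (x, feature, value) in annotations

    result = set()
    for members, form in groups:          # group(Ns,_,_,F)
        for x in members:                 # #member(X,Ns)
            if (x in visible
                    and has(x, "majorType", "reform.label")
                    and has(x, "minorType", "buy")
                    # negation as failure: an absent fact counts as false
                    and not has(x, "minorType", "rent")
                    and not has(x, "minorType", "includeSSTC")):
                result.add((x, form))
    return result


# Hypothetical facts: f2 is annotated both buy and rent, f3 is not visible.
VISIBLE = {"f1", "f2"}
ANNOTATIONS = {
    ("f1", "majorType", "reform.label"), ("f1", "minorType", "buy"),
    ("f2", "majorType", "reform.label"), ("f2", "minorType", "buy"),
    ("f2", "minorType", "rent"),
    ("f3", "majorType", "reform.label"), ("f3", "minorType", "buy"),
}
GROUPS = [({"f1", "f2", "f3"}, "form1")]

print(buy_elements(VISIBLE, ANNOTATIONS, GROUPS))   # {('f1', 'form1')}
```

The rule is safe in the datalog sense because X and F are bound by positive literals (visibleField, group, #member) before the negated literals are checked, which is why the negations become simple membership tests here.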

  12. The form model: segments
     • Segments: buttons, geographic, price, room, property type, buy/rent, order-by, display, per page, add in time, new/resale, SSTC
     • A segment is:
        • a single element
        • a group of elements
        • a group of segments
        • a pair <segment, label>
     • Form: real-estate
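The four cases of "a segment is" form a recursive data type. The Python sketch below encodes them with hypothetical class names (Element, Group, Labelled). Because an Element is itself a Segment, a single Group type covers both "a group of elements" and "a group of segments". The helper elements() flattens a segment back to its leaf elements.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Element:
    name: str                        # a single form element


@dataclass(frozen=True)
class Group:
    parts: tuple[Segment, ...]       # a group of elements and/or segments


@dataclass(frozen=True)
class Labelled:
    segment: Segment                 # the pair <segment, label>
    label: str


Segment = Union[Element, Group, Labelled]


def elements(seg: Segment) -> list[str]:
    """Collect the names of all leaf elements of a segment, left to right."""
    if isinstance(seg, Element):
        return [seg.name]
    if isinstance(seg, Labelled):
        return elements(seg.segment)
    return [name for part in seg.parts for name in elements(part)]


# Hypothetical example: a labelled price segment grouped with a submit button.
price = Labelled(Group((Element("min price"), Element("max price"))), "Price")
form = Group((price, Element("submit")))
print(elements(form))   # ['min price', 'max price', 'submit']
```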

  13. The result-page model
     • Goal: model the phenomenology of result pages

  14. The result-page model
     • Attributes and values, e.g., <price, £ 250,000>
     • Record: a group of <attribute, value> pairs
     • Data area: a group of records
     • Mandatory attribute(s):
        • must be present in every record
        • used for sanity checks
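The sanity check enabled by mandatory attributes can be sketched in a few lines of Python. Here, as an assumption for illustration, a record is a plain attribute-to-value dictionary and a data area is a list of records; check_data_area is a hypothetical name. Records missing a mandatory attribute are set aside rather than discarded, so a later step can decide whether the record segmentation itself was wrong.

```python
from typing import Dict, List, Set, Tuple

Record = Dict[str, str]      # attribute -> value, e.g. {"price": "£ 250,000"}
DataArea = List[Record]      # a group of records


def check_data_area(area: DataArea, mandatory: Set[str]) -> Tuple[DataArea, DataArea]:
    """Split records into those carrying every mandatory attribute and the rest."""
    accepted, rejected = [], []
    for record in area:
        (accepted if mandatory.issubset(record) else rejected).append(record)
    return accepted, rejected


# Hypothetical data area extracted from a result page.
AREA = [
    {"price": "£ 250,000", "bedrooms": "3"},
    {"bedrooms": "2"},                       # no price: fails the sanity check
]
accepted, rejected = check_data_area(AREA, {"price"})
print(len(accepted), len(rejected))   # 1 1
```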

  15. A Conceptual Model for Data Extraction
     • Conceptual modelling on the web:
        • software modelling, e.g., UML and stereotypes
        • ad hoc languages, e.g., WebML

  16. Linking the domain ontology: OntoX

  17. DIADEM Ontology: discussion
     • Adaptability
        • the result-page model is substantially domain independent
        • the form model is domain dependent (entity types)
        • the number of entities is limited
     • Expressive power
        • safe non-recursive datalog (nr-datalog) with stratified negation and aggregation
        • pros: easy to compute
        • cons: not robust to uncertainty and inconsistencies
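One reason nr-datalog is "easy to compute" is that its predicate dependency graph is acyclic, so each predicate can be computed once, after everything it depends on, and any negated predicate is already complete when it is used. The Python sketch below (standard-library graphlib, Python 3.9+) derives such an order from the dependencies of the two rules on slide 11; the #member builtin is left out because it is not a predicate. evaluation_order is a hypothetical helper, not part of DIADEM.

```python
from graphlib import CycleError, TopologicalSorter

# Predicate dependencies read off the two rules on slide 11 (head -> body predicates).
RULE_DEPS = {
    "buyElement": {"visibleField", "hasAnnotationFeature", "group"},
    "radiusElement": {"visibleField", "hasAnnotationFeature", "group"},
}


def evaluation_order(deps):
    """Order predicates so every body predicate precedes its head.

    Succeeds only for non-recursive programs; a dependency cycle means recursion.
    """
    try:
        return list(TopologicalSorter(deps).static_order())
    except CycleError as err:
        raise ValueError(f"program is recursive: {err.args[1]}") from err


order = evaluation_order(RULE_DEPS)
print(order.index("visibleField") < order.index("buyElement"))   # True
```

For a recursive program, stratified negation would instead require that no cycle passes through a negated dependency; the sketch does not cover that general case.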

  18. Uncertainty, Vagueness and Inconsistencies

  19. Uncertainty, Vagueness and Inconsistencies
     • Origin
        • annotations are noisy
        • entity types are uncertain
     • Multiple models
        • probabilistic models
           • Markov Logic Networks (Lukasiewicz and Simari)
           • c-tables, Bayesian Networks (Olteanu)
        • ASP
           • disjunctive models
           • weak constraints
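To give a feel for how weak constraints tolerate noisy annotations, the Python sketch below mimics their semantics by brute force: it enumerates candidate type assignments, charges each one the weighted number of weak-constraint violations, and keeps the cheapest. This is a simplified stand-in for an ASP solver (a single priority level, exhaustive search), and every name, annotation and weight in it is hypothetical. In the example, field f2 carries both a buy and a rent annotation, and the cheapest model resolves the conflict by typing it as rent.

```python
from itertools import product


def optimal_assignments(fields, types, weak_constraints):
    """Brute-force the optimal models under weighted weak constraints.

    weak_constraints: list of (weight, violations) pairs, where
    violations(assignment) counts how often the constraint is violated.
    Returns (best cost, all assignments reaching that cost).
    """
    best, models = None, []
    for choice in product(types, repeat=len(fields)):
        assignment = dict(zip(fields, choice))
        cost = sum(w * violations(assignment) for w, violations in weak_constraints)
        if best is None or cost < best:
            best, models = cost, [assignment]
        elif cost == best:
            models.append(assignment)
    return best, models


# Hypothetical noisy annotations: f2 is annotated as both buy and rent.
NOISY_TYPES = {"f1": {"buy"}, "f2": {"buy", "rent"}}


def unsupported(assignment):
    """Fields typed with a label their annotations do not support."""
    return sum(1 for f, t in assignment.items() if t not in NOISY_TYPES[f])


def duplicated(assignment):
    """Fields in the same buy/rent group that share a type."""
    return len(assignment) - len(set(assignment.values()))


WEAK = [(2, unsupported), (1, duplicated)]
best, models = optimal_assignments(["f1", "f2"], ["buy", "rent"], WEAK)
print(best, models)   # 0 [{'f1': 'buy', 'f2': 'rent'}]
```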

  20. Thank you!
