Ontology Learning and Population using Heterogeneous Sources on the Web

Ontology Learning and Population using Heterogeneous Sources on the Web. Victor de Boer, OLP-AIO’s Workshop, March 16th, 2005. About me: Victor de Boer, Artificial Intelligence @ UvA, graduated with a thesis on human memory modelling, AiO (PhD student) since January 1st, 2004.

Presentation Transcript


  1. Ontology Learning and Population using Heterogeneous Sources on the Web Victor de Boer OLP-AIO’s Workshop March 16th, 2005

  2. About me • Victor de Boer • Artificial Intelligence @ UvA • Graduated with a thesis on human memory modelling • AiO (PhD student) since January 1st, 2004 • Supervisors: Bob Wielinga and Maarten van Someren • MultimediaN • (Mn-9c: VU, CWI, DEN)

  3. Outline • Introduction and Research Questions • Ontology Learning and Population Task • My approach: Redundancy-based • Case Study • Results • Further Research • Questions / Discussion

  4. Intro and Research Questions • Backbone of the Semantic Web: • Ontologies • Content • Manual construction has its flaws and is very time-consuming. • The Web contains a lot of knowledge: let’s use it. • My research questions: • How can we automatically construct, enrich and populate ontologies using heterogeneous sources on the Web? • And how can these ontologies help us extract more information? (bootstrap)

  5. OLP Task Description • Ontology Learning: • Concepts: • NERC (named entity recognition and classification), LSI (latent semantic indexing), … • [Figure: concepts C1, C2, C3, C4]

  6. OLP Task Description • Ontology Learning: • Concepts: • NERC, LSI, … • Hierarchical Structure • Hearst Patterns, … • [Figure: concepts C1, C2, C3, C4]

  7. OLP Task Description • Ontology Learning: • Concepts: • NERC, LSI, … • Hierarchical Structure • Hearst Patterns, … • Other relations • [Figure: concepts C1, C2, C3, C4]

  8. OLP Task Description • Ontology Learning: • Concepts: • NERC, LSI, … • Hierarchical Structure • Hearst Patterns, … • Other relations • Ontology Population • Instances • [Figure: concepts C1, C2, C3, C4 and instances I1, I2, I3]

  9. OLP Task Description • Ontology Learning: • Concepts: • NERC, LSI, … • Hierarchical Structure • Hearst Patterns, … • Other relations • Ontology Population • Instances • Relation Instances • [Figure: concepts C1, C2, C3, C4 and instances I1, I2, I3]

  10. OLP Task Description • Ontology Learning: • Concepts: • NERC, LSI, … • Hierarchical Structure • Hearst Patterns, … • Other relations • Ontology Population • Instances • Relation Instances • Ontology Enrichment • [Figure: concepts C1, C2, C3, C4 and instances I1, I2, I3]
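Hearst patterns, listed above for learning hierarchical structure, are lexico-syntactic patterns such as "X such as Y, Z and W". From such a sentence one infers that Y, Z and W are kinds (or instances) of X. The following is a minimal sketch of the single "such as" pattern and not the method used in this work. The function name `hearst_such_as` is hypothetical. It also assumes that hypernyms are single lowercase words and hyponyms are capitalised names, whereas a real system would use a part-of-speech tagger or chunker.

```python
import re

# Simplifying assumptions: a hyponym is a run of capitalised words
# (e.g. "Claude Monet"), a hypernym is one lowercase word (e.g. "painters").
NAME = r"[A-Z][a-z]+(?: [A-Z][a-z]+)*"
SUCH_AS = re.compile(
    rf"\b(?P<hyper>[a-z]+) such as "
    rf"(?P<hypos>{NAME}(?:, {NAME})*(?:,? (?:and|or) {NAME})?)"
)

def hearst_such_as(sentence):
    """Return (hyponym, hypernym) pairs matched by the 'such as' pattern."""
    pairs = []
    for m in SUCH_AS.finditer(sentence):
        hyper = m.group("hyper")
        # Split the enumeration on commas and on a final 'and' / 'or'.
        hypos = re.split(r",\s*(?:and\s+|or\s+)?|\s+(?:and|or)\s+", m.group("hypos"))
        pairs.extend((h.strip(), hyper) for h in hypos if h.strip())
    return pairs

print(hearst_such_as("Impressionist painters such as Monet, Renoir and Degas exhibited in Paris."))
# [('Monet', 'painters'), ('Renoir', 'painters'), ('Degas', 'painters')]
```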

  11. Relation Instantiation • We have: • two concepts C1 and C2, • a relation R(C1,C2) • and instances I1 of C1 and I2 of C2. • Find the instance pairs for which the relation R holds. • Examples: • <Country, has_city, City> • <Movie, has_director, Director> • <Artstyle, has_artist, Artist> • Information Extraction!
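In code, relation instantiation amounts to scoring every candidate pair (I1, I2) and keeping the pairs that have enough evidence. The sketch below works under stated assumptions. `Relation`, `instantiate`, the `score` callback and the threshold are hypothetical names, and the toy scores are made up for illustration.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, Iterable

@dataclass(frozen=True)
class Relation:
    """A binary relation R(C1, C2) between two ontology concepts."""
    c1: str    # domain concept, e.g. "Artstyle"
    name: str  # relation name, e.g. "has_artist"
    c2: str    # range concept, e.g. "Artist"

def instantiate(relation: Relation,
                i1s: Iterable[str],
                i2s: Iterable[str],
                score: Callable[[str, str], float],
                threshold: float) -> set:
    """Return (I1, R, I2) triples whose evidence score reaches the threshold.

    Every pair in the cross product I1 x I2 is a candidate; `score` is a
    placeholder for whatever evidence-gathering method is plugged in.
    """
    return {(a, relation.name, b)
            for a, b in product(i1s, i2s)
            if score(a, b) >= threshold}

# Usage with made-up scores:
has_artist = Relation("Artstyle", "has_artist", "Artist")
toy_scores = {("Impressionism", "Monet"): 0.9, ("Impressionism", "Grosz"): 0.1}
found = instantiate(has_artist, ["Impressionism"], ["Monet", "Grosz"],
                    lambda a, b: toy_scores.get((a, b), 0.0), threshold=0.5)
# found == {("Impressionism", "has_artist", "Monet")}
```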

  12. Approaches • Current approaches: • NLP-based: work well for natural-language documents • Wrapper-like: work well with (semi-)structured documents • Neither is a generic approach • My approach: • Use generic methods, applicable to heterogeneous sources, that combine information to collect evidence for the relation. The redundancy of information should compensate for the loss of subtlety.
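A simple voting scheme illustrates the redundancy idea. The same shallow extractor runs over every source, whatever its format, and each candidate is scored by the fraction of sources that mention it, so errors in individual sources are outvoted. This sketch assumes that particular scoring rule and is not necessarily how evidence was combined in this work. `redundancy_scores` is a hypothetical name.

```python
from collections import Counter

def redundancy_scores(extractions):
    """Score each candidate by the fraction of sources that mention it.

    `extractions` holds one collection of candidate names per source
    (free-text page, table, list, ...), produced by a generic extractor.
    """
    counts = Counter()
    for names in extractions:
        counts.update(set(names))  # a source votes at most once per name
    n = len(extractions)
    return {name: c / n for name, c in counts.items()} if n else {}

# Toy sources: the noisy "Paris" extraction ends up with a low score.
sources = [{"Monet", "Renoir", "Paris"}, {"Monet", "Degas"}, {"Monet", "Renoir"}, {"Degas"}]
scores = redundancy_scores(sources)
# scores == {"Monet": 0.75, "Renoir": 0.5, "Degas": 0.5, "Paris": 0.25}
```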

  13. Case Study: Domain • Art and Architecture Thesaurus (AAT) • Unified List of Artist Names (ULAN) • Relation: <aat:style, aua:has_artist, ulan:artist> • Find instances of this relation • [Figure: has_artist link between AAT styles and ULAN artists]

  14. Case Study: Method • [Diagram: components AAT, Seed list, Manual wrapper, Person Name Extractor, ULAN-check; example names extracted: Otto Dix, S. Freud, George Grosz; example output: Score: “George Grosz” + 0.5]
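One plausible reading of the method diagram is sketched below. Each page gets a weight equal to the fraction of seed artists it mentions, and every ULAN-verified, non-seed name on the page gains that weight. This scoring rule, the regex name extractor, and the toy `ulan_names` set (a stand-in for a real ULAN lookup) are all assumptions made for illustration. The slides do not give the formula, and `score_candidates` is a hypothetical name.

```python
import re
from collections import defaultdict

# Crude person-name extractor (assumption): a capitalised first name or an
# initial, followed by a capitalised surname, e.g. "Otto Dix" or "S. Freud".
NAME = re.compile(r"\b[A-Z](?:[a-z]+|\.) [A-Z][a-z]+\b")

def score_candidates(pages, seeds, ulan_names):
    """Hypothetical scoring rule: each page adds its weight (the fraction of
    seed artists it mentions) to every ULAN-verified non-seed name on it."""
    if not seeds:
        return {}
    scores = defaultdict(float)
    for text in pages:
        names = set(NAME.findall(text))
        weight = len(names & seeds) / len(seeds)  # how on-topic the page looks
        for name in names - seeds:
            if name in ulan_names:                # ULAN-check: known artists only
                scores[name] += weight
    return dict(scores)

# Toy data: "Another Artist" is a placeholder seed, and ulan_names stands in
# for a real lookup in the Unified List of Artist Names.
seeds = {"Otto Dix", "Another Artist"}
ulan_names = {"Otto Dix", "George Grosz"}
pages = ["Otto Dix and George Grosz were friends; S. Freud is mentioned too."]
print(score_candidates(pages, seeds, ulan_names))  # {'George Grosz': 0.5}
```

Here "S. Freud" is extracted but dropped by the ULAN check. "George Grosz" collects 0.5 because the page mentions one of the two seeds.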

  15. Case Study: Results • Impressionism: 200 pages (about 120 used) • Seed artists: Degas, Gauguin, Boudin, Morisot, Caillebotte, Seurat, Monet, Renoir, Manet • Ranked output (name ; score ; ULAN record):
      sisley, alfred ; 0.08 ; ulan#19582
      cassatt, mary ; 0.0780414 ; ulan#8671
      cezanne, paul ; 0.0764626 ; ulan#9730
      bazille, frederic ; 0.0394824 ; ulan#2147
      signac, paul ; 0.0265291 ; ulan#19142
      guillaumin, armand ; 0.0263668 ; ulan#11549
      gustave courbet ; 0.0218521 ; ulan#12992
      bonnard, pierre ; 0.0149454 ; ulan#4215
      henri matisse ; 0.0134152 ; ulan#5698
      camille corot ; 0.0128969 ; ulan#10536
      d'orsay ; 0.0123066 ; ulan#28304
      auguste rodin ; 0.0115357 ; ulan#17831
      theodore rousseau ; 0.011157 ; ulan#18605
      childe hassam ; 0.0107054 ; ulan#12300
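The result list mixes "surname, forename" entries with "forename surname" entries. This matters when the output is compared against a gold standard. Below is a small hypothetical parser for the `name ; score ; ulan#id` lines above. It normalises names to "forename surname" order and ranks the rows by score. `parse_result_line` and `rank_results` are illustrative names.

```python
def parse_result_line(line):
    """Parse 'name ; score ; ulan#id' into (name, score, ulan_id)."""
    name, score, ref = (part.strip() for part in line.split(";"))
    if "," in name:  # "sisley, alfred" -> "alfred sisley"
        surname, forename = (p.strip() for p in name.split(",", 1))
        name = f"{forename} {surname}"
    ulan_id = int(ref.split("#", 1)[1])
    return name, float(score), ulan_id

def rank_results(lines):
    """Parse non-empty lines and sort them by descending score."""
    rows = (parse_result_line(line) for line in lines if line.strip())
    return sorted(rows, key=lambda row: row[1], reverse=True)

lines = [
    "gustave courbet ; 0.0218521 ; ulan#12992",
    "sisley, alfred ; 0.08 ; ulan#19582",
]
for row in rank_results(lines):
    print(row)
```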

  16. Case Study: Results • Evaluation problems • 18 Impressionists (Gold Standard)
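With a gold standard such as the 18 Impressionists, one common way to evaluate a ranked list is precision and recall at a cut-off k. The sketch below is a generic illustration and not the evaluation reported here. `precision_recall_at_k` is a hypothetical helper, and the example data are placeholders rather than actual results.

```python
def precision_recall_at_k(ranked, gold, k):
    """Precision and recall of the top-k entries of `ranked` against `gold`."""
    top = ranked[:k]
    hits = sum(1 for name in top if name in gold)
    precision = hits / len(top) if top else 0.0
    recall = hits / len(gold) if gold else 0.0
    return precision, recall

# Placeholder data: names must be normalised the same way on both sides.
ranked = ["artist a", "museum b", "artist c", "artist d"]
gold = {"artist a", "artist c", "artist e"}
for k in (1, 2, 3, 4):
    print(k, precision_recall_at_k(ranked, gold, k))
```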

  17. Assumptions, Limitations • Conclusions: • It seems to work • Evaluation is a problem • Assumptions: • The redundancy of information gained by using multiple, heterogeneous sources compensates for what we lose by not using more ‘sophisticated’ methods • R must be a one-to-many relation (no functional properties) • C1 must be ‘googlable’ • C2 must be ‘extractable’
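The one-to-many assumption can be made concrete. A relation is functional when each C1 instance has at most one C2 value. A redundancy-based ranking, which returns many candidates per C1 instance, presumably fits set-valued relations better. Below is a hypothetical check (`is_functional` is an illustrative name) over a set of (I1, I2) pairs.

```python
from collections import defaultdict

def is_functional(pairs):
    """True if every subject (C1 instance) has at most one object (C2 instance)."""
    objects = defaultdict(set)
    for subject, obj in pairs:
        objects[subject].add(obj)
    return all(len(objs) <= 1 for objs in objects.values())

print(is_functional({("Impressionism", "Monet"), ("Impressionism", "Renoir")}))  # False
print(is_functional({("Country A", "City X"), ("Country B", "City Y")}))         # True
```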

  18. Further Research • Collect more results (how robust is it?) • Different domains • More heterogeneous sources (databases), offline dictionaries… • Use page classification/trustworthiness • Evaluation • Use ontological information

  19. Questions?
