Extracting Instances of Relations from Web Documents using Redundancy. Victor de Boer, OLP-AIO’s Workshop, March 16th, 2005
Outline • Introduction/Recap • Relation Instantiation Task • My approach: Redundancy-based • Extracting Artists • Extracting Periods • Future Research • Questions / Discussion
Intro and Research Questions • How can we automatically construct, enrich and populate ontologies using heterogeneous sources on the Web?
OLP • Ontology Learning: • Concepts: NERC, LSI, … • Hierarchical structure: Hearst patterns, … • Other relations • Ontology Population: • Instances • Relation instances • Ontology Enrichment [Figure: concepts C1, C2, C3, C4 with instances I1, I2, I3]
Relation Instantiation • We have: • two concepts C1 and C2, • a relation R(C1, C2) • and instances I1 of C1 and I2 of C2. • Find for which pairs of instances the relation R holds. • Examples: • <Country, has_city, City> • <Movie, has_director, Director> • <Artstyle, has_artist, Artist> • Information Extraction!
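Read as sets, with I1 and I2 taken to denote all known instances of C1 and C2 (our reading of the slide's notation, not a formula from the talk), the task is to find the subset of instance pairs for which R holds:

```latex
\{\, (i_1, i_2) \in I_1 \times I_2 \mid R(i_1, i_2) \,\}
```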
Approaches • Current approaches are not generic enough • Goal of my approach: • A generic method, applicable to heterogeneous sources. Redundancy of information should do the rest.
Extracting Artists • MultimediaN e-culture Project • Art and Architecture Thesaurus (AAT) • Unified List of Artist Names (ULAN) • Relation: <aat:style, aua:has_artist, ulan:artist> • Find instances of this relation [Figure: e-culture has_artist relation]
Extracting Artists • 200 docs → Person Name Extractor (CUTE) → Match against ULAN: Artists → Tuples: <ULAN Artist, Doc> → Instance Score, Document Score
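The slides name an Instance Score and a Document Score but give no formulas. A minimal sketch of how the two could feed each other, assuming (hypothetically) that a document's score is the fraction of its ULAN-matched artists that are already known instances (the seeds), and that an artist's score is the sum of the scores of the documents it occurs in, divided by the number of documents. The function names and the toy tuples are illustrative, not from the source.

```python
from collections import defaultdict

# Hypothetical redundancy-based scoring; both formulas are assumptions,
# not taken from the slides.

def document_scores(tuples, known):
    """Score each document by the fraction of its matched artists that are known instances."""
    artists_per_doc = defaultdict(set)
    for artist, doc in tuples:
        artists_per_doc[doc].add(artist)
    return {
        doc: len(artists & known) / len(artists)
        for doc, artists in artists_per_doc.items()
    }

def instance_scores(tuples, doc_scores):
    """Score each artist by summing the scores of the documents it occurs in,
    divided by the total number of documents."""
    n_docs = len(doc_scores)
    totals = defaultdict(float)
    for artist, doc in set(tuples):  # count each <artist, doc> tuple once
        totals[artist] += doc_scores[doc]
    return {artist: total / n_docs for artist, total in totals.items()}

# Toy <artist, doc> tuples, as produced by name extraction plus ULAN matching.
tuples = {
    ("Kirchner", "doc1"), ("Nolde", "doc1"),
    ("Nolde", "doc2"), ("Marc", "doc2"),
    ("Monet", "doc3"),
}
seeds = {"Kirchner", "Marc"}
ds = document_scores(tuples, seeds)
scores = instance_scores(tuples, ds)
```

Here Nolde, which co-occurs with seed artists in two documents, scores 1/3, while Monet, which appears only in a document without any known artist, scores 0: redundancy across documents, not any single page, drives the ranking.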
Experiments (ESWC 2006 submission) • Two Art Styles • ‘aat:Expressionism’ • ‘aat:Impressionism’ • Evaluation: Gold Standard extracted from 11 encyclopedic webpages • Three chosen as seeds • Resulting Ordered List • Precision/Recall/F-graph
Results (point of maximum F) • aat:Expressionism: F = 0.70, recall = 0.56, precision = 0.94, threshold = 0.0012 • aat:Impressionism: F = 0.76, recall = 0.73, precision = 0.79, threshold = 0.0084
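The reported F-values match the balanced F-measure F = 2PR/(P+R): 2·0.94·0.56/(0.94+0.56) ≈ 0.70 and 2·0.79·0.73/(0.79+0.73) ≈ 0.76. A minimal sketch of how a threshold that maximizes F could be read off a ranked list of scored instances against a gold standard; the function names and toy data are hypothetical.

```python
def f_measure(precision, recall):
    """Balanced F-measure (harmonic mean of precision and recall)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def prf_at(ranked, gold, threshold):
    """Precision, recall and F when accepting every instance scoring >= threshold."""
    accepted = {name for name, score in ranked if score >= threshold}
    hits = len(accepted & gold)
    precision = hits / len(accepted) if accepted else 0.0
    recall = hits / len(gold) if gold else 0.0
    return precision, recall, f_measure(precision, recall)

def best_threshold(ranked, gold):
    """Try each observed score as cut-off; return (threshold, P, R, F) with maximal F."""
    best = (None, 0.0, 0.0, 0.0)
    for _, score in ranked:
        p, r, f = prf_at(ranked, gold, score)
        if f > best[3]:
            best = (score, p, r, f)
    return best

ranked = [("A", 0.5), ("B", 0.4), ("X", 0.3), ("C", 0.2)]  # toy scored list
gold = {"A", "B", "C", "D"}                                # toy gold standard
print(best_threshold(ranked, gold))  # → (0.2, 0.75, 0.75, 0.75)
```

Sweeping every observed score as a cut-off traces exactly the points of a precision/recall/F graph, so the maximum found is the best achievable on that list.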
ECAI ’06 Experiments • 12 art styles, iterative version only • No gold standard: precision only • Indicators for stopping the iteration: • Percentage of max • Maximum number of extractions • Precision at 30% and max = 20: • Dada: 1.0 • Expressionism: 0.85 • Impressionism: 0.75 • [Table: parameter values vs. average precision]
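A minimal sketch of an iterative loop with the two stopping indicators, under one possible reading of the slide (an assumption): each round accepts the best-scoring new candidate, and the loop stops once that candidate's score falls below a percentage of the highest score seen so far (30% here), or once a maximum number of extractions (20 here) is reached. The scoring function is passed in; `toy_score` is purely illustrative.

```python
from typing import Callable, Dict, List, Set

def iterate(
    score: Callable[[Set[str]], Dict[str, float]],
    seeds: Set[str],
    pct_of_max: float = 0.30,
    max_extractions: int = 20,
) -> List[str]:
    """Bootstrap new instances from seeds, one per round, until a stop criterion fires."""
    known = set(seeds)
    extracted: List[str] = []
    top_score = 0.0
    while len(extracted) < max_extractions:
        # Re-score with the current set of known instances; ignore ones already known.
        candidates = {a: s for a, s in score(known).items() if a not in known}
        if not candidates:
            break
        best = max(candidates, key=candidates.get)
        if candidates[best] <= 0:
            break  # nothing with positive evidence left
        top_score = max(top_score, candidates[best])
        if candidates[best] < pct_of_max * top_score:
            break  # score has fallen below the chosen percentage of the maximum
        known.add(best)
        extracted.append(best)
    return extracted

def toy_score(known: Set[str]) -> Dict[str, float]:
    # Hypothetical fixed scores that ignore `known`, just to exercise the loop.
    return {"Nolde": 0.9, "Marc": 0.5, "Macke": 0.25, "Monet": 0.1}

print(iterate(toy_score, {"Kirchner"}))  # → ['Nolde', 'Marc']
```

Macke is rejected because 0.25 is below 30% of the top score 0.9 (0.27). In a real run the scoring function would re-rank candidates using the growing set of known instances.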
Artstyle-Periods • Same type of approach: • Extract many instances from the WWW and rank them according to some score. • In this case: extract years and post-process them into periods • Steps: • Retrieve 1000 pages for an art style (Google) • Extract years (regular expression) • Normalize (Google)
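A minimal sketch of the extraction and normalization steps, assuming the pages are already downloaded as plain text. The regular expression (standalone four-digit years from 1000 to 2099) and the normalization (dividing each year's count by a background frequency for that year, such as a hit count for the year on its own) are assumptions; the slides only say that years are extracted with a regular expression and normalized using Google. The background values below are made up.

```python
import re
from collections import Counter
from typing import Dict, Iterable

# Hypothetical pattern: standalone four-digit years from 1000 to 2099.
YEAR_RE = re.compile(r"\b(1[0-9]{3}|20[0-9]{2})\b")

def year_counts(pages: Iterable[str]) -> Counter:
    """Count every year mention across all pages retrieved for one art style."""
    counts: Counter = Counter()
    for text in pages:
        counts.update(int(y) for y in YEAR_RE.findall(text))
    return counts

def normalize(counts: Counter, background: Dict[int, float]) -> Dict[int, float]:
    """Divide each year's count by its background frequency, skipping years without one."""
    return {y: c / background[y] for y, c in counts.items() if background.get(y)}

pages = [
    "Die Brücke was founded in Dresden in 1905 and dissolved in 1913.",
    "Der Blaue Reiter was formed in Munich in 1911.",
    "Kirchner co-founded Die Brücke in 1905.",
]
counts = year_counts(pages)            # 1905 twice, 1911 and 1913 once
background = {1905: 2.0, 1911: 4.0}    # made-up background frequencies
print(normalize(counts, background))   # → {1905: 1.0, 1911: 0.25}
```

Normalizing matters because some years (recent ones, or round numbers) are frequent on the Web regardless of the art style.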
[Figure: Gaussian fit, μ = 1889.125626, SD = 53.94131969]
[Figure: Gaussian fit, μ = 1661.79996, SD = 66.88810033]
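The slides report a Gaussian mean and standard deviation per distribution but not how they were fitted or turned into a period. A minimal sketch, assuming a weighted moment fit over normalized year scores and, hypothetically, a period of μ ± kσ with k = 1; both choices are illustrative.

```python
import math
from typing import Dict, Tuple

def fit_gaussian(weights: Dict[int, float]) -> Tuple[float, float]:
    """Weighted mean and standard deviation of years (method of moments)."""
    total = sum(weights.values())
    mu = sum(year * w for year, w in weights.items()) / total
    var = sum(w * (year - mu) ** 2 for year, w in weights.items()) / total
    return mu, math.sqrt(var)

def period(mu: float, sd: float, k: float = 1.0) -> Tuple[int, int]:
    """Hypothetical period: the interval mu +/- k*sd, rounded to whole years."""
    return round(mu - k * sd), round(mu + k * sd)

mu, sd = fit_gaussian({1900: 1.0, 1910: 1.0})  # toy normalized scores
print(mu, sd, period(mu, sd))  # → 1905.0 5.0 (1900, 1910)
```

A block or fuzzy interval (both listed as open options under Future Research) would replace `period` while keeping the same fitted inputs.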
Future Research • Artists: complete evaluation; threshold?; values = statistics? • More domains • Dates: improve; integrate in method; Gauss, block, fuzzy? • How does this relate to ontological knowledge?
More Future • Integrate knowledge from different, heterogeneous sources. • Example: what is the style of a painting X? • X was painted by Y • Y is associated with art styles A, B, C • A spans period I1, B spans I2, C spans I3 • X was painted in year T • T ∈ I2 → <X, has_style, B> • Generic method
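The reasoning chain for painting X can be sketched as a small function. The data structures (a mapping from style to an inclusive (start, end) year period) and the example years are hypothetical; how to resolve zero or several matching styles is left open, as on the slide.

```python
from typing import Dict, List, Set, Tuple

def candidate_styles(
    year: int,
    artist_styles: Set[str],
    style_periods: Dict[str, Tuple[int, int]],
) -> List[str]:
    """Styles of the painter whose period (inclusive) contains the painting's year."""
    return sorted(
        s for s in artist_styles
        if s in style_periods and style_periods[s][0] <= year <= style_periods[s][1]
    )

# Painting X by painter Y, who is associated with styles A, B and C.
periods = {"A": (1860, 1890), "B": (1890, 1910), "C": (1910, 1930)}
print(candidate_styles(1900, {"A", "B", "C"}, periods))  # → ['B']
```

With overlapping or fuzzy periods, a year such as 1890 matches both A and B, which is where the open questions on uncertainty and information integration come in.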
[Figure: WWW; ontological knowledge; statistics, uncertainty, fuzziness; information integration?]