Corroborate and Learn Facts from the Web. Presenter: Lin, Shu-Han. Authors: Shubin Zhao, Jonathan Betz. SIGKDD (2008).
Outline • Motivation • Objective • Methodology • Experiments • Conclusion • Comments
Motivation • Many “facts” about the movie “Independence Day” appear on Wikipedia, moviefone.com, and Infoplease.com
Motivation • Combine them • All of these sites mention the director of the movie, “Roland Emmerich”
Objectives • Capture new “facts” (attribute + value) that share the same HTML patterns • Then corroborate these new “facts” • Check whether other websites also mention them • Learn the fact: good facts are commonly referenced; incorrect facts are rarely mentioned.
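The corroboration objective amounts to voting across sources. Below is a minimal sketch, assuming facts have already been extracted as (site, entity, attribute, value) tuples and that a fact counts as corroborated once at least `min_sources` distinct sites report it. The tuple layout, the lowercase normalization, the default of 2, and the outlier site are assumptions for illustration, not the authors' settings.

```python
from collections import defaultdict

def corroborate(observations, min_sources=2):
    """Keep (entity, attribute, value) facts reported by >= min_sources distinct sites.

    observations: iterable of (site, entity, attribute, value) tuples.
    Returns {(entity, attribute, value): number_of_supporting_sites}.
    """
    support = defaultdict(set)
    for site, entity, attribute, value in observations:
        # Hypothetical normalization: case-fold and trim so trivial variants agree.
        key = (entity, attribute.strip().lower(), value.strip().lower())
        support[key].add(site)  # a set, so one site repeating a fact counts once
    return {key: len(sites) for key, sites in support.items() if len(sites) >= min_sources}


observations = [
    ("wikipedia.org", "Independence Day", "director", "Roland Emmerich"),
    ("moviefone.com", "Independence Day", "Director", "roland emmerich"),
    ("infoplease.com", "Independence Day", "director", "Roland Emmerich"),
    ("example-blog.net", "Independence Day", "director", "J. Doe"),  # hypothetical outlier
]
print(corroborate(observations))
# {('Independence Day', 'director', 'roland emmerich'): 3}
```

The value reported by a single site falls below the threshold and is dropped, which mirrors the intuition that incorrect facts are seldom repeated.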
Methodology – Overview • 1. Relevant page (search engine; inputs: Wiki, seed set) • 2. Match • 3. Extract new facts
Methodology – Corroborate Facts – Common Fact Match • A common fact: “Susan”, gender: female • Threshold:
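A common fact such as gender: female for “Susan” shows up on pages about many different people, so matching it says little about which Susan a page describes. One way to express this, sketched here under our own assumptions, is to weight each matched value by how rare it is among the known entities (an IDF-style weight, which is our choice and not necessarily the authors' formula) and accept a page only when the total reaches a threshold. The entities, the helper names, and `threshold=1.0` are hypothetical.

```python
import math
from collections import Counter

def value_weights(entities):
    """IDF-style weight per value: log(N / entities_with_value).

    A value shared by every known entity (e.g. gender: female) gets weight 0.
    """
    n = len(entities)
    counts = Counter(v for facts in entities.values() for v in {x.lower() for x in facts.values()})
    return {value: math.log(n / count) for value, count in counts.items()}

def match_score(page_text, facts, weights):
    """Sum the weights of the entity's known values that appear verbatim in the page."""
    text = page_text.lower()
    return sum(weights[v] for v in {x.lower() for x in facts.values()} if v in text)

def is_relevant(page_text, facts, weights, threshold=1.0):
    """Hypothetical acceptance rule: the weighted match score must reach the threshold."""
    return match_score(page_text, facts, weights) >= threshold


entities = {
    "Susan A": {"gender": "female", "born": "1970", "employer": "Acme"},
    "Susan B": {"gender": "female", "born": "1982", "employer": "Initech"},
}
weights = value_weights(entities)
page = "Susan is a female engineer, born in 1970, who works at Acme."
print(is_relevant(page, entities["Susan A"], weights))  # True
print(is_relevant(page, entities["Susan B"], weights))  # False
```

Both entities match “female”, but that match contributes nothing, so only the entity whose rarer facts also appear on the page is accepted.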
Methodology – Extract New Facts • 3. Extract new facts • Capture “repeated HTML patterns”
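Extracting new facts from repeated HTML patterns can be illustrated with a toy wrapper: locate one already-known attribute/value pair in the page, take the markup around it as a template, and reapply the template to pull out sibling pairs. This is a minimal sketch assuming the pattern is a flat table row and that plain regular expressions are enough; `induce_wrapper`, `extract_new_facts`, and the sample table are hypothetical, and a real system would more likely work on the DOM than on raw strings.

```python
import re

def induce_wrapper(html, attribute, value):
    """Turn the markup around one known (attribute, value) pair into a regex template."""
    start = html.find(attribute)
    if start < 0:
        return None
    vpos = html.find(value, start + len(attribute))
    if vpos < 0:
        return None
    middle = html[start + len(attribute):vpos]              # e.g. "</td><td>"
    left = re.search(r"<[^>]+>$", html[:start])              # tag just before the attribute
    right = re.match(r"<[^>]+>", html[vpos + len(value):])   # tag just after the value
    if left is None or right is None:
        return None
    return re.compile(
        re.escape(left.group()) + r"([^<]+?)" + re.escape(middle)
        + r"([^<]+)" + re.escape(right.group())
    )

def extract_new_facts(html, known_facts):
    """Apply each induced template to the page and keep attributes not already known."""
    new_facts = {}
    for attribute, value in known_facts.items():
        wrapper = induce_wrapper(html, attribute, value)
        if wrapper is None:
            continue
        for attr, val in wrapper.findall(html):
            attr, val = attr.strip(), val.strip()
            if attr not in known_facts:
                new_facts[attr] = val
    return new_facts


page = (
    "<table><tr><td>Director</td><td>Roland Emmerich</td></tr>"
    "<tr><td>Year</td><td>1996</td></tr>"
    "<tr><td>Runtime</td><td>145 min</td></tr></table>"
)
print(extract_new_facts(page, {"Director": "Roland Emmerich"}))
# {'Year': '1996', 'Runtime': '145 min'}
```

Capturing with `[^<]+` keeps each field inside a single text node, so the template cannot run across neighbouring cells. Pairs found this way are only candidates; they would still need corroboration from other sites before being learned.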
Experiments
Conclusions • Find relevant pages about entities • Extract new facts by corroborating existing facts • Based on string matching and HTML pattern discovery
Comments • Advantages • The idea is intuitive • Language independent • Searches for and integrates information/data on the web • Drawbacks • Can only handle existing entities or • Much information is hidden in article text, not only in tables • Application • It cannot be used to extract comments or new information, such as food reviews in blogs