Truth Discovery with Multiple Conflicting Information Providers on the Web • Xiaoxin Yin, Jiawei Han, Philip S. Yu • Industrial and Government Track short paper • Advisor: Dr. Koh Jia-Ling • Speaker: Che-Wei Liang • Date: 2007.11.20
Outline • Introduction • Problem Definitions • Computational Model • Web Site Trustworthiness and Fact Confidence • Iterative Computation • Empirical Study • Conclusions
Introduction • The World Wide Web • has become a necessary part of our lives. • Ex: Amazon.com, ShopZilla.com. • Is the World Wide Web always trustworthy? • There is no guarantee that information on the web is correct.
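Introduction • Example 1: Authors of books • [Figure: author lists for the same book from different web sites; some entries are marked "incomplete" and others "incorrect".]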
Introduction • Ranking web pages • Pages are ranked according to authority, which is derived from hyperlinks. • Ex: Authority-Hub analysis, PageRank, and more general link-based analysis. • Does the authority or popularity of a web site lead to accurate information?
Introduction • Veracity problem • Discover the true fact about each object.
Problem Definitions • Definition 1: Confidence of facts. • The probability of a fact f being correct, denoted by s(f). • Definition 2: Trustworthiness of web sites. • The expected confidence of the facts provided by a web site w, denoted by t(w).
Problem Definitions • Facts may conflict with or support each other. • Ex: “Jennifer Widom” and “J. Widom” support each other. • Concept of implication • imp(f1 → f2): f1’s influence on f2’s confidence.
Basic heuristics • 1. Usually there is only one true fact for a property of an object. • 2. This true fact appears to be the same or similar on different web sites.
Basic heuristics (cont.) • 3. The false facts on different web sites are less likely to be the same or similar. • 4. In a certain domain, a web site that provides mostly true facts for many objects will likely provide true facts for other objects.
Web Site Trustworthiness and Fact Confidence • Trustworthiness t(w): $t(w) = \frac{\sum_{f \in F(w)} s(f)}{|F(w)|}$, where F(w) is the set of facts provided by w.
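Since t(w) is defined as the expected confidence of the facts a site provides, it can be read as a plain average of s(f) over F(w). The sketch below illustrates that reading; the function name and the sample confidences are made up for illustration and do not come from the slides.

```python
def trustworthiness(fact_confidences):
    """Average of s(f) over the facts F(w) provided by one web site w."""
    if not fact_confidences:
        # Assumption: every web site provides at least one fact.
        raise ValueError("a web site must provide at least one fact")
    return sum(fact_confidences) / len(fact_confidences)


# Hypothetical site that provides three facts with these confidences.
t_w = trustworthiness([0.9, 0.8, 0.4])
print(round(t_w, 3))
```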
Web Site Trustworthiness and Fact Confidence • It is more difficult to estimate the confidence of a fact.
Web Site Trustworthiness and Fact Confidence • Simple case • f1 is the only fact about object o1. • Assume w1 and w2 are independent. • Confidence s(f): $s(f) = 1 - \prod_{w \in W(f)} \bigl(1 - t(w)\bigr)$, where W(f) is the set of web sites providing f.
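Under the independence assumption of the simple case, a fact is wrong only if every site providing it is wrong, which gives s(f) = 1 - Π (1 - t(w)) over W(f). A minimal sketch of that calculation follows; the trustworthiness values 0.9 and 0.8 are illustrative assumptions.

```python
import math


def confidence(site_trust):
    """s(f) = 1 - prod(1 - t(w)) over the sites W(f), assuming independent sites."""
    return 1.0 - math.prod(1.0 - t for t in site_trust)


# Hypothetical: f1 is provided by w1 (t = 0.9) and w2 (t = 0.8).
s_f1 = confidence([0.9, 0.8])
print(round(s_f1, 3))
```

Adding a second independent provider can only raise the confidence, because each extra factor (1 - t(w)) is at most 1.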
Web Site Trustworthiness and Fact Confidence • Trustworthiness score of a web site: $\tau(w) = -\ln\bigl(1 - t(w)\bigr)$ • τ(w) lies between 0 and +∞ and better characterizes how accurate w is. • Ex: t(w1) = 0.9, t(w2) = 0.99: t(w2) = 1.1 × t(w1), but τ(w2) = 2 × τ(w1).
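The example on this slide can be checked numerically, assuming the logarithmic form τ(w) = -ln(1 - t(w)), which is the form consistent with the stated range of 0 to +∞ and with the factor of 2. The function name is a hypothetical helper.

```python
import math


def trust_score(t):
    """tau(w) = -ln(1 - t(w)); maps t in [0, 1) onto [0, +inf)."""
    return -math.log(1.0 - t)


t_ratio = 0.99 / 0.9                                # ratio of raw trustworthiness
tau_ratio = trust_score(0.99) / trust_score(0.9)    # ratio of trustworthiness scores
print(round(t_ratio, 2), round(tau_ratio, 2))
```

The raw values differ by only 10%, while the scores differ by a factor of two, reflecting that w2 makes ten times fewer errors than w1.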
Web Site Trustworthiness and Fact Confidence • Confidence score of a fact: $\sigma(f) = -\ln\bigl(1 - s(f)\bigr)$ • Property: $\sigma(f) = \sum_{w \in W(f)} \tau(w)$
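The property can be verified numerically: summing the scores τ(w) = -ln(1 - t(w)) of the providing sites and mapping the sum back through s = 1 - e^(-σ) should reproduce the product form of s(f). The sketch assumes those logarithmic definitions; the trustworthiness values are illustrative.

```python
import math


def trust_score(t):
    """tau(w) = -ln(1 - t(w))."""
    return -math.log(1.0 - t)


def confidence_score(site_trust):
    """sigma(f) as the sum of tau(w) over the sites providing f."""
    return sum(trust_score(t) for t in site_trust)


site_trust = [0.9, 0.8]  # hypothetical providers of one fact
sigma = confidence_score(site_trust)
s_from_sigma = 1.0 - math.exp(-sigma)                       # invert sigma = -ln(1 - s)
s_direct = 1.0 - math.prod(1.0 - t for t in site_trust)     # product form
print(round(s_from_sigma, 6), round(s_direct, 6))
```

Working with sums of scores instead of products of probabilities is what makes the later adjustments (implication, dampening) simple to express.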
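Web Site Trustworthiness and Fact Confidence • Adjusted confidence score of a fact f: $\sigma^*(f) = \sigma(f) + \rho \sum_{o(f') = o(f)} \sigma(f') \cdot imp(f' \to f)$, where o(f) is the object that f is about and ρ (0 < ρ < 1) controls the influence of related facts.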
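One way to read the adjusted score is that each other fact about the same object adds (or, for a conflicting fact, subtracts) a share of its own score, weighted by imp and scaled by ρ. The sketch below assumes that form; the scores and implication values are invented for illustration, and all facts in `sigma` are taken to be about the same object.

```python
def adjusted_score(fact, sigma, imp, rho=0.5):
    """sigma*(f) = sigma(f) + rho * sum of sigma(f') * imp(f' -> f) over other facts f'.

    sigma: dict fact -> confidence score (all facts about one object).
    imp:   dict (f_from, f_to) -> implication in [-1, 1]; missing pairs count as 0.
    """
    related = sum(
        sigma[other] * imp.get((other, fact), 0.0)
        for other in sigma
        if other != fact
    )
    return sigma[fact] + rho * related


# Hypothetical scores for three candidate authors of one book.
sigma = {"Jennifer Widom": 3.0, "J. Widom": 2.0, "John Smith": 1.0}
# Hypothetical implications: the abbreviated name supports, the other name conflicts.
imp = {
    ("J. Widom", "Jennifer Widom"): 0.5,
    ("John Smith", "Jennifer Widom"): -0.8,
}
sigma_star = adjusted_score("Jennifer Widom", sigma, imp)
print(round(sigma_star, 3))
```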
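Web Site Trustworthiness and Fact Confidence • Compute the confidence of f based on σ*(f) in the same way as computing it based on σ(f): $s^*(f) = 1 - e^{-\sigma^*(f)}$ • This assumes that different web sites are independent, which is incorrect! • Add a dampening factor γ, 0 < γ < 1: $s^*(f) = 1 - e^{-\gamma \cdot \sigma^*(f)}$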
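Web Site Trustworthiness and Fact Confidence • Negative-confidence problem • A fact f that conflicts with facts provided by trustworthy web sites can have σ*(f) < 0 and therefore s*(f) < 0, which is unreasonable! • Use a logistic function instead: $s(f) = \frac{1}{1 + e^{-\gamma \cdot \sigma^*(f)}}$ • If γ·σ*(f) is well above 0, s(f) is very close to s*(f). • If γ·σ*(f) is well below 0, s(f) is close to zero but still positive.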
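The two behaviours described above can be seen by comparing a dampened exponential confidence, 1 - e^(-γσ*), with a logistic confidence, 1 / (1 + e^(-γσ*)). This is a sketch under the assumption that those are the two forms being compared; γ = 0.3 is the value reported in the empirical study, and the σ* inputs are arbitrary.

```python
import math


def dampened_confidence(sigma_star, gamma=0.3):
    """s*(f) = 1 - exp(-gamma * sigma*(f)); goes negative when sigma*(f) < 0."""
    return 1.0 - math.exp(-gamma * sigma_star)


def logistic_confidence(sigma_star, gamma=0.3):
    """s(f) = 1 / (1 + exp(-gamma * sigma*(f))); always strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-gamma * sigma_star))


for sigma_star in (-5.0, 0.0, 5.0, 30.0):
    print(sigma_star,
          round(dampened_confidence(sigma_star), 4),
          round(logistic_confidence(sigma_star), 4))
```

For a strongly supported fact the two values nearly coincide, while for a contradicted fact only the logistic form stays a valid probability.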
Iterative Computation • TRUTHFINDER: an iterative method • Initially, TruthFinder has little information about the web sites and the facts. • In each iteration, it improves its knowledge about trustworthiness and confidence. • It stops when the computation reaches a stable state.
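The iteration can be sketched as alternating between the two quantities: fact confidence from site trustworthiness, then site trustworthiness from fact confidence. The code below is a simplified illustration, not the authors' implementation. Assumptions: facts are (object, value) pairs, every site starts at t0 = 0.9, the "stable state" test is a maximum absolute change below `tol`, implications are supplied as an optional dict, and trustworthiness is clamped just below 1 to keep the logarithm finite.

```python
import math


def truth_finder(claims, imp=None, rho=0.5, gamma=0.3, t0=0.9,
                 tol=1e-6, max_iter=100):
    """claims: dict site -> non-empty set of (object, value) facts.

    Returns (trust, conf): site trustworthiness and fact confidence.
    """
    imp = imp or {}
    providers = {}  # fact -> set of sites providing it, i.e. W(f)
    for site, facts in claims.items():
        for fact in facts:
            providers.setdefault(fact, set()).add(site)

    trust = {site: t0 for site in claims}  # assumed uniform starting value
    conf = {}
    for _ in range(max_iter):
        # Site scores, then raw fact scores as sums over providers.
        tau = {w: -math.log(1.0 - t) for w, t in trust.items()}
        sigma = {f: sum(tau[w] for w in ws) for f, ws in providers.items()}

        conf = {}
        for f in sigma:
            # Influence of other facts about the same object.
            related = sum(sigma[g] * imp.get((g, f), 0.0)
                          for g in sigma if g != f and g[0] == f[0])
            sigma_star = sigma[f] + rho * related
            conf[f] = 1.0 / (1.0 + math.exp(-gamma * sigma_star))

        # Trustworthiness is the average confidence of a site's facts.
        new_trust = {
            w: min(sum(conf[f] for f in facts) / len(facts), 1.0 - 1e-12)
            for w, facts in claims.items()
        }
        change = max(abs(new_trust[w] - trust[w]) for w in trust)
        trust = new_trust
        if change < tol:  # assumed stopping rule
            break
    return trust, conf


# Hypothetical data: w3 disagrees with w1 and w2 about book1.
claims = {
    "w1": {("book1", "A"), ("book2", "X")},
    "w2": {("book1", "A"), ("book2", "X")},
    "w3": {("book1", "B"), ("book2", "X")},
}
trust, conf = truth_finder(claims)
print(conf[("book1", "A")] > conf[("book1", "B")], trust["w1"] > trust["w3"])
```

In this toy data the fact backed by two sites ends up with the higher confidence, and the dissenting site ends up with the lower trustworthiness.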
Empirical Study • Compare with VOTING • which chooses the fact that is provided by the most web sites. • Setup: Intel PC with a 1.66 GHz dual-core processor, 1 GB memory, Windows XP Professional. • Parameters: ρ = 0.5 and γ = 0.3.
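The VOTING baseline is simple enough to sketch directly: for each object, count how many sites provide each value and keep the most frequent one. The data layout (sites mapped to sets of (object, value) pairs) and the sample claims are assumptions for illustration; ties are broken arbitrarily here.

```python
from collections import Counter


def vote(claims):
    """Return, for each object, the value provided by the most web sites."""
    counts = {}  # object -> Counter of values
    for facts in claims.values():
        for obj, value in facts:
            counts.setdefault(obj, Counter())[value] += 1
    return {obj: c.most_common(1)[0][0] for obj, c in counts.items()}


# Hypothetical claims from three sites about two books.
claims = {
    "w1": {("book1", "A"), ("book2", "X")},
    "w2": {("book1", "A"), ("book2", "X")},
    "w3": {("book1", "B"), ("book2", "X")},
}
winners = vote(claims)
print(winners["book1"], winners["book2"])
```

Unlike the iterative method, voting treats every site as equally reliable, so a false fact copied by many sites would win.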
Conclusions • Introduce and formulate the Veracity problem • resolving conflicting facts from multiple web sites. • finding the true facts among them. • Propose TRUTHFINDER • Utilizes web site trustworthiness and fact confidence to find trustworthy web sites and true facts. • Experiments show that it achieves high accuracy.