
Advisor : Dr. Koh Jia-Ling Speaker : Che-Wei Liang Date : 2007.11.20

Truth Discovery with Multiple Conflicting Information Providers on the Web. Xiaoxin Yin, Jiawei Han, Philip S. Yu. Industrial and Government Track short paper.


Presentation Transcript


  1. Truth Discovery with Multiple Conflicting Information Providers on the Web • Xiaoxin Yin, Jiawei Han, Philip S. Yu • Industrial and Government Track short paper • Advisor: Dr. Koh Jia-Ling • Speaker: Che-Wei Liang • Date: 2007.11.20

  2. Outline • Introduction • Problem Definitions • Computational Model • Web Site Trustworthiness and Fact Confidence • Iterative Computation • Empirical Study • Conclusions

  3. Introduction • World Wide Web • A necessary part of our lives. • Ex: Amazon.com, ShopZilla.com. • Is the World Wide Web always trustworthy? • There is no guarantee that information on the web is correct.

  4. Introduction • Example 1: Authors of books • Some listings are incomplete! • Some listings are incorrect!

  5. Introduction • Ranking web pages • According to authority based on hyperlinks. • Ex: Authority-Hub analysis, PageRank, more general link-based analysis. • Does authority or popularity of web sites lead to accuracy of information?

  6. Introduction • Veracity problem • Discover the true fact about each object.

  7. Problem Definitions • Definition 1: Confidence of facts. • The probability of a fact f being correct, denoted by s(f). • Definition 2: Trustworthiness of web sites. • The expected confidence of the facts provided by a web site w, denoted by t(w).

  8. Problem Definitions • Facts may conflict with or support each other. • Ex: “Jennifer Widom”, “J. Widom” • Concept of implication • imp(f1 → f2): f1’s influence on f2’s confidence.

  9. Basic heuristics • 1. Usually there is only one true fact for a property of an object. • 2. This true fact appears to be the same or similar on different web sites.

  10. Basic heuristics (cont.) • 3. The false facts on different web sites are less likely to be the same or similar. • 4. In a certain domain, a web site that provides mostly true facts for many objects will likely provide true facts for other objects.

  11. Web Site Trustworthiness and Fact Confidence • Trustworthiness t(w): $t(w) = \frac{\sum_{f \in F(w)} s(f)}{|F(w)|}$, where F(w) is the set of facts provided by w.
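Reading t(w) as the expected confidence of the facts in F(w), as in Definition 2, the trustworthiness of a site can be sketched as a plain average. The function name and the sample confidences below are illustrative assumptions, not taken from the slides.

```python
def trustworthiness(fact_confidences):
    """Average the confidences s(f) of the facts F(w) provided by one web site."""
    if not fact_confidences:
        raise ValueError("a web site must provide at least one fact")
    return sum(fact_confidences) / len(fact_confidences)


# Hypothetical site that provides three facts with these confidences.
t_w = trustworthiness([0.9, 0.8, 0.4])
print(round(t_w, 3))
```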

  12. Web Site Trustworthiness and Fact Confidence • It is more difficult to estimate the confidence of a fact.

  13. Web Site Trustworthiness and Fact Confidence • Simple case • f1 is the only fact about object o1. • Assume w1 and w2 are independent. • Confidence s(f): $s(f) = 1 - \prod_{w \in W(f)} (1 - t(w))$, where W(f) is the set of web sites providing f.
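Under the independence assumption of the simple case, a fact is wrong only if every site providing it is wrong, which gives s(f) = 1 − ∏(1 − t(w)) over the sites in W(f). A minimal sketch, assuming that formula and hypothetical trust values:

```python
import math


def confidence_independent(site_trust):
    """Return 1 - prod(1 - t(w)) over the sites that provide the fact."""
    return 1.0 - math.prod(1.0 - t for t in site_trust)


# Two hypothetical independent sites w1 and w2 providing the same fact f1.
s_f1 = confidence_independent([0.9, 0.8])
print(round(s_f1, 3))
```

Adding another independent site that provides the same fact can only raise the result, which is why the later slides add a correction for sites that copy from each other.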

  14. Web Site Trustworthiness and Fact Confidence • Trustworthiness score of a web site: $\tau(w) = -\ln(1 - t(w))$ • τ(w) is between 0 and +∞ and better characterizes how accurate w is. • Ex: t(w1) = 0.9, t(w2) = 0.99: t(w2) = 1.1 × t(w1), but τ(w2) = 2 × τ(w1).
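The numbers in this example can be checked directly, assuming the score is defined as τ(w) = −ln(1 − t(w)). That definition is consistent with the slide's ratios; the function name is illustrative.

```python
import math


def trust_score(t):
    """Map trustworthiness t in [0, 1) to the score tau = -ln(1 - t)."""
    return -math.log(1.0 - t)


tau_w1 = trust_score(0.9)
tau_w2 = trust_score(0.99)

# t grows by a factor of 1.1, while tau doubles.
print(round(0.99 / 0.9, 2), round(tau_w2 / tau_w1, 2))
```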

  15. Web Site Trustworthiness and Fact Confidence • Confidence score of a fact: $\sigma(f) = -\ln(1 - s(f))$ • Property: $\sigma(f) = \sum_{w \in W(f)} \tau(w)$
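Assuming σ(f) = −ln(1 − s(f)) and τ(w) = −ln(1 − t(w)), the product over sites turns into a sum of scores. The sketch below checks that summing scores and converting back gives the same confidence as the product form; the helper names are illustrative.

```python
import math


def confidence_score(site_trust):
    """sigma(f): sum of tau(w) = -ln(1 - t(w)) over the sites providing f."""
    return sum(-math.log(1.0 - t) for t in site_trust)


def score_to_confidence(sigma):
    """Invert sigma = -ln(1 - s) to recover the confidence s."""
    return 1.0 - math.exp(-sigma)


sigma_f = confidence_score([0.9, 0.8])
print(round(score_to_confidence(sigma_f), 3))
```

Working with sums avoids multiplying many numbers close to zero when a fact is provided by many highly trusted sites.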

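  16. Web Site Trustworthiness and Fact Confidence • Adjusted confidence score of a fact f: $\sigma^*(f) = \sigma(f) + \rho \cdot \sum_{o(f') = o(f)} \sigma(f') \cdot imp(f' \to f)$, where o(f) is the object that f describes.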
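One way to read the adjusted score is that each other fact about the same object adds its own score, weighted by its implication on f and by a parameter ρ. The sketch below assumes that form; the fact strings, score values, and implication values are hypothetical.

```python
def adjusted_score(fact, sigma, implication, rho=0.5):
    """Return sigma(f) plus rho times the implication-weighted scores of other facts.

    sigma: scores of all facts about one object (assumed to be given).
    implication: maps (other_fact, fact) to imp(other -> fact); missing pairs count as 0.
    """
    influence = sum(
        sigma[other] * implication.get((other, fact), 0.0)
        for other in sigma
        if other != fact
    )
    return sigma[fact] + rho * influence


# Hypothetical facts about the author of one book.
sigma = {"Jennifer Widom": 2.0, "J. Widom": 1.0, "John Smith": 1.5}
implication = {
    ("J. Widom", "Jennifer Widom"): 0.8,     # supportive fact
    ("John Smith", "Jennifer Widom"): -1.0,  # conflicting fact
}
print(round(adjusted_score("Jennifer Widom", sigma, implication), 2))
```

A supportive fact raises the adjusted score and a conflicting fact lowers it, so the result can drop below zero when strong conflicting facts dominate.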

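  17. Web Site Trustworthiness and Fact Confidence • Compute the confidence of f based on σ*(f) in the same way as computing it based on σ(f): $s^*(f) = 1 - e^{-\sigma^*(f)}$ • This assumes that different web sites are independent, which is incorrect! • Add a dampening factor γ, 0 < γ < 1: $s^*(f) = 1 - e^{-\gamma \cdot \sigma^*(f)}$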

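  18. Web Site Trustworthiness and Fact Confidence • Negative-confidence problem • When a fact f conflicts with some facts provided by trustworthy web sites, σ*(f) < 0 and s*(f) < 0, which is unreasonable! • Use $s(f) = \frac{1}{1 + e^{-\gamma \cdot \sigma^*(f)}}$ instead. • If γ·σ*(f) is significantly greater than 0, s(f) is very close to s*(f). • If γ·σ*(f) is significantly less than 0, s(f) is close to zero but still positive.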
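The two behaviours described above can be compared numerically. The sketch assumes the dampened confidence is 1 − exp(−γ·σ*) and that the replacement is the logistic function 1 / (1 + exp(−γ·σ*)). The logistic form is an assumption about the fix the slide alludes to; γ = 0.3 is the value quoted in the empirical study.

```python
import math


def dampened_confidence(sigma_star, gamma=0.3):
    """s*(f) = 1 - exp(-gamma * sigma*); negative whenever sigma* < 0."""
    return 1.0 - math.exp(-gamma * sigma_star)


def final_confidence(sigma_star, gamma=0.3):
    """Logistic form (assumed): always strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-gamma * sigma_star))


for sigma_star in (-2.0, 20.0):
    print(sigma_star,
          round(dampened_confidence(sigma_star), 4),
          round(final_confidence(sigma_star), 4))
```

For a strongly positive score the two functions nearly agree; for a negative score only the logistic form stays a valid probability.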

  19. Iterative Computation • TRUTHFINDER: an iterative method • Initially, TruthFinder has little information about the web sites and the facts. • In each iteration, it improves its knowledge about trustworthiness and confidence. • It stops when the computation reaches a stable state.
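The loop described on this slide can be sketched as alternating between fact confidences and site trustworthiness until the trust values stop changing. The formulas are the ones assumed on the earlier slides. The initial trust `t0`, the stopping rule (maximum absolute change below `tol`), the function names, and the toy data are assumptions made for illustration, not details from the slides.

```python
import math


def truthfinder(site_facts, fact_object, imp, rho=0.5, gamma=0.3,
                t0=0.9, tol=1e-6, max_iter=100):
    """Return (trust per site, confidence per fact) after iterating to a stable state."""
    # W(f): the sites that provide each fact.
    sites_of = {}
    for site, facts in site_facts.items():
        for f in facts:
            sites_of.setdefault(f, []).append(site)

    trust = {site: t0 for site in site_facts}  # assumed uniform starting point
    conf = {}
    for _ in range(max_iter):
        # tau(w) = -ln(1 - t(w)); the clamp avoids log(0) if t reaches 1.
        tau = {w: -math.log(max(1.0 - t, 1e-12)) for w, t in trust.items()}
        sigma = {f: sum(tau[w] for w in ws) for f, ws in sites_of.items()}

        conf = {}
        for f in sigma:
            # Influence of the other facts about the same object.
            related = sum(sigma[g] * imp(g, f) for g in sigma
                          if g != f and fact_object[g] == fact_object[f])
            sigma_star = sigma[f] + rho * related
            conf[f] = 1.0 / (1.0 + math.exp(-gamma * sigma_star))

        # t(w): average confidence of the facts the site provides.
        new_trust = {w: sum(conf[f] for f in fs) / len(fs)
                     for w, fs in site_facts.items()}
        change = max(abs(new_trust[w] - trust[w]) for w in trust)
        trust = new_trust
        if change < tol:
            break
    return trust, conf


def conflict(g, f):
    """Toy implication: every pair of distinct facts fully conflicts."""
    return -1.0


# Hypothetical data: two sites agree on fact "A", one site claims "B".
site_facts = {"w1": ["A"], "w2": ["A"], "w3": ["B"]}
fact_object = {"A": "book1", "B": "book1"}
trust, conf = truthfinder(site_facts, fact_object, conflict)
print(conf["A"] > conf["B"], trust["w1"] > trust["w3"])
```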

  20. Empirical Study • Compare with VOTING, which chooses the fact that is provided by the most web sites. • Setup: Intel PC with a 1.66 GHz dual-core processor, 1 GB memory, Windows XP Professional. • Parameters: ρ = 0.5 and γ = 0.3.
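The VOTING baseline is simple enough to sketch in a few lines: for each object, pick the fact that the most web sites provide. The data layout and the tie-breaking rule (the first fact counted wins) are assumptions made for illustration.

```python
from collections import Counter


def voting(site_facts, fact_object):
    """For each object, return the fact provided by the largest number of sites."""
    # Count each fact once per site, even if a site lists it twice.
    votes = Counter(f for facts in site_facts.values() for f in set(facts))
    best = {}
    for fact, count in votes.items():
        obj = fact_object[fact]
        if obj not in best or count > votes[best[obj]]:
            best[obj] = fact
    return best


# Hypothetical data: two sites agree on fact "A", one site claims "B".
winners = voting({"w1": ["A"], "w2": ["A"], "w3": ["B"]},
                 {"A": "book1", "B": "book1"})
print(winners)
```

Unlike the iterative method, this baseline treats every site as equally reliable and ignores how similar the competing facts are.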

  21. Empirical Study

  22. Empirical Study

  23. Empirical Study

  24. Empirical Study

  25. Conclusions • Introduce and formulate the veracity problem • Resolving conflicting facts from multiple web sites. • Finding the true facts among them. • Propose TRUTHFINDER • Uses web site trustworthiness and fact confidence to find trustworthy web sites and true facts. • Achieves high accuracy in the experiments.
