Information Credibility on the Web

Presenter: Ai Jing, 2008-12-20

Outline • A Brief Introduction to Information Credibility • Credibility in Different Web Scenarios • Information Credibility Assessment and Evaluation • Summary & Our Ideas

Example 1: False News • At 9:00 on October 3, 2008: "Steve Jobs, CEO of Apple, rushed to ER following severe heart attack" • Source: civil news, blog news • The news instantly vaporized 9,000,000,000 dollars! • 10:00 (an hour later): the spokesman denied the message • 10:20: iReport deleted it

Example 2: E-commerce • I want to buy a notebook. Which site provides certified products? • Internet shopping: much lower price • www.taobao.com www.amazon.cn www.dangdang.com • Risks: frauds, inferior merchandise, misleading and biased comments ……

Example 3: Search Engine Ranking • Baidu's bid-for-ranking • Google • Web spam • Cloaking • ……

Overview • Credibility: the objective and subjective components of the believability of a source or message • Two key components: • trustworthiness • expertise (authority of the data source) • Credibility on the web has become an important topic since the mid-1990s

Credibility in Different Web Scenarios • Web Page & Web Site • Collaborative Repositories (Wikipedia) • P2P Network • Social Network • Online Discussion Forums • Semantic Web

Credibility for Web Pages & Web Sites • Two perspectives: • From human browsers: • How to tell true information from false? • "I feel this website is more reliable than that one." • Which features make a website more reliable? • From search engines: • There is too much "web spam" on the Internet: web pages that exist only to mislead search engines into (mis)leading users to certain web sites • How to detect it automatically and efficiently? (detailed introduction by Hu Xiangmei in Report 2)

Related References • B. J. Fogg, Jonathan Marshall, Othman Laraki, et al. What Makes Web Sites Credible? A Report on a Large Quantitative Study. Stanford University, SIGCHI 2001 • R. Lee, D. Kitayama and K. Sumiya. Web-based Evidence Excavation to Explore the Authenticity of Local Events. University of Hyogo, Japan. WICOW 2008 • Y. Kawai, Y. Fujita, T. Kumamoto. Using a Sentiment Map for Visualizing Credibility of News Sites on the Web. Kyoto Sangyo University, Japan. WICOW 2008

Which features make web sites more credible? • Over 1400 participants • Sample online questionnaire • Evaluating 51 different Web site elements • Source: What Makes Web Sites Credible? A Report on a Large Quantitative Study, Stanford University, SIGCHI'01

Identify Credibility of News Sites on the Web (1) • A bag-of-words characteristic • Believe Evidence Search for events on the web • event = (time, space, vestige) • Construct a database of real-world events from the Web (credible event database) • User interface for credible search • Source: Web-based Evidence Excavation to Explore the Authenticity of Local Events, University of Hyogo, WICOW08

Peer-to-Peer Networks • P2P architecture: open and anonymous in nature • Offers an almost ideal environment for the spread of inauthentic files • (Figures: a file-sharing network; a client-server network)

Related References • F. Cornelli, E. Damiani, S. D. C. D. Vimercati, S. Paraboschi, and P. Samarati. Choosing Reputable Servents in a P2P Network. In Proceedings of the 11th World Wide Web Conference, Hawaii, USA, May 2002 • K. Aberer and Z. Despotovic. Managing Trust in a Peer-2-Peer Information System. In Proceedings of the 10th International Conference on Information and Knowledge Management (ACM CIKM), New York, USA, 2001 • S. D. Kamvar, M. T. Schlosser, and H. Garcia-Molina. The EigenTrust Algorithm for Reputation Management in P2P Networks. In Proceedings of the 12th International Conference on World Wide Web, 2003 • E. Damiani, S. di Vimercati, S. Paraboschi, P. Samarati, and F. Violante. A Reputation-Based Approach for Choosing Reliable Resources in Peer-to-Peer Networks. In 9th ACM Conference on Computer and Communications Security, 2002 • S. D. Kamvar, M. T. Schlosser, and H. Garcia-Molina. Incentives for Combatting Freeriding on P2P Networks. Technical report, Stanford University, 2003

Problem • Attacks by anonymous malicious peers: introduce viruses… • Goal: identify malicious peers that provide inauthentic files • Based on the peer's previous behavior: history of uploads • Source: The EigenTrust Algorithm for Reputation Management in P2P Networks, Stanford University, WWW03

Friends of Friends • Problem: • Each peer has limited past experience. • Each peer knows few other peers. • Idea: ask for the opinions of the people you trust • Weight what they think of peer k by how much you trust them • Compute a global trust value t_i for each peer • By asking friends of friends repeatedly, a peer will have a complete view of the network • (Figure: Peers 1, 2, 4, 6, 8 linked by local trust values such as c14 and c42)
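The "friends of friends" idea can be sketched as a repeated weighted average. The snippet below is a minimal, centralized illustration only, assuming a matrix of raw local ratings; the function names (`normalize`, `global_trust`), the uniform fallback for peers with no ratings, and the fixed iteration count are assumptions for this sketch, not details taken from the slides.

```python
def normalize(local):
    """Turn raw local ratings into rows c[i][j] that each sum to 1."""
    n = len(local)
    c = []
    for row in local:
        clipped = [max(v, 0.0) for v in row]  # negative ratings count as no trust
        total = sum(clipped)
        # Assumption: a peer with no positive ratings trusts everyone equally.
        c.append([v / total for v in clipped] if total > 0 else [1.0 / n] * n)
    return c


def global_trust(local, iterations=50):
    """Repeatedly ask 'friends of friends': t_k <- sum_i c[i][k] * t_i."""
    c = normalize(local)
    n = len(c)
    t = [1.0 / n] * n  # start from uniform trust
    for _ in range(iterations):
        t = [sum(c[i][k] * t[i] for i in range(n)) for k in range(n)]
    return t


# Hypothetical ratings: peers 0 and 1 rate each other well, nobody rates peer 2.
ratings = [
    [0, 4, 0],
    [3, 0, 0],
    [1, 1, 0],
]
trust = global_trust(ratings)
```

In this toy network the peer that nobody vouches for ends up with the lowest global trust value.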

What is the Semantic Web? • The Semantic Web: • An extension of the current web • Information is given well-defined meaning • Enables computers and humans to work in cooperation • Also known as Web 3.0 (Tim Berners-Lee in 1999) • The Semantic Web gives computers (agents) the ability to process information automatically • From user-centric human understanding to computer understanding

Related References • D. L. McGuinness, P. Pinheiro da Silva. Explaining Answers from the Semantic Web: The Inference Web Approach. Journal of Web Semantics, volume 1, pages 397–413, 2004 • M. Richardson, R. Agrawal, and P. Domingos. Trust Management for the Semantic Web. In Proceedings of the Second International Semantic Web Conference, pages 351–368, 2003 • M. Ceglowski, A. Coburn, J. Cuadrado. Semantic Search of Unstructured Data Using Contextual Network Graphs, June 2003 • G. Gans, M. Jarke, S. Kethers, G. Lakemeyer. Modeling the Impact of Trust and Distrust in Agent Networks. In Proceedings of the Third International Bi-Conference Workshop on Agent-oriented Information Systems, Montreal, Canada, May 2001 • J. Golbeck, B. Parsia, J. Hendler. Trust Networks on the Semantic Web. In Proceedings of Cooperative Intelligent Agents, Helsinki, Finland, August 2003

Trust Management for the Semantic Web • The Semantic Web: • a large, uncensored system • anyone may contribute • Assume all information on the Semantic Web is expressed as logical assertions • Goal: establish the degree of belief in a statement • Inputs: each source's belief in the statement and the user's trust in each source • Question: how can a user decide how much to trust a source she does not know directly? • Answer: employ a web of trust, in which each user maintains a small set of users he/she trusts • Source: Trust Management for the Semantic Web, University of Washington, International Semantic Web Conference 2003

Web of Trust • Nodes: people/agents, acting as producers/consumers • Each user i specifies a small set of users she/he trusts • Local neighbors help in determining the trust of distant neighbors
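One way to picture how local neighbors help with distant ones is the sketch below. It is only an illustration under stated assumptions: trust along a path is multiplied, the strongest path wins, and a statement's belief is the trust-weighted average of the sources' beliefs. Those combination rules, the hop limit, and all names (`merged_trust`, `belief_in_statement`, the example users) are hypothetical choices for this sketch, not the method of the cited paper.

```python
def merged_trust(trust, user, max_hops=3):
    """Estimate user's trust in distant users via the strongest trust path.

    trust maps each user to {trusted_user: value in [0, 1]}.
    Assumption: path trust is the product of the edge values.
    """
    best = {user: 1.0}
    frontier = {user: 1.0}
    for _ in range(max_hops):
        next_frontier = {}
        for node, strength in frontier.items():
            for neighbor, value in trust.get(node, {}).items():
                candidate = strength * value
                if candidate > best.get(neighbor, 0.0):
                    best[neighbor] = candidate
                    next_frontier[neighbor] = candidate
        frontier = next_frontier
    best.pop(user)
    return best


def belief_in_statement(trust, user, source_beliefs):
    """Trust-weighted average of each source's belief in one statement."""
    merged = merged_trust(trust, user)
    weights = {s: merged.get(s, 0.0) for s in source_beliefs}
    total = sum(weights.values())
    if total == 0:
        return None  # the user has no trust path to any source
    return sum(weights[s] * source_beliefs[s] for s in source_beliefs) / total


# Hypothetical web of trust: alice does not know dave directly.
web = {
    "alice": {"bob": 0.9, "carol": 0.5},
    "bob": {"dave": 0.8},
    "carol": {"dave": 0.2},
}
alice_view = merged_trust(web, "alice")
belief = belief_in_statement(web, "alice", {"dave": 1.0, "carol": 0.0})
```

Here alice reaches dave through bob, so her derived trust in dave follows the stronger of the two available paths.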

Social Network • social network • a social structure made of nodes (generally individuals or organizations) • tied by one or more specific types of interdependency • values, visions, ideas, financial exchange, friendship, kinship…… • graph-based structures • A (directed) network of people

Related References • Mui, L.: Computational Models of Trust and Reputation: Agents, Evolutionary Games, and Social Networks. PhD thesis, MIT (2002) • C-N Ziegler and G. Lausen: Propagation Models for Trust and Distrust in Social Networks. Information Systems Frontiers 7:4/5, 337–358, 2005 • Guha, R., Kumar R., Raghavan P., and Tomkins A. Propagation of trust and distrust. In Proceedings of the Thirteenth International World Wide Web Conference, 2004

Propagation of Trust and Distrust • Experience with real-world systems suggests that distrust is at least as important as trust • Trust: useful, authentic information • Distrust: disinformation (useless, inauthentic, fraudulent information) • the eigenvector of the matrix of distrust values

Solution • n users • n x n matrices: T and D (T: Trust D: Distrust) • tij = i ’s trust in j • 0 <= tij <= 1 • same for D (distrust) • predict unknown values from T and D • M: generic belief matrix
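Predicting unknown values from T and D can be illustrated with a small matrix sketch. It assumes one simple variant: the belief matrix is M = T - D, trust is propagated along direct trust edges only, and the contributions of each step are summed without decay. The function names and the example values are hypothetical, and the cited paper discusses more propagation operators and weightings than shown here.

```python
def matmul(a, b):
    """Multiply two square matrices given as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def predict_beliefs(T, D, steps=1):
    """Return F = M + T*M + ... + T^steps * M, with M = T - D.

    Assumption: only trust is propagated; distrust is applied at the last hop.
    """
    n = len(T)
    M = [[T[i][j] - D[i][j] for j in range(n)] for i in range(n)]
    F = [row[:] for row in M]
    term = M
    for _ in range(steps):
        term = matmul(T, term)  # one more hop through trusted users
        F = [[F[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return F


# Hypothetical case: user 0 trusts user 1, and user 1 distrusts user 2.
T = [[0.0, 1.0, 0.0],
     [0.0, 0.0, 0.0],
     [0.0, 0.0, 0.0]]
D = [[0.0, 0.0, 0.0],
     [0.0, 0.0, 1.0],
     [0.0, 0.0, 0.0]]
F = predict_beliefs(T, D, steps=1)
```

A negative entry in F is read as predicted distrust: user 0 has never rated user 2, yet inherits user 1's distrust of that user.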

E-commerce and Recommendation Systems ……

Related References • J. Staddon, R. Chow. Detecting Reviewer Bias Through Web-Based Association Mining. PARC, WICOW08 • P. Kollock. The Production of Trust in Online Markets. In E. J. Lawler, M. Macy, S. Thyne, and H. A. Walker, editors, Advances in Group Processes, volume 16, pages 99–123. JAI Press, 1999 • S. Nakamura, M. Shimizu and K. Tanaka. Can Social Annotation Support Users in Evaluating the Trustworthiness of Video Clips? Graduate School of Informatics, Kyoto University, WICOW08 • N. Wanas, M. El-Saban, H. Ashour, W. Ammar. Automatic Scoring of Online Discussion Posts. Cairo Microsoft Innovation Center, WICOW08 • S. Ba, A. B. Whinston, and H. Zhang. Building Trust in Online Auction Markets Through an Economic Incentive Mechanism. Decision Support Systems, 35(3):273–286, 2002

User Scoring • Collaborative intelligence: user ratings point to the posts that are worth attending to • Post rating: five-point scale • Goal: automatically assess online discussion posts, enabling automatic content filtering in online discussion forums • Method: Support Vector Machine (SVM) classifier • Source: Automatic Scoring of Online Discussion Posts, Cairo Microsoft Innovation Center, WICOW08

Solution • Aim: detect potential bias and assess the validity of online reviews • Bring the broader context of the reviewer into the online community • Mine association rules between book reviewers and the authors of the books they review • An association rule: • A, B: reviewer and author of the same book • Pr(B|A) is large and Pr(A^B) is large, i.e. the two names frequently co-occur
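The two quantities in the rule can be estimated from co-occurrence counts. The sketch below assumes the evidence is a list of documents, each reduced to the set of names it mentions; the function name and the sample data are hypothetical, and a real study would obtain the counts from web search results rather than a hand-made list.

```python
def association_strength(documents, a, b):
    """Estimate Pr(A^B) and Pr(B|A) from name co-occurrence in documents.

    documents: list of sets, each holding the names found in one document.
    Returns (support, confidence) for the rule A -> B.
    """
    total = len(documents)
    with_a = sum(1 for names in documents if a in names)
    with_both = sum(1 for names in documents if a in names and b in names)
    support = with_both / total if total else 0.0      # Pr(A^B)
    confidence = with_both / with_a if with_a else 0.0  # Pr(B|A)
    return support, confidence


# Hypothetical pages mentioning a reviewer and a book author.
pages = [
    {"reviewer_r", "author_x"},
    {"reviewer_r", "author_x"},
    {"reviewer_r"},
    {"author_y"},
]
support, confidence = association_strength(pages, "reviewer_r", "author_x")
```

A reviewer whose name shows up alongside the author's in most pages would be flagged for a closer look, since both values are high.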

Wikipedia • The emerging pattern for building large information repositories • encourage many people to collaborate in a distributed manner • create and maintain a repository of shared content • open editing: allows users to freely create and edit web pages

Related References • M. Blaze, J. Feigenbaum, J. Lacy. Decentralized Trust Management. In Proceedings of the 1996 IEEE Symposium on Security and Privacy, pages 164–173, 1996 • Deborah L. McGuinness, Honglei Zeng, Paulo Pinheiro da Silva. Investigations into Trust for Collaborative Information Repositories: A Wikipedia Case Study. WWW2006 • Rui Lopes, Luís Carriço. On the Credibility of Wikipedia: an Accessibility Perspective. WICOW2008 • B. Thomas Adler, Luca de Alfaro. A Content-Driven Reputation System for the Wikipedia. WWW2007 • M. Hu, E.-P. Lim, A. Sun, H. W. Lauw, and B.-Q. Vuong. Measuring Article Quality in Wikipedia: Models and Evaluation. In CIKM '07

Trust in Social Collaborative Information Spaces • Concepts: • Article • Version (of an article) • Fragment • Author • Relations: • An article has multiple versions • A version has multiple fragments • A fragment has one author • A version has multiple authors • (Diagram: Article 1:n Version; Version 1:n Fragment; Version 1:n Author; Fragment 1:1 Author) • Source: Investigations into Trust for Collaborative Information Repositories: A Wikipedia Case Study, Stanford University, WWW06
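The concepts and relations on this slide map naturally onto a small data model. The sketch below is one possible encoding, assuming plain Python dataclasses; the field names and the sample article are hypothetical and only mirror the cardinalities listed above.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Author:
    name: str


@dataclass
class Fragment:
    text: str
    author: Author  # a fragment has exactly one author


@dataclass
class Version:
    fragments: list = field(default_factory=list)  # a version has many fragments

    def authors(self):
        """A version has many authors: everyone who wrote one of its fragments."""
        return {fragment.author for fragment in self.fragments}


@dataclass
class Article:
    title: str
    versions: list = field(default_factory=list)  # an article has many versions


# Hypothetical article with two versions.
ann, ben = Author("ann"), Author("ben")
v1 = Version([Fragment("First sentence.", ann)])
v2 = Version([Fragment("First sentence.", ann), Fragment("Added detail.", ben)])
article = Article("Example", [v1, v2])
```

Keeping the author on the fragment rather than on the version is what lets trust values later be attached to individual pieces of text.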

Deriving Trust from Revision History • Revision operations (insertion, deletion, modification) imply trust • The trustworthiness of the revised article depends on: • the trustworthiness of the previous version • the author of the last revision • the modified content involved in the fragment • Revision history is widely available in cooperative information systems
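The slide names three inputs but no formula, so the snippet below is purely an illustrative assumption: it blends the previous version's trust with the reviser's trust, weighted by the fraction of content that changed. The function name, the linear blend, and the example numbers are all hypothetical.

```python
def revised_trust(previous_trust, author_trust, changed_fraction):
    """Blend old trust with the reviser's trust by how much was changed.

    Assumption (not from the slides): a linear mix, where changed_fraction
    is the share of the article touched by the last revision, in [0, 1].
    """
    if not 0.0 <= changed_fraction <= 1.0:
        raise ValueError("changed_fraction must be between 0 and 1")
    return (1.0 - changed_fraction) * previous_trust + changed_fraction * author_trust


# Hypothetical revision: a less trusted author rewrites a quarter of the article.
new_trust = revised_trust(previous_trust=0.8, author_trust=0.4, changed_fraction=0.25)
```

Under this assumption a small edit by a low-trust author lowers the article's trust only slightly, while a full rewrite replaces it with the author's own trust.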

Wikipedia Article with Citation Trust View • Fragments are colored according to their trust values, computed from citation trust • (Screenshot tabs: Citation, Revision)

Related References • Irit Askira Gelman, Anthony L. Barletta. A “Quick and Dirty” Website Data Quality Indicator. University of Arizona. WICOW2008 • Llewellyn C.M. Tang, Yuyang Zhao, Simon Austin. A Characteristic Based Information Evaluation Model. Loughborough University. WICOW2008

Literature on Web Data Credibility Assessment • Spelling errors: Recieve, Accomodate, Accross, Truely, Acheive, Affraid, Agressive, Appearence, Tomorow, Arguement • The spelling error rate as an indicator of the quality of the document • Application: social forum exchanges, personal websites, Wikipedia, etc. • A minimal set E of spelling errors (10 common English spelling errors) • Hit counts of search engine queries on E: positively related √ • Source: A "Quick and Dirty" Website Data Quality Indicator, University of Arizona, WICOW08, short paper
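A local version of the indicator can be sketched by counting how often the ten misspellings appear in a text. This is an assumption-laden stand-in: the slide describes search engine hit counts for the set E, whereas the snippet below scans a single string, and the function name and sample sentence are hypothetical.

```python
import re

# The minimal set E from the slide, lowercased for matching.
COMMON_MISSPELLINGS = {
    "recieve", "accomodate", "accross", "truely", "acheive",
    "affraid", "agressive", "appearence", "tomorow", "arguement",
}


def spelling_error_rate(text):
    """Share of words in text that belong to the misspelling set E."""
    words = re.findall(r"[a-z]+", text.lower())
    if not words:
        return 0.0
    errors = sum(1 for word in words if word in COMMON_MISSPELLINGS)
    return errors / len(words)


# Hypothetical forum post with two of the ten misspellings.
rate = spelling_error_rate("I will recieve it tomorow")
```

Because E is so small, the rate is cheap to compute, which is the sense in which the indicator is "quick and dirty".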

Characteristic Based Information Evaluation • Information has many characteristics • Quantify them to estimate the Value of Information (VOI) • Source: A Characteristic Based Information Evaluation Model, Department of Civil and Building Engineering, Loughborough University, WICOW08

Summary • Scenarios: Semantic Web, Online Discussion Forums, Collaborative Repositories (Wikipedia), P2P network, Social network • Approaches: scoring by users' comments, ranking by trust value, machine learning classifier, network structure, trust of web, trust value matrix, influence propagation, graph mining, rating model, scoring mechanism
