Measuring Reliability in Wikipedia Wen-Yuan Zhu 2007.11.13
Outline • Introduction • Some terms of Wikipedia • Basic concept of measuring reliability • A way to measure reliability • Conclusion • References
Introduction • Wikipedia is the most popular online collaborative encyclopedia • it exhibits rich phenomena that differ from those of the Internet at large and of ordinary websites
Some terms of Wikipedia(2) • featured article • considered to be among the best articles in Wikipedia • as determined by Wikipedians (Wikipedia's editors) • as of this talk (2007), there are 1683 featured articles
Some terms of Wikipedia(3) • a featured article displays a small star icon in the top-right corner of its page
Some terms of Wikipedia(4) • candidate articles are reviewed at Wikipedia:Featured article candidates • against the Wikipedia:Featured article criteria
Some terms of Wikipedia(5) • the nominator should make sure the article meets all of the featured article criteria • consensus must be reached that it meets the criteria before it is promoted
Some terms of Wikipedia(6) • articles that no longer meet the criteria can be proposed for improvement or removal at Wikipedia:Featured article review
Some terms of Wikipedia(7) • clean-up article • an article listed as needing cleanup • cleanup issues covered by the project may include wikification, spelling, grammar, tone, and sourcing • anyone can request cleanup of a page at Wikipedia:Cleanup
Basic concept of measuring reliability • the higher an article's link ratio, the higher its reliability • this part is based on [2]
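A minimal sketch of a link-ratio computation, assuming (as we read [2]) that the link ratio of a term is the number of its occurrences in article text that are wiki-linked to its article, divided by the total number of its occurrences. The function name `link_ratio`, the `[[term]]` / `[[term|label]]` matching, and the whole-word, case-insensitive counting are illustrative assumptions, not the exact procedure of [2].

```python
import re


def link_ratio(term: str, pages: list[str]) -> float:
    """Hypothetical sketch: fraction of a term's occurrences that are wiki-linked.

    Assumption: a linked occurrence is written [[term]] or [[term|label]] in
    wiki markup; every other whole-word, case-insensitive match is unlinked.
    """
    link_pattern = re.compile(
        r"\[\[\s*" + re.escape(term) + r"\s*(\|[^\]]*)?\]\]", re.IGNORECASE
    )
    word_pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    linked = 0
    total = 0
    for text in pages:
        n_links = len(link_pattern.findall(text))
        # Drop linked occurrences so they are not counted again as plain text.
        plain = link_pattern.sub(" ", text)
        linked += n_links
        total += n_links + len(word_pattern.findall(plain))
    return linked / total if total else 0.0
```

For example, if "Paris" appears three times across a set of pages and two of those occurrences are links, the ratio is 2/3. Terms ending in punctuation (such as "U.S.") would need a different boundary rule than `\b`.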
Basic concept of measuring reliability(2) • class of terms
Basic concept of measuring reliability(3) • relation between full names and short names
Basic concept of measuring reliability(4) • relation between PageRank and link ratio
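For the PageRank side of this comparison, here is the textbook power-iteration form of PageRank on a small link graph. The damping factor 0.85, the fixed iteration count, and spreading the rank of pages without outgoing links uniformly are common defaults assumed here; the slide's actual PageRank-versus-link-ratio data is not reproduced.

```python
def pagerank(links: dict[str, list[str]], damping: float = 0.85,
             iterations: int = 100) -> dict[str, float]:
    """Standard PageRank by power iteration.

    links maps each page to the pages it links to. Pages with no outgoing
    links (dangling pages) spread their rank evenly over all pages.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Teleportation term: every page gets (1 - d) / n.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            targets = links.get(p, [])
            if targets:
                share = damping * rank[p] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling page: distribute its rank uniformly.
                for t in pages:
                    new_rank[t] += damping * rank[p] / n
        rank = new_rank
    return rank
```

On the graph A → C, B → C, C → A, page C ends up with the highest rank and B, which has no in-links, keeps only the teleportation share of 0.05.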
Basic concept of measuring reliability(5) • linking data alone is not enough to measure reliability • many factors influence the reliability of an article in Wikipedia
A way to measure reliability • use Bayesian statistics to model reliability in Wikipedia • use the revision history to assess the reliability of an article in Wikipedia • this part is based on [3]
A way to measure reliability(3) • article trust • trustworthiness of a version of an article • fragment trust • trustworthiness of a fragment in a version of an article • author trust • trustworthiness of an author
A way to measure reliability(4) • notation: • a version of an article • the trust value of that version • the author who made the revision • the trust value of that author • the content inserted in the revision by that author • the content deleted in the revision by that author • the size of a piece of content
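To make the inserted and deleted content of a revision concrete, the sketch below diffs two versions word by word with Python's standard `difflib`. Diffing at word granularity and measuring size in words are assumptions for illustration; [3] may define content and size differently, and `revision_delta` and `size` are hypothetical names.

```python
import difflib


def revision_delta(old_version: str, new_version: str) -> tuple[str, str]:
    """Return (inserted, deleted) text between two versions, compared word by word."""
    old_words = old_version.split()
    new_words = new_version.split()
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    inserted, deleted = [], []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # A "replace" both removes old words and adds new ones.
        if tag in ("replace", "delete"):
            deleted.extend(old_words[i1:i2])
        if tag in ("replace", "insert"):
            inserted.extend(new_words[j1:j2])
    return " ".join(inserted), " ".join(deleted)


def size(content: str) -> int:
    """Size of a piece of content, measured (by assumption) in words."""
    return len(content.split())
```

For instance, revising "the river flows north" into "the river flows south quickly" yields inserted content "south quickly" (size 2) and deleted content "north" (size 1).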
A way to measure reliability(6) • Dynamic Bayesian networks • a network is defined by a pair: • the graph structure of the network • the set of the network's conditional density distributions
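A way to measure reliability(7) • the network unrolls over the sequence of revisions, from the first to the last • the state at each revision is represented as a quadruple • the states satisfy the Markov property: the state at a revision depends only on the state at the previous revision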
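The Markov property means a pass over the revision history only has to carry the current state forward. The sketch below shows that structure with a hypothetical, size-weighted update in which surviving content keeps the old version's trust and inserted content takes the author's trust. This update rule, the `Revision` fields, and the initial trust of 0.5 are assumptions for illustration, not the dynamic Bayesian network model of [3].

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Revision:
    author_trust: float  # trust value of the revising author, assumed in [0, 1]
    inserted_size: int   # size of the content the author inserted
    deleted_size: int    # size of the content the author deleted


def next_trust(trust: float, size: int, rev: Revision) -> tuple[float, int]:
    """Hypothetical update: size-weighted mean of surviving content trust
    (the old version's trust) and inserted content trust (the author's trust)."""
    surviving = max(size - rev.deleted_size, 0)
    new_size = surviving + rev.inserted_size
    if new_size == 0:
        return trust, 0
    new_trust = (trust * surviving + rev.author_trust * rev.inserted_size) / new_size
    return new_trust, new_size


def trust_track(history: list[Revision], initial_trust: float = 0.5) -> list[float]:
    """Markov-style pass: each state depends only on the previous state
    and the current revision, never on older states."""
    trust, size = initial_trust, 0
    track = []
    for rev in history:
        trust, size = next_trust(trust, size, rev)
        track.append(trust)
    return track
```

With a first revision of 100 units by an author of trust 0.2, followed by a revision that deletes 50 units and inserts 100 units with author trust 0.9, the track is 0.2 and then (0.2 × 50 + 0.9 × 100) / 150 ≈ 0.67.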
A way to measure reliability(9) • goal: determine the posterior density distribution of the trust value • this distribution is fully characterized by two parameters
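One standard way to keep a posterior density in closed form is conjugacy: with a Beta prior and Bernoulli (good/bad) observations, the posterior is again a Beta distribution, described by just two parameters. The sketch assumes that observation model purely for illustration; the slide does not say which likelihood [3] uses, and the function names are hypothetical.

```python
def beta_posterior(alpha: float, beta: float,
                   observations: list[bool]) -> tuple[float, float]:
    """Conjugate update of a Beta(alpha, beta) prior with Bernoulli observations:
    each True (a 'good' outcome) adds 1 to alpha, each False adds 1 to beta."""
    successes = sum(observations)
    failures = len(observations) - successes
    return alpha + successes, beta + failures


def posterior_mean(alpha: float, beta: float) -> float:
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)
```

Starting from the uniform prior Beta(1, 1), two good outcomes and one bad outcome give Beta(3, 2), whose mean is 3/5 = 0.6.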
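A way to measure reliability(10) • the Beta distribution • $f(x;\alpha,\beta) = \frac{x^{\alpha-1}(1-x)^{\beta-1}}{B(\alpha,\beta)}$, for $0 < x < 1$ • where $B(\alpha,\beta)$ is the beta function, with $\alpha > 0$ and $\beta > 0$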
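A direct translation of the Beta density $f(x;\alpha,\beta) = x^{\alpha-1}(1-x)^{\beta-1}/B(\alpha,\beta)$, computing the beta function $B(\alpha,\beta) = \Gamma(\alpha)\Gamma(\beta)/\Gamma(\alpha+\beta)$ through `math.lgamma` for numerical stability. Returning 0 outside the open interval (0, 1) is a simplification assumed here.

```python
import math


def beta_function(alpha: float, beta: float) -> float:
    """B(alpha, beta) = Gamma(alpha) * Gamma(beta) / Gamma(alpha + beta), via log-gamma."""
    return math.exp(math.lgamma(alpha) + math.lgamma(beta) - math.lgamma(alpha + beta))


def beta_pdf(x: float, alpha: float, beta: float) -> float:
    """Density of Beta(alpha, beta) at x; 0 outside the open interval (0, 1)."""
    if alpha <= 0 or beta <= 0:
        raise ValueError("alpha and beta must be positive")
    if not 0.0 < x < 1.0:
        return 0.0
    return x ** (alpha - 1) * (1 - x) ** (beta - 1) / beta_function(alpha, beta)
```

For example, Beta(1, 1) is the uniform density (value 1 everywhere on (0, 1)), and Beta(2, 2) at x = 0.5 gives 0.25 / (1/6) = 1.5.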
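A way to measure reliability(12) • let $\mu = \frac{\alpha}{\alpha+\beta}$ be the mean of the Beta distribution • then $\alpha = \frac{\mu\beta}{1-\mu}$ or, equivalently, $\beta = \frac{(1-\mu)\alpha}{\mu}$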
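The link between the Beta mean and its parameters is plain algebra: from $\mu = \alpha/(\alpha+\beta)$ one gets $\alpha = \mu\beta/(1-\mu)$ and $\beta = (1-\mu)\alpha/\mu$. The helper names below are hypothetical, and both solvers assume $0 < \mu < 1$.

```python
def beta_mean(alpha: float, beta: float) -> float:
    """Mean of Beta(alpha, beta): mu = alpha / (alpha + beta)."""
    return alpha / (alpha + beta)


def alpha_from_mean(mu: float, beta: float) -> float:
    """Solve mu = alpha / (alpha + beta) for alpha, given beta and 0 < mu < 1."""
    return mu * beta / (1.0 - mu)


def beta_from_mean(mu: float, alpha: float) -> float:
    """Solve mu = alpha / (alpha + beta) for beta, given alpha and 0 < mu < 1."""
    return (1.0 - mu) * alpha / mu
```

For instance, a mean of 0.75 with $\beta = 1$ requires $\alpha = 3$, and Beta(3, 1) indeed has mean 0.75.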
A way to measure reliability(14) • featured articles • considered highly trustworthy • clean-up articles • considered untrustworthy • normal articles • remaining articles
A way to measure reliability(15) • classes of authors: • administrators • registered authors • anonymous authors • blocked users
A way to measure reliability(16) • dataset: English articles from the Geography category of Wikipedia, collected in January 2006 • 50 featured articles • 50 clean-up articles • 768 normal articles • articles were classified manually
A way to measure reliability(17) • example: the Wikipedia article "U.S. National Forest" • created by an anonymous author
A way to measure reliability(18) • the trust value is the mean of the posterior density distribution
A way to measure reliability(19) • develop a classifier based on the aforementioned 50 featured articles and 50 clean-up articles • the training set contains 100 pairs, each consisting of an article's trust value and its class
A way to measure reliability(20) • a rule for identifying featured articles was learned from the training set • a test set of 200 new articles (48,805 revisions) was evaluated • the prediction accuracy was 82%
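The learned rule itself is not recoverable from the slide. As a hypothetical illustration of learning a rule from (trust value, class) pairs, the sketch fits a single threshold on trust: predict "featured" when the trust value is at least the threshold, choosing the candidate threshold with the highest training accuracy (ties keep the lowest). The class labels, function names and toy data are assumptions, not the classifier or data of [3].

```python
def learn_threshold(pairs: list[tuple[float, str]]) -> float:
    """Pick the trust threshold that best separates 'featured' from 'cleanup'
    on the training pairs, predicting 'featured' when trust >= threshold."""
    candidates = sorted({trust for trust, _ in pairs})
    best_threshold, best_correct = candidates[0], -1
    for threshold in candidates:
        correct = sum((trust >= threshold) == (label == "featured")
                      for trust, label in pairs)
        if correct > best_correct:
            best_threshold, best_correct = threshold, correct
    return best_threshold


def accuracy(pairs: list[tuple[float, str]], threshold: float) -> float:
    """Fraction of pairs whose class the threshold rule predicts correctly."""
    correct = sum((trust >= threshold) == (label == "featured")
                  for trust, label in pairs)
    return correct / len(pairs)
```

On a toy training set of three featured articles (trust 0.9, 0.8, 0.7) and three clean-up articles (trust 0.4, 0.3, 0.75), the learned threshold is 0.7, which classifies 5 of the 6 pairs correctly. In practice the accuracy that matters is the one measured on a separate test set.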
A way to measure reliability(21) • use an article's trust track to predict events
A way to measure reliability(22) • the method has some problems • an author's reliability is not constant over time • the classifier's test set is too small • it is unclear what standard is used to judge the event predictions
Conclusion • an overview of Wikipedia and of measuring reliability in Wikipedia • introduced some ways to measure reliability in Wikipedia • identified the difficult problems in measuring reliability in Wikipedia
References [1] Wikipedia, http://en.wikipedia.org/ [2] D. McGuinness, H. Zeng, P. Pinheiro da Silva, L. Ding, D. Narayanan, and M. Bhaowal. Investigation into trust for collaborative information repositories: A Wikipedia case study. In Proceedings of the Workshop on Models of Trust for the Web, 2006. [3] H. Zeng, M. Alhoussaini, L. Ding, R. Fikes, and D. McGuinness. Computing trust from revision history. In Intl. Conf. on Privacy, Security and Trust, 2006.