See Also: Auto Generated Recommendations

Mislav Cimperšak, Marija Tkalec, Siniša Jovčić. Faculty of Humanities and Social Sciences, Ivana Lučića 3, Zagreb, Croatia. INFuture 2009: Digital Resources and Knowledge Sharing.

Presentation Transcript


  1. See Also: Auto Generated Recommendations • Mislav Cimperšak, Marija Tkalec, Siniša Jovčić • Faculty of Humanities and Social Sciences, Ivana Lučića 3, Zagreb, Croatia • INFuture 2009: Digital Resources and Knowledge Sharing

  2. Introduction • reliable source of information • accessible to everyone around the world • most up-to-date online encyclopedia • disadvantages

  3. See Also • list of articles similar or related to the current article • encourages users to keep browsing and reading articles directly from the page • user-created list

  4. Thesis • users writing on similar topics create links to the same articles • by comparing the links of two articles, we can estimate how similar those two articles are
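A minimal sketch of the comparison described in the thesis, assuming each article is reduced to the set of titles it links to and that overlap is measured with the Jaccard index (the slides do not name a measure). The link sets below are hypothetical, loosely modelled on the labels in slides 6 to 8.

```python
def link_overlap(links_a: set, links_b: set) -> float:
    """Jaccard index: links shared by both articles / all distinct links."""
    union = links_a | links_b
    if not union:
        return 0.0  # two articles without links are treated as unrelated
    return len(links_a & links_b) / len(union)


# Hypothetical outgoing links, loosely modelled on the labels in slides 6-8.
gnome = {"GNU General Public License", "GUI", "Linux", "Unix"}
kde = {"GNU General Public License", "GUI", "Linux", "Unix", "Windows"}
xfce = {"GNU General Public License", "BSD license", "GUI", "Linux"}

print(link_overlap(gnome, kde))   # 4 shared / 5 distinct links
print(link_overlap(gnome, xfce))  # 3 shared / 5 distinct links
```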

  5. Goal • creation of an automatic recommendation system for the “See also” section based on soft clustering of documents
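The slides do not say which soft clustering algorithm was used. As one simple possibility, assumed here, the sketch below groups each article with every article whose similarity reaches a threshold, keeps only the maximal clusters, and recommends the other members of any cluster an article belongs to. Because clusters may overlap, one article can end up in several of them. The function names and similarity scores are hypothetical; the 0.5 default mirrors the threshold on slide 10.

```python
def overlapping_clusters(titles, sim, threshold=0.5):
    """Build possibly overlapping clusters from a pairwise similarity function.

    Each article is grouped with every other article whose similarity to it
    is at least `threshold`; duplicate clusters and clusters contained in a
    larger one are dropped. An article may still belong to several clusters.
    """
    candidates = []
    for t in titles:
        members = frozenset([t] + [u for u in titles if u != t and sim(t, u) >= threshold])
        if len(members) > 1 and members not in candidates:
            candidates.append(members)
    # Keep only maximal clusters (drop proper subsets of another cluster).
    return [c for c in candidates if not any(c < other for other in candidates)]


def see_also(title, clusters):
    """Recommend every other article that shares at least one cluster with `title`."""
    recs = set()
    for c in clusters:
        if title in c:
            recs |= c - {title}
    return sorted(recs)


# Hypothetical pairwise similarities; unlisted pairs count as 0.
scores = {
    frozenset({"GNOME", "KDE"}): 0.8,
    frozenset({"GNOME", "Xfce"}): 0.6,
    frozenset({"KDE", "Xfce"}): 0.4,
    frozenset({"Xfce", "Fedora"}): 0.55,
}


def sim(a, b):
    return scores.get(frozenset({a, b}), 0.0)


clusters = overlapping_clusters(["GNOME", "KDE", "Xfce", "Fedora"], sim)
print([sorted(c) for c in clusters])
print(see_also("Xfce", clusters))
```

Here Xfce belongs to both resulting clusters, so its recommendations draw on both. GNOME likewise receives Fedora even though their pairwise score is 0, which is one way cluster-based recommendations can drift away from direct similarity.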

  6. [Diagram: the articles GNOME, Xfce and KDE]

  7. [Diagram: GNOME, Xfce and KDE shown with GNU General Public License, BSD license, Apache License, MIT license, GUI, Linux, Unix, Windows, Mac OS]

  8. [Diagram: as in slide 7, with Fedora added]

  9. Research • 5,012 articles • 509 clusters • evaluation: results compared against human-created connections
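The slides report that the results were compared against human-created connections but not how. One plausible way to quantify such a comparison, assumed here, is per-article precision and recall of the generated recommendations against the article's existing "See also" list. The titles below are hypothetical.

```python
def precision_recall(recommended, human):
    """Score generated "See also" titles against a human-made list for one article."""
    recommended, human = set(recommended), set(human)
    hits = len(recommended & human)
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(human) if human else 0.0
    return precision, recall


# Hypothetical lists for a single article.
generated = ["KDE", "Xfce", "Fedora"]
existing_see_also = ["KDE", "Xfce", "Linux", "GUI"]
p, r = precision_recall(generated, existing_see_also)
print(f"precision={p:.2f} recall={r:.2f}")
```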

  10. Research • tokens used as vector features • document similarity threshold of 0.5 • internal Wikipedia links treated as separate tokens and given extra weight when comparing articles
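A sketch of the representation on slide 10, assuming simple term counts, cosine similarity, and a hypothetical LINK_WEIGHT of 2.0 for link tokens (the slides say only that links receive extra weight). Prefixing link tokens with `link:` keeps them separate from ordinary words even when the text is the same. The article snippets are hypothetical.

```python
import math
import re
from collections import Counter

LINK_WEIGHT = 2.0  # hypothetical: slide 10 says links get "extra weight" but gives no value
THRESHOLD = 0.5    # document similarity threshold stated on slide 10


def vectorize(text, links):
    """Count word tokens, then add internal links as separate, up-weighted features."""
    vec = Counter(re.findall(r"\w+", text.lower()))
    for link in links:
        vec["link:" + link] += LINK_WEIGHT
    return vec


def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as Counters."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical article snippets and links.
gnome = vectorize("desktop environment for Linux", ["GNU General Public License", "Linux", "GUI"])
kde = vectorize("desktop environment for Linux and Unix", ["GNU General Public License", "Linux", "GUI"])
marine = vectorize("marine mammals live in the sea", ["Sea"])

for name, other in [("KDE", kde), ("Marine mammals", marine)]:
    score = cosine(gnome, other)
    print(name, round(score, 3), "related" if score >= THRESHOLD else "unrelated")
```

With these toy inputs the two desktop environments score well above the threshold, while the unrelated article shares no tokens with GNOME and scores 0.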

  11. Research • clusters sorted into three categories: • clusters with no real value • partially relevant clusters • well-formed clusters

  12. Clusters with no real value • generated clusters are not usable • articles from completely different subject areas • clusters that contain too many articles • example clusters: St. Peter, Saint-John Perse, General Staff of the Armed Forces of the Republic of Croatia, French Guiana, Marine mammals • Eurasian Avars, Psychology, birds

  13. Partially relevant clusters • some articles within such clusters are thematically related • the remaining articles do not share the same subject or a similar area • example cluster: Croatian Football Team, Parliamentary Elections, Orthography, Presidential Elections, Croatian Academy of Sciences and Arts

  14. Well-formed clusters • articles connected to the same subject • example clusters: Olympic Games in Tokyo, London, Barcelona, Atlanta, Athens, Beijing, Summer Olympic Games • football teams • Airbus aircraft

  15. Observations • Wikipedia users more often create links for more general and more obvious terms

  16. Conclusion • the procedure is not successful enough for unsupervised use on articles in the Croatian Wikipedia • the algorithm would most likely be more successful in a strictly supervised encyclopedia
