ITCS 6010 DATA INTEGRATION Presentation on the Social Web By: Garima Indurkhya, Jay Parikh, Shraddha Herlekar, Vikrant Naik
Paper 1 • The Structure of Collaborative Tagging Systems. Authors: Golder, S. and Huberman, B., 2005.
Contents • What is tagging? • Tagging & Taxonomy • Aspects of Classification • Kinds of Tags • Case Study : Del.icio.us
What is Tagging? • Marking content with descriptive terms Examples: • Catalog indexing by a librarian • Keywords describing a blog entry or a photo on the web • Collaborative tagging: the practice of allowing anyone to freely attach keywords or tags to content Social Bookmark Managers: • Del.icio.us (http://del.icio.us) • Flickr (http://www.flickr.com) • CiteULike (http://www.citeulike.org/) • Cloudalicious (http://cloudalicio.us/)
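To make the idea concrete, here is a minimal sketch of a collaborative-tagging store in the spirit of a social bookmark manager; the function name, users, and URLs are illustrative assumptions, not any real service's API.

```python
from collections import defaultdict

# Toy collaborative-tagging store: any user may freely attach tags to any URL.
bookmarks = defaultdict(set)   # (user, url) -> tags chosen by that user

def tag(user, url, *tags):
    """Record the tags a user attaches to a bookmarked URL (illustrative only)."""
    bookmarks[(user, url)].update(tags)

tag("alice", "http://example.org/cheetahs", "cats", "africa")
tag("bob",   "http://example.org/cheetahs", "animals", "cheetahs")

# The public, collaborative view of an item is the union of everyone's tags.
url = "http://example.org/cheetahs"
print({t for (u, b), ts in bookmarks.items() if b == url for t in ts})
```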
Tagging & Taxonomy • Tagging • Non-hierarchical • Tags describe the information held within the content • A tag-based search returns a great variety of things simultaneously • For example: tags for an article about cats in Africa could be cats, africa, animals, cheetahs, etc. • Taxonomy • Hierarchical • For example: a taxonomy placement for the same article could be a single path such as Animals > Cats > Cheetahs
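A small sketch contrasting the two representations for the cats-in-Africa example above; the data and the hierarchy path are illustrative.

```python
# Tagging: a flat, unordered set of descriptive terms.
tags = {"cats", "africa", "animals", "cheetahs"}

# Taxonomy: a single hierarchical path; the article is filed under one node.
taxonomy_path = ["Animals", "Cats", "Cheetahs"]

# A tag-based search matches any term, with no hierarchy involved.
print("africa" in tags)            # True
# A taxonomy lookup follows the hierarchy from the top down.
print(" > ".join(taxonomy_path))   # Animals > Cats > Cheetahs
```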
Aspects of Classification • Problems to be considered while classifying • Semantic • Polysemy • Synonymy • Cognitive • Basic level variation • Sense making
Kinds of Tags • Several kinds of functions performed by tags for bookmarks • Identifying What (or Who) it is About • Identifying What it Is • Identifying Who Owns It • Identifying Qualities or Characteristics • Self Reference • Task Organizing
Case Study : Del.icio.us • Del.icio.us • Collaborative tagging system for web • Social bookmark manager • Storage of personal bookmarks • Public nature of bookmarks
Paper 2 • On the Selection of Tags for Tag Clouds. Authors: P. Venetis et al., WSDM 2011.
Contents • Tag Cloud • System Model • Properties of Tag Cloud • Algorithms to generate Tag Clouds • User Models for Tag Clouds • Experimental Evaluation of algorithms • Evaluation of User Models • Conclusion
Tag Cloud • Definition: a visual representation of social tags, organized into a paragraph-style layout, usually in alphabetical order, where the relative size and weight of the font for each tag corresponds to the relative frequency of its use. • Compact • Three dimensions at a time: • alphabetical order • size indicating importance • the tags themselves
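A minimal sketch of this definition: tags sorted alphabetically, with font size scaled linearly to each tag's frequency of use; the counts, the point-size range, and the HTML-style output are assumptions for illustration.

```python
# Render a tag cloud per the definition above: alphabetical order,
# font size proportional to each tag's relative frequency of use.
tag_counts = {"cats": 40, "africa": 25, "cheetahs": 10, "animals": 55}  # illustrative counts

MIN_PT, MAX_PT = 10, 36  # smallest and largest font sizes in points (assumed)

def font_size(count, lo, hi):
    """Scale a tag's count linearly into the [MIN_PT, MAX_PT] range."""
    if hi == lo:
        return (MIN_PT + MAX_PT) / 2
    return MIN_PT + (MAX_PT - MIN_PT) * (count - lo) / (hi - lo)

lo, hi = min(tag_counts.values()), max(tag_counts.values())
for tag in sorted(tag_counts):  # alphabetical order
    size = font_size(tag_counts[tag], lo, hi)
    print(f'<span style="font-size:{size:.0f}pt">{tag}</span>')
```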
Tag Cloud • Tag cloud for our example “cats in africa”
Tag Cloud • Uses of Tag Cloud • Summarizing web search results • Summarizing results over biomedical databases • Summarizing results of structured queries
Tag Cloud • Example of tag cloud for summarizing web search results
System Model • Terminology • C = set of objects (e.g. web pages / articles) • T = set of tags • Cq = set of objects returned for query q • |Cq| = number of objects in Cq • Tq = set of tags attached to objects in Cq • Aq(t) = association set: for every tag t ∈ Tq, the set of objects c ∈ Cq carrying t • S ⊆ Tq = set of tags shown in the tag cloud • |S| = number of tags in the tag cloud • Partial (scoring) function s(t, c): T × C → [0, 1] • Similarity function sim(·, ·): C × C → [0, 1]
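The notation maps naturally onto a few Python structures. The toy sketch below (object names and tags are assumed) instantiates Cq, Tq, the association sets Aq(t), and a simple scoring function s(t, c); the later sketches reuse these definitions.

```python
# Toy instantiation of the system model (all data and names are illustrative).
# Cq: objects returned for query q; Tq: tags attached to those objects.
Cq = {"page1", "page2", "page3"}
object_tags = {                       # which tags each object carries
    "page1": {"cats", "africa"},
    "page2": {"cats", "cheetahs"},
    "page3": {"africa", "animals"},
}
Tq = set().union(*object_tags.values())

def Aq(tag):
    """Association set Aq(t): the objects in Cq carrying tag t."""
    return {c for c in Cq if tag in object_tags[c]}

def s(tag, obj):
    """Scoring function s(t, c): T x C -> [0, 1] (here simply 1 if tagged, else 0)."""
    return 1.0 if tag in object_tags.get(obj, set()) else 0.0

print(Aq("cats"))            # {'page1', 'page2'}
print(s("africa", "page3"))  # 1.0
```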
Properties of Tag Cloud • Extent of S • The cardinality of S: ext(S) = |S| • Coverage of S • The scored size of the objects associated with tags in S, taken as a fraction of the whole result set, where |Cq|S,q = the sum of the scores of every c ∈ Cq associated with some tag in S
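A sketch of these two properties, reusing the toy model above; for simplicity the coverage shown is the unscored variant (every object's score taken as 1), which is an assumption rather than the paper's exact scored formula.

```python
# Extent and coverage of a tag cloud S, reusing Cq and Aq from the sketch above.
S = {"cats", "africa"}

def extent(S):
    """ext(S) = |S|: the number of tags shown in the cloud."""
    return len(S)

def coverage(S):
    """Fraction of the query's objects reachable through at least one tag in S.
    (Unweighted form; the scored form would sum object scores instead.)"""
    covered = set().union(*(Aq(t) for t in S)) if S else set()
    return len(covered) / len(Cq)

print(extent(S))    # 2
print(coverage(S))  # 1.0 -> every object carries 'cats' or 'africa'
```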
Properties of Tag Cloud • Overlap of S • The extent of redundancy among the association sets of the tags in S • Cohesiveness of S • How closely related the objects in each association set of S are
Properties of Tag Cloud • Relevance of S • The relevance between the tags in S and the original query q • Popularity of S • A tag is more popular if it is associated with many objects in Cq
Properties of Tag Cloud • Independence of S • Tags are independent if they refer to dissimilar objects • Balance of S • The ratio of the minimum association-set size to the maximum association-set size over the tags in the tag cloud S
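Balance has a definition concrete enough to sketch directly; overlap is approximated below as the average pairwise Jaccard similarity of the association sets, which is an illustrative assumption rather than the paper's exact formula. Cohesiveness, relevance, and independence would additionally require the similarity function sim(·, ·) and query-relevance scores, so they are omitted here.

```python
from itertools import combinations

# Reuses S and Aq from the earlier system-model sketch.

def balance(S):
    """Ratio of the smallest to the largest association-set size in S."""
    sizes = [len(Aq(t)) for t in S]
    return min(sizes) / max(sizes) if sizes and max(sizes) > 0 else 0.0

def overlap(S):
    """Redundancy of S, approximated as the mean pairwise Jaccard overlap
    of the association sets (illustrative choice, not the paper's formula)."""
    pairs = list(combinations(S, 2))
    if not pairs:
        return 0.0
    jac = [len(Aq(a) & Aq(b)) / len(Aq(a) | Aq(b)) for a, b in pairs]
    return sum(jac) / len(jac)

print(balance({"cats", "africa"}))  # 1.0  -> both tags cover two objects
print(overlap({"cats", "africa"}))  # 0.33 -> the two tags share 'page1'
```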
Algorithms to generate Tag Clouds • Single- vs multi-objective tag selection • E.g. achieving high popularity, more coverage, more cohesiveness, or incorporating relevance • Input to the algorithms: Cq and Tq; the output is a tag cloud S ⊆ Tq
Algorithms to generate Tag Clouds • Popularity algorithm (POP) • The most common algorithm in social information sharing • A tag is more popular if it is associated with many objects in Cq • It lets users see what other people are most interested in sharing • For query q and parameter k, the algorithm returns the top k tags in Tq ranked by |Aq(t)|
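A sketch of POP under the toy model above: rank every tag in Tq by the size of its association set |Aq(t)| and keep the top k.

```python
import heapq

def pop(Tq, k):
    """POP: return the k tags with the largest association sets |Aq(t)|."""
    return heapq.nlargest(k, Tq, key=lambda t: len(Aq(t)))

print(pop(Tq, 2))  # e.g. ['cats', 'africa'] -- the two most-used tags
```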
Tf-idf based algorithms (TF, WTF) • f(q, t, c) = s(t, c) (tf-idf method) • f(q, t, c) = s(t, c) · s(q, c) (weighted-idf or WTF method)
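A sketch of the two selections, reusing the toy model above; summing f(q, t, c) over a tag's associated objects to score the tag is an assumption for illustration, and the per-object query scores s(q, c) come from a hypothetical query_score table.

```python
# Hypothetical per-object relevance to the query q, s(q, c) in [0, 1].
query_score = {"page1": 0.9, "page2": 0.5, "page3": 0.2}

def tag_score(t, weighted=False):
    """Aggregate f(q, t, c) over the tag's associated objects (summation assumed).
    TF:  f(q, t, c) = s(t, c)
    WTF: f(q, t, c) = s(t, c) * s(q, c)"""
    return sum(s(t, c) * (query_score[c] if weighted else 1.0) for c in Aq(t))

def tfidf_select(Tq, k, weighted=False):
    """Return the k highest-scoring tags under the chosen variant."""
    return sorted(Tq, key=lambda t: tag_score(t, weighted), reverse=True)[:k]

print(tfidf_select(Tq, 2))                 # TF selection
print(tfidf_select(Tq, 2, weighted=True))  # WTF selection
```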
User Models for Tag Clouds • Build an ideal user-satisfaction model • Use this model to compare tag clouds • Base model: Coverage • Every object reachable through a tag in the cloud is of the user's interest with probability p, independently of the others
User Models for Tag Clouds • Incorporating Relevance • For an object reached through a tag that is relevant to the query, the probability that it is of the user's interest rises to r·p, while for every other object it remains p • Incorporating Cohesiveness • For an object in a cohesive association set, the probability that it is of the user's interest is raised accordingly, while for every other object it remains p
User Models for Tag Clouds • Incorporating Overlap • An object contained in more than one association set of S contributes to user satisfaction only once; for every object, the probability that it is of the user's interest remains p • Taking scores into account • Closing comment
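A sketch of the base coverage model with an optional relevance factor r applied to objects reached through the cloud; the independence assumption and the r·p scaling follow the slides, while the exact per-model variants in the paper differ.

```python
def failure_probability(S, p=0.1, r=1.0):
    """Probability the user finds nothing of interest through the cloud S.
    Each object reachable via S is of interest with probability min(r*p, 1),
    independently of the others (base coverage model; r=1 ignores relevance)."""
    reachable = set().union(*(Aq(t) for t in S)) if S else set()
    p_interest = min(r * p, 1.0)
    fail = 1.0
    for _ in reachable:
        fail *= (1.0 - p_interest)
    return fail

print(failure_probability({"cats", "africa"}, p=0.1))         # (1-0.1)^3 = 0.729
print(failure_probability({"cats", "africa"}, p=0.1, r=2.0))  # (1-0.2)^3 = 0.512
```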
Experimental Evaluation Datasets: • CourseRank • Del.icio.us
Experimental Evaluation of algorithms: CourseRank • Most metrics are not correlated • Only coverage and popularity are correlated • Tag clouds with high coverage might not be highly relevant • Different algorithms affect the metrics differently
Experimental Evaluation of algorithms: del.icio.us • Results are similar, but coverage values fall roughly in the 0.2-0.8 range, much lower than for the CourseRank dataset
Impact on failure probability • Algorithms impact failure probability differently
Evaluation of User Models • The models predict the preferred tag cloud correctly about 80% of the time, even when the difference in failure probability is small • Accuracy reaches 100% when the failure probabilities differ by 0.15-0.25, so when the models agree we get the best tag cloud
Conclusion • The metrics are generally not correlated, so each captures a different important aspect of a tag cloud • COV is the best algorithm for finding a tag cloud, followed by POP • POP works well with relevance and cohesiveness • The user models are a useful tool for identifying the tag clouds users prefer
Future Work • Extend the model to capture the balance metric • Construct an algorithm that minimizes failure probability for a given dataset and extent • Take into account items with unassigned tags and spam tags