
ITCS 6010 DATA INTEGRATION Presentation On Social Web


Presentation Transcript


  1. ITCS 6010 DATA INTEGRATION Presentation On Social Web By: Garima Indurkhya, Jay Parikh, Shraddha Herlekar, Vikrant Naik

  2. Paper 1 • The Structure of Collaborative Tagging Systems Authors: Golder, S. and Huberman, B., 2005.

  3. Contents • What is tagging? • Tagging & Taxonomy • Aspects of Classification • Kinds of Tags • Case Study : Del.icio.us

  4. What is Tagging? • Marking content with descriptive terms Examples: • Catalog indexing by a librarian • Keywords used to describe a blog entry / photo on the web • Collaborative tagging: the practice of allowing anyone to freely attach keywords or tags to content Social Bookmark Managers: • Del.icio.us (http://del.icio.us) • Flickr (http://www.flickr.com) • CiteULike (http://www.citeulike.org/) • Cloudalicious (http://cloudalicio.us/)

  5. Tagging & Taxonomy • Tagging • Non-hierarchical • Describe the information held within them • Tag based search returns great variety of things simultaneously • For example : the Tags for the article about cats in Africa could be cats, africa, animals, cheetahs etc. • Taxonomy • Hierarchical • For example : the Taxonomy for the article about cats in Africa could be

  6. Aspects of Classification • Problems to be considered while classifying • Semantic • Polysemy • Synonymy • Cognitive • Basic level variation • Sense making

  7. Kinds of Tags • Several kinds of functions performed by tags for bookmarks • Identifying What (or Who) it is About • Identifying What it Is • Identifying Who Owns It • Identifying Qualities or Characteristics • Self Reference • Task Organizing

  8. Case Study : Del.icio.us • Del.icio.us • Collaborative tagging system for web • Social bookmark manager • Storage of personal bookmarks • Public nature of bookmarks

  9. Case Study : Del.icio.us

  10. Paper 2 • On the Selection of Tags for Tag Clouds Authors: P. Venetis et al., WSDM, 2011.

  11. Contents • Tag Cloud • System Model • Properties of Tag Cloud • Algorithms to generate Tag Clouds • User Models for Tag Clouds • Experimental Evaluation of algorithms • Evaluation of User Models • Conclusion

  12. Tag Cloud • Definition A visual representation of social tags, organized into a paragraph-style layout, usually in alphabetical order, where the relative size and weight of the font for each tag corresponds to the relative frequency of its use. • Compact • Three dimensions at a time! • alphabetical order • size indicating importance • the tags themselves

  13. Tag Cloud • Tag cloud for our example “cats in africa”

  14. Tag Cloud • Uses of Tag Cloud • Summarizing web search results • Summarizing results over biomedical databases • Summarizing results of structured queries

  15. Tag Cloud • Example of tag cloud for summarizing web search results

  16. System Model • Terminologies • C = set of objects (e.g. web pages / articles) • T = set of tags • Cq = set of objects for query q • |Cq| = number of objects in Cq • Tq = set of tags for query q • Aq(t) = association set of tag t: for every tag t ∈ Tq, the objects c ∈ Cq that t is associated with (s(t,c) > 0) • S ⊆ Tq = set of tags in the tag cloud • |S| = number of tags in the tag cloud • Partial (scoring) function • s(t,c): T × C → [0,1] • Similarity function • Sim(·,·): C × C → [0,1]
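A toy Python sketch of this system model may help fix the notation: objects, tags, a partial scoring function s(t, c) in [0, 1], and the association set Aq(t). All identifiers and data below are illustrative assumptions, not taken from the paper.

```python
# Cq: objects returned for a query q (e.g. web pages), keyed by an id
objects = {"c1": "cats in africa", "c2": "cheetah habitats", "c3": "safari photos"}

# Partial scoring function s(t, c): how strongly tag t describes object c
scores = {
    ("cats", "c1"): 1.0, ("africa", "c1"): 0.8,
    ("cats", "c2"): 0.6, ("cheetahs", "c2"): 1.0,
    ("africa", "c3"): 0.9,
}

def s(tag, obj):
    """Scoring function s(t, c): T x C -> [0, 1]; 0 when the pair is unscored."""
    return scores.get((tag, obj), 0.0)

def association_set(tag, candidate_objects):
    """Aq(t): the objects in Cq that tag t is associated with (s(t, c) > 0)."""
    return {c for c in candidate_objects if s(tag, c) > 0}

for t in ("cats", "africa", "cheetahs"):
    print(t, sorted(association_set(t, objects)))
```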

  17. Properties of Tag Cloud • Extent of S • The cardinality of S • ext(S) = |S| • Coverage of S • Scored size of the objects associated with the tags in S, relative to |Cq|s,q • Where |Cq|s,q = the sum of the scores s(q,c) over every c ∈ Cq

  18. Properties of Tag Cloud • Overlap of S • The extent of redundancy • Cohesiveness of S • How closely related the objects in each association set of S are

  19. Properties of Tag Cloud • Relevance of S • Relevance between the tags in S and the original query q • Popularity of S • A tag is more popular if it is associated with many objects in Cq.

  20. Properties of Tag Cloud • Independence of S • Tags are independent if they refer to dissimilar objects • Balance of S • Ratio of the minimum association-set size to the maximum association-set size over the tags in the tag cloud S.
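A minimal Python sketch of how some of the properties from slides 17-20 (extent, coverage, overlap, balance) could be computed from association sets. The unweighted coverage and the pairwise overlap normalization are assumptions; the paper's exact scored definitions (|.|s,q) may differ.

```python
from itertools import combinations

def extent(assoc):
    """ext(S): number of tags in the cloud (assoc maps tag -> Aq(t))."""
    return len(assoc)

def coverage(assoc, all_objects):
    """Fraction of query objects reachable through some tag in the cloud.
    (Unweighted variant; the paper uses a scored size -- an assumption here.)"""
    covered = set().union(*assoc.values()) if assoc else set()
    return len(covered) / len(all_objects) if all_objects else 0.0

def overlap(assoc):
    """Average pairwise redundancy between association sets (assumed normalization)."""
    pairs = list(combinations(assoc.values(), 2))
    if not pairs:
        return 0.0
    return sum(len(a & b) / (min(len(a), len(b)) or 1) for a, b in pairs) / len(pairs)

def balance(assoc):
    """bal(S): ratio of the smallest to the largest association-set size."""
    sizes = [len(a) for a in assoc.values()]
    return min(sizes) / max(sizes) if sizes and max(sizes) else 0.0
```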

  21. Algorithms to generate Tag Clouds • Single vs multi-objective tag selection • E.g. achieving high popularity, getting more coverage, being more cohesive, incorporating relevance • Input to the algorithms: Cq and Tq; the algorithms select a tag cloud S ⊆ Tq

  22. Algorithms to generate Tag Clouds • Popularity algorithm (POP) • The most common algorithm in social information sharing • A tag is more popular if it is associated with many objects in Cq. • It allows users to see what other people are most interested in sharing. • For query q and parameter k, the algorithm returns the top k tags in Tq according to their |Aq(t)|.
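A direct reading of POP in a few lines of Python: pick the k tags with the largest association sets. The dictionary layout (tag -> Aq(t)) is the same illustrative structure used in the earlier sketches.

```python
def pop(tags_for_query, assoc, k):
    """POP: return the top-k tags by |Aq(t)|, i.e. by how many objects each tag covers."""
    return sorted(tags_for_query, key=lambda t: len(assoc[t]), reverse=True)[:k]

# Example (toy data): 'cats' and 'africa' each cover two objects, 'cheetahs' one
assoc = {"cats": {"c1", "c2"}, "africa": {"c1", "c3"}, "cheetahs": {"c2"}}
print(pop(assoc.keys(), assoc, k=2))  # ['cats', 'africa'] -- ties keep input order
```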

  23. Tf-idf based algorithms (TF, WTF) • f(q, t, c) = s(t, c) (tf-idf method, TF) • f(q, t, c) = s(t, c) · s(q, c) (weighted tf-idf, or WTF, method)
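A sketch of how these two scoring functions could be used to rank tags, assuming each tag's total score is obtained by summing f(q, t, c) over the objects in its association set; that aggregation, and the score tables s_tc and s_qc, are assumptions made here for illustration.

```python
def rank_tags(tags_for_query, assoc, s_tc, s_qc=None, k=5):
    """Rank tags by the sum over c in Aq(t) of f(q, t, c), where
    f = s(t, c) for the TF method and f = s(t, c) * s(q, c) for WTF."""
    def total(t):
        score = 0.0
        for c in assoc[t]:
            f = s_tc.get((t, c), 0.0)
            if s_qc is not None:          # WTF: also weight by s(q, c)
                f *= s_qc.get(c, 0.0)
            score += f
        return score
    return sorted(tags_for_query, key=total, reverse=True)[:k]
```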

  24. Maximum Coverage Algorithm (COV)
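The slide itself showed the algorithm; as a stand-in, here is the standard greedy heuristic for maximum coverage, which at every step adds the tag whose association set covers the most objects not yet covered. Treating this as the paper's exact COV procedure is an assumption.

```python
def cov(tags_for_query, assoc, k):
    """Greedy maximum coverage: repeatedly pick the tag that adds the most
    objects not yet covered by the already-selected tags."""
    selected, covered = [], set()
    candidates = set(tags_for_query)
    for _ in range(k):
        best = max(candidates, key=lambda t: len(assoc[t] - covered), default=None)
        if best is None or not (assoc[best] - covered):
            break  # no remaining candidate adds any new object
        selected.append(best)
        covered |= assoc[best]
        candidates.discard(best)
    return selected
```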

  25. User Models for Tag Clouds • Build an ideal user-satisfaction model • Use this model to compare the tag clouds • Base model: Coverage • Every object is assumed to be of the user's interest with some probability p; the cloud fails if none of the objects covered by the tags in S is of interest.

  26. User Models for Tag Clouds • Incorporating Relevance • For an object relevant to the query, the probability that it is of the user's interest is scaled by its relevance (r · p instead of p); for every other object the probability remains p. • Incorporating Cohesiveness • For an object in a cohesive association set, the interest probability is adjusted by how similar it is to the other objects in that set; for every other object the probability remains p.

  27. User Models for Tag Clouds • Incorporating Overlap • For an object c that is contained in exactly one association set, the interest probability is as in the base model; objects contained in several association sets are not counted more than once. For every other object the probability remains p. • Taking Scores into Account • Closing Comment
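One plausible reading of the base coverage model, sketched below: each object is independently of interest with probability p, and the cloud fails when none of the objects it covers is of interest. The closed form (1 - p) raised to the number of covered objects is an assumption here; the refinements above (relevance, cohesiveness, overlap, scores) would adjust the per-object probabilities.

```python
def base_failure_probability(assoc, p):
    """Base (coverage) user model: the cloud fails if no covered object is of
    interest; with each object of interest independently with probability p,
    the failure probability is (1 - p) ** (number of distinct covered objects)."""
    covered = set().union(*assoc.values()) if assoc else set()
    return (1 - p) ** len(covered)

# Example: a cloud whose two tags together cover 10 distinct objects, p = 0.2
print(base_failure_probability({"t1": set(range(6)), "t2": set(range(4, 10))}, 0.2))
```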

  28. Experimental Evaluation Datasets: • CourseRank • Del.icio.us

  29. Experimental Evaluation of algorithms: CourseRank • Most metrics are not correlated • Only coverage and popularity are correlated • A tag cloud with high coverage might not be highly relevant • Algorithms impact the metrics differently

  30. Experimental Evaluation of algorithms : CourseRank

  31. Experimental Evaluation of algorithms : del.icio.us • Similar trends, but the overall range of values for the coverage metric is around 0.2-0.8, much lower than for the CourseRank dataset

  32. Impact on failure probability • Algorithms impact failure probability differently

  33. Evaluation of User Models • 80% of preferences predicted correctly, even when the difference in failure probability is small • 100% for a 0.15-0.25 difference, so when there is agreement, we get the best tag cloud!

  34. Conclusion • The metrics are generally not correlated • So each covers a different important aspect of a tag cloud. • COV is the best algorithm for finding a tag cloud, followed by POP • POP works well with relevance and cohesiveness! • The user model is a useful tool to identify tag clouds preferred by users

  35. Future Work • Extend model to capture balance metric • Construct algorithm to minimize failure probability for a dataset and given extent • Take into account items with unassigned and spam tags

  36. Thank you!
