Growing a Tree in the Forest: Constructing Folksonomies by Integrating Structured Metadata

Anon Plangprasopchok (1), Kristina Lerman (1), Lise Getoor (2). (1) USC Information Sciences Institute; (2) Department of Computer Science, University of Maryland. SIGKDD 2010. 2010. 12. 17.


Presentation Transcript


  1. Growing a Tree in the Forest: Constructing Folksonomies by Integrating Structured Metadata • Anon Plangprasopchok (1), Kristina Lerman (1), Lise Getoor (2) • (1) USC Information Sciences Institute, (2) Department of Computer Science, University of Maryland • SIGKDD 2010 • 2010. 12. 17. Summarized and presented by Kim Chung Rim, IDS Lab., Seoul National University

  2. Contents • Introduction • Goal • Challenges • Learning Folksonomies • SAP: Growing a Tree • Evaluation • Conclusion & Discussion

  3. Introduction • The Social Web has changed the way people create and use information • Users organize their content via tags • Sites such as Flickr, Del.icio.us and YouTube allow users to tag their content with descriptive keywords

  4. Social Metadata • Captures the collective knowledge of Social Web users • Once mined from many users, this collective knowledge adds a rich semantic layer to Social Web content • Could be used for information discovery

  5. Structured Social Metadata • Some Social Web sites allow users to organize their metadata hierarchically • Delicious: users can group related tags into bundles • Flickr: users can group related photos into sets and related sets into collections

  6. Folksonomy • Folksonomy: a common taxonomy learned from social metadata • Predefined hierarchies already exist, such as WordNet or the Linnaean classification system • Benefits of a folksonomy over existing hierarchies: • Represents the collective agreement of many individuals • Is relatively inexpensive to obtain • Can adapt to evolving vocabularies and the community's information needs • Is directly tied to the annotated content

  7. Goal • The goal of this paper is to • Extract users' knowledge (folk knowledge) from user-generated semantics • Build up a tree of communal taxonomy • [Figure: example personal hierarchies over the terms animal, fish, cat, dog, reptile, beagle, jindo]

  8. Challenges • Sparseness: social metadata is very sparse, with only 3.74 tags per photo on average • Noisy vocabulary: spelling errors and meaningless tags, e.g. "not sure", "mykid" • Ambiguity: an individual tag is often ambiguous, e.g. "jaguar"

  9. Challenges • Structural noise and conflicts: users use different terms to build their own hierarchies • Varying granularity level: users' levels of expertise and expressiveness differ

  10. Learning Folksonomies - intuition • Roots with similar terms should be aggregated horizontally • A root and a leaf with similar terms should be aggregated vertically • [Figure: example hierarchies over animal, fish, cat, dog, reptile, beagle, jindo, before and after aggregation]

  11. Learning Folksonomy • Relational clustering: two nodes should be clustered if their similarity is above a certain threshold • Similarity is computed from local similarity and structural similarity • localSim(a,b) measures the similarity of the node names and their tag distributions • structSim(a,b) measures the similarity of the nodes' leaves • A weight adjusts the contributions of localSim and structSim
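
The transcript does not preserve the formula that combines the two similarities, so the following is only a minimal sketch. It assumes a convex combination, a common way to weight two scores; the names `alpha` and `threshold` are illustrative and do not come from the slide.

```python
def node_sim(local_sim: float, struct_sim: float, alpha: float) -> float:
    """Blend local and structural similarity (assumed convex combination)."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    # alpha = 0 uses only local similarity, alpha = 1 only structural similarity.
    return (1.0 - alpha) * local_sim + alpha * struct_sim


def should_merge(local_sim: float, struct_sim: float,
                 alpha: float, threshold: float) -> bool:
    """Cluster two nodes when the blended similarity is above the threshold."""
    return node_sim(local_sim, struct_sim, alpha) > threshold
```

With `alpha = 0.5`, two nodes with a local similarity of 0.8 and a structural similarity of 0.4 get a blended score of about 0.6, so they merge under a threshold of 0.5.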

  12. Learning Folksonomy - terms • A personal hierarchy is called a sapling • A sapling is composed of a root node and its leaves • The tag statistic of a node x is defined as a set of (tag, frequency) pairs [formula not preserved in the transcript] • A tag statistic may be aggregated from a group of other tag statistics [symbols not preserved in the transcript] • Examples: {(canada, 1), (vacation, 1), … } and {(bc, 1), (canada, 2), (vacation, 1), … }

  13. Local Similarity • The local similarity of nodes a and b is composed of • Name similarity • Tag distribution similarity • nameSim can be any string similarity metric that returns a value from 0 to 1 • tagSim can be any function that measures the similarity of distributions. In this paper, tagSim counts the number n of common tags among the top K tags of a and b. If n ≥ J (a threshold), it returns 1; otherwise it returns n/J.
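
The tagSim rule described on this slide can be sketched as follows. Representing a tag distribution as a tag-to-frequency mapping and the function names are assumptions made for illustration; ties among equally frequent tags are broken by insertion order here, which the slide does not specify.

```python
from collections import Counter
from typing import Mapping, Set


def top_k_tags(tag_freq: Mapping[str, int], k: int) -> Set[str]:
    """Return the k most frequent tags of a node."""
    return {tag for tag, _ in Counter(tag_freq).most_common(k)}


def tag_sim(tags_a: Mapping[str, int], tags_b: Mapping[str, int],
            k: int, j: int) -> float:
    """Count common tags n among the top-k tags; return 1 if n >= j, else n / j."""
    if j <= 0:
        raise ValueError("j must be positive")
    n = len(top_k_tags(tags_a, k) & top_k_tags(tags_b, k))
    return 1.0 if n >= j else n / j
```

For example, the distributions {canada: 1, vacation: 1} and {bc: 1, canada: 2, vacation: 1} share two top-3 tags, so the score is 0.5 with J = 4 and 1 with J = 2.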

  14. Structural Similarity • Structural similarity is computed in two cases • Similarity between two root nodes • Similarity between a root node and a leaf node

  15. Root-to-Root Similarity • Similarity is based on • How many of the roots' children have a common name • The tag distribution similarity of the children that do not share a name • [Formula and symbol definitions not preserved in the transcript; one symbol denotes an aggregation of tag distributions]
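
Because the formula itself is missing from the transcript, the sketch below only illustrates the two ingredients named on the slide. The normalization by the smaller child count and the way the two parts are combined are assumptions, as are all function names.

```python
from collections import Counter
from typing import Iterable, Mapping

TagStats = Mapping[str, int]


def tag_sim(a: TagStats, b: TagStats, k: int, j: int) -> float:
    """Top-k tag overlap n, scored as 1 if n >= j, else n / j."""
    top_a = {t for t, _ in Counter(a).most_common(k)}
    top_b = {t for t, _ in Counter(b).most_common(k)}
    n = len(top_a & top_b)
    return 1.0 if n >= j else n / j


def aggregate(children: Mapping[str, TagStats], names: Iterable[str]) -> Counter:
    """Sum the tag frequencies of the named children."""
    total: Counter = Counter()
    for name in names:
        total.update(children[name])
    return total


def root_to_root_sim(children_a: Mapping[str, TagStats],
                     children_b: Mapping[str, TagStats],
                     k: int, j: int) -> float:
    """Assumed combination: name overlap plus tag similarity of the rest."""
    z = min(len(children_a), len(children_b))  # assumed normalizer
    if z == 0:
        return 0.0
    common = set(children_a) & set(children_b)
    name_part = len(common) / z
    rest_a = aggregate(children_a, set(children_a) - common)
    rest_b = aggregate(children_b, set(children_b) - common)
    # Children without a shared name contribute through their aggregated tags.
    return name_part + (1.0 - name_part) * tag_sim(rest_a, rest_b, k, j)
```

Under these assumptions the score stays in [0, 1] and reaches 1 when every child of the smaller sapling has a namesake in the other.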

  16. Root-to-Leaf Similarity • Takes into account the case where a sapling contains the same leaf as the compared node • When merging UK->{Scotland, Glasgow, Edinburgh, London} and Scotland->{Glasgow, Shetland}, Scotland and UK can be said to be similar because UK contains a shortcut to Glasgow • Measures the overlap between the siblings of the leaf and the children of the root
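
The overlap in the UK/Scotland example can be sketched as a set intersection. The slide does not show how the overlap is normalized, so dividing by the smaller set size is an assumption, and the function name is hypothetical.

```python
from typing import Set


def root_to_leaf_sim(leaf_siblings: Set[str], root_children: Set[str]) -> float:
    """Overlap between a leaf's siblings and a root's children (assumed scaling)."""
    z = min(len(leaf_siblings), len(root_children))
    if z == 0:
        return 0.0
    return len(leaf_siblings & root_children) / z


# Leaf Scotland inside the UK sapling, compared with the root of the Scotland sapling.
siblings_of_leaf = {"Glasgow", "Edinburgh", "London"}
children_of_root = {"Glasgow", "Shetland"}
score = root_to_leaf_sim(siblings_of_leaf, children_of_root)
```

Here the only shared name is Glasgow, the shortcut mentioned on the slide, so the score is 1/2 under the assumed normalization.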

  17. Relational Clustering • When nodeSim(a,b) exceeds a certain threshold, the two nodes are clustered (merged) • [Figure: two hierarchies rooted at animal, with children drawn from fish, cat, dog and reptile, being merged]

  18. SAP: Growing a Tree • Using Incremental Relational Clustering, • [Figure: saplings named canada and victoria, with panels labeled "Horizontal aggregation" and "Vertical aggregation"]

  19. Handling Problems • Shortcuts: take the longer route • Noisy vocabularies: if a name is used by less than 1% of the users who contributed to the sapling, it is considered noisy and discarded
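
The 1% rule for noisy names can be sketched as a simple filter. The input layout (each name mapped to the set of users who used it) and the function name are assumptions; the total user count is taken as every user appearing under any name.

```python
from typing import Dict, Set


def drop_noisy_names(name_users: Dict[str, Set[str]],
                     min_fraction: float = 0.01) -> Dict[str, Set[str]]:
    """Keep only names used by at least min_fraction of the contributing users."""
    all_users: Set[str] = set().union(*name_users.values())
    total = len(all_users)
    if total == 0:
        return {}
    return {
        name: users
        for name, users in name_users.items()
        if len(users) / total >= min_fraction
    }


users = {f"u{i}" for i in range(200)}
merged = {
    "victoria": users,            # used by all 200 contributors
    "vancouver": {"u1", "u2", "u3"},  # 1.5% of contributors, kept
    "mykid": {"u0"},              # 0.5% of contributors, discarded
}
kept = drop_noisy_names(merged)
```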

  20. Evaluation • 20,759 saplings from 7,121 users were selected from Flickr • Values for several parameters were chosen

  21. Evaluation Methodologies • Against the reference hierarchy (ODP, the Open Directory Project) • Taxonomic Overlap (fmTO): measures the structural similarity between two trees. For each node, it measures how many ancestor and descendant nodes overlap with those in the reference tree, then takes the harmonic mean • Lexical Recall (LR): measures how well an approach discovers the concepts in the reference hierarchy • Structural evaluation • Area Under Tree (AUT): measures how bushy and deep a tree is • Manual evaluation (for concepts not in ODP) • Five judges were asked whether each path is correct
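
The slide gives no formula for Lexical Recall. A minimal sketch, assuming it is the fraction of reference concepts that also appear in the learned tree (the usual set-based notion of recall), could look like this; the function and variable names are hypothetical.

```python
from typing import Set


def lexical_recall(learned: Set[str], reference: Set[str]) -> float:
    """Share of reference concepts that the learned hierarchy contains."""
    if not reference:
        return 0.0
    return len(learned & reference) / len(reference)


learned_concepts = {"animal", "dog", "cat", "jindo"}
reference_concepts = {"animal", "dog", "cat", "fish", "reptile"}
lr = lexical_recall(learned_concepts, reference_concepts)
```

In this toy case three of the five reference concepts are found, giving a recall of 0.6; concepts such as jindo that are absent from the reference do not affect the score.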

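  22. Evaluation metric - AUT • AUT for the tree below is 1*(1+3)/2 + 1*(3+2)/2 = 4.5 • Each term is the area of a trapezoid between two consecutive levels, whose widths are the node counts at those levels (1, 3 and 2) • [Figure: tree with root animal; second level fish, cat, dog; third level beagle, jindo]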
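
The arithmetic on this slide can be reproduced by counting nodes per level and summing trapezoid areas between consecutive levels. This is a sketch generalized from the single worked example; the child-list tree encoding and the placement of the two third-level nodes under dog are assumptions.

```python
from typing import Dict, List


def level_counts(tree: Dict[str, List[str]], root: str) -> List[int]:
    """Number of nodes at each depth, starting from the root."""
    counts = []
    frontier = [root]
    while frontier:
        counts.append(len(frontier))
        frontier = [child for node in frontier for child in tree.get(node, [])]
    return counts


def area_under_tree(tree: Dict[str, List[str]], root: str) -> float:
    """Sum of unit-height trapezoids between consecutive levels."""
    counts = level_counts(tree, root)
    return sum((upper + lower) / 2 for upper, lower in zip(counts, counts[1:]))


example = {"animal": ["fish", "cat", "dog"], "dog": ["beagle", "jindo"]}
aut = area_under_tree(example, "animal")  # (1+3)/2 + (3+2)/2 = 4.5
```

A wider or deeper tree adds more or larger trapezoids, which is why the value grows with both bushiness and depth.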

  23. Baseline for comparison • SIG: earlier work by the same authors • Assumes that nodes with the same name refer to the same concept • Keeps the relations between node pairs • Combines all relations into a tree

  24. Results

  25. Results Simple comparison

  26. Some of the Learned Folksonomies

  27. Conclusion & Discussion • By exploiting structural information, this work can create more accurate and more detailed folksonomies than the current state-of-the-art approach • It is more scalable because it grows the folksonomies incrementally • It lacks a comparison with other state-of-the-art approaches

  28. Thank you
