
Network Structure of Folksonomies

Network Structure of Folksonomies. Vito D. P. Servedio. Dipartimento di Fisica, Università di Roma "La Sapienza"; Centro Studi e Ricerche "Enrico Fermi". TAGora: Semiotic Dynamics of Online Social Communities, EU-IST-2006-034721.


Presentation Transcript


  1. Network Structure of Folksonomies. Vito D. P. Servedio • Dipartimento di Fisica, Università di Roma "La Sapienza" • Centro Studi e Ricerche "Enrico Fermi". TAGora: Semiotic Dynamics of Online Social Communities, EU-IST-2006-034721

  2. In collaboration with: Andrea Baldassarri, Ciro Cattuto, Vittorio Loreto; Miranda Grahl, Andreas Hotho, Christoph Schmitz, Gerd Stumme

  3. AGENDA • Properties of folksonomy hypergraphs • Network of tag co-occurrences • Clustering of resources V D P Servedio ~ECCS07~

  4. A folksonomy example: del.icio.us screenshot. [figure: screenshot with the labels "resource", "user" and "tags"]

  5. Data structure: basic units of information
     • Post = ({tags}, user, resource)
     • TAS = tag assignment (tag, user, resource)
     Example: the post ({bookmarking, sharing, collaborative, folksonomy}, andreab, http://del.icio.us) corresponds to four tag assignments:
       (bookmarking, andreab, http://del.icio.us)
       (sharing, andreab, http://del.icio.us)
       (collaborative, andreab, http://del.icio.us)
       (folksonomy, andreab, http://del.icio.us)
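The relation between a post and its tag assignments can be sketched in a few lines of Python. This is only an illustration of the two units defined on the slide; the names `Post` and `post_to_tas` are hypothetical and do not come from the presentation.

```python
from typing import FrozenSet, List, NamedTuple, Tuple


class Post(NamedTuple):
    """A post: a set of tags attached by one user to one resource."""
    tags: FrozenSet[str]
    user: str
    resource: str


# A tag assignment (TAS) is a (tag, user, resource) triple.
TAS = Tuple[str, str, str]


def post_to_tas(post: Post) -> List[TAS]:
    """Expand a post into one tag assignment per tag (sorted for stable output)."""
    return [(tag, post.user, post.resource) for tag in sorted(post.tags)]


# The example post from the slide.
example = Post(
    frozenset({"bookmarking", "sharing", "collaborative", "folksonomy"}),
    "andreab",
    "http://del.icio.us",
)
tas = post_to_tas(example)
```

Each element of `tas` is one hyperedge of the folksonomy hypergraph, joining a tag, a user and a resource.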

  6. Folksonomy hypergraph structure. A folksonomy can be viewed as a 3-mode network in which each hyperlink joins a user, a tag and a resource, e.g. (U1, T1, R2). [figure: hypergraph connecting Users 1-4, Tags 1-3 and Resources 1-3]

  7. del.icio.us data collection. TAGora Project (STREP FP6): Semiotic Dynamics in Online Social Communities, www.tagora-project.eu
     • Consortium: University of Roma “La Sapienza”, SONY CSL Paris, University of Kassel, University of Koblenz-Landau, University of Southampton
     • del.icio.us (work co-ordinated by Uni of Kassel): data collected from Nov. 2006; over 667K users; ~19 million resources; nearly 2.5 million tags; ~140 million tag assignments; 50GB of data
     • flickr (work co-ordinated by Uni of Koblenz): data collected 21st May 2007; ~300K users; ~25 million photos; 1.5 million tags; over 110 million tag assignments
     • bibsonomy: complete dataset (June 2007); ~1385 users; 37651 tags; over 149K resources

  8. Artificial networks: permuted and binomial. In the following slides we shall use some artificial networks defined as:
     • PERMUTED: take the original folksonomy and shuffle the nodes within each class (resources, users, tags) among the hyperedges. We end up with a hypergraph with the same node degrees as the original one.
     • BINOMIAL: same number of hyperedges; endpoints chosen uniformly at random among T, U, R.
     Example (original → permuted):
       Resource1 User1 Tag1  →  Resource1 User3 Tag2
       Resource1 User1 Tag2  →  Resource2 User2 Tag3
       Resource1 User1 Tag3  →  Resource1 User1 Tag1
       Resource1 User2 Tag1  →  Resource1 User1 Tag2
       Resource1 User2 Tag2  →  Resource1 User2 Tag1
       Resource2 User3 Tag4  →  Resource1 User1 Tag4
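A minimal sketch of the two null models described on slide 8, assuming hyperedges are stored as (resource, user, tag) triples. The function names `permuted` and `binomial` are hypothetical, and the sketch makes no attempt to remove duplicate hyperedges that shuffling may create; whether the original study did so is not stated on the slide.

```python
import random
from collections import Counter


def permuted(hyperedges, rng):
    """Shuffle each column independently: node degrees are preserved,
    but the pairing of resources, users and tags is randomised."""
    resources, users, tags = (list(column) for column in zip(*hyperedges))
    rng.shuffle(resources)
    rng.shuffle(users)
    rng.shuffle(tags)
    return list(zip(resources, users, tags))


def binomial(hyperedges, rng):
    """Same number of hyperedges; each endpoint drawn uniformly at random
    from the set of resources, users or tags present in the original."""
    resources = sorted({r for r, _, _ in hyperedges})
    users = sorted({u for _, u, _ in hyperedges})
    tags = sorted({t for _, _, t in hyperedges})
    return [
        (rng.choice(resources), rng.choice(users), rng.choice(tags))
        for _ in hyperedges
    ]


# The original example from the slide.
original = [
    ("Resource1", "User1", "Tag1"),
    ("Resource1", "User1", "Tag2"),
    ("Resource1", "User1", "Tag3"),
    ("Resource1", "User2", "Tag1"),
    ("Resource1", "User2", "Tag2"),
    ("Resource2", "User3", "Tag4"),
]
rng = random.Random(0)  # fixed seed so the sketch is reproducible
shuffled = permuted(original, rng)
uniform = binomial(original, rng)

# Degree of each tag (number of hyperedges it belongs to), before and after.
tag_degree_before = Counter(t for _, _, t in original)
tag_degree_after = Counter(t for _, _, t in shuffled)
```

The permuted model keeps every node's degree, so any statistic that changes under it cannot be explained by degree alone; the binomial model discards the degree sequence as well.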

  9. Average path length (estimated), moving on hyperedges. Example hyperedges: (T1, U1, R1), (T2, U1, R1), (T2, U2, R2), (T3, U2, R3). [figure: plot with axis label "time"]

  10. Cliquishness. A high resource cliquishness indicates that many of the users related to that resource assign overlapping sets of tags to it. For a resource r: |tu_r| = # of hyperlinks connected to r; |T_r| = # of adjacent tags; |U_r| = # of adjacent users. Example: |T_r| = 3, |U_r| = 2, |tu_r| = 3. [figure: resource r with tags t1, t2, t3 and users u1, u2, plus a second resource r2]
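The slide gives the three counts but not the formula that combines them. A natural normalisation, assumed here and not confirmed by the transcript, is the fraction of possible tag-user pairs around r that are actually present: |tu_r| / (|T_r| · |U_r|). The sketch below uses hypothetical hyperedges chosen only to reproduce the counts quoted on the slide.

```python
def cliquishness(tas, r):
    """Assumed definition: |tu_r| / (|T_r| * |U_r|).

    tas is an iterable of (tag, user, resource) triples.
    """
    links = {(t, u) for t, u, res in tas if res == r}   # tu_r
    tags = {t for t, _ in links}                        # T_r
    users = {u for _, u in links}                       # U_r
    if not links:
        return 0.0
    return len(links) / (len(tags) * len(users))


# Hypothetical hyperedges matching |T_r| = 3, |U_r| = 2, |tu_r| = 3.
tas = [
    ("t1", "u1", "r"),
    ("t2", "u1", "r"),
    ("t3", "u2", "r"),
]
value = cliquishness(tas, "r")  # 3 / (3 * 2)
```

Under this assumed definition the value reaches 1 only when every adjacent user has assigned every adjacent tag to r.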

  11. Connectedness / transitivity. For a resource r: |tu_r| = # of hyperlinks connected to r; |~tu_r| = number of tag-user pairs from tu_r that also occur with some resource other than r. Example: |tu_r| = 3, |~tu_r| = 1. [figure: resource r with tags t1, t2, t3 and users u1, u2, plus a second resource r2]
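The two counts on slide 11 suggest a ratio: the share of tag-user pairs of r that reappear on another resource. That ratio is an assumption here, since the transcript does not show the formula, and the hyperedges below are hypothetical, chosen to reproduce the counts 3 and 1.

```python
def connectedness(tas, r):
    """Assumed definition: fraction of (tag, user) pairs attached to r
    that also occur with at least one resource other than r."""
    links = {(t, u) for t, u, res in tas if res == r}
    elsewhere = {(t, u) for t, u, res in tas if res != r}
    if not links:
        return 0.0
    return len(links & elsewhere) / len(links)


# Hypothetical hyperedges: three pairs on r, one of which recurs on r2.
tas = [
    ("t1", "u1", "r"),
    ("t2", "u1", "r"),
    ("t3", "u2", "r"),
    ("t1", "u1", "r2"),
]
value = connectedness(tas, "r")  # one pair out of three recurs
```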

  12. AGENDA • Properties of folksonomy hypergraphs • Network of tag co-occurrences • Clustering of resources

  13. Networks of tag co-occurrence. Tags acquire a stronger semantic context when they co-occur with each other, e.g. {Roma, holidays, Italy} vs {Roma, football, we_won} vs {Roma, love, girls}.
     • Tag co-occurrences in posts
     • Weighted graph of tags: weight = number of common posts
     • Strength of a tag: sum of its edge weights
     • Can we study the “semantics” of tags? ({japan, tokyo} is more frequent than {physics, sex}; check with Google!)
     • Compare statistics with shuffled graphs

  14. Weighted network of tag co-occurrence. Two tags co-occur if they are present in the same post. We can say more: two tags t, t’ co-occur with weight w if they are simultaneously present in w posts. In terms of adjacency tensors: tensor contraction in flat space… We examine the weighted undirected network defined by W.
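The weight definition on slide 14 translates directly into a counting loop over posts. The sketch below is one possible way to build W, assuming each post is given as a set of tags; the function name `cooccurrence_weights` is hypothetical, and the fourth post is invented for illustration so that one pair has weight greater than 1.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_weights(posts):
    """Return W as a Counter keyed by sorted tag pairs:
    W[(t, t2)] = number of posts containing both tags."""
    weights = Counter()
    for tags in posts:
        # sorted() gives each unordered pair a single canonical key.
        for pair in combinations(sorted(set(tags)), 2):
            weights[pair] += 1
    return weights


posts = [
    {"Roma", "holidays", "Italy"},
    {"Roma", "football", "we_won"},
    {"Roma", "love", "girls"},
    {"Roma", "Italy"},  # hypothetical extra post, not from the slides
]
W = cooccurrence_weights(posts)
```

Pairs that never share a post are simply absent from W, which keeps the representation sparse; looking them up in a `Counter` returns 0.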

  15. Strength distribution. Strength of node i: s_i = Σ_j w_ij, the sum of its edge weights. [figure: cumulative strength distribution, with a region labelled SPAM]
     Tag-shuffled example:
       Resource1 User1 Tag2
       Resource1 User1 Tag3
       Resource1 User1 Tag1
       Resource1 User2 Tag4
       Resource1 User2 Tag1
       Resource2 User3 Tag2
     The tag reshuffling procedure makes almost no change to P(s): the strength depends on the frequency of tags, not on their semantics.
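A short sketch of how node strengths and their cumulative distribution could be computed from a weight dictionary keyed by tag pairs. The toy weights are invented, and reading "cumulative distribution" as the fraction of nodes with strength at least s is an assumption; the slide does not say which convention its plot uses.

```python
from collections import defaultdict


def strengths(weights):
    """Strength of each node: the sum of the weights of its edges."""
    s = defaultdict(int)
    for (i, j), w in weights.items():
        s[i] += w
        s[j] += w
    return dict(s)


def cumulative_distribution(values):
    """Assumed convention: P(s) = fraction of nodes with strength >= s."""
    n = len(values)
    return {v: sum(1 for x in values if x >= v) / n for v in sorted(set(values))}


# Hypothetical weighted co-occurrence network on three tags.
W = {("a", "b"): 2, ("a", "c"): 1, ("b", "c"): 3}
s = strengths(W)
P = cumulative_distribution(list(s.values()))
```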

  16. Average neighbour strength. Examine strength correlation between neighbours:
     • Positive correlation: assortative mixing, e.g. social networks
     • Negative correlation: disassortative mixing, e.g. technological networks
     • Look for spam infection
     • Reveal semantics via shuffled graph
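One common way to quantify the correlation described on slide 16 is the unweighted mean of the neighbours' strengths; whether the presentation used this or a weighted variant is not recoverable from the transcript, so the definition below is an assumption. The star-shaped toy network is invented to show the disassortative case.

```python
from collections import defaultdict


def average_neighbour_strength(weights):
    """Assumed definition: s_nn(i) = mean of s_j over the neighbours j of i."""
    strength = defaultdict(int)
    neighbours = defaultdict(set)
    for (i, j), w in weights.items():
        strength[i] += w
        strength[j] += w
        neighbours[i].add(j)
        neighbours[j].add(i)
    return {
        i: sum(strength[j] for j in neighbours[i]) / len(neighbours[i])
        for i in neighbours
    }


# Hypothetical star: a strong hub whose neighbours are all weak.
W = {("hub", "a"): 1, ("hub", "b"): 1, ("hub", "c"): 1}
s_nn = average_neighbour_strength(W)
```

Here the strong node has weak neighbours and the weak nodes have a strong neighbour, which is the negative-correlation (disassortative) pattern named on the slide.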

  17. Average neighbour strength: scatter plot. [figure: scatter plot with regions labelled "spam"]
     • Tags introduced by spamming cluster together
     • Shuffling the graph changes the measure
     • Correlations related to semantics correspond to a region in the graph

  18. AGENDA • Properties of folksonomy hypergraphs • Network of tag co-occurrences • Clustering of resources

  19. Clustering and community detection. Folksonomies: complex tripartite networks (tag, user, resource). Clustering detection can reveal:
     • sub-sets of users (social communities)
     • sub-sets of tags (semantic frames, jargons…)
     • sub-sets of resources (social classification)
     • other…
     Now we focus on clustering of resources using only tag assignments.

  20. Resource similarity network. Weighted network. How to choose weights? How to take into account tag frequency?

  21. Tag clouds for resources. Each resource is characterised by a tag cloud: tags are assigned by users, and appear with different frequencies.

  22. Similarity metrics. Tag frequencies: global frequency and relative frequencies. TF/IDF-like weighting procedure. STATEMENT: Resources sharing “rare” tags are closely related. [figure labels: T1, T2, K]
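The slide names a "TF/IDF-like weighting procedure" without giving its formula, so the following is only a generic sketch of that family of weightings, not the presentation's actual metric. It assumes that the relative frequency of a tag within a resource plays the role of TF, that the logarithm of (number of resources / number of resources carrying the tag) plays the role of IDF, and that similarity is the cosine of the weighted vectors. All names and tag counts are hypothetical.

```python
import math


def tfidf_vectors(tag_counts):
    """tag_counts maps resource -> {tag: count}. Returns resource -> {tag: weight}
    with weight = relative frequency * log(N / number of resources with the tag)."""
    n_resources = len(tag_counts)
    doc_freq = {}
    for counts in tag_counts.values():
        for tag in counts:
            doc_freq[tag] = doc_freq.get(tag, 0) + 1
    vectors = {}
    for res, counts in tag_counts.items():
        total = sum(counts.values())
        vectors[res] = {
            tag: (c / total) * math.log(n_resources / doc_freq[tag])
            for tag, c in counts.items()
        }
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(tag, 0.0) for tag, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical tag clouds: "web" is on every resource, the other tags are rarer.
tag_counts = {
    "r1": {"web": 3, "typography": 2},
    "r2": {"web": 1, "typography": 4},
    "r3": {"web": 5, "election": 1},
    "r4": {"web": 2, "election": 3},
}
vec = tfidf_vectors(tag_counts)
sim_rare = cosine(vec["r1"], vec["r2"])    # share the rarer tag "typography"
sim_common = cosine(vec["r1"], vec["r3"])  # share only the ubiquitous tag "web"
```

With this weighting a tag carried by every resource gets weight zero, so two resources that share only such a tag are not considered similar, which matches the slide's statement that sharing "rare" tags is what signals relatedness.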

  23. Case study. Sample of 400 resources: 200 resources tagged with “design” and 200 resources tagged with “politics”. Does the similarity network show two clusters? Finer structure? Subclusters?

  24. Similarity matrix W = { w’ }. Broad variability of similarity strengths on a logarithmic scale. A small power (0.1) is used as an effective way to deal with vanishing weights. TASK: Find column and row permutations that uncover a block structure. [figure: distribution P(w) versus w]

  25. Spectral analysis. First non-trivial eigenvalues. Reference: A. Capocci, V.D.P. Servedio, G. Caldarelli and F. Colaiori, Physica A 352, 669 (2005), and many others. [figure labels: Q; eigenvalues; “Laplacian” matrix]

  26. Cluster identification. Correlation of homologous components reveals cluster structure: V2 = {v_{2,1}, v_{2,2}, ..., v_{2,n}}, V3 = {v_{3,1}, v_{3,2}, ..., v_{3,n}}, V4 = {v_{4,1}, v_{4,2}, ..., v_{4,n}}; each resource i is described by [v_{2,i}; v_{3,i}; v_{4,i}]. [figure: reordered matrix with “politics” and “design” blocks; labels 2, 4, 3, 1]

  27. Cooperative classification. Tag clouds of the six identified clusters of resources. [figure: tag clouds; recoverable labels: “humor” in politics, visual design, news in politics, web design]

  28. Conclusions and outlook
     • Folksonomies are the way people are building the information and communication systems of our future.
     • Folksonomies are a laboratory to study human/social/semiotic dynamics.
     • A folksonomy is a growing tripartite network, whose nodes are users, resources and metadata (tags), while (hyper)links are annotation events (note that this structure is similar to search queries: user, search string, resource retrieved).
     • The statistical structure of folksonomies reveals many complex features, typical of interacting humans.
     • Projections of a folksonomy on different spaces can be useful to study: spam infection; semantics of tags; emerging resource classification.
