
Towards Effective Browsing of Large Scale Social Annotations WWW 2007

Rui Li, Shenghua Bao, Yong Yu, Zhong Su, and Ben Fei. Shanghai Jiao Tong University; IBM China Research Lab. Advisor: Hsin-Hsi Chen. Reporter: Y.H Chang. 2008-06-06.


Presentation Transcript


  1. Towards Effective Browsing of Large Scale Social Annotations (WWW 2007). Rui Li, Shenghua Bao, Yong Yu, Zhong Su, and Ben Fei. Shanghai Jiao Tong University; IBM China Research Lab. Advisor: Hsin-Hsi Chen. Reporter: Y.H Chang. 2008-06-06.

  2. Outline
     • Introduction
     • ELSABer overview
     • Components of ELSABer
     • Enhanced models
     • Experimental results
     • Conclusion

  3. Introduction
     • Today, many services (e.g., Del.icio.us, Flickr) help users manage and share their favorite URLs and photos based on social annotations.
     • How to effectively find desired resources in large annotation data is a new problem.
     • In this paper, we propose a novel algorithm, namely Effective Large Scale Annotation Browser (ELSABer), to browse large-scale social annotation data.

  4. Introduction
     • ELSABer helps users browse a huge number of annotations in a semantic, hierarchical, and efficient way.
     • By incorporating personal and time information, ELSABer can be further extended to personalized and time-related browsing.

  5. The prototype system based on ELSABer
     [Screenshot: a set of pages related to the current annotation “programming”, alongside the sub-tags (sub-categories) of “programming”.]

  6. ELSABer overview
     • Input: an empty concept set SC
     • Step 1: Output the initial view of annotations
        • Generate the top 100 tags from the 2000 most frequently annotated URLs and the 2000 most frequent tags.
        • These tags are the roots of the hierarchical browsing.
     • Loop: the user selects a tag Ti
     • Step 2: Concept matching
        • Add tag Ti to the set SC
        • Calculate the related tag set and URL set
     • Step 3 (optional): Sample the URL set and the tag set
     • Step 4: Hierarchical browsing
        • 4-1: Calculate candidate sub-tags
        • 4-2: Rank the sub-tags by Infor score
     • If the termination condition is satisfied, return; otherwise, loop

  7. Components of ELSABer
     • Data setup and representation
     • Semantic Browsing
        • a. Annotation Similarity Estimation
        • b. Generating the Semantic Concept
     • Hierarchical Browsing
        • c. Sub-Tag Generation
        • d. Sub-Tag Clustering
     • Efficient Browsing

  8. Data setup and representation
     • Data: Del.icio.us (May 2006)
     • We define an annotation as a quadruple: (User, URL, Tag, Time).
     • Associated matrix M (m × n)
        • m and n are the total numbers of tags and URLs, respectively.
        • |URL(ti)| is the number of URLs annotated with tag ti.
        • Cij denotes the number of users who annotate the jth URL with the ith tag.
        • The entry weighting is like TF-IDF in IR.
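As an illustration of the representation above, the following Python sketch builds the matrix from (User, URL, Tag, Time) quadruples. The slide only says the weighting is like TF-IDF, so the weight used here, c_ij * log(1 + n / |URL(t_i)|), is an assumption, as are the function name `build_matrix` and the toy data.

```python
import math
from collections import defaultdict

def build_matrix(annotations):
    """Build a tag-by-URL matrix from (user, url, tag, time) quadruples.

    ASSUMPTION: the weight c_ij * log(1 + n / |URL(t_i)|) is only one
    TF-IDF-like choice; the slides do not give the exact formula.
    """
    users = defaultdict(set)  # (tag, url) -> set of users who made that annotation
    for user, url, tag, _time in annotations:
        users[(tag, url)].add(user)
    tags = sorted({t for t, _ in users})
    urls = sorted({u for _, u in users})
    n = len(urls)
    # |URL(t_i)|: number of distinct URLs annotated with tag t_i
    url_count = {t: sum(1 for (tt, _) in users if tt == t) for t in tags}
    matrix = [
        [len(users.get((t, u), ())) * math.log(1 + n / url_count[t]) for u in urls]
        for t in tags
    ]
    return tags, urls, matrix

# Toy data (hypothetical).
annotations = [
    ("alice", "u1", "python", 1),
    ("bob", "u1", "python", 2),
    ("alice", "u2", "python", 3),
    ("alice", "u1", "code", 4),
]
tags, urls, M = build_matrix(annotations)
```

Each row of `M` is then the vector of a tag and each column the vector of a URL.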

  9. Data setup and representation
     • Given the associated matrix M (m × n), with rows indexed by the tags T1, T2, …, Tm and columns indexed by the URLs U1, U2, …, Un:
        • a tag can be represented as a row vector Ti = (U1, U2, …, Un) of M;
        • a URL can be represented as a column vector Ui = (t1, t2, …, tm) of M.

  10. Semantic Browsing: a. Annotation Similarity Estimation
     • Similarity:
     • Special case 1 (stemming), e.g., “Programs” and “Programming”: add 0.1 to the weight
     • Special case 2 (punctuation), e.g., “Web-dev” and “WebDev”: add 0.08 to the weight
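The two special cases can be sketched in Python as bonuses added to a base similarity between tag vectors. The base formula is not preserved in the transcript, so cosine similarity is used here purely as an assumption. The names `cosine`, `strip_punct`, and `similarity` are hypothetical, and the stemmer is left as a caller-supplied function rather than tied to a specific library.

```python
import math
import re

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (ASSUMED base measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def strip_punct(tag):
    """Lowercase a tag and drop everything except letters and digits."""
    return re.sub(r"[^0-9a-z]", "", tag.lower())

def similarity(tag_a, vec_a, tag_b, vec_b, stem=None):
    """Base similarity plus the 0.1 (stemming) and 0.08 (punctuation) bonuses."""
    score = cosine(vec_a, vec_b)
    if stem is not None and stem(tag_a.lower()) == stem(tag_b.lower()):
        score += 0.1   # special case 1: same stem
    if strip_punct(tag_a) == strip_punct(tag_b):
        score += 0.08  # special case 2: differ only in punctuation or case
    return score

# Toy stemmer for the example only: keep the first 7 characters.
toy_stem = lambda t: t[:7]
s_stem = similarity("Programs", [1, 0], "Programming", [0, 1], stem=toy_stem)
s_punct = similarity("Web-dev", [1, 0], "WebDev", [1, 0])
```

The sketch does not cap the result at 1.0; whether the original does is not stated on the slide.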

  11. Semantic Browsing: b. Generating the Semantic Concept
     • Given the selected tag ti, we choose a tag set STi that is most related to ti by the following rules:
        • 1. tj should be among the N most similar tags related to ti.
        • 2. The similarity should be larger than a threshold θ.
     • N = 4, θ = 0.7
     • Semantic concept: Ci = STi ∪ {ti}
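The two selection rules can be sketched in Python as follows. The function name `semantic_concept`, the input format (a mapping from each candidate tag to its similarity with ti), and the toy similarity values are assumptions made for illustration.

```python
def semantic_concept(tag, sims, n=4, theta=0.7):
    """Return C_i = ST_i ∪ {t_i}.

    sims maps each candidate tag t_j to its similarity with `tag`.
    A candidate is kept if it is among the n most similar tags AND
    its similarity is larger than theta.
    """
    ranked = sorted(((s, t) for t, s in sims.items() if t != tag), reverse=True)
    related = {t for s, t in ranked[:n] if s > theta}
    return related | {tag}

# Toy similarities (hypothetical values).
sims = {
    "coding": 0.9,
    "development": 0.8,
    "java": 0.75,
    "code": 0.72,
    "software": 0.71,  # above theta, but only 5th most similar
    "music": 0.1,
}
concept = semantic_concept("programming", sims)
```

Here "software" passes the threshold but is dropped because only the N = 4 most similar tags are considered.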

  12. Semantic Browsing: b. Generating the Semantic Concept
     • The path of the user's clicks t1, t2, …, tL yields a sequence of concepts C1, C2, …, CL.
     • Let the concept set be SC = {C1, C2, …, CL}.
     • The related URLs:
        • ReURL(SC) = {u | ∀C ∈ SC, T(u) ∩ C ≠ ∅}
        • T(u) is the set of annotations given to URL u.
     • The related tags are defined as all the tags given to the URLs in ReURL(SC):
        • ReTag(SC) = {t | u ∈ ReURL(SC), t ∈ T(u)}
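The definitions of ReURL and ReTag translate directly into set operations. In this minimal Python sketch, `tags_of` plays the role of T(u); the function names and the toy data are assumptions.

```python
def related_urls(concepts, tags_of):
    """ReURL(SC): URLs whose tag set intersects every concept in SC."""
    return {u for u, tags in tags_of.items() if all(tags & c for c in concepts)}

def related_tags(concepts, tags_of):
    """ReTag(SC): all tags given to any URL in ReURL(SC)."""
    urls = related_urls(concepts, tags_of)
    return set().union(*(tags_of[u] for u in urls))

# Toy data (hypothetical): T(u) for three URLs, and a two-click concept set.
tags_of = {
    "u1": {"programming", "java", "tutorial"},
    "u2": {"coding", "python"},
    "u3": {"music", "java"},
}
SC = [{"programming", "coding"}, {"java"}]
re_urls = related_urls(SC, tags_of)
re_tags = related_tags(SC, tags_of)
```

Each additional concept in SC can only shrink ReURL(SC), which matches the later observation that the related set narrows as the user keeps clicking.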

  13. Hierarchical Browsing: c. Sub-Tag Generation
     • If the URLs shared by ti and tj make up most of the URLs of ti but only a small part of the URLs of tj, we can infer that ti is a sub-tag of tj.
     [Figure: 40 related tags of “google”]
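The intuition above can be made concrete with two coverage ratios over the URL sets of the two tags. This is a minimal Python sketch; the function name `coverage` and the toy URL sets are assumptions, and the actual sub-tag decision in ELSABer is made by a learned classifier rather than by fixed thresholds on these ratios.

```python
def coverage(urls_i, urls_j):
    """Return (share of t_i's URLs also tagged t_j, share of t_j's URLs also tagged t_i)."""
    shared = urls_i & urls_j
    cover_i = len(shared) / len(urls_i) if urls_i else 0.0
    cover_j = len(shared) / len(urls_j) if urls_j else 0.0
    return cover_i, cover_j

# Toy data (hypothetical): every "gmail" URL is also a "google" URL,
# but those URLs are only a quarter of all "google" URLs.
gmail_urls = {1, 2}
google_urls = {1, 2, 3, 4, 5, 6, 7, 8}
cover_gmail, cover_google = coverage(gmail_urls, google_urls)
```

A high first ratio together with a low second ratio is the pattern that suggests ti is a sub-tag of tj.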

  14. Hierarchical Browsing: c. Sub-Tag Generation
     • U(ti) denotes the number of URLs tagged with ti.

  15. Hierarchical Browsing: c. Sub-Tag Generation
     • Given the features above, each related tag is represented as a feature vector. A decision tree is trained with C4.5 on a manually labeled data set to predict the sub-tag relations.

  16. Hierarchical Browsing: d. Sub-Tag Clustering

  17. Hierarchical Browsing: d. Sub-Tag Clustering
     • Infor(t) = w1·TFIDF(t) + w2·ICS(t) + w3·TE(t)
     • ICS: Intra-Cluster Similarity, where ot denotes the centroid of all the URLs associated with the tag
     • TE: Tag Entropy
     • In our experiment, the weights w1, w2, and w3 are 0.58, 0.27, and 0.13, respectively.
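Ranking sub-tags by the Infor score is a weighted sum followed by a sort. In this Python sketch the three component scores are taken as given inputs, because the formulas for ICS and TE are not preserved in the transcript. The function names and the toy feature values are assumptions.

```python
# Weights reported on the slide for TFIDF, ICS, and TE.
W_TFIDF, W_ICS, W_TE = 0.58, 0.27, 0.13

def infor(tfidf, ics, te):
    """Infor(t) = w1*TFIDF(t) + w2*ICS(t) + w3*TE(t)."""
    return W_TFIDF * tfidf + W_ICS * ics + W_TE * te

def rank_sub_tags(features):
    """Sort tags by descending Infor score.

    features maps tag -> (tfidf, ics, te); the component scores are assumed
    to be precomputed elsewhere.
    """
    return sorted(features, key=lambda t: infor(*features[t]), reverse=True)

# Toy component scores (hypothetical values).
features = {
    "java": (0.9, 0.5, 0.2),
    "python": (0.4, 0.9, 0.9),
    "misc": (0.1, 0.1, 0.1),
}
ranking = rank_sub_tags(features)
```

With these weights the TFIDF component dominates, so "java" ranks above "python" despite lower ICS and TE values.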

  18. Efficient Browsing
     • Observation: People use popular tags to annotate URLs, and the popular URLs are annotated by the majority of tags.

  19. Efficient Browsing
     • So we can get good results efficiently by running our algorithm in a small tagging sub-space.
     • In our experiment, we sample the 2000 most frequently annotated URLs and the 2000 most frequent tags, so the size of M is 2000 × 2000.
     • We do not cut off the “long tail”.
     • After a sequence of clicks, the user's intention becomes more specific, which decreases the number of related URLs and related tags.
     • When the number is less than 2000, all the tags and URLs are used in the calculation.
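The sampling step can be sketched in Python with frequency counts over the annotation quadruples. The function name `sample_space` and the toy data are assumptions; the default k = 2000 follows the slide.

```python
from collections import Counter

def sample_space(annotations, k=2000):
    """Keep the k most frequently annotated URLs and the k most frequent tags.

    annotations is a list of (user, url, tag, time) quadruples. When fewer
    than k distinct URLs or tags exist, all of them are kept.
    """
    url_freq = Counter(url for _user, url, _tag, _time in annotations)
    tag_freq = Counter(tag for _user, _url, tag, _time in annotations)
    top_urls = [u for u, _ in url_freq.most_common(k)]
    top_tags = [t for t, _ in tag_freq.most_common(k)]
    return top_urls, top_tags

# Toy data (hypothetical).
annotations = [
    ("alice", "u1", "python", 1),
    ("bob", "u1", "python", 2),
    ("carol", "u1", "code", 3),
    ("alice", "u2", "python", 4),
]
top_urls, top_tags = sample_space(annotations, k=1)
all_urls, all_tags = sample_space(annotations)
```

Because `Counter.most_common(k)` returns every entry when fewer than k exist, the same call covers the case where the related set has already shrunk below 2000.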

  20. Enhanced Models
     • User's profile:
        • The annotations and resources a user is interested in can be found as follows:
        • Ri denotes the vector representation of a resource, and Ti denotes the vector representation of Ai.
     • Adjust the sampling and ranking algorithms according to the user's preference:
        • Infor(t, U) = α × Infor(t) + β × UI(t | P(U))

  21. Enhanced Models
     • Given the user-required time interval TI = [ts, te], we define the match between the URL's time sequence TS and the user-required time interval TI as follows:
     • θ = 0.5

  22. Experiment results
     • The scale of the dataset:
     • Machine: Intel Pentium IV 3.0 GHz, 1 GB memory, 2 processors
     • Language: Java
     • The Lucene API is also used to build the URL and tag indexes.

  23. Experiment results
     [Figure legend: red tags are owned by the user; orange tags are recommended.]

  24. Experiment results

  25. Conclusion
     • Our main contributions:
        • The proposal of an effective algorithm, ELSABer, based on an analysis of the characteristics of social annotations.
        • The proposal of enhanced models for personalized and time-related browsing.

  26. Future work
     • Conduct more user studies.
     • Focus on how to find more qualified URL resources.
     • Use existing hierarchical structures such as ODP and WordNet to help construct more meaningful hierarchies for social annotations.

  27. Thank you!!
