Towards Effective Browsing of Large Scale Social Annotations • WWW 2007 • Rui Li, Shenghua Bao, Yong Yu, Zhong Su, and Ben Fei • Shanghai JiaoTong University; IBM China Research Lab • Advisor: Hsin-Hsi Chen • Reporter: Y.H Chang • 2008-06-06
Outline • Introduction • ELSABer overview • Components of ELSABer • Enhanced models • Experimental results • Conclusion
Introduction • Today, many services (e.g., Del.icio.us, Flickr) help users manage and share their favorite URLs and photos through social annotations. • Effectively finding desired resources in large volumes of annotation data is a new problem. • In this paper, we propose a novel algorithm, the Effective Large Scale Annotation Browser (ELSABer), for browsing large-scale social annotation data.
Introduction • ELSABer helps users browse a huge number of annotations in a semantic, hierarchical, and efficient way. • By incorporating personal and time information, ELSABer can be further extended to support personalized and time-related browsing.
The prototype system based on ELSABer [Screenshot: the set of pages related to the current annotation “programming”, and the sub-tags (subcategories) of “programming”]
ELSABer overview • Input: an empty concept set SC • Step 1: Output the initial view of annotations • Generate the top 100 tags from the 2000 most frequently annotated URLs and the 2000 most frequent tags. • These tags are the roots for hierarchical browsing. • Loop: the user selects a tag ti • Step 2: Concept matching • Add tag ti to the set SC • Calculate the related tag set and URL set • Step 3 (optional): Sample the URL set and the tag set • Step 4: Hierarchical browsing • 4-1 Calculate candidate sub-tags • 4-2 Rank the sub-tags by Infor score • IF the termination condition is satisfied: return • ELSE: loop
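The browsing loop above can be sketched in Python as a small skeleton. This is not the authors' implementation: the callbacks `select_tag`, `concept_of`, and `ranked_sub_tags` are hypothetical stand-ins for user interaction, concept matching (Step 2), and optional sampling plus sub-tag ranking (Steps 3 and 4). The termination condition (the user stops, no sub-tags remain, or a hypothetical `max_steps` cap) is an assumption, since the slide does not specify it.

```python
from typing import Callable, List, Optional, Set


def browse(
    roots: List[str],
    select_tag: Callable[[List[str]], Optional[str]],
    concept_of: Callable[[str], Set[str]],
    ranked_sub_tags: Callable[[List[Set[str]]], List[str]],
    max_steps: int = 10,
) -> List[Set[str]]:
    """Skeleton of the browsing loop; returns the final concept set SC."""
    sc: List[Set[str]] = []          # Input: an empty concept set SC
    view = roots                      # Step 1: initial view (root tags)
    for _ in range(max_steps):        # hypothetical cap on the number of clicks
        tag = select_tag(view)        # Loop: the user selects a tag t_i
        if tag is None:               # assumed termination: the user stops
            break
        sc.append(concept_of(tag))    # Step 2: concept matching adds C_i to SC
        view = ranked_sub_tags(sc)    # Steps 3-4: optional sampling, then ranked sub-tags
        if not view:                  # assumed termination: nothing left to expand
            break
    return sc


# Usage with toy callbacks: the "user" always clicks the first tag shown.
hierarchy = {"programming": ["python", "java"], "python": ["django"], "django": []}
path = browse(
    roots=["programming", "design"],
    select_tag=lambda view: view[0] if view else None,
    concept_of=lambda t: {t},
    ranked_sub_tags=lambda sc: hierarchy.get(next(iter(sc[-1])), []),
)
```

Passing the steps in as callbacks keeps the control flow of the slide visible while leaving each component open to any concrete implementation.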
Components of ELSABer • Data setup and representation • Semantic Browsing • a. Annotation Similarity Estimation • b. Generating the Semantic Concept • Hierarchical Browsing • c. Sub-Tag Generation • d. Sub-Tag Clustering • Efficient Browsing
Data setup and representation • Data set: Del.icio.us (May 2006) • We define an annotation as a quadruple (User, URL, Tag, Time). • Associated matrix M (m × n), where m and n are the total numbers of tags and URLs, respectively • |URL(ti)| denotes the number of URLs annotated with tag ti. • Cij denotes the number of users who annotate the jth URL with the ith tag. • The weighting of M's entries is similar to TF-IDF in IR.
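Data setup and representation • Given the associated matrix $M_{m \times n}$, whose rows correspond to tags and whose columns correspond to URLs:

$$M_{m \times n} = \begin{array}{c|cccc} & U_1 & U_2 & \cdots & U_n \\ \hline T_1 & M_{11} & M_{12} & \cdots & M_{1n} \\ T_2 & M_{21} & M_{22} & \cdots & M_{2n} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ T_m & M_{m1} & M_{m2} & \cdots & M_{mn} \end{array}$$

• A tag $t_i$ is represented as the row vector $T_i = (M_{i1}, M_{i2}, \ldots, M_{in})$ of M over $(U_1, U_2, \ldots, U_n)$. • A URL $u_j$ is represented as the column vector $U_j = (M_{1j}, M_{2j}, \ldots, M_{mj})$ of M over $(t_1, t_2, \ldots, t_m)$.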
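A minimal sketch of building the associated matrix from (User, URL, Tag, Time) quadruples. The slide's exact weighting formula is not reproduced, so this code assumes a TF-IDF-style entry M[i][j] = Cij × log(n / |URL(ti)|); the names `build_matrix` and `url_vector` are hypothetical. Rows of the returned matrix are the tag vectors Ti, and columns are the URL vectors Uj.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Annotation = Tuple[str, str, str, str]  # (user, url, tag, time)


def build_matrix(annotations: Iterable[Annotation]) -> Tuple[List[str], List[str], List[List[float]]]:
    """Return (tags, urls, M); row i is the vector of tag i, column j the vector of URL j."""
    users: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for user, url, tag, _time in annotations:
        users[(tag, url)].add(user)  # distinct users who gave `tag` to `url`
    tags = sorted({t for t, _ in users})
    urls = sorted({u for _, u in users})
    row = {t: i for i, t in enumerate(tags)}
    col = {u: j for j, u in enumerate(urls)}
    m, n = len(tags), len(urls)

    counts = [[0] * n for _ in range(m)]  # C_ij
    for (tag, url), who in users.items():
        counts[row[tag]][col[url]] = len(who)

    M = [[0.0] * n for _ in range(m)]
    for i in range(m):
        url_count = sum(1 for c in counts[i] if c > 0)  # |URL(t_i)|, always >= 1
        idf = math.log(n / url_count)                    # ASSUMED IDF-like factor
        M[i] = [c * idf for c in counts[i]]
    return tags, urls, M


def url_vector(M: List[List[float]], j: int) -> List[float]:
    """Column j of M: the representation of URL u_j over all tags."""
    return [M[i][j] for i in range(len(M))]
```

Under this assumed weighting, a tag that annotates every URL gets an all-zero row, just as a term that occurs in every document gets zero IDF.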
Semantic Browsing: a. Annotation Similarity Estimation • Similarity: [formula not reproduced] • Special case 1 (stemming), e.g., Programs & Programming: add 0.1 weight • Special case 2 (punctuation), e.g., Web-dev & WebDev: add 0.08 weight
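The similarity estimation with its two special cases could look like the sketch below. It assumes the base similarity is the cosine of the two tag row vectors (the slide's formula is not shown) and uses a hypothetical toy stemmer, `toy_stem`, in place of a real one such as Porter's; only the 0.1 and 0.08 bonuses come from the slide. Case-folding during punctuation normalization and applying at most one bonus per tag pair are also assumptions.

```python
import math
import re
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def normalize(tag: str) -> str:
    """Lower-case and drop punctuation/underscores, e.g. 'Web-dev' -> 'webdev'."""
    return re.sub(r"[\W_]", "", tag.lower())


def toy_stem(word: str) -> str:
    """HYPOTHETICAL crude stemmer (a real system might use the Porter stemmer)."""
    if word.endswith("ing") and len(word) > 5:
        word = word[:-3]
        if word[-1] == word[-2]:      # undouble: 'programm' -> 'program'
            word = word[:-1]
    elif word.endswith("s") and not word.endswith("ss") and len(word) > 3:
        word = word[:-1]
    return word


def tag_similarity(t1: str, v1: Sequence[float], t2: str, v2: Sequence[float]) -> float:
    sim = cosine(v1, v2)                # ASSUMED base similarity of the two tag row vectors
    if t1 == t2:
        return sim
    n1, n2 = normalize(t1), normalize(t2)
    if n1 == n2:                        # special case 2: differ only by punctuation (and case)
        sim += 0.08
    elif toy_stem(n1) == toy_stem(n2):  # special case 1: same (toy) stem
        sim += 0.1
    return sim
```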
Semantic Browsing: b. Generating the Semantic Concept • Given the selected tag ti, we choose the tag set STi most related to ti; each tj in STi must satisfy the following rules: • 1. tj is among the N tags most similar to ti • 2. The similarity between ti and tj is larger than a threshold θ • N = 4, θ = 0.7 • The semantic concept is Ci = STi ∪ {ti}
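With N = 4 and θ = 0.7, concept generation reduces to a top-N filter followed by a threshold test. In this sketch, `semantic_concept` is a hypothetical name and `sims` is a hypothetical precomputed mapping from each candidate tag tj to its similarity with ti.

```python
from typing import Dict, Set


def semantic_concept(t_i: str, sims: Dict[str, float], n: int = 4, theta: float = 0.7) -> Set[str]:
    """Return C_i = ST_i ∪ {t_i}.

    `sims` (hypothetical input) maps each candidate tag t_j to its similarity with t_i.
    """
    candidates = [t for t in sims if t != t_i]
    top_n = sorted(candidates, key=lambda t: sims[t], reverse=True)[:n]  # rule 1: N most similar
    st_i = {t for t in top_n if sims[t] > theta}                         # rule 2: above threshold
    return st_i | {t_i}
```

Both rules must hold, so a tag with high similarity can still be excluded if four other tags are more similar.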
Semantic Browsing: b. Generating the Semantic Concept • The user's click path t1, t2, …, tL yields a sequence of concepts C1, C2, …, CL. • Let the concept set be SC = {C1, C2, …, CL}. • The related URLs are ReURL(SC) = {u | ∀C ∈ SC, T(u) ∩ C ≠ ∅}, where T(u) denotes the set of tags (annotations) given to URL u. • The related tags are all the tags given to the URLs in ReURL(SC): ReTag(SC) = {t | u ∈ ReURL(SC), t ∈ T(u)}
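ReURL and ReTag translate directly into set comprehensions. The function names and the `tags_of` dictionary (mapping each URL u to T(u)) are hypothetical choices for this sketch. With an empty SC, the universal condition holds vacuously, so every URL counts as related.

```python
from typing import Dict, List, Set


def re_url(sc: List[Set[str]], tags_of: Dict[str, Set[str]]) -> Set[str]:
    """ReURL(SC): URLs whose tag set T(u) intersects every concept C in SC."""
    return {u for u, t_u in tags_of.items() if all(t_u & c for c in sc)}


def re_tag(sc: List[Set[str]], tags_of: Dict[str, Set[str]]) -> Set[str]:
    """ReTag(SC): all tags given to any URL in ReURL(SC)."""
    return {t for u in re_url(sc, tags_of) for t in tags_of[u]}
```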
Hierarchical Browsing: c. Sub-Tag Generation • If the URLs shared by ti and tj make up most of ti's URLs but only a small part of tj's URLs, we can infer that ti is a sub-tag of tj. [Figure: 40 related tags of “google”]
Hierarchical Browsing: c. Sub-Tag Generation • Features: [formulas not reproduced], where U(ti) denotes the number of URLs tagged with ti
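The intuition behind sub-tag detection can be illustrated with two overlap ratios: the share of ti's URLs that tj also has, and the share of tj's URLs that ti also has. These two features and the `high`/`low` thresholds are hypothetical illustrations of the rule stated on the slide, not the paper's feature set (whose formulas are not reproduced); in ELSABer the final decision comes from a learned decision tree rather than fixed thresholds.

```python
from typing import Dict, Set, Tuple


def overlap_features(t_i: str, t_j: str, urls_of: Dict[str, Set[str]]) -> Tuple[float, float]:
    """HYPOTHETICAL features: shared URLs as a fraction of t_i's URLs and of t_j's URLs."""
    u_i, u_j = urls_of[t_i], urls_of[t_j]
    shared = len(u_i & u_j)
    return shared / len(u_i), shared / len(u_j)


def looks_like_sub_tag(t_i: str, t_j: str, urls_of: Dict[str, Set[str]],
                       high: float = 0.8, low: float = 0.3) -> bool:
    """Rule of thumb from the slide: shared URLs cover most of t_i but little of t_j.
    The thresholds `high` and `low` are hypothetical."""
    cover_i, cover_j = overlap_features(t_i, t_j, urls_of)
    return cover_i >= high and cover_j <= low
```

The test is deliberately asymmetric: swapping ti and tj swaps the two ratios, so a tag cannot be a sub-tag of its own sub-tag under this rule.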
Hierarchical Browsing: c. Sub-Tag Generation • Given the features above, each related tag is represented as a feature vector. A decision tree is learned with C4.5 from a manually labeled data set to predict sub-tag relations.
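C4.5 selects tree splits by gain ratio. The sketch below shows only a simplified version of that split-selection step for a single continuous feature (for example, an overlap ratio between two tags); it omits the rest of C4.5 (recursive tree growth, pruning, missing-value handling, and Quinlan's refinements for continuous thresholds). The function names are hypothetical.

```python
import math
from collections import Counter
from typing import Optional, Sequence, Tuple


def entropy(labels: Sequence[object]) -> float:
    """Shannon entropy (bits) of a label sequence."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def best_split(xs: Sequence[float], ys: Sequence[bool]) -> Optional[Tuple[float, float]]:
    """Return (threshold, gain_ratio) for the best binary test x <= threshold, or None."""
    pairs = sorted(zip(xs, ys))
    n = len(pairs)
    base = entropy([y for _, y in pairs])
    best: Optional[Tuple[float, float]] = None
    for k in range(1, n):
        if pairs[k - 1][0] == pairs[k][0]:
            continue  # no threshold separates equal feature values
        threshold = (pairs[k - 1][0] + pairs[k][0]) / 2   # midpoint candidate
        left = [y for _, y in pairs[:k]]
        right = [y for _, y in pairs[k:]]
        remainder = (k / n) * entropy(left) + ((n - k) / n) * entropy(right)
        gain = base - remainder                            # information gain
        split_info = entropy([0] * k + [1] * (n - k))      # > 0: both sides non-empty
        ratio = gain / split_info                          # gain ratio
        if best is None or ratio > best[1]:
            best = (threshold, ratio)
    return best
```

Dividing by the split information penalizes highly unbalanced splits that would otherwise look attractive under plain information gain.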
Hierarchical Browsing: d. Sub-Tag Clustering
Hierarchical Browsing: d. Sub-Tag Clustering • Infor(t) = w1·TFIDF(t) + w2·ICS(t) + w3·TE(t) • Intra-Cluster Similarity (ICS): [formula not reproduced], where ot denotes the centroid of all the URLs associated with tag t • Tag Entropy (TE): [formula not reproduced] • In our experiment, the weights w1, w2, and w3 are 0.58, 0.27, and 0.13, respectively.
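The Infor score is a weighted sum using the slide's weights. Because the ICS and TE formulas are not reproduced, this sketch assumes ICS(t) is the mean cosine similarity of the tag's URL vectors to their centroid ot, and takes TFIDF(t) and TE(t) as precomputed inputs. The function names and the toy scores in the usage lines are hypothetical.

```python
import math
from typing import List, Sequence

W1, W2, W3 = 0.58, 0.27, 0.13  # weights for TFIDF, ICS, TE reported on the slide


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def intra_cluster_similarity(url_vectors: List[Sequence[float]]) -> float:
    """ASSUMED ICS(t): mean cosine of the tag's URL vectors to their centroid o_t."""
    k = len(url_vectors)
    dim = len(url_vectors[0])
    centroid = [sum(v[d] for v in url_vectors) / k for d in range(dim)]  # o_t
    return sum(cosine(v, centroid) for v in url_vectors) / k


def infor(tfidf: float, ics: float, te: float) -> float:
    """Infor(t) = w1*TFIDF(t) + w2*ICS(t) + w3*TE(t)."""
    return W1 * tfidf + W2 * ics + W3 * te


# Usage: rank candidate sub-tags by their (toy) Infor scores.
scores = {"python": infor(0.9, 0.8, 0.5), "misc": infor(0.2, 0.1, 0.9)}
ranked = sorted(scores, key=scores.get, reverse=True)
```

Under this assumed ICS, a tag whose URLs point in similar directions (a coherent cluster) scores higher than one whose URLs are scattered.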
Efficient Browsing • Observation: people use popular tags to annotate URLs, and popular URLs are annotated with the majority of tags.
Efficient Browsing • Therefore, we can obtain good results efficiently by running the algorithm on a small sampled tagging subspace. • In our experiment, we sample the 2000 most frequently annotated URLs and the 2000 most frequent tags, so M is 2000 × 2000. • (We do not cut off the “long tail”.) • After a sequence of clicks, the user's intention becomes more specific, which decreases the number of related URLs and related tags. • When this number falls below 2000, all related tags and URLs are used in the computation.
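The sampling rule (keep the 2000 most frequent items, but keep everything once at most 2000 remain) can be sketched with `heapq.nlargest`. `sample_top` is a hypothetical name; the same function would be applied to both the URL and the tag frequency tables.

```python
import heapq
from typing import Dict, List


def sample_top(freq: Dict[str, int], k: int = 2000) -> List[str]:
    """Keep the k most frequent items; keep everything when at most k items remain."""
    if len(freq) <= k:
        return list(freq)
    return heapq.nlargest(k, freq, key=lambda item: freq[item])
```

`heapq.nlargest` avoids fully sorting a large frequency table when only the top k entries are needed.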
Enhanced Models • User's profile: • The annotations and resources the user is interested in can be found as follows: [formula not reproduced], where Ri denotes the vector representation of a resource and Ti denotes the vector representation of Ai. • The sampling and ranking algorithms are adjusted according to the user's preference: • Infor(t, U) = α × Infor(t) + β × UI(t | P(U))
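A sketch of the personalized score Infor(t, U). The slide does not reproduce how P(U) or UI is computed, so this code assumes P(U) is the element-wise sum of the user's annotation vectors Ti and UI(t | P(U)) is the cosine similarity between the tag vector and P(U); the function names and the default values α = 0.7 and β = 0.3 are hypothetical.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def user_profile(annotation_vectors: List[Sequence[float]]) -> List[float]:
    """HYPOTHETICAL P(U): element-wise sum of the vectors T_i of the user's annotations A_i."""
    dim = len(annotation_vectors[0])
    return [sum(v[d] for v in annotation_vectors) for d in range(dim)]


def personalized_infor(infor_t: float, tag_vec: Sequence[float], profile: Sequence[float],
                       alpha: float = 0.7, beta: float = 0.3) -> float:
    """Infor(t, U) = alpha * Infor(t) + beta * UI(t | P(U)), with UI ASSUMED to be cosine."""
    return alpha * infor_t + beta * cosine(tag_vec, profile)
```

With this form, a tag unrelated to the profile keeps only the α-scaled generic score, while a tag aligned with the profile gains up to β.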
Enhanced Models • Given the user-required time interval TI = [ts, te], we define the match between a URL's time sequence TS and TI as follows: [formula not reproduced], with θ = 0.5.
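The time-match formula is not reproduced on the slide. One plausible reading, used here purely as an assumption, is that a URL matches TI when the fraction of its annotation timestamps inside [ts, te] is at least θ = 0.5; `time_match` is a hypothetical name.

```python
from datetime import date
from typing import Sequence


def time_match(ts: Sequence[date], t_s: date, t_e: date, theta: float = 0.5) -> bool:
    """HYPOTHETICAL rule: a URL matches TI = [t_s, t_e] if at least a fraction theta
    of its annotation times TS fall inside the interval."""
    if not ts:
        return False
    inside = sum(1 for t in ts if t_s <= t <= t_e)
    return inside / len(ts) >= theta
```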
Experiment results • The scale of the dataset: [table not reproduced] • Machine: Intel Pentium IV 3.0 GHz, 1 GB memory, 2 processors • Implementation language: Java • The Lucene API is used to build the URL and tag indexes.
Experiment results [Screenshot legend: red tags are owned by the user; orange tags are recommended]
Experiment results
Conclusion • Our main contributions: • An effective algorithm, ELSABer, based on an analysis of the characteristics of social annotations • Enhanced models for personalized and time-related browsing
Future work • More user studies • Focus on finding higher-quality URL resources • Use existing hierarchical structures such as ODP and WordNet to help construct more meaningful hierarchies for social annotations
Thank you!!