Towards Effective Browsing of Large Scale Social Annotations WWW 2007

Rui Li, Shenghua Bao, Yong Yu, Zhong Su, and Ben Fei
Shanghai JiaoTong University; IBM China Research Lab
Advisor: Hsin-Hsi Chen
Reporter: Y.H Chang
2008-06-06

Outline
• Introduction
• ELSABer overview
• Components of ELSABer
• Enhanced models
• Experimental results
• Conclusion

Introduction
• Today, many services (e.g., Del.icio.us, Flickr) help users manage and share their favorite URLs and photos through social annotations.
• Effectively finding desired resources in large annotation data is a new problem.
• In this paper, we propose a novel algorithm, the Effective Large Scale Annotation Browser (ELSABer), for browsing large-scale social annotation data.

Introduction
• ELSABer helps users browse a huge number of annotations in a semantic, hierarchical, and efficient way.
• By incorporating personal and time information, ELSABer can be further extended to personalized and time-related browsing.

The prototype system based on ELSABer
[Figure: screenshot of the prototype. Callouts: a set of pages related to the current annotation “programming”; sub-tags (sub-categories) of “programming”.]

ELSABer overview
• Input: an empty concept set SC
• Step 1: Output the initial view of annotations
  • Generate the top 100 tags from the 2000 most frequent URLs and tags.
  • These tags are the roots of the hierarchical browsing.
• Loop: the user selects a tag Ti
  • Step 2: Concept matching
    • Add tag Ti to set SC
    • Calculate the related tag set and URL set
  • Step 3 (optional): Sample the URL set and the tag set
  • Step 4: Hierarchical browsing
    • 4-1: Calculate candidate sub-tags
    • 4-2: Rank the sub-tags by Infor-score
  • IF the termination condition is satisfied: return
  • ELSE: loop

Components of ELSABer
• Data setup and representation
• Semantic Browsing
  • a. Annotation Similarity Estimation
  • b. Generating the Semantic Concept
• Hierarchical Browsing
  • c. Sub-Tag Generation
  • d. Sub-Tag Clustering
• Efficient Browsing

Data setup and representation
• Data source: Del.icio.us (May 2006)
• We define an annotation as a quadruple: (User, URL, Tag, Time).
• Associated matrix M (m × n)
  • m and n are the total numbers of tags and URLs, respectively.
  • |URL(ti)| is the number of URLs annotated with tag ti.
  • Cij denotes the number of users who annotate the jth URL with the ith tag.
  • The weighting is similar to TF-IDF in IR.

Data setup and representation
• Given the associated matrix M (m × n), the rows correspond to the tags T1, T2, …, Tm and the columns to the URLs U1, U2, …, Un.
• A tag can be represented as a row vector Ti(U1, U2, …, Un) of M.
• A URL can be represented as a column vector Ui(t1, t2, …, tm) of M.
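
The tag-by-URL layout described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function name `build_count_matrix` and the sample data are hypothetical, and the sketch stops at the raw user counts Cij, because the TF-IDF-like weighting that turns counts into M is not spelled out in this transcript.

```python
from collections import defaultdict


def build_count_matrix(annotations):
    """Build the raw count matrix C from (user, url, tag, time) quadruples.

    C[i][j] is the number of distinct users who annotated the j-th URL
    with the i-th tag. Rows are tags, columns are URLs.
    """
    users = defaultdict(set)  # (tag, url) -> set of users
    for user, url, tag, _time in annotations:
        users[(tag, url)].add(user)

    tags = sorted({tag for tag, _ in users})
    urls = sorted({url for _, url in users})
    matrix = [[len(users.get((t, u), ())) for u in urls] for t in tags]
    return tags, urls, matrix


# Hypothetical sample data for illustration only.
sample = [
    ("alice", "u1", "python", "2006-05-01"),
    ("bob", "u1", "python", "2006-05-02"),
    ("alice", "u1", "python", "2006-05-03"),  # same user again: counted once
    ("bob", "u2", "java", "2006-05-04"),
]
tags, urls, C = build_count_matrix(sample)
python_row = C[tags.index("python")]                # tag as a row vector
u1_column = [row[urls.index("u1")] for row in C]    # URL as a column vector
```

Counting distinct users rather than raw annotation events follows the definition of Cij on the slide; sorting tags and URLs only makes the row and column order deterministic.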

Semantic Browsing: a. Annotation Similarity Estimation
• Similarity: [equation not captured in the transcript]
• Special case 1 (stemming): e.g., “Programs” and “Programming” => add 0.1 weight
• Special case 2 (punctuation): e.g., “Web-dev” and “WebDev” => add 0.08 weight
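
The two special cases can be illustrated as bonuses added to a base similarity. The sketch below rests on several assumptions: the base measure is taken to be cosine similarity between tag vectors (the slide's own formula is not in the transcript), `crude_stem` is a toy suffix stripper standing in for a real stemmer, and all function names are hypothetical. Only the 0.1 and 0.08 bonuses come from the slide.

```python
import math
import string

STEM_BONUS = 0.1    # from the slide: same stem
PUNCT_BONUS = 0.08  # from the slide: same tag up to punctuation


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (assumed base measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def crude_stem(tag):
    """Toy stemmer (assumption): strips 'ing' or a plural 's' only."""
    t = tag.lower()
    if t.endswith("ing") and len(t) > 5:
        t = t[:-3]
        if len(t) >= 2 and t[-1] == t[-2]:  # programm -> program
            t = t[:-1]
    elif t.endswith("s") and len(t) > 3:
        t = t[:-1]
    return t


def strip_punct(tag):
    """Lowercase the tag and drop ASCII punctuation."""
    return "".join(ch for ch in tag.lower() if ch not in string.punctuation)


def tag_similarity(tag_a, vec_a, tag_b, vec_b):
    """Base similarity plus at most one special-case bonus."""
    sim = cosine(vec_a, vec_b)
    if strip_punct(tag_a) == strip_punct(tag_b):
        sim += PUNCT_BONUS
    elif crude_stem(tag_a) == crude_stem(tag_b):
        sim += STEM_BONUS
    return sim
```

Whether the two bonuses can stack, and whether the result is capped at 1, is not stated on the slide; this sketch applies at most one bonus and does not cap.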

Semantic Browsing: b. Generating the Semantic Concept
• Given the selected tag ti, we choose the tag set STi most related to ti by the following rules:
  • 1. tj should be among the N most similar tags to ti.
  • 2. The similarity between tj and ti should be larger than a threshold θ.
• Settings: N = 4, θ = 0.7
• The semantic concept is Ci = STi ∪ {ti}.
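
The two selection rules translate directly into a top-N filter followed by a threshold. A minimal sketch, assuming the similarities to the selected tag are already available as a dictionary; the function name and the sample scores are hypothetical, while N = 4 and θ = 0.7 are the values from the slide.

```python
def semantic_concept(selected, similarity, n=4, theta=0.7):
    """Return the concept Ci = STi ∪ {ti} for the selected tag.

    similarity: dict mapping every other tag to its similarity with
    `selected`. A tag joins STi only if it is among the n most similar
    tags AND its similarity exceeds theta.
    """
    ranked = sorted(
        ((score, tag) for tag, score in similarity.items() if tag != selected),
        key=lambda pair: (-pair[0], pair[1]),  # highest score first, then name
    )
    related = {tag for score, tag in ranked[:n] if score > theta}
    return related | {selected}


# Hypothetical similarity scores for illustration only.
scores = {
    "coding": 0.90,
    "development": 0.80,
    "software": 0.75,
    "code": 0.72,
    "java": 0.71,  # above theta, but only fifth: excluded by rule 1
    "web": 0.30,
}
concept = semantic_concept("programming", scores)
```

Both rules must hold at once, so a tag with similarity above θ can still be dropped when four closer tags exist, and a top-4 tag is dropped when it does not clear θ.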

Semantic Browsing: b. Generating the Semantic Concept
• The user's click path t1, t2, …, tL yields a sequence of concepts C1, C2, …, CL.
• Let the concept set be SC = {C1, C2, …, CL}.
• The related URLs are those annotated with at least one tag from every concept:
  • ReURL(SC) = {u | ∀C ∈ SC, T(u) ∩ C ≠ ∅}
  • T(u) is the set of annotations given to URL u.
• The related tags are all the tags given to the URLs in ReURL(SC):
  • ReTag(SC) = {t | ∃u ∈ ReURL(SC), t ∈ T(u)}
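
The definitions of ReURL and ReTag map onto plain set operations. The following sketch assumes T(u) is available as a dictionary from URL to its tag set; the function names and sample data are hypothetical.

```python
def related_urls(concepts, tags_of):
    """ReURL(SC): URLs whose tag set intersects every concept in SC.

    concepts: iterable of tag sets (C1, ..., CL)
    tags_of:  dict mapping URL u to its tag set T(u)
    """
    concepts = list(concepts)
    return {u for u, tags in tags_of.items() if all(tags & c for c in concepts)}


def related_tags(concepts, tags_of):
    """ReTag(SC): every tag given to at least one URL in ReURL(SC)."""
    urls = related_urls(concepts, tags_of)
    return set().union(*(tags_of[u] for u in urls))


# Hypothetical data for illustration only.
T = {
    "u1": {"programming", "java"},
    "u2": {"coding", "python"},
    "u3": {"music"},
}
first_click = [{"programming", "coding"}]
second_click = first_click + [{"python"}]
```

Each additional click adds a concept that every related URL must match, so the related URL set can only shrink or stay the same as the path grows.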

Hierarchical Browsing: c. Sub-Tag Generation
• If the intersection of the two tags' URL sets covers most of the URLs of ti but only a small part of the URLs of tj, we can infer that ti is a sub-tag of tj.
[Figure: 40 related tags of “google”]

Hierarchical Browsing: c. Sub-Tag Generation
[Table: features used for sub-tag prediction; contents not captured in the transcript]
• U(ti) denotes the number of URLs tagged with ti.

Hierarchical Browsing: c. Sub-Tag Generation
• Given the features above, each related tag is represented as a feature vector.
• A decision tree is learned with C4.5 from a manually labeled data set to predict sub-tag relations.
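
The overlap intuition behind sub-tag generation can be made concrete with two coverage ratios. This is only a sketch: the slides learn the decision with a C4.5 decision tree over a feature set that is not reproduced here, so the hand-written threshold rule below is a stand-in, and the thresholds 0.6 and 0.3, the function names, and the sample URL sets are all illustrative assumptions.

```python
def coverage_features(urls_i, urls_j):
    """Share of each tag's URL set that lies in the intersection.

    Returns (cover_i, cover_j), where cover_i = |Ui ∩ Uj| / |Ui| and
    cover_j = |Ui ∩ Uj| / |Uj|.
    """
    common = urls_i & urls_j
    cover_i = len(common) / len(urls_i) if urls_i else 0.0
    cover_j = len(common) / len(urls_j) if urls_j else 0.0
    return cover_i, cover_j


def looks_like_sub_tag(urls_i, urls_j, high=0.6, low=0.3):
    """Stand-in rule (assumption): ti is a sub-tag candidate of tj when the
    intersection covers most of ti's URLs but only a small part of tj's."""
    cover_i, cover_j = coverage_features(urls_i, urls_j)
    return cover_i >= high and cover_j <= low


# Hypothetical URL sets for illustration only.
gmail_urls = {"u1", "u2", "u3"}
google_urls = {f"u{k}" for k in range(1, 13)}  # u1 .. u12
```

The asymmetry is the point of the heuristic: with these sets, "gmail" qualifies as a sub-tag candidate of "google", but not the other way around.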

Hierarchical Browsing: d. Sub-Tag Clustering
• Infor(t) = w1 × TFIDF(t) + w2 × ICS(t) + w3 × TE(t)
• ICS(t): Intra-Cluster Similarity
  • ot denotes the centroid of all the URLs associated with tag t.
• TE(t): Tag Entropy
• In our experiment, the weights w1, w2, and w3 are 0.58, 0.27, and 0.13, respectively.
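
The Infor score is a weighted sum, so ranking sub-tags reduces to sorting by that sum. In the sketch below, the weights are the ones reported on the slide; the function names and the per-tag component values are hypothetical, and the three components are assumed to be precomputed and scaled to comparable ranges, since their formulas are not in the transcript.

```python
# Weights w1, w2, w3 as reported on the slide.
WEIGHTS = (0.58, 0.27, 0.13)


def infor(tfidf, ics, te, weights=WEIGHTS):
    """Infor(t) = w1 * TFIDF(t) + w2 * ICS(t) + w3 * TE(t)."""
    w1, w2, w3 = weights
    return w1 * tfidf + w2 * ics + w3 * te


def rank_sub_tags(features):
    """Order candidate sub-tags by descending Infor score.

    features: dict mapping tag -> (tfidf, ics, te); values are assumed
    to be precomputed elsewhere.
    """
    return sorted(features, key=lambda tag: infor(*features[tag]), reverse=True)


# Hypothetical component scores for illustration only.
candidates = {
    "maps": (0.5, 0.9, 0.2),
    "gmail": (0.9, 0.8, 0.7),
    "misc": (0.1, 0.1, 0.9),
}
ranking = rank_sub_tags(candidates)
```

With w1 = 0.58 the TF-IDF component dominates, so a tag with high entropy but low TF-IDF, such as "misc" here, still ranks last.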

Efficient Browsing
• Observation: people use popular tags to annotate URLs, and popular URLs are annotated with the majority of tags.

Efficient Browsing
• So we can get good results efficiently by running our algorithm in a small tagging sub-space.
• In our experiment, we sample the 2000 most frequently annotated URLs and the 2000 most frequently used tags, so the size of M is 2000 × 2000.
• Note: we do not cut off the “long tail”.
• After a sequence of clicks, the user's intention becomes more specific, which decreases the number of related URLs and related tags.
• When that number is less than 2000, all the tags and URLs are used in the calculation.
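
The sampling step can be sketched with frequency counters. This is a hedged illustration: the function name is hypothetical, and "most frequent" is assumed to mean the number of annotation quadruples in which a URL or tag appears. The limit of 2000 is the value from the slide.

```python
from collections import Counter


def sample_most_frequent(annotations, limit=2000):
    """Pick the `limit` most frequent URLs and tags.

    annotations: iterable of (user, url, tag, time) quadruples.
    If fewer than `limit` distinct URLs (or tags) exist, all are kept,
    mirroring the rule that small related sets are used in full.
    """
    url_counts = Counter()
    tag_counts = Counter()
    for _user, url, tag, _time in annotations:
        url_counts[url] += 1
        tag_counts[tag] += 1

    def top(counts):
        if len(counts) <= limit:
            return set(counts)
        return {item for item, _count in counts.most_common(limit)}

    return top(url_counts), top(tag_counts)


# Hypothetical annotations for illustration only.
data = [
    ("a", "u1", "x", 1),
    ("b", "u1", "x", 2),
    ("c", "u1", "y", 3),
    ("a", "u2", "x", 4),
    ("a", "u3", "z", 5),
]
```

Calling `sample_most_frequent(data, limit=1)` keeps only the single most frequent URL and tag, while the default limit keeps everything in this small example.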

Enhanced Models
• User's profile: [equation not captured in the transcript]
• The annotations and resources that interest the user can be found as follows:
  • Ri denotes the vector representation of a resource, and Ti denotes the vector representation of Ai.
• Adjust the sampling and ranking algorithms according to the user's preference:
  • Infor(t, U) = α × Infor(t) + β × UI(t | P(U))

Enhanced Models
• Given the user-required time interval TI = [ts, te], we define the match between the URL's time sequence TS and TI as follows: [equation not captured in the transcript]
• θ = 0.5

Experimental results
• The scale of the dataset: [table not captured in the transcript]
• Machine: Intel Pentium IV 3.0 GHz, 1 GB memory, 2 processors
• Implementation language: Java
• The Lucene API is also used to build the URL and tag indexes.

Experimental results
[Figure: browsing results. Legend: red tag = owned by the user; orange tag = recommended.]

Conclusion
• Our main contributions:
  • The proposal of an effective algorithm, ELSABer, based on an analysis of the characteristics of social annotations.
  • The proposal of enhanced models for personalized and time-related browsing.

Future work
• Conduct more user studies.
• Focus on how to find higher-quality URL resources.
• Use existing hierarchical structures such as ODP and WordNet to help construct more meaningful hierarchies for social annotations.

Thank you!!
