Hybrid Content and Tag-based Profiles for recommendation in Collaborative Tagging Systems. Latin American Web Conference IEEE Computer Society, 2008 Presenter: Ying-Ying, Chen. Outline. Introduction Related Works Approach Hybrid User Profiles Content-Based User Profiles
Hybrid Content and Tag-based Profiles for recommendation in Collaborative Tagging Systems Latin American Web Conference IEEE Computer Society, 2008 Presenter: Ying-Ying, Chen
Outline
• Introduction
• Related Works
• Approach
  • Hybrid User Profiles
  • Content-Based User Profiles
  • Linking Tags to User Interests
• Experimental Results
• Conclusions and Comments
Speaker: Ying-ying, Chen
Introduction
• User profiling is an essential component of personal information agents and of recommendation systems in general.
• Content-based recommendation approaches rely on profiles collected by observing the user's browsing history or the documents the user has read.
Introduction
• Recently, collaborative or social tagging sites have achieved widespread success on the Web. On these sites, users annotate resources using a freely chosen set of keywords or tags, commonly known as a folksonomy.
• The activities carried out by users in social tagging systems, including posting resources or assigning tags to resources, have become a novel source of information about user interests.
Introduction
• This paper proposes to integrate content-based profiles, which represent long-term user interests gathered by recommenders through observation of browsing activities, with tag-based profiles acquired by capturing the user's interaction with one or more collaborative tagging systems.
• Hybrid profiles can be exploited to assist users in finding resources, people or tags within social tagging systems.
Related Works
• Vector of weighted tags
  • A vector of weighted tags is obtained from the frequency of occurrence of each tag in the resources a user tagged, and it is applied to rank Web search results according to their similarity with this tag vector.
• TBProfile
  • It also uses a weighted vector of tags to represent user interests, but tag weights are based on inverse user frequency.
Related Works
• Using a single vector of weighted tags has some drawbacks.
  • More frequent tags lose specificity.
  • A single vector or tag cloud cannot embrace diverse interests spanning different domains.
• Graph-based clustering [Au Yeung et al.]
• Multiple tag clouds
Related Works
• A number of problems result from the free-form nature of tagging.
  • Ambiguity
  • Synonymy
• Solution: contextualizing tags based on knowledge of the user's information preferences.
Approach – Hybrid User Profiles
• Folksonomies are the primary structure underlying collaborative tagging systems.
• A folksonomy can be defined as a tuple F := (U, T, R, Y, ≺)
  • U: users, T: tags, R: resources
  • Y: the user-based assignment of tags to resources, a ternary relation Y ⊆ U × T × R
  • ≺: a user-specific sub-tag/super-tag relation, ≺ ⊆ U × T × T
Approach – Hybrid User Profiles
• The collection of all tag assignments of a single user constitutes a personomy, P_u.
• P_u := (T_u, R_u, I_u, ≺_u) with
  • I_u := {(t, r) ∈ T × R | (u, t, r) ∈ Y}
  • T_u := π₁(I_u), the tags used by u
  • R_u := π₂(I_u), the resources tagged by u
  • ≺_u := {(t1, t2) ∈ T × T | (u, t1, t2) ∈ ≺}
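The personomy definition can be illustrated with a minimal Python sketch. The function name `personomy`, the `Personomy` container and the sample data are hypothetical and only serve the example; the sub-tag relation ≺_u is left out for brevity.

```python
from typing import NamedTuple


class Personomy(NamedTuple):
    tags: set          # T_u
    resources: set     # R_u
    assignments: set   # I_u, pairs (tag, resource)


def personomy(Y, user):
    """Restrict the ternary relation Y of (user, tag, resource) triples to one user."""
    I_u = {(t, r) for (u, t, r) in Y if u == user}
    T_u = {t for (t, _) in I_u}  # projection on the tag component
    R_u = {r for (_, r) in I_u}  # projection on the resource component
    return Personomy(T_u, R_u, I_u)


# Hypothetical tag assignments, for illustration only.
Y = {
    ("alice", "python", "r1"),
    ("alice", "web", "r1"),
    ("alice", "python", "r2"),
    ("bob", "news", "r3"),
}
p = personomy(Y, "alice")
print(p.tags, p.resources)
```

Here T_u and R_u are derived from I_u rather than stored separately, which mirrors the projections in the definition.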
Approach Overview
Approach – Content-Based User Profiles
• WebDCC (Web Document Conceptual Clustering)
  • Input: Web pages
  • Output: a hierarchy of concepts, which is the user profile
• Instances are represented using the bag-of-words approach for document representation.
• It builds a hierarchy of concepts.
  • Each node is a concept and the leaves are clusters.
  • A category is considered to be any set of instances, and a concept is the internal representation of a category.
Approach – Content-Based User Profiles
• User Profile
Approach – Content-Based User Profiles
• Agents capture experiences regarding user interests, such as Web pages the user read or bookmarked for future reading, news the user read, etc.
• Experiences are vector representations of information items based on the vector space model: D_i = {(t1, w1), ..., (tm, wm)}
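As a rough illustration of the vector representation D_i, the sketch below turns a piece of text into term-weight pairs. The slides do not say how the weights are computed, so raw term frequency is used here purely as an assumption; the function name `to_experience` and the tokenization rule are also hypothetical.

```python
import re
from collections import Counter


def to_experience(text):
    """Bag-of-words vector {term: weight}; weights are raw term counts (an assumption)."""
    terms = re.findall(r"[a-z]+", text.lower())
    return dict(Counter(terms))


d = to_experience("Goal! The team scored a goal.")
print(d)
```

A real agent would probably also remove stop words and apply stemming, but those steps are not described in the slides.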
Approach – Content-Based User Profiles
• Hierarchies of concepts produced by this algorithm are classification trees.
  • Root → most general concept
  • Terminal concept → cluster
• WebDCC integrates classification and learning by sorting each experience through the concept hierarchy and simultaneously updating it.
Approach – Content-Based User Profiles
• The hierarchy consists of a number of concepts C = {c1, c2, ..., cn}.
• To assign experiences to concepts automatically, each concept has a description given by a set of weighted terms: c_i = {(t1, w1), ..., (tm, wm)}
  • Each weight is the weight associated with its term in the category c_i.
• This description constitutes a linear classifier for the category.
Approach – Content-Based User Profiles
• WebDCC aims at obtaining a hierarchical set of linear classifiers, each of which is based on a set of relevant features.
• This goal is achieved by combining
  • a feature selection algorithm to choose the appropriate terms at each node in the tree
  • a supervised learning algorithm to construct a classifier for that node
Feature selection algorithm
• A feature selection threshold is defined in the [0, 1] range; a feature is selected only if its weight is higher than this threshold.
• A simple and effective approach to weighting terms is the document frequency, denoted by DF(t_k), which is the number of instances in which the term t_k occurs.
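A small sketch of document-frequency feature selection follows. Because DF(t_k) is a count while the threshold lies in [0, 1], the sketch divides DF by the number of instances; that normalization is an assumption, as are the function name `select_features` and the sample data.

```python
from collections import Counter


def select_features(instances, threshold):
    """Return the terms whose relative document frequency exceeds the threshold.

    instances: list of term collections, one per instance.
    threshold: value in [0, 1]; DF is divided by the number of instances
               before comparison (an assumption, not stated in the slides).
    """
    n = len(instances)
    if n == 0:
        return set()
    df = Counter()
    for terms in instances:
        df.update(set(terms))  # count each term at most once per instance
    return {t for t, count in df.items() if count / n > threshold}


docs = [{"goal", "match", "team"}, {"goal", "team", "vote"}, {"team", "score"}]
selected = select_features(docs, 0.5)
print(sorted(selected))  # ['goal', 'team']
```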
Supervised learning algorithm
• Each node in the hierarchy acts as a linear classifier, which is compared with the resource to be classified.
  • p_ci: the prototype of category c_i
  • d: the documents belonging to category c_i
• A resource is classified in a certain category if its similarity to the category prototype exceeds a minimum similarity.
Approach – Content-Based User Profiles
• Given the cluster s_ji belonging to the category c_i, which is composed of the vector representations of a set of documents, a centroid vector p_sji is defined for the cluster.
Approach – Content-Based User Profiles
• After the resource has been compared with the category prototypes, it is assigned to the cluster with the closest centroid below the matching category c_i.
• Example: with C = {C_sport, C_politics}, the Classify function applied to each category might return {(C_sport, 0.97), (C_politics, 0.14)}.
Approach – Content-Based User Profiles
• A resource is assigned to the closest cluster only if the similarity is higher than a minimum similarity threshold δ.
• Experiences that are not similar enough to any existing centroid according to this threshold cause the creation of new singleton clusters.
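The cluster assignment step can be sketched as follows. This is not WebDCC itself: cosine similarity and a centroid computed as the mean of the member vectors are assumptions, and the names `cosine`, `centroid` and `assign` are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def centroid(vectors):
    """Mean of a non-empty list of sparse vectors (assumed centroid definition)."""
    total = {}
    for v in vectors:
        for t, w in v.items():
            total[t] = total.get(t, 0.0) + w
    return {t: w / len(vectors) for t, w in total.items()}


def assign(experience, clusters, delta):
    """Add the experience to the most similar cluster if the similarity
    exceeds delta; otherwise create a new singleton cluster.
    Returns the index of the cluster that received the experience."""
    best_idx, best_sim = None, 0.0
    for i, members in enumerate(clusters):
        sim = cosine(experience, centroid(members))  # recomputed each time for simplicity
        if sim > best_sim:
            best_idx, best_sim = i, sim
    if best_idx is not None and best_sim > delta:
        clusters[best_idx].append(experience)
        return best_idx
    clusters.append([experience])
    return len(clusters) - 1


clusters = [[{"goal": 1.0, "team": 1.0}], [{"vote": 1.0, "senate": 1.0}]]
first = assign({"goal": 1.0, "match": 1.0}, clusters, delta=0.3)   # similarity 0.5 to cluster 0
second = assign({"recipe": 1.0}, clusters, delta=0.3)              # no overlap: new singleton
print(first, second, len(clusters))  # 0 2 3
```

A larger δ makes the second branch fire more often, which yields more and smaller clusters.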
Approach – Content-Based User Profiles
• Cluster cohesiveness
  • n_r: the size of the cluster s_r
• If the cohesiveness value is higher than a threshold φ, a new concept is created. Otherwise, the hierarchy is not updated.
Approach – Linking Tags to User Interests
• To build hybrid profiles, the categories representing user interests in content-based profiles are populated with the tags users frequently associate with resources in those categories.
• Tagged resources first have to be categorized according to the current representation of user interests given by the interest hierarchy.
Approach – Linking Tags to User Interests
• For each cluster in the hierarchy, a set of the most frequently used tags is extracted to represent the tag assignment preferences for the experiences or resources belonging to this cluster.
• The set of tags related to a cluster s_ji within the category c_i can be defined within the personomy P_u as follows: T_sji = {t ∈ T | (t, r) ∈ I_u ∧ r ∈ s_ji}
Approach – Linking Tags to User Interests
• The tag frequency of a tag t in T_sji is the number of times the tag was used to tag resources belonging to the cluster: |{r ∈ R | (t, r) ∈ I_u ∧ r ∈ s_ji}|
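The tag set and tag frequency of a cluster can be computed directly from I_u, as in this sketch. The function name `cluster_tag_frequencies` and the sample data are hypothetical.

```python
from collections import Counter


def cluster_tag_frequencies(I_u, cluster):
    """Count, for each tag, the resources in the cluster that the user tagged with it.

    I_u: set of (tag, resource) pairs of one user.
    cluster: set of resources belonging to the cluster.
    Because I_u is a set, each (tag, resource) pair is counted once.
    """
    return Counter(t for (t, r) in I_u if r in cluster)


# Hypothetical personomy data, for illustration only.
I_u = {("python", "r1"), ("web", "r1"), ("python", "r2"), ("news", "r3")}
freq = cluster_tag_frequencies(I_u, {"r1", "r2"})
print(freq.most_common(1))  # [('python', 2)]
```

The keys of `freq` correspond to the tag set of the cluster, and `most_common(k)` gives the k most frequently used tags that would populate the hybrid profile.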
Experimental Results
• Experiments were performed using data collected from the del.icio.us social bookmarking system.
Experimental Results
• For a given user u ∈ U and a given resource r ∈ R, a tag recommender system tries to find a set of tags T̃(u, r) ⊆ T for the user to annotate the resource.
• Training set: 80% of the total tagged bookmarks
• Testing set: the remaining 20%
Experimental Results
• The quality of a given list of top-N recommendations was evaluated considering the number of hits.
• The number of hits is the number of tag assignments in the test set that were also present in the top-N recommended tags.
• N is the total number of recommendations.
Experimental Results
• High values of hit-rate indicate that the algorithm was able to predict the assignments in the test sets of the corresponding users.
  • T̃(u, r) → the set of recommended tags
  • tags(u, r) → the set of real tags assigned by the user to the resource
Experimental Results
• F-measure was used to combine precision and recall values.
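The evaluation measures can be sketched as below. The slides do not reproduce the formulas, so the usual set-based definitions of precision and recall and the balanced F-measure (harmonic mean) are assumed; the function name `evaluate` and the sample tags are hypothetical.

```python
def evaluate(recommended, actual):
    """Return (hits, precision, recall, f_measure) for one (user, resource) pair.

    recommended: the top-N recommended tags.
    actual: the tags the user really assigned to the resource.
    Standard definitions are assumed; the paper's exact formulas
    are not shown in the slides.
    """
    recommended, actual = set(recommended), set(actual)
    hits = len(recommended & actual)
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(actual) if actual else 0.0
    if precision + recall == 0:
        return hits, precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return hits, precision, recall, f_measure


hits, precision, recall, f = evaluate(
    ["python", "web", "code", "tutorial"],  # recommended tags
    ["python", "web", "django"],            # real tags
)
print(hits, precision, round(recall, 3), round(f, 3))  # 2 0.5 0.667 0.571
```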
Experimental Results
• Precision increases as the similarity threshold grows, since clusters are smaller and recommendations are based on fewer but highly similar resources.
• Conversely, recall tends to decrease, since smaller clusters offer less tag diversity.
• The best hit-rate values are found in the interval 0.1 ≤ δ ≤ 0.3, within which the best relation between precision and recall is also attained for most users.
Experimental Results
• Hybrid profiles were compared with tag recommendation based on two different approaches commonly used in folksonomies:
  • Most popular tags by user (MPTU): tags are sorted according to their frequency of occurrence in the user's resources, and the top-N tags are in turn applied to make recommendations. This corresponds to tag-based profiles consisting of a single vector of tags.
  • Most popular tags by resource (MPTR): it is based on collective knowledge instead of a single person alone.
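Both popularity baselines reduce to counting tags over a slice of the tag assignments. The sketch below is an illustration under that reading; the function names and the sample triples are hypothetical.

```python
from collections import Counter


def most_popular_tags_by_user(Y, user, n):
    """MPTU: the n tags the user has applied most often, across all resources."""
    counts = Counter(t for (u, t, r) in Y if u == user)
    return [t for t, _ in counts.most_common(n)]


def most_popular_tags_by_resource(Y, resource, n):
    """MPTR: the n tags most often applied to the resource, across all users."""
    counts = Counter(t for (u, t, r) in Y if r == resource)
    return [t for t, _ in counts.most_common(n)]


# Hypothetical (user, tag, resource) assignments, for illustration only.
Y = [
    ("alice", "python", "r1"),
    ("alice", "python", "r2"),
    ("alice", "web", "r1"),
    ("bob", "tutorial", "r1"),
    ("carol", "tutorial", "r1"),
    ("bob", "python", "r3"),
]
print(most_popular_tags_by_user(Y, "alice", 1))    # ['python']
print(most_popular_tags_by_resource(Y, "r1", 1))   # ['tutorial']
```

MPTU ignores the resource being tagged, which is the limitation the cluster-specific tag sets of hybrid profiles are meant to address.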
Experimental Results
• Recommendations based on hybrid profiles consistently reached higher hit-rates than the approaches based on tag popularity.
Experimental Results
• The differences in performance between hybrid profiles and MPTU and MPTR, tested with a paired two-tailed t-test, were statistically significant at the α = 0.05 level, with p-values of 0.0119 and 0.0001, respectively.
Conclusions and Comments
• Experimental results show that hybrid profiles are able to outperform two commonly used recommendation methods based on tag popularity.
• Future
  • Non-obviousness
  • Discriminating power
• Comments
  • The experimental sample is too small
  • The possibility of tag-based profiles