Clustering Applications in Web Mining and Web Personalization. Bamshad Mobasher DePaul University. Clustering Application: Web Usage Mining. Discovering Aggregate Usage Profiles
Clustering Applications in Web Mining and Web Personalization
Bamshad Mobasher
DePaul University
Clustering Application: Web Usage Mining
• Discovering Aggregate Usage Profiles
  • Goal: to effectively capture “user segments” based on their common usage patterns from potentially anonymous click-stream data
  • Method: cluster user transactions to obtain user segments automatically, then represent each cluster by its centroid
  • Aggregate profiles are obtained from each centroid after sorting items by weight and filtering out the low-weight items in each centroid
• Note that profiles are represented as weighted collections of items (pages, products, etc.)
  • weights represent the significance of the item within each cluster
  • profiles are overlapping, so they capture common interests among different groups/types of users (e.g., customer segments)
Profile Aggregation Based on Clustering Transactions (PACT)
• Discovery of Profiles Based on Transaction Clusters
  • cluster user transactions; features are the significant items present in each transaction
  • derive usage profiles (sets of item-weight pairs) based on the characteristics of each transaction cluster as captured in the cluster centroid
• Deriving Usage Profiles from Transaction Clusters
  • each cluster contains a set of user transactions (vectors)
  • for each cluster, compute the centroid as the cluster representative
  • the profile is a set of item-weight pairs: for transaction cluster C, select each pageview $p_i$ whose weight in the cluster centroid, $w(p_i, C) = \frac{1}{|C|} \sum_{t \in C} w(p_i, t)$ (where $w(p_i, t)$ is the weight of $p_i$ in transaction $t$), meets a pre-specified threshold $\mu$, i.e. $w(p_i, C) \geq \mu$
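The derivation step above can be sketched in plain Python. This is a minimal sketch, assuming the cluster labels have already been produced by some clustering algorithm (any partitioning method would do) and that each transaction is a dict of pageview weights. The function name `derive_profiles`, the toy transactions and the 0.5 threshold are hypothetical illustrations, not part of the original slides.

```python
from collections import defaultdict


def derive_profiles(transactions, labels, threshold=0.5):
    """Derive PACT-style usage profiles from already-clustered transactions.

    transactions: list of dicts mapping pageview -> weight (e.g. 1.0 if visited)
    labels: cluster label of each transaction (from any clustering step; assumed)
    threshold: minimum centroid weight for a pageview to enter a profile (hypothetical mu)
    """
    # Group the transactions by cluster label.
    clusters = defaultdict(list)
    for txn, label in zip(transactions, labels):
        clusters[label].append(txn)

    profiles = {}
    for label, members in clusters.items():
        # Centroid: mean weight of each pageview over all transactions in the
        # cluster; a pageview missing from a transaction contributes weight 0.
        totals = defaultdict(float)
        for txn in members:
            for page, weight in txn.items():
                totals[page] += weight
        centroid = {page: total / len(members) for page, total in totals.items()}

        # Keep pageviews whose centroid weight meets the threshold, highest first.
        kept = [(page, w) for page, w in centroid.items() if w >= threshold]
        kept.sort(key=lambda pw: (-pw[1], pw[0]))
        profiles[label] = kept
    return profiles


# Hypothetical toy data: four sessions already split into two clusters.
transactions = [
    {"A.html": 1, "B.html": 1, "F.html": 1},
    {"B.html": 1, "F.html": 1},
    {"C.html": 1, "D.html": 1},
    {"C.html": 1, "D.html": 1, "E.html": 1},
]
labels = [0, 0, 1, 1]
profiles = derive_profiles(transactions, labels, threshold=0.5)
print(profiles[0])  # [('B.html', 1.0), ('F.html', 1.0), ('A.html', 0.5)]
```

Because profiles are built per cluster from centroid weights, the same page can appear in several profiles, which is what makes the resulting profiles overlapping.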
PACT - An Example
Original Session/user data
Result of Clustering
PROFILE 0 (Cluster Size = 3)
--------------------------------------
1.00 C.html
1.00 D.html
PROFILE 1 (Cluster Size = 4)
--------------------------------------
1.00 B.html
1.00 F.html
0.75 A.html
0.25 C.html
PROFILE 2 (Cluster Size = 3)
--------------------------------------
1.00 A.html
1.00 D.html
1.00 E.html
0.33 C.html
Given an active session A, B, the best matching profile is Profile 1. This may result in a recommendation for page F.html, since it appears with high weight in that profile.
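The recommendation step in this example can be sketched as follows, using the three profiles listed above. The slide does not say how "best matching" is measured; this sketch assumes cosine similarity between the session vector and each profile, and it assumes a simple recommendation score (profile weight times match score). The names `cosine`, `recommend` and `PROFILES` are hypothetical.

```python
import math

# Profiles copied from the PACT example (page -> weight).
PROFILES = {
    0: {"C.html": 1.00, "D.html": 1.00},
    1: {"B.html": 1.00, "F.html": 1.00, "A.html": 0.75, "C.html": 0.25},
    2: {"A.html": 1.00, "D.html": 1.00, "E.html": 1.00, "C.html": 0.33},
}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def recommend(session, profiles, top_n=2):
    """Match a session to its best profile and rank that profile's unvisited pages."""
    scores = {pid: cosine(session, prof) for pid, prof in profiles.items()}
    best = max(scores, key=scores.get)
    # Assumed scoring: the page's profile weight scaled by the session/profile match.
    candidates = [
        (page, weight * scores[best])
        for page, weight in profiles[best].items()
        if page not in session
    ]
    candidates.sort(key=lambda pw: -pw[1])
    return best, candidates[:top_n]


best, recs = recommend({"A.html": 1.0, "B.html": 1.0}, PROFILES)
print(best, [page for page, _ in recs])  # 1 ['F.html', 'C.html']
```

Under these assumptions the session A, B scores about 0.76 against Profile 1, about 0.40 against Profile 2 and 0 against Profile 0, so Profile 1 wins and F.html is ranked first, in line with the slide.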
Web Usage Mining: clustering example
• Transaction Clusters:
  • Clustering similar user transactions and using the centroid of each cluster as an aggregate usage profile (representative for a user segment)
Sample cluster centroid from dept. Web site (cluster size = 330)
Clustering Application: Discovery of Content Profiles
• Content Profiles
  • Goal: automatically group together documents that partially deal with similar concepts
  • Method:
    • identify concepts by clustering features (keywords) based on their common occurrences among documents (this can also be done using association discovery or correlation analysis)
    • cluster centroids represent the documents in which the features in the cluster appear frequently
  • Content profiles are derived from centroids after filtering out the low-weight documents in each centroid
• Note that each content profile is represented as a collection of item-weight pairs (similar to usage profiles)
  • however, the weight of an item in a profile represents the degree to which the features in the corresponding cluster appear in that item
Content Profiles - An Example
PROFILE 0 (Cluster Size = 3)
--------------------------------------
1.00 C.html (web, data, mining)
1.00 D.html (web, data, mining)
0.67 B.html (data, mining)
PROFILE 1 (Cluster Size = 4)
--------------------------------------
1.00 B.html (business, intelligence, marketing, ecommerce)
1.00 F.html (business, intelligence, marketing, ecommerce)
0.75 A.html (business, intelligence, marketing)
0.50 C.html (marketing, ecommerce)
0.50 E.html (intelligence, marketing)
PROFILE 2 (Cluster Size = 3)
--------------------------------------
1.00 A.html (search, information, retrieval)
1.00 E.html (search, information, retrieval)
0.67 C.html (information, retrieval)
0.67 D.html (information, retrieval)
Filtering threshold = 0.5
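The weights in this example are reproduced if each feature is treated as a binary vector over documents and the centroid entry for a document is the fraction of the cluster's features that appear in it (the cluster size is the number of features: 3, 4 and 3). A minimal sketch, assuming the feature clusters are already given and building the document-feature memberships only from the parenthesized lists above; any features that were filtered out of the example are unknown, so `DOC_FEATURES` is an assumption. The names `content_profile`, `DOC_FEATURES` and `FEATURE_CLUSTERS` are hypothetical.

```python
# Feature memberships read off the parenthesized lists in the example
# (assumption: documents may contain further features not shown there).
DOC_FEATURES = {
    "A.html": {"business", "intelligence", "marketing", "search", "information", "retrieval"},
    "B.html": {"data", "mining", "business", "intelligence", "marketing", "ecommerce"},
    "C.html": {"web", "data", "mining", "marketing", "ecommerce", "information", "retrieval"},
    "D.html": {"web", "data", "mining", "information", "retrieval"},
    "E.html": {"intelligence", "marketing", "search", "information", "retrieval"},
    "F.html": {"business", "intelligence", "marketing", "ecommerce"},
}

# Feature clusters (concepts) as in the example; the feature-clustering step is not repeated.
FEATURE_CLUSTERS = [
    ["web", "data", "mining"],
    ["business", "intelligence", "marketing", "ecommerce"],
    ["search", "information", "retrieval"],
]


def content_profile(cluster, doc_features, threshold=0.5):
    """Centroid of a feature cluster over documents, filtered and sorted by weight."""
    profile = []
    for doc, features in doc_features.items():
        # Mean of the cluster's binary feature vectors at this document
        # = fraction of the cluster's features present in the document.
        weight = sum(f in features for f in cluster) / len(cluster)
        if weight >= threshold:
            profile.append((doc, round(weight, 2)))
    profile.sort(key=lambda dw: (-dw[1], dw[0]))
    return profile


for i, cluster in enumerate(FEATURE_CLUSTERS):
    print(f"PROFILE {i}:", content_profile(cluster, DOC_FEATURES))
```

Note that the items kept at exactly 0.50 in Profile 1 imply the threshold test is "at least 0.5", which is why the sketch uses `>=`.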
User Segments Based on Content
• Essentially combines the usage and content profiling techniques discussed earlier
• Basic Idea:
  • for each user/session, extract the important features of the selected documents/items
  • based on the global dictionary, create a user-feature matrix
    • each row is a feature vector representing the significant terms associated with the documents/items selected by the user in a given session
    • weights can be determined as before (e.g., using the tf.idf measure)
  • next, cluster users/sessions using features as dimensions
• Profile generation:
  • from the user clusters we can now generate overlapping collections of features based on cluster centroids
  • the weights associated with features in each profile represent the significance of that feature for the corresponding group of users
User Transaction Matrix UT
Feature-Document Matrix FP
Content Enhanced Transactions
User-Feature Matrix UF
Note that: $UF = UT \times FP^T$
Example: users 4 and 6 are more interested in concepts related to Web information retrieval, while user 3 is more interested in data mining.
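The product $UF = UT \times FP^T$ can be checked on a toy case. This is a minimal sketch in plain Python, assuming UT is users x documents and FP is features x documents, so that $FP^T$ is documents x features and UF comes out users x features. The matrices, feature names and helper functions `matmul`/`transpose` are hypothetical; the slide's actual matrices are not reproduced.

```python
def transpose(m):
    """Transpose a list-of-lists matrix."""
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Plain-Python product of a (n x m) and b (m x p)."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


# Hypothetical UT: 2 users x 3 documents (1 = document selected in the session).
UT = [
    [1, 1, 0],  # user u1 selected doc1 and doc2
    [0, 0, 1],  # user u2 selected doc3
]

# Hypothetical FP: 3 features x 3 documents.
FEATURES = ["web", "mining", "retrieval"]
FP = [
    [1, 1, 0],  # "web" appears in doc1 and doc2
    [1, 0, 0],  # "mining" appears in doc1
    [0, 1, 1],  # "retrieval" appears in doc2 and doc3
]

# UF (users x features) = UT (users x docs) x FP^T (docs x features)
UF = matmul(UT, transpose(FP))
for user, row in zip(["u1", "u2"], UF):
    print(user, dict(zip(FEATURES, row)))
```

With binary entries each cell of UF counts how many of the user's selected documents contain the feature; with tf.idf weights in FP the same product gives weighted sums. The rows of UF are then the vectors that get clustered.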
Clustering and Collaborative Filtering :: Example - clustering based on ratings
Consider the following book ratings data (Scale: 1-5)
Clustering and Collaborative Filtering :: Example - clustering based on ratings
• Cluster centroids after k-means clustering with k = 4
• In this case, each centroid represents the average rating (among the users in that cluster) for each item
• The first column shows the centroid of the whole dataset, i.e., the overall average rating of each item across all users
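The k-means step can be sketched with plain Lloyd iterations. This is a minimal sketch under simplifying assumptions: the ratings matrix is fully filled in (the slide's book data has missing ratings, which a real implementation must handle), initial centroids are simply the first k users for determinism, and k = 2 on a hypothetical 6-user x 4-book matrix rather than the slide's k = 4. `RATINGS` and `kmeans` are hypothetical names.

```python
import math


def kmeans(points, k, iterations=20):
    """Plain Lloyd's k-means; initial centroids are the first k points (assumed)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each user joins the nearest centroid (Euclidean distance).
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: each centroid becomes the average rating vector of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


# Hypothetical ratings: 6 users x 4 books, scale 1-5, no missing values.
RATINGS = [
    [5, 4, 1, 1],
    [1, 1, 5, 4],
    [5, 5, 1, 2],
    [2, 1, 4, 5],
    [4, 5, 2, 1],
    [1, 2, 5, 5],
]

centroids, labels = kmeans(RATINGS, k=2)
overall = [sum(col) / len(RATINGS) for col in zip(*RATINGS)]  # whole-dataset centroid
print(labels)
print([[round(x, 2) for x in c] for c in centroids], overall)
```

Each resulting centroid is the per-book average rating of its cluster, and `overall` plays the role of the slide's first column (the item averages across all users).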
Clustering and Collaborative Filtering :: Example - clustering based on ratings
This approach provides a model-based (and more scalable) version of user-based collaborative filtering, compared to k-nearest-neighbor, since a new user is compared only against the k cluster centroids rather than against every user in the dataset.
NU1 has the highest similarity to the cluster 3 centroid. The whole cluster could be used as the “neighborhood” for NU1.
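Using the centroids as the neighborhood can be sketched as follows. The slide does not name the similarity measure; this sketch assumes Pearson correlation over the items the new user has rated, and it predicts each unrated item directly from the best centroid's average rating. The centroids and the new user's ratings are hypothetical, and `None` marks an unrated item.

```python
import math


def pearson(u, v):
    """Pearson correlation over positions rated in both vectors (None = unrated)."""
    pairs = [(a, b) for a, b in zip(u, v) if a is not None and b is not None]
    if len(pairs) < 2:
        return 0.0
    mean_u = sum(a for a, _ in pairs) / len(pairs)
    mean_v = sum(b for _, b in pairs) / len(pairs)
    num = sum((a - mean_u) * (b - mean_v) for a, b in pairs)
    den = math.sqrt(sum((a - mean_u) ** 2 for a, _ in pairs)) * math.sqrt(
        sum((b - mean_v) ** 2 for _, b in pairs)
    )
    return num / den if den else 0.0


def predict_from_clusters(new_user, centroids):
    """Pick the most similar centroid and use its averages for the unrated items."""
    sims = [pearson(new_user, c) for c in centroids]
    best = max(range(len(centroids)), key=lambda i: sims[i])
    predictions = {i: centroids[best][i] for i, r in enumerate(new_user) if r is None}
    return best, predictions


# Hypothetical centroids (average rating per book in each cluster).
CENTROIDS = [
    [4.7, 4.7, 1.3, 1.3],
    [1.3, 1.3, 4.7, 4.7],
]
new_user = [5, None, 2, None]  # rated books 0 and 2 only
best, preds = predict_from_clusters(new_user, CENTROIDS)
print(best, preds)  # 0 {1: 4.7, 3: 1.3}
```

A variant closer to the slide's wording would use the ratings of all users in the best cluster as the neighborhood and apply ordinary user-based weighting within it; the centroid shortcut shown here avoids even that per-user comparison.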
Clustering and Collaborative Filtering :: clustering based on ratings: MovieLens
Hierarchical Clustering :: example: clustered search results
Users can drill down within clusters to view sub-topics or to view the relevant subset of results.
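The drill-down structure comes from the nested tree that agglomerative (bottom-up) clustering produces. This is a minimal sketch, assuming each search result is reduced to a set of terms and that clusters are merged by single linkage on Jaccard similarity; the slide's system does not specify either choice. The "jaguar" results, `jaccard` and `agglomerate` are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity between two term sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def agglomerate(items):
    """Single-linkage agglomerative clustering; returns a nested tuple tree of titles.

    items: dict mapping result title -> set of terms (hypothetical snippets).
    """
    # Each cluster is (tree, list of member term sets).
    clusters = [(title, [terms]) for title, terms in items.items()]
    while len(clusters) > 1:
        best_pair, best_sim = None, -1.0
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: similarity of the closest pair of members.
                sim = max(jaccard(a, b) for a in clusters[i][1] for b in clusters[j][1])
                if sim > best_sim:
                    best_pair, best_sim = (i, j), sim
        i, j = best_pair
        merged = ((clusters[i][0], clusters[j][0]), clusters[i][1] + clusters[j][1])
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)] + [merged]
    return clusters[0][0]


# Hypothetical results for an ambiguous query.
RESULTS = {
    "jaguar-car": {"jaguar", "car", "engine"},
    "jaguar-dealer": {"jaguar", "car", "dealer"},
    "jaguar-cat": {"jaguar", "cat", "habitat"},
    "jaguar-species": {"jaguar", "cat", "species"},
}
tree = agglomerate(RESULTS)
print(tree)  # (('jaguar-car', 'jaguar-dealer'), ('jaguar-cat', 'jaguar-species'))
```

The top level of the tree separates the two senses of the query, and descending into either branch reaches the individual results, which is the kind of drill-down the slide describes. The naive pairwise search is fine for a page of results but would need a priority queue or a library implementation for larger inputs.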