
Efficient Top-k Querying over Social Tagging Networks

Ralf Schenkel. Joint work with Tom Crecelius, Mouna Kacimi, Sebastian Michel, Thomas Neumann, Josiane Parreira, Gerhard Weikum.


Presentation Transcript


  1. Efficient Top-k Querying over Social Tagging Networks. Ralf Schenkel; joint work with Tom Crecelius, Mouna Kacimi, Sebastian Michel, Thomas Neumann, Josiane Parreira, Gerhard Weikum

  2. Social Tagging Networks. Common examples: • Flickr (images) • YouTube (videos) • del.icio.us (bookmarks) • LibraryThing (books) • Discogs (CDs) • CiteULike (papers) • Facebook • Myspace (media). Definition: a social tagging network is a website where people • publish and tag information • review and rate information • publish their interests • maintain a network of friends • interact with friends. SIGIR, Singapore

  3. Outline • Search in Social Tagging Networks • Graph Model • Different Information Needs • Effective Query Scoring • Efficient Query Evaluation • Summary & Further Challenges

  4-6. Social Network Model [figure, built up over three slides: graph connecting USERS, TAGS and ITEMS; example labels include travel, queues, probability, tripvldb, travelChina, queueingtheory, travelNorway, harrypotter]

  7. Components of a Social Tagging Network: graph $G=(U \cup I,\; E_U \cup E_I \cup E_{UI})$ with • 2 types of nodes: • users U (optionally weighted) • items I (optionally weighted) • 3 types of edges: • $E_U$: user-user (optionally weighted) • $E_I$: item-item (optionally weighted) • $E_{UI}$: user-item (labeled with tags T, optionally weighted)
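
A minimal sketch of how the graph model above could be held in memory, assuming plain Python dictionaries keyed by user and item IDs. The class `TaggingGraph`, its method names, the default weight of 1.0 and the symmetric storage of friendship and item-item edges are hypothetical choices made only for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class TaggingGraph:
    """Hypothetical in-memory form of G = (U ∪ I, E_U ∪ E_I ∪ E_UI)."""
    user_weight: dict = field(default_factory=dict)  # U, optionally weighted
    item_weight: dict = field(default_factory=dict)  # I, optionally weighted
    user_edges: dict = field(default_factory=lambda: defaultdict(dict))  # E_U
    item_edges: dict = field(default_factory=lambda: defaultdict(dict))  # E_I
    # E_UI: (user, item) -> {tag: weight}; every user-item edge carries tag labels
    tag_edges: dict = field(default_factory=lambda: defaultdict(dict))

    def befriend(self, u, v, weight=1.0):
        # E_U edge, stored in both directions (assumption: undirected friendship)
        self.user_weight.setdefault(u, 1.0)
        self.user_weight.setdefault(v, 1.0)
        self.user_edges[u][v] = weight
        self.user_edges[v][u] = weight

    def link_items(self, i, j, weight=1.0):
        # E_I edge between two related items (assumption: undirected)
        self.item_weight.setdefault(i, 1.0)
        self.item_weight.setdefault(j, 1.0)
        self.item_edges[i][j] = weight
        self.item_edges[j][i] = weight

    def tag(self, u, i, t, weight=1.0):
        # E_UI edge from user u to item i, labeled with tag t
        self.user_weight.setdefault(u, 1.0)
        self.item_weight.setdefault(i, 1.0)
        self.tag_edges[(u, i)][t] = weight
```

For example, `g.befriend("alice", "bob")` followed by `g.tag("alice", "img1", "travelChina")` creates two users, one item, one friendship edge and one tagged user-item edge.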

  8. Information Need 1: Global. Tags by all users are equally important. [figure: example tagging graph of USERS, TAGS and ITEMS with the query "harry potter"]

  9. Information Need 2: Similar Users. Tags by users with similar tags/items („brothers in spirit“) are more important. [figure: the same graph; a querying user (?) issues the query "travel"]

  10. Information Need 3: Trusted Friends. Tags by closely related users are more important. [figure: the same graph; a querying user (?) issues the query "probability"]

  11. Wishlist for Social-Aware Social Search • Search results depend on • global popularity of items • collection context of the querying user (books, tags) • social context of the querying user (trusted friends) • Scalable query processing (a similar wishlist applies to social recommendations)

  12. Outline • Search in Social Tagging Networks • Effective Query Scoring • Quantifying Friendship Strengths • User-specific Scoring Functions • Experimental Evaluation • Efficient Query Evaluation • Summary & Further Challenges

  13. Notation • U: set of users • T: set of tags • I: set of items • tags(u): tags used by user u • items(u): items tagged by user u • items(t): items tagged with tag t by at least one user • df(t): number of items tagged with tag t • $tf_u(i,t)$: number of times user u tagged item i with tag t • $tf(i,t)$: number of times item i was tagged with tag t [figure: user $u_j$ tags item $i_1$ with tags $t_{11} \dots t_{1m_1}$, …, item $i_n$ with tags $t_{n1} \dots t_{nm_n}$]
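
The notation above can be computed directly from raw tagging events. A minimal sketch, assuming the input is a list of (user, item, tag) triples with one triple per tagging action (so a repeated triple increases $tf_u(i,t)$); the function name `build_statistics` and the Counter-based layout are hypothetical.

```python
from collections import Counter, defaultdict


def build_statistics(triples):
    """Derive the slide's notation from (user, item, tag) tagging events."""
    tags_of_user = defaultdict(set)    # tags(u)
    items_of_user = defaultdict(set)   # items(u)
    items_of_tag = defaultdict(set)    # items(t): tagged with t by at least one user
    tf_user = Counter()                # tf_u(i,t), keyed by (u, i, t)
    tf = Counter()                     # tf(i,t), keyed by (i, t)
    for u, i, t in triples:
        tags_of_user[u].add(t)
        items_of_user[u].add(i)
        items_of_tag[t].add(i)
        tf_user[(u, i, t)] += 1
        tf[(i, t)] += 1
    df = {t: len(items) for t, items in items_of_tag.items()}  # df(t)
    return tags_of_user, items_of_user, items_of_tag, df, tf_user, tf
```

Since tf(i,t) sums the per-user counts, $tf(i,t) = \sum_{u} tf_u(i,t)$ holds by construction in this sketch.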

  14. Quantifying Friendship Strengths • Global „friendship“ strength: • content-based friendship strength • graph-based friendship strength • integrated friendship strength

  15. Content-Based Friendship Strength • Several alternatives: • based on overlap of tag usage (tags(u) vs. tags(u‘)) • based on overlap of tagged items (items(u) vs. items(u‘)) • For both: • $P_{content}(u,u) := 0$ • normalization such that $\sum_{u' \in U} P_{content}(u,u') = 1$
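
The slide does not spell out the overlap measure. A minimal sketch, assuming Jaccard overlap between the tag sets of two users (pass item sets instead to get the item-based variant); the Jaccard choice and the function name are assumptions, while $P_{content}(u,u)=0$ and the per-user normalization to 1 follow the slide.

```python
def content_friendship(profile):
    """profile: user -> set of tags (or items). Returns {u: {u': P_content(u,u')}}.

    Assumption: raw overlap is the Jaccard coefficient of the two sets.
    """
    users = list(profile)
    p = {}
    for u in users:
        raw = {}
        for v in users:
            if v == u:
                continue  # P_content(u,u) := 0, so u gets no entry for itself
            union = profile[u] | profile[v]
            overlap = len(profile[u] & profile[v]) / len(union) if union else 0.0
            if overlap > 0:
                raw[v] = overlap
        total = sum(raw.values())
        # normalize so that the strengths over all u' sum to 1
        p[u] = {v: s / total for v, s in raw.items()} if total else {}
    return p
```

For tag sets a = {travel, china}, b = {travel}, c = {travel, china, norway}, the raw overlaps of a are 1/2 (with b) and 2/3 (with c), which normalize to 3/7 and 4/7.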

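  16. Graph-Based Friendship Strength $P_{graph}(u,u')$ • Computed on the friendship graph, with either unweighted edges or edges weighted with $P_{content}$ [figure: example friendship graphs over users u1…u7 for both variants] • For both: • $P_{graph}(u,u) := 0$ • normalization such that $\sum_{u' \in U} P_{graph}(u,u') = 1$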
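
The slide leaves open how the path structure turns into a strength value. One possible sketch, assuming the strength of u‘ is the maximum product of edge weights over all paths from u to u‘, computed with a max-product variant of Dijkstra's algorithm (correct here because all weights lie in (0,1], so path products never grow). For the unweighted variant, a hypothetical constant per-hop weight such as 0.5 can be used. Only $P_{graph}(u,u)=0$ and the normalization come from the slide.

```python
import heapq


def graph_friendship(edges, source):
    """edges: user -> {neighbor: weight in (0, 1]}. Returns {u': P_graph(source,u')}.

    Assumption: raw strength = max product of edge weights over all paths.
    """
    best = {source: 1.0}
    heap = [(-1.0, source)]   # max-heap via negated path products
    done = set()
    while heap:
        neg, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        for v, w in edges.get(u, {}).items():
            cand = -neg * w   # extend the best path to u by edge (u, v)
            if cand > best.get(v, 0.0):
                best[v] = cand
                heapq.heappush(heap, (-cand, v))
    best.pop(source)          # P_graph(u,u) := 0
    total = sum(best.values())
    return {v: s / total for v, s in best.items()} if total else {}


def unweighted(friends, per_hop=0.5):
    """Hypothetical helper: give every unweighted friendship edge the same weight."""
    return {u: {v: per_hop for v in nbrs} for u, nbrs in friends.items()}
```

With per-hop weight 0.5, a friend-of-a-friend ends up with half the raw strength of a direct friend before normalization.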

  17. Integrated Friendship Similarity $P_{int}(u,u')$: mixture of • content-based similarity • graph-based friendship similarity • a global background model, with all mixture weights between 0 and 1
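
A sketch of the mixture described above, assuming the background model is uniform over all users (1/|U|) and receives whatever weight the content and graph components leave over. The weight names `w_content` and `w_graph`, the uniform background and the function name are assumptions, since the slide's symbols did not survive.

```python
def integrated_friendship(p_content, p_graph, users, u, w_content, w_graph):
    """P_int(u,u') for all u' as a mixture of content, graph and background.

    Assumptions: weights in [0,1] with w_content + w_graph <= 1; the remaining
    mass goes to a uniform background model 1/|U|.
    """
    if not (0 <= w_content <= 1 and 0 <= w_graph <= 1 and w_content + w_graph <= 1):
        raise ValueError("mixture weights must lie in [0,1] and sum to at most 1")
    w_background = 1.0 - w_content - w_graph
    background = 1.0 / len(users)
    return {
        v: w_content * p_content.get(u, {}).get(v, 0.0)
        + w_graph * p_graph.get(u, {}).get(v, 0.0)
        + w_background * background
        for v in users
    }
```

If both input distributions sum to 1 for user u, the result also sums to 1, so $P_{int}$ stays a distribution over users.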

  18. Towards a User-specific Score • Start from the global friendship strength • Convert it into a user-specific social frequency • Define a user-specific social score on top of it
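
The formulas of this slide did not survive. A sketch of one plausible reading, assuming $sf_u(i,t) = \alpha\, tf(i,t) + (1-\alpha) \sum_{u'} P(u,u')\, tf_{u'}(i,t)$ and a BM25-style saturation of $sf_u$ (slide 28 mentions a BM25 model). The α-mixing, the idf formula, the constant k1 = 1.2 and the omission of length normalization are assumptions, not the authors' confirmed definitions.

```python
import math


def social_frequency(tf_it, friend_tfs, alpha):
    """Assumed sf_u(i,t) = alpha * tf(i,t) + (1 - alpha) * sum_u' P(u,u') * tf_u'(i,t).

    friend_tfs: list of (P(u,u'), tf_u'(i,t)) pairs for the friends of u.
    """
    tf_u = sum(p * tf for p, tf in friend_tfs)  # user-dependent part
    return alpha * tf_it + (1 - alpha) * tf_u


def social_score(sf, df_t, num_items, k1=1.2):
    """Assumed BM25-style score_u(i,t): saturating in sf, weighted by idf of df(t)."""
    idf = math.log(1 + (num_items - df_t + 0.5) / (df_t + 0.5))
    return (k1 + 1) * sf / (k1 + sf) * idf
```

For example, `social_frequency(4, [(0.5, 2), (0.5, 0)], 0.5)` gives 0.5 · 4 + 0.5 · 1 = 2.5. A query score would then sum `social_score` over the query tags.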

  19. Including Tag Expansion • Problem: users use different tags for similar things → poor recall (relevant results are missed). Example: MPI, MPII, MPI-INF, MPI-CS, Max-Planck-Institut, D5, AG5, DB&IS, UdS, Saarland University, … • Solution: 1. Define a notion of similar tags 2. Expand queries with similar tags 3. Modify the scoring function for expanded queries

  20. Heuristics for Finding Similar Tags • Specialization heuristics: tag t2 is a specialization of t1 if t1 occurs (almost) whenever t2 occurs • Co-occurrence heuristics: tags t1 and t2 are similar if they (almost) always occur together
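
Both heuristics can be phrased as thresholds on conditional co-occurrence. A minimal sketch, assuming occurrences are counted per item (via items(t)) and that "almost" means a hypothetical threshold of 0.9; the function names are also hypothetical.

```python
def is_specialization(items_of_tag, t1, t2, threshold=0.9):
    """t2 specializes t1 if t1 occurs (almost) whenever t2 occurs.

    Assumption: 'whenever' is measured over items, i.e.
    |items(t1) ∩ items(t2)| / |items(t2)| >= threshold.
    """
    i2 = items_of_tag.get(t2, set())
    if not i2:
        return False
    return len(i2 & items_of_tag.get(t1, set())) / len(i2) >= threshold


def are_cooccurring(items_of_tag, t1, t2, threshold=0.9):
    """t1 and t2 are similar if each (almost) always occurs with the other."""
    return (is_specialization(items_of_tag, t1, t2, threshold)
            and is_specialization(items_of_tag, t2, t1, threshold))
```

With items(mpi) = {1,2,3,4} and items(mpi-inf) = {1,2}, mpi-inf counts as a specialization of mpi, but the two are not co-occurrence-similar.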

  21. Scoring Expanded Queries • Naive approach: for query tag t, add all similar tags t‘ with sim(t,t‘) > δ to the query • But: „transportation disaster“ is expanded by „train car bus plane …“, and „international crime“ by „mafia camorra yakuza …“ → result quality drops due to topic drift • Better: auto-tuning incremental expansion [SIGIR’05]: for query tag t, consider only the expansion with the highest combined score per item
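
A sketch of the per-item aggregation rule above, assuming each query tag t has a precomputed map of expansion tags t‘ with their similarities sim(t,t‘) (already filtered by δ), and a per-tag scoring function. The names are hypothetical, and the sketch shows only the max-instead-of-sum rule, not the incremental, auto-tuning list scheduling of [SIGIR’05].

```python
def expanded_tag_score(item, tag, expansions, score):
    """Score of `item` for query tag `tag` under expansion.

    expansions: {t': sim(tag, t')} (hypothetical input, already filtered by delta).
    score(item, t'): per-tag score, e.g. a user-specific score.
    Only the single best expansion counts per item, which limits topic drift.
    """
    candidates = {**expansions, tag: 1.0}  # the query tag itself has sim(t,t) = 1
    return max(sim * score(item, t2) for t2, sim in candidates.items())


def expanded_query_score(item, query, expansions_of, score):
    """Sum the per-tag expanded scores over all query tags."""
    return sum(expanded_tag_score(item, t, expansions_of.get(t, {}), score)
               for t in query)
```

An item matching many weakly similar expansion tags thus gains no more than its single best match, unlike the naive summing approach.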

  22. Experimental Evaluation: Effectiveness • Systematic evaluation of result quality is difficult • Three setups: • manual queries + human assessments • queries + assessments derived from external information (e.g., DMOZ categories) • automated assessments from the context of the user: • items tagged by the user and/or friends • items tagged in the future

  23. Prototype Implementation

  24. Preliminary User Study • LibraryThing user study [Data Engineering Bulletin, June 2008]: • 6 LibraryThing users with reasonably large libraries and friend sets • 49 queries overall • Crawled (part of) LibraryThing: ~1.3 million books, ~15 million tags, ~12,000 users, ~18,000 friends • Measured NDCG@10 [plots: NDCG@10 over (1-α), for content-based and for graph-based friendship] • Result quality generally very high • Limited social influence works best (not enough friends?) • Tag expansion has limited influence on results
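
NDCG@10 compares the top-10 ranking against the ideal ordering of the judged items. A minimal sketch using graded relevance gains and the common $\log_2(r+1)$ rank discount; the slide does not state which gain and discount variant the study used, so this choice is an assumption.

```python
import math


def dcg(gains, k):
    # the item at rank r (1-based) is discounted by log2(r + 1)
    return sum(g / math.log2(r + 1) for r, g in enumerate(gains[:k], start=1))


def ndcg(ranked_gains, all_gains, k=10):
    """ranked_gains: relevance of the returned items in rank order.
    all_gains: relevance of every judged item (defines the ideal ranking)."""
    ideal = dcg(sorted(all_gains, reverse=True), k)
    return dcg(ranked_gains, k) / ideal if ideal > 0 else 0.0
```

A perfect ranking yields 1.0; placing a relevant item at rank 3 instead of rank 2 lowers the value below 1.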

  25. Outline • Search in Social Tagging Networks • Effective Query Scoring • Efficient Query Evaluation • Threshold Algorithms • ContextMerge • Experimental Evaluation • Summary & Further Challenges

  26. Algorithmic Overview • Input: query q={t1…tn} for user u, plus the scoring parameters (α and the friendship-mixture weights) • Output: the k items with the highest scores • Goals: • avoid computing all results • minimize disk I/O and CPU load • utilize precomputed information on disk

  27. Excursion: Threshold Algorithms for Text IR • Input: • query q={t1…tn} • lists L(tp) with pairs <i,score(i,tp)>, sorted by score(i,tp)↓ • Output: k items with the highest aggregated score • Algorithm: • scan lists in parallel • maintain partial candidate results with score bounds • terminate as soon as the top-k results are stable
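
A minimal sketch of one threshold-algorithm variant for the setting above: NRA-style processing with sorted accesses only and summation as the aggregation function. The function name and input format are assumptions; it returns the top-k set with worst-case (lower-bound) scores, which become exact once all lists are exhausted.

```python
import heapq


def nra_topk(lists, k):
    """Top-k items by summed score, using sorted accesses only (NRA-style).

    lists: one list per query term of (item, score) pairs sorted by score
    descending; scores are assumed non-negative and k >= 1.
    """
    n = len(lists)
    seen = {}            # item -> {list index: score read so far}
    pos = [0] * n        # next unread position in each list
    while True:
        # one round of sorted accesses: read one entry from every list
        for p, lst in enumerate(lists):
            if pos[p] < len(lst):
                item, score = lst[pos[p]]
                seen.setdefault(item, {})[p] = score
                pos[p] += 1
        # high[p]: upper bound for any score not yet read from list p
        high = [lst[pos[p]][1] if pos[p] < len(lst) else 0.0
                for p, lst in enumerate(lists)]
        worst = {i: sum(sc.values()) for i, sc in seen.items()}
        best = {i: worst[i] + sum(high[p] for p in range(n) if p not in sc)
                for i, sc in seen.items()}
        top = heapq.nlargest(k, worst, key=worst.get)
        if all(pos[p] >= len(lst) for p, lst in enumerate(lists)):
            return [(i, worst[i]) for i in top]  # all lists read: scores exact
        if len(top) == k:
            min_k = worst[top[-1]]
            top_set = set(top)
            # stable: neither an unseen item nor a seen non-top candidate
            # can still exceed the k-th best worst-case score
            if sum(high) <= min_k and all(
                    best[i] <= min_k for i in seen if i not in top_set):
                return [(i, worst[i]) for i in top]
```

In the test data below, the top-1 result is found after two rounds, before the last entries of either list are read.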

  28. Excursion: Threshold Algorithms • Many powerful extensions: • probabilistic pruning of candidates with guarantees on result quality • random accesses to index lists • scheduling of scans and random accesses • dynamic query expansion techniques • hierarchical top-k for phrases • structured queries for XML • Most variants are provably instance-optimal • But: it is impossible to precompute $score_u(i,t)$ (that would mean materializing a BM25 model per user and configuration) → Threshold Algorithms cannot be applied directly

  29. Revisiting the Social Frequency • $sf_u(i,t)$ consists of a part independent of user u and a part dependent on user u • Compute $sf_u(i,t)$ on the fly from tf(i,t), the friends of u, and their tagged documents

  30. ContextMerge (special case with one friendship-mixture weight set to 0) • Precomputed lists: • ITEMS(t): pairs <i, tf(i,t)>, sorted by tf(i,t)↓ • FRIENDS(u): pairs <u‘, $P_{graph}(u,u')$>, sorted by $P_{graph}(u,u')$↓ • USERITEMS(u‘,t): pairs <i, $tf_{u'}(i,t)$>, unsorted • Adapted Threshold Algorithm for query u, t1…tn: • scan ITEMS(tp) and n copies of FRIENDS(u) (one per query tag, FRIENDS(u,p)); pick the „best“ list • if ITEMS(tp): read the next entry • if FRIENDS(u,p): read USERITEMS(u‘,tp) for the next friend u‘ • update candidates and top-k • check for termination
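
The three precomputed lists can be built offline from raw tagging events. A minimal sketch, assuming (user, item, tag) triples and an already computed $P_{graph}$ map; the function name and the in-memory dictionaries are hypothetical stand-ins for on-disk index lists.

```python
from collections import defaultdict


def build_lists(triples, p_graph):
    """Build ITEMS(t), FRIENDS(u) and USERITEMS(u',t) as in-memory dicts (sketch).

    triples: (user, item, tag) tagging events; p_graph: u -> {u': P_graph(u,u')}.
    """
    tf = defaultdict(int)                              # (t, i) -> tf(i,t)
    useritems = defaultdict(lambda: defaultdict(int))  # (u', t) -> {i: tf_u'(i,t)}
    for u, i, t in triples:
        tf[(t, i)] += 1
        useritems[(u, t)][i] += 1
    items = defaultdict(list)
    for (t, i), f in tf.items():
        items[t].append((i, f))
    for t in items:
        items[t].sort(key=lambda e: e[1], reverse=True)  # ITEMS(t): tf descending
    friends = {u: sorted(nbrs.items(), key=lambda e: e[1], reverse=True)
               for u, nbrs in p_graph.items()}           # FRIENDS(u): P_graph desc.
    # USERITEMS(u',t) stays unsorted: it is only read as a whole per friend
    return dict(items), friends, {k: dict(v) for k, v in useritems.items()}
```

Only ITEMS and FRIENDS need a sort order, because they are scanned incrementally; a USERITEMS list is always read completely once its friend is reached.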

  31. ContextMerge: Candidates • Each candidate item c maintains, for each query term t: • tf(t): value read from ITEMS(t), or UNDEF • $tf_u(t)$: sum of the values read from USERITEMS(u‘,t), weighted by $P_{graph}(u,u')$ • c(t): unweighted sum of the values read from USERITEMS(u‘,t) • To compute worstscore(c): • plug tf(t) and $tf_u(t)$ into the definition of $sf_u(t)$ (using 0 if UNDEF) • plug $sf_u(t)$ into the definition of $score_u(t)$

  32. ContextMerge: Candidates • To compute bestscore(c): • if tf(t)=UNDEF (not yet seen in ITEMS(t)): use $tf(t)=high_t$ and $tf_u(t)=highF_t \cdot (high_t - c(t))$ • else (already seen in ITEMS(t)): use $tf_u(t)=highF_t \cdot (tf(t) - c(t))$ • plug these values into the definition of $sf_u$ as before • where $high_t$: current high score in ITEMS(t); $highF_t$: current high score in FRIENDS(u,t)
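
A sketch of the per-term bounds for one candidate, assuming $sf_u = \alpha\, tf + (1-\alpha)\, tf_u$ and a BM25-style saturation, both of which are assumptions. In the best case the code adds the bound $highF_t \cdot (tf - c)$ for not-yet-read friends to the weighted sum already accumulated in $tf_u(t)$; this is one reading of the slide's update, not a confirmed detail. `None` plays the role of UNDEF, and all names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TermState:
    """Per-term bookkeeping of a candidate item c (field names follow the slide)."""
    tf: Optional[float] = None  # tf(t) from ITEMS(t); None stands for UNDEF
    tfu: float = 0.0            # USERITEMS values weighted by P_graph(u,u')
    c: float = 0.0              # unweighted sum of USERITEMS values


def sf(tf, tfu, alpha):
    # assumed social frequency: alpha * tf + (1 - alpha) * tf_u
    return alpha * tf + (1 - alpha) * tfu


def term_score(s, idf, k1=1.2):
    # assumed BM25-style saturation; monotone in s, so plugging bounds is safe
    return (k1 + 1) * s / (k1 + s) * idf


def worst_term(st, alpha, idf):
    tf = st.tf if st.tf is not None else 0.0  # UNDEF counts as 0
    return term_score(sf(tf, st.tfu, alpha), idf)


def best_term(st, alpha, idf, high_t, high_f_t):
    tf = high_t if st.tf is None else st.tf   # unseen in ITEMS(t): tf <= high_t
    # unread friends have P_graph <= highF_t and at most tf - c occurrences left
    tfu = st.tfu + high_f_t * max(tf - st.c, 0.0)
    return term_score(sf(tf, tfu, alpha), idf)


def candidate_bounds(states, alpha, idf, high, high_f):
    """(worstscore(c), bestscore(c)) summed over query terms; states: t -> TermState."""
    worst = sum(worst_term(st, alpha, idf[t]) for t, st in states.items())
    best = sum(best_term(st, alpha, idf[t], high[t], high_f[t])
               for t, st in states.items())
    return worst, best
```

A candidate can be dropped once its bestscore falls below the worstscore of the current k-th result.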

  33. ContextMerge: List Selection • Lists are greedily selected by highest expected score: • ITEMS(t): compute $sf_u(t)$ and $score_u(t)$ with $tf(t)=high_t$, $tf_u(t)=0$ • FRIENDS(u,t): compute $sf_u(t)$ and $score_u(t)$ with $tf(t)=0$, $tf_u(t)=highF_t \cdot maxtf$, where $maxtf = \max_{u,t} tf_u(t)$
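
The greedy selection rule above can be sketched as follows, again assuming $sf_u = \alpha\, tf + (1-\alpha)\, tf_u$ and a BM25-style saturation. The function names and dictionary inputs are hypothetical; the caller is assumed to pass current high values of 0 for exhausted lists.

```python
def term_score(s, idf_t, k1=1.2):
    # assumed BM25-style saturation of the social frequency s
    return (k1 + 1) * s / (k1 + s) * idf_t


def pick_next_list(terms, alpha, idf, high, high_f, maxtf):
    """Return ("ITEMS", t) or ("FRIENDS", t) with the highest expected score.

    high[t]: current high in ITEMS(t); high_f[t]: current high in FRIENDS(u,t);
    maxtf: max over users and tags of tf_u(t).
    """
    best_list, best_value = None, -1.0
    for t in terms:
        # ITEMS(t): tf(t) = high_t, tf_u(t) = 0
        items_value = term_score(alpha * high[t], idf[t])
        # FRIENDS(u,t): tf(t) = 0, tf_u(t) = highF_t * maxtf
        friends_value = term_score((1 - alpha) * high_f[t] * maxtf, idf[t])
        for kind, value in (("ITEMS", items_value), ("FRIENDS", friends_value)):
            if value > best_value:
                best_list, best_value = (kind, t), value
    return best_list
```

With α = 1 the friend lists never win, so the procedure degenerates into scanning only the global ITEMS lists.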

  34-38. ContextMerge: Schematic execution [animated figure over five slides: lists Items(t1), Items(t2), Friends(u,t1), Friends(u,t2), and the considered USERITEMS(u‘,t1) and USERITEMS(u‘,t2); friend u7 is highlighted in some steps]

  39. Experimental Evaluation: Efficiency • Testbed: 3 large crawls of real social networks: • Flickr: 10 million pictures, ~50,000 users • del.icio.us: ~175,000 bookmarks, ~12,000 users • LibraryThing: ~6.5 million books, ~10,000 users • Queries: • ~150 frequent tag pairs per data set • for each query, pick a user with „enough“ results and friends • Cost measure: #sorted accesses + 100 · #random accesses • Baseline: full join + sort

  40. Experimental Evaluation: Efficiency [plot; x-axis: α]

  41. Outline • Search in Social Tagging Networks • Effective Query Scoring • Efficient Query Evaluation • Summary & Further Challenges

  42. Summary • Need for social-aware social search, supporting • global • social • spiritual information needs • Social scoring • integrating global, collection, and social context • including dynamic tag expansion • ContextMerge: scalable implementation

  43. Further Challenges • Meaningful and common benchmark • Incremental maintenance for highly dynamic data • Extension to ratings, user weights, item weights, … • Extension to non-tag features (like image features) • Automatic query parameterization • Meaningful explanations of results • Exploiting dynamics (hot topics, evolving groups, …) • Social-aware search and recommendations at planet scale
