
Characterizing Web Content, User Interests, and Search Behavior by Reading Level and Topic

Jin Young Kim*, Kevyn Collins-Thompson, Paul Bennett and Susan Dumais. *Work done during internship at Microsoft Research.


Presentation Transcript


  1. Characterizing Web Content, User Interests, and Search Behavior by Reading Level and Topic

    Jin Young Kim*, Kevyn Collins-Thompson, Paul Bennett and Susan Dumais *Work done during internship at Microsoft Research
  2. Search and recommendation are about matching.

    Queries, Documents, Websites, Users
  3. Term-space matching is not always a good idea.

    Granularity, Sparsity, Efficiency
  4. Can we build representations beyond term vectors?

    Topic Category, Reading Level, Sentiment, Style
  5. What would be their implications for search and recommendations?

    Queries, Documents, Websites, Users; Topic Category, Reading Level, Sentiment, Style
  6. In a Nutshell

    WHAT WE DID: Build Profiles of Reading Level and Topic (RLT) for queries, websites, users and search sessions, in order to characterize and compare entities.

    WHAT WE FOUND: Profile matching predicts user’s content preference. Profiles can indicate when not to personalize. Profile features can predict expert content.
  7. Building Reading Level and Topic Profiles
  8. Predicting Reading Level and Topic for URL

    Reading Level Classifier: based on language model and other sources.

    Topic Classifier: trained using URLs in each Open Directory Project category.

    Profile: distribution over reading level, topic, or reading level and topic (RLT).

    (Diagram: P(R|d1), P(T|d1))
  9. Entity Profile Built from Related URLs

    Entities and Related URLs. Websites: content vs. user-viewed URLs. Users: URLs visited during search sessions. Queries: top-10 retrieved URLs.

    Example: Site profile made from URLs visited during search sessions.

    (Diagram: per-URL distributions P(R|d1) and P(T|d1), repeated for each URL, leading to the site profile P(R,T|s))
  10. Entity Profile Built with Related Entities

    Entity and related entities. User: websites visited. Website: surfacing queries. Query: issuing users.

    Example: Site profile made from the profiles of its visitors.

    (Diagram: nodes Query, Website, User with edges labeled Surface, Issue, Visit; visitor profiles P(R,T|u) leading to the site profile P(R,T|s))
  11. Characterizing and Comparing Profiles

    Characterizing an Individual Entity. Mean: expectation. Variance: entropy.

    Characterizing a Group of Entities. Build a group centroid from its members. Variance: divergence among members.

    Comparing Entities and Groups. Difference in mean. Divergence in profile (distribution).
  12. Characterizing Web Content, User Interests, and Search Behavior
  13. Data Set

    Session Log Data: 2,281,150 URL visits (1,218,433 SERP clicks), collected from 8,841 users.

    Profiles of Entities: 4,715 websites with 25+ clicked URLs; 7,613 users with 25+ URL visits; 141,325 unique queries.
  14. Reading Level Distribution for Top ODP Categories

    Each topic has a different reading level distribution.
  15. Topic and reading level characterize websites in each category
  16. Profile matching predicts user’s preference over search results

    Metric: % of user’s preferences predicted by profile matching, for each clicked website over the skipped website above.

    Results: by degree of focus in user profile, H(R,T|u); by the distance metric between user and website, KL_R(u,s) / KL_T(u,s) / KL_RLT(u,s).
  17. Users’ Deviation from Their Own Profiles

    Stretch reading: session-level reading level >> long-term reading level.

    Casual reading: session-level reading level << long-term reading level.
  18. Comparing Expert vs. Non-expert URLs

    Expert vs. Non-expert URLs taken from [White’09]
  19. Predicting Expert vs. Novice Websites

    (Slide panels: Results; Features)
  20. Thank you for your attention!

    WHAT WE DID: Build Profiles of Reading Level and Topic (RLT) for queries, websites, users and search sessions, to characterize and compare entities.

    WHAT WE FOUND: Profile matching predicts user’s content preference. Profiles can indicate when not to personalize. Profile features can predict expert content.

    More at: @jin4ir / cs.umass.edu/~jykim
  21. Optional Slides
  22. Correlation between Site vs. Visitor Profiles

    Website reading level vs. visitor diversity. Breakdown per topic reveals a stronger relationship.
  23. Query / User Reading Level against P(Topic)

    User profile shows different trends in Computers.
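
The profile operations named in the slides (building an entity profile from related URLs, characterizing it by expectation and entropy, and comparing a user to a website by KL divergence) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the slides do not say how per-URL distributions are combined, so simple averaging is assumed; the function names, the `eps` floor used in place of proper smoothing, the direction KL(user || site), and all example values are hypothetical.

```python
import math
from collections import defaultdict


def aggregate_profile(url_profiles):
    """Average per-URL distributions P(R,T|d) into one entity profile.

    Each profile maps (reading_level, topic) -> probability.
    Averaging is an assumption; the slides do not specify the method.
    """
    totals = defaultdict(float)
    for profile in url_profiles:
        for key, prob in profile.items():
            totals[key] += prob
    count = len(url_profiles)
    return {key: total / count for key, total in totals.items()}


def expected_reading_level(profile):
    """Mean of the profile: expectation of the reading level."""
    return sum(level * prob for (level, _topic), prob in profile.items())


def entropy(profile):
    """Spread of the profile: Shannon entropy in bits, e.g. H(R,T|u)."""
    return -sum(p * math.log2(p) for p in profile.values() if p > 0)


def kl_divergence(p, q, eps=1e-9):
    """KL(p || q) in bits; missing or zero mass in q is floored at eps.

    The floor is a crude stand-in for smoothing (q is not renormalized).
    """
    return sum(
        prob * math.log2(prob / max(q.get(key, 0.0), eps))
        for key, prob in p.items()
        if prob > 0
    )


def preferred_site(user_profile, site_profiles):
    """Pick the site whose profile is closest to the user's (lowest KL)."""
    return min(
        site_profiles,
        key=lambda name: kl_divergence(user_profile, site_profiles[name]),
    )


# Hypothetical data: reading levels are grade-like integers, topics are labels.
kids_site = aggregate_profile([
    {(4, "Kids"): 0.5, (6, "Kids"): 0.5},
    {(4, "Kids"): 1.0},
])
science_site = {(12, "Science"): 1.0}
user = {(4, "Kids"): 0.5, (6, "Kids"): 0.5}

print(expected_reading_level(kids_site))
print(entropy(kids_site))
print(preferred_site(user, {"kids": kids_site, "science": science_site}))
```

In this toy setup, a user whose profile concentrates on low reading levels in one topic ends up closer to the site with a similar profile, which is the intuition behind using profile divergence to predict which search result a user prefers.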