Characterizing Web Content, User Interests, and Search Behavior by Reading Level and Topic

Jin Young Kim*, Kevyn Collins-Thompson, Paul Bennett and Susan Dumais
*Work done during internship at Microsoft Research

Search and recommendation are about matching.
Queries Documents Websites Users

Term-space matching is not always a good idea.
Granularity Sparsity Efficiency

Can we build representations beyond term vectors?
Topic Category Reading Level Sentiment Style

What would be their implications for search and recommendations?
Queries Documents Websites Users
Topic Category Reading Level Sentiment Style

In a Nutshell
WHAT WE DID: Build Profiles of Reading Level and Topic (RLT)
  For queries, websites, users and search sessions
  In order to characterize and compare entities
WHAT WE FOUND:
  Profile matching predicts user’s content preference
  Profiles can indicate when not to personalize
  Profile features can predict expert content

Building Reading Level and Topic Profiles

Predicting Reading Level and Topic for URL
Reading Level Classifier
  Based on language model and other sources
Topic Classifier
  Trained using URLs in each Open Directory Project category
Profile
  Distribution over reading level, topic, or reading level and topic (RLT)
  Example for a URL d1: P(R|d1), P(T|d1)
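The slide does not say how the joint reading level and topic distribution is obtained from the two classifiers. As a minimal sketch, assuming reading level and topic are independent given the URL, the joint profile could be formed as a product of the two marginals. The function name, the level labels, and the topic labels below are hypothetical.

```python
def joint_profile(p_reading, p_topic):
    """Return P(R,T|d) keyed by (level, topic).

    Assumption (not stated in the slides): R and T are independent
    given the document, so P(R,T|d) = P(R|d) * P(T|d).
    """
    return {
        (level, topic): p_r * p_t
        for level, p_r in p_reading.items()
        for topic, p_t in p_topic.items()
    }


# Hypothetical classifier outputs for one URL d1.
p_r = {1: 0.2, 2: 0.5, 3: 0.3}            # P(R|d1)
p_t = {"Science": 0.75, "Sports": 0.25}   # P(T|d1)
p_rt = joint_profile(p_r, p_t)            # P(R,T|d1)
```

If the classifiers were trained jointly, the joint distribution would come from the model directly and this product would not be needed.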

Entity Profile Built from Related URLs
Entities and Related URLs
  Websites: content vs. user-viewed URLs
  Users: URLs visited during search sessions
  Queries: top-10 retrieved URLs
Example: Site profile made from URLs visited during search sessions
[Figure: per-URL distributions P(R|d) and P(T|d) aggregated into the site profile P(R,T|s)]
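One simple way to build an entity profile from its related URLs is to average the per-URL distributions. This is only a sketch: the slides do not specify the aggregation, and uniform averaging, the function name, and the example values are assumptions.

```python
from collections import defaultdict


def entity_profile(url_profiles):
    """Average per-URL distributions (dicts) into one entity profile.

    Assumption: every related URL contributes equally.
    """
    if not url_profiles:
        raise ValueError("need at least one URL profile")
    total = defaultdict(float)
    for profile in url_profiles:
        for key, prob in profile.items():
            total[key] += prob
    n = len(url_profiles)
    return {key: mass / n for key, mass in total.items()}


# Hypothetical reading-level distributions for three URLs of one site.
site = entity_profile([
    {1: 0.5, 2: 0.5},
    {2: 1.0},
    {2: 0.5, 3: 0.5},
])
```

The same function would apply to users (URLs visited) and queries (top-10 retrieved URLs), since only the set of related URLs changes.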

Entity Profile Built with Related Entities
Entity and related entities
  User: websites visited
  Website: surfacing queries
  Query: issuing users
Example: Site profile made from the profiles of its visitors
[Figure: queries surface websites, users issue queries, users visit websites; visitor profiles P(R,T|u) aggregated into the site profile P(R,T|s)]

Characterizing and Comparing Profiles
Characterizing an Individual Entity
  Mean: expectation
  Variance: entropy
Characterizing a Group of Entities
  Build a group centroid from its members
  Variance: divergence among members
Comparing Entities and Groups
  Difference in mean
  Divergence in profile (distribution)
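The three summary measures named above (expectation, entropy, divergence) can be sketched as follows. The KL direction, the smoothing constant eps, and all names and example values are assumptions made for illustration.

```python
import math


def expected_level(p_reading):
    """Mean of a profile over numeric reading levels."""
    return sum(level * p for level, p in p_reading.items())


def entropy(profile):
    """Shannon entropy in bits; lower means a more focused profile."""
    return -sum(p * math.log2(p) for p in profile.values() if p > 0)


def kl_divergence(p, q, eps=1e-9):
    """KL(p || q) in bits, with q smoothed by eps to avoid division by zero."""
    keys = set(p) | set(q)
    norm = 1.0 + eps * len(keys)
    total = 0.0
    for k in keys:
        pk = p.get(k, 0.0)
        if pk > 0:
            qk = (q.get(k, 0.0) + eps) / norm
            total += pk * math.log2(pk / qk)
    return total


# Hypothetical profiles: one focused entity, one broad entity.
focused = {1: 1.0}
broad = {1: 0.5, 2: 0.5}
mean_broad = expected_level(broad)
h_focused, h_broad = entropy(focused), entropy(broad)
d = kl_divergence(focused, broad)
```

A group centroid can be built by averaging member profiles, and the divergence of each member from that centroid then gives the group-level spread.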

Characterizing Web Content, User Interests, and Search Behavior

Data Set
Session Log Data
  2,281,150 URL visits (1,218,433 SERP clicks)
  Collected from 8,841 users
Profiles of Entities
  4,715 websites with 25+ clicked URLs
  7,613 users with 25+ URL visits
  141,325 unique queries

Reading Level Distribution for Top ODP Categories Each topic has different reading level distribution

Topic and reading level characterize websites in each category

Profile matching predicts user’s preference over search results
Metric
  % of user’s preferences predicted by profile matching, for each clicked website over the skipped website above
Results
  By degree of focus in user profile: H(R,T|u)
  By the distance metric between user and website: KL_R(u,s) / KL_T(u,s) / KL_RLT(u,s)
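The metric above can be sketched as follows: for each pair of a clicked website and a skipped website ranked above it, the prediction counts as correct when the clicked site's profile is closer to the user's profile. The KL direction, the smoothing, the function names, and the example data are assumptions, not the authors' exact setup.

```python
import math


def kl(p, q, eps=1e-9):
    """KL(p || q) in bits, with q smoothed by eps."""
    keys = set(p) | set(q)
    norm = 1.0 + eps * len(keys)
    return sum(
        p[k] * math.log2(p[k] / ((q.get(k, 0.0) + eps) / norm))
        for k in keys
        if p.get(k, 0.0) > 0
    )


def preference_accuracy(user, pairs):
    """Fraction of (clicked, skipped) site pairs where the clicked
    site's profile has the smaller divergence from the user profile."""
    correct = sum(
        1 for clicked, skipped in pairs if kl(user, clicked) < kl(user, skipped)
    )
    return correct / len(pairs)


# Hypothetical topic profiles for one user and two preference pairs.
user = {"Science": 0.8, "Sports": 0.2}
pairs = [
    ({"Science": 0.7, "Sports": 0.3}, {"Science": 0.1, "Sports": 0.9}),
    ({"Science": 0.2, "Sports": 0.8}, {"Science": 0.9, "Sports": 0.1}),
]
accuracy = preference_accuracy(user, pairs)
```

Passing reading-level, topic, or joint profiles to the same function would correspond to the KL_R, KL_T, and KL_RLT variants listed on the slide.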

Users’ Deviation from Their Own Profiles
Stretch reading
  Session-level reading level >> Long-term reading level
Casual reading
  Session-level reading level << Long-term reading level
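The slide uses ">>" and "<<" without giving a threshold. As a rough sketch, a session could be labeled by comparing its expected reading level with the user's long-term expected level against a margin. The margin value, the function name, and the "typical" label are hypothetical.

```python
def classify_session(session_level, long_term_level, margin=1.0):
    """Label a session relative to the user's long-term reading level.

    margin is a hypothetical threshold; the slides do not specify one.
    """
    diff = session_level - long_term_level
    if diff > margin:
        return "stretch"
    if diff < -margin:
        return "casual"
    return "typical"


# Hypothetical expected reading levels (session, long-term).
labels = [
    classify_session(9.0, 6.0),
    classify_session(4.0, 6.0),
    classify_session(6.5, 6.0),
]
```

Sessions labeled "stretch" or "casual" are the ones where the long-term profile is a poor guide, which ties in with the finding that profiles can indicate when not to personalize.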

Comparing Expert vs. Non-expert URLs Expert vs. Non-expert URLs taken from [White’09]

Predicting Expert vs. Novice Websites
[Figure: results and features]

Thank you for your attention!
WHAT WE DID: Build Profiles of Reading Level and Topic (RLT)
  For queries, websites, users and search sessions
  To characterize and compare entities
WHAT WE FOUND:
  Profile matching predicts user’s content preference
  Profiles can indicate when not to personalize
  Profile features can predict expert content
More at: @jin4ir / cs.umass.edu/~jykim

Optional Slides

Correlation between Site vs. Visitor Profiles
  Website reading level vs. visitor diversity
  Breakdown per topic reveals stronger relationship

Query / User Reading Level against P(Topic) User profile shows different trends in Computers

