HUMANE INFORMATION SEEKING: Going Beyond the IR Way JIN YOUNG KIM @ SNU DCC
Jin Young Kim • Graduate of SNU EE / Business • 5th-Year Ph.D. Student in UMass Computer Science • Starting as an Applied Researcher at Microsoft Bing
Today’s Agenda • A brief introduction of IR as a research area • An example of how we design a retrieval model • Other research projects and recent trends in IR
Background An Information Retrieval Primer
Information Retrieval? • The study of how an automated system can enable its users to access, interact with, and make sense of information. [Diagram: the User issues a Query, the system surfaces Documents, and the User visits them]
IR Research in Context • Situated between human interface and system / analytics research • Aims at satisfying user’s information needs • Based on large-scale system infrastructure & analytics • Need for convergence research! [Diagram: Information Retrieval sits between the End-user Interface (UX / HCI / InfoViz), Large-scale System Infrastructure, and Large-scale (Text) Analytics]
Major Problems in IR • Matching • (Keyword) Search: query – document • Personalized Search: (user + query) – document • Contextual Advertising: (user + context) – advertisement • Quality • Authority / Spam / Freshness • Various ways to capture them • Relevance Scoring • Combination of matching and quality features (a sketch follows below) • Evaluation is critical for optimal performance
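To make the relevance-scoring bullet concrete, here is a minimal sketch (my own illustration, not any production ranker) of combining a matching score with quality features such as authority, spam, and freshness; the feature names and weights are assumptions.

```python
# A minimal sketch: combining matching and quality signals into one relevance
# score via a hand-weighted linear model. Weights here are purely illustrative.
from dataclasses import dataclass

@dataclass
class DocFeatures:
    match_score: float   # e.g., BM25 or language-model score for the query
    authority: float     # e.g., link-based authority, in [0, 1]
    spam_score: float    # higher means more likely spam, in [0, 1]
    freshness: float     # e.g., recency-decayed score, in [0, 1]

# Illustrative weights; in practice these are learned from labeled data.
WEIGHTS = {"match": 1.0, "authority": 0.3, "spam": -0.5, "freshness": 0.2}

def relevance_score(f: DocFeatures) -> float:
    """Combine matching and quality features into a single score."""
    return (WEIGHTS["match"] * f.match_score
            + WEIGHTS["authority"] * f.authority
            + WEIGHTS["spam"] * f.spam_score
            + WEIGHTS["freshness"] * f.freshness)

# Rank candidate documents by the combined score.
docs = [DocFeatures(2.1, 0.8, 0.0, 0.5), DocFeatures(2.4, 0.2, 0.9, 0.1)]
ranked = sorted(docs, key=relevance_score, reverse=True)
```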
Humane Information Retrieval Going Beyond the IR Way
Information seeking requires communication. You need the freedom of expression. You need someone who understands.
Information Seeking circa 2012 Search engine accepts keywords only. Search engine doesn’t understand you.
Toward Humane Information Seeking • Rich User Modeling: Profile / Context / Behavior • Rich User Interactions: Search / Browsing / Filtering
The IR Way vs. the HCIR Way: from Query to Session (HCIR = HCI + IR) [Diagram: the USER and SYSTEM exchange actions and responses over a session; the system maintains a user model (profile / context / behavior) and an interaction history, while the user interacts through filtering conditions, related items, filtering / browsing, and relevance feedback]
The Rest of the Talk… • Personal Search: improving search and browsing for known-item finding; evaluating interactions combining search and browsing • Web Search: user modeling based on reading level and topic; providing non-intrusive recommendations for browsing • Book Search: analyzing interactions combining search and filtering
Personal Search Retrieval And Evaluation Techniques for Personal Information [Thesis]
Example: Desktop Search – Ranking using Multiple Document Types for Desktop Search [SIGIR10] • Example: Search over Social Media – Evaluating Search in Personal Social Media Collections [WSDM12]
Structured Document Retrieval: Background • Field Operator / Advanced Search Interface • User’s search terms are found in multiple fields (Understanding Re-finding Behavior in Naturalistic Email Interaction Logs. Elsweiler, D., Harvey, M., Hacker, M. [SIGIR'11])
Structured Document Retrieval: Models • Document-based Retrieval Model • Score each document as a whole • Field-based Retrieval Model • Combine evidence from each field (a sketch of both scoring schemes follows below) [Diagram: document-based scoring matches query terms q1…qm against the document as a whole; field-based scoring matches them against each field f1…fn and combines the per-field scores with weights w1…wn]
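As referenced above, here is a minimal sketch contrasting document-based and field-based scoring with simple Dirichlet-smoothed unigram language models; the field names, weights, and smoothing parameter are illustrative assumptions, not the thesis implementation.

```python
# A minimal sketch: document-based vs. field-based scoring with unigram
# language models and Dirichlet smoothing. Values are illustrative only.
import math
from collections import Counter

MU = 100.0  # Dirichlet smoothing parameter (illustrative value)

def term_prob(term, text_terms, collection_prob):
    """Dirichlet-smoothed P(term | text)."""
    counts = Counter(text_terms)
    return (counts[term] + MU * collection_prob) / (len(text_terms) + MU)

def doc_based_score(query, doc_fields, collection_prob):
    """Score the document as a single bag of words."""
    all_terms = [t for terms in doc_fields.values() for t in terms]
    return sum(math.log(term_prob(q, all_terms, collection_prob.get(q, 1e-6)))
               for q in query)

def field_based_score(query, doc_fields, field_weights, collection_prob):
    """Combine per-field evidence with fixed field weights (mixture model)."""
    score = 0.0
    for q in query:
        p = sum(w * term_prob(q, doc_fields[f], collection_prob.get(q, 1e-6))
                for f, w in field_weights.items())
        score += math.log(p)
    return score

# Example: an email document with three fields.
doc = {"from": ["james"], "subject": ["registration", "deadline"],
       "content": ["please", "register", "by", "friday"]}
weights = {"from": 0.2, "subject": 0.5, "content": 0.3}
cprob = {"james": 0.001, "registration": 0.002}
print(field_based_score(["james", "registration"], doc, weights, cprob))
```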
Improved Matching for Structured Documents (Email Search) [CIKM09, ECIR09, ECIR12] • Field Relevance • Different fields are important for different query terms • ‘registration’ is relevant when it occurs in <subject> • ‘james’ is relevant when it occurs in <to>
Estimating the Field Relevance • If User Provides Feedback • Relevant documents provide sufficient information • If No Feedback is Available • Combine field-level term statistics from multiple sources (a sketch follows below) [Diagram: field-level statistics (from/to, title, content) from the Collection and from the Top-k retrieved Docs are combined to approximate the statistics of the Relevant Docs]
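The sketch below illustrates one way to combine collection-level and top-k field statistics into a per-term field relevance estimate when no feedback is available; the linear interpolation and its mixing weight are my assumptions.

```python
# A minimal sketch of field-relevance estimation without feedback: mix
# collection-level and top-k field statistics. Field names are illustrative.
def field_relevance(term, collection_stats, topk_stats, lam=0.5):
    """
    Estimate P(field | term): how likely the term is to 'belong' to each field.
    collection_stats / topk_stats: {field: frequency of `term` in that field}.
    """
    fields = collection_stats.keys()
    def normalize(stats):
        total = sum(stats.values()) or 1.0
        return {f: stats[f] / total for f in stats}
    p_coll = normalize(collection_stats)
    p_topk = normalize(topk_stats)
    # Linear interpolation of the two evidence sources.
    return {f: lam * p_coll[f] + (1 - lam) * p_topk[f] for f in fields}

# 'registration' occurs mostly in subject lines both in the collection and in
# the top-k results, so <subject> receives most of the field relevance.
coll = {"from/to": 2, "subject": 40, "content": 18}
topk = {"from/to": 0, "subject": 12, "content": 3}
print(field_relevance("registration", coll, topk))
```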
Retrieval Using the Field Relevance • Comparison with Previous Work • Ranking in the Field Relevance Model (a sketch follows below) [Diagram: previous work combines the per-field scores for each query term q1…qm using fixed field weights w1…wn; the field relevance model instead multiplies each per-term field score by a per-term field weight P(Fi|qj) and sums over fields]
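The following sketch shows how per-term field weights P(F|q) could replace fixed field weights in the ranking function; it reuses the term_prob helper from the earlier field-based scoring sketch and is not the exact published formulation.

```python
# A minimal sketch of ranking with per-term field weights instead of fixed
# field weights. `term_prob` is passed in (see the earlier scoring sketch).
import math

def field_relevance_score(query, doc_fields, per_term_field_weights,
                          term_prob, collection_prob):
    """
    score(Q, D) = sum over query terms of
                  log( sum over fields of P(F|q) * P(q | D_F) )
    per_term_field_weights: {term: {field: P(field | term)}}
    """
    score = 0.0
    for q in query:
        weights = per_term_field_weights[q]            # per-term field weights
        p = sum(weights[f] * term_prob(q, doc_fields[f],
                                       collection_prob.get(q, 1e-6))
                for f in doc_fields)
        score += math.log(p)
    return score
```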
Evaluating the Field Relevance Model • Retrieval Effectiveness (Metric: Mean Reciprocal Rank; a sketch of the metric follows below) [Chart: per-term field weights vs. fixed field weights]
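For reference, a minimal sketch of the Mean Reciprocal Rank metric named on this slide: the average, over queries, of one over the rank of the first relevant result.

```python
# Mean Reciprocal Rank (MRR): average of 1/rank of the first relevant result.
def mean_reciprocal_rank(ranked_lists, relevant):
    """
    ranked_lists: {query: [doc ids in ranked order]}
    relevant:     {query: set of relevant doc ids}
    """
    total = 0.0
    for query, ranking in ranked_lists.items():
        rr = 0.0
        for rank, doc in enumerate(ranking, start=1):
            if doc in relevant[query]:
                rr = 1.0 / rank   # reciprocal rank of the first relevant hit
                break
        total += rr
    return total / len(ranked_lists)

runs = {"q1": ["d3", "d1", "d7"], "q2": ["d2", "d5"]}
rels = {"q1": {"d1"}, "q2": {"d9"}}
print(mean_reciprocal_rank(runs, rels))   # (1/2 + 0) / 2 = 0.25
```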
Evaluation Challenges for Personal Search [CIKM09, SIGIR10, CIKM11] • Evaluation of Personal Search • Each effort based on its own user study • No comparative evaluation has been performed yet • Solution: Simulated Collections • Crawl CS department webpages, docs, and calendars • Recruit department people for a user study • Collecting User Logs • DocTrack: a human-computation search game • Probabilistic User Model: a method for user simulation (a sketch follows below)
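As a rough illustration of the probabilistic-user-model idea (not the published DocTrack or simulation design), the sketch below generates a synthetic search session: pick a target document, sample query terms from it, and click results with rank-decaying probability.

```python
# A hypothetical user-simulation sketch: target selection, query sampling,
# and rank-decayed clicking. All parameters are assumptions for illustration.
import random

def simulate_session(docs, rank_fn, query_len=2, click_decay=0.5):
    """docs: {doc_id: list of terms}; rank_fn: query -> ranked doc ids."""
    target = random.choice(list(docs))
    query = random.sample(docs[target], k=min(query_len, len(docs[target])))
    ranking = rank_fn(query)
    clicks = []
    for rank, doc in enumerate(ranking, start=1):
        if random.random() < click_decay ** rank:  # higher ranks get more clicks
            clicks.append(doc)
    return {"target": target, "query": query, "clicks": clicks}
```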
Summary so far… • Query Modeling for Structured Documents • Using the estimated field relevance improves the retrieval • User’s feedback can help personalize the field relevance • Evaluation Challenges in Personal Search • Simulation of the search task using game-like structures • Related work : ‘Find It If You Can’ [SIGIR11]
Web Search Characterizing Web Content, User Interests, and Search Behavior by Reading Level and Topic [WSDM12]
Reading level distribution varies across major topical categories
User Modeling by Reading Level and Topic • Reading Level and Topic • Reading Level: proficiency (comprehensibility) • Topic: topical areas of interest • Profile Construction (a sketch follows below) [Diagram: per-document distributions P(R|d) and P(T|d) over visited documents are aggregated into the user profile P(R,T|u)] • Profile Applications • Improving personalized search ranking • Enabling expert content recommendation
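A minimal sketch of the profile-construction step referenced above, assuming the user profile P(R,T|u) is simply the average of per-document reading-level/topic distributions; the bucket names are illustrative.

```python
# A minimal sketch of profile construction: average per-document (R, T)
# distributions over the documents a user has visited.
from collections import defaultdict

def build_profile(visited_docs):
    """
    visited_docs: list of dicts, each {(reading_level, topic): probability}
    Returns the user profile P(R, T | u) as an averaged joint distribution.
    """
    profile = defaultdict(float)
    for doc_dist in visited_docs:
        for key, prob in doc_dist.items():
            profile[key] += prob / len(visited_docs)
    return dict(profile)

d1 = {("advanced", "health"): 0.7, ("basic", "health"): 0.3}
d2 = {("advanced", "health"): 0.4, ("advanced", "finance"): 0.6}
print(build_profile([d1, d2]))
```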
Metric • % of user’s preferences predicted by profile matching • Profile matching measured by the KL-divergence between RT profiles (a sketch follows below) • Results • By the degree of focus in the user profile • By the distance metric between user and website • Takeaway: profile matching can predict the user’s preference over search results
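A minimal sketch of profile matching via KL-divergence between reading-level/topic (RT) profiles; the smoothing constant and the "smaller divergence wins" preference rule are my assumptions.

```python
# KL-divergence between a user's RT profile and candidate site profiles;
# the site with the smallest divergence is predicted to be preferred.
import math

def kl_divergence(p, q, eps=1e-9):
    """D_KL(p || q) over a shared set of (reading_level, topic) buckets."""
    keys = set(p) | set(q)
    return sum(p.get(k, eps) * math.log(p.get(k, eps) / q.get(k, eps))
               for k in keys)

def preferred_site(user_profile, site_profiles):
    """Predict the site whose profile is closest to the user's profile."""
    return min(site_profiles,
               key=lambda s: kl_divergence(user_profile, site_profiles[s]))

user = {("advanced", "health"): 0.6, ("basic", "health"): 0.4}
sites = {"site_a": {("advanced", "health"): 0.7, ("basic", "health"): 0.3},
         "site_b": {("basic", "sports"): 1.0}}
print(preferred_site(user, sites))   # site_a matches the user profile better
```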
Comparing Expert vs. Non-expert URLs (URLs taken from [White’09]) [Chart: expert URLs show lower topic diversity and higher reading level than non-expert URLs]
Enabling Browsing for Web Search [Work-in-progress] • SurfCanyon®: recommends results based on clicks • Initial results indicate that the recommendations are useful for the shopping domain.
Book Search Understanding Book Search Behavior on the Web [Submitted to SIGIR12]
Understanding Book Search on the Web • OpenLibrary • User-contributed online digital library • Dataset: 8M records from the web server log
Comparison of Navigational Behavior • Users entering the site directly show different behaviors from users entering via web search engines [Chart panels: users entering via Google vs. users entering the site directly]
Comparison of Search Behavior • Rich interaction reduces query length • Filtering induces more interactions than search (a log-analysis sketch follows below)
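A rough sketch of the kind of log analysis behind these comparisons, assuming a simplified log-record format rather than the actual OpenLibrary server-log schema.

```python
# A simplified log-analysis sketch: group requests by entry mode (search
# engine vs. direct) and compare average query length across the groups.
from statistics import mean

def avg_query_length_by_entry(log_records):
    """log_records: list of dicts with 'referrer' and 'query' keys."""
    groups = {}
    for rec in log_records:
        entry = "search_engine" if "google" in rec.get("referrer", "") else "direct"
        if rec.get("query"):
            groups.setdefault(entry, []).append(len(rec["query"].split()))
    return {entry: mean(lengths) for entry, lengths in groups.items()}

log = [{"referrer": "https://google.com", "query": "pride and prejudice"},
       {"referrer": "", "query": "dickens"},
       {"referrer": "", "query": "old maps"}]
print(avg_query_length_by_entry(log))
```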
Where’s the Future? – Social Search • The New Bing Sidebar makes search a social activity.
Where’s the Future? – Semantic Search • The New Google serves ‘knowledge’ as well as docs.
Where’s the Future? – Siri-like Agent
An Exciting Future is Awaiting Us! • Recommended Readings in IR: http://www.cs.rmit.edu.au/swirl12 • Any Questions?
Selected Publications More at @lifidea, or cs.umass.edu/~jykim • Structured Document Retrieval • A Probabilistic Retrieval Model for Semi-structured Data [ECIR09] • A Field Relevance Model for Structured Document Retrieval [ECIR11] • Personal Search • Retrieval Experiments using Pseudo-Desktop Collections [CIKM09] • Ranking using Multiple Document Types in Desktop Search [SIGIR10] • Building a Semantic Representation for Personal Information [CIKM10] • Evaluating an Associative Browsing Model for Personal Info. [CIKM11] • Evaluating Search in Personal Social Media Collections [WSDM12] • Web / Book Search • Characterizing Web Content, User Interests, and Search Behavior by Reading Level and Topic [WSDM12] • Understanding Book Search Behavior on the Web [In submission to SIGIR12]
My Self-tracking Efforts • Life-optimization Project (2002–2006) • LiFiDeA Project (2011–2012)
The Great Divide: IR vs. HCI
IR: Query / Document • Relevant Results • Ranking / Suggestions • Feature Engineering • Batch Evaluation (TREC) • SIGIR / CIKM / WSDM
HCI: User / System • User Value / Satisfaction • Interface / Visualization • Human-centered Design • User Study • CHI / UIST / CSCW
Can we learn from each other?
The Great Divide: IR vs. RecSys
IR: Query / Document • Reactive (given query) • SIGIR / CIKM / WSDM
RecSys: User / Item • Proactive (push item) • RecSys / KDD / UMAP
The Great Divide: IR in CS vs. IR in LIS
IR in CS: Focus on ranking & relevance optimization • Batch & quantitative evaluation • SIGIR / CIKM / WSDM • UMass / CMU / Glasgow
IR in LIS: Focus on behavioral study & understanding • User study & qualitative evaluation • ASIS&T / JCDL • UNC / Rutgers / UW
Problems & Techniques in IR • What • How
More about the Matching Problem • Finding Representations • Term vector vs. Term distribution • Topical category, Reading level, … • Estimating Representations • By counting terms • Using automatic classifiers • Calculating Matching Scores • Cosine similarity vs. KL-divergence (a sketch follows below) • Combining multiple representations
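A minimal sketch contrasting the two matching scores named above: cosine similarity between term vectors and KL-divergence between term distributions; the toy vectors and distributions are illustrative.

```python
# Cosine similarity over sparse term vectors vs. KL-divergence over term
# distributions, the two matching scores mentioned on this slide.
import math

def cosine(u, v):
    """Cosine similarity between two sparse term vectors (dicts)."""
    dot = sum(u[t] * v.get(t, 0.0) for t in u)
    norm = math.sqrt(sum(x * x for x in u.values())) * \
           math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

def kl(p, q, eps=1e-9):
    """KL-divergence D(p || q) between two term distributions (dicts)."""
    keys = set(p) | set(q)
    return sum(p.get(t, eps) * math.log(p.get(t, eps) / q.get(t, eps))
               for t in keys)

query_vec = {"book": 1.0, "search": 1.0}
doc_vec = {"book": 3.0, "search": 2.0, "library": 5.0}
print(cosine(query_vec, doc_vec))        # higher = better match
print(kl({"book": 0.5, "search": 0.5},   # lower = better match
         {"book": 0.3, "search": 0.2, "library": 0.5}))
```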