Web Search with Variable User Model

Peter Gurský, Stanislav Krajči, Tomáš Horváth, Róbert Novotný, Jozef Jirásek, Veronika Vaneková, Peter Vojtáš
PF UPJŠ Košice, MFF UK Praha
Datakon, 22.10.2007

Problem: Information Overload
• Multiple sources
• Different structure, layout, usage
• Various software tools with different sets of answers

Objectives
• Integrate data from heterogeneous sources
• Find an adequate number of answers that match user preferences
• Suitable representation of user preferences

System Architecture
[Diagram; labels: WEB, crawler, HTML files, annotation, Ontology, Corporate memory, Middleware system, query, evaluation, Top-k objects, best objects]

Text-Oriented Annotation
• Regular expressions
• Analysis of visual representation
• Structural differences:
  • Element hierarchy
  • HTML attributes
  • HTML node values

Graphic-Oriented Annotation
• Preliminary exploration.
• Web pages may contain pictures, flash animations, ... This information is not available from the web page source.
• We use OCR processing and analysis of color, position, ...

User Dependent Querying
[Diagram; labels: Object display and evaluation, Evaluate, Display, Find Rules, Suitable Object Search (Top-k), Learning Preferences (IGAP), Find Top-k Objects, RDF repository, Preferences]

Retrieving Preferences from User
• Direct user specification
• Collaborative filtering
• Learning preferences from sample objects evaluated by the user
• Iterative method: repeat the evaluation until the relevant objects are found

Learning Preference from Evaluation

Basic Fuzzy Set Types
• Lower values are better
• Higher values are better
• Middle values are better
• Either high or low, but not middle
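The four fuzzy set types can be illustrated with a small sketch. The slides do not state the shape of the membership functions, so the trapezoidal shape, the function names, and all breakpoints below are assumptions made only for illustration.

```python
import math


def trapezoid(a, b, c, d):
    """Membership function: rises on [a, b], equals 1 on [b, c], falls on [c, d].

    The trapezoidal shape is an assumption; the slides name only the four types.
    """
    def mu(x):
        if x < a or x > d:
            return 0.0
        if b <= x <= c:
            return 1.0
        if x < b:
            return (x - a) / (b - a)
        return (d - x) / (d - c)
    return mu


def lower_is_better(full, zero):
    # Fully preferred up to `full`, not preferred at all from `zero` upward.
    return trapezoid(-math.inf, -math.inf, full, zero)


def higher_is_better(zero, full):
    # Not preferred up to `zero`, fully preferred from `full` upward.
    return trapezoid(zero, full, math.inf, math.inf)


def middle_is_better(a, b, c, d):
    return trapezoid(a, b, c, d)


def extremes_are_better(a, b, c, d):
    # Complement of "middle is better": either high or low, but not middle.
    middle = trapezoid(a, b, c, d)
    return lambda x: 1.0 - middle(x)
```

For example, with the hypothetical breakpoints `cheap = lower_is_better(300, 700)`, a price of 200 is fully preferred, a price of 800 is not preferred at all, and prices in between receive a partial degree.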

Aggregation
Each fuzzy set relates to one attribute, e.g. number of stars. Thus we obtain a partial relevance for every attribute. Overall relevance is the result of aggregation:
• Weighted average (continuous range)
  good_U = 2/3 * cheap_U + 1/3 * high-class_U
• Rules (discretized range)
  evaluation_U = good IF (price ≤ 500 AND stars ≥ ***)
  evaluation_U = excellent IF (distance ≤ 1 km)
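Both aggregation styles from this slide can be written down directly. The sketch below uses the weights and thresholds shown on the slide; the function names are hypothetical, and two behaviours are assumptions because the slide does not specify them: when both rules fire the better label is returned, and when no rule fires the result is None.

```python
def good(cheap, high_class):
    """Weighted average of two partial relevances, each in [0, 1]."""
    return (2 * cheap + high_class) / 3


def evaluate(price, stars, distance_km):
    """Rule-based evaluation over a discretized range.

    Assumption: rules are checked from the best label downward, and an
    object matching no rule gets None.
    """
    if distance_km <= 1:
        return "excellent"
    if price <= 500 and stars >= 3:
        return "good"
    return None
```

A hotel with partial relevances cheap = 1.0 and high-class = 0.25 gets an overall relevance of 0.75 under the weighted average, so the price preference dominates, as the 2/3 weight intends.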

[Figure: preference examples for User 1, User 2, User 3, User 4; labels: Close, Far, Middle distance, Border, Middle price, Cheap, Middle price, Border]

Relevant Object Search
• Having retrieved local and global preferences, we can find the top-k objects according to user preferences
• Do not scan and evaluate all data; use only the data that are necessary
• Use the 3-phased No Random Access Algorithm, an improvement of Fagin's algorithm
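To show why not all data has to be read, here is a minimal sketch of the basic No Random Access scheme. It is not the 3-phased variant named on the slide, whose details the slides do not give. It assumes one list per attribute, sorted by score in descending order, scores in [0, 1], every object present in every list, and a monotone aggregation function; the name `nra_top_k` is hypothetical.

```python
import heapq


def nra_top_k(lists, k, aggregate):
    """Basic NRA sketch: sorted access only, no random lookups.

    lists: one list per attribute of (object_id, score), sorted descending.
    aggregate: monotone function taking a list of per-attribute scores.
    Returns object ids of the top k, ordered by their lower-bound score.
    """
    m = len(lists)
    n = len(lists[0]) if lists else 0
    seen = {}           # object_id -> known scores, None where not yet seen
    last = [1.0] * m    # last score read from each list (bound for unknowns)
    worst = {}
    for depth in range(n):
        for i, lst in enumerate(lists):
            obj, score = lst[depth]
            seen.setdefault(obj, [None] * m)[i] = score
            last[i] = score
        # Lower bound: unknown scores count as 0.
        worst = {o: aggregate([0.0 if s is None else s for s in sc])
                 for o, sc in seen.items()}
        # Upper bound: unknown scores can be at most the last score read.
        best = {o: aggregate([last[i] if s is None else s
                              for i, s in enumerate(sc)])
                for o, sc in seen.items()}
        top = heapq.nlargest(k, worst, key=worst.get)
        if len(top) == k:
            kth = worst[top[-1]]
            unseen_bound = aggregate(last)
            rivals = max((best[o] for o in seen if o not in top), default=0.0)
            if kth >= unseen_bound and kth >= rivals:
                return top  # no other object can still overtake the top k
    return heapq.nlargest(k, worst, key=worst.get)
```

With four hotels and the lists `cheap = [A 1.0, B 0.8, C 0.3, D 0.1]` and `high_class = [C 0.9, A 0.7, D 0.6, B 0.2]`, aggregated as 2/3 * cheap + 1/3 * high-class, this sketch returns A and B as the top 2 after reading three of the four positions in each list.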

User Independent Querying
• Text-based vector model
• A document is defined as a vector of TF-IDF weights of the document terms
• Weights are stored in a database index
• Similarity between the query and each document in the collection is determined by the cosine measure
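The vector model above can be sketched in a few lines. The slides do not say which TF-IDF variant is used, so the weighting below (raw term frequency times log(N / document frequency)), the whitespace tokenization, and the function names are assumptions; weights are kept in memory here rather than in a database index.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict) per document, plus the IDF table."""
    n = len(docs)
    tokenized = [d.lower().split() for d in docs]
    df = Counter(t for toks in tokenized for t in set(toks))
    idf = {t: math.log(n / df[t]) for t in df}
    vectors = [{t: c * idf[t] for t, c in Counter(toks).items()}
               for toks in tokenized]
    return vectors, idf


def cosine(u, v):
    """Cosine of the angle between two sparse vectors; 0.0 if either is zero."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank(query, docs):
    """Return document indices ordered by decreasing cosine similarity."""
    vectors, idf = tfidf_vectors(docs)
    # Query terms unknown to the collection get weight 0.
    q = {t: c * idf.get(t, 0.0)
         for t, c in Counter(query.lower().split()).items()}
    scores = [cosine(q, v) for v in vectors]
    return sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
```

Note that a term occurring in every document gets an IDF of 0 under this weighting, so it does not influence the ranking.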

Thank You for Your Attention. Questions?
