Capturing User Intent for Information Retrieval. Hien Nguyen, University of Connecticut. Major advisor: Dr. Eugene Santos Jr. (UCONN). Associate advisor: Dr. Robert McCartney (UCONN). Associate advisor: Dr. AnHai Doan (UIUC). Associate advisor: Dr. Robert Henning (UCONN).
A simple example [Figure with labels: Julia, Ben, FORD, Ken]
Outline • Problem • Motivation • Our approach • Empirical evaluation • Conclusion
Problem • Why do we need user models for IR? • Employing a cognitive user model for Information Retrieval (IR): capture and use knowledge about a user to improve a user’s effectiveness in an information seeking task. [Figure labels: User, Information needs, Intermediary, Information resources]
Problem • Why is it a tough problem? • Partiality • Vagueness • Dynamics • Uncertainty
Outline • Problem • Motivation • Our approach • Empirical evaluation • Conclusion
Motivation • Existing methodologies for building user models for IR include: • System-centered approaches: use IR techniques (e.g., Spink & Losee (1996), Efthimis (96), Lopez-Pujalte (03), Drucker et al. (02)) • User-centered approaches: use Human Computer Interaction (HCI)/Artificial Intelligence (AI) techniques (e.g., Belkin (93), Radlord (96)) • Hybrid approaches: combine IR with HCI/AI techniques (e.g., Logan et al. (94), Decampos et al. (98), Ruthven et al. (03)) • Very little crossover between IR and AI/HCI to build user models for IR
Motivation • Important factors for building user models for IR: • Partiality • Vagueness • Incremental Relevance feedback • Dynamics • Adaptive • Uncertainty • Intent: • Author’s intent • User’s intent
Thesis of our research • We try to improve a user’s effectiveness in an information seeking task by: • Developing a hybrid user model to capture user intent dynamically by analyzing behavioral information of retrieved relevant documents and by combining the captured user intent with the elements of an IR system in a decision theoretic framework (ICAI00, IAT01, UM03, HFES03 & 04, AH04) • Using IR evaluation procedures and collections and examining usability testing to evaluate this model (HFES04, AH04, UM05)
Contributions • Develop a hybrid user model by combining information about a user and information about an IR system in a decision theoretic framework • Develop a unified evaluation framework • Fine-grained representation • Ability to learn user knowledge dynamically
Outline • Problem • Motivation • Our approach • IPC Model • Hybrid Model • Empirical evaluation • Conclusion
IPC User Model (ICAI00, IAT01, UM03, HFES03) • Captures user intent. • Consists of 3 components: • User interests (I): “What needs to be done or accomplished?” • User preferences (P): “How is something done or accomplished?” • User context (C): “Why is the user trying to accomplish something?”
Context Network (C) • Captures user knowledge. It contains concept nodes and relation nodes. • Is constructed “on-the-fly” by finding intersections of all retrieved relevant document graphs. [Figure: two example graphs, (a) and (b), with concept nodes Cosmids, Urate, Urate oxidase, Enzyme, and Biologically Active Substance linked by Isa relations]
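The "on-the-fly" construction can be sketched as a set intersection. This is a minimal sketch, assuming each document graph is encoded as a set of (concept, relation, concept) triples and that "intersection" means the triples common to every retrieved relevant document; the slides do not fix either choice, and the function name and sample triples are hypothetical.

```python
from functools import reduce

# Assumption: a document graph is a set of (concept, relation, concept) triples.
Triple = tuple[str, str, str]

def intersect_graphs(graphs: list[set[Triple]]) -> set[Triple]:
    """Return the triples shared by every graph (empty set if no graphs)."""
    if not graphs:
        return set()
    return reduce(lambda a, b: a & b, graphs)

# Hypothetical document graphs built from the node labels on the slide.
doc_a = {
    ("urate_oxidase", "isa", "enzyme"),
    ("enzyme", "isa", "biologically_active_substance"),
    ("cosmids", "related_to", "urate_oxidase"),  # invented link, illustration only
}
doc_b = {
    ("urate_oxidase", "isa", "enzyme"),
    ("enzyme", "isa", "biologically_active_substance"),
}

context_network = intersect_graphs([doc_a, doc_b])
```

Here `context_network` keeps only the two Isa links that both documents share; the link that appears in one document alone is dropped.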
Interest Set (I) • Determines what is currently relevant to a user. • Each element of the interest set consists of an interest concept (a) and an interest level L(a). • Fading mechanism: L(a) = 0.5*(L(a) + n/m) • n: number of retrieved relevant documents containing a • m: number of retrieved documents
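The fading update L(a) = 0.5*(L(a) + n/m) can be written directly. A minimal sketch; the function and parameter names are hypothetical, and the numbers are chosen for easy arithmetic rather than taken from the evaluation.

```python
def update_interest_level(level: float, n_relevant_with_a: int, m_retrieved: int) -> float:
    """Apply L(a) = 0.5 * (L(a) + n/m) for one round of feedback."""
    if m_retrieved <= 0:
        raise ValueError("m_retrieved must be positive")
    return 0.5 * (level + n_relevant_with_a / m_retrieved)

# A concept that stops appearing in relevant documents (n = 0) halves each round.
level = 0.5
level = update_interest_level(level, 0, 4)   # 0.25
level = update_interest_level(level, 0, 4)   # 0.125

# A concept found in every retrieved document (n = m) moves halfway toward 1.
boosted = update_interest_level(0.5, 4, 4)   # 0.75
```

Because the old level and the new evidence are weighted equally, a concept absent from recent relevant documents fades geometrically instead of being dropped at once.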
Preference Network (P) • Represents how a user wants to form a query • Is represented using Bayesian networks. • Consists of pre-condition, goal and action nodes • Pre-condition: represents the requirement of a tool used to form a query • Goal: represents a tool to form a query (filter/expander) • Action: represents the modified query [Figure: example networks with pre-condition nodes Pc11, Pc12, Pc22, Pc31, Pc32, goal nodes G1, G2, G3, and action nodes A1, A2, A3]
Preference Network (P) • Update: When a user gives relevance feedback after each query. • Correction function calculates the probability that a new preference network will improve retrieval performance for both tools. • The one with higher probability will be added
Implementation of IPC User Model • Given M={I,P,C} and a query graph q. • Construct I’ by running a spreading activation algorithm on C. • Set as evidence all interest concepts of I’ found in P and the query node representing q found in P. • Perform belief updating on P. Choose the top n goal nodes from P (the set G). • For every goal g in G: • Depending on each g, add corresponding paths in C to q
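The slides do not specify the spreading activation algorithm used to build I’ from I and C, so the following is only a generic sketch of the idea: activation starts at the interest concepts and leaks to neighboring concepts in the context network, weakening at each hop. The decay factor, threshold, hop limit, undirected treatment of links, and the sample edges are all assumptions made for illustration.

```python
from collections import defaultdict

def spread_activation(edges, interests, decay=0.5, threshold=0.2, max_hops=2):
    """Return I': concepts whose activation reaches the threshold.

    edges: (concept, relation, concept) triples from the context network C.
    interests: dict mapping interest concept -> interest level (the set I).
    decay, threshold, max_hops: hypothetical tuning parameters.
    """
    neighbors = defaultdict(set)
    for head, _relation, tail in edges:
        neighbors[head].add(tail)
        neighbors[tail].add(head)  # assumption: links are followed in both directions

    activation = dict(interests)
    frontier = dict(interests)
    for _ in range(max_hops):
        next_frontier = {}
        for concept, level in frontier.items():
            passed = level * decay
            if passed < threshold:
                continue  # too weak to propagate further
            for other in neighbors[concept]:
                if passed > activation.get(other, 0.0):
                    activation[other] = passed
                    next_frontier[other] = passed
        frontier = next_frontier
    return {c: a for c, a in activation.items() if a >= threshold}

# Hypothetical fragment of a context network and a one-element interest set.
edges = [
    ("bank_account", "isa", "account"),
    ("deposit", "related_to", "bank_account"),
    ("banking_transaction", "isa", "transaction"),
    ("banking_transaction", "related_to", "bank"),
]
expanded = spread_activation(edges, {"bank_account": 0.75})
```

With these numbers, `account` and `deposit` each receive 0.375 and join I’, while a second hop would pass only 0.1875, below the threshold, so the spread stops there.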
An example: suspicious banking transactions involving Abdul Ramazi. Query: Banking transaction. Retrieved documents: • Report 1: date 1 April, 2003 • Report 14: date 21 April, 2003 • Report 16: date 27 April, 2003 • Report 7: date 15 April, 2003 • Report 8: date 19 April, 2003
Example of Query Graph • Query: banking transaction [Figure: query graph with nodes banking_transaction, transaction, and bank linked by isa and related_to relations]
Example of Document Graph • Source text: 1) Report Date: 1 April, 2003. FBI: Abdul Ramazi is the owner of the Select Gourmet Foods shop in Springfield Mall. First Union National Bank lists Select Gourmet Foods as holding account number. Six checks totaling $35,000 have been deposited in this account in the past four months and are recorded as having been drawn on accounts at the Pyramid Bank of Cairo, Egypt and the Central Bank of Dubai, United Arab Emirates. Both of these banks have just been listed as possible conduits in money laundering schemes. [Figure: document graph with concept nodes including FBI, Abdul_Ramazi, Select Gourmet Foods shop, Springfield Mall, First Union National Bank, Cairo Pyramid Bank, Dubai Central Bank, Bank, account, Holding Account number, money laundering scheme, and scheme, linked by Isa and Related_to relations]
Intersection of retrieved relevant documents [Figure: shared subgraph with nodes First Union National bank, bank, account_owner, Abdul_Ramazi, bank_account, Abdul, ramazi, … linked by isa and related_to relations]
Existing Interest Set
Interest concept    Interest level
money_laundering    0.87
deposit             0.82
withdraw            0.8
bank_account        0.76
…
Updated Interest Set
Interest concept             Interest level
abdul_ramazi                 0.83
chicago                      0.76
bank_account                 0.7
first_union_national_bank    0.66
…
Existing Context Network [Figure: concept nodes money_laundering, deposit, account, withdraw, bank_account, banking_transaction, transaction, and bank linked by isa and related_to relations]
Updated Context Network [Figure: the existing nodes money_laundering, deposit, account, withdraw, bank_account, banking_transaction, transaction, and bank, plus the new nodes account_owner, Abdul_Ramazi, and First Union National bank, …, linked by isa and related_to relations]
Existing Preference Network [Figure: Bayesian network with nodes wmd, bank_account, Iraq, terrorism, query_1068822.., query_116678, Qusay, forged_document, money_laundering, …, the tool nodes filter_1068822.. and expander_1168822, and the query nodes proactive_query_1068822.. and proactive_query_1168822]
Updated Preference Network [Figure: Bayesian network with nodes deposit, withdraw, forged_document, query_116678, query_1068822.., terrorism, bank_account, money_laundering, …, the tool nodes filter_1068822.. and filter_1163, and the query nodes proactive_query_1068822.. and proactive_query_1168822]
Modified Query Graph [Figure: the original query graph (banking_transaction, transaction, bank, linked by isa and related_to) next to the modified query graph, which adds the nodes Abdul_Ramazi, bank_account, account_owner, and First Union National bank]
Outline • Problem • Motivation • Our approach • IPC Model • Hybrid Model • Empirical evaluation • Conclusion
Hybrid User Model • Motivation • Allows deeper influence on an IR system. • Adaptation using only a user’s information may not be helpful if a user is new to a domain • Insight into an IR system may help a user get closer to his/her final search goal
Hybrid User Model • Our approach: • Convert this problem into a multi-attribute decision problem • Determine a set of attributes: {I,P,C,Q,T,In,D,S} • Evaluate each outcome by an effectiveness function: average precision at three fixed recall points
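The effectiveness function can be illustrated with a short sketch. The slides do not state which three recall points are used or whether precision is interpolated; the code assumes recall levels 0.25, 0.5, and 0.75 and the usual interpolation rule (the best precision at any rank whose recall is at least the target level). The function name and the sample ranking are hypothetical.

```python
def three_point_average_precision(ranked_ids, relevant_ids,
                                  recall_points=(0.25, 0.5, 0.75)):
    """Average interpolated precision over the given fixed recall points.

    Assumes ranked_ids contains no duplicate document ids.
    """
    relevant = set(relevant_ids)
    if not relevant:
        raise ValueError("need at least one relevant document")

    # (recall, precision) after each rank position.
    curve = []
    hits = 0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            hits += 1
        curve.append((hits / len(relevant), hits / rank))

    def interpolated(level):
        # Best precision at any rank whose recall reaches the level; 0 if none does.
        return max((p for r, p in curve if r >= level), default=0.0)

    return sum(interpolated(level) for level in recall_points) / len(recall_points)

# Two relevant documents, retrieved at ranks 1 and 3.
score = three_point_average_precision(["d1", "d2", "d3", "d4"], {"d1", "d3"})
```

For this ranking the interpolated precision is 1.0 at recall 0.25 and 0.5, and 2/3 at recall 0.75, so `score` is 8/9.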
Hybrid User Model • Our approach (continued) • Reduce the number of attributes: only Query (Q) and Threshold (T) are considered • Construct a value function over these two attributes: $V(Q,T) = \lambda_1 V_1(Q) + \lambda_2 V_2(T)$ • Dominance: outcome $x_2$ dominates outcome $x_1$ iff $x_{2i} \ge x_{1i}$ for all $i = 1,2$ and $x_{2i} > x_{1i}$ for some $i$
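The additive value function over Q and T and the accompanying dominance condition can be sketched as follows. The two coefficients are not given on the slide, so the equal weights here are placeholders, and the function names are hypothetical.

```python
def value(v_query, v_threshold, w_query=0.5, w_threshold=0.5):
    """Additive value V(Q,T) as a weighted sum of the two sub value functions.

    w_query and w_threshold are placeholder weights, not values from the slides.
    """
    return w_query * v_query + w_threshold * v_threshold

def dominates(x2, x1):
    """True iff x2 is at least as good as x1 on every attribute and
    strictly better on at least one."""
    if len(x2) != len(x1):
        raise ValueError("outcomes must have the same number of attributes")
    pairs = list(zip(x2, x1))
    return all(a >= b for a, b in pairs) and any(a > b for a, b in pairs)

# Outcomes are (V1(Q), V2(T)) pairs with made-up sub values.
better = dominates((0.75, 0.5), (0.5, 0.5))     # True: better on Q, equal on T
tie = dominates((0.5, 0.5), (0.5, 0.5))         # False: no strict improvement
tradeoff = dominates((0.75, 0.25), (0.5, 0.5))  # False: worse on T
combined = value(0.5, 1.0)                      # 0.75
```

Dominance alone cannot rank the trade-off case, which is where the weighted sum is needed to compare outcomes.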
Hybrid User Model • Sub value function for a query • Take advantage of literature on predicting query performance from IR • Initial sub value function (He and Ounis 04) • Update sub value function
Hybrid User Model • Sub value function over threshold (Boughanem 00)
Implementation of Hybrid User Model [Figure: flow diagram. Query -> IPC Model -> Q1, Q2, …, Qm -> Compute V(Qi) -> Send Qi, T to search module. Threshold preference -> Compute Threshold. Feedback -> Update V(Q), V(T)]
Outline • Problem • Motivation • Our approach • IPC Model • Hybrid Model • Empirical evaluation • Conclusion
Evaluation objectives • Does our user model capture a user’s intent accurately? • Does our user model improve a user’s effectiveness in an information seeking task?
Evaluation framework [Figure: Evaluation Framework tree with nodes Accuracy, Effectiveness, Hypothetical User, and Real User]
Evaluation of user model accuracy • Objective: determine how accurately a user’s intent has been captured by comparing models generated by humans and models generated by our system. • Procedures: • 5 graduate students • 10 queries from the CACM collection on distributed computing and optimization • Each user filled out a questionnaire • For each query, each user generated a model by looking at the first 15 returned documents.
Evaluation of user model accuracy Profile of 5 participants
Evaluation of user model accuracy • Metrics:
Discussion • Context: • Lexical similarity (30.2%) is in line with the work reported in (Maedche and Staab 2002) for similarity between two ontologies generated by humans. • Taxonomy similarity shows the differences between machine and humans. • Interests and Preferences are captured relatively accurately.
Evaluations with a hypothetical user (HFES04, AH04) • Metrics: precision, recall, average precision at three fixed recall points • Testbed: home-made medical database, Cranfield, CACM, Medline. • Procedures: standard and new. • Compare with Ide dec-hi using term frequency-inverse document frequency (TF-IDF) weighting (Salton and Buckley 90) (Lopez-Pujalte et al 03)
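For readers unfamiliar with the baseline, Ide dec-hi relevance feedback forms the new query by adding the vectors of all judged-relevant documents to the original query and subtracting only the highest-ranked non-relevant document. The sketch below uses sparse term-weight dictionaries; the function name and the sample weights are hypothetical, and clipping negative weights is left out although some implementations apply it.

```python
def ide_dec_hi(query, relevant_docs, nonrelevant_docs_in_rank_order):
    """Return the reformulated query under Ide dec-hi.

    query and each document are dicts mapping term -> weight (e.g. TF-IDF).
    Only the first (top-ranked) non-relevant document is subtracted.
    """
    new_query = dict(query)
    for doc in relevant_docs:
        for term, weight in doc.items():
            new_query[term] = new_query.get(term, 0.0) + weight
    if nonrelevant_docs_in_rank_order:
        top_nonrelevant = nonrelevant_docs_in_rank_order[0]
        for term, weight in top_nonrelevant.items():
            new_query[term] = new_query.get(term, 0.0) - weight
    return new_query

# Made-up weights chosen for easy arithmetic.
original = {"bank": 1.0}
relevant = [{"bank": 0.5, "account": 0.5}]
nonrelevant = [{"bank": 0.25, "river": 0.5}, {"loan": 0.5}]
reformulated = ide_dec_hi(original, relevant, nonrelevant)
```

Here `reformulated` gives `bank` a weight of 1.25, adds `account` at 0.5, and pushes `river` to -0.5; the second non-relevant document is ignored.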