Analyzing Retrieval Models using Retrievability Measurement Shariq Bashir Supervisor: ao. Univ. Prof. Dr. Andreas Rauber Institute of Software Engineering and Interactive Systems Vienna University of Technology bashir@ifs.tuwien.ac.at http://www.ifs.tuwien.ac.at/~bashir/
Outline • Introduction to Retrievability (Findability) Measure • Setup for Experiments • Findability Scoring Functions • Relationship between Findability and Query Characteristics • Relationship between Findability and Document Features • Relationship between Findability and Effectiveness Measures
Introduction • Retrieval systems are used for searching information • They rely on retrieval models for ranking documents • How to select the best retrieval model? • Evaluate retrieval models • State of the art: • Effectiveness analysis, or • Efficiency (speed/memory)
Effectiveness Measures • (Precision, Recall, MAP) depend upon • Few topics • Few judged documents • Suitable for precision-oriented retrieval tasks • Less suitable for recall-oriented retrieval tasks • (e.g. patent or legal retrieval)
Findability Measure • Considers all documents • The goal is to maximize the findability of documents • Under a retrieval model with higher findability, documents are easier to find than under a retrieval model with lower findability • Applications • Offers another measure for comparing retrieval models • Identifies subsets of documents that are hard or easy to find
Findability Measure • Factors that affect findability • The user query • [Query = Data Mining books] vs. [Query = Han Kamber books] • when searching for the book “Data Mining: Concepts and Techniques” • The maximum number of top links/docs the user checks • The ranking strategy of the retrieval model
Retrievability Measure [Leif Azzopardi and Vishwa Vinay, CIKM 2008]
Given a collection D of documents and a query set Q, the retrievability of d ∈ D is
r(d) = Σ_{q ∈ Q} f(k_dq, c)
where k_dq is the rank of d in the result set of query q ∈ Q, c is the point in the rank list where the user stops, and f(k_dq, c) = 1 if k_dq ≤ c, 0 otherwise.
Gini coefficient = summarizes the findability scores
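The cumulative-score definition above can be sketched in a few lines of Python; this is a minimal illustration (function names are my own, not the authors' code), with the Gini coefficient computed by the standard sorted-scores formula:

```python
from collections import defaultdict

def retrievability(result_lists, c):
    """r(d) = sum over queries of f(k_dq, c): add 1 each time d is
    ranked at or above the cutoff c in a query's result list."""
    r = defaultdict(int)
    for ranking in result_lists:          # one ranking per query, best-first
        for rank, d in enumerate(ranking, start=1):
            if rank > c:                  # user stops scanning at rank c
                break
            r[d] += 1
    return dict(r)

def gini(scores):
    """Gini coefficient over findability scores: 0 = no bias,
    values near 1 = only a few documents are findable."""
    xs = sorted(scores)
    n = len(xs)
    total = sum(xs)
    if total == 0:
        return 0.0
    # G = sum_i (2i - n - 1) * x_i / (n * total), x sorted ascending
    return sum((2 * i - n - 1) * x for i, x in enumerate(xs, start=1)) / (n * total)
```

With three toy result lists and c = 2, a document ranked in the top 2 for all three queries gets r(d) = 3; perfectly equal scores give G = 0.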
Outline • Introduction to Findability Measure • Setup for Experiments • Findability Scoring Functions • Relationship between Findability and Query Characteristics • Relationship between Findability and Document Features • Relationship between Findability and Effectiveness Measures
Setup for Experiments • Collections • TREC Chemical Retrieval Track Collection 2009 (TREC-CRT) • USPTO Patent Collections • USPC Class 433 (Dentistry) (DentPat) • USPC Class 422 (Chemical apparatus and process disinfecting, deodorizing, preserving, or sterilizing) (ChemAppPat) • Austrian News Dataset (ATNews) • TREC-CRT and ATNews are more skewed; the USPTO collections are less skewed
Setup for Experiments • Retrieval Models • Standard retrieval models • TFIDF, NormTFIDF, BM25, SMART • Language models • Jelinek-Mercer Smoothing (JM), Dirichlet Smoothing (DirS), Two-Stage Smoothing (TwoStage), Absolute Discounting Smoothing (AbsDis) • Query Generation • All sections of patent documents • Terms with document frequency (df) > 25% removed • All 3- and 4-term combinations
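The query-generation step described above (drop frequent terms, then enumerate all 3- and 4-term combinations) can be sketched as follows; the function name and parameters are illustrative, not from the original experiments:

```python
from itertools import combinations

def generate_queries(doc_terms, df, n_docs, max_df_ratio=0.25, sizes=(3, 4)):
    """Build the exhaustive query set for one document: keep only terms
    whose collection document frequency is <= max_df_ratio, then emit
    every 3-term and 4-term combination of the remaining vocabulary."""
    kept = sorted(t for t in set(doc_terms)
                  if df.get(t, 0) / n_docs <= max_df_ratio)
    queries = []
    for k in sizes:
        queries.extend(combinations(kept, k))
    return queries
```

For a document with four surviving terms this yields C(4,3) + C(4,4) = 5 queries; real patent documents produce very large query sets, which motivates the normalization discussed later.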
Setup for Experiments 52 443 583 746 962 1474 Docs. Ordered by Increasing Vocabulary Size 5 101 155 198 255 427 Docs. Ordered by Increasing Vocabulary Size TREC-CRT ATNews 243 597 690 776 895 Docs. Ordered by Increasing Vocabulary Size 284 381 426 463 504 559 866 Docs. Ordered by Increasing Vocabulary Size DentPat ChemAppPat
Outline • Introduction to Retrievability Measure • Setup for Experiments • Findability Scoring Functions • Relationship between Findability and Query Characteristics • Relationship between Findability and Document Features • Relationship between Findability and Effectiveness Measures
Findability Scoring Functions • Standard findability scoring function r(d) • Does not account for differences in document vocabulary size • Biased towards long documents • With r(d), Doc2 has higher findability than Doc5 • But Doc5, with its small vocabulary, cannot generate as large a query subset • Example (all 3-term combinations): findability percentage of Doc2 = 3600/6545 = 0.55, of Doc5 = 90/120 = 0.75
Findability Scoring Functions • Normalized findability r̂(d) • Normalize r(d) relative to the number of queries generated from d: r̂(d) = r(d) / |Q_d|, where Q_d is the set of queries generated from d • This accounts for the difference between document lengths
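The normalization is a single division per document; a minimal sketch (names are my own) that reproduces the Doc2/Doc5 example from the previous slide:

```python
def normalized_retrievability(r, n_queries):
    """r_hat(d) = r(d) / |Q_d|: raw retrievability divided by the number
    of queries generated from d, removing the bias towards documents
    that simply generate more queries."""
    return {d: r.get(d, 0) / n_queries[d] for d in n_queries}
```

Applied to the slide's numbers, Doc2 (3600 of 6545 queries) drops below Doc5 (90 of 120 queries) once normalized.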
Findability Scoring Functions • Comparison between r(d) and r̂(d) • Retrieval models ordered by Gini coefficients (retrieval bias) • Findability ranks of documents
Findability Scoring Functions • Correlation between r(d) and r̂(d) in terms of Gini coefficients • Retrieval models are ordered by r(d) and by r̂(d) • [Figures: ChemAppPat, TREC-CRT]
Findability Scoring Functions • Correlation between r(d) and r̂(d) in terms of document findability ranks • TREC-CRT and ATNews • The correlation between r(d) and r̂(d) is low (large difference) • Due to the large variation in document lengths • ChemAppPat and DentPat • The correlation between r(d) and r̂(d) is high (small difference) • Due to little variation in document lengths • [Figure: correlation between r(d) and r̂(d)]
Findability Scoring Functions • Which findability function is better, r(d) or r̂(d)? • On the Gini coefficient alone it is difficult to decide • Known-item search experiment: • Order the documents by findability score and partition them into 30 buckets (Bucket 1 … Bucket 30, from low-findability to high-findability) • From each bucket, draw 40 random documents (known items) • For each known item, create one query of 4–6 terms from the document itself • The goal is to find each known item using its own query • Effectiveness per bucket is measured by Mean Reciprocal Rank (MRR) • Expected result: low-findability buckets yield low MRR effectiveness, high-findability buckets yield high MRR effectiveness
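The bucket evaluation above rests on MRR over known-item queries; a minimal sketch (function name mine), where each known-item query has exactly one relevant document:

```python
def mean_reciprocal_rank(rankings, known_items):
    """MRR over known-item queries: the reciprocal rank of the target
    document in its query's result list, 0 if it is not retrieved."""
    total = 0.0
    for ranking, target in zip(rankings, known_items):
        if target in ranking:
            total += 1.0 / (ranking.index(target) + 1)
    return total / len(known_items)
```

Running this once per bucket yields the per-bucket effectiveness curve that the findability scores are correlated against.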
Findability Scoring Functions • Which findability function is better, r(d) or r̂(d)? • Expected results • High-findability buckets should have high effectiveness, since their documents are easier to find than those in low-findability buckets • i.e., a positive correlation with MRR • The r̂(d) buckets show a stronger positive correlation with MRR than the r(d) buckets • [Figures: correlation between findability and MRR, TREC-CRT and ChemAppPat]
Outline • Introduction to Findability Measure • Setup for Experiments • Findability Scoring Functions • Relationship between Findability and Query Characteristics • Relationship between Findability and Document Features • Relationship between Findability and Effectiveness Measures
Query Characteristics and Findability • Current findability analysis style: compute findability scores of documents over the full query set Q, then summarize with Gini coefficients • But queries do not all have the same quality • Some queries are more specific (target-oriented) than others • What is the effect of query quality on findability? • Need to analyze findability over query subsets of different quality • Creating query-quality subsets • Supervised quality labels: not available • Query characteristics (QC): • Query result-list size • Query term frequencies in the documents • Query quality based on query performance prediction methods • For each QC, the large query set is partitioned into 50 subsets
Query Characteristics and Findability • Query subsets by predicted query quality • Query quality is predicted with the Simplified Clarity Score (SCS) [He & Ounis, SPIRE 2004] • Q is ordered by SCS score and partitioned into 50 subsets (Query Subset 1 … Query Subset 50) • Findability analysis is run on each subset • Results (TREC-CRT collection) • X-axis = query subsets ordered from low to high SCS score • Y-axis = Gini coefficients • Low-SCS subsets → high Gini coefficients • High-SCS subsets → low Gini coefficients
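As I recall the He & Ounis definition, SCS is the KL divergence of the query's maximum-likelihood term distribution from the collection model; a sketch under that assumption (names mine):

```python
import math

def simplified_clarity_score(query_terms, coll_tf, coll_len):
    """SCS(Q) = sum over query terms of P(w|Q) * log2(P(w|Q) / P(w|C)),
    with P(w|Q) = qtf/|Q| and P(w|C) = collection tf / collection length.
    Higher scores indicate more specific (higher-quality) queries."""
    n = len(query_terms)
    score = 0.0
    for t in set(query_terms):
        p_q = query_terms.count(t) / n
        p_c = coll_tf.get(t, 0) / coll_len
        if p_c > 0:                      # skip terms unseen in the collection
            score += p_q * math.log2(p_q / p_c)
    return score
```

A query made of rare terms scores much higher than one made of common terms, matching the slide's finding that high-SCS subsets exhibit lower retrieval bias.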
Outline • Introduction to Findability Measure • Setup for Experiments • Findability Scoring Functions • Relationship between Findability and Query Characteristics • Relationship between Findability and Document Features • Relationship between Findability and Effectiveness Measures
Document Features and Findability • Computing findability via exhaustive query processing requires large processing time and computational resources • Can we predict findability without processing an exhaustive set of queries? • Idea: exploit the relationship between document features and findability scores • Does not require heavy processing • Can only predict findability ranks, not Gini coefficients
Document Features and Findability • Three classes of document features are considered • Surface-level features • Based on term frequencies within documents and term document frequencies within the collection • Features based on term weights • Based on the term-weighting strategy of the retrieval model • Density around nearest neighbors • Based on the density around the nearest neighbors of a document
Document Features and Findability Surface Level Features
Document Features and Findability [Figures: TREC-CRT, ChemAppPat]
Document Features and Findability • Combining multiple features • No single feature performs best across all collections and all retrieval models • Worth analyzing to what extent combining multiple features increases the correlation • Regression tree, 50%/50% training/testing split • [Table: correlation from combining multiple features vs. correlation with the best single feature, and % increase in correlation]
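The per-feature correlations reported throughout this section are plain Pearson correlations between a feature and the findability scores; a self-contained sketch (the regression-tree combiner itself would use a library such as scikit-learn's `DecisionTreeRegressor`, which is my assumption about tooling, not stated in the slides):

```python
def pearson(xs, ys):
    """Pearson correlation coefficient between a document feature (xs)
    and the documents' findability scores (ys)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)
```

A feature that rises linearly with findability gives r close to +1; one that falls gives r close to -1.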
Outline • Introduction to Findability Measure • Setup for Experiments • Findability Scoring Functions • Relationship between Findability and Query Characteristics • Relationship between Findability and Document Features • Relationship between Findability and Effectiveness Measures
Relationship between Findability and Effectiveness • Motivation: automatic ranking of retrieval models; tuning/increasing retrieval model effectiveness on the basis of the findability measure • Effectiveness measures (Recall, Precision, MAP) • Goal: maximizing effectiveness • Depend upon relevance judgments • Findability measure • Goal: maximizing findability • Does not need relevance judgments • Does any relationship exist between the two? • If so, maximizing findability → maximizing effectiveness
Relationship between Findability and Effectiveness • Retrieval Models • Standard retrieval models and language models • Low-level IR features (tf, idf, document length, vocabulary size, collection frequency) • Term-proximity-based retrieval models
Relationship between Findability and Effectiveness • A correlation exists • Not perfect, but retrieval models with low retrieval bias consistently appear in at least the top half of the effectiveness ranks • Correlations = 0.80, 0.75, 0.80, 0.73
Relationship between Findability and Effectiveness • Tuning parameter values over findability • Retrieval models contain parameters • They control query term normalization or smooth the document relevance score for unseen query terms • We tune the parameter values over findability • and examine the effect on the Gini coefficient and on Recall/Precision/MAP
Relationship between Findability and Effectiveness • Parameter b is varied between 0 and 1
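The b parameter being swept here is BM25's length-normalization knob; a sketch of one common BM25 term-scoring variant (this exact formula is my assumption, the slides do not state which BM25 formulation was used) shows why b matters for retrieval bias:

```python
import math

def bm25_score(tf, df, doc_len, avg_len, n_docs, k1=1.2, b=0.75):
    """BM25 score of one query term for one document.
    b in [0, 1] controls document-length normalization:
    b = 0 ignores length entirely, b = 1 normalizes fully."""
    idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1)
    norm = tf + k1 * (1 - b + b * doc_len / avg_len)
    return idf * tf * (k1 + 1) / norm
```

With b = 0, long and short documents score identically for the same tf, which favors long documents (they match more queries); raising b penalizes long documents, shifting findability, and hence the Gini coefficient, across the collection.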
Relationship between Findability and Effectiveness • For JM, the smoothing parameter is varied between 0 and 1
Relationship between Findability and Effectiveness • Evolving a retrieval model using genetic programming and findability • Genetic programming is a branch of soft computing • Helps to solve exhaustive search-space problems • Process: • Initial population: randomly combine IR features • Select the best retrieval models (by findability measure) • Recombination (crossover, mutation) produces the next generation • Repeat until 100 generations are complete
Relationship between Findability and Effectiveness • Evolving a retrieval model using genetic programming and findability • Solutions (retrieval models) are represented as trees • Tree nodes are either operators (+, /, *) or ranking features • Ranking features • Low-level retrieval features • Term-proximity-based retrieval features • Constant values (0.1 to 1) • 100 generations are evolved with 50 solutions per generation
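Scoring a document with such a solution tree is a simple recursive evaluation; a sketch (the nested-tuple representation and names are mine, the node set of operators, features, and constants follows the slide):

```python
# Hypothetical GP individual: nested tuples (op, left, right);
# leaves are ranking-feature names or constants in [0.1, 1].
OPS = {
    "+": lambda a, b: a + b,
    "*": lambda a, b: a * b,
    "/": lambda a, b: a / b if b != 0 else 0.0,  # guard divide-by-zero
}

def evaluate(tree, features):
    """Score one document by recursively evaluating a GP solution tree
    against that document's ranking-feature values."""
    if isinstance(tree, tuple):
        op, left, right = tree
        return OPS[op](evaluate(left, features), evaluate(right, features))
    if isinstance(tree, str):
        return features[tree]   # leaf: feature value for this document
    return tree                 # leaf: constant
```

For example, the tree (+ (* tf idf) 0.5) scores a document as tf * idf + 0.5; crossover and mutation then swap and perturb subtrees of such individuals.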
Relationship between Findability and Effectiveness • Evolving a retrieval model using genetic programming and findability • Two correlation analyses are tested • (1) Relationship between findability and effectiveness on the basis of the fittest individual of each generation • (2) Relationship between findability and effectiveness on the basis of the average fitness of each generation
Relationship between Findability and Effectiveness • Evolving a retrieval model using genetic programming and findability • (First): relationship between findability and effectiveness on the basis of the fittest individual of each generation
Relationship between Findability and Effectiveness • Evolving a retrieval model using genetic programming and findability • (Second): relationship between findability and effectiveness on the basis of the average fitness of each generation • Generations with a low average Gini coefficient also have high effectiveness at Recall@100
Conclusions • Findability considers all documents, not a small set of judged documents • We propose a normalized findability scoring function that produces better findability ranks of documents • Analysis of findability and query characteristics • Different ranges of query characteristics show different retrieval bias • Analysis of findability and document features • Suitable for predicting document findability ranks • Relationship between findability and effectiveness • Findability can be used for automatic ranking of retrieval models • and to fine-tune IR systems in an unsupervised manner
Future Work • Query popularity and findability • We currently do not differentiate between popular and unpopular queries • Visualizing findability • Documents that are highly findable with one model • Documents that are highly findable with multiple models • Documents that are not findable with any model • Effect of retrieval bias in k-nearest-neighbor classification • Highly findable samples also affect the classification voting in k-NN
Gini Coefficient The Gini coefficient measures the retrievability inequality between documents, and thus represents retrieval bias; it provides a bird's-eye view. If G = 0, there is no bias; if G = 1, only one document is findable and all other documents have r(d) = 0.
Findability Scoring Functions [Figures: TREC-CRT, ChemAppPat]
Document Features and Retrievability • Features based on term weights • Document terms are weighted by the retrieval model • The terms are then added to inverted lists • Term weights in the inverted lists are sorted by decreasing score
Document Features and Retrievability • On highly skewed collections, these features show good correlation • On less skewed collections, they do not • Possibly because, in less skewed collections, the term weights of documents are less extreme due to similar document lengths • [Figures: TREC-CRT, ChemAppPat]
Document Features and Retrievability • Document-density-based features • Based on the average density around the k nearest neighbors of a document • k is set to 50, 100, and 150 • Density is computed both with all terms of a document and with its top 40 (highest-frequency) terms
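The density feature can be sketched as the mean similarity to a document's k nearest neighbors; a minimal version using cosine similarity over sparse term-weight vectors (the similarity function and names are my assumptions, the slides do not specify one):

```python
def cosine(u, v):
    """Cosine similarity between two sparse term-weight vectors (dicts)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = sum(w * w for w in u.values()) ** 0.5
    nv = sum(w * w for w in v.values()) ** 0.5
    return dot / (nu * nv) if nu and nv else 0.0

def knn_density(doc, others, k):
    """Average similarity to the k nearest neighbours of doc: a dense
    neighborhood means many competing documents for the same queries,
    which is expected to relate to the document's findability."""
    sims = sorted((cosine(doc, o) for o in others), reverse=True)
    return sum(sims[:k]) / k
```

Restricting `doc` to its top 40 highest-frequency terms before calling `knn_density` gives the second variant mentioned on the slide.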