Explore innovative ranking strategies for performing keyword searches in relational databases to enhance search precision and user experience.
Effective Keyword Search in Relational Databases • Fang Liu (University of Illinois at Chicago), Clement Yu (University of Illinois at Chicago), Weiyi Meng (Binghamton University), Abdur Chowdhury (America Online, Inc.)
Effective Keyword Search in Relational Databases • Introduction • IR ranking in text databases • Our ranking strategy in RDBs • Experiments • Conclusions and future work SIGMOD 2006: Effective Keyword Search in Relational Databases
Introduction: Why keyword search in relational databases? • We want to search text data in relational databases • SQL with the “contains” operator is not suitable for non-expert users • Keyword search is tremendously successful in text databases because it ranks documents by similarity, and it is usable by non-expert users
Introduction • Text data in relational databases
Introduction • Suppose a user is looking for albums titled “off the wall”
Introduction • Keyword search is very successful in text databases because it ranks documents by similarity; Google, Yahoo and MSN Search are examples. • So, let’s do keyword search in relational databases! (DBXplorer, BANKS, DISCOVER & IR-style DISCOVER, ObjectRank, Ranking Objects)
Introduction • Let’s do it, but how? • What are answers to be ranked? • How should we rank these answers?
Introduction -- an answer • An answer for a given query Q: a tuple tree, in which every leaf node must have at least one keyword in Q.
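The answer definition above can be checked mechanically. The sketch below is a minimal illustration, not the authors' implementation: it assumes a tuple tree is given as a mapping from hypothetical tuple ids to their text plus a list of undirected join edges, and it treats any node with at most one neighbour as a leaf. The names `is_answer`, `texts` and `edges` and the sample tuples are illustrative assumptions.

```python
def is_answer(texts, edges, keywords):
    """Return True if every leaf of the tuple tree contains a query keyword.

    texts:    dict mapping a tuple id to the text of that tuple (assumed layout)
    edges:    list of (id_a, id_b) pairs, the undirected joins of the tree
    keywords: list of query keywords
    """
    # Count neighbours so that leaves (degree <= 1) can be identified.
    degree = {node: 0 for node in texts}
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1

    query = {k.lower() for k in keywords}
    for node, deg in degree.items():
        if deg <= 1:  # a leaf, or the only node of a single-tuple tree
            words = set(texts[node].lower().split())
            if not (words & query):
                return False
    return True


# Illustrative data only: an album tuple joined to an artist tuple.
tree_texts = {"album1": "off the wall", "artist1": "michael jackson"}
tree_edges = [("album1", "artist1")]

print(is_answer(tree_texts, tree_edges, ["wall", "jackson"]))
print(is_answer(tree_texts, tree_edges, ["wall"]))
```

With the query "wall" alone, the two-tuple tree is rejected because the artist leaf matches no keyword, while the single album tuple on its own would qualify.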
Introduction • Use a slightly modified algorithm [DISCOVER] to produce all answers for a given query.
Introduction: Ranking • Our focus is on the effectiveness problem of ranking answers: the more relevant an answer is to the user query, the higher it should be ranked.
Introduction: Contributions • We identify four new factors that are critical to effective ranking and propose a new ranking strategy • We design and conduct comprehensive experiments on ranking effectiveness • Experimental results show that our strategy is significantly more effective than existing work
IR Ranking • Q=(k1, k2, ..., kn), D is a document, and Sim(Q,D) is the ranking score of D. • Notes (s=0.2): tf=2, ntf=1.53; tf=10, ntf=2.2 • half: idf=0.69; 1/100: idf=4.6; 1/200,000: idf=12 • 1: ndl=1; half: ndl=0.9; 1/10: ndl=0.8; 2: ndl=1.2; 10: ndl=2.8; 3.3
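The formula for Sim(Q,D) did not survive in this transcript, but the numbers on the slide are consistent with the pivoted-normalization weighting widely used in IR. The sketch below assumes ntf = 1 + ln(1 + ln(tf)), idf = ln(1 / fraction of documents containing the term), and ndl = (1 - s) + s * (dl / avgdl). These forms, the function names, and the reading of "half", "1/100", "2" and so on as ratios are assumptions chosen because they reproduce the slide's values; they are not quoted from the slides.

```python
import math


def ntf(tf):
    """Normalized term frequency (assumed form): 1 + ln(1 + ln(tf))."""
    return 1 + math.log(1 + math.log(tf))


def idf(doc_fraction):
    """Inverse document frequency from the fraction of documents that
    contain the term (assumed simplification of ln(N / df))."""
    return -math.log(doc_fraction)


def ndl(length_ratio, s=0.2):
    """Normalized document length, where length_ratio = dl / avgdl."""
    return (1 - s) + s * length_ratio


def sim(query_terms, doc_tf, doc_fraction, length_ratio, s=0.2):
    """Sum of ntf * idf / ndl over the query terms found in the document.
    Each query keyword is assumed to occur once in the query."""
    score = 0.0
    for k in query_terms:
        tf = doc_tf.get(k, 0)
        if tf > 0:
            score += ntf(tf) * idf(doc_fraction[k]) / ndl(length_ratio, s)
    return score


# Reproduce the values listed on the slide.
print(round(ntf(2), 2), round(ntf(10), 1))
print(round(idf(1 / 2), 2), round(idf(1 / 100), 1), round(idf(1 / 200000)))
print([round(ndl(r), 1) for r in (1, 0.5, 0.1, 2, 10)])
```

The values show how strongly each factor is dampened: a tenfold increase in tf raises ntf only from 1.53 to 2.2, while a document ten times the average length is penalized by a factor of 2.8.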
Our Ranking Strategy • T=(D1, D2, ..., Dn), so Sim(Q,D) → Sim(Q,T)
Our Ranking Strategy • Tuple Tree Size Normalization, based on the # of tuples in a tuple tree T
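The normalization formula itself is missing from this transcript. As a rough sketch only, the code below assumes the same pivoted form as the ndl values earlier in the deck, applied to the number of tuples in T. The function names, the parameter `avg_size` (an assumed average tree size over the candidate answers) and the value of `s` are all hypothetical.

```python
def nsize(size, avg_size, s=0.2):
    """Pivoted size normalization (assumed form): (1 - s) + s * size / avg_size."""
    return (1 - s) + s * size / avg_size


def normalize_by_tree_size(raw_score, size, avg_size, s=0.2):
    """Divide a raw score by the normalized tree size, so that larger
    tuple trees need proportionally more evidence to rank as high."""
    return raw_score / nsize(size, avg_size, s)


# A tree of average size keeps its score; a tree twice as large is penalized.
print(normalize_by_tree_size(10.0, size=3, avg_size=3))
print(normalize_by_tree_size(10.0, size=6, avg_size=3))
```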
Our Ranking Strategy • Document Length Normalization Reconsidered • Callouts: document length of Di; average document length of the text column of Di
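Only the callouts of this slide survive, so the sketch below illustrates just the idea they name: a document's length is compared with the average length of its own text column instead of a single global average. The pivoted form, the value of `s`, the helper names and the sample rows are assumptions; the original formula may contain further terms.

```python
from collections import defaultdict


def average_lengths(rows):
    """rows: iterable of (column_name, text). Returns the average word
    count per text column."""
    totals = defaultdict(int)
    counts = defaultdict(int)
    for column, text in rows:
        totals[column] += len(text.split())
        counts[column] += 1
    return {column: totals[column] / counts[column] for column in totals}


def column_ndl(column, text, avg_by_column, s=0.2):
    """Normalize document length against the average of its own column."""
    dl = len(text.split())
    return (1 - s) + s * dl / avg_by_column[column]


# Illustrative rows: short artist names and one long lyrics field.
rows = [
    ("artist.name", "michael jackson"),
    ("artist.name", "usher"),
    ("song.lyrics", " ".join(["la"] * 30)),
]
averages = average_lengths(rows)
print(averages)
print(column_ndl("artist.name", "michael jackson", averages))
```

With a single global average, every artist name would look extremely short and every lyrics field extremely long; normalizing per column keeps each comparison within like-sized text.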
Our Ranking Strategy • Document Frequency Normalization
Our Ranking Strategy • T=(D1, D2, ..., Dn) • maxWgt is the maximum of weight(k, Di) over the Di in T • sumWgt is the sum of weight(k, Di) over the Di in T
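The slide defines maxWgt and sumWgt but the expression that combines them is not preserved here. The sketch below computes both quantities for one keyword and then applies one plausible dampened combination, maxWgt * (1 + ln(1 + ln(sumWgt / maxWgt))), which is an assumption made for illustration. Under it, a keyword that occurs in several tuples of T scores above its best single tuple but well below the plain sum.

```python
import math


def keyword_weight_in_tree(weights):
    """Combine the per-tuple weights weight(k, Di) of one keyword k.

    weights: list of weight(k, Di) for the tuples Di in T.
    The combination used here is an assumed form, not taken from the slide.
    """
    positive = [w for w in weights if w > 0]
    if not positive:
        return 0.0
    max_wgt = max(positive)   # maxWgt: the best single tuple
    sum_wgt = sum(positive)   # sumWgt: total over all tuples
    return max_wgt * (1 + math.log(1 + math.log(sum_wgt / max_wgt)))


print(keyword_weight_in_tree([2.0]))
print(keyword_weight_in_tree([2.0, 2.0]))
```

When the keyword appears in only one tuple, sumWgt equals maxWgt and the result is simply maxWgt.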
Our Ranking Strategy • T=(D1, D2, ..., Dn), so Sim(Q,D) → Sim(Q,T)
Our Ranking Strategy • Schema Terms in Query, e.g. “lyrics for How come by D12” and “usher the singer's lyrics to burn” • Phrase-based Ranking: uses position information to boost phrase matching • Concept-based Ranking: can improve effectiveness and can assign semantics to answers
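To illustrate how position information can support phrase matching, the sketch below builds a small positional index for one text and counts the places where the query words occur consecutively. The multiplicative boost and its `bonus` parameter are hypothetical; the slides do not say how the boost is computed.

```python
def positions(text):
    """Map each word of the text to the list of positions where it occurs."""
    index = {}
    for pos, word in enumerate(text.lower().split()):
        index.setdefault(word, []).append(pos)
    return index


def count_phrase_matches(text, phrase):
    """Count occurrences of the phrase as consecutive words in the text."""
    words = phrase.lower().split()
    if not words:
        return 0
    index = positions(text)
    count = 0
    for start in index.get(words[0], []):
        # The i-th phrase word must sit exactly i positions after the start.
        if all(start + i in index.get(w, []) for i, w in enumerate(words)):
            count += 1
    return count


def boosted_score(base_score, text, phrase, bonus=0.5):
    """Hypothetical boost: raise the score for each exact phrase match."""
    return base_score * (1 + bonus * count_phrase_matches(text, phrase))


print(count_phrase_matches("off the wall", "off the wall"))
print(count_phrase_matches("the wall is off", "off the wall"))
print(boosted_score(2.0, "off the wall", "off the wall"))
```

Both sample texts contain all three keywords, but only the first contains them as a phrase, so only the first receives the boost.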
Experiments – data set • A Lyrics Database • 50 Queries from an AOL query log • Relevance Judgment: pooling + logs
Experiments: some queries • to me lyrics by lionel richie • inner smile texas lyrics • lionel richie lyrics • lionel richie lyrics you mean more to me • avril lavigne lyrics for the album under this skin • avril lavigne lyrics
Experiments – measure • Reciprocal rank: measures how well the system returns the first relevant answer; it is the inverse of the rank of that answer. • MAP (mean average precision): a precision value is computed after each relevant answer is retrieved. These values are averaged for each query, and the mean over all queries gives a single number that measures overall effectiveness.
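Both measures are easy to compute from a ranked list of relevance judgments. The sketch below follows the standard definitions; the function names are illustrative, and the optional `total_relevant` argument is an assumption added for the common variant that divides by all known relevant answers instead of only those retrieved.

```python
def reciprocal_rank(relevance):
    """relevance: list of booleans in ranked order.
    Returns 1 / rank of the first relevant answer, or 0.0 if there is none."""
    for rank, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            return 1.0 / rank
    return 0.0


def average_precision(relevance, total_relevant=None):
    """Average of the precision values taken at each relevant answer."""
    hits = 0
    precisions = []
    for rank, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            hits += 1
            precisions.append(hits / rank)
    denominator = total_relevant if total_relevant is not None else hits
    return sum(precisions) / denominator if denominator else 0.0


def mean_average_precision(runs):
    """runs: one relevance list per query. Returns the mean of their APs."""
    if not runs:
        return 0.0
    return sum(average_precision(run) for run in runs) / len(runs)


print(reciprocal_rank([False, True, False]))
print(average_precision([True, False, True]))
print(mean_average_precision([[True], [False, True]]))
```

Reciprocal rank rewards a system only for its first relevant answer, while MAP also reflects how early the remaining relevant answers appear.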
Experiments – results • Our ranking strategy: the four new factors.
Experiments – results • Comparison with related works
Conclusions • Effectiveness is as important as efficiency • The four new factors are critical to search effectiveness • Our strategy is significantly more effective than related work
Future Work • Utilize link analysis • Combine non-text columns • Address the efficiency problem • Evaluate on more real-world data sets
Questions?