Presented by Archana vijayalakshmanan 4/11/2006

Automated Ranking of Database Query Results. Sanjay Agarwal, Surajit Chaudhuri, Gautam Das, Aristides Gionis. Presented by Archana vijayalakshmanan 4/11/2006. Contents: Introduction, Different ranking functions, Breaking ties, Implementation, Conclusion.

Presentation Transcript


  1. Automated Ranking of Database Query Results • Sanjay Agarwal, Surajit Chaudhuri, Gautam Das, Aristides Gionis • Presented by Archana vijayalakshmanan 4/11/2006

  2. Contents • Introduction • Different ranking functions • Breaking ties • Implementation • Conclusion

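  3. Introduction • Automated ranking of query results is a popular aspect of information retrieval (IR). • Database systems support only a Boolean query model, which leads to two problems: • Empty answers: the query is too selective and no tuple satisfies it • Many answers: the query is too broad and too many tuples satisfy it • Automated ranking of query results takes the user's query and maps it to a Top-K query with a ranking function.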

  4. Automated Ranking functions for the ‘Empty Answers Problem’ • IDF Similarity • QF Similarity • QFIDF Similarity

  5. IDF Similarity • Analogy with IR: a word w corresponds to an <attribute, value> pair, and a document d corresponds to a tuple • Database (only categorical attributes) with tuples T = <t1, …, tm> • Query Q = <q1, …, qm>, i.e. the condition "WHERE A1 = q1 AND … AND Am = qm" • IDFk(t) = log(n / Fk(t)), where n is the number of tuples in the database and Fk(t) is the number of tuples in the database where Ak = t • Similarity coefficient: Sk(tk, qk) = IDFk(qk) if tk = qk, and 0 otherwise • The similarity between T and Q is the sum of the corresponding similarity coefficients over all attributes • This dot product is un-normalized • TF is irrelevant, since each attribute holds exactly one value in a tuple • This similarity function is known as IDF similarity • E.g. query = {CONVERTIBLE, NISSAN} • IR technique: Q = set of keywords, IDF(w) = log(N / F(w)), TF(w, d) = frequency of occurrence of w in d • The cosine similarity between query and document is the normalized dot product of the two corresponding vectors • This similarity function is known as cosine similarity with TF-IDF weighting
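A minimal Python sketch of the categorical IDF similarity on this slide. Assumptions (not from the slides): tuples are plain Python tuples indexed by attribute position, a query is a dict mapping attribute index k to the requested value qk, and the car data is made up. No CONVERTIBLE NISSAN exists in the data, so this is the empty-answers case.

```python
import math
from collections import Counter


def idf_tables(rows):
    """For each attribute k, map every value t to IDF_k(t) = log(n / F_k(t))."""
    n = len(rows)
    tables = []
    for k in range(len(rows[0])):
        freq = Counter(row[k] for row in rows)  # F_k(t)
        tables.append({t: math.log(n / f) for t, f in freq.items()})
    return tables


def idf_similarity(row, query, tables):
    """SIM(T, Q) = sum over queried attributes of S_k(t_k, q_k), where
    S_k = IDF_k(q_k) if t_k == q_k, else 0 (un-normalized, no TF factor)."""
    return sum(tables[k][q] for k, q in query.items() if row[k] == q)


# Hypothetical data: (type, make). No CONVERTIBLE NISSAN exists.
cars = [
    ("CONVERTIBLE", "BMW"),
    ("SEDAN", "NISSAN"),
    ("SEDAN", "NISSAN"),
    ("SEDAN", "TOYOTA"),
    ("SUV", "TOYOTA"),
    ("SEDAN", "HONDA"),
]
tables = idf_tables(cars)
query = {0: "CONVERTIBLE", 1: "NISSAN"}
ranked = sorted(cars, key=lambda r: idf_similarity(r, query, tables), reverse=True)
print(ranked[:3])
```

CONVERTIBLE occurs in 1 of 6 tuples and NISSAN in 2 of 6, so matching CONVERTIBLE is worth log 6 versus log 3, and the convertible BMW ranks above the Nissan sedans.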

  6. Generalizations of IDF similarity • For numeric data: • The previous similarity coefficients are inappropriate, because the frequency of a numeric value depends on nearby values • Discretizing a numeric attribute into a categorical one is problematic • Solution: • Let {t1, t2, …, tn} be the values of attribute A. For every value t, sum the "contributions" of t from every other point ti, with contributions modeled as a Gaussian distribution: IDF(t) = log(n / Σi exp(-½((ti - t) / h)²)) • The similarity function is S(t, q) = exp(-½((t - q) / h)²) · IDF(q), where h is a bandwidth parameter • For a range or set of values
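A small Python sketch of the Gaussian-kernel generalization for numeric data. Assumptions: the bandwidth rule h = 1.06 · σ · n^(-1/5) is a common rule of thumb from kernel density estimation, not something the slide specifies; taking the maximum over the values of a set is likewise an assumed way to handle "a range or set of values"; the price data is made up.

```python
import math
import statistics


def bandwidth(values):
    """Assumed rule of thumb: h = 1.06 * sigma * n^(-1/5).
    Requires values that are not all equal (otherwise sigma = 0)."""
    return 1.06 * statistics.pstdev(values) * len(values) ** (-1 / 5)


def numeric_idf(t, values, h):
    """IDF(t) = log(n / sum_i exp(-0.5 * ((t_i - t) / h)^2)): Gaussian contributions."""
    density = sum(math.exp(-0.5 * ((ti - t) / h) ** 2) for ti in values)
    # Floor avoids dividing by zero when t is far from every stored value.
    return math.log(len(values) / max(density, 1e-12))


def numeric_similarity(t, q, values, h):
    """S(t, q) = exp(-0.5 * ((t - q) / h)^2) * IDF(q)."""
    return math.exp(-0.5 * ((t - q) / h) ** 2) * numeric_idf(q, values, h)


def set_similarity(t, qs, values, h):
    """Assumption: for a set of query values q1..qr, use the best-matching one."""
    return max(numeric_similarity(t, q, values, h) for q in qs)


prices = [10_000, 11_000, 12_000, 30_000, 31_000, 60_000]  # hypothetical data
h = bandwidth(prices)
print(numeric_idf(11_000, prices, h), numeric_idf(60_000, prices, h))
print(numeric_similarity(12_000, 11_000, prices, h))
print(set_similarity(59_000, [11_000, 60_000], prices, h))
```

With this data the isolated value 60,000 gets a larger IDF than 11,000, which sits in a dense cluster, so a match near 60,000 counts for more; no discretization into buckets is needed.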

  7. QF Similarity • The importance of attribute values is determined by the frequency of their occurrence in the workload • For categorical data: • Query frequency QF(q) = RQF(q) / RQFMax, where RQF(q) is the raw frequency of occurrence of value q of attribute A in the query strings of the workload, and RQFMax is the raw frequency of the most frequently occurring value in the workload • s(t, q) = QF(q) if q = t, and 0 otherwise • Similarity between pairs of different categorical attribute values can also be derived from the workload, e.g. to find S(TOYOTA, HONDA): • Analyze the IN clauses of queries: if a certain pair of values often occurs together in the workload, the values are similar, e.g. queries with C as "MFR IN {TOYOTA, HONDA, NISSAN}" • Several recent queries in the workload by a specific user repeatedly request both TOYOTA and HONDA.
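A Python sketch of QF similarity for one categorical attribute, plus one possible way to score two different values by how often they appear together in IN clauses. The slide gives no formula for the co-occurrence case; the Jaccard coefficient used in `in_clause_similarity` is an assumption, as is the made-up workload.

```python
from collections import Counter


def qf_table(workload_values):
    """QF(q) = RQF(q) / RQFMax for one attribute A.

    workload_values: every value of A referenced in the workload's query conditions."""
    rqf = Counter(workload_values)  # RQF(q)
    rqf_max = max(rqf.values())     # RQFMax
    return {q: c / rqf_max for q, c in rqf.items()}


def qf_similarity(t, q, qf):
    """s(t, q) = QF(q) if q == t, else 0."""
    return qf.get(q, 0.0) if t == q else 0.0


def in_clause_similarity(a, b, in_clauses):
    """Hypothetical co-occurrence measure: Jaccard coefficient of W(a) and W(b),
    where W(v) is the set of workload queries whose IN clause lists v."""
    wa = {i for i, values in enumerate(in_clauses) if a in values}
    wb = {i for i, values in enumerate(in_clauses) if b in values}
    union = wa | wb
    return len(wa & wb) / len(union) if union else 0.0


# Hypothetical workload for the MFR attribute.
mfr_workload = ["TOYOTA", "TOYOTA", "HONDA", "TOYOTA", "NISSAN", "HONDA"]
qf = qf_table(mfr_workload)
in_clauses = [
    {"TOYOTA", "HONDA", "NISSAN"},
    {"TOYOTA", "HONDA"},
    {"TOYOTA"},
    {"BMW", "AUDI"},
]
print(qf)  # QF values: TOYOTA 1.0, HONDA 2/3, NISSAN 1/3
print(in_clause_similarity("TOYOTA", "HONDA", in_clauses))  # 2/3
```

TOYOTA and HONDA share two of the three IN clauses that mention either of them, so they score 2/3; TOYOTA and BMW never co-occur and score 0.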

  8. QFIDF Similarity • QF is purely workload-based, which is a big disadvantage when the workload is insufficient or unreliable • QFIDF similarity: S(t, q) = QF(q) · IDF(q) when t = q (and 0 otherwise), where QF(q) = (RQF(q) + 1) / (RQFMax + 1) • Thus a value gets a small non-zero QF even if it is never referenced in the workload
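A minimal Python sketch of QFIDF similarity with the smoothed QF from this slide. Assumptions: workload and database frequencies are passed in as `collections.Counter` objects, and the column and workload data are made up.

```python
import math
from collections import Counter


def qfidf_similarity(t, q, rqf, col_freq, n):
    """S(t, q) = QF(q) * IDF(q) if t == q, else 0.

    rqf: Counter of raw workload frequencies RQF(q) for this attribute.
    col_freq: Counter of value frequencies F(q) in the database column.
    n: number of tuples in the database."""
    if t != q:
        return 0.0
    rqf_max = max(rqf.values(), default=0)
    qf = (rqf[q] + 1) / (rqf_max + 1)  # smoothed: never zero, even if RQF(q) = 0
    idf = math.log(n / col_freq[q])    # t == q is a stored value, so F(q) >= 1
    return qf * idf


column = ["SEDAN"] * 4 + ["CONVERTIBLE", "SUV"]  # hypothetical data
workload = ["SEDAN", "SEDAN", "SUV"]              # CONVERTIBLE is never queried
rqf, col_freq, n = Counter(workload), Counter(column), len(column)
print(qfidf_similarity("CONVERTIBLE", "CONVERTIBLE", rqf, col_freq, n))
```

Plain QF would give CONVERTIBLE a weight of 0 here; with smoothing, QF = (0 + 1) / (2 + 1) = 1/3, and the score is log(6) / 3 ≈ 0.60.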

  9. Breaking ties • Problem: many tuples may tie for the same similarity score and get ordered arbitrarily. This arises in both the empty-answers and the many-answers problem • Solution: use workload information to determine weights for the missing attribute values (the attributes the query does not specify) that reflect their "global importance" for ranking purposes • Extend QF similarity and use this quantity to break ties • Extending IDF similarity by using IDF values presents challenges
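A Python sketch of QF-based tie-breaking. The slide does not say how the weights of missing attributes are combined; summing the QF values of the unspecified attributes is an assumption made here for illustration, as are the function names, the overlap count used as the primary score, and the data.

```python
def overlap(row, query):
    """Primary score: number of queried attributes k with t_k == q_k."""
    return sum(row[k] == q for k, q in query.items())


def rank_with_tiebreak(rows, query, qf_tables, k):
    """Order by primary score, breaking ties by the 'global importance' of the
    attributes the query leaves unspecified.

    Assumption: global importance = sum of QF_j(t_j) over attributes j not in
    the query; qf_tables[j] maps values of attribute j to QF from the workload."""
    def importance(row):
        return sum(qf_tables[j].get(v, 0.0) for j, v in enumerate(row) if j not in query)

    return sorted(rows, key=lambda r: (overlap(r, query), importance(r)), reverse=True)[:k]


# Hypothetical data: (type, make, color) and workload QF values per attribute.
cars = [("SEDAN", "TOYOTA", "BLUE"), ("SEDAN", "HONDA", "RED"), ("SUV", "TOYOTA", "RED")]
qf_tables = [{}, {"TOYOTA": 1.0, "HONDA": 0.5}, {"RED": 1.0, "BLUE": 0.2}]
top = rank_with_tiebreak(cars, {0: "SEDAN"}, qf_tables, k=2)
print(top)
```

Both sedans tie on the primary score of 1; the red Honda is listed first because, in this made-up workload, HONDA and RED together carry more QF weight (1.5) than TOYOTA and BLUE (1.2).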

  10. Implementation • Pre-processing component • Query–processing component

  11. Pre-processing component • Compute and store a representation of the similarity function in auxiliary database tables • For categorical data, compute IDF(t) (resp. QF(t)) from the frequency of occurrence of values in the database (resp. workload), and store the results in auxiliary database tables • For numeric data, store an approximate representation of the smooth function IDF() (resp. QF()) so that the function value can be retrieved at runtime
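A sketch of the categorical pre-processing step using Python's built-in `sqlite3` module. Assumptions: the table `cars`, its columns, and the auxiliary-table naming scheme `idf_<table>_<column>` are hypothetical; the logarithm is computed in Python because SQLite's math functions are a compile-time option and may be missing.

```python
import math
import sqlite3


def build_idf_table(conn, table, column):
    """Store IDF(t) = log(n / F(t)) for every value t of one categorical column
    in a hypothetical auxiliary table named idf_<table>_<column>.
    table and column must be trusted identifiers (they are formatted into SQL)."""
    n = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    aux = f"idf_{table}_{column}"
    conn.execute(f"CREATE TABLE {aux} (value TEXT PRIMARY KEY, idf REAL)")
    freqs = conn.execute(f"SELECT {column}, COUNT(*) FROM {table} GROUP BY {column}")
    conn.executemany(
        f"INSERT INTO {aux} (value, idf) VALUES (?, ?)",
        [(value, math.log(n / f)) for value, f in freqs.fetchall()],
    )
    return aux


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cars (tid INTEGER PRIMARY KEY, type TEXT, make TEXT)")
conn.executemany(
    "INSERT INTO cars (type, make) VALUES (?, ?)",
    [("CONVERTIBLE", "BMW"), ("SEDAN", "NISSAN"), ("SEDAN", "NISSAN"),
     ("SEDAN", "TOYOTA"), ("SUV", "TOYOTA"), ("SEDAN", "HONDA")],
)
aux = build_idf_table(conn, "cars", "make")
# At query time, the pre-computed value is a single keyed lookup.
idf_nissan = conn.execute(f"SELECT idf FROM {aux} WHERE value = ?", ("NISSAN",)).fetchone()[0]
print(aux, idf_nissan)
```

The same pattern would work for QF(t) by grouping over a table of logged workload conditions instead of the data table.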

  12. Query processing component • Main task: given a query Q and an integer K, retrieve the Top-K tuples from the database using one of the ranking functions • The ranking function is the one extracted in the pre-processing phase • A SQL DBMS is used to solve the Top-K problem • A simpler query processing problem is handled: • Input: table R with M categorical columns, key column TID, a condition C that is a conjunction of expressions of the form Ak = qk, and an integer K • Output: the top-K tuples of R most similar to Q • Similarity function: overlap similarity (Sk(tk, qk) = 1 if tk = qk, and 0 otherwise, so SIM(T, Q) counts the matching attributes)
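A baseline sketch of this simplified problem as a single SQL query over `sqlite3`, scanning the whole table. Assumptions: the table `r`, its columns, and the data are hypothetical, NULL values are not handled, and whether this full-scan form matches the "traditional approach" named in the talk is not stated on the slides. In SQLite a comparison such as `(col = ?)` evaluates to 1 or 0, so summing the comparisons gives the overlap similarity.

```python
import sqlite3


def top_k_overlap(conn, table, query, k):
    """Top-K (tid, score) pairs under overlap similarity, via one full-scan query.
    query maps trusted column names to requested values q_k."""
    score = " + ".join(f"({col} = ?)" for col in query)
    sql = f"SELECT tid, {score} AS score FROM {table} ORDER BY score DESC, tid LIMIT ?"
    return conn.execute(sql, [*query.values(), k]).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE r (tid INTEGER PRIMARY KEY, type TEXT, make TEXT, color TEXT)")
conn.executemany("INSERT INTO r VALUES (?, ?, ?, ?)", [
    (1, "SEDAN", "NISSAN", "RED"),
    (2, "CONVERTIBLE", "BMW", "BLUE"),
    (3, "SUV", "TOYOTA", "BLUE"),
    (4, "SEDAN", "HONDA", "RED"),
])
top = top_k_overlap(conn, "r", {"type": "CONVERTIBLE", "make": "NISSAN", "color": "RED"}, k=2)
print(top)  # [(1, 2), (2, 1)]
```

Ties are broken by TID here only to make the output deterministic; every tuple is scored, which is what the index-based approach tries to avoid.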

  13. Implementation of Top-K operator • Traditional approach • Index-based approach: • The overlap similarity function satisfies the following monotonic property: if T and U are two tuples such that Sk(tk, qk) ≤ Sk(uk, qk) for all k, then SIM(T, Q) ≤ SIM(U, Q). This makes it possible to adapt the threshold algorithm (TA) • To adapt TA, sorted and random access methods were implemented • The algorithm performs sorted access on each attribute, retrieves the complete tuple for each TID by random access, and maintains a buffer of the Top-K tuples seen so far
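An in-memory Python sketch of how TA can be adapted to overlap similarity, following the steps on this slide: round-robin sorted access per attribute, random access by TID, and a buffer of the best tuples seen so far. Assumptions: the table is a dict from TID to row, the per-attribute "index" is simulated by a scan (a DBMS would use an index lookup), and tuples that match no queried attribute are never returned.

```python
import heapq


def ita_top_k(table, query, k):
    """Threshold-algorithm (TA) sketch for overlap similarity.

    table: dict tid -> dict(column -> value); table[tid] plays the role of random access.
    query: dict column -> requested value q_k.
    Returns up to k (tid, score) pairs, best first, ties broken by smaller TID."""
    # Simulated index: for each queried column, the TIDs with A_k = q_k (score 1 entries).
    index = {col: [tid for tid, row in table.items() if row[col] == q] for col, q in query.items()}
    cursor = {col: 0 for col in query}
    seen = {}  # buffer: tid -> exact SIM(T, Q)
    while True:
        progressed = False
        for col in query:  # one round of sorted access, one entry per attribute
            if cursor[col] < len(index[col]):
                tid = index[col][cursor[col]]
                cursor[col] += 1
                progressed = True
                if tid not in seen:  # random access: fetch the complete tuple
                    row = table[tid]
                    seen[tid] = sum(row[c] == q for c, q in query.items())
        # Threshold: an unseen tuple can only still match attributes whose list is not exhausted.
        threshold = sum(cursor[col] < len(index[col]) for col in query)
        best = heapq.nlargest(k, seen.items(), key=lambda item: (item[1], -item[0]))
        if not progressed or (len(best) == k and best[-1][1] >= threshold):
            return best


table = {  # hypothetical data
    1: {"type": "SEDAN", "make": "NISSAN", "color": "RED"},
    2: {"type": "CONVERTIBLE", "make": "BMW", "color": "BLUE"},
    3: {"type": "SUV", "make": "TOYOTA", "color": "BLUE"},
    4: {"type": "SEDAN", "make": "HONDA", "color": "RED"},
}
query = {"type": "CONVERTIBLE", "make": "NISSAN", "color": "RED"}
print(ita_top_k(table, query, k=2))  # [(1, 2), (2, 1)]
```

For K = 2 the loop stops after one round: the type and make lists are exhausted, so an unseen tuple can score at most 1, which the second-best buffered score already reaches. Tuple 3 is never fetched.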

  14. Index-based TA (ITA) • Sorted access • Random access

  15. Conclusion • TF-IDF based techniques were extended to numerical and mixed data • Workload tracking was used as a weak form of collaborative filtering
