Detecting Attribute Dependencies from Query Feedback

IDS Lab. Winter Seminar Detecting Attribute Dependenciesfrom Query Feedback VLDB 2007 Peter J. Haas, Volker Markl (IBM Almaden Research Center, USA) Fabian Hueske(univ. of Ulm, Germany) Summarized by Jung-yeon, Yang Presented by Jung-yeon, Yang 2008.01.25

Introduction • Real-world datasets typically exhibit strong and complex dependencies between data attributes. • Discovering this dependency structure is important in a variety of settings • Commercial database systems currently use knowledge about dependent attributes for statistics configuration • Approaches to detection and ranking • Proactive • reactive Center for E-Business Technology

The Problem • Detecting Dependent Attributes • Example : Color and Year are independent if • F(P) = fraction of rows in table that satisfy predicate P • Dependence = “significant” departure from independence • Detection needed for automatic statistics configuration in query optimizers • Which multivariate statistics should we keep? • Need to rank the dependencies (limited space budget) • Other uses include • Schema discovery for data integration • Data mining (dependency diagrams) • Root-cause analysis and system monitoring Center for E-Business Technology

Previous approaches : Proactive • CORDS [IMH+, SIGMOD `04] • Sample the relation(or view) and compute a contingency table • Compute (robust) chi-squared statistic • Declare dependency if • Both t and sample size chosen using chi-squared theory Center for E-Business Technology

Previous approaches : Reactive • Focus system resources on interesting attributes • Complement proactive approaches • Can exploit DB2 feedback warehouse Center for E-Business Technology

A Spectrum of Reactive Approaches Center for E-Business Technology

Reactive approach • Correlation Analyzer • Use multiple observations (actuals) for each attribute pair • O1 = {(blue,2005):0.02, (blue)0.2, (2005):0.103} • O2 = {(red,2006):0.07, (red)0.82, (2006):0.11} • etc • Computes ratio for each pair and compares to • O1 : • O2 : • Attribute dependency if two or more observations look dependent • Problems • Ad hoc procedures, wasted information • Unstable: depends on amount, ordering of feedback Center for E-Business Technology

New Reactive approach • Dependency Discovery • Like CORDS, but uses incomplete contingency table with exact entries • Critical value u from extension of classical chi-squared theory • Normalize HM to get ranking metric Center for E-Business Technology

New Reactive approach • The HM Statistic • Similar to chi-square • Choosing the Threshold u Center for E-Business Technology

New Reactive approach • Missing Feedback • Assume (rough) upper bound on relative error of estimate • Can obtain from feedback-warehouse records • Handling Inconsistency • Problem : No full multivariate frequency distribution consistent with feedback • Records collected at different time points • Insert/Delete/Updates in between feedback observations • Solution method 1 : use timestamps to resolve conflicts • Solution method 2 : linear programming • Obtain minimal adjustment of frequencies needed for consistency Center for E-Business Technology

New Reactive approach • Ranking Attribute Pairs • Heuristic normalizations HM / z • Table Cardinality • Minimal number of distinct values • Degrees of freedom of chi-squared distribution • * 0.995 Quantile of Center for E-Business Technology

Experiment • Normalization Constants Center for E-Business Technology

Experiment • Ranking vs Amount of Feedback Center for E-Business Technology

Experiment • Dependency Measure vs Amount of Feedback Center for E-Business Technology

Experiment • Execution Time Center for E-Business Technology

Conclusions • Query feedback is an effective way to detect dependence • Chi-squared extension to implement detection • Attributes can be in multiple tables • Effective ranking methods • Practical solutions for handling inconsistent or missing feedback • Acceptable performance using sampling and incremental maintenance Center for E-Business Technology

Detecting Attribute Dependencies from Query Feedback

Detecting Attribute Dependencies from Query Feedback

Presentation Transcript

LEARNING FROM FEEDBACK

Relevance Feedback and Query Expansion

Learning From Feedback

Analyzing Attribute Dependencies

Lab 7: Attribute SQL- Query the database

Feedback from Yesterday

Mining Functional Dependencies from Data

Improving Retrieval Accuracy in Web Databases Using Attribute Dependencies

Query Chains: Learning to Rank from Implicit Feedback

WhittleSearch : Image Search with Relative Attribute Feedback

Feedback from AST

Dependencies

Feedback from ACDC3

Concept lattices constrained by attribute dependencies

Feedback from CTC

FEATURE Attribute Attribute

Query Relevance Feedback and Ontologies

Query Reformulation: User Relevance Feedback

Attribute Control Chart From

feedback from WWW6

Relevance feedback using query-logs

Query Chains: Learning to Rank from Implicit Feedback