Reverse Data Management
… and the case for Reverse What-If queries
Alexandra Meliou, Wolfgang Gatterbauer, Dan Suciu
http://db.cs.washington.edu/causality/
Forward and Backward Paradigm
• Forward transformations (source data to target data), e.g. query processing, data integration, data mining, clustering, indexing
• Backward transformations (target data to source data), e.g. data cleaning, provenance, causality, data generation, view updates
• Reverse Data Management (RDM)
The Problem Space of RDM
• Target data, explicit specification: a specific data instance, or diffs between versions, e.g. before and after a view update
• Target data, implicit specification: described indirectly, through constraints and statistics, e.g. declarative data generation
The Problem Space of RDM
• Target data: explicit specification or implicit specification
• Source data, reference source: source data needs to be modified in order to achieve the desired effect in the output, e.g. view updates
• Source data, no source: no source data is provided as a reference; it needs to be computed from scratch, e.g. inverse schema mappings
The Problem Space of RDM
• Axes: target data (explicit or implicit specification) and source data (no source or reference source)
• Modify the source data to achieve the desired effect, while minimizing side-effects
The Problem Space of RDM
• Axes: target data (explicit or implicit specification) and source data (no source or reference source)
• Trace the source tuples that correspond to the target tuples of interest
The Problem Space of RDM
• Axes: target data (explicit or implicit specification) and source data (no source or reference source)
• Repair a data instance in order to satisfy a constraint
The Problem Space of RDM
• Axes: target data (explicit or implicit specification) and source data (no source or reference source)
Introducing Reverse What-If Queries
• Axes: target data (explicit or implicit specification) and source data (no source or reference source)
• Reverse What-If or How-To queries
Hypothetical (What-If) Queries
• Example from [Balmin et al. VLDB 2000]:
• “An analyst of a brokerage company wants to know what would be the effect on the return of customers’ portfolios if during the last 3 years they had suggested Intel stocks instead of Motorola”
• Forward: change something in the source (hypothesis), then observe the effect in the target
• How would the target data change, given a change in the source?
Reverse What-If, or How-To queries
• Modified example:
• “An analyst wants to figure out how to achieve a 10% return in customer portfolios, with the least number of trades”
• Reverse: declare a desired effect in the target, then find changes to the source that achieve the desired effect
• What is the best hypothetical scenario that achieves the desired outcome?
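The contrast between the two query types can be sketched in a few lines of Python. This is only an illustration under made-up assumptions: the portfolio, the return figures, the stock symbols, and the reading of a "trade" as swapping one position from one stock to another are all hypothetical, and the brute-force search stands in for whatever a real how-to engine would do.

```python
from itertools import combinations

# Hypothetical data: each position is (customer, stock, amount invested).
PORTFOLIO = [
    ("alice", "MOT", 1000.0),
    ("bob", "MOT", 2000.0),
    ("carol", "MOT", 1000.0),
]
# Hypothetical 3-year returns per stock (made-up numbers).
RETURNS = {"MOT": 0.04, "INTC": 0.18}


def portfolio_return(positions):
    """Forward transformation: source positions -> overall return (target)."""
    invested = sum(amount for _, _, amount in positions)
    gained = sum(amount * RETURNS[stock] for _, stock, amount in positions)
    return gained / invested


def what_if(positions, old, new):
    """What-if: apply a hypothetical change to the source, observe the target."""
    changed = [(c, new if s == old else s, a) for c, s, a in positions]
    return portfolio_return(changed)


def how_to(positions, target, old, new):
    """How-to: find the fewest trades (old -> new swaps) reaching the target."""
    candidates = [i for i, (_, s, _) in enumerate(positions) if s == old]
    for k in range(len(candidates) + 1):  # try the fewest trades first
        for chosen in combinations(candidates, k):
            changed = [
                (c, new if i in chosen else s, a)
                for i, (c, s, a) in enumerate(positions)
            ]
            if portfolio_return(changed) >= target:
                return [positions[i][0] for i in chosen]
    return None  # no set of trades achieves the target
```

Here `what_if(PORTFOLIO, "MOT", "INTC")` runs the forward direction once, while `how_to(PORTFOLIO, 0.10, "MOT", "INTC")` has to search over many hypothetical scenarios and pick the one with the fewest changes.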
Example (constraints)
• Company reorganization: a company going through financial strain wants to reduce operational costs by 10%, through:
  • lay-offs, salary decreases, or department and project merging (variables),
• within certain constraints specified by the company’s requirements (constraints):
  • any salary decreases should be uniform across employees of the same department,
  • every project should have at least a certain number of employee hours devoted to it,
• the solution should be achieved with the minimum number of employee reassignments (optimization objective)
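One way to read the reorganization example is as a set of candidate plans, each checked against the constraints and scored by the objective. The sketch below does only that checking and scoring; it is not a solver. The employee table, the minimum hours per project, and the shape of a "plan" are hypothetical, and department and project merging is left out to keep the example short.

```python
# Hypothetical data: (employee, department, salary, project, weekly hours).
EMPLOYEES = [
    ("ann", "eng", 100.0, "p1", 40),
    ("ben", "eng", 100.0, "p1", 40),
    ("cat", "ops", 80.0, "p2", 40),
    ("dan", "ops", 80.0, "p2", 40),
]
MIN_HOURS = {"p1": 40, "p2": 60}  # assumed per-project minimum hours


def evaluate(plan):
    """Return (feasible, reassignments) for a candidate plan.

    plan = {"cuts": {dept: fraction}, "laid_off": set of names,
            "moves": {name: new_project}}
    """
    old_cost = sum(salary for _, _, salary, _, _ in EMPLOYEES)
    new_cost = 0.0
    hours = {project: 0 for project in MIN_HOURS}
    for name, dept, salary, project, h in EMPLOYEES:
        if name in plan["laid_off"]:
            continue
        # One cut fraction per department, so cuts are uniform by construction.
        new_cost += salary * (1.0 - plan["cuts"].get(dept, 0.0))
        hours[plan["moves"].get(name, project)] += h
    saves_enough = new_cost <= 0.9 * old_cost           # 10% cost reduction
    staffed = all(hours[p] >= MIN_HOURS[p] for p in MIN_HOURS)
    return saves_enough and staffed, len(plan["moves"])  # objective: fewest moves


plan_a = {"cuts": {}, "laid_off": {"ben"}, "moves": {}}
plan_b = {"cuts": {}, "laid_off": {"cat"}, "moves": {}}
plan_c = {"cuts": {}, "laid_off": {"cat"}, "moves": {"ben": "p2"}}
```

With this toy data, `plan_a` is feasible with no reassignments, `plan_b` saves enough but leaves project `p2` understaffed, and `plan_c` repairs that at the cost of one reassignment. A how-to query would ask the system to find the best such plan rather than have the user enumerate them.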
Declarative Problem Specification
• Variable definitions:
    CREATE REPLACEMENT LineItem_N AS
      (SELECT ok, pk, sk, VAR(quant) AS quant'
       FROM LineItem)
• Problem constraints:
    CREATE CONSTRAINT Constr1 AS
      NOT EXISTS (SELECT ok, sum(quant') AS c
                  FROM LineItem_N
                  GROUP BY ok
                  HAVING c > 100)
• Optimization criterion:
    CREATE OBJECTIVE Obj1 AS
      SELECT sum(*)
      FROM (SELECT quant - quant'
            FROM LineItem AS L1, LineItem_N AS L2
            WHERE L1.ok = L2.ok
              AND L1.pk = L2.pk
              AND L1.sk = L2.sk)
• Problem statement query:
    HOW TO minimize(Obj1) SUBJECT TO Constr1
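To make the meaning of `Constr1` and `Obj1` concrete, the following Python sketch evaluates both against a candidate replacement table. It is an interpretation of the specification above, not the authors' implementation: the rows are made up, and tables are modelled as plain lists of `(ok, pk, sk, quant)` tuples.

```python
from collections import defaultdict

# Hypothetical LineItem rows: (ok, pk, sk, quant).
LINE_ITEM = [(1, 10, 100, 70), (1, 11, 100, 50), (2, 10, 101, 30)]


def constr1(line_item_n, cap=100):
    """Constr1: no `ok` group may have a total quant' above the cap."""
    totals = defaultdict(int)
    for ok, _pk, _sk, quant_new in line_item_n:
        totals[ok] += quant_new
    return not any(total > cap for total in totals.values())


def obj1(line_item, line_item_n):
    """Obj1: sum of quant - quant' over rows joined on (ok, pk, sk)."""
    new_by_key = {(ok, pk, sk): q for ok, pk, sk, q in line_item_n}
    return sum(q - new_by_key[(ok, pk, sk)] for ok, pk, sk, q in line_item)


# A candidate LineItem_N: same keys, modified quantities for ok = 1.
candidate = [(1, 10, 100, 60), (1, 11, 100, 40), (2, 10, 101, 30)]
```

The original table violates the constraint (group `ok = 1` totals 120), while `candidate` satisfies it with an objective value of 20. The `HOW TO` statement asks for the candidate that minimizes this value among all those that satisfy the constraint.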
System Architecture
• [Diagram components: How-To query, How-To Engine (How-To parser; variables, constraints, objectives; How-To evaluation), DB, How-To answer]
• User input:
  • Support variable, constraint and objective specifications
  • Maintain declarativity
System Architecture
• [Diagram components: How-To query, How-To Engine (How-To parser; variables, constraints, objectives; How-To evaluation), DB, How-To answer]
• Evaluation requirements:
  • Efficiency!
Evaluation
• [Diagram components: user input, DB, How-To evaluation, How-To answer]
• How-To evaluation components: LP/IP transformation, LP/IP solver, map LP/IP solution to data, LP reduction
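The three evaluation stages (transformation, solving, mapping the solution back to data) can be sketched end to end on a toy instance of the LineItem problem. Everything concrete here is an assumption made for illustration: the rows, the small cap of 10 in place of 100, the bounds `0 <= quant' <= quant`, and above all the exhaustive search, which merely stands in for a real LP/IP solver and would not scale.

```python
from itertools import product

# Hypothetical LineItem rows: (ok, pk, sk, quant); small numbers keep the
# enumeration below tiny.
LINE_ITEM = [(1, 10, 100, 7), (1, 11, 100, 5), (2, 10, 101, 3)]
CAP = 10  # stands in for the per-order cap of 100 in the slides


def to_ip(rows, cap):
    """Transformation: one integer variable per row, one constraint per ok."""
    bounds = [(0, quant) for _, _, _, quant in rows]  # assumed: 0 <= x <= quant
    groups = {}
    for i, (ok, _, _, _) in enumerate(rows):
        groups.setdefault(ok, []).append(i)
    constraints = [(idxs, cap) for idxs in groups.values()]  # sum(x[idxs]) <= cap
    constant = sum(quant for _, _, _, quant in rows)  # objective: constant - sum(x)
    return bounds, constraints, constant


def solve(bounds, constraints, constant):
    """Solver stand-in: exhaustive search over all integer assignments."""
    best, best_value = None, None
    for x in product(*(range(lo, hi + 1) for lo, hi in bounds)):
        if all(sum(x[i] for i in idxs) <= cap for idxs, cap in constraints):
            value = constant - sum(x)
            if best_value is None or value < best_value:
                best, best_value = x, value
    return best, best_value


def to_data(rows, x):
    """Map the solution back to rows of the replacement table LineItem_N."""
    return [(ok, pk, sk, xi) for (ok, pk, sk, _), xi in zip(rows, x)]


solution, objective = solve(*to_ip(LINE_ITEM, CAP))
line_item_n = to_data(LINE_ITEM, solution)
```

The optimum is not unique in this instance: several ways of removing two units from group `ok = 1` give the same objective value, which is one reason solution stability matters for how-to answers.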
Conclusions
• Reverse Data Management
  • Encompasses many important database problems
  • Harder in general: the inverse of a function is not always a function
• How-To queries (reverse what-if)
  • Implement optimization problems within a DBMS
• Plenty of challenges:
  • Declarative input specification
  • Efficient evaluation
  • Optimization (combination of Integer Prog. and DB techniques)
  • Under-specified and over-specified problem handling
  • Solution “stability” and “sensitivity”
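The remark that the inverse of a function is not always a function can be seen on a very small example. The sketch below uses a hypothetical forward transformation (a view that sums two source values over a tiny domain) and computes its preimage by enumeration.

```python
from itertools import product


def forward(source):
    """Forward transformation: a view that sums two source values."""
    a, b = source
    return a + b


def backward(target, domain=range(4)):
    """Backward transformation: every source in the domain mapping to target."""
    return [s for s in product(domain, repeat=2) if forward(s) == target]
```

For a target of 3 there are four possible sources (an under-specified problem, where a preference such as minimal change is needed to choose), while for a target of 7 there are none within the domain (an over-specified problem).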