Optimal Workload-Based Weighted Wavelet Synopsis

Optimal Workload-Based Weighted Wavelet Synopsis • Yossi Matias, Daniel Urieli • School of Computer Science, Tel Aviv University

Outline • Motivation • Background & Contributions • Wavelet synopses • Optimal WB weighted wavelet synopses

Approximate Query Processing • Operational database (GB/TB): SQL query → exact answer, long response times! • Compact data synopses (KB/MB): “transformed” query → approximate answer, FAST!!

Goals • Develop data synopses • Most accurate answers • Using a small amount of memory • Efficient construction over massive data sets • Time • I/O

Data synopses • Samples: random samples, stratified samples, congressional samples, reservoir sampling, backing samples, join synopses, sketches • [Olken-Rotem, Vitter, Alon-Matias-Szegedy, Gibbons-Matias-Poosala, Acharya et al…] • Used in commercial DB systems • Histograms: equi-depth, compressed, v-optimal, spline, multi-dimensional, dynamic, Max-diff, MHIST • [Poosala-Ioannidis, etc.] • Used in commercial DB systems • Wavelet synopses: basic, multi-dim, probabilistic, dynamic, extended • Adapt to the nature of the data effectively • [Matias-Vitter-Wang, Garofalakis-Gibbons, Chakrabarti et al, Roussopoulos-Kotidis…] • Workload-based wavelet synopses [Matias, Portman]

Accuracy of various synopses

Workload-based synopses • Future queries are correlated with past queries • Queries can be thought of as drawn from a probability distribution roughly determined by the workload • Workload-based synopses: optimized for a given query workload • “Standard” synopses assume a uniform workload

Workload-based synopses – prior work • Workload-based sampling • Overcoming limitations of sampling for aggregation queries [Chaudhuri, Das, Datar, Motwani, and Narasayya] • Icicles: Self-tuning samples for approximate query answering [Ganti, Lee, Ramakrishnan] • Workload-based histograms • Self-tuning histograms [Aboulnaga and Chaudhuri] • ST-holes [Bruno et al.] • Hierarchical range histogram [Guha-Koudas-Srivastava-02] • Workload-based wavelets [Matias, Portman]

Workload-Based Wavelet synopses [MP03] • Adapt effectively to a given query workload (not only to the data) • Reduce the mean squared absolute / relative error over a workload of queries • Order-of-magnitude improvement over prior wavelet synopses • Not necessarily optimal

Contributions • Optimal Workload-based Weighted Wavelet (WWW) synopses • WB-MSE (Workload-Based Mean Squared Error) • WB-MRE (Workload-Based Mean-squared Relative Error) • Equivalently, minimize the expected squared absolute or relative error of a point query • First to minimize the MRE over the data • WB-MRE with uniform distribution • Both WWW synopses are optimal enhanced wavelet synopses • A generalized definition which allows coefficients with arbitrary values • Optimal cost construction • Linear construction time • I/O optimal

Techniques • Problem definition in terms of • Weighted norm • Weighted-inner-product • Weighted-inner-product-space • Weighted wavelets for building data synopses

Haar wavelet decomposition
Resolution | Averages | Detail coefficients
3 | [2, 2, 0, 2, 3, 5, 4, 4] | ----
2 | [2, 1, 4, 4] | [0, -1, -1, 0]
1 | [1.5, 4] | [0.5, 0]
0 | [2.75] | [-1.25]
Resulting decomposition: [2.75, -1.25, 0.5, 0, 0, -1, -1, 0]
• Wavelets: mathematical tool for hierarchical decomposition of functions/signals • Haar wavelets: simplest wavelet basis, easy to understand and implement • Recursive pairwise averaging and differencing at different resolutions • A linear-time algorithm
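The pairwise averaging and differencing above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code; the function name `haar_decompose` is hypothetical, and the input length is assumed to be a power of two.

```python
def haar_decompose(data):
    """Unnormalized Haar decomposition by pairwise averaging and differencing.

    Assumes len(data) is a power of two. Returns the overall average followed
    by the detail coefficients, ordered from the coarsest resolution to the finest.
    """
    averages = list(data)
    details = []  # detail coefficients, coarsest level first
    while len(averages) > 1:
        pairs = list(zip(averages[0::2], averages[1::2]))
        level_details = [(a - b) / 2 for a, b in pairs]
        averages = [(a + b) / 2 for a, b in pairs]
        details = level_details + details  # coarser levels go in front
    return averages + details


coefficients = haar_decompose([2, 2, 0, 2, 3, 5, 4, 4])
print(coefficients)  # [2.75, -1.25, 0.5, 0.0, 0.0, -1.0, -1.0, 0.0]
```

Each pass handles half as many values as the previous one, which is where the linear running time comes from.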

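Wavelet error tree [MVW98] • [Figure: error tree. Root: 2.75 (overall average); detail coefficients by level: -1.25; 0.5, 0; 0, -1, -1, 0. Each detail coefficient is added (+) on its left subtree and subtracted (-) on its right subtree] • Leaves (original data): 2 2 0 2 3 5 4 4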
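To make the +/- labels of the error tree concrete, here is a small sketch that rebuilds one data value by walking from the root to a leaf. The function name `reconstruct_point` and the breadth-first array layout (node k has children 2k and 2k+1) are assumptions made for illustration.

```python
def reconstruct_point(coefficients, i):
    """Rebuild data value i from unnormalized Haar coefficients.

    coefficients[0] is the overall average; coefficients[1:] are the detail
    nodes of the error tree in breadth-first order (node k has children 2k
    and 2k+1). Assumes len(coefficients) is a power of two.
    """
    n = len(coefficients)
    value = coefficients[0]
    node, lo, hi = 1, 0, n  # the current node covers leaves lo..hi-1
    while node < n:
        mid = (lo + hi) // 2
        if i < mid:   # leaf lies in the left subtree: add the coefficient
            value += coefficients[node]
            node, hi = 2 * node, mid
        else:         # leaf lies in the right subtree: subtract it
            value -= coefficients[node]
            node, lo = 2 * node + 1, mid
    return value


tree = [2.75, -1.25, 0.5, 0, 0, -1, -1, 0]
rebuilt = [reconstruct_point(tree, i) for i in range(len(tree))]
print(rebuilt)  # [2.0, 2.0, 0.0, 2.0, 3.0, 5.0, 4.0, 4.0]
```

Only the coefficients on the root-to-leaf path contribute to a point query, so dropping a coefficient affects exactly the leaves below it.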

The Haar Basis • [Figure: Haar basis functions on [0, 1], each taking the value +1 on one half of its support and -1 on the other]

Wavelet error tree [MVW98] • [Figure: error tree with coefficients 2.75; -1.25; 0.5, 0; 0, -1, -1, 0 over the original data 2 2 0 2 3 5 4 4] • How should we choose which coefficients to retain?

Parseval-based optimal thresholding • Given a vector expressed in some orthonormal basis • Goal: approximate the vector using only M << N basis coefficients • Then, retaining the M coefficients of largest absolute value is optimal • This minimizes the L2 norm of the error vector
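The Parseval argument can be checked numerically. The sketch below builds an orthonormal Haar basis explicitly (quadratic cost, chosen for clarity rather than speed), keeps the M largest coefficients, and compares the squared L2 error with the energy of the dropped coefficients. The helper names `haar_basis`, `dot`, and `threshold` are hypothetical.

```python
import math


def haar_basis(n):
    """Orthonormal Haar basis for vectors of length n (n a power of two)."""
    basis = [[1 / math.sqrt(n)] * n]  # scaling (average) vector
    support = n
    while support > 1:
        half = support // 2
        for start in range(0, n, support):
            vec = [0.0] * n
            for i in range(start, start + half):
                vec[i] = 1 / math.sqrt(support)
            for i in range(start + half, start + support):
                vec[i] = -1 / math.sqrt(support)
            basis.append(vec)
        support = half
    return basis


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def threshold(data, m):
    """Keep the m largest-magnitude coefficients; return (approximation, dropped)."""
    basis = haar_basis(len(data))
    coeffs = [dot(data, vec) for vec in basis]
    order = sorted(range(len(coeffs)), key=lambda k: abs(coeffs[k]), reverse=True)
    kept = set(order[:m])
    approx = [sum(coeffs[k] * basis[k][i] for k in kept) for i in range(len(data))]
    dropped = [coeffs[k] for k in range(len(coeffs)) if k not in kept]
    return approx, dropped


data = [2, 2, 0, 2, 3, 5, 4, 4]
approx, dropped = threshold(data, m=4)
squared_error = sum((d - a) ** 2 for d, a in zip(data, approx))
dropped_energy = sum(c * c for c in dropped)
print(squared_error, dropped_energy)  # the two agree up to rounding
```

Because the squared error equals the sum of squares of whatever is dropped, dropping the smallest coefficients is the best possible choice for the L2 norm.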

Haar Wavelet Synopses - summary • Compute the Haar wavelet decomposition of D • Coefficient thresholding: only M << |D| = N coefficients can be kept • Parseval-based thresholding • Optimal w.r.t. the MSE • Several other greedy heuristics exist

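Workload Example • Original data: 2 2 2 6 3 5 4 4 • Importance (workload): 0.001 0.001 0.001 0.001 0.249 0.249 0.249 0.249 • Haar coefficients (error tree, top down): 3.5; -0.5; -1, 0; 0, -2, -1, 0 • Normalized detail coefficients: -0.5; -0.707, 0; 0, -1, -0.5, 0 • Given a synopsis S built by standard thresholding: approximation 2 2 2 6 4 4 4 4, WL2(S) = 0.498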

Workload Example • Original data: 2 2 2 6 3 5 4 4 • Importance (workload): 0.001 0.001 0.001 0.001 0.249 0.249 0.249 0.249 • Standard thresholding: approximation 2 2 2 6 4 4 4 4 • Workload-based thresholding: approximation 2 2 4 4 3 5 4 4, WL2(S) = 0.008
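The two WL2 values in this example can be reproduced with a short sketch. It assumes that each synopsis keeps four coefficients, that the retained sets are the ones implied by the figures in the slides (an inference, not something the slides state), and that WL2 is the workload-weighted sum of squared errors. The name `reconstruct` is hypothetical.

```python
def reconstruct(coeffs):
    """Invert the unnormalized Haar transform (length must be a power of two)."""
    values = [coeffs[0]]
    pos = 1
    while pos < len(coeffs):
        details = coeffs[pos:pos + len(values)]
        # every value v splits into v + d (left child) and v - d (right child)
        values = [v + s * d for v, d in zip(values, details) for s in (1, -1)]
        pos += len(details)
    return values


data = [2, 2, 2, 6, 3, 5, 4, 4]
workload = [0.001] * 4 + [0.249] * 4
coeffs = [3.5, -0.5, -1, 0, 0, -2, -1, 0]  # Haar coefficients of data

# Assumed retained positions, inferred from the approximations in the slides.
standard_kept = {0, 1, 2, 5}  # keeps the -2 detail under the light queries
workload_kept = {0, 1, 2, 6}  # keeps the -1 detail under the heavy queries

results = {}
for name, kept in (("standard", standard_kept), ("workload-based", workload_kept)):
    synopsis = [c if k in kept else 0 for k, c in enumerate(coeffs)]
    approx = reconstruct(synopsis)
    wl2 = sum(w * (d - a) ** 2 for w, d, a in zip(workload, data, approx))
    results[name] = (approx, wl2)
    print(name, approx, round(wl2, 3))
```

The standard synopsis spends a coefficient on the large detail under queries that are almost never asked, while the workload-based one spends it where the query probability is concentrated.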

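Error definition • D = (d1,…,dN) - our data. • qi - the point query, whose exact answer is di • $\hat{d}_i$ - the approximated answer • abs-error: $e_i = |d_i - \hat{d}_i|$; rel-error: $e_i = |d_i - \hat{d}_i| / |d_i|$ • The purpose: reduce a norm of the error vector $E = (e_1,\dots,e_N)$ • For example: $\mathrm{MSE}(E) = \frac{1}{N}\sum_{i=1}^{N} e_i^2$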

Workload-based Error • A workload: (c1,…,cN), where ci is the probability that qi appears. • Given a workload W = (c1,…,cN) we define the Weighted L2 Norm: $WL_2(E) = \sum_{i=1}^{N} c_i e_i^2$, where ei is the error of query qi • When ci = 1/N: WL2(E) = MSE

Our goal • Minimizing the WL2 norm of the error vector E • For a given data set D and query workload W • Equivalently: minimizing the expected squared error of a point query drawn from a given distribution
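The equivalence between the weighted error and the expected squared error of a random point query can be illustrated as follows. This is a sketch under assumptions: WL2 is taken as the workload-weighted sum of squared errors, the error vector is a made-up example, and the helper name `wl2` is hypothetical.

```python
import random


def wl2(errors, workload):
    """Workload-weighted sum of squared per-query errors."""
    return sum(c * e * e for c, e in zip(workload, errors))


errors = [0, 0, 0, 0, 1, 1, 0, 0]      # illustrative per-query errors
workload = [0.001] * 4 + [0.249] * 4   # query probabilities, summing to 1

# With a uniform workload the weighted error reduces to the MSE.
n = len(errors)
mse = sum(e * e for e in errors) / n
uniform = wl2(errors, [1 / n] * n)

# Expected squared error of a point query sampled from the workload.
rng = random.Random(0)
queries = rng.choices(range(n), weights=workload, k=200_000)
estimate = sum(errors[q] ** 2 for q in queries) / len(queries)

exact = wl2(errors, workload)
print(uniform, mse)
print(exact, estimate)  # the sampled estimate should be close to the exact value
```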

Regular Haar transform • Given a data set D = (d0,…,dN-1) • D → Haar Transform (HT) → HT(D) → standard thresholding → wavelet synopsis

Overview • Given a data set D = (d0,…,dN-1) and a workload vector W = (c0,…,cN-1) • W → WHB(W), the Weighted Haar Basis • D and WHB(W) → Weighted Haar Transform (WHT) → WHT(D) → standard thresholding → WB wavelet synopsis • Ingredients: Parseval’s formula, the WL2 norm, the weighted inner product, and the algorithm for computing the WH basis from the workload

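The weighted Haar basis • The weighted Haar basis functions would also look like the Haar basis functions, but with different heights on the two halves of their support: x and -y instead of 1 and -1 • [Figure: weighted Haar basis function on [0, 1] with values x and -y]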

Compute the Weighted Haar Basis • Recall the weight coefficients (the workload) W = (c0,…,cN-1) for D = (d0,…,dN-1) • Meaning it would look more like: [Figure: basis function on [0, 1] with the weights c0, c1, …, cN-1 marked along the axis]
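One way to realize such a basis is sketched below; it is an illustration under stated assumptions, not necessarily the authors' construction. It assumes the weighted inner product is the sum of c_i u_i v_i, that all weights are positive, and that the heights x and y of each basis function are chosen so the function is orthogonal to constants and has unit weighted norm. The names `weighted_haar_basis` and `w_dot` are hypothetical.

```python
import math


def weighted_haar_basis(weights):
    """Basis orthonormal under <u, v>_W = sum_i weights[i] * u[i] * v[i].

    Assumes len(weights) is a power of two and every weight is positive.
    """
    n = len(weights)
    basis = [[1 / math.sqrt(sum(weights))] * n]  # weighted-unit constant vector
    support = n
    while support > 1:
        half = support // 2
        for start in range(0, n, support):
            l = sum(weights[start:start + half])            # left-half weight
            r = sum(weights[start + half:start + support])  # right-half weight
            x = math.sqrt(r / (l * (l + r)))  # value on the left half
            y = math.sqrt(l / (r * (l + r)))  # magnitude on the right half
            vec = [0.0] * n
            vec[start:start + half] = [x] * half
            vec[start + half:start + support] = [-y] * half
            basis.append(vec)
        support = half
    return basis


def w_dot(u, v, weights):
    return sum(c * a * b for c, a, b in zip(weights, u, v))


data = [2, 2, 2, 6, 3, 5, 4, 4]
workload = [0.001] * 4 + [0.249] * 4
basis = weighted_haar_basis(workload)
coeffs = [w_dot(data, vec, workload) for vec in basis]

# Standard thresholding on the weighted coefficients: keep the 4 largest.
order = sorted(range(len(coeffs)), key=lambda k: abs(coeffs[k]), reverse=True)
kept, dropped = order[:4], order[4:]
approx = [sum(coeffs[k] * basis[k][i] for k in kept) for i in range(len(data))]
errors = [d - a for d, a in zip(data, approx)]
wl2_error = w_dot(errors, errors, workload)
dropped_energy = sum(coeffs[k] ** 2 for k in dropped)
print(wl2_error, dropped_energy)  # equal up to rounding (weighted Parseval)
```

With this choice x times the left weight equals y times the right weight, so each function has zero weighted inner product with constants, and the weighted error of a synopsis equals the energy of its dropped coefficients. That is why plain largest-coefficient thresholding becomes optimal for the weighted error once the basis is adapted to the workload.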

Experimental results WB-MSE VS. STANDARD

Experimental results WB-MRE, ADAPTIVE, STANDARD

Experimental results WB-MRE, ADAPTIVE

Thank you!
