MauveDB: Model-based User Views
Problem • Databases are unusable for scientific data • Data are incomplete, imprecise, and erroneous • Need to be filtered/synthesized using models • Scientists use them in the most rudimentary ways • As a backing store for raw data • Run few or no queries • User-defined functions are inadequate • Static models, insufficient for many applications • Let’s discuss this later?
Approach • Define user-views based on a model syntax • Extend traditional SQL-view model • User views provide access to synthesized data • Data independence • Present stable view of system • When sites don’t report data (missing values) • When network changes • Report data at different locations than sampled • View maintenance • Issues of whether to materialize or not
Processing Scientific Data • Without model-based views • Export to MATLAB, then apply models • Use custom, programmatic querying tools • Can’t use SQL • Getting data back into the database is awkward and inefficient • With model-based views • Models update themselves as data change • Standard SQL queries against synthesized data
Example • Benefits • Network changes are transparent • Spatial or temporal biases removed (e.g., for aggregates) • What about model errors?
View Creation: Regression • Select a virtual grid on which data are reported • Using MATLAB-style syntax • Create a unique model at each time T
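The regression view can be illustrated with a minimal Python sketch. It is not MauveDB's implementation or syntax. It assumes readings arrive as hypothetical `(time, x, temp)` tuples, fits a separate straight line `temp = a + b*x` for each time T by ordinary least squares, and reports the fitted values on a caller-supplied virtual grid. The names `linear_fit` and `regression_view` are invented for illustration.

```python
from collections import defaultdict


def linear_fit(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        # All samples share one location: fall back to a constant model.
        return mean_y, 0.0
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def regression_view(readings, grid):
    """Fit one model per time T and report it on the virtual grid.

    readings: iterable of (time, x, temp) tuples (assumed layout).
    grid: locations at which the view reports values.
    Returns {time: [(grid_x, predicted_temp), ...]}.
    """
    by_time = defaultdict(list)
    for t, x, temp in readings:
        by_time[t].append((x, temp))

    view = {}
    for t, samples in by_time.items():
        xs = [x for x, _ in samples]
        ys = [y for _, y in samples]
        a, b = linear_fit(xs, ys)
        view[t] = [(g, a + b * g) for g in grid]
    return view
```

Note that the grid locations need not coincide with the sampled locations, which is what lets the view report data at positions where no sensor exists.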
View Creation: Interpolation • Interpolate missing values from nearby sites
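As a rough sketch of the interpolation idea, the Python function below fills in a missing reading from the nearest reporting sites on either side. It assumes sites lie along a single axis and are given as hypothetical `(position, value)` pairs, with `None` marking a site that did not report. Linear interpolation is only one possible choice of method, and `interpolate_missing` is an invented name, not part of MauveDB.

```python
def interpolate_missing(sites):
    """Fill missing values by linear interpolation between nearest neighbors.

    sites: list of (position, value) pairs; value is None when missing.
    A site with a reporting neighbor on only one side copies that neighbor.
    """
    known = [(p, v) for p, v in sites if v is not None]
    result = []
    for pos, val in sites:
        if val is not None:
            result.append((pos, val))
            continue
        # Nearest reporting site on each side, if any.
        left = max((k for k in known if k[0] < pos),
                   key=lambda k: k[0], default=None)
        right = min((k for k in known if k[0] > pos),
                    key=lambda k: k[0], default=None)
        if left is not None and right is not None:
            weight = (pos - left[0]) / (right[0] - left[0])
            estimate = left[1] + weight * (right[1] - left[1])
        elif left is not None:
            estimate = left[1]
        elif right is not None:
            estimate = right[1]
        else:
            estimate = None  # no site reported at all
        result.append((pos, estimate))
    return result
```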
The AS Clause • AS clause specifies each model • AS FIT • AS INTERPOLATE • Probably needs extended syntax for model methods • INTERPOLATE with splines, nearest neighbor, regression • User-views are only as flexible as the models pre-programmed into the syntax • How does this compare with UDFs, table-valued functions? • Is this the appropriate level for this kind of customization?
View Maintenance • Options • Logical: build results for each query • Materialized: pre-compute all results for each model • Partial/Cached: store results generated by queries • Model-based: often models have fixed costs • Building basis functions, matrix inversions, linear solutions • Tradeoff between query latency and overhead • Is implementing model logic at such a low level reasonable?
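The tradeoff among the first three maintenance options can be sketched in Python. This is an illustration under stated assumptions, not how MauveDB maintains views: the model is any callable from a grid point to a value, and the hypothetical `ModelView` class counts model evaluations so the cost of each strategy is visible.

```python
class ModelView:
    """Toy view over a model with three maintenance strategies.

    logical:      evaluate the model on every query.
    materialized: pre-compute every grid point when the view is created.
    cached:       evaluate on first query, then reuse the stored result.
    """

    STRATEGIES = ("logical", "materialized", "cached")

    def __init__(self, model, grid, strategy="logical"):
        if strategy not in self.STRATEGIES:
            raise ValueError(f"unknown strategy: {strategy}")
        self.model = model
        self.strategy = strategy
        self.evaluations = 0  # how many times the model has been run
        self._store = {}
        if strategy == "materialized":
            for point in grid:
                self._store[point] = self._evaluate(point)

    def _evaluate(self, point):
        self.evaluations += 1
        return self.model(point)

    def query(self, point):
        if self.strategy == "logical":
            return self._evaluate(point)
        if point not in self._store:
            # cached: first request for this point;
            # materialized: a point outside the pre-computed grid.
            self._store[point] = self._evaluate(point)
        return self._store[point]
```

Querying the same point twice runs the model twice under `logical`, once under `cached`, and not at all after construction under `materialized`, which pays for the whole grid up front. The sketch ignores invalidation when the underlying data change, which is where the overhead side of the tradeoff would appear.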
Outcomes/Opinions • Is MauveDB the technology that will make scientists use databases?