Explore the benefits of Adaptive Loading for optimizing query performance in flat file systems, avoiding upfront overheads of traditional databases. Learn to iterate data loading, tuning, and schema definition for cost-effective data management solutions.
Here are my Data Files. Here are my Queries. Where are my Results? Stratos Idreos* Ioannis Alagiannis‡ Ryan Johnson§ Anastasia Ailamaki‡ §University of Toronto ‡École Polytechnique Fédérale de Lausanne *CWI, Amsterdam
CERN ($20B physics experiment) • Last year: 35PB! • Experiments, simulation, user data… • All stored in flat files • Database only stores metadata • Custom solutions & scripts • Almost never a DBMS Why???
Why don't people use a DBMS? • Requirements analysis • Define a schema • Load the data • Iterate to convergence • Tune the system • Evolving requirements => no convergence
Data import & tuning • Massage Data • Load Tuples • DBMS owns the data now • Flat Files • Why wait? • Why a complete load? • Database • Which format? • Hire a DB expert? • Not worth the startup cost
Avoiding up-front overheads • Flat File • Flat files are an integral part of the system • Hot data • Query over flat files • Adaptive loads • Tuning in the background • DBMS actions driven by the workload
Adaptive loading • Flat File • Metadata • Column Load • Loaded Columns: a2 a3 • Partial Load • Full Load • Metadata • Loaded Parts: a2 a3 • Storage
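The column-load idea on this slide can be illustrated with a small sketch. It is not the authors' implementation: the class name `AdaptiveTable`, its methods, and the integer-only data are assumptions made for illustration. The table keeps metadata about which columns are already loaded and goes back to the flat file only when a query asks for a column that is missing.

```python
import csv
import io


class AdaptiveTable:
    """Hypothetical sketch: load columns from a flat file on first use."""

    def __init__(self, flat_file_text, column_names):
        self._text = flat_file_text      # stands in for the flat file on disk
        self._names = list(column_names)
        self._loaded = {}                # metadata: column name -> loaded values
        self.file_scans = 0              # how many times we returned to the file

    def loaded_columns(self):
        return sorted(self._loaded)

    def _load(self, missing):
        # One pass over the flat file. csv.reader still tokenizes each row,
        # but only the missing columns are converted and stored.
        self.file_scans += 1
        positions = [(name, self._names.index(name)) for name in missing]
        columns = {name: [] for name in missing}
        for row in csv.reader(io.StringIO(self._text)):
            for name, pos in positions:
                columns[name].append(int(row[pos]))
        self._loaded.update(columns)

    def columns(self, *needed):
        # Load only what the query needs and does not already have.
        missing = [n for n in needed if n not in self._loaded]
        if missing:
            self._load(missing)
        return [self._loaded[n] for n in needed]


data = "1,10,100\n2,20,200\n3,30,300\n"
table = AdaptiveTable(data, ["a1", "a2", "a3"])

a2, a3 = table.columns("a2", "a3")   # first query: loads a2 and a3
(a2_again,) = table.columns("a2")    # second query: served from loaded columns
print(table.loaded_columns(), table.file_scans)   # ['a2', 'a3'] 1
```

Column a1 is never read into storage here because no query touched it, which is the point of deferring the load to query time.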
Dynamic file adaptation • New Flat Files • a) Parse only needed columns • b) New flat file per attribute • Original Flat File • Analyze non-tokenized attributes
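Steps a) and b) might look roughly like the following sketch. The function name `adapt_file`, the output naming scheme (`a1.txt`, `a2.txt`, ...), and the comma delimiter are assumptions, not details from the slides. Each line is tokenized only up to the last needed attribute, and every needed attribute is written to its own new flat file; the rest of the line stays an untokenized string.

```python
import os
import tempfile


def adapt_file(path, needed, out_dir, delimiter=","):
    """Write one new flat file per needed attribute (0-based positions)."""
    last = max(needed)
    outputs = {
        i: open(os.path.join(out_dir, f"a{i + 1}.txt"), "w") for i in needed
    }
    try:
        with open(path) as src:
            for line in src:
                # maxsplit stops tokenizing after the last needed attribute;
                # whatever follows is kept as one unparsed tail.
                fields = line.rstrip("\n").split(delimiter, last + 1)
                for i in needed:
                    outputs[i].write(fields[i] + "\n")
    finally:
        for f in outputs.values():
            f.close()
    return {i: f.name for i, f in outputs.items()}


with tempfile.TemporaryDirectory() as tmp:
    original = os.path.join(tmp, "R.csv")
    with open(original, "w") as f:
        f.write("1,10,100,x\n2,20,200,y\n")
    new_files = adapt_file(original, needed=[0, 1], out_dir=tmp)
    with open(new_files[1]) as f:
        a2_values = f.read().split()

print(a2_values)   # ['10', '20']
```

A later query on the second attribute can then read the single-attribute file directly instead of re-tokenizing the original flat file.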
Adaptive loading in practice • Q1: Loading Cost + First Query • Constant performance for all queries • Q11: load from FF • Filtering on-the-fly • Q1: half the cost • On-the-fly load • Cache data • Query: select sum(a1), avg(a2) from R where a1<v1 and a2<v2 • Amortize loading cost over the query sequence
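As a rough illustration of filtering on the fly, the query on this slide can be evaluated in a single pass over the flat file, with no separate load step. The function name `query_flat_file`, the comma delimiter, and the assumption that a1 and a2 are the first two integer fields are all hypothetical choices for this sketch.

```python
import io


def query_flat_file(lines, v1, v2, delimiter=","):
    """select sum(a1), avg(a2) from R where a1 < v1 and a2 < v2, in one pass."""
    total_a1 = 0
    total_a2 = 0
    matches = 0
    for line in lines:
        fields = line.split(delimiter, 2)   # tokenize only a1 and a2
        a1 = int(fields[0])
        if a1 >= v1:
            continue                        # rejected early: a2 is never converted
        a2 = int(fields[1])
        if a2 >= v2:
            continue
        total_a1 += a1
        total_a2 += a2
        matches += 1
    if matches == 0:
        return None, None                   # mirrors SQL NULL on an empty input
    return total_a1, total_a2 / matches


flat_file = io.StringIO("1,10,100\n2,20,200\n3,30,300\n5,5,500\n")
print(query_flat_file(flat_file, v1=4, v2=25))   # (3, 15.0)
```

Only the first two rows satisfy both predicates, so the sum of a1 is 3 and the average of a2 is 15.0. Caching the converted values during this pass would let later queries in the sequence skip the file, which is how the loading cost gets amortized.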
Towards a fully autonomous system • Give me your queries • Give me your data as is • Get your results! • Adaptive Load • Adaptive Data Store • Adaptive Kernel • Invisible DBMS (supports SQL + your tools) • grep, awk • Challenge: make this invisible