Data Placement Problems in Database Applications An Zhu Stanford University
Data Placement • Data objects • Multiple disks • Assignment of objects to disks • Optimize performance • Optimize I/O • Handle dynamic situations AZ
Outline • Multimedia Systems [GKKTZ 00] • Maximize the total clients served • Relational Database Layout [AFMPZ 03] • Minimize the combined I/O access time • Load Rebalancing Problem [AMZ 03] • Minimize the makespan within allowed moves
Multimedia Storage Systems • Movie objects • Clients/subscribers • Parallel disks • Limited storage: # of movies, N_j • Limited bandwidth: # of clients, C_j • Homogeneous system: N_j = k, C_j = L, ∀j • Uniform ratio: C_j/N_j = r, ∀j
An Example • 3 disks, each storing up to 4 movies and serving up to 600 clients • 12 movies: ten with 100 clients each, two with 400 clients each • Total Storage: 12, Total Capacity: 1800
An Example • First disk: four 100-client movies, load 400/600 • Second disk: four 100-client movies, load 400/600 • Third disk: load 600/600, one 400-client movie left unassigned
Not All Clients Can be Satisfied • Loads: 400/600, 400/600, 600/600 • Total Satisfied Clients: 1400/1800 = 7/9
Sliding Window Algorithm • Consider one disk at a time • Maintain an ordered list of movies • Find the first window of k (or fewer) consecutive movies with at least L combined clients • Assign the first L clients in the window to the disk; return leftover clients to the list for reconsideration
An Example • Max window size k=4, L=600 • Ordered list: ten 100-client movies, then two 400-client movies • The window grows to 4 movies (400 clients, below L), then slides until it covers 100+100+100+400 = 700 clients
An Example • First disk takes 600 of those 700 clients: load 600/600 • The split 400-client movie leaves 100 clients in the list • Second disk repeats the step with the other 400-client movie: load 600/600
An Example • Third disk receives four 100-client movies: load 400/600 • Two 100-client movies remain • Total Satisfied Clients: 1600/1800 = 8/9
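The sliding window steps can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `sliding_window` and its parameters are hypothetical, movies are kept in ascending order of client count, and when no window reaches L the sketch simply takes the k largest movies (the slides do not spell out that case).

```python
from bisect import insort


def sliding_window(clients, num_disks, k, L):
    """Return the number of clients served by each disk.

    clients   -- positive client count per movie
    num_disks -- number of identical disks (M)
    k         -- movies each disk can store
    L         -- clients each disk can serve
    """
    movies = sorted(clients)  # ordered list, fewest clients first
    served = []
    for _ in range(num_disks):
        if not movies:
            served.append(0)
            continue
        # First window of up to k consecutive movies with >= L clients.
        last_start = max(0, len(movies) - k)
        start = next(
            (s for s in range(last_start + 1) if sum(movies[s:s + k]) >= L),
            last_start,  # assumption: no window reaches L, take the k largest
        )
        window = movies[start:start + k]
        del movies[start:start + k]
        load = 0
        for count in window:
            take = min(count, L - load)  # assign only the first L clients
            load += take
            if take < count:
                # Leftover clients go back into the list for later disks.
                insort(movies, count - take)
        served.append(load)
    return served


# The example from the slides: ten 100-client and two 400-client movies.
example = sliding_window([100] * 10 + [400] * 2, num_disks=3, k=4, L=600)
```

On the slide example this yields loads of `[600, 600, 400]`, that is 1600 of 1800 clients, matching the 8/9 figure above.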
Theoretical Bounds • Satisfies at least a 1 − 1/(1+√k)² fraction of total clients • In the worst case, no algorithm can satisfy more clients • Translates to a (1 − 1/(1+√k)²)-approximation • PTAS: (1+ε)-approximation, ∀ε>0
Proof Sketch • Load-saturated vs. storage-saturated disks: M_L, M_S • Load of the least loaded disk: cL • M_L + M_S = M, 0 < c < 1 • All remaining movies each have no more than cL/k clients • Initial instance is feasible (w.l.o.g.)
An Example • Loads: 600/600, 600/600, 400/600 • M_L = 2, M_S = 1, c = 400/600, cL/k = 100 • Two 100-client movies remain • Total Satisfied Clients: 1600/1800 = 8/9
Proof Outline • If there is a load-saturated disk with fewer than k movies • All clients are satisfied • Otherwise • At most M_L movies are left • Satisfy at least a 1 − 1/(1+√k)² fraction of the clients
Lemma • If any load-saturated disk has fewer than k movies • Any k−1 remaining movies in the list have L clients or more
Lemma • The remaining disks are all load saturated • So, all clients are satisfied
Otherwise… • Each disk has exactly k movies • Total assigned movies: M·k • Initial movies: N ≤ M·k • “New” movies generated: M_L • # of movies left: ≤ M_L • # of clients per remaining movie: ≤ cL/k • Total # of remaining clients: ≤ c·L·M_L/k
Otherwise… • Total clients: ≤ M·L • Assigned clients: ≥ M_L·L + M_S·cL • Total # of remaining clients: ≤ M_S·(1−c)L • Final bound: at least a 1 − 1/(1+√k)² fraction of the clients is satisfied
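The two bounds on the remaining clients can be combined as in the following worked check. It is a consistency sketch rather than the full proof: the symbols x (the fraction of load-saturated disks) and U (the unsatisfied fraction) are introduced here for illustration, and only the balancing point of the two bounds is evaluated.

```latex
\begin{align*}
x &= M_L / M, \qquad
U = \frac{\text{remaining clients}}{\text{total clients}} \\
U &\le \frac{c\,x/k}{x + c\,(1-x) + c\,x/k}
  && \text{(remaining} \le c\,L\,M_L/k\text{)} \\
U &\le (1-c)(1-x)
  && \text{(remaining} \le M_S\,(1-c)\,L\text{)} \\
\text{at } c = x = \tfrac{\sqrt{k}}{1+\sqrt{k}}:\quad
U &\le \frac{1}{(1+\sqrt{k})^{2}} \quad \text{from both bounds}
\end{align*}
```

For k = 4 this gives c = x = 2/3 and U ≤ 1/9, which agrees with the example where M_L = 2, M_S = 1, c = 400/600 and 8/9 of the clients are satisfied.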
Simulation Results • M=5, L=100, N=M·k • Zipf-distributed demand with parameter 0.0 • Plot omitted
Recap • The problem is NP-complete • PTAS: best possible approximation bound • 1 − 1/(1+√k)²: best possible absolute bound • Sliding window algorithm: practical, with O((M+N) log(M+N)) running time
Outline • Multimedia Systems [GKKTZ 00] • Maximize the total clients served • Relational Database Layout [AFMPZ 03] • Minimize the total I/O access time • Load Rebalancing Problem [AMZ 03] • Minimize the makespan within allowed moves
Relational Databases • Objects: indexes, tables, views • Multiple disks • Minimize the total I/O access time
Past Work • Full striping • Split uniformly across all available disks • Utilize I/O parallelism • Transfer rate: 0.05s/MB • Unstriped, 200MB on one disk: T_t = 10s • Striped, 50MB on each of 4 disks: T_t = 2.5s
Past Work • Co-accessed objects with random I/O • Seek time per block: 0.01s per 0.1MB block • Seek rate: 0.1s/MB • Smaller object dominates • A (50MB per disk) and B (100MB per disk) striped over 4 disks: T_s = 50·2·0.1 = 10s
Past Work • Combined access time • Transfer time: T_t = (50+100)·0.05 = 7.5s • Seek time: T_s = min(50,100)·2·0.1 = 10s • Combined time: T_t + T_s = 17.5s
Past Work • Full striping is no longer optimal [Agrawal, Chaudhuri, Das, Narasayya ’03] • A on two disks (100MB each), B on two other disks (200MB each): co-accessed objects no longer share a disk • Combined time: 200·0.05 = 10s
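The arithmetic on these slides can be reproduced with a small cost model. This is a sketch under assumptions: the helper names `disk_time` and `layout_time` are hypothetical, the factor 2 is read off the "50·2" term in the seek-time slide, and a layout's time is taken as the slowest disk because the disks work in parallel.

```python
TRANSFER_RATE = 0.05  # s/MB, from the slides
SEEK_RATE = 0.1       # s/MB, i.e. 0.01 s per 0.1 MB block
SEEK_FACTOR = 2       # assumption: the factor 2 in "T_s = 50·2·0.1"


def disk_time(mb_a, mb_b):
    """Combined access time on one disk holding mb_a of A and mb_b of B."""
    transfer = (mb_a + mb_b) * TRANSFER_RATE
    # Seeks only occur when both objects share the disk; the smaller dominates.
    seek = min(mb_a, mb_b) * SEEK_FACTOR * SEEK_RATE
    return transfer + seek


def layout_time(layout):
    """layout: one (MB of A, MB of B) pair per disk; disks run in parallel."""
    return max(disk_time(a, b) for a, b in layout)


# A is 200MB and B is 400MB in total, placed on four disks.
full_striping = [(50, 100)] * 4
separated = [(100, 0), (100, 0), (0, 200), (0, 200)]

striped_time = layout_time(full_striping)   # about 17.5 s
separated_time = layout_time(separated)     # about 10 s
```

Under this model, keeping the two co-accessed objects on disjoint disks removes the seek term entirely, which is why the separated layout beats full striping here.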
Data Layout Problem • Workload (SQL DML) • A set of queries and/or updates • A set of co-accessed objects (pairwise) • Access stats (pairwise) • Minimize the estimated I/O access time
Theoretical Questions • Approximation and its hardness • Transfer time: P • Seek time: Very Hard • Combined time • Hard • Minimizing transfer time alone is a “good” approximation
Transfer Time • Heterogeneous disks • Different transfer rate for each disk j • Storage constraint: c_j • Objects • Different size: s_i • Access frequency for each co-accessed pair (i, i’) • Solvable using Linear Programming (LP)
LP • Variable: amount of object i assigned to disk j • Each object must be completely assigned • Each disk’s storage limit is kept • Transfer time for (i,i’) on disk j • Overall transfer time for (i,i’) • Objective: minimize the total transfer time
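One way to write down an LP matching these six descriptions is sketched below. The formulas on the original slide did not survive, so the notation is an assumption: x_{ij} for the amount of object i on disk j, r_j for the transfer rate of disk j, f_{i,i'} for the access frequency of the pair (i,i'), and t, T for the per-disk and overall transfer times. Only s_i and c_j come from the preceding slide.

```latex
\begin{align*}
\text{minimize}\quad & \sum_{(i,i')} f_{i,i'}\, T_{i,i'} \\
\text{subject to}\quad
& \sum_{j} x_{ij} = s_i && \forall i \\
& \sum_{i} x_{ij} \le c_j && \forall j \\
& t_{i,i',j} = r_j\,(x_{ij} + x_{i'j}) && \forall (i,i'),\ \forall j \\
& T_{i,i'} \ge t_{i,i',j} && \forall (i,i'),\ \forall j \\
& x_{ij} \ge 0 && \forall i,\ \forall j
\end{align*}
```

Because the disks transfer in parallel, the overall time for a pair is the maximum over disks, which the last inequality family expresses linearly.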
Seek Time • Hard even on disks with no storage constraint • Integral assignment • Each object is assigned to one disk only • Conversion from a fractional assignment with no loss
Conversion • Objects A, B, C with pairwise co-access frequencies f: two pairs with f=1, one pair with f=0 • Total seek cost: 100·2 + 100·2 • Want: each file is spread uniformly across a subset of disks
Conversion • New cost after the first step: 100·2 + 125·2
Conversion • New cost after the next step: 100·2
Conversion • Total seek cost: 0 • Each file resides on only one disk
Implications • A polynomial time algorithm • Equivalent to Minimum Edge Deletion k-Partition • NP-hard to approximate: O(n²) • Forces combined time to be hard to approximate
Combined Time • Let • Hard to approximate: ·, 1>>0 • Optimize transfer time alone gives 1+ AZ
Outline • Multimedia Systems [GKKTZ 00] • Maximize the total clients served • Relational Database Layout [AFMPZ 03] • Minimize the combined I/O access time • Load Rebalancing Problem [AMZ 03] • Minimize the makespan within allowed moves
Load Rebalancing • Access pattern changes • Initial layout no longer balanced • Figure: items labeled 1 to 11 placed on disks, with the MAX LOAD marked