Storage Issues
Last Week
• Deduplication storage.
• Read performance is critical for reconstructing the original data stream.
Problem
• Assume that data streams are stored across different disks.
• After deduplication, every chunk is stored exactly once, so a stream's chunks may be scattered over several disks.
• Decide a chunk deployment such that the number of cross-disk accesses of every data stream during reconstruction is almost the same.
[Figure: three data streams A, B, C placed across disks]
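The slide does not define the per-stream cost Cs precisely; one plausible formalization is the number of times reconstruction of stream s must switch disks between consecutive chunks. A minimal sketch under that assumption:

```python
import statistics

# Hypothetical formalization (not spelled out on the slide):
# Cs = number of times reconstructing stream s must jump to a
# different disk between consecutive chunks.
def cross_disk_accesses(stream, placement):
    """stream: ordered list of chunk ids; placement: chunk id -> disk id."""
    disks = [placement[chunk] for chunk in stream]
    return sum(1 for a, b in zip(disks, disks[1:]) if a != b)

def deployment_stats(streams, placement):
    """Per-stream cross-disk counts, their mean, and population std."""
    counts = [cross_disk_accesses(s, placement) for s in streams]
    return counts, statistics.mean(counts), statistics.pstdev(counts)
```

For example, with `placement = {'x': 0, 'y': 1, 'z': 0}`, the stream `['x', 'y', 'z']` switches disks twice, while `['x', 'z']` never leaves disk 0.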
Examples
• Deployment 1: Ca = 0, Cb = 4, Cc = 2 → Avg. = 2, Std. = √(8/3).
• Deployment 2: Ca = 5, Cb = 0, Cc = 1 → Avg. = 2, Std. = √(14/3).
Examples (Cont.)
• Deployment 3: Ca = 3, Cb = 3, Cc = 0 → Avg. = 2, Std. = √(6/3).
• Deployment 4: Ca = 2, Cb = 3, Cc = 1 → Avg. = 2, Std. = √(2/3).
Optimal Solution
• Ca = Cb = Cc = 2 → Avg. = 2, Std. = 0.
Another Example
• Unbalanced deployment: Ca = k, Cb = k/2, Cc = 0 → Avg. = k/2, Std. = √(k²/6).
• Balanced deployment: Ca = Cb = Cc = k/2 → Avg. = k/2, Std. = 0.
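All the averages and standard deviations in these examples follow from one convention: Avg. is the mean of (Ca, Cb, Cc) and Std. is the population standard deviation (divide by n = 3). A quick script to verify the arithmetic; note that under this convention the (k, k/2, 0) deployment works out to Std. = √(k²/6):

```python
import math
import statistics

# Population std over the three per-stream counts, matching the slides.
assert math.isclose(statistics.pstdev([0, 4, 2]), math.sqrt(8 / 3))
assert math.isclose(statistics.pstdev([5, 0, 1]), math.sqrt(14 / 3))
assert math.isclose(statistics.pstdev([3, 3, 0]), math.sqrt(6 / 3))
assert math.isclose(statistics.pstdev([2, 3, 1]), math.sqrt(2 / 3))
assert statistics.pstdev([2, 2, 2]) == 0  # the optimal, perfectly balanced case

# The parameterized example: (k, k/2, 0) vs. the balanced (k/2, k/2, k/2).
k = 10  # arbitrary even k, chosen only for illustration
assert math.isclose(statistics.pstdev([k, k / 2, 0]), math.sqrt(k * k / 6))
assert statistics.pstdev([k / 2] * 3) == 0
```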
VM annotation
• Use annotations to guide a memory-deduplication tool such as KSM.
• Deduplicate more aggressively when VMs run the same OS kernel.
VM annotation (Cont.)
• Every VM is configured via an XML file in /etc/libvirt/qemu.
• We can insert annotations into a VM's configuration file:
• Export ('dump') the XML of the virtual machine you want to edit.
• Edit the XML.
• Import ('define') the edited XML.
The Problem Is…
• What should we annotate?
• OS / kernel version.
• Purpose / which application it runs.
• Computation-intensive or I/O-intensive.
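As a sketch of what such an annotation might look like, the snippet below injects a custom, namespaced block into a libvirt domain XML (the surrounding flow would be `virsh dumpxml` → edit → `virsh define`). The namespace URI and the field names `osKernel` / `workload` are invented for illustration; libvirt itself only requires that custom children of `<metadata>` carry their own namespace.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace for our annotation scheme (illustration only).
NS = "http://example.com/dedup-annotation/1.0"

def annotate(domain_xml: str, kernel: str, workload: str) -> str:
    """Insert a dedup-guidance annotation into a libvirt domain XML string."""
    root = ET.fromstring(domain_xml)
    meta = root.find("metadata")
    if meta is None:  # create <metadata> if the domain XML lacks one
        meta = ET.SubElement(root, "metadata")
    ann = ET.SubElement(meta, f"{{{NS}}}annotation")
    ET.SubElement(ann, f"{{{NS}}}osKernel").text = kernel
    ET.SubElement(ann, f"{{{NS}}}workload").text = workload
    return ET.tostring(root, encoding="unicode")

# Typical flow around this sketch (run outside Python):
#   virsh dumpxml vm1 > vm1.xml   ...annotate...   virsh define vm1.xml
```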
Paper Study
• Characterizing Datasets for Data Deduplication in Backup Applications
• Tradeoffs in Scalable Data Routing for Deduplication Clusters