Compressing Relations and Indexes. Jonathan Goldstein, Raghu Ramakrishnan, Uri Shaft. Department of Computer Sciences, University of Wisconsin-Madison. June 18, 1997.
Agenda • Introduction • Compressing a Relation • Compression Applied to Rectangle-Based Indexes • Performance Evaluation • Questions and Remarks
Introduction • Page-level compression • Performance study • Application to B-trees and R-trees • Multidimensional bulk-loading algorithm
Compressing a Relation • Frames of reference • Non-numeric attributes • File-level compression
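The frame-of-reference idea can be sketched as follows: for each attribute on a page, store the page minimum once, then store every value as its offset from that minimum using only as many bits as the largest offset needs. This is a minimal illustration under assumptions, not the authors' code: the page is a non-empty list of integer tuples, a Python int stands in for the page's bit buffer, and the names for_compress, for_get_tuple and for_decompress are hypothetical.

```python
from typing import List, Sequence, Tuple

def for_compress(page: Sequence[Tuple[int, ...]]) -> Tuple[List[int], List[int], int]:
    """Frame-of-reference encode a non-empty page of integer tuples.

    For attribute d, each value v is stored as the offset v - mins[d] in
    widths[d] bits. All offsets are concatenated into one Python int, which
    stands in for the page's bit buffer (an assumption of this sketch).
    """
    dims = len(page[0])
    mins = [min(t[d] for t in page) for d in range(dims)]
    # Bits for the largest offset in each column (0 bits for a constant column).
    widths = [(max(t[d] for t in page) - mins[d]).bit_length() for d in range(dims)]
    packed, shift = 0, 0
    for t in page:
        for d in range(dims):
            packed |= (t[d] - mins[d]) << shift
            shift += widths[d]
    return mins, widths, packed

def for_get_tuple(mins: List[int], widths: List[int], packed: int, i: int) -> Tuple[int, ...]:
    """Decode tuple i alone: every tuple occupies exactly sum(widths) bits."""
    shift = i * sum(widths)
    row = []
    for lo, w in zip(mins, widths):
        row.append(lo + ((packed >> shift) & ((1 << w) - 1)))
        shift += w
    return tuple(row)

def for_decompress(mins: List[int], widths: List[int], packed: int, n: int) -> List[Tuple[int, ...]]:
    """Decode all n tuples of the page."""
    return [for_get_tuple(mins, widths, packed, i) for i in range(n)]

page = [(1000, 7), (1003, 7), (1001, 9)]
mins, widths, packed = for_compress(page)
print(mins, widths)                            # [1000, 7] [2, 2]
print(for_get_tuple(mins, widths, packed, 1))  # (1003, 7)
```

Here each tuple shrinks to 4 bits plus a small per-page header. Because every tuple has the same fixed width, a single tuple can be decoded without touching the rest of the page, which is the property the "tuple-level decompression" item in the evaluation refers to.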
Lossy Compression • Point approximation in lossy compression
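One way to picture point approximation is to divide each attribute's page frame [lo, hi] into 2^k equal cells and store only the cell index, so the decompressed value is a small box that is guaranteed to contain the original point. This is a hedged sketch of that general idea, not necessarily the scheme used in the paper; k, quantize and cell_box are hypothetical names, and coordinates are assumed to be integers within the frame.

```python
from typing import List, Sequence, Tuple

def _ceil_div(a: int, b: int) -> int:
    return -(-a // b)

def quantize(point: Sequence[int], lo: Sequence[int], hi: Sequence[int], k: int) -> List[int]:
    """Replace each coordinate by the index of one of 2**k equal cells
    covering the page frame [lo[d], hi[d]] (lossy: k bits per coordinate)."""
    cells = []
    for v, a, b in zip(point, lo, hi):
        span = b - a + 1
        cells.append(((v - a) << k) // span)
    return cells

def cell_box(cells: Sequence[int], lo: Sequence[int], hi: Sequence[int], k: int) -> List[Tuple[int, int]]:
    """Inclusive integer range [first, last] per dimension that a cell stands for.
    The original point always lies inside, so the box is a conservative stand-in."""
    box = []
    for c, a, b in zip(cells, lo, hi):
        span = b - a + 1
        first = a + _ceil_div(c * span, 1 << k)
        last = a + _ceil_div((c + 1) * span, 1 << k) - 1
        box.append((first, last))
    return box

lo, hi = (0, 0), (999, 99)
cells = quantize((437, 50), lo, hi, 3)  # [3, 4]: 6 bits instead of 10 + 7
print(cell_box(cells, lo, hi, 3))       # [(375, 499), (50, 62)]
```

Since the box always contains the true point, a search that tests boxes cannot miss a qualifying point; it can only report extra candidates, which is the usual price of a lossy approximation.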
Compressing an Indexing Structure • Compressing a B-tree • Compressing a rectangle-based indexing structure • Compression-oriented bulk loading
Bulk-Loading Algorithm • Input: a set of points in an n-dimensional space. • Output: a partition of the input into subsets. • Requirement: the partition should place points that are close to each other in the same subset as much as possible.
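The input/output contract above can be met by a simple recursive split: cut the point set at the median of its most spread-out dimension until every subset fits in a page. This is a generic illustration of the contract only; it is not GB-Pack and not claimed to be the paper's algorithm, and partition and capacity are hypothetical names.

```python
from typing import List, Sequence

Point = Sequence[float]

def partition(points: List[Point], capacity: int) -> List[List[Point]]:
    """Split `points` into groups of at most `capacity` points that lie close
    together, by cutting the most spread-out dimension at its median."""
    if capacity < 1:
        raise ValueError("capacity must be at least 1")
    if len(points) <= capacity:
        return [points]
    dims = len(points[0])
    # Dimension along which the current points have the largest extent.
    widest = max(range(dims),
                 key=lambda d: max(p[d] for p in points) - min(p[d] for p in points))
    ordered = sorted(points, key=lambda p: p[widest])
    mid = len(ordered) // 2
    return partition(ordered[:mid], capacity) + partition(ordered[mid:], capacity)

grid = [(x, y) for x in range(8) for y in range(4)]
groups = partition(grid, 4)  # eight groups, each a 2 x 2 block of the grid
```

Every group here holds a fixed number of points; the next slide's point is that a compression-oriented loader can instead let that number depend on the data.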
GB-Pack: Compression-Oriented Bulk Loading • Qualities: • Trades off some tree quality for increased compression. • The number of entries per page is data-dependent. • Cuts a dimension at a value boundary in the data.
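Two of these qualities can be illustrated with a toy greedy packer: it keeps adding sorted points to a page while the page's frame-of-reference size still fits, and then moves the cut back so it lands between two distinct values of the sort dimension. This is a hypothetical sketch, not the paper's GB-Pack. The 64-bit per-dimension header, the single sort dimension, and the names compressed_bits and greedy_pack are all assumptions, and the sketch says nothing about the tree-quality trade-off.

```python
from typing import List, Sequence, Tuple

Point = Tuple[int, ...]

def compressed_bits(points: Sequence[Point], header_bits_per_dim: int = 64) -> int:
    """Size of a page under frame-of-reference coding: an assumed fixed header
    per dimension plus, per tuple, the bits for every column's largest offset."""
    dims = len(points[0])
    per_tuple = sum((max(p[d] for p in points) - min(p[d] for p in points)).bit_length()
                    for d in range(dims))
    return dims * header_bits_per_dim + len(points) * per_tuple

def greedy_pack(points: Sequence[Point], page_bits: int) -> List[List[Point]]:
    """Hypothetical compression-aware packer (not the paper's GB-Pack)."""
    pts = sorted(points)  # tuples sort lexicographically, i.e. by dimension 0 first
    pages: List[List[Point]] = []
    start = 0
    while start < len(pts):
        end = start + 1  # always place at least one point
        # Grow the page while its compressed size still fits; tightly clustered
        # points need fewer bits, so the number of entries per page varies.
        while end < len(pts) and compressed_bits(pts[start:end + 1]) <= page_bits:
            end += 1
        # Move the cut back so it falls between two distinct dimension-0 values.
        cut = end
        while start + 1 < cut < len(pts) and pts[cut][0] == pts[cut - 1][0]:
            cut -= 1
        if cut < len(pts) and pts[cut][0] == pts[cut - 1][0]:
            cut = end  # no value boundary on this page: keep the size-based cut
        pages.append(pts[start:cut])
        start = cut
    return pages

pts = [(1, 0), (1, 1), (1, 2), (2, 0), (2, 1), (9, 0)]
print(greedy_pack(pts, page_bits=141))
# [[(1, 0), (1, 1), (1, 2)], [(2, 0), (2, 1), (9, 0)]]
```

Without the back-up step, the first page would also take (2, 0), since four points still fit in 140 bits, and the run of 2s would be split across two pages.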
Performance Evaluation • Relational compression experiments • CPU vs. I/O costs • Comparison with techniques in commercial systems • Importance of tuple-level decompression • R-tree compression experiments
Synthetic Data Sets • Size: the number of tuples in the relation. • Dimensionality: the number of attributes of the relation. • Range: the range of values for the attributes. • Distribution: uniform (worst case) or exponential. • Partition strategy. • Page size.
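The size, dimensionality, range and distribution parameters can be read as inputs to a small generator. This sketch rests on assumptions the slide does not state: values are integers in [0, range), the exponential case uses a mean of range/10, and larger draws are clamped to the top of the range. synthetic_relation is a hypothetical name; partition strategy and page size belong to the loading step and are not modeled here.

```python
import random
from typing import List, Tuple

def synthetic_relation(size: int, dims: int, value_range: int,
                       distribution: str = "uniform", seed: int = 0) -> List[Tuple[int, ...]]:
    """Generate `size` tuples of `dims` integer attributes in [0, value_range)."""
    rng = random.Random(seed)  # seeded so experiments are repeatable

    def draw() -> int:
        if distribution == "uniform":
            return rng.randrange(value_range)
        if distribution == "exponential":
            # Assumed mean of value_range / 10; draws beyond the range are clamped.
            return min(int(rng.expovariate(10.0 / value_range)), value_range - 1)
        raise ValueError(f"unknown distribution: {distribution}")

    return [tuple(draw() for _ in range(dims)) for _ in range(size)]

skewed = synthetic_relation(size=1000, dims=3, value_range=1024, distribution="exponential")
```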
Sales Data Set [Figure: Sales data set. Compression achieved versus dimensionality.]
R-tree Compression Experiments • Testing the quality of R-trees on the Sales data set.