90 likes | 466 Views
Cube Tree. Dimension: number of group-by values Relation tuples map to a point in the space Aggregates: projection of all data points on all the subspaces. Intersection between a subspace and the orthogonal hyper-plane stores the aggregates. Origin represents aggregate with no grouping
E N D
Cube Tree • Dimension: number of group-by values • Relation tuples map to a point in the space • Aggregates: projection of all data points on all the subspaces. • Intersection between a subspace and the orthogonal hyper-plane stores the aggregates. • Origin represents aggregate with no grouping • Query a group-by aggregate on the corresponding hyper-planes
Packed R-Tree • Sort-pack: (for multi-dimension data) • Achieves excellent clustering • Significantly reduces the overlap and dead space • A preferred structure for Datcubes storage • Representation of Datacube only provide good clustering for half of the total group-bys • Degradation due to strong interleaving between points of these group-bys.
Dataless & Reduced Cubetree • Dataless Cubtree: Only contains aggregate values but no data values • Better clustering than a full tree in a R-Tree • Projection points are not interleaved • Reduced Cubetree: Each hyper-plane which containing aggregates will form a R-Tree independently • The dimension of R-Tree reduced by one. • Better clustering and query performance
Allocating of goupbys to R-Trees • A set of group-bys are compatible if there exist a sort order that guarantees no dispersion • Allocate a group-by to one of the N R-Trees • the set of group-bys for this R-Tree is compatible • if a group-by cannot find a compatible set • assign it to a set that contain all of its gorup-by attributes. (false allocation) • Selection of sort order for Packed R-Tree is also an import parameter for favoring some prefered group-bys
Iceberg Cube • Selectively compute only those partitions that satisfy an aggregate condition • Aggregate with low support reveal little meaning & make the cube sparse • Conditions like • Minimum support of a partition • Required Range
Bottom-Up Cube Parent to compu the child
Bottom-Up Cube (2) • Starting from a bottom single dimension groupby • If current inputs can be pruned return • Partition the data in this group-by • If a partition is greater than the minsup • recursive call on BUC with the partition as inputs • Loop until all dimensions is done
Bottom-Up Cube (3) • Similar idea of Apriori-gen • Apriori will generate all the candidates at the same level first (breadth first) • BUC is in depth first manner. • To reduce memory requirement • Dimension ordering: provide better pruning • Cardinality, Skew & Correlation