This study explores efficient methods for processing spatial OLAP queries in spatial data warehouses. It highlights the use of aggregate R-trees (aR-trees) and aggregate 3DR-trees (a3DR-trees), which store and retrieve aggregated spatial data without predefined hierarchies, to optimize query performance. The research also examines the benefits of storing spatiotemporal aggregate information for applications such as traffic management systems. Experimental evaluations compare processing techniques for single-window and multiple-window queries, showcasing the advantages of aR-trees and a3DR-trees over traditional methods. The conclusions suggest that spatio-temporal OLAP is a promising area for further research, emphasizing the integration of spatial and temporal dimensions for more efficient queries.
Efficient OLAP Operations in Spatial Data Warehouses
Dimitris Papadias, Panos Kalnis, Jun Zhang and Yufei Tao
Department of Computer Science, Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong
Motivating Scenario
The spatial dimension at the finest granularity consists of a set of regions (e.g., road segments in traffic supervision systems, or areas covered by cells in mobile communication systems).
The raw data provide the set of objects that fall in each region at every timestamp (e.g., cars in a road segment, users serviced by a cell).
Queries ask for aggregate data over regions that satisfy some spatio-temporal condition (e.g., find the current traffic in all areas within a 1 km range of each hospital).
Unlike traditional OLAP, there are no predefined hierarchies.
The aggregate R-tree (aR-tree)
An R-tree that stores aggregate data with every entry, summarizing all the regions in the entry's subtree. The same idea can be applied to other access methods (e.g., quadtrees). Other aggregate functions may also be used (e.g., avg, max).
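A minimal sketch of the structure in Python, assuming 2-D rectangles as MBRs and a tree built by hand (R-tree insertion and node splitting are omitted). The names `Agg`, `Entry`, `Node`, `leaf_entry` and `parent_entry` are hypothetical. Each entry keeps a sum, a count and a maximum, so avg is derived as sum/count instead of being stored, since averages of children cannot simply be averaged again.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


@dataclass
class Agg:
    """Aggregate summary stored with an entry; sum, count and max also yield avg."""
    total: float = 0.0
    count: int = 0
    maximum: float = float("-inf")

    def merge(self, other: "Agg") -> "Agg":
        return Agg(self.total + other.total,
                   self.count + other.count,
                   max(self.maximum, other.maximum))

    @property
    def avg(self) -> float:
        return self.total / self.count if self.count else 0.0


@dataclass
class Entry:
    mbr: Rect
    agg: Agg
    child: Optional["Node"] = None  # None for a leaf entry (one finest-granularity region)


@dataclass
class Node:
    entries: List[Entry] = field(default_factory=list)


def leaf_entry(mbr: Rect, value: float) -> Entry:
    """A region (e.g., a road segment) with its measure (e.g., the number of cars on it)."""
    return Entry(mbr, Agg(value, 1, value))


def parent_entry(child: Node) -> Entry:
    """Intermediate entry: MBR enclosing the child node plus the merged aggregates below it."""
    agg = Agg()
    for e in child.entries:
        agg = agg.merge(e.agg)
    mbrs = [e.mbr for e in child.entries]
    mbr = (min(r[0] for r in mbrs), min(r[1] for r in mbrs),
           max(r[2] for r in mbrs), max(r[3] for r in mbrs))
    return Entry(mbr, agg, child)


# Hypothetical road segments with car counts, grouped by hand into two leaf nodes.
n1 = Node([leaf_entry((0, 0, 1, 1), 10), leaf_entry((1, 0, 2, 1), 4)])
n2 = Node([leaf_entry((5, 5, 6, 6), 7)])
root_entry = parent_entry(Node([parent_entry(n1), parent_entry(n2)]))
print(root_entry.agg.total, root_entry.agg.count, root_entry.agg.avg)
```

The aggregate at the top summarizes all three segments (21 cars, 3 segments, average 7), without visiting any leaf at query time.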
Why keep spatiotemporal aggregate information
• For efficient query processing (e.g., the number of objects inside an area can be found by a window query instead of a spatial join).
• Aggregate information is all that some applications need or know (e.g., traffic systems record the number of cars in an area, not their ids).
• Storing historical information about individual objects may raise privacy issues (keeping all locations of mobile phone users throughout history may be illegal).
• Although the actual data may be highly volatile and have extreme space requirements, the summarized data are less voluminous and may remain fairly constant for long intervals.
aR-trees and OLAP operations
The aR-tree corresponds to a lattice. There may be multiple dimensions.
Query Processing - Single Window
"Find the total number of cars on all road segments inside a query window"
• Start from the root of the aR-tree; for each entry, one of the following three conditions holds:
  · The entry is disjoint from the query window; thus, the corresponding node cannot contain any cars contributing to the answer and is not retrieved.
  · The entry is inside the query window, in which case all the required aggregate information is stored with the entry and the corresponding node does not need to be accessed.
  · The entry partially overlaps the query window, in which case the corresponding node must be recursively followed.
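The three cases can be sketched as a short recursive function. This is a minimal Python sketch under stated assumptions: each segment is approximated by its MBR, a segment counts only if that MBR lies entirely inside the window (one reading of "inside a query window"), and the names `Entry`, `Node`, `window_count` and the `stats` counter of node accesses are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


@dataclass
class Entry:
    mbr: Rect
    count: int                      # aggregate: number of cars in the subtree (or on the segment)
    child: Optional["Node"] = None  # None marks a leaf entry (one road segment)


@dataclass
class Node:
    entries: List[Entry] = field(default_factory=list)


def disjoint(a: Rect, b: Rect) -> bool:
    return a[2] < b[0] or b[2] < a[0] or a[3] < b[1] or b[3] < a[1]


def inside(a: Rect, b: Rect) -> bool:
    """True if rectangle a lies completely within rectangle b."""
    return b[0] <= a[0] and b[1] <= a[1] and a[2] <= b[2] and a[3] <= b[3]


def window_count(node: Node, q: Rect, stats: dict) -> int:
    stats["accesses"] += 1  # this node had to be retrieved
    total = 0
    for e in node.entries:
        if disjoint(e.mbr, q):
            continue                                   # case 1: prune, child never retrieved
        if inside(e.mbr, q):
            total += e.count                           # case 2: stored aggregate, no descent
        elif e.child is not None:
            total += window_count(e.child, q, stats)   # case 3: partial overlap, recurse
        # a leaf segment that only partially overlaps q is not inside q, so it is skipped
    return total


# Hypothetical tree: two leaf nodes of road segments with their car counts.
leaf1 = Node([Entry((0, 0, 1, 1), 10), Entry((1, 0, 2, 1), 4)])
leaf2 = Node([Entry((5, 5, 6, 6), 7), Entry((6, 5, 7, 6), 3)])
root = Node([Entry((0, 0, 2, 1), 14, leaf1), Entry((5, 5, 7, 6), 10, leaf2)])

stats = {"accesses": 0}
print(window_count(root, (0, 0, 6.5, 6), stats), stats["accesses"])  # prints: 21 2
```

Here the first root entry is inside the window, so its 14 cars are taken from the root and `leaf1` is never read; only `leaf2` is visited because its entry partially overlaps.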
Query Processing - Multiple Windows
"Find the total number of cars on road segments inside each city suburb"
Without aR-trees, the query can be processed as a multiway spatial join (suburbs, cars, road segments). With aR-trees, it is processed as a pairwise join (suburbs, aR-tree). If the query windows (i.e., suburbs) fit in memory, we propose an extension of the single-window technique that considers all windows in parallel.
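One way to consider all in-memory windows in parallel is to traverse the aR-tree once, carrying along only the windows that partially overlap the current node. The Python sketch below assumes that strategy; it is an illustration, not necessarily the authors' exact algorithm, and `multi_window_count`, `Entry` and `Node` are hypothetical names. Segments are again approximated by MBRs and count only when fully inside a window.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


@dataclass
class Entry:
    mbr: Rect
    count: int                      # aggregate number of cars below (or on) this entry
    child: Optional["Node"] = None  # None marks a leaf entry (one road segment)


@dataclass
class Node:
    entries: List[Entry] = field(default_factory=list)


def disjoint(a: Rect, b: Rect) -> bool:
    return a[2] < b[0] or b[2] < a[0] or a[3] < b[1] or b[3] < a[1]


def inside(a: Rect, b: Rect) -> bool:
    return b[0] <= a[0] and b[1] <= a[1] and a[2] <= b[2] and a[3] <= b[3]


def multi_window_count(node: Node, windows: List[Rect], active: List[int],
                       results: List[int], stats: dict) -> None:
    """active holds the indices of windows that partially overlap this node."""
    stats["accesses"] += 1
    for e in node.entries:
        descend = []  # windows for which this entry's child must be visited
        for i in active:
            q = windows[i]
            if disjoint(e.mbr, q):
                continue
            if inside(e.mbr, q):
                results[i] += e.count     # answered from the stored aggregate
            elif e.child is not None:
                descend.append(i)
        if descend:                       # child is read once for all these windows
            multi_window_count(e.child, windows, descend, results, stats)


leaf1 = Node([Entry((0, 0, 1, 1), 10), Entry((1, 0, 2, 1), 4)])
leaf2 = Node([Entry((5, 5, 6, 6), 7), Entry((6, 5, 7, 6), 3)])
root = Node([Entry((0, 0, 2, 1), 14, leaf1), Entry((5, 5, 7, 6), 10, leaf2)])

suburbs = [(0, 0, 6.5, 6), (4, 4, 8, 8), (0, 0, 1, 1)]  # hypothetical query windows
results = [0] * len(suburbs)
stats = {"accesses": 0}
multi_window_count(root, suburbs, list(range(len(suburbs))), results, stats)
print(results, stats["accesses"])  # prints: [21, 10, 10] 3
```

In this hypothetical tree the shared traversal reads 3 nodes, whereas running the three windows one by one with the single-window procedure would read 5.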
Experimental Settings
TIGER dataset (130,000 road segments).
We randomly selected 5,000 seed points located on roads. For each seed point, we generated a cluster of 250 points (i.e., car positions) with a Gaussian distribution; therefore, the total number of cars was 5,000 × 250 = 1.25M.
The distribution of the queries follows the distribution of the roads.
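The data generation described above can be sketched as follows. This Python sketch rests on assumptions not given in the slides: seed points are placed by choosing a segment uniformly at random and then a uniform position along it, the Gaussian standard deviation `sigma` and the query side length are made-up parameters, and a three-segment list stands in for the TIGER roads. All function names are hypothetical.

```python
import random
from typing import List, Tuple

Point = Tuple[float, float]
Segment = Tuple[Point, Point]
Rect = Tuple[float, float, float, float]


def point_on_road(segments: List[Segment], rng: random.Random) -> Point:
    """Pick a segment uniformly at random (an assumption) and a uniform point along it."""
    (x1, y1), (x2, y2) = rng.choice(segments)
    t = rng.random()
    return (x1 + t * (x2 - x1), y1 + t * (y2 - y1))


def generate_cars(segments: List[Segment], n_seeds: int = 5000, cluster_size: int = 250,
                  sigma: float = 0.01, seed: int = 0) -> List[Point]:
    """Gaussian clusters of car positions around seed points located on roads."""
    rng = random.Random(seed)
    cars: List[Point] = []
    for _ in range(n_seeds):
        sx, sy = point_on_road(segments, rng)
        cars.extend((rng.gauss(sx, sigma), rng.gauss(sy, sigma)) for _ in range(cluster_size))
    return cars


def generate_queries(segments: List[Segment], n_queries: int, side: float,
                     seed: int = 1) -> List[Rect]:
    """Square windows centered on road points, so queries follow the road distribution."""
    rng = random.Random(seed)
    half = side / 2
    windows: List[Rect] = []
    for _ in range(n_queries):
        cx, cy = point_on_road(segments, rng)
        windows.append((cx - half, cy - half, cx + half, cy + half))
    return windows


# Hypothetical stand-in for the TIGER road segments, scaled down to 50 seeds.
roads = [((0.0, 0.0), (1.0, 0.0)), ((1.0, 0.0), (1.0, 1.0)), ((0.0, 0.0), (0.5, 0.8))]
cars = generate_cars(roads, n_seeds=50, cluster_size=250)
queries = generate_queries(roads, n_queries=10, side=0.1)
print(len(cars), len(queries))
```

With the default `n_seeds=5000` and `cluster_size=250`, `generate_cars` returns the 1.25M positions stated above.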
Evaluation for Single-Window Queries
• Raw data approach: join the cars and streets datasets.
• Fact table approach: an R-tree indexes the fact table (i.e., similar to aR-trees, but with no aggregate information in the intermediate nodes).
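The difference between the aR-tree and the fact-table approach can be illustrated by counting node accesses on the same tree. In this hypothetical Python sketch one tree is reused for brevity: `fact_table_count` never reads the intermediate counts, which mimics an R-tree with no aggregates above the leaves, so it must descend to every leaf that may hold qualifying records. Segments are approximated by MBRs and count only when fully inside the window; all names are assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


@dataclass
class Entry:
    mbr: Rect
    count: int = 0                  # leaf: cars on the segment; intermediate: aR-tree aggregate
    child: Optional["Node"] = None  # None marks a leaf entry (a fact-table record)


@dataclass
class Node:
    entries: List[Entry] = field(default_factory=list)


def disjoint(a: Rect, b: Rect) -> bool:
    return a[2] < b[0] or b[2] < a[0] or a[3] < b[1] or b[3] < a[1]


def inside(a: Rect, b: Rect) -> bool:
    return b[0] <= a[0] and b[1] <= a[1] and a[2] <= b[2] and a[3] <= b[3]


def ar_tree_count(node: Node, q: Rect, stats: dict) -> int:
    stats["ar"] += 1
    total = 0
    for e in node.entries:
        if disjoint(e.mbr, q):
            continue
        if inside(e.mbr, q):
            total += e.count                          # aggregate read directly from the entry
        elif e.child is not None:
            total += ar_tree_count(e.child, q, stats)
    return total


def fact_table_count(node: Node, q: Rect, stats: dict) -> int:
    stats["fact"] += 1
    total = 0
    for e in node.entries:
        if disjoint(e.mbr, q):
            continue
        if e.child is None:
            if inside(e.mbr, q):
                total += e.count                      # only leaf records carry the measure
        else:
            total += fact_table_count(e.child, q, stats)  # no aggregate here: must descend
    return total


leaf1 = Node([Entry((0, 0, 1, 1), 10), Entry((1, 0, 2, 1), 4)])
leaf2 = Node([Entry((5, 5, 6, 6), 7), Entry((6, 5, 7, 6), 3)])
root = Node([Entry((0, 0, 2, 1), 14, leaf1), Entry((5, 5, 7, 6), 10, leaf2)])

stats = {"ar": 0, "fact": 0}
q = (0, 0, 10, 10)
print(ar_tree_count(root, q, stats), fact_table_count(root, q, stats), stats)
```

Both return 24 cars, but for this large window the aR-tree answers from the root alone while the fact-table version reads all three nodes.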
Evaluation for Multiple-Window Queries
• aR-tree (single queries): a set of single-window queries, processed using the single_aggregation algorithm of aR-trees.
• Fact table (join): a join between the R-tree index of the fact table and the query windows, which fit in memory.
• Fact table (single): indexed nested loops using the R-tree index of the fact table.
Applications to spatio-temporal data
Query: "find the total number of objects in the regions intersecting some window qs during a time interval qt"
The aggregate 3DR-tree (a3DR-tree)
Each entry has the form <r.MBR, r.pointer, r.lifespan, r.aggr[]>; that is, for each region it keeps the aggregate value and the time interval during which this value is valid. Whenever the aggregate information about a region changes, a new entry is created.
• Advantage: the a3DR-tree integrates the spatial and temporal dimensions in the same structure (and is, therefore, expected to be more efficient than column scanning for queries that involve both conditions).
• Disadvantage: it wastes space by storing the MBR again each time the aggregate changes.
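The versioning rule can be sketched by turning a region's per-timestamp aggregates into a3DR-tree leaf entries. In this Python sketch the timestamps are assumed to be consecutive integers starting at 0, lifespans are inclusive [start, end] intervals, and `A3DEntry`, `build_versions` and the region id are hypothetical names. Each entry's MBR combined with its lifespan forms the 3-D box that a 3D R-tree would index; the tree itself is not shown.

```python
from dataclasses import dataclass
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


@dataclass(frozen=True)
class A3DEntry:
    """One version of a region, following <MBR, pointer, lifespan, aggr[]>."""
    mbr: Rect
    pointer: str                 # hypothetical region identifier
    lifespan: Tuple[int, int]    # inclusive [start, end] timestamps
    aggr: Tuple[int, ...]        # aggregate value(s) valid during the lifespan


def build_versions(region_id: str, mbr: Rect, counts: List[int]) -> List[A3DEntry]:
    """counts[t] is the region's aggregate at timestamp t; a new entry starts at each change."""
    entries: List[A3DEntry] = []
    start = 0
    for t in range(1, len(counts) + 1):
        if t == len(counts) or counts[t] != counts[start]:
            entries.append(A3DEntry(mbr, region_id, (start, t - 1), (counts[start],)))
            start = t
    return entries


versions = build_versions("segment-A", (0, 0, 1, 1), [5, 5, 7, 7, 7, 2])
for v in versions:
    print(v.lifespan, v.aggr)
```

Six timestamps collapse into three entries because the value stays constant within each lifespan, yet the same MBR is copied into every entry, which is the space overhead listed as the disadvantage.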
Query Example
Find all objects in the regions overlapping the query window qs during the time interval [1, 3].
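A sketch of the spatio-temporal aggregate query over a3DR-tree leaf entries, in Python. It assumes that "total number of objects during qt" means the sum of the per-timestamp counts over the timestamps in qt (an object present at three timestamps is counted three times); counting distinct objects would need more than the stored aggregates. For clarity it scans a flat list of hypothetical leaf entries, whereas an actual a3DR-tree would prune subtrees whose 3-D boxes miss qs × qt. `A3DEntry`, `intersects` and `st_total` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


@dataclass(frozen=True)
class A3DEntry:
    mbr: Rect
    pointer: str
    lifespan: Tuple[int, int]   # inclusive [start, end] timestamps
    aggr: Tuple[int, ...]       # aggr[0]: number of objects in the region during the lifespan


def intersects(a: Rect, b: Rect) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def st_total(entries: List[A3DEntry], qs: Rect, qt: Tuple[int, int]) -> int:
    """Sum of per-timestamp counts of regions intersecting qs, over timestamps in qt."""
    q0, q1 = qt
    total = 0
    for e in entries:
        if not intersects(e.mbr, qs):
            continue
        s, f = e.lifespan
        overlap = min(f, q1) - max(s, q0) + 1   # timestamps shared by the lifespan and qt
        if overlap > 0:
            total += e.aggr[0] * overlap
    return total


entries = [
    A3DEntry((0, 0, 1, 1), "A", (0, 1), (5,)),
    A3DEntry((0, 0, 1, 1), "A", (2, 4), (7,)),
    A3DEntry((0, 0, 1, 1), "A", (5, 5), (2,)),
    A3DEntry((10, 10, 11, 11), "B", (0, 5), (3,)),
]
print(st_total(entries, qs=(0, 0, 2, 2), qt=(1, 3)))  # prints: 19
```

Region A contributes 5 at timestamp 1 and 7 at timestamps 2 and 3 (5 + 7 + 7 = 19); region B lies outside qs, and A's last version falls outside [1, 3].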
Conclusions and directions for future work
• Spatio-temporal OLAP is a very promising direction of work.
• Incorporation of multi-version structures for dynamic dimensions.
• Formalization and analysis of when aggregation multi-trees are preferable.