R++-tree: an efficient spatial access method for highly redundant point data

Martin Šumák, Peter Gurský
University of P. J. Šafárik in Košice

Research motivation
• Besides kNN and range queries, an R-tree-like index can also be used to compute top-k queries (find the best k objects according to user preferences)
• h(x1, x2) = f1(x1) + f2(x2)
Martin Šumák, Peter Gurský at ADBIS 2013
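To make the top-k query concrete, here is a minimal brute-force sketch of ranking objects by an additive function h(x1, x2) = f1(x1) + f2(x2). It scans all points rather than using any index, so it only illustrates what the query returns. The flats, the preference functions `f_price` and `f_area`, and the convention that a higher score is better are all assumptions made for illustration.

```python
import heapq

def top_k(points, k, prefs):
    """Return the k points with the highest h(x) = f1(x1) + f2(x2) + ..."""
    def h(p):
        # Sum of per-attribute preference scores.
        return sum(f(x) for f, x in zip(prefs, p))
    return heapq.nlargest(k, points, key=h)

# Hypothetical flats given as (price, area).
flats = [(100_000, 50), (80_000, 40), (120_000, 90), (60_000, 30)]

def f_price(price):
    return 1.0 - price / 120_000   # assumed preference: cheaper is better

def f_area(area):
    return area / 90               # assumed preference: larger is better

best = top_k(flats, 2, [f_price, f_area])
print(best)
```

An index-based top-k algorithm would aim to return the same objects while visiting only part of the data.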

Why highly redundant point data
• Our data consists of flats with the following attributes:
  • price
  • area
  • floor
  • max floor of building
  • year of approbation
  • number of rooms
• Each flat is represented by a point in 6-dimensional space

R+-tree fundamentals
• The R+-tree is an R-tree-like index with the following special properties:
  • zero overlap between nodes at the same level
  • the rectangles of the child nodes cover the whole of the parent's rectangle
  • suitable for point data and point queries
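The two structural properties (no overlap between siblings, children covering the whole parent rectangle) can be checked mechanically. The sketch below is one possible check, not part of the presented method: rectangles are assumed to be tuples of (lo, hi) intervals, one per dimension, and all function names are hypothetical. For non-overlapping boxes inside the parent, full coverage is equivalent to the child volumes summing to the parent volume.

```python
from itertools import combinations
from math import isclose, prod

def volume(r):
    return prod(hi - lo for lo, hi in r)

def interiors_overlap(a, b):
    # True when the two boxes share more than a boundary.
    return all(alo < bhi and blo < ahi
               for (alo, ahi), (blo, bhi) in zip(a, b))

def inside(child, parent):
    return all(plo <= clo and chi <= phi
               for (clo, chi), (plo, phi) in zip(child, parent))

def is_valid_partition(parent, children):
    """Check zero overlap between children and full coverage of the parent."""
    if not all(inside(c, parent) for c in children):
        return False
    if any(interiors_overlap(a, b) for a, b in combinations(children, 2)):
        return False
    # Disjoint boxes inside the parent cover it iff their volumes add up.
    return isclose(sum(volume(c) for c in children), volume(parent))

parent = ((0, 10), (0, 10))
print(is_valid_partition(parent, [((0, 4), (0, 10)), ((4, 10), (0, 10))]))
print(is_valid_partition(parent, [((0, 4), (0, 10)), ((3, 10), (0, 10))]))
```

Siblings may still share a boundary; an implementation needs a convention for which child owns points lying exactly on it.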

R+-tree fundamentals
• desired state
  • zero overlaps
  • minimum bounding rectangles
• R+-tree
  • avoids overlaps at the cost of rectangle size

The R++-tree idea
• desired state
  • zero overlaps
  • minimum bounding rectangles
• R++-tree
  • inner nodes keep two rectangles for each child node: the minimum bounding one and the parent covering one
  • leaf nodes are left unchanged

Nodes of the R++-tree
• Leaf nodes
  • Exactly the same as leaf nodes of the R+-tree
  • Contain the id and coordinates of each object
  • Take one disk page each
• Inner nodes
  • Contain a pointer and two rectangles for each child node
  • Take two disk pages each
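The node contents described above could be modelled roughly as follows. This is an in-memory sketch under assumed names (`LeafNode`, `ChildEntry`, `InnerNode`); a Python object reference stands in for the disk pointer, and the two `*_page` methods only mimic the split of an inner node into two pages.

```python
from dataclasses import dataclass, field
from typing import List, Tuple, Union

Point = Tuple[float, ...]
Rect = Tuple[Tuple[float, float], ...]   # one (lo, hi) interval per dimension

@dataclass
class LeafNode:
    # (object id, coordinates) for each stored object
    entries: List[Tuple[int, Point]] = field(default_factory=list)

@dataclass
class ChildEntry:
    child: Union["InnerNode", LeafNode]  # stands in for a disk page pointer
    mbr: Rect                            # minimum bounding rectangle
    cover: Rect                          # parent covering rectangle

@dataclass
class InnerNode:
    entries: List[ChildEntry] = field(default_factory=list)

    def mbr_page(self):
        """What a search would read: pointers with minimum bounding rectangles."""
        return [(e.child, e.mbr) for e in self.entries]

    def cover_page(self):
        """What an insert additionally needs: the parent covering rectangles."""
        return [e.cover for e in self.entries]

leaf = LeafNode([(1, (0.2, 0.3)), (2, (0.4, 0.3))])
root = InnerNode([ChildEntry(leaf, ((0.2, 0.4), (0.3, 0.3)), ((0.0, 1.0), (0.0, 1.0)))])
print(root.mbr_page()[0][1], root.cover_page()[0])
```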

Use of the two rectangles in inner nodes
• Searching
  • Only the minimum bounding rectangles are necessary
• Inserting new objects
  • Both the minimum bounding and the parent covering rectangles need to be used (read/updated)
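The different roles of the two rectangles can be sketched in code. The following is a simplified illustration, not the authors' implementation: node splitting is omitted, a point on a shared boundary goes to the first matching child, and all class and function names are assumptions. Search prunes with minimum bounding rectangles only; insert routes by covering rectangle and enlarges the minimum bounding rectangle on the way down.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Leaf:
    points: list = field(default_factory=list)   # (id, coordinates)

@dataclass
class Entry:
    child: object
    cover: tuple                  # parent covering rectangle
    mbr: Optional[tuple] = None   # minimum bounding rectangle (None while empty)

@dataclass
class Inner:
    entries: list = field(default_factory=list)

def contains(rect, p):
    return all(lo <= x <= hi for (lo, hi), x in zip(rect, p))

def intersects(a, b):
    return all(alo <= bhi and blo <= ahi
               for (alo, ahi), (blo, bhi) in zip(a, b))

def extend(rect, p):
    if rect is None:
        return tuple((x, x) for x in p)
    return tuple((min(lo, x), max(hi, x)) for (lo, hi), x in zip(rect, p))

def range_search(node, query, out):
    """Collect ids of points inside `query`, pruning by MBR only."""
    if isinstance(node, Leaf):
        out.extend(oid for oid, p in node.points if contains(query, p))
        return
    for e in node.entries:
        if e.mbr is not None and intersects(e.mbr, query):
            range_search(e.child, query, out)

def insert(node, oid, p):
    """Route by covering rectangle, enlarging the MBR along the path."""
    if isinstance(node, Leaf):
        node.points.append((oid, p))
        return
    for e in node.entries:
        if contains(e.cover, p):
            e.mbr = extend(e.mbr, p)
            insert(e.child, oid, p)
            return
    raise ValueError("point lies outside the covered space")

root = Inner([Entry(Leaf(), ((0, 5), (0, 10))), Entry(Leaf(), ((5, 10), (0, 10)))])
for oid, p in [(1, (1, 1)), (2, (2, 3)), (3, (9, 9))]:
    insert(root, oid, p)

hits, empty = [], []
range_search(root, ((0, 2), (0, 2)), hits)
range_search(root, ((4, 6), (0, 10)), empty)   # hits both covers but no MBR
print(hits, empty)
```

In the second query the covering rectangles of both children intersect the query window, yet neither minimum bounding rectangle does, so no leaf is visited.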

Implementation of inner nodes
• The first page contains the minimum bounding rectangles
• The second page contains the parent covering rectangles

Advantages and drawbacks of the two-page idea
• Advantages
  • searching requires reading only one page per node involved
  • the ratio between page size and node capacity is the same as in the R+-tree
• Drawbacks
  • when updating, two pages per inner node need to be processed
  • the real impact on the overall index size is relatively low
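A small back-of-the-envelope calculation illustrates both points. All numbers here are assumptions chosen for illustration (4096-byte pages, 8-byte coordinates and pointers, 6 dimensions, completely full nodes); none of them come from the slides.

```python
def fanout(page_size, dims, rects_per_entry, coord_bytes=8, ptr_bytes=8):
    """Entries that fit in one page when each entry holds a pointer
    plus `rects_per_entry` rectangles (two corners of `dims` coordinates)."""
    entry_bytes = ptr_bytes + rects_per_entry * 2 * dims * coord_bytes
    return page_size // entry_bytes

def inner_node_count(leaves, f):
    """Inner nodes of a tree with `leaves` leaf nodes and full fan-out f."""
    count, level = 0, leaves
    while level > 1:
        level = -(-level // f)   # ceiling division: nodes on the next level up
        count += level
    return count

one_rect = fanout(4096, 6, 1)    # one rectangle per entry in a page
two_rects = fanout(4096, 6, 2)   # both rectangles squeezed into one page
inner = inner_node_count(10_000, one_rect)
extra = inner / (10_000 + inner) # share of pages added by the second page

print(one_rect, two_rects, inner, round(extra, 4))
```

Under these assumptions, storing both rectangles in a single page would roughly halve the fan-out, whereas the two-page layout keeps it unchanged and adds only one extra page per inner node, which are few compared with the leaves.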

Experiments - data
• Artificial data (range, kNN and top-k queries)
  • 100 000 random points in 2- to 10-dimensional space
    • decimal values within [0; 1]
    • integer values from 1 to 100
    • integer values from 1 to 10
• Pseudo-real data (top-k query)
  • 6-dimensional points: data of flats for sale
  • 550 000 flats (20-multiple set)
  • 2 700 000 flats (100-multiple set)
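The three artificial value domains differ mainly in how many duplicate coordinates they produce. The sketch below generates comparable data and counts distinct points; the generator, its `kind` labels, the seed and the choice of 2 dimensions are assumptions for illustration, not the authors' actual setup.

```python
import random

def make_points(n, dims, kind, seed=0):
    """Generate n random points; `kind` selects one of three value domains."""
    rng = random.Random(seed)
    if kind == "decimal":
        draw = rng.random                      # floats in [0, 1); 1.0 itself never occurs
    elif kind == "int100":
        draw = lambda: rng.randint(1, 100)     # integers 1..100 inclusive
    elif kind == "int10":
        draw = lambda: rng.randint(1, 10)      # integers 1..10 inclusive
    else:
        raise ValueError(f"unknown kind: {kind}")
    return [tuple(draw() for _ in range(dims)) for _ in range(n)]

def distinct_ratio(points):
    """Fraction of points that are distinct (lower means more redundancy)."""
    return len(set(points)) / len(points)

for kind in ("decimal", "int100", "int10"):
    pts = make_points(100_000, 2, kind)
    print(kind, len(set(pts)), "distinct points out of", len(pts))
```

In two dimensions the integer domains can yield at most 100 × 100 and 10 × 10 distinct points respectively, so most of the 100 000 points are necessarily duplicates.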

Experiments - measures
• 300 random queries for each data set and query type
• Average time per query
• Average number of I/Os per query
  • One I/O corresponds to reading one page, i.e. processing one node

Artificial data: 100 000 random points with decimal values within [0; 1]

Artificial data: 100 000 random points with integer values from 1 to 100

Artificial data: 100 000 random points with integer values from 1 to 10

Pseudo-real data: 550 000 flats (i.e. 6-dimensional points)

Pseudo-real data: 2 700 000 flats (i.e. 6-dimensional points)

Thank you for your attention
