R++-tree: an efficient spatial access method for highly redundant point data Martin Šumák, Peter Gurský University of P. J. Šafárik in Košice
Research motivation • Besides kNN and range queries, an R-tree-like index can also be used to compute top-k queries (find the best k objects according to user preferences) • h(x1, x2) = f1(x1) + f2(x2) Martin Šumák, Peter Gurský at ADBIS 2013
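To make the top-k query concrete, here is a minimal brute-force sketch in Python. It is only a baseline that scores every point; it is not the index-based algorithm from the talk. The preference functions `f1` and `f2` (cheaper is better, larger is better) and the sample points are hypothetical.

```python
import heapq

def f1(price):
    # Hypothetical preference: a lower normalized price scores higher.
    return 1.0 - price

def f2(area):
    # Hypothetical preference: a larger normalized area scores higher.
    return area

def h(x1, x2):
    # Aggregated score, as on the slide: h(x1, x2) = f1(x1) + f2(x2).
    return f1(x1) + f2(x2)

def top_k(points, k):
    """Return the k points with the highest score h, best first."""
    return heapq.nlargest(k, points, key=lambda p: h(*p))

# Sample points as (price, area), both normalized to [0, 1].
points = [(0.2, 0.9), (0.8, 0.4), (0.5, 0.5), (0.1, 0.3)]
best = top_k(points, 2)
```

An index-based approach would aim to find the same answer without scoring every point.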
Why highly redundant point data • Our data consists of flats with the following attributes: • price • area • floor • max floor of building • year of approbation • number of rooms • Each flat is represented by a point in 6-dimensional space
R+-tree fundamentals • The R+-tree is an R-tree-like index with the following specifics: • zero overlap between nodes at the same level • the rectangles of the child nodes cover the whole rectangle of the parent • suitable for point data and point queries
R+-tree fundamentals • Desired state • zero overlaps • minimum bounding rectangles • R+-tree • avoids overlaps at the cost of larger rectangles
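The two competing goals, zero overlap and minimum bounding rectangles, can be sketched with a few helpers. This is an illustrative Python sketch; the rectangle representation (a pair of low and high corner tuples) and all function names are assumptions, not taken from the talk.

```python
def mbr(points):
    """Minimum bounding rectangle of a non-empty list of points."""
    dims = range(len(points[0]))
    low = tuple(min(p[d] for p in points) for d in dims)
    high = tuple(max(p[d] for p in points) for d in dims)
    return low, high

def overlap(a, b):
    """True if rectangles a and b share interior volume (touching edges do not count)."""
    (alow, ahigh), (blow, bhigh) = a, b
    return all(al < bh and bl < ah
               for al, ah, bl, bh in zip(alow, ahigh, blow, bhigh))

def covers(outer, inner):
    """True if rectangle inner lies completely inside rectangle outer."""
    (olow, ohigh), (ilow, ihigh) = outer, inner
    return all(ol <= il and ih <= oh
               for ol, oh, il, ih in zip(olow, ohigh, ilow, ihigh))

# R+-tree style: the parent rectangle (0,0)-(10,10) is split at x = 5,
# so the two child rectangles cover the parent and do not overlap.
left_cover = ((0, 0), (5, 10))
right_cover = ((5, 0), (10, 10))

# The points stored under the left child occupy a much smaller rectangle.
left_min = mbr([(1, 2), (3, 4)])
```

In this example `left_cover` is far larger than `left_min`, which is the rectangle-size cost mentioned above.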
The R++-tree idea • Desired state • zero overlaps • minimum bounding rectangles • R++-tree • inner nodes keep two rectangles for each child node: the minimum bounding one and the parent covering one • leaf nodes are left unchanged
Nodes of the R++-tree • Leaf nodes • Exactly the same as the leaf nodes of the R+-tree • Contain an Id and coordinates for each object • Take one disk page each • Inner nodes • Contain a pointer and two rectangles for each child node • Take two disk pages each
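The node layout described above could be modelled in memory roughly as follows. This is a sketch only: the class and field names are hypothetical, and the on-disk page layout is not represented.

```python
from dataclasses import dataclass, field
from typing import List, Tuple, Union

Point = Tuple[float, ...]
Rect = Tuple[Point, Point]  # (low corner, high corner)

@dataclass
class LeafNode:
    # One (object id, coordinates) pair per stored object.
    entries: List[Tuple[int, Point]] = field(default_factory=list)

@dataclass
class ChildEntry:
    # Pointer to the child plus the two rectangles kept for it.
    child: Union["LeafNode", "InnerNode"]
    min_rect: Rect    # minimum bounding rectangle of the child's data
    cover_rect: Rect  # parent covering rectangle (non-overlapping partition cell)

@dataclass
class InnerNode:
    children: List[ChildEntry] = field(default_factory=list)

leaf = LeafNode(entries=[(1, (1.0, 2.0)), (2, (3.0, 4.0))])
root = InnerNode(children=[
    ChildEntry(child=leaf,
               min_rect=((1.0, 2.0), (3.0, 4.0)),
               cover_rect=((0.0, 0.0), (5.0, 10.0))),
])
```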
Use of the two rectangles in inner nodes • Searching • Only the minimum bounding rectangles are needed • Inserting new objects • Both the minimum bounding and the parent covering rectangles need to be used (read/updated)
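A range search that consults only the minimum bounding rectangles might look like the sketch below. The node encoding (a `("leaf", entries)` or `("inner", entries)` tuple) and the `pages_read` counter are assumptions made for illustration; each visited node is counted as one page, since searching does not need the parent covering rectangles.

```python
def intersects(a, b):
    """True if rectangles a and b have at least one common point."""
    (alow, ahigh), (blow, bhigh) = a, b
    return all(al <= bh and bl <= ah
               for al, ah, bl, bh in zip(alow, ahigh, blow, bhigh))

def contains(rect, point):
    """True if point lies inside rect (borders included)."""
    low, high = rect
    return all(l <= x <= h for l, x, h in zip(low, point, high))

def range_search(node, query, stats):
    """Collect ids of objects inside query, descending by minimum rectangles only."""
    kind, entries = node
    stats["pages_read"] += 1  # one page per visited node
    if kind == "leaf":
        return [oid for oid, point in entries if contains(query, point)]
    found = []
    for min_rect, child in entries:
        if intersects(min_rect, query):
            found.extend(range_search(child, query, stats))
    return found

leaf_a = ("leaf", [(1, (1, 2)), (2, (3, 4))])
leaf_b = ("leaf", [(3, (6, 1)), (4, (9, 9))])
root = ("inner", [(((1, 2), (3, 4)), leaf_a),
                  (((6, 1), (9, 9)), leaf_b)])

stats = {"pages_read": 0}
hits = range_search(root, ((0, 0), (4, 4)), stats)

# A query that falls into the gap between the two minimum rectangles.
gap_stats = {"pages_read": 0}
gap_hits = range_search(root, ((4, 0), (5.5, 10)), gap_stats)
```

The second query would intersect both children if they were described by covering rectangles split at x = 5; with minimum rectangles neither leaf is visited.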
Implementation of inner nodes • The first page contains the minimum bounding rectangles • The second page contains the parent covering rectangles
Advantages and drawbacks of the two-page idea • Advantages • searching requires reading only one page per node involved • the ratio between page size and node capacity is the same as in the R+-tree • Drawbacks • when updating, two pages per inner node need to be processed • the real impact on the whole index size is relatively low
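The claim about index size can be illustrated with back-of-the-envelope arithmetic. Everything numeric here is an assumption, not a figure from the talk: a 4096-byte page, 8-byte coordinates, ids and pointers, and fully packed nodes (real trees are usually less full).

```python
import math

PAGE_SIZE = 4096   # assumed page size in bytes
COORD = 8          # assumed bytes per coordinate
ID_OR_PTR = 8      # assumed bytes per object id or child pointer

def leaf_capacity(dims):
    # A leaf entry holds an id plus one coordinate per dimension.
    return PAGE_SIZE // (ID_OR_PTR + dims * COORD)

def inner_capacity(dims):
    # One page of an inner node holds a pointer plus one rectangle
    # (two corners) per child, as in the R+-tree.
    return PAGE_SIZE // (ID_OR_PTR + 2 * dims * COORD)

def page_counts(n_points, dims):
    """Return (leaf pages, inner nodes) for a fully packed tree."""
    leaves = math.ceil(n_points / leaf_capacity(dims))
    inner, level = 0, leaves
    while level > 1:
        level = math.ceil(level / inner_capacity(dims))
        inner += level
    return leaves, inner

leaves, inner = page_counts(550_000, 6)
r_plus_pages = leaves + inner           # one page per node
r_plus_plus_pages = leaves + 2 * inner  # inner nodes take two pages
overhead = (r_plus_plus_pages - r_plus_pages) / r_plus_pages
```

Under these assumptions the extra pages equal the number of inner nodes, which is small compared with the number of leaves, so `overhead` stays at a few percent.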
Experiments - data • Artificial data (range, kNN and top-k queries) • 100 000 random points in 2- to 10-dimensional space • decimal values within [0; 1] • integer values from 1 to 100 • integer values from 1 to 10 • Pseudo-real data (top-k queries) • 6-dimensional points: data of flats for sale • 550 000 flats (20-multiple set) • 2 700 000 flats (100-multiple set)
Experiments - measures • 300 random queries for each data set and query type • Average time per query • Average number of I/Os per query • One I/O corresponds to reading one page, i.e. processing one node
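A measurement loop along these lines could collect both averages. This is a hypothetical harness, not the authors' benchmark code: `run_query` is assumed to execute one query and return the number of pages it read, and `make_query` is assumed to generate a random query from a seeded generator.

```python
import random
import time

def measure(run_query, make_query, n_queries=300, seed=0):
    """Return (average seconds per query, average I/Os per query)."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    total_time = 0.0
    total_ios = 0
    for _ in range(n_queries):
        query = make_query(rng)
        start = time.perf_counter()
        total_ios += run_query(query)  # run_query returns pages read
        total_time += time.perf_counter() - start
    return total_time / n_queries, total_ios / n_queries

# Stand-in query functions, used only to exercise the harness.
avg_time, avg_ios = measure(run_query=lambda q: 3,
                            make_query=lambda rng: rng.random())
```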
Artificial data: 100 000 random points with decimal values within [0; 1]
Artificial data: 100 000 random points with integer values from 1 to 100
Artificial data: 100 000 random points with integer values from 1 to 10
Pseudo-real data: 550 000 flats (i.e. 6-dimensional points)
Pseudo-real data: 2 700 000 flats (i.e. 6-dimensional points)
Thank you for your attention