Graduate Course: Spatial Data 한국기술대학교 민준기
Spatial Data • Traditional data • Single dimension • Values, text • New applications • GIS • CAD • LBS • Multimedia data • Multi-dimensional data
Spatial Access Method (SAM) • Supports efficient access to spatial data • B-Tree • Handles only one-dimensional data • Not appropriate for multi-dimensional data • One of the best-known spatial indexes • R-Tree
R-Trees : A Dynamic Index Structure for Spatial Searching • R-Tree • A Height-balanced Tree with index records in its leaf nodes containing pointers to data objects. • Dynamic structure: inserts and deletes can be intermixed with searches and no periodic reorganization is required.
R-Trees : A Dynamic Index Structure for Spatial Searching • R-Tree • Exact spatial objects are difficult to index directly • Objects are therefore approximated by their MBR (minimum bounding rectangle) [Figure: example with root R1, nodes A1 and A2, and data objects a1, a2, a3, a4]
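The MBR approximation can be illustrated with a minimal Python sketch. The function names `mbr_of_points` and `mbr_union`, and the choice to represent a rectangle as a pair (lower corner, upper corner), are assumptions made for this example and do not come from the slides.

```python
from typing import Sequence, Tuple

Point = Tuple[float, ...]
Rect = Tuple[Point, Point]  # (lower corner, upper corner); an assumed layout


def mbr_of_points(points: Sequence[Point]) -> Rect:
    """Smallest axis-aligned rectangle containing every point of an object."""
    if not points:
        raise ValueError("an object needs at least one point")
    dims = len(points[0])
    lows = tuple(min(p[k] for p in points) for k in range(dims))
    highs = tuple(max(p[k] for p in points) for k in range(dims))
    return lows, highs


def mbr_union(a: Rect, b: Rect) -> Rect:
    """Smallest rectangle enclosing two rectangles (used for parent entries)."""
    lows = tuple(min(x, y) for x, y in zip(a[0], b[0]))
    highs = tuple(max(x, y) for x, y in zip(a[1], b[1]))
    return lows, highs


triangle = [(1.0, 2.0), (4.0, 0.5), (3.0, 5.0)]
print(mbr_of_points(triangle))
```

The same `mbr_union` operation is what a non-leaf entry needs in order to cover all rectangles of its child node.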
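R-Tree Structure • Node = $(E_1, \dots, E_M)$ • $E_i$ = (I, pointer), where $I = (I_0, \dots, I_{d-1})$, d is the number of dimensions, and $I_j = [a, b]$ is the closed interval covered along dimension j • Let M be the maximum number of entries in a node, and $m \le M/2$ the minimum number of entries in a node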
Property of R-Tree • Every leaf node contains between m and M index records unless it is the root. • For each index record (I, pointer) in a leaf node, I is the smallest rectangle that spatially contains the d-dimensional data object represented by the indicated tuple. • Every non-leaf node has between m and M children unless it is the root. • For each entry (I, pointer) in a non-leaf node, I is the smallest rectangle that spatially contains the rectangles in the child node. • The root node has at least two children unless it is a leaf. • All leaves appear on the same level.
Property of R-Tree • The height of an R-Tree containing N index records is at most $\lceil \log_m N \rceil - 1$ • The maximum number of nodes is $\lceil N/m \rceil + \lceil N/m^2 \rceil + \dots + 1$, where the first term $\lceil N/m \rceil$ is the number of leaf nodes • Worst-case space utilization for all nodes except the root is m/M
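To make these bounds concrete, here is a small sketch that evaluates them with integer arithmetic. The function names are hypothetical, and clamping the height bound at 0 for very small N is a choice made for this example only.

```python
def height_upper_bound(n_records: int, m: int) -> int:
    """ceil(log_m N) - 1, computed without floating-point logarithms."""
    if n_records < 1 or m < 2:
        raise ValueError("need N >= 1 and m >= 2")
    k, capacity = 0, 1
    while capacity < n_records:  # smallest k with m**k >= N
        capacity *= m
        k += 1
    return max(k - 1, 0)  # clamped at 0: an assumption for tiny N


def max_node_count(n_records: int, m: int) -> int:
    """ceil(N/m) + ceil(N/m^2) + ... + 1, i.e. every node holds only m entries."""
    if n_records < 1 or m < 2:
        raise ValueError("need N >= 1 and m >= 2")
    total, level = 0, n_records
    while True:
        level = -(-level // m)  # integer ceiling division, one level up
        total += level
        if level == 1:          # reached the root
            return total


def worst_case_utilization(m: int, big_m: int) -> float:
    """Fraction of each non-root node that is guaranteed to be filled."""
    return m / big_m


print(height_upper_bound(100, 2), max_node_count(100, 2), worst_case_utilization(2, 5))
```

With N = 100 and m = 2 the node-count series is 50 + 25 + 13 + 7 + 4 + 2 + 1, which spans seven levels and therefore matches a height bound of 6.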
R-Tree Search • Because MBRs overlap, many index nodes may have to be visited. Search(MBR): if (leaf node) { report every entry in this node that overlaps MBR } else { for each child node nx whose rectangle overlaps MBR: nx.Search(MBR) }
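The search procedure can be sketched in runnable form as follows. The `Node` class, the `overlaps` helper, and the (rectangle, target) entry layout are assumptions introduced for this illustration, not part of the original slides.

```python
from dataclasses import dataclass, field
from typing import Any, List, Tuple

Rect = Tuple[Tuple[float, ...], Tuple[float, ...]]  # (lower corner, upper corner)


def overlaps(a: Rect, b: Rect) -> bool:
    """True if the two rectangles intersect in every dimension."""
    return all(a[0][k] <= b[1][k] and b[0][k] <= a[1][k] for k in range(len(a[0])))


@dataclass
class Node:
    is_leaf: bool
    # Leaf entries are (rect, object id); non-leaf entries are (rect, child Node).
    entries: List[Tuple[Rect, Any]] = field(default_factory=list)


def search(node: Node, query: Rect) -> List[Any]:
    """Collect the ids of all leaf entries whose rectangle overlaps the query."""
    results: List[Any] = []
    for rect, target in node.entries:
        if not overlaps(rect, query):
            continue
        if node.is_leaf:
            results.append(target)
        else:
            # Several children may overlap the query, so all of them are visited.
            results.extend(search(target, query))
    return results


leaf1 = Node(True, [(((0, 0), (1, 1)), "a1"), (((2, 2), (3, 3)), "a2")])
leaf2 = Node(True, [(((5, 5), (6, 6)), "a3"), (((7, 0), (8, 1)), "a4")])
root = Node(False, [(((0, 0), (3, 3)), leaf1), (((5, 0), (8, 6)), leaf2)])
print(search(root, ((0.5, 0.5), (2.5, 2.5))))
```

Because the filter works on MBRs only, the returned ids are candidates; an exact geometry test would still be needed afterwards.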
R-Tree Insertion • Algorithm Insert(newMBR) • Find a position for the new record • Call ChooseLeaf to select a leaf node • Add the record to the leaf node • If the leaf is full, call SplitNode • Propagate changes upward • Call AdjustTree • Grow the tree taller if the root was split
R-Tree Insert • Algorithm ChooseLeaf CL1 Set N to be the root. CL2 If N is a leaf, return N; otherwise choose the entry in N whose rectangle needs the least area enlargement to include the new data. Resolve ties by choosing the entry with the smallest rectangle. CL3 Set N to be the child node pointed to by the child pointer of the chosen entry. CL4 Repeat from CL2.
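A minimal sketch of ChooseLeaf follows, assuming a hypothetical `Node` class whose non-leaf entries are (rectangle, child) pairs. The helper names `area` and `union` are also assumptions for this example.

```python
from dataclasses import dataclass, field
from math import prod
from typing import Any, List, Tuple

Rect = Tuple[Tuple[float, ...], Tuple[float, ...]]  # (lower corner, upper corner)


@dataclass
class Node:
    is_leaf: bool
    entries: List[Tuple[Rect, Any]] = field(default_factory=list)


def area(r: Rect) -> float:
    return prod(hi - lo for lo, hi in zip(r[0], r[1]))


def union(a: Rect, b: Rect) -> Rect:
    lows = tuple(min(x, y) for x, y in zip(a[0], b[0]))
    highs = tuple(max(x, y) for x, y in zip(a[1], b[1]))
    return lows, highs


def choose_leaf(root: Node, new_rect: Rect) -> Node:
    node = root                                   # CL1
    while not node.is_leaf:                       # CL2
        # Least enlargement first; ties go to the entry with the smaller area.
        _, node = min(
            node.entries,
            key=lambda e: (area(union(e[0], new_rect)) - area(e[0]), area(e[0])),
        )                                         # CL3
    return node                                   # CL4 is the loop itself


leaf_a, leaf_b = Node(True), Node(True)
root = Node(False, [(((0, 0), (4, 4)), leaf_a), (((3, 3), (5, 5)), leaf_b)])
print(choose_leaf(root, ((0, 0), (1, 1))) is leaf_a)
```

The tuple key makes the tie-breaking rule explicit: enlargement is compared first, and the current area is only consulted when two entries need the same enlargement.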
R-Tree Insert • If there is no room, invoke SplitNode • Split the entries so that the resulting MBRs are as small as possible • Optimal SplitNode -> try every way of dividing the M+1 entries into two subsets -> $O(2^{M-1})$ cases [Figure: a bad split and a good split]
R-Tree Insert • Approximation (see details) • Quadratic ($O(M^2)$) • Linear • Select the two entries that are farthest apart • Insert the remaining entries into the two groups
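The linear approximation can be sketched as below. This is a simplified illustration rather than a complete SplitNode: seeds are chosen by greatest normalized separation along any dimension, the remaining entries are taken in input order, and ties are always given to the first group. All function names are hypothetical.

```python
from math import prod
from typing import List, Sequence, Tuple

Rect = Tuple[Tuple[float, ...], Tuple[float, ...]]  # (lower corner, upper corner)


def area(r: Rect) -> float:
    return prod(hi - lo for lo, hi in zip(r[0], r[1]))


def union(a: Rect, b: Rect) -> Rect:
    return (tuple(min(x, y) for x, y in zip(a[0], b[0])),
            tuple(max(x, y) for x, y in zip(a[1], b[1])))


def pick_seeds_linear(rects: Sequence[Rect]) -> Tuple[int, int]:
    """Indices of the two rectangles that are farthest apart (normalized)."""
    best = (float("-inf"), 0, 1)  # fallback seeds if every dimension is degenerate
    idx = range(len(rects))
    for k in range(len(rects[0][0])):
        hi_low = max(idx, key=lambda i: rects[i][0][k])   # highest low side
        lo_high = min(idx, key=lambda i: rects[i][1][k])  # lowest high side
        if hi_low == lo_high:
            continue
        width = max(r[1][k] for r in rects) - min(r[0][k] for r in rects)
        sep = (rects[hi_low][0][k] - rects[lo_high][1][k]) / (width or 1.0)
        if sep > best[0]:
            best = (sep, lo_high, hi_low)
    return best[1], best[2]


def linear_split(rects: Sequence[Rect], m: int) -> Tuple[List[int], List[int]]:
    """Split entry indices into two groups, each with at least m members."""
    s1, s2 = pick_seeds_linear(rects)
    groups, covers = [[s1], [s2]], [rects[s1], rects[s2]]
    remaining = [i for i in range(len(rects)) if i not in (s1, s2)]
    while remaining:
        # If a group needs every remaining entry to reach m, hand them all over.
        short = [g for g in (0, 1) if len(groups[g]) + len(remaining) <= m]
        if short:
            groups[short[0]].extend(remaining)
            break
        i = remaining.pop(0)
        growth = [area(union(covers[g], rects[i])) - area(covers[g]) for g in (0, 1)]
        g = 0 if growth[0] <= growth[1] else 1  # simplified tie-break (assumption)
        groups[g].append(i)
        covers[g] = union(covers[g], rects[i])
    return groups[0], groups[1]


boxes = [((0, 0), (1, 1)), ((0.5, 0.5), (1.5, 1.5)), ((8, 8), (9, 9)),
         ((8.5, 7.5), (9.5, 8.5)), ((1, 0), (2, 1))]
print(linear_split(boxes, m=2))
```

In this example the five boxes form two visible clusters, and the split separates them along those clusters instead of enumerating all possible groupings.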
R-Tree Insertion • Adjust covering rectangles and propagate node splits as necessary • Ascend from leaf node L to the root AdjustTree Algorithm • 1. [Initialize] Set N = L; if L was split, let NN be the resulting second node • 2. [Check if done] If N is the root, stop • 3. [Adjust covering rectangle in parent entry] • Let P be the parent of N, and E_N be N's entry in P • Modify the MBR of E_N so that it tightly encloses all MBRs in N • 4. [Propagate node split upward] • If N has a partner NN resulting from an earlier split, • Create a new entry E_NN pointing to NN and add E_NN to P • If P has no room, invoke SplitNode to produce P and PP • 5. [Move up to next node] • Set N = P, and set NN = PP if a split occurred; go to step 2.
Processing and Optimization of Multiway Spatial Joins Using R-trees • Cost-Based Query Optimizer • Join Selectivity • Probability that a tuple belongs to the result • Used to generate the most efficient query execution plan • Spatial Join Selectivity • Multi-dimensional attributes • Commonly 2 dimensions • This work focuses on computing the cost of the filter step (that is, only MBRs are considered)
Previous Work • Assumptions • The work space is the d-dimensional unit cube $[0,1)^d$ • Data is uniformly distributed • Dimensions are independent
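Previous Work • Window Query • Find all points contained in window q • Selectivity: $S(q) = |q_i|^d$, where $|q_i|$ is the extent of q along dimension i (taken to be the same in every dimension; with unequal extents the product of the $|q_i|$ applies) [Figure: query window q with extents $q_x$ and $q_y$]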
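Under the uniformity and independence assumptions, the fraction of points falling inside q equals the volume of q, which reduces to $|q_i|^d$ when every extent is the same. A minimal sketch of that arithmetic follows; the function names and the point count are assumptions for this example.

```python
from math import prod
from typing import Sequence


def window_selectivity(extents: Sequence[float]) -> float:
    """Probability that a uniformly distributed point in [0,1)^d lies in q.

    `extents` holds the side length of q along each dimension.
    """
    if any(not 0.0 <= e <= 1.0 for e in extents):
        raise ValueError("extents must lie within the unit work space")
    return prod(extents)


def expected_hits(n_points: int, extents: Sequence[float]) -> float:
    """Expected number of qualifying points out of n_points."""
    return n_points * window_selectivity(extents)


# A window covering 10% of the x axis and 20% of the y axis.
print(window_selectivity([0.1, 0.2]), expected_hits(10_000, [0.1, 0.2]))
```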
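Previous Work • 2-Way Join Query • Find the pairs where $R_a$ intersects $R_b$: $S(R_a, R_b) = (|S_a| + |S_b|)^d$ (where $|S_i|$ = average size of the rectangles of $R_i$ in one dimension, d = number of dimensions) [Figure: rectangle with sides labeled $|S_{a,x}| + |S_{b,x}|$ and $|S_{a,y}| + |S_{b,y}|$]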
Previous Work • M-Way Linear Queries (Acyclic Queries) • $R_a$ intersects $R_b$ and $R_b$ intersects $R_c$: $S(R_a, R_b, R_c) = (|S_a| + |S_b|)^d \, (|S_b| + |S_c|)^d$ • Generalization: $\prod_{\forall i,j:\, Q(i,j) = \mathrm{TRUE}} (|S_i| + |S_j|)^d$ [Figure: extents $|S_a|$, $|S_b|$, $|S_c|$]
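The pairwise formula and its acyclic generalization amount to one factor per edge of the query graph. A small sketch under that reading follows; the function names, the dictionary of average extents, and the sample values are illustrative assumptions rather than figures from the slides.

```python
from typing import Dict, Iterable, Tuple


def pair_selectivity(s_a: float, s_b: float, d: int) -> float:
    """(|S_a| + |S_b|)^d for two datasets with average extents s_a and s_b."""
    return (s_a + s_b) ** d


def acyclic_join_selectivity(
    avg_extent: Dict[str, float],
    edges: Iterable[Tuple[str, str]],
    d: int,
) -> float:
    """Product of pairwise selectivities over the edges (i, j) with Q(i, j) = TRUE."""
    selectivity = 1.0
    for i, j in edges:
        selectivity *= pair_selectivity(avg_extent[i], avg_extent[j], d)
    return selectivity


# Chain query Ra - Rb - Rc in two dimensions, with assumed average extents.
extents = {"Ra": 0.01, "Rb": 0.02, "Rc": 0.03}
chain = [("Ra", "Rb"), ("Rb", "Rc")]
print(acyclic_join_selectivity(extents, chain, d=2))
```

Multiplying the result by the product of the dataset cardinalities gives an estimate of the output size, which is the quantity a cost-based optimizer would compare across plans.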
Previous Work • M-Way Clique Join Query (M ≥ 3) • Papadias, Mamoulis, Theodoridis (ACM PODS99) • Clique: if a set of rectangles mutually intersect, then they must share a common area [Figure: query graph over R1, R2, R3 and the spatial relationship of rectangles S1, S2, S3]
Previous Work • Common Area ($q_n$) • Proof (by induction): Probability: Representative value: $|s_1|$ [Figure: rectangles s1 and s2]
Previous Work • Selectivity of M-Way Clique Join Query Prob(s2 intersects s1) * Prob(s3 intersects s1 ∧ s3 intersects s2 | s1 and s2 intersect) = Prob(s2 intersects s1) * Prob(s3 intersects the common intersection area of s1 and s2) • General Case: