Spatial-temporal Database and moving objects management
Outline • Introduction • Background knowledge • Spatial database • Spatial-temporal database • Moving objects management • Conclusions
Introduction • Location-aware database services • GPS technology • GIS • Vehicular databases • Wireless communication technology • (Figure: location updates and queries flow to the server; answers flow back.)
Introduction • Location-based applications • Traffic monitoring and management • Finding and advertising nearby stores (services) • Cooperation and communication among people
Various Queries (Location-aware Database Server) • Static query over moving objects: "How many cars are in this area?" • Moving query over static objects: "Keep me updated on hospitals within 3 miles." • Moving query over moving objects, snapshot: "How many cars are in the highlighted area now?" • Moving query over moving objects, continuous range-based: "Keep updating how many airplanes are within 100 miles." • Moving query over moving objects, continuous k-nearest neighbor: "Keep talking with the 3 nearest police cars."
Target Query • Moving Continual Queries over moving objects (MCQ) • A generalization of moving and static queries • Continuous nature: 1 continuous query = a sequence of snapshot queries evaluated at some frequency • Range-based or kNN
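The "1 continuous query = a sequence of snapshot queries" view can be sketched directly. This is a minimal illustration, not an MCQ processor: it assumes objects move in straight lines (position = start + velocity × t), that the range query is a circle centered on a moving issuer, and that it is re-evaluated every `period` time units. `MovingObject`, `snapshot_range`, and `continuous_range` are hypothetical names, and every snapshot rescans all objects instead of using an index.

```python
import math
from dataclasses import dataclass


@dataclass
class MovingObject:
    """Hypothetical object model: straight-line motion from (x, y) at t = 0."""
    oid: str
    x: float
    y: float
    vx: float = 0.0
    vy: float = 0.0

    def position(self, t):
        return (self.x + self.vx * t, self.y + self.vy * t)


def snapshot_range(objects, center, radius, t):
    """One snapshot query: ids of objects within `radius` of `center` at time t."""
    return sorted(o.oid for o in objects
                  if math.dist(o.position(t), center) <= radius)


def continuous_range(objects, issuer, radius, t_start, t_end, period):
    """A moving continuous range query evaluated as repeated snapshot queries.

    The query circle follows the moving `issuer`; the fixed re-evaluation
    period is an assumption (the slide only says "with some frequency").
    """
    results = []
    steps = int(round((t_end - t_start) / period))
    for i in range(steps + 1):
        t = t_start + i * period
        results.append((t, snapshot_range(objects, issuer.position(t), radius, t)))
    return results
```

For example, an issuer moving along the x-axis at speed 1 with a radius of 1.5 first picks up a static object at (2, 0), loses it, and later meets an object approaching from (10, 0). A real system would update answers incrementally rather than recomputing each snapshot from scratch.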
History of Database Technology • 1960s: Data collection, database creation, IMS and network DBMS • 1970s: Relational data model, relational DBMS implementation • 1980s: RDBMS, advanced data models (extended-relational, OO, deductive, etc.) and application-oriented DBMS (spatial, scientific, engineering, etc.) • 1990s—2000s: Data mining and data warehousing, multimedia databases, and Web databases
Structure of an RDBMS • A DBMS is an OS for data! • A typical RDBMS has a layered architecture, from top to bottom: Query Optimization and Execution; Relational Operators; Files and Access Methods; Buffer Management; Disk Space Management; DB • Modern database systems extend these layers
Index Methods for RDBMS • Hashing Methods • B-tree family • Both of them are one-dimensional
B+-tree • Records must be ordered over an attribute • Queries • exact match or range queries over the indexed attribute • find the name of the student with SID=dr868301 • find all students with gpa between 3.00 and 3.5
B+-tree: properties • "B" for balance! • Each node contains up to n-1 search key values and n pointers • A nonleaf node may hold up to n pointers and must hold at least $\lceil n/2 \rceil$ pointers • Two types of nodes: index nodes and data nodes; each node is 1 page (disk-based method)
Index node • Keys: 57 | 81 | 95 • Pointers to keys k < 57, 57 ≤ k < 81, 81 ≤ k < 95, and k ≥ 95
Data node • Reached from a non-leaf node • Keys: 57 | 81 | 95, each pointing to its record (the records with keys 57, 81, and 95) • A final pointer leads to the next leaf in sequence
EX: B+-tree of order 3. (a) Initial tree • Index level: root [60]; children [20, 40] and [80] • Data level: [5, 10] | [20] | [40, 50] | [60] | [80, 100]
Query Example • Range query [32, 160] • Root: 100 • Index nodes: 30 | 120, 150, 180 • Leaves: 3, 5, 11 | 30, 35 | 100, 101, 110 | 120, 130 | 150, 156, 179 | 180, 200
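A range query descends once to the leaf where the lower bound belongs and then walks the leaf chain until a key exceeds the upper bound. The sketch below hand-builds the example tree (root 100; index nodes 30 and 120, 150, 180; six chained leaves) in memory. It assumes record pointers can be omitted and uses hypothetical class names `Leaf` and `Inner`; it is not a disk-based implementation.

```python
from bisect import bisect_right


class Leaf:
    def __init__(self, keys):
        self.keys = keys      # sorted keys (record pointers omitted)
        self.next = None      # pointer to the next leaf in key order


class Inner:
    def __init__(self, keys, children):
        # children[i] holds keys k with keys[i-1] <= k < keys[i]
        self.keys = keys
        self.children = children


def find_leaf(node, key):
    """Descend from the root to the leaf where `key` belongs."""
    while isinstance(node, Inner):
        node = node.children[bisect_right(node.keys, key)]
    return node


def exact_match(root, key):
    return key in find_leaf(root, key).keys


def range_search(root, lo, hi):
    """Find the leaf for `lo`, then follow the leaf chain until a key exceeds `hi`."""
    leaf, out = find_leaf(root, lo), []
    while leaf is not None:
        for k in leaf.keys:
            if k > hi:
                return out
            if k >= lo:
                out.append(k)
        leaf = leaf.next
    return out


def example_tree():
    """The tree from the query example (root 100)."""
    leaves = [Leaf([3, 5, 11]), Leaf([30, 35]), Leaf([100, 101, 110]),
              Leaf([120, 130]), Leaf([150, 156, 179]), Leaf([180, 200])]
    for left, right in zip(leaves, leaves[1:]):
        left.next = right
    return Inner([100], [Inner([30], leaves[:2]),
                         Inner([120, 150, 180], leaves[2:])])
```

Only one root-to-leaf path is traversed; after that, the sequential leaf pointers make the rest of the range scan a simple linked-list walk.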
Insertion • Find correct leaf L. • Put data entry onto L. • If L has enough space, done! • Else, must split L (into L and a new node L2) • Redistribute entries evenly, copy up middle key. • Insert index entry pointing to L2 into parent of L. • This can happen recursively • To split index node, redistribute entries evenly, but push up middle key. (Contrast with leaf splits.) • Splits "grow" tree; root split increases height. • Tree growth: gets wider or one level taller at top.
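The insertion steps can be sketched as a small in-memory B+-tree. This is an illustration under stated assumptions: node capacity is counted in keys rather than disk pages, `max_keys` stands for n - 1 (2 for the order-3 trees in these slides), an overflowing node keeps its first `len // 2` keys, and records are omitted. `Node` and `BPlusTree` are hypothetical names. The two split branches show the contrast from the slide: a leaf split copies the separator up, while an index split pushes it up.

```python
from bisect import bisect_right, insort


class Node:
    def __init__(self, leaf):
        self.leaf = leaf
        self.keys = []
        self.children = []   # index nodes only
        self.next = None     # leaves only: next leaf in sequence


class BPlusTree:
    """In-memory sketch; max_keys stands for n - 1 (2 for an order-3 tree)."""

    def __init__(self, max_keys=2):
        self.max_keys = max_keys
        self.root = Node(leaf=True)

    def insert(self, key):
        split = self._insert(self.root, key)
        if split is not None:            # root split: tree grows one level taller
            sep, right = split
            new_root = Node(leaf=False)
            new_root.keys = [sep]
            new_root.children = [self.root, right]
            self.root = new_root

    def _insert(self, node, key):
        """Insert below `node`; return (separator, new_right_node) if it split."""
        if node.leaf:
            insort(node.keys, key)       # put the entry onto L, keeping order
            if len(node.keys) <= self.max_keys:
                return None
            mid = len(node.keys) // 2
            right = Node(leaf=True)
            right.keys, node.keys = node.keys[mid:], node.keys[:mid]
            right.next, node.next = node.next, right   # keep the leaf chain
            return right.keys[0], right  # COPY up: key stays in the new leaf
        i = bisect_right(node.keys, key)
        split = self._insert(node.children[i], key)
        if split is None:
            return None
        sep, child = split
        node.keys.insert(i, sep)         # index entry pointing to the new node
        node.children.insert(i + 1, child)
        if len(node.keys) <= self.max_keys:
            return None
        mid = len(node.keys) // 2
        up = node.keys[mid]              # PUSH up: key leaves this level
        right = Node(leaf=False)
        right.keys, right.children = node.keys[mid + 1:], node.children[mid + 1:]
        node.keys, node.children = node.keys[:mid], node.children[:mid + 1]
        return up, right

    def leaves(self):
        """Keys of every leaf, following the sequence-set pointers."""
        node = self.root
        while not node.leaf:
            node = node.children[0]
        out = []
        while node is not None:
            out.append(list(node.keys))
            node = node.next
        return out
```

Inserting 9, 6, 1, 8, 4, 13 with `max_keys=2` yields root [8], index nodes [6] and [9], and leaves [1, 4] | [6] | [8] | [9, 13]. The recursion returns a split to the parent, which is how a split can propagate all the way up to the root.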
Deletion • Start at root, find leaf L where entry belongs. • Remove the entry. • If L is at least half-full, done! • If L has only d-1 entries, • Try to re-distribute, borrowing from sibling (adjacent node with same parent as L). • If re-distribution fails, merge L and sibling. • If merge occurred, must delete entry (pointing to L or sibling) from parent of L. • Merge could propagate to root, decreasing height.
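The leaf-level decision between re-distribution and merging fits in one function. This is a minimal sketch assuming leaves are plain sorted key lists, that the sibling used is the right neighbour under the same parent, and that d is the minimum number of keys a non-root leaf must hold. `fix_leaf_underflow` is a hypothetical helper; updating the parent (and possibly propagating the merge upward) is left to the caller.

```python
def fix_leaf_underflow(leaf, sibling, d):
    """Repair a leaf that dropped to d - 1 keys after a deletion.

    `leaf` and `sibling` are sorted key lists; `sibling` is assumed to be the
    right neighbour of `leaf` under the same parent.

    Returns ("redistribute", leaf, sibling, new_separator) when a key can be
    borrowed, or ("merge", merged_leaf, None, None) when the two leaves must
    be merged; the caller then deletes the sibling's entry from the parent,
    which may underflow in turn.
    """
    if len(leaf) >= d:
        raise ValueError("leaf is not underfull")
    if len(sibling) > d:
        # Re-distribute: borrow the sibling's smallest key. The parent's
        # separator between the two leaves becomes the sibling's new first key.
        leaf = leaf + [sibling[0]]
        sibling = sibling[1:]
        return "redistribute", leaf, sibling, sibling[0]
    # The sibling is at minimum occupancy, so both fit in one node: merge.
    return "merge", leaf + sibling, None, None
```

Borrowing is tried first because it changes only one separator in the parent, while a merge removes a parent entry and can therefore shrink the tree.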
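Create a B+ tree • Insertion order: 9, 6, 1, 8, 4, 13 (at most 2 keys per node, as in the trees shown) • Insert 9, then 6: leaf [6, 9] • Insert 1: leaf [1, 6, 9] overflows and splits into [1] | [6, 9]; 6 is copied up into a new root [6] • Insert 8: leaf [6, 8, 9] splits into [6] | [8, 9]; 8 is copied up, giving root [6, 8] • Insert 4: leaf [1, 4] (no split) • Insert 13: leaf [8, 9, 13] splits into [8] | [9, 13]; 9 is copied up, so the index node [6, 8, 9] overflows and splits, pushing 8 up into a new root • Final tree: root [8]; index nodes [6] and [9]; leaves [1, 4] | [6] | [8] | [9, 13]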
Insert a key • Insert a key into a leaf that still has room (no overflow). • Keep the keys of this leaf in order. No changes are made at the index level. • Example: inserting 4 into the tree with root [6] and leaves [1] | [6, 9] gives leaves [1, 4] | [6, 9]; the root is unchanged.
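If a key is inserted into a full leaf (overflow) • Split the leaf: the new leaf joins the sequence set, the keys are distributed evenly between the old and new leaves, and the first key of the new leaf is copied (not moved, as in a B-tree) into the parent. • Insert 10 (the parent is not full): leaf [6, 9, 10] splits into [6] | [9, 10] and 9 is copied up, so the root becomes [6, 9] over leaves [1, 4] | [6] | [9, 10]. • Insert 3 (the parent is full): leaf [1, 3, 4] splits into [1] | [3, 4] and 3 is copied up; the index node [3, 6, 9] then overflows and splits, pushing 6 up into a new root with index nodes [3] and [9] over leaves [1] | [3, 4] | [6] | [9, 10].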
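Delete a key • Delete a key from a leaf without causing underflow: • Remove the key from the leaf and keep the remaining keys in order. • Example: starting from root [6, 9] with leaves [1, 4] | [6] | [9, 10], deleting 4 gives leaves [1] | [6] | [9, 10] with the index unchanged. • Deleting 9 instead gives leaves [1, 4] | [6] | [10]. Note the index level: the separator 9 may stay in the index, since it still routes searches correctly, or it may be replaced by 10, giving [6, 10].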
Introduction • A common technology needed by several applications: • GIS (geographic/geo-referenced data) • VLSI design (geometric data) • Modeling complex phenomena (spatial data) • All need to manage large collections of relatively simple spatial objects
SDBMS Definition A spatial database system: • Is a database system • A DBMS with additional capabilities for handling spatial data • Offers spatial data types (SDTs) in its data model and query language • Structure in space: e.g., POINT, LINE, REGION • Relationships among them: (l intersects r) • Supports SDTs in its implementation, providing at least • spatial indexing (retrieving objects in a particular area without scanning the whole space) • efficient algorithms for spatial joins (not simply filtering the Cartesian product)
Modeling Assuming a 2-D GIS application, two basic things need to be represented: • Objects in space: cities, forests, or rivers (single objects) • Coverage/Field: says something about every point in space, e.g., partitions, thematic maps (spatially related collections of objects)
Modeling: spatial primitives for objects • Point: an object represented only by its location in space, e.g., the center of a state • Line (actually a curve or polyline): represents movement through, or connections in, space, e.g., a road or river • Region: represents an extent in 2-D space, e.g., a lake or city
Modeling: spatial relationships • Topological relationships: e.g., adjacent, inside, disjoint. • Invariant under topological transformations such as translation, scaling, and rotation • Direction relationships: e.g., above, below, north_of, southwest_of, … • Metric relationships: e.g., distance
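The three kinds of relationships can be made concrete for a simplified case. This sketch assumes every region is an axis-aligned rectangle, which is much coarser than general regions (and than formal models such as the 9-intersection model); `Rect`, `topological`, `north_of`, `southwest_of`, and `distance` are hypothetical names chosen to mirror the slide's vocabulary.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle standing in for a REGION (a simplifying assumption)."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def topological(a, b):
    """Coarse topological relation of a to b: disjoint, inside, adjacent or overlaps."""
    w = min(a.xmax, b.xmax) - max(a.xmin, b.xmin)  # width of the intersection
    h = min(a.ymax, b.ymax) - max(a.ymin, b.ymin)  # height of the intersection
    if w < 0 or h < 0:
        return "disjoint"
    if b.xmin <= a.xmin and a.xmax <= b.xmax and b.ymin <= a.ymin and a.ymax <= b.ymax:
        return "inside"
    if w == 0 or h == 0:
        return "adjacent"  # boundaries touch, interiors do not meet
    return "overlaps"


def north_of(a, b):
    """Direction relation: a lies entirely above b."""
    return a.ymin >= b.ymax


def southwest_of(a, b):
    """Direction relation: a lies entirely to the left of and below b."""
    return a.xmax <= b.xmin and a.ymax <= b.ymin


def distance(a, b):
    """Metric relation: minimum Euclidean distance (0 if they intersect)."""
    dx = max(b.xmin - a.xmax, a.xmin - b.xmax, 0)
    dy = max(b.ymin - a.ymax, a.ymin - b.ymax, 0)
    return math.hypot(dx, dy)
```

Shifting both rectangles by the same offset leaves `topological` unchanged, which illustrates the invariance of topological relationships under translation; `distance`, being metric, is also translation-invariant but changes under scaling.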
Spatial Queries • Given a collection of geometric objects (points, lines, polygons, ...) • organize them on disk to answer efficiently • point queries • range queries • k-nn queries
Access Methods • Discussed in the course • Grid file • K-d tree • Z curve • R-tree
The problem • Given a point set and a rectangular query, find the points enclosed in the query (range query) • Given a point set and a point query q, find the point nearest to q (NN), or the k nearest points (kNN)
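Both problems have an obvious baseline that scans every point; the access methods listed above exist to avoid exactly this full scan on large, disk-resident data. A minimal sketch, with hypothetical function names, using only the standard library:

```python
import heapq
import math


def range_query(points, xmin, ymin, xmax, ymax):
    """Range query by linear scan: every point is examined."""
    return [p for p in points if xmin <= p[0] <= xmax and ymin <= p[1] <= ymax]


def knn_query(points, q, k=1):
    """k nearest neighbours of q by linear scan (k = 1 gives the NN query)."""
    return heapq.nsmallest(k, points, key=lambda p: math.dist(p, q))
```

Each call costs O(N) distance or containment tests, which serves as a reference point when judging the grid file, k-d tree, z-ordering, and R-tree.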
Grid File • Idea: use a grid to partition the space; each cell is associated with one page • Two-disk-access principle: an exact-match search needs at most two disk accesses
Grid File • Start with one bucket for the whole space. • Select dividers along each dimension. Partition space into cells • Dividers cut all the way.
Grid File • Each cell corresponds to 1 disk page. • Many cells can point to the same page. • The cell directory can grow exponentially with the number of dimensions
Grid File Implementation • Dynamic structure using a grid directory • Grid array: a 2-dimensional array with pointers to buckets, G(0, …, nx-1, 0, …, ny-1); this array can be large and is disk resident • Linear scales: two 1-dimensional arrays, X(0, …, nx-1) and Y(0, …, ny-1), kept in main memory and used to access the grid array
Example (figure): linear scale X, linear scale Y, the grid directory, and the buckets/disk blocks it points to
Grid File Search • Exact Match Search: at most 2 I/Os, assuming the linear scales fit in memory. • First use the linear scales to determine the index into the cell directory • Access the cell directory to retrieve the bucket address (may cause 1 I/O if the cell directory does not fit in memory) • Access the appropriate bucket (1 I/O) • Range Queries: • Use the linear scales to determine the indices into the cell directory. • Access the cell directory to retrieve the addresses of the buckets to visit. • Access the buckets.
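The search procedure can be sketched with in-memory stand-ins for the disk structures. Several points here are assumptions: the linear scales are stored as sorted divider positions (nx - 1 dividers give nx columns), the directory is a nested list of bucket ids, and buckets are lists of points in a dictionary. `GridFile` is a hypothetical class; insertion and bucket splitting are not shown.

```python
from bisect import bisect_right


class GridFile:
    """Search-only sketch of a 2-d grid file (no insertion or splitting).

    x_scale / y_scale: sorted divider positions (the linear scales, in memory).
    directory[i][j]: bucket id for cell (i, j); several cells may share a bucket.
    buckets: bucket id -> list of (x, y) points (one disk page each).
    """

    def __init__(self, x_scale, y_scale, directory, buckets):
        self.x_scale, self.y_scale = x_scale, y_scale
        self.directory, self.buckets = directory, buckets

    def cell(self, x, y):
        # Linear scales map a coordinate to a directory index (no I/O).
        return bisect_right(self.x_scale, x), bisect_right(self.y_scale, y)

    def exact_match(self, x, y):
        i, j = self.cell(x, y)
        bucket_id = self.directory[i][j]          # access 1: directory entry
        return (x, y) in self.buckets[bucket_id]  # access 2: the bucket

    def range_query(self, x1, y1, x2, y2):
        (i1, j1), (i2, j2) = self.cell(x1, y1), self.cell(x2, y2)
        # Collect distinct bucket ids first, since cells may share a bucket.
        ids = {self.directory[i][j]
               for i in range(i1, i2 + 1) for j in range(j1, j2 + 1)}
        return sorted(p for b in ids for p in self.buckets[b]
                      if x1 <= p[0] <= x2 and y1 <= p[1] <= y2)
```

For example, with one divider at 50 on each axis and `directory = [[0, 0], [1, 2]]`, the two left cells share bucket 0, so a range query over the whole space reads three buckets rather than four.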
K-d tree • K-d tree is a main memory binary tree for indexing k-dimensional points • The kd-tree is a data structure that is based on recursively subdividing a set of points with alternating axis-aligned hyperplanes. • K-d tree is not necessarily balanced
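Recursive subdivision with alternating axes can be sketched in a few lines. This is one common variant, not necessarily the one in the figures: it stores a point at every node (the figures keep points at the leaves and splitting lines at internal nodes) and splits at the median, which makes the tree balanced for a static point set even though k-d trees in general need not be. `KDNode`, `build`, and `range_search` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class KDNode:
    point: tuple
    axis: int                         # coordinate this node splits on
    left: Optional["KDNode"] = None   # points with point[axis] <= this node's
    right: Optional["KDNode"] = None  # points with point[axis] >= this node's


def build(points, depth=0):
    """Recursively split at the median, alternating axes with depth."""
    if not points:
        return None
    axis = depth % len(points[0])
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    return KDNode(pts[mid], axis,
                  build(pts[:mid], depth + 1),
                  build(pts[mid + 1:], depth + 1))


def range_search(node, lo, hi, out=None):
    """Report points p with lo[i] <= p[i] <= hi[i] for every coordinate i."""
    if out is None:
        out = []
    if node is None:
        return out
    p, a = node.point, node.axis
    if all(l <= c <= h for l, c, h in zip(lo, p, hi)):
        out.append(p)
    if lo[a] <= p[a]:      # the query may reach into the left half-space
        range_search(node.left, lo, hi, out)
    if p[a] <= hi[a]:      # ... or into the right half-space
        range_search(node.right, lo, hi, out)
    return out
```

For the points (2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2), the query rectangle [3, 8] × [1, 5] returns (5, 4), (7, 2), and (8, 1). Subtrees on the far side of a splitting line that the query does not cross are never visited.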
Kd-trees (figure): points 1 to 11 in the plane, separated by splitting lines l1 to l10, and the corresponding binary tree whose internal nodes are the splitting lines and whose leaves are the points
Kd-trees. Construction (figure): the same point set subdivided step by step by the splitting lines l1 to l10, with the tree growing accordingly
Z-ordering • Map points from 2 dimensions to 1 dimension, and use a B+-tree to index the 1-dimensional points • Basic assumption: finite precision in the representation of each coordinate, K bits ($2^K$ values) • The address space is a square (image) represented as a $2^K \times 2^K$ array
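Z-ordering • Impose a linear ordering on the pixels of the image, turning the problem into a 1-dimensional one • $Z_A = \mathrm{shuffle}(x_A, y_A) = \mathrm{shuffle}(01, 11) = 0111_2 = 7_{10}$ • $Z_B = \mathrm{shuffle}(01, 01) = 0011_2 = 3_{10}$ • (Figure: points A and B on a 4 × 4 grid whose axes are labeled 00, 01, 10, 11)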
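The shuffle operation is bit interleaving. The sketch below takes the x bit before the y bit at each position, which is the order that reproduces the slide's example shuffle(01, 11) = 0111 = 7. `z_value` and `z_order` are hypothetical names, and coordinates are assumed to be non-negative integers below $2^K$.

```python
def z_value(x, y, k):
    """Interleave the k-bit coordinates: x bit, then y bit, from the most significant bit down."""
    z = 0
    for bit in range(k - 1, -1, -1):
        z = (z << 1) | ((x >> bit) & 1)
        z = (z << 1) | ((y >> bit) & 1)
    return z


def z_order(k):
    """All pixels of the 2^k x 2^k image listed in z-order."""
    side = 1 << k
    return sorted(((x, y) for x in range(side) for y in range(side)),
                  key=lambda p: z_value(p[0], p[1], k))
```

With K = 2, `z_value(0b01, 0b11, 2)` is 7 (point A) and `z_value(0b01, 0b01, 2)` is 3 (point B). Listing the pixels by z-value visits each 2 × 2 quadrant completely before moving to the next, which is the "Z" pattern that gives the ordering its name.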
Z-ordering • Given a point (x, y) and the precision K, find the pixel for the point and then compute its z-value • Given a set of points, use a B+-tree to index the z-values • A range (rectangular) query in 2-d is mapped to a set of ranges in 1-d
Queries • Find the z-values contained in the query, then merge them into ranges • QA maps to the range [4, 7] • QB maps to the ranges [2, 3] and [8, 9] • (Figure: queries QA and QB drawn on the 4 × 4 grid with axes labeled 00, 01, 10, 11)
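Mapping a rectangle to 1-d ranges and answering it from a sorted index can be sketched as follows. Assumptions: the same x-bit-first interleaving as the shuffle example, a sorted list of (z, point) pairs standing in for the B+-tree, and a query decomposed by enumerating every pixel in the rectangle (fine for a 4 × 4 image; practical systems decompose the rectangle recursively instead). Taking QA as x in 0..1, y in 2..3 and QB as x in 1..2, y in 0..1 reproduces the ranges quoted for them. All function names are hypothetical.

```python
from bisect import bisect_left, bisect_right


def z_value(x, y, k):
    """Bit-interleave k-bit coordinates (x bit first), as in the shuffle example."""
    z = 0
    for bit in range(k - 1, -1, -1):
        z = (z << 1) | ((x >> bit) & 1)
        z = (z << 1) | ((y >> bit) & 1)
    return z


def query_ranges(x1, y1, x2, y2, k):
    """Map a rectangle of pixels to maximal runs of consecutive z-values."""
    zs = sorted(z_value(x, y, k)
                for x in range(x1, x2 + 1) for y in range(y1, y2 + 1))
    ranges = []
    for z in zs:
        if ranges and ranges[-1][1] == z - 1:
            ranges[-1][1] = z          # extend the current run
        else:
            ranges.append([z, z])      # start a new run
    return [tuple(r) for r in ranges]


def range_query(index, x1, y1, x2, y2, k):
    """`index` is a sorted list of (z, point) pairs standing in for a B+-tree."""
    keys = [z for z, _ in index]
    hits = []
    for lo, hi in query_ranges(x1, y1, x2, y2, k):
        # One 1-d range scan per z-range.
        hits += [p for _, p in index[bisect_left(keys, lo):bisect_right(keys, hi)]]
    return hits
```

`query_ranges(0, 2, 1, 3, 2)` gives [(4, 7)] and `query_ranges(1, 0, 2, 1, 2)` gives [(2, 3), (8, 9)]. QB needs two 1-d scans because it straddles a quadrant boundary, which is why a single 2-d rectangle can turn into several 1-d ranges.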
R-trees • [Guttman 84] Main idea: allow parents to overlap! • => guaranteed 50% utilization • => easier insertion/split algorithms. • (only deal with Minimum Bounding Rectangles - MBRs)
R-trees • A multi-way external memory tree • Index nodes and data (leaf) nodes • All leaf nodes appear on the same level • Every node except the root contains between m and M entries (m ≤ M/2) • The root node has at least 2 entries (children) unless it is a leaf
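Searching an R-tree means descending into every entry whose MBR intersects the query. Because parent MBRs may overlap, more than one subtree can qualify. The sketch below hand-builds a two-level tree and shows only the search; insertion and node splitting are omitted. It assumes rectangles are `(xmin, ymin, xmax, ymax)` tuples, and `RNode`, `index_node`, and `search` are hypothetical names.

```python
from dataclasses import dataclass


def mbr(rects):
    """Minimum bounding rectangle of (xmin, ymin, xmax, ymax) tuples."""
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))


def intersects(a, b):
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


@dataclass
class RNode:
    leaf: bool
    entries: list  # leaf: (mbr, object_id); index node: (mbr, child RNode)


def index_node(children):
    """Build an index node whose entries are the MBRs of its children."""
    return RNode(False, [(mbr([r for r, _ in c.entries]), c) for c in children])


def search(node, query, out=None):
    """Report ids of objects whose MBR intersects the query rectangle."""
    if out is None:
        out = []
    for rect, item in node.entries:
        if intersects(rect, query):
            if node.leaf:
                out.append(item)
            else:
                # Sibling MBRs may overlap, so more than one subtree can qualify.
                search(item, query, out)
    return out
```

Since the tree stores only MBRs, the ids returned are candidates. An exact-geometry check on each candidate would still be needed for objects that are not themselves rectangles.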