Introduction to Spatial Database Research
Donghui Zhang
CCIS, Northeastern University
What is a spatial database?
• A database system that is optimized to store and query spatial objects:
  • Point: a hotel, a car
  • Line: a road segment
  • Polygon: landmarks, VLSI layouts
(Figures: road network, satellite image, VLSI layout)
Are spatial databases useful?
• Geographical Information Systems
  • e.g. data: road network and places of interest.
  • e.g. usage: driving directions, emergency calls, standalone applications.
• Environmental Systems
  • e.g. data: land cover, climate, rainfall, and forest fires.
  • e.g. usage: find the total rainfall precipitation.
• Corporate Decision-Support Systems
  • e.g. data: store locations and customer locations.
  • e.g. usage: determine the optimal location for a new store.
• Battlefield Soldier Monitoring Systems
  • e.g. data: locations of soldiers (with or without medical equipment).
  • e.g. usage: for each soldier with medical equipment, monitor the soldiers who may need help from him or her.
Shortest-path query and fastest-path query (screenshot: MapQuest.com)
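Both queries can be answered by a single shortest-path algorithm that differs only in its edge weights: segment lengths give the shortest path, travel times give the fastest path. The sketch below uses Dijkstra's algorithm on a tiny hypothetical one-way road network with made-up lengths and times. It illustrates the query and makes no claim about how MapQuest or any production router works.

```python
import heapq
from typing import Dict, List, Tuple

Graph = Dict[str, List[Tuple[str, float]]]

def dijkstra(graph: Graph, source: str, target: str) -> Tuple[float, List[str]]:
    """Return (cost, path) of a cheapest path, or (inf, []) if target is unreachable."""
    dist = {source: 0.0}
    prev: Dict[str, str] = {}
    heap = [(0.0, source)]
    done = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue  # stale heap entry
        done.add(u)
        if u == target:
            break
        for v, w in graph.get(u, []):
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                prev[v] = u
                heapq.heappush(heap, (d + w, v))
    if target not in dist:
        return float("inf"), []
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return dist[target], path[::-1]

# Hypothetical one-way network: A->B->D is shorter, A->C->D (a highway) is faster.
LENGTH_KM: Graph = {"A": [("B", 2), ("C", 3)], "B": [("D", 2)], "C": [("D", 3)]}
TIME_MIN: Graph = {"A": [("B", 5), ("C", 2)], "B": [("D", 5)], "C": [("D", 2)]}

print(dijkstra(LENGTH_KM, "A", "D"))  # (4.0, ['A', 'B', 'D'])
print(dijkstra(TIME_MIN, "A", "D"))   # (4.0, ['A', 'C', 'D'])
```

The same function answers both queries; only the weight map passed in changes.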
NN query
• Driving directions as you go.
• Find the nearest Wal-Mart or hospital.
Range query (screenshot: ArcGIS 9.2, ESRI)
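A range (window) query returns every object that lies inside a query region. The sketch below assumes point objects and an axis-parallel rectangle and uses a linear scan for clarity; a spatial database would normally answer this with a spatial index such as an R-tree. The place names and coordinates are hypothetical.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]

def range_query(objects: Dict[str, Point], lo: Point, hi: Point) -> List[str]:
    """Names of the points inside the closed rectangle with corners lo and hi (linear scan)."""
    (xmin, ymin), (xmax, ymax) = lo, hi
    return [name for name, (x, y) in objects.items()
            if xmin <= x <= xmax and ymin <= y <= ymax]

# Hypothetical places of interest.
places = {"hotel": (2.0, 3.0), "hospital": (5.0, 1.0),
          "school": (8.0, 8.0), "cafe": (4.0, 4.0)}

print(range_query(places, (0.0, 0.0), (5.0, 5.0)))  # ['hotel', 'hospital', 'cafe']
```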
NN(Bob) = George
(Figure: points labeled George, John, Bob, Bill, and Mike)
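A nearest-neighbor (NN) query returns the object closest to the query object. The figure gives no coordinates, so the positions below are hypothetical, chosen only so that the answers match the slides. The sketch uses Euclidean distance and a linear scan; an index-based NN search would be used on real data.

```python
import math
from typing import Dict, Tuple

Point = Tuple[float, float]

# Hypothetical coordinates; the slide's figure does not give positions.
PEOPLE: Dict[str, Point] = {
    "George": (1.0, 0.0), "John": (-1.2, 0.0), "Bob": (0.0, 0.0),
    "Bill": (1.5, 0.0), "Mike": (0.0, 1.3),
}

def nn(query: str, objects: Dict[str, Point]) -> str:
    """Nearest neighbor of `query` among the other objects (Euclidean, linear scan)."""
    q = objects[query]
    return min((name for name in objects if name != query),
               key=lambda name: math.dist(q, objects[name]))

print(nn("Bob", PEOPLE))  # George
```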
RNN query: Who will seek help from me? RNN(Bob) = {John, Mike}
(Figure: the same points George, John, Bob, Bill, and Mike)
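A reverse nearest-neighbor (RNN) query asks the opposite question: which objects have the query object as *their* nearest neighbor? The brute-force sketch below (quadratic time) reuses hypothetical coordinates chosen so that NN(Bob) = George while RNN(Bob) = {John, Mike}, which shows that the two queries generally give different answers.

```python
import math
from typing import Dict, Set, Tuple

Point = Tuple[float, float]

# Hypothetical coordinates; the slide's figure does not give positions.
PEOPLE: Dict[str, Point] = {
    "George": (1.0, 0.0), "John": (-1.2, 0.0), "Bob": (0.0, 0.0),
    "Bill": (1.5, 0.0), "Mike": (0.0, 1.3),
}

def nn(query: str, objects: Dict[str, Point]) -> str:
    """Nearest neighbor of `query` among the other objects (Euclidean)."""
    q = objects[query]
    return min((name for name in objects if name != query),
               key=lambda name: math.dist(q, objects[name]))

def rnn(query: str, objects: Dict[str, Point]) -> Set[str]:
    """All objects whose nearest neighbor is `query` (brute force)."""
    return {o for o in objects if o != query and nn(o, objects) == query}

print(nn("Bob", PEOPLE))           # George
print(sorted(rnn("Bob", PEOPLE)))  # ['John', 'Mike']
```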
And beyond the “space” …
• 2004 NBA dataset*: each player has 17 attributes.
• “Spatial data”: an object is a point in a 17-dimensional space.
• Who are the best players?
  • i.e. those not “dominated” by any other player.
Skyline query
* www.databaseBasketball.com
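"Dominated" has a precise meaning here: player a dominates player b if a is at least as good as b in every attribute and strictly better in at least one. The skyline is the set of players that nobody dominates. The naive quadratic sketch below assumes higher values are better (as for points or rebounds) and uses hypothetical players with three made-up attributes rather than the 17 real ones.

```python
from typing import Dict, Sequence, Set

def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """a dominates b: no worse in every attribute, strictly better in one (higher is better)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def skyline(objects: Dict[str, Sequence[float]]) -> Set[str]:
    """Naive O(n^2) skyline: keep every object that no other object dominates."""
    return {name for name, v in objects.items()
            if not any(dominates(w, v) for other, w in objects.items() if other != name)}

# Hypothetical players described by (points, rebounds, assists) per game.
players = {"P1": (25, 5, 8), "P2": (20, 12, 3), "P3": (18, 4, 6), "P4": (25, 5, 7)}
print(sorted(skyline(players)))  # ['P1', 'P2']
```

P2 survives even though P1 scores more, because no one beats P2 on rebounds; the skyline keeps every such trade-off.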
Subspace Skyline Queries
(Figures: skyline in subspace {u1, u3} and skyline in subspace {u2, u3}, over objects t1 to t7)
• In an online skyline processing system, users may ask skyline queries on any subspace, i.e. any subset of the attributes.
• Different subspace skylines can be very different!
Objects in 4 dimensions:
      u1  u2  u3  u4
t1     3   4   2   5
t2     4   6   7   2
t3     9   7   5   6
t4     4   3   6   1
t5     2   2   3   1
t6     6   1   1   3
t7     1   3   4   1
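To see how different two subspace skylines can be, the sketch below computes them for the 4-dimensional table above. The slide does not say which direction is better; treating smaller values as better is an assumption, but it is consistent with the skycube on the following slides (e.g. t7 alone is the skyline of {u1}). Dominance is simply restricted to the attributes of the queried subspace.

```python
from typing import Dict, Iterable, Set, Tuple

DIMS = ("u1", "u2", "u3", "u4")
# The example objects t1..t7 from the slide; smaller values are assumed better.
DATA: Dict[str, Tuple[int, ...]] = {
    "t1": (3, 4, 2, 5), "t2": (4, 6, 7, 2), "t3": (9, 7, 5, 6), "t4": (4, 3, 6, 1),
    "t5": (2, 2, 3, 1), "t6": (6, 1, 1, 3), "t7": (1, 3, 4, 1),
}

def dominates(a, b, dims: Iterable[str]) -> bool:
    """a dominates b within `dims`: no worse in any of them, strictly better in one."""
    idx = [DIMS.index(d) for d in dims]
    return all(a[i] <= b[i] for i in idx) and any(a[i] < b[i] for i in idx)

def subspace_skyline(data: Dict[str, Tuple[int, ...]], dims: Iterable[str]) -> Set[str]:
    """Objects not dominated by any other object when only `dims` are considered."""
    dims = list(dims)
    return {t for t, v in data.items()
            if not any(dominates(w, v, dims) for s, w in data.items() if s != t)}

print(sorted(subspace_skyline(DATA, ["u1", "u3"])))  # ['t1', 't5', 't6', 't7']
print(sorted(subspace_skyline(DATA, ["u2", "u3"])))  # ['t6']
```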
Straightforward Solutions
• On-the-fly computation
  • Slow query processing
• Pre-compute and store all subspace skylines
  • High update costs
  • No update support
  • Waste of storage
The Compressed Skycube [XZ06]
• Compact storage
  • Represents all subspace skylines very concisely, by preserving only the essential information of each subspace skyline.
• Efficient query support
  • Efficiently answers arbitrary subspace skyline queries without accessing the original data.
• Efficient update scheme
  • Avoids unnecessary data access and subspace skyline computation upon updates.
The Complete Pre-computation: the Skycube
Subspace            Skyline
{u1}                t7
{u2}                t6
{u3}                t6
{u4}                t4, t5, t7
{u1, u2}            t5, t6, t7, t9
{u1, u3}            t1, t5, t6, t7, t9
{u1, u4}            t7
{u2, u3}            t6
{u2, u4}            t5, t6
{u3, u4}            t5, t6
{u1, u2, u3}        t1, t5, t6, t7, t9
{u1, u2, u4}        t5, t6, t7
{u1, u3, u4}        t1, t5, t6, t7
{u2, u3, u4}        t5, t6
{u1, u2, u3, u4}    t1, t5, t6, t7
• The skycube contains many duplicates, e.g. t6 appears 12 times.
Data:
      u1  u2  u3  u4
t1     3   4   2   5
t2     4   6   7   2
t3     9   7   5   6
t4     4   3   6   1
t5     2   2   3   1
t6     6   1   1   3
t7     1   3   4   1
t8     6   5   3   8
t9     2   2   3   7
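The duplicate count can be checked by brute force: compute one skyline for each of the 2^4 - 1 = 15 non-empty subspaces of the 9-object table. The sketch below does exactly that, again assuming smaller values are better; it is a naive illustration of what the skycube stores, not an efficient construction algorithm.

```python
from itertools import combinations
from typing import Dict, Iterable, Set, Tuple

DIMS = ("u1", "u2", "u3", "u4")
# The 9 example objects from the slide; smaller values are assumed better.
DATA: Dict[str, Tuple[int, ...]] = {
    "t1": (3, 4, 2, 5), "t2": (4, 6, 7, 2), "t3": (9, 7, 5, 6),
    "t4": (4, 3, 6, 1), "t5": (2, 2, 3, 1), "t6": (6, 1, 1, 3),
    "t7": (1, 3, 4, 1), "t8": (6, 5, 3, 8), "t9": (2, 2, 3, 7),
}

def dominates(a, b, dims: Iterable[str]) -> bool:
    """a dominates b within `dims`: no worse in any of them, strictly better in one."""
    idx = [DIMS.index(d) for d in dims]
    return all(a[i] <= b[i] for i in idx) and any(a[i] < b[i] for i in idx)

def subspace_skyline(data, dims) -> Set[str]:
    return {t for t, v in data.items()
            if not any(dominates(w, v, dims) for s, w in data.items() if s != t)}

def skycube(data) -> Dict[Tuple[str, ...], Set[str]]:
    """One skyline per non-empty subset of DIMS (15 subspaces for 4 dimensions)."""
    return {dims: subspace_skyline(data, dims)
            for k in range(1, len(DIMS) + 1) for dims in combinations(DIMS, k)}

cube = skycube(DATA)
print(sum("t6" in sky for sky in cube.values()))  # 12
print(sum(len(sky) for sky in cube.values()))     # 39 stored entries in total
```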
Minimum Subspaces (mss)
• A subspace U is a minimum subspace of object t if t is in the skyline of U but not in the skyline of any proper subset of U.
Object    Minimum subspaces
t1        {u1, u3}
t4        {u4}
t5        {u4}, {u1, u2}, {u1, u3}
t6        {u2}, {u3}
t7        {u1}, {u4}
t9        {u1, u2}, {u1, u3}
• Object t6 appears in the skylines of 12 subspaces.
• The number of minimum subspaces of t6 is only 2.
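The minimum subspaces can be derived mechanically from the skycube: collect every subspace whose skyline contains t, then keep only those with no proper subset in that collection. The brute-force sketch below does this for the 9-object example (smaller values assumed better); subspaces are frozensets so that `V < U` tests proper inclusion. Objects that appear in no skyline (t2, t3, t8) end up with an empty set.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, Set, Tuple

DIMS = ("u1", "u2", "u3", "u4")
DATA: Dict[str, Tuple[int, ...]] = {
    "t1": (3, 4, 2, 5), "t2": (4, 6, 7, 2), "t3": (9, 7, 5, 6),
    "t4": (4, 3, 6, 1), "t5": (2, 2, 3, 1), "t6": (6, 1, 1, 3),
    "t7": (1, 3, 4, 1), "t8": (6, 5, 3, 8), "t9": (2, 2, 3, 7),
}
Subspace = FrozenSet[str]

def dominates(a, b, dims: Iterable[str]) -> bool:
    idx = [DIMS.index(d) for d in dims]
    return all(a[i] <= b[i] for i in idx) and any(a[i] < b[i] for i in idx)

def subspace_skyline(data, dims) -> Set[str]:
    return {t for t, v in data.items()
            if not any(dominates(w, v, dims) for s, w in data.items() if s != t)}

def minimum_subspaces(data) -> Dict[str, Set[Subspace]]:
    """mss(t): subspaces U with t in sky(U) and t in no sky(V) for V a proper subset of U."""
    subspaces = [frozenset(c) for k in range(1, len(DIMS) + 1)
                 for c in combinations(DIMS, k)]
    cube = {u: subspace_skyline(data, u) for u in subspaces}
    mss = {}
    for t in data:
        home = [u for u in subspaces if t in cube[u]]  # subspaces whose skyline holds t
        mss[t] = {u for u in home if not any(v < u for v in home)}  # keep the minimal ones
    return mss

mss = minimum_subspaces(DATA)
for t in sorted(mss):
    if mss[t]:
        print(t, sorted(sorted(u) for u in mss[t]))
```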
The Compressed Skycube (CSC)
Subspace      Objects stored in the CSC
{u1}          t7
{u2}          t6
{u3}          t6
{u4}          t4, t5, t7
{u1, u2}      t5, t9
{u1, u3}      t1, t5, t9
• Definition: The Compressed Skycube (CSC) consists of the non-empty subspaces U, where an object t is stored in subspace U if and only if U is a minimum subspace of t, i.e. U ∈ mss(t).
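By this definition, building the CSC is just an inversion of the mss table. The sketch below hardcodes mss(t) as listed on the Minimum Subspaces slide rather than recomputing it, and uses frozensets as subspace keys; the name `build_csc` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Set

# mss(t) as listed on the slide; objects with no minimum subspace are omitted.
MSS: Dict[str, List[FrozenSet[str]]] = {
    "t1": [frozenset({"u1", "u3"})],
    "t4": [frozenset({"u4"})],
    "t5": [frozenset({"u4"}), frozenset({"u1", "u2"}), frozenset({"u1", "u3"})],
    "t6": [frozenset({"u2"}), frozenset({"u3"})],
    "t7": [frozenset({"u1"}), frozenset({"u4"})],
    "t9": [frozenset({"u1", "u2"}), frozenset({"u1", "u3"})],
}

def build_csc(mss: Dict[str, List[FrozenSet[str]]]) -> Dict[FrozenSet[str], Set[str]]:
    """Store t in subspace U iff U is in mss(t); only non-empty subspaces appear."""
    csc: Dict[FrozenSet[str], Set[str]] = defaultdict(set)
    for t, subspaces in mss.items():
        for u in subspaces:
            csc[u].add(t)
    return dict(csc)

csc = build_csc(MSS)
for u in sorted(csc, key=lambda u: (len(u), sorted(u))):
    print(sorted(u), sorted(csc[u]))
print(sum(len(objs) for objs in csc.values()))  # 11
```

The 11 stored entries compare with 39 entries (15 skylines) in the complete skycube table.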
Querying CSC
Example: find the skyline in subspace {u2, u3, u4}. The answer, t5 and t6, is found by visiting only the CSC, not the whole dataset.
• Theorem 1: Given a query space Uq and an object t, if Ui ⊄ Uq for every subspace Ui ∈ mss(t), then t is not in the skyline of Uq.
  • Hence, search only the subspaces that are subsets of the query space.
• Theorem 2 (Local Comparison): To check a candidate t stored in a subspace V ⊆ Uq, we only need to compare t with the objects stored in the same subspace V.
  • Hence, compare candidates only within their own subspaces.
• The output is non-blocking!
Data of the objects stored in the CSC:
      u1  u2  u3  u4
t1     3   4   2   5
t4     4   3   6   1
t5     2   2   3   1
t6     6   1   1   3
t7     1   3   4   1
t9     2   2   3   7
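The two theorems translate into a short query procedure: visit only the stored subspaces contained in the query space (Theorem 1), and test each candidate only against the objects stored in the same subspace (Theorem 2). The sketch below hardcodes the CSC and the stored objects' values from the slides, assumes smaller values are better, and yields each answer as soon as it is confirmed. The name `query_csc` is hypothetical, and the `reported` set is our own detail, needed because an object such as t6 is stored in more than one qualifying subspace.

```python
from typing import Dict, FrozenSet, Iterable, Iterator, Set, Tuple

DIMS = ("u1", "u2", "u3", "u4")
# Values of the objects stored in the CSC (t2, t3, t8 are in no skyline, so not stored).
DATA: Dict[str, Tuple[int, ...]] = {
    "t1": (3, 4, 2, 5), "t4": (4, 3, 6, 1), "t5": (2, 2, 3, 1),
    "t6": (6, 1, 1, 3), "t7": (1, 3, 4, 1), "t9": (2, 2, 3, 7),
}
CSC: Dict[FrozenSet[str], Set[str]] = {
    frozenset({"u1"}): {"t7"},
    frozenset({"u2"}): {"t6"},
    frozenset({"u3"}): {"t6"},
    frozenset({"u4"}): {"t4", "t5", "t7"},
    frozenset({"u1", "u2"}): {"t5", "t9"},
    frozenset({"u1", "u3"}): {"t1", "t5", "t9"},
}

def dominates(a, b, dims: Iterable[str]) -> bool:
    idx = [DIMS.index(d) for d in dims]
    return all(a[i] <= b[i] for i in idx) and any(a[i] < b[i] for i in idx)

def query_csc(csc, data, query: Iterable[str]) -> Iterator[str]:
    """Yield the skyline of subspace `query` using only CSC objects, one answer at a time."""
    uq = frozenset(query)
    reported: Set[str] = set()
    for v, stored in csc.items():
        if not v <= uq:
            continue  # Theorem 1: only subspaces contained in the query can hold answers
        for t in stored:
            if t in reported:
                continue  # already output via another subspace
            # Theorem 2 (local comparison): compare t only with objects stored in v
            if not any(dominates(data[s], data[t], uq) for s in stored if s != t):
                reported.add(t)
                yield t  # non-blocking: emitted as soon as it is confirmed

print(sorted(query_csc(CSC, DATA, {"u2", "u3", "u4"})))  # ['t5', 't6']
```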
Updating CSC
• sky(full): the skyline with respect to all dimensions.
• t: the object to be updated.
• Theorem: upon an update, there is no need to access the original data if t ∉ sky(full).
• Efficient algorithms exist for both cases (t ∈ sky(full) and t ∉ sky(full)).
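The deletion side of this theorem can be illustrated with a short argument of our own: if t ∉ sky(full), some object r dominates t in the full space, so in every subspace r also dominates every object that t dominates there. Deleting t therefore changes no other object's minimum subspaces, and the CSC is updated by simply dropping t. The sketch below assumes smaller values are better and that the CSC keeps the values of its stored objects; `build_csc` (brute force) and `delete` are hypothetical names, and the fallback for t ∈ sky(full) is a naive rebuild, not the efficient algorithm the slide refers to.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Set, Tuple

DIMS = ("u1", "u2", "u3", "u4")
FULL = frozenset(DIMS)
DATA: Dict[str, Tuple[int, ...]] = {
    "t1": (3, 4, 2, 5), "t2": (4, 6, 7, 2), "t3": (9, 7, 5, 6),
    "t4": (4, 3, 6, 1), "t5": (2, 2, 3, 1), "t6": (6, 1, 1, 3),
    "t7": (1, 3, 4, 1), "t8": (6, 5, 3, 8), "t9": (2, 2, 3, 7),
}
Csc = Dict[FrozenSet[str], Set[str]]

def dominates(a, b, dims) -> bool:
    idx = [DIMS.index(d) for d in dims]
    return all(a[i] <= b[i] for i in idx) and any(a[i] < b[i] for i in idx)

def skyline(data, dims) -> Set[str]:
    return {t for t, v in data.items()
            if not any(dominates(w, v, dims) for s, w in data.items() if s != t)}

def build_csc(data) -> Csc:
    """Brute-force CSC: store t in U iff U is a minimum subspace of t."""
    subspaces = [frozenset(c) for k in range(1, len(DIMS) + 1)
                 for c in combinations(DIMS, k)]
    cube = {u: skyline(data, u) for u in subspaces}
    csc: Csc = {}
    for u in subspaces:
        stored = {t for t in cube[u]
                  if not any(v < u and t in cube[v] for v in subspaces)}
        if stored:
            csc[u] = stored
    return csc

def delete(csc: Csc, data, t: str) -> Tuple[Csc, bool]:
    """Delete t; return (new CSC, whether the original dataset had to be read).

    The fast path reads only the values of t and of objects already in the CSC.
    """
    stored = set().union(*csc.values())
    # Every full-space skyline object is stored, so checking stored objects suffices.
    in_full_sky = t in stored and not any(
        dominates(data[s], data[t], FULL) for s in stored if s != t)
    if not in_full_sky:
        # t is dominated in the full space: no other object's mss changes, so drop t.
        return {u: objs - {t} for u, objs in csc.items() if objs - {t}}, False
    # Placeholder for the t in sky(full) case: naive rebuild from the remaining data.
    return build_csc({s: v for s, v in data.items() if s != t}), True

csc = build_csc(DATA)
new_csc, read_data = delete(csc, DATA, "t9")  # t9 is dominated by t5 in the full space
print(read_data)  # False
```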
Performance
• (Full-space) dimensionality: 6
• Object cardinality: 100K to 500K
• Distribution: uniform
(Figures: update efficiency, storage efficiency, query efficiency)
Summary
• Spatial databases have many practical applications.
• Spatial database research aims to design efficient algorithms for various queries.
• This talk mentioned a few: range queries, aggregation queries, NN queries, RNN queries, optimal-location queries, fastest-path queries, and skyline queries.
• There are many more; it is an ongoing research field.