Approximate querying about the Past, the Present, and the Future in Spatio-Temporal Databases. Jimeng Sun, Dimitris Papadias, Yufei Tao, Bin Liu
Motivation • Spatio-temporal databases vs. data streams • Monitoring applications • Traffic supervision • Mobile user monitoring • Weather forecasting • Example: find the number of vehicles in the city center now • The challenge is to provide fast query responses in highly update-intensive environments
Problems and methods • Problems: • How to efficiently store/summarize the spatio-temporal information? • How to approximately answer queries about the past, the present, and the future? • Methods: • Adaptive multi-dimensional histogram (AMH) • Historical synopsis • Stochastic prediction method
Related work • Histograms • Static multi-dimensional histograms • Equi-depth, Mhist, Minskew, Genhist, SQ • Query-adaptive multi-dimensional histograms • STGrid, STHoles, SASH • Other approximation methods • DCT, Wavelet, Sketch • Spatio-temporal databases • Historical retrieval • Future prediction
Outline • Introduction • Problem and proposed methods • Adaptive multi-dimensional histogram • Historical synopsis • Prediction model • Experiment • Conclusion
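Query types • Present Time (PT), Historical Time (HT), and Future Time (FT) queries [Figure: queries plotted over location vs. time; HT queries refer to the past, PT queries to the current time, FT queries to the future]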
System Overview [Figure: spatio-temporal updates feed the AMH; PT queries are answered by the AMH, HT queries by the historical synopsis (past index), and FT queries by the prediction model]
Histogram • Partition the space into buckets • The data within a bucket are summarized by their mean • Properties of a good histogram: • Uniformity within each bucket • Incremental updatability [Figure: a bad and a good bucket partition]
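Summarizing each bucket by its mean implies a uniformity assumption when a query only partially overlaps a bucket. The following is a minimal Python sketch of a range-count estimate under that assumption; the `Bucket` class and `estimate_count` function are hypothetical names, and the sketch assumes 2D rectangular buckets and a rectangular query, not the authors' actual code.

```python
from dataclasses import dataclass


@dataclass
class Bucket:
    # Hypothetical bucket: region [x_lo, x_hi) x [y_lo, y_hi) and its object count.
    x_lo: float
    x_hi: float
    y_lo: float
    y_hi: float
    count: int

    def area(self) -> float:
        return (self.x_hi - self.x_lo) * (self.y_hi - self.y_lo)


def overlap_area(b: Bucket, qx_lo: float, qx_hi: float, qy_lo: float, qy_hi: float) -> float:
    # Area of the intersection between the bucket and the query rectangle (0 if disjoint).
    w = max(0.0, min(b.x_hi, qx_hi) - max(b.x_lo, qx_lo))
    h = max(0.0, min(b.y_hi, qy_hi) - max(b.y_lo, qy_lo))
    return w * h


def estimate_count(buckets, qx_lo, qx_hi, qy_lo, qy_hi) -> float:
    """Estimate the objects inside the query rectangle, assuming objects are
    spread uniformly inside each bucket (each bucket contributes its count
    times the fraction of its area covered by the query)."""
    total = 0.0
    for b in buckets:
        a = b.area()
        if a > 0:
            total += b.count * overlap_area(b, qx_lo, qx_hi, qy_lo, qy_hi) / a
    return total
```

For example, a query covering half of a 40-object bucket and half of an 8-object bucket is estimated at 24 objects. The estimate is only as good as the uniformity inside each bucket, which is why a good histogram aims for uniform buckets.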
Adaptive Multi-dimensional Histogram (AMH) • Objective: minimize $WVS = \sum_i \mathrm{area}_i \cdot \mathrm{var}_i$ over all buckets $i$ (as in Minskew [Acharya, Poosala, Ramaswamy 99]) [Figure: a grid of regular cells with object counts, partitioned into buckets b1 to b6; the BPT has internal nodes n1 to n5 and the buckets as leaves]
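To make the objective concrete, here is a minimal Python sketch that computes WVS for a given partition of a grid of regular cells. It assumes, in the Minskew style, that $\mathrm{area}_i$ is the number of cells in bucket $i$ and $\mathrm{var}_i$ is the variance of the cell counts inside it; the `Rect` encoding and the function names are hypothetical.

```python
from typing import List, Tuple

# Hypothetical bucket encoding: half-open cell ranges (row_lo, row_hi, col_lo, col_hi).
Rect = Tuple[int, int, int, int]


def bucket_area_var(grid: List[List[int]], rect: Rect) -> float:
    """area * variance of the cell counts inside one bucket."""
    r0, r1, c0, c1 = rect
    cells = [grid[r][c] for r in range(r0, r1) for c in range(c0, c1)]
    n = len(cells)  # area measured in cells (assumption)
    if n == 0:
        return 0.0
    mean = sum(cells) / n
    var = sum((x - mean) ** 2 for x in cells) / n
    return n * var


def wvs(grid: List[List[int]], buckets: List[Rect]) -> float:
    """WVS = sum over buckets of area_i * var_i."""
    return sum(bucket_area_var(grid, rect) for rect in buckets)
```

On a grid whose left half holds counts of 1 and right half counts of 9, splitting by columns gives WVS 0 (both buckets are uniform), while splitting by rows gives WVS 128, so the column split is the better histogram.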
Dynamic Maintenance of AMH • Our scheme: record information during construction and modify the structure as needed. • 1. Information update • Update the bucket counts • 2. Bucket reorganization • Merge: to reclaim buckets • Split: to reduce WVS
Information update of AMH [Figure: mapping between the buckets b1 to b6 in the data space and the leaves of the BPT (internal nodes n1 to n5)]
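The information-update step can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes per-cell counts on a regular grid, a precomputed cell-to-bucket mapping, and per-bucket totals plus sums of squared cell counts (which together give the variance used in WVS). `BucketStats`, `AMHCounts`, and `update` are hypothetical names.

```python
from typing import Dict, Optional, Tuple

Cell = Tuple[int, int]


class BucketStats:
    """Hypothetical per-bucket summary: cell count, sum and sum of squares of cell counts."""

    def __init__(self, n_cells: int):
        self.n_cells = n_cells
        self.total = 0   # number of objects in the bucket
        self.sum_sq = 0  # sum of squared cell counts (second moment times n_cells)

    def variance(self) -> float:
        mean = self.total / self.n_cells
        return self.sum_sq / self.n_cells - mean * mean


class AMHCounts:
    def __init__(self, cell_to_bucket: Dict[Cell, str]):
        self.cell_to_bucket = cell_to_bucket
        self.cell_count: Dict[Cell, int] = {c: 0 for c in cell_to_bucket}
        sizes: Dict[str, int] = {}
        for b in cell_to_bucket.values():
            sizes[b] = sizes.get(b, 0) + 1
        self.buckets = {b: BucketStats(n) for b, n in sizes.items()}

    def _adjust(self, cell: Cell, delta: int) -> None:
        c = self.cell_count[cell]
        new_c = c + delta
        if new_c < 0:
            raise ValueError(f"cell {cell} would have a negative count")
        stats = self.buckets[self.cell_to_bucket[cell]]
        stats.total += delta
        stats.sum_sq += new_c * new_c - c * c  # O(1) update of the second moment
        self.cell_count[cell] = new_c

    def update(self, old: Optional[Cell], new: Optional[Cell]) -> None:
        """Apply one location update; old=None is an insertion, new=None a deletion."""
        if old is not None:
            self._adjust(old, -1)
        if new is not None:
            self._adjust(new, +1)
```

Each update touches at most two buckets and costs O(1), so bucket statistics stay current without rebuilding the histogram.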
Bucket reorganization: Merge • Merge the subtree that leads to the minimal WVS increase [Figure: in the BPT, the sibling buckets b1 and b2 are merged into a single bucket b*] • Bucket info: 1. region [x-, x+] × [y-, y+] 2. frequency: count/area 3. second moment (for the variance calculation)
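With the bucket info above, the WVS change of a merge can be computed from summaries alone, because $\mathrm{area} \cdot \mathrm{var} = n \left(\frac{S_2}{n} - \left(\frac{S_1}{n}\right)^2\right) = S_2 - \frac{S_1^2}{n}$, where $n$ is the number of cells, $S_1$ the total count, and $S_2$ the sum of squared cell counts. A minimal Python sketch follows; it assumes candidate merges are pairs of sibling leaf buckets, and `Stats`, `wvs_term`, and `best_merge` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Stats:
    n_cells: int  # bucket area in cells
    total: int    # sum of cell counts
    sum_sq: int   # sum of squared cell counts


def wvs_term(s: Stats) -> float:
    # area * variance = sum_sq - total^2 / n_cells
    return s.sum_sq - s.total * s.total / s.n_cells


def merged(a: Stats, b: Stats) -> Stats:
    # The summaries of the union are just the sums of the parts.
    return Stats(a.n_cells + b.n_cells, a.total + b.total, a.sum_sq + b.sum_sq)


def best_merge(pairs: List[Tuple[str, Stats, str, Stats]]) -> Tuple[str, str, float]:
    """Return the sibling pair whose merge increases WVS the least."""
    best = None
    for name_a, a, name_b, b in pairs:
        increase = wvs_term(merged(a, b)) - wvs_term(a) - wvs_term(b)
        if best is None or increase < best[2]:
            best = (name_a, name_b, increase)
    if best is None:
        raise ValueError("no candidate pairs")
    return best
```

Merging two buckets with similar densities costs nothing, while merging a sparse and a dense bucket sharply increases WVS, so the greedy choice prefers the former.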
Bucket reorganization: Split • Split the bucket that leads to the maximal WVS decrease [Figure: bucket b* is split into sub-buckets (b*1 to b*4) and the BPT is extended accordingly]
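A split candidate can be evaluated the same way. The Python sketch below assumes binary, axis-aligned splits along cell boundaries (one BPT node becomes two leaves) and tries every such cut of one bucket; `term` and `best_split` are hypothetical names, and an actual AMH would likely avoid this brute-force scan.

```python
from typing import List, Optional, Tuple

# Hypothetical bucket encoding: half-open cell ranges (row_lo, row_hi, col_lo, col_hi).
Rect = Tuple[int, int, int, int]


def term(grid: List[List[int]], rect: Rect) -> float:
    """area * variance of a bucket, computed as sum_sq - total^2 / n."""
    r0, r1, c0, c1 = rect
    cells = [grid[r][c] for r in range(r0, r1) for c in range(c0, c1)]
    if not cells:
        return 0.0
    total = sum(cells)
    return sum(x * x for x in cells) - total * total / len(cells)


def best_split(grid: List[List[int]], rect: Rect) -> Optional[Tuple[Rect, Rect, float]]:
    """Return the binary cut of `rect` with the largest WVS decrease (None if 1x1)."""
    r0, r1, c0, c1 = rect
    before = term(grid, rect)
    candidates = []
    for r in range(r0 + 1, r1):  # horizontal cuts between rows
        candidates.append(((r0, r, c0, c1), (r, r1, c0, c1)))
    for c in range(c0 + 1, c1):  # vertical cuts between columns
        candidates.append(((r0, r1, c0, c), (r0, r1, c, c1)))
    best = None
    for left, right in candidates:
        decrease = before - term(grid, left) - term(grid, right)
        if best is None or decrease > best[2]:
            best = (left, right, decrease)
    return best
```

Pairing the cheapest merge with the most profitable split keeps the number of buckets fixed while lowering WVS.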
Features of AMH • Bucket information is updated as new data arrive • Bucket extents continuously adapt to changes in the data distribution • Maintenance does not affect normal query processing • It can be interrupted at any time • It is performed during CPU idle time
Historical Synopsis • The AMH maintains the current buckets. • The past index stores the obsolete buckets. • Past index (two alternatives): • Packed B-tree • 3D R-tree
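An HT query then reduces to finding the obsolete buckets that were valid at the query time and summing their overlap-weighted counts. In the minimal Python sketch below, a list kept sorted with the standard `bisect` module stands in for the packed B-tree; it assumes each archived bucket carries a validity interval [t_start, t_end) and relies on the same within-bucket uniformity assumption. `OldBucket` and `PastIndex` are hypothetical names.

```python
import bisect
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class OldBucket:
    t_start: float  # time the bucket came into use (assumed field)
    t_end: float    # time the bucket became obsolete (assumed field)
    region: Tuple[float, float, float, float]  # (x_lo, x_hi, y_lo, y_hi)
    count: int


class PastIndex:
    """Sorted-list stand-in for a B-tree keyed on t_end."""

    def __init__(self) -> None:
        self._ends: List[float] = []
        self._buckets: List[OldBucket] = []

    def archive(self, b: OldBucket) -> None:
        i = bisect.bisect_right(self._ends, b.t_end)
        self._ends.insert(i, b.t_end)
        self._buckets.insert(i, b)

    def query(self, t: float, qx_lo: float, qx_hi: float, qy_lo: float, qy_hi: float) -> float:
        """Estimated count in the query rectangle at historical time t."""
        start = bisect.bisect_right(self._ends, t)  # skip buckets with t_end <= t
        est = 0.0
        for b in self._buckets[start:]:
            if b.t_start <= t:  # bucket valid at t: t_start <= t < t_end
                x_lo, x_hi, y_lo, y_hi = b.region
                w = max(0.0, min(x_hi, qx_hi) - max(x_lo, qx_lo))
                h = max(0.0, min(y_hi, qy_hi) - max(y_lo, qy_lo))
                area = (x_hi - x_lo) * (y_hi - y_lo)
                if area > 0:
                    est += b.count * w * h / area
        return est
```

A 3D R-tree would instead index each bucket as a box in (x, y, t), which avoids scanning every later-expiring bucket; that is consistent with the trade-off reported in the experiments, where the R-tree answers queries faster and the B-tree absorbs updates better.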
Prediction Model • Prediction based on velocity doesn't work! • It is unrealistic to assume that velocity remains constant between the current time and the query time • Velocity is highly dynamic • We propose using only past and present location information for prediction.
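Prediction Model (cont.) [Figure: an FT query is parsed; HT results from the historical synopsis and PT results from the AMH are fed to the prediction model] • Forecast the future using any time-series prediction method; we use AR (autoregressive) models.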
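An AR(p) forecaster can be sketched in plain Python. The sketch assumes the input series is the sequence of estimated counts for the query region at recent timestamps (HT answers followed by the PT answer); it fits $x_t = c + \sum_{k=1}^{p} a_k x_{t-k}$ by ordinary least squares and iterates the model forward to the FT timestamp. `fit_ar` and `forecast` are hypothetical names, and the choice of order p and of the fitting method is an assumption.

```python
from typing import List


def _solve(A: List[List[float]], b: List[float]) -> List[float]:
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[piv][col]) < 1e-12:
            raise ValueError("singular system: use a smaller p or a longer history")
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def fit_ar(series: List[float], p: int) -> List[float]:
    """Least-squares fit of x_t = c + a_1 x_{t-1} + ... + a_p x_{t-p}; returns [c, a_1, ..., a_p]."""
    rows = [[1.0] + [series[t - k] for k in range(1, p + 1)] for t in range(p, len(series))]
    ys = [series[t] for t in range(p, len(series))]
    m = p + 1
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(m)] for i in range(m)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(m)]
    return _solve(xtx, xty)  # normal equations


def forecast(series: List[float], coeffs: List[float], steps: int) -> List[float]:
    """Iterate the fitted model forward, feeding each prediction back as input."""
    p = len(coeffs) - 1
    hist = list(series)
    out = []
    for _ in range(steps):
        nxt = coeffs[0] + sum(coeffs[k] * hist[-k] for k in range(1, p + 1))
        hist.append(nxt)
        out.append(nxt)
    return out
```

Because only observed counts are used, no velocity assumption enters the forecast; the model simply extrapolates the recent trend of the region's density.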
Outline • Introduction • Related work • Problem and proposed methods • Adaptive multi-dimensional histogram • Historical synopsis • Prediction model • Experiment • Conclusion
Experiment settings • Datasets • 2.5M updates for each dataset • spatial: 50K mobile objects from 2 spatial datasets • road: from a spatio-temporal generator (described in [Brinkhoff 2002]) [Figure: road network; data distribution at the initial, median, and final timestamps]
Robustness over time • Query: qlength = 6% of the data space; 25K queries uniformly distributed over space and time [Figure: results for the spatial and road datasets]
Comparison with a conventional histogram • Minskew (a static spatial histogram) is rebuilt every 50K location updates • tp is the ratio of AMH's cost to Minskew's cost • AMH's reorganization operations are uniformly distributed among the 50K location updates [Figure: Minskew vs. AMH on the spatial and road datasets]
The effect of update intensity • The B-tree performs better at high update rates. • The R-tree provides much faster query responses. • In general, when the query/update ratio is large (>30%), the R-tree performs better. [Figure: B-tree vs. 3D R-tree per query type on the road and spatial datasets]
Conclusion • We present a comprehensive approach for processing queries that refer to any time in history. • The proposed architecture maintains • an incremental multi-dimensional histogram; • a past index structure for storing the outdated buckets. • Future queries are answered by a stochastic method that uses the recent history to predict the future.
Summary • AMH: 0. goal: min(WVS) 1. information update 2. reorganization happens when the CPU is idle • Historical synopsis (past index): stores old buckets from the AMH; 1. recent buckets kept in memory 2. older buckets dumped to disk • Prediction model: forecast based on the present and the past
Related work • Static multi-dimensional histograms • Query-adaptive multi-dimensional histograms • Other multi-dimensional approximation methods • Spatio-temporal prediction methods • Spatio-temporal aggregation methods
Evaluation over different query types [Figure: results for the spatial and road datasets]
Motivation (cont.) • Spatio-temporal database (STDB) research: • historical retrieval • future prediction