190 likes | 324 Views
Optimal insert methods of geographical information to Spatio-temporal DB. June 17,2012. Industrial Project 234313. Final Presentation. Students: Michael Tsalenko & Belal Kabat Supervisor: Flora Gilboa. Introduction.
E N D
Optimal insert methods of geographical information to Spatio-temporal DB June 17,2012 Industrial Project 234313 Final Presentation Students: Michael Tsalenko & Belal Kabat Supervisor: Flora Gilboa
Introduction • Research ways of inserting and querying geographical information related to video represented as polygons or circles into geographic DB.
Goals • Finding the optimal way of inserting Geo-data by determining the best representation of the captured polygon. • What representation is the optimal when querying the Data Base. • Determine whether amount of vertices of the polygon affect the querying efficiency.
Methodology: Main Steps • Data creation • Reading data into memory (from file system). • Measuring insertion times. • Measuring querying times. • Adding tables with index using R-tree. • Parallelizing insertion and querying. • Statistical examination of the results.
Methodology-Data creation (1): We chose 10 different locations, and randomly created 10,000 quad polygons around each location. Our data set consists of 100,000 polygons in total.
Methodology-Data creation (2): Snapshot of the random data A closer look at Haifa
Methodology-Data creation (3): • Intuitively circle data manipulation should be more efficient than quad data as it has less fields (3 vs. 8). • So we bounded each quad of the 100,000 with a circle and later improved it to a minimal bounding circle (to compare between different data representations of same locations). (32.75762277268422, 34.97107551062562), (32.660576514078066, 34.97748364184535), (32.828346159124294, 35.086360632091974), (32.91352284093535, 35.023992366269674) ((32.78704967750671, 35.028718071358796), (0.13898966103824473))
Methodology- Reading into memory • In order to improve the insertion time we have created a pool which included all the data, so when we insert the data to the data-base we read immediately from the pool and not from the files, to make as less operations as possible after the connection to the server was made.
Methodology- Measuring insertion time. • We have created 2 tables, one for quad polygons and one for circles. For instance, the circles table: id Asset name Area Circle#1 1 ((32.7870496671, 35.02871358796), (0.138989664473)) 2 Circle#2 ((29.927900734077, 31.2508816016), (0.27799441977)) 3 Circle#3 ((33.87026129199, 35.5400182072), (0.2372748702324))
Methodology- Measuring insertion and querying time. • We started to insert 100,000 objects to each table, but after few hours of running we stopped the process (single thread), and reduced the data to 10,000 objects, the difference between the times of both insertion and querying was not significant, so we decided to check with indexed tables* for both circles and polygons. • (and started to plan parallelizing the research due to the 100,000 objects insertion inefficiency)
Methodology- Measuring insertion time. • Indexed tables: Are tables which are built using R-tree Data structure, which is a tree data structure used for spatial access methods, similar to the B-tree, R-tree is also a balanced search tree, so all leaf nodes at the same height.
Methodology- Measuring insertion time. • The difference between the results of the querying and insertion to indexed and non-indexed tables were not significant: • (*) Hybrid circle in polygon means the query was to select a circle intersections in polygons table, and Hybrid polygon in circle means the opposite.
Parallelizing inserting. • We created 10 threads, each thread inserted 10,000 objects to each one of the 2 indexed tables (quads, circles). • This parallelizing enabled us to insert 100,000 objects in a reasonable time (~15mins instead of more than 2 hours when single thread). We stopped inserting into the non-indexed tables since we saw that indexed tables give better results. We can see that the circles insertion is better than quads insertion by ~13%.
Parallelizing querying. • We created 10 threads, each thread inserted 10,000 object to each one of the 4 tables (quads, circles) • The difference between the results of the tables were not significant.
Methodology- vertices amount • In order to check if the number of vertices in a polygon makes a difference, we have converted each polygon to an octagon in the following way: We added a vertex in the middle of each edge of original quadrate and added a factor of 0.001, which will prevent DB vendor optimizations
Methodology- vertices amount • Insertion results: We can see that the insertion to octagon is worse than to the quad polygons by ~7% and to the cycles by ~18%
Methodology- vertices amount • Queries results: The results show that there is no significant difference between the querying to the tables. But it may be expected that on average intersecting a circle with quad polygons geo-data yields the best results.
Achievements & Conclusions: • We helped the IBM researchers to decide that the best way to represent the geo-data (for both insertion and querying) is just to keep its original representation (no mistake areas) – as long as it is up to 8 vertices (or a circle), and if it has more than 8 vertices – further research is needed. • There is no need to look for other heuristics for data manipulation (bounding or bounded circles, dividing into several geo-objects, etc…).
Achievements & Conclusions: • We learned about: • The importance of making a good work plan – founding the basics, ensure all works and building on them the needed extensions. • How the indexed tables work • Using JDBC with geodetic extension • Statistical tools and analyzing results • It’s very helpful to have a supervisor who is an expert in the project field and can direct us properly and save lots of time.