310 likes | 324 Views
Explore the database shootout and benchmarking of spatial DBMSs at FOSS4G2007, with insights on performance, functionality, and user classifications.
E N D
Database Shootout: Benchmarking spatial DMBSs Wim de Haas Wilko Quak FOSS4G2007
FOSS4G2007 You’re in the eye of the storm! Early morning: Brock Anderson about WMS/PostGIS/Shapefile performance This afternoon: Kevin Neufeld about tips for the PostGIS power user Now: Reflect on the factor 10
FOSS4G2007 Overview • Introduction • What are the problems • A classification of Spatial DBMS users • How can we help them • Benchmark proposal • First test results • Next steps
FOSS4G2007 Introducing the Ministry of Transport, Public Works and Water Management Our core tasks are: • to offer protection against floods • to guarantee safeand reliable connections over land, water and through the air • to ensure clean and sufficient water • Rijkswaterstaat (RWS) is the executive branche of the Ministry of Transport
FOSS4G2007 Business drivers • How to keep track of all these assets? • How to ensure consistency & coherence in operations and change of Rijkswaterstaat? • How to facilitate decisionmaking and communication • Enter the Digitaal Topografisch Bestand • 3D • 1:1000 • EUR 60M
FOSS4G2007 DTB waterway and highway
FOSS4G2007 DTB Birds eye view
FOSS4G2007 DTB Amsterdam Airport
FOSS4G2007 Enter IVRI • The new system for data acquisition and maintenance for the DTB • Oracle 10g • ArcGIS 9.2 • Summit Evolution
FOSS4G2007 First comment on Murphy’s Law • Murphy was an optimist • Oracle and ESRI in a blame game • Took extra time in the project • Triggered to be less dependent on Oracle • Oracle Spatial is not cheap, so can we use PostGIS as the main datastore for Spatial data?
FOSS4G2007 Why bother … Stonebraker2007: • Where to find dramatic differences in Spatial DBMSs? We define “dramatically outperform” to mean at least a factor 10 advantage […then] customers will be inclined to try the new architecture
FOSS4G2007 Where to expect Dramatic differences? • Linux vs Windows. (No) • MySQL vs PostGIS (Yes) • Choice of FileSystem (Maybe) • Functionality Difference (Yes) • Choice of Parameters (Maybe)
FOSS4G2007 Problems with testing • DBMS vendors do not want published results • Oracle explicitly forbids publishing benchmark results • Hardware • Moore’s Law • I/O • Release Frequency of Software • Spatial testing cannot be done on synthetic data • Too many parameters Benchmark results are outdated before they are publised
flat query geometry FOSS4G2007 Benchmark consideration: Weird Cases department diagonal query geometry
FOSS4G2007 Benchmark consideration: Hot vs Cold
FOSS4G2007 Solution • Do not publish the result of the benchmark • Publish a framework that lets people do their own benchmarking • No “One size fits all”: Buyer’s guide • Help different users to find best DBMS
FOSS4G2007 Classification of spatial DMBS users Four classes: • Server Builders: publish spatial data via web server • GIS User: Load various datasets and perform complex analyses • Data Maintainer: Maintain one core dataset • Power Users: All of the above and more
FOSS4G2007 Class 1: Web Server Builders • You do not really need a DBMS for this (You use a fraction of DBMS functionality) • Only one query counts: Find everything within BBOX
FOSS4G2007 Class 2: GIS users • Main interest is functionality • Spend more time on loading data • Need a good query optimiser • Analysis
FOSS4G2007 Class 3: Dataset Maintainers • Limited number of queries • Transactions are an issue • Clustering of data after updates is interesting
FOSS4G2007 Class 4: Power users • Do their own testing • Need a platform to discuss their findings
FOSS4G2007 Benchmark components • Functionality test • Literature review • Factual testing • Very simple performance test script with few parameters • BBOX Query • Fixed Dataset (Propasal OpenStreetMap dataset) • Configurable test suite • Full Suite that tests every corner of DBMS • For specialists only
MySQL Almost all operations in MySQL return the same result as the corresponding MBR-based functions. PostGIS Full OpenGIS support. FOSS4G2007 Test 1 – Functionality:MySQL vs PostGIS Functionality of MySQL is only suited for simple WMS support and no spatial operations are done on geometry
FOSS4G2007 Test 2: simple BBOX select Write simple script that generates a lot of rectangle queries. Paremeter: • DBMS size • query box size • experiment length
FOSS4G2007 Test 2: grow DBMS size • Question: Does query response time depend on DBMS size or on core memory? • Experiment: Run same test on more an more copies of same database.
FOSS4G2007 Test 2 – result: PostGIS vs MySQL
FOSS4G2007 Test 2 – result: Conclusions • As long as dataset fits in core memory differences are small. • MySQL can do more with less memery. • MySQL degrades faster if you run out of memory
FOSS4G2007 Test 3: Comprehensive Test Suite • Create set of killer polygons so that every line of source code will be touched by running operations. • Test Query optimizer • Test Join Operator • Must be done with Skewed Data
FOSS4G2007 Conclusions (overall) • This is a work in progress • The factor 10 is not within reach
FOSS4G2007 How to proceed • Finish the work and publish these • TU Delft wiki or osgeo.org wiki? • Start a Special Interest Group a.k.a. Chapter?
FOSS4G2007 Questions