1 / 34

Extending a Relational Database Intelligent storage of geospatial data John.Pickford@uk.ibm

Extending a Relational Database Intelligent storage of geospatial data John.Pickford@uk.ibm.com. 07/09/2014. Agenda. History Current methods Problems with current methods Solution – extend database to support new data types Timeseries NAG Spatial including Geodetic and GRID Conclusion.

aggie
Download Presentation

Extending a Relational Database Intelligent storage of geospatial data John.Pickford@uk.ibm

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Extending a Relational Database Intelligent storage of geospatial dataJohn.Pickford@uk.ibm.com 07/09/2014

  2. Agenda • History • Current methods • Problems with current methods • Solution – extend database to support new data types • Timeseries • NAG • Spatial including Geodetic and GRID • Conclusion

  3. IBM - Informix Dynamic Server • It was the Informix-IDS Database • Informix database business bought by IBM in 2001 • IDS and DB2 UDB are similar and are getting closer • Extensibility in Informix came from Stonebraker and Brown • Ingres -> Postgres -> Illustra added to Informix • DB2 UDB extensibility the same in theory, DB2 Extenders equivalent to IDS DataBlades

  4. Current Methods • Data held in a RDBMS • Easy to manage • SQL • Easy to access from applications • C • ESQL/C, CLI • ODBC • JDBC • Easy to administer • Standard tools • Trained staff • Many 3rd party products use RDBMS

  5. Problems with RDBMS • Not good at loading lots of data items • OK with high volume of data • Not so good with lots of small rows, e.g. ticks, readings • High latency (time between data arriving and being visible through SQL queries) • Not good with complex data • A row is just a row, no concept of an ordered set of rows • Restricted set of data types • Not good with “unusual” functions • What is unusual ? • Seasonal averaging • Multivariate analysis • VWAP

  6. SERVER CLIENT CLIENT MEMORY DISK Standard client-server • Supports common data types • Character • Numeric • Integer • Float • Decimal • Datetime/interval • Large object • Binary • Text • Collections • List, set, multiset

  7. Database can be extended – new data types • Distinct • e.g. lengths, mass etc • Row • e.g. address = houseNumber + road + town + postCode • Opaque • User defines how the data is stored • In row • Binary large object • External (file or some external process) • Once created it is like an additional built-in type

  8. Opaque data type implementation(1) • Opaque types are implemented by a series of functions • Cast functions • lvarchar -> myType • myType -> lvarchar • Comparison function • B-tree index support • R-tree index support • Mathematical operators • + - x /

  9. Opaque data type implementation(2) • Functions implemented in • C • Java • SPL • Made known to the server by an SQL statement • Functions usually execute as part of the main database server process • Types handled correctly by SQL layer • Optimiser • Indexing • Mathematical operations • Aggregates

  10. Opaque data type implementation(3) • Simple example, complex numbers • Stored as 2 double precision numbers • Function to convert the string “12.34 + 56.78i” to a C structure containing 2 floats • Function to convert a C structure with 2 floats into the string “12.34 + 56.78i” • Functions to add, subtract, multiply and divide complex numbers • complex_t *Plus(complex_t, complex_t) • Can now be used in SQL statements: • create table (u complex_t); • insert into table values (“1 + 2i”); • select u * u from tab; • “-3 + 4i” • Comparisons OK, what about ordering ?

  11. Virtual Table/Index Interface • Tables can be replaced by functions (VTI) • Insert function • Select function • First, next, previous, nth, last • Delete function • Can create own indexing methods • Specialist area • Example of VTI – can make Google look like a database table: select * from google where search = “IBM+IDS+TimeSeries”;

  12. Problems storing TimeSeries data in RDBMS • “Tall-thin” tables, primary key roughly same size as data • Each row selected can cause a disk read • Extra indexing often added for index only reads • Solutions include • storing more than one element per row - makes management difficult • Using a separate timeseries database – expensive, difficult to manage

  13. Timeseries – the problem • Ordinary RDBMs store the values in rows, these rows are stored randomly in tables even though access is usually in time order. You need to create an index to store the key value for each row in time order. This takes more space, data access can be slow with over complicated SQL statements.

  14. Timeseries – the solution • Store elements as vectors in time order

  15. CLIENT CLIENT MEMORY DISK SERVER ORDBMS - Extended to Support TimeSeries • Supports common data types • Supports extended data types and functionality • TimeSeries TimeSeries

  16. Performance (time/space) Write SQL functions that work on TimeSeries Advantages of TimeSeries

  17. TimeSeries • Create a type to hold the data: create row type rt_t ( ts datetime year to fraction(5), minTemp temperature, maxTemp temperature); • Create the table: create table tab1 ( sensorId integer, temps timeseries(rt_t)); • Use the data: select temps, getLastElem(temps), getnelems(temps), clip(temps, t1, t2), func(temps, t1, t2) from tab1 where sensorId = 99;

  18. Timeseries functions • Built in SQL functions • Used in SQL statements and SPL functions • Server and Client C API • Common way of writing TimeSeries functions • Server and Client Java Class Library

  19. CLIENT MEMORY DISK SERVER CLIENT ORDBMS - Extended to Support Complex Analysis TimeSeries • Supports common data types • Supports extended data types and functionality • TimeSeries • NAG types and functions • Moves processing closer to the data • Data reduction at earliest opportunity • Functionality common to very different clients NAG

  20. Vector and Matrix data types NAG Functions Roots of equations Curve and surface fitting Matrix factorisation Eigenvalues and Eigenvectors Linear algebra support Simple correlation on statistical data Random number generation Linear statistical modeling Smoothing Time series analysis Sorting Optimisation PDE Black-Scholes NAG DataBlade

  21. Timeseries data can be analysed using NAG functions in the server (1) • Timeseries and NAG functions can be combined: select tstovec(clip(temp, t1, t2), “maxTemp”) from tab1 where sensorid = 99; select f001(tstovec(clip(temp, t1, t2), “maxTemp”)) from tab1 where sensorid = 99; • f001(timeseries, time, time) could be a stored procedure that calls the NAG function G13AAF() that carries out seasonal differencing of a timeseries.

  22. Timeseries data can be analysed using NAG functions in the server (2) • The function f001() could be a stored procedure calling NAG functions: create function f001(x vecDblType) returning vecDblType; define ifail, integer; define xd vecDblType;   -- The output vector xd needs to be the same size -- as the input vector let xd = copy(x);   -- Call g13aaf() let ifail = udrg13aaf(x, 2, 1, 4, xd, ‘0’);   return xd; end function;

  23. CLIENT SERVER CLIENT Data Feed RTL CLIENT MEMORY RTL MEMORY DISK ORDBMS - Extended to Support RealTime Data TimeSeries • Supports common data types • Supports extended data types and functionality • TimeSeries • NAG • RTL • Ticks made available to client programs in < .01s • High load rates, up to 100,000 s-1 NAG RealTime Loader

  24. RealTime Loader Memory ORDBMS Disk RealTime Loader • Ticks in the RealTime Loader memory appear as part of the ORDBS time series • SQL access to the RTL data

  25. CLIENT CLIENT SERVER Data Feed RTL CLIENT MEMORY RTL MEMORY DISK ORDBMS - Extended with Customer Functionality TimeSeries • Supports common data types • Supports extended data types and functionality • TimeSeries • NAG • RTL • Customer types and functionality NAG RealTime Loader Custom

  26. Spatial extensions • Spatial DataBlade – Implements the Open GIS Consortium types • Spatial datatypes • Point • X, Y, Z, M • Line, Polygon • Set of points • MultiPoint, MultiLineString, MultiPolygon • Spatial functions • Area, BoundingBox, Contains, Crosses, Distance, Overlap, Union, Within • R-Tree index • Geodetic DataBlade • Uses longitude, latitude, height, time, measure • Grid DataBlade

  27. Spatial query • Spatial data types can be added to the table: create table tab1 ( sensorId integer, location ST_Point, temps timeseries(rt_1)); • and used in queries: select sensorId, location, temps, getLastElem(temps), getnelems(temps), clip(temps, t1, t2), func(temps, t1, t2) from tab1, counties where ST_Within(counties.border, location) and counties.name = “Cheshire”;

  28. GRID Datablade (1) • Written, supplied and supported by an IBM partner, Barrodale Computing Services Ltd, Victoria, Canada • Stores data in “Smart Binary Large Objects” • Provides functions to load data from common data formats • Provides functions to extra data from the grids • Can store: • 1D: timeseries, vectors • 2D: raster images, arrays • 3D: spatial volumes, images at different times • 4D: volumes at different times • 5D: 4D grids with a set of variables at each 4D point

  29. GRID Datablade (2) • Operations: • Interpolation • Affine transformation • Projection • Windowing • Update • Storage formats: • CDF • GRIB • HDF • NetCDF • SDTS

  30. BCS GRID Slicer Demo

  31. Other DataBlades • Node • Stores tree structures • Ancestor, parent, sibling, descendants • Period • Uses R-Tree index • Same problem as spatial search • N-dimensional index • Just the R-tree index (up to 5 dimensions) • GMP extended precision floats and integers • Text • Image

  32. DataBlade Tools • BladeSmith • Generates templates for the functions • Maintains SQL to manage types/functions • Blade Packager • Blade Manager

  33. References • “Object-relational DBMs: Tracking the Next Great Wave”, Michael Stonebraker, Paul Brown, Dorothy Moore • ISBN 1-55-860452-9 • “Informix Dynamic Server.2000 – Server-Side Programming in C”, Jacques Roy • ISBN 0-13-013709-X • “Storing and Manipulating Gridded Data in Databases” • http://www.barrodale.com/grid_Demo/gridInfo.pdf • IBM website • http://www-306.ibm.com/software/data/informix/blades/spatial/

  34. Conclusion • ORDBMS extensibility • Better performance • Faster • Less space • More functionality • Different packages can be combined into complex systems • Already supports useful types and functions: • Geodetic • GRID (from BCS) • Image • NAG • Spatial • Text • Timeseries • User additions • Easy to do

More Related