The Analytic DBMS Market(s) New opportunities with new technology by Curt A. Monash, Ph.D. President, Monash Research Editor, DBMS2 contact @monash.com http://www.monash.com http://www.DBMS2.com
Curt Monash • Analyst since 1981 • Covered DBMS since the pre-relational days • Also analytics, search, etc. • Own firm since 1987 • Publicly available research • Blogs, including DBMS2 (www.DBMS2.com -- the source for most of this talk) • Feed at www.monash.com/blogs.html • White papers and more at www.monash.com • User and vendor consulting
Our agenda • Why there are specialty analytic DBMS • It’s not just the analytic area • Hardware issues • Tips for choosing among them • Segments and priorities • The selection process
Database diversity • High-end e-commerce • 100-terabyte analytics • High-volume call center • Media-heavy web startup • Simple departmental application • (and many more)
11 kinds of data management software • High-end OLTP/general-purpose DBMS • Mid-range OLTP/general-purpose DBMS • Row-based analytic RDBMS • Column- or array-based analytic RDBMS • Text search engines • XML and OO DBMS (but these may merge with search) • RDF and other graphical DBMS (but these may merge with relational) • Event/stream processing engines (aka CEP) • Embedded DBMS for devices • Sub-DBMS file managers (e.g. SimpleDB, some MySQL uses) • Science DBMS
Why are there specialized analytic DBMS? • General-purpose database managers are optimized for updating short rows … • … not for analytic query performance • 10-100X price/performance differences are not uncommon • At issue is the interplay between storage, processors, and RAM
Moore’s Law, Kryder’s Law, and a huge exception Growth factors: • Transistors/chip: >100,000X since 1971 • Disk density: >100,000,000X since 1956 • Disk rotation speed: 12.5X since 1956 The disk speed barrier dominates everything!
The “1,000,000:1” disk-speed barrier • RAM access times ~5-7.5 nanoseconds • CPU clock speed <1 nanosecond • Interprocessor communication can be ~1,000X slower than on-chip • Disk seek times ~2.5-3 milliseconds • Limit = ½ rotation • i.e., 1/30,000 minutes • i.e., 1/500 seconds = 2 ms Tiering brings it closer to ~1,000:1 in practice, but even so the difference is VERY BIG
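The arithmetic on this slide can be checked with a short sketch. The 15,000 RPM spindle speed is an assumption: it is not stated on the slide, but it is what "1/30,000 minutes" per half rotation implies. The function names are illustrative only.

```python
# Back-of-envelope check of the disk-speed barrier figures.
# ASSUMPTION: 15,000 RPM, inferred from "1/30,000 minutes" per half rotation.
ASSUMED_RPM = 15_000

def half_rotation_ms(rpm: int) -> float:
    """Time for half a revolution, in milliseconds (60,000 ms per minute)."""
    return 60_000 / (2 * rpm)

def disk_to_ram_ratio(disk_seek_ms: float, ram_access_ns: float) -> float:
    """How many RAM accesses fit into one disk seek (1 ms = 1,000,000 ns)."""
    return disk_seek_ms * 1_000_000 / ram_access_ns

rotational_limit_ms = half_rotation_ms(ASSUMED_RPM)

# Slide ranges: seek ~2.5-3 ms, RAM access ~5-7.5 ns.
ratio_low = disk_to_ram_ratio(2.5, 7.5)   # fastest disk vs. slowest RAM
ratio_high = disk_to_ram_ratio(3.0, 5.0)  # slowest disk vs. fastest RAM

print(f"half rotation: {rotational_limit_ms} ms")
print(f"disk/RAM ratio: {ratio_low:,.0f}:1 to {ratio_high:,.0f}:1")
```

With the slide's own ranges the ratio comes out in the hundreds of thousands to one, which is the order of magnitude the "1,000,000:1" headline rounds to.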
Hardware strategies to optimize analytic I/O • Lots of RAM • Parallel disk access!!! • Lots of networking Tuned MPP (Massively Parallel Processing) is the key
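A rough sketch of why parallel disk access matters for table scans. The per-disk throughput of 100 MB/s and the 1 TB table are hypothetical figures chosen for illustration, not numbers from the talk, and the model ignores data skew, network transfer, and CPU cost.

```python
# Idealized full-table-scan time when data is spread evenly over many disks.
# ASSUMPTION: 100 MB/s sequential read per disk (hypothetical figure).
ASSUMED_MB_PER_SEC_PER_DISK = 100.0

def scan_seconds(table_gb: float, disks: int,
                 mb_per_sec_per_disk: float = ASSUMED_MB_PER_SEC_PER_DISK) -> float:
    """Seconds to read the whole table if every disk reads its share in parallel."""
    total_mb = table_gb * 1024
    return total_mb / (disks * mb_per_sec_per_disk)

one_disk = scan_seconds(table_gb=1024, disks=1)
hundred_disks = scan_seconds(table_gb=1024, disks=100)

print(f"1 disk:    {one_disk / 3600:.1f} hours")
print(f"100 disks: {hundred_disks / 60:.1f} minutes")
```

In this idealized model scan time falls linearly with the number of disks; real MPP systems get less than that, which is presumably why the slide says "tuned".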
Software strategies to optimize analytic I/O • Minimize data returned • Classic query optimization • Minimize index accesses • Page size • Precalculate results • Materialized views • OLAP cubes • Return data sequentially • Store data in columns • Stash data in RAM
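To make "store data in columns" concrete, here is a minimal sketch comparing bytes read from disk for a query that touches two columns of a five-column table. The table layout, column widths, and row count are hypothetical, and the model ignores compression, indexes, and page overhead.

```python
# Bytes read for a two-column query: row layout vs. column layout.
# ASSUMPTION: hypothetical table with fixed-width columns (widths in bytes).
ROW_COUNT = 1_000_000
COLUMN_WIDTHS = {
    "order_id": 8,
    "customer_id": 8,
    "order_date": 4,
    "amount": 8,
    "comment": 100,
}

def row_store_bytes() -> int:
    """A row store scan reads every column of every row."""
    return ROW_COUNT * sum(COLUMN_WIDTHS.values())

def column_store_bytes(columns_needed: list[str]) -> int:
    """A column store scan reads only the columns the query references."""
    return ROW_COUNT * sum(COLUMN_WIDTHS[name] for name in columns_needed)

# e.g. SELECT order_date, SUM(amount) ... GROUP BY order_date
needed = ["order_date", "amount"]
row_bytes = row_store_bytes()
col_bytes = column_store_bytes(needed)

print(f"row store:    {row_bytes:,} bytes")
print(f"column store: {col_bytes:,} bytes")
print(f"reduction:    {row_bytes / col_bytes:.1f}x")
```

The wider the table and the narrower the query, the larger the gap. A query that needs most columns of each row sees little or no benefit.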
16 contenders • Aster Data • Dataupia • Exasol • Greenplum • HP Neoview • IBM DB2 BCUs • Infobright • Kickfire • Kognitio • Microsoft Madison • Netezza • Oracle Exadata • ParAccel • Sybase IQ • Teradata • Vertica
Varied approaches • 3 are trying to meld OLTP and analytic processing • 2 have very specialized hardware • 1 is purely RAM-centric • Several use Infiniband; several stress gigE switches • 6 are columnar • 2 stress cloud/DaaS
Segmentation made simple • One database to rule them all • One analytic database to rule them all • Frontline analytic database • Very, very big analytic database • Big analytic database handled very cost-effectively
7 more precise segmentation issues • What is your tolerance for specialized hardware? • What is your tolerance for set-up effort? • What is your tolerance for ongoing administrative burden? • What are your insert and update requirements? • At what volumes will you run fairly simple queries? • What are your complex queries like? and, most important, • Are you madly in love with your current DBMS?
Specialized hardware • Custom or unusual chips (rare) • Custom or unusual interconnects • Fixed configurations of common parts
Set-up effort • Hardware acquisition and installation • Database and index design • Data cleaning and integration • Porting of existing applications
Ongoing administration • Part of the set-up effort also translates to an ongoing administrative burden • Indexes, materialized views, cubes, etc. … • … unless the DBMS architecture minimizes their use
Inserts and updates • Finally we get to the performance criteria • Batch load • ELT (or ETLT) vs. pure ETL • Mini-batches or trickle feeds • True transactional updates
Concurrent queries • Major use cases • Traditional BI • Customer-facing apps • Product maturity is often key
Complex queries • This is where the glamour is • MPP to speed up I/O • Clever answers to the data redistribution problem • Table scans vs. random access • Columns vs. rows • Aggressive use of RAM • Compression (saving on disk cost isn’t the point) • … and fast analytics even beyond the queries
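One plausible reading of "saving on disk cost isn't the point" is that compressed data means fewer bytes to pull through the disk bottleneck. The sketch below uses run-length encoding on a sorted column as an illustration. It is a generic technique chosen for simplicity, not a description of how any vendor on the contender list works, and the sample data is made up.

```python
from itertools import groupby

def rle_encode(values: list) -> list[tuple]:
    """Collapse consecutive repeats into (value, run_length) pairs."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(values)]

def rle_decode(runs: list[tuple]) -> list:
    """Expand (value, run_length) pairs back into the original sequence."""
    return [value for value, count in runs for _ in range(count)]

# Hypothetical sorted "state" column: long runs compress very well.
state_column = ["CA"] * 3 + ["NY"] * 2 + ["TX"] * 4
encoded = rle_encode(state_column)

print(encoded)
print(f"{len(state_column)} values stored as {len(encoded)} runs")
```

Run-length encoding works best on sorted, low-cardinality columns, which is one reason compression and columnar storage are often discussed together.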
The analytic DBMS selection process • Figure out what you’re trying to buy • Make a short list • Do free POCs • Evaluate and decide
Figure out what you’re trying to buy • Inventory your use cases • Current • Known future • Wish-list/dream-list future • Set constraints • People and platforms • Money • Establish target SLAs • Must-haves • Nice-to-haves
Short list basics • You might as well consider the incumbent(s) • Cash cost is an easy filter to apply • What is the crux of the deployment effort? • References can be scarce
Free POCs are a great invention • Most of the effort is in the set-up • The better you match your use cases, the more reliable the POC is • You might as well do POCs for several vendors – at (almost) the same time! • Where is the POC being held? Can you plan this yourself, or do you need outside help?
Evaluate and decide It all comes down to • Cost • Speed • Risk and in some cases • Time to value • Upside
Further information Curt A. Monash, Ph.D. President, Monash Research Editor, DBMS2 contact @monash.com http://www.monash.com http://www.DBMS2.com