How to Select an Analytic DBMS DRAFT!! by Curt A. Monash, Ph.D. President, Monash Research

Presentation Transcript


  1. How to Select an Analytic DBMS DRAFT!! by Curt A. Monash, Ph.D. President, Monash Research Editor, DBMS2 contact @monash.com http://www.monash.com http://www.DBMS2.com

  2. Curt Monash Analyst since 1981, own firm since 1987 Covered DBMS since the pre-relational days Also analytics, search, etc. Publicly available research Blogs, including DBMS2 (www.DBMS2.com -- the source for most of this talk) Feed at www.monash.com/blogs.html White papers and more at www.monash.com User and vendor consulting

  3. Our agenda • Why are there such things as specialized analytic DBMS? • What are the major analytic DBMS product alternatives? • What are the most relevant differentiations among analytic DBMS users? • What’s the best process for selecting an analytic DBMS?

  4. Why are there specialized analytic DBMS? • General-purpose database managers are optimized for updating short rows … • … not for analytic query performance • 10-100X price/performance differences are not uncommon At issue is the interplay between storage, processors, and RAM

  5. Moore’s Law, Kryder’s Law, and a huge exception Growth factors: • Transistors/chip: >100,000 since 1971 • Disk density: >100,000,000 since 1956 • Disk speed: 12.5 since 1956 The disk speed barrier dominates everything!

  6. The “1,000,000:1” disk-speed barrier • RAM access times ~5-7.5 nanoseconds • CPU clock speed <1 nanosecond • Interprocessor communication can be ~1,000X slower than on-chip • Disk seek times ~2.5-3 milliseconds • Limit = ½ rotation • i.e., 1/30,000 minutes • i.e., 1/500 seconds = 2 ms Tiering brings it closer to ~1,000:1 in practice, but even so the difference is VERY BIG
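
The arithmetic behind that ratio is worth spelling out. A quick back-of-the-envelope check in Python (the 15,000 RPM drive and the exact latency figures are illustrative assumptions consistent with the slide's numbers):

```python
# Back-of-the-envelope arithmetic for the disk-speed barrier.
# All figures are illustrative assumptions taken from the slide.

RAM_ACCESS_S = 7.5e-9            # ~5-7.5 ns per RAM access
CPU_CYCLE_S = 1e-9               # CPU clock < 1 ns
DISK_RPM = 15_000                # high-end drive; average rotational latency = 1/2 rotation

half_rotation_s = 0.5 * (60 / DISK_RPM)      # = 1/30,000 min = 1/500 s = 2 ms
print(f"Half rotation: {half_rotation_s * 1e3:.0f} ms")
print(f"Disk vs. RAM:  ~{half_rotation_s / RAM_ACCESS_S:,.0f}:1")   # ~267,000:1
print(f"Disk vs. CPU:  ~{half_rotation_s / CPU_CYCLE_S:,.0f}:1")    # ~2,000,000:1 -- the "1,000,000:1" order of magnitude
```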

  7. Software strategies to optimize analytic I/O • Minimize data returned • Classic query optimization • Minimize index accesses • Page size • Precalculate results • Materialized views • OLAP cubes • Return data sequentially • Store data in columns • Stash data in RAM
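
To make the "store data in columns / return data sequentially" point concrete, here is a toy calculation (not any particular vendor's implementation) of the bytes a query touching 2 of 20 columns must pull off disk under each layout:

```python
# Toy illustration of why columnar layout cuts analytic I/O:
# a query referencing 2 of 20 columns reads far fewer bytes.

ROWS = 100_000_000        # hypothetical fact table
COLUMNS = 20
BYTES_PER_VALUE = 8       # assume fixed-width values for simplicity

row_store_bytes = ROWS * COLUMNS * BYTES_PER_VALUE    # whole rows come back from disk
column_store_bytes = ROWS * 2 * BYTES_PER_VALUE       # only the 2 referenced columns

print(f"Row store scan:    {row_store_bytes / 2**30:,.1f} GiB")
print(f"Column store scan: {column_store_bytes / 2**30:,.1f} GiB")
print(f"I/O reduction:     {row_store_bytes / column_store_bytes:.0f}x before compression")
```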

  8. Hardware strategies to optimize analytic I/O • Lots of RAM • Parallel disk access!!! • Lots of networking Tuned MPP (Massively Parallel Processing) is the key
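
A rough model of why parallel disk access is the headline item, with assumed per-disk scan rates and node counts (all figures hypothetical):

```python
# Rough model of MPP scan time: aggregate disk bandwidth grows with node count,
# so a full-table scan that takes hours on one node takes minutes on many.

TABLE_TB = 10
MB_PER_S_PER_DISK = 100          # assumed sequential scan rate per disk
DISKS_PER_NODE = 8

def scan_seconds(nodes: int) -> float:
    aggregate_mb_s = nodes * DISKS_PER_NODE * MB_PER_S_PER_DISK
    return TABLE_TB * 1_000_000 / aggregate_mb_s

for nodes in (1, 8, 64):
    print(f"{nodes:>3} node(s): full scan in ~{scan_seconds(nodes) / 60:,.1f} minutes")
```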

  9. Specialty hardware strategies • Custom or unusual chips (rare) • Custom or unusual interconnects • Fixed configurations of common parts • Appliances or recommended configurations • And there’s also SaaS

  10. 18 contenders (and there are more) • Aster Data • Dataupia • Exasol • Greenplum • HP Neoview • IBM DB2 BCUs • Infobright/MySQL • Kickfire/MySQL • Kognitio • Microsoft Madison • Netezza • Oracle Exadata • Oracle w/o Exadata • ParAccel • SQL Server w/o Madison • Sybase IQ • Teradata • Vertica

  11. General areas of feature differentiation • Query performance • Update/load performance • Compatibilities • Advanced analytics • Alternate datatypes • Manageability and availability • Encryption and security

  12. Major analytic DBMS product groupings Architecture is a hot subject • Traditional OLTP • Row-based MPP • Columnar • MOLAP/array-based (not covered tonight)

  13. Traditional OLTP examples • Oracle (especially pre-Exadata) • IBM DB2 (especially mainframe) • Microsoft SQL Server (pre-Madison)

  14. Analytic optimizations for OLTP DBMS • Two major kinds of precalculation • Star indexes • Materialized views • Other specialized indexes • Query optimization tools • OLAP extensions • SQL 2003 • Other embedded analytics
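
As a conceptual sketch of the precalculation idea, independent of any specific DBMS feature: build a summary once, then answer repeated queries from it instead of rescanning the detail. The table and column names below are made up for illustration:

```python
# Conceptual sketch of precalculation (what a materialized view or OLAP cube
# does at heart): aggregate the detail rows once, ahead of query time, then
# answer repeated queries from the summary. Pure illustration, no DBMS implied.

from collections import defaultdict

detail = [                      # (region, product, revenue) detail rows
    ("East", "widgets", 100.0),
    ("East", "gadgets", 250.0),
    ("West", "widgets", 175.0),
]

# "Refresh": build the summary once.
revenue_by_region = defaultdict(float)
for region, _product, revenue in detail:
    revenue_by_region[region] += revenue

# Query time: a lookup against the summary instead of a scan of the detail.
print(revenue_by_region["East"])    # 350.0
```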

  15. Drawbacks • Complexity and people cost • Hardware cost • Software cost • Absolute performance

  16. Legitimate use scenarios • When TCO isn’t an issue • Undemanding performance (and therefore administration too) • When specialized features matter • OLTP-like • Integrated MOLAP • Edge-case analytics • Rigid enterprise standards • Small enterprise/true single-instance

  17. Row-based MPP examples • Teradata • DB2 (open systems version) • Netezza • Oracle Exadata (sort of) • DATAllegro/Microsoft Madison • Greenplum • Aster Data • Kognitio • HP Neoview

  18. Typical design choices in row-based MPP • “Random” (hashed or round-robin) data distribution among nodes • Large block sizes • Suitable for scans rather than random accesses • Limited indexing alternatives • Or little optimization for using the full boat • Carefully balanced hardware • High-end networking
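
A minimal sketch of the hashed data distribution mentioned in the first bullet, assuming a simple hash-mod-N placement (real MPP systems differ in the details):

```python
# Hashing a distribution key spreads rows roughly evenly across nodes,
# so a full scan can run on all nodes in parallel. Illustration only.

import hashlib

NODES = 8

def node_for(key: str) -> int:
    # A stable hash (md5 here, purely for illustration) maps each key to a node.
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % NODES

counts = [0] * NODES
for customer_id in range(100_000):
    counts[node_for(str(customer_id))] += 1

print(counts)   # roughly 12,500 rows per node
```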

  19. Tradeoffs among row MPP alternatives • Enterprise standards • Vendor size • Hardware lock-in • Total system price • Features

  20. Columnar DBMS examples • Sybase IQ • SAND • Vertica • ParAccel • Infobright • Kickfire • Exasol • MonetDB • SAP BI Accelerator (sort of)

  21. Columnar pros and cons • Bulk retrieval is faster • Pinpoint I/O is slower • Compression is easier • Memory-centric processing is easier
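
A toy example of the "compression is easier" point: a low-cardinality column stored by itself run-length encodes to almost nothing, which would not be possible if its values were interleaved with the rest of each row:

```python
# A column of 1,000,000 status values, stored contiguously and sorted,
# collapses to two run-length-encoded entries. Illustration only.

from itertools import groupby

status_column = ["active"] * 900_000 + ["closed"] * 100_000

rle = [(value, sum(1 for _ in run)) for value, run in groupby(status_column)]
print(rle)                      # [('active', 900000), ('closed', 100000)]
print(f"{len(status_column):,} values stored as {len(rle)} runs")
```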

  22. Segmentation – a first cut • One database to rule them all • One analytic database to rule them all • Frontline analytic database • Very, very big analytic database • Big analytic database handled very cost-effectively

  23. Basics of systematic segmentation • Use cases • Metrics • Platform preferences

  24. Use cases – a first cut • Light reporting • Diverse EDW • Big Data • Operational analytics

  25. Metrics – a first cut • Total user data • Below 1-2 TB, references abound • 10 TB is another major breakpoint • Total concurrent users • 5, 15, 50, or 500? • Data freshness • Hours • Minutes • Seconds

  26. Basic platform issues • Enterprise standards • Appliance-friendliness • Need for MPP? • (SaaS)

  27. The selection process in a nutshell • Figure out what you’re trying to buy • Make a shortlist • Do free POCs* • Evaluate and decide *The only part that’s even slightly specific to the analytic DBMS category

  28. Figure out what you’re trying to buy • Inventory your use cases • Current • Known future • Wish-list/dream-list future • Set constraints • People and platforms • Money • Establish target SLAs • Must-haves • Nice-to-haves

  29. Use-case checklist -- generalities • Database growth • As time goes by … • More detail • New data sources • Users (human) • Users/usage (automated) • Freshness (data and query results)

  30. Use-case checklist – traditional BI • Reports • Today • Future • Dashboards and alerts • Today • Future • Latency • Ad-hoc • Users • Now that we have great response time …

  31. Use-case checklist – data mining • How much do you think it would improve results to • Run more models? • Model on more data? • Add more variables? • Increase model complexity? • Which of those can the DBMS help with anyway? • What about scoring? • Real-time • Other latency issues

  32. SLA realism • What kind of turnaround truly matters? • Customer or customer-facing users • Executive users • Analyst users • How bad is downtime? • Customer or customer-facing users • Executive users • Analyst users

  33. Short list constraints • Cash cost • But purchases are heavily negotiated • Deployment effort • Appliances can be good • Platform politics • Appliances can be bad • You might as well consider incumbent(s)

  34. Filling out the shortlist • Who matches your requirements in theory? • What kinds of evidence do you require? • References? • How many? • How relevant? • A careful POC? • Analyst recommendations? • General “buzz”?

  35. A checklist for shortlists • What is your tolerance for specialized hardware? • What is your tolerance for set-up effort? • What is your tolerance for ongoing administrative burden? • What are your insert and update requirements? • At what volumes will you run fairly simple queries? • What are your complex queries like? and, most important, • Are you madly in love with your current DBMS?

  36. Proof-of-Concept basics • The better you match your use cases, the more reliable the POC is • Most of the effort is in the set-up • You might as well do POCs for several vendors – at (almost) the same time! • Where is the POC being held?

  37. The three big POC challenges • Getting data • Real? • Politics • Privacy • Synthetic? • Hybrid? • Picking queries • And more? • Realistic simulation(s) • Workload • Platform • Talent
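
For the "Synthetic?" branch of the data question, one hedged option is to generate fake rows whose shape (column mix, cardinalities, skew) roughly matches production; the file name, columns, and distributions below are purely hypothetical:

```python
# Generate synthetic detail rows for a POC so realistic data volumes can be
# tested without privacy or politics getting in the way. Sketch only; match
# the shape of your own production data, not these made-up distributions.

import csv
import random

random.seed(42)
REGIONS = ["East", "West", "North", "South"]

with open("synthetic_orders.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["order_id", "customer_id", "region", "amount"])
    for order_id in range(1_000_000):
        writer.writerow([
            order_id,
            random.randint(1, 50_000),                        # ~50K distinct customers
            random.choices(REGIONS, weights=[4, 3, 2, 1])[0],  # skewed regional mix
            round(random.expovariate(1 / 120.0), 2),           # long-tailed order amounts
        ])
```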

  38. POC tips • Don’t underestimate requirements • Don’t overestimate requirements • Get SOME data ASAP • Don’t leave the vendor in control • Test what you’ll be buying • Use the baseball bat

  39. Evaluate and decide It all comes down to • Cost • Speed • Risk and in some cases • Time to value • Upside

  40. Further information Curt A. Monash, Ph.D. President, Monash Research Editor, DBMS2 contact @monash.com http://www.monash.com http://www.DBMS2.com
