400 likes | 550 Views
IBM DB2 Data Analytics Accelerator Technology Exploration - User Group Road Show - Detroit Columbus Cincinnati - April 08 09 10. 1. Trans. Latency. Number of Users. Trans. Volume. Trans. Type. Availability. Less Important. Less Important. Few. Few. Small. Complex. C Level
E N D
IBM DB2 Data Analytics Accelerator Technology Exploration - User Group Road Show - Detroit Columbus Cincinnati - April 08 09 10 1
Trans. Latency Number of Users Trans. Volume Trans. Type Availability Less Important Less Important Few Few Small Complex C Level Mgt Analysts (e.g. Mktg, Research) Company Management Customer Service & Support (e.g. call centers, sales personnel) Customers (e.g. external, Web) Many Very Large Simple Critical Critical Traditional Data Warehousing, & Business Analytics Market Segmentation User Community Traditional Distributed Market
Typical Current State of Affairs for Analytics: Execution by Department Executive Management Research & Dev. Marketing Finance Sales Operations Finance Customer Care IT 3
Multiply that Across each Department Development Development Sales Finance Quality Assurance Quality Assurance Production Production Disaster Recovery Disaster Recovery Development Marketing Development R&D Quality Assurance Quality Assurance Production Production Disaster Recovery Disaster Recovery
Scoring Rules Operational Systems Optimized Business Processes Analytical Foresight Sales Effectiveness Customer Support Fraud Management Claims Processing Underwriting Marketing x/p server Data Mining Segmentation Prediction Statistical Analysis Analyze OLTP Bulk x/p/z server Departmantal Data Marts x/p server Data Mover Analytics Server x/p server x/p/z server x/p/z server Transformation Server Enterprise Data Warehouse (RDBMS) ODS (RDBMS) Hourly/daily Batch Process Multi- Dimensional Analysis Staging Area Staging Area Cleanse Transform Warehouse Report MIS System Budgeting Campaign management Financial Analysis Selling Platforms Customer Profit Analysis CRM What this Looks Like in the Bigger Picture Continuous feed Batch Process Online Queries & Reporting BA Tooling
World’s Best Companies Run DB2 for z/OS and System zUndisputed leader in total system availability, scalability, security and reliability
Creating the Hybrid Data Server – PureData System for Analytics (Netezza) and System z Best in OLTP and Transactional Analytics Industry recognized leader in mission critical transaction systems Data Mart Data Mart Data Mart Data Mart Consolidation Best in Deep Analytics Proven appliance leader in high speed analytic systems Transaction Processing Systems (OLTP) Best in Consolidation Unprecedented mixed workload flexibility and virtualization providing the most options for cost effective consolidation Transactional Analytics PureData System for Analytics Recognized leader in cost-effective high speed deep analytics DB2 z/OS: Recognized leader in transactional workloads with security, availability and recoverability Deep Analytics Together: Destroying the myth that transactional and decision support workloads have to be on separate platforms 8
Fast • Complex queries run up to 2000x faster while retaining single record lookup speed • Cost Saving • Eliminate costly query tuning while offloading complex query processing • Appliance • No applications to change, just plug it in, load the data, and gain the value IBM DB2 Analytics Accelerator for z/OSBlending zEnterprise and Netezza technologies A high performance analytics accelerator appliance for IBM zEnterprise, delivering dramatically faster complex business analysis transparently to all users. 9 9
IBM DB2 Analytics Accelerator for z/OS What is it? What is the value? What is the value? • Breakthrough technology enabling new opportunities • Extreme performance for complex analytics (aka Train of Thought Analysis) • Improve price/performance for analytic workloads • Minimize the need to create data marts for performance • Transparent to DB2 applications and users • Highly secure environment for sensitive data analysis What is it? The IBM DB2 Analytics Accelerator is a dedicated workload optimized, appliance add-on to a DB2 for z/OS environment that services long-running, complex queries. z Enterprise z196, z114, EC12, BC12 OLTP and Transactional Analytics IDAA Deep Analytics DB2 for z/OS Versions Supported • IDAA V2 - DB2 for z/OS V9 & V10 • IDAA V3 - DB2 for z/OS V10 & V11 • IDAA V4 (GA 11/29) - DB2 for z/OS V10 & V11 10 10 10
DB2 Analytics Accelerator V3 Features • Lowering the cost of historical data • Better decisions through lower latency of data • Dramatic improvement in scale and growth opportunities • Lowering the cost of analytic computing High Performance Storage Saver • Significantly reduces the cost for storage resources • Option to store data only once in the accelerator Incremental Update • Data changes are propagated to the accelerator as they happen • Uses change data capture technology • Extends the accelerator use to reporting on operational data New optimization • Tables or partitions refresh much faster and less resources intensive • Optimized unloading data from DB2 High Capacity • Capacity has been extended to 1.28 PB for a single Accelerator New functions • More queries eligible for acceleration PureData System for Analytics N2001 11
NEW DB2 Analytics Accelerator V4 Features More Query Acceleration Enhanced Capabilities Improved Transparency Static SQL DB2 Version 11 Support added Multi-row fetch from local applications EBCDIC & Unicode in same DB2 system & accelerator • Incremental Update • Greatly improved scalability • Better performance • Improved performance for large result sets • HPSS • Archive to multiple accelerators • Better access control for archived partitions • Extend WLM support to local applications • Richer system scope monitoring • Report prospective CPU cost & Elapsed time savings • Separation of duties for accelerator system administration operations • Automatic workload balancing over multiple accelerators • New RTS ‘last-changed-at’ timestamp • Automated NZKit installation • HPSS • Built-in restore • Protection for image copies of archived partitions • Profile controlled special registers • Improved continuous operations for Incremental Update E n a b l i n g n e w u s e c a s e s
Deep DB2 Integration within zEnterprise Applications DBA Tools, z/OS Console, ... Application Interfaces (standard SQL dialects) Operational Interfaces (e.g. DB2 Commands) DB2 for z/OS IBM DB2 Analytics Accelerator Log Manager IRLM Buffer Manager Data Manager . . . Superior performance on analytic queries Superior availability reliability, security, Workload management z/OS on System z PureData System for Analytics 13
Large Insurance CompanyAdding value by Accelerating the Delivery of Business Reporting Customer Quote: “we had this up and running in days with queries that ran over 1000 times faster” With Accelerated Time to Value • IBM DB2 Analytics Accelerator (PureData System for Analytics 1000-12) • Production ready - 1 person, 2 days • Table Acceleration Setup in 2 Hours • - DB2 “Add Accelerator” • - Choose a Table for “Acceleration” • - Load the Table (DB2 Loads Data to the Accelerator) • - Knowledge Transfer • - Query Comparisons • Initial Load Performance • 400 GB Loaded in 29 Minutes • 570 Million Rows (Actual: Loaded 800 GB to 1.3 TB per hour) • Extreme Query Acceleration - 1908x faster • 2 Hours 39 minutes to 5 Seconds • CPU Utilization Reduction • 35% to ~0% 14 Load times may vary based other workload running on z/OS
IBM DB2 Analytics Accelerator Product Components PureData System for Analytics Technology zEnterprise CLIENT Data Studio Foundation DB2 Analytics Accelerator Admin Plug-in Network Primary OSA-Express3/4/5 10 GbE 10Gb Backup IBM DB2 Analytics Acelerator Data Warehouse application DB2 for z/OS enabled for IBM DB2 Analytics Accelerator Users/ Applications 15 15 15
User control and DB2 heuristic DB2 for z/OS and IBM DB2 Analytics Accelerator OLTP-like query DB2 Native Processing Light ODS-query Light BI Query Heavy BI Query Optimized processing for BI Workload Query routing analysis • Values for CURRENT QUERY ACCELERATION • Single and unique system for mixed query workloads • Dynamic decision for most efficient execution platform • New special register QUERY ACCELERATION • New heuristic in DB2 optimizer 16 16 16
Routing Criteria The entire query can be accelerated, i.e. the unit of acceleration is a whole query Individual query blocks are no longer units of acceleration The whole query will either run in DB2 or in the accelerator The associated cursor is not defined as a scrollable or a rowset cursor The query is defined as read-only The query is dynamic (V4 will support static) The query is a SELECT statement. The private protocol is not in effect. Routing to Accelerator is considered more efficient for performance than to execute the query in DB2 mainline The decision is based on some heuristic rules A query can be routed to Accelerator if is one of these types: 17 17
Heuristic Routing Criteria – not just based on “elapsed time” … • DB2 Optimizer uses a set of rules to determine whether a given query is better off being executed in DB2 core engine or routed to the accelerator, such as: • In general, typical OLTP access path patternsare not routed to the accelerator • e.g. Equal unique access, One fetch access • If none of these: WHERE, GROUP BY, ORDER BY, aggregate functions is specified (i.e. all rows are to be returned), the query is not routed • Threshold specified by the DB2 Profile (1) mechanism: • If all thetables referred in the query are “small”, the query is not routed • ACCEL_TABLE_THRESHOLD determines total table cardinality for a query • The default value is 1,000,000 • If a “large” resultset is expected, the query is not routed • ACCEL_RESULTSIZE_THRESHOLD (number of rows) determines what is a “large” result set. • The default value is -1, which means that this check is ignored • If estimated total cost for a query is treated as “short running”, the query is not routed • ACCEL_TOTALCOST_THRESHOLD determines estimated total cost for a query • The default value is 5,000 (value is in milliseconds = 5 seconds) • Recommendation: Use default values. Change only after rigorous testing! (1)http://publib.boulder.ibm.com/infocenter/dzichelp/v2r2/index.jsp?topic=/com.ibm.db2z10.doc.perf/src/tpc/db2z_profiles.htm
Query Off-load Applicability Accelerator is based on Netezza which supports rich set of SQL and data types BI tools such as Cognos has run on Netezza for years and will run on Accelerator as well Due to very large number of query types and SQL functions not all of them could be processed in V3. Key restrictions include: No static SQL (Lifted in V4) Not all DB2 functions, No Mathematical functions such as SIN, COS, TAN. No advanced string functions such as HEX, POSITION, LOCATE, LEFT, OVERLAY No advanced OLAP functions such as RANK, ROLLUP, CUBE No User Defined Functions No correlated table expressions or recursive correlated table expressions No correlated subquery in the SELECT list Not UTF-16 and MIXED/DBCS EBCDIC No multiple encoding schemes in the same statement (fixed in V4) Not all DB2 special registers: CURRENT PATH, SERVER, SQLID, SCHEMA, APPLICATION ENCODING SCHEME Not all DB2 data types: LOBs, ROWID, XML, DECFLOAT, BINARY None of these restrictions is a design problem, IBM plans to lift them in future releases based on customer feedback and needs 19 19
Query Execution Process Flow DB2 for z/OS DB2 Analytics Accelerator ApplicationInterface Optimizer SPU IDAA DRDA Requestor CPU FPGA Memory SPU CPU FPGA Application SMP Host Memory SPU Query execution run-time for queries that cannot be or should not be off-loaded to IDAA CPU FPGA Memory SPU CPU FPGA Memory Heartbeat (DB2 Analytics Accelerator availability and performance indicators) Queries executed without DB2 Analytics Accelerator Queries executed with DB2 Analytics Accelerator 20 20 20 Queries executed with value of “ALL” may receive a SQL Error Code if the query cannot run on the accelerator
Accelerator Administrative Stored Procedures • ACCEL_ADD_ACCELERATOR • ACCEL_TEST_CONNECTION • ACCEL_REMOVE_ACCELERATOR • ACCEL_UPDATE_CREDENTIALS • ACCEL_ADD_TABLES • ACCEL_ALTER_TABLES • ACCEL_REMOVE_TABLES • ACCEL_GET_TABLES_INFO • ACCEL_GET_TABLES_DETAILS • ACCEL_LOAD_TABLES • ACCEL_SET_TABLES_ACCELERATION • ACCEL_SET_TABLES_REPLICATION • ACCEL_CONTROL_ACCELERATOR • ACCEL_UPDATE_SOFTWARE • ACCEL_ARCHIVE_TABLES • ACCEL_GET_QUERIES • ACCEL_GET_QUERY_DETAILS • ACCEL_GET_QUERY_EXPLAIN * For a list of the description of each stored procedure refer to the IDAA Stored Procedure Reference Manual 21 21
The Key to the Speed select DISTRICT, PRODUCTGRP, sum(NRX) from MTHLY_RX_TERR_DATA where MONTH = '20091201' and MARKET = 509123 and SPECIALTY = 'GASTRO' FPGA Core CPU Core Restrict, Visibility WHERE Complex ∑ Joins, Aggs, etc. GROUP BY Uncompress FROM Project SELECT Slice of table MTHLY_RX_TERR_DATA (compressed) sum(NRX) where MONTH = '20091201' and MARKET = 509123 and SPECIALTY = 'GASTRO' select DISTRICT, PRODUCTGRP, sum(NRX) 22
IBM DB2 Analytics Accelerator Table Definition and Deployment IBM Data Studio Client DB2 for z/OS IDAA PureData Catalog IDAA Studio IDAA Administrative Stored Procedures DB2 Catalog • The tables need to be defined and deployed to IDAA before data is loaded and queries sent to it for processing • Definition: identifying tables for which queries need to be accelerated • Deployment: making tables known to DB2, i.e. storing table metadata in the DB2 and PureData catalog • IDAA Studio guides you through the process of defining and deploying tables, as wall as invoking other administrative tasks • IDAA Stored Procedures implement and execute various administrative operations such as table deployment, load and update, and serve as the primary administrative interface to IDAA from the outside world including IDAA Studio 23
Synchronization Options with IBM DB2 Analytics Accelerator 24 24 24
Accelerator Data Load Accelerator DB2 for z/OS Table A Table B Accelerator Administrative Stored Procedures CPU FPGA Part 1 USS Pipe Unload Memory Table C Accelerator Studio CPU FPGA Coordinator Part 2 Unload USS Pipe Table D Memory Part 1 . . . . . . . . . CPU FPGA Memory Part 2 CPU FPGA Part 3 USS Pipe Unload Part m Memory • 1 TB / h – can vary, depending on CPU resources, table partitioning, … • Update on table partition level, concurrent queries allowed during load • Unload in DB2 internal format, single translation by accelerator
Incremental Update – IDAA V3 for DB2 V10 • Changes in data warehouse tables typically driven by replication or an other update process • Corrections after a bulk-ETL-load of a data warehouse table • Continuously changing data (e.g. trickle-feed updates from a transactional system to the warehouse or ODS) • Reporting and analysis based on most recent data • May be combined with a full/partition table refresh Operational Analytics, Reports, OLAP, … Continuous Query Processing DB2 z/OS Query Optimizer DB2 for z/OS database Accelerator processing DB2 native processing • Incremental update can be configured per table • Initial full table load is required Application Replication Incremental Update Changes 26
Incremental Update Details DB2 Analytics Accelerator (Host Node) DB2 for z/OS Accelerator Database Accelerator Stored Procedures ACCEL_SET_TABLES_REPLICATION ACCEL_CONTROL_ACCELERATOR ... Accelerator SERVER Controller Catalog information JCL Accelerator Studio Automation code (creates data sources, subscriptions, etc.) insert delete Apply Agenton NPS host (Receives log events) update Capture Agent IFI Log Reads (private network) 27
User Interface • Incremental update UI elements only visible if it has been enabled on the DB2 subsystem via IBM DB2 Analytics Accelerator configuration console • Start / stop replication process (per subsystem-accelerator pair) • Enable / disable replication (per table) • Trace collection • Information on replication latency and events 28
Save Over 95% of Host Disk Space for Historical Data 1Q 1Q 1Q 1Q 1Q 1Q 2Q 2Q 2Q 2Q 2Q 2Q 3Q 3Q 3Q 3Q 3Q 3Q 4Q 4Q 4Q 4Q 4Q 4Q Historical Data Year Year -1 Year -2 Year -3 Year -4 Year -5 Year -7 1Q 2Q 3Q Current Data One Quarter = 3.57% of 7 years of data One Month = 1.19% of 7 years of data One month = 2.78% of 3 years of data 4Q 29
High Performance Storage Saver Reducing the cost of high speed storage Time-partitioned tables where: only the recent partitions are used in a transactional context (frequent data changes, short running queries) the entire table is used for analytics (data intensive, complex queries). High Performance Storage Saver’s “Archive” Process: Data is loaded into Accelerator if not already loaded Automatically takes Image Copy of Each Partition to be Archived Automatically Remove data from DB2 archived tablespace partitions DBA starts archived partitions as read-only DB2 Query from Application Part #1 No longer present on DB2 Storage Or Accelerator Part #1 Part #2 Part #3 Part #4 Part #5 Part #6 Part #7 Active Archive 30
High Performance Storage SaverReducing the cost of high speed storage Store historic data on the Accelerator only Applications Tables can be resident on: • DB2 Only • DB2 and Accelerator • Archive to Accelerator SQL When data no longer requires updating, reclaim the DB2 storage DB2 Table A Active DB2 Table A DB2 Table A Managed by zPARMs Controlled by Special Registers: • CURRENT QUERY ACCELERATION • CURRENT GET_ACCEL_ARCHIVE Accelerator Table A Active & Archive Accelerator Table A • Best for OLTP • High Speed Indexed queries • Mixed Workload • Active Only • Archive Only • Active & Archive • Mixed Workload 31
Key Elements of the Implementation Approach • Older partitions are moved to IDAA and their data no longer exists in DB2 • DB2 is still solely responsible for the recovery and maintains all backups (copy images) • The most recent partitions exist in both DB2 and IDAA • They are synchronized by existing means • Partition refresh or incremental update (replication-based propagation of changes) • Having the most recent partition in both, DB2 and IDAA, provides IDAA-driven performance acceleration for analytical queries that access most recent partitions only • The data move process is encapsulated in a stored procedure • The stored procedure can be invoked directly or via IDAA Studio • The SQL statements do not change • The fact that some partitions have been moved to IDAA is transparent • By default, queries access only the data from the most recent partitions • The queries can be executed in DB2 or IDAA based on the standard routing criteria • If all the data need to be accessed, one of the following mechanisms is used: • Setting a zparm which activates the 'all data' scope for the DB2 subsystem/data sharing group. This way, none of the applications need to be changed (but this setting has global impact). • Setting a special register “CURRENT GET_ACCEL_ARCHIVE”, which allows switching between the 'all data' scope and the 'most recent data' scope at any time. This way the application can use both scopes within the same execution at choose scope at SQL statement level.
Work Load Management - Usage scenarios Workload Isolation: Ensure that the workload of one DB2 subsystem doesn’t monopolize the resources of a shared accelerator. A development subsystem, attached to the same accelerator as a production subsystem, should not be able to drain all accelerator resources. Query Prioritization:More important queries should be executed before and faster than less important queries that are sent from the same DB2 subsystem against the accelerator. 33
IBM DB2 Analytics Accelerator Instrumentation 34
Product Documentation Links Information Center for IBM DB2 Analytics Accelerator for z/OS V3.1 http://publib.boulder.ibm.com/infocenter/dzichelp/v2r2/topic/com.ibm.datatools.aqt.doc/welcome/topics/idaa_start.html Quick Start Guide http://publibfp.dhe.ibm.com/epubs/pdf/h1269820.pdf Installation Guide http://publibfp.dhe.ibm.com/epubs/pdf/h1269830.pdf Stored Procedure Reference http://publibfp.dhe.ibm.com/epubs/pdf/h1269840.pdf Analytics Accelerator Studio Users Guide http://publibfp.dhe.ibm.com/epubs/pdf/h1269850.pdf Getting Started Guide http://publibfp.dhe.ibm.com/epubs/pdf/h1269860.pdf Program Directory http://publibfp.dhe.ibm.com/epubs/pdf/i1950060.pdf Redbook – Optimizing DB2 Queries with IBM DB2 Analytics Accelerator for z/OS http://www.redbooks.ibm.com/redpieces/abstracts/sg248005.html?Open Redbook – Hybrid Analytics Solution using IBM DB2 Analytics Accelerator for z/OS V3.1 http://www.redbooks.ibm.com/redpieces/abstracts/sg248151.html?Open 39 39
40 40 40 8/20/2014