1 / 76

SAP Sybase IQ 15 Technical Overview

SAP Sybase IQ 15 Technical Overview. Courtney Claussen ANALYTICS Product Management Courtney.claussen@sAP.com APRIL, 2012. AGENDA. Product Success Customer Success Market Drivers Product Overview Architecture Details Ecosystem Summary. PRODUCT SUCCESS. SAP Sybase IQ.

keanu
Download Presentation

SAP Sybase IQ 15 Technical Overview

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. SAP Sybase IQ 15 Technical Overview Courtney Claussen ANALYTICS Product Management Courtney.claussen@sAP.com APRIL, 2012

  2. AGENDA • Product Success • Customer Success • Market Drivers • Product Overview • Architecture Details • Ecosystem • Summary

  3. PRODUCT SUCCESS

  4. SAP Sybase IQ Mature, industrial strength analytic DBMS LEADERSHIP ADOPTION MOMENTUM • Industry leading performance & scale benchmarks • Recognized EDW market leader by Gartner, Forrester • Pioneering technology with 10+ patents • 4500+ installations in 2150+ accounts • ~200 new customer wins per year (last 4 years) • Consistently 96%+ customer satisfaction rates • 2 x DW market growth rate (last 4 years) • Fast paced product releases • v15, v15.1 (2009), v15.2 (2010), v15.3, v15.4 (2011) Ericsson • Sungard • nielsen • BNP Paribas • Telefonica • hmv.com • comScore • Agricultural Bank of China

  5. SAP Sybase IQ Widespread success Stands out as the leading enterprise data warehouse amongst the largest banks, insurance agencies, and telecom operators worldwide Manage and analyze statistical measures for the entire nation of Canada Analyze ALL Federal tax returns in the US Analyze complex models in more than 200 financial institutions worldwide Store and Analyze massive amounts of industry segment data in 30 of the largest information providers in the world, including Transunion, Nielsen and Axiom

  6. Sybase positioned in Leaders quadrant for Data warehouse DBMS Gartner 2012 Data Warehouse Database Management Systems Magic Quadrant This Magic Quadrant graphic was published by Gartner, Inc. as part of a larger research note and should be evaluated in the context of theentire report. The Gartner report is available upon request fromSybase. Source: Gartner Published: 6 February 2012

  7. Customer Success

  8. SAP Sybase IQ AND INFORMATION PROVIDERS Aggregating data and delivering timely answers to complex queries Timely, accurate and detailed information provided by Nielsen Media Research governs the $65 billion TV advertising industry in USA. Data is its lifeblood. Most important to Nielsen Media Research's clients is SAP Sybase IQ's ability to deliver answers to even the most complex queries 10 to 100 times faster than traditional data warehouse technologies. This enables them to sort through enormous volumes of data (the warehouse currently contains 20 terabytes of data and is growing rapidly) to get the intelligence they need to make the most effective and profitable business decisions. ”I would venture to say SAP Sybase IQ is the fastest in the industry.” —Craig Silver, senior database architect of the data technology group at Nielsen

  9. SAP Sybase IQ IN TELECOMMUNICATIONS Reliable operational data store for demanding operators Reducing the complexity of the legacy systems in one of Europe’slargest Telco operator — integrate 11 disparate and independent operating systems and handle rapid increase in data volume totaling more than70 TB and 15000 queries per day from more than 1000 active users. “The model saved between four and six times storage volume required compared to others in the market, with attendant reduction in hardware, support, administration”. —Pedro Romera, Systems Engineering Manager, Telefonica, Spain

  10. SAP Sybase IQ IN BANKING Powering robust enterprise wide analytics ICICI Bank — India's largest private sector bank with presence in 19 countries faced dramatic growth in its users, amount of data, and source stations. The increasing cost of scaling, maintenance, and mounting system unavailability posed difficulties for the bank. To resolve these recurring problems, the bank undertook a migration of the enterprise data warehouse from Teradata to SAP Sybase IQ. The success of this project provides the bank with an always available system, visibly increased query performance, and lower TCO among a host of many other benefits. “Our business requirements were addressed well with minimum infrastructure. SAP Sybase IQ is an excellent product.” —PravirVohra, Group Chief Technology Officer, ICICI Bank

  11. SAP Sybase IQ IN RETAIL Improving per customer revenue via multi-channel analysis HMV Japan K.K., supports entertainment life of everyone interested in music or videos by supplying a plethora of products, including DVDs, Blue-ray discs, books, and games via stores or on-line services. HMV set out to build an integrated analytics repository of all data from stores, eCommerce sites, recommendation engines, and marketing databases to increase its per customer revenue from a base of 2.8 million customers. The SAP Sybase IQ data warehouse contains more than 60 million itemized transaction data, and parts of which are regularly loaded into SAP Sybase IQ as necessary. Queries are handled by SAP Sybase IQ in about 30 seconds that took more than a day in the previous system. “We were able to reduce the time it took to convert click stream data collected by the new system and load it into SAP Sybase IQ by a huge margin. The processing that used to take over 20 hours is now done in about four hours. Overall performance has increased by about 40%”. —Hideki Ichikawa, information systems development manager in HMV Japan’s IT department

  12. SAP Sybase IQ IN GOVERNMENT Analyzing insurance claims data to predict and manage risk The São Paulo State Treasury Department (SEFAZ) manages financial activities across the State of São Paulo, Brazil, including taxation administration. SEFAZ needed to increase tax collection and promote fiscal justice by empowering inspectors with accurate and quick information, helping them to proactively prevent tax evasion. SAP Sybase IQ provides a data warehouse for economic, fiscal and collection information on more than 800,000 companies as well as information on 13 million vehicles. “Before the new technology was deployed, a report used to take two days to be completed. Today, this same report is available in seconds, via the Web, to all authorized users. The success of our project is directly linked to the selection of SAP Sybase IQ. This quick and powerful database has made all the difference.” —Edson Gustavo de Souza, Data Warehouse Manager and Inspector, SEFAZ

  13. Market Drivers

  14. BIG DATA ANALYTICS ISSUES Dealing with volume, variety, velocity, costs, skills Volume • Managing and harnessing terabytes of data Skills Variety Velocity Costs • Lack of adequate skills for non-standard platforms and APIs • Harmonizing silos of structured and unstructured data • Keeping up with unpredictable data and query flows • Too expensive to acquire, operate, and expand BIG DATA ANALYTICS

  15. BIG DATA ANALYTICS MATURITY From jargon to transformational business value* New Strategies & Business Models DATA Business Value* Operational Efficiencies Revenue Growth *A McKinsey study titled “Big Data: Next frontier for innovation, competition, and productivity”, May 2011, has found huge potential for Big Data Analytics with metrics as impressive as 60% improvements in Retail operating margins, 8% reduction in (US) national healthcare expenditures, and $150M savings in operational efficiencies in European economies

  16. BIG DATA ANALYTICS IN THE REAL WORLD Prevalent in data intensive verticals and functional areas BIG DATA ANALYTICS Verticals Functional • Banking • Telcom • Global Capital Markets • Retail • Government • Healthcare • Information Providers • Marketing Analytics • Digital channels • Track visits, discover best channel mix: email, social media, search • Sales Analytics • Deep correlations • Predict risks based on deal DNA (emails, meetings) pattern match • Operational Analytics • Atomic machine data • Analyze RFIDs, weblogs, SMS, sensors — continuous operational inefficiency • Financial Analytics • Detailed simulations • Liquidity, portfolio simulations — Stress tests, error margins

  17. BIG DATA Analytics + DATA WAREHOUSING Deserves a new platform Data Loading OLAP Mobile Web Operational Reporting Workflow Integrated Data Mining Specialized Apps • Volume • Velocity • Variety • Costs • Skills • Platform accessible to all business processes and all business users • Requires data and algorithms together in the platform • Distribute interactions throughout the enterprise Support massive numbers of users and workloads • MapReduce • RDBMS • EDW In-DB Analytics Analyze massive volumesof complex data from many sources HDFS Web 2.0

  18. Product Overview

  19. What is SAP Sybase IQ? High performance analytics platform Standard Language: ANSI SQL Grid Based Column Store Fast Complex Queries High Performance Data Loads Storage, User Scalability Non-relational Data Support Standard Connectivity: ODBC, JDBC, OLE-DB, PERL, PYTHON, PHP Schema Independent: 3NF, Star, Flat Inside: Architected ground up for High Performance Analytics Platform Agnostic: Linux, Unix, Windows Outside: Standards based open interfaces enables best-of-breed eco-System

  20. SAP Sybase IQ 15 A comprehensive three-tier Big Data Analytics platform Eco-System Business Intelligence Tools, Data Integration Tools, DBA Tools, Packaged Apps SAP Sybase IQ With PlexQ™ Technology Application Services In-Database Analytics, Multi-lingual Client APIs, Federation, Web Enabled Data ManagementHigh Performance, Highly Scalable, Cloud Enabled

  21. SAP Sybase IQ 15 PLATFORM Reliable data management for Big Data Analytics Full Mesh High Speed Interconnect SAN Fabric • Industry leading column store technology for speed, compression, ad-hoc analysis • PlexQ™ framework • Massively parallel processing of complex queries • Cloud enabled elastic virtual data marts for user communities • Logical and physical partitioning for information lifecycle management • Productivity focused administration and monitoring DataManagement

  22. SAP Sybase IQ 15 PLATFORM Versatile application services for Big Data Analytics Programming APIs Python • Microsoft.net • PERL • PHP • Ruby-on-rails • JAVA • C/C++ Web Services API Application Services In-Database Analytics Plug-Ins: SQL, C++, JAVA SAN Fabric • PlexQ™ framework • Comprehensive ANSI SQL including ANSI 2008 specifications • In-database analytics plug-ins: SQL, C++, JAVA for data mining, statistical analysis • In-database web services with SOAP API • Query and data federation via SQL queries • Multi-lingual client APIs: C, JAVA, PHP, PERL, Python, Ruby-on-rails, ADO.NET

  23. SAP Sybase IQ 15 PLATFORM Rich eco-system for Big Data Analytics Source Answers Data Usage Data preparation Eco-System SAP Sybase IQ Data Warehousing Event Processing Data Modeling / Database Design Tool • Certified business intelligence tools: SAP BusinessObjects BIP 4.0, Cognos, Microstrategy,.. • Certified data integration tools: SAP BusinessObjects DS 4.0, Informatica, Syncsort, …. • Certified data mining tools: KXEN, SAS, SPSS, Qyte, … • Certified application tools: ZEND, Quest, Alteryx, BMMSoft, …. • Certified DBA tools: Sybase,Bradmark, Whitesands, Symantec, EMC, ….

  24. Architecture Details

  25. SAP Sybase IQ Key building blocks SAP Sybase IQ 15 Engine Web Enabled Analytics Communications & Security LoadingEngine Query Engine Admin & Monitoring Framework Multiplex Grid Architecture Information Lifecycle Management Text Search In-Database Analytics Column Indexing Sub-system Industry leading scale & performance Column StorageProcessor Storage Area Network

  26. COLUMN STORE PROCESSOR Introduction • Key characteristics • Data is stored vertically — Each column is stored separately • The data is the index • Large page sizes (128K — 512K) • Persistent row identifiers • Bitmap driven Key Benefits • Unsurpassed concurrent, mixed workload performance, storage efficiencies • Allows queries and updates to only access referenced columns and pages • Single data type and domain per page greatly enhances the effectiveness of dictionary-based compression • Large page size makes better use of modern disk and I/O subsystems • Allows queries to evaluate multiple predicates on the same table using index-based access methods • Allows rows to be uniquely identified without dragging all primary keys (e.g. for DELETES) • Bitmaps enable significant efficiencies — compact representation, easy to horizontally partition, intermediate results, allows perfect prefetch of rows avoiding cache misses and LRUs

  27. COLUMN STORE PROCESSOR Leading data compression technology Page level compression LZW compression with implicit dictionary Enumerated fast projection indexes Distinct values stored in vector Column stores vector ordinals Bitmaps ROWIDs stored in multiple formats Format based on locality of ROWIDs Column 2 1 Ordinal Value 1 1 “Wide Data” 2 2 “Wider Data” 1 1 1, {3-99}, {200-999}, {1295: 10001110101} Single Ranges Bit vector

  28. COLUMN INDEXING SUB-SYSTEM Powerful performance booster Key characteristics • Most columns will have at least one index • Index selection decisions based on column cardinality (number of unique values) • Indexes and columns are stored separately • Multiple indexes used to resolve a query • Indexes are self maintaining • No optimizer statistics to update • Indexes are compressed • Index advisor demystifies index selection

  29. COLUMN INDEXING SUB-SYSTEM Optimized Fast Projection (FP) indexes is the column store itself Unique values for a column stored in a lookup table and optimized FP requires 1, 2, or 3 bytes based on cardinality Optimized FP indexes covers a broad spectrum of data (> 3 bytes data) FP(1) = 2^8 = 256 unique values FP(2) = 2^16 = 65,536 unique values FP(3) = 2^24 = 16,777,216 unique values Reduced storage for columns with cardinality between 65,536 and 16,777,216 Reduced I/O Writes during data loading Reads during query processing Improved query processing Reduced memory in query execution as the expansion of FP’s is delayed More vector processing FP(3) More complete optimizer statistics on column data distributions and counts Fast project indexes

  30. COLUMN INDEXING SUB-SYSTEM Low fast indexes Low Fast (LF) indexes filters low cardinality fields with bitmaps Sample query:select count(*) from customers where state = ‘NY’ and class = ‘A’ • Filter low cardinality fields with bitmaps • Bit position correspond to fixed row ID • Bitmaps further reduce the amount of data read • Small number of bits rather than entire field • Bit algebra — ANDing and ORing bitmaps is very efficient with today’s processors • Note that even vertically stored data is not read

  31. COLUMN INDEXING SUB-SYSTEM High Group (HG) Indexes help with high cardinality data sets HG index characteristics Useful for high cardinality (> 1000 unique values) Ideal for GROUP BY, COUNT (DISTINCT), SELECT DISTINCT Frequently used as primary key and in multiple columns Examples: Zip codes, customer ids HG index structure B-tree structure High group index points to an array of record ids High group indexes

  32. COLUMN INDEXING SUB-SYSTEM High Non-Group (HNG) Indexes is ideal for data with large number of values stored in binary form Data sliced vertically so each bit position can be manipulated separately Many bit positions are either all on or all off so no storage space is required System only needs to store mixed bitmaps (1s and 0s) Typical storage is 10-20% of size of raw data High non-group indexes

  33. COLUMN INDEXING SUB-SYSTEM Word index The like query operator will call the Word Index as in the predicate below where company_namelike ‘% Sybase %’ Text index The contains query operator will call the Text Index as in the predicate below where contains (TextCol, ‘a AND NOT b’) Compare index is an index on relationship b/w two columns Stores comparison bitmap of (<, >, or =) of its two columns Date / Time / DateTime indexes Range searches Datepart searches Other indexing techniques

  34. COLUMN INDEXING SUB-SYSTEM Multiple indexes in query processing Example select sum(sales) from customers where state = ‘NY’ and class = ‘A’ SAP Sybase IQ will use the LF indexes to filter rows and then apply to HNG to compute the sum Minimal amount of data is read to resolve the query Using multiple indexes • Index advisor: • Run sample queries and index advisor provides index building advice • Index building is a relatively simple exercise

  35. SAP Sybase IQ Key building blocks SAP Sybase IQ 15 Engine Web Enabled Analytics Communications & Security LoadingEngine Query Engine Admin & Monitoring Framework Multiplex Grid Architecture Information Lifecycle Management Text Search In-Database Analytics Column Indexing Sub-system Industry leading scalable Multiplex grid for concurrent and MPP analytics Column StorageProcessor Storage Area Network

  36. MULTIPLEX GRID Architecture overview Shared Interconnect Full Mesh Interconnect Shared CPU, Memory Shared SAN Fabric SAN Fabric Shared Storage • Compute nodes • Can be 86/BM Power/HP Itanium/Solaris Sparc servers • Each node can have a different CPU core count, memory size • One node is designated as the coordinator for meta data management • Any node can be a write node • Any node can be a read node • Shared storage • Can be SSD, FC, SATA or a combination of all • Shared storage can be physical SAN with FC and/or iSCSI • Shared storage can be virtual SAN over DAS • Storage can be logically grouped with compute • Storage can be tiered for data aging

  37. MULTIPLEX GRID Architecture usage: workload balancing Foundry Server Iron Cisco 11501 Open Source software IP Load Balancer Full Mesh Interconnect Automatic Workload Balancer • Highly efficient mixed workload balance • Built in workload balancer • All work units participate in a democratic fashion i.e. they initiate participation when free • Makes task executions self-throttling • If work units fail, leader picks up and completes work • Workloads can be isolated by logical groupings via logical servers • External IP workload balancer • Client side job submission may be done via external IP load balancers • External load balancers initiate based on node utilization or on a round robin basis

  38. MULTIPLEX GRID Architecture usage: elastic virtual data marts Full Mesh High Speed Interconnect Virtual Shared CPU, Memory Logical Server 1 Logical Server 2 VDM1 VDM2 Shared SAN Fabric Virtual Shared Storage • Virtual data marts • VDM is logical binding of mutually exclusive nodes, memory, storage • Logical server (LS) is a mutually exclusive logical binding of nodes, memory • Logical Server (LS) is a subset of VDM • Bindings are elastic i.e. they can dynamically grow/shrink • VDM and works via login permission control • VDM can isolate applications, workload, user communities • DQP within VDM boundaries only

  39. SAP Sybase IQ Key building blocks SAP Sybase IQ 15 Engine Web Enabled Analytics Communications & Security LoadingEngine Query Engine Admin & Monitoring Framework Multiplex Grid Architecture Information Lifecycle Management Powerful bulk, trickle, client & server side loading Text Search In-Database Analytics Column Indexing Sub-system Column StorageProcessor Storage Area Network

  40. ROBUST LOAD ENGINE SAP BusinessObjects ETL Scale out Scale out ETL project 1 ETL project 1 ETL project 1 • Loading can be from multiple modes • Parallel load processing for both bulk (from files, remote databases) and incremental bulk loads • Load rates in excess of 250 GB/hr is common even with modest size hardware nodes • Continuous / trickle feed via microbatching (change data capture) • Page level snapshot versioning • No locking (only table-lock in-memory catalogs) — allows non-blocking concurrent loads/queries • Load from client machines Full Mesh Interconnect SAN Fabric

  41. CONTINUOUS REAL TIME LOADS Using SybaseReplicationServer SAP Sybase IQ ASE / Oracle Primary RS Replicate RS 1. Outbound queue 2. Read transaction from transaction cache and grouped 3. Compile a group 4. Apply group DSI module Transaction cache Group Compile Apply CDB

  42. SAP Sybase IQ Key building blocks SAP Sybase IQ 15 Engine Web Enabled Analytics Communications & Security Versatile, parallel processing & workload aware query processor LoadingEngine Query Engine Admin & Monitoring Framework Multiplex Grid Architecture Information Lifecycle Management Text Search In-Database Analytics Column Indexing Sub-system Column StorageProcessor Storage Area Network

  43. QUERY APIs • Pure ANSI SQL based • Covers SQL-92, SQL-99, SQL-2003, SQL-2008 specifications with few restrictions • Extensive OLAP support • Full text search support • Stored procedures • Extensive support — both ANSI SQL and T-SQL • Security enabled — access control, access grant • Customizable — login password verification logic • Customized for timed execution • Configurable, event based/scheduled execution of active content • Triggers not supported • Multi-lingual clients • Callable from JAVA, C, C++, PHP, PERL, Python, Ruby, ADO.Net

  44. QUERY PROCESSING STRENGTHS 4. Ordering 3. Grouping • Versatile query engine • Cost based, volcano model • Highly parallel plans: segregated tuple streams with parallel data flow, termination of parallel streams — both within SMP and across Multiplex Grid • Many access paths to indexes and columns • Concurrent querying aware, elastic CPU/memory usage, delayed projection • Many join types supported e.g. NestedLoop, Hash, SortMerge, Pushdowns with Bloom filters,.. 2. Joining 4 Quad-core sockets Main Memory T1 T2 T3 T4

  45. SAP Sybase IQ Query processing strengths: distributed query processing Query 1 5 node DQP Query 2 4 node DQP SAN Fabric • Massively parallel processing • Leader node: Receives and initiates queries, including UDFs • Any node can be a leader, one leader per query, many concurrent leaders possible • Leader node may satisfy query within itself • Worker node: Nodes picking up work units from leader • Many worker nodes per query, same worker node can serve multiple queries • Worker nodes are enlisted only if leader cannot satisfy query on its own

  46. QUERY PLAN ANALYZER Connector Width Varies With Row Count ToolTip for Estimated Row Count Double Bar Indicates Parallel Dataflow ToolTip for Max Thread Count Node Depth Varies with Max Thread Count Graphically rich query plan analyzer

  47. SAP Sybase IQ Key building blocks SAP Sybase IQ 15 Engine Web Enabled Analytics Communications & Security LoadingEngine Query Engine Comprehensive built-in full text search and analysis Admin & Monitoring Framework Multiplex Grid Architecture Information Lifecycle Management Text Search In-Database Analytics Column Indexing Sub-system Column StorageProcessor Storage Area Network

  48. TEXT SEARCH AND ANALYSIS • Enable analytics on textual data + structured relational data • Text Index — SQL based on terms/phrases, prefix, proximity, scoring • Interface to plug in 3rd party Document converters or Term Breakers • SAP BusinessObjects Data Services text analytics library • 3rd party text analytics library Optional 3rd party pre-filter/ Text Segmentation or Entity Extractors Text Search: Email-Archiving, E-discovery, E-library Text Analytics: Fraud detection, Risk analytics, News feed analysis Text Mining: Clustering, categorization, sentiment analysis

  49. TEXT SEARCH AND ANALYSIS • File ingestion into blob, clob • Text Load Example SAP Sybase IQ Table Full Text Queries Text Index Visualization • Categorization • Tokenization • Hierarchical to relational • Filtering to plain text, formatting • Entity • Extraction • Schema Transform • Text Filtering Full Text Queries SELECT * FROM myTable WHERE CONTAINS (TextCol, ‘d’); — returns rows SELECT * FROM myTable CONTAINS (TextCol, ‘d’); — returns rows and scoring SELECT * FROM myTable WHERE CONTAINS (TextCol, ‘a AND NOT b’); — Boolean SELECT *FROM myTable WHERE CONTAINS (TextCol, ‘a NEAR b’); — proximity

  50. SAP Sybase IQ Key building blocks SAP Sybase IQ 15 Engine Web Enabled Analytics Communications & Security LoadingEngine Query Engine Admin & Monitoring Framework Multiplex Grid Architecture Information Lifecycle Management Text Search In-Database Analytics Super fast In-kernel quantitative analytics Column Indexing Sub-system Column StorageProcessor Storage Area Network

More Related