330 likes | 596 Views
Maximize WebFOCUS Performance with Hyperstage. Apr, 2012. Agenda. Introduction to Hyperstage How does it work Recent results Demonstration Wrap Up and Q&A. Introducing Hyperstage. WebFOCUS Hyperstage Why?. Why Do BI Applications Fail? Typically 3 Reasons…. 1. Too Complicated
E N D
Agenda • Introduction to Hyperstage • How does it work • Recent results • Demonstration • Wrap Up and Q&A
WebFOCUS HyperstageWhy? Why Do BI Applications Fail? Typically 3 Reasons…. 1. Too Complicated Self-Service, Guided Ad hoc 2. Bad Data Data Quality 3. Too Slow Hyperstage Hyperstage will improve database performance for WebFOCUS applications with less hardware, no database tuning and easy migration.
What is WebFOCUSHyperstage • Embedded, columnar data store that can dramatically increase the performance of WebFOCUS applications • Columnar = reduced I/O (vs relational) • Easily implemented without the need for database administration • Disk footprint is reduced with a powerful compression algorithm • Includes embedded ETL for seamless migration of existing analytical databases • No change in query or application required • Data migrations are seamless and easy • WF 7.7.03M and higher includes optimized Hyperstage Adapter • Runs on commodity hardware (Intel based) • Windows 64 • Linux (Redhat, Centos, Suse, Debian)
Introducing WebFOCUSHyperstage …. Hyperstage is an integrated columnar oriented data store that helps WebFOCUS applications achieve outstanding query performance.
WebFOCUSHyperstage Engine How does it work? Column Orientation • Smarter Architecture • No maintenance • No query planning • No partition schemes • No DBA Knowledge Grid–statistics and metadata “describing” the super-compressed data Data Packs – data stored in manageably sized, highly compressed data packs Data compressed using algorithms tailored to data type
Pivoting Your Perspective: Columnar Technology Employee Id Name Location Sales 1 Smith New York 50,000 2 Jones New York 65,000 3 Fraser Boston 40,000 4 Fraser Boston 70,000 Data stored in rows Data stored in columns 1 Smith New York 50,000 1 Smith New York 50,000 2 Jones New York 65,000 2 Jones New York 65,000 3 Fraser Boston 40,000 3 Fraser Boston 40,000 4 Fraser Boston 70,000 4 Fraser Boston 70,000
Data Organization and the Knowledge Grid …. Data Packs - The data within each column is stored in groupings of 65,536 values called Data Packs • Data Packs improves data compression as the optimal compression algorithm is applied based on the data contents • An average compression ratio of 10:1 is achieved after loading data into Hyperstage. For example 1TB of raw data can be stored in about 100GB of space. Data Pack Data Pack Data Pack Data Pack Data Pack Data Pack
Data Organization and the Knowledge Grid …. Data Packs and Compression • Data Packs • Each data pack contains 65, 536 data values • Compression is applied to each individual data pack • The compression algorithm varies depending on data type and data distribution 64K 64K • Compression • Results vary depending on the distribution of data among data packs • A typical overall compression ratio seen in the field is 10:1 • Some customers have seen results have been as high as 40:1 64K • Patent Pending • Compression • Algorithms 64K
Data Organization and the Knowledge Grid …. Pack Row 1 The Knowledge Grid Knowledge Nodes Column A Column B Global Knowledge Pack Row 2 String and character data Built during LOAD Pack Row 3 Numeric data Pack Row 4 Distributions Pack Row 5 Pack Row 6 Built per-query e.g. for aggregates, joins Dynamic Knowledge
Data Organization and the Knowledge Grid …. Data Pack Nodes (DPN) A separate DPN is created for every data pack created in the database to store basic statistical information Character Maps (CMAPs) Every Data Pack that contains text creates a matrix that records the occurrence of every possible ASCII character Histograms Histograms are created for every Data Pack that contains numeric data and creates 1024 MIN-MAX intervals. Pack-to-Pack Nodes (PPN) PPNs track relationships between Data Packs when tables are joined. Query performance gets better as the database is used. This metadata layer = 1% of the compressed volume
WebFOCUSHyperstage Example: Query and Knowledge Grid salary age job city SELECT count(*) FROM employees WHERE salary > 50000 AND age < 65 AND job = ‘Shipping’ AND city = ‘Toronto’; All values match Completely Irrelevant Suspect
WebFOCUSHyperstage Example: salary > 50000 salary age job city SELECT count(*) FROM employees WHERE salary > 50000 AND age < 65 AND job = ‘Shipping’ AND city = ‘Toronto’; Find the Data Packs with salary > 50000 All values match Completely Irrelevant
WebFOCUSHyperstage Example: age<65 salary age job city SELECT count(*) FROM employees WHERE salary > 50000 AND age < 65 AND job = ‘Shipping’ AND city = ‘Toronto’; Find the Data Packs with salary > 50000 Find the Data Packs that contain age < 65 All values match Completely Irrelevant Suspect
WebFOCUSHyperstage Example: job = ‘shipping salary age job city SELECT count(*) FROM employees WHERE salary > 50000 AND age < 65 AND job = ‘Shipping’ AND city = ‘Toronto’; Find the Data Packs with salary > 50000 Find the Data Packs that contain age < 65 Find the Data Packs that have job = ‘shipping’ All values match Completely Irrelevant Suspect
WebFOCUSHyperstage Example: city = ‘Toronto salary age job city SELECT count(*) FROM employees WHERE salary > 50000 AND age < 65 AND job = ‘Shipping’ AND city = ‘Toronto’; Find the Data Packs with salary > 50000 Find the Data Packs that contain age < 65 Find the Data Packs that have job = ‘shipping’ Find the Data Packs that have city = ‘Toronto’ All values match Completely Irrelevant Suspect
WebFOCUSHyperstage Example: Eliminate Pack Rows salary age job city SELECT count(*) FROM employees WHERE salary > 50000 AND age < 65 AND job = ‘Shipping’ AND city = ‘Toronto’; All packs ignored All packs ignored Find the Data Packs with salary > 50000 Find the Data Packs that contain age < 65 Find the Data Packs that have job = ‘shipping’ Find the Data Packs that have city = ‘Toronto’ Eliminate All rows that have been flagged as irrelevant All packs ignored All values match Completely Irrelevant Suspect
WebFOCUSHyperstage Example: Decompress and scan salary age job city SELECT count(*) FROM employees WHERE salary > 50000 AND age < 65 AND job = ‘Shipping’ AND city = ‘Toronto’; All packs ignored All packs ignored Find the Data Packs with salary > 50000 Find the Data Packs that contain age < 65 Find the Data Packs that have job = ‘shipping’ Find the Data Packs that have city = ‘Toronto’ Eliminate All rows that have been flagged as irrelevant Finally we identify the pack that needs to be decompressed All packs ignored Only this pack will be de-compressed All values match Completely Irrelevant Suspect
POC Results (Internal Use Only) • Insurance Company • Query performance issues with SQL Server - Insurance claims analysis • 3 day POC - Compression achieved 40:1 • Most queries running 3X faster in Hyperstage • Large Bank • Query performance issues with SQL Server - Web traffic analysis • 3 day POC -Compression achieved 10:1 • Queries than ran for 10 to 15 mins in SQL Server ran sub-second in Hyperstage • Government Application • Query performance issues with Oracle – Federal Loan/Grant Tracking • 3 day POC -Compression achieved 15:1 • Queries than ran for 10 to 15 mins in Oracle ran in 30 secs in Hyperstage POCs can typically be completed with 3 days
Beyond WebFOCUS • Hyperstage is integrated in the WebFOCUS BI Architecture through the reporting server and is administered using the WebFOCUS console • WebFOCUS client applications communicate directly through the reporting server • Custom applications developed via Java or .Net can access the reporting server via WebFOCUS services and a supplied WebFOCUS connector • Hyperstage also supports connections from any application via industry standard JDBC or ODBC connections. There are also native drivers for .NET, C, or PHP applications to connect directly to the Hyperstage engine. • Data can be loaded and maintained in Hyperstage using iWay Data Integration or using any commercial ETL tool. Generic App Java C .Net PHP Perl WebFOCUS Client WebFOCUS Reporting Server WebFOCUS Hyperstage Server WFHyperstage Adapter Java WF Connector WF Service .Net
Hyperstage vs. OLAP • Many companies are looking to migrate from legacy OLAP solutions • Hyperstage can offer excellent query performance with a commonly understood star pattern database • WebFOCUS can offer navigation and drill path navigation • Hyperstage can support large numbers of dimensional attributes and can be easily updated
Hyperstage vs. In-Memory • WebFOCUSHyperstage is a viable alternative to BI tools that utilize an in-memory architecture like QlikView, Tableau, Cognos TM1 and Tibco/Spotfire • In-memory is limited to the amount of data you can store in RAM. • Hyperstage is a hybrid approach that efficiently uses disk I/O without sacrificing the performance achieved by in-memory • Tableau for example has approximately a 100GB limit on its in-memory cache.
NYSE Daily Stock Price History • Downloaded from internet daily history from 1970 to 2006 for 7000 stocks • 14 million rows • 1.4GB of raw data • Compressed to 70MB • Test query summarizes stock information for top tech companies in March 2000 and compares the information for the same period in March 2002 (dot com collapse) • Note: Hyperstage running on a Dell laptop 1 duo core processor with 4GB of RAM
NYSE Daily Stock Price History (exploded) • Simulated additional stock prices up to 2043 • 2 billion rows • 200GB of raw data • Compressed to 17GB • Test query summarizes stock information for top tech companies in March 2000 and compares the information for the same period in March 2002 (dot com collapse)
WebFOCUSHyperstageThe Big Deal… • No indexes • No partitions • No views • No materialized aggregates • Value proposition • Low IT overhead • Allows for autonomy from IT • Ease of implementation • Fast time to market • Less Hardware • Lower TCO No DBA Required!