710 likes | 938 Views
Business Intelligence & Data Warehousing Session April 2007. Robert Stackowiak, Vice President, Business Intelligence, Oracle Corp. Business Intelligence Today. Provides reporting and analyses capabilities Can deliver timely, accurate, and useable information
E N D
Business Intelligence & Data Warehousing SessionApril 2007 Robert Stackowiak, Vice President, Business Intelligence, Oracle Corp.
Business Intelligence Today • Provides reporting and analyses capabilities • Can deliver timely, accurate, and useable information • Enables better strategic and tactical decision making
“In the next five years, Business Intelligence as we know it will disappear from view. Apart from the important role of analytical BI to find new insight, operational BI will become a part of the core fabric of businesses - embedded into business processes with relevant insight available to everyone in the organization when they need it, where they need it.” Neil Raden, BI Megatrends, Intelligent Enterprise 2006
Spectrum of BI Functionality Visualize Analyze Publish Integrate Data Model Collaborate Monitor Secure & Manage Compare Predict Sense Respond
SOA Management Packs Identity Management/Security End-to-End Business Intelligence
BI Community / LOBs • Information Access • Being a catalyst for business change through rapid time to benefit • Heterogeneous & ubiquitous access • Information consumption represents value • Report centric approach • Change managed through product • Starting point is often BI Applications with well understood model DW Community / IT • Managing information over the longer term • Managing through business change • Single version of the truth • Information management represents value • Long term architected approach • Model centric approach • Starting point is probably a blank sheet of paper Why BI still so Difficult?IT & LOB and Competing Agendas
BI Focus Areas in Key Verticals • Financial • Cross-sell, targeted marketing • Manufacturing • Supply chain analysis / just in time inventory • Retail • Supply chain analysis / inventory control • Media • Revenue across multiple distribution channels • Transportation • Optimized logistics management • Healthcare • Cost containment / quality of care • Civilian Government • Budget spend and reporting to constituents • Military & Intelligence • Logistics management, threat analysis
Solution Development & Presentation Approval & Budget Organization Strategy Assessment & Scope Discovery Selling the Project • Pick a strategic business area • Should have potentially highly visible positive ROI • Build business / IT partnership • Limit scopes, deliver successes • Leverage previous work and data models if possible • Demonstrate business value in 90 – 180 day increments • Continue building on success
So, what is so hard? • Understanding business requirements & prioritizing • Getting through sponsorship & politics • Getting enough information to design and size the platform • Understanding the corporate vision well enough to include planning for future growth and requirements • Finding architects who can uncover business and technical requirements and design solutions • Unforeseen developments
Finding BenefitsSomewhat motivating – often IT led • ROI = Present Value of Benefits - Costs • Increasing ROI for information producers by • Reducing current costs of analysis for LOBs • Empower LOBs with tools and applications and automated analysis rather than manually creating custom reports • Reducing ongoing costs of analysis for IT • Faster and simpler development of analytic applications • Centralized management
Finding BenefitsThe big wins… led by the business • ROI = Present Value of Benefits - Costs • Increasing ROI for information consumers (LOBs) by • Enabling top-line growth • Identify new customers/markets, focus on most profitable customers, … • React faster than the competition • Enabling bottom-line savings • Optimize operations, financial management, … • Less time spent on gathering data; more time on analysis
Finding BI Opportunity???? St. Louis-area baby boomers mirror national trend of moving to smaller homes KRTBN Knight-Ridder Tribune Business News - St. Louis Post-Dispatch - Missouri22 November 2004 Unified Building Codes Offer Nationwide Growth Opportunities for Homebuilding Sector HomebuilderStocks.com Exclusive ArticleMarket Wire Incorporated17 November 2004 Housing caters to aging boomers in Cary, N.C. Knight-Ridder Tribune Business News07 November 2004 Volatile material prices and supplies plague San Antonio homebuilders, buyers. Knight-Ridder Tribune Business News02 November 2004
Oracle BI Products • Integrated Business Intelligence Database • Business Intelligence Functions Integrated in Database • Integrated Business Intelligence Tools • Best-of-Breed, Next Generation Business Intelligence Infrastructure • Integrated Analytic Applications • Enterprise Wide, Industry Specific BI and Corporate Performance Applications
Oracle & “Hot Pluggable”Open and best of breed Portals Analytic Tools Reporting Desktop Tools Any JSR 168 Portal BOBJ, COGN MSTR Actuate, Oracle Apache FOS Excel, Outlook, Lotus Notes Oracle BI Server Security Data Access ETL Oracle MSFT AD iPlanet Novell Others .. Oracle DB2 MS SQL Teradata Red Brick SQL Anywhere XML, Excel Others .. Oracle Informatica Ascential Sunopsis Others ..
Current Trends • Consolidation of data marts • Deployment of enterprise-wide data on lower cost platforms • Leveraging of applications to speed deployment
Vendor B Marketing Vendor A HR Vendor C SFA Vendor D CRM Typical IT Infrastructure Home Grown Systems A, B, C ….. and more. Home Grown Customer Hub Vendor E ERP e.g. Manufacturing, Order Management & Advanced Scheduling Data Warehouse & Business Intelligence Where is the source of truth?
Data Warehouse Consolidation? Global Companies with more than 1,000 users No Don’t Know Yes Source: 2005 ITtoolbox Data Warehouse Survey
Consolidation Considerations • Data definitions and politics • Hybrid schema and complexity • Advanced security • High availability • Information Lifecycle Management • Appearance of new data marts
Analytics Orchestration Master Data Master Data EAI Master Data DW ETL DW DW Transactional Data Analytical Data EAI Evolving Architecture Data Marts Reporting Applications Data Warehouse Sales Marketing Self Service DW Call Center Business Intelligence Financials Providers Eligibility Claims ETL
2003 Survey 2005 Survey 2001 Survey SBC Teradata 10.50 First Union Informix 4.50 Dialog Proprietary 4.25 Telecom Italia DB2 3.71 FedEx Teradata 3.70 Office Depot Teradata 3.08 AT & T Teradata 2.83 SK C&C Oracle 2.54 NetZero Oracle 2.47 Telecom Italia Informix 2.32 France Telecom Oracle 29.23 AT&T Daytona 26.27 SBC Teradata 24.81 Anonymous DB2/Unix 16.19 Amazon.com Oracle 13.00 Kmart Teradata 12.59 Claria Oracle 12.10 HIRA SybaseIQ 11.94 FedEx Teradata 9.98 Vodafone Teradata 9.91 Yahoo! Oracle 100.38 AT&T Daytona 93.87 KT-IT Group DB2/Unix 49.39 AT&T Daytona 26.71 LGR-Cingular Oracle 25.20 Amazon.com Oracle 24.77 Anonymous DB2/Unix 19.65 UPSS Microsoft 19.46 Amazon.com Oracle 18.55 Nielsen Media SybaseIQ 17.68 Winter VLDB SurveyLargest Database Size*, Decision Support + Overall *Database Size = Data + Indexes
Real Application Clusters • Scalable & Manageable Clusters • Cache Fusion architecture • Vendor Independent Architecture • Linux, Windows, HP, HP Alpha, IBM AIX, Sun Solaris • Integrated ‘clusterware’ in Oracle 10g • Transparent application failover
Amazon.com Financial Analysis Supply Chain Planning Apps Fulfillment Centers Data Warehouse Partners Web Servers Customers Internal End Users Personalization Campaign Generation Email Servers Content Servers
Oracle & High Availability Solutions Grid Clusters Computer Failures UnplannedDowntime ASMFlashbackRMAN & Flash Recovery AreaH.A.R.DData Guard Data Failures Oracle MAA Best Practices System Changes Online ReconfigurationRolling Upgrades PlannedDowntime Data Changes Online Redefinition
Amazon.com DW Growth*Data volumes Data Volume has grown 2x year over year for the past 4 years.Projecting higher growth in upcoming years*: - additional lines of business / product lines supported - huge standard reporting growth with more partners supported
Amazon.com Data Warehouse GrowthUsers & Query growth About 3 concurrent queries per node (8 way parallel queries)
Amazon DW Modular Architecture Oracle10g RAC Amazon’s RAC is so cost-effective they run 2 concurrently and still save money!! 1. Extract from source systems 2. Integrate, transform, and denormalize 3. Query and analyze 4. Data access and publishing Users STG1 (ETL/Staging) ADS1 Query DW (Atomic Data Store) Extract Servers DSS UI Client 16 nodes 4 cpus 8 nodes 4 cpus ETL Manager 2nd pair of identical RAC clusters means ‘no need for backup’ for active online data STG2 (ETL/Staging) ADS2 Query DW (Atomic Data Store) 16 nodes 4 cpus 8 nodes 4 cpus
Oracle Information Appliance Foundation • You determine the potential size and workload type for the data warehouse • You determine possible initial sizing possibilities • IBM, HP, EMC, Panta, and Sun • Platform vendors determine final configuration http://www.oracle.com/solutions/business_intelligence/oiai.html
Customers CxO Process Workers Managers Sales Teams Partners Suppliers Packaged Analytic Applications Custom Analytic Applications Fusion BI Infrastructure and Middleware Blueprint for Greater Business InsightOne Technology for Packaged & Custom Analytic Apps Oracle Hot-Pluggable BI Other Operational & Analytic Sources
Oracle Database EvolutionScalability and Analytics • Cost-based optimizer (Oracle7) • 64-bit SMP and NUMA support (Oracle7) • Static bit-mapped indexing (Oracle7) • Parallel bit-mapped star joins (Oracle8) • Materialized Views (Oracle8i) • Real Application Clusters (Oracle9i) • SQL Analytic Functions (Oracle8i, Oracle9i, Oracle 10g) • OLAP Services (Oracle9i, Oracle 10g) • Data Mining Algorithms (Oracle9i, Oracle 10g)
Oracle ParallelizationIndependent of Partitioning Parallel Query Parallel DDL Parallel DML SQL*Loader
Oracle Multi-Version Read ConsistencyRow Level Locking • Readers don’t lockout updates and updates don’t lockout readers • Reports see only committed data via Multi-Versioning • Queries yield maximum throughput with correct results - no waiting and no dirty reads! • Row locks never escalate - the most scaleable solution available report Budget Table update Rollback Segment Before Image update accurate report
Star Schema Channel Geography Region District Branch Sales Distributor OEM Fact Table: Sales Transactions Time Product Year Quarter Month Type Size Color
Index on COLOR 0 1 0 0 1 0 0 1 0 0 1 0 color = ‘BLUE’ 0 1 0 1 0 0 1 0 1 0 1 0 0 0 1 0 color = ‘RED’ 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 1 color = ‘GREEN’ 1 0 0 0 1 1 0 0 0 1 0 1 0 1 0 0 size = ‘SMALL’ 0 0 1 0 1 1 0 1 0 1 0 0 0 1 0 1 size = ‘MED’ 1 0 0 0 0 0 1 0 0 0 0 1 0 1 0 0 size = ‘LARGE’ 0 1 0 1 0 0 0 0 1 0 1 0 1 0 1 0 Oracle Bit-Mapped Indexes partno color size weight PARTS table SELECT count(*) FROM parts WHERE size = ‘MED’ a AND color = ‘RED’ 001 GREEN MED 98.1 002 REDMED 1241 003 REDSMALL 100.1 004 BLUELARGE 54.9 005 REDMED 124.1 006 GREENSMALL 60.1 ... .... ..... ... Index on SIZE
Oracle Parallel Bitmap Star-Joins • Dramatic Performance Gains • Innovative use of bitmap indexes • Complex Star Schemas • Multiple fact tables • Many dimensions • Unconstrained dimensions • Large dimension tables • Optimized for sparse fact tables • Parallel execution
Materialized ViewsSummary Management • Stored summaries are maintained, for faster warehouse query processing • Optimizer transparently redirects queries to use summary, instead of detail data • Intelligent, robust query rewrite capabilities SALES PER MONTH BY REGION (2 GB) What were sales in the West and South regions for the last three quarters? x SALES (30 GB)
Materialized ViewsAutomatic Query Rewrite SELECT p.brand, r.country, t.month, SUM(s.amt) tot_sales FROM sales s, region r, time t, product p WHERE s.region_id = r.region_id AND s.sdate = t.curdate AND s.prod_code = p.prod_code GROUP BY p.brand, r.country, t.month HAVING SUM(s.amt) > 5000000; Optimizer automatically rewrites query on detailed data ... … to access data in the materialized view SELECT brand, country, month, tot_sales FROM sales_sumry WHERE tot_sales > 5000000;
COUNT ORDERS by PRODUCT AVG ORDERS by REGION SUM ORDERS by QUARTER Materialized ViewsData Refresh Incremental Refresh Insert Update Delete • Maintain the timeliness of stored result to preference • Several options for refreshing data in materialized views • Manage the effect of performance on operational data • No impact on load performance ORDERS Full Refresh Deferred Refresh
Data Consolidation with Oracle Oracle BI User’s tool of choice Microsoft Excel Open standards-based platform for non-Oracle tool users + native Excel support thru add-in Hybrid schema Relational & OLAP databases unite In-place data mining Oracle10g
Ranking functions Window Aggregate functions (moving and cumulative) LAG/LEAD functions Reporting Aggregate functions Statistical Aggregates Linear regression Descriptive Statistics Correlations Cross Tabs Hypothesis Testing Distribution Fitting Pareto Analysis (documented) Statistics & SQL Analytics in Oracle
Compare Profit this Quarter versus Last Quarter What is a Product’s Profit Share of its Category? CD Profit DVD Product USA Audio West SF Market Q1 Q2 Q3 Time Why Oracle OLAP is so FastData Access Method • Data stored in dense arrays • Offset addressing – no joins • More powerful analysis • Better performance
Oracle’s Complete Analysis SolutionOracle Data Mining Option • Key Algorithms • Naïve Bayes, Associations, Adaptive Bayes Network, Clustering algorithms, Support Vector Machines, Nonnegative Matrix Factorization, Decision Trees • Data Mining Tools Available • Oracle Data Miner to build applications • Popular Data Mining tools partners • SPSS Clementine • InforSense • Leverage output in other tools such as OBI EE & RTD
Oracle Database EvolutionManageability • Oracle Enterprise Manager (Oracle7) • Oracle Parallel Server high availability (Oracle7) • Range, Composite Partitioning (Oracle8, Oracle8i) • Adaptive Degree of Parallelism (Oracle8i) • Resource Manager: CPU (Oracle8i) • List Partitioning, Composite Range-List (Oracle9i) • Resource Manager: query governor, automatic queuing, reprioritization (Oracle9i) • XML Database, Virtual Private DB (Oracle9i) • Automatic DB Diagnostics Monitor (Oracle 10g) • Automatic Storage Manager (Oracle 10g)
Oracle Enterprise Manager • Browser-based console manages single server, multiple servers, or computing grid • Multi-user repository
Self Managing & Tuning • Adaptive degree of parallelism • Database Resource Manager • CPU allocation by groups of users • Proactive query governing • Automatic queuing • Dynamic re-prioritization • Automatic Database Diagnostics Monitor • Automatic Shared Memory Management • Mean Time to Recovery Advisor • Segment Advisor • SQL Access Advisor • SQL Tuning Advisor • Single Command Flashback
Automated Storage Manager (ASM) Database 1 • Portable, high performance, cluster file system & volume manager • Partitions disk into uniform MB units • Combines units into files • No intermediate volumes • Units assigned to a file are tracked using indexing techniques • Files spread across disks to balance load • Integrated mirroring File #1 File #2 File #1 Database 2
Add order table Nov 2006 Jan 2006 Feb 2006 Mar 2006 Apr 2006 Drop Local Index Other data is not affected Partitioning • Partition level management • On-line addition and removal of partitions • Data management operations (loading, index builds) • Range, hash, composite range-hash, list, composite range-list • Improved availability • Localized disk failures, backup and recovery
RANGE (sales_date) NOV 2006North NOV1998 OCT 2006 North OCT1998 SEP 2006 North SEP1998 DEC 2005 North DEC1997 NOV 2005 North NOV1997 ... LIST (geography) West West West West West ... South South South South South Composite Range-List Partitioning Range partition across time List partition across another major attribute ...
Table Compression 2-7X Compression Observed • Queries on compressed tables may observe minor performance degradation • Performance impact depends upon the query • Many queries will be faster • Compression reduces IO but increases CPU utilization • For a set of queries, performance should degrade by no more than 5% • Load and direct-path INSERT performance will be slower • Data must be compressed as it is added to the table
Encryption Snooping Tampering Authentication Management Secure Data Protection Stored Data EncryptionProtect ultra-sensitive data Virtual Private Database Oracle Label SecurityEnforce row level security Security andPrivacy Data Security Fine-Grained AuditingIdentify misuse of data access rights Auditing Encrypt All Protocols into the Database Prevent wire tapping and tampering Network Security PKI and LDAP Centralized Management, Strong Authentication Unified User Identity
VPD POLICY Where territory_id = 12 Where product_id - 22 Virtual Private DatabaseTransparent Security inside the database • Server dynamically rewrites SQL • Query modification based on PL/SQL package assigned to an object • “Where” clause appended to SQL statement SELECT * FROM orders Western Sales Rep SELECT * FROM orders orders Product Manager