330 likes | 561 Views
Oracle 11g Data Warehousing. IOUG collaborate 08 Paper # 204. Vincent Chazhoor Data Warehouse Architect Lafarge North America Vincent_chazhoor@hotmail.com. The Pillars of Oracle Data Warehousing. Partitioning Parallel Processing Data Compression Summary Management.
E N D
Oracle 11g Data Warehousing IOUG collaborate 08 Paper # 204 Vincent Chazhoor Data Warehouse Architect Lafarge North America Vincent_chazhoor@hotmail.com
The Pillars of Oracle Data Warehousing • Partitioning • Parallel Processing • Data Compression • Summary Management
Data Compression • Introduced in Oracle9i Release 2 • Supports compression during bulk load operations only • Data modified using conventional operations not compressed • Improved performance for queries accessing large amounts of data • Fewer IOs • Efficient IO operations • Buffer Cache efficiency • Data Can be compressed at the table level or partition level • Compressing unmodified partitions are common in DW environment • Compressing data saves both storage space and CPU resources • Savings in storage space cascade to standby, backup and development, test and other database copies
Data Compression – 11g New Features • Oracle Database 11g extends compression for OLTP data • Support for conventional DML Operations (INSERT, UPDATE, DELETE) • New algorithm significantly reduces write overhead • Batched compression ensures no impact for most OLTP transactions • Typical compression ratio of 2x to 3x
Using OLTP Table Compression • Requires database compatibility level at 11.1 or greater • New Syntax extends the ‘COMPRESS’ keyword • COMPRESS [FOR {ALL | DIRECT_LOAD} OPERATIONS] • DIRECT_LOAD (DEFAULT) • Refers to Bulk load operations from 10g and prior releases • ALL • OLTP + Direct loads • Enable compression for a new table • CREATE TABLE ( ……………………. COMPRESS FOR ALL OPERATIONS • Enable only direct load compression on existing table • alter table <tablename> compress • only new rows are compressed, existing rows are uncompressed
Data Compression – 11g New Features • Fast RMAN Compression • Compresses the backup contents before writing them to disk or tape • No extra decompression is required during recovery • Performance improvement • 40% faster backup compression versus Oracle Database 10g • Improved Data pump Compression • Metadata compression available in 10g • ALL, DATA_ONLY, METADATA_ONLY, NONE options in 11g • No need to decompress before import • Single step compression of both data and metadata • Complete Data pump functionality available oncompressed file • suitable for fast, incremental daily backups
Partitioning • Partitioning is the Key functionality for managing large volumes of data in Oracle Database • One logical object and Many physical objects • Many features directly and indirectly depend on partitioning • No bitmap indexes or star transformation without partitioning! • Partition exchange is key in providing longer availability of the data warehouse and fast ETL
Oracle 11g PartitioningEnhancements • Composite Partitioning enhancements • Data is partitioned along two dimensions • A distinct value pair for the two dimensions uniquely determines the target partitioning • In 11g More types of Composite partitioning options • Enhancements to Partition Management • Virtual column based partitioning • REF partitioning • Interval partitioning • Partition advisor
Composite partitioning enhancements Oracle Partitioning – 11g Enhancements ORDERS ORDERS 200802 Cement ORDERS 200802 200801 Gypsum 200801 CAN CAN 200802 USA USA 200801 LIST-RANGECountry byGL Post Date RANGE-RANGEOrder Date by GL Post Date LIST-LISTCountry by Product Type
Range- Range Composite Partitioning • Data is coming by Invoice date and Queries are by GL_date • ETL team want to partition by Invoice_date to use Partition exchange loading and Reporting team wants to Partition by GL_Date • For the records the Invoice_date will be in one moth and GL_date in another Invoice_date Jan 2008 ... ... Feb 2008 ... ... ... ... May 2008 ... ... ... ... Mar 2008 Jan 2008 Feb 2008 Jun 2008 GL_date
Mar 2008 Range- Range Composite Partitioning • In 10g and before Range partition on one date column and Hash on other • DBAs use other work around to address the issue • In 11g the DBA can make both ETL and Reporting Teams happy Invoice_date • All records with GL_DATE in March 2008 Jan 2008 ... ... Feb 2008 ... ... ... ... May 2008 ... ... ... ... Jan 2008 Feb 2008 Mar 2006 Jun 2008 GL_DATE
Virtual Columns Oracle Database 11g introduces virtual columns • purely virtual, meta-data only • Do not take storage space • Virtual Columns are ‘Column Expressions’ • Derived from other columns in the table • Deterministic functions • Constants • Treated as real columns • Can compute statistics. • Can create Index • Eligible as partitioning key • Improved performance
Virtual Columns • Virtual Columns can be added using CREATE TABLE or ALTER TABLE commands • Eg CREATE TABLE Transaction (Accunt_number number(18) not null, Account_name varchar2(50) not null, .................... .................... Bank_branch number(2) generated always as (to_number(substr(to_char(Account_number),1,3))) partition by list (Bank_branch)
Interval Partitioning • Adding Range Partitions when needed is a challenge for DBAs • New partitions must be created on-time for new data • alter table <tablename> split partition MAX_PARTITION at (TO_DATE(' 2008-05-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN')) into (partition P200804, partition MAX_PARTITION); • “Managing Partitioned Tables While You Sleep” by David Burns is the most downloaded article from IOUG “knowledge library” and it is in “top 5 most downloaded articles” for years • Automating partition management was a request from DBAs for years and Interval partitioning finally does that
Interval Partitioning • Interval Partitioning • This is an extension to range partitioning • Automatically create equi-sized range partitions • Support Composite Partitioning • Interval, Interval-list, Interval-hash, Interval-range • Partitions are created as metadata information • Start partition is created • Other partitions are created when data arrives • Segments are allocated as soon as new data arrives • No need to add new partitions • Local indexes are created and maintained as well
Table SALES_ORDER ... ... ... Jan 2008 Feb 2008 Mar 2008 Jun 2008 Oct 2009 Nov 2009 First segment is created Interval Partitioning CREATE TABLE Sales_Order (order_date DATE, ...) PARTITON BY RANGE (order_date) INTERVAL(NUMTOYMINTERVAL(1,'month') (PARTITION p_first VALUES LESS THAN ('01-FEB-2008'); New segment is automatically allocated when data arrives INSERT INTO Sales_Order (order_date,...) VALUES ('02-JUN-2008',...);
REF Partitioning Partitioning in “Relational Database” • If a Table is partitioned and a child table of that table is not partitioned or partitioned by a different column, then that can create issues for maintenance • How do you truncate an old partition from the parent table when associated records exist in the child table? • Deleting records from the child table will be a maintenance issue Partitioned Parent Table Non-Partitioned Child Table Child Table Partitioned by different partition strategy
Partitioning in “Relational Database” • This issue is not very common in star-schema design following Kimball approach. The fact tables will not be associated with another fact table. • This is an issue in “relational” data warehouse designs following Inmon’s approach • DBAs generally try to solve this issue by partitioning the child table exactly like the parent table using the same partition key. • Many companies follow this as a standard or best practice • The child table will have the PK of the parent table as a Foreign Key but the partition key may be missing • Redundant storage of Parent Table Partition Key • Redundant maintenance
REF Partitioning • Oracle Database 11g introduces REF Partitioning • There is no need for specifically following the partition strategy of the parent table while designing child table • Child table inherits the partitioning strategy of parent table through PK-FK relationship • The partition key of the parent will not be stored redundantly in the child table • Enhanced Performance and Manageability
PartitioningAdvisor • Advises on Partitioning Strategy • Advises on partitioning method, partitioning key and interval • Partition advise integrated with advises for indexes and MVs SQL Advisor SQL Plan Tuning Access Analysis Partition Analysis SQLStructure Analysis Oracle DB
Partition Merging – No change in Behavior • In Oracle 10g and before, when two partitions are merged, data is actually moved. This is different from Partition exchange which is just a data dictionary update without any actual movement of data • There is no change to the behavior of partition merging in 11g – still there is data movement • SQL> drop table test_fct; Table dropped. • SQL> create table test_fct 2 (col1 number) 3 partition by range (col1) 4 (partition P1 values less than (200), 5 partition P2 values less than (300), 6 partition P3 values less than (400)); Table created.
Partition Merging in 11g • SQL> insert into test_fct values (100); 1 row created. • SQL> insert into test_fct values (201); 1 row created. • SQL> insert into test_fct values (305); 1 row created. • SQL> commit; Commit complete.
Partition Merging in 11g • SQL> select subobject_name,object_id, DATA_OBJECT_ID 2 from user_objects 3 where object_name = 'TEST_FCT' 4 and subobject_name is not null; SUBOBJECT_NAME OBJECT_ID DATA_OBJECT_ID ------------------------------ ---------- -------------- P1 69855 69855 P2 69856 69856 P3 69857 69857 • SQL> alter table TEST_FCT merge partitions p1,p2 into partition p2; Table altered.
Partition Merging • SQL> select subobject_name,object_id, DATA_OBJECT_ID 2 from user_objects 3 where object_name = 'TEST_FCT' 4 and subobject_name is not null; SUBOBJECT_NAME OBJECT_ID DATA_OBJECT_ID ------------------------------ ---------- -------------- P2 69856 69858 P3 69857 69857 • A new data object id is assigned to the new partition because a new segment is created for it • No change from 10g behavior
Pivot • A great feature for data Warehousing • Common Query requirement to display the data for current month, previous month and same month last year as one record • These values will be stored as different records in the database • The challenge of converting rows into columns in the report • Developers used to do this using workarounds • Pivot function will do the trick • Pivot makes this coding lot easier
Optimizer Statistics • Oracle Database 11g enhancements improve Optimizer Statistics • Oracle Database 11g makes the collection of optimizer statistics faster • The computed optimizer statistics are more thorough, providing better information to the CBO by correlating statistics, such as Number of Distinct Values (NDV) and histograms, on multiple columns. • Oracle Database 11g makes gathering statistics safer. Newly generated statistics can remain in a private ‘statistics workspace’ until they can be validated and published for general use • In Oracle 11g we can set individual thresholds on table level with DBMS_STATS. dba_tab_stat_prefs • New data dictionary view dba_tab_stat_prefs to view these thresholds. • Extended statistics on expressions can be computed in 11g exec DBMS_STATS.GATHER_TABLE_STATS(’schema’, ‘Tablename’, METHOD_OPT => ‘for all columns size 1 columns (upper(last_name)) size 5‘)
Cube Organized Materialized Views • The 11g introduces Cube Organized Materialized Views • This new feature permits standard SQL queries (such as those generated by non-OLAP aware reporting tools) to access OLAP cube data through query rewrite • Cube Organized Materialized Views had many advantages compared to conventional MVs • A Cube can replace hundreds of MVs
Q A &
Thank You • Please complete the evaluation form • Speaker Name : Vincent Chazhoor • Session name : Oracle 11g Data Warehousing • Session# : 204