800 likes | 1.2k Views
Oracle Data Archiving Taming the Beast. Dave Moore Neon Enterprise Software. Agenda. Archiving Defined Requirements and Solutions Oracle Archiving Strategies Oracle Row Removal Options Oracle Post Archive Operations. Dave. Oracle ACE Using Oracle since 1991
E N D
Oracle Data ArchivingTaming the Beast Dave Moore Neon Enterprise Software
Agenda Archiving Defined Requirements and Solutions Oracle Archiving Strategies Oracle Row Removal Options Oracle Post Archive Operations
Dave • Oracle ACE • Using Oracle since 1991 • Product Author at Neon Enterprise Software • Creator of OracleUtilities.com • Author of “Oracle Utilities” from Rampant Tech Press • Core competencies include performance, utilities and data management
Database Archiving Database Archiving: The process of removing selected data records from operational databases that are not expected to be referenced again and storing them in an archive data store where they can be retrieved if needed. Purge
Compliance Protection Amount of Data 0 30+ Yrs Time Required Trends Impacting Archive Needs Data Retention Issues: Volume of data Length of retention requirement Varied types of data Security issues
Archiving All Types of Data Paper Blueprints Forms Claims Word Excel PDF XML IMS DB2 ORACLE SYBASE SQL Server IDMS VSAM Programs UNIX Files Outlook Lotus Notes Attachments Sound Pictures Video
Data Archiving and ILM Create Discard Operational Reference Archive Needed for completing business transactions Needed for reporting or expected queries Needed for compliance and business protection Mandatory Retention Period
What Does It All Mean? • Enterprises must recognize that there is a business value in organizing their information and data. • Organizations that fail to respond run the risk of seeing more of their cases decided on questions of process rather than merit. • (Gartner, 20-April-2007, Research Note G00148170: Cost of E-Discovery Threatens to Skew Justice System)
Operational Efficiency • Database Archiving can be undertaken to improve operational efficiency • Large volumes of data can interfere with production operations • efficiency of transactions • efficiency of utilities: BACKUP/RESTORE, REORG, etc. • Storage • Gartner: databases copied an average of 6 times!
What Solutions Are Out There? • Keep Data in Operational Database • Problems with authenticity of large amounts of data over long retention times • Store Data in UNLOAD files (or backups) • Problems with schema change and reading archived data; using backups poses even more serious problems • Move Data to a Parallel Reference Database • Combines problems of the previous two • Move Data to a Database Archive
Production Database Data Extract Recall Database Archive Data Store and Retrieve Archive Data Query Access Captured Structure Archive Policies Data Retention Archive Store Data & Metadata Metadata Policies History Archive Administration Components of aDatabase Archiving Solution Data Recall
Archiving Requirements • Policy based archiving: logical selection • Keep data for very long periods of time • Store very large amounts of data in archive • Maintain Archives for ever changing operational systems • Become independent from Applications/DBMS/Systems • Protect authenticity of data • Access data when needed; as needed • Discard data after retention period automatically
Policy based archiving • Why : • Business objects are archived, not files • Rules for when something is ready can be complex • Data ready to be archived is distributed over database • Implications: • User must provide policies for when something is to be archived • How: • Full metadata description of data • Flexible specification of policy : “WHERE clause”
SUMMARY BY QUARTER DISBURSEMENT ORDER INFO STORAGE INFO Part Number Year Q1 Disbursed Q2 Disbursed Q3 Disbursed Q4 Disbursed Part Number PO Number Vendor ID Quantity Ordered Unit Cost Date Ordered Date Received Part Number Bin Number Qty on Hand Qty on Order Qty Backorder Part Number Dept. ID CHIT ID Qty Disbursed Date Disbursed For Example… Parts Master is the parent table to all other tables PARTS MASTER Part Number Type Description Unit Type Cost Price Substitute Parts
Keep Data for a Long Time • Why: retention requirements in decades • Implications: • Archive will outlive applications/DBMS/systems that generated them • Archive will outlive people who designed and managed operational systems • Archive will outlive media we store it on • How: • Unique data store • Application/DBMS/system independence • Metadata independence • Continuous management of storage • Continuous management of archive content
Maintain Archive for Changing Operational Systems • Why : • Metadata changes frequently • Applications are re-engineered periodically • Change DBMS platform • Change System platform • Replace with new application • Consolidate after merger or acquisition • Implications: • Archive must support multiple variations of an application • Archive must deal with metadata changes • How: • Manage applications as major archive streams having multiple minor streams with metadata differences • Achieve independence from operating environment
Achieve Metadata Independence • Why : • Operational metadata is inadequate • Operational metadata changes • Operational systems keep only the “current” metadata • Data in archive often does not mirror data in operational structures • Implications: • Archive must encapsulate metadata • Metadata must be improved • How: • Metadata Capture, Validate, Enhance capabilities • Store structure that encapsulates with data • Keeps multiple versions of metadata
Protect Authenticity of Data • Why : • Potential use in lawsuits/ investigations • Potential use in business analysis • Implications: • Protect from unwanted changes • Show original input • Cannot be managed in operational environment • How: • SQL Access that does not support I/U/D • Do not modify archive data on metadata changes • Encryption as stored • Checksum for detection of sabotage • Limit access to functions • Audit use of functions • Maintain offsite backup copies for restore if sabotaged
Access Data Directly From Archive • Why : • Cannot depend on application environment • Implications: • Full access capability within archive system • How: • Industry standard interface (e.g. JDBC) • LOAD format output for • For load into a database • May be different from source database • Requires full and accurate metadata • Ability to review metadata • Ability to function across metadata changes
Discard Function • Why : • Legal exposure for data kept too long • Implications: • Data cannot be kept in archive beyond retention period • Must be removed with no exposure to forensic software • How: • Policy based discard • System level function • Tightly controlled and audited • True “zero out” capability • Discard from backups as well
Database or Archive? Keep in DB Keep in Archive Performance Space Compliance
Based on Data Availability Purge Keep in DB Keep in Archive Must be Available to App Must be Available Must Be Secure Not Needed
Oracle Archiving Strategies • Designed Up Front (Yeah, right) • Determined by Application Owner • Implemented by ____________ • Utilize Oracle Features
Finding Large Tables • DBA_SEGMENTS (bytes) • DBA_TABLES (num_rows) • or based on I/O
Rolling Windows • Self Managing • Mostly based on DATE • Utilize DBMS Features • Partitioning • Transportable Tablespaces • Exchange Partition • Set tablespace read only • Expdp • Copy export file and data file
Rolling Windows via Partitioning P1 … … P47 Probably Never Accessed Rarely Accessed Heavily Accessed Data Profile Cheap as you can get Not so fast or expensive Fast, expensive Storage Profile Read / Write Read Only / Compressed Read Only
Why not use transportable tablespaces or Oracle exports for data retention?
The Problem with Oracle Files • Transportable Tablespaces • Exports • Backups Oracle Export Files & Datafiles Import Trans Tsp Version 16Z Year 2007 Year 2030 Not a good method for LT Data Retention
Partitioning (Old ways) • Range Partitioning Data is distributed based on partition key range of values – usually a date. Good When: Data is date-based.
Partitioning (Old Ways) • Hash Partitioning Uses hash algorithm to create equally sized buckets of data. Good When: No natural partition key and desire I/O balancing (hot spots).
Partitioning (Old Ways) • List Partitioning Data is distributed based on LIST of values in partition key. Good When: Have short list of values (States, Regions, Account Types)
Partitioning (New Ways – 11G) • Interval Partitioning Initial Partition is created manually, the rest are automatically created as new data arrives. Good When: Need a rolling window!
Partitioning (New Ways – 11G) • REF partitioning Related Tables benefit from same partitioning strategy, whether column exists in children or not! Good When: Desire related data to be partitioned in the same manner.
Partitioning (New Ways – 11G) • Virtual Column Partitioning Partition key may be based on virtual column Good When: Virtual column is required for partition key.
Row Removal Options • SQL DELETE • CTAS / DROP / RENAME • TRUNCATE • Row Marking
SQL DELETE • Good for small number of rows • RI handled automatically • Oracle was born to DELETE, better than any PL/SQL that you write. • Issue with Un-indexed Foreign Keys
DELETE Optimization • Work in batches, committing (only when programmatically DELETING) • Use parallel DML (Partitioned tables only) • Drop Indexes before (if possible) • Index FK columns
CTAS • Works well for PURGE, not archive • Perfect when you want to keep low percentage of rows in the table • Doesn’t handle RI – no DELETE was issued. • Process • Create table with rows you want to keep • Drop old table • Rename table • Recreate indexes create table new_table unrecoverable as select * from old_table where ...
TRUNCATE • Congratulations if your application lends itself to TRUNCATE without losing new data • What about RI? • May truncate or drop individual partitions
DROP • DROP PARTITION • What would you do before you drop it? • Exchange partition with table • Transportable tablespace.
Things to Remember • Benchmark the best way for you • Benchmark against real data if possible • Use parallel DML
Design Summary • Create an architecture that lends itself to aging, archiving, deleting • This architecture should compensate for business requirements • For instance, customer orders not accessible after 6 months … or • top query performance needed for all ‘ACTIVE’ accounts … etc • Implement it – THE EASY PART
Post Archive Challenges “I have successfully deleted 10 billion rows from the table. HoooAhhhh! Performance will be great, space will be available, and I will get credit for optimizing our data warehouse application, saving the company billions of dollars”
Post Archive Challenges Hmmmmm. It looks like … - Queries are not any faster . . . - The Select count(*) took the same amount of time . . . - Space was not freed in Oracle (DBA_FREE_SPACE) . . . - Space was not freed in the operating system . . . WHY NOT ????? Where are the benefits ???
From Swiss to Provolone After Maintenance After DELETE
Post Archive Challenges • Statistics are not fresh • High Water Marks are very high • Space has not been freed within Oracle (if that’s what you want) • Space has not been freed to the OS