1 / 70

Oracle Data Archiving Taming the Beast

Oracle Data Archiving Taming the Beast. Dave Moore Neon Enterprise Software. Agenda. Archiving Defined Requirements and Solutions Oracle Archiving Strategies Oracle Row Removal Options Oracle Post Archive Operations. Dave. Oracle ACE Using Oracle since 1991

jaden
Download Presentation

Oracle Data Archiving Taming the Beast

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Oracle Data ArchivingTaming the Beast Dave Moore Neon Enterprise Software

  2. Agenda Archiving Defined Requirements and Solutions Oracle Archiving Strategies Oracle Row Removal Options Oracle Post Archive Operations

  3. Dave • Oracle ACE • Using Oracle since 1991 • Product Author at Neon Enterprise Software • Creator of OracleUtilities.com • Author of “Oracle Utilities” from Rampant Tech Press • Core competencies include performance, utilities and data management

  4. Database Archiving Database Archiving: The process of removing selected data records from operational databases that are not expected to be referenced again and storing them in an archive data store where they can be retrieved if needed. Purge

  5. Compliance Protection Amount of Data 0 30+ Yrs Time Required Trends Impacting Archive Needs Data Retention Issues: Volume of data Length of retention requirement Varied types of data Security issues

  6. Archiving All Types of Data Paper Blueprints Forms Claims Word Excel PDF XML IMS DB2 ORACLE SYBASE SQL Server IDMS VSAM Programs UNIX Files Outlook Lotus Notes Attachments Sound Pictures Video

  7. Data Archiving and ILM Create Discard Operational Reference Archive Needed for completing business transactions Needed for reporting or expected queries Needed for compliance and business protection Mandatory Retention Period

  8. Some Sample Regulations Impacting Data Retention

  9. What Does It All Mean? • Enterprises must recognize that there is a business value in organizing their information and data. • Organizations that fail to respond run the risk of seeing more of their cases decided on questions of process rather than merit. • (Gartner, 20-April-2007, Research Note G00148170: Cost of E-Discovery Threatens to Skew Justice System)

  10. Operational Efficiency • Database Archiving can be undertaken to improve operational efficiency • Large volumes of data can interfere with production operations • efficiency of transactions • efficiency of utilities: BACKUP/RESTORE, REORG, etc. • Storage • Gartner: databases copied an average of 6 times!

  11. What Solutions Are Out There? • Keep Data in Operational Database • Problems with authenticity of large amounts of data over long retention times • Store Data in UNLOAD files (or backups) • Problems with schema change and reading archived data; using backups poses even more serious problems • Move Data to a Parallel Reference Database • Combines problems of the previous two • Move Data to a Database Archive

  12. Production Database Data Extract Recall Database Archive Data Store and Retrieve Archive Data Query Access Captured Structure Archive Policies Data Retention Archive Store Data & Metadata Metadata Policies History Archive Administration Components of aDatabase Archiving Solution Data Recall

  13. Archiving Requirements • Policy based archiving: logical selection • Keep data for very long periods of time • Store very large amounts of data in archive • Maintain Archives for ever changing operational systems • Become independent from Applications/DBMS/Systems • Protect authenticity of data • Access data when needed; as needed • Discard data after retention period automatically

  14. Policy based archiving • Why : • Business objects are archived, not files • Rules for when something is ready can be complex • Data ready to be archived is distributed over database • Implications: • User must provide policies for when something is to be archived • How: • Full metadata description of data • Flexible specification of policy : “WHERE clause”

  15. SUMMARY BY QUARTER DISBURSEMENT ORDER INFO STORAGE INFO Part Number Year Q1 Disbursed Q2 Disbursed Q3 Disbursed Q4 Disbursed Part Number PO Number Vendor ID Quantity Ordered Unit Cost Date Ordered Date Received Part Number Bin Number Qty on Hand Qty on Order Qty Backorder Part Number Dept. ID CHIT ID Qty Disbursed Date Disbursed For Example… Parts Master is the parent table to all other tables PARTS MASTER Part Number Type Description Unit Type Cost Price Substitute Parts

  16. Keep Data for a Long Time • Why: retention requirements in decades • Implications: • Archive will outlive applications/DBMS/systems that generated them • Archive will outlive people who designed and managed operational systems • Archive will outlive media we store it on • How: • Unique data store • Application/DBMS/system independence • Metadata independence • Continuous management of storage • Continuous management of archive content

  17. Maintain Archive for Changing Operational Systems • Why : • Metadata changes frequently • Applications are re-engineered periodically • Change DBMS platform • Change System platform • Replace with new application • Consolidate after merger or acquisition • Implications: • Archive must support multiple variations of an application • Archive must deal with metadata changes • How: • Manage applications as major archive streams having multiple minor streams with metadata differences • Achieve independence from operating environment

  18. Achieve Metadata Independence • Why : • Operational metadata is inadequate • Operational metadata changes • Operational systems keep only the “current” metadata • Data in archive often does not mirror data in operational structures • Implications: • Archive must encapsulate metadata • Metadata must be improved • How: • Metadata Capture, Validate, Enhance capabilities • Store structure that encapsulates with data • Keeps multiple versions of metadata

  19. Protect Authenticity of Data • Why : • Potential use in lawsuits/ investigations • Potential use in business analysis • Implications: • Protect from unwanted changes • Show original input • Cannot be managed in operational environment • How: • SQL Access that does not support I/U/D • Do not modify archive data on metadata changes • Encryption as stored • Checksum for detection of sabotage • Limit access to functions • Audit use of functions • Maintain offsite backup copies for restore if sabotaged

  20. Access Data Directly From Archive • Why : • Cannot depend on application environment • Implications: • Full access capability within archive system • How: • Industry standard interface (e.g. JDBC) • LOAD format output for • For load into a database • May be different from source database • Requires full and accurate metadata • Ability to review metadata • Ability to function across metadata changes

  21. Discard Function • Why : • Legal exposure for data kept too long • Implications: • Data cannot be kept in archive beyond retention period • Must be removed with no exposure to forensic software • How: • Policy based discard • System level function • Tightly controlled and audited • True “zero out” capability • Discard from backups as well

  22. Database or Archive? Keep in DB Keep in Archive Performance Space Compliance

  23. Based on Data Availability Purge Keep in DB Keep in Archive Must be Available to App Must be Available Must Be Secure Not Needed

  24. Oracle Archiving Strategies • Designed Up Front (Yeah, right) • Determined by Application Owner • Implemented by ____________ • Utilize Oracle Features

  25. Finding Large Tables • DBA_SEGMENTS (bytes) • DBA_TABLES (num_rows) • or based on I/O

  26. Rolling Windows • Self Managing • Mostly based on DATE • Utilize DBMS Features • Partitioning • Transportable Tablespaces • Exchange Partition • Set tablespace read only • Expdp • Copy export file and data file

  27. Rolling Windows via Partitioning P1 … … P47 Probably Never Accessed Rarely Accessed Heavily Accessed Data Profile Cheap as you can get Not so fast or expensive Fast, expensive Storage Profile Read / Write Read Only / Compressed Read Only

  28. Why not use transportable tablespaces or Oracle exports for data retention?

  29. The Problem with Oracle Files • Transportable Tablespaces • Exports • Backups Oracle Export Files & Datafiles Import Trans Tsp Version 16Z Year 2007 Year 2030 Not a good method for LT Data Retention

  30. Partitioning (Old ways) • Range Partitioning Data is distributed based on partition key range of values – usually a date. Good When: Data is date-based.

  31. Partitioning (Old Ways) • Hash Partitioning Uses hash algorithm to create equally sized buckets of data. Good When: No natural partition key and desire I/O balancing (hot spots).

  32. Partitioning (Old Ways) • List Partitioning Data is distributed based on LIST of values in partition key. Good When: Have short list of values (States, Regions, Account Types)

  33. Partitioning (New Ways – 11G) • Interval Partitioning Initial Partition is created manually, the rest are automatically created as new data arrives. Good When: Need a rolling window!

  34. Partitioning (New Ways – 11G) • REF partitioning Related Tables benefit from same partitioning strategy, whether column exists in children or not! Good When: Desire related data to be partitioned in the same manner.

  35. Partitioning (New Ways – 11G) • Virtual Column Partitioning Partition key may be based on virtual column Good When: Virtual column is required for partition key.

  36. Rows Gotta Go

  37. Row Removal Options • SQL DELETE • CTAS / DROP / RENAME • TRUNCATE • Row Marking

  38. SQL DELETE • Good for small number of rows • RI handled automatically • Oracle was born to DELETE, better than any PL/SQL that you write. • Issue with Un-indexed Foreign Keys 

  39. DELETE Optimization • Work in batches, committing (only when programmatically DELETING) • Use parallel DML (Partitioned tables only) • Drop Indexes before (if possible) • Index FK columns

  40. CTAS • Works well for PURGE, not archive • Perfect when you want to keep low percentage of rows in the table • Doesn’t handle RI – no DELETE was issued. • Process • Create table with rows you want to keep • Drop old table • Rename table • Recreate indexes create table new_table unrecoverable as select * from old_table where ...

  41. TRUNCATE • Congratulations if your application lends itself to TRUNCATE without losing new data • What about RI? • May truncate or drop individual partitions

  42. DROP • DROP PARTITION • What would you do before you drop it? • Exchange partition with table • Transportable tablespace.

  43. Things to Remember • Benchmark the best way for you • Benchmark against real data if possible • Use parallel DML

  44. Design Summary • Create an architecture that lends itself to aging, archiving, deleting • This architecture should compensate for business requirements • For instance, customer orders not accessible after 6 months … or • top query performance needed for all ‘ACTIVE’ accounts … etc • Implement it – THE EASY PART

  45. Post Archive Challenges

  46. Post Archive Challenges “I have successfully deleted 10 billion rows from the table. HoooAhhhh! Performance will be great, space will be available, and I will get credit for optimizing our data warehouse application, saving the company billions of dollars”

  47. … 2 Days Later …

  48. Post Archive Challenges Hmmmmm. It looks like … - Queries are not any faster . . . - The Select count(*) took the same amount of time . . . - Space was not freed in Oracle (DBA_FREE_SPACE) . . . - Space was not freed in the operating system . . . WHY NOT ????? Where are the benefits ???

  49. From Swiss to Provolone After Maintenance After DELETE

  50. Post Archive Challenges • Statistics are not fresh • High Water Marks are very high • Space has not been freed within Oracle (if that’s what you want) • Space has not been freed to the OS

More Related