Learn about Oracle's features and techniques for efficient data archiving, including use cases, challenges, and solutions for managing large databases. Explore scenarios and considerations for effective data movement and purging strategies.
Oracle Solutions for Data Archiving
Luca Canali, CERN/IT-DM
Distributed Database Operations Workshop, November 11th, 2008
Data life cycle management - WHAT • Use cases ultimately come from the experiments • Typically require removal of large chunks of data selected on a time basis • Related use case: making part of the data read-only
Data life cycle management - WHY • Manageability • A database should not become a ‘data dump’ • Very large DBs are hard to manage • Performance • Some application-critical SQL statements may not scale well (full scans, hash joins, etc.) • More and more attention to SQL tuning is needed with very large tables • Cost • When the DB grows, extra HW resources are needed to maintain the service levels • ex: extra spindles to provide the IOPS needed to read stale data
Why it’s hard • There is no default Oracle mechanism to archive parts of an application’s data • It must be custom-implemented • Rules must be put in place by the ‘data owners’ on how to move and restore • What can help • Oracle features for bulk data movement • These typically require DBA privileges • Additional HW to hold ‘offline data’
Oracle and large data movements • Normal DML is too slow • delete operations in particular • Alternatives (sketched below) • DDL such as partition exchange/movement • Transportable tablespaces • CTAS (create table as select) • Data Pump
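As an illustration, a minimal sketch of two of these bulk-movement techniques, assuming a production table EVENTS range-partitioned by time (all object names here are hypothetical, not from the original slides):

```sql
-- Hypothetical names: EVENTS is a production table range-partitioned
-- by event_time (see the partitioning example further below).

-- Alternative 1: CTAS - bulk-copy a time slice into a standalone
-- table, far faster than row-by-row INSERT plus DELETE.
CREATE TABLE events_2007_arch AS
  SELECT *
    FROM events
   WHERE event_time >= DATE '2007-01-01'
     AND event_time <  DATE '2008-01-01';

-- Alternative 2: partition exchange - swap a whole partition with an
-- empty table of identical structure. This is a data-dictionary-only
-- operation: no rows are physically rewritten. After the exchange,
-- the standalone table holds the 2007 rows and partition p_2007 is empty.
CREATE TABLE events_2007_xchg AS
  SELECT * FROM events WHERE 1 = 0;  -- empty copy of the structure

ALTER TABLE events
  EXCHANGE PARTITION p_2007 WITH TABLE events_2007_xchg;
```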
Oracle and large data sets • Tables must allow extraction of ‘chunks’ in a time-based manner • Oracle partitioning by time is a possibility • Beware of the many limitations of partitioning (it’s not a magic wand) • Do-it-yourself partitioning by generating multiple tables/schemas every year (for example)
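A minimal sketch of time-based range partitioning (hypothetical table and column names); each year then maps to a partition that can be exchanged out or dropped as a unit:

```sql
-- Hypothetical schema: partition by event time so that a whole year
-- can later be moved or dropped with a single DDL statement.
CREATE TABLE events (
  event_id    NUMBER         NOT NULL,
  event_time  DATE           NOT NULL,
  payload     VARCHAR2(4000)
)
PARTITION BY RANGE (event_time) (
  PARTITION p_2007 VALUES LESS THAN (DATE '2008-01-01'),
  PARTITION p_2008 VALUES LESS THAN (DATE '2009-01-01'),
  PARTITION p_max  VALUES LESS THAN (MAXVALUE)
);
```

One of the limitations alluded to above: local indexes follow their partition, but global indexes are invalidated by partition DDL unless maintained explicitly (e.g. with an UPDATE GLOBAL INDEXES clause).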
Additional challenge • Lookup tables/metadata are needed too • Besides the main table(s) to archive • Can be a large number of tables • Not easy to identify a ‘consistent set’ to archive
Technology • Main requirements • Must be able to restore the data if and when needed (over a time scale of 10 years, i.e. 3-4 Oracle versions) • Must keep track of what is where • Should use one of the few Oracle technologies suitable for moving large chunks of data • Possible solutions • Move data to a specialized Oracle database that collects ‘application chunks’ (see the sketch below) • Export to files (flat files or datapump) is not considered reliable given current experience
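A sketch of what moving a chunk to the archive DB and keeping track of what is where could look like, assuming a database link PRODDB_LINK from the archive database to production and a hypothetical ARCHIVE_CATALOG bookkeeping table (neither is named in the original slides):

```sql
-- Run on the archive DB: pull one time slice over the database link
-- in a single bulk operation (all names hypothetical).
CREATE TABLE events_2007 AS
  SELECT *
    FROM events@proddb_link
   WHERE event_time >= DATE '2007-01-01'
     AND event_time <  DATE '2008-01-01';

-- Record what was archived and where, so the chunk can be located
-- and restored years later.
INSERT INTO archive_catalog
  (source_db, source_table, range_start, range_end, archived_on)
VALUES
  ('PRODDB', 'EVENTS', DATE '2007-01-01', DATE '2008-01-01', SYSDATE);
COMMIT;
```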
Possible Scenarios • Scenario 1 • Application owners identify consistent sets of data to be moved • Identified data sets are moved by the DBAs • Data is then dropped from production (see the sketch below) • requires considerable effort from the developers • gives the cleanest result
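With time partitioning in place, the final drop from production in this scenario can be a single partition DDL rather than a mass delete; a sketch with the same hypothetical names as above, assuming the archived copy has been verified first:

```sql
-- Dropping a partition releases the space immediately and is far
-- cheaper than DELETE; UPDATE GLOBAL INDEXES keeps any global
-- indexes usable across the operation.
ALTER TABLE events DROP PARTITION p_2007 UPDATE GLOBAL INDEXES;
```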
Possible Scenarios • Scenario 2 • A clone (complete copy) of the selected applications is taken (for example at the end of each year) and archived with the chosen archive technology (sketched below) • Application owners then take care of deleting from production, trusting that the data is archived and kept • easy to implement • more data than strictly needed is copied
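A sketch of how such a yearly clone could be taken with Data Pump directly over the network, avoiding the intermediate dump files noted as unreliable earlier; the schema name, link name, and year-tagged target schema are all hypothetical:

```sh
# Run on the archive DB: NETWORK_LINK imports straight over a database
# link from production, remapping the schema to a year-tagged copy.
impdp system NETWORK_LINK=proddb_link SCHEMAS=myapp \
      REMAP_SCHEMA=myapp:myapp_2008 LOGFILE=myapp_2008_clone.log
```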
Conclusions • LHC databases are growing fast • Offlining stale data brings significant performance and cost gains • Mechanisms for offlining are application dependent • DB services plan to provide tools to move and restore data • A large Oracle ‘Archive DB’ could be part of the solution
Q&A Thank you