Explore CERN's Oracle B&R policies, techniques, performance measurements, and recovery scenarios for maintaining data integrity. Learn about RMAN, custom scripts, incremental backups, and more.
Oracle B&R for Physics Databases
LCG 3D Workshop, September 2006
Luca Canali, CERN IT
Outline
• Oracle Backup & Recovery
  • Goals and B&R policies
  • Technology choices
• CERN’s use of RMAN
  • Architecture of our backup solution
  • Deployment
  • Performance
• Lessons learned
LCG 3D Workshop, CERN 14-09-2006
B&R
• ‘Thou shalt not lose data’
  • In case of failures and corruption (HW, software, human error)
  • Full and ‘point in time’ recoveries
  • Disaster recovery
• B&R: agree on a policy with the users
  • Retention period
  • Max recovery time
  • Disaster recovery scenarios
Oracle B&R
• Many techniques, no out-of-the-box solution
  • May change with Oracle Secure Backup and Enterprise Manager (EM)
  • Need additional effort
• Techniques
  • RMAN + media manager (recommended)
  • ‘Manual scripts’ and manual tape archiving (low budget)
  • Snapshot copies (high-end storage)
• Additional effort:
  • Develop, maintain and schedule backup scripts
  • Interface with the ‘tape group’ / sysadmin
• Overall, the DBA team must maintain specialized B&R knowledge
RMAN @ CERN
• RMAN is used to back up all production DBs
• The Media Manager is Tivoli (IBM)
  • Run by a specialized group
  • Holds several tens of TB of space
  • High-end architecture and currently growing
  • Physics DBs are the main clients
• Tape drives are expensive
  • Tradeoff between performance and cost
Custom Scripts
• Scripts developed in house
• Dedicated server for scheduling and monitoring
• RMAN 10g backups scheduled:
  • Level 0 to tape
  • Level 1 cumulative to tape
  • Level 1 differential to tape
• Extra protection (recommended):
  • Full backup to disk
  • Leverage 10g incremental recovery of copy to disk
  • We keep the backup to disk 2 days behind production
  • Note: need a large flash recovery area
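The backup types listed above can be sketched as a small script generator. This is only an illustration, not the in-house CERN scripts: the function name `build_rman_script`, the channel names, and the tag `disk_copy` are hypothetical, and the media-manager channel parameters needed for Tivoli are left out. The RMAN statements themselves are standard 10g syntax, including the "incrementally updated copy" pattern that keeps a disk copy a fixed number of days behind production.

```python
# Hypothetical sketch: build RMAN command text for the backup types on the slide.
# Names, tags and channel layout are assumptions, not the actual CERN scripts.

TAPE_COMMANDS = {
    "level0": "BACKUP INCREMENTAL LEVEL 0 DATABASE;",
    "level1_cumulative": "BACKUP INCREMENTAL LEVEL 1 CUMULATIVE DATABASE;",
    "level1_differential": "BACKUP INCREMENTAL LEVEL 1 DATABASE;",
    "archivelogs": "BACKUP ARCHIVELOG ALL;",
}

# Disk copy kept 2 days behind production: first roll the copy forward up to
# SYSDATE-2, then take a new level 1 incremental destined for that copy.
DISK_COPY_COMMANDS = [
    "RECOVER COPY OF DATABASE WITH TAG 'disk_copy' UNTIL TIME 'SYSDATE-2';",
    "BACKUP INCREMENTAL LEVEL 1 FOR RECOVER OF COPY WITH TAG 'disk_copy' DATABASE;",
]


def build_rman_script(kind: str, channels: int = 2) -> str:
    """Return a RUN block that backs up to tape over `channels` sbt channels."""
    if kind not in TAPE_COMMANDS:
        raise ValueError(f"unknown backup kind: {kind}")
    lines = ["RUN {"]
    for i in range(1, channels + 1):
        # Media-manager specific PARMS (e.g. for Tivoli) are omitted here.
        lines.append(f"  ALLOCATE CHANNEL t{i} DEVICE TYPE sbt;")
    lines.append(f"  {TAPE_COMMANDS[kind]}")
    lines.append("}")
    return "\n".join(lines)


def build_disk_copy_script() -> str:
    """Return the two-step script that maintains the lagging disk copy."""
    return "\n".join(["RUN {"] + [f"  {c}" for c in DISK_COPY_COMMANDS] + ["}"])


if __name__ == "__main__":
    print(build_rman_script("level1_cumulative"))
    print(build_disk_copy_script())
```

A scheduler on the dedicated server could write the returned text to a command file and pass it to the `rman` client; that wiring is site specific and not shown.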
Backup scheduling and retention
• Has to be agreed upon with the experiments
• Current status for LHC DBs
  • Full, every 2 weeks
  • Level 1 cumulative, twice per week
  • Level 1 differential, every day
  • Archive logs, every hour
• Retention
  • RMAN policy: retention window of 31 days
  • Guarantees backups are kept for recoveries at any point in time during last 31 days
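As a rough illustration of the schedule above, the following sketch picks the backup type for a given calendar day and checks whether a recovery target falls inside the 31-day window. The slides give only the frequencies; the anchor date of the two-week cycle and the weekdays chosen for the cumulative backups are assumptions made for the example. In RMAN itself the retention rule corresponds to `CONFIGURE RETENTION POLICY TO RECOVERY WINDOW OF 31 DAYS;`.

```python
from datetime import date, timedelta

# Assumptions (not stated in the slides): the full backup cycle is anchored on
# Sunday 2006-09-03, and cumulative backups run on Wednesdays and Sundays.
CYCLE_START = date(2006, 9, 3)
CUMULATIVE_WEEKDAYS = {2, 6}  # date.weekday(): Monday=0 ... Sunday=6
RECOVERY_WINDOW_DAYS = 31     # from the slide


def backup_type(day: date) -> str:
    """Return which database backup would run on `day` under this schedule."""
    if (day - CYCLE_START).days % 14 == 0:
        return "level0"               # full, every 2 weeks
    if day.weekday() in CUMULATIVE_WEEKDAYS:
        return "level1_cumulative"    # twice per week
    return "level1_differential"      # every other day


def in_recovery_window(target: date, today: date,
                       window_days: int = RECOVERY_WINDOW_DAYS) -> bool:
    """True if `target` is a point the retention policy promises to cover."""
    return today - timedelta(days=window_days) <= target <= today


if __name__ == "__main__":
    for offset in range(14):
        d = CYCLE_START + timedelta(days=offset)
        print(d.isoformat(), backup_type(d))
```

Archived redo logs are backed up hourly on top of this and are not modelled here.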
Recovery Scenarios
• Typical scenarios:
  • Point in time recovery because of human error
  • Full recovery because of HW fault
  • Disaster recovery from tape
• Point in time recoveries
  • Time consuming
  • Better done in a separate environment
  • 10g ‘flashback’ technology can often be used instead of backups to solve the underlying issue
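To make the cost of a point in time recovery more concrete, here is a simplified model of which backups are needed to reach a target time with a level 0 / level 1 strategy: the latest level 0 before the target, then the latest cumulative level 1 taken after it, then every differential level 1 after that, with archived redo covering the rest. RMAN makes this choice itself based on SCNs; the `Backup` class and `restore_chain` function below are hypothetical and use completion times only.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass(frozen=True)
class Backup:
    kind: str           # "level0", "level1_cumulative" or "level1_differential"
    finished: datetime  # completion time (a stand-in for the checkpoint SCN)


def restore_chain(backups: List[Backup], target: datetime) -> List[Backup]:
    """Return the backups to apply, in order, before rolling forward with redo."""
    usable = sorted((b for b in backups if b.finished <= target),
                    key=lambda b: b.finished)
    fulls = [b for b in usable if b.kind == "level0"]
    if not fulls:
        raise LookupError("no level 0 backup precedes the target time")
    base = fulls[-1]
    after_base = [b for b in usable if b.finished > base.finished]

    chain = [base]
    start = base.finished
    cumulatives = [b for b in after_base if b.kind == "level1_cumulative"]
    if cumulatives:
        # A cumulative backup contains all changes since the level 0, so only
        # the most recent one is needed and earlier level 1 backups are skipped.
        chain.append(cumulatives[-1])
        start = cumulatives[-1].finished
    chain.extend(b for b in after_base
                 if b.kind == "level1_differential" and b.finished > start)
    return chain
```

Archived redo logs generated between the last backup in the chain and the target time are then applied to reach the exact point in time.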
Test Recovery System
• A cluster dedicated to test recoveries
  • Installed like the production DBs
  • Used to restore production backups
• All production procedures must be tested
  • RMAN has an unfriendly syntax that needs practice on a regular basis (~ twice per year)
• Used for point in time recoveries
  • Recovered data is exported back to production
• Can be activated as production, if needed
Performance
• Measurements from a VLDB: Compass
• Full backup
  • Backup with 2 channels
  • Every tape channel has a throughput of ~30 MB/sec
  • Full backup to tape: 3.5 TB in 16 hours
• Incremental backups
  • We use block change tracking
  • Average daily activity for the month of August 2006
    • Block change tracking file: 420 MB
    • Incremental level 1 (differential): 9 minutes
    • Backup size to tape: 16 GB
    • Corresponding archived redo logs: 58 GB of redo, 230 files
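The full backup figures above are consistent with each other, as a quick check shows: two channels at about 30 MB/sec give roughly 60 MB/sec in aggregate, and 3.5 TB at that rate takes about 16 hours. The helper names below are made up for the example, and decimal units (1 TB = 1,000,000 MB) are assumed.

```python
def full_backup_hours(size_tb: float, channels: int,
                      mb_per_sec_per_channel: float) -> float:
    """Estimated duration of a backup to tape, assuming channels scale linearly."""
    size_mb = size_tb * 1_000_000          # decimal units assumed
    aggregate_rate = channels * mb_per_sec_per_channel
    return size_mb / aggregate_rate / 3600


def daily_change_fraction(incremental_gb: float, full_tb: float) -> float:
    """Size of the daily level 1 backup relative to the full backup."""
    return incremental_gb / (full_tb * 1000)


if __name__ == "__main__":
    hours = full_backup_hours(3.5, channels=2, mb_per_sec_per_channel=30)
    print(f"full backup estimate: {hours:.1f} hours")
    frac = daily_change_fraction(16, 3.5)
    print(f"daily incremental vs full: {frac:.2%}")
```

The second helper puts the incremental figures in proportion: a 16 GB daily level 1 backup is under half a percent of the 3.5 TB full backup, which is why block change tracking lets it finish in minutes instead of hours.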
Conclusions
• B&R for Oracle at CERN:
  • RMAN and Tivoli
  • Custom scripts + dedicated backup scheduler server
  • Incremental backup strategy
  • Backups to disk complementing tape
  • Running for many years
• Define and test main recovery scenarios
  • With dedicated HW
• Performance measurements
  • Block change tracking for level 1 backups