Backup and restore of Oracle databases: introducing a disk layer by Ruben Gaspar IT-DB-DBB BR evolution: Backup to disk
Agenda • CERN Oracle databases & Oracle backup basics • Backup to disk implementation details • Recovery platform • Some bits of backup to disk backend • Summary BR evolution: Backup to disk- 2
Target Oracle databases for backup to disk • ~70 Oracle databases, most of them running Oracle Clusterware (RAC) • 49 are backed up to disk and then to tape • 21 are backed up only with snapshots (test and development instances) • 15 Data Guard RAC clusters in production • Active Data Guard since the upgrade to 11g • these are backed up only to tape • 10 Oracle single instances in DBaaS, also backed up using snapshots BR evolution: Backup to disk- 4
Oracle backup basics • The Oracle clock: System Change Number (SCN) • It would take 544 years to run out of SCNs at 16K/s • smon_scn_time tracks time versus SCN • Types of backup • Consistent: taken while the database has been cleanly shut down; all redo is applied to the data files and no archive logs are produced • Inconsistent: taken while the database is running; the database must be in archivelog mode, so archive logs are produced and Point in Time Recovery (PITR) is possible. Drawback: clean-up of archive logs is critical to avoid the database blocking → TSM was playing a critical role here • Backup sets: Oracle's proprietary binary format for backups • Backup sets are containers for one or several backup pieces • Backup pieces contain blocks of one or several data files (multiplexing) • RMAN channels: disk, tape or proxy; they read data files and write to the backup media. We use the SBT (serial backup to tape) API, with IBM Tivoli Data Protection 6.3 (provided by TSM support) BR evolution: Backup to disk- 5
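The 544-year figure can be sanity-checked with a few lines of shell arithmetic. This is a back-of-the-envelope sketch that assumes the classic 2^48 hard limit on the SCN; it is not part of the backup framework.

```shell
# Back-of-the-envelope check of the "544 years" claim, assuming the
# classic 2^48 SCN hard limit and a sustained rate of 16K SCNs/second.
scn_limit=$((2 ** 48))            # 281474976710656 possible SCNs
rate=$((16 * 1024))               # 16K SCNs consumed per second
seconds=$((scn_limit / rate))     # seconds until exhaustion
years=$((seconds / 86400 / 365))  # integer years
echo "$years years to exhaust the SCN space"   # ≈ 544
```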
Oracle backup basics (II) • Backup jobs are based on templates (Recovery Manager API):
-- Full
backup incremental level 0 database;
-- Cumulative
backup incremental level 1 cumulative database;
-- Incremental
backup incremental level 1 database;
-- Archivelogs
backup tag 'BR_TAG' archivelog all delete all input;
• Retention policy from 60 to 90 days, depending on the DB: CONFIGURE RETENTION POLICY TO RECOVERY WINDOW OF 90 DAYS; e.g. LEMONRAC → [1xfull + 6xdifferential + archivelogs] * 13 weeks • Controlfile backup, automatically taken with each backup: CONFIGURE CONTROLFILE AUTOBACKUP ON; e.g. LHCBSTG → [2xfull + 5xdifferential + 24x4 archivelogs] * 13 weeks = 934GB [diagram: full, cumulative and incremental backups with archivelogs along the SCN timeline; PITR window] BR evolution: Backup to disk- 6
What is there to be backed up? • Backup jobs using the RMAN API take care of: • Database files: user and system files • Control files: contain the structure and status of the data files, plus all backup history • Archived logs: backups of the redo logs, needed for inconsistent backup strategies. They need to be backed up and removed from the active file system, otherwise the database freezes/stops when it runs out of space • 5.1TB of redo logs produced per day • ALL THREE ARE CRITICAL FOR A BACKUP/RECOVERY STRATEGY BR evolution: Backup to disk- 8
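The freeze risk described above is why archivelog clean-up is usually driven by a space threshold. The sketch below illustrates the idea; it is not the CERN framework, and the 90% threshold and the trigger string are illustrative assumptions.

```shell
# Illustrative sketch: decide when the archivelog file system is close to
# filling and an emergency archivelog backup+delete should be triggered.
# The 90% threshold is an assumed value, not the production setting.
archivelog_fs_alarm() {
  local used_pct=$1 threshold=${2:-90}
  [ "$used_pct" -ge "$threshold" ]
}

# At 95% usage the alarm fires and an RMAN archivelog run would be queued.
if archivelog_fs_alarm 95; then
  action="backup archivelog all delete all input"
else
  action="none"
fi
echo "$action"
```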
Agenda • CERN Oracle databases & Oracle backup basics • Backup to disk implementation details • Recovery platform • Some bits of backup to disk backend • Summary BR evolution: Backup to disk- 9
Backup architecture • Custom solution: about 15k lines of code, Perl + Bash • Flexible: easy to adapt to new Oracle releases and backup media • Based on Oracle Recovery Manager (RMAN) templates • Central logging • Easy to extend via Perl plug-ins: snapshots, exports, RO tablespaces, … • We send compressed: • 1 out of 4 full backups • all archivelogs BR evolution: Backup to disk- 10
Impact on TSM • Savings depend on the database workload [chart: backup sets on disk for three databases; only 1 out of 4 full backups is sent to tape] • Backup sets sent to tape are also compressed (see later) • Savings ~71% (source: TSM support) BR evolution: Backup to disk- 11
Impact on TSM (II) • 15 accounts (alicestg, atlasstg, cmsstg, castorns, …): ~70% savings • 29 accounts (pdb, wcernp, ITCORE, AISDBP, …): ~47% savings • Source: TSM support BR evolution: Backup to disk- 12
Workflow for disk/tape backups • Same workflow as for tape backups → to ease maintenance • Disk and tape templates are almost identical; only the channel allocation differs • Disk channel allocation is calculated on the fly, considering the space available in the aggregate and file system, using the Netapp management API (ZAPI) • About 75 templates to cover all types of backup strategy • Tape and disk backup strategies co-exist • Reversible: changing from one to the other is a matter of changing templates BR evolution: Backup to disk- 13
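The on-the-fly channel placement can be sketched with a simplified stand-in: the real code queries free space through ZAPI, whereas here df-style "mountpoint available-KB" lines are sorted instead, and the mount names and sizes are illustrative.

```shell
# Pick the backup file system with the most available space, reading
# df-style "mountpoint available-KB" lines from stdin. This mimics the
# emptiest-volume-first channel placement; ZAPI is replaced by plain text.
pick_emptiest() {
  sort -k2,2nr | head -n 1 | awk '{print $1}'
}

best=$(printf '%s\n' \
  '/backup/dbs01 120000' \
  '/backup/dbs02 450000' \
  '/backup/dbs03 80000' | pick_emptiest)
echo "$best"   # /backup/dbs02
```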
Typical DB architecture [diagram: RAC nodes with public 10GbE interfaces and a 10GbE interconnect on the LAN; 7-mode storage over 6Gb/s multipath SAS with a 1GbE management network; backup01/backup02 C-mode controllers (nodes 01-04) on a private 10GbE cluster interconnect; a media manager server running IBM TSM; archivelogs, controlfile and datafiles flow to the backup layer] At least 2 file systems for backup to disk: • /backup/dbsXX/DBNAME BR evolution: Backup to disk- 14
New C-mode features • Transparent file system movements: cluster01::> volume move start -destination-aggregate aggr1_c01n02 -vserver vs1 -volume castorns03 -cutover-window 10 • DNS load balancing inside the cluster • Automatic virtual IP rebalancing (based on failover groups) • Access security via "export-policy": combines a firewall with different authentication mechanisms (sys, krb5, ntlm) • Global namespace • Compression and deduplication • We strongly rely on compression to satisfy 2.3PB of backup set storage needs using 1.1PB of disk BR evolution: Backup to disk- 15
Backup to disk configuration on database servers • Global namespace in use: /backup/dbsXX • Eases management: the mount point stays unchanged as data moves; it's a Netapp C-mode feature (see later)
7-mode: mount -o … priv-controllerIP:/vol/castorns03 /ORA/dbs03/CASTOR
C-mode: mount -o … public-ip-cluster:/backup/dbs01/CASTORNS /backup/dbs01/CASTORNS
/backup/dbs01/<DBNAME> → controlfile autobackups + backup sets
/backup/dbsXX/<DBNAME> → backup sets
• RMAN configuration parameters: minimal change • CONFIGURE CONTROLFILE AUTOBACKUP FORMAT FOR DEVICE TYPE DISK TO '/backup/dbs01/<DBNAME>/<DBNAME>_%F'; BR evolution: Backup to disk- 16
Particular cases • The solution is also operational in a Data Guard configuration: full and incremental backups are taken on the standby (more while talking about restores) • Multiple channels: rman_channels_connect, in order to distribute the backup load (e.g. username/password@rac-node1, username/password@rac-node2) • Plug-in for RO tablespace backups (ACCLOG: about 170TB, growing 70TB/year) • Automatic clean-up in case of a tablespace state change • One backup set per tablespace • Extension to allow special mount points (ACCLOG): rman_mounts_readonly [diagram: the primary database ships redo to an Active Data Guard standby, used for users' access and for disaster recovery; full + incremental + controlfile backups are taken on the standby, archivelogs + controlfile on the primary] BR evolution: Backup to disk- 17
Backup to disk performance • Backups run ~50% faster than on tape • ACCLOG full backup, 5TB: tape 34 hours (~35 MB/s) vs disk 14 hours (~100 MB/s) • Sending backup sets from disk to tape still needs optimisation • Work in progress with TSM support BR evolution: Backup to disk- 18
Backup to Disk space consumption • Channel order is important → storage management • Space distribution should follow the planning to avoid imbalance: file systems should grow at the same pace • The emptiest volume is always selected first [chart: space consumption with automatic size extension] BR evolution: Backup to disk- 19
Agenda • CERN Oracle databases & Oracle backup basics • Backup to disk implementation details • Recovery platform • Some bits of backup to disk backend • Summary BR evolution: Backup to disk- 20
Recovery platform • The only reliable proof is to actually run a recovery • Any change introduced in the backup platform or backup strategy is always validated via test recoveries • Isolation • Runs independently of the production database • Cannot access any other system (database network links) • No user jobs must run • Flexible and easy to customize • Maximize use of the recovery server: several recoveries at the same time • Exports taken after a successful recovery → help in support cases, mainly logical errors • Open source: http://sourceforge.net/projects/recoveryplat/ BR evolution: Backup to disk- 21
Recovery platform (II) • Introducing the disk buffer greatly improves our recovery testing • Also tested with Data Guard configurations (Oracle support ID 1070039.1): RMAN> set backup files for device type disk to accessible • Restores from disk are usually 50% faster • More recoveries can be run: nowadays about 40 recoveries per week • No blocking of tape resources that could be used by backups BR evolution: Backup to disk- 22
Agenda • CERN Oracle databases & Oracle backup basics • Backup to disk implementation details • Recovery platform • Some bits of backup to disk backend • Summary BR evolution: Backup to disk- 23
Backup to disk cluster • 2x FAS6240 Netapp controllers • 24x disk shelf DS4243 • 24x 3TB SATA disks each (576 disks) • raid_dp (RAID 6) → 1.1 PB usable space split into 8 aggregates of ~135TB each • 2x quad-core 64-bit Intel(R) Xeon(R) CPU E5540 @ 2.53GHz • 10Gbps connectivity • Multipath SAS loops, 3Gbps • Flash Cache 512GB per node BR evolution: Backup to disk- 24
How fast, how compressed • Compression (datafiles): online compression of datafiles saves ~55% • Backup set compression measured on a 501 GB tablespace of random alphanumeric strings (dbms_random) *Ontap 8.1.1, FAS6240, 72x 3TB SATA disks BR evolution: Backup to disk- 25
Compression: real values • *Space used on the controller side • Logical space used = Used + Saved BR evolution: Backup to disk- 26
NAS controllers throughput [chart: net_data_recv, disk_data_written and the compression ratio over time] BR evolution: Backup to disk- 27
Deduplication • When combined with compression, it doesn't provide good results • Due to the way compression works: the compression group is 32k, our Oracle block is 8k, the WAFL block is 4k • Control files are a different story: block size of 16k [diagram: 4k WAFL blocks checksummed for deduplication]
DB    Type         Location       Size(GB)
PAYP  archives     /backup/dbs01      0.91
PAYP  archives     /backup/dbs02     22.90
PAYP  controlfile  /backup/dbs01    456.92
PAYP  fullinc      /backup/dbs01     68.00
PAYP  fullinc      /backup/dbs02     81.10
BR evolution: Backup to disk- 28
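The block-size mismatch can be made concrete with a little arithmetic. This assumes, as is commonly documented for NetApp, that deduplication matches 4k WAFL blocks; the sizes are the ones quoted on the slide.

```shell
# One 32k compression group spans several database/WAFL blocks, so after
# compression two identical 8k Oracle blocks rarely land on identical 4k
# WAFL blocks, and block-level deduplication finds little to share.
group_kb=32; oracle_kb=8; wafl_kb=4
oracle_per_group=$((group_kb / oracle_kb))
wafl_per_group=$((group_kb / wafl_kb))
echo "$oracle_per_group Oracle blocks and $wafl_per_group WAFL blocks per compression group"
```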
Agenda • CERN Oracle databases & Oracle backup basics • Backup to disk implementation details • Recovery platform • Some bits of backup to disk backend • Summary BR evolution: Backup to disk- 29
Summary • Backup and Recovery testing is critical • Tape copies are essential but TSM became a critical point of failure for DB services • Adding a disk buffer • Removes TSM criticality • Reduces DB volume in TSM • Speeds up backups and restores • Better response time • Better resource utilization • Disk buffer plug-ins were easily integrated in our backup framework • First system to exploit Ontap C-mode features • Valuable experience for the future BR evolution: Backup to disk- 30
Questions ? BR evolution: Backup to disk- 31