Backup and restore of Oracle databases: introducing a disk layer by Ruben Gaspar IT-DB-DBB BR evolution: Backup to disk
Agenda • CERN Oracle databases & Oracle backup basics • Backup to disk implementation details • Recovery platform • Some bits of backup to disk backend • Summary BR evolution: Backup to disk- 2
Target Oracle databases for backup to disk • ~70 Oracle databases, most of them running Oracle Clusterware (RAC) • 49 are backed up to disk and then to tape • 21 are backed up only with snapshots (test and development instances) • 15 Data Guard RAC clusters in production • Active Data Guard since the upgrade to 11g • these are backed up only to tape • 10 Oracle single instances in DBaaS, also backed up using snapshots BR evolution: Backup to disk- 4
Oracle backup basics • The Oracle clock: System Change Number (SCN) • It would take 544 years to run out of SCNs at 16K/s • smon_scn_time tracks time versus SCN • Types of backup • Consistent: taken while the database has been cleanly shut down; all redo is applied to the data files and no archive logs are produced • Inconsistent: taken while the database is running; the database must be in archivelog mode, so archive logs are produced and Point in Time Recovery (PITR) is possible. Drawback: clean-up of archive logs is critical to avoid the database blocking → TSM was playing a critical role here • Backup sets: Oracle's proprietary binary format for backups • Backup sets are containers for one or several backup pieces • Backup pieces contain blocks of one or several data files (multiplexing) • RMAN channels: disk, tape or proxy; they read data files and write to the backup media. We use the SBT (serial backup to tape) API, with IBM Tivoli Data Protection 6.3 (provided by TSM support) BR evolution: Backup to disk- 5
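The 544-year figure can be sanity-checked with a few lines of shell arithmetic. This is a back-of-the-envelope sketch that assumes the classic 2^48 hard limit on the SCN; it is not part of the backup framework.

```shell
# Back-of-the-envelope check of the "544 years" claim, assuming the
# classic 2^48 SCN hard limit and a sustained rate of 16K SCNs/second.
scn_limit=$((2 ** 48))            # 281474976710656 possible SCNs
rate=$((16 * 1024))               # 16K SCNs consumed per second
seconds=$((scn_limit / rate))     # seconds until exhaustion
years=$((seconds / 86400 / 365))  # integer years
echo "$years years to exhaust the SCN space"   # ≈ 544
```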
Oracle backup basics (II) • Backup jobs are based on templates (Recovery Manager API):
-- Full
backup incremental level 0 database;
-- Cumulative
backup incremental level 1 cumulative database;
-- Incremental
backup incremental level 1 database;
-- Archivelogs
backup tag 'BR_TAG' archivelog all delete all input;
• Retention policy from 60 to 90 days, depending on the DB: CONFIGURE RETENTION POLICY TO RECOVERY WINDOW OF 90 DAYS; e.g. LEMONRAC → [1xfull + 6xdifferential + archivelogs] * 13 weeks • Controlfile backup, automatically taken with each backup: CONFIGURE CONTROLFILE AUTOBACKUP ON; e.g. LHCBSTG → [2xfull + 5xdifferential + 24x4 archivelogs] * 13 weeks = 934GB [diagram: full, cumulative and incremental backups with archivelogs along the SCN timeline; PITR window] BR evolution: Backup to disk- 6
What is there to be backed up? • Backup jobs using the RMAN API take care of: • Database files: user and system files • Control files: contain the structure and status of the data files, plus all backup history • Archived logs: backups of the redo logs, needed for inconsistent backup strategies. They need to be backed up and removed from the active file system, otherwise the database freezes/stops when it runs out of space • 5.1TB of redo logs produced per day • ALL THREE ARE CRITICAL FOR A BACKUP/RECOVERY STRATEGY BR evolution: Backup to disk- 8
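The freeze risk described above is why archivelog clean-up is usually driven by a space threshold. The sketch below illustrates the idea; it is not the CERN framework, and the 90% threshold and the trigger string are illustrative assumptions.

```shell
# Illustrative sketch: decide when the archivelog file system is close to
# filling and an emergency archivelog backup+delete should be triggered.
# The 90% threshold is an assumed value, not the production setting.
archivelog_fs_alarm() {
  local used_pct=$1 threshold=${2:-90}
  [ "$used_pct" -ge "$threshold" ]
}

# At 95% usage the alarm fires and an RMAN archivelog run would be queued.
if archivelog_fs_alarm 95; then
  action="backup archivelog all delete all input"
else
  action="none"
fi
echo "$action"
```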
Agenda • CERN Oracle databases & Oracle backup basics • Backup to disk implementation details • Recovery platform • Some bits of backup to disk backend • Summary BR evolution: Backup to disk- 9
Backup architecture • Custom solution: about 15k lines of code, Perl + Bash • Flexible: easy to adapt to new Oracle releases and backup media • Based on Oracle Recovery Manager (RMAN) templates • Central logging • Easy to extend via Perl plug-ins: snapshots, exports, RO tablespaces, … • We send compressed: • 1 out of 4 full backups • all archivelogs BR evolution: Backup to disk- 10
Impact on TSM • Savings depend on the database workload [chart: backup sets on disk for three databases; only 1 out of 4 full backups is sent to tape] • Backup sets sent to tape are also compressed (see later) • Savings ~71% (source: TSM support) BR evolution: Backup to disk- 11
Impact on TSM (II) • 15 accounts (alicestg, atlasstg, cmsstg, castorns, …): ~70% savings • 29 accounts (pdb, wcernp, ITCORE, AISDBP, …): ~47% savings • Source: TSM support BR evolution: Backup to disk- 12
Workflow for disk/tape backups • Same workflow as for tape backups → to ease maintenance • Disk and tape templates are almost identical; only the channel allocation differs • Disk channel allocation is calculated on the fly, considering the space available in the aggregate and file system, using the Netapp management API (ZAPI) • About 75 templates to cover all types of backup strategy • Tape and disk backup strategies co-exist • Reversible: changing from one to the other is a matter of changing templates BR evolution: Backup to disk- 13
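The on-the-fly channel placement can be sketched with a simplified stand-in: the real code queries free space through ZAPI, whereas here df-style "mountpoint available-KB" lines are sorted instead, and the mount names and sizes are illustrative.

```shell
# Pick the backup file system with the most available space, reading
# df-style "mountpoint available-KB" lines from stdin. This mimics the
# emptiest-volume-first channel placement; ZAPI is replaced by plain text.
pick_emptiest() {
  sort -k2,2nr | head -n 1 | awk '{print $1}'
}

best=$(printf '%s\n' \
  '/backup/dbs01 120000' \
  '/backup/dbs02 450000' \
  '/backup/dbs03 80000' | pick_emptiest)
echo "$best"   # /backup/dbs02
```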
Typical DB architecture [diagram: RAC nodes with public 10GbE interfaces and a 10GbE interconnect on the LAN; 7-mode storage over 6Gb/s multipath SAS with a 1GbE management network; backup01/backup02 C-mode controllers (nodes 01-04) on a private 10GbE cluster interconnect; a media manager server running IBM TSM; archivelogs, controlfile and datafiles flow to the backup layer] At least 2 file systems for backup to disk: • /backup/dbsXX/DBNAME BR evolution: Backup to disk- 14
New C-mode features • Transparent file system movements: cluster01::> volume move start -destination-aggregate aggr1_c01n02 -vserver vs1 -volume castorns03 -cutover-window 10 • DNS load balancing inside the cluster • Automatic virtual IP rebalancing (based on failover groups) • Access security via "export-policy": combines a firewall with different authentication mechanisms (sys, krb5, ntlm) • Global namespace • Compression and deduplication • We strongly rely on compression to satisfy 2.3PB of backup set storage needs using 1.1PB of disk BR evolution: Backup to disk- 15
Backup to disk configuration on database servers • Global namespace in use: /backup/dbsXX • Eases management: the mount point stays unchanged as data moves; it's a Netapp C-mode feature (see later)
7-mode: mount -o … priv-controllerIP:/vol/castorns03 /ORA/dbs03/CASTOR
C-mode: mount -o … public-ip-cluster:/backup/dbs01/CASTORNS /backup/dbs01/CASTORNS
/backup/dbs01/<DBNAME> → controlfile autobackups + backup sets
/backup/dbsXX/<DBNAME> → backup sets
• RMAN configuration parameters: minimal change • CONFIGURE CONTROLFILE AUTOBACKUP FORMAT FOR DEVICE TYPE DISK TO '/backup/dbs01/<DBNAME>/<DBNAME>_%F'; BR evolution: Backup to disk- 16
Particular cases • The solution is also operational in a Data Guard configuration: full and incremental backups are taken on the standby (more while talking about restores) • Multiple channels: rman_channels_connect, in order to distribute the backup load (e.g. username/password@rac-node1, username/password@rac-node2) • Plug-in for RO tablespace backups (ACCLOG: about 170TB, growing 70TB/year) • Automatic clean-up in case of a tablespace state change • One backup set per tablespace • Extension to allow special mount points (ACCLOG): rman_mounts_readonly [diagram: the primary database ships redo to an Active Data Guard standby, used for users' access and for disaster recovery; full + incremental + controlfile backups are taken on the standby, archivelogs + controlfile on the primary] BR evolution: Backup to disk- 17
Backup to disk performance • Backups run ~50% faster than on tape • ACCLOG full backup, 5TB: tape 34 hours (~35 MB/s) vs disk 14 hours (~100 MB/s) • Sending backup sets from disk to tape still needs optimisation • Work in progress with TSM support BR evolution: Backup to disk- 18
Backup to Disk space consumption • Channel order is important → storage management • Space distribution should follow the planning to avoid imbalance: file systems should grow at the same pace • The emptiest volume is always selected first [chart: space consumption with automatic size extension] BR evolution: Backup to disk- 19
Agenda • CERN Oracle databases & Oracle backup basics • Backup to disk implementation details • Recovery platform • Some bits of backup to disk backend • Summary BR evolution: Backup to disk- 20
Recovery platform • The only reliable proof is to actually run a recovery • Any change introduced in the backup platform or backup strategy is always validated via test recoveries • Isolation • Runs independently of the production database • Cannot access any other system (database network links) • No user jobs must run • Flexible and easy to customize • Maximize use of the recovery server: several recoveries at the same time • Exports taken after a successful recovery → help in support cases, mainly logical errors • Open source: http://sourceforge.net/projects/recoveryplat/ BR evolution: Backup to disk- 21
Recovery platform (II) • Introducing the disk buffer greatly improves our recovery testing • Also tested with Data Guard configurations (Oracle support ID 1070039.1): RMAN> set backup files for device type disk to accessible • Restores from disk are usually 50% faster • More recoveries can be run: nowadays about 40 recoveries per week • No blocking of tape resources that could be used by backups BR evolution: Backup to disk- 22
Agenda • CERN Oracle databases & Oracle backup basics • Backup to disk implementation details • Recovery platform • Some bits of backup to disk backend • Summary BR evolution: Backup to disk- 23
Backup to disk cluster • 2x FAS6240 Netapp controllers • 24x disk shelf DS4243 • 24x 3TB SATA disks each (576 disks) • raid_dp (RAID 6) → 1.1 PB usable space split into 8 aggregates of ~135TB each • 2x quad-core 64-bit Intel(R) Xeon(R) CPU E5540 @ 2.53GHz • 10Gbps connectivity • Multipath SAS loops, 3Gbps • Flash Cache 512GB per node BR evolution: Backup to disk- 24
How fast, how compressed • Compression (datafiles): online compression of datafiles saves ~55% • Backup set compression measured on a 501 GB tablespace of random alphanumeric strings (dbms_random) *Ontap 8.1.1, FAS6240, 72x 3TB SATA disks BR evolution: Backup to disk- 25
Compression: real values • *Space used on the controller side • Logical space used = Used + Saved BR evolution: Backup to disk- 26
NAS controllers throughput [chart: net_data_recv, disk_data_written and the compression ratio over time] BR evolution: Backup to disk- 27
Deduplication • When combined with compression, it doesn't provide good results • Due to the way compression works: the compression group is 32k, our Oracle block is 8k, the WAFL block is 4k • Control files are a different story: block size of 16k [diagram: 4k WAFL blocks checksummed for deduplication]
DB    Type         Location       Size(GB)
PAYP  archives     /backup/dbs01      0.91
PAYP  archives     /backup/dbs02     22.90
PAYP  controlfile  /backup/dbs01    456.92
PAYP  fullinc      /backup/dbs01     68.00
PAYP  fullinc      /backup/dbs02     81.10
BR evolution: Backup to disk- 28
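The block-size mismatch can be made concrete with a little arithmetic. This assumes, as is commonly documented for NetApp, that deduplication matches 4k WAFL blocks; the sizes are the ones quoted on the slide.

```shell
# One 32k compression group spans several database/WAFL blocks, so after
# compression two identical 8k Oracle blocks rarely land on identical 4k
# WAFL blocks, and block-level deduplication finds little to share.
group_kb=32; oracle_kb=8; wafl_kb=4
oracle_per_group=$((group_kb / oracle_kb))
wafl_per_group=$((group_kb / wafl_kb))
echo "$oracle_per_group Oracle blocks and $wafl_per_group WAFL blocks per compression group"
```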
Agenda • CERN Oracle databases & Oracle backup basics • Backup to disk implementation details • Recovery platform • Some bits of backup to disk backend • Summary BR evolution: Backup to disk- 29
Summary • Backup and Recovery testing is critical • Tape copies are essential but TSM became a critical point of failure for DB services • Adding a disk buffer • Removes TSM criticality • Reduces DB volume in TSM • Speeds up backups and restores • Better response time • Better resource utilization • Disk buffer plug-ins were easily integrated in our backup framework • First system to exploit Ontap C-mode features • Valuable experience for the future BR evolution: Backup to disk- 30
Questions ? BR evolution: Backup to disk- 31