Daniel Gomez Blanco, Ignacio Coterillo Coz, David Collados Polidura, Ruben Domingo Gaspar Aparicio DB On Demand: A DB as a Service story ITTF - 13th June 2014
Agenda • Manifesto • Current status • Architecture • Demo • Management • Infrastructure • Monitoring • Data protection: backups and recoveries in detail • Future development • Summary
Agenda • Manifesto • Current status • Architecture • Demo • Management • Infrastructure • Monitoring • Data protection: backups and recoveries in detail • Future development • Summary
Manifesto • https://cern.ch/twiki/bin/view/DB/DBOnDemandManifesto • Making users database owners • Full DBA privileges • Covers a demand from the CERN community not addressed by the Oracle service • Different RDBMS: MySQL, PostgreSQL and Oracle • No access to the underlying hardware • Foreseen as a single-instance service • No DBA support or application support • No vendor support (except for Oracle) • It provides tools to manage DBA actions: configuration, start/stop, upgrades, backups & recoveries, instance monitoring
Agenda • Manifesto • Current status • Architecture • Demo • Management • Infrastructure • Monitoring • Data protection: backups and recoveries in detail • Future development • Summary
Current status • 139 databases • 115 MySQL • 12 Oracle 11g • 1 Oracle 12c • 11 PostgreSQL • 3 TB used and 8.5 TB reserved • 3000 user schemas • 11500 executions per second • Used by 38 groups in 10 departments • Versions: • MySQL Community Edition migrating to 5.6.17 • PostgreSQL migrating to 9.2.8 • Oracle migrating to Oracle 12c (12.1.0.2 still in beta)
Current status • OpenStack • PuppetDB (MySQL) • LHCb DIRAC • Atlassian databases • LCG VOMS • Geant4 • HammerCloud DBs • Webcast • QC LHC Splice • FTS3 • Drupal • CernVM • VCS • IAXO • UNOSAT • …
Agenda • Manifesto • Current status • Architecture • Demo • Management • Infrastructure • Monitoring • Data protection: backups and recoveries in detail • Future development • Summary
Architecture (diagram): components include Oracle VM & physical servers, Syscontrol, Oracle EM (https://oem.cern.ch), the storage network with data, logs and diagnostic (Oracle) volumes, the FIM web app (https://cern.ch/resources), the DBOD web app (https://cern.ch/dbondemand), the DBOD web service, the RACMon daemon, the RACMon, DBOD and FIM databases, CERN AI monitoring, and the DB client.
Agenda • Manifesto • Current status • Architecture • Demo • Management • Infrastructure • Monitoring • Data protection: backups and recoveries in detail • Future development • Summary
Agenda • Manifesto • Current status • Architecture • Demo • Management • Infrastructure • Monitoring • Data protection: backups and recoveries in detail • Future development • Summary
IT DB framework (Syscontrol) • IT-DB in-house developed management and distribution system (pre-AI) • Distributes configuration programs and resources • Currently being deprecated in favor of Koji, Puppet, etc. • Based on an LDAP directory whose entries match physical resources and services • Every DBOD instance is represented by a Syscontrol LDAP entry • Used as a source of information for Puppet-managed configurations
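For illustration, reading an instance's resource definition from the Syscontrol LDAP directory might look like the sketch below; the server, base DN, filter and attribute names are all hypothetical, only the python-ldap calls are standard:

```python
import ldap  # python-ldap

# Hypothetical Syscontrol endpoint and schema; the real DIT and attribute
# names are internal to IT-DB.
conn = ldap.initialize("ldap://syscontrol.example.cern.ch")
conn.simple_bind_s()  # anonymous read-only bind, assumption

entries = conn.search_s(
    "ou=dbod,ou=syscontrol,dc=cern,dc=ch",    # hypothetical base DN
    ldap.SCOPE_SUBTREE,
    "(cn=mydb01)",                            # hypothetical instance entry
    ["dataVolume", "logVolume", "hostName"],  # hypothetical attributes
)
for dn, attrs in entries:
    print(dn, attrs)
```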
DBOD Daemon • Small program which: • Fetches pending jobs from the database • Manages job execution (via Syscontrol) • Carries out post-execution tasks, if necessary • Updates the application DB with job results and instance status • Runs around 150 jobs per day • Modular design with a focus on extensibility • Easy to add support for new systems (MWOD) • Reusable code
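A minimal sketch of how such a loop could look, assuming a jobs table in the application DB and a hypothetical run_via_syscontrol() wrapper; table and column names are illustrative, not the real schema:

```python
import time
import MySQLdb  # application DB connector; cx_Oracle would work the same way


def run_via_syscontrol(instance, command):
    """Hypothetical wrapper that hands the job to the Syscontrol framework."""
    raise NotImplementedError


def daemon_loop(conn):
    while True:
        cur = conn.cursor()
        cur.execute("SELECT job_id, instance, command FROM jobs WHERE state = 'PENDING'")
        for job_id, instance, command in cur.fetchall():
            rc = run_via_syscontrol(instance, command)
            # Post-execution task: record the outcome for the user interface.
            cur.execute("UPDATE jobs SET state = %s WHERE job_id = %s",
                        ("FINISHED" if rc == 0 else "FAILED", job_id))
        conn.commit()
        time.sleep(60)  # polling interval, assumption
```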
DBOD State Checker • Part of the daemon package • Cron-managed script which periodically checks each instance's availability and updates its status in the DB accordingly • Necessary to correctly display externally caused changes to the status of service instances (e.g. host downtime, network issues) in the user interface
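As an illustration, the periodic availability check could be as simple as the sketch below, with a plain TCP probe standing in for the real, protocol-aware check; the instance list and status table are hypothetical:

```python
import socket


def is_reachable(host, port, timeout=5):
    """True if something is listening on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def update_states(instances, cursor):
    # instances: iterable of (name, host, port); table/columns are illustrative.
    for name, host, port in instances:
        state = "RUNNING" if is_reachable(host, port) else "STOPPED"
        cursor.execute("UPDATE instances SET state = %s WHERE name = %s", (state, name))
```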
Agenda • Manifesto • Current status • Architecture • Demo • Management • Infrastructure • Monitoring • Data protection: backups and recoveries in detail • Future development • Summary
Infrastructure • 139 database instances • 83% MySQL, 9% Oracle, 8% PostgreSQL • 52 hosts: 60% VMs, 40% physical, 2 clusters of 4 nodes • Currently: • (deprecated) Virtualization solution based on RHEL + OVM • Physical shared servers on SafeHost • HA cluster solution based on Oracle CRS
High Availability • Driven by demand/need, not initially in the plans • Not relying on virtualization features so far (this may change in the future, as OpenStack evolves) • 4-node clusters (previously 2 nodes) • 3 active nodes + 1 spare • Currently 1 cluster running under Oracle Clusterware 12.1.0.1; another will be in production soon • Clusterware controls: • Virtual IP • RDBMS instance • PostgreSQL and MySQL instances can co-exist; different versions are supported
High Availability • Testing the cluster (MySQL & PostgreSQL instances) • For MySQL instances running under Oracle Clusterware, care must be taken in case of a server crash: • "InnoDB: Unable to lock ./ibdata1, error: 11" Error Sometimes Seen With MySQL on NFS (Doc ID 1522745.1)
Infrastructure: hardware servers • Dell PowerEdge M610 blades: 2x quad-core Intel Xeon @ 2.53 GHz, 48 GB RAM • Transtec database servers: 2x eight-core Intel Xeon E5-2650 @ 2.00 GHz, 128 GB RAM • (Diagram: NetApp cluster, private network, public network, 10GbE, next release)
Migration to the CERN Agile Infrastructure • IT-DB virtualization infrastructure is being migrated from RHEL + OVM to the standard CERN AI OpenStack setup (KVM + SLC) • Storage access performance is vital to DB applications • IT-DB runs its own OpenStack installation on servers physically connected to its storage servers for performance reasons • NAS performance tests on this setup will be presented in the following weeks
Migration to the CERN Agile Infrastructure • DBOD customized RPM packages for MySQL and PostgreSQL servers already built using Koji • A Puppet module configures each host according to the instance-resource relations stored on the Syscontrol LDAP directory • NAS Volumes, service startup scripts, users, etc.
Agenda • Manifesto • Current status • Architecture • Demo • Management • Infrastructure • Monitoring • Data protection: backups and recoveries in detail • Future development • Summary
Monitoring • As mentioned, 139 database instances • 83% MySQL, 9% Oracle, 8% PostgreSQL • 52 hosts: 60% VMs, 40% physical, 2 clusters of 4 nodes • Different monitoring tools • Hardware, OS • NetApp OnCommand tools for storage + home-developed scripts • Oracle, MySQL • RACMon: Oracle, MySQL, PostgreSQL, volumes
Monitoring • Considerable maintenance effort • Tools for DBOD service managers • Metrics, notifications, some analysis functionality • Tools for DBAs (our users)
Monitoring • Reduce effort • One single location for notifications: IT GNI service • Benefit from IT monitoring solutions: Elasticsearch • Identify the tool that provides most of these: • Automated configuration • Metrics scheduling/submission • Sufficient metrics/plug-ins at all layers of our stack • Point-in-time analysis (incl. real time) • Open source with a sizeable community
Monitoring • Evaluating different monitoring tools • Oracle Enterprise Manager, Zabbix, Nagios, Cacti, AppDynamics, and OMD
Monitoring • Oracle Enterprise Manager • Does not officially support MySQL and PostgreSQL • Very convenient to identify performance problems
Monitoring • AppDynamics • Very intuitive interface • Database – storage volumes – protocol correlation • DB activity view and SQL analysis
Monitoring • Monitoring tools • Oracle Enterprise Manager, Zabbix, Nagios, Cacti, AppDynamics, and OMD • Evaluation in progress!
Agenda • Manifesto • Current status • Architecture • Demo • Management • Monitoring • Data protection: backups and recoveries in detail • Future development • Conclusions
Storage evolution: scaling up → scaling out • Cluster made of 8 controllers (FAS8060 & FAS6220), shared with other services
Data protection • 2 file systems: data + redo logs, on different NetApp appliances • Storage is monitored: NetApp tools + home-made tools • Multipath access to disks (redundancy + performance) → disks are seen by two controllers (HA pair) → transparent interventions • RAID6 • Automatic scrubbing (based on checksums) • Rapid RAID Recovery + Disk Maintenance Center
Backup management • Same backup procedure for all RDBMS; only the data volume is snapshotted • Backup workflow: quiesce → snapshot → resume → … some time later → new snapshot • Quiesce: mysql> FLUSH TABLES WITH READ LOCK; FLUSH LOGS; or Oracle> alter database begin backup; or PostgreSQL> SELECT pg_start_backup('$SNAP'); • Resume: mysql> UNLOCK TABLES; or Oracle> alter database end backup; or PostgreSQL> SELECT pg_stop_backup(), pg_create_restore_point('$SNAP');
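A sketch of the MySQL branch of this workflow, with take_data_volume_snapshot() as a hypothetical wrapper around the storage snapshot API (see the next slide):

```python
import MySQLdb


def take_data_volume_snapshot(instance, label):
    """Hypothetical wrapper around the storage snapshot API."""
    raise NotImplementedError


def backup_mysql(host, user, password, label):
    conn = MySQLdb.connect(host=host, user=user, passwd=password)
    cur = conn.cursor()
    cur.execute("FLUSH TABLES WITH READ LOCK")  # quiesce writes for a consistent image
    cur.execute("FLUSH LOGS")                   # rotate the binary log
    try:
        take_data_volume_snapshot(host, label)  # only the data volume is snapshotted
    finally:
        cur.execute("UNLOCK TABLES")            # resume normal activity
    conn.close()
```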
Snapshots • Taken programmatically via our API using ZAPI (NetApp Manageability SDK) • Logs can be controlled via the DB On Demand site • It is a very fast operation, e.g. pubstg (280 GB, ~1 TB of archivelogs/day) and adcr (24 TB, ~2.5 TB of archivelogs/day) are snapshotted in about 8-9 seconds
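A hedged sketch of the snapshot call itself, assuming the Python bindings of the NetApp Manageability SDK (NaServer); the filer name, credentials and volume are placeholders:

```python
from NaServer import NaServer  # NetApp Manageability SDK Python bindings


def snapshot_data_volume(filer, volume, label):
    srv = NaServer(filer, 1, 15)                 # ZAPI major/minor version
    srv.set_admin_user("dbod_backup", "secret")  # placeholder credentials
    out = srv.invoke("snapshot-create", "volume", volume, "snapshot", label)
    if out.results_status() != "passed":
        raise RuntimeError(out.results_reason())
```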
Snapshots life cycle • Based on automatic autodeletion on the storage • Maximum 255 snapshots • By default: • 20% of the active file system is reserved for snapshots, e.g. 20 GB in a data file system of 100 GB • Autodeletion is triggered when the snapshot reserve is close to 98% full; snapshots are deleted until the target free space is reached, usually 20% (e.g. 4 GB in the above example) • Daily reporting on snapshots: • Not enough snap reservation → 0 snapshots • No snapshots → DBA did not set up backups • May lead to re-adjusting the snap reservation percentage
Archivelogs/binlogs/WALs • Transaction logs, used for backup & recovery procedures: point-in-time recoveries (PITR) and consistent restores • Hosted in their own file system • MySQL: binlog.004118 • PostgreSQL: 0000000700000003000000B2 • Oracle: o1_mf_1_850__14509975064212_.arc • Default retention for transaction logs: • MySQL: 30 days • PostgreSQL and Oracle: 15 days • The service may need to re-adjust the retention period, depending on database activity (space consumption). The owner is notified if retention is reduced to 1 week.
Tape backups • Driven by demand/need, not initially in the plans • Likely to be removed • Possible only for PostgreSQL and MySQL • The Oracle 12c solution already comes with tape backups • Consistent snapshot + redo logs sent to tape • Database activity is not impacted • Tape backups are not validated • Manual process to set them up; you need to contact us (DBOD + TSM service)
Instance restore (diagram): timeline of automatic and manual snapshots of the data files, with binary logs enabling point-in-time recovery up to now
Recoveries: PostgreSQL timelines • "The ability to restore the database to a previous point in time creates some complexities that are akin to science-fiction stories about time travel and parallel universes." (PostgreSQL 9.2 official documentation) • A new timeline is created every time you do a recovery → recovery.conf • If you need to go back to a point on an earlier timeline you have to contact us; by default you get the timeline that was current when the snapshot was taken • In case of recoveries, don't lose track of your timelines! • E.g. don't end up with 30 timelines and no idea where or how to go back • Contact DBOD admins (SNOW) • (Diagram: timelines 0, 1 and 2 branching after point-in-time recoveries, with snapshots between 9th June at 14:00 and 13th June at 17:30)
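For reference, on PostgreSQL 9.2 a point-in-time restore is driven by a recovery.conf file; a minimal sketch with placeholder paths and timestamps (the parameter names are standard):

```
restore_command = 'cp /path/to/archived_wals/%f "%p"'  # placeholder WAL archive location
recovery_target_time = '2014-06-12 14:00:00'           # stop replay at this point
recovery_target_timeline = '1'                         # follow an explicit timeline instead of the default
```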
Recoveries: Oracle 12c incarnations (Oracle 12c documentation, E17630) • Timelines = incarnations in Oracle terms • Incarnations avoid confusion between redo streams (equal System Change Numbers) • Oracle selects the direct ancestral path by default • Incarnations are defined as: [container incarnation (CDB), pluggable incarnation (PDB)] • RESET DATABASE TO INCARNATION only applies to the CDB • In case of recoveries, don't lose track of your incarnations! • E.g. don't end up with 30 incarnations and no idea where or how to go back • Contact DBOD admins (SNOW)
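As an illustration, incarnations are listed and reset from RMAN; a hedged sketch in which the incarnation key and recovery target are placeholders:

```
RMAN> LIST INCARNATION OF DATABASE;
RMAN> STARTUP FORCE MOUNT;
RMAN> RESET DATABASE TO INCARNATION 2;
RMAN> RESTORE DATABASE;
RMAN> RECOVER DATABASE UNTIL SCN 1500;
RMAN> ALTER DATABASE OPEN RESETLOGS;
```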
Agenda • Manifesto • Current status • Architecture • Demo • Management • Monitoring • Data protection: backups and recoveries in detail • Future development • Summary
High density consolidation: LXC • Scaling up servers (128 GB RAM, 32 CPUs); LXC should help to consolidate even more • Red Hat 7 / Atomic Host • Fine control of memory and CPU using control groups • (Benchmark: MySQL 5.5.30, sysbench 0.5 query test, data set fits into the InnoDB buffer)
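As an example of the control-group knobs involved, a per-container LXC configuration could cap memory and pin CPUs roughly as below; the keys are standard LXC 1.x config keys, the values are purely illustrative:

```
# Hypothetical per-instance limits in an LXC 1.x container config
lxc.cgroup.memory.limit_in_bytes = 8G
lxc.cgroup.cpuset.cpus = 0-3
```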
Data protection: SnapVault • Based on snapshots • Should cover the tape backup functionality → disaster recovery location
Data protection: SnapMirror • SnapMirror of the redo logs (alternative to master/slave replication at the application layer) • (Diagram: two storage networks linked over WAN/LAN at 10GbE) • SnapMirror is currently asynchronous; a synchronous version is likely in the next ONTAP release • Avoids the complexity of a replication setup
Cloning • TR-4266: NetApp Cloning Plug-in for Oracle Multitenant Database 12c • Does not play well in an Oracle cluster (RAC) environment • Cloning functionality already in our storage API • Extend it to non-Oracle DBs, overcoming the NetApp limitation • Use cases: backups, fast testing, …
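A sketch of what the cloning call through the same storage API could look like, again via ZAPI (NaServer); the filer, volume and snapshot names are placeholders:

```python
from NaServer import NaServer  # NetApp Manageability SDK Python bindings


def clone_from_snapshot(filer, parent_volume, snapshot, clone_name):
    srv = NaServer(filer, 1, 15)
    srv.set_admin_user("dbod_backup", "secret")  # placeholder credentials
    out = srv.invoke("volume-clone-create",
                     "parent-volume", parent_volume,
                     "parent-snapshot", snapshot,
                     "volume", clone_name)
    if out.results_status() != "passed":
        raise RuntimeError(out.results_reason())
```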
Agenda • Manifesto • Current status • Architecture • Demo • Management • Monitoring • Data protection: backups and recoveries in detail • Future development • Summary