Experience with RAC/Linux on no-brand hardware • DB Service meeting, 13th June • Jacek Wojcieszuk, Luca Canali, CERN
Agenda • Motivations • Architecture • Implementation • Lessons learned
Goal • Main goal: build a highly available and scalable database service while minimizing cost • Why? • High demand for resources from the experiments • Huge data volumes expected • Need for high availability • Some DBs are mission critical • Reduce DB administration and HW costs
Software Architecture • Operating system -> Linux (RedHat ES) • Recommended choice for x86-based hardware and Oracle • Picking up momentum in many Oracle shops • Good ROI and stability • Sysadmin team on campus provides Linux expertise and support • Database software -> Real Application Clusters (RAC) 10g • Database cluster solution from Oracle • A few years of positive in-house experience with version 9i • 10g R2 – added stability and performance
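Not part of the original slides: as an illustration of operating the RAC 10g setup described above, a minimal sketch of how a monitoring script might verify that every cluster instance is up. It assumes the cx_Oracle Python driver; the service name and credentials are hypothetical placeholders.

```python
# Hedged sketch: list the status of all RAC instances via the gv$instance view.
# Assumes the cx_Oracle driver; connection details are illustrative placeholders.
import cx_Oracle

def rac_instance_status(user, password, dsn):
    """Return (instance_name, host_name, status) for every cluster instance."""
    conn = cx_Oracle.connect(user, password, dsn)
    try:
        cur = conn.cursor()
        cur.execute("SELECT instance_name, host_name, status FROM gv$instance")
        return cur.fetchall()
    finally:
        conn.close()

if __name__ == "__main__":
    for name, host, status in rac_instance_status(
            "monitor", "secret", "db-cluster:1521/myservice"):
        print(f"{name} on {host}: {status}")
```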
Software Architecture (2) • Volume manager and cluster filesystem -> Automatic Storage Management (ASM) • Included in the Oracle distribution • Seems to be Oracle's main direction in the volume-management area • Well integrated with the Oracle RDBMS and RAC • Provides the functionality indispensable for HA and performance (striping + mirroring) • But it is a new piece of software – still sometimes troublesome • Not yet widely used
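To make the ASM terminology concrete, here is a hedged sketch (not from the original slides) of how disk-group state, redundancy type, and free space can be inspected through the v$asm_diskgroup view; again the cx_Oracle driver is assumed and the connection details are placeholders.

```python
# Hedged sketch: report ASM disk groups with redundancy type and space usage.
# Assumes the cx_Oracle driver; credentials/DSN are illustrative placeholders.
import cx_Oracle

conn = cx_Oracle.connect("monitor", "secret", "db-cluster:1521/myservice")
cur = conn.cursor()
cur.execute("""
    SELECT name, state, type, total_mb, free_mb
    FROM v$asm_diskgroup
""")
for name, state, dg_type, total_mb, free_mb in cur:
    used_pct = 100.0 * (total_mb - free_mb) / total_mb if total_mb else 0.0
    print(f"{name:12s} {state:10s} {dg_type:8s} {used_pct:5.1f}% used")
conn.close()
```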
Hardware Architecture • General idea: clusters of many small HW components, load-balanced by software • Elonex/Megware ‘mid-range’ servers • 2 x86-compatible CPUs @ 3 GHz, 4 GB RAM • High performance/cost compared to other platforms • Infortrend/Transtec disk arrays • 16- and 8-disk SATA arrays with FC controllers • High capacity/cost + reasonable performance • SAN infrastructure (HBAs, FC switches) from Qlogic • 3COM and HP Gb Ethernet switches
Maintenance and responsibilities • Hardware: • Installation in racks and cabling: FIO + CS • FC switches initial configuration: FIO + PSS • Disk arrays initial configuration: PSS • FC switches and disk arrays re-configuration: PSS • Failure handling: FIO + vendors • Software: • OS - basic configuration, patches and problem handling: FIO • OS - cluster configuration: PSS • Time-consuming and error-prone part • Oracle software installation, patching and problem handling: PSS
Implementation – hardware layout • Cluster nodes and storage arrays are added to match the experiments' demand • [Diagram: servers connected through the SAN to the storage arrays]
Implementation – ASM configuration • [Diagram: mirroring across disk arrays and striping within them; disks labelled RAC1/RAC2] • ASM is a volume manager and cluster filesystem for Oracle DB files • Implements S.A.M.E. (stripe and mirror everything) • Similar to RAID 1+0: good for performance and HA • Online storage reconfiguration (e.g. in case of a disk failure) • Example: ASM ‘filesystems’ -> disk groups: DataDiskGrp, RecDiskGrp
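The mirrored-and-striped layout sketched on this slide could be set up along the following lines. This is only an illustrative sketch, not the actual CERN configuration: the device paths, failure-group names, and credentials are placeholders, and it assumes the script runs on a cluster node whose environment (ORACLE_SID) points at the local ASM instance.

```python
# Hedged sketch: create an ASM disk group with NORMAL redundancy and one
# failure group per storage array, so extents are mirrored across arrays
# and striped within them (S.A.M.E.). Paths and names are placeholders.
import cx_Oracle

CREATE_DATA_DG = """
CREATE DISKGROUP DataDiskGrp NORMAL REDUNDANCY
  FAILGROUP array1 DISK '/dev/mpath/array1_lun1', '/dev/mpath/array1_lun2'
  FAILGROUP array2 DISK '/dev/mpath/array2_lun1', '/dev/mpath/array2_lun2'
"""

# Local bequeath connection to the ASM instance as SYSDBA (10g style);
# assumes ORACLE_SID is set to the local ASM instance, e.g. +ASM1.
conn = cx_Oracle.connect("sys", "asm_password", mode=cx_Oracle.SYSDBA)
conn.cursor().execute(CREATE_DATA_DG)
conn.close()
```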
Positive experiences • High Availability • We took care to install and test RAC services to avoid single points of failure • Notably: • On-disk backups leverage the high capacity of SATA • Multipathing on Linux with the Qlogic driver successfully implemented • Rolling Oracle CPU patches help for HA • Performance and scalability • PVSS optimization shows that CPU-bound applications can scale almost linearly up to 6 nodes and beyond • Very good results also on I/O performance • I/O subsystem scales well at least up to 64 disks • ~800 MB/s sequential read for a 4-node + 64-disk RAC • ~100 random IOs/s per disk; 8000 small random IOPS shown in a benchmark on the COMPASS RAC
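For orientation, the headline I/O figures on this slide break down roughly as follows (a back-of-the-envelope calculation only; the 8000 IOPS result comes from the separate COMPASS RAC benchmark):

```python
# Back-of-the-envelope breakdown of the I/O figures quoted on this slide
# (illustrative only; real throughput depends on workload and configuration).
nodes = 4
disks = 64
seq_read_mb_s = 800            # aggregate sequential read, 4-node / 64-disk RAC
random_iops_per_disk = 100     # small random IOs per disk

print(f"Sequential read per node: {seq_read_mb_s / nodes:.0f} MB/s")
print(f"Sequential read per disk: {seq_read_mb_s / disks:.1f} MB/s")
print(f"Expected random IOPS for {disks} disks: {disks * random_iops_per_disk}")
```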
Positive experiences (2) • Installation of clusters, although time-consuming, is pretty straightforward • See the installation procedure on the wiki • Reliability • The majority of the hardware we use seems to be reliable enough • Mid-range servers – very few problems • FC switches and HBAs – very few problems • Disks – ~1 failure per month (~600 disks in use) • Disk array controllers – the weakest point of the infrastructure • Ethernet switches – no problems so far • Support and Oracle patching • A few tickets opened for RAC issues since 10.2 • The fact that the hardware we use is not ‘Oracle validated’ is not an issue for Oracle Support
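For scale, the disk reliability figure above corresponds to roughly a 2% annualized failure rate:

```python
# Rough annualized disk failure rate implied by the figures on this slide.
disks_in_use = 600
failures_per_month = 1
print(f"Implied annual failure rate: {failures_per_month * 12 / disks_in_use:.1%}")
```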
Open issues • High availability • Cluster software failures can bring the system down; a few ‘freezes’ have been observed and fixed with a partial or total cluster reboot (rare) • Full cluster interconnect failure can bring the system down to a single node • ASM • Serious issues with ASM in version 10gR1 • Much better with 10.2, but storage reconfiguration is still not as straightforward as we would like; failures of disk array controllers are especially annoying (an ASM architectural constraint) • Cannot apply OS kernel upgrades or Oracle patch sets in a rolling fashion
Open issues (2) • Handling of hardware failures: • We experienced a few cases where repeated HW failures could not be proactively diagnosed by sysadmins until they escalated into broken hardware • Room for improvement • Vendor calls usually take at least a few days • Need to keep spare hardware handy • Fixing problems with disks and disk array controllers is time-consuming and troublesome • A lot of manual, error-prone work
Conclusions • Physics database services currently run: • ~50 mid-range servers and ~50 disk arrays (~600 disks) • In other words: 100 CPUs, 200 GB of RAM, 200 TB of raw disk space • Half of the servers are in production, monitored 24x7 • Positive experience so far • A big step forward from the previous production architecture (IDE ‘diskservers’) • Can more easily grow to meet the demands of the experiments during LHC startup • Low-cost hardware + Oracle 10g RAC on ASM can be used to build highly available database services with a very good performance/price ratio • More info: • http://www.cern.ch/phydb/ • https://twiki.cern.ch/twiki/bin/view/PSSGroup/HAandPerf