Venturing into VCS Luis Londoño Big Planet luisl@istaken.com
Outline • Introduction • Our current configuration and why • Installation/Upgrade: good and bad • Group Dependencies • Developing custom agents • A few lessons • What’s next? • Summary
Big Planet - Introduction • Based in Provo, Utah • 2-year-old company with 18 months of service to the public • One data center • We are a network marketing company with over 60,000 sales representatives • Recently acquired by Nu Skin Enterprises
Big Planet - Introduction • National Internet Service Provider • Over 2000 dial-up POPs • Standard ISP services • Flagship Device: I-Phone - internet phone
Data Center Environment • Several Sun E450, E250 and E4000 servers • A couple of HP K-400 series servers • EMC Symmetrix 3700 Storage • 1.5 Terabytes • SCSI connected today • Databases: • Oracle: ~180GB • DB2: ~10GB • NFS servers • Sun • Veritas QuickLog • Drank a lot of the Veritas Kool-aid!
Our Goal • A lot of dependencies on backend systems • We require 99.9% uptime for these systems, but really want 99.95% • Tricky with: • SW maintenance • HW maintenance • Understaffed, so ease of management is critical • 5 minutes per week and the easy life!
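As a rough check on the two targets above, the downtime each one allows can be worked out directly. This is a minimal sketch: the function name `allowed_downtime_minutes` is ours, and it assumes a 7-day week of 10,080 minutes. Under that assumption 99.9% allows about 10 minutes of downtime per week and 99.95% about 5, which appears to be where the "5 minutes per week" figure comes from.

```python
MINUTES_PER_WEEK = 7 * 24 * 60  # 10,080 minutes in a 7-day week


def allowed_downtime_minutes(availability_pct, period_minutes=MINUTES_PER_WEEK):
    """Return the downtime budget, in minutes, for an availability percentage."""
    if not 0 <= availability_pct <= 100:
        raise ValueError("availability must be a percentage between 0 and 100")
    return period_minutes * (100 - availability_pct) / 100


if __name__ == "__main__":
    for target in (99.9, 99.95):
        budget = allowed_downtime_minutes(target)
        print(f"{target}% uptime allows {budget:.2f} minutes of downtime per week")
```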
Choosing VCS • Upgraded from FirstWatch • We had two configurations • Oracle in a 2-to-1 setup • NFS asymmetric two-server configuration • FirstWatch was definitely not the easy life, in particular with the 2-to-1 configuration • We had already learned a lot though, and the systems were pretty well set up • One high-availability package for both HP-UX and Solaris • Much easier to deploy and configure • More Veritas Kool-aid!
Cluster Configuration Overview • [Diagram; labels: db2/http, NFS (Production), Oracle, standby (Testing)]
Why this way? • One big cluster vs. three little clusters • Ease of management and monitoring • Would like to start interrelating all servers when the SAN arrives • Looks cool in the GUI and my director was impressed! • Independent large groups vs. dependent smaller groups • We experimented and were not convinced • Group relationships not well defined - more later • Instead we thought carefully about the criticality of resources in a group.
Installation/Upgrade: good • Good: • ALMOST NO DOWNTIME!!!! • Really easy to configure the communications layers. • Decide on names/ids up front • Upgrade from 1.0.2 to 1.1 went very smoothly (thanks Veritas!!) • Resource attribute localization for NIC and IP resources --- NOT IN THE MANUAL!* • hares -local res attr • * I think :-)
Installation/Upgrade: bad • Bad: • ALMOST NO DOWNTIME!!!! • VCS and typical VxVM imports do not like each other • Try: vxprint -ag <dgname> <dgname> to check • Output should be: dg <dg> …noautoimport=on … • Unfortunately, the fix is to deport and let VCS reimport, or vxdg -t import … • SOAP BOX • Give thought to seeding • Consultant originally set up GAB as gabconfig -c -x • Suffered split brain personality as a result of no seeding, and a bug in GAB v1.0.2 • We are running gabconfig -c -n5 today for 7 nodes • I don’t know that there is a right answer, but there is a wrong one!
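The noautoimport check above could be scripted once the vxprint output has been captured. The sketch below is only an illustration: it assumes the disk group record appears on a single line shaped like the slide's description (`dg <dg> … noautoimport=on …`), and real vxprint output may be laid out differently. The function name and the sample text are hypothetical.

```python
def noautoimport_is_on(vxprint_output, dgname):
    """Return True if the dg record for dgname carries noautoimport=on.

    Assumes (from the slide) one record line of the form
    'dg <dgname> ... noautoimport=on ...'; real output may differ.
    """
    for line in vxprint_output.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "dg" and fields[1] == dgname:
            return "noautoimport=on" in fields
    return False  # no dg record found for this disk group


if __name__ == "__main__":
    # Made-up sample shaped like the slide's description, not captured output.
    sample = "dg oradg ... noautoimport=on ..."
    print(noautoimport_is_on(sample, "oradg"))
```

If the check fails, the slide's remedy still applies: deport the disk group and let VCS reimport it.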
Group Dependencies • [Diagram; labels: Ora db1, Ora db1 Disk and IPs] • Dependencies not well defined • We would have liked to set up smaller groups and link them “online local” • There is no concept of “critical” groups, they are all “non-critical” • Here is what we found...
Group Dep: Online Local • Forced failure of resource in topgrp, and nothing happened • Forced failure of resource in bottomgrp, and VCS migrated both to another system • Tried to switch to another machine, but could not do it
Group Dep: Offline Local and Online Remote • Forced failure of resource in topgrp, and nothing happened • With topgrp on host A, forced failure of bottomgrp on host B, and VCS swapped them (top->B,bot->A) • With topgrp offline, switched bottomgrp and topgrp did not come online
Customizing VCS • Local attribute on resources • Writing custom script agents • BPVxQuickLog Agent • QuickLog functionality does not come with VCS and Professional Services did not appear to have an agent. • LAST SOAP BOX • If you are not careful it will kill your data! • We wrote our own • Generic Application Agent • Kind of like a Swiss army agent - very handy tool to have around • Used it to monitor and control DB2
Writing Script Agents • 1. Decide on attributes - Think generic! • What information do you need to start/stop/monitor? • Create the type entry. The easiest thing is to hand-edit the config files, force stop ha on all nodes and force start again. • ArgList comprises the parameters that will get passed to your scripts after the resource name. The first argument in ArgList is the second argument to your scripts. • NameRule specifies a default name for your resource. Look for examples in types.cf - lots of options.
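The ArgList ordering is easy to get wrong, so here is a small sketch of how the argument vector lines up: the resource name first, then each ArgList attribute value in order. The helper `build_script_args` and the resource name and values in the demo are hypothetical; the attribute names are borrowed from the BPVxQuickLog slide later in the deck.

```python
def build_script_args(resource_name, arg_list, attributes):
    """Return the argument vector a script entry point would receive.

    The resource name comes first, followed by the value of each
    attribute named in ArgList, in ArgList order.
    """
    missing = [name for name in arg_list if name not in attributes]
    if missing:
        raise KeyError(f"attributes not set: {', '.join(missing)}")
    return [resource_name] + [str(attributes[name]) for name in arg_list]


if __name__ == "__main__":
    # Hypothetical resource name and attribute values, for illustration only.
    arg_list = ["DiskGroup", "QLogDev", "AccelVolume", "MountPoint"]
    attrs = {
        "DiskGroup": "nfsdg",
        "QLogDev": "1",
        "AccelVolume": "vol01",
        "MountPoint": "/export",
    }
    args = build_script_args("qlog_res", arg_list, attrs)
    print(args)  # args[0] is the resource name, args[1] the first ArgList value
```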
Writing Script Agents (cont’d) • 2. Create the agent directory • If type in types.cf or equiv is Bob, then create a directory $VCS_HOME/bin/Bob • Copy $VCS_HOME/bin/ScriptAgent to $VCS_HOME/bin/Bob/BobAgent • 3. Write monitor script first • Beware that online may not be called • Common Exit Codes: • 100 -- Resource is offline • 110 -- Resource is online • 4. Write online and offline scripts • Exit Code 0 means success • Use logger or $VCS_HOME/bin/halog -add
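A monitor script boils down to running a health check and translating the result into the two exit codes listed above. The skeleton below is a sketch in Python rather than a shell agent script; `run_monitor` and its `check` callback are hypothetical names, and the only values taken from the slides are the exit codes 100 and 110.

```python
OFFLINE = 100  # exit code from the slide: resource is offline
ONLINE = 110   # exit code from the slide: resource is online


def monitor_exit_code(is_online):
    """Map a boolean health check onto the monitor exit codes."""
    return ONLINE if is_online else OFFLINE


def run_monitor(argv, check):
    """Run a monitor entry point and return its exit code.

    argv[0] is the resource name; the remaining items are the ArgList values.
    check is a hypothetical callable(resource_name, values) returning a bool.
    """
    if not argv:
        raise ValueError("expected the resource name as the first argument")
    resource_name, values = argv[0], argv[1:]
    return monitor_exit_code(check(resource_name, values))


if __name__ == "__main__":
    # Placeholder check that always reports offline; swap in a real probe.
    code = run_monitor(["demo_res"], lambda name, values: False)
    print(code)  # a real monitor script would pass this value to sys.exit()
```

Writing the monitor first, as the slide suggests, also makes it possible to test the online and offline scripts against it.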
BPVxQuickLog • Script Agent with online, offline, monitor • Attributes: • DiskGroup • QLogDev • AccelVolume • MountPoint • online steps 1. /sbin/vxld_logck 2. /sbin/vxld_mntlog • WARNING: Make sure you remove the /etc/rcS.d/S88vxld-startup script, or it will cause the system to go into single user mode during boot, since VCS will not have imported the diskgroup yet.
BPVxQuickLog (cont’d) • offline steps 1. /sbin/vxld_umntfs if necessary 2. /sbin/vxld_umntlog • monitor steps Use vxld_print to: 1. Check the QuickLog Device - make sure status is RUNNING 2. Check the QuickLog Volume - make sure status is OPENED
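The monitor decision above can be sketched as a two-condition check. The slides do not show what vxld_print prints, so this example deliberately skips the parsing and takes the two status strings as already extracted. The function name and the case-insensitive comparison are assumptions; the exit codes are the ones from the script agent slide.

```python
OFFLINE, ONLINE = 100, 110  # monitor exit codes from the script agent slide


def quicklog_monitor_code(device_status, volume_status):
    """Return ONLINE only if the device is RUNNING and the volume is OPENED."""
    device_ok = device_status.strip().upper() == "RUNNING"
    volume_ok = volume_status.strip().upper() == "OPENED"
    return ONLINE if device_ok and volume_ok else OFFLINE


if __name__ == "__main__":
    print(quicklog_monitor_code("RUNNING", "OPENED"))
    # "UNKNOWN" is a made-up status used only to show the offline path.
    print(quicklog_monitor_code("RUNNING", "UNKNOWN"))
```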
Generic Application Agent • Very generic, can do a bunch of things • Attributes: • PidFileDir • PidFile • FileExistsDir • FileExists • MonitorProc1 • MonitorProc2 • MonitorProc3 • MonitorProc4 • StartUser • StartDir • StartScript • StartParams • StopUser • StopDir • StopScript • StopParams
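One plausible way a generic agent could use the PID file attributes is sketched below. The slides list the attribute names but not their meaning, so this assumes PidFileDir is a directory and PidFile is the name of a file in it holding a single process ID. The function names and the path in the demo are hypothetical, and the liveness check relies on Unix signal semantics.

```python
import os

OFFLINE, ONLINE = 100, 110  # monitor exit codes from the script agent slide


def read_pid(pid_file_dir, pid_file):
    """Return the PID stored in the file, or None if missing or malformed."""
    path = os.path.join(pid_file_dir, pid_file)
    try:
        with open(path) as handle:
            return int(handle.read().strip())
    except (OSError, ValueError):
        return None


def process_exists(pid):
    """Return True if a process with this PID exists (Unix only)."""
    if pid is None or pid <= 0:
        return False
    try:
        os.kill(pid, 0)  # signal 0 tests for existence without signalling
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def pidfile_monitor_code(pid_file_dir, pid_file):
    """Report ONLINE if the PID file names a live process, else OFFLINE."""
    return ONLINE if process_exists(read_pid(pid_file_dir, pid_file)) else OFFLINE


if __name__ == "__main__":
    # Hypothetical location; prints 100 if the file does not exist.
    print(pidfile_monitor_code("/var/run", "example.pid"))
```

A PID check alone cannot tell whether the PID has been reused by an unrelated process, which may be why the agent also carries the MonitorProc attributes.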
A few lessons • Resource names matter ... OK, not to VCS • hastatus -summary is not that honest • Engine logs are not all the same • To NetworkHosts, or not to NetworkHosts • Oracle SQLNet listener monitor is picky about caps • Overstated, but remember to check major and minor numbers for NFS failover
What’s next? • We will be developing additional agents for: • SUN SIMS • LDAP servers • Create an HP-UX 2 node cluster • Waiting to complete the upgrade to 11.0 • Integration with our very young SAN • SNMP integration with HP-OpenView and Netcool • Ops training
Summary • VCS has made our lives easier • Very quick to install and almost no downtime • Very customizable • Very stable
“If all else fails, immortality can always be assured by spectacular error.” - John Kenneth Galbraith