Veritas Netbackup at A.G. Edwards Steve Rupprecht A.G. Edwards & Sons Inc. rupprechtsl@agedwards.com
A.G. Edwards & Sons Inc. • Full Service Financial Firm • Headquartered in St. Louis Mo. • Over 700 Branch offices Nationwide
The Environment • Very Rapid Server growth, both UNIX & NT • As I left St. Louis we had over 150 Servers • Mix of HP, Sun and NT • Amount of data (# files, # GB) • Many “Home Brew” applications, Documentum (EDMs), PeopleSoft • Running on Oracle, Informix, Sybase, Allbase • Mainly at our Home Office & a regional D/R site
The Problem • 5 years ago… • 20+ (non-branch) Unix Servers • 10+ (non-branch) NT Servers • 3 Independent Backup strategies (Unix, NT, DB) • 300+GB worth of retention (2 weeks, 4 weeks) • “Mostly Home Brew” Backup scripts • Unlabeled Backup tapes • 4mm (no read verify) • 100% Manual Effort • No: auditing, verification, failure recovery • Adding a new server took 1 week to prepare for Backups. (Color Coded Tapes, Consoles, Keyboards…)
The Challenge • Convincing those that can sign the check that we needed an automated backup system. • Network Beef-up • 1 Cisco 7000 Router • 20 some odd subnets • All Shared 10Mb • Finding backup & robotics vendors • Implementing
The Process • Asynchronously: • Get our Network Group Buy-in • (they loved it as another reason to upgrade the Net.) • Search the World for backup products • Search the World for Tape Drive/Robotics • We knew 4mm was out, but what? • Create a sieve to flush out pretenders • 68 Item list! • “Paper” Eval. of most products.
The Process • Physically loaded and evaluated: • HP’s Omniback • Veritas NBU • IBM’s ADSM • I was surprised by all of the Junk out there • One unnamed robotics vendor was going to Throw-in a backup product with the sale of his robotics! • Ended up being a 3 year process • (to the start of implementation) • Still not Complete!
Preliminary Design • Began setting “Ground Rules” for ABU • 0 - 4GB Server allowed on Shared 10Mb • 5 - 24GB Server Had to be on high speed Networking • 25GB+ Slave Server • This was decided before Veritas was selected • The group “knew” Slave Server was the competitive advantage • After the Network Group decided ATM: • We decided to go with a Sun Server as the Master • Support for MPOA • Back Plane Speed (Consider In & Out)
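The size-based ground rules on this slide can be written down as a small lookup. This is only a sketch in Python: the function name and the returned labels are made up for illustration, and it assumes that anything under 5GB falls in the first tier, since the slide does not say how a server between 4GB and 5GB is treated.

```python
def placement_for(data_gb: float) -> str:
    """Return the backup placement tier for a server holding data_gb of data.

    Thresholds follow the slide: 0-4GB, 5-24GB, 25GB+.
    Assumption: sizes below 5GB are treated as the first tier.
    """
    if data_gb < 0:
        raise ValueError("data size cannot be negative")
    if data_gb >= 25:
        return "slave server"          # backs up to its own attached drives
    if data_gb >= 5:
        return "high speed network"    # must not sit on shared 10Mb
    return "shared 10Mb"               # small enough for the shared segment


for size in (3, 12, 40):
    print(f"{size}GB -> {placement_for(size)}")
```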
Veritas & IBM Head to Head • About 2.5 years after the start: • Decided the DLT was the answer for Tape • Went with a STK 9710 with 8 drives from Datalink • we were not going to let the backup vendor dictate the backup medium • Backup vendor would also have to work with ATM • Decided to go head to head with ADSM and NBU • Veritas met all requirements • ADSM did not meet all requirements • We didn’t care for “Incrementals forever” design of ADSM • ADSM ran fine on an RS6000 but “they” could not get it to run on our HP platform. • Omniback ...
Initial Design • Master: Sun E5000, 6 CPUs, 1.5GB mem • 100Mb Ethernet Via ATM • Slaves: 4 HP, 1 NT • Up to 50 NT & Unix Clients Via 10 & 100Mb Ethernet
How to Justify Purchase... • Justification was NOT by having any machines recover any faster • Recreate Data • Rebuild Server • Not by factoring Loss of Productivity • Not by loss of business • All on Bull • Not by the loss of data • It was finally justified by the growing FTE count • We estimated 8 FTEs to perform and support backups, and we estimated a reduction to 2 FTEs
Phased Rollout • Pilot at HQ • 13 Unix, 2 NT (Q4 1997) • Complete HQ UNIX (Q1 1998) • Complete HQ NT (Q2 1998) • Complete D/R (Q3 1998) • We used our D/R site as the location of Eval. • Completed D/R Q1 1998 along with HQ Pilot • Many delays in implementation: • Network infrastructure • Time commitment from our DBAs
General NBU Configuration • NBU 3.1.1 • Running on Sun E5000 Solaris 2.6.1 • Clients • Mainly HP UX 10.20 • F20 - T600 Class Servers • Some Sun 2.6.1 • E250 - E5000 • NT OS ? NBU Client 3.1.5 • No Multiplexing • Are moving to one machine per Class definition • More granular control • At the expense of many more classes & full logs
Experience • Filesystem Backups are a cake walk • Initially a bit disappointed with F/S backup rates • Not an NBU problem • Inherent in UNIX F/S Design. • Little difference between UFS & JFS • Database backups… • Needs an interface between DB and NBU • We chose Datatools SQL Backtrack • Provides common interface between our “Big 3” DB Vendors • Sounds good on paper! • A single DB can Explode into 50 or more independent backups!
Experience- database • SQL Backtrack requires script writing and much configuration • Informix High Speed Loader/UnLoader • output to several disks striped • extremely fast • Scrapped at request of Informix consultant • All of our Informix engines are now using Onbar • The scripting takes .LT. an afternoon
Experience- General • We probably “Over Engineered” our system • Due to many small DB backups and speed of F/S backups • Not streaming data as we hoped • Looking to multiplexing at NBU 3.2 • (too many warts for us prior to 3.2) • We are using “Streaming” within Datatools • Queues jobs • Reduces next backup lag • Can take up to two minutes or more for backup startup/teardown • * 70 backups = 2 hours 20 minutes wasted! • Limit 1 job active per client
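The startup/teardown arithmetic on this slide is easy to redo for other job counts. A minimal sketch, assuming a flat two minutes of overhead per backup job (the slide says "up to two minutes or more", so treat this as a rough figure); the function names are illustrative only.

```python
from typing import Tuple


def startup_overhead_minutes(jobs: int, minutes_per_job: float = 2.0) -> float:
    """Total time spent on backup startup/teardown rather than moving data."""
    return jobs * minutes_per_job


def as_hours_minutes(total_minutes: float) -> Tuple[int, int]:
    """Split a minute count into whole hours and remaining minutes."""
    return divmod(round(total_minutes), 60)


hours, minutes = as_hours_minutes(startup_overhead_minutes(70))
print(f"{hours} hours {minutes} minutes")  # 2 hours 20 minutes
```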
Experience- Disk Configuration • Most of our HP servers use JBOD • (Just a Bunch of Ordinary Disks) • Some HP-UX Mirroring • Sun Servers running Veritas File System • Using Striped Mirrors • We recently purchased an EMC Disk Array • For Critical Applications • Running Mirror not RAID • All writes are cached • Designing SRDF • Using Time Finder to produce 3rd Mirror for backups (BCV)
Experience- Disk Configuration • Primary • Mirror • Time Finder 3rd Mirror or BCV • Code yourself or use Veritas’ EMC Foundation Suite • Operations: Sync; Quiesce; Split; Backup
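For the "code yourself" route, the order of operations is the part that matters. The sketch below only shows that ordering in Python; every callable is a hypothetical placeholder for whatever site-specific command does the work, and none of them is a real Time Finder or NetBackup call. The resume step is an assumption added here, on the reasoning that a quiesced database has to be released again once the split is done.

```python
from typing import Callable

Step = Callable[[], None]


def backup_from_bcv(sync: Step, quiesce: Step, split: Step,
                    resume: Step, backup: Step) -> None:
    """Run the BCV cycle in order: sync, quiesce, split, then back up."""
    sync()          # bring the 3rd mirror up to date with the primary
    quiesce()       # put the database into a consistent state
    try:
        split()     # detach the 3rd mirror as a point-in-time copy
    finally:
        resume()    # assumption: always release the database afterwards
    backup()        # back up from the split copy, not the live volumes


# Stand-in steps that only record the order they ran in.
steps = []
backup_from_bcv(
    sync=lambda: steps.append("sync"),
    quiesce=lambda: steps.append("quiesce"),
    split=lambda: steps.append("split"),
    resume=lambda: steps.append("resume"),
    backup=lambda: steps.append("backup"),
)
print(steps)
```

If the split fails, the database is still resumed and the backup step is not reached, which seems the safer default.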
Experience- Disk Configuration • Planned to “mount” the BCVs • For maintenance • Problems with HP’s Volume Manager • Need another server • New Veritas EMC Foundation Suite • Provides the sync, split and Backup. Not sure about the Quiesce • Planning SRDF to our remote site • Via OC3 (155.xxxMb/sec) • Dark Site • A real opportunity for Veritas
Benefits • Old system: • had no idea what was backed up • fully manual • “Clunky” restores • no central management • no common backup strategy • no confidence in backups • more expensive! • New System: • Logical .NOT. of above
20/20 Hindsight • Would have sent DBAs to training first! • No Mixed Mode DLT4K, DLT7K • It works but is just a daily hassle • Load Balancing • Use Native DB Engine Backup interface • Gotchas • Backups/restores through Firewalls or PIXes • Backups seem to work; ports are well defined • Restores… • Unreserved ports: 500 ports have to be opened • Work is in progress
20/20 Hindsight • Gotchas • DB Backups Expand into MANY independent BUs • Upgrade to NBU 3.2 from 3.1.1 requires NBU DB conversion. Need utility from Veritas. • HP HP-PB 10/100Mb Ethernet card only produces about 20Mb/sec • Job Monitor File Menu, first Item is: Kill All Backups. Would like to see Refresh First • Need “Web Based” Problem Solver. See other sites’ problems and answers • Encryption does not work with Datatools! • Max Jobs per client still Global???
Kernel Tuning • Sun Master • set msgsys:msginfo_msgmap=500 • set msgsys:msginfo_msgmnb=65536 • set msgsys:msginfo_msgssz=32 • set msgsys:msginfo_msgseg=8192 • set msgsys:msginfo_msgtql=500 • set msgsys:msginfo_msgmni=75 • set semsys:seminfo_semmni=300 • set semsys:seminfo_semmns=300 • set semsys:seminfo_semmsl=300 • set semsys:seminfo_semmnu=600 • set shmsys:shminfo_shmmax=8388608 • set shmsys:shminfo_shmseg=10
Benchmarks • HP K580-4; 2GB Memory; FWD SCSI; F/C; 100Mb HD Ethernet; Route via ATM; DLT 7K • Backup /usr (468MB) 2.6MB/Sec • Backup of /usr Tar Ball 4.2MB/Sec • Backup of /usr expanded on EMC Array 3.5MB/Sec • Backup of /usr Tar Ball on EMC Array 1.9MB/Sec?!?!?! • cp of above to /dev/null 20.1MB/Sec??? • Backup of Tar Ball on T600-5 Slave Server to DLT7K, Source: 4x9GB Stripe • 8.66MB/Sec
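To put these rates in wall-clock terms, a short Python sketch can convert a size and a rate into elapsed time, and compare the raw read speed against the backup speed for the same tar ball. The helper names are illustrative, and the figures are just the ones quoted on the slide.

```python
def seconds_to_move(size_mb: float, rate_mb_per_sec: float) -> float:
    """Elapsed seconds to move size_mb at a sustained rate."""
    if rate_mb_per_sec <= 0:
        raise ValueError("rate must be positive")
    return size_mb / rate_mb_per_sec


def slowdown(reference_rate: float, observed_rate: float) -> float:
    """How many times slower the observed rate is than the reference."""
    return reference_rate / observed_rate


# /usr is 468MB and backed up at 2.6MB/Sec on the slide.
print(f"/usr backup: about {seconds_to_move(468, 2.6):.0f} seconds")

# Tar ball on the EMC array: 20.1MB/Sec with cp, 1.9MB/Sec through backup.
print(f"backup vs cp: about {slowdown(20.1, 1.9):.1f}x slower")
```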
Magic Pill • Have been experiencing repeated 219 errors and other sporadic BU Failures • No Rhyme or Reason • ndd -set /dev/tcp tcp_close_wait_interval 30000 • Fixed all of our sporadic errors • This is the time that a socket stays open after a close is issued • All of the blasted DB BUs that run for 5 seconds had all of the sockets tied up • NO ERROR IN ANY SYSTEM LOG!
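Why shortening the interval helps can be estimated with simple steady-state arithmetic: the number of sockets still lingering is roughly the rate at which connections close multiplied by how long each one lingers. The Python sketch below assumes a made-up rate of 10 closes per second, and the 240000 ms "before" value is an assumption (it is the commonly cited Solaris default, not a figure from this talk); only the 30000 ms value comes from the slide.

```python
def lingering_sockets(closes_per_second: float, close_wait_ms: int) -> float:
    """Approximate sockets still held open after close, at steady state."""
    return closes_per_second * close_wait_ms / 1000.0


CLOSES_PER_SECOND = 10    # hypothetical load from many short DB backups
ASSUMED_DEFAULT_MS = 240000   # assumption, not stated in the talk
TUNED_MS = 30000              # value set with ndd on the slide

before = lingering_sockets(CLOSES_PER_SECOND, ASSUMED_DEFAULT_MS)
after = lingering_sockets(CLOSES_PER_SECOND, TUNED_MS)
print(f"lingering sockets before: {before:.0f}, after: {after:.0f}")
```

Under those assumptions the tuned value holds one eighth as many sockets for the same load, which fits the observation that the short DB backups were what tied the sockets up.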
Plans for the Future • We have purchased Veritas HSM, now called: Storage Migrator • Implement on our EDMs servers • WARNING: Requires Veritas F/S, yes the “Full Bodied One” on HP! • Connected to HP Optical Jukebox (2 sites) • (Sneaker Net?) • HSM _can_ manage at the directory level but results _may_ not be what you would expect • I think Veritas could be the cement for the “Promise” of SANs • What is needed is a common F/S and since this is what Veritas does…
Plans for the Future • Convert our Oracle Backups to OEBU and implement Encryption on Peoplesoft servers • We need something for backup & restore through Firewalls and the like. Veritas????? • Expand Veritas F/S usage. Supposedly available on HP at 11.x??
Summary • Looking at where we are today in number of Servers and number of TBs, the old system would have crumbled at least a year ago • This backup/restore product has “saved” us many times. • even though that was not part of the justification for the purchase of the product! • So I guess this is a bonus!